What is code anyway?

Good question, actually. The answer from Paul Ford, “What is Code?”, is tearing through the journo tech listservs and FB groups. So many pretty colors, and it’s fun!

Here’s my favorite part:

Why Are We Here?

We are here because the editor of this magazine asked me, “Can you tell me what code is?”

“No,” I said. “First of all, I’m not good at the math. I’m a programmer, yes, but I’m an East Coast programmer, not one of these serious platform people from the Bay Area.”




Posted in Journalism

Growing pains in Python

Sometimes I feel like I am in a jungle with a dull machete and no compass. It’s 110 degrees, I’m sweating and trying not to panic. The feeling is what Erik Trautman would probably classify as “the Cliff of Confusion” in his funny, somewhat reassuring, somewhat unsettling piece, “Why Learning to Code Is So Damn Hard.” One woman’s jungle is another person’s cliff. Same sinking feeling. Either way, the feeling is normal. You just have to get through the Desert of Despair and you will find the Upswing of Awesome.

To summarize: Don’t try this s*$! alone. Get a mentor and find someone you can sit with, Skype with or whatever when you have questions.

The boxing ring has taught me that it might hurt to get hit in the nose (my ego more than my nose usually) but that the only recourse is to keep my composure (it’s normal to get frustrated when you get hit or get stuck in Python), dig in deeper and get back into the ring.

Posted in Data science for the naive, Hitting your head against the wall, How to learn Python

Python and the search for cuberoot and other adventures

I am about a month into the Data Science in Python course. The Python is okay, the data science is okay. Yes, even the linear regression turned out to be okay.

But learning the ins and outs of Pandas, lambdas, and all kinds of other modules and helpers on top of everything else is killing me slowly. Actually it happened pretty fast. A lot of machine problems too. So much learning!

I have to say that the Python tutorials on Codecademy redeemed my opinion of the site, which was, I admit, based on JavaScript tutorials circa 2012. The Python I learned in the past (Learn Python the Hard Way!) and this time are coming together. But it’s still hard for me to work the other way around. You know, like when you learn another language and feel confident but then have to translate from your native language into the foreign one? God, it would be great if people just walked around speaking Python instead of English or Spanish. I’d pick it up really fast!

Instead things are slowly coming together after an intense review. I’ve always been a good troubleshooter. Now I’m so good at it I have an inside joke that I should rename this blog “Finding the Cuberoot.”


Posted in Hitting your head against the wall

Changing the world with data journalism

In January 2014, Kenyan broadcaster NTV aired a 12-minute, data-driven video about the impact of drought in Turkana, an impoverished, isolated region of northern Kenya.

The piece combined personal stories with government data to show the impact of intense and frequent drought-related famines on childhood malnutrition. The story had a measurable impact. 

However, as Internews President Jeanne Bourgault wrote from the World Economic Forum in Davos: “Open data has the potential to drive positive social change, and data-journalists can be key to creating the stories that bring that data to life. But getting there is not easy.”

It’s a thoughtful and unusually (for journalism) level-headed discussion about data journalism.

Posted in Data-Driven Journalism, Journalism, Open data

Thinkful-Girl Develop It Data Science in Python scholarship!!!

It happened! I received a Thinkful-Girl Develop It scholarship for a Data Science in Python course. So things are about to get very busy until mid-May. But this is the kind of chance I’ve been waiting for to get past just scratching the surface of data journalism.


I’ve wanted to dig deeper. But there has been a big thick wall standing between me and anything more complicated than Excel, even though I am no stranger to computers, apps and coding. I taught myself HTML years ago (when we were called “webmasters”). I even started a JavaScript study group a few years ago at Sudo Room in Oakland. I also teach data journalism techniques and founded a civic app project, Civic Playground, which produced several prize-winning apps. I teach all kinds of digital journalism skills at San Francisco State University. I do a monthly data-driven column for a magazine and rely on all sorts of datasets for my regular and investigative reporting.

But that wall has not budged, at least not with just me pushing on it. Now, beginning on Wednesday, I’ll be tackling SQLite, APIs and scraping, as well as wrestling with probability, hypothesis testing and linear regression.

I will admit that just the words linear regression freak me out a little bit. But I got a note from my mentor and she (SHE!) is doing exactly what I’m shooting for. And there is no getting around the statistics side.

I have a few goals, some practical and short-term. The first is to write a scraper that will allow me to compare the number of stories written by men versus the number by women about the firing of former New York Times editor Jill Abramson. I want to learn things like how to write a script that will scrape lawsuits filed in county courts and automatically feed them into a Google spreadsheet (this is entirely legal and done by other journalists). And I want to expand my knowledge of Python. The bigger goal is data mining: finding those patterns that would otherwise be hidden. That is what I want for my reporting and what I want to show other women, including my daughters in college, that they can do.

In case you are wondering why a journalist wants to study data science, reporters are waking up to the potential of public data as more and more of it is produced and put online. A subset of reporters have been using technology for their work since the practice was called computer-assisted reporting (CAR) and Excel, SAS and MySQL were the tools of choice. Now data journalism, which CAR has branched into, is becoming way more sophisticated as reporters enter the realm of data science. I think it’s going to make investigative reporting even richer.

I’m not expecting to emerge in three months as a full-fledged data science butterfly although I will certainly be working to get as far out of the chrysalis as possible. But the Thinkful course is a start that includes a real foundation. So Naive Bayes and cluster analysis, here I come!  


Posted in Data-Driven Journalism, Journalism, Open data

Where data and multimedia collide

I started teaching multimedia journalism and digital news gathering at San Francisco State University in late August. I won’t tell you all the lessons I have learned and weaknesses revealed to me and others in the past two months. No use dwelling on these things: fix what can be controlled and do your best to remedy the rest.

The classes are a return for me to the backpack, MOJO days. I started as a reporter with a camera and switched to writing (no use calling it “print” journalism anymore) because pen and paper were easier for me. I’m not an artist and fail at lighting to the degree that my best advice usually boils down to: don’t shoot between noon and 2 p.m., turn off the ugly lamps and put your person near a window in low-light situations. However, I love shooting video and audio and love editing them even more.

But I keep having to pull myself back from treading straight into data journalism because infographics and data visualization don’t seem to be counted as multimedia. I’m still trying to pin down exactly what multimedia is, or could be, now.

It’s often defined as video, audio and photo + text. Maybe transmedia and interactive games. But for most people it’s the latter.

So I wrote this post on the class blog I keep (it mirrors blogs the students in my classes keep as well at my insistence). I’m planning to teach them video animation, TimelineJS, Meograph and simple Google Fusion maps, and we’ll build a Bootstrap site or use a Tarbell template.

Why do the two worlds of data and multimedia feel separate and isolated? Why don’t I hear much cross-platform discussion? Maybe I have to listen closer.


Posted in Data-Driven Journalism, Journalism, Uncategorized

Migrahack at CSUN

Just got back last night from the Migrahack at Cal State Northridge. I’ll post photos and more but wanted to get the links up here from my presentations. I got a chance to sit in on Ron Campbell’s Census workshop because I finished mine early. Campbell was one of my early data journalism heroes. Sure enough, after just 5 minutes of his workshop I walked away with IPUMS and Census Reporter. The NAHJ student journalists gave me inspiration and a shot of determination. Anyway, I’ll be updating my presentations in the next few weeks with more Tableau, BLS, Census Reporter and IPUMS, and I’ll post the information here when that happens.



Posted in Uncategorized

Los Angeles County is full of Germans…or?

I’m here somewhere near Burbank getting ready for Migrahack at Cal State University at Northridge, or, as I have heard people here call it, CSUN (as in see-sun). Anyway, I’ve been playing around with some Census figures: the amount of time men and women spend commuting to work, or how early men leave home for work every day compared to women. I stumbled across the ancestry table. Who knew there were so many people in L.A. County who claim German ancestry: 495,046. Not a lot when you consider there are nearly 10 million people here. But it’s No. 1. EXCEPT for the 7 million people shoved into “other groups.” You tell me which group is missing in the table below. I am not sure why the Census puts the information together like this. It’s an example for the class of what you can find on the Census, or in this case what’s sometimes missing.
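To put those numbers in proportion, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above, and it assumes a rounded county total of 10 million and a rounded 7 million for “other groups,” so treat the results as approximations rather than official Census shares. The `share` helper is just a name made up for this illustration.

```python
# Figures quoted in the post. The totals are rounded, so the
# percentages below are rough estimates, not official Census shares.
german_ancestry = 495_046        # people counted under German ancestry
other_groups = 7_000_000         # assumption: rounded "other groups" count
county_population = 10_000_000   # assumption: "nearly 10 million" rounded up


def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


german_pct = share(german_ancestry, county_population)
other_pct = share(other_groups, county_population)

print(f"German ancestry: {german_pct:.1f}% of the county")
print(f"'Other groups': {other_pct:.1f}% of the county")
```

Under those assumptions, German ancestry works out to roughly 5 percent of the county, while the catch-all “other groups” bucket is about 70 percent. That gap is the whole story: the No. 1 named group is tiny next to the one the table doesn’t break down.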

Posted in Data-Driven Journalism, Journalism, Open data, scrapers

School of Data: Evidence is Power

I was looking for a new AngelaWoodall.com blog theme and found one that happened to have this video in the example, with journalists from around the world, from villages to major cities, eager to use data for transparency and accountability. It took me about a minute to choose the template after watching the video (I took it as a sign), although I am still not sure where they got “Wrangle the Newsfloor.”
(Just as an aside, this is the first time I paid for a template, so it might be a coincidence, but the quality and the instructions included in the deal were worth the money. I’m just putting it out there for any of you who have hit your head against the wall with WordPress. There’s a reason why Tech Liminal has WordPress Support Group meetings.)

Posted in Data-Driven Journalism, Gov 2.0, Hackers, Hitting your head against the wall, Journalism, Journo-Apps, news apps, Open data, scrapers

Data Diving How-to

The timing was unbelievable: My rocket-fast Internet fizzled on a Friday and by Sunday had crashed completely. Four of six neighbors also lost all connection. One of them is a Pandora project manager. I have no idea what the other ones do but for five days we were offline. And I was trying to get ready to teach an Institute for Justice and Journalism storytelling with data workshop on Feb. 28.
Which is really the point of this post. I won’t drag out the details of the Internet drama except to say it took an incredible amount of yelling and Twitter to get online again. The statement, “I will write a letter to every senator on the Comcast-Time Warner merger oversight committee starting with Sen. Leahy!” was involved.
Anyway, here is the presentation with links and lessons for the data-diving and data viz sessions, which did not put a single person to sleep! I started with the most basic searches and spreadsheets and worked up to scrapers (speaking of which, there’s a scraper workshop coming up soon that I’ll send word about). I’ll be updating the presentation with California Secretary of State campaign contribution data as well as how to background a nonprofit or charity. For now this is a great start. Looks like I may be in Northridge for a workshop in May and at SF State next semester.

Until then, happy hunting!

Posted in Data-Driven Journalism, Hackathon, Journalism, Open data