It happened! I received a Thinkful–Girl Develop It scholarship for a Data Science in Python course. So things are about to get very busy until mid-May. But this is the kind of chance I’ve been waiting for to get past just scratching the surface of data journalism.
Until now I’ve hit a wall trying to go deeper on my own, and that wall has not budged, at least not with just me pushing on it. Now, beginning on Wednesday, I’ll be tackling SQLite, APIs and scraping, as well as wrestling with probability, hypothesis testing and linear regression.
I will admit that the words “linear regression” alone freak me out a little. But I got a note from my mentor, and she (SHE!) is doing exactly what I’m shooting for. And there is no getting around the statistics side.
I have a few goals, some practical and short-term. The first is to write a scraper that will let me compare the number of stories written by men with the number written by women about the firing of former New York Times editor Jill Abramson. I want to learn things like how to write a scraper that pulls lawsuits filed in county courts and automatically feeds them into a Google spreadsheet (this is entirely legal and done by other journalists). And I want to expand my knowledge of Python. The bigger goal is data mining: finding the patterns that would otherwise stay hidden. That is what I want for my reporting, and it is what I want to show other women, including my daughters in college, that they can do.
In case you are wondering why a journalist wants to study data science, reporters are waking up to the potential of public data as more and more of it is produced and put online. A subset of reporters have been using technology in their work since the practice was called computer-assisted reporting (CAR) and Excel, SAS and MySQL were the tools of choice. Now data journalism, which is what CAR has branched into, is becoming far more sophisticated as reporters enter the realm of data science. I think it’s going to make investigative reporting even richer.
I’m not expecting to emerge in three months as a full-fledged data science butterfly, although I will certainly be working to get as far out of the chrysalis as possible. But the Thinkful course is a start, and one that lays a real foundation. So Naive Bayes and cluster analysis, here I come!