School of Data: Evidence is Power

I was looking for a new blog theme and found one that happened to have this video in the example, with journalists from around the world, from villages to major cities, eager to use data for transparency and accountability. It took me about a minute to choose the template after watching the video (I took it as a sign), although I am still not sure where they got “Wrangle the Newsfloor.”
(Just as an aside, this is the first time I paid for a template so it might be a coincidence but the money was worth the quality and instructions included in the deal. I’m just putting it out there for any of you who have hit your head against the wall with WordPress. There’s a reason why Tech Liminal has WordPress Support Group meetings.)

Posted in Data-Driven Journalism, Gov 2.0, Hackers, Hitting your head against the wall, Journalism, Journo-Apps, news apps, Open data, scrapers

Data Diving How-to

The timing was unbelievable: My rocket-fast Internet fizzled on a Friday and by Sunday had crashed completely. Four of six neighbors also lost all connection. One of them is a Pandora project manager. I have no idea what the other ones do but for five days we were offline. And I was trying to get ready to teach an Institute for Justice and Journalism storytelling with data workshop on Feb. 28.
Which is really the point of this post. I won’t drag out the details of the Internet drama except to say it took an incredible amount of yelling and Twitter to get online again. The statement, “I will write a letter to every senator on the Comcast-Time Warner merger oversight committee starting with Sen. Leahy!” was involved.
Anyway, here is the presentation with links and lessons for the data-diving and data viz sessions, which did not put a single person to sleep! I started with the most basic searches and spreadsheets and worked up to scrapers (speaking of which, there’s a scraper workshop coming up soon that I’ll send word about). I’ll be updating the presentation with California Secretary of State campaign contribution data as well as how to background a nonprofit or charity. For now this is a great start. Looks like I may be in Northridge for a workshop in May and at SF State next semester.

Until then, happy hunting!

Posted in Data-Driven Journalism, Hackathon, Journalism, Open data

Migrahack workshop Feb. 28

I’ll be leading two workshops at the upcoming HandsOn Tech and Institute for Justice and Journalism’s Migrahack in San Jose Feb. 28: “Engaging & Insightful Storytelling with Numbers.” Migrahack’s unstoppable Claudia Nunez organized a roster of workshops taught by fabulous people. Here are the details; below are the workshops I am handling.

9:00 a.m.-10:00 a.m.

Intro to DataViz: A guide to designing with data and open software. Learn about free tools available to make your reports stand out with data.

1:30 p.m.-3:30 p.m.

The basics of data diving: Where to find the data you want and what to do with it. We will use an example to walk through the basic steps in a project using immigration data, then wrap up with a “real-life hit your head against the wall” example and talk about where to go for help, both technical and with the data.

Posted in Data-Driven Journalism, Hackathon, Journalism, Journo-Apps, Open data, scrapers

Scrapers: ScraperWiki

In just a few months the world of data scrapers has changed – for the better. At least if you consider point-and-click instead of bang-your-head-against-the-wall an improvement. But there are limits to the technology and I just found one of them.

ScraperWiki, which I consider to be an early innovator of scrapers, has made it incredibly easy to extract data from the web and PDFs. They started testing it a few months ago. I finally had a go a few weeks ago (holidays and all) and it worked seamlessly on already structured data. No coding, no waiting around.

I’m writing a book about the struggle to save a family ranch in Sonoma County so I was looking at farmers markets and other ag data. Today I was trying to make a list of farmers markets within a 100-mile radius from California Federation of Farmers Markets search results. I copied the column of market cities, pasted it into a Google spreadsheet, sorted the cities alphabetically and started numbering them. There were gaps between the rows so it was tedious. Then it dawned on me: Why am I cutting and pasting when I could just grab the results with ScraperWiki? However, what I got back from the California Federation was not pretty. The web is still a little wild out there because people throw together all kinds of stuff that machines can’t read. That’s one of the campaigns the open source community is trying to get local governments to realize – make data machine readable so you don’t get sludge in your beautiful machine. (And finally I’ll have up soon. They’re getting more streamlined too. Good news for journalism!)
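For anyone stuck doing the same copy-and-paste chore, here is a minimal sketch in Python of the cleanup I was doing by hand: dropping the blank rows, sorting the cities alphabetically and numbering them. The function names and the sample cities are made up for illustration; this is not what ScraperWiki does under the hood, just one way to tidy a pasted column.

```python
def clean_city_column(pasted_text):
    """Drop blank rows, trim stray whitespace, and sort alphabetically.

    Duplicates are kept on purpose: one city can host several markets.
    """
    cities = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    return sorted(cities, key=str.casefold)


def number_rows(cities):
    """Pair each city with a row number starting at 1."""
    return list(enumerate(cities, start=1))


# Hypothetical sample of a pasted column, gaps and all.
pasted = """Santa Rosa

Petaluma

  Sebastopol
Santa Rosa

Healdsburg
"""

for number, city in number_rows(clean_city_column(pasted)):
    print(number, city)
```

The cleaned list can be pasted straight back into a spreadsheet, which is about as far as this trick goes; it does nothing for a page whose markup is a mess to begin with.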


Posted in Data-Driven Journalism, Gov 2.0, Hitting your head against the wall, Journo-Apps, news apps, Open data, scrapers

Digital First, the end

Civic Playground started as an in-house project while I was a reporter at the Oakland Tribune, which is part of the Bay Area News Group and Digital First Media. It was the CEO of the whole umbrella group, Digital First Media, who decided, in the spirit of tech start-ups, to throw some ideas at the wall and see if they stuck. The project as a whole was called ideaLab. I just heard that ideaLab was not one of the ideas that stuck, as one of the fellows describes here. I love it that DFM was willing to try, although there was a lot of grumbling about how the money and gear we received as fellows could have paid for salaries and raises in the newsrooms. Since I left, staffs have shrunk more than I imagined they could while still keeping a paper filled with news.

As you can tell, Civic Playground is independent from ideaLab and instead of focusing on apps I’m now working with newsrooms and reporters. I have been feeling the air start to leak out of the balloon that got inflated around technology in 2011 and 2012. Feels like a bit of reality has set in. 

We were supposed to host a scraper workshop at Hacks/Hackers and SPJ already but the holiday schedules involved were too much. So look for them in 2014. In the meantime, I am testing ScraperWiki’s new interface and I finally got to know better. I want to first try their how-to session

In the meantime, happy New Year, 2014. 

Posted in Data-Driven Journalism, Gov 2.0, Hackers, Journalism, Journo-Apps, news apps, Open data, scrapers

Are these mobile news apps scary or brilliant or just the future?

The zeitgeist of the mobile era (era here meaning the past and coming few years) might be “Currency is Time.” The line came from the San Francisco head of News Republic, a mobile news aggregation app from Mobiles Republic. He was demoing last night at the ONA-organized “What users want from mobile news.” The app is not about excellence in news or saving journalism or anything so squishy. Mobiles Republic is a tech company using news to make a buck. But the hope among “content providers” is that News Republic will bring more readers to their stories. If it’s successful, I hope news companies give reporters a cut.

In any case, News Republic is moving toward journalism somewhat untethered from its origins. Circa takes it a step further and, depending on how you look at it, the app could be considered brilliant or terrifying. I vote for the category of “intriguing but unsettling.” The mobile-native app, still in the angel investor stage, chops up news into facts, quotes, stats, images, videos. It destroys the idea of the article by stringing bits together to tell a story. But that may make for the more pleasurable experience on mobile.

It’s intriguing and tempting and I have been wondering since I started a heavily data-driven project in August about whether it is okay to abandon narrative sometimes. No story, no infographics and no data visualization.

How many times have you read to the third graph and trailed off because, my god, the article just kept going on and on? I thought about instances in which economic data could be reported straight out. For example, politicians in Louisiana want to stop a lawsuit against the oil and gas companies because, they say, the industry drives employment and the local economy. But it’s not true. Pages about the conflict have filled the papers, and some of the national ones have been inaccurate and lack data to back up the claims by the politicians. I would list the claims, the data and the source of the data in a table with links to the original sources and the spreadsheets the reporter is using. The digital director of an online site said his team helps journalists get readers to engage with their stories (he’s from a town that comments a lot on the local reporting, unlike the Bay Area). But sometimes, he said, he has to ask if the story is worth reading when it’s ignored. Maybe the information is important but will turn out better in raw form as opposed to a story. That’s true for mobile, anyway, according to serial news innovator and Circa news director David Cohn, who I am fond of because he showed up at News Hack SF a couple years ago talking about unicorns and rainbows to explain the latest start-up he was working on.
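To make the claims table idea a little more concrete, here is a rough sketch in Python of how a reporter might structure it: one row per claim, with the data, the source and links back to the originals, written out as CSV so it can go into any spreadsheet. The `ClaimRow` name, the column names and the example.org links are all my own placeholders, and the sample row deliberately contains no real figures.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class ClaimRow:
    claim: str            # what the politician said
    data: str             # what the numbers show
    source: str           # who published the numbers
    source_url: str       # link to the original source
    spreadsheet_url: str  # link to the reporter's working spreadsheet


def to_csv(rows):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([field.name for field in fields(ClaimRow)])
    for row in rows:
        writer.writerow(astuple(row))
    return buffer.getvalue()


# Placeholder row only; a real table would carry the actual claim and figures.
rows = [
    ClaimRow(
        claim="PLACEHOLDER: claim as quoted",
        data="PLACEHOLDER: figure and year",
        source="PLACEHOLDER: agency or dataset name",
        source_url="https://example.org/original-data",
        spreadsheet_url="https://example.org/reporter-spreadsheet",
    )
]

print(to_csv(rows))
```

Keeping the source and spreadsheet links as their own columns is the whole point: a reader can check each claim against the numbers without wading through a story.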

So yes, “atomizing” the news is intriguing. But unsettling: What are the consequences?

Posted in Data-Driven Journalism, Journalism, Journo-Apps, news apps, Open data, Uncategorized

Digital control: It’s worse than you thought

It’s worse than you thought. That should have been the title of the talk Journalists’ Rights at Protests, hosted last week by the San Francisco Bay Area Journalists with Committee to Protect Journalists internet advocate Geoffrey King and journalist Ali Winston. King covered some much-needed digital security for journalists, much of which is in this CPJ guide. Hopefully SFBAJ will post some more details from the discussion as well.

On the way home I made a to-do list for myself starting with clearing cookies, cache and so on. That may explain the results of an experiment that was part of a class I’m taking, “Understanding the Media by Understanding Google.” Students were supposed to log the advertisements that popped up with search results on 20 sites. We had been reading about Google’s use of data to fuel the $50 billion in annual revenues collected last year. All but 5 percent comes from advertising, and it all hinges on our data even though Google has been wildly successful here in maintaining its start-up, “don’t be evil” veneer that lets us forget the whole arrangement.

I’ve written about surveillance but for me there were two important discoveries in this experiment: 1. How much advertisers and Google were making ads appear to be regular content. 2. The opacity of sites that appear to be information hubs whose business model is identical to Google’s: profiting from the data they collect. See Goodreads and for examples. sells online petitions instead of ads but the model is the same. I found the site after noticing a fake poll about the ACA on the home page of a news organization, paid for by a front group, the Coalition to Protect America’s Health Care. Google returned a link to when I searched for the group’s name. I opened the link and scrolled down to the bottom to check for a link to information about advertising and sure enough there it was. Then I looked at the petitions listed under the coalition and not surprisingly found they were “sponsored” by the group, meaning paid for. I’m not sure yet whose money is behind the coalition but they appear to be pro-ACA. (That can be tricky, though, because the insurance industry was pro-ACA publicly even while using front groups to attack it privately. It pays to check carefully.) Even gathers data about us and advertises on third party sites but is a different beast than The latter is a business, the former a non-profit.

All this led me to Evidon, via tiny icons on some ads labeled “AdChoice”, a service used by many sites where one can opt out of data-targeted ads. Google and many other companies warn about opting out of allowing data to be gathered. In other words, you trade convenience of personalized search for control. It’s true to some extent. But I really don’t care about finely honed advertising delivered almost magically to me. That is good because many of the ads I logged seemed irrelevant to me, from luxury Jaguar models to cheap flights to Birmingham, Ala., and, from the New York Times, a questionable dietary supplement for people suffering from Alzheimer’s disease. I am not sure they would be any better if I hadn’t sanitized my browser data. 

But from the Washington Post I got running shoes and Bay Area Chevrolet. The shoes could be appropriate, although the ad seemed quite random, and the car ad was based on geography and nothing else. Even though they were off target, those ads were more understandable than solar panels and gold coins. The closest anyone got was an ad for a Python GUI app builder generated from an email, but it was very creepy because of the clear indication Google was scanning my email (it’s one thing to know, another to see the proof). In contrast, news organizations were absent, which is strange given my profession and interests. The weirdest ad was for beef ravioli by Chef Boyardee on Yahoo. It seems completely bizarre, except that several years ago, when we were considering visiting my family’s old home, I looked up the history of the company because the real chef, Boiardi, was from Piacenza, where my family lived in Italy in the 1950s.

That was even creepier than the email peeping. 

At first it occurred to me that we have more rights to keep our digital content from the authorities than we do from Google and the companies soaking up data about us. But then I remembered their privacy policies and that they are providing the digital content to the authorities with our consent.

Posted in Uncategorized

Journalists: ScraperWiki made even easier!

This is so exciting:
In the latest iteration of ScraperWiki we’ve kept the “code in your browser” facility for people who want to code or learn to code. However we’ve also identified that many journalists don’t want to code, and we are therefore making end user tools that make it easier for them to get data without coding.
If you’re a Journalist and an existing ScraperWiki account holder please email with the subject journalist [your full name] and ask us to upgrade you…more.

Posted in Uncategorized