I’m experimenting with visualizing Census immigration data. Here it is displayed using a Google chart.
I’ll be leading two workshops at the upcoming HandsOn Tech and Institute for Justice and Journalism’s Migrahack in San Jose Feb. 28: “Engaging & Insightful Storytelling with Numbers.” Migrahack’s unstoppable Claudia Nunez organized a roster of workshops taught by fabulous people. Here are the details; below are the workshops I’m handling.
9:00 a.m.-10:00 a.m.
Intro to DataViz: A guide to designing with data and open software. Learn about free tools available to make your reports stand out with data.
1:30 p.m. – 3:30 p.m.
The basics of data diving: Where to find the data you want and what to do with it: We will use an example to walk through the basic steps of a project using immigration data, then wrap up with a real-life “hit your head against the wall” example and talk about where to go for help, both technical and with the data itself.
Reading this guide about data scraping from Natalia Rodriguez at Open Company. Intrigued by the use of “should know” instead of the usual “needs to know.” Anyway…it looks good.
In just a few months the world of data scrapers has changed – for the better. At least if you consider point-and-click instead of bang-your-head-against-the-wall an improvement. But there are limits to the technology and I just found one of them.
ScraperWiki, which I consider to be an early innovator of scrapers, has made it incredibly easy to extract data from the web and PDFs. They started testing the new point-and-click tools a few months ago. I finally had a go a few weeks ago (holidays and all) and it worked seamlessly on already structured data from Data.gov. No coding, no waiting around.
I’m writing a book about the struggle to save a family ranch in Sonoma County, so I have been looking at farmers markets and other ag data. Today I was trying to build a list of farmers markets within a 100-mile radius from California Federation of Farmers Markets search results. I copied and pasted the column of market cities into a Google spreadsheet, sorted them alphabetically and started numbering them. There were gaps between the rows, so it was tedious. Then it dawned on me: Why am I cutting and pasting when I could just grab the results with ScraperWiki? However, what I got back from the California Federation was not pretty. The web is still a little wild out there because people throw together all kinds of stuff that machines can’t read. That’s one thing the open source community is campaigning to get local governments to understand: make data machine readable so you don’t get sludge in your beautiful machine. (And I’ll finally have more on Import.io soon. They’re getting more streamlined too. Good news for journalism!)
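For anyone curious what “grabbing the results” looks like under the hood, here is a minimal sketch using only Python’s standard library. The HTML snippet and city names are invented stand-ins for the California Federation’s results page; a point-and-click tool like ScraperWiki does this part for you.

```python
# Extract a column of cities from an HTML table, dedupe, and alphabetize —
# the same chore as the copy-paste-and-sort spreadsheet routine.
from html.parser import HTMLParser

# Made-up stand-in for a search results page.
SAMPLE_HTML = """
<table>
  <tr><td>Santa Rosa</td></tr>
  <tr><td>Petaluma</td></tr>
  <tr><td>Sebastopol</td></tr>
  <tr><td>Petaluma</td></tr>
</table>
"""

class CityExtractor(HTMLParser):
    """Collect the text of every <td> cell."""
    def __init__(self):
        super().__init__()
        self._in_td = False
        self.cities = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self.cities.append(data.strip())

parser = CityExtractor()
parser.feed(SAMPLE_HTML)
unique_sorted = sorted(set(parser.cities))  # dedupe and alphabetize in one step
print(unique_sorted)
```

No gaps between rows, no hand numbering: the dedupe and sort happen in one line once the cells are extracted.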
Civic Playground started as an in-house project while I was a reporter at the Oakland Tribune, which is part of the Bay Area News Group and Digital First Media. It was the CEO of the whole umbrella group, Digital First Media, who decided, in the spirit of tech start-ups, to throw some ideas at the wall and see if they stuck. The project as a whole was called ideaLab. I just heard that ideaLab was not one of the ideas that stuck, as one of the fellows describes here. I love it that DFM was willing to try, although there was a lot of grumbling about how the money and gear we received as fellows could have paid for salaries and raises in the newsrooms. Since I left, staffs have shrunk more than I imagined they could while still keeping a paper filled with news.
As you can tell, Civic Playground is independent from ideaLab and instead of focusing on apps I’m now working with newsrooms and reporters. I have been feeling the air start to leak out of the balloon that got inflated around technology in 2011 and 2012. Feels like a bit of reality has set in.
We were supposed to host a scraper workshop at Hacks/Hackers and SPJ already but the holiday schedules involved were too much. So look for them in 2014. In the meantime, I am testing ScraperWiki’s new interface and I finally got to know Import.io better. I want to first try their Udemy.com how-to session.
In the meantime, happy New Year, 2014.
The zeitgeist of the mobile era (era here meaning the past and coming few years) might be “Currency is Time.” The line came from the San Francisco head of News Republic, a mobile news aggregation app from Mobiles Republic. He was demoing last night at the ONA-organized “What users want from mobile news.” The app is not about excellence in news or saving journalism or anything so squishy. Mobiles Republic is a tech company using news to make a buck. But the hope among “content providers” is that News Republic will bring more readers to their stories. If it’s successful, I hope news companies give reporters a cut.
In any case, News Republic is moving toward journalism somewhat untethered from its origins. Circa takes it a step further and, depending on how you look at it, the app could be considered brilliant or terrifying. I vote for the category of “intriguing but unsettling.” The mobile-native app, still in the angel investor stage, chops up news into facts, quotes, stats, images and videos. It destroys the idea of the article by stringing bits together to tell a story. But that may make for a more pleasurable experience on mobile.
It’s intriguing and tempting, and since starting a heavily data-driven project in August I have been wondering whether it is okay to abandon narrative sometimes. No story, no infographics and no data visualization.
How many times have you read to the third paragraph and trailed off because, my god, the article just kept going on and on? I thought about instances in which economic data could be reported straight out. For example, politicians in Louisiana want to stop a lawsuit against the oil and gas companies because, they say, the industry drives employment and the local economy. But it’s not true. Pages about the conflict have filled the papers, and some of the national coverage has been inaccurate and lacks data to back up the politicians’ claims. I would list the claims, the data and the source of the data in a table with links to the original sources and the spreadsheets the reporter is using. The digital director of an online site said his team helps journalists get readers to engage with their stories (he’s from a town that comments a lot on the local reporting, unlike the Bay Area). But sometimes, he said, he has to ask whether a story that gets ignored was worth reading in the first place. Maybe the information is important but will turn out better in raw form as opposed to a story. That’s true for mobile, anyway, according to serial news innovator and Circa news director David Cohn, who I am fond of because he showed up at News Hack SF a couple years ago talking about unicorns and rainbows to explain the latest start-up he was working on.
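As a sketch of that claims-versus-data table, here is one way a reporter might generate it. Every claim, figure and URL below is an invented placeholder, not actual Louisiana data.

```python
# Build an HTML table pairing each claim with the data that checks it
# and a link back to the original source. All rows here are placeholders.
rows = [
    {"claim": "The industry drives local employment",
     "data": "Oil and gas share of parish jobs (placeholder figure)",
     "source": "https://example.com/employment-data"},
    {"claim": "The lawsuit threatens the local economy",
     "data": "Parish GDP by sector (placeholder figure)",
     "source": "https://example.com/gdp-data"},
]

def to_html_table(rows):
    header = "<tr><th>Claim</th><th>Data</th><th>Source</th></tr>"
    body = "".join(
        f"<tr><td>{r['claim']}</td><td>{r['data']}</td>"
        f"<td><a href='{r['source']}'>link</a></td></tr>"
        for r in rows
    )
    return f"<table>{header}{body}</table>"

print(to_html_table(rows))
```

The point of the format is that every cell is checkable: a reader can follow each link instead of taking a paragraph’s word for it.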
So yes “atomizing” the news is intriguing. But unsettling: What are the consequences?
It’s worse than you thought. That should have been the title of the talk Journalists’ Rights at Protests, hosted last week by the San Francisco Bay Area Journalists with Committee to Protect Journalists internet advocate Geoffrey King and journalist Ali Winston. King covered some much-needed digital security for journalists, much of which is in this CPJ guide. Hopefully SFBAJ will post more details from the discussion as well.
On the way home I made a to-do list for myself starting with clearing cookies, cache and so on. That may explain the results of an experiment that was part of a class I’m taking, “Understanding the Media by Understanding Google.” Students were supposed to log the advertisements that popped up with search results on 20 sites. We had been reading about Google’s use of data to fuel the $50 billion in revenues it collected last year. All but 5 percent comes from advertising, and it all hinges on our data, even though Google has been wildly successful in maintaining its start-up, “don’t be evil” veneer that lets us forget the whole arrangement.
I’ve written about surveillance, but for me there were two important discoveries in this experiment. 1. How much effort advertisers and Google put into making ads appear to be regular content. 2. The opacity of sites that appear to be information hubs but whose business model is identical to Google’s: profiting from the data they collect. See Goodreads and FixYa.com for examples.
Change.org sells online petitions instead of ads but the model is the very same. I found the site after noticing a fake poll about ACA on the home page of a news organization paid for by a front group Coalition to Protect America’s Health Care. Google returned a link to Change.org when I searched for the group’s name. I opened the link and scrolled down to the bottom to check for a link to information about advertising and sure enough there it was. Then I looked at the petitions listed under the coalition and not surprisingly found they were “sponsored” by the group, meaning paid for. I’m not sure yet whose money is behind the coalition but they appear to be pro-ACA. (That can be tricky though because the insurance industry was pro-ACA publicly even while using front groups to attack it privately. It pays to check carefully.) Even MoveOn.org gathers data about us and advertises on third party sites but is a different beast than Change.org. The latter is a business, the former a non-profit.
All this led me, via tiny icons on some ads labeled “AdChoices,” to Evidon, a service used by many sites where one can opt out of data-targeted ads. Google and many other companies warn against opting out of allowing data to be gathered. In other words, you trade the convenience of personalized search for control. That’s true to some extent. But I really don’t care about finely honed advertising delivered almost magically to me. That is just as well, because many of the ads I logged seemed irrelevant to me, from luxury Jaguar models to cheap flights to Birmingham, Ala., and, from the New York Times, a questionable dietary supplement for people suffering from Alzheimer’s disease. I am not sure they would have been any better if I hadn’t sanitized my browser data.
But from the Washington Post I got running shoes and Bay Area Chevrolet. The shoes could be appropriate, although the ad seemed quite random, and the car ad was based on geography and nothing else. Even though they were off target, those ads were more understandable than solar panels and gold coins. The closest anyone got was an ad for a Python GUI app builder generated from an email, but it was very creepy because of the clear indication Google was scanning my email (it’s one thing to know it, another to see the proof). In contrast, news organizations were absent, which is strange given my profession and interests. The weirdest ad was for beef ravioli by Chef Boyardee on Yahoo. It seems completely bizarre, except that several years ago I looked up the history of the company because the real chef Boiardi was from Piacenza, where my family lived in Italy in the 1950s, when we were considering visiting their old home.
That was even creepier than the email peeping.
At first it occurred to me that we have more rights to keep our digital content from the authorities than we do Google and the companies soaking up data about us. But then I remembered their privacy policies and that they are providing the digital content to the authorities with our consent.
This is so exciting:
In the latest iteration of ScraperWiki we’ve kept the “code in your browser” facility for people who want to code or learn to code. However we’ve also identified that many journalists don’t want to code, and we are therefore making end user tools that make it easier for them to get data without coding.
If you’re a Journalist and an existing ScraperWiki account holder please email firstname.lastname@example.org with the subject journalist [your full name] and ask us to upgrade you.
Here I am trying out Many Eyes, an incredibly easy visualization app from IBM, recommended by Sisi Wei, news applications developer at ProPublica, via an online data journalism class. NOTE: The first obstacle is that it may not display on Chrome or Firefox, or just about any other browser but Internet Explorer (I hate having regressed to ’90s-era browser compatibility!).
You can see some other obvious problems with the visualization, but Many Eyes is great for testing out ideas. At least for me, working my way toward complexity makes learning new skills easier. For some, data viz and infographics come easy. Not for me: both require not only reporting and editing but also learning how to find data, new tools (technical know-how) and good design principles. There are many so-so or even terrible examples online, but if you do data viz well, you are at once a journalist and an artist.
Learning the apps and software is a goal in itself. For example, I was ready to pounce on an Illustrator tutorial suggested by Wei. It took me five hours to download Illustrator via my Adobe Creative Cloud subscription because I had to first download the Adobe suite, and my machine kept going to sleep, cutting the Internet connection. I changed the settings, but it still took a long time to download Illustrator. All is well now. I had signed up for the special discount months ago but never downloaded any of the software because I was so busy with other projects.
THEN, with Many Eyes, I went through Chrome and Firefox before I had to use IE to run the visualization, because I couldn’t get the Java extension needed to run it to install on the other two. If you got a broken image above, you know what I mean.
See, there’s always, always something. This weekend everyone with a Windows machine had to install VirtualBox and run Ubuntu to use Scrapy during a data scraper workshop. Like I said, there is always something. We had mentors at the workshop, but most of the journalists I know are even less familiar with their computers than I am and have few sources to rely on. So installing a Java extension might not seem like a big deal, but when someone can’t get an app to work even after installing it, he or she might ditch the whole thing. (Don’t give up! Whatever problem you have, just type the error message into Google, or whatever search engine you use, and read the advice that has been posted online. Someone is sure to have had the same problem you did! Fixing it might take a little time, but you’ll figure it out and feel super awesome because you did!)
Anyway, here is a plain bar chart from the same data made using Google Charts. The question is: which is more effective? I should add that for some reason it’s not displaying the “all jobs” category, which is why the measure tops out at 15000K. I also want to figure out how to display percentages of employment along with the raw numbers in the pop-up boxes. Work in progress.
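Computing a percentage-of-total column is the first step toward those richer pop-up boxes, whatever charting tool renders them. Here is a minimal sketch; the job categories and counts are invented placeholders, not the Census figures behind the chart.

```python
# Pair each category's raw count with its share of the total, ready to
# feed into a chart tooltip. Categories and counts are made up.
jobs = {"Farming": 12000, "Manufacturing": 45000, "Services": 93000}

total = sum(jobs.values())
with_pct = {
    category: (count, round(100 * count / total, 1))
    for category, count in jobs.items()
}

for category, (count, pct) in with_pct.items():
    print(f"{category}: {count:,} ({pct}%)")
```

Each tooltip then shows both the count and its share, so readers don’t have to do the division in their heads.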
Comments can seem to be a force of nature that, like weeds, elude curation. Except for crime, sports and immigration, you never know what is going to touch a nerve in Bay Area newspaper stories.
I still think some of that is true but last week’s Meetup at Hacks/Hackers – Annotation + future of engagement: An audience that writes more than you – introduced me to the theory of engagement and community. The take-home message: We can improve our engagement with readers, in particular through reporter participation in commenting, as well as encouraging people to identify themselves instead of posting anonymously. We also are seeing start-ups like Disqus and Hypothes.is use algorithms to curate content and influence community building. Curation by algorithm! Using machine learning to influence discourse! Yes, that’s not totally new but the sophistication and nuance are. That said, there is only so much machines can do, Disqus product chief Sam Parker said. I asked him what theory drives the company’s thinking about engagement and he directed me to Lawrence Lessig and the Disqus blog.
Then this week I came across a story about how our brain works in, of all places, the Southwest Airlines inflight magazine. I’ve already written about the potential gamification and immersive journalism have for engaging audiences.
The important thing about virtual reality isn’t that people see a dramatic 3-D panorama but that “you yourself change,” Jaron Lanier, who coined the term virtual reality, said during a 2011 online interview. “You experience yourself in a different way than you ever have before.”
Or, as Nonny de la Peña, with whom I had the pleasure of designing a game during ONA 2012, put it: “Both sides of the brain light up.”
The Southwest article, written by Jennifer Miller, helped me answer why it works! For one, our brain likes new stuff but, according to the article, it “gets high on participation.” When we experience pleasure the brain releases dopamine, and the brain likes to remember things that release dopamine.
Video games may be addictive because they force players into an active state of decision making, fueling the constant release of dopamine. AND the strategic thinking turned on during video games and other participatory activities encourages players to think about what they can do with the information they’re presented and how they can use it in the future.
Moreover, for the dopamine reward system to work, feedback needs to be immediate, like video games that let you know right away if you’re succeeding. In other words, we like a challenge and, even better, the gratification that comes from the points we accrue by being successful.
So how can we design experiences that reward our readers? Not just engage, but reward them? I am convinced games, immersive journalism and self-actualization practices can play a part. On another level, it raises the question of why certain stories get a conversation going and what can be done to encourage engagement around stories. How do you make them haptic, so to speak, so that people “feel” the story/information?
Some significant stories, some prize-winning, got very few comments. People were likely to say “no one wrote about that” when in fact numerous stories had been published. The stories just didn’t “speak to them” so they were overlooked, forgotten or poorly understood.
Here are some baby steps in the meantime, from a panel of four women at BlogHer ’13 hosted by Disqus about building community, return visits, and monetization through comments. According to Disqus, the four accomplished bloggers (Danielle Smith, Lindsay Ferrier, Fadra Nally, and Lizz Porter) shared five key takeaways that any blogger can put to use for building a stronger community, which you can read here.
Their top tips:
1. Reply, reply, reply.
“When you’re commenting I’m commenting back. I want you to know I’m listening.” — Danielle Smith
2. Know your top community members.
“Knowing the people that come to your site…is a good way to keep them coming back.” — Fadra Nally
3. Show your community that you care.
“It’s about the connection, engagement — we’re just people doing this together.” — Lizz Porter
4. Know where your audience comes from.
“Once you get hold of that data, it gives insight to your readers, maybe the ones that never even wrote comments.” — Fadra Nally
5. Create great content.
“Create great content. Put your heart and soul into it and people will respond.” — Lindsay Ferrier
Lastly, here is an app, Refine — Better Commenting, which organizes comments by key terms so readers can easily view comments on sub-topics of interest to them. That helps sort discussion threads with hundreds, sometimes thousands, of comments, often on political or news sites, according to the Knight Lab team that developed the app.
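To make the idea concrete, here is a toy sketch of grouping comments by key term. The comments and terms are invented, and this is a naive substring match, not Refine’s actual method.

```python
# Bucket comments under each key term they mention, so readers can jump
# straight to the sub-topic they care about. All comments are invented.
from collections import defaultdict

comments = [
    "The budget vote ignores the housing crisis.",
    "Housing costs are the real story here.",
    "What about the school budget?",
]
key_terms = ["budget", "housing", "school"]

groups = defaultdict(list)
for comment in comments:
    lowered = comment.lower()
    for term in key_terms:
        if term in lowered:
            groups[term].append(comment)  # a comment can land in several groups

for term in key_terms:
    print(f"{term}: {len(groups[term])} comment(s)")
```

Even this crude version shows why the approach scales: a thread of thousands of comments collapses into a handful of browsable sub-topics.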