July 22nd, 2008 — API, data, news, programming
In May announced its intention to build an Application Programming Interface for its data. MediaBistro quoted Aron Pilhofer:
The goal, according to Aron Pilhofer, editor of interactive news, is to “make the NYT programmable. Everything we produce should be organized data.”
More details, if they can be called that:
Once the API is complete, the Times’ internal developers will use it to build platforms to organize all the structured data such as events listings, restaurants reviews, recipes, etc. They will offer a key to programmers, developers and others who are interested in mashing-up various data sets on the site. “The plan is definitely to open [the code] up,” Frons said. “How far we don’t know.”
I haven’t heard anything since then, although the article mentioned that something would be ready “in a matter of weeks.”
Today I spent some time reading the API documentation for National Public Radio.
That’s right, NPR has an API. (mmm, I love my alphabet soup.)
NPR’s API provides a flexible, powerful way to access your favorite NPR content, including audio from most NPR programs dating back to 1995 as well as text, images and other web-only content from NPR and NPR member stations. This archive consists of over 250,000 stories that are grouped into more than 5,000 different aggregations.
You can get results from Topics, Music Genres, Programs, Bios, Music Artists, Columns and Series in XML, RSS, MediaRSS, JSON, and Atom or through HTML and JavaScript widgets.
Now, I’m a bit of an NPR junkie, so I’m thinking of ways to access all this information for my personal use. And I can see how it could be useful as an internal product for NPR.
But how would another news organization use this? Oh wait, they can’t:
The API is for personal, non-commercial use, or for noncommercial online use by a nonprofit corporation which is exempt from federal income taxes under Section 501(c)(3) of the Internal Revenue Code.
This one doesn’t make sense either:
Content from the API must be used for non-promotional, internet-based purposes only. Uses can include desktop gadgets, blog posts and widgets, but must not include e-newsletters.
And way down at the bottom of the page is a huge block of text describing excluded content. Boooo.
Check out these blog posts from Inside NPR.org, where they explain some of their decisions.
I think this was a great first step, but if you’re gonna jump on the bandwagon, make sure you don’t miss and land on the hitch.

Further, really understand what purpose this bandwagon has. If you’re going to free your data, free it! Let people and news organizations use it (always with a link back) for all kinds of crazy things. Remember kids, sharing is caring!
June 30th, 2008 — data, internship, Miami Herald, PHP, Website
Well, my first project is live! The Health section of the Miami Herald’s Web site has been redesigned.
My contribution is that slick-looking sidebar on the right. I had some help from Stephanie Rosenblatt for the graphics, and of course she put together the Doctor Sleuth. (They are using Caspio and I have been too busy for training!) The tabs on the results pages are mine though.
There’s some more projects on the table for the Health section, so hopefully I’ll get to be more involved over the next few weeks.
I finished working on a little PHP script today, with Rob Barry’s help, that queries, parses and geocodes some data. Hopefully we’ll have that into the DataSleuth system soon.
June 22nd, 2008 — data, django, flash, internship, Miami Herald
I gave my impressions from the first day or so of work, but a full (sort of) week has given me more time to get acquainted with my new job.
I’ve worked on several projects, thought none of them are quite ready to go live yet. I’ll link to them when they do. But so far the work has been pretty easy and well within my skills. I was surprised at how much Flash I remember, even though I haven’t touched the program in over a year.
I’m also working on a story for next week! I pitched this one myself, and while its nothing big, I’m happy to be writing. My greatest fear is being pigeonholed into the programming room.
I’m supposed to see about some database work in the next week or so, which will be something new to add to my arsenal. I know how databases work and how to work with them, but I’ve never actually built one.
On the side, I’m continuing to work through Django tutorials and plan on buying some books soon. I’m also in the market for a job after my internship is over.
I’ve got a couple of posts coming up that should be more stimulating, but I’ve been too busy to really organize my thoughts yet. Here’s hoping I can get one or two out next week.
June 6th, 2008 — conferences, data, ire, journalism, newspapers
This morning I met with my IRE mentor, Steve Doig, who is a CAR teacher at the University of Arizona. We talked about some of the work I’d done, people in the industry to learn from, and ways to stay on top of projects at different newspapers.
I love mentorship programs because I get a basically captive audience for my pro-online and data visualization ranting. I guess it’s also a networking shortcut.
I spent a frustrating hour and a half tracking down an internet connection so I could clear out the ::gasp:: 1000+ items that have accumulated in Google Reader after 3 days of neglect.
Then I went to a session called Cutting Edge Digital Journalism from Around the World.
The session was led by Rosental Alves, University of Texas; Sandra Crucianelli, Knight Center for Journalism in the Americas; and Fernando Rodriguez, Brazilian Association for Investigative Journalism.
One of the things that surprised me was the idea that in Central/South America, CAR/investigative reporting/databases are viewed as “as a gringo thing.”
Rodriguez showed off a database he worked on of politicians in Brazil, called “25,000 politicians and their personal assets.” Politicians have to submit a certain amount of information in order to run for office, including a listing of assets. It took 2 years to track down all this information because the records were not organized and were available only in hard format. Eventually, the database could provide a view of who the politicians were.
The database was published online and stories were written for the newspaper (Folha) as well. Readers started to call in and report inconsistencies. Other newspapers started to use the database for their own stories.
Crucianelli presented a way to monitor government documents online in 4 different countries. (El Salvador, Panama, Honduras and Nicaragua) All 4 countries had recently changed their access laws for public information.
She found that Panama had the best online access to government documents. El Salvador had the worst access.
At noon, Matt Waite presented PolitiFact. Sexy, sexy Politifact. He gave a tour of all the features of the site as well as showing us a little of the back-end: the Django admin setup.
I followed Matt and Aron to a session with Knight grant winner David Cohn, talking about Spot.Us.
Spot.Us is supposed to be an answer to the question: How will we fund reporting that keeps communities informed?
The answer is based on the premise of citizen journalism. Writing is not the only means of participation.
On Spot.Us, anyone can create a story idea. Reporters can pitch stories based on contributed ideas to their communities. People in the community commit money for pitches. Then the reporters cover the stories. Some of the money goes to pay editors. The stories can be republished for free or published exclusively if the original donor is refunded.
And that’s it for me today. I’ll be in for some afternoon sessions tomorrow.
June 5th, 2008 — Adrian Holovaty, conferences, data, django, ire, python
Today through Sunday I’ll be attending the 2008 IRE Conference in Miami. Today I’m locked in a room with about 10 others being sprayed with the firehose of Django.
I’ve played with Django a bit before, but now we’re getting serious. I’ve got my local Django session running and am poking around while Matt Waite, Aron Pilhofer and Chase Davis break us down and rebuild us in the image of Adrian Holovaty or Derek Willis.
This morning we went over the concepts behind Web frameworks and Django, looked at the code behind a homicide database and set up the local administration page. This afternoon we’ll be going over each type of file necessary to build a Web application in Django.
January 19th, 2008 — class, data, Independent Florida Alligator, journalism, public records, University of Florida
My first assignment for my CAR independent study was to get some data from the Alachua County Health Department.
Professor Armstrong charged me with getting all current salaries, as of Jan. 1, 2008 for nurse practitioners and physician assistants working in the Alachua County Health Department, both full and part time. It took a couple of tries to get someone on the line. Then they asked me to send an e-mail. But in 3 business days, I had the data. Much easier than I thought.
I know all data requests won’t be so easy, but it’s good practice in asking for it. The experience was similar to what I did to get a gas prices map on The Independent Florida Alligator’s Web site: Figure out who has it, find a contact number or e-mail address, and ask.
My next assignment was to decide on a story I wanted to do the data analysis for. I had a lot of trouble with this, because I had to choose something that was timely, accessible, etc.
After going through a bunch of ideas
- location trends for car accidents in the gainesville area. are holidays/game days a factor?
- something about uf sustainability. the website was basically a bunch of press releases, but i bet if i went and asked they could dig me up some data.
- I looked at http://earmarkwatch.org/ and found that all the earmarks for the state of Florida are for defense bills. UF and some other Florida universities were getting some cash too.
- go back to crime or poverty :( i’m trying to avoid these because they seem too obvious/easy.
I finally hit on something:
Given that Crist just put out the budget for public universities and UF is apparently not getting any help, I think that would be a good direction to take. I can compare funding for public universities in Florida and maybe other states, compare growth in attendance, that sort of thing. Look at how funding for UF has changed now that we have fewer people in legislature and other schools are building strength. (UCF, SFU) Is UF still the “flagship” university? I’ll also be looking at tuition.
So the next step is to figure out how far back to look. I’ll start at 10 years, hit up Lexis and see what I can dig up.
I’m much more confident now that the topic is locked down.
January 15th, 2008 — data, news, rss
RSS has got to be one of my favorite reporting tools. Although my writing lately is limited to this blog and News Videographer, I still have to find something to write about and keep current in my field. That means communicating with a lot of people.
But I don’t have time to talk to all those people. Many of them have Web sites and blogs, and those who don’t get written about online by the former. It’s much easier and faster for all this information to be compiled in one place for my viewing pleasure.
RSS stands for Rich Site Summary or Really Simple Syndication. It is most often characterized by the orange and white icon you may see on many Web sites. (See the icon at the top of the right column?) An RSS feed basically delivers new content from a chosen site to a feed reader of your choice.
A feed reader, also known as a news aggregator, can be compared to your e-mail inbox. Instead of e-mail addressed to you, it receives the updates you have subscribed to. Some readers let you interact and organize your subscriptions in many different ways.
So start receiving these handy-dandy updates, you first need a feed reader. My favorite is Google Reader, but other options are available such as Bloglines and NewsGator. You can also choose, like e-mail, to use a Web-based or desktop feed reader. You can peruse these options by simply doing a search for feed readers.
Having chosen your feed reader, start subscribing! In most cases, the orange and white RSS icon will appear somewhere on a Web site. Some browsers will also show the icon in the address bar if there is a feed for that site. Some sites do not have feeds.
I’ve subscribed to a slew of different sites, from news to blogs to entertainment and more. If your city government has a Web site, chances are it has some sort of feed (even Gainesville has one for municipal minutes). State and federal governments are more likely to provide more information. And don’t discount blogs! Even though you will have to double-check the information, blogs are an amazing resource, and with a little hunting you can find the good ones.
Now, all you have to do is remember to check the feed reader every day.
This post was also published at Wired Journalists.
January 1st, 2008 — CAR, class, data, public records, University of Florida
I’m doing an independent study on Computer Assisted Reporting with Professor Cory Armstrong in the Spring. I was told at a couple of job interviews that I need CAR experience, but the University of Florida takes data no further than the Fact Finding class.
So I’m going to find a dataset, explore it, and hopefully be able to produce a story package.
Right now I’m doing some research on different datasets currently available, but I’m having trouble narrowing down my subject.
I’ve been looking at some PEW studies for ideas on what sort of data to look at, as well as the IRE Database Library.
Some ideas so far:
- Campus Crime: compare Florida colleges or SEC colleges or just look at UF crime
- Walter Reed: I’m not sure how to find this data, or if it is readily available. But it was one of the seriously under covered stories listed by PEW. This could be taken more broadly: reduced funding in VA hospitals, funding vs. number of troops vs. number of living vets, 2001 to present for all kinds of money issues, number of wounded, currently enlisted, vets no longer enlisted, maybe also insurance
- Fluctuating Gas Prices
- Tasering Cases in Florida
Edit: I’m also trolling the Sunlight Foundation’s “Insanely Useful Web Sites.”
That’s it so far. (Thanks to Mindy for the help.)
Picking a subject has always been the hardest thing for me. I just want to look at everything!
Suggestions, as always, are welcome.
November 20th, 2007 — data, journalism, writing
Today I was inspired by Joe Grimm’s “Ask the Recruiter,” a daily column about problems getting journalism jobs and internships. Today he wrote about a reporter who is having trouble cultivating sources.
For some reason, this brought to mind Adrian Holovaty’s data collection of hotels he has stayed in. Which led to my spending an hour or two creating a Google Spreadsheet of every source I’d ever spoken to for a story. (I always kept my notes in a box in the closet.)
No, this won’t lead to some crazy database on a news Web site with all my source info and notes. But I am willing to share my template. (I’ve exported it as an Excel Spreadsheet.)
I think this would be especially useful for reporters covering beats, but a great resource either way.
Here’s how it works: One column for source names. This includes titles, where they work. The next column is for phone numbers. Then e-mail addresses. Then stories they helped you with. Simple right? The next two columns are trickier. One column will record the first date on which you spoke to this source. The next will record your notes, whatever it was you talked about. If you are granting a source anonymity, make sure to make a note of it here as well. Now, on each subsequent talk, you add two more columns for this source: date and notes. Get it?
I think it’s a pretty cool way to keep track of this information. However, some newsrooms have policies against keeping these types of notes for legal reasons. Please check your newspaper’s policy before you implement this.
October 13th, 2007 — Adrian Holovaty, computer, conferences, data, design, journalism, map, news, seo, SND
with Adrian Holovaty! This is the highlight for me, since my background is more programming and I’m defenitely a huge geek. Seeing Adrian speak was the deciding factor in coming to SND.
How to take data and make it efficient in terms of how the hypertext is laid out. Example: Wikipedia = Serendipity
Journalists are essentially collectors of data.
Rant #1 No serendipity in online journalism. Bullshit!
Data browseability: people want it and expect it. (IMDB, Amazon.com)
Serendipity increases stickiness and usefulness.
It all starts with structure. Have a structured list of data (facts) like an Excel spreadsheet. Journalists take clean data and turn it into a story. Computer programs can’t read the story. News orgs have the infrastructure to collect data, edit and verify the data and get the data to people. But they don’t leverage the data!
Lesson #1 Structure your data
Everything has structure. Sports. Obits. Even photos: subject, photographer, where, when, camera, size, colors (Flickr)
After the structure, the easy part.
Lesson #2 Give your data “the treatment”
Example: crime data
Step 1: lists fields (date, time, type, address, location, arrests, case number)
Step 2: key concepts (what data is useful? date, time, type, address, location)
Step 3: make breakdowns (list all possible values for each field)
Step 4: make list pages (pages for each value in each field)
Step 5: detail pages (pages for each crime)
Things to note
- Permalinks for concepts (distinct URL) linkability/bookmarkability
- SEO
- Serendipity
Example sites: chicagocrime.org, Faces of the Fallen, Video Game Reviews, Mixed Messages.