Open data's promise

Posted 1/23/15 by Megan Rhyne

In early December, President Obama announced a series of measures aimed at closing the gap between citizens and law enforcement. One of those measures was a $263 million funding package that included money for agencies to purchase body cameras that can be used during police interactions with citizens.

Immediately, my counterparts in other states began discussing whether video captured by the cameras would be subject to release under state public records laws (in Virginia, it’s called the Freedom of Information Act). On one side is the need for public accountability; on the other are privacy concerns for victims, witnesses and informants. (Certainly there are other issues on both sides, but for now, those are the two biggies.)

In Seattle, instead of gathering talking heads like me in a room to hammer out statutes or regulations, the police department convened a “hackathon” to figure out a technological solution.

By the end of the day, seven groups of “civic hackers” had come up with potential redaction tools, “each with a different balance of automation and human review,” according to a Slate article.

Everyone knew that each of these tools, alone or in combination, still needed tweaking, but there was something concrete to work with.

Hacking for good

There are scores of civic hackers among us. They are civic-minded individuals who believe that data, and government data in particular, has the power to transform government and improve citizens’ lives. But they do not operate in a vacuum. They often depend on citizens or government to bring them into the conversation about the problems those groups view as paramount.

Their currency is data. Without access to it, they cannot fix problems. They cannot come up with solutions. They cannot innovate. They need the raw data. Getting it is not always easy, not because the data doesn’t exist, but because it is often essentially unusable or inaccessible.

For an individual’s own use, it is the content of a public record that matters most. I want to know what a record says, who said it and when it was said. It doesn’t matter whether it’s said in an email, text message, Word document or PDF.

But put several citizens together, put several records together, and you can analyze trends, create visuals, provide statistics and combine the records with other data to provide an even richer picture. One example is taking a Google map of a locality and overlaying it with data showing where fire hydrants, recycling stations, sewer mains or what have you are located. To use multiple records this way, they must be easy to manipulate, and that requires a basic format. Usually that’s a spreadsheet of some sort.
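To make the point about basic formats concrete, here is a minimal Python sketch of the kind of combining a spreadsheet-style format allows. It assumes two small CSV data sets, one listing fire hydrants by district and one listing district populations; the data, column names and function names are all made up for illustration and do not come from any real portal.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: in practice these would be CSV files
# downloaded from a locality's open data portal.
HYDRANTS_CSV = """hydrant_id,district,status
H-001,North,in service
H-002,North,out of service
H-003,South,in service
H-004,East,in service
H-005,South,in service
"""

POPULATION_CSV = """district,residents
North,12000
South,8000
East,4000
"""


def hydrants_per_district(hydrants_text: str) -> Counter:
    """Count hydrant records in each district."""
    reader = csv.DictReader(io.StringIO(hydrants_text))
    return Counter(row["district"] for row in reader)


def hydrants_per_1000_residents(hydrants_text: str, population_text: str) -> dict:
    """Combine the two data sets on their shared 'district' column."""
    counts = hydrants_per_district(hydrants_text)
    rates = {}
    for row in csv.DictReader(io.StringIO(population_text)):
        residents = int(row["residents"])
        # Counter returns 0 for a district with no hydrant records.
        rates[row["district"]] = round(counts[row["district"]] * 1000 / residents, 3)
    return rates


if __name__ == "__main__":
    print(hydrants_per_1000_residents(HYDRANTS_CSV, POPULATION_CSV))
```

Running it prints hydrants per 1,000 residents for each district. The same join would be a chore if either data set existed only as a scanned PDF.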

Data format matters

PDFs, useful in the individual context, are not particularly helpful in the collective context, as VCOG learned in 2013 when it surveyed Virginia’s county and city websites to see how they posted their budgets. In some localities, VCOG found PDFs that could be searched like documents. But it also found many PDFs that could not be searched because they had been scanned in as images. These were essentially useless to citizens or researchers unwilling to parse 50 pages of numbers.

In addition to formats that are difficult to work with, there is the issue of how the data gets to the user, be it a citizen or a developer. Many requesters have been thwarted when they asked for databases. Recently, the Office of the Executive Secretary to the Supreme Court of Virginia said the database underlying its online search of court cases was not a public record, even though the individual case information was. Some databases have been withheld because they contain exempt information, even though FOIA says that exempt fields can be withheld while the rest of the database is released. Some requesters have been charged huge fees, justified on the grounds that an agency uses customized software that cannot easily export the data.
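For developers, the idea that exempt fields can be withheld while the rest is released is easy to picture in code. Below is a minimal Python sketch, assuming the database rows can be exported as simple records; the column names and the set of exempt fields are hypothetical, and deciding what is actually exempt is a legal judgment, not a programming one.

```python
import csv
import io

# Hypothetical column names. Which fields are actually exempt is a legal
# determination under FOIA, not something this code decides.
EXEMPT_FIELDS = {"home_address", "phone"}

RECORDS = [
    {"case_id": "C-101", "filed": "2014-11-03", "home_address": "12 Elm St",
     "phone": "555-0101", "status": "pending"},
    {"case_id": "C-102", "filed": "2014-12-15", "home_address": "9 Oak Ave",
     "phone": "555-0102", "status": "closed"},
]


def export_releasable(records: list[dict], exempt: set[str]) -> str:
    """Return the records as CSV text with the exempt columns left out."""
    if not records:
        return ""
    # Keep the original column order, minus the withheld fields.
    fieldnames = [name for name in records[0] if name not in exempt]
    out = io.StringIO()
    # extrasaction="ignore" tells DictWriter to skip keys not in fieldnames,
    # which is how the exempt values are dropped from each row.
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(export_releasable(RECORDS, EXEMPT_FIELDS))
```

The export keeps every non-exempt column intact, so a requester gets a usable spreadsheet rather than nothing.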

At worst, this looks like an intentional barrier to access. At best, it looks like indifference, or perhaps a feeling that it would be too much to add to an already full workload.

That is where proactively publishing databases (“data sets”) serves the interests of government as well as the coding community and the public. For every data set put online, that’s one more request for individual or collective records the government does not have to fulfill through FOIA. That saves staff time, and it makes citizens happy. Further, with access to the data, developers just may come up with creative ways for government workers to do certain tasks more quickly.

Of course, the data sets that are offered also have to be data that the public actually wants. And that is where the disconnect with government is at its greatest. Without communication about what is most useful, data sets of minimal interest will be pushed out rather than those sets that are most valuable.

Making data useful

How do I know this? Because services that have already been created from public data came from sets that were not offered through data portals. The Virginia Public Access Project (vpap.org) takes publicly available data, enters it into its own databases and produces a rich, graphic picture of money and politics.

Open data wunderkind Waldo Jaquith took the data produced by the Division of Legislative Automated Systems and created his own legislation tracking website (RichmondSunlight.com) that is both visually appealing and gives individuals tools to freely create tags, tracking and social sharing. Jaquith has also taken the Code of Virginia and made it into a dynamic site (vacode.org) with annotations, definitions, legislative history and relevant court cases all interconnected.

Speaking of Jaquith, he figured out a way to take the State Corporation Commission’s basic data on 1.7 million corporations and mash it into a quick-search database (vabusinesses.org); the site also provides its own bulk downloads of various information fields within the database. Meanwhile, Ben Schoenfeld of Blacksburg put together a site (vacircuitcourtsearch.com) that lets users search all of the state’s circuit courts for pending civil and criminal cases against a single individual, instead of having to go one locality at a time on the Supreme Court’s website.

It is exciting to see what these coding brigades can come up with. In Hampton Roads, they took data generated by city buses and created real-time arrival estimates. They took restaurant inspections from the state health department and created a grading system for Virginia’s restaurants. In Northern Virginia, they created an app that searches federal databases so government workers can easily find out if their lunch partner is a federal lobbyist, in which case the worker would have to insist on paying his own way. There are countless apps and tools created by motivated developers.

It’s fair to ask why government isn’t doing this on its own. Resources and time are certainly the biggest obstacles. But sometimes these are overestimated. Several years ago, legislative staff estimated it would cost tens of thousands of dollars to put state legislators’ voting histories online. DLAS knew that wasn’t the case, and folks like Jaquith knew it, too: Within hours he’d figured out a way to do it. But because the policy folks weren’t talking to the tech folks, the assumption was that there was only one way to do this, and it was going to be expensive.

To make data work for us, government needs to start thinking creatively at the same time it converts more of its records into databases and pushes them out to the public. Citizens need to let their government know which records would make good data sets. And developers need to talk to both government and citizens about what they want those data sets to do.

It’s win–win for everyone.
