We at IEI recently had the opportunity to talk to Chris Taggart of OpenCorporates about open data, data accessibility, and data quality. OpenCorporates aims to be “an open database of the corporate world,” with the goal of having a URL for every company worldwide. The team is also gradually importing company-related government data and trying to match it to specific companies. In addition to co-founding OpenCorporates, Taggart built OpenlyLocal.com and OpenCharities. He is also a member of the UK Government’s Local Public Data Panel, the UK’s Tax Transparency Board, and the Open Knowledge Foundation’s open government working group.
Q. What are the biggest challenges to open data projects?
A. The biggest challenge by far is getting the data. Often the data isn’t available as structured data, and if it is, access is sadly restricted to those who pay for it, inhibiting use and innovation. Open data is about removing barriers, and allowing data on the public record to be truly public, usable and useful to all. This is the spirit that has driven so much data based on U.S. Federal data, and whose importance other countries and U.S. states are starting to realize. To use the words of the G8 open data charter, open data are an “untapped resource with huge potential to encourage the building of stronger, more interconnected societies that better meet the needs of our citizens and allow innovation and prosperity to flourish”.
The second biggest barrier is making sense of the data. Clear documentation is rarely available. OpenCorporates has just finished turning U.S. financial licenses into a structured dataset, extracting the data from every state, and the biggest problem by far was understanding what the different data meant in each state, and what, for example, was the exact legal status of a State Chartered Bank. So, making this information useful is not just a technical matter of getting and parsing the data, but a cognitive and modeling issue too.
Q. Which national governments and levels of government are the most active in terms of making data more accessible?
A. U.S. Federal Government (though not uniformly), the UK, New Zealand, Norway. The U.S. states tend to be lagging in open data, particularly relating to corporate data, although New York now makes a subset of its corporate registry data available on its open data site.
Q. What are some of the more exciting projects that have recently been deployed?
A. Of course, we think OpenCorporates is probably the most exciting and important open data project in the world, both disrupting legacy business models and massively increasing access to a dataset that is critical not just for transparency and open data users, but for other businesses, governments, banks and law enforcement, all of whom are now using it.
Q. How does crowdsourcing play into open data initiatives?
A. By making the data available as open data, you’re going to increase the quality, both by increasing the audience (the so-called “many eyes” approach) and by increasing the uses to which it is put. Combining datasets is a proven and effective way of identifying data quality problems, both micro and macro. What governments now need to do is close the loop by allowing end users to report errors and problems, creating a virtuous circle of data.