Guest post from Jim Harper: Transforming legislative text into open data – without waiting for Congress to do it


Jim Harper, director of information policy studies at the Cato Institute and a member of the Data Transparency Coalition’s Board of Advisors, here previews the future of legislative data transparency. Rather than wait for Congress to adopt a standardized data format to turn legislative text into open data, Cato’s Deepbills project is doing that work on its own. These are the early results. 
In our Deepbills project, we are well into production on data that more fully reveals what the bills introduced in Congress contain. We are transforming the text of legislative proposals into an XML format that allows us to electronically identify key information. We’ve now performed our semantically rich markup on over 4,000 of the 5,000+ bills introduced in Congress so far.

Following a hunch, I recently looked into open-ended authorizations of appropriations, finding a surprising incidence of bills that place no limit on the spending they permit. In the blog post at the link, I make some observations on the fact that 40% of bills that authorize spending allow appropriators to spend whatever they want. It’s an issue worth investigating further.
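
To give a rough sense of how a question like this can be put to marked-up bill text, here is a minimal Python sketch. The `funds` element, its `amount` and `year` attributes, the `indefinite` value, and the three sample bills are all hypothetical stand-ins invented for illustration; they are not the actual Deepbills schema or data, so check the project's documentation for the real element names before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Element and attribute names here are
# illustrative assumptions, NOT the actual Deepbills schema.
SAMPLE_BILLS = {
    "bill-a": (
        "<bill><section>There is authorized to be appropriated "
        '<funds amount="5000000" year="2014">$5,000,000 for fiscal year 2014</funds>.'
        "</section></bill>"
    ),
    "bill-b": (
        "<bill><section>There are authorized to be appropriated "
        '<funds amount="indefinite" year="2014">such sums as may be necessary</funds>.'
        "</section></bill>"
    ),
    "bill-c": "<bill><section>This Act may be cited as the Example Act.</section></bill>",
}


def authorization_amounts(xml_text):
    """Return the 'amount' attribute of every (hypothetical) funds element."""
    root = ET.fromstring(xml_text)
    return [element.get("amount") for element in root.iter("funds")]


def open_ended_share(bills):
    """Count bills that authorize spending and those with no stated limit."""
    authorizing = 0
    open_ended = 0
    for xml_text in bills.values():
        amounts = authorization_amounts(xml_text)
        if not amounts:
            continue  # this bill authorizes no spending at all
        authorizing += 1
        if "indefinite" in amounts:
            open_ended += 1
    return open_ended, authorizing


if __name__ == "__main__":
    open_ended, authorizing = open_ended_share(SAMPLE_BILLS)
    print(f"{open_ended} of {authorizing} authorizing bills are open-ended")
```

The point of the sketch is only that once the spending language is tagged, counting open-ended authorizations becomes a simple query rather than a reading exercise.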

The data has much more to reveal, and it has many more uses. We’ve been using it to enrich articles about significant bills in Congress on Wikipedia, for example. Articles we’ve generated have gotten 70,000 hits. You can practically feel our democracy strengthening…
It’s really up to the transparency community to find all the uses to which this data can be put. So help yourself! Bulk download and an API are here. If we can improve it for you, please let us know. And pass the word!

Of course, someday we would like to see Congress adopt a semantically rich format like the one we’re using and apply it, officially, to legislative proposals instead of publishing bills as minimally enriched text, as happens today. The more uses you find for our data, the more likely it is that the many cooks in the kitchen of legislative drafting and publication will work toward that future.