Guest Post from John Turner: How to publish agency data in a way that satisfies both people and computers
John Turner is an expert in digital performance reporting, one of the inventors of iXBRL and one of the pioneers of XBRL. He’s an international consultant who operates in a number of fields related to digital reporting and software. You can reach him through LinkedIn.
Transparency across government operations serves many purposes. Making data open and accessible in support of this goal is increasingly an official objective for the U.S. as well as governments and leading agencies worldwide.
Setting policy objectives is one thing. Implementing them is another entirely. Agencies have huge holdings of data. They publish much of it continuously and they must now publish, or make accessible, even more of it. How can this be achieved?
Publishing government data on paper, or in paper-like formats (PDF, HTML and even Excel), is human-friendly. Someone who wants to use a single report, without carrying out analytics or comparison, obviously wants to be able to read it. But paper-like formats don’t help those who want to use or re-use the information more systematically. Those users incur significant costs retyping the data or transforming it into a format that can be used in analysis. Machine-readable formats like the eXtensible Business Reporting Language (XBRL) let software analyze the data directly, without that conversion step.
But what if you want to support both kinds of use?
HMRC, the United Kingdom’s tax authority, was faced with exactly this kind of problem. The agency wanted to shift UK company financial statement reporting from a human-friendly format over to a machine-readable one, XBRL. This would enhance the analysis that HMRC could do on the more than 1.5 million such reports received from taxpaying businesses each year. With data being prepared in XBRL by company accountants, HMRC would be able to stop silly mistakes from creeping in (“the balance sheet doesn’t balance”) and sift out anomalous filings for further investigation.
The problem was that an XBRL-based report is anything but human-readable. HMRC investigators discussing financial statements with company officials needed to be able to talk about “Page 3,” not “Line 3074.” Companies needed to be able to produce clear reports and format them the way they wanted to. At the same time, HMRC still needed to be able to ingest the data into its business intelligence systems without retyping or transformation.
HMRC turned to Inline XBRL, or iXBRL, an extension to the XBRL standard that was developed to solve the problem. Inline XBRL is a way of embedding, right inside an ordinary HTML file, instructions that convert human-readable text and numbers into machine-readable XBRL. Inline XBRL does this by wrapping each piece of reportable data in tags. Browsers display the text (“$10,000” or “The Directors were unanimous”) but not the semantics (“<US GAAP Profit>” or “<Executive Remuneration Recommendation>”).
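To make the idea concrete, here is a minimal Python sketch of a web page carrying tagged facts and a program that reads them back out. The ix:nonFraction and ix:nonNumeric elements are the Inline XBRL tags for numeric and textual facts. Everything else is an assumption made for illustration: the "ex:" taxonomy, the concept names, the "FY2011" context and the extract_facts helper are hypothetical. A real filing also carries a header that defines contexts and units, plus formatting rules for scale and sign, and a production system would use a full XBRL processor rather than this simplified parser.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Namespace of Inline XBRL 1.1 (earlier filings may use a different version).
IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# A simplified page: a browser shows only the text, while the ix: tags
# carry the meaning. The "ex:" concepts and the context are hypothetical.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ex="http://example.com/hypothetical-taxonomy">
  <body>
    <p>Profit for the year:
      $<ix:nonFraction name="ex:Profit" contextRef="FY2011"
                       unitRef="USD" decimals="0">10,000</ix:nonFraction>
    </p>
    <p>
      <ix:nonNumeric name="ex:DirectorsStatement"
                     contextRef="FY2011">The Directors were unanimous</ix:nonNumeric>
    </p>
  </body>
</html>"""


def extract_facts(xhtml):
    """Return a dict mapping each tagged concept name to its value."""
    root = ET.fromstring(xhtml)
    facts = {}
    for element in root.iter():
        text = "".join(element.itertext()).strip()
        if element.tag == f"{{{IX_NS}}}nonFraction":
            # Numeric fact: drop the thousands separators shown to readers.
            facts[element.get("name")] = Decimal(text.replace(",", ""))
        elif element.tag == f"{{{IX_NS}}}nonNumeric":
            # Textual fact: keep the displayed text as-is.
            facts[element.get("name")] = text
    return facts


facts = extract_facts(PAGE)
for name, value in facts.items():
    print(name, "=", value)
```

The same file serves both audiences: a person opens it in a browser and sees an ordinary page, while software reads the tagged values without any retyping.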
Companies create their reports (using off-the-shelf accounting software from more than a dozen vendors) and they look exactly the way the companies expect. The XBRL tags are embedded inside the documents and HMRC can both resolve quality issues and carry out advanced analytics using that format.
HMRC has successfully received some 3,500,000 iXBRL filings since the arrangements were formalized in April 2011, and now agencies in a number of other parts of Europe are either using iXBRL or trialing it.
What’s the takeaway?
Paper and paper-like formats are useful to individual people who want to read individual reports. Machine-readable formats are needed wherever people want to carry out significant analysis or build new value on top of the data. Agencies need to understand this distinction when developing their open-data policies.
Where any kind of performance measurement data needs to be published or made available by government agencies, they should strongly consider using XBRL. XBRL is a well-developed and internationally accepted standard for communicating that kind of data.
Where there is a need to provide human-readable versions of the data as well, agencies can reduce cost and simplify the process by publishing their data on their web sites, in human-readable web pages, with the data markup built right into the web page.
To do this, they can use the iXBRL standard.