Thoughts on the transformation from documents to data
Governments, businesses, institutions and individuals are in the midst of a transition to digital reporting. Printed documents and PDFs just aren’t accessible enough when you want to work with the data locked inside them. Among consumers, for example, the popularity of personal financial management software like Mint, Personal Capital, and Quicken speaks to this transition. Moving from documents to data enables automated reporting and analysis, which increases efficiency and reduces errors. This is an effective use of our tax dollars and good for government management.
In government, the transition from documents to data is happening in various ways, most notably through the Digital Accountability and Transparency Act of 2014 (DATA Act). The DATA Act is the United States’s first open data law. It requires all 24 CFO Act agencies to first standardize, then publish, all of their spending data on one accounting platform. The law is a leading federal government effort to navigate the inevitable transition from documents to data. I think of this transition as follows:
- Reporting is shifting from “what I look like” documents to “what I mean” data.
- Data will not be just values but almost always “tagged” with terms that associate these values with descriptions of their specific meaning.
- Terms are defined in standardized formats that have shared dictionaries to bring consistent meaning to the values.
- Data is of increasing importance for evaluation and oversight as well as prediction.
- Data will replace pixel-perfect documents.
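To make the difference between bare values and "tagged" values concrete, here is a minimal sketch in Python. The term names, definitions, and the `tag_row` helper are hypothetical illustrations, not taken from any published standard; the only point is that each value travels with a term whose meaning is looked up in a shared dictionary.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical shared dictionary: term name -> agreed definition.
# These terms are made up for illustration only.
DICTIONARY = {
    "AwardAmount": "Total dollars obligated to a single award.",
    "AwardingAgencyName": "Name of the agency that made the award.",
}


@dataclass(frozen=True)
class TaggedValue:
    """A value paired with the dictionary term that says what it means."""
    term: str
    value: object

    def definition(self) -> str:
        return DICTIONARY[self.term]


def tag_row(terms, row):
    """Attach a dictionary term to each bare value in a row."""
    if len(terms) != len(row):
        raise ValueError("row length does not match term list")
    unknown = [t for t in terms if t not in DICTIONARY]
    if unknown:
        raise KeyError(f"terms missing from dictionary: {unknown}")
    return [TaggedValue(t, v) for t, v in zip(terms, row)]


# "What I look like": values only, meaning implied by position.
untagged = ["Department of Example", Decimal("125000.00")]

# "What I mean": every value carries its term.
tagged = tag_row(["AwardingAgencyName", "AwardAmount"], untagged)
```

A consumer who receives `tagged` can ask any value for its definition instead of guessing from column order or page layout.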
The DATA Act is the first modern attempt to bring together three broad categories of federal spending reporting requirements: cash-based agency budgets, accrual-based accounting data, and award data. The law requires the federal government to define and apply standard data elements (what I referred to above as “tags”) and a government-wide data format to all federal spending. The standard elements are defined in the DATA Act Information Model Schema, or DAIMS, version 1.0 (released in April 2016), and the technical data format is the eXtensible Business Reporting Language, or XBRL.
I cannot overstate the importance of having a good set of standard elements. With one, we can describe our data without ambiguity and without losing its meaning when we transfer it from one data user to another. These are special challenges for the federal government. Even broadly used, high-level terms like “award” and “program” do not have a specific, shared meaning across federal agencies. It is still common practice for data to be stored and transferred largely as bare values, disconnected from what they actually mean. Given these circumstances, the DATA Act’s DAIMS is an impressive departure from “data as usual.”
This departure began when the two lead DATA Act agencies, the Department of Treasury and the Office of Management and Budget (OMB), defined the terms in, and the structure for, DAIMS. The design goals for DAIMS are as follows:
- Facilitate automated testing of data quality.
- Make standard terms useful beyond their initial purpose.
- Provide for the creation of new agency-specific terms (when necessary) in a way that captures how the new terms relate to standard terms.
Unlike humans, not all data is created equal. It takes thoughtful effort to achieve the above. I have advised on many private sector and government efforts to create a standardized schema where the above design goals were not considered. Think of what happens when you try to design a house without considering future requirements, such as how it might accommodate changing needs and new people. The same is true for building a schema; the model for how data will be used must be designed around meaning. Without thoughtful planning you’ll likely end up with a schema that fails to accommodate changes in usage and expansion.
The DAIMS is designed to accommodate change and expansion. This is necessary given its objectives of bringing together the three broad categories of federal spending reporting requirements of cash-based agency budgets, accrual-based accounting data, and award data. After all, the $3.7 trillion federal budget suggests a tremendous growth in the quantity of data, and the commensurate need for capturing it with unambiguous meaning.
We need a schema that is conceived with data interaction in mind. A schema that captures the kind of information needed so that machines (and their people) may receive, store, retrieve, transform, generate, and transmit the data. The DATA Act Information Model Schema is designed with this in mind. It will be transformative for how agencies report federal spending.
First, it provides shared data definition standards. Every consumer of the Treasury Department’s data benefits from having transparency on data element definitions, references to reporting rules and regulations, and description of allowable values. The benefit of this transparency is that every consumer can use the data in a consistent manner. It’s not just for the Treasury Department anymore.
Second, it is designed to clearly communicate how and where agencies can add new terms or customize standard terms in the data model. This means that agencies can relate their perspective of the data to the perspective of the Treasury Department. This was a lesson gained from our experience with the U.S. GAAP financial reporting taxonomy.
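One way to picture an agency term that stays related to a standard term is a small registry in which every extension must name the term it extends. This is a sketch under my own assumptions: the `Term` class, both term names, and the helper functions are hypothetical and do not reflect how DAIMS or XBRL actually encode extensions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Term:
    name: str
    definition: str
    extends: Optional[str] = None  # parent term name; None for standard terms


# Hypothetical standard dictionary with a single made-up term.
STANDARD_TERMS: Dict[str, Term] = {
    "AwardAmount": Term("AwardAmount", "Total dollars obligated to a single award."),
}


def register_extension(registry: Dict[str, Term], term: Term) -> Dict[str, Term]:
    """Return a new registry that includes an agency-specific term.

    The extension is rejected unless it points at a term already known,
    so the relationship to the standard dictionary is never lost.
    """
    if term.extends is None:
        raise ValueError("an extension must name the term it extends")
    if term.extends not in registry:
        raise KeyError(f"unknown parent term: {term.extends}")
    if term.name in registry:
        raise ValueError(f"term already defined: {term.name}")
    updated = dict(registry)
    updated[term.name] = term
    return updated


def resolve_standard(registry: Dict[str, Term], name: str) -> str:
    """Follow 'extends' links until a standard term is reached."""
    term = registry[name]
    while term.extends is not None:
        term = registry[term.extends]
    return term.name


# A hypothetical agency adds its own, narrower term.
agency_registry = register_extension(
    STANDARD_TERMS,
    Term(
        "ExampleAgencyGrantAmount",
        "Award amount restricted to this agency's grant programs.",
        extends="AwardAmount",
    ),
)
```

With this shape, anyone who meets `ExampleAgencyGrantAmount` for the first time can still resolve it to `AwardAmount` and compare it with data from other agencies.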
Third, it is designed to simplify the creation of automated data quality rules. Agencies will be able to test data quality before submission to Treasury. Additional data quality tests can be written and executed by the public if they are so inclined.
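As a rough illustration of what an automated data quality rule could look like once values are tagged, the sketch below checks a record against a short rule list before submission. The terms, allowable values, and rules are hypothetical and are not actual DAIMS validation rules.

```python
from decimal import Decimal, InvalidOperation


def is_non_negative_amount(value) -> bool:
    """True when the value parses as a number that is zero or greater."""
    try:
        return Decimal(str(value)) >= 0
    except InvalidOperation:
        return False


def is_known_award_type(value) -> bool:
    """True when the value is in a (hypothetical) list of allowable values."""
    return value in {"grant", "contract", "loan"}


# Each rule: (term it applies to, message on failure, check function).
RULES = [
    ("AwardAmount", "must be a non-negative number", is_non_negative_amount),
    ("AwardType", "must be one of the allowable values", is_known_award_type),
]


def check_record(record: dict) -> list:
    """Run every rule against a tagged record and collect the failures."""
    failures = []
    for term, message, check in RULES:
        if term not in record:
            failures.append(f"{term}: missing")
        elif not check(record[term]):
            failures.append(f"{term}: {message}")
    return failures


good = {"AwardAmount": "100.00", "AwardType": "grant"}
bad = {"AwardAmount": "-5", "AwardType": "gift"}
```

Here `check_record(good)` returns an empty list, while `check_record(bad)` reports one failure per rule. Because the rules refer to terms rather than to positions in a document, the same checks could be run by an agency before submission or by a member of the public afterward.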
The transition from documents to data isn’t just about getting information into digital format on computers. Data standards and structure also matter. The DAIMS version 1.0 provides a great start for the federal government as it joins businesses, institutions and individuals in this transition from “what I look like” documents to “what I mean” data.