What If Open Data Is Both Structured and Unstructured? Use a Double-Barreled Solution.
Across government, leaders are adopting structured data formats to make public information easier to collect, organize, and use. Even the Securities and Exchange Commission (SEC), whose six-year-old open data initiative is a good example of how not to do it, has recently made headway toward fully transforming corporate financial statements into structured data.
However, some government information will always remain unstructured.
Public companies, for example, file structured financial data with the SEC, in which every number and line item in their financial statements is separately tagged using the eXtensible Business Reporting Language (XBRL) format. But they also disclose paragraphs of unstructured text describing their businesses, performance, and prospects.
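To make "separately tagged" concrete, here is a minimal Python sketch. The XML fragment is hand-written and heavily simplified (real XBRL instance documents also declare contexts, units, and schema references), and its concept names, period labels, and values are illustrative, not taken from any actual filing. Because each number carries its own tag, a standard XML parser can pull every fact out by name; the narrative paragraphs offer no such handles.

```python
import xml.etree.ElementTree as ET

# Illustrative, simplified XBRL-style fragment. Values and period labels are
# made up; the us-gaap namespace URI is modeled on the US GAAP taxonomy's.
FRAGMENT = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2014-01-31">
  <us-gaap:Revenues contextRef="FY2014" unitRef="USD" decimals="-6">5200000000</us-gaap:Revenues>
  <us-gaap:Revenues contextRef="FY2013" unitRef="USD" decimals="-6">4800000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2014" unitRef="USD" decimals="-6">610000000</us-gaap:NetIncomeLoss>
</xbrl>"""


def extract_facts(xml_text):
    """Map (concept name, contextRef) -> integer value for every tagged fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for element in root:
        # ElementTree reports tags as "{namespace}LocalName"; keep the local name.
        concept = element.tag.split("}", 1)[1]
        facts[(concept, element.get("contextRef"))] = int(element.text)
    return facts


if __name__ == "__main__":
    for (concept, period), value in extract_facts(FRAGMENT).items():
        print(f"{concept} [{period}]: {value:,}")
```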
To pull insights out of unstructured text, the technology industry has developed some astonishing text-analysis capabilities. Quid, for example, has become known for determining the significance of specific words and phrases that occur and recur in patterns. Other analytics solutions, such as those offered by Information Builders, let users gather and interpret sentiment from text with a high degree of accuracy.
But what happens when structured data and unstructured data describe the same thing? That is exactly the situation with the SEC’s corporate disclosures.
Public companies’ financial statements, now available as structured data, describe their performance quantitatively. Management’s discussion and analysis (“MD&A”), which public companies include in their SEC reports, describes the same performance, over the same period, qualitatively.
Structured and Unstructured Insights Require a Double-Barreled Solution
Some of the newest analytics applications are, in a sense, double-barreled: they can derive insights from structured and unstructured data sets that are related to one another. Ez-XBRL’s Contexxia platform is one such application. Contexxia can recognize when prose in a public company’s MD&A relates to numbers in the accompanying financial statements.
Contexxia’s document review tool shows changes between filings for different periods and highlights new content. Contexxia can also show how particular passages in a company’s unstructured MD&A narrative relate to particular parts of the structured XBRL financial statements filed with that narrative, so users can quickly see whether the structured data backs up what the narrative says. (If the narrative says one thing and the structured data says another, there’s your red flag.)
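The red-flag idea can be illustrated with a toy Python sketch. It is not Contexxia’s method, which is not described here; it assumes hypothetical tagged revenue figures and a hypothetical mapping from narrative phrases to XBRL concept names, extracts simple percentage-change claims from MD&A-style text with a regular expression, and compares each claim with the change computed from the tagged numbers.

```python
import re

# Hypothetical tagged values keyed by (XBRL concept, period); illustrative only.
FACTS = {
    ("Revenues", "FY2014"): 5_200_000_000,
    ("Revenues", "FY2013"): 4_800_000_000,
}

# Hypothetical mapping from narrative wording to XBRL concept names.
PHRASE_TO_CONCEPT = {"revenue": "Revenues", "revenues": "Revenues"}

# Matches claims such as "Revenues increased 8%" or "revenue decreased by 3.5%".
CLAIM = re.compile(
    r"\b(revenues?)\s+(increased|decreased)\s+(?:by\s+)?(\d+(?:\.\d+)?)%",
    re.IGNORECASE,
)


def check_claims(text, facts, current="FY2014", prior="FY2013", tolerance=1.0):
    """Return (claim text, stated %, actual %, consistent?) for each claim found.

    `tolerance` is in percentage points: a claim is treated as consistent if the
    stated change is within that distance of the change in the tagged numbers.
    """
    results = []
    for match in CLAIM.finditer(text):
        concept = PHRASE_TO_CONCEPT[match.group(1).lower()]
        stated = float(match.group(3))
        if match.group(2).lower() == "decreased":
            stated = -stated
        now, before = facts[(concept, current)], facts[(concept, prior)]
        actual = (now - before) / before * 100
        consistent = abs(actual - stated) <= tolerance
        results.append((match.group(0), stated, round(actual, 2), consistent))
    return results


if __name__ == "__main__":
    for claim in check_claims("Revenues increased 8% compared with fiscal 2013.", FACTS):
        print(claim)
    for claim in check_claims("Revenue decreased 3% year over year.", FACTS):
        print(claim)  # stated -3% vs. tagged +8.33%: a red flag
```

A real tool would need far more robust language handling than one regular expression; the sketch only shows that tagged numbers give the narrative something precise to be checked against.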
Contexxia’s numeric line view provides a consolidated view of XBRL data and the underlying textual SEC filing. The view has filters for financial numbers, including accounting changes and outliers, and lets users drill down from numbers to the underlying footnotes within documents, giving greater insight into the company’s performance. These features help users analyze and understand SEC filings faster.
Contexxia and double-barreled applications like it show that the open data analytics industry isn’t divided into structured data insights on one side and unstructured text analysis on the other. You can have both, and in situations like the SEC’s, you should.