Industry Comments Focus on Data Quality, Standards, and Common Digital Identifiers for Federal Data Strategy

by Christian Hoehner and Douglas Hummel-Price


As we summarized earlier this month, fifty-two speakers and over 100 experts came together at the Coalition-White House public forum to voice comments about the recently released Phase III Draft 2019-2020 Federal Data Strategy Action Plan (background here).

In addition to the forum, over thirty organizations submitted written comments on the plan’s valuable themes, limitations, and shortcomings. While the feedback touched on a variety of issues, a review of the publicly available comments and the forum remarks brought several common themes to light, including:

  1. Enforcing data quality,
  2. Creating standards and schemas (taxonomies and ontologies),
  3. Promulgating digital identifiers,
  4. Supporting data literacy in the workforce,
  5. Adhering to ethics and privacy rules and frameworks, and
  6. Providing supporting resources and coordination mechanisms for state and local governments.

While this summary does not cover every written comment, it should help clarify the primary industry and academic themes that most resonate with the Data Coalition’s policy objectives.

1) Data Quality is Paramount. Good data analysis requires high-quality data. Coalition-member Deloitte’s Vishal Kapur spoke to this need, calling for an emphasis on “data quality as an enterprise initiative.” The Department of Housing and Urban Development echoed this in its written feedback, calling for all agencies to create a data quality scorecard. Workiva’s Renata Maziarz, a Coalition board member, agreed, stating that improved data quality starts with establishing a baseline for current processes, such as enforcing standards and rules at the point of ingestion or manual data entry.

2) Standardization and Machine-Readability Require Well-Defined Ontologies and Taxonomies. For high-quality data to be useful, it must be machine-readable and standardized. Standardization ensures that industry, academia, and government do not waste time on data grunt work such as cleaning and merging data sets. Coalition-member Morningstar’s Jake Spiegel spoke on financial markets’ need for standardization, building on spoken comments by Conan French of the Institute of International Finance about the necessity of machine-readability and standardization as bedrocks of effective and efficient analysis. Drew Leety, of Coalition-Executive-Partner Booz Allen Hamilton (BAH), discussed this further, speaking to the need for clearly defined ontologies and taxonomies that serve as a foundation for standardization by leveraging and strengthening relationships between existing standards. Written comments by the International Association of Scientific, Technical, and Medical Publishers (STM) and Coalition-member Object Management Group (OMG) also explicitly discussed the central importance of standardization and machine-readability.

3) Digital Identifiers/LEIs Create Market Clarity. The Data Coalition has known for some time that data involving many organizations would greatly benefit from the adoption of a universal entity identifier, such as the Legal Entity Identifier (LEI). These unique identifiers provide significant clarity into organizational relationships. Researchers have stated that the use of LEIs would have significantly increased financial regulators’ ability to respond to the collapse of Lehman Brothers in 2008, limiting the spill-over damage to the greater economy. Robin Doyle of JPMorgan Chase, BAH’s Drew Leety, and Dave Lindsay of Coalition-member Delv all addressed this need by pointing to the LEI, a global, nonproprietary solution. In written commentary, Dr. Mirek Sopek, CEO of Coalition-member LEI.INFO, further laid out the case for the LEI, pointing to “the endorsement of the G20, [Financial Stability Board] FSB, and multiple regulators across the globe.”

4) Data Literacy Facilitates Smart Creation and Use of Data. A large number of respondents discussed data literacy and the general lack of understanding about data across the Federal workforce. Kathy Rondon, a former CIA collection management officer and now the vice president of talent management at The Reports and Requirements Company, stated the problem in direct terms: “Your most highly technical employees may, in fact, be data illiterate.” Jason Briefel, the Executive Director of the Senior Executives Association, called for a baseline analysis of data literacy; decision-makers must understand how the data feeding their dashboards are created to avoid “potentially dangerous” decisions.

In both written and verbal comments, Coalition-member Qlik stressed the importance of data literacy across the board to facilitate making evidence-based arguments. Heather Gittings, Senior Director of Public Sector and Healthcare at Qlik, stated, “Make sure that everyone is able to leverage the data and what comes out of it” (see Federal Times). Jane Wiseman, CEO of the Institute for Excellence, put this more directly, saying, “It’s not just mid-level employees either – all decision makers within an agency need to be both digitally and data literate.” Written comments by Tyler Tech’s Socrata also spoke to the need for enterprise-wide training, which it said “allows an organization to have a more sustained and deeper adoption of the Federal Data Strategies.”

5) Ethical Concerns with the Expansion of Data. Another common theme in the forum was the importance of ethics, preservation, and privacy when collecting, sharing, and using data. As mentioned by Qlik’s Heather Gittings, the government must ensure that proper ethical considerations are in place when using citizens’ data. Without this, citizens could lose faith in the government’s ability to properly handle data or, worse, decline to support further data reforms. Microsoft echoed such concerns, especially in the area of artificial intelligence (AI), and mentioned that the Federal Data Strategy should leverage Microsoft’s AI and Ethics in Engineering and Research (AETHER) Committee as a possible resource. The American Council for Technology and Industry Advisory Council (ACT-IAC) AI working group also noted that the Federal Data Strategy needs to be aware of “data poverty,” the issue that some areas and populations are underrepresented because the resources and technology needed to collect data from certain subpopulations are lacking. This underrepresentation could introduce bias in the future.

Additional concerns came from the Preservation of Electronic Government Information (PEGI) Project, which commented that it supports the Federal Data Strategy’s commitment to enhance data preservation but also encourages the Administration to take direct steps in 2019-2020 to turn this commitment into action. On privacy, the Postsecondary Data Collaborative (PostsecData) discussed how proper infrastructure must be in place to ensure the protection of personally identifiable information.

6) Funding Seems Missing. The Federal Data Strategy is an ambitious plan that will require funding to properly implement. “The thing that’s glaring to me is the lack of funding,” said Mike Anderson, chief federal strategist at software company Informatica. “…None of the other actions are presently funded. I think that’s going to be a big challenge to overcome” (see NextGov). SAP also stated the need for funding in its written comments: “SAP believes that in order for the Federal Data Strategy to be a successful initiative, a proper funding mechanism shall need to be made to invest in the technical infrastructure to support a secure environment for sharing potentially sensitive data.” HUD and the American Statistical Association each noted in their respective comments that the training needed for the Federal Data Strategy to succeed will require specific funds beyond current allocations.

7) State and Local Governments Provide Opportunities. Many respondents spoke about the need for the Strategy to properly address the role that state and local governments should play. Mary Ellen Wiggins of the Forum for Youth Investment pointed out that many states, tribes, and localities will need to be included to effectively manage federal dollars spent on the ground. Richard Coffin from Coalition-member USAFacts also suggested that Action 8, “Pilot Standard Data Catalogs for Data.gov,” be made available to state and local governments for testing. The Data Coalition generally notes that the Strategy is conspicuously silent on the topic.

To read the Data Coalition’s comments on the Strategy’s Draft Action Plan, click here.