The US Patent and Trademark Office should switch from documents to data


The debate over patent reform — one of Silicon Valley’s top legislative priorities — is once again in focus with last week’s introduction of the Innovation Act (H.R. 9) by House Judiciary Committee Chairman Bob Goodlatte (R-Va.), Rep. Peter DeFazio (D-Ore.), Subcommittee on Courts, Intellectual Property, and the Internet Chairman Darrell Issa (R-Calif.) and Ranking Member Jerrold Nadler (D-N.Y.), and 15 other original cosponsors.

The Innovation Act largely takes aim at patent trolls (formally “non-practicing entities”), who use patent litigation as a business strategy and make money by threatening lawsuits against other companies. While cracking down on litigious patent trolls is important, that challenge is only one facet of what should be a larger context for patent reform.

The need to transform patent information into open data deserves some attention, too.

The United States Patent and Trademark Office (PTO), the agency within the Department of Commerce that grants patents and registers trademarks, plays a crucial role in empowering American innovators and entrepreneurs to create new technologies. Ironically, many of the PTO’s own systems and technologies are out of date.

Last summer, Data Transparency Coalition advisor Joel Gurin and his colleagues organized an Open Data Roundtable with the Department of Commerce, co-hosted by the Governance Lab at New York University (GovLab) and the White House Office of Science and Technology Policy (OSTP). The roundtable focused on ways to improve data management, dissemination, and use at the Department of Commerce. It shed some light on problems faced by the PTO.

According to GovLab’s report of the day’s findings and recommendations, the PTO is currently working to improve the use and availability of some patent data by putting it in a more centralized, easily searchable form.

Patent Search Room – U.S. Patent and Trademark Office

But there is still a long way to go. The PTO’s most important information sources – patent applications – are still submitted to the agency as PDF and text documents, not as data. That means crucial information has to be gleaned manually, or by using scraping technologies.

To make patent applications easier to navigate – for inventors, investors, the public, and the agency itself – the PTO should more fully embrace the use of structured data formats, like XML, to express the information currently collected as PDFs or text documents.
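To illustrate the difference, here is a minimal sketch of what reading a structured application could look like. The XML layout, element names, and sample values are hypothetical, invented for this example; they are not the PTO's actual schema. The point is only that once fields are tagged, a few lines of standard-library Python can pull them out without any scraping.

```python
import xml.etree.ElementTree as ET

# Hypothetical application record. The element names and values are
# illustrative assumptions, not a real PTO format.
SAMPLE_APPLICATION = """
<patent-application>
  <title>Self-sealing beverage container</title>
  <filing-date>2015-02-10</filing-date>
  <applicant>Example Inventions LLC</applicant>
  <claims>
    <claim num="1">A container comprising a lid and a seal.</claim>
    <claim num="2">The container of claim 1, wherein the seal is silicone.</claim>
  </claims>
</patent-application>
"""


def summarize_application(xml_text):
    """Extract the tagged fields of one application into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        "filing_date": root.findtext("filing-date"),
        "applicant": root.findtext("applicant"),
        # Map each claim number to its text.
        "claims": {int(c.get("num")): c.text for c in root.iter("claim")},
    }


summary = summarize_application(SAMPLE_APPLICATION)
print(summary["title"])
print(len(summary["claims"]), "claims")
```

Getting the same fields out of a PDF would require text extraction and guesswork about where the title ends and the claims begin.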

The PTO is taking steps in the right direction. First, it encourages applicants to file online. The Electronic Filing System, or EFS-Web, is the PTO’s web-based patent application and document submission solution.

But the system still converts everything to PDF. In fact, it advertises, “EFS-Web gives you all of the same benefits as paper filings.” What exactly are the benefits of paper filings?

Second, the PTO has announced the availability of Form-Fillable PDFs. Currently, applicants can opt to use fillable PDFs when filing online, and the PDFs automatically populate the PTO’s internal systems. That is better than having the PTO transcribe the PDFs manually. But the information isn’t made public in any structured format, so the benefits of open data are lost.

Structured data allows for dynamic filings that can be easily amended, transferred, edited, and used for troubleshooting. The agency should collect all patent filings as data rather than PDF or paper-based documents.
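As a rough sketch of what "easily amended" could mean in practice, the example below models a filing as a small data record and shows an amendment that leaves the original intact and reports exactly which claims changed. The `Filing` class, its fields, and both helper functions are hypothetical, chosen for illustration; nothing here reflects how the PTO stores or processes filings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Filing:
    """Hypothetical structured filing; the fields are illustrative only."""
    title: str
    applicant: str
    claims: tuple


def amend_claim(filing, number, new_text):
    """Return a new Filing with claim `number` (1-based) replaced."""
    if not 1 <= number <= len(filing.claims):
        raise ValueError(f"no claim {number}")
    claims = list(filing.claims)
    claims[number - 1] = new_text
    return replace(filing, claims=tuple(claims))


def changed_claims(old, new):
    """List the 1-based numbers of claims whose text differs."""
    return [
        i + 1
        for i, (a, b) in enumerate(zip(old.claims, new.claims))
        if a != b
    ]


original = Filing(
    title="Self-sealing beverage container",
    applicant="Example Inventions LLC",
    claims=(
        "A container comprising a lid and a seal.",
        "The container of claim 1, wherein the seal is silicone.",
    ),
)
amended = amend_claim(
    original, 2, "The container of claim 1, wherein the seal is rubber."
)
print(changed_claims(original, amended))
```

With a PDF, the same amendment means producing a new document and leaving readers to compare the two versions by eye.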

Mr. Gurin’s GovLab report, covering the Open Data Roundtable with the Department of Commerce, focused on that need, noting specifically that the PTO should “move from paper-based or Portable Document Format (PDF) systems to all-digital text-searchable format” and “promote the system for e-filing of patent applications.”

Additional GovLab recommendations included:

  • PTO [should] make more information available about the scope of patent rights, including expiration dates, or decisions by the agency and/or courts about patent claims.
  • PTO should add more context to its data to make it usable by non-experts – e.g. trademark transaction data and trademark assignment.
  • Provide Application Programming Interfaces (APIs) to enable third parties to build better interfaces for the existing legacy systems. Access to Patent Application Information Retrieval (PAIR) and Patent Trial and Appeal Board (PTAB) data is most important here.
  • Improve access to Cooperative Patent Classification (CPC)/U.S. Patent Classification (USPC) harmonization data; tie this data more closely to economic data to facilitate analysis.

Deputy PTO director Michelle K. Lee, whom President Obama recently nominated to lead the office, told NextGov that her agency “is a numbers-driven organization like no other organization is.” She went on to note that the PTO plans to use data more extensively in the patent examination process.

We hope that’s true, but even that wouldn’t address the root problem: applications aren’t expressed as searchable data to begin with.

Updating outdated systems and fixing built-in mistakes at the PTO must be part of the larger patent reform conversation. As the agency that most encourages American innovators and entrepreneurs to look to the future, the PTO should be leading the effort to transform government information from disconnected documents into open data.