The Value of Open Data for Patent Policy

Guest Post by Christopher A. Cotropia, Professor of Law and Austin Owen Research Fellow, University of Richmond School of Law; Jay P. Kesan, Professor and H. Ross & Helen Workman Research Scholar, University of Illinois College of Law; and David L. Schwartz, Associate Professor and co-Director of the Center for Empirical Studies of Intellectual Property at Chicago-Kent College of Law

Harlan Krumholz, one of the nation’s leading medical researchers, recently wrote an important New York Times Op-Ed piece called Give the Data to the People. Dr. Krumholz praised Johnson & Johnson for making all of its clinical trial data available to scientists around the world. This included not only the conclusions in published articles, but also unpublished raw data. Companies are often reluctant to share raw data because their competitors may benefit. In the medical field, there are also patient privacy concerns. But releasing the raw data permits other researchers to learn from and build upon existing data. It also offers other researchers the ability to replicate and verify the findings of important medical studies. Dr. Krumholz concludes: “For the good of society, this is a breakthrough that should be replicated throughout the research world.”

We believe that Dr. Krumholz’s call for more publicly available data is applicable to empirical legal studies. It is especially critical, in our view, to the study of patent assertion entities (“PAEs,” which some refer to as patent trolls). There is an important public policy debate underway about the role of PAEs within patent law. There have been reports in the press about PAEs, including a high-profile report by the President’s Council of Economic Advisors that relied upon confidential data. In his State of the Union address, President Obama called on Congress to enact patent reform legislation. The main basis for the reform is alleged abuses by “patent trolls.” Unfortunately, much of the raw data about patent litigation is not publicly available.

As academic researchers, we are interested in data about PAEs. We have previously studied and written an article about patent infringement lawsuits filed in 2010 and 2012. Because of the importance of the debate about PAEs, we released the raw data from our study to the public (here) to permit others to evaluate and study. Others have downloaded and commented on our data to us, and we have gone back and verified particular classifications in some instances. We believe more publicly available data is necessary.

Why is more information about PAE litigation not public? After all, the underlying data relates to litigation in the federal courts, and thus does not implicate privacy concerns like in the medical context. However, most of the raw data has been gathered and coded by private companies. For-profit businesses legitimately desire to use the information within their business and to prevent competitors and others from using commercially valuable information. That said, we believe corporate owners should release as much of the raw data (not merely descriptive statistics) as they can. To the extent that the raw data is not released or shared, society should be extremely cautious before relying upon it to make important public policy decisions.

Perhaps more importantly, we call upon academic researchers to gather and release more data. The data should be gathered and released in a form most useful to others. For instance, there is a debate about the definition of a PAE. Some believe that it excludes original owners – for instance, universities and individual inventors – while others believe PAEs include any non-operating company. Ideally, the raw data should be coded and released on a granular level. That way, future researchers can analyze the data relying upon different definitions, rather than only the definition used by the original researcher.
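As a rough illustration of why granular coding helps, the minimal sketch below codes each lawsuit by a detailed plaintiff type and then applies two competing PAE definitions to the same records. The category names, the `Lawsuit` record, and the sample rows are all hypothetical, made up for this example; they are not drawn from our data set or from any particular coding scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lawsuit:
    """One coded lawsuit. Field names are hypothetical."""
    case_id: str
    plaintiff_type: str  # granular category assigned by the coder


# Made-up illustrative records, not real litigation data.
SAMPLE = [
    Lawsuit("A", "operating_company"),
    Lawsuit("B", "university"),
    Lawsuit("C", "individual_inventor"),
    Lawsuit("D", "patent_holding_company"),
    Lawsuit("E", "patent_holding_company"),
]

# Two competing definitions, each expressed as a set of granular categories.
# Narrow: excludes original owners such as universities and inventors.
NARROW_PAE = {"patent_holding_company"}
# Broad: any plaintiff that is not an operating company.
BROAD_PAE = {"patent_holding_company", "university", "individual_inventor"}


def count_pae_suits(lawsuits, definition):
    """Count lawsuits whose plaintiff type falls within the given definition."""
    return sum(1 for suit in lawsuits if suit.plaintiff_type in definition)


print(count_pae_suits(SAMPLE, NARROW_PAE))  # 2
print(count_pae_suits(SAMPLE, BROAD_PAE))   # 4
```

Because the definition is applied after the fact, a later researcher can swap in a different set of categories without recoding the underlying cases. Had the data been released only as a single "PAE or not" flag, that re-analysis would not be possible.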

In sum, we ask that other researchers release more raw data on PAEs, which will permit both robustness checks on the results and future empirical research on the topic. The time for more transparency has come.

12 thoughts on “The Value of Open Data for Patent Policy”

  1. 5

    To listen to the complainers, the problem is about low quality, invalid patents that are being used to shake down companies.

    Now, if the patent arguably reads on the accused device, any lawsuit is not a sham — at least not in the sense it was filed without a basis for infringement. Because the Federal Circuit simply does not recognize indefiniteness, has no problem at all with functional claiming, and is struggling with 101, a study of lawsuits is not likely to reveal problems with Troll patents in this area.

    One would think that there would be a correlation between Troll patents and IPRs. But IPRs are limited in issues that can be addressed — 102, 103, and prior art, patents and printed publications. So if a problem is elsewhere with the patent, an IPR cannot help.

    Researchers need to ask Troll victims why they say that troll patents are of low quality and invalid. Now that would be an interesting study.

    1. 5.1

      “Researchers need to ask Troll victims why they say that troll patents are of low quality and invalid. Now that would be an interesting study.”

      I would assume it’s one part poor CAFCing (unclear/contradictory decisions), one part poor PTOing (examiner turnover and incentives create a huge disparity between skill of the public protector and skill of the applicant’s counsel) and one part the length of protection relative to the agility of some arts is just dramatically out of whack, which I suppose makes it poor Congressing.

  2. 4

    I’m all for more data.

    But the idea that collecting more data will somehow make the patent troll problem go away (“the facts show that there is no problem according to our definition of what a problem is!”) is very silly indeed.

    Just sayin’.

    To review the facts: for years now the PTO has been churning out patents at an unprecedented rate. A great many of those patents are functionally claimed junk in the computer-implemented arts, junk that requires no technical skill whatsoever to claim, unless you count lawyering as a technical skill. Because of the incredible cost to litigate patents, a certain class of patent lawyers has “discovered” that deep-pocketed corporations can be targeted with junk patents and shaken down for licensing fees. Those licensing fees, in turn, are used by the litigants to argue that their claims are not junk (thanks to some ill-conceived judge-crafted garbage sometimes referred to as “secondary factors”) as they threaten other players in the industry. These bottom-feeding lawyer “inventors”, who will never be known for anything other than their bottom-feeding skills, are quite shameless about what they are doing when they are confronted with it. The only information they wish to hide, generally, is their next plan of attack, the identity of PAE shells in which they have interest, and the identity of any entity who is pulling their puppet strings.

    A statistical analysis isn’t really necessary to realize that there is a problem and some people are more responsible for the problem than others. The reasoning and tactics employed by the lobby in charge of promoting the awesomeness of “More Patents For Everyone All The Time” is virtually indistinguishable from that of the gun lobby, except that (for whatever reason) they are far less competent than the gun lobby.

    1. 4.1

      MM, why don’t we start pointing fingers at those who are responsible for giving us Alappat and State Street Bank, who overturned Halliburton, who like functional claiming, and who try to divert attention to so-called litigation abuse in order to pass fee shifting in Congress to effectively wall off the big companies from patents held by so-called trolls?

      May I present to you the


    2. 4.2

      “I’m all for more data.”

      LOL – says the guy who opted for the highest level of secrecy in the Disqus era.


  3. 3

    “For instance, there is a debate about the definition of a PAE.”

    Don’t the people who coined the term get to decide what the definition is?

    If you want a different term with a different meaning, then make up your own term and let everyone know you’re talking about something other than a PAE.

  4. 2

    Sorry I am not current on this issue, but I am confused–exactly what do you mean by “raw data [that] has been gathered and coded by private businesses”? Data about what? Aren’t litigation records public? Are you saying you don’t want to have to pay for PACER??

    1. 2.1

      We shouldn’t have to pay for PACER. This is the third millennium, for Pete’s sake. If everything on the Internet can be accessed from The Internet Archive for free – and that includes cached documents made available from PACER by people participating in the RECAP caching project – then there is absolutely no good reason why PACER can’t be made free as well.

      1. 2.1.1

        Agreed. The costs of PACER could be paid for by a sub-microscopic tax on large corporate litigants (sometimes known as a “filing fee”).

    2. 2.2

      Movies and music are ‘just data.’

      Free the Data ! Free Data for all – um, no wait, the ‘liberal-tards’ did not mean their data…
