Our society’s move to digital has given rise to enormous legal uncertainty. In particular, the ownership of information has never been less clear, even as we’re inundated with ever-increasing volumes of data.
We got into this situation because of:
- the assertion of private rights over public data
- the assertion of private rights over uncopyrightable data (such as a correct physical address)
- the lack of field-level provenance information for public data
- the lack of an effective dispute resolution framework
- the lack of easy-to-use government APIs for access to public data
Where is copyright in all this?
As the Supreme Court reaffirmed in 2012, facilitating the dissemination of creative expression is critical to the country’s ability to innovate, so “creators” need their rights in those works protected. Yet an estimated 17 to 70 percent of creative works are “orphaned” because the rights holder is unknown. This breeds uncertainty: cautious libraries, archives, and museums often keep important works out of public discourse, to the public’s detriment. As the Copyright Office put it, “the judiciary has yet to explicitly address how to apply fair use to orphan works.”
While public institutions must err on the side of caution, private companies operate in something closer to the Wild West: they gather all kinds of data, add value to it, mix it with other data, and create diverse, innovative products and services. As long as these private services power in-house analytics or under-the-hood algorithmic recommendations, they remain virtually invisible. That’s all well and good, except that this invisibility has postponed the real debate over “digital fair use.”
So what are the bounds of fair use for “data that is publicly available on the Internet”? The 1991 Feist decision made it clear that accurate information about the location of a company or person is inherently not copyrightable. Still, nearly every firm asserts rights to such data in its Terms of Use. Firms can often use technology to block machine-to-machine requests (the majority of all Internet traffic as of 2014), even though the rights they assert are unenforceable. That leaves us more or less in a standoff between content creators and those who want to use their data (usually in noncompetitive ways the creators never imagined). Creators battle the bots but stop short of suing the people sending the bots their way.
Everyone seems to understand that the bots are unlikely to stop any time soon. In the current environment, there is valuable information without any copyright protection, government data that is copyright-free and easy to use, and privately owned data that is “publicly accessible.” The broad presumption that “data wants to be free,” loosely grounded in Feist, runs headlong into the fact that “publicly accessible” data is frequently offered under an advertising-supported model. Some services that “use” these data without displaying the associated advertising, notably Google News and Facebook, offer content owners “bartered” benefits (for example, referral traffic) in return for running free current-awareness news services built on data they do not own.
But not every company can drive enough traffic to content owners (and, through them, to their advertisers) to make such an arrangement palatable to the owners, and other types of data may not lend themselves to models with such obvious cost-benefit trade-offs for both parties. This unclear legal landscape and the lack of other workable reuse models mean there is a great deal of uncertainty across the information industry. Strangely enough, this uncertainty has unleashed a massive amount of creativity that could not exist in a more restrictive regulatory environment like the EU’s. Our innovators are finding all kinds of ways to craft data into original and useful new services while they wait for legal direction from the judiciary, much as “sharing economy” business models flourished before slowly becoming subject to new regulations.