Exploring the Interfaces Between Big Data and Intellectual Property Law

Author: Daniel Gervais
Pages: 3-19
by Daniel Gervais*
© 2019 Daniel Gervais
Everybody may disseminate this article by electronic means and make it available for download under the terms and conditions of the Digital Peer Publishing Licence (DPPL). A copy of the license text may be obtained at http://nbn-resolving.de/urn:nbn:de:0009-dppl-v3-en8.
Recommended citation: Daniel Gervais, Exploring the Interfaces Between Big Data and Intellectual Property Law, 10 (2019) JIPITEC 3 para 1
Keywords: Copyright; patent; data exclusivity; artificial intelligence; big data; trade secret
Abstract: This article reviews the application of several IP rights (copyright, patent, sui generis database right, data exclusivity and trade secret) to Big Data. Beyond the protection of software used to collect and process Big Data corpora, copyright's traditional role is challenged by the relatively unstructured nature of the non-relational (NoSQL) databases typical of Big Data corpora. This also affects the application of the EU sui generis right in databases. Misappropriation (tort-based) or anti-parasitic behaviour protection might apply, where available, to data generated by AI systems that has high but short-lived value. Copyright in material contained in Big Data corpora must also be considered. Exceptions for Text and Data Mining (TDM) are already in place in a number of legal systems, and more are likely to emerge, to allow the creation and use of corpora of literary and artistic works, such as texts and images. In the patent field, AI systems using Big Data corpora of patents and scientific literature can be used to expand patent applications. They can also be used to "guess" and disclose future incremental innovation. These developments pose serious doctrinal and normative challenges to the patent system and the incentives it creates in a number of areas, though data exclusivity regimes can fill certain gaps in patent protection for pharmaceutical and chemical products. Finally, trade secret law, in combination with contracts and technological protection measures, can protect data corpora and sets of correlations and insights generated by AI systems.
A. Introduction
1 The interfaces between "Big Data" (as the term is defined below) and Intellectual Property (IP) matter both because of the impact of IP rights in Big Data, and because IP rights might interfere with the generation, analysis and use of Big Data. This Article looks at both sides of the interface coin, focusing on several IP rights, namely copyright, patent, data exclusivity and trade secret/confidential information.1 The Article does not discuss trade marks in any detail, although the potential role of Artificial Intelligence (AI), using Big Data corpora,2 in designing and selecting trade marks certainly seems a topic worthy of further discussion.3
B. Defining Big Data
2 The term "Big Data" can be defined in a number of ways. A common way to define it is to enumerate its three essential features, a fourth that, though not essential, is increasingly typical, and a fifth that is derived from the other three (or four). Those features are: volume, veracity, velocity, variety, and value.4 "Volume" or size is, as the term Big Data suggests, the first characteristic that distinguishes Big Data from other ("small data") datasets. Because Big Data corpora are often generated automatically, the question of the quality or trustworthiness of the data ("veracity") is crucial. "Velocity" refers to "the speed at which corpora of data are being generated, collected and analyzed".5 The term "variety" denotes the many types of data and data sources from which data can be collected, including Internet browsers, social media sites and apps, cameras, cars, and a host of other data-collection tools.6 Finally, if all previous features are present, a Big Data corpus likely has significant "value".
3 The way in which "Big Data" is generated and used can be separated into two phases.7
4 First, the creation of a Big Data corpus requires processes to collect data from sources such as those mentioned in the previous paragraph. Second, the corpus is analysed, a process that may involve Text and Data Mining (TDM).8 TDM is a process that uses an AI algorithm. It allows the machine to learn from the corpus, hence the term "machine learning" (ML), which the press sometimes uses as a synonym of AI.9 As it analyses a Big Data corpus, the machine learns and gets better at what it does. This process often requires human input to assist the machine in correcting errors or faulty correlations derived from, or decisions based on, the data.10 This processing of corpora of Big Data is done to find correlations and generate predictions or other valuable analytical outcomes. These correlations and

* Dr. Gervais is Professor of Information Law at the University of Amsterdam and the Milton R. Underwood Chair in Law at Vanderbilt University. The author is grateful to Drs. Balázs Bodó, João Quintais, and to Svetlana Yakovleva of the Institute for Information Law (IViR), to participants at the University of Lucerne conference on Big Data and Trade Law (November 2018), to Ole-Andreas Rognstad and other participants at the Data as a Commodity workshop at the University of Oslo (December 2018), and to the anonymous reviewers at JIPITEC for their very useful comments on earlier versions of this Article.
1 The Article considers IP rights applied by all or almost all countries, namely those contained in the Agreement on Trade-Related Aspects of Intellectual Property Rights, Annex 1C of the Agreement Establishing the World Trade Organization, 15 April 1994. As of January 2019, it applied to the 164 members of the WTO, including all EU Member States and the EU itself.
2 This use of the term "corpus" extends its original meaning as either a "body or complete collection of writings or the like; the whole body of literature on any subject", or the "body of written or spoken material upon which a linguistic analysis is based". Oxford English Dictionary Online (accessed 21 December 2018). There is a debate about the proper form of the plural. Both Oxford and Merriam-Webster indicate that "corpora" is the proper form, although the author has encountered the form "corpuses" in the literature discussing Big Data. See e.g., the 2014 White House report to the President from the President's Council of Advisors on Science and Technology titled "Big Data and Privacy: A Technological Perspective", at x. "Corpora" is the form chosen here, although it seems predictable that the perhaps more intuitive form "corpuses" will win this linguistic tug-of-war.
3 For example, AI systems can create correlations between trademark features (look, sound etc.) and their appeal, thus allowing the creation and selection of "better" marks.
4 Jenn Cano, 'The V's of Big Data: Velocity, Volume, Value, Variety, and Veracity', XSNet (11 March 2014), <https://www.xsnet.com/blog/bid/205405/the-v-s-of-big-data-velocity-volume-value-variety-and-veracity> (accessed 10 December 2018).
5 Ibid.
6 The list includes "cars" because cars used as personal vehicles are one of the main sources of (personal) data, generating up to 25 gigabytes per hour of driving. The data are fed back to the manufacturer. See Uwe Rattay, 'Untersuchung an vier Fahrzeugen - Welche Daten erzeugt ein modernes Auto?' [Study of four vehicles: What data does a modern car generate?], ADAC, <https://www.adac.de/infotestrat/technik-und-zubehoer/fahrerassistenzsysteme/daten_im_auto/default.aspx> (accessed 11 December 2018).
7 The two phases are not necessarily sequential. They can and often do proceed in parallel.
8 See Maria Lillà Montagnani, 'Il text and data mining e il diritto d'autore' [Text and data mining and copyright] (2017) 26 AIDA 376.
9 Cassie Kozyrkov, 'Are you using the term "AI" incorrectly?', Hackernoon (26 May 2018), <https://hackernoon.com/are-you-using-the-term-ai-incorrectly-911ac23ab4f5>.
10 How IP will apply to the work involved in the human training function of machine learning is one of the interesting questions at the interface of Big Data and IP. The term "training data" is used in this context to suggest that the machine training is supervised (by humans). See Brian D Ripley, Pattern Recognition and Neural Networks (Cambridge: Cambridge University Press, 1996) 354.