Tensions between Copyright and Knowledge Discovery

Susan Reilly

23-24 March 2015

Library Science Talks, Geneva & Bern

Text & Data Mining is the future

“Text and data mining (TDM) is the process of deriving information from machine-read material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns.” JISC
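The three steps in this definition (copy, extract, recombine) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any real TDM service: the corpus is a hypothetical in-memory list of invented sentences standing in for copied material, extraction is naive word tokenisation with a tiny stopword list, and recombination is counting which terms appear together in the same document across the whole collection.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical corpus of invented sentences: stands in for the
# "copying large quantities of material" step.
CORPUS = [
    "Gene mutation linked to cancer risk.",
    "Cancer risk rises with gene mutation.",
    "Malaria outbreaks follow rainfall.",
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"to", "with", "the", "of", "a", "and", "is", "in"}


def extract_terms(text: str) -> set[str]:
    """Extraction step: lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cooccurrence(corpus: list[str]) -> Counter:
    """Recombination step: count term pairs that appear in the same document."""
    pairs: Counter = Counter()
    for doc in corpus:
        terms = sorted(extract_terms(doc))  # sorted so each pair has one key order
        pairs.update(combinations(terms, 2))
    return pairs


if __name__ == "__main__":
    for pair, n in cooccurrence(CORPUS).most_common(5):
        print(pair, n)
```

Pairs that recur across independent documents are the kind of "pattern" the definition refers to; real pipelines add steps such as format conversion, entity recognition and normalisation, which this sketch omits.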

Why do we call this Knowledge Discovery?

• Ultimate goal is to extract high-level knowledge from low-level data

• Allows analysis across disciplines

• “Undiscovered public knowledge” (Swanson)

• Identifies patterns in the data to produce new knowledge

• It’s not a new thing; digital information just makes it a whole lot more powerful and relevant!

Alternative to literature review

• Over 50 million articles online

• 1.5 million articles published annually

• Advanced discovery and visualisation

• A more efficient way to discover what is already out there

Malhotra A, Younesi E, Gurulingappa H, Hofmann-Apitius M (2013) ‘HypothesisFinder:’ A Strategy for the Detection of Speculative Statements in Scientific Text. PLoS Comput Biol 9(7): e1003117. doi:10.1371/journal.pcbi.1003117

“TDM saves lives”

http://arxiv.org/abs/1407.7094

• A tool in the armoury of every biologist and biotechnician

• Discover new treatments for diseases, e.g. fish oil for Raynaud’s Syndrome

• Controlling malaria outbreaks

• Links between gene mutations and cancers
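The fish oil and Raynaud’s example is the classic case of Swanson’s “undiscovered public knowledge”: one body of literature links A to B, another links B to C, but no publication connects A with C directly. The Python sketch below illustrates this “ABC” pattern under loud assumptions: the records are invented concept sets (not real articles or citations), and real literature-based discovery systems work on concepts extracted and normalised from very large numbers of abstracts.

```python
# Hypothetical toy "literature": each record is the set of concepts it mentions.
# These records are invented for illustration; they are not real citations.
LITERATURE = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"raynaud's syndrome", "blood viscosity"},
    {"raynaud's syndrome", "platelet aggregation"},
    {"raynaud's syndrome", "cold exposure"},
]


def linked(term: str, records: list[set[str]]) -> set[str]:
    """Concepts that co-occur with `term` in at least one record."""
    out: set[str] = set()
    for rec in records:
        if term in rec:
            out |= rec - {term}
    return out


def abc_bridges(a: str, c: str, records: list[set[str]]) -> set[str]:
    """Return B-concepts linking A and C, but only when A and C never
    appear together (i.e. the A-C connection is still 'undiscovered')."""
    if any(a in rec and c in rec for rec in records):
        return set()  # A-C link is already stated directly in the literature
    return linked(a, records) & linked(c, records)


if __name__ == "__main__":
    print(abc_bridges("fish oil", "raynaud's syndrome", LITERATURE))
```

The bridging concepts are a candidate hypothesis for a human expert to test, not a finding in themselves.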

Cultural Insight

http://arxiv.org/abs/1407.7094

Is this reproducible? Where is the data?

Increased Transparency: anyone can use TDM tools!

E.g. sentiment analysis of the IFLA open letter to the European Union
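The slide does not say how the sentiment analysis was carried out. As one hedged illustration of how such a tool can work, here is a minimal lexicon-based scorer in Python: the word list and its scores are hypothetical, the example sentence is invented rather than quoted from the IFLA letter, and real tools use much larger lexicons or trained models and handle negation, which this sketch does not.

```python
import re

# Hypothetical mini-lexicon; real sentiment tools use far larger scored word lists.
LEXICON = {
    "support": 1, "welcome": 1, "clarity": 1, "benefit": 1,
    "barrier": -1, "restrict": -1, "uncertainty": -1, "fail": -1,
}


def sentiment(text: str) -> int:
    """Sum lexicon scores of each word; >0 leans positive, <0 leans negative."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def label(score: int) -> str:
    """Map a numeric score to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Invented example sentence, not a quotation from the IFLA letter.
    s = "Licensing is a barrier that creates uncertainty and will fail researchers."
    print(sentiment(s), label(sentiment(s)))
```

A plain word-score lookup like this keeps every step inspectable, so anyone can rerun it on the same text and check the result.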

Economics & Competitiveness (Europe)

• TDM is potentially worth 5.3 billion euro a year to the European research budget (2%)

• The knock-on effect would be an increase of at least 32.5 billion euro in GDP

• The US is responsible for over half the articles and patents on TDM: 1100 US patents compared to 39 in the EU by 2013

• Non-English-speaking countries are falling behind

Copyright v TDM

• Because TDM involves copying content in order to convert it into a machine-readable format, it may infringe copyright

• The European Database Directive prohibits copying of substantial parts of databases

• In the US, TDM is covered by fair use; other parts of the world have a specific exception, e.g. Japan and the UK

https://www.flickr.com/photos/apelad/304195427/

The debate in Europe

• Licences for Europe, Feb 2013

– “The Commission's objective is to promote the efficient use of text and data mining (TDM) for scientific research purposes. …… The Group should explore solutions such as standard licensing models as well as technology platforms to facilitate TDM access.”

• No discussion of copyright, e.g. does TDM infringe copyright law?

• Engaging the wrong stakeholders

• An attempt to systematise a problem, not a solution

The problem with licences

• Permission culture: why relicense? You can’t license everything!

• Not scalable or cost-effective

• Will the licence reflect how the researcher actually performs TDM?

"ME 442 Permission" by Nina Paley - http://mimiandeunice.com/2011/08/30/permission-2/. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:ME_442_Permission.png#mediaviewer/File:ME_442_Permission.png

Elsevier TDM Policy

• Access through API only

• Text only: no images or tables

• Researchers must register their details

• Click-through licence

• Terms can change at any time

• Reproducibility of results

The debate in Europe continued…

• Copyright consultation in March 2014

• Commission to present a proposal for reform in September

• JURI rapporteur Julia Reda's draft report on the InfoSoc Directive to be voted on in May

The Perfect Swell: ideal conditions for growth of TDM in Europe

• Stakeholder workshop (60 attendees)

• Views from industry, researchers, infrastructure, OA publishers and legal experts

• Main findings:

– Licensing not scalable

– Need to address lack of legal clarity (does TDM infringe copyright?)

– Need for harmonisation of copyright law

– Lack of awareness amongst researchers

– Publisher infrastructure not threatened by TDM

http://blogs.plos.org/opens/2014/03/09/best-practice-enabling-content-mining/

So, what do we want?

• Legal clarity

– A specific exception in EU law to allow TDM

– A reinterpretation of EU law

• Legal interoperability

– A solution at WIPO

• Open licences

– CC BY and CC0

What do we not want?

• Licences for subscriptions which explicitly forbid machine crawling

• A licence with every single publisher for every single research project

• Publishers placing conditions on how TDM results are disseminated

• Click-through licences

• “Open access” licences that are NOT interoperable (STM model licences)

Spreading the Message

• Global and multistakeholder

• Take a holistic approach

• Articulation of the value of TDM

• Case studies

• Practitioners

• Common vision

• Actions

Key Principles

• Common vision

• Copyright is not intended to govern access to facts, ideas and data, nor should it

• Need to move beyond the tipping point of open access

• Protect academic freedom

• Actions

Thank you! Any questions?

@skreilly

www.libereurope.eu