Fran Berman, Data and Society, CSCI 4370/6370 Data and Society Lecture 8: Data in the Global Landscape 4/8/16


Data and Society Lecture 8: Data in the Global Landscape

4/8/16


Announcements

• Paper due before class today.

• If you’re interested in your grade so far, come talk to Fran. (Office hours: 1-2 or by appt.)

• Bulent Yener lectures on April 22 about Data and Security!

• It looks like we will have 5-6 slots during the last 2 classes for Data Roundtable “do-overs”

– Ground rules:

• Presentation / review graded as usual

• Student gets the best 2 of 3 Roundtable grades


Today (4/8/16)

• Lecture 8: Data in the Global Landscape

• L6 + L7 Data Roundtable


Section Theme | Date | First “half” | Second “half”
Section 1: The Data Ecosystem -- Fundamentals | January 29 | Class introduction; Digital data in the 21st Century (L1) | Data Roundtable / Fran
Section 1 | February 5 | Data Stewardship and Preservation (L2) | L1 Data Roundtable / 5 students
Section 1 | February 12 | Data-driven Science (L3) | L2 Data Roundtable / 5 students
Section 1 | February 19 | Future infrastructure – Internet of Things (L4) | L3 Data Roundtable / 5 students
Section 1 | February 26 | Section 1 Exam | L4 Data Roundtable / 5 students
Section 2: Data and Innovation – How has data transformed science and society? | March 4 | Paper assignment description | Section 1 Data Roundtable / 5 students
Section 2 | March 11 | Data and Health: Phil Bourne guest lecture (L5) | Section 2 Data Roundtable / 3 students
Section 2 | March 18 | Spring Break / no class |
Section 2 | March 25 | Data and Entertainment (L6) | Section 2 Data Roundtable / 5 students
Section 2 | April 1 | Big Data Applications (L7) | Privacy Panel / 6 students
Section 3: Data and Community – Social infrastructure for a data-driven world | April 8 (we are here) | Data in the Global Landscape (L8); Section 2 paper due | L7 Data Roundtable / 5 students
Section 3 | April 15 | Digital Rights in the U.S. (L9) | L8 Data Roundtable / 5 students
Section 3 | April 22 | Bulent Yener: Review of Privacy, Anonymity, and Cryptocurrency (L10) | Digital Rights Forum / 7 students
Section 3 | April 29 | Digital Governance and Ethics (L11) | L10 Data Roundtable / 5 students
Section 3 | May 6 | Section 3 Exam | L11 Data Roundtable / 5 students


Lecture 8: Data in the Global Landscape


Perspectives on digital data vary globally

• “Social infrastructure” around rights, privacy, and data sharing varies around the world

– Complex interaction of data potential, privacy, policy, and innovation is driving critical national conversations

• Even within scientific communities, national approaches to R&D investment, data sharing, stewardship and preservation, and public-private partnerships vary

• At the same time, there is universal recognition of the importance of digital data as a driver for innovation and progress

– Each nation is finding its own solutions to common, fundamental problems within its own culture


Today’s lecture

• Digital Rights in Europe

• Health Data in Iceland

• International Coopetition in Research

• Research Data Alliance


Digital Rights in Europe


European Union (EU) Digital Agenda

• Overall aim is to deliver sustainable economic and social benefits to Europeans from information and communication technologies.

– Europe perceives itself as lagging behind in terms of use and deployment of IT

• EU launched the Europe 2020 strategy in March 2010. The Digital Agenda for Europe is one of the 7 flagship initiatives of the Europe 2020 Strategy.


EU Data Challenges

• Fragmented digital markets

– 27 countries in EU, much variation between content, services, and infrastructure across borders; unification difficult

• Lack of interoperability

– “weaknesses in standard setting”, difficulty in coordination

• Rising cybercrime and risk of low trust in networks

• Lack of investment in networks

• Insufficient research and innovation efforts

• Lack of digital literacy and skills

• Missed opportunities in addressing societal challenges


EU Digital Agenda Action Areas 1

• Single digital market

– Want to unify telecom, services, rules, and content

– Rights and protection for consumers and businesses when doing business on-line

• Interoperability and Standards

• Trust and security

– “Europeans will not embrace technology they do not trust – the digital age is neither ‘big brother’ nor ‘cyber wild west’.” (Digital Agenda for Europe, COM(2010) 245, 19.05.2010)


EU Digital Agenda Action Areas 2

• Fast and ultra fast internet access

– Universal broadband coverage, open and neutral internet

• Research and Innovation

– Leverage private investment and accelerate innovation

– Increase digital literacy, skills and services

• ICT-enabled benefits for EU society

– ICT-enabled energy, environment, health care, independent living, cultural diversity / arts, e-government, transportation.


Code of EU on-line rights

• Rights and Principles applicable when you access and use online services

– “Universal” access to electronic communication networks and services

– Access to services and applications of your choice

– Non-discrimination when accessing services provided online

– Privacy, protection of personal data and security

• Rights and Principles applicable when you buy goods or services online

– Information prior to the conclusion of a contract

– Timely, clear and complete contractual information

– Fair contract terms & conditions

– Protection against unfair practices

– Delivery of goods and services without defects and in good time

– Withdrawal from a contract

• Rights and Principles protecting you in case of conflict

– Access to justice and dispute resolution


Individual rights

• “The right to data protection and the right to privacy are two distinct human rights recognized in the Charter of Fundamental Rights of the European Union, the Treaty on the Functioning of the EU (TFEU), and in two legal instruments of the Council of Europe, to which all the EU Member States are parties.”

• New rights proposed in 2012:

– Right of portability (can access and transfer data easily from one service provider to another)

– Right to be forgotten (can access, object, correct, erase) – upheld in 2014

– New provisions on profiling

– Requirement that data controllers notify individuals in the event of a security breach in order to avoid identity fraud

– Enhancement of privacy rights of children and their right to personal data protection -- draft regulation prohibits the processing of personal data of a child below 13 without the consent of a parent or guardian.

• Proposal beefs up enforcement powers of Data Protection Authorities.

Information from the Library of Congress: https://www.loc.gov/law/help/online-privacy-law/eu.php


Europe vs. Facebook

• Advocacy group started by an Austrian university student (Max Schrems) that grew into a grass-roots movement of 25,000+ people

• Issue is potential violation of EU data protection law due to personal data collected by Facebook, etc.

• Schrems filed a complaint with Irish Data Protection Commissioner alleging 22 violations of European law


Europe vs. Facebook, 2011/2012

• Schrems claimed that Facebook collected data he never consented to provide: physical location, data he had deleted, etc. Schrems started the “Europe vs. Facebook” movement and 25,000+ other users also requested FB data.

– Legal case has been crowdfunded …

• Irish Data Protection Commissioner (DPC) started investigation

– Complaints filed in Ireland because European users have a contract with “Facebook Ireland Ltd”. Under European law, Facebook Ireland is the “data controller” for facebook.com and therefore facebook.com is governed by European data protection laws.

• Schrems eventually recovered 1,222 pages of material in 57 data categories from FB in 2011

– Schrems claims that Facebook did not provide all data and that Facebook holds at least 84 data categories about every user.

• FB developed a download tool to provide users a quick overview of the data being kept on file. FB also agreed to cut the amount of time it retains data on user activities to less than one year.


Safe Harbor questioned

• Irish DPA dismissed Schrems’s original complaint by saying that FB was protected by the Safe Harbor agreement.

• But … the judge asked the European Court of Justice (ECJ) to examine whether Ireland’s data watchdog is bound by “safe harbor” and whether an investigation should be launched.

• Safe Harbor allows companies to self-certify that they provide “adequate protection” for EU users’ data in order to comply with the European data protection directive

Safe Harbor Principles:

• Notice - Individuals must be informed that their data is being collected and about how it will be used.

• Choice - Individuals must have the option to opt out of the collection and forward transfer of the data to third parties.

• Onward Transfer - Transfers of data to third parties may only occur to other organizations that follow adequate data protection principles.

• Security - Reasonable efforts must be made to prevent loss of collected information.

• Data Integrity - Data must be relevant and reliable for the purpose it was collected for.

• Access - Individuals must be able to access information held about them, and correct or delete it if it is inaccurate.

• Enforcement - There must be effective means of enforcing these rules.


Max Schrems on the eve of the EU court decision on Safe Harbor

https://www.youtube.com/watch?v=OJ1-lY0UEBA

~ 5 minutes


What happened after the interview

• ECJ ruling in 2015 invalidated Safe Harbor. (Replacement agreement being developed for 2016)

• Schrems filed an updated complaint against Facebook with the Irish Data Protection Authority (DPA)

– Complaint claims that EU FB users’ data being pulled into NSA surveillance programs once it has been exported to the U.S. undermines EU data protection rights.

• Schrems also filed complaints with the Belgian DPA and Hamburg DPA. Complaints call for DPAs to suspend all data transfers from FB EU HQ to the U.S. and for an audit of FB as a data importer.

• Irish judgement is valid for 28 countries. Schrems is also bringing suit in other countries to build a class-action appeal against Facebook’s internal privacy policies.

• Latest news: http://europe-v-facebook.org/EN/en.html


Health Data in Iceland


Icelanders

• Iceland has a population of ~326,000 and is the most sparsely populated country in Europe.

• Iceland provides universal health care to its citizens and spends a fair amount on health care, ranking 11th in health care expenditures as a percentage of GDP and 14th in spending per capita.

– Health care system is ranked 15th in performance by the World Health Organization.

• Ethnically homogeneous. Most Icelanders descendants of Germanic and Gaelic (Celtic) settlers.

– 93% Icelandic

– 3.13% Polish

– 3.84% Other

• Iceland has extensive genealogical records dating back to the late 17th century and fragmentary records extending back to the 9th century.

Source: Wikipedia articles on Iceland, Icelanders


Whole Country Health Data

• 1996: deCODE Genetics corporation founded to identify human genes associated with common diseases using population studies and apply the knowledge gained to guide the development of candidate drug treatments.

• Company worked with government on the Health Sector Database Act with the intention of creating the Icelandic Health Sector Database (HSD) to merge genealogical, genetic and health records for the entire population of Iceland

– Opt-out model of presumed consent

– Services and infrastructure developed as well for mining data in HSD

• deCODE collected full DNA sequences on 10,000 individuals.

– Because the Icelandic population is so homogeneous, deCODE says it can extrapolate (“impute”) to accurately guess the DNA makeup of the other 330K citizens, including those who never participated in the studies.


Scientific results; problems with informed consent

• deCODE never built the controversial DB but pursued traditional genome-wide association studies to try to identify genetic changes contributing to common diseases

• deCODE data used for discoveries about genes that increase risk for kidney disease, cancer, lupus, vascular disease, schizophrenia, osteoporosis, etc.

– One result identified a gene that protects against Alzheimer’s

– DeCODE has identified mutations in BRCA2 that convey sharply increased risk of breast and ovarian cancers.

• Problems with informed consent

– DeCODE’s data could identify 2K people with the gene mutation but there are legal and ethical issues that prevent DeCODE from informing people who are at risk

• Inferences go “beyond informed consent”

Information from http://www.els.net/WileyCDA/ElsArticle/refId-a0005180.html


Controversy and Transition

• Legal judgement from the Icelandic Supreme Court effectively killed off the HSD project in 2003

– Court case focused on the legal rights of a deceased Icelander, including the right not to participate. Legal issues included the legal standing and personal rights of the deceased individual and identifiability due to the richness of the data

– Part of the problem was the original Health Sector Database Act, which did not provide information and guidelines on how the DB should be set up, who should run it, who should have access to the data, and what control Icelandic citizens should have over samples.

– Company believed it could continue to identify disease-related genes without the database

• Commercial Failure

– Studies led to development of DB and scientific results but company was a commercial failure and went bankrupt in 2009.

– Continued as a private company and was bought by Amgen in 2012. No compensation given to Icelanders.

• Services and assets of deCODE went through many transitions:

– deCODE founded in 1996; filed for bankruptcy in 2009

– Saga Investments LLC purchased deCODE services and assets in 2010

– Amgen purchased deCODE in 2012, spun off NextCODE Health in 2013

– NextCODE acquired by WuXi PharmaTech in 2015


Iceland / public China / private

• Website Description: “NextCODE was founded in 2013 as a spinout of Iceland’s deCODE genetics, to apply the unique population-scale genomic big data solutions developed there to realize precision medicine around the globe. WuXi NextCODE was created in 2015 through the acquisition of NextCODE by WuXi AppTec, the leading China- and US-based open-access R&D platform.

• “The result is the only end-to-end global solution in the industry, bringing together NextCODE’s unrivalled bioinformatics and human genetics with the Shanghai-based WuXi Genome Center’s CAP/CLIA-sequencing and WuXi AppTec's R&D capabilities. Through partnerships and our own products we are leading the application of the genome to benefit patients and improve health worldwide.”

https://www.nextcode.com/


International “Coopetition” in Research

• Research and Innovation seen as key drivers for international leadership, social advancement, and economic health.

• Countries strongly influenced by “best practices” and trends in other parts of the world.

• Competition and collaboration: Scientific communities span borders and contribute to global cross-fertilization, coordination and synergy


OECD Comparisons: Human and Financial Resources devoted to R&D http://www.oecd.org/innovation/inno/researchanddevelopmentstatisticsrds.htm

• OECD: Organization for Economic Cooperation and Development

– Provides a forum for sharing best practices and metrics and measures of global standing

BRICS: Brazil, Russia, India, China, South Africa


Researchers per 1000 professionals

• U.S. greater than OECD average

• U.S.: 8.74

• Countries with average > U.S. in 2012:

– Finland, Iceland, Sweden, Japan, Denmark, Canada, Singapore, Slovenia, Korea, Belgium, etc.

http://www.oecd-ilibrary.org/industry-and-services/researchers/indicator/english_20ddfb0f-en?isPartOf=/content/indicatorgroup/09614029-en


Research as measured by publication citation

[Figure: Top 10% most cited papers, graph from Nao Tsunematsu. Labeled series: US, DE, KR, FR, CN, JP, UK]


Open Science data sharing drives greater innovation

• International community adopting various strategies to promote open science and create a driving infrastructure of open data.

• Strategies include:

– Digital data storage infrastructure (repositories and archives, libraries in research centers and governments)

– Open data (digital format for research outputs, open government)

– Open access (open licenses for datasets and libraries, publication in open access journals or open resources)

– Greater focus on collaboration


Global focus on infrastructure, services and data sharing policies to optimize data-driven innovation


International efforts – the Research Data Alliance

RDA Plenary 3

Dublin, Ireland


Slide courtesy of Kim Fortun, RDA Digital Practices in History and Ethnography Interest Group

Are you more likely to get asthma if you live in Mexico City or Los Angeles?


Making the data available is not enough

• Infrastructure needed to make data useful

– Data is not an asset if you don’t know what it means.

– Data is not useful if you can’t find it.

– Data needs to be in the right form for analysis.

– Data needs to be preserved for results to be reproducible.


What kind of infrastructure do we need?

Example questions:

• Who is at risk for asthma?

• How do we increase agricultural productivity?

• How accurate is the Standard Model of Physics?

• What will happen in an earthquake?

Infrastructure components:

• Data Sharing Policy

• Interoperability Frameworks

• Data Discovery Tools

• Common Metadata Standards

• Digital Object Identifiers

• Sustainable Economics

• Data Analytics Algorithms

• Domain and Institutional Repositories

• Data Access and Distribution Policy

• Data Citation Standards

• Curation Practice and Policy

• Auditing, Certification and Reporting Practice


Accelerating the development of infrastructure worldwide – the Research Data Alliance

Research Data Alliance (RDA), rd-alliance.org: Global community-driven organization whose mission is to build the social and technical bridges (infrastructure) that enable data sharing.

Launched: March, 2013

Membership: 3700+ from 110 countries, all sectors, and a broad spectrum of domains

Representation: 2/3 academic sector, 1/3 public, private sectors; ~1200+ participants in the U.S.


RDA Approach: Solve Problems, Make Progress

RDA Members come together as

• Working Groups (WG) – 12-18 month efforts to build, adopt, and use specific pieces of infrastructure

• Interest Groups (IG) – longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified.

RDA culture focuses on the pragmatic:

• No “build it and they will come” -- Working Groups must incorporate adopters

• Avoid universal “esperanto” infrastructure -- Infrastructure must solve someone’s problem but not necessarily everyone’s problems

• Promote technology-neutrality -- RDA not a platform for specific infrastructure promotion or endorsement

• No “world domination” – partner with other organizations to achieve mutual goals

• Amplify infrastructure impact when possible


Focus of RDA Working and Interest Groups: What kind of infrastructure is needed to solve problems?

• Federated identity management

• Domain repositories

• Preservation e-infrastructure

• Repository platforms for research data

• Big data analytics

• Data Fabric

• Geospatial data

• Biodiversity data integration

• Digital practices in History and Ethnography

• Marine Data harmonization

• RDA/CODATA Materials data, infrastructure and interoperability

• Structural biology

• Community Capability Model

• Development of Cloud computing in the Developing World

• Long tail of research data

• Quality of urban life

• Libraries for Research Data

• Research data needs of Photon and Neutron Science communities

• Archiving multimedia interactive / dynamic data and projects

• RDA/CODATA legal interoperability

• RDA/WDS Publishing Data Cost Recovery for Centers

• Research Data Provenance

• Certification of digital repositories

[Figure: groups clustered by beneficiary (data provider vs. data consumer) and by solution type (technical vs. social)]

Solution | Beneficiary | Focus
Technical | Data provider | “repository, fabric, data dissemination, data publication, analytics, infrastructure, data management”
Technical | Data consumer | “interoperability, harmonization, integration, metadata, knowledge organization”
Social/organizational | Data consumer | “data literacy, education, bridging, community, research practices, values/ethics”
Social/organizational | Data provider | “governance, certification, metrics/evaluation, cost recovery, citation, legal”

TAB Clustering slides adapted from Beth Plale


Focus of RDA Working Groups: Build infrastructure and support its adoption by communities of use

• Data type registries

• PID information types

• Practical policy

• RDA/WDS Publishing Data workflows

• Data Citation

• Biosharing Registry

• Metadata standards catalogue

• Data description registry interoperability

• Wheat data interoperability

• RDA/WDS Publishing Data Services

• RDA/CODATA summer schools in data science and cloud computing in the developing world

• RDA/WDS Publishing Data Biometrics

• Metadata standards directory

• Dynamic Data Citation

• Data foundation and terminology

• RDA/WDS publishing data cost recovery for data centers

• Brokering Governance

• Repository Audit and Certification DSA-WDS Partnership

• Standardization of Data Categories and codes

[Figure: working groups clustered by beneficiary (data provider vs. data consumer) and by solution type (technical vs. social)]

TAB Clustering slides adapted from Beth Plale


Selected RDA Working Group Recommendations / outputs

• Dynamic Data Citation Working Group (data consumer, social solution)

– Outputs: Dynamic-data citation methodology that supports efficient processing of data and linking from publications

– Impact: Researchers can reference precise subsets of changing data

– Adopters: NERC, ESIP, CLARIN, Virtual Atomic and Molecular Data Centre

• Data Type Registries Working Group (data provider, technical solution)

– Outputs: Data type model and prototype registry

– Impact: Provides machine-readable and researcher-accessible registries of data types that support the accurate use of data

– Adopters: CNRI, International DOI Foundation, Materials Genome Initiative, Deep Carbon Observatory, EUDAT

• Wheat Data Interoperability Working Group (data consumer, technical solution)

– Outputs: Common framework for Wheat Data Terminology to enable interoperability between distinct data collections

– Impact: Semantically linked terms describing wheat data so researchers can share harvest and related information between data sets and communities

– Adopters: Wheat Initiative Information System, FAO AIMS, INRA
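
To make "reference precise subsets of changing data" concrete, here is a minimal sketch of one way it can work: keep every version of each record, and have a citation store the query together with the time it was run. This is an illustration under stated assumptions, not the Working Group's own implementation; the names Version, Citation, VersionedStore and resolve, and the sample site readings, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    key: str
    value: float
    valid_from: int                 # logical time at which this version was written
    valid_to: Optional[int] = None  # logical time at which it was superseded, if ever


@dataclass(frozen=True)
class Citation:
    predicate: Callable[[str, float], bool]  # the query that selected the subset
    as_of: int                               # logical time at which the query was run


class VersionedStore:
    """Hypothetical store that never overwrites: updates close the old version."""

    def __init__(self) -> None:
        self._rows: List[Version] = []
        self.clock = 0  # simple logical clock standing in for a real timestamp

    def put(self, key: str, value: float) -> int:
        self.clock += 1
        # Mark the currently valid version of this key (if any) as superseded.
        for i, row in enumerate(self._rows):
            if row.key == key and row.valid_to is None:
                self._rows[i] = Version(row.key, row.value, row.valid_from, self.clock)
        self._rows.append(Version(key, value, self.clock))
        return self.clock

    def query(self, predicate: Callable[[str, float], bool], as_of: int) -> List[Tuple[str, float]]:
        # Return the rows that were valid at time `as_of` and match the predicate.
        return sorted(
            (row.key, row.value)
            for row in self._rows
            if row.valid_from <= as_of
            and (row.valid_to is None or as_of < row.valid_to)
            and predicate(row.key, row.value)
        )


def resolve(store: VersionedStore, citation: Citation) -> List[Tuple[str, float]]:
    # Re-running the stored query at the stored time reproduces the cited subset.
    return store.query(citation.predicate, citation.as_of)


if __name__ == "__main__":
    store = VersionedStore()
    store.put("site-a", 3.1)
    cited_at = store.put("site-b", 4.0)
    citation = Citation(predicate=lambda key, value: value > 3.0, as_of=cited_at)

    # The data keeps changing after the citation is made.
    store.put("site-a", 2.5)
    store.put("site-c", 9.9)

    print("cited subset:", resolve(store, citation))
    print("current data:", store.query(citation.predicate, store.clock))
```

The design point is that the citation names a query and a time rather than a copy of the data, so the cited subset stays recoverable even after records are updated or added.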

rd-alliance.org


RDA Wheat Data Interoperability WG

• Focus: Agricultural productivity to feed the planet is a major societal challenge. What data interoperability can be developed to help address agricultural productivity challenges?

• Solution approach: Make critical agricultural data sets interoperable by agreeing on a common set of

– Metadata standards

– Data formats

– Vocabularies

– Guidelines for distributing, representing, and linking data
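
As a rough illustration of the solution approach above, the sketch below shows how two data sets with different field names and units can be combined once both are mapped to a shared vocabulary. Everything in it is an assumption made for the example: the lab names, the field names, the shared terms (grain_yield_t_ha, plant_height_cm, cultivar) and the harmonize function are hypothetical, and are not taken from the Working Group's actual vocabularies or guidelines.

```python
from typing import Any, Callable, Dict, Tuple

Converter = Callable[[Any], Any]


def identity(value: Any) -> Any:
    return value


# Hypothetical mappings: local field name -> (shared term, unit converter).
MAPPINGS: Dict[str, Dict[str, Tuple[str, Converter]]] = {
    "lab_a": {
        "yield_t_ha": ("grain_yield_t_ha", identity),
        "height_cm": ("plant_height_cm", identity),
        "variety": ("cultivar", identity),
    },
    "lab_b": {
        "GY_kg_ha": ("grain_yield_t_ha", lambda v: v / 1000),  # kg/ha -> t/ha
        "PH_m": ("plant_height_cm", lambda v: v * 100),        # m -> cm
        "cultivar_name": ("cultivar", identity),
    },
}


def harmonize(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite one record using the shared terms and units."""
    mapping = MAPPINGS[source]
    harmonized: Dict[str, Any] = {}
    for field, value in record.items():
        if field not in mapping:
            # Fail loudly so gaps in the agreed vocabulary are visible.
            raise ValueError(f"{source}: no shared term agreed for field {field!r}")
        term, convert = mapping[field]
        harmonized[term] = convert(value)
    return harmonized


if __name__ == "__main__":
    a = harmonize("lab_a", {"yield_t_ha": 6.5, "height_cm": 75.0, "variety": "ExampleCultivar"})
    b = harmonize("lab_b", {"GY_kg_ha": 6500, "PH_m": 0.75, "cultivar_name": "ExampleCultivar"})
    print(a)
    print(b)
```

Once both records use the same terms and units they can be compared or merged directly, which is the practical payoff of agreeing on vocabularies and formats up front.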

What they’re doing:

• WG building an interactive “cookbook” with recommendations and guidelines on data format and standards

• Developing common wheat-related vocabularies and including them in a human- and machine-readable bio-portal

• Building a prototype interoperability framework for specific use cases.

Esther Dzalé Yeumo, France

Richard Fulss, Mexico


WG enabling more effective agricultural research

Portal image from the RDA Outputs booklet, 2015 https://rd-alliance.org/rda-outputs.html

Adoption and next steps:

• Framework will be incorporated into the Wheat Information System of the Global Wheat Initiative, Coherence in Information for Agricultural Research for Development (CIARD), etc.

• Subsequent work: Framework will be adapted to other crops such as rice and maize.


Community Building: RDA Plenaries

Co-located meetings with RDA Plenaries:

• Data Citation Summit (Force 11, DataCite, etc.)

• BioCADDIE

• PASIG

• World Bank Workshop

• Climate Data Challenge

• Joint Data Preservation Workshop

• Data Seal of Approval Conference

• Earthcube Hackathon

• 3rd EUDAT Conference, etc.

RDA Plenary 1: Gothenburg, Sweden

RDA Plenary 2: Washington, DC

RDA Plenary 3: Dublin, Ireland

RDA Plenary 4: Amsterdam, The Netherlands

RDA Plenary 5: San Diego, California

RDA Plenary 6: Paris, France

RDA Plenary 7: Tokyo, Japan


RDA/US: Enriching the data community within the U.S.

• RDA/US: All U.S. members of RDA

• RDA/US Membership: Currently 1200+ members from 45+ states

• RDA/US Mission: Build RDA community in the U.S. and leverage RDA momentum to advance the U.S. data community

Top 10: California, Washington DC, New York, Maryland, Virginia, Illinois, Indiana, Massachusetts, North Carolina, Texas


RDA/US 2016 Programs and Initiatives

Student/Early Career program (NSF, Sloan)

Targeted Outreach Workshops with data-enabled communities and organizations (NSF)

Joint Partnership Agreements between RDA/US and U.S.-based organizations to co-sponsor activities and events that build the RDA community (CENDI, NDS, ICPSR)

Adoption Amplification seed projects for RDA deliverables (NSF, MacArthur)

Partial support for the RDA-NISO Working Group on Privacy Implications of Research Data Sets (Sloan)

Emerging Initiatives: RDA/US Archivist, Publications, RDA Testbed

Hosting and U.S. participant support for WG Coordination Meetings (NIST)

Development of RDA/US Website and Communications (NSF)

Planning for U.S.-hosted Plenaries (NSF and sponsors)

International Participant Support for U.S. RDA leadership for non-U.S. Plenaries (NSF)


Lecture 8 Sources (not already on slides)

• Europe vs. Facebook in TechCrunch http://techcrunch.com/2015/12/03/schrems-steps-up-mass-surveillance-fight-against-facebook/

• EU Privacy Laws, https://www.loc.gov/law/help/eu-data-retention-directive/eu.php

• Research Data Alliance, http://rd-alliance.org

• “Genome study predicts DNA of the whole of Iceland,” MIT Technology Review, http://www.technologyreview.com/news/536096/genome-study-predicts-dna-of-the-whole-of-iceland/

• “NextCODE Health Mines deCODE’s Data, and More, to Catalyze Clinical Diagnosis”, PLOS Biology Blog, blogs.plos.org/dnascience/2013/11/14/nextcode-health-mines-decodes-data-and-more-to-catalyze-clinical-diagnosis

• “An analysis of the Icelandic Supreme Court judgement on the Health Sector Database Act”, http://www2.law.ed.ac.uk/ahrc/script-ed/issue2/iceland.asp

• “Facebook data privacy case to be heard before European Court,” The Guardian, http://www.theguardian.com/technology/2015/mar/24/facebook-data-privacy-european-union-court-maximillian-schrems

• Europe versus Facebook, http://www.europe-v-facebook.org/EN/en.html/


April 22 Forum: Where do/could the Candidates Stand? Questions/topics from the Electronic Frontier Foundation

https://www.eff.org/deeplinks/2016/02/putting-digital-rights-spotlight-2016

• EFF Questions/Topics for the Presidential Candidates (Clinton, Cruz, Kasich, Sanders, Trump)

– Each student takes a topic; talks for 7 minutes and answers questions for 5 minutes

– In your talk,

• Describe the topic and the issues in some detail

• Describe the views of the candidates based on whatever information you can obtain or infer.

– Each student also writes a 3-4 page review on the area and the candidates’ views and turns it in on April 22

• Topics (continued on next slide) – each student do only one

– Surveillance (Ethan B.)

• What are your views regarding further restrictions on domestic surveillance to finish the job that Congress started with the USA Freedom Act?

• What are your views regarding the authority for domestic NSA surveillance under executive order 12333, which was announced in 1981 to authorize foreign intelligence collection but revealed in 2013 by a State Department whistleblower to have been frequently cited as a legal basis for secret domestic surveillance?

– Privacy (Courtney T.)

• Where do you stand regarding efforts by intelligence agencies to undermine encryption tools, such as the FBI's demands of Apple?

• What are your views regarding efforts to update the Electronic Communications Privacy Act to reflect recent reforms already adopted in states including California?


April 22 Forum: Where do/could the Candidates Stand? Questions/topics from the Electronic Frontier Foundation

https://www.eff.org/deeplinks/2016/02/putting-digital-rights-spotlight-2016

• The Trans-Pacific Partnership (TPP) (Evan F.)

– Do you support the proposed TPP agreement? Why or why not?

– The TPP text was finally posted online after more than five years of negotiations. What are your views regarding whether to make trade agreements more transparent and democratic?

• Open access (Aima M.)

– Through various departmental grant programs, our federal government funds billions of dollars’ worth of grants intended to benefit the public. Do you think that federally funded research, educational materials, and cultural works should be made freely available to the public? Why or why not?

– What are your views regarding federal mandates requiring open licensing for federally funded content?

• Copyright (Theo B.)

– Looking forward, one of the most crucial digital freedom issues is: who will control the hardware in your home, in your pocket, and in your own body. Will you work to protect consumers' right to circumvent access controls on products they own and otherwise defend our freedom to tinker, repair, re-use and modify our stuff?

– Section 1201 of the Digital Millennium Copyright Act forbids users from breaking DRM (digital rights management) on works subject to copyright, even if the purpose is a clearly lawful fair use. What are your views regarding reforms to address this issue?

• Patent reform (Aesa K.)

– Would you endorse a comprehensive patent litigation reform bill to protect innovators from patent trolls?

– Would you endorse a venue reform bill, making it more difficult for parties in patent suits to shop for favorable forums?

• Transparency

– What are your views regarding whistleblowers who risk their careers to expose secret information important to the public interest?

– Under your administration, will there be consequences for intelligence officials who mislead Congress in response to direct questions at oversight hearings, or for agencies that misuse technology to cover up crimes, like when the CIA hacked into Congressional files to steal evidence of international human rights abuses?


April 15: L8 Data Roundtable

• “France violating free speech rights”, USA Today, http://www.news-press.com/story/opinion/2016/03/30/france-violating-free-speech-rights/82425946/ (Jessica J.)

• “Denmark Ranks as Happiest Country; Burundi, Not So Much”, New York Times, http://www.nytimes.com/2016/03/17/world/europe/denmark-world-happiness-report.html?smprod=nytcore-iphone&smid=nytcore-iphone-share&_r=0 (Courtney Y.)

• “The Crypto Wars are Global”, Motherboard, http://motherboard.vice.com/read/the-crypto-wars-are-global (Sri I.)

• “This student put 50 million stolen research articles online. And they’re free,” Washington Post, https://www.washingtonpost.com/local/this-student-put-50-million-stolen-research-articles-online-and-theyre-free/2016/03/30/7714ffb4-eaf7-11e5-b0fd-073d5930a7b7_story.html (TK W.)

• “China Rejects Worry Over Domain Rules,” ZDNet, http://www.zdnet.com/article/china-rejects-worry-over-domain-rules/ (Kienan K-B)


Data Roundtable


Data Roundtable Today

• “The Shazam Effect”, The Atlantic, http://www.theatlantic.com/magazine/archive/2014/12/the-shazam-effect/382237/?single_page=true (Arun V.)

• “Six Provocations for Big Data”, Oxford Internet Institute Network Conference, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1926431 (Kit H.)

• “Data lake governance: A big data do or die”, TechTarget, http://searchcio.techtarget.com/feature/Data-lake-governance-A-big-data-do-or-die (Aima M.)

• “Water Data is Broken. Fix it”, NY Times Op Ed, http://www.nytimes.com/2016/03/17/opinion/the-water-data-drought.html?_r=0 (Wissal L.)

• “Data Wars Target Omni-present American Voter”, Deutsche Welle, http://www.dw.com/en/data-wars-target-omni-present-american-voter/a-19026212 (Chris P.)