ARL Fall Forum: Facilitating New Forms of Discovery, 11 October 2013, 11:15 a.m.-12:30 p.m. Larry Lannom, Director of Information Services and Vice President, Corporation for National Research Initiatives. Research Data Alliance.


ARL Fall Forum

Facilitating New Forms of Discovery 11 October 2013

11:15 a.m.-12:30 p.m.

Larry Lannom Director of Information Services and Vice President

Corporation for National Research Initiatives

Research Data Alliance

DAITF: Enabling Technologies

21 March 2012

Larry Lannom
Corporation for National Research Initiatives

http://www.cnri.reston.va.us/
http://www.handle.net/

Enabling Technologies

[Figure labels: Scientists, Data Curators, End Users, Applications; Datasets, each marked with an ID.]

Enabling Technologies

[Figure labels: Scientists, Data Curators, End Users, Applications; Datasets Accessed via Repositories, each marked with an ID.]

Enabling Technologies

[Figure labels: Scientists, Data Curators, End Users, Applications; Discovery; Enabling Technologies; Datasets Accessed via Repositories, each marked with an ID.]

Discovery & Evaluation

•  Search
    –  Metadata registries
        •  Subject
        •  Parties
        •  Dates
        •  Etc
    –  Crawlers – more ad hoc

•  Citation
    –  Formats

•  Permissions
    –  Can I see it?
    –  Can I use it?

•  Trust
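The search facets on this slide (subject, parties, dates) can be illustrated with a small in-memory metadata registry. This is only a sketch: the `MetadataRecord` fields, the `example/...` identifiers, and the `search` function are hypothetical and do not describe any real registry.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical registry entry describing one dataset."""
    identifier: str       # persistent ID of the dataset
    subjects: frozenset   # subject keywords
    parties: frozenset    # creators, curators, owners
    published: date


def search(records, subject=None, party=None, since=None):
    """Return identifiers of records matching every facet that was supplied."""
    hits = []
    for rec in records:
        if subject is not None and subject not in rec.subjects:
            continue
        if party is not None and party not in rec.parties:
            continue
        if since is not None and rec.published < since:
            continue
        hits.append(rec.identifier)
    return hits


# Made-up sample records, for illustration only.
REGISTRY = [
    MetadataRecord("example/0001", frozenset({"ecology"}), frozenset({"Lab A"}), date(2011, 5, 1)),
    MetadataRecord("example/0002", frozenset({"ecology", "soil"}), frozenset({"Lab B"}), date(2012, 2, 9)),
    MetadataRecord("example/0003", frozenset({"astronomy"}), frozenset({"Lab A"}), date(2012, 3, 1)),
]

ecology_since_2012 = search(REGISTRY, subject="ecology", since=date(2012, 1, 1))
```

A subject search like this returns identifiers only; whether the caller can see or use the data is a separate permissions question.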


Enabling Technologies

[Figure labels: Scientists, Data Curators, End Users, Applications; Discovery; Access; Enabling Technologies; Datasets Accessed via Repositories, each marked with an ID.]

Access

•  ID / reference resolution
    –  Go from ‘subject search’ to ‘known item’ search

•  Access Protocols
    –  How to get it
    –  Protocol registries
    –  Bootstrapping into new protocols

•  Authentication & Authorization
    –  Proof of identity (tradeoff: usability vs security)
    –  Permissions: with the object or in some external system?
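The three access concerns on this slide can be sketched as one lookup: resolve the identifier, check that the client speaks the protocol, then check permissions. Everything here is an assumption made for illustration: the `RESOLVER`, `READERS`, and `SUPPORTED_PROTOCOLS` tables, the identifiers, and the choice to hold permissions in an external store rather than with the object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    protocol: str   # a protocol registry would describe how to speak it
    address: str


# Hypothetical resolver table: identifier -> current location of the object.
RESOLVER = {
    "example/0002": Location("https", "repo.example.org/objects/0002"),
}

# Hypothetical external permission store: identifier -> users allowed to read.
READERS = {
    "example/0002": {"alice"},
}

# Protocols this client already knows how to speak.
SUPPORTED_PROTOCOLS = {"https"}


def access(identifier, user):
    """Resolve an identifier, then check protocol support and permissions."""
    location = RESOLVER.get(identifier)
    if location is None:
        raise KeyError(f"unknown identifier: {identifier}")
    if location.protocol not in SUPPORTED_PROTOCOLS:
        # This is where bootstrapping into a new protocol would start.
        raise NotImplementedError(f"no client for protocol {location.protocol!r}")
    if user not in READERS.get(identifier, set()):
        raise PermissionError(f"{user} may not read {identifier}")
    return f"{location.protocol}://{location.address}"
```

Keeping permissions outside the object means they can change without touching the data, at the cost of one more system to consult on every access.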


Enabling Technologies

[Figure labels: Scientists, Data Curators, End Users, Applications; Discovery; Access; Interpretation; Enabling Technologies; Datasets Accessed via Repositories, each marked with an ID.]

Interpretation

•  Registries
    –  Schemas
    –  Vocabularies
    –  Formats
    –  Available services
    –  Useful client-side tools

•  Trust
    –  Who did this?
    –  Who owns this?

•  Provenance
    –  Data Source
    –  Processing steps
    –  Computing environment
        •  What is needed to trust the numbers?
        •  Domain specific?
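The provenance items on this slide (data source, processing steps, computing environment) can be captured in a small record. The sketch below is an assumption, not a standard: the `Provenance` class, its field names, and the use of a SHA-256 fingerprint to detect changes to the record are all illustrative choices.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class Provenance:
    """Hypothetical provenance record for one derived dataset."""
    source: str                                       # where the data came from
    steps: list = field(default_factory=list)         # processing steps, in order
    environment: dict = field(default_factory=dict)   # computing environment

    def add_step(self, description):
        self.steps.append(description)

    def fingerprint(self):
        """Hash of the whole record, so later edits to it are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


record = Provenance(
    source="example/0002",
    environment={"python": platform.python_version(), "os": platform.system()},
)
before = record.fingerprint()
record.add_step("removed rows with missing timestamps")
record.add_step("converted temperature from F to C")
after = record.fingerprint()
```

What counts as enough provenance to trust the numbers is likely domain specific, so a real record would carry more fields than these three.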


Enabling Technologies

[Figure labels: Scientists, Data Curators, End Users, Applications; Discovery; Access; Interpretation; Reuse; Enabling Technologies; Datasets Accessed via Repositories, each marked with an ID.]

Reuse

•  Everything from Interpretation slide + Permissions
    –  Example from BOF: I need to understand a data set for peer review but that doesn’t give me permission to use the data
•  Validation
•  Education & Training
    –  Integrate ‘live’ data into education and training
•  Repurpose data


DAITF Roles?

•  Bring good people together on a regular basis to discuss these issues

•  Get agreement on vocabulary for discussing data access and interoperability?

•  Working groups on specific topics
    –  Prototyping specific interoperability issues / domains

•  Create high-level framework, à la OAIS? Multiple frameworks?

•  Guides to Registries and Best Practices

Research Data Alliance Plenary 2 Update

Dr. Francine Berman
Chair, RDA/US

Hamilton Distinguished Chair in Computer Science
Rensselaer Polytechnic Institute

RDA Plenary 2 -- September 16-18, Washington D.C. -- 3 days of Peace, Love and Data

§  RDA Plenary 2
    §  368 participants from 22 countries and all sectors

§  All-hands stakeholder talks and RDA working meeting

§  Data Citation Summit convened by DataCite, FORCE11, CODATA/ICST, ESIP, DCC, etc. to create a common agenda

§  ~5000 tweets over 3 days

RDA Plenary 2 -- Stakeholders and Invited Speakers

§  Keynotes:
    §  Tom Kalil, OSTP (introduced by Farnam Jahanian, NSF)
    §  John Wilbanks, Chief Commons Officer, Sage Bionetworks
    §  Carole Palmer, Professor, UIUC School of Library and Information Science

§  Global Partnerships Panel:
    §  Chris Greer, NIST
    §  Mark Suskin, NSF CISE/ACI
    §  Kostas Glinos, European Commission
    §  Clare McLaughlin, Australian Embassy
    §  Mike Stebbins, OSTP

§  Affiliate Organization Panel:
    §  Sara Graves, CODATA
    §  Curt Tilmes, ESIP
    §  Phil Archer, W3C
    §  Jan Brase, DataCite


RDA Community Current Status: ~1300 participants from 50+ countries

 1. Albania                  19. Germany          37. Russian Federation
 2. Australia                20. Greece           38. Rwanda
 3. Austria                  21. Iceland          39. Serbia
 4. Bangladesh               22. India            40. Singapore
 5. Belgium                  23. Iran             41. Slovenia
 6. Bolivia                  24. Ireland          42. South Africa
 7. Botswana                 25. Ireland {Rep}    43. South Korea
 8. Brazil                   26. Italy            44. Spain
 9. Bulgaria                 27. Japan            45. Sweden
10. Canada                   28. Kyrgyzstan       46. Switzerland
11. China                    29. Kuwait           47. Taiwan
12. Congo {Democratic Rep}   30. Mexico           48. Turkey
13. Costa Rica               31. Netherlands      49. United Arab Emirates
14. Czech Republic           32. New Zealand      50. United Kingdom
15. Denmark                  33. Norway           51. United States
16. Estonia                  34. Palestine        52. Vatican City
17. Finland                  35. Poland           53. Venezuela
18. France                   36. Portugal

RDA by Sector

Academics (66%), Private Sector (10%), Public Sector (17%), Unknown (7%)

Fran Berman

RDA Community Building Momentum

§  Growth in number and scope of Interest Groups and Working Groups
    §  New: BOFs for groups as precursor to Interest Groups
    §  Groups beginning to “self-monitor” to promote concrete deliverables to be used and adopted
    §  Increasing interest in more interaction and “connective tissue” between groups

§  Pressing To-Dos before Plenary 3:
    §  Develop an RDA policy for IP that comes up in Interest and Working Groups
    §  Determine the form of RDA deliverables and what’s needed in terms of an “RDA archive”

Groups that Met at the RDA Plenary

§  Birds-of-a-Feather
    §  Linked Data
    §  Chemical Safety Data
    §  Education and Skills Development in Data Intensive Science
    §  Libraries and Research Data
    §  Cloud Computing and Data Analysis Training for the Developing World

§  Working Groups
    §  Data Type Registries
    §  Metadata Standards
    §  Practical Policy
    §  Persistent Identifier Types
    §  Data Foundations and Terminology
    §  Data Categories and Codes

§  Interest Groups
    §  Agricultural Data
    §  Big Data Analytics
    §  Data Brokering
    §  Certification of Trusted Repositories (joint with ICSU-WDS)
    §  Long tail of Research Data
    §  Marine Data Harmonization
    §  Community Capability Model
    §  Data Publishing (joint with WDS)
    §  Toxicogenomics Interoperability
    §  Research Data Provenance
    §  Data Citation
    §  Metadata
    §  Economic Models and Infrastructure for Federated Materials Data Management
    §  Engagement
    §  Preservation e-Infrastructure
    §  Legal Interoperability (joint with CODATA)
    §  Global Registry of Trusted Data Repositories and Services
    §  Digital Practices in History and Ethnography

§  Data Citation Harmonization Summit
    §  DataCite, FORCE11, CODATA/ICST, ESIP, DCC, etc.

BOLD = new since last Plenary

RDA Organizational Partners

New RDA constituencies / stakeholders

§  Organizational Assembly = Organizational Members (subscription) + Organizational Affiliates (MOUs).

§  Organizational Advisory Board will represent Organizational Assembly.

§  Current Status:
    §  Organizational Membership under discussion with Microsoft, IBM, ANDS, Australian Antarctic Data Center, Intersect, Terrestrial Ecosystems Research Network, CSC – IT Center for Science Ltd., Oracle, STFC, CNRI, STM, EUDAT, Barcelona Supercomputer Center, Columbia University Libraries / Information Services, and many more after the Plenary
    §  Organizational Affiliation under discussion with CODATA, WDS and others

§  Next 6 months (before Plenary 3)
    §  Firm up model for Affiliates (how many, how substantive should the interaction be?)
    §  Complete creation of legal entity to host subscriptions for Organizational Members
    §  Elect Organizational Advisory Board at Plenary 3

RDA Constituent Groups Coming Together

New Position: RDA recruiting for full-time Secretary-General

§  RDA Colloquium (National Research Agencies and Funders)
§  RDA Membership
§  RDA Council (overarching leadership)
§  Technical Advisory Board (technical oversight)
§  Secretary-General and Secretariat (administration and operations)
§  Organizational Advisory Boards and Organizational Assembly (organizational partnerships and guidance)
§  Working Groups and Interest Groups (impact-focused infrastructure)

New RDA Leadership since Plenary 1

§  Council:
    §  Patrick Cocquet (France)
    §  Doris Wedlich (Germany)
    §  Kaye Raseroka (Botswana)
    §  Tony Hey (US)
    §  Ross Wilkinson (AU)
    §  John Wood (UK), co-Chair
    §  Fran Berman (US), co-Chair

§  Technical Advisory Board
    §  Peter Wittenburg
    §  Francoise Genova
    §  Andrew Treloar
    §  Bill Michener
    §  Beth Plale, Chair
    §  6 new TAB members to be elected this month (14 candidates)
    §  12th TAB member to be appointed by Council (for balance)

§  Organizational Assembly
    §  Juan Bicarregui, co-Chair
    §  Leif Laaksonen, co-Chair


Next Plenaries (2X a year)

§  Plenary 3 will be in Dublin March 26-28 in 2014, hosted by Australia and Ireland

§  Plenary 4 will be in the Netherlands – late September in 2014

§  Plenary 5 or 6 likely back in the U.S. (west coast?)

Info:[email protected]

Fran Berman

Data Type Registries (DTR)

Co-Chairs
Larry Lannom: CNRI
Daan Broeder: MPI

September 2013

RDA Plenary 2, Washington, DC

Research Data Alliance
Corporation for National Research Initiatives

Goal: Interoperable Set of Data Type Registries

•  Data Types
    –  Characterize data structures at multiple levels of granularity
    –  Formats are just part of the story
    –  Optimize interactions between data producers & consumers by having types defined and associated with the data they describe
    –  Types should be standardized, discoverable, and unique

•  Type Registries
    –  Each type registered with unique identifier
    –  Common data model and expression
    –  Associate with services, tools, format registries, etc.
    –  Common API for machine consumption
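The registry properties listed on this slide (a unique identifier per type, a common data model, associated services, a machine-consumable API) can be sketched in a few lines. This is an illustration under stated assumptions, not the DTR working group's data model: the `DataType` fields, the `TypeRegistry` class, and the `type/<uuid>` identifier scheme are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class DataType:
    """Hypothetical common data model for one registered type."""
    identifier: str
    name: str
    description: str
    properties: dict = field(default_factory=dict)   # property name -> value kind
    services: list = field(default_factory=list)     # pointers to services or tools


class TypeRegistry:
    """In-memory stand-in for a single type registry."""

    def __init__(self):
        self._types = {}

    def register(self, name, description, properties=None, services=None):
        """Store a type under a newly minted unique identifier and return it."""
        identifier = f"type/{uuid.uuid4()}"
        self._types[identifier] = DataType(
            identifier, name, description,
            dict(properties or {}), list(services or []),
        )
        return identifier

    def lookup(self, identifier):
        """Return the type record as JSON, for machine consumption."""
        return json.dumps(asdict(self._types[identifier]), sort_keys=True)


registry = TypeRegistry()
series_id = registry.register(
    "temperature-series",
    "time-ordered temperature readings",
    properties={"timestamp": "ISO 8601 string", "value": "float"},
    services=["visualization"],
)
series_record = json.loads(registry.lookup(series_id))
```

Returning JSON from `lookup` stands in for the common API; a federated set of registries would additionally need a shared way to resolve an identifier to the registry that holds it.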


Schedule

•  3/2013 – 9/2013
    –  Gathering use cases
    –  Investigating other work in the area
    –  First drafts of data model and functional specs for a type registry

•  10/2013 – 12/2013
    –  Refine data model and functional specs
    –  Deploy initial prototype

•  1/2014 – 5/2014
    –  Finalize data model and functional specs
    –  Deploy functional type registry for PID types
    –  Release turnkey registry conforming to functional specs


DTR Use Cases

•  Broad Functional Classification
    –  Repos hold widely varying levels of data & metadata
    –  High-level functional classification of the identified object needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc.

•  Simple License Information via PID Resolution
    –  Data set access conditions cannot be predicted based on ID
    –  For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up or intervening page or open linked data

•  Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects
    –  Using data acquisition as an example
        •  Determine object type you are trying to build
        •  Consult registry to index into an ontology to dynamically define required and optional properties
        •  Does the input data have what is needed?

•  Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation
    –  Distinguish pointers to objects from pointers to metadata from pointers to services
    –  Enable complex client interactions as opposed to simple one-to-one re-direction
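Two of these use cases lend themselves to a short sketch: picking the right pointer out of a set of ID/Type/Value triples, and checking input data against the required properties of an object type. All names below are hypothetical, including the type names (`URL.object`, `URL.metadata`, `URL.license`), the sample addresses, and the `REQUIREMENTS` table that stands in for a registry lookup.

```python
from typing import NamedTuple


class TypedValue(NamedTuple):
    """One ID/Type/Value triple returned by resolving a PID."""
    pid: str
    type_name: str
    value: str


# Hypothetical resolution result for one identifier.
RECORD = [
    TypedValue("example/0002", "URL.object", "https://repo.example.org/objects/0002"),
    TypedValue("example/0002", "URL.metadata", "https://repo.example.org/metadata/0002"),
    TypedValue("example/0002", "URL.license", "https://repo.example.org/license/0002"),
]


def values_of_type(record, type_name):
    """Select the values of one registered type instead of a single redirect."""
    return [tv.value for tv in record if tv.type_name == type_name]


# Hypothetical registry answer: required and optional properties per object type.
REQUIREMENTS = {
    "timeseries": {"required": {"timestamp", "value"}, "optional": {"unit"}},
}


def missing_properties(object_type, input_fields):
    """Answer 'does the input data have what is needed?' for one object type."""
    required = REQUIREMENTS[object_type]["required"]
    return sorted(required - set(input_fields))


license_pointers = values_of_type(RECORD, "URL.license")
gaps = missing_properties("timeseries", ["timestamp", "unit"])
```

Because each value carries a registered type, a client can tell a pointer to the object from a pointer to its metadata or license before following any of them.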


One Use of Type Registries

[Figure labels: Users; Federated Set of Type Registries; Typed Data (records of ID, Type, Payload); Rights ("I Agree", "Terms:…"); Services (Visualization, Data Processing, Data Set, Dissemination); arrows numbered 1-4.]

1   Client (process or people) encounters unknown type

2   Resolved to Type Registry

3   Response includes type definitions, relationships, properties, and possibly service pointers. Response can be used locally for processing, or, optionally

4   Typed data or reference to typed data can be sent to service provider
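The four numbered steps can be sketched as a single dispatch function. This is a minimal illustration under stated assumptions: `TYPE_REGISTRY` and `SERVICES` are hypothetical in-memory tables standing in for a federated registry and remote service providers, and the type identifier and payload are made up.

```python
# Hypothetical registry content: type identifier -> definition record.
TYPE_REGISTRY = {
    "type/temperature-series": {
        "definition": "time-ordered temperature readings",
        "properties": ["timestamp", "value"],
        "services": ["visualization"],
    },
}

# Hypothetical service providers, keyed by the pointer names used above.
SERVICES = {
    "visualization": lambda payload: f"plot of {len(payload)} points",
}


def process(typed_data, local_handlers):
    """Handle one typed data record, consulting the registry for unknown types."""
    type_id = typed_data["type"]
    payload = typed_data["payload"]

    handler = local_handlers.get(type_id)
    if handler is not None:
        return handler(payload)

    # Step 1: the client has encountered an unknown type.
    # Step 2: the type identifier is resolved to the type registry.
    definition = TYPE_REGISTRY.get(type_id)
    if definition is None:
        raise LookupError(f"type not registered: {type_id}")

    # Step 3: the response carries definitions, properties, and service pointers.
    # Step 4: optionally send the typed data to a service provider.
    for pointer in definition["services"]:
        service = SERVICES.get(pointer)
        if service is not None:
            return service(payload)

    # No usable service: hand the definition back for local processing.
    return definition


result = process(
    {"id": "example/0002", "type": "type/temperature-series", "payload": [20.1, 20.4, 19.8]},
    local_handlers={},
)
```

Passing a handler in `local_handlers` short-circuits the registry lookup, which mirrors the case where the client already understands the type.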