ARL Fall Forum
Facilitating New Forms of Discovery
11 October 2013
11:15 a.m.-12:30 p.m.
Larry Lannom, Director of Information Services and Vice President
Corporation for National Research Initiatives
Research Data Alliance
DAITF: Enabling Technologies
21 March 2012
Larry Lannom, Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
Enabling Technologies
[Figure: Scientists, Data Curators, End Users, and Applications shown above a collection of Datasets, each a bit stream labeled with an ID]
Enabling Technologies
[Figure: Scientists, Data Curators, End Users, and Applications shown above identified (ID) Datasets Accessed via Repositories]
Enabling Technologies
[Figure: a Discovery layer of enabling technologies between Scientists, Data Curators, End Users, and Applications and the identified (ID) Datasets Accessed via Repositories]
Discovery & Evaluation
• Search
– Metadata registries
  • Subject
  • Parties
  • Dates
  • Etc
– Crawlers – more ad hoc
• Citation
– Formats
• Permissions
– Can I see it?
– Can I use it?
• Trust
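The metadata-registry search described on this slide can be sketched as a filter over registered records. This is a minimal illustration only: the `MetadataRecord` fields, the sample records, and the `search` function are hypothetical names chosen to mirror the slide's "Subject / Parties / Dates" facets, not part of any real registry API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical registry entry describing one identified dataset."""
    identifier: str
    subject: str
    parties: Tuple[str, ...]
    published: date


# Hypothetical sample registry contents, for illustration only.
REGISTRY: List[MetadataRecord] = [
    MetadataRecord("example/ds-1", "Ocean temperature", ("Lab A",), date(2012, 5, 1)),
    MetadataRecord("example/ds-2", "Ocean salinity", ("Lab B",), date(2013, 2, 10)),
    MetadataRecord("example/ds-3", "Air temperature", ("Lab A", "Lab B"), date(2013, 6, 30)),
]


def search(records: List[MetadataRecord],
           subject: Optional[str] = None,
           party: Optional[str] = None,
           since: Optional[date] = None) -> List[str]:
    """Return identifiers of records matching every facet that was supplied."""
    hits = []
    for record in records:
        if subject is not None and subject.lower() not in record.subject.lower():
            continue
        if party is not None and party not in record.parties:
            continue
        if since is not None and record.published < since:
            continue
        hits.append(record.identifier)
    return hits
```

A subject search returns identifiers, which is what lets the next stage (Access) start from a known item rather than a query.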
Enabling Technologies
[Figure: Discovery and Access layers of enabling technologies between Scientists, Data Curators, End Users, and Applications and the identified (ID) Datasets Accessed via Repositories]
Access
• ID / reference resolution
– Go from ‘subject search’ to ‘known item’ search
• Access Protocols
– How to get it
– Protocol registries
– Bootstrapping into new protocols
• Authentication & Authorization
– Proof of identity (tradeoff: usability vs security)
– Permissions: with the object or in some external system?
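One way to picture identifier resolution followed by protocol dispatch is the sketch below. It assumes an in-memory table in place of a real resolution system and a one-entry protocol registry; `RESOLVER`, `PROTOCOLS`, `MEMORY_STORE`, and the `"memory"` protocol are all hypothetical stand-ins so the example runs without a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Location:
    protocol: str  # name to look up in the protocol registry
    address: str   # protocol-specific address of the dataset


# Hypothetical identifier -> location table standing in for a resolution system.
RESOLVER: Dict[str, Location] = {
    "example/ds-1": Location("memory", "temperatures"),
}

# Hypothetical in-memory "repository" so the sketch needs no network access.
MEMORY_STORE: Dict[str, bytes] = {"temperatures": b"12.1,12.4,11.9"}

# Protocol registry: protocol name -> function that fetches bytes at an address.
PROTOCOLS: Dict[str, Callable[[str], bytes]] = {
    "memory": lambda address: MEMORY_STORE[address],
}


def resolve(identifier: str) -> Location:
    """Turn a known identifier into a location (the 'known item' step)."""
    try:
        return RESOLVER[identifier]
    except KeyError:
        raise LookupError(f"unknown identifier: {identifier}") from None


def fetch(identifier: str) -> bytes:
    """Resolve the identifier, then dispatch on the registered access protocol."""
    location = resolve(identifier)
    handler = PROTOCOLS.get(location.protocol)
    if handler is None:
        raise LookupError(f"no handler registered for protocol: {location.protocol}")
    return handler(location.address)
```

Bootstrapping into a new protocol then amounts to adding an entry to the protocol registry; clients that resolve identifiers do not change. Authentication and authorization are deliberately left out of this sketch.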
Enabling Technologies
[Figure: Discovery, Access, and Interpretation layers of enabling technologies between Scientists, Data Curators, End Users, and Applications and the identified (ID) Datasets Accessed via Repositories]
Interpretation
• Registries
– Schemas
– Vocabularies
– Formats
– Available services
– Useful client-side tools
• Trust
– Who did this?
– Who owns this?
• Provenance
– Data Source
– Processing steps
– Computing environment
• What is needed to trust the numbers?
• Domain specific?
Enabling Technologies
[Figure: Discovery, Access, Interpretation, and Reuse layers of enabling technologies between Scientists, Data Curators, End Users, and Applications and the identified (ID) Datasets Accessed via Repositories]
Reuse
• Everything from Interpretation slide + Permissions
– Example from BOF: I need to understand a data set for peer review but that doesn’t give me permission to use the data
• Validation
• Education & Training
– Integrate ‘live’ data into education and training
• Repurpose data
DAITF Roles?
• Bring good people together on a regular basis to discuss these issues
• Get agreement on vocabulary for discussing data access and interoperability?
• Working groups on specific topics
– Prototyping specific interoperability issues / domains
• Create high-level framework, à la OAIS? Multiple frameworks?
• Guides to Registries and Best Practices
Research Data Alliance Plenary 2 Update
Dr. Francine Berman
Chair, RDA/US
Hamilton Distinguished Chair in Computer Science
Rensselaer Polytechnic Institute
RDA Plenary 2 -- September 16-18, Washington D.C. -- 3 days of Peace, Love and Data
§ RDA Plenary 2
§ 368 participants from 22 countries and all sectors
§ All-hands stakeholder talks and RDA working meeting
§ Data Citation Summit convened by DataCite, FORCE11, CODATA/ICST, ESIP, DCC, etc. to create a common agenda
§ ~5000 tweets over 3 days
RDA Plenary 2 -- Stakeholders and Invited Speakers
§ Keynotes:
  § Tom Kalil, OSTP (introduced by Farnam Jahanian, NSF)
  § John Wilbanks, Chief Commons Officer, Sage Bionetworks
  § Carole Palmer, Professor, UIUC School of Library and Information Science
§ Global Partnerships Panel:
  § Chris Greer, NIST
  § Mark Suskin, NSF CISE/ACI
  § Kostas Glinos, European Commission
  § Clare McLaughlin, Australian Embassy
  § Mike Stebbins, OSTP
§ Affiliate Organization Panel:
  § Sara Graves, CODATA
  § Curt Tilmes, ESIP
  § Phil Archer, W3C
  § Jan Brase, DataCite
RDA Community Current Status: ~1300 participants from 50+ countries
1. Albania
2. Australia
3. Austria
4. Bangladesh
5. Belgium
6. Bolivia
7. Botswana
8. Brazil
9. Bulgaria
10. Canada
11. China
12. Congo {Democratic Rep}
13. Costa Rica
14. Czech Republic
15. Denmark
16. Estonia
17. Finland
18. France
19. Germany
20. Greece
21. Iceland
22. India
23. Iran
24. Ireland
25. Ireland {Rep}
26. Italy
27. Japan
28. Kyrgyzstan
29. Kuwait
30. Mexico
31. Netherlands
32. New Zealand
33. Norway
34. Palestine
35. Poland
36. Portugal
37. Russian Federation
38. Rwanda
39. Serbia
40. Singapore
41. Slovenia
42. South Africa
43. South Korea
44. Spain
45. Sweden
46. Switzerland
47. Taiwan
48. Turkey
49. United Arab Emirates
50. United Kingdom
51. United States
52. Vatican City
53. Venezuela
RDA by Sector
Academics (66%), Public Sector (17%), Private Sector (10%), Unknown (7%)
RDA Community Building Momentum
§ Growth in number and scope of Interest Groups and Working Groups
§ New: BOFs for groups as precursor to Interest Groups
§ Groups beginning to “self-monitor” to promote concrete deliverables to be used and adopted
§ Increasing interest in more interaction and “connective tissue” between groups
§ Pressing To-Dos before Plenary 3:
  § Develop an RDA policy for IP that comes up in Interest and Working Groups
  § Determine the form of RDA deliverables and what’s needed in terms of an “RDA archive”
Groups that Met at the RDA Plenary
§ Birds-of-a-Feather
  § Linked Data
  § Chemical Safety Data
  § Education and Skills Development in Data Intensive Science
  § Libraries and Research Data
  § Cloud Computing and Data Analysis Training for the Developing World
§ Working Groups
  § Data Type Registries
  § Metadata Standards
  § Practical Policy
  § Persistent Identifier Types
  § Data Foundations and Terminology
  § Data Categories and Codes
§ Interest Groups
  § Agricultural Data
  § Big Data Analytics
  § Data Brokering
  § Certification of Trusted Repositories (joint with ICSU-WDS)
  § Long tail of Research Data
  § Marine Data Harmonization
  § Community Capability Model
  § Data Publishing (joint with WDS)
  § Toxicogenomics Interoperability
  § Research Data Provenance
  § Data Citation
  § Metadata
  § Economic Models and Infrastructure for Federated Materials Data Management
  § Engagement
  § Preservation e-Infrastructure
  § Legal Interoperability (joint with CODATA)
  § Global Registry of Trusted Data Repositories and Services
  § Digital Practices in History and Ethnography
§ Data Citation Harmonization Summit
  § DataCite, FORCE11, CODATA/ICST, ESIP, DCC, etc.
BOLD = new since last Plenary
RDA Organizational Partners
New RDA constituencies / stakeholders
§ Organizational Assembly = Organizational Members (subscription) + Organizational Affiliates (MOUs).
§ Organizational Advisory Board will represent Organizational Assembly.
§ Current Status:
  § Organizational Membership under discussion with Microsoft, IBM, ANDS, Australian Antarctic Data Center, Intersect, Terrestrial Ecosystems Research Network, CSC – IT Center for Science Ltd., Oracle, STFC, CNRI, STM, EUDAT, Barcelona Supercomputing Center, Columbia University Libraries / Information Services, and many more after the Plenary
  § Organizational Affiliation under discussion with CODATA, WDS and others
§ Next 6 months (before Plenary 3)
  § Firm up model for Affiliates (how many, how substantive should the interaction be?)
  § Complete creation of legal entity to host subscriptions for Organizational Members
  § Elect Organizational Advisory Board at Plenary 3
RDA Constituent Groups Coming Together
New Position: RDA recruiting for full-time Secretary-General
RDA Colloquium (National Research Agencies and Funders)
RDA Membership
RDA Council (overarching leadership)
Technical Advisory Board (Technical oversight)
Secretary-General and Secretariat (Administration and Operations)
Organizational Advisory Boards and Organizational Assembly (Organizational partnerships and guidance)
Working Groups and Interest Groups (impact-focused infrastructure)
New RDA Leadership since Plenary 1
§ Council:
  § Patrick Cocquet (France)
  § Doris Wedlich (Germany)
  § Kaye Raseroka (Botswana)
  § Tony Hey (US)
  § Ross Wilkinson (AU)
  § John Wood (UK), co-Chair
  § Fran Berman (US), co-Chair
§ Technical Advisory Board
  § Peter Wittenburg
  § Francoise Genova
  § Andrew Treloar
  § Bill Michener
  § Beth Plale, Chair
  § 6 new TAB members to be elected this month (14 candidates)
  § 12th TAB member to be appointed by Council (for balance)
§ Organizational Assembly
  § Juan Bicarregui, co-Chair
  § Leif Laaksonen, co-Chair
Next Plenaries (2X a year)
§ Plenary 3 will be in Dublin, March 26-28, 2014, hosted by Australia and Ireland
§ Plenary 4 will be in the Netherlands in late September 2014
§ Plenary 5 or 6 likely back in the U.S. (west coast?)
Data Type Registries (DTR)
Co-Chairs: Larry Lannom (CNRI), Daan Broeder (MPI)
September 2013
RDA Plenary 2 Washington, DC
Goal: Interoperable Set of Data Type Registries
• Data Types
– Characterize data structures at multiple levels of granularity
– Formats are just part of the story
– Optimize interactions between data producers & consumers by having types defined and associated with the data they describe
– Types should be standardized, discoverable, and unique
• Type Registries
– Each type registered with unique identifier
– Common data model and expression
– Associate with services, tools, format registries, etc.
– Common API for machine consumption
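The registry goals above (a unique identifier per type, a common data model, associations with services, and a machine-consumable API) might look roughly like the following in-memory sketch. Everything here is an assumption for illustration: `TypeRecord`, `TypeRegistry`, the `example-registry` prefix, and the JSON expression are hypothetical and do not represent the working group's actual data model or API.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class TypeRecord:
    """Hypothetical common data model for one registered data type."""
    name: str
    description: str
    services: List[str] = field(default_factory=list)  # associated services/tools
    identifier: str = ""                                # assigned on registration


class TypeRegistry:
    """In-memory stand-in for a type registry (illustrative only)."""

    def __init__(self, prefix: str = "example-registry") -> None:
        self._prefix = prefix
        self._records: Dict[str, TypeRecord] = {}

    def register(self, record: TypeRecord) -> str:
        """Store the record under a newly minted unique identifier."""
        if any(existing.name == record.name for existing in self._records.values()):
            raise ValueError(f"type already registered: {record.name}")
        record.identifier = f"{self._prefix}/{uuid.uuid4().hex}"
        self._records[record.identifier] = record
        return record.identifier

    def lookup(self, identifier: str) -> str:
        """Return the record in a common, machine-readable expression (JSON)."""
        return json.dumps(asdict(self._records[identifier]), sort_keys=True)
```

A client holding only the type identifier can call `lookup` and read the definition and any service pointers without knowing anything else about the data.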
Schedule
• 3/2013 – 9/2013
– Gathering use cases
– Investigating other work in the area
– First drafts of data model and functional specs for a type registry
• 10/2013 – 12/2013
– Refine data model and functional specs
– Deploy initial prototype
• 1/2014 – 5/2014
– Finalize data model and functional specs
– Deploy functional type registry for PID types
– Release turnkey registry conforming to functional specs
DTR Use Cases
• Broad Functional Classification
– Repos hold widely varying levels of data & metadata
– High-level functional classification of the identified object needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc.
• Simple License Information via PID Resolution
– Data set access conditions cannot be predicted based on ID
– For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up or intervening page or open linked data
• Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects
– Using data acquisition as an example
  • Determine object type you are trying to build
  • Consult registry to index into an ontology to dynamically define required and optional properties
  • Does the input data have what is needed?
• Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation
– Distinguish pointers to objects from pointers to metadata from pointers to services
– Enable complex client interactions as opposed to simple one-to-one re-direction
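The ID/Type/Value triples in these use cases can be illustrated with a small sketch: resolving an identifier yields typed values, so a client can tell a data pointer from a metadata pointer, a service pointer, or license information, and can check whether required properties are present. The type names (`data-object`, `metadata`, `service`, `license`) and all values are hypothetical placeholders, not registered PID types.

```python
from typing import Dict, Iterable, List, NamedTuple


class Triple(NamedTuple):
    pid: str    # the identifier being resolved
    type: str   # what kind of value this is
    value: str  # the typed value itself


# Hypothetical records for one identifier; types and values are placeholders.
TRIPLES: List[Triple] = [
    Triple("example/ds-1", "data-object", "repo-a:/files/ds-1.bin"),
    Triple("example/ds-1", "metadata", "repo-a:/meta/ds-1.xml"),
    Triple("example/ds-1", "service", "viz-service"),
    Triple("example/ds-1", "license", "terms-of-use-v1"),
]


def resolve(pid: str, triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Group every value stored for `pid` by its type."""
    grouped: Dict[str, List[str]] = {}
    for triple in triples:
        if triple.pid == pid:
            grouped.setdefault(triple.type, []).append(triple.value)
    return grouped


def missing_properties(available: Dict[str, List[str]],
                       required: Iterable[str]) -> List[str]:
    """Answer 'does the input have what is needed?' by listing absent types."""
    return sorted(set(required) - set(available))
```

Because the client receives the whole typed record rather than a single redirect, it can choose between fetching the object, reading its metadata, showing license terms, or calling a service.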
One Use of Type Registries
[Figure: Users, a Federated Set of Type Registries, Typed Data records (each with ID, Type, and Payload), and Services (Visualization, Data Processing, Data Set Dissemination, and Rights with an “I Agree” terms dialog), connected by arrows numbered 1-4 as described below]
1 Client (process or people) encounters unknown type
2 Resolved to Type Registry
3 Response includes type definitions, relationships, properties, and possibly service pointers. Response can be used locally for processing, or, optionally
4 Typed data or reference to typed data can be sent to service provider
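The four numbered steps can be walked through in a short sketch. It assumes a federation modeled as a list of in-memory dictionaries and a single toy service; `REGISTRIES`, `SERVICES`, the `example-type/csv-floats` type, and its `encoding` / `separator` properties are hypothetical and chosen only so the flow is runnable end to end.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class TypedData:
    identifier: str
    type_id: str
    payload: bytes


@dataclass
class TypeDefinition:
    type_id: str
    properties: Dict[str, str]
    service: Optional[str] = None  # optional pointer to a service for this type


# Hypothetical federation: each dict stands in for one type registry.
REGISTRIES: List[Dict[str, TypeDefinition]] = [
    {},
    {"example-type/csv-floats": TypeDefinition(
        "example-type/csv-floats",
        {"encoding": "ascii", "separator": ","},
        service="mean",
    )},
]

# Hypothetical service providers, keyed by the name used in service pointers.
SERVICES: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda values: sum(values) / len(values),
}


def resolve_type(type_id: str) -> TypeDefinition:
    """Step 2: ask each registry in the federation for the type."""
    for registry in REGISTRIES:
        if type_id in registry:
            return registry[type_id]
    raise LookupError(f"type not found in any registry: {type_id}")


def handle(data: TypedData, known: Dict[str, TypeDefinition]) -> Union[List[float], float]:
    definition = known.get(data.type_id)
    if definition is None:                       # Step 1: unknown type encountered
        definition = resolve_type(data.type_id)  # Steps 2-3: resolve, read response
        known[data.type_id] = definition
    # Step 3 (local use): the returned properties tell the client how to parse.
    text = data.payload.decode(definition.properties["encoding"])
    values = [float(x) for x in text.split(definition.properties["separator"])]
    if definition.service is None:
        return values
    return SERVICES[definition.service](values)  # Step 4: hand off to a service
```

The `known` dictionary acts as the client's local cache of type definitions, so the registry is consulted only the first time a type is seen.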