

A Licensing Model and Ecosystem for Data Sharing

Sam Madden¹, Jane Greenberg²,³, Carsten Binnig⁴, Tim Kraska⁴, Danny Weitzner¹, & Sam Grabus²,³

¹ MIT, ² Metadata Research Center, ³ Drexel University, ⁴ Brown University



A Licensing Model and Ecosystem for Data Sharing is supported by NSF award 1636788.

Summary

As part of the NSF Big Data Regional Innovation Hub program, the Northeast Hub is addressing key data-sharing challenges by:

• Creating a licensing model that facilitates sharing data that is not necessarily open or free between different organizations,

• Developing a prototype data-sharing software platform, ShareDB, which will enforce the terms and restrictions of the developed licenses, and

• Developing and integrating relevant metadata that will accompany the datasets shared under the different licenses, making them easily searchable and interpretable.

To ensure that the developed tools and licenses are useful, the project will form the Northeast Data Sharing Group, composed of many different stakeholders, to help make the licensing model widely accepted and usable in many application domains (e.g., health and finance).

Timeline

• Year 1: Requirements gathering; initial requirements-gathering workshop.

• Year 2: Version 0.1, the first draft of the licensing model. Present the model at a workshop for suggestions. Identify stakeholders who commit to using the licensing model.

• Year 3: Transition the Northeast Data Sharing Consortium into a non-profit organization.

Rationale

Sharing data sets can provide tremendous mutual benefits for industry, researchers, and nonprofit organizations. A major obstacle is that data often comes with prohibitive restrictions on how it can be used (e.g., requiring the enforcement of legal terms or other policies, or the handling of data privacy issues). Additionally, many attempts to share relevant data sets between stakeholders in industry and academia fail or require a large investment to make data sharing possible.

Key Components

1. Data-Sharing Licensing Framework/Generator
2. Data-Sharing Platform (enforces licenses)
3. Metadata (search licenses & data)

Licensing Framework: Create a set of license options that can be easily composed into a standardized data-sharing agreement for different domains.

Data-Sharing Platform: Develop a prototype software system for data sharing that seamlessly enforces the restrictions stated in the developed licenses.

Metadata: Develop a metadata scheme that draws on the best-of-breed elements of the many existing metadata standards.
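The three components are meant to work together: license options are composed into an agreement, the platform checks each request against that agreement, and metadata makes datasets and their terms searchable. The poster does not describe the platform's internals, so the following is only a minimal Python sketch under stated assumptions: the option names (`allowed_purposes`, `commercial_use`, `redistribution`, `expires`), the classes `LicenseTerms`, `Dataset`, and `AccessRequest`, and the functions `check_access` and `search` are all hypothetical, not the project's actual licensing model or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical composable license options (not the project's actual model)."""
    allowed_purposes: frozenset = frozenset({"research"})
    commercial_use: bool = False
    redistribution: bool = False
    requires_attribution: bool = True
    expires: Optional[date] = None


@dataclass
class Dataset:
    name: str
    license: LicenseTerms
    metadata: dict = field(default_factory=dict)  # hypothetical descriptive metadata


@dataclass(frozen=True)
class AccessRequest:
    purpose: str
    commercial: bool
    redistribute: bool
    on: date


def check_access(ds: Dataset, req: AccessRequest) -> list:
    """Return the license terms the request would violate (empty list = allowed)."""
    t = ds.license
    violations = []
    if req.purpose not in t.allowed_purposes:
        violations.append(f"purpose '{req.purpose}' not permitted")
    if req.commercial and not t.commercial_use:
        violations.append("commercial use not permitted")
    if req.redistribute and not t.redistribution:
        violations.append("redistribution not permitted")
    if t.expires is not None and req.on > t.expires:
        violations.append("license expired")
    return violations


def search(catalog, **criteria):
    """Return names of datasets whose metadata or license attributes match all criteria."""
    hits = []
    for ds in catalog:
        attrs = {**ds.metadata,
                 "commercial_use": ds.license.commercial_use,
                 "redistribution": ds.license.redistribution}
        if all(attrs.get(k) == v for k, v in criteria.items()):
            hits.append(ds.name)
    return hits


# Usage with two toy datasets (illustrative values only).
health = Dataset(
    "clinic-visits",
    LicenseTerms(allowed_purposes=frozenset({"research", "public-health"}),
                 expires=date(2019, 12, 31)),
    metadata={"domain": "health", "contains_pii": False},
)
finance = Dataset(
    "loan-records",
    LicenseTerms(commercial_use=True),
    metadata={"domain": "finance", "contains_pii": True},
)
catalog = [health, finance]

print(check_access(health, AccessRequest("marketing", True, False, date(2018, 6, 1))))
# ["purpose 'marketing' not permitted", 'commercial use not permitted']
print(search(catalog, domain="finance", commercial_use=True))
# ['loan-records']
```

Representing each license as a structured record rather than free text is one way to let the same terms drive both enforcement and search; a real agreement would carry many more options and legal nuances than this toy.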

Current Progress @ Brown, Drexel, & MIT

• Gathering examples of data-sharing licenses

• Parsing essential attributes

• Natural language processing, term clustering, and categorization

• Initial data-sharing platform (DataHub)

• Added support for access control & authorization

• Exploring research issues related to anonymization & de-identification of PII
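The de-identification work listed above is still exploratory, and the poster does not name a method. As a hedged illustration only, the Python sketch below applies a common textbook baseline: drop direct identifiers, generalize quasi-identifiers, then check k-anonymity (every combination of quasi-identifier values must appear at least k times). The field names, the generalization rules (3-digit ZIP prefix, age decade), the value of k, and the toy records are all hypothetical assumptions, and k-anonymity is a swapped-in standard technique, not the project's approach.

```python
from collections import Counter

# Hypothetical direct-identifier fields that are removed outright.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}


def generalize(record):
    """Drop direct identifiers and coarsen quasi-identifiers (ZIP, age)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip"] = record["zip"][:3] + "**"      # keep only the 3-digit ZIP prefix
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"      # replace exact age with a decade band
    return out


def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())


# Toy records (illustrative only).
raw = [
    {"name": "A", "email": "a@x.org", "zip": "02139", "age": 34, "diagnosis": "flu"},
    {"name": "B", "email": "b@x.org", "zip": "02142", "age": 37, "diagnosis": "asthma"},
    {"name": "C", "email": "c@x.org", "zip": "19104", "age": 52, "diagnosis": "flu"},
    {"name": "D", "email": "d@x.org", "zip": "19103", "age": 58, "diagnosis": "diabetes"},
]
released = [generalize(r) for r in raw]

print(is_k_anonymous(raw, ["zip", "age"], k=2))       # False: every raw record is unique
print(is_k_anonymous(released, ["zip", "age"], k=2))  # True after generalization
```

k-anonymity alone does not prevent every disclosure (for example, when all records in a group share the same sensitive value), which is one reason de-identification remains an open research issue.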

In collaboration with: