
A Licensing Model and Ecosystem for Data Sharing

Sam Madden¹, Jane Greenberg²,³, Carsten Binnig⁴, Tim Kraska⁴, Danny Weitzner¹, & Sam Grabus²,³

¹ MIT, ² Metadata Research Center, ³ Drexel University, ⁴ Brown University



A Licensing Model and Ecosystem for Data Sharing is supported by NSF award 1636788.

Summary

As part of the NSF Big Data regional innovation hub program, the Northeast hub is addressing key data-sharing challenges by:

• Creating a licensing model for data that facilitates sharing, between different organizations, of data that is not necessarily open or free,

• Developing a prototype data-sharing software platform, ShareDB, which will enforce the terms and restrictions of the developed licenses, and

• Developing and integrating relevant metadata that will accompany the datasets shared under the different licenses, making them easily searchable and interpretable.

To ensure that the developed tools and licenses are useful, the project will form the Northeast Data Sharing Group, composed of many different stakeholders, to make the licensing model widely accepted and usable in many application domains (e.g., health and finance).

Timeline

• Year 1: Requirements gathering. Initial requirements-gathering workshop.

• Year 2: Version 0.1, a first draft of the licensing model. Present the model at a workshop for suggestions. Identify stakeholders who commit to using the licensing model.

• Year 3: Transition the Northeast Data Sharing Consortium into a non-profit organization.

Rationale

Sharing of datasets can provide tremendous mutual benefits for industry, researchers, and nonprofit organizations. A major obstacle is that data often comes with prohibitive restrictions on how it can be used (e.g., requiring the enforcement of legal terms or other policies, handling data privacy issues, etc.). Additionally, many attempts to share relevant datasets between different stakeholders in industry and academia fail or require a large investment to make data sharing possible.

Key Components

1. Data-Sharing Licensing Framework/Generator

2. Data-Sharing Platform (enforce licenses)

3. Metadata (search licenses & data)

Licensing Framework: Creating a set of possible options that can be easily composed into a standardized data-sharing agreement for different domains.
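
The idea of composing options into a standardized agreement can be pictured with the minimal Python sketch below. The option names, the clause wording, and the `compose` helper are hypothetical illustrations; the poster does not specify the actual option set or how the generator is built.

```python
from dataclasses import dataclass

# Hypothetical option catalogue (assumption): the real framework's options
# are not listed on the poster.
OPTIONS = {
    "attribution": "Publications must credit the data provider.",
    "deidentified_only": "Only de-identified records may be accessed.",
    "no_redistribution": "The licensee may not pass the data to third parties.",
    "research_only": "The data may be used for non-commercial research only.",
}


@dataclass(frozen=True)
class Agreement:
    provider: str
    licensee: str
    domain: str
    options: tuple

    def render(self) -> str:
        """Produce the agreement text: a header plus one numbered clause per option."""
        header = f"Data-sharing agreement ({self.domain}): {self.provider} -> {self.licensee}"
        clauses = [f"{i}. {OPTIONS[name]}" for i, name in enumerate(self.options, start=1)]
        return "\n".join([header, *clauses])


def compose(provider, licensee, domain, selected):
    """Validate the selected options and combine them into one agreement."""
    unknown = sorted(set(selected) - set(OPTIONS))
    if unknown:
        raise ValueError(f"unknown license options: {unknown}")
    # De-duplicate and sort so the same selection always yields the same text.
    return Agreement(provider, licensee, domain, tuple(sorted(set(selected))))


if __name__ == "__main__":
    print(compose("Hospital A", "Lab B", "health", ["research_only", "attribution"]).render())
```

Keeping each option as a named, reusable clause is what would make agreements comparable across domains: two parties in health and finance can select different subsets from the same catalogue.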

Data-Sharing Platform: Develop a prototype software system for data sharing, which seamlessly enforces the restrictions stated in the developed licenses.
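
As a rough illustration of what enforcing license restrictions at access time could look like, the Python sketch below checks a request against a license before returning any rows. It is not the ShareDB design; the `License` fields, the `LicenseViolation` exception, and the `read` function are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class License:
    # All fields are hypothetical restriction types chosen for illustration.
    licensee: str
    allowed_purposes: frozenset
    allowed_columns: frozenset
    expires: date


class LicenseViolation(Exception):
    """Raised when a request falls outside the license terms."""


def read(rows, lic, user, purpose, columns, today):
    """Return the requested columns only if every license restriction is met."""
    if user != lic.licensee:
        raise LicenseViolation(f"{user} is not the licensee")
    if today > lic.expires:
        raise LicenseViolation("license has expired")
    if purpose not in lic.allowed_purposes:
        raise LicenseViolation(f"purpose '{purpose}' is not permitted")
    blocked = sorted(set(columns) - lic.allowed_columns)
    if blocked:
        raise LicenseViolation(f"columns not covered by the license: {blocked}")
    # Project each row onto the permitted columns that were requested.
    return [{c: row[c] for c in columns} for row in rows]


if __name__ == "__main__":
    lic = License("lab_b", frozenset({"research"}), frozenset({"age", "diagnosis"}), date(2030, 1, 1))
    data = [{"name": "Ann", "age": 41, "diagnosis": "flu"}]
    print(read(data, lic, "lab_b", "research", ["age", "diagnosis"], date(2025, 6, 1)))
```

Placing the check inside the only function that returns data means the restrictions cannot be bypassed by a caller who forgets to consult the license.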

Metadata: Develop a metadata scheme that draws on the best of the many existing metadata standards.
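
To make the search use case concrete, here is a small Python sketch of a dataset record and a license-aware search. The poster does not name the standards it will draw on; borrowing Dublin Core element names (`title`, `creator`, `subject`, `description`, `rights`) is an assumption for illustration, as are the `DatasetRecord` class and the `search` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    # Field names follow Dublin Core elements (an assumed choice of standard).
    title: str
    creator: str
    subject: tuple
    description: str
    rights: str  # hypothetical identifier of the license the data is shared under


def search(records, keyword=None, rights=None):
    """Return records matching a keyword (title, subject, description) and/or a license."""
    hits = []
    for rec in records:
        text = " ".join((rec.title, rec.description, *rec.subject)).lower()
        if keyword is not None and keyword.lower() not in text:
            continue
        if rights is not None and rec.rights != rights:
            continue
        hits.append(rec)
    return hits


if __name__ == "__main__":
    catalogue = [
        DatasetRecord("ER visits 2016", "Hospital A", ("health",), "Emergency room admissions", "research_only"),
        DatasetRecord("Card transactions", "Bank C", ("finance",), "Anonymized purchases", "no_redistribution"),
    ]
    for rec in search(catalogue, keyword="health"):
        print(rec.title, "|", rec.rights)
```

Storing the license identifier alongside descriptive fields is what lets a user filter by both topic and terms of use in a single query.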

Current Progress @ Brown, Drexel, & MIT

• Gathering examples of data-sharing licenses

• Parsing essential attributes

• Natural language processing, term clustering, and categorization

• Initial data-sharing platform (DataHub)

• Added support for access control & authorization

• Exploring research issues related to anonymization & de-identification of PII

