Section 3: Commons: Lessons Learned, Current State
The Big Data to Knowledge (BD2K) Guide to the Fundamentals of Data Science
Vivien Bonazzi
Senior Advisor for Data Science & the Data Commons, National Institutes of Health, Bethesda
February 3, 2017
Vivien Bonazzi
Leads the Data Commons efforts within the NIH.
Serves on the NIH Big Data to Knowledge (BD2K) executive committee.
Dr. Bonazzi received a B.Sc. in Medical Laboratory Science from the University of Canberra, Australia, an M.Sc. (prelim) in Pharmacology from the University of Melbourne, Australia, and a Ph.D. in Molecular Pharmacology and Computational Biology, also from the University of Melbourne.
Served as a Program Director for the computational biology and bioinformatics program at the National Human Genome Research Institute (NHGRI).
Was part of the Human Microbiome Project (HMP), a trans-NIH Common Fund initiative. She was responsible for the bioinformatics and computational aspects of the project, as well as managing several of the computational tools awards. She has held positions as R&D Director for Bioinformatics at Invitrogen and Director of Gene Discovery at Celera Genomics, where she was part of the team that sequenced and annotated the human, mouse and Drosophila genomes.
Let's Talk About Biomedical Big Data
What Makes Big Data Big?
VOLUME, VELOCITY, VARIETY, VERACITY
It's a signal of the coming Digital Economy
DATA has VALUE
DATA is CENTRAL to the Digital Economy
But it's more than this...
An economy characterized by using data to gain a business advantage
(yes, institutions are a business)
Organizations that are not born digital will be at a disadvantage in the new economy
Organizations will be defined by their digital assets
Scientific digital assets: data, software, workflows, documentation, journal articles
The most successful organizations of the future will be those that can leverage their digital assets and transform them into a digital enterprise
Make data
The currency of an organization
Usable in a digital ecosystem: the Data Commons
The problem with biomedical data
Digital assets include data
Challenges: Biomedical Data
The journal article is the end goal
Data is a means to an end (low value)
Data is not FAIR: Findable, Accessible, Interoperable, Reusable
Limited e-infrastructures to support FAIR data
The Problem With Biomedical DATA
https://www.youtube.com/watch?v=N2zK3sAtr-4
What's Changing?
FAIR principles drive data to become the currency
Policies that promote data sharing via FAIR help change the culture
Currencies don't exist in a vacuum
Buy and sell Goods
We also need a digital ecosystem that allows transactions to occur on FAIR data at scale
The Data Commons is a platform that fosters the development of a digital ecosystem
Treats products of research (data, software, methods, papers, etc.) as digital assets (objects)
Digital objects need to conform to FAIR principles
Digital objects exist in a shared virtual space where users can find, deposit, manage, share and reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support them
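The producer/consumer interactions listed above can be illustrated with a toy, in-memory sketch. Everything in it is hypothetical: the `DigitalObject` and `ToyCommons` names, the keyword search, and the idea of counting reuse events as "credit" for producers. It is not an NIH Data Commons API, only an illustration of depositing, finding and reusing digital objects in one shared space.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    """A research product (data, software, workflow, paper) treated as a digital asset."""
    identifier: str                                   # hypothetical unique ID
    kind: str                                         # e.g. "data", "software", "workflow"
    producer: str                                     # who deposited the object
    keywords: frozenset = field(default_factory=frozenset)


class ToyCommons:
    """Hypothetical in-memory stand-in for a shared virtual space of digital objects."""

    def __init__(self):
        self._objects = {}              # identifier -> DigitalObject
        self._reuse_log = []            # (consumer, identifier) pairs
        self._credit = Counter()        # reuse events credited to each producer

    def deposit(self, obj):
        """Producer side: add a digital object; identifiers must be unique."""
        if obj.identifier in self._objects:
            raise ValueError(f"duplicate identifier: {obj.identifier}")
        self._objects[obj.identifier] = obj

    def find(self, keyword):
        """Consumer side: return objects tagged with `keyword`, sorted by identifier."""
        matches = [o for o in self._objects.values() if keyword in o.keywords]
        return sorted(matches, key=lambda o: o.identifier)

    def reuse(self, identifier, consumer):
        """Consumer side: retrieve an object and credit its producer for the reuse."""
        obj = self._objects[identifier]   # raises KeyError if never deposited
        self._reuse_log.append((consumer, identifier))
        self._credit[obj.producer] += 1
        return obj

    def credit(self, producer):
        """Number of times objects from `producer` have been reused."""
        return self._credit[producer]


commons = ToyCommons()
commons.deposit(DigitalObject("obj-001", "data", "lab_a", frozenset({"microbiome"})))
commons.deposit(DigitalObject("obj-002", "workflow", "lab_b", frozenset({"microbiome", "qc"})))
print([o.identifier for o in commons.find("microbiome")])  # ['obj-001', 'obj-002']
commons.reuse("obj-002", consumer="lab_c")
print(commons.credit("lab_b"))  # 1
```

Counting reuse per producer is just one simple way to picture "giving currency" to digital assets and the people who develop them; a real commons would also need persistent storage, resolvable identifiers and access control.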
Is the Data Commons a platform that fosters the development of a digital ecosystem?
A nascent platform
A platform is a plug and play model that allows multiple participants (producers and consumers) to connect to it, interact with each other and create value
Sangeet Paul Choudary Platform Scale
A lot of what we see today uses a platform approach
Sangeet Paul Choudary Platform Scale
Platforms that utilize data as a central currency enable transactions between producers and consumers
The goal of a Data Commons Platform is to enable interactions between producers and consumers
Sangeet Paul Choudary Platform Scale
Producers create digital objects (data, tools, workflows) that are used by consumers
The Platform enables these transactions
Accommodates bioinformatics and non-bioinformatics users
To understand the Data Commons Platform (and how it works for biomedical data), we need to use a platform stack to help visualize the concept
A framework helps visualize the concept of the platform
Sangeet Paul Choudary Platform Scale
Platforms have 3 layers
NIH Data Commons: Platform Stack
https://datascience.nih.gov/commons
Platform stack (figure): Technology, Data, Network/marketplace
NIH Data Commons: Digital Asset Compliance (Making Things FAIR)
Initial Phase
Unique digital object identifiers, resolvable to the original authoritative source
Machine readable
A minimal set of searchable metadata
Clear access rules (especially important for human subjects data)
An entry (with metadata) in one or more indices
Future Phases
Standard, community-based unique digital object identifiers
Conform to community-approved standard metadata and ontologies for enhanced searching
Digital objects accessible via open standard APIs
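The Initial Phase criteria can be read as a checklist applied to each digital object record. A minimal sketch follows, assuming a record is a plain dictionary. The field names, the specific minimal metadata set, the access-rule vocabulary, and the use of JSON serializability as a stand-in for "machine readable" are all assumptions, since the slide does not define them. The check also only confirms that the source is an http(s) URL; it does not actually resolve it.

```python
import json
from urllib.parse import urlparse

# Hypothetical minimal metadata set; the compliance criteria do not name the fields.
MINIMAL_METADATA = ("title", "description", "keywords")
# Hypothetical access-rule vocabulary.
ACCESS_RULES = {"open", "controlled"}


def initial_phase_gaps(record):
    """Return the Initial Phase criteria that a digital-object record does not yet meet."""
    gaps = []
    if not record.get("identifier"):
        gaps.append("unique identifier")
    # Only checks that the source looks like an http(s) URL; no network lookup is made.
    source = urlparse(record.get("source") or "")
    if source.scheme not in ("http", "https") or not source.netloc:
        gaps.append("resolvable to authoritative source")
    # Assumption: "machine readable" approximated as "serializable to JSON".
    try:
        json.dumps(record)
    except (TypeError, ValueError):
        gaps.append("machine readable")
    metadata = record.get("metadata") or {}
    missing = [key for key in MINIMAL_METADATA if not metadata.get(key)]
    if missing:
        gaps.append("minimal metadata: missing " + ", ".join(missing))
    if record.get("access") not in ACCESS_RULES:
        gaps.append("clear access rule")
    if not record.get("indices"):
        gaps.append("entry in at least one index")
    return gaps


record = {
    "identifier": "example-0001",
    "source": "https://repository.example.org/objects/0001",
    "metadata": {"title": "Example dataset", "description": "Toy record", "keywords": ["demo"]},
    "access": "controlled",
    "indices": [],
}
print(initial_phase_gaps(record))  # ['entry in at least one index']
```

Returning the list of unmet criteria, rather than a single pass/fail flag, makes it easy to report what a producer still has to do before an object counts as compliant.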
Data Commons Platform drives digital ecosystem
The NIH Data Commons Pilot
Co-location of large and/or highly utilized NIH-funded data with storage and computing infrastructure, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.
Investigators will be able to collaborate and share digital objects within this environment and connect with others
Other Data Commons
An NIH-Wide Data Commons Pilot: Example
Indexing
Authorization/authentication layer
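The indexing and authorization/authentication layers can be pictured as two steps at query time: the index finds matching objects across commons, and the authorization layer removes controlled-access objects the user has not been approved for. This is a hypothetical sketch; the `IndexEntry` fields, the "open"/"controlled" labels and the `approvals` set are assumptions, and it assumes the user has already been authenticated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One hypothetical index record pointing at an object held in some commons/cloud."""
    identifier: str
    commons: str          # which commons (cloud) holds the object
    keywords: frozenset
    access: str           # "open" or "controlled" (assumed vocabulary)


def search(index, keyword, approvals):
    """Index step plus authorization step.

    Returns sorted identifiers that match `keyword` and that the caller may see.
    `approvals` is the set of controlled-access identifiers granted to this
    (already authenticated) user; any other non-open entry is withheld.
    """
    hits = []
    for entry in index:
        if keyword not in entry.keywords:
            continue  # indexing: not a match
        if entry.access == "open" or entry.identifier in approvals:
            hits.append(entry.identifier)  # authorization: allowed
    return sorted(hits)


index = [
    IndexEntry("obj-1", "commons-a", frozenset({"rna-seq"}), "open"),
    IndexEntry("obj-2", "commons-b", frozenset({"rna-seq"}), "controlled"),
    IndexEntry("obj-3", "commons-b", frozenset({"wgs"}), "controlled"),
]
print(search(index, "rna-seq", approvals=set()))      # ['obj-1']
print(search(index, "rna-seq", approvals={"obj-2"}))  # ['obj-1', 'obj-2']
```

Anything not labeled "open" is withheld unless explicitly approved, so an unexpected access label fails closed rather than exposing controlled data.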
Digital Ecosystems
Considerations
Metrics: understanding and accounting of data usage patterns
Cost: cloud storage; pay-for-use cloud compute (NIH credits pilot); indirect costs for cloud
Hybrid clouds: institutional (private) and commercial (public) clouds
Managing open vs. controlled access data
Auth: single sign-on (dreams/nightmares?)
Archive vs. working copies of data, and versioning
Interoperability with other Commons (clouds)
Standards: metadata, UIDs, APIs
Discoverability: finding digital objects across clouds
Interfaces: for users with different needs and capabilities
Consent: re-consenting data
Policies: data sharing policies that are useful and effective, and that keep pace with the use of technology (e.g., dbGaP data in the cloud)
Incentives: access to, and shareability of, FAIR data as part of NIH grant review criteria
Governance: community involvement in governance models
Sustainability: long-term support
Acknowledgments
ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS), Ron Margolis
NCBI: George Komatsoulis
NHGRI: Valentina di Francesco, Ajay Pillai
NIGMS: Susan Gregurick
CIT: Andrea Norris, Debbie Sinmao
NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
NCI: Ian Fore, Sean Davis, Warren Kibbe, Tony Kerlavage, Tanja Davidsen
NIAID: Maria Giovanni, Alison Yao, Eric Choi, Claire Schulkey
NHLBI: Weiniu Gan, Alastair Thomson
NIH Clinical Centre: Elaine Ayres (BITRIS)
NIBIB: Vinay Pai (DK)
OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
Research and Industry: Mathew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
Stay in Touch
QR Business
[email protected]
Slideshare
Blog (coming soon!)
Vivien Bonazzi