Allotrope Foundation: Driving Metadata & Master Data Management through Improved Data Modeling with Semantic Technologies

Dana Vanderwall, Ph.D.
Director, Biology & Preclinical IT (BMS)
Vice Chair, Board of Directors (Allotrope)

Eric Little, Ph.D.
Vice President, Data Science (OSTHUS)
Adjunct Professor (NYU Polytechnic School of Engineering)

©2016 Allotrope Foundation



Page 2: Allotrope foundation vanderwall_and_little_bio_it_world_2016

The Current Situation in the Lab

Many challenges exist for data to be captured, integrated and shared:

• Data silos
• Incompatible instruments and software systems
• Legacy architectures are brittle and rigid
• SME knowledge resides in people’s heads
• Data schemas are not explicitly understood
• Lack of common vision between business units and scientists

Page 3: Allotrope foundation vanderwall_and_little_bio_it_world_2016

How do we change that?

[Diagram. Labels: Vendor-Specific Formats; Data in Standard Format; Metadata in a Standard Vocabulary; Regulatory Guidance, Methods, Recipes, SOPs, …; Process, Material, Equipment, Result]

Page 4: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Allotrope Foundation: Driving the Change

• Member Companies
  – Subject Matter Experts
  – Project Funding
• Secretariat
  – Project Management
  – Legal & Logistical Support
• Professional Software Firm
  – Framework Development
  – Technical Leadership
• Partner Network
  – Requirements & Specifications
  – Contributions, PoC Applications

Member companies: AbbVie, Amgen, Baxter, Bayer, Biogen, Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly, Genentech/Roche, GlaxoSmithKline, Merck & Co., Pfizer

Partner network (industry): ACD/Labs, Agilent, Biovia, Bruker, BSSN, IDBS, LabAnswer, LabVantage, LEAP Technologies, Mestrelab Research, Mettler Toledo, PerkinElmer, Persistent Systems, Riffyn, Sartorius, Shimadzu, Tetra Science, Thermo Scientific, Waters

Partner network (academic and public): Erasmus Univ. Med Center, J. Paul Getty Trust, (UK) Science and Technology Facilities Council, University of Southampton, University of Strathclyde

Page 5: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Allotrope Data Format (ADF)

• Data Description (RDF model). Contains:
  – Method, instrument, sample, process, result, etc.
  – Data cube metadata
  – Data package metadata
  – …
• Data Cubes (universal data container): analytical data represented by one- or multidimensional arrays.
• Data Package (virtual file system *): analytical data represented by arbitrary formats, incl. native instrument formats, images, PDF, video, etc.
• HDF5 (platform-independent file format): specifically designed to store and organize large amounts of numerical data.
• APIs (Java & .NET class libraries)

v1.0 ADF, Taxonomies, Class Libraries released Sept 2015; v1.1 April 2016
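
The three ADF layers listed on this slide can be pictured with a minimal Python sketch. This is not the Allotrope class library (the slide names only Java and .NET APIs) and it does not write HDF5. The class `AdfLikeFile`, its method names, and every `ex:` identifier are hypothetical and exist only to show how a description layer, numeric arrays, and arbitrary files might sit side by side in one container.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the three ADF layers named on the slide.
# It is NOT the Allotrope API and it does not produce an HDF5 file.

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class AdfLikeFile:
    # Data Description: metadata held as RDF-style triples.
    description: List[Triple] = field(default_factory=list)
    # Data Cubes: named one- or multidimensional numeric arrays.
    cubes: Dict[str, list] = field(default_factory=dict)
    # Data Package: virtual file system mapping paths to raw bytes.
    package: Dict[str, bytes] = field(default_factory=dict)

    def describe(self, subject: str, predicate: str, obj: str) -> None:
        self.description.append((subject, predicate, obj))

    def add_cube(self, name: str, values: list) -> None:
        self.cubes[name] = values
        # Record cube metadata (length of the outermost dimension)
        # in the description layer as well.
        self.describe(f"ex:{name}", "ex:hasOuterLength", str(len(values)))

    def add_file(self, path: str, content: bytes) -> None:
        self.package[path] = content
        # Record package metadata (size in bytes) in the description layer.
        self.describe(f"ex:{path}", "ex:hasByteSize", str(len(content)))


adf = AdfLikeFile()
adf.describe("ex:run1", "ex:usedInstrument", "ex:hplc42")
adf.add_cube("chromatogram", [0.0, 0.4, 2.1, 0.3])
adf.add_file("raw/vendor.bin", b"\x00\x01\x02")
print(len(adf.description))
```

The point of the sketch is the design choice the slide describes: cube and package metadata live in the same description layer as the method, instrument, and sample metadata, so one query surface covers all three.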

Page 6: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Moving from Data Format to Semantics

Semantics:
• Has its origins in philosophy; generally understood as the abstract study of meaning
• Distinguished from syntax, which is the rules-based grammar of a language

Example term: “Washington”

Page 7: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Allotrope Foundation Taxonomies (AFT)


Page 8: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Allotrope Taxonomies Standardize our Metadata

[Figure. Labels: Result, Process, Equipment]

Page 9: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Utilizing the Semantic Spectrum (Moving Beyond Taxonomies)

• Code (Lists), Terms (Soil, Plant, etc.)
• Controlled Vocabulary (Agreed Upon Terms)
• Taxonomy (Hierarchy)
• Thesaurus (Preferred Labels, Synonyms, etc.)
• RDF Models (Triples as Graphs)
• OWL Ontologies (RDF + Axioms)
• Reasoning (Rule-based Logics: Discover New Patterns)

Ontologies and Reasoning add Axioms and Advanced Logic
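
To make the step from taxonomy to triples to reasoning concrete, here is a small Python sketch. It uses only the standard library, not an RDF or OWL toolkit, and the terms in it are hypothetical rather than taken from the Allotrope taxonomies. The two rules are a simplified stand-in for the subclass entailment that RDFS-style reasoners provide.

```python
# Hypothetical terms; none are taken from the Allotrope taxonomies.

# Taxonomy: a plain hierarchy (child -> parent).
taxonomy = {
    "HPLC-UV": "Liquid Chromatography",
    "Liquid Chromatography": "Chromatography",
    "Chromatography": "Analytical Technique",
}

# RDF-style model: the same hierarchy as triples, plus instance data.
triples = {(child, "subClassOf", parent) for child, parent in taxonomy.items()}
triples.add(("run-001", "type", "HPLC-UV"))


def infer(facts):
    """Apply two rules until no new triples appear (forward chaining).

    Rule 1: subClassOf is transitive.
    Rule 2: an instance of a class is also an instance of its superclasses.
    """
    facts = set(facts)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                # Only chain through a subClassOf edge that starts where
                # the first triple ends.
                if p2 != "subClassOf" or b != c:
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    new.add((a, "type", d))
        if new <= facts:
            return facts
        facts |= new


inferred = infer(triples)
print(("run-001", "type", "Analytical Technique") in inferred)
```

The taxonomy alone only stores the hierarchy. Once the same facts are triples and rules are applied, a query for every "Analytical Technique" run also finds `run-001`, although that fact was never stated directly.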

Page 10: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Understanding the 4V’s of Big Data

• Volume: normally the focus, but Big Data analysis is more than just size
• Velocity: performance is critical to success
• Variety: data complexity is increasing (model complexity)
• Veracity: uncertainty abounds; requires statistics and probabilities

Callouts on the slide:
• Majority of Big Data analytics approaches treat these two V’s
• Semantic technologies provide clear advantages
• Mathematical clustering techniques provide clear advantages

Page 11: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Why Semantics Matters for Data Analytics

• Big Data approaches require proper metadata and terminologies to integrate information well
• Relationships matter in the data
• Understanding perspective (context) is crucial for success in today’s world
• Semantics provides better data models/schemas
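
The first point, that integration depends on shared metadata terms, can be illustrated with a short Python sketch. All field names, vocabulary terms, and records below are hypothetical; the sketch only shows the general idea of renaming vendor-specific keys to an agreed vocabulary before combining records, not any Allotrope tooling.

```python
# Hypothetical vendor field names and vocabulary terms, for illustration only.
VENDOR_A_TO_STANDARD = {
    "SampleName": "sample id",
    "Inj Vol (uL)": "injection volume",
    "Instr": "instrument",
}
VENDOR_B_TO_STANDARD = {
    "sample_id": "sample id",
    "injection_volume_ul": "injection volume",
    "device": "instrument",
}


def standardize(record, mapping):
    """Rename vendor-specific keys to shared vocabulary terms.

    Keys with no agreed term are kept under 'unmapped' rather than dropped.
    """
    out = {"unmapped": {}}
    for key, value in record.items():
        if key in mapping:
            out[mapping[key]] = value
        else:
            out["unmapped"][key] = value
    return out


record_a = {"SampleName": "S-17", "Inj Vol (uL)": 5.0, "Instr": "LC-01", "Operator": "jd"}
record_b = {"sample_id": "S-17", "injection_volume_ul": 5.0, "device": "LC-02"}

a = standardize(record_a, VENDOR_A_TO_STANDARD)
b = standardize(record_b, VENDOR_B_TO_STANDARD)

# Once both records use the same terms, they can be joined on a shared key.
same_sample = a["sample id"] == b["sample id"]
print(same_sample)
```

Keeping unmapped fields visible, instead of silently discarding them, shows where the vocabulary still has gaps.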

Page 12: Allotrope foundation vanderwall_and_little_bio_it_world_2016

The Foundation for Real Data Analytics on the Laboratory Workflow and Data

[Workflow diagram. Labels as extracted:]

• Workflow steps: Plan Analysis; Prepare Samples; Submit Samples; Control Inst., Acquire Data; Process Data; Analyze Data; Report Results; Store, Archive Data; Request Report; Search & Reuse Data
• Data along the workflow: Analytical Method; Sample Prep Data; Instrument Instructions; Instrument Data; Processed Data; Analyzed Data; Reported Results; Stored Data
• Allotrope Data Format: Data Description (RDF model); Data Cubes (universal data container); Data Package (virtual file system)

Page 13: Allotrope foundation vanderwall_and_little_bio_it_world_2016

How is the Framework Being Used? Implementation by Member Companies

[Table flattened in extraction. Column headings: Member; Research, Development, Commercial (non-GMP, GMP); Instrument. Row alignment is only partly recoverable.]

• Members listed: BMS, Bayer, Baxter, Merck & Co., Amgen, Boehringer-Ingelheim, GSK, Genentech, Biogen, Pfizer
• Use cases listed (instrument in parentheses where the extraction kept the pair together):
  – Drug Substance Release & Stability
  – Structure ID, Purification, In vitro bioanalysis
  – Method Screening (HPLC-UV/MS)
  – HPLC-UV, Balance
  – HPLC-UV/MS
  – Structure ID (HPLC-MS)
  – Fermentation Process Control (Bioanalyzer)
  – Small and Large Molecule CMC
  – Elemental Impurities (ICP-MS)
  – Assay, Purity (HPLC-UV)
  – Biogen: CRO Integration (HPLC-UV)
  – Pfizer: LC Data to ADF Converter/Adapter (HPLC-UV)
• Other techniques listed: pH, Weighing, GC, Karl Fischer, TGA, NMR, Cell Density/Viability, Blood Gas Analyzer, Cell Culture Analyzer, Capillary Electrophoresis…

Page 14: Allotrope foundation vanderwall_and_little_bio_it_world_2016

How is the Framework Being Used? Implementations by Member Companies

[Same table as on the previous page, with member names anonymized as Member 1 to Member 10. Row alignment is only partly recoverable from the extraction.]

• Use cases listed (instrument in parentheses where the extraction kept the pair together):
  – Drug Substance Release & Stability
  – Structure ID, Purification, In vitro bioanalysis
  – Method Screening (HPLC-UV/MS)
  – HPLC-UV, Balance
  – HPLC-UV/MS
  – Structure ID (HPLC-MS)
  – Fermentation Process Control (Bioanalyzer)
  – Small and Large Molecule CMC (Multiple types)
  – Elemental Impurities (ICP-MS)
  – Assay, Purity (HPLC-UV)
  – Member 4: CRO Integration (HPLC-UV)
  – Member 10: LC Data to ADF Converter/Adapter (HPLC-UV)
• Other techniques listed: pH, Weighing, GC, Karl Fischer, TGA, NMR, Cell Density/Viability, Blood Gas Analyzer, Cell Culture Analyzer, Capillary Electrophoresis…
• A second, reduced view lists Members 1, 6, 8, 9 and 10 with: Drug Substance Release & Stability; Structure ID, Purification, In vitro bioanalysis; Method Screening (HPLC-UV/MS); HPLC-UV, Balance; HPLC-UV/MS; Member 10: LC Data to ADF Converter/Adapter (HPLC-UV); Member 1: Small and Large Molecule CMC (Multiple types)
• Additional labels: taxonomies, methods repository, data repository adapter, instrument adapter

Page 15: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Smart Labs for the 21st Century

Smart labs in the future will provide the enterprise with:

• Integrated Data – common reference data structures (vocabularies)

• Sharable Data – easier interaction across teams and business units

• Scalability – Big data applications that can be highly elastic

• Conceptual Representations – context and perspective are captured

• Advanced Analytics – complex & automated problem-solving capabilities

Page 16: Allotrope foundation vanderwall_and_little_bio_it_world_2016

Thank you!

• For any questions, please contact the Secretariat at [email protected] or [email protected]
• 2016 Workshops:
  – January 20, 2016: San Francisco, CA @ Genentech
  – June 2016: Ingelheim, Germany @ Boehringer Ingelheim
  – September 2016: Indianapolis, IN @ Eli Lilly and Co.

Visit http://www.allotrope.org for more information and to register.