Jan. 22, 2013 | © 2013 Modus Operandi, Inc. | 1

The Evolution of Semantic Technologies-The Value of Merging Smart Data With Big Data

Eric Little, PhD

VP – Chief Scientist

Who is Modus Operandi?

Privately held small business headquartered in Melbourne, FL.

• Satellite locations in Aberdeen, MD and Ft. Huachuca, AZ.

• 82% of employees possess a security clearance.

U.S. Government is our primary customer

• Expanding into select commercial markets

IT’s common challenge

TOO BIG - Too much data or too many variables

SILOED - Data is in legacy silos so nothing is integrated

LOST EXPERTISE - SME info is lost in people’s heads

NO EXCHANGE - No good processes for data exchange

NO VIZ - No good ways to visualize data

NO QUALITATIVE - Cannot use statistical tools to get qualitative answers

DIRTY DATA - Too many errors in the data

NO RULES - No way to capture business rules without big coding effort

NO VOCAB - No good vocabularies exist to capture data elements

MANY MODELS - Too many data models to be controlled effectively

MODUS APPROACH

Rather than build one type of technology, we realize the need for an end-to-end platform to provide solutions for our customers

The “Cognitive Evolution” of Intelligent Software

Semantic technologies are part of an IT evolution from code to data centricity

• In the Code-Centric years, data was often stored in flat files with no structure, while complex, procedural “edit” programs contained all knowledge about the data

• The creation of databases, specifically Network and RDBMS, was one of the first steps leading to Data-Centric evolution

• The last decade has seen standards such as XML, RDF, Web services, and now OWL, that further evolve IT to a Data-Centric environment

Big data and scalability are now helping to shape semantic technology at large scale.

Big data science

Retrieval at Scale is most important

New & Expanding Tech Areas

The past few years have seen a significant rise in new tech fields

• data science, big data analytics, semantic technologies, natural language processing, graph computing, and systemics

These areas provide new paradigms for data analysis and integration

These are driving new innovations in the ways people can access and use their data.

Innovation is Key in These Types of Tech Spaces

The idea seems straightforward and easy

• But it is difficult to find true spots of Blue Ocean

Requires new approaches that are taken from numerous disciplines

Small businesses need to compete by focusing and being disruptive

• Being disruptive involves the counterintuitive approach of focusing on specific market segments

• Requires an ability to be nimble and respond quickly to needs (iterative prototyping)

• Every wave is different – reading the wave is key

SEMANTIC TECHNOLOGY

Semantics and Reasoning

Semantic Approach Improves Data Access

Traditional Approach

Database Experts

Domain Experts & Scientists

Systems Engineers

Management & Executives

• Manual Data Correlation

• Manual Report Generation (High Potential for Error)

Semantic Approach

Ontology Engine

Domain Experts & Scientists

Systems Engineers

Management & Executives

• Integrated Classifications/Schemas

• Automated Reasoning Capabilities (Significant Error Reduction)

Semantic Approach Simplifies Queries

Traditional Approach

Database Experts

Reasoning is done on the user side for each query

Query Must Contain:

1. Data Requirements

2. All Logic Required to Relate the Data (Rules, Joins, Decode, Sub-queries, etc.)

Complexity: HIGH

Reusability: LOW-MED

Semantic Approach

Database Experts

Scientists, Systems Engineers

Management & Executives

Reasoning is performed by Ontobroker within the system

Query Must Contain:

1. Data Requirements only

Complexity: LOW

Reusability: HIGH (Logic embedded in Model)
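
The contrast on this slide can be sketched in a few lines of Python. This is a toy illustration only: it does not use Ontobroker, and the class hierarchy, instance names, and function names are all hypothetical. The point it shows is that when the relating logic (here, subclass reasoning) lives in the model, the query itself only has to name the data it wants.

```python
# Toy illustration only: a hand-rolled class hierarchy, not Ontobroker.
# All class and instance names below are hypothetical.
SUBCLASS_OF = {
    "Acetaminophen": "Analgesic",
    "Analgesic": "Drug",
}
INSTANCE_OF = {
    "order-1": "Acetaminophen",
    "order-2": "Drug",
}


def superclasses(cls):
    """Yield cls followed by each ancestor recorded in SUBCLASS_OF."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def instances_of(cls):
    """Answer a query that states only the data requirement (a class name).

    The subclass reasoning is applied here, inside the model layer,
    so callers never have to spell it out in their queries.
    """
    return sorted(
        item
        for item, asserted in INSTANCE_OF.items()
        if cls in superclasses(asserted)
    )
```

A caller asking for `instances_of("Drug")` gets back both orders, even though only one was asserted directly as a `Drug`; the other is found through the hierarchy stored in the model.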

Building Semantic Profiles From Raw Data

• Key data elements are identified – creating lexicon of important terms

• Data elements are categorized into appropriate classes – ranges are captured for autoclassification

• Can be applied to any type of data elements: equipment, reports, products, processes, etc.

• Advanced logics allow for reasoning over data sets such that new patterns and information can be gained
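
As a rough sketch of what "ranges are captured for autoclassification" could look like in practice, the Python below assigns records to classes by checking an attribute value against stored ranges. The class names, the `weight_kg` attribute, and the thresholds are invented for illustration and are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRange:
    """A class definition that captures the value range used to classify."""
    name: str
    attribute: str
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound


# Hypothetical classes and thresholds, for illustration only.
RANGES = [
    ClassRange("LightVehicle", "weight_kg", 0, 3500),
    ClassRange("HeavyVehicle", "weight_kg", 3500, float("inf")),
]


def autoclassify(record):
    """Return the names of every class whose range contains the record's value."""
    return [
        r.name
        for r in RANGES
        if r.attribute in record and r.low <= record[r.attribute] < r.high
    ]
```

Because the ranges are data rather than code, adding a new class means adding one entry to `RANGES`; the classification function does not change.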

Utilizing Semantics to Integrate Disparate Medical Data

807862 ACETAMINOPHEN (TYLENOL) LEVEL

855406 Acetaminophen (Tylenol) Level

2273960 Acetaminophen (Tylenol) Level

7543180 ACETAMINOPHEN (TYLENOL) LEVEL

253965 Acetaminophen [ Tylenol]

512270 Acetaminophen + Codeine

3016745 Acetaminophen + Codeine

6075682 Acetaminophen + Codeine

6327790 Acetaminophen + Codeine

1688184 Acetaminophen + Codeine (120mg-12mg/5ml) (NF) Liquid

1701785 Acetaminophen + Codeine (120mg-12mg/5ml) (NF) Liquid

3967939 Acetaminophen + Codeine (120mg-12mg/5ml) (NF) Liquid

7271363 Acetaminophen - 325 mg PO q4h PRN Temp >101

7881183 Acetaminophen - 650 mg PO q6h PRN Pain

64654 ACETAMINOPHEN SUPP 325 MG SUPP

4851508 ACETAMINOPHEN SUPP 325 MG SUPP

9870184 Acetaminophen Tab 325MG

679752 ACETAMINOPHEN TAB 325 MG TAB

1715007 ACETAMINOPHEN TAB 325 MG TAB

2292336 ACETAMINOPHEN TAB 325 MG TAB

3914196 ACETAMINOPHEN TAB 325 MG TAB

6768031 ACETAMINOPHEN TAB 325 MG TAB

8163956 ACETAMINOPHEN TAB 325 MG TAB

9629590 ACETAMINOPHEN TAB 325 MG TAB

6802504 Acetaminophen (160mg/5ml) Suspension

Hospital 1 Data

Hospital 2 Data

• Disparate data sources can be ingested by the system and automatically classified into their appropriate class, attributes, etc.

• The models only need to be developed initially with the help of medical SMEs (as opposed to continuous point-to-point mappings with traditional systems).

Common Data Model
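
A minimal Python sketch of this idea, using a handful of the record labels shown on this slide: each raw label is mapped to a concept key in a small common model, so differently spelled entries end up in the same class. The ingredient lexicon, the `lab_test` / `medication` split, and the function names are assumptions made for illustration; the actual classification is driven by ontologies built with medical SMEs.

```python
import re

# Hypothetical ingredient lexicon; a real model would come from medical SMEs.
INGREDIENTS = ("acetaminophen", "codeine")

# A few raw (id, label) pairs copied from the slide.
RECORDS = [
    ("807862", "ACETAMINOPHEN (TYLENOL) LEVEL"),
    ("855406", "Acetaminophen (Tylenol) Level"),
    ("512270", "Acetaminophen + Codeine"),
    ("679752", "ACETAMINOPHEN TAB 325 MG TAB"),
]


def classify(label):
    """Map a raw label to a (kind, ingredients) key in the common model."""
    text = label.lower()
    # Assumption: labels ending in "level" denote lab tests, not drug orders.
    kind = "lab_test" if re.search(r"\blevel\b", text) else "medication"
    ingredients = tuple(
        name for name in INGREDIENTS if re.search(rf"\b{name}\b", text)
    )
    return kind, ingredients


def group_by_concept(records):
    """Group record ids under the concept key their label classifies to."""
    groups = {}
    for record_id, label in records:
        groups.setdefault(classify(label), []).append(record_id)
    return groups
```

Here the two differently cased "Level" entries land under one lab-test concept, while the combination product and the plain tablet are kept apart. Adding a new source only requires that its labels classify into the existing keys, rather than a new point-to-point mapping.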

Classification Schemas Must Reflect Subject Matter Expertise

• SMEs are often ill-equipped to capture their knowledge semantically

• Knowledge can be captured in ontologies (as attributes, advanced relationships, etc.), but this requires a separate skill set

• Multiple ontologies can be integrated to capture enterprise-wide applications for advanced business intelligence

Federated Ontology Layers Allow for Advanced Data Modeling

Putting It All Together Into A Platform

Unstructured Outcomes Data

Structured Data

Customizable User Interfaces

Ontology Engine

BIG DATA – NOW THAT YOU HAVE SEMANTICS, HOW TO SCALE…

The Problem of Big Data is Real (And Closing In)

The past couple of decades have been spent on data gathering and storage

Most Data Stores were not built to get data out

The new push is connecting data

New high-performance systems are required to meet those needs

• Data solutions must be big, smart, and easy to deploy

Big Data Analytics Challenge for Intelligent Systems

Data analysis in many spaces requires near-real-time decision support tools.

Connecting the dots is paramount to successful and effective analysis

This requires new techniques that combine robust data modeling and linkage (e.g., graphs) with high-performance computing capability

Capturing Complex Data Is Difficult

People are now attempting to utilize their data like never before.

• Semantics has shown significant promise but has not scaled well in the past.

• Entities, attributes, locations, temporal signatures, etc. result in data explosions

Breakthroughs in cloud computing and high performance graph stores are providing new ways to innovate data science.

Multiple users can now apply perspectives

Can be driven to an entire enterprise

Built on Standards-based Approaches

Scaling semantics

Semantics has not scaled well in the past

Entities, inferred data, facets, over time, with quality attribution,… = a data explosion

Our newest breakthroughs in cloud computing and high performance graph stores allow semantics-at-scale

BIG GRAPHS

Scaling semantics (cont.)

Enterprise-level graph computing requires cutting edge technology components

Data Ingest at Cloud scale – must be able to ingest millions of entities and thousands of documents per second. (Modus Operandi Wave Engine)

Data Storage (Triple Stores and Cloud DBs)

• 60 billion triples, sub-second queries, thousands of unstructured docs processed per second

Data Traversal (High-performance UI’s) – app stores and BI tools to provide a diverse user experience

High-performance triple store

Semantic Reasoner

Graph-based

Appliances
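
For readers unfamiliar with the term, a triple store holds facts as (subject, predicate, object) triples and answers queries by pattern matching over them. The Python below is a deliberately tiny in-memory sketch of that data structure; the class and method names are hypothetical, and it says nothing about how the high-performance stores named in this deck are implemented or how they reach the scale quoted above.

```python
class TripleStore:
    """Minimal in-memory triple store; None acts as a wildcard in match()."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one (subject, predicate, object) fact; duplicates collapse."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all stored triples matching the pattern, in sorted order."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(
                want is None or want == got
                for want, got in zip(pattern, triple)
            )
        )
```

A full scan per query, as done here, is only workable for toy data; production stores rely on indexing and distribution to keep queries fast as the number of triples grows.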

Avoiding the Hype Cycle

EASY-TO-USE DATA

EASY SMART BIG

Easy UI’s Leverage Common Models

Our user interfaces are designed around common use models

HDFS / Hadoop / MapReduce

Accumulo Key Value Store

Semantic search

Geospatial views

Semantic Wiki – collaborate

Timelines

Explore Visualizations

Large-scale semantic triple stores with reasoning

Driving the Knowledge to Multiple Users

Combining software tools in innovative ways allows for multiple users to view the same data at once.

• These technologies are providing new platforms that are driving new ways to utilize advanced analytics like never before

Information can be driven to multiple users in near-real-time for improved decision support

End Users

Visualizing patterns

Correlations, associations and patterns require special purpose visualizations

Our ExtJS/Ozone framework enables fast assembly of point solutions

Pattern recognition leads to prediction

Prediction leads to prevention

Delivering Big Data Results in Highly Intuitive UIs is Key

Complex data does not require complex UIs

• Many new tech innovations involve simple, intuitive front ends (apps)

Users must be able to quickly manipulate information

Must be able to quickly derive answers

Different technologies must be integrated into a common look and feel

Providing an End-to-End Solution

Many companies our size provide a capability or two

• Modus Operandi provides a complete platform for a multitude of user applications (and growing).

Information can be ingested from nearly any source (structured, semi-structured or unstructured).

• Common models such as UMLS, Ucore-SL, BFO, etc.

• Custom models can be created based on project specifics.

Information is stored in a high-performance graph knowledge base (we can integrate numerous ones; currently using Bigdata, Rya and AllegroGraph).

Results can be driven to a wide variety of easy-to-use UI’s that can be highly customized to fit user needs.

Smart + Big + Easy provides a new means to successfully apply semantic technologies to large scale graph computing.

THANK YOU

QUESTIONS?