Big Data in Disease Management

1

Big Data in Disease Management

Mohamood Adhil, InterpretOmics, Bangalore

9th Cloud Computing and Big Data Analytics, 17th March 2016

2

Big Data?

Big data is not only about size

The term “Big Data” was coined when data began to grow exponentially and became too difficult to process with conventional software tools to extract meaningful information

Big data analytics is used in many fields, such as e-commerce, health care, astronomy, politics, weather, media, and research

www.interpretomics.co

3

Big Data to Knowledge

f(big data) = knowledge

[Diagram: Big data, through Hypothesis Validation and Hypothesis Creation, leads to Knowledge]

Hypothesis Validation: We already know something and need to prove it.

Hypothesis Creation: We don't know anything yet and want to find something in the data. This is also known as Exploratory Data Analysis (EDA).


4

Causal Relationship

The main objective of analyzing data in any field is to find the cause of an event. For example:

What causes a company's stock value to fall?

What causes crime to occur frequently in a particular place?

Variables (v) are identified from the data to find the cause of an effect, which can then be used to alter the event in the future

Data: v1, v2, v3, ..., vn


5

Interesting Discoveries of Causal Factors

A 2007 study by the US-based National Institute of Neurological Disorders and Stroke found that, in families affected by Parkinson's disease, those who drank a lot of coffee were less likely to develop the disease

In the early 1900s, the incidence of lung cancer was on the rise but no one really knew why. German physician Fritz Lickint published a paper in which he showed that lung cancer patients were particularly likely to have been smokers

Positive Correlation

http://archneur.jamanetwork.com/article.aspx?articleid=793724

http://www.statisticsviews.com/details/feature/7914611/A-Day-in-the-Life-of-Explanatory-Variables-and-Confounding-Factors.html

6

Correlation Is Not Always Causation

In the US, the number of people eating ice cream correlates positively with the number of deaths by drowning

Worldwide non-commercial space launches correlate with sociology doctorates awarded (US)

Japanese passenger cars sold in the US correlate with suicides by motor vehicle crash

Confounding Factors

http://www.dailymail.co.uk/sciencetech/article-2640550/Does-sour-cream-cause-bike-accidents-No-looks-like-does-Graphs-reveal-statistics-produce-false-connections.html
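The ice-cream and drowning example can be reproduced with made-up numbers. The sketch below is only an illustration on synthetic data: it assumes a single confounder (daily temperature) that drives both quantities, and all coefficients and noise levels are invented. The raw correlation is strong, but it largely disappears once the temperature effect is removed from both series (a partial correlation).

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return [b - (mean_y + slope * (a - mean_x)) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (assumption): temperature is the only driver of both series.
temperature = [rng.uniform(5, 35) for _ in range(1000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Correlation between the two series without accounting for temperature.
raw_r = pearson(ice_cream, drownings)

# Partial correlation: remove the temperature effect from both, then correlate.
partial_r = pearson(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In real data the confounder is rarely known in advance, so this kind of adjustment is only possible once a candidate confounding factor has been measured.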




7

Statistics: Key Points on Analyzing Big Data

Understand the sample size (data size and sample size are different)

Visualize the data before and after the analysis

Select the appropriate statistical model or tool based on the problem to be addressed

Don't look for patterns; discover patterns from the data (Exploratory Data Analysis)

Be aware of confounding factors
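The difference between data size and sample size can be illustrated with random numbers. The sketch below is a hypothetical setup, not taken from the talk: 20 samples, each measured on 2,000 variables, where every value is pure noise. The data set has 40,000 data points, yet with only 20 samples some noise variable will correlate strongly with the outcome by chance alone.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable

n_samples = 20      # sample size: number of individuals (assumed)
n_variables = 2000  # number of measurements per individual (assumed)

# Outcome and variables are independent noise: there is no real signal.
outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
variables = [
    [rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_variables)
]

data_points = n_samples * n_variables
strongest = max(abs(pearson(v, outcome)) for v in variables)

print(f"data points: {data_points}, sample size: {n_samples}")
print(f"strongest chance correlation: {strongest:.2f}")
```

A large file therefore says little about statistical power; the number of independent samples does.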


8

Bangalore - Breast Cancer Capital of India

City                  New cases per lakh (100,000)
Bangalore             36.6
Thiruvananthapuram    35.1
Chennai               32.6
Nagpur                32.5
Delhi                 32.2

Some of the proposed reasons, none backed by proper evidence, are rapid urbanization, late marriage, a declining trend in breastfeeding, contraceptive pills, and food habits

Are these factors really causal?


9

Genomics Data (Big Omics Data)

The complete set of DNA (chromosomes), including genic and non-genic regions, is known as the genome

The entire human genome contains about 3 billion bases

The genome is sequenced (NGS) to identify the variations responsible for a phenotype (for example, a disease)

The sequence data for one sample is approximately 10-20 GB, depending on the type of sequencing

DNA will define you

10

Genomics – Complex Puzzle


11

Genomics - Trillion Dollar Industry

New Jobs

Improved Healthcare

New Drugs

Start-Up Companies


12

Big Omics Challenges

Genomics

Data Processing

Storage

Compute

Functional Annotation

Statistics

Mathematical Models


13

Seven Dimensions of Genomics Data

Volume

Velocity

Variety

Veracity

Vexing

Variability

Value

General for all big data

Specific to Genomics data


14

Application of Genomics Data

Genomics data plays a crucial role from bench to bedside

Bench - Drug Discovery Process

Bedside - Genetic Testing for Precision Medicine

The main difference between bench and bedside is the number of samples: the bench usually has cohort data (N=n), while the bedside has data from a single individual (N=1)


15

Genetic Testing for Precision Medicine

Some of the popular genetic tests that use the NGS technique (Big Omics data) are:

Genetic Predisposition - To learn more about a person's genetic makeup and the odds of getting a disease

Disease Diagnostic - To diagnose a particular disease where diagnosis is otherwise difficult, as with rare disorders such as psychiatric and metabolic disorders

Drug Response Prediction (Pharmacogenomics) - To help select drugs based on genomic variations

These results are produced using evidence-based techniques

Analytical Engine

Genome-phenome Databases

10-1000 TB of data

5-10 GB

Genetic Report With Valid Evidence


16

Example - Genomics Data for Screening and Diagnosis

Applied Genetics Diagnostics (AppGenDx), Bangalore, is a next-generation healthcare company that offers genetic diagnostic services to hospitals, physicians, and healthcare organizations

InterpretOmics is the scientific partner providing sequencing and interpretation to Applied Genetics Diagnostics

Some of the tests from AppGenDx include:

Single Gene Test

Multi Gene Test

Multi Disease Test

OncoScreen

CarrierScreen ....

To know more: http://www.appgendx.com/


17

Case Study

Case 1

Patient: 33-year-old male with an ulcer in the buccal mucosa
Doctor Diagnosis: Oral Squamous Cell Carcinoma
Disease Causal Mutation using NGS: CDKN1A gene: c.93C>A; p.Ser32Arg, Heterozygous
Disease Reported: Oral Squamous Cell Carcinoma

Case 2

Patient: Female, 2 years 7 months old, with an unsteady walk and no specific disease diagnosed
Doctor Diagnosis: -
Disease Causal Mutation using NGS: AGRN gene: c.1072G>T; p.Gly358Trp, Heterozygous
Disease Reported: Myasthenic syndrome, congenital, 8, with pre- and postsynaptic defects
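The mutations in the two cases are written in HGVS-style notation: "c." gives the change in the coding DNA sequence and "p." gives the resulting amino acid change. The sketch below is a minimal, hypothetical parser for illustration only. It assumes simple single-base substitutions with three-letter amino acid codes, as in the two cases above, and does not cover the rest of the nomenclature (deletions, insertions, frameshifts and so on). The names `parse_variant`, `CodingChange` and `ProteinChange` are invented for this example.

```python
import re
from typing import NamedTuple, Tuple


class CodingChange(NamedTuple):
    position: int  # position in the coding DNA sequence
    ref: str       # reference base
    alt: str       # observed base


class ProteinChange(NamedTuple):
    ref: str       # reference amino acid (three-letter code)
    position: int  # position in the protein
    alt: str       # observed amino acid (three-letter code)


# Only simple substitutions are recognised (assumption for this sketch).
CODING_RE = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
PROTEIN_RE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_variant(text: str) -> Tuple[CodingChange, ProteinChange]:
    """Parse a string such as 'c.93C>A; p.Ser32Arg' into its two parts."""
    parts = [part.strip() for part in text.split(";")]
    if len(parts) != 2:
        raise ValueError(f"expected 'c.<...>; p.<...>', got {text!r}")
    coding = CODING_RE.match(parts[0])
    protein = PROTEIN_RE.match(parts[1])
    if coding is None or protein is None:
        raise ValueError(f"unsupported variant notation: {text!r}")
    return (
        CodingChange(int(coding.group(1)), coding.group(2), coding.group(3)),
        ProteinChange(protein.group(1), int(protein.group(2)), protein.group(3)),
    )


case_1 = parse_variant("c.93C>A; p.Ser32Arg")
case_2 = parse_variant("c.1072G>T; p.Gly358Trp")
print(case_1)
print(case_2)
```

Structured fields like these are what make it possible to look a variant up in a genome-phenome database instead of matching free text.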


18

Drug Development

Drug development is time-consuming: it takes approximately 15 years for a drug to enter the market

It requires a huge amount of money (~1 to 10 billion)

On average, 1 in 10 drugs in clinical development will be approved by the FDA

These three hurdles can be overcome by targeted drugs developed using big omics data, giving improved turnaround time, reduced cost, and increased success rate

Currently 42% of all drugs and 73% of oncology drugs in development are targeted drugs. This market is worth approximately $42 billion and should be worth over $60 billion by 2019.

(The Journal of Precision Medicine, Vol. 1, Issue 2, p. 31)


19

iOMICS – Unified Genomics Software Solution

Multi-omics, multi-scale data management, analysis, and interpretation software.

Developed for composite analysis needs and tested with numerous real data sets, this robust platform addresses the complexities of Life Sciences “Big Data” for driving actionable insights with unprecedented ease.

Cloud and On-Premise Version

Intuitive Analysis

Dynamic Visualisation

Supports in-house and third-party software and databases


20

21

iOMICS – Omnia (Knowledge Base)

Curation combines data and text mining techniques with manual curation and manual validation pipelines run by PhD-level biologists

Omnia contains 316 disease types in four disease groups: Neurology, Metabolic, Pediatric, and Oncology.

Currently, Omnia contains more than 200,000 variations; 100 genomic experiments and 5,000 papers have been curated for genome-phenome relationships.


22

Future Predictions

Computing resources needed to handle genome data will soon exceed those of Twitter and YouTube

By 2025, between 100 million and 2 billion human genomes could have been sequenced

Data storage could run to as much as 2–40 exabytes

Storage is a smaller problem than computing: acquiring, distributing, and analysing genomics data may be even more demanding

http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912
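As a rough check on these projections, the storage per genome they imply can be worked out directly. The sketch below assumes decimal units (1 exabyte = 10^18 bytes, 1 gigabyte = 10^9 bytes) and assumes that the low storage figure goes with the low genome count and the high figure with the high count. Neither pairing is stated on the slide.

```python
EXABYTE = 10**18   # bytes, decimal units (assumption)
GIGABYTE = 10**9   # bytes, decimal units (assumption)


def gigabytes_per_genome(total_exabytes, genomes):
    """Average storage per genome, in GB, implied by a total and a count."""
    return total_exabytes * EXABYTE / genomes / GIGABYTE


# Low scenario: 2 exabytes for 100 million genomes.
low = gigabytes_per_genome(2, 100_000_000)

# High scenario: 40 exabytes for 2 billion genomes.
high = gigabytes_per_genome(40, 2_000_000_000)

print(f"low scenario:  {low:.0f} GB per genome")
print(f"high scenario: {high:.0f} GB per genome")
```

Under these assumptions both scenarios come to 20 GB per genome, which is in line with the 10-20 GB per sample quoted earlier in the talk.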


