Big Data in Disease Management
Mohamood Adhil, InterpretOmics, Bangalore
9th Cloud Computing and Big Data Analytics, 17th March 2016
Big Data?
Big data is not only about size
The term “Big Data” was coined when data began to grow exponentially and became difficult to process with conventional software tools to extract meaningful information
Big data analytics is used in many fields, such as e-commerce, health care, astronomy, politics, weather, media, and research
www.interpretomics.co
Big Data to Knowledge
f(big data) = knowledge
Diagram labels: Big data, Hypothesis Validation, Hypothesis Creation, Knowledge
Hypothesis Validation: We already know something and need to prove it.
Hypothesis Creation: We do not know anything in advance and want to find something from the data. This is also known as Exploratory Data Analysis (EDA).
Causal Relationship
The main objective of analyzing data in any field is to find the cause of an event. For example:
What causes the fall in a company's stock value?
What causes crime to happen frequently in a particular place?
Variables (v) are identified from the data to find the cause of an effect, which can be used in the future to alter the event
Data: v1, v2, v3, ..., vn
Interesting Discoveries of Causal Factors
A 2007 study from the US-based National Institute of Neurological Disorders and Stroke found that, in families affected by Parkinson disease, those who drank a lot of coffee were less likely to develop the disease
In the early 1900s, the incidence of lung cancer was on the rise, but no one really knew why. German physician Fritz Lickint published a paper showing that lung cancer patients were particularly likely to have been smokers
Positive Correlation
http://archneur.jamanetwork.com/article.aspx?articleid=793724
http://www.statisticsviews.com/details/feature/7914611/A-Day-in-the-Life-of-Explanatory-Variables-and-Confounding-Factors.html
Correlation Is Not Always Causation
In the US, the number of people eating ice cream correlates positively with the number of deaths by drowning
Worldwide non-commercial space launches correlate with sociology doctorates awarded (US)
Japanese passenger cars sold in the US correlate with suicides by crashing of a motor vehicle
Confounding Factors
http://www.dailymail.co.uk/sciencetech/article-2640550/Does-sour-cream-cause-bike-accidents-No-looks-like-does-Graphs-reveal-statistics-produce-false-connections.html
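The ice-cream and drowning example can be reproduced with a small simulation. This is a minimal sketch, not taken from the talk: it assumes that daily temperature is the confounding factor, and the coefficients, noise levels, and function names are made up for illustration. Both variables depend only on temperature, so they correlate strongly with each other, while the partial correlation that controls for temperature should come out close to zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=5000, seed=0):
    """Hypothetical data: temperature drives both variables independently."""
    rng = random.Random(seed)
    temperature = [rng.uniform(0, 35) for _ in range(n)]
    # Neither variable depends on the other, only on temperature plus noise.
    ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
    drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]
    return temperature, ice_cream, drownings


temperature, ice_cream, drownings = simulate()
raw = pearson(ice_cream, drownings)
adjusted = partial_corr(ice_cream, drownings, temperature)
print(f"raw correlation: {raw:.2f}")
print(f"partial correlation given temperature: {adjusted:.2f}")
```

In real data the confounder is usually not known in advance, which is why the slide's warning matters: the adjustment only works for factors that were measured.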
Statistics: Key Points on Analyzing Big Data
Understand the sample size (data size and sample size are different)
Visualize the data before and after analysis
Select the appropriate statistical model or tool based on the problem to be addressed
Don't look for patterns; discover patterns from the data (Exploratory Data Analysis)
Be aware of confounding factors
Bangalore - Breast Cancer Capital of India
City                  New cases per lakh (100,000)
Bangalore             36.6
Thiruvananthapuram    35.1
Chennai               32.6
Nagpur                32.5
Delhi                 32.2
Some proposed reasons, none backed by proper evidence, are rapid urbanization, late marriage, a declining trend in breastfeeding, contraceptive pills, and food habits
Are these factors really causal?
Genomics Data (Big Omics Data)
The complete set of DNA (chromosomes), including genic and non-genic regions, is known as the genome
The entire human genome contains about 3 billion bases
The genome is sequenced (NGS) to identify the variations responsible for a phenotype (for example, a disease)
Sequence data for one sample is approximately 10-20 GB, depending on the type of sequencing
DNA will define you
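The figures on this slide lend themselves to a quick back-of-the-envelope calculation. The sketch below is illustrative only: the 2-bit and 1-byte encodings, the 1,000-sample cohort, and the function names are assumptions, while the 3 billion bases and 10-20 GB per sample come from the slide.

```python
def genome_bytes(bases: int, bits_per_base: int) -> int:
    """Bytes needed to store `bases` nucleotides at a fixed bit width."""
    return bases * bits_per_base // 8


def cohort_storage_tb(n_samples: int, gb_per_sample: float) -> float:
    """Total storage in TB for a cohort, in decimal units (1 TB = 1000 GB)."""
    return n_samples * gb_per_sample / 1000


GENOME_BASES = 3_000_000_000  # figure quoted on the slide

two_bit = genome_bytes(GENOME_BASES, 2)   # A, C, G, T packed into 2 bits each
one_byte = genome_bytes(GENOME_BASES, 8)  # one text character per base
print(f"2-bit encoding: {two_bit / 1e9:.2f} GB")
print(f"1 byte per base: {one_byte / 1e9:.2f} GB")

# Hypothetical cohort of 1,000 samples at the slide's 10-20 GB per sample.
low = cohort_storage_tb(1000, 10)
high = cohort_storage_tb(1000, 20)
print(f"1,000 samples: {low:.0f}-{high:.0f} TB")
```

A single copy of the sequence is small by comparison with the per-sample figure. Sequencing output is larger because each position is typically read many times and every base call carries a quality score.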
Genomics – Complex Puzzle
Genomics - Trillion Dollar Industry
New Jobs
Improved Healthcare
New Drugs
Start-Up Companies
Big Omics Challenges
Genomics
Data Processing
Storage
Compute
Functional Annotation
Statistics
Mathematical Models
Seven Dimensions of Genomics Data
Volume
Velocity
Variety
Veracity
Vexing
Variability
Value
General for all big data
Specific to Genomics data
Application of Genomics Data
Genomics data plays a crucial role from bench to bedside
Bench - Drug Discovery Process
Bedside - Genetic Testing for Precision Medicine
The main difference between bench and bedside is the number of samples: the bench usually has cohort data (N=n), while the bedside has data from a single individual (N=1)
Genetic Testing for Precision Medicine
Some of the popular genetic tests using the NGS technique (Big Omics data) are:
Genetic Predisposition - To learn more about one's genetic makeup and the odds of getting a disease
Disease Diagnostic - This test diagnoses a particular disease where diagnosis is otherwise difficult, as with rare disorders such as psychiatric and metabolic disorders
Drug Response Prediction (Pharmacogenomics) - This type of test helps in drug selection based on genomic variations
These results are produced using evidence-based techniques
Analytical Engine
Genome-phenome Databases
10-1000 TB of data
5-10 GB
Genetic Report With Valid Evidence
Example - Genomics Data for Screening and Diagnosis
Applied Genetics Diagnostics (AppGenDx) is a next-generation healthcare company based in Bangalore that offers genetic diagnostic services to hospitals, physicians, and healthcare organizations
InterpretOmics is the scientific partner providing sequencing and interpretation to Applied Genetics Diagnostics
Some of the tests from AppGenDx include:
Single Gene Test
Multi Gene Test
Multi Disease Test
OncoScreen
CarrierScreen ....
To know more: http://www.appgendx.com/
Case Study
Case 1
Patient: 33-year-old male with an ulcer in the buccal mucosa
Doctor Diagnosis: Oral Squamous Cell Carcinoma
Disease Causal Mutation using NGS: CDKN1A gene, c.93C>A; p.Ser32Arg, heterozygous
Disease Reported: Oral Squamous Cell Carcinoma
Case 2
Patient: Female, 2 years 7 months old, with an unsteady walk and no specific disease diagnosed
Doctor Diagnosis: -
Disease Causal Mutation using NGS: AGRN gene, c.1072G>T; p.Gly358Trp, heterozygous
Disease Reported: Myasthenic syndrome, congenital, 8, with pre- and postsynaptic defects
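The mutations in the two cases are written in HGVS-style notation: a coding-DNA change (c.) followed by the resulting protein change (p.). As a minimal sketch, the snippet below splits such a string into its fields. It is not the pipeline used for these reports; the `Substitution` class and the function names are hypothetical, and the patterns cover only simple single-base and single-residue substitutions, not the full HGVS nomenclature.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    position: int  # cDNA position or amino-acid position
    ref: str       # reference base or residue
    alt: str       # observed base or residue


# Simple substitutions only, e.g. "c.93C>A" and "p.Ser32Arg".
CDNA_RE = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
PROTEIN_RE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_cdna(text: str) -> Substitution:
    """Parse a coding-DNA substitution such as 'c.93C>A'."""
    m = CDNA_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a simple coding-DNA substitution: {text!r}")
    pos, ref, alt = m.groups()
    return Substitution(int(pos), ref, alt)


def parse_protein(text: str) -> Substitution:
    """Parse a protein substitution such as 'p.Ser32Arg'."""
    m = PROTEIN_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a simple protein substitution: {text!r}")
    ref, pos, alt = m.groups()
    return Substitution(int(pos), ref, alt)


def parse_variant(text: str):
    """Split 'c.<change>; p.<change>' and parse both parts."""
    cdna, protein = (part.strip() for part in text.split(";"))
    return parse_cdna(cdna), parse_protein(protein)


for variant in ("c.93C>A; p.Ser32Arg", "c.1072G>T; p.Gly358Trp"):
    print(parse_variant(variant))
```

Structured fields like these are what make it possible to look a variant up in a genome-phenome database instead of matching free text.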
Drug Development
Drug development is time-consuming: it takes approximately 15 years for a drug to enter the market
Drug development requires a huge amount of money (~1 to 10 billion)
On average, 1 in 10 drugs in clinical development will be approved by the FDA
These three hurdles can be overcome by targeted drugs that use big omics data for improved turnaround time, reduced cost, and an increased success rate
Currently, 42% of all drugs and 73% of oncology drugs in development are targeted drugs. This market is worth approximately $42 billion and should be worth over $60 billion by 2019.
(The Journal of Precision Medicine, Vol. 1, Issue 2, p. 31)
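To get a feel for the 1-in-10 approval figure, the sketch below computes the chance that at least one candidate in a portfolio is approved. It is a rough illustration that assumes each candidate succeeds independently with the same probability, which real pipelines do not guarantee; the function name and the portfolio sizes are hypothetical.

```python
def prob_at_least_one_approval(n_candidates: int, p_approval: float = 0.1) -> float:
    """P(at least one approval) among independent candidates."""
    return 1.0 - (1.0 - p_approval) ** n_candidates


# Hypothetical portfolio sizes, using the slide's 1-in-10 average.
for n in (1, 5, 10, 20):
    print(f"{n:2d} candidates: {prob_at_least_one_approval(n):.3f}")
```

Under this assumption, even ten candidates leave a substantial chance of no approval at all, which is one way to read the slide's case for raising the per-drug success rate.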
iOMICS – Unified Genomics Software Solution
Multi-omics, multi-scale data management, analysis, and interpretation software.
Developed for composite analysis needs and tested with numerous real data sets, this robust platform addresses the complexities of Life Sciences “Big Data” for driving actionable insights with unprecedented ease.
Cloud and On-Premise Version
Intuitive Analysis
Dynamic Visualisation
Supports in-house and third-party software and databases
iOMICS – Omnia (Knowledge Base)
Curation is based on data- and text-mining techniques, with manual curation and manual validation pipelines run by PhD-level biologists
Omnia contains 316 disease types for four disease groups: Neurology, Metabolic, Pediatric and Oncology.
Currently, Omnia contains more than 200,000 variations; 100 genomic experiments and 5,000 papers have been curated for genome-phenome relationships.
Future Predictions
Computing resources needed to handle genome data will soon exceed those of Twitter and YouTube
By 2025, between 100 million and 2 billion human genomes could have been sequenced
Data storage could run to as much as 2–40 exabytes
Storage is a smaller problem than computing: acquiring, distributing, and analysing genomics data may be even more demanding
http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912
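The projected totals can be turned into an implied storage footprint per genome. This is a sketch under two assumptions: the low storage figure is paired with the low genome count and the high with the high, and decimal units are used (1 EB = 10^18 bytes, 1 GB = 10^9 bytes). The function name is hypothetical.

```python
EXABYTE = 10 ** 18   # bytes, decimal units
GIGABYTE = 10 ** 9   # bytes, decimal units


def gb_per_genome(total_exabytes: float, n_genomes: int) -> float:
    """Average storage per genome, in GB, implied by a total and a count."""
    return total_exabytes * EXABYTE / n_genomes / GIGABYTE


low_scenario = gb_per_genome(2, 100_000_000)      # 2 EB for 100 million genomes
high_scenario = gb_per_genome(40, 2_000_000_000)  # 40 EB for 2 billion genomes
print(f"low scenario:  {low_scenario:.0f} GB per genome")
print(f"high scenario: {high_scenario:.0f} GB per genome")
```

Under that pairing, both scenarios come to 20 GB per genome, the upper end of the 10-20 GB per sample quoted earlier in the deck.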