MAQC Society 2nd Annual Meeting
Theme: Precision Medicine and Clinical Omics
Fudan University, 220 Handan Road, Shanghai 200433, China
February 24‐27, 2018 (Saturday – Tuesday)
Organized by:
Table of Contents
1. General Information
2. Program
3. Biographies: Opening Remarks and Session Co‐Chairs
4. Biographies and Abstracts for Sessions 1‐3 (in alphabetical order by speaker)
5. Descriptions of the proposed MAQC Projects (Session 4)
6. Posters
7. MAQC2019 (Trento, Italy)
General Information
MAQC website: www.maqcsociety.org
Venue:
Guanghua Tower East Wing / Conference Room 202 Fudan University, 220 Handan Road, Shanghai 200433, China
Date: February 24‐27, 2018 (Saturday – Tuesday)
Scientific Program Committee:
Scientific Program Chairs: Weida Tong ([email protected]) and Leming Shi ([email protected])
The MAQC Society Board Directors and President Officers
Local Organizing Committee Administrators:
Main contact: Ms. Lei Zhang ([email protected], +86‐15901714785)
Backup contact: Ms. Wanwan Hou ([email protected], +86‐18721040304)
Surrounding Hotels (addresses and estimated cost):
Fortune Hotel, 399 Handan Road, Shanghai. Tel: +86‐21‐6511‐0000, http://shanghaifortunehotel.com/ (~$70/night, most participants will stay at this hotel; 11 mins walking to conference venue)
Crowne Plaza Fudan, 199 Handan Road, Shanghai. Tel: +86‐21‐5552‐9999, www.crowneplazafudan.com (~$150/night; 10 mins walking to conference venue)
Hyatt Regency Wujiaochang, 88 East Guoding Road, Shanghai. Tel: +86‐21‐2565‐1234, shanghaiwujiaochang.regency.hyatt.com (~$150/night; 20 mins walking to conference venue)
Nearby Airports: Shanghai Pudong International Airport (PVG)
Shanghai Hongqiao International Airport (SHA)
Key Activities:
Poster presentation: Setup on Saturday morning and presentation on Sunday afternoon
February 24 (Saturday): Welcome reception for all attendees (Sponsored by Fudan University)
February 25 (Sunday): Dinner reception for the speakers and poster presenters (Sponsored by Illumina)
February 26 (Monday): Post‐conference workshop for the SEQC2 project discussion
February 27 (Tuesday): Post‐conference workshop on advanced analytics by SAS Institute
Sponsors:
7:00 am, Saturday, February 24, 2018: Registration and poster hanging
Day 1 Morning, Saturday, February 24, 2018
Session I: Precision Medicine and Clinical Omics
Co‐Chairs: Matthias Fischer (Cologne University, Germany) and Wendell Jones (Q2 Solutions – EA Genomics, USA)
8:30 am Welcome remarks
Li Jin and Leming Shi (Fudan University, China)
8:45 am Keynote address: Using Mandel’s row linear model in the ‘omics era
Terry Speed (Walter and Eliza Hall Institute of Medical Research, Australia)
9:25 am Evolution of the breast cancer genome under immune surveillance
Lajos Pusztai (Yale University, USA)
9:50 am Intratumoral heterogeneity and clonal selection of breast cancer
Zhimin Shao (Fudan University, China)
10:15 am Group photo and coffee break
10:45 am The genetic basis of tumor progression and spontaneous regression in neuroblastoma
Matthias Fischer (Cologne University, Germany)
11:10 am Early staged lung cancer: challenges and opportunities
Haiquan Chen (Fudan University, China)
11:35 am Panel discussion
Terry Speed, Lajos Pusztai, Zhimin Shao, Matthias Fischer, Haiquan Chen
12:00 pm Lunch break
Day 1 Afternoon, Saturday, February 24, 2018
Session II: Reproducibility and Standards
Co‐Chairs: Susanna Sansone (Oxford University, UK) and Chris Mason (Weill Cornell Medicine, USA)
1:30 pm Keynote address: Closing the reproducibility gap with standards and best practices
Leonard Freedman (Global Biological Standards Institute, USA)
2:10 pm The FAIR principles: Findability, Accessibility, Interoperability and Reusability of the research assets
Susanna Sansone (Oxford University, UK)
2:35 pm International standardization activity on emerging technologies for medical and food industries: an ISO perspective
Hiroki Nakae (JMAC Japan Multiplex bio‐Analysis Consortium, Japan)
3:00 pm Coffee break
3:30 pm Reliability of whole‐exome sequencing for assessing intratumor genetic heterogeneity in breast cancer
Christos Hatzis (Yale University, USA)
3:55 pm Quality control and standardization of precision oncology related gene mutation detection in China
Jinming Li (National Center for Clinical Laboratories, China)
4:20 pm Enable precision data for precision medicine
Jun Ye (Sentieon, USA)
4:45 pm Panel discussion
Leonard Freedman, Susanna Sansone, Hiroki Nakae, Christos Hatzis, Jinming Li, Jun Ye
5:30 pm Adjourn
6:30 pm Welcome and dinner reception
Day 2 Morning, Sunday, February 25, 2018
Session III: Pharmacogenomics and Bioinformatics
Co‐Chairs: Russ Wolfinger (SAS, USA) and Cesare Furlanello (FBK, Italy)
8:30 am Keynote address: Pharmacogenomics and precision medicine: current and future perspectives
Munir Pirmohamed (University of Liverpool, UK)
9:10 am Clinical‐grade bioinformatics systems: Overview and lessons learned
Wendell Jones (Q2 Solutions – EA Genomics, USA)
9:25 am Towards robust clinical use of NGS in precision medicine
Han‐Yu Chuang (Illumina, USA)
9:40 am A general framework for analysis of clonal heterogeneity and tumor evolution
Mehdi Pirooznia (NIH, USA)
9:55 am Performance assessment of de novo assembly‐based structural variation detection in the human genome
Chunlin Xiao (NCBI/NIH, USA)
10:10 am Coffee break
10:30 am Prediction of drug efficacy in breast cancer subtypes
Melissa Davis (Walter and Eliza Hall Institute of Medical Research, Australia)
10:45 am Detecting mutations induced by genotoxic carcinogens using whole‐genome sequencing of clonal cells
Tao Chen (NCTR/FDA, USA)
11:00 am Towards the Development of an Omics Data Analysis Framework for Regulatory Application
Florian Caiment (Maastricht University, The Netherlands)
11:15 am Personalization in molecular diagnostics of acute lymphoblastic leukemia for Polish children
Aleksandra Gruca (Silesian University of Technology, Poland)
11:30 am Power and limitations of RNA‐Seq ‐ Putting reproducibility to the test
Paweł Łabaj (Boku University, Austria)
11:45 am Oncogenomics of c‐Myc transgenic mice reveal novel regulators of extracellular signaling, angiogenesis and invasion with clinical significance for human lung adenocarcinoma
Jürgen Borlak (Hannover Medical School, Germany)
12:00 pm Lunch break
Day 2 Afternoon, Sunday, February 25, 2018
Session IV: Poster Session and Society Projects
Co‐Chairs: Benjamin Haibe‐Kains (University of Toronto, Canada) and Rebecca Kusko (Immuneering Corp, USA)
1:30 pm Poster session
Poster presenters must stand by their posters
MAQC Society Projects (15+5 min each):
2:30 pm Computational reproducibility project
Benjamin Haibe‐Kains (University of Toronto, Canada)
2:50 pm Project #1: Reproducible machine learning for pathology image analysis
Roberto Salgado (The International Immuno‐Oncology Biomarker Working Group) and Wentao Yang (Fudan University, China)
3:10 pm Project #2: Challenges and opportunities in N‐of‐1 clinical trial: a reality of applying genomics in clinic
Xichun Hu (Fudan University, China)
3:30 pm Project #3: Developing reference materials and reference data sets for the QC/standardization of multi‐omics platforms
Yuanting Zheng (Fudan University, China)
3:50 pm Project #4: QC/standardization of proteomics technology
Chen Ding (Fudan University, China)
4:10 pm Project #5: A structured approach to comparing genomics, computational, and high content biology screening methods for predicting toxicity
Weida Tong (NCTR/FDA, USA)
4:30 pm Poster award announcement
Wendell Jones, The President‐Elect of the Society
4:45 pm MAQC2019 announcement
Cesare Furlanello, Vice‐President of the Society
5:00 pm Adjourn of MAQC2018
Day 3, Monday, February 26, 2018
Post‐Conference Workshop on SEQC2
Co‐Chairs: Weida Tong (NCTR/FDA, USA) and Leming Shi (Fudan University, China)
8:30 am Welcome and overview Weida Tong (NCTR/FDA, USA)
8:45 am Keynote address: The role of journals/publishers in promoting research standards and reproducibility
Andrew Marshall (Chief Editor, Nature Biotechnology, USA)
9:30 am Session 1: Cancer genomics with Whole Genome Sequencing (Led by Wenming Xiao, NCTR/FDA, USA)
(10+5 min presentation for each manuscript)
1. Establishment of reference samples for detection of somatic mutation in cancer
2. A comprehensive investigation of factors impacting cancer mutation detection
3. Effect of tumor purity on somatic mutation detection
4. Comprehensive investigation of false mutation discoveries in FFPE samples
10:30 am Coffee break
11:00 am Session 2: Cancer genomics with onco‐panel sequencing (Led by Joshua Xu, NCTR/FDA, USA)
(10+5 min presentation for each manuscript)
1. Establishment of reference samples for onco‐panel sequencing (Joshua Xu, NCTR/FDA, USA)
2. Spike‐in controls for reliable use of onco‐panel sequencing in clinical diagnostic applications (James Willey, University of Toledo, USA)
3. Sensitivity and reproducibility of onco‐panel sequencing across multiple laboratories and technologies (Joshua Xu, NCTR/FDA, USA)
4. Integration of DNA‐seq and RNA‐seq for enhanced clinical application (David Kreil, Boku University, Vienna, Austria)
12:00 pm Lunch break
1:00 pm Session 3: Germline variants (Led by Huixiao Hong, NCTR/FDA, USA)
(10+5 min presentation for each manuscript)
1. Assessing reproducibility of SNVs and small indels detected in WGS (Huixiao Hong, NCTR/FDA, USA)
2. Establishment of reproducible metrics for structural variant detection with WGS (Marghoob Mohiyuddin, Roche, USA)
3. WGS in detection and characterization of important pharmacogenomic genes – genetic variations and pseudogenes in DMETs (Baitang Ning, NCTR/FDA, USA)
1:45 pm Session 4: Epigenomics (Led by Chris Mason, Weill Cornell Medicine, New York, USA)
(10+5 min presentation for each manuscript)
1. WGBS and ATAC‐seq metrics, inter‐site and intra‐site reproducibility, and best epiQC practices
2. Single molecule computational methods for base modifications detection (PacBio and ONT) and validation
3. Varied computational methods for differentially methylated CpGs (DMCs), differentially methylated regions (DMRs), and peak‐calling
2:30 pm Coffee break
3:00 pm Session 5: Additional manuscript ideas (Chaired by Leming Shi)
(5 min each, with all questions answered at the end of the presentations)
1. Cross‐lab and cross‐platform comparison of single cell sequencing (Charles Wang, Loma Linda University, USA)
2. A close look at the inconsistent FFPE artifact myth with onco‐panel sequencing (Thomas Blomquist, University of Toledo, USA)
3. Variations on ATAC‐seq enzymes (Tn5059) and impact on epigenome variation (Chris Mason, Weill Cornell Medicine, New York, USA)
3:30 pm Session 6: SEQC2 Manuscript Discussion
5:30 pm Adjourn
Day 4, Tuesday, February 27, 2018
Advanced Data Analysis and Deep Learning Workshop
Data scientists from the SAS / JMP Life Sciences division will offer a free advanced level hands‐on
workshop to MAQC Society conference attendees. We will analyze one or more complex experiments
together, discuss various statistical methods and concepts and share perspectives on deep learning. You
will be able to follow along on your laptop.
When: Feb. 27th, 2018 9am‐4pm
Where: Fudan University
Data: Submit your omics, NGS, clinical trial, or laboratory data before February 14, 2018. We will
select representative data sets to demonstrate analyses. Your data set must be publicly
shareable, but we will request that other attendees keep it confidential until you provide
permission to use it more broadly.
Analysis: Depending on the problems, topics can include: Design of Experiments, Quality Assessment,
Normalization, ANOVA and Mixed Modeling, Reproducibility, Pattern Discovery, Predictive
Modeling, Genetic Marker Screening, Genome‐Wide Association Study, Population
Analysis, Marker‐Assisted Breeding and Cross‐Evaluation, Best Linear Unbiased Prediction,
Linkage Mapping, Quantitative Trait Loci, Bioassay, Clinical Trials, Bioequivalence, Method
Comparison, Calibration Curves, Limit of Quantification, Feature Engineering, Cross
Validation Model Comparison, Boosted Trees, Neural Networks, Ensembling, Data Science
Competitions
Software: JMP, with dashboards created by JMP Genomics and/or JMP Clinical
Instructors: Dr. Russ Wolfinger (JMP/SAS), a fellow of the American Association for the Advancement of
Science and the American Statistical Association and a Kaggle Grandmaster, will lead the
workshop with assistance from Dr. Wenjun Bao, Dr. Li Li and Dr. Kelci Miclaus.
Contact: Dr. Wenjun Bao (JMP/SAS) Email: [email protected] to register Tel: 1‐919‐531‐1484 (0), 1‐919‐244‐0260
Opening Remarks (February 24th):
Dr. Leming Shi is a professor at the School of Life Sciences and Shanghai Cancer Center of Fudan University in
Shanghai, China where he established and directs the Center for Pharmacogenomics. Dr. Shi is the president of the
International Massive Analysis and Quality Control (MAQC) Society (2017‐2018). Dr. Shi’s research focuses on
pharmacogenomics, bioinformatics, and cheminformatics aiming to realize precision medicine by developing
biomarkers for early cancer diagnosis, prognosis, and personalized therapy. As a principal investigator at the US
Food and Drug Administration (FDA) from 2003 to 2012, Dr. Shi conceived and led the MicroArray and Sequencing
Quality Control (MAQC/SEQC) project aimed at realizing precision medicine by standardizing genomics and
bioinformatics, leading to the development of several FDA guidance documents. Dr. Shi was a co‐founder of
Chipscreen Biosciences Ltd. in Shenzhen, China where he co‐developed a chemogenomics‐based drug discovery
platform leading to several novel small‐molecule drug candidates with promising efficacy and safety profiles in
anticancer and antidiabetic clinical trials in China, US, and Japan; one novel compound (Chidamide) was approved in
2014 by China FDA for treating T‐cell lymphoma and another antidiabetic candidate (Chiglitazar) is in Phase III
clinical trials. Dr. Shi is a co‐inventor on nine issued patents about novel therapeutic molecules and has published
over 200 peer‐reviewed papers (12 of them appeared in Nature Biotechnology) with >10,000 citations by SCI
journals. Dr. Shi received his Ph.D. in computational chemistry from the Chinese Academy of Sciences in Beijing.
Professor Jin received his doctoral degree in genetics from the University of Texas and is an academician of the
Chinese Academy of Sciences. He served on the faculty of the University of Texas and the University of Cincinnati College
of Medicine. He is also an external member of the Max Planck Society and served as a board member of the Human
Genome Organization (HUGO). He is one of the founders of the CAS‐MPG Partner Institute of Computational
Biology, National Center of Human Genome at Shanghai, and Fudan Taizhou Institute of Health Sciences.
Professor Jin assumed his current position of Vice President in 2007. He is Director of the Collaborative
Innovation Centre of Genetics and Development. Professor Jin holds a Haoqing‐Fudan Professorship and has
been awarded the Ho Leung Ho Lee Foundation Award for Science and Technology Achievement, Second Prize
for National Natural Science Award (twice) among others. He serves as an editorial board member for nine
academic journals, is president of the Shanghai Society of Genetics, and is president of the Shanghai Society of
Anthropology. His research interests lie in medical genetics and genetic epidemiology, computational biology,
human population genetics and genomics. Professor Jin has published over 500 articles in journals
including Nature, Science, Cell, New England Journal of Medicine, PNAS, JCI and JAMA.
Leming Shi
Professor and Director, Center for Pharmacogenomics and
Fudan‐Zhangjiang Center for Clinical Genomics
School of Life Sciences, Fudan University
Shanghai, China
Li Jin, Ph.D.
Professor and Vice President
Fudan University
Shanghai, China
Co‐Chairs for Session 1 (February 24th):
Dr. Matthias Fischer is a physician‐scientist heading the Department of Experimental Pediatric Oncology at the University Children’s Hospital of Cologne, Germany. Dr. Fischer has served as Senior Physician at the University Children’s Hospital of Cologne since 2009, and was appointed full Professor of Pediatrics in 2016. His laboratory is focused on elucidating the genetic etiology and molecular pathogenesis of neuroblastoma, a pediatric tumor of the sympathetic nervous system. In particular, Dr. Fischer and his team are applying high‐throughput technologies, such as massively parallel sequencing and microarray analysis, to discover relevant alterations in neuroblastoma development, to establish prognostic and predictive biomarkers, and to identify therapeutic targets. All of this work is geared to translating novel findings from basic research into clinical practice, in order to improve clinical management of neuroblastoma patients. Dr. Fischer has authored more than 100 peer‐reviewed publications, and has served as an advisory board member on several national and international committees, covering both basic and clinical research projects.
Dr. Jones is currently Principal Bioinformaticist and Scientific Advisor at Q2 Solutions | EA Genomics. He conducts
collaborative scientific research with clients in multiple areas, especially in oncology and immuno‐oncology. His
background includes leading the analysis, development and validation of the bioinformatic and computational
systems that process complex genomic assays, including next generation sequencing assays, evaluating new and
emerging genomic technologies, and developing bioinformatic implementation strategies. He consults with clients
and provides thought leadership in industry and public consortiums involved in genomic science and measurement.
Dr. Jones has over 15 years of experience in advanced genomic technologies and 20 years of experience in scientific
and technology leadership positions, including serving as Vice President of Statistics and Bioinformatics at
Expression Analysis, Inc (EA) and Chief Science Officer at Reliametrics, a Nortel Networks business unit. He has
authored over 30 peer‐reviewed publications and has presented at numerous scientific meetings and industry
conferences and consortium workshops.
Matthias Fischer
Professor and Senior Physician
Experimental Pediatric Oncology
University Children's Hospital, Cologne University
Cologne, Germany
Wendell Jones
Principal Bioinformaticist and Scientific Advisor,
Q2 Solutions | EA Genomics,
Morrisville, North Carolina, USA
Co‐Chairs for Session 2 (February 24th):
Prof. Sansone’s activities are in the areas of knowledge and information management, and interoperability of applications, impacting on the reproducibility of research outputs and the evolution of scholarly publishing. Prof. Sansone sits on the boards of several not‐for‐profit efforts, and she is a consultant for Springer Nature and Honorary Academic Editor of the Scientific Data journal. She leads the Centre in several UK, European, NIH and pharma‐funded projects in the life and biomedical sciences, and is a founding member of the ELIXIR UK Node, where she is responsible for standards and curation areas. Working with and for data producers and consumers, service providers, pre‐competitive informatics initiatives, journals and funding agencies, she strives to make digital research objects Findable, Accessible, Interoperable and Reusable (FAIR). She holds a PhD in Molecular Biology from Imperial College of Science, Technology and Medicine, London; after a few years working on vaccine genetics at an Imperial spin‐off she moved to the European Bioinformatics Institute (EBI, Cambridge), where she worked for nine years as a Project and Team Coordinator and Principal Investigator before moving to Oxford in 2010.
Dr. Mason is an associate professor of Computational Genomics at Weill Cornell Medical College. He completed his
B.S. in Genetics and Biochemistry at the University of Wisconsin‐Madison, and his Ph.D. in Genome Evolution and
postdoctoral training in Neuroscience at Yale University. His laboratory work utilizes computational and experimental
methodologies to identify and characterize the essential genetic elements that guide the function of the human
genome. He performs research in three principal areas: (1) the functional annotation of the human genome by
mutational profiling in families with brain malformations and cancer patients, (2) the examination of the elements
that orchestrate the development of the human brain and their evolutionary changes, and (3) the development of
models for systems and synthetic biology. The Mason Lab uses high‐throughput methods to generate cell‐specific
molecular maps of genetic, epigenetic, and transcriptional activity and uses them to create multi‐dimensional
molecular portraits of development and disease. He also develops algorithms to detect, catalog and functionally
annotate variants in the genetic pathways that control developmental processes. He has more than 130
publications.
Susanna‐Assunta Sansone
Associate Professor and Associate Director
Oxford e‐Research Centre,
Engineering Science Department,
University of Oxford, UK
Christopher Mason
Associate Professor
Department of Physiology and Biophysics
Weill Cornell Medicine, New York, USA
Co‐Chairs for Session 3 (February 25th):
Dr. Wolfinger leads a team in research and development of JMP‐based software solutions in the areas of genomics
and clinical research. He joined SAS in 1989 after earning a PhD in Statistics from North Carolina State University
(NCSU). For ten years he devoted his efforts to developing statistical procedures in the areas of linear and nonlinear
mixed models, multiple testing, and density estimation. In 2000 he started the Scientific Discovery department at
SAS. Wolfinger is co‐author of more than 100 publications and a fellow of both the American Association for the
Advancement of Science and the American Statistical Association. He also is an adjunct faculty member at NCSU
and the University of North Carolina at Chapel Hill and a Kaggle Grandmaster.
Dr. Cesare Furlanello is head of Data Science at the Kessler Foundation (Trento, Italy), where he is Senior
researcher. He also leads the MPBA Lab (https://mpbalab.fbk.eu), previously the ITC‐IRST Neural Networks for
Complex Data Analysis Project, since 1995. After graduating with honors in Mathematics at the University of Padua
in 1986, he joined IRST, the first Artificial Intelligence research centre in Italy. He is a data scientist and an expert in
machine learning applied to complex data, with a focus on predictive models for human and environmental health
and scientific reproducibility. He has been PI of more than 60 projects funded by competitive grants or industry,
notably in the first national project on industrial and health applications of neural networks in 1993, 4 projects of
the European Institute of Technology, and many other EU research grants. He has published in machine learning
and bioinformatics in Nature, Nature Biotech, Nature Genetics, Bioinformatics, IEEE J. Sign Proc., IEEE Trans Nano
Biosc, Brief. Bioinformatics and others. He was Scientific Secretary of the GNCB‐CNR school on Neural Networks for
Signal Processing (Trento 1989) and organizer of other workshops on applications of Machine Learning and Neural
Networks. He was Local Conference Chair of the MGED11 Workshop of the MGED Society for international standards in
bioinformatics (and has been on its Advisory Board since 2007). He is adjunct research faculty of The Wistar Institute cancer
research centre in Philadelphia. He is on the PhD Board of the Centre for Integrative Biology of the University of
Trento, and a founder of the Laboratory of Biomolecular Sequence and Structure Analysis for Health (FBK, Univ. of
Trento, CNR). In December 2017 he attained the national habilitation as full professor in bioengineering. Since 2001,
he has been Scientific Director of WebValley, the first summer school in Data Science for interdisciplinary research
dedicated to talented high school students. His research currently aims at developing reproducible Deep Learning
methods for Precision Medicine, with a focus on the integration of multi‐modal omics and imaging data. He is
President Elect of the MAQC international society.
Russell Wolfinger
Director of Scientific Discovery and Genomics
SAS Institute Inc.
Cary, NC, USA
Cesare Furlanello
FBK ‐ Fondazione Bruno Kessler
MPBA: Predictive Models for Biomedicine and Environment
Senior Researcher, Head of Research Unit
Povo (Trento), Italy
Co‐Chairs for Session 4 (February 25th):
Dr. Haibe‐Kains earned his Ph.D. in Bioinformatics at the Université Libre de Bruxelles (Belgium), for which he was
awarded the Solvay Award (Belgium). Supported by the Fulbright Award, Dr. Haibe‐Kains did his postdoctoral
fellowship at the Dana‐Farber Cancer Institute and Harvard School of Public Health (USA). He started his laboratory
at the Institut de Recherches Cliniques de Montréal (Canada) and moved to the Princess Margaret Cancer Center (PM) in November 2013. His research
focuses on the integration of high‐throughput data from various sources to simultaneously analyze multiple facets
of carcinogenesis. His team is analyzing high‐throughput (pharmaco)genomic datasets to develop new prognostic
and predictive models and to discover new therapeutic regimens in order to significantly improve disease
management. Dr. Haibe‐Kains’ main scientific contributions include several prognostic gene signatures in breast
cancer, subtype classification models for ovarian and breast cancers, as well as genomic predictors of drug response
in cancer cell lines.
Dr. Kusko is a computational biologist by training with expertise in translating NGS and other genomic data to
actionable discoveries. After completing her undergraduate degree in Biological Engineering at Massachusetts
Institute of Technology (MIT), she went on to complete her Ph.D. in Computational Biomedicine at the Boston
University School of Medicine. Her doctoral thesis focused on the transcriptome in Chronic Obstructive Pulmonary
Disease, or COPD, and lung cancer in never‐smokers. She has worked directly with clinicians on study design,
with lab scientists to plan experiments, with senior leadership for strategic planning, and with fellow computational
scientists to collaborate. Her published areas of experience include: drug mechanism of action (MOA), drug
repositioning, target identification, drug combinations, and big data reproducibility.
Benjamin Haibe‐Kains
Scientist, Princess Margaret Cancer Center, University Health Network
Assistant Professor, Department of Medical Biophysics, University of Toronto
Adjunct Professor, Department of Computer Science, University of Toronto
OICR Associate, Ontario Institute of Cancer Research
Toronto, Canada
Rebecca Kusko
Vice President of Genomics
Immuneering Corporation,
Cambridge, MA, USA
Dr. Jürgen Borlak was born in Neu‐Ulm, Germany in 1958. After studies at universities in Germany and abroad he
obtained his Doctorate in Pharmacology and Toxicology at the University of Reading, GB. Following residencies in
the UK and France (Strasbourg) he was habilitated in pharmacology and toxicology and received the venia legendi
(“Privatdozent”) at Hannover Medical School in the year 2000. Two years later he was appointed as full professor of
Pharmacology and Toxicology at Hannover Medical School. From 2002 onwards he has been the Director of the
Institute of Pharmaco‐ and Toxicogenomics at Hannover Medical School. This new field of genomic science
applies a wide range of methods in genetics, molecular biology, molecular toxicology and functional genomics for
a better understanding of disease‐causing mechanisms and drug‐induced toxicities. An array of enabling
technologies is applied to identify “druggable” targets and to better understand inter‐individual
differences in drug response, thereby allowing individualized drug treatment regimens and disease
prevention strategies. Jürgen Borlak is also an appointed Professor of Molecular Anatomy at the Medical Faculty
of the University of Leipzig; a Professor of Experimental Medicine at Uppsala University, Sweden and is
Distinguished Visiting Professor at the University of Trento, Italy. Jürgen Borlak is author of > 270 original
publications and 25 book chapters and editor of the Handbook of Toxicogenomics. He is reviewer and member of
the editorial board for various scientific journals. Amongst others he is an appointed expert of the World
Health Organisation (WHO), of the US governmental agency FDA, the European Medicines Agency EMA and is
also an international reviewer for many European, US and Asian Research Organisations.
The c‐Myc transcription factor is frequently deregulated in cancers. To search for disease diagnostic and druggable
targets a transgenic lung cancer disease model was investigated. Oncogenomics identified c‐Myc target genes in
lung tumors. These were validated by RT‐PCR, Western Blotting, EMSA assays and ChIP‐seq data retrieved from
public sources. Gene reporter and ChIP assays verified functional importance of c‐Myc binding sites. The clinical
significance was established by RT‐qPCR in tumor and matched healthy control tissues, by RNA‐seq data retrieved
from the TCGA Consortium and by immunohistochemistry recovered from the Human Protein Atlas repository. In
transgenic lung tumors 25 novel candidate genes were identified. These code for growth factors, Wnt/β‐catenin
and inhibitors of death receptor signaling, adhesion and cytoskeleton dynamics, invasion and angiogenesis. For 10
proteins over‐expression was confirmed by IHC thus demonstrating their druggability. Moreover, c‐Myc over‐
expression caused complete gene silencing of 12 candidate genes, including Bmp6, Fbln1 and Ptprb to influence
lung morphogenesis, invasiveness and cell signaling events. Conversely, among the 75 repressed genes TNFα and
TGF‐β pathways as well as negative regulators of IGF1 and MAPK signaling were affected. Additionally,
anti‐angiogenic, anti‐invasive, adhesion and extracellular matrix remodeling and growth suppressive functions were
repressed. For 15 candidate genes c‐Myc‐dependent DNA binding and transcriptional responses in human lung
cancer samples were confirmed. Finally, Kaplan‐Meier survival statistics revealed clinical significance for 59 out of
100 candidate genes, thus confirming their prognostic value. In conclusion, previously unknown c‐Myc target genes
in lung cancer were identified to enable the development of mechanism‐based therapies. (*The paper was
published in Oncotarget. 2017; 8:101808‐101831 and the abstract is a transcript of this paper.)
Jürgen Borlak, Ph.D
Univ.‐Professor, Hannover Medical School
Centre for Pharmacology and Toxicology
Hannover, Germany
Oncogenomics of c‐Myc transgenic mice reveal novel regulators of extracellular signaling,
angiogenesis and invasion with clinical significance for human lung adenocarcinoma
Dr. Florian Caiment is an Assistant Professor whose main expertise is in next‐generation
sequencing (NGS) technology, which allows sequencing the complete genome or transcriptome of any biological
material for a wide range of applications. He was involved in this technology from the very beginning, initially in the lab
during his Ph.D., then moving to bioinformatics analysis during his postdoc. This double expertise allows
him to design innovative and coherent experiments from both the biological and the analytical point of view. Florian
joined the department of Toxicogenomics in Maastricht as a postdoctoral fellow in April 2011, as a full‐time
bioinformatician on the ASAT knowledge base project (assuring safety without animal testing). He followed up with
the DiXa European project (Data Infrastructure for alternatives to animal‐based Chemical SAfety testing). Florian is
now supervising the RNA‐Seq activities of the EU FP7 HeCaToS project (14 partners) as well as of the Horizon 2020
Eu‐ToxRisk project (39 partners).
Despite the expanding number of scientific publications using omics in the field of toxicology ‐ with the
exception of a few cases in the domain of drug development ‐ no omics data have been used to date to support a
chemical regulatory application, for instance under REACH. Regulatory agencies mainly report two major issues
concerning the use of omics technologies: 1/ The high technical variance of each technological platform,
which sometimes makes the data difficult to correlate within and between platforms; 2/ The impact that
the choice of bioinformatics analysis pipeline has on the results, reflected in pipeline‐dependent differences in the
lists of biological systems significantly affected by the compounds of interest, making the “truth” of toxicity difficult
to assess or believe from omics data.
While several scientific consortia have been formed to tackle these two main issues, notably with respect to
microarray quality control (MAQC‐I and II) followed by sequencing quality control (SEQC), both leading to major
publications in high impact factor journals, no consensus on an omics data analysis framework (ODAF) for regulatory
application has been achieved yet. To date, there are no OECD guidance documents available for the generation
and analysis of omics data. Here, one of the major roadblocks is the lack of a standardized procedure for the
analysis of the data. This results in different conclusions possibly being derived from one and the same set of data
depending on the transformations and statistical procedures used. This creates an issue for regulators who are not
able to assess whether the results generated from such data support the conclusions being drawn and do not have
the means to verify the conclusions.
In this particular context, this new project aims to bring together toxicogenomics experts to test and further develop a
regulatory ODAF (R‐ODAF) proposal for the toxicogenomics community, with the ambition of enabling regulatory
bodies to consider omics as a relevant data type to support compound submissions. For this, we will focus our
project on transcriptomics data, and will start by identifying, collecting and reviewing the analysis methods for all relevant
toxicogenomics datasets on the three major transcriptomics platforms: microarrays, RNA‐Seq and TempO‐Seq
technology (from BioSpyder). Ultimately, this project will propose a common foundation method to the regulatory
agencies, with clear guidelines on how to recognize and discard bad quality samples (such as outliers) and how to define
thresholds and parameters for identifying differential expression (p‐value, multiple testing correction methods, fold
change…) for each platform.
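To make concrete the kind of parameters such guidelines would need to fix, the following is a minimal sketch of a differential expression filter that combines a multiple testing correction with a fold‐change cutoff. It is an illustration only and not the R‐ODAF method: the choice of the Benjamini‐Hochberg correction, the function names, and the default thresholds (alpha = 0.05, |log2 fold change| >= 1) are all assumptions made for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(genes, pvalues, log2_fold_changes,
                 alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both the adjusted p-value and fold-change cutoffs.

    The thresholds are hypothetical defaults, not recommended values.
    """
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
        if q <= alpha and abs(lfc) >= min_abs_log2fc
    ]


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    pvalues = [0.01, 0.04, 0.03, 0.20]
    log2fc = [2.0, 1.5, 0.2, -3.0]
    print(select_genes(genes, pvalues, log2fc))
```

Changing either the correction method or the cutoffs changes the resulting gene list, which is exactly the pipeline dependence that a shared framework would aim to remove.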
Towards the Development of an Omics Data Analysis Framework for Regulatory Application
Florian Caiment, Ph.D
Maastricht University
School of Oncology & Developmental Biology
Maastricht, The Netherlands
Early staged lung cancer, challenges and opportunities
Haiquan Chen, M.D.
Director of Lung Cancer Center
Fudan University Shanghai Cancer Center
Shanghai, China
Dr. Tao Chen received his Ph.D. degree in Toxicology from the University of Arkansas for Medical Sciences in 1997
and became a Diplomate of the American Board of Toxicology in 1999. He was a postdoctoral scientist at Duke
University during 1998‐2000. He joined the Division of Genetic and Molecular Toxicology, National Center for
Toxicological Research, U.S. Food and Drug Administration in 2000 as a research toxicologist. He is also an adjunct
professor at several universities. Dr. Chen has served as an editor or an editorial board member for more than six
scientific journals. He has been a consultant to the World Health Organization (WHO) and the Organization for
Economic Co‐operation and Development (OECD) for the development of regulatory documents and a grant reviewer
for the U.S. National Science Foundation and the European Research Council. Dr. Chen has served many times as an
organizing committee member or chair for meetings and meeting sessions. He has also been invited to present
several keynote speeches and plenary lectures at national and international scientific meetings. He has published
more than 130 articles in peer‐reviewed scientific journals and books. Dr. Chen’s current research addresses the
evaluation of mutagenicity and carcinogenicity of FDA‐regulated agents using next‐generation sequencing.
Mutations are heritable changes in the nucleotide sequence of DNA that can lead to many adverse effects, such as
cancers. Genotoxicity assays have been used to identify chemical mutagenicity and carcinogenicity. Current FDA‐
recommended mutation assays, such as the Ames test and mouse lymphoma assay, predict mutagenicity of test
agents in genes that allow mutant cells to be positively selected when mutations occur in those genes. These
assays detect only mutations in those genes, not across the whole genome. The mutations detected
may therefore be biased toward certain types because of the nature of the target genes. Although the assays have been
used for many years, a new mutation assay that can directly measure all types of mutations in the genome has
long been awaited. Recently developed next‐generation sequencing (NGS) technology allows us to detect
genome mutations in the cells directly. In our laboratory, we have used whole genome sequencing to
screen mutagens using Salmonella typhimurium TA100 cells (a bacterial system), to detect germline mutations in
Caenorhabditis elegans (a worm system), and to evaluate mutational spectra in mouse lymphoma cells (a
mammalian system). The results show that NGS technology can sensitively detect mutation induction caused by
genetic carcinogens and effectively evaluate the different types of mutations including base pair substitutions,
insertions and deletions (indels), loss of heterozygosity, and chromosome number changes, suggesting that the
unparalleled advantages of NGS for evaluating mutagenicity of chemicals can be applied for the next generation of
mutagenicity tests.
Tao Chen, Ph.D
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, Arkansas, USA
Genome‐wide characterizations of mutations induced by genetic carcinogens using next‐generation
sequencing
Dr. Chuang is Senior Manager of Bioinformatics and Genomic Applications Partnerships in Illumina, Inc. She has
been working on various genomic applications with NGS technology, from whole genome sequencing to targeted
sequencing, from short reads to long reads, from multiplex PCR to hybrid capture, from service models to
distributed kits, etc. Previously, she served as the informatics core team lead for developing Illumina’s clinical‐grade
comprehensive gene panel products in cancer diagnostics and therapy selection. Beyond leading a group of
Bioinformatics Scientists in developing NGS informatics solutions for oncology applications, Dr. Chuang also
provides technical consultation to other functional teams in assay development, software development, business
development, marketing, manufacturing, and regulatory affairs in clinical product development. Another big part of her
role is interacting with key opinion leaders in the oncology field, including pharma partners and translational
researchers, to bridge gaps between customer demand and product design. Han‐Yu is always thinking ahead to the next
products that will facilitate the realization of precision medicine, so technology scouting is also a major part of her work.
Currently, she serves in a broader role to facilitate Illumina’s partnership with external innovation in genomic
applications. Prior to joining Illumina, Han‐Yu received her PhD degree in Bioinformatics and Systems Biology from the
University of California, San Diego and her bachelor’s and master’s degrees in Computer Science and Information
Technology from National Taiwan University. During her academic career, she published more than 20 peer‐reviewed
journal papers in the field of cancer biomarker selection and functional genomics, with nearly
3,400 citations in total. Her work has pioneered the use of network‐based approaches for cancer
patient classification and risk stratification towards precision medicine.
More and more studies have indicated the utility of next‐generation sequencing (NGS) in precision medicine, from
rare genetic disease and reproductive health to oncology, thanks to the increasing data output and decreasing costs of the
technology. The vast amount of data output has brought the promise of comprehensive diagnostic approaches
but also poses challenges for robust clinical use. Starting from product design to validation, sample
accessibility, workflow complexity, and data analysis optimization are integral parts of a robust NGS solution for
clinical use. In this talk, I would like to discuss key considerations on the above vital components in developing NGS
diagnostics for companion therapeutics, and use Illumina’s most recent tumor profiling tool TruSight Tumor 170 as
an example for illustration.
Han‐Yu Chuang, Ph.D
Senior Manager, Bioinformatics and Genomic Applications Partnerships
Illumina Inc
San Diego, USA
Towards robust clinical use of NGS in precision medicine
Dr. Melissa Davis is a computational cancer biologist, and Laboratory Head at the Walter and Eliza Hall Institute of
Medical Research (https://www.wehi.edu.au/people/melissa‐davis/). She is an expert in the analysis and
reconstruction of molecular mechanisms of cancer progression and response to therapy. Her background is in
genetics and computational cell biology, and she currently holds a National Breast Cancer Foundation Fellowship to
study molecular mechanisms of breast cancer metastasis. Dr. Davis and her team work on the analysis of large,
heterogeneous datasets from national and international cancer projects and seek to discover patterns in the data
that will help to target patient treatment and personalise cancer therapy.
More effective targeting of cancer therapies has resulted in dramatically improved outcomes for patients with
breast cancer; however, breast cancer is a heterogeneous disease with many molecular subtypes, and survival gains
have not been uniform. Even for subtypes with relatively good treatment options and outcomes, patients do not
show a uniform response to therapy; some patients will not respond to the standard treatment regimens indicated
by their clinical presentation, and others will experience disease recurrence after initially favourable responses.
Considerable work has been undertaken in recent years to develop computational models that will predict the
response of a patient to therapy to improve the precision with which patients can be treated. As the collection of
molecular data on individual patients becomes increasingly feasible in the clinical setting, these in silico methods
have the potential to improve the precision of treatment and outcomes for patients. Gene expression data has
repeatedly been shown to be the most effective kind of molecular measurement for training predictive models,
exceeding the performance of genetic and proteomic data in predicting response to therapy. As such, methods that
use gene expression data are likely to provide the most powerful predictions of patient response.
We have generated a series of predictive models that can be used to estimate the drug sensitivity of breast cancer
samples to determine which patients are most likely to respond well to a given therapy. We have also segregated
patients based on molecular phenotypes, such as signalling status and epithelial‐mesenchymal plasticity, in addition
to traditional molecular subtypes (such as basal, luminal and normal‐like) to identify therapies that show
enhanced efficacy against specific subtypes.
Prediction of drug efficacy in breast cancer subtypes
Melissa Davis, Ph.D
The Walter and Eliza Hall Institute of Medical Research, Australia
Dr. Matthias Fischer is a physician‐scientist heading the Department of Experimental Pediatric Oncology at the University Children’s Hospital of Cologne, Germany. Dr. Fischer has served as Senior Physician at the University Children’s Hospital of Cologne since 2009, and was appointed full Professor of Pediatrics in 2016. His laboratory is focused on elucidating the genetic etiology and molecular pathogenesis of neuroblastoma, a pediatric tumor of the sympathetic nervous system. In particular, Dr. Fischer and his team are applying high‐throughput technologies, such as massively parallel sequencing and microarray analysis, to discover relevant alterations in neuroblastoma development, to establish prognostic and predictive biomarkers, and to identify therapeutic targets. All of this work is geared toward translating novel findings from basic research into clinical practice, in order to improve the clinical management of neuroblastoma patients. Dr. Fischer has authored more than 100 peer‐reviewed publications, and has served as an advisory board member on several national and international committees covering both basic and clinical research projects.
Neuroblastoma is a pediatric tumor of the sympathetic nervous system. The clinical course of the disease varies
dramatically, ranging from spontaneous regression to fatal progression despite intensive cytotoxic treatment. The
pathogenesis underlying the distinct phenotypes, however, has been poorly understood to date. To determine the
molecular mechanisms of favorable and unfavorable courses, we performed massively parallel sequencing of 416
untreated neuroblastomas. We detected genomic alterations of 17 genes related to the RAS and p53 pathways in
73/416 patients. The presence of these mutations was strongly associated with dismal outcome in the entire cohort,
as well as in the clinical high‐risk and non‐high‐risk subgroups. We noticed, however, that the prognostic effect of
RAS/p53 pathway mutations was strictly dependent on the occurrence of telomere maintenance mechanisms.
Survival of patients whose tumors were telomere maintenance‐positive was dramatically inferior when additional
RAS/p53 pathway mutations were present as compared to those without such alterations. By contrast, all patients
whose tumors lacked telomere maintenance mechanisms have survived to date, and spontaneous regression or
differentiation into ganglioneuroblastoma occurred both in the presence and absence of RAS/p53 pathway
mutations. Our data suggest a precise definition of clinical neuroblastoma phenotypes: High‐risk tumors are
characterized by telomere maintenance activation, and additional mutations in RAS or p53 pathway genes
delineate a patient subgroup with devastating outcome. By contrast, patient outcome is excellent in the absence of
telomere maintenance, and mutations in RAS/p53 pathway genes fail to establish fully malignant tumors in this
subgroup. Together, our results emphasize the importance of activating telomere maintenance mechanisms in the
development of human malignancies, and provide a starting point for refined risk assessment and new therapies in
neuroblastoma.
The genetics of spontaneous regression and fatal progression in neuroblastoma
Matthias Fischer, Ph.D
Professor and Senior Physician
Experimental Pediatric Oncology
University Children's Hospital, Cologne University
Cologne, Germany
Dr. Freedman is the founding President of the Global Biological Standards Institute (GBSI). He has held leadership
positions in basic biomedical research, drug discovery, and science policy in both the private and non‐profit sectors,
as well as in academia.
Prior to starting GBSI, Dr. Freedman served as Vice Dean for Research and Professor of Biochemistry & Molecular
Biology at Jefferson Medical College, Thomas Jefferson University. Dr. Freedman also led discovery research efforts
in the pharmaceutical industry as a Vice President at Wyeth Pharmaceuticals and Executive Director at Merck
Research Laboratories. Before moving to industry, Dr. Freedman was a Member and Professor of Cell Biology &
Genetics at Memorial Sloan‐Kettering Cancer Center and Weill Cornell Medical College. There, Dr. Freedman and his
lab made several highly impactful discoveries in the area of nuclear hormone receptor structure and function.
Dr. Freedman has received numerous competitively funded grants, and has been the recipient of several research
honors, including the Boyer Award for Biomedical Research, and a MERIT award from the National Institutes of
Health. He was also the 2002 recipient of the Ernst Oppenheimer Award from The Endocrine Society. Dr.
Freedman has published extensively and served on numerous scientific review panels and editorial boards. He was
an editor of Molecular and Cellular Biology for ten years. In addition, Dr. Freedman has served on the Board of
Directors of the American Type Culture Collection (ATCC).
Dr. Freedman earned a B.A. degree in Biology from Kalamazoo College and a Ph.D. in Molecular Genetics from the
University of Rochester. He completed his post‐doctoral fellowship in the laboratory of Dr. Keith Yamamoto at the
University of California, San Francisco.
Irreproducible basic biological research is a tremendously expensive and global problem. The inability to reproduce
experimental data in preclinical studies has resulted in the invalidation of research breakthroughs, retraction of
published papers, abrupt discontinuation of clinical studies, and reduced trust in the research and development
enterprise. More importantly, valuable time and critical resources are wasted by irreproducibility as opportunities
to enhance human health are delayed or simply lost. Although the causes of irreproducible preclinical research are
complex, they can be traced to cumulative errors/flaws in one or more of the following areas: (1) study design, (2)
biological reagents and reference materials, (3) laboratory protocols, and (4) data analysis and reporting. This
presentation will use examples of how biological reagents, specifically cell lines and antibodies, impact
irreproducibility in preclinical research and how the implementation of consensus‐based standards to authenticate
these critical and widely used reagents will lead to both increased rates of reproducibility and dramatic returns on
research funding investments.
Closing the Reproducibility Gap with Standards and Best Practices
Leonard P. Freedman, Ph.D
President
Global Biological Standards Institute, USA
Dr. Aleksandra Gruca obtained her PhD degree in technical sciences, with a specialty in bioinformatics, and works as an
assistant professor at the Institute of Informatics at the Silesian University of Technology (Gliwice, Poland). In her
PhD thesis, entitled “Characterisation of gene groups using decision rules”, she developed a data mining system for
automated functional interpretation of the results of high‐throughput biological experiments. Currently her
research is focused on the application of data mining methods to multi‐omics data integration, analysis and
interpretation. She cooperates with several Polish clinical centres interested in developing such
methods for the analysis of heterogeneous cancer data in order to improve diagnostics, classification and treatment
personalisation.
As a co‐leader of a Community/platform‐building Working Group within COST Action CA15110 ‐ Harmonising
standardisation strategies to increase efficiency and competitiveness of European life‐science research (CHARME)
she is also interested in development and implementation of data reproducibility and standardisation practices in
life‐sciences.
Since 2010 she has been a member of the Board of the Polish Bioinformatics Society, a scientific society with a mission to
support and popularise bioinformatics in Poland. She is author or co‐author of almost 50 peer‐reviewed
publications in scientific journals and conference proceedings.
Acute lymphoblastic leukemia (ALL) is the most frequently occurring childhood cancer, comprising approximately
30% of all pediatric malignancies. Each year, diagnosis of ALL is established in approximately 200 children in Poland.
They are all treated uniformly according to European standards in the centers of the Polish Pediatric
Leukemia/Lymphoma Study Group (PPLLSG). The results of the last treatment protocol ALL‐IC BFM 2009 were
very good with overall survival >90%. Nevertheless, >10% of all patients suffered from ALL relapse and required
additional intensive treatment including hematopoietic stem cell transplantation (HSCT).
Here we present the information system and data workflow being developed within the PersonALL project ‐ a
collaborative project among the PPLLSG centers that focuses on research on the molecular mechanisms of ALL, aiming
at improved diagnostics, classification and, finally, treatment personalization.
The system is designed to store and analyze heterogeneous data collected from different clinical centers and to integrate
results from molecular biology analyses with clinical information to assess the prognostic and therapeutic relevance
of different diagnostic parameters. Collected data will cover aspects such as genomics, transcriptomics,
cytogenetics, fluorescence in situ hybridization (FISH) and immunophenotyping as well as selected clinical
information and applied therapy. The main challenge is to link all this heterogeneous information into a
comprehensive expert system that enables rapid recognition of the features associated with treatment outcome.
The system will provide uniform access to the data, allowing biologists and clinicians to analyze it from
different aspects and summarize it into useful information. The users will be provided with an “analysis assistant” ‐ a set
of advanced analytical tools in the form of a simple GUI allowing statistical and data mining analysis.
The proposed workflow for data integration and analysis will allow patients to be grouped according to certain criteria in
order to discover discriminant features for the groups (including the relevance of the features) related to the
treatment outcome.
The overall result of the PersonALL project will be the development of innovative diagnostics for childhood ALL that
should enable more targeted therapies and lead to improved treatment outcomes.
PersonALL – towards treatment personalization in molecular diagnostics of acute lymphoblastic leukemia for Polish children
Aleksandra Gruca1, Roman Jaksik2, and Marek Sikora1,3
1 Institute of Informatics, Silesian University of Technology, Gliwice, Poland
2 Institute of Automated Control, Silesian University of Technology, Gliwice, Poland
3 Institute of Innovative Technologies EMAG, Katowice, Poland
Aleksandra Gruca, Ph.D
Institute of Informatics, Silesian University of Technology
Gliwice, Poland
Dr. Hatzis is an Associate Professor of Medicine at the Yale University School of Medicine. He has 20 years of
experience in senior research and management roles in biocomputational techniques, systems biology modeling,
genomic analysis and clinical diagnostics. He received his Ph.D. from the University of Minnesota and held several
senior research roles in the biotechnology industry. He has cofounded two startup companies
specializing in bioinformatics tools development and in clinical diagnostics. Dr. Hatzis has been an active member of
the Biostatistics committee of FDA's Microarray QC program, co‐investigator on the NCI Cancer Biospecimen
Integrity program and an investigator of the Breast Cancer Research Foundation. Among his most significant
contributions are the co‐development with colleagues from MD Anderson of the RCB index, a continuous index of
residual disease in breast cancer, and the development of a gene‐expression based prognostic signature for
patients treated with standard chemotherapy that accounts for phenotypic differences and integrates endocrine
sensitivity, and chemotherapy response and resistance endpoints. Dr. Hatzis continues to be involved in the design
of biomarker validation clinical studies and development of strategies for translating genomic diagnostic assays to
clinical practice. His current research interests focus on developing methods to characterize the genetic and
molecular heterogeneity of breast cancer subtypes and the implications it might have on response and resistance to
treatment. A key area of interest is to develop methodology that integrates genomic level information of individual
patients to lead to more focused treatment decisions tailored for the individual tumor. Dr. Hatzis is serving as
academic editor on biomarker journals, has been a reviewer on NCI and NSF panels and is serving as ad‐hoc
reviewer on several bioinformatics and clinical journals.
Multi‐region sequencing is used to detect intratumor genetic heterogeneity (ITGH) in tumors. To assess whether
true ITGH can be distinguished from sequencing artifacts, we used whole‐exome sequencing (WES) of three
anatomically distinct regions of the same tumor, and also the same DNA twice to estimate technical noise. Somatic
variants were detected with three different WES pipelines (tumor only, cohort normal, matched normal) and
subsequently validated by high‐depth amplicon sequencing. The tumor‐only pipeline was unreliable, with about
69% of the identified somatic variants being false positives. Even with matched normal DNA where 82% of the
somatic variants were detected reliably, only 36%‐78% were found consistently in technical replicate pairs. Overall
34%‐80% of the discordant somatic variants, which could be interpreted as ITGH, were found to constitute technical
noise. Excluding mutations affecting low mappability regions or occurring in certain mutational contexts was found
to reduce artifacts, yet detection of subclonal mutations by WES in the absence of orthogonal validation remains
unreliable.
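As an illustration of how agreement between two technical replicates might be quantified, the sketch below compares two sets of somatic variant calls. It is not the authors' pipeline: the abstract does not define its concordance measure, so the use of shared calls divided by the union of calls, the function name, and the (chromosome, position, ref, alt) variant representation are all assumptions made for this example.

```python
def replicate_concordance(calls_a, calls_b):
    """Summarize agreement between variant calls from two technical replicates.

    Each call is assumed to be a hashable key such as
    (chromosome, position, ref_allele, alt_allele).
    Concordance here is shared calls / union of calls (an assumed definition).
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        # No calls in either replicate: concordance is undefined.
        return {"shared": 0, "discordant": 0, "concordance": None}
    shared = a & b
    return {
        "shared": len(shared),
        "discordant": len(union - shared),
        "concordance": len(shared) / len(union),
    }


if __name__ == "__main__":
    # Hypothetical calls from sequencing the same DNA twice.
    rep1 = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")]
    rep2 = [("chr2", 200, "G", "C"), ("chr3", 300, "C", "A"), ("chr4", 400, "T", "G")]
    print(replicate_concordance(rep1, rep2))
```

Since both replicates come from the same DNA, any discordant call in such a comparison reflects technical noise rather than true heterogeneity.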
Reliability of Whole‐Exome Sequencing for Assessing Intratumor Genetic Heterogeneity in Breast
Cancer
Christos Hatzis, Ph.D
Associate Professor of Medicine
Director Bioinformatics
Breast Medical Oncology, Yale Cancer Center
Yale School of Medicine, USA
Dr. Jones is currently Principal Bioinformaticist and Scientific Advisor at Q2 Solutions | EA Genomics. He conducts
collaborative scientific research with clients in multiple areas, especially in oncology and immuno‐oncology. His
background includes leading the analysis, development and validation of the bioinformatic and computational
systems that process complex genomic assays, including next generation sequencing assays, evaluating new and
emerging genomic technologies, and developing bioinformatic implementation strategies. He consults with clients
and provides thought leadership in industry and public consortiums involved in genomic science and measurement.
Dr. Jones has over 15 years of experience in advanced genomic technologies and 20 years of experience in scientific
and technology leadership positions, including serving as Vice President of Statistics and Bioinformatics at
Expression Analysis, Inc (EA) and Chief Science Officer at Reliametrics, a Nortel Networks business unit. He has
authored over 30 peer‐reviewed publications.
The landscape of proper processes and procedures for constructing and validating clinical‐grade bioinformatics
systems is sometimes muddled in clinical research and practice due to potential regulatory confusion regarding FDA
vs. CLIA oversight, the interactions between wet‐lab and dry‐lab methods, and an industry where traditionally the
software (if any) assessing clinically actionable analytes was tightly integrated with the laboratory device. This talk
will provide a quick overview of the relevant regulatory landscape as well as the risk‐based approaches taken by EA
Genomics, a business unit of Q2 Solutions, to address them. We will also discuss lessons learned in handling and
integrating custom and open‐source software, building appropriate validation datasets and scenarios, and
regulatory compliance.
Clinical‐grade Bioinformatics Systems: Overview and Lessons Learned
Wendell Jones, Ph.D
Principal Bioinformaticist and Scientific Advisor,
Q2 Solutions | EA Genomics,
Morrisville, North Carolina, USA
Paweł Łabaj studied Computer Science in Medicine at the Silesian University of Technology (Gliwice, Poland). For
his MSc thesis he worked on a project at the Institute of Medical Technology and Equipment (Zabrze, Poland),
where he developed a system for automatic analysis and pattern recognition using neural networks, applied in
fetal heart rate monitoring devices. He then joined the Vienna Science Chair of Bioinformatics at Boku
University Vienna, where he obtained a PhD in Bioinformatics with a thesis on “Measurement and data analysis in the face of noise
and complex backgrounds – Advances from improved bioinformatics algorithms”. He was an active member of the FDA
MAQC‐III/SEQC consortium, where he studied the performance of platforms and pipelines for high‐throughput
expression profiling. This experience led to his winning the APART Fellowship of the Austrian Academy of Sciences as
well as the recent competition for bioinformatics group leader at the Malopolska Centre of Biotechnology (Krakow, Poland).
Dr. Łabaj’s research focuses on the consequences of gene vs. alternative transcript expression profiling, as well as on
approaches for assessing the performance / benchmarking of platforms and analysis pipelines.
The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest
developments in analysis tools for microarray and RNA‐seq expression profiling. Such objective benchmarks are
required for basic and applied research, and can be critical for clinical and regulatory outcomes. Such a benchmark is invaluable at
a time when about 90% of surveyed scientists have confirmed that there is a ‘reproducibility crisis’ in science and it is
estimated that 85% of research resources are wasted. This rich and publicly available benchmark makes it possible to identify
underperforming computational tools, which are a major offender in science’s reproducibility crisis.
In our recent research work we are going beyond the first comparisons presented in the original SEQC study. We
have demonstrated the benefits that can be gained by analysing results in the context of other experiments
employing a reference standard sample. This allowed the computational identification and removal of hidden
confounders, for instance, by factor analysis. In itself, this already substantially improved the empirical False
Discovery Rate (eFDR) without changing the overall landscape of sensitivity. Further filtering of false positives,
however, is required to obtain acceptable eFDR levels. Appropriate filters noticeably improved reproducibility of
differential expression calls both across sites and between alternative differential expression analysis pipelines.
Power and limitations of RNA‐Seq ‐ Putting reproducibility to the test
Paweł Łabaj, PhD
Austrian Academy of Sciences APART Fellow, Vienna, Austria
Chair of Bioinformatics Research Group, Boku University Vienna, Austria
Bioinformatics Group Leader, Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
Dr. Li is Associate Director of the National Center for Clinical Laboratories (NCCL) and Director of the Department of
Immunoassay and Molecular Diagnosis of NCCL. He received his Ph.D. from Peking Union Hospital College in July 1993.
He is responsible for proficiency testing of immunoassay and nucleic acid testing of infectious diseases,
pharmacogenomics and tumor gene mutation detection for clinical laboratories in hospitals in China. His main research
interest is the methodology and standardization of molecular diagnosis and immunoassay. As principal investigator, he has received
six grants from the National Natural Science Fund, one grant from the National High Technology Research
and Development Program of China (863 Program), and one grant from the AIDS, Hepatitis and Other Major Infectious Disease Control
and Prevention Program of China. He has published 108 papers in academic
journals (Ann Rheum Dis., Clin Chem., J Mol Diagn., J Clin Endocrinol Metab., Int J Cancer, J Thorac Oncol. and so
on).
Precision oncology takes advantage of individual differences in a patient’s tumor biomarkers, which are associated with patient prognosis and tumor response to therapy, and applies this information to better inform medical care. Tumor biomarkers can be DNA, RNA, protein and metabolomic profiles (Panomic analyses) that predict therapy response. However, the most recent approach is the detection or sequencing of tumor DNA, which can reveal genomic alterations that have implications for cancer treatment.
Since 2012, the National Center for Clinical Laboratories has established more than 10 external quality assessment (EQA)/proficiency testing (PT) programs, which cover gene mutations in EGFR, KRAS, BRAF and PIK3CA, HER2
(FISH), EML4‐ALK (FISH and RT‐PCR), BCR‐ABL (qRT‐PCR), ctDNA (ARMS, ddPCR and NGS) and multiple gene
detection by NGS, based on reference materials or controls developed in our laboratory. At the beginning of each program, only about half of the participants obtained satisfactory results. Most participants reported false‐positive and false‐negative results, especially false positives. Quality has since improved greatly owing to the implementation of quality control measures in clinical laboratories and personnel training programs. An EQA/PT program for the bioinformatics pipeline (dry bench process) of NGS (whole genome sequencing, whole
exome sequencing and targeted sequencing) is being prepared based on the reference materials developed in our
laboratory.
Quality control and standardization of precision oncology related gene mutation detection in China
Jinming Li, Ph.D
Associate Director of National Center for Clinical Laboratories
Beijing Hospital of the Ministry of Health,
Beijing, China
Dr. Nakae is Director‐General of a Japanese industrial consortium in the field of biotechnology called JMAC (Japan
Multiplex bio‐Analysis Consortium). JMAC is a unique industrial consortium consisting of a wide variety of companies,
including microarray manufacturers, material providers, plastic‐processing technology providers, trading companies
and consultants, pursuing a common target, namely the industrialization of biotechnology. The major activities of
JMAC are to support large research projects from the standpoint of quality control and to develop international
standards by incorporating the outcomes of the project work. A representative project is the “Project focused on
developing key technology for discovering and manufacturing drugs for next‐generation treatment and diagnosis”
supported by AMAD in Japan. He is leading the development of the quality control system, including miRNA
standard materials, in collaboration with AIST (National Institute of Advanced Industrial Science and Technology) in this
project. For the development of international standards, he is leading and supporting the development of over 10
ISO standards in broad areas of industries. He is an expert member of TC 212 (Clinical laboratory testing and in vitro
diagnostic test systems), TC 34/SC 16 (Horizontal methods for molecular biomarker analysis), TC 276
(Biotechnology) and TC 229 (Nanotechnologies), and a formal liaison observer among these committees and to other
TCs and SCs such as TC 34/SC 9 (Microbiology) and TC 272 (Forensic sciences). Dr. Nakae is also an assessor for the medical
laboratory accreditation program based on ISO 15189 of the Japan Accreditation Board (JAB).
The MAQC/SEQC project was started in order to address quality issues in data submitted in
applications for drug approval. In the project, multi‐platform measurement data were analyzed and discussed using
broad approaches including multi‐laboratory testing, software pipeline comparison and statistical analysis of the
outcomes of genome‐wide analysis systems.
One of the goals of the project was to reach a consensus on the emerging technologies in order to ensure the quality of
data and to maintain the compatibility needed to understand the accuracy of the submitted data. For precision medicine,
companion diagnostics play an important role in selecting medicines for each person based on his or her genetic
background. In this sense, the accuracy of IVDs is the key issue for the realization of precision medicine.
Quality control of IVDs involves aspects beyond the issues discussed in the MAQC/SEQC series of
discussions. Testing is performed not only in selected high‐level laboratories, e.g. laboratories of
pharmaceutical companies, but also in a large number of small clinical laboratories all over the world.
Standards play a significant role in controlling such general laboratory work in order to realize precision
medicine. For example, ISO 15189 clearly states “The laboratory shall validate examination procedures derived
from the following sources; a) non‐standard methods; …”, and “Validated examination procedures used without
modification shall be subject to independent verification by the laboratory…”. In addition, the procedure to record
“the metrological traceability of the calibration standard and the calibration of the item of equipment” shall be
documented. Thus, the reference material is the key tool not only for equipment calibration, but also for preparing
the quality control materials to be used for each measurement method. Standardization would help this
kind of quality control in clinical laboratories sustain the compatibility of molecular testing results.
Presently, many standardization efforts for molecular testing are under way in the ISO world.
ISO is an international organization for standardization. It develops and publishes International Standards in many
fields of technology other than the electrical industry. In order to cover the wide variety of industries, specific TCs
(Technical Committees) are formed by technology and industry field. Member bodies (countries) assign
experts to each TC, and International Standard documents are developed by
the assigned experts according to ISO directives.
Japan Multiplex bio‐Analysis Consortium (JMAC) was mainly established to actively engage in the development of
ISO standards by providing formal experts to the ISO/TCs and advising any other experts in the field of
biotechnology. JMAC has provided such activities to TC 212 (Clinical laboratory testing and in vitro diagnostic test
systems), TC 34/SC 16 (Horizontal methods for molecular biomarker analysis), TC 276 (Biotechnology), and TC 229
(Nanotechnologies).
Emerging technologies related to the MAQC/SEQC meetings are also discussed in the ISO world. In particular, NGS, the major
technology, is under discussion in TC 212, TC 34/SC 16, TC 34/SC 9 (Microbiology), and TC 276. Briefly, the
preparation for starting a formal development of guidance documents related to the introduction of emerging
technologies into clinical laboratories has been underway in TC 212. In TC 34/SC 16, a document for application of
NGS to identify animal species in food and feed has been prepared. In TC 34/SC 9, a standard for the whole genome
sequencing of foodborne pathogen genomes, mainly by NGS, is being discussed. In TC 276, two documents for NGS
are currently under development. One document is ISO/AWI 20397‐2, entitled “Biotechnology ‐‐ General
requirements for massive parallel sequencing ‐‐ Part 2: Methods to evaluate the quality of sequencing data”, and
the other is a document for the pre‐analysis phase of NGS analysis. Details of this work will be introduced in my
talk.
Emerging technologies, including NGS or massive parallel sequencing, are powerful tools for many industrial fields
including the medical and food industries. For industrial use of emerging technologies, quality is a very important issue
that should be discussed, not only for approval of IVDs, but also for the daily management of the test results,
namely “from approval to lab”. MAQC has been focusing on the development of regulatory science regarding
quality control of data for applications. Now it should expand its scope to the infrastructure for controlling the
quality of testing in clinical laboratories. MAQC/SEQC should carefully watch the ISO standardization activity and
collaborate with ISO on future work in order to achieve the common goal: assurance of test quality for emerging
technologies.
International standardization activity on emerging technologies for medical and food industries – An
ISO perspective
Hiroki Nakae, Ph.D Director‐General Japan Multiplex bio‐Analysis Consortium (JMAC) Tokyo, Japan
Professor Sir Munir Pirmohamed (MB ChB, PhD, FRCPE, FRCP, FBPhS, FMedSci) is currently David Weatherall Chair
in Medicine at the University of Liverpool, and a Consultant Physician at the Royal Liverpool University Hospital. He
is also the Associate Executive Pro Vice Chancellor for Clinical Research for the Faculty of Health and Life
Sciences. He also holds the only NHS Chair of Pharmacogenetics in the UK, and is Director of the M.R.C. Centre for
Drug Safety Sciences, Director of the Wolfson Centre for Personalised Medicine and Executive Director, Liverpool
Health Partners. He was made a Knight Bachelor in the Queen’s Birthday Honours list in 2015. He is also an
inaugural NIHR Senior Investigator, and Fellow of the Academy of Medical Sciences in the UK. He is also a
Commissioner on Human Medicines. His research focuses on personalised medicine in order to optimise drug
efficacy and minimise toxicity, move discoveries from the lab to the clinic, and from clinic to application. He has
authored over 420 peer‐reviewed publications, and has an H‐index of 85.
Pharmacogenomics is the study of how genetic variation affects drug responses. Precision Medicine is a wider term
also encompassing other technologies that personalise therapeutic and preventive approaches. However,
pharmacogenomics is an important component of precision medicine and needs to be considered alongside other
aspects to ensure patients get the right drugs at the right time and at the right dose.
Pharmacogenomic variation can affect drug pharmacokinetics or drug pharmacodynamics, and is important for
both drug efficacy and drug safety. Crucial to both is precision dosing. Individual dose requirements vary, but are
not currently accounted for in clinical practice. The one‐dose‐fits‐all paradigm leads to marked variability in
exposure, and therefore in drug responses. Development of novel dosing algorithms and their clinical
implementation is now being undertaken for certain drugs such as warfarin.
The greatest success in efficacy studies for pharmacogenomics has been in the development of targeted agents in
cancer and rare diseases. In the former, drugs targeting somatic mutations have had major impact in certain
malignancies, but the challenge for the future will be to ensure the use of combinations of drugs to ensure that any
response is durable. In rare diseases, drugs targeting novel mutations, for example ivacaftor for the G551D
mutation in cystic fibrosis, have had a transformational effect on patients’ lives.
There have been important advances for drug safety pharmacogenomics, notably in determining the role of HLA
gene polymorphisms in predisposing to serious immune mediated adverse drug reactions. To date almost 30 novel
HLA allele associations have been identified, and this number is increasing. Non‐HLA gene polymorphisms are also
being identified now, and this will be an area of growth in the future.
In the future, as more patients have their whole genome sequenced, the challenge will be to ensure that both
common and rare genetic variation is taken into account in optimising drug responses. With all these
developments, the greatest challenges for all healthcare systems will be to deliver these advances in a cost‐
effective and sustainable manner that is acceptable to both patients and the public.
Pharmacogenomics and precision medicine ‐ current and future perspectives
Professor Sir Munir Pirmohamed, MB ChB, PhD, FRCPE, FRCP, FBPhS, FMedSci
David Weatherall Chair of Medicine and NHS Chair of Pharmacogenetics
University of Liverpool, UK
Dr. Mehdi Pirooznia is the Director of the Bioinformatics and Computational Biology (BCB) Core Facility at the National Heart Lung and Blood Institute at the NIH (NHLBI/NIH). The BCB core operates with the goal of facilitating and accelerating biomedical research and discovery through the application of computational and statistical tools in the analysis and interpretation of high‐throughput and high‐dimensional biological data. Dr. Pirooznia supervises and spearheads this effort by providing bioinformatics analysis support for intramural scientists in life sciences, clinical and translational research. In particular, the BCB core specializes in analyses pertaining to next‐generation sequencing and biomedical informatics in genomics, transcriptomics, epigenomics and disease biomarkers. Towards this end, Dr. Pirooznia’s team takes an integrative approach, incorporating site‐specific sequence variation with gene expression and proteomics data to investigate molecular mechanisms underlying disease progression and treatment responses. Dr. Pirooznia has published many articles in peer‐reviewed journals and serves as an editor and reviewer for scientific journals. Dr. Pirooznia is also an Adjunct Assistant Professor at the Johns Hopkins University School of Medicine, where he served for 8 years as a faculty member prior to joining the NIH in 2016, provided leadership and scientific direction, and was responsible for implementing the high performance computational laboratory and bioinformatics system.
The evolution of subclones during cancer progression due to accumulation of a number of somatic mutations
represents intra‐tumor heterogeneity. Despite recent advances, determination of sub‐populations within a tumor
remains a challenge. Here, we address this problem through designing a computational workflow for identifying the
sub‐populations within a tumor. The workflow infers clonal populations and their frequencies from bulk tumor
samples. It profiles a reliable set of somatic copy number alterations and somatic point mutations along with
allele‐specific coverage ratios between the tumor and matched normal sample, estimates cellular fractions of them,
identifies and evaluates the clustering of the mutations, infers clonal ordering, and visualizes and interprets the
results. The analysis workflow will be presented in detail as well as results from simulated datasets and NGS
sequencing data from a CLL cancer study, to demonstrate the efficiency of the analysis pipeline.
A General Framework for Analysis of Clonal Heterogeneity and Tumor Evolution
Mehdi Pirooznia, MD., PhD.
Director, Bioinformatics and Computational Biology Core Facility
National Heart Lung and Blood Institute of National Institutes of Health, USA
Dr Pusztai is Professor of Medicine at Yale University, Director of Breast Cancer Translational Research and Co‐
Director of the Yale Cancer Center Genomics Genetics and Epigenetics Program. He is also Chair of the Breast
Cancer Research Committee of the Southwest Oncology Group (SWOG). Dr. Pusztai received his medical degree
from the Semmelweis University of Medicine in Budapest, and his D.Phil. degree from the University of Oxford in
England. His research group has made important contributions to establish that estrogen receptor‐positive and‐
negative breast cancers have fundamentally different molecular, clinical and epidemiological characteristics. He has
been a pioneer in evaluating gene expression profiling as a diagnostic technology to predict chemotherapy and
endocrine therapy sensitivity and has shown that different biological processes are involved in determining the
prognosis and treatment response in different breast cancer subtypes. He made important contributions to clarify
the clinical value of preoperative (neoadjuvant) chemotherapy in different breast cancer subtypes. Dr Pusztai is also
principal investigator of several clinical trials investigating new drugs, including immunotherapies for breast cancer.
He has published over 250 scientific manuscripts in high impact medical journals including the NEJM, JAMA, Journal
of Clinical Oncology, Nature Biotechnology, PNAS, Lancet Oncology and JNCI. He is among the top 1% most highly
cited investigators in clinical medicine according to a 2015 Thomson Reuters report. He is a member of the Scientific
Advisory Board of the Breast Cancer Research Foundation and a Susan Komen Scholar.
Tumor‐infiltrating lymphocyte (TIL) count and gene expression signatures that reflect the extent of immune
infiltration in the tumor microenvironment have long been recognized as prognostic markers in early stage triple
negative (TNBC), HER2 positive, and highly proliferative estrogen receptor (ER) positive breast cancers. Extensive
immune infiltration in the tumor microenvironment also predicts greater chemotherapy sensitivity. It has been
suggested that high mutation load and, consequently, a large number of potentially immunogenic new antigens drive
immune infiltration in cancer. However, in breast cancer higher TIL counts and greater immune metagene
expression are associated with significantly lower clonal heterogeneity in all breast cancer subtypes and with a trend
toward lower overall mutation, neoantigen and CNV loads in TNBC and HER2+ cancers. The high immune gene
expression and lower clonal heterogeneity suggest an immune pruning effect and equilibrium between immune
surveillance and clonal expansion. This suggests that anti‐tumor immune surveillance in immune‐rich tumors leads
to elimination of clones, lower clonal heterogeneity and “simpler” genomes. The surviving neoplastic cell
population exists at a near equilibrium with the immune surveillance, explaining the better prognosis of these
cancers. The higher genomic diversity of immune‐poor TNBC suggests escape from immune surveillance and
genomic diversification. When we examined the immune microenvironment of paired primary tumors and
metastasis, most immune cell subtypes, immune functions, and immune‐associated gene expression were lower in
metastases compared to primary tumors, consistent with immune escape. These immunological differences suggest
that immunotherapy will be more effective in early stage disease than in metastatic cancers. While breast cancer
metastases are immunologically more inert than the corresponding primary tumors, several immuno‐oncology
targets, macrophage and angiogenesis signatures show preserved expression in metastases suggesting rational
therapeutic combinations for clinical testing.
Evolution of the breast cancer genome under immune surveillance
Lajos Pusztai, M.D, D.Phil.
Professor of Medicine and Director of Breast Cancer Translational Research
Yale Cancer Center
Yale School of Medicine, USA
Prof. Sansone’s activities are in the areas of knowledge and information management, and interoperability of applications, impacting on the reproducibility of research outputs and the evolution of scholarly publishing. Prof. Sansone sits on the boards of several not‐for‐profit efforts, and she is a consultant for Springer Nature and Honorary Academic Editor of the Scientific Data journal. She leads the Centre in several UK, European, NIH and pharma‐funded projects in the life and biomedical sciences, and is a founding member of the ELIXIR UK Node, where she is responsible for standards and curation areas. Working with and for data producers and consumers, service providers, pre‐competitive informatics initiatives, journals and funding agencies, she strives to make digital research objects Findable, Accessible, Interoperable and Reusable (FAIR). She holds a PhD in Molecular Biology from Imperial College of Science, Technology and Medicine, London; after a few years working on vaccine genetics in an Imperial spin‐off she moved to the European Bioinformatics Institute (EBI, Cambridge), where she worked for nine years as a Project and Team Coordinator and Principal Investigator before moving to Oxford in 2010.
A growing worldwide movement for reproducible research encourages making data, along with the experimental details, available according to the FAIR principles of Findability, Accessibility, Interoperability and Reusability (see http://www.nature.com/articles/sdata201618). Several data management and sharing policies and plans have emerged and, in parallel, a growing number of community‐based groups are developing hundreds of standards to harmonize the reporting of different experiments. Community mobilization is also evident in the number of efforts and alliances, as well as the data journals and data centres being launched. I will paint this dynamic landscape, highlighting NIH Data Commons and ELIXIR related activities, including FAIRsharing (https://fairsharing.org) and their role in scholarly communication.
The FAIR principles: Findability, Accessibility, Interoperability and Reusability of the research assets
Susanna‐Assunta Sansone, Ph.D
Associate Professor and Associate Director
Oxford e‐Research Centre,
Engineering Science Department,
University of Oxford, UK
Dr. Terry Speed completed a BSc (Hons) in mathematics and statistics at the University of Melbourne (1965), and a
PhD in mathematics at Monash University (1969). He held appointments at the University of Sheffield, U.K. (1969‐
73) and the University of Western Australia in Perth (1974‐82), and he was with Australia’s CSIRO between 1983
and 1987. In 1987 he moved to the Department of Statistics at the University of California at Berkeley (UCB), and
has remained with them ever since. In 1997 he took an appointment with the Walter & Eliza Hall Institute of
Medical Research (WEHI) in Melbourne, Australia, and was 50:50 UCB:WEHI until 2009, when he became emeritus
professor at UCB and full‐time at WEHI, where he headed the Bioinformatics Division until 2014. His research
interests lie in the application of statistics to genetics and genomics, and to related fields such as proteomics,
metabolomics and epigenomics, with a focus on cancer and epigenetics.
In a landmark 1959 paper (Technometrics 1: 251‐267) entitled “The Measuring Process”, the statistician John
Mandel from the US National Bureau of Standards in Washington, DC presented the theory for what he later called
the “row‐linear model” for the analysis of two‐way arrays of measurements made on the same set of units
(materials) across a number of laboratories. A companion paper published at the same time in the American
Society for Testing Materials (ASTM) Bulletin discussed practical and computational aspects of his method.
Mandel’s focus was interlaboratory studies of a single test method, and his method is now embodied in ASTM
Standard E691. Interestingly, his model can also be used with data from studies of different measurement methods
in a single laboratory, or studies involving multiple methods and laboratories. Note that in Mandel’s interlab
studies, a single measurement (perhaps replicated) such as of sulfur in petroleum was taken on each unit in each
lab.
All of the preceding notions can be applied with little change to the MAQC enterprise, where method is replaced by
platform (e.g. one of several microarray platforms, sequencing or qRT‐PCR assays). But there is one major difference:
instead of a single measurement being taken on each unit (e.g. sample of cells) in each lab or by each platform, as
was normal 60 years ago, these days we might take hundreds (qRT‐PCR), thousands (gene expression microarrays),
or millions (DNA methylation) of measurements on each unit in each lab with each platform. Mandel’s method
remains relevant, and illuminating, but we need to address the multiplicity of measurements on each unit. This talk
will summarize a recent study involving measurements on cell samples using multiple microarray and sequencing
assays measuring gene expression and DNA methylation, where we have adapted Mandel’s row‐linear model to the
‘omics era. Also, relevant to MAQC, Mandel’s approach works best with a good number of different samples
spanning a wide range of measurement values, and it becomes stronger the more labs or platforms are used on
these samples.
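As a rough illustration of the kind of fit a row‐linear model involves, the sketch below regresses each row of a small two‐way table on the column averages. It is a minimal sketch under stated assumptions and not the speaker's method: rows are taken to be laboratories (or platforms), columns are samples, the function name fit_row_linear and the toy numbers are hypothetical, and the handling of many measurements per unit discussed in the abstract is not attempted.

```python
from statistics import mean


def fit_row_linear(table):
    """Fit y[i][j] ~ mu[i] + beta[i] * (c[j] - m) separately for each row i.

    Assumption: rows are laboratories or platforms, columns are samples.
    c[j] is the mean of column j and m is the grand mean of the table.
    """
    n_cols = len(table[0])
    col_means = [mean(row[j] for row in table) for j in range(n_cols)]
    grand_mean = mean(col_means)
    # Column effects: deviation of each column mean from the grand mean.
    dev = [c - grand_mean for c in col_means]
    ss_dev = sum(d * d for d in dev)

    fits = []
    for row in table:
        mu = mean(row)  # row level (intercept at the grand mean)
        # Least-squares slope of the row on the centered column means.
        beta = sum(y * d for y, d in zip(row, dev)) / ss_dev
        residuals = [y - (mu + beta * d) for y, d in zip(row, dev)]
        fits.append({"mu": mu, "beta": beta, "residuals": residuals})
    return fits


# Hypothetical toy data: two labs measuring the same four samples.
labs = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
]
fits = fit_row_linear(labs)
```

In this toy table the two slopes are 2/3 and 4/3, which average to 1, as they must when the regressor is the column average. A slope away from 1 indicates a row whose response is compressed or stretched relative to the consensus across rows.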
Using Mandel’s row linear model in the ‘omics era
Terry Speed, Ph.D.
Walter and Eliza Hall Institute of Medical Research
Australia
After graduating from Shanghai Medical University in 1985, Dr. Shao underwent surgical residency training at the
Cancer Hospital affiliated with Shanghai Medical University. From 1990 to 1995, he completed his postdoctoral
research at the University of Maryland Cancer Center in the U.S. He was a visiting scientist to the Breast Center at
the University of California, Los Angeles from 1999 to 2001. In the year 2000, Dr. Shao was appointed Chairman of
the Department of Breast Surgery at the Cancer Hospital/Cancer Institute affiliated with Shanghai Medical
University. He was elected as the Director of Fudan University’s Breast Cancer Institute in 2002. In 2005, he became
the Chairman of the Chinese Anti‐Cancer Association’s Breast Cancer Society. Since his appointment in 2006, Dr.
Shao has been the Chairman of the Department of Surgery at the Cancer Hospital/Cancer Institute affiliated with
Fudan University, and he was appointed as the Director of Fudan University’s Breast Cancer Institute in 2012.
Dr. Shao's research focuses on the translational and clinical research of breast cancer, especially breast cancer
susceptibility and metastasis. Either as the principal investigator or in collaboration with others, he has conducted
excellent research in these areas. Dr. Shao has published over 300 articles in the field of breast cancer research,
which have been cited more than 3,000 times all over the world.
Breast cancer is the most common cancer diagnosed in women, and approximately one in eight women living in the
United States will develop the disease during her lifetime. Breast cancer is also one of the most studied solid
tumors and has the potential to be tractable to a precision medicine approach. It has been well established that
these tumors are extremely heterogeneous; intra‐ and inter‐tumoral heterogeneity are basic characteristics
of breast cancer. According to gene expression profiles, breast cancer can be divided into several molecular
subtypes, including luminal breast cancer, HER2‐positive breast cancer, and triple‐negative breast cancer (TNBC).
For each subtype, there are unique tumor biology characteristics and treatment strategies. Intrinsic subtypes are
associated with different gene expression and mutation profiles as well as different prognoses and responses to
therapies. In the era of precision medicine, therapies are being developed using the framework of molecular
subtyping. Here, we summarize the major challenges and possible solutions of treating the intratumoral
heterogeneity of breast cancer.
Intratumoral heterogeneity and clonal selection of breast cancer
Zhimin Shao, Ph.D
Director of Fudan University’s Breast Cancer Institute
Fudan University, China
Dr. Chunlin Xiao is a Staff Scientist at the National Center for Biotechnology Information (NCBI), National
Institutes of Health (NIH). His primary role is large‐scale sequencing data analysis and
management involving next‐generation sequencing technologies, for projects such as the 1000 Genomes Project,
the Genome‐in‐a‐Bottle project, and the Sequencing Quality Control project. His research interests include
population genetics, reference material and reference sequence dataset development, structural
variation detection method development, and cloud computing.
Structural variations (SVs) contribute to genetic diversity of human populations, affect biological functions, and
cause various human disorders. However, accurately identifying SVs with correct sizes and locations in the human
genome remains challenging due to the complexity of the human genome, limitations of sequencing technologies,
and drawbacks of analysis methods. The advancement of next‐generation sequencing technologies has dramatically
decreased sequencing costs while substantially increasing the lengths of sequencing reads. Thus, using de
novo assembly based approaches to discover the full spectrum of SVs in the human genome has become appealing.
While various assembly methods have been developed and proposed for general use by the community, the
relative efficiency and predictive accuracy of SV calling based on these assembly methods have not been fully
evaluated. In this study, we applied several popular de novo assembly tools to sequencing read data that were
generated using multiple sequencing technologies with technical replicates for NA12878/HG001, a well‐studied
individual from the NIST‐led Genome‐in‐a‐Bottle (GIAB) project, and for a HapMap Caucasian trio and a Chinese Quartet from
the FDA‐led Sequencing Quality Control Phase II (SEQC2) project. Assemblies and SV callsets were generated for each
of the samples, and repeatability of the SVs across technical replicates and reproducibility across sequencing sites
were evaluated. These results allow a better understanding of the impact of de novo assembly methods on SV
calling, thus providing better insight for precision medicine.
Performance assessment of de novo assembly‐based structural variation detection in the human
genome
Chunlin Xiao, Ph.D
National Center for Biotechnology Information (NCBI)
National Institutes of Health (NIH), USA
Dr. Jun Ye is co‐founder, president, and CEO of Sentieon, a bioinformatics company established in 2014. Prior to
Sentieon, Ye was co‐founder, president, and CEO of Founton Technologies, a company that specialized in
datamining, which is now part of Alibaba Group. Prior to Founton, Ye was co‐founder, president, and CTO of Brion
Technologies, a company specializing in computational lithography for semiconductor manufacturing, now part of
ASML. Prior to Brion, he was director of engineering at Onetta, an optical telecom company, working on
communication system control. He also served as director of engineering at KLA‐Tencor, where he worked on the
software and algorithm for mask inspection. From 2001 to 2015, Ye also served as a consulting professor of
electrical engineering at Stanford University, where he mentored and supervised graduate student research in
microlithography and other areas. Ye earned a BSEE from Fudan University in 1987, an MS in Physics from Iowa State
University in 1991, and a Ph.D. in EE from Stanford University in 1996. During his career, he has authored or co‐
authored more than 50 U.S. patents covering algorithms, software, hardware, and system architecture. In 2014 he
received the ISU John V. Atanasoff Discovery Award for his work to advance scientific knowledge.
Sentieon (www.sentieon.com), incorporated in 2014, develops and supplies a suite of bioinformatics secondary
analysis tools that process genomics data with high computing efficiency, fast turnaround time, exceptional
accuracy, and 100% consistency. Currently released products include Sentieon DNAseq, a germline DNA pipeline, and
Sentieon TNseq and TNscope, for tumor‐normal somatic variant detection. The Sentieon tools are easily scalable,
easily deployable, easily upgradable, software‐only solutions. The Sentieon tools achieve their efficiency and
consistency through optimized computing algorithm design and enterprise‐strength software implementation, and
achieve high accuracy using the industry’s most validated mathematical models. Sentieon products have won
multiple top awards at precisionFDA challenges, and ranked first place on the most recent ICGC‐TCGA DREAM
Mutation Calling challenge leaderboard in all three categories (SNV, indel, SV). We strive to enable
precision genomics data for precision medicine.
Enable Precision Data for Precision Medicine
Jun Ye, Ph.D
Chief Executive Officer and Co‐Founder
Sentieon, USA
Project #1: Computational Reproducibility
Project coordinator: Benjamin Haibe‐Kains, [email protected]
How to participate: Participants must propose studies, either their own or independent ones, for which they plan
to reproduce all the computational analysis results (figures, tables and supplementary materials). Pointers to raw
data, processed data, analysis code, documentation and tutorial must be submitted to the project organizers.
Objectives: The goal of this project is to provide practical examples and guidelines to help scientists make their own
research fully reproducible. From the diversity of studies and tools will emerge templates that could be used to
ensure full reproducibility of future studies.
Background: Biomedical science is undergoing a “reproducibility crisis” where the results of many studies cannot be
reproduced by independent investigators, or even the original authors. While reproducing biological experiments is
difficult, computational analyses can be reproduced. However, the amount of data and the complexity of
analysis pipelines are ever increasing, making computational reproducibility challenging. There exist many ways to
make the computational analyses of a given study fully reproducible. Size and accessibility of the data, software
tools and computing resource requirements are among the factors that will define how an analysis can be made
fully reproducible. There is a dire need for practical guidelines to make biomedical studies more reproducible.
Specific Aims
Aim 1: Identification of candidate studies. The MAQC Society asks its members to share examples of manuscripts
that can be fully reproduced.
Benjamin Haibe‐Kains, Ph.D
Scientist, Princess Margaret Cancer Center, University Health Network
Assistant Professor, Department of Medical Biophysics, University of Toronto
Adjunct Professor, Department of Computer Science, University of Toronto
OICR Associate, Ontario Institute of Cancer Research
Toronto, Canada
Dr. Haibe‐Kains earned his Ph.D in Bioinformatics at the Université Libre de Bruxelles (Belgium), for which he
was awarded the Solvay Award (Belgium). Supported by the Fulbright Award, Dr. Haibe‐Kains did his
postdoctoral fellowship at the Dana‐Farber Cancer Institute and Harvard School of Public Health (USA). He
started his laboratory at the Institut de Recherches Cliniques de Montréal (Canada) and moved to Princess Margaret (PM) in
November 2013. His research focuses on the integration of high‐throughput data from various sources to
simultaneously analyze multiple facets of carcinogenesis. His team is analyzing high‐throughput
(pharmaco)genomic datasets to develop new prognostic and predictive models and to discover new therapeutic
regimens in order to significantly improve disease management. Dr. Haibe‐Kains’ main scientific contributions
include several prognostic gene signatures in breast cancer, subtype classification models for ovarian and breast
cancers, as well as genomic predictors of drug response in cancer cell lines.
Aim 2: Reproducing the studies. The data must be freely accessible, the code must be open‐source and documented,
and a tutorial describing how to rerun the analyses to generate the figures and tables of the manuscript must be
provided with the submission.
Aim 3: Generating guidelines and templates. The set of studies that have been successfully reproduced will be used
to generate guidelines and templates to help the community in their quest for full computational reproducibility.
Study Design
The Code Ocean platform (codeocean.com) will be used to store the code, processed data and all the software
dependencies. Code Ocean allows a Docker‐based computing environment to be created for each project, so that anybody can
easily run the code and reproduce all the analysis results.
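As an illustration only, the completeness of a submission could be checked before review with a small script. The sketch below assumes a hypothetical manifest format (a dictionary with one entry per required pointer); the key names and the placeholder URLs are not part of the project description or of Code Ocean.

```python
# Items required by the "How to participate" text; the key names are hypothetical.
REQUIRED_ITEMS = ("raw_data", "processed_data", "analysis_code", "documentation", "tutorial")


def missing_items(manifest):
    """Return the required items that are absent or empty in a submission manifest."""
    missing = []
    for key in REQUIRED_ITEMS:
        value = manifest.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Made-up example submission with placeholder URLs.
example = {
    "raw_data": "https://example.org/raw",
    "processed_data": "https://example.org/processed",
    "analysis_code": "https://example.org/code",
    "documentation": "",  # present but empty, so it counts as missing
    # "tutorial" is intentionally omitted
}

print(missing_items(example))
```

A check of this kind only confirms that the pointers exist; whether the figures and tables can actually be regenerated still has to be verified by rerunning the analysis.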
Timeline
April 1st: Submission of the list of proposed studies.
April 16th: Selection of the proposed studies.
August 31st: Submission of the Code Ocean instance (pointers to raw data, processed data, code, documentation)
and tutorial
October 15th: Reproduction of the study results using the participants’ Code Ocean instance
November 26th: Selection of the three top submissions
Project #2: An Internationally‐conducted External Quality Control scheme on Machine Learning
Algorithms to assess Tumor Infiltrating Lymphocytes in Breast Cancer
Project coordinator: Roberto Salgado, [email protected]
How to participate: All groups with documented, analytically validated machine learning tools are welcome to
participate. Interested groups need to send the coordinators a motivated request to participate, with
documentation of the analytical validity of the method they will apply in this program. The information
provided by the groups will be considered strictly confidential. The method to be applied in this
assessment needs to be locked, may not be changed during the assessment, and should be described in detail in
order to avoid implicit overfitting.
Objectives/Goals:
To set quality standards and performance metrics on machine learning algorithms before introduction in a
clinical trial setting and/or daily practice setting.
To set quality standards and performance metrics that can be used by regulatory agencies to certify machine
learning algorithms for use in patient management.
To develop a framework for comparison of machine learning algorithms to determine precise quantitative
metrics of other breast cancer biomarkers, like Ki67.
Roberto Salgado, Ph.D
Department of Pathology/GZA, Antwerp
Breast Cancer Translational research Laboratory
Jules Bordet Institute, Brussels, Belgium
Translational Breast Cancer Genomic and Therapeutics Laboratory of
the Peter MacCallum Cancer Centre, Melbourne, Australia
Dr. Roberto Salgado has been board certified in Anatomic Pathology since 2006 and obtained his medical training at
the University Hospital of Antwerp (Belgium) and the University Hospital in Leiden (The Netherlands). He completed
his PhD thesis with the Translational Cancer Research Group of the AZ Sint‐Augustinus
Hospital/Antwerp and at the Department of Pathology at the University Hospital of Antwerp, studying the
interactions of hemostasis and angiogenesis in breast cancer. His training in Anatomic and Molecular
Pathology took place at the University Hospital Antwerp, the University Hospital Leuven and at the Jules
Bordet Institute, Brussels, Belgium. Currently he works as a pathologist in Antwerp and is a scientific collaborator
with the Breast Cancer Translational research Laboratory of the Jules Bordet Institute, the Immuno‐Task
Force of the Breast International Group in Brussels, and the Translational Breast Cancer Genomic and
Therapeutics Laboratory of the Peter MacCallum Cancer Centre, Melbourne, Australia. He also works in close
collaboration with the EORTC, where he is co‐leading the development of the Specta‐trial concept. He is
also an auditor of Molecular Pathology/Genetic laboratories for the Federal Belgian Government.
Background:
At present, in early‐stage disease clinico‐pathological risk stratification is performed using a limited set of features
such as tumor size and lymph node status. Very large adjuvant trials such as ALTTO and APHINITY that have applied
these stratification schemes have illustrated the key problems with the current classification scheme – it does not
stratify patients with sufficient granularity to permit selection for clinical trials. The current scheme also takes the
approach of placing patients on a continuum of risk. This is at odds with results from high‐throughput technologies
such as gene expression profiling and genomic assays, which focus on identifying individual patient groups with
particular clinical behaviour. Several results in this area have identified genomic, transcriptomic or proteomic
features which in hindsight are associated with particular histological features. This suggests that the histological
appearance of a tumor represents a useful cancer phenotype which can be further explored, and contribute to
staging and stratification.
Machine learning refers to the general computational approach whereby data is used by algorithms to develop
predictive models. These models are finely tuned to optimize accuracy and generalizability as applied to new data.
Although machine learning has existed for some time, recent advances in algorithm development and
hardware infrastructure have enabled ‘deep learning’ approaches. Deep learning was originally designed to mimic
the neural architecture of the human brain, and conceptually uses a series of connected nodes (neural nets) which
respond to input in a way that is tuned with repeated cycles of learning. Neural nets have the ability to learn rich
representations of complex data, which may contain hierarchical and non‐linear relationships. These abilities make
neural nets ideally suited to image classification. They have exhibited spectacular results in this area, often
matching the performance of experts in the field or exceeding it (superhuman capabilities).
Recent advances in deep learning provide a path forward for numerous applications in digital pathology. On
one level, the robust performance and training characteristics of deep learning allow us to develop accurate
automated assays for pathological features such as grade and lymphocyte infiltration. These have the potential to
be ‘learn once, apply everywhere’. This is in contrast to existing imaging methods, which lack the precision and
robustness to be used in the clinical setting. If the promise of deep learning can be validated, the use of digital
pathology would aid pathologists in routine reporting, and could be expected to improve the validity of current
pathology based clinico‐pathological features. In the short term, digital pathology would also help standardize
pathology results within and across trials given the time required for pathology assessed quantitative metrics.
TILs have been shown to be a reliable and reproducible marker of tumor immunogenicity in breast cancer. It is clear
that higher levels of TILs are associated with improved prognosis in early stage TNBC and HER2‐positive breast
cancer, as well as a higher probability of achieving pCR in the neoadjuvant setting. Analysis of TILs in residual
disease specimens after neoadjuvant therapy has also been shown to have prognostic value. The evaluation of TILs
as a biomarker in breast cancer is likely to be extended from the research domain to the clinical setting in the near
future. The assessment of TILs by digital image analysis might be useful for standardization in the future, since this
approach has the potential, for example, to determine the number of TILs per mm² stromal tissue as an exact
measurement contrary to the approximate semi‐quantitative evaluation suggested at this moment. In the first
International Guidelines on TIL‐assessment in breast cancer we proposed to develop an inter‐laboratory Ring study
to assess the reproducibility and clinical validity of TILs assessment, including machine learning algorithms. While
TILs have been measured morphologically and have been shown to add predominantly prognostic information,
methodological open questions in the morphological evaluation of TILs still remain, for example the assessment and
importance of spatial TIL‐heterogeneity. The measurement on H&E‐stained slides most likely represents the
beginning of the efforts to use infiltrating cell properties as companion diagnostic tests. Thus, as a field, we should
be open to the introduction of molecular methods, most likely in situ, that can classify the TILs‐component and
bring higher levels of information to the patient sample. However, at this time, these deep learning approaches are
still experimental and not sufficiently documented for introduction into standard practice.
On another level however, deep learning also permits discovery of image based features which may be very difficult
for current approaches to identify, particularly if they only exist in small groups of patients. The key benefit of deep
learning here is to rapidly identify pathological features in clinical trials that are predictive of treatment or
prognostic of outcome in a standardized way. This is an essential first step in deciding if previously undescribed
pathological features are clinically relevant, and is largely infeasible using current approaches. Deep learning also
permits modification and retraining of the feature set to optimize accuracy and interpretability, which is again
infeasible with current methods.
The Working Group is therefore proposing a collaboration with the Massive Analysis and Quality Control
Consortium (www.maqcsociety.org) to characterize tumor infiltrating lymphocytes using machine learning
algorithms. Developing a machine learning based assay for tumor infiltrating lymphocytes would enable rapid
expansion of this promising pathological feature, and by providing an adjunct to human pathologists, enhance the
validity and robustness for prognosis/prediction.
Specific Aims:
Comparison of the machine learning image classification metrics with those of pathologists in the RING‐study
which the Working group has published (Carsten Denkert et al., Mod. Pathol. 2016)
Comparison of automated TILs scoring with pathologist scoring results in different settings, namely core
biopsies, full sections, pre‐invasive (DCIS), untreated and treated tumors.
Comparing the performance of deep learning approaches to identify complex features such as clustering/spatial
statistics including proximity of TILs to cancer cells that are prognostic of outcome or predictive of treatment.
Comparing the clinical validity of different machine learning algorithms, assessing the utility of combining models for
improved accuracy, and identifying possible false positives and false negatives.
To combine annotated training data from different sites to create a comprehensive breast cancer ML training
and validation data base hosted by the consortium.
A framework will be developed to facilitate automated testing, validation and certification of image
classification derived pathology metrics that can improve the standard of care.
Develop, together with both groups, a review, perspective or opinion paper on the use of Artificial
Intelligence/machine learning tools in Oncology, focusing, but not exclusively, on TILs and including the quality
requirements for use of these technologies in clinical trial and daily practice settings, similar in kind to the paper
by Lisa McShane et al., “Criteria for the use of omics‐based predictors in clinical trials” (Nature 2013, doi:
10.1038/nature12564). Such an exercise may be very useful for regulatory agencies (FDA, EMA). If
we pursue this idea, we should aim for a high‐level journal such as Nature, Nature Biotechnology, Nature Reviews
Clinical Oncology or Nature Reviews Drug Discovery.
Study Design:
Breast cancer slide‐sets in different settings (invasive, DCIS, residual disease) with known TIL‐assessment by
pathologists will be posted on the website of the International Immune‐oncology Biomarker Group.
A clinical trial slide‐set, with clinical annotation, will be hosted on the website of the International Immune‐
oncology Biomarker Group.
All these slides can then be assessed by all participating groups.
The metrics assessed will be reported on pre‐specified formats to the coordinators.
A systematic comparison of the output of the machine learning/deep learning approaches with the
pathologists’ scores will be performed in all datasets, and any added clinical validity relative to the pathologists’
TIL‐score will be evaluated using the clinical trial datasets.
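To make the comparison step concrete, the sketch below shows one possible way to summarise agreement between an algorithm and a pathologist on the same slides. The choice of Spearman rank correlation and mean absolute difference is an assumption made for illustration (the proposal does not name specific concordance metrics), and all scores are made up. A helper for the TILs per mm² stromal tissue measurement mentioned in the Background is included.

```python
from statistics import mean


def til_density(til_count, stromal_area_mm2):
    """TILs per mm^2 of stromal tissue."""
    if stromal_area_mm2 <= 0:
        raise ValueError("stromal area must be positive")
    return til_count / stromal_area_mm2


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_difference(x, y):
    """Average absolute gap between paired scores."""
    return mean(abs(a - b) for a, b in zip(x, y))


# Made-up stromal TIL percentages for five slides.
pathologist = [5, 10, 20, 40, 60]
algorithm = [8, 12, 18, 45, 55]

print(spearman(pathologist, algorithm))
print(mean_absolute_difference(pathologist, algorithm))
print(til_density(150, 0.5))
```

Rank correlation captures whether the algorithm orders slides as the pathologist does, while the mean absolute difference captures how far apart the scores are; a real assessment would likely report both, together with agreement at clinically used cut‐offs.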
Timeline:
The project is planned to start in 2019 and to finish within 1 year of the start of the program.
Results will be presented at the annual meeting of the International Immuno‐Oncology Biomarker Working
Group held at the San Antonio Breast Cancer Conference and at the annual MAQC‐Conference.
Publication in a high‐level journal is targeted within 6 months after completion of the program.
Project #3: Challenges and opportunities in N‐of‐1 clinical trial: reality of applying genomics in clinic
Xichun Hu, Ph.D.
Fudan University Shanghai Cancer Center
Shanghai, China
Yuanting Zheng, Ph.D
Associate Professor
School of Life Sciences
Fudan University
Shanghai, China
Dr. Yuanting Zheng is an associate professor at the School of Life Sciences of Fudan University. Dr. Zheng’s
research focuses on precision medicine, pharmacogenomics, and clinical pharmacy. She is developing multi‐omics
reference materials and quality control metrics to facilitate the translation of multi‐omics technologies into
reliable clinical biomarkers and companion diagnostics for cancer and type 2 diabetes. Dr. Zheng received her
Ph.D. in clinical pharmacology from China Pharmaceutical University in 2009 when she joined the School of
Pharmacy of Fudan University as an assistant professor. Dr. Zheng has published 40 peer‐reviewed papers in
clinical pharmacology, pharmacogenomics, and bioinformatics. She is also an inventor on two issued patents about
drug repositioning and combination therapies.
Leming Shi, Ph.D
Professor and Director, Center for Pharmacogenomics
School of Life Sciences
Fudan University
Shanghai, China
Dr. Leming Shi is a professor at the School of Life Sciences and Shanghai Cancer Center of Fudan University in
Shanghai, China where he established and directs the Center for Pharmacogenomics. Dr. Shi is the president of the
International Massive Analysis and Quality Control (MAQC) Society (2017‐2018). Dr. Shi’s research focuses on
pharmacogenomics, bioinformatics, and cheminformatics aiming to realize precision medicine by developing
biomarkers for early cancer diagnosis, prognosis, and personalized therapy. Dr. Shi is a co‐inventor on nine issued
patents about novel therapeutic molecules and has published over 200 peer‐reviewed papers (12 of them appeared in
Nature Biotechnology) with >10,000 citations by SCI journals. Dr. Shi received his Ph.D. in computational
chemistry from the Chinese Academy of Sciences in Beijing.
Project #3: Developing reference materials and reference data sets for the QC/standardization of multi‐
omics platforms
Project coordinators:
Yuanting Zheng, PhD Fudan University, China. [email protected]
Leming Shi, PhD Fudan University, China. [email protected]
How to participate: Participants must propose a data analysis plan to the project organizers and return the
analysis results (processed data, figures, and tables), analysis code, and documentation on time. Participants are
also encouraged to propose a data generation plan and generate multi‐omics data using the reference materials.
Objectives: The goal of this project is to provide a set of multi‐omics reference materials and reference datasets to
promote the repeatability, reproducibility, and comparability of massive analysis technologies. It also aims to
develop appropriate metrics to evaluate and monitor the performance of high‐throughput omics platforms, tests,
or laboratories.
Background
Emerging big data technologies have changed our way of studying disease and health, and reproducibility is the
foundation for translating high‐throughput omics approaches into clinical utility. However, errors may come
from the generation, analysis, and interpretation of multi‐omics data. Well‐characterized reference materials are
essential to understand the sources of errors, calibrate the measurements, and evaluate the performance of high‐
throughput omics tests. We are generating the Chinese Quartet reference materials at the levels of DNA, RNA,
proteins, and metabolites. We are also generating multi‐omics datasets using different platforms at various
laboratories. It is therefore time for a community‐wide effort to establish multi‐omics benchmarking values
and metrics to assess reproducibility and performance. In addition, multi‐omics integrative analyses will
improve the reliability of the benchmarking values and the interpretations. The MAQC Society’s guideline for
quality control and standardization using reference materials and datasets will make massive analysis technologies
more reproducible in the future by improving laboratory proficiencies.
Specific Aims:
Aim 1: Benchmarking values for the Chinese Quartet multi‐omics references. Jointly analyze the Chinese
Quartet multi‐omics datasets to characterize genomic, transcriptomic, proteomic, and metabolomic
benchmarking values for the DNA, RNA, protein, and metabolite reference materials, respectively.
Aim 2: Performance metrics. Develop performance metrics to assess the reproducibility and performance of
high‐throughput omics platforms, tests, and laboratories, including accuracy, precision, sensitivity,
specificity and reference intervals.
Aim 3: Guidelines for QC/standardization using reference standards. Develop guidelines to help the
community use reference materials and reference datasets for quality control of the whole process of
high‐throughput omics assays.
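For call‐level comparisons against benchmarking values (for example, variant calls scored as true or false positives), the metrics listed in Aim 2 can be computed from confusion counts. The sketch below is an illustration under that assumption; here "precision" is interpreted as positive predictive value, whereas for quantitative assays precision is more commonly expressed as variation across replicates. The counts are made up.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance_metrics(tp, fp, tn, fn):
    """Call-level performance metrics from confusion counts against a reference."""
    return {
        "sensitivity": _ratio(tp, tp + fn),   # fraction of reference positives recovered
        "specificity": _ratio(tn, tn + fp),   # fraction of reference negatives recovered
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Made-up counts for one laboratory's calls compared with benchmarking values.
metrics = performance_metrics(tp=90, fp=5, tn=895, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```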
Study Design
The Chinese Quartet multi‐omics reference materials (DNA, RNA, proteins, and metabolites) were generated
simultaneously from the same set of immortalized cell lines including father, mother, and two monozygotic twin
daughters. The “genetic ground truths” from the quartet family and the multi‐omics integrative analyses will make
the benchmarking values more reliable.
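One way to read the “genetic ground truths” of the quartet design is as two built‐in checks on genotype calls: monozygotic twins are expected to carry the same genotype, and each daughter's genotype should be explainable by one allele from each parent. The sketch below illustrates these checks under simplifying assumptions that are not stated in the proposal (autosomal, diploid calls; no de novo mutations); the data layout and variant names are hypothetical.

```python
def mendelian_consistent(child, father, mother):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def quartet_check(calls):
    """Summarise twin concordance and Mendelian consistency.

    calls maps a variant id to a dict with keys 'father', 'mother', 'twin1', 'twin2',
    each holding a genotype as a 2-tuple of alleles (hypothetical layout).
    """
    summary = {"total": 0, "twin_concordant": 0, "mendelian_consistent": 0}
    for genotypes in calls.values():
        summary["total"] += 1
        if sorted(genotypes["twin1"]) == sorted(genotypes["twin2"]):
            summary["twin_concordant"] += 1
        if all(
            mendelian_consistent(genotypes[twin], genotypes["father"], genotypes["mother"])
            for twin in ("twin1", "twin2")
        ):
            summary["mendelian_consistent"] += 1
    return summary


# Made-up calls: 0 is the reference allele, 1 the alternate allele.
calls = {
    "var1": {"father": (0, 1), "mother": (0, 0), "twin1": (0, 1), "twin2": (1, 0)},
    "var2": {"father": (0, 0), "mother": (0, 0), "twin1": (0, 1), "twin2": (0, 1)},
    "var3": {"father": (0, 1), "mother": (1, 1), "twin1": (1, 1), "twin2": (0, 1)},
}

print(quartet_check(calls))
```

Calls that fail either check are candidates for technical error, which is why family structure can help when no independent truth set exists.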
Timeline
2017.01.01 – 2018.05.30 Generate multi‐omics data using different platforms.
2018.06.01 – 2018.08.31 Jointly analyze the datasets.
2018.09.01 – 2018.12.31 Establish the benchmarking values and develop the performance metrics.
2019.01.01 – 2019.05.30 Validate selected reference values using orthogonal methods
Project #4: QC/standardization of proteomics materials and technology
Project coordinators:
Chen Ding Fudan University [email protected]
Jun Qin National Center for Protein Sciences ∙ Beijing [email protected]
Objectives/Goals: To establish a reference proteome standard material and an SOP for measuring the proteome across
different MS instruments and in different labs at different locations.
Background:
The Human Proteome Project has promoted the development of proteomics and made it an indispensable tool for
research in life sciences and medicine. Improvements in “the next‐generation proteomics” including
instrumentation, sample preparation, and computational analysis, have facilitated the generation of data that cover
protein profiling, post‐translational modifications (PTMs), and protein‐protein interactions (PPIs). It is crucial to
measure samples reproducibly, which is challenging in proteome research. Thus, a uniform standard reference
material, a standard operating procedure (SOP) and a reference dataset that may facilitate measurement validation
across different labs are urgently needed.
Phase I of the Microarray Quality Control project (MAQC‐I) tested agreement across sites and platforms for
gene‐expression microarrays; MAQC‐II surveyed approaches in microarray‐based predictive model development to
understand sources of variability in prediction performance, to assess the influence of endpoint signal strength in
data and to develop good modelling practice guidelines; the Sequencing Quality Control project (SEQC/MAQC‐III)
assessed the performance of RNA‐seq across laboratories and tested different sequencing platforms and data
analysis pipelines.
Compared with genome, transcriptome and microbiome studies, relatively few proteomic studies have evaluated data
reproducibility across laboratories to identify potential measurement variability in each step of the general
proteomics workflow. This is probably because: (i) the proteomics field lacks a uniform standard reference material;
(ii) laboratories differ in their sample preparation methods; (iii) data quality is affected by different data
acquisition methods; and (iv) subsequent analysis has no consistent evaluation system.
Chen Ding, Ph.D
Fudan University
Shanghai, China
Professor Chen Ding holds a doctorate in Cell Biology from the School of Life Sciences at the University of Science and
Technology of China. He was selected for the National “One‐Thousand‐Young‐Talents” Program and the Beijing
“High‐level Oversea Young Talents” Project. He is an executive member of the council of the Chinese Human
Proteome Organization (CNHUPO). He focuses on the development of an in‐depth proteome platform, its
standardization and its applications in biological research, with special emphasis on transcriptional
regulation, signal transduction, and liver biology. His studies have been published as research articles in Nature
Biotechnology, Cell, Molecular Cell, Nature Communications, Proceedings of the National Academy of
Sciences of the United States of America, Journal of Experimental Medicine, and Molecular & Cellular Proteomics.
To solve these problems, we are launching a proteome standard initiative that focuses on research into a proteome
standard for quality control, aiming to establish the first international proteome standard reference material with a
standard operating procedure (SOP) and to generate the first international proteome benchmark dataset for assessing
variation in proteome analysis. A rigorous proteome standard, SOP and quality control system will be established
to ensure reliability and reproducibility in measuring proteomes, accelerating the development of life sciences
and medicine.
Specific Aims:
(i) To establish a proteome standard reference material;
(ii) To establish a proteome standard operating procedure;
(iii) To generate a proteome benchmark dataset.
Study Design:
(i) We performed 1200 repeated measurements of a 293T cell line proteome on 10 different LC‐MS
instruments with a relatively fixed LC method and column over a one‐year time span as our routine QC for the
Phoenix Center proteomics technology platform. We found that the results from the same instrument tended
to cluster together, i.e., there is a batch effect between instruments. We will analyse these data in a “big
data” fashion to generate a preliminary SOP so that measurements across different mass spectrometers can be
compared independent of operators and instruments.
a) Quantification values of each peptide/protein are calculated and corrected using different normalization
methods to generate distribution curves for each peptide and protein from the same instrument. We will
then perform outlier analysis based on a Gaussian mixture model (GMM) to reject outliers and to determine
a reference index of each peptide/protein within 95% confidence intervals.
b) We will then identify peptides and proteins whose quantification is independent of or insensitive to instruments
and operators. These peptides and proteins then form the basis for protein quantification of a reference
proteome across different instruments. This yields the identification/quantification SOP.
(ii) A reference proteome standard material from MAQC IV will be prepared and analysed.
a) We will use/recommend our routine LC method as an LC method SOP to measure the MAQC IV reference
proteome standard material. We will thoroughly analyse the features of the MAQC standard material in different
aspects (identification, quantification, protein abundance ranking) and keep evaluating the reproducibility
and stability of the protein samples over 1 year.
b) We will then dispatch the MAQC IV reference proteome standard material, together with the recommended LC method
SOP and the identification/quantification SOP, to different labs for proteome identification/quantification.
We will compare the results of the MAQC proteome standard material across different MS instruments in
different labs and in different locations.
(iii) Generate the final MAQC IV SOP and the proteome benchmark dataset for the MAQC proteome standard.
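The two analysis steps in (i) can be sketched in simplified form. Note the substitutions: the sketch below uses a median/MAD rule in place of the Gaussian mixture model named in the study design, reports a percentile‐based central 95% interval as the reference range, and flags instrument‐insensitive peptides by the coefficient of variation (CV) of per‐instrument means. The thresholds, function names and all values are hypothetical and only illustrate the idea.

```python
from statistics import mean, median, pstdev


def reject_outliers(values, threshold=3.5):
    """Drop values whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def reference_interval(values):
    """Central 95% interval of the values that survive outlier rejection."""
    kept = sorted(reject_outliers(values))
    return percentile(kept, 2.5), percentile(kept, 97.5)


def cross_instrument_cv(per_instrument):
    """CV of the per-instrument mean quantification values."""
    means = [mean(v) for v in per_instrument.values()]
    return pstdev(means) / mean(means)


def instrument_insensitive(peptides, max_cv=0.05):
    """Names of peptides whose cross-instrument CV stays at or below max_cv."""
    return [name for name, data in peptides.items() if cross_instrument_cv(data) <= max_cv]


# Made-up normalized quantification values for two peptides on three instruments.
peptides = {
    "peptide_A": {"MS1": [100.0, 101.0, 99.0], "MS2": [100.0, 102.0, 98.0], "MS3": [101.0, 100.0, 99.0]},
    "peptide_B": {"MS1": [80.0, 82.0], "MS2": [100.0, 101.0], "MS3": [120.0, 119.0]},
}
repeats = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 50.0]  # one gross outlier

print(reference_interval(repeats))
print(instrument_insensitive(peptides))
```

A mixture model would be preferable when the repeated measurements form several distinct populations, which a single median/MAD rule cannot represent.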
Timeline
2017.01.01 – 2017.12.31 Collect 293T QC experiment data complying with the SOP.
2018.01.01 – 2018.12.31 Analyse 293T data and collect experimental data from the MAQC standard reference material
following the SOP.
2019.01.01 – 2019.12.31 Analyse data from the standard reference material and generate the proteome benchmark
dataset.
Shraddha Thakkar, Ph.D
Division of Bioinformatics and Biostatistics
NCTR/FDA
Jefferson, AR, USA
Dr. Thakkar works at FDA’s National Center for Toxicological Research. Her research interests are in applying
bioinformatics and chemoinformatics to the study of toxicity and drug development, with a specific interest in
drug‐induced liver injury. She has received multiple research and leadership awards regionally, nationally and
within FDA. These include the Genentech Innovation in Biotechnology Award from the American Association of
Pharmaceutical Scientists (AAPS), the Margaret C. Etter Student Lecturer Award from the American Crystallographic
Association, and the Outstanding Service Award from FDA. Dr. Thakkar has adjunct appointments at both the University
of Arkansas for Medical Sciences and the University of Arkansas at Little Rock (Assistant Professor). Furthermore,
Dr. Thakkar was elected as a Board member of the Mid‐South Computational Biology and Bioinformatics Society
(MCBIOS) in 2014 and served as President of the Society from 2016 to 2017. She is also the Chair of the
Pharmacogenomics Group at AAPS.
Weida Tong, Ph.D
Director, Division of Bioinformatics and Biostatistics
NCTR/FDA
Jefferson, AR, USA
Dr. Tong is Director of the Division of Bioinformatics and Biostatistics at FDA’s National Center for Toxicological
Research (NCTR/FDA). He has served as a science advisory board member for several large projects involving
multiple institutes in Europe and the USA. He also holds several adjunct positions at universities in the US and
China. His division at FDA develops bioinformatic methodologies and standards to support FDA research
and regulation and to advance regulatory science and personalized medicine. The most visible projects from
his group are (1) leading the Microarray Quality Control (MAQC) consortium to develop standard analysis
protocols and quality control metrics for emerging technologies to support regulatory science and
precision medicine; (2) development of the liver toxicity knowledge base (LTKB) for drug safety; (3) in silico drug
repositioning for the enhanced treatment of rare diseases; and (4) development of the FDA
bioinformatics system, the ArrayTrack™ suite, to support FDA review and research on pharmacogenomics. Dr.
Tong has published more than 230 papers.
Project #5: A structured approach to comparing genomics, computational, and high content biology
screening methods for predicting toxicity
Project coordinators:
Shraddha Thakkar ([email protected])
Weida Tong ([email protected])
How to participate: Please contact the coordinators to express your interest to analyze genomics data, in vitro data,
and/or in silico data.
Objectives: The goal of this project is to benchmark and comparatively analyze three major high‐throughput
methodologies (i.e., genomics, in vitro and computational approaches) for predicting drug‐induced liver injury (DILI).
We will evaluate these methodologies to understand their performance individually and in combination.
Background
Drug‐induced liver injury (DILI) is one of the primary challenges for drug development as well as regulatory
application owing to the poor performance of existing preclinical models. This concern has led to significant efforts
on evaluating alternative methods such as computational and genomic methodologies. The adoption of these types of
approaches marks a paradigm shift for 21st century toxicology. Importantly, some have emerged as critical tools in
regulatory decision‐making in the EU under the REACH/3Rs initiative. In the US, both Tox21 and ToxCast have been
evaluating these methodologies for regulatory applications.
Mathematically, a predictive model is a solution of Y=f(X), where f is a mathematical function used to predict the
toxicological endpoint (Y) from input data X. The selection of X (technologies used to generate input data) and f
(such as machine learning methods) affects the predictive performance of Y, along with the choice of compounds used
in this equation. Many DILI models using these technologies have been described, but their performances are difficult
to compare due to minimal overlap in the choice of drugs, mathematical functions used (f) and measurement
technologies (X). Thus, the true value of these technologies for DILI prediction is poorly understood.
This study aims to compare/contrast the three main approaches (genomics, in vitro, and in silico) by systematically
assessing the factors important in Y=f(X). Benchmarking data will be generated via a direct comparison of these three
to assess their strengths individually and in combination. For the analysis, crowdsourcing approaches will be used via
the societies with existing mechanisms. This systematic approach will generate evidence for realistic expectations
when assessing these technologies for DILI. Furthermore, it will also provide guidance for analyzing readiness and
gaps in the application of predictive modeling with new data streams in the context of regulatory decision making.
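The Y=f(X) framing suggests a simple benchmarking grid: hold the compound list and labels (Y) fixed, then vary the data source (X) and the modeling function (f) and score every combination the same way. The sketch below illustrates this with toy data and two deliberately simple classifiers; the feature values, classifier choices, and leave‐one‐out scoring are assumptions made for illustration and are not the project's actual datasets or methods.

```python
def nearest_centroid(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    groups = {}
    for features, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(features)

    def distance_to_centroid(rows):
        centroid = [sum(column) / len(rows) for column in zip(*rows)]
        return sum((a - b) ** 2 for a, b in zip(centroid, x))

    return min(groups, key=lambda label: distance_to_centroid(groups[label]))


def one_nearest_neighbour(train_X, train_y, x):
    """Predict the label of the single closest training compound."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def leave_one_out_accuracy(f, X, y):
    """Hold out each compound once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += f(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def benchmark(feature_sets, models, y):
    """Score every (X, f) pair on the same compounds and labels."""
    return {
        (x_name, f_name): leave_one_out_accuracy(f, X, y)
        for x_name, X in feature_sets.items()
        for f_name, f in models.items()
    }


# Toy data: six compounds, label 1 = DILI positive, 0 = DILI negative (all values made up).
y = [1, 1, 1, 0, 0, 0]
feature_sets = {
    "genomics": [[1.0], [1.1], [0.9], [0.0], [0.1], [-0.1]],
    "in_silico": [[0.9], [0.2], [0.8], [0.1], [0.7], [0.3]],
}
models = {"nearest_centroid": nearest_centroid, "one_nearest_neighbour": one_nearest_neighbour}

results = benchmark(feature_sets, models, y)
for (x_name, f_name), accuracy in sorted(results.items()):
    print(f"X={x_name:10s} f={f_name:22s} accuracy={accuracy:.2f}")
```

Because every cell of the grid uses the same compounds and the same scoring, differences between cells can be attributed to X or f rather than to the choice of drugs.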
Specific Aims:
Aim 1: Generation of a comprehensive list with DILI annotation
We have developed the Liver Toxicity Knowledge Base (LTKB), which classifies the DILI risk of FDA‐approved drugs. However, we need to extend this list beyond FDA‐approved drugs to capture the global DILI landscape. We plan to generate a comprehensive list of drugs with DILI classification information. This list will be used in all subsequent aims of the project.
Aim 2: Benchmarking Genomic predictive methods for DILI (gDILI)
Two genomics datasets will be used, cMAP and L1000, both produced by the Broad Institute. cMAP represents the
whole‐genome profiling approach, while L1000 focuses on a small, biologically relevant gene set.
Gene expression‐based DILI predictive models will be generated for each dataset with various machine
learning methods (including both supervised and unsupervised approaches). The impact of machine learning
approaches and genomics platforms will be evaluated. We will work with Critical Assessment of Massive Data
Analysis (CAMDA) for this component of the project. CAMDA has established a community that has conducted
crowdsourcing projects for the past 12 years. We will distribute the datasets via CAMDA.
Aim 3: Benchmarking in silico predictive methods for DILI (isDILI)
Different computational models (QSAR‐based or non‐QSAR‐based) have been reported to predict human DILI. For
this comparison, we will benchmark all the known methods and compare them across various DILI classification
schemes. Over the past 8 years of developing the Liver Toxicity Knowledge Base (project# E0721501), we have
established collaborations with many institutes for in silico DILI modeling. Most of them are listed as external
collaborators of this project, including the MAQC Society. We will expand this list for this component.
Aim 4: Comparative evaluation of gDILI versus isDILI
Genomics and in‐silico methods bring two different perspectives to DILI prediction. Comparative evaluation of gDILI
and isDILI will help establish realistic expectations for each approach individually or in combination. For
this comparison, we will analyze both the training and validation results from both gDILI and isDILI.
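As a rough illustration of what comparing gDILI and isDILI "individually or in combination" could look like, the sketch below measures how often two sets of predictions agree on shared drugs and applies one possible combination rule. The function name, the drug names, and the "flag if either method flags" rule are assumptions made for illustration; the proposal does not specify how the comparison will be carried out.

```python
def compare_predictions(gdili, isdili):
    """Compare two dicts mapping drug name -> predicted label (1 = DILI positive).

    Hypothetical helper: only drugs predicted by both methods are compared.
    """
    shared = sorted(set(gdili) & set(isdili))
    agree = [d for d in shared if gdili[d] == isdili[d]]
    return {
        "shared_drugs": len(shared),
        "agreement": len(agree) / len(shared) if shared else float("nan"),
        # One possible combination rule (an assumption, not the project's
        # method): flag a drug if either approach flags it.
        "either_positive": {
            d: int(gdili[d] == 1 or isdili[d] == 1) for d in shared
        },
    }


# Toy predictions with made-up drug names.
gdili_calls = {"drug_a": 1, "drug_b": 0, "drug_c": 1, "drug_d": 0}
isdili_calls = {"drug_a": 1, "drug_b": 1, "drug_c": 0, "drug_e": 1}
summary = compare_predictions(gdili_calls, isdili_calls)
print(summary["shared_drugs"], summary["agreement"])
```

Other combination rules (for example, requiring both methods to agree) would trade sensitivity against specificity differently.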
Study Design
We will engage societies that have an established mechanism for conducting crowdsourcing projects. During the preparation of this project, we obtained commitments from two societies, CAMDA (www.camda.info/) and MAQC (www.MAQCSociety.org). We will generate 5‐6 datasets that contain DILI positives and negatives defined by various methods (e.g., based on drug labeling, literature, case reports, or registry). Each dataset will be divided into a training set (2/3) and a validation set (1/3), split proportionally across therapeutic categories. Both sets will be released to the societies at once; the training set will include the class labels, which will be withheld for the validation set. Once their analyses are complete, participants will send us the following: (1) the analysis protocol, (2) the training set results, such as accuracy, specificity, and sensitivity, and (3) the predicted class labels for the validation set. We will analyze all the results to determine the performance of the submitted models. With this study design, we will be able to understand:
1. The performance landscape of both genomics (gDILI) and in‐silico (isDILI) methods for human DILI prediction (e.g., upper‐ and lower‐bound performance).
2. The difference between the two methodologies.
3. The fit‐for‐purpose application of these two methodologies individually and in combination.
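The split and scoring steps in the study design can be sketched as follows. This is a minimal illustration, assuming each drug is a (name, therapeutic category, label) record with label 1 for DILI positive; the function names, record layout, and toy data are hypothetical and are not part of the proposal.

```python
import random
from collections import defaultdict


def stratified_split(drugs, train_fraction=2 / 3, seed=0):
    """Split (name, category, label) records so that each therapeutic
    category contributes roughly train_fraction of its drugs to training."""
    by_category = defaultdict(list)
    for record in drugs:
        by_category[record[1]].append(record)

    rng = random.Random(seed)
    train, validation = [], []
    for category in sorted(by_category):
        members = by_category[category][:]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation


def score(true_labels, predicted_labels):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy data: 12 made-up drugs in two made-up therapeutic categories.
toy_drugs = [
    (f"drug_{i}", "category_1" if i < 6 else "category_2", i % 2)
    for i in range(12)
]
train_set, validation_set = stratified_split(toy_drugs)
print(len(train_set), len(validation_set))
```

In the design described above, participants would receive the validation records without the label field, and submitted predictions would be scored against the withheld labels with a function like `score`.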
Timeline
2 months: Generation of comprehensive list with the DILI classification
3 ‐ 5 months: Data compilation for the gDILI and isDILI
5 ‐ 9 months: Data analysis
9 ‐ 12 months: Comparative meta‐analysis of gDILI with isDILI
Presenter name: Longhui Deng ([email protected])
Title: Technical validation of AIM10™, a hybridization capture‐based next‐generation sequencing (NGS)
clinical test for lung cancer
Authors: Longhui Deng1*, Ting Yang1*, Diange Li1, Guojie Qi1, Hui Li1, Jianbing Fan1, Weihong Xu1#
Affiliations: 1 AnchorDx Medical Co., Ltd, Guangzhou, China, 510300, * These authors contributed equally
to this work.
Poster Number: 10
Presenter name: Jongsuk Chung ([email protected])
Title: PR score: Single unified quality metric for clinical targeted sequencing
Authors: Jongsuk Chung1,3, Chung Lee1,2, Ki‐Wook Lee1,2, Taeseob Lee1,2, Woongyang Park1,2,3, Dae‐Soon
Son1*
Affiliations: 1*Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea, 06351, 2Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences &
Technology, Sungkyunkwan University, Seoul, Republic of Korea, 06351, 3Department of Molecular Cell
Biology, School of Medicine, Sungkyunkwan University, Suwon, Republic of Korea, 16419
Poster Number: 11
Presenter name: Li Zhang ([email protected])
Title: BioSV: an accurate and efficient tool for multiple sample‐based structural variation calling and
genotyping
Authors: Li Zhang1, Wubing Ding1, Wenqiang Liu1, Tieliu Shi1*
Affiliations: 1The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of
Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal
University, Shanghai 200241, China
Poster Number: 12
Presenter name: Marco Chierici ([email protected])
Title: Improved prognostic profiling in High‐Risk neuroblastoma by multi‐task deep learning with
distillation of the clinical diagnostic algorithm
Authors: Marco Chierici, Valerio Maggio, Giuseppe Jurman, and Cesare Furlanello
Affiliations: Fondazione Bruno Kessler, Trento, Italy
Poster Number: 13
Presenter name: Sayaka Itoh ([email protected])
Title: Boundary Organizations in the International Standardization Process of bio‐Analysis Technologies
Authors: Sayaka Itoh1,2, Junko Ikeda1, and Hiroki Nakae1
Affiliations: 1JMAC Japan Multiplex bio‐Analysis Consortium, Chiyoda‐ku, Tokyo, 102‐0083, 2Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,
University of Tokyo, Kashiwa‐shi, Chiba, 277‐8561
Poster Number: 14
Presenter name: Wenqiang Liu ([email protected])
Title: Integrative method significantly increases the accuracy of CNV detection
Authors: Wenqiang Liu, Li Zhang, Tieliu Shi*
Affiliations: The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of
Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal
University, Shanghai 200241, China
Poster Number: 15
Presenter name: Yingyi Hao ([email protected])
Title: Applications of chemometrics in precision medicine
Authors: Yuan Liu1, Yingyi Hao1, FanFan Xie1, Yu Liang1, Zhining Wen*1, Monglong Li*1
Affiliations: 1*College of Chemistry, Sichuan University, Chengdu, China, 610065
Poster Number: 16
Presenter name: Xiangjun Ji ([email protected])
Title: QuaPra
Authors: 1Xiangjun Ji, 1Geng Chen*, 1Tieliu Shi*
Affiliations: 1The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of
Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal
University, Shanghai, China, 200241
Poster Number: 17
Presenter name: Jiyang Zhang ([email protected])
Title: Reproducibility of CNV in replicate whole‐genome sequencing experiments
Authors: Jiyang Zhang1 and Leming Shi*1
Affiliations: 1State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University,
Shanghai, China, 200433.
Poster Number: 18
Presenter name: Luyao Ren ([email protected])
Title: Prediction of Cancer of Unknown Primary Site (CUP) Using Tissue‐Specific Molecular Signatures from
GTEx and TCGA Data
Authors: Luyao Ren, Jingcheng Yang, Bin Li, Chen Suo, Ying Yu, Yuanting Zheng, Leming Shi
Affiliations: Center for Pharmacogenomics, School of Life Sciences, Fudan University, Shanghai 200438,
China. Email: [email protected]
Poster Number: 19
Presenter name: Jingcheng Yang ([email protected])
Title: The Storage Scheme for Biological & Medical Big Data
Authors: Dajie Zhang1, Jingcheng Yang2, Zhaojie Xia3 and Li Guo*3
Affiliations: 1 SuperSAN Technologies (Suzhou) Co., Ltd., Suzhou, Jiangsu, China, 215000, 2School of Life, Fudan University, Shanghai, China, 200433, 3Institute of Process Engineering, Chinese
Academy of Sciences, Beijing, China, 100190
Poster Number: 20