
MAQC Society 2nd Annual Meeting

Theme: Precision Medicine and Clinical Omics

Fudan University, 220 Handan Road, Shanghai 200433, China

February 24‐27, 2018 (Saturday – Tuesday)

Organized by:


Table of Contents

1. General Information

2. Program

3. Biographies: Opening Remarks and Session Co‐Chairs

4. Biographies and Abstracts for Sessions 1‐3 (in alphabetical order by speaker)

5. Descriptions of the Proposed MAQC Projects (Session 4)

6. Posters

7. MAQC2019 (Trento, Italy)


General Information

MAQC website: www.maqcsociety.org

Venue:

Guanghua Tower East Wing / Conference Room 202 Fudan University, 220 Handan Road, Shanghai 200433, China

Date: February 24‐27, 2018 (Saturday – Tuesday)

Scientific Program Committee:

Scientific Program Chairs: Weida Tong ([email protected]) and Leming Shi ([email protected])

The MAQC Society Board Directors and President Officers

Local Organizing Committee Administrators:

Main contact: Ms. Lei Zhang ([email protected], +86‐15901714785)

Backup contact: Ms. Wanwan Hou ([email protected], +86‐18721040304)

Surrounding Hotels (addresses and estimated cost):

Fortune Hotel, 399 Handan Road, Shanghai. Tel: +86‐21‐6511‐0000, http://shanghaifortunehotel.com/ (~$70/night, most participants will stay at this hotel; 11 mins walking to conference venue)

Crowne Plaza Fudan, 199 Handan Road, Shanghai. Tel: +86‐21‐5552‐9999, www.crowneplazafudan.com (~$150/night; 10 mins walking to conference venue)

Hyatt Regency Wujiaochang, 88 East Guoding Road, Shanghai. Tel: +86‐21‐2565‐1234, shanghaiwujiaochang.regency.hyatt.com (~$150/night; 20 mins walking to conference venue)

Nearby Airports: Shanghai Pudong International Airport (PVG)

Shanghai Hongqiao International Airport (SHA)

Key Activities:

Poster presentation: Setup on Saturday morning; presentations on Sunday afternoon

February 24 (Saturday): Welcome reception for all attendees (Sponsored by Fudan University)

February 25 (Sunday): Dinner reception for the speakers and poster presenters (Sponsored by Illumina)

February 26 (Monday): Post‐conference workshop for the SEQC2 project discussion

February 27 (Tuesday): Post‐conference workshop on advanced analytics by SAS Institute

Sponsors:


7:00 am, Saturday, February 24, 2018: Registration and poster hanging

Day 1 Morning, Saturday, February 24, 2018

Session I: Precision Medicine and Clinical Omics

Co‐Chairs: Matthias Fischer (Cologne University, Germany) and Wendell Jones (Q2 Solutions – EA Genomics, USA)

8:30 am | Welcome remarks | Li Jin and Leming Shi (Fudan University, China)

8:45 am | Keynote address: Using Mandel’s row linear model in the ‘omics era | Terry Speed (Walter and Eliza Hall Institute of Medical Research, Australia)

9:25 am | Evolution of the breast cancer genome under immune surveillance | Lajos Pusztai (Yale University, USA)

9:50 am | Intratumoral heterogeneity and clonal selection of breast cancer | Zhimin Shao (Fudan University, China)

10:15 am | Group photo and coffee break

10:45 am | The genetic basis of tumor progression and spontaneous regression in neuroblastoma | Matthias Fischer (Cologne University, Germany)

11:10 am | Early staged lung cancer: challenges and opportunities | Haiquan Chen (Fudan University, China)

11:35 am | Panel discussion | Terry Speed, Lajos Pusztai, Zhimin Shao, Matthias Fischer, Haiquan Chen

12:00 pm | Lunch break


Day 1 Afternoon, Saturday, February 24, 2018

Session II: Reproducibility and Standards

Co‐Chairs: Susanna Sansone (Oxford University, UK) and Chris Mason (Weill Cornell Medicine, USA)

1:30 pm | Keynote address: Closing the reproducibility gap with standards and best practices | Leonard Freedman (Global Biological Standards Institute, USA)

2:10 pm | The FAIR principles: Findability, Accessibility, Interoperability and Reusability of the research assets | Susanna Sansone (Oxford University, UK)

2:35 pm | International standardization activity on emerging technologies for medical and food industries: an ISO perspective | Hiroki Nakae (JMAC Japan Multiplex bio‐Analysis Consortium, Japan)

3:00 pm | Coffee break

3:30 pm | Reliability of whole‐exome sequencing for assessing intratumor genetic heterogeneity in breast cancer | Christos Hatzis (Yale University, USA)

3:55 pm | Quality control and standardization of precision oncology related gene mutation detection in China | Jinming Li (National Center for Clinical Laboratories, China)

4:20 pm | Enable precision data for precision medicine | Jun Ye (Sentieon, USA)

4:45 pm | Panel discussion | Leonard Freedman, Susanna Sansone, Hiroki Nakae, Christos Hatzis, Jinming Li, Jun Ye

5:30 pm | Adjourn

6:30 pm | Welcome and dinner reception


Day 2 Morning, Sunday, February 25, 2018

Session III: Pharmacogenomics and Bioinformatics

Co‐Chairs: Russ Wolfinger (SAS, USA) and Cesare Furlanello (FBK, Italy)

8:30 am | Keynote address: Pharmacogenomics and precision medicine: current and future perspectives | Munir Pirmohamed (University of Liverpool, UK)

9:10 am | Clinical‐grade bioinformatics systems: Overview and lessons learned | Wendell Jones (Q2 Solutions – EA Genomics, USA)

9:25 am | Towards robust clinical use of NGS in precision medicine | Han‐Yu Chuang (Illumina, USA)

9:40 am | A general framework for analysis of clonal heterogeneity and tumor evolution | Mehdi Pirooznia (NIH, USA)

9:55 am | Performance assessment of de novo assembly‐based structural variation detection in the human genome | Chunlin Xiao (NCBI/NIH, USA)

10:10 am | Coffee break

10:30 am | Prediction of drug efficacy in breast cancer subtypes | Melissa Davis (Walter and Eliza Hall Institute of Medical Research, Australia)

10:45 am | Detecting mutations induced by genotoxic carcinogens using whole‐genome sequencing of clonal cells | Tao Chen (NCTR/FDA, USA)

11:00 am | Towards the Development of an Omics Data Analysis Framework for Regulatory Application | Florian Caiment (Maastricht University, The Netherlands)

11:15 am | Personalization in molecular diagnostics of acute lymphoblastic leukemia for Polish children | Aleksandra Gruca (Silesian University of Technology, Poland)

11:30 am | Power and limitations of RNA‐Seq: putting reproducibility to the test | Paweł Łabaj (Boku University, Austria)

11:45 am | Oncogenomics of c‐Myc transgenic mice reveal novel regulators of extracellular signaling, angiogenesis and invasion with clinical significance for human lung adenocarcinoma | Jürgen Borlak (Hannover Medical School, Germany)



Day 2 Afternoon, Sunday, February 25, 2018

Session IV: Poster Session and Society Projects

Co‐Chairs: Benjamin Haibe‐Kains (Canada) and Rebecca Kusko (Immuneering Corp, USA)

1:30 pm | Poster session | Poster presenters must stand by their posters

MAQC Society Projects (15+5 min each):

2:30 pm | Computational reproducibility project | Benjamin Haibe‐Kains (University of Toronto, Canada)

2:50 pm | Project #1: Reproducible machine learning for pathology image analysis | Roberto Salgado (The International Immuno‐Oncology Biomarker Working Group) and Wentao Yang (Fudan University, China)

3:10 pm | Project #2: Challenges and opportunities in N‐of‐1 clinical trial: a reality of applying genomics in clinic | Xichun Hu (Fudan University, China)

3:30 pm | Project #3: Developing reference materials and reference data sets for the QC/standardization of multi‐omics platforms | Yuanting Zheng (Fudan University, China)

3:50 pm | Project #4: QC/standardization of proteomics technology | Chen Ding (Fudan University, China)

4:10 pm | Project #5: A structured approach to comparing genomics, computational, and high content biology screening methods for predicting toxicity | Weida Tong (NCTR/FDA, USA)

4:30 pm | Poster award announcement | Wendell Jones, President‐Elect of the Society

4:45 pm | MAQC2019 announcement | Cesare Furlanello, Vice‐President of the Society

5:00 pm | Adjourn of MAQC2018


Day 3, Monday, February 26, 2018

Post‐Conference Workshop on SEQC2

Co‐Chairs: Weida Tong (NCTR/FDA, USA) and Leming Shi (Fudan University, China)

8:30 am | Welcome and overview | Weida Tong (NCTR/FDA, USA)

8:45 am | Keynote address: The role of journals/publishers in promoting research standards and reproducibility | Andrew Marshall (Chief Editor, Nature Biotechnology, USA)

9:30 am | Session 1: Cancer genomics with Whole Genome Sequencing (Led by Wenming Xiao, NCTR/FDA, USA)

(10+5 min presentation for each manuscript)

1. Establishment of reference samples for detection of somatic mutation in cancer

2. A comprehensive investigation of factors impacting cancer mutation detection

3. Effect of tumor purity on somatic mutation detection

4. Comprehensive investigation of false mutation discoveries in FFPE samples

10:30 am | Coffee break

11:00 am | Session 2: Cancer genomics with onco‐panel sequencing (Led by Joshua Xu, NCTR/FDA, USA)

1. Establishment of reference samples for onco‐panel sequencing (Joshua Xu, NCTR/FDA, USA)

2. Spike‐in controls for reliable use of onco‐panel sequencing in clinical diagnostic applications (James Willey, University of Toledo, USA)

3. Sensitivity and reproducibility of onco‐panel sequencing across multiple laboratories and technologies (Joshua Xu, NCTR/FDA, USA)

4. Integration of DNA‐seq and RNA‐seq for enhanced clinical application (David Kreil, Boku University, Vienna, Austria)

1:00 pm | Session 3: Germline variants (Led by Huixiao Hong, NCTR/FDA, USA)

1. Assessing reproducibility of SNVs and small indels detected in WGS (Huixiao Hong, NCTR/FDA, USA)

2. Establishment of reproducible metrics for structural variant detection with WGS (Marghoob Mohiyuddin, Roche, USA)

3. WGS in detection and characterization of important pharmacogenomic genes – genetic variations and pseudogenes in DMETs (Baitang Ning, NCTR/FDA, USA)


1:45 pm | Session 4: Epigenomics (Led by Chris Mason, Weill Cornell Medicine, New York, USA)

1. WGBS and ATAC‐seq metrics, inter‐site and intra‐site reproducibility, and best epiQC practices

2. Single molecule computational methods for base modification detection (PacBio and ONT) and validation

3. Varied computational methods for differentially methylated CpGs (DMCs), differentially methylated regions (DMRs), and peak‐calling

2:30 pm | Coffee break

3:00 pm | Session 5: Additional manuscript ideas (Chaired by Leming Shi)

(5 min each, with all questions answered at the end of the presentations)

1. Cross‐lab and cross‐platform comparison of single cell sequencing (Charles Wang, Loma Linda University, USA)

2. A close look at the inconsistent FFPE artifact myth with onco‐panel sequencing (Thomas Blomquist, University of Toledo, USA)

3. Variations on ATAC‐seq enzymes (Tn5059) and impact on epigenome variation (Chris Mason, Weill Cornell Medicine, New York, USA)

3:30 pm | Session 6: SEQC2 Manuscript Discussion

5:30 pm | Adjourn


Day 4 (Tuesday, Feb 27th, 2018)

Advanced Data Analysis and Deep Learning Workshop

Data scientists from the SAS / JMP Life Sciences division will offer a free, advanced‐level, hands‐on workshop to MAQC Society conference attendees. We will analyze one or more complex experiments together, discuss various statistical methods and concepts, and share perspectives on deep learning. You will be able to follow along on your laptop.

When: Feb. 27th, 2018, 9 am‐4 pm

Where: Fudan University

Data: Submit your omics, NGS, clinical trial, or laboratory data before February 14, 2018. We will select representative data sets to demonstrate analyses. Your data set must be publicly shareable, but we will request that other attendees keep it confidential until you provide permission to use it more broadly.

Analysis: Depending on the problems, topics can include: Design of Experiments, Quality Assessment, Normalization, ANOVA and Mixed Modeling, Reproducibility, Pattern Discovery, Predictive Modeling, Genetic Marker Screening, Genome‐Wide Association Study, Population Analysis, Marker‐Assisted Breeding and Cross‐Evaluation, Best Linear Unbiased Prediction, Linkage Mapping, Quantitative Trait Loci, Bioassay, Clinical Trials, Bioequivalence, Method Comparison, Calibration Curves, Limit of Quantification, Feature Engineering, Cross Validation Model Comparison, Boosted Trees, Neural Networks, Ensembling, Data Science Competitions

Software: JMP, with dashboards created by JMP Genomics and/or JMP Clinical

Instructors: Dr. Russ Wolfinger (JMP/SAS), a fellow of the American Association for the Advancement of Science and the American Statistical Association and a Kaggle Grandmaster, will lead the workshop with assistance from Dr. Wenjun Bao, Dr. Li Li and Dr. Kelci Miclaus.

Contact: Dr. Wenjun Bao (JMP/SAS) Email: [email protected] to register Tel: 1‐919‐531‐1484 (0), 1‐919‐244‐0260


Biographies: Opening Remarks and Session Co‐Chairs


Opening Remarks (February 24th):

Leming Shi

Professor and Director, Center for Pharmacogenomics and Fudan‐Zhangjiang Center for Clinical Genomics

School of Life Sciences, Fudan University

Shanghai, China

Dr. Leming Shi is a professor at the School of Life Sciences and Shanghai Cancer Center of Fudan University in Shanghai, China, where he established and directs the Center for Pharmacogenomics. Dr. Shi is the president of the International Massive Analysis and Quality Control (MAQC) Society (2017‐2018). Dr. Shi’s research focuses on pharmacogenomics, bioinformatics, and cheminformatics, aiming to realize precision medicine by developing biomarkers for early cancer diagnosis, prognosis, and personalized therapy. As a principal investigator at the US Food and Drug Administration (FDA) from 2003 to 2012, Dr. Shi conceived and led the MicroArray and Sequencing Quality Control (MAQC/SEQC) project aimed at realizing precision medicine by standardizing genomics and bioinformatics, leading to the development of several FDA guidance documents. Dr. Shi was a co‐founder of Chipscreen Biosciences Ltd. in Shenzhen, China, where he co‐developed a chemogenomics‐based drug discovery platform leading to several novel small‐molecule drug candidates with promising efficacy and safety profiles in anticancer and antidiabetic clinical trials in China, the US, and Japan; one novel compound (Chidamide) was approved in 2014 by China FDA for treating T‐cell lymphoma, and an antidiabetic candidate (Chiglitazar) is in Phase III clinical trials. Dr. Shi is a co‐inventor on nine issued patents on novel therapeutic molecules and has published over 200 peer‐reviewed papers (12 of them in Nature Biotechnology) with >10,000 citations by SCI journals. Dr. Shi received his Ph.D. in computational chemistry from the Chinese Academy of Sciences in Beijing.

Li Jin, Ph.D.

Professor and Vice President

Fudan University

Shanghai, China

Professor Jin received his doctoral degree in genetics from the University of Texas and is an academician of the Chinese Academy of Sciences. He served on the faculty of the University of Texas and the University of Cincinnati College of Medicine. He is also an external member of the Max‐Planck Society and served as a board member of the Human Genome Organization (HUGO). He is one of the founders of the CAS‐MPG Partner Institute of Computational Biology, the National Center of Human Genome at Shanghai, and the Fudan Taizhou Institute of Health Sciences. Professor Jin assumed his current position of Vice President in 2007. He is Director of the Collaborative Innovation Centre of Genetics and Development. Professor Jin holds a Haoqing‐Fudan Professorship and has been awarded the Ho Leung Ho Lee Foundation Award for Science and Technology Achievement and the Second Prize for National Natural Science Award (twice), among others. He serves as an editorial board member for nine academic journals, is president of the Shanghai Society of Genetics, and is president of the Shanghai Society of Anthropology. His research interests lie in medical genetics and genetic epidemiology, computational biology, and human population genetics and genomics. Professor Jin has published over 500 articles in journals including Nature, Science, Cell, New England Journal of Medicine, PNAS, JCI and JAMA.


Co‐Chairs for Session 1 (February 24th):

Matthias Fischer

Professor and Senior Physician

Experimental Pediatric Oncology

University Children's Hospital, Cologne University

Cologne, Germany

Dr. Matthias Fischer is a physician‐scientist heading the Department of Experimental Pediatric Oncology at the University Children’s Hospital of Cologne, Germany. Dr. Fischer has served as Senior Physician at the University Children’s Hospital of Cologne since 2009, and was appointed full Professor of Pediatrics in 2016. His laboratory is focused on elucidating the genetic etiology and molecular pathogenesis of neuroblastoma, a pediatric tumor of the sympathetic nervous system. In particular, Dr. Fischer and his team are applying high‐throughput technologies, such as massively parallel sequencing and microarray analysis, to discover relevant alterations of neuroblastoma development, to establish prognostic and predictive biomarkers, and to identify therapeutic targets. All of this work is geared to translating novel findings from basic research into clinical practice, in order to improve clinical management of neuroblastoma patients. Dr. Fischer has authored more than 100 peer‐reviewed publications and has served as an advisory board member on several national and international committees, covering both basic and clinical research projects.

Wendell Jones

Principal Bioinformaticist and Scientific Advisor,

Q2 Solutions | EA Genomics,

Morrisville, North Carolina, USA

Dr. Jones is currently Principal Bioinformaticist and Scientific Advisor at Q2 Solutions | EA Genomics. He conducts collaborative scientific research with clients in multiple areas, especially in oncology and immuno‐oncology. His background includes leading the analysis, development and validation of the bioinformatic and computational systems that process complex genomic assays, including next generation sequencing assays, evaluating new and emerging genomic technologies, and developing bioinformatic implementation strategies. He consults with clients and provides thought leadership in industry and public consortiums involved in genomic science and measurement. Dr. Jones has over 15 years of experience in advanced genomic technologies and 20 years of experience in scientific and technology leadership positions, including serving as Vice President of Statistics and Bioinformatics at Expression Analysis, Inc (EA) and Chief Science Officer at Reliametrics, a Nortel Networks business unit. He has authored over 30 peer‐reviewed publications and has presented at numerous scientific meetings, industry conferences and consortium workshops.



Co‐Chairs for Session 2 (February 24th):

Susanna‐Assunta Sansone

Associate Professor and Associate Director

Oxford e‐Research Centre,

Engineering Science Department,

University of Oxford, UK

Prof. Sansone’s activities are in the areas of knowledge and information management, and interoperability of applications, impacting on the reproducibility of research outputs and the evolution of scholarly publishing. Prof. Sansone sits on the boards of several not‐for‐profit efforts, and she is a consultant for Springer Nature and Honorary Academic Editor of the Scientific Data journal. She leads the Centre in several UK, European, NIH and pharma‐funded projects in the life and biomedical sciences, and is a founding member of the ELIXIR UK Node, where she is responsible for standards and curation areas. Working with and for data producers and consumers, service providers, pre‐competitive informatics initiatives, journals and funding agencies, she strives to make digital research objects Findable, Accessible, Interoperable and Reusable (FAIR). She holds a PhD in Molecular Biology from Imperial College of Science, Technology and Medicine, London; after a few years working on vaccine genetics in an Imperial spin‐off, she moved to the European Bioinformatics Institute (EBI, Cambridge), where she worked for nine years as a Project and Team Coordinator and Principal Investigator before moving to Oxford in 2010.

Christopher Mason

Associate Professor

Department of Physiology and Biophysics

Weill Cornell Medicine, New York, USA

Dr. Mason is an associate professor of Computational Genomics at Weill Cornell Medical College. He completed his B.S. in Genetics and Biochemistry at the University of Wisconsin‐Madison, and his Ph.D. in Genome Evolution and postdoctoral training in Neuroscience at Yale University. His laboratory work utilizes computational and experimental methodologies to identify and characterize the essential genetic elements that guide the function of the human genome. He performs research in three principal areas: (1) the functional annotation of the human genome by mutational profiling in families with brain malformations and cancer patients, (2) the examination of the elements that orchestrate the development of the human brain and their evolutionary changes, and (3) the development of models for systems and synthetic biology. The Mason Lab uses high‐throughput methods to generate cell‐specific molecular maps of genetic, epigenetic, and transcriptional activity, and uses them to create multi‐dimensional molecular portraits of development and disease. He also develops algorithms to detect, catalog and functionally annotate variants in the genetic pathways that control developmental processes. He has more than 130 publications.



Co‐Chairs for Session 3 (February 25th):

Russell Wolfinger

Director of Scientific Discovery and Genomics

SAS Institute Inc.

Cary, NC, USA

Dr. Wolfinger leads a team in research and development of JMP‐based software solutions in the areas of genomics and clinical research. He joined SAS in 1989 after earning a PhD in Statistics from North Carolina State University (NCSU). For ten years he devoted his efforts to developing statistical procedures in the areas of linear and nonlinear mixed models, multiple testing, and density estimation. In 2000 he started the Scientific Discovery department at SAS. Wolfinger is co‐author of more than 100 publications and a fellow of both the American Association for the Advancement of Science and the American Statistical Association. He is also an adjunct faculty member at NCSU and the University of North Carolina at Chapel Hill, and a Kaggle Grandmaster.

Cesare Furlanello

FBK ‐ Fondazione Bruno Kessler

MPBA: Predictive Models for Biomedicine and Environment

Senior Researcher, Head of Research Unit

Povo (Trento), Italy

Dr. Cesare Furlanello is head of Data Science at the Kessler Foundation (Trento, Italy), where he is Senior Researcher. Since 1995 he has also led the MPBA Lab (https://mpbalab.fbk.eu), previously the ITC‐IRST Neural Networks for Complex Data Analysis Project. After graduating with honors in Mathematics at the University of Padua in 1986, he joined IRST, the first Artificial Intelligence research centre in Italy. He is a data scientist and an expert in machine learning applied to complex data, with a focus on predictive models for human and environmental health and scientific reproducibility. He has been PI of more than 60 projects funded by competitive grants or industry, notably the first national project on industrial and health applications of neural networks in 1993, four projects of the European Institute of Technology, and many other EU research grants. He has published on machine learning and bioinformatics in Nature, Nature Biotech, Nature Genetics, Bioinformatics, IEEE J. Sign Proc., IEEE Trans Nano Biosc, Brief. Bioinformatics and others. He was Scientific Secretary of the GNCB‐CNR school on Neural Networks for Signal Processing (Trento 1989) and organizer of other workshops on applications of Machine Learning and Neural Networks. He was Local Conference Chair of the MGED11 Workshop of the MGED Society for international standards in bioinformatics (and has been on its Advisory Board since 2007). He is adjunct research faculty of The Wistar Institute cancer research centre in Philadelphia. He is on the PhD Board of the Centre for Integrative Biology of the University of Trento, and a founder of the Laboratory of Biomolecular Sequence and Structure Analysis for Health (FBK, Univ. of Trento, CNR). In December 2017 he attained the national habilitation as full professor in bioengineering. Since 2001, he has been Scientific Director of WebValley, the first summer school in Data Science for interdisciplinary research dedicated to talented high school students. His research currently aims at developing reproducible Deep Learning methods for Precision Medicine, with a focus on the integration of multi‐modal omics and imaging data. He is President Elect of the MAQC international society.



Co‐Chairs for Session 4 (February 25th):

Benjamin Haibe‐Kains

Scientist, Princess Margaret Cancer Center, University Health Network

Assistant Professor, Department of Medical Biophysics, University of Toronto

Adjunct Professor, Department of Computer Science, University of Toronto

OICR Associate, Ontario Institute of Cancer Research

Toronto, Canada

Dr. Haibe‐Kains earned his Ph.D. in Bioinformatics at the Université Libre de Bruxelles (Belgium), for which he was awarded the Solvay Award (Belgium). Supported by the Fulbright Award, Dr. Haibe‐Kains did his postdoctoral fellowship at the Dana‐Farber Cancer Institute and Harvard School of Public Health (USA). He started his laboratory at the Institut de Recherches Cliniques de Montréal (Canada) and moved to the Princess Margaret Cancer Center (PM) in November 2013. His research focuses on the integration of high‐throughput data from various sources to simultaneously analyze multiple facets of carcinogenesis. His team is analyzing high‐throughput (pharmaco)genomic datasets to develop new prognostic and predictive models and to discover new therapeutic regimens in order to significantly improve disease management. Dr. Haibe‐Kains’ main scientific contributions include several prognostic gene signatures in breast cancer, subtype classification models for ovarian and breast cancers, as well as genomic predictors of drug response in cancer cell lines.

Rebecca Kusko

Vice President of Genomics

Immuneering Corporation,

Cambridge, MA, USA

Dr. Kusko is a computational biologist by training with expertise in translating NGS and other genomic data to actionable discoveries. After completing her undergraduate degree in Biological Engineering at Massachusetts Institute of Technology (MIT), she went on to complete her Ph.D. in Computational Biomedicine at the Boston University School of Medicine. Her doctoral thesis focused on the transcriptome in Chronic Obstructive Pulmonary Disease (COPD) and lung cancer in never‐smokers. She has worked directly with clinicians on study design, with lab scientists to plan experiments, with senior leadership on strategic planning, and with fellow computational scientists as a collaborator. Her published areas of experience include drug mechanism of action (MOA), drug repositioning, target identification, drug combinations, and big data reproducibility.


Biographies and Abstracts for Sessions 1‐3 (Alphabetically ordered by last name)


Jürgen Borlak, Ph.D.

Univ.‐Professor, Hannover Medical School

Centre for Pharmacology and Toxicology

Hannover, Germany

Oncogenomics of c‐Myc transgenic mice reveal novel regulators of extracellular signaling, angiogenesis and invasion with clinical significance for human lung adenocarcinoma

Dr. Jürgen Borlak was born in Neu‐Ulm, Germany in 1958. After studies at universities in Germany and abroad, he obtained his Doctorate in Pharmacology and Toxicology at the University of Reading, UK. Following residencies in the UK and France (Strasbourg), he was habilitated in pharmacology and toxicology and received the venia legendi (“Privatdozent”) at Hannover Medical School in 2000. Two years later he was appointed full professor of Pharmacology and Toxicology at Hannover Medical School. From 2002 onwards he has been the Director of the Institute of Pharmaco‐ and Toxicogenomics at Hannover Medical School. This new field of genomic science applies a wide range of methods in genetics, molecular biology, molecular toxicology and functional genomics for a better understanding of disease‐causing mechanisms and drug‐induced toxicities. An array of enabling technologies is applied to identify “druggable” targets and to better understand inter‐individual differences in drug response, thereby allowing individualized drug treatment regimens and disease prevention strategies. Jürgen Borlak is also an appointed Professor of Molecular Anatomy at the Medical Faculty of the University of Leipzig, a Professor of Experimental Medicine at Uppsala University, Sweden, and Distinguished Visiting Professor at the University of Trento, Italy. He is the author of more than 270 original publications and 25 book chapters, and editor of the Handbook of Toxicogenomics. He is a reviewer and editorial board member for various scientific journals. Among other roles, he is an appointed expert of the World Health Organisation (WHO), the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA), and is also an international reviewer for many European, US and Asian research organisations.

The c‐Myc transcription factor is frequently deregulated in cancers. To search for disease diagnostic and druggable targets, a transgenic lung cancer disease model was investigated. Oncogenomics identified c‐Myc target genes in lung tumors. These were validated by RT‐PCR, Western blotting, EMSA assays and ChIP‐seq data retrieved from public sources. Gene reporter and ChIP assays verified the functional importance of c‐Myc binding sites. The clinical significance was established by RT‐qPCR in tumor and matched healthy control tissues, by RNA‐seq data retrieved from the TCGA Consortium and by immunohistochemistry recovered from the Human Protein Atlas repository. In transgenic lung tumors 25 novel candidate genes were identified. These code for growth factors, Wnt/β‐catenin and inhibitors of death receptor signaling, adhesion and cytoskeleton dynamics, invasion and angiogenesis. For 10 proteins over‐expression was confirmed by IHC, thus demonstrating their druggability. Moreover, c‐Myc over‐expression caused complete gene silencing of 12 candidate genes, including Bmp6, Fbln1 and Ptprb, to influence lung morphogenesis, invasiveness and cell signaling events. Conversely, among the 75 repressed genes, TNFα and TGF‐β pathways as well as negative regulators of IGF1 and MAPK signaling were affected. Additionally, anti‐angiogenic, anti‐invasive, adhesion and extracellular matrix remodeling and growth suppressive functions were repressed. For 15 candidate genes c‐Myc‐dependent DNA binding and transcriptional responses in human lung cancer samples were confirmed. Finally, Kaplan‐Meier survival statistics revealed clinical significance for 59 out of 100 candidate genes, thus confirming their prognostic value. In conclusion, previously unknown c‐Myc target genes in lung cancer were identified to enable the development of mechanism‐based therapies. (*The paper was published in Oncotarget. 2017; 8:101808‐101831 and the abstract is a transcript of this paper.)


Dr. Florian Caiment is Assistant professor with a main expertise is in the recently emerging next‐generation

sequencing (NGS) technology, which allows sequencing complete genome or transcriptome of any biological

material for unlimited applications. He was involved in this technology from the very beginning, initially in the lab

during his phD then moving to the bioinformatics analysis during his Post‐Doc. This unique double expertise allows

him to design innovative and coherent experiment both from the biological and the analytical point of view. Florian

joined the department of Toxicogenomics in Maastricht as a postdoctoral fellow in April 2011, as a full time

bioinformatician on the ASAT knowledge base project (assuring safety without animal testing). He followed up with

the DiXa European project (Data Infrastructure for alternatives to animal‐based Chemical SAfety testing). Florian is

now supervising the RNA‐Seq activities of the EU FP7 HeCaToS project (14 partners) as well as of the Horizon 2020

EU‐ToxRisk project (39 partners).

Despite the expanding number of scientific publications using omics in the field of toxicology ‐ with the

exception of a few cases in the domain of drug development ‐ no omics data have been used to date to support a

chemical regulatory application, for instance under REACH. Regulatory agencies mainly report two major issues

concerning the use of omics technologies: 1/ The high technical variance for each given technological platform,

which makes the data sometimes difficult to correlate within and between different platforms; 2/ The impact that

the choice of bioinformatics analysis pipeline has on the results, reflected in pipeline‐dependent differences in the

lists of biological systems significantly affected by the compounds of interest, making the “truth” of toxicity difficult

to assess or believe from omics data.

While several scientific consortia have been formed to tackle these two main issues, notably with respect to

microarray quality control (MAQC‐I and II) followed by sequencing quality control (SEQC), both leading to major

publications in high impact factor journals, no consensus on an omics data analysis framework (ODAF) for regulatory

application has been achieved yet. To date, there are no OECD guidance documents available for the generation

and analysis of omics data. Here, one of the major roadblocks is the lack of a standardized procedure for the

analysis of the data. This results in different conclusions possibly being derived from one and the same set of data

depending on the transformations and statistical procedures used. This creates an issue for regulators who are not

able to assess whether the results generated from such data support the conclusions being drawn and do not have

the means to verify the conclusions.

In this particular context, this new project aims to bring together toxicogenomics experts to test and further develop a

regulatory ODAF (R‐ODAF) proposal for the toxicogenomics community with the ambition to enable the regulatory

bodies to consider omics as a relevant data type to support compound submissions. For this, we will focus our

project on transcriptomics data, and will start by identifying, collecting and reviewing the analysis methods for all relevant

toxicogenomics datasets on the three major transcriptomics platforms: microarrays, RNA‐Seq and TempO‐Seq

technology (from BioSpyder). Ultimately, this project will propose a common foundation method to the regulatory

agencies with clear guidelines on how to recognize and discard bad quality samples (such as outliers) and how to define

thresholds and parameters for identifying differential expression (p‐value, multiple testing correction methods, fold

change…) for each platform.

Towards the Development of an Omics Data Analysis Framework for Regulatory Application

Florian Caiment, Ph.D

Maastricht University

School of Oncology & Developmental Biology

Maastricht, The Netherlands

19

Early staged lung cancer, challenges and opportunities

Haiquan Chen, M.D.

Director of Lung Cancer Center

Fudan University Shanghai Cancer Center

Shanghai, China

20

Dr. Tao Chen received his Ph.D. degree in Toxicology from the University of Arkansas for Medical Sciences in 1997

and became a Diplomate of the American Board of Toxicology in 1999. He was a postdoctoral scientist at Duke

University during 1998‐2000. He joined the Division of Genetic and Molecular Toxicology, National Center for

Toxicological Research, U.S. Food and Drug Administration in 2000 as a research toxicologist. He is also an adjunct

professor at several universities. Dr. Chen has served as an editor or an editorial board member for more than six

scientific journals. He has been a consultant to the World Health Organization (WHO) and the Organization for

Economic Co‐operation and Development (OECD) for the development of regulatory documents and a grant reviewer

for the U.S. National Science Foundation and European Research Council. Dr. Chen has served many times as an organizing

committee member or as chair of a meeting or meeting session. He has also been invited to present

several keynote speeches and plenary lectures at national and international scientific meetings. He has published

more than 130 articles in peer‐reviewed scientific journals and books. Dr. Chen’s current research addresses the

evaluation of mutagenicity and carcinogenicity of FDA‐regulated agents using next‐generation sequencing.

Mutations are heritable changes in the nucleotide sequence of DNA that can lead to many adverse effects, such as

cancers. Genotoxicity assays have been used to identify chemical mutagenicity and carcinogenicity. Current FDA‐

recommended mutation assays, such as the Ames test and mouse lymphoma assay, predict mutagenicity of test

agents in genes that allow mutant cells to be positively selected when mutations occur in those genes. These

assays detect mutations only in those genes, not across the whole genome. The mutations induced by the test

agents may be biased toward certain types of mutations owing to the nature of the target genes. Although the assays have been

used for many years, a new mutation assay that can directly measure all types of mutations in the genome has long

been awaited. Recently developed next‐generation sequencing (NGS) technology allows us to detect

genome mutations in the cells directly. In our laboratory, we have used whole‐genome sequencing to

screen mutagens using Salmonella typhimurium TA100 cells, a bacterial system, to detect germline mutations in

Caenorhabditis elegans, a worm system, and to evaluate mutational spectra in mouse lymphoma cells, a

mammalian system. The results show that NGS technology can sensitively detect mutation induction caused by

genetic carcinogens and effectively evaluate the different types of mutations including base pair substitutions,

insertions and deletions (indels), loss of heterozygosity, and chromosome number changes, suggesting that the

unparalleled advantages of NGS for evaluating mutagenicity of chemicals can be applied for the next generation of

mutagenicity tests.

Tao Chen, Ph.D

National Center for Toxicological Research

U.S. Food and Drug Administration

Jefferson, Arkansas, USA

Genome‐wide characterizations of mutations induced by genetic carcinogens using next‐generation

sequencing

21

Dr. Chuang is Senior Manager of Bioinformatics and Genomic Applications Partnerships at Illumina, Inc. She has

been working on various genomic applications with NGS technology, from whole genome sequencing to targeted

sequencing, from short reads to long reads, from multiplex PCR to hybrid capture, from service models to

distributed kits, etc. Previously, she served as the informatics core team lead for developing Illumina’s clinical‐grade

comprehensive gene panel products in cancer diagnostics and therapy selection. Beyond leading a group of

Bioinformatics Scientists in developing NGS informatics solutions for oncology applications, Dr. Chuang also

provides technical consultation to other functional teams in assay development, software development, business

development, marketing, manufacturing, and regulatory affairs in clinical product development. Another big part of her

role is the interaction with key opinion leaders in the oncology field, including pharma partners and translational

researchers, to bridge gaps in customer demand and product design. Han‐Yu is always thinking ahead for next

products to facilitate the realization of precision medicine, so technology scouting is also a huge piece in her pursuit.

Currently, she serves in a broader role to facilitate Illumina’s partnership with external innovation in genomic

applications. Prior to joining Illumina, Han‐Yu received her PhD degree in Bioinformatics and Systems Biology from the

University of California, San Diego and her bachelor’s and master’s degrees in Computer Science and Information

Technology from National Taiwan University. During her time in academia, she published more than 20 peer‐reviewed

journal papers in the field of cancer biomarker selection and functional genomics, with nearly

3,400 citations in total. Her work has pioneered the use of network‐based approaches for cancer

patient classification and risk stratification towards precision medicine.

More and more studies have indicated the utility of next‐generation sequencing (NGS) in precision medicine, from

rare genetic disease, reproductive health, to oncology, thanks to increasing data output and decreasing costs of the

technology. The vast amount of data output has brought in the promise for comprehensive diagnostic approaches

but might also raise the challenge in robust clinical use. Starting from product design to validation, sample

accessibility, workflow complexity, and data analysis optimization are integral parts of a robust NGS solution for

clinical use. In this talk, I would like to discuss key considerations on the above vital components in developing NGS

diagnostics for companion therapeutics, and use Illumina’s most recent tumor profiling tool TruSight Tumor 170 as

an example for illustration.

Han‐Yu Chuang, Ph.D

Senior Manager, Bioinformatics and Genomic Applications Partnerships

Illumina Inc

San Diego, USA

Towards robust clinical use of NGS in precision medicine

22

Dr. Melissa Davis is a computational cancer biologist, and Laboratory Head at the Walter and Eliza Hall Institute of

Medical Research (https://www.wehi.edu.au/people/melissa‐davis/). She is an expert in the analysis and

reconstruction of molecular mechanisms of cancer progression and response to therapy. Her background is

genetics and computational cell biology, and she currently holds a National Breast Cancer Foundation Fellowship to

study molecular mechanisms of breast cancer metastasis. Dr. Davis and her team work on the analysis of large,

heterogeneous datasets from national and international cancer projects and seek to discover patterns in the data

that will help to target patient treatment and personalise cancer therapy.

More effective targeting of cancer therapies has resulted in dramatically improved outcomes for patients with

breast cancer; however, breast cancer is a heterogeneous disease with many molecular subtypes, and survival gains

have not been uniform. Even for subtypes with relatively good treatment options and outcomes, patients do not

show a uniform response to therapy; some patients will not respond to the standard treatment regimens indicated

by their clinical presentation, and others will experience disease recurrence after initially favourable responses.

Considerable work has been undertaken in recent years to develop computational models that will predict the

response of a patient to therapy to improve the precision with which patients can be treated. As the collection of

molecular data on individual patients becomes increasingly feasible in the clinical setting, these in silico methods

have the potential to improve the precision of treatment and outcomes for patients. Gene expression data has

repeatedly been shown to be the most effective kind of molecular measurement for training predictive models,

exceeding the performance of genetic and proteomic data in predicting response to therapy. As such, methods that

use gene expression data are likely to provide the most powerful predictions of patient response.

We have generated a series of predictive models that can be used to estimate the drug sensitivity of breast cancer

samples to determine which patients are most likely to respond well to a given therapy. We have also segregated

patients based on molecular phenotypes, such as signalling status and epithelial‐mesenchymal plasticity, in addition

to traditional molecular subtypes (such as basal, luminal and normal‐like) to identify therapies that show

enhanced efficacy against specific subtypes.

Prediction of drug efficacy in breast cancer subtypes

Melissa Davis, Ph.D

The Walter and Eliza Hall Institute of Medical Research, Australia

23

Dr. Matthias Fischer is a physician‐scientist heading the Department of Experimental Pediatric Oncology at the University Children’s Hospital of Cologne, Germany. Dr. Fischer has served as Senior Physician at the University Children’s Hospital of Cologne since 2009, and was appointed full Professor of Pediatrics in 2016. His laboratory is focused on elucidating the genetic etiology and molecular pathogenesis of neuroblastoma, a pediatric tumor of the sympathetic nervous system. In particular, Dr. Fischer and his team are applying high‐throughput technologies, such as massively parallel sequencing and microarray analysis, to discover alterations relevant to neuroblastoma development, to establish prognostic and predictive biomarkers, and to identify therapeutic targets. All of this work is geared toward translating novel findings from basic research into clinical practice, in order to improve the clinical management of neuroblastoma patients. Dr. Fischer has authored more than 100 peer‐reviewed publications, and has served as an advisory board member on several national and international committees, covering both basic and clinical research projects.

Neuroblastoma is a pediatric tumor of the sympathetic nervous system. The clinical course of the disease varies

dramatically, ranging from spontaneous regression to fatal progression despite intensive cytotoxic treatment. The

pathogenesis underlying the distinct phenotypes, however, has been poorly understood to date. To determine the

molecular mechanisms of favorable and unfavorable courses, we performed massively parallel sequencing of 416

untreated neuroblastomas. We detected genomic alterations of 17 genes related to the RAS and p53 pathways in

73/416 patients. The presence of these mutations was strongly associated with dismal outcome in the entire cohort,

as well as in the clinical high‐risk and non‐high‐risk subgroups. We noticed, however, that the prognostic effect of

RAS/p53 pathway mutations was strictly dependent on the occurrence of telomere maintenance mechanisms.

Survival of patients whose tumors were telomere maintenance‐positive was dramatically inferior when additional

RAS/p53 pathway mutations were present as compared to those without such alterations. By contrast, all patients

whose tumors lacked telomere maintenance mechanisms have survived to date, and spontaneous regression or

differentiation into ganglioneuroblastoma occurred both in the presence and absence of RAS/p53 pathway

mutations. Our data suggest a precise definition of clinical neuroblastoma phenotypes: High‐risk tumors are

characterized by telomere maintenance activation, and additional mutations in RAS or p53 pathway genes

delineate a patient subgroup with devastating outcome. By contrast, patient outcome is excellent in the absence of

telomere maintenance, and mutations in RAS/p53 pathway genes fail to establish fully malignant tumors in this

subgroup. Together, our results emphasize the importance of activating telomere maintenance mechanisms in the

development of human malignancies, and provide a starting point for refined risk assessment and new therapies in

neuroblastoma.

The genetics of spontaneous regression and fatal progression in neuroblastoma

Matthias Fischer, Ph.D

Professor and Senior Physician

Experimental Pediatric Oncology

University Children's Hospital, Cologne University

Cologne, Germany

24

Dr. Freedman is the founding President of the Global Biological Standards Institute (GBSI). He has held leadership

positions in basic biomedical research, drug discovery, and science policy in both the private and non‐profit sectors,

as well as in academia.

Prior to starting GBSI, Dr. Freedman served as Vice Dean for Research and Professor of Biochemistry & Molecular

Biology at Jefferson Medical College, Thomas Jefferson University. Dr. Freedman also led discovery research efforts

in the pharmaceutical industry as a Vice President at Wyeth Pharmaceuticals and Executive Director at Merck

Research Laboratories. Before moving to industry, Dr. Freedman was a Member and Professor of Cell Biology &

Genetics at Memorial Sloan‐Kettering Cancer Center and Weill Cornell Medical College. There, Dr. Freedman and his

lab made several highly impactful discoveries in the area of nuclear hormone receptor structure and function.

Dr. Freedman has received numerous competitively funded grants, and has been the recipient of several research

honors, including the Boyer Award for Biomedical Research, and a MERIT award from the National Institutes of

Health. He was also the 2002 recipient of the Ernst Oppenheimer Award from The Endocrine Society. Dr.

Freedman has published extensively and served on numerous scientific review panels and editorial boards. He was

an editor of Molecular and Cellular Biology for ten years. In addition, Dr. Freedman has served on the Board of

Directors of the American Type Culture Collection (ATCC).

Dr. Freedman earned a B.A. degree in Biology from Kalamazoo College and a Ph.D. in Molecular Genetics from the

University of Rochester. He completed his post‐doctoral fellowship in the laboratory of Dr. Keith Yamamoto at the

University of California, San Francisco.

Irreproducible basic biological research is a tremendously expensive and global problem. The inability to reproduce

experimental data in preclinical studies has resulted in the invalidation of research breakthroughs, retraction of

published papers, abrupt discontinuation of clinical studies, and reduced trust in the research and development

enterprise. More importantly, valuable time and critical resources are wasted by irreproducibility as opportunities

to enhance human health are delayed or simply lost. Although the causes of irreproducible preclinical research are

complex, they can be traced to cumulative errors/flaws in one or more of the following areas: (1) study design, (2)

biological reagents and reference materials, (3) laboratory protocols, and (4) data analysis and reporting. This

presentation will use examples of how biological reagents, specifically cell lines and antibodies, impact

irreproducibility in preclinical research and how the implementation of consensus‐based standards to authenticate

these critical and widely used reagents will lead to both increased rates of reproducibility and dramatic returns on

research funding investments.

Closing the Reproducibility Gap with Standards and Best Practices

Leonard P. Freedman, Ph.D

President

Global Biological Standards Institute, USA

25

Dr. Aleksandra Gruca obtained her PhD degree in technical sciences, specializing in bioinformatics, and works as an

assistant professor at the Institute of Informatics at the Silesian University of Technology (Gliwice, Poland). In her

PhD thesis, entitled “Characterisation of gene groups using decision rules”, she developed a data mining system for

automated functional interpretation of the results of high‐throughput biological experiments. Currently her

research is focused on the application of data mining methods for multi‐omics data integration, analysis and

interpretation. She cooperates with several Polish clinical centres interested in developing such

methods for the analysis of heterogeneous cancer data in order to improve diagnostics, classification and treatment

personalisation.

As a co‐leader of a Community/platform‐building Working Group within COST Action CA15110 ‐ Harmonising

standardisation strategies to increase efficiency and competitiveness of European life‐science research (CHARME)

she is also interested in development and implementation of data reproducibility and standardisation practices in

life‐sciences.

Since 2010 she has been a member of the Board of the Polish Bioinformatics Society, a scientific society with a mission to

support and popularise bioinformatics in Poland. She is an author or co‐author of almost 50 peer‐reviewed

publications in scientific journals and conference proceedings.

Acute lymphoblastic leukemia (ALL) is the most frequently occurring childhood cancer, comprising approximately

30% of all pediatric malignancies. Each year, diagnosis of ALL is established in approximately 200 children in Poland.

They are all treated uniformly according to European standards in the centers of the Polish Pediatric

Leukemia/Lymphoma Study Group (PPLLSG). The results of the last treatment protocol ALL‐IC BFM 2009 were

very good with overall survival >90%. Nevertheless, >10% of all patients suffered from ALL relapse and required

additional intensive treatment including hematopoietic stem cell transplantation (HSCT).

Here we present the information system and data workflow being developed within the PersonALL project ‐ a

collaborative project among the PPLLSG centers that focuses on research on molecular mechanisms of ALL, aiming

at improved diagnostics, classification and, finally, treatment personalization.

The system is designed to store and analyze heterogeneous data collected from different clinical centers and to integrate

results from molecular biology analysis and clinical information to assess the prognostic and therapeutic relevance

of different diagnostic parameters. Collected data will cover aspects such as genomics, transcriptomics,

cytogenetics, fluorescent in situ hybridization (FISH) and immunophenotyping as well as selected clinical

information and applied therapy. The main challenge is to link all this heterogeneous information into a

comprehensive expert system, which enables rapid recognition of the features associated with treatment outcome.

The system will provide uniform access to the data, allowing biologists and clinicians to analyze it from

different aspects and summarize it into useful information. The users will be provided with an “analysis assistant” ‐ a set

of advanced analytical tools in the form of a simple GUI allowing statistical and data mining analysis.

The proposed workflow for data integration and analysis will allow grouping the patients according to certain criteria in

order to discover discriminant features for the groups (including the relevance of the features) related to the

treatment outcome.

The overall result of the PersonALL project will be the development of innovative diagnostics of childhood ALL that

should enable more targeted therapies and lead towards improved treatment outcome.

PersonALL – towards treatment personalization in molecular diagnostics of acute lymphoblastic leukemia for Polish children

Aleksandra Gruca1, Roman Jaksik2, and Marek Sikora1,3

1 Institute of Informatics, Silesian University of Technology, Gliwice, Poland

2 Institute of Automated Control, Silesian University of Technology, Gliwice, Poland

3 Institute of Innovative Technologies EMAG, Katowice, Poland

Aleksandra Gruca, Ph.D

Institute of Informatics, Silesian University of Technology

Gliwice, Poland

27

Dr. Hatzis is an Associate Professor of Medicine at the Yale University School of Medicine. He has 20 years of

experience in senior research and management roles in biocomputational techniques, systems biology modeling,

genomic analysis and clinical diagnostics. He received his Ph.D. from the University of Minnesota and held several

senior research roles in the biotechnology industry. He has been the cofounder of two startup companies

specializing in bioinformatics tools development and in clinical diagnostics. Dr. Hatzis has been an active member of

the Biostatistics committee of FDA's Microarray QC program, co‐investigator on the NCI Cancer Biospecimen

Integrity program and an investigator of the Breast Cancer Research Foundation. Among his most significant

contributions are the co‐development with colleagues from MD Anderson of the RCB index, a continuous index of

residual disease in breast cancer, and the development of a gene‐expression based prognostic signature for

patients treated with standard chemotherapy that accounts for phenotypic differences and integrates endocrine

sensitivity, and chemotherapy response and resistance endpoints. Dr. Hatzis continues to be involved in the design

of biomarker validation clinical studies and development of strategies for translating genomic diagnostic assays to

clinical practice. His current research interests focus on developing methods to characterize the genetic and

molecular heterogeneity of breast cancer subtypes and the implications it might have on response and resistance to

treatment. A key area of interest is to develop methodology that integrates genomic level information of individual

patients to lead to more focused treatment decisions tailored for the individual tumor. Dr. Hatzis is serving as

academic editor on biomarker journals, has been a reviewer on NCI and NSF panels and is serving as ad‐hoc

reviewer on several bioinformatics and clinical journals.

Multi‐region sequencing is used to detect intratumor genetic heterogeneity (ITGH) in tumors. To assess whether

true ITGH can be distinguished from sequencing artifacts, we used whole‐exome sequencing (WES) of three

anatomically distinct regions of the same tumor, and also the same DNA twice to estimate technical noise. Somatic

variants were detected with three different WES pipelines (tumor only, cohort normal, matched normal) and

subsequently validated by high‐depth amplicon sequencing. The tumor‐only pipeline was unreliable, with about

69% of the identified somatic variants being false positives. Even with matched normal DNA where 82% of the

somatic variants were detected reliably, only 36%‐78% were found consistently in technical replicate pairs. Overall

34%‐80% of the discordant somatic variants, which could be interpreted as ITGH, were found to constitute technical

noise. Excluding mutations affecting low mappability regions or occurring in certain mutational contexts was found

to reduce artifacts, yet detection of subclonal mutations by WES in the absence of orthogonal validation remains

unreliable.

Reliability of Whole‐Exome Sequencing for Assessing Intratumor Genetic Heterogeneity in Breast

Cancer

Christos Hatzis, Ph.D

Associate Professor of Medicine

Director Bioinformatics

Breast Medical Oncology, Yale Cancer Center

Yale School of Medicine, USA

28

Dr. Jones is currently Principal Bioinformaticist and Scientific Advisor at Q2 Solutions | EA Genomics. He conducts

collaborative scientific research with clients in multiple areas, especially in oncology and immuno‐oncology. His

background includes leading the analysis, development and validation of the bioinformatic and computational

systems that process complex genomic assays, including next generation sequencing assays, evaluating new and

emerging genomic technologies, and developing bioinformatic implementation strategies. He consults with clients

and provides thought leadership in industry and public consortiums involved in genomic science and measurement.

Dr. Jones has over 15 years of experience in advanced genomic technologies and 20 years of experience in scientific

and technology leadership positions, including serving as Vice President of Statistics and Bioinformatics at

Expression Analysis, Inc (EA) and Chief Science Officer at Reliametrics, a Nortel Networks business unit. He has

authored over 30 peer‐reviewed publications.

The landscape of proper processes and procedures for constructing and validating clinical‐grade bioinformatics

systems is sometimes muddled in clinical research and practice due to potential regulatory confusion regarding FDA

vs. CLIA oversight, the interactions between wet‐lab and dry‐lab methods, and an industry where traditionally the

software (if any) assessing clinically actionable analytes was tightly integrated with the laboratory device. This talk

will provide a quick overview of the relevant regulatory landscape as well as the risk‐based approaches taken by EA

Genomics, a business unit of Q2 Solutions, to address them. We will also discuss lessons learned in handling and

integrating custom and open‐source software, building appropriate validation datasets and scenarios, and

regulatory compliance.

Clinical‐grade Bioinformatics Systems: Overview and Lessons Learned

Wendell Jones, Ph.D

Principal Bioinformaticist and Scientific Advisor,

Q2 Solutions | EA Genomics,

Morrisville, North Carolina, USA

29

Paweł Łabaj studied Computer Science in Medicine at the Silesian University of Technology (Gliwice, Poland). For

his MSc thesis he worked on a project at the Institute of Medical Technology and Equipment (Zabrze, Poland),

where he developed a system for automatic analysis and pattern recognition using neural networks applied in

Fetal Heart Rate monitoring devices. He then joined the Vienna Science Chair of Bioinformatics at Boku

University Vienna, where he obtained a PhD in Bioinformatics with a thesis on “Measurement and data analysis in the face of noise

and complex backgrounds – Advances from improved bioinformatics algorithms”. He was an active member of the FDA

MAQC‐III/SEQC consortium, where he studied the performance of platforms and pipelines for high‐throughput

expression profiling. This experience led to his winning the APART Fellowship of the Austrian Academy of Sciences as

well as the recent competition for a bioinformatics group leader position at the Malopolska Centre of Biotechnology (Krakow, Poland).

Dr Łabaj’s research focuses on the consequences of gene vs. alternative transcript expression profiling, as well as on

approaches for assessing the performance / benchmarking of platforms and analysis pipelines.

The MAQC/SEQC consortium has recently compiled a key benchmark that can serve for testing the latest

developments in analysis tools for microarray and RNA‐seq expression profiling. Such objective benchmarks are

required for basic and applied research, and can be critical for clinical and regulatory outcomes. This is invaluable at a

time when about 90% of surveyed scientists have confirmed that there is a ‘reproducibility crisis’ in science and it is

estimated that 85% of research resources are wasted. This rich and publicly available benchmark makes it possible to identify

the underperforming computational tools which are the major offenders in science’s reproducibility crisis.

In our recent research work we are going beyond the first comparisons presented in the original SEQC study. We

have demonstrated the benefits that can be gained by analysing results in the context of other experiments

employing a reference standard sample. This allowed the computational identification and removal of hidden

confounders, for instance, by factor analysis. In itself, this already substantially improved the empirical False

Discovery Rate (eFDR) without changing the overall landscape of sensitivity. Further filtering of false positives,

however, is required to obtain acceptable eFDR levels. Appropriate filters noticeably improved reproducibility of

differential expression calls both across sites and between alternative differential expression analysis pipelines.

Power and limitations of RNA‐Seq ‐ Putting reproducibility to the test

Paweł Łabaj, PhD

Austrian Academy of Sciences APART Fellow, Vienna, Austria

Chair of Bioinformatics Research Group, Boku University Vienna, Austria

Bioinformatics Group Leader, Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland

Dr. Li is Associate Director of the National Center for Clinical Laboratories (NCCL) and Director of the NCCL Department of

Immunoassay and Molecular Diagnosis. He received his Ph.D. from Peking Union Hospital College in July 1993.

He is responsible for proficiency testing of immunoassay and nucleic acid testing of infectious diseases,

pharmacogenomics and tumor gene mutation detection for clinical laboratories in hospitals of China. His research

interests are mainly in the methodology and standardization of molecular diagnosis and immunoassay. As project

principal, he has received six grants from the National Natural Science Foundation of China, one grant from the National

High Technology Research and Development Program of China (863 Program), and one grant from the AIDS, Hepatitis, and

Other Major Infectious Disease Control and Prevention Program of China. He has published 108 papers in academic

journals (Ann Rheum Dis., Clin Chem., J Mol Diagn., J Clin Endocrinol Metab., Int J Cancer, J Thorac Oncol. and so

on).

Precision oncology takes advantage of individual differences in a patient’s tumor biomarkers, which are associated with patient prognosis and tumor response to therapy, and applies this information to better inform medical care. Tumor biomarkers can be DNA, RNA, protein and metabolomic profiles (Panomic analyses) that predict therapy response. However, the most recent approach is the detection or sequencing of tumor DNA, which can reveal genomic alterations that have implications for cancer treatment.

Since 2012, the National Center for Clinical Laboratories has established more than 10 external quality assessment (EQA)/proficiency testing (PT) programs, covering gene mutations in EGFR, KRAS, BRAF and PIK3CA, HER2 (FISH), EML4‐ALK (FISH and RT‐PCR), BCR‐ABL (qRT‐PCR), ctDNA (ARMS, ddPCR and NGS) and multiple gene detection by NGS, based on reference materials or controls developed in our laboratory. At the beginning of each program, only about half of the participants obtained satisfactory results. Most participants reported false positive and false negative results, with false positives being especially common. Quality has since improved greatly through the implementation of quality control measures in clinical laboratories and personnel training programs. An EQA/PT program for the bioinformatics pipeline (dry bench process) of NGS (whole genome sequencing, whole exome sequencing and targeted sequencing) is being prepared based on the reference materials developed in our laboratory.

Quality control and standardization of precision oncology related gene mutation detection in China

Jinming Li, Ph.D

Associate Director of the National Center for Clinical Laboratories

Beijing Hospital of the Ministry of Health,

Beijing, China

Dr. Nakae is Director‐General of a Japanese industrial consortium in the field of biotechnology, called JMAC (Japan

Multiplex bio‐Analysis Consortium). JMAC is a unique industrial consortium consisting of a wide variety of companies

including microarray manufacturers, material providers, plastic‐processing technology providers, trading companies

and consultants, pursuing the common target, namely industrialization of biotechnology. The major activities of

JMAC are to support large research projects from the standpoint of quality control and to develop international

standards by incorporating the outcomes of the project work. A representative project is the “Project focused on

developing key technology for discovering and manufacturing drugs for next‐generation treatment and diagnosis”

supported by AMED in Japan. He is leading the development of the quality control system including miRNA

standard materials in collaboration with AIST (National Institute of Advanced Industrial Science and Technology) in this

project. For the development of international standards, he is leading and supporting the development of over 10

ISO standards in broad areas of industries. He is an expert member of TC 212 (Clinical laboratory testing and in vitro

diagnostic test systems), TC 34/SC 16 (Horizontal methods for molecular biomarker analysis), TC 276

(Biotechnology), TC 229 (Nanotechnologies) and a formal liaison observer among these committees and to other

TCs and SCs such as TC 34/SC 9 (Microbiology) and TC 272 (Forensic sciences). Dr. Nakae is also an assessor for the medical

laboratory accreditation program based on ISO 15189, belonging to the Japan Accreditation Board (JAB).

The MAQC/SEQC project was started to address quality issues in data submitted in applications for

drug approval. In the project, multi‐platform measurement data were analyzed and discussed using a

broad range of approaches including multi‐laboratory testing, software pipeline comparison and statistical analysis of the

outcomes of the genome‐wide analysis systems.

One of the goals of the project was to reach a consensus on emerging technologies, to ensure the quality of

data and to maintain compatibility so that the accuracy of the submitted data can be understood. For precision medicine,

companion diagnostics play an important role in selecting medicines for each person based on his/her genetic

background. In this sense, the accuracy of IVDs is the key issue for the realization of precision medicine.

Quality control of IVDs involves an aspect beyond the issues covered in the series of discussions in

MAQC/SEQC. The testing is performed not only in selected high‐level laboratories, e.g. laboratories of

pharmaceutical companies, but also in a large number of small clinical laboratories all over the world.

Standards play a significant role in controlling such routine laboratory work in order to realize precision

medicine in society. For example, ISO 15189 clearly states “The laboratory shall validate examination procedures derived

from the following sources; a) non‐standard methods; …”, and “Validated examination procedures used without

modification shall be subject to independent verification by the laboratory…”. In addition, the procedure to record

“the metrological traceability of the calibration standard and the calibration of the item of equipment” shall be documented.

International standardization activity on emerging technologies for medical and food industries – An ISO perspective

Hiroki Nakae, Ph.D

Director‐General

Japan Multiplex bio‐Analysis Consortium (JMAC)

Tokyo, Japan

Thus, the reference material is the key tool not only for equipment calibration, but also for preparing

the quality control materials to be used for each measurement method. Standardization would be helpful for this

kind of quality control in clinical laboratories for sustaining compatibility of the results of molecular testing.

Presently, many standardization efforts for molecular testing are under way within ISO.

ISO is an international organization for standardization. It develops and publishes International Standards in many

fields of technologies other than the electrical industry. In order to cover the wide variety of industries, specific TCs

(Technical Committees) are formed on the technology and industry‐field base. Member bodies (countries) assign

experts to each TC, and International Standard documents are developed by

the assigned experts according to ISO directives.

Japan Multiplex bio‐Analysis Consortium (JMAC) was mainly established to actively engage in the development of

ISO standards by providing formal experts to the ISO/TCs and advising any other experts in the field of

biotechnology. JMAC has provided such activities to TC 212 (Clinical laboratory testing and in vitro diagnostic test

systems), TC 34/SC 16 (Horizontal methods for molecular biomarker analysis), TC 276 (Biotechnology), and TC 229

(Nanotechnologies).

Emerging technologies related to MAQC/SEQC meetings are also discussed within ISO. In particular, NGS, a major

technology, is under discussion in TC 212, TC 34/SC 16, TC 34/SC 9 (Microbiology), and TC 276. Briefly, the

preparation for starting a formal development of guidance documents related to the introduction of emerging

technologies into clinical laboratories has been underway in TC 212. In TC 34/SC 16, a document for application of

NGS to identify animal species in food and feed has been prepared. In TC 34/SC 9, a standard for the whole genome

sequencing of foodborne pathogens, mainly by NGS, is being discussed. In TC 276, two documents for NGS

are currently under development. One document is ISO/AWI 20397‐2, entitled “Biotechnology ‐‐ General

requirements for massive parallel sequencing ‐‐ Part 2: Methods to evaluate the quality of sequencing data”, and

the other is a document for the pre‐analysis phase of NGS analysis. Details of these works will be introduced in my

talk.

Emerging technologies, including NGS (massively parallel sequencing), are powerful tools for many industrial fields

including the medical and food industries. For industrial use of emerging technologies, quality is a very important issue

that should be discussed, not only for approval of IVDs, but also for the daily management of the test results,

namely “from approval to lab”. MAQC has been focusing on the development of regulatory science regarding

quality control of data for application. Now it should expand its scope to the infrastructure for controlling the

quality of testing in clinical laboratories. MAQC/SEQC should carefully watch the ISO standardization activity and

collaborate with ISO in future work to achieve the common goal: assurance of test quality for emerging

technologies.

Professor Sir Munir Pirmohamed (MB ChB, PhD, FRCPE, FRCP, FBPhS, FMedSci) is currently David Weatherall Chair

in Medicine at the University of Liverpool, and a Consultant Physician at the Royal Liverpool University Hospital. He

is also the Associate Executive Pro Vice Chancellor for Clinical Research for the Faculty of Health and Life

Sciences. He also holds the only NHS Chair of Pharmacogenetics in the UK, and is Director of the M.R.C. Centre for

Drug Safety Sciences, Director of the Wolfson Centre for Personalised Medicine and Executive Director, Liverpool

Health Partners. He was awarded a Knight Bachelor in the Queen’s Birthday Honours list in 2015. He is also an

inaugural NIHR Senior Investigator, and Fellow of the Academy of Medical Sciences in the UK. He is also a

Commissioner on Human Medicines. His research focuses on personalised medicine in order to optimise drug

efficacy and minimise toxicity, move discoveries from the lab to the clinic, and from clinic to application. He has

authored over 420 peer‐reviewed publications, and has an H‐index of 85.

Pharmacogenomics is the study of how genetic variation affects drug responses. Precision Medicine is a wider term

also encompassing other technologies that personalise therapeutic and preventive approaches. However,

pharmacogenomics is an important component of precision medicine and needs to be considered alongside other

aspects to ensure patients get the right drugs at the right time and at the right dose.

Pharmacogenomic variation can affect drug pharmacokinetics or drug pharmacodynamics, and is important for

both drug efficacy and drug safety. Crucial to both is precision dosing. Individual dose requirements vary, but are

not currently accounted for in clinical practice. The one‐dose‐fits‐all paradigm leads to marked variability in

exposure, and therefore in drug responses. Development of novel dosing algorithms and their clinical

implementation is now being undertaken for certain drugs such as warfarin.

The greatest success in efficacy studies for pharmacogenomics has been in the development of targeted agents in

cancer and rare diseases. In the former, drugs targeting somatic mutations have had major impact in certain

malignancies, but the challenge for the future will be to ensure the use of combinations of drugs to ensure that any

response is durable. In rare diseases, drugs targeting novel mutations, for example ivacaftor for the G551D

mutation in cystic fibrosis, have had a transformational effect on patients’ lives.

There have been important advances for drug safety pharmacogenomics, notably in determining the role of HLA

gene polymorphisms in predisposing to serious immune mediated adverse drug reactions. To date almost 30 novel

HLA allele associations have been identified, and this number is increasing. Non‐HLA gene polymorphisms are also

being identified now, and this will be an area of growth in the future.

In the future, as more patients have their whole genome sequenced, the challenge will be to ensure that both

common and rare genetic variation is taken into account in optimising drug responses. With all these

developments, the greatest challenges for all healthcare systems will be to deliver these advances in a cost‐

effective and sustainable manner that is acceptable to both patients and the public.

Pharmacogenomics and precision medicine ‐ current and future perspectives

Professor Sir Munir Pirmohamed, MB ChB, PhD, FRCPE, FRCP, FBPhS, FMedSci

David Weatherall Chair of Medicine and NHS Chair of Pharmacogenetics

University of Liverpool, UK

Dr. Mehdi Pirooznia is the Director of the Bioinformatics and Computational Biology (BCB) Core Facility at the National Heart Lung and Blood Institute at the NIH (NHLBI/NIH). The BCB core operates with the goal of facilitating and accelerating biomedical research and discovery through the application of computational and statistical tools in the analysis and interpretation of high‐throughput and high‐dimensional biological data. Dr. Pirooznia supervises and spearheads this effort by providing bioinformatics analysis support for intramural scientists in life sciences, clinical and translational research. In particular, the BCB core specializes in analyses pertaining to next‐generation sequencing and biomedical informatics in genomics, transcriptomics, epigenomics and disease biomarkers. Towards this end, Dr. Pirooznia’s team takes an integrative approach, combining site‐specific sequence variation with gene expression and proteomics data to investigate molecular mechanisms underlying disease progression and treatment responses. Dr. Pirooznia has published many articles in peer‐reviewed journals and serves as an editor and reviewer for scientific journals. Dr. Pirooznia is also an Adjunct Assistant Professor at the Johns Hopkins University School of Medicine, where he served for 8 years as a faculty member prior to joining the NIH in 2016. There he provided leadership and scientific direction and was responsible for implementing the high performance computational laboratory and bioinformatics system.

The evolution of subclones during cancer progression, due to the accumulation of somatic mutations,

gives rise to intra‐tumor heterogeneity. Despite recent advances, determination of sub‐populations within a tumor

remains a challenge. Here, we address this problem through designing a computational workflow for identifying the

sub‐populations within a tumor. The workflow infers clonal populations and their frequencies from bulk tumor

samples. It profiles a reliable set of somatic copy number alterations and somatic point mutations along with

allele‐specific coverage ratios between the tumor and matched normal sample, estimates their cellular fractions,

identifies and evaluates the clustering of the mutations, infers clonal ordering, and visualizes and interprets the

results. The analysis workflow will be presented in detail, as well as results from simulated datasets and NGS

data from a CLL cancer study, to demonstrate the efficiency of the analysis pipeline.
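
As an illustration of the cellular‐fraction step only, the sketch below converts a variant allele frequency (VAF) into a cancer cell fraction using a widely used purity and copy‐number correction. The function name, its parameters and the example numbers are assumptions made for this sketch; the abstract does not state which estimator the workflow uses.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a somatic mutation.

    Illustrative only (not the workflow described in the abstract).
    vaf          -- observed variant allele frequency in the bulk sample
    purity       -- fraction of cells in the sample that are tumor cells
    tumor_cn     -- total copy number at the locus in tumor cells
    multiplicity -- mutated copies per mutated tumor cell (assumed known)
    normal_cn    -- total copy number at the locus in normal cells
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumor_cn + (1 - purity) * normal_cn
    # Expected VAF = purity * multiplicity * CCF / avg_copies; solve for CCF.
    return vaf * avg_copies / (purity * multiplicity)


if __name__ == "__main__":
    # Hypothetical diploid locus in a sample with 80% tumor purity.
    print(cancer_cell_fraction(vaf=0.4, purity=0.8, tumor_cn=2))
    print(cancer_cell_fraction(vaf=0.2, purity=0.8, tumor_cn=2))
```

Mutations whose estimated fractions cluster near 1 would be candidates for clonal events, while clusters at lower values suggest subclones. Sampling noise in the VAF can push individual estimates above 1, which is one reason clustering is applied rather than reading each value on its own.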

A General Framework for Analysis of Clonal Heterogeneity and Tumor Evolution

Mehdi Pirooznia, MD, PhD

Director, Bioinformatics and Computational Biology Core Facility

National Heart Lung and Blood Institute of National Institutes of Health, USA

Dr Pusztai is Professor of Medicine at Yale University, Director of Breast Cancer Translational Research and Co‐

Director of the Yale Cancer Center Genomics Genetics and Epigenetics Program. He is also Chair of the Breast

Cancer Research Committee of the Southwest Oncology Group (SWOG). Dr. Pusztai received his medical degree

from the Semmelweis University of Medicine in Budapest, and his D.Phil. degree from the University of Oxford in

England. His research group has made important contributions to establish that estrogen receptor‐positive and

‐negative breast cancers have fundamentally different molecular, clinical and epidemiological characteristics. He has

been a pioneer in evaluating gene expression profiling as a diagnostic technology to predict chemotherapy and

endocrine therapy sensitivity and has shown that different biological processes are involved in determining the

prognosis and treatment response in different breast cancer subtypes. He made important contributions to clarify

the clinical value of preoperative (neoadjuvant) chemotherapy in different breast cancer subtypes. Dr Pusztai is also

principal investigator of several clinical trials investigating new drugs, including immunotherapies for breast cancer.

He has published over 250 scientific manuscripts in high impact medical journals including the NEJM, JAMA, Journal

of Clinical Oncology, Nature Biotechnology, PNAS, Lancet Oncology and JNCI. He is among the top 1% most highly

cited investigators in clinical medicine according to a 2015 Thomson Reuters report. He is a member of the Scientific

Advisory Board of the Breast Cancer Research Foundation and a Susan Komen Scholar.

Tumor‐infiltrating lymphocyte (TIL) count and gene expression signatures that reflect the extent of immune

infiltration in the tumor microenvironment have long been recognized as prognostic markers in early stage triple

negative (TNBC), HER2 positive, and highly proliferative estrogen receptor (ER) positive breast cancers. Extensive

immune infiltration in the tumor microenvironment also predicts greater chemotherapy sensitivity. It has been

suggested that a high mutation load and, consequently, a large number of potentially immunogenic new antigens drive

immune infiltration in cancer. However, in breast cancer higher TIL counts and greater immune metagene

expression are associated with significantly lower clonal heterogeneity in all breast cancer subtypes and with a trend

for lower overall mutation, neoantigen and CNV loads in TNBC and HER2+ cancers. The high immune gene

expression and lower clonal heterogeneity suggest an immune pruning effect and equilibrium between immune

surveillance and clonal expansion. This suggests that anti‐tumor immune surveillance in immune‐rich tumors leads

to elimination of clones, lower clonal heterogeneity and “simpler” genomes. The surviving neoplastic cell

population exists at a near equilibrium with the immune surveillance explaining the better prognosis of these

cancers. The higher genomic diversity of immune‐poor TNBC suggests escape from immune surveillance and

genomic diversification. When we examined the immune microenvironment of paired primary tumors and

metastases, most immune cell subtypes, immune functions, and immune‐associated gene expression were lower in

metastases compared to primary tumors, consistent with immune escape. These immunological differences suggest

that immunotherapy will be more effective in early stage disease than in metastatic cancers. While breast cancer

metastases are immunologically more inert than the corresponding primary tumors, several immuno‐oncology

targets, macrophage and angiogenesis signatures show preserved expression in metastases suggesting rational

therapeutic combinations for clinical testing.

Evolution of the breast cancer genome under immune surveillance

Lajos Pusztai, M.D., D.Phil.

Professor of Medicine and Director of Breast Cancer Translational Research

Yale Cancer Center

Yale School of Medicine, USA

Prof. Sansone’s activities are in the areas of knowledge and information management, and interoperability of applications, impacting on the reproducibility of research outputs and the evolution of scholarly publishing. Prof. Sansone sits on the boards of several not‐for‐profit efforts, and she is a consultant for Springer Nature and Honorary Academic Editor of the Scientific Data journal. She leads the Centre in several UK, European, NIH and pharma‐funded projects in the life and biomedical sciences, and is a founding member of the ELIXIR UK Node, where she is responsible for standards and curation areas. Working with and for data producers and consumers, service providers, pre‐competitive informatics initiatives, journals and funding agencies, she strives to make digital research objects Findable, Accessible, Interoperable and Reusable (FAIR). She holds a PhD in Molecular Biology from Imperial College of Science, Technology and Medicine, London; after a few years working on vaccine genetics at an Imperial spin‐off, she moved to the European Bioinformatics Institute (EBI, Cambridge), where she worked for nine years as a Project and Team Coordinator and Principal Investigator before moving to Oxford in 2010.

A growing worldwide movement for reproducible research encourages making data, along with the experimental details, available according to the FAIR principles of Findability, Accessibility, Interoperability and Reusability (see http://www.nature.com/articles/sdata201618). Several data management and sharing policies and plans have emerged and, in parallel, a growing number of community‐based groups are developing hundreds of standards to harmonize the reporting of different experiments. Community mobilization is also evident in the number of efforts and alliances, as well as data journals and data centres, being launched. I will paint this dynamic landscape, highlighting NIH Data Commons and ELIXIR related activities, including FAIRsharing (https://fairsharing.org) and their role in scholarly communication.

The FAIR principles: Findability, Accessibility, Interoperability and Reusability of the research assets

Susanna‐Assunta Sansone, Ph.D

Associate Professor and Associate Director

Oxford e‐Research Centre,

Engineering Science Department,

University of Oxford, UK

Dr. Terry Speed completed a BSc (Hons) in mathematics and statistics at the University of Melbourne (1965), and a

PhD in mathematics at Monash University (1969). He held appointments at the University of Sheffield, U.K. (1969‐

73) and the University of Western Australia in Perth (1974‐82), and he was with Australia’s CSIRO between 1983

and 1987. In 1987 he moved to the Department of Statistics at the University of California at Berkeley (UCB), and

has remained with them ever since. In 1997 he took an appointment with the Walter & Eliza Hall Institute of

Medical Research (WEHI) in Melbourne, Australia, and was 50:50 UCB:WEHI until 2009, when he became emeritus

professor at UCB and full‐time at WEHI, where he headed the Bioinformatics Division until 2014. His research

interests lie in the application of statistics to genetics and genomics, and to related fields such as proteomics,

metabolomics and epigenomics, with a focus on cancer and epigenetics.

In a landmark 1959 paper (Technometrics 1: 251‐267) entitled “The Measuring Process”, the statistician John

Mandel from the US National Bureau of Standards in Washington, DC presented the theory for what he later called

the “row‐linear model” for the analysis of two‐way arrays of measurements made on the same set of units

(materials) across a number of laboratories. A companion paper published at the same time in the American

Society for Testing Materials (ASTM) Bulletin discussed practical and computational aspects of his method.

Mandel’s focus was interlaboratory studies of a single test method, and his method is now embodied in ASTM

Standard E691. Interestingly, his model can also be used with data from studies of different measurement methods

in a single laboratory, or studies involving multiple methods and laboratories. Note that in Mandel’s interlab

studies, a single measurement (perhaps replicated) such as of sulfur in petroleum was taken on each unit in each

lab.
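
For readers unfamiliar with the row‐linear model, the sketch below fits its usual textbook form, y[i][j] ≈ mu[i] + beta[i] * (x[j] - mean(x)), where i indexes laboratories (rows), j indexes materials (columns) and x[j] is the average of column j. The function name and the small data table are hypothetical and chosen only for illustration; this is not the analysis presented in the talk.

```python
def fit_row_linear(table):
    """Fit a row-linear model to a two-way table (rows = labs, columns = materials).

    Each row is regressed on the centered column averages.
    Returns one (mu, beta, residuals) tuple per row.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    # Column averages act as the consensus value for each material.
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    grand_mean = sum(col_means) / n_cols
    x = [c - grand_mean for c in col_means]
    sxx = sum(v * v for v in x)
    fits = []
    for row in table:
        mu = sum(row) / n_cols  # row (laboratory) mean
        # Least-squares slope; x sums to zero, so no mean correction is needed.
        beta = sum(v * y for v, y in zip(x, row)) / sxx
        resid = [y - (mu + beta * v) for v, y in zip(x, row)]
        fits.append((mu, beta, resid))
    return fits


# Hypothetical example: three labs measuring five materials, each lab with
# its own offset and sensitivity (slope) relative to the consensus.
materials = [1.0, 2.0, 3.0, 4.0, 5.0]
labs = [(0.0, 0.9), (0.5, 1.0), (-0.5, 1.1)]  # (offset, slope) per lab
table = [[a + b * t for t in materials] for a, b in labs]

fits = fit_row_linear(table)
for lab_index, (mu, beta, resid) in enumerate(fits):
    print(lab_index, round(mu, 3), round(beta, 3))
```

By construction the fitted slopes average to one, so a slope above or below one indicates a laboratory that is more or less sensitive than the consensus, while the residuals reflect departures from linearity and measurement noise.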

All of the preceding notions can be applied with little change to the MAQC enterprise, where method is replaced by

platform (e.g. one of several microarray platforms, sequencing or qRT‐PCR assays). But there is one major difference:

instead of a single measurement being taken on each unit (e.g. sample of cells) in each lab or by each platform, as

was normal 60 years ago, these days we might take hundreds (qRT‐PCR), thousands (gene expression microarrays),

or millions (DNA methylation) of measurements on each unit in each lab with each platform. Mandel’s method

remains relevant, and illuminating, but we need to address the multiplicity of measurements on each unit. This talk

will summarize a recent study involving measurements on cell samples using multiple microarray and sequencing

assays measuring gene expression and DNA methylation, where we have adapted Mandel’s row linear model to the

‘omics era. Also, relevant to MAQC, Mandel’s approach works best with a good number of different samples

spanning a wide range of measurement values, and it becomes stronger the more labs or platforms are used on

these samples.

Using Mandel’s row linear model in the ‘omics era

Terry Speed, Ph.D.

Walter and Eliza Hall Institute of Medical Research

Australia

After graduating from Shanghai Medical University in 1985, Dr. Shao underwent surgical residency training at the

Cancer Hospital affiliated with Shanghai Medical University. From 1990 to 1995, he completed his postdoctoral

research at the University of Maryland Cancer Center in the U.S. He was a visiting scientist to the Breast Center at

the University of California, Los Angeles from 1999 to 2001. In the year 2000, Dr. Shao was appointed Chairman of

the Department of Breast Surgery at the Cancer Hospital/Cancer Institute affiliated with Shanghai Medical

University. He was elected as the Director of Fudan University’s Breast Cancer Institute in 2002. In 2005, he became

the Chairman of the Chinese Anti‐Cancer Association’s Breast Cancer Society. Since his appointment in 2006, Dr.

Shao has been the Chairman of the Department of Surgery at the Cancer Hospital/Cancer Institute affiliated with

Fudan University, and he was appointed as the Director of Fudan University’s Breast Cancer Institute in 2012.

Dr. Shao's research focuses on the translational and clinical research of breast cancer, especially upon breast cancer

susceptibility and metastasis. Either as the principal investigator or in collaboration with others, he has conducted

excellent research in these areas. Dr. Shao has published over 300 articles in the field of breast cancer research,

which have been cited more than 3,000 times all over the world.

Breast cancer is the most common cancer diagnosed in women, and approximately one in eight women living in the

United States will develop the disease during her lifetime. Breast cancer is also one of the most studied solid

tumors and has the potential to be tractable to a precision medicine approach. It has been well established that

these tumors are extremely heterogeneous, so intra‐ and inter‐tumoral heterogeneity are the basic characteristics

of breast cancer. According to gene expression profiles, breast cancer can be divided into several molecular

subtypes, including luminal breast cancer, HER2‐positive breast cancer, and triple‐negative breast cancer (TNBC).

For each subtype, there are unique tumor biology characteristics and treatment strategies. Intrinsic subtypes are

associated with different gene expression and mutation profiles as well as different prognoses and responses to

therapies. In the era of precision medicine, therapies are being developed using the framework of molecular

subtyping. Here, we summarize the major challenges and possible solutions for treating the intratumoral

heterogeneity of breast cancer.

Intratumoral heterogeneity and clonal selection of breast cancer

Zhimin Shao, Ph.D

Director of Fudan University’s Breast Cancer Institute

Fudan University, China

Dr. Chunlin Xiao is a Staff Scientist at the National Center for Biotechnology Information (NCBI), National

Institutes of Health (NIH). His primary role is to deal with large scale sequencing data analysis and

management involving next‐generation sequencing technologies, such as the 1000 Genomes Project,

Genome‐in‐a‐Bottle project, and Sequencing Quality Control project. His research interests include

population genetics, reference material and reference sequence dataset development, structural

variation detection method development, and cloud computing.

Structural variations (SVs) contribute to genetic diversity of human populations, affect biological functions, and

cause various human disorders. However, accurately identifying SVs with correct sizes and locations in the human

genome remains challenging due to the complexity of the human genome, limitations of sequencing technologies,

and drawbacks of analysis methods. The advancement of next‐generation sequencing technologies has dramatically

decreased the sequencing cost, while substantially increasing the lengths of the sequencing reads. Thus, using de

novo assembly based approaches for discovering a full spectrum of SVs in the human genome becomes appealing.

While various assembly methods have been developed and proposed for general use by the community, the

relative efficiency and predictive accuracy of SV calling based on these assembly methods have not been fully

evaluated. In this study, we applied several popular de novo assembly tools to sequencing read data that were

generated using multiple sequencing technologies with technical replicates for NA12878/HG001, a well‐studied

individual from the NIST‐led Genome‐in‐a‐Bottle (GIAB) project, and for a HapMap Caucasian trio and a Chinese Quartet from

the FDA‐led Sequencing Quality Control Phase II (SEQC2) project. Assemblies and SV callsets were generated for each

of the samples, and repeatability of the SVs across technical replicates and reproducibility across sequencing sites

were evaluated. These results allow a better understanding of the impact of de novo assembly methods on SV

calling, thus providing better insight for precision medicine.

Performance assessment of de novo assembly‐based structural variation detection in the human

genome

Chunlin Xiao, Ph.D

National Center for Biotechnology Information (NCBI)

National Institutes of Health (NIH), USA

Dr. Jun Ye is co‐founder, president, and CEO of Sentieon, a bioinformatics company established in 2014. Prior to

Sentieon, Ye was co‐founder, president, and CEO of Founton Technologies, a company that specialized in

datamining, which is now part of Alibaba Group. Prior to Founton, Ye was co‐founder, president, and CTO of Brion

Technologies, a company specializing in computational lithography for semiconductor manufacturing, now part of

ASML. Prior to Brion, he was director of engineering at Onetta, an optical telecom company, working on

communication system control. He also served as director of engineering at KLA‐Tencor, where he worked on the

software and algorithm for mask inspection. From 2001 to 2015, Ye also served as a consulting professor of

electrical engineering at Stanford University, where he mentored and supervised graduate student research in

microlithography and other areas. Ye earned a BSEE from Fudan University in 1987, an MS in Physics from Iowa State

University in 1991, and a Ph.D. in EE from Stanford University in 1996. During his career, he has authored or co‐

authored more than 50 U.S. patents covering algorithms, software, hardware, and system architecture. In 2014 he

received the ISU John V. Atanasoff Discovery Award for his work to advance scientific knowledge.

Sentieon (www.sentieon.com), incorporated in 2014, develops and supplies a suite of bioinformatics secondary

analysis tools that process genomics data with high computing efficiency, fast turnaround time, exceptional

accuracy, and 100% consistency. Current released products include Sentieon DNAseq, a germline DNA pipeline, and

Sentieon TNseq and TNscope, for tumor‐normal somatic variant detection. The Sentieon tools are easily scalable,

easily deployable, easily upgradable, software‐only solutions. The Sentieon tools achieve their efficiency and

consistency through optimized computing algorithm design and enterprise‐strength software implementation, and

achieve high accuracy using the industry’s most validated mathematical models. Sentieon products have won

multiple top awards at precisionFDA challenges, and ranked first on the most recent ICGC‐TCGA DREAM

Mutation Calling challenge leaderboard in all three categories (SNV, indel, SV). We strive to enable

precision genomics data for precision medicine.

Enable Precision Data for Precision Medicine

Jun Ye, Ph.D

Chief Executive Officer and Co‐Founder

Sentieon, USA

Session 4: Speakers’ Biographies

and Proposed Society Projects

Project #1: Computational Reproducibility

Project coordinator: Benjamin Haibe‐Kains, [email protected]

How to participate: Participants must propose studies, either their own or independent ones, for which they plan

to reproduce all the computational analysis results (figures, tables and supplementary materials). Pointers to raw

data, processed data, analysis code, documentation and tutorial must be submitted to the project organizers.

Objectives: The goal of this project is to provide practical examples and guidelines to help scientists make their own

research fully reproducible. From the diversity of studies and tools will emerge templates that could be used to

ensure full reproducibility of future studies.

Background: Biomedical science is undergoing a “reproducibility crisis” where the results of many studies cannot be

reproduced by independent investigators, or even the original authors. While reproducing biological experiments is

difficult, computational analyses can be reproduced. However, the amount of data and the complexity of the

analysis pipelines are ever increasing, making computational reproducibility challenging. There exist many ways to

make the computational analyses of a given study fully reproducible. Size and accessibility of the data, software

tools and computing resource requirements are among the factors that will define how an analysis can be made

fully reproducible. There is a dire need for practical guidelines to make biomedical studies more reproducible.

Specific Aims

Aim 1: Identification of candidate studies. The MAQC Society asks its members to share examples of manuscripts

that can be fully reproduced.

Benjamin Haibe‐Kains, Ph.D

Scientist, Princess Margaret Cancer Center, University Health Network

Assistant Professor, Department of Medical Biophysics, University of Toronto

Adjunct Professor, Department of Computer Science, University of Toronto

OICR Associate, Ontario Institute for Cancer Research

Toronto, Canada

Dr. Haibe‐Kains earned his Ph.D in Bioinformatics at the Université Libre de Bruxelles (Belgium), for which he

was awarded the Solvay Award (Belgium). Supported by the Fulbright Award, Dr. Haibe‐Kains did his

postdoctoral fellowship at the Dana‐Farber Cancer Institute and Harvard School of Public Health (USA). He

started his laboratory at the Institut de Recherches Cliniques de Montréal (Canada) and moved to Princess Margaret (PM) in

November 2013. His research focuses on the integration of high‐throughput data from various sources to

simultaneously analyze multiple facets of carcinogenesis. His team is analyzing high‐throughput

(pharmaco)genomic datasets to develop new prognostic and predictive models and to discover new therapeutic

regimens in order to significantly improve disease management. Dr. Haibe‐Kains’ main scientific contributions

include several prognostic gene signatures in breast cancer, subtype classification models for ovarian and breast

cancers, as well as genomic predictors of drug response in cancer cell lines.

Aim 2: Reproducing the studies. The data must be freely accessible, the code must be open‐source and documented,

and a tutorial describing how to rerun the analyses to generate the figures and tables of the manuscript must be

provided with the submission.

Aim 3: Generating guidelines and templates. The set of studies that have been successfully reproduced will be used

to generate guidelines and templates to help the community in their quest for full computational reproducibility.

Study Design

The Code Ocean platform (codeocean.com) will be used to store the code, processed data and all the software

dependencies. Code Ocean allows a Docker‐based environment to be created for each project, ensuring that anybody can

easily run the code and reproduce all the analysis results.

Timeline

April 1st: Submission of the list of proposed studies.

April 16th: Selection of the proposed studies.

August 31st: Submission of the Code Ocean instance (pointers to raw data, processed data, code, documentation)

and tutorial

October 15th: Reproduction of the study results using the participants’ Code Ocean instance

November 26th: Selection of the three top submissions

Project #2: An Internationally‐conducted External Quality Control scheme on Machine Learning

Algorithms to assess Tumor Infiltrating Lymphocytes in Breast Cancer

Project coordinator: Roberto Salgado, [email protected]

How to participate: All groups with documented, analytically validated machine learning tools are welcome to

participate. Interested groups need to provide to the coordinators a motivated request to participate, with

documentation of the analytical validity of the method they are going to apply for this program. The information

provided by the groups will be considered strictly confidential. The method that is going to be applied for this

assessment needs to be locked, may not be changed during the assessment and should be described in detail in

order to avoid implicit overfitting.

Objectives/Goals:

To set quality standards and performance metrics on machine learning algorithms before introduction in a

clinical trial setting and/or daily practice setting.

To set quality standards and performance metrics that can be used by regulatory agencies to certify machine

learning algorithms for use in patient management.

To develop a framework for comparison of machine learning algorithms to determine precise quantitative

metrics of other breast cancer biomarkers, like Ki67.

Roberto Salgado, Ph.D

Department of Pathology/GZA, Antwerp

Breast Cancer Translational Research Laboratory

Jules Bordet Institute, Brussels, Belgium

Translational Breast Cancer Genomic and Therapeutics Laboratory of

the Peter MacCallum Cancer Centre, Melbourne, Australia

Dr. Roberto Salgado has been board certified in Anatomic Pathology since 2006 and obtained his medical training at

the University Hospital of Antwerp (Belgium) and the University Hospital in Leiden (The Netherlands). He obtained his

PhD working with the Translational Cancer Research Group of the AZ Sint‐Augustinus

Hospital/Antwerp and at the Department of Pathology at the University Hospital of Antwerp, studying the

interactions of hemostasis and angiogenesis in breast cancer. His training in Anatomic and Molecular

Pathology took place at the University Hospital Antwerp, the University Hospital Leuven and at the Jules

Bordet Institute, Brussels, Belgium. He currently works as a pathologist in Antwerp and is a scientific collaborator

with the Breast Cancer Translational Research Laboratory of the Jules Bordet Institute, the Immuno‐Task

Force of the Breast International Group in Brussels, and the Translational Breast Cancer Genomic and

Therapeutics Laboratory of the Peter MacCallum Cancer Centre, Melbourne, Australia. He also works in close

collaboration with the EORTC, where he co‐leads the development of the Specta‐trial concept. He is

also an auditor of Molecular Pathology/Genetic laboratories for the Federal Belgian Government.

Background:

At present, in early‐stage disease clinico‐pathological risk stratification is performed using a limited set of features

such as tumor size and lymph node status. Very large adjuvant trials such as ALTTO and APHINITY that have applied

these stratification schemes have illustrated the key problems with the current classification scheme – it does not

stratify patients with sufficient granularity to permit selection for clinical trials. The current scheme also takes the

approach of placing patients on a continuum of risk. This is at odds with results from high‐throughput technologies

such as gene expression profiling and genomic assays, which focus on identifying individual patient groups with

particular clinical behaviour. Several results in this area have identified genomic, transcriptomic or proteomic

features which in hindsight are associated with particular histological features. This suggests that the histological

appearance of a tumor represents a useful cancer phenotype which can be further explored, and contribute to

staging and stratification.

Machine learning refers to the general computational approach whereby data is used by algorithms to develop

predictive models. These models are finely tuned to optimize accuracy and generalizability as applied to new data.

Although machine learning has existed for some time, recent advances in algorithm development and

hardware infrastructure have enabled ‘deep learning’ approaches. Deep learning was originally designed to mimic

the neural architecture of the human brain, and conceptually uses a series of connected nodes (neural nets) which

respond to input in a way that is tuned with repeated cycles of learning. Neural nets have the ability to learn rich

representations of complex data, which may contain hierarchical and non‐linear relationships. These abilities make

neural nets ideally suited to image classification. They have exhibited spectacular results in this area, often

matching the performance of experts in the field or exceeding it (superhuman capabilities).

Recent advances in deep learning provide a path forward for numerous applications in digital pathology. On

one level, the robust performance and training characteristics of deep learning allow us to develop accurate

automated assays for pathological features such as grade and lymphocyte infiltration. These have the potential to

be ‘learn once, apply everywhere’. This is in contrast to existing imaging methods, which lack the precision and

robustness to be used in the clinical setting. If the promise of deep learning can be validated, the use of digital

pathology would aid pathologists in routine reporting, and could be expected to improve the validity of current

pathology based clinico‐pathological features. In the short term, digital pathology would also help standardize

pathology results within and across trials given the time required for pathology assessed quantitative metrics.

TILs have been shown to be a reliable and reproducible marker of tumor immunogenicity in breast cancer. It is clear

that higher levels of TILs are associated with improved prognosis in early stage TNBC and HER2‐positive breast

cancer, as well as a higher probability of achieving pCR in the neoadjuvant setting. Analysis of TILs in residual

disease specimens after neoadjuvant therapy has also been shown to have prognostic value. The evaluation of TILs

as a biomarker in breast cancer is likely to be extended from the research domain to the clinical setting in the near

future. The assessment of TILs by digital image analysis might be useful for standardization in the future, since this

approach has the potential, for example, to determine the number of TILs per mm² stromal tissue as an exact

measurement contrary to the approximate semi‐quantitative evaluation suggested at this moment. In the first

International Guidelines on TIL‐assessment in breast cancer we proposed to develop an inter‐laboratory Ring study

to assess the reproducibility and clinical validity of TILs assessment, including machine learning algorithms. While

TILs have been measured morphologically and have been shown to add predominantly prognostic information,

methodological open questions in the morphological evaluation of TILs still remain, for example the assessment and

importance of spatial TIL‐heterogeneity. The measurement on H&E‐stained slides most likely represents the

beginning of the efforts to use infiltrating cell properties as companion diagnostic tests. Thus, as a field, we should

be open to the introduction of molecular methods, most likely in situ, that can classify the TILs‐component and

bring higher levels of information to the patient sample. However, at this time, these deep learning approaches are

still experimental and not sufficiently documented for introduction into standard practice.

On another level however, deep learning also permits discovery of image based features which may be very difficult

for current approaches to identify, particularly if they only exist in small groups of patients. The key benefit of deep

learning here is to rapidly identify pathological features in clinical trials that are predictive of treatment or

prognostic of outcome in a standardized way. This is an essential first step in deciding if previously undescribed

pathological features are clinically relevant, and is largely infeasible using current approaches. Deep learning also

permits modification and retraining of the feature set to optimize accuracy and interpretability, which is again

infeasible with current methods.

The Working Group is therefore proposing a collaboration with the Massive Analysis and Quality Control

Consortium (www.maqcsociety.org) characterizing tumor infiltrating lymphocytes using machine learning

algorithms. Developing a machine learning based assay for tumor infiltrating lymphocytes would enable rapid

expansion of this promising pathological feature, and by providing an adjunct to human pathologists, enhance the

validity and robustness for prognosis/prediction.

Specific Aims:

Comparison of the machine learning image classification metrics with those of pathologists in the RING‐study

which the Working Group has published (Carsten Denkert et al., Mod. Pathol. 2016).

Comparison of automated TILs scoring with pathologist scoring results in different settings, namely core

biopsies, full sections, pre‐invasive (DCIS), untreated and treated tumors.

Comparing the performance of deep learning approaches to identify complex features such as clustering/spatial

statistics including proximity of TILs to cancer cells that are prognostic of outcome or predictive of treatment.

Comparing the clinical validity of different machine learning algorithms, the utility of combining models for

improved accuracy and to identify possible false positives and false negatives.

To combine annotated training data from different sites to create a comprehensive breast cancer ML training

and validation data base hosted by the consortium.

A framework will be developed to facilitate automated testing, validation and certification of image

classification derived pathology metrics that can improve the standard of care.

Develop, together with both groups, a review, perspective or opinion paper on the use of Artificial

Intelligence/machine learning tools in Oncology, focusing on but not limited to TILs, and including the quality

requirements for use of these technologies in clinical trial and daily practice settings. The paper would be similar

in kind to the paper by Lisa McShane et al., “Criteria for the use of omics‐based predictors in clinical trials”

(Nature 2013, doi: 10.1038/nature12564), and may be very useful for regulatory agencies (FDA, EMA). If

we pursue this idea, we should aim for a high‐level journal such as Nature, Nature Biotechnology, Nature Reviews

Clinical Oncology or Nature Reviews Drug Discovery.

Study Design:

Breast cancer slide‐sets in different settings (invasive, DCIS, residual disease) with known TIL‐assessment by

pathologists will be posted on the website of the International Immune‐oncology Biomarker Group.

A clinical trial slide‐set, with clinical annotation, will be hosted on the website of the International Immune‐

oncology Biomarker Group.

All these slides can then be assessed by all participating groups.

The metrics assessed will be reported on pre‐specified formats to the coordinators.

A systematic comparison of the output of the machine learning/deep learning approaches with the

pathologists’ score will be performed in all datasets and eventual added clinical validity to the pathologists’

TIL‐score will be evaluated using the clinical trial datasets.
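One way the systematic comparison above could be quantified is with an agreement statistic between algorithm and pathologist TIL scores. The sketch below uses Lin's concordance correlation coefficient, which is an illustrative choice and not a metric prescribed by the project; the function name and the example scores are hypothetical.

```python
from statistics import fmean


def concordance_ccc(algorithm_scores, pathologist_scores):
    """Lin's concordance correlation coefficient for paired scores.

    Both arguments are equally long sequences of numeric TIL scores
    (for example, percent stromal TILs per slide). Returns a value in
    [-1, 1]; 1 means perfect agreement, including in scale and offset.
    """
    n = len(algorithm_scores)
    if n != len(pathologist_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_a = fmean(algorithm_scores)
    mean_p = fmean(pathologist_scores)
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_a = sum((a - mean_a) ** 2 for a in algorithm_scores) / n
    var_p = sum((p - mean_p) ** 2 for p in pathologist_scores) / n
    cov = sum(
        (a - mean_a) * (p - mean_p)
        for a, p in zip(algorithm_scores, pathologist_scores)
    ) / n
    denom = var_a + var_p + (mean_a - mean_p) ** 2
    if denom == 0:
        raise ValueError("all scores are identical constants")
    return 2 * cov / denom


# Hypothetical scores: the algorithm tracks the pathologist but reads
# 10 points higher on every slide, so agreement is penalised.
pathologist = [10, 20, 30, 40]
algorithm = [20, 30, 40, 50]
ccc = concordance_ccc(algorithm, pathologist)
```

Unlike a plain correlation, this statistic drops below 1 when the algorithm is systematically offset from the pathologists, which is the kind of disagreement a locked method would need to expose.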

Timeline:

The project is planned to start in 2019 and to finish within 1 year of the start of the program.

Results will be presented at the annual meeting of the International Immuno‐Oncology Biomarker Working

Group held at the San Antonio Breast Cancer Conference and at the annual MAQC‐Conference.

Publication in a high‐level journal is planned within 6 months after completion of the program.

Project #3: Challenges and opportunities in N‐of‐1 clinical trial: reality of applying genomics in clinic

Xichun Hu, Ph.D.

Fudan University Shanghai Cancer Center

Shanghai, China

Yuanting Zheng, Ph.D

Associate Professor

School of Life Sciences

Fudan University

Shanghai, China

Dr. Yuanting Zheng is an associate professor at the School of Life Sciences of Fudan University. Dr. Zheng’s research

focuses on precision medicine, pharmacogenomics, and clinical pharmacy. She is developing multi‐omics reference

materials and quality control metrics to facilitate the translation of multi‐omics technologies into reliable clinical

biomarkers and companion diagnostics for cancer and type 2 diabetes. Dr. Zheng received her Ph.D. in clinical

pharmacology from China Pharmaceutical University in 2009 when she joined the School of Pharmacy of Fudan

University as an assistant professor. Dr. Zheng has published 40 peer‐reviewed papers in clinical pharmacology,

pharmacogenomics, and bioinformatics. She is also an inventor on two issued patents about drug repositioning and

combination therapies.

Leming Shi, Ph.D

Professor and Director, Center for Pharmacogenomics

School of Life Sciences

Fudan University

Shanghai, China

Dr. Leming Shi is a professor at the School of Life Sciences and Shanghai Cancer Center of Fudan University in

Shanghai, China where he established and directs the Center for Pharmacogenomics. Dr. Shi is the president of the

International Massive Analysis and Quality Control (MAQC) Society (2017‐2018). Dr. Shi’s research focuses on

pharmacogenomics, bioinformatics, and cheminformatics aiming to realize precision medicine by developing

biomarkers for early cancer diagnosis, prognosis, and personalized therapy. Dr. Shi is a co‐inventor on nine issued

patents about novel therapeutic molecules and has published over 200 peer‐reviewed papers (12 of them appeared in

Nature Biotechnology) with >10,000 citations by SCI journals. Dr. Shi received his Ph.D. in computational

chemistry from the Chinese Academy of Sciences in Beijing.

Project #3: Developing reference materials and reference data sets for the QC/standardization of multi‐

omics platforms

Project coordinators:

Yuanting Zheng, PhD Fudan University, China. [email protected]

Leming Shi, PhD Fudan University, China. [email protected]

How to participate: Participants must propose a data analysis plan to the project organizers and return the

analysis results (processed data, figures, and tables), analysis code, and documentation on schedule. Participants are

also encouraged to propose a data generation plan and generate multi‐omics data using the reference materials.

Objectives: The goal of this project is to provide a set of multi‐omics reference materials and reference datasets to

promote the repeatability, reproducibility, and comparability of massive analysis technologies. It also aims to

develop appropriate metrics to evaluate and monitor the performance of high‐throughput omics platforms, tests,

or laboratories.

Background

Emerging big data technologies have changed our way of studying disease and health, and reproducibility is the

foundation for translating high‐throughput omics approaches into clinical utility. However, errors may come

from the generation, analysis, and interpretation of multi‐omics data. Well‐characterized reference materials are

essential to understand the sources of errors, calibrate the measurements, and evaluate the performance of high‐

throughput omics tests. We are generating the Chinese Quartet reference materials at the levels of DNA, RNA,

proteins, and metabolites. We are also generating multi‐omics datasets using different platforms at various

laboratories. Therefore, it is time for a community‐wide effort to establish multi‐omics benchmarking values

and metrics to assess reproducibility and performance. In addition, multi‐omics integrative analyses will

improve the reliability of the benchmarking values and the interpretations. The MAQC Society’s guideline for

quality control and standardization using reference materials and datasets will make massive analysis technologies

more reproducible in the future by improving laboratory proficiencies.

Specific Aims:

Aim 1: Benchmarking values for the Chinese Quartet multi‐omics references. Jointly analyze the Chinese

Quartet multi‐omics datasets to characterize the genomic, transcriptomic, proteomic, and metabolomic

benchmarking values for the DNA, RNA, protein, and metabolite reference materials, respectively.

Aim 2: Performance metrics. Develop performance metrics to assess the reproducibility and performance of

high‐throughput omics platforms, tests, and laboratories, including accuracy, precision, sensitivity,

specificity and reference intervals.

Aim 3: Guidelines for QC/standardization using reference standards. Develop guidelines to help the

community use reference materials and reference datasets for quality control of the whole process of high‐

throughput omics assays.
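As a minimal illustration of the classification-type metrics named in Aim 2, the sketch below scores a set of calls against a benchmark set. It assumes a presence/absence setting (for example, variant calls at assessed sites) and treats "precision" as positive predictive value; in a metrology sense precision may instead mean repeatability, which this sketch does not cover. All names and the example sets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # called and in the benchmark
    fp: int  # called but not in the benchmark
    tn: int  # neither called nor in the benchmark
    fn: int  # in the benchmark but not called


def confusion_from_calls(calls, benchmark, universe):
    """Build a confusion table from sets of site identifiers."""
    calls, benchmark, universe = set(calls), set(benchmark), set(universe)
    return Confusion(
        tp=len(calls & benchmark),
        fp=len(calls - benchmark),
        tn=len(universe - calls - benchmark),
        fn=len(benchmark - calls),
    )


def _ratio(numerator, denominator):
    # Undefined ratios are reported as NaN rather than raising.
    return numerator / denominator if denominator else float("nan")


def performance(c):
    """Accuracy, precision (PPV), sensitivity and specificity."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Hypothetical example: 10 assessed sites, 4 benchmark positives,
# and a platform that calls 3 of them plus 1 false positive.
table = confusion_from_calls(
    calls={0, 1, 2, 4}, benchmark={0, 1, 2, 3}, universe=range(10)
)
metrics = performance(table)
```

Reference intervals and repeatability across replicates would need quantitative measurements rather than a confusion table, so they are left out of this sketch.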

Study Design

The Chinese Quartet multi‐omics reference materials (DNA, RNA, proteins, and metabolites) were generated

simultaneously from the same set of immortalized cell lines from a father, a mother, and their two monozygotic twin

daughters. The “genetic ground truths” from the quartet family and the multi‐omics integrative analyses will make

the benchmarking values more reliable.
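The family structure gives two built-in checks on genotype calls: monozygotic twins are expected to share the same genotype, and each daughter's genotype should be explainable by one allele from each parent. The sketch below illustrates both checks under simplifying assumptions (autosomal, diploid sites; de novo mutations and cell-line artefacts ignored). The function names and example sites are hypothetical and are not taken from the project.

```python
def mendelian_consistent(father, mother, child):
    """True if the child can inherit one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def twins_concordant(twin1, twin2):
    """True if both twins carry the same unordered genotype."""
    return sorted(twin1) == sorted(twin2)


def quartet_consistency(sites):
    """Fraction of sites that pass the twin and Mendelian checks."""
    passing = 0
    for site in sites:
        f, m = site["father"], site["mother"]
        d1, d2 = site["twin1"], site["twin2"]
        if (
            twins_concordant(d1, d2)
            and mendelian_consistent(f, m, d1)
            and mendelian_consistent(f, m, d2)
        ):
            passing += 1
    return passing / len(sites)


# Hypothetical genotype calls at four sites (alleles as 2-tuples).
example_sites = [
    # Consistent: twins agree and inherit A from father, G from mother.
    {"father": ("A", "G"), "mother": ("G", "G"),
     "twin1": ("A", "G"), "twin2": ("G", "A")},
    # Twin discordance.
    {"father": ("C", "C"), "mother": ("C", "T"),
     "twin1": ("C", "T"), "twin2": ("C", "C")},
    # Mendelian violation: G is absent from both parents.
    {"father": ("A", "A"), "mother": ("A", "A"),
     "twin1": ("A", "G"), "twin2": ("A", "G")},
    # Consistent.
    {"father": ("T", "T"), "mother": ("C", "T"),
     "twin1": ("T", "C"), "twin2": ("T", "C")},
]
rate = quartet_consistency(example_sites)
```

A platform or pipeline with a low consistency rate on the quartet would be flagged without needing an external truth set, which is the sense in which the family design supplies ground truth.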

Timeline

2017.01.01 – 2018.05.30 Generate multi‐omics data using different platforms.

2018.06.01 – 2018.08.31 Jointly analyze the datasets.

2018.09.01 – 2018.12.31 Establish the benchmarking values and develop the performance metrics.

2019.01.01 – 2019.05.30 Validate selected reference values using orthogonal methods

Project #4: QC/standardization of proteomics materials and technology

Project coordinators:

Chen Ding Fudan University [email protected]

Jun Qin National Center for Protein Sciences ∙ Beijing [email protected]

Objectives/Goals: To establish a reference proteome standard material and an SOP for measuring the proteome across

different MS instruments and in labs at different locations.

Background:

The Human Proteome Project has promoted the development of proteomics and made it an indispensable tool for

research in life sciences and medicine. Improvements in “the next‐generation proteomics” including

instrumentation, sample preparation, and computational analysis, have facilitated the generation of data that cover

protein profiling, post‐translational modifications (PTMs), and protein‐protein interactions (PPIs). It is crucial to

measure samples reproducibly, yet this is challenging in proteome research. Thus, a uniform standard reference

material, a standard operating procedure (SOP) and a reference dataset that can facilitate measurement validation

across different labs are urgently needed.

Phase I of the Microarray Quality Control project (MAQC‐I) tested agreement across sites and platforms for gene‐

expression microarrays; MAQC‐II surveyed approaches in microarray‐based predictive model development to

understand sources of variability in prediction performance, to assess the influence of endpoint signal strength in

data and to develop good modelling practice guidelines; the Sequencing Quality Control project (SEQC/MAQC‐III)

assessed the performance of RNA‐seq across laboratories and tested different sequencing platforms and data

analysis pipelines.

Compared with the genome, transcriptome and microbiome, relatively few proteomic studies have evaluated data

reproducibility across laboratories to identify potential measurement variability in each step of the general proteomics

workflow. This is probably because: (i) the proteomics field lacks a uniform standard reference material; (ii)

laboratories differ in their sample preparation methods; (iii) data quality is affected by different data acquisition

methods; and (iv) subsequent analysis has no consistent evaluation system.

Chen Ding, Ph.D

Fudan University

Shanghai, China

Professor Chen Ding received his doctorate in Cell Biology from the School of Life Sciences at the University of Science and

Technology of China. He was selected for the National “One‐Thousand‐Young‐Talents” Program and the Beijing

“High‐level Oversea Young Talents” Project. He is an executive member of the council of the Chinese Human

Proteome Organization (CNHUPO). He focuses on the development of an in‐depth proteome platform,

its standardization and its applications in biological research, with special emphasis on transcriptional

regulation, signal transduction, and liver biology. His studies have been published as research articles in Nature

Biotechnology, Cell, Molecular Cell, Nature Communications, Proceedings of the National Academy of

Sciences of the United States of America, Journal of Experimental Medicine, and Mol Cell Proteomics.

To solve these problems, we are launching the proteome standard initiative, which focuses on research into a proteome

standard for quality control. It aims to establish the first international proteome standard reference material with a

standard operating procedure (SOP) and to generate the first international proteome benchmark dataset for assessing

variation in proteome analysis. A rigorous proteome standard, SOP and quality control system will be established

to ensure the reliability and reproducibility of proteome measurements, accelerating the development of life sciences

and medicine.

Specific Aims:

(i) To establish a proteome standard reference material;

(ii) To establish a proteome standard operating procedure;

(iii) To generate a proteome benchmark dataset.

Study Design:

(i) We performed 1200 repeated measurements of a 293T cell line proteome on 10 different LC‐MS

instruments with a relatively fixed LC method and column over a one‐year time span as our routine QC for the

Phoenix Center proteomics technology platform. We found that the results from the same instrument tended

to cluster together, i.e., there is a batch effect across instruments. We will analyse these data using a “big

data” approach to generate a preliminary SOP so that measurements across different mass spectrometers can be

compared independent of operators and instruments.

a) Quantification values of each peptide/protein are calculated and corrected using different normalization

methods to generate distribution curves for each peptide and protein from the same instrument. We will

then perform outlier analysis based on a Gaussian mixture model (GMM) to reject outliers and to determine

a reference index for each peptide/protein within 95% confidence intervals.

b) We will then identify peptides and proteins whose quantification is independent of or insensitive to instruments

and operators. These peptides and proteins then form the basis for protein quantification of a reference

proteome across different instruments. This yields the identification/quantification SOP.

(ii) A reference proteome standard material from MAQC IV will be prepared and analysed.

a) We will use/recommend our routine LC method as an LC method SOP to measure the MAQC IV reference

proteome standard material. We will thoroughly analyze the features of the MAQC standard material in different

aspects (identification, quantification, protein abundance ranking) and keep evaluating the reproducibility

and stability of the protein samples over 1 year.

b) We will then dispatch the MAQC IV reference proteome standard material and recommend the LC method

SOP and the identification/quantification SOP to different labs for proteome identification/quantification.

We will compare the results of the MAQC proteome standard material across different MS instruments in

different labs and in different locations.

(iii) Generate the final MAQC IV SOP and the proteome benchmark dataset for the MAQC proteome standard.
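The GMM-based outlier step in (i) a) could look roughly like the sketch below for a single peptide measured across many runs. It is a hand-written two-component, one-dimensional EM fit using only the Python standard library. The assumptions are that the dominant (highest-weight) component represents valid measurements, that values outside mean ± 1.96 standard deviations of that component are rejected, and that the reference index is the mean of the retained values. None of these choices, nor the function names or example intensities, come from the project description.

```python
import math
from statistics import fmean, pstdev


def _normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200, min_sigma=1e-3):
    """Fit a 2-component 1-D Gaussian mixture by EM.

    Returns a list of (weight, mean, sigma) tuples. Initialisation is
    deterministic: the sorted data are split into lower and upper halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    mus = [fmean(xs[:half]), fmean(xs[half:])]
    spread = max(pstdev(xs), min_sigma)
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = [w * _normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and sigma per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # component has effectively no members
            weights[k] = nk / len(xs)
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)
    return list(zip(weights, mus, sigmas))


def reference_index(values, z=1.96):
    """Reject outliers and return (index, (low, high), kept_values)."""
    components = fit_two_component_gmm(values)
    _, mu, sigma = max(components, key=lambda c: c[0])  # dominant component
    low, high = mu - z * sigma, mu + z * sigma
    kept = [v for v in values if low <= v <= high]
    return fmean(kept), (low, high), kept


# Hypothetical log-intensities: 20 consistent runs plus 2 aberrant runs.
runs = [10.0 + 0.1 * ((i % 5) - 2) for i in range(20)] + [14.0, 14.2]
index, interval, kept = reference_index(runs)
```

In practice an established mixture-model implementation with model selection over the number of components would likely be preferable; the sketch only shows how a mixture fit separates a main population from aberrant runs before a reference value is taken.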

Timeline

2017.01.01 – 2017.12.31 Collect 293T QC experiment data complying with SOP.

2018.01.01 – 2018.12.31 Analyse 293T data and collect experimental data from standard reference material of

MAQC with SOP.

2019.01.01 – 2019.12.31 Analyse data from standard reference material and generate proteome benchmark

dataset.

Shraddha Thakkar, Ph.D

Division of Bioinformatics and Biostatistics

NCTR/FDA

Jefferson, AR, USA

Dr. Thakkar works at FDA’s National Center for Toxicological Research. Her research interests are in applying

bioinformatics and chemoinformatics to the study of toxicity and drug development, with a specific interest in

drug‐induced liver injury. She has received multiple research and leadership awards regionally, nationally, and

within FDA, including the Genentech Innovation in Biotechnology Award from the American Association of

Pharmaceutical Scientists (AAPS), the Margaret C. Etter Student Lecturer Award from the American Crystallographic

Association, and an Outstanding Service Award from FDA. Dr. Thakkar has adjunct appointments at both the University

of Arkansas for Medical Sciences and the University of Arkansas at Little Rock (Assistant Professor). Furthermore,

Dr. Thakkar was elected as a Board member of the Mid‐South Computational Biology and Bioinformatics Society

(MCBIOS) in 2014 and served as President of the Society from 2016‐2017. She is also the Chair of the

Pharmacogenomics Group at AAPS.

Weida Tong, Ph.D

Director, Division of Bioinformatics and Biostatistics

NCTR/FDA

Jefferson, AR, USA

Dr. Tong is Director of the Division of Bioinformatics and Biostatistics at FDA’s National Center for Toxicological

Research (NCTR/FDA). He has served as a science advisory board member for several large projects involving

multiple institutes in Europe and the USA. He also holds several adjunct positions at universities in the US and

China. His division at FDA develops bioinformatic methodologies and standards to support FDA research

and regulation and to advance regulatory science and personalized medicine. The most visible projects from

his group are (1) leading the Microarray Quality Control (MAQC) consortium to develop standard analysis

protocols and quality control metrics for emerging technologies to support regulatory science and

precision medicine; (2) development of the liver toxicity knowledge base (LTKB) for drug safety; (3) in silico drug

repositioning for the enhanced treatment of rare diseases; and (4) development of the FDA

bioinformatics system, the ArrayTrack™ suite, to support FDA review and research on pharmacogenomics. Dr.

Tong has published more than 230 papers.

Project #5: A structured approach to comparing genomics, computational, and high content biology

screening methods for predicting toxicity

Project coordinators:

Shraddha Thakkar ([email protected])

Weida Tong ([email protected])

How to participate: Please contact the coordinators to express your interest in analyzing genomics data, in vitro data,

and/or in silico data.

Objectives: The goal of this project is to benchmark and comparatively analyze three major high‐throughput

methodologies (i.e., genomics, in vitro and computational approaches) for predicting drug‐induced liver injury (DILI).

We will evaluate these methodologies to understand their performance individually and in combination.

Background

Drug‐induced liver injury (DILI) is one of the primary challenges for drug development and regulatory application, owing to the poor performance of existing preclinical models. This concern has led to significant efforts to evaluate alternative methods such as computational and genomic methodologies. The adoption of these approaches marks a paradigm shift for 21st‐century toxicology. Importantly, some have emerged as critical tools in regulatory decision‐making in the EU under the REACH/3Rs initiative. In the US, both Tox21 and ToxCast have been evaluating these methodologies for regulatory applications. Mathematically, a predictive model is a solution of Y=f(X), where f is a mathematical function used to predict the toxicological endpoint (Y) from input data X. The selection of X (the technologies used to generate input data) and f (such as machine learning methods) affects the predictive performance for Y, as does the choice of compounds used to build the model. Many DILI models using these technologies have been described, but their performance is difficult to compare because of minimal overlap in the choice of drugs, the mathematical functions used (f), and the measurement technologies (X). Thus, the true value of these technologies for DILI prediction is poorly understood. This study aims to compare and contrast the three main approaches (genomics, in vitro, and in silico) by systematically assessing the factors important in Y=f(X). Benchmarking data will be generated via a direct comparison of the three approaches to assess their strengths individually and in combination. For the analysis, crowdsourcing will be conducted through societies that have existing mechanisms for it. This systematic approach will generate the evidence needed to set realistic expectations for these technologies in DILI prediction. Furthermore, it will provide guidance for analyzing the readiness of, and gaps in, predictive modeling with new data streams in the context of regulatory decision‐making.
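As a minimal sketch of the Y=f(X) framing above, the following Python fragment scores every combination of input data X and function f on one shared compound set, so that differences in performance can be attributed to X or f rather than to the choice of compounds. The feature values, labels, and the two toy classifiers are hypothetical placeholders chosen for illustration; they are not project data or the methods participants are expected to use.

```python
from itertools import product

# Hypothetical toy data: the same six compounds described by two input types (X).
labels = [1, 1, 1, 0, 0, 0]  # Y: 1 = DILI positive, 0 = DILI negative
inputs = {
    "genomics": [[0.9, 0.2], [0.8, 0.1], [0.4, 0.3], [0.2, 0.7], [0.1, 0.9], [0.3, 0.8]],
    "in_silico": [[0.7, 0.6], [0.6, 0.4], [0.8, 0.5], [0.3, 0.5], [0.5, 0.6], [0.2, 0.4]],
}


def threshold_rule(train_x, train_y, x):
    """Toy f: call positive when the first feature exceeds the training mean."""
    cutoff = sum(row[0] for row in train_x) / len(train_x)
    return 1 if x[0] > cutoff else 0


def nearest_centroid(train_x, train_y, x):
    """Toy f: assign the class whose feature centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(f, xs, ys):
    """Estimate accuracy by holding out each compound in turn."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += f(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


functions = {"threshold_rule": threshold_rule, "nearest_centroid": nearest_centroid}

# Score every (X, f) pair on the identical compound set and labels.
results = {
    (x_name, f_name): leave_one_out_accuracy(functions[f_name], inputs[x_name], labels)
    for x_name, f_name in product(inputs, functions)
}

for (x_name, f_name), acc in sorted(results.items()):
    print(f"X={x_name:10s} f={f_name:17s} accuracy={acc:.2f}")
```

Holding the compounds and labels fixed while varying only X and f is the point of the sketch; any real comparison would substitute the project's datasets and the participants' own models.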

Specific Aims:

Aim 1: Generation of a comprehensive list with DILI annotation

We have developed the Liver Toxicity Knowledge Base (LTKB), which classifies the DILI risk of FDA‐approved drugs. However, we need to extend this list beyond FDA‐approved drugs to capture the global DILI landscape. We plan to generate a comprehensive list of drugs with DILI classification information. This list will be used in all of the aims described below.

Aim 2: Benchmarking genomic predictive methods for DILI (gDILI)

Two genomics datasets will be used, cMAP and L1000, both produced by the Broad Institute. cMAP represents the

whole‐genome profiling approach, while L1000 focuses on a small, biologically relevant set of

genes. Gene‐expression‐based DILI predictive models will be generated for each dataset with various machine

learning methods (including both supervised and unsupervised approaches). The impact of machine learning

approaches and genomics platforms will be evaluated. We will work with Critical Assessment of Massive Data

Analysis (CAMDA) for this component of the project. CAMDA has maintained a community that has conducted

crowdsourcing projects for the past 12 years. We will distribute the datasets via CAMDA.

Aim 3: Benchmarking in silico predictive methods for DILI (isDILI)

Different computational models (QSAR‐based or non‐QSAR‐based) have been reported to predict human DILI. For

this comparison, we will benchmark all the known methods and compare them across various DILI classification

schemes. Over the past 8 years of developing the Liver Toxicity Knowledge Base (project# E0721501), we have

established collaborations with many institutes for in silico DILI modeling. Most of them are listed as external

collaborators of this project, including the MAQC Society. We will expand this list for this component.

Aim 4: Comparative evaluation of gDILI versus isDILI

Genomics and in‐silico methods bring two different perspectives to DILI prediction. Comparative evaluation of gDILI

and isDILI will help establish realistic expectations for each approach individually or in combination. For

this comparison, we will analyze both the training and validation results from both gDILI and isDILI.

Study Design

We will engage societies that have experience with, and an established mechanism for, conducting crowdsourcing projects. During the preparation of this project, we obtained commitments from two societies, CAMDA (www.camda.info/) and MAQC (www.MAQCSociety.org). We will generate 5‐6 datasets that contain DILI positives and negatives defined by various methods (e.g., based on drug labeling, literature, case reports, or registries). Each dataset will be divided into a training set (2/3) and a validation set (1/3), with the drugs in each therapeutic category split proportionally between the two sets. Both sets will be released to the societies at the same time; the training set will carry class labels, while the validation set will not. Once their analyses are complete, participants will send us the following information: (1) the analysis protocol, (2) the training set results, such as accuracy, specificity, and sensitivity, and (3) the predicted class labels for the validation set. We will analyze all the results to determine the performance of the submitted models. With this study design, we will be able to understand:

1. The performance landscape of both genomics (gDILI) and in‐silico (isDILI) methods for human DILI prediction (e.g., upper‐ and lower‐bound performance).

2. The difference between the two methodologies.

3. The fit‐for‐purpose application of these two methodologies individually and in combination.
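The split and scoring steps of the study design can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the organizers' actual procedure: the drug records, the category names, the random seed, and the helper names `stratified_split` and `performance` are all hypothetical. It divides drugs 2/3 to 1/3 within each therapeutic category, withholds validation labels, and computes accuracy, sensitivity, and specificity for a submitted set of predictions.

```python
import random
from collections import defaultdict


def stratified_split(drugs, train_fraction=2 / 3, seed=0):
    """Split drugs into training and validation sets within each therapeutic category."""
    by_category = defaultdict(list)
    for drug in drugs:
        by_category[drug["category"]].append(drug)

    rng = random.Random(seed)
    train, validation = [], []
    for category in sorted(by_category):
        members = by_category[category][:]
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        train.extend(members[:n_train])
        validation.extend(members[n_train:])
    return train, validation


def performance(true_labels, predicted_labels):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = DILI positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical drug list: 9 drugs in each of two therapeutic categories.
drugs = [
    {"name": f"drug_{i}", "category": "antibiotic" if i < 9 else "analgesic", "dili": i % 2}
    for i in range(18)
]
train, validation = stratified_split(drugs)

# Validation labels are withheld from participants; only names and categories are released.
released_validation = [{"name": d["name"], "category": d["category"]} for d in validation]

# Placeholder submission that calls every validation drug positive.
submitted = {d["name"]: 1 for d in released_validation}
scores = performance(
    [d["dili"] for d in validation],
    [submitted[d["name"]] for d in validation],
)
print(len(train), len(validation), scores)
```

Splitting within each category keeps the therapeutic mix of the training and validation sets comparable, which is the stated intent of the proportional split.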

Timeline

2 months: Generation of comprehensive list with the DILI classification

3 ‐ 5 months: Data compilation for the gDILI and isDILI

5 ‐ 9 months: Data analysis

9 ‐ 12 months: Comparative meta‐analysis of gDILI with isDILI

Poster Session

Presenter name: Longhui Deng ([email protected])

Title: Technical validation of AIM10™, a hybridization capture‐based next‐generation sequencing (NGS)

clinical test for lung cancer

Authors: Longhui Deng1*, Ting Yang1*, Diange Li1, Guojie Qi1, Hui Li1, Jianbing Fan1, Weihong Xu1#

Affiliations: 1 AnchorDx Medical Co., Ltd, Guangzhou, China, 510300, * These authors contributed equally

to this work.

Poster Number: 10

Presenter name: Jongsuk Chung ([email protected])

Title: PR score: Single unified quality metric for clinical targeted sequencing

Authors: Jongsuk Chung1,3, Chung Lee1,2, Ki‐Wook Lee1,2, Taeseob Lee1,2, Woongyang Park1,2,3, Dae‐Soon

Son1*

Affiliations: 1*Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea, 06351, 2Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences &

Technology, Sungkyunkwan University, Seoul, Republic of Korea, 06351, 3Department of Molecular Cell

Biology, School of Medicine, Sungkyunkwan University, Suwon, Republic of Korea, 16419

Poster Number: 11

Presenter name: Li Zhang ([email protected])

Title: BioSV: an accurate and efficient tool for multiple sample‐based structural variation calling and

genotyping

Authors: Li Zhang1, Wubing Ding1, Wenqiang Liu1, Tieliu Shi1*

Affiliations: 1The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of

Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal

University, Shanghai 200241, China

Poster Number: 12

Presenter name: Marco Chierici ([email protected])

Title: Improved prognostic profiling in High‐Risk neuroblastoma by multi‐task deep learning with

distillation of the clinical diagnostic algorithm

Authors: Marco Chierici, Valerio Maggio, Giuseppe Jurman, and Cesare Furlanello

Affiliations: Fondazione Bruno Kessler, Trento, Italy

Poster Number: 13

Presenter name: Sayaka Itoh ([email protected])

Title: Boundary Organizations in the International Standardization Process of bio‐Analysis Technologies

Authors: Sayaka Itoh1,2, Junko Ikeda1, and Hiroki Nakae1

Affiliations: 1JMAC Japan Multiplex bio‐Analysis Consortium, Chiyoda‐ku, Tokyo, 102‐0083, 2Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,

University of Tokyo, Kashiwa‐shi, Chiba, 277‐8561

Poster Number: 14

Presenter name: Wenqiang Liu ([email protected])

Title: Integrative method significantly increases the accuracy of CNV detection

Authors: Wenqiang Liu, Li Zhang, Tieliu Shi*

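Affiliations: The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of

Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal

University, Shanghai 200241, China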

Poster Number: 15

Presenter name: Yingyi Hao ([email protected])

Title: Applications of chemometrics in precision medicine

Authors: Yuan Liu1, Yingyi Hao1, FanFan Xie1, Yu Liang1, Zhining Wen*1, Menglong Li*1

Affiliations: 1*College of Chemistry, Sichuan University, Chengdu, China, 610065

Poster Number: 16

Presenter name: Xiangjun Ji ([email protected])

Title: QuaPra

Authors: 1Xiangjun Ji, 1Geng Chen*, 1Tieliu Shi*

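Affiliations: 1The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of

Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal

University, Shanghai, China, 200241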

Poster Number: 17

Presenter name: Jiyang Zhang ([email protected])

Title: Reproducibility of CNV in replicate whole‐genome sequencing experiments

Authors: Jiyang Zhang1 and Leming Shi*1

Affiliations: 1State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University,

Shanghai, China, 200433.

Poster Number: 18

Presenter name: Luyao Ren ([email protected])

Title: Prediction of Cancer of Unknown Primary Site (CUP) Using Tissue‐Specific Molecular Signatures from

GTEx and TCGA Data

Authors: Luyao Ren, Jingcheng Yang, Bin Li, Chen Suo, Ying Yu, Yuanting Zheng, Leming Shi

Affiliations: Center for Pharmacogenomics, School of Life Sciences, Fudan University, Shanghai 200438,

China. Email: [email protected]

Poster Number: 19

Presenter name: Jingcheng Yang ([email protected])

Title: The Storage Scheme for Biological & Medical Big Data

Authors: Dajie Zhang1, Jingcheng Yang2, Zhaojie Xia3 and Li Guo*3

Affiliations: 1SuperSAN Technologies (Suzhou) Co., Ltd., Suzhou, Jiangsu, China, 215000, 2School of Life Sciences, Fudan University, Shanghai, China, 200433, 3Institute of Process Engineering, Chinese

Academy of Sciences, Beijing, China, 100190

Poster Number: 20
