Development of a bioinformatics tool for the automated generation of a report of the somatic mutations found in a Normal/Tumor cancer experiment. Isaac Noguera Guixà, Universitat Autònoma de Barcelona, 15th of July, 2014. Project tutor: Dr. Raúl Tonda, Data analysis team, Centre Nacional d'Anàlisi Genòmica (CNAG), PCB. Academic tutor: Dr. Miguel Perez-Enciso, Centre for Research in Agricultural Genomics (CRAG), UAB. Course 2013 - 2014. Master’s Thesis.

Normal/Tumor somatic mutations report tool


DESCRIPTION

Presentation used for the oral defense of my Master's Thesis at the Universitat Autònoma de Barcelona. It shows the development of a Perl script for the automated generation of a report of the somatic mutations found in a Normal/Tumor cancer experiment.


Page 1: Normal/Tumor somatic mutations report tool

Development of a bioinformatics tool for the automated generation of a report of the somatic mutations found in a Normal/Tumor cancer experiment

Isaac Noguera Guixà

Universitat Autònoma de Barcelona

15th of July, 2014

Project tutor: Dr. Raúl Tonda. Data analysis team. Centre Nacional d'Anàlisi Genòmica (CNAG), PCB

Academic tutor: Dr. Miguel Perez-Enciso. Centre for Research in Agricultural Genomics (CRAG), UAB

Course 2013 - 2014

Master’s Thesis

Page 2: Normal/Tumor somatic mutations report tool


Table of contents

Introduction

◦ Cancer genetics

◦ Cancer in Bioinformatics

Objectives

Material and methods

Results

Conclusions

Page 3: Normal/Tumor somatic mutations report tool


Introduction

Figure: Loss of normal growth control. Diagram labels: Normal cell; Cell damage (no repair); Cell suicide (apoptosis); 1st mutation; 2nd mutation; 3rd mutation; Uncontrolled growth.

Yulug, I. (2006). Molecular basis of cancer [PowerPoint slides]. Retrieved from http://www.hugointernational.org/resources/Isik_Yulug_Molecular_Basis_of_cancer_bilingual.ppt

Page 4: Normal/Tumor somatic mutations report tool


Introduction

Cancer in Bioinformatics

Normal/Tumor experiment

Figure: a normal sample and a tumor sample, each followed by read mapping and variant calling.

Lopez-Bigas, N. (2011). Identification of cancer drivers across tumor types [PowerPoint slides]. Retrieved from http://es.slideshare.net/nurialopezbigas/identification-of-cancer-drivers-across-tumor-types#

A variant is determined by the joint status in tumor-normal sequence pairs
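As a rough illustration of what "joint status" means here, the sketch below labels a site from the genotypes of the normal and tumor samples. It is written in Python for brevity and is not taken from the tool. The function name `classify_site`, the three labels, and the simple genotype rule are all assumptions for this example. Real somatic callers use statistical tests rather than a direct genotype comparison.

```python
def classify_site(normal_gt: str, tumor_gt: str) -> str:
    """Label a site from the joint normal/tumor genotype (illustrative rule only)."""

    def has_alt(gt: str) -> bool:
        # Accept both unphased ("0/1") and phased ("0|1") genotypes;
        # "0" is the reference allele and "." a missing call.
        alleles = gt.replace("|", "/").split("/")
        return any(allele not in ("0", ".") for allele in alleles)

    if has_alt(tumor_gt) and not has_alt(normal_gt):
        return "somatic"    # variant allele seen only in the tumor sample
    if has_alt(normal_gt):
        return "germline"   # variant allele already present in the normal sample
    return "reference"      # no variant allele in either sample


# Hypothetical genotype pairs (normal, tumor)
for normal, tumor in [("0/0", "0/1"), ("1/1", "0/1"), ("0/0", "0/0")]:
    print(normal, tumor, classify_site(normal, tumor))
```

The point of the sketch is only that neither sample alone decides the label: the same tumor genotype 0/1 is read differently depending on the normal genotype.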

Page 5: Normal/Tumor somatic mutations report tool


Introduction

Cancer in Bioinformatics

Normal/Tumor experiment

Variant call format (vcf) (Danecek, P. et al., 2011)

Page 6: Normal/Tumor somatic mutations report tool


Objectives

Main objective

Develop an automated tool to produce a report of the somatic variants found in a Normal/Tumor experiment

→ Process the output of the CNAG’s variant calling pipeline

→ Filter the somatic variants from it and extract relevant statistics from them

→ Identify those variants that are already known and annotated in cancer somatic mutations databases

→ Transform the obtained data into tables and graphics to include in the report

→ Fill a report template, kept independent from the code of the main script, with the processed data

→ Generate the report document in a printable format such as Portable Document Format (PDF)

→ Execute all these steps sequentially and automatically

Additional objective

Incorporate the developed tool as an additional step in the variant calling pipeline from the CNAG’s Data Analysis team

Page 7: Normal/Tumor somatic mutations report tool


Material and methods

Basis of the developed tool:

Main script

◦ Perl script

◦ Template module

◦ Input data processing

◦ Output data generation

Template document

◦ Template Toolkit script

◦ LaTeX code with R and Template Toolkit code embedded
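To give a feel for how a template document can stay separate from the main script, here is a toy Python stand-in for Template Toolkit's `[% name %]` variable interpolation. It is not Template Toolkit's API and not the tool's code. The function name `fill_template`, the template line, and the values passed in are hypothetical.

```python
import re


def fill_template(template: str, data: dict) -> str:
    """Replace Template Toolkit-style [% name %] tags with values from data."""

    def substitute(match: re.Match) -> str:
        # A missing key raises KeyError, so an incomplete data set is noticed early.
        return str(data[match.group(1)])

    return re.sub(r"\[%\s*(\w+)\s*%\]", substitute, template)


# Hypothetical LaTeX line with two embedded tags
template = r"\title{Somatic variants report: [% project %]} % [% n_variants %] variants"
print(fill_template(template, {"project": "FAMCOLON", "n_variants": 3}))
```

Because the LaTeX lives in the template and only the data dictionary comes from the script, the report layout can be changed without touching the processing code.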

Page 8: Normal/Tumor somatic mutations report tool


Material and methods

Designed pipeline:

Input data: CNAG’s vcf

Data processing: COSMICdb annotation; somatic variants filtering; output data storing/generation

Template processing: Template Toolkit document → Noweb document

R Sweave: Noweb document → LaTeX document

pdflatex: LaTeX document → PDF document

##INFO=<ID=FP,Number=1,Type=Float,Description="Fisher test P-value for somatic comparison.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
Chr1 883814 . A G 18.1 mrd10 DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);FP=0.00604 GT:PL:DP 0/0:0,96,255:32 0/1:51,0,26:3
Chr20 126154 dbSNPBuildID=137;GMAF=0.1648 T A 64.7 mrp0.05 INDEL;EFF=FRAME_SHIFT(HIGH||||DEFB126|protein_coding|CODING|ENST00000382398|exon_20_126056_126392;FP=1 GT:PL:DP 1/1:255,255,0:274 0/1:253,0,45:26
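To make the role of the FP field concrete, the following Python sketch (the tool itself is a Perl script, so this is not its code) reads FP from the INFO column of a record like the ones above and keeps the record when FP is at or below a threshold. It assumes the data rows follow the order CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR. The function names and the "less than or equal" rule are also assumptions.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; entries without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def somatic_p_value(record: str) -> float:
    """Return the FP value (Fisher test P-value) from a whitespace-separated record."""
    columns = record.split()
    return float(parse_info(columns[7])["FP"])  # column 8 is INFO


def passes_filter(record: str, threshold: float) -> bool:
    # Assumption: a record is kept when its P-value does not exceed the threshold.
    return somatic_p_value(record) <= threshold


# First record from the slide
record = (
    "Chr1 883814 . A G 18.1 mrd10 "
    "DP=36;UPSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000496938|);FP=0.00604 "
    "GT:PL:DP 0/0:0,96,255:32 0/1:51,0,26:3"
)
for threshold in (1, 0.05, 0.001):
    print(threshold, passes_filter(record, threshold))
```

Under this assumed rule, the Chr1 record (FP=0.00604) is kept at thresholds 1 and 0.05 but dropped at 0.001.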

Page 9: Normal/Tumor somatic mutations report tool


Results

Script's usage description...

usage: main.pl -f file [-template file] [-p value] [-s value] [-project "string"] [-cnv "string"] [-methods] [-cosmic file] [-h]

-h  this (help) message

-f file  variant call format file (.vcf) to be analyzed

-template file  Template Toolkit file (.tt) to be used as a template. If not defined, the default ("reporttemplate.tt") is used

-p value  add extra p-values to the default p-values (1, 0.05 and 0.001) used for the somatic variants filtering

-s value  filter somatic variants only for the p-values specified by this option

-cosmic file  COSMIC database file for SnpSift annotation (default "CosmicCodingMuts_v68")

-cnv "string"  path where the script will look for the Control-FREEC output. If it is found, it is added to the report

-project "string"  add the name of the project to the report title page

-methods  print the methods appendix in the report (not printed by default)
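The interplay between -p and -s can be sketched as below. This is a Python illustration, not the Perl script's actual option handling. The defaults (1, 0.05 and 0.001) come from the usage text. The function name `resolve_p_values`, the comma-separated input format, and the choice to let -s take precedence when both options are given are assumptions.

```python
DEFAULT_P_VALUES = (1.0, 0.05, 0.001)


def resolve_p_values(extra=None, only=None):
    """Return the p-value thresholds to filter with, largest first.

    extra: comma-separated string as passed to -p (added to the defaults)
    only:  comma-separated string as passed to -s (used instead of the defaults)
    """

    def parse(text):
        return {float(value) for value in text.split(",") if value.strip()}

    if only:
        # Assumption: -s overrides both the defaults and any -p values.
        return sorted(parse(only), reverse=True)
    values = set(DEFAULT_P_VALUES)
    if extra:
        values |= parse(extra)
    return sorted(values, reverse=True)


print(resolve_p_values())                # defaults only
print(resolve_p_values(extra="0.01"))    # defaults plus one extra threshold
print(resolve_p_values(only="1,0.001"))  # only the listed thresholds
```

Using a set means that a value repeated on the command line, or one that matches a default, yields a single threshold rather than a duplicated table in the report.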

Page 10: Normal/Tumor somatic mutations report tool


Results


$ perl main.pl -f PatientX.vcf -s '1,0.001' -cnv "/Project/Production/DAT/CNV/" -project "FAMCOLON" -methods

Page 11: Normal/Tumor somatic mutations report tool


Conclusions

1) We developed a functional tool that automatically generates a report document for the somatic variants found in a Normal/Tumor experiment.

2) The content of the report is acceptable, but it can be improved.

3) The tool has been successfully tested. It has also already been integrated into CNAG’s variant calling pipeline, where it runs as the last step.

4) The template document is independent from the main script. Together with the main script's configurable parameters, this makes the tool highly customizable.

5) The tool is not limited by computational resources: its execution time and memory usage do not appear to be limiting factors for its use.

Tool's ultimate aim: to ease the transfer of information from basic research to clinical diagnostics.

Page 12: Normal/Tumor somatic mutations report tool


Thank you for your attention