Bioinformatics|Software|Services: NOVOALIGN BASESPACE APP. Zayed Albertyn, Bioinformatics Director, Novocraft Technologies Sdn Bhd. Illumina® BaseSpace Developer Conference, San Francisco, 9th December 2013

Talk at BaseSpace Developer conference SF 2013

Page 1: Talk at BaseSpace Developer conference SF 2013

Bioinformatics|Software|Services

NOVOALIGN BASESPACE APP

Zayed Albertyn
Bioinformatics Director, Novocraft Technologies Sdn Bhd
Illumina® BaseSpace Developer Conference, San Francisco
9th December 2013

Page 2: Talk at BaseSpace Developer conference SF 2013

Novocraft Technologies Sdn Bhd

• Incorporated in 2008, BioNexus Status Company

• Small team of Mathematicians, Biologists & Software Engineers

• Develops innovative, world-class products

• High-performance computing for the growing genomics era

• International Market & User Base

Page 3: Talk at BaseSpace Developer conference SF 2013

Products

• Novoalign – Illumina, 454
• NovoalignCS – SOLiD
• Novosort
• Cluster Solutions
  – NovoalignMPI, NovoalignCSMPI
• NGS WorkBench (web)
• All running on standard commodity hardware
  – No special GPU/supercomputer required
  – Mac OS & Linux versions available
  – Open source operating system (Linux)
• NGS Cloud computing HPC workflows
  – Amazon EC2/S3/EBS

Page 4: Talk at BaseSpace Developer conference SF 2013

NGS Services

Consultation on NextGen projects
• Exome
• Whole genome
• SNV, Indel, Structural Variations (SVs)
• RNASeq
• CHIP-Seq
• Methylome
• Small RNA
• de-novo assembly

Automated pipelines

In-house/custom and open source software

Illumina and other platforms

Cloud solutions: packaged AMIs, containers

Page 5: Talk at BaseSpace Developer conference SF 2013

Collaborations

• Academic/research institutes
• Industry
  – HPC providers
  – Pharma
  – Cloud solutions
• Resellers
  – US and Global

Page 6: Talk at BaseSpace Developer conference SF 2013

A few of our NOVOALIGN users

Page 7: Talk at BaseSpace Developer conference SF 2013

User Examples

Page 8: Talk at BaseSpace Developer conference SF 2013

NOVOALIGN

• Hash-based aligner
• Peer reviewed publications: 2009-present
• Accuracy
  – SNPs and short Indels
• Read length > 250 bp as of V3.X.X
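
To illustrate what "hash-based" means in general, here is a toy sketch of seed lookup with a k-mer hash index. It is not Novoalign's actual algorithm; the function names, the reference string and the choice of k are all made up for illustration, and a real aligner would follow the seed step with scored alignment and quality calibration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Seed step only: each k-mer of the read votes for the reference start it implies."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            votes[ref_pos - offset] += 1
    # Most-supported candidate start positions first
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy reference and read (the read is copied from position 10)
reference = "ACGTTGCATGCCGATAGGCTTAACG"
index = build_kmer_index(reference, k=4)
hits = candidate_positions("CCGATAGG", index, k=4)
```

Each entry in `hits` is a `(start_position, supporting_kmers)` pair; candidates with more supporting k-mers are the ones worth aligning in full.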

Page 9: Talk at BaseSpace Developer conference SF 2013

ROC Curves

• True Positive vs False positive rate
• Higher Y value – better at finding the “true” result
• Lower X value – better at excluding “false” results

http://lh3lh3.users.sourceforge.net/alnROC.shtml
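
As a rough sketch of how the points on such a curve can be computed, the snippet below sweeps a confidence threshold and records the false positive rate (X) and true positive rate (Y) at each step. The scores and labels are invented toy values, and using a mapping-quality-like score as the threshold is an assumption, not a description of how the linked benchmark was produced.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per score threshold.

    scores: confidence per alignment, higher means more confident
    labels: True if the alignment is correct, False otherwise
    Assumes at least one correct and one incorrect alignment (no zero division guard).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Alignments accepted at this threshold
        accepted = [label for score, label in zip(scores, labels) if score >= threshold]
        true_pos = sum(accepted)
        false_pos = len(accepted) - true_pos
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Made-up toy data: six alignments with a confidence score and a correctness flag
scores = [60, 60, 40, 40, 20, 0]
labels = [True, True, True, False, True, False]
points = roc_points(scores, labels)
```

An aligner whose points sit higher and further left finds more correct alignments while admitting fewer wrong ones at the same threshold.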

Page 10: Talk at BaseSpace Developer conference SF 2013

The performance of various methods for mapping reads to reference repeats.

Highnam G et al. Nucl. Acids Res. 2013;41:e32

Page 11: Talk at BaseSpace Developer conference SF 2013

The performance of various methods for mapping reads to reference repeats.

Highnam G et al. Nucl. Acids Res. 2013;41:e32

Page 12: Talk at BaseSpace Developer conference SF 2013

http://www.bioplanet.com/gcat

http://www.bioplanet.com/gcat/reports/112/variant-calls/ion-torrent-225bp-se-exome-30x/novoalign-gatk-ug/compare-183-119/group-read-depth

Genome-in-a-bottle Consortium dataset

Page 13: Talk at BaseSpace Developer conference SF 2013

http://bcbio.wordpress.com

Courtesy Brad Chapman & Oliver Hoffman, HSPH

“Our standard workflow uses novoalign based on its stringency in resolving large insertions and deletions. These results suggest equally good results using bwa mem, along with improved processing times”

Page 14: Talk at BaseSpace Developer conference SF 2013

Graphical representation of the total number of downstream false positives expressed as a percentage...

Oliver GR. 2012 [http://f1000r.es/NMpsFc] F1000Research 2012, 1:2 (doi: 10.12688/f1000research.1-2.v2)

Page 15: Talk at BaseSpace Developer conference SF 2013

Novosort comparison on Illumina reads

Page 16: Talk at BaseSpace Developer conference SF 2013

Developing on BaseSpace

Page 17: Talk at BaseSpace Developer conference SF 2013

Motivation

• Reach out to more users
• Enable seamless integration with the cloud
• Establish BaseSpace Novoalign community

Page 18: Talk at BaseSpace Developer conference SF 2013

What is the App?

Alignment
• Alignment Quality Calibration
• Multithreaded
• Adaptor stripping

Sorting
• Novosort
• Multithreaded

Variant Calling
• Freebayes
• SNPs & Indels

Page 19: Talk at BaseSpace Developer conference SF 2013

What is the App?

• Novoalign
  – Paired-end
  – Human-genome only, others later
  – Caveat: requires a machine with at least 8 GB RAM
• Alignment coordinate-sorting
  – Novosort
• Variant Calling
  – Freebayes (Erik Garrison & Gabor Marth)

Page 22: Talk at BaseSpace Developer conference SF 2013

New-developer Challenges

• The “Docker” way of doing things
  – Image vs Container
• Front-end: Javascript/CSS
• Back-end: Algorithms/scripting

Page 23: Talk at BaseSpace Developer conference SF 2013

Back-end process

Front-end process

Perl/C++/R/Python

Page 24: Talk at BaseSpace Developer conference SF 2013

Back-end Development Process

Start the Native VM
• VMware
• Linux environment

Start your own Docker Repository
• Create new IMAGE on Docker.io
• Done automatically on your first push

Attach to your image
• docker run …

Make small test dataset
• Illumina cancer panel read
• Subset chr22 alignments

Develop the app back-end process
• Automated script runs pipeline
• Alignment->sorting->variant calling

Postprocess
• Charting with R
• ggplot2
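
The "automated script runs pipeline" step could look roughly like the sketch below, which chains alignment, sorting and variant calling by redirecting each tool's standard output to a file. This is not the app's actual script. The file names are hypothetical, and the command-line flags are assumptions to verify against the installed versions of novoalign, novosort and freebayes (in particular, whether this novosort version accepts SAM input directly).

```python
import subprocess
from pathlib import Path


def plan_pipeline(reads1, reads2, index, reference, workdir):
    """Return the three stages as (name, argv, stdout_path) tuples.

    All flags below are assumptions; check each tool's own help output.
    """
    work = Path(workdir)
    sam = work / "aligned.sam"
    bam = work / "sorted.bam"
    vcf = work / "variants.vcf"
    return [
        # Paired-end alignment, SAM written to stdout
        ("alignment",
         ["novoalign", "-d", str(index), "-f", str(reads1), str(reads2), "-o", "SAM"],
         sam),
        # Coordinate sort, sorted BAM written to stdout (assumes SAM input is accepted)
        ("sorting", ["novosort", str(sam)], bam),
        # SNP and indel calling, VCF written to stdout
        ("variant calling", ["freebayes", "-f", str(reference), str(bam)], vcf),
    ]


def run_pipeline(stages):
    """Run each stage in order; check=True stops the pipeline at the first failure."""
    for name, argv, out_path in stages:
        print(f"running {name}")
        with open(out_path, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)


# Hypothetical inputs; building the plan does not execute anything
stages = plan_pipeline("r1.fastq", "r2.fastq", "hg19.nix", "hg19.fa", "/tmp/run")
```

Keeping the plan separate from `run_pipeline` lets the stage list be inspected or logged on a small test dataset before any tool is launched.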

Page 25: Talk at BaseSpace Developer conference SF 2013

Front-end Development Process

BaseSpace Developer tools
• Code editor
• Preview form inputs

Initiate test runs
• Send data to your backend Native app

Build Report form
• Write Liquid/JS/HTML5

Page 26: Talk at BaseSpace Developer conference SF 2013

App Screenshots

Page 31: Talk at BaseSpace Developer conference SF 2013

Acknowledgements

Novocraft
• Leadership: Colin Hercus, Haniza Hashim
• Bioinformatics: Akzam Saidin, Kaamesh Kaamahalaran, Abdul Malik Ahmad
• Software Development: Deepa Murugan, Sharon Chin, Laura Hamit

Illumina
• Raymond Teckotzky, Mayank Tyagi

VT/GeneByGene
• David Mittelman, Gareth Highnam, Nir Liebovich, Jason Wang

HSPH Bioinformatics Core
• Oliver Hoffman, Brad Chapman