Talk at BaseSpace Developer conference SF 2013

Bioinformatics|Software|Services

NOVOALIGN BASESPACE APP

Zayed Albertyn
Bioinformatics Director, Novocraft Technologies Sdn Bhd
Illumina® BaseSpace Developer Conference, San Francisco
9th December 2013

Novocraft Technologies Sdn Bhd

• Incorporated in 2008, BioNexus Status Company

• Small team of Mathematicians, Biologists & Software Engineers

• Develop innovative, world-class products

• High-performance computing for the growing genomics era

• International Market & User Base

Products

• Novoalign – Illumina, 454
• NovoalignCS – SOLiD
• Novosort
• Cluster Solutions
– NovoalignMPI, NovoalignCSMPI
• NGS WorkBench (web)
• All running on standard commodity hardware
– No special GPU/supercomputer required
– Mac OS & Linux versions available
– Open source operating system (Linux)
• NGS Cloud computing HPC workflows
– Amazon EC2/S3/EBS

NGS Services

Consultation on NextGen projects
• Exome
• Whole genome
• SNV, Indel, Structural Variations (SVs)
• RNA-Seq
• ChIP-Seq
• Methylome
• Small RNA
• De novo assembly

Automated pipelines

In-house/custom and open source software

Illumina and other platforms

Cloud solutions: packaged AMIs, containers

Collaborations

• Academic/research institutes
• Industry
– HPC providers
– Pharma
– Cloud solutions
• Resellers
– US and Global

A few of our NOVOALIGN users

User Examples

NOVOALIGN

• Hash-based aligner
• Peer-reviewed publications: 2009-present
• Accuracy
– SNPs and short Indels
• Read length > 250 bp as of V3.X.X

ROC Curves

• True positive rate vs false positive rate
• Higher Y value – better at finding the “true” result
• Lower X value – better at excluding “false” results

http://lh3lh3.users.sourceforge.net/alnROC.shtml
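
As a rough illustration of how the points on such a curve are obtained, the sketch below sweeps a mapping-quality threshold over a set of aligned reads and reports the false positive rate and true positive rate at each threshold. The input format (a mapping quality paired with a flag saying whether the read landed at its known, simulated origin) and the function name `roc_points` are assumptions made for illustration; they are not taken from the benchmark linked above.

```python
from typing import List, Tuple


def roc_points(reads: List[Tuple[int, bool]]) -> List[Tuple[int, float, float]]:
    """Return (threshold, fpr, tpr) per distinct mapping quality, highest first.

    reads: (mapq, is_correct) pairs. is_correct is assumed to be known,
    for example because the reads were simulated from known positions.
    """
    total_correct = sum(1 for _, ok in reads if ok)
    total_wrong = len(reads) - total_correct
    points = []
    for threshold in sorted({q for q, _ in reads}, reverse=True):
        # Reads at or above the threshold are the ones the aligner "accepts".
        accepted = [ok for q, ok in reads if q >= threshold]
        tp = sum(accepted)
        fp = len(accepted) - tp
        tpr = tp / total_correct if total_correct else 0.0
        fpr = fp / total_wrong if total_wrong else 0.0
        points.append((threshold, fpr, tpr))
    return points


if __name__ == "__main__":
    # Hypothetical toy data: four correctly placed reads, two misplaced ones.
    demo = [(60, True), (60, True), (30, True), (30, False), (0, False), (0, True)]
    for threshold, fpr, tpr in roc_points(demo):
        print(f"mapq>={threshold}: FPR={fpr:.2f} TPR={tpr:.2f}")
```

Plotting the resulting pairs with FPR on the X axis and TPR on the Y axis gives the curve described on the slide: an aligner whose curve sits higher and further left separates correct from incorrect placements better.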

The performance of various methods for mapping reads to reference repeats.

Highnam G et al. Nucl. Acids Res. 2013;41:e32

http://www.bioplanet.com/gcat
http://www.bioplanet.com/gcat/reports/112/variant-calls/ion-torrent-225bp-se-exome-30x/novoalign-gatk-ug/compare-183-119/group-read-depth

Genome-in-a-bottle Consortium dataset

http://bcbio.wordpress.com
Courtesy Brad Chapman & Oliver Hoffman, HSPH

“Our standard workflow uses novoalign based on its stringency in resolving large insertions and deletions. These results suggest equally good results using bwa mem, along with improved processing times”

Graphical representation of the total number of downstream false positives expressed as a percentage...

Oliver GR. 2012 [http://f1000r.es/NMpsFc] F1000Research 2012, 1:2 (doi: 10.12688/f1000research.1-2.v2)

Novosort comparison on Illumina reads

Developing on BaseSpace

Motivation

• Reach out to more users
• Enable seamless integration with the cloud
• Establish a BaseSpace Novoalign community

What is the App?

Alignment
• Alignment Quality Calibration
• Multithreaded
• Adaptor stripping

Sorting
• Novosort
• Multithreaded

Variant Calling
• Freebayes
• SNPs & Indels

What is the App?

• Novoalign
– Paired-end
– Human genome only, others later
– Caveat: requires a machine with at least 8 GB RAM
• Alignment coordinate-sorting
– Novosort
• Variant Calling
– Freebayes (Erik Garrison & Gabor Marth)
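
The three stages of the app run strictly in sequence, each consuming the previous stage's output. A minimal sketch of a driver for that kind of chain is shown below. `Stage` and `run_pipeline` are hypothetical names, and the commands in the usage example are harmless placeholders rather than real Novoalign, Novosort, or Freebayes command lines, which should be taken from each tool's own documentation.

```python
import subprocess
import sys
from typing import List, NamedTuple


class Stage(NamedTuple):
    name: str        # label used in error messages
    argv: List[str]  # command and arguments, executed without a shell


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for stage in stages:
        result = subprocess.run(stage.argv, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(
                f"stage '{stage.name}' failed with exit code "
                f"{result.returncode}: {result.stderr.strip()}"
            )
        completed.append(stage.name)
    return completed


if __name__ == "__main__":
    # Placeholder command that always succeeds; swap in the real tool
    # invocations (assumed here, not shown in the talk) for actual use.
    ok = [sys.executable, "-c", "pass"]
    stages = [
        Stage("alignment", ok),
        Stage("sorting", ok),
        Stage("variant calling", ok),
    ]
    print(run_pipeline(stages))
```

Stopping at the first non-zero exit code keeps a failed alignment from silently producing an empty or truncated variant file further down the chain.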

New-developer Challenges

• The “Docker” way of doing things
– Image vs Container
• Front-end: JavaScript/CSS
• Back-end: Algorithms/scripting

Back-end process

Front-end process

Perl/C++/R/Python

Back-end Development Process

Start the Native VM
• VMware
• Linux environment

Start your own Docker Repository
• Create new IMAGE on Docker.io
• Done automatically on your first push

Attach to your image
• Docker run …

Make small test dataset
• Illumina cancer panel reads
• Subset chr22 alignments

Develop the app back-end process
• Automated script runs pipeline
• Alignment -> sorting -> variant calling

Postprocess
• Charting with R
• ggplot2
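
One way to build the small chr22 test set mentioned above is to filter an existing SAM file by reference name. The sketch below relies only on the SAM text layout (header lines start with `@`, and the third tab-separated column is the reference name). The function name `subset_sam` and the sample records are made up for illustration, and the talk does not say which tool was actually used for this step.

```python
import sys
from typing import Iterable, Iterator


def subset_sam(lines: Iterable[str], chrom: str) -> Iterator[str]:
    """Yield header lines plus alignments whose reference name equals chrom."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) of an alignment record is the reference name.
        if len(fields) > 2 and fields[2] == chrom:
            yield line


if __name__ == "__main__":
    # Hypothetical records; a real run would read from a SAM file instead.
    sam = [
        "@SQ\tSN:chr21\tLN:48129895\n",
        "@SQ\tSN:chr22\tLN:51304566\n",
        "read1\t0\tchr21\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n",
        "read2\t0\tchr22\t200\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n",
    ]
    sys.stdout.writelines(subset_sam(sam, "chr22"))
```

A test set restricted to one small chromosome keeps each development run short, which matters when the whole alignment, sorting, and variant-calling chain is re-run after every change.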

Front-end Development Process

BaseSpace Developer tools
• Code editor
• Preview form inputs

Initiate test runs
• Send data to your back-end Native app

Build Report form
• Write Liquid/JS/HTML5

App Screenshots

Acknowledgements

Novocraft
Leadership: Colin Hercus, Haniza Hashim
Bioinformatics: Akzam Saidin, Kaamesh Kaamahalaran, Abdul Malik Ahmad
Software Development: Deepa Murugan, Sharon Chin, Laura Hamit

Illumina: Raymond Teckotzky, Mayank Tyagi

VT/GeneByGene: David Mittelman, Gareth Highnam, Nir Liebovich, Jason Wang

HSPH Bioinformatics Core: Oliver Hoffman, Brad Chapman
