11/14/13
2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 12, Lecture 24
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State
Topics to be covered in the next lectures
1. Assembly – building new genomes/transcriptomes
2. Metagenomics – characterizing multiple genomes
3. ChIP-Seq – quantifying locations in genomes
4. RNA-Seq – quantifying gene expression
Genome Assembly
The path to a whole genome
Multistep process
1. Assemble short reads into longer sequences → contigs
2. Scaffolding – arrange contigs relative to one another based on external information
3. Genome finishing (gap closure) → fill in gaps with directed sequencing procedures and manual curation
AMOS – A Modular Open-Source whole genome assembler
It is not a single program but rather a collection of interoperable tools, standards, and techniques
Read assembly challenges
Repeated elements: RPT A1 and RPT A2
A valid assembly of two contigs instead of one
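Why repeats break an assembly can be seen in a toy de Bruijn graph. The sketch below is a simplification: it builds the graph from a made-up genome string rather than from reads, and the function name `de_bruijn_branches` and the sequences are illustrative assumptions, not part of any assembler.

```python
from collections import defaultdict


def de_bruijn_branches(sequence, k):
    """Map each (k-1)-mer that has more than one successor to its successors."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Edge in the de Bruijn graph: prefix (k-1)-mer -> suffix (k-1)-mer.
        successors[kmer[:-1]].add(kmer[1:])
    return {node: sorted(nxt) for node, nxt in successors.items() if len(nxt) > 1}


# Toy genome (made up): the 7-base repeat GATTACA occurs twice,
# surrounded by different flanking sequence each time.
REPEAT = "GATTACA"
genome = "ACGTG" + REPEAT + "TTGCC" + REPEAT + "CCGTA"

# k = 8: graph nodes are 7-mers, so both repeat copies collapse into one node
# with two possible continuations, and the path through it is ambiguous.
print(de_bruijn_branches(genome, 8))

# k = 9: nodes are 8-mers, longer than the repeat, so no node branches.
print(de_bruijn_branches(genome, 9))
```

With k = 8 the only branching node is the repeat itself; with k = 9 the result is empty. An assembler that cannot resolve the branch has to stop and emit separate contigs, which is why two contigs instead of one can still be a valid assembly.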
Other challenges:
• genomic variation, heterozygosity, copy number variation
• misassembly due to sequencing errors
• chimeric sequences
Scaffolding
• Orienting contigs via paired-end (or mate-pair) information
Additional information to assist the process:
• use alignment positions in related genomes
• use gene synteny (co-localization of genetic loci)
There are fewer automated pipelines:
BAMBUS – Hierarchical Scaffolding With Bambus by M. Pop, D. Kosack and S. Salzberg
Finishing genomes
Genome assembly is an art
Many different approaches – substantial supervision/evaluation required at each step of the process.
Genomes can vary greatly in complexity – genome size/repetitiveness is usually the limiting factor
Constant tuning and evaluation is needed.
The N50 statistic
• N50 length is defined as the contig length L for which 50% of all bases in the assembly are in contigs of length L or longer.
1. Sort all contigs by size from largest to smallest
2. Compute the cumulative sum of lengths
3. Find the smallest number of contigs whose lengths add up to at least half of the assembled length; the N50 is the length of the shortest contig in that set
The NG50 statistic uses the genome size instead of the total assembled length
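The three steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `n50`, its `total` parameter, and the contig lengths are made up for the example.

```python
def n50(contig_lengths, total=None):
    """Return the N50 of the given contig lengths.

    If `total` is given (for example an estimated genome size), half of that
    value is used as the target instead, which yields the NG50.
    Returns None when the contigs never reach half of the target.
    """
    lengths = sorted(contig_lengths, reverse=True)  # step 1: largest first
    if total is None:
        total = sum(lengths)
    half = total / 2
    running = 0
    for length in lengths:
        running += length                           # step 2: cumulative sum
        if running >= half:                         # step 3: half reached
            return length
    return None


# Made-up contig lengths, total assembled length 300.
lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(lengths))             # 80 + 70 = 150 reaches half of 300
print(n50(lengths, total=400))  # NG50 for an assumed genome size of 400
```

For these lengths the N50 is 70 and the NG50 for a genome size of 400 is 50. If the assembly covers less than half of the genome, the NG50 is undefined and the sketch returns `None`.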
Using the Velvet Assembler
Download, unpack, and make Velvet
Download the 23.tar.gz dataset from the webpage
Velvet assembly is a two-step process:
• velveth → builds the hashtable
• velvetg → runs the assembly from the hashtable
Running Velvet in single-end mode
Running Velvet in paired-end mode
Same data, radically improved assembly: the paired-end information allows the assembler to resolve repetitive data
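The original command lines from the slides are not preserved here, so the following is only a sketch of what the two runs might look like. It builds the `velveth`/`velvetg` argument lists without executing them. The file name `reads.fa`, the output directories, the hash length 31, and the insert length 500 are all placeholder assumptions, and the paired-end reads are assumed to be interleaved in a single FASTA file.

```python
def velvet_commands(out_dir, hash_length, reads_file, paired=False, ins_length=None):
    """Return the velveth and velvetg argument lists for one assembly run."""
    # velveth takes the read category as a flag: -short or -shortPaired.
    read_type = "-shortPaired" if paired else "-short"
    velveth = ["velveth", out_dir, str(hash_length), "-fasta", read_type, reads_file]
    velvetg = ["velvetg", out_dir]
    if paired and ins_length is not None:
        # The expected insert length lets velvetg use the pairing information.
        velvetg += ["-ins_length", str(ins_length)]
    return [velveth, velvetg]


# Placeholder inputs; substitute the files from the dataset you downloaded.
single = velvet_commands("out_single", 31, "reads.fa")
paired = velvet_commands("out_paired", 31, "reads.fa", paired=True, ins_length=500)

for command in single + paired:
    print(" ".join(command))
```

To actually run a command, pass each list to `subprocess.run(command, check=True)` once Velvet is on the `PATH`.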
A few observations
• Paired-end assembly leads to a radically better assembly → N50 of 667 (single end) vs 49270 (paired end)
• Hash size matters → how to pick the right one? Experts say to "explore" the parameters.
• VelvetOptimiser.pl (in the Velvet contrib directory)
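One simple way to "explore" the hash size is to assemble with a handful of candidate values and compare the results. The sketch below only generates the candidates. It assumes Velvet's usual constraints (an odd hash length, at most 31 in a default build) and that the hash length must be shorter than the reads. The function name, the defaults, and the directory naming scheme are illustrative.

```python
def candidate_hash_lengths(read_length, smallest=15, largest=31, step=4):
    """Odd hash lengths from `smallest` up to `largest`, all shorter than the reads."""
    upper = min(largest, read_length - 1)
    return [k for k in range(smallest, upper + 1, step) if k % 2 == 1]


# One output directory per hash length keeps the assemblies separate.
for k in candidate_hash_lengths(read_length=36):
    print(k, f"velvet_k{k}")
```

Each candidate would then get its own velveth/velvetg run, and the assemblies can be compared by N50 and longest contig.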
Other resources: quite a few review papers
• A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies (PLoS ONE 2011)
• GAGE: A critical evaluation of genome assemblies and assembly algorithms (Genome Research, 2011)
Heng Li's Fermi
Bioinformatics (2012) 28 (14):
Assembly evaluation
• Often feels surprisingly ad-hoc (people write home-grown scripts to fetch statistics/subselect contigs etc.)
• AMOS – contains visualizers such as Hawkeye
• To compare to related genomes we need optimal aligners, not short read mappers!
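A typical home-grown script of the kind mentioned above reads the contig FASTA file, collects the lengths, and subselects contigs above a cutoff. The sketch below is one possible version. The function names and the tiny example records are made up, and it reads from an in-memory string so that it runs without any input file.

```python
import io


def contig_lengths(handle):
    """Return {contig name: sequence length} for a FASTA file-like object."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def select_contigs(lengths, min_length):
    """Keep only the contigs that are at least `min_length` bases long."""
    return {name: n for name, n in lengths.items() if n >= min_length}


# Made-up records standing in for an assembler's contig file.
example = io.StringIO(">contig_1\nACGTACGT\nACGT\n>contig_2\nACG\n")
lengths = contig_lengths(example)
print(lengths)
print(select_contigs(lengths, min_length=10))
```

For a real assembly, replace the `io.StringIO` object with `open(...)` on the contig file the assembler wrote.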
Aligning contigs
Download and install the MUMmer tools
• nucmer → (NUCleotide MUMmer) DNA sequence alignment
• promer → (PROtein MUMmer) all matching and alignment routines are performed on the six-frame amino acid translation of the DNA input sequence
Older but exceedingly useful tools
Their formats are somewhat complicated
Blast is designed to search target databases
MUMmer is designed to align genomes!
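A minimal sketch of aligning contigs to a related genome with nucmer follows. It only builds the argument lists. The file names `reference.fa` and `contigs.fa` and the prefix are placeholders. `nucmer` writes its alignments to `<prefix>.delta`, and `show-coords` turns that file into a readable coordinate table.

```python
def mummer_commands(reference, contigs, prefix="out"):
    """Return the nucmer and show-coords argument lists for one comparison."""
    nucmer = ["nucmer", f"--prefix={prefix}", reference, contigs]
    # -r sorts by reference, -c adds percent coverage, -l adds sequence lengths.
    show_coords = ["show-coords", "-rcl", f"{prefix}.delta"]
    return [nucmer, show_coords]


# Placeholder file names; substitute your reference genome and contig file.
commands = mummer_commands("reference.fa", "contigs.fa", prefix="contigs_vs_ref")
for command in commands:
    print(" ".join(command))
```

`show-coords` prints to standard output, so when running it through `subprocess.run` the table can be captured with `capture_output=True` or redirected to a file.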
Homework 24
Simulate reads and generate assemblies with three different hash sizes.
Which assembly produces the best N50 statistic?
Which assembly produces the longest contigs and how does that compare to the expectations?