Assembly Kristoffer H. Ring INF-BIO5121

Assembly

Kristoffer H. Ring
INF-BIO5121

Task 1.2 – Velvet assembly

I had planned to use VelvetOptimiser to determine the optimal k-mer size; however, it was not available on the course server.

Therefore, I took advantage of the multiprocessing capability on Abel:

#Building hash tables using 39 different hash lengths, k=21-99:

bash-4.1$ velveth auto 21,99,2 -fastq -shortPaired -separate ../Sample280_1.fastq ../Sample280_2.fastq

#Submitting the velvetg-jobs to Abel (building the graphs)

./run_velvet.sh

Submit.sh
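The submission script itself is not reproduced in text form. As a rough sketch of the idea only: one velvetg command is generated per velveth output directory, and each command can then be wrapped in a batch job. The function name `velvetg_commands`, the directory pattern `auto_<k>`, and the choice of `-exp_cov auto -cov_cutoff auto` for the scan are assumptions, not the contents of the original script.

```python
import glob
import os
import re


def velvetg_commands(prefix="auto",
                     extra_args=("-exp_cov", "auto", "-cov_cutoff", "auto")):
    """Return one velvetg command line per <prefix>_<k> directory, sorted by k.

    Hypothetical helper: the real run_velvet.sh / Submit.sh may differ.
    """
    pattern = re.compile(re.escape(prefix) + r"_(\d+)")
    found = []
    for path in glob.glob(prefix + "_*"):
        match = pattern.fullmatch(os.path.basename(path))
        # Skip anything that is not a directory named <prefix>_<integer>.
        if match and os.path.isdir(path):
            k = int(match.group(1))
            found.append((k, " ".join(["velvetg", path, *extra_args])))
    # Sort numerically by k so that auto_21 comes before auto_99.
    return [command for _, command in sorted(found)]


if __name__ == "__main__":
    for command in velvetg_commands():
        # Each printed line could be placed in its own job script and submitted.
        print(command)
```

Printing the commands rather than running them keeps the sketch independent of any particular queue system.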

#Grepping N50 values from the auto_X/log-files and plotting the results using Python:

-bash-4.1$ python plot.py auto_*

-bash-4.1$ python plot.py auto_*
Best k = 75
Best n50 = 119377

However, k=65 and 69 might be good candidates as well.
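plot.py is not listed in these slides. A minimal sketch of the N50-collecting part, without the plotting, might look like the following. It assumes that each directory is named `<prefix>_<k>`, that velvetg wrote a file called `Log` there, and that the summary line contains the text `n50 of <number>`; the function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed format of the velvetg summary line: "... n50 of 119377, max ..."
N50_PATTERN = re.compile(r"n50 of (\d+)")


def n50_from_log(text):
    """Return the last N50 value found in a log text, or None if absent."""
    hits = N50_PATTERN.findall(text)
    return int(hits[-1]) if hits else None


def collect_n50(directories, log_name="Log"):
    """Map k -> N50 for directories named <prefix>_<k> that contain a log."""
    results = {}
    for directory in directories:
        path = Path(directory)
        k_text = path.name.rsplit("_", 1)[-1]
        log = path / log_name
        if not k_text.isdigit() or not log.is_file():
            continue  # not a k-mer directory, or velvetg has not finished
        n50 = n50_from_log(log.read_text())
        if n50 is not None:
            results[int(k_text)] = n50
    return results


if __name__ == "__main__":
    table = collect_n50(sys.argv[1:])
    if table:
        best_k = max(table, key=table.get)
        print("Best k =", best_k)
        print("Best n50 =", table[best_k])
```

Keeping the whole k-to-N50 table, rather than only the maximum, makes it easy to inspect near-optimal values such as k=65 and k=69.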

k=75 might be a good k-mer size.

The following commands were used to create assembly asm1:

-bash-4.1$ velveth asm1 75 -fastq -shortPaired -separate \
    ../Sample280_1.fastq ../Sample280_2.fastq

-bash-4.1$ velvetg asm1 -exp_cov auto -cov_cutoff auto

Ran assemblathon_stat.pl on asm1 to judge the assembly:

bash-4.1$ assemblathon_stat.pl -s 5.4 contigs.fa > metrics_asm1.txt

Most relevant metrics:

Task 1.3 – Velvet assembly with mate pair reads

Used the same approach as for the paired-end reads (Task 1.2).

#Building the hash indices using velveth for k=21-99:

bash-4.1$ velveth asm 21,99,2 -fastq -shortPaired -separate \
    ../Sample280_1.fastq ../Sample280_2.fastq \
    -shortPaired2 -separate -fastq \
    ../TY2482_6kb_1_50x_RC.fq ../TY2482_6kb_2_50x_RC.fq

#Submitting the velvetg-jobs to Abel:

./run_velvet.sh

Submit.sh

K=45, N50=2,876,052
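For reference, N50 is the sequence length L such that sequences of length L or longer together contain at least half of all assembled bases. The sketch below computes it from a FASTA file; the function names are hypothetical, and tools such as velvetg or the Assemblathon script may apply their own conventions (units, minimum lengths, scaffold versus contig), so the numbers need not match exactly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half:
            return length
    return 0


def fasta_lengths(lines):
    """Yield the length of each record in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            print("N50 =", n50(fasta_lengths(handle)))
```

With lengths 6, 5, 4, 3 and 2 (total 20), the running sum reaches 10 at the second sequence, so the N50 is 5.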

k=45 stands out as the best k-mer size.

The following commands were used to create assembly asm2:

-bash-4.1$ velveth asm2 45 -fastq -shortPaired -separate \
    ../Sample280_1.fastq ../Sample280_2.fastq \
    -shortPaired2 -separate -fastq \
    ../TY2482_6kb_1_50x_RC.fq ../TY2482_6kb_2_50x_RC.fq

-bash-4.1$ velvetg asm2 -exp_cov auto -cov_cutoff auto -shortMatePaired2 yes

Ran assemblathon_stat.pl on asm2 to judge the assembly:

bash-4.1$ assemblathon_stat.pl -s 5.4 contigs.fa > metrics_asm2.txt

bash-4.1$ less metrics_asm2.txt

Most relevant Assemblathon metrics for asm2:

Insert size and standard deviation

The output from velvetg showed:

[276.666654] Paired-end library 1 has length: 444, sample standard deviation: 33
[276.850824] Paired-end library 2 has length: 6199, sample standard deviation: 777

For the lecture data set, the corresponding numbers were (298, 19) and (3177, 2132) for the two libraries, respectively.

The exam assembly may be better because its mate-pair library has a much smaller standard deviation (777 vs. 2132).
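One way to make this comparison independent of the insert size is the coefficient of variation (standard deviation divided by mean). The sketch below only reuses the four (length, standard deviation) pairs quoted above; the library labels and the function name are my own.

```python
# (mean insert size, sample standard deviation) as reported above.
libraries = {
    "exam paired-end": (444, 33),
    "exam mate-pair": (6199, 777),
    "lecture paired-end": (298, 19),
    "lecture mate-pair": (3177, 2132),
}


def coefficient_of_variation(mean, sd):
    """Return sd / mean, a scale-free measure of library spread."""
    return sd / mean


if __name__ == "__main__":
    for name, (mean, sd) in libraries.items():
        cv = coefficient_of_variation(mean, sd)
        print(f"{name}: {100 * cv:.1f}%")
```

This gives roughly 7.4% and 12.5% for the exam libraries, against 6.4% and 67.1% for the lecture libraries, so the difference lies almost entirely in the mate pairs.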

Task 2.1 – Spades Assembly

Ran Spades to create assembly asm3:

-bash-4.1$ module load spades

-bash-4.1$ spades.py -t 2 -k 21,29,37,43 --careful \
    --pe1-1 ../Sample280_1.fastq \
    --pe1-2 ../Sample280_2.fastq \
    --mp1-1 ../TY2482_6kb_1_50x_RC.fq \
    --mp1-2 ../TY2482_6kb_2_50x_RC.fq \
    --mp1-fr -o asm3

Ran assemblathon_stat.pl on asm3 to judge the assembly:

bash-4.1$ assemblathon_stat.pl -s 5.4 scaffolds.fasta > metrics_spades.txt

bash-4.1$ less metrics_spades.txt

Most relevant Assemblathon metrics for Spades assembly, asm3:

Task 3.1 – Mapping

Indexing the asm2 (Velvet) assembly using bwa:

-bash-4.1$ cd asm2
-bash-4.1$ module load bwa
-bash-4.1$ bwa index -a bwtsw contigs.fa

### Mapping paired end reads for asm2:
bwa mem -t 2 ../contigs.fa \
    ../../../Sample280_1.fastq \
    ../../../Sample280_2.fastq \
    | samtools view -buS - | samtools sort - map_pe.sorted

## Mapping mate pairs for asm2:
bwa mem -t 2 ../contigs.fa \
    ../../../TY2482_6kb_1_50x_RC.fq \
    ../../../TY2482_6kb_2_50x_RC.fq \
    | samtools view -buS - | samtools sort - map_mp.sorted

Task 3.1 – Mapping

Indexing the asm3 (Spades) assembly:

-bash-4.1$ cd asm3
-bash-4.1$ bwa index -a bwtsw scaffolds.fasta
-bash-4.1$ cd bwa

### Mapping paired end reads to asm3:
-bash-4.1$ bwa mem -t 2 ../scaffolds.fasta \
    ../../../Sample280_1.fastq \
    ../../../Sample280_2.fastq \
    | samtools view -buS - | samtools sort - map_pe.sorted

-bash-4.1$ samtools index map_pe.sorted.bam

## Mapping mate pairs to asm3:
-bash-4.1$ bwa mem -t 2 ../scaffolds.fasta \
    ../../../TY2482_6kb_1_50x_RC.fq \
    ../../../TY2482_6kb_2_50x_RC.fq \
    | samtools view -buS - | samtools sort - map_mp.sorted

-bash-4.1$ samtools index map_mp.sorted.bam

IPython notebook plots of the insert size distribution for the asm3 assembly
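The notebook code behind these plots is not included. As a sketch of how the insert sizes could be extracted, the following reads SAM text (for example the output of samtools view on one of the sorted BAM files) and collects the template length from column 9. Counting only properly paired primary alignments with a positive template length, so that each fragment is counted once, is my assumption and not necessarily what the notebook did; the function names are hypothetical.

```python
import sys


def insert_sizes(sam_lines):
    """Yield one template length (TLEN, column 9) per properly paired fragment."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        properly_paired = bool(flag & 0x2)
        secondary_or_supplementary = bool(flag & 0x900)
        # tlen > 0 keeps only the leftmost mate, so each pair is counted once.
        if properly_paired and not secondary_or_supplementary and tlen > 0:
            yield tlen


def summarize(sizes):
    """Return count, mean and median of the insert sizes, or None if empty."""
    ordered = sorted(sizes)
    if not ordered:
        return None
    count = len(ordered)
    return {
        "n": count,
        "mean": sum(ordered) / count,
        "median": ordered[count // 2],
    }


if __name__ == "__main__" and not sys.stdin.isatty():
    print(summarize(insert_sizes(sys.stdin)))
```

The list of sizes returned by `insert_sizes` can be passed directly to a histogram function in a notebook.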

Task 3.2 – REAPR analysis for the asm2 and asm3 assemblies

REAPR analysis asm2 (Velvet)

-bash-4.1$ cd asm2
-bash-4.1$ module load reapr
-bash-4.1$ reapr pipeline contigs.fa bwa/map_mp.sorted.bam reapr_out > reapr.out

#Replacing all spaces in 03.score.errors.gff.gz with '_':
-bash-4.1$ zcat reapr_out/03.score.errors.gff.gz | sed 's/ /_/g' > 03.score.errors_nospaces.gff

##REAPR analysis asm3 (Spades)

-bash-4.1$ cd asm3
-bash-4.1$ reapr pipeline scaffolds.fasta bwa/map_mp.sorted.bam reapr_out > reapr.out

#Replacing all spaces in 03.score.errors.gff.gz with '_':
-bash-4.1$ zcat reapr_out/03.score.errors.gff.gz | sed 's/ /_/g' > 03.score.errors_nospaces.gff
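The same space-to-underscore step can be sketched in Python, for instance when the file is processed further in a notebook. The function name is hypothetical; the paths passed to it would be the ones used in the commands above.

```python
import gzip


def write_without_spaces(gz_path, out_path):
    """Decompress a gzipped text file and replace every space with '_'.

    Tabs and newlines are left untouched, matching sed 's/ /_/g'.
    """
    with gzip.open(gz_path, "rt") as source, open(out_path, "w") as target:
        for line in source:
            target.write(line.replace(" ", "_"))


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        write_without_spaces(sys.argv[1], sys.argv[2])
```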

REAPR results

From 05.summary.report.txt for asm2

From 05.summary.report.txt for asm3

REAPR made 9 breaks in asm2 and 2 in asm3.

Fragment Coverage Distribution (FCD) failure over gap in asm2

Fragment Coverage Distribution (FCD) failure over gap in asm2

Fragment Coverage Distribution (FCD) failure over gap in asm3

Conclusion

The Spades assembly (asm3) might be better:

• Fewer and larger scaffolds and contigs
• Total scaffold length in agreement with known genome size (100.3%)
• REAPR identified fewer FCD failures over gaps (breaks)

Assemblathon stats: