Targeted next generation sequencing for population genomics and phylogenomics in Ambystomatid salamanders

Eric M. O’Neill

David W. Weisrock

Photograph by Stephen Dalton/Animals Animals - Earth Scenes

Preliminary Results

Ambystoma tigrinum complex

Coalescent Processes

• Stochastic
• Incomplete lineage sorting
• Gene tree incongruence
• Capture variance
• Many loci

Degnan and Rosenberg, 2006 PLOS Genetics

Goals

• Sequence >100 independent loci from 100s of samples
  – both alleles
• Population genetics
• Species delimitation
• Gene phylogenies
• Species phylogeny

Jeremiah Smith

Past Option

• Sanger Sequencing
  – expensive
  – cloning or computational phasing of alleles
  – low throughput

454 (Roche) Next Generation Sequencing

1 million reads × 400 bp each = 400 Million bp

Meyer et al. 2008 Nature Protocols

Barcoding

Methods

• Screened ~250 EST loci across 16 representative samples
• Found >100 variable loci that amplify well at the same temperature
• Amplified 95 loci for one individual in one plate
• 94 individuals
  – 8930 amplicons
• Pooled across 95 loci for each individual
• Barcoded 94 individuals and pooled
• UKY-AGTC: 454 Libraries, emPCR, 454 sequencing

Preliminary Results

• Two test runs: 1/8th picotiter plate
  – 65K + 20K sequences
• One final run: 1/4th picotiter plate
  – 225K sequences
• Total ~300K sequences
• Coverage of about 34X per sample per locus
• Sorted >95%
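The coverage figure follows from the study design. A minimal sketch of the arithmetic, assuming reads are spread evenly over every individual-by-locus amplicon (variable names are illustrative only):

```python
# Design numbers taken from the Methods and Preliminary Results slides.
individuals = 94
loci = 95
total_reads = 300_000  # approximate total across the three runs

# One amplicon per individual per locus.
amplicons = individuals * loci

# Mean reads per amplicon if reads were distributed evenly (an idealization).
mean_coverage = total_reads / amplicons

print(f"{amplicons} amplicons, about {mean_coverage:.0f}X per sample per locus")
```

Real coverage varies among loci and individuals, so this is an average rather than a guarantee for any single amplicon.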

1664 seqs / 95 loci = 18X coverage
96% loci have sequence
45 loci had >10X coverage

Genotyping

• Clonal amplification through emPCR
• Each sequence is derived from a single DNA strand
• Identify both alleles without bacterial cloning

Errors

• Homopolymer regions
• Single nucleotide mismatches

Automated Statistical Genotyping

Hohenlohe et al., 2010 PLOS Genetics

Genotyping

• Let n be the total number of reads per site

• Let n = n1 + n2 + n3 + n4, where ni is the read count for each possible nucleotide at the site

• For diploid, there are 10 possible genotypes
  – 4 homozygous (AA, TT, GG, CC)
  – 6 heterozygous (AT, AG, AC, TG, TC, GC)

• Calculate the likelihood of each possible genotype using a multinomial sampling distribution, which gives the probability of observing a set of read counts (n1,n2,n3,n4)
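The genotype count and the multinomial probability can be checked with a short sketch. The function name `multinomial_pmf` and the example probabilities are illustrative assumptions, not part of the talk:

```python
from itertools import combinations_with_replacement
from math import factorial, prod

# All unordered pairs of nucleotides: 4 homozygous + 6 heterozygous genotypes.
genotypes = ["".join(pair) for pair in combinations_with_replacement("ACGT", 2)]
homozygous = [g for g in genotypes if g[0] == g[1]]
heterozygous = [g for g in genotypes if g[0] != g[1]]


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` reads per nucleotide when each
    read shows nucleotide i with probability probs[i]."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


print(len(genotypes), len(homozygous), len(heterozygous))
# Four reads, one of each nucleotide, with equal per-read probabilities.
print(multinomial_pmf([1, 1, 1, 1], [0.25, 0.25, 0.25, 0.25]))
```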

Likelihood of a Homozygote
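
Following the multinomial model of Hohenlohe et al. (2010), the homozygote likelihood can be written as below. Here n1 is taken to be the count of the most frequently observed nucleotide, n = n1 + n2 + n3 + n4, and ε is a per-nucleotide sequencing error rate; the symbol ε is an assumption adopted from that paper's notation:

```latex
L(1/1) = P(n_1, n_2, n_3, n_4 \mid 1/1)
       = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!}
         \left(1 - \frac{3\varepsilon}{4}\right)^{n_1}
         \left(\frac{\varepsilon}{4}\right)^{n_2 + n_3 + n_4}
```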

Likelihood of a Heterozygote
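
Under the same multinomial model (Hohenlohe et al., 2010), the heterozygote likelihood can be written as below. Here n1 and n2 are taken to be the counts of the two most frequently observed nucleotides, n = n1 + n2 + n3 + n4, and ε is a per-nucleotide sequencing error rate; the symbol ε is an assumption adopted from that paper's notation:

```latex
L(1/2) = P(n_1, n_2, n_3, n_4 \mid 1/2)
       = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!}
         \left(\frac{1}{2} - \frac{\varepsilon}{4}\right)^{n_1 + n_2}
         \left(\frac{\varepsilon}{4}\right)^{n_3 + n_4}
```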

Assigning Genotypes

• The 2 equations give the likelihoods of the two most likely hypotheses out of 10

• Use a LRT to compare the Homo vs. Het hypotheses (df=1)

• If the test is significant, we assign the most likely genotype at that site for that individual

• If the test is not significant, we do not assign a genotype

• This process tests for each SNP independently, but we want to genotype the entire sequence
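
The per-site decision rule can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `call_genotype`, the fixed error rate `eps = 0.01`, and the 5% chi-square critical value are all assumptions. The likelihoods follow the multinomial form of Hohenlohe et al. (2010); the multinomial coefficient is shared by both hypotheses and cancels in the ratio.

```python
import math

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_DF1 = 3.841


def call_genotype(counts, eps=0.01, crit=CHI2_CRIT_DF1):
    """Return a genotype such as 'AA' or 'AG' for one site, or None when the
    likelihood ratio test is not significant.

    counts: dict mapping nucleotide -> read count at the site.
    eps:    assumed per-nucleotide sequencing error rate (hypothetical value).
    """
    ranked = sorted("ACGT", key=lambda base: counts.get(base, 0), reverse=True)
    n1, n2, n3, n4 = (counts.get(base, 0) for base in ranked)
    if n1 == 0:
        return None  # no reads at this site

    # Log-likelihoods of the best homozygote and best heterozygote.
    ln_hom = n1 * math.log(1 - 3 * eps / 4) + (n2 + n3 + n4) * math.log(eps / 4)
    ln_het = (n1 + n2) * math.log(0.5 - eps / 4) + (n3 + n4) * math.log(eps / 4)

    statistic = 2 * abs(ln_hom - ln_het)
    if statistic < crit:
        return None  # not significant: leave the site uncalled
    if ln_hom > ln_het:
        return ranked[0] * 2
    return "".join(sorted(ranked[:2]))


print(call_genotype({"A": 20}))                    # well-covered homozygote
print(call_genotype({"A": 10, "G": 9, "C": 1}))    # heterozygote with one error read
print(call_genotype({"A": 2}))                     # too few reads to decide
```

With only two reads the test cannot separate the hypotheses, which is why per-locus coverage matters for genotyping.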

8 ways to be Het at 3 SNPs:

C-T-C   G-T-C
C-C-C   G-C-C
C-T-T   G-T-T
C-C-T   G-C-T

We need to maintain the correct phase information across SNPs.
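
Because each 454 read comes from a single DNA strand, the linkage among SNPs can be read directly from the reads. A minimal sketch under simplifying assumptions: reads are already aligned to the same coordinates, SNP positions are known, and the function name `haplotypes_from_reads` and the `min_reads` threshold are hypothetical.

```python
from collections import Counter
from itertools import product


def haplotypes_from_reads(reads, snp_positions, min_reads=2):
    """Return up to two haplotypes (strings of SNP alleles) supported by at
    least `min_reads` reads, most frequent first."""
    last = max(snp_positions)
    observed = Counter(
        "".join(read[pos] for pos in snp_positions)
        for read in reads
        if len(read) > last  # skip reads that do not span every SNP
    )
    return [hap for hap, count in observed.most_common(2) if count >= min_reads]


# Three biallelic SNPs give 2**3 = 8 candidate haplotypes.
candidates = ["".join(h) for h in product("CG", "TC", "CT")]

# Toy aligned reads; SNPs sit at positions 0, 2 and 4.
reads = ["CATACA"] * 3 + ["GACATA"] * 3 + ["CACACA"]  # last read carries an error
print(len(candidates), haplotypes_from_reads(reads, [0, 2, 4]))
```

The singleton read is dropped by the threshold, leaving the two haplotypes that the individual actually carries.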

Desired Workflow

• 454 data received as FASTA files
• Sort by barcode
  – Tommy has some code for this
• Assemble by locus (alignments)
  – Currently in Geneious, what other options?
• Genotype (phase the alleles)
  – Need to implement automated method
  – Quality scores
• Export data as sequences for phylogenetic analysis
• Export data as alleles for population genetic analysis
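
The barcode-sorting step can be sketched as below. This is not the existing code mentioned above; it is a minimal illustration that assumes barcodes sit at the 5' end of each read and must match exactly. The barcode sequences, sample names, and function names are hypothetical.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sort_by_barcode(records, barcodes):
    """Group reads by sample, trimming the matched barcode from each read.

    barcodes: dict mapping barcode sequence -> sample name.
    Reads with no exact barcode match are kept under 'unassigned'.
    """
    bins = defaultdict(list)
    for name, seq in records:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                bins[sample].append((name, seq[len(barcode):]))
                break
        else:
            bins["unassigned"].append((name, seq))
    return bins


# Hypothetical barcodes and reads for illustration.
barcodes = {"ACGT": "ind01", "TGCA": "ind02"}
fasta = [">r1", "ACGTAAAA", ">r2", "TGCACCCC", ">r3", "GGGGGGGG"]
for sample, reads in sort_by_barcode(read_fasta(fasta), barcodes).items():
    print(sample, reads)
```

Exact matching is the simplest choice; tolerating sequencing errors in the barcode itself would need a mismatch-aware comparison.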