
A targeted subgenomic approach for phylogenomics based on microfluidic PCR and high-throughput sequencing

Simon Uribe-Convers, Matt L. Settles and David C. Tank
University of Idaho

www.simonuribe.com
@uribe_convers

The era of Genomics

Sequence Capture

Genome Skimming

GBS/RADSeq

Transcriptomes

Whole Genome

The era of Genomics: Illumina platform

Genomic Library, Amplification, Sequencing

http://www.dddmag.com/sites/dddmag.com/files/legacyimages/Articles/2009_11/fluidigm.jpg

Targeted (sub-) genomics

-Using Fluidigm Access Array
-48 x 48 (2304 PCRs)
-Ready for next-gen sequencing
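As a rough illustration of what 48 x 48 means, the sketch below pairs every sample with every primer pair, giving one PCR per combination. The sample and primer-pair names are hypothetical placeholders, not identifiers from the study.

```python
from itertools import product

# Hypothetical names: 48 samples and 48 primer pairs loaded on one array.
samples = [f"sample_{i:02d}" for i in range(1, 49)]
primer_pairs = [f"primer_pair_{j:02d}" for j in range(1, 49)]

# Every sample meets every primer pair in its own reaction chamber.
reactions = list(product(samples, primer_pairs))

print(len(reactions))  # 2304
```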

Microfluidic PCR

Modified from: http://www.dddmag.com/sites/dddmag.com/files/legacyimages/Articles/2009_11/fluidigm.jpg

Primer: forward & reverse
Conserved sequence
Barcodes
Sequencing adaptors

-4 primer reaction

-Dual barcodes and adapters are incorporated in the reaction

-No need for library preparation!

Microfluidic PCR

Primer design criteria

700 bp

-Variable regions between 400-900 bp
-Conserved flanking regions
-Every primer has the same annealing temperature (60°C)

Validation outcomes: Success, Dimer, Fail

1000 Plants Project (1KP)
MarkerMiner

Chloroplast data

-Six complete plastomes (via long PCR)

-Most variable regions in the chloroplast

-Designed 74 primer pairs

-53 primer pairs were successfully validated
-72% success rate
-The 48 most informative ones were chosen

average variability 2.7% (0.8%-7.5%)

LSC: Large Single Copy
IRB: Inverted Repeat
SSC: Small Single Copy

Chloroplast data

-Low coverage genomic data

-Shotgun sequencing of four samples (three species)
-HiSeq 2000, 100 bp paired-end reads

Nuclear data

Orthology, yes!

-Compared our reads to public databases: PPR gene family, COSII

-Pipeline: BLAT (keeps reads and gene), MAFFT, IntronFinder from SolGenomics

Nuclear data

Target gene: F primer and R primer placed in exons, 400-800 bp between them

Raw reads

Data Processing

Raw reads

-Trimming (optional)

-Different values for R1 and R2

-Merge reads

-Min. 20 bp overlap

-Red colors are joined reads

-Grey colors are unpaired

-Very little missing data
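The merging step with a minimum 20 bp overlap can be sketched as follows. This is a simplified illustration, not the tool used in the talk: it ignores base qualities, assumes R2 is the reverse-complement strand of the same fragment, and the function names and the `max_mismatch` parameter are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(r1: str, r2: str, min_overlap: int = 20, max_mismatch: int = 0):
    """Join R1 and R2 if their ends overlap by at least min_overlap bases.

    Returns the merged sequence, or None when the pair stays unpaired.
    """
    r2_rc = reverse_complement(r2)
    # Try the longest possible overlap first, down to the minimum allowed.
    for olap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        tail, head = r1[-olap:], r2_rc[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            return r1 + r2_rc[olap:]
    return None


# Illustrative 60 bp fragment read from both ends with 40 bp reads.
fragment = "ATGGCGTACCTGAATCGGCTAAGTCCGATTGACCGTAAGCTTGGCATCCAGTTAGCGGAT"
r1 = fragment[:40]
r2 = reverse_complement(fragment[20:])
merged = merge_pair(r1, r2)
```

Pairs for which `merge_pair` returns `None` correspond to the unpaired (grey) reads; a real merger would also weigh base qualities in the overlap.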


Sample 1, Sample 2, Sample 3

Raw reads

-Split reads into samples by dual barcodes (demultiplexing)
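A minimal sketch of demultiplexing by dual barcodes, assuming each read pair arrives with its two barcode sequences already extracted and that only exact barcode matches are accepted. The record layout, barcode sequences, and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(records, sample_by_barcodes):
    """Group read pairs by sample using the (barcode1, barcode2) combination.

    records: iterable of (barcode1, barcode2, r1, r2) tuples.
    sample_by_barcodes: dict mapping (barcode1, barcode2) to a sample name.
    Read pairs whose barcode combination is unknown are skipped.
    """
    by_sample = defaultdict(list)
    for bc1, bc2, r1, r2 in records:
        sample = sample_by_barcodes.get((bc1, bc2))
        if sample is not None:
            by_sample[sample].append((r1, r2))
    return dict(by_sample)


# Hypothetical barcode sheet and reads.
sheet = {
    ("ACGTACGT", "TTGGCCAA"): "Sample 1",
    ("ACGTACGT", "GGAATTCC"): "Sample 2",
}
records = [
    ("ACGTACGT", "TTGGCCAA", "AAAA", "CCCC"),
    ("ACGTACGT", "GGAATTCC", "GGGG", "TTTT"),
    ("TTTTTTTT", "TTGGCCAA", "ACAC", "GTGT"),  # unknown combination
]
reads_by_sample = demultiplex(records, sheet)
```

Because a sample is identified by the pair of barcodes rather than by either one alone, the same first barcode can be reused across samples, as in the two sheet entries above.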

Region 1 Region 2 Region 3

Sample 1

-Split reads into amplicons by primers

-Up to 2 primer mismatches

-4 last bp of primers must match to produce clean ends
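The two primer-matching rules above can be sketched as a small check. This assumes the primer sits at the very start of the read and contains no ambiguity codes; the primer sequences, region names, and function names are hypothetical.

```python
def primer_matches(read: str, primer: str, max_mismatch: int = 2, exact_3prime: int = 4) -> bool:
    """True if the read starts with the primer under the two rules:
    at most max_mismatch mismatches overall, and the last exact_3prime
    bases of the primer matching exactly."""
    if len(read) < len(primer):
        return False
    window = read[: len(primer)]
    if window[-exact_3prime:] != primer[-exact_3prime:]:
        return False
    mismatches = sum(a != b for a, b in zip(window, primer))
    return mismatches <= max_mismatch


def assign_region(read: str, primers: dict):
    """Return (region, read without primer) for the first matching primer, else None."""
    for region, primer in primers.items():
        if primer_matches(read, primer):
            return region, read[len(primer):]
    return None


# Hypothetical forward primers for two regions.
primers = {
    "Region 1": "ACGTACGTACGTGGCC",
    "Region 2": "TTGACCGATTGCAAGT",
}
# Two mismatches in the primer (positions 1 and 8), exact last 4 bp.
hit = assign_region("TCGTACGAACGTGGCC" + "TTTT", primers)
```

Requiring an exact match at the 3' end of the primer means the primer can be cut off at a well-defined position, which is one way to read "clean ends".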

Sample 1 - Region 1

40%, 40%, 15%, 5%

-Minimum 5 reads and 5% of all reads

Sample 1 - Region 1

21%, 21%, 21%, 21%, 12.5%, 4.1%

-Minimum 5 reads and 5% of all reads
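A minimal sketch of the read-count filter, assuming the percentages are each distinct sequence's share of the reads for one sample and region, and that both thresholds must be met. The function name and the 24-read example are illustrative; counts of 5, 5, 5, 5, 3 and 1 give shares of about 21%, 21%, 21%, 21%, 12.5% and 4%.

```python
from collections import Counter


def filter_variants(reads, min_reads: int = 5, min_fraction: float = 0.05):
    """Keep distinct sequences supported by at least min_reads reads
    AND at least min_fraction of all reads for this sample and region."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {
        seq: n
        for seq, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    }


# Illustrative reads: letters stand in for distinct sequences.
reads = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5 + ["E"] * 3 + ["F"] * 1
kept = filter_variants(reads)
```

Here sequence E is dropped even though it holds 12.5% of the reads, because it has only 3 reads; F fails both thresholds.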

Neobartsia - Orobanchaceae (Uribe-Convers et al. in prep; UIdaho)

576 samples
Nuclear: 21 PPR, 24 COSII, 1 ITS, 1 ETS, 1 Phototropin2
Chloroplast: 48 most variable regions
Total: ~50,000 bp

Gene Family     No. Primer Pairs    Validated Primer Pairs    Success rate (%)
PPR             44                  26                        59.09
COSII           130                 25                        19.23
ITS             4                   3                         75
ETS             4                   4                         100
Phototropin1    3                   0                         0
Phototropin2    3                   3                         100
Total           188                 61                        32.44
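The success rates in the table are validated primer pairs divided by designed primer pairs. The short sketch below recomputes them from the counts given above; the variable names are illustrative.

```python
# (designed primer pairs, validated primer pairs) per gene family, from the table.
primer_counts = {
    "PPR": (44, 26),
    "COSII": (130, 25),
    "ITS": (4, 3),
    "ETS": (4, 4),
    "Phototropin1": (3, 0),
    "Phototropin2": (3, 3),
}

# Success rate in percent, rounded to two decimals.
success_rate = {
    family: round(100 * validated / designed, 2)
    for family, (designed, validated) in primer_counts.items()
}

total_designed = sum(designed for designed, _ in primer_counts.values())
total_validated = sum(validated for _, validated in primer_counts.values())
total_rate = round(100 * total_validated / total_designed, 2)
```

Rounded to two decimals, 61/188 comes out at 32.45%; the 32.44 in the table appears to be truncated rather than rounded.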

Castilleja - Orobanchaceae

96 samples
Nuclear: In primer design
Chloroplast: 48 most variable regions
Total: ~25,000 bp

[Figure: tree of Castilleja samples; tip labels omitted]

Branch support: BS ≥ 75%, BS ≥ 90%, BS = 100%

Taxa: C. affinis var. affinis, C. affinis var. neglecta, C. affinis var. inflata, C. affinis var. contentiosa, C. affinis var. insularis, C. wightii, C. mendocinensis, C. latifolia, C. litoralis

Clades labeled A-G

Groups: Castilleja affinis vars. affinis/neglecta/inflata; Castilleja mendocinensis / C. wightii; Castilleja latifolia; Castilleja affinis var. contentiosa; Castilleja wightii; Castilleja affinis var. insularis; Castilleja litoralis / C. mendocinensis

Tank et al. in prep

Lachemilla - Rosaceae (Diego Morales-Briones et al., UIdaho)

288 samples
Nuclear: 48 genes
Chloroplast: 48 most variable regions
Total: ~55,000 bp

Autopolyploidy Allopolyploidy

Cucurbita - Cucurbitaceae (Heather-Rose Kates et al.; UFlorida)

22 species
Nuclear: 48 genes

Draba (Brassicaceae) and Solanum (Solanaceae) (Ingrid Jordon-Thaden et al.; Bucknell University)

Nuclear: Genes based on transcriptomes using MarkerMiner

Acknowledgments

Tank lab: Diego Morales-Briones, Hannah Marx, Sarah Jacobs, Maribeth Latvis
IBEST: Sam Hunter, Dan New, Tamara Max

@uribe_convers
