Biochip


1. Algorithms for Biochip Design and Optimization

Ion Mandoiu
Computer Science & Engineering Department
University of Connecticut

2. Overview

  • Physical design of DNA arrays
  • DNA tag set design
  • Digital microfluidic biochip testing
  • Conclusions

3. Driver Biochip Applications

  • Driver applications
    • Gene expression (transcription analysis)
    • SNP genotyping
    • CNP analysis
    • Genomic-based microorganism identification
    • Point-of-care diagnosis
    • Healthcare, forensics, environmental monitoring, …
  • As the focus shifts from basic research to clinical applications, design requirements on sensitivity, specificity, and cost become increasingly stringent
    • Assay design and optimization become critical

4. Single Nucleotide Polymorphisms

  • Human genome $\approx 3 \times 10^9$ base pairs
  • Main form of variation between individual genomes: single nucleotide polymorphisms (SNPs)
  • Total number of SNPs $\approx 1 \times 10^7$
  • Difference between any two individuals $\approx 3 \times 10^6$ SNPs ($\approx 0.1\%$ of the entire genome)

[Figure: three aligned genomic sequences differing at SNP positions]
  ataggtcc C tatttcgcgc C gtatacacggg T ctata
  ataggtcc G tatttcgcgc A gtatacacggg A ctata
  ataggtcc C tatttcgcgc C gtatacacggg T ctata

5. Watson-Crick Complementarity

  • Four nucleotide types: A, C, T, G
    • A pairs with T (2 hydrogen bonds)
    • C pairs with G (3 hydrogen bonds)
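The pairing rules above determine which strand a probe captures. A minimal Python sketch (function names are illustrative, not from the slides) that builds the perfectly complementary strand and counts the hydrogen bonds of the resulting duplex, assuming sequences are written 5'→3' so the partner strand is the reverse complement:

```python
# Watson-Crick pairing: A-T (2 hydrogen bonds), C-G (3 hydrogen bonds)
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq: str) -> str:
    """Strand that pairs perfectly with seq (both read 5'->3', antiparallel)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds in the perfect duplex of seq with its complement."""
    return sum(H_BONDS[b] for b in seq.upper())


if __name__ == "__main__":
    probe = "ACTCGA"
    print(reverse_complement(probe))  # TCGAGT
    print(hydrogen_bonds(probe))      # 15
```

C/G-rich strands form more hydrogen bonds, which is why they bind more stably than A/T-rich strands of the same length.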

6. SNP Genotyping via Direct Hybridization

  • SNP1 with alleles T/G
  • SNP2 with alleles A/G

[Figure: array with 2 probes per SNP and a labeled sample; optical scanning is used to identify the alleles present in the sample]

7. In-Place Probe Synthesis

[Figure: probes to be synthesized (CGACCGACACG AGGAGC); step 1, nucleotide A deposited]

8. In-Place Probe Synthesis

[Figure: step 2, nucleotide C deposited]

9. In-Place Probe Synthesis

[Figure: step 3, nucleotide G deposited]

10. Simplified DNA Array Flow

[Figure: Design (Probe Selection; Physical Design: Probe Placement & Embedding) → Manufacturing (Mask Manufacturing; Array Manufacturing) → End User (Hybridization Experiment; Analysis of Hybridization Intensities), yielding gene expression levels, SNP genotypes, …]

11. Unwanted Illumination Effect

  • Unintended illumination during manufacturing → synthesis of erroneous probes
  • Effect gets worse with technology scaling

12. Border Length Minimization Objective

  • Effects of unintended illumination are measured by border length

[Figure: synthesis steps A, C, G for probes CGACCGACACG AGGAGC, with a mask border between adjacent cells highlighted]

13. Synchronous Synthesis

  • Periodic deposition sequence, e.g., $(ACTG)^k$
    • Each probe grown by one nucleotide in each period
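Under synchronous synthesis, adjacent probes have exactly 2 × Hamming distance border conflicts. The Python sketch below checks this directly, assuming equal-length probes over {A, C, G, T} and the deposition sequence $(ACTG)^k$; the function names are hypothetical. A border conflict is counted at every deposition step where exactly one of two adjacent cells is exposed.

```python
from __future__ import annotations

PERIOD = "ACTG"  # one period of the deposition sequence (ACTG)^k


def synchronous_embedding(probe: str) -> list[bool]:
    """Exposure pattern: True at each deposition step where the cell is unmasked.

    The probe grows by exactly one nucleotide per period, so in period i the
    cell is exposed only at the step that deposits probe[i].
    """
    exposure: list[bool] = []
    for base in probe:
        exposure.extend(nt == base for nt in PERIOD)
    return exposure


def border_conflicts(e1: list[bool], e2: list[bool]) -> int:
    """Deposition steps at which exactly one of two adjacent cells is exposed."""
    return sum(a != b for a, b in zip(e1, e2))


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


if __name__ == "__main__":
    p, q = "ACGT", "AGGA"
    e_p, e_q = synchronous_embedding(p), synchronous_embedding(q)
    print(border_conflicts(e_p, e_q), 2 * hamming(p, q))  # 4 4
```

Each position where the probes differ costs two steps in that period: one where only the first cell is exposed and one where only the second is. Equal positions cost nothing.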

Number of border conflicts between adjacent probes = 2 × Hamming distance

[Figure: two probes grown period by period under the periodic deposition sequence]

14. 2D Placement Problem

  • Find minimum cost mapping of the Hamming graph onto the grid graph
  • Special case of the Quadratic Assignment Problem
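The placement objective can be evaluated in a few lines. This Python sketch (hypothetical helper names) assumes synchronous synthesis, equal-length probes, and a 4-neighbour grid, and sums 2 × Hamming distance over every pair of horizontally or vertically adjacent cells. It shows that merely swapping two probes changes the cost.

```python
from __future__ import annotations


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def placement_cost(grid: list[list[str]]) -> int:
    """Total border conflicts of a placement: 2 x Hamming per grid edge."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbour
                cost += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # lower neighbour
                cost += 2 * hamming(grid[r][c], grid[r + 1][c])
    return cost


if __name__ == "__main__":
    good = [["AAAA", "AAAT"], ["AATT", "ATTT"]]
    bad = [["AAAA", "ATTT"], ["AATT", "AAAT"]]  # two probes swapped
    print(placement_cost(good), placement_cost(bad))  # 12 16
```

Finding the grid assignment that minimises this sum is the Quadratic Assignment instance described above, which is why the following slides use heuristics.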

Edge cost = 2 × Hamming distance between probes

15. 2D Placement: Sliding-Window Matching

  • Slide window over entire chip
  • Repeat a fixed number of iterations ($O(N)$ time for a fixed window size), or until the improvement drops below a given threshold
  • Proposed by [Doll et al. 94] in VLSI context
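A compact sketch of the sliding-window step in Python, under simplifying assumptions: a 4-neighbour grid, border cost 2 × Hamming distance, and "mutually nonadjacent" cells chosen as the cells of one checkerboard parity inside the window. Because no two chosen cells touch, each probe's cost depends only on fixed neighbours, so the re-assignment is a plain assignment problem. Here it is solved exactly by brute force over permutations, which is only practical for small windows; a bipartite matching solver would take its place at realistic sizes. All names are hypothetical.

```python
from __future__ import annotations

from itertools import permutations


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def placement_cost(grid: list[list[str]]) -> int:
    """Sum of 2 x Hamming distance over all grid-adjacent cell pairs."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                cost += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                cost += 2 * hamming(grid[r][c], grid[r + 1][c])
    return cost


def neighbor_cost(grid: list[list[str]], r: int, c: int, probe: str) -> int:
    """Border conflicts `probe` would have with the 4 neighbours of cell (r, c)."""
    cost = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            cost += 2 * hamming(probe, grid[rr][cc])
    return cost


def improve_window(grid: list[list[str]], r0: int, c0: int, w: int, parity: int) -> None:
    """Optimally re-assign probes on same-parity (hence nonadjacent) window cells."""
    cells = [(r, c)
             for r in range(r0, min(r0 + w, len(grid)))
             for c in range(c0, min(c0 + w, len(grid[0])))
             if (r + c) % 2 == parity]
    probes = [grid[r][c] for r, c in cells]
    # Brute force over all assignments; the first permutation is the current
    # one, so min() keeps it on ties and the cost never increases.
    best = min(permutations(probes),
               key=lambda perm: sum(neighbor_cost(grid, r, c, p)
                                    for (r, c), p in zip(cells, perm)))
    for (r, c), p in zip(cells, best):
        grid[r][c] = p


def sliding_window_pass(grid: list[list[str]], w: int = 3) -> None:
    """Slide a w x w window over the whole chip, once for each cell parity."""
    for parity in (0, 1):
        for r0 in range(len(grid)):
            for c0 in range(len(grid[0])):
                improve_window(grid, r0, c0, w, parity)


if __name__ == "__main__":
    chip = [["AAAA", "TTTT", "AAAT"],
            ["TTTA", "AATT", "ATTT"],
            ["ATAT", "TATA", "AAAA"]]
    before = placement_cost(chip)
    sliding_window_pass(chip)
    print(before, "->", placement_cost(chip))
```

Repeating `sliding_window_pass` corresponds to the fixed number of iterations mentioned above; each pass is monotone, so it can also stop once the gain falls below a threshold.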

[Figure: (1) select mutually nonadjacent probes from a small window; (2) re-assign them optimally]

16. 2D Placement: Epitaxial Growth

  • Proposed by [PreasL88, ShahookarM91] in VLSI context
  • Simulates crystal growth
  • Efficient row implementation
    • Use lexicographical sorting for initial ordering of probes
    • Fill cells row-by-row
    • Bound number of candidate probes considered when filling each cell
    • Constant number of lookahead rows → $O(N^{3/2})$ runtime, N = number of probes
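A simplified Python sketch of the row-based epitaxial steps listed above. Assumptions (not taken from the slides): a 4-neighbour grid, border cost 2 × Hamming distance against the already placed left and upper neighbours, and a candidate set made of the first `lookahead_rows * cols` unplaced probes in sorted order. The names are hypothetical. On a square chip each cell then examines $O(\sqrt{N})$ candidates, which is consistent with the $O(N^{3/2})$ bound for probes of fixed length.

```python
from __future__ import annotations


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def row_epitaxial(probes: list[str], rows: int, cols: int,
                  lookahead_rows: int = 1) -> list[list[str]]:
    """Fill cells row by row, choosing among a bounded set of sorted candidates."""
    if len(probes) != rows * cols:
        raise ValueError("need exactly one probe per cell")
    remaining = sorted(probes)  # lexicographic initial ordering
    grid: list[list[str]] = [[""] * cols for _ in range(rows)]
    budget = max(1, lookahead_rows * cols)  # bound on candidates per cell
    for r in range(rows):
        for c in range(cols):
            def cost(p: str) -> int:
                # Conflicts with neighbours that are already placed.
                total = 0
                if c > 0:
                    total += 2 * hamming(p, grid[r][c - 1])
                if r > 0:
                    total += 2 * hamming(p, grid[r - 1][c])
                return total
            best = min(range(min(budget, len(remaining))),
                       key=lambda k: cost(remaining[k]))
            grid[r][c] = remaining.pop(best)
    return grid


if __name__ == "__main__":
    chip = row_epitaxial(["TTTT", "AAAA", "ATTT", "AAAT", "AATT", "TTTA"], 2, 3)
    for row in chip:
        print(row)
```

Larger `lookahead_rows` values trade runtime for placement quality, since more candidates are compared for every cell.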

17. 2D Placement: Recursive Partitioning

  • Very effective in VLSI placement [AlpertK95, Caldwell et al. 00]
  • 4-way partition using linear time clustering
  • Repeat until Row-Epitaxial can be applied

18. Asynchronous Synthesis

[Figure: deposition sequence ACTG repeated; the same probes shown with a synchronous embedding and with an ASAP embedding]

19. Optimal Single-Probe Re-Embedding

  • Efficient solution by dynamic programming

[Figure: source-to-sink graph over the deposition sequence A C T A C G T A C G T]

20. In-Place Re-Embedding Algorithms

  • 2D placement fixed, allow only probe embeddings to change
    • Greedy: optimally re-embed the probe with the largest gain
    • Chessboard: alternately re-embed the probes in black and white cells
    • Sequential: re-embed probes row by row
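All three in-place strategies repeatedly solve the single-probe problem from slide 19: with the neighbours' embeddings fixed, choose the deposition steps at which a probe's nucleotides are added so as to minimise border conflicts. Below is a Python sketch of that dynamic program (equivalently, a shortest source-to-sink path), together with the ASAP embedding from slide 18 as a baseline. Assumptions: embeddings are boolean exposure vectors with one entry per deposition step, a conflict is a step where two adjacent cells differ, and all names are hypothetical.

```python
from __future__ import annotations

INF = float("inf")


def asap_embedding(probe: str, deposition: str) -> list[bool]:
    """ASAP: synthesize each nucleotide at the earliest possible step."""
    exposure: list[bool] = []
    i = 0
    for nt in deposition:
        use = i < len(probe) and nt == probe[i]
        exposure.append(use)
        if use:
            i += 1
    if i < len(probe):
        raise ValueError("probe is not a subsequence of the deposition sequence")
    return exposure


def conflicts(exposure: list[bool], neighbors: list[list[bool]]) -> int:
    """Border conflicts between one cell and its fixed neighbours."""
    return sum(a != b for e in neighbors for a, b in zip(exposure, e))


def optimal_reembedding(probe: str, deposition: str, neighbors: list[list[bool]]):
    """Return (min conflicts, exposure vector) for one probe, neighbours fixed."""
    T, L = len(deposition), len(probe)
    exposed_nb = [sum(e[t] for e in neighbors) for t in range(T)]
    masked_cost = exposed_nb                              # masked vs exposed neighbours
    exposed_cost = [len(neighbors) - k for k in exposed_nb]  # exposed vs masked ones
    # dp[t][i]: min cost over steps 0..t-1 with probe[:i] already synthesized.
    dp = [[INF] * (L + 1) for _ in range(T + 1)]
    came_exposed = [[False] * (L + 1) for _ in range(T + 1)]
    dp[0][0] = 0
    for t in range(T):
        for i in range(L + 1):
            if dp[t][i] == INF:
                continue
            if dp[t][i] + masked_cost[t] < dp[t + 1][i]:  # keep the cell masked
                dp[t + 1][i] = dp[t][i] + masked_cost[t]
                came_exposed[t + 1][i] = False
            if i < L and deposition[t] == probe[i]:       # add probe[i] at step t
                if dp[t][i] + exposed_cost[t] < dp[t + 1][i + 1]:
                    dp[t + 1][i + 1] = dp[t][i] + exposed_cost[t]
                    came_exposed[t + 1][i + 1] = True
    if dp[T][L] == INF:
        raise ValueError("probe is not a subsequence of the deposition sequence")
    exposure, i = [], L
    for t in range(T, 0, -1):  # trace the optimal path back from the sink
        exposed = came_exposed[t][i]
        exposure.append(exposed)
        if exposed:
            i -= 1
    return dp[T][L], exposure[::-1]


if __name__ == "__main__":
    dep = "ACTG" * 3
    nbrs = [asap_embedding("CTG", dep), asap_embedding("AGT", dep)]
    print("ASAP:", conflicts(asap_embedding("ACG", dep), nbrs))
    print("optimal:", optimal_reembedding("ACG", dep, nbrs)[0])
```

The table has (T + 1)(L + 1) states with two transitions each, so one re-embedding takes O(T·L) time for a deposition sequence of length T and a probe of length L.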

[Table: CPU time and %LB for the Chessboard, Greedy, and Sequential re-embedding algorithms on chips of size 100 and 500]

21. Integration with Probe Selection

[Figure: Probe Selection → Probe Pools → Physical Design: Placement & Embedding]

Chip size 100x100, Row-Epitaxial placement:

  Pool size   CPU sec.   % Improv.
  1            217       -
  2           1040       4.3
  4           1796       8.2
  8           3645       11.8
  16          7515       15.2

22. Overview

  • Physical design of DNA arrays
  • DNA tag set design
  • Digital microfluidic biochip testing
  • Conclusions

23. Universal Tag Arrays

  • Brenner 97, Morris et al. 98
    • Array consisting of application-independent tags
    • Two-part reporter probes: application-specific primers ligated to antitags
    • Detection carried out by a sequence of reactions that separately involve the primer and the antitag parts of the reporter probes

24. Universal Tag Array Advantages

  • Cost effective
    • Same tag array used for different analyses
    • Can be mass-produced
    • Only need to synthesize new set of reporter probes
  • More reliable!
    • Solution-phase hybridization is better understood than hybridization on a solid support

25. SNP Genotyping with Tag Arrays

[Figure: reporter probe made of a tag and a primer, with the matching antitag]

  • Step 1: Mix reporter probes with unlabeled genomic DNA
  • Step 2: Solution-phase hybridization
  • Step 3: Single-base extension (SBE)
  • Step 4: Solid-phase hybridization

26. Tag Set Design Problem

  • (H1) Tags hybridize strongly to complementary antitags
  • (H2) No tag hybridizes to a non-complementary antitag

[Figure: tags t1, t2 and their antitags]

Tag Set Design Problem: find a maximum-cardinality set of tags satisfying (H1) and (H2).

27. Hybridization Models

  • Melting temperature Tm: the temperature at which 50% of duplexes are in the hybridized state
  • 2-4 rule
  • Tm = 2 #(As and Ts) + 4 #(Cs and Gs)
  • More accurate models exist, e.g., the nearest-neighbor model
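The 2-4 rule is simple enough to state as code. A minimal Python sketch (the function name is hypothetical); the result is the usual rough estimate in °C for short oligonucleotides, and it ignores the sequence context that the nearest-neighbor model accounts for:

```python
def tm_2_4(seq: str) -> int:
    """Melting temperature by the 2-4 rule: 2 per A/T plus 4 per C/G."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("C") + s.count("G"))


if __name__ == "__main__":
    print(tm_2_4("CCAGATT"))  # 4 A/T and 3 C/G: 2*4 + 4*3 = 20
```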

28. Hybridization Models (contd.)

  • Hamming distance model, e.g., [Marathe et al. 01]
    • Models rigid DNA strands
  • LCS/edit distance model, e.g., [Torney et al. 03]
    • Models infinitely elastic DNA strands
  • c-token model [Ben-Dor et al. 00]:
    • Duplex formation requires the formation of a nucleation complex between perfectly complementary substrings
    • Nucleation complex must have weight $\geq c$, where wt(A) = wt(T) = 1 and wt(C) = wt(G) = 2 (2-4 rule)

29. c-h Code Problem

  • c-token: left-minimal DNA string of weight $\geq c$, i.e.,
    • $w(x) \geq c$
    • $w(x') < c$ for every proper suffix $x'$ of $x$
  • A set of tags is a c-h code if
    • (C1) Every tag has weight $\geq h$
    • (C2) Every c-token is used at most once
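Because a c-token is left-minimal, each end position of a tag determines at most one c-token: the shortest substring ending there with weight at least c. A Python sketch of token extraction and of the c-h code test, assuming (C1) means weight ≥ h as stated above; the function names are hypothetical. The example reproduces the token list of CCAGATT for c = 4 shown on slide 31.

```python
from __future__ import annotations

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}  # 2-4 rule weights


def weight(s: str) -> int:
    return sum(WEIGHT[b] for b in s)


def c_tokens(tag: str, c: int) -> list[str]:
    """c-token ending at each position (shortest suffix of weight >= c)."""
    tokens = []
    for end in range(1, len(tag) + 1):
        w = 0
        for start in range(end - 1, -1, -1):  # grow the substring leftwards
            w += WEIGHT[tag[start]]
            if w >= c:
                tokens.append(tag[start:end])
                break  # positions whose whole prefix weighs < c get no token
    return tokens


def is_ch_code(tags: list[str], c: int, h: int) -> bool:
    """(C1) every tag has weight >= h; (C2) no c-token is used twice."""
    seen: set[str] = set()
    for tag in tags:
        if weight(tag) < h:
            return False
        for tok in c_tokens(tag, c):
            if tok in seen:
                return False
            seen.add(tok)
    return True


if __name__ == "__main__":
    print(c_tokens("CCAGATT", 4))  # ['CC', 'CCA', 'CAG', 'AGA', 'GAT', 'GATT']
```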

c-h Code Problem [Ben-Dor et al. 00]: given c and h, find a maximum-cardinality c-h code.

30. Algorithms for c-h Code Problem

  • [Ben-Dor et al. 00] approximation algorithm based on de Bruijn sequences
  • Alphabetic tree search algorithm
    • Enumerate candidate tags in lexicographic order, save tags whose c-tokens are not used by previously selected tags
    • Easily modified to handle various combinations of constraints
  • [MT 05, 06] Optimum c-h codes can be computed in practical time for small values of c by using integer programming
    • Practical runtime using Garg-Könemann approximation and LP-rounding
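The alphabetic tree search can be sketched as a depth-first enumeration of tags in lexicographic order that keeps a tag whenever none of its c-tokens has been used before. The Python sketch below assumes tags of a fixed length and the constraints (C1) weight ≥ h and (C2) unique c-tokens; the name `alphabetic_tree_search` and the pruning rules are illustrative, not the authors' implementation. Pruning works because the c-token ending at a position depends only on the prefix up to that position.

```python
from __future__ import annotations

ALPHABET = "ACGT"                          # lexicographic order
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}  # 2-4 rule weights


def last_token(prefix: str, c: int) -> str | None:
    """The c-token ending at the last position of prefix, if any."""
    w = 0
    for start in range(len(prefix) - 1, -1, -1):
        w += WEIGHT[prefix[start]]
        if w >= c:
            return prefix[start:]
    return None


def alphabetic_tree_search(length: int, c: int, h: int) -> list[str]:
    """Greedy c-h code of fixed-length tags via lexicographic DFS with pruning."""
    used: set[str] = set()  # c-tokens of tags selected so far
    code: list[str] = []

    def dfs(prefix: str, tokens: list[str], w: int) -> None:
        # A tag selected in an earlier branch may have claimed one of our tokens.
        if any(t in used for t in tokens):
            return
        # Even an all-C/G completion cannot reach weight h.
        if w + 2 * (length - len(prefix)) < h:
            return
        if len(prefix) == length:
            code.append(prefix)
            used.update(tokens)
            return
        for nt in ALPHABET:
            p = prefix + nt
            tok = last_token(p, c)
            if tok is not None and (tok in used or tok in tokens):
                continue  # prune: this c-token is already taken
            dfs(p, tokens + ([tok] if tok is not None else []), w + WEIGHT[nt])

    dfs("", [], 0)
    return code


if __name__ == "__main__":
    tags = alphabetic_tree_search(length=4, c=4, h=6)
    print(len(tags), tags[:5])
```

Extra constraints (for example, forbidding particular substrings) can be added as further pruning tests inside the loop, in line with the remark that the search is easily modified.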

31. Token Content of a Tag

  • c=4
  • CCAGATT
  • CC
  • CCA
  • CAG
  • AGA
  • GAT
  • GATT

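Tag → sequence of c-tokens:

  End pos:   2    3     4     5     6     7
  c-token:   CC   CCA   CAG   AGA   GAT   GATT

32. Layered c-token graph for length-l tags

[Figure: layered graph from source s to sink t with c-token nodes c1, …, cN arranged in layers c/2, (c/2)+1, …, l-1, l]

33. Integer Program Formulation [MPT05]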

  • Maximum