Genomic Signal Processing

Dr. C.Q. Chang

Dept. of EEE

Outline

• Basic Genomics

• Signal Processing for Genomic Sequences

• Signal Processing for Gene Expression

• Resources and Co-operations

• Challenges and Future Work

Basic Genomics

Genome

• Every human cell contains 6 feet of double-stranded (ds) DNA

• This DNA has 3,000,000,000 base pairs representing 50,000-100,000 genes

• This DNA contains our complete genetic code or genome

• DNA regulates all cell functions including response to disease, aging and development

• Gene expression pattern: snapshot of DNA in a cell

• Gene expression profile: DNA mutation or polymorphism over time

• Genetic pathways: changes in genetic code accompanying metabolic and functional changes, e.g. disease or aging.

Gene: protein-coding DNA

DNA → (transcription) → mRNA → (translation) → Protein

DNA: CCTGAGCCAACTATTGATGAA

mRNA: CCUGAGCCAACUAUUGAUGAA

Protein: PEPTIDE
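
The two steps on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the DNA string shown is the coding strand (so transcription reduces to replacing T with U) and using the standard genetic code. The helper names `transcribe` and `translate` are hypothetical and do not come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (assumption: the input is the coding strand)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> one-letter amino-acid string, reading frame starting at position 0.

    Stop codons are emitted as "*" rather than ending translation.
    """
    dna = mrna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

Shifting the start index of `translate` by one or two positions would give the other two reading frames on the same strand.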

In more detail (color ~ state)

Signal Processing for Genomic Sequences

The Data Set

The Problem

• Genomic information is digital: sequences of the letters A, T, C and G

• Signal processing deals with numerical sequences, so character strings have to be mapped into one or more numerical sequences

• Identification of protein coding regions

• Prediction of whether or not a given DNA segment is a part of a protein coding region

• Prediction of the proper reading frame

• Compared with traditional methods, signal processing methods are much quicker, and can be even more accurate in some cases.

Sequence to signal mapping

$a = 1 + j,\quad t = 1 - j,\quad c = -1 - j,\quad g = -1 + j$

$y[n] = x[n] + x[n-1]/2 + x[n-2]/4$
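
A minimal sketch of this mapping and filter in plain Python follows. The signs are not recoverable from the slide text, so the code assumes the commonly used complex assignment a = 1 + j, t = 1 - j, c = -1 - j, g = -1 + j and a three-tap FIR filter with coefficients 1, 1/2 and 1/4. Both are assumptions, and the names `to_signal` and `fir_filter` are hypothetical.

```python
# Assumed mapping (signs are an assumption): a = 1+j, t = 1-j, c = -1-j, g = -1+j.
NUCLEOTIDE_VALUE = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}


def to_signal(seq):
    """Map a DNA string to a list of complex samples x[n]."""
    return [NUCLEOTIDE_VALUE[base] for base in seq.upper()]


def fir_filter(x, taps=(1.0, 0.5, 0.25)):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0."""
    return [
        sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


x = to_signal("ATCG")
y = fir_filter(x)
print(x)
print(y)
```

With these taps, each output sample from n = 2 onward depends on exactly three consecutive nucleotides, i.e. one codon-sized window.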

Signal Analysis

• Spectral analysis (Fourier transform, periodogram)

• Spectrogram

• Wavelet analysis

• HMT: wavelet-based Hidden Markov Tree

• Spectral envelope (using optimal string to numerical value mapping)
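
As an illustration of the spectral analysis item, the sketch below evaluates the DFT of the four binary indicator sequences (one per base) at frequency 1/3 and sums the squared magnitudes. Protein-coding regions are widely reported to show a peak at this frequency. The slides do not specify the exact statistic, so this choice and the function name `period3_power` are assumptions.

```python
import cmath


def period3_power(seq):
    """Sum over A, T, C, G of |DFT of the base-indicator sequence at frequency 1/3|^2."""
    seq = seq.upper()
    total = 0.0
    for base in "ATCG":
        # Indicator sequence is 1 where seq[n] == base, else 0, so only
        # matching positions contribute to the DFT sum.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total


print(period3_power("ATG" * 30))  # strong period-3 component
print(period3_power("AT" * 45))   # period-2 sequence, essentially no period-3 component
```

Evaluating this statistic in a sliding window along a sequence gives a simple spectrogram-like track in which candidate coding regions stand out.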

Spectral envelope of the BNRF1 gene from the Epstein-Barr virus

(a) 1st section (1000bp), (b) 2nd section (1000bp),

(c) 3rd section (1000bp), (d) 4th section (954bp)

Conjecture: the 4th quarter is actually non-coding

Signal Processing for Gene Expression

Microarray Life Cycle

Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modeling

Taken from Schena & Davis

• Probes: cDNA clones → PCR product amplification and purification → printing (0.1 nl/spot) → microarray

• Target: mRNA → hybridise target to microarray

• Detection: excitation (laser 1, laser 2) → emission → scanning

• Analysis: overlay images and normalise
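
The normalisation step for a two-channel array can be sketched as follows. The slides do not say which method is used, so this example assumes global median normalisation of log2 ratios, one common choice, and assumes positive, background-corrected intensities. The name `normalised_log_ratios` is hypothetical.

```python
import math
from statistics import median


def normalised_log_ratios(red, green):
    """Per-spot log2(red / green), shifted so that the median log-ratio is zero."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    centre = median(log_ratios)
    return [value - centre for value in log_ratios]


# Hypothetical intensities for three spots from the two laser channels.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]
print(normalised_log_ratios(red, green))
```

Centring on the median rests on the assumption that most genes are not differentially expressed between the two samples.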

Image Segmentation

• Simple way: fixed circle method

• Advanced: fast marching level set segmentation

[Figure panels: Advanced, Fixed circle]
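
A minimal sketch of the fixed circle idea: every spot is measured as the mean intensity of the pixels inside a circle of the same preset radius around the spot centre. The image is a plain list of rows here, and the function name `fixed_circle_mean` and the sample values are hypothetical.

```python
def fixed_circle_mean(image, centre_row, centre_col, radius):
    """Mean intensity of all pixels within `radius` of the given spot centre."""
    values = [
        pixel
        for r, row in enumerate(image)
        for c, pixel in enumerate(row)
        if (r - centre_row) ** 2 + (c - centre_col) ** 2 <= radius ** 2
    ]
    return sum(values) / len(values)


# Hypothetical 5x5 image: a small bright spot (100) on a dark background (10).
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 100, 10, 10],
    [10, 100, 100, 100, 10],
    [10, 10, 100, 10, 10],
    [10, 10, 10, 10, 10],
]
print(fixed_circle_mean(image, 2, 2, radius=1))  # circle matches the spot
print(fixed_circle_mean(image, 2, 2, radius=2))  # circle too large, background leaks in
```

The second call shows the weakness of the method: when the real spot is smaller or differently shaped than the preset circle, background pixels dilute the estimate, which is what adaptive segmentation methods try to avoid.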

Clustering and filtering methods

Principal approaches:

• Hierarchical clustering (kdb trees, CART, gene shaving)

• K-means clustering

• Self organizing (Kohonen) maps

• Support vector machines

• Gene Filtering via Multiobjective Optimization

• Independent Component Analysis (ICA)

Validation approaches:

• Significance analysis of microarrays (SAM)

• Bootstrapping cluster analysis

• Leave-one-out cross-validation

• Replication (additional gene chip experiments, quantitative PCR)
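
As one concrete instance from the list, the sketch below implements K-means clustering (Lloyd's algorithm) on expression profiles in plain Python. Initialising the centroids from the first k profiles is a simplifying assumption made here for reproducibility, and the data values are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with squared Euclidean distance.

    Centroids are initialised from the first k points (an assumption; random
    or k-means++ initialisation is more usual in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles: 4 genes measured under 2 conditions.
profiles = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels, centroids = kmeans(profiles, k=2)
print(labels)
print(centroids)
```

The result depends on k and on the initial centroids, which is one reason the validation approaches listed on the slide matter.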

ICA for B-cell lymphoma data

Data: 96 samples of normal and malignant lymphocytes.

Results: scatter plots of 12 independent components

Comparison: closely related to the results of hierarchical clustering

Resources and Co-operations

Resources: databases on the internet such as

• GenBank

• Protein Data Bank

• Some small databases of microarray data

Co-operation needed:

• First-hand microarray data

• Biological experiments for validation

Challenges and Future Work

• Genomic signal processing opens a new signal processing frontier

• Sequence analysis: the signal is symbolic or categorical, so classical signal processing methods are not directly applicable

• The increasingly high dimensionality of genetic data sets and the complexity involved call for fast, high-throughput implementations of genomic signal processing algorithms

• Future work: spectral analysis of DNA sequences and clustering of microarray data. Modify classical signal processing methods, and develop new ones.