7/30/2019 Computational Methods in Bioinformatics-Dr Elshafei
March 16, 2004
Computational Methods in Bioinformatics
Dr. Moustafa Elshafei
Systems Engineering Department
Topics
What is Bioinformatics?
Introduction to molecular genetics
Some challenging problems
Review of the current computational techniques
Future approaches
Conclusion
What is Bioinformatics?
Bioinformatics is a management information system for molecular biology
Organization of a huge amount of information in gene banks and protein banks
Data mining and analysis tools
Modeling, interpreting and predicting biological activities
Introduction to molecular genetics
Molecules
Lipids
Proteins
Nucleus and Nucleolus
Plant cell: note the large nucleus and nucleolus in the centre of the cell.
Chromosomes and Genes
How long is DNA?
The DNA helix (2 nm wide) is wound around histone fibres of diameter 11 nm, then compacted into a 30 nm chromatin fibre, then coiled to a 700 nm diameter, and finally formed into chromosomes of 1400 nm diameter.
If the DNA strand of the human gene had a 1 mm diameter, it would stretch to 25 km. It would be wound, twisted, and coiled until it became a chromosome of 50 cm diameter and 4 m length.
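The scale model above can be checked with a few lines of arithmetic. This is a minimal sketch in Python, assuming the 2 nm helix width is what gets magnified to 1 mm; the variable names are illustrative only.

```python
# Scale model from the slide: a 2 nm wide DNA helix drawn as a 1 mm thread.
NM = 1e-9                                   # one nanometre in metres
helix_width_m = 2 * NM                      # real helix width
model_width_m = 1e-3                        # width in the scale model (1 mm)
scale = model_width_m / helix_width_m       # magnification factor

model_length_m = 25_000                     # 25 km in the scale model
real_length_m = model_length_m / scale      # implied real strand length

print(f"magnification: {scale:,.0f}x")
print(f"real strand length: {real_length_m * 100:.1f} cm")
```

Under these assumptions the magnification is about 500,000, so the 25 km model strand corresponds to roughly 5 cm of real DNA.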
Chromosomes
Chromosomes are the cellular components that contain genes; in animals and plants they are located in the nucleus.
Genes are the functional units of inheritance.
Genes are specific segments of DNA that code for specific proteins, which control cell structure and function.
The number of chromosomes varies from one organism to another:
Human 46, chicken 78, mouse 40, wheat 42, corn 20, fruit fly 8, scorpion 4
Genes & Genetics
Deoxyribonucleic acid (DNA)
A pair of sequences built from four nucleotides: cytosine (C), guanine (G), adenine (A), and thymine (T).
A pairs with T, and C pairs with G; the pairs are held together by hydrogen bonds.
TCTCGGCATTAGGGCCT
TCTCGGCATTAGGGCCT
AGAGCCGTAATCCCGGA
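The base-pairing rule (A with T, C with G) is enough to derive the second strand from the first. A minimal Python sketch, with function names chosen here for illustration:

```python
# Base-pairing rule: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the paired strand, position by position."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the paired strand written in the opposite direction."""
    return complement(strand)[::-1]

top = "TCTCGGCATTAGGGCCT"
print(top)
print(complement(top))
```

Applied to the strand shown above, `complement` reproduces the second line of the pair.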
Genome length in nucleotide pairs
Virus 5k
E. coli 4,700k
Human being 3,000,000k
Corn 4,500,000k
Salamander 72,500,000k
Genes and proteins
Genes are segments of DNA which code for proteins.
A segment of the DNA that codes for a specific protein is a structural gene.
Protein synthesis is also governed by a genetic code.
Every function in a cell is controlled by some kind of protein.
Proteins are formed as chains built from 20 amino acids.
Every three nucleotides form a codon.
Each of the 64 possible codons is mapped to Start, Stop, or one of the 20 amino acids.
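The count of 64 follows from 4 bases in 3 positions (4 x 4 x 4). The sketch below enumerates the codons and pairs them with the standard genetic code written as a compact 64-letter string in T, C, A, G order; the one-letter amino acid codes and the `*` symbol for Stop are conventions assumed here, not taken from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' marks Stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

codons = ["".join(p) for p in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

stop_codons = [c for c, aa in CODON_TABLE.items() if aa == "*"]
distinct_amino_acids = set(AMINO_ACIDS) - {"*"}

print(len(CODON_TABLE), "codons")
print(len(distinct_amino_acids), "amino acids")
print("stop codons:", stop_codons)
print("ATG codes for", CODON_TABLE["ATG"], "(also used as Start)")
```

Because 64 codons cover only 20 amino acids plus Stop, most amino acids are encoded by more than one codon.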
Protein Mapping
A protein consists of a chain of amino acids.
There are 20 amino acids.
Each amino acid is coded by three bases.
During protein synthesis, DNA is transcribed into mRNA, in which T is replaced by U.
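The mapping can be illustrated on a toy sequence. This sketch assumes the input is the coding strand (so the mRNA has the same letters with U in place of T) and uses a deliberately partial codon table that covers only the codons in the example; the example sequence is made up.

```python
def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA for the coding strand: replace T with U."""
    return coding_strand.upper().replace("T", "U")

# Partial mRNA codon table: only the codons used in the example below.
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "UGG": "Trp", "UAA": "Stop"}

def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a Stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

dna = "ATGGCTTGGTAA"          # hypothetical 4-codon gene
mrna = transcribe(dna)
print(mrna)
print("-".join(translate(mrna)))
```

A full translator would use the complete 64-entry table and would first locate the Start codon rather than assume the sequence begins in frame.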
Protein Expression
Gene length is between 30k and 250k bp; exon regions are 69-3106 bp. Introns can be as large as 32k bp.
The mean internal coding exon is 150 bp.
Eukaryotes have only 10% of their DNA coding for proteins. Humans may have as little as 1% coding for proteins. Viruses and prokaryotes use a great deal more of their DNA.
The Human Genome Project was completed in 2003: 3 billion bp and about 30,000 genes, compared to 13,600 for the fruit fly, over 14,000 genes in mosquitoes, and 50,000 in rice.
"If the number of genes really turns out to be about 30,000, then this can be a testament to the marvellous design of life. Only a genius could create us with so few genes performing so many functions."
A famous scientist in genetics.
An RNA gene is any gene that is not translated into a protein. Commonly used synonyms of "RNA gene" are noncoding RNA or ncRNA.
RNA genes code for certain regulatory functions.
RNA genes are not predictable by current algorithms. It is not clear how many of these are hidden in the human genome.
Gene Banks
Challenges
1- Gene finding: identify potential gene regions in DNA; however, only 1-3% of the human genome is translated into proteins.
2- Finding a region of interest: raw sequencing is performed on pieces of random lengths between 500 and 5000 bp, with possibly large overlapping parts at both ends and 6 possible interpretations of each strand. Algorithms are needed to align the fragments.
3- Multiple alignment of a set of genes to reveal regions of similarity and cross-species changes.
4- Local alignment and similarity search: statistical grouping, clustering, and statistical similarity measures for coarse classification.
5- Protein structure prediction: given a protein sequence, how it would fold itself into a specific complex 3D shape.
6- Locating the non-coding (RNA) genes.
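The fragment-alignment need in challenge 2 can be sketched with exact suffix-prefix overlaps. This is a simplified illustration only: real assemblers tolerate sequencing errors and handle both strand orientations, and the function names and the `min_len` threshold are assumptions made here.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def merge(a: str, b: str, min_len: int = 3):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None

left = "TCTCGGCATTAG"
right = "CATTAGGGCCT"
print(overlap(left, right))
print(merge(left, right))
```

Here the two fragments share the six bases `CATTAG`, and merging them recovers the 17-base example strand used earlier in the deck.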
Methods
Similarity Search
Content search
Signal Search
Common Software Uses
Similarity analysis
Sequence analysis
Sequence alignment
Population genetics statistical analysis
Format conversion, database maintenance and searching
Database Fast Search
BLAST & FASTA
Query the database for DNA sequences similar to a given sequence.
Rely on identification of brief subsequences (k-tuples). Multiple k-tuples serve as seeds for extended alignment.
Versions for DNA and protein sequences.
Limited capability to handle gaps in coding regions.
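The k-tuple seeding idea can be sketched as follows. This is not the BLAST or FASTA implementation; it only shows how exact k-tuple matches are indexed and how seeds that fall on the same diagonal (same offset between query and database positions) point to a candidate region for extended alignment. The function names and the choice k = 4 are assumptions.

```python
from collections import Counter, defaultdict

def index_ktuples(seq: str, k: int) -> dict:
    """Map every k-tuple in `seq` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def find_seeds(query: str, subject: str, k: int = 4) -> list:
    """Return (query_pos, subject_pos) pairs for every shared k-tuple."""
    index = index_ktuples(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds

def best_diagonal(seeds: list):
    """Return (offset, seed_count) for the diagonal holding the most seeds."""
    counts = Counter(s - q for q, s in seeds)
    return counts.most_common(1)[0] if counts else None

subject = "TCTCGGCATTAGGGCCT"
query = "GGCATTAG"
seeds = find_seeds(query, subject)
print(seeds)
print(best_diagonal(seeds))
```

In this example all five seeds lie on the diagonal with offset 4, which is where an extension step would start.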
Gene Prediction/Gene analysis
The most common:
GRAIL*
FGENEH/FGENES
MZEF
GENSCAN*
Procrustes
GeneID
GeneParser
HMMgene
GRAIL
Gene Recognition and Analysis Link
There are multiple versions:
GRAIL 1, GRAIL 1a, GRAIL 2, GRAIL III, etc.
GRAIL II uses neural networks to classify introns and exons.
GRAIL III uses dynamic programming to find the optimal combinations of introns and exons.
Refinements: consideration of contextual information, and linguistic methods.
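The dynamic programming step can be illustrated with a generic formulation: given candidate exons with scores, choose the non-overlapping set with the highest total score. This is a sketch of the general technique (weighted interval selection), not GRAIL's actual algorithm; the candidate coordinates and integer scores are hypothetical.

```python
from bisect import bisect_right

def best_exon_chain(candidates):
    """Pick non-overlapping candidates with maximum total score.

    `candidates` is a list of (start, end, score) with `end` exclusive.
    Returns (total_score, chosen_candidates).
    """
    cands = sorted(candidates, key=lambda c: c[1])   # order by end position
    ends = [c[1] for c in cands]
    # best[i] holds the optimum over the first i candidates.
    best = [(0, [])]
    for i, (start, end, score) in enumerate(cands):
        # j = how many earlier candidates end at or before this start.
        j = bisect_right(ends, start, 0, i)
        take = best[j][0] + score
        if take > best[i][0]:
            best.append((take, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]

# Hypothetical candidate exons: (start, end, score).
candidates = [(0, 100, 9), (80, 200, 6), (150, 300, 8), (320, 400, 5)]
total, chain = best_exon_chain(candidates)
print(total)
print(chain)
```

The second candidate overlaps the first and the third, so the best chain skips it; the gaps between the chosen exons play the role of introns.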
GenScan
Predicts complete gene structures
Input sequence may represent more than one gene
It follows a probabilistic model
Uses a Markov model: the generalized hidden Markov model (GHMM).
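To show what decoding with a hidden Markov model looks like, the sketch below runs the Viterbi algorithm on a plain two-state HMM (exon, intron). It is far simpler than GenScan's generalized HMM, which also models state durations and many more states, and every probability here is invented for illustration.

```python
import math

STATES = ("exon", "intron")
# All probabilities below are made-up toy values.
START = {"exon": 0.5, "intron": 0.5}
TRANS = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
EMIT = {"exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def viterbi(seq: str) -> list:
    """Return the most probable state path for `seq` (log-space Viterbi)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []                                   # back-pointers per position
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best_prev
            score[s] = (prev[best_prev] + math.log(TRANS[best_prev][s])
                        + math.log(EMIT[s][base]))
        back.append(pointers)
    state = max(STATES, key=score.get)          # best final state
    path = [state]
    for pointers in reversed(back):             # trace back to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]

path = viterbi("GCGCGCGCGCATATATATAT")
print(path)
```

With these toy parameters, GC-rich stretches are labelled exon and AT-rich stretches intron, with a state change only where the evidence outweighs the switching penalty.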
Multiple Sequence Alignment Programs
Discover the commonalities and evolutionary
relations among a set of genes or proteins.
Examples
ClustalW
DiAlign
MAP
Alignment editors
BioEdit
ClustalW
Finds the best global alignment for a set of input sequences (nucleic acid or protein).
A global alignment refers to the best match
over the total length of the sequences.
Produces a similarity tree with scores
CLUSTALW
Step 1: Pairwise alignment, distance matrix
Calculates distance scores between pairs
Cost: O(q^2 l^2), where q is the number of sequences and l is the mean length
Step 2: Guide tree
Group nearest first
Build tree sequentially
Cost: O(q^3)
Step 3: Progressive alignment
Align, starting at leaves of tree
Cost: O(q l^2)
Other programs (MAP) use DP to find the most likely evolutionary sequence.
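Step 1 and the "group nearest first" rule of Step 2 can be sketched as below. The distance used here is a unit-cost global alignment (edit) distance computed by dynamic programming, which is an assumption made for illustration; ClustalW's own scoring differs. Each pairwise call fills an l x l table and there are q(q-1)/2 pairs, which is where the quadratic cost in both q and l comes from.

```python
def alignment_distance(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Global alignment distance by dynamic programming (two-row table)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def distance_matrix(seqs: list) -> list:
    """Symmetric q x q matrix of pairwise alignment distances."""
    q = len(seqs)
    dist = [[0] * q for _ in range(q)]
    for i in range(q):
        for j in range(i + 1, q):
            dist[i][j] = dist[j][i] = alignment_distance(seqs[i], seqs[j])
    return dist

def nearest_pair(dist: list) -> tuple:
    """Indices of the closest pair: the first group in the guide tree."""
    q = len(dist)
    pairs = ((i, j) for i in range(q) for j in range(i + 1, q))
    return min(pairs, key=lambda p: dist[p[0]][p[1]])

seqs = ["TCTCGGCATT", "TCTCGGCATA", "AGAGCCGTAA"]   # toy sequences
dist = distance_matrix(seqs)
print(dist)
print(nearest_pair(dist))
```

The first two toy sequences differ in a single base, so they would be joined first when the guide tree is built.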
Protein Structure Prediction
Neural networks (NNs) are the basis for many well-known software packages for predicting protein structures.
The main software packages:
nnPredict
PredictProtein
Predator
PSIPRED
SOPMA
POSSIBLE RESEARCH DIRECTIONS
Neuro-fuzzy techniques
Genetic algorithms
Theory of Error Correction codes
Wavelets
Spectrum analysis
Dynamic modeling of protein expression.
THANK YOU