Genome & Protein “Sequence Analysis Programs” application in establishing Epidemiology and Variability

RAJESH KUMAR, Ph.D 1st yr, Dairy Microbiology Division, N.D.R.I


Page 1: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Genome & Protein “Sequence Analysis Programs” application in establishing Epidemiology and Variability

RAJESH KUMAR, Ph.D 1st yr

Dairy Microbiology Division, N.D.R.I

Page 2: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Introduction

Bio-informatics/Computational Biology:-

Proteomics:- Large-scale study of proteins.

Genomics:- study of an organism’s genome and use of genes.

Comparative Genomics:- comparison of genomes.

Structural Genomics:- determination of the three-dimensional structure of all proteins of a given organism.

Page 3: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Major Research efforts of Bio-informatics:-

Sequence analysis / alignment.

gene finding.

genome assembly.

protein structure alignment.

protein structure prediction.

prediction of gene expression and protein-protein interactions.

modeling of evolution.

Page 4: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Sequence Analysis

Encompasses the use of various bioinformatic methods to determine the biological function and structure of genes and the proteins they encode.

DNA sequences are decoded and stored in electronic databases.

Analysis of the stored sequences supports comparative genomics and the construction of phylogenetic trees.

Page 5: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Shotgun Sequencing

Used in genetics for sequencing long DNA strands.

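The DNA is broken into small segments, the segments are sequenced, and computer programs assemble the overlapping reads into a continuous sequence.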
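The assembly step can be illustrated with a toy greedy merger. This is only a sketch under simplifying assumptions (error-free reads, one strand, no repeats); the function names, the `min_len` threshold, and the example reads are hypothetical, and real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest suffix/prefix overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


contigs = greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"])
print(contigs)
```

The three toy reads overlap by three bases each, so they collapse into a single contig.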

Sequence Alignment:- arrangement of two or more sequences so as to highlight their similarity.

tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
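The middle line of such a display can be derived directly from the two aligned rows. A minimal sketch, assuming the sequences are already aligned and padded with "-" for gaps (the function names are illustrative, not from any particular package):

```python
def match_line(a, b):
    """Return '|' for each column where the aligned residues agree, ' ' otherwise."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a, b):
    """Identical columns divided by alignment length, as a percentage."""
    line = match_line(a, b)
    return 100.0 * line.count("|") / len(line)


top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
print(top)
print(match_line(top, bottom))
print(bottom)
print(round(percent_identity(top, bottom), 1))
```

Here identity is computed over all alignment columns, gaps included; other conventions (for example, ignoring gap columns) give slightly different percentages.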

Page 6: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Structural Alignment

More reliable over long evolutionary distances.

Useful in identifying structurally-conserved regions.

Multiple Alignment

Extension of pairwise alignment to incorporate more than two sequences into an alignment.

Helps in the identification of common regions between the sequences.

Programs:- Clustal is used in cladistics to build phylogenetic trees.

Page 7: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Framesearch

It is an extension of the Smith-Waterman algorithm for pairwise alignment between a protein sequence and a nucleotide sequence.

It dynamically considers every possible single-nucleotide insertion or deletion to generate the translation that best matches the protein sequence.

Software:-

Ssearch

Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment.
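The core of Smith-Waterman is a dynamic-programming recurrence in which every cell is floored at zero, so the best-scoring local alignment can start anywhere. The sketch below computes only the best local score, and it assumes a simple match/mismatch scheme with a linear gap penalty; the parameter values are illustrative assumptions, not the defaults of Ssearch or any other program.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 is what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only two rows of the matrix are kept because the score alone is needed; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.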

Page 8: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

BLAST

An algorithm for comparing biological sequences.

One of the most widely used tools for searching protein and DNA databases for sequence similarities.

It helps answer questions such as:-

Which bacterial species have a protein that is related in lineage to a certain protein whose amino-acid sequence I know?

Where does the DNA that I've just sequenced come from?

What other genes encode proteins that exhibit structures or motifs such as the one I've just determined?

Page 9: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database.

It searches the database for high-scoring alignments with the query.

Three stages of BLAST:-

1st stage, BLAST searches for exact matches (seeds) of a small fixed length W between the query and sequences in the database.

2nd stage, BLAST tries to extend the match in both directions, starting at the seed.

If a high-scoring ungapped alignment is found, the database sequence is passed on to 3rd stage .

Page 10: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

In the 3rd stage, BLAST performs a gapped alignment between the query sequence and the database sequence.
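The first two stages can be sketched in a few lines. This is a simplified illustration, not BLAST's actual implementation: the word length `w`, the scores, and the `x_drop` cutoff are assumed values, and the third (gapped) stage is omitted.

```python
def find_seeds(query, subject, w=4):
    """Stage 1: list (query_pos, subject_pos) of exact word matches of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, q, s, w=4, match=1, mismatch=-3, x_drop=5):
    """Stage 2: extend a seed in both directions without gaps.

    Extension in each direction stops once the running score falls more than
    x_drop below the best score seen. Returns (score, query_start, query_end).
    """
    def extend(qi, si, step):
        score = best = best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right, right_len = extend(q + w, s + w, 1)
    left, left_len = extend(q - 1, s - 1, -1)
    return w * match + left + right, q - left_len, q + w + right_len


query = "TTTACGTACGGGG"
subject = "CCCACGTACGAAA"
seeds = find_seeds(query, subject)
score, start, end = extend_ungapped(query, subject, *seeds[0])
print(seeds[0], score, query[start:end])
```

Indexing the query words in a dictionary is what makes the seed stage fast: each database word is looked up once instead of being compared against every query position.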

An alternative to BLAST is BLAT (BLAST-Like Alignment Tool).

FASTA:-

Slower but more sensitive than BLAST.

DNA and Protein sequence alignment software package.

The original FASTP program was designed for protein sequence similarity searching.

FASTA provided a more sophisticated shuffling program for evaluating statistical significance.

Page 11: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Programs in this package:-

FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it covers both "FAST-P" (protein) alignment and "FAST-N" (nucleotide) alignment.

Current FASTA package contains programs for:-

protein:protein

DNA:DNA

protein:translated DNA

ordered or unordered peptide searches.

Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors when comparing nucleotide to protein sequence data.

Page 12: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Clustal

Clustal is a widely used multiple alignment computer program, available as (i) ClustalW and (ii) ClustalX.

Sequence Analysis Programmes:-

EMBOSS

European Molecular Biology Open Software Suite (EMBOSS) is a program suite for nucleic acid and protein sequence analysis.

EMBOSS programs manipulate, analyze, and display nucleic acid and protein sequences.

Similar in functionality to the commercial GCG Wisconsin Software.

Page 13: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

PhyloGibbs

Designed to identify where regulatory molecules, such as transcription factors, bind to DNA.

PhyloGibbs compares DNA from multiple species in order to identify areas in which the genetic code is statistically similar and filter segments that are most likely to be of interest to scientists.

AutoEditor: automated correction of sequencing and basecaller errors

A tool for correcting sequencing and basecaller errors using sequence alignment and chromatogram data.

On average AutoEditor corrects 80% of erroneous base calls.

It also greatly improves our ability to discover SNPs between closely related strains and isolates of the same species.

Page 14: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

MUMmer

A system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides.

MUMmer 3.0

Open source.

Improved efficiency.

Ability to find non-unique, repetitive matches as well as unique matches.

New graphical output modules.

Applications:-

MUMmer 1.0 was used to detect numerous large-scale inversions in bacterial genomes.

Page 15: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

MUMmer 2.1 was used to align all human chromosomes to one another and to detect numerous large-scale.

PROmer was used to compare the human and mouse malaria parasites P. falciparum and P. yoelii.

Current use of MUMmer 3.0:-

1) Identifying SNPs and other mutations in a large collection of Bacillus anthracis strains.

2) Comparing different assemblies of the same genome at different stages of sequencing and finishing.

Page 16: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

PSORT WWW Server

PSORT is a computer program for the prediction of protein localization sites in cells.

WoLF PSORT

PSORT II (recommended for animal/yeast sequences)

PSORT (old version; for bacterial/plant sequences)

PSORT-B (recommended for Gram-negative bacteria)

E. coli K12 vs. E. coli O157:H7

S. cerevisiae vs. S. pombe

A. fumigatus vs. A. nidulans

P. falciparum vs. P. yoelii

Page 17: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

PSORT Prediction

Source of input sequence: Gram-positive bacterium, Gram-negative bacterium, yeast, animal, or plant.

Sequence ID (default is MYSEQ).

Enter the amino acid sequence by copy and paste. Characters other than the standard 20 amino acid codes are removed.

To submit the query, press the Submit button.

Page 18: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

PHIRE

This Visual Basic program performs an algorithmic string-based search on bacteriophage genome sequences, discovering and extracting blocks displaying sequence similarity without any prior experimental or predictive knowledge.

MB Advanced DNA Analysis

MB is a relatively small and easy-to-use program.

Main features of MB are:-

restriction analysis

amino acids analysis

multiple sequence alignment tool

dot plot

calculation of molecular weights and chemical properties of proteins

prediction of 3D structures for small amino acid sequences.

Page 19: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

UniPro DPview

A tool for finding and analyzing matches between genomes.

SEQtools

Program package for routine handling and analysis of DNA and protein sequences. The package includes general facilities for sequence and contig editing, restriction enzyme mapping, translation, and repeat identification.

DNA Club

DNA analysis software. Features:- remove vector sequence, find ORF, sequence editing, translate to protein sequence, protein sequence editing, RE map, RE map with translation, PCR primer selection, primer or probe evaluation.

Page 20: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

ZCURVE

A new, highly accurate system for recognizing protein-coding genes in bacterial and archaeal genomes, based on the Z curve theory of DNA sequence.
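The Z curve represents a DNA sequence as a three-dimensional curve whose coordinates track cumulative purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond base counts. The sketch below shows only that transform; it is not the ZCURVE gene finder, which builds its own features and classifier on top of this representation, and the function name is illustrative.

```python
def z_curve(seq):
    """Return the cumulative (x, y, z) Z-curve coordinates after each base.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bond bases (A, T) minus strong H-bond bases (G, C)
    """
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in "ACGT":
            raise ValueError(f"unexpected base: {base!r}")
        x += 1 if base in "AG" else -1
        y += 1 if base in "AC" else -1
        z += 1 if base in "AT" else -1
        points.append((x, y, z))
    return points


print(z_curve("ACGT"))
```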

DNA for Windows

A compact, easy-to-use DNA analysis program, ideal for small-scale sequencing projects.

Webcutter

A free on-line tool to help restriction-map nucleotide sequences. Features:- a simple, customizable interface; worldwide platform-independent accessibility via the web; seamless interfaces to NCBI's GenBank DNA sequence database and a restriction enzyme database.
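Restriction mapping amounts to locating enzyme recognition sites in a sequence. A minimal sketch, assuming a small hand-written enzyme table with three common recognition sequences; it searches one strand only and reports where each site begins, not the cut position, and it is not how Webcutter itself is implemented.

```python
# Recognition sequences for three common enzymes (all palindromic).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_map(seq, enzymes=SITES):
    """Return, for each enzyme, the 1-based positions where its site starts."""
    seq = seq.upper()
    result = {}
    for name, site in enzymes.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos + 1)
            pos = seq.find(site, pos + 1)
        result[name] = positions
    return result


print(restriction_map("ttGAATTCaaGGATCCttGAATTC"))
```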

Page 21: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Multilocus sequence typing (MLST)

Compares sequence variation in numerous housekeeping gene targets.

Developed for Neisseria gonorrhoeae, Streptococcus pneumoniae, and S. aureus.

Based on the classic multilocus enzyme electrophoresis (MLEE) method used to study the genetic variability of a species.

Drawbacks:- labor-intensive, time-consuming, and costly.
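The bookkeeping behind MLST can be sketched as follows: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers defines a sequence type. The locus names, isolate names, and sequences below are made up for illustration, and numbering by order of first appearance is an assumption; real schemes take their numbers from curated reference databases.

```python
def assign_alleles(isolates):
    """Map isolate name -> tuple of allele numbers (loci in sorted order)."""
    allele_ids = {}  # locus -> {sequence: allele number}
    profiles = {}
    for name, loci in isolates.items():
        profile = []
        for locus in sorted(loci):
            known = allele_ids.setdefault(locus, {})
            profile.append(known.setdefault(loci[locus], len(known) + 1))
        profiles[name] = tuple(profile)
    return profiles


def assign_sequence_types(profiles):
    """Give every distinct allelic profile its own sequence type number."""
    st_ids = {}
    return {name: st_ids.setdefault(p, len(st_ids) + 1) for name, p in profiles.items()}


isolates = {
    "iso1": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso2": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso3": {"locusA": "ACGA", "locusB": "GGCC"},
}
profiles = assign_alleles(isolates)
print(profiles)
print(assign_sequence_types(profiles))
```

Isolates that share a sequence type are indistinguishable at the loci examined, which is the basis for grouping them in an epidemiological comparison.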

Page 22: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Single-locus sequence typing (SLST)

Compares sequence variation of a single target.

Provides an inexpensive, rapid, objective, and portable genotyping method to subspeciate bacteria.

Using a single target depends on finding a region for sequencing that is sufficiently polymorphic to provide useful strain resolution. Loci with short sequence repeat (SSR) regions may have suitable variability for discriminating outbreaks.

Page 23: RAJESH KUMAR Ph.D 1 st   yr Dairy Microbiology Division N.D.R.I

Two S. aureus genes conserved within the species, protein A (spa) and coagulase (coa), have variable SSR regions constructed from closely related 24- and 81-bp tandem repeat units, respectively.

The genetic alterations in SSR regions include both point mutations and intragenic recombination that arise by slipped-strand mispairing during chromosomal replication and that result in a high degree of polymorphism.
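Typing on an SSR region can be pictured as cutting the region into repeat units and recording the order in which unit variants occur. The sketch below assumes the region has already been trimmed to a whole number of 24-bp units (the spa unit length given above); the sequences are synthetic placeholders and the numeric codes are arbitrary, not an established repeat nomenclature.

```python
def repeat_profile(region, unit_len=24):
    """Split a repeat region into fixed-length units and code each variant.

    Returns a list of codes, numbered by order of first appearance.
    """
    if len(region) % unit_len:
        raise ValueError("region length is not a multiple of the repeat unit")
    codes = {}
    profile = []
    for start in range(0, len(region), unit_len):
        unit = region[start:start + unit_len]
        profile.append(codes.setdefault(unit, len(codes) + 1))
    return profile


unit_a = "ACGT" * 6           # synthetic 24-bp unit
unit_b = "ACGT" * 5 + "ACGA"  # same unit with one point mutation
print(repeat_profile(unit_a + unit_b + unit_a))
```

Two isolates with different profiles differ either in the number of repeats (for example through slipped-strand mispairing) or in the sequence of individual units (point mutations).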