BIOINFORMATICS REFERENCES
BIOLOGICAL SEQUENCE
BIOINFORMATICS is about searching biological databases, comparing sequences, looking at protein structures, and (more generally) asking biological and biomedical questions with a computer. It is the computational branch of molecular biology.
ANALYZING PROTEIN SEQUENCES
Eating steak familiarizes you with protein.
Proteins are also found in fish and vegetables.
They are all made up of the same basic building blocks, known as amino acids: complex organic molecules made of carbon, hydrogen, oxygen, nitrogen, and sulfur atoms.
PROTEINS
Proteins are like small machines in the cell.
Proteins carry out most of the work in a cell.
Proteins are synthesized from RNA sequences.
AMINO ACIDS
Proteins are made of 20 different amino acids.
Each amino acid is a small molecule made up of fewer than 100 atoms.
The 20 amino acids have similar terminations; they can be chained to one another like Lego bricks.
PROTEIN SEQUENCES
Proteins are made of amino acids chained by peptide bonds.
Protein sequences are written from the N-terminus to the C-terminus.
An average protein is about 400 amino acids long.
The longest known proteins are about 30,000 amino acids long.
PROTEIN STRUCTURES
Proteins have well-defined 3-dimensional structures.
Hydrophobic amino acids are in the protein’s core.
Hydrophilic amino acids are on the protein’s surface.
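To make the hydrophobic/hydrophilic distinction concrete, here is a minimal Python sketch that measures how much of a sequence is hydrophobic. The residue grouping used (A, V, L, I, M, F, W) is an assumption: it is one simplified classification, and other schemes place some amino acids differently. The function name is illustrative only.

```python
# Assumption: a simplified hydrophobic set (one-letter codes).
# Other classification schemes differ on borderline residues.
HYDROPHOBIC = set("AVLIMFW")


def hydrophobic_fraction(protein: str) -> float:
    """Return the fraction of residues that fall in the hydrophobic set."""
    protein = protein.upper()
    if not protein:
        return 0.0
    count = sum(1 for residue in protein if residue in HYDROPHOBIC)
    return count / len(protein)


print(hydrophobic_fraction("AVLKDE"))  # 3 of 6 residues are hydrophobic -> 0.5
```

A high fraction over a stretch of sequence hints that the stretch may be buried in the core, but this is only a rough indicator, not a structure prediction.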
DNA: DeoxyriboNucleic Acid
Genomes and genes are made of DNA.
DNA is the main support of heredity.
DNA SEQUENCES
DNA sequences are made of 4 nucleotides:
Adenine A
Guanine G
Cytosine C
Thymine T
DNA sequences can be very long.
Human chromosomes contain hundreds of millions of nucleotides.
NUCLEOTIDES
Nucleotides have similar terminations.
Nucleotides are meant to be chained like Lego bricks.
Nucleotides can interact with each other:
Adenine with thymine (A with T)
Guanine with cytosine (G with C)
A tiny bacterium can contain a genome of several million nucleotides.
DOUBLE-STRAND DNA
DNA sequences always come in two strands.
The strands are complementary and opposite in orientation.
By convention, biologists write only one strand, from the 5’ end to the 3’ end.
Database-search programs search both strands automatically.
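Because the two strands are complementary and run in opposite directions, the second strand can be derived from the written one. The sketch below is one simple way to do that in Python; the function name is illustrative, and the input is assumed to contain only the letters A, C, G, and T.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    # Reverse first (opposite orientation), then swap each base for its partner.
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check.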
RNA: Ribonucleic Acid
RNA is a close relative of DNA.
RNA has many functions:
Provides coding for proteins
Helps synthesize proteins
Helps many basic processes in the cell
RNA is not very stable.
RNA is synthesized and very often degraded.
DNA, by contrast, is very stable.
THE RNA SEQUENCE
RNA contains 4 nucleotides: A, G, C, U
U is Uracil.
RNA does not contain Thymine (T).
Uracil replaces Thymine in RNA.
RNA is single-stranded.
RNA SECONDARY STRUCTURES
RNA can make secondary structures.
A single RNA strand can pair with itself to form a secondary structure.
Secondary structures are made of stems and loops.
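As a rough illustration of a stem and a loop, the Python sketch below counts how many bases at the two ends of an RNA sequence can pair with each other, working inward. It is a toy check, not a folding algorithm. Assumptions: only A-U and G-C pairs are counted, the loop must keep at least 3 unpaired bases (`min_loop`), and the function name is illustrative.

```python
# Assumption: only the standard A-U and G-C pairs are considered.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def hairpin_stem_length(rna: str, min_loop: int = 3) -> int:
    """Count base pairs formed between the two ends of the sequence, moving inward."""
    rna = rna.upper()
    i, j = 0, len(rna) - 1
    stem = 0
    # Stop when the bases no longer pair or the loop would become too short.
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        stem += 1
        i += 1
        j -= 1
    return stem


print(hairpin_stem_length("GGGAAAUCCC"))  # 3 pairs: GGG with CCC, loop AAAU
```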
TYPICAL PROKARYOTIC GENOME
TURNING DNA INTO PROTEINS: THE GENETIC CODE
DNA gets transcribed into RNA using nucleotide complementarity.
RNA gets translated into proteins using the genetic code:
UCU UAU GCG UAA
SER-TYR-ALA-STOP
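The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: it assumes the input DNA is the coding strand (so the RNA is the same sequence with U in place of T), it uses the standard genetic code, it starts reading at the first base, and it reports amino acids as one-letter codes (S, Y, A for Ser, Tyr, Ala). Function names are illustrative.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Assumption: the input is the coding strand, so only T -> U changes."""
    return coding_dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("TCTTATGCGTAA")
print(rna)             # UCUUAUGCGUAA
print(translate(rna))  # SYA
```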
PUBMED/MEDLINE
PubMed is a database containing recent scientific publications in biology.
PubMed is free.
You can search PubMed using any keyword you are interested in:
Open www.ncbi.nlm.nih.gov/pubmed
Type your favorite keywords
Press Return or Enter
Click the Limits tab
Check the boxes you are interested in, such as
Review
English
AIDS
Restrict the search with fields:
[AU] Author
[SO] Source (journal)
[TI] Title
[AD] Address
[MH] Keywords (MeSH terms)
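Field tags like these are appended to each search term, and terms can be combined with AND. The Python sketch below only assembles the query text to paste into the PubMed search box; it does not contact PubMed. The function name and the example author are hypothetical.

```python
def build_pubmed_query(terms: list[tuple[str, str]]) -> str:
    """Join (text, field_tag) pairs into one query, e.g. 'Smith J[AU] AND ...'."""
    return " AND ".join(f"{text}[{tag}]" for text, tag in terms)


# Hypothetical search: papers by an author "Smith J" with "bioinformatics" in the title.
query = build_pubmed_query([("Smith J", "AU"), ("bioinformatics", "TI")])
print(query)  # Smith J[AU] AND bioinformatics[TI]
```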
The words will be searched only in the corresponding fields.
Medline contains only papers published after 1965.
Use no more than 10 names for papers before 1995.
RETRIEVING PROTEIN SEQUENCES IN SWISS-PROT
Swiss-Prot is a database containing all the proteins with known functions.
Swiss-Prot is available from the ExPASy server at www.expasy.ch/sprot/
ExPASy: Expert Protein Analysis System
ExPASy contains many useful online tools.
Each Swiss-Prot entry is dedicated to a protein.
A Swiss-Prot entry summarizes everything that is known about a given protein.
The entry contains functional information and links to other databases mentioning this protein.
LOOKING FOR DNA SEQUENCES
There are many types of DNA sequences.
The most common are:
Regulatory regions, often before genes
Untranslated regions, often around the genes
Protein-coding regions
Intergenic regions (between the genes)
All these sequences can be found in GenBank.
FETCHING A DNA SEQUENCE AT THE NCBI
• Navigate to www.ncbi.nlm.nih.gov/Genbank/
• Type in a keyword.
• Press Return or Enter. You get a list of entries matching your keyword.
• Point, click, and explore…
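The same kind of retrieval can also be scripted. The Python sketch below builds a request URL for NCBI's E-utilities `efetch` service, which returns a GenBank-format record for a nucleotide accession. It only constructs the URL and does not send the request. The function name is illustrative, and the accession used is just an example value.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession: str) -> str:
    """Build an efetch URL asking for a nucleotide record as GenBank flat text."""
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # accession number of the entry
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


print(genbank_fetch_url("U00096"))
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the record as text.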
MULTIPLE-SEQUENCE ALIGNMENTS (MSAS)
Multiple alignments reveal common features between sequences.
Multiple alignments are useful for:
Comparing very different sequences
Making phylogenetic trees
Making structure predictions
Multiple-sequence alignments are abbreviated as MSAs.
MAKING AN MSA WITH M-COFFEE
Open www.tcoffee.org
Click MCoffee::Regular
Cut and paste your sequences
Submit your MSA
MAKING SENSE OF YOUR MSA
Positions are marked:
Completely conserved = asterisk (*)
Highly conserved = colon (:)
Conserved = period (.)
Look for highly conserved blocks:
The red box on this slide shows a highly conserved block.
These blocks are often functionally important positions.
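The asterisk mark is easy to reproduce by hand. The Python sketch below marks only completely conserved columns; the colon and period marks depend on amino-acid similarity groups defined by the alignment program, so they are deliberately left out. The function name and the three short sequences are made up for illustration.

```python
def conservation_line(aligned: list[str]) -> str:
    """Return '*' under each column where every sequence has the same residue."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # A column counts as conserved only if it has one residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


# Made-up aligned sequences; "-" is a gap.
msa = ["MKTAYIAK", "MKSAYLAK", "MKTA-IAK"]
for seq in msa:
    print(seq)
print(conservation_line(msa))
```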
PROKARYOTIC ORGANISMS - are organisms lacking a true nucleus.
EUKARYOTIC ORGANISMS - are organisms having a true nucleus.
GENE - is defined as the contiguous genome segment encompassing all the nucleotide-sequence information necessary to bring about its successful expression, that is, the production of protein or RNA.
The 3 most basic classes of living organisms are:
PROKARYOTES - such as bacteria,
ARCHAEA - bacteria-like organisms living in extreme conditions, and
EUKARYOTES - going from microscopic yeast to humans, animals, and plants.
FOR BIOINFORMATICS - Prokaryotes and Archaea are very much the same, with few exceptions.
TYPICAL PROKARYOTIC PROTEIN-CODING GENE
• The gene has an uninterrupted sequence
• Prokaryotic mRNA contains
The Ribosome Binding Site (RBS)
The Open Reading Frame (ORF) in one piece
In operons, the RNA can contain several ORFs
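Since a prokaryotic ORF comes in one piece, a simple scan can locate candidates. The Python sketch below looks on one strand, in all three reading frames, for stretches that begin with ATG and run to the first in-frame stop codon. Assumptions: only ATG is treated as a start, the opposite strand is not searched, and `min_codons` (codons before the stop) is an arbitrary illustrative threshold.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[str]:
    """Return ORFs (ATG ... stop, inclusive) found in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it has enough codons before the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append(dna[start:i + 3])
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # ['ATGAAATTTTAG']
```

An mRNA from an operon would yield several entries in the returned list, one per ORF.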
• Eukaryotes can be small (yeast) or big (whales)
• Genomes are made of linear pieces of DNA called chromosomes
• One chromosome: 10 to 700 Mb
• The Human Genome
Contains 22+1 chromosome pairs (22 autosomes plus the sex chromosomes)
Is 3 Gb long
• One gene every 100 Kb (human)
• 5% of the genome is coding for proteins
PROKARYOTES VS. EUKARYOTES
• Prokaryotes
Genome = one large circular chromosome + a few small circular chromosomes (plasmids)
0.5 to 8 Mb / chromosome
Genes in one piece
70% of the genome is coding
1 gene / Kb
• Eukaryotes
Genome = many large linear chromosomes
10 to 700 Mb / chromosome
Genes split
5% of the genome is coding
1 gene / 100 Kb (human)
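The gene densities quoted in this comparison translate directly into rough gene counts. The Python sketch below does that arithmetic; the 4 Mb bacterial genome is a hypothetical size picked from within the 0.5 to 8 Mb range, and the function name is illustrative.

```python
def estimated_gene_count(genome_size_bp: int, bp_per_gene: int) -> int:
    """Rough estimate: genome length divided by the average spacing between genes."""
    return genome_size_bp // bp_per_gene


# Human: 3 Gb genome, about 1 gene every 100 Kb.
print(estimated_gene_count(3_000_000_000, 100_000))  # 30000

# Hypothetical 4 Mb bacterial genome, about 1 gene every Kb.
print(estimated_gene_count(4_000_000, 1_000))  # 4000
```

These are order-of-magnitude figures that follow from the numbers on the poster, not measured gene counts.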
GENBANK
Housed by the National Center for Biotechnology Information (NCBI)
GenBank is the memory of biological science
Contains EVERY DNA sequence ever published
GenBank is the original information source for most biological databases
GenBank is more complicated to use than gene-centric databases
READING A PROKARYOTIC GENBANK ENTRY
• ACCESSION is the accession number
Unique to each entry
Permanent
• LOCUS contains information on gene size
• ORGANISM defines the organism containing the gene
• REFERENCE indicates who produced the sequence
• FEATURES lists some functional features of the gene
• GenBank entries can contain more than one gene
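These keywords appear at the start of lines in a GenBank flat file, so the header can be read with a few lines of Python. The sketch below is a minimal illustration: the sample record is hypothetical (not a real GenBank entry) and heavily shortened, the function name is illustrative, and only the first line of each field is captured.

```python
# Hypothetical, shortened record in GenBank flat-file layout (not a real entry).
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01               1200 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Hypothetical example record, not a real GenBank entry.
ACCESSION   EXAMPLE01
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
REFERENCE   1  (bases 1 to 1200)
FEATURES             Location/Qualifiers
     gene            1..1200
ORIGIN
//
"""

WANTED = ("LOCUS", "ACCESSION", "ORGANISM", "REFERENCE")


def parse_header(record: str) -> dict[str, str]:
    """Collect the first line of each wanted field, keyed by its keyword."""
    fields = {}
    for line in record.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2 and parts[0] in WANTED and parts[0] not in fields:
            fields[parts[0]] = parts[1].strip()
    return fields


for keyword, value in parse_header(SAMPLE_RECORD).items():
    print(f"{keyword}: {value}")
```

For real work, an established parser is a safer choice than hand-written splitting, since real entries have multi-line fields and nested features.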
EXPLORING THE HUMAN GENOME WITH ENSEMBL
Accessible at www.ensembl.org
ENSEMBL is a database of eukaryotic genomes
Annotated entries
Wide range of examples: human, mouse, dog, and so on
ENSEMBL annotation is mostly automated
ENSEMBL contains tools to:
Browse the complete genome
Search the complete genome with BLAST
Visualize the position of a gene
Visualize all experimental information on this gene (transcripts)
By pointing at a chromosome region you can zoom inside the chromosome.
All genes are cross-indexed with databases so you can find all related experimental information.