Databases in bioinformatics II

Marcela Davila-Lopez
Department of Medical Biochemistry and Cell Biology
Institute of Biomedicine

BIOINFORMATICS AND SYSTEMS BIOLOGY, MSC PROGR
Sequence analysis, UMF018, 2008

Databases in bioinformatics II - Göteborgs universitet
bio.lundberg.gu.se/courses/ht08/bio2/DB2_2008_toprint.pdf


Overview

– UniProt/Swiss-Prot
– Divisions at NCBI (nucleotide db)
– Sequencing methods
– EST
– RefSeq vs GenBank
– Trace Archive
– Refining searches at Entrez
– eUtils (programming utilities)


UniProt/SwissProt

1980s: protein sequence database
  High-quality, detailed curation
  EBI + SIB

Quick release of data not yet annotated: TrEMBL (Translation of EMBL nucleotide sequences)
  Only computationally annotated entries

2002: EBI + SIB + PIR form the UniProt Consortium

http://www.expasy.ch/sprot/sprot_details.html


UniProtKB

Central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation

Ontologies
Classifications
Cross-references
Indications of the quality of annotation (experimental vs computational)

• Manually annotated records, based on literature and curator-evaluated analysis: "UniProtKB/Swiss-Prot"

• Computationally analyzed records that await full manual annotation: "UniProtKB/TrEMBL"


UniProt - UniRef
http://www.uniprot.org/

Clustered sets of sequences (UniProt Knowledgebase + UniParc)

Complete coverage of sequence space at several resolutions
Hides redundant sequences (but not their descriptions)

UniRef100: identical sequences and sub-fragments (11 residues or more) merged into one entry
  – sequence of a representative protein
  – accession numbers of all the merged entries
  – links to the corresponding records

UniRef90 and UniRef50: built by clustering UniRef100 at 90% or 50% sequence identity

Faster sequence searches.


UniProt - record


Functional divisions in Nucleotide DB at NCBI

Organization of nucleotide sequence records into discrete functional types:

Query specific subsets (e.g. sequences from a particular technique)
Interpretation of data from a proper biological point of view

EST  300-500 bp single reads from mRNA (cDNA)
STS  200-500 bp
GSS  Similar to EST but of genomic origin
HTG  Unfinished DNA sequences generated by HTS
HTC  Unfinished sequences from HT cDNA projects
PAT  Patent sequences
CON  Constructed records of chromosomes, genomes and other long DNA sequences


Genome sequencing

Encompasses biochemical methods for determining the order of the nucleotide bases (AGCT) in a DNA oligonucleotide (~20, today 200)


Why Sequencing Genomes

Organisms are remarkably similar at the molecular level despite their obvious outward differences

Genes with similar DNA sequences tend to perform similar functions

By understanding the function of a gene in one organism, we may get an idea of what function that gene performs in a more complex organism (e.g. humans)

Applied to various fields: medicine, biological engineering, forensics


Sequencing methods

1954 Whitfeld PR. - Sequencing by degradation

Sequencing by synthesis
1975 F. Sanger – A.R. Coulson (plus-minus method)
1977 Walter Gilbert – A. Maxam (chemical modification)
     F. Sanger (chain termination)
1979 Shotgun sequencing
1984 Ligation based (Applied Biosystems)
1988 Pyrosequencing (Roche, Biotage)
1994 Reversible dye terminators (Illumina – Helicos)

Non-enzymatic
1989 Sequencing by hybridization (Affymetrix)

DNA cannot be synthesized from scratch.

Archon X Prize: $10 million for sequencing 100 human genomes in 10 days at $10,000 per genome


Maxam-Gilbert sequencing

- Chemical modification of DNA (radiolabelling)

- Cleavage at specific bases (G, G+A, C, C+T)

- Size separation (gel electrophoresis)

- Autoradiography (X-ray film)

PROS: Purified DNA could be used directly

CONS: Technically complex
      Use of hazardous chemicals
      Difficult to scale up

Reading the gel (lanes 1 to 4):
Strong band in the 1st lane with a weaker band in the 2nd: A
Strong band in the 2nd lane with a weaker band in the 1st: G
Band in the 3rd and 4th lanes: C
Band only in the 4th lane: T

Maxam AM, Gilbert W., A new method for sequencing DNA, Proc Natl Acad Sci U S A. 1977 Feb;74(2):560-4
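
Reading the autoradiograph can be sketched in a few lines of Python. This is only an illustration: it assumes the four cleavage reactions listed above (G, G+A, C, C+T) and a clean presence/absence pattern at each gel position, so band intensities are ignored. The names `call_base` and `read_gel` are hypothetical.

```python
# Map "which lanes show a band at this gel position" to the base at that position.
# Lane names follow the cleavage reactions G, G+A, C, C+T (presence/absence only).
BAND_PATTERNS = {
    frozenset({"G", "G+A"}): "G",  # cut by both purine reactions
    frozenset({"G+A"}): "A",       # cut only by the G+A reaction
    frozenset({"C", "C+T"}): "C",  # cut by both pyrimidine reactions
    frozenset({"C+T"}): "T",       # cut only by the C+T reaction
}


def call_base(lanes_with_band):
    """Return the base implied by the set of lanes that show a band."""
    key = frozenset(lanes_with_band)
    if key not in BAND_PATTERNS:
        raise ValueError(f"ambiguous band pattern: {sorted(key)}")
    return BAND_PATTERNS[key]


def read_gel(ladder):
    """ladder: band patterns ordered from the shortest fragment upwards."""
    return "".join(call_base(pattern) for pattern in ladder)


ladder = [{"G", "G+A"}, {"G+A"}, {"C+T"}, {"C", "C+T"}]
print(read_gel(ladder))  # GATC
```

A real gel would also need the strong/weak distinction described on the slide; this sketch deliberately leaves that out.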


Sanger method

dNTP (deoxynucleotide)
ddNTP (dideoxynucleotide)


Radioactively or fluorescently labelled nucleotides


Sanger method: variations

Dye-labeled primer

PROS: Upon completion, these four reactions can be combined into one lane on a gel, and run on a machine that can scan the lanes with a laser

http://www.escience.ws/b572/L8/L8.htm


Sanger method: variations

Dye-terminator sequencing

PROS: Uses an optical system
      Faster
      More economical
      Automation

Single reaction (a different dye for each nucleotide)


Large scale sequencing strategies

Sanger: not practical for sequencing a complete genome
  Only about 1000 bases can be sequenced accurately
  A primer of known sequence is required

A privately funded sequencing project: Celera Genomics
  No libraries of BAC clones
  Human genome broken into fragments of 2-10 kb, which are sequenced
  Assembly?

The publicly funded Human Genome Project: NIH/DOE
  'Libraries' of BAC clones, which are then sequenced


Hierarchical shotgun sequencing

[Figure labels: 150 Mb; contig]

PROS: Individual clones can be sequenced by different people
      Each stretch of DNA only needs to be sequenced once

CONS: Slow process of sub-cloning and mapping the clones
      Requires significant human manipulation

http://www.scq.ubc.ca/genome-projects-uncovering-the-blueprints-of-biology/


Shotgun sequencing

Prokaryotic genomes (smaller in size, less repetitive DNA)

PROS: Faster and less expensive

CONS: Prone to errors due to incorrect assembly of the finished sequence
      Much more sequencing is needed to bring the probability of missing a sub-clone below 1%

http://www.scq.ubc.ca/genome-projects-uncovering-the-blueprints-of-biology/
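
The "much more sequencing" point can be made concrete with a rough calculation. The sketch below assumes reads land uniformly at random on the genome (a simple Poisson coverage model that is not on the slide), so the chance that a given position is missed at coverage c is about e^(-c). The slide speaks of sub-clones rather than bases, and the genome and read lengths in the example are illustrative values only.

```python
import math


def prob_missed(coverage):
    """Probability that a given position is hit by no read at this coverage (Poisson model)."""
    return math.exp(-coverage)


def coverage_needed(p_missed):
    """Coverage at which the probability of missing a position equals p_missed."""
    return -math.log(p_missed)


def reads_needed(genome_len, read_len, p_missed):
    """Number of reads of length read_len needed to reach that coverage."""
    return math.ceil(coverage_needed(p_missed) * genome_len / read_len)


print(round(coverage_needed(0.01), 2))       # 4.61
print(reads_needed(4_600_000, 500, 0.01))    # 42368
```

Under this model, getting below a 1% chance of a gap takes roughly 4.6-fold coverage, far more than the single pass per stretch that the hierarchical approach aims for.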


Next generation platforms

Platform            Chemistry                      Read length
Affymetrix          Sequencing by hybridization    ~200 bp
Roche (454)         Pyrosequencing                 230-400 bp
Illumina (Solexa)   Sequencing by synthesis        40 bp
ABI SOLiD           Ligation-based sequencing      35 bp


Sequencing by synthesis

Start from single-stranded DNA
Enzymatically synthesize its complementary strand
Detect the fluorescence of one nucleotide at a time
Remove the blocking group
Polymerization of another nucleotide

http://www.illumina.com/media.ilmn?Title=Sequencing-By-Synthesis%20Demo&Cap=&PageName=solexa%20technology&PageURL=203&Media=1
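
The cycle above can be mimicked with a toy loop. This is a simplified sketch, not the vendor's algorithm: it assumes exactly one blocked, dye-labelled nucleotide is incorporated per cycle, ignores strand orientation and signal noise, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequencing_by_synthesis(template, cycles):
    """Return the bases called over the given number of cycles."""
    read = []
    for template_base in template[:cycles]:
        # 1. The polymerase adds one blocked, dye-labelled nucleotide.
        incorporated = COMPLEMENT[template_base]
        # 2. Imaging: the dye colour identifies the incorporated nucleotide.
        read.append(incorporated)
        # 3. Dye and blocking group are removed, so the next cycle can extend.
    return "".join(read)


print(sequencing_by_synthesis("TACGGATC", cycles=5))  # ATGCC
```

The number of cycles sets the read length, which is why a platform running few cycles produces short reads.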


Pyrosequencing

Detects the activity of DNA polymerase with a chemiluminescent enzyme while the complementary strand is synthesized.

PROS: 96 samples in 1 hr (vs 24 hr)
CONS: 300-500 nucleotides

Used for resequencing, or for sequencing genomes for which the sequence of a close relative is already available

Fungal, bacterial and viral identification
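
A pyrogram can be simulated with a short sketch. The assumptions here are that nucleotides are dispensed one at a time in a fixed cyclic order, that the light signal is proportional to the number of bases incorporated in that step, and that the sequence C G T C C G G A from the pyrogram slides is the strand being synthesized. The function names and the dispensation order are hypothetical.

```python
def pyrogram(synthesized, dispensation="ACGT", max_cycles=50):
    """Return (nucleotide, signal) per dispensation; signal = bases incorporated."""
    signals = []
    pos = 0
    for step in range(max_cycles * len(dispensation)):
        if pos >= len(synthesized):
            break
        nucleotide = dispensation[step % len(dispensation)]
        run = 0
        # A homopolymer run is incorporated in one step and gives a proportionally higher peak.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals


def decode(signals):
    """Rebuild the sequence from the peak heights."""
    return "".join(nucleotide * height for nucleotide, height in signals)


peaks = pyrogram("CGTCCGGA")
print(peaks)
# [('A', 0), ('C', 1), ('G', 1), ('T', 1), ('A', 0), ('C', 2), ('G', 2), ('T', 0), ('A', 1)]
print(decode(peaks))  # CGTCCGGA
```

The peaks of height 2 for CC and GG correspond to the frames labelled (2)PPi and (2)ATP on the slides.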

[Figure, repeated over several animation frames. Template: C G T C C G G A. Labels: PPi, sulfurylase, ATP, luciferin, luciferase, oxyluciferin, apyrase, charge-coupled device (CCD), pyrogram. Frames are labelled (1)PPi/(1)ATP or (2)PPi/(2)ATP.]

http://www.biotagebio.com/DynPage.aspx?id=7454


Sequencing by ligation

The method:

It is based on sequential ligation of dye-labeled oligonucleotide probes, whereby each probe queries two base positions at a time

Uses DNA ligase rather than polymerase

The system uses 4 fluorescent dyes to encode the 16 possible two-base combinations

Multiple cycles of probe hybridization, ligation, imaging and analysis are performed

The resulting product is then removed

The process is repeated for 5 more extension rounds with primers hybridized to position n-1, n-2, etc. in the adaptor.

http://www3.appliedbiosystems.com/AB_Home/applicationstechnologies/SOLiDSystemSequencing/index.htm


Sequencing by ligation
2-base color encoding data

1 dye = 4 possible di-nucleotides

2 bases are interrogated in each ligation reaction providing increased specificity


Sequencing by ligation
Primer round 1


Sequencing by ligation
Primer round 2

Total of 5 primer rounds

Each sequence is interrogated twice in different reactions, which improves the signal-to-noise ratio


Sequencing by ligation
Decoding

[Figure labels: color space; possible dinucleotides; base zero; decoded sequence; base space sequence]
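
Decoding from color space to base space can be sketched as follows. The sketch assumes the commonly described two-base scheme in which each color is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3), so that each of the 4 colors stands for 4 dinucleotides; whether the figure on the slide uses exactly this numbering is an assumption, and the function names are hypothetical.

```python
BASES = "ACGT"  # index = 2-bit code: A=0, C=1, G=2, T=3


def encode(sequence):
    """Return (base zero, list of colors) for a base-space sequence."""
    colors = [BASES.index(a) ^ BASES.index(b) for a, b in zip(sequence, sequence[1:])]
    return sequence[0], colors


def decode(base_zero, colors):
    """Rebuild the base-space sequence from the known first base and the colors."""
    decoded = [base_zero]
    for color in colors:
        previous = BASES.index(decoded[-1])
        decoded.append(BASES[previous ^ color])
    return "".join(decoded)


base_zero, colors = encode("ATGGC")
print(base_zero, colors)          # A [3, 1, 0, 3]
print(decode(base_zero, colors))  # ATGGC
```

Because every base depends on the one before it, a single wrong color changes all downstream bases on decoding, which is one reason reads are usually compared with the reference in color space first.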


Sequencing by ligation

[Figure labels: reference sequence; color-space (CS) reference; CS reads; CS consensus; base-space (BS) consensus; polymorphism; error]

Re-sequencing

Higher accuracy through built-in error checking capability
Discrimination between measurement errors and SNPs


Sequencing by hybridization
Microarray – DNA chip

[Figure labels: hybridization; probe]


Sequencing by hybridization

1. DNA sample: many copies of ACGCATC
2. Hybridization to a chip of 3-base probes (ACG, TAC, GGG, CAT, GAT, GTT, CTA, TTT, CGC, CCC, ATC, GTA, ACT, AAG, AAA, GCA)
3. Spectrum: ACG, CGC, GCA, CAT, ATC
4. Reconstruct the sequence: ACGCATC

Drmanac R et al. Adv Biochem Eng Biotechnol. 2002


Sequencing by hybridization

Sequence 1: ACCGCCTCCA
Spectrum: ACC, CCG, CGC, GCC, CCT, CTC, TCC, CCA

Sequence 2: ACCTCCGCCA
Spectrum: ACC, CCT, CTC, TCC, CCG, CGC, GCC, CCA

Problem: different sequences can have the same spectrum
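
Both the reconstruction step and this ambiguity can be checked with a small sketch. It assumes an idealized experiment in which the spectrum is the exact set of 3-base words present in the sample, each occurring once, and it simply tries every way of chaining words that overlap by two bases. The function names are hypothetical, and the brute-force search is only practical for toy inputs.

```python
def spectrum(sequence, k=3):
    """Set of all k-base words that occur in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(words):
    """Return every sequence that uses each word exactly once, chaining by (k-1)-base overlaps."""
    words = set(words)
    results = set()

    def extend(path, remaining):
        if not remaining:
            results.add(path[0] + "".join(word[-1] for word in path[1:]))
            return
        for candidate in remaining:
            if path[-1][1:] == candidate[:-1]:
                extend(path + [candidate], remaining - {candidate})

    for start in words:
        extend([start], words - {start})
    return sorted(results)


print(reconstruct(spectrum("ACGCATC")))     # ['ACGCATC']
print(reconstruct(spectrum("ACCGCCTCCA")))  # ['ACCGCCTCCA', 'ACCTCCGCCA']
```

The first spectrum has a single reconstruction, while the second yields both sequences shown on this slide, which is exactly the problem stated above.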


Sequencing by hybridization

Oligomers on chip = 4^(number of bases); 12 bases = 16,777,216 oligomers! (6.5 million)

Probe: 5-25 bases

Probe overlap: each base is read by multiple probes (SNP)

Hybridization conditions are not homogeneous: melting temperature depends strongly on the GC/AT ratio

Repeats


Sequencing and gene expression

Although important goals of any sequencing project may be to obtain a genomic sequence and identify a complete set of genes, the ultimate goal is to gain an understanding of when, where, and how a gene is turned on, a process commonly referred to as gene expression.

Expression in normal circumstances
Expression in an altered state (?)

Identify and study the protein(s) coded by a gene
Identify the gene (genome bioinformatics)


EST

Expressed Sequence Tags
Pieces of DNA sequence from an expressed gene

200 to 500 nt long

From cells, tissues or organs, under certain conditions

5' EST: protein-coding region, conserved across species

3' EST: non-coding region (UTR)

Generated rapidly and inexpensively

Used in gene identification
Hereditary diseases


Redundancy at GenBank

Many sequences are represented more than once in GenBank

Huge degree of redundancy

2003 RefSeq collection: curated secondary database
  non-redundant
  selected organisms

• Genomic DNA (assemblies)
• Transcripts (RNA)
• Proteins


RefSeq vs GenBank

GenBank                                    RefSeq
Not curated                                Curated
Author submits                             NCBI creates from existing data
Only author can revise                     NCBI revises as new data emerge
Multiple records for same loci common      Single record for each molecule of major organisms
Records can contradict each other
No limit to species included               Limited to model organisms
Data exchange among INSDC members          Exclusive NCBI database
Akin to primary literature                 Akin to review articles
Proteins identified and linked             Proteins and transcripts identified and linked
Access via NCBI Nucleotide db              Access via Nucleotide and Protein db

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook


Trace Archive

2001: NCBI and EMBL/Ensembl
Purpose: collect raw data from sequencing centers worldwide
PERMANENT repository of single-pass reads

Data: 22 trillion bytes in size (a stack of CDs 10 stories high), and it keeps growing ...

Traces: pieces of a puzzle, between 300 and 1,000 DNA letters each

Vital in the hunt for polymorphisms in gene sequences
  linked to disease (human DNA)
  linked to virulence (viral DNA)

dbSNP: detailed information on > 25 million SNPs

Insights into the impact of genetic variation on health

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?


Entrez


Refining search results
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.section.EntrezHelp.Searching_Entrez_usi
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.section.pubmedhelp.Searching_PubMed


Limits

Refine search results to retrieve only the most relevant documents

Allow restriction of a search to a defined subset of the database


Index

Alphabetical lists of terms from searchable database fields

Used to browse and/or select the terms by which records and/or data are described


Search Field Descriptions and Qualifiers

Index search field      Qualifier
Accession               [ACCN] or [ACCESSION]
All Fields              [ALL] or [ALL FIELDS]
Author                  [AUTH] or [AUTHOR]
EC/RN Number            [ECNO]
Feature Key             [FKEY]
Filter                  [FILT] or [SB]
Gene Name               [GENE]
Issue                   [ISS] or [ISSUE]
Keyword                 [KYWD] or [KEYWORD]
Journal Name            [JOUR] or [JOURNAL]
Modification Date       [MDAT]
Organism                [ORGN] or [ORGANISM]
Page Number             [PAGE]
Primary Accession       [PACC]
Title                   [TITL]
Title/Abstract          [TIAB]
Volume                  [VOL]
Entrez date             [EDAT]
Journal title           [TA]
Language                [LA]
MeSH term               [MH]
Properties              [PROP]
Protein Name            [PROT]
Publication Date        [PDAT]
SeqID String            [SQID]
Sequence Length         [SLEN]
Substance Name          [SUBS]
Text Word               [WORD]

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.table.EntrezHelp.T7
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.section.pubmedhelp.Search_Field_Descrip


Advanced search statements

term [field] OPERATOR term [field]

Find all human nucleotide sequences with D-loop annotations:
D-loop[FKEY] AND human[ORGN] in the Nucleotide database

Find Drosophila population studies published in the Journal of Molecular Evolution
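
The `term [field] OPERATOR term [field]` pattern is easy to assemble programmatically. The helper below is a hypothetical sketch: the function names are invented for illustration, and only the boolean operators AND, OR and NOT are accepted.

```python
OPERATORS = {"AND", "OR", "NOT"}


def fielded(term, qualifier):
    """Attach a field qualifier, e.g. fielded('human', 'ORGN') -> 'human[ORGN]'."""
    return f"{term}[{qualifier}]"


def combine(left, operator, right):
    """Join two search statements with a boolean operator."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{left} {operator} {right}"


query = combine(fielded("D-loop", "FKEY"), "AND", fielded("human", "ORGN"))
print(query)  # D-loop[FKEY] AND human[ORGN]
```

The same two helpers can build statements for any of the qualifiers in the search field table.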


History

Provides a record of the searches performed during a search session.

Database specific
Lost after eight hours of inactivity

Used to review, revise, or combine the results of earlier searches.


Combining results


Query translation


Details

Display your search strategy as translated using Entrez's search and syntax rules

Error messages, when applicable


Author search


Example - author


Example - journal


eUtils: Entrez Programming Utilities

• Tools that provide access to Entrez data outside of the regular web query interface

• Set of 8 server-side programs: ESearch, ESummary, EGQuery, EInfo, EFetch, ELink, EPost, ESpell

• Helpful for retrieving search results to be manipulated in another environment

• Usable from Perl, Python, Java, and C++

• Currently includes 35 databases

http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html


Uses

• Perform searches on large datasets
• Implement data pipelines for genomic, proteomic, or microarray analysis
• Create automated searches to keep local databases current
• Create and download customized datasets
• Seamlessly combine local data with NCBI data
• Develop a focused interface to NCBI data

URL → Result (XML)


Common Entrez Engine

Assemble a list of UIDs
  ESearch (for a given db)
  EGQuery (global version, all db)

Retrieve a brief summary record (DocSum)
  ESummary (for a list of UIDs)
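
The two steps (assemble UIDs with ESearch, then ask ESummary for DocSums) can be sketched offline. The code below only builds URLs and parses a hand-written XML sample; it makes no network request. The base URL is the one printed on these slides (NCBI has since moved to https), the sample XML and its UIDs are illustrative rather than real server output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL as printed on the slides; NCBI now serves the same path over https.
BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Step 1: URL that asks ESearch for UIDs matching a term in one database."""
    return BASE_URL + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def uids_from_esearch(xml_text):
    """Pull the UIDs out of an ESearch XML reply (IdList/Id elements)."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("./IdList/Id")]


def esummary_url(db, uids):
    """Step 2: URL that asks ESummary for the DocSums of a list of UIDs."""
    return BASE_URL + "esummary.fcgi?" + urlencode({"db": db, "id": ",".join(uids), "retmode": "xml"})


# Hand-written sample in the shape of an ESearch reply (illustrative only).
SAMPLE_REPLY = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>9913</Id><Id>9606</Id></IdList>
</eSearchResult>"""

uids = uids_from_esearch(SAMPLE_REPLY)
print(esearch_url("taxonomy", "cattle"))
print(uids)                            # ['9913', '9606']
print(esummary_url("taxonomy", uids))
```

In a real pipeline the sample reply would be replaced by the body fetched from the ESearch URL.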


URL

http://www.ncbi.nlm.nih.gov/sites/gquery?term=cancer+stem+cells

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id=9913&retmode=xml

URL parts: [Base_URL] [Eutils_URL] [DB] [Query]
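
How such a URL is put together, and taken apart again, can be shown with Python's standard `urllib.parse`. This is a sketch under the assumption that the labelled parts correspond to the base address, the utility's script name, the `db` parameter and the remaining query parameters; `eutils_url` is a hypothetical helper, and the base URL is copied from the slide (NCBI now serves it over https).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL as printed on the slide; NCBI now serves the same path over https.
BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build BASE_URL + '<tool>.fcgi?' + URL-encoded parameters."""
    return f"{BASE_URL}{tool}.fcgi?{urlencode(params)}"


url = eutils_url("esummary", db="taxonomy", id=9913, retmode="xml")
print(url)
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id=9913&retmode=xml

# Taking the URL apart again recovers the individual parameters.
parts = urlsplit(url)
params = parse_qs(parts.query)
print(parts.path)    # /entrez/eutils/esummary.fcgi
print(params["db"])  # ['taxonomy']
```

Spaces in a search term are encoded as `+`, so `eutils_url("espell", db="pubmed", term="brest cancer")` ends in `term=brest+cancer`.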


URL: DB

[Base_URL] [Query][DB][Eutils_URL]eSearch =

Entrez Database     E-Utility Database Name
3D Domains          domains
Domains             cdd
Genome              genome
Nucleotide          nucleotide
OMIM                omim
PopSet              popset
Protein             protein
ProbeSet            geo
PubMed              pubmed
Structure           structure
SNP                 snp
Taxonomy            taxonomy
UniGene             unigene
UniSTS              unists

Each Entrez DB has an E-Utility name (used instead of its original name)

URL: Query

[Table: parameters accepted by each E-utility; the X marks of the matrix are not recoverable from the extracted text]

Utilities (columns): EGQuery, ESpell, EInfo, ESearch, ESummary, EFetch (Lit, Seq, Tax), ELink, EPost

Parameters (rows): db, history, WebEnv, query_key, term, id, dbfrom, report, strand, seq_start, seq_stop, field, reldate, mindate, maxdate, datetype, retstart, retmax, retmode, rettype, cmd

ESpell

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?db=pubmed&term=brest+cancer

Retrieves spelling suggestions when available

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?

Only PubMed
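
A minimal Python sketch of reading the suggestion out of an ESpell reply. The sample XML is illustrative text modelled on the eSpellResult layout rather than a captured response, and corrected_query is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative response text (an assumption about the reply layout).
SAMPLE = """<eSpellResult>
  <Database>pubmed</Database>
  <Query>brest cancer</Query>
  <CorrectedQuery>breast cancer</CorrectedQuery>
</eSpellResult>"""

def corrected_query(xml_text):
    """Return the suggested spelling, or None when no suggestion is present."""
    root = ET.fromstring(xml_text)
    suggestion = root.findtext("CorrectedQuery")
    return suggestion or None   # missing or empty element both give None

print(corrected_query(SAMPLE))
```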

EInfo

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed

Provides detailed information about a given database: term counts, last update, and available links

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?

EGQuery

Provides Entrez database counts in XML for a single search using GQuery

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=brca1+OR+brca2&rettype=html

ESummary

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=11850928,11482001&retmode=xml

retmode values: xml, ref, html, text, asn.1

Retrieves DocSums from a list of primary IDs

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?
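
The DocSums returned with retmode=xml can be turned into a lookup table. The Python sketch below assumes the DocSum/Id/Item layout of the XML reply; the titles and journal names in the sample are placeholders, not real record contents, and parse_docsums is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder DocSums: the IDs come from the slide, the field values are made up.
SAMPLE = """<eSummaryResult>
  <DocSum>
    <Id>11850928</Id>
    <Item Name="Title" Type="String">Example title A</Item>
    <Item Name="Source" Type="String">Example journal</Item>
  </DocSum>
  <DocSum>
    <Id>11482001</Id>
    <Item Name="Title" Type="String">Example title B</Item>
  </DocSum>
</eSummaryResult>"""

def parse_docsums(xml_text):
    """Map each UID to a dict of its top-level DocSum fields."""
    summaries = {}
    for doc in ET.fromstring(xml_text).findall("DocSum"):
        uid = doc.findtext("Id")
        # findall("Item") only visits direct children, so nested list items are skipped
        summaries[uid] = {item.get("Name"): item.text for item in doc.findall("Item")}
    return summaries

for uid, fields in parse_docsums(SAMPLE).items():
    print(uid, fields.get("Title"))
```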

UIDs: Unique ID

Entrez Database    Primary ID     E-Utility Database Name
3D Domains         3D SDI         domains
Domains            PSSM-ID        cdd
Genome             Genome ID      genome
Nucleotide         GI number      nucleotide
OMIM               MIM number     omim
PopSet             Popset ID      popset
Protein            GI number      protein
ProbeSet           GEO ID         geo
PubMed             PMID           pubmed
Structure          MMDB ID        structure
SNP                SNP ID         snp
Taxonomy           TAXID          taxonomy
UniGene            UniGene ID     unigene
UniSTS             UniSTS ID      unists

• Always integers

• Each UID refers to a unique record in a given Entrez database

• Each Entrez DB has an E-Utility name (used instead of its original name)
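
Because UIDs are always integers, a script can check them before building the id= parameter. The Python sketch below is one way to do that; format_uid_list is a hypothetical helper and "ABC123" is a made-up non-integer value.

```python
def format_uid_list(uids):
    """Check that every UID is a plain integer and join them with commas."""
    cleaned = []
    for uid in uids:
        text = str(uid).strip()
        # isascii() guards against non-ASCII digit characters
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"not an integer UID: {uid!r}")
        cleaned.append(text)
    return ",".join(cleaned)

print(format_uid_list([11850928, "11482001"]))
```

An accession-style string would be rejected here, which catches the common mistake of passing an accession where a UID is expected.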

ELink

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nuccore&db=protein&id=7140346

• Checks for the existence of an external or Related Articles link for a list of UIDs
• Retrieves IDs related to a list of UIDs (same db, external db)

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?

ELink

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10611131&retmode=ref&cmd=prlinks

• Creates a hyperlink to the primary LinkOut provider for a specific ID
• Lists LinkOut URLs and attributes for multiple IDs

EPost

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=pubmed&id=11237011

Returns a label (query_key) and an encoded server address (WebEnv) that corresponds to a UID list for subsequent search strategies

Optimal for large datasets (see Example)

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?

ESearch

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=60&datetype=edat&retmax=100

Returns a list of matching UIDs (text search) in a given Entrez database

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?

datetype values: edat, mdat, dp
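
A short Python sketch of reading an ESearch reply. It assumes the Count and IdList/Id elements of the eSearchResult XML; the numbers in the sample are placeholders and parse_esearch is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder reply: 250 hits in total, but only 3 UIDs returned (retmax=3).
SAMPLE = """<eSearchResult>
  <Count>250</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>101</Id>
    <Id>102</Id>
    <Id>103</Id>
  </IdList>
</eSearchResult>"""

def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs actually included in the reply)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("Count", default="0"))
    uids = [node.text for node in root.findall("./IdList/Id")]
    return total, uids

total, uids = parse_esearch(SAMPLE)
print(total, uids)
```

The total can be larger than the number of UIDs returned, since retmax caps the list; retstart can then be used to page through the rest.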

EFetch

Generates formatted output for a list of input IDs:
• abstracts from PubMed
• FASTA format from Protein

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?

DBs:
• Literature databases: PubMed, Journals, PubMed Central, OMIM
• Sequence and other molecular biology databases: Nucleotide, Protein, Gene, etc.
• Taxonomy

Rettype

Rettype     Scope              Description
count       PubMed             Hit counts
sort        PubMed and Gene
abstract    PubMed
citation    PubMed
medline     PubMed
full        PubMed
uilist      all                Default format for viewing hits
native      all                Default format for viewing sequences
fasta       sequence           FASTA view of a sequence
gb          nucleotide         GenBank view for sequences
est         dbEST              EST report
gp          protein            GenPept view
seqid       sequence           Converts a list of GIs into a list of seqids
acc         sequence           Converts a list of GIs into a list of accessions
chr         dbSNP only         SNP chromosome report

EFetch - Literature

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12345,9997&retmode=html&rettype=abstract

EFetch - Sequences

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta

Strand: 1 (+), 2 (-)
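
With rettype=fasta the reply is plain FASTA text, which a script usually splits into records. A minimal Python sketch follows; parse_fasta is a hypothetical helper and the sample sequences are placeholders, not database entries.

```python
def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines between records
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Placeholder input in FASTA layout
SAMPLE = ">example_1 placeholder description\nACGTAC\nGGT\n>example_2\nTTGA\n"
for name, seq in parse_fasta(SAMPLE):
    print(name, len(seq))
```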

EFetch - Taxonomy

http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=44689&report=docsum

report values: uilist, brief, docsum, xml

Exercise

• Search in Journals for the term obstetrics:

• In PubMed, display PMIDs 12091962 and 9997 in html retrieval mode and abstract retrieval type:

• From Entrez Gene, display GeneID 2 as xml:

• Retrieve PubMed related articles for protein 61742829 with a publication date from 1995 to the present:

Combining eUtils calls

The eUtils are useful on their own in single URLs; however, their full potential is reached when successive eUtils URLs are combined to create a data pipeline.

• Retrieving data records matching an Entrez query
  ESearch → ESummary
  ESearch → EFetch

• Finding IDs linked to records matching an Entrez query
  ESearch → ELink

• Retrieving data records in database B linked to records in database A matching an Entrez query
  ESearch → ELink → ESummary
  ESearch → ELink → EFetch
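
The longest chain above, ESearch → ELink → EFetch, can be sketched in Python as three URL requests whose outputs feed each other. This is a sketch under assumptions: the helper names are hypothetical, the XML paths assume the IdList/Id and LinkSet/LinkSetDb/Link/Id layouts, and the HTTP call is passed in as a function so the example runs offline with canned replies.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # as on the slides

def eutils_url(utility, **params):
    # safe="," keeps comma-separated UID lists readable in the URL
    return f"{BASE_URL}{utility}.fcgi?{urlencode(params, safe=',')}"

def search_link_fetch(fetch, term, db_from, db_to, rettype="fasta"):
    """ESearch -> ELink -> EFetch; `fetch` maps a URL to the response text."""
    # Step 1: ESearch gives the UIDs matching the query in database A.
    found = ET.fromstring(fetch(eutils_url("esearch", db=db_from, term=term)))
    ids = [e.text for e in found.findall("./IdList/Id")]
    if not ids:
        return ""
    # Step 2: ELink maps those UIDs to linked UIDs in database B.
    linked = ET.fromstring(fetch(eutils_url(
        "elink", dbfrom=db_from, db=db_to, id=",".join(ids))))
    link_ids = [e.text for e in linked.findall("./LinkSet/LinkSetDb/Link/Id")]
    if not link_ids:
        return ""
    # Step 3: EFetch retrieves the linked records in the requested format.
    return fetch(eutils_url("efetch", db=db_to, id=",".join(link_ids),
                            rettype=rettype, retmode="text"))

def fake_fetch(url):
    """Offline stand-in for an HTTP GET, returning canned placeholder replies."""
    if "esearch.fcgi" in url:
        return "<eSearchResult><IdList><Id>1</Id></IdList></eSearchResult>"
    if "elink.fcgi" in url:
        return ("<eLinkResult><LinkSet><LinkSetDb><Link><Id>2</Id></Link>"
                "</LinkSetDb></LinkSet></eLinkResult>")
    return ">placeholder record\nMKV\n"

print(search_link_fetch(fake_fetch, "example term", "nuccore", "protein"))
```

In a real script, fake_fetch would be replaced by a function that performs the HTTP request; keeping it as a parameter also makes the pipeline easy to test.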

A Perl example

TASK: Retrieve protein sequences of the factor IX in fasta format

# The ESearch URL is assembled from four parts: base, utility, database, query.
my $Base_URL           = "http://www.ncbi.nlm.nih.gov/entrez/eutils/";
my $esearch_URL        = "esearch.fcgi?";
my $DB                 = "db=protein&";
my $Query              = "term=factor+ix+human";      # spaces are URL-encoded as "+"
my $esearch_Parameters = "retmax=1&usehistory=y&";    # usehistory=y stores the result on the server

my $E_search = "$Base_URL$esearch_URL$DB$esearch_Parameters$Query";

Resulting URL:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&retmax=1&usehistory=y&term=factor+ix+human

ESearch → EFetch

Output from ESearch

QueryKey - WebEnv

$WebEnv: cookie value used with EFetch in place of primary ID result list

$QueryKey: value used for a history search number
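
The two history values can be pulled out of the ESearch XML in any language. The Python sketch below assumes the QueryKey and WebEnv elements of the reply; the WebEnv string in the sample is a placeholder (a real one is a long server-generated value), and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"  # as on the slides

# Placeholder ESearch reply obtained with usehistory=y
SAMPLE = """<eSearchResult>
  <Count>1</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
  <IdList><Id>101</Id></IdList>
</eSearchResult>"""

def history_handles(esearch_xml):
    """Return (query_key, WebEnv) from an ESearch reply."""
    root = ET.fromstring(esearch_xml)
    query_key = root.findtext("QueryKey")
    web_env = root.findtext("WebEnv")
    if not query_key or not web_env:
        raise ValueError("no history data in reply; was usehistory=y set?")
    return query_key, web_env

def efetch_url_from_history(db, query_key, web_env, rettype="fasta"):
    """Build an EFetch URL that refers to the stored result instead of an ID list."""
    params = {"db": db, "rettype": rettype, "retmode": "text",
              "query_key": query_key, "WebEnv": web_env}
    return f"{BASE_URL}efetch.fcgi?{urlencode(params)}"

key, env = history_handles(SAMPLE)
print(efetch_url_from_history("protein", key, env))
```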

A Perl example

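# One possible way (an assumption, not shown on the slide) to run ESearch
# and pull the two history values out of its XML output.
use LWP::Simple;                                   # provides get()
my $esearch_result = get($E_search);
my ($QueryKey) = $esearch_result =~ m{<QueryKey>(\d+)</QueryKey>};
my ($WebEnv)   = $esearch_result =~ m{<WebEnv>(\S+)</WebEnv>};

# EFetch refers to the stored result through query_key and WebEnv.
my $efetch_URL        = "efetch.fcgi?";
my $efetch_Parameters = "rettype=fasta&retmode=text&query_key=$QueryKey&WebEnv=$WebEnv";

my $E_fetch = "$Base_URL$efetch_URL$DB$efetch_Parameters";

Resulting URL:

http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&retmode=text&query_key=1&WebEnv=0ujfmXBW0U0hNr3FjaUutLkz1bR-NnJ9kp5vybL3u1AbTQdD7uMETHEtG5N@1EE047D172B3B8D0_0015SID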

ESearch → EFetch

TASK: Retrieve protein sequences of the factor IX in fasta format

Output from EFetch