NCBI Entrez Digital Tools and Utilities

Jonathan A. Kans, Ph.D.
Staff Scientist, NCBI
[email protected]

Topics

• Advanced Features of Entrez (to help separate the wheat from the chaff)

• Programmatic Access with EUtils (automate repeatable multi-step queries)

• EBot Generated Scripts (if you really don't want to write a program)

Comparative Analysis

• Anatomy

• Physiology

• Biochemistry

• Gene Sequences

Central Dogma of Molecular Biology

DNA (information) → transcription (polymerase) → RNA (expression) → translation (ribosome) → Protein (function)

[Figure labels: mRNA, CDS]

Genetic Diseases

• Specific molecular defects explain disease

• β-globin gene and protein sequences
  ...ATGGTGCATCTGACTCCTGAGGAGAAG...AAGTATCACTAA...
  (M) V H L T P E E K ... K Y H (*)

• Sickle-cell anemia variant
  ...ATGGTGCATCTGACTCCTGTGGAGAAG...AAGTATCACTAA...
  (M) V H L T P V E K ... K Y H (*)
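The two sequences above differ in a single codon. A small bash sketch that locates such a difference by walking both in-frame sequences three bases at a time; `codon_diff` is a hypothetical helper, and it assumes both inputs start at the ATG and have equal length. Counting the ATG as codon 1, it reports codon 7 (GAG → GTG); the conventional β-globin residue numbering omits the initiator methionine, which is why this change is usually described as Glu6Val.

```bash
#!/usr/bin/env bash
# codon_diff (hypothetical helper): compare two in-frame coding sequences of
# equal length codon by codon and print each codon that differs, counting
# the first codon (here ATG) as codon 1.
codon_diff() {
  local a=$1 b=$2 i
  for (( i = 0; i + 3 <= ${#a}; i += 3 )); do
    if [ "${a:i:3}" != "${b:i:3}" ]; then
      echo "codon $(( i / 3 + 1 )): ${a:i:3} -> ${b:i:3}"
    fi
  done
}

# the two 5' fragments shown above, without the elided regions
codon_diff ATGGTGCATCTGACTCCTGAGGAGAAG ATGGTGCATCTGACTCCTGTGGAGAAG
```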

Evolutionary Conservation

[Figure: Human, Mouse, Fly, Worm, Yeast, and Bacteria on a time scale marked at 500, 1000, and 3000 M yr]

Human   638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
Yeast   657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
E.coli  584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642

Colon cancer gene sequence (DNA mismatch repair protein)

Design of Entrez

[Diagram: MEDLINE, Nucleotide, and Protein databases, connected by literature citations in sequence records and by coding region features; neighbors within each database come from term frequency statistics, nucleotide sequence similarity, and amino acid sequence similarity]

Entrez Databases

PubMed Search

PubMed Fields

Advanced Search

Field Abbreviations

Affiliation [AFFL]              Issue [ISS]
All Fields [ALL]                Journal [JOUR]
Author [AUTH]                   Language [LANG]
Author - Corporate [COLN]       Location ID [LID]
Author - First [FAUT]           MeSH Major Topic [MAJR]
Author - Full [FULL]            MeSH Subheading [SUBH]
Author - Last [LAUT]            MeSH Terms [MESH]
Book [BOOK]                     Pagination [PAGE]
Date - Completion [CDAT]        Pharmacological Action [PAPX]
Date - Create [CRDT]            Publication Type [PTYP]
Date - Entrez [EDAT]            Publisher [PUBN]
Date - MeSH [MHDA]              Publisher ID [PID]
Date - Modification [MDAT]      Secondary Source ID [SI]
Date - Publication [PDAT]       Supplementary Concept [SUBS]
EC/RN Number [ECNO]             Text Word [WORD]
Editor [ED]                     Title [TITL]
Filter [FILT]                   Title/Abstract [TIAB]
Grant Number [GRNT]             Transliterated Title [TT]
ISBN [ISBN]                     UID [UID]
Investigator [INVR]             Volume [VOL]
Investigator - Full [FINV]

MeSH Categories

• Anatomy
• Organisms
• Diseases
• Chemicals and Drugs
• Analytical, Diagnostic and Therapeutic Techniques and Equipment
• Psychiatry and Psychology
• Phenomena and Processes
• Disciplines and Occupations
• Anthropology, Education, Sociology and Social Phenomena
• Technology, Industry, Agriculture
• Humanities
• Information Science
• Named Groups
• Health Care
• Publication Characteristics
• Geographicals

Organism Hierarchy

Eukaryota
    Alveolata
    Amoebozoa
    Animals
        Animal Population Groups
        Chordata
        Invertebrates
    Choanoflagellata
    Cryptophyta
    Diplomonadida
    Euglenozoa
    Fungi
    Haptophyta
    Mesomycetozoea
    Oxymonadida
    Parabasalidea
    Plants
    Retortamonadidae
    Rhizaria
    Stramenopiles
Archaea
Bacteria
Viruses
Other Forms

Useful Queries

humans [MESH]
pharmacokinetics [MESH]
chemically induced [SUBH]
all child [FILT]
loprovflybase [FILT]
randomized controlled trial [FILT]
clinical trial, phase ii [PTYP]

mammalia [ORGN]
mammalia [ORGN:noexp]
cds [FKEY]
lacz [GENE]
beta galactosidase [PROT]
biomol genomic [PROP]
dbxref flybase [PROP]
gbdiv phg [PROP]
src cultivar [PROP]
srcdb refseq validated [PROP]
150:200 [SLEN]

Structured Query

transposition [TITL] AND (protease OR peptidase) NOT humans [MESH]

Using History

History Results

PubMed Record

Neighbor Hyperlink

Related Citations

Relevant Publication

Selecting Target

GenBank Record

Graphical View

LOCUS       HUMADH1CB    1400 bp    mRNA    PRI    15-JUN-1989
DEFINITION  Homo sapiens class I alcohol dehydrogenase (ADH1) alpha subunit
            mRNA, complete cds.
ACCESSION   M12271
KEYWORDS    alcohol dehydrogenase; dehydrogenase.
SOURCE      Human liver, cDNA to mRNA, clone pUCADH-alpha-15L.
  ORGANISM  Homo sapiens
            Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
            Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae;
            Homo; sapiens.
REFERENCE   1  (bases 1 to 1400)
  AUTHORS   Ikuta,T., Szeto,S. and Yoshida,A.
  TITLE     Three human alcohol dehydrogenase subunits: cDNA structure and
            molecular and evolutionary divergence
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)
  STANDARD  full staff_review
COMMENT     A draft entry and printed copy of the sequence in [1] were kindly
            provided by A.Yoshida, 30-MAY-1986. The other human class I ADH1
            alpha subunit sequence is found under accession M11307.
FEATURES             Location/Qualifiers
     mRNA            <1..1400
                     /note="ADH1 mRNA"
     CDS             16..1143
                     /note="alcohol dehydrogenase alpha subunit (EC 1.1.1.1)"
                     /map="'4q21' /hgml_locus_uid='LJ0082S'"
                     /gene="ADH1"
BASE COUNT      400 a    294 c    340 g    366 t
ORIGIN      52 bp upstream of PvuII site; chromosome 4q21.
        1 gaagacagaa tcaacatgag cacagcagga aaagtaatca aatgcaaagc agctgtgcta
       61 tgggagttaa agaaaccctt ttccattgag gaggtggagg ttgcacctcc taaggcccat
      121 gaagttcgta ttaagatggt ggctgtagga atctgtggca cagatgacca cgtggttagt
      181 ggtaccatgg tgaccccact tcctgtgatt ttaggccatg aggcagccgg catcgtggag
      241 agtgttggag aaggggtgac tacagtcaaa ccaggtgata aagtcatccc actcgctatt
      301 cctcagtgtg gaaaatgcag aatttgtaaa aacccggaga gcaactactg cttgaaaaac
      361 gatgtaagca atcctcaggg gaccctgcag gatggcacca gcaggttcac ctgcaggagg
      421 aagcccatcc accacttcct tggcatcagc accttctcac agtacacagt ggtggatgaa
      481 aatgcagtag ccaaaattga tgcagcctcg cctctagaga aagtctgtct cattggctgt
      541 ggattttcaa ctggttatgg gtctgcagtc aatgttgcca aggtcacccc aggctctacc
      601 tgtgctgtgt ttggcctggg aggggtcggc ctatctgcta ttatgggctg taaagcagct
      661 ggggcagcca gaatcattgc ggtggacatc aacaaggaca aatttgcaaa ggccaaagag
      721 ttgggggcca ctgaatgcat caaccctcaa gactacaaga aacccatcca ggaggtgcta

ENTRY           DEHUAA       #Type Protein
TITLE           Alcohol dehydrogenase alpha chain - Human  #EC-number 1.1.1.1
DATE            28-Dec-1987 #Sequence 28-Dec-1987 #Text 30-Sep-1989
PLACEMENT       27.0    1.0    1.0    1.0    1.0
SOURCE          Homo sapiens #Common-name man
ACCESSION       A25428
REFERENCE
   (Sequence translated from the mRNA sequence)
   #Authors     Ikuta T., Szeto S., Yoshida A.
   #Journal     Proc. Nat. Acad. Sci. USA (1986) 83:634-638
   #Title       Three human alcohol dehydrogenase subunits: cDNA structure
                and molecular and evolutionary divergence.
GENETIC
   #Map-position 4q21-q25
   #Name         ADH1
SUPERFAMILY
   #Name        alcohol dehydrogenase
KEYWORDS        oxidoreductase
SUMMARY         #Molecular-weight 39858  #Length 375  #Checksum 7545
SEQUENCE
                5        10        15        20        25        30
      1 M S T A G K V I K C K A A V L W E L K K P F S I E E V E V A
     31 P P K A H E V R I K M V A V G I C G T D D H V V S G T M V T
     61 P L P V I L G H E A A G I V E S V G E G V T T V K P G D K V
     91 I P L A I P Q C G K C R I C K N P E S N Y C L K N D V S N P
    121 Q G T L Q D G T S R F T C R R K P I H H F L G I S T F S Q Y
    151 T V V D E N A V A K I D A A S P L E K V C L I G C G F S T G
    181 Y G S A V N V A K V T P G S T C A V F G L G G V G L S A I M
    211 G C K A A G A A R I I A V D I N K D K F A K A K E L G A T E
    241 C I N P Q D Y K K P I Q E V L K E M T D G G V D F S F E V I
    271 G R L D T M M A S L L C C H E A C G T S V I V G V P P D S Q
    301 N L S M N P M L L L T G R T W K G A I L G G F K S K E C V P
    331 K L V A D F M A K K F S L D A L I T H V L P F E K I N E G F
    361 D L L H S G K S I R T I L M F
///

Same Publication?

JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)

#Journal Proc. Nat. Acad. Sci. USA (1986) 83:634-638

Exponential Growth

Sequence Identifiers

Accession: AH006997
GI Number: 6849043
Accn.Ver: AH006997.2
FASTA: >gi|6849043|gb|AH006997.2
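Because the FASTA identifier packs the other identifiers into one `|`-separated string, they can be recovered by splitting on that character. A minimal bash sketch, assuming the `gi|number|database|accession.version` layout shown above; `parse_defline` is a hypothetical helper, not an NCBI tool.

```bash
#!/usr/bin/env bash
# parse_defline (hypothetical helper): split a legacy FASTA identifier of the
# form >gi|NUMBER|DATABASE|ACCESSION.VERSION into its labeled parts.
parse_defline() {
  local line=${1#\>}        # drop the leading '>'
  local tag gi db accver rest
  IFS='|' read -r tag gi db accver rest <<< "$line"
  echo "GI Number: $gi"
  echo "Database: $db"
  echo "Accn.Ver: $accver"
  echo "Accession: ${accver%%.*}"   # accession without the .version suffix
}

parse_defline '>gi|6849043|gb|AH006997.2'
```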

Sequence Assembly

NC_000022.9
    join(gap(14430000),gi|89058412:1..647850,gap(150000),gi|29806588:1..3661581 …)

NT_028395.3    NT_011519.10    …
    join(gi|5931500:1..37693,gi|5931501:2273..41306 …)

AP000522.1    AP000523.1    …
    GATCTGATAAGTCCCAGGAC … TGGTATCCACCTGGGGCCTG …
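Each `join()` above concatenates its pieces in order, so the length of the assembled sequence is the sum of the `gap(N)` lengths and the `FROM..TO` spans (1-based and inclusive, so each span covers TO - FROM + 1 bases). A bash sketch of that arithmetic, assuming only the two piece types visible here (no `complement()` or nested operators); `join_length` is hypothetical, and the elided pieces (…) above would have to be supplied before a real total could be computed.

```bash
#!/usr/bin/env bash
# join_length (hypothetical helper): total length, in bases, of a join(...)
# location made of gap(N) pieces and ID:FROM..TO pieces. Other location
# operators (complement, order, nested joins) are not handled.
join_length() {
  local body=${1#"join("}   # strip the leading "join("
  body=${body%")"}          # and the closing ")"
  local total=0 piece from to
  local IFS=','             # split the pieces on commas
  for piece in $body; do
    if [[ $piece == "gap("*")" ]]; then
      piece=${piece#"gap("}
      piece=${piece%")"}
      total=$(( total + piece ))
    elif [[ $piece == *:*..* ]]; then
      piece=${piece#*:}     # keep FROM..TO
      from=${piece%%..*}
      to=${piece##*..}
      total=$(( total + to - from + 1 ))   # inclusive, 1-based span
    fi
  done
  echo "$total"
}

# only the four pieces visible above, not the full chromosome
join_length 'join(gap(14430000),gi|89058412:1..647850,gap(150000),gi|29806588:1..3661581)'
```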

Features and Qualifiers

     gene            1..417
                     /gene="INS"
                     /db_xref="GeneID:449570"
     CDS             60..392
                     /gene="INS"
                     /codon_start=1
                     /product="proinsulin precursor"
                     /protein_id="NP_001008996.1"
                     /translation="MALWMRLLPLL ... YQLENYCN"
     sig_peptide     60..131
                     /gene="INS"
     mat_peptide     132..389
                     /gene="INS"
                     /product="Insulin"
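Feature coordinates are 1-based and inclusive, so lengths follow directly from the locations above. A bash sketch, assuming (as the mat_peptide ending at 389 and the CDS ending at 392 suggest) that the CDS includes its stop codon and starts at /codon_start=1; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for 1-based, inclusive feature coordinates.
span_len() { echo $(( $2 - $1 + 1 )); }   # bases covered by FROM..TO

# residues encoded by a CDS that starts at codon_start=1 and ends with a stop codon
cds_protein_len() {
  local nt
  nt=$(span_len "$1" "$2")
  echo $(( nt / 3 - 1 ))   # number of codons minus the terminal stop codon
}

echo "CDS 60..392: $(cds_protein_len 60 392) aa"
echo "sig_peptide 60..131: $(( $(span_len 60 131) / 3 )) aa"
echo "mat_peptide 132..389: $(( $(span_len 132 389) / 3 )) aa"
```

Under those assumptions the CDS encodes a 110-residue proinsulin precursor, of which 24 residues form the signal peptide and 86 the mature peptide.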

Graphical Views

Translation Validation

DNA           ...cgaaaag GTG GTA GTG TAG GAG ACG GTG AAG ctaaga...
/translation              -   V   V   *   E   T   V   K
Protein                   M   V   V   L   E   T   E   K

SEQ_FEAT_StartCodon       SEQ_FEAT_MismatchAA

SEQ_FEAT_InternalStop     SEQ_FEAT_NotSpliceConsensusDonor

Alignments

• Describe relationships between sequences

• Can reflect evolutionary conservation, structural similarity, functional similarity

• Can be generated algorithmically (e.g., BLAST) or manually

MRLTLLC-------EGEEGSELPLCASCGQRIELKYKPECYPDVKNSLHV
MRLTLLCCTWREERMGEEGSELPVCASCGQRLELKYKPECFPDVKNSIHA
MRLTCLCRTWREERMGEEGSEIPVCASCGQRIELKYKPE-----------
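One common way to summarize such a pairwise relationship is percent identity. The bash sketch below assumes one particular definition (identical columns divided by columns where neither row has a gap), which is only one of several conventions; `pct_identity` is a hypothetical helper. For the first two rows above it reports 86, that is, 37 identical residues out of 43 gap-free columns.

```bash
#!/usr/bin/env bash
# pct_identity (hypothetical helper): percent identity between two rows of a
# gapped alignment, counting only columns where neither row has a gap.
pct_identity() {
  local a=$1 b=$2 i x y same=0 aligned=0
  if [ "${#a}" -ne "${#b}" ]; then
    echo "pct_identity: rows differ in length" >&2
    return 1
  fi
  for (( i = 0; i < ${#a}; i++ )); do
    x=${a:i:1}
    y=${b:i:1}
    # skip columns where either row has a gap
    if [[ $x == "-" || $y == "-" ]]; then
      continue
    fi
    aligned=$(( aligned + 1 ))
    if [[ $x == "$y" ]]; then
      same=$(( same + 1 ))
    fi
  done
  if [ "$aligned" -eq 0 ]; then
    echo 0
  else
    echo $(( 100 * same / aligned ))   # integer percent, rounded down
  fi
}

pct_identity MRLTLLC-------EGEEGSELPLCASCGQRIELKYKPECYPDVKNSLHV \
             MRLTLLCCTWREERMGEEGSELPVCASCGQRLELKYKPECFPDVKNSIHA
```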

Original Databases

[Diagram: MEDLINE, Nucleotide, and Protein databases, connected by literature citations in sequence records and by coding region features; neighbors within each database come from term frequency statistics, nucleotide sequence similarity, and amino acid sequence similarity]

Discovery Space

[Diagram labels: Nucleotide sequences; Protein sequences; Taxon; Phylogeny; 3-D Structure; MMDB; PubMed abstracts; Complete Genomes; PubMed; Entrez Genomes; Publishers; Genome Centers]

Data Integration

Leveraging Resources

GenBank

RefSeq

Human Genome

Bacterial Genome

Virus Genome

MMDB

PubMed

UniGene(s)

LocusLink

OMIM

Taxonomy

GEO

PopSet

BLAST

Entrez

ePCR

Sequin

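Entrez Utilities

• EInfo (database statistics, searchable fields, and available links)

• ESearch (text query → list of UIDs)

• ESummary (document summaries for a list of UIDs)

• EFetch (full records in a chosen format)

• ELink (related records in the same or another database)

• EPost (upload a list of UIDs to the History server)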

EUtils Base URL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/program.fcgi?arguments
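Every E-utility request has the same shape: the base URL, a program name, and `&`-separated arguments. A small bash sketch that assembles such a URL; it uses the HTTPS form of the base URL, which NCBI now requires, and `eutils_url` is a hypothetical helper. It does not encode argument values, so encode them before calling it.

```bash
#!/usr/bin/env bash
# eutils_url (hypothetical helper): build an E-utilities request URL from a
# program name followed by already-encoded name=value arguments.
EUTILS_BASE='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

eutils_url() {
  local program=$1
  shift
  local IFS='&'             # "$*" joins the remaining arguments with '&'
  echo "$EUTILS_BASE/$program.fcgi?$*"
}

eutils_url esearch db=pubmed term=transposition+immunity
```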

EUtils Arguments

db          pubmed | nucleotide | protein
term        transposition+AND+(protease+OR+peptidase)
id          172344,U54439.1
rettype     abstract | acc | seqid | gb | fasta | count
retmode     text | xml | asn.1
retstart
retmax
datetype    mdat | pdat | edat
reldate     60
dbfrom      pubmed | nucleotide | protein
cmd         neighbor
linkname    gene_snp_genegenotype
usehistory  y
WebEnv      NCID_1_216999436_130...086_61936294
query_key   1
version     2.0
tool
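Argument values such as the `term` above must be URL-encoded before they are sent. A bash sketch of a percent-encoder, assuming the common convention that letters, digits and `-_.~` pass through unchanged, spaces become `+`, and every other byte becomes `%XX`; `url_encode` is a hypothetical helper (the shell scripts later in this deck use a `sed` substitution list for the same purpose).

```bash
#!/usr/bin/env bash
# url_encode (hypothetical helper): percent-encode a string for use as an
# E-utilities argument value.
url_encode() {
  local s=$1 out='' c i
  local LC_ALL=C                    # treat the string as bytes
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;   # unreserved characters pass through
      ' ') out+='+' ;;              # space becomes '+'
      *) printf -v c '%%%02X' "'$c"  # anything else becomes %XX
         out+=$c ;;
    esac
  done
  echo "$out"
}

url_encode 'transposition AND (protease OR peptidase)'
```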

rettype=abstract

1. Mol Microbiol. 2012 Feb;83(4):805-20.

Separate structural and functional domains of Tn4430 transposase contribute to target immunity.

Lambin M, Nicolas E, Oger CA, Nguyen N, Prozzi D, Hallet B.

GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium. [email protected]

Like other transposons of the Tn3 family, Tn4430 exhibits target immunity, a process that prevents multiple insertions of the transposon into the same DNA molecule. Immunity is conferred by the terminal inverted repeats of the transposon and is specific to each element of the family, indicating that the transposase...transposition. One class of mutations was found to stimulate transposition, whereas other mutations appeared to reduce TnpA activity. The data are discussed with respect to alternative models in which TnpA acts as a specific determinant to both establish and respond to immunity.

PMID: 22624153 [PubMed - indexed for MEDLINE]

rettype=medline

PMID- 22624153
OWN - NLM
STAT- MEDLINE
DA  - 20120523
DCOM- 20120529
IS  - 1365-2958 (Electronic)
IS  - 0950-382X (Linking)
VI  - 83
IP  - 4
DP  - 2012 Feb
TI  - Separate structural and functional domains of Tn4430 transposase contribute to target immunity.
PG  - 805-20
AB  - Like other transposons of the Tn3 family, Tn4430 exhibits target immunity, a process that prevents multiple insertions of the ...
AD  - GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium. [email protected]
AID - 10.1111/j.1365-2958.2012.07967.x [doi]
PST - ppublish
SO  - Mol Microbiol. 2012 Feb;83(4):805-20.
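MEDLINE-format text is line-oriented: a tag padded to four characters, then `- ` and the value, with long values continued on lines that begin with six spaces. A bash/awk sketch that prints one tag's values under exactly that assumption; `medline_field` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# medline_field (hypothetical helper): print every value of one MEDLINE tag
# (for example TI or AU) from rettype=medline text on stdin, joining
# six-space continuation lines onto the value they continue.
medline_field() {
  awk -v want="$1" '
    /^[A-Z]/ {                       # a new "TAG - value" line
      if (cur == want) print val     # emit the previous value if it matched
      cur = substr($0, 1, 4)
      sub(/ +$/, "", cur)            # "TI  " -> "TI"
      val = substr($0, 7)            # value starts after "TAG - "
      next
    }
    /^      / { val = val " " substr($0, 7) }   # continuation line
    END { if (cur == want) print val }
  '
}
```

Piping the output of an EFetch request made with `rettype=medline&retmode=text` into `medline_field TI` should print each article's title.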

EInfo URLs

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed

curl Command in Terminal

https://itservices.stanford.edu/service/sharedcomputing/loggingin

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"

Entrez Databases

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
    <DbName>nucleotide</DbName>
    <DbName>nucgss</DbName>
    <DbName>nucest</DbName>
    <DbName>structure</DbName>
    <DbName>genome</DbName>
    ...

PubMed Fields

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <Count>22006701</Count>
    <LastUpdate>2012/08/04 03:30</LastUpdate>
    <FieldList>
      ...
      <Field>
        <Name>TIAB</Name>
        <FullName>Title/Abstract</FullName>
        <Description>Free text associated with Abstract/Title</Description>
        <TermCount>38990504</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>N</IsNumerical>
        <SingleToken>N</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>N</IsHidden>
      </Field>
      ...

PubMed Links

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    ...
    <LinkList>
      ...
      <Link>
        <Name>pubmed_pubmed</Name>
        <Menu>Related Citations</Menu>
        <Description>Calculated set of PubMed ...</Description>
        <DbTo>pubmed</DbTo>
      </Link>
      ...
      <Link>
        <Name>pubmed_structure</Name>
        <Menu>Structure Links</Menu>
        <Description>Three-dimensional structure ...</Description>
        <DbTo>structure</DbTo>
      </Link>
      ...

ESearch URL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity

ESummary URL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&id=2539356

EFetch URL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=2539356

ELink URL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=pubmed&cmd=neighbor&linkname=pubmed_pubmed&id=2539356

curl GET and POST

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity"

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" -d "db=pubmed&id=22624153,22555593,22253773,21729108,..."

Cluttered Result

<?xml version="1.0" ?><!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd"><eSearchResult><Count>94</Count><RetMax>20</RetMax><RetStart>0</RetStart><IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id> <Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id> <Id>20481492</Id> <Id>20004590</Id> <Id>19464182</Id> <Id>19431236</Id> <Id>19237527</Id> <Id>19188259</Id> <Id>19144000</Id> <Id>19120617</Id> <Id>18931389</Id> <Id>18838147</Id> <Id>18396069</Id> <Id>17966893</Id> <Id>17709741</Id> </IdList><TranslationSet><Translation> <From>immunity</From> <To>"immunity"[MeSH Terms] OR "immunity"[All Fields]</To> </Translation></TranslationSet><TranslationStack> <TermSet> <Term>transposition[All Fields]</Term> <Field>All Fields</Field> <Count>19362</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"immunity"[MeSH Terms]</Term> <Field>MeSH Terms</Field> <Count>252127</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"immunity"[All Fields]</Term> <Field>All Fields</Field> <Count>189033</Count> <Explode>Y</Explode> </TermSet> <OP>OR</OP> <OP>GROUP</OP> <OP>AND</OP> <OP>GROUP</OP> </TranslationStack><QueryTranslation>transposition[All Fields] AND ("immunity"[MeSH Terms] OR "immunity"[All Fields])</QueryTranslation></eSearchResult>

Cleaned for Parsing

<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD...>
<eSearchResult>
  <Count>94</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
    <Id>21729108</Id>
    <Id>21695252</Id>
    <Id>21347312</Id>
    <Id>20603074</Id>
    <Id>20481492</Id>
    <Id>20004590</Id>
    <Id>19464182</Id>
    ...

Reformat XML

xmllint --format -

...<IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id><Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id>

...

...
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
    <Id>21729108</Id>
    <Id>21695252</Id>
    <Id>21347312</Id>
    <Id>20603074</Id>

...

Extract ID Numbers

perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g'

...
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>

...

22624153
22555593
22253773

...
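The perl one-liner relies on `xmllint --format` to put each `<Id>` on its own line, since `print` with `/g` joins all matches found on one input line. An alternative sketch that avoids both the reformatting step and the perl lookbehind uses `grep -o`, which prints every match on a separate line; note that `-o` is supported by GNU and BSD grep but is not part of POSIX. `extract_ids` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# extract_ids (hypothetical helper): print the contents of every <Id>
# element on stdin, one per line, whether or not the XML was reformatted.
extract_ids() {
  grep -o '<Id>[^<]*</Id>' | sed -e 's/<Id>//' -e 's#</Id>##'
}

# several <Id> elements on a single line, as in an unformatted result
printf '%s\n' '<IdList><Id>22624153</Id><Id>22555593</Id><Id>22253773</Id></IdList>' | extract_ids
```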

Remove Blank Lines

grep '[0-9]'

22624153
22555593
22253773

...

22624153
22555593
22253773
...

UNIX Pipes

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
  -d "db=pubmed&term=transposition+immunity" | \
xmllint --format - | \
perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | \
grep '[0-9]'

Resulting List of IDs

22624153
22555593
22253773
21729108
21695252
21347312
20603074
20481492
20004590
19464182
...

UNIX Shell Script

#!/bin/sh
# esrch.sh: print the UIDs that match a query
# usage: esrch.sh database "query" [days]

# percent-encode characters that are unsafe in a URL
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')

base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
suffix="&retmode=xml&retmax=200"
if [ -n "$3" ]; then
  suffix="$suffix&reldate=$3"   # only records from the last $3 days
fi

res=$(curl -s "$base/esearch.fcgi?db=$1&term=$encoded$suffix")

# one element per line, keep the <Id> contents, drop the empty lines
flt=$(echo "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]')

for uid in $flt
do
  echo "$uid"
done

./esrch.sh pubmed "transposition immunity Tn3" 365

ESearch -> ESummary

#!/bin/sh
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=$(curl -s "$base/esearch.fcgi?db=$1&term=$encoded&retmode=xml&retmax=200")
flt=$(echo "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]')

for uid in $flt
do
  res=$(curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid")
  sum=$(echo "$res" | xmllint --format -)
  echo "$sum"
  sleep 1   # pause between requests to stay under NCBI's rate limit
done

ESearch -> IDs

#!/bin/sh
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=$(curl -s "$base/esearch.fcgi?db=$1&term=$encoded&retmode=xml&retmax=200")
flt=$(echo "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]')

for uid in $flt
do
  echo "$uid"
done

IDs -> ESummary

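#!/bin/sh
# esmry.sh: read UIDs on stdin and print a formatted summary for each
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'
while read -r uid; do
  res=$(curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid")
  sum=$(echo "$res" | xmllint --format -)
  echo "$sum"
  sleep 1   # pause between requests to stay under NCBI's rate limit
done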

./esrch.sh pubmed "transposition immunity" | ./esmry.sh pubmed
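Both ESummary scripts make one request per UID. ESummary also accepts a comma-separated list of UIDs in `id`, as the EFetch POST example earlier does, so grouping the IDs first reduces the number of requests. A bash sketch of the grouping step; `batch_ids` and its default group size of 200 are assumptions, not NCBI-specified values.

```bash
#!/usr/bin/env bash
# batch_ids (hypothetical helper): read UIDs one per line on stdin and print
# them as comma-separated groups of at most N (default 200), so that each
# group can be sent as a single id=... argument.
batch_ids() {
  local size=${1:-200} uid line='' n=0
  while read -r uid; do
    [ -n "$uid" ] || continue        # skip blank lines
    line=${line:+$line,}$uid         # append with a comma when not first
    n=$((n + 1))
    if [ "$n" -ge "$size" ]; then
      echo "$line"
      line=''
      n=0
    fi
  done
  if [ -n "$line" ]; then
    echo "$line"                     # the final, possibly shorter group
  fi
}

printf '%s\n' 22624153 22555593 22253773 21729108 21695252 | batch_ids 2
```

For example, `./esrch.sh pubmed "transposition immunity" | batch_ids 200` would emit lines that can each be passed as `id=` in one ESummary request.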

IDs -> E-Mail Notification

#!/bin/sh
# eping.sh: mail each UID read from stdin; $1 = subject, $2 = recipient
while read -r uid; do
  echo "$uid" | mail -s "$1" "$2"
done

./esrch.sh pubmed "Competitor JQ [AUTH]" 30 | \
./eping.sh "Read this new publication" "[email protected]"

Document Summaries

<eSummaryResult>
  <DocumentSummarySet status="OK">
    <DocumentSummary uid="22624153">
      <PubDate>2012 Feb</PubDate>
      <EPubDate/>
      <Source>Mol Microbiol</Source>
      <Authors>
        <Author>
          <Name>Lambin M</Name>
          <AuthType> Author </AuthType>
          <ClusterID>0</ClusterID>
        </Author>
        <Author>
          <Name>Nicolas E</Name>
          <AuthType> Author </AuthType>

Use History

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity&usehistory=y"

<eSearchResult>
  <Count>94</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv>
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    ...

WebEnv and query_key

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&query_key=1&WebEnv=NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511"
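Before a script can send `query_key` and `WebEnv` to a second utility, it has to pull them out of the ESearch result. A bash sketch that prints the first occurrence of a simple element; `xml_value` is a hypothetical helper and not a general XML parser, and the sample input is abridged from the ESearch results shown earlier. Taking only the first match matters for `Count`, because the TranslationStack of an ESearch result also contains `<Count>` elements for the individual terms.

```bash
#!/usr/bin/env bash
# xml_value (hypothetical helper): print the text of the first <TAG>...</TAG>
# element on stdin. Suitable for flat fields such as WebEnv, QueryKey and
# Count in an ESearch result; it is not a general XML parser.
xml_value() {
  grep -o "<$1>[^<]*</$1>" | head -n 1 | sed -e "s/<$1>//" -e "s#</$1>##"
}

result='<eSearchResult><Count>94</Count><QueryKey>1</QueryKey><WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv><TranslationStack><TermSet><Count>19362</Count></TermSet></TranslationStack></eSearchResult>'

key=$(printf '%s\n' "$result" | xml_value QueryKey)
web=$(printf '%s\n' "$result" | xml_value WebEnv)
num=$(printf '%s\n' "$result" | xml_value Count)   # 94, not the term count 19362
echo "query_key=$key&WebEnv=$web (Count $num)"
```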

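Perl Script

#!/usr/bin/perl
# efaftch.pl: search a database and fetch every matching record as FASTA
use strict;
use warnings;
use LWP::Simple;   # https URLs also need LWP::Protocol::https

my $dbase = shift or die "Must supply database on command line\n";
my $query = shift or die "Must supply query on command line\n";
my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# ESearch: store the result set on the History server, return no UIDs
my $url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
my $output = get($url) or die "ESearch request failed\n";

my ($web) = $output =~ /<WebEnv>(\S+)<\/WebEnv>/;
my ($key) = $output =~ /<QueryKey>(\S+)<\/QueryKey>/;
die "No WebEnv or QueryKey in ESearch result\n" unless defined $web && defined $key;

# EFetch: retrieve the stored set as FASTA text
$url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";
my $data = get($url) or die "EFetch request failed\n";

print $data;

./efaftch.pl nucleotide M65061+OR+U54469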

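ESearch -> XML

#!/usr/bin/perl
# esrch.pl: run ESearch with usehistory=y and print the raw XML result
use strict;
use warnings;
use LWP::Simple;   # https URLs also need LWP::Protocol::https

my $dbase = shift or die "Must supply database on command line\n";
my $query = shift or die "Must supply query on command line\n";
my $days  = shift(@ARGV) // "";   # optional: limit to the last N days

my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

my $url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
if ( $days ne "" ) {
  $url .= "&reldate=$days";
}

my $output = get($url) or die "ESearch request failed\n";

print $output;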

XML -> EFetch

#!/usr/bin/perl
# eftch.pl: read ESearch XML (from usehistory=y) on stdin and fetch the
# stored records from the History server in chunks
use strict;
use warnings;
use LWP::Simple;   # https URLs also need LWP::Protocol::https

my $dbase = shift or die "Must supply database on command line\n";
my $type  = shift or die "Must supply rettype on command line\n";
my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

my ($web, $key, $num);
while (my $thisline = <STDIN>) {
  $thisline =~ s/\r//;
  $thisline =~ s/\n//;
  $web = $1 if ($thisline =~ /<WebEnv>(\S+)<\/WebEnv>/);
  $key = $1 if ($thisline =~ /<QueryKey>(\S+)<\/QueryKey>/);
  # keep only the first <Count>; later ones belong to the TranslationStack
  $num = $1 if (!defined $num && $thisline =~ /<Count>(\S+)<\/Count>/);
}
die "Input lacks WebEnv, QueryKey, or Count\n"
  unless defined $web && defined $key && defined $num;

my $start = 0;
my $chunk = 500;

while ( $num > 0 ) {
  my $url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
  $url .= "&retstart=$start&retmax=$chunk&rettype=$type&retmode=text";

  my $data = get($url);
  print $data if defined $data;

  $start += $chunk;
  $num -= $chunk;

  sleep 1;   # pause between requests to stay under NCBI's rate limit
}

./esrch.pl nucleotide 1322283 | ./eftch.pl nucleotide fasta
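The loop above advances `retstart` by `retmax` until the whole set has been requested. A bash sketch that lists the `retstart` values such a loop would use for a given record count and chunk size; `chunk_starts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# chunk_starts (hypothetical helper): print the retstart offsets needed to
# page through COUNT records CHUNK at a time.
chunk_starts() {
  local count=$1 chunk=$2 start=0
  while [ "$start" -lt "$count" ]; do
    echo "$start"
    start=$((start + chunk))
  done
}

chunk_starts 1200 500
```

For example, a set of 1200 records fetched 500 at a time needs three requests, starting at 0, 500, and 1000.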

EBot

Text Query

Second Step

Output Format

Generate Script

EBot Result

DEFINITION  alcohol dehydrogenase [Cyberlindnera jadinii].
ACCESSION   BAM34535
VERSION     BAM34535.1  GI:398298384
DBSOURCE    accession AB649224.1
KEYWORDS    .
SOURCE      Cyberlindnera jadinii
  ORGANISM  Cyberlindnera jadinii
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Phaffomycetaceae; Cyberlindnera.
REFERENCE   1
  AUTHORS   Tamakawa,H., Tomita,Y., Yokoyama,A., Konoeda,Y. and Yoshida,S.
...
FEATURES             Location/Qualifiers
     source          1..348
                     /organism="Cyberlindnera jadinii"
                     /strain="NBRC0988"
                     /db_xref="taxon:4903"
                     /note="anamorph: Candida utilis"
     Protein         1..348
                     /product="alcohol dehydrogenase"
     CDS             1..348
                     /gene="ADH1"
                     /coded_by="AB649224.1:1..1047"
ORIGIN
        1 msipktqkgv ifyenggple ykdipvptpk pneilvnvky sgvchtdlha wkgdwplpvk
       61 lplvgghega gvvvakgsev knfeigdyag ikwlngscms cefceksfea ncpkadlsgy
      121 thdgsfqqya tadavqaaki skgtdlaeia pilcagvtvy kalktadlep gewvaisgag
      181 gglgslaiqf akamglrvla idggddkkql cqelgaevfi dftktkdivk siqdatnggp
      241 hgvinvsvse kaieqsteyv rncgtvvlvg lpagavaraq vfaavvksis vkgsyvgnra
      301 dtreaidffe rglvkapiki vglselpevy klmeegkilg ryvvdtsk
//

LOCUS       EJF61282    496 aa    linear    PLN    12-JUL-2012
DEFINITION  alcohol dehydrogenase [Dichomitus squalens LYAD-421 SS1].
ACCESSION   EJF61282
VERSION     EJF61282.1  GI:395328892
DBSOURCE    accession JH719411.1
...

References

• Entrez Programming Utilities Help: http://www.ncbi.nlm.nih.gov/books/NBK25501/

• EBot: http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi

• MeSH Browser: http://www.nlm.nih.gov/mesh/MBrowser.html
