XML for bioinformatics

Pierre Lindenbaum

http://plindenbaum.blogspot.com
@yokofakun (http://twitter.com/yokofakun)

INSERM-UMR1087, Nantes
January 2013

https://github.com/lindenb/courses/tree/master/about.xml

Extensible Markup Language

Machine-readable

Human-readable

DOM

... not always

artOfLineage></rdf:Description>
<rdf:Description rdf:about="http://purl.uniprot.org/taxonomy/12292"><rdf:type rdf:resource="http://purl.uniprot.org/core/Taxon"/><rank rdf:resource="http://purl.uniprot.org/core/Species"/><reviewed rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</reviewed><mnemonic>NVMV</mnemonic><scientificName>Nicotiana velutina mosaic virus</scientificName><commonName>NvMV</commonName><host rdf:resource="http://purl.uniprot.org/taxonomy/49454"/><rdfs:subClassOf rdf:resource="http://purl.uniprot.org/taxonomy/12429"/><partOfLineage rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</partOfLineage></rdf:Description>
<rdf:Description rdf:about="http://purl.uniprot.org/taxonomy/12439"><rdf:type rdf:resource="http://purl.uniprot.org/core/Taxon"/><rank rdf:resource="http://purl.uniprot.org/core/Species"/><scientificName>20S RNA replicon</scientificName><rdfs:subClassOf rdf:resource="http://purl.uniprot.org/taxonomy/12429"/><partOfLineage rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</partOfLineage></rdf:Description>
<rdf:Description rdf:about="http://purl.uniprot.org/taxonomy/12440"><rdf:type rdf:resource="http://purl.uniprot.org/core/Taxon"/><rank rdf:resource="http://purl.uniprot.org/core/Species"/><reviewed rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</reviewed><replaces rdf:resource="http://purl.uniprot.org/taxonomy/36457"/><replaces rdf:resource="http://purl.uniprot.org/taxonomy/12646"/><mnemonic>HSVAB</mnemonic><scientificName>Non-A non-B hepatitis virus</scientificName><otherName>Non-A, non-B hepatitis virus</otherName><otherName>enterically-transmitted non-A, non-B hepatitis virus ET-NANBHV</otherName><otherName>non-A</otherName><otherName>non-A, non-B hepatitis virus ET-NANBHV</otherName><otherN

Just a format

*.txt

PMID- 16381885
OWN - NLM
STAT- MEDLINE
DA  - 20051229
DCOM- 20060228
LR  - 20091118
IS  - 1362-4962 (Electronic)
IS  - 0305-1048 (Linking)
VI  - 34
IP  - Database issue
DP  - 2006 Jan 1
TI  - From genomics to chemical genomics: new developments in KEGG.
PG  - D354-7
AB  - The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource (http://www.genome.jp/kegg/) provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps.
AD  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan. [email protected]
FAU - Kanehisa, Minoru
AU  - Kanehisa M
FAU - Goto, Susumu
AU  - Goto S
FAU - Hattori, Masahiro
AU  - Hattori M
FAU - Aoki-Kinoshita, Kiyoko F
AU  - Aoki-Kinoshita KF
FAU - Itoh, Masumi
AU  - Itoh M
FAU - Kawashima, Shuichi
AU  - Kawashima S
FAU - Katayama, Toshiaki
AU  - Katayama T
FAU - Araki, Michihiro
AU  - Araki M
FAU - Hirakawa, Mika
AU  - Hirakawa M
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - England
TA  - Nucleic Acids Res
JT  - Nucleic acids research
JID - 0411011
RN  - 0 (Enzymes)
RN  - 0 (Ligands)
RN  - 0 (Pharmaceutical Preparations)
SB  - IM
MH  - *Biotransformation
MH  - Chemical Phenomena
MH  - *Chemistry
MH  - *Databases, Factual
MH  - *Databases, Genetic
MH  - Environment
MH  - Enzymes/chemistry/genetics
MH  - *Genomics
MH  - Humans
MH  - Internet
MH  - Ligands
MH  - Pharmaceutical Preparations/chemistry/classification
MH  - Signal Transduction
MH  - Systems Integration
MH  - User-Computer Interface
PMC - PMC1347464
OID - NLM: PMC1347464
EDAT- 2005/12/31 09:00
MHDA- 2006/03/01 09:00
CRDT- 2005/12/31 09:00
AID - 34/suppl_1/D354 [pii]
AID - 10.1093/nar/gkj102 [doi]
PST - ppublish
SO  - Nucleic Acids Res. 2006 Jan 1;34(Database issue):D354-7.

*.xml

<?xml version="1.0"?><!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2008//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_080101.dtd"><PubmedArticleSet><PubmedArticle> <MedlineCitation Status='MEDLINE' Owner='NLM'> <PMID Version='1'>16381885</PMID> <DateCreated> <Year>2005</Year> <Month>12</Month> <Day>29</Day> </DateCreated> <DateCompleted> <Year>2006</Year> <Month>02</Month> <Day>28</Day> </DateCompleted> <DateRevised> <Year>2009</Year> <Month>11</Month> <Day>18</Day> </DateRevised> <Article PubModel='Print'> <Journal> <ISSN IssnType='Electronic'>1362-4962</ISSN> <JournalIssue CitedMedium='Internet'> <Volume>34</Volume> <Issue>Database issue</Issue> <PubDate> <Year>2006</Year> <Month>Jan</Month> <Day>1</Day> </PubDate> </JournalIssue> <Title>Nucleic acids research</Title> <ISOAbbreviation>Nucleic Acids Res.</ISOAbbreviation> </Journal> <ArticleTitle>From genomics to chemical genomics: new developments in KEGG.</ArticleTitle> <Pagination> <MedlinePgn>D354-7</MedlinePgn> </Pagination> <Abstract> <AbstractText>The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource (http://www.genome.jp/kegg/) provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. 
In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps.</AbstractText> </Abstract> <Affiliation>Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan. [email protected]</Affiliation> <AuthorList CompleteYN='Y'> <Author ValidYN='Y'> <LastName>Kanehisa</LastName> <ForeName>Minoru</ForeName> <Initials>M</Initials> </Author> <Author ValidYN='Y'> <LastName>Goto</LastName> <ForeName>Susumu</ForeName> <Initials>S</Initials> </Author> <Author ValidYN='Y'> <LastName>Hattori</LastName> <ForeName>Masahiro</ForeName> <Initials>M</Initials> </Author> <Author ValidYN='Y'> <LastName>Aoki-Kinoshita</LastName> <ForeName>Kiyoko F</ForeName> <Initials>KF</Initials> </Author> <Author ValidYN='Y'> <LastName>Itoh</LastName> <ForeName>Masumi</ForeName> <Initials>M</Initials> </Author> <Author ValidYN='Y'> <LastName>Kawashima</LastName> <ForeName>Shuichi</ForeName> <Initials>S</Initials> </Author> <Author ValidYN='Y'> <LastName>Katayama</LastName> <ForeName>Toshiaki</ForeName> <Initials>T</Initials> </Author> <Author ValidYN='Y'> <LastName>Araki</LastName> <ForeName>Michihiro</ForeName> <Initials>M</Initials> </Author> <Author ValidYN='Y'> <LastName>Hirakawa</LastName> <ForeName>Mika</ForeName> <Initials>M</Initials> </Author> </AuthorList> <Language>eng</Language> <PublicationTypeList> <PublicationType>Journal Article</PublicationType> <PublicationType>Research Support, Non-U.S. 
Gov't</PublicationType> </PublicationTypeList> </Article> <MedlineJournalInfo> <Country>England</Country> <MedlineTA>Nucleic Acids Res</MedlineTA> <NlmUniqueID>0411011</NlmUniqueID> <ISSNLinking>0305-1048</ISSNLinking> </MedlineJournalInfo> <ChemicalList> <Chemical> <RegistryNumber>0</RegistryNumber> <NameOfSubstance>Enzymes</NameOfSubstance> </Chemical> <Chemical> <RegistryNumber>0</RegistryNumber> <NameOfSubstance>Ligands</NameOfSubstance> </Chemical> <Chemical> <RegistryNumber>0</RegistryNumber> <NameOfSubstance>Pharmaceutical Preparations</NameOfSubstance> </Chemical> </ChemicalList> <CitationSubset>IM</CitationSubset> <CommentsCorrectionsList> <CommentsCorrections RefType='Cites'> <RefSource>Nucleic Acids Res. 2001 Jan 1;29(1):22-8</RefSource> <PMID Version='1'>11125040</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>Nucleic Acids Res. 2002 Jan 1;30(1):42-6</RefSource> <PMID Version='1'>11752249</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>J Am Chem Soc. 2003 Oct 1;125(39):11853-65</RefSource> <PMID Version='1'>14505407</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>Bioinformatics. 1998;14(7):591-9</RefSource> <PMID Version='1'>9730924</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>J Am Chem Soc. 2004 Dec 22;126(50):16487-98</RefSource> <PMID Version='1'>15600352</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4</RefSource> <PMID Version='1'>15608248</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>Trends Genet. 1997 Sep;13(9):375-6</RefSource> <PMID Version='1'>9287494</PMID> </CommentsCorrections> <CommentsCorrections RefType='Cites'> <RefSource>Nucleic Acids Res. 
2004 Jan 1;32(Database issue):D277-80</RefSource> <PMID Version='1'>14681412</PMID> </CommentsCorrections> </CommentsCorrectionsList> <MeshHeadingList> <MeshHeading> <DescriptorName MajorTopicYN='Y'>Biotransformation</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Chemical Phenomena</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='Y'>Chemistry</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='Y'>Databases, Factual</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='Y'>Databases, Genetic</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Environment</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Enzymes</DescriptorName> <QualifierName MajorTopicYN='N'>chemistry</QualifierName> <QualifierName MajorTopicYN='N'>genetics</QualifierName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='Y'>Genomics</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Humans</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Internet</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Ligands</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Pharmaceutical Preparations</DescriptorName> <QualifierName MajorTopicYN='N'>chemistry</QualifierName> <QualifierName MajorTopicYN='N'>classification</QualifierName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Signal Transduction</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>Systems Integration</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN='N'>User-Computer Interface</DescriptorName> </MeshHeading> </MeshHeadingList> <OtherID Source='NLM'>PMC1347464</OtherID> </MedlineCitation> <PubmedData> <History> <PubMedPubDate PubStatus='pubmed'> <Year>2005</Year> <Month>12</Month> <Day>31</Day> 
<Hour>9</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus='medline'> <Year>2006</Year> <Month>3</Month> <Day>1</Day> <Hour>9</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus='entrez'> <Year>2005</Year> <Month>12</Month> <Day>31</Day> <Hour>9</Hour> <Minute>0</Minute> </PubMedPubDate> </History> <PublicationStatus>ppublish</PublicationStatus> <ArticleIdList> <ArticleId IdType='pii'>34/suppl_1/D354</ArticleId> <ArticleId IdType='doi'>10.1093/nar/gkj102</ArticleId> <ArticleId IdType='pubmed'>16381885</ArticleId> <ArticleId IdType='pmc'>PMC1347464</ArticleId> </ArticleIdList> </PubmedData></PubmedArticle></PubmedArticleSet>

*.json

{ "header": { "type": "efetch.pubmed", "version": "0.3" }, "result": [ { "medlinecitation": { "pmid": { "version": "1", "value": "17284678" }, "datecreated": { "year": "2007", "month": "03", "day": "02" }, "datecompleted": { "year": "2007", "month": "04", "day": "05" }, "daterevised": { "year": "2009", "month": "11", "day": "18" }, "article": { "journal": { "issn": { "issntype": "Print", "value": "1088-9051" }, "journalissue": { "citedmedium": "Print", "volume": "17", "issue": "3", "pubdate": [ "2007", "Mar" ] }, "title": "Genome research", "isoabbreviation": "Genome Res." }, "articletitle": "Sequencing and analysis of chromosome 1 of Eimeria tenella reveals a unique segmental organization.", "pagination": [ "311-9" ], "abstract": { "abstracttexts": [ { "value": "Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resistance and known to differ between virulent and attenuated strains of the parasite. The chromosome--which appears to be representative of the genome--is gene-dense and rich in simple-sequence repeats, many of which appear to give rise to repetitive amino acid tracts in the predicted proteins. Most striking is the segmentation of the chromosome into repeat-rich regions peppered with transposon-like elements and telomere-like repeats, alternating with repeat-free regions. Predicted genes differ in character between the two types of segment, and the repeat-rich regions appear to be associated with strain-to-strain variation." 
} ] }, "affiliation": "Malaysia Genome Institute, UKM-MTDC Smart Technology Centre, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia.", "authorlist": [ { "completeyn": true, "type": "authors" }, { "validyn": true, "lastname": "Ling", "forename": "King-Hwa", "initials": "KH", "nameids": [ ] }, { "validyn": true, "lastname": "Rajandream", "forename": "Marie-Adele", "initials": "MA", "nameids": [ ] }, { "validyn": true, "lastname": "Rivailler", "forename": "Pierre", "initials": "P", "nameids": [ ] }, { "validyn": true, "lastname": "Ivens", "forename": "Alasdair", "initials": "A", "nameids": [ ] }, { "validyn": true, "lastname": "Yap", "forename": "Soon-Joo", "initials": "SJ", "nameids": [ ] }, { "validyn": true, "lastname": "Madeira", "forename": "Alda M B N", "initials": "AM", "nameids": [ ] }, { "validyn": true, "lastname": "Mungall", "forename": "Karen", "initials": "K", "nameids": [ ] }, { "validyn": true, "lastname": "Billington", "forename": "Karen", "initials": "K", "nameids": [ ] }, { "validyn": true, "lastname": "Yee", "forename": "Wai-Yan", "initials": "WY", "nameids": [ ] }, { "validyn": true, "lastname": "Bankier", "forename": "Alan T", "initials": "AT", "nameids": [ ] }, { "validyn": true, "lastname": "Carroll", "forename": "Fionnadh", "initials": "F", "nameids": [ ] }, { "validyn": true, "lastname": "Durham", "forename": "Alan M", "initials": "AM", "nameids": [ ] }, { "validyn": true, "lastname": "Peters", "forename": "Nicholas", "initials": "N", "nameids": [ ] }, { "validyn": true, "lastname": "Loo", "forename": "Shu-San", "initials": "SS", "nameids": [ ] }, { "validyn": true, "lastname": "Isa", "forename": "Mohd Noor Mat", "initials": "MN", "nameids": [ ] }, { "validyn": true, "lastname": "Novaes", "forename": "Jeniffer", "initials": "J", "nameids": [ ] }, { "validyn": true, "lastname": "Quail", "forename": "Michael", "initials": "M", "nameids": [ ] }, { "validyn": true, "lastname": "Rosli", "forename": "Rozita", "initials": "R", 
"nameids": [ ] }, { "validyn": true, "lastname": "Nor Shamsudin", "forename": "Mariana", "initials": "M", "nameids": [ ] }, { "validyn": true, "lastname": "Sobreira", "forename": "Tiago J P", "initials": "TJ", "nameids": [ ] }, { "validyn": true, "lastname": "Tivey", "forename": "Adrian R", "initials": "AR", "nameids": [ ] }, { "validyn": true, "lastname": "Wai", "forename": "Siew-Fun", "initials": "SF", "nameids": [ ] }, { "validyn": true, "lastname": "White", "forename": "Sarah", "initials": "S", "nameids": [ ] }, { "validyn": true, "lastname": "Wu", "forename": "Xikun", "initials": "X", "nameids": [ ] }, { "validyn": true, "lastname": "Kerhornou", "forename": "Arnaud", "initials": "A", "nameids": [ ] }, { "validyn": true, "lastname": "Blake", "forename": "Damer", "initials": "D", "nameids": [ ] }, { "validyn": true, "lastname": "Mohamed", "forename": "Rahmah", "initials": "R", "nameids": [ ] }, { "validyn": true, "lastname": "Shirley", "forename": "Martin", "initials": "M", "nameids": [ ] }, { "validyn": true, "lastname": "Gruber", "forename": "Arthur", "initials": "A", "nameids": [ ] }, { "validyn": true, "lastname": "Berriman", "forename": "Matthew", "initials": "M", "nameids": [ ] }, { "validyn": true, "lastname": "Tomley", "forename": "Fiona", "initials": "F", "nameids": [ ] }, { "validyn": true, "lastname": "Dear", "forename": "Paul H", "initials": "PH", "nameids": [ ] }, { "validyn": true, "lastname": "Wan", "forename": "Kiew-Lian", "initials": "KL", "nameids": [ ] } ], "grantlist": [ { "completeyn": true }, { "agency": "Wellcome Trust", "country": "United Kingdom" } ], "publicationtypelist": [ "Comparative Study", "Journal Article", "Research Support, Non-U.S. 
Gov't" ], "elocationids": [ ], "languages": [ "eng" ], "articledates": [ { "datetype": "Electronic", "year": "2007", "month": "02", "day": "06" } ] }, "medlinejournalinfo": { "country": "United States", "medlineta": "Genome Res", "nlmuniqueid": "9518021", "issnlinking": "1088-9051" }, "commentscorrectionslist": [ { "reftype": "Cites", "refsource": "Nucleic Acids Res. 1999 Jan 15;27(2):573-80", "pmid": { "version": "1", "value": "9862982" } }, { "reftype": "Cites", "refsource": "Nucleic Acids Res. 1997 Mar 1;25(5):955-64", "pmid": { "version": "1", "value": "9023104" } }, { "reftype": "Cites", "refsource": "Genome Res. 2000 Oct;10(10):1587-93", "pmid": { "version": "1", "value": "11042156" } }, { "reftype": "Cites", "refsource": "Genome Res. 2000 Nov;10(11):1737-42", "pmid": { "version": "1", "value": "11076859" } }, { "reftype": "Cites", "refsource": "Bioinformatics. 2000 Oct;16(10):944-5", "pmid": { "version": "1", "value": "11120685" } }, { "reftype": "Cites", "refsource": "Nature. 2001 Feb 15;409(6822):860-921", "pmid": { "version": "1", "value": "11237011" } }, { "reftype": "Cites", "refsource": "Nature. 2002 Jul 4;418(6893):79-85", "pmid": { "version": "1", "value": "12097910" } }, { "reftype": "Cites", "refsource": "Nature. 2002 Oct 3;419(6906):498-511", "pmid": { "version": "1", "value": "12368864" } }, { "reftype": "Cites", "refsource": "Nature. 2002 Oct 3;419(6906):527-31", "pmid": { "version": "1", "value": "12368867" } }, { "reftype": "Cites", "refsource": "Exp Parasitol. 2002 Jun-Jul;101(2-3):168-73", "pmid": { "version": "1", "value": "12427472" } }, { "reftype": "Cites", "refsource": "Nucleic Acids Res. 2003 Jan 1;31(1):439-41", "pmid": { "version": "1", "value": "12520045" } }, { "reftype": "Cites", "refsource": "Genome Res. 2003 Mar;13(3):443-54", "pmid": { "version": "1", "value": "12618375" } }, { "reftype": "Cites", "refsource": "Avian Pathol. 
2003 Apr;32(2):115-27", "pmid": { "version": "1", "value": "12745365" } }, { "reftype": "Cites", "refsource": "Parasitol Res. 2003 Aug;90(6):473-5", "pmid": { "version": "1", "value": "12802683" } }, { "reftype": "Cites", "refsource": "Trends Parasitol. 2004 May;20(5):199-201", "pmid": { "version": "1", "value": "15105014" } }, { "reftype": "Cites", "refsource": "J Mol Biol. 2004 May 14;338(5):1027-36", "pmid": { "version": "1", "value": "15111065" } }, { "reftype": "Cites", "refsource": "BMC Bioinformatics. 2004 May 14;5:59", "pmid": { "version": "1", "value": "15144565" } }, { "reftype": "Cites", "refsource": "Bioinformatics. 2004 Nov 1;20(16):2878-9", "pmid": { "version": "1", "value": "15145805" } }, { "reftype": "Cites", "refsource": "Parasitol Today. 1991 May;7(5):99-105", "pmid": { "version": "1", "value": "15463458" } }, { "reftype": "Cites", "refsource": "Nature. 2005 May 5;435(7038):43-57", "pmid": { "version": "1", "value": "15875012" } }, { "reftype": "Cites", "refsource": "Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W116-20", "pmid": { "version": "1", "value": "15980438" } }, { "reftype": "Cites", "refsource": "Science. 2005 Jul 15;309(5733):416-22", "pmid": { "version": "1", "value": "16020726" } }, { "reftype": "Cites", "refsource": "Nat Genet. 2005 Sep;37(9):986-90", "pmid": { "version": "1", "value": "16086015" } }, { "reftype": "Cites", "refsource": "Chromosome Res. 2005;13(5):517-24", "pmid": { "version": "1", "value": "16132816" } }, { "reftype": "Cites", "refsource": "Nat Rev Genet. 2005 Oct;6(10):743-55", "pmid": { "version": "1", "value": "16205714" } }, { "reftype": "Cites", "refsource": "Bioinformatics. 2006 Feb 1;22(3):361-2", "pmid": { "version": "1", "value": "16332714" } }, { "reftype": "Cites", "refsource": "Mol Microbiol. 2006 Apr;60(1):5-15", "pmid": { "version": "1", "value": "16556216" } }, { "reftype": "Cites", "refsource": "Mol Biochem Parasitol. 
1990 Jan 15;38(2):169-73", "pmid": { "version": "1", "value": "2325704" } }, { "reftype": "Cites", "refsource": "Parasite Immunol. 1986 Nov;8(6):529-39", "pmid": { "version": "1", "value": "3543808" } }, { "reftype": "Cites", "refsource": "Parasitol Res. 1994;80(5):366-73", "pmid": { "version": "1", "value": "7971922" } }, { "reftype": "Cites", "refsource": "Nucleic Acids Res. 1995 Dec 25;23(24):4992-9", "pmid": { "version": "1", "value": "8559656" } }, { "reftype": "Cites", "refsource": "Int J Parasitol. 1999 Dec;29(12):1885-92", "pmid": { "version": "1", "value": "10961844" } } ], "meshheadinglist": [ { "descriptorname": { "majortopicyn": false, "value": "Animals" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Base Sequence" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Chromosome Mapping" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Chromosome Structures" }, "qualifiernames": [ { "majortopicyn": true, "value": "genetics" } ] }, { "descriptorname": { "majortopicyn": false, "value": "Computational Biology" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Eimeria tenella" }, "qualifiernames": [ { "majortopicyn": true, "value": "genetics" } ] }, { "descriptorname": { "majortopicyn": false, "value": "Genes, Protozoan" }, "qualifiernames": [ { "majortopicyn": true, "value": "genetics" } ] }, { "descriptorname": { "majortopicyn": false, "value": "Minisatellite Repeats" }, "qualifiernames": [ { "majortopicyn": false, "value": "genetics" } ] }, { "descriptorname": { "majortopicyn": false, "value": "Molecular Sequence Data" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Polymorphism, Restriction Fragment Length" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Sequence Analysis, DNA" }, "qualifiernames": [ ] } ], "citationsubsets": [ "IM" ], "otherids": [ { 
"source": "NLM", "value": "PMC1800922" } ], "otherabstracts": [ ], "keywordlists": [ ], "spaceflightmissions": [ ], "generalnotes": [ ] }, "pubmeddata": { "history": [ { "pubstatus": "aheadofprint", "year": "2007", "month": "2", "day": "6" }, { "pubstatus": "pubmed", "year": "2007", "month": "2", "day": "8", "hour": "9", "minute": "0" }, { "pubstatus": "medline", "year": "2007", "month": "4", "day": "6", "hour": "9", "minute": "0" }, { "pubstatus": "entrez", "year": "2007", "month": "2", "day": "8", "hour": "9", "minute": "0" } ], "publicationstatus": "ppublish", "articleidlist": [ { "idtype": "pii", "value": "gr.5823007" }, { "idtype": "doi", "value": "10.1101/gr.5823007" }, { "idtype": "pubmed", "value": "17284678" }, { "idtype": "pmc", "value": "PMC1800922" } ] } }, { "medlinecitation": { "pmid": { "version": "1", "value": "9997" }, "datecreated": { "year": "1976", "month": "12", "day": "30" }, "datecompleted": { "year": "1976", "month": "12", "day": "30" }, "daterevised": { "year": "2003", "month": "11", "day": "14" }, "article": { "journal": { "issn": { "issntype": "Print", "value": "0006-3002" }, "journalissue": { "citedmedium": "Print", "volume": "446", "issue": "1", "pubdate": [ "1976", "Sep", "28" ] }, "title": "Biochimica et biophysica acta", "isoabbreviation": "Biochim. Biophys. Acta" }, "articletitle": "Magnetic studies of Chromatium flavocytochrome C552. A mechanism for heme-flavin interaction.", "pagination": [ "179-91" ], "abstract": { "abstracttexts": [ { "value": "Electron paramagnetic resonance and magnetic susceptibility studies of Chromatium flavocytochrome C552 and its diheme flavin-free subunit at temperatures below 45 degrees K are reported. The results show that in the intact protein and the subunit the two low-spin (S = 1/2) heme irons are distinguishable, giving rise to separate EPR signals. 
In the intact protein only, one of the heme irons exists in two different low spin environments in the pH range 5.5 to 10.5, while the other remains in a constant environment. Factors influencing the variable heme iron environment also influence flavin reactivity, indicating the existence of a mechanism for heme-flavin interaction." } ] }, "authorlist": [ { "completeyn": true, "type": "authors" }, { "validyn": true, "lastname": "Strekas", "forename": "T C", "initials": "TC", "nameids": [ ] } ], "publicationtypelist": [ "Journal Article" ], "elocationids": [ ], "languages": [ "eng" ], "articledates": [ ] }, "medlinejournalinfo": { "country": "NETHERLANDS", "medlineta": "Biochim Biophys Acta", "nlmuniqueid": "0217513", "issnlinking": "0006-3002" }, "chemicallist": [ { "registrynumber": "0", "nameofsubstance": "Cytochrome c Group" }, { "registrynumber": "0", "nameofsubstance": "Flavins" }, { "registrynumber": "14875-96-8", "nameofsubstance": "Heme" }, { "registrynumber": "7439-89-6", "nameofsubstance": "Iron" } ], "meshheadinglist": [ { "descriptorname": { "majortopicyn": false, "value": "Binding Sites" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Chromatium" }, "qualifiernames": [ { "majortopicyn": true, "value": "enzymology" } ] }, { "descriptorname": { "majortopicyn": true, "value": "Cytochrome c Group" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Electron Spin Resonance Spectroscopy" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Flavins" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Heme" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Hydrogen-Ion Concentration" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Iron" }, "qualifiernames": [ { "majortopicyn": false, "value": "analysis" } ] }, { "descriptorname": { "majortopicyn": false, "value": 
"Magnetics" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Oxidation-Reduction" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Protein Binding" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Protein Conformation" }, "qualifiernames": [ ] }, { "descriptorname": { "majortopicyn": false, "value": "Temperature" }, "qualifiernames": [ ] } ], "citationsubsets": [ "IM" ], "otherids": [ ], "otherabstracts": [ ], "keywordlists": [ ], "spaceflightmissions": [ ], "generalnotes": [ ] }, "pubmeddata": { "history": [ { "pubstatus": "pubmed", "year": "1976", "month": "9", "day": "28" }, { "pubstatus": "medline", "year": "1976", "month": "9", "day": "28", "hour": "0", "minute": "1" }, { "pubstatus": "entrez", "year": "1976", "month": "9", "day": "28", "hour": "0", "minute": "0" } ], "publicationstatus": "ppublish", "articleidlist": [ { "idtype": "pubmed", "value": "9997" } ] } } ] }

XML namespace

<my-database>
  <record>
    <title>Record1</title>
    <html>
      <head>
        <title>hello</title>
      </head>
      <body>
        <h1>Hello</h1>
      </body>
    </html>
  </record>
</my-database>

<my-database xmlns="http://mydatabase.org"
             xmlns:h="http://www.w3.org/1999/xhtml">
  <record>
    <title>Record1</title>
    <h:html>
      <h:head>
        <h:title>hello</h:title>
      </h:head>
      <h:body>
        <h:h1>Hello</h:h1>
      </h:body>
    </h:html>
  </record>
</my-database>
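To make the effect of the namespace declarations concrete, here is a minimal Java sketch. It assumes a namespace-aware JAXP DOM parser and an inline copy of the record; the class name `NamespaceDemo` and the helper `titleIn` are hypothetical names chosen for illustration. The two `title` elements share a local name but live in different namespaces, so they can be selected separately.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    static final String DB_NS = "http://mydatabase.org";
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Inline copy of the namespaced record (an assumption made for this sketch).
    static final String XML =
        "<my-database xmlns='http://mydatabase.org' xmlns:h='http://www.w3.org/1999/xhtml'>"
        + "<record><title>Record1</title>"
        + "<h:html><h:head><h:title>hello</h:title></h:head>"
        + "<h:body><h:h1>Hello</h:h1></h:body></h:html>"
        + "</record></my-database>";

    /** Returns the text of the first title element bound to the given namespace URI. */
    static String titleIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList titles = doc.getElementsByTagNameNS(namespaceUri, "title");
            return titles.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleIn(DB_NS));    // title of the database record
        System.out.println(titleIn(XHTML_NS)); // title of the embedded XHTML page
    }
}
```

Without `setNamespaceAware(true)` the parser would treat `h:title` as an opaque name, which is why the flag is set explicitly here.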

xmllint

xsltproc

Parsing

DOM

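// Walk the direct children of the root and print the "id" attribute of each element.
Element root = document.getDocumentElement();
for (Node item = root.getFirstChild(); item != null; item = item.getNextSibling()) {
    // Skip text, comment and other non-element nodes.
    if (item.getNodeType() == Node.ELEMENT_NODE) {
        System.out.println(((Element) item).getAttribute("id"));
    }
}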
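The loop above needs a parsed `document`. A self-contained sketch of the whole round trip might look as follows, assuming the standard JAXP `DocumentBuilderFactory` and an XML string as input; the class `DomDemo` and the method `childIds` are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomDemo {
    /** Parses the XML into a DOM tree and collects the "id" of each child element of the root. */
    static List<String> childIds(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> ids = new ArrayList<>();
            Element root = document.getDocumentElement();
            for (Node item = root.getFirstChild(); item != null; item = item.getNextSibling()) {
                // Whitespace between elements shows up as text nodes, so filter on node type.
                if (item.getNodeType() == Node.ELEMENT_NODE) {
                    ids.add(((Element) item).getAttribute("id"));
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childIds("<genes><gene id='1'/> <gene id='2'/></genes>"));
    }
}
```

The whole document is held in memory, which is convenient for small files but a poor fit for very large ones.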

StAX

public interface XMLStreamReader {
    public int next();
    public boolean hasNext();
    public String getText();
    public String getLocalName();
    public String getNamespaceURI();
    // ...other methods not shown
}
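As a rough illustration of the pull model, the sketch below drives an `XMLStreamReader` over an XML string and collects the text of every element with a given local name. It assumes the standard `javax.xml.stream` API; `StaxDemo` and `elementTexts` are hypothetical names, and the helper only suits elements that contain text and no child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {
    /** Collects the text content of every element whose local name matches. */
    static List<String> elementTexts(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            List<String> texts = new ArrayList<>();
            // The caller pulls events one at a time; nothing is kept in memory.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // Reads up to the matching end tag (text-only elements).
                    texts.add(reader.getElementText());
                }
            }
            reader.close();
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<genes><gene id='1'><name>Gene1</name><name>gene-1</name></gene></genes>";
        System.out.println(elementTexts(xml, "name"));
    }
}
```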

SAX

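// Simplified: the real org.xml.sax.ContentHandler methods also receive the
// namespace URI and local name, and may throw SAXException.
public interface ContentHandler {
    public void startDocument();
    public void endDocument();
    public void startElement(String name, Attributes atts);
    public void endElement(String name);
    public void characters(char ch[], int start, int length);
}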
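A minimal sketch of the push model, assuming the JAXP `SAXParserFactory` and the `DefaultHandler` convenience class: the parser calls back into the handler, which here sums the length of the text found inside `sequence` elements. `SaxDemo` and `totalSequenceLength` are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    /** Sums the number of characters found inside all sequence elements. */
    static int totalSequenceLength(String xml) {
        final int[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            private boolean inSequence = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("sequence")) inSequence = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("sequence")) inSequence = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so accumulate.
                if (inSequence) total[0] += length;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        String xml = "<genes><gene><sequence>ATAATG</sequence></gene>"
                   + "<gene><sequence>AATT</sequence></gene></genes>";
        System.out.println(totalSequenceLength(xml));
    }
}
```

The handler has to track its own state (`inSequence`), which is the usual price of SAX's low memory footprint.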

XPath

<?xml version="1.0" encoding="UTF-8"?>
<genes>
  <gene id="1">
    <name>Gene1</name>
    <name>gene-1</name>
    <sequence>ATAATGCTAGCTAGCTATCGAATG</sequence>
  </gene>
  <gene id="2">
    <name>Gene2</name>
    <name>gene-2</name>
    <sequence>AATTGCGATTCATCGATGCTATA</sequence>
  </gene>
</genes>

$ xmllint -xpath \
    '/genes/gene[1]/name[2]/text()' \
    genes1.xml
gene-1

$ xmllint -xpath \
    '/genes/gene[1]/name[2]' \
    genes1.xml
<name>gene-1</name>

$ xmllint -xpath \
    'count(/genes/gene)' \
    genes1.xml
2

$ xmllint -xpath \
    '/genes/gene[@id="2"]/name[1]/text()' \
    genes1.xml
Gene2
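The same expressions can be evaluated from Java. The sketch below assumes the standard `javax.xml.xpath` API and an inline, compacted copy of the genes document; `XPathDemo` and `evaluate` are hypothetical names, and every result is returned as a string.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathDemo {
    // Compact inline copy of the genes document (an assumption made for this sketch).
    static final String GENES =
        "<genes>"
        + "<gene id='1'><name>Gene1</name><name>gene-1</name>"
        + "<sequence>ATAATGCTAGCTAGCTATCGAATG</sequence></gene>"
        + "<gene id='2'><name>Gene2</name><name>gene-2</name>"
        + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>"
        + "</genes>";

    /** Evaluates an XPath 1.0 expression against the genes document, as a string. */
    static String evaluate(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(GENES)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/genes/gene[1]/name[2]/text()"));
        System.out.println(evaluate("count(/genes/gene)"));
        System.out.println(evaluate("/genes/gene[@id='2']/name[1]/text()"));
    }
}
```

Each call re-parses the document, which keeps the sketch short; a real program would parse once and reuse the DOM.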

XInclude

<?xml version="1.0" encoding="UTF-8"?>
<genes xmlns:xi="http://www.w3.org/2001/XInclude">
  <gene id="1">
    <name>Gene1</name>
    <name>gene-1</name>
    <sequence><xi:include href="sequence.txt" parse="text" /></sequence>
  </gene>
  <xi:include href="gene2.xml" parse="xml"/>
</genes>
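One way to see the inclusions resolved is to let a JAXP parser do it. The sketch below assumes the JDK's built-in parser with `setXIncludeAware(true)`; it writes the three files into a temporary directory first, and both the file contents for `gene2.xml` and the names `XIncludeDemo` and `resolve` are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeDemo {
    private static void write(Path file, String content) throws Exception {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns "<number of gene elements>:<text of the first sequence>" after inclusion. */
    static String resolve() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            write(dir.resolve("sequence.txt"), "ATAATGCTAGCTAGCTATCGAATG");
            write(dir.resolve("gene2.xml"), "<gene id='2'><name>Gene2</name></gene>");
            write(dir.resolve("genes.xml"),
                "<genes xmlns:xi='http://www.w3.org/2001/XInclude'>"
                + "<gene id='1'><name>Gene1</name>"
                + "<sequence><xi:include href='sequence.txt' parse='text'/></sequence></gene>"
                + "<xi:include href='gene2.xml' parse='xml'/>"
                + "</genes>");

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XInclude processing relies on namespaces
            factory.setXIncludeAware(true);
            // Parsing from a File gives the parser a base URI for the relative hrefs.
            Document doc = factory.newDocumentBuilder().parse(dir.resolve("genes.xml").toFile());

            int genes = doc.getElementsByTagName("gene").getLength();
            String sequence = doc.getElementsByTagName("sequence").item(0).getTextContent();
            return genes + ":" + sequence;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve());
    }
}
```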

XHTML

SVG

<svg xmlns="http://www.w3.org/2000/svg" width='300px' height='300px'>

<circle cx='120' cy='150' r='60' style='fill: gold;' />

<polyline points='120 30, 25 150, 290 150' stroke-width='4' stroke='brown' style='fill: none;' />

<polygon points='210 100, 210 200, 270 150' style='fill: lawngreen;' />

<text x='60' y='250' fill='blue'>Hello, World!</text>

</svg>

XSL-FO

<?xml version="1.0" encoding="ISO-8859-1"?>

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

<fo:layout-master-set>
  <fo:simple-page-master master-name="A4">
  </fo:simple-page-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="A4"> </fo:page-sequence>

</fo:root>

RDF

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF (...)>
  <rdf:Description rdf:about="http://…/isbn/2020386682">
    <f:titre xml:lang="fr">Le palais des mirroirs</f:titre>
    <f:original rdf:resource="http://…/isbn/000651409X"/>
  </rdf:Description>
</rdf:RDF>

SOAP

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope (...)>
  <SOAP-ENV:Body>
    <r:queryPathwaysForReferenceIdentifiers>
      <r:referenceIdentifiers>
        <soapenc:string>Q9Y266</soapenc:string>
        <soapenc:string>P17480</soapenc:string>
        <soapenc:string>P2048</soapenc:string>
      </r:referenceIdentifiers>
    </r:queryPathwaysForReferenceIdentifiers>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

WSDL

(...)
<wsdl:message name="getEvsData">
  <wsdl:part element="tns:getEvsData" name="parameters">
  </wsdl:part>
</wsdl:message>
<wsdl:message name="getEvsDataResponse">
  <wsdl:part element="tns:getEvsDataResponse" name="parameters">
  </wsdl:part>
</wsdl:message>
<wsdl:portType name="DataQuery">
  <wsdl:operation name="getEvsData">
    <wsdl:input message="tns:getEvsData" name="getEvsData">
    </wsdl:input>
    <wsdl:output message="tns:getEvsDataResponse" name="getEvsDataResponse">
    </wsdl:output>
  </wsdl:operation>
</wsdl:portType>
<wsdl:binding name="DataQueryServiceSoapBinding" type="tns:DataQuery">
  <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http" />
  <wsdl:operation name="getEvsData">
    <soap:operation soapAction="" style="document" />
    <wsdl:input name="getEvsData">
      <soap:body use="literal" />
    </wsdl:input>
    <wsdl:output name="getEvsDataResponse">
      <soap:body use="literal" />
    </wsdl:output>
  </wsdl:operation>
</wsdl:binding>

WSDL

$ wsimport \ "http://evs.gs.washington.edu/wsEVS/EVSDataQueryService?wsdl"

parsing WSDL...
Generating code...
Compiling code...

WSDL

$ more ./edu/washington/gs/evs/webservice/Locus.java

package edu.washington.gs.evs.webservice;
(...)
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "locus", propOrder = {
    "geneName",
    "chromosome",
    "strand",
    "mrnaAccession",
    "geneId",
    "txStart",
    "txEnd",
    "keggPathwayIds"
})
public class Locus {

    protected String geneName;
    protected String chromosome;
    protected String strand;
    protected String mrnaAccession;
    protected int geneId;
    protected int txStart;
    protected int txEnd;
    @XmlElement(nillable = true)
    (...)

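Well formed

<a><b>c</a></b> is not well formed: the elements overlap instead of nesting. The well-formed equivalent is <a><b>c</b></a>.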
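Well-formedness needs no DTD or schema: any XML parser must reject a document whose tags are not properly nested. A minimal Java sketch of such a check; WellFormedCheck and isWellFormed are names invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: not part of the original slides.
class WellFormedCheck {
    /** Returns true if the parser can read the string as XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // raised by the parser when the document is not well formed
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>c</a></b>")); // false
        System.out.println(isWellFormed("<a><b>c</b></a>")); // true
    }
}
```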

Validated (DTD)

$ cat genes1.dtd

<!ELEMENT genes (gene+)>
<!ELEMENT gene ((name+),sequence)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT sequence (#PCDATA)>
<!ATTLIST gene id CDATA #REQUIRED>

$ xmllint --dtdvalid genes1.dtd genes1.xml

DTD/JAXB : no need to create a parser

$ xjc -dtd genes1.dtd
parsing a schema...
compiling a schema...
generated/Gene.java
generated/Genes.java
generated/Name.java
generated/ObjectFactory.java

Validated (XSD)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:complexType name="Genes">
    <xsd:sequence>
      <xsd:element name="gene" type="Gene" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Gene">
    <xsd:sequence>
      <xsd:element name="name" maxOccurs="unbounded" type="xsd:string"/>
      <xsd:element name="sequence" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="id" use="required" type="xsd:int"/>
  </xsd:complexType>

  <xsd:element type="Genes" name="genes"/>

</xsd:schema>

Validated (XSD)

$ xmllint --noout --schema genes1.xsd genes1.xml
genes1.xml validates
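The same validation can be run from Java with the standard javax.xml.validation API. A minimal sketch, assuming the schema is embedded as a string (a compact copy of genes1.xsd); XsdValidation, loadSchema and isValid are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class: not part of the original slides.
class XsdValidation {
    // Compact copy of the genes1.xsd schema shown on the slide.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:complexType name='Genes'><xsd:sequence>"
      + "<xsd:element name='gene' type='Gene' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:complexType name='Gene'><xsd:sequence>"
      + "<xsd:element name='name' maxOccurs='unbounded' type='xsd:string'/>"
      + "<xsd:element name='sequence' type='xsd:string'/>"
      + "</xsd:sequence>"
      + "<xsd:attribute name='id' use='required' type='xsd:int'/>"
      + "</xsd:complexType>"
      + "<xsd:element type='Genes' name='genes'/>"
      + "</xsd:schema>";

    /** Compiles the schema; fails loudly if the schema itself is broken. */
    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
    }

    /** Returns true if the document validates against the schema. */
    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error, or document not well formed
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<genes><gene id='1'><name>Gene1</name><sequence>ACGT</sequence></gene></genes>";
        String bad = "<genes><gene id='one'><name>Gene1</name><sequence>ACGT</sequence></gene></genes>";
        System.out.println(isValid(ok));  // true
        System.out.println(isValid(bad)); // false: id is not an xsd:int
    }
}
```

Unlike the DTD, the schema can type the id attribute as an integer, so id='one' is rejected.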

XSD/JAXB : no need to create a parser

$ xjc genes1.xsd
parsing a schema...
compiling a schema...
generated/Gene.java
generated/Genes.java
generated/ObjectFactory.java

XSLT

XSLT (text)

<?xml version='1.0' encoding="ISO-8859-1"?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >
  <xsl:output method='text'/>

  <xsl:template match="/">
    <xsl:apply-templates select="genes"/>
  </xsl:template>

  <xsl:template match="genes">
    <xsl:apply-templates select="gene"/>
  </xsl:template>

  <xsl:template match="gene">
    <xsl:text>>id:</xsl:text>
    <xsl:value-of select="@id"/>
    <xsl:text>|</xsl:text>
    <xsl:value-of select="name[1]"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="sequence"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

$ xsltproc genes2txt.xsl genes1.xml

>id:1|Gene1
ATAATGCTAGCTAGCTATCGAATG
>id:2|Gene2
AATTGCGATTCATCGATGCTATA
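The same kind of transformation can be applied from Java with the standard javax.xml.transform API. A minimal sketch, assuming an embedded XML-to-FASTA stylesheet that writes a line break (&#10;) after the header and after the sequence; XsltToFasta and transform are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: not part of the original slides.
class XsltToFasta {
    // Assumed stylesheet: converts <genes> to FASTA, one record per <gene>.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:apply-templates select='genes'/></xsl:template>"
      + "<xsl:template match='genes'><xsl:apply-templates select='gene'/></xsl:template>"
      + "<xsl:template match='gene'>"
      + "<xsl:text>&gt;id:</xsl:text><xsl:value-of select='@id'/>"
      + "<xsl:text>|</xsl:text><xsl:value-of select='name[1]'/>"
      + "<xsl:text>&#10;</xsl:text><xsl:value-of select='sequence'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML string and returns the text output. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<genes>"
            + "<gene id='1'><name>Gene1</name><name>gene-1</name><sequence>ACGT</sequence></gene>"
            + "<gene id='2'><name>Gene2</name><sequence>TTGA</sequence></gene>"
            + "</genes>";
        System.out.print(transform(xml));
    }
}
```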

XSLT (html)

<?xml version='1.0' encoding="ISO-8859-1"?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >
  <xsl:output method='html'/>

  <xsl:template match="/">
    <html><body>
      <xsl:apply-templates select="genes"/>
    </body></html>
  </xsl:template>

  <xsl:template match="genes">
    <h1>
      <xsl:value-of select="count(gene)"/> genes
    </h1>
    <xsl:apply-templates select="gene"/>
  </xsl:template>

  <xsl:template match="gene">
    <h2>
      <xsl:text>>id:</xsl:text>
      <xsl:value-of select="@id"/>
      <xsl:text>|</xsl:text>
      <xsl:value-of select="name[1]"/>
    </h2>
    <pre>
      <xsl:value-of select="sequence"/>
    </pre>
  </xsl:template>

</xsl:stylesheet>

$ xsltproc genes2html.xsl genes1.xml

<html><body>
<h1>2 genes</h1>
<h2>>id:1|Gene1</h2>
<pre>ATAATGCTAGCTAGCTATCGAATG</pre>
<h2>>id:2|Gene2</h2>
<pre>AATTGCGATTCATCGATGCTATA</pre>
</body></html>

XSLT Embedded

<?xml-stylesheet type="text/xsl" href="genes2html.xsl"?>

XSLT (xml)

<?xml version='1.0' encoding="ISO-8859-1"?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns="http://www.w3.org/2000/svg"
    xmlns:math="http://exslt.org/math"
    version="1.0"
    >
  <xsl:output method='xml'/>

  <xsl:template match="/">
    <svg width="500" height="500" version='1.0'>
      <xsl:apply-templates select="genes"/>
    </svg>
  </xsl:template>

  <xsl:template match="genes">
    <xsl:apply-templates select="gene[1]"/>
  </xsl:template>

  <xsl:template match="gene">
    <text x="250" y="250">
      <xsl:value-of select="name[1]"/>
    </text>
    <xsl:call-template name="drawseq">
      <xsl:with-param name="i" select="number(1.0)"/>
      <xsl:with-param name="s" select="sequence"/>
    </xsl:call-template>
  </xsl:template>

  <!-- recursive template: draws the base at position $i on a circle, then calls itself for $i+1 -->
  <xsl:template name="drawseq">
    <xsl:param name="i"/>
    <xsl:param name="s" />
    <xsl:variable name="L" select="string-length($s)"/>
    <text>
      <xsl:variable name="angle" select="$i * ( (2.0*3.14159) div $L )"/>
      <xsl:attribute name="x"><xsl:value-of select="250+200*math:cos( $angle )"/></xsl:attribute>
      <xsl:attribute name="y"><xsl:value-of select="250+200*math:sin( $angle )"/></xsl:attribute>
      <xsl:value-of select="substring($s,$i,1)"/>
    </text>
    <xsl:if test="$i+1 &lt;= $L">
      <xsl:call-template name="drawseq">
        <xsl:with-param name="i" select="1 + $i"/>
        <xsl:with-param name="s" select="$s"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

END

Photos from Wikipedia and W3C.
