

References

1. Squires RB, et al. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance. Influenza Other Respir Viruses. 2012, 6(6):404-16.

2. Noronha JM, et al. Influenza Sequence Feature Variant Type (Flu-SFVT) analysis: evidence for a role of NS1 in influenza host range restriction. J Virol. 2012, 86(10):5857-66.

3. CDC. http://www.cdc.gov/flu/pdf/avianflu/h5n1-inventory.pdf

4. Burke DF, et al. A recommended numbering scheme for influenza A HA subtypes. PLoS One. 2014, 9(11):e112302.

Overview

The Influenza Research Database (IRD, www.fludb.org), funded by the National Institute of Allergy and Infectious Diseases, serves as a single publicly accessible repository of integrated datasets and analysis tools for influenza virus research [1].

IRD Integrates Data from External Sources and Generates Novel Data from Internal Computational Pipelines

IRD Provides Analysis and Visualization Tools

•  BLAST Sequence Similarity Search

•  Multiple Sequence Alignment

•  Phylogenetics in Super-Computing Environment

•  Sequence Variation (SNP) Analysis

•  Metadata-driven Comparative Genomics Analysis

•  Sequence Feature Variant Type (SFVT) Analysis

•  3D Protein Structure Visualization

•  Host Factor Enrichment Analysis

•  Short Peptide Search

•  PCR Primer Design

•  Genome Annotation including SF Annotation

•  HPAI H5N1 & Swine H1 Clade Classifications

•  HA Subtype Numbering Conversion

•  Batch sequence submission to GenBank

IRD Provides Personal Workbench for Data Storage & Sharing

Figure 1. Search sequences based on swine H1 clade(s) from the Swine H1 Clade Sequence Search page.

H1 & H5 Clade Classifications

Acknowledgements

We would like to thank the primary data providers for the data that was used throughout this study. We also recognize the scientific and technical personnel responsible for supporting and developing IRD, which has been wholly supported by the NIH/NIAID (No. HHSN272201400028C). Conflict of interest: None declared.

Sequence Feature Phenotypic Variant Type

Figure 2. Strain Details page shows that the HA sequence of the A/Jiangsu/4/2007 strain carries the 110N PVT substitution that has been shown to increase binding to alpha 2-6 receptor in the publication cited.

Figure 3. HA Subtype Numbering Conversion Result page showing the coordinates of a user-provided HA protein sequence converted to the coordinates of other HA subtypes.

HA Subtype Numbering Conversion

Custom Metadata Capturing

Conclusion

•  Provides comprehensive enriched influenza virus sequence annotations

•  Supports custom sequence annotation, analysis, and visualization

Novel Sequence Annotation and Analysis Tools in the Influenza Research Database (IRD)

Yun Zhang¹, Alexandra J. Lee¹, Catherine Macken², Tavis Anderson³, Amy Vincent³, David Burke⁴, Brian Aevermann¹, Douglas S. Greer¹, Lucy Stewart¹, Brian Reardon¹, Sherry He⁵, Lei Tong⁵, Sanjeev Kumar⁵, Zhiping Gu⁵, Christopher N. Larson⁶, Guangyu Sun⁶, Sam Zaremba⁵, Edward B. Klem⁵, Richard H. Scheuermann¹,⁷

¹J. Craig Venter Institute, La Jolla, CA, USA; ²University of Auckland, Auckland, New Zealand; ³U.S. Department of Agriculture, Ames, IA, USA; ⁴University of Cambridge, UK; ⁵Northrop Grumman Health Solutions, Rockville, MD, USA; ⁶Vecna Technologies, Greenbelt, MD, USA; ⁷Department of Pathology, University of California, San Diego, CA, USA

www.fludb.org

User-uploaded sequences

Figure 4. A phylogenetic tree constructed from a combination of user-provided sequences (downloaded from GISAID) and IRD sequences. Tree leaves are color-coded by subtype. User-provided sequences are highlighted in green.

•  New utility for capturing metadata associated with user-provided sequences

•  Analyze and visualize user-provided sequence data and metadata along with IRD data using any IRD tools


Influenza Strain Details for A/duck/Vietnam/LBM568/2014(H5N1)

Strain Information

Strain Name: A/duck/Vietnam/LBM568/2014

Organism Name: Influenza A Virus

Subtype: H5N1

Host: IRD: Mallard/Avian; GenBank: Anas platyrhynchos var. domestica

2009 Pandemic H1N1-like (SOP): Negative

Isolation Country: Viet Nam

Collection Date: 01/08/2014

GenBank Submission Date: 08/07/2014

NCBI Taxon ID: 1518578

Complete Genome Set: Yes

Sequence Information

Segment  Subtype  Gene Product Name                                           GenBank Accession  Complete Sequence  Segment Length  IRD Submission  pH1N1-like
1        H5N1     PB2 Polymerase (basic) protein 2                            AB972688           Complete           2308            -N/A-           No
2        H5N1     PB1 Polymerase (basic) protein 1, PB1-F2                    AB972689           Complete           2309            -N/A-           No
3        H5N1     PA Polymerase (acidic) protein, PA-X protein (+61)          AB972690           Complete           2200            -N/A-           No
4        H5N1     HA Hemagglutinin                                            AB972691           Complete           1737            -N/A-           No
5        H5N1     NP Nucleoprotein                                            AB972692           Complete           1550            -N/A-           No
6        H5N1     NA Neuraminidase                                            AB972693           Complete           1362            -N/A-           No
7        H5N1     M1 Matrix protein 1, M2 Matrix protein 2                    AB972694           Complete           992             -N/A-           No
8        H5N1     NS1 Non-structural protein 1, NS2 Non-structural protein 2  AB972695           Complete           840             -N/A-           No

Sequence Derived Phenotype Marker

alpha2-6 conferred increased binding to alpha2-6 without loss of binding to alpha2-3 by comparing HA activities using enzymatically modified chicken RBCs.

HA Influenza A_H5_determinant-of-virulence_171(3)_171N, 172A, 239N_Decreased-virulence
171N, 172A, 239N
No
Introduction of Ser171Asn, Thr172Ala, Ser239Asn substitutions in the A/Vietnam/1203/2004 backbone conferred increased affinity for alpha2-6SAL using solid phase assay. The mutant virus showed 100 fold reduction in the lethality of WT.
PubMed: 19116267

HA Influenza A_H5_species-adaptation_171(3)_171N, 172A, 239N_Increased-binding-to-alpha2-6
171N, 172A, 239N
No
Introduction of Ser171Asn, Thr172Ala, Ser239Asn substitutions in the A/Vietnam/1203/2004 backbone conferred increased affinity for alpha2-6SAL using solid phase assay. The mutant virus showed 100 fold reduction in the lethality of WT.
PubMed: 19116267

HA Influenza A_H5_species-adaptation_172(1)_172A_Increased-binding-to-alpha2-6
172A
Yes
Introduction of Thr172Ala naturally occurring substitution in the A/Vietnam/1203/2004 backbone conferred increased binding to alpha2-6 without loss of binding to alpha2-3 by comparing HA activities using enzymatically modified chicken RBCs.
PubMed: 20427525

HA Influenza A_H5_species-
172A, 238L
No
Introduction of Thr172Ala, Gln238Leu naturally
PubMed: 20427525
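The marker names in the records above appear to follow a regular underscore-delimited pattern. As a minimal sketch, assuming the fields are virus type, subtype, phenotype category, first position with the number of positions in parentheses, the substitutions, and the reported phenotype, a name could be split as follows. The field meanings are inferred from the examples shown here rather than from an IRD specification, and `PhenotypeMarker` and `parse_marker_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Tuple


class PhenotypeMarker(NamedTuple):
    """Fields of a marker name; the field meanings are assumptions."""
    virus_type: str
    subtype: str
    category: str
    start_position: int
    n_positions: int
    substitutions: Tuple[str, ...]
    phenotype: str


def parse_marker_name(name: str) -> PhenotypeMarker:
    """Split an underscore-delimited marker name into its six assumed fields."""
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields, got {len(parts)}")
    virus_type, subtype, category, span, subs, phenotype = parts
    # The span field looks like "171(3)": a position followed by a count.
    match = re.fullmatch(r"(\d+)\((\d+)\)", span.strip())
    if match is None:
        raise ValueError(f"unrecognized position field: {span!r}")
    substitutions = tuple(s.strip() for s in subs.split(",") if s.strip())
    return PhenotypeMarker(
        virus_type.strip(), subtype.strip(), category.strip(),
        int(match.group(1)), int(match.group(2)),
        substitutions, phenotype.strip(),
    )


marker = parse_marker_name(
    "Influenza A_H5_species-adaptation_172(1)_172A_Increased-binding-to-alpha2-6"
)
print(marker.subtype, marker.substitutions, marker.phenotype)
```

A truncated name, such as the last record above, fails the six-field check and raises `ValueError` instead of being guessed at.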


•  Curated Sequence Features [2] including phenotype markers in the CDC H5 Genetic Changes Inventory [3]

•  Computed Variant Types of each Sequence Feature

•  Annotated all IRD sequences with the presence/absence of Phenotypic Variant Types (PVT)

•  PVT annotation tool for user-provided sequences
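To illustrate what a presence/absence annotation involves, here is a minimal sketch that checks whether a protein sequence carries every substitution of a variant type such as "171N, 172A, 239N". It assumes that positions are 1-based coordinates in whatever numbering the supplied sequence uses; the `offset` parameter and the function name `has_variant_type` are hypothetical, and this is not the IRD implementation.

```python
from typing import Iterable


def has_variant_type(protein: str, substitutions: Iterable[str], offset: int = 0) -> bool:
    """Return True only if every substitution (e.g. "172A") is present.

    Each substitution is a 1-based position followed by a one-letter residue.
    `offset` shifts positions when the numbering scheme does not start at the
    first residue of `protein` (an assumption made for this sketch).
    """
    for sub in substitutions:
        position, residue = int(sub[:-1]), sub[-1]
        index = position - 1 + offset  # convert to a 0-based string index
        if index < 0 or index >= len(protein):
            return False  # position falls outside the sequence
        if protein[index] != residue:
            return False
    return True


# Toy sequence for illustration only, not a real HA protein.
toy_protein = "MKTAN"
print(has_variant_type(toy_protein, ["4A", "5N"]))  # both residues match
print(has_variant_type(toy_protein, ["4A", "5K"]))  # position 5 is N, not K
```

Requiring all substitutions to match mirrors the way the multi-position variant types above are listed as a single unit.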

•  Based on the HA subtype numbering scheme by Burke and Smith (2014) [4]

•  Automatically converts the coordinates of any HA protein sequence to the coordinates of any other subtype, in order to map functional domains or phenotype markers across subtypes

•  Integrated with IRD analysis tools including Sequence Variation Analysis and metadata-driven Comparative Analysis Tool for Sequences (meta-CATS)
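The idea behind coordinate conversion can be sketched with a pairwise alignment: once two HA proteins are aligned, each residue position in one sequence maps to the position aligned with it in the other. The example below is a simplified illustration under that assumption. It takes two already-aligned strings with "-" for gaps, does not reproduce the Burke and Smith numbering scheme itself, and uses a hypothetical function name and toy sequences.

```python
from typing import Dict, Optional


def build_position_map(aligned_query: str, aligned_reference: str) -> Dict[int, Optional[int]]:
    """Map 1-based query positions to 1-based reference positions.

    Both inputs must be rows of the same pairwise alignment (equal length,
    "-" for gaps). A query residue aligned to a gap maps to None.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    mapping: Dict[int, Optional[int]] = {}
    query_pos = 0
    reference_pos = 0
    for query_char, reference_char in zip(aligned_query, aligned_reference):
        if query_char != "-":
            query_pos += 1
        if reference_char != "-":
            reference_pos += 1
        if query_char != "-":
            mapping[query_pos] = reference_pos if reference_char != "-" else None
    return mapping


# Toy alignment for illustration only.
position_map = build_position_map("MK-TAN", "MKLT-N")
print(position_map)
```

In this toy alignment, query position 3 maps to reference position 4 because the reference has one extra residue before it, and query position 4 has no counterpart.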

•  Swine H1 classification algorithm based on the USDA/OFFLU swine H1 classification scheme

•  H5 classification algorithm based on the CDC/WHO HPAI H5N1 classification scheme

•  Annotated all IRD sequences with H1/H5 clade assignments

•  H1 & H5 clade classification tools for user-provided sequences

Data Aggregated by IRD (Source)
Strains (GenBank)                                        106,760
Segment Sequences (GenBank)                              442,929
Proteins (GenBank and UniProt)                           705,701
3D Protein Structures (PDB)                              662
Experimentally Determined Epitopes (IEDB)                6,304

Data Directly Submitted to IRD (Source)
Surveillance Records (NIAID CEIRS)                       629,403
Serology Data Records (NIAID CEIRS)                      35,584
Human Samples with Clinical Metadata (NIAID GSCID)       736,576
Host Factor Experiments (NIAID Systems Biology)          57
Host Factor Data (ViPR Driving Biological Projects)      coming soon

Data Derived/Annotated by IRD
Sequence Features                                        3,482
Proteins with Predicted Epitopes                         616,961
Proteins with Pfam Domains                               662,132
Proteins with Other Domains/Motifs                       442,032
Proteins with GO IDs                                     508,568
Segments with Pre-computed Alignments                    425,864
Strains with Predicted pH1N1 Classification              44,918
Strains with Predicted H5 Clade Classification           7,028
Antiviral Drugs                                          70


Release Date: May 12, 2016

This system is provided for authorized users only. Anyone using this system expressly consents to monitoring while using the system. Improper use of this system may be referred to law enforcement officials. This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272201400028C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, and Vecna Technologies.

Swine H1 Clade Sequence Search

[Screenshot of the search form. Fields: Data to Return (Segment / Nucleotide, Protein, Strain); Select Clade(s); Select Segments; Complete Sequences (Include Partial Sequences, Complete Segments Only, Complete Genomes Only); Date Range (From: YYYY, To: YYYY; month ranges are available under Advanced Options); Host; Geographic Grouping; Country; Advanced Options. Results matching your criteria: 904.]

An IRD algorithm classifies the clade of the HA of H1 viruses, from any host and for any NA subtype, with reference to the USDA classification of US swine H1 viruses. This algorithm, which is based on phylogenetic analysis, is an adaptation of that used for classifying HA(H5) sequences; it was developed by IRD team member Catherine Macken, in conjunction with Tavis Anderson and other swine influenza experts at the USDA. It has been verified as highly accurate (>99%) for sequences of at least 300 nucleotides of HA1. See SOP for more details. Those HAs not belonging to any of the recognized US swine H1 clades are given the classification "Other." Non-segment 4 sequences from a virus with a US swine H1 classification are given the same assignment as the HA.

Representative tree of swine HA(H1) sequences showing named US swine clades

Description of clades with name that include "-like"
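The assignment rules in the description above (HA sequences outside the recognized clades are labeled "Other", and the remaining segments of a virus inherit the clade of its HA) can be sketched as follows. The phylogenetic classifier itself is not reproduced: `classify_ha` is a hypothetical callable standing in for it, the record layout and the clade name "clade_A" are placeholders, and leaving segments unassigned when no HA is available is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]  # assumed keys: "strain", "segment", "sequence"


def assign_clades(
    records: List[Record],
    classify_ha: Callable[[str], Optional[str]],
) -> Dict[Tuple[str, int], Optional[str]]:
    """Assign a clade to every segment record, keyed by (strain, segment)."""
    ha_clade: Dict[str, str] = {}
    for record in records:
        if record["segment"] == 4:  # segment 4 encodes HA
            clade = classify_ha(str(record["sequence"]))
            # HA sequences outside the recognized clades are labeled "Other".
            ha_clade[str(record["strain"])] = clade if clade is not None else "Other"
    # Every segment of a strain receives the assignment of that strain's HA;
    # strains without an HA record stay unassigned (None).
    return {
        (str(r["strain"]), int(r["segment"])): ha_clade.get(str(r["strain"]))
        for r in records
    }


def toy_classifier(sequence: str) -> Optional[str]:
    """Placeholder for the phylogenetic classifier; not a real method."""
    return "clade_A" if sequence.startswith("ATG") else None


toy_records = [
    {"strain": "s1", "segment": 4, "sequence": "ATGA"},
    {"strain": "s1", "segment": 7, "sequence": "ATGC"},
    {"strain": "s2", "segment": 4, "sequence": "CCCC"},
    {"strain": "s3", "segment": 7, "sequence": "ATGG"},
]
clades = assign_clades(toy_records, toy_classifier)
print(clades)
```

Passing the classifier in as a function keeps the propagation rule separate from the classification method, so the sketch makes no claim about how the clade of an HA is actually determined.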
