Created by the ViPR/IRD team and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Section B. Comparative Genomics Analysis of 2013 H7N9 Influenza A Viruses

Objective

Upon completion of this exercise, you will be able to use the Influenza Research Database (IRD; http://www.fludb.org/) to:

• Search for virus sequences and view detailed information about these sequences in IRD

• Save selected sequences as a working set in your private Workbench space

• Combine multiple working sets

• Build a phylogenetic tree on a set of sequences to infer their evolutionary relationships

• Use Meta-CATS to identify nucleotide or amino acid positions that significantly differ between groups of virus sequences

• Perform a multiple sequence alignment to observe sequence conservation and variations

• Determine if significant positions are located in viral protein Sequence Features and examine Sequence Feature Variant Type reports

• Search for 3D protein structures and highlight Sequence Features and custom positions on a structure

Background

H7 viruses normally circulate in birds and horses. Before March 2013, a search for H7 influenza strains in IRD returned a total of 1485 strains, with 1306 from birds, 102 from environmental samples (usually bird droppings), 33 from horses, and only 15 from humans (11 H7N7, H7N1, H7N2, 2 H7N3). No human isolates were H7N9.

In March 2013, several human cases of influenza A virus subtype H7N9 infection were identified in Shanghai, China, and surrounding provinces. From February to May 2013, a total of 133 cases were confirmed, including 37 deaths. A second wave of human cases began in October 2013, with a total of 74 cases as of January 21, 2014. Fortunately, no evidence of ongoing human-to-human transmission has yet been found for the current H7N9 outbreak. This suggests that the virus is undergoing “stuttering transmission”, in which a virus that normally circulates in an animal reservoir infects a person, but further human-to-human transmission does not occur. In general, viruses capable of stuttering transmission have acquired novel sequence variations that allow them to infect humans (human adaptation), but have yet to acquire sequence variations that allow them to sustain efficient transmission between humans.

We will perform an in-depth statistical analysis using sequence records and analysis tools available in IRD (www.fludb.org) to clarify the origin of the HA segment and to identify candidate sequence variations that might be involved in this type of human adaptation.


Analysis Workflow

1. Search for sequences and save sequences into working sets:
- search for H7N9 HA nucleotide sequences and save sequences as working set (1)
- BLAST for HA nucleotide sequences similar to H7N9 2013 outbreak sequences and save them as working set (2)
- combine working sets (1)-(2) into one (3)
- convert nucleotide working set (3) into protein working set (4)

2. Construct nucleotide phylogenetic tree:
- construct phylogenetic tree using working set (3)
- color tree to reveal host and subtype specific branching patterns

3. Run Metadata-driven Comparative Analysis Tool (Meta-CATS):
- input the protein working set (4) to Meta-CATS
- group the sequences into two groups: human H7N9 2013 outbreak isolates, and similar older H7 isolates
- identify positions that are significantly different between the two sets
- convert position coordinates on the H7 numbering scheme into coordinates on the H3 numbering scheme

4. Visualize multiple sequence alignment:
- align HA protein sequences
- observe the variant positions on the alignment

5. Determine if the significant positions are located in Sequence Features:
- follow the Sequence Feature linkage on the Meta-CATS report
- examine Sequence Features containing the significant positions

6. Highlight significant positions and Sequence Features on protein structure:
- search for H7 HA 3D protein structures
- highlight Meta-CATS positions and Sequence Features on a structure


I. Search for sequences and save matching sequences into working sets

1. Search for H7N9 HA sequences using structured search interfaces

a. Go to the IRD homepage (http://www.fludb.org/), mouse-over “Search Data” in the grey navigation bar, then “Search Sequences” and click “Nucleotide Sequences”.

b. The Nucleotide Sequence Search page allows you to search for sequences based on data type, virus type, subtype, strain name, segment, host, geographical region, complete sequences or not, H1N1 pandemic sequences or not, and date range.

c. For this exercise, we are going to search for HA segment sequences from H7N9 strains. Select the following criteria and click the orange “Search” button to run the query.

• Virus Type: A

• Select Segments: 4 HA

• Sub Type: H7N9

• Complete Sequences: Complete Sequences Only

• Advanced Options: Remove Duplicate Sequences

d. Note that IRD shows an instant count of search results to help you search quickly and efficiently. As you select criteria on a search page, you see immediately how many records match, without clicking the “Search” button and running the search.

e. The Search Results page will be displayed. Here you can:

i. Save the search query to your Workbench and rerun the search later.


ii. Download the sequences (gene, CDS, protein) by clicking “Download”.

iii. Select records and run an analysis on the selected records by mousing-over the “Run Analysis” button and clicking a desired analysis option.

iv. Store selected sequences as a working set in the Workbench so that you can run various analyses on the working set.

v. View the details for any item in the results table by clicking on “View” next to any row.

f. Now click the “Collection Date” header in the results table to sort records by date. If you need advanced sorting options and want to display additional fields, click the “Display Settings” button.

How many HA segment sequences did you find from the current outbreak?

g. To analyze these sequences, we will select records by ticking the checkbox and adding them to a working set by clicking the “Add to working set” button. This way, we will be able to retrieve the data from the Workbench later and run various analyses on the same data set.

h. You’ll be prompted to log in to your Workbench account in order to save data to a working set. If you don’t have an account already, simply register for an account for free by choosing the “Register for a new account” option and following the prompts.

i. An “Add to Working Set” lightbox will pop up. Create a new working set and name it “H7N9 HA complete sequences”. Click “Add to Working Set” to save the sequences to the working set.
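The search criteria and the date sort used in the steps above can also be reproduced offline on downloaded metadata. The sketch below is a minimal illustration, not IRD's implementation: the dictionary keys and the `matches` and `collection_date` helpers are hypothetical names, and the four sample rows are taken from the H7N9 HA search results of this exercise. It assumes every record carries a full month/day/year collection date, which is not true of all IRD records.

```python
from datetime import datetime

# Sample rows taken from the H7N9 HA search results of this exercise.
# The key names are hypothetical and do not reflect an IRD export format.
records = [
    {"accession": "KF420297", "segment": "4 HA", "subtype": "H7N9",
     "collected": "04/29/2013", "host": "Human",
     "strain": "A/Changsha/2/2013"},
    {"accession": "CY067670", "segment": "4 HA", "subtype": "H7N9",
     "collected": "02/07/2008", "host": "Blue-Winged Teal/Avian",
     "strain": "A/blue-winged teal/Guatemala/CIP049-01/2008"},
    {"accession": "CY146972", "segment": "4 HA", "subtype": "H7N9",
     "collected": "04/03/2013", "host": "Chicken/Avian",
     "strain": "A/chicken/Shanghai/S1076/2013"},
    {"accession": "KF420296", "segment": "4 HA", "subtype": "H7N9",
     "collected": "03/22/2013", "host": "Human",
     "strain": "A/Changsha/1/2013"},
]


def matches(record, segment="4 HA", subtype="H7N9"):
    """Return True when a record satisfies the segment and subtype criteria."""
    return record["segment"] == segment and record["subtype"] == subtype


def collection_date(record):
    """Parse the MM/DD/YYYY collection date so records sort chronologically."""
    return datetime.strptime(record["collected"], "%m/%d/%Y")


# Filter by the search criteria, then sort by collection date (oldest first).
hits = sorted((r for r in records if matches(r)), key=collection_date)

# Keep the 2013 records, then the human isolates among them.
outbreak = [r for r in hits if collection_date(r).year == 2013]
human_outbreak = [r["accession"] for r in outbreak if r["host"] == "Human"]

for r in hits:
    print(r["collected"], r["host"], r["strain"])
```

Sorting on a parsed date rather than on the raw string matters here, because a text sort of MM/DD/YYYY values would order records by month before year.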

2. BLAST for HA sequences similar to H7N9 2013 outbreak sequences

Now we are going to expand the sequence set by including HA sequences that are highly similar to the outbreak sequences. We will select a representative isolate and perform a BLAST search of nucleotide sequences. The IRD BLAST tool utilizes the NCBI BLAST program set and has a collection of custom influenza sequence databases to search against.

a. Select A/Shanghai/02/2013 from the results table, mouse over “Run Analysis” and click “BLAST”.

[Screenshot: IRD Nucleotide Sequence Search Results page (www.fludb.org/brc/influenza_sequence_search_segment_display.do), captured 2/10/14. The search returned 95 segments, displayed 50 records per page and sorted by Strain Name in ascending order. Columns shown: Segment, Protein Name, Sequence Accession, Complete Genome, Segment Length, Subtype, Collection Date, Host Species, Country, State/Province, Flu Season, and Strain Name. The first page lists H7N9 HA records from avian hosts (2006 to 2013) and from human cases in China, for example A/Changsha/1/2013 collected 03/22/2013.]


b. In the Select Sequence Type lightbox, select “Nucleic Acid (Segment)” and click “Continue”.

c. Now the BLAST setting page is loaded. IRD provides a collection of custom influenza sequence databases to search against using BLAST. Select “Blastn”, “Nucleotides for segment 4 HA”, and “100” results to display. For the other parameters keep the default settings. Click “Run”.

d. On the BLAST Report page, all nearest hits are listed in the table. Click a hit to view its alignment. Click the IRD link (e.g., ird|982104) to view the hit’s segment/protein details page in IRD. What are the host and subtype of the sequences that are most similar to the H7N9 outbreak sequences?

e. Return to the BLAST Report page by clicking the Results breadcrumb. Now select all hits by ticking the checkbox in the column header and click “Add to Working Set” to add these sequences to a new working set named “A/Shanghai/02/2013 BLAST hits”.
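The BLAST Report ranks hits by bit score and lists an E-value for each hit. As a rough illustration of why strong hits are reported with an E-value of 0.0, the sketch below applies the standard Karlin-Altschul relation E = m × n × 2^(−S′), where S′ is the bit score, m the query length, and n the database size in letters. The database size is the figure reported for this search (95,790,805 letters); the query length of 1683 nt is an assumption made for illustration, and BLAST itself uses edge-corrected effective lengths, so its own values will differ somewhat. The function name is hypothetical.

```python
import math


def log10_evalue(bit_score, query_length, database_letters):
    """Return log10(E) for E = m * n * 2**(-S'), computed in log space.

    Working with logarithms avoids floating-point underflow, which is what
    turns very small E-values into a displayed 0.0.
    """
    return (math.log10(query_length)
            + math.log10(database_letters)
            - bit_score * math.log10(2))


DATABASE_LETTERS = 95_790_805  # database size reported for this search
QUERY_LENGTH = 1683            # assumed query length in nucleotides

# Bit scores spanning the top and bottom of the hit list in this exercise.
for bit_score in (3386, 2781, 2652):
    exponent = log10_evalue(bit_score, QUERY_LENGTH, DATABASE_LETTERS)
    print(f"bit score {bit_score}: E is roughly 10^{exponent:.0f}")
```

Every hit in this range has an expected chance-match count far below the smallest number a double-precision float can represent, so the bit score, not the E-value, is the useful column for ranking these hits.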

[Screenshot: IRD Identify Similar Sequences (BLAST) settings page (www.fludb.org/brc/blast.do), captured 2/10/14; IRD release date Jan 24, 2014. Program options include Blastn (nucleotide) and Blastx (search protein using nucleotide translated in 6 reading frames). Database options include “Nucleotides for segment 4 HA”. Number of Results to Display options are 10, 25, 50, 100, and 150. Advanced options cover Strand, Gap Costs, Apply Low Complexity Filter, and Perform ungapped alignment. The page footer states that IRD is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, Vecna Technologies, SAGE Analytica, and Los Alamos National Laboratory.]

[Screenshot: BLAST Report, BLASTN 2.2.22 (reference: Altschul et al. (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402). Query: gb:KF021597, A/Shanghai/02/2013. Database: Influenza nucleotide sequences Seg4; 70,570 sequences; 95,790,805 total letters. The top hits are 2013 H7N9 HA sequences from China and Taiwan (bit scores 3386 to 3241), followed by duck H7N3 isolates from Zhejiang collected in 2011 and 2013 (bit scores 2781 to 2694) and older H7N7 and H7N1 isolates from ducks and wild birds in South Korea, Japan, Mongolia, and Jiangxi (bit scores 2680 to 2652). All listed E-values are 0.0.]

3. Construct segment and protein working sets

a. Now we are going to combine the working sets “H7N9 HA complete sequences” and “A/Shanghai/02/2013 BLAST hits” and use the combined working set for downstream analyses. Click the “Workbench” tab in the grey navigation bar to go to your Workbench. You’ll see the saved working sets listed at the top of the Workbench table.

b. Tick the checkboxes for the two working sets we just saved. Click “More Actions” then “Combine”. Name the combined working set “H7N9 HA complete seqs + blastn hits”. This working set contains the HA segment sequences from H7N9 isolates and sequences that are most similar to the HA segment sequences of the current H7N9 outbreak. You will notice that duplicate sequences are automatically removed.

c. Next, we are going to convert the combined segment working set into a protein sequence working set. To do so, select the combined working set. Click “More Actions” then “Convert”.

d. In the Convert Working Set lightbox, select Protein as the data type and name the working set “H7N9 HA complete seqs + blastn hits protein”. Click “Convert”.

e. Access your Workbench by clicking the “Workbench” tab. You will see the newly created working sets at the top of the content list.
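Converting a nucleotide working set into a protein working set amounts to translating each coding sequence. The sketch below is a minimal stand-in for that step, not IRD's converter: it translates a coding sequence with the standard genetic code, assumes the input starts in frame at its first codon, and uses a short made-up sequence rather than a real HA record. The `translate` function and the constants are hypothetical names.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering: the third base
# varies fastest, then the second, then the first.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Codons containing ambiguous bases (for example N) become 'X'.
    A trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example sequence: five codons followed by a stop codon.
print(translate("ATGAACACTCAAATCTAA"))
```

Because influenza segment records can include untranslated regions before the start codon, a real conversion would first locate the annotated coding region; this sketch leaves that step out.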

4. (Optional) Combine custom sequences with IRD sequences

When you have custom sequences and want to analyze your sequences along with IRD sequences, you can upload your sequence file to your Workbench and then combine the uploaded file with a saved working set.

a. Prepare your sequences in FASTA format.

b. Go to the Workbench, click “More Actions” then “Upload File”.

c. Next, name your file, choose File Type (i.e., “Unaligned FASTA”) and Data Type, click “Browse” and select the desired FASTA file on your computer. Choose the desired Fasta File Definition Line format. Click “Upload”. The uploaded file will be displayed in the Workbench table.

d. Now select the desired working set. Click “More Actions” then “Combine with Uploaded File”.

e. In the pop-up lightbox, choose the desired sequence file from the list of uploaded files, name the combined working set, and click “Combine”. You will see the newly combined working sets at the top of the content list.
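Combining an uploaded FASTA file with a saved working set is, in essence, a merge of two record collections with duplicates dropped. The sketch below illustrates the idea on in-memory FASTA text. Identifying duplicates by the first token of the definition line is an assumption made for this example, and IRD's own matching rules may differ. The accession numbers appear in this exercise, but the sequences are short placeholders, not real data, and `parse_fasta` and `combine` are hypothetical helper names.

```python
def parse_fasta(text):
    """Return a dict mapping each record ID to its concatenated sequence.

    The record ID is taken to be the first whitespace-delimited token of
    the definition line (an assumption for this sketch).
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def combine(*working_sets):
    """Merge working sets in order, keeping the first copy of each record ID."""
    combined = {}
    for working_set in working_sets:
        for record_id, sequence in working_set.items():
            combined.setdefault(record_id, sequence)
    return combined


# Placeholder sequences; only the identifiers matter for the merge.
ird_set = parse_fasta(">KF021597 placeholder\nATGAAC\nACTCAA\n>KC885956\nATGAACACT\n")
uploaded = parse_fasta(">my_isolate_1\nATGAAT\n>KF021597\nATGAACACTCAA\n")

merged = combine(ird_set, uploaded)
print(list(merged))
```

Keeping the first copy of a duplicated ID means the order of the arguments decides which record wins, so the saved working set is passed first here.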

II. Construct an HA segment phylogenetic tree

a. Now we will construct a phylogenetic tree using HA segment sequences from H7N9 isolates and segment sequences that are most similar to the current H7N9 outbreak isolates.

b. We are going to add the HA sequence from A/ruddy turnstone/DE/2378/1988 to the H7N9 HA and closely related sequences as an outgroup.


i. Mouse over the “Search Data” tab and click “Nucleotide Sequences”.

ii. On the Nucleotide Sequence Search page, select “4 HA” from the “Select Segments” list, type “A/ruddy turnstone/DE/2378/1988” in the strain name box, and click “Search” to run the search.

iii. The search result page will show 1 segment. Select the segment by ticking the checkbox next to it. Then click the “Add to working set” button and add it to the working set: “H7N9 HA complete seqs + blastn hits”.

c. On the Workbench page, click “View” next to H7N9 HA complete seqs + blastn hits.

d. The working set details page displays the sequence records saved in the working set. Select all records by clicking the checkbox above the table. Mouse over “Run Analysis” and click “Generate Phylogenetic Tree”.

e. On the Tree setting page, select “Quick Tree”, choose strain name and date as tree tip label, and click “Build Tree”.

f. While the analysis is running, you can save the analysis to your Workbench by entering a name and then clicking “Save to Workbench”. Once it is saved, you can come back to the Workbench at any time to retrieve the analysis results.

g. After the analysis is finished, a View Phylogenetic Tree page will be loaded. Here you can save the phylogenetic file in Newick or PhyloXML format to your computer. Click “View Tree” to load the Archaeopteryx Tree Viewer window.

[Screenshot: IRD Generate Phylogenetic Tree settings page, captured 2/10/14, with 146 segments selected for the tree, tip labels set to Strain Name, the analysis named “H7N9 2013 HA segment phylogeny”, and the Quick Tree option selected. Custom tip label fields (maximum 4) include Strain Name, Accession Number, Date, Country, USA State, Segment, Protein Symbol, Season, Type, SubType, Host Species, and 2009 pH1N1-like Phenotype Markers. The page offers a choice between excluding 1 segment with a poorly pre-aligned sequence and using all 146 segments with a MUSCLE alignment. According to the on-page note, Quick Tree uses PhyML (Guindon, S. and Gascuel, O. (2003) Syst Biol. 52: 696-704) with IRD-defined settings for datasets of at most 1000 sequences; Custom Tree offers PhyML or RaxML (Stamatakis, A. et al. (2005) Bioinformatics 21: 456-463) with user-defined parameters, and Custom Tree with RaxML must be used for datasets exceeding 1000 sequences.]


h. A Tree Viewer window will pop up. It offers many tree customization options: reroot the tree, collapse/expand/display a subtree, swap descendants, decorate (color) the tree leaves by any associated metadata (e.g., host or year of isolation), resize the tree, zoom in/out, fit the tree to the window, and change the font size.

i. In the “Tree Decorations” section, select “HA&NA Subtype” from the “Basic Decoration Options”. Click “Show Legend” to display the color code for different subtypes.

ii. The default colors may or may not be ideal for your purpose. You can change the color by using the “Advanced Decoration”. In the Advanced Decoration Options dialog box, select “HA&NA Subtype”, click the Manual Decoration checkbox and click “Go”.

iii. Check H7N9 and choose red in the color palette, then click “Apply”. Now the H7N9 strains are colored in red.

iv. The tree shows that the HA sequences from the H7N9 outbreak are similar to the HA sequences from a series of duck H7N3 isolates from Zhejiang and a series of poultry H7N2/3/7 isolates mostly from Wenzhou, suggesting that at least two independent interspecies transmissions occurred and the HA segment of the new H7N9 isolates was likely derived from an H7N3 ancestor.

v. You can save the tree image by clicking the “File” menu and then selecting a file format.

i. Return to the Tree Results page. Save the tree analysis to your Workbench by clicking “Save Analysis”. Rename the analysis so that you can recognize it later, for example, “H7N9 HA segment phylogeny”. Then click “Save”.


This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, J. Craig Venter Institute , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.


The IRD Tree Decorator is a custom enhancement of Archaeopteryx. The original FORESTER/ATV library is freely available from SourceForge. Credits: Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.

Click the "View Tree" button below to launch the tree viewer software in a new window. If you prefer other viewing software, the tree data is available for download in Newick or PhyloXml format using the buttons above.

Due to security concerns, certain browsers (e.g. Safari and Firefox) have disabled Java plug-ins by default. If the Tree Viewer takes a long time to load, please test your browser's Java plug-in to make sure it can display Java Applets properly. Safari has recently tightened the security settings on Java Applets, which may affect image export functions of the Tree Viewer. Click Here for instructions on how to fix this.

ENHANCED TREE VIEWER

The IRD team provides software that allows 'decoration' of your tree by features such as host species, year, country, and subtype. This custom software is based on Archaeopteryx. In the tree viewer, use the drop-down menu for basic decoration or advanced decoration to select the feature for coloring. The decorated tree and corresponding legend can be exported using options in the File drop-down menu.

A user's guide is available. How to create a publication quality tree image


j. Go to your Workbench. You can see the tree is listed at the top of the Workbench table. Click “View” to retrieve the tree analysis result. The parameters used to generate the tree are also saved.
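The subtype decoration described in step h can be pictured as a lookup from each leaf's subtype to a color. The following is a minimal Python sketch under stated assumptions: it assumes leaf labels end with the subtype in parentheses, as in strain names such as A/goose/Czech Republic/1848-T14/2009(H7N9). The function names and the default gray color are illustrative and are not part of IRD or Archaeopteryx.

```python
import re

# Matches a trailing subtype such as "(H7N9)" at the end of a strain name.
SUBTYPE_PATTERN = re.compile(r"\((H\d+N\d+)\)\s*$")


def subtype_of(strain_name):
    """Return the subtype embedded in a strain name, or None if absent."""
    match = SUBTYPE_PATTERN.search(strain_name)
    return match.group(1) if match else None


def decorate_leaves(strain_names, palette, default="gray"):
    """Map each leaf label to a color chosen by its subtype."""
    return {name: palette.get(subtype_of(name), default) for name in strain_names}


leaves = [
    "A/goose/Czech Republic/1848-T14/2009(H7N9)",
    "A/turkey/Italy/220158/2002(H7N3)",
    "A/duck/HONGKONG/293/1978(H7N2)",
]
# Color only H7N9 leaves red, as in the manual decoration step.
colors = decorate_leaves(leaves, {"H7N9": "red"})
for name, color in colors.items():
    print(f"{color:5s} {name}")
```

Leaves whose subtype is missing from the palette fall back to the default color, which mirrors coloring only the H7N9 strains and leaving the others unchanged.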

III. Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS)

• Meta-CATS is a unique comparative genomics analysis tool in IRD that identifies nucleotide/amino acid positions that significantly differ between two or more groups of virus sequences.

• Meta-CATS consists of three parts: a multiple sequence alignment (using MUSCLE), a chi-square goodness of fit test to identify positions (columns) of the multiple sequence alignment that significantly differ from the expected (random) distribution of residues between all metadata groups, and a Pearson's chi-square test to identify the specific pairs of metadata groups that contribute to the observed statistical difference.

• Pickett BE, et al. (2013) "Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations Dependent on Differences in Viral Metadata." Virology, 447(1-2):45-51. doi: 10.1016/j.virol.2013.08.021.
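To make the statistical step concrete, the sketch below computes a Pearson chi-square statistic and its P-value for a single alignment column, given residue counts per group. It is a generic textbook calculation written with the Python standard library, not the Meta-CATS implementation; the tool may apply corrections or other details not described here, so the sketch is not expected to reproduce its reported values exactly. The function names are illustrative.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows (one per group); each row lists residue counts
    in the same residue order. Every row and column must have a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi_square_p_value(statistic, dof):
    """Upper-tail probability of a chi-square distribution (integer dof >= 1)."""
    half = statistic / 2.0
    # Closed forms for 1 and 2 degrees of freedom, then step up by 2 using
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if dof % 2 == 1:
        p, k = math.erfc(math.sqrt(half)), 1
    else:
        p, k = math.exp(-half), 2
    while k < dof:
        p += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p


# Rows are groups, columns are residues (D, K, S): group 1 has 17 D and 16 K,
# group 2 has 32 S, i.e. the two groups share no residue at this column.
table = [[17, 16, 0],
         [0, 0, 32]]
statistic, dof = chi_square_statistic(table)
p_value = chi_square_p_value(statistic, dof)
print(f"chi-square = {statistic:.3f}, dof = {dof}, p = {p_value:.3e}")
```

When the two groups share no residue at a column, as in this example, the uncorrected statistic equals the total number of sequences (65 here) and the P-value is far below a 0.05 cutoff.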


Now we will use Meta-CATS to identify amino acid positions on the HA protein that significantly differ between the human isolates from the current outbreak and older H7 isolates.

a. We are going to analyze the protein sequences we saved previously in the working set “H7N9 HA complete seqs + H7N9 blastn hits protein”. Go to your Workbench, find the working set, and click “View” to display the sequences.

b. Sort the list by the “Date” column. Now select the following protein sequence records:

i. All human H7N9 from 2013

ii. All non-human H7 isolated before 2012

iii. Mouse over “Run Analysis” and click “Metadata-driven Comparative Analysis Tool”.

c. You are taken to the meta-CATS tool settings page. Here we will separate these sequences into two groups according to the phylogenetic tree analysis: the 2013 H7N9 isolates as one group and the remaining sequences as another group. We can do so by grouping the sequences by year. Select “Auto Grouping”, and then select “Year” from the dropdown list. Now enter the year break point “2012” to get two groups: (1) 2012 and before (Eurasian H7 ancestral isolates), and (2) after 2012 (human H7N9 2013 outbreak isolates). Select “Unaligned FASTA”, use “0.05” as the significance cutoff value, and click “Continue”.

d. On the next page, you will see the sequences are separated into two groups: Group 1 containing <=2012 sequences, and Group 2 containing >2012 sequences. Click the “Run” button.

e. This analysis may take a few minutes to finish. You can save the analysis to your Workbench and retrieve it later. To do so, enter a name (e.g., “human H7N9 2013 HA vs. older H7”) and click “Save to Workbench”.


f. The Meta-CATS analysis result has two reports: a Chi-square Test of Independence result table listing the positions that have a significant non-random distribution between your specified groups, and a Pearson's chi-square test result table listing the specific pairs of groups that contribute to the observed statistical difference. Since this analysis only deals with two groups of sequences, we will primarily focus on the first result table.

g. Review the Chi-square test results to see the positions that differ significantly between the current H7N9 outbreak isolates and other isolates. The residue diversity column lists the counts for each residue within a group. Now sort the results by the Chi-square value to push the most different positions to the top of the table. What is the position number with the highest Chi-square value?

h. Some publications record H7 positions on an H3 numbering scheme. To compare the variant positions identified by meta-CATS with positions reported in publications, we will convert the position numbers of the current analysis to the H3 numbering scheme by selecting the “H3 A/Aichi/2/1968(H3N2)” reference strain from the “Reference Coordinate” dropdown menu.

i. The numbering scheme from the custom alignment of the input sequences is now mapped to both the H3 HA0 (uncleaved) and H3 HA1/HA2 (cleaved) coordinates.

j. Save the analysis result to your Workbench by clicking the “Save Analysis” button.
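The idea behind converting alignment positions to a reference numbering scheme (steps h and i) can be sketched as follows: walk along the aligned reference sequence and count only its non-gap residues. This is a generic illustration, not IRD's implementation; the function name is illustrative and the aligned sequence is toy data rather than real H3 sequence.

```python
def column_to_reference_positions(aligned_reference, gap="-"):
    """Return, for each alignment column, the 1-based residue number in the
    reference sequence, or None where the reference has a gap."""
    positions = []
    residue_number = 0
    for symbol in aligned_reference:
        if symbol == gap:
            positions.append(None)
        else:
            residue_number += 1
            positions.append(residue_number)
    return positions


# Toy alignment row for a hypothetical reference; not real H3 sequence data.
aligned_reference = "MK-TI--ALS"
mapping = column_to_reference_positions(aligned_reference)
print(mapping)  # [1, 2, None, 3, 4, None, None, 5, 6, 7]
```

Columns where the reference has a gap have no reference coordinate, which is why a position in the custom alignment and the corresponding reference position can carry different numbers.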

Screenshot: Metadata-driven Comparative Analysis Report (Ticket# MG_599805474639)

When 3 or more groups are included in the analysis, the P-value from the Goodness of Fit test will identify columns having significant variation between all groups, while the Pearson's test will identify the specific pair(s) of groups that make the column significant (i.e. if those groups were not included in the analysis, the column would no longer be identified as significant).

The link (View SF) from each position to the Sequence Feature page is visible only if the SF has been defined AND the set of protein data has the same type: subtype or ortholog.

Reference Coordinate: select from the drop-down list to convert the existing position numbers to a different numbering (coordinate) scheme using a Reference Sequence (e.g. convert from H1 to H3 numbering).

Chi-square Test of Independence Result

There are 92 positions that have a significant non-random distribution between the specified groups. "*" in the position column indicates fewer than 5 non-zero residues in any cell of the contingency table.

Position  Chi-square Value  P-value    Degrees of Freedom  Residue Diversity                               Sequence Feature
235*      64.994            7.704E-15  2                   group1(33 Q) group2(1 I, 31 L)                  View SF
195*      64.994            7.704E-15  2                   group1(1 A, 32 G) group2(32 V)                  View SF
183       64.994            7.704E-15  2                   group1(17 D, 16 K) group2(32 S)                 N/A
541       61.057            5.545E-15  1                   group1(33 A) group2(32 V)                       N/A
455       61.057            5.545E-15  1                   group1(33 N) group2(32 D)                       N/A
410*      57.463            3.327E-13  2                   group1(2 N, 14 S, 17 T) group2(32 N)            N/A
321*      50.765            2.499E-10  4                   group1(12 E, 1 K, 1 P, 4 R, 15 T) group2(32 R)  N/A
198*      47.694            4.4E-11    2                   group1(5 A, 2 N, 26 T) group2(32 A)             View SF
130       47.694            4.4E-11    2                   group1(5 A, 16 S, 12 T) group2(32 A)            N/A
11*       47.692            2.477E-10  3                   group1(16 C, 5 I, 1 M, 11 V) group2(32 I)       N/A
188*      44.78             1.889E-10  2                   group1(1 A, 26 I, 6 V) group2(32 V)             N/A
307       44.298            2.82E-11   1                   group1(5 D, 28 N) group2(32 D)                  N/A
211       44.298            2.82E-11   1                   group1(28 I, 5 V) group2(32 V)                  N/A
427       38.799            4.698E-10  1                   group1(7 I, 26 M) group2(32 I)                  N/A
335*      30.074            4.727E-6   4                   group1(12 I, 4 K, 4 L, 12 N, 1 S) group2(32 I)  N/A
285*      29.31             1.927E-6   3                   group1(4 D, 13 G, 11 N, 5 S) group2(1 D, 31 N)  N/A
462       25.238            5.066E-7   1                   group1(13 K, 20 R) group2(32 K)                 N/A
414*      24.135            5.743E-6   2                   group1(15 K, 16 Q, 2 R)                         N/A
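The Residue Diversity entries in the report can be turned into counts for further analysis. The sketch below parses one entry and derives a substitution label from the most common residue in each group; it assumes the "groupN(count residue, ...)" layout seen in the report, and the function names are illustrative rather than part of IRD.

```python
import re


def parse_residue_diversity(text):
    """Parse a string like 'group1(1 A, 32 G)group2(32 V)' into nested counts."""
    result = {}
    for group, body in re.findall(r"(group\d+)\(([^)]*)\)", text):
        counts = {}
        for item in body.split(","):
            count, residue = item.split()
            counts[residue] = int(count)
        result[group] = counts
    return result


def describe_substitution(position, parsed):
    """Label a position by the most common residue in group1 and group2."""
    major = {group: max(counts, key=counts.get) for group, counts in parsed.items()}
    return f"{major['group1']}{position}{major['group2']}"


# Residue diversity reported for position 195.
parsed = parse_residue_diversity("group1(1 A, 32 G)group2(32 V)")
print(parsed)                              # {'group1': {'A': 1, 'G': 32}, 'group2': {'V': 32}}
print(describe_substitution(195, parsed))  # G195V
```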


IV. Visualize protein sequence alignment

Now we are going to view the protein sequence alignment to confirm the meta-CATS results and to verify clade relationships inferred from the phylogenetic analysis.

a. From the meta-CATS report page, click “Visualize Aligned Sequences” at the top of the page.

Note: You can also run an alignment on the saved sequence working set by navigating to the working set in your Workbench area and then clicking the “Visualize Aligned Sequences” option from the “Run Analysis” pull down menu.

b. The alignment is presented in the JalView visualization window. The window is interactive.

i. The consensus sequence is shown at the bottom of the window. You can choose to show sequence logos by right-clicking on consensus and then selecting "Show logo".

ii. You can manually adjust the alignment and display using various gray menu options.

iii. Scroll right to the region spanning positions 185-235. Several amino acid substitutions, including G195V and Q235L/I, are observed in all recent H7N9 isolates but are absent from the older H7 proteins.

iv. We can change the View Option to “Conserved vs. reference” so that only the first sequence is shown in full characters; for the remaining sequences, only the nucleotides/residues that differ from the reference sequence are shown as full characters.

v. You can download the input sequences or alignment in various formats, or save the alignment to your Workbench. Click “Save Analysis”, give it a name, and click “Save”.
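The “Conserved vs. reference” view can be imitated in a few lines: keep the first sequence in full and, in every other sequence, show only the residues that differ from it. The sketch below is an illustration only; using a dot for matching residues is an assumption made here for readability and is not necessarily how JalView renders them.

```python
def conserved_vs_reference(aligned_sequences, same="."):
    """Show the first sequence in full; in the rest, mask residues equal to it."""
    reference = aligned_sequences[0]
    rows = [reference]
    for sequence in aligned_sequences[1:]:
        rows.append("".join(same if a == b else a
                            for a, b in zip(sequence, reference)))
    return rows


# Five-residue toy alignment: the first row is the reference.
for row in conserved_vs_reference(["SGSTT", "SVSTA", "SGSTT"]):
    print(row)
```

With this input the second row prints as ".V..A", which makes the two substitutions stand out against an otherwise conserved region.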

V. Determine if the significant positions are located in Sequence Features

Sequence Features (SFs) are defined as interesting protein regions with known structural or functional properties. They are obtained from literature and other databases and validated by domain experts. Once a Sequence Feature region has been defined, the distinct amino acid sequences observed for that region in the sequence database are determined, and each is defined as a unique variant type. The reference strain is always Variant Type 1.

The Sequence Feature (SF) column in the meta-CATS table provides a convenient linkout to a list of all Sequence Features that contain that amino acid position.
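The way variant types arise from a Sequence Feature region can be sketched as follows: extract the region from every sequence, collect the distinct region sequences, and number them with the reference strain's sequence as VT-1. The strain names and sequences below are hypothetical toy data, and numbering the remaining variants by decreasing strain count is an assumption made for this sketch, not a documented IRD rule.

```python
from collections import Counter


def variant_types(sequences, start, length, reference):
    """Number the distinct region sequences, with the reference as VT-1.

    `sequences` maps strain names to aligned protein sequences and `start`
    is the 1-based first position of the Sequence Feature region.
    """
    region = {name: seq[start - 1:start - 1 + length]
              for name, seq in sequences.items()}
    counts = Counter(region.values())
    reference_variant = region[reference]
    # Assumption: non-reference variants are ordered by decreasing strain count.
    others = sorted((v for v in counts if v != reference_variant),
                    key=lambda v: (-counts[v], v))
    ordered = [reference_variant] + others
    return {f"VT-{number}": (variant, counts[variant])
            for number, variant in enumerate(ordered, start=1)}


# Hypothetical strains and toy sequences; the region covers positions 3-7.
sequences = {
    "reference": "AASGSTTKK",
    "strain_1": "AASVSTAKK",
    "strain_2": "AASGSTTKK",
    "strain_3": "AASVSTAKK",
    "strain_4": "AASASTTKK",
}
types = variant_types(sequences, start=3, length=5, reference="reference")
for name, (variant, count) in types.items():
    print(name, variant, count)
```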

a. Return to the meta-CATS report page. Click the “View SF” link for residue position 195. How many Sequence Features are mapped to this position?

b. Click “View” for SF4 to get to the Sequence Feature (SF) Details page.

i. This SF represents determinants of receptor binding. It begins at residue 194 and is 5 amino acids long. What is the reference strain used to define the position coordinates of this SF? What is the position range on the reference strain?

ii. How many Variant Types does this SF have?

iii. All 2013 outbreak isolates have a Valine at position 195 and an Alanine at position 198. Now we are going to search for all strains harboring V195 and A198. Click “Find a VT”, fill all positions with wildcards and fill in V at 195 and A at 198. Then click “Search”.

iv. Did you find a VT harboring SVSTA in the 194-198 loop? Click the strain count for the VT. Are the strains from the current outbreak?
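The wildcard search in step iii amounts to position-by-position pattern matching. The sketch below supports only the "?" wildcard (not the full set of IUPAC symbols the search form accepts); the variant type names other than VT-1 are hypothetical, and the function names are illustrative.

```python
def matches_pattern(variant, pattern, wildcard="?"):
    """True when every non-wildcard pattern symbol equals the variant residue."""
    return len(variant) == len(pattern) and all(
        p == wildcard or p == v for v, p in zip(variant, pattern)
    )


def find_variant_types(variants, pattern):
    """Return the names of variant types whose sequence fits the pattern."""
    return [name for name, sequence in variants.items()
            if matches_pattern(sequence, pattern)]


# VT-1 is the reference sequence for positions 194-198; the other names are
# hypothetical labels used only for this illustration.
variants = {"VT-1": "SGSTT", "VT-x": "SVSTA", "VT-y": "SASTT"}
hits = find_variant_types(variants, "?V??A")  # V at 195, A at 198
print(hits)  # ['VT-x']
```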

Screenshot: Sequence Feature Variant Types search result for position 195 (1 record)

Feature Identifier: InfluenzaA_H7_SF4
Sequence Feature Name: InfluenzaA_H7_determinants-of-receptor-binding_194(5)
Sequence Feature Length: 5
Variant Types: 35
Category: functional
Amino Acid Position: 194(178 HA1)-198
Evidence: PubMed:22345462
Comments: Atypical European viruses show G186V whereas the N. American strains show G186A or G186E.

It is possible that the numbered positions from the custom alignment and the numbered positions in the Sequence Feature Reference Strain are different, even though they refer to the equivalent sequence position. Here, custom amino acid position 195 corresponds to amino acid position 195 in the SF reference strain.

Screenshot: Sequence Feature Details page

SEQUENCE FEATURE DEFINITION
Protein Name: HA
Sequence Feature Name: Influenza A_H7_determinants-of-receptor-binding_194(5)
Sequence Feature ID: Influenza A_H7_SF4
VT-1 Strain (reference strain): A/turkey/Italy/220158/2002(H7N3)
Reference Sequence Accession: AY586409
Reference Position: 194(178 HA1)-198

SOURCE STRAIN(S)
Source Strain: A/duck/HONGKONG/293/1978(H7N2)
VT Number: VT-1
Source Position: 185-189
Source Accession: U20461
3D Protein Structure: -N/A-
Publication: PubMed:22345462
Evidence Codes: EXP
Comment: Atypical European viruses show G186V whereas the N. American strains show G186A or G186E.

VARIANT TYPES
Find a VT(s): edit specific positions in the VT-1 sequence with IUPAC symbols or use "?" as a wildcard. Click Search to find VT(s) conforming to the edited sequence. Click Reset to restore the panel to the default VT-1 sequence.

Sequence variation entered: 194 = ?, 195 = V, 196 = ?, 197 = ?, 198 = A

Variant Type  Strain Count  194  195  196  197  198  Total Variations
VT-1          388           S    G    S    T    T    0
VT-11         14            •    V    •    •    A    2

"?" indicates the sequence is unknown. "–" indicates a deletion (gap) relative to the VT-1 sequence. "•" indicates the same letter as in the VT-1 sequence. "[ ]" indicates an insertion.

VI. Highlight variant positions and Sequence Features on protein structure

IRD imports experimentally-determined virus protein structures from the Protein Data Bank, integrates data from IEDB and UniProt and provides various visualization options. To investigate the structural implications of the sequence variants identified by the IRD meta-CATS analysis, we are going to highlight the positions on a related H7 protein structure.

a. From the grey navigation bar, mouse over “Search Data” and click “3D Protein Structures”.

b. Search for the 3D structures of influenza A HA subtype H7. Virus Type: ☑ A; Subtype: H7; Select Proteins to search: ☑ 4 HA

c. The Search Results page displays a list of matching structures. We are going to examine the HA structure of the H7N3 subtype, so click “View Structure” for 1TI8 to display the structure.

d. Now we are on the 3D protein structure viewer page. Click and drag with your mouse in the display window to change the focus point.

i. In the “Display Options” section, you can change the Display Type to line, stick, space, primary structure, secondary structure, etc. We are going to select “Space”.

ii. You can overlay the structure with a sequence conservation heat map by selecting “Sequence conservation computed using all sequences” in the “Highlight Sequence Conservation” section.

iii. This structure was obtained from A/turkey/Italy/214845/02, and the position numbering in our meta-CATS analysis matches that of this strain. Now select Chain A and type “195, 235” in the “Highlight by SWISS-PROT Position” section to highlight these binding determinants on the structure. G195V and Q235L could increase the binding of avian H5 and H7 viruses to human-type receptors (Yamada, 2006; Srinivasan, 2013). Alternatively, you can highlight the Sequence Feature of receptor binding determinants by checking “Influenza A_H7_determinants-of-receptor-binding_194(5)” from the functional dropdown list in the “Highlight Sequence Features” section.

iv. Variant position 235 is also located within an experimentally determined epitope. Highlight this epitope on the structure by checking “Influenza A_H7_experimentally-determined-epitope_226(14)” from the epitope dropdown list in the “Highlight Sequence Features” section.

v. Click “Spin” to view the structure spinning. Then click “Rock” to rock the structure back and forth.

vi. The custom highlighted protein structure can be downloaded as an image by clicking “Save View As Image” beneath the image, or as a 3D movie of either a spinning or a rocking structure by clicking “Generate Video”.
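The position field used in step iii accepts comma-delimited positions or a range. The sketch below shows one way such input could be expanded and checked against the residue range covered by a chain; it is an illustration only, the function names are hypothetical, and the range 21-334 is the one displayed for chain A of 1TI8 in the viewer.

```python
def parse_positions(text):
    """Expand '195, 235' or '15-30' style input into a sorted list of positions."""
    positions = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            positions.update(range(first, last + 1))
        else:
            positions.add(int(part))
    return sorted(positions)


def positions_in_chain(positions, first, last):
    """Keep only the positions covered by a chain's residue range."""
    return [p for p in positions if first <= p <= last]


requested = parse_positions("195, 235")
print(positions_in_chain(requested, 21, 334))  # [195, 235]
```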


References

Noronha JM, et al. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction. J Virol. 2012 May;86(10):5857-66. doi: 10.1128/JVI.06901-11. PMID:22398283

Pickett BE, et al. Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations Dependent on Differences in Viral Metadata. Virology. 2013;447(1-2):45-51. doi: 10.1016/j.virol.2013.08.021. PMID:24210098

Sorrell EM, et al. Minimal molecular constraints for respiratory droplet transmission of an avian-human H9N2 influenza A virus. Proc Natl Acad Sci U S A. 2009 May 5;106(18):7565-70. doi: 10.1073/pnas.0900877106. PMID:19380727

Srinivasan K, et al. Quantitative description of glycan-receptor binding of influenza A virus H7 hemagglutinin. PLoS One. 2013;8(2):e49597. doi: 10.1371/journal.pone.0049597. PMID:23437033

Yamada S, et al. Haemagglutinin mutations responsible for the binding of H5N1 influenza A viruses to human-type receptors. Nature. 2006 Nov 16;444(7117):378-82. PMID:17108965

Screenshot: Protein Structure Viewer (1TI8)

Description: H7 HAEMAGGLUTININ
Strain Name: A/turkey/Italy/214845/02
PDB Link: 1TI8
Display Type: Space
Highlight Sequence Conservation: overlays the structure with a sequence conservation "heat map". Blue represents conserved regions and red represents non-conserved regions.
Highlight by SWISS-PROT Position: Chain A - Q6GYW3 (21 - 334), positions 195,235. Enter one or more comma-delimited positions (15,30) or a range (15-30), then click Highlight. If no chain is selected, these positions will be highlighted on all chains which include them.
Highlight Sequence Features: Influenza A_H7_SF3, InfluenzaA_H7_experimentally-determined-epitope_340(14), positions 340-353; Influenza A_H7_SF5, positions 226-239.
Save/Restore State: click the Show JMol Console button, click State, and copy the contents of the upper panel into any text document. To restore the image in a later session, paste the text into the lower panel of the console and click Execute.
Generate 3D Movie: generates an mp4/avi movie, starting with the rendering in the viewer window and spinning around the Y axis by 360 degrees. It takes about 30 seconds to generate the movie.