17
Broad Institute – M. Henn Dengue Virus White Paper 3/15/2010 Page 1 of 21 Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease responses A white paper for Microbial Genome Sequencing submitted by: Matthew Henn 1 , Irene Bosch 2 , and Eva Harris 3 in collaboration with the Genomic Resources in Dengue Consortium (GRID) 1 Broad Institute Microbial Genome Sequencing & Analysis, 320 Charles Street, Cambridge, MA 02141 USA tel: 617.324.2341 mailto: [email protected] 2 University of Massachusetts Medical School, Center for Infectious Disease and Vaccine Research, Worcester, MA 01655 USA 3 University of California-Berkeley, School of Public Health, Berkeley, CA 94720 USA Tuesday, June 21, 2005

Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Embed Size (px)

Citation preview

Page 1: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 1 of 21

Comparative Genomics of Dengue Virus: genome population structure,

transmission, and understanding differential inflammatory disease

responses

A white paper for Microbial Genome Sequencing submitted by:

Matthew Henn1, Irene Bosch

2, and Eva Harris

3 in collaboration with the Genomic Resources in Dengue

Consortium (GRID)

1Broad Institute

Microbial Genome Sequencing & Analysis, 320 Charles Street, Cambridge, MA 02141 USA

tel: 617.324.2341 mailto: [email protected]

2University of Massachusetts Medical School, Center for Infectious Disease and Vaccine Research, Worcester, MA 01655 USA

3University of California-Berkeley, School of Public Health, Berkeley, CA 94720 USA

Tuesday, June 21, 2005

Page 2: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 2 of 21

Executive Summary

Dengue virus (DEN), a category-A pathogen, is a significant threat to public health world wide. The

virus is transmitted to humans by the mosquitoes Aedes aegypti and Ae. albopictus. The incidence of

dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS) are rapidly

increasing, and more than 2.5 billion people live in regions endemic for the disease. Presently

approximately 50-100 million cases of DF occur yearly with more than 500,000 resulting in severe and

potentially fatal forms of the disease (DHF & DSS). Several factors contribute to the threat posed by

dengue. Most significant are lack of cross-reactive immunity for the four DEN serotypes (DEN-1, DEN-

2, DEN-3, and DEN-4), hyperendemic circulation of the four different serotypes in the same geographical

area, frequent worldwide travel, high population density, and lack of effective mosquito control programs.

Other socioeconomic factors only amplify the challenge of dengue control.

Despite the threat of this virus to public health and its importance to research on viral hemorrhagic fevers

our understanding of the virulence, pathogenesis, and mechanisms of re-emergence of the dengue virus is

limited. As a consequence, presently no vaccine or anti-viral therapy exists for this flavivirus, and the

genetic underpinnings of disease outcomes are unknown. Studies of nucleotide divergence among he

different serotypes has largely been limited to a single gene. This lack of basic information of viral

diversity severely limits vaccine and anti-viral therapy development efforts. In addition, the presence of

immune pathology in secondary DEN infections indicates that previous non-sterile immune response to

one serotype can exacerbate the immune response to the secondary infections caused by another serotype.

This host-viral interaction also represents a considerable challenge for vaccine development.

We propose to sequence 3500 dengue genomes of distinct geographic origin and disease to build the

genomic infrastructure needed to study and combat this virus. We will define the population structure of

the virus at multiple scales (i.e. within host, local, regional, and continental) which will enable

determination of the impact of introduced strains versus indigenous evolution on disease outcomes.

Additionally, as this project develops, the association of genome sequences to clinical data as well as

human genetic information will provide the first map of genomic distributions with reference to DF,

DHF, and DSS and host immune pressure.

Such a genomic resource will be able to identify important genetic correlates of disease emergence,

virulence, and attenuation in addition to the stability of polymorphisms across the entire genome.

Further, these data will provide the information necessary to develop cost-effective strain diagnostics for

disease tracking and outbreak response.

Page 3: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 3 of 21

I) Dengue Virus a Burgeoning Public Health Threat Worldwide and in the USA

The recent increase in activity of dengue is cause for serious attention to this global vector-borne viral

disease of humans. NIAID has recognized the importance of this RNA virus (family Flaviviridae) and

considers dengue a Category-A organism. More than 2.5 billion people live in areas endemic for the

disease. 50-100 million new dengue infections are estimated to occur yearly worldwide [1, 2] with the

severe forms of the disease, dengue haemorrhagic fever (DHF) and dengue shock syndrome (DSS),

accruing in at least 500,000 of the cases [1, 3]. This is an increase from approximately 100,000 cases per

annum in the 1970s (Figure 1) [3]. Morbidity from DF is also of great public health concern. In severe

epidemics, approximately 5% of DHF cases lead to fatalities [4] and rates as high as 40% have been

reported in endemic regions with poor medical facilities [1]. The prevalence of dengue disease

worldwide is surging, particularly in the Americas, and 2001 epidemiological and media reports indicated

the highest levels of the virus ever recorded [5]. This re-emergence is evident in the United States

Commonwealth of Puerto Rico where all serotypes of the disease are co-circulating [6] and in Hawaii,

which in 2001-2002 experienced its first autochthonous dengue epidemic since 1944 [7]. In addition,

from 2001-2004 377 suspected cases of travel-associated dengue infections were reported in the United

States with CDC confirmed cases occurring in 22 states [8].

Dengue consists of four known serotypes, DEN-1, DEN-2, DEN-3, and DEN-4, that are no longer

geographically isolated (Figure 2). Increased travel between dengue-endemic regions, globalized

markets, increased population densities, and unplanned urbanization particularly in tropical regions have

significantly impacted the epidemiology of dengue infection. Both the virus and the mosquito vector

have spread across the mid-latitudes (Figure 2). Due to the above societal shifts, all of the endemic

regions are now hyperendemic, with multiple DEN viruses co-circulating. This in turn has had a

tremendous impact on the resurgence of DF, DHF, and DSS. The potential of evolutionary mechanisms

leading to new genotypes is likely enhanced by an increased genetic pool. Molecular studies using the

DEN E-protein indicate a dramatic increase in genetic diversity within serotypes at the onset of

widespread human transmission approximately 100 years ago (Figure 3) [9]. Also apparent is the recent

appearance of DEN-1 and DEN-3 (Figure 3) [9]. How such diversity impacts virulence and greater

inflammatory responses remains unanswered. The presence of immune pathology in secondary dengue

infections indicates that previous non-sterile immune response to one serotype could exacerbate the innate

immune response to the secondary infections caused by a different serotype than the first (i.e. Immune

Enhancement Hypothesis). This intricate interaction between the virus and the host impacts disease

outcomes and poses a significant challenge for vaccine development. Tetravalent vaccines with sterile

immunity for all the four types of the virus are required.

Introduction of dengue into the United States mainland is of particular concern. The mosquito Aedes

aegypti is the dominant vector of dengue in the Americas, while Ae. albopictus is a major vector in Asia.

Significantly, the recent epidemic in Hawaii implicated Aedes albopictus as the vector [7], and this

mosquito is now found in at least 24 states on the US mainland [10, 11]. Dengue virus is unique from

related flaviviruses in that it has adapted to humans and can be maintained in an Aedes-human-Aedes

cycle without input from an enzootic cycle [1]. This urbanization of virus transmission increases its

ability to spread beyond its endemic regions as it is not dependent on the forest enzootic life-stage. In

2001, travel between the US mainland and dengue-endemic regions was estimated to be 14 million

passengers [7]. Additionally, the deliberate dispersal of mosquitoes infected with different DEN

serotypes into urban centers could pose a significant threat to public health in multiple regions of the US

where Ae. aegypti and Ae. albopictus are found.

Page 4: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 4 of 21

Figure 1. Reported Cases of Dengue Hemorrhagic Fever from 1955-

1998. Source: World Health Organization

Figure 3. Maximum likelihood phylogeny of four dengue serotypes showing

diversity within serotypes and date (A.D.) of most recent common ancestor

(MRCA). Figure is from Twiddy et al. [9].

Figure 2. Worldwide distribution of Aedes aegypti, dengue virus serotypes, and dengue epidemic activity.

Figure is modified from Gubler [3] Figure 2.

Page 5: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 5 of 21

II) Research Objectives and Goals

Remarkably, for a virus that is considered a Category-A pathogen and that infects ~100 million people

each year, there are only 96 whole genome sequences (10.5 kb) deposited, with the distribution of these

sequences being highly skewed among the serotypes (DEN-1=19, DEN-2=47, DEN-3=26, DEN-4=4).

There is an urgent need for an expansive and collaborative research program that relates viral genomic

sequences with well-defined clinical and epidemiological data as well as genetic information about the

human host. Under this proposal, we will generate the genomic sequence and databases needed to

overcome many present limitations to the dengue research community. Additionally, as a component of

this proposal the Broad Institute, in close collaboration with the greater infectious disease research

community, will develop the necessary framework and infrastructure capable of enabling high-throughput

viral-host interaction studies.

a) Comparative Genomics to Understand Evolution and Transmission of the Virus During an

Epidemic

Using comparative genomics methods we will determine the contributions of mutation, recombination,

genetic drift, gene flow and the natural selection of advantageous mutations to the genetic structure of

dengue virus populations in at least four epidemiologically well-defined settings (Puerto Rico, Venezuela,

Nicaragua, and Vietnam). The immediate availability and density of samples at these sites will enable us

to determine how genetic diversity is changing through time. This will allow us to resolve whether viral

evolution occurs indigenously, or whether new strains are imported from elsewhere, perhaps triggering

major epidemics. Ultimately data from these sites will provide a means to more accurately model how

the virus could respond to the introduction of a vaccine program. The use of viral samples collected

throughout particular epidemics will allow us to test the hypothesis that viruses are selected during the

course of epidemics and are correlated with increased severity [12-14]. This connection will be

investigated by association with the clinical outcome of specific cases as well as with the ratio of severe

to total dengue cases in the population over time (particularly when the epidemic is caused by

predominantly one serotype). Furthermore, by examining the phylogenetic relationships between the viral

sequences sampled (i.e. using coalescent methods) we will estimate the relative population growth rate of

dengue viruses in each geographic region, as well as whether this rate is changing through time and

whether it differs by serotype.

It is important to understand the pattern of genetic change that has occurred in endemic regions. Previous

studies have shown that changes in the DEN population genetic structure are associated in part with strain

introductions from both geographically and socio-economically related populations [15-20]. For

example, throughout the Americas, DEN-2 Asian–American subtype IIIb, linked to DHF/DSS cases,

appears to have displaced the American subtype V, which has only been associated with DF [18-23].

Thus, the first epidemic of DHF/DSS in the Americas in 1981 was attributed to a DEN-2strain imported

from Southeast Asia, where the severe form of the disease had been endemic [19]. Although the onset of

DHF/DSS in the Americas in 1981 was associated with the introduction of a novel strain of DEN-2,

changes in dengue severity have also been associated with hyperendemic transmission patterns (the co-

occurrence of multiple dengue serotypes in the same locality) [24]. There is strong evidence linking

disease severity with secondary heterologous infection [25]. In endogenous regions, secondary infections

have become increasingly more frequent over periods of time in which the virus has mutated.

Complete genome sequences are central to advancing the understanding of DEN population structure and

transmission, yet limited studies have taken into consideration the entire viral genome in such studies

(Appendix A). The DEN population genomic map generated via this proposal will allow the dengue

research community to better understand the population dynamics of the dengue genome during disease

outbreaks and the genetic diversity underlying severe forms of the disease. Given the relative lack of

Page 6: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 6 of 21

knowledge of the genetic mechanisms of pathogenicity and virulence, full genome sequences will provide

the means to identify important genetic motifs beyond the envelope protein and how the evolution of

these elements is associated with disease re-emergence.

b) Comparative Genomics to Develop Strain Diagnostics and Resolve the Contribution of Viral

Determinants to Disease Severity

Some data suggest that hyperendemicity may not be the only factor contributing to the increased disease

severity and that strain virulence may play an important role [26-30]. Some genotypes, or indeed

serotypes, of dengue virus are likely unequivocally more virulent than others. Understanding if and how

viral factors contribute to disease pathogenesis could be very useful in the rational design and targeted

deployment of new anti-virals or prophylactic dengue vaccines. It would also provide valuable

information as to whether the spread of particular strains should be monitored with more attention.

Sequence variation in dengue viruses is probably related to virulence [31]. As mentioned above,

epidemiological studies have suggested that the DEN-2 Asian-American genotype, but not the American

genotype, are capable of causing DHF in secondary infections [19, 32]. These observations led to the

hypothesis that DEN-2 Asian-American genotype viruses were more virulent than DEN-2 American

genotype viruses. Similar observations have been reported with respect to genetically distinct variants of

DEN-3 in Sri Lanka, where strains isolated before 1989 were never associated with severe dengue while

those circulating after 1989 were correlated with the emergence of DHF in Sri Lanka [33]. DEN-4

generally leads to less severe disease than other serotypes, particularly DEN-2 and DEN-3 [34].

Dengue virus enters the body via a mosquito bite and is thought to replicate within a number of

immunological cells including: macrophages, monocytes, B cells, mast, dendritic, and endothelial cells

[35]. A dominant means of viral entry into host cells is thought to occur through the enchanced binding

of viral proteins to Fc! host-cell receptors. Besides the Fc! receptor that is implicated in Antibody

Dependent Enhancement (ADE), knowledge of other DEN receptors is limited. Recently genetic

determinants in the E gene and 5’ and 3’ untranslated regions of the DEN genome were associated with

dengue virulence [25], and it is known that dengue virus reactive T-cells vary in their ability to recognize

different serotypes [25]. Additionally, entry and replication of dengue in dendritic cells can be blocked

by antibodies to the carbohydrate recognition domain of DC-SIGN, a receptor that is utilized with

different efficiencies by different DEN serotypes [36]. All of these findings indicate a multiplicity of

genetic determinants of host infection in addition to immunity responses, and suggests specific host-viral

associations that are dependent of viral genome changes.

What is urgently needed to understand the viral determinants of disease severity is a genome-wide

strategy that identifies polymorphic loci that are associated with distinct infection outcomes (e.g., DF

versus DHF). Presently there is a lack of sufficient viral genomic data with wich to link disease outcome

with viral polymorphisms. Genomic sequence generated under this proposal will directly address this

need. For example, recent studies have shown that the nonstructural regions of dengue virus are involved

in counteracting the antiviral response of the host cell [37, 38]. An appropriately structured genome

database could resolve polymorphisms at these loci and associate this information with the virus’ ability

to block the antiviral response of the host. The ensuing genome sequence reference database we generate

will be invaluable for disease diagnostics and dengue monitoring. A long-term goal of this proposal and

of GRID is the development of high-throughput viral resequencing arrays. As an example, the ability to

quickly type strains using population-specific molecular markers could be applied to track and ultimately

Page 7: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 7 of 21

aid the determination of both the potential threat and the appropriate response to outbreaks or the

malicious release of this Category-A biological agent.

c) Comparative Genomics to Understand In Vivo Diversity and Host–Viral Interaction

RNA viruses exhibit a high degree of sequence variation not just between isolates as addressed above, but

also among viruses within an individual. This in part is due to the viral RNA polymerase and its inability

to proofread. As a result, viral infections within an individual exist as a population of closely related

sequences that are called quasispecies [39-44]. The role of quasispecies in disease pathogenesis,

transmission, and viral evolution is suspected to possibly be important for RNA viruses. Recent data on

poliovirus indicate that intra-host genetic diversity is important for pathogenesis and tropism in

mammalian hosts (Raul Andino, personal communication). A study using the dengue envelope (E) gene

showed that genome-defective viruses exist in vivo [43]. Sequence generated under this proposal will

include a pilot initiative to characterize the extent of viral diveristy within a host. Of particular interest

are the comparisons between individuals with different disease outcomes and multiple time points within

an individual infection (e.g., first acute sample and sample at defervescence). These data will help to

define within-host DEN diversity and allow an assessment of the capacity of present isolation techniques

from serum to capture actual viral diversity. Generation of this data will assist in project design and

sequencing strategy for this project and future viral initiatives.

Dengue is an example of a complex disease; host genetic, viral and environmental factors likely

contribute to infection outcome. Understanding the contribution of host genetics to dengue disease

susceptibility/resistance is made more difficult by the absence of a robust animal model in which to

identify candidate susceptibility/resistance genes. Notwithstanding these limitations, host genetic

variability at the HLA class I loci, Fc! receptor, the vitamin D receptor, and the promoter region of

CD209 have all been associated with distinct symptomatic phenotypes [45-49].

There is an emerging theme in viral research that host immune pressures play a significant role in disease

outcomes and in viral evolution both within individuals (i.e. quasispecies) and at the geographic scale.

For example, research on HIV has revealed that mutations within HIV populations are associated with

with expression of specific HLA class I alleles [50]. In the case of dengue, secondary exposures to the

virus induce a memory response in immunologically primed individuals that can either clear the infection

or contribute to the virus’ pathology (Immune Enhancement Hypothesis, IEH); the genetic underpinnings

of this are unknown, but specific HLA class I alleles have been associated with the different outcomes

[49, 51]. Combined host-viral genomic studies are beginning to reveal that in the setting of highly

polymorphic pathogens (e.g. DEN), immune pressures eventually drive some escape mutations to

fixation. This results in some regions of the virus no longer being immunogenic to individuals expressing

particular HLA molecules [50]. For HIV, data are beginning to reveal population-specific

polymorphisms in HIV, suggesting the loss of some immunogenic regions of the virus in one population

expressing high levels of a particular HLA allele, while this region remains intact in another population

expressing only low levels of the allele (Todd Allen 2005, personal communication). Generation of an

association database that can identify a protective link between host HLA-type or other candidate genes

and dengue viral genomes will facilitate vaccine development for this highly variable pathogen (Figure

3), and aid identifying high risk groups. Such a resource could enable a more focused selection of

specific antigens for inclusion in vaccines aimed at engendering a particular immune response and

avoidance of an immune enhancement response.

While our current efforts are focused on understanding the interaction between human host and dengue,

future efforts are anticipated with the mosquito vector as well to provide a fully integrated genomic view

of the virus’ life-history.

Page 8: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 8 of 21

d) Comparative Genomics Enhanced Vaccine Development and Antiviral Therapy

The development of successful vaccines and antiviral therapies against dengue is dependent on a

comprehensive understanding of the genetic variability across all serotypes and genotypes of the entire

viral genome. The antigenic variation of pathogens has implications in the immune response of the host

through mechanisms of immune evasion, immune attenuation and host mimicry. Presently only

sequence information for the envelope glycoprotein (E) gene exist in substantial number for use in

vaccine attenuation studies. Attenuated strains of all four DEN serotypes exist but a lack of animal

models and in vitro markers of attenuation slow further development of these vaccine candidates [52].

This limitation could be overcome by that ability to search whole genome data for loci polymorphisms

and regions susceptible to attenuation by single base pair or amino acid changes. Additionally, the

association of genomic diversity with virulence and disease outcome will permit the use of structural

genomics to understand conformation changes to disease specific proteins.

III) Strain and Site Selection

Three geographic areas were selected for this proposal -the Caribbean, the Americas, and Southeast Asia-

from which four focus collections/areas were selected: Puerto Rico, Venezuela, Nicaragua, and Vietnam.

These sites were selected as they are existing collections with on going sample collection that can meet

the criteria of our scientific goals. All four serotypes of the virus are circulating at each of the geographic

locations. These sites provide a continuum of sample resolutions that can assess viral population

structure at highly refined scales (i.e., at the resolution of counties in Puerto Rico, Figure 4), country

scales (i.e., Nicaragua and Venezuela), and global scales (i.e., all sites). The Puerto Rico surveillance

system represents a unique model to conduct phylogenetic analysis of dengue virus over an extended

period of time with documented epidemiological and clinical data in a region with very few reported re-

introductions (Appendix C). In addition, the CDC collections in Puerto Rico and the Americas are the

only known collections with a municipality sampling resolution (Figure 4). In contrast, to Puerto Rico

the other sites represent scenarios in which re-introductions have occurred, but also like Puerto Rico have

well documented epidemioligcal and clinical data across multiple epidemics and from within individual

outbreaks of the disease. The Venezuela, Nicaragua, and Vietnam sites also represent case studies with

collections of both RNA and serum samples for quasispecies analysis, as well as existing initiatives in

host-viral studies.

A significant effort was made by the Broad and GRID to identify sample collections and ongoing dengue

sampling efforts that could adequately address our defined scientific goals and generate required cDNAs

in a timely manner. A list was generated of existing sample collections (Appendix B) and each collection

was assessed for resolution of sampling, collection size, collection content, existence of appropriate

clinical data, and compatibility with sampling efforts currently ongoing. From this list, our sampling sites

were selected. One or more members of GRID has access to and have agreed to supply required viral

cDNAs and/or host DNAs for each region. Our selections ensure that all serotypes and known genotypes

are included in the sequencing effort. The selection of these sites will result in a highly structured,

comprehensive DEN genome database

Figure 4. Distribution of DEN-2 and DEN-3 in

Peurto Rico showing resolution of sampling at

the municipality level. The blue shadowing

Page 9: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 9 of 21

While the initial focus of this project is on the collection sites noted, GRID is aware of a growing interest

in this project from other key geographic regions and dengue researchers. We anticipate the potential

inclusion of samples from other geographic regions and countries within selected focus areas. Such a

decision will be made based on how significant genomic viral diversity is within our proposed sampling

sites and the scientific value of broader geographic coverage versus deeper within site strain coverage

based on these findings. Approval of this project is anticipated to facilitate removal of current limitations

on sample access providing the necessary seed to justify additional funding to other granting agencies.

IV) DNA Sequencing, Assembly, and Annotation

• Viral Sequencing Strategy (dengue genome size = 10.7kb)

We anticipate generating 3500 genome sequences derived from a combination of low-passage viral

isolates and serum sample amplifications. The viral sequencing in this proposal will support two distinct

sequencing objectives. First, we will determine the entire sequence of individual viral genomes using

RNA from low-passage isolates as a means to capture viral diversity at the geographic scale and to relate

strain genome structure to disease outcomes. Second, we will determine the consensus genome sequence

of viral populations within an individual host (i.e. quasispecies). These “mixed” genome sequences

represent a sample of the diversity within a host and are required for host-pathogen association studies.

Viral Isolates. Low-passage isolates will yield sequence for individual full-length genomes suitable for

geographic diversity and disease associated studies. These samples are limited by their potential

misrepresentation of the frequency of individual sequence polymorphisms within the host’s

quasispecies complex. Therefore these samples are not suitable for our host-viral interaction

studies.

Serum Samples. Serum sample amplifications represent bulk viral RNA from mixed populations of

genotypes within a host (i.e. quasispecies). To describe this diversity we will employ two different

methods that can both ultimately generate a consensus genome sequence that represents the mixed

population. For these samples we will generate a genome consensus sequence using either a PCR-based

or clone-based approach. These two methods have different sensitivities. The clone-based method will

provide enhanced sensitivity for quasispecies analysis as sequence from strains at low dominance within

the host will be captured. The consensus sequences generated are suitable to accurately identify host-

associated and disease related polymorphisms. Employing the two different sequencing approaches (i.e.

PCR versus clone) to a subset of samples will allow us to determine how well PCR-based consensus

sequence generation represents actual viral diversity within the host.

Strategy. As identification of nucleotide changes are critical to the analysis, high quality sequence is

required to identify real differences rather than sequencing errors. This will be achieved through the

sequencing of overlapping PCR fragments.

PCR-based sequencing: Our sequencing approach is currently based on the utilization of a set of primer

pairs overlapping the entire DEN genome. cDNA will be produced from RNA extracted from low

passage viral isolates or serum samples. RT-PCR from viral isolate samples will be performed using a

standardized high fidelity PCR protocol at each of the four main focus regions and result in

approximately 5-8 overlapping amplicons. These PCR products will be provided to the Broad Institute

MSC for high throughput sequencing. RT-PCR amplification and sequencing in the forward and reverse

Page 10: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 10 of 21

direction of 500bp amplicons generated from the original 5-8 overlapping amplicons will result in an

effective coverage of 8X.

Clone-based sequencing: This approach will also utilize a set of overlapping primer pairs for the entire

DEN genome. Approximately 30 clones will be picked for each PCR fragment (i.e. 30X coverage) and

then sequenced. Genome data will be assembled at the Broad Institute. Genome assembly is an active area of research at

each MSC and improvements in the algorithms are implemented on an ongoing basis. Annotation teams

at the Broad will use all available evidence and follow standard protocols at the Broad MSC for genome

annotation. The Broad will work in collaboration with members of GRID and the greater dengue research

community on genome analysis and the identification of sequence polymorphisms.

• Host Genetic Typing

While the primary focus of this proposal is on the viral genome sequencing, the importance of

understanding the interaction between host and virus is evident. Sequencing during the initial stages of

this project will be directed at the viral genome. As this project develops, we anticipate the typing of

1500 patients from which we have associated viral genome sequences and clinical data. To achieve this

goal and maximum impact of the genetic data on finding tangible disease solutions, the Broad is working

in close collaboration with members of GRID and current collaborators from other related projects (i.e.

HCV and HIV) to develop the appropriate technology and framework to successfully implement such a

program. Presently no high throughput method exists for host genetic typing of HLA and disease loci.

The Broad recognizes the significance of this need and is presently developing a strategy capable of

efficiently and effectively sequencing the numerous variants of the human HLA loci as well as other

relevant disease alleles. As needed, existing collaborations can provide the ability to type host loci.

• Future sequencing goals

The Broad in collaboration with members of GRID is presently exploring the potential of viral

resequencing arrays to provide the dengue research community a means for high throughput viral strain

diagnostics. Resequencing RNA genomes poses an interesting additional technological challenge and

opportunity. First, can RNA be directly hybridized to the DNA arrays or is cDNA required? The

standard gene expression protocol for Affymetrix microarrays calls for a cRNA-DNA hybridization, so

directly resequencing RNA is conceivable, but unproven. Second, the size of typical RNA viral genomes

compared to the high probe density of current microarrays raises the question whether complex variations

besides single nucleotide polymorphisms can be detected with additional probes. For example, all

possible single- and double- base deletions and insertions for the Dengue genome could be represented on

current microarrays. This novel application of Variation Detection Arrays could break new ground in

microarray-based resequencing, but demands new algorithm development.

V) Genome Resources in Dengue Consortium (GRID)

The process for producing this white paper was carried out in a highly collaborative fashion. Over the

past several months, many scientists from the dengue research community with a vital interest in these

data have worked in partnership with the Broad Institute to establish common scientific goals and define

the appropriate genomic infrastructure needed to support them. GRID is the result of this process, and

consists of:

Page 11: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 11 of 21

1)Angel Balmaseda, Department of Virology, National Diagnostics and Reference Center, Ministry of

Health, Managua, Nicaragua

2) Bruce Birren, Broad Institute, Cambridge, MA

3) Irene Bosch, University of. Massachusetts Medical School, Center for Infectious Disease and Vaccine

Research, Worcester, MA

4) Eva Harris, Division of Infectious Diseases, School of Public Health, University of California -

Berkeley, , Berkeley, CA

5) Matthew Henn, Broad Institute, Cambridge, MA

6) Rebeca Rico-Hesse, Dept. Virology & Immunology, Southwest Foundation for Biomedical Research,

San Antonio, TX

7) Jeremy Farrar, Oxford University Clinical Research Unit, Hospital for Tropical Disease, HCMC,

Vietnam

8) Jorge Munoz-Jordán, Centers for Disease Control and Prevention, Division of Vector-Borne Infectious

Disease, San Juan, PR

9) David Kulp, University of Massachusetts, Dept. of Computer Science, Amherst, MA

10) Cameron Simmons, Oxford University Clinical Research Unit, Hospital for Tropical Disease, HCMC,

Vietnam

11) Steve Whitehead, Laboratory of Infectious Disease, NIAID Vaccine Branch, Bethesda, MD

12) Priscilla Yang, Harvard Medical School, Dept. of Microbiology, Boston, MA

GRID sought input and attained consensus concerning sequencing targets through many meetings at the

Broad with various participants, several conference calls with most members of GRID, and numerous e-

mail and telephone exchanges. The Broad has made a significant investment over the past six months in

developing and coordinating this project. A critical component of this effort has been the sharing of

information and data pertaining to goal development and sequencing target selection. In addition, great

weight was given to practical matters in the structure of both the sequencing and analysis components of

the proposed work. The group has worked together to standardize protocols for viral isolation and cDNA

production between the various project sites. Importantly, many consortium members have active

collaborations in our designated focus areas as well as have secure access to the viral collections and

sampling clinics necessary for the acquisition of genomic cDNA for all anticipated sequencing targets.

VI) Management of the Dengue Genome Project by the Research Community

1) Broad Institute (Matthew Henn) 20%

2) Irene Bosch 5%

3) Eva Harris 5%

Both Dr. Bosch and Dr. Harris have been key contacts for the Broad Institute during the development of

this white paper. They both maintain extensive dengue research programs at US universities, and have

well-established collaborations with dengue clinics in the Americas and Asia.

VII) Data Release

In accordance with the NIAID’s principles regarding data release, we will publicly release all data

generated under this contract as rapidly as possible. As required by our contract, NIAID will be provided

with a 21-45 calendar-day period to review and comment upon all data prior to its public release.

Page 12: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 12 of 21

Chromatogram Files: Unless otherwise directed by NIAID, we will submit all sequences and trace files

(chromatograms) generated under this proposal to the Trace Archive at NCBI on a no less than weekly

basis. These data will also include information on templates, vectors, and quality values for each

sequence.

Genome Assemblies: Genome assemblies will be made available via GenBank and the MSCs’ websites,

after internal and community validation. Assuming no significant errors are detected during the

validation process, assemblies will be released within 45 calendar days of being generated.

Genome Annotation: Automated annotation data will be made available via GenBank and our web sites

after internal and community validation. Assuming no significant errors are detected during the

validation process, annotation data will be released within 45 calendar days of being generated.

Page 13: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 13 of 21

VIII) References

1. Gubler, D.J., The changing epidemiology of yellow fever and dengue, 1900 to 2003: full

circle? Comp Immunol Microbiol Infect Dis, 2004. 27(5): p. 319-30.

2. World Helath Organization, Stengthening implementation of the global strategy for

DF/DHF prevention and control. 1999. p. http://www.who.int/vaccines-

disease/research/virus2.shtml.

3. Gubler, D.J., Epidemic dengue/dengue hemorrhagic fever as a public health, social and

economic problem in the 21st century. Trends Microbiol, 2002. 10(2): p. 100-3.

4. World Helath Organization, Vaccines, immunization, and biologicals: dengue and

Japanese encephalitis vaccines. 2001. p. http://www.who.int/vaccines-

diseases/research/virus2.htm.

5. Halstead, S.B., Dengue. Current Opinion In Infectious Diseases, 2002. 15(5): p. 471-476.

6. Rigau-Perez, J.G., et al., The reappearance of dengue-3 and a subsequent dengue-4 and

dengue-1 epidemic in Puerto Rico in 1998. Am J Trop Med Hyg, 2002. 67(4): p. 355-62.

7. Effler, P., et al., Dengue Fever, Hawaii, 2001-2002. Emerging Infectious Disease, 2005.

11(5): p. 742-749.

8. Beatty, M.E., et al., Travel-Associated Dengue Infections - United States, 2001–2004.

Morbidity and Mortality Weekly Report, 2005. 54(22): p. 555-558.

9. Twiddy, S.S., E.C. Holmes, and A. Rambaut, Inferring the rate and time-scale of dengue

virus evolution. Molecular Biology And Evolution, 2003. 20(1): p. 122-129.

10. Moore, C.G., Aedes albopictus in the United States: Current status and prospects for

further spread. Journal Of The American Mosquito Control Association, 1999. 15(2): p.

221-227.

11. Moore, C.G. and C.J. Mitchell, Aedes albopictus in the United States: Ten-year presence

and public health implications. Emerging Infectious Diseases, 1997. 3(3): p. 329-334.

12. Kouri, G.P., M.G. Guzman, and J.R. Bravo, Why dengue haemorrhagic fever in Cuba? 2.

An integral analysis. Trans R Soc Trop Med Hyg, 1987. 81(5): p. 821-3.

13. Rodriguez-Roche, R., et al., Virus evolution during a severe dengue epidemic in Cuba,

1997. Virology, 2005. 334(2): p. 154-9.

14. Guzman, M.G., G. Kouri, and S.B. Halstead, Do escape mutants explain rapid increases

in dengue case-fatality rates within epidemics? Lancet, 2000. 355(9218): p. 1902-3.

15. Fong, M.Y., C.L. Koh, and S.K. Lam, Molecular epidemiology of Malaysian dengue 2

viruses isolated over twenty-five years (1968-1993). Res Virol, 1998. 149(6): p. 457-64.

16. Foster, J.E., et al., Molecular evolution and phylogeny of dengue type 4 virus in the

Caribbean. Virology, 2003. 306(1): p. 126-34.

17. Lewis, J.A., et al., Phylogenetic relationships of dengue-2 viruses. Virology, 1993.

197(1): p. 216-24.

18. Nogueira, R.M., M.P. Miagostovich, and H.G. Schatzmayr, Molecular epidemiology of

dengue viruses in Brazil. Cad Saude Publica, 2000. 16(1): p. 205-11.

19. Rico-Hesse, R., et al., Origins of dengue type 2 viruses associated with increased

pathogenicity in the Americas. Virology, 1997. 230(2): p. 244-51.

20. Uzcategui, N.Y., et al., Molecular epidemiology of dengue type 2 virus in Venezuela:

evidence for in situ virus evolution and recombination. J Gen Virol, 2001. 82(Pt 12): p.

2945-53.

21. Halstead, S.B., et al., Haiti: absence of dengue hemorrhagic fever despite hyperendemic

dengue virus transmission. Am J Trop Med Hyg, 2001. 65(3): p. 180-3.

Page 14: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 14 of 21

22. Ruiz, B.H., et al., Phylogenetic comparison of the DEN-2 Mexican isolate with other

flaviviruses. Intervirology, 2000. 43(1): p. 48-54.

23. Tolou, H., et al., Complete genomic sequence of a dengue type 2 virus from the French

West Indies. Biochem Biophys Res Commun, 2000. 277(1): p. 89-92.

24. Gubler, D.J., The global pandemic of dengue/dengue haemorrhagic fever: current status

and prospects for the future. Ann Acad Med Singapore, 1998. 27(2): p. 227-34.

25. Rothman, A.L., Immunology and immunopathogenesis of dengue disease. Adv Virus Res,

2003. 60: p. 397-419.

26. Cologna, R. and R. Rico-Hesse, American genotype structures decrease dengue virus

output from human monocytes and dendritic cells. J Virol, 2003. 77(7): p. 3929-38.

27. Endy, T.P., et al., Spatial and temporal circulation of dengue virus serotypes: a

prospective study of primary school children in Kamphaeng Phet, Thailand. Am J

Epidemiol, 2002. 156(1): p. 52-9.

28. Rico-Hesse, R., Molecular evolution and distribution of dengue viruses type 1 and 2 in

nature. Virology, 1990. 174(2): p. 479-93.

29. Rico-Hesse, R., Microevolution and virulence of dengue viruses. Adv Virus Res, 2003.

59: p. 315-41.

30. Vaughn, D.W., Invited commentary: Dengue lessons from Cuba. Am J Epidemiol, 2000.

152(9): p. 800-3.

31. Leitmeyer, K.C., et al., Dengue virus structural differences that correlate with

pathogenesis. J Virol, 1999. 73(6): p. 4738-47.

32. Watts, D.M., et al., Failure of secondary infection with American genotype dengue 2 to

cause dengue haemorrhagic fever. Lancet, 1999. 354(9188): p. 1431-4.

33. Messer, W.B., et al., Emergence and global spread of a dengue serotype 3, subtype III

virus. Emerg Infect Dis, 2003. 9(7): p. 800-9.

34. Nisalak, A., et al., Serotype-specific dengue virus circulation and dengue disease in

Bangkok, Thailand from 1973 to 1999. Am J Trop Med Hyg, 2003. 68(2): p. 191-202.

35. Malavige, G.N., et al., Dengue viral infections. Postgraduate Medical Journal, 2004.

80(948): p. 588-601.

36. Altmeyer, R., Virus attachment and entry offer numerous targets for antiviral therapy.

Curr Pharm Des, 2004. 10(30): p. 3701-12.

37. Liu, W.J., et al., Inhibition of interferon signaling by the New York 99 strain and Kunjin

subtype of West Nile virus involves blockage of STAT1 and STAT2 activation by

nonstructural proteins. J Virol, 2005. 79(3): p. 1934-42.

38. Munoz-Jordan, J.L., et al., Inhibition of interferon signaling by dengue virus. Proc Natl

Acad Sci U S A, 2003. 100(24): p. 14333-8.

39. Holland, J.J., J.C. De La Torre, and D.A. Steinhauer, RNA virus populations as

quasispecies. Curr Top Microbiol Immunol, 1992. 176: p. 1-20.

40. Lin, S.R., et al., Study of sequence variation of dengue type 3 virus in naturally infected

mosquitoes and human hosts: implications for transmission and evolution. J Virol, 2004.

78(22): p. 12717-21.

41. Lu, M., et al., Analysis of hepatitis C virus quasispecies populations by temperature

gradient gel electrophoresis. J Gen Virol, 1995. 76 (Pt 4): p. 881-7.

42. Martell, M., et al., Hepatitis C virus (HCV) circulates as a population of different but

closely related genomes: quasispecies nature of HCV genome distribution. J Virol, 1992.

66(5): p. 3225-9.

Page 15: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 15 of 21

43. Wang, W.K., et al., Dengue type 3 virus in plasma is a population of closely related

genomes: quasispecies. J Virol, 2002. 76(9): p. 4662-5.

44. Zhu, T., et al., Genotypic and phenotypic characterization of HIV-1 patients with primary

infection. Science, 1993. 261(5125): p. 1179-81.

45. Loke, H., et al., Strong HLA class I--restricted T cell responses in dengue hemorrhagic

fever: a double-edged sword? J Infect Dis, 2001. 184(11): p. 1369-73.

46. Loke, H., et al., Susceptibility to dengue hemorrhagic fever in vietnam: evidence of an

association with variation in the vitamin d receptor and Fc gamma receptor IIa genes.

Am J Trop Med Hyg, 2002. 67(1): p. 102-6.

47. Stephens, H.A., et al., HLA-A and -B allele associations with secondary dengue virus

infections correlate with disease severity and the infecting viral serotype in ethnic Thais.

Tissue Antigens, 2002. 60(4): p. 309-18.

48. Sakuntabhai, A., et al., A variant in the CD209 promoter is associated with severity of

dengue disease. Nat Genet, 2005. 37(5): p. 507-13.

49. Waganaar, J.F.P., A.T.A. Mairuhu, and E.C. van Gorp, Genetic influences on dengue

virus infections. WHO Dengue Bulletin, 2004. 28: p. 126-134.

50. Moore, C.B., et al., Evidence of HIV-1 adaptation to HLA-restricted immune responses at

a population level. Science, 2002. 296(5572): p. 1439-43.

51. Stephens, H.A., et al., HLA-A and -B allele associations with secondary dengue virus

infections correlate with disease severity and the infecting viral serotype in ethnic Thais.

Tissue Antigens, 2002. 60(4): p. 309-18.

52. Rothman, A.L., Dengue: defining protective versus pathologic immunity. J Clin Invest,

2004. 113(7): p. 946-51.

53. Klungthong, C., et al., The molecular epidemiology of dengue virus serotype 4 in

Bangkok, Thailand. Virology, 2004. 329(1): p. 168-79.

54. Bennett, S.N., et al., Selection-driven evolution of emergent dengue virus. Mol Biol Evol,

2003. 20(10): p. 1650-8.

55. Moncayo, A.C., et al., Dengue emergence and adaptation to peridomestic mosquitoes.

Emerg Infect Dis, 2004. 10(10): p. 1790-6.

56. Wang, E., et al., Evolutionary relationships of endemic/epidemic and sylvatic dengue

viruses. J Virol, 2000. 74(7): p. 3227-34.

57. Uzcategui, N.Y., et al., Molecular epidemiology of dengue virus type 3 in Venezuela. J

Gen Virol, 2003. 84(Pt 6): p. 1569-75.

58. Cologna, R., P.M. Armstrong, and R. Rico-Hesse, Selection for virulent dengue viruses

occurs in humans and mosquitoes. J Virol, 2005. 79(2): p. 853-9.

59. Laille, M. and C. Roche, Comparison of dengue-1 virus envelope glycoprotein gene

sequences from French Polynesia. Am J Trop Med Hyg, 2004. 71(4): p. 478-84.

60. Shurtleff, A.C., et al., Genetic variation in the 3' non-coding region of dengue viruses.

Virology, 2001. 281(1): p. 75-87.

61. Aviles, G., et al., Complete coding sequences of dengue-1 viruses from Paraguay and

Argentina. Virus Res, 2003. 98(1): p. 75-82.

62. Baleotti, F.G., M.L. Moreli, and L.T. Figueiredo, Brazilian Flavivirus phylogeny based

on NS5. Mem Inst Oswaldo Cruz, 2003. 98(3): p. 379-82.

Page 16: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 16 of 21

Appendix A

List of published evolutionary studies of dengue virus showing only two studies that used the complete

viral genome

Serotype

Location

Dengue serotype/No. of

clones sequenced

Target sequence Hypothesis Reference

Taiwan DEN-3. 88 from human

and 50 from mosquitoes

C and E Higher nucleotide

substitutions in

human derived

virus. Host-derived

positive selection

[40]

Americas DEN-2. 62 Isolates total

with 55 DEN-2.

E/NS1 junction No differences

between DF and

DHF. No

association with

Pathogenesis

[19]

Thailand DEN-4 Three genotypes-

SEA (I), SEA-America

(II), Sylvatic (III). 53

Isolates from 27-year

period

E

6 Thai strains

compared with 58

complete coding

regions

No geographical

segregation of the

three genotypes.

Immune driven

positive selection

[53]

Puerto Rico

USA

DEN-4. 82 isolates 40% of the genome

(4,000 bases of C,

PrM, E, NS1, NS2A,

NS4B

Temporal

clustering.

Antigenic variation

of NS2A.

Synonymous

changes

outnumbering non-

shynonymous

changes

[54]

West Africa,

SEA, Oceania

DEN-2 Hypothetic aa

replacements in E by

Wang et al.

Peridomestic

mosquitoes have

higher

dissemination rates

of endemic viruses.

Host-derived

positive selection

[55]

Malaysia and

Thailand

DEN-1, 2, and 4 E Sylvatic genetic

divergence

[56]

Venezuela DEN-3. 15 Isolates E Introduction of

Genotype III no

evidence of re-

emergence of type

V from 1960.

Importation of

dengue strains from

Asia.

[57]

SEA, West

Africa, America

DEN-2. 24 Isolates E Out-competing

Asia vs American

DEN2 strains in

mosquito. Selection

pressure evolution

[58]

Cuba DEN-2. 20 isolates and 6

for complete genome

E, complete genome No a.a. difference

in E.

Evolution during

epidemic in the NS

genes.

[13]

Venezuela DEN-2. 34 isolates E, PrM, NS1 In situ evolution of

dengue 2 within the

[20]

Page 17: Comparative Genomics of Dengue Virus: genome … · Comparative Genomics of Dengue Virus: genome population structure, transmission, and understanding differential inflammatory disease

Broad Institute – M. Henn Dengue Virus White Paper

3/15/2010 Page 17 of 21

same geographical

region

French

Polynesia

DEN-1. Genotype V and

IV

E Introduction of

DEN1 SEA

[59]

SEA, West

Africa,

Americas,

sylvatic

DEN-1, 2, 3, 4 3’ NTR DEN viruses arose

from sylvatic

progenitors and

evolved into human

epidemic strains.

Variation in the

3'NCR does not

correlate with DEN

virus pathogenesis

[60]

Argentina,

Paraguay

DEN-1, Five isolates Whole genome 3- a.a. in NS4A.

Co-existance of two

genotypes of the

five known.

[61]

Brazil DEN and encephalitic

and hemorrhagic in 15

strains of flaviviruses

NS5 Selection pressure

evolution

[62]

Thailand &

Venezuela

DEN-2, 11 isolates Whole genome Correlated

structural

differences to

pathogenesis

[31]