
Acknowledgements

This study was enabled by the National Institute of Allergy and Infectious Diseases (U19-AI062627 and NO1-A150032). This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2017-17072100002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

Visualizing Disease Seasonality: Comparisons Across Space and Time

An Example of Salmonellosis in the United States (1996-2017)

Ryan Simpson, Bingjie Zhou, Elena N. Naumova

Recommendations for FoodNet Fast Portal

1.  Application of consistent methodologies for evaluating seasonal characteristics permits direct comparisons of amplitude and peak timing

2.  By performing annualized seasonality analysis, one can examine trends in amplitude and peak timing

3.  The use of multi-panel plots with shared axes allows for in-depth understanding of the relationship between peak timing and amplitude across time, diseases, and locations

Objectives

We aim to clarify and emphasize the essential features of a temporal process, such as trend, periodicity, or irregular spikes and troughs. With the improved ability to capture, store, and manage big data in near-real time, systematic and effective data visualization techniques are needed to display and interpret large amounts of data in a timely and cost-effective manner. Time series plots are the most commonly used graphical tools for showing the underlying structure of time series data, with units of time on one axis (commonly horizontal) and the studied quantity on the other (commonly vertical). However, the choices for visualizing time-referenced data could be substantially expanded, particularly through multi-panel plots with shared axes. We propose using such plots to illustrate trend and seasonality with publicly available CDC data.

Future Directions

Having established a standardized statistical approach backed by clearly communicative visualizations, we next aim to develop an atlas of foodborne infections that summarizes research findings. This tool will enable clear comparisons of trend and seasonality characteristics across the same data levels used in this research (e.g., diseases, geographic locations, and years), as well as for specific subpopulations whose data are also available on FoodNet (e.g., age, gender, and race). Such an analysis will make it easier to compare across infection case definitions, improving national surveillance. Additionally, this tool can be used to explore potential causal links between seasonal infection outbreaks and the seasonal drivers of disease incidence.

Rationale: How Can Trend and Seasonality Visualizations Be Improved?

CONS

•  Not capturing within-year variation, which is essential for diseases with strong seasonality

•  Annual averages can mask changes in peak timing and amplitude

Figure S1. Graphical displays of trend (Panel A) and seasonality (Panel B) for Salmonellosis from the publicly available FoodNet Fast data portal.

We estimated the seasonal characteristics of peak timing and amplitude, along with their confidence intervals, for each year of the study using the δ-method applied to annualized negative binomial harmonic regression (ANBHR) models [2, 3] (see Figure S2). We have applied these methods in similar work analyzing geographic and temporal variations in salmonellosis [5] and shifts in the seasonality of legionellosis [4] in the United States. We used time series plots (Figure 1), seasonal signature plots (Figure 2A), box plots (Figure 2B), radar plots (Figure 2C), forest plots (Figures 3A and 3B), and their multi-panel compilations (Figure 4).
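For one year's fitted coefficients βs and βc, the seasonal term βs sin(2πωt) + βc cos(2πωt) can be rewritten as A cos(2πωt − φ), with amplitude A = √(βs² + βc²) and phase φ = atan2(βs, βc), so the curve peaks at t* = φ/(2πω), i.e. 6φ/π months when ω = 1/12. The δ-method then approximates each standard error from the gradient of these functions and the 2×2 covariance of (βs, βc). The sketch below illustrates this under stated assumptions and is not the authors' code: `seasonal_characteristics` is a hypothetical name, the amplitude is on the log scale (the poster reports amplitude in cases per 100,000, and that conversion is not reproduced here), peak timing is expressed on the same month index t used in the model and wrapped to [0, 12), and the intervals are symmetric Wald intervals.

```python
import math


def seasonal_characteristics(beta_s, beta_c, cov, z=1.96):
    """Amplitude and peak timing for one year's ANBHR fit, with delta-method SEs.

    beta_s, beta_c: coefficients of sin(2*pi*t/12) and cos(2*pi*t/12).
    cov: 2x2 covariance [[var_s, cov_sc], [cov_sc, var_c]] of (beta_s, beta_c).
    """
    r2 = beta_s ** 2 + beta_c ** 2
    if r2 == 0:
        raise ValueError("beta_s and beta_c are both zero: no seasonal peak")
    amp = math.sqrt(r2)                    # amplitude of the log-scale seasonal curve
    phase = math.atan2(beta_s, beta_c)     # phi in (-pi, pi]
    peak = (6.0 / math.pi) * phase % 12.0  # peak on the model's month index, in [0, 12)

    # Gradients of amp and peak with respect to (beta_s, beta_c).
    grad_amp = (beta_s / amp, beta_c / amp)
    grad_peak = ((6.0 / math.pi) * beta_c / r2, -(6.0 / math.pi) * beta_s / r2)

    def delta_se(g):
        # Var(g(beta)) ~= grad' * Cov * grad
        var = g[0] ** 2 * cov[0][0] + 2 * g[0] * g[1] * cov[0][1] + g[1] ** 2 * cov[1][1]
        return math.sqrt(var)

    se_amp, se_peak = delta_se(grad_amp), delta_se(grad_peak)
    return {
        "amplitude": amp,
        "se_amplitude": se_amp,
        "amplitude_ci": (amp - z * se_amp, amp + z * se_amp),
        "peak_month": peak,
        "se_peak": se_peak,
        # Wald interval; near the 0/12 boundary it can extend past the wrap point.
        "peak_ci": (peak - z * se_peak, peak + z * se_peak),
    }
```

For example, βs = 1 and βc = 0 place the peak a quarter cycle after the time origin, at month index 3.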

Figure 3. A multi-panel graph displaying relationships between peak timing and amplitude for each study year (22 in total), shown as national estimates for Salmonellosis from 1996 to 2017. Peak timing estimates are displayed in the top panel and amplitude estimates in the bottom panel.

Figure 4. A multi-panel graph displaying relationships between peak timing and amplitude for national estimates of Salmonellosis from 1996 to 2017. The top-left panel displays a forest plot of annual peak timing estimates (years on the vertical axis), while the bottom-right panel displays a forest plot of annual amplitude estimates (years on the horizontal axis). The bottom-left panel provides a scatterplot on the shared peak timing and amplitude axes (with 95% CIs) to evaluate possible trends related to peak timing shifts. Dashed red lines indicate the average peak timing and amplitude estimates for the full time series.

[Figure: Relationship Between Annual Peak Timing and Amplitude Estimates Using Shared Axes. Axis labels: Year of Study; Amplitude (Cases per 100,000); Peak Timing (Month)]
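The dashed reference lines in Figure 4 are across-year averages, and the shared-axis layout is meant to reveal trends in the annual estimates. The poster does not state how such trends were assessed, so the sketch below is only one simple descriptive check, not the authors' method: it returns the mean peak timing and mean amplitude plus an unweighted ordinary least-squares slope of each annual estimate against calendar year. `annual_summary` is a hypothetical name. Two assumptions are worth flagging: an arithmetic mean of peak timing is only sensible when the annual peaks do not straddle the December/January wrap, and an unweighted slope ignores the width of each year's confidence interval (inverse-variance weights would be a natural alternative).

```python
def annual_summary(years, peaks, amplitudes):
    """Reference lines and simple linear trends for annual peak timing and amplitude.

    Returns the across-year means (the dashed reference lines) and unweighted OLS
    slopes of each annual estimate against calendar year.
    """
    if not (len(years) == len(peaks) == len(amplitudes)):
        raise ValueError("years, peaks, and amplitudes must have equal length")

    def mean(values):
        return sum(values) / len(values)

    def ols_slope(x, y):
        mx, my = mean(x), mean(y)
        sxx = sum((a - mx) ** 2 for a in x)
        if sxx == 0:
            raise ValueError("need at least two distinct years")
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

    return {
        "mean_peak": mean(peaks),  # arithmetic mean: assumes no Dec/Jan wrap
        "mean_amplitude": mean(amplitudes),
        "peak_slope_per_year": ols_slope(years, peaks),
        "amplitude_slope_per_year": ols_slope(years, amplitudes),
    }
```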

Figure 2. Seasonality of Salmonellosis in the United States from 1996 to 2017. The two left panels overlay annualized time series by study year (top) and show annualized descriptive statistics across all years (bottom). Both left panels share a common month-of-year axis, with median values shown in red. Astronomical seasons are shaded in the background: blue (Winter Solstice), green (Spring Equinox), yellow (Summer Solstice), and orange (Fall Equinox). The right panel displays the same information as a radar plot in polar coordinates (the rotational angle indicates time of year, and the radial distance indicates case counts).
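All panels of Figure 2 are built from the same monthly series regrouped by calendar month. A minimal sketch of that regrouping, assuming the series starts in January and has no missing months: `by_year` gives the per-year rows overlaid in the seasonal signature plot, `monthly_profile` gives the medians and quartiles across years used for box plots, and `to_radar_xy` maps a (month, value) pair to Cartesian coordinates for a radar plot. All three names are hypothetical, placing January at angle 0 with counterclockwise progression is an arbitrary layout choice, and the quartile method (Python's `statistics.quantiles` with `method="inclusive"`) may differ from the one used by the authors' plotting software.

```python
import math
import statistics


def by_year(series):
    """Split a January-start monthly series into 12-month rows (one per study year)."""
    if len(series) % 12:
        raise ValueError("series length must be a multiple of 12")
    return [series[i:i + 12] for i in range(0, len(series), 12)]


def monthly_profile(series):
    """Per-calendar-month median and quartiles across years (needs >= 2 years)."""
    years = by_year(series)
    profile = {}
    for m in range(12):
        values = [row[m] for row in years]  # same calendar month, every year
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        profile[m + 1] = {"q1": q1, "median": statistics.median(values), "q3": q3}
    return profile


def to_radar_xy(month, value):
    """Cartesian position for a radar plot: angle set by calendar month, radius = value.

    January sits at angle 0 and months advance counterclockwise (a layout assumption).
    """
    theta = 2 * math.pi * (month - 1) / 12
    return value * math.cos(theta), value * math.sin(theta)
```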

Figure 1. Trend of Salmonellosis in the United States from 1996 to 2017. The right panel shows the full 264-month time series of Salmonellosis cases, with the median monthly count (447 infections) marked by the red line and a combined linear, quadratic, and cubic trend marked by the dashed line (with 95% CI). The solid blue line shows infections, with lighter shades indicating earlier years and darker shades indicating later years. The histogram sharing the vertical axis shows the number of months at each infection level, with a bin size of ~50 infections. The trend estimate is not adjusted for growth in the surveillance population from the incremental inclusion of surveyed counties over time.
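The dashed trend line in Figure 1 combines linear, quadratic, and cubic terms, i.e. a cubic polynomial in time. The sketch below fits such a polynomial by ordinary least squares in pure Python; that the authors used OLS on raw counts is an assumption, and the 95% confidence band is not computed here. Time is rescaled to s = t/n so the normal equations are better conditioned than with raw month indices up to 264; this changes the scale of the coefficients but not the fitted curve. The function also returns the series median (the red reference line). `cubic_trend` and `_solve` are hypothetical names.

```python
import statistics


def _solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cubic_trend(series):
    """OLS fit of y = a0 + a1*s + a2*s^2 + a3*s^3 with s = t/n, t = 1..n.

    Returns (coefficients on the rescaled time, fitted values, series median).
    """
    n = len(series)
    X = [[1.0, s, s ** 2, s ** 3] for s in (t / n for t in range(1, n + 1))]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(X, series)) for i in range(4)]
    coef = _solve(xtx, xty)  # normal equations X'X a = X'y
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in X]
    return coef, fitted, statistics.median(series)
```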

Panel B. Annualized Negative Binomial Harmonic Regression (ANBHR) model for monthly disease counts, $Y_t$:

$$\log[E(Y_t)] = \beta_0 + \beta_L(\text{Seasonality}) + \varepsilon_t,$$

where

$$\beta_L(\text{Seasonality}) = \beta_s \sin(2\pi\omega t) + \beta_c \cos(2\pi\omega t),$$

with a frequency of $\omega = 1/12$.

[Figure panel labels: Amplitude; Peak Timing; Cases]
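The ANBHR model above can be fit to one year of monthly counts by iteratively reweighted least squares (IRLS). The sketch below is a pure-Python illustration, not the authors' code. It assumes months are indexed t = 1, ..., 12 within each year and treats the negative binomial (NB2) dispersion `alpha` as known, a hypothetical input that a full analysis would estimate with a statistical package. It fits the usual GLM form, log E[Y_t] = β0 + βs sin(2πωt) + βc cos(2πωt), with randomness entering through the negative binomial distribution of Y_t. The returned covariance is (X'WX)⁻¹ at the final estimates; its (βs, βc) block is what a δ-method calculation of peak timing and amplitude takes as input. `fit_annual_nb_harmonic` and `_solve` are hypothetical names.

```python
import math


def _solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_annual_nb_harmonic(counts, alpha, omega=1 / 12, max_iter=100, tol=1e-10):
    """Fit log E[Y_t] = b0 + bs*sin(2*pi*omega*t) + bc*cos(2*pi*omega*t) by IRLS.

    counts: one year of monthly counts, month index t = 1, ..., len(counts).
    alpha:  NB2 dispersion, assumed known: Var(Y) = mu + alpha * mu**2.
    Returns ([b0, bs, bc], 3x3 covariance matrix of the estimates).
    """
    X = [[1.0, math.sin(2 * math.pi * omega * t), math.cos(2 * math.pi * omega * t)]
         for t in range(1, len(counts) + 1)]

    def weighted_system(beta):
        # Accumulate X'WX and X'Wz for the current coefficients.
        xtwx = [[0.0] * 3 for _ in range(3)]
        xtwz = [0.0] * 3
        for row, y in zip(X, counts):
            eta = sum(b * x for b, x in zip(beta, row))
            mu = math.exp(eta)
            w = mu / (1.0 + alpha * mu)  # IRLS weight for NB2 with a log link
            z = eta + (y - mu) / mu      # working response
            for i in range(3):
                xtwz[i] += w * row[i] * z
                for j in range(3):
                    xtwx[i][j] += w * row[i] * row[j]
        return xtwx, xtwz

    beta = [math.log(sum(counts) / len(counts)), 0.0, 0.0]  # start at the annual mean
    for _ in range(max_iter):
        new_beta = _solve(*weighted_system(beta))
        done = max(abs(new - old) for new, old in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break

    # Large-sample covariance: inverse of X'WX at the final estimates.
    info, _ = weighted_system(beta)
    cols = [_solve(info, [1.0 if i == j else 0.0 for i in range(3)]) for j in range(3)]
    cov = [[cols[j][i] for j in range(3)] for i in range(3)]
    return beta, cov
```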

Methods: Calculating Seasonality Characteristics

Multi-Panel Plots With Shared Axes Allow For:

• More veracity, or clearer meaning and quality of data visualizations. This comes from more easily visualizing trends in incidence over time, intra- and inter-seasonal fluctuations in these trends, or relationships between seasonality characteristics such as peak timing and amplitude.

• Improved velocity for generating new insights. This can be achieved by using shared-axis, multi-panel plots in formal reporting, a process that can be standardized by expanding the capabilities of visualization software platforms and packages.

• Increased data volume for visualization. This includes displaying more observations within a plot, such as a wider range of annual estimates or greater temporal resolution (e.g., days or weeks).

• Greater variety of the type and form of information displayed. This includes evaluating multiple diseases in the same geographic location, multiple locations for the same disease, or multiple seasonality characteristics for the same disease-location dyad.

Disadvantages of Multi-Panel, Shared Axis Plots

• Improvements in specificity can reduce usability. The more the user refines the panels to construct complex data visualizations, the more difficult these plots become to modify.

• There is such a thing as too much information! In particular, this occurs when conclusions are drawn between observations or seasonality characteristics without a clear understanding of their relationship.

• Possibility for clutter. As the dimensionality of the plot increases, inter-group differences become harder to see if groups are relatively similar.

• Standardized units of analysis. Shared axes apply the same scale of measurement to all groups, making this plot type unsuitable for very heterogeneous data.

PROS

•  Simple capturing of a general trend by standardizing counts as incidence rates

•  Simple capturing of average annual seasonality, expressed as the contribution of individual months

Data Source

Data were retrieved from the publicly available CDC FoodNet Fast database [1]. We abstracted annual incidence and monthly percentages of confirmed infections for salmonellosis in the United States from 1996 to 2017. To recover monthly confirmed infections, we multiplied the annual incidence for each year by that year's monthly percentages of confirmed infections.
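The recovery step described above is one multiplication per month. A minimal sketch, assuming the portal's monthly percentages are on a 0-100 scale and sum to roughly 100 after rounding (the 0.5-point tolerance is an assumption); the output is in whatever units the annual value is in. The function names are hypothetical, and the numbers in the check are illustrative, not FoodNet data.

```python
def monthly_from_annual(annual_value, monthly_percent, tol=0.5):
    """Split one year's total into 12 monthly values using monthly percentages (0-100 scale)."""
    if len(monthly_percent) != 12:
        raise ValueError("expected 12 monthly percentages")
    total = sum(monthly_percent)
    if abs(total - 100.0) > tol:  # published percentages may not sum exactly to 100
        raise ValueError(f"monthly percentages sum to {total}, not about 100")
    return [annual_value * p / 100.0 for p in monthly_percent]


def recover_monthly_series(annual_by_year, percent_by_year, tol=0.5):
    """Chronological monthly series, one calendar year after another."""
    series = []
    for year in sorted(annual_by_year):
        series.extend(monthly_from_annual(annual_by_year[year], percent_by_year[year], tol))
    return series
```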

Figure S2. Graphical depiction of seasonality characteristics (top) and the general regression equations used to analyze them (bottom).


References

[1] Centers for Disease Control and Prevention. (2016, Nov). Foodborne Diseases Active Surveillance Network (FoodNet). Retrieved from https://www.cdc.gov/foodnet/foodnet-fast.html. Accessed on 31 Dec 2018.

[2] Lofgren, E., Fefferman, N., Doshi, M., & Naumova, E. N. (2007, May). Assessing seasonal variation in multisource surveillance data: annual harmonic regression. In NSF Workshop on Intelligence and Security Informatics (pp. 114-123). Springer, Berlin, Heidelberg.

[3] Naumova, E. N., & MacNeill, I. B. (2007). Seasonality assessment for biosurveillance systems. In Advances in Statistical Methods for the Health Sciences (pp. 437-450). Birkhäuser Boston.

[4] Falconi, T. A., Cruz, M. S., & Naumova, E. N. (2018). The shift in seasonality of legionellosis in the USA. Epidemiology & Infection, 146(14), 1824-1833.

[5] Chui, K. K., Webb, P., Russell, R. M., & Naumova, E. N. (2009). Geographic variations and temporal trends of Salmonella-associated hospitalization in the US elderly, 1991-2004: A time series analysis of the impact of HACCP regulation. BMC Public Health, 9(1), 447.