
Antibiotics Time Series Report


Antibiotic Time Series Project

By Lunmei Huang & Zhaoxiang Chen

I. INTRODUCTION

Antibiotics are a type of antimicrobial used in the treatment and prevention of bacterial infection. They are among the most important and most commonly used treatments in medical practice today [1]. Overuse of antibiotics, however, can lead to the emergence of bacterial resistance. This phenomenon often reflects evolutionary processes that take place during antibiotic therapy: the treatment may select for bacterial strains with a physiologically or genetically enhanced capacity to survive high doses of antibiotics. Under certain conditions, this results in preferential growth of resistant bacteria, while the growth of susceptible bacteria is inhibited by the drug. Other problems, such as misuse, should also be taken into consideration when antibiotics are used [1]. It is therefore necessary to analyze the amount of antibiotics used over time.

This project aims to find the seasonality of seventeen kinds of antibiotic drugs and of their total usage over seven years, and to explain the reasons behind the seasonality. The data set is composed of eighteen time series of monthly antibiotic use (unit: mean days of therapy per 1000 patient days) from January 2007 to December 2013. We first applied seasonal and trend decomposition using Loess (STL) to each time series, where Loess denotes locally weighted scatterplot smoothing. We found that only two of the seventeen drugs, Macrolides and antiMRSA, showed pronounced seasonality in the data. The remainder of this paper focuses mainly on the analysis of these two drugs.

Section II of this paper explains the STL method in detail and how the two drugs were found to have pronounced seasonality by applying it. Sections III and IV present the analyses and results for the Macrolides and antiMRSA data sets in detail. Section V gives the conclusions drawn from these analyses, and section VI lists the references.

II. STL METHOD

STL is a method that “decomposes a time series into seasonal, trend and irregular components using Loess” [2]; S stands for seasonal, T for trend, and L for Loess. In the STL method, the seasonal component is found by Loess smoothing of the seasonal sub-series (for monthly data, the series of all January values, the series of all February values, and so on). In this paper, the smoothing of the seasonal sub-series was replaced by taking the mean. The seasonal component is then removed from the original data, and the result is smoothed by Loess to find the trend. Subtracting both the seasonal component and the trend from the data gives the remainder [2]. The corresponding expression for the STL model is:

$$y_t = Tr_t + S_t + \varepsilon_t$$

where t = 1, 2, …, 84 indexes the 84 months from January 2007 to December 2013, $y_t$ is the data (the response variable), $Tr_t$ is the trend component, $S_t$ is the seasonal component, and $\varepsilon_t$ is the remainder component.
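The simplified decomposition described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the analysis: the function names, the centering of the monthly means, the `span` value, and the hand-written local linear smoother with tricube weights (a stand-in for a full Loess implementation) are all assumptions.

```python
import math

def seasonal_component(y, period=12):
    """Centered mean of each seasonal sub-series, repeated over the series."""
    overall = sum(y) / len(y)
    means = []
    for m in range(period):
        sub = y[m::period]                      # e.g. all January values
        means.append(sum(sub) / len(sub) - overall)
    return [means[t % period] for t in range(len(y))]

def loess_trend(x, span=0.75):
    """Local linear fit with tricube weights, evaluated at every index."""
    n = len(x)
    k = max(3, math.ceil(span * n))             # points in each local window
    trend = []
    for t in range(n):
        idx = sorted(range(n), key=lambda j: abs(j - t))[:k]
        h = max(abs(j - t) for j in idx) + 1    # bandwidth; +1 keeps all weights > 0
        w = [(1 - (abs(j - t) / h) ** 3) ** 3 for j in idx]
        sw = sum(w)
        swx = sum(wi * j for wi, j in zip(w, idx))
        swy = sum(wi * x[j] for wi, j in zip(w, idx))
        swxx = sum(wi * j * j for wi, j in zip(w, idx))
        swxy = sum(wi * j * x[j] for wi, j in zip(w, idx))
        slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        trend.append((swy - slope * swx) / sw + slope * t)
    return trend

def decompose(y, period=12, span=0.75):
    """Return (trend, seasonal, remainder) with y = trend + seasonal + remainder."""
    seasonal = seasonal_component(y, period)
    deseasonalized = [v - s for v, s in zip(y, seasonal)]
    trend = loess_trend(deseasonalized, span)
    remainder = [v - tr - s for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder
```

By construction the three returned components add back to the input series, which mirrors the additive form of the STL model.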

Table-1 reports, for each series, the ratio of the interquartile range (IQR) of each STL component to the IQR of the original data, in percent, with the drugs sorted in descending order of the seasonal component. The ratios of the seasonal components for Macrolides and antiMRSA are 93.9% and 82.6%, while the ratios for all other drugs are below 50%. We therefore conclude that only the use of these two drugs shows pronounced seasonal change, and the further analyses concern these two drugs.

Table-1: Ratios (%) of the IQR of each component to the IQR of the data

Name                  Trt      St      εt
Macrolides           49.1    93.9    32.8
antiMRSA             51.9    82.6    68.1
total BLI stable     38.2    30.7    62.5
Ceph1and2            81.6    27.0    23.8
Nafcillin            69.1    26.8    64.9
Clindamycin          75.2    26.7    48.5
Bactrim              96.4    23.4    62.1
aminoglycosides     107.6    21.0    16.2
vancomycin           68.9    20.5    42.2
tetracyclines        82.0    20.3    42.5
Zosyn                84.1    17.3    32.1
total BLIs           81.4    17.0    30.4
Metronidazole       100.2    15.8    31.7
Ceph3and4            97.9    13.9    10.2
carbapenems         107.5    11.8    25.8
penicillins          90.7    10.1    28.9
Quinolones          104.9    10.0     6.5
Total                39.7    43.8    45.8
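The IQR ratios of Table-1 can be computed along the following lines. This is a minimal sketch: the function names are hypothetical, and the "inclusive" quartile rule of Python's `statistics.quantiles` is an assumption, since the paper does not state which quantile definition was used.

```python
import statistics

def iqr(values):
    """Interquartile range, using the 'inclusive' quartile definition (an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def iqr_ratios(y, trend, seasonal, remainder):
    """Percent ratio of each component's IQR to the IQR of the original data."""
    base = iqr(y)
    components = {"trend": trend, "seasonal": seasonal, "remainder": remainder}
    return {name: 100 * iqr(comp) / base for name, comp in components.items()}
```

A seasonal ratio close to 100% means that the seasonal component varies about as much as the data themselves, which is the criterion used to single out Macrolides and antiMRSA.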

III. MACROLIDES

Macrolides are bacteriostatic antibiotics with a broad spectrum of activity against many gram-positive bacteria (bacteria that retain the crystal violet dye in the Gram stain because of their thick peptidoglycan cell wall) [4]. Pronounced seasonality of Macrolides use is often found in medical practice, and is likely related to the increased incidence of respiratory tract infections in the first and fourth quarters (winter) of the calendar year.

A. Modeling Procedure

The first step was to de-trend the data. Let t denote the month index from January 2007 to December 2013, $y_t$ the data, $f(t)$ a linear function of time t that represents the trend of the data, and $\varepsilon_t$ the error term. The expression is:

$$y_t = f(t) + \varepsilon_t$$

where $f(t)$ assumes the linear form:

$$f(t) = \alpha + \beta t$$

Here α and β are fitted parameters. Figure-1 shows the line fitted to de-trend the data. After the de-trend process, we obtained the residuals:

$$e_t = y_t - \hat{f}(t)$$
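The de-trending step can be illustrated with ordinary least squares on the raw time index. The sketch below is an assumption about the procedure, not the authors' code: it takes t = 1, ..., n as the regressor, so its coefficients depend on that parameterization, and the function name is hypothetical.

```python
def detrend_linear(y):
    """Fit y_t = alpha + beta * t by least squares (t = 1..n) and return the residuals."""
    n = len(y)
    t_values = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in t_values)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(t_values, y))
    beta = sxy / sxx                       # slope of the trend line
    alpha = y_mean - beta * t_mean         # intercept of the trend line
    residuals = [v - (alpha + beta * t) for t, v in zip(t_values, y)]
    return alpha, beta, residuals
```

The returned residuals play the role of $e_t$ in the subsequent seasonality analysis.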

Figure-2 plots the residuals after the de-trend procedure. Visually, the plot suggests a possible seasonal pattern. The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots were then generated for the residuals. Both the ACF (Figure-3) and the PACF (Figure-4) show lags beyond the 95% confidence bound, and the PACF shows six such lags, which indicates that the drug use in one month is correlated with the use up to six months later. If seasonality exists, one season (one period) would be one year, and a significant correlation between one month and the month half a year later is evidence of yearly seasonality. This conclusion agrees with the visual impression from the residual plot. Further analysis is therefore required to model the seasonality.
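For illustration, the sample ACF, the sample PACF (via the Durbin-Levinson recursion), and the approximate 95% bound ±1.96/√n can be computed as below. This is a sketch under the usual large-sample white-noise approximation for the bound; the function names are hypothetical and the plots in the paper may have been produced differently.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update the AR(k) coefficients from the AR(k-1) coefficients
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

def confidence_bound(n, z=1.96):
    """Approximate 95% bound for the correlations of white noise of length n."""
    return z / math.sqrt(n)
```

For a series of 84 monthly values the bound is about 0.21, so any lag whose correlation exceeds this value in absolute terms would be flagged as significant.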

The second step was to use eleven dummy variables to fit a linear model representing the seasonality of the residuals. The ACF (Figure-5) and PACF (Figure-6) plots for the residuals of the dummy variable model still show significant lags, which indicates that the residuals after the de-trend and dummy variable steps are not just white noise, so the dummy variable model was not a sufficient fit here. The expression and details of the dummy variable model are explained in section IV, where the modeling process for antiMRSA is presented. An autoregressive (AR) model can be appropriate when the PACF plot of the de-trended data shows a run of consecutive significant lags. Here, the first six lags are significant and the lags from the 7th onward are not, so an AR(6) model was used to represent the seasonality of the residuals after the de-trend procedure. The expression is:

$$e_t = c + \sum_{i=1}^{6} \varphi_i e_{t-i} + \delta_t$$

where $e_{t-i}$, i = 1, …, 6, are the six lagged values used as predictors in the AR(6) model, $\varphi_i$ is the coefficient of $e_{t-i}$, c is the constant of the model, and $\delta_t$ is the error term.

The corresponding residuals of this procedure are:

$$d_t = e_t - \hat{c} - \sum_{i=1}^{6} \hat{\varphi}_i e_{t-i}$$
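One simple way to fit such an AR(p) model is conditional least squares: regress $e_t$ on a constant and its p lagged values. The sketch below uses that technique as an assumption; the estimates in this paper may have been obtained by a different estimator (for example maximum likelihood), and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ar(e, p=6):
    """Least-squares fit of e_t = c + sum_i phi_i * e_{t-i}; returns (c, phi, residuals)."""
    rows = [[1.0] + [e[t - i] for i in range(1, p + 1)] for t in range(p, len(e))]
    target = e[p:]
    k = p + 1
    # normal equations (X'X) coef = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(rows, target)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(cf * v for cf, v in zip(coef, r)) for r in rows]
    residuals = [v - f for v, f in zip(target, fitted)]
    return coef[0], coef[1:], residuals
```

The first p observations are used only as lagged predictors, so the residual series is p values shorter than the input.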

Figure-7 plots the residuals of the AR(6) model. The ACF (Figure-8), PACF (Figure-9), and Q-Q plot (Figure-10) of these residuals were generated. Since no lags exceed the 95% confidence bound in either the ACF or the PACF plot, and most points on the Q-Q plot lie close to the diagonal line, we conclude that the remaining residuals are white noise and that the models fit the data well.

Figure-1: Trend line fit for Macrolides

Figure-2: Residuals after the de-trend process for Macrolides

Figure-3: ACF for the residuals after the de-trend process for Macrolides

Figure-4: PACF for the residuals after the de-trend process for Macrolides

Figure-5: ACF for residuals after fitting eleven indicator variables for Macrolides

Figure-6: PACF for residuals after fitting eleven indicator variables for Macrolides

Figure-7: Residuals after the AR(6) process for Macrolides

Figure-8: ACF for residuals after the AR(6) process for Macrolides

Figure-9: PACF for residuals after the AR(6) process for Macrolides

Figure-10: Q-Q plot for residuals after the AR(6) process for Macrolides

B. Results and Tests

As Table-2 shows, both coefficients of the linear trend model are significant. The p-value of the F-test for the whole model is 0.0179, which also indicates that the model is useful. On the other hand, the R-squared and adjusted R-squared values of the trend model are very low, which means further modeling is needed. The PACF (Figure-4) shows six significant lags, and Table-3 shows that the AR(6) model has the smallest AIC value among the AR(1) to AR(8) models compared, which is consistent with what the PACF shows. Table-4 shows that φ1, φ2, and φ6 are the significant coefficients of the AR model; all six lags were nevertheless kept in the model.

Table-2: Results for the de-trended (linear trend) model of Macrolides

Coefficient   Estimate   S.E.     t Stat    p-value
α             31.4123    0.3334   94.2110   <0.0001
β             7.3838     3.0559   2.4160    0.0179

R-Squared: 0.0665    Adj-R-Squared: 0.0551
F-Stat: 5.8380 on 1 and 82 df    p-value: 0.0179

Table-3: AR Models and corresponding AIC

Model    AIC
AR(1)    354.8400
AR(2)    333.8800
AR(3)    330.7600
AR(4)    320.7100
AR(5)    318.0300
AR(6)    312.9400
AR(7)    313.3200
AR(8)    314.7400
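An order comparison of this kind can be sketched with the Levinson-Durbin recursion, which yields the innovation variance of every AR order in one pass. The AIC used here, n·ln(σ²) + 2(p + 1), is a Gaussian approximation with additive constants dropped, so the values it produces are not comparable with those in Table-3; only the ranking of the orders is meaningful. The function name and the Yule-Walker estimator are assumptions.

```python
import math

def ar_aic_by_order(x, max_order):
    """Approximate AIC of Yule-Walker AR(p) fits for p = 1..max_order."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    # biased sample autocovariances gamma_0..gamma_max_order
    gamma = [sum(d[t] * d[t + k] for t in range(n - k)) / n for k in range(max_order + 1)]
    sigma2 = gamma[0]          # innovation variance of the order-0 model
    phi = []                   # AR coefficients of the current order
    aic = {}
    for p in range(1, max_order + 1):
        num = gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))
        kappa = num / sigma2   # partial autocorrelation at lag p
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        sigma2 *= 1 - kappa ** 2
        aic[p] = n * math.log(sigma2) + 2 * (p + 1)
    return aic
```

The preferred order under this criterion is `min(aic, key=aic.get)`; adding a lag lowers the AIC only if it reduces the innovation variance enough to offset the penalty of 2 per parameter.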

Table-4: Results for AR(6) model of Macrolides

Coefficient   φ1        φ2        φ3       φ4        φ5       φ6        c
Estimate      0.7718    -0.2946   0.0839   -0.2268   0.0032   -0.2847   0.0313
p-value       <0.0001   0.0276    0.5379   0.0933    0.9810   0.0061    0.8513

AIC: 312.9400

IV. ANTI-MRSA

MRSA stands for methicillin-resistant Staphylococcus aureus, a bacterium responsible for several difficult-to-treat infections in humans. It is resistant to beta-lactam antibiotics, which include the penicillins (such as methicillin, dicloxacillin, nafcillin, and oxacillin). This resistance makes MRSA infection more difficult to treat with standard types of antibiotics and thus more dangerous [5].

MRSA is especially troublesome in hospitals, prisons, and nursing homes, where patients with open wounds, invasive devices, and weakened immune systems are at greater risk of nosocomial infection than the general public [6]. The analysis shows that the peak use of antiMRSA occurs during summer, probably because infection-related bacteria are more active at warm temperatures. In addition, metabolism is faster than in other seasons, so people sweat more, which may make the patients described above more easily infected by MRSA and lead to higher use of antiMRSA.

A. Modeling Procedure

For the antiMRSA data set, the first step was to plot the data (Figure-11). The plot suggested that a quadratic trend was appropriate, so the next step was to de-trend the data with the quadratic model:

$$f(t) = \alpha + \beta_1 t + \beta_2 t^2$$

A trend line was then fitted to the data (Figure-11). After the de-trend process, we obtained the residuals:

$$e_t = y_t - \hat{f}(t)$$

Figure-12 plots the residuals. The ACF (Figure-13) and PACF (Figure-14) plots for the de-trended data were also obtained, and significant lags were found in the PACF plot. An AR model can be appropriate when the PACF plot shows a run of consecutive significant lags. Here, however, only the 1st, 5th, and 6th lags of the PACF are significant, which is not a consecutive run, so an AR model would not be appropriate. The second step was therefore to use eleven dummy variables to fit a linear model representing the seasonality of the data. The model is:

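$$e_t = c + \sum_{i=1}^{11} \varphi_i M_{it} + \delta_t$$

$$M_{it} = \begin{cases} 1, & t \bmod 12 = i \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, 2, 3, \ldots, 11$$

where $M_{it}$ are the indicator variables for the months, $\varphi_i$ is the coefficient of $M_{it}$, c is the constant parameter, and $\delta_t$ is the error term. Since t = 1 corresponds to January 2007, the indicators i = 1, …, 11 represent January through November, and December (t mod 12 = 0) is the baseline month captured by c. The corresponding residuals of this procedure are:

$$d_t = e_t - \hat{c} - \sum_{i=1}^{11} \hat{\varphi}_i M_{it}$$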
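Because the model contains only a constant and eleven month indicators, its least-squares fit reduces to monthly means: the estimate of c is the mean of the baseline month (t mod 12 = 0), and each coefficient is the difference between a month's mean and the baseline mean. The sketch below relies on this property instead of a general regression routine; the function name is hypothetical and the input is assumed to start at t = 1.

```python
def fit_month_dummies(e):
    """Least-squares fit of e_t = c + sum_i phi_i * M_it with baseline t mod 12 == 0.

    Returns (c, phi, residuals), where phi maps i = 1..11 to its coefficient.
    """
    groups = {m: [] for m in range(12)}
    for t, v in enumerate(e, start=1):       # t = 1 is the first month of the data
        groups[t % 12].append(v)
    means = {m: sum(g) / len(g) for m, g in groups.items()}
    c = means[0]                             # mean of the baseline month
    phi = {i: means[i] - c for i in range(1, 12)}
    residuals = [v - means[t % 12] for t, v in enumerate(e, start=1)]
    return c, phi, residuals
```

A positive coefficient for month i therefore means that the de-trended use in that month is, on average, higher than in the baseline month.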

Figure-15 plots the residuals of the linear model with eleven dummy variables. The ACF (Figure-16), PACF (Figure-17), and Q-Q plot (Figure-18) of these residuals were generated. No lags exceed the 95% confidence bound in the ACF plot, only a few lags are significant in the PACF plot, and most points on the Student's t Q-Q plot lie close to the diagonal line. We therefore conclude that the remaining residuals are white noise and that the models are adequate for the data.

Figure-11: Trend line fit for antiMRSA

Figure-12: Residuals after the de-trend process for antiMRSA

Figure-13: ACF for the residuals after the de-trend process for antiMRSA

Figure-14: PACF for the residuals after the de-trend process for antiMRSA

Figure-15: Residuals after fitting eleven indicator variables for antiMRSA

Figure-16: ACF for residuals after fitting eleven indicator variables for antiMRSA

Figure-17: PACF for residuals after fitting eleven indicator variables for antiMRSA

Figure-18: Student's t Q-Q plot of residuals of the eleven indicator variables model for antiMRSA

B. Results and Tests

From Table-5, the p-values of the coefficients of the quadratic model are all less than 0.05, and the p-value for the whole model is <0.0001, which means the quadratic model is a useful fit for the data. However, since the R-squared and adjusted R-squared values are both small, further modeling is necessary. A linear model with eleven dummy variables was therefore fitted. With a p-value <0.0001, we conclude that this model is useful. Table-6 shows the p-values of the fitted dummy variables: the p-values for February, March, April, May, and June are less than 0.05 and the corresponding estimates are positive, which means that the use of antiMRSA in these months is significantly higher than in the baseline month, December, and that this monthly difference is constant throughout all seven years. This shows that seasonality exists in the data set.

Table-5: Results for Quadratic Model of antiMRSA

Coefficient   Estimate   S.E.     t-value    p-value
α             23.5883    0.1497   157.5910   <0.0001
β1            5.9074     1.3718   4.3060     <0.0001
β2            -3.8814    1.3718   -2.8290    0.0058

R-Squared: 0.2468    Adj-R-Squared: 0.2283
F-statistic: 13.2700 on 2 and 81 df    p-value: <0.0001
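As a consistency check, the overall F-statistic of a linear model can be recovered from its R-squared through the standard relation F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of predictors and n the number of observations. The sketch below applies it to Table-5, taking n = 84 from the monthly data range January 2007 to December 2013; the function name is hypothetical.

```python
def f_statistic(r2, k, n):
    """Overall F-statistic and its degrees of freedom from R-squared.

    r2: coefficient of determination, k: number of predictors
    (excluding the intercept), n: number of observations.
    """
    df1, df2 = k, n - k - 1
    f = (r2 / df1) / ((1 - r2) / df2)
    return f, df1, df2

# Quadratic trend model of Table-5: two predictors (t and t^2), 84 months.
f, df1, df2 = f_statistic(0.2468, k=2, n=84)
print(round(f, 2), df1, df2)   # approximately 13.27 on 2 and 81 df
```

The small remaining differences from the tabulated values come from the rounding of R-squared to four decimals.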

Table-6: Results for Indicator Model of antiMRSA

Coefficient   Estimate   S.E.     t-value   p-value
c             -0.8326    0.4063   -2.0490   0.0441
φ1            0.1381     0.5746   0.2400    0.8107
φ2            1.3778     0.5746   2.3980    0.0191
φ3            2.6319     0.5746   4.5800    <0.0001
φ4            1.8062     0.5746   3.1430    0.0024
φ5            1.5507     0.5746   2.6990    0.0087
φ6            1.5453     0.5746   2.6890    0.0089
φ7            0.8288     0.5746   1.4420    0.1536
φ8            0.4809     0.5746   0.8370    0.4054
φ9            -0.5553    0.5746   -0.9660   0.3371
φ10           -0.0899    0.5746   -0.1560   0.8761
φ11           0.2771     0.5746   0.4820    0.6311

R-Squared: 0.4542    Adj-R-Squared: 0.3708
F-stat: 5.4470 on 11 and 72 df    p-value: <0.0001

V. CONCLUSION

The goal of the project was to find and model the seasonality of the use of seventeen drugs and of their total use, and to explain the reasons behind the seasonality. According to the STL decomposition, two of the drugs, Macrolides and antiMRSA, have pronounced seasonality.

By de-trending the original data for Macrolides, we found significant seasonality in the residuals, and an AR(6) model was then fitted to represent it. No significant lags were found in the ACF and PACF plots of the resulting residuals, and together with the Q-Q plot this indicates that the residuals after the modeling process are white noise.

The de-trending process was applied to the antiMRSA data set as well, but a quadratic model fit this data set better than a first-order model. After de-trending, eleven dummy variables were used to build a linear model for the data set; the ACF of the corresponding residuals shows no significant lags, and the PACF shows only a few. The Q-Q plot also supports the conclusion that the residuals are white noise.

The results of the analysis are consistent with previous research. Based on resources found for this paper (see, e.g., [4]), the use of Macrolides increases considerably in winter and early spring and differs significantly from the use in summer, about half a year later. Our analysis found a significant correlation between one month and the month half a year later, which supports this pattern statistically. For antiMRSA, the use in February, March, April, May, and June is significantly different from the use in December, which suggests that as the weather warms up, people are more easily infected by MRSA, so more antiMRSA drugs are needed. This also agrees with the explanations in the resources found for this paper.

VI. REFERENCE

[1] Katie J. Suda, Lauri A. Hicks, Rebecca M. Roberts, Robert J. Hunkler and Thomas H. Trends and Seasonal Variation in Outpatient Antibiotic Prescription Rates in the United States, 2006 to 2010. Antimicrob. Agents Chemother., May 2014, vol. 58, no. 5, 2763-2766.

[2] B. D. Ripley; Fortran code by Cleveland et al. (1990) from 'netlib'. Seasonal Decomposition of Time Series by Loess [Online]. Available: stat.ethz.ch/R-manual/R-devel/library/stats/html/stl.html

[3] Original: Paul Gilbert, Martyn Plummer. Extensive modifications and univariate case of pacf by B. D. Ripley. Auto- and Cross- Covariance and -Correlation Function Estimation [Online]. Available: stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html

[4] Drug Record: Macrolide Antibiotics, Overview [Online]. Available: livertox.nih.gov/MacrolideAntibiotics.htm#overview

[5] Alan Johnson. Methicillin-resistant Staphylococcus aureus (MRSA) infection [Online]. Available: netdoctor.co.uk/diseases/facts/mrsa.htm

[6] MRSA Bacteria: Facts, Information & Treatment [Online]. Available: disabled-world.com/health/mrsa/