Antibiotic Time Series Project
By Lunmei Huang & Zhaoxiang Chen
I. INTRODUCTION
Antibiotics are a type of antimicrobial used in the treatment and prevention of bacterial
infection. They are among the most widely used treatments in medical practice today [1].
Overuse of antibiotics, however, can lead to the emergence of bacterial resistance. This
phenomenon often reflects evolutionary processes that take place during antibiotic therapy:
treatment may select for bacterial strains with a physiologically or genetically enhanced
capacity to survive high doses of antibiotics. Under certain conditions, resistant bacteria
grow preferentially while the growth of susceptible bacteria is inhibited by the drug. Other
problems, such as misuse, should also be taken into consideration when antibiotics are
used [1]. It is therefore necessary to analyze the amount of antibiotics used over time.
This project aims to identify seasonality in the use of seventeen antibiotic drugs and in their
total use over seven years, and to explain the reasons behind it. The data set is composed of
eighteen monthly time series of antibiotic use (unit: mean days of therapy per 1000 patient
days) from January 2007 to December 2013. We first applied seasonal-trend decomposition based
on locally weighted scatterplot smoothing (Loess), known as STL, to each time series. We found
that only two of the seventeen drugs, Macrolides and antiMRSA, showed pronounced seasonality.
The remainder of this paper focuses on the analysis of these two drugs.
Section II explains the STL method in detail and shows how it identified the two drugs with
pronounced seasonality. Sections III and IV present the analyses and results for the Macrolides
and antiMRSA data sets. Section V summarizes the conclusions drawn from these analyses, and
Section VI lists the references.
II. STL METHOD
STL is a method that “decomposes a time series into seasonal, trend and irregular
components using Loess” [2]. S stands for seasonal, T for trend, and L for Loess.
In STL, the seasonal component is found by Loess smoothing of the seasonal sub-series (for
monthly data, the series of all January values, all February values, and so on). In this paper,
the smoothing of the seasonal sub-series was replaced by taking the mean. The seasonal
component is then removed from the original data, and the result is smoothed by Loess to find
the trend. Subtracting both the seasonal component and the trend from the data gives the
remainder [2]. The corresponding expression for the STL model is:
$$y_t = Tr_t + S_t + \varepsilon_t$$
where t = 1, 2, …, 84 indexes the 84 months from January 2007 to December 2013, $y_t$ is the
data (the response variable), $Tr_t$ is the trend component, $S_t$ is the seasonal component,
and $\varepsilon_t$ is the remainder component.
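The decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `decompose` is hypothetical, the series is synthetic, and a centered moving average stands in for the Loess smoother, so the trend will differ from a true STL fit.

```python
import math

def decompose(y, period=12, window=13):
    """Additive decomposition y = trend + seasonal + remainder (simplified sketch)."""
    n = len(y)
    overall = sum(y) / n
    # Seasonal component: mean of each seasonal sub-series, centered on the overall mean.
    sub_means = []
    for m in range(period):
        sub = y[m::period]
        sub_means.append(sum(sub) / len(sub) - overall)
    seasonal = [sub_means[t % period] for t in range(n)]
    # Trend: smooth the deseasonalized series.
    # ASSUMPTION: a centered moving average replaces Loess to keep the sketch short.
    deseason = [y[t] - seasonal[t] for t in range(n)]
    half = window // 2
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        trend.append(sum(deseason[lo:hi]) / (hi - lo))
    # Remainder: what is left after removing trend and seasonal components.
    remainder = [y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, remainder

# Synthetic monthly series (hypothetical values, not the paper's data).
y = [30 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(84)]
trend, seasonal, remainder = decompose(y)
```

By construction the three components add back to the data, and the seasonal component repeats every twelve months.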
Table-1 reports, for each series, the ratio of the interquartile range (IQR) of each STL
component to the IQR of the original data, sorted in descending order of the seasonal ratio.
The seasonal ratios for Macrolides and antiMRSA are 93.9% and 82.6%, while those for all other
series are below 50%. We conclude that only these two drugs show pronounced seasonal change in
use, so the further analyses focus on them.
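The IQR ratio can be computed as in the sketch below. The helper names are hypothetical, and the linear-interpolation quantile is one common convention; the paper does not state which quantile definition was used.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics (an assumed convention)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    return quantile(values, 0.75) - quantile(values, 0.25)

def iqr_ratio(component, data):
    """IQR of a decomposition component as a percentage of the IQR of the data."""
    return 100.0 * iqr(component) / iqr(data)

# Toy example (hypothetical numbers): the component varies half as much as the data.
example_ratio = iqr_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Because the IQR is not additive, the three ratios for a series need not sum to 100%, as the Macrolides row of Table-1 illustrates.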
Table-1: Ratios (%) of the IQR of each component to the IQR of the data
Name               Tr_t    S_t    ε_t
Macrolides         49.1    93.9   32.8
antiMRSA           51.9    82.6   68.1
total BLI stable   38.2    30.7   62.5
Ceph1and2          81.6    27.0   23.8
Nafcillin          69.1    26.8   64.9
Clindamycin        75.2    26.7   48.5
Bactrim            96.4    23.4   62.1
aminoglycosides    107.6   21.0   16.2
vancomycin         68.9    20.5   42.2
tetracyclines      82.0    20.3   42.5
Zosyn              84.1    17.3   32.1
total BLIs         81.4    17.0   30.4
Metronidazole      100.2   15.8   31.7
Ceph3and4          97.9    13.9   10.2
carbapenems        107.5   11.8   25.8
penicillins        90.7    10.1   28.9
Quinolones         104.9   10.0   6.5
Total              39.7    43.8   45.8
III. MACROLIDES
Macrolides are bacteriostatic antibiotics with a broad spectrum of activity against many
gram-positive bacteria, including pathogens that commonly cause respiratory tract infections
[4]. Pronounced seasonality in Macrolides use is often observed in medical practice and is
likely related to the higher incidence of respiratory tract infections in the first and fourth
quarters (winter) of the calendar year.
A. Modeling Procedure
The first step was to de-trend the data with the model

$$y_t = f(t) + \varepsilon_t$$

where t is the monthly time index from January 2007 to December 2013, $y_t$ is the data,
$f(t)$ is a linear function of time that represents the trend of the data, and
$\varepsilon_t$ is the error term. Here $f(t)$ takes the linear form

$$f(t) = \alpha + \beta t$$

where α and β are fitted parameters. Figure-1 shows the line fit used to de-trend the data.
After de-trending, we obtained the residuals

$$e_t = y_t - \hat{f}(t)$$
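The de-trending step amounts to an ordinary least-squares line fit followed by subtraction. The sketch below uses the closed-form OLS solution; the function name `fit_line` and the raw time index t = 1, …, n are assumptions, and the coefficient scale reported in Table-2 may reflect a different parameterization of time.

```python
def fit_line(y):
    """Fit y_t = alpha + beta * t by ordinary least squares and return the residuals."""
    n = len(y)
    t = list(range(1, n + 1))  # ASSUMPTION: raw time index 1..n
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    beta = sxy / sxx
    alpha = y_bar - beta * t_bar
    # e_t = y_t - f_hat(t): the de-trended series.
    residuals = [yi - (alpha + beta * ti) for ti, yi in zip(t, y)]
    return alpha, beta, residuals

# Toy series lying exactly on a line (hypothetical values).
alpha, beta, e = fit_line([2 + 0.5 * t for t in range(1, 11)])
```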
Figure-2 plots the residuals after de-trending. Visually, the plot suggests a seasonal pattern.
The autocorrelation function (ACF) and partial autocorrelation function (PACF) plots were then
generated for the residuals. The ACF (Figure-3) and PACF (Figure-4) plots both show lags beyond
the 95% confidence bound. The PACF plot shows six such lags, which indicates that drug use in a
given month is correlated with use up to six months later. If seasonality exists with a period
of one year, a significant correlation between one month and the month half a year later is
evidence of that yearly seasonality. This agrees with the visual impression from the residual
plot, so further modeling is required to represent the seasonality.
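The diagnostics above can be reproduced in outline as follows. This is a sketch under stated assumptions: the function names are hypothetical, the series is synthetic, the PACF is obtained from the sample ACF with the Durbin-Levinson recursion, and the 95% bound uses the usual large-sample approximation ±1.96/√n.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Synthetic de-trended series with a yearly cycle plus noise (not the paper's data).
rng = random.Random(0)
x = [3 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1) for t in range(84)]
r = acf(x, 12)
p = pacf(x, 12)
bound = 1.96 / math.sqrt(len(x))  # approximate 95% bound under white noise
significant_pacf_lags = [k + 1 for k, v in enumerate(p) if abs(v) > bound]
```

For a series with a twelve-month cycle, the ACF is strongly negative at lag 6 and strongly positive at lag 12, which is the half-year pattern described in the text.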
The second step was to fit a linear model with eleven dummy variables to represent the
seasonality of the residuals. The ACF (Figure-5) and PACF (Figure-6) plots of the residuals
from the dummy variable model still show significant lags, which indicates that the residuals
after de-trending and dummy variable fitting are not white noise, so the dummy variable model
was not a sufficient fit here. The expression and details of the dummy variable model are given
in Section IV, where the modeling process for antiMRSA is presented. An autoregressive (AR)
model can be appropriate when consecutive significant lags are found in the PACF plot of the
de-trended data. Here, the first six lags are significant and lags from the 7th onward are not,
so an AR(6) model was used to represent the seasonality of the residuals after de-trending.
The expression is:
$$e_t = c + \sum_{i=1}^{6} \varphi_i e_{t-i} + \varepsilon_t$$

where the $e_{t-i}$ are the lagged residuals used as predictors (six for the AR(6) model),
$\varphi_i$ is the coefficient of $e_{t-i}$, and c is the constant of the model.
The corresponding residuals of this procedure are:
$$d_t = e_t - \hat{c} - \sum_{i=1}^{6} \hat{\varphi}_i e_{t-i}$$
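A minimal sketch of fitting an AR(p) model and forming its residuals is given below. It uses Yule-Walker estimation solved by the Levinson recursion, which is an assumption made to keep the example dependency-free; the paper does not state its estimation method, and other methods (such as maximum likelihood) give somewhat different estimates. The function names and the simulated series are hypothetical.

```python
import random

def fit_ar_yule_walker(e, p):
    """Estimate c and phi_1..phi_p of an AR(p) model by the Yule-Walker equations."""
    n = len(e)
    mean = sum(e) / n
    dev = [v - mean for v in e]
    gamma = [sum(dev[t] * dev[t + k] for t in range(n - k)) / n for k in range(p + 1)]
    phi = []
    v = gamma[0]  # innovation variance, updated at each order
    for k in range(1, p + 1):
        num = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kk = num / v
        phi = [phi[j] - kk * phi[k - 2 - j] for j in range(k - 1)] + [kk]
        v *= 1.0 - kk * kk
    c = mean * (1.0 - sum(phi))  # constant implied by the series mean
    return c, phi, v

def ar_residuals(e, c, phi):
    """d_t = e_t - c - sum_i phi_i * e_{t-i}, for every t with p earlier values available."""
    p = len(phi)
    return [e[t] - c - sum(phi[i] * e[t - 1 - i] for i in range(p))
            for t in range(p, len(e))]

# Simulated AR(1) series with coefficient 0.6 (not the paper's data), fit with p = 6.
rng = random.Random(1)
series = [0.0]
for _ in range(1999):
    series.append(0.6 * series[-1] + rng.gauss(0, 1))
c_hat, phi_hat, sigma2 = fit_ar_yule_walker(series, 6)
d = ar_residuals(series, c_hat, phi_hat)
```

On the simulated series the first coefficient comes out near 0.6 and the higher-order coefficients near zero, which mirrors how insignificant lags appear in a fitted AR model.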
Figure-7 plots the residuals of the AR(6) model. The ACF (Figure-8), PACF (Figure-9), and Q-Q
plot (Figure-10) of these residuals were generated. No lags exceed the 95% confidence bound in
either the ACF or the PACF plot, and most points on the Q-Q plot lie close to the diagonal
line. We conclude that the remaining residuals are consistent with white noise and that the
model fits the data adequately.
Figure-1: Trend line fit for Macrolides
Figure-2: Residuals after the de-trend process for Macrolides
Figure-3: ACF for the residuals after the de-trend process for Macrolides
Figure-4: PACF for the residuals after the de-trend process for Macrolides
Figure-5: ACF for Residuals after fitting eleven indicator variables for Macrolides
Figure-6: PACF for Residuals after fitting eleven indicator variables for Macrolides
Figure-7: Residuals after the AR(6) process for Macrolides
Figure-8: ACF for residuals after the AR(6) process for Macrolides
Figure-9: PACF for residuals after the AR(6) process for Macrolides
Figure-10: Q-Q plot for residuals after the AR(6) process for Macrolides
B. Results and Tests
As Table-2 shows, both coefficients of the linear trend model are significant. The p-value of
the F-test for the whole model is 0.0179, which also indicates that the linear trend is
statistically significant. On the other hand, the R-squared and adjusted R-squared values of
the trend model are very low (0.0665 and 0.0551), which means further modeling is needed. The
PACF (Figure-4) shows six significant lags, and Table-3 shows that the AR(6) model has the
smallest AIC value among the orders considered. The AR(6) model is therefore preferred, which
is consistent with the PACF. Table-4 shows that φ1, φ2, and φ6 are the significant coefficients
of the AR model; all six lags were nevertheless kept in the model.
Table-2: Results for the de-trending (linear trend) model of Macrolides
Coefficient   Estimate   S.E.     t Stat    p-value
α             31.4123    0.3334   94.2110   <0.0001
β             7.3838     3.0559   2.4160    0.0179
R-Squared 0.0665   Adj-R-Squared 0.0551
F-Stat 5.8380 on 1 and 82 df   p-value 0.0179
Table-3: AR Models and corresponding AIC
Model AIC
AR(1) 354.8400
AR(2) 333.8800
AR(3) 330.7600
AR(4) 320.7100
AR(5) 318.0300
AR(6) 312.9400
AR(7) 313.3200
AR(8) 314.7400
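The order selection can be checked directly from the AIC values reported in Table-3. The short sketch below picks the order with the smallest AIC and lists each model's difference from it; the variable names are illustrative.

```python
# AIC values copied from Table-3, keyed by AR order.
aic = {1: 354.84, 2: 333.88, 3: 330.76, 4: 320.71,
       5: 318.03, 6: 312.94, 7: 313.32, 8: 314.74}

# The preferred order is the one with the smallest AIC.
best_order = min(aic, key=aic.get)

# Difference of each model's AIC from the best one.
delta = {p: round(v - aic[best_order], 2) for p, v in aic.items()}
```

AR(6) has the smallest AIC, but AR(7) is only 0.38 higher. Differences this small are often read as weak evidence between models, so the agreement with the PACF plot is a useful second argument for choosing six lags.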
Table-4: Results for AR(6) model of Macrolides
Coefficients φ1 φ2 φ3 φ4 φ5 φ6 c
Estimate 0.7718 -0.2946 0.0839 -0.2268 0.0032 -0.2847 0.0313
p-value <0.0001 0.0276 0.5379 0.0933 0.9810 0.0061 0.8513
AIC 312.9400
IV. ANTI-MRSA
MRSA stands for methicillin-resistant Staphylococcus aureus, a bacterium responsible for
several difficult-to-treat infections in humans. It is resistant to beta-lactam antibiotics,
which include the penicillins (methicillin, dicloxacillin, nafcillin, and oxacillin). This
resistance makes MRSA infection more difficult to treat with standard types of antibiotics and
thus more dangerous [5].
MRSA is especially troublesome in hospitals, prisons, and nursing homes, where patients with
open wounds, invasive devices, and weakened immune systems are at greater risk of
nosocomial infection than the general public [6]. The analysis shows that the peak use of
antiMRSA occurs during summer, probably because infection-related bacteria are more active at
warm temperatures. Metabolism may also be faster in summer than in other seasons, so people
sweat more, which can make the patients described above more susceptible to MRSA infection and
lead to higher use of antiMRSA.
A. Modeling Procedure
For the antiMRSA data set, the first step was to plot the data (Figure-11). The plot suggested
that a quadratic trend was appropriate, so the next step was to de-trend the data with the
quadratic model

$$f(t) = \alpha + \beta_1 t + \beta_2 t^2$$

A trend line was then fit to the data (Figure-11). After de-trending, we obtained the
residuals:

$$e_t = y_t - \hat{f}(t)$$
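A quadratic trend can be fit by least squares through the normal equations, as in the sketch below. The function names are hypothetical and the series is synthetic. Time is centered before fitting to keep the equations well conditioned, which is an assumption of this sketch: it changes the intercept and linear coefficient but not the curvature coefficient, the fitted trend, or the residuals.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_quadratic(y):
    """Least-squares fit of a quadratic in centered time; returns coefficients and residuals."""
    n = len(y)
    t_bar = (n + 1) / 2
    u = [t - t_bar for t in range(1, n + 1)]  # centered time index
    X = [[1.0, ui, ui * ui] for ui in u]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_t = y_t - f_hat(t)
    return coef, residuals

# Synthetic series lying exactly on a parabola (hypothetical values).
y_quad = [20 + 0.3 * t - 0.002 * t * t for t in range(1, 85)]
coef, e_quad = fit_quadratic(y_quad)
```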
Figure-12 plots the residuals. The ACF (Figure-13) and PACF (Figure-14) plots of the de-trended
data were also obtained, and significant lags were found in the PACF plot. An AR model can be
appropriate when consecutive lags in the PACF plot are significant. Here, however, only the
1st, 5th, and 6th lags are significant; because these are not consecutive, an AR model would
not be appropriate. The second step was therefore to fit a linear model with eleven dummy
variables to represent the seasonality of the data. The model is:
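$$e_t = c + \sum_{i=1}^{11} \varphi_i M_{it} + \varepsilon_t$$

$$M_{it} = \begin{cases} 1, & t \bmod 12 = i \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, 2, 3, \ldots, 11$$

where the $M_{it}$ are the indicator variables for each month, $\varphi_i$ is the coefficient
of $M_{it}$, and c is the constant parameter. Months with t mod 12 = 0 (December, since t = 1
is January 2007) have all indicators equal to zero and serve as the reference level. The
corresponding residual of this procedure is: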
$$d_t = e_t - \hat{c} - \sum_{i=1}^{11} \hat{\varphi}_i M_{it}$$
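The dummy variable model can be sketched without a regression routine. With an intercept and eleven month indicators, the least-squares fitted value for each month equals that month's mean, so the constant is the mean of the reference month and each coefficient is a month mean minus that constant. The function name and the toy series are hypothetical.

```python
def fit_month_dummies(e, period=12):
    """Least-squares fit of e_t = c + sum_i phi_i * M_it using per-month means."""
    n = len(e)
    groups = {k: [] for k in range(period)}
    for t in range(1, n + 1):  # t = 1 is the first month of the series
        groups[t % period].append(e[t - 1])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    c = means[0]  # reference level: months with t mod 12 == 0
    phi = {i: means[i] - c for i in range(1, period)}
    # d_t = e_t - c_hat - sum_i phi_hat_i * M_it, i.e. e_t minus its month mean.
    residuals = [e[t - 1] - means[t % period] for t in range(1, n + 1)]
    return c, phi, residuals

# Toy de-trended series whose value depends only on the month (hypothetical values).
e_toy = [float(t % 12) for t in range(1, 85)]
c_hat, phi_hat, d_toy = fit_month_dummies(e_toy)
```

Each coefficient is therefore a contrast with the reference month, which is how the p-values in Table-6 should be read.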
Figure-15 plots the residuals of the linear model with eleven dummy variables. The ACF
(Figure-16), PACF (Figure-17), and Student's t Q-Q plot (Figure-18) of these residuals were
generated. No lags exceed the 95% confidence bound in the ACF plot, only a few lags are
significant in the PACF plot, and most points on the Student's t Q-Q plot lie close to the
diagonal line. We conclude that the remaining residuals are approximately white noise and that
the models are adequate for the data.
Figure-11: Trend line fit for antiMRSA
Figure-12: Residuals after the de-trend process for antiMRSA
Figure-13: ACF for the residuals after the de-trend process for antiMRSA
Figure-14: PACF for the residuals after the de-trend process for antiMRSA
Figure-15: Residuals after fitting eleven indicator variables for antiMRSA
Figure-16: ACF for Residuals after fitting eleven indicator variables for antiMRSA
Figure-17: PACF for Residuals after fitting eleven indicator variables for antiMRSA
Figure-18: Student's t Q-Q plot of residuals of eleven indicator variables model for antiMRSA
B. Results and Tests
From Table-5, the p-values of the coefficients of the quadratic model are less than 0.05, and
the p-value for the whole model is <0.0001, which means the quadratic trend is statistically
significant. Because the R-squared and adjusted R-squared values are both small (0.2468 and
0.2283), further modeling is necessary. A linear model with eleven dummy variables was then
fit. With an overall p-value <0.0001, we conclude the model is useful. Table-6 shows the
p-values of the fitted dummy variables. The p-values for February, March, April, May, and June
are less than 0.05 and the estimates are positive, which means the use of antiMRSA in these
months is significantly higher than in the reference month, December. The model assumes this
monthly difference is constant across all seven years. This shows that seasonality exists in
the data set.
Table-5: Results for Quadratic Model of antiMRSA
Coefficients Estimate S.E. t-value p-value
α 23.5883 0.1497 157.5910 <0.0001
β1 5.9074 1.3718 4.3060 <0.0001
β2 -3.8814 1.3718 -2.8290 0.0058
R-Squared 0.2468 Adj-R-Squared 0.2283
F-statistic: 13.2700 on 2 and 81 df p-value <0.0001
Table-6: Results for Indicator Model of antiMRSA
Coefficients Estimate S.E. t-value p-value
c -0.8326 0.4063 -2.0490 0.0441
φ1 0.1381 0.5746 0.2400 0.8107
φ2 1.3778 0.5746 2.3980 0.0191
φ3 2.6319 0.5746 4.5800 <0.0001
φ4 1.8062 0.5746 3.1430 0.0024
φ5 1.5507 0.5746 2.6990 0.0087
φ6 1.5453 0.5746 2.6890 0.0089
φ7 0.8288 0.5746 1.4420 0.1536
φ8 0.4809 0.5746 0.8370 0.4054
φ9 -0.5553 0.5746 -0.9660 0.3371
φ10 -0.0899 0.5746 -0.1560 0.8761
φ11 0.2771 0.5746 0.4820 0.6311
R-Squared 0.4542 F-stat 5.4470 on 11 and 72 df
Adj-R-Squared 0.3708 p-value: <0.0001
V. CONCLUSION
The goal of the project was to find and model the seasonality in the use of seventeen drugs and
in their total use, and to explain the reasons behind it. The STL decomposition showed that two
of the drugs, Macrolides and antiMRSA, have pronounced seasonality.
After de-trending the original data for Macrolides, we found significant seasonality in the
residuals, and an AR(6) model was fit to represent it. No significant lags were found in the
ACF and PACF plots of the AR(6) residuals, and the Q-Q plot indicates that the residuals are
consistent with white noise.
De-trending was applied to the antiMRSA data set as well, where a quadratic model fit better
than a first-order model. After de-trending, eleven dummy variables were fit in a linear model.
The ACF of the resulting residuals shows no significant lags and the PACF shows only a few. The
Q-Q plot is also consistent with white-noise residuals.
The results of this analysis are consistent with previous research. According to the sources
consulted for this paper (see, e.g., [4]), the use of Macrolides increases markedly in winter
and early spring and differs significantly from use in summer, about half a year later. Our
analysis found significant correlation between one month and the month half a year later, which
supports this pattern statistically. For antiMRSA, use in February, March, April, May, and June
is significantly different from use in December, which suggests that as the weather warms up,
people are more easily infected by MRSA and more antiMRSA is needed. This also agrees with the
explanations in the sources consulted for this paper.
VI. REFERENCES
[1] Katie J. Suda, Lauri A. Hicks, Rebecca M. Roberts, Robert J. Hunkler and Thomas
H. Trends and Seasonal Variation in Outpatient Antibiotic Prescription Rates in the
United States, 2006 to 2010. Antimicrob. Agents Chemother., May 2014, vol. 58, no. 5,
pp. 2763-2766.
[2] B. D. Ripley; Fortran code by Cleveland et al. (1990) from 'netlib'. Seasonal
Decomposition of Time Series by Loess [Online]. Available:
stat.ethz.ch/R-manual/R-devel/library/stats/html/stl.html
[3] Original: Paul Gilbert, Martyn Plummer. Extensive modifications and univariate case of
pacf by B. D. Ripley. Auto- and Cross- Covariance and -Correlation Function
Estimation [Online]. Available: stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html
[4] Drug Record: Macrolide Antibiotics, Overview [Online]. Available:
livertox.nih.gov/MacrolideAntibiotics.htm#overview
[5] Alan Johnson. Methicillin-resistant Staphylococcus aureus (MRSA) infection [Online].
Available: netdoctor.co.uk/diseases/facts/mrsa.htm
[6] MRSA Bacteria: Facts, Information & Treatment [Online]. Available:
disabled-world.com/health/mrsa/