Utilizing computational tools to visually explore multivariable datasets: a Lassa fever case study in Nigeria
Sara Asad, Andres Colubri, Pardis Sabeti
FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138
Abstract
Background
[Figure: Zye interface, with callouts: select plot type; select range; indication of significance based on selected P-value; add a variable to column for comparison; use variable as a covariate; sort columns based on mutual information (MI); adjust composite coefficient; add to composite; adjust composite range]
Results
MathWorks: Neural Network Toolbox
% This script assumes these variables are defined:
%   clinical_data_inputs  - input data.
%   clinical_data_targets - target data.
for k = 1:10
    k  % no semicolon, so the current iteration number is displayed
    if k == 1
        inputs = day_inputs1;
        targets = day_targets1;
    elseif k == 2
        inputs = day_inputs2;
        targets = day_targets2;
    % ... the remaining branches are not shown in this excerpt
    end
end
Confusion matrix (rows: predicted class; columns: actual class; percentages in parentheses are shares of all 12 cases)
                      Actual Recovered   Actual Died      Correct / incorrect
Predicted Recovered   8 (66.7%)          1 (8.3%)         88.9% / 11.1%
Predicted Died        1 (8.3%)           2 (16.7%)        66.7% / 33.3%
Correct / incorrect   88.9% / 11.1%      66.7% / 33.3%    83.3% / 16.7%
1, 2, 3…50
[Bar chart: y-axis Percentage (%), 0% to 100%; legend: Recovered, Died]
[Plot: x-axis Days of Treatment, 1 to 10; y-axis Accuracy (%), 80 to 100; series: Single, Recurring]
Figure 7. Accuracy of outcome prediction based on daily vital signs recorded during the course of treatment. Points labeled "Single" use data from that specific day of treatment, whereas points labeled "Recurring" also incorporate the patient's vital signs from previous days of treatment.
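The distinction between the two series in Figure 7 can be sketched as two ways of building the classifier input for a given day. The poster does not say how previous days were incorporated; concatenating the readings from day 1 onward is an assumption made here for illustration, and the data layout, function names, and readings are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: vitals[day] holds one patient's vital-sign readings
# recorded on that day of treatment (days are numbered from 1).
Vitals = Dict[int, List[float]]


def single_features(vitals: Vitals, day: int) -> List[float]:
    """Use only the readings taken on the given day."""
    return list(vitals[day])


def recurring_features(vitals: Vitals, day: int) -> List[float]:
    """Concatenate readings from day 1 up to and including the given day.

    Concatenation is an assumption; the poster only says that earlier
    days are incorporated.
    """
    features: List[float] = []
    for d in range(1, day + 1):
        features.extend(vitals[d])
    return features


# Invented readings (temperature in C, pulse in beats/min), for illustration only.
example: Vitals = {1: [38.9, 110.0], 2: [38.2, 102.0], 3: [37.5, 96.0]}
print(single_features(example, 3))
print(recurring_features(example, 3))
```

Under this assumption the recurring input grows with each day of treatment, so a separate model would be needed per day, which is consistent with the per-day selection of inputs and targets in the script excerpt.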
Figure 1. Case-fatality rate for the top nine variables from the Zye correlation ranking. The thresholds are: blood urea nitrogen above 31.0 mg/dl, creatinine above 6.00 mg/dl, granulocytes above 9.50%, and white blood cell concentration above 18.0 × 10^3/mm^3.
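As a minimal sketch of how a case-fatality rate (CFR) can be compared across a lab-value threshold such as the ones in Figure 1, the snippet below splits patients at a cutoff and computes deaths divided by cases in each group. The record layout, function names, and all values are invented for illustration; only the 31.0 cutoff is taken from the caption.

```python
from typing import List, Tuple

# Each record is (lab_value, died). The values below are invented.
Record = Tuple[float, bool]


def case_fatality_rate(records: List[Record]) -> float:
    """Return deaths divided by cases, as a percentage."""
    if not records:
        raise ValueError("no records in this group")
    deaths = sum(1 for _, died in records if died)
    return 100.0 * deaths / len(records)


def split_by_threshold(
    records: List[Record], threshold: float
) -> Tuple[List[Record], List[Record]]:
    """Split records into those strictly above the threshold and the rest."""
    above = [r for r in records if r[0] > threshold]
    at_or_below = [r for r in records if r[0] <= threshold]
    return above, at_or_below


# Hypothetical blood urea nitrogen values (mg/dl) with outcomes.
records: List[Record] = [
    (45.0, True), (33.5, True), (38.0, False),
    (12.0, False), (20.0, False), (28.0, True),
]
above, at_or_below = split_by_threshold(records, 31.0)
print(f"CFR above threshold: {case_fatality_rate(above):.1f}%")
print(f"CFR at or below threshold: {case_fatality_rate(at_or_below):.1f}%")
```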
[Figure 2 plot: x-axis Age of Patient (21 and under, 22-34, 35-44, 45-54, 55-64, 65 and above); y-axis Percentage (%), 0% to 100%]
[Figure 3 plot: x-axis Days of Fever Before Presentation (1-3, 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24, 25-28); y-axis Percentage (%), 0% to 100%]
Figures 2, 3. The effect on CFR of a patient's age and of the days of fever before presentation at the hospital.
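Confusion matrix (rows: predicted class; columns: actual class; percentages in parentheses are shares of all 166 cases)
                      Actual Recovered   Actual Died      Correct / incorrect
Predicted Recovered   133 (80.1%)        6 (3.6%)         95.7% / 4.3%
Predicted Died        2 (1.2%)           25 (15.1%)       92.6% / 7.4%
Correct / incorrect   98.5% / 1.5%       80.6% / 19.4%    95.2% / 4.8%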
Figure 4. The outcome confusion matrix using SCNS, ARF, ENC, and HYPO followed by its corresponding sensitivity and specificity.
Recovered Died
Sensitivity 98.5% 80.6%
Specificity 95.7% 92.6%
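The percentages in Figure 4 can be recomputed from the four cell counts. The sketch below assumes the cell placement implied by the reported values (rows are predicted classes, columns are actual classes). Under that assumption, the column-wise rates reproduce the figures reported as sensitivity and the row-wise rates reproduce the figures reported as specificity. In common usage the row-wise rate is also called precision or positive predictive value. The function names are illustrative.

```python
from typing import List

# Rows are predicted classes, columns are actual classes (Recovered, Died).
# Cell placement is an assumption inferred from the percentages in Figure 4.
matrix: List[List[int]] = [
    [133, 6],   # predicted Recovered
    [2, 25],    # predicted Died
]
labels = ["Recovered", "Died"]


def column_rate(m: List[List[int]], i: int) -> float:
    """Percentage of actual class i that was predicted as class i."""
    return 100.0 * m[i][i] / sum(row[i] for row in m)


def row_rate(m: List[List[int]], i: int) -> float:
    """Percentage of class-i predictions that were correct."""
    return 100.0 * m[i][i] / sum(m[i])


def accuracy(m: List[List[int]]) -> float:
    """Percentage of all cases on the diagonal."""
    total = sum(sum(row) for row in m)
    return 100.0 * sum(m[i][i] for i in range(len(m))) / total


for i, name in enumerate(labels):
    print(f"{name}: column rate {column_rate(matrix, i):.1f}%, "
          f"row rate {row_rate(matrix, i):.1f}%")
print(f"Overall accuracy: {accuracy(matrix):.1f}%")
```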
[Figure 6 plot: y-axis P-value (log scale), from 1.00E-12 to 1.00E+00]
Figure 6. P-values from Fisher's exact test for the association between discrete clinical variables and patient outcome. Variables above the line have a P-value less than 0.01.
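For a single discrete variable, the test behind Figure 6 can be sketched with the standard library alone. The function below computes a two-sided Fisher's exact P-value for a 2x2 table by summing the hypergeometric probabilities of all tables that are no more likely than the observed one. This is one common two-sided convention and not necessarily the one Zye uses; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Invented counts: rows are diagnosis present/absent, columns are died/recovered.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"P-value: {p_value:.4f}; below 0.01: {p_value < 0.01}")
```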
First and foremost, I would like to thank my advisor, Dr. Andres Colubri. Without his guidance and support, this project would not have been possible. I am grateful to Dr. Pardis Sabeti, all members of the Sabeti lab, and everyone at Fathom for this opportunity and valuable experience. This work was generously supported by the FAS Center for Systems Biology and the HHMI Interdisciplinary Undergraduate Fellows Program.
• SCNS, ENC, ARF, and HYPO are the four discrete variables with the highest correlation with patient outcome.
• These four variables have high predictive power, with sensitivities of 98.5% and 80.6% and specificities of 95.7% and 92.6% for recovered and deceased patients, respectively.
• Older patients have a higher case fatality rate (CFR). For instance, patients 21 and under have a CFR of 8.3%, patients between the ages of 45 and 54 have a CFR of 26.9%, and patients 65 and over have a CFR of 36.4%.
• Although the number of days of fever before hospital presentation was thought to have a significant impact on CFR, there is no clear correlation between it and a patient's outcome.
• Among patients who share similar clinical diagnoses, a physician can determine outcome more precisely using creatinine and lactate dehydrogenase lab results, as shown in Figure 5.
• The difference in prediction accuracy between single and recurring treatment data of patients' vital signs is not significant overall; however, there is a distinct peak on Day 9 for recurring data. In both cases, accuracy tends to increase as treatment progresses.
• Zye is a powerful tool for visually exploring multivariable datasets and assessing them for correlations. In this Lassa fever case study, Zye allowed us to generate hypotheses from the epidemiological data of Lassa fever patients.
The Irrua Specialist Teaching Hospital (ISTH) in Nigeria documents extensive epidemiological data on patients in its Lassa fever ward. These data are used to determine the case-fatality rate among Lassa fever patients and, most importantly, to find which parameters strongly predict patient outcome. This project presents Zye, a visualization software tool that allows users to efficiently scan large datasets through visual analysis. In our results, clinical diagnoses of severe central nervous system disease (SCNS), encephalopathy (ENC), acute renal failure (ARF), and hypotension (HYPO) most closely predict outcome. Zye gives users an intuitive understanding of associations in multivariable datasets using mutual information ranking and an exploratory visual interface. The ability to extract meaningful associations and generate novel hypotheses from large datasets has important implications for future epidemiological studies and, more broadly, health sciences research.
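The mutual information ranking mentioned above can be illustrated with a small sketch. It uses the plug-in estimate of mutual information between two discrete variables, computed from observed frequencies, and sorts candidate variables by their score against the outcome. The poster does not describe how Zye computes or normalizes mutual information, so this estimator, the function names, and the toy data are all assumptions.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of mutual information, in bits, from paired samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_by_mutual_information(
    variables: Dict[str, Sequence[Hashable]], outcome: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(values, outcome)
        for name, values in variables.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: diagnosis_a tracks the outcome exactly, diagnosis_b does not at all.
outcome = ["died", "died", "recovered", "recovered"]
variables = {
    "diagnosis_b": [1, 0, 1, 0],
    "diagnosis_a": [1, 1, 0, 0],
}
for name, score in rank_by_mutual_information(variables, outcome):
    print(f"{name}: {score:.3f} bits")
```

Unlike a linear correlation coefficient, mutual information does not depend on how the categories are coded, which is one reason it suits a mix of discrete clinical variables.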
Lassa fever is a viral hemorrhagic fever endemic in West African countries including Nigeria, Sierra Leone, and Guinea. The Lassa virus is a negative-strand RNA virus carried by the multimammate rat, a small African rodent that serves as the virus's main reservoir. An estimated 100,000 to 300,000 people are infected by the Lassa virus every year in West Africa [1]. The Irrua Specialist Teaching Hospital (ISTH) in Nigeria has on-site diagnostics and treatment for Lassa fever patients. Clinical data from Lassa fever patients from January 2011 until June 2013 are used in this case study.
1. Asogun DA, Adomeh DI, Ehimuan J, Odia I, Hass M, et al. (2012) Molecular Diagnostics for Lassa Fever at Irrua Specialist Teaching Hospital, Nigeria: Lessons Learnt from Two Years of Laboratory Operation. PLoS Negl Trop Dis 6(9): e1839.
2. Centers for Disease Control and Prevention (2013) Lassa Fever. Available: http://www.cdc.gov/ncidod/dvrd/spb/mnpages/dispages/lassaf.htm. Accessed August 8, 2013.
3. Khan SH, Goba A, Chu M, Roth C, et al. (2008) New opportunities for field research on the pathogenesis and treatment of Lassa fever. Antiviral Res 78: 103–115.
4. McCormick JB, King IJ, Webb PA, Johnson KM, et al. (1987) A case-control study of the clinical diagnosis and course of Lassa fever. J Infect Dis 155: 445–455.
5. McCormick JB, Webb PA, Krebs JW, Johnson KM, Smith ES (1987) A prospective study of the epidemiology and ecology of Lassa fever. J Infect Dis 155: 437–444.
[Panel titles: Training Data Set; Zye Interface (callout: assign P-value)]
Recovered Died
Sensitivity and Specificity
Figure 5. Creatinine and lactate dehydrogenase lab results of patients diagnosed with three or more of the following: SCNS, ENC, ARF, and HYPO. (FR: first lab result of the patient; LR: last available lab result of the patient.)
[Figure 5 panels, each comparing Died and Recovered patients. Panel labels: LR - Creatinine (mg/dl); FR - Lactate Dehydrogenase (IU/L); LR - Lactate Dehydrogenase (IU/L). Axis ticks: 0.30, 4.00, 7.70, 11.40; 152.0, 1,444.0, 2,736.0, 4,028.0; 133.0, 1,429.8, 2,726.0, 4,023.3]
Conclusions
Acknowledgements
References
Recovered Died
Sensitivity 88.9% 66.7%
Specificity 88.9% 66.7%