Data Analytics – Assignment 2, Group 3
Tanasorn Chindasook, Prateek Choudhary, Gari Jose Ciodaro Guerra, Eno Ciraku
Parsing the data (Using R)
# Read the raw horse colic file (whitespace-separated, no header row).
horse_colic_raw <- read.table("~/F2018- Data Analytics - Wilhelm/A2/horse-colic.data", quote="\"",
comment.char="")
View(horse_colic_raw)
# Keep columns 4, 5, 13 and 23 and give them descriptive names.
horse_data <- horse_colic_raw[c(4,5,13,23)]
View(horse_data)
colnames(horse_data) <- c("rectal_temp", "pulse", "abdominal_distension", "outcome")
# The file marks missing values with "?"; recode them as NA.
horse_data[horse_data=="?"] <- NA
# Export as tab-separated text (row names are written as the first column).
write.table(horse_data,"D:/horse_data.txt",sep="\t")
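With the approach above, any column that contains "?" is read as text rather than as numbers, and it stays text after the "?" values are replaced with NA. A possible alternative, sketched here under the assumption that "?" is the only missing-value marker, is to pass na.strings to read.table so the columns are parsed as numeric directly. The rows in sample_text are placeholders for illustration, not records from the data set.

```r
# Toy rows in the same layout: whitespace-separated, "?" for missing values.
# These values are placeholders for illustration, not records from the data set.
sample_text <- "38.5 66 4 2
39.2 88 2 3
? 104 ? 1"

# na.strings = "?" converts the marker to NA while reading,
# so the affected columns keep a numeric type.
horse_sample <- read.table(text = sample_text, na.strings = "?",
                           col.names = c("rectal_temp", "pulse",
                                         "abdominal_distension", "outcome"))

# Inspect the column types and the NA entries.
str(horse_sample)
```

The same na.strings argument can be added to the read.table call on the full file; the rest of the script would then no longer need the separate "?" replacement step.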
The exported file is then loaded into Tableau for visualisation.
Single Dimension/Measure Visual Encoding
Ordinal (Dimension)
This visualisation uses a column chart to compare the initial abdominal distension of the horse
patients. The number of horses falls roughly linearly as the severity of abdominal distension
increases, so the more severe the distension, the less often it occurs among the horse patients.
We have chosen size as the visual encoding for the ordinal dimension.
Categorical (Dimension)
This visualisation shows that 59.33% of the 300 horses survived their treatment, 25.67% died
during treatment, 14.67% were euthanised and 0.33% (1 horse) is missing its outcome record. We can
conclude from this pie chart that the majority of horses survive their treatment. We have chosen
colour and size as the visual encodings for the categorical dimension.
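The percentages in the pie chart can be cross-checked in R. The sketch below assumes counts of 178, 77, 44 and 1, which are back-calculated from the reported percentages of 300 horses rather than read from the data file, and the outcome labels are our own wording.

```r
# Counts back-calculated from the reported percentages (assumption),
# with one missing outcome record represented as NA.
outcome <- factor(
  rep(c("survived", "died", "euthanised", NA), times = c(178, 77, 44, 1)),
  levels = c("survived", "died", "euthanised")
)

# Tabulate the outcomes, keeping the missing record as its own category.
counts <- table(outcome, useNA = "ifany")

# Convert the counts to percentages of all 300 horses.
pct <- round(100 * prop.table(counts), 2)
print(pct)
```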
Continuous Quantitative (Measure)
For this visualisation, we have plotted the observed rectal temperatures individually using a box and
whiskers plot. The visualisation conveys that the data is approximately normally distributed with a
fairly narrow range. The maximum and minimum values are labelled in the visualisation. We have chosen
location as the visual encoding for this continuous quantitative measure and have included observation
ID as a dimension to disaggregate the data into individual points.
Discrete Quantitative (Measure)
For this visualisation, we have plotted the discrete pulse values individually using a box and whiskers
plot. The visualisation shows that the data is right skewed as the top whisker is longer than the bottom
whisker. The maximum and minimum values are also labelled in this visualisation. Similar to our
previous approach for the continuous attribute, we have chosen location as the visual encoding for this
discrete quantitative measure and have included observation ID as a dimension to disaggregate the data
into individual points.
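The whisker comparison used to judge skewness can also be made numerically. The following is a minimal sketch using boxplot.stats from base R; the pulse values are made up for illustration and are not taken from the data set.

```r
# Made-up pulse values for illustration only.
pulse <- c(40, 44, 48, 52, 60, 64, 72, 88, 100, 120)

# boxplot.stats returns: lower whisker, lower hinge, median,
# upper hinge, upper whisker (in that order).
s <- boxplot.stats(pulse)$stats

# Whisker lengths measured from the box edges.
lower_whisker <- s[2] - s[1]
upper_whisker <- s[5] - s[4]

# A longer upper whisker points to a right-skewed distribution.
print(c(lower = lower_whisker, upper = upper_whisker))
```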
Pairwise Visualisations
In the following visualisations we have plotted the quantitative attributes against the qualitative
attributes. Since the quantitative attributes have to be aggregated, we have chosen to plot the average
of each quantitative attribute in order to determine whether any linear relationship occurs. We decided
that the most suitable pairwise visual encoding for quantitative variables against an ordinal variable is
location against location as it satisfies the guidelines for both attributes. For quantitative variables
against a categorical variable, we have chosen the pairwise visual encoding of location against colour as
this clearly illustrates where each category falls in the quantitative scale. For our qualitative pairwise
visual encodings we have chosen size and location for our ordinal variable and colour for our categorical
variable. We have chosen the column chart visualisation as we are comparing the number of records
which involves summation.
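The aggregation behind these charts, the average of a quantitative attribute within each level of a qualitative attribute, can be sketched in R with tapply. The data frame horse_toy and its values are hypothetical placeholders, not records from the data set.

```r
# Hypothetical toy data; values are placeholders for illustration.
horse_toy <- data.frame(
  pulse   = c(48, 52, 60, 88, 96, 100, NA),
  outcome = c("survived", "survived", "survived",
              "died", "died", "euthanised", "euthanised")
)

# Average pulse per outcome, ignoring missing pulse values.
avg_pulse <- tapply(horse_toy$pulse, horse_toy$outcome, mean, na.rm = TRUE)
print(avg_pulse)
```

Swapping the grouping column for abdominal_distension, or the measure for rectal_temp, would give the other pairwise averages in the same way.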
Average Pulse by Outcome
From this visualisation, we can infer that horses with a higher average pulse are more likely to die or
be euthanised.
Average Rectal Temperature by Outcome
According to this visualisation, average rectal temperature is similar across all outcomes of the
treatment, as the range of the averages is minuscule. This suggests that differences in rectal
temperature have minimal to no effect on the outcome of the treatment.
Average Rectal Temp. by Abdominal Distension
From this visualisation, we can see that the average rectal temperature rises slightly with increasing
abdominal distension.
Average Pulse by Abdominal Distension
This visualisation portrays a linear relationship between average pulse and abdominal distension. We
can see that average pulse rises with increasing abdominal distension.
Abdominal Distension by Outcome
This visualisation shows that horse patients with no or only slight abdominal distension are more
likely to survive treatment. Euthanasia also seems to be more common among horses experiencing
moderate to severe abdominal distension.
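A cross-tabulation of the two qualitative attributes gives the counts behind this chart. The sketch below uses a hypothetical toy data frame (the values are placeholders, not records from the data set) and also computes the share of each outcome within every distension level.

```r
# Hypothetical toy data; values are placeholders for illustration.
horse_toy <- data.frame(
  abdominal_distension = factor(
    c("none", "none", "slight", "moderate", "severe", "severe"),
    levels = c("none", "slight", "moderate", "severe")  # ordinal order
  ),
  outcome = c("survived", "survived", "survived",
              "died", "euthanised", "died")
)

# Number of horses for each distension level and outcome.
counts <- table(horse_toy$abdominal_distension, horse_toy$outcome)

# Share of each outcome within a distension level (each row sums to 1).
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))
```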
Bonus (Interaction):
https://public.tableau.com/profile/prateek.kumar.choudhary#!/vizhome/DA-Asmnt2/Dashboard1
Interactions allowed:
Sorting: the data can be sorted by Number of Horses, which lets the user easily see which abdominal condition had the most cases.
Selection: selecting differently coloured sections lets the user highlight and view particular outcomes.
Exploration: users can mouse over a section of the graph to get a popup displaying the exact count of horses for that particular case.
Filtering: the selection checkboxes to the right let users customise the graph to show only the selected outcomes and/or ages.