
Data Analytics – Assignment 2, Group 3

Tanasorn Chindasook, Prateek Choudhary, Gari Jose Ciodaro Guerra, Eno Ciraku

Parsing the data (Using R)

# Read the raw horse colic file (whitespace-separated, no header row)
horse_colic_raw <- read.table("~/F2018- Data Analytics - Wilhelm/A2/horse-colic.data",
                              quote = "\"", comment.char = "")
View(horse_colic_raw)  # optional: inspect the raw table

# Keep columns 4, 5, 13 and 23 and give them descriptive names
horse_data <- horse_colic_raw[c(4, 5, 13, 23)]
colnames(horse_data) <- c("rectal_temp", "pulse", "abdominal_distension", "outcome")
View(horse_data)

# The file marks missing values with "?"; recode them as NA
horse_data[horse_data == "?"] <- NA

# Export as tab-separated text for Tableau
write.table(horse_data, "D:/horse_data.txt", sep = "\t")

Then load into Tableau for visualisation
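As a possible variation on the parsing step, the "?" markers can be turned into NA while the file is being read, which keeps the numeric columns numeric. The sketch below uses a small made-up text sample (toy_text and horse_toy are hypothetical names, and the values are invented) in place of the real file. The code-to-label mappings for abdominal distension and outcome are assumptions taken from the dataset's attribute description and should be checked against it before use.

```r
# Made-up sample standing in for the four selected columns (not real records)
toy_text <- "38.5 66 4 2
39.2 88 2 3
? 40 1 1
37.8 ? ? 1"

# na.strings = "?" converts the missing-value marker to NA during the read
horse_toy <- read.table(
  text = toy_text,
  na.strings = "?",
  col.names = c("rectal_temp", "pulse", "abdominal_distension", "outcome")
)

# Assumed coding: 1 = none, 2 = slight, 3 = moderate, 4 = severe
horse_toy$abdominal_distension <- factor(
  horse_toy$abdominal_distension,
  levels = 1:4,
  labels = c("none", "slight", "moderate", "severe"),
  ordered = TRUE
)

# Assumed coding: 1 = lived, 2 = died, 3 = euthanised
horse_toy$outcome <- factor(
  horse_toy$outcome,
  levels = 1:3,
  labels = c("lived", "died", "euthanised")
)

str(horse_toy)
```

Labelling the codes before export would let Tableau show readable category names directly, at the cost of hard-coding the mapping in the script.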

Single Dimension/Measure Visual Encoding

Ordinal (Dimension)

This visualisation uses a column chart to compare the initial abdominal distension of the horse patients. Abdominal distension exhibits a somewhat linear behaviour: the more severe the distension, the less frequently it occurs among the horse patients. We have chosen size as the visual encoding for the ordinal dimension.

Categorical (Dimension)

This pie chart shows that 59.3% of the 300 horses survived their treatment, 25.67% died during treatment, 14.67% were euthanised and 0.33% (1 horse) has no recorded outcome. We can conclude that the majority of horses survive their treatment. We have chosen colour and size as the visual encodings for the categorical dimension.
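The percentages above can be checked with a few lines of arithmetic. The counts in this sketch are not stated in the report; they are inferred from the reported percentages of 300 horses (for example, 25.67% of 300 is about 77), so treat them as an assumption.

```r
# Counts inferred from the reported percentages of 300 horses (assumption)
outcome_counts <- c(survived = 178, died = 77, euthanised = 44, missing = 1)

# The inferred counts should add up to the stated total
stopifnot(sum(outcome_counts) == 300)

# Share of each outcome in percent, rounded to two decimals
outcome_pct <- round(100 * outcome_counts / sum(outcome_counts), 2)
print(outcome_pct)
```

Under these inferred counts the survival share comes out at 59.33%, which agrees with the 59.3% quoted above once rounded to one decimal.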

Continuous Quantitative (Measure)

For this visualisation, we have plotted the observed rectal temperatures individually using a box and whiskers plot. The visualisation suggests that the data are roughly normally distributed over a fairly narrow range. The maximum and minimum values are labelled in the visualisation. We have chosen location as the visual encoding for this continuous quantitative measure and have included observation ID as a dimension to disaggregate the data into individual points.
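The quantities a box and whiskers plot displays can also be computed directly in R, which is one way to cross-check the Tableau chart. This is a minimal sketch on made-up temperatures (temp_toy is a hypothetical vector, not the assignment data), using the default 1.5 × IQR whisker rule of boxplot.stats; Tableau's whisker settings may differ.

```r
# Made-up rectal temperatures with one missing value (not real records)
temp_toy <- c(37.2, 37.8, 38.0, 38.1, 38.3, 38.5, 38.6, 39.0, 40.8, NA)

# boxplot.stats ignores NA values and applies the 1.5 * IQR whisker rule
bp <- boxplot.stats(temp_toy)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)

# Points beyond the whiskers, which a box plot draws individually
print(bp$out)

# Overall minimum and maximum, as labelled on the chart
print(range(temp_toy, na.rm = TRUE))
```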

Discrete Quantitative (Measure)

For this visualisation, we have plotted the discrete pulse values individually using a box and whiskers plot. The visualisation shows that the data is right-skewed, as the top whisker is longer than the bottom whisker. The maximum and minimum values are also labelled in this visualisation. As with the continuous attribute, we have chosen location as the visual encoding for this discrete quantitative measure and have included observation ID as a dimension to disaggregate the data into individual points.
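The whisker comparison used above as evidence of right skew can be expressed numerically. The sketch below runs on made-up pulse values (pulse_toy is hypothetical) and also compares the mean with the median, a second common informal indicator of right skew. Neither check is a formal test.

```r
# Made-up pulse values (not real records)
pulse_toy <- c(40, 44, 48, 52, 60, 66, 72, 88, 120)

# Five box plot statistics: whisker ends, hinges and median
s <- boxplot.stats(pulse_toy)$stats

# Whisker lengths measured from the box edges
lower_whisker <- s[2] - s[1]
upper_whisker <- s[5] - s[4]

# A longer upper whisker and a mean above the median both hint at right skew
right_skew_hint <- upper_whisker > lower_whisker && mean(pulse_toy) > median(pulse_toy)
print(c(lower = lower_whisker, upper = upper_whisker))
print(right_skew_hint)
```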

Pairwise Visualisations

In the following visualisations we have plotted the quantitative attributes against the qualitative attributes. Since the quantitative attributes have to be aggregated, we have chosen to plot the average of each quantitative attribute in order to determine whether any linear relationship occurs. We decided that the most suitable pairwise visual encoding for quantitative variables against an ordinal variable is location against location, as it satisfies the guidelines for both attributes. For quantitative variables against a categorical variable, we have chosen the pairwise visual encoding of location against colour, as this clearly illustrates where each category falls on the quantitative scale. For the pairing of the two qualitative variables, we have chosen size and location for our ordinal variable and colour for our categorical variable. We have chosen the column chart visualisation here because we are comparing the number of records, which involves summation.
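The averaging described above can be reproduced outside Tableau as a sanity check. This is a minimal sketch on a made-up data frame (pair_toy and its values are hypothetical); the formula interface of aggregate drops rows with a missing pulse before computing each group mean.

```r
# Made-up pulse and outcome pairs, including one missing pulse (not real records)
pair_toy <- data.frame(
  pulse   = c(48, 52, NA, 90, 100, 84, 96),
  outcome = c("lived", "lived", "lived", "died", "died", "euthanised", "euthanised")
)

# Mean pulse per outcome; rows with NA pulse are omitted by default
avg_pulse <- aggregate(pulse ~ outcome, data = pair_toy, FUN = mean)
print(avg_pulse)
```

The same call with abdominal distension on the right-hand side of the formula would give the averages used for the ordinal pairings.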

Average Pulse by Outcome

From this visualisation, we can infer that horses with a higher average pulse are more likely to die or be euthanised.

Average Rectal Temperature by Outcome

According to this visualisation, average rectal temperature is similar across all treatment outcomes, as the range of the averages is minuscule. We can conclude that differences in rectal temperature have minimal to no effect on the outcome of the treatment.

Average Rectal Temp. by Abdominal Distension

From this visualisation, we can see that the average rectal temperature rises slightly with increasing abdominal distension.

Average Pulse by Abdominal Distension

This visualisation portrays a linear relationship between average pulse and abdominal distension. We can see that average pulse rises with increasing abdominal distension.

Abdominal Distension by Outcome

This visualisation shows that horse patients with no or only slight abdominal distension are more likely to survive treatment. Euthanasia also seems to be more common among horses with moderate to severe abdominal distension.
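Statements such as "more likely to survive" compare shares within each distension level rather than raw counts. A cross-tabulation with row proportions makes that explicit. The sketch below uses made-up records (cross_toy is hypothetical) purely to show the mechanics.

```r
# Made-up distension and outcome pairs (not real records)
cross_toy <- data.frame(
  distension = factor(
    c("none", "none", "slight", "slight", "moderate", "moderate", "severe", "severe"),
    levels = c("none", "slight", "moderate", "severe"),
    ordered = TRUE
  ),
  outcome = factor(
    c("lived", "lived", "lived", "died", "died", "euthanised", "euthanised", "died"),
    levels = c("lived", "died", "euthanised")
  )
)

# Count of records for each distension level and outcome
counts <- table(cross_toy$distension, cross_toy$outcome)

# Share of each outcome within a distension level (each row sums to 1)
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))
```

Row proportions keep levels with few records comparable to levels with many, which raw column heights in a stacked chart do not.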

Bonus (Interaction):

https://public.tableau.com/profile/prateek.kumar.choudhary#!/vizhome/DA-Asmnt2/Dashboard1

Interactions allowed:
Sorting the data based on Number of Horses - Allows the user to easily see which abdominal condition had the most cases.
Selection of different coloured sections - Allows the user to highlight and view particular outcomes.
Exploration - Users can mouse over a section of the graph to get a popup displaying the exact count of horses for that particular case.
Filtering - The selection checkboxes to the right allow users to customise the graph to show only the selected outcomes and/or ages.