Upload
olga-scrivner
View
55
Download
2
Embed Size (px)
Citation preview
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Interactive Visual Data AnalysisPart One
Language Variation Suite
Olga Scrivner
Indiana University
Workshop in Methods1 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Outline
1 Introduce a web application for quantitativeanalysis: LVS
2 Develop practical skills
3 Understand and interpret advanced statistical modelsPositionSentence
p < 0.001
1
ind, pre post
Heavinessp = 0.003
2
≤ 1 > 1
Periodp < 0.001
3
≤ 1 > 1
Node 4 (n = 81)
VO
OV
00.20.40.60.81
Node 5 (n = 119)
VO
OV
00.20.40.60.81
Node 6 (n = 181)
VO
OV
00.20.40.60.81
Periodp < 0.001
7
≤ 2 > 2
Node 8 (n = 221)
VO
OV
00.20.40.60.81
Focusp < 0.001
9
cf nf
Node 10 (n = 66)
VO
OV
00.20.40.60.81
Main_Verb_Structurep < 0.001
11
ACIOther, Restructuring
Node 12 (n = 43)
VO
OV
00.20.40.60.81
Node 13 (n = 265)
VO
OV
00.20.40.60.81
2 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
What is LVS?
Language Variation Suite
It is a Shiny web application originally designed for dataanalysis in sociolinguistic research.
It can be used for:
Processing spreadsheet data
Reporting in tables and graphs
Analyzing means, regression, conditional trees ...(and much more)
3 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Background
LVS is built in R using Shiny package:
1 R - a free programming language for statistical computingand graphics
2 Shiny App - a web application framework for R
Computational power of R + Web interactivity
4 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Background
http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/
5 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Workspace
Browser
Chrome, Firefox, Safari - recommendable
Explorer may cause instability issues
Accessibility
PC, Mac, Linux
Data files will be uploaded from any location on yourcomputer
Smart Phone
Data files must be on a cloud platform connected to yourphone account (e.g. dropbox)
6 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Server
Since LVS is hosted on a server, Shiny idle time-out settingsmay stop application when it is left inactive (it will grey out).
Solution: Click reload and re-upload your csv file
7 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Data Preparation
Important things to consider before data entry:
File format:
Comma separated value (CSV) - faster processing
Excel format will slow processing
Column names should not contain spaces
Permitted: non-accented characters, numbers, underscore,hyphen, and period
One column must contain your dependent variable
The rest of the columns contain independent variables
8 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Terminology Review
a. Categorical - non-numerical data with two values
yes - no; male - female
b. Continuous - numerical data
duration, age, year
c. Multinomial - non-numerical data with three or morevalues
regions, nationalities
d. Ordinal - scale: currently not supported
9 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Terminology Review
a. Categorical - non-numerical data with two values
yes - no; male - female
b. Continuous - numerical data
duration, age, year
c. Multinomial - non-numerical data with three or morevalues
regions, nationalities
d. Ordinal - scale: currently not supported
9 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Workshop Files
http://ssrc.indiana.edu/seminars/wim.shtml
1 movie metadata.csvSimplified set from https://www.kaggle.com/deepmatrix/
imdb-5000-movie-dataset
2 LVS web site: https://languagevariationsuite.com
10 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Movie Data
Budget
Director
Actor 1
Director facebook likes
Actor 1 facebook likes
Genre
Year
11 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Language Variation Suite - Structure
1 Data
Upload file, data summary, adjust data, cross tabulation
2 Visual Analysis
Plotting, cluster classification
3 Inferential Statistics
Modeling, regression, conditional trees, random forest
12 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Language Variation Suite - Structure
1 Data
Upload file, data summary, adjust data, cross tabulation
2 Visual Analysis
Plotting, cluster classification
3 Inferential Statistics
Modeling, regression, conditional trees, random forest
12 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Upload File
Upload movie metadata.csv
13 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Uploaded Dataset
The data content is imported as a table and allows for sortingcolumns.
14 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Summary
Summary provides a quantitative summary for each variable,e.g. frequency count, mean, median.
15 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Data Structure
1 Total number of observations (rows)
2 Number of variables (columns)
3 Variable types
Factor - categorical values
Num - numeric values (0.95, 1.05)
Int - integer values (1, 2, 3)16 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Cross Tabulation
Cross-tabulation examines the relationship between variables.
17 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Cross Tabulation Plot
18 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Cross Tabulation Plot
18 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Adjusting Browser - Layout
Shiny pages are fluid and reactive.
19 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Adjusting Browser - Layout
20 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Language Variation Suite - Structure
1 Data
Upload file, data summary, adjust data, cross tabulation
2 Visual Analysis
Plotting, cluster classification
3 Inferential statistics
Modeling, regression, varbrul analysis, conditional trees,random forest
21 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Visual Analytics
Visual Analytics: “The science of analytical reasoningfacilitated by visual interactive interfaces”
(Thomas et al. 2005)
22 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
One Variable Plot
23 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
One Variable Plot
23 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Customizing Plot
24 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Customizing Plot
24 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Saving Plot
25 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Cluster Plot
Classification of data into sub-groups is based onpairwise similarities
Groups are clustered in the form of a tree-likedendrogram
26 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Cluster Plot
27 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Cluster Plot
Group 1 Animation, Biography and Group 2 Action, Drama,Comedy
28 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Inferential Statistics
29 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Language Variation Suite - Structure
1 Data
Upload file, data summary, adjust data, cross tabulation
2 Visual Analysis
Plotting, cluster classification
3 Inferential statistics
Modeling, regression, varbrul analysis, conditional trees,random forest
30 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
How to Create a Regression Model
Step 1 Modeling - create a model with dependent andindependent variables
Step 2 Regression - specify the type of regression (fixed,mixed) and type of dependent variable (binary,continuous, multinomial)
Step 3 Stepwise Regression - compare models(Log-likelihood, AIC, BIC)
Step 4 Conditional Trees - apply non-parametric teststo the model
31 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Modeling
32 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Regression Types
Model
a.) Fixed effect
b.) Mixed effect - individual speaker/token variation (withingroup)
Type of Dependent Variable
a.) Binary/categorical (only two values)
b.) Continuous (numeric)
c.) Multinomial - categorical with more than two values
33 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Regression
34 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Model Output
35 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Model Output
35 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Interpretation: Budget and Genre
Genre Action is the reference value
Positive coefficient - positive effect
Negative coefficient - negative effect
http://www.free-online-calculator-use.com/scientific-notation-converter.html
36 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Interpretation: Budget and Genre
Genre Action is the reference value
Positive coefficient - positive effect
Negative coefficient - negative effect
http://www.free-online-calculator-use.com/scientific-notation-converter.html
36 / 44
exponential notation:
1.46e-7
0.000000146
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Conditional Tree
Conditional tree: a simple non-parametric regression analysis,commonly used in social and psychological studies
Linear regression: all information is combined linearly
Conditional tree regression: visual splitting to captureinteraction between variables
Recursive splitting (tree branches)37 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Conditional Tree
38 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Conditional Tree
38 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Conditional Tree
1 Genre is the significant factor for budget
2 Budget distribution is split in two groups:
Action and Animation
Biography, Comedy and Drama
3 Budget is significantly higher for Animation and Action
39 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Random Forest
1 Variable importance for predictors
2 Robust technique with small n large p data
3 All predictors considered jointly (allows for inclusion ofcorrelated factors)
40 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Random Forest
Let’s add more factors!
Return to Modeling
Add independent factors: director facebook likes, actor1 facebook likes, title year
41 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Random Forest
Genre is the most important predictor for this model.
Close to zero or red-dotted line (cut off values) - irrelevantfor this model
42 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Random Forest
Genre is the most important predictor for this model.
Close to zero or red-dotted line (cut off values) - irrelevantfor this model
42 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
Let’s Have a Short Break
43 / 44
Introduction
DataPreparation
LVS
Working withData
VisualAnalytics
InferentialAnalysis
References
References I
[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge:Cambridge University Press
[2] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber RandiReppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: CambridgeUniversity Press
[3] Schnapp, Jeffrey, and Peter Presner. 2009. Digital Humanities Manifesto 2.0.
[4] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif
[5] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif
[6] http://www.martijnwieling.nl/R/sheets.pdf
44 / 44