53
Introduction Data Preparation LVS Working with Data Visual Analytics Inferential Analysis References Interactive Visual Data Analysis Part One Language Variation Suite Olga Scrivner Indiana University Workshop in Methods 1 / 44

Workshop on Quantitative Analytics Using Interactive On-line Tool

Embed Size (px)

Citation preview

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Interactive Visual Data AnalysisPart One

Language Variation Suite

Olga Scrivner

Indiana University

Workshop in Methods1 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Outline

1 Introduce a web application for quantitativeanalysis: LVS

2 Develop practical skills

3 Understand and interpret advanced statistical modelsPositionSentence

p < 0.001

1

ind, pre post

Heavinessp = 0.003

2

≤ 1 > 1

Periodp < 0.001

3

≤ 1 > 1

Node 4 (n = 81)

VO

OV

00.20.40.60.81

Node 5 (n = 119)

VO

OV

00.20.40.60.81

Node 6 (n = 181)

VO

OV

00.20.40.60.81

Periodp < 0.001

7

≤ 2 > 2

Node 8 (n = 221)

VO

OV

00.20.40.60.81

Focusp < 0.001

9

cf nf

Node 10 (n = 66)

VO

OV

00.20.40.60.81

Main_Verb_Structurep < 0.001

11

ACIOther, Restructuring

Node 12 (n = 43)

VO

OV

00.20.40.60.81

Node 13 (n = 265)

VO

OV

00.20.40.60.81

2 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

What is LVS?

Language Variation Suite

It is a Shiny web application originally designed for dataanalysis in sociolinguistic research.

It can be used for:

Processing spreadsheet data

Reporting in tables and graphs

Analyzing means, regression, conditional trees ...(and much more)

3 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Background

LVS is built in R using Shiny package:

1 R - a free programming language for statistical computingand graphics

2 Shiny App - a web application framework for R

Computational power of R + Web interactivity

4 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Background

http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/

5 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Workspace

Browser

Chrome, Firefox, Safari - recommendable

Explorer may cause instability issues

Accessibility

PC, Mac, Linux

Data files will be uploaded from any location on yourcomputer

Smart Phone

Data files must be on a cloud platform connected to yourphone account (e.g. dropbox)

6 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Server

Since LVS is hosted on a server, Shiny idle time-out settingsmay stop application when it is left inactive (it will grey out).

Solution: Click reload and re-upload your csv file

7 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Data Preparation

Important things to consider before data entry:

File format:

Comma separated value (CSV) - faster processing

Excel format will slow processing

Column names should not contain spaces

Permitted: non-accented characters, numbers, underscore,hyphen, and period

One column must contain your dependent variable

The rest of the columns contain independent variables

8 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; male - female

b. Continuous - numerical data

duration, age, year

c. Multinomial - non-numerical data with three or morevalues

regions, nationalities

d. Ordinal - scale: currently not supported

9 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; male - female

b. Continuous - numerical data

duration, age, year

c. Multinomial - non-numerical data with three or morevalues

regions, nationalities

d. Ordinal - scale: currently not supported

9 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Workshop Files

http://ssrc.indiana.edu/seminars/wim.shtml

1 movie metadata.csvSimplified set from https://www.kaggle.com/deepmatrix/

imdb-5000-movie-dataset

2 LVS web site: https://languagevariationsuite.com

10 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Movie Data

Budget

Director

Actor 1

Director facebook likes

Actor 1 facebook likes

Genre

Year

11 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential Statistics

Modeling, regression, conditional trees, random forest

12 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential Statistics

Modeling, regression, conditional trees, random forest

12 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Upload File

Upload movie metadata.csv

13 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Uploaded Dataset

The data content is imported as a table and allows for sortingcolumns.

14 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Summary

Summary provides a quantitative summary for each variable,e.g. frequency count, mean, median.

15 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Data Structure

1 Total number of observations (rows)

2 Number of variables (columns)

3 Variable types

Factor - categorical values

Num - numeric values (0.95, 1.05)

Int - integer values (1, 2, 3)16 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cross Tabulation

Cross-tabulation examines the relationship between variables.

17 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cross Tabulation Plot

18 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cross Tabulation Plot

18 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Adjusting Browser - Layout

Shiny pages are fluid and reactive.

19 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Adjusting Browser - Layout

20 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

21 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Visual Analytics

Visual Analytics: “The science of analytical reasoningfacilitated by visual interactive interfaces”

(Thomas et al. 2005)

22 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

One Variable Plot

23 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

One Variable Plot

23 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Customizing Plot

24 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Customizing Plot

24 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Saving Plot

25 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cluster Plot

Classification of data into sub-groups is based onpairwise similarities

Groups are clustered in the form of a tree-likedendrogram

26 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cluster Plot

27 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Cluster Plot

Group 1 Animation, Biography and Group 2 Action, Drama,Comedy

28 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Inferential Statistics

29 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Language Variation Suite - Structure

1 Data

Upload file, data summary, adjust data, cross tabulation

2 Visual Analysis

Plotting, cluster classification

3 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

30 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

How to Create a Regression Model

Step 1 Modeling - create a model with dependent andindependent variables

Step 2 Regression - specify the type of regression (fixed,mixed) and type of dependent variable (binary,continuous, multinomial)

Step 3 Stepwise Regression - compare models(Log-likelihood, AIC, BIC)

Step 4 Conditional Trees - apply non-parametric teststo the model

31 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Modeling

32 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Regression Types

Model

a.) Fixed effect

b.) Mixed effect - individual speaker/token variation (withingroup)

Type of Dependent Variable

a.) Binary/categorical (only two values)

b.) Continuous (numeric)

c.) Multinomial - categorical with more than two values

33 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Regression

34 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Model Output

35 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Model Output

35 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Interpretation: Budget and Genre

Genre Action is the reference value

Positive coefficient - positive effect

Negative coefficient - negative effect

http://www.free-online-calculator-use.com/scientific-notation-converter.html

36 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Interpretation: Budget and Genre

Genre Action is the reference value

Positive coefficient - positive effect

Negative coefficient - negative effect

http://www.free-online-calculator-use.com/scientific-notation-converter.html

36 / 44

exponential notation:

1.46e-7

0.000000146

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

Conditional tree: a simple non-parametric regression analysis,commonly used in social and psychological studies

Linear regression: all information is combined linearly

Conditional tree regression: visual splitting to captureinteraction between variables

Recursive splitting (tree branches)37 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

38 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

38 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Conditional Tree

1 Genre is the significant factor for budget

2 Budget distribution is split in two groups:

Action and Animation

Biography, Comedy and Drama

3 Budget is significantly higher for Animation and Action

39 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

1 Variable importance for predictors

2 Robust technique with small n large p data

3 All predictors considered jointly (allows for inclusion ofcorrelated factors)

40 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

Let’s add more factors!

Return to Modeling

Add independent factors: director facebook likes, actor1 facebook likes, title year

41 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

Genre is the most important predictor for this model.

Close to zero or red-dotted line (cut off values) - irrelevantfor this model

42 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Random Forest

Genre is the most important predictor for this model.

Close to zero or red-dotted line (cut off values) - irrelevantfor this model

42 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

Let’s Have a Short Break

43 / 44

Introduction

DataPreparation

LVS

Working withData

VisualAnalytics

InferentialAnalysis

References

References I

[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge:Cambridge University Press

[2] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber RandiReppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: CambridgeUniversity Press

[3] Schnapp, Jeffrey, and Peter Presner. 2009. Digital Humanities Manifesto 2.0.

[4] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif

[5] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif

[6] http://www.martijnwieling.nl/R/sheets.pdf

44 / 44