
Lecy ∙ Data Driven Management LECTURE 00 Course Overview



Page 1: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

Lecy ∙ Data Driven Management

LECTURE 00: Course Overview

Page 2: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

When President Dwight Eisenhower established NASA in 1958, he called on the country's top scientists to bring their talents to the government.

Half a century later, when President Barack Obama was elected into office, he issued a similar call to America's scientists, but this time, there is a different mission at stake. Today's government scientists are tasked with deploying the latest technology to bring the government into the digital era, allowing it to more effectively deliver services to the American people.

Page 3: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

A team of engineers, coders, and developers have answered his call, leaving startups and top technology companies across the country for new posts in Washington, D.C.

When we asked members of the tech corps why they chose to make the switch from the private sector to the public sector, they explained that they saw an opportunity to use their specialized skills to improve people's lives, from making Healthcare.gov as user-friendly as possible to ensuring that veterans receive support as soon as they need it.

Source: http://www.fastcompany.com/3046985/innovation-agents/meet-the-geeks-the-dc-tech-corps-leading-edge

Page 4: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

Data-Driven MANAGEMENT

In Public Organizations

Page 5: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

What is data-driven management?

Page 6: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

Can government play moneyball?

Page 7: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

WHAT IS R?

Page 8: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

R: Two guys in New Zealand who do not know how to program invent a language and give it away for free. It develops a cult following and takes on billion-dollar industry giants like SAS and Stata.

Page 9: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

R IS MANY THINGS

• R is a hybrid of a programming language and a stats package

• R is a platform
  – Operating system (environment) for programs (packages) written by users
  – Data engine
  – Graphing engine

• R is an ecosystem
  – Packages can build on each other, code can be adapted

• R is a community

• R is a response to the commercialization of scientific knowledge at the expense of science

Page 10: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

R IS GOOD AT SOME THINGS

• Rapid development and deployment of programs

• Customized professional graphics

• Open-source paradigm allows you to build on others' work
  – For example, the “fix” command

• Breaking through cost barriers for small companies and students

• There is an amazing variety of packages and datasets (over 7000)
  – http://cran.r-project.org/web/views/

• Documentation is fairly good

Page 11: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

R IS NOT GOOD AT OTHERS

• R is not built for large datasets (although there are now many ways to adapt it to these purposes)

• R is not as fast as compiled programming languages

• Distributed development means that uniform conventions are often not followed concerning function names, arguments, and documentation

• Output is not automatically pretty, so it takes some extra time to format (though there are good packages for this purpose)

Page 12: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

R EMBRACES OBJECT-ORIENTED PROGRAMMING

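# example of plot O-O behavior

x <- 1:100
y <- 2*x + rnorm(100,0,10)
plot( x, y )        # numeric x: scatterplot

x2 <- cut( x, 5 )
plot( x2, y )       # factor x: boxplots by group

m.01 <- lm( y ~ x )
plot( m.01 )        # lm object: regression diagnostic plots

# example with variance O-O behavior:

dat <- data.frame( x, y )
var( x )            # vector: a single variance
var( dat )          # data frame: variance-covariance matrix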
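The plot() behavior above comes from R's generic functions: plot() looks at the class of its argument and hands the work to a matching method. A minimal sketch of that mechanism follows. The generic describe() and the class "budget" are made up for illustration and are not part of the course material.

```r
# Inspect the classes that drive the different plot() results
x  <- 1:100
x2 <- cut( x, 5 )
class( x )     # "integer"
class( x2 )    # "factor"

# A hypothetical generic, describe(), with two methods
describe <- function( obj, ... ) UseMethod( "describe" )

describe.default <- function( obj, ... ) {
   paste( "An object of class", class( obj )[1] )
}

describe.budget <- function( obj, ... ) {
   paste( "A budget totaling", sum( obj$amount ) )
}

# "budget" is a made-up class used only for this example
b <- structure( list( amount = c( 100, 250, 50 ) ), class = "budget" )

msg.default <- describe( x )   # no describe.integer exists, so the default method runs
msg.budget  <- describe( b )   # the class is "budget", so describe.budget runs
```

The same function name does different work depending on what it is given, which is the sense in which these slides call R object-oriented.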

Page 13: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

WHY R?

Page 14: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

Statistics

Network Analysis

Machine Learning

Text Analysis

GIS

Dynamic Reports

Page 15: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

http://r4stats.com/articles/popularity/

R IS GROWING

Page 16: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

API

Shiny

Page 17: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

COURSE OVERVIEW

Page 18: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

COURSE OBJECTIVES

• Expose you to new and interesting developments in the data programming world.

• Ability to use R Studio, read R documentation, and write R scripts.

• Ability to write technical notes and report results using R Markdown docs.

• Familiarity with R conventions and the object-oriented framework.

• Understanding of core data structures of R.

• Understanding of core data programming operations.

• Comfort with the R graphics engine.

• Work with raw data using text functions.

• Understanding of programming fundamentals.

• Create a data dashboard using R Shiny.

• Collaborate in teams using GitHub.

Page 19: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

COURSE OBJECTIVES

• How much can I learn in a semester?

• What does this course prepare me for?

• What to do after taking this course?

https://www.coursera.org/course/rprog

Page 20: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

COURSE SCHEDULE:

Weeks 1-5: Core Data Operations
• 1 – Intro
• 2 – Data Structures
• 3 – Merge Data
• 4 – Descriptive Statistics
• 5 – Data Input

Weeks 6-9: Visualization
• 6 – Principles of Visualization
• 7 – Core Graphics
• 8 – Advanced Graphics
• 9 – Maps and GIS

Weeks 10-12: Programming and Text
• 10 – Basic Programming
• 11 – Text Analysis
• 12 – Text Analysis
• 13 – Thanksgiving Break

Weeks 14-15: Building a Dashboard in Shiny
• 14 – Intro to Shiny & GitHub
• 15 – More Shiny

Page 21: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

REQUIRED TEXTS

• R Cookbook

• The Art of R Programming

Page 22: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

BLACKBOARD

• Please contact me at [email protected] (not through Blackboard’s messaging)

• All assignments submitted via Blackboard

Page 23: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

ASSIGNMENTS AND GRADES

Page 24: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

COURSE ORGANIZATION

Labs (10 total): 50%
Quizzes (3 total): 15%
Case Studies (13 total): 15%
Final Project: 20%

Page 25: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

LABS

• Meant to be practice
• Graded pass / fail
• Due each Tuesday before class
• Office hours Mondays 2-3pm
• Team work allowed / encouraged
• Turn in your own code!
• Only submit PDF or webpage complete files (no HTML or RMD)

Page 26: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

QUIZZES

• Opportunities to consolidate knowledge
• In-class, written

Page 27: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

CASE STUDY SUMMARIES:

• Each week there will be a case study of performance measurement or performance management.

• Submit a 1-2 page summary of important lessons from the case study.

Page 28: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

FINAL PROJECTS:

Create a Data Dashboard

• Teams of 3-5 students
• Create a realistic scenario for an organization
• Develop 1-3 key performance indicators
• Implement a data collection / input process
• Write a program to analyze and visualize the data
• Create a Shiny app to share the reports

• All of your code will be managed in GitHub
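As a rough illustration of the "analyze and visualize" step, here is a minimal base R sketch. The service-request data, the column names, the 5-day threshold, and the 80 percent target are all hypothetical; a real project would substitute its own indicator and data.

```r
# Hypothetical data: one row per service request
set.seed( 123 )
requests <- data.frame(
   month     = rep( month.abb[1:6], each = 20 ),
   days.open = rpois( 120, lambda = 4 )
)

# Keep months in calendar order rather than alphabetical order
requests$month <- factor( requests$month, levels = month.abb[1:6] )

# Assumed KPI: share of requests closed within 5 days, by month
requests$on.time <- requests$days.open <= 5
kpi <- tapply( requests$on.time, requests$month, mean )

# Plot the KPI with a dashed line at the assumed 80 percent target
barplot( kpi, ylim = c( 0, 1 ),
         ylab = "Share closed within 5 days",
         main = "On-time resolution rate" )
abline( h = 0.8, lty = 2 )
```

A plot like this is the kind of output a Shiny app would later wrap in an interactive page.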

Page 29: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

FOR THURSDAY

• Install R and R Studio
• Create an R Markdown document with the following information:
  – Your name
  – Your department and degree
  – What you hope to take from the class
  – File → New File → R Markdown Document
  – http://www.rstudio.com/ide/docs/authoring/using_markdown

Knit to HTML, save to PDF:

• First save the file as a .Rmd file.
• Press the “knit to HTML” command.
• You have now created an HTML file. Open it in a browser and print to PDF or save as a webpage complete file.
• You will turn in the PDF or webpage complete files for homework assignments. I do NOT want the .Rmd or raw .html files.
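For reference, the same steps can be sketched in code. The file name, title, and placeholder answers below are assumptions. The render() call does roughly what the knit button does and runs only if the rmarkdown package and pandoc are available.

```r
# Three backticks, built with paste() so this example nests cleanly
fence <- paste( rep( "`", 3 ), collapse = "" )

# Contents of a minimal R Markdown file (all answers are placeholders)
rmd.lines <- c(
   "---",
   "title: \"Introduction\"",
   "output: html_document",
   "---",
   "",
   "**Name:** Your name here",
   "",
   "**Department and degree:** Your program here",
   "",
   "**What I hope to take from the class:** A sentence or two.",
   "",
   paste0( fence, "{r}" ),
   "summary( cars )   # a code chunk that runs when the file is knit",
   fence
)

# Save the text as a .Rmd file in a temporary folder
rmd.file <- file.path( tempdir(), "intro.Rmd" )
writeLines( rmd.lines, rmd.file )

# Knit to HTML from code, skipping the step if the tools are missing
if( requireNamespace( "rmarkdown", quietly = TRUE ) &&
    rmarkdown::pandoc_available() ) {
   rmarkdown::render( rmd.file, output_format = "html_document", quiet = TRUE )
}
```

The resulting HTML file can then be opened in a browser and printed to PDF as described above.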

Page 30: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

REQUIRED SOFTWARE

Page 31: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

WE WILL BE USING

• The latest version of R (3.2.2 or higher)
• R Studio development environment
• GitHub (as much as we can)
• R Shiny web toolkit

• Various packages throughout the semester
  – The Lahman Package for the first few weeks

• The textbooks are required and will be used extensively
  – The R Cookbook
  – The Art of R Programming

Page 32: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

github

“Software engineers will pay monthly fees for the rest of their lives in order to create free software out of other free software!”

Some examples:

A short tutorial for using the ‘twitteR’ package:

https://sites.google.com/site/miningtwitter/questions/talking-about

https://github.com/gastonstat/Mining_Twitter

Hadley Wickham (chief scientist at RStudio and creator of ggplot2):

https://github.com/hadley

Page 33: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

VERSION CONTROL 101

Page 34: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

This code was added

This code was deleted

Page 35: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

SUPPORTS CONCURRENT DEVELOPMENT

Page 36: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

GRAPHICS

Page 37: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

Two population density measures compared. Migration patterns of birds.

Page 38: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

OBJECTIVES

• Reflect on good visualization practices

• Understand ground, figure, and narrative on charts

• Learn the core functions of the graphics suite

• Learn how to customize graphs and create high quality images

• Touch on some nice mapping packages

Page 39: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

WRITING CLEAR CODE

Page 40: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

Donaudampfschiffahrtsgesellschaftskapitän

“Danube steamship company captain”

summary(lm(dat$crime[20:50]~bin(dat[20:50,"pop"],10)))

VS.

y.sub <- dat[ 20:50, "crime" ]
x.sub <- dat[ 20:50, "pop" ]
x.bin <- bin( x.sub, 10 )
lm.01 <- lm( y.sub ~ x.bin )
summary( lm.01 )
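The step-by-step version can be run as written once a dataset and a binning function exist. The sketch below makes two assumptions: the data frame is simulated with made-up values, and base R's cut() stands in for bin(), which is not a base R function.

```r
# Simulated data standing in for the slide's 'dat' (values are made up)
set.seed( 42 )
dat <- data.frame( pop = runif( 60, 1000, 50000 ) )
dat$crime <- 0.01 * dat$pop + rnorm( 60, 0, 50 )

# Each intermediate result gets its own name, so it can be inspected
y.sub <- dat[ 20:50, "crime" ]
x.sub <- dat[ 20:50, "pop" ]
x.bin <- cut( x.sub, 10 )       # factor with 10 equal-width intervals
lm.01 <- lm( y.sub ~ x.bin )
summary( lm.01 )
```

Naming each step also makes debugging easier: table( x.bin ) shows how many observations fall in each interval before the model is fit.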

THE R STYLE GUIDE

Page 41: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

THE ‘LAHMAN’ PACKAGE

Page 42: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

THE ART OF CREATING GRAPHICS:

http://chartsnthings.tumblr.com/post/22471358872/sketches-how-mariano-rivera-compares-to-baseballs

Page 43: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

FROM THE NYT BLOG, CHARTSNTHINGS

http://chartsnthings.tumblr.com/post/47670081904/climate-change-crowbars-and-strikeouts

Page 44: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

MISCELLANEOUS ANALYSIS

Page 45: Lecy ∙ Data Driven Management LECTURE 00 Course Overview
Page 46: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

WHAT IS object-oriented?

Page 47: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

R EMBRACES OBJECT-ORIENTED PROGRAMMING

# A function to make cookies:

  make.cookies <- function( flour, eggs, sugar )  {

     # these steps give the operations
     batter <- mix( flour, eggs, sugar )
     baked.goods <- bake( batter, temp=450 )
     return( baked.goods )

  }

# Each step of the recipe is a separate
# function.  Here "mix" and "bake" are
# defined elsewhere as "mix.R" and "bake.R".

Page 48: Lecy ∙ Data Driven Management LECTURE 00 Course Overview

# When you want to call the function you give
# specific instances of the inputs

  cookies.01 <- make.cookies( flour.01, eggs.01, sugar.01 )

# Because R is object-oriented, you not only need
# to call the function but you need to give a name
# to the final product.  A new data object is created
# after each function is performed.
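To run the cookie example end to end, mix() and bake() need definitions. The versions below are toy stand-ins invented for this sketch (the slides only say they are defined elsewhere), and the ingredient amounts are arbitrary.

```r
# Toy stand-ins for mix() and bake(), invented for illustration
mix <- function( flour, eggs, sugar ) {
   c( flour = flour, eggs = eggs, sugar = sugar )
}

bake <- function( batter, temp ) {
   list( ingredients = batter, temp = temp, status = "baked" )
}

make.cookies <- function( flour, eggs, sugar ) {
   batter      <- mix( flour, eggs, sugar )
   baked.goods <- bake( batter, temp = 450 )
   return( baked.goods )
}

# Without assignment the result is only printed, then lost
make.cookies( flour = 2, eggs = 1, sugar = 0.5 )

# With assignment the result is stored as a new named object
cookies.01 <- make.cookies( flour = 2, eggs = 1, sugar = 0.5 )
cookies.01$status
```

The inputs are left unchanged by the call; the result exists only because it was assigned to cookies.01.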

R EMBRACES OBJECT-ORIENTED PROGRAMMING
