20
@JennyBryan @jennybc learn to love the data frame @STAT545 http://stat545.com

learn to love the data frame

  • Upload
    dinhbao

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: learn to love the data frame

@JennyBryan@jennybc

learn to love the data frame

@STAT545http://stat545.com

Page 2: learn to love the data frame

decision fatigueaggravationcutting corners

masteryefficiencysafety

😎

😭

Page 3: learn to love the data frame

decision fatigueaggravationcutting corners

masteryefficiencysafetyI want this for you!

😎

😭

Page 4: learn to love the data frame

Next 45 mins:I want to convince you that“when in doubt, stick it in a data frame”willincrease yourmastery, efficiency, safetyand reduce misery.

Page 5: learn to love the data frame

Three tricky bits:- wide range of previous R experience- existing BioC context re: classes- tibble variant of data frame very new

Page 6: learn to love the data frame

do first bit of pirates vs ninjas live coding

Page 7: learn to love the data frame

Simple view

Technically correct R view

mode class typeof

character character character character

logical logical logical logical

numeric numeric integer or numeric

integer or double

factor numeric factor integer

R objects come in a few flavours a simple view of simple R objects that will get you pretty far

Page 8: learn to love the data frame

Simple view

Technically correct R view

mode class typeof

character character character character

logical logical logical logical

numeric numeric integer or numeric

integer or double

factor numeric factor integer

R objects come in a few flavours a simple view of simple R objects that will get you pretty far

Page 9: learn to love the data frame

Simple view

Technically correct R view

mode class typeof

character character character character

logical logical logical logical

numeric numeric integer or numeric

integer or double

factor numeric factor integer

R objects come in a few flavours a simple view of simple R objects that will get you pretty far

Page 10: learn to love the data frame

vector

list

Collecting scalars of the same mode?

yes

no

matrixyes

Collecting vectors of the same length?

no

Collecting vectors of the same mode?data.frame

yesno

Page 11: learn to love the data frame

factor integer numeric

Page 12: learn to love the data frame

😎

If you can put it in a data frame, DO THAT.

Operate on the data frame holistically.

Pass it to other functions, pref. intact and whole.

Learn how to limit computation to specific rows or columns. Don’t create copies or excerpts lightly.

Page 13: learn to love the data frame

minimize the creation of data excerpts and copies ...

... they will just confuse you later

Page 14: learn to love the data frame

back to pirates vs ninjas live codingbut with the tidyverse

Page 15: learn to love the data frame

😎

If you can put it in a data frame, DO THAT.

Develop your preferred workflow for df manipulation and use it maniacally.

Strong recommend: use the “tidyverse” - tibble + dplyr + tidyr - tbl_df or “tibble” is a variant of data.frame

Page 16: learn to love the data frame

Where do data frames come from?

Page 17: learn to love the data frame

Delimited fileread.table() and friendsreadr package

Coercionas.data.frame()as_tibble()

Assembly from vector partsdata.frame()tibble::tibble(), tibble::enframe()

Growing existing objecttransform()dplyr::mutate()

Page 18: learn to love the data frame

Things you need to know about tibbles:

no partial name matching with `$`stringsAsFactors = FALSEdf[ , “X1”] will be a tibble, i.e. drop = FALSEyou can print them with wild abandonno row namesdo not alter with variable nameswill only recycle input of length 1

Page 19: learn to love the data frame

Preview of Friday “what should you do next?”

What if you need to divide the df into its rows or groups of rows and compute on them?

Putting complicated objects into list-columns, temporarily. How and why?

Page 20: learn to love the data frame

decision fatigueaggravationcutting corners

😎😭masteryefficiencysafety