Dat: version and share your data

Karissa McKelvey
Software Developer and Project Manager and Science Evangelist and Designer (I wear a lot of hats)
U.S. Open Data
@karissamck

Dat 5 minute Lightning Talk

karissa $ ~

dat is a nonprofit

Reproducible Research

“A rule of thumb … is that half of published research cannot be replicated”

How do we replicate research today?

How do we collaborate on research today?

How do we collaborate on data analysis today?

How do we collaborate today?

????????

How do we replicate research today?

me@home $ dat push
me@campus $ dat pull
you@work $ dat clone

dat workflow

• import
• version
• publish
• replicate

[Diagram: you, with local files (.csv, .csv, data) to import]

import

$ dat init
$ dat add dataset cities
$ dat add rows cities cities.csv
$ dat add files cities city_model.gz

[Diagram: you → import (.csv, .csv, data) → http://my-data.bids.edu]

$ dat listen

[Diagram: you → import (.csv, .csv, data) → publish at http://my-data.bids.edu]

$ dat clone

[Diagram: the data published at http://my-data.bids.edu is cloned into a second copy (.csv, .csv, data)]

Versioning

$ dat add files cities us_cities_viz.png
This will override us_cities_viz.png at c2342. OK?

$ dat cities add rows updated_data.csv
This will update 3,434,245 rows. OK?

$ dat push
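
The confirmation prompts above suggest that incoming data is compared with what is already stored before a new version is written. The following is a minimal Python sketch of that idea, not dat's actual implementation: `VersionedRows`, `preview`, and `commit` are hypothetical names, rows are keyed by an assumed `id` column, every version is kept as a full snapshot, and the sample rows are made up for illustration.

```python
import csv
import io


class VersionedRows:
    """Toy in-memory row store that keeps every committed version.

    Hypothetical illustration only; this is not dat's API or storage format.
    """

    def __init__(self, key="id"):
        self.key = key            # assumed primary-key column
        self.versions = [{}]      # version 0 is the empty dataset

    @property
    def head(self):
        """The most recently committed version."""
        return self.versions[-1]

    def _parse(self, csv_text):
        reader = csv.DictReader(io.StringIO(csv_text))
        return {row[self.key]: row for row in reader}

    def preview(self, csv_text):
        """Count rows that would be added or updated, without changing anything."""
        incoming = self._parse(csv_text)
        added = sum(1 for k in incoming if k not in self.head)
        updated = sum(1 for k, row in incoming.items()
                      if k in self.head and self.head[k] != row)
        return added, updated

    def commit(self, csv_text):
        """Apply the rows as a new version and return its number."""
        new = dict(self.head)     # earlier versions stay untouched
        new.update(self._parse(csv_text))
        self.versions.append(new)
        return len(self.versions) - 1


# Illustrative, made-up rows.
store = VersionedRows()
store.commit("id,city,population\n1,Oakland,400000\n2,Berkeley,110000\n")

update = "id,city,population\n2,Berkeley,120000\n3,Albany,19000\n"
added, updated = store.preview(update)
print(f"This will add {added} and update {updated} rows. OK?")
version = store.commit(update)
```

Keeping full snapshots makes old versions trivial to read back (`store.versions[1]`), at the cost of memory; a real tool would more likely store only the changes.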

[Diagram: the data published at http://my-data.bids.edu (.csv, .csv, data) also appears at http://my-data.indiana.edu]

INTEROPERABILITY in Python and R

ECOSYSTEM

[Diagrams: .csv and .png data files alongside .R and .py scripts]

Datscript

• Goal: manipulate datasets with scripting
• Supported keywords: run, pipe, map, reduce, fork, keyword
• Bash-like
• Platform-independent
• Uses node.js streams (fast!)

Datscript: pipeline example

Top: Datscript “pipe” command
Bottom: Equivalent command in bash

Datscript: example commands

background - executes command, but doesn’t wait for it to finish
map - pipes first argument into rest of arguments
run - a serial command (executes and finishes command)
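
The difference between a serial command, a background command, and a pipeline can be sketched in Python with the standard `subprocess` module. This is only an analogy for the behaviour described above, not Datscript itself: the functions `run`, `background`, and `pipe` and the helper `py` are hypothetical, and `pipe` here buffers each stage's output instead of streaming it.

```python
import subprocess
import sys


def py(code):
    """Build a command that runs a Python one-liner (keeps the demo platform-independent)."""
    return [sys.executable, "-c", code]


def run(cmd):
    """Serial: start the command and block until it has finished."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def background(cmd):
    """Start the command and return immediately, without waiting for it."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)


def pipe(first, *rest):
    """Feed the output of the first command through each remaining command in turn."""
    data = run(first)
    for cmd in rest:
        data = subprocess.run(cmd, input=data, capture_output=True,
                              text=True, check=True).stdout
    return data


produce = py("print('oakland'); print('berkeley')")
upper = py("import sys; sys.stdout.write(sys.stdin.read().upper())")

serial_out = run(produce)                 # returns only after the command exits
job = background(py("import time; time.sleep(0.2); print('done')"))
piped_out = pipe(produce, upper)          # runs while the background job sleeps
job_out, _ = job.communicate()            # collect the background job when needed
```

In shell terms, `pipe(produce, upper)` plays the role of `produce | upper`, and `background(...)` the role of a trailing `&`.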

Karissa McKelvey - @karissamck

Melanie Cebula - @melaniecebula

http://dat-data.com
