Cloud Dataverse Tutorial

Julian Gautier, Sarah Ferry
Massachusetts Open Cloud, The Dataverse Project
Boston University, Harvard University

Dataverse: Background

Open source data repository software

Built to support multiple types of users, data, and workflows

demo.dataverse.org

Graphic by Raman Prasad

Graphic by Raman Prasad

Sharing data

Finding data

Graphic by Merce Crosas

Exploring data

Graphic by Merce Crosas

26 known installations

APIs
● Access to metadata, files and commands through APIs
● TwoRavens and Data Explorer

Migration
● Dataset- and file-level DDI used during the Harvard Dataverse migration from 3.6 to 4.0

Harvesting
● DDI captured when harvesting datasets
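As a rough illustration of the API access listed above, the sketch below only builds request URLs for three Dataverse APIs: the native API's dataset metadata endpoint, the metadata export endpoint with the `ddi` exporter (the same DDI format mentioned for migration and harvesting), and the data access endpoint for a single file. It assumes demo.dataverse.org as the base URL and uses `doi:10.5072/FK2/I8AYQ8` purely as a placeholder identifier (10.5072 is a test DOI prefix); check the API guide for your Dataverse version before relying on these paths.

```python
from urllib.parse import urlencode

# Assumption: the public demo installation; swap in your own Dataverse URL.
BASE_URL = "https://demo.dataverse.org"


def dataset_metadata_url(base_url: str, persistent_id: str) -> str:
    """Native API: a dataset's metadata (JSON), looked up by persistent ID."""
    query = urlencode({"persistentId": persistent_id})
    return f"{base_url}/api/datasets/:persistentId/?{query}"


def ddi_export_url(base_url: str, persistent_id: str) -> str:
    """Metadata export API: the dataset's DDI record."""
    query = urlencode({"exporter": "ddi", "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def datafile_url(base_url: str, file_id: int) -> str:
    """Data access API: download one data file by its numeric file ID."""
    return f"{base_url}/api/access/datafile/{file_id}"


if __name__ == "__main__":
    pid = "doi:10.5072/FK2/I8AYQ8"  # placeholder test DOI
    print(dataset_metadata_url(BASE_URL, pid))
    print(ddi_export_url(BASE_URL, pid))
    print(datafile_url(BASE_URL, 42))  # 42 is a made-up file ID
```

Fetching any of these URLs with an HTTP client (for example `urllib.request.urlopen`) would return JSON, DDI XML, or the file itself, respectively.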

Benefits of an open source community

dataverse.org/contact

Cloud Dataverse: Background

● The Massachusetts Open Cloud (MOC) needed a place that encouraged sharing data publicly

● The MOC and Dataverse users needed a simple way to run computations on datasets

● Storing data files in Swift lets users access their files directly from a compute environment through Sahara

A development instance of Cloud Dataverse is available at http://128.31.24.163:8080/

Dataverse: http://128.31.24.163:8080
Horizon: kaizen.massopen.cloud
GIJI: giji.massopen.cloud

On the dataverse page, datasets whose data files are stored in OpenStack Object Storage (Swift) look the same as regular datasets.

The dataset page contains two additional sections

The cloud storage access box gives direct access to the dataset's Swift container through the Swift API

Container name
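To make "direct access through the Swift API" concrete, here is a minimal sketch that builds (but does not send) the standard Swift request for listing a container's objects: a GET on `<storage URL>/<container>` with `?format=json` and a Keystone token in the `X-Auth-Token` header. The storage URL, the token, and the use of `urllib` are assumptions for illustration; in practice both values come from your OpenStack credentials.

```python
import urllib.request
from urllib.parse import quote


def container_listing_request(storage_url: str, token: str, container: str) -> urllib.request.Request:
    """Build (without sending) a Swift API GET that lists a container's objects as JSON."""
    url = f"{storage_url.rstrip('/')}/{quote(container)}?format=json"
    # Swift authenticates each request with a Keystone token in X-Auth-Token.
    return urllib.request.Request(url, headers={"X-Auth-Token": token}, method="GET")


if __name__ == "__main__":
    # Hypothetical storage URL and token; real values come from Keystone / Horizon.
    req = container_listing_request(
        "https://swift.example.org/v1/AUTH_demo", "TOKEN", "doi_10_5072_FK2_I8AYQ8"
    )
    print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the JSON response is a list of objects with fields such as `name` and `bytes`.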

The files look like regular files in Dataverse, but we can also see them through the Swift endpoint on Horizon

Notice the container name is the same as the container name in the cloud storage access box on Dataverse!
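The container name used in this tutorial, `doi_10_5072_FK2_I8AYQ8`, looks like the dataset's DOI with `:`, `.` and `/` replaced by underscores. The sketch below encodes that mapping; the rule is inferred from this single example and is an assumption, not documented Cloud Dataverse behavior.

```python
import re


def container_name_from_doi(persistent_id: str) -> str:
    """Assumed rule: replace ':', '.' and '/' in the DOI with '_'."""
    return re.sub(r"[:./]", "_", persistent_id)


if __name__ == "__main__":
    print(container_name_from_doi("doi:10.5072/FK2/I8AYQ8"))
```

A deterministic mapping like this would let Dataverse, Horizon, and GIJI agree on the container name without storing it separately.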

The compute button leads to GIJI, where we can access the MOC’s computing environment

Cloud Dataverse lets you access your Swift-stored data files through Sahara

Notice the container name is passed into GIJI through the URL
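Passing the container name through the URL amounts to a query-string round trip. The sketch below shows that pattern; only the host giji.massopen.cloud comes from this tutorial, while the scheme, path, and the `container` parameter name are hypothetical stand-ins, not GIJI's actual interface.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Host from the slides; the scheme, path, and parameter name are hypothetical.
GIJI_BASE = "https://giji.massopen.cloud/"
PARAM = "container"


def giji_launch_url(container: str) -> str:
    """Build a GIJI link that carries the dataset's Swift container name."""
    return f"{GIJI_BASE}?{urlencode({PARAM: container})}"


def container_from_url(url: str) -> Optional[str]:
    """Read the container name back out of the query string, if present."""
    values = parse_qs(urlparse(url).query).get(PARAM)
    return values[0] if values else None


if __name__ == "__main__":
    link = giji_launch_url("doi_10_5072_FK2_I8AYQ8")
    print(link, container_from_url(link))
```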

We can see the status of the launch on both GIJI and Horizon
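Watching the launch status can also be scripted as a simple polling loop. In this sketch, `get_status` is a hypothetical callable (for example, a wrapper around a Sahara client call) that returns the cluster's current status string; treating `"Active"` and `"Error"` as the terminal states is an assumption modeled on Sahara's status names.

```python
import time
from typing import Callable

# Assumed terminal states, modeled on Sahara cluster status strings.
TERMINAL = ("Active", "Error")


def wait_for_cluster(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the cluster reaches a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {status!r} after {timeout_s} s")
        sleep(poll_s)
```

Injecting `sleep` keeps the loop testable without real waits; the caller should still check whether the returned status is `"Active"` before submitting a job.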

Once the cluster (giji-test-cluster) is active, we can run a job

Notice the container name (doi_10_5072_FK2_I8AYQ8) is passed in!
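Inside the job, the container name typically ends up in Swift URLs that Hadoop and Spark can read. The sketch below assumes the Hadoop Swift filesystem form `swift://<container>.<provider>/<path>` with the provider name `sahara`; that suffix, and the object names `data.csv` and `output/`, are assumptions to check against your Sahara deployment.

```python
from typing import List


def swift_url(container: str, path: str, provider: str = "sahara") -> str:
    """Hadoop-style Swift URL: swift://<container>.<provider>/<path> (assumed form)."""
    return f"swift://{container}.{provider}/{path.lstrip('/')}"


def job_args(container: str, input_object: str, output_prefix: str) -> List[str]:
    """Input and output locations for a job that reads and writes the same container."""
    return [swift_url(container, input_object), swift_url(container, output_prefix)]


if __name__ == "__main__":
    # Hypothetical object names inside the dataset's container.
    print(job_args("doi_10_5072_FK2_I8AYQ8", "data.csv", "output/"))
```

Writing the output back into the same container is what makes the results visible next to the dataset's files afterwards.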

Once the job has been submitted, we can see it running on giji-test-cluster in the Spark UI
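The same information the Spark UI shows is also exposed by Spark's monitoring REST API under `/api/v1`. This sketch builds the jobs endpoint for one application and counts jobs by status from the JSON response; the driver host, port (4040 is Spark's default UI port), and application ID are placeholders you would take from your own cluster.

```python
import json
from collections import Counter
from typing import Dict


def jobs_url(ui_base: str, app_id: str) -> str:
    """Spark monitoring REST API: list the jobs of one application."""
    return f"{ui_base.rstrip('/')}/api/v1/applications/{app_id}/jobs"


def summarize_jobs(payload: str) -> Dict[str, int]:
    """Count jobs by status (e.g. RUNNING, SUCCEEDED, FAILED) in a /jobs response."""
    return dict(Counter(job["status"] for job in json.loads(payload)))


if __name__ == "__main__":
    # Hypothetical driver host and application ID.
    print(jobs_url("http://driver.example:4040", "app-20170101000000-0000"))
```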

And we can see the output of the job in the container we specified when we ran the job!
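Checking for the output can reuse the Swift container listing. Hadoop-based jobs conventionally write `part-*` files plus an empty `_SUCCESS` marker into their output directory, so this sketch scans a JSON listing for both under a given prefix; the `output/` prefix is a hypothetical name, and the listing is assumed to come from a Swift GET with `?format=json`.

```python
import json
from typing import List, Tuple


def job_output(listing_json: str, prefix: str) -> Tuple[bool, List[str]]:
    """Given a Swift JSON container listing, report whether the job wrote its
    _SUCCESS marker under `prefix` and which part-* files it produced."""
    names = [obj["name"] for obj in json.loads(listing_json)]
    done = f"{prefix}_SUCCESS" in names
    parts = sorted(n for n in names if n.startswith(f"{prefix}part-"))
    return done, parts
```

A missing `_SUCCESS` marker with some `part-*` files present usually means the job is still running or failed partway.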

Questions?