COTE Fellow Chat

A Virtual Infrastructure for Data intensive Analysis (VIDIA)

DESCRIPTION

The presentation gives an overview of the establishment of a collaborative virtual community, focusing initially on data-intensive computing education in the social sciences.

Page 1: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

COTE Fellow Chat

Page 3: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Jim Greenberg

Director, Teaching Learning Technology Center (TLTC)

SUNY Oneonta

Open SUNY Fellow Role:

Innovator and/or Researcher

Topic: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Theme:

Research & Innovation

COTE NOTE: http://bit.ly/cotenotevidia

Page 4: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Providing Undergraduates with a Virtual Infrastructure for Data Intensive Analysis

• Jeanette Sperhac and Steven M. Gallo, SUNY Buffalo

• Brian Lowe and Jim Greenberg, SUNY Oneonta

Page 5: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

The VIDIA Team:

• Gregory Fulkerson, Ph.D., Assistant Professor of Sociology

• James Greenberg, Director, TLTC

• Brett Heindl, Ph.D., Assistant Professor of Political Science

• Achim Koeddermann, Ph.D., Associate Professor of Philosophy and Env. Sciences

• Brian M. Lowe, Ph.D., Associate Professor of Sociology

• Diana Moseman, Instructional Designer/Programmer, TLTC

• Harry Pence, Ph.D., Distinguished Professor of Chemistry

• Tim Ploss, Instructional Designer

• Bill Wilkerson, Ph.D., Associate Professor of Political Science

• Steven M. Gallo, Lead Software Engineer, CCR, University at Buffalo

• Jeanette Sperhac, Scientific Programmer, CCR, University at Buffalo

Page 6: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Adopting social media analysis at Oneonta

Social Sciences approached Oneonta IT to build an analysis environment

The needed resources did not exist in house

IITG connected Oneonta with CCR

Page 7: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Case Study: Society and Animals

200-level Sociology course; social science majors without formal programming training

Comparative/historical, social scientific, journalistic

Goal: students gather, organize and interpret mined social media

Page 8: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Project Goals

Achieving critical thinking through engaging texts

Deploying ideas from texts in new directions

Applying theoretical perspectives and concepts

Achieving student engagement through data-driven research

Page 11: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Collaboration Goals

Create a social sciences big data discovery environment

Support social science teaching and research

Leverage High Performance Computing (HPC) resources

Support coursework at Oneonta, Spring 2014

Page 12: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Introducing VIDIA

• Virtual Infrastructure

• for Data Intensive Analysis

Page 13: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

VIDIA

• Deployed using Purdue's HUBzero platform:

Provide workflow tools for data analysis

Offer access to computing resources

Curate large datasets of social scientific interest

Page 14: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Data Mining Workflow Tools

Graphical User Interface

Powerful, easy to use

Open source, extensible

Page 15: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Dataset Access

• Curate Big Data for social science:

Social data: Twitter feeds, etc.

Partnerships with social dataset providers

Enable students to capture own data

Page 16: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

HUBzero Platform

• Open source platform offers:

Access via web browser

Computation, collaboration, software tool development

Simplified access to remote HPC resources

Upload and sharing of course materials

And more...

Page 18: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Teaching on HUBzero

Unified platform for coursework

Easy on IT staff: obviates software installs on individual student workstations

Access anytime, anywhere

Resources can be selectively secured

Students may access resources after course conclusion

Page 19: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

User Dashboard

Page 20: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Collaborative Features

• Any registered user can manage and control access to their own:

Groups: assemble users with common interests

Projects: assemble resources for a common goal

Tools: development, deployment, simulations

Page 21: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Groups

• HUBzero groups can:

Control access to resources

Share and distribute content

Allow users with common interests to associate

• Any registered user may create a group

Page 22: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Resources

Page 23: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Deployed Tool

• Orange Data Mining Tool

Page 25: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Computing Environment

User's Workstation (web browser)

HUBzero server

Data storage

Cluster resources

Page 26: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

VIDIA Hardware

• HUBzero and webserver: Dell PowerEdge R720xd

2x 6-core Intel Xeon E5-2630 (2.30 GHz, 15M cache)

48 TB raw (~36 TB usable) SATA disk space

128 GB memory (16x8GB - 1333MHz DIMMs)

• Analysis: 4x Dell PowerEdge R520

6-core Intel Xeon E5-2430 (2.20 GHz, 15M cache)

4.8 TB raw (~4 TB usable) SAS disk space

96 GB memory (6x16GB - 1600MHz DIMMs)
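
The hardware figures above can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes the analysis specs are per-node values and that each R520 carries a single CPU (the slide lists no "2x" for it), and all variable and function names are made up for this example.

```python
# Figures copied from the hardware slide. Reading the analysis specs as
# per-node values, with one CPU per node, is an assumption.
web_server = {"cpus": 2, "cores_per_cpu": 6, "raw_tb": 48.0, "usable_tb": 36.0,
              "dimms": 16, "gb_per_dimm": 8}
analysis_node = {"cpus": 1, "cores_per_cpu": 6, "raw_tb": 4.8, "usable_tb": 4.0,
                 "dimms": 6, "gb_per_dimm": 16}
ANALYSIS_NODES = 4


def cores(node):
    """Total CPU cores in one machine."""
    return node["cpus"] * node["cores_per_cpu"]


def memory_gb(node):
    """Installed memory, computed from the DIMM layout."""
    return node["dimms"] * node["gb_per_dimm"]


def usable_fraction(node):
    """Share of raw disk space that remains usable."""
    return node["usable_tb"] / node["raw_tb"]


# Aggregate capacity of the four analysis nodes (under the assumptions above).
analysis_totals = {
    "cores": ANALYSIS_NODES * cores(analysis_node),
    "memory_gb": ANALYSIS_NODES * memory_gb(analysis_node),
    "usable_tb": ANALYSIS_NODES * analysis_node["usable_tb"],
}

print("web server:", cores(web_server), "cores,", memory_gb(web_server), "GB")
print("analysis pool:", analysis_totals)
print("usable disk share:", round(usable_fraction(web_server), 2),
      round(usable_fraction(analysis_node), 2))
```

Under those assumptions the DIMM layouts reproduce the stated 128 GB and 96 GB, and the four analysis nodes together would offer 24 cores and 384 GB of memory.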

Page 27: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

VIDIA: Spring 2014

Supported three SUNY Oneonta courses

Deployed three data analysis tools

76 student users registered (themselves!)

Assigned student tasks: k-Means Clustering, Word Co-Occurrences

Enabled 25+ simultaneous tool sessions
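
For readers unfamiliar with the two assigned tasks, here is a minimal plain-Python sketch of what each one computes. It is not what the students ran (they worked in graphical tools on VIDIA); the sample posts, the numeric values, and the function names are all invented for illustration, and the k-means variant is a simplified one-dimensional version with hand-picked starting centers.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(texts):
    """Count how often each unordered word pair appears in the same text."""
    counts = Counter()
    for text in texts:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


def kmeans_1d(values, centers, iterations=10):
    """Simplified k-means on numbers: assign to nearest center, re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center when a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical stand-ins for mined social media posts.
posts = [
    "dogs are loyal pets",
    "cats are independent pets",
    "dogs and cats are pets",
]
pairs = cooccurrences(posts)
print(pairs[("are", "pets")], pairs[("cats", "dogs")])

# Hypothetical numeric feature, e.g. words per post, with two obvious groups.
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1, 12])
print(centers, clusters)
```

Co-occurrence counts show which terms tend to travel together in a corpus, while k-means groups observations around a chosen number of centers; real analyses would work on many more posts and on multi-dimensional features.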

Page 28: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

RapidMiner Sessions on VIDIA

Month                    Tool Users   Tool Sessions Run   Tool Walltime   Tool CPU Time
April 2014               77           568                 41.7 days       21.7 hours
May 2014 (as of 8 May)   80           849                 61.0 days       23.7 hours
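
The usage figures above become easier to compare once they are normalized. The following sketch derives sessions per user, average walltime per session, and the ratio of CPU time to walltime. It assumes that walltime is summed over all sessions in the month; the dictionary layout and the `summarize` helper are hypothetical names used only here.

```python
HOURS_PER_DAY = 24

# (tool users, sessions run, walltime in days, CPU time in hours),
# copied from the RapidMiner usage figures.
usage = {
    "April 2014": (77, 568, 41.7, 21.7),
    "May 2014 (as of 8 May)": (80, 849, 61.0, 23.7),
}


def summarize(users, sessions, walltime_days, cpu_hours):
    """Normalize one month of usage figures."""
    walltime_hours = walltime_days * HOURS_PER_DAY
    return {
        "sessions_per_user": sessions / users,
        "walltime_hours_per_session": walltime_hours / sessions,
        "cpu_share_of_walltime": cpu_hours / walltime_hours,
    }


summary = {month: summarize(*figures) for month, figures in usage.items()}
for month, stats in summary.items():
    print(month, {name: round(value, 3) for name, value in stats.items()})
```

In both months the average session lasts a little under two hours of walltime, while CPU time is only about 2% of walltime. One possible reading is that interactive sessions spend most of their time idle while students work in the graphical interface, although the slides do not say so.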

Page 29: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Challenges

User training: learning the platform and tools

Technical performance details

HUBzero updates

Browser compatibility

Dataset acquisition

Page 30: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

What's next?

SUNY Oneonta coursework, Fall 2014

Deploy additional data mining tools

Integrate HUBzero collaboration features

Roll out to other SUNY comprehensive colleges (Discussion underway with SUNY Brockport)

Page 32: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Thank You!

Join the SUNY Learning Commons at http://commons.suny.edu for access to the COTE Community group to continue the conversation!

View a Recording of today’s Fellow Chat:

http://bit.ly/COTEfellowchatRECORDING

View the COTE NOTE:

http://bit.ly/cotenotevidia

Become an Open SUNY Fellow:

http://bit.ly/joinCOTE

Submit a Proposal:

http://bit.ly/COTEproposal

Page 33: A Virtual Infrastructure for Data intensive Analysis (VIDIA)

Next Fellow Chat

Open SUNY Fellow:

Rhianna Rogers, Assistant Professor, SUNY Empire State College

Open SUNY Fellow Role:

Innovator or Researcher

Topic:

Fostering Creativity in Learning: How to Effectively Incorporate OERs into Assignments

Date:

Thursday, August 7 & 14, 2014, 12:00 PM

Register: http://www.cvent.com/d/t4qdfw