
Data-Intensive Computing Symposium

Data-Intensive Computing Symposium: Report Out

Phillip B. Gibbons, Intel Research Pittsburgh


Data-Intensive Computing Symposium

Held 3/26/08 @Yahoo! in Sunnyvale, CA

Sponsored by:

– Yahoo! Research

– Computing Community Consortium (CCC), which supports the computing research community in creating compelling research visions and the mechanisms to realize these visions (http://www.cra.org/ccc/)

~100 invited attendees, ~12 invited talks

Slides and video to be posted on CCC web site

Blog: http://dita.ncsa.uiuc.edu/xllora (thanks!)

Randy Bryant (CMU): Data-Intensive Scalable Computing

Local speaker; I’ll skip in interest of time

DISC has been renamed

ChengXiang Zhai (UIUC): Text Information Management

Proposal 1: Maximum Personalization

Dan Reed (Microsoft): Clouds and ManyCore: The Revolution

Big Data: Should focus more on the user experience

How to manage resources

Cloud computing can help organically orchestrate resources on demand

Initiative to bring academics, business, and users together under the big data problem (PCAST NITRD review)

Jill Mesirov (Broad Institute): Computational Paradigms for Genomic Medicine

Broad has 4.8K processors and 1.4 PB of storage on site

Big Data Problem: Mining genome expression arrays

– Rows: patients; columns: genes; values: expression values

– Example: classify leukemias based on expression arrays

– Solved by grad student over the weekend using web sources
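The expression-array layout above (patients as rows, genes as columns) can be illustrated with a small sketch. This is not the method from the talk: it swaps in a plain nearest-centroid classifier, and the matrix values, the class labels `type_A` and `type_B`, and all function names are made up for illustration.

```python
# Toy expression matrix: each row is a patient, each column a gene.
# Values and labels are invented for illustration only.
expression = [
    [9.1, 1.2, 0.8],
    [8.7, 0.9, 1.1],
    [1.0, 7.9, 8.4],
    [1.3, 8.2, 7.7],
]
labels = ["type_A", "type_A", "type_B", "type_B"]


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(matrix, row_labels):
    """Group patients by label and compute one centroid per class."""
    by_class = {}
    for row, label in zip(matrix, row_labels):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}


def classify(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], sample))


centroids = train(expression, labels)
prediction = classify(centroids, [8.9, 1.0, 1.0])
print(prediction)
```

Real expression arrays have thousands of gene columns, so a practical pipeline would add gene selection and cross-validation before any classifier like this one.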

Challenge: Computation/Analysis/Provenance infrastructure needed

– Developed GenePattern 3.1: Software infrastructure for interoperable informatics

– Usable by biologists

Garth Gibson (CMU): Simplicity and Complexity in Data Systems at Scale

Petascale Data Storage Institute

Understanding disk failures, cfdr.usenix.org

Another local speaker, so I’ll skip in interest of time

Jeff Dean (Google): Handling Large Datasets at Google

GFS Usage

Jon Kleinberg (Cornell): Large-Scale Social Network Data

Diffusion in Social Networks

Why is chain letter diffusion so deep & narrow?

Iraq war authorization protest chain letter diffusion (18K nodes)

Marc Najork (Microsoft Research): Mining the Web Graph

Scalable Hyperlink Store: used internally within MSR, for web graphs

Query-dependent link-based ranking algorithms (HITS, SALSA)
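As a rough illustration of what a query-dependent link-based ranking computes, here is a minimal sketch of the textbook HITS iteration. The edge list stands in for the neighborhood subgraph of a query's result set; the graph, the node names, and the `hits` function are assumptions for illustration and say nothing about how the Scalable Hyperlink Store exposes its data.

```python
def hits(edges, iterations=50):
    """Textbook HITS on a list of (source, destination) links.

    Returns (hub, authority) score dictionaries, each scaled to unit
    Euclidean length.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        authority = {node: 0.0 for node in nodes}
        for source, destination in edges:
            authority[destination] += hub[source]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {node: 0.0 for node in nodes}
        for source, destination in edges:
            hub[source] += authority[destination]
        # Normalize both vectors so the scores stay bounded.
        for scores in (authority, hub):
            norm = sum(value * value for value in scores.values()) ** 0.5 or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, authority


# Hypothetical neighborhood graph: pages a and b both link to c, a also links to d.
toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub_scores, authority_scores = hits(toy_edges)
best_hub = max(hub_scores, key=hub_scores.get)
best_authority = max(authority_scores, key=authority_scores.get)
print(best_hub, best_authority)
```

SALSA follows the same hub/authority split but replaces these sums with a random walk that alternates between forward and backward links; it is not shown here.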

Joe Hellerstein (UC Berkeley): “What” Goes Around

1. Industrial revolution of data: sensors, logs, cameras

2. Hardware revolution: datacenters/virtualization, many-core

3. Industrial revolution in software? Declarative languages in some domains

Why “What”:

– Rapid prototyping

– Pocket-size code bases

– Independent from the runtime

– Ease of analysis and security

– Allow optimization and adaptability

Joe Hellerstein (UC Berkeley)

Sensor Networks, Mobile Networks, Modular Robotics, computer games, program analysis

Distributive inference (junction trees and loopy belief propagation), graphs upon graphs

Evita Raced: Overlog Metacompiler (compiler is written declaratively)

– matches Datalog optimizations (dynamic programming), cycle tests

Datalog with known extensions and tweaks

Centrality of Rendezvous & graphs

Challenges:

– performance beyond number of messages (e.g., memory hierarchy), availability, real programs, not Turing complete
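To make the “what, not how” idea concrete, the sketch below evaluates a two-rule Datalog program for graph reachability by naive fixpoint iteration. The rules, the relation names `link` and `reach`, and the toy graph are assumptions chosen for illustration; this is plain Datalog semantics, not Overlog or Evita Raced.

```python
def reachable(links):
    """Naive bottom-up evaluation of the Datalog program

        reach(X, Y) :- link(X, Y).
        reach(X, Z) :- link(X, Y), reach(Y, Z).

    The rules state what is wanted; this loop is one possible "how".
    """
    reach = set(links)  # first rule: every link is a reach fact
    while True:
        # Second rule: join link(X, Y) with reach(Y, Z) on the shared Y.
        derived = {
            (x, z)
            for (x, y) in links
            for (y2, z) in reach
            if y == y2
        }
        if derived <= reach:  # no new facts, so the fixpoint is reached
            return reach
        reach |= derived


# Hypothetical three-hop chain a -> b -> c -> d.
toy_links = {("a", "b"), ("b", "c"), ("c", "d")}
facts = reachable(toy_links)
print(sorted(facts))
```

Because the rules fix only the result, an engine is free to pick a different evaluation order, for instance semi-naive evaluation that joins only against newly derived facts, without any change to the program.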

Raghu Ramakrishnan (Yahoo! Res.): Sherpa: Cloud Computing of the Third Kind

Alex Szalay (Johns Hopkins): Scientific Applications of Large Databases

Phillip Gibbons (Intel Research): Data-Rich Computing: Where It’s At

Important, interesting, exciting research area

Cluster approach: computing is co-located where the storage is at

Memory hierarchy issues: where the (intermediate) data are at, over the course of the computation

Pervasive multimedia sensing: processing & querying must be pushed out of the data center to where the sensors are at

I know where it’s at, man!

Focus of this talk:

Hierarchy-Savvy Parallel Algorithm Design (HI-SPADE) project

Hierarchy-savvy:

– Hide what can be hidden

– Expose what must be exposed

– Sweet spot between ignorant and fully aware

Support:

– Develop the compilers, runtime systems, architectural features, etc. to realize the model

– Important component: fine-grain threading

Goal: Support a hierarchy-savvy model of computation for parallel algorithm design

IrisNet’s Two-Tier Architecture

[Figure: web server for the URL; OAs, each with an XML database; SAs, each running senselets over sensors; sensornet]

Two components:

– SAs: sensor feed processing

– OAs: distributed database
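The split between SAs and OAs can be sketched roughly as follows. Everything here is hypothetical: the parking-spot readings, the `parking_senselet` function, the `OrganizingAgent` class, and the XML layout are invented for illustration and are not IrisNet's actual interfaces. The sketch only shows a senselet reducing a raw feed at the sensor side and an OA keeping the result in an XML document that can be queried with a path expression.

```python
import xml.etree.ElementTree as ET


def parking_senselet(raw_frames):
    """Hypothetical senselet run at an SA: reduce a raw feed to small readings.

    Each frame maps a spot id to True if the spot is occupied; only the
    latest frame is reported, as an availability string per spot.
    """
    latest = raw_frames[-1]
    return {spot: ("no" if occupied else "yes") for spot, occupied in latest.items()}


class OrganizingAgent:
    """Hypothetical OA holding one fragment of the XML database."""

    def __init__(self, region):
        self.root = ET.Element("region", {"id": region})

    def update(self, readings):
        """Insert or refresh one <spot> element per reading."""
        for spot_id, available in readings.items():
            node = self.root.find(f"spot[@id='{spot_id}']")
            if node is None:
                node = ET.SubElement(self.root, "spot", {"id": spot_id})
            node.set("available", available)

    def query(self, path):
        """Return ids of elements matching an ElementTree path expression."""
        return [node.get("id") for node in self.root.findall(path)]


oa = OrganizingAgent("block-1")
oa.update(parking_senselet([{"s1": True, "s2": False}]))
free_now = oa.query("spot[@available='yes']")
oa.update(parking_senselet([{"s1": False, "s2": False}]))
free_later = oa.query("spot[@available='yes']")
print(free_now, free_later)
```

The point of the two tiers is that only the reduced readings travel from the sensor side to the database side; the raw feed stays where it was captured.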

Jeannette Wing (CMU/NSF): NSF Plans for Supporting Data-Intensive Computing

Google/IBM Data Center

– ~2000 processors, large Hadoop cluster

– Allocate in units of rack weeks

– NSF will review proposals for use: Cluster Exploratory (CluE)

– Running Xen; won’t open up performance monitoring

– Goal: Show applicable outside of computer science

Academic-Industry-Government partnership

Randy Bryant (CMU): Big Data Computing Study Group

Collection of ~20 people (looking for volunteers)

Goals:

– Fostering educational activities

– Advocacy

– Building community

CCC’s Big Data Computing Study Group seeks to foster collaborations between industry, academia, and the U.S. government to advance the state of the art in the development and application of large-scale computing systems for making intelligent use of the massive amounts of data being generated in science, commerce, and society
