Welcome and AMPLab Overview
UC BERKELEY
Michael Franklin
November 20, 2014
AMPLab Overview
Project Launched Jan 2011, 6 Yr Planned Duration
Personnel: ~65 Students, Postdocs, Faculty and Staff
Funding: Government/Industry Partnership: NSF Expedition Award, DARPA XData, DoE, 20+ Companies
Key Outputs:
BDAS Open Source Stack & Apps (including Apache Spark)
Publications: Top Venues in ML, Systems, Databases and Others
Graduates in High Demand in Academia and Industry
“… the University of California, Berkeley’s AMPLab
has already left an indelible mark on world of
information technology, and even the web. But we
haven’t yet experienced the full impact of the group,
… Not even close.”
-- Derrick Harris, GigaOm, August 2014
The AMPLab Faculty
Michael Franklin (Databases)
Michael Jordan (Machine Learning)
Ion Stoica (Systems)
Dave Patterson (Systems)
Scott Shenker (Networks)
Alex Bayen (Mobile Sensing)
David Culler (Systems/Sensing)
Ken Goldberg (Crowdsourcing)
Anthony Joseph (Security)
Randy Katz (Systems)
Michael Mahoney (ML)
Ben Recht (Machine Learning)
Raluca Popa (Systems/Security), joining in Summer 2015
Industrial Engagement
• Industrial-Strength Open Source Software
• Used by Sponsors, Start-ups and many others
• Regular interactions with top industry technologists: twice-yearly 3-day offsite retreats, AMPCamp training, some site visits
AMP: Integrating 3 Key Resources
Algorithms
• Machine Learning, Statistical Methods
• Prediction, Business Intelligence
Machines
• Clusters and Clouds
• Warehouse Scale Computing
People
• Crowdsourcing, Human Computation
• Data Scientists, Analysts
Time, Answer Quality, Money
Our View of the Big Data Challenge
Massive, Diverse and Growing Data
Step 1: Improve efficiency (e.g., Spark, Tachyon)
Step 2: Enable intelligent tradeoffs (e.g., BlinkDB, SampleClean)
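As a rough illustration of the kind of tradeoff among time, answer quality and money that this slide refers to, the sketch below answers an aggregate query from a random sample and reports an approximate error bound. It is not BlinkDB's or SampleClean's actual method; the function name `approximate_mean`, the synthetic data, and the normal-approximation error bound are assumptions made for this example.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a random sample, with a ~95% error bound.

    Hypothetical helper for illustration only: a smaller sample is
    cheaper to scan but usually gives a wider error bound.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    # Normal approximation; needs at least two sampled values.
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_error


# Synthetic data standing in for a large table column (assumption).
data = [float(i % 100) for i in range(100_000)]
for n in (100, 10_000):
    estimate, half_width = approximate_mean(data, n)
    print(f"n={n}: {estimate:.2f} +/- {half_width:.2f}")
```

Scanning fewer rows costs less time and money, and the reported bound makes the loss in answer quality visible so the user can decide whether it is acceptable.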
Integration + Extreme Elasticity + Tradeoffs + More Sophisticated Analytics = Extreme Complexity
The Research Challenge
Arc of our Research Program
Early work on Foundations (Yrs 1-2):
Algorithms – Bag of Little Bootstraps
Machines – Mesos and Spark
People – CrowdDB Prototype
Filling out the Analytics Stack (Yrs 3-4): <you are here>
Algorithms – ML Pipelines, Async Algorithms, Concurrency Control
Machines – Tachyon, SQL, Graphs, Streams, R, Performance
People – Hybrid Human/Machine Data Cleaning/Integration
Moving Up the Stack/Expanding the Footprint (Yrs 5-6):
Algorithms – MLlib build out, Declarative ML (MLBase)
Machines – New Storage/Processing Archs, Data/Model Serving
Big Data Ecosystem Evolution
General batch processing: MapReduce
Specialized systems (iterative, interactive and streaming apps): Pregel, Dremel, GraphLab, Storm, Giraph, Drill, Tez, Impala, S4, …
AMPLab Unification Philosophy
Don’t specialize MapReduce – Generalize it!
Two additions to Hadoop MR can enable all the models shown earlier!
1. General Task DAGs
2. Data Sharing
For Users:
Fewer Systems to Use
Less Data Movement
Spark: Streaming, GraphX, …, SparkSQL
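The two additions named on this slide, general task DAGs and data sharing, can be pictured with a toy scheduler. The sketch below is plain Python, not Spark's API or implementation; `run_dag`, the task names, and the sample data are hypothetical. It runs tasks in dependency order and keeps intermediate results in memory, so several downstream tasks reuse one loaded dataset instead of re-reading it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run each task once, in dependency order, sharing results in memory.

    tasks: dict mapping a task name to a function of its inputs
    deps:  dict mapping a task name to the names it depends on
    (Hypothetical helper for illustration; not a Spark interface.)
    """
    graph = {name: deps.get(name, []) for name in tasks}
    results = {}  # shared in-memory store of intermediate results
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](inputs)
    return results


load_calls = []  # records how many times the input is actually loaded


def load(_inputs):
    load_calls.append(1)
    return [3, 1, 4, 1, 5, 9]


tasks = {
    "load": load,
    "total": lambda inputs: sum(inputs["load"]),
    "count": lambda inputs: len(inputs["load"]),
    "mean": lambda inputs: inputs["total"] / inputs["count"],
}
deps = {"total": ["load"], "count": ["load"], "mean": ["total", "count"]}

results = run_dag(tasks, deps)
print(results["mean"], len(load_calls))
```

Here `total` and `count` both read the result of `load`, which runs only once; a chain of separate batch jobs would typically write that intermediate data to storage and read it back for each consumer.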
Berkeley Data Analytics Stack (open source software)
In-house Apps: Cancer Genomics, Energy Debugging, Smart Buildings
Access and Interfaces: Spark Streaming, Shark/SparkSQL, BlinkDB, SampleClean, GraphX, MLlib, MLBase, SparkR, Velox Model Serving
Processing Engine: Apache Spark
Storage: Tachyon; HDFS, S3, …
Resource Virtualization: Apache Mesos, Yarn
Some Academic Accolades
Ph.D. + Postdoc alumni 2013/14 above have accepted faculty jobs at: Brown, Harvey Mudd, MIT (3), Stanford, UCLA, UT Austin
Best Paper Awards: BPOE 14, EuroSys 13, ICDE 13, NSDI 12, SIGCOMM 12
Best Demo: SIGMOD 12, VLDB 11
CACM “Research Highlight” Selections 2014 and 2015
About AMPCamp
Today
• BDAS and Stack Component Overviews
• Hands On Exercises
• Use Cases
• Reception and Networking
Tomorrow
• Research and ML Overviews
• Advanced Hands On Exercises (including genomics)
History
AMPCamp I @ Berkeley, August 2012
AMPCamp II @ Strata NYC, Feb 2013
AMPCamp III @ Berkeley, August 2013
AMPCamp IV @ Strata Santa Clara, Feb 2014
AMPCamp V @ Berkeley, Nov 2014
Also “Spark Camp”: AMPCamp Spinoff
AMPCamp Made Possible By
Rachit Agarwal
Elaine Angelino
Peter Bailis
Dan Crankshaw
Ankur Dave
Joseph Gonzalez
Daniel Haas
Sanjay Krishnan
Haoyuan Li
Frank Austin Nothaft
Xinghao Pan
Pedro Rodriguez
Ginger Smith
Evan Sparks
Shivaram Venkataraman
Jiannan Wang
Zongheng Yang
Ameet Talwalkar
Jey Kottalam
Kattt Atchley
Carlyn Chinen
Boban Zarkovich
Jon Kuroda
To find out more or get involved: amplab.berkeley.edu
Thanks to NSF CISE Expeditions in Computing, DARPA XData,
Founding Sponsors: Amazon Web Services, Google, and SAP,
the Thomas and Stacy Siebel Foundation,
and all our industrial sponsors and partners.