©2016 Talend Inc
Lambda Architecture with Spark: Efficiently Combining Historical and New Data for Analytics
Laurent Bride - CTO
Kurt Layson - Account Executive - Michigan
Vincent Galopin - Solutions Engineering Manager
March 10, 2016
Agenda
• Struggles in Traditional Architectures
• What is the Lambda Architecture?
• Spark: Unified Development Framework
• Demonstration: Spark Batch & Streaming jobs in Talend
Traditional Architecture
Historical data: DBMS / EDW, Hadoop, cloud datasets
New data: web logs, Internet of Things, social media
Situation
I need fast, on-the-fly access to historical data, combined with real-time data from the stream, for analysis.
Lambda Architecture
A data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods
https://en.wikipedia.org/wiki/Lambda_architecture
Lambda Architecture
• Batch Layer
• Speed Layer
• Serving Layer
https://www.mapr.com/developercentral/lambda-architecture
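A minimal sketch of how the three layers fit together, written in plain Python rather than Spark or Talend so it runs without a cluster. The page-view event shape, the counting use case, and every name here (`batch_view`, `SpeedLayer`, `serve`) are hypothetical illustrations, not part of any Talend or Spark API. The batch layer recomputes its view from the full, append-only history; the speed layer incrementally counts only events the last batch run has not yet seen; the serving layer answers a query by merging the two views.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event shape for this sketch: (user_id, page) page-view records.
Event = Tuple[str, str]


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute the view from scratch over the full, immutable history."""
    return Counter(page for _, page in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally updates a view with events the last batch run has not seen."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        _, page = event
        self.realtime_view[page] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.realtime_view.clear()


def serve(batch: Counter, realtime: Counter, page: str) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + realtime[page]


if __name__ == "__main__":
    history: List[Event] = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")]
    batch = batch_view(history)

    speed = SpeedLayer()
    for ev in [("u3", "/home"), ("u3", "/cart")]:  # events arriving after the batch run
        speed.on_event(ev)
        history.append(ev)  # the master dataset is append-only

    print(serve(batch, speed.realtime_view, "/home"))  # 3 (2 from batch + 1 real-time)

    # The next batch run absorbs the new events, so the speed layer's view is discarded.
    batch = batch_view(history)
    speed.reset()
    print(serve(batch, speed.realtime_view, "/home"))  # 3 (all from batch)
```

In a real deployment the batch layer would typically be a Spark batch job and the speed layer a Spark Streaming job. The design choice illustrated here is that the real-time view is disposable: it only needs to cover the gap since the last batch run, and any error in it is corrected when the next batch view is computed.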
APPLICATION INTEGRATION
CLOUD INTEGRATION
DATA INTEGRATION
BIG DATA INTEGRATION
MASTER DATA MANAGEMENT
SELF-SERVICE: Discovery & cleansing for business users
STUDIO: Comprehensive Eclipse-based user interface
REPOSITORY: Consolidated metadata & project information
DEPLOYMENT: Web-based deployment & scheduling
EXECUTION: Same container for batch processing, message routing & services
MONITORING: Single web-based monitoring console
Together, these products and platform services form the Talend Data Fabric.
Real-Time Big Data Integration and Unlimited Scale
1st Data Integration Platform on Spark
• Visually develop jobs that run 100% on Spark
• 5X faster, according to independent benchmarks
• 10X developer productivity gained over hand-coding
• Spark: up to 100X faster than Hadoop MapReduce with in-memory processing
• 900 components, including 100+ new Spark components: HDFS, RDBMS, NoSQL, cloud storage, transformation, messaging, in-memory analytics & machine learning recommendations, and much more
• In-memory data caching & “windowed” computations
• Click to enable Spark Streaming for real-time data processing
5X faster. Unlimited scale.
Benefits: Make decisions faster. Tremendous developer productivity.
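Windowed computations aggregate over a sliding range of recent data instead of the whole stream. The sketch below is plain Python under stated assumptions, not Talend or Spark code: it models the stream as a list of micro-batches and measures both the window length and the slide interval in micro-batches, whereas Spark Streaming expresses them as durations (for example through `reduceByKeyAndWindow` on a DStream). The function name `windowed_counts` is hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List


def windowed_counts(
    micro_batches: Iterable[List[str]], window: int, slide: int
) -> Iterator[Counter]:
    """Yield word counts over the last `window` micro-batches, every `slide` micro-batches.

    Mimics the window-length / slide-interval model of Spark Streaming,
    with both expressed here as a number of micro-batches rather than a duration.
    """
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    recent: Deque[Counter] = deque(maxlen=window)  # the oldest batch falls out automatically
    for i, batch in enumerate(micro_batches, start=1):
        recent.append(Counter(batch))  # keep each micro-batch's partial counts in memory
        if i % slide == 0:
            total: Counter = Counter()
            for partial in recent:
                total.update(partial)  # combine partial counts across the window
            yield total


if __name__ == "__main__":
    batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
    for result in windowed_counts(batches, window=2, slide=2):
        print(dict(result))  # {'a': 2, 'b': 1}, then {'b': 2, 'c': 1}
```

Keeping per-batch partial results in memory means each window is assembled from small cached aggregates rather than by rescanning raw events, which is the same motivation behind in-memory caching for windowed operations.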
Talend Demonstration
1. Talend Studio User Interface
2. Building a Spark Job
3. Building a Real-Time Recommendation Pipeline
4. Introduction to the Talend Real-Time Big Data Sandbox
For More Information
- Download the Talend Sandbox: http://www.talend.com/products/real-time-big-data
- Check out the Apache Spark project: http://spark.apache.org/
- Find out more about the Lambda Architecture: http://lambda-architecture.net/