Big Data Workloads Drawn from Real-time Analytics Scenarios
Across Three Deployed Solutions
Tao Zhong, K. Doshi, Xi Tang, Ting Lou, Zhongyan Lu, Hong Li
Software and Services Group, Intel
Statement of faith:
Real-time (low-latency) analytics will become more important to end users: if not for all queries, then for a non-trivial fraction of them.
We walk through three workload scenarios in this short presentation.
Objective: generate ideas for workloads that reflect low-latency and high-throughput demands simultaneously.
All three use cases described here are in deployment or in pre-deployment testing among Intel partners in the PRC.
1. Smart City Application:
Detect and Prevent License Plate Fraud
[Figure: processing pipeline with stages CAPTURE, EXTRACT, STORE, COMPUTE, and ANALYSIS SERVICES. Labels: VIDEO FRAMES; Object (Person, Vehicle) with Types and Attributes; Image Files and Descriptions Files.]
SMART CITY Workload Solution Flow
[Figure: solution flow. Components: Extraction System, File System, Real-time Analytics, RDBMS (Registration and Traffic History Records; Registration Records), Enforcement. Labeled operations: Feed, Persist, Retrieve, Query, Integrate, Merge, Evolve, Detect, Notify. Step markers: 1 to 5 and A to F.]
SMART CITY Workload Characteristics
• Structured and unstructured data
• Transactional and analytic activities
• Scale-out in-memory processing combined with distributed persistent data stores
• Real-time and batch operations
• Information inflows from sensor and non-sensor devices
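As an illustration of the "Detect" step in this scenario, here is a minimal sketch that compares attributes extracted from a camera sighting against registration records. Everything in it is an assumption made for illustration: the `Sighting` fields, the `REGISTRATIONS` dictionary standing in for the RDBMS, and the mismatch rules are hypothetical and are not taken from the deployed solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One vehicle observation, as a hypothetical extraction step might emit it."""
    plate: str
    vehicle_type: str
    color: str


# Hypothetical registration records keyed by plate.
# In the deck these live in an RDBMS; a dict stands in for it here.
REGISTRATIONS = {
    "A12345": {"vehicle_type": "sedan", "color": "white"},
    "B67890": {"vehicle_type": "truck", "color": "blue"},
}


def check_sighting(sighting, registrations):
    """Return the reasons a sighting looks suspicious (empty list if none)."""
    record = registrations.get(sighting.plate)
    if record is None:
        return ["plate not registered"]
    reasons = []
    if record["vehicle_type"] != sighting.vehicle_type:
        reasons.append("vehicle type mismatch")
    if record["color"] != sighting.color:
        reasons.append("color mismatch")
    return reasons


sightings = [
    Sighting("A12345", "sedan", "white"),  # matches its registration
    Sighting("B67890", "sedan", "red"),    # registered as a blue truck
    Sighting("Z00000", "van", "black"),    # no registration on file
]
alerts = {s.plate: check_sighting(s, REGISTRATIONS) for s in sightings}
```

In a real deployment the lookup would be a low-latency query against the registration store, and a non-empty result would feed the notification path to enforcement.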
2. Content Management and Integration
Rapid Content Management -- Solution Flow
[Figure: solution flow. Inputs: New Media and Traditional Media feeds, with information accumulation over time. Labels: Digest and Cross Reference; Log Extract and Transform; RDBMS (sparse edits); Sqoop (bulk move older data); HBase; Hive; Search; Data Analysis Logic; Hibernate Driver; HBase Driver; Hive Dialect.]
Rapid Content Management – Workload Characteristics
• Structured and unstructured data
• Transactional and analytic activities
• Fast searches over “hot” data, slow searches over the rest
• RDBMS operations mixed with HBase
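The hot/cold split described above can be sketched as a two-tier store: recent items stay in a small, fast structure, and older items are bulk-moved to a larger, slower one. This is only a sketch under stated assumptions. `TieredStore`, its methods, and the time window are hypothetical names, and plain Python containers stand in for the RDBMS and HBase tiers of the actual solution.

```python
class TieredStore:
    """Toy two-tier store: a hot dict for recent items, a cold list for the rest."""

    def __init__(self, hot_window_seconds):
        self.hot_window = hot_window_seconds
        self.hot = {}    # doc_id -> (timestamp, text); stands in for the fast tier
        self.cold = []   # (doc_id, timestamp, text); stands in for the bulk tier

    def add(self, doc_id, text, timestamp):
        """New content always lands in the hot tier."""
        self.hot[doc_id] = (timestamp, text)

    def age_out(self, now):
        """Bulk-move items older than the hot window; return how many moved."""
        old = [d for d, (ts, _) in self.hot.items() if now - ts > self.hot_window]
        for doc_id in old:
            ts, text = self.hot.pop(doc_id)
            self.cold.append((doc_id, ts, text))
        return len(old)

    def search(self, term, include_cold=False):
        """Search hot data first; scan cold data only when asked to."""
        hits = [d for d, (_, text) in self.hot.items() if term in text]
        if include_cold:
            hits += [d for d, _, text in self.cold if term in text]
        return hits


store = TieredStore(hot_window_seconds=3600)
store.add("n1", "flood warning issued", timestamp=0)
store.add("n2", "flood damage report", timestamp=5000)

moved = store.age_out(now=5400)                       # n1 is older than one hour
hot_hits = store.search("flood")                      # hot tier only
all_hits = store.search("flood", include_cold=True)   # hot tier, then cold tier
```

The design choice illustrated is that the common query touches only the small hot tier, while the slower full scan is opt-in.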
3. Fraud Detection
Telecom Payment Fraud Detection/Prevention -- Solution Flow
[Figure: solution flow. Labels: Recharge Transaction, Mid-transaction Analytics, Transactions History, Credit Records, ALERT.]
SELECT phone_number, SUM(charge_time), SUM(charge_amount)
FROM trans_table
GROUP BY phone_number
HAVING SUM(charge_time) > threshold_1 AND SUM(charge_amount) > threshold_2
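A threshold query of this kind needs GROUP BY and HAVING, because conditions on aggregates such as SUM cannot appear in a WHERE clause. The sketch below runs such a query against an in-memory SQLite database. The table layout, the sample rows, the threshold values, and the reading of `charge_time` as a per-transaction count are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans_table ("
    "phone_number TEXT, charge_time INTEGER, charge_amount REAL)"
)

# Hypothetical recharge transactions; charge_time is assumed to be 1 per recharge.
rows = [
    ("555-0001", 1, 10.0),
    ("555-0001", 1, 15.0),
    ("555-0002", 1, 500.0),
    ("555-0002", 1, 700.0),
    ("555-0002", 1, 900.0),
]
conn.executemany("INSERT INTO trans_table VALUES (?, ?, ?)", rows)

threshold_1 = 2        # assumed limit on the number of recharges
threshold_2 = 1000.0   # assumed limit on the total recharge amount

query = """
SELECT phone_number, SUM(charge_time), SUM(charge_amount)
FROM trans_table
GROUP BY phone_number
HAVING SUM(charge_time) > ? AND SUM(charge_amount) > ?
"""
# Each returned row is a phone number that exceeds both thresholds.
flagged = conn.execute(query, (threshold_1, threshold_2)).fetchall()
conn.close()
```

With the sample rows, only the second phone number exceeds both thresholds, so it is the only candidate for an alert.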
Summary
• Workload scenarios from several “real life” use cases
• Blend of SQL and NoSQL approaches
• Recent data is available for queries nearly instantaneously
• Real-time responsiveness combined with high data volumes
• Mix of slow and fast operations (low-latency analytics on recent data, complex analytics on historical data)