jubatusofficial
Jubatus: “Scalable Distributed Computing Framework for Realtime Analysis of Big Data”
2
Big Data: Web, SNS, system logs, voice data, images/video, sensor data… Growth rate is 45%/year.
◦ Increase of “unstructured data” such as sensor data
[Figure: Big Data, 45% growth/year. Labels: business data, customer data, sensor data, images/video, SNS; structured data vs. unstructured data. Callouts: 5 billion phones; uploaded videos: 60,000/week; 8,000 tweets/sec; processed data: 100 TB/day.]
3
Beyond Hadoop
Hadoop: a de facto distributed computing framework for Big Data, but not suitable for realtime processing or in-depth analysis.
[Figure: Big data mapped on two axes, Simple Statistics vs. In-depth Analysis and Batch Processing vs. Realtime Processing.]
4
Beyond Hadoop
[Figure: Big Data applications on two axes. Processing: Batch (Stored) vs. Realtime (Online). Analysis: Simple Analysis (Statistics) vs. In-depth Analysis (classification, estimation, prediction). Labels: Batch application, Realtime application, Jubatus.]
5
Jubatus Requirements: “Scalability,” “Realtime processing,” and “In-depth analysis”
Joint development with Preferred Infrastructure
[Figure: the three requirements (In-depth Analysis, Realtime processing, Scalability) with technology labels: SVMlight; RDBMS; DWH; CEP; Streaming (Yahoo! S4, Twitter Storm); Online machine learning.]
References:
• Hadoop -> http://hadoop.apache.org/
• Mahout -> http://mahout.apache.org/
• WEKA -> http://weka-jp.info/
• SVMlight -> http://svmlight.joachims.org/
• Yahoo! S4 -> http://s4.io/
• Twitter Storm -> http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
• CEP -> Complex Event Processing
6
Jubatus Use Case: SNS Analysis
【Big Data】 Big stream ⇒ worldwide: 8000 tweets/sec, Japanese: 500~2000 tweets/sec
【Realtime processing】 Recognition of “good”/“bad” news by learning ⇒ following up bursty tweets
【In-depth analysis】 Automatic classification of “tweets related to topics of interest (keyword)”
[Figure labels: Realtime analysis by Jubatus; results; Client Application.]
Keyword: NTT. Monitoring for NTT-related tweets, which do not necessarily contain “NTT”.
【Realtime】【In-depth analysis】 Automatic realtime classification of tweets highly related to the issue of concern (keyword)
【Big Data】 Tweets worldwide: 8000 tweets/sec; Japanese: 2000 tweets/sec
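To make the "classification by learning" idea concrete, here is a minimal sketch of an online linear classifier that updates its weights one tweet at a time. It is a plain binary perceptron written for illustration only, not Jubatus's implementation or API; the `featurize` helper, the class name, and the sample tweets are all hypothetical.

```python
from collections import defaultdict


def featurize(text):
    """Bag-of-words counts for one tweet (a hypothetical, deliberately simple scheme)."""
    features = defaultdict(float)
    for token in text.lower().split():
        features[token] += 1.0
    return dict(features)


class OnlinePerceptron:
    """Binary linear classifier trained incrementally: +1 = related, -1 = unrelated."""

    def __init__(self):
        self.weights = defaultdict(float)  # feature -> weight

    def score(self, features):
        # Dot product between the sparse feature vector and the weight vector.
        return sum(self.weights.get(f, 0.0) * v for f, v in features.items())

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        # Perceptron rule: adjust weights only when the example is not
        # classified correctly with a positive margin.
        if label * self.score(features) <= 0:
            for f, v in features.items():
                self.weights[f] += label * v


# Made-up stream of labeled tweets, processed one at a time (no batch step).
stream = [
    ("ntt launches new fiber service", 1),
    ("great ramen for lunch today", -1),
    ("docomo network outage reported in tokyo", 1),
    ("watching a new movie tonight", -1),
]

model = OnlinePerceptron()
for text, label in stream:
    model.update(featurize(text), label)

# A tweet can be judged related without containing the keyword "ntt".
print(model.predict(featurize("fiber network outage")))
print(model.predict(featurize("ramen and a movie")))
```

Because each update touches only the features of the incoming tweet, the model is usable for prediction immediately after every example, which is the property the realtime use case relies on.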
7
Jubatus Use Case: Recommendation
Realtime recommendation for e-commerce sites / on-demand TV
・ Conventional batch processing: a recommended item stays fixed for a certain period
・ Jubatus: instant recognition of sudden changes in buying trends
Realtime recommendation by Jubatus
Recommended items are updated in realtime by relating other customers’ buying-history trends.
[Figure: customers and their buying history. Chart of recommendation accuracy over time comparing real behavior, Jubatus, and batch processing, with two annotated events: a sudden order increase after TV exposure and a sudden order increase after the death of a celebrity.]
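The contrast between batch and realtime recommendation can be sketched with an incremental item co-occurrence counter: every purchase updates the counts at once, so a sudden trend shows up in the very next recommendation. This is an illustrative technique chosen for brevity, not the algorithm Jubatus uses; the class, method names, and sample data are hypothetical.

```python
from collections import defaultdict


class CooccurrenceRecommender:
    """Recommends items that were bought together with items the user already owns."""

    def __init__(self):
        self.baskets = defaultdict(set)  # user -> set of purchased items
        self.cooccur = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def record_purchase(self, user, item):
        """Update co-occurrence counts incrementally for a single purchase."""
        basket = self.baskets[user]
        if item in basket:
            return  # repeat purchases do not change the counts in this sketch
        for other in basket:
            self.cooccur[item][other] += 1
            self.cooccur[other][item] += 1
        basket.add(item)

    def recommend(self, user, top_n=3):
        """Return up to top_n unseen items, ranked by co-occurrence with owned items."""
        owned = self.baskets.get(user, set())
        scores = defaultdict(int)
        for item in owned:
            for other, count in self.cooccur.get(item, {}).items():
                if other not in owned:
                    scores[other] += count
        # Sort by descending score, then by name for a deterministic order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:top_n]]


rec = CooccurrenceRecommender()
rec.record_purchase("alice", "tea")
rec.record_purchase("alice", "mug")
rec.record_purchase("bob", "tea")
print(rec.recommend("bob"))  # ['mug']

# A sudden trend: several customers buy "book" together with "tea".
for user in ("carol", "dave"):
    rec.record_purchase(user, "tea")
    rec.record_purchase(user, "book")
print(rec.recommend("bob"))  # ['book', 'mug']
```

A batch system would keep returning the first answer until its next scheduled recomputation; here the ranking changes as soon as the new purchases arrive.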
8
Performance evaluation: Classification
【Realtime】&【In-depth analysis】 Realtime automatic company classification for “tweets”
【Big Data】 Tweets worldwide: 8000 tweets/sec
[Figure: tweets classified into company categories: Company A, Company B, Company C, Company D, ...]
2-3 machines for current Twitter stream
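As a back-of-the-envelope check on the figures above, the following sketch computes the per-machine load implied by 8000 tweets/sec spread over 2-3 machines. It assumes the stream is divided evenly across machines, which the slides do not state; the function names are hypothetical.

```python
import math


def per_machine_rate(stream_rate, machines):
    """Tweets/sec each machine must classify, assuming an even split (an assumption)."""
    return stream_rate / machines


def machines_needed(stream_rate, per_machine_capacity):
    """Smallest number of machines whose combined capacity covers the stream."""
    return math.ceil(stream_rate / per_machine_capacity)


STREAM_RATE = 8000  # tweets/sec worldwide, from the slide

for machines in (2, 3):
    rate = per_machine_rate(STREAM_RATE, machines)
    print(f"{machines} machines -> about {rate:.0f} tweets/sec each")
```

Under that assumption, each machine would handle roughly 2,700 to 4,000 tweets/sec.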
9
Performance evaluation: Recommendation
【Big Data】&【In-depth analysis】 Response time: 0.1 sec for 30 million users (10x faster than Mahout)
[Figure: user-item matrix built from buying/search queries, with rows UserA, UserB, ..., UserY, columns Item1, Item2, Item3, ..., ItemX, and ○ marking filled cells; the output is a recommended item.]
【Big Data】&【Realtime processing】 100,000/sec update throughput per server
10
2nd edition release
Jubatus OSS website
◦ http://jubat.us
◦ 2nd edition will be released on 17th Feb.
OSS community
◦ Web: http://jubat.us
◦ GitHub: https://github.com/jubatus/jubatus
◦ Twitter: @JubatusOfficial
Features
◦ 1st ed.: Linear classification
◦ 2nd ed.: Regression, Statistics, Recommendation
11