Jubatus: “Scalable Distributed Computing Framework for Realtime Analysis of Big Data”

Jubatus Presentation on R&D forum 2011

Page 1: Jubatus Presentation on R&D forum 2011

Jubatus: “Scalable Distributed Computing Framework for Realtime Analysis of Big Data”

Page 2: Jubatus Presentation on R&D forum 2011

Big Data: Web, SNS, system logs, voice data, images/video, sensor data… The growth rate is 45% per year.

◦ Increase in “unstructured data” such as sensor data

Big Data

[Figure labels: 45% growth/year; Structured data; Unstructured data; Business data; Customer data; Sensor data; images/video; SNS; 5 billion phones; uploaded videos: 60,000/week; 8,000 tweets/sec; processed data: 100 TB/day]

Page 3: Jubatus Presentation on R&D forum 2011

Hadoop: a de facto standard distributed computing framework for Big Data, but not suitable for realtime processing or in-depth analysis

Beyond Hadoop

[Figure labels: Simple Statistics; In-depth Analysis; Batch Processing; Realtime Processing; Big data]

Page 4: Jubatus Presentation on R&D forum 2011

Beyond Hadoop

[Figure labels: Batch application; Realtime application; Simple Analysis (statistics); In-depth Analysis (classification, estimation, prediction); Batch (stored); Realtime (online); Big Data; Jubatus]

Page 5: Jubatus Presentation on R&D forum 2011

Jubatus Requirements: “Scalability,” “Realtime processing,” and “In-depth analysis”

Joint development with Preferred Infrastructure

[Figure labels: In-depth Analysis; Realtime processing; Scalability; SVMlight; RDBMS; DWH; CEP, Streaming (Yahoo! S4, Twitter Storm); Online machine learning]

References:
• Hadoop -> http://hadoop.apache.org/
• Mahout -> http://mahout.apache.org/
• WEKA -> http://weka-jp.info/
• SVMlight -> http://svmlight.joachims.org/
• Yahoo! S4 -> http://s4.io/
• Twitter Storm -> http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
• CEP -> Complex Event Processing

Page 6: Jubatus Presentation on R&D forum 2011

【Big Data】 Big stream ⇒ worldwide: 8,000 tweets/sec; Japanese: 500 to 2,000 tweets/sec
【Realtime processing】 Recognition of “good”/“bad” news by learning ⇒ following up on bursty tweets
【In-depth analysis】 Automatic classification of tweets related to topics of interest (keywords)

Jubatus Use Case: SNS Analysis

[Figure labels: Twitter; Realtime analysis by Jubatus; results; Client Application]

Keyword: NTT. Monitoring for NTT-related tweets, which need not contain the string “NTT”.

【Realtime】【In-depth analysis】 Automatic realtime classification of tweets that are highly related to the issue of concern (keyword)

【Big Data】 Tweets worldwide: 8,000 tweets/sec; Japanese: 2,000 tweets/sec
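The idea of classifying topic-related tweets as they arrive, even when they do not contain the keyword itself, can be sketched with a small online linear classifier. This is only an illustration under stated assumptions: the slides do not say which learning algorithm Jubatus uses for this use case, so a plain perceptron is swapped in here, and the class name `OnlinePerceptron`, the `features` helper, and the sample tweets are all hypothetical.

```python
from collections import defaultdict


def features(text):
    """Bag-of-words features: lowercase tokens mapped to counts."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    return counts


class OnlinePerceptron:
    """Binary linear classifier updated one example at a time.

    Hypothetical stand-in for an online learner; not Jubatus's API.
    """

    def __init__(self):
        self.weights = {}

    def score(self, feats):
        return sum(self.weights.get(tok, 0.0) * val for tok, val in feats.items())

    def predict(self, text):
        return 1 if self.score(features(text)) > 0 else -1

    def update(self, text, label):
        """label is +1 (related to the topic) or -1 (unrelated)."""
        feats = features(text)
        if label * self.score(feats) <= 0:  # update only on a mistake
            for tok, val in feats.items():
                self.weights[tok] = self.weights.get(tok, 0.0) + label * val


# Made-up labeled stream, processed one tweet at a time (no batch pass).
stream = [
    ("ntt announces new fiber service", 1),
    ("lunch was great today", -1),
    ("flets fiber outage in tokyo", 1),
    ("great weather today", -1),
]

model = OnlinePerceptron()
for text, label in stream:
    model.update(text, label)

# This tweet does not contain "ntt" but shares learned vocabulary.
print(model.predict("fiber outage reported"))
```

Because the model is updated per example, each incoming tweet can be classified and learned from immediately, which is the property the slide contrasts with batch processing.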

Page 7: Jubatus Presentation on R&D forum 2011

Jubatus Use Case: Recommendation

Realtime recommendation for e-commerce sites / on-demand TV
・ Conventional batch processing: recommended items stay fixed for a certain period
・ Jubatus: instant recognition of sudden changes in buying trends

[Figure: Realtime recommendation by Jubatus. Labels: Customers; Customer buying history]

Recommended items are updated in realtime by relating them to trends in other customers’ buying histories.

[Chart: recommendation accuracy over time. Series: Real behavior; Jubatus; Batch processing. Annotated events: sudden order increase after TV exposure; sudden order increase after the death of a celebrity]

Page 8: Jubatus Presentation on R&D forum 2011

Performance evaluation: Classification

【Realtime】&【In-depth analysis】 Realtime automatic classification of tweets by company

【Big Data】 Tweets worldwide: 8,000 tweets/sec

[Figure labels: Twitter; Company Category: Company A, Company B, Company C, Company D, ...]

2-3 machines for the current Twitter stream

Page 9: Jubatus Presentation on R&D forum 2011

【Big Data】&【In-depth analysis】 Response time: 0.1 sec for 30 million users (10x faster than Mahout)

[Figure: user-item matrix with rows UserA, UserB, ..., UserY and columns Item1, Item2, Item3, ..., ItemX; some cells are marked ○. Labels: Buying/search queries; Recommended item]

【Big Data】&【Realtime processing】 Update throughput: 100,000/sec per server

Performance evaluation: Recommendation
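To make the user-item matrix and the notion of per-event updates concrete, here is a minimal sketch of a recommender whose state is updated on every buying/search event instead of being rebuilt in a batch. It is an assumption-laden illustration, not Jubatus's method: the slides do not describe the algorithm, so simple item co-occurrence counting is used, and `IncrementalRecommender` and the sample events are hypothetical.

```python
from collections import defaultdict


class IncrementalRecommender:
    """Item co-occurrence recommender updated one event at a time.

    Hypothetical sketch; the real system's algorithm is not given in the slides.
    """

    def __init__(self):
        self.user_items = defaultdict(set)  # sparse rows of the user-item matrix
        self.cooccur = defaultdict(lambda: defaultdict(int))

    def update(self, user, item):
        """Record one buying/search event and update counts immediately."""
        history = self.user_items[user]
        if item in history:
            return
        for other in history:
            self.cooccur[item][other] += 1
            self.cooccur[other][item] += 1
        history.add(item)

    def recommend(self, user, k=3):
        """Rank items the user has not seen by co-occurrence with their history."""
        history = self.user_items.get(user, set())
        scores = defaultdict(int)
        for item in history:
            for other, count in self.cooccur.get(item, {}).items():
                if other not in history:
                    scores[other] += count
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:k]]


# Made-up events mirroring the matrix labels on the slide.
rec = IncrementalRecommender()
events = [
    ("UserA", "Item1"), ("UserA", "Item2"), ("UserA", "Item3"),
    ("UserB", "Item1"),
    ("UserY", "Item2"), ("UserY", "Item3"),
]
for user, item in events:
    rec.update(user, item)

print(rec.recommend("UserB"))
```

Each `update` touches only the items already in that user's history, so a new purchase changes later recommendations right away without recomputing the whole matrix.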

Page 10: Jubatus Presentation on R&D forum 2011

Jubatus OSS website
◦ http://jubat.us
◦ 2nd edition will be released on 17th Feb.

2nd edition release

OSS community
Web: http://jubat.us
GitHub: https://github.com/jubatus/jubatus
Twitter: @JubatusOfficial

Features

1st ed.: Linear classification

2nd ed.: Regression, Statistics, Recommendation

Page 11: Jubatus Presentation on R&D forum 2011
