Rakuten Technology Conference 2013 "DWH/Hadoop in Rakuten Ichiba" Mitsuo Hangai (Rakuten)
DWH/Hadoop in Rakuten Ichiba
Vol.01 Oct/26/2013
Mitsuo Hangai
Sendai Development Group, New Service Development Department, Rakuten, Inc.
http://www.rakuten.co.jp/
2
Self introduction
Mitsuo Hangai (半谷 充生)
Rakuten, Inc. Service Development Sendai Group
@bangucs
Chinese (汉语)
3
Agenda
About Sendai Branch
What is our data warehouse?
Now and future
4
About Sendai Branch
5
Do you know Sendai?
6
About Sendai
Map source: free blank maps, world maps, and maps of Japan [Blank Map Specialty Shop] http://www.freemap.jp/japan/ja_kouiki_japan_big_scale_3.html
Sendai City
7
About Sendai
8
About Sendai branch
9
History of Sendai Development Group
2007. Founded for Pro-sports.
2008. Started Ichiba Business Support and Infoseek operations.
2009. Grew and started Advertisement development.
2010. Started Marriage operations.
2011. Hit by the huge earthquake… Moved to a new office.
2012. GM changed to Nanjo (he organizes Satellite!).
[Chart: 2007 to 2013, vertical axis 0 to 30; series: Advertisement, Auction, Marriage, Infoseek, Ichiba, Pro-sports]
10
Current work of Rakuten Sendai
International Ichiba: Development & Operation
Central Data Warehouse: Development & Operation
Development & Operation
System replacement: Development & Operation
Development & Operation
Our team!
11
About the usage of Rakuten Ichiba’s Data
12
How and for what we use Rakuten Ichiba’s data
Accounting, giving points
GMS (Gross Merchandise Sales)
Reporting for 500 EC Consultants
Ranking Ichiba
Purchase History
Detecting fraud
Marketing department
And so forth…
13
By the way….
Do you know ….
14
How many orders does Rakuten Ichiba receive per day?
15
A: About 2,000,000 transactions (counted per order, not per item)
16
How much data does our data warehouse handle per day?
17
A: About 100 GB (this is not all the data, only what we need)
18
How many items does Rakuten Ichiba have?
19
A: About 1,400,000,000 items (as of 2013/10/08)
20
Some items have long and funny names
21
http://item.rakuten.co.jp/wakamaru/sale-2908-50offcp/
This is the name of this item!!
22
http://item.rakuten.co.jp/pascoshop/4901820354426/
This is the name of this item!!
23
http://item.rakuten.co.jp/e-cha/hd-sakusakuwakame/
This is the name of this item!!
24
We have such huge (and funny) data.
25
We must finish processing this huge data by morning…
26
Like this(1):
This table has about 200,000,000 records
27
Like this(2):
About 2 meters
Each table has about 200,000,000 records
28
How tough… But it is necessary…
29
A few years ago (until May 2011)
[Diagram: old SelectDB architecture. Labels: Purchase, Shops, ITEM; RDB1, RDB2; Scheduler / Batch Server (SQL, Perl); Unload, Load, Interface; files.]
107 tables, 378 interface files
30
Problem
The RDBMS had problems such as:
- Poor performance…
- Lack of disk space…
- Difficult to expand…
- Servers are expensive!!
31
Really poor… For example:
32
33
How do we solve it?
34
35
Sweet points of Hadoop
Good performance! For batch processing, it performs extremely well.
Easy to expand! Just add Data nodes.
No need for high-performance servers! Commodity servers are enough, so we can reduce costs!
36
Bitter points of Hadoop
MapReduce is not easy… We decided to use Hive (which runs MapReduce via an SQL-like query language called HiveQL).
Hive has no “delete” or “insert into” clause, and HiveQL differs from SQL in many ways… We need to think carefully before development.
Hive has high latency… Batch processing only.
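Without a “delete” clause, one common workaround is to overwrite a table with only the rows that should remain. The sketch below (in Python, which the slides do not mention) renders such a HiveQL statement. The helper name `render_delete_as_overwrite`, the table `purchase`, and the column `order_status` are hypothetical, the table is assumed to be unpartitioned, and the slides do not say that the team used this approach.

```python
def render_delete_as_overwrite(table: str, delete_condition: str) -> str:
    """Render HiveQL that emulates DELETE on an unpartitioned table.

    The table is overwritten with every row that does NOT match the
    condition. Rows where the condition evaluates to NULL are kept,
    which mirrors what a SQL DELETE ... WHERE would do.
    """
    return (
        f"INSERT OVERWRITE TABLE {table}\n"
        f"SELECT * FROM {table}\n"
        f"WHERE NOT ({delete_condition}) OR ({delete_condition}) IS NULL;"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(render_delete_as_overwrite("purchase", "order_status = 'cancelled'"))
```

Because the whole table is rewritten, this only suits batch processing, which matches the latency point above.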
37
So we decided to use Hadoop.
38
Rakuten’s Shared Hadoop Cluster
2009: 15 nodes, 50TB. Uses: Recommend, Ranking, Item data analysis, Behavior analysis
2011: 69 nodes, 300TB. Uses: Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis
2013: 30 nodes, 1PB. Uses: Suggest, Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis, Personalize, Granting Point
39
Plan:
[Diagram: New SelectDB (called Ichiba DWH) on the Hadoop Cluster (69 nodes). Labels: Purchase, Shops, ITEM; Scheduler / Batch Server (HiveQL, Shell/Java); Unload, Load, Interface; files.]
107 tables, 378 interface files
40
We had to transfer data from the old system to Hadoop:
107 tables!!
We had to check every diff between the old system and the new system:
378 files!! (= 378 HiveQL scripts)
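A check like this can be automated. The following is a minimal sketch in Python, assuming the old and new systems each write their interface files as text files with matching names into two directories. The function names and the directory layout are hypothetical and not taken from the slides. Rows are compared as multisets, on the assumption that row order may differ between an RDBMS export and Hadoop output.

```python
from collections import Counter
from pathlib import Path


def file_row_counts(path: Path) -> Counter:
    """Count each distinct row so that row order does not matter."""
    with path.open(encoding="utf-8") as f:
        return Counter(line.rstrip("\n") for line in f)


def diff_interface_dirs(old_dir: Path, new_dir: Path) -> dict:
    """Return {file name: problem description} for files that differ."""
    problems = {}
    old_names = {p.name for p in old_dir.iterdir() if p.is_file()}
    new_names = {p.name for p in new_dir.iterdir() if p.is_file()}
    for name in sorted(old_names | new_names):
        if name not in new_names:
            problems[name] = "missing in new system"
        elif name not in old_names:
            problems[name] = "missing in old system"
        else:
            old_rows = file_row_counts(old_dir / name)
            new_rows = file_row_counts(new_dir / name)
            if old_rows != new_rows:
                # Counter subtraction keeps only positive counts.
                only_old = sum((old_rows - new_rows).values())
                only_new = sum((new_rows - old_rows).values())
                problems[name] = (
                    f"{only_old} rows only in old, {only_new} rows only in new"
                )
    return problems
```

An empty result means every file pair holds the same rows. Loading whole files into memory is fine for a sketch, but files with hundreds of millions of rows would need a sorted or hashed comparison instead.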
41
The project ran from Oct 2010 to May 2011
42
http://www.pakutaso.com/20130900245post-3233.html
Can we beat it?
43
Moreover...
44
March 2011
45
We were hit by a huge earthquake on March 11, 2011… right when the project was at its climax…
A hole in the wall…
46
But we did it.
47
At a temporary office (like a tako-beya)
48
May 2011
49
We released the new data warehouse!!
50
Hadoop was great
RDB1 vs. Hadoop
Result, by total processing time:
RDB1 161:29:38, Hadoop 99:54:39
Hadoop beat the RDBMS by about 40%!!!!
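The 40% figure can be checked with a little arithmetic, reading the two totals as hours:minutes:seconds for the RDBMS (161:29:38) and for Hadoop (99:54:39). A small Python sketch (the helper name `to_seconds` is only for illustration):

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


rdb_seconds = to_seconds("161:29:38")     # 581378 seconds
hadoop_seconds = to_seconds("99:54:39")   # 359679 seconds

reduction = 1 - hadoop_seconds / rdb_seconds
speedup = rdb_seconds / hadoop_seconds

print(f"time reduction: {reduction:.1%}")  # time reduction: 38.1%
print(f"speedup: {speedup:.2f}x")          # speedup: 1.62x
```

So the total processing time dropped by roughly 38%, which the slide rounds to 40%.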
51
No problem at all!!!
52
Detailed architecture of our DWH
[Diagram: ICHIBA DWH on the Rakuten Shared Hadoop Cluster (Job tracker/Name Node, Data Nodes). Labels: Purchase, Shops, ITEM; Scheduler / Batch Server (Client Node of Hadoop; HiveQL, Perl/Shell/Java); Unload, Load, Interface; files.]
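In this architecture the batch server is a Hadoop client node that runs HiveQL scripts under a scheduler. The slides name Perl, Shell, and Java as the driver languages; the sketch below uses Python instead, purely for illustration. It assumes the Hive command-line client is installed as `hive` and relies on its `-f` (run a script file) and `--hivevar` (pass a variable) options. The script name `daily_purchase.hql` and the variable `target_date` are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional


def build_hive_command(
    script_path: str, variables: Optional[Dict[str, str]] = None
) -> List[str]:
    """Build a Hive CLI command that runs one HiveQL script file."""
    command = ["hive"]
    for name, value in sorted((variables or {}).items()):
        command += ["--hivevar", f"{name}={value}"]
    command += ["-f", script_path]
    return command


def run_hiveql(script_path: str, variables: Optional[Dict[str, str]] = None) -> None:
    """Run the script; raises CalledProcessError if Hive exits non-zero."""
    subprocess.run(build_hive_command(script_path, variables), check=True)


if __name__ == "__main__":
    # Only prints the command; calling run_hiveql() needs a real Hive client.
    print(build_hive_command("daily_purchase.hql", {"target_date": "2013-10-08"}))
```

Separating command construction from execution lets a scheduler log or dry-run each job before it touches the cluster.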
53
Now and future
54
Current situation of our DWH
[Diagram: ICHIBA DWH (Purchase, Shops, ITEM, Review, …) on the Rakuten Shared Hadoop Cluster Data Nodes, with additions marked "New!"]
Total processing time: 99:54:39
New! Total processing time: 80:04:04!!
Tables: 202, HiveQLs: 701
It doubled!
Keeps growing!!!
Image: http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131
55
Future
[Diagram: ICHIBA DWH (Purchase, Shops, ITEM, Review, …) on the Rakuten Shared Hadoop Cluster Data Nodes, with additions marked "New!", including Customer Support tool and BI]
Image: http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131
56
We will expand our services and our use of data!!
Currently, we act like a platform team. But our mission is “analytics”.
We are going to focus more and more on analyzing data!
And we are going to expand and develop other services which use Rakuten Ichiba’s exciting data!
57
Exciting!!
http://www.pakutaso.com/20130926245post-3235.html
58
We are waiting for you!!
59
Join us!
60
Thank you for listening!
Contact me via:
@bangucs
Mitsuo.hangai
English is OK, of course. Japanese is OK too.