DWH/Hadoop in Rakuten Ichiba, Vol.01, Oct/26/2013. Mitsuo Hangai, Sendai Development Group, New Service Development Department, Rakuten, Inc. http://www.rakuten.co.jp/

[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba

DESCRIPTION

Rakuten Technology Conference 2013 "DWH/Hadoop in Rakuten Ichiba" Mitsuo Hangai (Rakuten)

Page 1: [RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba

DWH/Hadoop in Rakuten Ichiba

Vol.01   Oct/26/2013

Mitsuo Hangai

Sendai Development Group

New Service Development Department, Rakuten, Inc.

http://www.rakuten.co.jp/

Page 2

Self introduction

Mitsuo Hangai (半谷 充生), Rakuten, Inc., Service Development Sendai Group

@bangucs

Chinese (汉语)

Page 3

Agenda

About Sendai Branch

What is our data warehouse

Now and future

Page 4

About Sendai Branch

Page 5

Do you know Sendai?

Page 6

About Sendai

Map source: 白地図専門店 (free blank maps of the world and Japan), http://www.freemap.jp/japan/ja_kouiki_japan_big_scale_3.html

Sendai City

Page 7

About Sendai

Page 8

About Sendai branch

Page 9

History of Sendai Development Group

2007. Founded for Pro-sports.

2008. Started Ichiba Business Support and Infoseek operations.

2009. Grew and started Advertisement development.

2010. Started Marriage operations.

2011. Hit by the huge earthquake… Moved to a new office.

2012. Nanjo became GM (he organizes Satellite!).

[Chart: x-axis 2007 to 2013, y-axis 0 to 30; series: Advertisement, Auction, Marriage, Infoseek, Ichiba, Pro-sports]

Page 10

Current work of Rakuten Sendai

International Ichiba: Development & Operation

Central Data Warehouse: Development & Operation

Development & Operation

System replacement: Development & Operation

Development & Operation

Our team!

Page 11

About the usage of Rakuten Ichiba’s Data

Page 12

How and for what we use Rakuten Ichiba’s data

Accounting, granting points

GMS (Gross Merchandise Sales)

Reporting for 500 EC Consultants

Ranking Ichiba

Purchase History

Detecting fraud

Marketing department

And so forth….

Page 13

By the way….

Do you know ….

Page 14

How many orders does Rakuten Ichiba receive per day?

Page 15

A: About 2,000,000 transactions

※ Counted per order, not per item

Page 16

How much data does our data warehouse handle per day?

Page 17

A: About 100 GB (this is not all of the data, only what is needed)

Page 18

How many items does Rakuten Ichiba have?

Page 19

A: About 1,400,000,000 items (as of 2013/10/08)

Page 20

There are some long and funny item names

Page 21

http://item.rakuten.co.jp/wakamaru/sale-2908-50offcp/

This is the name of this item!!

Page 22

http://item.rakuten.co.jp/pascoshop/4901820354426/

This is the name of this item!!

Page 23

http://item.rakuten.co.jp/e-cha/hd-sakusakuwakame/

This is the name of this item!!

Page 24

We have such huge (and funny) data.

Page 25

We must finish processing such huge data by morning…
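
To get a feel for that deadline, here is a rough back-of-the-envelope sketch in Python. The daily figures (about 2,000,000 orders and about 100 GB) come from the slides above. The talk does not say how long the nightly batch window is, so the 6-hour window used here is purely an assumption for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_total: float) -> float:
    """Average rate per second for a quantity accumulated over one day."""
    return daily_total / SECONDS_PER_DAY


def required_throughput(total_bytes: float, window_hours: float) -> float:
    """Bytes per second needed to process total_bytes within window_hours."""
    return total_bytes / (window_hours * 3600)


orders_per_day = 2_000_000      # from the talk
bytes_per_day = 100 * 10**9     # about 100 GB, from the talk (decimal GB assumed)
window_hours = 6                # ASSUMPTION: hypothetical nightly batch window

print(f"{per_second(orders_per_day):.1f} orders/s on average")
mb_per_s = required_throughput(bytes_per_day, window_hours) / 10**6
print(f"{mb_per_s:.2f} MB/s sustained to finish within {window_hours} hours")
```

Under these assumptions the site averages roughly 23 orders per second, and the batch must sustain a few megabytes per second end to end, before counting joins and aggregations over tables with hundreds of millions of records.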

Page 26

Like this (1):

This table has about 200,000,000 records

Page 27

Like this (2):

About 2 meters

Each table has about 200,000,000 records

Page 28

How tough… But it is necessary…

Page 29

A few years ago (until May 2011)

[Diagram. Labels: Purchase, Shops, ITEM; RDB1, RDB2; Unload; Files; Load; Old SelectDB; Scheduler Batch Server (SQL, Perl); Interface files]

107 tables, 378 interface files

Page 30

Problem

RDBMS had problems such as:

- Poor performance…

- Lack of disk space…

- Difficult to expand…

- Servers are expensive!!

Page 31

Really poor… For example:

Page 32

Page 33

How do we solve it?

Page 34

Page 35

Sweet points of Hadoop

Good performance! For batch processing, it performs extremely well.

Easy to expand! Just add data nodes.

No need for high-performance servers! Commodity servers are enough, so we can reduce costs!

Page 36

Bitter points of Hadoop

MapReduce is not easy… We decided to use Hive (which runs MapReduce via an SQL-like query language called HiveQL).

Hive has no “delete” or “insert into” clause, and HiveQL differs from SQL in many ways… We need to think the design through carefully before development.

Hive has high latency… It is suitable only for batch processing.
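
One common way to live without a “delete” clause is to rewrite the whole table, keeping only the rows that should survive. The sketch below builds such a statement as a string in Python. `INSERT OVERWRITE TABLE ... SELECT` is standard HiveQL, but the helper function, the table name, and the column are hypothetical and not taken from the talk.

```python
def emulate_delete(table: str, delete_condition: str) -> str:
    """Build a HiveQL statement that 'deletes' rows by rewriting the table.

    Without DELETE, the usual workaround is to overwrite the table with
    every row that does NOT match the delete condition.
    """
    return (
        f"INSERT OVERWRITE TABLE {table}\n"
        f"SELECT * FROM {table}\n"
        f"WHERE NOT ({delete_condition})"
    )


# Hypothetical table and column names, for illustration only.
print(emulate_delete("purchase_history", "order_date < '2010-01-01'"))
```

Two caveats apply to this sketch: rows for which the condition evaluates to NULL are dropped as well, and the string is assembled without any escaping, so it is only suitable for trusted, hand-written conditions.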

Page 37

Then we decided to use Hadoop.

Page 38

Rakuten’s Shared Hadoop Cluster

2009: 15 nodes, 50TB

Recommend, Ranking, Item data analysis, Behavior analysis

2011: 69 nodes, 300TB

Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis

2013: 30 nodes, 1PB

Suggest, Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis, Personalize, Granting Point

Page 39

Plan:

[Diagram. Labels: Purchase, Shops, ITEM; Unload; Files; Load; Hadoop Cluster (69 nodes); New SelectDB (called Ichiba DWH); Scheduler Batch Server (HiveQL, Shell/Java); Interface files]

107 tables, 378 interface files
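
As a rough sketch of how a batch server might drive such a plan, the Python below builds one Hive CLI invocation per HiveQL file. `hive -f <file>` and `--hivevar name=value` are real Hive CLI options. Everything else (the file names, the `target_date` variable, the function names) is an assumption for illustration and not the team's actual scheduler.

```python
from pathlib import Path
from typing import Iterable, List


def build_hive_command(hql_file: Path, target_date: str) -> List[str]:
    """Return the argv for running one HiveQL script for a given date."""
    return ["hive", "--hivevar", f"target_date={target_date}", "-f", str(hql_file)]


def plan_batch(hql_files: Iterable[Path], target_date: str) -> List[List[str]]:
    """Build one command per HiveQL file, ordered by file name."""
    return [build_hive_command(p, target_date) for p in sorted(hql_files)]


# Hypothetical script names. A real scheduler would hand each command to
# subprocess.run(cmd, check=True) and write the query output to an interface file.
scripts = [Path("020_shops.hql"), Path("010_purchase.hql")]
for cmd in plan_batch(scripts, "2013-10-08"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes the plan easy to inspect or dry-run before anything touches the cluster.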

Page 40

We had to transfer data from the old system to Hadoop:

107 tables!!

We had to check every difference between the old system and the new system:

378 files!! (= 378 HiveQL scripts)
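
A check like this lends itself to automation. The following is a minimal sketch, assuming each interface file is a text file whose row order does not matter. The function names and the sample rows are hypothetical; the talk does not describe how the comparison was actually done.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def diff_rows(old_text: str, new_text: str) -> Dict[str, Counter]:
    """Compare two outputs as multisets of lines, ignoring row order."""
    old_rows = Counter(old_text.splitlines())
    new_rows = Counter(new_text.splitlines())
    return {
        "only_in_old": old_rows - new_rows,  # rows missing from the new system
        "only_in_new": new_rows - old_rows,  # rows the old system never produced
    }


def files_match(old_file: Path, new_file: Path) -> bool:
    """True when both files contain the same rows the same number of times."""
    result = diff_rows(old_file.read_text(), new_file.read_text())
    return not result["only_in_old"] and not result["only_in_new"]


# Tiny in-memory example: one value differs between the two systems.
old = "shop1\t100\nshop2\t250\n"
new = "shop2\t250\nshop1\t101\n"
report = diff_rows(old, new)
print(dict(report["only_in_old"]))
print(dict(report["only_in_new"]))
```

This version holds both files in memory, which is fine for small outputs. For files with hundreds of millions of rows, comparing sorted streams or per-file checksums would be a more realistic choice.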

Page 41

The project ran from Oct 2010 to May 2011

Page 42

http://www.pakutaso.com/20130900245post-3233.html

Can we beat it?

Page 43

Moreover...

Page 44

March 2011

Page 45

We were hit by a huge earthquake on March 11, 2011… The project was at its climax…

A hole in the wall…

Page 46

But we did it.

Page 47

At a temporary office (like a Tako-beya)

Page 48

May 2011

Page 49

We released the new data warehouse!!

Page 50

Hadoop was great

RDB1 vs. Hadoop

Result, on a total processing time basis:

161:29:38 (RDB1) vs. 99:54:39 (Hadoop)

Hadoop beat the RDBMS by about 40%!!!!
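
The figure can be checked with a few lines of Python. This sketch assumes the two durations are written as hours:minutes:seconds and that the longer one belongs to the old RDBMS-based system, which is what the slide implies.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


rdb = to_seconds("161:29:38")     # old RDBMS-based system
hadoop = to_seconds("99:54:39")   # new Hadoop-based system

reduction = (rdb - hadoop) / rdb  # fraction of processing time saved
speedup = rdb / hadoop            # how many times faster the new system is

print(f"time saved: {reduction:.1%}, speedup: {speedup:.2f}x")
```

The reduction works out to roughly 38% of the total processing time, which the slide rounds to 40%; expressed as a speedup, the new system is about 1.6 times faster.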

Page 51

No problem at all!!!

Page 52

Detail of architecture of our DWH

[Diagram. Labels: Purchase, Shops, ITEM; Unload; Files; Load; Rakuten Shared Hadoop Cluster (Job tracker/Name Node, Data Nodes); ICHIBA DWH; Scheduler Batch Server (Client Node of Hadoop) running HiveQL and Perl/Shell/Java; Interface files]

Page 53

Now and future

Page 54

Current situation of our DWH

[Diagram. Labels: ICHIBA DWH; Purchase, Shops, ITEM, Review; Rakuten Shared Hadoop Cluster; Data Nodes; two items marked “New!”]

Image source: http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131

Total processing time: 99:54:39 → 80:04:04!!

Tables: 202, HiveQLs: 701

It doubled!

Keeps growing!!!
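
The growth can be quantified with a little arithmetic. The sketch below assumes that the earlier figures of 107 tables and 378 HiveQL files are the baseline for "doubled", and that 99:54:39 is the earlier total processing time while 80:04:04 is the current one. That pairing is an inference from the slides, not something the talk states outright.

```python
from datetime import timedelta


def parse_duration(hms: str) -> timedelta:
    """Parse an 'H:MM:SS' string where hours may exceed 24."""
    h, m, s = map(int, hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


before = {"tables": 107, "hiveqls": 378, "time": parse_duration("99:54:39")}
after = {"tables": 202, "hiveqls": 701, "time": parse_duration("80:04:04")}

table_growth = after["tables"] / before["tables"]
hiveql_growth = after["hiveqls"] / before["hiveqls"]
# Dividing one timedelta by another yields a plain float ratio.
time_change = after["time"] / before["time"] - 1

print(f"tables x{table_growth:.2f}, HiveQLs x{hiveql_growth:.2f}, "
      f"processing time {time_change:+.1%}")
```

Under these assumptions the workload grew by a factor of nearly 1.9 while the total processing time fell by about 20%.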

Page 55

Future

[Diagram. Labels: ICHIBA DWH; Purchase, Shops, ITEM, Review; Rakuten Shared Hadoop Cluster; Data Nodes; Customer Support tool; BI; four items marked “New!”]

Image source: http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131

Page 56

We will expand our services and our use of data!!

Currently, we act like a platform team. But our mission is “analytics”.

We are going to focus more and more on analyzing data!

And we are going to expand and develop other services which use Rakuten Ichiba’s exciting data!

Page 57

Exciting!!

http://www.pakutaso.com/20130926245post-3235.html

Page 58

We are waiting for you!!

Page 59

Join us!

Page 60

Thank you for listening!

Contact me via:

@bangucs

Mitsuo.hangai

[email protected]

English is OK, of course. Japanese is OK too (日本語でもおk).