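This talk shares how to use HBase to build a scalable system. The outline is as follows:
1. HBase brief introduction: a short introduction to how HBase works
2. Rowkey (Schema) design: rowkey design is closely tied to application performance, so designing the rowkey is a very important topic in HBase
3. Best practice in Java: how to step on fewer landmines when working with HBase
4. API Blueprint: how to turn the dataflow designed for HBase into documentation
5. HBase Dataflow: a tool for passing on the designed dataflow so that it is easy to preserve
* Keyword: HBase, Rowkey Design, Dataflow
* HBase Dataflow: http://kewangtw.github.io/hbase-dataflow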
How to build a scalable SNS using HBase
Kewang
三竹資訊
Who I am
● 王慕羣
● Java / Node.js / AngularJS
● SQL-like / HBase
Github: kewangtw
Facebook: kewangtw
Linkedin: kewangtw
Slideshare: kewang
Mail: [email protected]
Who Mitake is
三竹資訊
Everyone pronounces it "Mitake", but at our company we pronounce it "Mitake"
Mitake is not pronounced "MiTAC"!!!
● SMS platform
● Mobile trading: too many to count
● Mobile banking: 臺銀, 土銀, 富邦, 台新, 聯邦, 臺企銀, 遠銀, 華南, 澳盛, 郵局, 合庫, 渣打, and others (18 banks in total)
● Property and life insurance: 全球, 明台, 新光, 新安東京, 富邦, and others
● Others: udn買東西, 手機逛週年慶, 財政園地, 證交所, 綜所稅申報, and others
System Architecture
System Architecture (Backend)
System Architecture (Frontend)
MOPCON 2014 CfP
Agenda
● Rowkey design
● Best Practice in Java
● API Blueprint
● HBase Dataflow
Rowkey design
Rowkey design - Avoid hotspotting
● Sorted lexicographically
[Figure: rowkeys foo-1, foo-2, foo-3, foo-4 and Region 1, Region 2, Region 3]
● Salting, Hashing or Reversing
[Figure: rowkeys foo-1, foo-2, foo-3, foo-4 pass through a BOX and become a-foo-1, b-foo-2, c-foo-3, d-foo-4 across Region 1, Region 2, Region 3]
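The salting idea on this slide can be sketched in plain Java. This is only an illustration under assumptions: the class name `SaltedRowkey`, the letter prefixes, and the simple hash are hypothetical choices, not what the BOX on the slide actually does, so the prefixes it produces need not match the a-, b-, c-, d- assignment shown above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: prefixes a rowkey with a salt derived from the key itself. */
final class SaltedRowkey {
    private SaltedRowkey() {}

    /**
     * Returns "<salt>-<rowkey>", where the salt is one of the first
     * `buckets` lowercase letters, chosen from a hash of the key bytes.
     * The same key always maps to the same salt, so point reads still work.
     */
    static String salt(String rowkey, int buckets) {
        if (buckets < 1 || buckets > 26) {
            throw new IllegalArgumentException("buckets must be in 1..26");
        }
        int hash = 0;
        for (byte b : rowkey.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b; // simple polynomial hash, an assumption
        }
        char prefix = (char) ('a' + Math.floorMod(hash, buckets));
        return prefix + "-" + rowkey;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"foo-1", "foo-2", "foo-3", "foo-4"}) {
            System.out.println(key + " -> " + salt(key, 4));
        }
    }
}
```

Because the salt is computed from the key, a reader can recompute it for a Get. A range Scan, however, has to visit every salt bucket, which is the usual trade-off of this technique.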
Rowkey design - Refining ID
● SHA1: 40 bytes
– 3204c3aefcca4a556f0f7547d056235fa823af3a
● UUID: 36 bytes
– 22bfad60-39d2-11e4-916c-0800200c9a66
● MD5: 32 bytes
– 27734b3f4f98e709f58c6ddb0193164e
Too long !!!
● X Algorithm: 12 bytes
– Auto increment
– Ordered
– Counts to 2.17E21
– e.g: H00000001B12
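The slides do not disclose how the X Algorithm works, so the following is not that algorithm. It is a minimal sketch of one way to get a short, auto-incrementing ID whose string order matches its numeric order: a fixed-width, zero-padded base-36 counter behind a type prefix. The class name `OrderedId`, the base, and the padding scheme are all assumptions.

```java
import java.util.Locale;

/** Hypothetical ID formatter; NOT the "X Algorithm" from the slides. */
final class OrderedId {
    private static final int WIDTH = 12; // total length in characters

    private OrderedId() {}

    /** Builds a 12-character ID: a prefix followed by a zero-padded base-36 counter. */
    static String format(String prefix, long counter) {
        if (counter < 0) {
            throw new IllegalArgumentException("counter must be >= 0");
        }
        String digits = Long.toString(counter, 36).toUpperCase(Locale.ROOT);
        int pad = WIDTH - prefix.length() - digits.length();
        if (pad < 0) {
            throw new IllegalArgumentException("counter does not fit in " + WIDTH + " characters");
        }
        // Fixed width plus '0'-'9' < 'A'-'Z' in ASCII keeps lexicographic order numeric.
        return prefix + "0".repeat(pad) + digits;
    }

    public static void main(String[] args) {
        System.out.println(format("H", 1));
        System.out.println(format("H", 36));
    }
}
```

Fixed width matters here: without the zero padding, "H10" would sort before "H9", and rowkeys would no longer scan in creation order.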
Rowkey design - Authenticating
● Get frequently
User Id ID0000001A3B
Access Token d66e3b70-3666-11e4-8c21-0800200c9a66
Expired Time 1410077636
● Multi-login
User Id ID0000001A3B
Token 0 d66e3b70-3666-11e4-8c21-0800200c9a66+1410077636+Device1
Token 1 92e84bf9-7852-492d-b56a-13ba7acb8fb5+1410123456+Device2
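In the multi-login layout above, each token cell packs three fields joined by "+". A minimal sketch of decoding one cell follows. The class name `LoginToken`, the field names, and the reading of the middle field as epoch seconds are assumptions drawn from the example values, not from the speaker's code.

```java
import java.util.regex.Pattern;

/** Hypothetical decoder for a "token+expiredTime+device" cell value. */
final class LoginToken {
    final String accessToken;
    final long expiredTime; // assumed to be epoch seconds
    final String device;

    private LoginToken(String accessToken, long expiredTime, String device) {
        this.accessToken = accessToken;
        this.expiredTime = expiredTime;
        this.device = device;
    }

    /** Splits the cell on a literal '+'; Pattern.quote is needed because '+' is a regex metacharacter. */
    static LoginToken parse(String cell) {
        String[] parts = cell.split(Pattern.quote("+"));
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected token+expiredTime+device: " + cell);
        }
        return new LoginToken(parts[0], Long.parseLong(parts[1]), parts[2]);
    }

    boolean isExpired(long nowEpochSeconds) {
        return nowEpochSeconds >= expiredTime;
    }

    public static void main(String[] args) {
        LoginToken t = parse("d66e3b70-3666-11e4-8c21-0800200c9a66+1410077636+Device1");
        System.out.println(t.device + " expires at " + t.expiredTime);
    }
}
```

A single Get on the user's row then returns every active login, and the application decides per cell whether a token is still valid.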
Rowkey design - Rice dumpling
Id ME00000024AC
Title Announce
Content We are hiring
Id ME00000024AC.ME00000037ZZ
Title (n/a)
Content I want to join your team !!!
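The reply's rowkey above is the parent message Id, a ".", and the reply's own Id, so a parent and its replies sit next to each other in sorted order. The sketch below imitates that with a `TreeMap` standing in for an HBase table. The class name `MessageThread` and its methods are hypothetical, and `TreeMap` orders strings by UTF-16 code unit, which agrees with byte order only for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a table whose rows are sorted by rowkey. */
final class MessageThread {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowkey, String content) {
        rows.put(rowkey, content);
    }

    /** Returns the parent row and every row keyed "<parentId>.<childId>", in key order. */
    List<String> thread(String parentId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.tailMap(parentId, true).entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(parentId)) {
                break; // left the contiguous block of keys sharing the prefix
            }
            if (key.equals(parentId) || key.startsWith(parentId + ".")) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MessageThread table = new MessageThread();
        table.put("ME00000024AC", "We are hiring");
        table.put("ME00000024AC.ME00000037ZZ", "I want to join your team !!!");
        table.put("ME00000025AA", "Unrelated message");
        System.out.println(table.thread("ME00000024AC"));
    }
}
```

In HBase the same read would be a single prefix Scan starting at the parent's rowkey, which is what makes this key layout attractive.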
Rowkey design - Access controlling
Only A and B can see it. Of course, that includes me.
● When posting a message (Write)
– Generate an ACL Id
– Put the ACL Id into the message and into each reader's ACLs
● When reading my messages (Read)
– Scan my ACLs and all messages
– If my ACLs contain the message's ACL Id, the message can be shown
Write
ACL hash hash(A, B, K)+C+R
ACL Id AI0070AD
Message Id ME00000024AC
Title Announce
Content We are hiring
ACL Id AI0070AD
ACL Id+User Id AI0070AD+A AI0070AD+B AI0070AD+K
Create 1 1 1
Read 1 1 1
Update 0 0 0
Delete 0 0 0
User Id+ACL Id A+AI0070AD B+AI0070AD K+AI0070AD
Create 1 1 1
Read 1 1 1
Update 0 0 0
Delete 0 0 0
Read
User Id+ACL Id K+AI0070AD K+AI028577
Create 1 1
Read 1 1
Update 0 1
Delete 0 1
Message Id ME00000024AC
Title Announce
Content We are hiring
ACL Id AI0070AD
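The write and read steps above can be walked through with in-memory maps. This is a rough sketch, not the speaker's implementation: `AclStore`, its method names, and the use of `TreeMap` in place of HBase tables are assumptions, and only the "User Id+ACL Id" rows and the Read flag are modelled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory model of the ACL write and read flows. */
final class AclStore {
    // "User Id+ACL Id" -> Read flag (stands in for the Read column)
    private final TreeMap<String, Boolean> userAcls = new TreeMap<>();
    // Message Id -> ACL Id stored on the message row
    private final TreeMap<String, String> messageAcl = new TreeMap<>();

    /** Write path: attach the ACL Id to the message and to every reader's ACLs. */
    void post(String messageId, String aclId, List<String> readers) {
        messageAcl.put(messageId, aclId);
        for (String user : readers) {
            userAcls.put(user + "+" + aclId, true);
        }
    }

    /** Read path: scan the user's ACL rows, then keep messages whose ACL Id is among them. */
    List<String> visibleTo(String userId) {
        String prefix = userId + "+";
        Set<String> myAcls = new HashSet<>();
        for (Map.Entry<String, Boolean> e : userAcls.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past this user's rows
            }
            if (e.getValue()) {
                myAcls.add(e.getKey().substring(prefix.length()));
            }
        }
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, String> e : messageAcl.entrySet()) {
            if (myAcls.contains(e.getValue())) {
                visible.add(e.getKey());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        AclStore store = new AclStore();
        store.post("ME00000024AC", "AI0070AD", List.of("A", "B", "K"));
        System.out.println(store.visibleTo("K"));
    }
}
```

Putting the user Id first in the rowkey is what turns "scan my ACLs" into one prefix scan; the mirrored "ACL Id+User Id" rows on the slide serve the opposite lookup.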
Rowkey design - Statistics
● Variety of types
– e.g., Likes, Comments, Registrations
● By unit
– i.e., hourly, daily, weekly, monthly, yearly
● By user
Unit+Time Base+User Id+Type H+20140908+AAA+Like
11 7
15 22
21 15
Unit+Time Base+User Id+Type D+201409+AAA+Like
08 44
11 58
● Sum counts from 2014/9/7 to 2014/9/20 group by user or counting type
● Sum AAA's counts from 2014/9/7 to 2014/9/20 group by counting type
● Sum AAA's like counts from 2014/9/7 to 2014/9/20
Unit+Time Base+User Id+Type D+201409+AAA+Like
02 20
08 52
09 41
... ...
20 55
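The last query above (AAA's like counts from 2014/9/7 to 2014/9/20) reduces to reading one daily row and summing the columns for days 07 through 20. A minimal sketch follows, assuming the column qualifier is a two-digit day of month; `DailyCounts` and its methods are hypothetical names, and the sample data holds only the rows visible on the slide, not the elided ones.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helpers for the Unit+Time Base+User Id+Type layout. */
final class DailyCounts {
    private DailyCounts() {}

    /** Joins the four rowkey parts with "+", as in D+201409+AAA+Like. */
    static String rowkey(String unit, String timeBase, String userId, String type) {
        return String.join("+", unit, timeBase, userId, type);
    }

    /** Sums the columns whose qualifier (day of month, "01".."31") lies in [fromDay, toDay]. */
    static long sum(NavigableMap<String, Long> columns, String fromDay, String toDay) {
        long total = 0;
        for (long value : columns.subMap(fromDay, true, toDay, true).values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Only the rows visible on the slide; the "..." rows are unknown.
        NavigableMap<String, Long> row = new TreeMap<>();
        row.put("02", 20L);
        row.put("08", 52L);
        row.put("09", 41L);
        row.put("20", 55L);
        System.out.println(rowkey("D", "201409", "AAA", "Like") + " = " + sum(row, "07", "20"));
    }
}
```

Zero-padded qualifiers keep the days in order, so the date range becomes a simple column range within one row.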
Rowkey design - Summary
● Avoid hotspotting
● Refining ID
● Authenticating
● Rice dumpling
● Access controlling
● Statistics
Best Practice in Java
No. 1 USE HashMap
NoSQL is different from RDBMS
OLD
public class Validation1 {
    private String accessToken;
    private long expiredTime;

    public Validation1() {
        accessToken = null;
        expiredTime = -1;
    }

    public String getAccessToken() {
        return accessToken;
    }

    public void setAccessToken(String accessToken) {
        this.accessToken = accessToken;
    }

    public long getExpiredTime() {
        return expiredTime;
    }

    public void setExpiredTime(long expiredTime) {
        this.expiredTime = expiredTime;
    }
}
NEW
public static final String ACCESS_TOKEN = "access token";

private Map<String, byte[]> putMap;

public Validation1() {
    super();
}

public Validation1(Result result) {
    super(result);
}

public String getAccessToken() {
    return Bytes.toString(putMap.get(ACCESS_TOKEN));
}

public void setAccessToken(String accessToken) {
    putMap.put(ACCESS_TOKEN, Bytes.toBytes(accessToken));
}
● Bytes.toXXX() always returns type XXX or null
– Or throws an exception
● Improve default value in Java
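A self-contained sketch of the map-backed style in the NEW slide is given below. To stay runnable without the HBase client, it swaps `Bytes.toBytes` and `Bytes.toString` for `StandardCharsets` and `ByteBuffer`; the class name `TokenDomain`, the `EXPIRED_TIME` key, and the -1 default for a missing expiry are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed domain object; fields live in a column-name -> bytes map. */
final class TokenDomain {
    static final String ACCESS_TOKEN = "access token";
    static final String EXPIRED_TIME = "expired time"; // hypothetical column name

    private final Map<String, byte[]> putMap = new HashMap<>();

    String getAccessToken() {
        byte[] raw = putMap.get(ACCESS_TOKEN);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    void setAccessToken(String accessToken) {
        putMap.put(ACCESS_TOKEN, accessToken.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns -1 when the column is absent, instead of failing on a null byte array. */
    long getExpiredTime() {
        byte[] raw = putMap.get(EXPIRED_TIME);
        return raw == null ? -1L : ByteBuffer.wrap(raw).getLong();
    }

    void setExpiredTime(long expiredTime) {
        putMap.put(EXPIRED_TIME, ByteBuffer.allocate(Long.BYTES).putLong(expiredTime).array());
    }

    public static void main(String[] args) {
        TokenDomain d = new TokenDomain();
        System.out.println(d.getAccessToken() + " / " + d.getExpiredTime());
        d.setAccessToken("d66e3b70-3666-11e4-8c21-0800200c9a66");
        d.setExpiredTime(1410077636L);
        System.out.println(d.getAccessToken() + " / " + d.getExpiredTime());
    }
}
```

Keeping the raw bytes in a map means a new column needs only a new key and accessor, and the getter is the one place where an absent column turns into a sensible default.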
No. 2 ONE table, MULTI domains
NoSQL is different from RDBMS
● In RDBMS (at design time)
– Primary key affects only one column
– Schema is fixed
– DAO serves one domain
● In NoSQL (at runtime)
– Rowkey always changes
– Schema always changes
– DAO serves many domains
No. 2 ONE table, MULTI domains
User Id ID0000001A3B
Access Token d66e3b70-3666-11e4-8c21-0800200c9a66
Expired Time 1410077636
User Id+ACL Id ID0000001A3B+AI0070AD
Create 1
Read 1
Update 0
Delete 0
No. 2 ONE table, MULTI domains
● A DAO maps to a domain in RDBMS
[Figure: DB, DAO, Domain A]
● A DAO maps to multiple domains in NoSQL
● Build a middle layer to translate multiple domains
[Figure: DB, DAO, Schema, Domain A1, Domain A2, Domain A3]
No. 2 ONE table, MULTI domains
private String checkDomainType(Result result) {
    if (result.isEmpty()) {
        return null;
    } else {
        String rowkey = Bytes.toString(result.getRow());
        // Note: String.split treats DIVIDER as a regular expression.
        String[] splitKey = rowkey.split(DIVIDER);

        // The number of rowkey parts decides the domain type.
        if (splitKey.length == 1) {
            return DOMAIN_TYPE_VALIDATION1;
        } else if (splitKey.length == 2) {
            return DOMAIN_TYPE_VALIDATION2;
        } else {
            return DOMAIN_TYPE_VALIDATION3;
        }
    }
}
Customize
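The dispatch in `checkDomainType` can be tried without an HBase `Result` by passing the rowkey as a string. In this sketch the "+" value of `DIVIDER` and the three constant values are assumptions (the slides do not show them); since "+" is a regex metacharacter, the divider is wrapped in `Pattern.quote` before splitting.

```java
import java.util.regex.Pattern;

/** Hypothetical standalone version of the rowkey-based domain dispatch. */
final class DomainTypes {
    static final String DIVIDER = "+"; // assumption, taken from keys like "User Id+ACL Id"
    static final String DOMAIN_TYPE_VALIDATION1 = "validation1"; // placeholder values
    static final String DOMAIN_TYPE_VALIDATION2 = "validation2";
    static final String DOMAIN_TYPE_VALIDATION3 = "validation3";

    private DomainTypes() {}

    /** Returns the domain type for a rowkey, or null when there is no rowkey. */
    static String checkDomainType(String rowkey) {
        if (rowkey == null || rowkey.isEmpty()) {
            return null;
        }
        // Quote the divider so it is matched literally rather than as a regex.
        String[] splitKey = rowkey.split(Pattern.quote(DIVIDER), -1);
        switch (splitKey.length) {
            case 1:
                return DOMAIN_TYPE_VALIDATION1;
            case 2:
                return DOMAIN_TYPE_VALIDATION2;
            default:
                return DOMAIN_TYPE_VALIDATION3;
        }
    }

    public static void main(String[] args) {
        System.out.println(checkDomainType("ID0000001A3B"));
        System.out.println(checkDomainType("ID0000001A3B+AI0070AD"));
    }
}
```

With an unquoted "+" the call `rowkey.split("+")` would throw a `PatternSyntaxException`, so quoting is worth doing whenever the divider is not known to be regex-safe.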
No. 3
NoSQL is different from RDBMS
REALLY !!!
API Blueprint
API Blueprint - Introduction
● Web API Language
● Pure Markdown
● Design for Humans
● Understandable by Machines
● Powerful Tooling
● Easy Lifecycle
API Blueprint - Hello World
API Blueprint - Complex
HBase dataflow
Preserve your domain know-how
HBase dataflow - Solve what ?
● How to preserve system know-how about Put, Get, Scan, and other HBase operations?
Paper & Pen ?
Redmine / KM ?
http://kewangtw.github.io/hbase-dataflow/
HBase dataflow - introduction
● HBase operation - Put, Delete, Get, Scan, Filters
● Export
– to JSON / Markdown
– to PNG / PDF
● Import from JSON
● Write title & summary
● Open source
Live DEMO
Design API Step by Step
1. Paper & pen are always your friends
2. Use HBase dataflow to simulate the data's flow
3. Export it
References
● HBase in Action
● apiblueprint, aglio
● HBase dataflow