Presentation on the Lily RowLog library as presented to the HBase/Hadoop meetup on the eve of Hadoop World 2011
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Lily
A SMART DATA PLATFORM
MAKING BIG DATA APPS EASY
the (lily) rowlog library
Lily Architecture (components)
Lily Architecture (components) ?
Lily 101
» data repository on top of HBase
» records with fields
» rich data types + schema
» versioning
» Java + REST api
» indexes into Solr (et al)
» a bunch more: smart data at scale, made easy
» Apache license - www.lilyproject.org
More info?
Hadoop World
Tuesday 1:15PM
Met Ballroom
use of rowlog inside lily
» feed Solr index with (Lily|HBase) record updates
» maintain secondary indices (i.e. linkindex)
» shared concerns:
  » reliability
  » consistency
  » manageability
  » (scalability)
UC1: message queue (mq)
[Diagram: record, Indexer, Solr index entry; arrows labelled "update"]
possible failure
UC1: message queue (mq)
[Diagram: record, MQ, three Indexer instances, Solr index entry; arrows labelled "update"; a "?" marker]
UC1: message queue (mq)
[Diagram: record, MQ, Indexer, Solr index entry; arrows labelled "update"]
MQ requirements
» async (cope with Solr ‘lag’)
» guaranteed execution
» no concurrent processing of 2 msg about the same record
» no extra tech (HBase should be good enough)
  » management complexity
  » benefits from scalability, resilience, etc
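
The requirement that two messages about the same record are never processed concurrently can be sketched with a per-record lock. This is an in-memory illustration only: `RecordLocks` and `try_process` are hypothetical names, not part of the RowLog API, and a real deployment would need a lock that is shared across processes.

```python
import threading


class RecordLocks:
    """Hypothetical in-memory stand-in for a per-record lock."""

    def __init__(self):
        self._guard = threading.Lock()
        self._busy = set()  # record ids currently being processed

    def try_acquire(self, record_id):
        # Returns False when another worker already handles this record.
        with self._guard:
            if record_id in self._busy:
                return False
            self._busy.add(record_id)
            return True

    def release(self, record_id):
        with self._guard:
            self._busy.discard(record_id)


def try_process(locks, message, handler):
    """Run handler(message) unless the message's record is already busy.

    Returns True if the message was processed, False if it should be retried.
    """
    record_id = message["record_id"]
    if not locks.try_acquire(record_id):
        return False  # leave the message on the queue for a later attempt
    try:
        handler(message)
        return True
    finally:
        locks.release(record_id)
```

A message that is skipped stays on the queue, so guaranteed execution is preserved as long as the caller retries it later.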
UC2: write-ahead-log (WAL)
» secondary actions
  » pushing messages onto MQ (!)
  » updating secondary indices (i.e. linkindex)
» requirements
  » sec. actions eventually get executed, in predefined order
  » further updates to record denied until sec. actions succeeded
  » synchronous
  » pre-update: check WAL for outstanding actions + cleanup mechanism
[Diagram labels: listener, RowLog]
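
The WAL requirements can be sketched as a small in-memory model: each update records its secondary actions, runs them synchronously in a fixed order, and a later update first retries whatever is still outstanding. `WalRecordStore` and `PendingActionsError` are hypothetical names for illustration. In HBase the WAL entry and the record update would presumably be written in one atomic row operation, which this sketch does not model.

```python
class PendingActionsError(Exception):
    """A secondary action is still outstanding for this record."""


class WalRecordStore:
    """Hypothetical in-memory model; not the RowLog API."""

    def __init__(self, secondary_actions):
        # secondary_actions: ordered list of (name, callable(record_id, value)).
        self._order = [name for name, _ in secondary_actions]
        self._actions = dict(secondary_actions)
        self.records = {}  # record id -> current value
        self.wal = {}      # record id -> pending [(action name, value), ...]

    def _run_pending(self, record_id):
        pending = self.wal.get(record_id, [])
        while pending:
            name, value = pending[0]
            try:
                self._actions[name](record_id, value)
            except Exception as exc:
                raise PendingActionsError(record_id) from exc
            pending.pop(0)  # forget an action only once it has succeeded
        self.wal.pop(record_id, None)

    def update(self, record_id, value):
        # Pre-update: finish anything left over from an earlier update.
        # If that fails, the new update is denied.
        self._run_pending(record_id)
        # Log the secondary actions in their predefined order, then update.
        self.wal[record_id] = [(name, value) for name in self._order]
        self.records[record_id] = value
        # Synchronous execution of the secondary actions.
        self._run_pending(record_id)
```

If an action fails, the entry stays in `wal` and every further `update` on that record raises until the retry succeeds.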
the rowlog library
[Diagram: RowLog; subscription (x4); listener (x2): VM, Netty; queue storage (HBase): global, row-local]
global queue
» separate HBase table
» 1 msg per record update per subscription
» key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr
» rowlog processor (single instance, managed by ZK)
» data always appended/deleted from table end (boo!)
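
The key layout above can be sketched as a byte string built from its parts. The slide gives only the order of the fields; the encoding used here (one-byte shard prefix, length-prefixed subscription ID, 8-byte big-endian integers) is an assumption, and `global_queue_key` is a hypothetical helper.

```python
import struct


def global_queue_key(subscription_id, timestamp_ms, rowkey, seq_nr, shard_id=None):
    """Build (shard id +) subscription ID + timestamp + rowkey + sequence nr.

    All encoding choices are assumptions made for illustration.
    """
    sub = subscription_id.encode("utf-8")
    if len(sub) > 255:
        raise ValueError("subscription ID too long for a one-byte length prefix")
    parts = []
    if shard_id is not None:
        parts.append(bytes([shard_id]))            # assumed: one-byte shard prefix
    parts.append(bytes([len(sub)]) + sub)          # assumed: length-prefixed ID
    parts.append(struct.pack(">q", timestamp_ms))  # big-endian: byte order follows time
    parts.append(rowkey)                           # rowkey of the data-table row
    parts.append(struct.pack(">q", seq_nr))
    return b"".join(parts)
```

With a big-endian timestamp, keys of one subscription sort oldest first, which fits the remark that messages are always added at the end of the table.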
row-local queue
[Diagram: RECORDS table (HBase); each of ROW 1 to ROW 4 has a row-local queue part and a DATA part]
row-local queue
[Diagram: ROW X, ROW Y, ROW Z with column families CF1 and CF2; payload: data per message ID (1, 2); execution state: state per consumer id, per message ID (1, 2)]
why row-local queue?
» predates Inbox-concept (Google Megastore)
» msgs will appear on rowlog if and only if updates have really happened
  » rely on atomic row operation guarantee of HBase
  » msgs on global queue without local counterparts can be discarded
» ‘msgs’ on global rowlog can be small
  » just point to msgs in row-local queue
  » actual payload sits there
» optimized processing of msgs per row (i.e. combine)
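
The relation between the two queues can be sketched with plain dictionaries: the global queue holds small pointers, the payload sits in the row, and a pointer without a row-local counterpart is dropped. `process_global_queue` and the data shapes are hypothetical and only model the idea in memory.

```python
def process_global_queue(global_queue, rows, handler):
    """Process pointer messages; drop those without a row-local counterpart.

    global_queue: list of (rowkey, message_id) pointers, oldest first.
    rows: dict rowkey -> {"data": ..., "queue": {message_id: payload}}.
    Returns the list of pointers that were discarded.
    """
    discarded = []
    for rowkey, message_id in list(global_queue):
        local_queue = rows.get(rowkey, {}).get("queue", {})
        if message_id not in local_queue:
            # No local counterpart: the row update never took place, so drop it.
            discarded.append((rowkey, message_id))
        else:
            handler(rowkey, local_queue[message_id])  # payload lives in the row
            del local_queue[message_id]
        global_queue.remove((rowkey, message_id))
    return discarded
```

If `handler` raises, the pointer and the payload both stay in place, so the message can be retried.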
rowlog sharding
» MQ and WAL tables tend to be smallish
  » MQ depends on performance of Solr indexing
  » WAL size = number of simultaneous operations
» risk for contention (all data in one region)
➡ introduction of RowLog sharding (Lily 1.1)
➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this
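
One way to spread a small queue table over several key ranges is to prefix each queue key with a shard id. The slides do not say how Lily picks the shard; the sketch below assumes a stable hash of the data-table rowkey, and `shard_for` and `sharded_key` are hypothetical helpers.

```python
import zlib


def shard_for(rowkey, num_shards):
    """Assumed scheme: stable hash of the rowkey modulo the shard count."""
    if not 1 <= num_shards <= 256:
        raise ValueError("num_shards must fit in a one-byte prefix")
    return zlib.crc32(rowkey) % num_shards


def sharded_key(rowkey, queue_key, num_shards):
    # A one-byte shard prefix makes keys of different shards sort into
    # different key ranges, which can then live in different regions.
    return bytes([shard_for(rowkey, num_shards)]) + queue_key
```

Because the shard depends only on the rowkey, all messages about one record land in the same shard and keep their relative order there.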
last words
» RowLog library can be used independent from Lily (!)
» part of the Lily source tree
» Apache license
» www.lilyproject.org
» shameless plug: go and check out Lily, HBase+Solr-backed repository for content-centric apps
Thank you!
for your attention
for your questions
» @stevenn