IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org Lily A SMART DATA PLATFORM MAKING BIG DATA APPS EASY

The Lily RowLog library

DESCRIPTION

Presentation on the Lily RowLog library as presented to the HBase/Hadoop meetup on the eve of Hadoop World 2011

Page 1: The Lily RowLog library

Lily A SMART DATA PLATFORM MAKING BIG DATA APPS EASY

Page 2: The Lily RowLog library

the (lily) rowlog library

Page 3: The Lily RowLog library

Lily architecture (components)

Page 4: The Lily RowLog library

Lily architecture (components)?

Page 5: The Lily RowLog library

Lily 101

» data repository on top of HBase
» records with fields
» rich data types + schema
» versioning
» Java + REST API
» indexes into Solr (et al.)
» a bunch more: smart data at scale, made easy
» Apache license - www.lilyproject.org

More info? Hadoop World, Tuesday 1:15 PM, Met Ballroom

Page 6: The Lily RowLog library

use of rowlog inside lily

» feed Solr index with (Lily|HBase) record updates
» maintain secondary indices (i.e. linkindex)
» shared concerns:
  » reliability
  » consistency
  » manageability
  » (scalability)

Page 7: The Lily RowLog library

UC1: message queue (mq)

[Diagram: record → update → Indexer → update → Solr index entry; possible failure along the way]

Page 8: The Lily RowLog library

UC1: message queue (mq)

[Diagram: record → update → MQ (?) → Indexer → update → Solr index entry]

Page 9: The Lily RowLog library

UC1: message queue (mq)

[Diagram: record → update → MQ → multiple Indexers → update → Solr index entry]

Page 10: The Lily RowLog library

MQ requirements

» async (cope with Solr ‘lag’)
» guaranteed execution
» no concurrent processing of 2 msgs about the same record
» no extra tech (HBase should be good enough)
  » management complexity
  » benefits from scalability, resilience, etc.
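The "no concurrent processing of 2 msgs about the same record" requirement can be illustrated with a small in-memory gate. This is only a sketch: the slides do not say how Lily enforces the rule, the class name `RecordGate` is hypothetical and not part of the Lily API, and a distributed deployment would need a shared lock rather than a JVM-local set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not Lily API): admits at most one in-flight message per record. */
final class RecordGate {
    // Record IDs that currently have a message being processed.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may process a message for this record now. */
    boolean tryEnter(String recordId) {
        // Set.add is atomic here and returns false when the ID is already present.
        return inFlight.add(recordId);
    }

    /** Must be called once processing for the record has finished (success or failure). */
    void leave(String recordId) {
        inFlight.remove(recordId);
    }
}
```

A consumer would call `tryEnter` before handling a message and skip or retry later when it returns false, calling `leave` in a `finally` block.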

Page 11: The Lily RowLog library

UC2: write-ahead-log (WAL)

» secondary actions
  » pushing messages onto MQ (!)
  » updating secondary indices (i.e. linkindex)
» requirements
  » sec. actions eventually get executed, in predefined order
  » further updates to record denied until sec. actions succeeded
  » synchronous
  » pre-update: check WAL for outstanding actions + cleanup mechanism
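The WAL requirements above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the class `WalSketch`, its method names, and the use of plain strings for actions are hypothetical and do not reflect the Lily API. In a real system the record write and the WAL entry would have to be a single atomic row operation, which two separate Java statements only stand in for.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory model of the WAL use case (not Lily API). */
final class WalSketch {
    final Map<String, String> records = new HashMap<>();
    private final Map<String, Deque<String>> wal = new HashMap<>();
    private final Predicate<String> executor; // returns true when an action succeeded

    WalSketch(Predicate<String> executor) {
        this.executor = executor;
    }

    /** Returns false (update denied) while earlier secondary actions are still outstanding. */
    boolean update(String recordId, String value, List<String> secondaryActions) {
        if (!drain(recordId)) {
            return false; // pre-update check found actions that still fail
        }
        // Stand-in for one atomic row operation: data and WAL entries are written together.
        records.put(recordId, value);
        wal.computeIfAbsent(recordId, k -> new ArrayDeque<>()).addAll(secondaryActions);
        drain(recordId); // synchronous execution; leftovers are retried on the next update
        return true;
    }

    /** Executes outstanding actions in order, stopping at the first failure. */
    boolean drain(String recordId) {
        Deque<String> pending = wal.get(recordId);
        if (pending == null) {
            return true;
        }
        while (!pending.isEmpty()) {
            if (!executor.test(pending.peekFirst())) {
                return false; // keep the failed action at the head to preserve ordering
            }
            pending.removeFirst();
        }
        wal.remove(recordId); // cleanup once nothing is outstanding
        return true;
    }

    int outstanding(String recordId) {
        Deque<String> pending = wal.get(recordId);
        return pending == null ? 0 : pending.size();
    }
}
```

Stopping at the first failed action is what keeps the "predefined order" guarantee: a later action never runs before an earlier one has succeeded.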

Page 12: The Lily RowLog library

the rowlog library

[Diagram: each RowLog has one or more subscriptions; each subscription delivers to a listener, either in the same VM or via Netty. Queue storage (HBase): global and row-local.]

Page 13: The Lily RowLog library

global queue

» separate HBase table
» 1 msg per record update per subscription
» key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr
» rowlog processor (single instance, managed by ZK)
» data always appended/deleted from table end (boo!)
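The key layout above can be sketched as a byte concatenation. The exact encoding Lily uses is not given on the slide, so the field widths, the UTF-8 subscription ID, the absence of separators, and the class name `GlobalQueueKey` are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical encoder for a global-queue row key (layout is an assumption, not Lily's). */
final class GlobalQueueKey {
    /**
     * Concatenates: shard id + subscription ID + timestamp + data-table row key + sequence nr.
     * Pass an empty array for shardId when sharding is not used.
     */
    static byte[] encode(byte[] shardId, String subscriptionId, long timestamp,
                         byte[] dataRowKey, long sequenceNr) {
        byte[] sub = subscriptionId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(
                shardId.length + sub.length + Long.BYTES + dataRowKey.length + Long.BYTES);
        // ByteBuffer is big-endian by default, so non-negative timestamps sort
        // chronologically under byte-wise comparison, which is how HBase orders row keys.
        buf.put(shardId).put(sub).putLong(timestamp).put(dataRowKey).putLong(sequenceNr);
        return buf.array();
    }
}
```

With this layout all messages of one subscription (within a shard) are contiguous and ordered by time, which also explains the slide's complaint: new messages always land at the end of that key range. A production encoding would need fixed-width or delimited fields so that variable-length subscription IDs cannot collide.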

Page 14: The Lily RowLog library

row-local queue

[Diagram: RECORDS table (HBase); each row (ROW 1 to ROW 4) holds its DATA alongside a row-local queue]

Page 15: The Lily RowLog library

row-local queue

[Diagram: rows X, Y, Z with column families CF1 and CF2. Next to the record data, each row stores, per message ID (1, 2, ...), a payload column holding the payload data and an execution state column holding consumer id + state.]

Page 16: The Lily RowLog library

why row-local queue?

» predates Inbox-concept (Google Megastore)
» msgs will appear on rowlog if and only if updates have really happened
  » rely on atomic row operation guarantee of HBase
  » msgs on global queue without local counterparts can be discarded
» ‘msgs’ on global rowlog can be small
  » just point to msgs in row-local queue
  » actual payload sits there
» optimized processing of msgs per row (i.e. combine)
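The relationship between the two queues can be sketched with an in-memory model: the payload is written in the same row operation as the data, and a global-queue entry is only a pointer that is resolved against the row. The class `RowLocalQueueSketch` and its methods are hypothetical, and the `HashMap` merely stands in for the HBase records table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the row-local queue idea (not Lily API). */
final class RowLocalQueueSketch {
    /** One row: record data plus row-local message payloads keyed by sequence nr. */
    static final class Row {
        String data;
        final Map<Long, String> payloads = new HashMap<>();
        long nextSeq = 1;
    }

    final Map<String, Row> recordsTable = new HashMap<>();

    /**
     * Stands in for a single atomic HBase row put: data and message payload are written
     * together, so a message exists if and only if the update happened.
     * Returns the sequence nr that a global-queue entry would point to.
     */
    long putWithMessage(String rowKey, String data, String payload) {
        Row row = recordsTable.computeIfAbsent(rowKey, k -> new Row());
        row.data = data;
        long seq = row.nextSeq++;
        row.payloads.put(seq, payload);
        return seq;
    }

    /**
     * Resolves a global-queue pointer (row key + sequence nr) to its payload.
     * A null result means there is no local counterpart, so the pointer can be discarded.
     */
    String resolve(String rowKey, long seq) {
        Row row = recordsTable.get(rowKey);
        return row == null ? null : row.payloads.get(seq);
    }
}
```

Because the pointer carries no payload, a crash between the row write and the global-queue write costs at most a dangling or missing pointer, never a message about an update that did not happen.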

Page 17: The Lily RowLog library

rowlog sharding

» MQ and WAL tables tend to be smallish
  » MQ depends on performance of Solr indexing
  » WAL size = number of simultaneous operations
» risk for contention (all data in one region)
➡ introduction of RowLog sharding (Lily 1.1)
➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this
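One common way to spread a small, hot table over several regions is to prefix each key with a shard id derived from a hash. The slides do not say how Lily 1.1 assigns shards, so the hashing scheme, the one-byte prefix, and the class name `RowLogShards` below are assumptions made for illustration only.

```java
import java.util.Arrays;

/** Hypothetical shard assignment for rowlog keys (scheme is an assumption, not Lily's). */
final class RowLogShards {
    /** Maps a data-table row key to a shard in [0, shardCount). */
    static int shardFor(byte[] dataRowKey, int shardCount) {
        if (shardCount < 1 || shardCount > 256) {
            throw new IllegalArgumentException("shardCount must fit in one byte: " + shardCount);
        }
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(dataRowKey), shardCount);
    }

    /** Prepends the one-byte shard id so that each shard forms its own key range. */
    static byte[] prefix(byte[] queueKey, int shard) {
        byte[] out = new byte[queueKey.length + 1];
        out[0] = (byte) shard;
        System.arraycopy(queueKey, 0, out, 1, queueKey.length);
        return out;
    }
}
```

If the table is pre-split on the shard prefixes, writes for different shards land in different regions, which addresses the contention bullet but not the put/delete churn noted in the last one.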

Page 18: The Lily RowLog library

last words

» RowLog library can be used independently of Lily (!)
» part of the Lily source tree
» Apache license
» www.lilyproject.org
» shameless plug: go and check out Lily, HBase+Solr-backed repository for content-centric apps

Page 19: The Lily RowLog library

Thank you!
for your attention
for your questions

» [email protected]

» @stevenn