MongoDB at Groupon Peter Bakkum @pbbakkum

MongoDB San Francisco 2013: Using MongoDB for Groupon's Place Data presented by Peter Bakkum, Member of Technical Staff, Groupon

MerchantData

CRM

MerchantPages

Self-Service

Others

Arnold: Declarative Crowd-Machine Data Integration

Shawn Jeffery, Liwen Sun, Matt DeLand, Nick Pendar, Rick Barber, Andrew Galdi

CIDR 2013

cidrdb.org/cidr2013/Papers/CIDR13_Paper22.pdf

Concordance

{
  name: "Joe's Pizza",
  location: {
    address: "1000 Market St.",
    postal_code: "94100-1001"
  },
  source: 1
}

{
  name: "Joes Pizza",
  location: {
    address: "1000 Market Street",
    postal_code: "94100"
  },
  source: 2
}

{
  name: "Joes",
  location: {
    address: "1000 Market",
    postal_code: "94100"
  },
  source: 3
}
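The three records above describe the same place as reported by three sources. As a minimal sketch of how such records might be matched, the Python below reduces each one to a coarse key. The function name `concordance_key`, the street-suffix list, and the choice of key fields are all assumptions for illustration; the slides do not describe Groupon's actual matching rules.

```python
import re

# Hypothetical suffix list; a real normalizer would need a fuller table.
STREET_SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road", "blvd", "boulevard"}


def _tokens(text):
    """Lowercase, drop everything except letters, digits and spaces, then split."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()


def concordance_key(place):
    """Build a coarse matching key (an assumed rule, not Groupon's)."""
    name_tokens = _tokens(place["name"])
    address_tokens = [
        t for t in _tokens(place["location"]["address"]) if t not in STREET_SUFFIXES
    ]
    postal = place["location"]["postal_code"][:5]  # keep the 5-digit prefix only
    first_name_token = name_tokens[0] if name_tokens else ""
    return (first_name_token, " ".join(address_tokens), postal)


records = [
    {"name": "Joe's Pizza",
     "location": {"address": "1000 Market St.", "postal_code": "94100-1001"},
     "source": 1},
    {"name": "Joes Pizza",
     "location": {"address": "1000 Market Street", "postal_code": "94100"},
     "source": 2},
    {"name": "Joes",
     "location": {"address": "1000 Market", "postal_code": "94100"},
     "source": 3},
]

# All three source records collapse to one key under these assumed rules.
keys = {concordance_key(r) for r in records}
```

Exact key equality is the simplest possible rule. A production system would more plausibly use the key only to pick candidate pairs and then score them with fuzzier comparisons.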

Data Systems

Content Input

Data Processing

Serving

Content Input

Data Sets

Input Feeds

Normalization

Crowd Sourcing

Web Crawling

Content Input

Crawl Store

Web Crawler

Configuration

Normalized Data

Recent Feed History

Data Processing

Storm Topology

Storm

Serving

HTTP Access Layer Varnish

Data Processing

storm topology

bolts

parsing

normalization

concordance

geocoding

persistence

place model

record

tree

places.find({
  _id: "013e4e2afc26"
})

placeCollection.find({
  "location.postcode": "94100",
  "location.country": "US"
})

places.findAndModify(
  {
    _id: "013e4e2afc26",
    persisted_at: "2013-02-01T00:00:00Z"
  },
  { place model })
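One reading of the findAndModify call above is a conditional replace: the new place model is written only if the stored document still carries the `persisted_at` value the writer last saw. The Python below simulates that behavior with a plain dictionary. It is a sketch of the idea only. `replace_if_unchanged` and `store` are hypothetical names, and nothing here is a MongoDB driver API.

```python
def replace_if_unchanged(store, place_id, expected_persisted_at, new_place):
    """Replace the stored place only if its persisted_at still matches.

    Returns the previous document on success, or None when no document
    matched both the id and the expected timestamp.
    """
    current = store.get(place_id)
    if current is None or current["persisted_at"] != expected_persisted_at:
        return None  # unknown id, or another writer has already updated it
    store[place_id] = dict(new_place, _id=place_id)
    return current


# In-memory stand-in for the places collection (illustrative data only).
store = {
    "013e4e2afc26": {
        "_id": "013e4e2afc26",
        "name": "Joes",
        "persisted_at": "2013-02-01T00:00:00Z",
    }
}

updated = {"name": "Joe's Pizza", "persisted_at": "2013-02-02T00:00:00Z"}

# First writer saw the current timestamp, so the replace goes through.
first = replace_if_unchanged(store, "013e4e2afc26", "2013-02-01T00:00:00Z", updated)

# Second writer still holds the stale timestamp, so nothing is written.
second = replace_if_unchanged(store, "013e4e2afc26", "2013-02-01T00:00:00Z", {"name": "stale"})
```

Against a real cluster the same check and write would happen in a single findAndModify call, so that no other writer can slip in between the comparison and the replacement.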

Concordance

Persistence

config cluster

4 arbiters

4 shards of 2 nodes

replica set failover

64 GB dedicated hardware

storm workers

mongos routers

ID Scheme

UUID v1

82d991c6-b098-11e2-8fc0-c82a14fffe86

82d9996e-b098-11e2-8fc0-c82a14fffe86

82d99f04-b098-11e2-8fc0-c82a14fffe86

82d9a40e-b098-11e2-8fc0-c82a14fffe86

UUID vB

wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz

w: controllable counter

x: process id

b: literal 'b'

y: fragment of MAC address

z: milliseconds since epoch (UTC)

toggle

c8c9cef9-7a7f-bd53-7a50-013e4e2afbde

14951cfa-7a7f-bd53-7a50-013e4e2afbde

6f5169fb-7a7f-bd53-7a50-013e4e2afbde

ba2da6fc-7a7f-bd53-7a50-013e4e2afbde

f5166777-7a7f-bd53-7a50-013e4e2afc26

f5166778-7a7f-bd53-7a50-013e4e2afc26

f5166779-7a7f-bd53-7a50-013e4e2afc26

f516677a-7a7f-bd53-7a50-013e4e2afc26
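The field layout above can be sketched as a small formatter. The Python below only assembles the five groups from values supplied by the caller. The function names and the fixed bit widths are assumptions read off the `wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz` pattern. It is not the code in the locality-uuid libraries, and it does not model the toggle between the two counter behaviors.

```python
def format_locality_uuid(counter, pid, mac_fragment, millis):
    """Assemble a 'vB' style id from its four fields (assumed widths)."""
    w = f"{counter & 0xFFFFFFFF:08x}"          # 8 hex digits: counter
    x = f"{pid & 0xFFFF:04x}"                  # 4 hex digits: process id
    y = f"{mac_fragment & 0xFFFFFFF:07x}"      # 7 hex digits: MAC address fragment
    z = f"{millis & 0xFFFFFFFFFFFF:012x}"      # 12 hex digits: ms since epoch (UTC)
    # The literal 'b' sits where a standard UUID keeps its version digit.
    return f"{w}-{x}-b{y[:3]}-{y[3:]}-{z}"


def parse_millis(uuid_text):
    """Recover the millisecond timestamp from the last 12 hex digits."""
    return int(uuid_text[-12:], 16)


# Field values taken from one of the example ids on the slide.
example = format_locality_uuid(
    counter=0xF5166777,
    pid=0x7A7F,
    mac_fragment=0xD537A50,
    millis=0x013E4E2AFC26,
)
```

Because the timestamp occupies the trailing group, ids created in the same millisecond share their last 12 hex digits, which is the suffix used as the example `_id` elsewhere in the slides.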

github.com/groupon/locality-uuid.java

github.com/groupon/locality-uuid.rb

Backup and MapReduce

Hadoop Cluster

places.ns

places.0

places.1

places.2

…

places.ns entry: char[128] name, DiskLoc firstExtent, DiskLoc lastExtent

places.0, places.1: extent, extent, extent

Each extent: MapReduce Input Split

public void map(
    Text key,
    WritableBSONObject value,
    Context context)
{
  // Read the _id field of the BSON document passed to the mapper.
  String id = (String) value.get("_id");
  ...
}

Mongo Cluster

Hadoop Cluster

MapReduce Job

Backs up Mongo data to Hadoop

Much faster data export

Exploits our Hadoop cluster
