High-order bits from Cassandra & Hadoop srisatish ambati @srisatish

Page 1: High order bits from cassandra & hadoop

High-order bits from Cassandra & Hadoop

srisatish ambati
@srisatish

Page 2: High order bits from cassandra & hadoop

NoSQL: Know your queries.

Page 3: High order bits from cassandra & hadoop

Points

• Usecases
• Why NoSQL?
• Why Cassandra?
• Usecase: Hadoop, Brisk
• FUD: Consistency
• Why Facebook is not using Cassandra?
• Community, Code, Tools
• Q&A

Page 4: High order bits from cassandra & hadoop

Users. Netflix.
Key by Customer, read-heavy
Key by Customer:Movie, write-heavy

Page 5: High order bits from cassandra & hadoop

TimeSeries: (several customers)
periodic readings: dev0, dev1…
deviceID:metric:timestamp -> value

Metrics typically way larger dataset than users.
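
The deviceID:metric:timestamp -> value layout above can be sketched in a few lines of Python. This is only an illustration using an in-memory dict. The helper names (make_key, parse_key), the sample readings, and the zero-padded timestamp are assumptions for the example, not something taken from the talk.

```python
def make_key(device_id: str, metric: str, timestamp: int) -> str:
    """Build a 'deviceID:metric:timestamp' key.

    The timestamp is zero-padded (an assumption of this sketch) so that
    keys for one device and metric sort chronologically as strings.
    """
    return f"{device_id}:{metric}:{timestamp:010d}"


def parse_key(key: str):
    """Split a key back into (device_id, metric, timestamp)."""
    device_id, metric, ts = key.split(":")
    return device_id, metric, int(ts)


# Hypothetical periodic readings from two devices.
readings = {}
readings[make_key("dev0", "temp", 1303344000)] = 21.5
readings[make_key("dev1", "temp", 1303344000)] = 19.0
readings[make_key("dev0", "temp", 1303344060)] = 21.7

# "Know your queries": a prefix scan over sorted keys answers
# "all temp readings for dev0, in time order".
dev0_temp = [
    (parse_key(k)[2], v)
    for k, v in sorted(readings.items())
    if k.startswith("dev0:temp:")
]
```

The key is designed around the query: because the device and metric come first, one ordered scan returns a device's readings for a metric without touching other devices.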

Page 6: High order bits from cassandra & hadoop

Why Cassandra?

Page 7: High order bits from cassandra & hadoop

Operational simplicity
peer-to-peer

Page 8: High order bits from cassandra & hadoop

Operational simplicity
peer-to-peer

Page 9: High order bits from cassandra & hadoop

Replication:
Multi-datacenter
Multi-region ec2
Multi-availability zones

Page 10: High order bits from cassandra & hadoop

Replication:
Multi-datacenter
Multi-region ec2, aws
Multi-availability zones

dc1 dc2

reads local

Page 11: High order bits from cassandra & hadoop

“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

4.21.2011, Amazon Web Services outage:

Page 12: High order bits from cassandra & hadoop

Netflix was running on AWS.

4.21.2011, Amazon Web Services outage:

Page 13: High order bits from cassandra & hadoop

fast durable writes. fast reads.

Page 14: High order bits from cassandra & hadoop

Writes
Sequential, append-only.
~1-5ms

Page 15: High order bits from cassandra & hadoop

Writes
Sequential, append-only.
~1-5ms

On cloud: ephemeral disks rock!
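
To illustrate why sequential, append-only writes are cheap, here is a minimal sketch of an append-only log in Python. It is not Cassandra's commit log. The class name AppendOnlyLog, the tab-separated record format, and the replay helper are all assumptions made for the example, and keys are assumed to contain no tabs or newlines.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy write path: every write is appended to the end of one file."""

    def __init__(self, path: str):
        # "ab" opens for appending in binary mode; existing data is kept.
        self._f = open(path, "ab")

    def append(self, key: str, value: str) -> None:
        record = f"{key}\t{value}\n".encode("utf-8")
        self._f.write(record)      # sequential write, no seek
        self._f.flush()
        os.fsync(self._f.fileno()) # ask the OS to persist it (durability)

    def close(self) -> None:
        self._f.close()


def replay(path: str) -> dict:
    """Rebuild the latest value per key by reading the log front to back."""
    state = {}
    with open(path, "rb") as f:
        for line in f:
            key, value = line.decode("utf-8").rstrip("\n").split("\t", 1)
            state[key] = value     # later records overwrite earlier ones
    return state


# Usage: three appends, then recover the state from the log alone.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
log = AppendOnlyLog(log_path)
log.append("customer1", "a")
log.append("customer2", "b")
log.append("customer1", "c")
log.close()
recovered = replay(log_path)
```

An update never rewrites an old record in place. It is just another append, and the newest record for a key wins on replay.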

Page 16: High order bits from cassandra & hadoop

Reads
Local
Key & row caches (also, jna-based 0xffheap)
indexes, materialized

Page 17: High order bits from cassandra & hadoop

Reads
Local
Key & row caches (also, jna-based 0xffheap)
indexes, materialized

ssds, improved read performance!

Page 18: High order bits from cassandra & hadoop

Clients:
cql, thrift
pycassa, phpcassa
hector, pelops
(scala, ruby, clojure)

Page 19: High order bits from cassandra & hadoop

Usecase #3: hadoop
Hdfs cassandra hive
Logs stats analytics

Page 20: High order bits from cassandra & hadoop

Brisk
Truly peer-to-peer hadoop.

Page 21: High order bits from cassandra & hadoop

mv computation
not data

Page 22: High order bits from cassandra & hadoop
Page 23: High order bits from cassandra & hadoop

Parallel Execution View

Page 24: High order bits from cassandra & hadoop
Page 25: High order bits from cassandra & hadoop

jobtracker, tasktracker
hdfs: namenode, datanode

Page 26: High order bits from cassandra & hadoop

cloudera
amazon: elastic map reduce
hortonworks
mapR
brisk

Page 27: High order bits from cassandra & hadoop

Namenode decomposition, explained.

Page 28: High order bits from cassandra & hadoop
Page 29: High order bits from cassandra & hadoop
Page 30: High order bits from cassandra & hadoop

Use column families (tables)
inodes
block
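
The idea of replacing the namenode's metadata with two tables, one for inodes and one for blocks, can be sketched with plain dicts. This is a toy model only and does not reproduce Brisk's actual schema. The table names, the tiny BLOCK_SIZE, and the write_file and read_file helpers are all hypothetical.

```python
import uuid

BLOCK_SIZE = 4  # deliberately tiny so the example splits into several blocks

# Two "column families", modelled as dicts (assumption of this sketch):
inode_table = {}  # file path -> ordered list of block ids
block_table = {}  # block id  -> block bytes


def write_file(path: str, data: bytes) -> None:
    """Split data into fixed-size blocks and record them under the path."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = uuid.uuid4().hex
        block_table[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    inode_table[path] = block_ids


def read_file(path: str) -> bytes:
    """Look up the inode, then fetch and join its blocks in order."""
    return b"".join(block_table[block_id] for block_id in inode_table[path])


write_file("/logs/a", b"hello world")
```

Because both tables are ordinary keyed lookups, they can be spread across peers like any other data, and no single node has to hold the whole file-system map.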

Page 31: High order bits from cassandra & hadoop

near-real time hadoop
Low latency: cassandra_dc nodes
Batch Analytics: brisk_dc nodes

Page 32: High order bits from cassandra & hadoop

FUD, acronym: fear, uncertainty, doubt.

Page 33: High order bits from cassandra & hadoop

Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS

* N is the replication factor, not to be confused with T, the total number of nodes.
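
The R + W > N rule can be checked mechanically. The sketch below assumes R is the number of replicas a read waits for and W the number a write waits for. The function name overlaps is made up for the example. When the inequality holds, every read set shares at least one replica with every write set, so a read sees the latest acknowledged write.

```python
def overlaps(r: int, w: int, n: int) -> bool:
    """Return True when R + W > N, i.e. read and write sets must intersect."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# The slide's 2-node example: R=1, W=2, N=2.
print(overlaps(1, 2, 2))  # True

# Reading and writing a single replica out of three does not guarantee overlap.
print(overlaps(1, 1, 3))  # False
```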

Page 34: High order bits from cassandra & hadoop

Tune-able, flexibility.

For High Consistency:
read:quorum, write:quorum

For High Availability:
high W, low R.
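
As a rough worked example of these settings, the sketch below takes a quorum to be a majority of the N replicas (N // 2 + 1). It then lists the (R, W) pairs for the quorum/quorum setting and for a high-W, low-R setting, here taken to be W=N, R=1. The function names and the choice of W=N, R=1 are assumptions for illustration.

```python
def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


def settings(n: int) -> dict:
    """Return illustrative (R, W) pairs for a replication factor of n."""
    q = quorum(n)
    return {
        "quorum_quorum": (q, q),   # read:quorum, write:quorum
        "high_w_low_r": (1, n),    # cheapest reads, every replica written
    }


for name, (r, w) in settings(3).items():
    # Both pairs satisfy R + W > N for N = 3.
    print(name, r, w, r + w > 3)
```

Two majorities always intersect, so quorum reads plus quorum writes satisfy R + W > N for any N. The W=N, R=1 pair satisfies it too and lets a read succeed from any single replica.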

Page 35: High order bits from cassandra & hadoop
Page 36: High order bits from cassandra & hadoop

Inbox Search: 600+ cores, 120+ TB (2008).
Went from 100-500m users.

Average NoSQL deployment size: ~6-12 nodes.

Page 37: High order bits from cassandra & hadoop

Usecase #5: search
Apache Solr + Cassandra = Solandra

Other inbox/file searches: xobni, c3

github.com/tjake/solandra

Page 38: High order bits from cassandra & hadoop

“Eventual consistency is harder to program.”
mostly immutable data.
complex systems at scale.

Page 39: High order bits from cassandra & hadoop

Miscellaneous
Myth: data-loss, partial rows.
writes are durable.

Page 40: High order bits from cassandra & hadoop

Three good reasons for Cassandra...

Page 41: High order bits from cassandra & hadoop

Tools
AMIs, OpsCenter, DataStax
AppDynamics

Page 42: High order bits from cassandra & hadoop

Beautiful C0de

= new code(); // less is more
~90k.
java.util.concurrent.
@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
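
For readers who have not met Bloom filters, one of the structures named above, here is a minimal Python sketch. It is not Cassandra's implementation. The class name, the default sizes, and the use of SHA-256 to derive bit positions are choices made for the example.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


row_keys = BloomFilter()
row_keys.add("customer42")
```

A "definitely absent" answer lets a read skip a file without touching disk, which is the usual reason a datastore keeps such a filter next to each data file.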

Page 43: High order bits from cassandra & hadoop

Current & Future Focus:
Distributed Counters, CQL.
Simple client.
operational smoothening.

compaction.

Page 44: High order bits from cassandra & hadoop

Community
Robust. Rapid. #
Professional support from DataStax.
Filesystem innovation from Acunu

engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...

Come join the efforts!

Page 45: High order bits from cassandra & hadoop
Page 46: High order bits from cassandra & hadoop

Usecase #4: first NoSQL, then scale!
simpledb Cassandra
mongodb Cassandra

Page 47: High order bits from cassandra & hadoop
Page 48: High order bits from cassandra & hadoop
Page 49: High order bits from cassandra & hadoop

Copyright: xkcd

Page 50: High order bits from cassandra & hadoop

Copyright: plantoys

… more than one way to do it!

Page 51: High order bits from cassandra & hadoop

Summary: high-scale peer-to-peer datastore

best friend for multi-region, multi-zone availability.

Hadoop – HDFS engulfing the DataWorld

Page 52: High order bits from cassandra & hadoop

Q&A
@srisatish

Page 53: High order bits from cassandra & hadoop

NoSQL: Know your queries.