Getting to Know NoSQL: Learning NoSQL from Cassandra


DESCRIPTION

Getting to Know NoSQL: Learning NoSQL from Cassandra. 王 海東, Advanced Technology Research Center, http://sjitech.github.io/, December 18, 2013. Self-introduction: a "new employee" who joined mid-career in August. 1999.09: the former Sun Japan, Nanjing Riheng (C, Java). 2005.02: the former Sun Japan (Java). 2012.05: GREE (PHP). 2013.08: SJI (trying out a variety of technologies).


Getting to Know NoSQL: Learning NoSQL from Cassandra
http://sjitech.github.io/

December 18, 2013

Self-introduction
  • 1999.09: C, Java
  • 2005.02: Java
  • 2012.05: GREE, PHP
  • 2013.08: SJI

http://sjitech.github.io/

Agenda
  • NoSQL
  • Cassandra
  • Q&A

NoSQL
(RDBMS)

RDBMS and NoSQL
  • NoSQL != Big Data
  • NoSQL != No SQL
  • NoSQL = Not Only SQL

NoSQL and RDBMS
  • MySQL, memcached
  • PostgreSQL, JSON
  • FusionIO, SSD

RDBMS

RDBMS
  • 2 TB
  • 2 TB at 100 MB/s: about 5.6 hours

NoSQL

RDBMS
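The 5.6-hour figure is plain division. A quick check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained rate of 100 MB per second; the function name `scan_hours` is made up for this sketch:

```python
def scan_hours(data_bytes: int, rate_bytes_per_second: int) -> float:
    """Hours needed to move data_bytes at a constant rate."""
    seconds = data_bytes / rate_bytes_per_second
    return seconds / 3600


# 2 TB at 100 MB/s (decimal units are an assumption of this sketch)
hours = scan_hours(2 * 10**12, 100 * 10**6)
print(f"{hours:.1f} hours")
```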

RDBMS, NoSQL

[Figure: system diagram with Web, AP, and DB tiers, plus CDN and DB labels]

DB partitioning by user_id % 100 (user00 to user99)

  • DB1: user00 to user24
  • DB2: user25 to user49
  • DB3: user50 to user74
  • DB4: user75 to user99
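The user_id % 100 scheme can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and the 25-bucket split per database follows the DB1 to DB4 ranges on the slide.

```python
BUCKETS = 100          # user_id % 100 gives buckets user00 .. user99
BUCKETS_PER_DB = 25    # four databases, 25 buckets each (as on the slide)


def bucket_for(user_id: int) -> str:
    """Logical bucket name, e.g. 'user07'."""
    return f"user{user_id % BUCKETS:02d}"


def db_for(user_id: int) -> str:
    """Physical database that holds the user's bucket, e.g. 'DB1'."""
    return f"DB{(user_id % BUCKETS) // BUCKETS_PER_DB + 1}"


for uid in (24, 125, 1050, 99):
    print(uid, bucket_for(uid), db_for(uid))
```

Keeping 100 logical buckets on top of 4 physical databases means a bucket can later be moved to a new database without changing the modulo rule itself.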

[Figure: one Master with three Slaves and two Standby nodes]

Master, MySQL Plugin, NoSQL

JOIN, NoSQL

P2P

HA

P2P

  • memcached
  • Redis
  • mongoDB
  • HBase
  • Cassandra
  • Dynamo

KVS
(XML, JSON)

KVS
  KEY   VALUE
  1     V1
  2     V2

[Figure: two RowKeys, each followed by several Columns; every Column is a KEY / VALUE pair]

KVS, NoSQL
  • BASE
  • CAP
  • Consistent Hashing
  • Eventual Consistency

ACID (RDBMS)
  • Atomicity
  • Consistency
  • Isolation
  • Durability

ACID and BASE: Consistency, Isolation

BASE (NoSQL)
  • Basically Available
  • Soft-state
  • Eventual consistency

Hard-state and Soft-state
  • Soft-state: Best effort

CAP (Eric Brewer; Seth Gilbert, Nancy Lynch)
  • Consistency
  • Availability
  • Partition tolerance
  • Of these 3, at most 2 can be satisfied at the same time.

[Figure: CAP triangle with corners Availability, Consistency, Partition tolerance]
  • AC: RDBMS (MySQL, Oracle, PostgreSQL)
  • AP: Eventual Consistency
  • CP
  • Products shown: BigTable, HBase, MongoDB, Redis, Memcached, Hypertable, Cassandra, Dynamo, CouchDB, Tokyo Cabinet

Consistent Hashing

[Figure: hash ring with nodes A, B, C, D placed at hash(A.id), hash(B.id), hash(C.id), hash(D.id); hash(key1) falls on the ring]

[Figure: same ring, set(key1)]

[Figure: same ring, get(key1)]

[Figure: ring with nodes A, B, D, C; nodes E and D change]

[Figure: ring with nodes A, B, D, C]
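The ring diagrams above can be tried out with a small sketch. It is an illustration under stated assumptions rather than how any particular product does it: MD5 is used as the hash, a key belongs to the first node at or after its position (wrapping around), and the class and method names are made up for this example.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Position on the ring: the MD5 digest read as an integer."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, node_ids=()):
        self._positions = []  # sorted ring positions of all nodes
        self._nodes = {}      # ring position -> node id
        for node_id in node_ids:
            self.add(node_id)

    def add(self, node_id: str) -> None:
        position = ring_position(node_id)
        bisect.insort(self._positions, position)
        self._nodes[position] = node_id

    def remove(self, node_id: str) -> None:
        position = ring_position(node_id)
        self._positions.remove(position)
        del self._nodes[position]

    def node_for(self, key: str) -> str:
        """First node at or after hash(key), wrapping past the end."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._nodes[self._positions[index % len(self._positions)]]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C", "D"])
before = {key: ring.node_for(key) for key in keys}

ring.add("E")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to E")
```

Only the keys that now fall in E's range change owner; every other key stays where it was, which is the point of the technique compared with a plain modulo over the node count.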

E: get, set

Eventual Consistency
  • Amazon CTO

Eventual

N, R, W
  • Process A (W=1), Process B (R=1)
  • R + W > N: Strong Consistency
  • W=2, R=3
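The R + W > N rule is easy to check mechanically. A minimal sketch, assuming the usual meaning of the letters (N replicas, W acknowledgements required per write, R replicas consulted per read); the function name and the N=3 examples are illustrative assumptions, not values taken from the slide:

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


# With N=3: W=1, R=1 can miss the latest write; W=2, R=2 cannot.
for n, r, w in [(3, 1, 1), (3, 2, 2), (3, 1, 3), (3, 3, 1)]:
    print(f"N={n} R={r} W={w} ->", is_strongly_consistent(n, r, w))
```

When R + W <= N a read may touch only replicas that have not yet received the newest write, which is the eventual-consistency case.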

Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database.

Cassandra? NoSQL

Cassandra
  • Amazon Dynamo, Google Bigtable
  • AP -> CP
  • vs SPOF
  • SQL (0.8)
  • Thrift, Avro API
  • Read, Write

Cassandra

  • datastax, 2008, ASF

  Version   Released   Notes
  0.7       2011/01
  0.8       2011/06    CQL
  1.1       2012/04    SSD+HDD, CQL, Hadoop
  1.2       2013/01    CQL3
  2.0       2013/09    Lightweight transaction

  • 2.0.3
  • CQL: Cassandra Query Language, SQL

Cassandra

Row Key, Column (KEY, VALUE, Timestamp)

[Figure: several rows, each a Row Key followed by a varying number of Columns]

  • 2.0.x: SuperColumn, SuperColumnFamily
  • CF: Column Family
  • Indexed

  Cassandra       RDBMS                   Java
  Key Space       Schema/Database         Set
  Column Family   Table                   Map
  Row             Row                     OrderedMap
  Column          Column (Name, Value)    (key, value, timestamp)

CLI:
set Keyspace1.ColumnFamily2[row1][column3] = value3

CQL:
INSERT INTO Keyspace1.ColumnFamily2 (id, column3) VALUES ('row1', 'value3');

API
  • CLI: Get, Set
  • CQL (Cassandra Query Language): SQL, NoSQL
  • CQL Driver: Java Driver, C# Driver
  • Thrift: RPC, Facebook
  • Avro: Hadoop
  • C, C++, C#, Java, PHP, Python, and Ruby
  • Hector (Java), Pycassa (Python)

Cassandra
  • HDD: Seek time

  • Gossip Protocol
  • Consistent hashing, virtual nodes
  • Partitioners
  • Strategy
  • Snitches
  • IO
  • Write: Hinted Handoff
  • Read: Read repair, Anti-entropy repair

Cassandra: Gossip Protocol

Gossip Protocol (JOIN, DEAD, AVAIL)

Gossip, IO

Cassandra Gossip (every 1 second)
  • Gossip to 1 random live endpoint.
  • Gossip to an unreachable endpoint with probability unreachableN / (liveN + 1).
  • If the endpoint of the first Gossip was not a Seed, or liveN < SeedN, Gossip to a Seed.
  • Seed: static
  • Gossip contents: ApplicationState (Load Average; JOIN, DEAD, AVAIL), HeartBeatState
  • http://highscalability.com/blog/2011/11/14/using-gossip-protocols-for-failure-detection-monitoring-mess.html

Cassandra Partitioners
  • RandomPartitioner: default before 1.2
  • ByteOrderedPartitioner
  • Murmur3Partitioner: default from 1.2

[Figure: hash ring with nodes A, B, D, C and their Tokens; md5(key) gives the Token used for set(key), with W=2]

Consistent Hashing
  • Token: MD5, 0 ~ 2^127, hash ring
  • Each node stores the Data whose Token falls in its range of the ring.
  • Zero-hop DHT

RandomPartitioner
  • 0 ~ 2^127
  • Cassandra: MD5 hash, 128 bit (Token)

Murmur3Partitioner
  • Faster than RandomPartitioner (less CPU)
  • -2^63 ~ 2^63 - 1 (-9223372036854775808 ~ 9223372036854775807)
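Given the two token ranges above, evenly spaced starting tokens for a cluster without virtual nodes are simple arithmetic. A sketch under that assumption; the function names are made up here, and the spacing rule (divide the range by the node count) is the commonly documented way to balance a single-token-per-node ring:

```python
def murmur3_initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens over -2^63 .. 2^63 - 1 (Murmur3Partitioner range)."""
    step = 2**64 // node_count
    return [step * i - 2**63 for i in range(node_count)]


def random_partitioner_initial_tokens(node_count: int) -> list:
    """Evenly spaced tokens over 0 .. 2^127 (RandomPartitioner range)."""
    step = 2**127 // node_count
    return [step * i for i in range(node_count)]


for token in murmur3_initial_tokens(4):
    print(token)
```

Python integers are arbitrary precision, so the 128-bit range needs no special handling.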

[Figure: ring with nodes A, B, D, C; nodes E and D change]

Consistent hashing
  • 1.2
  • Dynamo: 1 hash ring range

[Figure: hash ring. From DataStax's Cassandra documentation.]

Virtual nodes

Strategy
  • Coordinator
  • Coordinator and N-1 further replicas
  • SimpleStrategy: replicas follow the Partitioner's hash ring from the Coordinator, without regard to network topology
  • NetworkTopologyStrategy: DC, rack

[Figure: NetworkTopologyStrategy, write with N=3]

[Figure: NetworkTopologyStrategy with nodes in Rack1, Rack2, Rack3, Rack4; write with N=3]

Snitches

  • Dynamic snitching
  • SimpleSnitch: does not consider DC or rack
  • RackInferringSnitch: infers DC and rack from the IP address
  • PropertyFileSnitch: DC and rack are listed per node
      175.54.35.197 =DC1:RAC1
      120.53.24.101 =DC1:RAC2
  • GossipingPropertyFileSnitch: DC and rack are spread by Gossip
  • EC2Snitch, EC2MultiRegionSnitch: Amazon EC2; the EC2 region name is the DC; private IP, public IP

[Figure: IP address 110.100.200.105 split into DC, rack, and node octets]

Consistency Level (Cassandra)

Write Consistency Levels
  Level             Description
  ANY               At least 1 write; OK once the Coordinator has it
  ONE, TWO, THREE   Response from 1, 2, 3 replicas (commit log and memory table)
  QUORUM            Response from (replicas / 2 + 1)
  LOCAL_QUORUM      QUORUM
  LOCAL_ONE         ONE
  EACH_QUORUM
  ALL               Response from all replicas; Strong Consistency
  SERIAL            2.0, lightweight transactions; ACID isolation level

Read Consistency Levels
  Level             Description
  ONE, TWO, THREE   1, 2, 3 replicas; Read repair
  QUORUM            (replicas / 2 + 1)
  LOCAL_QUORUM      DC
  LOCAL_ONE         Dynamic snitching; 1.2
  EACH_QUORUM       LOCAL_QUORUM
  ALL

[Figure: N=3, Write to replicas R1, R2, R3]

Hinted Hand-off (N=2)
  • Gossip; Write (Node, Proxy Node); Gossip
  • Hint: SystemTable CF
  • Consistency Level any: Hint
  • Hinted Handoff, Read Repair
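The QUORUM entry in the consistency-level tables (replicas / 2 + 1, with integer division) can be worked through for a few replication factors. A minimal sketch; the function name is hypothetical, and the N=3 and N=6 values are chosen only because they match the write diagrams in this deck:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(N / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


for n in (2, 3, 6):
    q = quorum(n)
    # QUORUM writes plus QUORUM reads always overlap in at least one replica.
    print(f"N={n}: QUORUM={q}, read+write overlap={q + q > n}")
```

Because two quorums always share a replica, reading and writing at QUORUM satisfies R + W > N without paying the availability cost of ALL.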

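[Figure: N=6, Write across DC1 (R1, R2, R3) and DC2 (R4, R5, R6) through Proxy nodes]

Read (Cassandra)

[Figure: read K1 through the proxy; node1 returns (V1, t1), node2 returns (V2, t2), node3 returns Not found!; the proxy answers (V2, t2)]

  • read repair
  • Anti-Entropy repair
  • Merkle Tree: one per CF; each Merkle Tree Leaf is the Hash of a Row
  • Comparing Hashes keeps I/O low; Merkle Tree check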
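The read diagram above (V1 at t1, V2 at t2, and one replica with nothing) can be mimicked with in-memory dictionaries. This is a toy sketch of the idea, not Cassandra's code: replicas are plain dicts, timestamps are integers with t1 < t2 assumed, and `read_with_repair` is a made-up name.

```python
from typing import Dict, Optional, Tuple

Versioned = Tuple[str, int]  # (value, timestamp)


def read_with_repair(replicas: Dict[str, Dict[str, Versioned]],
                     key: str) -> Optional[Versioned]:
    """Return the newest version of key and copy it to stale replicas."""
    responses = {name: store.get(key) for name, store in replicas.items()}
    found = [version for version in responses.values() if version is not None]
    if not found:
        return None
    newest = max(found, key=lambda version: version[1])  # highest timestamp wins
    for name, seen in responses.items():
        if seen != newest:
            replicas[name][key] = newest  # repair a stale or missing copy
    return newest


replicas = {
    "node1": {"K1": ("V1", 1)},
    "node2": {"K1": ("V2", 2)},
    "node3": {},
}
result = read_with_repair(replicas, "K1")
print(result, replicas)
```

After the call all three replicas hold (V2, 2), so a later read at any consistency level sees the same value.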

IO
  • Write
  • Compaction
  • Read
  • Delete

Google BigTable
  • Commit Log
  • SSTable
  • MemTable
  • SSTable, Memtable: index, BloomFilter

Write
  • Commit Log, MemTable
  • flush: MemTable to SSTable (Index, Bloom Filter)
  • GC
  • BloomFilter, Index

[Figure: one MemTable and several flushed SSTables, each SSTable with its own Bloom Filter and Index]

Compaction
  • Read: SSTable
  • Minor Compaction: SSTable
  • Major Compaction: CF, SSTable
  • tombstone, Merge
  • SSD

Read
  • MemTable
  • SSTable: Bloom Filter, Index
  • Row Cache
  • Key Cache: Index
  • SSD

Delete
  • tombstone
  • tombstone and JVM GC
  • Tombstone GC (GC Time: 10)

Java

$ curl -L -O http://ftp.riken.jp/net/apache/cassandra/2.0.3/apache-cassandra-2.0.3-bin.tar.gz
$ tar xzf apache-cassandra-2.0.3-bin.tar.gz
$ cd apache-cassandra-2.0.3
$ sudo mkdir -p /var/lib/cassandra/
$ sudo mkdir -p /var/log/cassandra/
$ vim conf/cassandra.yaml    # edit the configuration

$ bin/cassandra -f    # start cassandra in the foreground

$ bin/cqlsh    # CQL shell; CQL resembles SQL
cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 3};

cqlsh> USE demodb;

cqlsh> CREATE TABLE users ( user_name varchar, password varchar, gender varchar, birth_year bigint, PRIMARY KEY (user_name));

cqlsh> INSERT INTO users (user_name, password, gender, birth_year) VALUES ('cassandra', 'nosql', 'unknown', 2006);

cqlsh> SELECT * FROM users WHERE user_name = 'cassandra';

nodetool
  • nodetool compact: key space, column family; compaction
  • nodetool repair
  • nodetool cleanup
  • nodetool snapshot
  • nodetool streams
  • nodetool decommission
  • nodetool removetoken
  • nodetool move
  • sstable2json, json2sstable: export, import

  • Java JMX: jconsole
      service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi
  • Nagios (JMX): http://www.mahalo.com/how-to-monitor-cassandra-with-nagios
  • DataStax OpsCenter: It's free!

Cloudian, NoSQL

NoSQL and RDBMS
  • DB: RDBMS, NoSQL
  • RDBMS: SQL, ACID
  • Google Spanner
  • Hadoop
  • IDE, ORM
  • NoSQL

Cassandra
  • Netflix
  • digg: CTO
  • JVM GC, Compaction, Repair
  • Facebook: Web, Hadoop
  • Hadoop: Cassandra FS
  • Twitter
  • Facebook: HBase
  • P2P, HBase

Thanks for your patience!
http://sjitech.github.io/
https://github.com/sjitech

Question?