Copyright © 2014, Oracle and/or its affiliates. All rights reserved. DILEEP KALIDINDI, 23rd February 2015. Explore, Build & Operate NoSQL with Apache Cassandra

Exploring NoSQL and implementing through Cassandra

Page 1: Exploring NoSQL and implementing through Cassandra

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

DILEEP KALIDINDI, 23rd February 2015

Explore, Build & Operate

NoSQL with

Apache Cassandra

Page 2: Exploring NoSQL and implementing through Cassandra

Who am I ?

Dileep Varma Kalidindi

Current: Senior Engineer @Responsys (since Apr’14), Circles Team.

Fascination: Problem solving, distributed & Big Data churning systems.

Past: 8+ yrs with VeriSign, Informatica Labs, NTT Data.

Hobbies: Adventure sports.

Page 3: Exploring NoSQL and implementing through Cassandra

Are we good ?

Data

Page 4: Exploring NoSQL and implementing through Cassandra

Data

Data has never come in a single structure, and neither have its modelling techniques.

Applications evolved from OLAP and OLTP to Web, Mobile & Social.

Big Data comes with different characteristics – Volume, Velocity, Variety, Veracity & Value.

Responsys Data:

Need for better-suited data models and storage models

- but why?

Page 5: Exploring NoSQL and implementing through Cassandra
Page 6: Exploring NoSQL and implementing through Cassandra

Impedance Mismatch – Data model & Storage model

The SQL relational model is user oriented

The store itself handles concurrency, integrity, consistency and data type validity

Transactional guarantees, schemas and referential integrity

Purpose-built applications tend to control integrity and validity themselves (not aggregation fancy)

Difference between the persistent data model and the in-memory data structures.

Data duplication and denormalization are now first-class citizens!

Scale-up to scale-wide – NoSQL multi-node vs RDBMS clustering.

Page 7: Exploring NoSQL and implementing through Cassandra

Conceptual – ACID, BASE & CAP

Transactions, consistency and availability – could we prioritize ?

Page 8: Exploring NoSQL and implementing through Cassandra

CAP theorem - consequences

Page 9: Exploring NoSQL and implementing through Cassandra
Page 10: Exploring NoSQL and implementing through Cassandra

Agenda

NoSQL

NoSQL Implementations – for various purposes

Architecture fit – Polyglot persistence

Data modelling – concepts in view of NoSQL

Cassandra – Architecture

Database Internals

CQL & DEMO

Installation, Configuration & tools

Oracle NoSQL – pitch by Sheetal

Page 11: Exploring NoSQL and implementing through Cassandra

# NoSQL

Page 12: Exploring NoSQL and implementing through Cassandra

NoSQL

Non-relational, distributed, open-source & horizontally scalable #nxtGen

NoSQL is an accidental neologism.

Schema-less storage systems built for the 5 V's of Big Data.

Decentralized – Every node in the cluster is identical

High availability - no single point of failure (SPoF); node and network failures are tolerated

Open source and No cost models (Except for enterprise support)

Page 13: Exploring NoSQL and implementing through Cassandra
Page 14: Exploring NoSQL and implementing through Cassandra

NoSQL – Architecture fit-in

Polyglot persistence thinking: fit the right data store to each data set.

Service usage over direct data usage.

Concerns

Operational concerns like licensing, support, tools, upgrade, auditing.

Security of datastore, contexts, authorization, etc.

Integration with ETL and data transfer utilities.

Deployment complexity

Page 15: Exploring NoSQL and implementing through Cassandra

Data models – in view of NoSQL

NoSQL models are application specific “What questions do I have?”

Relational models are driven by structure of data “What answers do I have?” 

Modelling techniques

Conceptual: Denormalization, Aggregates & Application-side joins

General: Atomic aggregates, Enumerable keys, Dimensionality reduction, Index table & Composite key index.

Hierarchical: Tree aggregation, Materialized paths, Nested sets & Batch graph processing.
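As a rough illustration of the conceptual techniques above, the sketch below contrasts an application-side join with a denormalized layout. The `users` and `orders` data and all function names are hypothetical, chosen only for this example.

```python
# Normalized layout: two "tables" keyed by id; reads need an application-side join.
users = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
orders = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 1, "total": 40.0},
    {"order_id": 102, "user_id": 2, "total": 15.0},
]


def orders_with_names_join(users, orders):
    """Application-side join: look up each order's user at read time."""
    return [{**o, "user_name": users[o["user_id"]]["name"]} for o in orders]


def denormalize(users, orders):
    """Denormalized layout: copy the user name into every order at write time,
    grouped by user so that one key lookup answers 'orders for user X'."""
    by_user = {}
    for o in orders:
        row = {
            "order_id": o["order_id"],
            "total": o["total"],
            "user_name": users[o["user_id"]]["name"],  # duplicated on purpose
        }
        by_user.setdefault(o["user_id"], []).append(row)
    return by_user


orders_by_user = denormalize(users, orders)
```

The denormalized form trades extra storage and write-time work for a read that touches a single key, which is the question-driven style of modelling this slide describes.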

Page 16: Exploring NoSQL and implementing through Cassandra

Data models – deep view

Conceptual: Denormalization

Query data volume or I/O per query vs total data volume

Processing complexity vs total data volume

Aggregates: Simple, Atomic

Tree aggregation:
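One way to picture tree aggregation is a whole hierarchy stored as a single aggregate. The minimal sketch below assumes a hypothetical comment thread kept as one nested record, and also shows how the same tree could be flattened into materialized paths. The data and function names are illustrative only.

```python
# Hypothetical comment thread stored as ONE aggregate (a single record).
thread = {
    "id": "c1", "text": "Why NoSQL?", "replies": [
        {"id": "c2", "text": "Scale-out", "replies": [
            {"id": "c4", "text": "And availability", "replies": []},
        ]},
        {"id": "c3", "text": "Flexible schema", "replies": []},
    ],
}


def count_nodes(node):
    """Walk the aggregate recursively; no joins or extra lookups are needed."""
    return 1 + sum(count_nodes(child) for child in node["replies"])


def materialized_paths(node, prefix=""):
    """Flatten the tree into materialized paths such as '/c1/c2/c4'."""
    path = f"{prefix}/{node['id']}"
    yield path
    for child in node["replies"]:
        yield from materialized_paths(child, path)


paths = list(materialized_paths(thread))
```

Reading the whole thread is a single fetch in the aggregate form, while the path form suits prefix queries such as "everything under /c1/c2".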

Page 17: Exploring NoSQL and implementing through Cassandra
Page 18: Exploring NoSQL and implementing through Cassandra

NoSQL - implementations

If one implementation fits all then why not RDBMS?

Classification is driven from the application point of view!

Key-Value

Strong aggregation which is opaque to the database

Oracle NoSQL, Windows Azure & Redis

Document database

Structure in the aggregate

MongoDB, CouchDB & RavenDB
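The difference between an opaque aggregate and a structured one can be sketched with two toy in-memory stores. This is an illustration under simplifying assumptions, not the API of any product named above. All names are hypothetical.

```python
import json

kv_store = {}   # key -> opaque bytes; the store cannot look inside the value
doc_store = {}  # key -> structured document; the store can inspect fields


def kv_put(key, obj):
    """Key-value style: the application serializes, the store keeps a blob."""
    kv_store[key] = json.dumps(obj).encode("utf-8")


def kv_get(key):
    """The only access path is the key; the application deserializes."""
    return json.loads(kv_store[key].decode("utf-8"))


def doc_put(key, doc):
    """Document style: the aggregate keeps its structure inside the store."""
    doc_store[key] = doc


def doc_find(field, value):
    """Because structure is visible, the store can query on a field."""
    return sorted(k for k, d in doc_store.items() if d.get(field) == value)


for key, profile in {
    "u1": {"name": "Asha", "city": "Hyderabad"},
    "u2": {"name": "Ravi", "city": "Bengaluru"},
    "u3": {"name": "Meera", "city": "Hyderabad"},
}.items():
    kv_put(key, profile)
    doc_put(key, profile)
```

With the key-value store, finding everyone in one city means fetching and decoding every value in the application. The document store can answer the same question itself.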

Page 19: Exploring NoSQL and implementing through Cassandra

NoSQL - implementations

Column family structures

Two-level aggregate structure

Key & a row aggregate; the row aggregate is a group of columns.

Bigtable, HBase & Cassandra

Graph database

Neo4j

Page 20: Exploring NoSQL and implementing through Cassandra

NoSQL – implementations – CAP fit

Page 21: Exploring NoSQL and implementing through Cassandra

Page 22: Exploring NoSQL and implementing through Cassandra

Apache Cassandra - Continuous availability, linear scalability & operational simplicity

About

Column store NoSQL database.

Originally developed by Facebook (2007) and now an Apache project.

Masterless architecture with all nodes in a ring topology.

Commercial add-ons & support (“enterprise edition”) by DataStax.

Data center replication, scalability (wide), fault tolerance & tunable consistency.

Online load balancing, flexible schema, key-oriented queries & CAP-aware.

Implementation of good security standards, operations, monitoring & utilities.
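Tunable consistency can be reasoned about with simple arithmetic: a read is guaranteed to overlap the latest write when the number of replicas read plus the number written exceeds the replication factor. The sketch below covers only the ONE, QUORUM and ALL levels. The function names are assumptions made for this example.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when R + W > RF, so read and write replica sets must overlap."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


# With RF = 3: QUORUM reads + QUORUM writes overlap (2 + 2 > 3),
# while ONE + ONE does not (1 + 1 <= 3) and is only eventually consistent.
quorum_ok = read_sees_latest_write("QUORUM", "QUORUM", 3)
one_one_ok = read_sees_latest_write("ONE", "ONE", 3)
```

Lower levels respond faster and survive more node failures. Higher levels trade latency and availability for stronger reads.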

Page 23: Exploring NoSQL and implementing through Cassandra

Cassandra – data model

Column – Key-value pair

Counter column

Expiring column

Super column

Column family – Collection of rows - Map <RowKeys, OrderedColumn Collection>

Dynamic (Wide)

Static (Narrow)

Keyspace – contains column families & super column families
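The "Map <RowKeys, OrderedColumn Collection>" view of a column family can be approximated in a few lines. This is a toy in-memory model, not Cassandra's storage engine. The class and method names are hypothetical, and timestamp ties are resolved naively.

```python
class ColumnFamily:
    """Toy column family: row key -> columns, read back in column-name order."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: (value, timestamp)}

    def insert(self, row_key, column, value, timestamp):
        """Last write wins: keep the value with the newest timestamp."""
        row = self.rows.setdefault(row_key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)

    def get_slice(self, row_key, start, end):
        """Return (column, value) pairs with start <= column <= end, ordered."""
        row = self.rows.get(row_key, {})
        return [(name, row[name][0]) for name in sorted(row) if start <= name <= end]


# A dynamic (wide) row: one row per user, one column per event time.
events = ColumnFamily()
events.insert("user:42", "2015-02-23T10:00", "login", timestamp=1)
events.insert("user:42", "2015-02-23T10:05", "click", timestamp=2)
events.insert("user:42", "2015-02-23T09:00", "signup", timestamp=3)
events.insert("user:42", "2015-02-23T10:05", "purchase", timestamp=4)  # overwrites
morning = events.get_slice("user:42", "2015-02-23T10:00", "2015-02-23T10:59")
```

A static (narrow) row would use the same structure with a small, fixed set of column names such as `name` and `email`.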

Page 24: Exploring NoSQL and implementing through Cassandra

Cassandra – Architecture

CAP values – AP (Availability & Partition tolerance).

Consistency is eventual by default; stronger consistency is available at the cost of latency.

No row locking (HBase wins!)

Linear scaling of Cassandra – throughput vs number of nodes.

Cassandra cluster – the partitioner generates tokens for row keys

Write in action

Read in action
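How a partitioner maps row keys onto a ring of nodes can be sketched as follows. This is a simplified model under stated assumptions: a small 32-bit token space, an MD5-derived token, one token per node, and replicas chosen by walking clockwise. The `Ring` class and its methods are hypothetical and do not reproduce Cassandra's actual partitioners or replication strategies.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy token space, far smaller than a real cluster's


def token_for(row_key):
    """Hash a row key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; each node owns the range ending at its token
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def replicas_for_token(self, token, rf=1):
        """First node whose token is >= the key token, then the next rf - 1 clockwise."""
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def replicas(self, row_key, rf=1):
        return self.replicas_for_token(token_for(row_key), rf)


ring = Ring({"A": 0, "B": RING_SIZE // 3, "C": 2 * RING_SIZE // 3})
owners = ring.replicas("user:42", rf=2)
```

Because any node can compute the same token for a key, any node can act as coordinator for a read or write, which is what makes the masterless design workable.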

Page 25: Exploring NoSQL and implementing through Cassandra

Installation & Configuration

Yum installation is the easiest - /etc/yum.repos.d/datastax.repo

cassandra.yaml configuration

cluster_name, data_file_directories, commitlog_directory

Directory locations

Start Cassandra: cassandra -f

Start CLI: cqlsh

Stop Cassandra: service cassandra stop, or kill the process

Page 26: Exploring NoSQL and implementing through Cassandra

Demo

Page 27: Exploring NoSQL and implementing through Cassandra

CQL in action

CQL 3.0 is much like SQL. Unquoted names are case-insensitive

CQL data types:

Create keyspace: Responsys_Demo

Create table, index, user

Many other SQL-like functions!
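The demo steps might look roughly like the sketch below, which sends CQL statements through a session object. The `users` table, its columns, the index name and the replication settings are assumptions made for illustration, not the schema used in the talk. `connect` relies on the DataStax Python driver (`cassandra-driver`) and a node listening on the given host.

```python
# Hypothetical schema for the Responsys_Demo keyspace; unquoted names are
# folded to lower case by CQL, so Responsys_Demo and responsys_demo match.
CQL_STATEMENTS = [
    "CREATE KEYSPACE IF NOT EXISTS Responsys_Demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}",
    "CREATE TABLE IF NOT EXISTS Responsys_Demo.users "
    "(user_id uuid PRIMARY KEY, name text, email text)",
    "CREATE INDEX IF NOT EXISTS users_email_idx ON Responsys_Demo.users (email)",
]


def apply_schema(session, statements=CQL_STATEMENTS):
    """Run each statement through any object exposing execute(query)."""
    for statement in statements:
        session.execute(statement)
    return len(statements)


def connect(host="127.0.0.1"):
    """Open a session to a running node (assumes cassandra-driver is installed)."""
    from cassandra.cluster import Cluster  # imported lazily; optional dependency
    return Cluster([host]).connect()


# Usage against a live node (not executed here):
#     apply_schema(connect())
```

The same statements can be pasted into `cqlsh` one at a time, which is closer to how a live demo would usually be run.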

Page 28: Exploring NoSQL and implementing through Cassandra

Cassandra – Monitoring

JMX Interface – DEMO

Nodetool – Cassandra JMX interface

cfstats

netstats

ring & other operations

DataStax OpsCenter

Nagios monitoring

Cassandra logging & GC logging

Page 29: Exploring NoSQL and implementing through Cassandra


Summary, Conclusions & References

Page 30: Exploring NoSQL and implementing through Cassandra

Summary – Quick recap

Data evolution

ACID, BASE & CAP

NoSQL, data models, implementations

Cassandra & Data model

Architecture

Installations & Operations

Page 32: Exploring NoSQL and implementing through Cassandra

Q & A

Page 33: Exploring NoSQL and implementing through Cassandra

Thank you

Page 34: Exploring NoSQL and implementing through Cassandra
Page 35: Exploring NoSQL and implementing through Cassandra

APPENDIX