IIMB presentation

Speed @ Scale with NoSQL
Aveekshith Bushan
Regional Sales and SA Director - APAC
aveek@aerospike.com | Twitter: @aveekshith

Proprietary & Confidential | © 2015 Aerospike Inc. All rights reserved.

Then and now!

Volume, Variety and Velocity

Scale - Closer to Home

1956: IBM 350 Hard Disk, 5 MB of storage, system cost: $160K

1980: IBM 3380, 1 GB of storage, cost: $50K

2015: Multiple options, 1 TB of storage, cost: $0.8K
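Normalising the three price points to cost per megabyte makes the scale of the change concrete. A small sketch, assuming decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB) and taking the slide's prices at face value with no inflation adjustment:

```python
# Price points as quoted on the slide (USD, not adjusted for inflation).
# Assumption: decimal units, 1 GB = 1,000 MB and 1 TB = 1,000,000 MB.
PRICE_POINTS = {
    1956: (160_000, 5),        # IBM 350: $160K for 5 MB
    1980: (50_000, 1_000),     # IBM 3380: $50K for 1 GB
    2015: (800, 1_000_000),    # multiple options: $0.8K for 1 TB
}


def cost_per_mb(cost_usd: float, capacity_mb: float) -> float:
    """Return the storage cost in USD per megabyte."""
    return cost_usd / capacity_mb


if __name__ == "__main__":
    for year, (cost, mb) in PRICE_POINTS.items():
        print(f"{year}: ${cost_per_mb(cost, mb):,.4f} per MB")
```

Under those assumptions the figures work out to $32,000 per MB in 1956, $50 per MB in 1980 and $0.0008 per MB in 2015, a drop of about 40 million times.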

Over the Years – Scale!

Scale Changes Everything!

Source: The Black Swan by Nassim Nicholas Taleb

The Black Swan Effect

“Known” and “Unknown” Unknowns!

Known Unknowns
• Can be Planned For
• Through BCP, Risk Matrix, etc.

Unknown Unknowns
• Difficult to Model and Foresee
• Impact can be reduced by Diversification Across Investments, Businesses, Markets and Product Types

What Does it Mean – IT Perspective

Positive Black Swans
• Explosion in Data
• Exposure to Different Types of Data
• Agility in IT Infrastructure
• Ex: Successful New Product or Market Launch

Negative Black Swans
• Globally Distributed IT Infrastructure
• No Vendor Lock-In
• Easy Deployment Models
• Ex: Natural or Man-made Disasters, Market Changes

Gaussian World
• Structured Data
• Predictable Growth in Data Volume
• Lower Cost of Overall Operation
• Ex: Traditional Applications

Positive Black Swans - Data

Positive Black Swans
• Explosion in Data
• Exposure to Different Types of Data
• Agility in IT Infrastructure
• Ex: Successful New Product or Market Launch

Data requirements:
• Horizontal Scalability
• Dynamic Data Model
• Performance
• Agility
• Geospatial Information

Negative Black Swans - Data

Negative Black Swans
• Globally Distributed IT Infrastructure
• No Vendor Lock-In
• Easy Deployment Models
• Ex: Natural or Man-made Disasters, Market Changes

Data requirements:
• Geographically Distributed Clusters
• Built on Commodity Hardware
• Cloud-Ready
• Flexible Data Model
• Low Cost Solution

Gaussian World - Data

Gaussian World
• Structured Data
• Predictable Growth in Data Volume
• Lower Cost of Overall Operation
• Ex: Traditional Applications

Data requirements:
• Consistency
• Query Model
• Structured Data
• Manageability
• Ecosystem

Real-World ER Diagram

Familiar World!

ORM Relational DB

Making Changes

[Figure: schema diagram with two added tables, each labelled "New Table"]

What you don’t get with Relational Databases!

Data Types
• Unstructured Data
• Semi-structured Data

Volume
• Speed at Scale
• Petabyte Scale

Agility
• Quick Time to Market
• Agile Development

Deployment Models
• Cloud Ready
• Scale-out and Scale-up

NoSQL Types

• Key-Value Stores
• Document Stores
• Columnar Stores
• Graph Stores
• Other Stores: Time-Series, NewSQL, SSD-Optimized DBs, In-Memory Stores

Key Value Store

Relational

Emp_ID  F_Name  L_Name  Dept  City
1       John    Marsh   E11   New York
2       Satish  Rao     E12   Bengaluru
3       Alok    Jain    E12   New Delhi
4       Raghu   G       E11   Bengaluru

Skill_ID  Skill_Name  Version
1         Java        1.8
2         Go          1.7
3         Python      3.5

ID   Emp_ID  Skill_ID  Level
100  1       1         3
101  1       2         2
102  2       2         3
103  3       1         4
104  4       3         1

Key Value Store (storage: SSD or Memory; ex: Aerospike, Redis)

F_Name  L_Name  Dept  Location          Skill_Details
John    Marsh   E11   [45.123, 47.232]  { Skill_Name: 'Java', Version: '1.8', Level: 3, … }, { Skill_Name: 'Go', Version: '1.7', Level: 2, … }
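The contrast on this slide is between three normalized tables and one record per employee with the skills embedded. The sketch below uses a plain Python dict as a stand-in for the key-value store (it does not use the Aerospike or Redis client APIs), and `build_kv_records` is a hypothetical helper. The join happens once, at write time, so a read becomes a single key lookup. For simplicity it keeps City from the relational table rather than the Location coordinates shown for John.

```python
from typing import Any

# Relational rows copied from the slide.
EMPLOYEES = [
    (1, "John", "Marsh", "E11", "New York"),
    (2, "Satish", "Rao", "E12", "Bengaluru"),
    (3, "Alok", "Jain", "E12", "New Delhi"),
    (4, "Raghu", "G", "E11", "Bengaluru"),
]
SKILLS = {1: ("Java", "1.8"), 2: ("Go", "1.7"), 3: ("Python", "3.5")}
EMP_SKILLS = [(100, 1, 1, 3), (101, 1, 2, 2), (102, 2, 2, 3), (103, 3, 1, 4), (104, 4, 3, 1)]


def build_kv_records() -> dict[int, dict[str, Any]]:
    """Join the three tables once and store each employee as one value keyed by Emp_ID."""
    store: dict[int, dict[str, Any]] = {}
    for emp_id, first, last, dept, city in EMPLOYEES:
        store[emp_id] = {"F_Name": first, "L_Name": last, "Dept": dept,
                         "City": city, "Skill_Details": []}
    for _row_id, emp_id, skill_id, level in EMP_SKILLS:
        name, version = SKILLS[skill_id]
        store[emp_id]["Skill_Details"].append(
            {"Skill_Name": name, "Version": version, "Level": level})
    return store


if __name__ == "__main__":
    kv = build_kv_records()
    # Reading an employee is now a single lookup by key, with no join at query time.
    print(kv[1])
```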

Document Stores

Document DB

{ F_Name: 'John', L_Name: 'Marsh', city: 'New York', location: [45.123, 47.232],
  skills: [ { Skill_Name: 'Java', Version: '1.8', Level: 3, … },
            { Skill_Name: 'Go', Version: '1.7', Level: 2, … } ] }

Ex: MongoDB, CouchDB, OrientDB

Relational: the same employee, skill and employee-skill tables as on the Key Value Store slide.
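In a document store the embedded `skills` array can be queried directly, with no join. A minimal in-memory sketch, assuming the four employees from the relational tables are stored as documents; `find_by_skill` is a hypothetical helper that roughly mirrors what a MongoDB filter such as `{"skills.Skill_Name": "Go"}` would match, not a driver call.

```python
from typing import Any

# One self-contained document per employee, built from the slide's relational data.
EMPLOYEES: list[dict[str, Any]] = [
    {"F_Name": "John", "L_Name": "Marsh", "city": "New York", "location": [45.123, 47.232],
     "skills": [{"Skill_Name": "Java", "Version": "1.8", "Level": 3},
                {"Skill_Name": "Go", "Version": "1.7", "Level": 2}]},
    {"F_Name": "Satish", "L_Name": "Rao", "city": "Bengaluru",
     "skills": [{"Skill_Name": "Go", "Version": "1.7", "Level": 3}]},
    {"F_Name": "Alok", "L_Name": "Jain", "city": "New Delhi",
     "skills": [{"Skill_Name": "Java", "Version": "1.8", "Level": 4}]},
    {"F_Name": "Raghu", "L_Name": "G", "city": "Bengaluru",
     "skills": [{"Skill_Name": "Python", "Version": "3.5", "Level": 1}]},
]


def find_by_skill(docs: list[dict[str, Any]], skill_name: str, min_level: int = 0) -> list[str]:
    """Return names of employees with at least one embedded skill matching the criteria."""
    return [
        f"{d['F_Name']} {d['L_Name']}"
        for d in docs
        if any(s["Skill_Name"] == skill_name and s["Level"] >= min_level for s in d["skills"])
    ]


if __name__ == "__main__":
    print(find_by_skill(EMPLOYEES, "Go"))                # ['John Marsh', 'Satish Rao']
    print(find_by_skill(EMPLOYEES, "Java", min_level=4))  # ['Alok Jain']
```

Only John's document carries a `location` field; a schema-less store accepts documents whose fields differ, which is the "dynamic data model" point from the earlier slides.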

Hadoop and NoSQL

• Hadoop is a Map/Reduce framework
• Used to partition computation on large datasets
• Used where you need to analyse most of the data, e.g.:
  - Count all the links on all the web pages in India
  - Analyse the recommendations based on yesterday's purchases
• Use a connector to push and pull data between Hadoop and NoSQL
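The "count all the links" example follows the map, shuffle and reduce steps that Hadoop distributes across a cluster. A single-process sketch with a hypothetical three-page input, using no Hadoop APIs and a deliberately naive `href` regex:

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical toy input: (page URL, HTML body). A real job would read from HDFS.
PAGES = [
    ("a.in", '<a href="http://b.in">b</a> <a href="http://c.in">c</a>'),
    ("b.in", '<a href="http://c.in">c</a>'),
    ("c.in", "no links here"),
]

# Deliberately simple pattern; real HTML needs a proper parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def map_phase(page: tuple[str, str]) -> Iterator[tuple[str, int]]:
    """Emit (target_url, 1) for every link found on one page."""
    _url, body = page
    for target in LINK_RE.findall(body):
        yield target, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts for one key."""
    return key, sum(values)


def run_job(pages: list[tuple[str, str]]) -> dict[str, int]:
    intermediate = (pair for page in pages for pair in map_phase(page))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(run_job(PAGES))  # {'http://b.in': 1, 'http://c.in': 2}
```

In Hadoop the map and reduce functions run on many machines and the framework performs the shuffle; the resulting counts could then be pushed into a NoSQL store through a connector for low-latency lookups.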

MONGODB

Architecture

AEROSPIKE

Architecture

1) No Hotspots – Distributed Hash Table simplifies data partitioning

2) Smart Client – 1 hop to data, no load balancers

3) Shared Nothing Architecture, every node is identical

4) Smart Cluster, Zero Touch – auto-failover, rebalancing, rolling upgrades

5) Operations and long-running tasks prioritized in real time

6) XDR (Cross Datacenter Replication) – asynchronous replication across data centers, aimed at zero downtime
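Points 1 to 3 together mean the client can work out which node owns a key without asking a coordinator or a load balancer. A hypothetical sketch of that "smart client" idea: `Node`, `SmartClient` and the round-robin partition table are illustrative names, not the Aerospike client API, and SHA-1 stands in for RIPEMD-160 because not every Python build ships RIPEMD-160.

```python
import hashlib
from typing import Optional

N_PARTITIONS = 4096  # 12 bits of the digest select a partition


def partition_id(key: str) -> int:
    """Map a key to a partition. Stand-in digest: SHA-1 instead of RIPEMD-160."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


class Node:
    """One shared-nothing node: it keeps only the records of partitions it owns."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: dict[str, str] = {}


class SmartClient:
    """Hypothetical client that caches the partition table and talks to nodes directly."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}
        # Illustrative round-robin table: partition id -> master node name.
        self.partition_table = {pid: nodes[pid % len(nodes)].name for pid in range(N_PARTITIONS)}

    def owner(self, key: str) -> Node:
        # Computed locally: one hop to the owning node, no load balancer in between.
        return self.nodes[self.partition_table[partition_id(key)]]

    def put(self, key: str, value: str) -> str:
        node = self.owner(key)
        node.records[key] = value
        return node.name

    def get(self, key: str) -> Optional[str]:
        return self.owner(key).records.get(key)


if __name__ == "__main__":
    client = SmartClient([Node("A"), Node("B"), Node("C")])
    owner = client.put("cookie-abcdefg-12345678", "profile-data")
    print(owner, client.get("cookie-abcdefg-12345678"))
```

In a real cluster the table would be fetched from the nodes and refreshed whenever cluster membership changes; the routing step itself stays a local computation.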

Data is Distributed Randomly

Every key is hashed into a 20-byte (fixed-length) digest using the RIPEMD-160 hash function

This digest plus additional metadata (a fixed 64 bytes per record) is stored in the in-RAM index

12 bits of the digest are used to compute the partition ID

There are 4096 (2^12) partitions

The partition ID maps to a node ID based on cluster membership
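The hashing steps above can be sketched in a few lines. Assumptions to flag: the digest here covers only the key string (the real Aerospike digest also includes the set name and key type), the 12 bits are taken as the low bits of the first two digest bytes read little-endian, and SHA-1 (also 20 bytes) is used as a fallback when the local OpenSSL build does not provide RIPEMD-160.

```python
import hashlib

N_PARTITIONS = 4096              # 2**12, so 12 bits select a partition
PARTITION_MASK = N_PARTITIONS - 1


def key_digest(key: str) -> bytes:
    """20-byte digest of the key: RIPEMD-160 if available, else SHA-1 as a stand-in."""
    data = key.encode("utf-8")
    try:
        return hashlib.new("ripemd160", data).digest()
    except ValueError:  # some OpenSSL 3 builds no longer provide RIPEMD-160
        return hashlib.sha1(data).digest()


def partition_of(digest: bytes) -> int:
    """Keep 12 bits of the digest (assumed: low bits of the first two bytes, little-endian)."""
    return int.from_bytes(digest[:2], "little") & PARTITION_MASK


if __name__ == "__main__":
    d = key_digest("cookie-abcdefg-12345678")
    print(len(d), partition_of(d))
```

Because the digest is effectively uniform, hashing a few tens of thousands of keys touches nearly every one of the 4096 partitions, which is what keeps any single partition from becoming a hotspot.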

Example key: cookie-abcdefg-12345678
Example digest (illustrative): 182023kh15hh3kahdjsh

Partition ID  Master node  Replica node
…             1            4
1820          2            3
1821          3            2
4095          4            1
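The mapping from partition ID to master and replica nodes "based on cluster membership" can be illustrated with rendezvous (highest-random-weight) hashing: for each partition, rank the current nodes by a hash of (node, partition), then take the first as master and the second as replica. This is a stand-in to show the idea, not Aerospike's exact succession-list algorithm, and the node names are hypothetical.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096


def succession_list(partition_id: int, nodes: list[str]) -> list[str]:
    """Order the nodes for one partition by a per-(node, partition) hash score."""
    def score(node: str) -> bytes:
        return hashlib.sha1(f"{node}:{partition_id}".encode("utf-8")).digest()
    return sorted(nodes, key=score, reverse=True)


def partition_map(nodes: list[str], replication_factor: int = 2) -> dict[int, list[str]]:
    """partition id -> [master, replica, ...] for the current cluster membership."""
    return {pid: succession_list(pid, nodes)[:replication_factor] for pid in range(N_PARTITIONS)}


if __name__ == "__main__":
    before = partition_map(["node-1", "node-2", "node-3", "node-4"])
    print(Counter(owners[0] for owners in before.values()))  # about 1024 masters per node
    after = partition_map(["node-1", "node-2", "node-3"])    # node-4 leaves the cluster
    moved = sum(before[p][0] != after[p][0] for p in range(N_PARTITIONS))
    print(f"{moved} partitions changed master")
```

When `node-4` leaves, the remaining nodes keep their relative order for every partition, so the partitions it mastered fall to their old replica while every other partition keeps its master; only about a quarter of the partitions change master.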

Even record distribution

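[Figure: Node A, Node B and Node C each hold a share of the records (X, Y, Z) and of their replicas (X', Y', Z'); the application reaches them through the Aerospike client.]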
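Combining hashing with a partition table shows why records, and their replicas, end up spread evenly across Node A, Node B and Node C. A sketch with 30,000 hypothetical keys, SHA-1 standing in for RIPEMD-160, and an illustrative partition table that places each partition's replica on the node after its master:

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096
NODES = ["Node A", "Node B", "Node C"]


def partition_id(key: str) -> int:
    """Stand-in digest (SHA-1); 12 bits select one of 4096 partitions."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def owners(pid: int) -> tuple[str, str]:
    """Illustrative table: master and replica on neighbouring nodes, never the same node."""
    return NODES[pid % len(NODES)], NODES[(pid + 1) % len(NODES)]


def copies_per_node(keys: list[str]) -> tuple[Counter, Counter]:
    """Count how many master and replica copies of the given keys each node stores."""
    masters, replicas = Counter(), Counter()
    for key in keys:
        master, replica = owners(partition_id(key))
        masters[master] += 1
        replicas[replica] += 1
    return masters, replicas


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(30_000)]
    masters, replicas = copies_per_node(keys)
    print("masters: ", dict(masters))
    print("replicas:", dict(replicas))
```

Each node ends up with close to 10,000 master copies and close to 10,000 replica copies, so for uniformly accessed keys no node carries a disproportionate share of reads, writes or storage.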

Thank You!

Aveek | aveek@aerospike.com