Copyright © 2012 McKnight Consulting Group, LLC All Rights Reserved – Confidential and Proprietary Slide 1 Unlock Potential William McKnight President McKnight Consulting Group October 16, 2012 NoSQL for SQL Professionals Dipti Borkar Director, Product Management Couchbase

NoSQL for SQL Professionals


Page 1: NoSQL for SQL Professionals

Copyright © 2012 McKnight Consulting Group, LLC All Rights Reserved – Confidential and Proprietary Slide 1

Unlock Potential

William McKnight President McKnight Consulting Group October 16, 2012

NoSQL for SQL Professionals

Dipti Borkar Director, Product Management Couchbase

Page 2: NoSQL for SQL Professionals

William McKnight

President, McKnight Consulting Group

•  Frequent keynote speaker and trainer internationally
•  Consulted to Pfizer, Scotiabank, Teva Pharmaceuticals, Verizon, and many other Global 1000 companies
•  A prolific writer with hundreds of articles, blogs and white papers in publication
•  Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management
•  Former Fortune 50 Information Technology executive

Page 3: NoSQL for SQL Professionals

Former Enterprise Information Holy Grail

[Architecture diagram with labels: Operational, Analytical, RDBMS legacy sources, operational applications and users, data integration, data warehouses, data marts, MDBS, users/reports]

Page 4: NoSQL for SQL Professionals


No More

Page 5: NoSQL for SQL Professionals

The Relational Database Data Page

[Diagram of a data page: page header, records, row IDs, page footer. © McKnight Consulting Group, 2010]

Sample records:
1120  Aris  Doug Johnson  Practice Director  206-676-5636  [email protected]
1121  Stolt Offshore MS  Craig Lennox  Mr  +66 1226 71269  [email protected]
1122  Medtronic, Inc.  Mark Kohls  Principle Database Administrator  763.516.2557  [email protected]

Page 6: NoSQL for SQL Professionals

What does Big Data Mean?

•  Data in NoSQL - No SQL allowed or Not Only SQL?
•  Sensor, social and web data?
•  Data in a system that does not support SQL?
•  A system with petabytes?
•  Hadoop?

Page 7: NoSQL for SQL Professionals

Why the Sudden Explosion of Interest?

•  An increased number and variety of data sources that generate large quantities of data
   –  Sensors (e.g. location, RFID, …)
   –  Social (e.g. Twitter, wikis, …)
   –  Web clicks
•  Realization that data was “too valuable” to delete
   –  Even when little signal to lots of noise
•  Dramatic decline in the cost of hardware, especially storage
   –  If storage was still $100/GB there would be no big data revolution underway

Page 8: NoSQL for SQL Professionals

Why NoSQL for Big Data

•  More data model flexibility
   –  JSON as a data model (think XML)
   –  No “schema first” requirement; load first
•  Faster time to insight from data acquisition
•  Relaxed ACID
   –  Eventual consistency
   –  Willing to trade consistency for availability
   –  ACID would crush things like storing clicks on Google
•  Low upfront software costs
•  Utilizes Java
•  Full Scans
•  Programmers love the freedoms

Page 9: NoSQL for SQL Professionals


Hadoop, MapReduce and “Big Data”

•  Parallel programming framework

•  Hadoop is an open source distributed file system (HDFS) plus MapReduce

•  Hadoop is used by those facing web-scale data challenges

Page 10: NoSQL for SQL Professionals


Who uses Hadoop

40,000+ nodes running Hadoop Research for Ad systems and web search

Product search indexes Analytics from user sessions

Log analysis for reporting and analytics and machine learning

Log analysis, data mining, and machine learning

Large scale image conversion

High energy physics, genomics, Digital Sky Survey

Page 11: NoSQL for SQL Professionals

ACID

•  Atomicity – each transaction completes in full or fails in full
•  Consistency – database in valid state after each transaction
•  Isolation – transactions do not interfere with one another
•  Durability – transactions remain committed no matter what (e.g., crashes)
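As a rough illustration of atomicity and consistency, the sketch below uses Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer function are hypothetical and are not taken from the slides; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Consistency rule for this sketch: no negative balances.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling transfer(conn, "alice", "bob", 500) in this sketch returns False and leaves both balances untouched, because both updates are rolled back together.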

Page 12: NoSQL for SQL Professionals

What Gives the CIO Heartburn About NoSQL

•  Developer Skills
•  Lack of ACID Compliance
•  Tools lacking and Projects Flawed
•  Fast Nature of Unburdened Projects
•  Different Developers
•  Schema-less/lite Models
•  Lack of Payback Methodology

Page 13: NoSQL for SQL Professionals

MapReduce

1.  Take a large problem and divide it into sub-problems
2.  Perform the same function on all sub-problems
3.  Combine the output

[Diagram: the MAP stage runs DoWork() on each sub-problem; the REDUCE stage combines the results into Output]

Page 14: NoSQL for SQL Professionals

MapReduce (MR)

•  Programming framework (library and runtime) for analyzing data sets stored in HDFS
•  MapReduce jobs are composed of two functions
   –  Map
   –  Reduce
•  User only writes the Map and Reduce functions
•  MR framework provides all the “glue” and coordinates the execution of the Map and Reduce jobs on the cluster.
   –  Fault tolerant
   –  Scalable
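The division of labor described here can be sketched in a few lines of single-process Python. This is not Hadoop's API: run_mapreduce is a hypothetical stand-in for the framework "glue", and map_fn and reduce_fn play the role of the two functions a user would write (a word count is assumed as the job).

```python
from collections import defaultdict


def map_fn(line):
    """User-written Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-written Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:                 # map phase
        for key, value in map_fn(record):
            groups[key].append(value)      # shuffle: group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

A real framework would run the map and reduce calls on many machines and retry failed tasks; the sketch keeps only the data flow.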

Page 15: NoSQL for SQL Professionals

Where to do big data analytics?

A Quick Summary

Data Model
   Parallel DB Systems:  Structured data with known schema
   NoSQL:  Any data will fit in any format; (un)(semi)structured

Hardware Configuration
   Parallel DB Systems:  Purchased as an appliance
   NoSQL:  “User assembled” from commodity machines

Fault Tolerance
   Parallel DB Systems:  Failures assumed to be rare; no query level fault tolerance
   NoSQL:  Failures assumed to be common; simple, yet efficient, fault tolerance

Page 16: NoSQL for SQL Professionals

Key-Value Stores

•  NoSQL OLTP
•  A record may look like:
   –  Book: “Of Mice and Men”: Author: “Steinbeck”
•  Great for unstructured data centered on a single object.
•  Typically used as a cache for data frequently requested by web applications such as online shopping carts or social-media sites.
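A minimal sketch of the caching use mentioned above, assuming a plain in-memory dictionary as the key-value store. The class name, the cart key format, and load_cart_from_database are all hypothetical; a real deployment would use a networked store rather than a local dict.

```python
class KeyValueCache:
    """Cache-aside wrapper: serve from memory, fall back to a slower source."""

    def __init__(self, load_from_source):
        self._store = {}               # key -> value, looked up by exact key only
        self._load = load_from_source  # called only on a cache miss
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._load(key)
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value


def load_cart_from_database(key):
    """Hypothetical stand-in for a slow lookup in the system of record."""
    return {"cart:42": ["book", "lamp"]}.get(key, [])


cache = KeyValueCache(load_cart_from_database)
```

Repeated calls to cache.get("cart:42") reach the slow source only once; every later read is a single lookup by key.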

Page 17: NoSQL for SQL Professionals

Document Stores

•  A record may look like:
   –  “id” => 12345,
   –  “name” => “Jane”,
   –  “age” => 22,
   –  “address” => { “number” => 123, “street” => “Main” }
•  Often deployed for web-traffic analysis, social gaming, content stores, user-behavior/action analysis, or log-file analysis in real time.
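The record above maps directly onto a nested JSON document. The sketch below, using only Python's json module, shows the round trip and a lookup into the nested address; get_path is a hypothetical helper for illustration, not part of any document database's API.

```python
import json

# The slide's record as a nested document.
doc = {
    "id": 12345,
    "name": "Jane",
    "age": 22,
    "address": {"number": 123, "street": "Main"},
}

serialized = json.dumps(doc)       # what a document store would persist
restored = json.loads(serialized)  # what the application reads back


def get_path(document, path, default=None):
    """Follow a dotted path such as 'address.street' through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current
```

Because no schema is declared up front, a second document could omit "age" or add new fields without any change to the first.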

Page 18: NoSQL for SQL Professionals

Graph Stores: Emphasizing Relationships as Primary Data

•  Based on Graph Theory
   –  Vertices (nodes), edges (relations) and properties
•  Navigating social networks, configurations and recommendations
   –  e.g., Get the cheapest flights from DFW to SYD leaving on 7/12/12 with a minimum number of stops and each stop less than 2 hours.
•  e.g., Social Networks
   –  Churn and Offer Management
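To make the flight example concrete, here is a small sketch that treats airports as vertices and flights as priced edges, then searches for the cheapest route within a stop limit. The routes and prices are invented for illustration, the layover-time condition is not modeled, and a graph database would express this as a traversal query rather than hand-written search code.

```python
import heapq


def cheapest_route(flights, origin, destination, max_stops):
    """Return (price, path) for the cheapest route with at most max_stops stops."""
    graph = {}
    for src, dst, price in flights:
        graph.setdefault(src, []).append((dst, price))

    # Heap entries: (total price so far, airport, legs flown, path taken).
    heap = [(0, origin, 0, [origin])]
    while heap:
        price, airport, legs, path = heapq.heappop(heap)
        if airport == destination:
            return price, path           # first arrival popped is the cheapest
        if legs > max_stops:             # max_stops stops allow max_stops + 1 legs
            continue
        for nxt, leg_price in graph.get(airport, []):
            if nxt not in path:          # never revisit an airport on one route
                heapq.heappush(
                    heap, (price + leg_price, nxt, legs + 1, path + [nxt])
                )
    return None


# Made-up edges: (from, to, price). Not real fares.
flights = [
    ("DFW", "SYD", 1500),
    ("DFW", "LAX", 200),
    ("LAX", "SYD", 900),
    ("DFW", "HNL", 450),
    ("HNL", "NAN", 200),
    ("NAN", "SYD", 250),
]
```

With these sample edges, tightening max_stops trades price for convenience, which is the kind of relationship-driven question the slide has in mind.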

Page 19: NoSQL for SQL Professionals


From “Picking the Right NoSQL Database Tool” by Mikayel Vardanyan

Picking the Right NoSQL Database

Page 20: NoSQL for SQL Professionals


The NoSQL Challenge

Page 21: NoSQL for SQL Professionals


There’s No Technology Silver Bullet


Source: eBay, eBay Extreme Analytics in a Virtual World, Nov 10,2010

Page 22: NoSQL for SQL Professionals

Hybrid Information Universe

[Architecture diagram with labels: Operational, Analytical, RDBMS legacy sources, operational applications and users, data integration, master data, data warehouse, data warehouse appliance, data marts, MDBS, columnar databases, Hadoop, NoSQL, data stream processing, syndicated data, elements in the cloud, users/reports]

Page 23: NoSQL for SQL Professionals

Data Integration

•  Increasingly data first lands in the unstructured universe
•  NoSQL stores are big data "EL" tools
•  The Need for Data Integration with the Enterprise

[Diagram: UnBig (RDBMS) and Big (NoSQL)]

Page 24: NoSQL for SQL Professionals

Agile Approaches

Stages: Justification, Planning, Business Analysis, Design, Construction, Deployment, Support

1.  Business Case Assessment
2.  Enterprise Infrastructure Evaluation
3.  Project Planning
4.  Project Requirements Definition
5.  Data Analysis
6.  Application Prototyping
7.  Metadata Repository Analysis
8.  Database Design
9.  ETL Design
10.  Metadata Repository Design
11.  ETL Development
12.  Application Development
13.  Data Mining
14.  Metadata Repository Development
15.  Implementation
16.  Release Evaluation
17.  Operate and Maintain

Source: Business Intelligence Roadmap, Larissa Moss & Shaku Atre

Page 25: NoSQL for SQL Professionals

Cloud Services

The benefits of cloud computing are:
•  On-Demand and Self Service
•  Broad Network Access
•  Resource Pooling
•  Rapid Elasticity
•  Measured Service

Source: Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Mather, Kumaraswamy & Latif)

Page 26: NoSQL for SQL Professionals

Information Store Guidance

[Matrix of information stores against characteristics; the cell markings were not preserved.]

Columns: Real-Time, Small Data OK, Terabytes, Petabytes, Historical Data, Unstructured Data, Source Data supplier to other systems, Random Queries, Ad-hoc

Rows: Operational Systems, Columnar database, Data Mart (relational), Data Stream Processing, Data Warehouse, NoSQL, Master Data Management, Multidimensional Mart

Page 27: NoSQL for SQL Professionals

What Will Motivate IT to Adopt NoSQL?

•  Continuation of Big Vendor Legacy Seen as Too Expensive
•  Scaling: Data > 1 Machine
•  Schema Flexibility
•  Mandatory Requirements to Keep Multiple Years of Highly Detailed Data
•  Tired of Losing “Deals” to More Agile Hybrid IT Organizations
•  NoSQL Tool Marketplace Innovations

Page 28: NoSQL for SQL Professionals

NoSQL for Interactive Applications

Page 29: NoSQL for SQL Professionals

Couchbase Server 2.0

NoSQL Database
NoSQL Document Database

Page 30: NoSQL for SQL Professionals

Market Adoption

Internet Companies
•  Social Gaming
•  Ad Networks
•  Social Networks
•  Online Business Services
•  E-Commerce
•  Online Media
•  Content Management
•  Cloud Services

Enterprises
•  Communications
•  Retail
•  Financial Services
•  Health Care
•  Automotive/Airline
•  Agriculture
•  Consumer Electronics
•  Business Systems

Page 31: NoSQL for SQL Professionals

Market Adoption – Customers

Internet Companies   Enterprises

More than 300 customers, 5,000 production deployments worldwide

Page 32: NoSQL for SQL Professionals

RELATIONAL VS NOSQL DOCUMENT DATABASES

Page 33: NoSQL for SQL Professionals

Relational vs Document data model

Relational data model
Highly-structured table organization with rigidly-defined data formats and record structure.
[Diagram: table with columns C1, C2, C3, C4]

Document data model
Collection of complex documents with arbitrary, nested data formats and varying “record” format.
[Diagram: JSON documents { }]

Page 34: NoSQL for SQL Professionals

Example: User Profile

User Info

KEY   First   Last     ZIP_id
1     Dipti   Borkar   2
2     Joe     Smith    2
3     Ali     Dodson   2
4     John    Doe      3

Address Info

ZIP_id   CITY   STATE   ZIP
1        DEN    CO      30303
2        MV     CA      94040
3        CHI    IL      60609
4        NY     NY      10010

To get information about a specific user, you perform a join across two tables.
[Highlighted in the diagram: user KEY 1 with ZIP_id 2, joined to address row 2 (MV, CA, 94040)]
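The join described here can be tried with Python's built-in sqlite3 module. The table and column names below are adapted from the slide (user_key is an assumed rename of KEY), and user_profile is a hypothetical helper written only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT
    );
    CREATE TABLE user_info (
        user_key INTEGER PRIMARY KEY, first TEXT, last TEXT,
        zip_id INTEGER REFERENCES address_info(zip_id)
    );
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    INSERT INTO user_info VALUES
        (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
        (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
    """
)


def user_profile(conn, user_key):
    """Assemble one user's profile by joining the two tables on zip_id."""
    return conn.execute(
        """
        SELECT u.user_key, u.first, u.last, a.zip, a.city, a.state
        FROM user_info AS u
        JOIN address_info AS a ON a.zip_id = u.zip_id
        WHERE u.user_key = ?
        """,
        (user_key,),
    ).fetchone()
```

Every profile read pays for the join; the address is stored once and shared by all users who reference the same ZIP_id.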

Page 35: NoSQL for SQL Professionals

Document Example: User Profile

All data in a single document

{
    "ID": 1,
    "FIRST": "Dipti",
    "LAST": "Borkar",
    "ZIP": "94040",
    "CITY": "MV",
    "STATE": "CA"
}

[Diagram: User Info row + Address Info row = one JSON document]
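A short sketch of the same idea in Python: the user row and its address row are merged into one document and stored under a single key, so a later read needs no join. The plain dict used as the bucket and the "user::1" key format are assumptions made for illustration, not a real client API.

```python
import json

# The two relational rows from the User Profile example.
user_row = {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP_id": 2}
address_row = {"ZIP_id": 2, "CITY": "MV", "STATE": "CA", "ZIP": "94040"}


def to_document(user, address):
    """Merge both rows into one document, dropping the join column."""
    doc = {k: v for k, v in user.items() if k != "ZIP_id"}
    doc.update({k: v for k, v in address.items() if k != "ZIP_id"})
    return doc


# Hypothetical document bucket: a dict of key -> JSON text.
bucket = {}
bucket["user::1"] = json.dumps(to_document(user_row, address_row))

# Reading the profile is now a single lookup by key.
profile = json.loads(bucket["user::1"])
```

The trade-off is duplication: the address is copied into every user document that needs it instead of being stored once.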

Page 36: NoSQL for SQL Professionals

Relational Technology Scales Up

Web/App Server Tier
Application Scales Out: Just add more commodity web servers
[Chart: system cost and application performance vs. users]

Relational Database
RDBMS Scales Up: Get a bigger, more complex server
Expensive and disruptive sharding, doesn’t perform at web scale
[Chart: system cost and application performance vs. users, annotated “Won’t scale beyond this point”]

Page 37: NoSQL for SQL Professionals

Couchbase Server Scales Out Like App Tier

Web/App Server Tier
Application Scales Out: Just add more commodity web servers
[Chart: system cost and application performance vs. users]

Couchbase Distributed Data Store
NoSQL Database Scales Out: Cost and performance mirrors app tier
Scaling out flattens the cost and performance curves
[Chart: system cost and application performance vs. users]

Page 38: NoSQL for SQL Professionals

NoSQL Database Considerations

Easy Scalability
Grow cluster without application changes, without downtime when needed

Consistent High Performance
Always awesome experience for your application users.

Always On 24x7x365
The sun never sets on the Internet; your application needs the database to always serve data.

Flexible Data Model
Keep developers productive and allow fast and easy addition of new features

Page 39: NoSQL for SQL Professionals

USE CASE AND APPLICATION EXAMPLES

Page 40: NoSQL for SQL Professionals

Data driven use cases

•  Support for unlimited data growth
•  Data with non-homogenous structure
•  Need to quickly and often change data structure
•  3rd party or user defined structure
•  Variable length documents
•  Sparse data records
•  Hierarchical data

Page 41: NoSQL for SQL Professionals

Performance driven use cases

•  Low latency matters
•  High throughput matters
•  Large number of users
•  Unknown demand with sudden growth of users/data
•  Predominantly direct document access
•  Workloads with very high mutation rate per document

Page 42: NoSQL for SQL Professionals

Use Case Examples

Web app or Use-case | Couchbase Solution | Example Customer
Content and Metadata Management System | Couchbase document store + Elastic Search | McGraw-Hill…
Social Game or Mobile App | Couchbase stores game and player data | Zynga…
Ad Targeting | Couchbase stores user information for fast access | AOL…
User Profile Store | Couchbase Server as a key-value store | TuneWiki…
Session Store | Couchbase Server as a key-value store | Concur…
High Availability Caching Tier | Couchbase Server as a memcached tier replacement | Orbitz…
Chat/Messaging Platform | Couchbase Server | DOCOMO…

Page 43: NoSQL for SQL Professionals

Use Case: Social Gaming

Social and Mobile Gaming

Types of Data
•  User account information
•  User game profile info
•  User’s social graph
•  State of the game
•  Player badges and stats

Application Requirements
•  Ability to support rapid growth
•  Fast response times for awesome user experience
•  Game uptime – 24x7x365
•  Easy to update apps with new features

Why NoSQL and Couchbase
•  Scalability ensures that games are ready to handle the millions of users that come with viral growth.
•  High performance guarantees players are never left waiting to make their next move.
•  Always-on operations means zero interruption to game play (and revenue)
•  Flexible data model means games can be developed rapidly and updated easily with new features

Page 44: NoSQL for SQL Professionals

Use Case: Ad Targeting

Ad Targeting

Types of Data
•  User profile: preferences and psychographic data
•  Ad serving history by user
•  Ad buying history by advertiser
•  Ad serving history by advertiser

Application Requirements
•  High performance to meet limited ad serving budget; time allowance is typically <40 msec
•  Scalability to handle hundreds of millions of user profiles and rapidly growing amount of data
•  24x7x365 availability to avoid ad revenue loss

Why NoSQL and Couchbase
•  Sub-millisecond reads/writes means less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served
•  Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
•  Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.

Page 45: NoSQL for SQL Professionals

Use Case: Content and metadata store

Building a self-adapting, interactive learning portal with Couchbase

Page 46: NoSQL for SQL Professionals

The Problem

As learning moves online in great numbers, there is a growing need to build interactive learning environments that Scale!!

•  Scale to millions of learners
•  Serve MHE as well as third-party content, including open content
•  Support learning apps
•  Self-adapt via usage data

Page 47: NoSQL for SQL Professionals

The Challenge

Backend is an Interactive Content Delivery Cloud that must:
•  Allow for elastic scaling under spike periods
•  Catalog & deliver content from many sources
•  Provide consistent low latency for metadata and stats access
•  Support full-text search for content discovery
•  Offer tunable content ranking & recommendation functions

Experimented with a combination of:
•  XML Databases
•  SQL/MR Engines
•  In-memory Data Grids
•  Enterprise Search Servers

Hmmm...this looks kinda like:
+  Content Caching (Scale)
+  Social Gaming (Stats)
+  Ad Targeting (Smarts)

Page 48: NoSQL for SQL Professionals


The Technologies  

Page 49: NoSQL for SQL Professionals


The Learning Portal  

•  Designed and built as a collaboration between MHE Labs and Couchbase

•  Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration

•  Available for download and further development as open source code

Page 50: NoSQL for SQL Professionals

COUCHBASE SOLUTION “THE BASICS”

Page 51: NoSQL for SQL Professionals

Basic Operation

•  Docs distributed evenly across servers
•  Each server stores both active and replica docs
   –  Only one server active at a time (for any given doc)
•  Client library provides app with simple interface to database
•  Cluster map provides map to which server doc is on
   –  App never needs to know
•  App reads, writes, updates docs
•  Multiple app servers can access same document at same time

[Diagram: Couchbase Server cluster, user-configured replica count = 1. App Server 1 and App Server 2 each embed the Couchbase client library with a cluster map and send read/write/update requests to Servers 1–3. Each server holds a set of active docs and a set of replica docs.]
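The role of the cluster map can be sketched as follows. This is an illustrative model only: the partition count, the CRC32-modulo hashing, and the round-robin placement are assumptions chosen to keep the example small, and none of it is presented as Couchbase's actual algorithm or client API.

```python
import zlib

NUM_PARTITIONS = 64  # illustrative value, chosen only for this sketch


def partition_for(key):
    """Hash a document key to a partition number (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(servers, replica_count=1):
    """Spread partitions round-robin; replicas go to the next servers in line."""
    cluster_map = {}
    for p in range(NUM_PARTITIONS):
        active = servers[p % len(servers)]
        replicas = [
            servers[(p + i) % len(servers)] for i in range(1, replica_count + 1)
        ]
        cluster_map[p] = {"active": active, "replicas": replicas}
    return cluster_map


def server_for(cluster_map, key):
    """What a client library would do: key -> partition -> active server."""
    return cluster_map[partition_for(key)]["active"]


cluster_map = build_cluster_map(["server1", "server2", "server3"], replica_count=1)
```

The application only ever calls something like server_for (hidden inside the client library), which is why it never needs to know where a doc lives.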

Page 52: NoSQL for SQL Professionals

Add Nodes to Cluster

•  Two servers added with one-click operation
•  Docs automatically rebalance across cluster
   –  Even distribution of docs
   –  Minimum doc movement
•  Cluster map updated
•  App database calls now distributed over larger number of servers

[Diagram: Couchbase Server cluster, user-configured replica count = 1. Servers 4 and 5 join Servers 1–3; active and replica docs are redistributed across all five servers, and both app servers send read/write/update requests using the updated cluster map.]
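One way to picture "even distribution, minimum movement" is the greedy sketch below: it keeps moving one partition from the busiest server to the idlest until the loads differ by at most one. The function and data are hypothetical, it assumes every currently assigned server is still in the server list, and the real rebalance procedure may work quite differently.

```python
def rebalance(assignment, servers):
    """Even out a partition -> server map, moving as few partitions as needed.

    Returns (new_assignment, number_of_partitions_moved).
    Assumes every server named in `assignment` also appears in `servers`.
    """
    new_assignment = dict(assignment)
    owned = {s: [] for s in servers}
    for partition, server in sorted(new_assignment.items()):
        owned[server].append(partition)

    moved = 0
    while True:
        busiest = max(servers, key=lambda s: len(owned[s]))
        idlest = min(servers, key=lambda s: len(owned[s]))
        if len(owned[busiest]) - len(owned[idlest]) <= 1:
            return new_assignment, moved   # loads are as even as they can be
        partition = owned[busiest].pop()   # hand one partition over
        owned[idlest].append(partition)
        new_assignment[partition] = idlest
        moved += 1


# Hypothetical starting point: 60 partitions spread over three servers.
before = {p: f"server{p % 3 + 1}" for p in range(60)}
five_servers = ["server1", "server2", "server3", "server4", "server5"]
after, moved = rebalance(before, five_servers)
```

In this example only the partitions that end up on the two new servers move; everything else stays where it was, so the cluster map changes as little as possible.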

Page 53: NoSQL for SQL Professionals

Fail Over Node

•  App servers accessing docs
•  Requests to Server 3 fail
•  Cluster detects server failed
   –  Promotes replicas of docs to active
   –  Updates cluster map
•  Requests for docs now go to appropriate server
•  Typically rebalance would follow

[Diagram: Couchbase Server cluster of Servers 1–5, user-configured replica count = 1. Server 3 fails; replicas of its active docs held on the remaining servers are promoted to active, and both app servers continue through the updated cluster map.]
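The failover steps above can be modeled with a small function over a cluster map. The map layout (partition number to an active server plus a list of replicas) and the sample data are assumptions made for this sketch, not the product's internal format.

```python
def fail_over(cluster_map, failed_server):
    """Return an updated map with replicas promoted wherever the failed server was active."""
    new_map = {}
    for partition, entry in cluster_map.items():
        active = entry["active"]
        # The failed server can no longer hold replicas either.
        replicas = [s for s in entry["replicas"] if s != failed_server]
        if active == failed_server:
            if not replicas:
                raise RuntimeError(
                    f"partition {partition} has no surviving replica"
                )
            # Promote the first surviving replica to active.
            active, replicas = replicas[0], replicas[1:]
        new_map[partition] = {"active": active, "replicas": replicas}
    return new_map


# Hypothetical three-partition map with replica count = 1.
sample_map = {
    0: {"active": "server3", "replicas": ["server1"]},
    1: {"active": "server1", "replicas": ["server3"]},
    2: {"active": "server2", "replicas": ["server1"]},
}
after_failover = fail_over(sample_map, "server3")
```

After failover, partitions 0 and 1 in the sample have no replica left, which is one reason a rebalance would typically follow to restore the configured replica count.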

Page 54: NoSQL for SQL Professionals

Couchbase Server Admin Console

Page 55: NoSQL for SQL Professionals


Page 56: NoSQL for SQL Professionals

Q & A

Page 57: NoSQL for SQL Professionals


William McKnight [email protected] www.mcknightcg.com

Dipti Borkar [email protected] www.couchbase.com