slides from talk for Malibu SQL User Group - July 2013
NoSQL for the SQL Server Pro (or “Practical Big Data”)
Lynn Langit
July 2013 – Malibu SQL UG
Data Expertise / Lynn Langit
• Industry awards
  – Microsoft – MVP for SQL Server
  – Google – GDE for Cloud Platform
  – 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
  – Pluralsight – Google Cloud Series
  – DevelopMentor – SQL Server 2012 Series
  – 2 books on SQL Server BI
  – Cloudera trainer (certified)
• Former MSFT FTE
  – 4 years
BigData Pipeline - STEP 1 – Acquire
Acquire → Process → Store → Query & Mine → Visualize
BigData = ‘Next State’ Questions
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?
Collecting Behavioral data
BigData Pipeline – STEP 2 - Process
Is Big Data = NoSQL and just Hadoop?
HUGE Hype factor since 2011
Apache Hadoop
• a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
Hadoop in the Enterprise
How you ‘get’ Hadoop
• Open source
  – roll your own
• Commercial distribution
  – Cloudera
  – MapR
  – Hortonworks
  – More…
• Rent it via the cloud
  – AWS
  – HDInsight
Demo - HDInsight
About Hadoop MapReduce
Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
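The map, shuffle and reduce phases can be sketched in plain Python. This is a single-process illustration of the pattern only, not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order over an iterable of text lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on a cluster.
    print(word_count(["big data is big", "data is data"]))
```

On a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; that is where the batch latency comes from.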
Demo - HDInsight – MapReduce w/Java
Working with Hadoop
Example Comparison: RDBMS vs. Hadoop
| | Traditional RDBMS | Hadoop / MapReduce |
|---|---|---|
| Data Size | Gigabytes (Terabytes) | Petabytes and greater |
| Access | Interactive and Batch | Batch – NOT Interactive |
| Updates | Read / Write many times | Write once, Read many times |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| Query Response Time | Can be near immediate | Has latency (due to batch processing) |
BigData Pipeline STEP 3 – Store
“Small” BigData vs. “Big” BigData
• On Premises: Hadoop, NoSQL, RDBMS
• In the Cloud: Hadoop, NoSQL, RDBMS
Cloud-hosted NoSQL up to 50x CHEAPER
So many NoSQL options
• More than just the Elephant in the room
• More than 120 types of NoSQL databases
Flavors of NoSQL
• Key/Value – Volatile
• Key/Value – Persistent
• Wide-Column
• Document
• Graph
Key / Value Database
• Just keys and values
  – No schema
• Persistent or Volatile
• Examples
  – AWS DynamoDB
  – Riak
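A minimal sketch of the volatile vs. persistent distinction, using only the Python standard library: a plain dict stands in for a volatile store and `shelve` for a persistent one. This is not the DynamoDB or Riak API; the helper names `put` and `get`, the keys, and the file path are hypothetical.

```python
import os
import shelve
import tempfile

# Volatile: an in-memory dict. No schema, and contents vanish with the process.
session_cache = {}
session_cache["user:42"] = {"name": "Ada", "cart": ["sku-1", "sku-7"]}
session_cache["hits"] = 3  # values need not share a shape


def put(path, key, value):
    """Persistent: write one key/value pair to an on-disk shelf."""
    with shelve.open(path) as store:
        store[key] = value


def get(path, key, default=None):
    """Persistent: read a value back by key, or return the default."""
    with shelve.open(path) as store:
        return store.get(key, default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "kv_demo")  # hypothetical file name
        put(path, "user:42", session_cache["user:42"])
        # The shelf was closed and reopened, so this value came from disk.
        print(get(path, "user:42"))
        print(get(path, "missing", default="not found"))
```

The only access path is the key: there is no query language and no joins, which is what makes this model easy to partition across many nodes.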
DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud
NoSQL BLOB Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
• Others
  – Dropbox
  – Box
  – More…
DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 / Glacier
Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
  – Cassandra
  – HBase w/Hadoop
  – BigTable
  – GAE HR DS
Types of Column Databases
• Column-families
  – Non-relational
  – Sparse
  – Examples:
    • HBase
    • Cassandra
    • xVelocity (SQL 2012 Tabular)
• Column-stores
  – Relational
  – Dense
  – Example:
    • SQL Server 2012 – Columnstore index
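The sparse vs. dense contrast can be illustrated with two in-memory layouts. This is a conceptual sketch only: it does not reflect how HBase, Cassandra or a SQL Server columnstore index store data on disk, and all names and values below are made up for illustration.

```python
# Column-family layout: each row key owns its own sparse set of columns.
# Rows need not share columns, and absent columns take no space.
column_family = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Lin", "twitter": "@lin", "last_login": "2013-07-01"},
}

# Column-store layout: one dense array per column, all of equal length,
# so position i across the arrays makes up row i of a relational table.
sales_columns = {
    "region": ["west", "east", "west", "east"],
    "units": [10, 4, 7, 12],
    "price": [2.5, 3.0, 2.5, 3.0],
}


def read_column(family, row_key, column, default=None):
    """Column-family read: look up one column of one row, if present."""
    return family.get(row_key, {}).get(column, default)


def total_units(columns, region):
    """Column-store aggregate: scan only the two columns the query needs."""
    return sum(
        units
        for row_region, units in zip(columns["region"], columns["units"])
        if row_region == region
    )


if __name__ == "__main__":
    print(read_column(column_family, "user:2", "twitter"))
    print(read_column(column_family, "user:1", "twitter", default="(none)"))
    print(total_units(sales_columns, "west"))
```

The aggregate never touches the `price` array, which is the intuition behind why column-stores suit analytic queries over a few columns of very wide tables.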
DEMO – SQL Server ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
Document Database (MongoDB)
• document-oriented (collection of JSON documents) w/semi-structured data
  – Encodings include BSON, JSON, XML…
• binary forms (PDF, Microsoft Office documents such as Word, Excel…)
• Examples:
  – MongoDB
  – Couchbase
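A rough sketch of what "a collection of JSON documents with semi-structured data" means in practice, using plain Python dicts and the standard `json` module. This is not MongoDB's query API; the collection contents and the helpers `find` and `find_with_tag` are hypothetical.

```python
import json

# A "collection": documents share some fields but not a fixed schema.
products = [
    {"_id": 1, "name": "Kettle", "price": 25, "tags": ["kitchen"]},
    {"_id": 2, "name": "Novel", "price": 12,
     "author": {"first": "Ann", "last": "Lee"}},  # nested sub-document
    {"_id": 3, "name": "Lamp", "price": 40, "tags": ["home", "lighting"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def find_with_tag(collection, tag):
    """Return documents whose optional 'tags' array contains the tag."""
    return [doc for doc in collection if tag in doc.get("tags", [])]


if __name__ == "__main__":
    # Each document serializes to JSON as-is, nested fields included.
    print(json.dumps(products[1]))
    print(find(products, price=40))
    print(find_with_tag(products, "home"))
```

Documents that lack a field (here, `tags` on the second document) simply do not match, where a relational table would need a nullable column or a separate join table.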
Demo - MongoDB
Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
  – Neo4j
  – Google Freebase
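Finding connections within a few hops is the kind of query that needs recursive self-joins in an RDBMS. As a sketch, a breadth-first traversal over an adjacency map does the same work; this is plain Python rather than Neo4j's Cypher, and the `follows` data and `within_hops` function are made up for illustration.

```python
from collections import deque

# Hypothetical "who follows whom" graph, stored as an adjacency map.
follows = {
    "ann": {"bob", "cy"},
    "bob": {"dee"},
    "cy": {"dee", "eve"},
    "dee": {"fay"},
    "eve": set(),
    "fay": set(),
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # report only the other nodes
    return distance


if __name__ == "__main__":
    # Everyone ann can reach in at most two hops.
    print(within_hops(follows, "ann", 2))
```

Each extra hop is one more loop iteration here; in SQL it is one more self-join on the relationship table.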
DEMO – Neo4j
“Small” BigData vs. “Big” BigData
• Hadoop
• Key/Value or Column
• Document or Graph
• RDBMS
On Premises or In the Cloud
Cloud-hosted RDBMS
• AWS RDS – SQL Server, MySQL, Oracle
  – Medium cost
  – Solid feature set, e.g. backup, snapshot
  – Use existing tooling
• Google – MySQL
  – Lowest cost
  – Most limited RDBMS functionality
• Microsoft – SQL Azure
  – Highest cost
DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
NoSQL Applied
Use cases:
• Social Games
• Product Catalogs
• Social aggregators
• Log Files
• Line-of-Business
Data stores:
• Columnstore – HBase
• Key/Value – DynamoDB
• Document – MongoDB
• Graph – Neo4j
• RDBMS – SQL Server
Cloud Offerings– RDBMS AND NoSQL
| | AWS | Google | Microsoft |
|---|---|---|---|
| RDBMS | RDS – all major | MySQL | SQL Azure |
| NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Blobs |
| NoSQL Key-Value | DynamoDB | H/R Data on GAE | Azure Tables |
| Streaming ML (or Mahout) | Custom EC2 | Prospective Search & Prediction API | StreamInsight |
| NoSQL Document or Graph | MongoDB on EC2 | Freebase | MongoDB on Windows Azure |
| NoSQL – Column Hadoop (HBase) | Elastic MapReduce using S3 & EC2 | none | HDInsight |
| Dremel/Warehousing | Redshift | BigQuery | none |
BigData Pipeline STEP 4 – Query
Always MapReduce?
Can Excel help?
• Connector to Hadoop
• Data Explorer
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Visualize with PowerView
• Data Mining w/Predixion
Demo - Hadoop Connector to Excel
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for read
Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
Collecting BigData
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data Standards
• M2M
• Public Datasets
  – Freebase
  – Azure DataMarket
  – Hilary Mason’s list
NoSQL To-Do List
Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mash up cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
• Windows Azure Data Market, other public data markets
www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe, Teach a Kid (Ages 10+)
• VOTE at http://www.azureDevs.com, CONFIRM via email and SHARE (tweet)
Keep Learning
• Twitter: @LynnLangit
• YouTube: http://www.youtube.com/user/SoCalDevGal
• Hire me
  – To help build your BI/Big Data solution
  – To teach your team next gen BI
  – To learn more about using NoSQL solutions