Operational Landscape
More Data
• 4.4ZB in 2015, 44ZB in 2020
• New streams: IoT, mobile, social
• 85% managed by enterprises
• 1.5% is ‘high value’
* Source: EMC Digital Universe
Less Cost
• 1980: >$400k/GB, 2015: ~$0.03/GB
• [Azure] 2010: $0.15, 2015: $0.02 - $0.06
• Similar story for compute
Profit?
• Diverse Data Streams
• New Analytics
• More People
• At The Right Time
$
Technical Landscape
Network
• On Premises: fast & reliable network; switch for app + db?
• Cloud: same D/C; less reliable, more hops and load balancers = latency
Storage
• On Premises: big, fast SAN
• Cloud: big, cheap JBOD
Hardware
• On Premises: specific to role
• Cloud: generic (no custom SKUs)
Availability
• On Premises: managed servicing, low failures
• Cloud: unexpected services & failures
Purchasing
• On Premises: upfront capex: overprovision
• Cloud: opex (add/remove on demand)
Licensing
• On Premises: per processor, per year
• Cloud: per minute/hour
Result
• On Premises: *everything* goes in database
• Cloud: *everything* as-a-service
Technical Landscape: Azure
[Diagram: early Azure services by category]
• Data & Storage: SQL Database; Storage (Blobs, Tables, Queues)
• Compute: Cloud Services
Technical Landscape: Azure
[Diagram: current Azure services by category]
• Compute: Virtual Machines, Cloud Services, Batch, Service Fabric, RemoteApp
• Web & Mobile: App Service (Web App, Mobile App, API App, Logic App), API Management, Notification Hubs, Mobile Engagement
• Data & Storage: SQL Database, DocumentDB, Redis Cache, StorSimple, Search, Storage (Blobs, Tables, Queues, Files), Data Warehouse
• Analytics & IoT: HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Data Lake
• Networking: Virtual Network, ExpressRoute, Traffic Manager, DNS, Application Gateway
• Media & CDN: Media Services, CDN
• Hybrid Integration: BizTalk Services, Service Bus, Backup, Site Recovery
• Identity & Access: Azure Active Directory, Multi-Factor Authentication
• Developer Services: Visual Studio Online, Application Insights
• Management: Scheduler, Automation, Operational Insights, Key Vault
• Marketplace: …
Rule #1: if it isn’t relational data, don’t use a relational database
* But there’s a decent chance it is…
Relational databases
IaaS: SQL Server in VM
• 18 months ago: features & performance
• Now: existing apps (‘lift & shift’); control over machine, software, network topology, and storage; extended feature set (RS, AS, IS, Hekaton)
• Recent: templates for AlwaysOn
PaaS: Azure SQL
• 18 months ago: good luck
• Now: default choice
• Recent: SLA, replicas, Transparent Data Encryption, full text search, more query feature support, bigger perf + storage SKUs
Caching
Use Cases
• Protects critical/hard to scale assets
• Cache Aside Pattern
• Cache Seeding Pattern
• In-memory databases: sets, batches, transactions
• In-memory messaging: pub-sub
Azure Redis Cache
• Provides Redis-as-a-service (with optional redundancy)
• Consider: default caching option; look beyond key/value capabilities
Azure Role Caching
• Provides in-memory caching as part of Cloud Service deployment
• Consider: if you’re using Cloud Services, have spare memory, need super low latency
Deploy your own
• E.g. Azure VM + memcached, Redis w persistence, etc.
• Consider: legacy scenarios, sophisticated
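Sketch: the Cache Aside Pattern listed under the caching use cases, in plain Python. This is an illustration under stated assumptions, not the Redis or Azure Role Caching API: the `CacheAside` class, the `load_part` loader, and the TTL value are all hypothetical, and an in-process dictionary stands in for the cache service.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the backing store
    and populate the cache so later reads skip the store."""

    def __init__(self, load_from_store: Callable[[str], str], ttl_seconds: float = 60.0):
        self._load = load_from_store  # hypothetical loader for the hard-to-scale asset
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, now: Optional[float] = None) -> str:
        # `now` can be injected so the example is deterministic.
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired: go to the store, then seed the cache.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the store so readers do not see stale data.
        self._entries.pop(key, None)


store_reads = []  # records every trip to the (simulated) database


def load_part(part_id: str) -> str:
    store_reads.append(part_id)
    return f"part-{part_id}"


cache = CacheAside(load_part, ttl_seconds=30.0)
first = cache.get("42", now=0.0)    # miss: loads from the store
second = cache.get("42", now=10.0)  # hit: served from the cache
third = cache.get("42", now=31.0)   # entry expired: loads again
```

The store is read twice for three requests; in a real deployment the dictionary would be replaced by calls to the chosen cache service.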
Messaging
Use Cases
• Protects critical/hard to scale assets
• Supports load levelling pattern
• Doesn’t always mean slow or fire-and-forget
Azure Queues
• Up to 2000 msgs/sec (1KB each); max 64 KB msg size; max total 200TB; 10ms latency
• Scale out (i.e. more queues side-by-side)
• Consider: modest requirements
Azure Service Bus
• Up to 2000 msgs/sec (1KB each); max 256 KB msg size; max total 80GB; 20-25ms latency
• Scale out through partitioning (up to 16)
• Consider: features, features, features
Deploy your own
• E.g. Azure VM + RabbitMQ
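Sketch: the load levelling pattern from the messaging use cases, simulated in Python. A queue absorbs bursts so the protected back end only ever sees a fixed rate. The function name, the tick-based model, and the numbers are assumptions for illustration; an in-memory deque stands in for Azure Queues or Service Bus.

```python
from collections import deque
from typing import Deque, Dict, List


def simulate_load_levelling(arrivals_per_tick: List[int], capacity_per_tick: int) -> Dict[str, object]:
    """Producers enqueue in bursts; the consumer drains at most
    `capacity_per_tick` messages per tick, whatever the burst size."""
    queue: Deque[int] = deque()
    processed_per_tick: List[int] = []
    max_backlog = 0
    next_id = 0
    tick = 0
    # Keep ticking until all arrivals are in and the backlog is drained.
    while tick < len(arrivals_per_tick) or queue:
        arriving = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arriving):
            queue.append(next_id)
            next_id += 1
        max_backlog = max(max_backlog, len(queue))
        handled = 0
        while queue and handled < capacity_per_tick:
            queue.popleft()
            handled += 1
        processed_per_tick.append(handled)
        tick += 1
    return {
        "processed_per_tick": processed_per_tick,
        "max_backlog": max_backlog,
        "total_processed": sum(processed_per_tick),
    }


# A burst of 10, a quiet period, then a burst of 5; the back end handles 3 per tick.
result = simulate_load_levelling([10, 0, 0, 5, 0], capacity_per_tick=3)
```

The back end never sees more than 3 messages per tick even though the peak arrival is 10; the cost is queueing delay, which is why the slide notes that messaging doesn’t always mean slow.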
Bonus Rule: choose the simplest thing that could possibly work
Scenario: Parts Unlimited
E-Commerce
• Online auto parts retailer
Cloud-based
• Hosted on Azure Websites
• Runs on ASP.NET MVC
SQL Server powered
• All the data is in the database
Scaling
Scale Up – aka Vertical Scaling
• Increase resource capacity within existing node
Scale Out – aka Horizontal Scaling
• Increase resource capacity by adding nodes
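Distributed Systems and the CAP Theorem
[Diagram: triangle of Consistency, Availability, Partition Tolerance; systems sit on the edge between the two properties they favour]
• Consistency + Availability: relational, un-partitioned
• Availability + Partition Tolerance: Dynamo-like (Cassandra, CouchDB)
• Consistency + Partition Tolerance: Big Table-like (HBase, MongoDB)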
Sharding
[Diagram labels] (1) Database, (2) Sharding Key, (3) Shard Map Manager, (4) Shard, (5) Shard Set, (6) Sharded Table, (7) Reference Table, (8) Shardlet

Customer Table
Customer ID    Name
1              Alice
2              Bob

Data Center Table
Data Center ID    DC Name
1                 Boston
2                 Miami
Elastic Databases
Client & Tools
• Shard map management, data-dependent routing, multi-shard query
• Shard on range (contiguous) or list (explicit) across INT / BIGINT / GUID / VARBINARY
• Tools include support for split-merge
Pools
• Default: DTUs allocated per database
• Pool: DTUs allocated/shared across db group
Jobs
• Administrative ops across multiple databases
Query
• Transact-SQL across multiple databases (enables reporting scenario)
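Sketch: range-based shard map management, data-dependent routing, and a multi-shard query, as named in the Client & Tools bullets. This is a conceptual Python model only, not the Elastic Database client library: `RangeShardMap`, the shard names, and the helper functions are hypothetical, and dictionaries stand in for the shard databases.

```python
import bisect
from typing import Dict, List, Tuple


class RangeShardMap:
    """Maps contiguous, half-open key ranges [low, high) to shard names."""

    def __init__(self) -> None:
        self._lows: List[int] = []                    # sorted range starts
        self._ranges: List[Tuple[int, int, str]] = []  # (low, high, shard)

    def add_range(self, low: int, high: int, shard: str) -> None:
        if low >= high:
            raise ValueError("range must be non-empty")
        index = bisect.bisect_left(self._lows, low)
        # Reject overlap with the neighbouring ranges.
        if index > 0 and self._ranges[index - 1][1] > low:
            raise ValueError("overlaps previous range")
        if index < len(self._ranges) and self._ranges[index][0] < high:
            raise ValueError("overlaps next range")
        self._lows.insert(index, low)
        self._ranges.insert(index, (low, high, shard))

    def shard_for(self, key: int) -> str:
        # Find the last range starting at or before the key.
        index = bisect.bisect_right(self._lows, key) - 1
        if index >= 0:
            low, high, shard = self._ranges[index]
            if low <= key < high:
                return shard
        raise KeyError(f"no shard mapped for key {key}")


shard_map = RangeShardMap()
shard_map.add_range(0, 1000, "shard-0")
shard_map.add_range(1000, 2000, "shard-1")

# Each shard holds its slice of the sharded Customer table.
shards: Dict[str, Dict[int, str]] = {"shard-0": {}, "shard-1": {}}


def insert_customer(customer_id: int, name: str) -> None:
    # Data-dependent routing: the sharding key picks the database.
    shards[shard_map.shard_for(customer_id)][customer_id] = name


def find_customer(customer_id: int) -> str:
    return shards[shard_map.shard_for(customer_id)][customer_id]


def all_customer_names() -> List[str]:
    # Multi-shard query: fan out to every shard and merge the results.
    return sorted(name for rows in shards.values() for name in rows.values())


insert_customer(1, "Alice")
insert_customer(1500, "Bob")
```

Single-key operations touch one shard; only the fan-out query pays for visiting them all, which is why the choice of sharding key matters.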
Polyglot persistence
• Optimized for data
• Optimized for workload
Not a new concept
• EAV
• XML
• Architecture paradigm: OLAP/DW and OLTP
The And of Data Persistence
fully featured RDBMS
transactional processing
rich query
managed as a service
elastic scale
internet accessible http/rest
schema-free data model
arbitrary data formats
Beyond relational: storage
Blobs & Files
Use Cases
• Static content (e.g. images, video, scripts, binaries, backups, …)
• Relational/NoSQL databases typically not optimized for storage of large binary objects
Azure Blobs
• Block blobs: sequential access files, <195GB
• Page blobs: random access drives, <1TB
• Web enabled: REST API, SAS keys, headers
• Supercharge with Azure CDN
• Consider: any static content
Azure Files
• Provides SMB 2.1 compatible file share
• Files up to 1TB in size
• Consider: legacy app storage, i.e. if SMB is actually the simplest thing that could possibly work
Deploy your own
• E.g. Azure VM + Windows File Share
Key-Value
Use Cases
• Persistent dictionaries, lookup based on key
• High volume structured writes (e.g. logs)
Azure Tables
• Scalable, fast lookups based on PartitionKey + RowKey
• More partitions/shards, more scale, more miserable scans
• Still waiting for secondary indexes…
• Consider: cheap
HBase
• Available as HDInsight managed service
• Massive tables (billions of rows, millions of columns), scalable, consistent
• Consider: significant data, richer feature set
Deploy your own
• E.g. Azure VM + Riak / Redis / Kai / …
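Sketch: why PartitionKey + RowKey lookups are fast and scans are miserable, modelled in Python. This is an in-memory stand-in, not the Azure Tables API: `PartitionedTable`, the date-based partition keys, and the `rows_examined` counter are assumptions made to show the cost difference.

```python
from typing import Callable, Dict, List

Entity = Dict[str, str]


class PartitionedTable:
    """Entities are addressed by (partition key, row key); anything that
    filters on another property has to visit every row."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, Entity]] = {}
        self.rows_examined = 0  # crude cost counter for the comparison

    def upsert(self, partition_key: str, row_key: str, entity: Entity) -> None:
        self._partitions.setdefault(partition_key, {})[row_key] = entity

    def point_lookup(self, partition_key: str, row_key: str) -> Entity:
        # Both keys known: go straight to one row in one partition.
        self.rows_examined += 1
        return self._partitions[partition_key][row_key]

    def scan(self, predicate: Callable[[Entity], bool]) -> List[Entity]:
        # No secondary index: every partition and every row is touched.
        matches: List[Entity] = []
        for rows in self._partitions.values():
            for entity in rows.values():
                self.rows_examined += 1
                if predicate(entity):
                    matches.append(entity)
        return matches


# Log entries partitioned by day, with a zero-padded sequence as row key.
table = PartitionedTable()
for day in ("2015-09-01", "2015-09-02", "2015-09-03"):
    for seq in range(100):
        level = "error" if seq % 50 == 0 else "info"
        table.upsert(day, f"{seq:04d}", {"level": level})

hit = table.point_lookup("2015-09-02", "0050")
lookup_cost = table.rows_examined

table.rows_examined = 0
errors = table.scan(lambda entity: entity["level"] == "error")
scan_cost = table.rows_examined
```

The point lookup examines 1 row; finding the 6 error entries by property examines all 300, and that cost grows with every partition added.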
Schema-less
Use Cases
• Not all data fits a relational model: complex object hierarchies, volatile schemas, high throughput
• Another approach for partition tolerance
Azure DocumentDB
• Schema-free, transaction aware, indexed, tuneable, queryable service
• Review features: Order By (July), Id-based Routing (Aug), Geospatial (Aug)
• Consider: default option
MongoDB
• Mature OSS document store
• Available in Azure Marketplace through MongoLab
• Consider: more mature feature set
Deploy your own
• E.g. Azure VM + Cassandra
State
Use Cases
• Typical 3-tier model breaks with highly stateful workloads
• Bring state into service tier to reduce latency and improve partitioning
• In-memory redundant data storage
• E.g. VOIP services, multiplayer game state
Azure Service Fabric
• PaaS V2 (i.e. Cloud Services) in private preview with optional actor-based programming model
• Powers core infrastructure, Azure SQL, Azure DocDB, Event Hubs, Bing Cortana, Intune…
• Consider: soon for specific workloads
Orleans
• Software-based actor programming model with native Azure support
• Powers Halo multiplayer and other services
• Consider: non-Azure or immediate
Deploy your own
• E.g. Azure VM + SQL 2016 in-memory tables (data remains in data tier)
Bonus Rule: if you need Service Fabric, you’ll know about Service Fabric
Services
Use Cases
• Beyond the general purpose data storage technologies exist many data services designed for specialized data operations
• On premises: use existing database capacity vs. Cloud: either pay for more database or pay for optimized service
Azure Search
• Powered by ElasticSearch + Microsoft Natural Language stack (used by Bing, Office)
• Consider: default choice
Application Insights
• Supports meaningful near real-time analysis
• Logging to files/tables great for ad-hoc analysis but a poor option for monitoring
• Multiple tiers depending on # of events
• Consider: default choice
Deploy your own
• E.g. Azure VM + ElasticSearch
• Consider: platform gaps in feature set, pricing model, tuning model
[Diagram: event processing pipeline, left to right]
• Event producers: Applications, Web and Social, Devices, Sensors
• Collection: Field gateways, Cloud gateways (web APIs)
• Event Queuing System: Event Hubs, Kafka / RabbitMQ / ActiveMQ
• Transformation: Stream processing (Azure Stream Analytics, Apache Storm on HDInsight), Storage adapters
• Long-term storage: HDFS, Azure DB, DocumentDB, HBase, Azure storage
• Presentation and action: Search and query, Data analytics (Excel), Web/thick client dashboards, Live Dashboards, Devices to take action
Step 1: Ingest
Use Cases
• IoT/click streams/logging: velocity of data isn’t suitable for storage before analysis
• Real-time eventing: business value decays as time goes on
• Solutions need to be fast, scalable, fault tolerant, and reliable
Azure Event Hubs
• 30bn+ events per day
• Scale out (up to 32 partitions, 20+ MB/s ingress)
• Low latency, AMQP support
• Consider: default choice
Deploy your own
• E.g. Azure VM + Kafka
• Consider: protocol support
Step 2: Process
Use Cases
• IoT/click streams/logging: velocity of data isn’t suitable for storage before analysis
• Real-time eventing: business value decays as time goes on
• Solutions need to be fast, scalable, fault tolerant, and reliable
Stream Analytics
• Managed service with Event Hubs support
• Consider: fixed query language, fixed scale
Storm
• Available as HDInsight managed service
• Record-at-a-time (procedural) or micro-batches (w. Trident)
• Consider: custom logic, large scale
Spark Streaming
• Available as HDInsight managed service
• In-memory, micro-batches (functional)
Service Fabric
Deploy your own
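Sketch: the kind of windowed aggregation a stream processor performs before anything is stored, written in plain Python. This is not Stream Analytics query syntax or a Storm/Spark API; `tumbling_window_counts`, the event shape (timestamp in seconds, key), and the 10-second window are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event key), hypothetical shape


def tumbling_window_counts(events: Iterable[Event], window_seconds: int) -> Dict[int, Dict[str, int]]:
    """Group events into fixed, non-overlapping windows and count per key.
    Each event is handled once as it arrives (record-at-a-time)."""
    windows: DefaultDict[int, Counter] = defaultdict(Counter)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


clicks = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (19, "b"), (21, "a")]
counts = tumbling_window_counts(clicks, window_seconds=10)
```

Only the per-window totals need to be written downstream, which is how processing before storage copes with data arriving faster than it can be persisted.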
Step 3: Store
Use Cases
• Business value may decay post real-time analysis but some opportunities may lie in historical review
* Actual mechanics of dumping to cold storage aren’t hard: just use a WebJob/Worker
As previously discussed
• Azure Storage, Azure SQL, SQL Server, DocumentDB, HDInsight
Azure Data Lake
• Highly scalable HDFS service
• Removes Azure Storage account limits around throughput and size
• Consider: Hadoop-based workloads
Azure Data Warehouse
• Highly scalable SQL Server based on PDW
• On-demand scale up/down (even off)
• Consider: SQL-based workloads
Patterns
Transient: Retry
• Commonplace; micro-outages
• Pattern depends on interactive vs. batch, recoverable vs. non-recoverable errors
• E.g. internal load balancing, managed failure
Enduring: Circuit Breaker
• Not uncommon; outages between sec - min
• Aggressive retry can make outage worse; instead, manage the failure
Critical: Failover
• Occasional; recovery not possible within SLA
• Traditional DR approach with secondary site
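Sketch: the Retry and Circuit Breaker patterns above, in Python. A minimal illustration under assumptions, not a production library: the exception types, thresholds, delays, and the injected `sleep`/`clock` functions are all hypothetical choices made so the example runs deterministically.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """A failure that is expected to clear by itself (micro-outage)."""


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


def retry(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")


class CircuitBreaker:
    """After repeated failures, fail fast instead of hammering the service."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result


# Retry: two micro-outages, then success.
delays = []
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("micro-outage")
    return "ok"


retry_result = retry(flaky, attempts=5, base_delay=0.1, sleep=delays.append)

# Circuit breaker: an enduring outage, observed with a fake clock.
now = {"t": 0.0}
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0, clock=lambda: now["t"])


def down() -> str:
    raise TransientError("outage")


outcomes = []
for _ in range(3):
    try:
        breaker.call(down)
    except CircuitOpenError:
        outcomes.append("rejected")
    except TransientError:
        outcomes.append("failed")
now["t"] = 31.0
outcomes.append(breaker.call(lambda: "recovered"))
```

The third call is rejected without touching the failing service, which is the "manage the failure" behaviour the slide contrasts with aggressive retry.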
Patterns
Prefer chunky over chatty
• Network, security, and serialization overheads add up quickly
• Especially damaging when holding resources in a transaction
Prefer optimistic concurrency
• If data contention is not high
• Pessimistic concurrency locks resources before updating, increasing system load
• Optimistic concurrency validates resource hasn’t changed as part of update statement
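Sketch: optimistic concurrency where the check that the resource hasn’t changed is part of the update statement itself. The example uses Python's built-in sqlite3 with an in-memory database; the `parts` table, its `version` column, and the helper names are assumptions for illustration (SQL Server would typically use a rowversion column for the same purpose).

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO parts (id, stock, version) VALUES (1, 10, 1)")
conn.commit()


def read_part(connection: sqlite3.Connection, part_id: int) -> Tuple[int, int]:
    """Return (stock, version); no lock is held after the read."""
    row = connection.execute(
        "SELECT stock, version FROM parts WHERE id = ?", (part_id,)
    ).fetchone()
    return (row[0], row[1])


def update_stock(connection: sqlite3.Connection, part_id: int,
                 new_stock: int, expected_version: int) -> bool:
    """Apply the update only if nobody changed the row since it was read."""
    cursor = connection.execute(
        "UPDATE parts SET stock = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_stock, part_id, expected_version),
    )
    connection.commit()
    # Zero rows updated means the version moved on: the caller lost the race.
    return cursor.rowcount == 1


# Two writers read the same version, then both try to update.
stock_a, version_a = read_part(conn, 1)
stock_b, version_b = read_part(conn, 1)
first_writer = update_stock(conn, 1, stock_a - 1, version_a)
second_writer = update_stock(conn, 1, stock_b - 2, version_b)
final_state = read_part(conn, 1)
```

The second writer is told its data was stale and can re-read and retry; nothing was locked between the read and the write.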
Patterns
Isolate Implementations
• Eventually you will find a way to break your data platform of choice
• Confine interactions to a single module; don’t leak implementation details or allow implicit dependencies
Compensating Transactions
• Distributed transactions aren’t available
• Compensating transactions don’t reset system state; instead, they perform opposite actions
• The more data services you use, the more likely this situation will arise
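Sketch: compensating transactions across several data services, in Python. Each step is paired with an opposite action, and on failure the completed steps are compensated in reverse order. `run_with_compensation`, the order workflow, and the in-memory `stock` and `ledger` stand-ins are hypothetical; a real implementation would also need durable logging and idempotent steps.

```python
from typing import Callable, Dict, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; if one fails, undo the completed ones by
    performing their opposite actions, newest first."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as error:
            log.append(f"{name} failed: {error}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated {done_name}")
            return log
        completed.append((name, compensate))
        log.append(f"{name} done")
    return log


# Stand-ins for two separate data services plus a failing third one.
stock: Dict[str, int] = {"brake-pad": 5}
ledger: List[Tuple[str, int]] = []


def reserve() -> None:
    stock["brake-pad"] -= 1


def release() -> None:
    stock["brake-pad"] += 1


def charge() -> None:
    ledger.append(("charge", 40))


def refund() -> None:
    # The charge is not erased; an opposite entry is recorded.
    ledger.append(("refund", 40))


def ship() -> None:
    raise RuntimeError("carrier unavailable")


def cancel_shipment() -> None:
    pass


log = run_with_compensation([
    ("reserve", reserve, release),
    ("charge", charge, refund),
    ("ship", ship, cancel_shipment),
])
```

The ledger ends with both a charge and a refund rather than being rolled back to empty, which is the difference from resetting system state that the slide points out.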
Conclusion
Design for multiple storage options
• Cloud solutions typically don’t use just one persistence mechanism
• Embrace the cloud marketplace to get the right solution just in time
Design for balanced capex/opex
• Balance cost to develop and cost to run
• Leave yourself an escape hatch in case of future success
Design for failure
• More services == more risk of an outage
• Build your service to degrade gracefully
Related Ignite NZ Sessions
(1) Advanced Messaging Scenarios with Azure Service Bus Messaging, Fri 11:55am
(2) How to Build High Performance Apps Using Microsoft Azure Redis Cache, Thu 1:55pm
(3) Elastic for SQL – shards, pools, stretch, Fri 11:55am
(4) In-Memory OLTP: The Road Ahead, Wed 11:55am
(5) Azure Storage Architecture and getting the most out of IaaS Premium storage, Thu 10:40am
(6) Building highly available and recoverable solutions with Azure Event Hubs and Service Bus Messaging, Thu 3:10pm
Find me later at… Hub Happy Hour Wed 5:30-6:30pm
Resources
TechNet & MSDN Flash: subscribe to our fortnightly newsletter
http://aka.ms/technetnz http://aka.ms/msdnnz
Sessions on Demand
http://aka.ms/ch9nz
Microsoft Virtual Academy: free online learning
http://aka.ms/mva