Bigger, Better, Faster, More

An Introduction to Super-Scalability

But first… THE ARMS RACE

1 ENIAC 1 Teletype

1 Mainframe N Terminals

N Servers N Terminals

N Servers N PCs

N Web Servers N Browsers

N Web Servers N AJAX Apps

N Clusters N AJAX Apps

N Clusters N*M Phones

N Cloudlets N*M Phones

And So On…

What is Scalability?

Scalability = Ability to do More

More What?

More Processing

Processing Takes Resources

Types of Resources

CPU Disk Memory Network

Types of Utilization

Time / Throughput

Space / Capacity

Complexity

Locking

Resources & Utilization

We Want More! (but how to scale?)

How to Scale

Just make it bigger (vertical scaling)

We Want Even More! (super-scalability)

Scaling Strategies

Space: Bigger

Complexity: Better

Time: Faster

Locking: More

Bigger (Space)

Not Super
• One big data store
• One big memory store
• Make it bigger
• Make it redundant
• E.g. Full activity logging

Partitioning
• Sharding / Hashing
• Growth = Add Partition
• Tradeoff: Splitting Partitions
• Tradeoff: Redundancy becomes a distribution problem
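
The "Splitting Partitions" tradeoff can be made concrete with a minimal sketch. It assumes the simplest possible scheme, a stable hash taken modulo the shard count; the function names and the `user-N` keys are hypothetical and not from the original slides.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose shard changes when the shard count changes."""
    return [k for k in keys if shard_for(k, old_count) != shard_for(k, new_count)]


# Hypothetical keys, for illustration only.
keys = [f"user-{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{len(moved)} of {len(keys)} keys move when growing from 4 to 5 shards")
```

With plain modulo hashing, adding one partition relocates most keys, which is why growth is cheap to describe but expensive to carry out. Consistent hashing is one common way to reduce that movement.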

[Diagram: partitions A, B, C, …]

Better (Complexity)

Not Super
• Number of objects increase
• As relations increase, add time or space requirements
• Common with graph problems
• E.g. PageRank

Distribution
• Chop up problem / workload
• Map/Reduce
• Tradeoff: coordination
• Tradeoff: network
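
As an illustration of chopping up a workload, here is a small Map/Reduce-style word count. It is a sketch only: it runs the map step on local threads rather than on separate machines, and the function names and sample lines are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    # Chop the workload into chunks, map them in parallel, then reduce the partials.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


lines = ["bigger better faster more", "more is better", "faster faster"]
totals = word_count(lines)
print(totals.most_common(3))
```

The coordination tradeoff shows up in `word_count`: something has to split the input, collect the partial results, and merge them. On a real cluster the partial results also cross the network.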

Faster (Time)

Not Super
• Tune your code
• Tune your database
• Tune your network
• Better hardware

Optimization
• As fast as possible

• Can’t scale as fast as growth
• Specialization – ONE thing
• Caching – Reduces work in trade for space
• Tradeoff: space
• Tradeoff: coordination
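
The caching bullet can be sketched with Python's standard `functools.lru_cache`. The `expensive_lookup` function and its workload are hypothetical stand-ins for slow work such as a database query.

```python
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=1024)  # maxsize bounds the space traded for time
def expensive_lookup(item_id: int) -> int:
    """Hypothetical stand-in for slow work."""
    global calls
    calls += 1
    return item_id * item_id


for item_id in [1, 2, 1, 1, 2, 3]:
    expensive_lookup(item_id)

info = expensive_lookup.cache_info()
print(f"slow path ran {calls} times; hits={info.hits}, misses={info.misses}")
```

Six requests cost only three runs of the slow path. The space tradeoff is the cache itself. The coordination tradeoff appears as soon as the underlying data can change, because cached entries then have to be invalidated or allowed to go stale.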

More (Locking)

Not Super
• One at a time
• Serialized access

Parallelizing / Estimating
• Separate reads & writes
• Non-locking estimation
• Reduce contention
• Tradeoff: space
• Tradeoff: coordination
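
One way to picture non-locking estimation is a striped counter, sketched below. The slides do not name this technique, so treat it as one illustrative reading: every worker writes only its own slot, and readers sum the slots without taking a lock. The class and function names are hypothetical.

```python
import threading


class StripedCounter:
    """One slot per worker: each worker writes only its own slot, so no lock is needed."""

    def __init__(self, num_workers: int):
        self.slots = [0] * num_workers  # space traded for reduced contention

    def increment(self, worker_id: int) -> None:
        self.slots[worker_id] += 1  # single writer per slot

    def estimate(self) -> int:
        # Lock-free read: may lag behind increments that are still in flight.
        return sum(self.slots)


def run(counter: StripedCounter, worker_id: int, n: int) -> None:
    for _ in range(n):
        counter.increment(worker_id)


counter = StripedCounter(num_workers=4)
threads = [threading.Thread(target=run, args=(counter, i, 10_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.estimate()
print(total)
```

While workers are running, `estimate()` is only approximate; once they have all finished it is exact. The cost is one slot per worker plus the work of combining them on every read.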

But Which Technique(s)?

It Depends!

All: Divide & Conquer

Partitions: Data & Processing
• Sharding
• Worker Processes

Coordination: Distribution & Ordering
• Queues & Managers
• Separate Read/Write Access
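
A queue feeding a pool of workers is perhaps the smallest example of this shape. The sketch below uses threads from the standard library in place of separate worker processes or machines; the task (squaring numbers) and all names are placeholders.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Pull tasks until a None sentinel arrives, pushing (input, output) pairs."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.put((item, item * item))
        tasks.task_done()


tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for w in workers:
    w.start()

for n in range(10):  # the "manager" distributes work through the queue
    tasks.put(n)
for _ in workers:  # one sentinel per worker
    tasks.put(None)

tasks.join()
for w in workers:
    w.join()

squares = dict(results.get() for _ in range(10))
print(squares)
```

Results arrive in whatever order the workers finish, so each result carries its input with it. Restoring order is part of the coordination cost.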

What does this make the system look like?

And now… SOME THEORY

ACID: reliable transaction systems

• Atomicity – all or nothing
• Consistency – always correct
• Isolation – changesets executed independently
• Durability – once committed, stays so
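
Atomicity is the easiest of the four to demonstrate. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table and the `transfer` function are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; any failure rolls back both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob first and then fails the balance check on alice. Because both updates sit in one transaction, the credit is undone as well: all or nothing.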

Really hard to scale in one big block (although SSDs + RAM help!)

Maybe It’s Not so Important? (it depends)

BASE is easier

• Basically Available
• Soft State
• Eventual Consistency

• A node will either eventually get a change or retire
• Well…still need conflict resolution

BASE is NOT ACID (get it?)

Can we have a Balanced pH?

CAP Theorem

Choose TWO:
• Consistency
• Availability
• Partition tolerance

[Diagram: Manager, Replica 1, Replica 2, Client 1, Client 2; callout: "Double Outage!"]

Designing a scalable system

It Depends!

Understand Your Scale Points

• Log
• Profile
• Tune
• Test
• Divide
• Compare
• Partition
• No, really, log a lot

Fallacies of Distributed Computing

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
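
Code that takes the first fallacy seriously usually plans for failed calls. Below is a minimal sketch of retrying with exponential backoff against a simulated flaky service. `FlakyService` and `call_with_retry` are hypothetical names, and the failure is simulated rather than being a real network error.

```python
import time


class FlakyService:
    """Hypothetical stand-in for a remote call that fails the first `failures` times."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network failure")
        return "ok"


def call_with_retry(fn, attempts=4, base_delay=0.01):
    """Retry fn on ConnectionError with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


service = FlakyService(failures=2)
result = call_with_retry(service.fetch)
print(result, "after", service.calls, "calls")
```

Retries are only safe when the remote operation can be repeated without harm, which is a design decision in its own right.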

And now… SOME “SCALY” TOOLS

CQRS Pattern

Separate operations for:
• Command – perform an action
• Query – returns data about state

• Promotes simpler programs
• Allows Command Queues
• Reduces locking
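
A toy version of the pattern, with a command queue on the write side and a read-only query on the other, might look like the sketch below. `AddLike` and `LikeStore` are invented for the illustration and do not come from any CQRS framework.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AddLike:
    """Command: asks for an action to be performed, returns no data."""
    user_id: str
    item_id: str


class LikeStore:
    def __init__(self):
        self._commands = deque()   # command queue: writes are applied one at a time
        self._likes_by_user = {}   # read model served by queries

    # Command side
    def submit(self, command: AddLike) -> None:
        self._commands.append(command)

    def process_pending(self) -> int:
        applied = 0
        while self._commands:
            cmd = self._commands.popleft()
            self._likes_by_user.setdefault(cmd.user_id, set()).add(cmd.item_id)
            applied += 1
        return applied

    # Query side
    def items_liked_by(self, user_id: str) -> frozenset:
        """Query: returns data about state and changes nothing."""
        return frozenset(self._likes_by_user.get(user_id, ()))


store = LikeStore()
store.submit(AddLike("u1", "i1"))
store.submit(AddLike("u1", "i2"))
before = store.items_liked_by("u1")  # commands are queued but not yet applied
store.process_pending()
after = store.items_liked_by("u1")
print(sorted(before), sorted(after))
```

Because only `process_pending` mutates state, writes are serialized through the queue and queries never need to block on them. The price is that a query may briefly see state from before the queued commands were applied.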

A Scaly Stack

SaaS
• Applications

PaaS
• Storage
• Identity
• Runtime
• Queue / Bus

IaaS
• Compute
• Block Data
• Network

Infrastructure as a Service

Component    Example
Compute      Amazon EC2, Azure Web/Worker Roles
Storage      Amazon S3, Azure TableStore
Network      Any CDN

Platform as a Service

Component    Example
Database     SQL Azure, Postgres, MySQL
NoSQL        Cassandra, Redis, BigTable, MongoDB
Cache        Memcache
Queue        Azure Service Bus
Processing   Hadoop, Storm

Application as a Service

Salesforce? (Also sort of a platform)

Whateva!

AN EXAMPLE: Cassandra

Cassandra

• A “scalable” key-value store
• Automatic partitioning
• Automatic replicas

Cassandra Data Model

So All is Good, Right?

A RELATIONAL EXAMPLE: Worse than SQL Tuning?

Our Database

Know your Access Patterns

• Get user by user id
• Get item by item id
• Get all the items that a particular user likes
• Get all the users who like a particular item

Cassandra Model #1: Relational-y

Can’t get all the items that a particular user likes (without a giant scan)

Cassandra Model #2: Indexes

N-M relationship is modeled with two tables. But Properties require secondary lookups.
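
The two-table idea can be sketched with plain Python dictionaries standing in for Cassandra tables. This is not Cassandra client code, and the user and item records are invented sample data.

```python
from collections import defaultdict

# Entity tables, keyed by id (hypothetical sample data).
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
items = {"i1": {"title": "Stethoscope"}, "i2": {"title": "Otoscope"}}

# The N-M "likes" relationship is stored twice, once per access pattern.
items_by_user = defaultdict(set)
users_by_item = defaultdict(set)


def like(user_id: str, item_id: str) -> None:
    """Every write has to update both index tables."""
    items_by_user[user_id].add(item_id)
    users_by_item[item_id].add(user_id)


def titles_liked_by(user_id: str):
    # One index lookup, then a secondary lookup per item to fetch its properties.
    return sorted(items[i]["title"] for i in items_by_user.get(user_id, ()))


like("u1", "i1")
like("u1", "i2")
like("u2", "i1")
print(titles_liked_by("u1"))
print(sorted(users_by_item["i1"]))
```

Both "items a user likes" and "users who like an item" become single-key lookups, but item titles still cost an extra read per item. That is the secondary lookup the slide warns about.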

Cassandra Model #3: Denormalization

Can put some data in the indexes if your queries need it. (Or serialize data.)

Cassandra Model #4: SuperColumns?

SuperColumns let you store other dimensions of data. (eek?)

Cassandra Model #5: Time order

Composite (sorted) column keys let you do neat things like time-order the mapping.
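
The effect of sorted composite keys can be imitated in plain Python with a sorted list of `(timestamp, item_id)` tuples per row. This only models the ordering behavior and is not Cassandra code; the names and integer timestamps are assumptions.

```python
import bisect
from collections import defaultdict

# Row key -> sorted list of composite column keys (timestamp, item_id).
likes_by_user = defaultdict(list)


def like(user_id: str, timestamp: int, item_id: str) -> None:
    """Insert the composite key so that the row stays sorted by time."""
    bisect.insort(likes_by_user[user_id], (timestamp, item_id))


def likes_between(user_id: str, start: int, end: int):
    """Items liked in [start, end), oldest first, via a range slice over the sorted keys."""
    columns = likes_by_user.get(user_id, [])
    lo = bisect.bisect_left(columns, (start,))
    hi = bisect.bisect_left(columns, (end,))
    return [item for _, item in columns[lo:hi]]


like("u1", 30, "i3")
like("u1", 10, "i1")
like("u1", 20, "i2")
print(likes_between("u1", 10, 30))
```

Because the keys are kept in sorted order, "most recent likes" or "likes in a time window" is a contiguous slice rather than a scan.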

IT DEPENDS!
Roll your own model – see www.datastax.com for great data model articles

Conflict Resolution in Cassandra

• Each Tuple has a Timestamp
• Last change wins
• Requires clock synchronization
• (Working on other strategies)
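
A last-write-wins merge is short enough to sketch in full. The code below illustrates the general rule only, not Cassandra's implementation. The `Versioned` type, the replica contents, and the tie-break (the left value wins on equal timestamps) are all assumptions.

```python
from typing import NamedTuple, Optional


class Versioned(NamedTuple):
    value: str
    timestamp: float  # taken from the writer's clock


def resolve(a: Optional[Versioned], b: Optional[Versioned]) -> Optional[Versioned]:
    """Last change wins; on a tie this sketch arbitrarily keeps `a`."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp >= b.timestamp else b


def merge_replicas(left: dict, right: dict) -> dict:
    return {k: resolve(left.get(k), right.get(k)) for k in left.keys() | right.keys()}


replica_1 = {"email": Versioned("old@example.com", 100.0)}
replica_2 = {
    "email": Versioned("new@example.com", 105.0),
    "phone": Versioned("555-0100", 101.0),
}
merged = merge_replicas(replica_1, replica_2)
print(merged["email"].value)

# Clock skew: a write made later in real time, stamped by a slow clock, loses.
skewed = resolve(Versioned("actually newer", 90.0), Versioned("actually older", 100.0))
print(skewed.value)
```

The last two lines show why clock synchronization matters: the merge trusts timestamps completely, so a writer with a slow clock silently loses its update.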

But wait, there’s more… THE FUTURE

N*M*Q Cloudlets N*M*Q Devices

The Internet of Things
It’s coming. Can your servers handle it?

Things are Getting Smarter

• Arduino
• Netduino
• Raspberry Pi ($25)

Servers will do Server Things

• Cross-thing sharing
• Data storage
• Analysis

How Will We Survive?

• Communication
• Network Effect
• Analytics

Cell Computing

• Self-sufficient unit of scale
• All components required to operate a portion of workload
• Known performance characteristics
• Known cost to interact with other cells

THINK BIG: How big is your project?

Some Scale

• 50,000 doctors
• 100 editors
• 500GB of data

Does it matter?
