Scaling your website

Alejandro Marcu

Dutch PHP Conference 2016

Alejandro Marcu

Started programming Logo at 8 years old

Then moved to Basic, Turbo Pascal, C++, Java

2001 – 2004: Various programming jobs in Argentina

2004 – 2008: TopCoder

2009 – 2015: Facebook

What You Will Learn Today

Scalable architecture

Scaling the database

Caching

Introducing new features

Scalable architecture

Single Server

Hosted or in the cloud

Web App: Apache/Nginx + PHP

DB: MySQL, MongoDB, etc.

Cache: Memcached, Redis

[Diagram: User → Server, which hosts the Web App, the DB, and the Cache]

Scaling Vertically

More RAM

More cores or faster CPU

SSD

RAID

Network Interfaces

Functional Partitioning

Servers can have different hardware specs

More latency

Limited growth

[Diagram: User → Data Center; the Web App, the Cache, and the DB each run on their own server (Server 1, Server 2, Server 3)]

Splitting the Web App

Web Front End should be a thin presentation layer

Services

Can be just another class

Or remote over SOAP, REST, Thrift

Start simple, plan for scale

[Diagram: Web Front End, iOS App, and Android App → Back End (Service 1, Service 2, … Service n) → DB and Cache]
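One way to read "start simple, plan for scale" is to hide each service behind an interface, so the front end does not care whether the implementation is a local class or a remote call. The sketch below is only an illustration: `FriendService`, `LocalFriendService`, and `render_friend_list` are hypothetical names that do not come from the talk.

```php
<?php
declare(strict_types=1);

// Hypothetical service contract; the front end depends only on this.
interface FriendService
{
    /** @return string[] names of the user's friends */
    public function friendNames(int $userId): array;
}

// Starting simple: the service is just another class in the same process.
final class LocalFriendService implements FriendService
{
    /** @param array<int, string[]> $friendsByUser stand-in for a real data store */
    public function __construct(private array $friendsByUser) {}

    public function friendNames(int $userId): array
    {
        return $this->friendsByUser[$userId] ?? [];
    }
}

// Thin presentation layer: formatting only, no business logic.
function render_friend_list(FriendService $service, int $userId): string
{
    return implode(', ', $service->friendNames($userId));
}

$service = new LocalFriendService([100 => ['John X', 'Bob Y', 'Anne Z']]);
echo render_friend_list($service, 100), PHP_EOL;
```

If the service later moves to its own servers, a class that calls it over REST or Thrift could implement the same interface, and `render_friend_list` would not need to change.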

Back end servers can have one or more services

Some services can be on more than one server

[Diagram: User → Data Center containing the Web Front End, the Cache, the DB, and the Back End services (Service 1 … Service n), spread across Server 1 … Server k]

Stateless Services

Don’t store anything locally

Use external storage (e.g. databases)

Can use local caching

Stateless Front End

HTTP Session

Cookies

External Data Store

Uploaded Files

DFS: GFS, HDFS, GlusterFS

Amazon S3

Multiple Front End Servers

Load Balancer:

Cloud based (Amazon ELB)

Software (NGINX, HAProxy)

Hardware (BIG-IP, NetScaler)

[Diagram: User → Load Balancer → Front End (Web FE 1 … Web FE k) → Back End (Service 1 … Service n) → Cache and DB, all inside the Data Center]

Caching static files

Files that are the same on each request, e.g. jpg, png, css, js, mp3, etc.

Reverse HTTP Proxy

Load balancers usually provide this functionality

CDN (Content Delivery Network)

E.g. Akamai, Amazon CloudFront

Pay for usage

Multiple locations

[Diagram: User gets static content from the CDN and dynamic content from the Data Center]

Multiple Data Centers

Advantages

Lower latency for users

Reduced disaster risk

Economic opportunities

Challenges

Consistency

Latency between data centers

Bandwidth between data centers

Scaling databases

Scaling relational databases

Too much data

Too many reads

Too many writes

Want higher availability

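Replication

Usually many more reads than writes

Higher availability

Read after write can be wrong: a slave may not have applied the write yet

[Diagram: DB clients read and write (R/W) on the Master and only read (R) from the Slaves; the Master feeds the Slaves through binlogs]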
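A common way to soften the read-after-write problem is to send a client's reads to the master for a short time after that client writes. The sketch below assumes this mitigation, which the slides do not prescribe: `ReplicaRouter`, its method names, and the 2-second window are all hypothetical, and the connection names are plain strings standing in for real connections.

```php
<?php
declare(strict_types=1);

// Hypothetical per-client router: names and the pin window are assumptions.
final class ReplicaRouter
{
    private ?float $lastWriteAt = null;

    /** @param string[] $replicas names of the read-only replicas */
    public function __construct(
        private string $master,
        private array $replicas,
        private float $pinSeconds = 2.0
    ) {}

    // Writes always go to the master; remember when the write happened.
    public function forWrite(float $now): string
    {
        $this->lastWriteAt = $now;
        return $this->master;
    }

    // Reads go to a random replica unless this client wrote very recently.
    public function forRead(float $now): string
    {
        $recentWrite = $this->lastWriteAt !== null
            && ($now - $this->lastWriteAt) < $this->pinSeconds;
        if ($recentWrite || $this->replicas === []) {
            return $this->master;
        }
        return $this->replicas[array_rand($this->replicas)];
    }
}

$router = new ReplicaRouter('master', ['replica1', 'replica2']);
```

The time is passed in explicitly so the behaviour is easy to test; real code would more likely call `microtime(true)` inside the router.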

Functional Partitioning

Limited growth

Can separate unrelated functionality

[Diagram: the User, Post, and Payment tables split between DB 1 and DB 2]

Sharding

Tables are split into multiple DBs

Sharding key used to decide which db, e.g. id

Sharding function, e.g. db(id) = ((id - 1) % 2) + 1

Searching becomes more complicated

DB 1
id  name
1   John
3   Jack
5   Anne

DB 2
id  name
2   Louise
4   Bob
6   Marie
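The modulo sharding function can be sketched in a few lines. Note one assumption: this version computes `((id - 1) % n) + 1` so that the result agrees with the tables on the slide, where the odd ids live in DB 1. The helper name `shard_for` is made up for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: maps a 1-based id to a 1-based DB number.
function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Group the sample users from the slide by the DB they would live in.
$users = [1 => 'John', 2 => 'Louise', 3 => 'Jack', 4 => 'Bob', 5 => 'Anne', 6 => 'Marie'];
$byDb = [];
foreach ($users as $id => $name) {
    $byDb[shard_for($id, 2)][$id] = $name;
}

print_r($byDb);
```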

Sharding

E.g., add an extra db

New sharding function: db(id) = ((id - 1) % 3) + 1

4 of the 6 rows have to move to a different db

Conclusion: modulo is not a good sharding function

Before (2 DBs)

DB 1
id  name
1   John
3   Jack
5   Anne

DB 2
id  name
2   Louise
4   Bob
6   Marie

After (3 DBs)

DB 1
id  name
1   John
4   Bob

DB 2
id  name
2   Louise
5   Anne

DB 3
id  name
3   Jack
6   Marie
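The cost of changing the modulus can be checked by listing the ids whose db changes. This is a small sketch, again assuming the 1-based formula `((id - 1) % n) + 1` that matches the tables on the slide; `shard_for` and `moved_ids` are hypothetical helper names.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: maps a 1-based id to a 1-based DB number.
function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

/**
 * Returns the ids that land on a different DB once the DB count changes.
 *
 * @param int[] $ids
 * @return int[]
 */
function moved_ids(array $ids, int $oldCount, int $newCount): array
{
    return array_values(array_filter(
        $ids,
        fn (int $id): bool => shard_for($id, $oldCount) !== shard_for($id, $newCount)
    ));
}

$moved = moved_ids(range(1, 6), 2, 3);
echo implode(', ', $moved), PHP_EOL; // prints "3, 4, 5, 6"
```

Only ids 1 and 2 stay where they were, which is what the before and after tables show.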

Consistent Sharding

Consistent sharding needs fewer reallocations

Only rows 5 and 6 move, both to the new db

Before (2 DBs)

DB 1
id  name
1   John
3   Jack
5   Anne

DB 2
id  name
2   Louise
4   Bob
6   Marie

After (3 DBs)

DB 1
id  name
1   John
3   Jack

DB 2
id  name
2   Louise
4   Bob

DB 3
id  name
5   Anne
6   Marie
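The slides do not say which scheme is meant by consistent sharding. One widely used option is a consistent hash ring, sketched below under that assumption. `HashRing`, the choice of `crc32`, and the 64 points per node are all illustrative, and the linear scan in `nodeFor` would normally be a binary search.

```php
<?php
declare(strict_types=1);

// Hypothetical consistent hash ring; not taken from the talk.
final class HashRing
{
    /** @var array<int, string> ring position => node name, kept sorted by position */
    private array $ring = [];

    /** @param string[] $nodes */
    public function __construct(array $nodes, private int $pointsPerNode = 64)
    {
        foreach ($nodes as $node) {
            $this->add($node);
        }
    }

    // Each node owns several points on the ring to spread the load.
    public function add(string $node): void
    {
        for ($i = 0; $i < $this->pointsPerNode; $i++) {
            $this->ring[crc32("$node#$i")] = $node;
        }
        ksort($this->ring);
    }

    // A key belongs to the first point at or after its hash, wrapping around.
    public function nodeFor(string $key): string
    {
        if ($this->ring === []) {
            throw new LogicException('The ring has no nodes');
        }
        $hash = crc32($key);
        foreach ($this->ring as $position => $node) {
            if ($position >= $hash) {
                return $node;
            }
        }
        return $this->ring[array_key_first($this->ring)];
    }
}

$before = new HashRing(['db1', 'db2']);
$after = new HashRing(['db1', 'db2', 'db3']);
```

The property that matters here is that adding `db3` never moves a key between `db1` and `db2`: a key either stays where it was or moves to the new node.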

Sharding

Create many logical DBs

Distribute them across servers

Server 1: DB 1, DB 2, … DB 16

Server 2: DB 17, DB 18, … DB 32

Sharding

Re-distribute DBs when needed

Need a function to map db to server; it can be a configuration

Server 1: DB 1, DB 2, … DB 16

Server 2: DB 17, DB 18, … DB 24

Server 3: DB 25, DB 26, … DB 32
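The two-level mapping can be sketched as a fixed id-to-logical-db function plus a small configuration that says which server hosts which logical dbs. The ranges below mirror the three-server layout on the slide; the server names, the function names, and the use of modulo for the first level are assumptions made for the example.

```php
<?php
declare(strict_types=1);

// The number of logical DBs is fixed up front and never changes.
const LOGICAL_DBS = 32;

// Hypothetical configuration: server name => [first logical db, last logical db].
$serverRanges = [
    'server1' => [1, 16],
    'server2' => [17, 24],
    'server3' => [25, 32],
];

// Level 1: id => logical db. Stable, because LOGICAL_DBS never changes.
function logical_db(int $id): int
{
    return (($id - 1) % LOGICAL_DBS) + 1;
}

// Level 2: logical db => server. Only this configuration changes when
// logical DBs are moved to another server.
function server_for_db(int $db, array $ranges): string
{
    foreach ($ranges as $server => [$first, $last]) {
        if ($db >= $first && $db <= $last) {
            return $server;
        }
    }
    throw new OutOfRangeException("No server configured for DB $db");
}

echo server_for_db(logical_db(57), $serverRanges), PHP_EOL; // prints "server3"
```

Moving DB 25 to DB 32 onto a new server copies whole databases and edits the configuration; no row changes its logical db.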

Sharding colocation

Put owned data in the same db as its owner (e.g. shard the post table by user_id)

Can execute joins within a db

DB 1

user
id  name
1   John
3   Jack
5   Anne

post
id   user_id  text
100  1        …
125  1        …
180  3        …

DB 2

user
id  name
2   Louise
4   Bob
6   Marie

post
id   user_id  text
143  2        …
110  6        …
175  6        …

Sharding fan-out

Many-to-many relationships are spread out

To get friends’ names:

1. Get ids

2. Group by db

3. Query on each db

Gets worse with more dbs

Caching helps a lot

Needs inverse entries: each friendship is stored once per direction, e.g. (1, 2) in DB 1 and (2, 1) in DB 2

DB 1

user
id  name
1   John
3   Jack
5   Anne

friend
id1  id2
1    2
1    4
3    4

DB 2

user
id  name
2   Louise
4   Bob
6   Marie

friend
id1  id2
2    1
4    1
4    3
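The three steps for getting friends' names can be sketched with plain arrays standing in for the two dbs. Everything here is illustrative: `group_by_db` and `friend_names` are hypothetical names, the shard function is the 1-based modulo that matches the tables, and the inner loop stands in for one `WHERE id IN (...)` query per db.

```php
<?php
declare(strict_types=1);

/**
 * Step 2: group ids by the db that holds them.
 *
 * @param int[] $ids
 * @return array<int, int[]> db number => ids stored there
 */
function group_by_db(array $ids, int $dbCount): array
{
    $groups = [];
    foreach ($ids as $id) {
        $groups[(($id - 1) % $dbCount) + 1][] = $id;
    }
    ksort($groups);
    return $groups;
}

/** @return array<int, string> friend id => friend name */
function friend_names(int $userId, array $dbs): array
{
    $dbCount = count($dbs);
    $ownDb = $dbs[(($userId - 1) % $dbCount) + 1];

    // Step 1: the friend ids are colocated with the user.
    $friendIds = $ownDb['friend'][$userId] ?? [];

    // Step 3: one lookup per db that holds at least one friend.
    $names = [];
    foreach (group_by_db($friendIds, $dbCount) as $db => $ids) {
        foreach ($ids as $id) {
            $names[$id] = $dbs[$db]['user'][$id];
        }
    }
    return $names;
}

// The sample data from the slide.
$dbs = [
    1 => ['user' => [1 => 'John', 3 => 'Jack', 5 => 'Anne'],
          'friend' => [1 => [2, 4], 3 => [4]]],
    2 => ['user' => [2 => 'Louise', 4 => 'Bob', 6 => 'Marie'],
          'friend' => [2 => [1], 4 => [1, 3]]],
];

print_r(friend_names(1, $dbs));
```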

Database scaling

Replication: scales reads, higher availability

Functional partitioning: limited scalability, helps across the board

Sharding: scales reads and writes, handles too much data, and helps with availability

These 3 techniques can be combined

Caching

Caching application data

Usually required at large scale

Key-Value stores

Set(key, value[, TTL])

Get(key)

Delete(key)

Different levels

Client side (e.g. in the browser in JS)

In the web server (e.g. APC)

Distributed cache (e.g. Redis, Memcached)
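The Set/Get/Delete interface and the usual way of using it can be sketched with an in-memory stand-in. `ArrayCache` and `cached` are hypothetical names, and the array takes the place of a real store such as Redis or Memcached. One simplification to be aware of: this sketch treats `null` as a miss, so `null` values are never cached.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory key-value store with Set, Get, and Delete.
final class ArrayCache
{
    /** @var array<string, array{0: mixed, 1: ?int}> key => [value, expiry timestamp] */
    private array $items = [];

    // A TTL of 0 means the entry never expires.
    public function set(string $key, mixed $value, int $ttl = 0, ?int $now = null): void
    {
        $now ??= time();
        $this->items[$key] = [$value, $ttl > 0 ? $now + $ttl : null];
    }

    // Returns null on a miss or when the entry has expired.
    public function get(string $key, ?int $now = null): mixed
    {
        $now ??= time();
        if (!isset($this->items[$key])) {
            return null;
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt !== null && $now >= $expiresAt) {
            unset($this->items[$key]);
            return null;
        }
        return $value;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

// Read through the cache: on a miss, load the value and store it.
function cached(ArrayCache $cache, string $key, int $ttl, callable $load): mixed
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $load();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache = new ArrayCache();
$name = cached($cache, 'user_name:1', 60, fn (): string => 'John');
```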

Caching in the web server

E.g. APC (Alternative PHP Cache)

Very fast

Duplicated caching between web servers

Expensive to invalidate

Use sparingly, mostly for global data

Distributed cache

Examples:

Redis

Memcached (+ McRouter or libmemcached)

One or more cache servers, shared use between clients

Network latency

Distributed cache

Features to consider:

Replication

Partitioning

Separate pools

Persistence

Atomic operations

Cache invalidation

When the value is no longer valid, usually just delete the key

Example:

user_friends:100 => 'John X, Bob Y, Anne Z'

Need to invalidate when:

The user adds or removes friends

A friend removes the user as a friend

A friend changes their name

Can you tolerate temporary inconsistencies?
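The invalidation rules for `user_friends` can be written down as "which keys must be deleted for which event". The sketch below assumes friendships are mutual and uses a plain array as the cache; the function names are hypothetical.

```php
<?php
declare(strict_types=1);

function friends_key(int $userId): string
{
    return "user_friends:$userId";
}

/**
 * Keys to delete when $userId adds or removes $friendId.
 * Both cached lists change, because the friendship is assumed to be mutual.
 *
 * @return string[]
 */
function keys_on_friendship_change(int $userId, int $friendId): array
{
    return [friends_key($userId), friends_key($friendId)];
}

/**
 * Keys to delete when a user is renamed: the cached list of every friend,
 * since each of those lists contains the old name.
 *
 * @param int[] $friendIds friends of the renamed user
 * @return string[]
 */
function keys_on_rename(array $friendIds): array
{
    return array_map('friends_key', $friendIds);
}

// A plain array stands in for the real cache.
$cache = [
    'user_friends:100' => 'John X, Bob Y, Anne Z',
    'user_friends:25' => 'Carl W',
    'user_friends:37' => 'Dana V',
];
foreach (keys_on_friendship_change(100, 25) as $key) {
    unset($cache[$key]); // the next read reloads the list from the database
}
```

The rename case shows why this gets expensive: one name change invalidates as many keys as the user has friends.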

Cache versioning

What happens if you change the structure of the values?

Example:

(old) user_friends:100 => 'John X, Bob Y, Anne Z'

(new) user_friends:100 => '1:John X, 25:Bob Y, 37:Anne Z'

New code breaks with old style values

Old code breaks with new style values

Solution: use versions:

(old) user_friends:100:1 => 'John X, Bob Y, Anne Z'

(new) user_friends:100:2 => '1:John X, 25:Bob Y, 37:Anne Z'
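A versioned key can be built by a single helper so that old and new code never read each other's entries. This is a minimal sketch: `FRIENDS_FORMAT_VERSION`, `friends_cache_key`, and `parse_friends_v2` are hypothetical names, and the parser only handles the version 2 format shown in the example.

```php
<?php
declare(strict_types=1);

// Bump this whenever the structure of the cached value changes.
const FRIENDS_FORMAT_VERSION = 2;

function friends_cache_key(int $userId, int $version = FRIENDS_FORMAT_VERSION): string
{
    return "user_friends:$userId:$version";
}

/**
 * Parses the version 2 value format, "id:name, id:name".
 *
 * @return array<int, string> friend id => friend name
 */
function parse_friends_v2(string $value): array
{
    $friends = [];
    foreach (explode(', ', $value) as $entry) {
        [$id, $name] = explode(':', $entry, 2);
        $friends[(int) $id] = $name;
    }
    return $friends;
}

echo friends_cache_key(100), PHP_EOL; // prints "user_friends:100:2"
```

During a deploy, servers still running the old code keep using the `:1` keys while the new code fills the `:2` keys, so neither side sees a format it cannot parse.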


Introducing new features

Objectives:

A/B testing

Quickly revert it if needed

Protect infrastructure

Ease of development


Some possibilities:

1. Development branch

2. Feature toggle

3. Percentage Rollout

4. Advanced Rollout


Development Branch

New branch for the feature, merge when finished

Can be fine in the early stages

No extra setup or complexity

Long-lived branch, may be hard to merge

Feature Toggle

Can be changed at run time (console or configuration)

Should distinguish prod from testing

Allows for intermediate commits

Code structure:

    if (feature_enabled('homepage_redesign')) {
        new_homepage();
    } else {
        old_homepage();
    }

Percentage Rollout

Dynamically control the percentage of users for a feature

When increasing the percentage, should include previous users

Code structure:

    if (feature_enabled('homepage_redesign', $user_id)) {
        new_homepage();
    } else {
        old_homepage();
    }
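One way to make sure that raising the percentage keeps the previous users is to give every user a fixed bucket from 0 to 99 per feature and enable the feature for buckets below the current percentage. The sketch below assumes that approach. The md5-based bucket, the `ROLLOUT_PERCENTAGES` constant, and the extra third parameter of `feature_enabled` are illustrative choices, not the talk's implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical configuration: feature name => percentage of users (0 to 100).
const ROLLOUT_PERCENTAGES = ['homepage_redesign' => 10];

// A stable bucket from 0 to 99 for each (feature, user) pair. Including the
// feature name means different features pick different groups of users.
function rollout_bucket(string $feature, int $userId): int
{
    return (int) hexdec(substr(md5("$feature:$userId"), 0, 7)) % 100;
}

// Enabled when the user's bucket is below the configured percentage, so a
// user who is in at 10% is still in at 50%. Unknown features are off.
function feature_enabled(
    string $feature,
    int $userId,
    array $percentages = ROLLOUT_PERCENTAGES
): bool {
    return rollout_bucket($feature, $userId) < ($percentages[$feature] ?? 0);
}

$userId = 1234;
echo feature_enabled('homepage_redesign', $userId) ? 'new' : 'old', PHP_EOL;
```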

Advanced Rollout

Turn on/off features for a percentage of users that:

Are employees

Are in another rollout group

Use a certain language

Are in a certain country

Individually whitelist or blacklist people
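The rules above can be expressed as data and evaluated by one function. The sketch below is an assumption about how such rules might look: the rule layout (`blacklist`, `whitelist`, `groups`), the user fields, and the function names are all hypothetical, and the "in another rollout group" condition is left out to keep it short.

```php
<?php
declare(strict_types=1);

// A stable bucket from 0 to 99 for each (feature, user) pair.
function rollout_bucket(string $feature, int $userId): int
{
    return (int) hexdec(substr(md5("$feature:$userId"), 0, 7)) % 100;
}

// Evaluation order: blacklist, then whitelist, then the first matching group
// whose percentage covers the user's bucket.
function advanced_enabled(array $rule, array $user): bool
{
    if (in_array($user['id'], $rule['blacklist'] ?? [], true)) {
        return false;
    }
    if (in_array($user['id'], $rule['whitelist'] ?? [], true)) {
        return true;
    }
    foreach ($rule['groups'] ?? [] as $group) {
        $matches = true;
        foreach ($group['where'] as $field => $expected) {
            if (($user[$field] ?? null) !== $expected) {
                $matches = false;
                break;
            }
        }
        if ($matches && rollout_bucket($rule['name'], $user['id']) < $group['percentage']) {
            return true;
        }
    }
    return false;
}

// Hypothetical rule: all employees, no Dutch-language users in NL yet,
// plus one user forced on and one forced off.
$rule = [
    'name' => 'homepage_redesign',
    'blacklist' => [13],
    'whitelist' => [42],
    'groups' => [
        ['where' => ['employee' => true], 'percentage' => 100],
        ['where' => ['country' => 'NL', 'language' => 'nl'], 'percentage' => 0],
    ],
];
```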

Some frameworks to check out:

Swivel

Opensoft/rollout

LaunchDarkly

Don’t forget to clean up the old code paths


Contact Information

[email protected]

/alejandro.marcu

/alejandromarcu

@AlejandroMarcu

/in/alejandromarcu
