SCALING YOUR WEBSITE
Alejandro Marcu
Dutch PHP Conference 2016
Alejandro Marcu
  Started programming in Logo at 8 years old
  Then moved to Basic, Turbo Pascal, C++, Java
  2001 – 2004: Various programming jobs in Argentina
  2004 – 2008: TopCoder
  2009 – 2015: Facebook
What You Will Learn Today
  Scalable architecture
  Scaling the database
  Caching
  Introducing new features
Scalable architecture
Single Server
  Hosted or in the cloud
  Web App: Apache/Nginx + PHP
  DB: MySQL, MongoDB, etc.
  Cache: Memcached, Redis
  [Diagram: User → Server running the Web App, Cache and DB]
Scaling Vertically
  More RAM
  More cores or faster CPU
  SSD
  RAID
  Network interfaces
Functional Partitioning
  Servers can have different hardware specs
  More latency
  Limited growth
  [Diagram: User → Data Center, with the Web App, Cache and DB spread across Server 1, Server 2 and Server 3]
Splitting the Web App
  Web Front End should be a thin presentation layer
  Services
    Just another class
    Remote over SOAP, REST, Thrift
  Start simple, plan for scale
  [Diagram: Web Front End, iOS App and Android App in front of the Back End (Service 1, Service 2, …, Service n), with DB and Cache]
Functional Partitioning
  Back end servers can host one or more services
  Some services can run on more than one server
  [Diagram: User → Data Center; Web Front End, Cache, DB and Back End services (Service 1 … Service n) spread across Server 1 … Server k]
Stateless Services
  Don’t store anything locally
  Use external storage (e.g. databases)
  Can use local caching
Stateless Front End
  HTTP session
    Cookies
    External data store
  Uploaded files
    DFS: GFS, HDFS, GlusterFS
    Amazon S3
Multiple Front End Servers
  Load balancer:
    Cloud based (Amazon ELB)
    Software (NGINX, HAProxy)
    Hardware (BIG-IP, NetScaler)
  [Diagram: User → Load Balancer → Front End (Web FE 1 … Web FE k) → Back End (Service 1 … Service n), with Cache and DB, inside the Data Center]
Caching static files
  Files that are the same on each request, e.g. jpg, png, css, js, mp3
  Reverse HTTP proxy
    Load balancers usually provide this functionality
  CDN (Content Delivery Network)
    E.g. Akamai, Amazon CloudFront
    Pay for usage
    Multiple locations
  [Diagram: User gets static content from the CDN and dynamic content from the Data Center]
Multiple Data Centers
  Advantages
    Lower latency for users
    Reduced disaster risk
    Economic opportunities
  Challenges
    Consistency
    Latency between data centers
    Bandwidth between data centers
Scaling databases
Scaling relational databases
  Too much data
  Too many reads
  Too many writes
  Want higher availability
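Replication
  Usually many more reads than writes
  Higher availability
  A read right after a write can return stale data, because slaves apply changes with a delay
  [Diagram: DB clients read and write (R/W) on the Master and read (R) from the Slaves; the Master feeds the Slaves through binlogs]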
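One common way to soften the read-after-write problem is to send a user's reads to the master for a short time after that user writes. The sketch below assumes PHP 8; the class name `ReplicatedRouter`, the host names and the 2-second pin window are all hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```php
<?php
// Hypothetical router: decides which database host a query should go to.
final class ReplicatedRouter
{
    /** @var array<int, float> user id => time of that user's last write */
    private array $lastWrite = [];

    public function __construct(
        private string $master,
        private array $slaves,
        private float $pinSeconds = 2.0 // assumed upper bound on replication lag
    ) {}

    // All writes go to the master; remember when this user last wrote.
    public function hostForWrite(int $userId, float $now): string
    {
        $this->lastWrite[$userId] = $now;
        return $this->master;
    }

    // Reads go to a slave unless the user wrote very recently.
    public function hostForRead(int $userId, float $now): string
    {
        $last = $this->lastWrite[$userId] ?? null;
        if ($last !== null && $now - $last < $this->pinSeconds) {
            return $this->master;
        }
        // Assumes non-negative user ids.
        return $this->slaves[$userId % count($this->slaves)];
    }
}
```

In a real deployment the last-write time would have to live somewhere every web server can see it (a cookie or the shared cache, for example), since the front end is meant to be stateless.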
Functional Partitioning
  Limited growth
  Can separate unrelated functionality
  [Diagram: tables User, Post and Payment split between DB 1 and DB 2]
Sharding
  Tables are split into multiple DBs
  Sharding key used to decide which DB, e.g. id
  Sharding function, e.g. db(id) = ((id - 1) % 2) + 1
  Searching becomes more complicated

  DB 1            DB 2
  id  name        id  name
  1   John        2   Louise
  3   Jack        4   Bob
  5   Anne        6   Marie
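A minimal sketch of a modulo sharding function applied to the six example users. The function name `shard_for_id` is hypothetical, and the `- 1` is an adjustment chosen so that id 1 lands in DB 1, as in the example tables.

```php
<?php
// Map a row id to a database number in 1..$dbCount using modulo.
// The "- 1" makes id 1 land in DB 1, matching the example tables.
function shard_for_id(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Group the example users by the database that would hold them.
$users = [1 => 'John', 2 => 'Louise', 3 => 'Jack', 4 => 'Bob', 5 => 'Anne', 6 => 'Marie'];
$byDb = [];
foreach ($users as $id => $name) {
    $byDb[shard_for_id($id, 2)][] = $name;
}
```

With two databases, `$byDb[1]` holds John, Jack and Anne, and `$byDb[2]` holds Louise, Bob and Marie.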
Sharding
  E.g., add an extra DB
  New sharding function: db(id) = ((id - 1) % 3) + 1
  4 of the 6 rows (ids 3, 4, 5 and 6) have to move to a different DB
  Conclusion: modulo is not a good sharding function

  Before:
  DB 1            DB 2
  id  name        id  name
  1   John        2   Louise
  3   Jack        4   Bob
  5   Anne        6   Marie

  After:
  DB 1            DB 2            DB 3
  id  name        id  name        id  name
  1   John        2   Louise      3   Jack
  4   Bob         5   Anne        6   Marie
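The cost of changing the modulus can be checked by listing which ids change database when going from 2 to 3 DBs. This is only an illustrative sketch; `modulo_shard` and `moved_ids` are hypothetical names, and the shard numbering is chosen to match the example tables.

```php
<?php
// Modulo sharding, numbered so that id 1 lands in DB 1.
function modulo_shard(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Return the ids whose database changes when the number of DBs changes.
function moved_ids(array $ids, int $oldCount, int $newCount): array
{
    return array_values(array_filter(
        $ids,
        fn (int $id): bool => modulo_shard($id, $oldCount) !== modulo_shard($id, $newCount)
    ));
}

$moved = moved_ids([1, 2, 3, 4, 5, 6], 2, 3); // [3, 4, 5, 6]
```

Only ids 1 and 2 stay where they were, so two thirds of this small data set would have to be copied.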
Consistent Sharding
  Consistent sharding needs fewer reallocations: here only ids 5 and 6 move

  Before:
  DB 1            DB 2
  id  name        id  name
  1   John        2   Louise
  3   Jack        4   Bob
  5   Anne        6   Marie

  After:
  DB 1            DB 2            DB 3
  id  name        id  name        id  name
  1   John        2   Louise      5   Anne
  3   Jack        4   Bob         6   Marie
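The slides do not say which scheme is meant by consistent sharding. One common approach is a consistent-hash ring, sketched below under that assumption. The class `HashRing`, the key format and the number of virtual nodes are hypothetical; the code assumes 64-bit PHP 8, where `crc32()` returns a non-negative integer, and a ring with at least one DB.

```php
<?php
// Sketch of a consistent-hash ring. Each DB is placed at several points
// ("virtual nodes") on a ring of 32-bit hash values; a key belongs to the
// first DB point at or after the key's own hash, wrapping around at the end.
final class HashRing
{
    /** @var array<int, string> ring position => DB name, sorted by position */
    private array $ring = [];

    public function __construct(array $dbs, private int $replicas = 64)
    {
        foreach ($dbs as $db) {
            $this->add($db);
        }
    }

    // Place a DB on the ring at $replicas pseudo-random positions.
    public function add(string $db): void
    {
        for ($i = 0; $i < $this->replicas; $i++) {
            $this->ring[crc32("$db#$i")] = $db;
        }
        ksort($this->ring);
    }

    // Linear scan for clarity; a binary search would be used at scale.
    public function dbFor(string $key): string
    {
        $hash = crc32($key);
        foreach ($this->ring as $position => $db) {
            if ($position >= $hash) {
                return $db;
            }
        }
        return reset($this->ring); // wrap around to the first point
    }
}
```

The property that matters is that adding a DB never shuffles keys between the existing DBs: a key either stays where it was or moves to the new DB.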
Sharding
  Create many logical DBs
  Distribute them across servers

  Server 1: DB 1, DB 2, …, DB 16
  Server 2: DB 17, DB 18, …, DB 32
Sharding
  Re-distribute DBs when needed
  Need a function to map each DB to a server; it can be a configuration

  Server 1: DB 1, DB 2, …, DB 16
  Server 2: DB 17, DB 18, …, DB 24
  Server 3: DB 25, DB 26, …, DB 32
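A sketch of the two-step lookup this implies: a fixed function maps an id to one of the 32 logical DBs, and a configuration maps each logical DB to a server. The server names, the range layout and both function names are hypothetical. Modulo is acceptable for the first step here only because the number of logical DBs never changes.

```php
<?php
const LOGICAL_DBS = 32; // fixed for the lifetime of the data set

// Hypothetical configuration: each server owns a contiguous range of logical DBs.
$dbRanges = [
    'server1' => [1, 16],
    'server2' => [17, 24],
    'server3' => [25, 32],
];

// Step 1: id => logical DB (never changes once data has been written).
function logical_db_for_id(int $id): int
{
    return (($id - 1) % LOGICAL_DBS) + 1;
}

// Step 2: logical DB => server (changes whenever DBs are redistributed).
function server_for_db(int $db, array $ranges): string
{
    foreach ($ranges as $server => [$first, $last]) {
        if ($db >= $first && $db <= $last) {
            return $server;
        }
    }
    throw new OutOfRangeException("No server configured for DB $db");
}
```

Moving DBs 25 to 32 onto a new server then means copying those databases and editing `$dbRanges`; no row changes its logical DB.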
Sharding colocation
  Put owned data in the same DB as its owner (e.g. shard the post table by user_id)
  Can execute joins

  DB 1                        DB 2
  user                        user
  id   name                   id   name
  1    John                   2    Louise
  3    Jack                   4    Bob
  5    Anne                   6    Marie

  post                        post
  id   user_id  text          id   user_id  text
  100  1        …             143  2        …
  125  1        …             110  6        …
  180  3        …             175  6        …
Sharding fan-out
  Many-to-many relationships are spread out
  To get friends’ names:
    Get ids
    Group by DB
    Query on each DB
  Gets worse with more DBs
  Caching helps a lot
  Needs inverse entries

  DB 1                DB 2
  user                user
  id   name           id   name
  1    John           2    Louise
  3    Jack           4    Bob
  5    Anne           6    Marie

  friend              friend
  id1  id2            id1  id2
  1    2              2    1
  1    4              4    1
  3    4              4    3
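The three steps for fetching friends' names can be sketched as follows. All function names are hypothetical, the shard function is chosen to match the example tables (odd ids in DB 1, even ids in DB 2), and the SQL is built as a plain string only for illustration; real code would use prepared statements.

```php
<?php
// Shard function matching the example tables: odd ids in DB 1, even ids in DB 2.
function db_for_user(int $userId): int
{
    return (($userId - 1) % 2) + 1;
}

// Group ids by database so that each database is queried only once.
function group_by_db(array $userIds): array
{
    $groups = [];
    foreach ($userIds as $id) {
        $groups[db_for_user($id)][] = $id;
    }
    ksort($groups);
    return $groups;
}

// Build one query per database (illustrative SQL, integer ids only).
function friend_name_queries(array $friendIds): array
{
    $queries = [];
    foreach (group_by_db($friendIds) as $db => $ids) {
        $queries[$db] = 'SELECT id, name FROM user WHERE id IN (' . implode(', ', $ids) . ')';
    }
    return $queries;
}
```

The number of queries grows with the number of databases the friend ids touch, which is why the fan-out gets worse as more DBs are added.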
Database scaling
  Replication
    Scales reads, higher availability
  Functional partitioning
    Limited scalability
    Helps across the board
  Sharding
    Scales reads and writes, handles too much data and helps with availability
  These three techniques can be combined
Caching
Caching application data
  Usually required at large scale
  Key-value stores
    Set(key, value[, TTL])
    Get(key)
    Delete(key)
  Different levels
    Client side (e.g. in the browser in JS)
    In the web server (e.g. APC)
    Distributed cache (e.g. Redis, Memcached)
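To make the Set/Get/Delete interface concrete, here is an in-memory stand-in written for PHP 8. The class `ArrayCache` is hypothetical and is not the API of APC, Redis or Memcached; the optional `$now` argument exists only so that expiry can be demonstrated without waiting.

```php
<?php
// In-memory stand-in for a key-value cache with the three operations above.
final class ArrayCache
{
    /** @var array<string, array{0: mixed, 1: ?float}> key => [value, expiry time] */
    private array $items = [];

    // $ttl is in seconds; null means the entry never expires.
    public function set(string $key, mixed $value, ?float $ttl = null, ?float $now = null): void
    {
        $now ??= microtime(true);
        $this->items[$key] = [$value, $ttl === null ? null : $now + $ttl];
    }

    // Returns null on a miss or when the entry has expired.
    public function get(string $key, ?float $now = null): mixed
    {
        $now ??= microtime(true);
        if (!isset($this->items[$key])) {
            return null;
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt !== null && $now >= $expiresAt) {
            unset($this->items[$key]);
            return null;
        }
        return $value;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}
```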
Caching in the web server
  E.g. APC (Alternative PHP Cache)
  Very fast
  Duplicated caching between web servers
  Expensive to invalidate
  Use sparingly, mostly for global data
Distributed cache
  Examples:
    Redis
    Memcached (+ McRouter or libmemcached)
  One or more cache servers, shared between clients
  Network latency
Distributed cache
  Features to consider:
    Replication
    Partitioning
    Separate pools
    Persistence
    Atomic operations
Cache invalidation
  When the value is no longer valid, usually just delete the key
  Example:
    user_friends:100 => 'John X, Bob Y, Anne Z'
  Need to invalidate when:
    The user adds or removes friends
    A friend removes the user as a friend
    A friend changes their name
  Can you tolerate temporary inconsistencies?
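A minimal sketch of delete-on-write invalidation for the `user_friends` example. Plain PHP arrays stand in for both the database and the cache, and all function names are hypothetical. Only the first trigger (the user adds a friend) is shown; the other triggers would delete the same key from their own write paths.

```php
<?php
// Plain arrays stand in for the database and the cache in this sketch.
$friendsDb = [100 => ['John X', 'Bob Y']]; // user id => friend names
$cache = [];

function friends_key(int $userId): string
{
    return "user_friends:$userId";
}

// Cache-aside read: use the cached value if present, else load and store it.
function get_friends(int $userId, array $db, array &$cache): string
{
    $key = friends_key($userId);
    if (!isset($cache[$key])) {
        $cache[$key] = implode(', ', $db[$userId] ?? []);
    }
    return $cache[$key];
}

// Write path: update the source of truth, then delete the cached entry.
function add_friend(int $userId, string $friendName, array &$db, array &$cache): void
{
    $db[$userId][] = $friendName;
    unset($cache[friends_key($userId)]);
}
```

Deleting instead of rewriting the value keeps the write path simple: the next read rebuilds the entry from the database.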
Cache versioning
  What happens if you change the structure of the values?
  Example:
    (old) user_friends:100 => 'John X, Bob Y, Anne Z'
    (new) user_friends:100 => '1:John X, 25:Bob Y, 37:Anne Z'
  New code breaks with old-style values
  Old code breaks with new-style values
  Solution: put a version in the key:
    (old) user_friends:100:1 => 'John X, Bob Y, Anne Z'
    (new) user_friends:100:2 => '1:John X, 25:Bob Y, 37:Anne Z'
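One way to apply this is to build every key through a helper that appends the current format version, so old and new code never read each other's entries. The constant and function names below are hypothetical; the parser handles the new "id:name" format from the example.

```php
<?php
// Bump this constant whenever the format of the cached value changes.
const USER_FRIENDS_VERSION = 2;

function user_friends_key(int $userId, int $version = USER_FRIENDS_VERSION): string
{
    return "user_friends:$userId:$version";
}

// Version 2 format: "id:name" pairs separated by ", ".
function parse_friends_v2(string $value): array
{
    $friends = [];
    foreach (explode(', ', $value) as $entry) {
        [$id, $name] = explode(':', $entry, 2);
        $friends[(int) $id] = $name;
    }
    return $friends;
}
```

During a deploy, servers still running the old code keep using the `:1` keys while new servers fill the `:2` keys, and the old entries can simply be left to expire.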
Introducing new features
Introducing new features
  Objectives:
    A/B testing
    Quickly revert if needed
    Protect infrastructure
    Ease of development
Introducing new features
  Some possibilities:
    1. Development branch
    2. Feature toggle
    3. Percentage rollout
    4. Advanced rollout
Development Branch
  New branch for the feature, merge when finished
  Can be fine in the early stages
  No extra setup or complexity
  Long-lived branch, may be hard to merge
Feature Toggle
  Can be changed at run time (console or configuration)
  Should distinguish prod from testing
  Allows for intermediate commits
  Code structure:
    if (feature_enabled('homepage_redesign')) {
        new_homepage();
    } else {
        old_homepage();
    }
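A possible shape for `feature_enabled()` behind that code structure, assuming toggles are kept in a configuration array keyed by environment. The extra `$env` and `$toggles` parameters and the configuration layout are assumptions made for this sketch; the slide only shows the one-argument call.

```php
<?php
// Hypothetical toggle configuration, keyed by environment and feature name.
// In practice this could be loaded from a file or a console-editable store.
$toggles = [
    'prod'    => ['homepage_redesign' => false],
    'testing' => ['homepage_redesign' => true],
];

// Unknown features and unknown environments are treated as disabled.
function feature_enabled(string $feature, string $env, array $toggles): bool
{
    return $toggles[$env][$feature] ?? false;
}
```

Defaulting to disabled means a half-finished feature committed without a toggle entry stays dark in every environment.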
Percentage Rollout
  Dynamically control the percentage of users for a feature
  When increasing the percentage, should include previous users
  Code structure:
    if (feature_enabled('homepage_redesign', $user_id)) {
        new_homepage();
    } else {
        old_homepage();
    }
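One way to make sure a higher percentage always includes the users who already had the feature is to give each user a fixed bucket from 0 to 99 and enable the feature for buckets below the percentage. The sketch below assumes 64-bit PHP, where `crc32()` is non-negative; the `$rollouts` table and the `rollout_bucket` helper are hypothetical.

```php
<?php
// Hypothetical rollout table: feature name => percentage of users (0..100).
$rollouts = ['homepage_redesign' => 10];

// Stable bucket in 0..99 derived from the feature name and the user id.
// Including the feature name gives each feature a different set of early users.
function rollout_bucket(string $feature, int $userId): int
{
    return crc32("$feature:$userId") % 100;
}

// Enabled when the user's bucket falls below the configured percentage.
function feature_enabled(string $feature, int $userId, array $rollouts): bool
{
    return rollout_bucket($feature, $userId) < ($rollouts[$feature] ?? 0);
}
```

Because a user's bucket never changes, raising the setting from 10 to 50 keeps everyone in buckets 0 to 9 enabled and adds buckets 10 to 49.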
Advanced Rollout
  Turn on/off features for a percentage of users that:
    Are employees
    Are in another rollout group
    Use a certain language
    Are in a certain country
  Individually whitelist or blacklist people
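These rules could be expressed as data: a whitelist, a blacklist, and a list of segments that each carry their own percentage. The rule layout, the user fields and the function names below are all hypothetical, the "another rollout group" condition is left out for brevity, and the bucket calculation assumes 64-bit PHP.

```php
<?php
// Hypothetical rule for one feature; every field name is illustrative.
$rule = [
    'whitelist' => [42], // always on for these user ids
    'blacklist' => [7],  // always off for these user ids
    'segments'  => [
        ['where' => ['is_employee' => true], 'percentage' => 100],
        ['where' => ['country' => 'NL', 'language' => 'nl'], 'percentage' => 20],
    ],
];

// A user matches a segment when every listed field has the expected value.
function segment_matches(array $user, array $where): bool
{
    foreach ($where as $field => $expected) {
        if (($user[$field] ?? null) !== $expected) {
            return false;
        }
    }
    return true;
}

// Blacklist wins over whitelist; otherwise the first matching segment decides.
function feature_enabled_for(string $feature, array $user, array $rule): bool
{
    if (in_array($user['id'], $rule['blacklist'], true)) {
        return false;
    }
    if (in_array($user['id'], $rule['whitelist'], true)) {
        return true;
    }
    foreach ($rule['segments'] as $segment) {
        if (segment_matches($user, $segment['where'])) {
            return crc32("$feature:{$user['id']}") % 100 < $segment['percentage'];
        }
    }
    return false;
}
```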
Introducing new features
  Some frameworks to check out:
    Swivel
    Opensoft/rollout
    LaunchDarkly
  Don’t forget to clean up the old code paths
Contact Information
  /alejandro.marcu
  /alejandromarcu
  @AlejandroMarcu
  /in/alejandromarcu