Scalable Architectures 101
ConFoo, Mar 10, 2011
Mike Willbanks
Blog: http://blog.digitalstruct.com
Twitter: mwillbanks
IRC: lubs on freenode
● Software Development Manager
● Organizer of MNPHP / MNMySQL
● Zend Certified Engineer (PHP/ZF)
Who am I?
Your application is growing, your systems are slowing and growth is inevitable...
● Where do we go from here?
  ● Load Balancing
  ● Web Servers
  ● Database Servers
  ● Cache Servers
  ● Job Servers
  ● DNS Servers
  ● CDN Servers
  ● FrontEnd Performance
Scalability?
Single Server Syndrome
● One Server, Many Functions
  ● Web Server, Database Server, Cache Server, Job Server, DNS Server, Mail Server....
● How we know it's time
  ● iostat, CPU load, overall degradation
  ● OR.....
The Beginning
Single Separation Syndrome
● Separation of Web and Database
  ● Fix the main disk I/O bottleneck.
  ● Still generally running several things on a single server.
The Next Step
Uh, oh :(
Load Balancing
A Load Balanced Environment
● DNS Rotation (Little to No Cost)
  ● Not reliable, but it can work on a small scale.
● Software Based (Commodity Server Cost)
  ● HAProxy, Pound, Varnish, Squid, Wackamole, Perlbal, Web Server Proxy (Nginx, Apache, etc.)...
● Hardware Based (High Cost Appliance)
  ● Several vendors ranging based on need.
    – A10, F5, etc.
Load Balancing Options
● Round Robin
● Static
● Least Connections
● Source
● IP
● Basic Authentication
● URI
● URI Parameter
● Header
● Cookie
● Regular Expression
Load Balancing Routing Types
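To make two of the routing types above concrete, here is a minimal sketch of round robin and least connections. The class names and the backend names `serv1` / `serv2` are illustrative assumptions; real balancers such as HAProxy implement these strategies internally and with far more care.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in a fixed, rotating order."""

    def __init__(self, backends):
        self._cycle = cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend that currently has the fewest open connections."""

    def __init__(self, backends):
        # Hypothetical bookkeeping: backend name -> open connection count.
        self.active = {name: 0 for name in backends}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest backend.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        # Call when a connection to the backend is closed.
        self.active[name] -= 1
```

Round robin ignores how busy a backend is, which is fine when requests cost about the same. Least connections tends to suit workloads where some requests run much longer than others.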
● Out of the many options we will focus in on 3
  ● HAProxy – By and large one of the most popular.
  ● Pound – Said to be great for medium traffic sites.
  ● Varnish – A caching solution that also does load balancing
Targeting Open Source Software Packages
● Pros
  ● Extremely full featured
  ● Very well known
  ● Handles just about every type of routing
  ● Several examples online
  ● Has a web-based GUI
● Cons
  ● No native SSL support (use Stunnel)
  ● Setup can be complex and take a lot of time
HAProxy
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen localhost 0.0.0.0:80
    option httpchk GET /
    balance roundrobin
    cookie SERVERID
    server serv1 0.0.0.0:8080 check inter 2000 rise 2 fall 5
    server serv2 0.0.0.0:8080 check inter 2000 rise 2 fall 5
    option httpclose
    stats enable
    stats uri /lb?stats
    stats realm haproxy
    stats auth test:test
HAProxy: Sample Configuration
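The `check inter 2000 rise 2 fall 5` options in the sample ask for a health check every 2000 ms, mark a server down after 5 consecutive failed checks, and mark it up again after 2 consecutive successful ones. The sketch below models only that rise/fall counting. The `HealthState` class is a hypothetical illustration, not HAProxy code.

```python
class HealthState:
    """Track whether a backend is up, using rise/fall thresholds."""

    def __init__(self, rise=2, fall=5):
        self.rise = rise      # successes needed to bring a down server up
        self.fall = fall      # failures needed to take an up server down
        self.up = True
        self._streak = 0      # consecutive checks that contradict the current state

    def record(self, check_ok):
        """Feed in one health check result and return the resulting state."""
        if check_ok == self.up:
            # The check agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.fall if self.up else self.rise
            if self._streak >= needed:
                self.up = not self.up
                self._streak = 0
        return self.up
```

A high `fall` value keeps a single slow response from pulling a server out of rotation, and a low `rise` value brings a recovered server back quickly.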
● Pros
  ● chroot support
  ● Native SSL support
  ● Insanely simple setup
  ● Supports virtually all types of routing
  ● Many online tutorials
● Cons
  ● No web-based statistics (use poundctl)
  ● HAProxy can scale more...
Pound
User "www-data"
Group "www-data"
LogLevel 1
Alive 30
Control "/var/run/pound/poundctl.socket"

ListenHTTP
    Address 127.0.0.1
    Port 80
    xHTTP 0

    Service
        BackEnd
            Address 127.0.0.1
            Port 8080
        End
        BackEnd
            Address 127.0.0.1
            Port 8080
        End
    End
End
Pound: Sample Configuration
● Pros
  ● Supports front-end caching
  ● Fairly simple setup
  ● Extremely well known
  ● Many online tutorials
  ● Large suite of tools (varnishstat, varnishtop, varnishlog, varnishreplay, varnishncsa)
● Cons
  ● No native SSL support (use Pound or Stunnel)
  ● If you want a web GUI you must PAY
Varnish
backend default1 {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

backend default2 {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

director default round-robin {
    { .backend = default1; }
    { .backend = default2; }
}

sub vcl_recv {
    if (req.http.host ~ "^127.0.0.1$") {
        set req.backend = default;
    }
}
Varnish: Sample Configuration
● Web Servers
  ● One always needs to be available
  ● Don't use SSL on the web server level!
● Headers
  ● Pass a header that says whether SSL is on or not
  ● Client IP is likely in X-Forwarded-For
  ● If using Virtual Hosts, pass the Host header
● Sessions
  ● Need a solution if not using sticky routing
Load Balancing: Keep in Mind
Web Servers
● Apache
● IIS
● Nginx
● Lighttpd
● etc.
Many Web Servers
● Server name should be the same on all servers
  ● Make a server alias so you can reach individual servers w/o load balancing
● Each configuration SHOULD or MUST be the same.
● Client IP generally is in X-Forwarded-For.
● SSL will not be in $_SERVER['HTTPS']; look for it in an HTTP_ header instead.
Web Server Configuration
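Behind a load balancer the application has to read the client IP and the SSL state from forwarded headers. The sketch below assumes the balancer sets `X-Forwarded-For` and a de facto `X-Forwarded-Proto` header. The second header name is an assumption, so use whatever your balancer is configured to send. The function names are illustrative, and the `headers` argument is a plain dict keyed by the exact header names.

```python
def client_ip(headers, remote_addr):
    """Return the original client IP for a request that came through a proxy."""
    forwarded = headers.get("X-Forwarded-For", "")
    # The header is a comma-separated chain; the left-most entry is the client.
    first = forwarded.split(",")[0].strip()
    # Fall back to the socket address when the header is missing or empty.
    return first or remote_addr


def is_https(headers):
    """True when the balancer reports that the client connection used SSL."""
    # Assumed header name; it is not set by every balancer.
    return headers.get("X-Forwarded-Proto", "").lower() == "https"
```

Only trust these headers when the request really came from your own balancer, since any client can send them.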
● Files
  ● All web servers need our files.
  ● Static content could be tagged in version control.
  ● Static content may need a file server / CDN / etc.
  ● User-generated content on an NFS mount or served from the cloud or a CDN.
● Sessions
  ● All web servers need access to our sessions.
  ● Remember disk is slow and the database will be a bottleneck. How about distributed caching?
Web Servers: Keep in Mind
● Running PHP on your web server may be a resource hog; you may want to offload static content requests to Varnish, Nginx, Lighttpd or some other lightweight web server.
● Running a proxy to your main web servers works great for hardworking processes, while static content is served from the lightweight server.
Web Servers: Other Information
Database Servers
Single Database Server
● Lots of options and steps as we move forward.
Single Database Server
Single Master, Single Slave
● Write code that can write to the master and read from the slave.
● Exception: Be smart, don't write to the master and read from the slave on the table you just wrote to.
Database Replication
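One way to follow both rules above is a small router that sends writes to the master and reads to the slave, but pins reads of a table to the master for a short time after that table was written. This is a minimal sketch. `ReplicationRouter`, the 2 second window and the returned labels are assumptions, and a real application would return connection handles rather than strings.

```python
import time


class ReplicationRouter:
    """Route writes to the master and reads to the slave, except right after a write."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds   # assumed upper bound on replication lag
        self.clock = clock               # injectable so the logic can be tested
        self._last_write = {}            # table name -> time of the most recent write

    def for_write(self, table):
        self._last_write[table] = self.clock()
        return "master"

    def for_read(self, table):
        written_at = self._last_write.get(table)
        if written_at is not None and self.clock() - written_at < self.pin_seconds:
            # We just wrote this table; the slave may not have the row yet.
            return "master"
        return "slave"
```

The pin window is a guess at replication lag, not a guarantee. If a stale read is unacceptable, read from the master for that query.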
Single Master, Multiple Slaves
● It is a great time to start to implement connection pooling.
Database Replication Multiple Slaves
Multiple Master, Multiple Slaves
● Do NOT write to both masters at once with MySQL!
● Be warned: auto-increment settings should now change so the masters do not generate conflicting ids.
Database Replication Multiple Everything
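MySQL exposes the `auto_increment_increment` and `auto_increment_offset` server variables for exactly this. The sketch below does not talk to MySQL. It only models the arithmetic, assuming two masters that share an increment of 2 and use offsets 1 and 2. The function name is illustrative.

```python
def auto_increment_ids(offset, increment, count):
    """Return the first `count` ids a master would hand out with these settings."""
    return [offset + i * increment for i in range(count)]


# Two masters, same increment, different offsets: the sequences never collide.
master_a = auto_increment_ids(offset=1, increment=2, count=4)
master_b = auto_increment_ids(offset=2, increment=2, count=4)
```

With these settings one master produces only odd ids and the other only even ids, so rows inserted on either side replicate without primary key conflicts.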
Segmenting your Data
● Vertical Partitioning
  ● Move less accessed columns, large data columns and columns not likely in the WHERE clause to other tables.
● Horizontal Partitioning
  ● Done by moving rows into different tables.
    – Based on Range, Date, User or Interlaced
    – May require duplicate lookup tables for different indexes.
Database Table Partitioning
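For horizontal partitioning the application needs a rule that maps a row to its table. The sketch below shows a range rule and an interlaced rule for a hypothetical `users` table. The table naming scheme, the partition sizes and the reading of "interlaced" as a modulo on the id are all assumptions made for illustration.

```python
def table_by_range(user_id, rows_per_table=1_000_000):
    """Range partitioning: ids 1..1,000,000 map to users_0, the next million to users_1."""
    return f"users_{(user_id - 1) // rows_per_table}"


def table_interlaced(user_id, table_count=4):
    """Interlaced partitioning: consecutive ids rotate across the tables."""
    return f"users_{user_id % table_count}"
```

Range partitions are easy to archive by age, while interlaced partitions spread new rows evenly but are harder to resize later.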
● Replication
  ● There may be a lag!
  ● All reports / read queries should go here
  ● Don't read here directly after a write
    – Transactions / Lag / etc.
● Sessions
  ● Never store sessions in the DB
    – Large binlogs, garbage collection causes slow queries, queue may fill up and cause a crash or max connections.
Database Servers: Keep in Mind
Cache Servers
(not full page)
“Caching is imperative in scaling and performance”
● Single Server
  – Shared Memory: APC / XCache / etc.
  – File Based: Files / SQLite / etc.
  – Not highly scalable, great for configuration files.
● Distributed
  – Memcached, Redis, etc.
  – Set up consistent hashing.
● Do not cache what cannot be recreated.
Cache Servers: What Type?
In The Beginning
● Single Caching Server
● Start to cache fetches, invalidate cache on write and write new cache, always reading from the cache.
Caching: Single Server
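The read and write path described above can be sketched as follows. Plain dicts stand in for the database and the cache server, and the `CachedUsers` class and the `user:<id>` key format are illustrative assumptions.

```python
class CachedUsers:
    """Read through the cache; on write, invalidate and store the new value."""

    def __init__(self, db, cache):
        self.db = db          # stand-in for the database (a dict)
        self.cache = cache    # stand-in for the cache server (a dict)

    def get(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]         # cache hit: the database is not touched
        value = self.db.get(user_id)       # cache miss: fetch from the database
        if value is not None:
            self.cache[key] = value
        return value

    def save(self, user_id, value):
        key = f"user:{user_id}"
        self.db[user_id] = value           # the database stays the source of truth
        self.cache.pop(key, None)          # invalidate the old entry
        self.cache[key] = value            # write the new cache entry
```

Because every write goes through `save`, readers always find a fresh value in the cache and the database only sees the misses.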
Distributed Mania
● Write based on consistent hashing (hash of a key that you are writing)
● Server depends on the hash.
● Hint – use the memcached pecl extension.
Caching: Going Distributed
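A minimal sketch of consistent hashing, to show how the hash of a key selects a server. The `HashRing` class, the SHA-1 hash, the 100 virtual points per server and the server names are assumptions made for illustration. This is not the algorithm used inside the memcached extension, which you should prefer in practice.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to servers so that adding or removing a server moves few keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas   # virtual points per server, for an even spread
        self._ring = []            # sorted list of (point, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _point(value):
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._point(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def server_for(self, key):
        # First ring point at or after the key's hash; wrap around at the end.
        index = bisect.bisect_left(self._ring, (self._point(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a server is removed, only the keys that lived on it move elsewhere. Keys on the remaining servers keep their placement, which is the property plain modulo hashing lacks.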
In the most simple form...
Caching: Read / Write with a Database
● Replicated or not...
● Elasticity
  ● Consistent hashing – cannot add or remove w/o losing data
● Sessions
  ● Store me here... please please please!
● Memory Caches
  ● Durability: if it fails, it's gone!
  ● Ensure dedicated memory!
  ● If you run out of memory, does it remove an old and add the new or not allow anything to come in?
Caching: Keep in Mind
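The out-of-memory question above comes down to the eviction policy. The sketch below shows one possible answer, least recently used (LRU) eviction, where the oldest untouched entry is removed to make room for the new one. The `LRUCache` class and its item-count limit are assumptions; real cache servers limit by memory and have their own policies, so check what yours does.

```python
from collections import OrderedDict


class LRUCache:
    """A bounded cache that evicts the least recently used entry when full."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used entry
```

The alternative policy, refusing new entries when full, keeps old data safe but silently turns every new write into a cache miss.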
Job Servers
(message queues)
“Message queues and mailboxes are software-engineering components used for interprocess communication, or for inter-thread communication within the same process. They use a queue for messaging – the passing of control or of content.”
http://en.wikipedia.org/wiki/Message_queue
Messages: They're Everywhere
● A FIFO buffer
● Asynchronous push / pull
● An application framework for sending and receiving messages.
● A way to communicate between applications / systems.
● A way to decouple components.
● A way to offload work.
Message Queues: What are They?
Single Job Server
● Lots of options and steps as we move forward.
[Diagram: Producer -> Message -> Queue Server -> Queue Receive -> Consumer]
Message Queues: The Basic Concept
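The producer, queue and consumer roles can be tried out inside a single process. In this sketch Python's `queue.Queue` stands in for the queue server and threads stand in for worker processes. The `run_jobs` function and the upper-casing "work" are placeholders, not part of any queue product.

```python
import queue
import threading


def run_jobs(jobs, worker_count=3):
    """Push jobs onto a FIFO queue and let worker threads consume them."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def consumer():
        while True:
            job = job_queue.get()
            if job is None:                  # sentinel: no more work for this worker
                job_queue.task_done()
                return
            with lock:
                results.append(job.upper())  # stand-in for the real work
            job_queue.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    for job in jobs:                         # producer side: enqueue and move on
        job_queue.put(job)
    for _ in workers:                        # one sentinel per worker
        job_queue.put(None)
    job_queue.join()                         # wait until every item was processed
    for worker in workers:
        worker.join()
    return results
```

The producer never waits for a job to finish, and adding capacity only means raising `worker_count`. With a real queue server the producers and consumers can also live on different machines.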
Distributed Mania
● Load balance a message queue for scale
● Can continue to create more workers
[Diagram: several Producers send to multiple load-balanced Queue Servers, which are read by many Consumers]
Message Queue: Going Distributed
● Asynchronous Processing
● Communication between Applications / Systems
● Image Resizing
● Video Processing
● Sending out Emails
● Auto-Scaling Virtual Instances
● Log Analysis
● The list goes on...
Message Queues: Useful for?
● Replication or not?
● You need to keep your workers running
  ● Supervisord or monit or some other monitoring...
● Don't offload things just to offload
  ● If it needs to be real time and not near real time this is not a good place for things – however, your boss does not need to know :)
Message Queues: Keep in Mind
DNS Servers
● Just about every domain registrar runs DNS
  ● If you don't need to, do not run your own.
● Anycast DNS
  ● Anycast is a network addressing and routing scheme whereby data is routed to the "nearest" or "best" destination as viewed by the routing topology.
  ● It's sexy, it's sweet and it is FAST!
DNS Servers: Are you running your own?
● Wildcard support
● Failover / Distributed
● CNAME support
● TXT support
● Name Server support
DNS Servers: Identifying a Good Service
CDN Servers
● A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, as opposed to all clients accessing the same central server, so as to avoid bottlenecks near that server.
● Content types include web objects, downloadable objects (media files, software, documents), applications, real time media streams, and other components of internet delivery (DNS, routes, and database queries).
CDN: What is a CDN?
http://en.wikipedia.org/wiki/Content_delivery_network
● Extremely fast at serving files
● Increased serving capacity
● Distributed nodes
● Frees up your server for the difficult stuff
CDN: Why Use One?
● Origin Pull
  ● Utilizes your own web server and pulls the content and stores it in their nodes.
● PoP Push
  ● You upload the content to something like S3 and it has a CDN on the top of it like CloudFront.
CDN: The Types
● Depends on your need...
  ● Origin Pull is great if you want to maintain all of the content in your web server.
  ● PoP Push is great for storing things like user-generated content.
CDN: What Should I Use?
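The difference between the two CDN types is who puts the content on the node. This sketch models one node that supports both. `EdgeNode`, the `fetch_from_origin` callback and the in-memory store are hypothetical stand-ins, not any CDN vendor's API.

```python
class EdgeNode:
    """A CDN node: pulls from the origin on a miss, or serves content pushed to it."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin   # callable: path -> content
        self._store = {}
        self.origin_hits = 0

    def get(self, path):
        if path not in self._store:
            # Origin pull: the first request fetches from your web server.
            self.origin_hits += 1
            self._store[path] = self.fetch_from_origin(path)
        return self._store[path]

    def push(self, path, content):
        # Push: content is uploaded ahead of time and the origin is never asked.
        self._store[path] = content
```

With pull your web server remains the single place where content lives. With push the content, for example a user upload, never has to sit on your web servers at all.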
Mail Servers
● Google Apps – just offload it!
  ● If you do not need to run a mail server, don't.
● SpamAssassin and ClamAV are resource hogs
  ● If you need it, put it on its own server.
Mail Servers: A Quick Note
FrontEnd Performance
● Tactics
  ● Minification (JavaScript / CSS)
    – PHP 5.3 library: Assetic
  ● CSS Sprites
    – Several online and offline tools
  ● GZIP
    – Configured in the web server
  ● Cookies slow down the client
  ● Parallel downloads (use subdomains)
  ● HTTP Expires
    – Configured in the web server
FrontEnd Performance: Points
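Three of the tactics above are easy to see in a few lines. GZIP and Expires are normally switched on in the web server configuration, so the sketch below only illustrates what the server does on your behalf. The function names and the asset hostnames are assumptions.

```python
import gzip
import zlib
from email.utils import formatdate


def gzip_body(body: bytes) -> bytes:
    """Compress a response body the way a web server would for gzip-capable clients."""
    return gzip.compress(body)


def expires_header(now_epoch: float, max_age_seconds: int) -> str:
    """Build an HTTP Expires value that lies max_age_seconds in the future."""
    return formatdate(now_epoch + max_age_seconds, usegmt=True)


def asset_host(path: str, hosts) -> str:
    """Pick a subdomain for an asset; the same path always maps to the same host."""
    # A stable hash keeps each URL on one hostname so browser caches stay valid.
    return hosts[zlib.crc32(path.encode("utf-8")) % len(hosts)]
```

For example, `asset_host("/css/site.css", ["static1.example.com", "static2.example.com"])` spreads assets over two hypothetical subdomains so the browser can download more files in parallel.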
● Tools for Identifying Areas
  ● YSlow
  ● Firebug
  ● Google Page Speed
  ● Google Webmaster Tools
  ● Pingdom
FrontEnd Performance: Tools
Questions?
Mike Willbanks
Blog: http://blog.digitalstruct.com
Twitter: mwillbanks
IRC: lubs on freenode
Joind.in: http://joind.in/2838