Scaling Web Applications - 2009

  • Scaling Web Applications

    Andrei Gheorghe, idevelop.ro

  • The most common case

    Shared hosting

  • Where problems appear

    Server processing power: CPU, RAM, etc.

    Bandwidth

    Storage capacity

    Database

  • Web Server + DB Server

  • Load Balancing

    Hardware

      Balancing happens at the packet transport level

      Expensive; knows nothing about the application's architecture

    DNS Load Distribution ("Round Robin")

      Statistical; distributes traffic evenly

      Knows nothing about server availability

      DNS caching can cause problems

      A solution only at very large scale

    Reverse Proxy
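To illustrate why round-robin distribution "knows nothing about server availability", here is a minimal sketch. The addresses and the `DOWN` set are hypothetical, and the rotation is a plain in-process stand-in for what a DNS server would do.

```python
from itertools import cycle

# Hypothetical backend addresses, for illustration only.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
DOWN = {"10.0.0.2"}  # the rotation has no way to know this server is down


def simulate(request_count):
    """Hand out servers in a fixed rotation and count requests sent to a dead one."""
    rotation = cycle(SERVERS)
    hits = {server: 0 for server in SERVERS}
    failed = 0
    for _ in range(request_count):
        server = next(rotation)
        hits[server] += 1
        if server in DOWN:
            failed += 1
    return hits, failed


hits, failed = simulate(300)
print(hits, failed)
```

Traffic is split evenly, but a third of the requests still go to the unavailable server, which is the gap a health-checking reverse proxy is meant to close.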

  • Reverse Proxy Load Balancing

    A single front end for multiple servers

    Security

    SSL request acceleration

    Caching

    nginx, squid, lighttpd

  • Relational Databases

    tables, columns, joins

  • MySQL Replication

  • MySQL Cluster

    Data node

      Not interacted with directly

    Management node

      Configures and monitors the cluster

    SQL node (mysqld process)

      A MySQL server that connects to the data nodes to request or store data

    Generally, each node will run on a separate host

  • MySQL Cluster

    Synchronous Replication

      Data is replicated across multiple nodes to keep it available if a data node disconnects

    Horizontal Data Partitioning

      Data is partitioned automatically across all data nodes, using an algorithm based on the primary key

    Hybrid Storage

      memory / disk

    Shared Nothing

      no single point of failure

  • Normalization

    Means bringing the database to a normal form

    Data is structured into related tables, and each piece of information appears only once

    Keeps the information consistent when the database is modified

  • Normalization / Denormalization

    USERS: user_id, user_name, user_password

    POSTS: post_id, post_author_id

    COMMENTS: c_id, c_post_id, c_text

  • Normalization / Denormalization

    USERS: user_id, user_name, user_password

    POSTS: post_id, post_author_id, post_author_name

    COMMENTS: c_id, c_post_id, c_text

  • Normalization / Denormalization

    USERS: user_id, user_name, user_password

    POSTS: post_id, post_author_id, post_author_name, post_comment_count

    COMMENTS: c_id, c_post_id, c_text
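The denormalized schema above trades extra write work for cheaper reads. A minimal sketch of that trade-off, using Python's built-in `sqlite3` as a stand-in for MySQL: the table and column names come from the slides, while the helper functions `add_post` and `add_comment` and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, user_password TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_author_id INTEGER,
                    post_author_name TEXT, post_comment_count INTEGER DEFAULT 0);
CREATE TABLE comments (c_id INTEGER PRIMARY KEY, c_post_id INTEGER, c_text TEXT);
""")


def add_post(post_id, author_id):
    """Copy the author's name into the post so reads need no JOIN on users."""
    (name,) = conn.execute(
        "SELECT user_name FROM users WHERE user_id = ?", (author_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO posts (post_id, post_author_id, post_author_name) VALUES (?, ?, ?)",
        (post_id, author_id, name),
    )


def add_comment(post_id, text):
    """Insert the comment and bump the redundant counter in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO comments (c_post_id, c_text) VALUES (?, ?)", (post_id, text)
        )
        conn.execute(
            "UPDATE posts SET post_comment_count = post_comment_count + 1 WHERE post_id = ?",
            (post_id,),
        )


conn.execute("INSERT INTO users VALUES (1, 'ana', 'hashed-password')")
add_post(10, 1)
add_comment(10, "first")
add_comment(10, "second")

# A single-table read now returns the author name and the comment count.
row = conn.execute(
    "SELECT post_author_name, post_comment_count FROM posts WHERE post_id = 10"
).fetchone()
print(row)
```

The cost is that every redundant copy must be kept in sync by the application, for example when a user changes their name.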

  • Key Value Databases

    Distributed, persistent hash tables

    "Eventual consistency"

    Allow SELECTs with conditions

    Require some denormalization of the data

      Inconsistencies must be handled manually and the correct data propagated

    MemcacheDB, CouchDB, Amazon SimpleDB, Hypertable, Google BigTable

  • Sharding

  • Vertical Sharding

    One server for users, one server for search, etc.

    JOINs between tables are done manually

      Denormalizing the DB reduces the need for JOINs

    USERS COMMENTS SEARCH
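A "manual JOIN" across vertically sharded servers could look roughly like the sketch below. The two servers are modelled as in-memory Python structures, and the `c_author_id` column is a hypothetical addition (the slides' COMMENTS table does not list one).

```python
# Stand-ins for two separate database servers.
USERS_DB = {1: {"user_name": "ana"}, 2: {"user_name": "bogdan"}}
COMMENTS_DB = [
    {"c_id": 1, "c_author_id": 1, "c_text": "Nice post"},
    {"c_id": 2, "c_author_id": 2, "c_text": "Thanks"},
    {"c_id": 3, "c_author_id": 1, "c_text": "Agreed"},
]


def comments_with_authors():
    """Join comments to users in application code, one query per server."""
    comments = list(COMMENTS_DB)                              # query 1: comments server
    author_ids = {c["c_author_id"] for c in comments}
    authors = {uid: USERS_DB[uid] for uid in author_ids}      # query 2: users server, batched
    return [
        {**c, "user_name": authors[c["c_author_id"]]["user_name"]}
        for c in comments
    ]


result = comments_with_authors()
print(result)
```

Batching the second lookup by the set of author IDs avoids one round trip per comment.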

  • Horizontal Sharding

    Splitting the records of a single table across multiple servers

    The partitioning algorithm is very important

      Depending on the algorithm chosen, rebalancing the data when the topology changes can be difficult

    A central dictionary can be used

      Transparent algorithm

      Easier to rebalance

      Can create a single point of failure

    USR #1

    USR #2

    USR #3
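The two approaches to horizontal sharding can be compared with a small sketch. It assumes a hash-modulo placement rule and a hypothetical `ShardDirectory` class; neither is taken from the slides, they only illustrate why rebalancing is hard with a fixed algorithm and easier with a central dictionary.

```python
import hashlib


def shard_for(user_id, shard_count):
    """Pick a shard from a stable hash of the primary key."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(user_ids, old_count, new_count):
    """Fraction of records whose shard changes when the shard count changes."""
    moved = sum(
        1 for u in user_ids if shard_for(u, old_count) != shard_for(u, new_count)
    )
    return moved / len(user_ids)


class ShardDirectory:
    """Central dictionary mapping each user to a shard.

    It is a single point of failure unless it is itself made redundant.
    """

    def __init__(self):
        self._shard_of = {}

    def assign(self, user_id, shard):
        self._shard_of[user_id] = shard

    def lookup(self, user_id):
        return self._shard_of[user_id]


user_ids = range(10_000)

# Going from 3 to 4 shards with hash-modulo relocates most records.
fraction = moved_fraction(user_ids, old_count=3, new_count=4)
print(f"{fraction:.0%} of users change shard")

# With a directory, a single user can be moved without touching anyone else.
directory = ShardDirectory()
for u in user_ids:
    directory.assign(u, shard_for(u, 3))
directory.assign(42, 3)
```

With a uniform hash, roughly three quarters of the records move when a fourth shard is added under the modulo rule, whereas the directory lets records be migrated one at a time.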

  • Advantages of sharding

    High availability

      If one server crashes, the application keeps working for the data on the remaining servers

    Faster queries

      Queries run against smaller chunks of data, so they execute faster

    Higher write throughput

      With no central server, writes execute in parallel and therefore faster

  • Cache

  • memcached

    memcached -d -u www -m 2048 -l 10.0.0.8 -p 11211

    Distributed hash table, kept in RAM

    set(key, value) get(key) delete(key)

    value is usually an entire serialized object

      E.g. article + comments + author information

    Client libraries for memcached exist for nearly every programming language, including PHP
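A common way to use the `set` / `get` / `delete` operations is to check the cache first and fall back to the database on a miss. The sketch below assumes an in-memory `FakeMemcached` stand-in rather than a real client library, and `load_article_from_db` is a hypothetical placeholder for the expensive queries.

```python
import json


class FakeMemcached:
    """In-memory stand-in exposing the same three operations as the slide."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeMemcached()
db_reads = 0


def load_article_from_db(article_id):
    """Hypothetical slow path: article + comments + author in one object."""
    global db_reads
    db_reads += 1
    return {
        "id": article_id,
        "title": "Scaling",
        "comments": ["first"],
        "author": {"name": "ana"},
    }


def get_article(article_id):
    """Return the article from cache, loading and caching it on a miss."""
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    article = load_article_from_db(article_id)
    cache.set(key, json.dumps(article))  # store the whole serialized object
    return article


first = get_article(7)
second = get_article(7)          # served from cache, no second DB read
reads_after_two_gets = db_reads

cache.delete("article:7")        # invalidate, e.g. after a new comment
third = get_article(7)
```

Deleting the key on every write to the underlying data is the simplest invalidation strategy; the next read repopulates the cache.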

  • memcached

    "Least Recently Used" eviction

    In a network with several servers, memcached instances can be pooled into a memcache cluster; the cache is distributed across the nodes (the client picks one node for each key), not replicated

    memcached runs on Linux and Windows; it can be started anywhere there is free RAM
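"Least Recently Used" means that when the cache is full, the entry that has gone longest without being read or written is discarded first. A simplified sketch of that policy follows; the `LRUCache` class is illustrative, and memcached's real memory management is more involved than this.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" is now more recent than "b"
cache.set("c", 3)   # cache is full, so "b" is evicted
```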

  • Session Clustering

  • Load Balancing Revisited

  • Session Clustering

    Store in common filesystem

      Not useful in multi-server environments

      NFS will cache pages

    Store in database

      Very fast because you are only ever looking up primary keys

      Make sure the DB has row locking (InnoDB), not table locking

    Store in memcached

      Stored across several machines rather than just one

      A total machine failure now affects only a percentage of users rather than everyone
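The claim that a machine failure "affects only a percentage of users" can be checked with a rough simulation. The node names and the hash-based placement below are assumptions for illustration; real memcached clients may use a different key distribution scheme.

```python
import hashlib

# Hypothetical pool of memcached nodes holding session data.
NODES = ["cache-a", "cache-b", "cache-c", "cache-d"]


def node_for(session_id):
    """Map a session ID to one node using a stable hash."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def affected_fraction(session_ids, failed_node):
    """Share of sessions stored on the node that failed."""
    lost = sum(1 for s in session_ids if node_for(s) == failed_node)
    return lost / len(session_ids)


sessions = [f"sess-{i}" for i in range(10_000)]
fraction = affected_fraction(sessions, "cache-b")
print(f"{fraction:.0%} of sessions lost")
```

With four nodes, about a quarter of the sessions live on any one machine, so losing it logs out roughly that share of users and leaves the rest untouched.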

  • Content Delivery Network

    A collection of web servers distributed across

    multiple locations to deliver static content more

    efficiently to users.

    The server selected for delivering content to a

    specific user is typically based on a measure of

    network proximity.

  • Multiple Codebases

    If the server and site architecture allow it, interesting things can be done by running different code

    Using a reverse proxy, 10% of visitors can be sent to a 2.0 beta version of the site to observe how they interact with it

    If things don't go as planned, revert to the original code; only 10% of visitors were affected
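One way a proxy layer might pick the 10% is to hash a stable visitor identifier into a bucket, so the same visitor always sees the same version. The sketch below is an assumption about how this could be done, not the method used in the talk; `codebase_for` and the visitor ID format are hypothetical.

```python
import hashlib

BETA_PERCENT = 10  # share of visitors routed to the beta codebase


def codebase_for(visitor_id, beta_percent=BETA_PERCENT):
    """Deterministically assign a visitor to 'beta' or 'stable'."""
    digest = hashlib.sha256(visitor_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "beta" if bucket < beta_percent else "stable"


visitors = [f"visitor-{i}" for i in range(10_000)]
beta_share = sum(codebase_for(v) == "beta" for v in visitors) / len(visitors)
print(f"{beta_share:.1%} of visitors see the beta")

# Rolling back is a configuration change: set the percentage to zero.
after_rollback = {codebase_for(v, beta_percent=0) for v in visitors}
```

Hashing instead of picking at random per request keeps each visitor's experience consistent across page loads.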

  • Case studies: highscalability.com

  • LAMP

    Shards

    Memcached

    Squid

    Smarty

    Imagemagick

  • More than 4 billion queries per day

    ~35M photos in squid cache (total)

    ~2M photos in squid's RAM

    ~470M photos, 4 or 5 sizes of each

    38k req/sec to memcached (12M objects)

    2 PB raw storage (consumed about 1.5 TB on Sunday)

    Over 400,000 photos being added every day

  • Debian Linux, Apache, PHP, MySQL

    memcached

    MemcacheDB - distributed key-value storage

    system which conforms to memcache protocol

    15,000 writes/second, 64,000 reads/second

    Lots of servers

  • 26 million uniques a month

    30 million users.

    Uniques are only half that traffic. Traffic =

    unique web visitors + APIs + Digg buttons.

    2 billion requests a month

    13,000 requests a second, peak at 27,000

    requests a second.

  • Data are separated into separate clusters: User

    Actions, Users, Comments, Items, etc.

    Asynchronous queuing architecture for near-term processing

  • Amazon Web Services

  • Simple Storage Service (S3)

    Cloud storage service

    Servers in the US / Europe

    REST API

    Storage: $0.150 / GB

    Upload: $0.100 / GB

    Download: $0.170 / GB

    Twitter uses S3 for user pictures
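The per-GB prices above can be combined into a rough cost estimate. The rates are the 2009 figures from the slide; treating the storage price as a monthly charge and the traffic volumes in the example are assumptions.

```python
# 2009 prices from the slide, in USD per GB.
STORAGE_PER_GB = 0.150   # assumed to be per GB stored per month
UPLOAD_PER_GB = 0.100
DOWNLOAD_PER_GB = 0.170


def s3_monthly_cost(stored_gb, uploaded_gb, downloaded_gb):
    """Estimate one month's bill from storage and transfer volumes."""
    return (
        stored_gb * STORAGE_PER_GB
        + uploaded_gb * UPLOAD_PER_GB
        + downloaded_gb * DOWNLOAD_PER_GB
    )


# Hypothetical site: 100 GB stored, 10 GB uploaded, 500 GB downloaded.
cost = s3_monthly_cost(stored_gb=100, uploaded_gb=10, downloaded_gb=500)
print(f"${cost:.2f}")
```

For a read-heavy site, the download charge dominates the bill, which is one reason to put a cache or CDN in front of the storage.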

  • Elastic Compute Cloud (EC2)

    On-demand server instances

    In 5 minutes you can start a server with root access

    $0.10 / hour, 99.95% guaranteed uptime

    About 4.4 hours of downtime per year

    Static IP addresses can be allocated and complex architectures can be built

    Fast access to S3
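The downtime figure follows directly from the uptime guarantee. A short worked calculation, assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def allowed_downtime_hours(uptime_percent):
    """Hours per year a service may be down while still meeting its uptime figure."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


downtime = allowed_downtime_hours(99.95)
print(f"{downtime:.2f} hours per year")
```

At 99.95% uptime this comes to 8760 × 0.0005 = 4.38 hours per year.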

  • SimpleDB

    Distributed hash DB

    Allows SELECTs with conditions

    Queries limited to 5 seconds

  • thank you, come again