Building Scalable Web Architectures

  • Building Scalable Web Architectures
    Aaron Bannert
    aaron@apache.org / aaron@codemass.com

  • Goal

    To build a reliable, scalable, cheap, flexible, extendable internet application.

  • The Age of LAMP
    What does a LAMP architecture give us?

  • Scalability
    Grows in small steps
    Stays up when it counts
    Can grow with your traffic
    Room for the future

  • Reliability
    High Quality of Service
    Minimal Downtime
    Stability
    Redundancy
    Resilience

  • Low Cost
    Little or no software licensing costs
    Minimal hardware requirements
    Abundance of talent
    Reduced maintenance costs

  • Flexible
    Modular Components
    Public APIs
    Open Architecture
    Vendor Neutral
    Many options at all levels

  • Extendable
    Free/Open Source Licensing
      Right to Use
      Right to Inspect
      Right to Improve
    Plugins
      Some Free
      Some Commercial
    Can always customize

  • Free as in Beer?

    Price, Speed, Quality

    Pick any two.

  • LAMP-like Architectures

  • The Big Picture

  • External Caching Tier

  • Web Serving Tier

  • Application Server Tier

  • Internal Cache Tier

  • Database Tier

  • Misc. Services (DNS, Mail, etc)

  • The Glue
    Routers
    Switches
    Firewalls
    Load Balancers

  • Software Choices
    Building LAMP Software

  • External Caching Tier

  • External Caching Tier
    What is this?
      Squid
      Apache's mod_proxy
      Commercial HTTP Accelerator

  • External Caching Tier
    What does it do?
      Caches outbound HTTP objects
        Images, CSS, XML, HTML, etc.
      Flushes Connections
        Useful for modem users, frees up web tier
      Denial of Service Defense

  • External Caching Tier
    Hardware Requirements
      Lots of Memory
      Moderate to little CPU
      Fast Network
      Moderate Disk Capacity
        Room for cache, logs, etc. (disks are cheap)
        One slow disk is OK

    Two Cheapies > One Expensive

  • External Caching Tier
    Other Questions
      What to cache?
      How much to cache?
      Where to cache (internal vs. external)?
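
As a rough illustration of the "What to cache?" question, here is a hypothetical, much-simplified cacheability policy (it is not how Squid or mod_proxy actually decide): store only successful GET responses with static-looking content types, and skip anything marked no-store or private or carrying a Set-Cookie header. The content-type list is an assumption chosen to match the slide's examples.

```python
from typing import Dict

# Hypothetical set of content types worth caching at the edge (an assumption).
CACHEABLE_TYPES = ("image/", "text/css", "text/html", "text/xml", "application/xml")


def is_cacheable(method: str, status: int, headers: Dict[str, str]) -> bool:
    """Return True if a response looks safe to keep in a shared cache."""
    if method.upper() != "GET" or status != 200:
        return False
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.lower() for name, value in headers.items()}
    directives = {
        part.strip().split("=", 1)[0]
        for part in h.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"no-store", "private"}:
        return False
    if "set-cookie" in h:  # per-user responses must not be shared between users
        return False
    return h.get("content-type", "").startswith(CACHEABLE_TYPES)
```

This only covers the store/don't-store decision; a real cache also has to work out freshness (max-age, Expires) and revalidation.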

  • Web Serving Tier

  • Web Serving Tier
    What is this?
      Apache
      thttpd
      Tux Web Server
      IIS
      Netscape

  • Web Serving Tier
    What does it do?
      HTTP, HTTPS
      Serves Static Content from disk
      Generates Dynamic Content
        CGI/PHP/Python/mod_perl/etc.
      Dispatches requests to the App Server Tier
        Tomcat, WebLogic, WebSphere, JRun, etc.

  • Web Serving Tier
    Hardware Requirements
      Lots and lots of Memory
        Memory is the main bottleneck in web serving
        Memory determines the max number of users
      Fast Network
      CPU depends on usage
        Dynamic content needs CPU
        Static file serving requires very little CPU
      Cheap slow disk, enough to hold your content
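
The point that memory determines the maximum number of users can be made concrete with a common back-of-the-envelope estimate for process-per-connection servers (for example Apache's prefork model): divide the memory available to the web server by the resident size of one worker process. The function name and the figures in the example are hypothetical; measure your own per-process size under real load.

```python
def max_workers(total_ram_mb: int, reserved_mb: int, per_worker_mb: int) -> int:
    """Estimate how many worker processes fit in RAM without swapping.

    reserved_mb: memory kept for the OS, file cache and other daemons.
    per_worker_mb: measured resident size of one web server process.
    Each worker serves one connection at a time, so this also bounds
    the number of simultaneous users.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    return max(0, usable_mb // per_worker_mb)


# Hypothetical box: 4 GB of RAM, 512 MB reserved, 20 MB per worker.
print(max_workers(4096, 512, 20))  # 179
```

Setting the server's process limit above this estimate typically pushes the machine into swap, which degrades every request rather than just the excess ones.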

  • Web Serving Tier: Zero-copy
    Performance Hint
      Dedicated static content servers
      Modern web servers are very good at serving static content such as
        HTML
        CSS
        Images
        Zip/GZ/Tar files
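
Zero-copy serving means the kernel moves file bytes straight to the network socket instead of copying them through the server process. On Linux and other Unix systems this is the sendfile(2) system call, which Python exposes as os.sendfile (Apache uses it internally when EnableSendfile is on). The helper below is a hypothetical minimal sketch, Unix-only, assuming an already-connected blocking socket.

```python
import os
import socket


def send_static_file(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected socket using sendfile(2).

    Returns the number of bytes sent. Unix-only: os.sendfile is not
    available on Windows.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel copies from the file to the socket directly;
            # the data never passes through this process's buffers.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # nothing more could be sent (EOF)
                break
            sent += n
    return sent
```

A dedicated static server spends most of its time in a loop like this, which is why it needs so little CPU compared with a dynamic-content server.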

  • Web Serving Tier
    Performance Hint
      Stateless Sessions
        Each connection is a fresh start
        Server remembers nothing
      Benefits?
        Allows Better Caching
        Scales Horizontally
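
One common way to keep web servers stateless is to let the client carry the session data in a cookie signed with an HMAC, so any server holding the shared secret can verify it without a server-side session store. This is one technique among several (a shared session cache is another), and the secret and function names below are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret; every web server needs the same value so any of
# them can verify a token issued by another.
SECRET = b"replace-with-a-long-random-secret"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def encode_session(data: dict) -> str:
    """Serialize session data into a tamper-evident (but NOT encrypted) token."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode("utf-8"))
    return payload.decode("ascii") + "." + _sign(payload)


def decode_session(token: str) -> Optional[dict]:
    """Return the session data if the signature is valid, otherwise None."""
    payload, sep, signature = token.rpartition(".")
    if not sep:
        return None
    expected = _sign(payload.encode("utf-8"))
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

The token is readable by the client, so it should hold only non-secret data; in practice you would also embed an expiry time and check it on decode.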

  • Web Serving Tier
    Choices
      How much dynamic content?
      When to offload dynamic processing?
      When to offload database operations?
      When to add more web servers?

  • Application Server Tier

  • Application Server Tier
    What does it do?
      Dynamic Page Processing
        JSP
        Servlets
        Standalone mod_perl/PHP/Python engines
      Internal Services
        e.g. Search, Shopping Cart, Credit Card Processing

  • Application Server Tier
    How does it work?
      Web Tier generates the request using
        HTTP (aka REST, sort of)
        RPC/CORBA
        Java RMI
        XML-RPC/SOAP
        (or something homebrewed)
      App Server processes request and responds

  • Application Server Tier
    Caveats
      Decoupling of services is GOOD
        Manage Complexity using well-defined APIs
        Don't decouple for scaling, change your algorithms!
      Remote Calling overhead can be expensive
        Marshaling of data
        Sockets, net latency, throughput constraints
        XML, SOAP, XML-RPC, yuck (don't scale well)
        Better to use Java's RMI, good old RPC or even CORBA

  • Application Server Tier
    More Caveats
      Remote Calling can introduce new failure scenarios
      Classic Distributed Problems
        How to detect remote failures?
        How long to wait until deciding it has failed?
        How to react to remote failures?
        What do we do when all app servers have failed?
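
The questions above map onto a very small amount of client code: a timeout decides how long to wait, trying the next server decides how to react, and an explicit error covers the case where every app server has failed. The sketch below assumes a hypothetical `call(server, timeout)` function standing in for whatever RPC mechanism is used; it is an illustration, not a complete failure-handling strategy.

```python
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllServersFailed(Exception):
    """Raised when no app server in the list could handle the request."""


def call_with_failover(servers: Sequence[str],
                       call: Callable[[str, float], T],
                       timeout: float = 2.0) -> T:
    """Try each app server in order until one answers within `timeout` seconds.

    `call(server, timeout)` is a hypothetical stand-in for the RPC layer; it is
    expected to raise OSError (which includes TimeoutError and ConnectionError)
    when the server is slow or unreachable.
    """
    failures: Dict[str, OSError] = {}
    for server in servers:
        try:
            return call(server, timeout)
        except OSError as exc:
            failures[server] = exc  # remember why this server was skipped
    # Every server failed: surface that explicitly instead of hanging.
    raise AllServersFailed(failures)
```

Raising a dedicated exception forces the web tier to choose a fallback (an error page, a cached or degraded response) rather than silently waiting on dead servers.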

  • Application Server Tier
    Hardware Requirements
      Lots and Lots and Lots of Memory
        App Servers are very memory hungry
        Java was hungry to begin with
        Consider going to 64-bit for a larger memory space
      Disk depends on application, typically minimal needed
      FAST CPU required, and lots of them
      (This will be an expensive machine.)

  • Database Tier

  • Database Tier
    Available DB Products
      Free/Open Source DBs
        PostgreSQL
        GNU DBM
        Ingres
        SQLite

      Commercial
        Oracle
        MS SQL
        IBM DB2
        Sybase
        SleepyCat
        MySQL
        mSQL
        Berkeley DB

  • Database Tier
    What does it do?
      Data Storage and Retrieval
      Data Aggregation and Computation
        Sorting
        Filtering
      ACID properties
        (Atomic, Consistent, Isolated, Durable)

  • Database Tier
    Choices
      How much logic to place inside the DB?
      Use Connection Pooling?
      Data Partitioning?
        Spreading a dataset across multiple logical database slices in order to achieve better performance.
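
A minimal sketch of data partitioning, assuming hash-based placement (one of several schemes; range- and directory-based partitioning are common alternatives): every tier computes the same stable hash of a key and uses it to pick a slice. The slice names are hypothetical.

```python
import zlib
from typing import Sequence

# Hypothetical names for four logical database slices.
SLICES = ["db-slice-0", "db-slice-1", "db-slice-2", "db-slice-3"]


def slice_for(key: str, slices: Sequence[str] = SLICES) -> str:
    """Pick the database slice that owns `key` (e.g. a user ID).

    zlib.crc32 is used instead of Python's built-in hash(), because hash() of a
    str is randomized per process and would route the same key to different
    slices on different web or app servers.
    """
    return slices[zlib.crc32(key.encode("utf-8")) % len(slices)]
```

With plain modulo placement, changing the number of slices moves most keys to a different slice; a lookup table or consistent hashing limits how much data has to migrate when you add capacity.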

  • Database Tier
    Hardware Requirements
      Entirely dependent upon application.
      Likely to be your most expensive machine(s).

    Tons of Memory
    Spindles galore
      RAID is useful (in software or hardware)
        Reliability usually trumps Speed
        RAID levels 0, 5, 1+0, and 5+0 are useful
    CPU also important
    Dual power supplies
    Dual Network
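
To compare the RAID levels listed above, a small calculator of usable capacity helps. The formulas are the standard ones for these levels (ignoring hot spares and controller metadata); the function name and the 6 x 300 GB array in the example are hypothetical.

```python
def usable_capacity_gb(level: str, disks: int, disk_gb: int, groups: int = 2) -> int:
    """Usable capacity for common RAID levels.

    level: one of "0", "1+0", "5", "5+0".
    groups: number of RAID 5 sets striped together (RAID 5+0 only).
    """
    if level == "0":  # striping only: full capacity, no redundancy
        return disks * disk_gb
    if level == "1+0":  # striped mirrors: half the raw capacity
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return disks // 2 * disk_gb
    if level == "5":  # one disk's worth of parity
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb
    if level == "5+0":  # one disk's worth of parity per RAID 5 set
        if groups < 2 or disks % groups or disks // groups < 3:
            raise ValueError("RAID 5+0 needs 2+ equal RAID 5 sets of 3+ disks each")
        return (disks - groups) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("0", "1+0", "5", "5+0"):
    print(level, usable_capacity_gb(level, 6, 300))
# 0 1800
# 1+0 900
# 5 1500
# 5+0 1200
```

Capacity is only half the trade-off: RAID 0 survives no disk failures, RAID 5 survives one, 1+0 survives one per mirror pair, and 5+0 survives one per RAID 5 set.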

  • Internal Cache Tier

  • Internal Cache Tier
    What is this?
      Object Cache
    What Applications?
      Memcache
      Local Lookup Tables
        BDB, GDBM, SQL-based
      Application-local Caching (e.g. LRU tables)
      Homebrew Caching (disk or memory)

  • Internal Cache Tier
    What does it do?
      Caches objects closer to the Application or Web Tiers
      Tuned for your application
      Very Fast Access
      Scales Horizontally
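
Application-local caching with an LRU table, mentioned above, can be sketched in a few lines on top of an ordered dictionary: reads move an entry to the "recent" end, and inserts past capacity evict from the "old" end. The class name and capacity are illustrative.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Application-local object cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```

This cache is private to one process and not thread-safe; for memoizing pure functions Python's built-in functools.lru_cache does the same job, and sharing cached objects across machines is what a networked cache such as memcached is for.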

  • Internal Cache Tier
    Hardware Requirements
      Lots of Memory
        Note that 32-bit processes are typically limited to roughly 2 to 3 GB of address space
      Little or no disk
      Moderate to low CPU
      Fast Network

  • Misc. Services (DNS, Mail, etc)

  • Misc. Services (DNS, Mail, etc)
    Why mention these?
      Every LAMP system has them
      Crucial but often overlooked
      Source of hidden problems

  • Misc. Services: DNS
    Important Points
      Always have an offsite NS slave
      Always have an onsite NS slave
      Minimize network latency
      Don't use NAT, load balancers, etc.

  • Misc. Services: Time Synchronization
    Synchronize the clocks on your systems!
    Hints:
      Use ntpdate at boot time to set the clock
      Use ntpd to stay in sync
      Don't ever change the clock on a running system!

  • Misc. Services: Monitoring
    System Health Monitoring
      Nagios
      Big Brother
      Orcallator
      Ganglia
    Fault Notification
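
Nagios checks are small programs that follow a simple convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The disk-usage check below is a hypothetical example of that convention; the thresholds and function names are assumptions, not a standard plugin.

```python
import shutil
import sys
from typing import Tuple

# Exit codes defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path: str = "/", warn_pct: float = 80.0,
               crit_pct: float = 90.0) -> Tuple[int, str]:
    """Return (exit_code, status_line) for disk usage on `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    detail = f"{used_pct:.1f}% used on {path}"
    if used_pct >= crit_pct:
        return CRITICAL, "DISK CRITICAL - " + detail
    if used_pct >= warn_pct:
        return WARNING, "DISK WARNING - " + detail
    return OK, "DISK OK - " + detail


def main() -> None:
    code, line = check_disk()
    print(line)     # Nagios shows this line in its status display
    sys.exit(code)  # and uses the exit code to choose the service state
```

Calling main() from the script's entry point turns this into a plugin Nagios can schedule; the same exit-code contract is what feeds fault notification.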

  • The Glue
    Routers
    Switches
    Firewalls
    Load Balancers

  • Routers and Switches
    Expensive
    Complex
    Crucial Piece of the System

    Hints
      Use GigE if you can
      Jumbo Frames are GOOD
      VLANs to manage complexity
      LACP (802.3ad) for failover/redundancy

  • Load Balancers
    Hardware vs. Software
      Software is complex to set up, but cheaper
      Hardware is expensive, but dedicated
      IMHO: Use SW at first, graduate to HW

  • Load Balancers
    What services to balance?
      HTTP Caches and Servers, App Servers, DB Slaves
    What NOT to balance?
      DNS
      LDAP
      NIS
      Memcache
      Spread
      Anything with its own built-in balancing
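
At its core, a software load balancer picks the next healthy backend for each request. The sketch below shows plain round-robin with backends that can be marked down and up; the backend names are hypothetical, and the health checker that would call mark_down/mark_up is assumed rather than shown.

```python
import itertools
from typing import Iterable, Set


class RoundRobinBalancer:
    """Software load balancer sketch: rotate through healthy backends."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends = list(backends)
        if not self.backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(self.backends)
        self.down: Set[str] = set()

    def mark_down(self, backend: str) -> None:
        self.down.add(backend)  # e.g. called when a health check fails

    def mark_up(self, backend: str) -> None:
        self.down.discard(backend)

    def next_backend(self) -> str:
        # Look at each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backends")
```

Real balancers add weighting, connection counting and session affinity on top of this; services with their own built-in balancing, as listed above, should bypass it entirely.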

  • Message Buses
    What is out there?
      Spread
      JMS
      MQSeries
      Tibco Rendezvous

    What does it do?
      Various forms of distributed message delivery.
      Guaranteed Delivery, Broadcasting, etc.
      Useful for heterogeneous distributed systems
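
Products such as Spread or JMS deliver messages between machines with ordering and delivery guarantees. The toy below only illustrates the publish/subscribe broadcast pattern they provide, inside a single process; the class and the topic name in the example are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[str, Any], None]


class MessageBus:
    """In-process sketch of topic-based broadcast (no network, no persistence)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver `message` to every subscriber of `topic`; return how many got it."""
        handlers = list(self._subscribers.get(topic, ()))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


# Hypothetical use: broadcast a cache invalidation to every interested component.
bus = MessageBus()
bus.subscribe("cache.invalidate", lambda topic, key: print("app1 drops", key))
bus.subscribe("cache.invalidate", lambda topic, key: print("app2 drops", key))
bus.publish("cache.invalidate", "user:42")
```

What the real products add, and what makes them worth their complexity, is delivery across heterogeneous machines, retries and guaranteed delivery.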

  • What about the OS?
    Operating System Selection

  • Lots of OS choices
    Linux
    FreeBSD
    NetBSD
    OpenBSD
    OpenSolaris
    Commercial Unix

  • What's Important?
    Maintainability
      Upgrade Path
      Security Updates
      Bug Fixes
    Usability
      Do your engineers like it?
    Cost
      Hardware Requirements
      (you don't need a commercial Unix anymore)

  • Features to look for
    Multi-processor Support
    64-bit Capable
    Mature Thread Support
    Vibrant User Community
    Support for your devices

  • Hardware Choices
    Building LAMP Hardware

  • Commodity Hardware Discussion
    Consistency vs. Specialization
      Consistency reduces maintenance costs
        Less Burn-in testing
        Fewer drivers to support
        Fewer OS variants
        Fewer types of security updates, upgrades
      In Short: Don't throw hardware at the problem.
      However, specialization may improve ROI
        Put the money where it is most needed

  • Commodity Hardware Discussion
    What I do when planning for growth:
      Specialize in the beginning
        When cost is more important
        And designs aren't yet mature
      Design for horizontal scalability
        Plan on machine-sized pieces
        Want to grow by just adding more boxes
      Eventually settle on two or three machine types

  • In-House vs. Colocation
    Almost no reason to stay in-house these days

    Colos keep getting cheaperLeased lines are still expensive

  • Beige-Box vs. Name Brand
    Determine your requirements ahead of time
      Talk to your engineers first
      How important is a support plan?
      Hardware will break, plan on it
    Name Brand usually has fewer options
      Works well if they have exactly what you need
    Seek a neutral technical advisor
    In the end it should come down to cost

  • Disk Drive Technologies
    SCSI
      Expensive
      Big (300GB)
      Fast
      Reliable

    IDE
      Cheap
      Huge (500GB!)
      Slow
      On-board support, often w/ RAID 0/1
    Use SCSI for Performance
    Use IDE for cluster nodes
    IDE w/ RAID for cheap speed

  • Disk Drive Technologies
    PATA
      Tried and Tested
      Obsolete

    SATA
      Immature drivers
        Particularly w/ OSS
        Linux has poor support
      Prices coming down
      Unnecessary add-ons
        Hot Swap not often needed, costs more
    SATA is not SCSI
      The fast SATAs cost as much as SCSI
      SATA not quite there for servers

  • Disk Drive Technology: Spindles
    Number of Spindles
      More spindles can give
        Higher Throughput
        Higher Concurrency
          Concurrency is crucial for Databases
        Reliability
          Failover drives, mirrors

  • Memory Technologies
    ECC
      Expensive