Planning For High Performance Web Application

DESCRIPTION

These slides were prepared for Beijing Open Party (a monthly unconference in Beijing, China). They cover some important points to consider when building scalable web sites. A few of the original slides are in Chinese.

Page 1: Planning For High Performance Web Application

Planning for performance

For web developers, open discussion

Tin @ Beijing Open Party

Page 2: Planning For High Performance Web Application

The teacher need not be better than me, and I need not be inferior to the teacher.

Page 3: Planning For High Performance Web Application

Agenda

Basic programming practice

Hardware platform

Software platform

System essentials

Optimizations

Load balancing

Page 4: Planning For High Performance Web Application

Basic practices

Use proper SCM

CVS

SVN

Mercurial

Git

Page 5: Planning For High Performance Web Application

Basic practices

Use an automated build system

Shell scripts

Make

Ant, NAnt

Rake

Page 6: Planning For High Performance Web Application

Basic practices

Use a Continuous Integration tool

So first you need a lot of tests

Add automated test and compile jobs as daily tasks

Use CI tools to monitor the health of your code base

CruiseControl, Luntbuild, Continuum, Hudson

Cruise, TeamCity, Bamboo

Use the CCTray or CCMenu desktop widgets

Page 7: Planning For High Performance Web Application

Basic practices

Use an issue tracker

Trac (only SVN)

Bugzilla, Mantis Bug Tracker

Jira

Mingle

BugFree

Page 8: Planning For High Performance Web Application

Voices from Twitter

You must test! You must test early! You must test early! Otherwise you are doomed.

Test every part.

Leave performance testing to your users; only then is it meaningful. So make sure your logging is good.

Page 9: Planning For High Performance Web Application

Basic practices

Lifecycle control

Develop -> Test -> Deploy

Release management

Trunk, branch, tag

Milestone, release candidate

Page 10: Planning For High Performance Web Application

Basic practices

Use Agile methodologies

XP practices

TDD

Pair programming

Scrum

Hybrid agile

Page 11: Planning For High Performance Web Application

Hardware platform

Use economical hardware

CPU and memory

Disk and disk I/O (RAID)

NIC

Power and fan

1U, 2U, 3U, 4U?

Page 12: Planning For High Performance Web Application

Hardware platform

Brand

Dell, IBM, HP, Lenovo, Asus?

Service quality

Hardware redundancy

Part redundancy

Availability and lead time (critical parts)

Capacity redundancy

Future plan?

Page 13: Planning For High Performance Web Application

Network & hosting

VPS, virtual hosting

Co-located hardware (colo), server colocation

Bandwidth, dual lines, air conditioning

Geo-location

Self-hosting

How to choose network hardware (switch/router)?

Cisco, Huawei, Foundry

Page 14: Planning For High Performance Web Application

Software platform

Use pre-compiled OS and software

Choose an OS

CentOS, Red Hat, SUSE

FreeBSD

Solaris

No Ubuntu Server (from Nicholas Ding)

Page 15: Planning For High Performance Web Application

Software platform

Choose a language (a scripting language is better)

PHP

Python

Perl

Ruby

Java

Many, many more... but not C...

Page 16: Planning For High Performance Web Application

Software platform

Choose a database (or data provider)

MySQL

PostgreSQL

A Bigtable implementation?

Page 17: Planning For High Performance Web Application

Now, let’s go

Page 18: Planning For High Performance Web Application

System essentials

Web server

Apache

Lighttpd

Nginx

Tux, Cherokee, LiteSpeed

Tomcat, Jetty

Mongrel, Thin

Page 19: Planning For High Performance Web Application

System essentials

Different deployment styles (Python/Ruby)

Apache + mod_python (mod_rails, Passenger)

FastCGI, SCGI, CGI

Proxy (load balancing) + multiple server instances

Threads? Processes?

Page 20: Planning For High Performance Web Application

System essentials

Monitoring your system

Web server logs

Webalizer, Report Magic

Beacon (separate static-file-server tracker)

Error log analysis

AWStats & Google Analytics

Page 21: Planning For High Performance Web Application

System essentials

Monitoring your system

Monit (RubyWorks uses runit)

Monitors process status

Automatically restarts your important processes

Better than cron for monitoring

Munin & Nagios

Distributed monitoring of all your systems

The administrator’s eyes, the developer’s friends
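
The "monitor and auto-restart" idea behind tools like Monit can be sketched in a few lines. This is only an illustration of the concept, not how Monit itself is implemented; the `supervise` function and its parameters are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.1):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Returns the number of restarts performed. Stops when the process
    exits cleanly or the restart budget is used up.
    """
    restarts = 0
    proc = subprocess.Popen(cmd)
    while True:
        exit_code = proc.wait()  # block until the child exits
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay)  # small pause so a crash loop does not spin
        proc = subprocess.Popen(cmd)


if __name__ == "__main__":
    # A child that always crashes, to show the restart budget in action.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print("restarts:", supervise(crashing, max_restarts=2))
```

A real supervisor also checks memory, CPU, and open ports, which is why a dedicated tool is preferable to a cron job that only runs at fixed intervals.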

Page 22: Planning For High Performance Web Application

System essentials

Munin & Nagios, continued

Munin has a server and nodes; it generates web pages that report your servers' statistics (at intervals)

Munin and Nagios can be integrated

Memory usage, CPU, processes, disk usage

Services: HTTP, SMTP, POP3, NNTP, ping

Hardware temperature and other data

Network statistics

Custom scripts (plugins): database-related metrics, number of users
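
A custom Munin plugin is an executable that prints graph metadata when called with the argument `config` and prints `field.value` lines otherwise. The sketch below follows that convention for a "number of users" graph; `count_active_users` is a hypothetical stand-in that returns a fixed number, and a real plugin would query the application database instead.

```python
import sys


def count_active_users():
    """Hypothetical stand-in; a real plugin would query the database."""
    return 42


def render(argv):
    """Return the text a Munin node expects from this plugin."""
    if len(argv) > 1 and argv[1] == "config":
        # Graph metadata, requested once by the Munin node.
        return "\n".join([
            "graph_title Active users",
            "graph_vlabel users",
            "graph_category app",
            "users.label active users",
        ])
    # Regular poll: one "<field>.value <number>" line per data series.
    return f"users.value {count_active_users()}"


if __name__ == "__main__":
    print(render(sys.argv))
```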

Page 23: Planning For High Performance Web Application

System essentials

Protect your system (management is more important than tools)

SSH brute-force attack protection

SSH key login

blockhost (scripts + pf/iptables)

Audit: SELinux...

Firewall (port blocking and audit)

Use a safe OS? (NetBSD, FreeBSD)

Network safety (but no hardware firewall for websites)

Page 24: Planning For High Performance Web Application

System essentials

SNA (Shared Nothing Architecture) (this is a relative term)

All static files and rsync

Database-centric SNA

Memcached + database persistence

Server hash, cluster, partition

Amazon/Blogger/Craigslist/Facebook/Google/LiveJournal/Slashdot/Wikipedia/Yahoo/YouTube

Sticky sessions
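
One common way to realize the "server hash" idea is a consistent-hash ring, which keeps most keys on the same server when a node is added or removed. The sketch below is an assumption about what such a hash could look like, not the scheme used by any of the sites listed; the server names and the `HashRing` class are made up for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to server names (illustrative)."""

    def __init__(self, servers, replicas=100):
        # Each server gets `replicas` virtual points to even out the load.
        ring = []
        for server in servers:
            for i in range(replicas):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # First virtual point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.server_for("session:42"))
```

With a plain `hash(key) % server_count` scheme, removing one server remaps almost every key; on the ring, only the keys that lived on the removed server move.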

Page 25: Planning For High Performance Web Application

System essentials

Make your modules independent

Layers, packages

Easy to replace a module

Easy to deploy

Easy to profile and improve

Page 26: Planning For High Performance Web Application

Optimizations

Split your static content and dynamic content servers

Use a lightweight web server to serve static content

Use different domains for different servers

Caching

Memcached

Query results, domain objects, sessions

Page tiles, template tiles

Everything that you need
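
Caching query results usually follows the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses a small in-process `TTLCache` class as a stand-in for Memcached so that it runs without a server; `cached` and `load_user` are hypothetical names for this example.

```python
import time


class TTLCache:
    """In-process stand-in for Memcached with per-key expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


def cached(cache, key, loader, ttl=60):
    """Cache-aside lookup: return the cached value or load and store it."""
    value = cache.get(key)
    if value is None:
        value = loader()  # miss: hit the slow data source
        cache.set(key, value, ttl)
    return value


calls = []


def load_user():
    calls.append(1)  # count how often the "database" is queried
    return {"id": 7, "name": "Tin"}


cache = TTLCache()
first = cached(cache, "user:7", load_user)
second = cached(cache, "user:7", load_user)
print(first, "database calls:", len(calls))
```

Note that this sketch treats a stored `None` as a miss, so loaders that can legitimately return `None` need a sentinel value.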

Page 27: Planning For High Performance Web Application

Optimizations

Caching

Optimize your code (lazy evaluation, cached results)

Cache and asynchronous update (cron update)

Target: a cache hit rate above 90%!

But cache invalidation is a critical problem!

Use asynchronous messaging to keep the cache valid

No blocking!

ActiveMQ, RabbitMQ, DRb (for Ruby)
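
The non-blocking invalidation idea can be sketched with a queue and a background worker: the request handler only enqueues the key and returns, and the worker drops the stale cache entry. Here `queue.Queue` and a thread stand in for a real message broker such as ActiveMQ or RabbitMQ, and the plain dicts `cache` and `db` are illustrative assumptions.

```python
import queue
import threading

cache = {"user:1": {"name": "old"}}
invalidations = queue.Queue()


def invalidation_worker():
    """Consume keys from the queue and drop them from the cache."""
    while True:
        key = invalidations.get()
        if key is not None:
            cache.pop(key, None)
        invalidations.task_done()
        if key is None:  # sentinel: stop the worker
            break


def update_user(db, user_id, fields):
    """Write to the database, then request invalidation without waiting."""
    db[user_id] = fields
    invalidations.put(f"user:{user_id}")  # returns immediately


worker = threading.Thread(target=invalidation_worker, daemon=True)
worker.start()

db = {}
update_user(db, 1, {"name": "new"})
invalidations.join()  # only for the demo: wait until the queue is drained
print("user:1" in cache)
```

Between the write and the worker's run, readers may still see the old entry, so this trades a short window of staleness for a request path that never blocks.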

Page 28: Planning For High Performance Web Application

Optimizations

Caching

Better client-side caching

Use expiration headers: max-age, Expires

ETag? (Not recommended, IE doesn’t support it)

Use the HEAD method and 304 to detect changes (for Squid or other proxy scenarios)

Compress (concatenate JS, CSS)
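
As a rough illustration of the expiration headers, the helper below builds a `Cache-Control: max-age` header and the matching `Expires` date for a static asset. The function name `static_cache_headers` and the one-hour lifetime are assumptions made for this sketch; a real deployment would normally set these in the web server configuration.

```python
import time
from email.utils import formatdate


def static_cache_headers(max_age_seconds, now=None):
    """Return response headers that let clients and proxies cache a file."""
    now = time.time() if now is None else now
    return {
        # Relative lifetime, understood by HTTP/1.1 clients and proxies.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Absolute expiry date in HTTP date format, for older clients.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


for name, value in static_cache_headers(3600).items():
    print(f"{name}: {value}")
```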

Page 29: Planning For High Performance Web Application

Optimizations

SQL optimizations

Add indexes (especially on columns used in WHERE clauses)

De-normalized SQL

Useful redundancy (use duplication to avoid joins)

Don’t rely on the ORM, whether it is Data Mapper, Active Record, or Unit of Work

Don’t use the database’s full-text search

Use a separate search engine module (Lucene)
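
The effect of indexing a WHERE column can be checked by looking at the query plan before and after creating the index. The sketch below uses SQLite from the Python standard library purely because it runs without a server; the `users` table and the index name are made up, and MySQL or PostgreSQL expose the same information through their own `EXPLAIN` output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


sql = "SELECT name FROM users WHERE email = ?"
before = query_plan(sql, ("user500@example.com",))

# Index the column that appears in the WHERE clause.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(sql, ("user500@example.com",))

print("before:", before)
print("after: ", after)
```

Before the index exists the plan reports a full table scan; afterwards it reports a search using `idx_users_email`.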

Page 30: Planning For High Performance Web Application

Optimizations

Choose a proper database storage engine

MySQL: MyISAM? InnoDB? BDB? Heap?

Accelerators

PHP: APC, Zend Optimizer, XCache, eAccelerator, ionCube PHP Accelerator, Turck MMCache

Python: Psyco

Ruby: Joyent accelerator

Page 31: Planning For High Performance Web Application

But the most important thing:

Find the bottleneck before you start to optimize your application.
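
Finding the bottleneck starts with measurement. As one possible approach, Python's built-in `cProfile` can show where a request spends its time; `handle_request` and `slow_render` below are hypothetical functions standing in for real application code.

```python
import cProfile
import io
import pstats


def slow_render(n):
    """Deliberately wasteful work standing in for template rendering."""
    return "".join(str(i) for i in range(n))


def handle_request():
    slow_render(20000)
    return "ok"


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The same habit applies at every layer: web server logs, slow query logs, and monitoring graphs should point at the slow part before any code is rewritten.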

Page 32: Planning For High Performance Web Application

Next, scaling,

If there is enough time

Page 33: Planning For High Performance Web Application

What is scaling?

Three basic properties:

Usable capacity increases

Data capacity increases

The system stays maintainable

Page 34: Planning For High Performance Web Application

Scaling, 2 ways

Vertical scaling

Upgrade your hardware

More CPU, memory...

Horizontal scaling

Buy more of the same hardware, deploy more server instances

Distribute your system

But this way (generally) requires you to modify your code

Page 35: Planning For High Performance Web Application

Scaling-Load Balancing

DNS-GSLB

Use DNS round robin to randomize the IP result

xBayDNS

Can’t deal with failure (TTL)

Hard to manage precisely

CDN (content delivery network)

Transparent service provided by some companies

Expensive, and not suitable for dynamic content

Page 36: Planning For High Performance Web Application

Scaling-Load Balancing

Hardware LB

Citrix: NetScaler, Foundry: ServerIron, F5 (4-7)

Expensive

Software LB

Perlbal (4), Pound (7)

LVS (4)

Page 37: Planning For High Performance Web Application

Scaling-Load Balancing

Layer 2, Layer 4 and Layer 7 LB

Layer 2: link aggregation; provides redundancy and fault tolerance, improves access speed

Layer 4: round robin on TCP (with port info)

Layer 7

Sticky sessions enabled

Easy to write complicated hash logic

Good for Squid (Squid cluster enabled)
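
The difference between the two selection strategies can be shown in a few lines: round robin ignores the request content, while a Layer 7 balancer can read the session ID and hash it so the same user always reaches the same backend. The backend addresses and function names below are illustrative assumptions, not the behavior of any particular product.

```python
import hashlib
import itertools

BACKENDS = ["app1:8000", "app2:8000", "app3:8000"]
_round_robin = itertools.cycle(BACKENDS)


def pick_round_robin():
    """Layer 4 style: rotate through the backends, ignoring the request."""
    return next(_round_robin)


def pick_sticky(session_id):
    """Layer 7 style: hash the session ID so a user keeps one backend."""
    digest = hashlib.sha1(session_id.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]


print(pick_sticky("session-abc"), pick_sticky("session-abc"))
```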

Page 38: Planning For High Performance Web Application

Scaling-Load Balancing

Huge-scale LB

GSLB -> DNS round robin

Virtual IP -> L4 or L7 LB (SNAT)

Example

Level 1 LB uses GSLB to give geo-located DNS results

The VIP is dispatched by F5

F5 -> Squid, reverse proxy

Squid delegates to the real dynamic or static server

Page 39: Planning For High Performance Web Application

Scaling-Proxy Cache

Reverse proxy

Squid

Uses the HTTP HEAD method to validate content

Uses memory to cache content - light speed

Mature, fast, industry standard

Page 40: Planning For High Performance Web Application

Scaling-Database

Scaling MySQL

MySQL replication/duplication (failure, lag)

Master/Slave

Tree replication

Data partition

MySQL Proxy

Data shard
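
Sharding and master/slave replication are often combined in the application's data access layer: the user ID selects a shard, writes go to that shard's master, and reads are spread over its slaves. The sketch below assumes a simple modulo scheme and made-up host names; it only returns the host to connect to and opens no real connections.

```python
import itertools

# Hypothetical topology: two shards, each with one master and two slaves.
SHARDS = {
    0: {"master": "db0-master", "slaves": ["db0-slave1", "db0-slave2"]},
    1: {"master": "db1-master", "slaves": ["db1-slave1", "db1-slave2"]},
}
_slave_cycles = {sid: itertools.cycle(cfg["slaves"]) for sid, cfg in SHARDS.items()}


def shard_for(user_id):
    """Pick a shard by user ID (simple modulo partitioning)."""
    return user_id % len(SHARDS)


def route(user_id, is_write):
    """Return the host that should serve this query."""
    shard_id = shard_for(user_id)
    if is_write:
        return SHARDS[shard_id]["master"]  # all writes go to the master
    return next(_slave_cycles[shard_id])   # reads rotate over the slaves


print(route(7, is_write=True), route(7, is_write=False))
```

Because of replication lag, a read that must see the user's own latest write may still need to go to the master.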

Page 41: Planning For High Performance Web Application

Scaling-File System

Single disk (array)

RAID 1, RAID 0, RAID 5

Partition table type (GPT, MBR)

Partition format (ext2, ext3, ReiserFS, XFS, ZFS)

Cluster

A single disk has limits, but a cluster has no limit

NetApp Filer (NAS, network-attached storage)

Many, many choices

Page 42: Planning For High Performance Web Application

Scaling-File System

Sharing

Hardware-based sharing: NAS (previous page)

NFS - the simplest way to share a file system

Samba - almost the same as NFS, nice to try

MogileFS (for the web, no cursor-based random access)

GFS, Hadoop FS (chunk-based)

Page 43: Planning For High Performance Web Application

We’ve come a long way, baby

Page 44: Planning For High Performance Web Application

Thanks!