Architecture: Surviving the High Load
Friday, 6 May 2011

Who we are?
Alexander Chinaryov, Lead Platform Developer, since 2007
Alexander Hristoforov, Lead Platform Developer, since 2009
Oleg Anastasyev, Lead Platform Developer, since 2007

Load: some facts
• 2.8M users online
• 150k pages/s, 50 ms avg
• 32 Gbit/s out
• 4000 messages/s (> # tweets)
• 160k photo downloads/s
• 500 comments/s
• 90,000 notifications/s
• Feed: 1500 posts/s, 30k gets/s

Load: handled by
• 3 datacenters
• 2400 servers & storages (and counting)
• 1.5M SLOC (99.9% Java, 0.1% C)
• 60 modules
• 40 devs + 8 testers
• 20 admins

Arch: layers
• 150+50 web servers
• 120 app servers
• 25 kinds of business services
• 6 SSO
• >100 caches
• 230 SQLs
• >400 NoSQL

Load: Balance
• LVS
• One-cluster
– Weighted RR
– Pluggable failure detectors
– Integrated with one-remote-service
– Locality groups
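
The combination of weighted round robin and pluggable failure detectors could look roughly like the sketch below. It is an illustration only, not the one-cluster code: the class name `WeightedRoundRobin`, the slot-list approach, and the use of a `Predicate` as the failure detector are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical weighted round-robin balancer; not the actual one-cluster implementation. */
class WeightedRoundRobin {
    // Each server appears in the list as many times as its weight.
    private final List<String> slots = new ArrayList<>();
    // Pluggable failure detector (assumption: a simple liveness predicate).
    private final Predicate<String> isAlive;
    private int cursor = 0;

    WeightedRoundRobin(Predicate<String> isAlive) {
        this.isAlive = isAlive;
    }

    synchronized void addServer(String server, int weight) {
        for (int i = 0; i < weight; i++) {
            slots.add(server);
        }
    }

    /** Returns the next server reported alive, or null if none is. */
    synchronized String next() {
        for (int tried = 0; tried < slots.size(); tried++) {
            String candidate = slots.get(cursor);
            cursor = (cursor + 1) % slots.size();
            if (isAlive.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

A server with weight 2 is picked twice as often as a server with weight 1, and servers the detector reports as dead are skipped without changing the rotation for the rest.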

Arch: Presentation
• Apache Tomcat 6
• RDK framework:
– GUI components
– Independent portlets
– AJAX update → no full page reload
– No JavaScript required
• Google Web Toolkit for dynamics
– Toolbar, photo pins, gifts
• Flash (apps, players, ads)

Arch: Business Logic
• Odnoklassniki-ejb
– JBoss 4.2
– JTA, Stateless, Entity beans (BMP)
– Business op handling & orchestration
– Event/handler pattern
– Component logic
– Data partitioning
– Spring (DI)

Arch: Business Srvcs
• IM, discussions, feeds
– JBoss Remoting 2.2
– One remote service
– 100k+ req/sec on a recent 8-core CPU
/**
 * Example of a remote server interface.
 */
public interface Server extends RemoteService {
    @RemoteMethod
    IListChunk<Friend> getFreshMyFriends(@PartitionSource long userId, IChunkProperties cp);

    @RemoteMethod(invokeAll = true, split = true, reduceStrategy = ListReduceStrategy.class)
    List<?> mapReduceMethod(@PartitionSource long userId, ... );

    @RemoteMethod(invokeAll = true, asyncMaxDelay = 1000L, asyncMaxBatch = 100)
    void asyncNotify(@PartitionSource long userId, ... );
}

Arch: Caches
• one-graph
– Social graph storage
– 30 GB, 17K ops/server, 7% CPU …
• Odnoklassniki-cache
– Users, groups, photos, sessions...
– Smart
– Off heap (Unsafe) → no FGC
• Near cache
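
To illustrate why off-heap storage avoids full GC, here is a minimal sketch of a fixed-size store whose data lives outside the Java heap. Note the substitution: the slide names `Unsafe`, while this example uses a direct `ByteBuffer`, which is the supported public API for off-heap memory. The class `OffHeapLongStore` and its slot layout are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical off-heap store of long values; uses a direct buffer instead of Unsafe. */
class OffHeapLongStore {
    private final ByteBuffer buffer; // backing memory is allocated outside the Java heap
    private final int capacity;

    OffHeapLongStore(int capacity) {
        this.capacity = capacity;
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES));
    }

    void put(int slot, long value) {
        checkSlot(slot);
        buffer.putLong(slot * Long.BYTES, value); // absolute write, no object allocated
    }

    long get(int slot) {
        checkSlot(slot);
        return buffer.getLong(slot * Long.BYTES);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    private void checkSlot(int slot) {
        if (slot < 0 || slot >= capacity) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
    }
}
```

The garbage collector sees only the small `ByteBuffer` object, not the stored values, so the amount of cached data does not add to old-generation collection work.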

Arch: Persistence
• MS SQL 2005
– High consistency
– Flexible queries
• NoSQL: one-db
– Berkeley DB 4.5 (C edition) +
– JBoss Remoting based server +
– Simple querying =
– NoSQL storage server
• … and others are being researched

Concept: DB Partitioning
• DB scaling is hard & expensive
• Vertical
• Horizontal
• ID:
– long ID = (uid << 8) + domain
– Domain = 0..255
– Domain → servers map
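
The ID scheme above can be sketched in Java as follows. One detail matters when turning the slide formula into code: in Java `+` binds tighter than `<<`, so the shift needs explicit parentheses. The class `PartitionedId`, its method names, and the array used as the domain → server map are assumptions for illustration.

```java
/** Hypothetical helper for IDs that carry a partition domain in their low 8 bits. */
class PartitionedId {
    static final int DOMAIN_BITS = 8;
    static final long DOMAIN_MASK = (1L << DOMAIN_BITS) - 1; // 0xFF, domains 0..255

    /** Packs uid and domain into one long: (uid << 8) + domain. */
    static long compose(long uid, int domain) {
        if (domain < 0 || domain > DOMAIN_MASK) {
            throw new IllegalArgumentException("domain must be 0..255: " + domain);
        }
        return (uid << DOMAIN_BITS) | domain; // same as + because the low 8 bits are zero
    }

    static int domainOf(long id) {
        return (int) (id & DOMAIN_MASK);
    }

    static long uidOf(long id) {
        return id >> DOMAIN_BITS;
    }

    /** Looks up the server for an ID in a 256-entry domain → server table (assumed layout). */
    static String serverFor(long id, String[] domainToServer) {
        return domainToServer[domainOf(id)];
    }
}
```

For example, `PartitionedId.compose(5, 3)` yields 1283, and `domainOf(1283)` returns 3, so the owning server can be found from the ID alone without a lookup by uid.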

Perf: SQL DB
• XA → local TA only
• Dirty reads
• DB JOIN → app server memory
• FK, SP, triggers
• DELETE:
– No delete/insert workflow → update
– Async batch process, retry
• Indexes, clustered indexes

Perf: general
• Sequential access speed:
– RAM 10x > SSD 1.5x > 1 Gbit Ethernet 2x > disk
• Random access speed:
– RAM 20000x (~50 ns) > SSD 5-10x > disk (~5 ms)
– Net roundtrip ~ 0.5 ms
• So:
– Near data/cache is the fastest solution (cache coherence problem)
– Partitioned network cache
– Database access is the slowest thing
• Still, you have to sacrifice consistency

Surviving: GC
• Young GC → high CPU load
– Too much garbage (autoboxing, overlooked log.debug, ...)
– FIX: find and fix code → can take weeks
• Old GC → pauses → carousel
– 2-4 GB is the limit for ParallelGC (1-4 s)
– 8-10 GB is the limit for CMS
• and it can still stop the world!
– FIX: use Unsafe (off-heap memory) or partition
• Perm GC → pauses → carousel again
– Too many classes
– FIX: -XX:+CMSClassUnloadingEnabled
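
The two garbage sources named above (autoboxing and overlooked debug logging) can be shown with a short sketch. It uses `java.util.logging` only to stay self-contained; the slide does not say which logging library is in use, and the class `LowGarbage` is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical examples of avoiding needless young-generation garbage. */
class LowGarbage {
    static final Logger LOG = Logger.getLogger("LowGarbage");

    /**
     * Sums with a primitive accumulator. Declaring total as a boxed Long
     * instead may allocate a new object on each iteration.
     */
    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Builds the debug message only when FINE logging is actually enabled. */
    static void debugUser(long userId, int friends) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("user " + userId + " has " + friends + " friends");
        }
    }
}
```

Without the `isLoggable` guard, the string concatenation runs on every call even when debug output is switched off, which at high request rates turns into steady young GC pressure.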

Surviving: failures
• SQL partition failure
– FIX: fault tolerance: read incomplete, write fail
• One-db
– Unstable replication → no fix :-(
– Data corruption → separate ids storage
– Random disk access → SSD, tmpfs
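
One way to read "read incomplete, write fail" is that a read spanning several partitions returns whatever the live partitions provide and flags the result as partial, while a write to a failed partition simply fails. The read side is sketched below; `PartitionedReader`, `Result`, and the use of `Supplier` to stand in for a partition query are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical fault-tolerant read across partitions. */
class PartitionedReader {
    /** Rows gathered from live partitions plus a flag telling whether any partition was skipped. */
    static final class Result<T> {
        final List<T> rows = new ArrayList<>();
        boolean complete = true;
    }

    /** Queries every partition; a failing partition makes the result incomplete instead of failing it. */
    static <T> Result<T> readAll(List<Supplier<List<T>>> partitions) {
        Result<T> result = new Result<>();
        for (Supplier<List<T>> partition : partitions) {
            try {
                result.rows.addAll(partition.get());
            } catch (RuntimeException e) {
                result.complete = false; // partition is down: keep going with partial data
            }
        }
        return result;
    }
}
```

The caller can then decide per page whether partial data is acceptable or whether the section should be hidden.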

Surviving: carousel
• Reasons:
– Net problems
– Unusual activity, spammers
– Full GCs
– Cold caches
– Unexpected slowdowns, bugs
– Activity growth
• Fixes:
– Timeout = 3 s
– Client-side automatic failure detectors, server cutout
– Gatekeepers
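
The first two fixes, a hard timeout and a client-side failure detector that cuts a server out, might be sketched like this. Everything here is illustrative: `TimedCall`, `FailureDetector`, and the consecutive-failure threshold are assumptions, not the production mechanism. Production code would pass 3000 ms, matching the slide.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper that bounds a remote call by a timeout. */
class TimedCall {
    /** Returns the call's result, or the fallback on timeout, failure, or interruption. */
    static <T> T callOrFallback(ExecutorService pool, Callable<T> call, T fallback, long timeoutMillis) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting so the caller's thread is released
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        }
    }
}

/** Hypothetical detector: cuts a server out after N consecutive failures. */
class FailureDetector {
    private final int threshold;
    private int consecutiveFailures = 0;

    FailureDetector(int threshold) {
        this.threshold = threshold;
    }

    synchronized void record(boolean success) {
        consecutiveFailures = success ? 0 : consecutiveFailures + 1;
    }

    synchronized boolean isCutOut() {
        return consecutiveFailures >= threshold;
    }
}
```

Bounding each call keeps a slow backend from tying up every caller thread, which is what lets a local slowdown spread from server to server.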

Surviving: gatekeepers
• Fine-grained function switches
• Used for:
– Fighting the carousel
– Smooth launch of new functions
– Experiments
• Can:
– Turn on/off a specific function, individual 3rd party games
– On a per-server basis
– On a per-user-domain basis
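
A gatekeeper of this kind is essentially a runtime feature switch. The sketch below shows one possible shape, with one `Gatekeeper` instance per server process covering the per-server case. The class, its methods, and the assumption that the user domain is the low 8 bits of the user ID (as in the partitioning scheme) are illustrative, not taken from the real system.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical fine-grained feature switch; one instance per server process. */
class Gatekeeper {
    private final Set<String> disabledEverywhere = ConcurrentHashMap.newKeySet();
    private final Map<String, Set<Integer>> disabledDomains = new ConcurrentHashMap<>();

    /** Turns a feature off for all users on this server. */
    void disable(String feature) {
        disabledEverywhere.add(feature);
    }

    /** Turns a feature off only for users in the given domain (0..255). */
    void disableForDomain(String feature, int domain) {
        disabledDomains.computeIfAbsent(feature, f -> ConcurrentHashMap.newKeySet()).add(domain);
    }

    /** Clears every switch for the feature. */
    void enable(String feature) {
        disabledEverywhere.remove(feature);
        disabledDomains.remove(feature);
    }

    boolean isEnabled(String feature, long userId) {
        if (disabledEverywhere.contains(feature)) {
            return false;
        }
        int domain = (int) (userId & 0xFF); // assumption: domain is the low 8 bits of the ID
        Set<Integer> domains = disabledDomains.get(feature);
        return domains == null || !domains.contains(domain);
    }
}
```

Calling code wraps an optional feature in `if (gatekeeper.isEnabled("gifts", userId))`, so operators can shed load or roll out a function to a slice of users without a redeploy.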

Surviving: measure!
• One-log statistics

Test yourself ;-)
• PhotoMarks table
– Columns: PhotoId: long, UserId: long, Mark: byte, timestamp
– 32p x (500M rows, 42 GB data + 25 GB index)
– Load (photoId, userId): 14kops, create: 1500kpos
– Most load calls are checks for row absence
• Rejected a priori
– Add more SQL nodes: too expensive
– Place all marks in cache: 2600 GB of RAM is not cheap either