
GDC Tutorial, 2005. Building Multi-Player Games Case Study: The Sims Online Lessons Learned, Larry Mellon


Page 1: GDC Tutorial, 2005. Building Multi-Player Games Case Study: The Sims Online Lessons Learned, Larry Mellon

GDC Tutorial, 2005. Building Multi-Player Games

Case Study: The Sims Online

Lessons Learned

Larry Mellon

Page 2: TSO: Overview

- Initial team: little to no MMP experience
  - Engineering estimate: switching from 4-8 player peer to peer to MMP client/server would take no additional development time!
- No code / architecture / tool support for
  - Long-term, continually changing nature of game
  - Non-deterministic execution, dual platform (win32 / Linux)
- Overall process designed for single-player complexity, small development team
  - Limited nightly builds, minimal daily testing
  - Limited design reviews, limited scalability testing, no “maintainable/extensible” impl. requirement

Page 3: TSO: Case Study Outline (Lessons Learned)

- Poorly designed SP -> MP -> MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content

Page 4: Scalability (Team Size & Code Size)

- What were the problems
  - Side effect breaks & ability to work in parallel
    - Limited encapsulation + poor testability + non-determinism = TROUBLE
  - Independent module design & impact on overall system (initially, no system architect)
  - #include structure
    - win32 / Linux, compile times, pre-compiled headers, ...
- What worked
  - Move to new architecture via Refactoring & Scaffolding
    - HSB, incSync, nullView Simulator, nullView client, …
  - Rolling integrations: never dark
  - Sandboxing & pumpkins

Page 5: Scalability (Build & Distribution)

- To developers, customers & fielded servers
- What didn’t work (well enough)
  - Pulling builds from developer’s workstations
  - Shell scripts & manual publication
- What worked well
  - Heavy automation with web tracking
    - Repeatability, Speed, Visibility
  - Hierarchies of promotion & test
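
One way to picture "hierarchies of promotion & test" is a build that climbs one stage at a time, and only when the gate test for the next stage passes. The sketch below is an illustration under stated assumptions: the stage names, the `Build` record and the `promote` function are hypothetical and are not taken from the TSO build system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical promotion ladder; the slide does not name the real stages.
STAGES: List[str] = ["developer", "integration", "qa", "live"]


@dataclass
class Build:
    build_id: str
    stage: str = STAGES[0]
    history: List[str] = field(default_factory=list)  # audit trail, e.g. for a web page


def promote(build: Build, gates: Dict[str, Callable[[Build], bool]]) -> bool:
    """Advance the build one stage if the gate test for the next stage passes."""
    index = STAGES.index(build.stage)
    if index + 1 >= len(STAGES):
        return False  # already at the top of the hierarchy
    target = STAGES[index + 1]
    passed = gates[target](build)
    build.history.append(f"{build.stage}->{target}: {'pass' if passed else 'fail'}")
    if passed:
        build.stage = target
    return passed
```

Keeping the history on the build record is what makes each promotion repeatable and visible: the same gates run every time, and the outcome is recorded whether or not the build moves up.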

Page 6: Scalability (Architecture)

- Logical versus physical versus code structure
  - Only physical was not a major, MAJOR issue
- Logical: Replicated computing vs client / server
  - Security & stability implications
- Code: Client / server isolation & code sharing
  - Multiple, concurrent logic threads were sharing code & data, each impacting the others
  - Nullview client & simulator
  - Regulators vs Protocols: bug counts & state machines

Page 7: Go to final architecture ASAP

[Diagram: "Multiplayer" shows four peer ClientSim nodes, labeled "Here be Sync Hell". An "Evolve" arrow leads to "Client/Server", which shows three Clients and one Sim, labeled "Nice Undemocratic Request/Command".]

Page 8: Final Architecture ASAP: Make Everything Smaller & Separate

[Diagram label: Evolve]

Page 9: Final Architecture ASAP: Reduce Complexity of Branches

[Diagram labels: Packet Arrival; If (client); If (server); #ifdef (nullview); Shared Code; Shared State; Client Event; Server Event; More Packets!!]

Client & server teams would constantly break each other via changes to shared state & code.

Page 10: Final Architecture ASAP: “Refactoring”

- Decomposed into Multiple dll’s
  - Found the Simulator
- Interfaces
- Reference Counting
- Client/Server subclassing

How it helped:
- Reduced coupling. Even reduced compile times!
- Developers in different modules broke each other less often.
- We went everywhere and learned the code base.
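
The move from `if (client)` / `if (server)` / `#ifdef` branches to interfaces with client/server subclassing can be sketched as follows. This is a minimal illustration, not the TSO code: the class names `PacketHandler`, `ClientHandler` and `ServerHandler` and the event strings are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class PacketHandler(ABC):
    """Shared interface; role-specific behavior lives in subclasses."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_packet(self, payload: str) -> None:
        # Shared code path: no `if client` / `if server` checks here.
        self.events.append(self.handle(payload))

    @abstractmethod
    def handle(self, payload: str) -> str:
        """Turn an arriving packet into a role-specific event."""


class ClientHandler(PacketHandler):
    def handle(self, payload: str) -> str:
        return f"client event: display {payload}"


class ServerHandler(PacketHandler):
    def handle(self, payload: str) -> str:
        return f"server event: validate {payload}"
```

With this shape, a change to client behavior touches only `ClientHandler`, so the client and server teams share an interface but not a tangle of branches.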

Page 11: Final Architecture ASAP: It Had to Always Run

- Initially clients wouldn’t behave predictably
- We could not even play test
- Game design was demoralized
- We needed a bridge, now!

Page 12: Final Architecture ASAP: Incremental Sync

- A quick temporary solution…
  - Couldn’t wait for final system to be finished
  - High overhead, couldn’t ship it
- We took partial state snapshots on the server and restored to them on the client

How it helped:
- Could finally see the game as it would be.
- Allowed parallel game design and coding
- Bought time to lay in the “right” stuff.
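
The idea of taking partial state snapshots on the server and restoring them on the client can be sketched as below. It is only an illustration of the general technique: the dictionary-based state, the key names and both function names are assumptions, and the real incSync implementation is not described in the slides.

```python
import copy
from typing import Any, Dict, Iterable

State = Dict[str, Any]


def take_partial_snapshot(server_state: State, keys: Iterable[str]) -> State:
    """Copy only the selected parts of the authoritative server state."""
    return {k: copy.deepcopy(server_state[k]) for k in keys if k in server_state}


def restore_snapshot(client_state: State, snapshot: State) -> None:
    """Overwrite the client's copy of each snapshotted entry; leave the rest alone."""
    for key, value in snapshot.items():
        client_state[key] = copy.deepcopy(value)
```

Copying whole entries on every sync is simple and robust but wasteful, which matches the slide's note that the overhead was too high to ship.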

Page 13: Architecture: Conclusions

- Keep it simple, stupid!
  - Client/server
- Keep it clean
  - DLL/module integration points
  - #ifdef’s must die!
- Keep it alive
  - Plan for a constant system architect role: review all modules for impact on team, other modules & extensibility
  - Expose & control all inter-process communication
    - See Regulators: state machines that control transactions
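
A Regulator, described above as a state machine that controls a transaction, might look roughly like this. The states, events and the `Regulator` class shown here are hypothetical; the slides give no detail about the actual TSO Regulators beyond that one-line description.

```python
from typing import Dict, Tuple

# Hypothetical states and events for one request/response transaction.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "send_request"): "awaiting_reply",
    ("awaiting_reply", "reply_ok"): "done",
    ("awaiting_reply", "reply_error"): "failed",
    ("awaiting_reply", "timeout"): "failed",
}


class Regulator:
    """State machine that owns one transaction and rejects out-of-order events."""

    def __init__(self) -> None:
        self.state = "idle"

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state
```

Because every legal transition is listed in one table, inter-process communication is exposed in one place, and an unexpected message fails loudly instead of silently corrupting shared state.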

Page 14: TSO: Case Study Outline (Lessons Learned)

- Poorly designed SP -> MP -> MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content

Page 15: Visibility

- Problems
  - Debugging a client/server issue was very slow & painful
  - Knowing what to work on next was largely guesswork
  - Reproducing system failures from live environment
  - Knowing how one build or server cluster differed from another was again largely guesswork
- What we did that worked
  - Log / crash aggregators & filters
  - Live “critical event” monitor
  - Esper: live player & engine metrics
  - Repeatable load testing
  - Web-based Dashboard: health, status, where is everything
  - Fully automated build & publish procedures

Page 16: Visibility via “Bread Crumbs”: Aggregated Instrumentation Flags

[Figure labels: Trouble Spots; Server Crash]

Page 17: Quickly Find Trouble Spots

[Figure caption: DB byte count oscillates out of control, server crashes]

Page 18: Drill Down For Details

[Figure caption: A single DB Request is clearly at fault]
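
The "bread crumbs" pattern of the last three slides (aggregate instrumentation flags, find trouble spots in a summary, then drill down) can be sketched as a small aggregator. The `BreadCrumbs` class, its method names and the flag names in the usage note are assumptions for illustration; the slides do not show the real tooling.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple


class BreadCrumbs:
    """Aggregates instrumentation flags raised by code into per-flag counts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.details: DefaultDict[str, List[str]] = defaultdict(list)

    def drop(self, flag: str, detail: str = "") -> None:
        """Record one occurrence of a flag, with optional detail text."""
        self.counts[flag] += 1
        if detail:
            self.details[flag].append(detail)

    def trouble_spots(self, top: int = 3) -> List[Tuple[str, int]]:
        """Summary view: the most frequently raised flags."""
        return self.counts.most_common(top)

    def drill_down(self, flag: str) -> List[str]:
        """Detail view: every recorded detail for one flag."""
        return list(self.details[flag])
```

In use, code would call something like `crumbs.drop("db_request", "request size")` at each instrumented point; the summary then points at the noisiest flag and `drill_down` shows which individual requests were involved.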

Page 19: TSO: Case Study Outline (Lessons Learned)

- Poorly designed SP -> MP -> MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content

Page 20: Testability

- Development, release, load: all show-stopper problems
- QA coordination / speed / cost
- Repeatability, non-determinism
- Need for many, many tests per day, each with multiple inputs (two to two thousand players per test)

Page 21: Testability: What Worked

- Automated testing for repeatability & scale
  - Scriptable test clients: mirrored actual user play sessions
  - Changed the game’s architecture to increase testability
  - External test harnesses to control 50+ test clients per CPU, 4,000+ per session
  - Push-button UI to configure, run & analyze tests (developer & QA)
  - Constantly updated Baselines, with “Monkey Test” stats
  - Pre-checkin regression
- QA: web-driven state machine to control testers & collect/publish results
- What didn’t work
  - Event Recorders, unit testing
  - Manual-only testing

Page 22: MMP Automated Testing: Approach

- Push-button ability to run large-scale, repeatable tests
- Cost
  - Hardware / Software
  - Human resources
  - Process changes
- Benefit
  - Accurate, repeatable, measurable tests during development and operations
  - Stable software, faster, measurable progress
  - Base key decisions on fact, not opinion

Page 23: Why Spend The Time & Money?

- System complexity, non-determinism, scale
  - Tests provide hard data in a confusing sea of possibilities
- End users: high Quality of Service bar
- Dev team: greater comfort & confidence
- Tools augment your team’s ability to do their jobs
  - Find problems faster
  - Measure / change / measure: repeat as necessary
- Production & executives: come to depend on this data to a high degree

Page 24: Scripted Test Clients

- Scripts are emulated play sessions: just like somebody plays the game
  - Command steps: what the player does to the game
  - Validation steps: what the game should do in response
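
A script built from command steps and validation steps could be run by a loop like the one below. This is a sketch under assumptions: the tuple-based step format, the `run_script` function and the `buy_chair` command are invented for the example and do not reflect the actual TSO script language.

```python
from typing import Callable, Dict, List, Tuple

GameState = Dict[str, int]
# A step is ("command", fn) to act on the game or ("validate", fn) to check it.
Step = Tuple[str, Callable[[GameState], object]]


def run_script(state: GameState, steps: List[Step]) -> List[str]:
    """Run steps in order; return a description of each failed validation."""
    failures: List[str] = []
    for number, (kind, fn) in enumerate(steps, start=1):
        if kind == "command":
            fn(state)  # what the player does to the game
        elif kind == "validate":
            if not fn(state):  # what the game should do in response
                failures.append(f"step {number} failed validation")
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return failures


def buy_chair(state: GameState) -> None:
    """Hypothetical command: spend 100 units of money on one chair."""
    state["money"] -= 100
    state["chairs"] = state.get("chairs", 0) + 1
```

A script is then just a list such as `[("command", buy_chair), ("validate", lambda s: s["chairs"] == 1)]`, which keeps the same session replayable for every test run.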

Page 25: Scripts Tailored To Each Test Application

- Unit testing: 1 feature = 1 script
- Load testing: Representative play session
  - The average Joe, times thousands
- Shipping quality: corner cases, feature completeness
- Integration: test code changes for catastrophic failures

Page 26: Scripted Players: Implementation

[Diagram: a Test Client (Script Engine) and a Game Client (Game GUI) each exchange Commands and State with the Client-Side Game Logic through the Presentation Layer.]

Page 27: Process Shift: Earlier Tools Investment Equals More Gain

[Chart: MMP Developer Efficiency. Amount of work done against Time, from Project Start to Target Launch, with curves for Strong test support and Weak test support and a "Not Good Enough" label.]

Page 28: Process Shifts: Automated Testing Changes The Shape Of The Development Progress Curve

- Scale & Feature Completeness
- Keep Developers moving forward, not bailing water
- Stability (Code Base & Servers)
- Focus Developers on key, measurable roadblocks

Page 29: Process Shift: Measurable Targets, Projected Trend Lines

[Chart: Core Functionality Tests, Any Feature (e.g. # clients) against Time, with markers for First Passing Test, Now, Any Time (e.g. Alpha), Target and Complete.]

Actionable progress metrics, early enough to react
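
A projected trend line of the kind shown in the chart can be approximated by fitting a straight line to measured results and solving for the day the target is reached. The sketch below assumes a linear trend and uses ordinary least squares; the function name and the sample figures in the usage note are illustrative, not data from TSO.

```python
from typing import List, Optional, Tuple


def project_target_day(samples: List[Tuple[float, float]], target: float) -> Optional[float]:
    """Fit a least-squares line to (day, value) samples; return the day it reaches target.

    Returns None when there are too few samples or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (target - intercept) / slope
```

For example, if a load test sustained 0, 500 and 1,000 clients on days 0, 10 and 20, the fitted slope is 50 clients per day, so a 4,000-client target projects to day 80. Comparing that day against a milestone date is what makes the metric actionable early.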

Page 30: Process Shift: Load Testing (Before Paying Customers Show Up)

- Expose issues that only occur at scale
- Establish hardware requirements
- Establish play is acceptable @ scale

Page 31: Client-Server Comparison

Page 32: TSO: Case Study Outline (Lessons Learned)

- Poorly designed SP -> MP -> MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content

Page 33: User Data

- Oops!
  - Users stored much more data (with much more variance) than we had planned for
    - Caused many DB failures, city failures
  - BIG problem: their persistent data has to work, always, across all builds & DB instances
- What helped
  - Regression testing, each build, against live set of user data
- What would have helped more
  - Sanity checks against the DB
  - Range checks against user data
  - Better code & architecture support for validation of user data
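
A range check against user data, one of the things listed as "would have helped more", can be as small as the sketch below. The field names and limits are hypothetical placeholders; real bounds would have to come from the game's design data.

```python
from typing import Dict, List, Tuple

# Hypothetical limits; real bounds would come from the game's design data.
LIMITS: Dict[str, Tuple[int, int]] = {
    "money": (0, 10_000_000),
    "object_count": (0, 5_000),
    "friends": (0, 1_000),
}


def check_ranges(record: Dict[str, int]) -> List[str]:
    """Return one message per missing or out-of-range field; empty list means OK."""
    problems: List[str] = []
    for name, (low, high) in LIMITS.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not low <= record[name] <= high:
            problems.append(f"{name}: {record[name]} outside [{low}, {high}]")
    return problems
```

Running a check like this on every record as it is read or written, and in the per-build regression against live user data, would surface unexpected variance before it reaches the database as a failure.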

Page 34: Patching / New Content / Custom Content

- Oops!
  - Initial Patch budget of 1Meg blown in 1st week of operations
  - New Content required stronger, more predictable process
  - Custom Content required infrastructure able to easily add new content, on the fly
- Key Issue: all effort had gone into going Live, not creating a sustainable process once Live
- Conclusion: designing these in would have been much easier than retrofitting…

Page 35: Lessons Learned

- autoTest: Scripted test clients and instrumented code rock!
  - Collection, aggregation and display of test data is vital in making decisions on a day to day basis
- Lessen the panic
  - Scale&Break is a very clarifying experience
  - Stable code & servers greatly ease the pain of building a MMP game
  - Hard data (not opinion) is both illuminating and calming
- autoBuild: make it pushbutton with instant web visibility
  - Use early, use often to get bugs out before going live
- Budget for a strong architect role & a strong design review process for the entire game lifecycle
  - Scalability, testability, patching & new content & long-term persistence are requirements: MUCH cheaper to design in than frantic retrofitting
  - KISS principle is mandatory, as is expecting changes

Page 36: Lessons Learned

- Visibility: tremendous volumes of data require automated collection & summarization
  - Provide drill-down access to details from summary view web pages
- Get some people on board who’ve been burned before: a lot of TSO’s pain could have been easily avoided, but little distributed system experience & MMP design issues existed in early phases of project
- Fred Brooks, the 31st programmer
  - Strong tools & process pays off for large teams & long-term operations
  - Measure & improve your workspace, constantly
- Non-determinism is painful & unavoidable
  - Minimize impact via explicit design support & use strong, constant calibration to understand it

Page 37: Biggest Wins

Code Isolation

Scaffolding

Tools: Build / Test / Measure, Information Management

Pre-Checkin Regression / Load Testing

Page 38: Biggest Losses

Architecture: Massively peer to peer

Early lack of tools

#ifdef across platform / function

“Critical Path” dependencies

More Details: www.maggotranch.com/MMP (3 TSO Lessons Learned talks)