DESCRIPTION
Yahoo! Service Engineers (SEs) specialize in bridging the gap between system administration and development. SEs are tasked with delivering a reliable, consistent, high-quality service through the use of best practices. They must understand network, OS, hardware, and customer use cases, and dive deep into application internals. In this talk, Jennifer describes her journey with the Sherpa service at Yahoo! and the lessons learned about building a reliable, consistent, high-quality service from scratch. The key takeaway is a set of successful strategies, and pitfalls to avoid, when building out a service.
Building Large Scale Services
PRESENTED BY Jennifer Davis⎪ November 8, 2013
Twitter: @sigje Email: [email protected]
SysAdmin Controls all the things
11/11/13 3
Shared Dependencies
The Reality…
The Dream…
How?
Define Core Principles
§ Common
  › Collaboration across teams, companies, industry, define standards
  › Incident, Problem, Change, Config, Release management
§ Distinct
  › Specifics to an application or service
  › Availability, Service, Business Continuity, Capacity
Kill the Myths
§ Stupid User
§ System Admin == Operator
[Word cloud: SKILLS (failing gracefully, puppet, ruby, perl, nosql, operability, security, mysql, unix, TCP/IP, bash, Chef)]
Kill the Myths
§ Stupid User
§ System Admin == Operator
§ Words have a common universal implicit meaning
Learn to Modulate your Message
[Diagram labels: Team, Manager, Customer]
Team
§ People working towards common goal.
§ Different roles.
§ Different views.
§ Same objectives.
Image Credit: Kyle Latino
Team
Suggestion: Don’t talk about the “devs” request, talk about Elaine’s request.
Suggestion: Verify that your team has the same vision.
Understand the vision.
§ Are there other options, open source or not, within the company?
§ Are there other options outside the company?
§ Is EVERYONE on the same page about what the service is?
Vision Statement
§ Clear statement about the problem that the service is solving.
  › Direction
  › Identity management
  › Team cohesion
New product? Be part of creating that vision!
Sherpa’s Vision
.. Distributed replicated eventually consistent key value store that had a focus on scalability ..
My Job
§ Examine software
§ Define risk
§ Communicate cost of risks
§ Mitigate risks
§ Identify events
§ Manage events
Fragile Platforms are Bad.
Change is inevitable
§ Products pivot based on needs.
§ Requirements change and evolve.
§ Know core issues.
Know Core Issues
§ Limit the scope of focus.
§ Focus on the biggest priorities.
  › Understand Development Methodology: Waterfall, Scrum, ?
  › Identify the key “time” elements.
  › Talk to them. Identify their key terms. “Enhancements”, “Defects”
  › Establish the “Top” list.
Create checklists
§ Not because people are dumb.
§ Not only because of automation.
§ When things break, knowing what needs focus.
§ During normal maintenance, can identify “not OK”.
  › Audit checklists for deployment through staging environment.
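One way to make a checklist auditable is to store each item next to a check that can be run against the staging environment. The sketch below is only an illustration of that idea: the item names, the `staging` dictionary, and its values are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckItem:
    name: str
    check: Callable[[], bool]  # returns True when the item is OK


def audit(items):
    """Run every check and return the names of the items that are not OK."""
    failed = []
    for item in items:
        try:
            ok = item.check()
        except Exception:
            ok = False  # a check that crashes counts as "not OK"
        if not ok:
            failed.append(item.name)
    return failed


# Hypothetical snapshot of a staging host; real checks would query the host.
staging = {"config_version": "1.4", "expected_version": "1.4", "disk_free_pct": 8}

checklist = [
    CheckItem("config matches release",
              lambda: staging["config_version"] == staging["expected_version"]),
    CheckItem("disk has >= 10% free",
              lambda: staging["disk_free_pct"] >= 10),
]

not_ok = audit(checklist)
print(not_ok)
```

The same list can serve during an incident (what needs focus) and during routine maintenance (what is "not OK").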
Know Outputs
§ Identify components.
§ Well defined protocols between components.
§ Expected Inputs.
§ Expected Outputs.
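Expected inputs and outputs are easier to check when they are written down as data. The following is a minimal sketch under assumptions: the `Contract` type, the `storage-node` component, and its fields are hypothetical examples, not a description of how Sherpa defines its interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    component: str
    protocol: str
    inputs: dict   # field name -> expected Python type
    outputs: dict  # field name -> expected Python type


def violations(expected, message):
    """List the ways `message` departs from the expected field types."""
    problems = []
    for field, kind in expected.items():
        if field not in message:
            problems.append(f"missing {field}")
        elif not isinstance(message[field], kind):
            problems.append(f"{field} is not {kind.__name__}")
    return problems


# Hypothetical contract for a "get" call on a key-value storage component.
store_get = Contract(
    component="storage-node",
    protocol="HTTP",
    inputs={"key": str},
    outputs={"value": bytes, "version": int},
)

print(violations(store_get.inputs, {"key": "user:42"}))   # []
print(violations(store_get.outputs, {"value": b"abc"}))   # ['missing version']
```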
Know State Transitions Explicitly.
§ When component is installed but not ready
§ When the colo is going away
§ Go through What If Scenarios.
  › Document them.
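Making state transitions explicit can be as simple as a table of allowed moves. This is a sketch only: the state names (`installed`, `ready`, `serving`, `draining`, `removed`) are assumptions chosen to mirror the bullets above, not states taken from the talk.

```python
# Allowed transitions; anything not listed here is rejected.
ALLOWED = {
    "installed": {"ready", "removed"},   # installed but not ready yet
    "ready": {"serving", "removed"},
    "serving": {"draining"},
    "draining": {"removed"},             # e.g. the colo is going away
    "removed": set(),
}


class Component:
    def __init__(self, name):
        self.name = name
        self.state = "installed"

    def transition(self, new_state):
        """Move to new_state, or raise if the move is not in the table."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state} -> {new_state} is not allowed"
            )
        self.state = new_state

    def accepts_traffic(self):
        return self.state == "serving"


node = Component("node-1")
print(node.accepts_traffic())  # False: installed is not the same as ready
node.transition("ready")
node.transition("serving")
print(node.accepts_traffic())  # True
node.transition("draining")    # what-if scenario: the colo is going away
```

The table doubles as documentation: each what-if scenario maps to one row.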
Know choke points explicitly.
§ Memory
§ Disk
§ Bandwidth
Now and in 6 months. JIT?
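A rough way to look at choke points "now and in 6 months" is a straight-line projection per resource. The sketch below assumes linear growth, and every number in it is made up for illustration; real values would come from monitoring.

```python
def projected(current, monthly_growth, months):
    """Usage after `months`, assuming linear growth."""
    return current + monthly_growth * months


def months_until_full(current, monthly_growth, capacity):
    """Months until usage reaches capacity, assuming linear growth."""
    if current >= capacity:
        return 0.0
    if monthly_growth <= 0:
        return float("inf")
    return (capacity - current) / monthly_growth


# Illustrative figures only: (current usage, growth per month, capacity).
resources = {
    "memory_gb": (48, 2, 64),
    "disk_tb": (6, 1, 10),
    "bandwidth_gbps": (4, 0.5, 10),
}

# Resources that would exceed capacity within 6 months.
choke_points = [
    name
    for name, (current, growth, capacity) in resources.items()
    if projected(current, growth, 6) > capacity
]
print(choke_points)
print(months_until_full(*resources["disk_tb"]))
```

With these made-up figures only disk crosses its limit inside the window, which is the resource to plan for first.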
Failure will happen.
§ There are no 0 failure systems.
§ “Give me the brain” documentation so that anyone can be the brain.
§ Repeatable/Reliable failure handling.
§ Run fire drills. Really.
System Administration is Gardening.
§ No guarantee of resources.
§ Only guarantee is change.
§ Nurture relationships.
  › Be authentic.
  › Be trusting and trustworthy.
  › Have integrity.
Success At Scale is Collaboration & Cooperation across Teams.
Decreasing Value
[Two charts: # of Support Engineers over the year (x-axis: Jan, Apr, Jul, Oct)]
Documentation is not the cure.
§ Documentation doesn’t guarantee understanding.
  › Operations Sandbox Environment
§ Don’t spend time at the end documenting.
Summary
Be Expendable. Feed your brain.
Acknowledgements
• http://www.flickr.com/photos/levork
• http://www.flickr.com/photos/puggles
• http://www.flickr.com/photos/byteorder
• http://www.flickr.com/photos/egoant
• http://www.flickr.com/photos/happymonkey
• Kyle Latino
• Greg Connor