Building the perfect PHP app for the enterprise Episode 3: Resolving problems & high availability Clark Everetts September 28, 2016

Page 1: Resolving problems & high availability

Building the perfect PHP app for the enterprise

Episode 3: Resolving problems & high availability
Clark Everetts
September 28, 2016

Page 2: Resolving problems & high availability

Series overview

Now: Resolving problems and high availability

October 13: Optimizing performance (revised date)
Keep users on your site by learning how to use background jobs and caching, measure performance, and make data-driven decisions.

Page 3: Resolving problems & high availability

Clark Everetts
Professional services
Rogue Wave Software

Page 4: Resolving problems & high availability

Agenda

1. How’s your reputation?
2. Monitoring: Know you have a problem
3. Fault diagnosis / Root cause analysis
4. Optimizing scale: Cluster management
5. Synchronizing session data
6. Conclusion
7. Q&A

Page 5: Resolving problems & high availability

How’s your reputation?

Page 6: Resolving problems & high availability

The cost of a bad rep

[Diagram labels: Complexity, Scale, ROI, DIY, Ideal enterprise, Volume scales beyond servers, Performance degradation, Administrative costs]

Not so good reputation
• Page delays
• Application downtime

Good reputation
• Responsive under load
• Application availability

Page 7: Resolving problems & high availability

Monitoring: Know you have a problem

Page 8: Resolving problems & high availability

Potential faults

“Issues are for discussing, problems are for solving.” - Me

Fatal
• PHP errors
• Out of memory
• Failed database queries or updates
• Network connectivity (no connection)
• Application

Non-fatal
• PHP notices, warnings
• Slow functions or request executions
• High memory consumption
• Network (degraded)
• Application logic
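As a rough illustration of the fatal / non-fatal split, the sketch below covers only the PHP error rows of the two lists. `classifyErrorLevel()` is a hypothetical helper, not part of PHP or any product named in this deck, and which levels count as "fatal" is an assumption made for the example. Non-fatal errors are reported through an error handler; fatal ones are picked up at shutdown, because the error handler is never called for them.

```php
<?php
// Hypothetical helper (an assumption for this sketch): map a PHP error
// level onto the two categories used on the slide.
function classifyErrorLevel(int $level): string
{
    // Levels that stop the script (some only when nothing handles them).
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR
           | E_USER_ERROR | E_RECOVERABLE_ERROR;

    return ($level & $fatal) !== 0 ? 'fatal' : 'non-fatal';
}

// Notices and warnings reach the error handler while the script keeps running.
set_error_handler(function (int $errno, string $errstr, string $errfile, int $errline): bool {
    error_log(sprintf('[%s] %s in %s:%d', classifyErrorLevel($errno), $errstr, $errfile, $errline));
    return false; // let PHP's standard error handling continue as well
});

// Errors such as E_ERROR bypass the handler, so inspect the last error at shutdown.
register_shutdown_function(function (): void {
    $last = error_get_last();
    if ($last !== null && classifyErrorLevel($last['type']) === 'fatal') {
        error_log(sprintf('[fatal] %s in %s:%d', $last['message'], $last['file'], $last['line']));
    }
});
```

Failed queries, network faults, and application logic errors do not surface as PHP error levels, so they would need their own checks.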

Page 9: Resolving problems & high availability

The problem with problem resolution

• Most problem resolution time is spent identifying root cause
• Problem reproduction is often difficult and time-consuming
• Many possible sources: server load, input data, database state, etc.

Page 10: Resolving problems & high availability

Problem identification
• Do you know you have a problem?
– Your phone is ringing?
– Getting emails?
– Monitoring tools or services?
• Is a problem brewing that customers don’t see … yet?

Gather information
• What information can you collect?

Analyze information
• Debugging
• Logging (files, events database, application-level logs)

Recreate problem
With enough relevant information:
• Reproduce in order to troubleshoot and verify a fix
• Can we identify the cause without having to reproduce?

Page 11: Resolving problems & high availability

Monitoring for faults
• Scan log files (not manually!)
– Web server access and error logs
– PHP error log (php.log)
– Application-specific logs (filesystem, database)
• Don’t log noise
• Avoid logging to php.log
• Zend\Log, Monolog, error_log()
• Event-based monitoring
– Recorded in event database, visible in UI, accessible via API
– Optional automatic notification via email alerts
– Optional callback URIs for integration with other monitoring tools
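One way to follow the logging advice above with nothing but `error_log()` is sketched here. `appLog()`, the `APP_LOG_LEVELS` table, and the default threshold are assumptions made for illustration; a library such as Monolog or Zend\Log would normally take this role. Messages below the threshold are dropped (no noise), and everything else goes to an application-specific file rather than the PHP error log.

```php
<?php
// Hypothetical severity table for this sketch; higher means more severe.
const APP_LOG_LEVELS = ['debug' => 0, 'info' => 1, 'warning' => 2, 'error' => 3];

// Hypothetical helper: append one line to an application-specific log file.
function appLog(
    string $file,
    string $level,
    string $message,
    array $context = [],
    string $minLevel = 'warning'
): bool {
    if (APP_LOG_LEVELS[$level] < APP_LOG_LEVELS[$minLevel]) {
        return false; // below the threshold: treated as noise and skipped
    }

    $line = sprintf(
        "%s [%s] %s %s\n",
        date('c'),
        strtoupper($level),
        $message,
        json_encode($context)
    );

    // Message type 3 makes error_log() append to the given file;
    // it does not add a newline itself, hence the "\n" above.
    return error_log($line, 3, $file);
}
```

A call such as `appLog('/path/to/app.log', 'error', 'Query failed', ['sql' => $sql])` (the path is a placeholder) produces one line that a log scanner can match on the level tag.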

Page 12: Resolving problems & high availability

Monitoring events

Page 13: Resolving problems & high availability

Event rules

Page 14: Resolving problems & high availability

Sampling of event types
• Custom event
• Database error
• Function error
• High memory usage
• Inconsistent output size
• Job execution delay/error
• Job logical failure
• PHP error
• Slow function execution
• Slow query execution
• Slow request execution
• Zend Framework exception
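To make an event type such as "slow function execution" concrete, here is a minimal sketch of a threshold check in plain PHP. It is not how Zend Server implements its event rules; `timed()` and its parameters are hypothetical names introduced for this example.

```php
<?php
// Hypothetical wrapper: run $fn and report to $onSlow if it took longer
// than $thresholdSeconds. The return value of $fn is passed through.
function timed(callable $fn, float $thresholdSeconds, callable $onSlow)
{
    $start = hrtime(true);                        // monotonic clock, in nanoseconds
    $result = $fn();
    $elapsed = (hrtime(true) - $start) / 1e9;     // convert to seconds

    if ($elapsed > $thresholdSeconds) {
        $onSlow($elapsed);                        // e.g. write a log line or raise an event
    }

    return $result;
}
```

The same pattern applies to slow queries or slow requests by wrapping the query call or the request dispatch instead.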

Page 15: Resolving problems & high availability

Example

Problem: New feature of internal application suffered slow performance due to large database result sets from complex queries.

Challenge: Prior to rollout, isolate which queries were experiencing the slowest response times, make improvements, and cache results if possible.

Requirements:
• Stale data is unusable data
• “Soft” performance criteria (users say when “good enough”)

Used: Zend Server Monitoring, IBM i DB2 index analyzer, and Zend Server Data Cache

Results:
• Users never experienced a problem
• Development team solidified “trust factor” with management
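The example combines result caching with the requirement that stale data is unusable, which suggests a cache with a short, explicit lifetime. The sketch below shows that idea with an in-process array; `TtlCache` is a hypothetical stand-in, not the Zend Server Data Cache API, and a real deployment would use a cache shared across requests.

```php
<?php
// Hypothetical in-process cache used only to illustrate the pattern.
final class TtlCache
{
    /** @var array<string, array{expires: float, value: mixed}> */
    private array $entries = [];

    // Return the cached value for $key if it is still fresh; otherwise
    // run $loader (the expensive query), store its result, and return it.
    public function remember(string $key, int $ttlSeconds, callable $loader)
    {
        $now = microtime(true);

        if (isset($this->entries[$key]) && $this->entries[$key]['expires'] > $now) {
            return $this->entries[$key]['value'];   // fresh hit
        }

        $value = $loader();                          // miss or expired entry
        $this->entries[$key] = ['expires' => $now + $ttlSeconds, 'value' => $value];

        return $value;
    }
}
```

Choosing the lifetime is the design decision here: it bounds how stale a result can be, so it has to come from the users' own definition of "good enough".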

Page 16: Resolving problems & high availability

Poll #1
How do you discover problems in your applications?
- Notified by a person (phone call, email, cubicle visit)
- Notified by an in-house automated tool
- Notified by a commercial automated tool (Zend, New Relic)

Page 17: Resolving problems & high availability

Fault diagnosis / Root cause analysis

Page 18: Resolving problems & high availability

Root cause analysis
• Log files
– Can both indicate a problem and contain the necessary diagnostics
• Monitoring tools may provide further info on:
– Failed function call arguments
– High memory consumption
– Etc.
• printf() and var_dump()
• Debuggers (Xdebug, Zend Debugger, phpdbg)
• Code tracing pinpoints what in the request execution triggered the problem
• Z-Ray: request details right in the developer’s web browser (code trace-like)

Page 19: Resolving problems & high availability

Event details

Page 20: Resolving problems & high availability

Debugging

Page 21: Resolving problems & high availability

Poll #2
What is your primary means of root cause analysis?
- printf(), var_dump()
- Logging data to files
- Xdebug
- Zend Debugger
- phpdbg

Page 22: Resolving problems & high availability

Optimizing scale: Cluster management

Page 23: Resolving problems & high availability

What is a cluster?

Page 24: Resolving problems & high availability

Why cluster?
• Long-term demand is increasing
– Growing population of mobile devices
– Machine-to-machine traffic (bots, B2B, APIs) on the rise
• Demand is both predictable and unpredictable
– “The Witching Hour” and other periodic processing spikes
• Resilience when failures occur

Clustering allows you to
• Adapt to changing demand
• Manage infrastructure costs
• Provide redundancy in the face of failures

Page 25: Resolving problems & high availability

Cluster overview

Requests

Responses

Page 26: Resolving problems & high availability

Cluster characteristics
• Nodes are the same
– Any node can do the same work as all others
– Same specs
  • Operating system, installed software base
  • Hardware (RAM, disk, etc.)
• Virtual machines
– Containerization and provisioning (Docker, Rocket, Puppet, Chef, Ansible, SaltStack, Fabric, Capistrano, etc.)

Provides for:
• Scaling out/in as traffic increases/decreases
• Redundancy in the face of failures

Page 27: Resolving problems & high availability

Synchronizing session data

Page 28: Resolving problems & high availability

Load balancing and sessions

Session affinity (sticky sessions)

Page 29: Resolving problems & high availability

Session clustering

Page 30: Resolving problems & high availability

Session clustering

Page 31: Resolving problems & high availability

Session clustering
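Without session affinity, any node may serve a user's next request, so session data has to live somewhere every node can reach. The sketch below shows one common approach, a central database, using PHP's `SessionHandlerInterface` (PHP 8 syntax). It is not how Zend Server's session clustering works; `PdoSessionHandler` and the `sessions` table with columns `id`, `data`, and `updated_at` are assumptions made for this example.

```php
<?php
// Hypothetical handler: keeps sessions in a table shared by all cluster nodes.
// Assumed schema: sessions(id TEXT PRIMARY KEY, data TEXT, updated_at INTEGER).
final class PdoSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $pdo) {}

    public function open(string $path, string $name): bool { return true; }

    public function close(): bool { return true; }

    public function read(string $id): string|false
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();

        return $data === false ? '' : (string) $data; // '' means "no session yet"
    }

    public function write(string $id, string $data): bool
    {
        // Delete-then-insert inside a transaction avoids vendor-specific upsert syntax.
        $this->pdo->beginTransaction();
        $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
        $ok = $this->pdo
            ->prepare('INSERT INTO sessions (id, data, updated_at) VALUES (?, ?, ?)')
            ->execute([$id, $data, time()]);
        $this->pdo->commit();

        return $ok;
    }

    public function destroy(string $id): bool
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        // Remove sessions that have not been written within the lifetime.
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);

        return $stmt->rowCount();
    }
}
```

Each node would register the handler with `session_set_save_handler(new PdoSessionHandler($pdo), true)` before calling `session_start()`. The sketch omits session locking and error handling, and the shared database becomes a dependency that itself needs to be highly available.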

Page 32: Resolving problems & high availability

Best practices

How do you know?
• Monitoring

How do you diagnose?
• Log files
• Code tracing
• Z-Ray

How do you prevent?
• Testing!
• Load balancing
• Clustering

How do you minimize downtime?
• Support

Page 33: Resolving problems & high availability

Poll #3
How do you currently implement high availability sessions in a clustered environment?
- Central database (MySQL, PostgreSQL, Oracle, MariaDB)
- Memcached
- Redis
- Zend Server
- Other/We’re not clustered

Page 34: Resolving problems & high availability

Conclusion
• Reputation = f(reliability) + f(availability)

• Monitor for faults: know quickly when you have a problem

• Fault diagnosis is all about using the right tools

• Q: Scalability? A: Clustering!

• Sessions in clusters

Visit www.zend.com/en/resources/webinars for webinars

Visit devzone.zend.com for the Zend Developer Zone

Page 35: Resolving problems & high availability

Q & A

Page 36: Resolving problems & high availability

The fastest way to enterprise PHP

Free trial

www.zend.com

• Full, tested, secure PHP stack
• Z-Ray vision deep into your app
• Code tracing
• Job queuing and caching
• Deployment and DevOps
• High availability session clustering
• Backed by support & services

Page 38: Resolving problems & high availability

Don’t miss this premier PHP event!
Register at zendcon.com

• Visit with sponsors
• 90+ sessions in 6 tracks
