Server Monitoring (Scaling while bootstrapped)

By Ajibola Aiyedogbon

About me

Co-founder, Amebo App

Mobile Developer (Jobberman, GTBank WP, etc)

DevOps enthusiast

Before before...

1 server for everything

-1 users

J2ME only

What throughput!

Cloudinary as CDN

Deployment fails

High costs, ignorance is very expensive

Now

5+ servers

Hundreds of thousands of users

Multi Platform apps

18000 req/min throughput

Cloudflare as CDN

Deployments with zero downtime

Managed costs

Scaling rhymes with Failing!

Server Stack

1 load balancer (layer 4, high availability, failover*)

3 web servers (vertically & horizontally scalable)

1 database server (replication*, redundancy*)

1 staging server

$65 monthly serving over 100 million requests

Cloudflare is the secret weapon: it caches static requests (70%).
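The figures on this slide can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the 100 million requests are a monthly total and that the 70% is the share of requests answered from the Cloudflare cache; the function names are illustrative only.

```python
# Rough cost arithmetic using the numbers quoted on the slide.
# Assumptions: 100 million requests per month, 70% served from the CDN cache.
MONTHLY_COST_USD = 65
MONTHLY_REQUESTS = 100_000_000
CACHED_SHARE = 0.70


def cost_per_million(cost_usd, requests):
    """Infrastructure cost in USD for every million requests served."""
    return cost_usd / (requests / 1_000_000)


def origin_requests(requests, cached_share):
    """Requests that miss the CDN cache and reach the origin servers."""
    return round(requests * (1 - cached_share))


if __name__ == "__main__":
    print("USD per million requests:",
          cost_per_million(MONTHLY_COST_USD, MONTHLY_REQUESTS))
    print("Requests reaching the origin:",
          origin_requests(MONTHLY_REQUESTS, CACHED_SHARE))
```

Under these assumptions the stack costs about $0.65 per million requests, and roughly 30 million requests a month actually reach the servers behind the load balancer.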

Technology Stack

HAProxy (load balancer)

Nginx, PHP-FPM (web server, PHP interpreter)

Phalcon, Php-Resque (framework, scheduler)

Redis, MongoDB, MariaDB (in-memory cache, datastores)

Git (BitBucket), Packer, Ansible (server provisioning, code provisioning)

SetCronJob, CloudFlare, Fastly (3rd party)

Why IaaS not PaaS?

All about the pricing page!

Bandwidth costs too high

Code optimizations are hidden behind computing power

Mission critical? Offload to PaaS selectively; providers can disappear, e.g. Parse EOL, death by acquisition...

Why Monitor?

Don’t end up like these guys...

Get visibility

Improve usability & stability

Complicated technology stacks with hard to trace errors

Mission critical

More sleep!

What to monitor apart from everything?

Server Metrics (infrastructure)

RAM usage, spikes

Bandwidth usage, highs vs lows

CPU usage over time, peak usage

Disk I/O

Open source vs SaaS

Free mostly
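As an illustration of the RAM metric above, here is a minimal sketch that computes memory usage from the Linux `/proc/meminfo` format and flags spikes in a series of samples. It is not the tooling used in the talk; the function names and the 90% threshold are assumptions made for the example.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def ram_used_percent(meminfo):
    """Share of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def find_spikes(samples, threshold_percent=90.0):
    """Return the indexes of samples above the (assumed) alert threshold."""
    return [i for i, value in enumerate(samples) if value > threshold_percent]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    if "MemTotal" in info and "MemAvailable" in info:
        print("RAM used: %.1f%%" % ram_used_percent(info))
```

Collecting one sample per minute and passing the list to `find_spikes` gives a crude version of the "spikes" view that hosted monitoring tools draw as a graph.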

Server Metrics (services)

HAProxy stats

Nginx Stats

MySQL performance, etc.

Service *something* status
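HAProxy can export its stats page as CSV, with a header line that starts with `# pxname,svname,...`. The sketch below parses that format and lists servers whose `status` column begins with DOWN or MAINT. The sample text is made up and trimmed to a few columns; real output has many more, and how the CSV is fetched (stats URL or socket) is left out.

```python
import csv
import io


def parse_haproxy_stats(csv_text):
    """Turn HAProxy stats CSV text into a list of dicts keyed by column name."""
    text = csv_text.lstrip()
    if text.startswith("#"):
        text = text[1:].lstrip()  # the header line is prefixed with "# "
    return list(csv.DictReader(io.StringIO(text)))


def unhealthy_servers(rows):
    """Return (proxy, server) pairs whose status starts with DOWN or MAINT."""
    return [
        (row["pxname"], row["svname"])
        for row in rows
        if (row.get("status") or "").startswith(("DOWN", "MAINT"))
    ]


# Illustrative sample only; the proxy and server names are hypothetical.
SAMPLE = """# pxname,svname,scur,status
web,FRONTEND,12,OPEN
web,web1,4,UP
web,web2,0,DOWN
web,BACKEND,4,UP
"""

if __name__ == "__main__":
    for proxy, server in unhealthy_servers(parse_haproxy_stats(SAMPLE)):
        print("unhealthy:", proxy, server)
```

The same "fetch a status page, parse it, compare against a rule" pattern applies to the Nginx and database checks on this slide, each with its own output format.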

Application Errors

Catch-all exception handler (PHP)

User defined errors

3rd party Library errors
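The catch-all idea can be sketched as a single process-wide hook that turns any uncaught exception into a structured report. The talk's stack is PHP; the example below is a Python analogue using `sys.excepthook`, and `build_report`, `install_catch_all` and the `sink` callback are names invented for this illustration.

```python
import sys
import traceback


def build_report(exc_type, exc, tb):
    """Convert an exception into a dict that an error tracker could ingest."""
    return {
        "type": exc_type.__name__,
        "message": str(exc),
        "trace": "".join(traceback.format_exception(exc_type, exc, tb)),
    }


def install_catch_all(sink):
    """Route every uncaught exception to `sink`, then to the default handler."""
    def hook(exc_type, exc, tb):
        sink(build_report(exc_type, exc, tb))
        sys.__excepthook__(exc_type, exc, tb)  # keep the normal stderr output

    sys.excepthook = hook


if __name__ == "__main__":
    # Here the sink only prints; a real one might post to an error-tracking service.
    install_catch_all(lambda report: print("captured:", report["type"]))
```

User-defined errors and third-party library errors reach the same hook as long as nothing swallows them first, which is the main argument for installing it once at application start-up.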

Tech Stack (Application Performance Monitoring)

Request throughput

Resource usage

Service Health

Database monitoring

Infrastructure bottlenecks

Failure Alerts

Code Errors

High level overview with deep dive

Log Tracking

Better way to tail -f

Http stack errors & anomalies

Multiple log files from different services

Manual tailing is difficult

Get preconfigured graphs based on logs

All server traffic is logged, access_log
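To show what a log tracker does with the access log, here is a small sketch that parses lines in the common Nginx "combined" layout and counts responses by status code. The sample lines are made up, and the regular expression assumes the default log format has not been customised.

```python
import re
from collections import Counter

# Matches: addr - user [time] "request" status bytes ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def status_counts(lines):
    """Count responses per HTTP status code, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(4)] += 1
    return counts


def server_error_rate(counts):
    """Fraction of parsed requests that ended in a 5xx status."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items() if status.startswith("5"))
    return errors / total if total else 0.0


# Illustrative lines only; addresses, paths and user agents are hypothetical.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET /feed HTTP/1.1" 200 512 "-" "okhttp"',
    '203.0.113.9 - - [10/Oct/2016:13:55:37 +0000] "POST /login HTTP/1.1" 502 166 "-" "okhttp"',
]

if __name__ == "__main__":
    counts = status_counts(SAMPLE_LINES)
    print(dict(counts), server_error_rate(counts))
```

Hosted log services run this kind of aggregation continuously across every server, which is what makes them a better way to `tail -f` several files at once.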

Client Errors (Mobile)

Client side stack traces post deployment

Valuable version & device insight

Very handy at debug time & post

Catch all errors …. mostly

Memory leaks & stack traces

3rd party library errors or platform errors

Open Source vs Proprietary

Vendor lock-in

Community support

DIY vs training

Industry standards & experience

Fault tolerance

Enterprise customer experience

3rd Party vs Native monitoring tools

Core business?

Pricing again!

Support lifecycle and responsiveness

Product version, beta or 5.0?

Dashboard simplicity

Security implications? firewalled? https? localhost only? Install certs?

Too many alerts….!

What now?

Congratulations, your reward is more work!

Customize alerts

Fix errors

Webhooks

Send to slack

Ignore at own risk
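One way to tame "too many alerts" is to suppress repeats of the same alert for a cooldown period before forwarding it to Slack. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; `AlertThrottle`, the 300-second default and the webhook URL passed in by the caller are all illustrative choices, not part of the talk.

```python
import json
import time
import urllib.request


class AlertThrottle:
    """Let an alert key through at most once per cooldown window."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown = cooldown_seconds
        self._last_sent = {}

    def should_send(self, key, now=None):
        now = time.time() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # same alert fired recently, stay quiet
        self._last_sent[key] = now
        return True


def slack_payload(title, detail):
    """Build the JSON body for a Slack incoming webhook."""
    return json.dumps({"text": "*%s*\n%s" % (title, detail)}).encode("utf-8")


def send_to_slack(webhook_url, payload, timeout=5):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would check `throttle.should_send("db-down")` before calling `send_to_slack`, so a flapping service produces one message per window instead of hundreds.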

Be like this guy….or not!

Graphs on graphs on graphs on graphs

Information overload is real

Customize dashboard

Overviews only

Deep dive early to be familiar with dashboard

What Next? Setup BugSnag

Conclusion

Why Monitor

What to Monitor

How to monitor

Pricing

Dashboards

Discuss your stack with peers

Thank You

@Ajibz
