Actionable Logging For Smoother Operation and Faster Recovery


This talk focuses on tips for making application logs more useful to operations staff, particularly after a pager goes off at 3 a.m. We'll look at methods for managing logs, separating problems from metrics, and working with developers, with the goal of faster recovery after problems and automation of simple fixes.

1. Actionable Logging For Smoother Operation and Faster Recovery

  • Mandi Walls, AOL, LLC
  • June 23, 2008
  • Velocity 2008

2. Actionable Logging

  • What is Actionable
  • Goals of logging in production
  • Logging quality information
  • Improving log contents

3. Actionable

  • No nonsense logging
  • Concise, easy to understand
  • Express symptoms of production issues
  • Anything that makes it into the log is something that needs to be fixed

4. Why It's Important

  • Expending resources on production systems
  • The point of logging in production
  • Diagnosis of issues
  • The 4am Test

5. Logging Goals

  • Diagnosis and recovery
  • Statistics and monitoring
  • Provide insight into the behavior of the application
  • Indicate potential issues, and areas for improvement
  • Not the same goals as development and QA environments!

6. Types of Logs

  • Access log
  • Server log, e.g., catalina.out
  • Application logs
  • Special use logs for recording specific groups of activities

7. Log File Location

  • Where logs are located on the system should be predictable and obvious
  • It may be helpful to locate logs on different disk partitions but link them back to the app
  • Keep older logs in an obvious place

8. Log File Management

  • Everyone has their own method
  • Roll logs into files named with timestamps rather than sequence numbers:
    • host-01.log.003 vs host-01.log.06202008
  • Roll all the logs at the same time for a given app to make coordination of events easier
  • Roll when the app needs logs rolled: hourly, daily, weekly
  • Don't rely on STDOUT or server files that can't be rolled without a hassle
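
The rolling advice above can be sketched with Python's standard `logging` module. This is a minimal sketch, not the approach used in the talk: the log names, the `myapp.` logger prefix, and the 14-file retention are assumptions chosen for illustration. Each log for the app gets a `TimedRotatingFileHandler` on the same schedule, so rolled files carry a date suffix (the handler's default for `when="midnight"` is `%Y-%m-%d`) instead of a sequence number.

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler

LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s: %(message)s"


def make_rolling_loggers(log_dir, names, when="midnight", keep=14):
    """Create one file logger per name, all rolling on the same schedule.

    `names`, the "myapp." prefix, and `keep` are hypothetical values.
    """
    loggers = {}
    for name in names:
        handler = TimedRotatingFileHandler(
            os.path.join(log_dir, name + ".log"),
            when=when,          # same schedule for every log of this app
            backupCount=keep,   # number of rolled files to retain
            delay=True,         # do not create the file until the first write
        )
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger = logging.getLogger("myapp." + name)
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep entries out of the root logger / STDOUT
        loggers[name] = logger
    return loggers
```

Calling `make_rolling_loggers("/var/log/myapp", ["access", "application"])` (the path is an assumption) would write `access.log` and `application.log`, each rolled to a file such as `application.log.2008-06-20` at the first write after midnight.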

9. Logging Quality Information

  • Logs should be expressive but not overly verbose
  • Keys to making logs more actionable:
    • Appropriate Formats
    • Quality Messages

10. Quality Information: Format

  • Timestamping: what not to do
    • 1213988938:tvdata shows/617/306
    • 1213988939:tvdata shows/618/307
    • 20/130055 err(4) lang-locale es-us not found
    • SEVERE: Error listenerStart

11. Quality Information: Format

  • Timestamps that mean something
    • Jun 19, 2008 4:20:25 PM org.apache.catalina.startup.Catalina start
    • - - [20/Jun/2008:15:15:58 -0400] "GET /monitors HTTP/1.0" 200 2300.049909
  • Good timestamps give context for linking to external events like network outages or traffic anomalies
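
To illustrate the difference, the sketch below renders one of the raw epoch values from the previous slide in the access-log style shown above. The function name `readable` is an assumption for illustration; the formatting relies only on the standard `datetime` module and assumes the default C/English locale for the month abbreviation.

```python
from datetime import datetime, timedelta, timezone


def readable(epoch_seconds, tz=timezone.utc):
    """Render epoch seconds as day/month/year:time plus a UTC offset."""
    moment = datetime.fromtimestamp(epoch_seconds, tz)
    return moment.strftime("%d/%b/%Y:%H:%M:%S %z")


print(readable(1213988938))
# 20/Jun/2008:19:08:58 +0000
print(readable(1213988938, timezone(timedelta(hours=-4))))
# 20/Jun/2008:15:08:58 -0400
```

Including the offset lets an entry be lined up against events recorded in other time zones without guessing which clock the server used.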

12. Quality Information: Format

  • Other considerations in log file format include:
    • Creating a common format for multiple products and log types
    • Limiting the number of entries that span multiple lines, so logs parse faster
    • Deciding how much is too much information
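
One possible way to apply the first two considerations is sketched below with Python's `logging.Formatter`. The layout loosely imitates the bracketed entries shown elsewhere in these slides, but the exact field choice, the ` | ` separator, and the class name `OneLineFormatter` are assumptions, not a format the talk prescribes.

```python
import logging

# A single layout that every product and log type could share (assumed fields).
COMMON_FORMAT = "[%(asctime)s][%(process)d][%(name)s] %(levelname)s: %(message)s"
COMMON_DATEFMT = "%d/%b/%Y:%H:%M:%S"


class OneLineFormatter(logging.Formatter):
    """Collapse multi-line entries (including tracebacks) onto one line."""

    def format(self, record):
        text = super().format(record)
        parts = (part.strip() for part in text.splitlines())
        return " | ".join(part for part in parts if part)


handler = logging.StreamHandler()
handler.setFormatter(OneLineFormatter(COMMON_FORMAT, COMMON_DATEFMT))
```

Collapsing tracebacks trades some readability for entries that line-oriented tools can parse one line at a time; whether that trade is worthwhile depends on how the logs are consumed.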

13. Quality Info: Good Messages

  • Here are some bad messages:
    • [19/Jun/2008:11:14:03][14960.229405][-conn:thread::6] Error: $$$$$$$$$$$$$$$
    • [19/Jun/2008:11:58:32][32652.67698738][channels_news] Notice: My gallery : xl
    • [19/Jun/2008:12:03:29][32652.67010608][channels_games] Notice: 0
    • [19/Jun/2008:11:58:28][32652.67715090][channels_money] Error: ViewCounter: APP2 returns statusCode=400, statusText=Invalid request
  • Other things to avoid: messages that contain only numerical error codes
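
As a rough sketch of the alternative to code-only messages, the helper below pairs a numeric code with a plain-language reason and whatever context the caller has. The function name `describe_failure`, the message layout, and the `request` value are all hypothetical; only the component, backend, code, and status text are borrowed from the ViewCounter entry above.

```python
def describe_failure(component, code, reason, **context):
    """Build a message that states what failed and why, not just a code."""
    # Sort the context keys so the same failure always produces the same text.
    details = " ".join(f"{key}={value!r}" for key, value in sorted(context.items()))
    return f"{component}: {reason} (code {code}) {details}".rstrip()


print(describe_failure("ViewCounter", 400, "Invalid request",
                       backend="APP2", request="/hypothetical/path"))
# ViewCounter: Invalid request (code 400) backend='APP2' request='/hypothetical/path'
```

The numeric code stays in the message for anyone who needs to look it up, but the reader at 4 a.m. does not have to.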

14. Quality Info: Good Messages

  • Here are some messages that are reasonably helpful:
    • [19/Jun/2008:12:03:30][32652.66764839][channels_money] Notice: INFO_FEED: moduleId(283403) failed with url=
    • [19/Jun/2008:12:09:52][32652.68059183][channels_news] Error: can't read "useragent": no such variable
  • One that needs a little tweaking:
    • [19/Jun/2008:00:01:48][15446.36831248][channels_games] Error: dom parse timeout doc: error "syntax error" at line 1 character 0
    • "t