The Art and Science of Debugging PPW 2011 Brock Wilcox [email protected]

whoami

liquidation.com

Perl
MySQL
Apache (soon Starman!)
Linux

RT
Mercurial
Jenkins
Custom dev build tools

11 Perl developers (hiring)

Revision control back to 2002 (RT comments since 2007)

Debugging

Bug?

Undesired Behavior?

Developer vs Operations

My arbitrary process breakdown

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

I call this the rrrccp bugfix cycle.

r3c2p

(not really)

Reproduce / Diagnose / Fix / Reflect

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

Most important:

Reproduce

This talk:

Reduce / Comprehend

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

Report Quality

"The program doesn't work"

Gee, thanks.

Good reports answer these:

• How is it working now?
• How is it supposed to work?

Good reporters collect these:

• What are you trying to achieve?
• Is this one issue or many?
• Is this a code issue?
• When did the issue start?
• How often does it occur?
• Is it reliably reproducible?
• Are there test-steps?
• Screenshot / Screencast?
• Is there a work-around?

Help your bug-reporters to report bugs better.

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

Ship in a Bottle

You do have a test environment, right?

State

The biggest enemy:

Non-determinism.

State before, execute test, state after
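A minimal sketch of the before / execute / after idea, assuming the state fits in a plain hash. The `%inventory` data and the `move_pallet`, `snapshot`, and `changed_keys` subs are hypothetical stand-ins for a real application and its database, not anything from the talk.

```perl
use strict;
use warnings;

# Hypothetical stand-in for real application state (a DB table, say).
my %inventory = (pallet_42 => 'dock_a');

# Hypothetical code under test: moves a pallet to a new location.
sub move_pallet {
    my ($state, $pallet, $to) = @_;
    $state->{$pallet} = $to;
    return;
}

# Take a shallow copy so later changes do not alter the snapshot.
sub snapshot {
    my ($state) = @_;
    my %copy = %$state;
    return \%copy;
}

# Return the keys whose values differ between two snapshots.
sub changed_keys {
    my ($before, $after) = @_;
    my %seen = map { $_ => 1 } keys %$before, keys %$after;
    my @diff = grep {
        ($before->{$_} // '') ne ($after->{$_} // '')
    } keys %seen;
    return sort @diff;
}

my $before = snapshot(\%inventory);                  # state before
move_pallet(\%inventory, 'pallet_42', 'dock_b');     # execute test
my $after  = snapshot(\%inventory);                  # state after

for my $key (changed_keys($before, $after)) {
    print "changed: $key\n";
}
```

Diffing two snapshots makes side effects visible even when the code under test returns nothing useful.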

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

Shrink That Ship, Yo

If you discover the essence of the problem, enlightenment will follow.

How?

Science!

(It works, bitches!)

Hypothesize, test, repeat...

Form a Hypothesis

Test your Hypothesis

(hypothesis must be testable!)

Iterate until a proven hypothesis is found

Propose a solution

Iterate until solution is proven

Regression Test

Let's begin! But where?

When in doubt, look about.

Examine Symptoms

Failing tests

Buildbot History

Similar Problems

Recent Changes

Quickly Find Changes

Trace back to ticket

Navigate libraries

Find references (ack)

Look at relevant side-effects

eg. DB content before/after

So what ACTUALLY happened?

Live vs Dead

Debugger vs Logging

Logging

Timeline of events

Blunt Logging Tools:
• print "Here!\n"
• Carp::cluck()
• Devel::Trace (Devel::Trace::More)
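As a rough illustration of the second blunt tool: `Carp::cluck` warns with a full stack trace, which answers "how did I get here?" without a debugger. The `inner` and `outer` subs are made up for the example, and the `$SIG{__WARN__}` handler is only there so the trace can be captured and printed; normally it would go straight to STDERR.

```perl
use strict;
use warnings;
use Carp qw(cluck);

# Hypothetical call chain: outer() calls inner(), and inner() wants to know
# who called it.
sub inner { cluck "inner reached"; return 1 }
sub outer { return inner() }

# Capture the warning instead of letting it go to STDERR.
my @captured;
{
    local $SIG{__WARN__} = sub { push @captured, @_ };
    outer();
}

print @captured;
```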

Logging Tools:
• Log::Log4perl
• Log::Dispatch
• Framework logging
• Debug mode of libraries
• PSGI Middleware

Bonus Logging Tools:
• Devel::NYTProf
• Devel::Cover

Types of Logging:
• Temporary vs permanent (vs Conditional)

LIQ::debug user => $user, address => $user->get_primary_address;
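`LIQ::debug` is in-house code, so its internals are not shown here. A hypothetical helper with the same key => value call style might look like the sketch below; the names `debug_log` and `format_debug` and the `APP_DEBUG` environment switch are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Turn a flat key => value list into one log-friendly line.
sub format_debug {
    my (@pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, "$key=" . (defined $value ? $value : 'undef');
    }
    return join ' ', @parts;
}

# Conditional logging: stays in the code permanently, but only emits
# when the (assumed) APP_DEBUG environment variable is set.
sub debug_log {
    return unless $ENV{APP_DEBUG};
    my (undef, $file, $line) = caller;
    warn sprintf "[debug %s:%d] %s\n", $file, $line, format_debug(@_);
    return;
}

debug_log(user => 'alice', address => '123 Main St');
```

Run with `APP_DEBUG=1` to see the line on STDERR; without it the call is silent.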

You're already doing it

Edit, execute, inspect. Repeat.

Debugger

Logging gives you a rough idea

Enter the debugger w/ breakpoint

Inspect AND modify!

Redefine subs / callbacks
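A small sketch of redefining a sub at run time, assuming a hypothetical `fetch_rate` that you want to override while investigating. Here `local` puts the original back when the block exits. At a `perl -d` prompt a plain glob assignment (without `local`) is one way to get a similar effect for the rest of the session.

```perl
use strict;
use warnings;

# Hypothetical function whose result we want to override while investigating.
sub fetch_rate { return 0.07 }

sub total {
    my ($amount) = @_;
    return $amount * (1 + fetch_rate());
}

my $normal = total(100);

my $patched;
{
    # Swap in a replacement; local restores the original on block exit.
    no warnings 'redefine';
    local *fetch_rate = sub { return 0 };
    $patched = total(100);
}

my $restored = total(100);

printf "normal=%.2f patched=%.2f restored=%.2f\n",
    $normal, $patched, $restored;
```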

Otherwise: Edit, debugger-execute, inspect. Repeat

Debugger Tools
• perl -d
• Devel::ebug
• Devel::Enbugger
• Devel::REPL (Eval::WithLexicals)
• PSGI Middleware

(logger vs debugger, a digression)

Testability

MAKE it testable

Automate 'execute' step with tests!

Case Study

Automated test failing:
• Import file
• Get obj from DB
• Verify obj (FAIL)

ERROR: Date not set in obj

HOWEVER: Date IS set in DB!

Wha?

Hint:
• Import file
  • Using ORM
• Get obj from DB
  • Using ORM
• Verify obj

How does the date get into the DB?

Trace w/ SQL Log

Auto-set on insert by ORM

... but not copied back to obj!
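The shape of this bug can be sketched with a toy in-memory "ORM". Everything below is hypothetical: the `%db` and `%cache` hashes, the `orm_insert` and `orm_get` subs, and the assumption that the ORM hands back the object it already holds rather than re-reading the row. It mirrors the symptom only, not the real code from the case study.

```perl
use strict;
use warnings;

my (%db, %cache);    # toy "table" and toy object cache
my $next_id = 1;

sub orm_insert {
    my ($obj, %opt) = @_;
    my $id = $next_id++;
    $obj->{id} = $id;

    # The default date is applied to the stored row only.
    $db{$id} = { %$obj, date => $obj->{date} // 'TODAY' };

    # The fix: copy generated values back onto the object.
    $obj->{date} = $db{$id}{date} if $opt{copy_back};

    $cache{$id} = $obj;
    return $id;
}

sub orm_get {
    my ($id) = @_;
    return $cache{$id} if exists $cache{$id};
    my %copy = %{ $db{$id} };
    return \%copy;
}

my $buggy_id = orm_insert({ name => 'lot 1' });
my $fixed_id = orm_insert({ name => 'lot 2' }, copy_back => 1);

for my $id ($buggy_id, $fixed_id) {
    printf "id=%d db date=%s obj date=%s\n",
        $id, $db{$id}{date}, orm_get($id)->{date} // '(not set)';
}
```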

This is why you shouldn't write your own ORM.

Other Tips

Error Seeding

Verify Assumptions

Is that variable still the variable that you think it is?

say "obj: $obj";

Compiler help

Change one variable at a time

Watch for edge cases

Especially off-by-one

And type-o

Nondeterminism can be caused by resource consumption

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

You know, enlightenment and stuff.

Types of bugs

"Programming Bug"

Code does not do what you intended.

"Logic Bug"

It's doing exactly what you wanted it to.

"Timing Bug"

(aka: concurrency bug)

One of the worst bugs ever!

Things I hear

I ran into this impossible situation...

Clearly it isn't.

Given so much time, the "impossible" becomes possible, the possible probable, and the probable virtually certain. One has only to wait: time itself performs the miracles.

-- George Wald

Vizzini: HE DIDN'T FALL? INCONCEIVABLE.

Inigo Montoya: You keep using that word. I do not think it means what you think it means.

Case Study

Deploy Apache2 + PSGI/Starman

Great!

Except...

Scan Pallet -- OK!
Move Pallet -- OK!
Scan Pallet again... FAIL

First issue: reproducing

Hardware / networking issues :(

Emulator!

Watch HTTP Traffic

Er... SSL...

Proxy!

Er... bug goes away.

Heisenbug!

Sniff the traffic. Can't see content because of SSL.

But can still see that it is a caching issue

Hook up post-SSL logging (server side)

No workey without PSGI?

Ahh! Apache1 vs Apache2

Eliminate PSGI/Starman as cause

Look for changes caused by Apache2

Fix mod_deflate issue on IE4

... Wait for next release for PSGI

Anyway

Understand the cause.

Often, go back to reduction :)

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

Don't do X

Do Y instead

Keep it!

Programming Bug -- fix it

Logic Bug -- fix it

(but make sure it is still ok)

Timing Bug -- mutex?

Design Flaw

Back to the drawing board :)

Report / Reproduce / Reduce / Comprehend / Correct / Prevent

Never fix again

Similar bugs?

Fix 'em!

Test suite!

In-code assistance

Non-code assistance

Other Resources

Old people

Other Developers

Operations

Books

Take a break

Questions?

The End

2012 DC-Baltimore Perl Workshop

BONUS SLIDES

bisect

This thing used to work!

But I don't know when it stopped working.

Pick a revision where it did work (last release?)

Pick a revision where it DOESN'T work (like... now)

Works best if you have a test

$ git bisect start
$ git bisect bad
$ git bisect good v1.2.3

$ git bisect run sh -c 'prove t/blah.t | grep PASS'
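`git bisect run` decides from the command's exit status: 0 marks the revision good, 125 tells it to skip the revision, and other codes from 1 to 127 mark it bad. A small wrapper can make that explicit. The script below is only a sketch; the name `bisect.pl` and the rule "skip when the test file does not exist yet" are assumptions for illustration.

```perl
use strict;
use warnings;

# Map what we learned about a revision to an exit code for git bisect run.
sub bisect_exit_code {
    my ($testable, $passed) = @_;
    return 125 unless $testable;    # cannot judge this revision: skip it
    return $passed ? 0 : 1;         # 0 = good, 1 = bad
}

# Only act when given a test file, e.g.:
#   git bisect run perl bisect.pl t/blah.t
if (@ARGV) {
    my ($test_file) = @ARGV;
    my $testable = -e $test_file;
    my $passed   = $testable && system('prove', $test_file) == 0;
    exit bisect_exit_code($testable, $passed);
}
```

Skipping revisions that cannot be tested keeps bisect from blaming a commit just because the test had not been written yet.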

chomping

(aka delta debugging)

I've never used this in Real Life.
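For the curious, here is a much-simplified sketch of the idea: drop chunks of a failing input, keep any smaller input that still fails, and halve the chunk size until single items are tried. It is a greedy approximation and not the full delta debugging algorithm. The `reduce_input` sub and the example predicate are hypothetical.

```perl
use strict;
use warnings;

# Shrink a failing input. $still_fails is a callback that takes an array
# reference and returns true if that input still triggers the bug.
sub reduce_input {
    my ($items, $still_fails) = @_;
    my @current = @$items;
    my $chunk   = int(@current / 2) || 1;

    while (1) {
        my $start = 0;
        while ($start < @current) {
            my @candidate = @current;
            splice @candidate, $start, $chunk;    # drop one chunk
            if ($still_fails->(\@candidate)) {
                @current = @candidate;            # keep the smaller input
            }
            else {
                $start += $chunk;                 # chunk is needed, move on
            }
        }
        last if $chunk == 1;
        $chunk = int($chunk / 2);
    }
    return \@current;
}

# Pretend the bug only shows up when both 3 and 6 are in the input.
my $fails_with_3_and_6 = sub {
    my %have = map { $_ => 1 } @{ $_[0] };
    return $have{3} && $have{6};
};

my $reduced = reduce_input([1 .. 8], $fails_with_3_and_6);
print "reduced input: @$reduced\n";
```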