Crash Fast & Furious


  • Crash Fast & Furious

    Pierre-Yves Ricau / @Piwai

    Run Keeper, one morning: at the end of my run, I hit save and it crashed.

  • Source: https://t.co/uH1EqxqAow

    * Study that's available on hp.com

    * Stole this slide from Doug Sillars

  • Not your fault: fragmentation, manufacturer bugs, the lifecycle, Fragments.

  • * Why do they waste their time on this?

    * Why don't they fix the crashes first?

    * A crash is your fault, even when it's not your own code that is at fault.

  • What's a crash?

  • Reminder: Android = Linux, 1 app = 1 VM = 1 process

    Crash: something bad happened, so we need to kill that process and restart it.

  • * Threads: each thread can have its own uncaught exception handler

    * Exceptions bubble up and are delegated to exception handlers

    * If the thread has no handler, the exception goes to the static default handler
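The delegation order above can be checked on a plain JVM. This is a minimal sketch using only the standard `Thread.UncaughtExceptionHandler` API; the class and method names are hypothetical. It restores the previous default handler so it does not leak into the rest of the process.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Shows which handler receives an uncaught exception: the thread's own handler
// if one is set, otherwise (via the ThreadGroup) the static default handler.
class HandlerResolutionDemo {

    /** Runs a thread that throws and returns which handler received the exception. */
    static String whoHandles(boolean setPerThreadHandler) {
        List<String> calls = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler previousDefault =
                Thread.getDefaultUncaughtExceptionHandler();
        // Static default: used by every thread that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> calls.add("default"));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            });
            if (setPerThreadHandler) {
                // Per-thread handler: takes precedence over the default one.
                worker.setUncaughtExceptionHandler((thread, error) -> calls.add("per-thread"));
            }
            worker.start();
            // The handler runs on the worker thread before it terminates,
            // so it has finished by the time join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previousDefault);
        }
        return calls.get(0);
    }
}
```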

  • * Focus on crashes in Java land

    * An uncaught exception is delegated to the default handler
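Crash reporting tools generally hook in at this point by replacing the default handler while keeping the previous one. The sketch below assumes that chaining pattern; `CrashRecorder` is a hypothetical name, and its in-memory list stands in for writing a report to disk. Delegating to the previous handler matters: on Android that handler is the platform's, which does the logging and process kill.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical crash recorder: records the crash, then hands it to whatever
// default handler was installed before it, so the normal behavior still runs.
class CrashRecorder implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;
    // Stand-in for persisting a report; a real client would write to a file.
    final List<String> reports = new ArrayList<>();

    private CrashRecorder(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous;
    }

    /** Installs a recorder in front of the currently installed default handler. */
    static CrashRecorder install() {
        CrashRecorder recorder = new CrashRecorder(Thread.getDefaultUncaughtExceptionHandler());
        Thread.setDefaultUncaughtExceptionHandler(recorder);
        return recorder;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        try {
            // Keep this step small: the process is about to die.
            synchronized (reports) {
                reports.add(thread.getName() + ": " + error);
            }
        } finally {
            if (previous != null) {
                previous.uncaughtException(thread, error); // original handler still runs
            }
        }
    }
}
```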

  • * main: the default handler is installed when the program starts

    * It logs the exception, shows the crash dialog, and kills the process.

  • * How many people click Report?

    * What do most people do?

    * Can't use Play Store crash reports

  • * You can create your own.

    * Don't do that: the client is easy, the backend is hard.
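To see why the client is the easy half: turning a `Throwable` into a report payload takes a few lines of standard Java. This is a sketch; the plain-text format and the `CrashReportClient` name are assumptions. Storing, grouping, deduplicating and searching those reports on a server is the part the talk advises against building yourself.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical minimal client: builds a text report from a Throwable.
// Uploading and everything on the backend side is deliberately left out.
class CrashReportClient {
    static String toReport(String appVersion, Throwable error) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        writer.println("version: " + appVersion);
        error.printStackTrace(writer); // full trace, including "Caused by:" sections
        writer.flush();
        return buffer.toString();
    }
}
```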

  • * Crashlytics: closed source, UI for noobs

    * ACRA: OSS client, free, host your own backend

    * Bugsnag: OSS client, small teams, small API, scaling issues

  • Native Crashes

    * A signal is sent to the process: you need a signal handler

    * Uses Google Breakpad

    * Fake exception

  • First thing to look at?

    => stack trace

  • * Line numbers => check out the correct version of the sources

    * Smart stacktrace => stupid

  • * Stack trace: quick fix of a simple error

    * Who started the animation, and why?

    * Stack trace on the server: each frame is a layer

    * Callbacks => loss of info. The stack trace is not enough.
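One common workaround for the lost context, offered here as an assumption rather than the talk's own code: capture a `Throwable` when a callback is posted, and attach it to any exception the callback later throws. `TracedExecutor` is a hypothetical name; `addSuppressed` is standard Java and shows up as a "Suppressed:" section in the printed trace. Capturing a stack on every post has a cost, so this is usually kept to debug builds.

```java
import java.util.concurrent.Executor;

// Hypothetical wrapper: remembers where each task was posted from, so a crash
// inside the task also shows the code that scheduled it.
final class TracedExecutor implements Executor {
    private final Executor delegate;

    TracedExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Records the stack of the caller that scheduled the task.
        Throwable postedAt = new Throwable("Task posted from here");
        delegate.execute(() -> {
            try {
                task.run();
            } catch (RuntimeException | Error e) {
                e.addSuppressed(postedAt); // keep the scheduling stack in the report
                throw e;
            }
        });
    }
}
```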

  • Reproducing

    * How can we reproduce the crash?

  • * Associate the customer id with the crash

    * Best is to ask the customer what they did.

    * Great for alpha / internal testing.

  • * Crashing at startup is the worst. Crashing right after the user did some work is second.

    * Asking for feedback channels frustration and avoids 1-star reviews.

  • * Custom crash dialog that asks for feedback.

    * Good idea: offer a link to contact support

    * Emotional connection to customer.

  • * Can't ask for feedback while crashing: display a popup on restart.

    * Risky: what if the app crashes on restart? Don't restart twice. Maybe use a dedicated crash activity in a different process.

    * Twitter & Facebook seem to do that.
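The restart logic above can be sketched as a small gate: record a marker at crash time (cheap and safe), ask for feedback on the next launch, and refuse to restart when the crash happened right after a launch. The `Map` stands in for persistent storage such as a file written in the crash handler; the class name, keys and the 10-second window are all assumptions.

```java
import java.util.Map;

// Hypothetical gate for "ask for feedback on next launch" without restart loops.
final class CrashFeedbackGate {
    static final long RESTART_LOOP_WINDOW_MS = 10_000; // assumed threshold
    private final Map<String, Long> store; // stand-in for persistent storage

    CrashFeedbackGate(Map<String, Long> store) {
        this.store = store;
    }

    /** Called on every app start. */
    void onLaunch(long nowMs) {
        store.put("lastLaunchMs", nowMs);
    }

    /** Called from the uncaught exception handler: only writes a marker, no UI. */
    void onCrash(long nowMs) {
        store.put("lastCrashMs", nowMs);
    }

    /** Don't restart twice: skip the restart if we crashed right after launching. */
    boolean shouldRestartAfterCrash(long nowMs) {
        Long lastLaunch = store.get("lastLaunchMs");
        return lastLaunch == null || nowMs - lastLaunch > RESTART_LOOP_WINDOW_MS;
    }

    /** On startup: true exactly once after a crash, so the UI can ask what happened. */
    boolean consumePendingFeedbackRequest() {
        return store.remove("lastCrashMs") != null;
    }
}
```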

  • Static info

    * Different UI, different code path => isTablet helps identify the problem

    * isTablet: sw600dp

    * App version number + SHA version numbers for dev builds
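A sketch of collecting this static info, written as plain Java so it runs off-device. On Android, the width input would come from `getResources().getConfiguration().smallestScreenWidthDp`, which matches the rule behind the `sw600dp` qualifier; the version name and commit SHA would typically be injected at build time. The `StaticCrashInfo` name, the key names and the customer id field are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the static metadata attached to each report.
final class StaticCrashInfo {
    /** Same rule as the sw600dp resource qualifier: smallest width of at least 600dp. */
    static boolean isTablet(int smallestScreenWidthDp) {
        return smallestScreenWidthDp >= 600;
    }

    static Map<String, String> collect(String versionName, String gitSha, boolean devBuild,
                                       int smallestScreenWidthDp, String customerId) {
        Map<String, String> info = new LinkedHashMap<>();
        // Dev builds can share a version name, so the commit SHA tells them apart.
        info.put("version", devBuild ? versionName + "-" + gitSha : versionName);
        info.put("isTablet", String.valueOf(isTablet(smallestScreenWidthDp)));
        // Lets support find the crash when a customer gets in touch.
        info.put("customerId", customerId);
        return info;
    }
}
```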

  • * Picture of what the screen looked like at the time of the crash.

    * Bitmap? Too big. Upload a description of the view hierarchy instead.
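A text description of the view tree is a few hundred bytes instead of a full bitmap. The sketch below uses a hypothetical `ViewNode` class as a stand-in for `View` / `ViewGroup` so it runs on a plain JVM; which attributes to dump (type, id, visibility) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Android view tree, dumped as indented text.
final class ViewNode {
    final String type;
    final String idName; // null when the view has no id
    final boolean visible;
    final List<ViewNode> children;

    ViewNode(String type, String idName, boolean visible, ViewNode... children) {
        this.type = type;
        this.idName = idName;
        this.visible = visible;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    /** One line per view, indented by depth. */
    static String describe(ViewNode root) {
        StringBuilder out = new StringBuilder();
        append(root, 0, out);
        return out.toString();
    }

    private static void append(ViewNode node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.type);
        if (node.idName != null) {
            out.append(" #").append(node.idName);
        }
        if (!node.visible) {
            out.append(" (hidden)");
        }
        out.append('\n');
        for (ViewNode child : node.children) {
            append(child, depth + 1, out);
        }
    }
}
```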

  • Current screen

    * What the user is looking at

  • Find all windows: Espresso RootsOracle

  • * Black box.

  • History: high-level log

    * Steps of the user + internal state changes

    * Look at the log, reproduce the steps

    * Navigation + HTTP calls

  • * OkHttp interceptor
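The history log can be sketched as a bounded "flight recorder": keep the last N high-level events (navigation, HTTP calls, state changes) and attach them to the crash report. An HTTP hook such as an OkHttp interceptor would be one caller of `record()`. The class name, capacity and event strings are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical black box: thread-safe ring buffer of the most recent events.
final class FlightRecorder {
    private final int capacity;
    private final Deque<String> events = new ArrayDeque<>();

    FlightRecorder(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    synchronized void record(String event) {
        if (events.size() == capacity) {
            events.removeFirst(); // drop the oldest entry
        }
        events.addLast(event);
    }

    /** Oldest first, ready to be appended to a crash report. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }
}
```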

  • OOM: Stack trace is useless.

  • squ.re/leakcanary

    Detect memory leaks

  • * UI has validation rules

    * Bug: somehow they were not enforced

    * At crash time, it's too late to do something about it.

  • Exception =

    something unexpected happened

    What do you do when something unexpected happens and the app crashes?

    * We somehow got a blank email

  • Defensive programming

    * Can't figure it out.

    * Fatal condition, shouldn't happen: problem ignored.

    * Payments.

  • Offensive programming

    Crash Fast

    * Detect problems early

    * Complain as loudly as possible

    * Quality of code increases

    * If you can't understand it, make the problem happen earlier, and ship it.
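Applied to the blank email case, crashing fast means rejecting the bad value where it enters the system instead of ignoring it later in the send path. This is a sketch with a hypothetical `EmailAddress` value class. The exception then points at the code that produced the blank value, not at some later consumer.

```java
import java.util.Objects;

// Hypothetical value class: an invalid email can't exist past construction.
final class EmailAddress {
    final String value;

    EmailAddress(String value) {
        Objects.requireNonNull(value, "email == null");
        if (value.trim().isEmpty()) {
            // Crash here, loudly, where the bad value is created, rather than
            // silently skipping the receipt later on.
            throw new IllegalArgumentException("blank email: UI validation was bypassed");
        }
        this.value = value;
    }
}
```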

  • Exception Grouping

    * Exceptions thrown by a common Preconditions class might be grouped together in the crash reporting tool.
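Grouping often keys on the top stack frame, so every failed check can end up in one bucket under the helper method. One possible workaround, an assumption here rather than something any particular tool prescribes, is to drop the helper's own frame so the caller becomes the top frame. The `Preconditions` class below is hypothetical and unrelated to any library class of that name.

```java
import java.util.Arrays;

// Hypothetical precondition helper that removes itself from the stack trace.
final class Preconditions {
    static void checkState(boolean condition, String message) {
        if (!condition) {
            IllegalStateException error = new IllegalStateException(message);
            StackTraceElement[] frames = error.getStackTrace();
            // frames[0] is checkState itself; keep everything from the caller down.
            if (frames.length > 1) {
                error.setStackTrace(Arrays.copyOfRange(frames, 1, frames.length));
            }
            throw error;
        }
    }
}
```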


  • More assertions = more crashes.

    How to keep low impact on customers?

  • Integration tests

    * Writing a feature = writing UI tests

    * Espresso

    * Run on VMs, no real devices. Parallelized

    * 20 min total build

  • Smoke testing

    * Manual QA

    * Testing parties

    * Internal and paid external testers

  • Dogfood / Beta

    * Internal releases: hard, need a use case => lunch.

    * Dogfood at sellers

    * Betas work better.

  • Staged Rollout

    * Test the waters

    * Ship to 5%, 10%, evaluate crash rate and do minor dot releases

  • * Raw crash numbers: not useful

    * Most important feature: take payments

    * Crashes per transaction
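Normalizing by the key action makes releases with very different usage comparable. This is a minimal sketch; expressing the rate per thousand transactions is an assumption, and the figures in the usage note are made up for illustration.

```java
// Hypothetical metric: crashes normalized by payments taken.
final class CrashRate {
    static double crashesPerThousandTransactions(long crashes, long transactions) {
        if (transactions <= 0) {
            throw new IllegalArgumentException("no transactions");
        }
        return crashes * 1000.0 / transactions;
    }
}
```

With made-up numbers: a 5% staged rollout with 12 crashes over 8,000 transactions scores 1.5, worse than a current release with 30 crashes over 60,000 transactions at 0.5, even though its raw crash count is lower.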

  • Reproducing

    Static info

    Flight Recorder

    View hierarchy / state

    Crash Fast

    Staged rollout

  • Questions?

    py@squareup.com

    @Piwai

    * We are hiring: SF, NYC, Canada.