Crash Fast &
Pierre-Yves Ricau / @Piwai
Run Keeper / morning. End of run, save: crash
* Study that's available on hp.com
* Stole slide from Doug Sillars
Not your fault: fragmentation, manufacturer bugs, lifecycle, fragments.
* Why do they waste their time on this?
* Why don't they fix the crashes first?
* Crash: your fault, even when the code at fault is not your own.
What's a crash?
Reminder: Android = Linux, 1 app = 1 VM = 1 process
Crash: something bad happened, need to kill that process and restart it.
* Threads: exception handler per thread
* Exceptions bubble up, delegated to exception handlers
* If no handler, goes to static default
* Focus on crashes in Java land
* Uncaught exception delegated to default handler
* main: when program starts
* log, dialog, kill.
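The handler chain above can be sketched in plain Java. This is a minimal sketch, not how any particular crash library does it: `CrashReportingHandler` and its in-memory `reports` list are hypothetical names standing in for a real reporter. The only platform calls used are `Thread.getDefaultUncaughtExceptionHandler` and `Thread.setDefaultUncaughtExceptionHandler`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reporter: records the crash, then delegates to the previous handler. */
final class CrashReportingHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous;
    // Stand-in for a real upload queue (assumption for illustration).
    final List<String> reports = new ArrayList<>();

    CrashReportingHandler(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        // Any thread may crash, so guard the shared list.
        synchronized (reports) {
            reports.add(thread.getName() + ": " + error);
        }
        // Let the previous default handler run (on Android: log, dialog, kill).
        if (previous != null) {
            previous.uncaughtException(thread, error);
        }
    }

    /** Installs this handler as the static default, keeping the old one in the chain. */
    static CrashReportingHandler install() {
        CrashReportingHandler handler =
                new CrashReportingHandler(Thread.getDefaultUncaughtExceptionHandler());
        Thread.setDefaultUncaughtExceptionHandler(handler);
        return handler;
    }
}
```

Calling `CrashReportingHandler.install()` once at startup is enough. Delegating to the previous handler keeps the platform's own kill-and-restart behavior intact.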
* How many people click Report?
* What do most people do?
* Can't use Play Store crash reports
* You can create your own.
* Don't do that. Client is easy, backend is hard.
* Crashlytics: closed source, UI for noobs
* ACRA: OSS client, free, host your backend
* Bugsnag: OSS client, small teams, small API, scaling issues
* Signal sent to the process, need a signal handler
* Uses Google Breakpad
* Fake exception
First thing to look at?
=> stack trace
* line numbers => checkout correct version of sources
* Smart stacktrace => stupid
* Stacktrace: quick fix of simple error
* Who started animation, why?
* Stacktrace on server: each frame is a layer
* Callback => loss of info. Stacktrace not enough.
* How can we reproduce the crash?
* Associate customer id to crash
* Best is to ask customer what they did.
* Great for alpha / internal testing.
* Startup is the worst. Crash after work is second.
* Asking for feedback channels frustration, avoids 1 star reviews.
* Custom crash dialog that asks for feedback.
* Good idea: offer a link to contact support
* Emotional connection to customer.
* Can't ask for feedback while crashing: display popup on restart.
* Risky: what if it crashes on restart? Don't double restart. Maybe a dedicated crash activity in a different process.
* Twitter & Facebook seem to do that.
* Different UI, different code path => isTablet helps identify problem
* isTablet: sw600dp
* App version number + SHA version numbers for dev builds
* Picture of what the screen looks like at time of crash.
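A minimal sketch of the static info attached to every crash report. The class name, the map keys, and the way values are passed in are assumptions for illustration. On Android the width would typically come from `Configuration.smallestScreenWidthDp`, and the version and SHA from build-time constants.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for per-install facts sent along with each crash. */
final class CrashStaticInfo {
    private CrashStaticInfo() {}

    /** sw600dp: a smallest screen width of at least 600dp is treated as a tablet. */
    static boolean isTablet(int smallestScreenWidthDp) {
        return smallestScreenWidthDp >= 600;
    }

    /** Builds the key/value pairs; key names are illustrative, not a real API. */
    static Map<String, String> build(int smallestScreenWidthDp, String versionName, String gitSha) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("is_tablet", String.valueOf(isTablet(smallestScreenWidthDp)));
        info.put("app_version", versionName);
        // The SHA matters mostly for dev builds, where version numbers repeat.
        info.put("git_sha", gitSha);
        return info;
    }
}
```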
* Bitmap? Too big. Upload description of view hierarchy.
* What the user is looking at
* Current screen
Find all windows: Espresso RootsOracle
* Black box.
History: high level log
* Steps of the user + internal state changes
* Look at log, reproduce the steps
* Navigation + HTTP calls
* OkHttp interceptor
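The high-level history can be sketched as a small bounded log that keeps only the most recent events and is dumped into the crash report. `FlightRecorder` and its methods are hypothetical names. In a real app, a navigation listener or an OkHttp interceptor would be the places that call `record`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical bounded log of high-level events, attached to crash reports. */
final class FlightRecorder {
    private final int capacity;
    private final Deque<String> events = new ArrayDeque<>();

    FlightRecorder(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be > 0");
        }
        this.capacity = capacity;
    }

    /** Records one event, dropping the oldest when the buffer is full. */
    synchronized void record(String event) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    /** Returns a copy of the events, oldest first, safe to serialize at crash time. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }
}
```

Keeping the buffer small and high level (screens visited, requests sent) keeps the report readable enough to replay the steps by hand.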
OOM: Stack trace is useless.
Detect memory leaks
* UI has validation rules
* Bug: somehow not enforced
* Crash time: too late to do something about it.
something unexpected happened
What do you do when something unexpected happens and the app crashes?
* We somehow got a blank email
* Can't figure it out.
* Fatal condition, shouldn't happen. Problem ignored.
* Payments.
Offensive programming Crash Fast
* Detect problems early
* Complain as loudly as possible
* Quality of code increases
* If you can't understand, make the problem happen earlier, and ship it.
* Exceptions thrown by a common Preconditions class might be grouped together in the crash reporting tool.
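One possible workaround for the grouping problem, sketched under assumptions: if the crash tool groups by the top stack frame (an assumption, tools differ), the precondition helper can drop its own frames so the caller becomes the top frame. The `Preconditions` class below is a hypothetical local helper, not Guava's, and the talk does not say this is the approach used.

```java
import java.util.Arrays;

/** Hypothetical precondition helper that removes its own frames from the trace. */
final class Preconditions {
    private Preconditions() {}

    /** Crashes fast on a blank value instead of letting it travel further. */
    static String checkNotBlank(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw trimmed(new IllegalArgumentException(name + " must not be blank"));
        }
        return value;
    }

    /** Drops leading frames that belong to this class so the caller is on top. */
    private static <T extends Throwable> T trimmed(T error) {
        StackTraceElement[] frames = error.getStackTrace();
        String self = Preconditions.class.getName();
        int first = 0;
        while (first < frames.length && frames[first].getClassName().equals(self)) {
            first++;
        }
        error.setStackTrace(Arrays.copyOfRange(frames, first, frames.length));
        return error;
    }
}
```

With this, `Preconditions.checkNotBlank(email, "email")` called from two different screens yields two distinct top frames rather than one shared bucket.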
More assertions = more crashes.
How to keep low impact on customers?
* Writing a feature = writing UI tests
* Espresso
* Run on VMs, no real devices. Parallelized
* 20 min total build
* Manual QA
* Testing parties
* Internal and paid external testers
Dogfood / Beta
* Internal releases: hard, need use case => lunch.
* Dogfood at sellers
* Betas work better.
* Test the waters
* Ship to 5%, 10%, evaluate crash rate and do minor dot releases
* Raw crash numbers: not useful
* Most important feature: take payments
* Crash per transaction
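The crash-per-transaction metric is plain arithmetic; a minimal sketch follows. The class name and the per-thousand scaling are assumptions chosen for readability. The numbers in the comment are made up for illustration, not real data.

```java
/** Hypothetical helper: normalizes crash counts by the number of payments taken. */
final class CrashRate {
    private CrashRate() {}

    /** Crashes per 1,000 transactions, comparable across rollout percentages. */
    static double perThousandTransactions(long crashes, long transactions) {
        if (transactions <= 0) {
            throw new IllegalArgumentException("transactions must be > 0");
        }
        // Illustrative example: 5 crashes over 10,000 transactions gives 0.5.
        return crashes * 1000.0 / transactions;
    }
}
```

Because the rate is normalized, a build shipped to 5% of customers can be compared directly with the previous build at 100%.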
* Reproducing
* Static info
* Flight Recorder
* View hierarchy / state
* Crash Fast
* Staged rollout
@Piwai
* We are hiring. SF, NYC, Canada.