Node.js Postmortem Working Group Update Yunong Xiao, Netflix Michael Dawson, IBM

Post mortem talk - Node Interactive EU

Page 1: Post mortem talk - Node Interactive EU

Node.js Postmortem Working Group Update

Yunong Xiao, Netflix
Michael Dawson, IBM

Page 2: Post mortem talk - Node Interactive EU

Yunong Xiao
Platform Architect, Netflix

@yunongx
http://yunong.io

Page 3: Post mortem talk - Node Interactive EU

About Michael

Page 4: Post mortem talk - Node Interactive EU

About The Postmortem Workgroup

Howard Hellyer (@hhellyer)

David Pacheco (@davepacheco)

Julien Gilli (@mistredjules)

Michael Dawson (@mhdawson)

Chris Bailey (@seabaylea)

Daniel Khan (@danielkhan)

Joshua Clulow (@jclulow)

Yunong Xiao (@yunong)

James Bellenger (@jbellenger)

Bradley Meck (@bmeck)

Luca Maraschi (@lucamaraschi)

David Clements (@davidmarkclements)

Richard Chamberlain (@rnchamberlain)

Page 5: Post mortem talk - Node Interactive EU

Mission Statement

The working group is dedicated to the support and improvement of postmortem debugging for Node.js.

Page 6: Post mortem talk - Node Interactive EU

Debugging Node.js

Page 7: Post mortem talk - Node Interactive EU

Debugging Node.js

Test Environment

Page 8: Post mortem talk - Node Interactive EU

What about production?

Page 9: Post mortem talk - Node Interactive EU
Page 10: Post mortem talk - Node Interactive EU
Page 11: Post mortem talk - Node Interactive EU

“The method described in this article was designed to provide a core dump… with a minimal impact on the spacecraft… as the resumption of data acquisition from the spacecraft is the highest priority.”

- Chafin, R. "Pioneer F & G Telemetry and Command Processor Core Dump Program." JPL Technical Report XVI, no. 32-1526 (1971): 174.

Page 12: Post mortem talk - Node Interactive EU

Core Dumps: Brief History

● Magnetic core memory
● Dump out the contents of “core” memory for debugging
● “Core dump” was coined
● Initially printed on paper
● Postmortem debugging was born

Page 13: Post mortem talk - Node Interactive EU

Production Constraints

● Uptime is critical

● Not easily reproducible

● Can’t simulate environment

● Resume normal operations ASAP

Page 14: Post mortem talk - Node Interactive EU

Postmortem Debugging

Page 15: Post mortem talk - Node Interactive EU

└─[0] <> node --abort_on_uncaught_exception throw.js
Uncaught Error

FROM
Object.<anonymous> (/Users/yunong/throw.js:1:63)
Module._compile (module.js:435:26)
Object.Module._extensions..js (module.js:442:10)
Module.load (module.js:356:32)
Function.Module._load (module.js:311:12)
Function.Module.runMain (module.js:467:10)
startup (node.js:134:18)
node.js:961:3

[1] 4131 illegal hardware instruction (core dumped) node --abort_on_uncaught_exception throw.js
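A minimal sketch of a script that would abort in this way. The file name `throw.js` comes from the slide; the `handleRequest` function and the `DEMO_CRASH` environment variable are hypothetical names used only for illustration. Run under `node --abort-on-uncaught-exception`, an uncaught throw aborts the process instead of exiting normally, and the operating system writes a core file if core dumps are enabled (for example with `ulimit -c unlimited`).

```javascript
'use strict';

// Hypothetical request handler: throws when given bad input.
function handleRequest(req) {
  if (!req || typeof req.url !== 'string') {
    throw new Error('request has no url');
  }
  return `handled ${req.url}`;
}

// Crash on purpose only when explicitly asked to, so the file can also be
// loaded without taking the process down.
if (process.env.DEMO_CRASH === '1') {
  handleRequest(undefined); // uncaught: aborts under --abort-on-uncaught-exception
}
```

Invoking it as `DEMO_CRASH=1 node --abort-on-uncaught-exception throw.js` should reproduce the abort. The stack trace and the shell's abort message will differ by platform and Node.js version.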

Page 16: Post mortem talk - Node Interactive EU

Where: Inspect stack trace

Why: Inspect heap and stack variable state

Page 17: Post mortem talk - Node Interactive EU

Generate Core Dump Ad-hoc

root@demo:~# gcore `pgrep node`
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7facaeffd700 (LWP 5650)]
[New Thread 0x7facaf7fe700 (LWP 5649)]
[New Thread 0x7facaffff700 (LWP 5648)]
[New Thread 0x7facbc967700 (LWP 5647)]
[New Thread 0x7facbd168700 (LWP 5617)]
[New Thread 0x7facbd969700 (LWP 5616)]
[New Thread 0x7facbe16a700 (LWP 5615)]
[New Thread 0x7facbe96b700 (LWP 5614)]
0x00007facbea5b5a9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
Saved corefile core.5602
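The same ad-hoc dump could be triggered from a small Node.js script, for instance from an operations tool. This is a sketch that assumes GDB's `gcore` is on the `PATH` of a Linux host; `gcoreArgs` and `dumpCore` are hypothetical helper names, not part of any existing tool.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Build the argument list for `gcore -o <prefix> <pid>`.
function gcoreArgs(pid, prefix) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new TypeError('pid must be a positive integer');
  }
  return ['-o', prefix, String(pid)];
}

// Hypothetical helper: dump a live process without stopping it for good.
// Returns the expected core file path, or throws if gcore failed.
function dumpCore(pid, prefix = '/tmp/core') {
  const result = spawnSync('gcore', gcoreArgs(pid, prefix), { encoding: 'utf8' });
  if (result.error || result.status !== 0) {
    const reason = result.error ? result.error.message : result.stderr;
    throw new Error(`gcore failed: ${reason}`);
  }
  return `${prefix}.${pid}`; // gcore appends the pid to the prefix
}
```

The target process is paused while the dump is written and then resumes, which is what makes this usable against a running service.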

Page 18: Post mortem talk - Node Interactive EU

Example

Page 19: Post mortem talk - Node Interactive EU

Netflix API Request

Page 20: Post mortem talk - Node Interactive EU

Node to API RPC

Page 21: Post mortem talk - Node Interactive EU

[2016-09-09T16:25:48.388Z] WARN: reactive socket/rs-pool/17352 on lgud-yunong:

TcpLoadBalancer._connect: no more free connections

Page 22: Post mortem talk - Node Interactive EU

Postmortem Debugging

Page 23: Post mortem talk - Node Interactive EU

Connection Pool State

> ::findjsobjects -p _connections | ::jsprint
{
    "connected": { },
    "connecting": {},
    "free": {
        "100.82.188.185:7001": [...],
        "100.82.37.181:7001": [...],
        "100.82.41.121:7001": [...],
        "100.82.102.157:7001": [...],
        "100.82.106.115:7001": [...],
        "100.82.129.239:7001": [...],
        "100.82.102.158:7001": [...],
        "100.82.74.237:7001": [...],
        ...
    }
}
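To make the shape of that state concrete, here is a minimal sketch of a pool that keeps the same three buckets. The `ConnectionPool` class and its methods are hypothetical and are not the actual `TcpLoadBalancer` implementation; only the `_connections` property name and the `connected`/`connecting`/`free` keys come from the dump.

```javascript
'use strict';

// Hypothetical pool with the same three buckets seen in the dump.
class ConnectionPool {
  constructor(hosts) {
    this._connections = { connected: {}, connecting: {}, free: {} };
    for (const host of hosts) {
      this._connections.free[host] = [];
    }
  }

  // Move one host from "free" to "connecting"; returns null when none is left.
  connect() {
    const host = Object.keys(this._connections.free)[0];
    if (host === undefined) {
      return null; // a caller would log "no more free connections" here
    }
    this._connections.connecting[host] = this._connections.free[host];
    delete this._connections.free[host];
    return host;
  }

  // Bucket sizes, comparable to what ::jsprint shows for _connections.
  summary() {
    const c = this._connections;
    return {
      connected: Object.keys(c.connected).length,
      connecting: Object.keys(c.connecting).length,
      free: Object.keys(c.free).length,
    };
  }
}
```

In the dump, `free` is still populated while the log reports no free connections. That mismatch suggests a bug in how connections are selected rather than a genuinely exhausted pool, and it is the kind of state a log line alone cannot show.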

Page 24: Post mortem talk - Node Interactive EU

Postmortem Debugging is Critical to Large Scale Production Node Deployments

Page 25: Post mortem talk - Node Interactive EU

Postmortem WG
github.com/nodejs/post-mortem/

Page 26: Post mortem talk - Node Interactive EU

Postmortem WG - Mission

Guide improvements in postmortem

● Interfaces/APIs
● Dump formats
● Tools and Techniques

Page 27: Post mortem talk - Node Interactive EU

State of key tools today

Heap dump - snapshot of heap

● heapdump module - https://github.com/bnoordhuis/node-heapdump
● Chrome developer tools
● Limitations
  ○ Need to modify application
  ○ Slow to generate (minutes or hours)
  ○ O(N) memory usage
  ○ Limited content
  ○ Output is large

Talk about IDDE, MDB, LLDB, node-heapdump pros/cons:

Low level, hard to use, not cross-platform, new (LLDB), slow, can’t use in prod...
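The "need to modify application" limitation can be seen in a short sketch. It uses `v8.writeHeapSnapshot()`, which is built into later Node.js releases and is not the `heapdump` module named on the slide; `snapshotHeap` and `installSnapshotSignal` are hypothetical helper names.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');
const v8 = require('v8');

// Write a heap snapshot to a directory and return its path and size.
function snapshotHeap(dir = os.tmpdir()) {
  const file = path.join(dir, `demo-${process.pid}-${Date.now()}.heapsnapshot`);
  const written = v8.writeHeapSnapshot(file); // synchronous: blocks the event loop
  return { file: written, bytes: fs.statSync(written).size };
}

// The application itself has to opt in, for example through a signal handler.
function installSnapshotSignal(signal = 'SIGUSR2') {
  process.on(signal, () => {
    const { file, bytes } = snapshotHeap();
    console.error(`heap snapshot written: ${file} (${bytes} bytes)`);
  });
}
```

The resulting `.heapsnapshot` file can be loaded in the Chrome developer tools. Because the call is synchronous and the output grows with the heap, the speed and size limitations listed above still apply.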

Page 28: Post mortem talk - Node Interactive EU

State of key tools today

Core dump - memory image

● Creation
  ○ Crash, signal
  ○ --abort-on-uncaught-exception
  ○ Fast (relative to heap dumps)
  ○ Size matches process memory

● OS debuggers
  ○ Examination at C/C++ or assembler level
  ○ No knowledge of Node/V8 structures

● Node core dump inspectors
  ○ MDB (limited platform support)
  ○ IDDE (IBM SDK specific)
  ○ LLNODE (newer, less complete)
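Besides crashes, signals, and the command-line flag, an application can also request a core dump itself through `process.abort()`, which exits immediately and generates a core file. The sketch below is one possible pattern and an assumption of this example, not something shown in the talk; `assertInvariant` is a hypothetical name, and the `abort` parameter exists only so the function can be exercised without killing the process.

```javascript
'use strict';

// Hypothetical invariant check: abort (and dump core) when state is corrupt,
// so the complete process state is preserved for later inspection.
function assertInvariant(condition, message, abort = () => process.abort()) {
  if (!condition) {
    console.error(`invariant violated: ${message}`);
    abort(); // process.abort() exits immediately and generates a core file
  }
}

// Usage: passes silently when the condition holds.
assertInvariant(1 + 1 === 2, 'arithmetic still works');
```

As with the other creation paths, the core file is only written if the operating system allows core dumps for the process.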

Page 29: Post mortem talk - Node Interactive EU

Example commands

| Task | MDB_V8 command | LLNODE command | IDDE |
|---|---|---|---|
| Print a stack trace | jsstack, jsframe | v8 bt | !stack, !frame |
| Find objects | findjsobjects | v8 findjsobjects, v8 findjsinstances <type> | !jslistobjects, !jsgroupobjects, !jsfindbyproperty, !jsobjectsmatching |
| Print an object | jsprint | v8 inspect | !jsobject |
| Print function source | jssource | v8 source (prints source for a stack frame) | !jsobject, !string + work |
| Find constructor for an object | jsconstructor | n/a | !jsconstructor |
| Print elements of a FixedArray | v8array | v8 inspect <instance> | !array |
| Find native memory backing a buffer | nodebuffer | v8 inspect <instance> | !nodebuffers |

Page 30: Post mortem talk - Node Interactive EU

How to make this better?

● Improve ease of use
● Common APIs to introspect dumps
● Cross platform support
● Common command set
● Lightweight dump

Page 31: Post mortem talk - Node Interactive EU

The Postmortem WG is working on...

Common Heap Dump Format

Improved Core Dump Analysis

● Library in C & JS
● Tools: mdb_v8 (mdb), llnode (lldb), ...

Node Report

Page 32: Post mortem talk - Node Interactive EU

Common Heap Dump Format

Enabler for new tools

Generation

● mdb
● llnode

Consumption

● Conversion to existing V8 format -> Chrome developer tools
● C/JavaScript APIs

Page 33: Post mortem talk - Node Interactive EU

Core Dump Analysis

Currently working on

● Platform coverage
● Re-use of command implementation
● Common APIs

Soon to be nodejs/llnode!

https://github.com/nodejs/post-mortem/issues/37

Page 34: Post mortem talk - Node Interactive EU

Working to get to…

Page 35: Post mortem talk - Node Interactive EU

Node Report

Lightweight Dump

● Fast
● Small
● Human readable
● Key information to start investigating
● Triggers: exception, fatal error, signal, JavaScript API

Page 36: Post mortem talk - Node Interactive EU

NodeReport example - heap out of memory error

NodeReport content:
● Event summary
● Node.js and OS versions
● JavaScript stack trace
● Native stack trace
● Heap and GC statistics
● Resource usage
● libuv handle summary
● Environment variables
● OS ulimit settings
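Several of the sections listed above can be read programmatically. The sketch below assumes a later Node.js release where NodeReport has become the built-in diagnostic report exposed as `process.report`; that API postdates this talk, and `reportSummary` is a hypothetical helper name. On builds without `process.report` the function returns `null`.

```javascript
'use strict';

// Pull a few of the listed sections out of a diagnostic report.
// Returns null on Node.js builds where process.report is unavailable.
function reportSummary() {
  if (!process.report || typeof process.report.getReport !== 'function') {
    return null;
  }
  const report = process.report.getReport();
  return {
    event: report.header.event,                       // event summary
    nodeVersion: report.header.nodejsVersion,         // Node.js version
    os: `${report.header.osName} ${report.header.osRelease}`,
    jsStackFrames: report.javascriptStack.stack.length,
    heapUsedBytes: report.javascriptHeap.usedMemory,  // heap statistics
    libuvHandles: report.libuv.length,                // libuv handle summary
  };
}
```

Calling `console.log(reportSummary())` prints a compact object. The full report is much larger, but it is still small and human readable compared with a heap dump or core dump.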

Page 37: Post mortem talk - Node Interactive EU

JavaScript API

API in JavaScript

● More accessible
● Leverages
  ○ llnode
  ○ Common Heap Dump (future)

Page 38: Post mortem talk - Node Interactive EU

JavaScript API - example application

Page 39: Post mortem talk - Node Interactive EU

Summary

What is postmortem debugging

Example of where it’s helpful

Activities of the working group

● Common heap format
● APIs (C/JS)
● Tools (lldb, mdb_v8, NodeReport)

Page 40: Post mortem talk - Node Interactive EU

Get Involved!

Great chance to learn

● Low level machine details
● Key debugging techniques
● Different platforms/operating systems

Where

● Most work done through GitHub issues/pull requests
● http://github.com/nodejs/post-mortem/

Page 41: Post mortem talk - Node Interactive EU

Postmortem Debugging is Critical to Large Scale Production Node Deployments

Page 42: Post mortem talk - Node Interactive EU

Some production problems are otherwise impossible to debug

Save complete process state for debugging later

Page 43: Post mortem talk - Node Interactive EU

Copyrights and Trademarks

IBM, the IBM logo, ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Node.js is an official trademark of Joyent. IBM SDK for Node.js is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

Java, JavaScript and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.