Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
LPS
Lightweight Provenance Service for
High-Performance Computing
Dong Dai*, Yong Chen, Philip Carns, John Jenkins, and Robert Ross
9/11/17 2017PACTPortland,Oregon 1
Ingeneral,provenanceisdocumentedhistoryofanobject
9/11/17 2017PACTPortland,Oregon 2
Little Dancer Aged Fourteen1. Degas, Edgar (created1878-1881) 2. René De Gas (heritage 1917) 3. Adrien-Aurélien Hébrard (a contract 5/3/1918)4. Nelly Hébrard (heritage 1937)5. M. Knoedler & Company, Inc. (cosigned 1955)6. Paul Mellon (purchased 5/1956)7. National Gallery of Art (bequest 1999).
- From National Gallery of Art website
In computer science, provenance means the lineage of data, including• processes that act on data • agents that are responsible for those processes.
9/11/17 2017PACTPortland,Oregon 3
Open Provenance Model (OPM)• a graph-based provenance representation model
Nodes
Edges
A
Artifact Process Agent
A
P Ag
P
A P
Ag P
PP...
Example
Prj-1
Joe
catvi
mpi-ior -np 3
/scratch/joe/ior.confVersion: 1
rank 0
rank 2
/scratch/joe/ior.confVersion: 0
emacs
/scratch/joe/ior.confVersion: 1
/scratch/joe/ior.confVersion: 0
iorior
iorior
iorior
Shared Libs
• Evaluateanewsystem• repeatedlyrunthesamebenchmark(typicallytimeconsuming)• calculateavg andstd forcomparing
• Questions• Ifunexpectedvariationsoccur,howtoensuretheyarefromyoursystemorfromyourevaluations?• Canothereasilyrepeatthesameevaluations?• ...
9/11/17 2017PACTPortland,Oregon 4
HPC provenance is useful in the simulate-analyze-publish science discovery cycle
Provenance Helps
• Performance Requirements: • HPC users are performance sensitive. • Managing overhead should be less than 1% slowdown and less than
1MB memory footprint per core.• Coverage Requirements:• Provenance generated from multiple physical locations.• Provenance could have various granularities.
• Transparency Requirements:• Users should not change or recompile their codes for provenance.• More aggressively, users should not disable it when provenance is
used in critical missions.
9/11/17 2017PACTPortland,Oregon 5
Requirements on managing provenance in HPC
9/11/17 2017PACTPortland,Oregon 6
However, in many cases, these metrics are conflicting• To cover more details, one has to collect more fine-grain events,
introducing higher probing overheads• To be transparent, one has to rely on noisy system events,
introducing higher processing overheads
Existing solutions make a fixed tradeoff among overhead, coverage, and transparency during design.• PASSv2 [ATC’06], SPADEv2 [Middleware’12]: transparent and
detailed, with 23% and 10% overhead respectively• Zoom [VLDB’07], VisTrails [CC’08]: low overhead, but only
covers workflows• ...
9/11/17 2017PACTPortland,Oregon 7
LPS – Adapt with Flexible Provenance Granularity
File Atimeline
Execution
Open Close
Read
WriteRead
Write
Open/Close Granularity
First/Last Granularity (Read)
Read/Write Gran.
...
• Fix the primitive of provenance Node• Agent: user• Process: local process• Artifact: whole file (local or distributed)
• Adapt the primitive of provenance Edge• Mainly “Process” to “Artifact”• Also important for causality inference• Open/Close• First/Last• Read/Write
9/11/17 2017PACTPortland,Oregon 8
LPS – Adapt with Flexible Provenance Granularity• Open/Close
• 2 events per {process, file} pair• No need to know all read/write operations• Low Overheads, Coarse Granularity
• First/Last• 2 events per {process, file} pair• Need to know all read/write operations• High Overheads, Fine Granularity
• Read/Write• N events per {process, file} pair• Need to know all read/write operations• Resource consuming, Not practical in HPC
File Atimeline
Execution
Open Close
Read
WriteRead
Write
Open/Close Granularity
First/Last Granularity (Read)
Read/Write Gran.
...
Flexibly Change the granularity according to current workloads
9/11/17 2017PACTPortland,Oregon 9
LPS – Unified Model for Flexible Granularities• Flexibly changing the granularities means uncomplete event
pairs, like Open and LastAccess;• A unified granularity model• [Tevent_start, Tevent_end] to represent a unified granularity• allows any two events to be paired to form a granularity• rule table determines the legal granularity
9/11/17 2017PACTPortland,Oregon 10
LPS Overall Architecture in HPC
Provenance
Distributed LPS Cluster
Local LPS
Logi
n N
odes
Compute Nodes
Parallel File System
ProvenanceData
Distributed LPS Cluster
Kernel Space
System Call Layer
LPS Aggregatoruser processes
/Proc
A Single Server
LPS Tracer
User SpaceLPS Aggregator
A Single Server
Server
LPS Builder LPS Builder LPS Builder LPS Builder
Server Server Server...,...
LPS Design and Implementation
coveredinDonget.al., GraphMeta:AGraph-BasedEngineforManagingLarge-ScaleHPCRichMetadata,IEEECluster,2016
9/11/17 2017PACTPortland,Oregon 11
LPS Tracer• LPS leverages kernel instrument to collect detailed runtime
events to build provenance [among three methods]
• To support flexible granularity, it needs to enable/disable probing read/write events• Dynamic Probe• Two kernel instrument scripts (Systemtap)• The second one only probes read/write events• Can be disabled/enabled accordingly in runtime
9/11/17 2017PACTPortland,Oregon 12
LPS Aggregator1. Monitoring overhead and direct granularity change2. Pruning noisy events to improve performance
• Instrumentation introduces overheads• Instrument Read/Write towards an
application issuing 1M 1-byte writes
• The aggregator monitors read/write frequency• a counter records the events• a timer that resets the counter• notify and change granularity
9/11/17 13
LPS Aggregator1. Monitoring overhead and direct granularity change2. Pruning noisy events to improve performance
bash
14879+0
ls
/usr/lib64/ld-2.21.so
/etc/ld
libselinux.so.1
libcap.so.2.24
libacl.so.1.1.0
libc-2.21.so
libpcre.so.1.2.5
libdl-2.21.so
libattr.so.1.1.0
libpthread-2.21.so
locale-archive
/home/daidong
UNDEFINED
UNDEFINED
bash
/usr/bin/sed
gconv-modules.cache
UNDEFINED
UNDEFINED
UNDEFINED
bash
UNDEFINED
UNDEFINED
bash
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
bash
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
1952+0
firefox
15453+1441600853121328
basename
firefox
uname
UNDEFINED
/proc/15458/task/15459/stat
/proc/15458/stat
UTF-16.so
/home/daidong/.mozilla/firefox/Crash Reports/InstallTime20150518070114
/usr/share/locale/locale.alias
gtk30.mo
/run/user/1000/gdm/Xauthorityfirefox
mkdir
UNDEFINED
UNDEFINED
firefox
UNDEFINED
UNDEFINED
firefox
run
run
dirname
run
UNDEFINED
15453+1441600856998997
/dev/null
swrast_dri.so
libdrm_nouveau.so.2.0.0
libdrm_intel.so.1.0.0
libdrm_radeon.so.1.0.1
libelf-0.161.so
libLLVM-3.5.so
libpciaccess.so.0.11.1
libedit.so.0.0.53
libtinfo.so.5.9
/sys/devices/system/cpu/online
/etc/drirc
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
bash
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
/home/daidong/.cache/mozilla/firefox/0vphsm5b.default/cache2/entries/85222C3CBB346ADAAB5A6E9B7FB98BAB19631C19
/proc/15453/mountinfo
/home/daidong/.cache/mozilla/firefox/0vphsm5b.default/cache2/entries/F18D85F52EBBBA2AB081EF739ED0D6E8A76D497C
/home/daidong/.cache/mozilla/firefox/0vphsm5b.default/cache2/index.log
/home/daidong/.cache/mozilla/firefox/0vphsm5b.default/cache2/index
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
bash
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
/home/daidong/DocumentsUNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
UNDEFINEDUNDEFINED
UNDEFINED
UNDEFINED
UNDEFINED
/home/daidong/.ICEauthority
/usr/share/emacs/24.5/etc/charsets/MULE-is13194.map
/usr/share/emacs/24.5/etc/charsets/symbol.map
DejaVuSansMono.ttf
/usr/share/icons/Adwaita/cursors/xterm
/usr/share/icons/Adwaita/cursors/left_ptr
/usr/share/icons/Adwaita/cursors/watch
/usr/share/icons/Adwaita/cursors/hand2
/usr/share/icons/Adwaita/cursors/sb_h_double_arrow
/usr/share/icons/Adwaita/cursors/sb_v_double_arrow
/usr/share/icons/Adwaita/index.theme/usr/share/icons/default/index.theme
/usr/share/emacs/24.5/etc/images/icons/hicolor/scalable/apps/emacs.svg
mime.cache
libpixbufloader-svg.so
/home/daidong/.emacs.d/auto-save-list
/usr/lib64/pango/1.8.0/modules.cache
Cantarell-Regular.otf
DejaVuSansMono-Oblique.ttf
/usr/share/emacs/24.5/lisp/calendar/time-date.elc/usr/share/emacs/site-lisp/site-start.el
/usr/share/emacs/site-lisp/site-start.d
/usr/share/emacs/site-lisp/site-start.d/autoconf-init.el
DejaVuSansMono-Bold.ttf
/usr/share/emacs/site-lisp/site-start.d/desktop-entry-mode-init.el
/usr/share/emacs/24.5/etc/images/new.xpm
/usr/share/emacs/24.5/etc/images/open.xpm
/usr/share/emacs/24.5/etc/images/diropen.xpm
/usr/share/emacs/24.5/etc/images/close.xpm
/usr/share/emacs/24.5/etc/images/save.xpm
/usr/share/emacs/24.5/etc/images/undo.xpm
/usr/share/emacs/24.5/etc/images/cut.xpm
/usr/share/emacs/24.5/etc/images/copy.xpm
/usr/share/emacs/24.5/etc/images/paste.xpm
/usr/share/emacs/24.5/etc/images/search.xpm
icon-theme.cache
/usr/share/icons/hicolor/index.theme
icon-theme.cache
/usr/share/icons/gnome/scalable/places
/usr/share/icons
/usr/share/pixmaps
/usr/share/icons/Adwaita/24x24/actions/document-new.png
/usr/share/icons/Adwaita/24x24/actions/document-open.png
/usr/share/icons/Adwaita/24x24/apps/system-file-manager.png
/usr/share/icons/Adwaita/24x24/actions/window-close.png
/usr/share/icons/Adwaita/24x24/actions/document-save.png
/usr/share/icons/Adwaita/24x24/actions/edit-undo.png
/usr/share/icons/Adwaita/24x24/actions/edit-cut.png
/usr/share/icons/Adwaita/24x24/actions/edit-copy.png
/usr/share/icons/Adwaita/24x24/actions/edit-paste.png
/usr/share/icons/Adwaita/24x24/actions/edit-find.png
/usr/share/emacs/site-lisp/site-start.d/systemtap-init.el
/home/daidong/.emacs
/usr/share/emacs/site-lisp/default.el
/home/daidong/Documents/fs.stp
/usr/share/emacs/site-lisp/systemtap-mode.el
/usr/share/emacs/24.5/lisp/progmodes/cc-mode.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-defs.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-vars.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-engine.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-styles.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-align.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-cmds.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-menus.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-guess.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/easymenu.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-fonts.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-awk.elc
/usr/share/emacs/24.5/lisp/progmodes/cc-langs.elc/usr/share/emacs/24.5/lisp/emacs-lisp/cl-lib.elc
/usr/share/zoneinfo/America/Chicago
/usr/share/emacs/24.5/lisp/emacs-lisp/cl-loaddefs.el
/usr/share/emacs/24.5/lisp/emacs-lisp/cl.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/gv.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/bytecomp.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/cconv.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/cl-extra.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/byte-opt.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/cl-seq.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/derived.elc
/usr/share/emacs/24.5/lisp/emacs-lisp/cl-macs.elc
/usr/share/emacs/24.5/etc/images/splash.svg
/usr/share/emacs/24.5/etc/tutorials/TUTORIAL
/usr/share/emacs/24.5/etc/images/checked.xpm
/usr/share/emacs/24.5/etc/images/unchecked.xpm
DejaVuSans.ttf
DejaVuSans-Oblique.ttf
/home/daidong/.emacs.d/auto-save-list/.saves-15582-localhost.localdomain~
UNDEFINED
15589+1441600998209413
bash
Raw system events from kernel
instrumentation
9/11/17 14
LPS Aggregator1. Monitoring overhead and direct granularity change2. Pruning noisy events to improve performance
Bash (pid: 2322)
(pid: 6444) (pid: 6447)mpiexec
(pid: 6439)
(pid: 6445)sed (pid: 6446)
(pid: 6448)
(pid: 6449)
hydra_pmi_proxy (pid: 6440)
./mpitest (pid: 6441)
./mpitest(pid: 6442)
./mpitest (pid: 6443)
• Representative Executions• Executions that users care
the most• Eliminate unimportant child
processes• Eliminate helper child
processes• Events of non-R executions
are counted to their ancestor R executions
9/11/17 2017PACTPortland,Oregon 15
LPS Builder1. Fusing with environmental variables2. Building provenance with versioning
• Local aggregators generate isolated provenance events• Workflows or jobs that are across multiple servers
• A fatal challenge• To match identities in different machines needs a unique ID• Unique IDs are generated by specific software, no transparency
• A compromise solution• LPS relies on specific environmental variables to match identities• LPS should be notified about the name of these env variables
9/11/17 2017PACTPortland,Oregon 16
LPS Builder1. Fusing with environmental variables2. Building provenance with versioning
w0 start w1 start w1 end w0 end
r0 start
w0
List w0
w1w0Av1 v2
r0 end
v3
w0w1A v2
w0A v3 A v4
• Events on the same file will be sent to the same builder• RULE: If events are overlapped, read always depend on the
newest version that the overlapped writes create
9/11/17 2017PACTPortland,Oregon 17
Evaluation Results – LPS on Benchmarks
• All evaluations done on CloudLab• 45 servers are used to build the HPC
environment
• HPCG Benchmark• Network- and CPU-intensive workloads• Report performance in GFlops• Runs on 8 – 196 processes
• Results• Less than 0.1% performance degradation
9/11/17 2017PACTPortland,Oregon 18
Evaluation Results – LPS on Benchmarks
• IOR Benchmark• Data-intensive workloads• 256 processes, each running on a core• issues 50K rounds of random 4K writes• Runs on 8 – 256 processes
• Results• Open/Close barely introduces overhead
(around 0.1%)• First/Last introduces noticeable
overheads, but they are largely covered by the variations and still less than 1%
9/11/17 2017PACTPortland,Oregon 19
Evaluation Results – LPS on Benchmarks
• MDTest Benchmark• metadata-intensive
workloads• report 4 representative
operations• Results• Less than 1% overhead
in all ops except file read• MDTest file reads hit the
I/O buffer a lot.• Should switch to
open/close in this case
9/11/17 2017PACTPortland,Oregon 20
Evaluation Results – LPS Component by Component
LPS Tracer Overhead LPS Aggregator Monitoring
LPS Aggregator Pruning
LPS BuilderScalability
• Summary• It is always needed to balance between performance and
accuracy
• Using flexible provenance granularity, LPS is able to capture full provenance in HPC environment with low overhead
• Future Work• Quantitatively analysis on the uncertainty introduced by
flexible provenance granularity
9/11/17 2017PACTPortland,Oregon 21
Summary and Future Work
Thanks&Questions
9/11/17 2017PACTPortland,Oregon 22
Provenance
9/11/17 2017PACTPortland,Oregon 23