Reliability of Parallel Build Systems
Derrick Coetzee, George Necula
UC Berkeley
Creative Commons Zero Waiver: To the extent possible under law, the author, Derrick Coetzee, waives all copyright and related or neighboring rights to this work.
Why parallelize builds?
• Developer cycle time
  – Faster builds = developers get more work done, higher morale
• Continuous integration
  – Faster builds = tests run more often
• Check-in verification systems
  – Faster builds = more throughput on check-in queue
Parallel build systems today
• Job scheduling
• Typical example: make -j <n>
  – Find n build steps that have no unbuilt dependencies and run them
  – Whenever one exits, start the next one
• Depends on the dependency graph being correct and complete
• Coarse-grained task parallelism
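The job-scheduling rule above can be sketched in a few lines. This is a simplified simulation, not how make is implemented: it assumes every build step takes one unit of time and starts steps in rounds instead of reacting to individual exits. The `schedule` function and the file names are hypothetical.

```python
def schedule(deps, n):
    """Simulate `make -j n`-style scheduling in unit-time rounds.

    deps maps each build step to the set of steps it depends on.
    Returns a list of rounds; each round lists the steps started together.
    Assumption: every step takes the same time (real builds do not).
    """
    remaining = {step: set(d) for step, d in deps.items()}
    done = set()
    rounds = []
    while remaining:
        # A step is ready when all of its declared dependencies are built.
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        batch = ready[:n]  # run at most n steps at once
        rounds.append(batch)
        for s in batch:
            del remaining[s]
        done.update(batch)
    return rounds


complete = {"a.o": set(), "b.o": set(), "c.o": set(),
            "prog": {"a.o", "b.o", "c.o"}}
# Same graph with the prog -> c.o dependency left out by mistake.
incomplete = {"a.o": set(), "b.o": set(), "c.o": set(),
              "prog": {"a.o", "b.o"}}

print(schedule(complete, 2))
print(schedule(incomplete, 2))
```

With the complete graph, prog is started only after all three object files exist. With the incomplete graph and n = 2, prog is started in the same round as c.o, which is exactly the kind of race that depends on the graph being correct and complete.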
What could go wrong?
• Incomplete dependency information
  – Serial builds → leads to incorrect incremental builds
  – Parallel builds → leads to nondeterministic builds, build breaks, incorrect builds
  – Developer changes can introduce or remove dependencies at any time
    • #include "yy.lex.h"
Example of missing dependencies
• gcc test.c -o test– What files does it read/write/test existence of?
• Actual: 5 processes, 119 files/directories

/usr/bin/gcc        /etc/ld.so.hwcap  /tmp
/usr/lib/gcc/…/cc1  /lib/libc.so.6    /tmp/ccdCCHK0.s
/usr/bin/as         /proc/meminfo     /tmp/ccKs1ykU.c
/usr/bin/ld         test.c.gch        /tmp/cc0YtTuE.o
/usr/bin/nm         /usr/lib/crt1.o   /tmp/ccGGL3Eo.ld
/usr/bin/strip      /usr/…/lib/specs  /tmp/ccG4c608.le
…                   …                 …
Parallel builds are error-prone
• Missing dependencies cause errors
• Nondeterministic builds make errors difficult to reproduce
• Unnecessary dependencies limit scalability
• An alternative:
  – Developer specifies serial build (easier!)
  – Serial build is automatically parallelized
  – Nondeterminism is eliminated
Build transactions
• Each build step’s file operations are monitored using system call interception
• A transaction manager inserts locks before accessing each file (may suspend processes)
• Ensure that parallel build behaves in same way as the serial build
  – Use concurrency control techniques from databases
  – Schedule is conflict-equivalent to the user’s serial schedule
Build transactions example
• (1) Compile test.c to test.o, then (2) link:

tid  Lock/unlock  Lock type  Path              Result
1    LOCK         READ       /etc/ld.so.cache  OK
2    LOCK         READ       /etc/ld.so.cache  OK
…    …            …          …                 …
1    LOCK         CREATE     test.o            OK
…    …            …          …                 …
2    LOCK         TEST       test.o            BLOCKED
…    …            …          …                 …
1    UNLOCK       CREATE     test.o            OK
2    LOCK         TEST       test.o            OK
Build transactions example
• What if transaction 2 takes the lock first?

tid  Lock/unlock  Lock type  Path              Result
1    LOCK         READ       /etc/ld.so.cache  OK
2    LOCK         READ       /etc/ld.so.cache  OK
…    …            …          …                 …
2    LOCK         TEST       test.o            OK
…    …            …          …                 …
1    LOCK         CREATE     test.o            ROLLBACK 2
…    …            …          …                 …
2    LOCK         TEST       test.o            BLOCKED
…    …            …          …                 …
1    UNLOCK       CREATE     test.o            OK
2    LOCK         TEST       test.o            OK
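The two traces can be reproduced with a small lock table. This is a minimal sketch under stated assumptions, not the authors' transaction manager: the `LockManager` class is hypothetical, READ and TEST are assumed to be compatible with each other while CREATE conflicts with everything, and "rollback" here only releases the later step's locks (a real system would also undo its file changes and re-run it). The rule, where an earlier step rolls back a later one and a later step waits for an earlier one, resembles the wound-wait scheme from database concurrency control.

```python
SHARED = {"READ", "TEST"}  # assumption: these lock types never conflict with each other


def conflicts(a, b):
    """Two lock types conflict unless both are shared (READ/TEST)."""
    return not (a in SHARED and b in SHARED)


class LockManager:
    """Hypothetical lock table; tid is the step's position in the serial build."""

    def __init__(self):
        self.held = {}  # path -> list of (tid, lock_type)

    def lock(self, tid, lock_type, path):
        holders = self.held.setdefault(path, [])
        rivals = {t for t, k in holders if t != tid and conflicts(k, lock_type)}
        if any(t < tid for t in rivals):
            return "BLOCKED"  # an earlier step holds a conflicting lock: wait
        for t in sorted(rivals):  # only later steps are in the way: roll them back
            self.rollback(t)
        self.held[path].append((tid, lock_type))
        if rivals:
            return "ROLLBACK " + " ".join(str(t) for t in sorted(rivals))
        return "OK"

    def unlock(self, tid, lock_type, path):
        self.held[path].remove((tid, lock_type))
        return "OK"

    def rollback(self, tid):
        # Sketch only: drop every lock the step holds.
        for path in self.held:
            self.held[path] = [(t, k) for t, k in self.held[path] if t != tid]


lm = LockManager()
trace = [
    lm.lock(1, "READ", "/etc/ld.so.cache"),
    lm.lock(2, "READ", "/etc/ld.so.cache"),
    lm.lock(2, "TEST", "test.o"),    # step 2 gets there first
    lm.lock(1, "CREATE", "test.o"),  # step 1 must win, so step 2 is rolled back
    lm.lock(2, "TEST", "test.o"),    # step 2 retries and now waits
    lm.unlock(1, "CREATE", "test.o"),
    lm.lock(2, "TEST", "test.o"),
]
print(trace)
```

The printed results follow the Result column of the second table: OK, OK, OK, ROLLBACK 2, BLOCKED, OK, OK.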
Avoiding cascading rollback
• To ensure conflict-equivalence to the serial schedule, transactions must commit in order
  – Strict two-phase locking is too strict
• Instead, take advantage of the fact that the dependency graph – and lock set – changes very little from build to build
• Predicted locks
  – Derived from set of possible conflicts during previous run
  – Never block
  – Give no privilege to access data
  – Block conflicting lock attempts by transactions with larger timestamps
Build transactions example
• Compile step followed by a link step:

tid  Lock/unlock     Lock type  Path              Result
1    PREDICTED LOCK  CREATE     test.o            OK
1    LOCK            READ       /etc/ld.so.cache  OK
2    LOCK            READ       /etc/ld.so.cache  OK
…    …               …          …                 …
2    LOCK            TEST       test.o            BLOCKED
…    …               …          …                 …
1    LOCK            CREATE     test.o            OK
…    …               …          …                 …
1    UNLOCK          CREATE     test.o            OK
2    LOCK            TEST       test.o            OK
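The three properties of predicted locks can be sketched as a table that is consulted before a real lock is granted. This is an illustration only: the `PredictedLocks` class and its method names are hypothetical, READ and TEST are assumed compatible, handling of real locks is left out, and it assumes a prediction is dropped when the predicting step unlocks the path.

```python
SHARED = {"READ", "TEST"}  # assumption: these lock types never conflict with each other


def conflicts(a, b):
    return not (a in SHARED and b in SHARED)


class PredictedLocks:
    """Hypothetical table of locks each step is expected to take,
    derived from the previous build."""

    def __init__(self):
        self.predicted = {}  # path -> list of (tid, lock_type)

    def predict(self, tid, lock_type, path):
        # Recording a prediction never blocks and grants no access to the file.
        self.predicted.setdefault(path, []).append((tid, lock_type))
        return "OK"

    def check(self, tid, lock_type, path):
        # A real lock attempt waits if a step with a smaller timestamp
        # is predicted to take a conflicting lock on the same path.
        for t, k in self.predicted.get(path, []):
            if t < tid and conflicts(k, lock_type):
                return "BLOCKED"
        return "OK"

    def release(self, tid, path):
        # Assumption: the prediction ends when the step unlocks the path.
        self.predicted[path] = [(t, k) for t, k in self.predicted.get(path, [])
                                if t != tid]


pl = PredictedLocks()
results = [
    pl.predict(1, "CREATE", "test.o"),  # known from the previous build
    pl.check(2, "TEST", "test.o"),      # link step arrives early and waits
    pl.check(1, "CREATE", "test.o"),    # compile step is not blocked by its own prediction
]
pl.release(1, "test.o")                 # compile step unlocks test.o
results.append(pl.check(2, "TEST", "test.o"))
print(results)
```

Because the link step waits instead of taking the TEST lock early, the compile step never has to roll it back.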
Preliminary results - Linux kernel build
[Figure: speedup of apmake and of make -j versus number of concurrent processes; x-axis 0 to 40 processes, y-axis speedup 1 to 3.5]
Preliminary results - Linux kernel build
• Statistics:
  – Number of transactions/build steps: 2,949
  – Parallel build time: 3m9s
  – Total lock requests: 1,859,172
  – Lock requests blocked due to conflict: 1,697
[Figure: histogram of time waiting on lock (sec), x-axis 0.00 to 2.40, against frequency, y-axis 0 to 180]
Future work: Unimplemented stuff
• Haven’t yet implemented rollback
  – Needed for “unexpected dependencies”
• Fast cross-platform system call interception
  – ptrace, binary translation, custom filesystem?
• Multiversion timestamping
  – Useful for builds that read/write the same file multiple times
• Append-only files
  – Log files, standard out
Future work: Diagnosing make build bugs
• If two build steps experience a conflict, but neither depends on the other directly or indirectly…
  – This proves the make build is nondeterministic
  – Isolates most important missing dependencies
• Filter dependency graph by “files in my source repository”
  – Finds other interesting dependencies (e.g. headers)
• Easy bug-finding tool for existing projects
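The check in the first bullet amounts to a reachability test on the declared dependency graph. The sketch below assumes the observed conflicts are available as (step, step, path) triples. The function names and the example graph are hypothetical, apart from yy.lex.h, which echoes the earlier #include example.

```python
def reachable(deps, start):
    """All steps that `start` depends on, directly or indirectly."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for d in deps.get(node, ()):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen


def missing_dependencies(deps, observed_conflicts):
    """Report conflicts between steps that the declared graph leaves unordered."""
    report = []
    for a, b, path in observed_conflicts:
        if b not in reachable(deps, a) and a not in reachable(deps, b):
            report.append((a, b, path))
    return report


# Declared dependencies (target -> prerequisites), as a makefile would state them.
deps = {
    "test": {"test.o"},
    "test.o": {"test.c"},
    "lexer.o": {"lexer.c"},   # the header generated from lexer.l is not listed
    "yy.lex.h": {"lexer.l"},
}
# Conflicts observed by lock monitoring: (step, step, contested path).
observed = [
    ("test.o", "test", "test.o"),          # ordered by the graph: fine
    ("lexer.o", "yy.lex.h", "yy.lex.h"),   # unordered: a missing dependency
]
print(missing_dependencies(deps, observed))
```

Only the second conflict is reported, since make is free to run those two steps in either order.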
Future work: Process hierarchies
• Long-running process spawning many short-lived processes (e.g. make)
• Rolling back make would be very bad
• Solution is virtualization:
  – Lie to make (your children have completed)
  – Predict outputs of children based on previous build – block make if it tries to access these
  – Rolling back make (if necessary) isn’t so bad now
Future work: Intra-build step parallelism
• Efficient parallel parsing for compilation
  – Ref Par Lab Browser’s work (Seth Fowler)
• Efficient parallel optimization
  – Unexplored?
• Efficient parallel linking
  – Ref Google’s gold linker
Questions?
Future work: Validated incremental builds
• Observation: most build steps produce same output files as in previous build
• Go ahead and use the old versions – if they’re wrong, we’ll find out when that file is rebuilt
• Eliminates blocking for a faster parallel build, at the cost of more rollbacks
Future work: Distributed parallel builds
• How to automatically partition builds between machines based on dependency graph?
• How to efficiently handle unexpected dependencies?