323
ThinLTO A Framework for Scalable and Incremental Link-Time Optimization Teresa Johnson Mehdi Amini Xinliang David Li

ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO

A Framework for Scalable and Incremental Link-Time Optimization

Teresa Johnson Mehdi Amini Xinliang David Li

Page 2: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO

A Framework for Scalable and Incremental Link-Time Optimization

Teresa Johnson Mehdi Amini Xinliang David Li

Towards Always-Enabled LTO

Page 3: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LLVM LTO: in a Nutshell

Page 4: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

clang -cc1 clang -cc1 clang -cc1

main.o test1.o test2.o

LLVM LTO: in a Nutshell

Page 5: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto

main.o test1.o test2.o

LLVM LTO: in a Nutshell

Page 6: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto

.o files are generated, but they are actually raw bitcode files

main.o test1.o test2.o

LLVM LTO: in a Nutshell

static archive will contain these bitcode files

Page 7: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LLVM LTO: in a Nutshell

Page 8: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o

Page 9: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o LLVM

Bitcode

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o

Page 10: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

FrontendLinker

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o LLVM

Bitcode

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o

Page 11: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

FrontendLinker

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o LLVM

Bitcode

LLVM as a linker-plugin (libLTO.dylib or LLVMgold.so)

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o

Page 12: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Link all bitcode in one single Module

FrontendLinker

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o LLVM

Bitcode

LLVM as a linker-plugin (libLTO.dylib or LLVMgold.so)

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o.bc Monolithic LTO Implementation

Page 13: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Link all bitcode in one single Module

Optimizer / Inlining Single-threaded very boring usual optimizations

FrontendLinker

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o LLVM

Bitcode

LLVM as a linker-plugin (libLTO.dylib or LLVMgold.so)

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o.bc Monolithic LTO Implementation

Page 14: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Link all bitcode in one single Module

Optimizer / Inlining Single-threaded very boring usual optimizationsPotentially threaded

CodeGen

FrontendLinker

CodeGen

Highly parallel frontend processing+ initial optimizations.o.o .o .o.o .o .o.o LLVM

Bitcode

LLVM as a linker-plugin (libLTO.dylib or LLVMgold.so)

Traditional Linking

LLVM LTO: in a Nutshell

.o .o.o .o .o .o .o .o.bc Monolithic LTO Implementation

Page 15: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Why LTO?Benefits

Page 16: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Why LTO?Benefits

• Binary size: inherent dead-stripping and auto-hidden visibility via internalization.

Page 17: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

• Performance! 10% boost is common.

Why LTO?Benefits

• Binary size: inherent dead-stripping and auto-hidden visibility via internalization.

Removes module optimization boundaries via Cross-Module Optimization (CMO)Most of benefit comes from cross-module inlining

Page 18: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

• Performance! 10% boost is common.

Why LTO?Benefits

• Binary size: inherent dead-stripping and auto-hidden visibility via internalization.

Removes module optimization boundaries via Cross-Module Optimization (CMO)Most of benefit comes from cross-module inlining

Page 19: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

• Performance! 10% boost is common.

Why LTO?Benefits

`

• Binary size: inherent dead-stripping and auto-hidden visibility via internalization.

Removes module optimization boundaries via Cross-Module Optimization (CMO)Most of benefit comes from cross-module inlining

Page 20: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

• Performance! 10% boost is common.

Why LTO?Benefits

`

Single source improvements because global variables can be internalized (better alias analysis, etc.). LTO is more powerful than “Unity Build” because of Linker supplied information.

• Binary size: inherent dead-stripping and auto-hidden visibility via internalization.

Removes module optimization boundaries via Cross-Module Optimization (CMO)Most of benefit comes from cross-module inlining

Page 21: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 22: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 23: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 24: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 25: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 26: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 27: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 28: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 29: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 30: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 31: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 32: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 33: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 34: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Page 35: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Example

Optimization across module boundaries!

+ Linker Information

Page 36: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!

LLVM 3.7

LLVM 3.8

LLVM 3.9

Page 37: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

LLVM 3.7

LLVM 3.8

LLVM 3.9

Page 38: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

Page 39: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

4s734s

Page 40: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

https://www.xkcd.com/303/

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

4s734s

Page 41: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

https://www.xkcd.com/303/

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

4s734s

Page 42: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

Malloc Peak for the LTO Link of the `clang-3.6` binary, with only the X86 backend configured

11GB

12GB

20GB

43GB

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

LLVM 3.6

https://www.xkcd.com/303/

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

4s734s

Page 43: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

Malloc Peak for the LTO Link of the `clang-3.6` binary, with only the X86 backend configured

11GB

12GB

20GB

43GB

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

LLVM 3.6

https://www.xkcd.com/303/

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

4s734s

Page 44: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

Malloc Peak for the LTO Link of the `clang-3.6` binary, with only the X86 backend configured

11GB

12GB

20GB

43GB

LLVM 3.7

LLVM 3.8

LLVM 3.9

• Slow: inherently serial / can’t be distributed

LLVM 3.6

https://www.xkcd.com/303/

Dead end: linking Chromium with debug info still crashes now after >2h and >50GB mem

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

1000252s

970sMonolithic LTO Non LTO

`time ninja clang`

4s734s

Page 45: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

• Slow: inherently serial / can’t be distributed

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

Page 46: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

• Slow: inherently serial / can’t be distributed

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

Parallel

Page 47: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

• Slow: inherently serial / can’t be distributed

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

Parallel

Incremental

Page 48: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

• Slow: inherently serial / can’t be distributed

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

Parallel

Incremental

Memory lean

Page 49: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Monolithic LTO: No Free Lunch!LTO adoption is still low after >10 years of existence: why?

• Slow: inherently serial / can’t be distributed

• Not friendly with incremental build: fix a typo, and see the full program being re-optimized as a whole.

• Memory hungry: all the program in a single bitcode file in memory.

Parallel

Incremental

Memory lean

To fulfill this goal of LTO always enabled,

we need a solution designed around these three key features!

Page 50: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Design

Page 51: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Design• Designed from the start for huge (Google scale) applications

Page 52: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Opt Opt Opt Opt Opt Opt Opt OptBE BE BE BE BE BE BE BE

Fully-parallel (very boring) usual optimizations and CodeGen {Parallel

Backends

ThinLTO: Design

.bc.bc .bc .bc .bc .bc .bc .bcFully-parallel frontend processing+ initial optimizations{Compile

.bc.bc .bc .bc .bc .bc .bc .bc

• Designed from the start for huge (Google scale) applications• Fully parallel compile step and backends enables distributed builds

Page 53: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Opt Opt Opt Opt Opt Opt Opt OptBE BE BE BE BE BE BE BE

Fully-parallel (very boring) usual optimizations and CodeGen {Parallel

Backends

{Thin Link Thin serial synchronization step

ThinLTO: Design

Traditional Linking

.bc.bc .bc .bc .bc .bc .bc .bcFully-parallel frontend processing+ initial optimizations{Compile

.bc.bc .bc .bc .bc .bc .bc .bc

• Designed from the start for huge (Google scale) applications• Fully parallel compile step and backends enables distributed builds

Page 54: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Opt Opt Opt Opt Opt Opt Opt OptBE BE BE BE BE BE BE BE

Fully-parallel (very boring) usual optimizations and CodeGen {Parallel

Backends

{Thin Link Thin serial synchronization step

ThinLTO: Design

Traditional Linking

.bc.bc .bc .bc .bc .bc .bc .bcFully-parallel frontend processing+ initial optimizations{Compile

.bc.bc .bc .bc .bc .bc .bc .bc

• Designed from the start for huge (Google scale) applications• Fully parallel compile step and backends enables distributed builds• Module is unit of compilation enables incremental builds

Page 55: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Opt Opt Opt Opt Opt Opt Opt OptBE BE BE BE BE BE BE BE

Fully-parallel (very boring) usual optimizations and CodeGen {Parallel

Backends

{Thin Link Thin serial synchronization step

ThinLTO: Design

Traditional Linking

.bc.bc .bc .bc .bc .bc .bc .bcFully-parallel frontend processing+ initial optimizations{Compile

.bc.bc .bc .bc .bc .bc .bc .bc

• Designed from the start for huge (Google scale) applications• Fully parallel compile step and backends enables distributed builds• Module is unit of compilation enables incremental builds • Only perform profitable cross module optimization into each module memory scaling

Page 56: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Page 57: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations

.bc.bc .bc .bc .bc .bc .bc .bc.bc.bc .bc .bc .bc .bc .bc .bc

Page 58: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations{Phase 1:

Compile.bc.bc .bc .bc .bc .bc .bc .bc.bc.bc .bc .bc .bc .bc .bc .bc Extra per-function summary information

are generated“on the side”

Page 59: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations

FrontendLinker

{Phase 1: Compile

.bc.bc .bc .bc .bc .bc .bc .bc.bc.bc .bc .bc .bc .bc .bc .bc Extra per-function summary information are generated“on the side”

Page 60: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations

Link only the summary info in a giant index: thin-link. No need to parse the IR

FrontendLinker

{Phase 1: Compile

{Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc.bc.bc .bc .bc .bc .bc .bc .bc Extra per-function summary information are generated“on the side”

Page 61: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations

Link only the summary info in a giant index: thin-link. No need to parse the IR

FrontendLinker

Fully-parallel cross-module function Importing based on summary.

{Phase 1: Compile

{Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

.bc.bc .bc .bc .bc .bc .bc .bc Extra per-function summary information are generated“on the side”

Page 62: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations

Link only the summary info in a giant index: thin-link. No need to parse the IR

FrontendLinker

Fully-parallel cross-module function Importing based on summary.

Imported functions are dropped after inlining.

{Phase 1: Compile

{Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

.bc.bc .bc .bc .bc .bc .bc .bc Extra per-function summary information are generated“on the side”

Page 63: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO: Overview

Fully-parallel frontend processing+ initial optimizations

Link only the summary info in a giant index: thin-link. No need to parse the IR

FrontendLinker

Fully-parallel cross-module function Importing based on summary.

Fully-parallel (very boring) usual optimizations and CodeGen

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

Imported functions are dropped after inlining.

{Phase 1: Compile

{Phase 2: Thin Link

{Phase 3: Backends

.bc.bc .bc .bc .bc .bc .bc .bc

.bc.bc .bc .bc .bc .bc .bc .bc Extra per-function summary information are generated“on the side”

Page 64: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Model: Summary Generation

Page 65: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

clang -cc1 clang -cc1 clang -cc1

ThinLTO Model: Summary Generation

Page 66: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto

ThinLTO Model: Summary Generation

Page 67: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

ThinLTO Model: Summary Generation

Page 68: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

ELF / MachO / COFF 10101101010101011101011010110101010101110101101011010101010111010110101101010101011101011010110101010101110101101011010101010111101101011011 1011010101010111101101011011

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

Summaries can contain profile data (PGO) data as well, or be extended with other attributes

ThinLTO Model: Summary Generation

Page 69: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summaries can contain profile data (PGO) as well, or be extended with other attributes

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

ThinLTO Model: Summary Generation

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

Page 70: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

ThinLTO Model: Summary Generation

Page 71: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

ThinLTO Model: Thin-Link Phase

Page 72: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

ThinLTO Model: Thin-Link Phase

Page 73: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

Thin-link: • Produce the combined index • Sequential step, but fast (<1s for clang)

ThinLTO Model: Thin-Link Phase

Page 74: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

Summary: main, 2 instructions, …main, 2 instructions, …

Thin-link: • Produce the combined index • Sequential step, but fast (<1s for clang)

ThinLTO Model: Thin-Link Phase

Page 75: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

Summary: main, 2 instructions, …

main

, 2 instructions, …

Thin-link: • Produce the combined index • Sequential step, but fast (<1s for clang)

ThinLTO Model: Thin-Link Phase

Page 76: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

Summary: main, 2 instructions, …

main , 2 instructions, … main.oThin-link: • Produce the combined index • Sequential step, but fast (<1s for clang)

ThinLTO Model: Thin-Link Phase

Page 77: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

Summary: main, 2 instructions, …

, 2 instructions, … main.oThin-link: • Produce the combined index • Sequential step, but fast (<1s for clang) • Original function names are lost!

ThinLTO Model: Thin-Link Phase

Page 78: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

, 2 instructions, … main.oThin-link: • Produce the combined index • Sequential step, but fast (<1s for clang) • Original function names are lost!

ThinLTO Model: Thin-Link Phase

Page 79: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

, 2 instructions, … main.o

3597eb0 , 42 instructions, …

768595e , 2 instructions, …

Thin-link: • Produce the combined index • Sequential step, but fast (<1s for clang) • Original function names are lost!

ThinLTO Model: Thin-Link Phase

Page 80: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

, 2 instructions, … main.o

3597eb0 , 42 instructions, …

test2.o

768595e , 2 instructions, …

, 13 instructions, …77c4a42

Thin-link: • Produce the combined index • Sequential step, but fast (<1s for clang) • Original function names are lost!

ThinLTO Model: Thin-Link Phase

Page 81: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Original Model: Backend Phase

Page 82: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Original Model: Backend Phaseclang -o test -flto main.o test1.o test2.o

ld […] -o test main.o test1.o test2.oLLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

Parallel backend: • Cross-Module Importing • Optimization pipeline • Code generation

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

Page 83: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 84: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 85: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 86: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

3597eb0

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 87: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

3597eb0

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 88: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

3597eb0

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 89: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

3597eb0

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 90: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 91: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 92: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 93: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 94: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …77c4a42

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 95: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …77c4a42

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 96: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …77c4a42

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 97: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 98: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 99: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 100: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 101: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 102: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

768595e

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 103: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 104: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

LLVM Linker Plugin

Combined Index

test1.o

test1.o

7c897c21

Summary: main, 2 instructions, …

Summary: _Z5test2i, 13 instructions, …

main.o test1.o test2.o

main.o

3597eb0

test2.o

768595e

77c4a42

, 2 instructions, …

, 42 instructions, …

, 2 instructions, …

, 13 instructions, …

clang -o test -flto main.o test1.o test2.o ld […] -o test main.o test1.o test2.o

ThinLTO Original Model: Backend Phase

Page 105: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

Page 106: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/

Page 107: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/ Parallel

Page 108: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/ Parallel Incremental

Page 109: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/ Parallel Incremental Memory lean

Page 110: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/

• January: link time on single machine was terrible:

cross-module importing alone >3x the total time of Monolithic LTO

Parallel Incremental Memory lean

Page 111: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/

linking opt: 1036 inputs files, but >30000 IR loads

• January: link time on single machine was terrible:

cross-module importing alone >3x the total time of Monolithic LTO

Parallel Incremental Memory lean

Page 112: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/

linking opt: 1036 inputs files, but >30000 IR loads

• January: link time on single machine was terrible:

cross-module importing alone >3x the total time of Monolithic LTO

Loading IR is slow! Loading IR with debug-info is insanely slow!

Parallel Incremental Memory lean

Page 113: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/

linking opt: 1036 inputs files, but >30000 IR loads

• January: link time on single machine was terrible:

cross-module importing alone >3x the total time of Monolithic LTO

Loading IR is slow! Loading IR with debug-info is insanely slow!

“none of the subsystems in LLVM are really good until they have been rewritten at least once.”

This is what we did between January and March!

Parallel Incremental Memory lean

Page 114: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTOEuro-LLVM 2015: A Fine-Grained Demand-Driven Infrastructure

• December 2015: clang bootstrap on MacOS \o/

linking opt: 1036 inputs files, but >30000 IR loads

• January: link time on single machine was terrible:

cross-module importing alone >3x the total time of Monolithic LTO

Loading IR is slow! Loading IR with debug-info is insanely slow!

“none of the subsystems in LLVM are really good until they have been rewritten at least once.”

This is what we did between January and March!

(Chris Lattner, The Architecture of Open Source Applications: LLVM, 2011)

Parallel Incremental Memory lean

Page 115: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

Page 116: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)

Page 117: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)

Page 118: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)

Page 119: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)+ Full reference/call graph

Page 120: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)+ Full reference/call graph

Inter-procedural Analysis

Page 121: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)+ Full reference/call graph

Inter-procedural AnalysisAnalyses Results

Page 122: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{

ThinLTO: Overview

.bc.bc .bc .bc.bc .bc .bc.bc

FrontendLinker

{Phase 1: Compile

Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Opt Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE BE

Traditional Linking

{Phase 3: Backends

+ Local reference/call graph+ Module Hash (for incremental build)+ Full reference/call graph

Inter-procedural Analysis

Analyses Results

+ Parallel Inter-procedural transformation based on the analyses results

Page 123: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

ThinLTO Revisited: Compile Phase

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

Page 124: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

ThinLTO Revisited: Compile Phase

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

Page 125: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

ThinLTO Revisited: Compile Phase

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

Page 126: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

ThinLTO Revisited: Compile Phase

Page 127: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Linkmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Page 128: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LLVM Linker Plugin

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

Thin-link: • Produce the combined index • Sequential step, but fast (<2s for clang) • Original function names are lost!

ThinLTO Revisited: Thin Linkmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Page 129: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LLVM Linker Plugin

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

Thin-link: • Produce the combined index • Sequential step, but fast (<2s for clang) • Original function names are lost!

ThinLTO Revisited: Thin Linkmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

call 3597eb0, global linkage

call 77c4a42, global linkage

call 768595e, global linkage

Page 130: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LLVM Linker Plugin

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

Thin-link: • Produce the combined index • Sequential step, but fast (<2s for clang) • Original function names are lost!

ThinLTO Revisited: Thin Linkmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

call 3597eb0, global linkage

call 77c4a42, global linkage

call 768595e, global linkage

Page 131: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

• Run inter-procedural analyses: Transformations: Global DCE, Linkage, …

LLVM Linker Plugin

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

Thin-link: • Produce the combined index • Sequential step, but fast (<2s for clang) • Original function names are lost!

ThinLTO Revisited: Thin Linkmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

call 3597eb0, global linkage

call 77c4a42, global linkage

call 768595e, global linkage

Page 132: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

• Run inter-procedural analyses: Transformations: Global DCE, Linkage, …

LLVM Linker Plugin

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

Thin-link: • Produce the combined index • Sequential step, but fast (<2s for clang) • Original function names are lost!

ThinLTO Revisited: Thin Linkmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

call 3597eb0, global linkage

call 77c4a42, global linkage

call 768595e, global linkage

No need to load any IR! This is still a very fast serial phase.

Page 133: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 134: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

Import Lists

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 135: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

Import Lists

3597eb077c4a42768595e

main.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 136: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 137: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 138: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 139: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Thin Link

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 140: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Parallel Backends

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 141: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Parallel Backends

Parallel ThinLTO Backends:

• Read the import lists

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 142: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Parallel Backends

Parallel ThinLTO Backends:

• Read the import lists

• Process all imports from one module after the other

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 143: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Parallel Backends

Parallel ThinLTO Backends:

• Read the import lists

• Process all imports from one module after the other

Import Lists

3597eb077c4a42768595e

test1.o test2.o

main.o test1.o77c4a42

test2.o

test2.o768595e

test1.o

Knowing upfront what to import allows to limit the number of IR loading

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 144: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 145: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

CGSCC Simplification and Inlining

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 146: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 147: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 148: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 149: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 150: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 151: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) Inlining

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 152: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) Inlining7) “Simplification” passes

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 153: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) Inlining7) “Simplification” passes

8) “Eliminate Available Externally” Pass

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 154: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) Inlining7) “Simplification” passes

8) “Eliminate Available Externally” Pass

9) “Optimization Passes”: 43 passes including vectorization

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 155: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

*See PassManagerBuilder::addFunctionSimplificationPasses()

1) “Simplification” passes: 47 passes including SROA - JumpThreading - InstCombine - LICM - … (and many others*)

CGSCC Simplification and Inlining

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) Inlining7) “Simplification” passes

8) “Eliminate Available Externally” Pass

9) “Optimization Passes”: 43 passes including vectorization

main.o10) Codegen

ThinLTO Revisited: Parallel Backendsmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

LLVM Linker Plugin

Page 156: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o

ThinLTO Revisited: Parallel BackendsLLVM Linker Pluginmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Page 157: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.oCGSCC Simplification and Inlining

1) “Simplification” passes

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) “Eliminate Available Externally” Pass

7) “Optimization Passes”: 43 passes including vectorization

test1.o8) Codegen

ThinLTO Revisited: Parallel BackendsLLVM Linker Pluginmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Page 158: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.oCGSCC Simplification and Inlining

1) “Simplification” passes

2) Inlining3) “Simplification” passes

4) Inlining5) “Simplification” passes

6) “Eliminate Available Externally” Pass

7) “Optimization Passes”: 43 passes including vectorization

test1.o8) Codegen

Parallelism is obtained thanks to redundant optimizations over the same IR!

ThinLTO Revisited: Parallel BackendsLLVM Linker Pluginmain.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Page 159: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin-Link IPA: Correctness Global Scope Promotion

Page 160: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin-Link IPA: Correctness Global Scope Promotion

Page 161: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin-Link IPA: Correctness Global Scope Promotion

Page 162: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

a

Thin-Link IPA: Correctness Global Scope Promotion

Page 163: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

a

Thin-Link IPA: Correctness Global Scope Promotion

Page 164: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

a

Introduction of an external reference (due to importing)!

Thin-Link IPA: Correctness Global Scope Promotion

Page 165: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

a

Introduction of an external reference (due to importing)!

Thin-Link IPA: Correctness Global Scope Promotion

requires promotion/rename of local linkage functions

Page 166: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

a

Introduction of an external reference (due to importing)!

Thin-Link IPA: Correctness Global Scope Promotion

similar technique for exported “discardable” linkage references requires promotion/rename of local linkage functions

Page 167: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin-Link IPA: Compile Time Optimization Weak Linkage Resolution

Page 168: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

… ~100 instructions

C++ template generates a lot of redundant code!

Thin-Link IPA: Compile Time Optimization Weak Linkage Resolution

Page 169: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

… ~100 instructions

C++ template generates a lot of redundant code!• Monolithic LTO will merge these and codegen only one naturally

Thin-Link IPA: Compile Time Optimization Weak Linkage Resolution

Page 170: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

… ~100 instructions

C++ template generates a lot of redundant code!

• ThinLTO selects one at Thin Link time other copies marked for drop after inlining

• Monolithic LTO will merge these and codegen only one naturally

Thin-Link IPA: Compile Time Optimization Weak Linkage Resolution

Page 171: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

… ~100 instructions

C++ template generates a lot of redundant code!

• ThinLTO selects one at Thin Link time other copies marked for drop after inlining

Linking clang: ~25% less function that are codegen!

• Monolithic LTO will merge these and codegen only one naturally

Thin-Link IPA: Compile Time Optimization Weak Linkage Resolution

Page 172: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

Page 173: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

Page 174: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

Page 175: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

• Linker identifies external reference to getGlobalOption() Linker Info: external ref.

getGlobalOption

Page 176: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

• Linker identifies external reference to getGlobalOption() Linker Info: external ref.

getGlobalOption• Compute reachability to externally referenced nodes in index

Page 177: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

• Linker identifies external reference to getGlobalOption() Linker Info: external ref.

getGlobalOption• Compute reachability to externally referenced nodes in index

• Prune unreachable nodes from the graph

Page 178: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin Link IPA: Dead Global Pruning

• Linker identifies external reference to getGlobalOption() Linker Info: external ref.

getGlobalOption• Compute reachability to externally referenced nodes in index

• Prune unreachable nodes from the graph

• Option can be internalized and later constant folded. • The function-importing will generate a smaller list save CPU cycles!

Enabler for better subsequent analyses

Page 179: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin-Link IPA Future Optimization Example: Global Variables

<- this call is dead

Page 180: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Thin-Link IPA Future Optimization Example: Global Variables

<- this call is dead

Page 181: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Knowing the range for i.llvm.A570184 (here it is even easier: it is a constant) would enable folding the test as false

Thin-Link IPA Future Optimization Example: Global Variables

<- this call is dead

Page 182: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

All this code can be dead-stripped by the linker, but takes time to optimize/codegen

Knowing the range for i.llvm.A570184 (here it is even easier: it is a constant) would enable folding the test as false

Thin-Link IPA Future Optimization Example: Global Variables

<- this call is dead

Page 183: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

All this code can be dead-stripped by the linker, but takes time to optimize/codegen

Knowing the range for i.llvm.A570184 (here it is even easier: it is a constant) would enable folding the test as false

Thin-Link IPA Future Optimization Example: Global Variables

<- this call is dead

This is just one example of opportunity for better summary-based optimization!

Page 184: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Indirect Call Promotion

Page 185: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Indirect Call Promotion

Page 186: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Indirect Call Promotion

Indirect Call Promotion can only promote if the target is in the same module.

Page 187: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Indirect Call Promotion

Indirect Call Promotion can only promote if the target is in the same module.

Page 188: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

PGO Indirect Call Promotion

Page 189: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

PGO Indirect Call Promotion

Page 190: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

PGO Indirect Call Promotion

Page 191: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

PGO Indirect Call Promotion

indirect_call.o

Summary: _Z5test2i, 13 instructions, global linkage,

clang -cc1 -flto=thin

Page 192: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

PGO Indirect Call Promotion

indirect_call.o

Summary: _Z5test2i, 13 instructions, global linkage,

clang -cc1 -flto=thin -fprofile-use

PGO Functionname hash

Page 193: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

PGO Indirect Call Promotion

indirect_call.o

Summary: _Z5test2i, 13 instructions, global linkage,

clang -cc1 -flto=thin

Will also be used as key in the Combined

Indexcall 768595e

-fprofile-use

PGO Functionname hash

The summary records indirect calls possible targets as regular calls (speculative).

Page 194: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

main.o test1.o

Summary: _Z5test2i, 13 instructions, global linkage, call 768595e

test2.o

Thin-link:

PGO Indirect Call PromotionLLVM Linker Plugin

Page 195: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

main.o test1.o

Summary: _Z5test2i, 13 instructions, global linkage, call 768595e

test2.o

Thin-link:

PGO Indirect Call PromotionLLVM Linker Plugin

Page 196: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

main.o test1.o

Summary: _Z5test2i, 13 instructions, global linkage, call 768595e

test2.o

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42, global linkage

call 3597eb0, global linkage

call 768595e, global linkage

Thin-link:

• Produce the combined index as usual

PGO Indirect Call PromotionLLVM Linker Plugin

Page 197: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

main.o test1.o

Summary: _Z5test2i, 13 instructions, global linkage, call 768595e

test2.o

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42, global linkage

call 3597eb0, global linkage

call 768595e, global linkage

Thin-link:

• Produce the combined index as usual• Indirect profile-based edge points to target,

if in index, the same way as direct edges.

PGO Indirect Call PromotionLLVM Linker Plugin

Page 198: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

main.o test1.o

Summary: _Z5test2i, 13 instructions, global linkage, call 768595e

test2.o

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42, global linkage

call 3597eb0, global linkage

call 768595e, global linkage

Thin-link:

• Produce the combined index as usual• Indirect profile-based edge points to target,

if in index, the same way as direct edges.• The indirect call target can be selected for import,

just like targets of direct calls.

PGO Indirect Call PromotionLLVM Linker Plugin

Page 199: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Summary: main, 2 instructions, …

Summary: _Z5test1i, 42 instructions, … _Z9test1_fooi, 2 instructions, …

main.o test1.o

Summary: _Z5test2i, 13 instructions, global linkage, call 768595e

test2.o

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42, global linkage

call 3597eb0, global linkage

call 768595e, global linkage

Thin-link:

• Produce the combined index as usual• Indirect profile-based edge points to target,

if in index, the same way as direct edges.• The indirect call target can be selected for import,

just like targets of direct calls.

PGO Indirect Call PromotionLLVM Linker Plugin

• Indirect calls are promoted after importing, but before the inliner

Page 200: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Importing Heuristics

Summary:

_Z3hotv, 150 instructions, global linkage _Z4coldv, 50 instructions, global linkage

Summary: main, 10 instructions, global linkage, call f9f84d51 call f46f05f7

Page 201: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Importing Heuristics

Only cold() would be inlined!

Summary:

_Z3hotv, 150 instructions, global linkage _Z4coldv, 50 instructions, global linkage

Summary: main, 10 instructions, global linkage, call f9f84d51 call f46f05f7

Page 202: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Importing Heuristics

Only cold() would be inlined!

Summary:

_Z3hotv, 150 instructions, global linkage _Z4coldv, 50 instructions, global linkage

Summary: main, 10 instructions, global linkage, call f9f84d51 call f46f05f7

The importer would only import cold()!

Page 203: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Importing Heuristics

Only cold() would be inlined!

Summary:

_Z3hotv, 150 instructions, global linkage _Z4coldv, 50 instructions, global linkage

Summary: main, 10 instructions, global linkage, call f9f84d51 call f46f05f7

With PGO data, cold will not be inlined, hot would be inlined if available

The importer would only import cold()!

Page 204: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Importing Heuristics

Only cold() would be inlined!

Summary:

_Z3hotv, 150 instructions, global linkage _Z4coldv, 50 instructions, global linkage

Summary: main, 10 instructions, global linkage, call f9f84d51 call f46f05f7

, hot , cold

With PGO data, cold will not be inlined, hot would be inlined if available

The importer would only import cold()!

Page 205: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Profile Guided Optimization (PGO): Importing Heuristics

Only cold() would be inlined!

Summary:

_Z3hotv, 150 instructions, global linkage _Z4coldv, 50 instructions, global linkage

Summary: main, 10 instructions, global linkage, call f9f84d51 call f46f05f7

Mirror inliner heuristic by giving bonus for hot edges and a penalty for cold ones!

, hot , cold

With PGO data, cold will not be inlined, hot would be inlined if available

The importer would only import cold()!

Page 206: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

ThinLTO Revisited: Incremental Build

Page 207: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: 4daa04c497e11d7dd51732f055

Module Hash: e10adbb9c3508bd5e55782d884

ThinLTO Revisited: Incremental Build

Page 208: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang -cc1 clang -cc1 clang -cc1 -flto -flto -flto=thin =thin =thin

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: 4daa04c497e11d7dd51732f055

Module Hash: e10adbb9c3508bd5e55782d884

ThinLTO Revisited: Incremental Build

Page 209: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

main.o test1.o test2.o

Summary: main, 2 instructions

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test1i global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: 4daa04c497e11d7dd51732f055

Module Hash: e10adbb9c3508bd5e55782d884

ThinLTO Revisited: Incremental Build

Page 210: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

libLTO.dylib ThinLTO Revisited: Incremental Build

Page 211: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

libLTO.dylib

main.o 4daa04c497test2.o aee693ed4btest1.o e10adbb9c3

ThinLTO Revisited: Incremental Build

Page 212: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Import Lists

3597eb077c4a42768595e

main.o

libLTO.dylib

main.o 4daa04c497test2.o aee693ed4btest1.o e10adbb9c3

ThinLTO Revisited: Incremental Build

Page 213: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o

libLTO.dylib

main.o 4daa04c497test2.o aee693ed4btest1.o e10adbb9c3

ThinLTO Revisited: Incremental Build

Page 214: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib

main.o 4daa04c497test2.o aee693ed4btest1.o e10adbb9c3

ThinLTO Revisited: Incremental Build

Page 215: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

Combined Index7c897c21 main.o, 2 instructions

test1.o, 42 instructions3597eb0

test1.o, 2 instructions768595etest2.o, 13 instructions77c4a42

call 77c4a42

call 3597eb0

call 768595e main.o 4daa04c497test2.o aee693ed4btest1.o e10adbb9c3

Page 216: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

Page 217: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

Page 218: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Parallel ThinLTO Backends: Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

• Read the import list

Page 219: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Parallel ThinLTO Backends:

12f67fe8

• Compute a unique hash based on all the inputs: without reading any IR!

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

• Read the import list

Page 220: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Parallel ThinLTO Backends:

12f67fe8

• Compute a unique hash based on all the inputs: without reading any IR!

• Query a persistent cache and early return on cache hit.

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

• Read the import list

Page 221: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Parallel ThinLTO Backends:

12f67fe8

• Compute a unique hash based on all the inputs: without reading any IR!

• Query a persistent cache and early return on cache hit.• Otherwise, process as normal, but commit the object to the

cache before returning to the linker.

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

• Read the import list

Page 222: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

test1.o test2.o

Summary: _Z5test1i, 42 instructions

_Z9test1_fooi, 2 instructions

Summary: _Z5test2i, 13 instructions

global linkage, call _Z5test2i global linkage, call _Z9test1_fooi

Module Hash: aee693ed4b5674829d05f56ef0b

Module Hash: e10adbb9c3508bd5e55782d884

main.o

Summary: main, 2 instructionsglobal linkage, call _Z5test1i

Module Hash: 4daa04c497e11d7dd51732f055

Parallel ThinLTO Backends:

12f67fe8

• Compute a unique hash based on all the inputs: without reading any IR!

• Query a persistent cache and early return on cache hit.• Otherwise, process as normal, but commit the object to the

cache before returning to the linker.

Import Lists

3597eb077c4a42768595e

test1.oe10adbb

test2.oaee693ed

main.o test1.o77c4a42

test2.oaee693ed

test2.o768595e

test1.oe10adbb

libLTO.dylib ThinLTO Revisited: Incremental Build

• Read the import list

• The final native link is not incremental: all the objects are re-linked always bit-to-bit the same binary.

Page 223: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{Phase 2: Thin Link

.bc.bc .bc .bc .bc .bc .bc .bc

Traditional Linking

LinkerFrontend

{Phase 1: Compile

.bc.bc .bc .bc.bc .bc .bc.bc

Inter-procedural AnalysisAnalyses Results

{Phase 3: Backends Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

ThinLTO: Distributed Builds

Page 224: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

{Phase 2: Thin Link

Traditional Linking

Linker

{.bc.bc .bc .bc.bc .bc .bc.bc

Inter-procedural AnalysisAnalyses Results

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

ThinLTO: Distributed Builds

Page 225: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc.bc .bc .bc.bc .bc .bc.bc.bc.bc .bc .bc.bc.bc .bc.bc.bc.bc .bc.bc .bc.bc .bc .bc.bc .bc .bc

LinkerDistributed

build system

.bc

Page 226: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc.bc .bc .bc.bc .bc .bc.bc.bc.bc .bc .bc.bc.bc .bc.bc.bc.bc .bc.bc .bc.bc .bc .bc.bc .bc .bc

LinkerDistributed

build system

.bc

Page 227: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc.bc .bc .bc.bc .bc .bc.bc.bc.bc .bc .bc.bc.bc .bc.bc.bc.bc .bc.bc .bc.bc .bc .bc.bc .bc .bc

LinkerDistributed

build system + Produce dependent module lists: the build system bundles these for each job

.bc

Page 228: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc.bc .bc .bc.bc .bc .bc.bc.bc.bc .bc .bc.bc.bc .bc.bc.bc.bc .bc.bc .bc.bc .bc .bc.bc .bc .bc

LinkerDistributed

build system + Produce dependent module lists: the build system bundles these for each job

.bc

Page 229: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc

.bc.bc

.bc.bc .bc .bc.bc

.bc.bc.bc.bc

.bc.bc .bc.bc.bc.bc.bc.bc .bc.bc .bc.bc.bc .bc.bc

LinkerDistributed

build system + Produce dependent module lists: the build system bundles these for each job

.bc

Page 230: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc

.bc.bc

.bc.bc .bc .bc.bc

.bc.bc.bc.bc

.bc.bc .bc.bc.bc.bc.bc.bc .bc.bc .bc.bc.bc .bc.bc

LinkerDistributed

build system

+ Distributed Inter-procedural transformations based on the analyses results

+ Produce dependent module lists: the build system bundles these for each job

.bc

Page 231: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc

.bc

.bc

.bc.bc .bc .bc.bc

.bc

.bc

.bc.bc

.bc

.bc

.bc.bc

.bc.bc.bc

.bc .bc

.bc

.bc

.bc.bc .bc.bc

LinkerDistributed

build system

+ Distributed Inter-procedural transformations based on the analyses results

+ Produce dependent module lists: the build system bundles these for each job

.bc

Page 232: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

Traditional Linking

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc

.bc

.bc

.bc.bc .bc .bc.bc

.bc

.bc

.bc.bc

.bc

.bc

.bc.bc

.bc.bc.bc

.bc .bc

.bc

.bc

.bc.bc .bc.bc

LinkerDistributed

build system

.o .o .o .o .o .o .o.o

+ Distributed Inter-procedural transformations based on the analyses results

+ Produce dependent module lists: the build system bundles these for each job

.bc

Page 233: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Import list

Import list

Import list

Import list

Import list

Import list

Import list

Import list

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.thinlto .bc

.bc.bc .bc .bc.bc .bc .bc.bc

ThinLTO: Distributed Builds

Traditional Linking

{Phase 3: Backends

Opt Opt Opt Opt Opt Opt Opt

BE BE BE BE BE BE BE

Opt

BE

+ Stream out analysis results for each backend job

{Phase 2: Thin Link Inter-procedural Analysis

Analyses Results

.bc

.bc

.bc

.bc.bc .bc .bc.bc

.bc

.bc

.bc.bc

.bc

.bc

.bc.bc

.bc.bc.bc

.bc .bc

.bc

.bc

.bc.bc .bc.bc

LinkerDistributed

build system

.o .o .o .o .o .o .o.o

+ Distributed Inter-procedural transformations based on the analyses results

+ Produce dependent module lists: the build system bundles these for each job

+ Individual index files encapsulate info necessary for incremental build decisions

.bc

Page 234: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8

Page 235: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

Monolithic LTO

Page 236: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8

Half is the optimization pipeline Other half is the ThinLTO overhead

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

+25%Monolithic

LTO1 thread

Page 237: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

x15.1over Monolithic LTO: x12

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 238: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8

17x less memory!

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

x15.1over Monolithic LTO: x12

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 239: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8

17x less memory!

41MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3over Monolithic LTO:

x15.1over Monolithic LTO: x12

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 240: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3over Monolithic LTO:

x15.1over Monolithic LTO: x12

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 241: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3over Monolithic LTO:

x15.1over Monolithic LTO: x12

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 242: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

54MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3 ÷7.5over Monolithic LTO:

x15.1 x13.7over Monolithic LTO: x12 x10.8

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 243: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

54MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3 ÷7.5over Monolithic LTO:

x15.1 x13.7over Monolithic LTO: x12 x10.8

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 244: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

54MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3 ÷7.5over Monolithic LTO:

x15.1 x13.7over Monolithic LTO: x12 x10.8

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

30x less memory!

Page 245: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

54MB per thread228MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3 ÷7.5 ÷3.9over Monolithic LTO:

x15.1 x13.7over Monolithic LTO: x12 x10.8

x12.5x7.1

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 246: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 3000226s

187s

209s

211s

212s

223s

220s

231s

254s

275s

276s

302s

344s

402s

465s

557s

657s

858s

1,315s

2,643s

1,500s

91s

96s

102s

102s

111s

116s

119s

120s

132s

141s

149s

166s

186s

212s

249s

293s

352s

462s

664s

1,517s

1,199s

93s

86s

96s

96s

96s

105s

109s

117s

123s

132s

139s

153s

171s

198s

233s

275s

340s

453s

704s

1,445s

1,147s

Release Line Tables Debug

LTO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

0 180000000005,372MB

5,031MB

4,881MB

4,785MB

4,471MB

4,165MB

3,937MB

3,760MB

3,609MB

3,301MB

2,984MB

2,795MB

2,494MB

2,258MB

2,143MB

1,802MB

1,682MB

1,248MB

933MB

574MB

17,351MB

1,428MB

1,403MB

1,398MB

1,351MB

1,285MB

1,240MB

1,154MB

1,113MB

1,049MB

968MB

952MB

895MB

852MB

685MB

690MB

603MB

540MB

497MB

391MB

302MB

9,678MB

1,144MB

1,115MB

1,035MB

1,035MB

1,015MB

938MB

953MB

882MB

802MB

788MB

782MB

652MB

599MB

615MB

565MB

490MB

434MB

425MB

357MB

273MB

5,421MB

Release Line Tables DebugPerformance: Link Time for Clang

LLVM 3.6LLVM 3.7LLVM 3.8 41MB per thread

54MB per thread228MB per thread

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

÷5.3 ÷7.5 ÷3.9over Monolithic LTO:

x15.1 x13.7over Monolithic LTO: x12 x10.8

x12.5x7.1

Monolithic LTO

1 thread

2 threads

3 threads

5 threads

6 threads

7 threads

8 threads

9 threads

10 threads

11 threads

12 threads

13 threads

14 threads

15 threads

16 threads

17 threads

18 threads

19 threads

20 threads

4 threads

Page 247: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Comparison with GCC LTO

Page 248: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Comparison with GCC LTO

GCC has a mature and well-tuned sophisticated LTO implementation (WHOPR), with two parts:

Page 249: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Comparison with GCC LTO

GCC has a mature and well-tuned sophisticated LTO implementation (WHOPR), with two parts:

1. WPA: Serial part that makes IPA and inlining decisions, rewrites partitioned IR

Page 250: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Comparison with GCC LTO

GCC has a mature and well-tuned sophisticated LTO implementation (WHOPR), with two parts:

1. WPA: Serial part that makes IPA and inlining decisions, rewrites partitioned IR

2. LTRANS: Parallel backends performing inlining within each partition, plus usual optimizations and code generation

Page 251: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Comparison with GCC LTO

GCC has a mature and well-tuned sophisticated LTO implementation (WHOPR), with two parts:

➡ Comparable Phase 2: Thin Link (both serial)

➡ Comparable to Phase 3: ThinLTO Backends (both parallel)

1. WPA: Serial part that makes IPA and inlining decisions, rewrites partitioned IR

2. LTRANS: Parallel backends performing inlining within each partition, plus usual optimizations and code generation

Page 252: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Scaling with the Input Size

Page 253: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Scaling with the Input Size

• Chromium - open-source web browser• Ad Delivery - internal Google datacenter application

• LLVM/Clang - C/C++ Compiler

Page 254: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Scaling with the Input Size

• Chromium - open-source web browser• Ad Delivery - internal Google datacenter application

• LLVM/Clang - C/C++ Compiler

Clang Chromium Ad Delivery

13,829

17,798

1,938

Number of IR Files

Page 255: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Clang Chromium Ad Delivery

7,4697,544

3,554

1,073706217

-g0 -g2

Scaling with the Input Size

• Chromium - open-source web browser• Ad Delivery - internal Google datacenter application

• LLVM/Clang - C/C++ Compiler

Clang Chromium Ad Delivery

13,829

17,798

1,938

Number of IR Files Total IR Size (MB)

Page 256: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Clang Chromium Ad Delivery

7,4697,544

3,554

1,073706217

-g0 -g2

Scaling with the Input Size

• Chromium - open-source web browser• Ad Delivery - internal Google datacenter application

• LLVM/Clang - C/C++ Compiler

Clang Chromium Ad Delivery

13,829

17,798

1,938

Number of IR Files Total IR Size (MB)

Page 257: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Clang Chromium Ad Delivery

7,4697,544

3,554

1,073706217

-g0 -g2

Clang Chromium Ad Delivery

6,809k

2,403k

414k

1,953k

858k169k

Nodes Edges

Scaling with the Input Size

• Chromium - open-source web browser• Ad Delivery - internal Google datacenter application

• LLVM/Clang - C/C++ Compiler

Clang Chromium Ad Delivery

13,829

17,798

1,938

Number of IR Files Total IR Size (MB) Call/Reference Graph Size

Page 258: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Clang Chromium Ad Delivery

7,4697,544

3,554

1,073706217

-g0 -g2

Clang Chromium Ad Delivery

6,809k

2,403k

414k

1,953k

858k169k

Nodes Edges

Scaling with the Input Size

• Chromium - open-source web browser• Ad Delivery - internal Google datacenter application

• LLVM/Clang - C/C++ Compiler

Clang Chromium Ad Delivery

13,829

17,798

1,938

Number of IR Files Total IR Size (MB) Call/Reference Graph Size

Page 259: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Serial Step Comparisons Thin Link vs GCC WPA

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

Page 260: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Serial Step Comparisons Thin Link vs GCC WPA

ThinLTO GCC WPA -g0 GCC WPA -g2

2625

2.27

10.77

8.05

1

2.891.99

0.13

Clang Chromium Ad Delivery

Kille

d

Kille

dPeak Memory (GB)

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

Page 261: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Serial Step Comparisons Thin Link vs GCC WPA

ThinLTO GCC WPA -g0 GCC WPA -g2

2625

2.27

10.77

8.05

1

2.891.99

0.13

Clang Chromium Ad Delivery

Kille

d

Kille

d

ThinLTO GCC WPA -g0 GCC WPA -g2

180m 0s180m 0s

1m 42s

5m 0s

4m 0s

0m 30s1m 16s0m 55s

0m 5s

Kille

d

Kille

dTimePeak Memory (GB)

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

Page 262: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Serial Step Comparisons Thin Link vs GCC WPA

ThinLTO GCC WPA -g0 GCC WPA -g2

2625

2.27

10.77

8.05

1

2.891.99

0.13

Clang Chromium Ad Delivery

Kille

d

Kille

d

ThinLTO GCC WPA -g0 GCC WPA -g2

180m 0s180m 0s

1m 42s

5m 0s

4m 0s

0m 30s1m 16s0m 55s

0m 5s

Kille

d

Kille

d

LLVM LTO doesn’t complete Ad Delivery even without debug (-g0), killed after 2 hours and > 12GB.

TimePeak Memory (GB)

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

Page 263: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

Page 264: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

8 16 32 8 16 32

Serial Phase Parallel Phase

ThinLTO GCC: WPA+LTRANS

Time

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

MonolithicLTO

Page 265: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

8 16 32 8 16 32

Serial Phase Parallel Phase28m18s

ThinLTO GCC: WPA+LTRANS

Time

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

MonolithicLTO

Page 266: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

8 16 32 8 16 32

Serial Phase Parallel Phase28m18s

ThinLTO GCC: WPA+LTRANS

Time

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

MonolithicLTO

With Debug: Monolithic LTO crashes after >2h and >50GB mem

Page 267: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

8 16 32 8 16 32

Serial Phase Parallel Phase28m18s

ThinLTO GCC: WPA+LTRANS

2m483m38s

6m

30s 30s 30s

Time

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

MonolithicLTO

With Debug: Monolithic LTO crashes after >2h and >50GB mem

Page 268: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

8 16 32 8 16 32

Serial Phase Parallel Phase28m18s

ThinLTO GCC: WPA+LTRANS

13m43s

5m15s

9m49s

4m25s 4m

8m21s

2m483m38s

6m

30s 30s 30s

Time

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

MonolithicLTO

With Debug: Monolithic LTO crashes after >2h and >50GB mem

Page 269: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Chromium Build Comparisons No Debug (-g0)

8 16 32 8 16 32

Serial Phase Parallel Phase28m18s

ThinLTO GCC: WPA+LTRANS

8 16 32 8 16 32

13.7

7.1

4.0

1.31.31.2

8.18.18.1

1.01.01.0

9.5

ThinLTO GCC: WPA+LTRANS

13m43s

5m15s

9m49s

4m25s 4m

8m21s

2m483m38s

6m

30s 30s 30s

Peak Memory (GB)Time

Linux 16-core (32-logical) Intel Xeon E5-2690 @ 2.90GHz

MonolithicLTO

MonolithicLTO

With Debug: Monolithic LTO crashes after >2h and >50GB mem

Page 270: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Distributed Build: Ad Delivery

Page 271: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Distributed Build: Ad Delivery

# Nodes

Page 272: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance: Incremental Build Time for Clang

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 273: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 274: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 275: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build Changing the implementation of DenseMap::grow()

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 276: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build Changing the implementation of DenseMap::grow()

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 277: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build Changing the implementation of DenseMap::grow()

Changing the implementation of visitCallInst() in InstCombineCalls.cpp

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 278: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build Changing the implementation of DenseMap::grow()

Changing the implementation of visitCallInst() in InstCombineCalls.cpp

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 279: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build Changing the implementation of DenseMap::grow()

Changing the implementation of visitCallInst() in InstCombineCalls.cpp

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 280: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

1000

4s

247s252s

10s

302s315s

734s

960s970sMonolithic LTO Thin LTO Non LTO

Performance: Incremental Build Time for Clang

Clean Build Changing the implementation of DenseMap::grow()

Changing the implementation of visitCallInst() in InstCombineCalls.cpp

+25%

2013 MacPro - 12-cores E5-2697 @ 2.70GHz

Time to run `ninja clang`

Page 281: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance against Monolithic LTO: LLVM test-suite

Page 282: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance against Monolithic LTO: LLVM test-suite

Runtime on X86_64 (similar results observed on ARM64)

-60%

-45%

-30%

-15%

0%

15%

30%

45%

60%

Page 283: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance against Monolithic LTO: LLVM test-suite

Runtime on X86_64 (similar results observed on ARM64)

-60%

-45%

-30%

-15%

0%

15%

30%

45%

60%

Some benchmarks are stressing global variables

Page 284: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance against Monolithic LTO: LLVM test-suite

Runtime on X86_64 (similar results observed on ARM64)

-60%

-45%

-30%

-15%

0%

15%

30%

45%

60%

Some benchmarks are stressing global variables

Benefits from a more aggressive LTO pass pipeline

Page 285: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Performance against Monolithic LTO: LLVM test-suite

Runtime on X86_64 (similar results observed on ARM64)

-60%

-45%

-30%

-15%

0%

15%

30%

45%

60%

Some benchmarks are stressing global variables

Benefits from a more aggressive LTO pass pipeline

Binary Sizes on ARM64 -Os

-10%

0%

10%

20%

30%

40%

~3.25% regression GeoMean (but still ~2.6% improvement over non-LTO)

Mostly due to lack of hidden visibility, can be recovered

by improving implementation

Page 286: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Run-time Performance: SPEC cpu2006

-4%-2%0%2%4%6%8%

10%12%14%16%18%

Monolithic LTO + PGO ThinLTO + PGO

Improvement over -O2 (all with PGO)

Page 287: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

Page 288: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

Page 289: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

• On-going tuning for PGO

Page 290: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

• On-going tuning for PGO• LTO Devirtualization and CFI (Control-Flow Integrity) integration.

Page 291: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

• On-going tuning for PGO• LTO Devirtualization and CFI (Control-Flow Integrity) integration.• Propagate function attributes across modules via summary.

Page 292: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

• On-going tuning for PGO• LTO Devirtualization and CFI (Control-Flow Integrity) integration.• Propagate function attributes across modules via summary.• More bitcode format changes to improve link-time, especially with debug info

Page 293: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

• On-going tuning for PGO• LTO Devirtualization and CFI (Control-Flow Integrity) integration.• Propagate function attributes across modules via summary.• More bitcode format changes to improve link-time, especially with debug info• Ability to move function across modules instead of importing, especially

useful when a single call site exists and we could internalize.

Page 294: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Future Work

• On-going new libLTO Interface: better linker information for better codegenExample: resolution of weak definition to strong definition.

• On-going tuning for PGO• LTO Devirtualization and CFI (Control-Flow Integrity) integration.• Propagate function attributes across modules via summary.• More bitcode format changes to improve link-time, especially with debug info• Ability to move function across modules instead of importing, especially

useful when a single call site exists and we could internalize.• Augment the edges in the call-graph with mod-ref, constant range, …

Page 295: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111
Page 296: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Page 297: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Conclusion

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Page 298: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Conclusion

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Scales as a regular non-LTO build

Page 299: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Conclusion

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Scales as a regular non-LTO build

Performance close to Monolithic LTO

Page 300: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Conclusion

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Scales as a regular non-LTO build

Incremental build is critical for day-to-day development!

Performance close to Monolithic LTO

Page 301: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Conclusion

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Scales as a regular non-LTO build

On the path of replacing O2/O3 by default in production build

Incremental build is critical for day-to-day development!

Performance close to Monolithic LTO

Page 302: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Conclusion

Special Thanks: Akira Hatanaka and Bruno Cardoso Lopes (performance testing/debunking),

Adrian Prantl and Duncan Exon Smith (major changes in debug info and bitcode format), Peter Collingbourne (new LTO API)

Rafael Ávila de Espíndola (Lot of support and help!) Davide Italiano (lld support)

Piotr Padlewski (PGO import heuristics)

Scales as a regular non-LTO build

On the path of replacing O2/O3 by default in production build

Incremental build is critical for day-to-day development!

Performance close to Monolithic LTO

Page 303: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111
Page 304: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Backup Slides

Page 305: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 306: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 307: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 308: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Linker

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 309: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

LinkerCustom LTO PIPELINE

(includes inliner, …)

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 310: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Linker

Codegen

Custom LTO PIPELINE (includes inliner, …)

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 311: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Linker

Codegen

Custom LTO PIPELINE (includes inliner, …)

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

Early Opt

IR SimplificationCGSCC

Inliner{-O3 ThinLTO

Page 312: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Linker

Linker

Codegen

Custom LTO PIPELINE (includes inliner, …)

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

Early Opt

IR SimplificationCGSCC

Inliner{-O3 ThinLTO

Page 313: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Linker

Linker

Codegen

Custom LTO PIPELINE (includes inliner, …)

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

Early Opt

IR SimplificationCGSCC

Inliner{Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 314: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Different optimization pipelines

-O3 w/o LTOEarly Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 LTO

Codegen

Linker

Linker

Codegen Codegen

Custom LTO PIPELINE (includes inliner, …)

Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

Early Opt

IR SimplificationCGSCC

Inliner{Early Opt

IR SimplificationCGSCC

Inliner{Drop Available Ext.

Aggressive Optimizations

-O3 ThinLTO

Page 315: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

clang:

input files 217MB, and final binary is 52M

Summary blocks 0.85% on avg of IR .o sizes

Combined index total: 4M on-disk (not stored on disk unless distributed though)

Page 316: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

With Debug (-g2)16-core (32-logical) Intel Xeon

E5-2690 @ 2.90GHz

Chromium Build Comparisons

Page 317: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

With Debug (-g2)16-core (32-logical) Intel Xeon

E5-2690 @ 2.90GHz

Chromium Build Comparisons

Page 318: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

With Debug (-g2)16-core (32-logical) Intel Xeon

E5-2690 @ 2.90GHz

Tim

e

0m 0s

5m 30s

11m 0s

16m 30s

22m 0s

LLVM LTO 8 16 32 8 16 32

Serial portion Parallel portion> 25m

ThinLTO GCC: WPA+LTRANS

Cra

sh

Chromium Build Comparisons

Page 319: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

With Debug (-g2)16-core (32-logical) Intel Xeon

E5-2690 @ 2.90GHz

Tim

e

0m 0s

5m 30s

11m 0s

16m 30s

22m 0s

LLVM LTO 8 16 32 8 16 32

Serial portion Parallel portion> 25m

ThinLTO GCC: WPA+LTRANS

Cra

sh

Peak

Mem

ory

(GB)

0

7.5

15

22.5

30

LLVM LTO 8 16 32 8 16 32

Serial portion Parallel portion

ThinLTO GCC: WPA+LTRANS

Cra

sh

> 50G

Chromium Build Comparisons

Page 320: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

With Debug (-g2)16-core (32-logical) Intel Xeon

E5-2690 @ 2.90GHz

Tim

e

0m 0s

5m 30s

11m 0s

16m 30s

22m 0s

LLVM LTO 8 16 32 8 16 32

Serial portion Parallel portion> 25m

ThinLTO GCC: WPA+LTRANS

Cra

sh

Peak

Mem

ory

(GB)

0

7.5

15

22.5

30

LLVM LTO 8 16 32 8 16 32

Serial portion Parallel portion

ThinLTO GCC: WPA+LTRANS

Cra

sh

> 50G

-> For GCC, time versus memory tradeoff is much more severe!

Chromium Build Comparisons

Page 321: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Run-time Performance: SPEC cpu2006

0%

2%

4%

6%

8%

10%LTO ThinLTO

Improvement over -O2 (no PGO)

Page 322: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Binary/Text Size: SPEC cpu2006 471.gcc

Binary (MB) Text (MB)

Type LTO ThinLTO Increase LTO ThinLTO Increase

-g0 4276168 4438720 3.80% 4087383 4161320 1.81%

-g 12347112 20794136 68.41% 4087655 4161456 1.81%

Page 323: ThinLTO - LLVM · 2019. 10. 30. · main.o test1.o test2.o LLVM LTO: in a Nutshell. ELF / MachO / COFF 1010110101010101110101101011 0101010101110101101011010101 0101110101101011010101010111

Binary/Text Size: SPEC cpu2006 471.gcc

Binary (MB) Text (MB)

Type LTO ThinLTO Increase LTO ThinLTO Increase

-g0 4276168 4438720 3.80% 4087383 4161320 1.81%

-g 12347112 20794136 68.41% 4087655 4161456 1.81%

• Despite recent improvements, still too much debug metadata imported during ThinLTO importing!

• E.g. composite type definitions-> Area of future work

68.41%