NanoLog: A Nanosecond Scale Logging System


Stephen Yang, John Ousterhout

February 9th, 2017
Platform Lab Review 2017

Overview
• Implemented a fast C++ Logging System
  • 12.5 ns median latency at 60M log msgs/sec
  • 10-100x faster than existing systems such as Log4j2 and spdlog
  • Maintains printf-like semantics
• Shifts work out of the runtime hot-path
  • Extraction of static information at compile-time
  • Compacted binary output at runtime
  • Defers formatting to an offline process
• Benefits and Costs
  • Allows detailed logs in low latency systems
  • Comes at the cost of 1 MB of RAM per thread, one core, and disk bandwidth

Why Fast Logging?
• Cornerstone of debugging
  • Affords visibility into application state
  • Helps in root cause analysis after execution
• Problem: Logging is slow
  • Application response times are getting faster (microseconds)
  • Logging is not (100s-1000s of nanoseconds)
  • Example: RAMCloud response time = 5 µs, but log time = 1 µs

What makes logging slow?

1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancer NOTICE: Using tombstone ratio balancer with ratio = 0.400000

• Compute: Complex Formatting
  • Loggers need to provide context (file location, time, severity, etc.)
  • The message above has 7 arguments and takes 850 ns to compute
• Output Bandwidth: Disk IO
  • On a 250 MB/s disk, the 129-byte message above takes 500 ns to output!
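The output-bandwidth figure can be checked with a short calculation. The sketch below is illustrative only: `kMessage`, `kMessageBytes`, and `output_time_ns` are names introduced here, 1 MB is taken as 10^6 bytes, and the message text assumes a single space between the function name and the severity.

```cpp
#include <cstddef>
#include <cstdint>

// The example log message from the slide (space before NOTICE is assumed).
constexpr char kMessage[] =
    "1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancer "
    "NOTICE: Using tombstone ratio balancer with ratio = 0.400000";

// Length without the terminating NUL.
constexpr std::size_t kMessageBytes = sizeof(kMessage) - 1;
static_assert(kMessageBytes == 129, "the example message is 129 bytes long");

// Time in nanoseconds to write `bytes` to a device that sustains
// `mb_per_sec` megabytes per second (1 MB = 10^6 bytes, an assumption).
constexpr double output_time_ns(std::uint64_t bytes, double mb_per_sec) {
    return static_cast<double>(bytes) / (mb_per_sec * 1e6) * 1e9;
}

// 129 bytes at 250 MB/s is about 516 ns, in line with the ~500 ns quoted.
static_assert(output_time_ns(kMessageBytes, 250.0) > 515.0 &&
              output_time_ns(kMessageBytes, 250.0) < 517.0,
              "129 bytes at 250 MB/s takes about 516 ns");
```

At this rate the disk alone caps throughput at roughly 2 million such messages per second, which is why shrinking each entry matters as much as speeding up formatting.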

Solutions
• Compute: Raw Data Output
  • Most logs in production are not consumed by humans
  • Save computation by deferring formatting to an offline process
  • Side benefit: more efficient for analysis engines
• IO: Extracting Static Information
  • Static info in message: file location, line #, function, severity, format string
  • Replace with identifier and compact remaining dynamic information

1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancer NOTICE: Using tombstone ratio balancer with ratio = 0.400000
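One way to picture the split between static and dynamic information is a catalog of log statements indexed by identifier, with only the identifier, timestamp, and argument values written at runtime. This is a minimal sketch under stated assumptions, not NanoLog's actual data layout: `StaticInfo`, `kCatalog`, and `record` are hypothetical names, and the fixed-width, host-endian fields are chosen only for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Static information for one log statement; all of it is known before the
// program runs, so none of it needs to be written at runtime.
struct StaticInfo {
    const char* file;
    int line;
    const char* function;
    const char* severity;
    const char* format;
};

// Hypothetical catalog: the array index serves as the log identifier.
static const StaticInfo kCatalog[] = {
    {"src/LogCleaner.cc", 826, "TombstoneRatioBalancer", "NOTICE",
     "Using tombstone ratio balancer with ratio = %f"},
};

// Appends only the dynamic part of a log entry (identifier, timestamp,
// argument value) to `out` and returns the number of bytes written.
inline std::size_t record(std::vector<unsigned char>& out, std::uint32_t id,
                          std::uint64_t timestamp_ns, double ratio) {
    assert(id < sizeof(kCatalog) / sizeof(kCatalog[0]));
    const std::size_t before = out.size();
    auto append = [&out](const void* p, std::size_t n) {
        const unsigned char* bytes = static_cast<const unsigned char*>(p);
        out.insert(out.end(), bytes, bytes + n);
    };
    append(&id, sizeof id);
    append(&timestamp_ns, sizeof timestamp_ns);
    append(&ratio, sizeof ratio);
    return out.size() - before;
}
```

On a typical platform with 4-byte `uint32_t` and 8-byte `double`, `record(buf, 0, 1473057128133777014ULL, 0.4)` writes 20 bytes, compared with 129 bytes for the formatted message, and that is before any compaction of the timestamp or identifier.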

NanoLog System Architecture

[Figure: three-stage system diagram.
Compilation-Time: User Sources -> NanoLog Preprocessor -> Processed User Sources; Processed User Sources + Library Sources -> GCC -> Application Executable and Decompressor/Aggregator.
Runtime: inside the Application Executable, each User Thread writes to a Buffer; the NanoLog Runtime outputs the Compact Log.
Offline: Compact Log -> Decompressor/Aggregator -> Human Readable Log.]

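Compile-time Optimizations

[Figure: User Source (main.cc) is preprocessed into Post-Processed User Source (main.ii). The NanoLog preprocessor then (a) extracts static log info into the NanoLog Library (StaticInfo.cc) and (b) injects optimized log code. Both are compiled into the Application Executable and the Decompressor Executable.]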
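The slides describe a separate preprocessor that works on the preprocessed source. As a rough illustration of what "extracting static log info" before runtime means, the sketch below swaps in a different technique, C++ `constexpr` evaluation, to analyze a format string during compilation. `count_specifiers`, `LogSite`, and `LOG_SITE` are hypothetical names, and the parsing is deliberately simplified (it does not handle `*` widths or validate conversions).

```cpp
#include <cstddef>

// Counts printf-style conversion specifiers in a format string at compile
// time. "%%" is a literal percent sign and is not counted; a lone trailing
// '%' is ignored.
constexpr std::size_t count_specifiers(const char* fmt) {
    return *fmt == '\0' ? 0
         : (*fmt == '%' && fmt[1] == '%') ? count_specifiers(fmt + 2)
         : (*fmt == '%' && fmt[1] != '\0') ? 1 + count_specifiers(fmt + 1)
         : count_specifiers(fmt + 1);
}

// Static information captured for one log statement (hypothetical layout).
struct LogSite {
    const char* file;
    int line;
    const char* format;
    std::size_t num_args;
};

// Captures the static information at the point where the macro is used.
#define LOG_SITE(fmt) LogSite{__FILE__, __LINE__, (fmt), count_specifiers(fmt)}

// Everything below is evaluated by the compiler; nothing runs at runtime.
constexpr LogSite kExampleSite =
    LOG_SITE("Using tombstone ratio balancer with ratio = %f");
static_assert(kExampleSite.num_args == 1, "one dynamic argument expected");
```

Because the argument count is known per log statement, the runtime code for that statement only has to copy the argument values; the format string itself never needs to be inspected on the hot path.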

Fast Runtime Architecture
• Isolate the Threads
  • Use per-thread buffers to lower synchronization
  • Don't notify the background thread; let it poll for data
• Minimize Output Cost
  • Caller pushes data uncompressed to save on compute
  • IO thread needs to save on both IO and compute times
    • Use only rudimentary compaction (deltas + smallest byte representations)
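The per-thread buffer with a polling consumer can be sketched as a single-producer/single-consumer ring buffer. This is a minimal illustration, not NanoLog's buffer: `StagingBuffer`, `try_push`, and `try_pop` are hypothetical names, and each slot holds one 64-bit value rather than a variable-length log entry.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One buffer per logging thread. The logging thread is the only writer of
// head_ and the background thread is the only writer of tail_, so no locks
// and no notifications are needed; the background thread simply polls.
template <std::size_t Capacity>
class StagingBuffer {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called by the logging thread. Returns false when the buffer is full;
    // the caller decides whether to retry or drop.
    bool try_push(std::uint64_t value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        assert(head - tail <= Capacity);  // invariant of the two counters
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        // Publish the slot only after its contents are written.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Called by the background thread each time it polls this buffer.
    // Returns false when there is nothing to consume.
    bool try_pop(std::uint64_t& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        value = slots_[tail & (Capacity - 1)];
        // Release the slot back to the producer only after reading it.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::uint64_t slots_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

In this arrangement the logging thread never makes a system call or takes a lock to hand data over; the cost of discovering new data is shifted to the background thread, which is consistent with the slide's point about dedicating a core to it.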

[Figure: Runtime. Three user threads, each with its own buffer, feed the NanoLog background thread, which writes the output log file.]

Output log file format: [1 byte Header][1-4 byte UniqueId][1-8 byte Time diff][0-4 bytes size][0-n bytes arguments]...
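The "deltas + smallest byte representations" compaction can be sketched for the first three fields of an entry. The bit layout of the header byte is not given in the slides, so the one used here (high nibble = time-diff width, low nibble = identifier width) is purely an assumption, as are the names `bytes_needed`, `put_le`, and `compact_entry`. The size and argument fields are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest number of bytes (at least 1) that can hold `v`.
inline std::size_t bytes_needed(std::uint64_t v) {
    std::size_t n = 1;
    while (v >>= 8) ++n;
    return n;
}

// Appends the low `n` bytes of `v` in little-endian order.
inline void put_le(std::vector<std::uint8_t>& out, std::uint64_t v,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Appends [header][id][time diff] for one entry and returns the bytes used.
// The timestamp is stored as a delta from the previous entry, so closely
// spaced entries need only one or two bytes for it.
inline std::size_t compact_entry(std::vector<std::uint8_t>& out,
                                 std::uint32_t id, std::uint64_t timestamp,
                                 std::uint64_t& last_timestamp) {
    assert(timestamp >= last_timestamp);
    const std::uint64_t diff = timestamp - last_timestamp;
    const std::size_t id_bytes = bytes_needed(id);      // 1 to 4
    const std::size_t diff_bytes = bytes_needed(diff);  // 1 to 8
    // Assumed header layout: widths of the two fields that follow.
    out.push_back(static_cast<std::uint8_t>((diff_bytes << 4) | id_bytes));
    put_le(out, id, id_bytes);
    put_le(out, diff, diff_bytes);
    last_timestamp = timestamp;
    return 1 + id_bytes + diff_bytes;
}
```

With these assumptions, an entry whose identifier fits in one byte and whose timestamp is within 255 ticks of the previous entry takes 3 bytes, versus 12 bytes for a fixed-width 4-byte identifier and 8-byte timestamp.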

Decompressor/Aggregator
• Offline process to decompress log
  • Recombines the static + dynamic data to produce a human-readable file
• Future Work
  • Query/Aggregate in compacted format

[Figure: Compact Log File -> Decompressor/Aggregator -> Human Readable Log File]

Compact log file format: [1 byte Header][1-4 byte UniqueId][1-8 byte Time diff][0-4 bytes size][0-n bytes arguments]...

Human-readable log file example: 2/9/17 12:45:24 [main]: Hello World 21
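Recombining static and dynamic data amounts to looking up the log statement by its identifier and applying the stored format string to the recorded arguments. The sketch below is a simplified illustration with hypothetical names (`LogStatement`, `kStatements`, `inflate`); it handles a single `int` argument and leaves out the timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Static part of a log statement, as cataloged before runtime
// (hypothetical layout).
struct LogStatement {
    const char* function;
    const char* format;
};

// Hypothetical catalog shared by the application and the decompressor.
static const LogStatement kStatements[] = {
    {"main", "Hello World %d"},
};

// Rebuilds the human-readable text of one entry from its identifier and
// its single recorded integer argument.
inline std::string inflate(std::size_t id, int arg) {
    assert(id < sizeof(kStatements) / sizeof(kStatements[0]));
    const LogStatement& statement = kStatements[id];
    char message[128];
    // The expensive formatting step happens here, offline, rather than
    // in the application's hot path.
    std::snprintf(message, sizeof message, statement.format, arg);
    return std::string("[") + statement.function + "]: " + message;
}
```

Here `inflate(0, 21)` yields `[main]: Hello World 21`, the message portion of the example line shown on the slide.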

Benchmarks
• System Setup
  • Processor: Quad-Core Intel Xeon X3470 @ 2.93 GHz
  • Memory: 24 GB DDR3 @ 1333 MHz
  • Disk: 120 GB Crucial M4 over SATA II (~250 MB/s)
• Test Setup
  • 100M iterations of log messages, back to back
  • Log Message: “{time} {severity}: {56-byte message}”
• Overall Results

Zero Arguments         Boost v1.55   Log4j2   Spdlog   NanoLog
Throughput (Log/s)     0.82M         1.43M    1.50M    60.1M
Average Latency (ns)   1110          697      668      16.5

[Figure: Throughput vs. System. Bar chart of throughput in millions of logs/sec: Boost Log 0.82, Log4j2 1.43, spdlog 1.5, NanoLog 60.1.]

Tail Latencies

[Figure: log-log plot of Fraction of Logs (10^-8 to 10^0) against Latency in ns (10^0 to 10^9) for Boost, Log4j2, spdlog, and NanoLog, with a region annotated "Kernel Interference".]

Tail Latency (+ NanoLog Compute)

[Figure: log-log plot of Fraction of Logs (10^-8 to 10^0) against Latency in ns (10^0 to 10^9) for Boost, Log4j2, spdlog, NanoLog, and NanoLog with 10 ns Compute, with a region annotated "Kernel Interference".]

Increasing Parameters

[Figure: Throughput with increasing “%d” parameters. Bar chart, millions of log messages/second.]

Parameters   Small Integers (~1 byte)   Large Integers (~4 bytes)
0 Params     60.1                       60
1 Param      37                         21
2 Params     25                         13.7
3 Params     21.6                       9.7
4 Params     17.6                       8
5 Params     14.5                       6.41

Limitations/Future Work
• Better Compression?
  • Is there a better way to compact the output, but in a performant way?
• Fully featured decompressor/aggregator
  • Operating on the compact representation is more efficient.
  • Iterating over a compact log message takes about 100 ns vs. 1.3 µs to output
• Resource Utilization
  • Currently the system requires 1 MB per user thread, a full core to compact, and the full bandwidth of a SATA SSD to maintain low latency. How does this change with new hardware?

NanoLog System Summary
• Compile-Time Preprocessor
  • Extract static information from log messages at compile time
    • Filename, line #, function name, etc.
  • Catalogs static info and assigns a unique ID to each log statement
  • Code injection to record only an identifier + parameter arguments
• Runtime Library
  • Producer/Consumer log output
  • Simple compaction (taking deltas/compacting integers)
• Offline Decompressor/Aggregator
  • Recombine static information for human consumption (if necessary)
  • Offline Search/Grep/Aggregate in compressed format

Questions
