Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. Jonathan Mace, Ryan Roelke, Rodrigo Fonseca (Brown University)

Pivot Tracing - Brown University: cs.brown.edu/~jcmace/presentations/mace15pivot-presentation.pdf


  • Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

    Jonathan Mace, Ryan Roelke, Rodrigo Fonseca

    Brown University

  • Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

    Dynamically instrument live distributed systems

    Correlate and group events across components

  • Hadoop Stack

    Client workloads: HGET, HSCAN, MRSORT10G, MRSORT100G, FSREAD4M, FSREAD64M

    HBase   MapReduce

    HDFS

    Disk

    How is disk bandwidth being used?

  • How is disk bandwidth being used?

    DataNodeMetrics

    [Figure: HDFS throughput [MB/s] vs. time [min] reported by DataNodeMetrics, one line per host (Host A through Host H)]

  • DataNodeMetrics

    [Figure: HDFS throughput [MB/s] vs. time [min], shown per host (Host A through Host H) and regrouped by client workload (MRSORT100G, MRSORT10G, HSCAN, HGET, FSREAD4M, FSREAD64M)]

  • How is disk bandwidth being used?

    DataNodeMetrics

    Missed!

    IOStream

    [Figure: HDFS throughput per host and per client workload vs. time, and disk throughput [MB/s] vs. time broken down by process type (MapTask, ShuffleHandler, ReduceTask, DataNode)]

  • DataNodeMetrics and IOStream

    [Figure: HDFS throughput per host, disk throughput per process type (MapTask, ShuffleHandler, ReduceTask, DataNode), and HDFS throughput per client workload, all vs. time]

  • Instrumentation is decided at development time

    Probably does not have enough info for your problem

    Probably has too much irrelevant info for your problem

    Should every user bear the cost of a feature?

  • Dynamic dependencies

    You often need to correlate information from different points in the system

    Systems are designed to compose

    Systems don't embed monitoring that relates to other services

    [Image: Netflix "Death Star" microservices dependency graph, via @bruce_m_wong]

  • Pivot Tracing

    You don't know the questions in advance

    Dynamic instrumentation: Fay (SOSP'11), DTrace (ATC'04), …

    You often need to correlate information from different points in the system

    Causal tracing: X-Trace (NSDI'07), Dapper (Google), Pip (NSDI'06), …

  • Pivot Tracing

    Model system events as tuples in a streaming, distributed dataset

    Dynamically evaluate relational queries over this dataset

    Join based on Lamport's happened-before relation: the happened-before join
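
    For reference, this is the standard happened-before relation the join is named after (Lamport, 1978); the precise semantics of the happened-before join operator itself are defined in the Pivot Tracing paper, so the following is only the underlying relation, not the paper's join definition:

        a \rightarrow b \quad\text{iff}\quad
        \begin{cases}
          a,\,b \text{ are events of the same process and } a \text{ precedes } b, & \text{or}\\
          a \text{ is the sending of a message and } b \text{ is its receipt}, & \text{or}\\
          \exists\, c:\ a \rightarrow c \ \wedge\ c \rightarrow b & \text{(transitivity).}
        \end{cases}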

  • Pivot Tracing: Overview

  • HBase   MapReduce   HDFS

    DataNodeMetrics

    DataNodeMetrics.java:

    public class DataNodeMetrics {
        ...
        public void incrBytesRead(int delta) {
            ...
        }
        ...
    }

    Tracepoint
    Class: DataNodeMetrics
    Method: incrBytesRead
    Exports: "delta"=delta

    Each invocation becomes a tuple, e.g. ("DataNodeMetrics", delta=10, host="hop01", …)

    From incr In DataNodeMetrics.incrBytesRead
    GroupBy incr.host
    Select incr.host, SUM(incr.delta)

    [Figure: HDFS throughput [MB/s] per host (Host A through Host H) vs. time [min]]

  • HBase   MapReduce   HDFS

    ClientProtocols

    DataNodeMetrics

    Happened-before Join

    [Figure: HDFS throughput per host and per client workload vs. time]

  • Happened-before Join

    ClientProtocols tuple: ("ClientProtocols", procName="HGET", …)

    DataNodeMetrics tuple: ("DataNodeMetrics", delta=10, host="hop01", …)

    Joined tuple: ("ClientProtocols", procName="HGET", …, "DataNodeMetrics", delta=10, host="hop01", …)

    From incr In DataNodeMetrics.incrBytesRead
    Join client In First(ClientProtocols) On client -> incr
    GroupBy client.procName
    Select client.procName, SUM(incr.delta)

  • From incr In DataNodeMetrics.incrBytesRead
    Join client In First(ClientProtocols) On client -> incr
    GroupBy client.procName
    Select client.procName, SUM(incr.delta)

    Input tuples:
    ("ClientProtocols", procName="HGET", …)
    ("DataNodeMetrics", delta=10, host="hop01", …)

    Output tuple: (procName="HGET", delta=10, …)

  • Design & Implementation: Pivot Tracing prerequisites

    Dynamic instrumentation: PT Agent

    Causal tracing: Baggage

  • Causal tracing: Baggage

    Baggage is a Key:Value container propagated alongside a request

    • Generalization of metadata in end-to-end tracing
    • One instance per request

    Process   Tenant   Workload   Request

  • Baggage example: a request flows from component A to component B

    At A: PACK(clientName, "FSRead") puts clientName="FSRead" into the baggage

    The baggage (clientName="FSRead") travels with the request to B

    At B: UNPACK(clientName) reads "FSRead" back out
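
    To make PACK/UNPACK concrete, here is a minimal sketch of a per-request Key:Value baggage container in Java. The class and method names (Baggage, pack, unpack, fork, join) are assumptions for illustration only, not the real Pivot Tracing library; the real baggage is also serialized (Protocol Buffers, see the implementation slide later) so it can cross process and RPC boundaries.

        import java.util.HashMap;
        import java.util.Map;

        // Minimal sketch of a per-request Key:Value baggage container
        // (hypothetical API, not the actual Pivot Tracing library).
        final class Baggage {
            // One logical instance per request; modeled here as a thread-local map.
            private static final ThreadLocal<Map<String, String>> CURRENT =
                    ThreadLocal.withInitial(HashMap::new);

            static void pack(String key, String value) {    // PACK(key, value)
                CURRENT.get().put(key, value);
            }

            static String unpack(String key) {               // UNPACK(key)
                return CURRENT.get().get(key);
            }

            // Snapshot / restore, used to hand the baggage across execution boundaries.
            static Map<String, String> fork() {
                return new HashMap<>(CURRENT.get());
            }

            static void join(Map<String, String> snapshot) {
                CURRENT.get().putAll(snapshot);
            }
        }

    With this sketch, the slide's example becomes Baggage.pack("clientName", "FSRead") at A and Baggage.unpack("clientName") at B.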

  • Pivot Tracing Enabled: system + Baggage + PT Agent

    Process   Tenant   Workload   Request

  • Design & Implementation: Queries

  • Instrumented System (+ Baggage, PT Agent)

    A PT Agent runs with each process

  • Tracepoints: places where PT can add instrumentation

    Tracepoint A:  Class: A,  Method: A1()
    Tracepoint B:  Class: B,  Method: B1(),  Exports: "delta"=delta

    Export identifiers accessible to queries
    Defaults: host, timestamp, pid, proc name

    Only references – not materialized until a query is installed
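
    As a rough illustration of "only references, not materialized": a tracepoint is essentially a named description of a code location plus the identifiers it exports; nothing is instrumented until a query that uses it is installed. The class and field names below are illustrative assumptions, not the real front-end API.

        import java.util.Map;

        // Illustrative tracepoint description: names a code location and the
        // identifiers it can export; installing it causes no instrumentation.
        final class TracepointSpec {
            final String className;              // e.g. "DataNodeMetrics"
            final String methodName;             // e.g. "incrBytesRead"
            final Map<String, String> exports;   // e.g. {"delta" -> delta}, plus
                                                 // defaults: host, timestamp, pid, proc name

            TracepointSpec(String className, String methodName,
                           Map<String, String> exports) {
                this.className = className;
                this.methodName = methodName;
                this.exports = exports;
            }
        }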

  • Query Language

    Relational query language, similar to SQL, LINQ

    Refers to tracepoint-exported identifiers

    • Selection  • Projection  • Filter
    • GroupBy  • Aggregation  • Happened-Before Join

    From a In A
    Join b In B On a -> b
    GroupBy a.procName
    Select a.procName, SUM(b.delta)

    Output: stream of tuples, e.g., (procName, delta)

  • Advice

    The query is compiled to advice (an intermediate representation for instrumentation)

    Advice will be installed at tracepoints

    Limited instruction set: OBSERVE, PACK, FILTER, UNPACK, EMIT

    Advice A1 (at tracepoint A):
    OBSERVE procName
    PACK procName

    Advice B1 (at tracepoint B):
    OBSERVE delta
    UNPACK procName
    EMIT procName, SUM(delta)

  • Weaving

    The PT Agent dynamically enables advice at tracepoints: A1 is woven into tracepoint A, B1 into tracepoint B

  • Evaluating

    At tracepoint A, advice A1 OBSERVEs procName and PACKs it into the request's baggage

    Baggage explicitly follows the execution to tracepoint B

    At tracepoint B, advice B1 OBSERVEs delta, UNPACKs procName, and EMITs (procName, SUM(delta))

    Evaluated inline during a request (no global aggregation needed)

    Query Results

    Tuples are accumulated locally in the PT Agent

    Periodically reported back to the user, e.g., every second
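
    To make the walkthrough concrete, here is a hand-written Java stand-in for what advice A1 and B1 of this example query could do at their tracepoints, reusing the hypothetical Baggage class sketched earlier and a hypothetical local aggregation table in the PT agent. The real advice is generated from the query and woven dynamically rather than written by hand, so treat this only as a sketch of the semantics.

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Stand-in for the advice Pivot Tracing would generate for:
        //   From a In A Join b In B On a -> b
        //   GroupBy a.procName Select a.procName, SUM(b.delta)
        final class ExampleAdvice {
            // Local aggregation table kept by the PT agent, flushed periodically.
            static final Map<String, Long> SUM_DELTA_BY_PROC = new ConcurrentHashMap<>();

            // Advice A1, woven at tracepoint A (e.g., ClientProtocols):
            // OBSERVE procName; PACK procName.
            static void adviceA1(String procName) {
                Baggage.pack("procName", procName);
            }

            // Advice B1, woven at tracepoint B (e.g., DataNodeMetrics.incrBytesRead):
            // OBSERVE delta; UNPACK procName; EMIT procName, SUM(delta).
            static void adviceB1(int delta) {
                String procName = Baggage.unpack("procName");
                if (procName == null) return;   // no causally preceding A event
                SUM_DELTA_BY_PROC.merge(procName, (long) delta, Long::sum);
            }
        }

    Because procName travels in the baggage of the request itself, the join and the aggregation happen inline where the B tuple is produced; no global collection of A and B tuples is needed.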

  • Pivot Tracing: Evaluation

  • Java-Based Implementation

    PT agent thread that runs inside each process
    • Javassist for dynamic instrumentation
    • PubSub to receive commands / send tuples

    Baggage library for use by the instrumented system
    • Data format specified using Protocol Buffers

    Front-end client library
    • Define tracepoints and write text queries
    • Compile queries to advice
    • Submit advice to PT agents
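
    For a feel of what "Javassist for dynamic instrumentation" can look like, a minimal sketch that rewrites the incrBytesRead tracepoint method to call advice before its original body. The class and method names come from the deck's example (the Hadoop package path is as in Hadoop 2.x and may differ by version), but the offline rewrite shown here is only an assumption for intuition; the actual PT agent weaves advice at runtime inside the JVM.

        import javassist.ClassPool;
        import javassist.CtClass;
        import javassist.CtMethod;

        // Sketch: weave a call to advice code at the start of a tracepoint method.
        public class WeaveSketch {
            public static void main(String[] args) throws Exception {
                ClassPool pool = ClassPool.getDefault();
                // Package path as in Hadoop 2.x; adjust for your version.
                CtClass cc = pool.get(
                    "org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics");
                CtMethod m = cc.getDeclaredMethod("incrBytesRead");

                // $1 is the method's first argument (delta); the woven code calls
                // the ExampleAdvice sketch above (must be on the classpath).
                m.insertBefore("{ ExampleAdvice.adviceB1($1); }");

                cc.writeFile();   // write the rewritten class next to the original
            }
        }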


  • Pivot Tracing Enabled (+ Baggage, PT Agent)

    HBase, YARN, ZooKeeper, HDFS, MapReduce

    Adding Baggage: ~50-200 lines of code per system

    Primarily modifying execution boundaries:
    • Thread, Runnable, Callable, Queue
    • RPC invocations

    32

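    To make the "execution boundaries" point concrete, here is a minimal sketch of the kind of change involved when a task is handed to another thread: the submitting thread's baggage is captured and restored inside the worker. The Baggage class below is an assumed stand-in, not the actual Pivot Tracing baggage library.

        import java.util.Arrays;

        // Assumed stand-in: baggage is a per-thread byte[] that must be handed across
        // execution boundaries explicitly.
        final class Baggage {
            private static final ThreadLocal<byte[]> CURRENT = new ThreadLocal<>();
            static byte[] capture() {
                byte[] b = CURRENT.get();
                return b == null ? new byte[0] : Arrays.copyOf(b, b.length);
            }
            static void join(byte[] b) { CURRENT.set(b); }
            static void discard() { CURRENT.remove(); }
        }

        // Wrapping a Runnable so a task submitted to a thread pool carries the caller's baggage.
        final class BaggageRunnable implements Runnable {
            private final Runnable delegate;
            private final byte[] captured = Baggage.capture();  // snapshot at submission time
            BaggageRunnable(Runnable delegate) { this.delegate = delegate; }
            @Override public void run() {
                Baggage.join(captured);          // restore in the worker thread
                try { delegate.run(); }
                finally { Baggage.discard(); }   // don't leak baggage into the next task
            }
        }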


  • Pivot Tracing Overheads

    • Pivot Tracing enabled (+ Baggage, PT Agent): application-level benchmarks show 0.3% baseline overhead
    • No overhead from queries / tracepoints until they are installed
    • With the queries from the paper installed: application-level benchmarks show at most 14.3% overhead (CPU-only lookups)
    • Largest baggage size: ~137 bytes

    33

  • Experiments

    34

  • 1. Monitoring queries with various groupings

    [Figure: HDFS throughput over time (MB/s) grouped by host (Host A-H), by client workload (MRSORT100G, MRSORT10G, HSCAN, HGET, FSREAD4M, FSREAD64M), and by high-level task (MapTask, ShuffleHandler, ReduceTask, DataNode); plus disk throughput over time (MB/s)]

    34


  • Experiments

    1. Monitoring queries with various groupings
    2. Decomposing request latencies
    3. Debugging recurring problems

    36


  • HDFS architecture: the HDFS NameNode holds file metadata, and HDFS DataNodes provide replicated block storage, so each file's blocks are stored on several DataNodes (e.g., DataNodes 2, 3, and 5).

    To read a file, a client first calls GetBlockLocations on the NameNode to learn which DataNodes host the file's blocks, then streams the block data from one of those DataNodes via DataTransferProtocol.

    37


  • Experimental setup: 8 worker hosts (Host A-H), each running an HDFS DataNode and a client workload generator that randomly reads from a large dataset, plus an HDFS NameNode.

    38
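    A minimal sketch of what such a workload generator might look like using the standard Hadoop FileSystem API; the "/dataset" path and the endless loop are assumptions for illustration only.

        import java.util.Random;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class RandomReadWorkload {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                FileStatus[] files = fs.listStatus(new Path("/dataset"));  // assumed dataset location
                Random rng = new Random();
                byte[] buf = new byte[64 * 1024];
                while (true) {
                    // Pick a file uniformly at random; under the hood each read triggers
                    // GetBlockLocations at the NameNode and a DataTransferProtocol stream
                    // from one replica DataNode.
                    Path file = files[rng.nextInt(files.length)].getPath();
                    try (FSDataInputStream in = fs.open(file)) {
                        while (in.read(buf) > 0) {
                            // drain the file
                        }
                    }
                }
            }
        }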


  • Same machines, same processes, same workloads

    I expected uniform throughput from workload generators

    I expected uniform throughput on DataNodes

    39


  • It's probably a bug in the workload generator I wrote.

    My hypothesis: the workload generator is not randomly looking up files.

    Query installed at the HDFS NameNode (GetBlockLocations):

    From blockLocations In NameNode.GetBlockLocations
    GroupBy blockLocations.fileName
    Select blockLocations.fileName, COUNT

    [Figure: histogram of frequency vs. number of times each file was accessed]

    Refined query, joined with the client (Client.DoRandomRead) and grouped per client host:

    From blockLocations In NameNode.GetBlockLocations
    Join cl In Client.DoRandomRead On cl -> blockLocations
    GroupBy cl.host, blockLocations.fileName
    Select cl.host, blockLocations.fileName, COUNT

    40

  • Maybe the skewed DataNode throughput is because some DataNodes store more files than others.

    How often was each DataNode a replica host?

    Query installed at the HDFS NameNode (GetBlockLocations):

    From blockLocations In NameNode.GetBlockLocations
    GroupBy blockLocations.replicas
    Select blockLocations.replicas, COUNT

    [Figure: count of blocks stored at each replica location]

    Refined query, joined with the client (Client.DoRandomRead) and grouped per client host:

    From blockLocations In NameNode.GetBlockLocations
    Join cl In Client.DoRandomRead On cl -> blockLocations
    GroupBy cl.host, blockLocations.replicas
    Select cl.host, blockLocations.replicas, COUNT

    [Figure: replica location counts broken down by client]

    41

  • Conclusions so far:
    Clients are selecting files uniformly at random.
    Files are distributed across DataNodes uniformly at random.

    Hypothesis: the choice of replica isn't random?

    When a file is read from a DataNode, where else could it have been read from? The query below was built up in two steps: first a join of the DataNode read (DataTransferProtocol) with the NameNode's block locations, then refined to exclude local reads by also joining with the client and filtering on cl.host != readBlock.host:

    From readBlock In DataNode.DataTransferProtocol
    Join blockLocations In NameNode.GetBlockLocations
      On blockLocations -> readBlock
    Join cl In Client.DoRandomRead
      On cl -> blockLocations
    Where cl.host != readBlock.host
    GroupBy blockLocations.replicas, readBlock.host
    Select blockLocations.replicas, readBlock.host, COUNT

    42
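    A minimal sketch of how this three-way happened-before join might evaluate inline, analogous to the earlier advice example; the baggage keys, tuple shapes, and the agent-side aggregation map are illustrative assumptions, not the actual implementation.

        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        final class ReplicaChoiceJoinSketch {
            // At Client.DoRandomRead: PACK the client host into the baggage.
            static void clientAdvice(Map<String, Object> baggage, String clientHost) {
                baggage.put("cl.host", clientHost);
            }

            // At NameNode.GetBlockLocations: PACK the replica list for the requested block.
            static void nameNodeAdvice(Map<String, Object> baggage, List<String> replicas) {
                baggage.put("replicas", List.copyOf(replicas));
            }

            // At DataNode.DataTransferProtocol: UNPACK, apply the Where filter, and EMIT a
            // count keyed by (replicas, readBlock.host); counts accumulate in the local PT agent.
            @SuppressWarnings("unchecked")
            static void dataNodeAdvice(Map<String, Object> baggage, String readHost,
                                       Map<String, Long> agentCounts) {
                String clientHost = (String) baggage.get("cl.host");
                List<String> replicas = (List<String>) baggage.get("replicas");
                if (clientHost == null || replicas == null) return;  // no causal match
                if (clientHost.equals(readHost)) return;             // Where cl.host != readBlock.host
                agentCounts.merge(replicas + " -> " + readHost, 1L, Long::sum);
            }

            public static void main(String[] args) {
                Map<String, Object> baggage = new HashMap<>();
                Map<String, Long> agentCounts = new HashMap<>();
                clientAdvice(baggage, "HostA");
                nameNodeAdvice(baggage, List.of("HostB", "HostC"));
                dataNodeAdvice(baggage, "HostB", agentCounts);
                System.out.println(agentCounts);  // prints {[HostB, HostC] -> HostB=1}
            }
        }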

  • Result: when two DataNodes both host replicas of a block, one would expect clients to choose each roughly 50% of the time; the measured counts instead show clients choosing one of the two replicas 100% of the time and the other 0% of the time.

    43


  • Findings:
    • Lack of randomization skewed the workload toward certain DataNodes
    • Independently discovered; fixed in HDFS 2.5
    • Pivot Tracing seamlessly adds correlations between multiple components
    • Supports very specific, one-off metrics
    • This experiment: 1.5% application-level overhead

    44
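    As an illustration of the kind of fix this finding points to (a sketch only, not the actual HDFS 2.5 patch): shuffling equally-preferred replicas before choosing one removes the deterministic bias.

        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;

        final class ReplicaSelection {
            // Biased: always read from the first replica in the order the NameNode returned.
            static String chooseFirst(List<String> replicas) {
                return replicas.get(0);
            }

            // Unbiased: shuffle equally-preferred replicas, then take the first.
            static String chooseRandom(List<String> replicas) {
                List<String> copy = new ArrayList<>(replicas);
                Collections.shuffle(copy);
                return copy.get(0);
            }

            public static void main(String[] args) {
                List<String> replicas = List.of("HostB", "HostC", "HostD");
                System.out.println("deterministic: " + chooseFirst(replicas));
                System.out.println("randomized:    " + chooseRandom(replicas));
            }
        }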


  • Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

    Combines dynamic instrumentation and causal tracing via the happened-before join

    Acceptable overheads for production (we think)

    Standing basic queries, with the potential to dig deeper

    Jonathan Mace, Ryan Roelke, Rodrigo Fonseca

    45

    Tracepoint A: Class A, Method A1()

    Tracepoint B: Class B, Method B1(), Exports: "delta" = delta

    From a In A
    Join b In B On a -> b
    GroupBy a.procName
    Select a.procName, SUM(b.delta)

    Advice A1 (at tracepoint A): OBSERVE procName; PACK procName

    Advice B1 (at tracepoint B): OBSERVE delta; UNPACK procName; EMIT procName, SUM(delta)

    46