of 105/105
CICADA PREDICTIVE GUARANTEES FOR CLOUD NETWORKS Katrina LaCurts Jeff Mogul Hari Balakrishnan Yoshio Turner MIT CSAIL, Google Inc., HP Labs

CICADA - USENIX · 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm 8 vm 9 vm 2 vm 3 vm 4 vm 5 vm 6 vm 7 vm 8 vm 9 vm 1 rigid application (similar to VOC [1]) vm 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm

  • View
    35

  • Download
    0

Embed Size (px)

Text of CICADA - USENIX · 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm 8 vm 9 vm 2 vm 3 vm 4 vm 5 vm 6 vm 7 vm 8 vm 9...

  • CICADAPREDICTIVE GUARANTEES FOR CLOUD NETWORKS

    Katrina LaCurts Jeff Mogul

    Hari Balakrishnan Yoshio Turner

    MIT CSAIL, Google Inc., HP Labs

  • [email protected]|2

    BANDWIDTH GUARANTEES

    `

  • [email protected]|2

    BANDWIDTH GUARANTEES10 machines

    `

  • [email protected]|2

    BANDWIDTH GUARANTEES10 machines

    `

  • [email protected]|2

    BANDWIDTH GUARANTEES10 machines

    `

  • [email protected]|2

    BANDWIDTH GUARANTEES

    bandwidth guarantee: a promise from the provider to the customer that its VMs will be able to

    communicate with each other at a particular rate (informal definition)

    10 machines

    `

  • [email protected]|3

    WHY GUARANTEES?

  • [email protected]|3

    WHY GUARANTEES?

    is there really that much data?

  • [email protected]|3

    WHY GUARANTEES?applications can send

    terabytes or petabytes of data internally [1]

    is there really that much data?

    [1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

  • [email protected]|3

    WHY GUARANTEES?applications can send

    terabytes or petabytes of data internally [1]

    aren’t cloud networks homogeneous and well-provisioned?

    [1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

  • [email protected]|3

    WHY GUARANTEES?applications can send

    terabytes or petabytes of data internally [1]application traffic is

    affected by other users

    aren’t cloud networks homogeneous and well-provisioned?

    [1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

  • [email protected]|3

    WHY GUARANTEES?applications can send

    terabytes or petabytes of data internally [1]application traffic is

    affected by other users

    what customers care about network performance?

    [1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

  • [email protected]|3

    WHY GUARANTEES?applications can send

    terabytes or petabytes of data internally [1]application traffic is

    affected by other users

    enterprise customers need to satisfy SLAs with their own customers

    what customers care about network performance?

    [1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

  • [email protected]|4

    POSSIBLE ARCHITECTURE

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    10 machines, 10Gb/s between each pair

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    provider places customer’s VMs,

    enforces guarantee

    10 machines, 10Gb/s between each pair

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    provider places customer’s VMs,

    enforces guarantee

    10 machines, 10Gb/s between each pair

    what if some pairs of VMs use fewer than 10Gb/s?

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    provider places customer’s VMs,

    enforces guarantee

    10 machines, 10Gb/s between each pair

    what if some pairs of VMs use fewer than 10Gb/s?

    over-provisioned network increased cost to customer,

    wasted bandwidth within network

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    provider places customer’s VMs,

    enforces guarantee

    10 machines, 10Gb/s between each pair

    what if some pairs of VMs use fewer than 10Gb/s?

    what if some pairs of VMs need more than 10Gb/s?

    over-provisioned network increased cost to customer,

    wasted bandwidth within network

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    provider places customer’s VMs,

    enforces guarantee

    10 machines, 10Gb/s between each pair

    what if some pairs of VMs use fewer than 10Gb/s?

    what if some pairs of VMs need more than 10Gb/s?

    over-provisioned network increased cost to customer,

    wasted bandwidth within network

    under-provisioned network poor performance for

    customer’s application

  • [email protected]|4

    POSSIBLE ARCHITECTURE

    customer requests machines and guarantee

    provider places customer’s VMs,

    enforces guarantee

    10 machines, 10Gb/s between each pair

    what if some pairs of VMs use fewer than 10Gb/s?

    what if some pairs of VMs need more than 10Gb/s?

    over-provisioned network increased cost to customer,

    wasted bandwidth within network

    under-provisioned network poor performance for

    customer’s application

    problem: how do customers know what guarantee they need?

  • [email protected]|5

    CICADA’S ARCHITECTUREcicada makes predictions about an application’s

    traffic to automatically generate a guarantee

  • [email protected]|5

    CICADA’S ARCHITECTURE

    customer makes initial request for machines

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|5

    CICADA’S ARCHITECTURE

    customer makes initial request for machines provider places

    tenant VMs

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|5

    CICADA’S ARCHITECTURE

    hypervisors send measurements to cicada controller

    customer makes initial request for machines provider places

    tenant VMs

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|5

    CICADA’S ARCHITECTURE

    hypervisors send measurements to cicada controller

    customer makes initial request for machines provider places

    tenant VMs

    cicada predicts bandwidth for each tenant

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|5

    CICADA’S ARCHITECTURE

    hypervisors send measurements to cicada controller

    customer makes initial request for machines

    provider offers guarantee based on cicada’s prediction

    provider places tenant VMs

    cicada predicts bandwidth for each tenant

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|5

    CICADA’S ARCHITECTURE

    hypervisors send measurements to cicada controller

    customer makes initial request for machines

    provider offers guarantee based on cicada’s prediction

    provider places tenant VMs

    cicada predicts bandwidth for each tenant

    provider updates placement based on cicada’s prediction

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|6

    CICADA’S ARCHITECTURE

    hypervisors send measurements to cicada controller

    customer makes initial request for machines

    provider offers guarantee based on cicada’s prediction

    provider places tenant VMs

    cicada predicts bandwidth for each tenant

    provider updates placement based on cicada’s prediction

    cicada makes predictions about an application’s traffic to automatically generate a guarantee

  • [email protected]|6

    CICADA’S ARCHITECTURE

    hypervisors send measurements to cicada controller

    customer makes initial request for machines

    provider offers guarantee based on cicada’s prediction

    provider places tenant VMs

    cicada predicts bandwidth for each tenant

    provider updates placement based on cicada’s prediction

    problem: is application traffic actually predictable? if so, is it captured by existing models?

  • [email protected]|7

    APPLICATION TRAFFICto design cicada’s prediction algorithm, we need to understand how

    cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    more data

    less data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    variable application

    more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    variable application

    spatial variability: different pairs of VMs transfer different amounts of data

    more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    variable application

    spatial variability: different pairs of VMs transfer different amounts of data

    more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    variable application

    spatial variability: different pairs of VMs transfer different amounts of data

    more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|7

    APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    rigid application (similar to VOC [1])

    vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1

    variable application

    spatial variability: different pairs of VMs transfer different amounts of datatemporal variability: pairs of VMs transfer different amounts of data at different times

    more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

    to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

  • [email protected]|

    DATASET

    ... ... ......

    HP Cloud Serviceshttp://hpcloud.com

    8

  • [email protected]|

    DATASET

    ... ... ......

    observed sflow data (could also use tcpdump, etc.)

    HP Cloud Serviceshttp://hpcloud.com

    8

  • [email protected]|

    DATASET

    ... ... ......

    observed sflow data (could also use tcpdump, etc.)

    HP Cloud Serviceshttp://hpcloud.com

    8

    (in this talk) collected one traffic matrix per hour, for

    each application

  • [email protected]|

    DATASET

    ... ... ......

    observed sflow data (could also use tcpdump, etc.)

    M1 M2 Mn

    ...

    observed traffic

    HP Cloud Serviceshttp://hpcloud.com

    8

    each entry represents the number of bytes

    transferred between i and j

    (in this talk) collected one traffic matrix per hour, for

    each application

  • [email protected]|

    DATASET

    ... ... ......

    observed sflow data (could also use tcpdump, etc.)

    M1 M2 Mn

    ...

    observed traffic

    HP Cloud Serviceshttp://hpcloud.com

    8

    each entry represents the number of bytes

    transferred between i and j

    (in this talk) collected one traffic matrix per hour, for

    each application

    goal: use this dataset to quantify spatial and temporal variability

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    Fij = 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal

    Fij = 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal

    Fij = 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal

    Fij = 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal

    Fij = 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal

    Fij = 1/20 Fij < 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal many distinct Fij values

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal many distinct Fij values

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    2. calculate the coefficient of variation (σ/µ) of the Fij values

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal many distinct Fij values

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    2. calculate the coefficient of variation (σ/µ) of the Fij values

    ⇒ σ(Fij)/µ(Fij) = 0

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal many distinct Fij values

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    2. calculate the coefficient of variation (σ/µ) of the Fij values

    ⇒ σ(Fij)/µ(Fij) = 0

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    ⇒ σ(Fij)/µ(Fij) = 1*

    *see paper for this result

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal many distinct Fij values

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    2. calculate the coefficient of variation (σ/µ) of the Fij values

    ⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    ⇒ σ(Fij)/µ(Fij) = 1*

    *see paper for this result

    more data

    less datano data

  • [email protected]|

    high spatial variabilityvm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    no spatial variability

    9

    SPATIAL VARIABILITYhow do we quantify spatial variability?

    1. let Fij = fraction of tenant traffic sent from VMi to VMj

    all Fij values are equal many distinct Fij values

    Fij = 1/20 Fij < 1/20 Fij > 1/20

    2. calculate the coefficient of variation (σ/µ) of the Fij values

    ⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1

    vm1 vm2 vm3 vm4 vm5

    vm2

    vm3

    vm4

    vm5

    vm1

    some spatial variability

    Fij = 0 Fij = 1/12

    two distinct Fij values

    ⇒ σ(Fij)/µ(Fij) = 1*

    *see paper for this result

    the higher the cov, the higher the spatial variation

    more data

    less datano data

  • [email protected]|10

    SPATIAL VARIABILITY

    0

    0.2

    0.4

    0.6

    0.8

    1

    0.01 0.1 1 10 100

    CD

    F

    Spatial Variation(cov of Fij values)

  • [email protected]|10

    SPATIAL VARIABILITY

    0

    0.2

    0.4

    0.6

    0.8

    1

    0.01 0.1 1 10 100

    CD

    F

    Spatial Variation(cov of Fij values)

    applications exhibit high spatial variability

  • [email protected]|10

    SPATIAL VARIABILITY

    0

    0.2

    0.4

    0.6

    0.8

    1

    0.01 0.1 1 10 100

    CD

    F

    Spatial Variation(cov of Fij values)

    applications exhibit high spatial variability

    cov values as large as 100

  • [email protected]|10

    SPATIAL VARIABILITY

    0

    0.2

    0.4

    0.6

    0.8

    1

    0.01 0.1 1 10 100

    CD

    F

    Spatial Variation(cov of Fij values)

    applications exhibit high spatial variability

    cov values as large as 100

    (the same result holds for temporal variability, see paper)

  • [email protected]|10

    SPATIAL VARIABILITY

    0

    0.2

    0.4

    0.6

    0.8

    1

    0.01 0.1 1 10 100

    CD

    F

    Spatial Variation(cov of Fij values)

    applications exhibit high spatial variability

    cov values as large as 100

    (the same result holds for temporal variability, see paper)

    customers cannot model such variability themselves, and existing models do not capture it

  • [email protected]|11

    CICADA’S ALGORITHMM1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    M̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    = f(M1,M2, . . . ,Mn)

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

    prediction based on errors made in previous predictions

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

    prediction based on errors made in previous predictions

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

    prediction based on errors made in previous predictions

    wi(n+ 1) = wi(n)

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

    prediction based on errors made in previous predictions

    · e�L(i,n)loss function

    wi(n+ 1) = wi(n)

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

    prediction based on errors made in previous predictions

    L(i, n) =k ~Mi � ~Mnk

    k ~Mnk· e�L(i,n)

    loss function

    wi(n+ 1) = wi(n)

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • [email protected]|11

    CICADA’S ALGORITHM

    = w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

    M1 M2 Mn

    ...observed trafficM̂n+1

    prediction for next hour

    instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

    prediction based on errors made in previous predictions

    L(i, n) =k ~Mi � ~Mnk

    k ~Mnk· e�L(i,n)

    loss function

    normalization

    · 1Zn+1

    wi(n+ 1) = wi(n)

    Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

  • lacurt[email protected]|12

    EVALUATIONto evaluate cicada, we compared its predictions to ones made by

    a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1 more data

    less datano data

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

  • [email protected]|12

    EVALUATIONto evaluate cicada, we compared its predictions to ones made by

    a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1 more data

    less datano data

    VOC requires customers to input parameters

    (group sizes, amount of bandwidth)

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

  • [email protected]|12

    EVALUATIONto evaluate cicada, we compared its predictions to ones made by

    a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1 more data

    less datano data

    VOC requires customers to input parameters

    (group sizes, amount of bandwidth)

    we used an oracle method to pick the best possible

    parameters, and also allowed for temporal variability

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

  • [email protected]|12

    EVALUATIONto evaluate cicada, we compared its predictions to ones made by

    a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1 more data

    less datano data

    VOC requires customers to input parameters

    (group sizes, amount of bandwidth)

    we used an oracle method to pick the best possible

    parameters, and also allowed for temporal variability

    L2-norm errorE =

    kM̂ �MkkM̂k

    relative error

    E =

    Pi=n2i=1 M̂ [i]�M [i]Pi=n2

    i=1 M [i]

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

  • [email protected]|12

    EVALUATIONto evaluate cicada, we compared its predictions to ones made by

    a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

    vm2

    vm3

    vm4

    vm5

    vm6

    vm7

    vm8

    vm9

    vm1 more data

    less datano data

    VOC requires customers to input parameters

    (group sizes, amount of bandwidth)

    we used an oracle method to pick the best possible

    parameters, and also allowed for temporal variability

    L2-norm errorE =

    kM̂ �MkkM̂k

    relative error

    E =

    Pi=n2i=1 M̂ [i]�M [i]Pi=n2

    i=1 M [i]

    differentiates between over- and under-prediction

    [1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

  • [email protected]|13

    RESULTS PREDICTING AVERAGE DEMAND

    does not differentiate between over- and under-prediction

    differentiates between over- and under-prediction

    0

    0.2

    0.4

    0.6

    0.8

    1

    0 1 2 3 4 5

    CD

    F

    Relative L2-norm Error (fraction)

    CicadaVOC (Oracle)

    0

    0.2

    0.4

    0.6

    0.8

    1

    -1 0 1 2 3 4 5

    CD

    F

    Relative Error (fraction)

    CicadaVOC (Oracle)

  • [email protected]|13

    RESULTS PREDICTING AVERAGE DEMANDcicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

    require no customer input(71% for L2 error)

    does not differentiate between over- and under-prediction

    differentiates between over- and under-prediction

    0

    0.2

    0.4

    0.6

    0.8

    1

    0 1 2 3 4 5

    CD

    F

    Relative L2-norm Error (fraction)

    CicadaVOC (Oracle)

    0

    0.2

    0.4

    0.6

    0.8

    1

    -1 0 1 2 3 4 5

    CD

    F

    Relative Error (fraction)

    CicadaVOC (Oracle)

  • [email protected]|13

    RESULTS

    cicada occasionally under-predicts; VOC does not because we selected its parameters to

    avoid under-prediction

    PREDICTING AVERAGE DEMAND

    cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

    require no customer input(71% for L2 error)

    does not differentiate between over- and under-prediction

    differentiates between over- and under-prediction

    0

    0.2

    0.4

    0.6

    0.8

    1

    0 1 2 3 4 5

    CD

    F

    Relative L2-norm Error (fraction)

    CicadaVOC (Oracle)

    0

    0.2

    0.4

    0.6

    0.8

    1

    -1 0 1 2 3 4 5

    CD

    F

    Relative Error (fraction)

    CicadaVOC (Oracle)

  • [email protected]|13

    RESULTS

    cicada occasionally under-predicts; VOC does not because we selected its parameters to

    avoid under-prediction

    PREDICTING AVERAGE DEMAND

    cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

    require no customer input(71% for L2 error)

    does not differentiate between over- and under-prediction

    differentiates between over- and under-prediction

    cicada’s predictions typically require only 1-2 hours of

    application history

    0

    0.2

    0.4

    0.6

    0.8

    1

    0 1 2 3 4 5

    CD

    F

    Relative L2-norm Error (fraction)

    CicadaVOC (Oracle)

    0

    0.2

    0.4

    0.6

    0.8

    1

    -1 0 1 2 3 4 5

    CD

    F

    Relative Error (fraction)

    CicadaVOC (Oracle)

  • [email protected]|14

    PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

  • [email protected]|14

    PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

    network resources must be available

  • [email protected]|14

    PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

    network resources must be available customers can add a buffer to the offered guarantees

    accept cicada’s guarantee, but add 5% more bandwidth

  • [email protected]|14

    PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

    network resources must be available

    the provider may choose to augment predictions that cicada is not confident about

    typically not confident for “small” applications

    customers can add a buffer to the offered guarantees

    accept cicada’s guarantee, but add 5% more bandwidth

  • [email protected]|14

    PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

    network resources must be available

    the provider may choose to augment predictions that cicada is not confident about

    typically not confident for “small” applications

    customers can add a buffer to the offered guarantees

    accept cicada’s guarantee, but add 5% more bandwidth

    cicada can detect when a guarantee is too low, and offer a new one

    observe packet drops

  • [email protected]|15

    NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

  • [email protected]|15

    NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

    wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

  • [email protected]|15

    NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

    wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

    provider’s goal: minimize wasted bandwidth

  • [email protected]|15

    NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

    wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

    provider’s goal: minimize wasted bandwidth

    intra-rack bandwidth is usually cheap

    (sometimes free)

  • [email protected]|15

    NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

    wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

    provider’s goal: minimize wasted bandwidth

    intra-rack bandwidth is usually cheap

    (sometimes free)

    inter-rack bandwidth is more expensive

  • [email protected]|15

    NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

    wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

    provider’s goal: minimize wasted inter-rack bandwidth

    intra-rack bandwidth is usually cheap

    (sometimes free)

    inter-rack bandwidth is more expensive

  • [email protected]|16

    NETWORK UTILIZATION

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck

    Ba

    nd

    wid

    th A

    va

    ila

    ble

    (GB

    yte

    /ho

    ur)

    Oversubscription Factor

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • [email protected]|16

    NETWORK UTILIZATION

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck B

    an

    dw

    idth

    Availab

    le(G

    Byte

    /ho

    ur)

    Oversubscription Factor

    bw factor 1x = VOC’s placement [1]

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • [email protected]|16

    NETWORK UTILIZATION

    = VOC’s placement [1]= cicada’s placement

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck

    Ba

    nd

    wid

    th A

    va

    ila

    ble

    (GB

    yte

    /ho

    ur)

    Oversubscription Factor

    bw factor 1x

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • [email protected]|16

    NETWORK UTILIZATION

    = VOC’s placement [1]= cicada’s placement

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck

    Ba

    nd

    wid

    th A

    va

    ila

    ble

    (GB

    yte

    /ho

    ur)

    Oversubscription Factor

    bw factor 1xbw factor 25x

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • la[email protected]|16

    NETWORK UTILIZATION

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck B

    an

    dw

    idth

    Availab

    le(G

    Byte

    /ho

    ur)

    Oversubscription Factor

    bw factor 1xbw factor 25x

    bw factor 250x

    = VOC’s placement [1]= cicada’s placement

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • [email protected]|16

    NETWORK UTILIZATION

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck B

    an

    dw

    idth

    Availab

    le(G

    Byte

    /ho

    ur)

    Oversubscription Factor

    bw factor 1xbw factor 25x

    bw factor 250x

    = VOC’s placement [1]= cicada’s placement

    as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • [email protected]|16

    NETWORK UTILIZATION

    0

    100000

    200000

    300000

    400000

    500000

    600000

    1 1.4142 1.5 1.666 1.75 2 4 8 16

    Inte

    r-ra

    ck B

    an

    dw

    idth

    Availab

    le(G

    Byte

    /ho

    ur)

    Oversubscription Factor

    bw factor 1xbw factor 25x

    bw factor 250x

    = VOC’s placement [1]= cicada’s placement

    in some scenarios, cicada’s placement more than doubles the amount

    of available inter-rack bandwidth

    as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth

    [1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

    cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

  • [email protected]|17

    SUMMARY

    accurate

    cloud applications exhibit variability that existing models don’t capture

    cicada captures this variability, and provides guarantees that are

    calculated quickly*

    require little history

    increase network utilization

    *see paper for this result

  • [email protected]|17

    SUMMARY

    accurate

    cloud applications exhibit variability that existing models don’t capture

    cicada captures this variability, and provides guarantees that are

    calculated quickly*

    require little history

    increase network utilization

    *see paper for this result

    improve relative error by 90%

  • [email protected]|17

    SUMMARY

    accurate

    cloud applications exhibit variability that existing models don’t capture

    cicada captures this variability, and provides guarantees that are

    calculated quickly*

    require little history

    increase network utilization

    *see paper for this result

    improve relative error by 90%

    < 10milliseconds per application

  • [email protected]|17

    SUMMARY

    accurate

    cloud applications exhibit variability that existing models don’t capture

    cicada captures this variability, and provides guarantees that are

    calculated quickly*

    require little history

    increase network utilization

    *see paper for this result

    improve relative error by 90%

    < 10milliseconds per application

    1--2 hours for most applications

  • [email protected]|17

    SUMMARY

    accurate

    cloud applications exhibit variability that existing models don’t capture

    cicada captures this variability, and provides guarantees that are

    calculated quickly*

    require little history

    increase network utilization

    *see paper for this result

    improve relative error by 90%

    < 10milliseconds per application

    1--2 hours for most applications

    in some cases, inter-rack utilization is doubled