
  • Evaluating Cache Contention Prediction – A Ranking Based Approach –

    Michael Zwick

Abstract—In multi-core processor systems, processor caches are generally shared among multiple processor cores. As a consequence, co-scheduled applications constantly displace each other's data from the shared cache, an effect called cache contention. As cache contention degrades application performance, efforts have been made to predict cache contention and then select application co-schedules that minimize interference. Several such methods have been proposed in the past. However, those methods have rarely been compared to one another, as an appropriate evaluation framework has been missing so far.

In this paper, I present a framework to precisely evaluate cache contention prediction techniques and compare them to one another. An implementation of the framework is licensed under GPLv3 and can be downloaded at http://www.ldv.ei.tum.de/cachecontention.

Index Terms—cache contention prediction, multi-core, evaluation, framework.

    I. INTRODUCTION

IN multi-core processor systems, applications are executed in parallel and simultaneously share resources such as the memory controller, caches and busses (cf. figure 2). Sharing limited resources, however, applications frequently interfere with each other and degrade each other's performance. As an example, figure 1 shows the L2 cache hitrate degradation of SPEC 2006 benchmark milc when milc is co-scheduled with other SPEC 2006 benchmark applications: The bold black line illustrates the L2 cache hitrate of milc when milc is executed stand-alone and does not share any caches with other applications. The other lines represent the L2 cache hitrate of milc when milc is co-scheduled with the applications astar, gcc, bzip2, gobmk and lbm, respectively. Note that a higher L2 cache hitrate implies better performance. As you can see from the figure, the L2 cache degradation of milc varies heavily with the selection of the co-scheduled application. While some applications have only little effect on the L2 cache performance of milc, such as astar and gcc, there are co-schedules that have a severe impact, such as gobmk and lbm. Generally speaking, application performance on multi-core processor systems does not only depend on the amount of resources provided by the computer system and the amount of resources the considered application applies for, but also on the co-scheduled applications sharing those resources. As a consequence, resource-aware co-scheduling of applications has gained more and more attention in recent years, and new scheduling techniques have been proposed that aim to optimize overall system performance by predicting and avoiding cache/resource contention. However, the proposed techniques have rarely been compared to other state-of-the-art methods, as an appropriate evaluation framework has been missing so far. In fact, such methods have rather been verified by comparing the performance impact of a state-of-the-art process scheduler to the impact achieved by a modified version that predicts and minimizes resource contention by an adapted selection of co-schedules.

Manuscript received June 24, 2012; revised August 20, 2012. M. Zwick is with the Department of Electrical Engineering and Information Technology, Technische Universität München, 80290 München, Germany; email: zwick@tum.de.

[Figure: L2 cache hitrate (%) of milc plotted over chunks (1 chunk = 2^20 instructions) and million instructions, with curves for milc stand-alone and for milc co-scheduled with astar, gcc, bzip2, gobmk, and lbm, respectively.]

Fig. 1. L2 cache hitrate degradation introduced to SPEC 2006 benchmark milc when co-scheduling milc with each of the SPEC 2006 benchmarks astar, gcc, bzip2, gobmk, and lbm on a CMP dual-core architecture as presented in [1].

In the following, I
• present three techniques that have been applied in the past to evaluate cache contention prediction and discuss their benefits and limitations (sec. II),
• identify requirements on a new framework to evaluate cache contention prediction (sec. III),
• propose a new evaluation framework (sec. IV) and
• exemplarily show some evaluation results gained from the framework (sec. IV).

    II. STATE-OF-THE-ART EVALUATION TECHNIQUES

    A. Fedorova et al.’s Evaluation Method

In [3], Alexandra Fedorova et al. analyze and evaluate prediction methods that estimate contention within a set of |A| = 4 applications A = {a1, a2, a3, a4} that share resources on an Intel Quad-Core Xeon processor. The applied processor architecture can be described by two of the processors shown in figure 2 (greyed boxes) that are interconnected via the bus system.

For evaluation, the authors first apply the prediction method to estimate the best co-schedule for each of the applications in A. As an example, a prediction method might estimate the best co-schedule set to be [(a1, a2), (a3, a4)], which means that applications a1 and a2 should be executed on the one processor entity (greyed box in figure 2) and applications a3 and a4 should be executed on the other.

Proceedings of the World Congress on Engineering and Computer Science 2012 Vol I, WCECS 2012, October 24-26, 2012, San Francisco, USA. ISBN: 978-988-19251-6-9; ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online).

[Figure: block diagram of a dual-core processor entity showing Core 1 and Core 2, each with a private L1 cache, a shared L2 cache, prefetching hardware, front-side bus logic, a memory controller hub, and memory.]

Fig. 2. Dual-core processor architecture as presented in [2]. While the L1 caches are private to the processor cores, the L2 cache is shared, which results in L2 cache contention.

    Then, the authors compare the degradation introduced to a1, a2, a3, and a4 on this estimated best application co-schedule to the degradation introduced to a1, a2, a3, and a4 when applying the optimal co-schedule set, e.g. [(a1, a3), (a2, a4)].
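For |A| = 4 applications and two dual-core processor entities, there are only three distinct co-schedule sets to choose from. As a hedged illustration of this combinatorial space (a sketch of mine, not code from [3]), the candidate pairings can be enumerated as:

```python
def pairings(apps):
    """Enumerate all ways to split an even-sized list of
    applications into unordered co-schedule pairs."""
    if not apps:
        yield []
        return
    first, rest = apps[0], apps[1:]
    for partner in rest:
        remaining = [a for a in rest if a != partner]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

# For A = {a1, a2, a3, a4} there are exactly three co-schedule sets:
# [(a1, a2), (a3, a4)], [(a1, a3), (a2, a4)], [(a1, a4), (a2, a3)]
candidate_sets = list(pairings(["a1", "a2", "a3", "a4"]))
```

A prediction method effectively picks one of these three sets; the evaluation then measures how much worse that pick is than the actual best set.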

This is done in the following steps:

• Run each application a ∈ A stand-alone on the processor and measure execution time ta, i.e. the execution time of application a when a does not suffer from resource contention.

Let Ca be the set of applications that are co-scheduled with application a on the same processor and share the L2 cache¹, and let tCa be the execution time of application a when a is co-scheduled with the applications in Ca. Then:

• For each a ∈ A and each Ca ⊆ A\{a}, measure execution time tCa.
• For each a ∈ A and each Ca ⊆ A\{a}, calculate the degradation Ca introduces to a, i.e. dCa = (tCa − ta)/ta.
• Calculate the average degradation of the predicted best co-schedule, dpred = Σa∈A dCa/|A|, where, for each a ∈ A, co-schedule Ca is the predicted best co-schedule.
• Calculate the average degradation of the actual best co-schedule, dact = Σa∈A dCa/|A|, where each co-schedule Ca is chosen such that dCa is minimized.
• Determine the evaluation metric E as the percentage performance degradation when applying the predicted best co-schedule instead of the actual best co-schedule:

E = ((dpred − dact)/dact) · 100 %.

¹If there are only two processor cores sharing a cache, then Ca consists of only one application. Given a co-schedule set (a1, a2) and let a = a1, then Ca = {a2}.
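The steps above can be sketched in a few lines of Python. All execution times below are made-up placeholders, not measurements from [3]; with two cores per shared cache, each co-schedule set Ca reduces to a single partner application:

```python
# Sketch of Fedorova et al.'s evaluation metric E (hypothetical numbers).
# t_solo[a]: stand-alone execution time t_a of application a.
# t_co[(a, c)]: execution time t_Ca of a when co-scheduled with c.
t_solo = {"a1": 10.0, "a2": 12.0, "a3": 8.0, "a4": 9.0}
t_co = {
    ("a1", "a2"): 11.0, ("a2", "a1"): 13.5,
    ("a1", "a3"): 12.5, ("a3", "a1"): 9.5,
    ("a1", "a4"): 11.5, ("a4", "a1"): 10.5,
    ("a2", "a3"): 13.0, ("a3", "a2"): 8.5,
    ("a2", "a4"): 12.5, ("a4", "a2"): 9.8,
    ("a3", "a4"): 8.8,  ("a4", "a3"): 10.0,
}

def degradation(a, c):
    # d_Ca = (t_Ca - t_a) / t_a
    return (t_co[(a, c)] - t_solo[a]) / t_solo[a]

def avg_degradation(pairing):
    # Average degradation over all |A| applications in the co-schedule set.
    ds = [degradation(a, c) for pair in pairing for a, c in (pair, pair[::-1])]
    return sum(ds) / len(ds)

# All three possible co-schedule sets for four applications.
co_schedule_sets = [
    [("a1", "a2"), ("a3", "a4")],
    [("a1", "a3"), ("a2", "a4")],
    [("a1", "a4"), ("a2", "a3")],
]
d_act = min(avg_degradation(p) for p in co_schedule_sets)   # actual best
d_pred = avg_degradation(co_schedule_sets[2])               # predictor's pick
E = (d_pred - d_act) / d_act * 100.0                        # in percent
```

E = 0 % would mean the predictor found the actual best co-schedule set; with the placeholder numbers above, the (hypothetically) predicted set [(a1, a4), (a2, a3)] is slightly worse than the best one, so E is a small positive percentage.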

    E is the metric Fedorova et al. apply in [3] to evaluate methods that predict contention for shared resources on multi-core processors.

    The authors limit their evaluation to prediction accuracy. An evaluation of the amount of time necessary to create or apply the predictors is not performed.

Fedorova et al. are one of only a few authors that evaluate the accuracy of contention prediction methods in the context of state-of-the-art prediction techniques. As evaluation method, they compare program execution times of predicted best co-schedules to program execution times of actual best co-schedules, measured on a physically available processor, not a simulator.

Generally, Fedorova et al.'s method to evaluate the prediction of resource contention is a highly accepted approach, as it relies on real applications executed on a physical machine. And as long as this approach is applied to evaluate methods to predict contention in general, everything is fine.

But problems arise when you try to use this technique to solely evaluate methods to predict cache contention, as execution time in general depends on more factors than just cache misses, and the ground-truth reference then would incorporate effects that do not originate from cache contention – and are not modeled in the prediction techniques. In [3], Fedorova et al. address this effect when they explain the surprisingly good performance of stand-alone cache misses as a predictor for the cache misses of application co-schedule sets, as it has been proposed by Knauerhase et al. [4].

As a consequence, a cache simulator would be more suited to evaluate the prediction accuracy of cache contention prediction methods, as results obtained from a cache simulator would more precisely reflect the characteristics addressed by cache contention prediction methods. However, when it comes to an evaluation of prediction accuracy regarding contention in general, i.e. when overall system performance is the measure to optimize, real hardware would definitely be the best choice.

B. Chandra et al.'s Evaluation Method

    In [5], Dhruba Chandra et al.
