Big.littLE TC2

  • 8/12/2019 Big.littLE TC2

    Update on big.LITTLE on TC2

    Morten Rasmussen

    Technology Researcher

    Agenda

    big.LITTLE software solutions overview

    ARM's Test Chip 2 overview

    Benchmarking Methodology and Use Cases

    IKS status update

    big.LITTLE MP status update

    big.LITTLE overview

    Performance and power efficiency in one system:

                   Cortex-A15 vs Cortex-A7    Cortex-A7 vs Cortex-A15
                   Performance                Energy Efficiency
    Dhrystone      1.9x                       3.5x
    FDCT           2.3x                       3.8x
    IMDCT          3.0x                       3.0x
    MemCopy L1     1.9x                       2.3x
    MemCopy L2     1.9x                       3.4x

    IKS solution basics

    In-Kernel Switcher (IKS):

    Targeted at first-generation big.LITTLE products.

    [Diagram labels: Cortex-A7, Cortex-A15, Kernel, scheduler, IKS, Task 1, Task 2, Logical CPU]

    MP solution

    [Diagram labels: Cortex-A7, Cortex-A15, Kernel, scheduler, Task 1, Task 2]

    ARM's Test Chip 2 (TC2): An Overview

    A publicly available Versatile Express core tile.

    Capabilities:

    2 x A15 (r2p1) @ up to 1.2 GHz

    3 x A7 (r0p1) @ up to 1 GHz

    CCI/DMC/GIC/ADB (r0p0)

    DMA (PL330)

    2 GB external DDR2 memory @ 400 MHz

    64K internal SRAM

    CoreSight debug (including JTAG and ITM trace but no STM)

    No GPU

    cpufreq support: independent for each cluster, with limited voltage scaling

    cpuidle support: cluster power gating

    Benchmarking Methodology

    Results: performance, power

    Configurable: CCI, ftrace, Streamline

    CSV config: use case, scheduling model, number of cores to use, scaling governors

    Automated system for running user workloads on the target device

    Choose workload

    Choose CPU mode: Cortex-A7, Cortex-A15, Migration (cluster or CPU), or MP

    Choose active cores in each cluster (TC2: 1-2 big, 1-3 LITTLE)

    Choose DV...

    IKS: CPU Migration

    big.LITTLE extends DVFS

    The DVFS algorithm monitors load on each CPU

    When load is low it can be handled on a LITTLE processor

    When load is high the context is transferred to a big processor

    The unused processor can be powered down

    When all processors in a cluster are inactive, the cluster and its L2 cache can be powered down
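The switching rule on this slide can be sketched in a few lines. This is a minimal illustration, not the IKS implementation: the `LogicalCpu` class, the `UP_THRESHOLD` and `DOWN_THRESHOLD` values, and the hysteresis between them are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical load thresholds in percent; the slides do not give values.
UP_THRESHOLD = 80
DOWN_THRESHOLD = 30


@dataclass
class LogicalCpu:
    """One logical CPU backed by a LITTLE/big core pair."""
    active: str = "LITTLE"  # physical core currently holding the context

    def update(self, load_percent):
        """Move the context between cores based on the monitored load."""
        if self.active == "LITTLE" and load_percent >= UP_THRESHOLD:
            self.active = "big"      # high load: transfer context to big
        elif self.active == "big" and load_percent <= DOWN_THRESHOLD:
            self.active = "LITTLE"   # low load: fall back to LITTLE
        return self.active


def cluster_power_state(cpus, cluster):
    """A cluster (and its L2) can be off when none of its cores is in use."""
    return "on" if any(cpu.active == cluster for cpu in cpus) else "off"
```

With two logical CPUs both running on LITTLE cores, `cluster_power_state(cpus, "big")` reports the big cluster as a candidate for power-down. The gap between the two thresholds is only there to keep the example from bouncing between cores on small load changes.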

    IKS: Results for Audio on TC2

    Power compared to executing the use case on A15

    IKS does not use A15s during the Audio run

    70% saving

    TC2: A15 up to 1.2 GHz, A7 up to 1 GHz

    Better results expected on representative silicon.

    IKS: Results for BBench + Audio on TC2

    Performance is measured from the page loading times of BBench

    Results are normalised to the power consumed and the performance achieved when the same use case runs on A15 only

    [Chart: BBench page + Audio]

    IKS: OPPs on TC2

    IKS: Interactive governor on TC2

    if (cpu_load >= go_hispeed_load) {
        ...
        new_freq = max_freq * cpu_load / 100;
        ...
    } else {
        ...
        new_freq = hispeed_freq * cpu_load / 100;
        ...
    }

    For the A15 on TC2, with go_hispeed_load at 85% (default), this algorithm only uses the overdrive section of the A15.

    The approach is to introduce a second point of inflection: hispeed2.
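One way to picture a second inflection point is a governor with three load bands instead of two. The sketch below is hypothetical: `hispeed2_freq`, `go_hispeed2_load` and their default values are placeholders, the elided parts of the pseudocode are ignored, and the slides do not show how hispeed2 is actually wired into the interactive governor.

```python
def target_freq(cpu_load, max_freq, hispeed_freq,
                go_hispeed_load=85,
                hispeed2_freq=None, go_hispeed2_load=95):
    """Return a target frequency (kHz) for a CPU load given in percent.

    hispeed2_freq and go_hispeed2_load are assumed names for the second
    inflection point; with hispeed2_freq=None only the two original
    bands are used.
    """
    if hispeed2_freq is None:
        # Two-band behaviour: scale against max_freq above the threshold,
        # against hispeed_freq below it.
        ceiling = max_freq if cpu_load >= go_hispeed_load else hispeed_freq
    elif cpu_load >= go_hispeed2_load:
        ceiling = max_freq          # top band
    elif cpu_load >= go_hispeed_load:
        ceiling = hispeed2_freq     # new middle band
    else:
        ceiling = hispeed_freq      # low band
    return ceiling * cpu_load // 100
```

In this reading, a load just above `go_hispeed_load` is scaled against `hispeed2_freq` rather than `max_freq`, so the highest operating points are reached only once the load passes the second threshold.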

    IKS: Hispeed2

    IKS: Results: BBench + Audio

    Power improves with no performance cost

    [Chart: BBench page + Audio]

    MP solution: more details

    Scheduler modifications:

    Treat big and LITTLE cpus as separate scheduling domains.

    Use ...

    MP: Experimental Implementation

    Scheduler modifications:

    Apply ...

    [Diagram labels: balance, load_balance, select_task_rq_fair(), Forced migration]

    MP: ARM TC2: Audio

    Workload: Audio (mp3 playback)

    Performance/Energy target: A7 energy

    Status:

    Audio related tasks do not use A15s, but the power consumption is still significantly more than A7 alone.

    MP is not as power efficient as IKS yet.

    Todo:

    Target spurious wake-ups on A15. All the extra power comes from the A15s, which shouldn't be used at all.

    [Figure: Energy for the Audio workload on a 0 to 100 scale, with bars for A15, A7 2CPU, IKS and MP. The data labels for A7 and MP are only partly legible.]

    MP: Audio workload analysis

    Where is the extra energy spent with MP?

    Need a look at why A15s consume power when they are not necessary.

    [Figure: Audio energy breakdown. Energy on a 0 to 1.6 scale for A7 and MP, split into A15 cluster and A7 cluster.]

    [Table: per-CPU counts for cpu0 to cpu4, in three groups. hrtimer functions: hrtimer_wakeup, tick_sched_timer. Workqueue functions: vmstat_update, cache_reap, phy_state_machine. Enter idle: states 0 and 1. The numeric cells are only partly legible.]

    Scale invariant load

    Load accumulation rate does not scale with available compute capacity (frequency, big/LITTLE cpu).

    Currently, there is no link between cpufreq and the scheduler.

    Tasks may be migrated away from a cpu at low frequency by the scheduler before cpufreq has increased the frequency to match the cpu load.

    Scaling the tracked load accumulation to match the current frequency mitigates this issue.

    Tasks cannot accumulate enough load at low frequency to trigger migration and must wait for cpufreq to react first.

    [Figure labels: Freq x, Freq 2x]
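The effect of frequency scaling on load accumulation can be illustrated with a small sketch. It is a simplification under stated assumptions: a 1024 us tracking period, a plain average instead of a decayed history, and a hypothetical `up_threshold` of 0.8 for triggering an up-migration. None of the function names come from the slides.

```python
PERIOD_US = 1024  # assumed length of one load-tracking period, in microseconds


def period_contribution(runnable_us, cur_freq_khz, max_freq_khz,
                        scale_invariant=True):
    """Load contributed by one period in which the task ran for runnable_us."""
    if not scale_invariant:
        return runnable_us  # original behaviour: frequency is ignored
    # Scale the contribution by the fraction of maximum frequency in use.
    return runnable_us * cur_freq_khz // max_freq_khz


def wants_up_migration(contributions, up_threshold=0.8):
    """True when the average tracked load reaches the (hypothetical) threshold."""
    if not contributions:
        return False
    tracked = sum(contributions) / (PERIOD_US * len(contributions))
    return tracked >= up_threshold
```

A task that is busy for every period on a CPU running at half its maximum frequency looks fully loaded without scaling and is a migration candidate straight away. With scaling its tracked load stays at 0.5, so it remains where it is until cpufreq has raised the frequency.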

    Scale invariant load

    [Figure: two panels, "Original" and "Frequency invariant", each with a vertical axis from 0 to 1000. The horizontal axis labels are not legible.]

    Load accumulation rate

    For some workloads tracked load saturates too fast and leads to unnecessary task migrations.

    Extending the tracked load history reduces tracked load variations due to sudden changes in the load characteristics.

    Increasing the y factor in the load expression decreases the load accumulation and decay rates.

    \[ \mathrm{load} = \frac{u_0 + u_1 y + u_2 y^2 + \dots + u_n y^n}{1024\,(1 + y + y^2 + \dots + y^n)} \]

    [Figure: curve for y = 0.9785 plotted against Time [ms], vertical axis from 0 to 1, axis label y.]
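The influence of y on the accumulation rate can be tried out numerically. The sketch assumes a normalised form of the load expression, load = (u_0 + u_1 y + ... + u_n y^n) / (1024 (1 + y + ... + y^n)), with u_i taken as the runnable time in microseconds during period i (most recent first) and 1024 us periods. The function names, the idle history length and the 0.8 threshold are illustrative choices, not values from the slides.

```python
PERIOD_US = 1024  # assumed period length in microseconds


def tracked_load(u, y):
    """Decayed load in [0, 1]; u[0] is the most recent period (assumption)."""
    numerator = sum(u_i * y ** i for i, u_i in enumerate(u))
    denominator = PERIOD_US * sum(y ** i for i in range(len(u)))
    return numerator / denominator if denominator else 0.0


def periods_to_reach(threshold, y, idle_history=1000, limit=10000):
    """Busy periods a long-idle task needs before its load reaches threshold."""
    for busy in range(1, limit + 1):
        # Newest periods first: `busy` fully runnable periods, then idle ones.
        history = [PERIOD_US] * busy + [0] * idle_history
        if tracked_load(history, y) >= threshold:
            return busy
    return None


# y chosen so that a contribution halves after 32 periods (about 0.9785).
Y_DEFAULT = 0.5 ** (1 / 32)
```

Comparing `periods_to_reach(0.8, Y_DEFAULT)` with `periods_to_reach(0.8, 0.99)` shows that the larger y needs more busy periods to reach the same tracked load, which is the slower accumulation described above. The same holds for decay once the task goes idle again.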

    Load accumulation rate

    Increasing y leads to a more conservative tracked load.

    Should lead to fewer up/down migrations.

    Increases the up/down migration delay for tasks that need to be migrated.

    [Figure: Load accumulation rate. Tracked load against Time [ms] for a task, with curves for y = 0.9785 and two further y values whose digits are not fully legible.]

    MP Top Issues

    Spurious wakeups

        A15s are woken up by scheduler ticks (mainly)

        Workqueues

        Timers

        RCU

        cpu wakeup prioritisation: pick the cheapest target cpu

    Global balancing

        Spread load to A7s when A15s are overloaded

        Pack vs. spread

    Cluster aware cpufreq governors

    Questions?