How to get realistic C-states latency and residency ? Vincent Guittot

How to get realistic C-states latency and residencyconnect.linaro.org.s3.amazonaws.com/hkg18/presentations/...Overview PMWG uses hikey960 for testing our dev on b/L system Cluster


How to get realistic C-states latency and residency ?

Vincent Guittot


Agenda
● Overview
● Exit latency
● Entry latency
● Residency
● Conclusion


Overview


Overview
● PMWG uses hikey960 for testing our developments on big.LITTLE (b/L) systems
  ○ Cluster off latency and residency values in the DT binding were looking really high:

                            Entry latency (us)   Exit latency (us)   Residency time (us)
CPU off (big and LITTLE)    40                   70                  3000
LITTLE cluster off          500                  5000                20000
Big cluster off             1000                 5000                20000

● Decided to find a way to check the correctness of the figures
● How to easily get realistic figures for the C-states table of my platform ?
  ○ Without expensive equipment
  ○ Without deep knowledge of power management and idle states
  ○ Define values for a platform or check current values


C-state latency
● Prepare:
  ○ Cache maintenance
  ○ Abortable
● Entry:
  ○ HW & SW sequence to enter the idle state
  ○ Not abortable
● Exit:
  ○ HW & SW sequences needed to bring the CPU back to running state

* Read Documentation/devicetree/bindings/arm/idle-states.txt for details

[Figure: timeline of one idle period: Exec, Prepare, Entry, Idle, Exit, Exec]


How to measure latency ?
● Trigger contention
  ○ Compete for access to critical resources
  ○ Look for worst-case values
● Trigger the slowest path
  ○ Cache flush for entry latency


Test environment
● CPU isolation
  ○ Isolate CPUs from external noise and background activity
  ○ Works great for big cluster
  ○ Not enough for little cluster
    ■ Boot CPU is in little cluster
    ■ Interrupts pinned to CPU0
    ■ A “lot” of spurious activity pinned on little cluster
● Use rt-app
  ○ Sync wake up of CPUs
  ○ Range of wake up periods
  ○ Log events and phase durations
● Hikey960
  ○ Modified for accessing VDD_4V2 voltage domain
● Arm Energy Probe USB dongle


Exit latency


1st test: exit latency
● Enable only 1 state to force cpuidle
  ○ Not fully robust
● Wake up CPUs simultaneously
● rt-app logs wake up latency
  ○ Get min, max, average and std-dev
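One possible way to "enable only 1 state" is the per-state `disable` file that cpuidle exposes in sysfs (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/disable`). The sketch below is an assumption about how this could be scripted, not the scripts used for these slides; the helper names `plan_single_state` and `apply_writes` are hypothetical.

```python
from pathlib import Path


def plan_single_state(cpu_dir, keep_state):
    """Return (path, value) writes that leave only `keep_state` enabled.

    `cpu_dir` is one CPU directory, e.g. /sys/devices/system/cpu/cpu4.
    Writing "1" to a state's `disable` file disables it, "0" enables it.
    """
    state_dirs = [
        p for p in (Path(cpu_dir) / "cpuidle").glob("state[0-9]*")
        if p.name[len("state"):].isdigit()
    ]
    state_dirs.sort(key=lambda p: int(p.name[len("state"):]))

    writes = []
    for state in state_dirs:
        index = int(state.name[len("state"):])
        writes.append((state / "disable", "0" if index == keep_state else "1"))
    return writes


def apply_writes(writes):
    """Apply the planned writes (needs root on a real system)."""
    for path, value in writes:
        path.write_text(value)
```

For example, `apply_writes(plan_single_state("/sys/devices/system/cpu/cpu4", 2))` would leave only `state2` enabled on CPU4. As the slide notes, this is not fully robust: the governor can still only pick among the states left enabled, it is not forced to enter one.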

[Figure: timelines of CPU0 to CPU3, each woken by a timer IRQ and then reading the clock]


1st test: exit latency

[Figure: exit latency charts at 903 MHz and 2362 MHz, showing max, min and 95% values]
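The min, max, average, std-dev and 95% figures can be derived from the logged wake up latencies with a few lines of Python. This is a minimal sketch: the input is assumed to be a plain list of latencies in microseconds, already extracted from the rt-app logs, and reading "95%" as a nearest-rank 95th percentile is an assumption.

```python
import math
import statistics


def summarize(latencies_us):
    """Summarize wake up latency samples (microseconds)."""
    ordered = sorted(latencies_us)
    # Nearest-rank percentile: smallest sample covering 95% of the data.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "std": statistics.pstdev(ordered),  # population std-dev
        "p95": ordered[rank - 1],
    }
```

Since the goal is to find worst-case values, the max and 95% figures matter more here than the average.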


1st test: exit latency

[Figure: exit latency charts at 903 MHz and 2362 MHz]


1st test: exit latency
● One CPU wakes up faster than the others
  ○ Most probably the one that gets a “lock” first
● Frequency of the other cluster impacts exit latency
  ○ Flattens the difference between min and max OPP
  ○ +400us for max OPP when the other cluster runs at lowest OPP
● Local frequency has a limited impact in the end
  ○ Around 200us on the 2900us budget
● Syncing wake up with the other cluster has a limited impact on latency
  ○ A few dozen us
● Firmware mode has an impact
  ○ Release vs debug mode


All latencies
● Big cluster off is slower than LITTLE cluster off
  ○ Most probably more things are shut down compared to little
    ■ Like powering down a power domain
● Measured latency includes the full wake up path
  1. Timer interrupt fires (at almost the programmed timestamp, as the granularity of the timer is 52ns)
  2. PM coprocessor HW wake up sequence (when involved)
  3. ATF firmware resume sequence (when involved)
  4. cpuidle driver
  5. cpuidle framework
  6. Idle thread, including starting/stopping tick nohz idle
  7. Switching to rt-app thread
  8. Read time clock

                  big cluster              little cluster
             CLUSTER   CPU   WFI       CLUSTER   CPU   WFI
exit (us)    2900      550   70        1600      650   100


Entry latency


2nd test: entry latency
● Enable only 1 state to force cpuidle
  ○ Not fully robust
● rt-app logs phase durations
  ○ Get min, max, average and std-dev
● Increase the sleep duration step by step
  ○ Phase duration increases at the entry latency
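Spotting where the phase duration jumps can be automated with a simple step detector. The sketch below is only an assumption about how such an extraction could work: it takes hypothetical `(sleep_us, phase_us)` pairs (requested sleep duration, measured phase duration) and reports the sleep durations at which the phase grows by clearly more than the added sleep time. The `min_jump_us` threshold is a tuning knob, not a value from the talk.

```python
def find_steps(samples, min_jump_us):
    """Return [(sleep_us, extra_us)] where the phase duration steps up.

    `samples` is a list of (sleep_us, phase_us) sorted by sleep_us.
    Between two consecutive samples, the phase is expected to grow by the
    extra sleep time only; any growth beyond that is counted as a step.
    """
    steps = []
    for (prev_sleep, prev_phase), (sleep, phase) in zip(samples, samples[1:]):
        extra = (phase - prev_phase) - (sleep - prev_sleep)
        if extra >= min_jump_us:
            steps.append((sleep, extra))
    return steps
```

Using the min (or a low percentile) of each phase duration as input would help discard spurious wake ups before looking for steps.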

[Figure: CPU0 woken by a timer IRQ after progressively longer sleep durations]


2nd test: entry latency (single cpu)

[Figure: chart annotated with "sleep duration becomes longer than entry latency" and "Spurious wake up that can be discarded"]


2nd test: entry latency (multi cpu)

[Figure: chart annotated with "sleep duration becomes longer than wake up latency" and "1st abort point"]


2nd test: entry latency
● Wake up duration includes
  ○ rt-app task events
  ○ Entry latency
  ○ Extra sleep time
  ○ Exit latency
● Steps in charts
  ○ Show the different abortable points

                  big cluster              little cluster
             CLUSTER   CPU   WFI       CLUSTER   CPU   WFI
entry (us)   900       400   ~0        500       400   ~0


All latencies

                   big cluster              little cluster
              CLUSTER   CPU   WFI       CLUSTER   CPU   WFI
entry (us)    800       400   ~0        500       400   0
exit (us)     2900      550   70        1600      650   100
wake up (us)  3700      950   70        2100      1050  100
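The wake up row is the sum of the entry and exit rows, which can be checked with a short sketch. The values are copied from the table above, with "~0" treated as 0; the dictionary layout and the function name `wakeup_latency` are illustrative choices.

```python
# Latencies in microseconds, copied from the "All latencies" table.
ENTRY_US = {
    "big": {"CLUSTER": 800, "CPU": 400, "WFI": 0},
    "little": {"CLUSTER": 500, "CPU": 400, "WFI": 0},
}
EXIT_US = {
    "big": {"CLUSTER": 2900, "CPU": 550, "WFI": 70},
    "little": {"CLUSTER": 1600, "CPU": 650, "WFI": 100},
}


def wakeup_latency(entry_us, exit_us):
    """Worst-case wake up latency per state: entry latency + exit latency."""
    return {
        cluster: {
            state: entry_us[cluster][state] + exit_us[cluster][state]
            for state in entry_us[cluster]
        }
        for cluster in entry_us
    }


WAKEUP_US = wakeup_latency(ENTRY_US, EXIT_US)
```

The sum is the worst case because the entry step is not abortable: a wake up event arriving right after entry starts has to wait for the full entry sequence and then the full exit sequence.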


Residency time


C-state residency
● Residency time
  ○ Minimum idle time above which it’s worth selecting the C-state
● Estimated idle duration
  ○ Select the state with the longest residency time that still fits
● Wakeup latency
  ○ Skip C-states that wake up too slowly
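How residency time and wake up latency drive state selection can be sketched as follows. This is a simplified illustration of the idea, not the actual cpuidle governor code; `IdleState` and `select_state` are hypothetical names. The CPU off and LITTLE cluster off numbers come from the DT table in the Overview (wake up latency taken as entry + exit), and the WFI values are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass
class IdleState:
    name: str
    wakeup_latency_us: int
    min_residency_us: int


# Ordered from shallowest to deepest.
STATES = [
    IdleState("wfi", 1, 1),                    # placeholder values (assumption)
    IdleState("cpu-off", 40 + 70, 3000),       # DT values from the Overview
    IdleState("cluster-off", 500 + 5000, 20000),
]


def select_state(states, predicted_idle_us, latency_limit_us):
    """Pick the deepest state that fits both constraints.

    A state is skipped when its wake up latency exceeds the latency limit,
    or when the predicted idle time is shorter than its residency time.
    The shallowest state is always allowed.
    """
    chosen = states[0]
    for state in states[1:]:
        if state.wakeup_latency_us > latency_limit_us:
            continue
        if state.min_residency_us > predicted_idle_us:
            continue
        chosen = state
    return chosen
```

With these DT values, a 16000us predicted idle period selects `cpu-off`, because the 20000us residency of `cluster-off` is not reached. That is why an overestimated residency value keeps a deep state from being used.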

[Figure: timelines of Exec and Idle phases, with and without the Prepare, Entry and Exit steps around Idle]


How to estimate residency time ?
● Measure precisely each step independently
  ○ Energy consumed during each step of each state
  ○ Isolate the CPUs’ power domain from the others
● Implies
  ○ Having access to all power domains
  ○ Having very precise power meters (some steps are short, transient and difficult to measure)
● Don’t really care about absolute values
  ○ Just want to compare idle states to each other
● Don’t really care about the power impact of each step
  ○ Only interested in the end result


How to estimate residency time ?
● Wake up the CPU periodically and measure power consumption
  ○ Task doesn’t do anything other than wake up and sleep
    ■ Power impact is mainly the entry/exit sequence
  ○ With decreasing periods, entry and exit steps take more and more importance
  ○ Run the same number of wakeup/sleep sequences
    ■ Thousands of times
    ■ Relaxes the power meter precision constraint
  ○ Don’t need access to a dedicated power domain
    ■ Only interested in the difference
    ■ Side and noise power consumption is removed as long as it is stable across tests


How to estimate residency time ?
● Use rt-app to generate periodic wake ups
  ○ Task doesn’t do anything other than wake up and sleep
  ○ Run thread with a decreasing period
    ■ 10ms down to 1ms with a step of 0.5ms has been used for hikey960
● Minimize impact of background activity of other cluster(s)
  ○ Enable only WFI
  ○ Use lowest OPP
● Run long enough (20 seconds) and several times (x8)
  ○ Filter background activity of the system
  ○ Keep iteration with min value
  ○ Test is really long: more than 3 days of continuous tests for hikey960
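Once each idle state has one power figure per wake up period, the break even point between two states is where their curves cross. The sketch below shows one way to extract it, assuming hypothetical inputs: a list of periods and, for each state, the measured power at each period. `best_of` mirrors the "keep iteration with min value" filtering, and the crossing is found by linear interpolation, which is an assumption rather than the method used for the charts.

```python
def best_of(iterations):
    """Keep the iteration least disturbed by background activity."""
    return min(iterations)


def break_even_period(periods_ms, power_shallow, power_deep):
    """Return the period above which the deeper state consumes less.

    All three lists have the same length; power values only need to share
    a unit, since only their difference matters. Returns None when the
    curves do not cross inside the measured range.
    """
    points = sorted(zip(periods_ms, power_shallow, power_deep))
    prev = None
    for period, shallow, deep in points:
        diff = deep - shallow  # > 0 while the deeper state costs more
        if prev is not None:
            prev_period, prev_diff = prev
            if prev_diff > 0 >= diff:
                # Linear interpolation between the two surrounding points.
                frac = prev_diff / (prev_diff - diff)
                return prev_period + frac * (period - prev_period)
        prev = (period, diff)
    return None
```

A `None` result means one state wins over the whole measured range, so the caller has to decide how to report it (for instance as a residency of 0 when the deeper state is always cheaper).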


3rd test: residency time

[Figure: residency chart annotated with "Wake up latency for cluster off", "Break even point between cluster and cpu off" and "Break even point between cpu off and WFI"]


3rd test: residency time

[Figure: residency chart annotated with "Wake up latency for cluster off" and "Break even point between cpu off and WFI"]


Residency (us)

                  Big cluster              Little cluster
              CLUSTER   CPU    WFI      CLUSTER   CPU    WFI
Lowest OPP    5000      1500   N/A      8000      4500   N/A
Highest OPP   0         1500   N/A      0         1500   N/A


3rd test: residency time
● Residency time differs widely with OPP
● Understandable when we look at the “static” power consumption
  ○ big core @ lowest OPP: cluster off is 8% < WFI (absolute value)
  ○ big core @ highest OPP: cluster off is 25% < WFI (absolute value)
  ○ Need to weight the residency time value of each OPP with the % saved
● New residency value means increased usage of the cluster off state
  ○ Can see some responsiveness increases
  ○ 20ms residency time for cluster off versus 16ms for display sync event
  ○ Use CPU latency constraint instead: per CPU or system wide


Conclusion


Conclusion
● More rt-app test cases can be used:
  ○ With memory events, as an example
    ■ No real difference has been shown
● OPP has a significant impact on residency time
● Scripts will be publicly available soon
  ○ Run tests and gather results
● Next steps
  ○ Automate chart creation
  ○ Automate extraction of entry, exit, and residency values


Thank You

#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org