Intel® VTune™ Amplifier 2016 for Systems
Performance and Power Profiling on Android* Devices
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
Agenda
Overview
Intel® System Studio 2016
Intel® VTune™ Amplifier 2016 for Systems
Intel® Energy Profiler
What’s New
Other Tool Options for Android*
Basic Performance Analysis
Steps to do “Basic Hotspots”
Additional Features
Performance Optimization Basics
Other Collection Options
Viewpoints/Grouping/Filtering
How to Collect
Advanced Topics
Power Analysis
Power Optimization Basics
Power Views in the VTune Amplifier GUI
How to Collect: SoCWatch
Advanced Topics
Installation Overview
Device Requirements
Host Installation
Target Installation
Overview
Intel® System Studio for Android*
Deep System Insights for Mobile System Developers
IA Coverage
OS Support
From single to multicore
High performance libraries
SoC, CPU, and GPU analysis
System Debug & Trace
In-depth Analysis & Debug
Smartphone and Tablet
Support for Latest Intel Processor and SoC
Advanced system debug & trace for greater system stability
SoC-wide analysis for enhanced power efficiency and performance
Graphics Performance Analysis and optimization tools for graphics-intensive applications
Industry-leading performance from exceptional C++ Compiler and libraries
Boost Performance
Windows* and Linux* host
Android* Target
Intel® System Studio for Android* Overview
Integrated software tool suite that provides deep system-wide insights to help:
Accelerate Time To Market
Strengthen System Reliability
Boost Power Efficiency and Performance
Write and Test Code
• Intel® C/C++ Compiler
• Intel® Integrated Performance Primitives
• Preview: Intel® Hardware Accelerated Execution Manager
System and Application Code
Debug
• Intel® JTAG Debugger
System Application
Analyze
• Intel® VTune™ Amplifier
• Intel® Graphics Performance Analyzers (System Analyzer)
• Intel® Energy Profiler
Power & Performance
JTAG Interface
Intel® Processor-based Mobile Systems
Intel® System Studio 2015 Components
Target OS Support (editions; host operating systems; integrated development environment):
Linux* 1, 5: Composer, Professional, and Ultimate Editions; host Linux*, Windows*; IDE Eclipse*, Wind River* Workbench*
Android* 5: Composer, Professional, and Ultimate Editions; host Linux*, Windows*; IDE Eclipse*
Windows*: Composer and Professional Editions; host Windows*; IDE Visual Studio*
VxWorks*: Composer Edition; host Linux*, Windows*; IDE Wind River* Workbench*
Category Component
Compiler & Libraries
Intel® C++ Compiler √ √ √ √ √ √ √ √ √ 2
Intel® Integrated Performance Primitives √ √ √ √ √ √ √ √ √ 2
Intel® Math Kernel Library √ √ √ √ √
Intel® Threading Building Blocks √ √ √ √ √ √ √ √
Application Debugger
Intel-enhanced GDB* Application Debugger √ √ √ √ √ √
Analyzers
Intel® VTune™ Amplifier for Systems √ √ √ √ √
Intel® Energy Profiler √ √ √
System Analyzer √ √ √
Frame Analyzer 4 √ √ √
Platform Analyzer 4 √ √ √
Intel® Inspector for Systems √ √ √
System Debugger
Intel® System Debugger (JTAG) 3 √ √
1 Linux*, Embedded Linux, Wind River* Linux*, Yocto Project*, Tizen*
2 Delivered with Wind River* VxWorks* platform*
3 Via Intel® ITP-XDP3 probe, OpenOCD*, Macraigor* usb2demon* and EDKII* for UEFI*
4 Available on Windows* host only
5 Linux* and Android* target support available in a single product
Intel® VTune™ Amplifier for Systems
Performance Profiler
Get the Tuning Data You Need
− Low overhead “hotspot” analysis with call stacks
− Advanced analysis for cache, branching, …
Find Answers Fast
− Powerful analysis & data mining
− Results mapped to C/C++ or Java source
Easy to Use
− Remote analysis from the User Interface
− Windows or Linux Host analyzes Linux or Android target
Available now as part of Intel® System Studio
Optimize Your Software Performance
Intel® Energy Profiler
Energy and Power Profiler for System Software Developers
Optimize Software for Extended Battery Life
Find the Cause of Wake Ups That Waste Energy
− Interrupts mapped to the IRQ/device
− Timers mapped to the scheduling process
− Data correlated with Android Wake Locks
Available now for Linux and Android
Part of Intel® System Studio
Get Actionable Data to Extend Battery Life
Requires specific SoCs. On Android, a rootable OS is required, with version-compatible device drivers. See release notes for details.
Android Support including…
Basic hotspots, Locks & Waits and EBS with stacks for RT kernel and RT applications for Linux Targets
EBS based stack sampling for kernel mode threads
Support for Intel® Atom™ x7 Z8700 & x5 Z8500/X8400 processor series (Cherry Trail) including GPU analysis
Automated remote EBS analysis on SoFIA (by leveraging existing sampling driver on target)
Super Tiny display mode added for the Timeline pane to easily identify problem areas for results with multiple processes/threads
Platform window replacing Tasks and Frames window and providing CPU, GPU, and Bandwidth metrics data distributed over time
General Exploration analysis views extended to display a confidence indication (greyed-out font) for unreliable metric data resulting, for example, from a low number of collected samples
GPU usage analysis for OpenCL™ applications extended to display compute-originated batch buffers on the GPU software queue in the Timeline pane (Linux* target only)
New filtering mode for command line reports to display data for the specified column names only
Continually expanding Mobile Development Kit Program - http://software.intel.com/mdk
Many other features for embedded OSes, improvements to the GUI, and various bug fixes…
See Release Notes
What’s New In Intel® VTune™ Amplifier 2016 for Systems
Other Intel® Software Developer Tools for Android*
Basic Performance Analysis
Using Intel® VTune™ Amplifier 2016 for Systems on Android Systems
Basic Hotspots
Start Here - Makes It Easy
#1 used feature
Easiest feature to use
Covers the most important task: identifying the hotspot
Works on non-rooted (and rooted) Intel® architecture devices
Collects samples using OS-timer event for a specific application/process
Associates samples with:
Module/thread/function
C/C++ source or assembly
JITted Java/Dalvik functions/ART functions/assembly/dex/source
Collects User Mode Stacks (default)
Basic Hotspots
Step 1) Attach device to host system
Host System: Windows* or Linux*
Android* Device: Intel® architecture based
Connection: adb
• ADB connectivity installed on host system
• “Enable USB Debugging” (under Developer Options on the target device)
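Before creating a project, it can help to confirm that the host actually sees the device. The following is a minimal sketch, assuming the usual `adb devices` listing (one header line, then one `serial<TAB>state` line per device). The helper name `count_ready_devices` is made up for illustration and is not part of adb or VTune Amplifier.

```shell
# Count attached devices that are ready for use.
# Reads `adb devices` output on stdin; only lines whose second field
# is exactly "device" are counted (not "unauthorized" or "offline").
count_ready_devices() {
  awk 'NR > 1 && $2 == "device" { n++ } END { print n + 0 }'
}

# Typical use on the host (requires adb in PATH):
#   adb devices | count_ready_devices
```

A result of 0 usually points to USB debugging not being enabled, or to the host not yet being authorized on the device.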
Basic Hotspots
Step 2) Create Project in Intel® VTune™ Amplifier
Create Project in VTune Amplifier
set target type: “Android Device (ADB)”
set target type: Launch Android Package
set target system: Your device
set Package or Process Name
Optionally set other options
Basic Hotspots
Step 3) Start Hotspot Analysis
Click “Launch New Analysis” via play button
Then select “Basic Hotspots” under “Analysis Type”
Then click “Start”
Basic Hotspots
Step 4) Run benchmark & stop VTune Amplifier collection
Now interact with your app… make it run whatever it is you want to tune.
Tell VTune Amplifier to Stop
VTune Amplifier will then copy the data files back to the host system for analysis
Basic Hotspots
Step 5) Identify hottest functions
Basic Hotspots
Optional Step 5) Enable C/C++ Source View
amplxe-cl [options] --search-dirs=Path-to-Symbols-on-host
Binaries with symbols traditionally are located in:
[AndroidAppBuildDir]/out/target/product/[your target]/obj
[AndroidOSBuildDir]/out/target/product/[your target]/symbols, or
[AndroidOSBuildDir]/out/target/product/[your target]/obj
[AndroidOSBuildDir]/out/target/product/[your target]/linux/kernel
Basic Hotspots is
• Easy
• Works on all Intel architecture-based Android* devices
• Extremely useful for diagnosing performance issues
Simple set of steps presented
Basic Hotspots
Summary
Additional Features of Intel® VTune™ Amplifier
Using Intel® VTune™ Amplifier 2016 for Systems on Android Systems
1. Determine desired performance
2. Build the application/system for optimization, not debug (with symbols available)
1. System = Userdebug build (rootable)
2. Application = Compile C/C++ binaries with optimizations (the debuggable attribute disables this)
3. Find hotspots (Where the application/system is spending time)
4. [Optional] Determine the efficiency of the hotspots
Determine architectural issues of the hotspots
5. [Optional] Find changes to hotspots between builds
6. Make 1 code change to 1 hotspot to improve performance
Remove hotspot
Improve algorithm
Change code to better utilize hardware
Get new hardware
7. Validate that change improved performance
8. Go to step 3 – until desired performance achieved
Performance Analysis Steps (condensed)
Advanced Hotspots
To get more information
Identifies the hotspot using hardware counters (PMU) of Intel® processors
Allows system-wide collection, letting you see all processes running on the system
For a single application, can collect:
User-Stacks + Kernel-Stacks
Context Switches
Call Counts
Associates samples with:
Process/Module/function
Core/thread
C/C++ source or assembly
JITted Java/Dalvik functions/assembly/dex/source
System Wide:
amplxe-cl --collect advanced-hotspots --duration=<N> --target-system=android
Stacks, Context & Counts:
amplxe-cl --collect advanced-hotspots -knob collection-detail=stack-and-callcount --target-process=<appName> --target-system=android
Uses hardware counters (PMU) to identify microarchitectural issues in your system/application
Makes it easy to select appropriate counters for each Intel® microarchitecture to find issues in…
Memory (Cache, TLB, Reissues, Bus-Locks)
Branch mis-prediction
Machine Clears, Floating Point Stalls
Efficiency (CPI, uOp(s)-Retired)
Applies formulas/heuristics developed by Intel engineers to highlight issues
Easy to Interpret – If it is Pink – Examine in more detail
amplxe-cl --collect advanced-hotspots --duration=<N> --target-system=android
General Exploration
To Diagnose Microarchitecture Bottlenecks
amplxe-cl --collect [atom-general-exploration | snb-general-exploration] --target-system=android
Viewpoints
Enable looking at the data the way you need it
A viewpoint is a pre-defined view that determines what is displayed in the grid and timeline for a given analysis type
An analysis type may support more than one viewpoint
Groupings
Each analysis type has many viewpoints
Each viewpoint has pre-defined groupings
Allows you to analyze the data in different hierarchies and granularities
Intel® VTune™ Amplifier Key Concepts
For example, pre-defined groupings can be used to determine load imbalance
Groupings
Change to Function/Thread
Filtering (Time or Group) - Lets you focus on what’s important
Filtering
Only view data related to your issue
Caller/Callee View
GUI Layout
Select a function in the Bottom-Up and find the caller/callee
List of functions sorted by CPU Time
List of callers and their stacks
List of callees and their stacks
Adding User Marks to the Timeline
GUI Controls
Start application without data collection
Resume data collection when needed
Observe paused region on the Time Line
Click “Mark Timeline” during collection
Observe the mark on the Time Line
Repeat: Overview of Remote/Attached Collection Procedure/Architecture for Android*
Uses “adb” protocol/binary for collection & data transfer (must be in path)
Flexible collection configuration + control (pause/resume/stop)
Transfers data collected remotely back to host automatically together with stripped application modules for symbol resolution
Some collection types require signed drivers accessed from rooted device
Host: VTune GUI, amplxe-cl, VTune result
Target device: amplxe-runss, driver, VTune result
adb: control collection, transfer data/modules
GUI Collector Control
VTune collector binary runs on target and stores result on target
Data is opened in GUI and symbols are resolved using modules stored in result dir
User can specify search dir with separate debug files if needed
How to Collect
Via the command line
amplxe-cl --collect hotspots --target-process=<appName> [Other Options] --target-system=android
amplxe-cl --collect advanced-hotspots [Other Options] --target-system=android
amplxe-cl --collect [atom-general-exploration | snb-general-exploration] [Other Options] --target-system=android
Via the GUI
1) Attach device to host via adb
2) Create Project in VTune and set target system
3) Click “Launch New Analysis”, then select Analysis Type, then click “Start”
4) Wait until collection finishes or click Pause, Resume, or Stop Collection.
Click “Command Line…” and a dialog will display the command line for that analysis type
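The command-line forms above differ only in the analysis type and in whether a target process is named. As a hedged sketch, the helper below prints such a command without running it, which is handy for scripting or reviewing what will be launched. It uses only the options shown on this slide; the function name `build_collect_cmd` and the package name `com.example.app` are hypothetical.

```shell
# Print (do not run) a collection command for a given analysis type.
# $1 = analysis type, for example hotspots or advanced-hotspots
# $2 = optional package or process name to profile
build_collect_cmd() {
  analysis=$1
  process=${2:-}
  cmd="amplxe-cl --collect $analysis"
  if [ -n "$process" ]; then
    cmd="$cmd --target-process=$process"
  fi
  printf '%s --target-system=android\n' "$cmd"
}

build_collect_cmd hotspots com.example.app   # single application
build_collect_cmd advanced-hotspots          # system wide
```

Printing the command first keeps the sketch safe to run on a host without a device attached; the output can be pasted into a terminal once the target is connected.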
Advanced Topics
Intel® VTune™ Amplifier for Systems
Choose Specific Processor Events
Change Sampling Rates
Collect Samples for Entire System, Single Application, Kernel Space Only, User Space Only
Many, Many More Options….
Note: The option set is very flexible (with the defaults presented earlier). You can collect as much or as little information as you want in a single run, with the caveat that the more you collect, the higher the overhead, which will affect the performance of what you are measuring
Custom Analysis Types
Allows many advanced options for collection
More details in Intel® VTune™ Amplifier Help: User's Guide
Quickly identify cause of regressions.
Run a command line analysis daily
Identify the function responsible so you know who to alert
Compare 2 optimizations – What improved?
Compare 2 systems – What didn’t speed up as much?
Compare Results Quickly - Sort By Difference
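The "sort by difference" idea can also be scripted for a daily regression check. The sketch below is not a VTune Amplifier feature; it assumes each result has already been reduced to a simple headerless CSV of `function,cpu_seconds` (a hypothetical, simplified format), and the name `diff_hotspots` is made up for illustration.

```shell
# diff_hotspots BASELINE.csv CURRENT.csv
# Each input line: function,cpu_seconds (no header row; assumed format).
# Prints function,delta with the largest slowdown first.
diff_hotspots() {
  awk -F, '
    FILENAME == ARGV[1] { base[$1] = $2; next }   # first file: baseline times
    { cur[$1] = $2 }                              # second file: current times
    END {
      # Functions in the current run (missing baseline counts as 0).
      for (f in cur)  printf "%s,%.3f\n", f, cur[f] - base[f]
      # Functions that disappeared from the current run.
      for (f in base) if (!(f in cur)) printf "%s,%.3f\n", f, -base[f]
    }
  ' "$1" "$2" | sort -t, -k2,2nr
}

# Usage: diff_hotspots yesterday.csv today.csv | head
```

The function at the top of the output is the one whose CPU time grew the most, which is usually the first place to look when a regression appears.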
Power Analysis
Using Intel® Energy Profiler in Intel® VTune™ Amplifier for Systems
Traditional optimization: Race to Idle
Perform Operation Faster – Then Sleep
Achieved by
Use of new instructions
Increase core parallelism
Use Standard Performance Optimization Tools – like Intel® VTune™ Amplifier for performance analysis
New optimization: Increase uninterrupted idle time
Use SoC components as needed
Achieved by
Reduce the frequency of activity
Consolidate activities
Run code on appropriate SoC block
Turn off components (or system) when not in use
Increase Power Efficiency
Minimize Wake-ups from Timers and Interrupts
Intel® Energy Profiler
Energy and Power Profiler for System Software Developers
Optimize Software for Extended Battery Life
Find the Cause of Wake Ups That Waste Energy
− Interrupts mapped to the IRQ/device
− Timers mapped to the scheduling process
− Data correlated with Android Wakelocks
Android Tools
• SoCWatch: collector (command line only)
• VTune Amplifier 2016 for Systems UI: for displaying data
Get Actionable Data to Extend Battery Life
Requires specific SoCs. A rootable version of the OS is required, with version-compatible device drivers (or kernel sources + signing key)
CPU Sleep States
• Flexible C-States to Select Idle Power Level vs. Responsiveness
[Table: C-states C0 (active state), C1, C2, C3, and C6 compared by core voltage*, core clock, PLL, L1 caches, L2 cache, wakeup time*, and idle power*]
* Rough approximation
Processor Power and Processor Frequency
Copyright © 2004 Intel Corporation. All rights reserved.
[Figure: Power vs. Frequency Curve for Single Architecture; x-axis: Frequency (GHz), 0 to 3.5; y-axis: Power (W), 9 to 359]
Small Increases in Processor Speed Result in Large Increases in Power
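One common first-order model for why such a curve bends upward is the dynamic power of CMOS logic. This is a general textbook approximation, offered here as an assumption rather than something derived from the chart's data:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Here \alpha is the switching activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. Because higher frequencies generally need a higher supply voltage, power tends to rise faster than linearly with frequency.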
CPU C-States / P-States
CPU Active: P0, P1, Pn
CPU Sleep: C1, C2, C3, C4, C6
P0 - CPU active at highest frequency (HFM)
Pn - CPU active at lowest frequency (LFM)
C0 - CPU active (in any P-state)
C1 - Core clock is off
C3/C4 - Reduced voltage, partial L2 cache flush
C6 - Core off, L2 cache flush, state saved to SRAM
The deeper the sleep state, the more power is saved, but the longer it takes to wake up
[Diagram axes: Power Higher, Latency Greater]
Find Process/Thread Waking System up
C-State Wakeup
Identify the object which woke the CPU up the most often
Reduce the # of wakeups
Identify whether the processor was mostly asleep (C1-C6) or awake (C0)
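Finding the object that woke the CPU most often is a counting problem. As a rough sketch outside the GUI, assuming the wake-up sources have been exported to plain text with one source name per line and no spaces in names (a hypothetical, simplified format), standard tools can rank them. The name `rank_wakeups` is made up for illustration.

```shell
# Rank wake-up sources by how often they appear.
# Reads one source name per line on stdin (assumed format) and
# prints "count name", most frequent first.
rank_wakeups() {
  sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# Usage: rank_wakeups < wakeup_sources.txt | head -n 5
```

The first few lines of the output are the candidates to examine when trying to reduce the number of wake-ups.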
Small Increases in Processor Speed Result in Large Increases in Power
Determine when the CPU Frequency went up
Determine what frequency the CPU was running at and for how long…
Determine the CPU Frequency
Hovering the mouse cursor over a point in the timeline will bring up a pop-up box showing more detailed information such as specific measured frequency at that measurement time.
Component Device States
Find Components Wasting Power
Intel Device States:
D0i0 = On
D0i1-D0i2 = Intermediate
D0i3 = Off
Find:
• Which Devices are on/off?
• In this example: no media use and only periodic rendering?
• When they got turned on/off?
• When a device is not in use… the software needs to turn it off
C-States (CPU wakeups)
P-States (CPU frequency)
S0i States (System State)
Android Wakelocks
Temperature (Core, Skin, SoC, PMIC)
N(orth) C(luster) Device States
S(outh) C(luster) Device States
Potential Collections
Correlate Metrics
to find patterns
See all the metrics at once
Find patterns
Find the cause
Correlate CPU Frequency, Sleep State, Wake-up Objects, etc...
Command Line Tool focused on Power Analysis
Correlates key hardware and OS data providing complete system view
Selects the best collection method based on user input
Tracing: 100% accurate
Collects every state change
Snapshot: minimum overhead
Read at start and end of collection, provide difference
Polling: reads values 10 times/sec (configurable)
SoCWatch for Android
SoCWatch Process
Example Command Line Usage
./socwatch -f sys -f wakelocks -t 10
-f sys // collects all metrics
-t 10 // defines duration of collection (10 seconds)
Takes snapshots and traces P-states for 10 seconds, and creates the default SocWatchOutput files
Import into VTune Amplifier on host via:
adb pull <path-on-target>/SocWatchOutput.sw1
amplxe-cl -import ./SocWatchOutput.sw1 -r <project name>
Open Results in VTune Amplifier GUI
More details in SoCWatchForAndroid_v1_3_0.pdf
CPU Collectable Metrics
Parameter | Metric | Description | Collection Method(s)
-f cpu-cstate | App Processor Utilization | % of time spent executing instructions | Snapshot
-f cpu-pstate | App Processor Frequency – PStates | % of time spent in each PState or timeline of PState transitions | Trace
-m -f cpu-cstate | App Processor Wakeups | Reasons why the App Processor exits a CState | Trace
-f cpu | cpu-cstate & cpu-pstate
SoC Collectable Metrics
Parameter | Metric | Description | Collection Method(s)
-f sc-dstate | SoC South Cluster Component Power States – SC D0ix States | % of time each SC SoC component spent in each D0ix state | Snapshot
-f nc-dstate | SoC North Cluster Component Power States – NC D0ix States | Statistical % of time each NC SoC component spent in each D0ix state or statistical timeline of D0ix behavior | Poll
-f device | nc-dstate + sc-dstate
Temperature Collectable Metrics
Parameter | Metric | Description | Collection Method(s)
-f core-temp | App Processor Temperature | Statistical % of time spent in each temperature or statistical timeline of temperature behavior | Poll
-f soc-temp | SoC Temperature | | Poll
-f pmic-temp | PMIC Temperature | | Poll
-f skin-temp | Skin Temperature | | Poll
-f temp | core-temp + soc-temp + pmic-temp + skin-temp
System Collectable Metrics
Parameter | Metric | Description | Collection Method(s)
-f acpi-state | ACPI Suspend-To-RAM State – S3 | % of time spent in S3 state or timeline of S3 transitions | Trace
-f s0i-state | SoC Power States – S0ix States | % of time spent in each S0ix state | Snapshot
-f wakelock | | Traces both user and kernel wakelocks used during the collection | Trace
-f sys | cpu + device + temp + acpi-state + s0i-state
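The group parameters in the metric tables (cpu, device, temp, sys) are shorthand for lists of individual features. The sketch below simply spells out that expansion as the tables give it. It does not call SoCWatch, and the function name `expand_feature` is hypothetical.

```shell
# Expand a SoCWatch feature group into the individual features listed
# in the metric tables. Names that are not groups are returned unchanged.
expand_feature() {
  case $1 in
    cpu)    echo "cpu-cstate cpu-pstate" ;;
    device) echo "nc-dstate sc-dstate" ;;
    temp)   echo "core-temp soc-temp pmic-temp skin-temp" ;;
    sys)    echo "$(expand_feature cpu) $(expand_feature device) $(expand_feature temp) acpi-state s0i-state" ;;
    *)      echo "$1" ;;
  esac
}

expand_feature sys
# prints: cpu-cstate cpu-pstate nc-dstate sc-dstate core-temp soc-temp pmic-temp skin-temp acpi-state s0i-state
```

Read this way, the sys group as listed does not include the wakelock feature, which would explain why the example command line requests wakelock data with a separate -f option.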
Advanced reporting options
Configure polling intervals
Command line text report output format
VTune Amplifier GUI offers same Viewpoints/Grouping/Filtering and regression analysis capabilities as presented earlier.
More Options Available for SoCWatch
More details in SoCWatchForAndroid_v1_5_4.pdf
Installation Overview
Installing Intel® System Studio to the Host System
Linux*
1) ./online_install.sh
…or…
2) tar l_cembd_p_2016.0.xxx.tgz
l_cembd_p_2016.0.xxx.tg/install.sh
Default Install Location: /opt/intel/system_studio_2016.0.xxx/vtune_amplifier_for_systems
Location of Target Files: <path-to-SystemStudio>/target/system_studio_target.tgz
Windows*
1) w_cembd_2016.0.xxx_online.exe
…or…
2) w_cembd_2016.0.xxx.exe
Default Install Location: C:\Program Files (x86)\Intel\System Studio 2016.0.xxx\vtune_amplifier_for_systems
Location of Target Files: <path-to-SystemStudio>\target\system_studio_target.tgz
Performance Collection Requirements for VTune Amplifier 2016 for Systems:
Basic Hotspots including user mode stacks to C/C++ Source
− All available Android devices based on Intel® Architecture such as those at http://software.intel.com/en-us/android/get-device.
− Note: Installation on the target device is not needed
Java Functions with Java Source, JIT assembly or DEX Drill Down
− Rootable device
− Instrumented Java/Dalvik/ART JVM (Available in most Intel® Atom processor builds)
Hardware-Event Based Sampling (Advanced Hotspots, General Exploration, System [Wide] Profiling, Kernel Stacks, and Custom Analysis Types)
− Rootable device
− Version compatible signed performance analysis drivers
Requirements for Intel® Energy Profiler
− Rootable device
− Version compatible signed Intel® Energy Profiler drivers
Building Device Drivers
Best option is to get the drivers integrated into the build
Reference builds from Intel® Corporation already have them integrated, although sometimes the driver needs to be updated to the latest version…
Some OEM builds have the driver
Note: This is “much, much” easier….
Else you need to build the drivers
The install package contains the sources and scripts needed to build the drivers
If you need to build the drivers against the existing kernel, you need to have:
– The public/private key for that kernel
– The exact source used to build that kernel
If you do not have the above then you will need to build a new kernel on your host system, install it on the device, and use the newly generated key to build and sign the drivers.
Instructions in Help/User’s Guides
Install Intel® Energy Profiler to Remote Target
Linux*
$ tar -zxvf <path-to-SystemStudio>/target/system_studio_target.tgz
$ adb root
$ system_studio_target/socwatch_android_vXX.XX/socwatch_android_install.sh
Windows*
“unzip” <path-to-SystemStudio>\target\system_studio_target.tgz
$ adb root
$ system_studio_target/socwatch_android_vXX.XX/socwatch_android_install.bat
See SoCWatchForAndroid_vXX_XX_XX.pdf for full details
Steps to start SoCWatch
$ adb shell
# cd /data/socwatch
# . ./setup_socwatch_env.sh
# insmod /lib/modules/socperf1_2.ko
# insmod /lib/modules/SOCWATCH1_5.ko
# ./socwatch --help
Capabilities of VTune Amplifier 2016 for Systems on Android*
To Identify Performance Issues
Hotspot, Advanced-Hotspots, General Exploration
Other Advanced Options (Custom Collections, Regressions, Frames)
To Identify Power Issues
CPU Wake-ups, Frequency, Device States, Wakelocks
To Zoom In on your Issue via the GUI
Grouping, Filtering, Sorting, Comparing
Workflow for
Performance and Power Analysis Steps
Installation
Including System Requirements
Collecting
Viewing Data
Summary
Intel® System Studio 2016 provides deep system-level insights into power, reliability and performance to help accelerate time to market of Intel Architecture-based embedded and mobile systems
For other versions contact:
Your Intel representative or …
intelsystemstudio@intel.com
Note: Most features presented here require access to a rootable Android* device, and version compatible device drivers.
Call to Action
For more information, to evaluate, or purchase: http://intel.ly/system-studio
http://intel.ly/system-studio
http://software.intel.com/en-us/intel-vtune-amplifier-for-systems
http://software.intel.com/en-us/intel-energy-profiler
Premier Support: https://premier.intel.com
Forums: http://software.intel.com/en-us/forum/intel-system-studio/
Email: intelsystemstudio@intel.com
Release Notes:
http://software.intel.com/sites/default/files/release_notes_amplifier_for_android_linux.pdf
VTune Amplifier Help Documentation:
http://software.intel.com/en-us/vtuneampxe_2013_ug_lin
SubTopic -> Intel VTune Amplifier User’s Guide: Running Analysis Remotely
http://software.intel.com/sites/default/files/managed/c8/f9/SoCWatchForAndroid_v1_3_0.pdf
http://software.intel.com/sites/default/files/managed/9d/59/WakeUpWatch_v3_1_6.pdf
KB Articles: http://software.intel.com/en-us/articles/intel-system-studio-articles
http://software.intel.com/en-us/articles/android-features-in-intel-vtune-amplifier-2014-for-systems-requirements
http://software.intel.com/en-us/articles/using-intel-vtune-amplifier-on-non-rooted-android-devices
http://software.intel.com/en-us/articles/how-to-use-the-intel-energy-profiler-in-intel-system-studio-2014
Additional Resources
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright © 2014, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Legal Disclaimer & Optimization Notice
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Backup Slides
Using Intel® VTune™ Amplifier 2016 for Systems on Android Systems
User APIs
WuWatch
Intel® VTune™ Amplifier for Systems User APIs
Enable you to
• control collection
• set marks during the execution of the specific code
• specify custom synchronization primitives implemented without standard system APIs
To use the user APIs, do the following:
• Include ittnotify.h, located at <install_dir>/include
• Insert __itt_* notifications in your code
• Link to the libittnotify.lib file located at <install_dir>/lib
• A new feature allows creating a CSV file instead of using the user APIs; see Help: Creating a CSV File with External Data
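The include-and-insert steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the product's documented setup: the `USE_ITTNOTIFY` switch, the `PROFILE_RESUME`/`PROFILE_PAUSE` macros, and the `sum_squares` workload are hypothetical names introduced here so that the same file builds with or without the VTune Amplifier headers. Only `ittnotify.h`, `__itt_resume`, and `__itt_pause` come from the slides.

```c
#include <assert.h>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>            /* real header, from <install_dir>/include */
#define PROFILE_RESUME() __itt_resume()
#define PROFILE_PAUSE()  __itt_pause()
#else
/* Assumption: no SDK on this machine. Count the transitions instead of
 * calling the real API, so the instrumented code still builds and runs. */
static int profile_transitions = 0;
#define PROFILE_RESUME() ((void)profile_transitions++)
#define PROFILE_PAUSE()  ((void)profile_transitions++)
#endif

/* Hypothetical workload: sum of the squares 1..n. */
static long sum_squares(int n)
{
    long total = 0;
    assert(n >= 0);
    for (int i = 1; i <= n; ++i)
        total += (long)i * i;
    return total;
}

/* Resume collection just before the workload and pause right after it. */
long measured_sum_squares(int n)
{
    long result;
    PROFILE_RESUME();
    result = sum_squares(n);
    PROFILE_PAUSE();
    return result;
}
```

Building with `-DUSE_ITTNOTIFY` (a flag name assumed for this sketch) would route the macros to the real notifications, and the program would then also need to be linked against the ittnotify library mentioned above.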
Intel® VTune™ Amplifier User APIs
Collection Control APIs
void __itt_pause (void): Run the application without collecting data. VTune™ Amplifier reduces the overhead of collection by collecting only critical information, such as thread and process creation.
void __itt_resume (void): Resume data collection. VTune™ Amplifier resumes collecting all data.
Thread naming APIs
void __itt_thread_set_name (const __itt_char *name): Set the thread name using a char or Unicode string, where name is the thread name.
void __itt_thread_ignore (void): Indicate that this thread should be excluded from analysis. It will not affect the concurrency of the application. It will not be visible in the Timeline pane.
Intel® VTune™ Amplifier Collection Control APIs
int main(int argc, char* argv[]) {
    doSomeInitializationWork();
    __itt_resume();
    while (gRunning) {
        doSomeDataParallelWork();
    }
    __itt_pause();
    doSomeFinalizationWork();
    return 0;
}
• Useful to observe when certain events occur in your application or identify how long certain regions of code take to execute
• Event APIs enable you to annotate an application when certain events occur
__itt_event __itt_event_create(char *, int);
__itt_event_start(__itt_event);
__itt_event_end(__itt_event);
Intel® VTune™ Amplifier User Event APIs
Intel® VTune™ Amplifier User Event APIs
__itt_event __itt_event_create(const __itt_char *name, int namelen);
Create a user event type with the specified name. This API returns a handle to the user event type that should be passed into the following APIs as a parameter. The namelen parameter refers to the number of characters, not the number of bytes.
int __itt_event_start(__itt_event event);
Call this API with an already created user event handle to register an instance of that event. This event appears in the Timeline pane display as a tick mark.
int __itt_event_end(__itt_event event);
Call this API following a call to __itt_event_start() to show the user event as a tick mark with a duration line from start to end. If this API is not called, the user event appears in the Timeline pane as a single tick mark.
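Because `namelen` counts characters, one way to avoid hand-counting the name is to derive it from the string itself. The sketch below assumes a plain `char` name, where `strlen` returns the character count; a wide-character build would need a different length function. The `create_named_event` helper, the `USE_ITTNOTIFY` switch, and the stand-in `__itt_event_create` are hypothetical and exist only so the sketch compiles without the SDK; the stand-in merely records the length it was given.

```c
#include <assert.h>
#include <string.h>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#else
/* Stand-ins (NOT the real API): record the length that was passed in. */
typedef int __itt_event;
static int last_namelen = -1;
static __itt_event __itt_event_create(const char *name, int namelen)
{
    (void)name;
    last_namelen = namelen;
    return 1;                     /* dummy handle */
}
#endif

/* Hypothetical helper: create an event type without hand-counting the
 * characters in its name. */
__itt_event create_named_event(const char *name)
{
    assert(name != NULL);
    return __itt_event_create(name, (int)strlen(name));
}
```

Called with the name "AI Thread Work", the helper passes a length of 14 characters.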
Intel® VTune™ Amplifier User Events - using APIs
DWORD WINAPI aiWork(LPVOID lpArg)
{
    int tid = *((int*)lpArg);
    __itt_event aiEvent;
    aiEvent = __itt_event_create("AI Thread Work", 14);
    while (gRunning) {
        WaitForSingleObject(bSignal[tid], INFINITE);
        __itt_event_start(aiEvent);
        doSomeDataParallelWork();
        __itt_event_end(aiEvent);
        SetEvent(eSignal[tid]);
    }
    return 0;
}
Intel® VTune™ Amplifier Visualizing Events in the Timeline View
User defined task
Frame Analysis – Analyze Long Latency Activity
Frame: a region executed repeatedly (non-overlapping).
• API marks start and finish
Examples:
• Game – Compute next graphics frame
• Simulator – Time step loop
• Computation – Convergence loop
Intel® VTune™ Amplifier Frame Analysis Application
void algorithm_1();
void algorithm_2(int myid);
double GetSeconds();
DWORD WINAPI do_xform(void * lpmyid);
bool checkResults();
__itt_domain* pD = __itt_domain_create("myDomain");
while (gRunning) {
    __itt_frame_begin_v3(pD, NULL);
    . . .
    //Do Work
    . . .
    __itt_frame_end_v3(pD, NULL);
}
for (int k = 0; k < N; ++k) {
    int ik = i*N + k;
    int kj = k*N + j;
    c2[ij] += a[ik]*b[kj];
}
Region (Frame)
__itt_domain* __itt_domain_create(const __itt_char *name);
Create a domain with a domain name. Since the domain is expected to be static over the application's execution time, there is no mechanism to destroy a domain. Any domain can be accessed by any thread in the process, regardless of which thread created the domain. This call is thread-safe.
void __itt_frame_begin_v3(const __itt_domain *domain, __itt_id *id);
Define the beginning of the frame instance. A __itt_frame_begin_v3 call must be paired with a __itt_frame_end_v3 call. Successive calls to __itt_frame_begin_v3 with the same ID are ignored until a call to __itt_frame_end_v3 with the same ID.
• domain is the domain for this frame instance.
• id is the instance ID for this frame instance, or NULL.
void __itt_frame_end_v3(const __itt_domain *domain, __itt_id *id);
Define the end of the frame instance. A __itt_frame_end_v3 call must be paired with a __itt_frame_begin_v3 call. The first call to __itt_frame_end_v3 with a given ID ends the frame. Successive calls with the same ID are ignored, as are calls that do not have a matching __itt_frame_begin_v3 call.
• domain is the domain for this frame instance.
• id is the instance ID for this frame instance, or NULL for the current instance.
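The begin/end pairing described above might look like the following for a simulator-style time-step loop. This is a sketch under stated assumptions: the `USE_ITTNOTIFY` switch, the `run_time_steps` function, the "TimeStepDomain" name, and the stand-in types and functions in the `#else` branch are hypothetical and only let the sketch compile without the SDK. The stand-ins count calls; they do not model the real library's handling of repeated IDs.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#else
/* Stand-ins (NOT the real API): count frame begin/end calls only. */
typedef struct { const char *name; } __itt_domain;
typedef struct { int unused; } __itt_id;
static __itt_domain stub_domain;
static int frames_begun = 0;
static int frames_ended = 0;
static __itt_domain *__itt_domain_create(const char *name)
{
    stub_domain.name = name;
    return &stub_domain;
}
static void __itt_frame_begin_v3(const __itt_domain *d, __itt_id *id)
{
    (void)d; (void)id;
    frames_begun++;
}
static void __itt_frame_end_v3(const __itt_domain *d, __itt_id *id)
{
    (void)d; (void)id;
    frames_ended++;
}
#endif

/* Hypothetical time-step loop: each iteration is marked as one frame. */
double run_time_steps(int steps)
{
    __itt_domain *domain = __itt_domain_create("TimeStepDomain");
    double state = 0.0;
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        __itt_frame_begin_v3(domain, NULL);   /* frame starts */
        state += 0.5;                         /* placeholder for real work */
        __itt_frame_end_v3(domain, NULL);     /* frame ends, same thread */
    }
    return state;
}
```

Keeping the begin and end calls in the same loop body makes it hard to leave a frame unpaired when the loop exits early.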
Intel® VTune™ Amplifier Frame APIs
Frame Analysis Using APIs
__itt_domain* pD = __itt_domain_create("SimDomain");
while (gRunning) {
    __itt_frame_begin_v3(pD, NULL);
    start = clock();
    //Wait all threads before moving into the next frame
    WaitForMultipleObjects(FUNCTIONAL_DOMAINS, eSignal, TRUE, INFINITE);
    stop = clock();
    //Give all threads the "go" signal
    for (int i = 0; i < FUNCTIONAL_DOMAINS; i++)
        SetEvent(bSignal[i]);
    if (frame % NETWORKCONNETION_FREQ == 0) {
        //Start network thread
        SetEvent(bNetSignal);
    }
    __itt_frame_end_v3(pD, NULL);
}
Summary View / Frame Rate Chart
Adjust the frame rate then Apply changes
Frame Analysis Find Slow Frames With One Click
(1) Regroup Data
… (Partial list shown)
Before: List of Functions Taking Time
After: List of Slow Frames
Just 2 more clicks show where to focus tuning…
Slow functions in slow frames
Result: Functions taking a lot of time in slow frames
(1) Only show slow frames
(2) Regroup: Show functions
• A task is a logical unit of work performed by a particular thread
• Tasks can be nested
• You can use task APIs to assign tasks to threads
• One thread executes one task at a given time
• Tasks may correspond to functions, scopes, or a case block in a switch statement
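The nesting rule above can be illustrated on a single thread: an outer task wraps a batch, and each item inside it is a nested task that ends before the next one begins. This is a sketch under stated assumptions. The `USE_ITTNOTIFY` switch, `process_batch`, the domain and task names, and every stand-in in the `#else` branch are hypothetical; the stand-ins only track nesting depth so the sketch compiles and runs without the SDK.

```c
#include <assert.h>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#else
/* Stand-ins (NOT the real API): track how deeply tasks are nested. */
typedef struct { const char *name; } __itt_domain;
typedef struct { const char *str; } __itt_string_handle;
typedef struct { int unused; } __itt_id;
static const __itt_id __itt_null = { 0 };
static __itt_domain stub_domain;
static __itt_string_handle stub_handles[8];
static int stub_handle_count = 0;
static int task_depth = 0;
static int max_task_depth = 0;
static __itt_domain *__itt_domain_create(const char *name)
{
    stub_domain.name = name;
    return &stub_domain;
}
static __itt_string_handle *__itt_string_handle_create(const char *s)
{
    __itt_string_handle *h = &stub_handles[stub_handle_count++ % 8];
    h->str = s;
    return h;
}
static void __itt_task_begin(const __itt_domain *d, __itt_id id,
                             __itt_id parent, __itt_string_handle *name)
{
    (void)d; (void)id; (void)parent; (void)name;
    if (++task_depth > max_task_depth)
        max_task_depth = task_depth;
}
static void __itt_task_end(const __itt_domain *d)
{
    (void)d;
    task_depth--;
}
#endif

/* Hypothetical workload: one outer task with one nested task per item. */
int process_batch(int items)
{
    __itt_domain *domain = __itt_domain_create("BatchDomain");
    __itt_string_handle *batch = __itt_string_handle_create("Batch");
    __itt_string_handle *item = __itt_string_handle_create("Item");
    int processed = 0;
    assert(items >= 0);
    __itt_task_begin(domain, __itt_null, __itt_null, batch);     /* outer */
    for (int i = 0; i < items; ++i) {
        __itt_task_begin(domain, __itt_null, __itt_null, item);  /* nested */
        processed++;
        __itt_task_end(domain);       /* ends the innermost (item) task */
    }
    __itt_task_end(domain);           /* ends the outer (batch) task */
    return processed;
}
```

Each `__itt_task_end` closes the task most recently begun on the thread, so the item task must end before the batch task does.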
Intel® VTune™ Amplifier Task APIs
Task API primitives
Use This Primitive / To Do This
void ITTAPI __itt_task_begin(const __itt_domain *domain, __itt_id taskid, __itt_id parentid, __itt_string_handle *name)
Create a task instance on a thread. This becomes the current task instance for that thread. A call to __itt_task_end() on the same thread ends the current task instance.
void ITTAPI __itt_task_begin_fn(const __itt_domain *domain, __itt_id taskid, __itt_id parentid, void *fn)
Begin a task instance on a thread.
void ITTAPI __itt_task_end(const __itt_domain *domain)
End a task instance on a thread.
Parameter / Description
__itt_domain: The domain of the task.
__itt_id taskid: This is a reserved parameter.
__itt_id parentid: This is a reserved parameter.
__itt_string_handle: The task string handle.
*fn: This is a reserved parameter.
Task API usage
__itt_domain* domain = __itt_domain_create(L"Task Domain");
__itt_string_handle* UserTask = __itt_string_handle_create(L"UserTask");
__itt_string_handle* UserSubTask = __itt_string_handle_create(L"UserSubTask");
int main(int argc, char* argv[])
{
    ...
    __itt_task_begin(domain, __itt_null, __itt_null, UserTask);
    //create many threads to call work()
    __itt_task_end(domain);
    ...
}
int work()
{
    __itt_task_begin(domain, __itt_null, __itt_null, UserSubTask);
    do_foo();
    __itt_task_end(domain);
    return 0;
}
Using Task API
Hotspots analysis – Bottom-up pane
Using Task API
Hotspots analysis – Tasks pane
Collection and reporting
WuWatch
– High-K next generation Intel® Atom™ processor (codenamed Medfield, Lexington, or Clovertrail Plus)
SoCWatch
– Intel 22 nm ultra-mobile processor (code name: Silvermont)
– 4th Generation Intel Core Processors (code name: Haswell)
Collection runs on target system (through adb)
Reporting options
Text/CSV style reports from command line
– Some advanced reports only available from command line
Python scripts will generate summary reports
Intel VTune Amplifier GUI (Linux or Windows)
Collection Tools for Power State Analysis
WuWatch no longer used
Wakeup Watch Process
wuwatch (collect)
Raw data (WW1 file)
wuwatch (process)
Raw text trace (TXT file)
Summary_data_vX.py
Summary results (CSV or TXT file)
VTune Amplifier (correlate / visualize)
WuWatch no longer used
In adb root shell on target device:
Get Help
>./wuwatch --help
Execute wuwatch command on Android target
>./wuwatch -cs -ps -kb -ss -ds -dn -wl -t 60 -o /data/results/test
Execute Benchmark (usually via Android UI)
Import into VTune Amplifier GUI
adb pull /data/wuwatch/results/test.ww1
amplxe-runss --import-socwatch-data ./test.ww1 [-r testresults]
WuWatch 3.1 Collection Usage
More details in WakeUpWatchForAndroid.pdf
WuWatch no longer used
Potential Collections:
Monitors sleep state transitions (C-states) and traces the wakeups that cause them back to the source code that scheduled the timer. switch = -cs
Monitors processor frequency changes (p-states). switch = -ps
Platform sleep states (S–states). switch = -ss
Device sleep states (D-states). switch = -ds
Traces Android* wakelocks to the process. switch = -wl
WakeUp Watch (wuwatch)
WuWatch no longer used
Drivers needed:
SOCWATCH1_3.ko or apwr3_1.ko
vtsspp.ko (only needed for call stack info)
pax.ko
sep3_10.ko*
For Intel® reference builds, drivers are located in either:
/lib/modules or /system/lib/modules
Verify if the drivers are on your system via:
adb shell ls /lib/modules /system/lib/modules
Drivers Needed for VTune Amplifier 2016 for Systems
(Re)Installing VTune™ Amplifier Remote Collector to Target
Linux*:
$ adb root
$ vtune_amplifier_2016_for_systems/vtune_amplifier_2016_for_systems/bin[64|32]/amplxe-androidreg.sh --package-command=install --jitvtuneinfo=[src|none|jit|dex]
Windows*:
"unzip" <path-to-SystemStudio>\Target\system_studio_target.tgz
$ adb root
$ system_studio_target\vtune_amplifier_2016_for_systems/bin32/amplxe-androidreg.bat --package-command=install --jitvtuneinfo=[src|none|jit|dex]
See Help -> Intel® VTune™ Amplifier User's Guide -> Running Analysis Remotely -> Preparing a Target Android* System for Remote Analysis
This step is only needed to support Java.
Recommended