Application/OS performance: What does it depend on? Hands-on Lab

DESCRIPTION

HP Technology Services Master Technologists Chris and Greg Tinker will demonstrate the advanced debugging and technical tactics HP Enterprise Technical Services engineers use to triage back-office IT events that could critically impact the business. This is a deep technical session with engineers demonstrating the methodologies they employ to address enterprise application and operating system issues.

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Application/OS performance: What does it depend on?

Greg Tinker – HP Master Technologist

Chris Tinker – HP Master Technologist

Month day, 2013

My background

Title: HP Master Technologist

IT industry experience
• Published author
• Patents pending
• Social media/white papers

Professional information
• HP MVP
• Social media ambassador

Years at HP: 14

Current responsibilities
• Lead technologist for HP’s Global Solution Support Engineering (GSSE) team

Name: Chris Tinker

E-mail: [email protected]

My background

Title: HP Master Technologist

IT industry experience
• Published author
• Patents pending
• Social media/white papers

Professional information
• HP MVP
• Social media ambassador

Years at HP: 14

Current responsibilities
• Lead technologist for HP’s Global Solution Support Engineering (GSSE) team

Name: Greg Tinker

E-mail: [email protected]

Application performance

The stack

Layer overview

User space
• Applications ~ user code
• GNU C lib

Kernel space
• System call interface
• VFS (ext3, NTFS, VxFS, etc.)
• Page alloc
• MPIO – device mapper
• Char devices
• LVM, VxVM, sd<alpha>
• Block device drivers: SCSI, IDE, etc.
• Sockets, memory, process
• Tasks
• Scheduler
• Interrupts
• CPU
• VM
• Logical
• Protocols
• Net drivers, bus drivers

Overview

Application Performance

Application

Execution

Data Access

Managing resources

Platform Architecture

Architecture CPU

IA32 program on an X86_64 machine – can it run on a PA_RISC?

Can an executable run on a machine for which it was not compiled?

Performance trade-offs

MAGIC

Originally used to determine the binary object type (exec_magic, demand_magic, shared_magic, shmem_magic); however, around 1999/2000 ELF was adopted as the new file format, replacing these magic types. An ELF file still begins with its own magic bytes (7f 45 4c 46).

Architecture CPU

• Instruction set – leverage branch prediction

• Frequency

• BUS

• Cache: L3, L2, and L1 (location relative to the cores: registers, AL units, branch units, LS units, FP units, etc.)

• CPU bus:

– QPI – Intel QuickPath Interconnect

– HTB – AMD HyperTransport bus

– Front-side bus – older Intel/AMD

– Runway bus – PA-RISC

• NUMA

Architecture Execution – access to address space

• Locality domains

• Memory interleaving: node, channel, bank, cell (depends on hardware)

• The OS’s ability to discover locality domains and to differentiate the access cost from each domain to every other domain

• SLIT – advanced performance tuning option in the BIOS of HP ProLiant systems

• Integrity supports LDOMs – locality domains

Architecture Execution – access to address space: interleaving

Memory bank interleaving

When you use memory bank interleaving, data goes alternately to memory banks through the common memory channel connecting the DIMM banks and the integrated memory controller. Memory bank interleaving increases the probability that more DIMMs will remain in an active state (requiring more power) because the memory controller alternates between memory banks and between DIMMs.

Memory bank interleaving is automatically enabled on a processor node under the following conditions:

• Two single-rank DIMMs per channel result in two-way bank interleaving.

• Two dual-rank DIMMs per channel result in four-way bank interleaving.

• Two quad-rank DIMMs per channel result in eight-way bank interleaving.

• Two dual-rank DIMMs and one quad-rank DIMM result in eight-way bank interleaving, in servers using three DIMMs per channel.

Architecture Execution – access to address space: interleaving

Memory channel interleaving

Memory channel interleaving transfers data by alternate routing through the two available memory channels. As a result, when the memory controller must access a block of logically contiguous memory, the requests don’t stack up in the queue of a single channel. Alternate routing decreases memory access latency and increases performance. However, memory channel interleaving increases the probability that more DIMMs must remain in an active state.

Memory channel interleaving is always active on AMD Opteron 6200 Series processors.
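The queueing effect described above can be illustrated with a toy model. This is a minimal sketch, not the address mapping of any real memory controller: the 64-byte granule, the 1 GiB per-channel region, and the function names are assumptions made for illustration only.

```python
from collections import Counter

GRANULE_BYTES = 64      # assumed interleave granule (hypothetical)
CHANNELS = 2            # the two memory channels from the slide
REGION_BYTES = 1 << 30  # assumed size of a non-interleaved region (hypothetical)


def channel_for(address: int, interleaved: bool) -> int:
    """Return the channel that serves an address in this toy model."""
    if interleaved:
        # Consecutive granules alternate between the channels.
        return (address // GRANULE_BYTES) % CHANNELS
    # Without interleaving, each channel owns one large contiguous region.
    return (address // REGION_BYTES) % CHANNELS


def queue_depths(start: int, length: int, interleaved: bool) -> Counter:
    """Count the granule requests that land on each channel for one block."""
    depths = Counter()
    for address in range(start, start + length, GRANULE_BYTES):
        depths[channel_for(address, interleaved)] += 1
    return depths
```

With interleaving, a contiguous 4 KiB block is split evenly across both channels in this model; without it, every request queues on a single channel.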

Architecture Execution – access to address space: interleaving

Memory node interleaving

Node interleaving can interleave memory across any subset of nodes in the multi-processor system.

Memory cell interleaving

The way a multi-cell machine interleaves memory (cell-local vs. global; see Superdome partitioning).

Architecture PA - Runway

[Figure: block diagram of a cell. Labels: CC; four CPUs (P0, P1, P2, P3); four Runway links; memory Quad 0 to Quad 3 built from M2 blocks; MID0 and MID1 data paths; MID0 and MID1 address + control paths.]

Legacy Superdome cell

Architecture INTEL - FSB

Legacy FSB

Architecture AMD HTB

DL685 Hyper Transport BUS

Architecture Intel QPI

*http://www.intel.com/content/dam/staging/image/Kim/quickpath-technology.png

Architecture BUS limits

Bandwidth is limited by the lanes and the protocols

Manufacturers standardized on a PCI bus for the cards and slots

• 2X 32-bit PCI @ 33 MHz ~ 125 MB/s

• 4X 64-bit PCI @ 33/66 MHz

• 4X 64-bit PCI-X @ 66 MHz

• 4X 32-bit PCI-X @ 133 MHz

• 8X 64-bit PCI-X @ 133 MHz ~ 1024 MB/s

PCI-e replaces the older PCI architecture above and is capable of significantly higher signaling rates: 8 Gbit/s per lane.

Expect this to increase as protocols become more efficient
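The arithmetic behind these limits can be sketched in a few lines of Python. This is a rough estimate of theoretical peak rates only; the function names are hypothetical, and the 128b/130b line encoding used for the PCI-e lane is an assumption (it is the PCIe 3.0 encoding and is not stated on the slide).

```python
def parallel_bus_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a parallel bus: bytes per transfer times transfers per second."""
    return width_bits / 8 * clock_mhz


def pcie_lane_mb_per_s(gigatransfers_per_s: float,
                       payload_bits: int = 128,
                       line_bits: int = 130) -> float:
    """Peak payload rate of one serial lane after line-encoding overhead."""
    raw_mbit_per_s = gigatransfers_per_s * 1000
    return raw_mbit_per_s * payload_bits / line_bits / 8


# 32-bit PCI at 33 MHz and 64-bit PCI-X at 133 MHz, in decimal MB/s.
pci_32_33 = parallel_bus_mb_per_s(32, 33)
pcix_64_133 = parallel_bus_mb_per_s(64, 133)
# One lane at 8 GT/s with the assumed 128b/130b encoding.
pcie_lane = pcie_lane_mb_per_s(8)
```

The results land in the same range as the rounded figures on the slide; real buses deliver less than these peaks because of protocol and arbitration overhead.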

Architecture BUS limits

Different types of memory have very different performance profiles!

• Anywhere from 800 MHz to 1333 MHz

• http://h18004.www1.hp.com/products/servers/options/tool/hp_memtool.html

Architecture BUS limits

SLIT

• Allows the BIOS to send the hardware layout to the OS

• System Locality Information Table

• The OS must support SLIT in order to leverage these latency factors
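On Linux, the relative distances the kernel has recorded for each NUMA node are exposed under /sys/devices/system/node/nodeN/distance. The sketch below reads them; it assumes a Linux sysfs layout, returns an empty result where that directory does not exist, and uses hypothetical function names.

```python
from pathlib import Path


def parse_distance_row(text: str) -> list[int]:
    """Parse one 'distance' file: space-separated relative costs, one per node."""
    return [int(field) for field in text.split()]


def read_node_distances(sysfs_root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Return {node_id: [distance to node 0, distance to node 1, ...]}."""
    root = Path(sysfs_root)
    distances: dict[int, list[int]] = {}
    if not root.is_dir():
        return distances  # not Linux, or NUMA information is not exposed
    for node_dir in root.glob("node[0-9]*"):
        distance_file = node_dir / "distance"
        if distance_file.is_file():
            node_id = int(node_dir.name[len("node"):])
            distances[node_id] = parse_distance_row(distance_file.read_text())
    return distances
```

A value of 10 conventionally marks the local node; larger values indicate relatively more expensive remote access.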

Execution

Execution Objects

Compiled or interpreted

• Speed vs. agility

– Interpreted code can change at runtime.

Interpreted is indirectly executed

Compiled is directly executed

Many languages today implement just-in-time compilers

• Perl is compiled by the Perl engine into an internal form before it is executed (so the source is first parsed, then compiled, then executed). Of course, you can also build Perl code into an executable object.

CPU Executable types Cross platform

IA-64 ~ RX8600

32bit ELF

X86_64 ~ DL980

PA RISC

MIPS

IA64 ELF

IA32

ELF-64 / X86_64

PARISC

MIPS

Use of emulation engines

ARIES

− HP-UX engine that allows PA-RISC binaries to execute on the IA-64 OS kernel and platform

Binfmt

− Linux kernel facility (binfmt_misc) that passes registered binary formats to an interpreter or emulator, allowing many architecture types to run

Objects

Execution

Execution Language examples

Compiled            Interpreted
C, C++, C#          BASIC
Visual Basic .NET   PostScript
Python              Python
Lisp                Scripting languages
Java
PERL                PERL*
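Python appears in both columns because its reference implementation, CPython, first compiles source to bytecode and then interprets that bytecode. A minimal sketch of the two steps follows; the source string, the file label, and the function names are made up for illustration.

```python
import dis

SOURCE = "result = sum(n * n for n in range(4))"  # hypothetical example source


def compile_then_run(source: str) -> dict:
    """Compile source text to a code object, then execute that object."""
    code = compile(source, "<slide-example>", "exec")  # compile step
    namespace: dict = {}
    exec(code, namespace)                              # execution step
    return {"code": code, "result": namespace["result"]}


def bytecode_listing(code) -> str:
    """Return a human-readable listing of the compiled bytecode."""
    return dis.Bytecode(code).dis()
```

Calling bytecode_listing(compile_then_run(SOURCE)["code"]) shows the instructions that the interpreter loop executes, which is the "indirect" execution the earlier slide refers to.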

Execution Determine object type

# file <string>

• Uses the magic to determine file type!

# file /boot/vmlinuz-3.0.0-26-generic-pae
/boot/vmlinuz-3.0.0-26-generic-pae: Linux kernel x86 boot executable bzImage, version 3.0.0-26-generic-pae (buildd@roseapple) #42-Ubuntu SMP Wed Sep , RO-rootFS, root_dev 0x801, swap_dev 0x4, Normal VGA

# file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)

# readelf -a /bin/ls | head -50
ELF Header:
  Magic:               7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:               ELF32
  Data:                2's complement, little endian
  Version:             1 (current)
  OS/ABI:              UNIX - System V
  ABI Version:         0
  Type:                EXEC (Executable file)
  Machine:             Intel 80386
  Version:             0x1
  Entry point address: 0x804be34
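The same fields can be read directly from the first bytes of an object file. The sketch below is a simplified reader, not a replacement for readelf: it decodes only the magic, class, byte order, type, machine, and entry point, and its lookup tables cover just the architectures named in this deck. The function names are hypothetical.

```python
import struct

ELF_MAGIC = b"\x7fELF"  # the bytes 7f 45 4c 46 shown by readelf
CLASSES = {1: "ELF32", 2: "ELF64"}
TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
MACHINES = {3: "Intel 80386", 8: "MIPS", 15: "PA-RISC", 50: "IA-64", 62: "x86-64"}


def parse_elf_header(header: bytes) -> dict:
    """Decode the leading fields of an ELF header from raw bytes."""
    if len(header) < 32 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF object")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"  # 1 = little endian, 2 = big endian
    e_type, e_machine, _version = struct.unpack_from(endian + "HHI", header, 16)
    entry_format = "I" if ei_class == 1 else "Q"  # 32-bit or 64-bit address
    (e_entry,) = struct.unpack_from(endian + entry_format, header, 24)
    return {
        "class": CLASSES.get(ei_class, "unknown"),
        "type": TYPES.get(e_type, "unknown"),
        "machine": MACHINES.get(e_machine, "unknown"),
        "entry": e_entry,
    }


def read_elf_header(path: str) -> dict:
    """Read and decode the header of the file at path."""
    with open(path, "rb") as handle:
        return parse_elf_header(handle.read(64))
```

Checking the machine field this way answers the earlier question of whether an executable was built for the platform it is about to run on.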

Sharing resources

System V message queues

Mutex locks

Data sharing

Context switching

Data access

The never-forgiving sleep(): waiting on an interrupt is a better way to go than polling with sleep()
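The contrast between polling with sleep() and blocking until notified can be sketched with Python threads. This is an analogy at user level, assuming a threading.Event stands in for the interrupt; the function names are hypothetical.

```python
import threading
import time


def wait_with_polling(flag: dict, interval_s: float) -> int:
    """Poll a shared flag with sleep(); return how many wakeups were spent checking."""
    wakeups = 0
    while not flag["ready"]:
        time.sleep(interval_s)  # wakes up whether or not anything changed
        wakeups += 1
    return wakeups


def wait_with_event(event: threading.Event, timeout_s: float) -> bool:
    """Block until another thread signals the event, or until the timeout expires."""
    return event.wait(timeout_s)  # a single wakeup, triggered by the signal
```

The polling version burns a context switch per interval and reacts up to one interval late; the event version sleeps until it is signaled.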

Execution

Execution Processes and Threads

execve()

#include <unistd.h>

int execve(const char *filename, char *const argv[], char *const envp[]);

filename must be a binary executable or a script whose first line names its interpreter with “#!”

Execution Processes and threads

exec(), fork(), clone(), vfork(), clone2(), etc.

Examples:

16935 fork() = 17424 <-- new task (HWP)

17424 execve("/bin/ls", ["ls", "-F", "--color=auto", "-l", "test"], [/* 56 vars */]) = 0
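The fork-then-execve sequence in the trace can be reproduced from Python on a POSIX system, since os.fork() and os.execve() are thin wrappers over the same calls. This is a minimal sketch; spawn_and_wait is a hypothetical helper and the exit code 127 for a failed exec is only a shell convention borrowed here.

```python
import os


def spawn_and_wait(path: str, argv: list[str], env: dict[str, str]) -> int:
    """Fork a child, replace its image with execve(), and return its exit code."""
    pid = os.fork()                      # parent gets the child's PID, child gets 0
    if pid == 0:
        try:
            os.execve(path, argv, env)   # on success this call never returns
        except OSError:
            os._exit(127)                # exec failed: leave the child immediately
    _, status = os.waitpid(pid, 0)       # parent blocks until the child ends
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, spawn_and_wait("/bin/ls", ["ls", "-l"], dict(os.environ)) would produce a fork() and execve() pair like the one traced above when run under strace -f.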

Processes and threads

HWP – heavyweight process – fork() creates a new process

LWP – lightweight process – thread ~ clone()

The major difference is in the sharing of resources

An HWP shares only the parent's text, whereas an LWP can share everything but its private stack.

HWPs use pipes, PF_UNIX (Unix domain) sockets, signals, or IPC facilities (shared memory, message queues, and semaphores) to share data.
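The sharing difference can be observed directly. In the sketch below (POSIX only, hypothetical function names), a thread's write to a shared object is visible to its creator, while a forked child's write stays in the child's private copy and has to be sent back through a pipe.

```python
import os
import threading


def thread_shares_data() -> int:
    """A thread (LWP) updates the same object its creator sees."""
    data = {"value": 0}
    worker = threading.Thread(target=lambda: data.update(value=1))
    worker.start()
    worker.join()
    return data["value"]


def fork_has_private_copy() -> tuple[int, int]:
    """A forked child (HWP) updates its own copy and reports it over a pipe."""
    data = {"value": 0}
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        data["value"] = 1                          # changes the child's copy only
        os.write(write_fd, str(data["value"]).encode())
        os._exit(0)
    os.close(write_fd)
    child_view = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data["value"], child_view               # (parent's view, child's view)
```

The pipe here plays the role of the IPC mechanisms listed above: without it the parent would never learn what the child computed.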

Execution

Processes and threads

UNIX Processes

Single threaded process

Multithreaded process

Linux Processes

Single threaded process

Multithreaded process

Task group

Process/Task --

Thread(s) -- Execution

Execution Basic portions of address space

Text ~ machine code instructions.

• The OS usually maps this read-only, which allows many instances of the same executable to reference a single copy; the application code normally does not change.

Data

• Initialized Read only

• Initialized read/write

• Uninitialized Data

• Heap – dynamically allocated memory

Stack – local variables, stack frames

Shared memory
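On Linux these portions can be seen in /proc/<pid>/maps, where each line gives an address range, permissions, and backing path. The sketch below parses such lines and applies a rough classification; the heuristic (executable means text, writable means read/write data) and the function names are simplifying assumptions, and real maps contain many more kinds of mappings.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into address range, permissions, and path."""
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return {"start": start, "end": end, "perms": fields[1], "path": path}


def classify(mapping: dict) -> str:
    """Rough label for a mapping, following the slide's vocabulary."""
    path, perms = mapping["path"], mapping["perms"]
    if path == "[heap]":
        return "heap"
    if path.startswith("[stack"):
        return "stack"
    if "x" in perms:
        return "text"
    if "w" in perms:
        return "data (read/write)"
    return "data (read-only)"


def summarize(maps_text: str) -> list[str]:
    """Classify every non-empty line of a maps listing."""
    return [classify(parse_maps_line(line))
            for line in maps_text.splitlines() if line.strip()]
```

Running summarize(open("/proc/self/maps").read()) on a Linux host lists the regions of the current process in address order.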

Memory – user address space

[Figure: user address space. Single-threaded process: stack (routine: var1(), var2(), …), text (Main(), routine1(), routine2(), …), data (Array1, Array2, …), and heap. Multithreaded process: text, data, and heap, plus a separate thread stack (routine1: var1(), var2()) for each thread.]

Execution

Execution Tempered by

logic

• Compiler optimization

• Execution flow

CPU

• Hardware

• Scheduler – task switching

Data fetch

• Memory

• IO

Locks and/or IPC

Profiling

Profiling Toolbag

Application instrumentation

• gprof, Valgrind, Visual Studio, Komodo, Xcode, and many others

Compiler instrumentation

• At time of compile – use flags to leverage trace pointers

Kernel tracing

• Great for understanding what the application is doing when it enters kernel space

System profiling

Environment profiling

Profiling: the layer involved and the precision required determine the toolbox

What is the application waiting on?

• CPU

• Networking

• Disk

• Filesystem

• locks?
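A first, coarse answer to "what is it waiting on?" comes from comparing wall-clock time with CPU time: if a step takes much longer on the clock than it spends on the CPU, it is waiting on something else (network, disk, filesystem, or a lock). The sketch below shows that comparison for a single Python callable; the measure helper is hypothetical, and finding which resource is responsible still needs the tools discussed in this section.

```python
import time


def measure(work) -> dict:
    """Run work() and split its elapsed time into CPU time and waiting time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    # With several threads, CPU time can exceed wall time, so clamp at zero.
    return {"wall_s": wall_s, "cpu_s": cpu_s, "waiting_s": max(wall_s - cpu_s, 0.0)}
```

A CPU-bound loop shows waiting_s close to zero, while a call that sleeps or blocks on I/O shows waiting_s close to wall_s.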

IPC

Semaphores
– semop(), semctl()
– Locking of resources

Message queues
– msgsnd() / msgrcv()

Shared memory
– shmget(), shmat()

RPC (request/response framework)
– Normally leverages sockets but can leverage pipes (no network)

Network Access

Socket (layer 5)

TCP/IP (transport)
– Segments, frames
– RTT
– Sliding windows
– BDP (bandwidth-delay product)
– Latency
– Throughput/bandwidth
– Serialization/parallelization
– Flow control
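RTT, the sliding window, and the bandwidth-delay product are tied together by simple arithmetic, sketched below. The link speed, RTT, and window size in the usage note are illustrative values, not measurements, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bit_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bit_per_s * rtt_ms // 8000


def window_limited_bit_per_s(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when only one window can be sent per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms
```

For a 1 Gbit/s path with a 10 ms RTT, bdp_bytes(1_000_000_000, 10) gives 1,250,000 bytes, while a 65,535-byte window over the same path caps throughput at window_limited_bit_per_s(65_535, 10), about 52 Mbit/s. A window smaller than the BDP leaves the link idle for part of every round trip.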

PROFILING

Profiling The toolbox: Example

Linux               Windows                  HP-UX              Solaris          AIX     ESX
Collectl / Glance   Perfmon / Sysinternals   Glance             Glance           topas   esxtop
strace              Sysinternals, Xperf      tusc               truss / strace   truss
Kitrace / OProfile  Logman/perfmon/PAL       Kitrace, Caliper                    trace

Profiling Glance

Application

Object

Execution

Profiling

Labs

Platform Architecture

Labs

Labs Scenario 1

1. Where do you start?

2. What data would you collect?

3. How would you analyze it?

Thank you