
Managed Faster


Page 1: Managed Faster

8/6/2019 Managed Faster

http://slidepdf.com/reader/full/managed-faster 1/52

Managed Or Unmanaged?


Is Managed Code Slower Than Unmanaged Code?

Ask anyone the question above and they will say that managed code is slower than unmanaged code. Are they right? No, they are not. The problem is that when most people think of .NET they think of other frameworks with a runtime, like Java or Visual Basic; or they may even think about interpreters. They do not think about applications, or what they do; they do not think about limiting factors like network or disk access; in short, they do not think.

.NET is not like those frameworks. It has been well thought out and Microsoft has put a lot of effort into making it work well. In this article I will present some code that performs a computationally intensive operation and I will compile it as both managed C++ and unmanaged C++. Then I will measure how each of these libraries performs. As you will see, .NET is not automatically much slower than unmanaged code; indeed, in some cases it is much faster.

Fast Fourier Transform

Data that varies over time (for example, music) will be the combination of various frequency components. A Fourier Transform will convert the time-varying data into its frequency components. I came across Fourier Transforms because I spent six years as a research scientist performing spectroscopy experiments. One experiment I performed produced an interferogram, that is, the sample under investigation produces a response over time when an interference pattern generated from white light is shone on it. The interferogram is the response over time, but the information required was the response of the sample to the frequency of the radiation. So, a Fourier Transform was taken of the time-varying data to yield the frequency-varying response. I was performing these measurements in 1993, and at that time a PC running DOS was just about fast enough to allow me to take the time-varying data, perform the Fourier Transform and display the frequency-based results, all in real time, that is, as the measurements were taken.

The limiting factor in this program was the Fourier Transform routine, because it involved so many calculations. A Fourier Transform on N points will involve 2N² computations, so if you have a thousand data points you will perform two million computations. There is a lot of theory about Fourier Transforms, and I will not go into details here, but the theory has led to a routine called the Fast Fourier Transform (FFT) that, through careful data manipulation, generates a Fourier Transform by performing just 2N log₂ N operations. For example, if you have a thousand data points then using the FFT you will perform about 20,000 computations. The FFT routine still involves performing some trigonometric calculations, and it involves many numeric and array operations. Although the FFT is optimized compared to the Fourier Transform, it is still a computationally intensive calculation and is a good routine to exercise the performance of managed and unmanaged code.
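To make the two operation counts concrete, here is a minimal C++ sketch that evaluates the 2N² and 2N log₂ N estimates quoted above. The function names are my own, chosen for illustration, and the figures are rough operation estimates rather than measured instruction counts.

```cpp
#include <cmath>

// Rough operation count for a direct Fourier Transform on n points: 2N^2.
double directTransformOps(double n)
{
    return 2.0 * n * n;
}

// Rough operation count for an FFT on n points: 2N log2(N).
double fftOps(double n)
{
    return 2.0 * n * std::log2(n);
}
```

For a thousand points the first estimate gives two million operations and the second gives just under 20,000, which matches the figures in the text.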

The Code

There are many algorithms available, and if you perform a Google search for FFT you will get many thousands of hits. I chose to use the Real Discrete Fourier Transform by Takuya Ooura mainly because the code was clear and easy to change. I made four copies of Takuya's code: one for unmanaged calculations, and one each for managed calculations using Managed C++, C++/CLI and C#. The only change I made to the unmanaged library was to change the name of the function to fourier and add __declspec(dllexport) so that this method could be exported from the library. The Managed C++ code involved a few more changes, but these were relatively minor:


• method parameters were changed to take managed arrays, and the routines used the Array::Length property rather than requiring a separate length parameter
• trigonometric routines were used from the Math class
• the routine was exported as a static member of a public class
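The export change made to the unmanaged library, described before the list above, might look something like the following sketch. The FFT_EXPORT macro, the extern "C" linkage and the parameter list are assumptions made for illustration and are not taken from the article's download. The body is a deliberately naive direct transform used as a stand-in: it is not Takuya Ooura's routine, which the article uses unchanged apart from the rename and the export.

```cpp
#include <cmath>

// Hypothetical export macro: __declspec(dllexport) on Windows builds,
// nothing elsewhere so the sketch also compiles with other toolchains.
#if defined(_WIN32)
#define FFT_EXPORT __declspec(dllexport)
#else
#define FFT_EXPORT
#endif

// Hypothetical signature. Stand-in body: a direct (slow) transform that
// writes the magnitude of each frequency bin, NOT the Ooura FFT code.
extern "C" FFT_EXPORT void fourier(int n, const double* input, double* magnitude)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    for (int k = 0; k < n; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (int j = 0; j < n; ++j) {
            const double angle = twoPi * j * k / n;
            re += input[j] * std::cos(angle);
            im -= input[j] * std::sin(angle);
        }
        magnitude[k] = std::sqrt(re * re + im * im);
    }
}
```

Feeding this stand-in a pure cosine that completes k cycles over n samples yields peaks at bins k and n - k and values close to zero elsewhere, which is a convenient sanity check for any transform routine.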

The managed C++ code was then converted to C#, which involved fairly minor changes (mainly in the syntax to declare arrays). Finally, the managed C++ code was converted to C++/CLI; this involved a little bit more work, again mainly because of the way that arrays are declared. The C++/CLI code is compiled as /clr:safe because it does not use any unverifiable constru


will be a single spike. The test harness process takes two parameters: the first determines the number of data points that will be tested, and the second is the number of repeats that will be performed. FFT routines work better if the number of data points is a power of two, so the number you give for the first parameter is used as the power (and must be less than 28). Each routine is performed a single time without timing so that initialization can be performed: the library is loaded and any JIT compilation is performed. This is important because I am interested in the time taken to perform the calculation. After initialization the calculation is performed within a loop and each timing is stored for later analysis. From these timings the average time is calculated and the standard error is calculated from the standard deviation. The standard error gives a measure of the spread of values that were taken.

Occasionally a rogue time will occur (perhaps due to the scheduling of a higher priority thread in another process) and these will have an effect on the mean. For a Normal Distribution the majority of values should be within one standard deviation of the mean. So I treat any value outside of this range as a rogue value. Of course, the mean and standard deviation are calculated using the rogue value(s), but their effect will be minimized if large datasets are used. So once the code has calculated the mean and standard deviation it goes through the dataset and removes values that are outside of the acceptable range, and then the mean and standard error are calculated on this new dataset.
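The harness logic described above can be sketched roughly as follows in standard C++. This is not the article's harness: the names are hypothetical, std::chrono stands in for whatever timer the original used, and I have assumed the sample (n - 1) form of the standard deviation, with the standard error taken as the standard deviation divided by the square root of the number of values.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Stats {
    double mean;
    double stdDev;    // sample standard deviation (assumed n - 1 form)
    double stdError;  // stdDev / sqrt(count)
    std::size_t count;
};

// The first harness parameter is used as a power of two and must be below 28.
std::size_t pointsFromPower(int power)
{
    if (power < 0 || power >= 28) throw std::invalid_argument("power must be 0..27");
    return static_cast<std::size_t>(1) << power;
}

// One untimed warm-up call (library load, JIT), then 'repeats' timed calls in ms.
template <typename F>
std::vector<double> timeRepeats(F&& work, int repeats)
{
    work();
    std::vector<double> ms;
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

Stats summarize(const std::vector<double>& values)
{
    Stats s{0.0, 0.0, 0.0, values.size()};
    if (values.empty()) return s;
    double sum = 0.0;
    for (double v : values) sum += v;
    s.mean = sum / values.size();
    if (values.size() > 1) {
        double squares = 0.0;
        for (double v : values) squares += (v - s.mean) * (v - s.mean);
        s.stdDev = std::sqrt(squares / (values.size() - 1));
    }
    s.stdError = s.stdDev / std::sqrt(static_cast<double>(values.size()));
    return s;
}

// Keep only values within one standard deviation of the mean; both statistics
// are computed from the full dataset, rogue values included.
std::vector<double> dropRogues(const std::vector<double>& values)
{
    const Stats s = summarize(values);
    std::vector<double> kept;
    for (double v : values)
        if (std::fabs(v - s.mean) <= s.stdDev) kept.push_back(v);
    return kept;
}
```

A run would then call timeRepeats on the routine under test, pass the timings through dropRogues, and report summarize of what remains. Filtering at one standard deviation is aggressive, so the reported spread should be read as describing the trimmed dataset rather than the raw timings.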

The C++ compiler will allow you to optimize code for space and for speed, so I have written a makefile that allows you to compile the libraries for both optimizations and for no optimization. The C# compiler also provides an optimization switch, but this switch does not distinguish between optimization for speed or size, so I just compiled the optimized library once. The results given below are for all of these options. There is a batch file that will call nmake for each option and store the results in a separate folder.

The managed code uses a private assembly and so it is fully trusted. In this case much of the .NET security checks have been optimized away.

Results

I performed two sets of tests on two machines with .NET 2.0. One machine had XPSP2 and had a single processor, an 850 MHz Pentium III, with 512 MB of RAM. The other machine had build 5321 of Vista and had a single processor, a 2 GHz Mobile Pentium 4, with 1 GB of RAM. In each case I calculated the average of 100 separate FFT calculations on 2^17 (131072) data values. From these values I calculated the standard error from the standard deviation. The results are shown in ms.

The results for the Pentium III machine are:

 

Unmanaged

Managed C++

C++/CLI

C# Managed

The results for the Mobile Pentium 4 are:

 

Unmanaged

Managed C++

C++/CLI

C# Managed

As you can see the


ing this rough analysis on the values for managed code shows that the optimized code is barely faster than the non-optimized code. This shows that for managed code, the optimization performed by the compiler and linker has a relatively small effect on the final executed code; bear this in mind when you read my conclusions derived from these results. Interestingly, in these tests, there are few differences between the time taken for managed code optimized for space and speed, and on the Vista machine code optimized for space runs quicker than code optimized for speed.

Note that the measurements of a particular optimization setting were taken at about the same time, so the relative values between the differen


with speed optimized code), so the state of the machine is more likely to change between each process run than during the process run. Thus it is less accurate to compare results for different optimizations.

The results for C# code showed that there is little difference between C# and managed C++ in terms of performance. Indeed, the optimized C# library was actually slightly faster than the optimized managed C++ libraries.

Now compare the managed results with the


ed for speed that the unmanaged code is faster than managed code. The difference between unmanaged code and C# code is just 2% for the Pentium III/XPSP2 machine. There is also a 2% difference for the Pentium 4/Vista machine, but here the C# code is quicker.

Conclusions

Think of .NET in these terms. The .NET compiler (managed C++ in this case, but the same can be said for the other compilers) is essentially the equivalent of the parsing engine in the unmanaged C++ compiler. That is, the compiler will generate tables of the types and methods and perform some optimizations based on high level aspects like how loops and branches are handled. Think of the .NET JIT compiler as the back end of the unmanaged compiler: this is the part that really knows about generating code because it has to generate the low level x86 code that will be executed. The combination of a .NET compiler and the JIT compiler is an equivalent entity to the unmanaged C++ compiler; the only difference is that it is split into two components, meaning that the compilation is split over time. In fact, since the JIT compilation occurs at the time of execution, the JIT compiler can take advantage of 'local knowledge' of the machine that will execute the code, and the state of that machine at that particular time, to optimize the code to a degree that is not possible with the unmanaged C++ compiler run on the developer's machine.

The results show that the optimization switches in managed C++ and C# have relatively small effects, and that there is only a 2% difference between managed and unmanaged code. Significantly, C# code is as good as, or better than, managed C++ or C++/CLI, which means that your choice to use a managed version of C++ should be based on the language features rather than a perceived idea that C++ will produce 'more optimized code'.

There is nothing in .NET that means that it should automatically be much slower than native code; indeed, as these results have shown, there are cases when managed code is quicker than unmanaged code. Anyone who tells you that .NET should be slower has not thought through the issues.

Downloads

The code for these tests is supplied as C++/CLI code and so it will only compile for .NET 2.0.

Download