Is It Possible to Code an Efficient Software Switch? Linearizing the Heap, and the Pervasive Use of Hardware Accelerators Nick Mitchell along with Ioana Baldini, Peter Sweeney IBM T.J. Watson Research Center July 31, 2014

Source: ecoop14.it.uu.se/programme/linearizing-the-heap.pdf · 2014-08-17


Is It Possible to Code an Efficient Software Switch?

Linearizing the Heap, and the Pervasive Use of Hardware Accelerators

Nick Mitchell along with Ioana Baldini, Peter Sweeney

IBM T.J. Watson Research Center

July 31, 2014

What Nick Does: Work with IBM customers and developers to make their applications run well

The data herein is backed by thousands of real apps

Summer School Means Nick Learns At Least As Much From You, As You From Him


0. The Apps

Most of Our Data Comes from Somewhere Else

[Diagram: My Service, connected to Request, Response, MongoDB, Hadoop, FPGA/GPU, and Your Service]

Each Source Has its Own Wire Protocol

[Diagram: the same services (Request, Response, MongoDB, Hadoop, FPGA/GPU, My Service, Your Service), with the links labeled by protocol: Thrift, BSON, streaming binary, RPC, HTTP, HTTP]

Each Service Has its Operational Form

[Diagram: the same services (Request, Response, MongoDB, Hadoop, FPGA/GPU, My Service, Your Service)]

Solutions Denied: (but worth learning from)

• keep everything in memory

• code in a systems language

Summary

• Programmers are plumbers who write software switches

• ECOOP Languages offer poor support for efficient software switches, because of the distance between operational and wire forms

• Amdahl’s Law strikes again, severely limiting the use of specialized computational circuitry

• What are our options for doing better?


1. Initial Thought Exercises

Implement an Efficient Map

[Diagram: a conventional chained hash map; a HashMap refers to Entry objects, each with its own Key and Value]

• ~3 pointers per entry

• ~3-5 cache misses per GET

• copying required for RPC
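One way to approach this exercise, offered only as a sketch and not as the speaker's solution, is an open-addressing map whose keys and values sit side by side in a single primitive array. The class name `FlatIntMap`, the restriction to `int` keys and values, and the reserved `EMPTY` marker are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical open-addressing map from int keys to int values (illustration only). */
final class FlatIntMap {
    private static final int EMPTY = Integer.MIN_VALUE; // assumed never used as a real key
    private final int[] slots; // layout: key0, value0, key1, value1, ...
    private final int mask;    // capacity - 1, where capacity is a power of two

    FlatIntMap(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new int[2 * capacityPowerOfTwo];
        Arrays.fill(slots, EMPTY);
        mask = capacityPowerOfTwo - 1;
    }

    /** Inserts or overwrites. This sketch never resizes, so keep at least one slot free. */
    void put(int key, int value) {
        int i = index(key);
        while (slots[2 * i] != EMPTY && slots[2 * i] != key) {
            i = (i + 1) & mask; // linear probing: neighbours are adjacent in memory
        }
        slots[2 * i] = key;
        slots[2 * i + 1] = value;
    }

    /** Returns the value stored for key, or defaultValue if the key is absent. */
    int get(int key, int defaultValue) {
        int i = index(key);
        while (slots[2 * i] != EMPTY) {
            if (slots[2 * i] == key) {
                return slots[2 * i + 1];
            }
            i = (i + 1) & mask;
        }
        return defaultValue;
    }

    /** Multiplicative hash, folded into the table size. */
    private int index(int key) {
        return ((key * 0x9E3779B9) >>> 16) & mask;
    }
}
```

There are no per-entry objects and no per-entry pointers here, and a lookup that hits on the first probe reads the key and its value from adjacent array elements. Because the whole table is one `int[]`, it could also be copied to a wire buffer in a single pass.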

Implement This Without Copying Any Bits Into the Heap

[Diagram: the HashMap, Entry, and KeyValue structure drawn twice, with one region labeled "Java Heap"; annotation: "update this value"]
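A minimal sketch of what "update this value" could look like when the data never becomes Java objects: the map stays in its wire form inside a `ByteBuffer`, and the update overwrites four bytes in place. The layout (fixed-width big-endian `int` key followed by `int` value) and the class `WireMapView` are assumptions for illustration; the slides do not prescribe a layout.

```java
import java.nio.ByteBuffer;

/** Hypothetical view over a buffer of fixed-width (int key, int value) pairs. */
final class WireMapView {
    private static final int ENTRY_BYTES = 8; // 4-byte key + 4-byte value
    private final ByteBuffer wire;

    WireMapView(ByteBuffer wire) {
        this.wire = wire;
    }

    /** Linear scan for brevity; a real layout would likely embed a hash index. */
    private int offsetOf(int key) {
        for (int off = 0; off + ENTRY_BYTES <= wire.capacity(); off += ENTRY_BYTES) {
            if (wire.getInt(off) == key) {
                return off;
            }
        }
        return -1;
    }

    /** Reads the value straight out of the buffer; no Entry, Key, or Value objects. */
    int get(int key, int defaultValue) {
        int off = offsetOf(key);
        return off < 0 ? defaultValue : wire.getInt(off + 4);
    }

    /** Overwrites the value bytes in place; returns false if the key is absent. */
    boolean update(int key, int newValue) {
        int off = offsetOf(key);
        if (off < 0) {
            return false;
        }
        wire.putInt(off + 4, newValue);
        return true;
    }
}
```

If the buffer comes from `ByteBuffer.allocateDirect`, the bytes live outside the Java heap, and the same buffer can be handed back to the transport after the update.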

Implement an Efficient String

[Diagram: a String object referring to its characters]

• 4–8 bytes of data

• 24–50 bytes of non-data

• 8–33% efficiency


2. Programmers as Plumbers

For Example… (a JAX-RS RESTful web app)

    import javax.ws.rs.*;
    import javax.ws.rs.core.*;

    // Resource method; ShoppingCart is the application's own class.
    @POST
    @Path("cart/add")
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    @Produces(MediaType.APPLICATION_JSON)
    Response addToCart(@CookieParam("session") Cookie session,
                       @FormParam("item") int itemCode,
                       @FormParam("quantity") int quantity) {
        ShoppingCart cart = new ShoppingCart(session);
        cart.add(itemCode, quantity);
        return Response.ok(cart.status())
                       .cookie(cart.toCookie())
                       .build();
    }

The same addToCart listing is repeated on each of the next five slides, under these headings:

• Dispatch and Routing

• Protocol Unwrapping

• Deserialization into Application Data Structures

• Application Logic (a lonely increment operation)

• Serialize Application Data Structures Back into a Response

Marshalling and Data Formats

[Diagram labels: serialize, transfer, deserialize; wire form, operational form]
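The distinction between the two forms can be made concrete with a small sketch. `CartLine` below is a hypothetical class, not something from the talk: the object is the operational form, the 8-byte array is the wire form, and the only application logic is the increment in `plus`.

```java
import java.nio.ByteBuffer;

/** Hypothetical operational form: a plain object that application logic works on. */
final class CartLine {
    final int itemCode;
    final int quantity;

    CartLine(int itemCode, int quantity) {
        this.itemCode = itemCode;
        this.quantity = quantity;
    }

    /** serialize: operational form to wire form (two big-endian ints, 8 bytes). */
    byte[] toWire() {
        return ByteBuffer.allocate(8).putInt(itemCode).putInt(quantity).array();
    }

    /** deserialize: wire form to operational form (allocates a new object). */
    static CartLine fromWire(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        int itemCode = in.getInt();
        int quantity = in.getInt();
        return new CartLine(itemCode, quantity);
    }

    /** The application logic proper: a lonely increment. */
    CartLine plus(int more) {
        return new CartLine(itemCode, quantity + more);
    }
}
```

Even in this tiny case, one request costs a deserialize, an allocation for the result, and a serialize, all wrapped around a single addition.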


3. The Profitability Threshold

Trade-offs

• At what point does externalizing a computation become more pain than it’s worth?

• granularity of kernel

• cost of externalization

• accelerator speedup

Amdahl’s Law
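The three factors above can be combined in a small calculation. This is a sketch under stated assumptions, not a formula taken from the slides: run time is normalized to 1, the kernel accounts for `kernelFraction` of it, "N% faster" is read as a speedup factor of 1 + N/100, and the externalization cost is expressed as a fraction of the original run time. The class and method names are hypothetical.

```java
/** Hypothetical helper: Amdahl's Law with an added externalization cost. */
final class Offload {
    /**
     * @param kernelFraction      fraction of the original run time spent in the kernel (0..1)
     * @param kernelSpeedup       speedup factor of the accelerated kernel (1.0625 = 6.25% faster)
     * @param externalizationCost cost of moving data out and back, as a fraction of original run time
     * @return original run time divided by new run time
     */
    static double overallSpeedup(double kernelFraction, double kernelSpeedup,
                                 double externalizationCost) {
        double newTime = (1.0 - kernelFraction)        // part that is not offloaded
                + kernelFraction / kernelSpeedup       // accelerated kernel
                + externalizationCost;                 // marshalling and transfer
        return 1.0 / newTime;
    }

    /** Offloading helps only if the overall speedup exceeds 1. */
    static boolean helps(double kernelFraction, double kernelSpeedup,
                         double externalizationCost) {
        return overallSpeedup(kernelFraction, kernelSpeedup, externalizationCost) > 1.0;
    }
}
```

Under these assumptions, a kernel that is 100% of the computation and runs 6.25% faster still wins when externalization costs 1% of the run time, but loses once that cost reaches about 6%. A kernel that is half the computation and runs 2x faster loses if externalization costs 30%.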

Amdahl’s Law

[Plot: axes labeled "Externalization More Expensive" and "Tuned Kernel Grows Smaller"; two regions labeled "Offloading Computation Helps" and "Offloading Computation Hurts"]

Amdahl’s Law (the plot is redrawn on successive slides for increasing kernel speedups; axis: Externalization More Expensive)

• make kernel 6.25% faster (callout: free externalization, kernel is 100% of overall computation)

• make kernel 6.25% faster (callout: externalization equivalent to 1% of overall computation, kernel is 100% of overall computation)

• make kernel 12.5% faster

• make kernel 25% faster

• make kernel 50% faster

• make kernel 2x faster

• make kernel 4x faster

• make kernel 8x faster

• make kernel 16x faster

• make kernel 32x faster


4. The Costs of Externalization

Rough Measurements

Object Churn

• 3000 temps to ingest one document

• 70 temps to turn a SOAP date (as bytes) into a Java Calendar

• 6 temps to turn month into int

Fractal Webs of Invocations

• 200k calls to ingest that document

• 2000 calls to turn an XML timecard into a Java data structure

(Sevitsky 2006)
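To make the "month into int" measurement tangible, here is a sketch that contrasts a conversion creating temporary objects with one that reads the digits straight from the wire bytes. The date layout (ASCII `YYYY-MM-DD` starting at a known offset) and the class `DateFields` are assumptions for illustration; the measured code from the study is not shown in the slides.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers for pulling a month out of an ASCII date on the wire. */
final class DateFields {
    /** Parses the two ASCII digits at wire[offset] and wire[offset + 1] without allocating. */
    static int twoDigits(byte[] wire, int offset) {
        int tens = wire[offset] - '0';
        int ones = wire[offset + 1] - '0';
        if (tens < 0 || tens > 9 || ones < 0 || ones > 9) {
            throw new IllegalArgumentException("not a two-digit number at offset " + offset);
        }
        return tens * 10 + ones;
    }

    /** Month of a date laid out as YYYY-MM-DD starting at dateStart (assumed layout). */
    static int month(byte[] wire, int dateStart) {
        return twoDigits(wire, dateStart + 5);
    }

    /** For contrast: the same answer via a temporary String and a temporary substring. */
    static int monthWithTemps(byte[] wire, int dateStart) {
        String date = new String(wire, dateStart, 10, StandardCharsets.US_ASCII);
        return Integer.parseInt(date.substring(5, 7));
    }
}
```

Both methods return the same value; the first allocates nothing on the success path, while the second allocates at least two short-lived objects per call.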

Spending CPU Cycles

[Pie chart with three slices (21%, 32%, 43%), each covering one group of activities]

• Application Logic, Caching, Encryption

• Serialization, String Operations, Intra-object Copying

• Reflection, Data-driven Dispatching, Connection Management

Three Kinds of Expenses

[The same pie chart (21%, 32%, 43%), with each group of activities now named]

• Vital: Application Logic, Caching, Encryption

• Optimizable Data Motion: Serialization, String Operations, Intra-object Copying

• Amortizable: Reflection, Data-driven Dispatching, Connection Management

Does the Story Vary?

[Chart: one breakdown per data set. Group labels: Web Apps, Analytics Apps, Benchmarks. Data set labels: trade, SPECjbb2013, SPECjEnterprise. Legible slice labels: 49% Optimizable, 33%, 54% Amortizable, 34% Amortizable, 13%, 53% Vital]

All Data Sets: 50% Optimizable Data Motion, 31% Amortizable, 19% Vital

What the JIT Doesn’t Catch (numbers relative to original w/o JIT optimizations)

[Bar chart, 0% to 100%, for Copies, Comparisons, ALU, and Loads/Stores; two series: "original with JIT optimizations" and "handtuned w/o JIT optimizations"]

(Xu 2009)

How much do we have to inline to have a chance of removing temps and copies?

[Chart labels: Allocate-Use Separation, Thread.run]

Method Inlining: % temps eliminated

                         base J9 inliner    JOLT inliner
    Eclipse              0.4%               1.9%
    JPetStore on Spring  0.7%               2.5%
    TPCW on JBoss        0%                 4.3%
    DaCapo               3.4%               13.3%

(Shankar 2008)

Inlining is hard! Small benchmarks don’t reflect difficulties

Spending Memory: COBOL vs Java

COBOL: 600 bytes (40 records of 15 bytes each)

    01 Student occurs 40
       02 Name PIC X(10)
       02 Score PIC 999V99

Java: 4492 bytes

    class Student {
        String name;
        BigDecimal score;
    }
    Student[] data = new Student[40];

(Suganuma 2008)

Memory Overhead

[Histogram: Number of Heap Snapshots (0 to 500) for each Overhead bucket, from 0-10% through 90-100%]


5. Alternative Marshalling Schemes

Options

[Diagram: three pairings of wire form and operational form; two of them are labeled "optimize the translation" and "transmit less"]

Options

[The same diagram, with the remaining pairing labeled "????"]

Google protobuf, Apache Thrift, Apache Avro, Scala Pickling

protobuf, avro, thrift

• declaratively specify schema of data

• automatically generate marshallers

[Diagram: wire form and operational form pairings, labeled "optimize the translation" and "transmit less"]

    struct Work {
      1: i32 num1 = 0,
      2: i32 num2,
      3: Operation op,
      4: optional string comment,
    }

Does it Matter?

[Bar chart: relative time to serialize and deserialize an object, for java-manual, protostuff-manual, protostuff, kryo, protobuf, thrift-compact, thrift, avro, hessian, java-built-in, scala/java-built-in]

(http://ganges.usc.edu/pgroupW/images/a/a9/Serializarion_Framework.pdf)

Does it Matter?

[Figure, three panels, each plotted against Number of Elements (up to 1e+06): (a) Time [ms], 0 to 500; (b) Free Memory [Bytes], 1.25 to 1.55 x 10^9; (c) Size [Bytes], 0 to 12 x 10^6. Series: Java, Kryo v1, Kryo v2, Scala Pickling, Pickler Combinators, Unsafe Pickler Combinators]

Figure […]: Results for pickling and unpickling an immutable Vector[Int] using different frameworks. Figure […](a) shows the roundtrip pickle/unpickle time as the size of the Vector varies. Figure […](b) shows the amount of free memory available during pickling/unpickling as the size of the Vector varies. Figure […](c) shows the pickled size of Vector.

[Excerpt from the paper as shown on the slide; digits did not survive extraction and are marked […]]

… our hybrid compile-time/runtime approach. While scala/pickling has to incur the overhead of tracking object identity in the case of general object graphs, in this case, the compile-time pickler generation is able to detect that object identity does not have to be tracked for the pickled data types. Moreover, it is possible to provide a size hint to the pickle builder, enabling the use of a fixed-size array as the target for the pickled data. We have found that those two optimizations, which require the kind of static checking that scala/pickling is able to do, can lead to significant performance improvements. The performance of manually written pickler combinators, however, is still considerably better. This is likely due to the fact that pickler combinators require no runtime checks whatsoever; pickler combinators are defined per type, and manually composed, requiring no such check. In principle, it should be possible to generate code that is as fast as these pickler combinators in the case where static picklers can be generated.

Figure […](b) shows the corresponding memory usage; on the y-axis the value of System.freeMemory is shown. This plot reveals evidence of a key property of Kryo, namely (a) that its memory usage is quite high compared to other frameworks, and (b) that its serialization is stateful because of internal buffering. In fact, when preparing these benchmarks we had to manually adjust Kryo buffer sizes several times to avoid buffer overflows. It turns out the main reason for this is that Kryo reuses buffers whenever possible when serializing one object after the other. In many cases, the newly pickled object is simply appended at the current position in the existing buffer which results in unexpected buffer growth. Our framework does not do any buffering which makes its behavior very predictable, but does not necessarily maximize its performance.

Finally, Figure […](c) shows the relative sizes of the serialized data. For a Vector[Int] of […] elements, Java required […] bytes. As can be seen, all other frameworks perform on par with another, requiring about […] of the size of Java’s binary format. Or, in order of largest to smallest: Kryo v[…]: […] bytes, Kryo v[…]: […] bytes, scala/pickling […] bytes, and Pickler Combinators […] bytes.

[…] Wikipedia: Cyclic Object Graphs

In the second benchmark, we evaluate the performance of our framework when pickling object graphs with cycles. Using real data from the Wikipedia project, the benchmark builds a graph where nodes are Wikipedia articles and edges are references between articles. In this benchmark we compare against Java’s native serialization and Kryo. Our objective was to measure the full round-trip time (pickling and unpickling) for all frameworks. However, Kryo consistently crashed in the unpickling phase despite several work-around attempts. Thus, we include the results of two experiments: ([…]) “pickle only”, and ([…]) “pickle and unpickle”. The results …

(Miller 2013)

Experiment: specjbb2013

[Chart: Fraction of CPU Spent in Serialization (0% to 50%) against Stack Sample Frequency (seconds): 10, 15, 20, 25, 30, 60, 90, 120, 240, 300, 600. Two series: "original implementation" and "change ~100 lines of code in the marshaller"]


6. Alternative Storage Schemes

Value Types and Structs

    struct Student {
        char[] name;
        int score;
    }

pay the header cost once per record

Ref Poisoning

    struct Student {
        String name;
        int score;
    }

… unless we use a reference type, in which case we’re back to where we started

Arrays of Records

    struct Student {
        char[] name;
        int score;
    }
    Student[] data = new Student[10];

pay the header cost once per array

[Diagram: records 1 to 10 laid out one after another]

Marshalling Arrays of Records

    struct Student {
        string name;
        decimal score;
    }
    Student[] data = new Student[10];

pay the header cost once per array

[Diagram: the array is labeled "operational form and wire form" and is connected to RPC; a field is accessed in place with the expression below]

    data[3].name.charAt(5)
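Java has no struct arrays of this kind, but the idea can be approximated with one buffer and index arithmetic. The sketch below is an assumption-laden illustration: `StudentRecords` is a hypothetical class, names are space-padded to 10 ASCII bytes, and the score is a plain `int` rather than the slide's `decimal`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical array of fixed-width Student records stored in one buffer. */
final class StudentRecords {
    static final int NAME_BYTES = 10;               // space-padded ASCII
    static final int RECORD_BYTES = NAME_BYTES + 4; // name + 4-byte int score
    private final ByteBuffer buf;

    StudentRecords(int count) {
        buf = ByteBuffer.allocate(count * RECORD_BYTES);
    }

    /** Writes record i directly into the buffer. */
    void set(int i, String name, int score) {
        byte[] ascii = name.getBytes(StandardCharsets.US_ASCII);
        if (ascii.length > NAME_BYTES) {
            throw new IllegalArgumentException("name too long");
        }
        int base = i * RECORD_BYTES;
        for (int k = 0; k < NAME_BYTES; k++) {
            buf.put(base + k, k < ascii.length ? ascii[k] : (byte) ' ');
        }
        buf.putInt(base + NAME_BYTES, score);
    }

    /** Counterpart of data[i].name.charAt(k): index arithmetic only, no objects created. */
    char nameCharAt(int i, int k) {
        if (k < 0 || k >= NAME_BYTES) {
            throw new IndexOutOfBoundsException();
        }
        return (char) buf.get(i * RECORD_BYTES + k);
    }

    int score(int i) {
        return buf.getInt(i * RECORD_BYTES + NAME_BYTES);
    }

    /** The operational form already is the wire form: expose the bytes as they are. */
    ByteBuffer wireForm() {
        return buf.asReadOnlyBuffer();
    }
}
```

There is one object header for the whole array, and sending the data over RPC needs no per-record marshalling. The price is the fixed name width, which is exactly the limitation the next slide raises.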

c.f. Cobol

[Diagram: a grid of 10 records by 15 bytes, next to the array of records 1 to 10]

both are fine… until we need to store non-scalar data, i.e. variable-length data

Column Stores

    class Student {
        String name;
        BigDecimal score;
    }

[Diagram: index columns nameStart, nameEnd, scoreStart, and scoreEnd, one row per Student # (1 to 10), pointing into pooled storage such as scorePool]

PROS

• maintains easy serializability

• allows for mix-and-match use of attributes

CONS

• added overhead (start and end pointers)

• unspeakably horrible code
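A compact sketch of such a column store, to show both the benefit and the kind of bookkeeping code it forces on the programmer. `StudentColumns` is hypothetical, and it simplifies the slide's layout: names are variable-length and pooled with start and end indexes, while scores are kept as fixed-width hundredths instead of going through a score pool.

```java
import java.util.Arrays;

/** Hypothetical column store for (name, score) records. */
final class StudentColumns {
    private char[] namePool = new char[64];     // all names, back to back
    private int namePoolUsed = 0;
    private int[] nameStart = new int[8];
    private int[] nameEnd = new int[8];
    private int[] scoreHundredths = new int[8]; // 95.50 is stored as 9550
    private int count = 0;

    /** Appends a record and returns its row number. */
    int add(String name, int hundredths) {
        if (count == nameStart.length) {
            nameStart = Arrays.copyOf(nameStart, count * 2);
            nameEnd = Arrays.copyOf(nameEnd, count * 2);
            scoreHundredths = Arrays.copyOf(scoreHundredths, count * 2);
        }
        int needed = namePoolUsed + name.length();
        if (needed > namePool.length) {
            namePool = Arrays.copyOf(namePool, Math.max(namePool.length * 2, needed));
        }
        name.getChars(0, name.length(), namePool, namePoolUsed);
        nameStart[count] = namePoolUsed;
        namePoolUsed = needed;
        nameEnd[count] = namePoolUsed;
        scoreHundredths[count] = hundredths;
        return count++;
    }

    /** Materializes a String only when the caller really needs one. */
    String name(int row) {
        return new String(namePool, nameStart[row], nameEnd[row] - nameStart[row]);
    }

    /** Mix-and-match: this scan touches the score column only, never the name pool. */
    long totalHundredths() {
        long sum = 0;
        for (int row = 0; row < count; row++) {
            sum += scoreHundredths[row];
        }
        return sum;
    }
}
```

Each column is a primitive array, so writing the table to the wire is a handful of bulk copies. The cost is visible too: two extra ints per variable-length field, and growth and indexing logic that a `class Student` never needed.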


7. Alternative Compilation Schemes and Optimizable Language Kernels

High-level Languages for Targeting FPGAs

• LegUp (U. Toronto)

• Kiwi (Microsoft Research)

• Bluespec (MIT)

• Lime (IBM Research)

make it easier to write computational kernels

asm.js

• a JavaScript subset that is easier to optimize

[Diagram: C/C++ -> LLVM -> asm.js -> Browser JIT with asm.js support]

RPython (c.f. Truffle)

• a Python subset that is easier to optimize

[Diagram: Human codes interpreter for language X -> JIT for language X]

Models vs Storage

    class Student {
        String name;
        BigDecimal score;
    }

code is easy to maintain, but performance sucks

code is hard to maintain, or impossible to express in the language, but performance is great

Lowering…

    class Student {
        String name;
        BigDecimal score;
    }

Can we lower from one to the other?

Lowering…

    class Student {
        String name;
        BigDecimal score;
    }

What is lowering other than… a partial evaluation of serialization?

Partial Marshal

    class Student {
        String name;
        BigDecimal score;
    }

    class StudentTable {
        CharTable names;
        int[] nameStart;
        int[] nameEnd;
        IntTable scores;
        int[] scoreStart;
        int[] scoreEnd;
    }

    student.getName();

    students.getName(i);

    names.splice(students.nameStart[i], students.nameEnd[i]);

transformer?? transformer??

Partial Marshal

data model

    class Student {
        String name;
        BigDecimal score;
    }

    class StudentTable {
        CharTable names;
        int[] nameStart;
        int[] nameEnd;
        IntTable scores;
        int[] scoreStart;
        int[] scoreEnd;
    }

data access

    student.getName();

    students.getName(i);

    names.splice(students.nameStart[i], students.nameEnd[i]);
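What such a transformer would have to produce can be written out by hand for this one class. The sketch below is only an illustration of the target of the lowering, not a proposal for how to automate it: plain arrays stand in for the slide's `CharTable` and `IntTable`, the score is stored as an unscaled value plus a scale, and the `lower` method and constructors are assumptions added for the example.

```java
import java.math.BigDecimal;
import java.util.List;

/** The data model from the slide (constructor and getter added for the example). */
final class Student {
    final String name;
    final BigDecimal score;

    Student(String name, BigDecimal score) {
        this.name = name;
        this.score = score;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical lowered form; arrays stand in for CharTable and IntTable. */
final class StudentTable {
    final char[] names;
    final int[] nameStart;
    final int[] nameEnd;
    final long[] scoreUnscaled;
    final int[] scoreScale;

    private StudentTable(int rows, int totalChars) {
        names = new char[totalChars];
        nameStart = new int[rows];
        nameEnd = new int[rows];
        scoreUnscaled = new long[rows];
        scoreScale = new int[rows];
    }

    /** Hand-written stand-in for the transformer: one pass from objects to columns. */
    static StudentTable lower(List<Student> students) {
        int totalChars = 0;
        for (Student s : students) {
            totalChars += s.name.length();
        }
        StudentTable t = new StudentTable(students.size(), totalChars);
        int pos = 0;
        int row = 0;
        for (Student s : students) {
            s.name.getChars(0, s.name.length(), t.names, pos);
            t.nameStart[row] = pos;
            pos += s.name.length();
            t.nameEnd[row] = pos;
            // Assumes the unscaled score fits in a long; longValueExact throws otherwise.
            t.scoreUnscaled[row] = s.score.unscaledValue().longValueExact();
            t.scoreScale[row] = s.score.scale();
            row++;
        }
        return t;
    }

    /** Lowered counterpart of student.getName(). */
    String getName(int i) {
        return new String(names, nameStart[i], nameEnd[i] - nameStart[i]);
    }

    /** Lowered counterpart of reading student.score. */
    BigDecimal getScore(int i) {
        return BigDecimal.valueOf(scoreUnscaled[i], scoreScale[i]);
    }
}
```

The data-model half of the transformation is `lower`; the data-access half is the rewrite of `student.getName()` into `students.getName(i)`. Doing both by hand for every class is the maintenance burden the talk is asking a compiler to take over.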


Attic

[Chart: one breakdown per data set. Data set labels: Framework X, Batch B, Framework T, Batch A, SPECjEnterprise, Framework Y, Framework Z, Framework W, J2EE Provider A, J2EE Provider B, jboss, Analytics A, Framework U, Analytics C, Analytics D, Analytics E, Analytics F, tradesoap, tradebeans, SPECjbb2013. Legible slice labels: 77%, 49% Optimizable, 46%, 35%, 11%, 43% Amortizable, 59%, 55% Amortizable, 65% Vital]

All Data Sets: 50% Optimizable Data Motion, 31% Amortizable, 19% Vital