85
V. Milutinovic, G. Rakocevic, S. Stojanovic, and Z. Sustran University of Belgrade Oskar Mencer Imperial College, London Oliver Pell Maxeler Technologies, London and Palo Alto Michael Flynn Stanford University, Palo Alto, USA Valentina Balas Aurel Vlaicu University of Arad, Romania, Maxeler Ambassador 1/83

Anegdotic Maxeler (Romania)

Embed Size (px)

Citation preview

Page 1: Anegdotic Maxeler (Romania)

V. Milutinovic, G. Rakocevic, S. Stojanovic, and Z. SustranUniversity of Belgrade

Oskar MencerImperial College, London

Oliver Pell Maxeler Technologies, London and Palo Alto

Michael FlynnStanford University, Palo Alto, USA

Valentina BalasAurel Vlaicu University of Arad, Romania, Maxeler Ambassador

1/83

Page 2: Anegdotic Maxeler (Romania)

How to hire more than 1000 PhD students

at no additional cost to tax payers?

An Alternative Title:

Page 3: Anegdotic Maxeler (Romania)

For Big Data algorithms and for the same hardware price as before, achieving:

a) speed-up, 20-200 b) monthly electricity bills, reduced

20 timesc) size, 20 times smaller

The major issues of engineering are: design cost and design complexity.

Remember, economy has its own rules: production count and market demand!

3/83

Page 4: Anegdotic Maxeler (Romania)

Elaboration :) If a computer center spends E50M/year on electricity bills, and moves most of its time-consuming algorithms to Maxeler, which uses 20 times less power, the yearly spending drops down to E2.5M, and E47.5M is saved to tax payers :)

If the average net salary of a PHD student in Germany is E1500, and if the overhead factor is 1.00, it is easy to calculate that E47.5M can pay 2611 PHD students to work for one year, and that can go year after year :)

If the overhead factor is 2.611 (I do not know how big it is, but it is less than 2.611, for sure), one can hire 1000 PHD students, at no additional cost :)

Page 5: Anegdotic Maxeler (Romania)

1. Over 95% of run time in loops [loops to almost zero]

2. Reusability of data (e.g., x+x2+x3+x4+…)[how close to zero?]

3. BigData[prog: for data streaming, not for data control]

4. Latency5. A new programming model6. WORM [prog.effort+comp.tim]

Use a tractor, not a Ferrari, to drive over a plowed field

5/83

Page 6: Anegdotic Maxeler (Romania)

Absolutely all results achieved in Europe:

a) All hardware produced in Europe, specifically UK

b) All software generated by programmers

of EU and WB

6/83

Page 7: Anegdotic Maxeler (Romania)

ControlFlow (MultiFlow and ManyFlow): Top500 ranks using Linpack

(Japanese K, IBM Sequoya, Cray Titan, …)

DataFlow: Coarse Grain (HEP) vs. Fine Grain

(Maxeler)

7/83The history starts in 1960's!The enabler technology did not exist before the year 2000!

Page 8: Anegdotic Maxeler (Romania)

Compiling below the machine code level brings speedups;also a smaller power, size, and cost.

The price to pay:The machine is more difficult to program.

Consequently:Ideal for WORM applications :)

Examples using Maxeler:GeoPhysics (20-200), Banking (200-2000, with JP Morgan

20%),M&C (New York City), Datamining (Google), …

8/83

Page 9: Anegdotic Maxeler (Romania)

9

Simulator builderHardware builder

2n+3

Page 10: Anegdotic Maxeler (Romania)

10/83

Page 11: Anegdotic Maxeler (Romania)

11/83Why Java? Minimal Kolmogorov Complexity, etc…

Page 12: Anegdotic Maxeler (Romania)

12

Page 13: Anegdotic Maxeler (Romania)

13

Page 14: Anegdotic Maxeler (Romania)

Assumptions: 1. Software includes enough parallelism to keep all cores busy 2. The only limiting factor is the number of cores.

tGPU = N * NOPS * CGPU*TclkGPU / NcoresGPU

tCPU = N * NOPS * CCPU*TclkCPU /NcoresCPU

tDF = NOPS * CDF * TclkDF + (N – 1) * TclkDF / NDF

14/83

Page 15: Anegdotic Maxeler (Romania)

DualCore?

Which way are the horses going?

15/83

Page 16: Anegdotic Maxeler (Romania)

Is it possibleto use 2000 chicken instead of two horses?

?==

16/83

What is better, real and anecdotic?

Page 17: Anegdotic Maxeler (Romania)

2 x 1000 chickens (CUDA and rCUDA) 17/83

Page 18: Anegdotic Maxeler (Romania)

How about 2 000 000 ants?

18/83

Dat

a

Page 19: Anegdotic Maxeler (Romania)

Marmalade

Big Data Input Results

19/83

Page 20: Anegdotic Maxeler (Romania)

Factor: 20 to 200

MultiCore/ManyCore

Dataflow

Machine Level Code

Gate Transfer Level

20/83

Page 21: Anegdotic Maxeler (Romania)

Factor: 20

MultiCore/ManyCore

Dataflow

21/83

Page 22: Anegdotic Maxeler (Romania)

Factor: 20

Data Processing

Process ControlData Processing

Process Control

MultiCore/ManyCore

DataFlow

22/83

Page 23: Anegdotic Maxeler (Romania)

MultiCore:Explain what to do, to the driverCaches, instruction buffers, and predictors needed

ManyCore:Explain what to do, to many sub-driversReduced caches and instruction buffers needed

DataFlow:Make a field of processing gates: 1C+2nJava+3JavaNo caches, etc. (300 students/year: BGD, BCN, LjU,

ICL,…)

23/83

Page 24: Anegdotic Maxeler (Romania)

MultiCore:Business as usual

ManyCore:More difficult

DataFlow:Much more difficultDebugging both, application and configuration

code

24/83

Page 25: Anegdotic Maxeler (Romania)

MultiCore/ManyCore:Several minutes

DataFlow:Several hours for the real hardwareFortunately, only several minutes for the simulator,

several seconds for reload (90% due to DRAM inertia),and several milliseconds to restart

The simulator supports both the large JPMorgan machineas well as the smallest “University Support” machine

Good news:Tabula@2GHz

25/83

Page 26: Anegdotic Maxeler (Romania)

26/83

Page 27: Anegdotic Maxeler (Romania)

MultiCore:Horse stable

ManyCore:Chicken house

DataFlow:Ant hole

27/83

Page 28: Anegdotic Maxeler (Romania)

MultiCore:Haystack

ManyCore:Cornbits

DataFlow:Crumbs

28/83

Page 29: Anegdotic Maxeler (Romania)

29/83

Small Data: Toy Benchmarks (e.g., Linpack)

Page 30: Anegdotic Maxeler (Romania)

30/83

Medium Data (benchmarks favorising NVidia,compared to Intel,…)

Page 31: Anegdotic Maxeler (Romania)

31/83

Big Data

Page 32: Anegdotic Maxeler (Romania)

Maxeler Hardware

CPUs plus DFEsIntel Xeon CPU cores and up to

4 DFEs with 192GB of RAM

DFEs shared over Infiniband Up to 8 DFEs with 384GB of RAM and dynamic allocation

of DFEs to CPU servers

Low latency connectivityIntel Xeon CPUs and 1-2 DFEs with up to six 10Gbit Ethernet

connections

MaxWorkstationDesktop development system

MaxCloudOn-demand scalable accelerated compute resource, hosted in London

3232/83/83

Page 33: Anegdotic Maxeler (Romania)

1. Coarse grained, stateful: Business– CPU requires DFE for minutes or hours– Interrupts

2. Fine grained, transactional with shared database: DM– CPU utilizes DFE for ms to s– Many short computations, accessing common database data

3. Fine grained, stateless transactional: Science (Phy, ...)– CPU requires DFE for ms to s– Many short computations

3333/83/83

Major Classes of Algorithms, from the Computational Perspective

Page 34: Anegdotic Maxeler (Romania)

• Long runtime, but:• Memory requirements

change dramatically based on modelled frequency

• Number of DFEs allocated to a CPU process can be easily varied to increase available memory

• Streaming compression• Boundary data exchanged

over chassis MaxRing

3434/83/83

Coarse Grained: Modeling

0

200

400

600

800

1,000

1,200

1,400

1,600

1,800

2,000

1 4 8

Equi

vale

nt C

PU c

ores

Number of MAX2 cards

15Hz peak frequency

30Hz peak frequency

45Hz peak frequency

70Hz peak frequency

0

10

20

30

40

50

60

70

80

0 10 20 30 40 50 60 70 80Peak Frequency (Hz)

Timesteps (thousand)

Domain points (billion)

Total computed points (trillion)

Page 35: Anegdotic Maxeler (Romania)

• DFE DRAM contains the database to be searched• CPUs issue transactions find(x, db)• Complex search function

– Text search against documents– Shortest distance to coordinate (multi-dimensional)– Smith Waterman sequence alignment for genomes

• Any CPU runs on any DFE that has been loaded with the database– MaxelerOS may add or remove DFEs

from the processing group to balance system demands– New DFEs must be loaded with the search DB before use

3535/83/83

Fine Grained, Shared Data: Monitoring

Page 36: Anegdotic Maxeler (Romania)

• Analyse > 1,000,000 scenarios• Many CPU processes run on many DFEs• ≈50x MPC-X vs. multi-core x86 node• Each transaction executes on any DFE

in the assigned group atomically

3636/83/83

Fine Grained, Stateless: The BSOP Control

CPU DFE Loop over instrumentsLoop over instruments

Random number generator and

sampling of underliers

Random number generator and

sampling of underliers

Price instruments using Black

Scholes

Price instruments using Black

Scholes

Tail analysis on CPU

Tail analysis on CPU

CPU DFE Loop over instrumentsLoop over instruments

Random number generator and

sampling of underliers

Random number generator and

sampling of underliers

Price instruments using Black

Scholes

Price instruments using Black

Scholes

Tail analysis on CPU

Tail analysis on CPU

CPU DFE Loop over instrumentsLoop over instruments

Random number generator and

sampling of underliers

Random number generator and

sampling of underliers

Price instruments using Black

Scholes

Price instruments using Black

Scholes

Tail analysis on CPU

Tail analysis on CPU

CPU DFE Loop over instrumentsLoop over instruments

Random number generator and

sampling of underliers

Random number generator and

sampling of underliers

Price instruments using Black

Scholes

Price instruments using Black

Scholes

Tail analysis on CPU

Tail analysis on CPU

DFE Loop over instrumentsLoop over instrumentsCPUMarket and instruments data

Random number generator and

sampling of underliers

Random number generator and

sampling of underliers

Price instruments using Black

Scholes

Price instruments using Black

ScholesInstrument values

Tail analysis on CPU

Tail analysis on CPU

Page 37: Anegdotic Maxeler (Romania)

3737/83/83

Selected Examples:Business,Mathematics,GeoPhysics, etc.

Page 38: Anegdotic Maxeler (Romania)

3838

Page 39: Anegdotic Maxeler (Romania)

An MIS Example: Credit Derivatives

Page 40: Anegdotic Maxeler (Romania)

Climber

Tether

Orbital station

HW

Page 41: Anegdotic Maxeler (Romania)

4141

Page 42: Anegdotic Maxeler (Romania)

Seismic Imaging

• Running on MaxNode servers- 8 parallel compute pipelines per chip- 150MHz => low power consumption!- 30x faster than microprocessors

An Implementation of the Acoustic Wave Equation on FPGAs T. Nemeth†, J. Stefani†, W. Liu†, R. Dimond‡, O. Pell‡, R.Ergas§

†Chevron, ‡Maxeler, §Formerly Chevron, SEG 20084242/83/83

Page 43: Anegdotic Maxeler (Romania)

Performance of one MAX2 card vs. 1 CPU core

Land case (8 params), speedup of 230x

Marine case (6 params), speedup of 190x

The CRS Results

CPU Coherency MAX2 Coherency

4343/83/83

Page 44: Anegdotic Maxeler (Romania)

4444

Page 45: Anegdotic Maxeler (Romania)
Page 46: Anegdotic Maxeler (Romania)

46464646/83/83

Page 47: Anegdotic Maxeler (Romania)

• DM for Monitoring and Control in Seismic processing • Velocity independent / data driven method

to obtain a stack of traces, based on 8 parameters• Search for every sample of each output trace

Trace Stacking: Speed-up 217P. Marchetti et al, 2010

parameters( emergence angle & azimuth

Normal Wave front parametersKN,11; KN,12 ; KN22

NIP Wave front parameters( KNip,11; KNip,12 ; KNip22 )

hHKHhmHKHmmw TzyNIPzy

TTzyNzy

TT

0

0

2

00

2 22

v

t

vtthyp

4747/83/83

Page 48: Anegdotic Maxeler (Romania)

4848

Maxeler running Smith Waterman

Page 49: Anegdotic Maxeler (Romania)
Page 50: Anegdotic Maxeler (Romania)

Molecular Correlates of Tumor Signatures from a Large Cohort

From whole slide sections, of a cohort, to pathway analysis (Prof Bahram Parvin, Berkeley)

High Content Analysis (HCA) on MPC-X

Page 51: Anegdotic Maxeler (Romania)

5151

Page 52: Anegdotic Maxeler (Romania)

This is about algorithmic changes, to maximize

the algorithm to architecture match:

algorithmic modifications,pipeline utilization,data choreography,

anddecision making precision.

The winning paradigm of Big Data ExaScale?

5252/83/83

Conclusion: Nota Bene

Page 53: Anegdotic Maxeler (Romania)

Algorithmic Changes: Data Dependencies

PSI[0] PSI[1] PSI[N-3] PSI[N-2] PSI[N-1]

cbeta[N-2]cbeta[N-3]cbeta[1]cbeta[0]

OP

PSI[0] PSI[1] PSI[N-3] PSI[N-2] PSI[N-1]…PSI[2]

0 0OP’

OPOPOP

OP’ … OP’ OP’

5353/83/83Example generated by Sasa Stojanovic (Gross-Pitaevskii)

Page 54: Anegdotic Maxeler (Romania)

Pipeline Changes: Higher Efficiency

X[0,0]

[0,0]X[0,1]

0

R[0,0]R[0,0]

0

[7,0][6,0][5,0][4,0][3,0][2,0][1,0][0,0]

[0,1][7,0][6,0][5,0][4,0][3,0][2,0][1,0]

R[0,0]

5454/83/83Example generated by Sasa Stojanovic (Gross-Pitaevskii)

Page 55: Anegdotic Maxeler (Romania)

Data Recoreography: Pipeline Utilization

Order of data accesses inside of a burst

… … …

5555/83/83

Example generated by Sasa Stojanovic (Gross-Pitaevskii)

Page 56: Anegdotic Maxeler (Romania)

• Consider fixed point compared to single precision floating point

• If the range is tightly confined, one could use 24-bit fixed point

• If data has a wider range, may need 32-bit fixed point

• Arithmetic is not 100% of the chip. In practice, often ~5x performance boost from fixed point.

5656

Fixed Point: Savings Reinvestable

hwFloat(8,24) hwFix(24,...) hwFix(32,...)

Add 500 LUTs 24 LUTs 32 LUTs

Multiply 2 DSPs 2 DSPs 4 DSPs

Page 57: Anegdotic Maxeler (Romania)

Revisiting the Top 500 SuperComputers benchmarksOur paper in Communications of the ACM

Revisiting all major Big Data DM algorithmsMassive static parallelism at low clock frequencies

Concurrency and communicationConcurrency between millions of tiny cores difficult,

“jitter” between cores will harm performance at synchronization points

Reliability and fault tolerance10-100x fewer nodes, failures much less often

Memory bandwidth and FLOP/byte ratioOptimize data choreography, data movement,

and the algorithmic computationNew architecture of n-Programming paradigms

57/83

Page 58: Anegdotic Maxeler (Romania)

FP7: RoMoL@BCN

5858/83/83

The SAB goal: Out of box thinking!

Page 59: Anegdotic Maxeler (Romania)

FP7: BalCon@SRB

5959/83/83

The SAB goal: Seed for new proposals!

The vision of Alkis Konstantellos

Page 60: Anegdotic Maxeler (Romania)

60/8360/83

DAFNE: Leader MISANU

Page 61: Anegdotic Maxeler (Romania)

61/8361/83

DAFNE = South (MaxCode) + North (BigData)

MISANU, IMP, KG, NS, BSC, UPV, U of Siena, U of Roma,IJS, FRI,IRB, QPLAN, Bogazici, U of Istanbul,U of Bucharest, U of Arad,U of Tuzla, Technion, Maxeler Israel, IPSI

61/83

UKSwedenNorway

DenmarkGermany

FranceAustria

SwissPoland

Hungary

Page 62: Anegdotic Maxeler (Romania)

62/8362/83

The DAFNE Map

Page 63: Anegdotic Maxeler (Romania)

63/83

The TriPeak @ DATAMAN

Siena+ BSC+ Imperial College + Maxeler+ Belgrade

46/83

Page 64: Anegdotic Maxeler (Romania)

64/83

The TriPeak: EssenceMontBlanc = A ManyCore (NVidia) + a MultiCore (ARM)Maxeler = A FineGrain DataFlow (FPGA)

How about a happy marriage?MontBlanc (ompSS) and Maxeler (an accelerator)

In each happy marriage,it is known who does what :)

The Big Data DM algorithms:What part goes to MontBlanc and what to Maxeler?

64/83

Page 65: Anegdotic Maxeler (Romania)

65/83

TriPeak: Core of the Symbiotic Success

An intelligent DM algorithmic scheduler,partially implemented for compile time,and partially for run time.

At compile time:Checking what part of code fits where(MontBlanc or Maxeler): LoC 1M vs 2K vs 20K

At run time:Rechecking the compile time decision,based on the current data values.

65/83

Page 66: Anegdotic Maxeler (Romania)

66/8366/8366

Page 67: Anegdotic Maxeler (Romania)

67/8367/83

Maxeler: Research (Google: good method)

Structure of a Typical Research Paper: Scenario #1[Comparison of Platforms for One Algorithm]Curve A: MultiCore of approximately the same PurchasePriceCurve B: ManyCore of approximately the same PurchasePriceCurve C: Maxeler after a direct algorithm migrationCurve D: Maxeler after algorithmic improvementsCurve E: Maxeler after data choreographyCurve F: Maxeler after precision modifications

Structure of a Typical Research Paper: Scenario #2[Ranking of Algorithms for One Application]CurveSet A: Comparison of Algorithms on a MultiCoreCurveSet B: Comparison of Algorithms on a ManyCoreCurveSet C: Comparison on Maxeler, after a direct algorithm migrationCurveSet D: Comparison on Maxeler, after algorithmic improvementsCurveSet E: Comparison on Maxeler, after data choreographyCurveSet F: Comparison on Maxeler, after precision modifications

67/83

Page 68: Anegdotic Maxeler (Romania)

68/8368/83

Maxeler Research in Serbia: Special Issue of IPSI Transactions Journal

KG: Blood Flow, Tijana Djukic and Prof. Filipovic

NS: Combinatorial Math, Prof. Senk and Ivan Stanojevic

MISANU: The SAT Math, Zivojin Sustran and Prof. Ognjanovic

ETF: Meteorology, Radomir Radojicic and Marko Stankovic

ETF: Physics (Gross Pitaevskii 3D real), Sasa Stojanovic

ETF: Physics (Gross Pitaevskii 3D imaginary), Lena Parezanovic

68/83

Page 69: Anegdotic Maxeler (Romania)

69/8369/83

Maxeler Research WorldWide:Special Issue of Advances in Computers @ SCI

Stanford, Texas,Imperial, Maxeler,ETF, MF, MISANU, IMP, KG, NS, BSC, UPV, U of Siena, U of Roma,IJS, FRI, …

69/83

Page 70: Anegdotic Maxeler (Romania)

70/8370/83© H. Maurer70

Page 71: Anegdotic Maxeler (Romania)

71/8371/83

Maxeler: Teaching (Google: prof vm)TEACHING, VLSI, PowerPoints, Maxeler:

Maxeler Veljko Explanations, August 2012Maxeler Veljko Anegdotic, Maxeler Oskar Talk, August 2012Maxeler Forbes ArticleFlyer by JP MorganFlyer by Maxeler HPCTutorial Slides by Sasha and Veljko: Practice (Current Update)Paper, unconditionally accepted for Advances in Computers by ElsevierPaper, unconditionally accepted for Communications of the ACMTutorial Slides by Oskar: Theory (7 parts)Slides by Jacob, New YorkSlides by Jacob, AlabamaSlides by Sasha: Practice (Current Update)Maxeler in MeteorologyMaxeler in MathematicsExamples generated in Belgrade and Worldwide

THE COURSE ALSO INCLUDES DARPA METHODOLOGY FOR MICROPROCESSOR DESIGN, with an example

71/83

Page 72: Anegdotic Maxeler (Romania)

72/8372/83

Maxeler PreConference Tutorials (2013)

Google:

IEEE HiPeak, Berlin, Germany, January 2013

ACM iSAC, Coimbra, Portugal, March 2013

IEEE MECO, Budva, Montenegro, June 2013

ACM ISCA, Tel Aviv, Israel, June 2013

72/83

Page 73: Anegdotic Maxeler (Romania)

73/8373/83

Maxeler InHouse Tutorials (2013)

73/83

Page 74: Anegdotic Maxeler (Romania)

74/8374/83© H. Maurer74

Page 75: Anegdotic Maxeler (Romania)

75/8375/83

Maxeler University Program Members

Page 76: Anegdotic Maxeler (Romania)

76/8376/83

How to Become a Family Member?

Options to consider:

a. MAX-UP free of charge b. Purchasing a university-level machine (min about $10K) c. Purchasing a JPM-level machine

(slowly approaching $100M), or at least a Schlumberger-level machine

(slowly moving above $10M)

76/83

Page 77: Anegdotic Maxeler (Romania)

77/8377/83

   Good to Know!Maxeler employs close to 100 people, GBR and USA:

a. Maxeler cash burn per year = about $10M b. If a university-level machine is sold at the 100% profit margin, the company life of Maxeler is extended for about 2 hours. c. If a university-level machine is sold at the 1% profit margin, the company life of Maxeler is extended for 1 minute.

Our past or ongoing FP7 projects requiring Maxeler speeds:

a. ProSenseb. ARTreatc. HiPEAC

77/83

Page 78: Anegdotic Maxeler (Romania)

78/8378/83

The Educational Mission

Important note:

a. Total number of accredited universities in the whole world? b. As per WeboMetrics, about 20000. c. Consequently, all universities of the world together bring only: 20000 minutes of extra life, or about two weeks of extra life.

The reality:

a. University-level machines are sold at the ZERO profit margin! b. Only the Xilinx costs, handling, and shipping. c. Email support for student doing thesis is practically unlimited!

Conclusion: This is a chance for those who jump in first :)

78/83

Page 79: Anegdotic Maxeler (Romania)

79/8379/83

Our Work Impacting Maxeler

Milutinovic, V., Knezevic, P., Radunovic, B., Casselman, S., Schewel, J., Obelix Searches Internet Using Customer Data, IEEE COMPUTER, July 2000 (impact factor 2.205/2010).

Milutinovic, V., Cvetkovic, D., Mirkovic, J., Genetic Search Based on Multiple Mutation Approaches, IEEE COMPUTER, November 2000 (impact factor 2.205/2010).

Milutinovic, V., Ngom, A., Stojmenovic, I., STRIP --- A Strip Based Neural Network Growth Algorithm for Learning Multiple-Valued Functions, IEEE TRANSACTIONS ON NEURAL NETWORKS, March 2001, Vol.12, No.2, pp. 212-227.

Jovanov, E., Milutinovic, V., Hurson, A., Acceleration of Nonnumeric Operations Using Hardware Support for the Ordered Table Hashing Algorithms, IEEE TRANSACTIONS ON COMPUTERS, September 2002, Vol.51, No.9, pp. 1026-1040 (impact factor 1.822/2010).

79/83

Page 80: Anegdotic Maxeler (Romania)

80/8380/83

Maxeler Impacting Our Work

Tafa, Z., Rakocevic, G., Mihailovic, Dj., Milutinovic, V., Effects of Interdisciplinary Education On Technology-driven Application Design IEEE Transactions on Education, August 2011, pp.462-470. (impact factor 1.328/2010).

Tomazic, S., Pavlovic, V., Milovanovic, J., Sodnik, J., Kos, A., Stancin, S., Milutinovic, V., Fast File Existence Checking in Archiving Systems ACM Transactions on Storage (TOS) TOS Homepage archive, Volume 7 Issue 1, June 2011, ACM New York, NY, USA.

Jovanovic, Z., Milutinovic, V., FPGA Accelerator for Floating-Point Matrix Multiplication, IEE Computers & Digital Techniques, 2012, 6, (4), pp. 249-256.

Flynn, M., Mencer, O., Milutinovic, V., Rakocevic, G., Stenstrom, P., Trobec, R., and Valero, M., Moving from Petaflops (on Simple Benchmarks) to Petadata per Unit of Time and Power (On Sophisticated Benchmarks) Communications of the ACM, May 2013 (impact factor 1.919/2010).

80/83

Page 81: Anegdotic Maxeler (Romania)

81/8381/83

Current Main Efforts of Maxeler

1. To encourage a lot of software to be written/ported.This is a key business opportunity that needs to be developed.

2. Maxeler is building up a website and a communityto share software for DFEs.

This would allow the software to also be sold directly from the Maxeler website.

3. If a PhD student ports an important software to a Maxeler machine,she/he could become the first software vendor in the worldfor dataflow computers,and Maxeler would be happy to help sell licenses.

Page 82: Anegdotic Maxeler (Romania)

82/8382/83

Current Side Efforts of Maxeler

1. Developing new tools for easier making of kernels.

2. Bringing new languages to Maxeler: C, C++, MathLab, Matematika

3. Porting popular application packages to Maxeler: OpenSees, etc...

4. Trying the Tabula FPGA!

5. Getting more than 1TeraByte/sec thru I/O

6. Minimizing the hardware, so it can go into Galaxy 5,6…

Page 83: Anegdotic Maxeler (Romania)

83/8383/8383

NewTools: MaxSkinsCustom Engine

Interfaces(.c)

MaxCompiler

Dataflow Design(.maxj)

.max file developer

.max file user

.max file

MaxCompilerApp Packager

MATLAB C/C++ R PythonExcel

.mex .m

Testing /Application integration

App Installer

SLiC level programming

Page 84: Anegdotic Maxeler (Romania)

84/8384/83

Getting Started a Practical Work from the Linux Shell1. Open a shell terminal (e.g., $ /usr/bin/xfce4-terminal).

2. Connect to the Maxeler machine (e.g., $ ssh [email protected]).

3. If more shell screens needed, start screen (e.g., $ screen).

4. Switch to the directory that contains the 2n+3 programs you wrote (e.g., $ cd Desktop/workspace/src/ind/z88/).

5. Prepare your C code for measuring the execution time (e.g., clock_gettime(CLOCK_REALTIME, &t2);).

6. See what you can do (e.g., $ make).

7. Select one of those that you can do (e.g., $ make build-sim, $ make run-sim, $ make build-hw, $ make run-hw).

8. Measure the power consumption at the wall plug.

Page 85: Anegdotic Maxeler (Romania)

85/8385/83© H. Maurer85 [email protected]

Q&A