SoC HPC: Design, Optimization, and Application to Algorithmic Trading

SoC HPC: Design and Optimization

Mark Delgado, BS in Nuclear Engineering from NC State

Python User

Presentation Topics

Why?
SoC Choices and Economics
Software and IT Stack
Cluster Design and Decisions
Optimizations and Improvements
What has been done today
What will be done tomorrow

Not Presentation Topics

Calculation and Data Decisions
Data Acquisition
Parameter Selections
Application Strategies
Broker Selection and Integration
Heavy Quantitative Finance
Cluster Application Strategies
Source Code

Why?

Professional Curiosity, New Challenges, New Technologies

Project Hypothesis

Can I build a system that can perform massive amounts of calculations?

Can I then use this system to solve problems, find relationships, and find strategies?

Can I build or modify the system to take any strategies and apply them?

What Kind of System? What is HPC?

Titan Supercomputer

Titan Economics

18,688 Nodes with 16 Cores per Node
299,008 Total CPU Cores
18,688 GPUs
Total Cost: $97,000,000
Individual Unit? Only $15,000!

SoC Choices and Economics

Raspberry Pi 3 1.2 GHz Quad Core ARM, 1GB RAM, $35

Parallella Dual Core ARM, 16 Core RISC CPU, $150

Odroid-XU4 Quad Core ARM 1.5 GHz, Quad Core ARM 2.0 GHz, 2GB RAM, $75

SoC vs Server?

XU4 Energy Requirements: 20 Watts

Server Energy Requirements: 750 Watts

Total Yearly Cost of Server ~$725

Total Yearly Cost of XU4 ~$89
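The wattage figures above convert to yearly energy as sketched below. The slides do not state the electricity rate or what else the yearly totals include, so this sketch stops at kilowatt-hours. The function name and the always-on assumption are illustrative, not from the talk.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the machine runs continuously


def annual_kwh(watts: float) -> float:
    """Energy used in one year of continuous operation, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000.0


server_kwh = annual_kwh(750)  # server figure from the slide
xu4_kwh = annual_kwh(20)      # XU4 figure from the slide

print(f"Server: {server_kwh:.1f} kWh/year")
print(f"XU4:    {xu4_kwh:.1f} kWh/year")
print(f"Ratio:  {server_kwh / xu4_kwh:.1f}x")
```

Multiplying either figure by a local price per kWh gives the energy part of the yearly cost.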

Software and IT Stack

Software and IT Stack

1 Gbps switch
Cat6 cables
1 Gbps-capable SoC and laptop
Configured and mounted NFSv4 folders and partitions
SSH access

Software and IT Stack

What is the System Being Designed for?
Ease of Use and Support?
Less Ease, Less Support, More Performance?

Software and IT Stack

Pure and Raw Performance?
Less Support, More Difficult to Use
Difficult to Set Up, Difficult to Hand Off
5-10% increase, modern software

Software and IT Stack

Languages Used: Python, Cython, C/C++

Message Passing: OpenMPI, 0mq

Networking: 0mq

Database: MongoDB

Software and IT Stack

Python Modules:

Message Passing: pyzmq, mpi4py

Networking: pyzmq

Database: PyMongo

All modules can be installed with pip!

Software and IT Stack

Cluster Design and Decisions

The Buy Strategy: MACD Cross Over
The Sell Strategy: TP/SL
Timeframe: Weekly
Data Resolution: Minute
Question: Using a MACD Cross Over as a buy strategy, and a TP/SL as a sell strategy, is there a combination that yields higher ROI vs the weekly ROI of that equity?
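A minimal sketch of the two strategies, in pure Python. It assumes the textbook MACD definition (fast EMA minus slow EMA, with a signal line that is an EMA of that difference) and treats TP and SL as percentage moves from the entry price. The talk does not show its own implementation, so the function names and both interpretations are assumptions.

```python
from typing import List, Tuple


def ema(values: List[float], period: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_buy_signals(prices: List[float], fast: int, slow: int,
                     signal: int) -> List[int]:
    """Indices where the MACD line crosses above its signal line."""
    macd = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    sig = ema(macd, signal)
    return [i for i in range(1, len(prices))
            if macd[i - 1] <= sig[i - 1] and macd[i] > sig[i]]


def tp_sl_exit(prices: List[float], entry: int,
               tp_pct: float, sl_pct: float) -> Tuple[int, float]:
    """Exit at the first bar that hits take-profit or stop-loss.

    Returns (exit_index, percent_return). If neither level is hit,
    the position is closed at the last bar.
    """
    entry_price = prices[entry]
    for i in range(entry + 1, len(prices)):
        change = (prices[i] - entry_price) / entry_price * 100.0
        if change >= tp_pct or change <= -sl_pct:
            return i, change
    last = len(prices) - 1
    return last, (prices[last] - entry_price) / entry_price * 100.0


# Toy price series (not market data): falls, then recovers.
toy_prices = [10.0, 9.0, 8.0, 7.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
entries = macd_buy_signals(toy_prices, fast=2, slow=4, signal=2)
exits = [tp_sl_exit(toy_prices, e, tp_pct=10.0, sl_pct=10.0) for e in entries]
```

One call to `macd_buy_signals` followed by `tp_sl_exit` for each entry corresponds to a single evaluation of one parameter combination.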

Cluster Design and Decisions

Hypothesis: YES!
Problem: f(a,b,c,tp,sl)
a=2..100, b=2..100, c=2..100, tp=1..10, sl=1..10
98**3*10**2*37s = ~6600 years of calculations

Solution: Parallelization, Network Optimization, Algo Optimization
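The size of the search space can be checked with a lazy enumeration. This sketch assumes half-open ranges (2 through 99), which gives the 98 values per MACD parameter that the slide's 98**3*10**2 count uses. The variable names are illustrative.

```python
from itertools import islice, product

# Assumption: half-open ranges, matching the 98 values per parameter
# implied by the slide's 98**3 * 10**2 count.
A_VALUES = B_VALUES = C_VALUES = range(2, 100)
TP_VALUES = SL_VALUES = range(1, 11)


def parameter_grid():
    """Lazily yield every (a, b, c, tp, sl) combination."""
    return product(A_VALUES, B_VALUES, C_VALUES, TP_VALUES, SL_VALUES)


total_combinations = (len(A_VALUES) * len(B_VALUES) * len(C_VALUES)
                      * len(TP_VALUES) * len(SL_VALUES))
first_three = list(islice(parameter_grid(), 3))

print(f"{total_combinations:,} combinations")
print(first_three)
```

Because `product` is lazy, the grid never has to be held in memory as a whole, which matters at tens of millions of combinations.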

Cluster Design and Decisions

Lesson 0: Memory > Database
15 second query done every calculation
New time: ~4000 years
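One way to apply this lesson is to fetch each series once and serve every later calculation from memory. The sketch below uses a hypothetical `slow_query` function as a stand-in for the database call; the class and all names are assumptions for illustration, not the talk's code.

```python
from typing import Callable, Dict, List


class InMemoryPrices:
    """Fetch each symbol's series once, then serve it from memory."""

    def __init__(self, fetch: Callable[[str], List[float]]) -> None:
        self._fetch = fetch  # slow call, e.g. a database query
        self._cache: Dict[str, List[float]] = {}

    def get(self, symbol: str) -> List[float]:
        if symbol not in self._cache:
            self._cache[symbol] = self._fetch(symbol)
        return self._cache[symbol]


query_log: List[str] = []  # records every time the slow path is taken


def slow_query(symbol: str) -> List[float]:
    """Hypothetical stand-in for the per-calculation database query."""
    query_log.append(symbol)
    return [100.0, 101.0, 99.5]  # placeholder values, not real data


prices = InMemoryPrices(slow_query)
for _ in range(1000):          # 1,000 calculations over the same series
    series = prices.get("XYZ")

print(f"queries issued: {len(query_log)}")
```

The loop runs 1,000 calculations but pays the query cost only on the first one.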

Cluster Design and Decisions

0mq Pub/Sub Network Architecture

Cluster Design and Decisions

Lesson 1: Avoid ‘Pre-Processing’ Data

More Gbps = More Time

Cluster Design and Decisions

New Calculation time: ~2s
New Total time: 11.1 years

Cluster Design and Decisions

Lesson 2: Memory > Network

Cluster Design and Decisions

Lesson 3: Parallelize Everything
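A minimal sketch of spreading a parameter search across CPU cores with the standard library's `multiprocessing.Pool`. The `score` function is a toy objective standing in for a real backtest, and the chunking scheme is one simple choice among many. Neither is taken from the talk.

```python
from itertools import product
from multiprocessing import Pool


def score(params):
    """Toy stand-in for one backtest; returns (roi, params)."""
    a, b, c, tp, sl = params
    roi = -abs(a - 12) - abs(b - 26) - abs(c - 9) + tp - sl  # toy objective
    return roi, params


def best_in_chunk(chunk):
    """Evaluate one worker's share of the grid and keep only its best result."""
    return max(score(p) for p in chunk)


def split(items, n):
    """Split items into n interleaved chunks of near-equal size."""
    return [items[i::n] for i in range(n)]


def search(grid, workers=4):
    """Fan the chunks out to worker processes and reduce to the overall best."""
    with Pool(workers) as pool:
        return max(pool.map(best_in_chunk, split(grid, workers)))


if __name__ == "__main__":
    small_grid = list(product(range(10, 15), range(24, 29), range(7, 12),
                              range(1, 4), range(1, 4)))
    print(search(small_grid))
```

Each worker returns only its best result, so very little data crosses process boundaries. The same pattern extends to nodes on a network if the chunks are sent to machines instead of processes.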

Cluster Design and Decisions

Different Designs Yield Different Results
Control time = 0.6s
Pub/Sub = ~1s = 11.1 years
Pub/Sub/Modified = 0.83s = 10.2 years
Pub/Sub/Modified/Parallel = 0.78s = 9.5 years

Cluster Design and Decisions

Lesson 4: Cython isn’t always the answer

Still slow, worth exploring?

Cluster Design and Decisions

Different types of clusters for different problems
Previous cluster designs = Centralized Streaming and Centralized Storage

Cluster Design and Decisions

Introducing Decentralized Streaming and Centralized Storage

Cluster Design and Decisions

Lesson 5: Good Memory Management = Good Results

Cluster Design and Decisions

Removing the network stream reduces the data transmission time to 0s

New Calculation time = 1s
New Total time = 5.56 years

Optimizations and Improvements

Lesson 6: Profile, Profile, Profile
What are the pain points in the algo?
Given the current algo design, what can be ported to C/Cython?
Are the parameters ‘good’?
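Finding the pain points can start with the standard library's `cProfile`. The sketch below profiles a deliberately naive moving average. `slow_indicator` and `backtest` are hypothetical stand-ins for the real calculation, which the talk does not show.

```python
import cProfile
import io
import pstats


def slow_indicator(prices, period):
    """Deliberately naive moving average: re-sums the window at every step."""
    return [sum(prices[i - period:i]) / period
            for i in range(period, len(prices) + 1)]


def backtest(prices):
    """Hypothetical calculation that leans on the slow indicator."""
    return sum(slow_indicator(prices, 50))


def profile(func, *args):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


toy_prices = [100.0 + (i % 7) for i in range(5000)]  # synthetic series
result, report = profile(backtest, toy_prices)
print(report)
```

Functions that dominate the cumulative-time column are the first candidates for porting to C/Cython.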

Optimizations and Improvements

Choosing ‘good’ parameters = .5s New time = 2.78 years

Exporting math to C/Cython = .2s New time = 1.1 years

Combining C/Cython and PyPy = .09s New time = 0.5 years

Choosing ‘actually good’ parameters = .06s ***Speculating*** New Time = .33 years

Optimizations and Improvements

The problem: 98**3*10**2*.06s = Total Time

98**3*10**2 = C
0.06 = t
C*t = Total Time

Optimizations and Improvements

Total Calculation number = 98**3*10**2 = 94,119,200 = C

Decrease Resolution of C = Cn
Cn = C*.99
New time after Cn = .021 years

Optimizations and Improvements

Lesson 7: IT Automation is Awesome! Especially when applied to math!

Use IT automation to determine new values of Cn and automatically parallelize calculations
New time = ***0.005-0.01 years***

Optimizations and Improvements

New Estimated Total Time: .01 years
.01 years = 3.6 days

From 6600 years to 3.6 days

Optimizations and Improvements

What did we just do?
S(M1,M2,M3,TP,SL)
M1=x1→x1*
M2=x2→x2*
M3=x3→x3*
TP=x4→x4*
SL=x5→x5*

What Has Been Done Today

Everything except PyPy and C/Cython merge, IT Automation, and IT Automation + Math

What can I show you?
Fully functioning cluster without automation
Real performance differences between Python and PyPy
NFS to aggregate the results
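Aggregating results over a shared folder can be as simple as one file per node plus a reducer that reads them all. In this sketch a temporary directory stands in for the NFS mount, and the file layout, field names, and sample values are made-up placeholders rather than the talk's actual format.

```python
import json
import tempfile
from pathlib import Path


def write_node_result(shared_dir: Path, node: str, best: dict) -> Path:
    """Each node writes its own file, so no two nodes share a path."""
    path = shared_dir / f"{node}.json"
    path.write_text(json.dumps(best))
    return path


def aggregate(shared_dir: Path) -> dict:
    """Read every node's file and return the result with the highest ROI."""
    results = [json.loads(p.read_text())
               for p in sorted(shared_dir.glob("*.json"))]
    return max(results, key=lambda r: r["roi"])


# A temporary directory stands in for the shared NFS folder.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    # Placeholder values, not real results.
    write_node_result(shared, "node1", {"roi": 1.2, "params": [12, 26, 9, 3, 1]})
    write_node_result(shared, "node2", {"roi": 2.5, "params": [8, 21, 5, 4, 2]})
    overall = aggregate(shared)

print(overall)
```

Writing one file per node avoids concurrent writes to the same file on the shared mount.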

What Will Be Done Tomorrow?

PyPy and C/Cython merge, IT Automation, and IT Automation + Math

Pandas to handle data
Matplotlib to graph potential strategies

Questions?

Thanks