SoC HPC: Design, Optimization, and Application to Algorithmic Trading

Page 1: SoC HPC: Design, Optimization, and Application to Algorithmic Trading

SoC HPC: Design and Optimization

Mark Delgado
BS in Nuclear Engineering from NC State

Python User

Presentation Topics

Why?
SoC Choices and Economics
Software and IT Stack
Cluster Design and Decisions
Optimizations and Improvements
What has been done today
What will be done tomorrow

Not Presentation Topics

Calculation and Data Decisions
Data Acquisition
Parameter Selections
Application Strategies
Broker Selection and Integration
Heavy Quantitative Finance
Cluster Application Strategies
Source Code

Why?

Professional Curiosity, New Challenges, New Technologies

Project Hypothesis

Can I build a system that can perform massive amounts of calculations?

Can I then use this system to solve problems, find relationships, and find strategies?

Can I build or modify the system to take any strategies and apply them?

What Kind of System? What is HPC?

Titan Supercomputer

Titan Economics

18,688 Nodes with 16 Cores per Node
299,008 Total CPU Cores
18,688 GPUs
Total Cost: $97,000,000
Individual Unit? Only $15,000!

SoC Choices and Economics

Raspberry Pi 3: 1.2 GHz Quad Core ARM, 1 GB RAM, $35

Parallella: Dual Core ARM, 16 Core RISC CPU, $150

Odroid-XU4: Quad Core ARM 1.5 GHz, Quad Core ARM 2.0 GHz, 2 GB RAM, $75

SoC vs Server?

XU4 Energy Requirements: 20 Watts

Server Energy Requirements: 750 Watts

Total Yearly Cost of Server ~$725

Total Yearly Cost of XU4 ~$89
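
A yearly running cost of this kind is power draw times hours in a year times an electricity rate. The sketch below assumes continuous operation and a flat rate of $0.11 per kWh; neither value is stated on the slide, and yearly_energy_cost is an illustrative name.

```python
HOURS_PER_YEAR = 24 * 365

def yearly_energy_cost(watts, usd_per_kwh=0.11):
    """Electricity cost of running a device continuously for one year.

    usd_per_kwh is an assumed flat rate, not a figure from the slides.
    """
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * usd_per_kwh

server = yearly_energy_cost(750)   # 750 W server
xu4 = yearly_energy_cost(20)       # 20 W Odroid-XU4
print(f"server: ${server:.0f}/year, XU4: ${xu4:.0f}/year")
```

Under that assumed rate the server's electricity alone comes to a little over $720 a year, close to the slide's figure. The XU4's electricity alone would be well under $89, so that figure may include costs beyond energy that the slide does not itemize.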

Software and IT Stack

1 Gbps switch
Cat6 cables
SoC and laptop supporting 1 Gbps
Configured and mounted NFSv4 folders and partitions
SSH access

Software and IT Stack

What is the System Being Designed for?
Ease of Use and Support?
Less Ease, Less Support, More Performance?

Software and IT Stack

Pure and Raw Performance?
Less Support, More Difficult to Use
Difficult to Set Up, Difficult to Hand Off
5-10% increase, modern software

Software and IT Stack

Languages Used: Python, Cython, C/C++
Message Passing: OpenMPI, 0mq
Networking: 0mq
Database: MongoDB

Software and IT Stack

Python Modules:
Message Passing: pyzmq, mpi4py
Networking: pyzmq
Database: PyMongo

All modules can be installed with pip!

Cluster Design and Decisions

The Buy Strategy: MACD Crossover
The Sell Strategy: TP/SL (take-profit/stop-loss)
Timeframe: Weekly
Data Resolution: Minute

Question: Using a MACD crossover as the buy strategy and a TP/SL as the sell strategy, is there a parameter combination that yields a higher ROI than the weekly ROI of that equity?
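
For readers unfamiliar with the two rules, here is a minimal sketch of how they are commonly defined. The MACD line is a fast EMA minus a slow EMA, the signal line is an EMA of the MACD line, and a buy fires when MACD crosses above the signal. The parameter names (fast, slow, signal), the choice to seed each EMA with its first value, and the reading of TP/SL as percentage moves from the entry price are all assumptions; the slides do not show the actual implementation.

```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2 / (period + 1)
    out = [values[0]]            # seed with the first observation (assumption)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_buy_signals(closes, fast, slow, signal):
    """Indices where the MACD line crosses above its signal line."""
    macd = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    sig = ema(macd, signal)
    return [
        i for i in range(1, len(closes))
        if macd[i - 1] <= sig[i - 1] and macd[i] > sig[i]
    ]

def tp_sl_exit(entry_price, prices, tp_pct, sl_pct):
    """First index where price hits take-profit or stop-loss, else None."""
    for i, price in enumerate(prices):
        change = (price - entry_price) / entry_price * 100
        if change >= tp_pct or change <= -sl_pct:
            return i
    return None

closes = [10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10]   # toy data: a dip, then a recovery
print(macd_buy_signals(closes, fast=2, slow=4, signal=3))  # [6]
```

In the toy series the only buy signal appears one bar after the low, once the fast average has turned up relative to the slow one.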

Cluster Design and Decisions

Hypothesis: YES!

Problem: f(a,b,c,tp,sl)
a=2..100, b=2..100, c=2..100, tp=1..10, sl=1..10
98**3 * 10**2 * 37s = ~6600 years of calculations

Solution: Parallelization, Network Optimization, Algo Optimization
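
The size of the search space can be checked directly. This is a sketch under one assumption: the slide's count of 98 values per moving-average parameter matches Python's half-open range(2, 100), while tp and sl take the ten values 1 through 10. The helper names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def grid_size(ma_values, tp_values, sl_values):
    """Number of (a, b, c, tp, sl) combinations in the full grid."""
    return len(ma_values) ** 3 * len(tp_values) * len(sl_values)

def serial_years(n_calcs, seconds_per_calc):
    """Wall-clock years to run n_calcs one after another."""
    return n_calcs * seconds_per_calc / SECONDS_PER_YEAR

ma = range(2, 100)    # 98 values, matching the slide's 98**3 (assumption)
tp = range(1, 11)     # 10 values
sl = range(1, 11)     # 10 values

C = grid_size(ma, tp, sl)
print(C)  # 94119200
print(round(serial_years(C, 37)))
```

Taken literally, one serial pass over this grid at 37 s per evaluation is on the order of 110 years. The slide's ~6600-year total presumably also counts repetitions (for example several equities or weeks) that the slide does not spell out; that reading is an assumption.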

Cluster Design and Decisions

Lesson 0: Memory > Database
15 second query done every calculation
New time: ~4000 years
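
The idea behind this lesson can be sketched as follows: fetch the price data once, then serve every later calculation from memory. The example simulates the database with a counter instead of a real MongoDB query, so query_database, load_prices, and evaluate are hypothetical stand-ins rather than the author's code.

```python
from functools import lru_cache

QUERY_COUNT = 0  # counts how often the simulated database is hit

def query_database(symbol):
    """Stand-in for a slow database query (no real database is used)."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    return tuple(100 + i for i in range(5))   # fake minute closes

@lru_cache(maxsize=None)
def load_prices(symbol):
    """Query once per symbol, then serve every later call from memory."""
    return query_database(symbol)

def evaluate(symbol, tp, sl):
    prices = load_prices(symbol)       # memory lookup after the first call
    return max(prices) - min(prices)   # placeholder calculation

for tp in range(1, 11):
    for sl in range(1, 11):
        evaluate("XYZ", tp, sl)

print(QUERY_COUNT)  # 1
```

One hundred evaluations trigger a single query, which is the effect the slide describes: the per-calculation query cost disappears from the loop.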

Cluster Design and Decisions

0mq Pub/Sub Network Architecture

Cluster Design and Decisions

Lesson 1: Avoid ‘Pre-Processing’ Data

More Gb over the network = More Time

Cluster Design and Decisions

New Calculation time: ~2s
New Total time: 11.1 years

Cluster Design and Decisions

Lesson 2: Memory > Network

Cluster Design and Decisions

Lesson 3: Parallelize Everything
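
One common way to parallelize a parameter sweep is to give each worker a disjoint slice of the grid. The sketch below assumes a static round-robin split by worker rank; the slides do not say how work was actually divided. The names rank and n_workers are illustrative, and in an MPI program they could come from mpi4py's Get_rank() and Get_size().

```python
from itertools import islice, product

def worker_slice(rank, n_workers, *axes):
    """Yield every n_workers-th parameter combination, starting at rank."""
    return islice(product(*axes), rank, None, n_workers)

# Tiny grid for illustration: 3 * 3 * 2 = 18 combinations.
axes = (range(2, 5), range(2, 5), range(1, 3))
n_workers = 4

parts = [list(worker_slice(r, n_workers, *axes)) for r in range(n_workers)]
for r, part in enumerate(parts):
    print(f"worker {r}: {len(part)} combinations")
```

Each worker regenerates the grid lazily and keeps only its own share, so no combination has to be sent over the network.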

Cluster Design and Decisions

Different Designs Yield Different Results
Control time = 0.6s
Pub/Sub = ~1s = 11.1 years
Pub/Sub/Modified = 0.83s = 10.2 years
Pub/Sub/Modified/Parallel = 0.78s = 9.5 years

Cluster Design and Decisions

Lesson 4: Cython isn’t always the answer

Still slow, worth exploring?

Cluster Design and Decisions

Different types of clusters for different problems
Previous cluster designs = Centralized Streaming and Centralized Storage

Cluster Design and Decisions

Introducing Decentralized Streaming and Centralized Storage

Cluster Design and Decisions

Lesson 5: Good Memory Management = Good Results

Cluster Design and Decisions

Removing the network stream reduces the data transmission time to 0s

New Calculation time = 1s
New Total time = 5.56 years

Optimizations and Improvements

Lesson 6: Profile, Profile, Profile
What are the pain points in the algo?
Given the current algo design, what can be ported to C/Cython?
Are the parameters ‘good’?
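
Finding the pain points can start with the standard library profiler. This is a minimal sketch using cProfile and pstats; slow_indicator and backtest are made-up stand-ins for the real algorithm, which the slides do not show.

```python
import cProfile
import io
import pstats

def slow_indicator(prices, window=50):
    """Deliberately wasteful moving average, used here as the hot spot."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - window + 1):i + 1]   # re-slices every step
        out.append(sum(chunk) / len(chunk))
    return out

def backtest(prices):
    return sum(slow_indicator(prices))

def profile_report(func, *args, top=5):
    """Run func under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()

report = profile_report(backtest, list(range(2000)))
print(report)
```

Functions that dominate the cumulative-time column are the natural candidates for porting to C or Cython.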

Optimizations and Improvements

Choosing ‘good’ parameters = .5s New time = 2.78 years

Exporting math to C/Cython = .2s New time = 1.1 years

Combining C/Cython and PyPy = .09s New time = 0.5 years

Choosing ‘actually good’ parameters = .06s ***Speculating*** New Time = .33 years

Optimizations and Improvements

The problem: 98**3 * 10**2 * .06s = Total Time

98**3 * 10**2 = C
0.06s = t
C * t = Total Time

Optimizations and Improvements

Total Calculation number = 98**3*10**2 = 94,119,200 = C

Decrease Resolution of C = Cn
Cn = C*.99
New time after Cn = .021 years
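
Decreasing the resolution of C can be read as sampling each parameter axis with a larger step. The sketch below assumes a uniform stride per axis; the stride values are illustrative and are not the ones behind the slide's figures.

```python
def coarse_axis(start, stop, step):
    """Every step-th value of range(start, stop)."""
    return range(start, stop, step)

def coarse_grid_size(ma_step, tp_step, sl_step):
    """Grid size when each axis is sampled with the given stride."""
    ma = coarse_axis(2, 100, ma_step)
    tp = coarse_axis(1, 11, tp_step)
    sl = coarse_axis(1, 11, sl_step)
    return len(ma) ** 3 * len(tp) * len(sl)

full = coarse_grid_size(1, 1, 1)     # the full grid, C
coarse = coarse_grid_size(4, 2, 2)   # stride choices are assumptions
print(full, coarse, coarse / full)
```

Because total time is C times the per-calculation time, shrinking the grid shrinks the run time by the same factor.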

Optimizations and Improvements

Lesson 7: IT Automation is Awesome! Especially when applied to math!

Use IT automation to determine new values of Cn and automatically parallelize calculations
New time = ***0.005-0.01 years***

Optimizations and Improvements

New Estimated Total Time: .01 years
.01 years = 3.6 days

From 6600 years to 3.6 days

Optimizations and Improvements

What did we just do?
S(M1,M2,M3,TP,SL)
M1 = x1 → x1*
M2 = x2 → x2*
M3 = x3 → x3*
TP = x4 → x4*
SL = x5 → x5*
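
Read as an optimization, the search maps each starting parameter to its best value, x → x*, by maximizing a score S over the grid. This is a minimal sketch with a toy score function standing in for the real backtest ROI; toy_score, its peak location, and the coarse axes are purely illustrative.

```python
from itertools import product

def best_parameters(score, *axes):
    """Return the parameter tuple that maximizes score over the grid."""
    return max(product(*axes), key=lambda params: score(*params))

def toy_score(m1, m2, m3, tp, sl):
    # Illustrative stand-in for backtest ROI; peaks at (12, 26, 9, 5, 2).
    return -((m1 - 12) ** 2 + (m2 - 26) ** 2 + (m3 - 9) ** 2
             + (tp - 5) ** 2 + (sl - 2) ** 2)

ma = range(2, 40, 2)   # coarse axes keep the example fast
best = best_parameters(toy_score, ma, ma, range(5, 15), range(1, 11), range(1, 11))
print(best)  # (12, 26, 9, 5, 2)
```

An exhaustive maximum like this is the brute-force baseline; the coarser grids and automation described in the slides are ways of reaching the same x* with fewer evaluations.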

What Has Been Done Today

Everything except PyPy and C/Cython merge, IT Automation, and IT Automation + Math

What can I show you?
Fully functioning cluster without automation
Real performance differences between Python and PyPy
NFS to aggregate the results

What Will Be Done Tomorrow?

PyPy and C/Cython merge, IT Automation, and IT Automation + Math

Pandas to handle data
Matplotlib to graph potential strategies

Questions?

Thanks