Experts in numerical algorithms and HPC services
Implementing Algorithms
Edvin Deadman
10th May 2013
NAG Background
Founded 1970
Not-for-profit organisation
Surpluses fund on-going R&D
Mathematical and Statistical Expertise
Libraries of components
Consulting
HPC Services
Computational Science and Engineering (CSE) support
Procurement advice, market watch, benchmarking
Outline
Numerical issues:
Dealing with floating-point arithmetic, overflow and underflow
Performance:
Parallelism, measuring performance
Testing:
How do we know we’re getting the right answer?
Overflow - example
Cauchy’s Theorem:
Integral estimated by quadrature (trapezium rule)
Leads to a numerical differentiation method
Error estimate for the method is available
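The formula on this slide did not survive extraction. Assuming it is Cauchy's integral formula for derivatives, f^(n)(a) = n!/(2πi) ∮ f(z)/(z − a)^(n+1) dz, taken over a circle of radius r centred at a, the trapezium rule with m equally spaced points gives the estimate sketched below. The names `cauchy_derivative`, `radius` and `num_points` are illustrative and not taken from the talk.

```python
import cmath
import math


def cauchy_derivative(f, a, n, radius=1.0, num_points=64):
    """Estimate the n-th derivative of f at a.

    Applies the trapezium rule to Cauchy's integral formula on the
    circle z = a + radius * exp(i * theta), 0 <= theta < 2*pi.
    """
    total = 0j
    for k in range(num_points):
        theta = 2.0 * math.pi * k / num_points
        z = a + radius * cmath.exp(1j * theta)
        total += f(z) * cmath.exp(-1j * n * theta)
    # n! appears explicitly: this is the factor that can overflow
    # when it is stored in a fixed-width integer.
    return math.factorial(n) * total / (num_points * radius ** n)


estimate = cauchy_derivative(cmath.exp, 0.0, 3)
print(abs(estimate - 1.0))  # every derivative of exp at 0 equals 1
```

Python integers have arbitrary precision, so `math.factorial` cannot overflow here; in a language with fixed-width integers the n! factor is where the trouble starts.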
Algorithm involves n!
32-bit signed integer, maximum: 2 147 483 647
But: 13! = 6 227 020 800
With silent wraparound (arithmetic modulo 2^32), a compiler may represent 13! as 1 932 053 504
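The wrapped value can be checked by hand. Python integers do not overflow, so the sketch below simulates 32-bit two's-complement storage explicitly; `wrap_int32` is a hypothetical helper written for this illustration.

```python
import math

INT32_MAX = 2**31 - 1  # 2 147 483 647


def wrap_int32(value):
    """Return value as a 32-bit two's-complement integer would store it."""
    value &= 0xFFFFFFFF  # keep the low 32 bits
    return value - 2**32 if value >= 2**31 else value


for n in (12, 13):
    exact = math.factorial(n)
    print(n, exact, exact <= INT32_MAX, wrap_int32(exact))
```

12! still fits in 32 bits; 13! is the first factorial that does not.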
Underflow
Smallest positive normalized double precision floating-point number is ~2.2 × 10^-308
Some systems will flush to zero.
Others may use denormalized numbers – performance penalty.
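These thresholds can be inspected directly. A minimal sketch, assuming IEEE 754 double precision with gradual underflow enabled (the usual case for CPython on mainstream hardware; a flush-to-zero system would behave differently):

```python
import sys

smallest_normal = sys.float_info.min  # about 2.2e-308
smallest_subnormal = 5e-324           # rounds to 2**-1074, the smallest positive double

# Halving the smallest normal number gives a denormalized (subnormal) value.
half_normal = smallest_normal / 2

# Halving the smallest subnormal underflows to zero.
underflowed = smallest_subnormal / 2

print(smallest_normal, half_normal, smallest_subnormal, underflowed)
```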
Underflow - example
Converging iteration on a vector:
(2.5, 1.3, 1.6, 0.1)
(2.001, 1.002, 1.999, 10^-50)
(2.0001, 1.0001, 1.9999, 10^-308)
(2.00001, 1.00001, 1.99999, 10^-310)
Iterations become slower once the last component reaches the denormalized range (around 10^-308 and below).
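To see when a decaying component enters the denormalized range, each iterate can be classified against the smallest normalized number. The sketch below uses a made-up decay factor rather than the iteration from the slide, and the helper names are hypothetical.

```python
import sys


def classify(value):
    """Label a double as 'normal', 'subnormal' or 'zero'."""
    if value == 0.0:
        return "zero"
    if abs(value) < sys.float_info.min:
        return "subnormal"
    return "normal"


def decay_history(start=0.1, factor=1e-44, steps=9):
    """Multiply start by factor repeatedly and record each value's class."""
    history = []
    value = start
    for _ in range(steps):
        history.append((value, classify(value)))
        value *= factor
    return history


for value, label in decay_history():
    print(f"{value:.3e}  {label}")
```

With these assumed parameters the component is normal for seven steps, subnormal for one, and then underflows to zero.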
Reproducibility
Order of floating-point operations can affect the result:
Add 1 first, then the small terms:
    u = eps/2;
    x = 0;
    x = x + 1;
    for i = 1:10
        x = x + u;
    end
    % x = 1

Add the small terms first, then 1:
    u = eps/2;
    x = 0;
    for i = 1:10
        x = x + u;
    end
    x = x + 1;
    % x = 1.000000000000001
Parallelism can change the order of operations, so it can’t guarantee reproducibility
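The same experiment can be repeated outside MATLAB. A minimal Python sketch, assuming IEEE 754 double precision with round-to-nearest-even, where `sys.float_info.epsilon` plays the role of MATLAB's `eps`:

```python
import sys

u = sys.float_info.epsilon / 2  # half the spacing between 1 and the next double


def add_one_first(count=10):
    """Start from 1, then add u repeatedly: each 1 + u rounds back to 1."""
    x = 0.0
    x = x + 1.0
    for _ in range(count):
        x = x + u
    return x


def add_one_last(count=10):
    """Accumulate the small terms first, then add 1: nothing is lost."""
    x = 0.0
    for _ in range(count):
        x = x + u
    return x + 1.0


print(repr(add_one_first()), repr(add_one_last()))
```

A parallel reduction that splits the sum across threads changes the grouping of the additions in just this way, which is why results can differ between runs or thread counts.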
Matrix Square Root: Cache
Cache miss: failed attempt to obtain data from cache.
Cache size is a power of 2.
If n = 2^k, lines or blocks of the matrix “line up” (map to the same cache sets) and exceed the cache associativity, causing extra cache misses.
[Figure: memory hierarchy, from fastest to slowest: processor registers, L1 cache, L2 cache, RAM]
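The “lining up” effect can be illustrated by counting which cache sets a strided walk touches. The sketch below assumes a hypothetical cache with 64-byte lines and 64 sets, and a row-major n-by-n matrix of 8-byte doubles; none of these parameters come from the talk.

```python
def distinct_cache_sets(n, element_bytes=8, line_bytes=64, num_sets=64):
    """Count the cache sets touched when walking down one column
    of a row-major n-by-n matrix (stride of n elements)."""
    touched = set()
    for row in range(n):
        address = row * n * element_bytes           # byte offset of element (row, 0)
        touched.add((address // line_bytes) % num_sets)
    return len(touched)


print(distinct_cache_sets(1024))  # power of 2: every access lands in the same set
print(distinct_cache_sets(1025))  # one extra column spreads accesses over all sets
```

When every access maps to one set, only as many lines as the associativity allows can stay cached, so padding the leading dimension by one element is a common workaround.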
Parallelism: Matrix Square Root
The fastest algorithm in serial is not always fastest in parallel
Point: matrix square root with no blocking
Block: standard blocking scheme
Recursion: recursive blocking scheme
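For reference, a minimal sketch of an unblocked (“point”) recurrence for the square root of an upper triangular matrix, computed one column at a time. It assumes a real upper triangular input with positive diagonal and is not the implementation benchmarked in the talk; the function name is hypothetical.

```python
import math


def sqrtm_upper_triangular(t):
    """Return upper triangular u with u @ u == t (up to rounding).

    Uses u[i][i] = sqrt(t[i][i]) and, for i < j,
    u[i][j] = (t[i][j] - sum_{i<k<j} u[i][k] * u[k][j]) / (u[i][i] + u[j][j]).
    """
    n = len(t)
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[j][j] = math.sqrt(t[j][j])
        for i in range(j - 1, -1, -1):  # work up the column from the diagonal
            s = sum(u[i][k] * u[k][j] for k in range(i + 1, j))
            u[i][j] = (t[i][j] - s) / (u[i][i] + u[j][j])
    return u


t = [[4.0, 5.0, 6.0],
     [0.0, 9.0, 7.0],
     [0.0, 0.0, 16.0]]
u = sqrtm_upper_triangular(t)
print(u)
```

Each entry depends on entries to its left and below it, which limits how much of the work can proceed at once; blocked variants apply the same recurrence to submatrices instead of scalars.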
Parallelism: Schur-Parlett Algorithm for f(A)
Performance gain in parallel may depend on the input data
1. Schur decomposition
2. Group eigenvalues into diagonal blocks. Taylor series to evaluate f(A) on blocks.
3. Off-diagonal blocks computed via Sylvester equations.
4. “Undo” Schur decomposition.
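Step 2 is where the input data enters: the number and size of the diagonal blocks depend on how the eigenvalues cluster. A minimal sketch of one possible grouping rule, in which eigenvalues closer than a tolerance `delta` end up in the same block; the rule, the default `delta=0.1` and the function name are assumptions made for illustration.

```python
def group_eigenvalues(eigenvalues, delta=0.1):
    """Group indices so that eigenvalues linked by gaps <= delta share a block."""
    n = len(eigenvalues)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(eigenvalues[i] - eigenvalues[j]) <= delta:
                parent[find(i)] = find(j)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


print(group_eigenvalues([1.0, 1.05, 3.0, 1.12, 5.0]))  # mixed: one cluster, two singletons
print(group_eigenvalues([2.0, 2.01, 2.02]))            # tightly clustered: one large block
```

Many small blocks leave many independent pieces of work; a single large block leaves little to distribute, so the parallel gain differs between the two inputs.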
Testing: Matrix Functions f(A)
Computed solution can be written in terms of forward or backward errors: f(A)+∆f or f(A+∆A)
Demand normwise relative backward error of order u
Forward error should then be of order cond(f, A) u, where cond(f, A) is the condition number of f at A
Higher precision arithmetic can provide ‘exact’ result
Identities, e.g. sin²A + cos²A = I
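An identity-based test can be sketched as follows. Truncated Taylor series stand in for the matrix sine and cosine purely to keep the example self-contained (adequate only for matrices of small norm, and not how a library would compute them); all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sin_cos_taylor(a, terms=30):
    """Return (sin(A), cos(A)) from truncated Taylor series."""
    n = len(a)
    sin_a = [[0.0] * n for _ in range(n)]
    cos_a = [[0.0] * n for _ in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 / 0!
    for k in range(terms):
        sign = -1.0 if (k // 2) % 2 else 1.0        # pattern +, +, -, -, ...
        target = cos_a if k % 2 == 0 else sin_a     # even powers -> cos, odd -> sin
        for i in range(n):
            for j in range(n):
                target[i][j] += sign * term[i][j]
        term = [[x / (k + 1) for x in row] for row in matmul(term, a)]  # A^(k+1)/(k+1)!
    return sin_a, cos_a


def identity_residual(a):
    """Largest entry of |sin(A)^2 + cos(A)^2 - I|."""
    sin_a, cos_a = sin_cos_taylor(a)
    s2, c2 = matmul(sin_a, sin_a), matmul(cos_a, cos_a)
    n = len(a)
    return max(abs(s2[i][j] + c2[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


print(identity_residual([[0.3, 0.7], [0.2, -0.5]]))
```

A small residual is necessary but not sufficient: an identity can hold even when both computed functions share the same error, so such checks complement comparison against a higher-precision reference rather than replace it.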
Summary
Numerical issues:
Numbers in a computer aren’t like numbers in “real life”
Performance:
It’s not as simple as “larger n makes it slower; more processors makes it faster”
Testing:
Takes longer than writing code in the first place
Why not take a look? Implemented in:
NAG C Library
NAG Fortran Library
NAG Library for Java
NAG Toolbox for MATLAB
NAG Library for SMP & Multi-core
Portfolio
Numerical Libraries
Highly flexible for use in many computing languages, programming environments and hardware platforms, and for high performance computing methods
Connector products for Excel, LabVIEW, Python, R and Visual Basic
Giving users of spreadsheets, software packages and scripting environments access to NAG’s library of highly optimized and often superior numerical routines
NAG Fortran Compiler and GUI based Windows Compiler: Fortran Builder
Product and HPC Training
NAG Library and Toolbox Contents
Root Finding
Summation of Series
Quadrature
Ordinary Differential Equations
Partial Differential Equations
Numerical Differentiation
Integral Equations
Mesh Generation
Interpolation
Curve and Surface Fitting
Optimization
Approximations of Special Functions
Dense Linear Algebra
Sparse Linear Algebra
Correlation & Regression Analysis
Multivariate Methods
Analysis of Variance
Random Number Generators
Univariate Estimation
Nonparametric Statistics
Smoothing in Statistics
Contingency Table Analysis
Survival Analysis
Time Series Analysis
Operations Research
Most Universities have a NAG site licence
ARE YOU USING NAG?
ARE YOU AWARE OF YOUR UNIVERSITY LICENCE?
Typically the licence covers unlimited use on Linux, Windows and/or Mac OS, as long as it is for academic or research purposes
Installation may be on any university, staff or student machine
Products & Training
NAG Libraries (Fortran, C, Data Mining Components, Java, .NET)
NAG Toolbox for MATLAB, Numerical Components for GPU (CUDA)
NAG Fortran Compiler
Getting access www.nag.co.uk ; [email protected]
UK Academic Account Manager: [email protected]