Edvin Deadman - NAG

Experts in numerical algorithms and HPC services

Implementing Algorithms

Edvin Deadman

10th May 2013


NAG Background

Founded 1970

Not-for-profit organisation

Surpluses fund ongoing R&D

Mathematical and Statistical Expertise

Libraries of components

Consulting

HPC Services

Computational Science and Engineering (CSE) support

Procurement advice, market watch, benchmarking


Outline

Numerical issues:

Dealing with floating-point arithmetic, overflow and underflow

Performance:

Parallelism, measuring performance

Testing:

How do we know we’re getting the right answer?


Overflow - example

Cauchy’s Theorem:

Integral estimated by quadrature (trapezium rule)

Leads to a numerical differentiation method

Error estimate for the method is available
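The method can be illustrated with a short sketch. Cauchy's integral formula gives f^(n)(a) = n!/(2πi) ∮ f(z)/(z − a)^(n+1) dz; on the circle z = a + r·e^(iθ) this becomes n!/(2π r^n) ∫ f(a + r·e^(iθ)) e^(−inθ) dθ over [0, 2π], and the trapezium rule with N equally spaced nodes turns it into a finite sum. The Python below is a minimal sketch assuming f is analytic on and inside the circle; the function name `cauchy_derivative` and the defaults r = 0.5, N = 64 are hypothetical choices, and the error estimate mentioned above is not reproduced. Note where n! enters: Python's `math.factorial` returns an exact integer, whereas a 32-bit integer type already overflows at 13!.

```python
import cmath
import math


def cauchy_derivative(f, a, n, r=0.5, N=64):
    """Estimate the n-th derivative of an analytic function f at the point a.

    Applies the trapezium rule with N equally spaced nodes on the circle
    |z - a| = r to Cauchy's integral formula (illustrative sketch).
    """
    total = 0j
    for k in range(N):
        theta = 2.0 * math.pi * k / N
        z = a + r * cmath.exp(1j * theta)
        total += f(z) * cmath.exp(-1j * n * theta)
    # math.factorial is an exact Python int, so n! itself never overflows here;
    # converting it to a double for the product fails once n > 170.
    return math.factorial(n) * total / (N * r ** n)


# Third derivative of exp at 0 (exact value 1); the result is complex.
print(cauchy_derivative(cmath.exp, 0.0, 3))
```

The radius r trades truncation error against rounding error, so a fixed r is only reasonable for small n.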


Overflow - example

Algorithm involves n!

32-bit signed integer, maximum: 2 147 483 647

But: 13! = 6 227 020 800

A compiler may silently wrap 13! around to 1 932 053 504 (that is, 6 227 020 800 − 2^32)
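To see the wrap-around concretely without relying on a particular compiler, the sketch below emulates a two's-complement 32-bit signed integer in Python. This is an assumption about the target: in C and C++ signed overflow is undefined behaviour, so wrapping to 1 932 053 504 is what typical hardware produces rather than something the language guarantees. The helper names `wrap_int32` and `factorial_int32` are hypothetical.

```python
import math

INT32_MAX = 2**31 - 1  # 2 147 483 647


def wrap_int32(value):
    """Return the value a two's-complement 32-bit signed integer would hold."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def factorial_int32(n):
    """Compute n! as if every intermediate product were stored in 32 bits."""
    result = 1
    for k in range(2, n + 1):
        result = wrap_int32(result * k)
    return result


print(math.factorial(13), factorial_int32(13))  # 6227020800 1932053504
```

Up to 12! the two agree; from 13! onwards the 32-bit result is silently wrong, with no error raised.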


Underflow

Smallest positive normalized double-precision floating-point number is about 2.2 × 10^-308

Some systems flush smaller results to zero.

Others use denormalized (subnormal) numbers, which can incur a performance penalty.
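Gradual underflow can be observed directly from Python, whose float is an IEEE 754 double. The sketch below repeatedly halves 1.0 until it reaches zero and counts how many of the intermediate values were denormalized. It assumes the platform does not flush to zero (the usual default for CPython); on a flush-to-zero system the loop would stop one halving after the smallest normalized number, with no denormalized values at all.

```python
import sys

smallest_normal = sys.float_info.min  # about 2.2e-308

x = 1.0
steps = 0            # halvings performed until x became exactly zero
subnormal_steps = 0  # how many of those halvings produced a denormalized value
while x != 0.0:
    x /= 2.0
    steps += 1
    if 0.0 < x < smallest_normal:
        subnormal_steps += 1

print(steps, subnormal_steps)  # 1075 52 when subnormals are supported
```

The 52 extra values between 2^-1023 and 2^-1074 are the denormalized range: they extend the representable numbers downwards at the cost of precision (and, on many processors, speed).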


Underflow - example

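Converging iteration on a vector (successive iterates):

(2.5, 1.3, 1.6, 0.1)

(2.001, 1.002, 1.999, 10^-50)

(2.0001, 1.0001, 1.9999, 10^-308)

(2.00001, 1.00001, 1.99999, 10^-310)

Iterations run slower once the small component enters the denormalized range.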
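A quick way to check which of these iterates involve denormalized numbers is to compare each component with the smallest normalized double. The helper `is_subnormal` is a hypothetical name; it only classifies values and does not reproduce the slowdown itself, which depends on the processor.

```python
import sys


def is_subnormal(x: float) -> bool:
    """True if x is nonzero but smaller in magnitude than the smallest normalized double."""
    return x != 0.0 and abs(x) < sys.float_info.min


# The four iterates from the slide.
iterates = [
    (2.5, 1.3, 1.6, 0.1),
    (2.001, 1.002, 1.999, 1e-50),
    (2.0001, 1.0001, 1.9999, 1e-308),
    (2.00001, 1.00001, 1.99999, 1e-310),
]

flags = [any(is_subnormal(c) for c in v) for v in iterates]
print(flags)  # [False, False, True, True]
```

Strictly, 10^-308 is already just below the smallest normalized double (about 2.2 × 10^-308), so the third iterate is denormalized as well; "~10^-308" is an order-of-magnitude figure.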


Reproducibility

Order of floating-point operations can affect the result:

>> u = eps/2
>> x = 0
>> x = x+1
>> for i=1:10
>> x = x+u
>> end
x = 1

>> u = eps/2
>> x = 0
>> for i=1:10
>> x = x+u
>> end
>> x = x+1
x = 1.000000000000001

Parallelism can change the order of operations from run to run, so it cannot guarantee reproducibility
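The same experiment can be repeated in Python, whose float is an IEEE 754 double like MATLAB's default type; `sys.float_info.epsilon` plays the role of MATLAB's `eps`. The last step uses `math.fsum`, which returns the correctly rounded sum whatever the order of its inputs; it is shown only as one way to remove the order dependence, not as what parallel codes actually do.

```python
import math
import sys

u = sys.float_info.epsilon / 2  # the slide's eps/2

# Order 1: add 1 first, then u ten times. Each 1 + u is a tie that rounds back to 1.
forward = 0.0
forward += 1.0
for _ in range(10):
    forward += u

# Order 2: accumulate the small terms first (exactly), then add 1.
backward = 0.0
for _ in range(10):
    backward += u
backward += 1.0

# math.fsum returns the correctly rounded sum, independent of input order.
exact = math.fsum([1.0] + [u] * 10)

print(forward, backward, exact)  # 1.0 1.000000000000001 1.000000000000001
```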


Performance: Matrix Square Root

Matrix size n increases in increments of 50


Performance: Matrix Square Root

Add n=512, 1024, 2048 to the plot


Matrix Square Root: Cache

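Cache miss: a failed attempt to read data from the cache, so the data must be fetched from a slower level of memory.

Cache size is a power of 2.

If n = 2^k, lines or blocks of the matrix "line up" (map to the same cache sets) and exceed the cache associativity.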

Memory hierarchy (slower with distance from the processor): Processor → Registers → L1 Cache → L2 Cache → RAM
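The power-of-two effect can be illustrated with a simple model of a set-associative cache. The sketch below walks along row 0 of an n × n column-major matrix of doubles (consecutive elements are n apart in memory) and counts how many distinct cache sets the accesses map to. The cache parameters (64-byte lines and 64 sets, as in a 32 KiB, 8-way L1 cache) and the assumption that the matrix starts at the beginning of set 0 are hypothetical, chosen only for illustration.

```python
def cache_sets_touched(n, elem_bytes=8, line_bytes=64, num_sets=64):
    """Count distinct cache sets hit when reading row 0 of an n x n
    column-major matrix, i.e. elements n apart in memory.

    Hypothetical cache model: num_sets sets of line_bytes-byte lines,
    with the matrix assumed to start at the beginning of set 0.
    """
    sets = set()
    for j in range(n):
        address = j * n * elem_bytes  # byte offset of element (0, j)
        sets.add((address // line_bytes) % num_sets)
    return len(sets)


for n in (2000, 2048):
    print(n, cache_sets_touched(n))  # 2000 -> 32 sets, 2048 -> 1 set
```

For n = 2048 every access lands in the same set, so an 8-way cache can hold at most 8 of those lines at once and the rest are evicted repeatedly; for n = 2000 the accesses spread over 32 sets.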


Parallelism: Matrix Square Root

The fastest algorithm in serial is not always the fastest in parallel

Point: matrix square root with no blocking

Block: standard blocking scheme

Recursion: recursive blocking scheme
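The "Point" variant can be sketched as the classical element-by-element recurrence for the square root U of an upper triangular (Schur) factor T, with U² = T: u_ii = √t_ii and u_ij = (t_ij − Σ_{k=i+1}^{j−1} u_ik u_kj)/(u_ii + u_jj). The pure-Python sketch below assumes T is already upper triangular with positive real diagonal entries; the name `sqrtm_upper_point` is hypothetical, and this is not NAG's implementation. Roughly speaking, the blocked and recursive schemes apply the same recurrence to blocks, which turns the inner work into matrix-matrix operations.

```python
import math


def sqrtm_upper_point(T):
    """Square root U (U @ U == T) of an upper triangular matrix T, given as a
    list of lists, computed one element at a time with no blocking.

    Assumes T has real, positive diagonal entries.
    """
    n = len(T)
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        U[j][j] = math.sqrt(T[j][j])
        # Move up column j; entry (i, j) needs row i (earlier columns)
        # and column j below row i (already computed in this pass).
        for i in range(j - 1, -1, -1):
            s = sum(U[i][k] * U[k][j] for k in range(i + 1, j))
            U[i][j] = (T[i][j] - s) / (U[i][i] + U[j][j])
    return U


T = [[4.0, 1.0, 2.0], [0.0, 9.0, 3.0], [0.0, 0.0, 16.0]]
U = sqrtm_upper_point(T)
```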


Parallelism: Schur-Parlett Algorithm for f(A)

Performance gain in parallel may depend on the input data

1. Schur decomposition

2. Group eigenvalues into diagonal blocks. Taylor series to evaluate f(A) on blocks.

3. Off-diagonal blocks computed via Sylvester equations.

4. “Undo” Schur decomposition.
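Steps 2 and 3 generalize the classical Parlett recurrence to blocks. As a hedged illustration (the unblocked textbook recurrence, not the blocked algorithm above and not NAG's implementation), the sketch below evaluates f(T) for an upper triangular T with distinct diagonal entries, using the relation F T = T F. The division by t_jj − t_ii shows why nearby eigenvalues are grouped into the same diagonal block in step 2. The name `parlett` is hypothetical.

```python
import math


def parlett(T, f):
    """Evaluate f(T) for an upper triangular T (list of lists) with distinct
    diagonal entries, using the Parlett recurrence derived from F T = T F.
    """
    n = len(T)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f(T[i][i])
    # Fill one superdiagonal at a time, so every entry the recurrence needs is ready.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            s = T[i][j] * (F[j][j] - F[i][i])
            for k in range(i + 1, j):
                s += T[i][k] * F[k][j] - F[i][k] * T[k][j]
            # Division by an eigenvalue difference: inaccurate if T[i][i] is close to T[j][j].
            F[i][j] = s / (T[j][j] - T[i][i])
    return F


E = parlett([[1.0, 1.0], [0.0, 2.0]], math.exp)  # E[0][1] equals e**2 - e
```

Because the eigenvalue distribution decides how the blocks are formed, the amount of work per block, and hence the parallel speedup, depends on the input matrix.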


Testing: Matrix Functions f(A)

Computed solution can be written in terms of forward or backward errors: f(A)+∆f or f(A+∆A)

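Demand a normwise relative backward error of order u (the unit roundoff)

Forward error should then be of order cond(f, A)·u, where cond(f, A) is the condition number of f at A

Higher-precision arithmetic can provide an 'exact' reference result

Identities, e.g. sin²A + cos²A = I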
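The identity test can be sketched in a few lines. Below, sin A and cos A are computed by truncated Taylor series (a hypothetical helper that is only adequate for small-norm A, not a production algorithm) and the residual ‖sin²A + cos²A − I‖∞ is compared with the unit roundoff u. A residual of a modest multiple of u is consistent with a correct implementation; passing such a check is necessary but not sufficient for correctness.

```python
import sys


def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def taylor_sin_cos(A, terms=30):
    """sin(A) and cos(A) by truncated Taylor series; only adequate for small-norm A."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    S = [[0.0] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        # term holds A^k / k!; even k feed cos, odd k feed sin, with sign (-1)^(k // 2).
        sign = -1.0 if (k // 2) % 2 else 1.0
        target = C if k % 2 == 0 else S
        for i in range(n):
            for j in range(n):
                target[i][j] += sign * term[i][j]
        term = [[x / (k + 1) for x in row] for row in matmul(term, A)]
    return S, C


def identity_residual(A):
    """Infinity norm of sin(A)^2 + cos(A)^2 - I."""
    S, C = taylor_sin_cos(A)
    S2, C2 = matmul(S, S), matmul(C, C)
    n = len(A)
    return max(
        sum(abs(S2[i][j] + C2[i][j] - (1.0 if i == j else 0.0)) for j in range(n))
        for i in range(n)
    )


u = sys.float_info.epsilon / 2  # unit roundoff for IEEE double precision
A = [[0.5, 0.2, -0.1], [0.3, -0.4, 0.25], [0.0, 0.1, 0.6]]
r = identity_residual(A)
print(r / u)  # residual measured in units of u; expect a small multiple
```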


Summary

Numerical issues:

Numbers in a computer aren’t like numbers in “real life”

Performance:

It’s not as simple as “larger n makes it slower; more processors makes it faster”

Testing:

Takes longer than writing the code in the first place


Why not take a look? Implemented in:

NAG C Library

NAG Fortran Library

NAG Library for Java

NAG Toolbox for MATLAB

NAG Library for SMP & Multi-core


Numerical Excellence

Louise Mitchell

10th May 2013


Portfolio

Numerical Libraries: highly flexible for use in many computing languages, programming environments and hardware platforms, and with high-performance computing methods

Connector products for Excel, LabVIEW, Python, R and Visual Basic

Give users of spreadsheets, software packages and scripting environments access to NAG's library of highly optimized, often superior, numerical routines

NAG Fortran Compiler and GUI-based Windows compiler: Fortran Builder

Product and HPC Training


NAG Library and Toolbox Contents

Root Finding

Summation of Series

Quadrature

Ordinary Differential Equations

Partial Differential Equations

Numerical Differentiation

Integral Equations

Mesh Generation

Interpolation

Curve and Surface Fitting

Optimization

Approximations of Special Functions

Dense Linear Algebra

Sparse Linear Algebra

Correlation & Regression Analysis

Multivariate Methods

Analysis of Variance

Random Number Generators

Univariate Estimation

Nonparametric Statistics

Smoothing in Statistics

Contingency Table Analysis

Survival Analysis

Time Series Analysis

Operations Research


Most universities have a NAG site licence

ARE YOU USING NAG?

ARE YOU AWARE OF YOUR UNIVERSITY LICENCE?

Typically the licence covers unlimited use on Linux, Windows and/or Mac OS, as long as it is for academic or research purposes

Installation may be on any university, staff or student machine

Products & Training: NAG Libraries (Fortran, C, Data Mining Components, Java, .NET)

NAG Toolbox for MATLAB, Numerical Components for GPU (CUDA)

NAG Fortran Compiler

Getting access: www.nag.co.uk; [email protected]

UK Academic Account Manager: [email protected]
