
Stud. Geophys. Geod., 58 (2014), 1-xxx, DOI: 10.1007/s11200-013-1246-3, © 2014 Inst. Geophys. AS CR, Prague

A concept for the estimation of high-degree gravity field models in a high performance computing environment

JAN MARTIN BROCKMANN, LUTZ ROESE-KOERNER AND WOLF-DIETER SCHUH

University of Bonn, Institute of Geodesy and Geoinformation, Nussallee 17, 53115 Bonn, Germany ([email protected], [email protected], [email protected])

Received: July 29, 2013; Revised: November 15, 2013; Accepted: December 17, 2013

ABSTRACT

The determination of global gravity field models by the combination of complementary observations within a joint adjustment is a computationally expensive task. Global gravity field models are typically parameterized by a finite series of spherical harmonic base functions. The effort to compute the set of coefficients depends on the one hand on the maximum degree of the spherical harmonic expansion and on the other hand on the number of observations and their stochastic characteristics. The main result of this contribution is a computation scheme for the rigorous (i.e. avoiding computationally motivated approximations) estimation of high-degree gravity field models. A conjugate gradient based iterative solver is implemented in a high performance computing environment, allowing the estimation of hundreds of thousands to millions of unknown parameters. To perform the computations in a reasonable time, the solver is designed to operate on thousands of computing cores. A flexible design is chosen to process an arbitrary number of observation groups. For each group a variance component can be estimated to derive a data adaptive weighting factor. The combined solution results from the weighted joint inversion of all observation groups, which might be provided in terms of preprocessed normal equations (e.g. from satellite gravity field missions like GRACE or GOCE) or in terms of observations like datasets of point-wise terrestrial gravity field information. A small-scale closed-loop simulation (250 000 unknowns, spherical harmonic degree and order 500) for the estimation of the Earth's gravity field serves as proof of concept. Normal equations derived from observations of three satellite missions and 15 datasets of point-wise measurements (gravity anomalies and along-track altimetry) are combined in the gravity field adjustment. The simulation study i) verifies the implementation, ii) analyzes the computational effort of the individual steps involved and iii) leads to the main conclusion that the rigorous least-squares solution can be determined in a reasonable amount of time.

Keywords: high performance computing, iterative solvers, global gravity field recovery, high-degree spherical harmonics


1. INTRODUCTION

Global gravity field models are often described as a finite series of spherical harmonic coefficients. According to Heiskanen and Moritz (1993, p. 59), the potential in a point

with spherical coordinates $(\lambda, \vartheta, r)$ can be written as

$$
V(\lambda, \vartheta, r) = \frac{GM}{a} \sum_{l=0}^{l_{\max}} \left(\frac{a}{r}\right)^{l+1} \sum_{m=0}^{l} \left( \bar{c}_{lm} \cos m\lambda + \bar{s}_{lm} \sin m\lambda \right) \bar{P}_{lm}(\cos\vartheta) \,,
\tag{1}
$$

where $l$ and $m$ denote the spherical harmonic degree and order, respectively, $\bar{c}_{lm}$ and $\bar{s}_{lm}$ the coefficients of the spherical harmonic series, $a$ the equatorial radius of the reference ellipsoid, $\bar{P}_{lm}$ the fully normalized associated Legendre functions, and $GM$ the gravitational constant. $l_{\max}$ is the maximum degree of the expansion, which defines the spatial resolution.

From a computational point of view, it is useful to generally distinguish two classes of

global gravity field models, depending on the observations used and the maximum spherical harmonic degree. The first class comprises satellite-only gravity field models, which are based on observations collected by satellites only. Gravity field models are either derived from the observations of a single satellite mission like CHAMP (CHAllenging Minisatellite Payload, Reigber et al., 2002), GRACE (Gravity Recovery And Climate Experiment, Tapley et al., 2004) or GOCE (Gravity field and steady-state Ocean Circulation Explorer, ESA, 1999) or from a combination of them. For the processing of individual satellite-only models, the computational challenge does not result from the number of parameters to be estimated but from the huge number of observations (e.g. several hundreds of millions for GRACE or GOCE over several years) and the complex stochastic characteristics of the data (see e.g. Schuh et al., 2010; Rummel et al., 2011). Iterative solvers are often used to derive fast solutions (Baur, 2009; Baur et al., 2008; Xie, 2005), e.g. for parameter tuning (Brockmann et al., 2010). The final model is then typically set up via the assembly and solution of full normal equations (NEQs) (Pail and Plank, 2003; Bruinsma et al., 2013) to derive, in addition, the full error covariance matrix. The same holds for models which combine observations of different satellite missions to derive combined satellite-only gravity field models. These combinations are usually performed at the level of NEQs. Due to the limited resolution, the computational challenges are limited (e.g. Pail et al., 2010; Bruinsma et al., 2013; Farahani et al., 2013). An exception among satellite-only models with respect to the maximum resolution are the recently derived lunar gravity field models. With the data available from the very low orbiting Gravity Recovery And Interior Laboratory (GRAIL) mission (Zuber et al., 2013), lunar gravity field models are computed up to spherical harmonic degree and order (d/o) 660 (Lemoine et al., 2013; Konopliv et al., 2013) using rigorous solution techniques as well. The resolution is high, but due to the short mission lifetime of about six months (and a 5 s sampling) the amount of data being processed is small compared to the Earth's gravity field missions. Lemoine et al. (2013) as well as Konopliv et al. (2013) use QR factorization based solvers, without providing details on the parallel implementation concept. A possible out-of-core implementation of an updated QR decomposition is given


in Gunter and Van De Geijn (2005) and the development of an in-core method in Baboulin et al. (2009). The second class comprises high-resolution combined gravity field models. In addition to

satellite observations (in the form of NEQs), also terrestrial measurements like gravity anomalies on land and altimetry over the oceans are taken into account. Using these measurements, which are sensitive to higher frequencies, the resolvable degrees increase considerably. Current combined models are available up to d/o 360 (GIF48, Ries et al., 2011), 1949 (EIGEN6C, Förste et al., 2008, 2012) and 2190 (EGM2008, Pavlis et al., 2012), which corresponds to 130 000 to 4.8 × 10^6 gravity field parameters. Whereas the GIF48 model was computed via the assembly and solution of full NEQs (Ries et al., 2011), the other models were computed introducing approximations. EIGEN6C was computed via averaging two solutions, i.e. a full NEQ solution from d/o 2 to 370 and a block-diagonal solution for the gridded high resolution data from degree 2 to 1949 (Förste et al., 2012). The final solution was composed from the full NEQ solution for degrees 2 to 260 and from the block-diagonal solution for degrees 370 to 1949. For degrees 260 to 370 the coefficients of both sets were averaged. EGM2008 was computed from an entirely block-diagonal solution. Also the involved satellite NEQs were approximated as a block-diagonal system of NEQs (Pavlis et al., 2012). To avoid computationally motivated approximations either in the model or in the

solution algorithms, a massive parallel implementation of the inversion for gravity field recovery must be used. Either direct solvers via the assembly and solution of full NEQs might be used (e.g. Fecher et al., 2011; Brockmann et al., 2013) or, in some cases, the QR decomposition of the design matrix (e.g. Gunter and Van De Geijn, 2005; Baboulin et al., 2009). Studies have shown that this is possible in reasonable time up to d/o 600 to 720 (e.g. Fecher et al., 2011; Brockmann et al., 2013; Lemoine et al., 2013; Konopliv et al., 2013). Brockmann et al. (2013) describe a massive parallel in-core implementation of the assembly and solution of full NEQs from an arbitrary number of observation groups, also including group specific variance components, using the same standard High Performance Computing (HPC) concepts as used in the iterative approach proposed in this article. Going for even higher spherical harmonic degrees, massive parallel implementations of iterative solvers are an alternative approach. While Schuh (1996) studied the general use of iterative solvers for spherical harmonic analysis and suggested the use of preconditioned conjugate gradients, Baur et al. (2008) investigated the use of iterative solvers based on QR decomposition. Both solvers are Krylov subspace-based iterative solvers and thus have comparable convergence behavior, memory requirements and numerical complexity (Baur et al., 2008). This study presents an implementation of a massive parallel iterative least-squares

solver which is capable of computing the rigorous solution for gravity field models with hundreds of thousands to millions of unknown parameters. The goal is not to provide an alternative solution technique, but to go into the details of implementing a more or less well known iterative solver in an HPC environment. Standard concepts from high performance and scientific computing are used to derive a portable implementation. Section 2 summarizes the implemented iterative solver with a special focus on the integration of variance component estimation (VCE) to derive data adaptive weights for the observation groups. Section 3 describes the computational tasks and challenges which


have to be solved for a massive parallel implementation. A small-scale closed-loop simulation is presented in Section 4 to verify and to analyze the implementation. A summary and conclusions are given in Section 5.

2. IMPLEMENTED SOLVER

The goal of this section is to review the implemented iterative solver. The algorithms are summarized and adapted for the implementation in a HPC environment. The basic concepts, i.e. the general mathematical adjustment model, the solver itself and VCE to determine the unknown weights of the complementary observation groups are introduced.

2.1. Basic setup of the adjustment model

The chosen iterative solver - a preconditioned conjugate gradient based solver (Hestenes and Stiefel, 1952) - is well understood and well documented in literature. The solver was already chosen by Schuh (1996) and tailored for the processing of GOCE observations as Preconditioned Conjugate Gradient Multiple Adjustment (PCGMA, Schuh, 1996). That implementation was optimized for the requirements of GOCE processing, i.e. the limited spherical harmonic resolution and the huge number of auto-correlated observations. The implementation introduced here is more flexible and - most important - is able to derive rigorous solutions for millions of unknown parameters. Nevertheless, the description and summary of the basic algorithm is close to Schuh (1996) and Boxhammer (2006).

Solution of combined system

The goal of the solver is to find the least-squares solution for the unknown spherical harmonic parameters arranged in the vector x. In general, it is assumed that several observation groups must be combined, which can be either available as raw observation equations (OEQs) plus an error description or as preprocessed (band-limited) NEQs which are assumed to be sufficient statistics of the original observations. Hence, on the one hand

there is a set of (linearized) OEQs for different groups $o \in \{1, \ldots, O\}$,

$$
\mathbf{l}_o + \mathbf{v}_o = \mathbf{A}_o \mathbf{x} \quad \text{and} \quad \boldsymbol{\Sigma}_{l_o l_o} = \sigma_o^2\, \mathbf{Q}_{l_o l_o} \,,
\tag{2}
$$

where $\mathbf{A}_o$ are the design matrices, $\mathbf{l}_o$ the vectors of observations and $\mathbf{v}_o$ the observation residuals. The covariance matrices of the observations $\boldsymbol{\Sigma}_{l_o l_o}$ are the rescaled cofactor matrices $\mathbf{Q}_{l_o l_o}$, which are assumed to be known (for instance derived from empirically estimated covariance functions). $\sigma_o^2$ are the unknown variance components.

On the other hand, $n \in \{1, \ldots, N\}$ indicates observation groups that are available in terms of NEQs:

$$
\frac{1}{\sigma_n^2}\, \mathbf{N}_n\, \mathbf{x} = \frac{1}{\sigma_n^2}\, \mathbf{n}_n \,,
\tag{3}
$$

$$
\hat{\mathbf{x}}_n = \left(\frac{1}{\sigma_n^2}\, \mathbf{N}_n\right)^{-1} \frac{1}{\sigma_n^2}\, \mathbf{n}_n
\quad \text{and} \quad
\mathbf{l}_n^{\mathsf{T}} \mathbf{Q}_{l_n l_n}^{-1} \mathbf{l}_n \,,
\tag{4}
$$

where $\mathbf{N}_n$ are the preprocessed normal equation matrices and $\mathbf{n}_n$ the preprocessed normal equation vectors. For the use of VCE to determine the unknown variances $\sigma_n^2$, the number of observations $n_n$ used in the assembly of the NEQs and the product $\mathbf{l}_n^{\mathsf{T}} \mathbf{Q}_{l_n l_n}^{-1} \mathbf{l}_n$ have to be known.

The joint solution $\hat{\mathbf{x}}$ determined from all observation groups is subject to

$$
\left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2}\, \mathbf{N}_n + \sum_{o=1}^{O} \frac{1}{\sigma_o^2}\, \mathbf{A}_o^{\mathsf{T}} \mathbf{Q}_{l_o l_o}^{-1} \mathbf{A}_o \right) \hat{\mathbf{x}}
= \sum_{n=1}^{N} \frac{1}{\sigma_n^2}\, \mathbf{n}_n + \sum_{o=1}^{O} \frac{1}{\sigma_o^2}\, \mathbf{A}_o^{\mathsf{T}} \mathbf{Q}_{l_o l_o}^{-1} \mathbf{l}_o \,,
\tag{5}
$$

$$
\mathbf{N} \hat{\mathbf{x}} = \mathbf{n} \,,
\tag{6}
$$

where N and n are the normal matrix and normal vector, respectively, of the weighted and combined minimization problem. For small- to medium-scale problems the assembly of N and n and the solution of Eq. (5) via Cholesky reduction is possible. Brockmann et al. (2013) for example demonstrated that a gravity field up to d/o 720 (520000 unknowns,

size of N ≈ 2 TB using double precision as a standard) can be computed in less than

2 hours on modern supercomputers (≈ 8000 cores). Nevertheless, increasing the resolution

to, e.g., d/o 2000 (4000000 unknowns, size of N ≈ 116 TB), the direct solution technique,

although implemented on a supercomputer, becomes unreasonable.
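As a quick plausibility check of the numbers quoted above, the following minimal Python sketch (an illustration added here, not part of the original study; the helper names are hypothetical) computes the number of spherical harmonic coefficients from degree 2 to $l_{\max}$ and the storage of the corresponding fully stored NEQ matrix in double precision.

```python
def num_sh_coefficients(l_max, l_min=2):
    """Number of spherical harmonic coefficients c_lm, s_lm from degree l_min to l_max."""
    # (l_max + 1)^2 coefficients for a full expansion, minus the degrees below l_min
    return (l_max + 1) ** 2 - l_min ** 2

def full_neq_size_tib(n_params, bytes_per_entry=8):
    """Storage of a fully stored dense NEQ matrix in TiB (double precision by default)."""
    return n_params ** 2 * bytes_per_entry / 2 ** 40

for l_max in (500, 720, 2000):
    u = num_sh_coefficients(l_max)
    print(f"d/o {l_max}: {u} unknowns, full NEQ matrix ~ {full_neq_size_tib(u):.1f} TiB")

# d/o 500  ->  250997 unknowns, ~0.5 TiB
# d/o 720  ->  519837 unknowns, ~2.0 TiB
# d/o 2000 -> 4003997 unknowns, ~116.7 TiB
```

These values reproduce the approximately 520 000 unknowns / 2 TB at d/o 720 and 4 000 000 unknowns / 116 TB at d/o 2000 mentioned in the text.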

Estimation of weights as variance components

Combining complementary data types to determine the joint least-squares solution requires addressing the question of how to weight the individual observation groups relative to each other. Thus, in addition to the unknown parameter vector $\mathbf{x}$, an individual weight for each group is estimated from the data. A data adaptive concept was developed by Koch and Kusche (2002), who interpreted the weights within a data combination procedure as inverse variance components (Förstner, 1979). Thus, the weights $w_i = 1/\sigma_i^2$ for each group $i \in \{o, n\}$ can be determined via VCE. Following again Koch and Kusche (2002) and Koch (1999), these can be iteratively determined via

$$
\hat{\sigma}_i^2 = \frac{\Omega_i}{n_i - m_i} = \frac{\Omega_i}{r_i} \,,
\tag{7}
$$

where $n_i$ is the number of observations of group $i$, $m_i$ the number of parameters determined by group $i$, $\Omega_i$ the weighted sum of squared residuals after iteration $\kappa$,

$$
\Omega_i = \hat{\mathbf{v}}_i^{\mathsf{T}} \mathbf{Q}_{l_i l_i}^{-1} \hat{\mathbf{v}}_i
= \left(\mathbf{A}_i \hat{\mathbf{x}} - \mathbf{l}_i\right)^{\mathsf{T}} \mathbf{Q}_{l_i l_i}^{-1} \left(\mathbf{A}_i \hat{\mathbf{x}} - \mathbf{l}_i\right)
\tag{8}
$$

$$
= \hat{\mathbf{x}}^{\mathsf{T}} \mathbf{N}_i \hat{\mathbf{x}} - 2\, \hat{\mathbf{x}}^{\mathsf{T}} \mathbf{n}_i + \mathbf{l}_i^{\mathsf{T}} \mathbf{Q}_{l_i l_i}^{-1} \mathbf{l}_i \,,
\tag{9}
$$

and $r_i$ the partial redundancy

$$
r_i = n_i - \frac{1}{\sigma_i^2}\, \operatorname{tr}\!\left( \mathbf{N}^{-1} \mathbf{N}_i \right)
= n_i - \frac{1}{\sigma_i^2}\, \operatorname{tr}\!\left( \mathbf{N}^{-1} \mathbf{A}_i^{\mathsf{T}} \mathbf{Q}_{l_i l_i}^{-1} \mathbf{A}_i \right) .
\tag{10}
$$
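To make the VCE update of Eqs (7) to (10) concrete, the following small dense sketch (an illustration, not the authors' implementation; all variable names are hypothetical) iterates the variance components for two simulated observation groups, for which the full normal matrix can still be inverted explicitly.

```python
import numpy as np

rng = np.random.default_rng(42)
u = 50                                   # number of unknown parameters
x_true = rng.normal(size=u)

groups = []
for n_obs, sigma_true in ((400, 2.0), (300, 0.5)):   # two groups with different noise levels
    A = rng.normal(size=(n_obs, u))
    l = A @ x_true + sigma_true * rng.normal(size=n_obs)
    groups.append((A, l, n_obs))

sigma2 = np.ones(len(groups))            # initial variance components
for iteration in range(5):
    # weighted combined NEQs, Eq. (5), with Q_{l_i l_i} = I
    N = sum(A.T @ A / s2 for (A, l, _), s2 in zip(groups, sigma2))
    n = sum(A.T @ l / s2 for (A, l, _), s2 in zip(groups, sigma2))
    x_hat = np.linalg.solve(N, n)
    N_inv = np.linalg.inv(N)
    for i, ((A, l, n_obs), s2) in enumerate(zip(groups, sigma2)):
        omega = np.sum((A @ x_hat - l) ** 2)                 # Eq. (8), Q = I
        r = n_obs - np.trace(N_inv @ (A.T @ A)) / s2         # Eq. (10)
        sigma2[i] = omega / r                                # Eq. (7)
    print(iteration, np.sqrt(sigma2))    # should approach the true sigmas (2.0, 0.5)
```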

To handle large systems, Koch and Kusche (2002) proposed a stochastic trace estimator to determine the trace in Eq. (10). Within this context, as a preparation for the iterative solver, where N is not explicitly determined, it is required to use the stochastic estimator as well. According to Hutchinson (1990), the estimator

$$
\operatorname{tr}(\mathbf{B}) = E\!\left\{ \boldsymbol{\varepsilon}^{\mathsf{T}} \mathbf{B}\, \boldsymbol{\varepsilon} \right\},
\qquad \varepsilon_j \in \{-1, 1\} \text{ as discrete uniform distribution,}
\tag{11}
$$

has minimal variance if $\mathbf{B}$ is symmetric. Thus Eq. (10) has to be rewritten to obtain symmetric matrices in the trace term. We have to distinguish two cases, namely groups available in terms of OEQs and NEQs.

Partial redundancy for groups n: Introducing the Cholesky decomposition $\mathbf{N}_n = \mathbf{R}_n^{\mathsf{T}} \mathbf{R}_n$, the partial redundancy can be reformulated as

$$
r_n = n_n - \frac{1}{\sigma_n^2}\, \operatorname{tr}\!\left( \mathbf{N}^{-1} \mathbf{N}_n \right)
= n_n - \frac{1}{\sigma_n^2}\, \operatorname{tr}\!\left( \mathbf{N}^{-1} \mathbf{R}_n^{\mathsf{T}} \mathbf{R}_n \right)
= n_n - \frac{1}{\sigma_n^2}\, \operatorname{tr}\!\left( \mathbf{R}_n \mathbf{N}^{-1} \mathbf{R}_n^{\mathsf{T}} \right),
\tag{12}
$$

as the trace is invariant under cyclic permutations (e.g. Koch, 1999, p. 40). The term $\mathbf{R}_n \mathbf{N}^{-1} \mathbf{R}_n^{\mathsf{T}}$ is symmetric due to its quadratic form. Inserting the stochastic estimation from Eq. (11), the partial redundancy is

$$
r_n = n_n - \frac{1}{\sigma_n^2}\, E\!\left\{ \boldsymbol{\varepsilon}^{\mathsf{T}} \mathbf{R}_n \mathbf{N}^{-1} \mathbf{R}_n^{\mathsf{T}} \boldsymbol{\varepsilon} \right\}
= n_n - \frac{1}{\sigma_n^2}\, E\!\left\{ \mathbf{p}_n^{\mathsf{T}} \mathbf{N}^{-1} \mathbf{p}_n \right\}
\quad \text{with} \quad \mathbf{p}_n := \mathbf{R}_n^{\mathsf{T}} \boldsymbol{\varepsilon} \,.
\tag{13}
$$

Using $k$ realizations of $\boldsymbol{\varepsilon}$ instead of the expectation value, arranged as columns in the matrix $\mathbf{P}$, and applying the same transformation, $\mathbf{P}_n := \mathbf{R}_n^{\mathsf{T}} \mathbf{P}$, the estimate for the partial redundancy as mean value from the $k$ samples is

$$
\hat{r}_n = n_n - \frac{1}{\sigma_n^2}\, \frac{1}{k}\, \operatorname{tr}\!\left( \mathbf{P}_n^{\mathsf{T}} \mathbf{N}^{-1} \mathbf{P}_n \right) .
\tag{14}
$$

For better readability, the iteration indices were omitted.


Partial redundancy for groups o: Introducing the Cholesky decomposition $\mathbf{Q}_{l_o l_o}^{-1} = \mathbf{G}_o \mathbf{G}_o^{\mathsf{T}}$, the partial redundancy from Eq. (10) can be rewritten as

$$
r_o = n_o - \frac{1}{\sigma_o^2}\, \operatorname{tr}\!\left( \mathbf{N}^{-1} \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \mathbf{G}_o^{\mathsf{T}} \mathbf{A}_o \right)
= n_o - \frac{1}{\sigma_o^2}\, \operatorname{tr}\!\left( \mathbf{G}_o^{\mathsf{T}} \mathbf{A}_o \mathbf{N}^{-1} \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \right),
\tag{15}
$$

with $\mathbf{G}_o^{\mathsf{T}} \mathbf{A}_o \mathbf{N}^{-1} \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o$ being symmetric. Thus, again inserting the stochastic trace estimator yields

$$
r_o = n_o - \frac{1}{\sigma_o^2}\, E\!\left\{ \boldsymbol{\varepsilon}^{\mathsf{T}} \mathbf{G}_o^{\mathsf{T}} \mathbf{A}_o \mathbf{N}^{-1} \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \boldsymbol{\varepsilon} \right\}
= n_o - \frac{1}{\sigma_o^2}\, E\!\left\{ \mathbf{p}_o^{\mathsf{T}} \mathbf{N}^{-1} \mathbf{p}_o \right\},
\tag{16}
$$

with $\mathbf{p}_o := \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \boldsymbol{\varepsilon}$. Replacing again the expectation value by $k$ realizations arranged in the matrix $\mathbf{P}$ and applying the same transformation, $\mathbf{P}_o := \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \mathbf{P}$, the estimate for the partial redundancy is

$$
\hat{r}_o = n_o - \frac{1}{\sigma_o^2}\, \frac{1}{k}\, \operatorname{tr}\!\left( \mathbf{P}_o^{\mathsf{T}} \mathbf{N}^{-1} \mathbf{P}_o \right) .
\tag{17}
$$

2.2. Preconditioned conjugate gradient multiple adjustment including variance component estimation

As described above, the direct assembly and solution of Eq. (5) becomes unreasonable for very high maximum resolutions. Iterative solvers can be used as an alternative, as they avoid the assembly of $\mathbf{N}$. For the combination of NEQs and OEQs the Preconditioned Conjugate Gradient Multiple Adjustment (PCGMA) algorithm was proposed by Schuh (1996) as a combination of the conjugate gradient algorithm proposed by Hestenes and Stiefel (1952) and the extension for adjustment problems by Schwarz (1970). This section summarizes the basic (serial) PCGMA algorithm, which is extended by VCE following the line of thought of Alkhatib (2007).

Basic preconditioned conjugate gradient multiple adjustment algorithm

The basic PCGMA algorithm is summarized in Algorithm 1. The expensive computations for the assembly of $\mathbf{N}$, especially $\mathbf{N}_o = \mathbf{A}_o^{\mathsf{T}} \mathbf{Q}_{l_o l_o}^{-1} \mathbf{A}_o$, are avoided. Instead, only a few matrix-vector multiplications have to be performed in each iteration (especially for the computation of $\mathbf{h}^{\nu}$, cf. line 11 in Algorithm 1). The result of the algorithm is a least-squares estimate of the unknown parameters $\mathbf{x}$ after $\nu_{\max}$ iterations. In contrast to direct solution methods, a covariance matrix of the parameters cannot be determined in a direct, efficient way. As the expected convergence rate strongly depends on the condition of $\mathbf{N}$ and gravimetric problems tend to be ill-conditioned (e.g. Schwintzer et al., 1997; Ilk et al., 2002), one essential step for the convergence of the PCGMA algorithm is the design of a preconditioner $\tilde{\mathbf{N}}$. This preconditioner should have the following properties: i) it


reflects the main characteristics of $\mathbf{N}$, ii) it is sparse and easily computable and iii) its decomposition (e.g. Cholesky factorization) is sparse as well. Whereas very specialized preconditioners might be designed for individual problems and setups (e.g. Boxhammer and Schuh, 2006), also very simple designs are able to considerably improve the convergence rate. As shown by Boxhammer (2006, p. 68), a simple block-diagonal preconditioner can be a very efficient approximation and significantly accelerates the PCGMA convergence.
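As an illustration of such a block-diagonal preconditioner (a sketch under the assumption of an order-wise block structure; the names and the dense test matrix are hypothetical and not taken from the original implementation), the following Python fragment factorizes each block once and then applies $\tilde{\mathbf{N}}^{-1}$ to a residual vector block by block.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def build_block_preconditioner(N, blocks):
    """Cholesky-factorize the diagonal blocks of N given by the index arrays 'blocks'."""
    return [(idx, cho_factor(N[np.ix_(idx, idx)])) for idx in blocks]

def apply_preconditioner(factors, r):
    """Solve N_tilde * rho = r block by block (cf. lines 2 and 18 of Algorithm 1)."""
    rho = np.empty_like(r)
    for idx, c in factors:
        rho[idx] = cho_solve(c, r[idx])
    return rho

# toy example: a symmetric positive definite matrix with three "order" blocks
rng = np.random.default_rng(0)
u = 12
M = rng.normal(size=(u, u))
N = M @ M.T + u * np.eye(u)
blocks = [np.arange(0, 5), np.arange(5, 9), np.arange(9, 12)]

factors = build_block_preconditioner(N, blocks)
r = rng.normal(size=u)
rho = apply_preconditioner(factors, r)   # approximation of N^{-1} r
```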

Variance component estimation for iterative solvers

The basic PCGMA algorithm in Algorithm 1 is extended by VCE for an arbitrary number of variance components. Because $\mathbf{N}$ is not computed within iterative solvers, but needed in standard VCE, cf. Eq. (10), the Monte Carlo based method is used as introduced in Eqs (14) and (17) (cf. Alkhatib, 2007; Brockmann and Schuh, 2010). Whereas $\Omega_i$ can be directly computed as for the direct solver, the computation of the partial redundancies, especially the trace term in Eqs (14) and (17), has to be adapted and integrated into the iterative procedure. Mainly, this is the estimation of the trace term $\mathbf{P}_i^{\mathsf{T}} \mathbf{N}^{-1} \mathbf{P}_i$ for both types of groups $n$ and $o$. Defining

$$
\mathbf{Z}_i := \mathbf{N}^{-1} \mathbf{P}_i \quad \Leftrightarrow \quad \mathbf{N} \mathbf{Z}_i = \mathbf{P}_i \,,
\tag{18}
$$

$\mathbf{Z}_i$ can be directly determined via solving the system of NEQs with PCGMA for additional right hand sides $\mathbf{P}_i$. The partial redundancies are then determined via

$$
\hat{r}_i = n_i - \frac{1}{\sigma_i^2}\, \frac{1}{k}\, \operatorname{tr}\!\left( \mathbf{P}_i^{\mathsf{T}} \mathbf{Z}_i \right) .
\tag{19}
$$
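The Monte Carlo estimate of the partial redundancy can be illustrated with a few lines of Python (a self-contained toy example with hypothetical names, not the parallel implementation): Rademacher samples are transformed with the group's factorization, the system is solved for these additional right hand sides, and Eq. (19) is evaluated.

```python
import numpy as np

rng = np.random.default_rng(1)
u, n_obs, k = 40, 200, 5                 # unknowns, observations of group i, MC samples

A_i = rng.normal(size=(n_obs, u))        # design matrix of group i (Q_{l_i l_i} = I)
N_i = A_i.T @ A_i                        # NEQs of group i
N = N_i + 50.0 * np.eye(u)               # combined NEQs (other groups mimicked by the diagonal)
sigma2_i = 1.0

# exact partial redundancy, Eq. (10)
r_exact = n_obs - np.trace(np.linalg.solve(N, N_i)) / sigma2_i

# Monte Carlo estimate, Eqs (11), (16), (18), (19)
eps = rng.choice([-1.0, 1.0], size=(n_obs, k))          # Rademacher samples per observation
P_i = A_i.T @ eps                                       # transformed samples, P_i = A_i^T G_i eps
Z_i = np.linalg.solve(N, P_i)                           # additional right hand sides, Eq. (18)
r_mc = n_obs - np.trace(P_i.T @ Z_i) / (sigma2_i * k)   # Eq. (19)

print(r_exact, r_mc)                     # the two values should be close
```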

Including these steps into Algorithm 1 yields the PCGMA algorithm with VCE as shown in Algorithm 2. The intention of Algorithm 2 is to give a simple overview of the implemented algorithm. Thus, so far, a major point is not addressed and not taken into account in the algorithm: not all NEQs are assembled for the same set of parameters; instead, individual NEQs may be assembled for only a (small) subset of the parameters to be estimated. Thus the NEQs differ in size and consequently in parameter ordering. This point is addressed in the following section, in which the implementation on high performance computers is discussed and a concept for a massive parallel implementation is presented.

3. COMPUTATIONAL ASPECTS AND PARALLEL IMPLEMENTATION

Within this section, the massive parallel implementation of the PCGMA algorithm on an HPC cluster is summarized. Standard concepts of scientific computing are used to implement the solver presented in Section 2. Using these standard concepts, a portable implementation is created which is able to run on nearly every HPC cluster. In addition, the design is (nearly) independent of the number of cores and is capable of operating on thousands of computing cores. The goal is a flexible implementation of the algorithm, allowing for an easy extension, e.g. for new observation types and/or additional parameters to be estimated.


Algorithm 1: PCGMA algorithm following Schuh (1996).

Data:
$\tilde{\mathbf{N}}$ sparse preconditioning matrix; $\nu_{\max}$ maximal number of PCGMA iterations;
$\mathbf{A}_o$ design matrix, $o = 1, \ldots, O$; $\mathbf{l}_o$ vector of observations, $o = 1, \ldots, O$;
$\mathbf{Q}_{l_o l_o}$ cofactor matrix of observations, $o = 1, \ldots, O$; $w_o$ weight for observation group $o = 1, \ldots, O$;
$\mathbf{N}_n$ normal matrix for group $n = 1, \ldots, N$; $\mathbf{n}_n$ right hand side of NEQ for $n = 1, \ldots, N$;
$w_n$ weight for observation group $n = 1, \ldots, N$; $\mathbf{x}^0$ initial values for parameters

1. $\mathbf{r}^0 = \sum_{n=1}^{N} w_n \left( \mathbf{N}_n \mathbf{x}^0 - \mathbf{n}_n \right) + \sum_{o=1}^{O} w_o\, \mathbf{A}_o^{\mathsf{T}} \mathbf{Q}_{l_o l_o}^{-1} \left( \mathbf{A}_o \mathbf{x}^0 - \mathbf{l}_o \right)$   // compute initial joint residuals
2. $\boldsymbol{\rho}^0 = \mathrm{solve}(\tilde{\mathbf{N}}, \mathbf{r}^0)$   // apply preconditioner to residuals
3. $\boldsymbol{\pi}^0 = \boldsymbol{\rho}^0$   // initial search direction
4. // PCGMA iterations
5. for $\nu = 0$ to $\nu_{\max} - 1$ do
6.   if $\nu > 0$ then
7.     // update step for search direction
8.     $e^{\nu} = \dfrac{\mathbf{r}^{\nu\,\mathsf{T}} \boldsymbol{\rho}^{\nu}}{\mathbf{r}^{\nu-1\,\mathsf{T}} \boldsymbol{\rho}^{\nu-1}}$
9.     $\boldsymbol{\pi}^{\nu} = \boldsymbol{\rho}^{\nu} + e^{\nu} \boldsymbol{\pi}^{\nu-1}$
10.  end
11.  $\mathbf{h}^{\nu} = \sum_{n=1}^{N} w_n \mathbf{N}_n \boldsymbol{\pi}^{\nu} + \sum_{o=1}^{O} w_o\, \mathbf{A}_o^{\mathsf{T}} \mathbf{Q}_{l_o l_o}^{-1} \mathbf{A}_o \boldsymbol{\pi}^{\nu}$
12.  // update step for parameters
13.  $q^{\nu} = \dfrac{\mathbf{r}^{\nu\,\mathsf{T}} \boldsymbol{\rho}^{\nu}}{\boldsymbol{\pi}^{\nu\,\mathsf{T}} \mathbf{h}^{\nu}}$
14.  $\mathbf{x}^{\nu+1} = \mathbf{x}^{\nu} - q^{\nu} \boldsymbol{\pi}^{\nu}$
15.  // update step for residuals
16.  $\mathbf{r}^{\nu+1} = \mathbf{r}^{\nu} - q^{\nu} \mathbf{h}^{\nu}$
17.  // apply preconditioner to residuals
18.  $\boldsymbol{\rho}^{\nu+1} = \mathrm{solve}(\tilde{\mathbf{N}}, \mathbf{r}^{\nu+1})$
19. end
20. return $\mathbf{x}^{\nu_{\max}}$
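For readers who prefer code over pseudocode, the following compact Python sketch mirrors the serial structure of Algorithm 1 for a single OEQ group and a single NEQ group (an illustration with hypothetical names; the block-diagonal preconditioner is replaced by a simple diagonal one, and the residual and sign conventions follow the reconstruction above).

```python
import numpy as np

def pcgma(N_n, n_n, w_n, A_o, l_o, w_o, x0, n_iter=100):
    """Serial PCG on the combined NEQs of one NEQ group and one OEQ group (Q_{l_o l_o} = I)."""
    matvec = lambda v: w_n * (N_n @ v) + w_o * (A_o.T @ (A_o @ v))        # action of combined N
    precond = w_n * np.diag(N_n) + w_o * np.einsum('ij,ij->j', A_o, A_o)  # diagonal preconditioner

    x = x0.copy()
    r = w_n * (N_n @ x - n_n) + w_o * (A_o.T @ (A_o @ x - l_o))      # line 1
    rho = r / precond                                                # line 2
    pi = rho.copy()                                                  # line 3
    for nu in range(n_iter):                                         # line 5
        if nu > 0:
            e = r @ rho / r_rho_old                                  # line 8
            pi = rho + e * pi                                        # line 9
        h = matvec(pi)                                               # line 11
        r_rho_old = r @ rho
        q = r_rho_old / (pi @ h)                                     # line 13
        x -= q * pi                                                  # line 14
        r -= q * h                                                   # line 16
        rho = r / precond                                            # line 18
    return x

# tiny consistency check against a direct solve
rng = np.random.default_rng(3)
u = 30
A_o = rng.normal(size=(100, u)); l_o = rng.normal(size=100)
M = rng.normal(size=(60, u)); N_n = M.T @ M; n_n = M.T @ rng.normal(size=60)
w_n, w_o = 1.0, 1.0
N = w_n * N_n + w_o * A_o.T @ A_o
n = w_n * n_n + w_o * A_o.T @ l_o
x_ref = np.linalg.solve(N, n)
x_it = pcgma(N_n, n_n, w_n, A_o, l_o, w_o, np.zeros(u), n_iter=200)
print(np.max(np.abs(x_it - x_ref)))      # should be small
```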


Algorithm 2: PCGMA algorithm extended for VCE.

Data:
$\tilde{\mathbf{N}}$ sparse preconditioning matrix; $\nu_{\max}$ maximal number of PCGMA iterations;
$\mathbf{A}_o$ design matrix, $o = 1, \ldots, O$; $\mathbf{l}_o$ vector of observations, $o = 1, \ldots, O$;
$\mathbf{Q}_{l_o l_o}$ cofactor matrix of observations, $o = 1, \ldots, O$; $w_o$ weight for observation group $o = 1, \ldots, O$;
$\mathbf{N}_n$ normal matrix for group $n = 1, \ldots, N$; $\mathbf{n}_n$ right hand side of NEQ for $n = 1, \ldots, N$;
$w_n$ weight for observation group $n = 1, \ldots, N$; $\mathbf{X}^{0,0}$ initial values for parameters;
$\kappa_{\max}$ maximal number of VCE iterations; $k$ number of Monte Carlo (MC) samples

1. for $\kappa = 1$ to $\kappa_{\max}$ do
2.   $\mathbf{X}^{\kappa,0} = \left[ \mathbf{X}^{\kappa-1,\nu_{\max}} \;\; \mathbf{0} \cdots \mathbf{0} \right]$   // compose start solution, 0 for the $k(N+O)$ MC parameter columns
3.   $\mathbf{R}^0 = \mathbf{0}$   // initialize combined residuals with 0, dimension: #parameters × $(1 + (N+O)k)$
4.   // add contribution by NEQ group n
5.   for $n = 1$ to $N$ do
6.     // generate MC samples for group n, dimension: #parameters × $k$
7.     $\mathbf{P} \sim \mathcal{U}\{-1, 1\}$
8.     // sample transformation
9.     $\mathbf{R}_n = \mathrm{chol}(\mathbf{N}_n)$   // chol applies the Cholesky factorization $\mathbf{N}_n = \mathbf{R}_n^{\mathsf{T}} \mathbf{R}_n$
10.    $\mathbf{P}_n = \mathbf{R}_n^{\mathsf{T}} \mathbf{P}$
11.    // right hand sides for group n, first MC column at index $1 + (n-1)k + 1$
12.    $\mathbf{B}_n = \left[ \mathbf{n}_n \;\; \mathbf{0} \cdots \mathbf{0} \;\; \mathbf{P}_n \;\; \mathbf{0} \cdots \mathbf{0} \right]$
13.    $\mathbf{R}^0 = \mathbf{R}^0 + w_n \left( \mathbf{N}_n \mathbf{X}^{\kappa,0} - \mathbf{B}_n \right)$
14.  end
15.  // add contribution by OEQ group o
16.  for $o = 1$ to $O$ do
17.    // generate MC samples for group o, dimension: #observations × $k$
18.    $\mathbf{P} \sim \mathcal{U}\{-1, 1\}$
19.    $\mathbf{G}_o = \mathrm{chol}(\mathbf{Q}_{l_o l_o}^{-1})$
20.    $\mathbf{P}_o = \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \mathbf{P}$
21.    // right hand sides for group o, first MC column at index $1 + Nk + (o-1)k + 1$
22.    $\mathbf{L}_o = \left[ \mathbf{l}_o \;\; \mathbf{0} \cdots \mathbf{0} \;\; \mathbf{P} \;\; \mathbf{0} \cdots \mathbf{0} \right]$
23.    $\mathbf{R}^0 = \mathbf{R}^0 + w_o\, \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o \left( \mathbf{G}_o^{\mathsf{T}} \mathbf{A}_o \mathbf{X}^{\kappa,0} - \mathbf{L}_o \right)$
24.  end
25.  $\boldsymbol{\varrho}^0 = \mathrm{solve}(\tilde{\mathbf{N}}, \mathbf{R}^0)$   // apply preconditioner to residuals
26.  $\boldsymbol{\Pi}^0 = \boldsymbol{\varrho}^0$   // initial search directions
27.  // PCGMA iterations
28.  for $\nu = 0$ to $\nu_{\max} - 1$ do
29.    if $\nu > 0$ then
30.      // update step for search directions
31.      $\mathbf{E}^{\nu} = \left( \mathbf{R}^{\nu\,\mathsf{T}} \boldsymbol{\varrho}^{\nu} \right) \oslash \left( \mathbf{R}^{\nu-1\,\mathsf{T}} \boldsymbol{\varrho}^{\nu-1} \right)$   // element-wise division
32.      $\boldsymbol{\Pi}^{\nu} = \boldsymbol{\varrho}^{\nu} + \boldsymbol{\Pi}^{\nu-1} \mathbf{E}^{\nu}$   // := scaling of column $k$ with $\mathbf{E}^{\nu}_{kk}$
33.    end
34.    $\mathbf{H}^{\nu} = \sum_{n=1}^{N} w_n \mathbf{N}_n \boldsymbol{\Pi}^{\nu} + \sum_{o=1}^{O} w_o\, \mathbf{A}_o^{\mathsf{T}} \mathbf{Q}_{l_o l_o}^{-1} \mathbf{A}_o \boldsymbol{\Pi}^{\nu}$
35.    // update step for parameters
36.    $\mathbf{Q}^{\nu} = \left( \mathbf{R}^{\nu\,\mathsf{T}} \boldsymbol{\varrho}^{\nu} \right) \oslash \left( \boldsymbol{\Pi}^{\nu\,\mathsf{T}} \mathbf{H}^{\nu} \right)$   // element-wise division
37.    $\mathbf{X}^{\kappa,\nu+1} = \mathbf{X}^{\kappa,\nu} - \boldsymbol{\Pi}^{\nu} \mathbf{Q}^{\nu}$   // := scaling of column $k$ with $\mathbf{Q}^{\nu}_{kk}$
38.    // update step for residuals
39.    $\mathbf{R}^{\nu+1} = \mathbf{R}^{\nu} - \mathbf{H}^{\nu} \mathbf{Q}^{\nu}$   // := scaling of column $k$ with $\mathbf{Q}^{\nu}_{kk}$
40.    // apply preconditioner to residuals
41.    $\boldsymbol{\varrho}^{\nu+1} = \mathrm{solve}(\tilde{\mathbf{N}}, \mathbf{R}^{\nu+1})$
42.  end
43.  // update weights of all groups i (o and n) using Eqs (7), (8) and (19)
44.  $w_i = 1/\hat{\sigma}_i^2 = \mathrm{computeVCE}\!\left( \mathbf{X}^{\kappa,\nu_{\max}}, \mathbf{P}_i \right)$
45.  $\kappa \leftarrow \kappa + 1$
46. end
47. return $\mathbf{X}^{\kappa_{\max},\nu_{\max}}$, $w_o^{\kappa_{\max}}$, $w_n^{\kappa_{\max}}$

Assume a cluster with $N = R \cdot C$ computing cores, which can be virtually arranged as an $R \times C$ Cartesian grid. These cores, belonging to a set of computing nodes, are connected via a network device. Furthermore, it is assumed that each of the computing cores has only direct access to its local main memory. Thus, if data stored in the memory of a specific core is needed by one of the other cores, the data has to be shared via data communication over the network. Within HPC, the Message Passing Interface (MPI, MPI-Forum, 2009) standard exists. Different implementations of this standard, e.g.


OpenMPI (Gabriel et al., 2004), used in this study, or MPICH (Balaji et al., 2013), allow one process of the application to be started on each core and provide dedicated send and receive operations to communicate data among computing cores/processes. In combination with an MPI library, extensions like the Basic Linear Algebra Communication Subprograms library (BLACS, Dongarra and Whaley, 1997) provide, besides the communication (i.e. message passing routines), the setup of Cartesian grids for the cores. Coordinates are assigned to each individual core (or the process of the application running on that core) such that each of the $N$ cores can be addressed via Cartesian coordinates $(r, c)$ with $r \in \{0, \ldots, R-1\}$ and $c \in \{0, \ldots, C-1\}$.
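The setup of such a process grid can be sketched in a few lines; the following fragment uses mpi4py (an illustration only, the original implementation builds on MPI/BLACS directly) to arrange the started processes on an R × C Cartesian grid and to query each process's coordinates (r, c).

```python
# run with e.g.: mpirun -np 12 python grid_setup.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
n_cores = comm.Get_size()

# factor the number of processes into a 2D grid R x C (as balanced as possible)
R, C = MPI.Compute_dims(n_cores, 2)

# non-periodic Cartesian communicator; every process gets coordinates (r, c)
cart = comm.Create_cart(dims=[R, C], periods=[False, False], reorder=False)
r, c = cart.Get_coords(cart.Get_rank())

print(f"process {comm.Get_rank()} of {n_cores} sits at grid position ({r}, {c}) of {R} x {C}")
```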

3.1. General design

The setup and use of MPI and BLACS is a prerequisite for the use of block-cyclic distributed matrices. The concept of distributing large matrices block-cyclically along a Cartesian grid of computing cores is a (quasi-)standard in HPC (Sidani and Harrod, 1996; Blackford et al., 1997, Chapter 4). As this concept is used for all matrices and vectors in the implementation of the PCGMA algorithm, the concept is briefly described. For details, as they are very technical, the reader is referred to Blackford et al. (1997).

Given a general dense matrix $\mathbf{A}^g$, the whole matrix (often called global matrix) is divided into blocks of an arbitrary block size $b_r \times b_c$,

$$
\mathbf{A}^g =
\begin{pmatrix}
\mathbf{A}^g_{0,0} & \mathbf{A}^g_{0,1} & \cdots & \mathbf{A}^g_{0,J-1} \\
\mathbf{A}^g_{1,0} & \mathbf{A}^g_{1,1} & \cdots & \mathbf{A}^g_{1,J-1} \\
\vdots & \vdots & & \vdots \\
\mathbf{A}^g_{I-1,0} & \mathbf{A}^g_{I-1,1} & \cdots & \mathbf{A}^g_{I-1,J-1}
\end{pmatrix} .
\tag{20}
$$

Except for the matrix blocks $\mathbf{A}^g_{I-1,j}$ and $\mathbf{A}^g_{i,J-1}$, all blocks are of dimension $b_r \times b_c$; $\mathbf{A}^g_{I-1,j}$ and $\mathbf{A}^g_{i,J-1}$ might be of smaller dimension. These blocks are now cyclically distributed along the Cartesian grid of dimension $R \times C$. The blocks of the first row ($i = 0$) are distributed alternately to the processors of the first row of the core grid $(0, c)$. Block $\mathbf{A}^g_{0,j}$ is stored on the processor with coordinates $(0, \operatorname{mod}(j, C))$. The second row of blocks is distributed alternately along the second row of the core grid. The general rule can be written as $\mathbf{A}^g_{i,j} \mapsto (\operatorname{mod}(i, R), \operatorname{mod}(j, C))$. If $i \geq R$, the $i$-th row of blocks is distributed again over the first row of the processor grid. All matrix blocks mapped to a core $(r, c)$ are arranged in a locally stored matrix


$$
\mathbf{A}^l_{r,c} =
\begin{pmatrix}
\mathbf{A}^g_{i,j} & \mathbf{A}^g_{i,j+C} & \cdots \\
\mathbf{A}^g_{i+R,j} & \mathbf{A}^g_{i+R,j+C} & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix} .
\tag{21}
$$

Thus, instead of operating on a global matrix, local operations have to be performed on $\mathbf{A}^l_{r,c}$, containing elements of the global matrix but not necessarily neighboring ones. It is important to realize that neighboring elements of $\mathbf{A}^g$ occur as neighboring elements in the local matrices $\mathbf{A}^l_{r,c}$ only within the subblocks of size $b_r \times b_c$. For matrices (and vectors) stored according to this scheme, two libraries exist for basic matrix and linear algebra operations (Parallel Basic Linear Algebra Subprograms, PBLAS; Scalable Linear Algebra Package, ScaLAPACK; Blackford et al., 1997) containing basic matrix computations.
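The mapping between global and local indices in this 2D block-cyclic scheme can be made explicit with a few lines of Python (a hypothetical helper for illustration; ScaLAPACK ships corresponding index-translation tools which are not reproduced here).

```python
def owner_and_local_index(i_glob, block, nprocs):
    """Map a global row (or column) index to (process coordinate, local index)
    for a block-cyclic distribution with block size 'block' over 'nprocs' processes."""
    blk = i_glob // block                 # global block index
    proc = blk % nprocs                   # process row/column owning this block
    local_blk = blk // nprocs             # number of this process's blocks preceding it
    local = local_blk * block + i_glob % block
    return proc, local

# example: element (i, j) of the global matrix with br = bc = 80 on a 24 x 24 grid
br = bc = 80
R = C = 24
i, j = 100000, 2345
pr, li = owner_and_local_index(i, br, R)
pc, lj = owner_and_local_index(j, bc, C)
print(f"global ({i}, {j}) lives on core ({pr}, {pc}) at local position ({li}, {lj})")
```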

3.2. Implementation of preconditioned conjugate gradient multiple adjustment in the high performance computing environment

The PCGMA algorithm shown in Algorithm 2 is implemented using the concept of block-cyclic distributed matrices for all matrices (and vectors) involved. The standard operations for matrices like additions, multiplications or factorizations can be performed using the PBLAS and ScaLAPACK libraries. In this section we focus on the implementation of $\mathbf{H}^{\nu}$ (see Algorithm 2, line 34), as it is the computationally most demanding step and some individual processing is needed.

Numbering scheme for the PCGMA algorithm: As the OEQs have to be set up in each PCGMA iteration $\nu$, the fast assembly of the distributed matrix $\mathbf{A}_o$ is essential within the implementation of the iterative solver. Thus, using block-cyclically distributed matrices, the local matrices $\mathbf{A}^l_o$ must be locally computable without communication with other processes. In addition, redundant computations on several cores should be avoided, or at least minimized. Redundant computations might occur during the expensive recursive computations of the Legendre functions, which are performed by orders (e.g. Holmes and Featherstone, 2002). Thus, it is important that all columns of $\mathbf{A}_o$ corresponding to gravity field coefficients of the same order are within a single local matrix $\mathbf{A}^l_o$, or spread over as few local matrices as possible. The order of the parameters in the parameter vector, and thus in the columns of the design matrix, is often referred to as the numbering scheme.

Choosing a standard order-wise numbering scheme for the global vector $\mathbf{x}$, and thus for the columns of $\mathbf{A}_o$, would result in parameters of many different orders in the columns of the local matrices $\mathbf{A}^l_{r,c}$, due to the block-cyclic distribution. Instead, an ordering of the parameters depending on the parameters of the block-cyclic distribution


($b_r$ and $b_c$) and the size of the computing grid ($R$ and $C$) is determined at runtime. The numbering scheme of the global matrix $\mathbf{A}_o$ is driven by the fact that the virtual matrix composed of the local matrices along a row of the processor grid, $\hat{\mathbf{A}}_o = \left[ \mathbf{A}^l_{r,0} \;\; \mathbf{A}^l_{r,1} \;\; \mathbf{A}^l_{r,2} \;\; \cdots \;\; \mathbf{A}^l_{r,C-1} \right]$, follows the standard order-wise numbering scheme. Thus, the global numbering scheme might look arbitrary at first view, but results from the fact that parameters of the same order are collected as far as possible in the same local matrices. This minimizes redundant computations in the recursive computation of Legendre functions.

Reordering of block-cyclic distributed matrices: Choosing a configuration-dependent numbering scheme allows the matrix $\mathbf{A}_o$ to be efficiently computed, but results in a numbering scheme which differs from the fixed numbering scheme of the preconditioner ($\tilde{\mathbf{N}}$ is only block-diagonal in the order-wise numbering). Before applying the preconditioner to the residuals $\mathbf{R}$ (cf. Algorithm 2, lines 25 and 41), they have to be modified via a sequence of row interchanges to achieve, in view of the global matrix, the order-wise ordering of the preconditioner. As the numbering scheme (parameter sequence) of $\mathbf{R}$ is configuration dependent, a reordering between two arbitrary numbering schemes is required. A flexible method via a comparison and sorting of the symbolic parameters is used to derive an index vector from two general symbolic numbering schemes. Transforming the index vector into a serial sequence of permutations, a ScaLAPACK helper function for pivoting can be used to reorder rows and columns of a block-cyclic distributed matrix (routine pdlapiv).

derive an index vector from two general symbolic numbering schemes. Transforming the index vector to a serial sequence of permutations, a ScaLAPACK helping function for pivoting can be used to reorder rows and columns of a block-cyclic distributed matrix (routine pdlapiv). Computation of the update vector for the OEQ groups: For the groups which are

processed at the level of OEQs, the fast and efficient setup of the OEQs is essential. As described above, using the special numbering scheme, Legendre functions can be computed recursively with only a few redundant computations on different cores. Thus,

via the local (i.e. serial) computation of ,lrcA the global matrix oA is set up and the

computation of T 1:oo

o o o ollw

H A Q A and the update step can be performed using

PBLAS and ScaLAPACK routines. As oA is assembled in the proper numbering scheme,

the update step o

H H H can be directly performed. Note that it is more

efficient to set up ToA instead of oA , if column major ordering is used for serial matrix

storage. Computation of the update vector for the NEQ groups: The normal matrices nN

are available in specific parameter ordering and possibly only for a small subset of the parameters. The number of parameters in nN is denoted as mn. The contribution

n nw

N (cf. Algorithm 2, line 34) to the update vector was handled as follows:


1. Perform a sequence of row interchanges of $\boldsymbol{\Pi}^{\nu}$ so that the parameter order of the first $m_n$ rows corresponds to the parameter order of $\mathbf{N}_n$.

2. Perform the multiplication $\mathbf{H}^{\nu}_n := w_n\, \mathbf{N}_n\, \boldsymbol{\Pi}^{\nu}(1{:}m_n, :)$, where $\boldsymbol{\Pi}^{\nu}(1{:}m_n, :)$ is the subset of the first $m_n$ rows and all columns.

3. Extend $\mathbf{H}^{\nu}_n$ by zeros for parameters not contained in group $n$. Reorder the rows of the matrix $\left[ \mathbf{H}^{\nu\,\mathsf{T}}_n \;\; \mathbf{0} \right]^{\mathsf{T}}$ into the numbering scheme of the original $\boldsymbol{\Pi}^{\nu}$ to obtain $\mathbf{H}^{\nu}_n$.

4. Perform the update $\mathbf{H}^{\nu} = \mathbf{H}^{\nu} + \mathbf{H}^{\nu}_n$.

4. CLOSED-LOOP SIMULATION

To test the concept and to verify the implementation, a first small-scale closed-loop simulation is performed. In this section the simulation is summarized and the runtime behavior of a PCGMA iteration is analyzed.

4.1. Simulated data

The EGM2008 (Pavlis et al., 2012) model is used to generate 18 synthetic datasets. N = 3 datasets are created as NEQs. For this purpose, real satellite NEQs from GRACE and GOCE are used to generate synthetic normal vectors based on EGM2008, and noise is added according to the normal distribution $\mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{N}_n^{-1})$. The datasets used are summarized in Table 1.

Table 1. Data sets n based on NEQs used in the closed-loop simulation. The real data NEQs used ($\mathbf{N}_n$), the input for the synthetic right hand sides ($\mathbf{n}_n$), the spherical harmonic resolution covered by the groups (d/o) and the variance component ($\sigma_n$) used for data generation are provided.

n | Name     | NEQ / $\mathbf{N}_n$ | $\mathbf{n}_n$                                      | d/o   | $\sigma_n$
1 | GRACE    | ITG-Grace2010s       | EGM2008 + $\mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{N}_n^{-1})$ | 2-180 | 1.054
2 | GOCE SST | EGM_TIM_RL03         | EGM2008 + $\mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{N}_n^{-1})$ | 2-100 | 0.954
3 | GOCE SGG | EGM_TIM_RL03         | EGM2008 + $\mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{N}_n^{-1})$ | 2-250 | 0.913

O = 15 datasets are created as OEQs. Each observation group stands for a dataset from a specific geographical region. Over land, gravity anomalies are simulated from EGM2008 for spherical harmonic degrees 3 to 500. As observation error, uncorrelated noise from a diagonal covariance matrix $\boldsymbol{\Sigma}_{l_o l_o} = \sigma_o^2 \mathbf{Q}_{l_o l_o}$ was added. The diagonal entries $\left( \mathbf{Q}_{l_o l_o} \right)_{ii}$ of the cofactor matrix are randomly generated for every data point following a uniform distribution between 0.9 and 1.1. This diagonal cofactor matrix is assumed to be known in the algorithm, whereas $\sigma_o$ is estimated from the data. Over the oceans, altimetry observations along a virtual CryoSat-2 orbit are simulated from EGM2008. As a simplification for this first study, the dynamic ocean topography was neglected; thus it is assumed that altimetry directly observes the geoid. Four 30-day sub-cycles (sc1 to sc4) of different accuracies are introduced as individual observation groups, for each of which a weight is estimated. The noise is generated as for the gravity anomaly datasets. All necessary information on the datasets is listed in Table 2 (accuracy levels in terms of standard deviations of the involved datasets). As the implementation should be verified via a comparison to a reference least-squares solution determined by the assembly and solution of the full system of NEQs (cf. Brockmann et al., 2013), a small resolution of $l_{\max} = 500$ was chosen within this first simulation study.

Table 2. Data sets o introduced to the algorithm on the level of observations and used in the simulation. The observations were generated from EGM2008 for d/o 3 to 500. $n_o$ is the number of observations, $\sigma_o$ is the variance component used in noise generation, and $\left( \mathbf{Q}_{l_o l_o} \right)_{ii}$ is the creation rule for the diagonal part of the cofactor matrix.

o  | Name           | Functional   | $n_o$  | $\sigma_o$ | $\left( \mathbf{Q}_{l_o l_o} \right)_{ii}$
1  | Africa         | anomalies    | 63750  | 6.0 mGal   | U(0.9, 1.1)
2  | Antarctica     | anomalies    | 155576 | 5.0 mGal   | U(0.9, 1.1)
3  | Australia      | anomalies    | 17276  | 2.4 mGal   | U(0.9, 1.1)
4  | Eurasia        | anomalies    | 163691 | 3.2 mGal   | U(0.9, 1.1)
5  | Greenland      | anomalies    | 17061  | 3.6 mGal   | U(0.9, 1.1)
6  | Indonesia      | anomalies    | 5095   | 5.2 mGal   | U(0.9, 1.1)
7  | Iceland        | anomalies    | 497    | 2.8 mGal   | U(0.9, 1.1)
8  | New Zealand    | anomalies    | 724    | 3.4 mGal   | U(0.9, 1.1)
9  | North America  | anomalies    | 75048  | 2.0 mGal   | U(0.9, 1.1)
10 | South America  | anomalies    | 38504  | 7.2 mGal   | U(0.9, 1.1)
11 | North Pole Cap | anomalies    | 23400  | 2.4 mGal   | U(0.9, 1.1)
12 | CryoSat-2 sc1  | geoid height | 577272 | 1.0 cm     | U(0.9, 1.1)
13 | CryoSat-2 sc2  | geoid height | 577293 | 3.1 cm     | U(0.9, 1.1)
14 | CryoSat-2 sc3  | geoid height | 577479 | 1.2 cm     | U(0.9, 1.1)
15 | CryoSat-2 sc4  | geoid height | 577211 | 3.3 cm     | U(0.9, 1.1)
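The generation of one such synthetic OEQ dataset can be written down compactly (an illustrative sketch with hypothetical names; the functional that maps spherical harmonic coefficients to gravity anomalies or geoid heights is abstracted as a given design matrix).

```python
import numpy as np

def simulate_oeq_group(A_o, x_egm2008, sigma_o, rng):
    """Simulate noisy observations l_o = A_o x + e with a known diagonal cofactor matrix.

    The diagonal cofactor entries are drawn from U(0.9, 1.1) as in Table 2; the noise
    standard deviation of observation i is sigma_o * sqrt(q_ii).
    """
    n_obs = A_o.shape[0]
    q_diag = rng.uniform(0.9, 1.1, size=n_obs)            # known cofactor diagonal
    noise = sigma_o * np.sqrt(q_diag) * rng.normal(size=n_obs)
    l_o = A_o @ x_egm2008 + noise
    return l_o, q_diag

# toy dimensions only; in the study A_o maps 250 997 coefficients to e.g. 63 750 anomalies
rng = np.random.default_rng(11)
A_o = rng.normal(size=(500, 60))
x_true = rng.normal(size=60)
l_o, q_diag = simulate_oeq_group(A_o, x_true, sigma_o=6.0, rng=rng)   # e.g. 6.0 mGal noise level
```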


4.2. Simulation results

The presented massive parallel implementation of the PCGMA algorithm is used to recover the 250 997 spherical harmonic coefficients and the 18 unknown weights (i.e. inverse variance components) from the 18 observation groups. As preconditioner, a simple order-wise block-diagonal matrix is used. The blocks for the coefficients of the same order are either extracted from the NEQs or computed from the OEQs. The blocks are combined in a weighted addition. EGM96 (Lemoine et al., 1998) serves as initial solution for the iterative solver, and coefficients above degree 360 are set to zero. Initial weights $w_i = 1.0$ are used as starting values for the unknown weights for the iterative VCE. The PCGMA algorithm is iterated $\nu_{\max} = 75$ times with $\kappa_{\max} = 4$ VCE iterations.

For each of the four VCE iterations, a reference solution is computed via the assembly and solution of full NEQs. Fig. 1 shows the reference solution in terms of signal degree variances as solid black lines. The quality of the reference solution is reflected by the error degree variances estimated from the formal coefficient standard deviations (dashed lines). The four VCE iterations are shown in the different subplots. The convergence behavior of the PCGMA iterations with respect to the reference solutions, in terms of degree error variances computed from coefficient differences, is shown for each VCE iteration. The different PCGMA iterations, as differences to the reference solutions, are shown as gray lines. After 50 PCGMA iterations, the coefficients are determined to within the formal coefficient accuracy for the first VCE iteration, and after 20 iterations for the second.

Fig. 1. Convergence behavior of the PCGMA algorithm within the four VCE iterations. PCGMA iterations are shown as gray solid lines. The reference solutions (black solid lines) are the solutions obtained via assembly and solution of full NEQs of the same VCE iteration. The dashed black lines show the quality (formal errors) of the reference solution.

As the reference solution


does not significantly change for the other VCE iterations, the quality of the start solution is already below the coefficient accuracy there. Nevertheless, the differences become smaller and smaller and approach the reference solution, which is the direct least-squares solution.

In addition to the convergence of the solution, the convergence of the estimated variance components is shown as relative values $\left( \hat{\sigma}_{n,o} - \tilde{\sigma}_{n,o} \right) / \tilde{\sigma}_{n,o}$ in Fig. 2. $\tilde{\sigma}_{n,o}$ indicates the true values used for the generation of the synthetic data. For most groups, the standard deviations converge to between $10^{-2}$ and $10^{-3}$ (relative) after iteration 2. Only the groups containing only a small number of observations (o = 6, 7, 8) show a slightly worse behavior. Nevertheless, they are recovered to better than $10^{-1}$ after the second VCE iteration. For the Monte Carlo based trace estimation, five Monte Carlo samples are used per observation group (although a single sample should be sufficient, Koch and Kusche, 2002). Hence, the system of equations is solved for 91 (5 × 18 + the normal vector n) right hand sides with the PCGMA algorithm.

Fig. 2. Relative convergence of the square root of the variance components, $\left( \hat{\sigma}_{n,o} - \tilde{\sigma}_{n,o} \right) / \tilde{\sigma}_{n,o}$, for all groups n and o.

4.3. Analysis of the algorithm

In addition to the verification of the implemented algorithm, the simulation scenario is used to analyze the performance of the algorithm as well as the massive parallel implementation on a supercomputer. This covers on the one hand the runtime of the different steps within one PCGMA iteration and on the other hand the scalability of the implementation, i.e. the decrease of the runtime compared to the increase of the number of computing cores is assessed. Note that all absolute (and relative) numbers and


all conclusions given are only valid for the scenario analyzed. Within this context, the word scenario means the whole setup: I) the problem to be solved itself (number of observations, number of datasets, number of unknown parameters, ...), thus the input to the algorithm and II) the technical configuration which is mainly the hardware and configuration (e.g. arrangement of computing grid, number of cores involved, parameters for block cyclic distribution). Runtime of a single PCGMA iteration: This section is used to briefly analyze the

runtime of a single PCGMA iteration step for the datasets introduced above. The given numbers are derived on the JUROPA supercomputer at FZ Jülich on a 24 × 24 core grid (576 cores) using a block size of the distributed matrices of $b_r = b_c = 80$. On average, one PCGMA iteration takes 222 s. Not surprisingly, the major part is consumed by the computation of the update vector $\mathbf{H}^{\nu}$, which takes 204 s (92%). The next time-intensive step in Algorithm 2 is the preconditioning step. However, it can be performed in less than 8 s. As the computation of $\mathbf{H}^{\nu}$ consists of two major parts, i.e. i) the processing of NEQs and ii) the handling of OEQs, these parts are considered separately. The processing of the NEQs takes 48 s (22%), where most of the time is spent for reading the NEQs from disk. This is a part of the implementation which can easily be accelerated: it is possible to read all NEQ matrices once within a VCE iteration and perform the weighted combination only once. The consequence is that the combined NEQ has to be held in-core during all PCGMA iterations (which is possible as the NEQ groups are limited in size). But still, the major runtime is consumed during the computation of the update of $\mathbf{H}^{\nu}$ for the groups of OEQs. That is again not surprising, as the OEQs have to be set up for all observations, and for all parameters, as it is assumed that the observation groups contain full signal content. The runtime for all O groups is 156 s, which corresponds to 70% of the whole runtime per iteration. Within this step the following three operations are performed:

i) assembly of $\mathbf{A}_o$: 39 s ≈ 25% (18% of overall time), ii) computation of $\bar{\mathbf{A}}_o^{\mathsf{T}} := \mathbf{A}_o^{\mathsf{T}} \mathbf{G}_o$: 66 s ≈ 42% (30% of overall time), and iii) computation of $\mathbf{H}^{\nu}_o = \bar{\mathbf{A}}_o^{\mathsf{T}} \bar{\mathbf{A}}_o \boldsymbol{\Pi}^{\nu}$: 51 s ≈ 33% (22% of overall time). Note that the use of a full covariance matrix of the observations is an additional significant step in the computation of $\mathbf{H}^{\nu}_o$, which is neglected here as the covariance matrix in the simulation is diagonal. Nevertheless, a full covariance matrix is foreseen in the implementation.

Scaling behavior: For the scenario above a scaling test is performed. One PCGMA

iteration is performed varying the size of the computing core grid R × C. Only quadratic grids are used, from 4 × 4 (16 cores) up to 44 × 44 (1936 cores). An almost linear scaling behavior is obtained from 16 to 576 cores. A still good scaling behavior is observed up to 1024 cores (64 times more cores compared to 16 cores, the runtime is 1/45 compared to the runtime using 16 cores), cf. Fig. 3. Again, note that this simulation scenario is a small one and not really representative for the scenarios the implementation is actually designed for.

Fig. 3. Scaling behavior of the implementation, normalized to the runtime on 4 × 4 cores. The solid line shows the ideal behavior. Circles show the scaling behavior for the processing of OEQs and NEQs, whereas the stars show the scaling behavior for the processing of OEQs only (excluding the intensive I/O of matrices in each iteration).


5. CONCLUSIONS

The Preconditioned Conjugate Gradient Multiple Adjustment algorithm was implemented in a massive parallel HPC environment. In addition to the solution, the algorithm was extended to estimate unknown variance components (relative weights) for an arbitrary number of (complementary) observation groups. Standard concepts and libraries from scientific and high performance computing were intensively used to derive an implementation portable to different computing clusters. A small-scale simulation scenario was used to verify the implementation and the concept.

The closed-loop simulation was used to demonstrate that the implementation works on computing grids of different sizes with up to 2000 cores involved. It was demonstrated that the iterative solver converges to the least-squares solution and that the integrated VCE converges, too. The simulation shows that the block-diagonal preconditioner works, but still at least 50 iterations within the first VCE iteration (and 20 for the others) are necessary to obtain convergence. It is not answered yet whether the convergence could be accelerated using alternative (but probably still block-diagonal) preconditioners. Further open questions that arose throughout the simulation studies are: Is it sufficient

to extract the order-wise blocks from the preprocessed NEQs, or is there a better option? Convergence for the degrees to which the NEQ groups contribute (d/o 2 to 250) is not better than for the other degrees, although the residuals computed from the full NEQs enter the residuals (and thus the search direction) in the PCGMA algorithm. The poorest convergence



rate can be observed for the degrees to which both NEQ and OEQ groups contribute (overlapping area, degrees ≈ 150 to 250). Does the composed preconditioner contradict the data there (e.g. due to the neglected correlations from the satellite NEQs in the preconditioner)? Is it useful to apply the estimated weights in the preconditioner (this was done in the simulation shown)? The block-diagonal approximation gets closer to $\mathbf{N}$, but the block-diagonal matrices for every group have to be stored (just to get an idea: for d/o 2000 each block-diagonal matrix needs 40 GB) and newly combined within each VCE iteration. Is this additional memory requirement useful or is a preconditioner with constant weights sufficient? Going to the next level, i.e. significantly increasing the number of parameters towards

our target resolution of at least d/o 2000, would definitely benefit from a better convergence. Nevertheless, first numerical tests have shown that the current implementation can handle the estimation of d/o 2000 models, but a full closed-loop simulation (as done here for the d/o 500 model) has not been performed yet.

Acknowledgements: The study and the publication were funded within the DFG project

G/O2000+ (SCHU2305/3-1). The computations were performed on the JUROPA supercomputer at FZ Jülich. The computing time was granted by the John von Neumann Institute for Computing (project HBN15). The Open Source HPC computing libraries ATLAS, OpenMPI, BLACS, PBLAS and ScaLAPACK are gratefully acknowledged. The ITG-Grace2010s NEQs are provided online (http://www.igg.uni-bonn.de/apmg/index.php?id=itg-grace2010). We would like to thank the editor N. Sneeuw, O. Baur and an additional anonymous reviewer for the valuable comments which helped to improve the quality of the manuscript significantly.

References

Alkhatib H., 2007. On Monte Carlo Methods with Applications to the Current Satellite Gravity Missions. Ph.D. Thesis, Institute of Geodesy and Geoinformation, University of Bonn, Bonn, Germany.

Baboulin M., Giraud L., Gratton S. and Langou J., 2009. Parallel tools for solving incremental dense least squares problems. Application to space geodesy. J. Algorithms Comput. Technol., 19, 413-433.

Balaji P., Bland W., Dinan J., Goodell D., Gropp W., Latham R., Pena A. and Thakur R., 2013. MPICH User’s Guide. 3rd Edn. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL.

Baur O., 2009. Tailored least-squares solvers implementation for high-performance gravity field research. Comput. Geosci., 35, 548-556, DOI: 10.1016/j.cageo.2008.09.004.

Baur O., Austen G. and Kusche J., 2008. Efficient GOCE satellite gravity field recovery based on least-squares using QR decomposition. J. Geodesy, 82, 207-221, DOI: 10.1007/s00190-007-0171-z.

Blackford L.S., Choi J., Cleary A., D’Azevedo E., Demmel J., Dhillon I., Dongarra J., Hammarling S., Henry G., Petitet A., Stanley K., Walker D. and Whaley R., 1997. ScaLAPACK Users Guide. 2nd Edition. SIAM, Philadelphia, PA.


Boxhammer C., 2006. Effiziente numerische Verfahren zur sphärischen harmonischen Analyse von Satellitendaten. Ph.D. Thesis, Institute of Geodesy and Geoinformation, University of Bonn, Bonn, Germany (in German).

Boxhammer C. and Schuh W.D., 2006. GOCE gravity field modeling: computational aspects - free kite numbering scheme. In: Flury J., Rummel R., Reigber C., Rothacher M., Boedecker G. and Schreiber U. (Eds.), Observation of the Earth System from Space. Springer Verlag, Berlin, Heidelberg, Germany, 209-224, DOI: 10.1007/3-540-29522-4_15.

Brockmann J.M. and Schuh W.D., 2010. Fast variance component estimation in GOCE data processing. In: Mertikas S.P. (Ed.), Gravity, Geoid and Earth Observation. International Association of Geodesy Symposia 135, Springer Verlag, Berlin, Heidelberg, Germany, 185-193, DOI: 10.1007/978-3-642-10634-7_25.

Brockmann J.M., Kargoll B., Krasbutter I., Schuh W.D. and Wermuth M., 2010. GOCE data analysis: From calibrated measurements to the global Earth gravity field. In: Flechtner F.M., Gruber Th., Güntner A., Mandea M., Rothacher M., Schöne T. and Wickert J. (Eds.), System Earth via Geodetic-Geophysical Space Techniques. Advanced Technologies in Earth Sciences, Springer Verlag, Berlin, Heidelberg, Germany, 213-229, DOI: 10.1007/978-3-642-10228-8_17.

Brockmann J.M., Roese-Koerner L. and Schuh W.D., 2013. Use of high performance computing for the rigorous estimation of very high degree spherical harmonic gravity field models. In: Marti U. (Ed.), Gravity Geoid and Height Systems. International Association of Geodesy Symposia 141, Springer Verlag, Berlin, Heidelberg, Germany (in print).

Bruinsma S., Förste C., Abrikosov O., Marty J.C., Rio M.H., Mulet S. and Bonvalot S., 2013. The new ESA satellite-only gravity field model via the direct approach. Geophys. Res. Lett., 40, 3607-3612, DOI: 10.1002/grl.50716.

Dongarra J.J. and Whaley R.C., 1997. A User’s Guide to the BLACS v1.1. Technical Report 94, LAPACK Working Note (http://www.netlib.org/lapack/lawnspdf/lawn94.pdf).

ESA, 1999. The four Candidate Earth Explorer Core Missions - Gravity Field and Steady-State Ocean Circulation Mission. ESA Report SP-1233(1), ESA Publications Division, c/o ESTEC, Noordwijk, The Netherlands.

Farahani H., Ditmar P., Klees R., Liu X., Zhao Q. and Guo J., 2013. The static gravity field model DGM-1S from GRACE and GOCE data: computation, validation and an analysis of GOCE mission's added value. J. Geodesy, 87, 843-867, DOI: 10.1007/s00190-013-0650-3.

Fecher T., Pail R. and Gruber T., 2011. Global gravity field determination by combining GOCE and complementary data. In: Ouwehand L. (Ed.), Proceedings of the 4th International GOCE User Workshop. ESA Publication SP-696, ESA Publications Division, c/o ESTEC, Noordwijk, The Netherlands.

Förste C., Schmidt R., Stubenvoll R., Flechtner F., Meyer U., König R., Neumayer H., Biancale R., Lemoine J.M., Bruinsma S., Loyer S., Barthelmes F. and Esselborn S., 2008. The GeoForschungsZentrum Potsdam/Groupe de Recherche de Geodesie Spatiale satellite-only and combined gravity field models: EIGEN-GL04S1 and EIGEN-GL04C. J. Geodesy, 82, 331-346, DOI: 10.1007/s00190-007-0183-8.

Förste C., Bruinsma S., Flechtner F., Marty J., Lemoine J.M., Dahle C., Abrikosov O., Neumayer H., Biancale R., Barthelmes F. and Balmino G., 2012. A preliminary update of the direct approach GOCE processing and a new release of EIGEN-6C. Abstract no. 0923, AGU Fall Meeting, San Francisco (http://icgem.gfz-potsdam.de/ICGEM/documents/Foerste-et-al-AGU_2012.pdf).


Förstner W., 1979. Ein Verfahren zur Schätzung von Varianz- und Kovarianzkomponenten. Allgemeine Vermessungsnachrichten, 11, 446-453 (in German).

Gabriel E., Fagg G., Bosilca G., Angskun T., Dongarra J., Squyres J., Sahay V., Kambadur P., Barrett B., Lumsdaine A., Castain R., Daniel D., Graham R. and Woodall T., 2004. Open MPI: Goals, concept, and design of a next generation MPI implementation. In: Kranzlmüller D., Kacsuk P. and Dongarra J. (Eds.), Recent Advances in Parallel Virtual Machine and Message Passing Interface. Lecture Notes in Computer Science 3241, Springer Verlag, Berlin, Germany, 97-104, DOI: 10.1007/978-3-540-30218-6_19.

Gunter B.C. and Van De Geijn R.A., 2005. Parallel out-of-core computation and updating of the QR factorization. ACM Trans. Math. Softw., 31, 60-78, DOI: 10.1145/1055531.1055534.

Heiskanen W. and Moritz H., 1993. Physical Geodesy, Reprint Edition. Institute of Physical Geodesy, Technical University, Graz.

Hestenes M. and Stiefel E., 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409-436.

Holmes S.A. and Featherstone W.E., 2002. A unified approach to the Clenshaw summation and the recursive computation of very high degree and order normalized associated Legendre functions. J. Geodesy, 76, 279-299, DOI: 10.1007/s00190-002-0216-2.

Hutchinson M., 1990. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Stat., 19, 433-450, DOI: 10.1080/03610919008812866.

Ilk K.H., Kusche J. and Rudolph S., 2002. A contribution to data combination in ill-posed downward continuation problems. J. Geodyn., 33, 75-99, DOI: 10.1016/S0264-3707(01)00056-4.

Koch K., 1999. Parameter Estimation and Hypothesis Testing in Linear Models, 2nd Edition. Springer Verlag, Berlin, Heidelberg, Germany.

Koch K. and Kusche J., 2002. Regularization of geopotential determination from satellite data by variance components. J. Geodesy, 76, 259-268, DOI: 10.1007/s00190-002-0245-x.

Konopliv A.S., Park R.S., Yuan D.N., Asmar S.W., Watkins M.M., Williams J.G., Fahnestock E., Kruizinga G., Paik M., Strekalov D., Harvey N., Smith D.E. and Zuber M.T., 2013. The JPL lunar gravity field to spherical harmonic degree 660 from the GRAIL primary mission. J. Geophys. Res. Planets, 118, 1415-1434, DOI: 10.1002/jgre.20097.

Lemoine F.G., Kenyon S.C., Factor J.K., Trimmer R.G., Pavlis N.K., Chinn D.S., Cox C.M., Klosko S.M., Luthcke S.B., Torrence M.H., Wang Y.M., Williamson R.G., Pavlis E.C., Rapp R.H. and Olson T.R., 1998. The Development of the Joint NASA GSFC and the National Imagery and Mapping Agency (NIMA) Geopotential Model EGM96. NASA Technical Report TP-1998-206861, National Aeronautics and Space Administration, Goddard Space Flight Center, Greenbelt, Maryland, USA.

Lemoine F.G., Goossens S., Sabaka T.J., Nicholas J.B., Mazarico E., Rowlands D.D., Loomis B.D., Chinn D.S., Caprette D.S., Neumann G.A., Smith D.E. and Zuber M.T., 2013. High-degree gravity models from GRAIL primary mission data. J. Geophys. Res. Planets, 118, 1676-1698, DOI: 10.1002/jgre.20118.

MPI-Forum, 2009. MPI: A Message-Passing Interface Standard 2.2. http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf.

Pail R. and Plank G., 2003. Comparison of numerical solution strategies for gravity field recovery from GOCE SGG observations implemented on a parallel platform. Adv. Geosci., 1, 39-45.


Pail R., Goiginger H., Schuh W.D., Höck E., Brockmann J.M., Fecher T., Gruber T., Mayer-Gürr T., Kusche J., Jäggi A. and Rieser D., 2010. Combined satellite gravity field model GOCO01S derived from GOCE and GRACE. Geophys. Res. Lett., 37, L20314, DOI: 10.1029/2010GL044906.

Pavlis N.K., Holmes S.A., Kenyon S. and Factor J.K., 2012. The development and evaluation of the Earth Gravitational Model 2008 (EGM2008). J. Geophys. Res., 117, B04406, DOI: 10.1029/2011JB008916.

Reigber C., Lühr H. and Schwintzer P., 2002. CHAMP mission status. Adv. Space Res., 30, 129-134, DOI: 10.1016/S0273-1177(02)00276-4.

Ries J.C., Bettadpur S., Poole S. and Richter T., 2011. Mean background gravity fields for GRACE processing. GRACE Science Team Meeting, Austin, TX (ftp://ftp.csr.utexas.edu/pub/grace/GIF48/GSTM2011_Ries_etal.pdf).

Rummel R., Yi W. and Stummer C., 2011. GOCE gravitational gradiometry. J. Geodesy, 85, 777-790, DOI: 10.1007/s00190-011-0500-0.

Schuh W.D., Brockmann J.M., Kargoll B., Krasbutter I. and Pail R., 2010. Refinement of the stochastic model of GOCE scientific data and its effect on the in-situ gravity field solution. In: Lacoste-Francis H. (Ed.), Proceedings of ESA Living Planet Symposium, SP-686, ESA Communications, ESTEC, Noordwijk, The Netherlands.

Schuh W.D., 1996. Tailored Numerical Solution Strategies for the Global Determination of the Earth's Gravity Field. Mitteilungen der geodätischen Institute der Technischen Universität Graz, Vol. 81. Technical University Graz, Graz, Austria (ftp://skylab.itg.uni-bonn.de/schuh/Separata/schuh_96.pdf).

Schwarz H., 1970. Die Methode der konjugierten Gradienten in der Ausgleichsrechnung. Zeitschrift für Vermessungswesen, 95, 130-140 (in German).

Schwintzer P., Reigber C., Bode A., Kang Z., Zhu S.Y., Massmann F.H., Biancale R., Balmino G., Lemoine J.M., Moynot B., Marty J.C., Barlier F. and Boudon Y., 1997. Long-wavelength global gravity field models: GRIM4-S4, GRIM4-C4. J. Geodesy, 71, 189-208, DOI: 10.1007/s001900050087.

Sidani M. and Harrod B., 1996. Parallel Matrix Distributions: Have We Been Doing It All Right? Technical Report 116, LAPACK Working Note (http://www.netlib.org/lapack/lawnspdf/lawn116.pdf).

Tapley B., Bettadpur S., Ries J., Thompson P. and Watkins M., 2004. GRACE measurements of mass variability in the Earth system. Science, 305, 503-505, DOI: 10.1126/science.1099192.

Xie J., 2005. Implementation of Parallel Least-Squares Algorithms for Gravity Field Estimation. Report 474, Department of Civil and Environmental Engineering and Geodetic Science, The Ohio State University, Columbus, OH.

Zuber M.T., Smith D.E., Watkins M.M., Asmar S.W., Konopliv A.S., Lemoine F.G., Melosh H.J., Neumann G.A., Phillips R.J., Solomon S.C., Wieczorek M.A., Williams J.G., Goossens S.J., Kruizinga G., Mazarico E., Park R.S. and Yuan D.N., 2013. Gravity field of the Moon from the Gravity Recovery And Interior Laboratory (GRAIL) mission. Science, 339, 668-671, DOI: 10.1126/science.1231507.