Upload
hakiet
View
217
Download
0
Embed Size (px)
Citation preview
Large Scale Parallel Reservoir Large Scale Parallel Reservoir Simulations on a Linux PCSimulations on a Linux PC--ClusterCluster
Walid A. HabiballahWalid A. HabiballahM. Ehtesham HayderM. Ehtesham Hayder
Saudi AramcoSaudi Aramco
Fourth LCI International Conference on Linux ClustersFourth LCI International Conference on Linux ClustersClusterWorld Conference and ExpoClusterWorld Conference and Expo
San Jose, June 23San Jose, June 23--26, 200326, 2003
OutlineOutline
•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco
•• Formulation of the simulator (POWERS)Formulation of the simulator (POWERS)
•• PC PC -- ClustersClusters
•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster
•• Results Results
•• ConclusionsConclusions
Seismic Interpretation
Geological Model
Core Analysis & well logs
Simulation Model
Well Production& Injection Data
Demands for large scale reservoir simulations at Demands for large scale reservoir simulations at
Saudi Aramco is rapidly increasingSaudi Aramco is rapidly increasing
•• to reduce simulation turn around timesto reduce simulation turn around times•• to resolve heterogeneity in reservoirsto resolve heterogeneity in reservoirs•• to couple reservoir and facility simulationsto couple reservoir and facility simulations•• for well location and trajectory planningfor well location and trajectory planning
Other potential benefits of large scale reservoir Other potential benefits of large scale reservoir simulations to Saudi Aramcosimulations to Saudi Aramco
•• Longer time simulationsLonger time simulations•• Possibility of automatic or semiPossibility of automatic or semi--automatic history matchingautomatic history matching•• Computational steering Computational steering •• Exploitations of technological advances in computing Exploitations of technological advances in computing •• ……………………
Growth of POWERS million+ cell simulation modelsGrowth of POWERS million+ cell simulation models
411
19
3944
51
33
010203040506070
2000 2001 2002 2003 2004 2005 2006
Num
ber o
f Mod
els
Projected computing capacity required for Projected computing capacity required for reservoir simulations reservoir simulations
1 2 3
1921
23
15
0
5
10
15
20
25
30
2000 2001 2002 2003 2004 2005 2006
Nor
mal
ized
Com
pute
Cap
acity
13
5
1316
18
21
0
5
10
15
20
2 5
2000 2001 2002 2003 2004 2005 2 006
Projected storage capacity required for reservoir Simulations
Nor
mal
ized
Sto
rage
Cap
acity
OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco
•• Formulation of the simulator (POWERS)Formulation of the simulator (POWERS)
•• PC PC -- ClustersClusters
•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster
•• Results Results
•• ConclusionsConclusions
PParallelarallel OOilil WWatatERER SSimulatorimulator
•• It is a parallel reservoir simulatorIt is a parallel reservoir simulator•• Functionality includes:Functionality includes:
–– BlackBlack--OilOil–– CompositionalCompositional–– Dual Porosity Dual PermeabilityDual Porosity Dual Permeability–– Local Grid RefinementLocal Grid Refinement–– Comprehensive Well ManagementsComprehensive Well Managements
qUt
Sa
a a=∇+
∂
∂•
)(φρ
)( HgPKSK
U aaa
a
aaa ∇−∇−= ρρ
µ
1=∑ aS
Governing Equations:Governing Equations:
Numerical algorithmNumerical algorithm•• Implicit FormulationImplicit Formulation•• NewtonNewton--Raphson for nonlinearityRaphson for nonlinearity•• Generalized Conjugate Residual to solve linearized equationsGeneralized Conjugate Residual to solve linearized equations•• RedRed--Black orderingBlack ordering
OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco
•• Formulation of the simulatorFormulation of the simulator
•• PC PC -- ClustersClusters
•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster
•• Results Results
•• ConclusionsConclusions
Performance of different processors using the Performance of different processors using the SPECfp2000 benchmarkSPECfp2000 benchmark
322
532
843 872
0
200
400
600
800
1000
Power 3-375M Hz IBM -NH-2
SGI M IPS-Pro500M Hz-R14K
Power 4-1 GHzIBM -Reggata
Pent ium IV2.4GHz
SPEC
fp20
00 N
umbe
rs
Bandwidth and Latency of various switches in Bandwidth and Latency of various switches in PCPC--Clusters configurationsClusters configurations
5400Quadrics
7421Myrinet
26-12128Gigabit
15012Fast Ethernet
Latency(microsecond)
Bandwidth(Mbytes/sec)Switch
0
20
40
60
80
1997 1998 1999 2000 2001 2002
Num
ber o
f Ins
talla
tions Intel
HPAlphaAMD
Number of installation of NOW Clusters in the Top500 supercomputers list
Status of PCStatus of PC--Cluster Computations at Saudi AramcoCluster Computations at Saudi Aramco
•• Seismic processing activities in Saudi Aramco are taking Seismic processing activities in Saudi Aramco are taking full advantages of PCfull advantages of PC--cluster technology and abandoned cluster technology and abandoned other systems.other systems.
•• Reservoir simulation models are now being migrated to PCReservoir simulation models are now being migrated to PC--Cluster platform (this studyCluster platform (this study).).
Quadrics Cluster (Cluster_QQuadrics Cluster (Cluster_Q))
128 GBTotal RAM2.0 GHzClock64 (Xeon)Processors (CPUs)32Nodes
Configuration of the Quadrics Cluster
Quadrics Cluster (Cluster_QQuadrics Cluster (Cluster_Q))
RMSJob submission
MPICH - ElanCommunication
Vampir, VampirTrace, VtuneOptimization Tools
TotalviewDebugger
Intel ifc (v.6), iccCompiler
Redhat Linux 7.2OS
Software on Quadrics Cluster
Q u a d r ic sN e t w o r k
E th e r n e tN e t w o r k
S er ia l C o n c e n tra to r
W o rk sta ti o n
W o rk sta tio n
W o rk sta tio n
Q u a d ri csF a s t E th er n e t
S er ia l
Q u a d r ic sN e t w o r k
E th e r n e tN e t w o r k
S er ia l C o n c e n tra to r
W o rk sta ti o n
W o rk sta tio n
W o rk sta tio n
Q u a d ri csF a s t E th er n e t
S er ia l
Q u a d ri csF a s t E th er n e t
S er ia l
Quadrics Cluster (Cluster_Q)Quadrics Cluster (Cluster_Q)
Myrinet Cluster (Cluster_M)Myrinet Cluster (Cluster_M)
20 TBExternal Storage
1024 GB (1 TB) Total RAM
2.4 GHzClock
512 (Xeon)Processors (CPUs)
256Nodes
2 Sub-Clusters
Configuration of Myrinet Cluster
Myrinet Cluster (Cluster_MMyrinet Cluster (Cluster_M))
PBSJob submission
MPICH - MyrinetCommunication
Vampir, VampirTrace, VtuneOptimization Tools
TotalviewDebugger
Intel ifc (v.7), iccCompiler
Redhat Linux 7.3OS
Software on Myrinet Cluster
Master
Management
10 TB10 TB
MyrinetSwitch
to the router
32 nodes 32 nodes 32 nodes 32 nodes
MyrinetNetwork
Ethernet Network
Master
Management
10 TB10 TB
MyrinetSwitch
to the router
32 nodes 32 nodes 32 nodes 32 nodesMaster
Management
10 TB10 TB
MyrinetSwitch
to the router
32 nodes 32 nodes 32 nodes 32 nodes
MyrinetNetwork
Ethernet Network
Myrinet Cluster (Cluster_M)Myrinet Cluster (Cluster_M)
OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco
•• Formulation of the simulatorFormulation of the simulator
•• PC PC -- ClustersClusters
•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster
•• Results Results
•• ConclusionsConclusions
Parallel Implementations of POWERSParallel Implementations of POWERS
•• POWERS was originally written in Connection Machine Fortran. POWERS was originally written in Connection Machine Fortran.
•• Code was later ported to IBM SP using OpenMP and MPI calls.Code was later ported to IBM SP using OpenMP and MPI calls.
•• No modification of the numerical algorithm was made in this or iNo modification of the numerical algorithm was made in this or in n the previous port. the previous port.
Summary of PCSummary of PC--Cluster porting activitiesCluster porting activities
•• Recompiled the code on cluster using the IntelRecompiled the code on cluster using the Intel--ifc compilerifc compiler•• Expanded MPI parallelization into a second dimensionExpanded MPI parallelization into a second dimension•• Benchmarked interconnect switches (MericomBenchmarked interconnect switches (Mericom--Myrinet and Myrinet and
QuadricsQuadrics--Elan) using code kernels and the full codeElan) using code kernels and the full code•• Revised MPI communication algorithms in the codeRevised MPI communication algorithms in the code•• Tuned performanceTuned performance
Node 0
Node n
Node 1
J-Dimension
MPI
CPU 0 CPU mThreadsI-D
imen
sion
Original Implementation Original Implementation
IBMIBM--NH2NH2 PCPC--ClusterCluster
Porting from IBM NH2 to PC Porting from IBM NH2 to PC -- ClusterCluster
J-Dimension I-Dimension
I-DimensionMPI-Communication
CPU 0CPU 1
Node 1
CPU 0
CPU 1
Node m
CPU 0
CPU 1
Node 0
J-Dimension
Grid Partitioning
OpenMPThreads
New ImplementationNew Implementation
OutlineOutline•• Reservoir simulationsReservoir simulations
•• PC PC -- ClustersClusters
•• Formulation of the simulatorFormulation of the simulator
•• Implementation of the simulator on PCImplementation of the simulator on PC--ClusterCluster
•• ResultsResults
•• ConclusionsConclusions
242000.5Model IV
624,0000.2Model III
502,9609.6Model II
31443.7Model I
Years Simulated
Numberof wells
Grid Size(million cells)
Model
Test CasesTest Cases
Model I Execution time on PC Clusters and IBM_NH2Model I Execution time on PC Clusters and IBM_NH2
0
3000
6000
9000
12000
16 CPU 32 CPU
Cluster_QCluster_MIBM NH2
Tim
e(s
ec)
A comparison of platformsA comparison of platforms
0
0.5
1
1.5Ti
me
(hr)
IBM NH2 IBMRegatta
Cluster_Q
Execution time of Model I on PC ClustersExecution time of Model I on PC Clusters
1000
2000
3000
4000
16 32
Number of Processors
Tim
e (s
ec)
Cluster_Q(v.6)Cluster_M(v.6)Cluster_M(v.7)
Model I Simulation timeModel I Simulation time
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
16 32 48 64 80 96 120Number of Processors
IBM NH2 Cluster_M
Tim
e(s
ec)
Scalability of Model IScalability of Model I
16
32
48
64
80
96
16 32 48 64 80 96
Number of Processors
Spe
ed u
p
IdealNH2 (total)NH2 (core)Cluster (total)Cluster (core)
Simulations of Model IISimulations of Model II
0
5
10
15
20
25
30
IBM NH2 PC Cluster96 Processors
Tim
e (h
r)
Simulations of Model II on Cluster _MSimulations of Model II on Cluster _M
0
4
8
12
16
64 96 125
Number of Processors
Tim
e (h
r)
Simulations of Model II on Cluster _MSimulations of Model II on Cluster _M
64
96
128
64 96 128Number of Processors
IdealActualComputation
Spe
ed u
p
Timing of Code Sections in Model ITiming of Code Sections in Model I
0
1000
2000
3000
16 32 48 64 96 120
Number of Processors
Tim
e (s
ec)
CommSynchCore Total
Model III execution timeModel III execution time
4,6753,7205,230---50
7,7004,8605,7008,00020
13,5007,0907,3509,90010
98,35045,83045,83056,1001
IBM NH2(sec.)
Cluster_M(opt)(sec.)
Cluster_M(orig)(sec.)
Cluster_Q(sec.)
CPU
Effect of optimized communication algorithm for Model I Effect of optimized communication algorithm for Model I
SimulationSimulation
0.7
0.75
0.8
0.85
0.9
0.95
1
16 32 64 96 120
Number of Processors
T_op
t / T
_orig
Total Time
Core Time
Communication Overhead in Model III SimulationCommunication Overhead in Model III Simulation
0
500
1000
1500
2000
2500
3000
3500
10 20 50
Number of Processors
Tim
e (s
ec)
Reg Comm (orig)Reg Comm (opt)Irreg Comm (orig)Irreg Comm (opt)
Timing with Different Decompositions in Models I and IVTiming with Different Decompositions in Models I and IV
0
1000
2000
3000
32x1 16x2 8x4 4x8 2x16 1x32
Blocking
Tim
e (s
ec)
Model I total Model I coreModel IV totalModel IV core
OutlineOutline•• Reservoir simulations at Saudi AramcoReservoir simulations at Saudi Aramco
•• PC PC -- ClustersClusters
•• Formulation of the simulatorFormulation of the simulator
•• Implementation of the Simulator on PCImplementation of the Simulator on PC--ClusterCluster
•• Results Results
•• ConclusionsConclusions
ConclusionsConclusions
Based on our observations in this work, we conclude Based on our observations in this work, we conclude the followingsthe followings::
•• PCPC--Clusters are highly attractive option for parallel reservoir Clusters are highly attractive option for parallel reservoir simulations in terms of performance and cost. simulations in terms of performance and cost.
•• Simulations of massive reservoir models (in the order of 10 Simulations of massive reservoir models (in the order of 10 million cells) are possible at affordable computing cost.million cells) are possible at affordable computing cost.
•• Serial computation, synchronization and I/O issues become Serial computation, synchronization and I/O issues become significant at high number of processorssignificant at high number of processors..
ConclusionsConclusions•• Commodity switches can be effectively used in high Commodity switches can be effectively used in high
performance computing platform for reservoir simulation.performance computing platform for reservoir simulation.
•• Both hardware and software tools such as compilers are Both hardware and software tools such as compilers are constantly improving which in turn will reduce execution constantly improving which in turn will reduce execution times and computing costs on PCtimes and computing costs on PC--clusters.clusters.
•• The computing hardware is continuously changing and we will The computing hardware is continuously changing and we will follow up and monitor to take advantage of new and emerging follow up and monitor to take advantage of new and emerging technologies.technologies.