Upload
lytruc
View
252
Download
7
Embed Size (px)
Citation preview
Parallel DYNARE
Parallel DYNARE ToolboxFP7 Funded
Project MONFISPOL Grant no.: 225149
Marco RattoEuropean Commission, Joint Research Centre, Ispra, ITALY
with Ivano Azzini, Houtan Bastani, Sebastien Villemot
November 8, 2010
Parallel DYNARE
Outline
1 Introduction
2 The DYNARE environment
3 Installation and utilizationRequirementsThe user perspectiveThe Developers perspective
4 Parallel DYNARE: testing
5 Conclusions
Parallel DYNARE
Intro
Introduction
Parallelize portions of code that require a minimal (i.e. start-endcommunication) or no communications between differentprocesses, denoted in the literature as “embarrassingly parallel”(Goffe and Creel, 2008; Barney, 2009), with the following criteria:
provide the necessary input data to any sequence;
distribute the workload, automatically balancing between thecomputational resources;
collect the output data;
ensure the coherence of the results with the original serialexecution.
Parallel DYNARE
Intro
...
n=2;
m=10^6;
Matrix= zeros(n,m);
for i=1:n,
Matrix(i,:)=rand(1,m);
end,
Mse= Matrix;
...
Example 1
Parallel DYNARE
Intro
...
n=2;
m=10^6;
<provide to CPU1 and CPU2 input data m>
-
<Execute on CPU1> <Execute on CPU2>
Matrix1 = zeros(1,m); Matrix2 = zeros(1,m);
Matrix1(1,:)=rand(1,m); Matrix2(1,:)=rand(1,m);
save Matrix1 save Matrix2
-
retrieve Matrix1 and Matrix2
Mpe(1,:) = Matrix1;
Mpe(2,:) = Matrix2;
Example 2
Parallel DYNARE
Intro
The for cycle has disappeared and it has been split into twoseparated sequences that can be executed in parallel on two CPUs.We have the same result (Mpa=Mse) but the computational timecan be reduced up to 50%.
Parallel DYNARE
The DYNARE environment
The DYNARE environment IRoutines suitable for parallelization:
1 the RW Metropolis with multiple chains(random walk metropolis hastings.m)(independent metropolis hastings.m);
2 the diagnostic tests(McMCDiagnostics.m);
3 posterior IRF’s (posteriorIRF.m).
4 posterior statistics for filtered and smoothed variables,forecasts, smoothed shocks, etc..(prior posterior statistics.m).
5 plotting utility (pm3.m).
Parallel DYNARE
The DYNARE environment
The DYNARE environment II
MATLAB does not support multi-threads, without the use (andpurchase) of MATLAB Distributed Computing Toolbox.
When the execution should start in parallel:
1 pass the control to the operating system(Windows/Linux);
2 launched concurrent threads (i.e. MATLABinstances) on different processors/cores/machines;
3 when parallel is concluded the control is given backto the original MATLAB session .
Parallel DYNARE
Installation and utilization
Requirements
Requirements I
For a Windows grid:
1 a standard Windows network (SMB) must be in place;
2 PsTools (Russinovich, 2009) must be installed in the path ofthe master Windows machine;
3 the Windows user on the master machine has to be user ofany other slave machine in the cluster, and that user will beused for the remote computations.
Parallel DYNARE
Installation and utilization
Requirements
Requirements II
For a UNIX grid
1 SSH must be installed on the master and on the slavemachines;
2 the UNIX user on the master machine has to be user of anyother slave machine in the cluster, and that user will be usedfor the remote computations;
3 SSH keys must be installed so that the SSH connection fromthe master to the slaves can be done without passwords, orusing an SSH agent.
Parallel DYNARE
Installation and utilization
The user perspective
The interface I
Put the configuration of the cluster in a config file different fromthe MOD file, and to trigger the parallel computation withoption(s) on the dynare command line. The configuration file isdesigned as follows:
be in a standard location
$HOME/.dynare under Unix;c:\Documents and Setting\<username>\ApplicationData\dynare.ini on Windows;
have provisions for other Dynare configuration parametersunrelated to parallel computation
allow to specify several clusters, each one associated with anickname;
Parallel DYNARE
Installation and utilization
The user perspective
The interface IINode Options type default Win Unix
Local Remote Local Remote
Name string (stop) * * * *CPUnbr integer (stop) * * * *
or arrayComputerName string (stop) * *UserName string empty * *Password string empty *RemoteDrive string empty *RemoteDirectory string empty * *DynarePath string emptyMatlabOctavePath string emptySingleCompThread boolean true
Cluster Options type default Meaning Required
Name string empty name of the node *Members string empty list of members in this cluster *
Parallel DYNARE
Installation and utilization
The user perspective
Configuration file syntax (Unix)
[cluster]
Name = c1
Members = n1 n2 n3
[cluster]
Name = c2
Members = n2 n3
[node]
Name = n1
ComputerName = localhost
CPUnbr = 1
[node]
Name = n2
ComputerName = karaba.cepremap.org
CPUnbr = 5
UserName = houtanb
RemoteDirectory = /home/houtanb/Remote
DynarePath = /home/houtanb/dynare/matlab
MatlabOctavePath = matlab
[node]
Name = n3
ComputerName = hal.cepremap.ens.fr
CPUnbr = 3
UserName = houtanb
RemoteDirectory = /home/houtanb/Remote
DynarePath = /home/houtanb/dynare/matlab
MatlabOctavePath = matlab
Parallel DYNARE
Installation and utilization
The user perspective
Configuration file syntax (Win)
[cluster]
Name = local
Members = n1
[cluster]
Name = vonNeumann
Members = n2
[cluster]
Name = c2
Members = n1 n2
[node]
Name = n1
ComputerName = localhost
CPUnbr = 4
[node]
Name = n2
ComputerName = vonNeumann
CPUnbr = [4:6]
UserName = COMPUTOWN\John
Password = *****
RemoteDrive = C
RemoteDirectory = dynare_calcs\Remote
DynarePath = c:\dynare\matlab
MatlabOctavePath = matlab
Parallel DYNARE
Installation and utilization
The user perspective
DYNARE command line options
conffile=<path>: specify the location of the configurationfile if it is not standard
parallel: trigger the parallel computation using the firstcluster specified in config file
parallel=<clustername>: trigger the parallel computation,using the given cluster
parallel slave open mode: use the leaveSlaveOpen modein the cluster
parallel test: just test the cluster, dont actually run theMOD file
Parallel DYNARE
Installation and utilization
The Developers perspective
Core functions I
masterParallel is the entry point to the parallelization system:
It is called from the master computer, at thepoint where the parallelization system should beactivated;
all file exchange through the filesystem isconcentrated in this masterParallel routine;
there are two modes of parallel execution,triggered by optionparallel slave open mode;
slaveParallel.m/fParallel.m: are the top-level functions tobe run on every slave;
Parallel DYNARE
Installation and utilization
The Developers perspective
Core functions II
fMessageStatus.m: provides the core for simple message passingduring slave execution and typically replaces calls towaitbar.m;
closeSlave.m is the utility that sends a signal to remote slavesto close themselves;
Parallel DYNARE
Installation and utilization
The Developers perspective
Utilities
AnalyseComputationalEnviroment.m;
InitializeComputationalEnviroment.m;
distributeJobs.m;
generalized routines:
dynareParallelDelete.m : generalized delete;dynareParallelDir.m : generalized dir;dynareParallelGetFiles.m : generalized copy FROM slaves TO
master machine;dynareParallelMkDir.m : generalized mkdir on remote
machines;dynareParallelRmDir.m : generalized rmdir on remote
machined;dynareParallelSendFiles.m : generalized copy TO slaves
FROM master machine;
Parallel DYNARE
Parallel DYNARE: testing
Parallel DYNARE: testing
We present here all tests performed with Windows Xp/Matlab.The model used for testing is a modification of Lubik (2003): asmall scale open economy DSGE model with 6 observed variables,6 endogenous variables and 19 parameters to be estimated.
Parallel DYNARE
Parallel DYNARE: testing
Test 1.
Test 1
Bi-processor machine (Fujitsu Siemens, Celsius R630) powered withan Intel(R) Xeon(TM) CPU 2.80GHz Hyper Treading Technology.
Parallel DYNARE
Parallel DYNARE: testing
Test 1.
3
Figure 1. X=Runs / Y=Computational time in minutes
Figure 2. X=Runs Y=Time Gain
0
0.1
0.2
0.3
0.4
0.5
0.6
0 200000 400000 600000 800000 1000000 1200000
Figure 2 plots the computation time gain against the chain length. The time gain
0
100
200
300
400
500
600
700
800
0 200000 400000 600000 800000 1000000 1200000
one processortwo processors
Figure: Computational time (in minutes) versus chain length for theserial and parallel implementation (Metropolis with two chains).
Parallel DYNARE
Parallel DYNARE: testing
Test 1.
3
Figure 1. X=Runs / Y=Computational time in minutes
Figure 2. X=Runs Y=Time Gain
0
0.1
0.2
0.3
0.4
0.5
0.6
0 200000 400000 600000 800000 1000000 1200000
Figure 2 plots the computation time gain against the chain length. The time gain
0
100
200
300
400
500
600
700
800
0 200000 400000 600000 800000 1000000 1200000
one processortwo processors
Figure: Reduction of computational time (i.e. the ‘time gain’) using theparallel coding versus chain length. The time gain is computed as(Ts − Tp)/Tp, where Ts and Tp denote the computing time of the serialand parallel implementations respectively.
Parallel DYNARE
Parallel DYNARE: testing
Test 2.
Test 2 I
Test different hardware platforms, with chain lengths of 1,000,000runs on the following hardware platforms:
Single processor machine: Intel(R) Pentium4(R) CPU3.40GHz with Hyper Treading Technology (Fujitsu-SiemensScenic Esprimo);
Bi-processor machine: two CPU’s Intel(R) Xeon(TM)2.80GHz Hyper Treading Technology (Fujitsu-Siemens,Celsius R630);
Dual core machine: Intel Centrino T2500 2.00GHz Dual Core(Fujitsu-Siemens, LifeBook S Series).
Parallel DYNARE
Parallel DYNARE: testing
Test 2.
Test 2 II
Machine Single-processor Bi-processor Dual core
Parallel 8:01:21 7:02:19 5:39:38Serial 10:12:22 13:38:30 11:02:14Speed-Up rate 1.2722 1.9381 1.9498Ideal Speed-UP rate ∼1.5 2 2
Table: Trail results with normal PC operation. Computing time expressedin h:m:s. Speed-up rate is computed as Ts/Tp, where Ts and Tp are thecomputing times for the serial and parallel implementations.
Parallel DYNARE
Parallel DYNARE: testing
Test 2.
Test 2 III
Environment Computing time Speed-up ratew.r.t. Table 1
Parallel Waitbar Not Visible 5:06:00 1.06
Parallel waitbar Not Visible,Real-time Process priority, Un-plugged network cable.
4:40:49 1.22
Table: Trail results with different software configurations (optimizedoperating environment for computational requirements).
Parallel DYNARE
Parallel DYNARE: testing
Test 3
Test 3
We used a very simple computer class, quite diffuse nowadays:Netbook personal Computer. In particular we used the Dell Mini10 with Processor Intel Atom Z520 (1,33 GHz, 533 MHz), 1 GB diRAM (with Hyper-trading).
Parallel DYNARE
Parallel DYNARE: testing
Test 3
8
Chains Length Time Serial Time Parallel 105 85 151 1005 246 287 5005 755 599 10005 1246 948 15005 1647 1250 20005 2068 1502 25005 2366 1675 Table3. Computational Time using all the parallel functions in DYNARE and the Open/Close strategy. We can also plot the results in table 3. We call this situation Complete Parallel …
Complete Parallel
0
500
1000
1500
2000
2500
105 1005 5005 10005 15005 20005 25005
MH Runs
Com
puta
tiona
l Tim
e (s
ec)
SerialParallel
Figure 3. The plot of data in Table 3. Figure 3. show as …
Figure: Computational Time (s) versus Metropolis length, running all theparallelized functions in DYNARE and the basic parallel implementation(the ‘Open/Close’ strategy). (Lubik, 2003).
Parallel DYNARE
Parallel DYNARE: testing
Test 3
10
Figure 4. show as … Comments … As reported in … we also introduced a new computational Matlab instances are always open … Chain Lenght
Computational Times: Complete Parallel
Computational Times: Partial Parallel
105 103 951005 209 1225005 504 205
10005 915 30615005 1203 32020005 1506 33425005 1611 322
Table5. Computational Time with Always Open strategy. We can also plot and compare the results in table 5 with results in table 3 and 4.:
Open/Close vs AlwaysOpenComplete Parallel
0
200
400
600
800
1000
1200
1400
1600
1800
105 1005 5005 10005 15005 20005 25005
MH Runs
Com
puta
tiona
l Tim
e (s
ec)
Open/CloseAlways Open
Figure 5. The compared computational time for “complete” parallel . Figure: Comparison of the ‘Open/Close’ strategy and the ‘Always-open’strategy. Computational Time (s) versus Metropolis length, running allthe parallelized functions in DYNARE (Lubik, 2003).
Parallel DYNARE
Parallel DYNARE: testing
Test 4
Tets 4 I
Here we increase the dimension of the test model, using theQUEST III model (Ratto et al., 2009), using a more powerfulNotebook Samsung Q 45 with an Dual core Processor IntelCentrino.
Parallel DYNARE
Parallel DYNARE: testing
Test 4
12
Chains Length
Time Serial
Time Parallel
105 98 951005 398 2555005 1463 890
10005 2985 165520005 4810 281530005 6630 402240005 7466 524650000 9263 6565
Table6. Computational Time using all the parallel function involved and the Open/Close strategy.
Complete Parallel - Open/Close
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
105 1005 5005 10005 20005 30005 40005 50000
MH Runs
Com
puta
tiona
l Tim
e (s
ec)
SerialParallel
Chains Length
Comp Time Serial
Comp Time Parallel
Figure: Computational Time (s) versus Metropolis length, running all theparallelized functions in DYNARE and the basic parallel implementation(the ‘Open/Close’ strategy). (Ratto et al., 2009).
Parallel DYNARE
Parallel DYNARE: testing
Test 4
14
105 66 601005 273 1175005 871 332
10005 1588 46020005 2791 47030005 3963 49240005 5292 47950000 6624 498
Table8. Computational Time with Always Open strategy.
Open/Close vs AlwaysOpenComplete Parallel
0
1000
2000
3000
4000
5000
6000
7000
105 1005 5005 10005 20005 30005 40005 50000
MH Runs
Com
puta
tiona
l Tim
e (s
ec)
Open/CloseAlways Open
Figure: Comparison of the ‘Open/Close’ strategy and the ‘Always-open’strategy. Computational Time (s) versus Metropolis length, running allthe parallelized functions in DYNARE (Ratto et al., 2009).
Parallel DYNARE
Conclusions
Conclusions
The methodology identified for parallelizing MATLAB codes withinDYNARE proved to be effective in reducing the computationaltime of the most extensive loops. This methodology is suitable for‘embarrassingly parallel’ codes, requiring only a minimalcommunication flow between slave and master threads. Theparallel DYNARE is built around a few ‘core’ routines, that act as asort of ‘parallel paradigm’. Based on those routines, parallelizationof expensive loops is made quite simple for DYNARE developers.A basic message passing system is also provided, that allows themaster thread to monitor the progress of slave threads. The testmodel ls2003.mod is available in the folder \tests\parallel ofthe DYNARE distribution, that allows running parallel examples.
Parallel DYNARE
Conclusions
Bibliography I
B Barney. Introducing To Parallel Computing (Tutorial). LawrenceLivermore National Laboratory, 2009.
W.L. Goffe and M. Creel. Multi-core CPUs, clusters, and gridcomputing: A tutorial. Computational economics, 32(4):353–382, 2008.
T. Lubik. Investment spending, equilibrium indeterminacy, and theinteractions of monetary and fiscal policy. Technical ReportEconomics Working Paper Archive 490, The Johns HopkinsUniversity, 2003.
Parallel DYNARE
Conclusions
Bibliography II
Marco Ratto, Werner Roeger, and Jan in ’t Veld. QUEST III: Anestimated open-economy DSGE model of the euro area withfiscal and monetary policy. Economic Modelling, 26(1):222 –233, 2009. doi: DOI:10.1016/j.econmod.2008.06.014. URLhttp://www.sciencedirect.com/science/article/B6VB1-4TC8J5F-1/2/7f22da17478841ac5d7a77d06f13d13e.
M. Russinovich. PsTools v2.44, 2009. available at MicrosoftTechNet, http://technet.microsoft.com/en-us/sysinternals/bb896649.aspx.