Chapter 31. Parallel Processing

The following sections describe the parallel-processing features of FLUENT.

• Section 31.1: Introduction to Parallel Processing

• Section 31.2: Starting Parallel FLUENT on a Windows System

• Section 31.3: Starting Parallel FLUENT on a Linux/UNIX System

• Section 31.4: Checking Network Connectivity

• Section 31.5: Partitioning the Grid

• Section 31.6: Checking and Improving Parallel Performance

31.1 Introduction to Parallel Processing

The FLUENT serial solver manages file input and output, data storage, and flow field calculations using a single solver process on a single computer. FLUENT's parallel solver allows you to compute a solution by using multiple processes that may be executing on the same computer, or on different computers in a network. Figures 31.1.1 and 31.1.2 illustrate the serial and parallel FLUENT architectures.

Parallel processing in FLUENT involves an interaction between FLUENT, a host process, and a set of compute-node processes. FLUENT interacts with the host process and the collection of compute nodes using a utility called cortex that manages FLUENT's user interface and basic graphical functions.

Parallel FLUENT splits up the grid and data into multiple partitions, then assigns each grid partition to a different compute process (or node). The number of partitions is an integral multiple of the number of compute nodes available to you (e.g., 8 partitions for 1, 2, 4, or 8 compute nodes). The compute-node processes can be executed on a massively-parallel computer, a multiple-CPU workstation, or a network cluster of computers.
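The divisibility requirement above can be checked quickly from a shell prompt. This sketch (the partition count 8 is just the example number from the text) lists the compute-node counts that divide an 8-partition grid evenly:

```shell
# For a grid split into 8 partitions, valid compute-node counts are the
# divisors of 8 (each node must receive a whole number of partitions).
for n in 1 2 3 4 5 6 7 8; do
  if [ $((8 % n)) -eq 0 ]; then
    echo "$n compute nodes -> $((8 / n)) partitions per node"
  fi
done
```

Running it prints only the 1-, 2-, 4-, and 8-node cases, matching the example above.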

i In general, as the number of compute nodes increases, turnaround time for the solution will decrease. However, parallel efficiency decreases as the ratio of communication to computation increases, so you should be careful to choose a large enough problem for the parallel machine.

FLUENT uses a host process that does not contain any grid data. Instead, the host process only interprets commands from FLUENT's graphics-related interface, cortex.

© Fluent Inc. September 29, 2006 31-1


Figure 31.1.1: Serial FLUENT Architecture [diagram: CORTEX linked to a single Solver process that holds the cell, face, and node data and performs file input/output to disk]

Figure 31.1.2: Parallel FLUENT Architecture [diagram: CORTEX linked to the HOST process, which connects through a socket to Compute Node 0; Compute Nodes 0-3 each run FLUENT with an MPI layer and hold their own cell, face, and node data; file input/output to disk goes through the host]


The host distributes those commands to the other compute nodes via a socket interconnect to a single designated compute node called compute-node-0. This specialized compute node distributes the host commands to the other compute nodes. Each compute node simultaneously executes the same program on its own data set. Communication from the compute nodes to the host is possible only through compute-node-0 and only when all compute nodes have synchronized with each other.

Each compute node is virtually connected to every other compute node, and relies on inter-process communication to perform such functions as sending and receiving arrays, synchronizing, and performing global operations (such as summations over all cells). Inter-process communication is managed by a message-passing library. For example, the message-passing library could be a vendor implementation of the Message Passing Interface (MPI) standard, as depicted in Figure 31.1.2.

All of the parallel FLUENT processes (as well as the serial process) are identified by a unique integer ID. The host collects messages from compute-node-0 and performs operations (such as printing, displaying messages, and writing to a file) on all of the data, in the same way as the serial solver.

Recommended Usage of Parallel FLUENT

The recommended procedure for using parallel FLUENT is as follows:

1. Start up the parallel solver. See Section 31.2: Starting Parallel FLUENT on a Windows System and Section 31.3: Starting Parallel FLUENT on a Linux/UNIX System for details.

2. Read your case file and have FLUENT partition the grid automatically upon loading it. It is best to partition after the problem is set up, since partitioning has some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

Note that there are other approaches for partitioning, including manual partitioning in either the serial or the parallel solver. See Section 31.5: Partitioning the Grid for details.

3. Review the partitions and perform partitioning again, if necessary. See Section 31.5.6: Checking the Partitions for details on checking your partitions.

4. Calculate a solution. See Section 31.6: Checking and Improving Parallel Performance for information on checking and improving the parallel performance.


31.2 Starting Parallel FLUENT on a Windows System

You can run FLUENT on a Windows system using either command line options or the graphical user interface.

Information about starting FLUENT on a Windows system is provided in the following sections:

• Section 31.2.1: Starting Parallel FLUENT on a Windows System Using Command Line Options

• Section 31.2.2: Starting Parallel FLUENT on a Windows System Using the Graphical User Interface

• Section 31.2.3: Starting Parallel FLUENT with the Fluent Launcher

• Section 31.2.4: Starting Parallel FLUENT with the Microsoft Job Scheduler (win64 Only)

i See the separate installation instructions for more information about installing parallel FLUENT for Windows. The startup instructions below assume that you have properly set up the necessary software, based on the appropriate installation instructions.

Additional information about installation issues can also be found in the Frequently Asked Questions section of the Fluent Inc. User Services Center (www.fluentusers.com).

31.2.1 Starting Parallel FLUENT on a Windows System Using Command Line Options

To start the parallel version of FLUENT using command line options, you can use the following syntax in a Command Prompt window:

fluent version -tnprocs [-pinterconnect] [-mpi=mpi_type] -cnf=hosts_file -path\\computer_name\share_name

where

• version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp, or 3ddp).

• -path\\computer_name\share_name specifies the computer name and the shared network name for the Fluent.Inc directory in UNC form.

For example, if FLUENT has been installed on computer1 and shared as fluent.inc, then you should replace share_name by the UNC name for the shared directory, \\computer1\fluent.inc.


• -pinterconnect (optional) specifies the type of interconnect. The ethernet interconnect is used by default if the option is not explicitly specified. See Table 31.2.1, Table 31.2.2, and Table 31.2.3 for more information.

• -mpi=mpi_type (optional) specifies the type of MPI. If the option is not specified, the default MPI for the given interconnect will be used (the use of the default MPI is recommended). The available MPIs for Windows are shown in Table 31.2.2.

• -cnf=hosts_file specifies the hosts file, which contains a list of the computers on which you want to run the parallel job. If the hosts file is not located in the directory where you are typing the startup command, you will need to supply the full pathname to the file.

You can use a plain text editor such as Notepad to create the hosts file. The only restriction on the filename is that there should be no spaces in it. For example, hosts.txt is an acceptable hosts file name, but my hosts.txt is not.

Your hosts file (e.g., hosts.txt) might contain the following entries:

computer1

computer2

i The last entry must be followed by a blank line.

If a computer in the network is a multiprocessor, you can list it more than once.For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, thehosts.txt file should list computer1 twice:

computer1

computer1

computer2

• -tnprocs specifies the number of processes to use. When the -cnf option is present, the hosts file argument is used to determine which computers to use for the parallel job. For example, if there are 8 computers listed in the hosts file and you want to run a job with 4 processes, set nprocs to 4 (i.e., -t4) and FLUENT will use the first 4 machines listed in the hosts file.

For example, the full command line to start a 3d parallel job on the first 4 computers listed in a hosts file called hosts.txt is as follows:

fluent 3d -t4 -cnf=hosts.txt -path\\computer1\fluent.inc


The default interconnect (ethernet) and the default communication library (mpich2) will be used since these options are not specified.

i The first time that you try to run FLUENT in parallel, a separate Command Prompt will open prompting you to verify the current Windows account that you are logged into. Press the <Enter> key if the account is correct. If you have a new account password, enter your password and press the <Enter> key, then verify your password and press the <Enter> key. Once the username and password have been verified and encrypted into the Windows Registry, FLUENT parallel will launch.

The supported interconnects for dedicated parallel ntx86 and win64 Windows machines, the associated MPIs for them, and the corresponding syntax are listed in Tables 31.2.1-31.2.3:

Table 31.2.1: Supported Interconnects for the Windows Platform

Platform   Processor   Architecture   Interconnects*
Windows    32-bit      ntx86          ethernet (default)
Windows    64-bit      win64          ethernet (default), infiniband, myrinet

(*) Node processes on the same machine communicate by shared memory.

Table 31.2.2: Available MPIs for Windows Platforms

MPI      Syntax (flag)   Communication Library   Notes
mpich2   -mpi=mpich2     MPICH2 MPI              (1), (2)
ms       -mpi=ms         Microsoft MPI           (1), (2)
net      -mpi=net        socket                  (1), (2)

(1) Used with a Shared Memory Machine (SMM), where the memory is shared between the processors on a single machine.
(2) Used with a Distributed Memory Machine (DMM), where each processor has its own memory associated with it.


Table 31.2.3: Supported MPIs for Windows Architectures (Per Interconnect)

Architecture   Ethernet                    Myrinet   Infiniband
ntx86          mpich2 (default), net       -         -
win64          mpich2 (default), ms, net   ms        ms

31.2.2 Starting Parallel FLUENT on a Windows System Using the Graphical User Interface

To run parallel FLUENT using the graphical user interface, type the usual startup command without a version (i.e., fluent -t2), and then use the Select Solver panel (Figure 31.2.1) to specify the parallel architecture and version information.

File −→Run...

Figure 31.2.1: The Select Solver Panel


Perform the following actions:

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the interconnect or system in the Interconnect drop-down list. The Default setting is recommended, because it selects the interconnect that should provide the best overall parallel performance for your dedicated parallel machine. For a symmetric multi-processor (SMP) system, the Default setting uses shared memory for communication.

If you prefer to select a specific interconnect, you can choose Ethernet/Shared Memory MPI, Myrinet, Infiniband, or Ethernet via sockets. For more information about these interconnects, see Table 31.2.1, Table 31.2.2, and Table 31.2.3.

3. Set the number of CPUs in the Processes field.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field.

5. Click the Run button to start the parallel version. No additional setup is required once the solver starts.

i The first time that you try to run FLUENT in parallel, a separate Command Prompt will open prompting you to verify the current Windows account that you are logged into. Press the <Enter> key if the account is correct. If you have a new account password, enter your password and press the <Enter> key, then verify your password and press the <Enter> key. Once the username and password have been verified and encrypted into the Windows Registry, FLUENT parallel will launch.


31.2.3 Starting Parallel FLUENT with the Fluent Launcher

The Fluent Launcher (Figure 31.2.2) is a stand-alone Windows application that allows you to launch FLUENT jobs from a computer with a Windows operating system to a cluster of computers. Settings made in the Fluent Launcher panel (Figure 31.2.2) are used to create a FLUENT parallel command. This command will then be distributed to your network, where typically another application may manage the session(s).

You can create a shortcut on your desktop pointing to the Fluent Launcher executable at

FLUENT_INC\fluent6.x\launcher\launcher.exe

where FLUENT_INC is the root path to where FLUENT is installed (i.e., usually the FLUENT_INC environment variable) and x indicates the release version of FLUENT.

Figure 31.2.2: The Fluent Launcher Panel


The Fluent Launcher allows you to perform the following:

• Set options for your FLUENT executable, such as indicating a specific release or a version number.

• Set parallel options, such as indicating the number of parallel processes (or if you want to run a serial process), and an MPI type to use for parallel computations.

• Set additional options, such as specifying the name and location of the current working folder or a journal file.

When you are ready to launch your serial or parallel application, you can check the validity of the settings using the Check button (messages are displayed in the Log Information window). When you are satisfied with the settings, click the Launch button to start the parallel processes.

To return to your default settings for the Fluent Launcher, based on your current FLUENT installation, click the Default button. The fields in the Fluent Launcher panel will return to their original settings.

When you are finished using the Fluent Launcher, click the Close button. Any settings that you have made in the panel are preserved when you re-open the Fluent Launcher.


Setting the Path to FLUENT

You need to specify the location where FLUENT is installed on your system using the Fluent.Inc Path field, or click ... to browse through your directory structure to locate the installation folder (using the UNC path if applicable). Once set, various fields in the Fluent Launcher (e.g., Release, MPI Types, etc.) are automatically populated with the available options, depending on the FLUENT installations that are available.

Setting Executable Options with the Fluent Launcher

Under Executable Options, you can indicate the release number, as well as the version of the FLUENT executable that you want to run.

Specifying a FLUENT Release

Depending on what FLUENT releases are available in the Fluent.Inc Path, you can specify the number associated with a given release in the Release list. The list is populated with the FLUENT release numbers that are available in the Fluent.Inc Path field.

Specifying the Version of FLUENT

You can specify the dimensionality and the precision of the FLUENT product using the Version list. There are four possible choices: 2d, 2ddp, 3d, or 3ddp. The 2d and 3d options provide single-precision results for two-dimensional or three-dimensional problems, respectively. The 2ddp and 3ddp options provide double-precision results for two-dimensional or three-dimensional problems, respectively.

Setting Parallel Options with the Fluent Launcher

Under Parallel Options, you can indicate the number of FLUENT processes, the specific computer architecture you want to run the processes on, the type of MPI, as well as a listing of computer nodes that you want to use in the calculations.

Specifying the Number of FLUENT Processes

You can specify the number of FLUENT processes in the Number of Processes field. You can use the drop-down list to select from pre-set values of serial, 1, 2, 4, 8, 16, 32, or 64, or you can manually enter the number into the field yourself (e.g., 3, 10, etc.). The number of parallel processes can range from 1 to 1024. If Number of Processes is equal to 1, you might want to consider running the FLUENT job using the serial setting.


Specifying the Computer Architecture

You can specify the computer architecture using the Architecture drop-down list. Depending on the selected release, the available options are ntx86 and win64.

Specifying the MPI Type

You can specify the MPI to use for the parallel computations using the MPI Types field. The list of MPI types varies depending on the selected release and the selected architecture. There are several options, based on the operating system of the parallel cluster. For more information about the available MPI types, see Tables 31.2.1-31.2.2.

Specifying the List of Machines to Run FLUENT

Specify the hosts file using the Machine List or File field. You can use the ... button to browse for a hosts file, or you can enter the machine names directly into the text field. Machine names can be separated either by a comma or a space.

Setting Additional Options with the Fluent Launcher

Under Additional Options, you can specify a working folder and/or a journal file. In addition, you can specify whether to use the Microsoft Scheduler or whether to use benchmarking options.

Specifying the Working Folder

You can specify the path of your current working directory using the Working Folder field or click ... to browse through your directory structure. Note that a UNC path cannot be set as a working folder.

Specifying a Journal File

You can specify the path and name of a journal file using the Journal File field or click ... to browse through your directory structure to locate the file. Using the journal file, you can automatically load the case, compile any user-defined functions, iterate until the solution converges, and write results to an output file.
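For illustration, a minimal journal might look like the sketch below. The file and case names (run.jou, mycase.cas) are hypothetical, and the exact text-interface commands are assumptions that depend on your installation and model setup:

```
; run.jou -- hypothetical journal: load a case, iterate, write results
/file/read-case mycase.cas
/solve/iterate 200
/file/write-data mycase.dat
exit
```

A file like this can be selected in the Journal File field, or typically passed on the command line when FLUENT starts (e.g., with the -i option).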


Specifying Whether or Not to Use the Microsoft Job Scheduler (win64 MS MPI Only)

For the Windows 64-bit MS MPI only, you can specify that you want to use the Microsoft Job Scheduler (see Section 31.2.4: Starting Parallel FLUENT with the Microsoft Job Scheduler (win64 Only)) by selecting the Use Microsoft Scheduler check box. Once selected, you can then enter a machine name in the with Head Node text field. If you are running FLUENT on the head node, then you can keep the field empty. This option translates into the proper parallel command line syntax for using the Microsoft Job Scheduler.

Specifying Whether or Not to Use the Fluent Launcher for Benchmarking

If you are creating benchmark cases using parallel FLUENT, you can enable the Benchmark check box. This option involves having several benchmarking-related files available on your machine. If you are missing any of the files, the Fluent Launcher informs you of which files you need and how to locate them.

Fluent Launcher Example

The Fluent Launcher takes the options that you have specified and uses those settings to create a FLUENT parallel command. This command (displayed in the Log Information window) will then be distributed to your network, where typically another application may manage the session(s).

For example, suppose that in the Fluent Launcher panel you specified your Fluent.Inc Path to be \\my_computer\Fluent.Inc and, under Executable Options, selected 6.3.20 for the Release and 3d for the Version. Then, under Parallel Options, you selected 2 for the Number of Processes, win64 for the Architecture, selected mpich2 in the MPI Types field, and entered the location of a Z:\fluent.hosts file in the Machine List or File field. If you click the Check button, the command is displayed in the Log Information window. When you click the Launch button, the Fluent Launcher generates the following parallel command:

\\my_computer\Fluent.Inc\ntbin\win64\fluent.exe 3d -r6.3.20 -t2 -mpi=mpich2 -cnf=Z:\fluent.hosts -awin64


31.2.4 Starting Parallel FLUENT with the Microsoft Job Scheduler (win64 Only)

The Microsoft Job Scheduler allows you to manage multiple jobs and tasks, allocate computer resources, send tasks to compute nodes, and monitor jobs, tasks, and compute nodes.

FLUENT currently supports Windows XP as well as the Windows Server operating system (win64 only). The Windows Server operating system includes a “compute cluster package” (CCP) that combines the Microsoft MPI type (msmpi) and the Microsoft Job Scheduler. FLUENT provides a means of using the Microsoft Job Scheduler using the following flag in the parallel command:

-ccp head-node-name

where -ccp indicates the use of the compute cluster package, and head-node-name indicates the name of the head node of the computer cluster.

For example, if you want to use the Job Scheduler, the corresponding command syntax would be:

fluent 3d -t2 -ccp head-node-name

Likewise, if you do not want to use the Job Scheduler, the following command syntax can be used with msmpi:

fluent 3d -t2 -pmsmpi -cnf=host

i The first time that you try to run FLUENT in parallel, a separate Command Prompt will open prompting you to verify the current Windows account that you are logged into. If you have a new account password, enter your password and press the <Enter> key. If you want FLUENT to remember your password on this machine, press the Y key and press the <Enter> key. Once the username and password have been verified and encrypted into the Windows Registry, FLUENT parallel will launch.

i If you do not want to use the Microsoft Job Scheduler, but you still want to use msmpi, you will need to stop the Microsoft Compute Cluster MPI Service through the Control Panel, and you need to start your own version of SMPD (the process manager for msmpi on Windows) using the following command on each host on which you want to run FLUENT:

start smpd -d 0


31.3 Starting Parallel FLUENT on a Linux/UNIX System

You can run FLUENT on a Linux/UNIX system using either command line options or the graphical user interface.

Information about starting FLUENT on a Linux/UNIX system is provided in the following sections:

• Section 31.3.1: Starting Parallel FLUENT on a Linux/UNIX System Using Command Line Options

• Section 31.3.2: Starting Parallel FLUENT on a Linux/UNIX System Using the Graphical User Interface

• Section 31.3.3: Setting Up Your Remote Shell and Secure Shell Clients

31.3.1 Starting Parallel FLUENT on a Linux/UNIX System Using Command Line Options

To start the parallel version of FLUENT using command line options, you can use the following syntax in a command prompt window:

fluent version -tnprocs [-pinterconnect] [-mpi=mpi_type] -cnf=hosts_file

where

• version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp, or 3ddp).

• -pinterconnect (optional) specifies the type of interconnect. The ethernet interconnect is used by default if the option is not explicitly specified. See Table 31.3.1, Table 31.3.2, and Table 31.3.3 for more information.

• -mpi=mpi_type (optional) specifies the type of MPI. If the option is not specified, the default MPI for the given interconnect will be used (the use of the default MPI is recommended). The available MPIs for Linux/UNIX are shown in Table 31.3.2.

• -cnf=hosts_file specifies the hosts file, which contains a list of the computers on which you want to run the parallel job. If the hosts file is not located in the directory where you are typing the startup command, you will need to supply the full pathname to the file.

You can use a plain text editor to create the hosts file. The only restriction on the filename is that there should be no spaces in it. For example, hosts.txt is an acceptable hosts file name, but my hosts.txt is not.


Your hosts file (e.g., hosts.txt) might contain the following entries:

computer1

computer2

i The last entry must be followed by a blank line.

If a computer in the network is a multiprocessor, you can list it more than once. For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, the hosts.txt file should list computer1 twice:

computer1

computer1

computer2

• -tnprocs specifies the number of processes to use. When the -cnf option is present, the hosts file argument is used to determine which computers to use for the parallel job. For example, if there are 10 computers listed in the hosts file and you want to run a job with 5 processes, set nprocs to 5 (i.e., -t5) and FLUENT will use the first 5 machines listed in the hosts file.

For example, to use the Myrinet interconnect, and to start the 3D solver with 4 compute nodes on the machines defined in the text file called fluent.hosts, you can enter the following in the command prompt:

fluent 3d -t4 -pmyrinet -cnf=fluent.hosts

Note that if the optional -cnf=hosts_file is specified, a compute node will be spawned on each machine listed in the file hosts_file. (If you enter this optional argument, do not include the square brackets.)
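As a concrete sketch, a hosts file like the one in the example above can be created and previewed from the shell. The machine names are the illustrative ones used in this section, and head -4 only mimics FLUENT's selection rule (first four entries for -t4):

```shell
# Build fluent.hosts: computer1 is listed twice for its 2 CPUs, and the
# file ends with the required blank line.
printf 'computer1\ncomputer1\ncomputer2\ncomputer3\n\n' > fluent.hosts
# Preview the machines a "-t4 -cnf=fluent.hosts" run would use
# (FLUENT takes the first 4 entries; head -4 imitates that selection).
head -4 fluent.hosts
```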

The supported interconnects for parallel Linux/UNIX machines are listed below (Table 31.3.1, Table 31.3.2, and Table 31.3.3), along with their associated communication libraries, the corresponding syntax, and the supported architectures:


Table 31.3.1: Supported Interconnects for Linux/UNIX Platforms (Per Platform)

Platform   Processor        Architecture      Interconnects/Systems*
Linux      32-bit           lnx86             ethernet (default), infiniband, myrinet
Linux      64-bit           lnamd64           ethernet (default), infiniband, myrinet, crayx
Linux      64-bit Itanium   lnia64            ethernet (default), infiniband, myrinet, altix
Sun        32-bit           ultra             vendor** (default), ethernet
Sun        64-bit           ultra_64          vendor** (default), ethernet
SGI        32-bit           irix65_mips4      vendor** (default), ethernet
SGI        64-bit           irix65_mips4_64   vendor** (default), ethernet
HP         32-bit           hpux11            vendor** (default), ethernet
HP         64-bit PA-RISC   hpux11_64         vendor** (default), ethernet
HP         64-bit Itanium   hpux11_ia64       vendor** (default), ethernet
IBM        32-bit           aix51             vendor** (default), ethernet
IBM        64-bit           aix51_64          vendor** (default), ethernet

(*) Node processes on the same machine communicate by shared memory.
(**) vendor indicates a proprietary vendor interconnect. The specific proprietary interconnects that are supported are dictated by those that the vendor's MPI supports.


Table 31.3.2: Available MPIs for Linux/UNIX Platforms

MPI       Syntax (flag)   Communication Library   Notes
hp        -mpi=hp         HP MPI                  General purpose for SMPs and clusters
intel     -mpi=intel      Intel MPI               General purpose for SMPs and clusters
mpich2    -mpi=mpich2     MPICH2                  MPI-2 implementation from Argonne National Laboratory; for both SMPs and Ethernet clusters
mpich     -mpi=mpich      MPICH1                  Legacy MPI from Argonne National Laboratory
mpichmx   -mpi=mpichmx    MPICH-MX                Only for Myrinet MX clusters
mvapich   -mpi=mvapich    MVAPICH                 Only for Infiniband clusters
sgi       -mpi=sgi        SGI MPI for Altix       Only for SGI Altix systems (SMP); must start FLUENT on a system where parallel node processes are to run
cray      -mpi=cray       Cray MPI for XD1        Only for Cray XD1 systems
vendor    -mpi=vendor     Vendor MPI              -
net       -mpi=net        socket                  -


Table 31.3.3: Supported MPIs for Linux/UNIX Architectures (Per Interconnect)

Architecture      Ethernet                       Myrinet                 Infiniband                     Proprietary Systems
lnx86             hp (default), mpich2, net      hp                      hp                             -
lnamd64           hp (default), intel, net       hp (default), mpich-mx  hp (default), intel, mvapich   cray [for -pcrayx]
lnia64            hp (default), intel, net       hp                      hp (default), intel            sgi [for -paltix]
aix51_64          vendor (default), mpich, net   -                       -                              vendor [for -pvendor]
hpux11_64         vendor (default), net          -                       -                              vendor [for -pvendor]
hpux11_ia64       vendor (default), net          -                       -                              vendor [for -pvendor]
irix65_mips4_64   vendor (default), mpich, net   -                       -                              vendor [for -pvendor]
ultra_64          vendor (default), mpich, net   -                       -                              vendor [for -pvendor]
aix51             vendor (default), mpich, net   -                       -                              vendor [for -pvendor]
hpux11            vendor (default), mpich, net   -                       -                              vendor [for -pvendor]
irix65_mips4      vendor (default), mpich, net   -                       -                              vendor [for -pvendor]
ultra             vendor (default), mpich, net   -                       -                              vendor [for -pvendor]


31.3.2 Starting Parallel FLUENT on a Linux/UNIX System Using the Graphical User Interface

To run parallel FLUENT using the graphical user interface, type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 31.3.1) to specify the parallel architecture and version information.

File −→Run...

Figure 31.3.1: The Select Solver Panel


Perform the following steps:

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the interconnect or system in the Interconnect drop-down list. The Default setting is recommended, because it selects the interconnect that should provide the best overall parallel performance for your dedicated parallel machine. For a symmetric multiprocessor (SMP) system, the Default setting uses shared memory for communication.

If you prefer to select a specific interconnect, you can choose Ethernet/Shared Memory MPI, Myrinet, Infiniband, Altix, Cray, or Ethernet via sockets. For more information about these interconnects, see Table 31.3.1, Table 31.3.2, and Table 31.3.3.

3. Set the number of CPUs in the Processes field.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field.

5. Click the Run button to start the parallel version. No additional setup is required once the solver starts.

31.3.3 Setting Up Your Remote Shell and Secure Shell Clients

For cluster computing on Linux or UNIX systems, most parallel versions of FLUENT require your user account to be set up such that you can connect to all nodes on the cluster (using either the remote shell (rsh) client or the secure shell (ssh) client) without having to enter a password each time for each machine.

This section briefly describes how you can configure your system to use FLUENT for parallel computing, provided that the appropriate server daemons (either rshd or sshd) are running.


Configuring the rsh Client

The remote shell client (rsh) is widely deployed and used. It is generally easy to configure: add all the machine names, each on its own line, to the .rhosts file in your home directory.

If you refer to the machine you are currently logged on to as the ‘client’, and to the remote machine to which you seek password-less login as the ‘server’, then on the server you can add the name of your client machine to the .rhosts file. The name can be a local name or a fully qualified name with the domain suffix. Similarly, you can add other clients from which you require similar access to this server. These machines are then “trusted” and remote access is allowed without the further need for a password. This setup assumes you have the same userid on all the machines. Otherwise, each line in the .rhosts file needs to contain the machine name as well as the userid of the client from which you want access. Please refer to your system documentation for further usage options.

Note that for security purposes, the .rhosts file must be readable only by the user.

Configuring the ssh Client

The secure shell client (ssh) is a more secure alternative to rsh and is also widely used. Depending on the specific protocol and version deployed, configuration involves a few steps. SSH1 and SSH2 are the two current protocols. OpenSSH is an open implementation of the SSH2 protocol and is backwards compatible with the SSH1 protocol. To add a client machine, the following user-configuration steps are involved:

1. Generate a public-private key pair using ssh-keygen (or a graphical user interface client). For example:

% ssh-keygen -t dsa

creates a Digital Signature Algorithm (DSA) key pair.

2. Place your public key on the remote host.

• For SSH1, insert the contents of the client’s ~/.ssh/identity.pub into the server’s ~/.ssh/authorized_keys.

• For SSH2, insert the contents of the client’s ~/.ssh/id_dsa.pub into the server’s ~/.ssh/authorized_keys2.

The client machine is now added to the access list, and the user is no longer required to type in a password each time. For additional information, consult your system administrator or refer to your system documentation.
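The key-copy step for SSH2 can be scripted. The sketch below is a hypothetical helper, not part of FLUENT or OpenSSH; the file names ~/.ssh/id_dsa.pub and ~/.ssh/authorized_keys2 are taken from the steps above, and the strict permissions reflect the general requirement (noted for .rhosts as well) that these files not be readable by others.

```python
from pathlib import Path

def install_public_key(pub_key_file: Path, authorized_keys: Path) -> bool:
    """Append a public key to an authorized_keys file, skipping duplicates.

    Hypothetical helper; for SSH2/OpenSSH the usual files are
    ~/.ssh/id_dsa.pub (client side) and ~/.ssh/authorized_keys2 (server side).
    """
    key = pub_key_file.read_text().strip()
    # ensure the .ssh directory exists and is private to the user
    authorized_keys.parent.mkdir(mode=0o700, exist_ok=True)
    existing = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in existing:
        return False  # key already installed; nothing to do
    with authorized_keys.open("a") as f:
        f.write(key + "\n")
    # sshd typically refuses key files that others can read or write
    authorized_keys.chmod(0o600)
    return True
```

Running it a second time with the same key is a no-op, so it is safe to use once per client machine when setting up a cluster.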


31.4 Checking Network Connectivity

For any compute node, you can print network connectivity information that includes the hostname, architecture, process ID, and ID of the selected compute node and all machines connected to it. The ID of the selected compute node is marked with an asterisk.

The ID for the FLUENT host process is always host. The compute nodes are numbered sequentially starting from node-0. All compute nodes are completely connected. In addition, compute node 0 is connected to the host process.

To obtain connectivity information for a compute node, you can use the Parallel Connectivity panel (Figure 31.4.1).

Parallel −→Show Connectivity...

Figure 31.4.1: The Parallel Connectivity Panel

Indicate the compute node ID for which connectivity information is desired in the Compute Node field, and then click the Print button. Sample output for compute node 0 is shown below:

------------------------------------------------------------------------------
ID    Comm.  Hostname  O.S.      PID    Mach ID  HW ID  Name
------------------------------------------------------------------------------
host  net    balin     Linux-32  17272  0        7      Fluent Host
n3    hp     balin     Linux-32  17307  1        10     Fluent Node
n2    hp     filio     Linux-32  17306  0        -1     Fluent Node
n1    hp     bofur     Linux-32  17305  0        1      Fluent Node
n0*   hp     balin     Linux-32  17273  2        11     Fluent Node

O.S. is the architecture, Comm. is the communication library (i.e., MPI type), PID is the process ID number, Mach ID is the compute node ID, and HW ID is an identifier specific to the interconnect used.
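When driving FLUENT from scripts, a listing like the sample above can be turned into structured records. The parser below is a hypothetical convenience, not part of FLUENT; it simply assumes the column order shown in the sample output (ID, Comm., Hostname, O.S., PID, Mach ID, HW ID, Name).

```python
import re

def parse_connectivity(report: str):
    """Parse rows of a FLUENT connectivity listing (illustrative sketch).

    Assumes the fixed column order shown in the sample output; the
    selected compute node is the one whose ID carries a trailing '*'.
    """
    rows = []
    for line in report.splitlines():
        m = re.match(
            r"\s*(host|n\d+\*?)\s+(\S+)\s+(\S+)\s+(\S+)"
            r"\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+(.+)",
            line)
        if m:
            node_id, comm, host, os_, pid, mach, hw, name = m.groups()
            rows.append({
                "id": node_id.rstrip("*"),
                "selected": node_id.endswith("*"),
                "comm": comm, "hostname": host, "os": os_,
                "pid": int(pid), "mach_id": int(mach), "hw_id": int(hw),
                "name": name.strip(),
            })
    return rows
```

Separator and header lines do not match the row pattern, so they are skipped automatically.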


31.5 Partitioning the Grid

Information about grid partitioning is provided in the following sections:

• Section 31.5.1: Overview of Grid Partitioning

• Section 31.5.2: Preparing Hexcore Meshes for Partitioning

• Section 31.5.3: Partitioning the Grid Automatically

• Section 31.5.4: Partitioning the Grid Manually

• Section 31.5.5: Grid Partitioning Methods

• Section 31.5.6: Checking the Partitions

• Section 31.5.7: Load Distribution

31.5.1 Overview of Grid Partitioning

When you use the parallel solver in FLUENT, you need to partition or subdivide the grid into groups of cells that can be solved on separate processors (see Figure 31.5.1). You can either use the automatic partitioning algorithms when reading an unpartitioned grid into the parallel solver (the recommended approach, described in Section 31.5.3: Partitioning the Grid Automatically), or perform the partitioning yourself in the serial solver or after reading a mesh into the parallel solver (as described in Section 31.5.4: Partitioning the Grid Manually). In either case, the available partitioning methods are those described in Section 31.5.5: Grid Partitioning Methods. You can partition the grid before or after you set up the problem (by defining models, boundary conditions, etc.), although it is better to partition after the setup, due to some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

i If your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver. See Sections 31.5.3 and 31.5.4 for more information.

i If your case file contains a mesh generated by the GAMBIT Hex Core meshing scheme or the TGrid Mesh/Hexcore menu option (hexcore mesh), you must filter the mesh using the tpoly utility or TGrid prior to partitioning the grid. See Section 31.5.2: Preparing Hexcore Meshes for Partitioning for more information.

Note that the relative distribution of cells among compute nodes will be maintained during grid adaption, except if non-conformal interfaces are present, so repartitioning after adaption is not required. See Section 31.5.7: Load Distribution for more information.


If you use the serial solver to set up the problem before partitioning, the machine on which you perform this task must have enough memory to read in the grid. If your grid is too large to be read into the serial solver, you can read the unpartitioned grid directly into the parallel solver (using the memory available in all the defined hosts) and have it automatically partitioned. In this case you will set up the problem after an initial partition has been made. You will then be able to manually repartition the case if necessary. See Sections 31.5.3 and 31.5.4 for additional details and limitations, and Section 31.5.6: Checking the Partitions for details about checking the partitions.

Figure 31.5.1: Partitioning the Grid (the domain before partitioning; after partitioning, an interface boundary separates Partition 0 and Partition 1)


31.5.2 Preparing Hexcore Meshes for Partitioning

If you generate meshes using the GAMBIT Hex Core meshing scheme or the TGrid Mesh/Hexcore menu option (hexcore meshes), they often have features that can interfere with partitioning. Such features include hanging nodes and overlapping parent-child faces, and are located at the transition between the core of hexahedral cells and the surrounding body-fitted mesh. To remove these features before you partition your hexcore meshes, you must convert the transitional hexahedral cells into polyhedra. The dimensions of each of these transitional cells remain the same after conversion, but these transitional cells will have more than the original 6 faces. The conversion to polyhedra must take place prior to reading the mesh into FLUENT, and can be done using either the tpoly utility or TGrid.

When you use the tpoly utility, you must specify an input case file that contains a hexcore mesh. This file can be in either ASCII or binary format, and should be uncompressed. If the input file does not contain a hexcore mesh, then none of the cells are converted to polyhedra. You should also specify an output case file name. Once the input file has been processed by the tpoly filter, an ASCII output file is generated.

i The output case file resulting from a tpoly conversion contains only mesh information. None of the solver-related data of the input file is retained.

To convert a file using the tpoly filter, before starting FLUENT, type the following:

utility tpoly input filename output filename

You can also use TGrid to convert the transitional cells to polyhedra. You must either read in or create the hexcore mesh in TGrid, and then save the mesh as a case file with polyhedra. To do this, use the File/Write/Case... menu option, being sure to enable the Write As Polyhedra option in the Select File dialog box.

Limitations

Converted hexcore meshes have the following limitations:

• The following grid manipulation tools are not available on polyhedral meshes:

– extrude-face-zone under the modify-zone option

– fuse

– skewness smoothing

– swapping (will not affect polyhedral cells)

• The polyhedral cells that result from the conversion are not eligible for adaption. For more information about adaption, see Chapter 26: Adapting the Grid.


31.5.3 Partitioning the Grid Automatically

For automatic grid partitioning, you can select the bisection method and other options for creating the grid partitions before reading a case file into the parallel version of the solver. For some of the methods, you can perform pretesting to ensure that the best possible partition is performed. See Section 31.5.5: Grid Partitioning Methods for information about the partitioning methods available in FLUENT.

Note that if your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will need to partition it in the serial solver, and then read it into the parallel solver with the Case File option turned on in the Auto Partition Grid panel (the default setting).

The procedure for partitioning automatically in the parallel solver is as follows:

1. (optional) Set the partitioning parameters in the Auto Partition Grid panel (Figure 31.5.2).

Parallel −→Auto Partition...

Figure 31.5.2: The Auto Partition Grid Panel

If you are reading in a mesh file or a case file for which no partition information is available, and you keep the Case File option turned on, FLUENT will partition the grid using the method displayed in the Method drop-down list.

If you want to specify the partitioning method and associated options yourself, the procedure is as follows:

(a) Turn off the Case File option. The other options in the panel will become available.

(b) Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 31.5.5: Bisection Methods.

(c) You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

(d) If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 31.5.5: Pretesting.

(e) Click OK.

If you have a case file where you have already partitioned the grid, and the number of partitions divides evenly into the number of compute nodes, you can keep the default selection of Case File in the Auto Partition Grid panel. This instructs FLUENT to use the partitions in the case file.

2. Read the case file.

File −→ Read −→Case...

Reporting During Auto Partitioning

As the grid is automatically partitioned, some information about the partitioning process will be printed in the text (console) window. If you want additional information, you can print a report from the Partition Grid panel after the partitioning is completed.

Parallel −→Partition...

When you click the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. See Section 31.5.6: Interpreting Partition Statistics for details. You can examine the partitions graphically by following the directions in Section 31.5.6: Checking the Partitions.

31.5.4 Partitioning the Grid Manually

Automatic partitioning in the parallel solver (described in Section 31.5.3: Partitioning the Grid Automatically) is the recommended approach to grid partitioning, but it is also possible to partition the grid manually in either the serial solver or the parallel solver. After automatic or manual partitioning, you will be able to inspect the partitions created (see Section 31.5.6: Checking the Partitions) and optionally repartition the grid, if necessary. Again, you can do so within the serial or the parallel solver, using the Partition Grid panel. A partitioned grid may also be used in the serial solver without any loss in performance.


Guidelines for Partitioning the Grid

The following steps are recommended for partitioning a grid manually:

1. Partition the grid using the default bisection method (Principal Axes) and optimization (Smooth).

2. Examine the partition statistics, which are described in Section 31.5.6: Interpreting Partition Statistics. Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation). If the statistics are not acceptable, try one of the other bisection methods.

3. Once you determine the best bisection method for your problem, you can turn on Pre-Test (see Section 31.5.5: Pretesting) to improve it further, if desired.

4. You can also improve the partitioning using the Merge optimization, if desired.

Instructions for manual partitioning are provided below.

Using the Partition Grid Panel

For grid partitioning, you need to select the bisection method for creating the grid partitions, set the number of partitions, select the zones and/or registers, and choose the optimizations to be used. For some methods, you can also perform pretesting to ensure that the best possible bisection is performed. Once you have set all the parameters in the Partition Grid panel to your satisfaction, click the Partition button to subdivide the grid into the selected number of partitions using the prescribed method and optimization(s). See above for recommended partitioning strategies.

You can set the relevant inputs in the Partition Grid panel (Figure 31.5.3 in the parallel solver, or Figure 31.5.4 in the serial solver) in the following manner:

Parallel −→Partition...

1. Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 31.5.5: Bisection Methods.

2. Set the desired number of grid partitions in the Number integer number field. You can use the counter arrows to increase or decrease the value, instead of typing in the box. The number of grid partitions must be an integral multiple of the number of processors available for parallel computing.


Figure 31.5.3: The Partition Grid Panel in the Parallel Solver

Figure 31.5.4: The Partition Grid Panel in the Serial Solver


3. You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

4. You can select Encapsulate Grid Interfaces if you would like the cells surrounding all non-conformal grid interfaces in your mesh to reside in a single partition at all times during the calculation. If your case file contains non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver, with the Encapsulate Grid Interfaces and Encapsulate for Adaption options turned on.

5. If you have enabled the Encapsulate Grid Interfaces option in the serial solver, the Encapsulate for Adaption option will also be available. When you select this option, additional layers of cells are encapsulated such that transfer of cells will be unnecessary during parallel adaption.

6. You can activate and control the desired optimization methods (described in Section 31.5.5: Optimizations) using the items under Optimizations. You can activate the Merge and Smooth schemes by turning on the Do check button next to each one. For each scheme, you can also set the number of Iterations. Each optimization scheme will be applied until the appropriate criteria are met or the maximum number of iterations has been executed. If the Iterations counter is set to 0, the optimization scheme will be applied until completion, without limit on the number of iterations.

7. If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 31.5.5: Pretesting.

8. In the Zones and/or Registers lists, select the zone(s) and/or register(s) you want to partition. For most cases, you will select all Zones (the default) to partition the entire domain. See below for details.

9. You can assign selected Zones and/or Registers to a specific partition ID by entering a value for Set Selected Zones and Registers to Partition ID. For example, if the Number of partitions for your grid is 2, then you can only use IDs of 0 or 1. If you have three partitions, then you can enter IDs of 0, 1, or 2. This can be useful in situations where the gradient in a region is known to be high. In such cases, you can mark the region or zone and set the marked cells to one of the partition IDs, thus preventing the partition from going through that region. This in turn will facilitate convergence. This is also useful in cases where mesh manipulation tools are not available in parallel: you can assign the related cells to a particular ID so that the grid manipulation tools become functional.

If you are running the parallel solver, and you have marked your region and assigned an ID to the selected Zones and/or Registers, click the Use Stored Partitions button to make the new partitions valid.

Refer to the example described later in this section for a demonstration of how selected registers are assigned to a partition.

10. Click the Partition button to partition the grid.

11. If you decide that the new partitions are better than the previous ones (if the grid was already partitioned), click the Use Stored Partitions button to make the newly stored cell partitions the active cell partitions. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.

12. When using the dynamic mesh model in your parallel simulations, the Partition panel includes an Auto Repartition option and a Repartition Interval setting. These parallel partitioning options are provided because FLUENT migrates cells when local remeshing and smoothing is performed; as a result, the partition interface becomes very wrinkled and the load balance may deteriorate. By default, the Auto Repartition option is selected, and a percentage of interface faces and loads is automatically traced. When this option is selected, FLUENT automatically determines the most appropriate repartition interval based on various simulation parameters. If the Auto Repartition option gives insufficient results, you can use the Repartition Interval setting instead, which lets you specify the interval (in time steps or iterations, respectively) at which a repartition is enforced. When repartitioning is not desired, you can set the Repartition Interval to zero.

i Note that when dynamic meshes and local remeshing are used, updated meshes may be slightly different in parallel FLUENT (when compared to serial FLUENT, or to a parallel solution created with a different number of compute nodes), resulting in very small differences in the solutions.


Example of Setting Selected Registers to Specified Partition IDs

1. Start FLUENT in parallel. The case in this example was partitioned across two nodes.

2. Read in your case.

3. Display the grid with the Partitions option enabled in the Display Grid panel (Figure 31.5.5).

Figure 31.5.5: The Partitioned Grid

4. Adapt your region and mark your cells (see Section 26.7.3: Performing Region Adaption). This creates a register.


5. Open the Partition Grid panel.

6. Keep the Set Selected Zones and Registers to Partition ID set to 0 and click the corresponding button. This prints the following output to the FLUENT console window:

>> 2 Active Partitions:
----------------------------------------------------------------------
Collective Partition Statistics:     Minimum   Maximum   Total
----------------------------------------------------------------------
Cell count                               459       459     918
Mean cell count deviation               0.0%      0.0%
Partition boundary cell count             11        11      22
Partition boundary cell count ratio     2.4%      2.4%    2.4%

Face count                               764      1714    2461
Mean face count deviation             -38.3%     38.3%
Partition boundary face count             13        13      17
Partition boundary face count ratio     0.8%      1.7%    0.7%

Partition neighbor count                   1         1
----------------------------------------------------------------------
Partition Method                     Principal Axes
Stored Partition Count               2

Done.


7. Click the Use Stored Partitions button to make the new partitions valid. This migrates the partitions to the compute-nodes. The following output is then printed to the FLUENT console window:

Migrating partitions to compute-nodes.
>> 2 Active Partitions:

P   Cells  I-Cells  Cell Ratio  Faces  I-Faces  Face Ratio  Neighbors
0     672       24       0.036   2085       29       0.014          1
1     246       24       0.098    425       29       0.068          1

----------------------------------------------------------------------
Collective Partition Statistics:     Minimum   Maximum   Total
----------------------------------------------------------------------
Cell count                               246       672     918
Mean cell count deviation             -46.4%     46.4%
Partition boundary cell count             24        24      48
Partition boundary cell count ratio     3.6%      9.8%    5.2%

Face count                               425      2085    2461
Mean face count deviation             -66.1%     66.1%
Partition boundary face count             29        29      49
Partition boundary face count ratio     1.4%      6.8%    2.0%

Partition neighbor count                   1         1
----------------------------------------------------------------------
Partition Method                     Principal Axes
Stored Partition Count               2

Done.

8. Display the grid (Figure 31.5.6).

9. This time, set the Set Selected Zones and Registers to Partition ID to 1 and click the corresponding button. This prints a report to the FLUENT console.

10. Click the Use Stored Partitions button to make the new partitions valid and to migrate the partitions to the compute-nodes.

11. Display the grid (Figure 31.5.7). Notice that the partition now appears in a different location, as specified by your partition ID.

i Although this example demonstrates setting selected registers to specific partition IDs in parallel, it can be similarly applied in serial.


Figure 31.5.6: The Partition ID Set to 0

Figure 31.5.7: The Partition ID Set to 1


Partitioning Within Zones or Registers

The ability to restrict partitioning to cell zones or registers gives you the flexibility to apply different partitioning strategies to subregions of a domain. For example, if your geometry consists of a cylindrical plenum connected to a rectangular duct, you may want to partition the plenum using the Cylindrical Axes method, and the duct using the Cartesian Axes method.

If the plenum and the duct are contained in two different cell zones, you can select one at a time and perform the desired partitioning, as described in Section 31.5.4: Using the Partition Grid Panel. If they are not in two different cell zones, you can create a cell register (basically a list of cells) for each region using the functions that are used to mark cells for adaption. These functions allow you to mark cells based on physical location, cell volume, gradient or isovalue of a particular variable, and other parameters. See Chapter 26: Adapting the Grid for information about marking cells for adaption. Section 26.11.1: Manipulating Adaption Registers provides information about manipulating different registers to create new ones. Once you have created a register, you can partition within it as described above.

i Note that partitioning within zones or registers is not available when Metis is selected as the partition Method.

For dynamic mesh applications (see item 12 above), FLUENT stores the partition method used to partition the respective zone. Therefore, if repartitioning is done, FLUENT uses the same method that was used to partition the mesh.

Reporting During Partitioning

As the grid is partitioned, information about the partitioning process will be printed in the text (console) window. By default, the solver will print the number of partitions created, the number of bisections performed, the time required for the partitioning, and the minimum and maximum cell, face, interface, and face-ratio variations. (See Section 31.5.6: Interpreting Partition Statistics for details.) If you increase the Verbosity to 2 from the default value of 1, the partition method used, the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each partition will also be printed in the console window. If you decrease the Verbosity to 0, only the number of partitions created and the time required for the partitioning will be reported.

You can request a portion of this report to be printed again after the partitioning is completed. When you click the Print Active Partitions or Print Stored Partitions button in the parallel solver, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. In the serial solver, you will obtain the same information about the stored partition when you click Print Partitions. See Section 31.5.6: Interpreting Partition Statistics for details.

i Recall that to make the stored cell partitions the active cell partitions you must click the Use Stored Partitions button. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.

Resetting the Partition Parameters

If you change your mind about your partition parameter settings, you can easily return to the default settings assigned by FLUENT by clicking the Default button. When you click the Default button, it will become the Reset button. The Reset button allows you to return to the most recently saved settings (i.e., the values that were set before you clicked Default). After execution, the Reset button becomes the Default button again.

31.5.5 Grid Partitioning Methods

Partitioning the grid for parallel processing has three major goals:

• Create partitions with equal numbers of cells.

• Minimize the number of partition interfaces—i.e., decrease partition boundary surface area.

• Minimize the number of partition neighbors.

Balancing the partitions (equalizing the number of cells) ensures that each processorhas an equal load and that the partitions will be ready to communicate at about thesame time. Since communication between partitions can be a relatively time-consumingprocess, minimizing the number of interfaces can reduce the time associated with thisdata interchange. Minimizing the number of partition neighbors reduces the chancesfor network and routing contentions. In addition, minimizing partition neighbors isimportant on machines where the cost of initiating message passing is expensive comparedto the cost of sending longer messages. This is especially true for workstations connectedin a network.

The partitioning schemes in FLUENT use bisection algorithms to create the partitions, but unlike other schemes which require the number of partitions to be a power of two, these schemes have no limitations on the number of partitions. For each available processor, you will create the same number of partitions (i.e., the total number of partitions will be an integral multiple of the number of processors).


Bisection Methods

The grid is partitioned using a bisection algorithm. The selected algorithm is applied to the parent domain, and then recursively applied to the child subdomains. For example, to divide the grid into four partitions, the solver will bisect the entire (parent) domain into two child domains, and then repeat the bisection for each of the child domains, yielding four partitions in total. To divide the grid into three partitions, the solver will “bisect” the parent domain to create two partitions—one approximately twice as large as the other—and then bisect the larger child domain again to create three partitions in total.
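The recursion described above can be sketched as follows. This is an illustrative model, not FLUENT's implementation: cells are reduced to centroid coordinates, the split axis simply alternates (the real methods choose it by extent or other criteria), and the function names are hypothetical.

```python
# Hypothetical sketch of recursive coordinate bisection supporting an
# arbitrary partition count: split the target count (roughly) in half,
# divide the cells proportionally, and recurse on each side.
def bisect(cells, n_parts, axis=0):
    """Recursively split a list of 2D cell centroids into n_parts groups."""
    if n_parts == 1:
        return [cells]
    n_left = n_parts // 2                   # e.g. 3 partitions -> 1 + 2
    cells = sorted(cells, key=lambda c: c[axis])
    cut = len(cells) * n_left // n_parts    # cut proportionally to counts
    next_axis = (axis + 1) % 2              # alternate axes (a simplification)
    return (bisect(cells[:cut], n_left, next_axis) +
            bisect(cells[cut:], n_parts - n_left, next_axis))

cells = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)]
parts = bisect(cells, 3)
print([len(p) for p in parts])  # -> [33, 33, 34]
```

For three partitions this reproduces the behavior described above: the first "bisection" leaves one child about twice the size of the other, and the larger child is then bisected again.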

The grid can be partitioned using one of the algorithms listed below. The most efficient choice is problem-dependent, so you can try different methods until you find the one that is best for your problem. See Section 31.5.4: Guidelines for Partitioning the Grid for recommended partitioning strategies.

Cartesian Axes bisects the domain based on the Cartesian coordinates of the cells (see Figure 31.5.8). It bisects the parent domain and all subsequent child subdomains perpendicular to the coordinate direction with the longest extent of the active domain. It is often referred to as coordinate bisection.

Cartesian Strip uses coordinate bisection but restricts all bisections to the Cartesian direction of longest extent of the parent domain (see Figure 31.5.9). You can often minimize the number of partition neighbors using this approach.

Cartesian X-, Y-, Z-Coordinate bisects the domain based on the selected Cartesian coordinate. It bisects the parent domain and all subsequent child subdomains perpendicular to the specified coordinate direction. (See Figure 31.5.9.)

Cartesian R Axes bisects the domain based on the shortest radial distance from the cell centers to that Cartesian axis (x, y, or z) which produces the smallest interface size. This method is available only in 3D.

Cartesian RX-, RY-, RZ-Coordinate bisects the domain based on the shortest radial distance from the cell centers to the selected Cartesian axis (x, y, or z). These methods are available only in 3D.

Cylindrical Axes bisects the domain based on the cylindrical coordinates of the cells. This method is available only in 3D.

Cylindrical R-, Theta-, Z-Coordinate bisects the domain based on the selected cylindrical coordinate. These methods are available only in 3D.

Metis uses the METIS software package for partitioning irregular graphs, developed by Karypis and Kumar at the University of Minnesota and the Army HPC Research Center. It uses a multilevel approach in which the vertices and edges on the fine graph are coalesced to form a coarse graph. The coarse graph is partitioned, and then uncoarsened back to the original graph. During coarsening and uncoarsening, algorithms are applied to permit high-quality partitions. Detailed information about METIS can be found in its manual [172].

i Note that when using the socket version (-pnet), the METIS partitioner is not available. In this case, METIS partitioning can be obtained using the partition filter, as described below.

i If you create non-conformal interfaces, and generate virtual polygonal faces, your METIS partition can cross non-conformal interfaces by using the connectivity of the virtual polygonal faces. This improves load balancing for the parallel solver and minimizes communication by decreasing the number of partition interface cells.

Polar Axes bisects the domain based on the polar coordinates of the cells (see Figure 31.5.12). This method is available only in 2D.

Polar R-Coordinate, Polar Theta-Coordinate bisects the domain based on the selected polar coordinate (see Figure 31.5.12). These methods are available only in 2D.

Principal Axes bisects the domain based on a coordinate frame aligned with the principal axes of the domain (see Figure 31.5.10). This reduces to Cartesian bisection when the principal axes are aligned with the Cartesian axes. The algorithm is also referred to as moment, inertial, or moment-of-inertia partitioning.

This is the default bisection method in FLUENT.

Principal Strip uses moment bisection but restricts all bisections to the principal axis of longest extent of the parent domain (see Figure 31.5.11). You can often minimize the number of partition neighbors using this approach.

Principal X-, Y-, Z-Coordinate bisects the domain based on the selected principal coordinate (see Figure 31.5.11).

Spherical Axes bisects the domain based on the spherical coordinates of the cells. This method is available only in 3D.

Spherical Rho-, Theta-, Phi-Coordinate bisects the domain based on the selected spherical coordinate. These methods are available only in 3D.


Figure 31.5.8: Partitions Created with the Cartesian Axes Method

Figure 31.5.9: Partitions Created with the Cartesian Strip or Cartesian X-Coordinate Method


Figure 31.5.10: Partitions Created with the Principal Axes Method

Figure 31.5.11: Partitions Created with the Principal Strip or Principal X-Coordinate Method


Figure 31.5.12: Partitions Created with the Polar Axes or Polar Theta-Coordinate Method

Optimizations

Additional optimizations can be applied to improve the quality of the grid partitions. The heuristic of bisecting perpendicular to the direction of longest domain extent is not always the best choice for creating the smallest interface boundary. A “pre-testing” operation (see Section 31.5.5: Pretesting) can be applied to automatically choose the best direction before partitioning. In addition, the following iterative optimization schemes exist:

Smooth attempts to minimize the number of partition interfaces by swapping cells between partitions. The scheme traverses the partition boundary and gives cells to the neighboring partition if the interface boundary surface area is decreased. (See Figure 31.5.13.)

Merge attempts to eliminate orphan clusters from each partition. An orphan cluster is a group of cells with the common feature that each cell within the group has at least one face which coincides with an interface boundary. (See Figure 31.5.14.) Orphan clusters can degrade multigrid performance and lead to large communication costs.

In general, the Smooth and Merge schemes are relatively inexpensive optimization tools.
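The Smooth idea can be illustrated with a deliberately minimal model. This is an assumed sketch, not FLUENT code: cells form a 1D chain, "interface area" is simply the number of adjacent cell pairs in different partitions, and (unlike the real scheme) cell balance is ignored for brevity.

```python
# Illustrative sketch of the Smooth optimization on a 1D chain of cells:
# hand boundary cells to the neighboring partition whenever that strictly
# lowers the number of partition-interface faces.
def cut_size(part):
    """Count faces between cells assigned to different partitions."""
    return sum(1 for i in range(len(part) - 1) if part[i] != part[i + 1])

def smooth(part):
    improved = True
    while improved:
        improved = False
        for i in range(len(part) - 1):
            if part[i] != part[i + 1]:            # a partition boundary face
                for trial in (part[i], part[i + 1]):
                    cand = part[:]
                    cand[i] = cand[i + 1] = trial  # move the pair together
                    if cut_size(cand) < cut_size(part):
                        part, improved = cand, True
    return part

ragged = [0, 1, 0, 0, 1, 1]   # partition IDs with two superfluous interfaces
print(smooth(ragged), cut_size(smooth(ragged)))  # -> [0, 0, 0, 0, 1, 1] 1
```

A production implementation would also preserve the cell balance while swapping; the point here is only the greedy "accept the move if the interface shrinks" structure.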


Figure 31.5.13: The Smooth Optimization Scheme

Figure 31.5.14: The Merge Optimization Scheme


Pretesting

If you choose the Principal Axes or Cartesian Axes method, you can improve the bisection by testing different directions before performing the actual bisection. If you choose not to use pretesting (the default), FLUENT will perform the bisection perpendicular to the direction of longest domain extent.

If pretesting is enabled, it will occur automatically when you click the Partition button in the Partition Grid panel, or when you read in the grid if you are using automatic partitioning. The bisection algorithm will test all coordinate directions and choose the one which yields the fewest partition interfaces for the final bisection.

Note that using pretesting will increase the time required for partitioning. For 2D problems partitioning will take 3 times as long as without pretesting, and for 3D problems it will take 4 times as long.
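The pretesting step above amounts to trying each candidate direction and keeping the one with the fewest interface faces. A hedged sketch (the cell model and function names are illustrative, not FLUENT internals):

```python
# Try bisecting perpendicular to each coordinate direction and count how
# many face-neighbor pairs the cut would separate; keep the smallest.
def interfaces_after_bisection(cells, axis):
    """Bisect at the median coordinate; count face-neighbors split apart."""
    median = sorted(c[axis] for c in cells)[len(cells) // 2]
    side = {c: c[axis] < median for c in cells}
    cut = 0
    for (x, y) in cells:
        for nb in ((x + 1, y), (x, y + 1)):        # right/up face neighbors
            if nb in side and side[(x, y)] != side[nb]:
                cut += 1
    return cut

cells = [(x, y) for x in range(8) for y in range(4)]   # an 8x4 cell block
best = min(range(2), key=lambda a: interfaces_after_bisection(cells, a))
print(best)  # -> 0
```

For this stretched block the test confirms the longest-extent heuristic: cutting perpendicular to x creates 4 interface faces versus 8 for y. The extra cost of pretesting is simply that every direction is evaluated instead of one.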

Using the Partition Filter

As noted above, you can use the METIS partitioning method through a filter, in addition to within the Auto Partition Grid and Partition Grid panels. To perform METIS partitioning on an unpartitioned grid, use the File/Import/Partition/Metis... menu item.

File −→ Import −→ Partition −→Metis...

FLUENT will use the METIS partitioner to partition the grid, and then read the partitioned grid into the solver. The number of partitions will be equal to the number of processes. You can then proceed with the model definition and solution.

i Direct import to the parallel solver through the partition filter requires that the host machine has enough memory to run the filter for the specified grid. If not, you will need to run the filter on a machine that does have enough memory. You can either start the parallel solver on the machine with enough memory and repeat the process described above, or run the filter manually on the new machine and then read the partitioned grid into the parallel solver on the host machine.

To manually partition a grid using the partition filter, enter the following command:

utility partition input_filename partition_count output_filename

where input_filename is the filename for the grid to be partitioned, partition_count is the number of partitions desired, and output_filename is the filename for the partitioned grid. You can then read the partitioned grid into the solver (using the standard File/Read/Case... menu item) and proceed with the model definition and solution.


When the File/Import/Partition/Metis... menu item is used to import an unpartitioned grid into the parallel solver, the METIS partitioner partitions the entire grid. You may also partition each cell zone individually, using the File/Import/Partition/Metis Zone... menu item.

File −→ Import −→ Partition −→Metis Zone...

This method can be useful for balancing the work load. For example, if a case has a fluid zone and a solid zone, the computation in the fluid zone is more expensive than in the solid zone, so partitioning each zone individually will result in a more balanced work load.

31.5.6 Checking the Partitions

After partitioning a grid, you should check the partition information and examine the partitions graphically.

Interpreting Partition Statistics

You can request a report to be printed after partitioning (either automatic or manual) is completed. In the parallel solver, click the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel. In the serial solver, click the Print Partitions button.

FLUENT distinguishes between two cell partition schemes within a parallel problem: the active cell partition and the stored cell partition. Initially, both are set to the cell partition that was established upon reading the case file. If you re-partition the grid using the Partition Grid panel, the new partition will be referred to as the stored cell partition. To make it the active cell partition, you need to click the Use Stored Partitions button in the Partition Grid panel. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file. This distinction is made mainly to allow you to partition a case on one machine or network of machines and solve it on a different one. Thanks to the two separate partitioning schemes, you could use the parallel solver with a certain number of compute nodes to subdivide a grid into an arbitrary different number of partitions, suitable for a different parallel machine, save the case file, and then load it into the designated machine.

When you click Print Partitions in the serial solver, you will obtain information about the stored partition.

The output generated by the partitioning process includes information about the recursive subdivision and iterative optimization processes. This is followed by information about the final partitioned grid, including the partition ID, number of cells, number of faces, number of interface faces, ratio of interface faces to faces for each partition, and number of neighboring partitions, as well as the cell, face, interface, neighbor, mean cell, face-ratio, and global face-ratio variations. Variations are the minimum and maximum values of the respective quantities over the partitions. For example, in the sample output below, partitions 0 and 3 have the minimum number of interface faces (10), and partitions 1 and 2 have the maximum number of interface faces (19); hence the variation is 10–19.

Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation).

>> Partitions:
P  Cells  I-Cells  Cell Ratio  Faces  I-Faces  Face Ratio  Neighbors
0    134       10       0.075    217       10       0.046          1
1    137       19       0.139    222       19       0.086          2
2    134       19       0.142    218       19       0.087          2
3    137       10       0.073    223       10       0.045          1

------
Partition count           = 4
Cell variation            = (134 - 137)
Mean cell variation       = ( -1.1% - 1.1%)
Intercell variation       = (10 - 19)
Intercell ratio variation = ( 7.3% - 14.2%)
Global intercell ratio    = 10.7%
Face variation            = (217 - 223)
Interface variation       = (10 - 19)
Interface ratio variation = ( 4.5% - 8.7%)
Global interface ratio    = 3.4%
Neighbor variation        = (1 - 2)

Computing connected regions; type ^C to interrupt.
Connected region count = 4

Note that partition IDs correspond directly to compute node IDs when a case file is read into the parallel solver. When the number of partitions in a case file is larger than the number of compute nodes, but is evenly divisible by the number of compute nodes, then the distribution is such that partitions with IDs 0 to (M − 1) are mapped onto compute node 0, partitions with IDs M to (2M − 1) onto compute node 1, etc., where M is equal to the ratio of the number of partitions to the number of compute nodes.
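The mapping just described can be written down in a few lines (the helper name is illustrative; only the arithmetic comes from the text above):

```python
# With P partitions and N compute nodes where N divides P evenly,
# partitions 0..M-1 go to node 0, M..2M-1 to node 1, etc., with M = P / N.
def node_for_partition(partition_id, n_partitions, n_nodes):
    assert n_partitions % n_nodes == 0
    m = n_partitions // n_nodes          # partitions per compute node
    return partition_id // m

# 8 partitions on 2 compute nodes: partitions 0-3 -> node 0, 4-7 -> node 1
print([node_for_partition(p, 8, 2) for p in range(8)])
# -> [0, 0, 0, 0, 1, 1, 1, 1]
```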


Examining Partitions Graphically

To further aid interpretation of the partition information, you can draw contours of the grid partitions, as illustrated in Figures 31.5.8–31.5.12.

Display −→Contours...

To display the active cell partition or the stored cell partition (which are described above), select Active Cell Partition or Stored Cell Partition in the Cell Info... category of the Contours Of drop-down list, and turn off the display of Node Values. (See Section 28.1.2: Displaying Contours and Profiles for information about displaying contours.)

i If you have not already done so in the setup of your problem, you will need to perform a solution initialization in order to use the Contours panel.

31.5.7 Load Distribution

If the speeds of the processors that will be used for a parallel calculation differ significantly, you can specify a load distribution for partitioning, using the load-distribution text command.

parallel −→ partition −→ set −→load-distribution

For example, if you will be solving on three compute nodes, and one machine is twice as fast as the other two, then you may want to assign twice as many cells to the first machine as to the others (i.e., a load vector of (2 1 1)). During subsequent grid partitioning, partition 0 will end up with twice as many cells as partitions 1 and 2.

Note that for this example, you would then need to start up FLUENT such that compute node 0 is the fast machine, since partition 0, with twice as many cells as the others, will be mapped onto compute node 0. Alternatively, in this situation, you could enable the load balancing feature (described in Section 31.6.2: Load Balancing) to have FLUENT automatically attempt to discern any difference in load among the compute nodes.
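How a load vector translates into target cell counts can be sketched as below. The vector semantics follow the (2 1 1) example above; the helper and its rounding policy (giving the remainder to partition 0) are illustrative assumptions, not the solver's exact arithmetic.

```python
# Target cell counts proportional to a load vector: a weight of 2 gets
# twice the cells of a weight of 1.
def target_cells(total_cells, load_vector):
    weight = sum(load_vector)
    targets = [total_cells * w // weight for w in load_vector]
    targets[0] += total_cells - sum(targets)   # park any rounding remainder
    return targets

print(target_cells(100000, [2, 1, 1]))  # -> [50000, 25000, 25000]
```

So with load vector (2 1 1) and 100,000 cells, partition 0 (mapped to the fast compute node 0) receives 50,000 cells and partitions 1 and 2 receive 25,000 each.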

i If you adapt a grid that contains non-conformal interfaces, and you want to rebalance the load on the compute nodes, you will have to save your case and data files after adaption, read the case and data files into the serial solver, repartition using the Encapsulate Grid Interfaces and Encapsulate for Adaption options in the Partition Grid panel, and save case and data files again. You will then be able to read the manually repartitioned case and data files into the parallel solver, and continue the solution from where you left it.


31.6 Checking and Improving Parallel Performance

To determine how well the parallel solver is working, you can measure computation and communication times, and the overall parallel efficiency, using the performance meter. You can also control the amount of communication between compute nodes in order to optimize the parallel solver, and take advantage of the automatic load balancing feature of FLUENT.

Information about checking and improving parallel performance is provided in the following sections:

• Section 31.6.1: Checking Parallel Performance

• Section 31.6.2: Optimizing the Parallel Solver

31.6.1 Checking Parallel Performance

The performance meter allows you to report the wall clock time elapsed during a computation, as well as message-passing statistics. Since the performance meter is always activated, you can access the statistics by printing them after the computation is completed. To view the current statistics, use the Parallel/Timer/Usage menu item.

Parallel −→ Timer −→Usage

Performance statistics will be printed in the text window (console).

To clear the performance meter so that you can eliminate past statistics from the future report, use the Parallel/Timer/Reset menu item.

Parallel −→ Timer −→Reset


The following example demonstrates how the current parallel statistics are displayed in the console window:

Performance Timer for 1 iterations on 4 compute nodes

Average wall-clock time per iteration: 4.901 sec

Global reductions per iteration: 408 ops

Global reductions time per iteration: 0.000 sec (0.0%)

Message count per iteration: 801 messages

Data transfer per iteration: 9.585 MB

LE solves per iteration: 12 solves

LE wall-clock time per iteration: 2.445 sec (49.9%)

LE global solves per iteration: 27 solves

LE global wall-clock time per iteration: 0.246 sec (5.0%)

AMG cycles per iteration: 64 cycles

Relaxation sweeps per iteration: 4160 sweeps

Relaxation exchanges per iteration: 920 exchanges

Total wall-clock time: 4.901 sec

Total CPU time: 17.030 sec

A description of the parallel statistics is as follows:

• Average wall-clock time per iteration describes the average real (wall clock)time per iteration.

• Global reductions per iteration describes the number of global reduction operations (such as variable summations over all processes). This requires communication among all processes.

A global reduction is a collective operation over all processes for the given job that reduces a vector quantity (the length given by the number of processes or nodes) to a scalar quantity (e.g., taking the sum or maximum of a particular quantity). The number of global reductions cannot be calculated from any other readily known quantities. The number is generally dependent on the algorithm being used and the problem being solved.

• Global reductions time per iteration describes the time per iteration for the global reduction operations.

• Message count per iteration describes the number of messages sent between all processes per iteration. This is important with regard to communication latency, especially on high-latency interconnects.

A message is defined as a single point-to-point, send-and-receive operation between any two processes. (This excludes global, collective operations such as global reductions.) In terms of domain decomposition, a message is passed from the process governing one subdomain to a process governing another (usually adjacent) subdomain.

The message count per iteration is usually dependent on the algorithm being used and the problem being solved. The message count and the number of messages that are reported are totals for all processors.

The message count provides some insight into the impact of communication latency on parallel performance. A higher message count indicates that the parallel performance may be more adversely affected if a high-latency interconnect is being used. Ethernet has a higher latency than Myrinet or Infiniband. Thus, a high message count will more adversely affect performance with Ethernet than with Infiniband.

• Data transfer per iteration describes the amount of data communicated between processors per iteration. This is important with respect to interconnect bandwidth.

Data transfer per iteration is usually dependent on the algorithm being used and the problem being solved. This number generally increases with increases in problem size, number of partitions, and physics complexity.

The data transfer per iteration may provide some insight into the impact of communication bandwidth (speed) on parallel performance. The precise impact is often difficult to quantify because it is dependent on many things including: the ratio of data transfer to calculations, and the ratio of communication bandwidth to CPU speed. The unit of data transfer is a byte.

• LE solves per iteration describes the number of linear systems being solved per iteration. This number is dependent on the physics (non-reacting versus reacting flow) and the algorithms (pressure-based versus density-based solver), but is independent of mesh size. For the pressure-based solver, this is usually the number of transport equations being solved (mass, momentum, energy, etc.).

• LE wall-clock time per iteration describes the time (wall-clock) spent doing linear equation solvers (i.e., multigrid).

• LE global solves per iteration describes the number of solutions on the coarse level of the AMG solver where the entire linear system has been pushed to a single processor (n0). The system is pushed to a single processor to reduce the computation time during the solution on that level. Scaling generally is not adversely affected because the number of unknowns is small on the coarser levels.

• LE global wall-clock time per iteration describes the time (wall-clock) per iteration for the linear equation global solutions (see above).

• AMG cycles per iteration describes the average number of multigrid cycles (V, W, flexible, etc.) per iteration.


• Relaxation sweeps per iteration describes the number of relaxation sweeps (or iterative solutions) on all levels for all equations per iteration. A relaxation sweep is usually one iteration of Gauss-Seidel or ILU.

• Relaxation exchanges per iteration describes the number of solution communications between processors during the relaxation process in AMG. This number may be less than the number of sweeps because of shifting the linear system on coarser levels to a single node/process.

• Time-step updates per iteration describes the number of sub-iterations on the time step per iteration.

• Time-step wall-clock time per iteration describes the time per sub-iteration.

• Total wall-clock time describes the total wall-clock time.

• Total CPU time describes the total CPU time used by all processes. This does not include any wait time for load imbalances or for communications (other than packing and unpacking local buffers).

The most relevant quantity is the Total wall-clock time. This quantity can be used to gauge the parallel performance (speedup and efficiency) by comparing this quantity to that from the serial analysis (the command line should contain -t1 in order to obtain the statistics from a serial analysis). In lieu of a serial analysis, an approximation of parallel speedup may be found in the ratio of Total CPU time to Total wall-clock time.
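Using the figures from the sample output above (4.901 s wall-clock, 17.030 s CPU, 4 compute nodes), the suggested approximation works out as follows; the helper names are illustrative, and dividing the estimated speedup by the node count to get an efficiency is a standard convention rather than something the report prints.

```python
# In lieu of a serial run, approximate parallel speedup as the ratio of
# total CPU time to total wall-clock time, and efficiency as speedup
# divided by the number of compute nodes.
def estimated_speedup(total_cpu_s, total_wall_s):
    return total_cpu_s / total_wall_s

def efficiency(speedup, n_nodes):
    return speedup / n_nodes

s = estimated_speedup(17.030, 4.901)            # values from the sample output
print(round(s, 2), round(efficiency(s, 4), 2))  # -> 3.47 0.87
```

An estimated speedup of about 3.5 on 4 nodes (roughly 87% efficiency) would indicate reasonably good, though not ideal, parallel performance for this case.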


31.6.2 Optimizing the Parallel Solver

Increasing the Report Interval

In FLUENT, you can reduce communication and improve parallel performance by increasing the report interval for residual printing/plotting or other solution monitoring reports. You can modify the value for Reporting Interval in the Iterate panel.

Solve −→Iterate...

i Note that you will be unable to interrupt iterations until the end of each report interval.

Load Balancing

A dynamic load balancing capability is available in FLUENT. The principal reason for using parallel processing is to reduce the turnaround time of your simulation, ideally by a factor proportional to the collective speed of the computing resources used. If, for example, you were using four CPUs to solve your problem, then you would expect to reduce the turnaround time by a factor of four. This is of course the ideal situation, and assumes that there is very little communication needed among the CPUs, that the CPUs are all of equal speed, and that the CPUs are dedicated to your job. In practice, this is often not the case. For example, CPU speeds can vary if you are solving in parallel on a cluster that includes nodes with different clock speeds, other jobs may be competing for use of one or more of the CPUs, and network traffic either from within the parallel solver or generated from external sources may delay some of the necessary communication among the CPUs.

If you enable dynamic load balancing in FLUENT, the load across the computational and networking resources will be monitored periodically. If the load balancer determines that performance can be improved by redistributing the cells among the compute nodes, it will automatically do so. There is a time penalty associated with load balancing itself, and so it is disabled by default. If you will be using a dedicated homogeneous resource, or if you are using a heterogeneous resource but have accounted for differences in CPU speeds during partitioning by specifying a load distribution (see Section 31.5.7: Load Distribution), then you may not need to use load balancing.

i Note that when the shell conduction model is used, you will not be able to turn on load balancing.


To enable and control FLUENT’s automatic load balancing feature, use the Load Balance panel (Figure 31.6.1). Load balancing will automatically detect and analyze parallel performance, and redistribute cells between the existing compute nodes to optimize it.

Parallel −→Load Balance...

Figure 31.6.1: The Load Balance Panel

The procedure for using load balancing is as follows:

1. Turn on the Load Balancing option.

2. Select the bisection method to create new grid partitions in the Partition Method drop-down list. The choices are the techniques described in Section 31.5.5: Bisection Methods. As part of the automatic load balancing procedure, the grid will be repartitioned into several small partitions using the specified method. The resulting partitions will then be distributed among the compute nodes to achieve a more balanced load.

3. Specify the desired Balance Interval. When a value of 0 is specified, FLUENT will internally determine the best value to use, initially using an interval of 25 iterations. You can override this behavior by specifying a non-zero value. FLUENT will then attempt to perform load balancing after every N iterations, where N is the specified Balance Interval. You should be careful to select an interval that is large enough to outweigh the cost of performing the load balancing operations.


Note that you can interrupt the calculation at any time, turn the load balancing feature off (or on), and then continue the calculation.

i If problems arise in your computations due to adaption, you can turn off the automatic load balancing, which occurs any time that mesh adaption is performed in parallel.

To instruct the solver to skip the load balancing step, issue the following Scheme command:

(disable-load-balance-after-adaption)

To return to the default behavior use the following Scheme command:

(enable-load-balance-after-adaption)
