
DB2 Best Practices - public.dhe.ibm.com/software/dw/data/bestpractices/IIS-Topologies.pdf


IBM® InfoSphere Information Server

Best Practices

Topology Design

Martin Breining Software Architect, Information Server

Thomas Cherel Software Architect, Information Server

Jean-Claude Mamou STSM and Program Director, Information Server



Background
Executive Summary
Foreword
Assess your needs
    High availability
    Performance and scalability
    Ease of installation and manageability (aka simplicity)
Assess your means
    Hardware
    Technical expertise
Start shopping
    Basic topologies
        Single machine
        Dedicated Engine machine
        Dedicated machine for each tier
    Parallel Processing and Grid Engine Topologies
        All tiers collocated, including the Engine conductor node
        Dedicated machine for Engine conductor node
    Basic High Availability Topologies
        All tiers collocated
        Dedicated Engine machine
        Parallel processing and grid Engine configurations
    Advanced Topologies
        Highly scalable Services tier
        Highly scalable Services tier with parallel processing and grid Engine configurations
        Advanced high availability
        Advanced high availability with parallel processing and grid Engine configurations
Conclusion
Appendix: Topology scorecards
Further reading
    Contributors
Notices
    Trademarks


Background

A successful deployment of IBM InfoSphere Information Server must include a topology that meets customer expectations for performance, ease of maintenance, security, high availability, and scalability. Determining the optimal deployment is a complex task that requires a firm understanding of customer requirements and hands-on deployment experience with InfoSphere Information Server. What works well for one customer might be a poor choice for others.

The number of topologies available to customers keeps growing, which further complicates the process of identifying a suitable topology. For example, InfoSphere Information Server, Version 8.5 introduces various high availability solutions ranging from a basic active/passive failover cluster configuration to more advanced high availability solutions such as IBM WebSphere Application Server Network Deployment and IBM DB2 HADR.

Summary

This document provides a blueprint for designing an ideal topology for InfoSphere Information Server based on a set of available resources (such as hardware and skills) and a set of functional requirements (such as high availability, scalability, and simplicity). Each of these variables represents a different dimension of a topology. A change in any of these dimensions can greatly impact the resulting topology, so identifying and quantifying these dimensions is important to remain within set constraints while meeting expectations. After these dimensions are defined, a particular topology, or perhaps a family of topologies, is expected to emerge almost naturally when following the guidance that is outlined in this document.

Dimensions often compete with one another. For example, scalability is rarely associated with a single-computer environment, and high availability and simplicity are typically opposites. As a general rule, the greater the functional requirements, the greater the cost and the more sophisticated the topology. Therefore, do not expect to achieve the highest levels of performance, high availability, scalability, and simplicity at a minimal cost.

Foreword

This document assumes that the reader has a solid understanding of the overall InfoSphere Information Server architecture and the various tiers (Metadata Repository, Services, Engine and Client). See ‘Understanding tiers and components’ in Further reading for more information.

Assess your needs

The first step is to evaluate the overall system expectations or requirements, mainly in terms of high availability, performance and scalability, and simplicity. Additional requirements might need to be clarified (such as security), but the aforementioned requirements dictate the core of the design. Remain conservative when assessing your functional requirements. Every requirement that is added to the system has a cost, so if the requirement is not currently needed, do not factor it into the design.


The following questions are important to consider when quantifying each requirement:

- If high availability is a requirement, what is the level of downtime that can be tolerated? Is 20 minutes acceptable?

- If scalability is a requirement, how many concurrent clients or data integration jobs need to be serviced at any given time? Is the number in the single digits, or in the two- or three-digit range?

- If simplicity is a requirement, how do you define simplicity? Is running and maintaining InfoSphere Information Server on an environment with two or three computers considered simple?

After assessing your requirements, conduct capacity planning to ensure that your anticipated needs can be accommodated. Make sure to size your memory, disk space, and database needs.

High availability

High availability refers to the level of operational performance that is ensured by the system during a specific time period. InfoSphere Information Server provides several high availability solutions, each with varying degrees of guaranteed availability and applicability. The tiers in the InfoSphere Information Server suite (Metadata Repository, Services, and Engine tiers) are based on different technologies, so not all high availability solutions can be applied to all tiers.

Consider high availability as a suite requirement rather than a tier requirement for a given environment type (development, test, or production). If InfoSphere Information Server must be highly available, then all tiers must be configured with a high availability solution. Otherwise, if one tier is not configured for high availability, then that tier becomes a single point of failure for the overall system, which defeats the purpose of configuring the other tiers for high availability. Again, this configuration applies for a given environment type only: it is perfectly acceptable to have a development environment with no high availability capabilities and a production environment that is configured for high availability.

- Active/passive failover cluster – This solution requires a shared file system (where the InfoSphere Information Server binaries are installed), two separate computers (one active, one passive), a virtual IP address (floating between the active and passive computers), and high availability software that manages the entire failover process. InfoSphere Information Server is initially started on the active computer. When this computer fails, the high availability software triggers a failover of the InfoSphere Information Server processes to the passive computer, which effectively becomes the active computer. In an InfoSphere Information Server environment, the failover process can take up to 20 minutes to complete, so some downtime must be tolerable. IBM HACMP and Tivoli System Automation are examples of high availability software supported for this solution. This solution can be applied to all of the InfoSphere Information Server tiers.

- IBM WebSphere Application Server Network Deployment – This solution is typically more costly (hardware, skills) than the previous solution but offers a greater level of high availability. This solution provides an active/active N-to-N cluster (see the ‘High Availability’ reference for more information) for services that are running in WebSphere Application Server. In most cases, WebSphere Application Server can fail over automatically and transparently, with no downtime. This solution can only be applied to the Services tier of InfoSphere Information Server.


- IBM DB2 HADR with Automatic Client Reroute – This solution turns an IBM DB2 database into a highly efficient active/passive cluster (or active/standby cluster) where data replication between a primary and a standby database is performed periodically to protect against data loss. Several data replication settings exist (synchronous, near-synchronous, and asynchronous) depending on the amount of data loss that can be tolerated versus the level of performance that is required. Combined with the Automatic Client Reroute facility, a DB2 HADR database provides a greater level of high availability than the basic active/passive failover cluster because the standby database is already running, and the Automatic Client Reroute facility ensures a transparent and speedy failover. This solution can only be applied to the Metadata Repository tier of InfoSphere Information Server.

Note: InfoSphere Information Server also supports Oracle Real Application Clusters (RAC), which is more of an active/active cluster of Oracle RDBMS instances that share a single database. Oracle RAC is a very different architecture from IBM DB2 HADR, but the level of high availability achieved is comparable.

The following table summarizes the main characteristics of the high availability solutions that are supported by InfoSphere Information Server.

Table 1: High availability solutions supported by InfoSphere Information Server

Solution | Failover | Level of high availability | Applicability
Active/passive failover | Automatic but slow (downtime must be tolerated) | Basic | All tiers
IBM WebSphere Application Server Network Deployment | Automatic and instantaneous (no downtime) | Advanced | Services tier
IBM DB2 HADR with Automatic Client Reroute | Automatic and instantaneous (no downtime) | Advanced | Metadata Repository tier
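The rule that every tier needs a high availability solution, and that each solution applies only to certain tiers, can be sketched as a small check. This is an illustrative sketch only: the applicability data is transcribed from Table 1, while the function name `single_points_of_failure` and the shape of the `plan` dictionary are assumptions made for this example and are not part of InfoSphere Information Server.

```python
# Tiers that must be covered for the suite to be highly available.
TIERS = ("Metadata Repository", "Services", "Engine")

# Applicability of each solution, transcribed from Table 1.
HA_SOLUTIONS = {
    "Active/passive failover": set(TIERS),
    "IBM WebSphere Application Server Network Deployment": {"Services"},
    "IBM DB2 HADR with Automatic Client Reroute": {"Metadata Repository"},
}


def single_points_of_failure(plan):
    """Return the tiers that have no high availability solution.

    `plan` (hypothetical structure) maps a tier name to the name of the
    solution chosen for it, or to None when the tier runs standalone.
    Raises ValueError if a solution is assigned to a tier it cannot serve.
    """
    uncovered = []
    for tier in TIERS:
        solution = plan.get(tier)
        if solution is None:
            uncovered.append(tier)
        elif tier not in HA_SOLUTIONS[solution]:
            raise ValueError(f"{solution} cannot be applied to the {tier} tier")
    return uncovered


# Example: the Engine tier is left standalone, so it is a single point of failure.
plan = {
    "Metadata Repository": "IBM DB2 HADR with Automatic Client Reroute",
    "Services": "IBM WebSphere Application Server Network Deployment",
    "Engine": None,
}
print(single_points_of_failure(plan))
```

An empty result means that every tier is covered, which is the condition the text above describes for a highly available suite.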

Performance and scalability

Performance refers to the rate at which work is being processed by the system. Scalability refers to the system’s ability to handle growing amounts of work. In the context of InfoSphere Information Server, the term work typically refers to data integration job designs, metadata management, and data integration job executions. Performance and scalability are treated equally in this document to simplify the topology discussion without jeopardizing the end result.

Performance and scalability should be researched more thoroughly than other dimensions. For a given topology, performance and scalability can vary greatly depending on how InfoSphere Information Server is being used in conjunction with the suite products that are being installed (such as IBM InfoSphere DataStage, IBM InfoSphere QualityStage, or IBM InfoSphere Information Analyzer) and the type and scale of environment that you plan to run.


A typical example is InfoSphere DataStage job design versus InfoSphere DataStage job execution. These usages of InfoSphere Information Server are very different and stress the tiers in their own way, leading to different topologies. The following table shows how the tiers are being stressed based on particular InfoSphere Information Server usages.

Table 2: How InfoSphere Information Server tiers are stressed based on usage

Product | Engine Tier | Services Tier | Metadata Repository Tier
IBM InfoSphere Business Glossary | N/E (Not Exercised) | Medium | Medium
IBM InfoSphere Business Glossary Anywhere | N/E | Medium | Medium
IBM InfoSphere Business Glossary Browser | N/E | Medium | Medium
InfoSphere Information Server Console | N/E | Heavy | Heavy
InfoSphere Information Server Web Console | N/E | Medium | Medium
InfoSphere Metadata Workbench | N/E | Heavy | Heavy
InfoSphere Asset Interchange (istool) | Light | Heavy | Heavy
InfoSphere DataStage or InfoSphere QualityStage Designer | Light | Heavy | Heavy
InfoSphere FastTrack | Light | Medium | Medium
InfoSphere Information Analyzer Client | Light | Medium | Medium
InfoSphere Information Server Manager | Light | Heavy | Heavy
InfoSphere Information Services Director | Light | Heavy | Heavy
InfoSphere DataStage or InfoSphere QualityStage Administrator | Light | Light | Light
InfoSphere DataStage or InfoSphere QualityStage Director | Light | N/E | N/E
InfoSphere DataStage Job Execution | Heavy | Light | Light
InfoSphere Information Analyzer Execution (not Reports) | Heavy | Light | Light
InfoSphere Information Services Director Execution | Heavy | Heavy | Light
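One way to use Table 2 is to look up every product and usage that you plan to run and note the heaviest stress level that each tier receives. The following sketch does this for a handful of rows copied from the table. Taking the maximum per tier is a simplifying assumption made for this example (the document does not prescribe how to combine rows), and the names `TIER_STRESS` and `peak_stress` are hypothetical.

```python
# Stress levels in increasing order; "N/E" means the tier is not exercised.
STRESS_ORDER = ["N/E", "Light", "Medium", "Heavy"]
TIERS = ("Engine", "Services", "Metadata Repository")

# A few rows transcribed from Table 2: (Engine, Services, Metadata Repository).
TIER_STRESS = {
    "IBM InfoSphere Business Glossary": ("N/E", "Medium", "Medium"),
    "InfoSphere Metadata Workbench": ("N/E", "Heavy", "Heavy"),
    "InfoSphere DataStage or InfoSphere QualityStage Designer": ("Light", "Heavy", "Heavy"),
    "InfoSphere DataStage Job Execution": ("Heavy", "Light", "Light"),
    "InfoSphere Information Services Director Execution": ("Heavy", "Heavy", "Light"),
}


def peak_stress(usages):
    """Return the heaviest stress level per tier across the given usages.

    Assumption: the per-tier maximum is a reasonable first approximation
    of which tiers need attention; it ignores how many users run each product.
    """
    peak = {tier: "N/E" for tier in TIERS}
    for usage in usages:
        for tier, level in zip(TIERS, TIER_STRESS[usage]):
            if STRESS_ORDER.index(level) > STRESS_ORDER.index(peak[tier]):
                peak[tier] = level
    return peak


# A development-style mix stresses Services and Metadata Repository most.
print(peak_stress([
    "IBM InfoSphere Business Glossary",
    "InfoSphere DataStage or InfoSphere QualityStage Designer",
]))
```

For a production-style mix that consists only of job executions, the same function reports the Engine tier as the heavily stressed one, which matches the distinction drawn between development and production environments.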


Typically, development environments consist of InfoSphere Information Server clients that interact with InfoSphere Information Server to design data integration jobs and other artifacts (such as the InfoSphere DataStage Designer) or manage metadata (such as the InfoSphere Business Glossary or InfoSphere Metadata Workbench clients). These interactions tend to exercise the Services and Metadata Repository tiers most. Test and production environments primarily run data integration jobs (such as InfoSphere DataStage job executions) and tend to stress the Engine tier more.

Make sure that you understand how you intend to use InfoSphere Information Server so that you know which tiers require a particular focus with respect to performance and scalability. The InfoSphere Information Server suite includes at least ten different products, each stressing the various tiers differently. An InfoSphere Information Server installation might include any of these products, so numerous unique combinations exist. Use the previous table as a reference to help you understand and quantify how each tier is stressed based on your particular usage of InfoSphere Information Server.

Additionally, ensure that you understand the scale of the InfoSphere Information Server environment that you intend to deploy. The scale of an InfoSphere Information Server environment is defined by the number of clients or job executions that are concurrently being serviced by InfoSphere Information Server. Experience shows that environments are typically either small or large, where small refers to five or fewer concurrent clients or concurrent jobs and large refers to more than five concurrent clients or concurrent jobs. Try to quantify the scale of your environment. For development environments, estimate the number of concurrent clients that are expected to interact with InfoSphere Information Server at any time. For test and production environments, size the number of jobs and other runtime artifacts that are executed concurrently.

After you understand your performance and scalability requirements, you can determine whether you need an additional performance or scalability boost. InfoSphere Information Server provides a few performance optimization solutions that are targeted for environments where high levels of performance are expected, each with varying degrees of applicability.

- Parallel processing and grid configurations – This solution can only be applied to the Engine tier of InfoSphere Information Server. Although architecturally different, both of these configurations aim to maximize the performance and throughput of the Engine tier by distributing and parallelizing job executions across multiple processors. Refer to ‘Parallel processing and grid topologies’ in Further reading for more information on parallel processing and grid configurations.

- IBM WebSphere Application Server Network Deployment – Along with increasing availability, this solution also improves scalability by spreading the work across multiple clustered WebSphere Application Server instances as opposed to a single, standalone WebSphere Application Server instance. This solution can only be applied to the Services tier of InfoSphere Information Server.

Table 3: Performance and scalability optimization solutions supported by InfoSphere Information Server

Solution | Benefits | Applicability
Parallel processing and grid configurations | Improved throughput | Engine tier
IBM WebSphere Application Server Network Deployment | Improved scalability | Services tier
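The small/large threshold and the applicability column of Table 3 can be combined into a quick first-pass check. The sketch below is an assumption-laden illustration: the threshold of five comes from the text above and the solution names come from Table 3, but the function names and the rule that only large environments are candidates for an optimization are choices made for this example.

```python
# Threshold from the text: five or fewer concurrent clients or jobs is "small".
SMALL_ENVIRONMENT_LIMIT = 5

# Applicability transcribed from Table 3 (tier -> optimization solution).
OPTIMIZATIONS = {
    "Engine": "Parallel processing and grid configurations",
    "Services": "IBM WebSphere Application Server Network Deployment",
}


def classify_scale(concurrent_clients_or_jobs):
    """Classify an environment as 'small' or 'large' by concurrency."""
    if concurrent_clients_or_jobs < 0:
        raise ValueError("concurrency cannot be negative")
    if concurrent_clients_or_jobs <= SMALL_ENVIRONMENT_LIMIT:
        return "small"
    return "large"


def candidate_optimizations(concurrency, stressed_tiers):
    """Suggest optimizations for the stressed tiers of a large environment.

    Assumption for this sketch: small environments need no optimization,
    and tiers without an entry in Table 3 have no dedicated solution.
    """
    if classify_scale(concurrency) == "small":
        return {}
    return {t: OPTIMIZATIONS[t] for t in stressed_tiers if t in OPTIMIZATIONS}


# Twelve concurrent jobs that mostly stress the Engine tier.
print(candidate_optimizations(12, ["Engine"]))
```

The result is only a list of candidates; whether an optimization is worth its added cost and complexity still depends on the other dimensions discussed in this document.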


Ease of installation and manageability (a.k.a. simplicity)

Simplicity encapsulates ease of deployment, ease of configuration, ease of maintenance, and serviceability. Simplicity refers to quickly deploying an InfoSphere Information Server installation and easily maintaining it. Simplicity might not be a first-class requirement because it is often assumed that customers want their topology to be simple. However, the level of complexity generally increases as more focus is placed on other dimensions such as high availability and scalability. A byproduct of increased complexity is increased cost (hardware, skills, and time), so it is important to consider simplicity as one of the dimensions that must be assessed when designing a topology for InfoSphere Information Server.

The simplest InfoSphere Information Server deployment includes all tiers running in standalone mode (no high availability solution, no clustering, and no grid) on one single computer. This topology requires minimal installation, configuration, and maintenance efforts because all components are collocated. Anything that diverges from this configuration increases the overall level of complexity. Use this basic InfoSphere Information Server configuration as a reference point whenever assessing the complexity of a particular topology. Ensure that you can absorb any added complexity and its associated hidden costs.

Assess your means

Hard constraints must be addressed when designing a topology. You must ensure that adequate resources are available, especially hardware resources and technical expertise. Both of these requirements must be met to ensure that the topology can succeed. Do not design a topology based on resources that are not yet accounted for. Plan your topology based on the resources that are currently available rather than what might be available in the future. Adding hardware to an existing topology might require a complete revision of the topology, and quickly educating a team on high availability and scalability technologies is not a trivial task.

Hardware

Evaluate all available computers and ensure that they meet the system requirements for InfoSphere Information Server. A computer refers to a physical computer, LPAR, or virtual environment. A single InfoSphere Information Server deployment can require anywhere from one computer to dozens of computers, depending mostly on the high availability, performance, and scalability needs of the topology.

Technical expertise

Evaluate the technical expertise in areas such as high availability, scalability, and networking. Complex InfoSphere Information Server deployments can require expert skills during the initial installation and configuration phase and also during regular operational hours for miscellaneous maintenance and administration tasks. Ensure that you understand these considerations before designing a complex topology.


Start shopping

After you understand the resources and requirements, you can scrutinize the supported InfoSphere Information Server topologies and determine which ones fit best. InfoSphere Information Server supports a wide range of topologies, and singling out a specific topology is not easy. However, this task is greatly simplified when a thorough list of requirements and resources has been created.

The following section is not a comprehensive list of supported InfoSphere Information Server topologies. Rather, this list highlights the most popular families of topologies that, over time, have proven to yield the best results and are often recommended to customers by IBM consulting teams. If you cannot locate a suitable topology in the following list, then you might have overlooked or misunderstood your requirements during the assessment phase. In all instances, contact an IBM representative to validate your topology before moving forward.

Topologies are grouped into four main families based on their architectural and requirements similarities:

- Basic
- Parallel processing and grid engine
- Basic high availability
- Advanced

Basic topologies are installations of InfoSphere Information Server that do not provide any high availability capabilities, advanced job processing, or scalability optimizations. Basic topologies provide the foundation on which the other topology families are built and in many cases are sufficient. Parallel processing and grid engine topologies allow InfoSphere Information Server to achieve higher levels of job processing performance, while the basic high availability topologies turn InfoSphere Information Server into an entry-level highly available platform. The advanced topologies push InfoSphere Information Server to its limits in areas such as performance, scalability, and high availability. Within each topology family, a few popular topologies are described and rated on the dimensions that are mentioned previously, using the following scale:

- Zero (VERY POOR) - A dimension rated zero indicates that the topology is a very poor fit and should not be considered if the dimension is part of your specifications list.

- One (POOR) - A dimension rated one indicates that the topology is a poor fit and should only be considered if the dimension is not high in your specifications list.

- Two (GOOD) - A dimension rated two indicates that the topology is a good fit and should be considered if the dimension is high in your specifications list.

- Three (VERY GOOD) - A dimension rated three indicates that the topology is an excellent fit and should be considered if the dimension is high in your specifications list.

Read through the list of topologies and see which fits your needs best. It is unrealistic to expect a topology to rate three (very good) on all of the dimensions that you are interested in (keep in mind that dimensions often work against each other), so do your best to find a topology that successfully addresses the dimensions at the top of your specifications list, even if it does not rate as high on other dimensions on your list.
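The rating scale suggests a simple shortlisting rule: keep a topology only if it rates at least two (good) on every dimension that is high on your specifications list. The sketch below applies that rule to two scorecards whose ratings are taken from Tables 4 and 5 later in this document. The dictionary keys, the `shortlist` function, and the tie-breaking by summed rating are assumptions made for this illustration. The Single computer topology has a split rating for large environments, so this example uses the rating given for environments that stress mostly the Services and Metadata Repository tiers.

```python
# Ratings: 0 = very poor, 1 = poor, 2 = good, 3 = very good.
GOOD = 2

# Scorecards transcribed from Tables 4 and 5; the key names are hypothetical.
SCORECARDS = {
    "Single computer": {
        "hardware": 3, "skills": 3, "high_availability": 0,
        "simplicity": 3, "perf_small": 3, "perf_large_services": 1,
    },
    "Dedicated Engine computer": {
        "hardware": 2, "skills": 3, "high_availability": 0,
        "simplicity": 2, "perf_small": 3, "perf_large_services": 2,
    },
}


def shortlist(scorecards, high_priority):
    """Return topologies rated GOOD or better on every high-priority dimension.

    Results are ordered by the sum of their ratings on those dimensions
    (best first), then by name; the ordering is an assumption of this sketch.
    """
    fits = []
    for name, ratings in scorecards.items():
        if all(ratings[dim] >= GOOD for dim in high_priority):
            total = sum(ratings[dim] for dim in high_priority)
            fits.append((total, name))
    fits.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in fits]


# Simplicity and small-environment performance matter most here.
print(shortlist(SCORECARDS, ["simplicity", "perf_small"]))
```

Neither of these two basic topologies survives when high availability is a high-priority dimension, because both rate zero on it; in that case the shortlist is empty and another topology family is needed.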

Basic topologies

Single computer

Characteristics:

- Hardware: One computer
- Tier configuration: All tiers collocated on one computer

Figure 1. Topology with all tiers collocated on one computer

As mentioned previously, this topology is the simplest because it requires minimal hardware and minimal skills. All tiers are installed on the same computer, which further extends the simplicity of the design. This layout ensures a speedy and straightforward installation and configuration phase while minimizing maintenance work and increasing overall serviceability. This simple yet powerful topology is ideal for demos and small scale environments where scalability is not the main concern. This topology is also great for many other types of environments, especially environments in which the Services and Metadata Repository tiers are not exercised much, so experiment with this topology before jumping to more sophisticated ones. All runtime components run in standalone mode (as opposed to cluster or grid environments), so this topology does not offer any high availability capabilities.

Table 4: Dimensions for a topology with a single computer

Dimension | Rating
Minimal Hardware Required | 3
Minimal Skills Required | 3
High Availability | 0
Simplicity | 3
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (for environments that stress mostly the Services and Metadata Repository tiers); 2 (for environments that stress mostly the Engine tier)

Dedicated Engine computer

Characteristics:

- Hardware: Two computers
- Tier configuration:
    o Engine tier on dedicated computer
    o Services/Metadata Repository tiers collocated on another computer

Figure 2. Topology with tiers split between two computers, one dedicated for the Engine tier

Along with the Single computer topology, this topology is very effective when high availability is not a requirement. Some simplicity is sacrificed for increased scalability on larger environments, making this topology ideal for larger scale environments that have no high availability requirements.

Table 5: Dimensions for a topology with a dedicated engine computer

Dimension | Rating
Minimal Hardware Required | 2
Minimal Skills Required | 3
High Availability | 0
Simplicity | 2
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2


Dedicated computer for each tier

Characteristics:

- Hardware: Three computers
- Tier configuration: All tiers configured on dedicated computers

Figure 3. Topology with tiers split between three computers

This topology is typically not recommended because it does not offer any benefits compared to the previous topologies. This topology increases the cost (hardware, complexity, and technical expertise) for no tangible added value. This topology does not provide any high availability capabilities and ranks below most other topologies in terms of performance and scalability. In some instances, organizational constraints within a company (usually large companies) require that the tiers be deployed on separate computers, perhaps managed by separate teams or system administrators, making this topology a relatively popular one.

Table 6: Dimensions for a topology with a dedicated computer for each tier

Dimension | Rating
Minimal Hardware Required | 1
Minimal Skills Required | 2
High Availability | 0
Simplicity | 2
Performance and Scalability – Small Environments | 2 (for environments that stress mostly the Services and Metadata Repository tiers); 3 (for environments that stress mostly the Engine tier)
Performance and Scalability – Large Environments | 2


Parallel Processing and Grid Engine Topologies

All tiers collocated, including the Engine conductor node

Characteristics:

- Hardware: One computer with multiple processors for SMP configurations; alternatively, two or more computers for MPP or cluster configurations
- Tier configuration:
    o Engine tier configured for parallel or grid processing. The conductor node is collocated with the other two tiers. Depending on whether a parallel processing or grid configuration is used, the compute nodes can be on the same computer or on dedicated computers. The following graphic depicts a parallel processing configuration with the conductor and compute nodes on the same computer.
    o Services/Metadata Repository tiers on the same computer as the Engine conductor node

Figure 4. Topology with all tiers on a single computer, plus conductor and compute nodes

This topology is an enhanced version of the Single computer topology in which the Engine tier is configured as either a parallel processing or grid configuration. This topology maximizes the performance and throughput of the Engine tier by distributing job executions across multiple processors. For optimal performance, consider offloading the compute nodes to dedicated computers (cluster or grid). Scalability and high availability are not improved with this type of configuration.

This topology is an ideal choice for InfoSphere Information Server deployments that have strong throughput requirements on job executions at the Engine tier level, that is, deployments where business constraints mandate that a certain number of jobs must complete within a particular timeframe.

Table 7: Dimensions for a topology with all tiers collocated, including the Engine conductor node

Dimension | Rating
Minimal Hardware Required | 1
Minimal Skills Required | 1
High Availability | 0
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (for environments that stress mostly the Services and Metadata Repository tiers); 3 (for environments that stress mostly the Engine tier)

Dedicated computer for Engine conductor node

Characteristics:

- Hardware: At least two computers
- Tier configuration:
  o Engine tier configured for parallel or grid processing with the conductor node on a dedicated computer. Depending on whether parallel processing or grid processing is configured, the compute nodes can be on the same computer or dedicated computers. The following graphic depicts a parallel processing configuration with the conductor and compute nodes on the same computer.
  o Services/Repository tiers collocated on another computer

Figure 5. Topology with tiers split between two computers; one dedicated for the engine tier configured for parallel or grid processing

This topology is an enhanced version of the Dedicated Engine computer topology where the Engine tier is configured as either a parallel processing or grid configuration. As with the All tiers collocated, including the Engine conductor node topology, this topology maximizes the performance and throughput of the Engine tier by distributing job executions across multiple processors. For optimal performance, consider offloading the compute nodes to dedicated computers (cluster or grid). This topology also maximizes the performance of the Services and Metadata Repository tiers by providing dedicated hardware for these tiers. Scalability and high availability are not improved with this configuration.

While the previous topology is ideal for InfoSphere Information Server deployments that have strong performance requirements on the Engine tier, this topology is ideal for deployments with strong performance needs on all tiers and is typically better suited for larger scale deployments.

Table 8: Dimensions for a topology with a dedicated computer for the Engine conductor node

Dimension | Rating
Minimal Hardware Required | 1
Minimal Skills Required | 1
High Availability | 0
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2 (for environments that stress mostly the Services and Metadata Repository tiers); 3 (for environments that stress mostly the Engine tier)

Basic High Availability Topologies

All tiers collocated

Characteristics:

- Hardware: Two computers plus one SAN
- Tier configuration: All tiers collocated on one computer
- High availability technology:
  o Active/passive failover solution set up for all tiers. Failover is completed on the second computer.

Figure 6. Topology with all tiers on one computer, one computer for failover, and one SAN

The All tiers collocated topology is the entry-level topology for InfoSphere Information Server deployments that have high availability requirements. This topology takes the Single computer topology and complements it with the active/passive failover solution so that the tiers fail over to a secondary computer if the primary computer fails.

This topology is best suited for small scale environments that have a high availability need and can tolerate a certain amount of downtime, because the passive computer must be booted if a failover occurs. Among the small scale InfoSphere Information Server deployments, this topology is very popular and rates high on the performance and high availability dimensions while considering the hardware, skills, and complexity dimensions.

Table 9: Dimensions for a topology with all tiers collocated plus active/passive failover

Dimension | Rating
Minimal Hardware Required | 2
Minimal Skills Required | 2
High Availability | 2
Simplicity | 1
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (for environments that stress mostly the Services and Metadata Repository tiers); 2 (for environments that stress mostly the Engine tier)

Dedicated Engine computer

Characteristics:

- Hardware: Two computers plus one SAN
- Tier configuration:
  o Engine on dedicated computer
  o Services/Metadata Repository tiers collocated on another computer
- High availability technology:
  o Active/passive failover solution configured for the Engine tier on one side and the Services/Metadata Repository tier on the other side. The computers fail over to each other.

Figure 7. Topology with one computer for the Services tier, one computer for the Engine tier, and one SAN

This topology builds on the Dedicated Engine computer topology from the ‘basic topologies’ family by adding the active/passive failover solution. This topology is an alternative to the previous topology because it offers the same levels of high availability without compromising the hardware, skills, and simplicity dimensions. This topology works well for most environments because it allows the various tiers to scale optimally. Performance and scalability might be impacted after a failover, because all tiers are then collocated on the surviving computer and compete for the same resources.

Table 10: Dimensions for a topology with a dedicated engine computer, plus active/passive failover

Dimension | Rating
Minimal Hardware Required | 2
Minimal Skills Required | 2
High Availability | 2
Simplicity | 1
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2

Parallel processing and grid Engine configurations

Characteristics:

- Hardware: At least two computers plus one SAN
- Tier configuration:
  o Engine tier configured for parallel or grid processing. The conductor node is collocated with the other two tiers. Depending on whether a parallel processing or grid configuration is used, the compute nodes can be on the same computer or on dedicated computers. The following graphic depicts a parallel processing configuration with the conductor and compute nodes on different computers.
  o Services/Repository tiers on the same computer as the Engine conductor node.
- High availability technology:
  o Active/passive failover solution configured for the Engine conductor node and the Services/Metadata Repository tiers. The Engine compute nodes are not configured for high availability.

Figure 8. Topology with one computer for the Services tier, one computer for the Engine tier, one computer for failover, and one SAN

You can configure this topology in various ways depending on how the tiers and Engine nodes are configured. For example, you could place all tiers on the same computer, have a dedicated Engine tier computer, and decide whether you want the Engine conductor node and compute nodes collocated. This topology offers the combined benefits of the All tiers collocated solution and the All tiers collocated, including the Engine conductor node solution. This topology works well for InfoSphere Information Server deployments that require high efficiency at the Engine tier and some level of high availability where some downtime can be tolerated. You should not consider this topology if you do not expect the Engine tier to be exercised much. The downside to this topology is an increased complexity and cost due to the hardware and technical expertise that is required to configure and maintain this environment.

Table 11: Dimensions for a topology with parallel processing and grid Engine configurations

Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 2
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (this is the worst case: all tiers, including Engine compute nodes, are collocated); 2 or 3 (this is the best case: Services and Metadata Repository are on a dedicated computer, Engine tier is on a dedicated computer with compute nodes also on dedicated computers)

Advanced Topologies

Highly scalable Services tier

Characteristics:

- Hardware: Six or more computers
- Tier configuration: All tiers on dedicated computers
- High availability/scalability technology:
  o WebSphere Application Server Network Deployment cluster configured for the Services tier

Figure 9. Topology with one computer for each tier plus a WebSphere Application Server Network Deployment cluster configured for the Services tier

This topology is the first that introduces the WebSphere Application Server Network Deployment cluster technology. In this setup, the Services tier is configured as a WebSphere Application Server Network Deployment cluster, primarily as a means to increase the scalability of the Services tier rather than a means to increase the overall availability of the InfoSphere Information Server installation. Because the Engine and Metadata Repository tiers are not configured for high availability, this topology is not suited for environments that have high availability needs. High availability should be configured for all tiers or no tiers at all. This topology is suited for environments that exercise mostly the Services tier and have strong scalability requirements for that tier.

The complexity and hardware cost associated with this topology are high. A minimum of six computers are required:

- One computer for the Engine tier
- One computer for the Metadata Repository tier
- Three or more computers for the Services tier, configured as a WebSphere Application Server Network Deployment cluster (the WAS Deployment Manager and managed nodes are deployed on dedicated computers)
- One computer for the front-end dispatcher (typically a web server) that spreads requests to the WebSphere Application Server Network Deployment cluster (this computer is not depicted in the previous graphic)

Table 12: Dimensions for a topology with a highly scalable Services tier

Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 1 (only the Services tier is highly available)
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2 (for environments that stress mostly the Engine tier); 3 (for environments that stress mostly the Services tier)

Highly scalable Services tier with parallel processing and grid Engine configurations

Characteristics:

- Hardware: Six or more computers
- Tier configuration:
  o All tiers on dedicated computers
  o Engine tier configured for parallel or grid processing with the conductor node on a dedicated computer. Depending on whether parallel processing or grid processing is configured, the compute nodes can be on the same computer or dedicated computers.
- High availability/scalability technology:
  o WebSphere Application Server Network Deployment cluster configured for the Services tier

Figure 10. Topology with one computer for each tier, a WebSphere Application Server Network Deployment cluster configured for the Services tier, plus the conductor and compute nodes on a dedicated computer

This topology adds a parallel processing or grid Engine configuration to the Highly scalable Services tier topology. The end result is a topology that optimizes throughput and scalability at the Engine and Services tiers respectively, which is ideal for environments that have strong performance requirements. This topology does not improve high availability, and the complexity, hardware cost, and amount of technical expertise required are high.

Table 13: Dimensions for a topology with a highly scalable Services tier with parallel processing and grid Engine configurations

Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 1 (only the Services tier is highly available)
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 3

Advanced high availability

Characteristics:

- Hardware: Eight or more computers plus one SAN
- Tier configuration: All tiers on dedicated computers
- High availability technology:
  o Active/passive failover solution configured for the Engine tier
  o IBM DB2 HADR with Automatic Client Reroute configured for the Metadata Repository tier
  o WebSphere Application Server Network Deployment cluster configured for the Services tier

Figure 11. Topology with all tiers on dedicated computers, one SAN, and active/passive failover configured

This topology seeks to maximize the high availability capabilities of InfoSphere Information Server. All tiers are configured with the most advanced high availability solutions. The cost in hardware is high because a minimum of eight computers are required:

- Two computers for the Engine tier, configured with the active/passive failover solution
- Three or more computers for the Services tier, configured as a WebSphere Application Server Network Deployment cluster (the WAS Deployment Manager and managed nodes are deployed on dedicated computers)
- One computer for the front-end dispatcher (typically a web server) that spreads requests to the WebSphere Application Server Network Deployment cluster (this computer is not depicted in the previous graphic)
- Two computers for the Metadata Repository tier that runs on IBM DB2 HADR with the Automatic Client Reroute.

In addition to the hardware costs, this topology demands strong technical expertise and entails a great deal of complexity. You can adapt this topology as needed to meet available resources and specific requirements. For example, in an effort to reduce hardware cost, you can consolidate some of the tiers and components on fewer computers. You could also collocate some of the Services tier cluster nodes with the DB2 HADR instances of the Metadata Repository tier or collocate the front-end web server with one of the Services tier nodes. Similarly, if performance and scalability at the Services tier are required more than high availability, then you might consider configuring the Metadata Repository tier with the active/passive failover solution rather than with the IBM DB2 HADR technology. The Engine and Metadata Repository tiers can be consolidated on the same computers to mitigate the cost of two additional computers.
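The hardware arithmetic for this topology can be sketched in a few lines of Python. This is only an illustration: the component labels and the `consolidate` helper are hypothetical names, the baseline counts come from the list above, and the sketch assumes that a collocated component no longer needs any computer of its own.

```python
# Baseline computer counts for the Advanced high availability topology,
# taken from the list above. The dictionary keys are illustrative labels.
BASELINE = {
    "engine_active_passive": 2,    # Engine tier, active/passive failover pair
    "services_was_nd_cluster": 3,  # Deployment Manager plus managed nodes (minimum)
    "front_end_dispatcher": 1,     # typically a web server
    "repository_db2_hadr": 2,      # Metadata Repository tier on DB2 HADR
}


def total_computers(layout):
    """Return the number of computers a layout needs."""
    return sum(layout.values())


def consolidate(layout, guest, host):
    """Return a new layout in which `guest` runs on the computers of `host`.

    Assumption: once collocated, the guest component needs no computers
    of its own, so its count drops to zero.
    """
    if guest not in layout or host not in layout:
        raise KeyError("unknown component")
    merged = dict(layout)
    merged[guest] = 0
    return merged


baseline_total = total_computers(BASELINE)

# Collocate the front-end web server with one of the Services tier nodes.
step1 = consolidate(BASELINE, "front_end_dispatcher", "services_was_nd_cluster")

# Consolidate the Engine and Metadata Repository tiers on the same computers.
step2 = consolidate(step1, "repository_db2_hadr", "engine_active_passive")

print(baseline_total, total_computers(step1), total_computers(step2))  # prints: 8 7 5
```

Under these assumptions, the two consolidations reduce the minimum from eight computers to five. Whether the consolidated computers have enough capacity for the combined load is a separate sizing question that the sketch does not address.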

Table 14: Dimensions for a topology with eight or more computers, plus active/passive failover

Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 3
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2 (for environments that stress mostly the Engine tier); 3 (for environments that stress mostly the Services and Metadata Repository tiers)

Advanced high availability with parallel processing and grid Engine configurations

Characteristics:

- Hardware: Nine or more computers plus one SAN
- Tier configuration:
  o All tiers on dedicated computers
  o Engine tier configured for parallel or grid processing. Depending on whether a parallel processing or grid configuration exists, the compute nodes can be located on the same computer or dedicated computers. The following graphic depicts a parallel processing configuration with the conductor and compute nodes on different computers.
- High availability technology:
  o Active/passive failover solution configured for the Engine tier
  o IBM DB2 HADR with Automatic Client Reroute, configured for the Metadata Repository tier
  o WebSphere Application Server Network Deployment cluster, configured for the Services tier

Figure 12. Topology with all tiers on dedicated computers, one SAN, active/passive failover configured, and an Engine tier configured for parallel processing

This topology is the most complex because it adds a parallel processing or grid configuration to the Engine tier on top of the Advanced high availability topology. This topology supports great levels of high availability along with strong performance and scalability at the various tiers. The high complexity translates into a high cost in hardware and a strong requirement for technical expertise. As with the previous topology, numerous variations exist due to the number of combinations between the layouts of the tiers and their components. You can experiment with these combinations to meet specific needs, such as consolidating some of the tiers on a smaller set of computers to help reduce hardware costs.

Table 15: Dimensions for a topology with nine or more computers with parallel processing and grid Engine configurations

Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 3
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 3

Conclusion

Designing a well-rounded topology for InfoSphere Information Server is a complex task that is paramount to a successful deployment and customer experience. An erroneous or awkward design can lead to future complications, which are often difficult to rectify after the system is active. Designing a topology that meets your customer’s needs and accounts for available resources is critical to a successful implementation.

Following a methodical approach like the one described in this document helps to simplify the design process. You must conduct a thorough analysis and quantification of the resources and requirements. This step helps to measure what the system can handle and at what cost. Most importantly, a thorough analysis forces you to evaluate your constraints and remove any unrealistic expectations. Using this input data, you can begin planning your topology and pairing it with one of the well-known and proven topologies in this document.
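One way to make the pairing step concrete is a weighted score over the dimension ratings. The following Python sketch is an illustration only and is not part of the methodology in this document: the weights, the `score` and `rank` functions, the shortened topology names, and the choice of three candidate topologies are all assumptions. The ratings themselves are copied from Tables 10, 13, and 15, with higher values treated as more favourable.

```python
# Dimension ratings copied from Tables 10, 13, and 15
# (0 = least favourable, 3 = most favourable).
DIMENSIONS = ("hardware", "skills", "high_availability",
              "simplicity", "perf_small", "perf_large")

RATINGS = {
    "Dedicated Engine computer with failover": (2, 2, 2, 1, 3, 2),
    "Highly scalable Services tier with parallel Engine": (0, 0, 1, 0, 3, 3),
    "Advanced high availability with parallel Engine": (0, 0, 3, 0, 3, 3),
}


def score(ratings, weights):
    """Weighted sum of ratings; the weights are hypothetical priorities."""
    return sum(weights[d] * r for d, r in zip(DIMENSIONS, ratings))


def rank(weights):
    """Return topology names sorted from highest to lowest score."""
    return sorted(RATINGS, key=lambda name: score(RATINGS[name], weights),
                  reverse=True)


# Example priorities (assumed): a small team with a tight budget.
budget_first = {"hardware": 3, "skills": 3, "high_availability": 1,
                "simplicity": 2, "perf_small": 1, "perf_large": 0}

# Example priorities (assumed): availability and large-scale throughput.
availability_first = {"hardware": 0, "skills": 0, "high_availability": 3,
                      "simplicity": 0, "perf_small": 0, "perf_large": 3}

print(rank(budget_first)[0])
print(rank(availability_first)[0])
```

With these example weights, the budget-oriented profile favours the Dedicated Engine computer topology with failover, and the availability-oriented profile favours the Advanced high availability topology with parallel processing. A score of this kind can help shortlist candidates, but it does not replace the analysis of resources and requirements described above.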

Appendix: Topology scorecards

The following tables summarize the various topologies and their dimension ratings that are covered in this document. For more details on each topology, refer to the corresponding topology section.

Table 16: Basic topologies

Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
Single computer | Very low | Very low | Very poor | Very good | Very good | Moderate to good
Dedicated Engine computer | Low | Very low | Very poor | Good | Very good | Good
Dedicated computer for each tier | Moderate | Low | Very poor | Good | Good to very good | Good

Table 17: Parallel processing and grid engine configurations

Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
All tiers collocated, including the Engine conductor node | Moderate | Moderate | Very poor | Very poor | Very good | Good to very good
Dedicated computer for Engine conductor node | Moderate | Moderate | Very poor | Very poor | Very good | Good to very good

Table 18: Basic high availability topologies

Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
All tiers collocated | Low | Low | Good | Moderate | Very good | Moderate to good
Dedicated Engine computer | Low | Low | Good | Moderate | Very good | Good
Parallel processing and grid Engine configurations | High | High | Good | Poor | Very good | Moderate to very good

Table 19: Advanced topologies

Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
Highly scalable Services tier | High | Very high | Poor | Very poor | Good to very good | Good to very good
Highly scalable Services tier with parallel processing and grid Engine configurations | High | Very high | Poor | Very poor | Very good | Very good
Advanced high availability | Very high | Very high | Very good | Very poor | Good to very good | Good to very good
Advanced high availability with parallel processing and grid Engine configurations | Very high | Very high | Very good | Very poor | Very good | Very good

Further reading

- Understanding tiers and components [InfoSphere Information Server, Version 8.5 information center] http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/topics/wsisinst_pln_configurations.html
- High Availability [Wikipedia] http://en.wikipedia.org/wiki/High-availability_cluster
- Planning for the installation of IBM InfoSphere Information Server [InfoSphere Information Server, Version 8.5 information center] http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.productization.iisinfsv.migrate.doc/topics/wsismig_migrate_intro.html
- Parallel processing and grid topologies [InfoSphere Information Server, Version 8.5 information center] http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.productization.iisinfsv.overview.doc/topics/cisoarchparalov.html
- Oracle RAC [Wikipedia] http://en.wikipedia.org/wiki/Oracle_RAC

Contributors

The document authors would like to recognize the following individuals for their feedback on this document and their contributions to this topic:

- Robert Johnston, Application Architect, Information Management Data Integration for InfoSphere DataStage
- Ernie Ostic, Specialist, Worldwide Sales
- Sriram Padmanabhan, Distinguished Engineer, Information Management; Chief Architect for InfoSphere Information Server

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

Without limiting the above disclaimers, IBM provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any recommendations or techniques herein is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environment do so at their own risk.

This document and the information contained herein may be used solely in connection with the IBM products discussed in this document.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.