
Dell Product Group Page 1 Updated May 2006

INFOBrief Dell PowerEdge Server

Performance Computing Clusters

Key Points

• The third generation of Dell’s High Performance Computing Cluster (HPCC) provides compute-intensive capacity by leveraging the latest technology available in the market.

• Dell™ PowerEdge™ Server-based HPC clusters include Dell’s primary 1U rack and blade server platforms, such as the PowerEdge 1950, PowerEdge SC1425 and PowerEdge 1855, for general purpose and high performance computing environments, where reliability, raw performance and low cost are the most important factors in choosing a compute server. In addition, the PowerEdge 2950 server can be configured as the master node for customers needing greater I/O and internal storage capacity than the PowerEdge 1950 provides.

High Performance Computing Clusters (HPCC) offer a cost-effective, scalable solution for parallel computing system platforms designed for demanding, compute-intensive applications.

• HPCC is a cost effective method for delivering a parallel computing system platform, targeted towards compute- and data-intensive applications.

• Through Dell HPCC, users can aggregate standards-based servers and storage resources into powerful supercomputers to provide an inexpensive yet powerful solution.

• High Performance Computing Clusters (HPCCs) are a popular way to solve complex computational problems because of their low price points and excellent scalability.

• Dell helps provide investment protection by offering solutions based on industry standard building blocks that can be re-deployed as traditional application servers as users integrate newer technology into their network infrastructures.

• Dell delivers high-volume, standards-based solutions into scientific and compute-intensive environments that can benefit from economies-of-scale, and add systems as requirements change.

• Dell’s technology and methodology are designed to provide high reliability, price/performance leadership, easy scalability and simplicity by bundling order codes for hardware, software and support services for clusters of 8 to 256 nodes.

Product Description

The concept of HPCC or “Beowulf” (the project name used by original designers) clusters originated at the Center of Excellence in Space Data and Information Sciences (CESDIS), located at the NASA Goddard Space Flight Center in Maryland. The project’s goal was to design a cost-effective, parallel computing cluster built from off-the-shelf components that would satisfy the computational requirements of the earth and space sciences community. As cluster solutions have gained acceptance for solving complex computing problems, High Performance Computing Clusters (HPCC) are starting to replace supercomputers in this role. The cost of commodity HPCC systems has changed a purchase decision from evaluating expensive proprietary solutions, where cost was not the primary issue, to evaluating vendors based on their ability to deliver exceptional price-to-performance ratios and support capabilities.

[Figure: cluster diagram showing compute nodes, a master node, and external storage]

Logical View of a High Performance Computing Cluster

The strategy behind parallel computing is to “divide and conquer.” By dividing a complex problem into smaller component tasks that can be worked on simultaneously, the problem can often be solved more quickly. This can help save time and resources, as well as monetary costs. Dell’s HPCC uses a multi-computer architecture, as depicted in Figure 1. It features a parallel computing system that consists of one master node and multiple compute nodes connected via standard network interconnects. All of the server nodes in a typical HPCC run an industry standard operating system, which typically offers substantial savings over proprietary operating systems.

The master node of the cluster acts as the Network File System (NFS) server, handles job scheduling and security, and serves as the gateway for end users. As a larger job is broken into sub-tasks, the master node assigns one or more of them to each compute node. As a gateway, the master node allows users to gain access to the compute nodes.

The sole task of the compute nodes is to execute assigned tasks in parallel. A compute node does not have a keyboard, mouse, video card, or monitor. Access to compute nodes is provided via remote connections through the master node.
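The division of labor between the master node and the compute nodes can be illustrated with a minimal Python sketch. It is not Dell's or Platform's scheduler: threads on a single machine stand in for compute nodes, in-memory queues stand in for the network, and the names `master`, `compute_node`, and the sum-of-squares workload are all hypothetical choices made for illustration.

```python
import queue
import threading


def compute_node(node_id, tasks, results):
    """Stand-in for a compute node: run assigned sub-tasks until told to stop."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel from the master: no more work
            break
        start, stop = task
        # Hypothetical workload: sum of squares over the assigned range.
        results.put((node_id, sum(i * i for i in range(start, stop))))


def master(n, num_nodes=4, chunk=1000):
    """Stand-in for the master node: split the job, hand out sub-tasks, combine."""
    tasks, results = queue.Queue(), queue.Queue()
    nodes = [
        threading.Thread(target=compute_node, args=(i, tasks, results))
        for i in range(num_nodes)
    ]
    for node in nodes:
        node.start()

    # Divide the large problem into smaller, independent sub-tasks.
    num_tasks = 0
    for start in range(0, n, chunk):
        tasks.put((start, min(start + chunk, n)))
        num_tasks += 1

    # One stop signal per compute node, then wait for all of them to finish.
    for _ in nodes:
        tasks.put(None)
    for node in nodes:
        node.join()

    # Combine the partial results into the final answer.
    return sum(results.get()[1] for _ in range(num_tasks))


if __name__ == "__main__":
    print(master(10_000))
```

On a real cluster the queues would be replaced by a job scheduler and a message-passing library, but the shape of the flow (split, assign, execute in parallel, combine) is the same.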

From a user’s perspective, an HPCC appears as a Massively Parallel Processor (MPP) system. Users typically access the master node either directly or through Telnet or remote login from personal workstations. Once logged onto the master node, users can prepare and compile their parallel applications and spawn jobs on a desired number of compute nodes in the cluster.

Figure 1: Logical View of High Performance Computing Cluster
[Figure labels: Parallel Applications; Message Passing Library; Cluster Management Tools; Linux; Master Node (File Server/gateway); Compute nodes]

In addition to compute nodes and master nodes, key components of HPCC include: systems management utilities, applications, file systems, interconnects, and storage and software solution stacks.

• Dell OpenManage Systems Management: Because HPCC systems can consist of many nodes, sometimes thousands within one cluster, it is important to be able to monitor and manage these nodes from a single console. To help manage such a sizable cluster, Dell OpenManage systems management utilities are designed to provide system discovery, event filtering, systems monitoring, proactive alerts, inventory and asset management, as well as remote manageability for the compute nodes and master nodes.

DRAC

• The Dell Remote Access Controller (DRAC) is a systems management hardware and software solution designed to provide remote management capabilities, crashed system recovery, and power control functions for Dell PowerEdge systems.

• By communicating with the system’s baseboard management controller (BMC), the DRAC can be configured to send email alerts for warnings or errors related to voltages, temperatures, and fan speeds. The DRAC also logs event data and the most recent crash screen (for systems running the Microsoft® Windows® operating system only) to help diagnose the probable cause of a system crash.

The DRAC has its own microprocessor and memory, and is powered by the system in which it is installed. The DRAC may be preinstalled on customer systems, or available separately in a kit.

BMC

In addition, the integrated baseboard management controller (BMC) for system monitoring and management is IPMI 2.0 compliant.

The BMC enables multi-vendor server management by standardizing management hardware, monitoring, alerting and communications. Key features include:

• Proactive monitoring of server hardware.
• Alerting on potential and actual faults.
• Network and serial access to control server power and reset.
• Continuous operation regardless of server status.
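Because the BMC is IPMI 2.0 compliant, network power control can in principle be driven by any IPMI client. The sketch below assumes the open-source `ipmitool` utility, which this brief does not mention, and only builds the command line rather than running it. The host name and user name are hypothetical; the `-E` flag tells `ipmitool` to read the password from the `IPMI_PASSWORD` environment variable so it does not appear on the command line.

```python
import shlex

# Chassis power actions accepted by this sketch (a subset of what ipmitool offers).
ALLOWED_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(host, user, action="status"):
    """Build an ipmitool command for IPMI 2.0 (lanplus) chassis power control."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",  # IPMI v2.0 LAN interface
        "-H", host,       # BMC address of the target node
        "-U", user,       # BMC user name
        "-E",             # read the password from IPMI_PASSWORD
        "chassis", "power", action,
    ]


if __name__ == "__main__":
    # Hypothetical BMC host names for the first three compute nodes.
    for n in range(1, 4):
        cmd = ipmi_power_command(f"compute-{n:02d}-bmc", "admin")
        print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would execute it; the sketch stops at printing so that it can be inspected safely on a machine without BMC access.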


• Applications: Applications may be written to run in parallel on multiple systems and use the message-passing programming model. Jobs of a parallel application are spread across compute nodes, which work collaboratively until the jobs are complete. During execution, compute nodes use standard message-passing middleware to coordinate activities and exchange information.

• Parallel File System: A scalable parallel file system is used as a high-performance, large file system for temporary storage and as an infrastructure for parallel I/O research. A parallel file system stores data on the existing local file systems of multiple cluster nodes, enabling many clients to access the data simultaneously. Within an HPC cluster, a parallel file system enables high-performance I/O that is comparable to that of proprietary file systems. Dell’s bundles are available with the IBRIX file system.

• Interconnect: To communicate with each other, the cluster nodes are connected through a network. The interconnect technology chosen depends on the amount of interaction between nodes when an application is executed. Some applications are similar to batch environments, and the communication between compute nodes is limited. For these environments, Fast Ethernet may be adequate. However, in environments that require more frequent communication, a Gigabit Ethernet interconnect is preferable.

Some applications can also benefit from special interconnects that have been designed to provide high-speed, low-latency communication between the compute nodes. For these applications, Dell’s bundles are available with Myricom’s Myrinet and Cisco’s InfiniBand products.

• High Performance Computing Cluster Solution Stack: Dell partners with Platform Computing to deliver the HPCC software stack. The Platform Rocks stack includes the job scheduler, cluster management, message passing libraries, and compilers. The IBRIX file system is supported.
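The interconnect trade-off described above can be made concrete with a simple first-order model: the time to deliver one message is roughly a fixed latency plus the message size divided by the link bandwidth. The sketch below uses the nominal line rates of Fast Ethernet (100 Mb/s) and Gigabit Ethernet (1000 Mb/s); the 50 microsecond latency is an assumed placeholder, not a measured figure, and the model ignores protocol overhead and contention.

```python
def transfer_time_s(message_bytes, bandwidth_mbps, latency_us):
    """First-order estimate: fixed latency plus serialization time on the link."""
    return latency_us * 1e-6 + (message_bytes * 8) / (bandwidth_mbps * 1e6)


# Nominal line rates in Mb/s; the latency value below is an assumption.
INTERCONNECTS = {
    "Fast Ethernet": 100,
    "Gigabit Ethernet": 1000,
}
ASSUMED_LATENCY_US = 50

if __name__ == "__main__":
    for size in (64, 1_000_000):  # a small control message and a 1 MB block
        for name, mbps in INTERCONNECTS.items():
            t = transfer_time_s(size, mbps, ASSUMED_LATENCY_US)
            print(f"{name:17s} {size:>9d} bytes: {t * 1e3:.4f} ms")
```

Under this model, large transfers scale with bandwidth, while small, frequent messages are dominated by latency, which is the case the low-latency interconnects are aimed at.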

High Performance Computing Market

Target markets for high performance computing clusters are: higher education, large corporations, federal government, and technology sectors that require high performance computational computing. Industry examples include: oil and gas, aerospace, automotive, chemistry, national security, financial and pharmaceutical.


Typical high computation applications include: war and airline simulations, financial modeling, molecular modeling, fluid dynamics, circuit board design, ocean flow analysis, seismic data filtering, and visualizations. Applications that use HPC clusters and their specific vertical markets can be found in Table 1.

Table 1 Vertical Markets Appropriate for HPCC

| Vertical | Description of Requirements | Typical Applications |
|---|---|---|
| Manufacturing | Crashworthiness, stress analysis, shock and vibration, aerodynamics | Fluent, Radioss, Nastran, Ansys, PowerFLOW |
| Energy | Seismic processing, geophysical modeling, reservoir modeling | VIP, Eclipse, Veritas |
| Life Sciences | Drug design, bioinformatics, DNA mapping, disease research | BLAST, CHARMM, NAMD, PC GAMESS, Gaussian |
| Digital Media | Render farms | RenderMan, Discreet |
| Finance | Portfolio management (Monte Carlo simulation), risk analysis | Barra, RMG, SunGard |

Dell’s bundled HPCC solutions target customers with varying levels of expertise, from complete turnkey solutions (including hardware and software) to easy-to-order hardware-only bundles. For those who do require a complete solution, Dell also offers consulting assistance and implementation services.

Features and Benefits

New! Dual-core Xeon processors, up to 32GB of fully-buffered DIMM memory, PCI-Express™ I/O slots, and a choice of Serial Attached SCSI (SAS) or SATA drives

The Dell High Performance Computing Cluster leverages many advantages of Dell’s product line, including server, storage, peripheral, and services components. By creating standard product offerings, Dell solutions are designed to help minimize configuration complexity. These standard packages consist of 8, 16, 32, 64, 128 and 256 node configurations. Figure 2 provides an example of one of the Dell cluster bundles based on the PE1950 servers. Customers investigating larger cluster configurations should contact their Dell sales representative for assistance.


Figure 2: PowerEdge 1950, EM64T, Infiniband 256-node Cluster

The key technology features of a Dell High Performance Computing cluster configuration are shown in Table 2.


Table 2 The Key Technology Features of a Dell HPCC Configuration

| Feature | Function | Benefit |
|---|---|---|
| Full featured hardware configurations | Pre-bundled order codes for 8, 16, 32, 64, 128 and 256 node configurations for 1U server clusters. Pre-bundled order codes for 10, 20, 40, 70, 130 and 260 node configurations for blade server clusters | Simplified ordering process and pre-qualified configurations |
| Servers - Master Node Options | PE1950, PE1850, PE2950 (PE2950 can be configured with up to 1.8TB of internal storage and 6 total I/O channels; available as master node alternative for compute clusters running PE1950 servers) | Flexibility to increase storage capacity on compute node |
| Servers - Compute Node Options | 1U Servers: PE1950, PE1850, and SC1425. Blade Servers: PE1855 | High performance compute node for the most challenging applications. High density enables large clusters in a rack |
| Servers - I/O Node Options | PE1950, PE1850 | Helps to minimize I/O bottlenecks |
| Storage Device Options | PowerVault™ 220S SCSI external storage device on the master node for primary storage (available with 8G servers). PowerVault MD1000 SAS (available with 9G master node as primary storage). Dell/EMC CX300, CX500, CX700 Fibre Channel arrays on the master node for primary storage | Provides a cost effective method for a large amount of external storage capabilities that can be allocated across multiple channels for maximized I/O performance |
| Headless Operation | The ability to operate a system without keyboard, video or mouse (KVM) | Simplifies cable management and helps lower cost of solution by eliminating monitors, keyboards and mice |
| Operating System software pre-install | Factory installation of Red Hat Linux operating system | Facilitates setup of cluster configuration |
|  | Provides the capability to remotely power on compute nodes over the Ethernet network | Remote management tool that can reduce system management workload, provide flexibility to the system administrator's job, and help save time-consuming effort and costs |
| HPCC Software Solution Stack | Platform ROCKS; LAVA and LSF job schedulers; MKL libraries, BLAS, ATLAS, MPI libraries; NFS and optional IBRIX file systems | Dell tested tools for creating system environment for parallel computing infrastructure |
| Server Management | Dell OpenManage | Dell server monitoring and management tool |


Key Customer Benefits

The performance of commodity computers and network hardware continually improves as new technology is introduced and implemented. At the same time, market conditions have led to decreases in the price of these components. As a result, it is now practical to build parallel computational systems based on low-cost, high-density servers, such as the Dell PowerEdge 1950, rather than buy CPU time on expensive supercomputers. Dell PowerEdge servers are tuned to take advantage of the existing server/OS/application combination. Dell PowerEdge server performance and price/performance are typically among the industry leaders on a variety of benchmark standard scales (TPC-C, TPC-W; SPECfp).

Low cost and high performance are only two of the advantages of using a Dell High Performance Computing Cluster solution. Other key benefits of HPCC versus large Symmetric Multi Processors (SMP) are shown in Table 3.

Table 3 Comparison of SMP and HPCC Environments

| Feature | Large SMPs | HPCC |
|---|---|---|
| Scalability | Fixed | Unbounded |
| Availability | High | High |
| Ease of Technology Refresh | Low | High |
| Application Porting | None | Required |
| Operating System Porting | Difficult | None |
| Service and Support | Expensive | Affordable |
| Standards vs. Proprietary | Proprietary | Standards |
| Vendor Lock-in | Required | None |
| System Manageability | Custom; better usability | Standard; moderate usability |
| Application Availability | High | Moderate |
| Reusability of Components | Low | High |
| Disaster Recovery Ability | Weak | Strong |
| Installation | Non-standard | Standard |


The features compared in Table 3 are defined as follows:

• Scalability: The ability to grow in overall capacity and to meet higher usage demand as needed. When additional computational resources are needed, servers can be added to the cluster. Clusters can consist of thousands of servers.

• Availability: The access to compute resources. To help ensure high availability, it is necessary to remove any single point of failure in the hardware and software. This helps to ensure that any individual system component, the system as a whole, or the solution (i.e., multiple systems) stays continuously available. An HPCC solution offers high availability because the components can be isolated and, in many cases, the loss of a compute node in the cluster does not have a large impact on the overall cluster solution. The workload of that node is allocated among the remaining compute nodes.

• Ease of Technology Refresh: Integrating a new processor, memory, disk, or operating system technology can be accomplished with relative ease. In HPCC, as technology moves forward, modular pieces of the solution stack can be replaced as time, budget and needs require or permit. There is no need for a one-time “switch-over” to the latest technology. In addition, new technology is often integrated more quickly into standards-based volume servers than into proprietary systems.

• Service and Support: Total cost of ownership, including the post-sales costs of maintaining the hardware and software (from standard upgrades to unit replacement to staff training and education), is generally much lower than for proprietary implementations, which typically require a high level of technical services because of their inherent complexity and sophistication.

• Vendor Lock-in: Proprietary solutions require a commitment to a particular vendor, whereas industry-standard implementations are interchangeable. Many proprietary solutions accept only components that have been developed by that vendor. Depending on the revision and technology, application performance may be diminished. HPCC enables solutions to be built from the best-performing industry standard components.

• System Manageability: System management is the installation, configuration and monitoring of key elements of computer systems, such as hardware, operating system and applications. Most large SMPs have proprietary enabling technologies (custom hardware extensions and software components) that can complicate system management. On the other hand, it is easier to manage one large system than hundreds of nodes. However, with wide deployment of network infrastructure and enterprise management software, it is possible to easily manage multiple servers of an HPCC system from a single point.


• Reusability of Components: Commodity components can be reused when off line, therefore preserving a customer’s investment. In the future, when refreshing a Dell HPCC PowerEdge solution with next generation platforms, the older Dell PowerEdge compute nodes can be deployed as File/Print servers, Web Servers or other infrastructure servers.

• Installation: Specialized equipment generally requires expert installation teams trained to handle such cases. They also require dedicated facilities such as power, cooling, etc. For HPCC, since the components are “off-the-shelf” commodities, installation is generic and widely supported.
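The availability point above, that a failed compute node's workload is allocated among the remaining nodes, can be sketched in a few lines. This is an illustration only: the round-robin policy, the function name `reassign_after_failure`, and the node and task names are assumptions, and a production job scheduler would apply its own policy.

```python
def reassign_after_failure(assignments, failed_node):
    """Spread the failed node's tasks round-robin across the surviving nodes.

    `assignments` maps a node name to the list of tasks it currently holds.
    Returns a new mapping and leaves the input unchanged.
    """
    survivors = sorted(node for node in assignments if node != failed_node)
    if not survivors:
        raise RuntimeError("no compute nodes left to take over the workload")

    # Copy the survivors' task lists so the caller's data is not modified.
    updated = {node: list(assignments[node]) for node in survivors}
    for i, task in enumerate(assignments.get(failed_node, [])):
        updated[survivors[i % len(survivors)]].append(task)
    return updated


if __name__ == "__main__":
    # Hypothetical cluster state: three nodes, two tasks each.
    cluster = {
        "node1": ["a", "b"],
        "node2": ["c", "d"],
        "node3": ["e", "f"],
    }
    print(reassign_after_failure(cluster, "node2"))
```

Adding a node is the mirror image of this operation, which is one way to read the scalability row of Table 3: capacity changes by editing the node list rather than by replacing the system.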

Hardware Options

The High Performance Computing Cluster configurations can be enhanced in the following ways:

• Increased memory in the compute nodes
• Increased internal HDD storage capacity in the compute nodes
• Increased external storage on the master node
• Additional NICs for the compute nodes and master node
• Through Dell Professional Services’ recommendations on faster interconnect technologies

Related Web Sites

http://www.dell.com/hpcc

http://www.rocksclusters.org

http://www.beowulf.org/

Service and Support

Dell HPCC systems come with the following:

• Three-year limited warranty² and three years of standard Next Business Day (NBD) parts replacement and one year of NBD on-site³ labor
• 30-day “Getting Started” help line
• DirectLine network operating system support upgrades available with three-year limited warranty²
• Telephone support 24 hours a day, 7 days a week, 365 days a year for the duration of the three-year limited warranty²

Dell Professional Services offers additional services to assist in:

• Solution Design
• Consultation
• Installation and Setup
• Pre-staging of solution at off-site location

¹ This term indicates compliance with IEEE standard 802.3ab for Gigabit Ethernet, and does not connote actual operating speed of 1 Gb/sec. For high speed transmission, connection to a Gigabit Ethernet server and network infrastructure is required.

² For a copy of our Guarantees or Limited Warranties, please write Dell USA, L.P., Attn: Warranties, One Dell Way, Round Rock, TX 78682. For more information, visit www.dell.com/service_contracts.

³ Service may be provided by third party. Technician will be dispatched if necessary following phone-based troubleshooting. Subject to parts availability, geographical restrictions and terms of service contract. Service timing dependent upon time of day call placed to Dell. U.S. only.

Dell, OpenManage, PowerVault and PowerEdge are trademarks of Dell Computer Corporation. Microsoft and Windows NT are registered trademarks of Microsoft Corporation. Intel is a registered trademark of Intel Corporation. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. Dell - Platform ROCKS contains open source ROCKS developed by the San Diego Supercomputer Center.

©Copyright 2006 Dell Inc. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell is strictly forbidden. For more information contact Dell. Dell cannot be responsible for errors in typography or photography.