Page 1: Xtw01t11v0901 troubleshooting

© 2006 IBM Corporation

This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers.

IBM Systems & Technology Group Education & Sales Enablement © 2008 IBM Corporation

System x Basic Troubleshooting

XTW01

Topic 11

Page 2: Xtw01t11v0901 troubleshooting

Course Objectives

At the completion of this topic, you should be able to:

> Identify basic troubleshooting questions to consider

> Identify the six possible states of a system

> Identify diagnostic tools that are available to gather and analyze information for each given system state

Page 3: Xtw01t11v0901 troubleshooting

> * IBM System x Troubleshooting Questions *

> Six System States

> Data Gathering Diagnostic Tools

Light Path Diagnostic

BMC, RSA and AMM

Dynamic System Analysis (DSA)

Topic 11 - Course Agenda

Page 4: Xtw01t11v0901 troubleshooting

When working with problems on the System x servers, consider asking the following questions:

> Will the system power up?

> Did it ever power up?

> Is there a POST error message?

> If yes, what is it?

> Does the NOS load?

> Are any error lights illuminated?

> Is the BMC configured for remote access?

> Is an RSA II or AMM installed?

> Can the log be captured for analysis?

Questions To Ask

Troubleshooting IBM System x Servers

Page 5: Xtw01t11v0901 troubleshooting

> IBM System x Troubleshooting Questions

> * Six System States *

> Data Gathering Diagnostic Tools

Light Path Diagnostic

BMC, RSA and AMM

Dynamic System Analysis (DSA)

Topic 11 - Course Agenda

Page 6: Xtw01t11v0901 troubleshooting

IBM System x - Six States PD

> Identifying the Six System States

[Figure: boot sequence with stages AC, AC/DC, POST, NOS (Start, Complete, Stop)]

System state #1 - There is no AC power

System state #2 - There is AC power but there is no DC output

System state #3 - There is both AC and DC power but the system fails to complete POST

System state #4 - There is both AC and DC power, the system completes POST but the NOS fails to start loading

System state #5 - There is both AC and DC power, the system completes POST but the NOS fails to complete loading

System state #6 - There is both AC and DC power, the system completes POST and the NOS completes loading but stops during operation
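
The six states form an ordered sequence of checkpoints, so a system's state can be found by locating the first checkpoint it fails. The following is a minimal Python sketch of that idea. The function name and the boolean parameters are illustrative assumptions and are not part of any IBM tool.

```python
def classify_system_state(ac_power, dc_output, post_complete,
                          nos_started, nos_loaded, runs_normally):
    """Return the system state number (1-6), or None if no fault is seen.

    Each argument is a bool answering one checkpoint in boot order.
    The parameter names are hypothetical, chosen for this sketch.
    """
    checkpoints = [
        ac_power,        # state 1: no AC power
        dc_output,       # state 2: AC power but no DC output
        post_complete,   # state 3: fails to complete POST
        nos_started,     # state 4: NOS fails to start loading
        nos_loaded,      # state 5: NOS fails to complete loading
        runs_normally,   # state 6: NOS stops during operation
    ]
    # The first failed checkpoint determines the state.
    for state, passed in enumerate(checkpoints, start=1):
        if not passed:
            return state
    return None
```

For example, a server that has AC and DC power but hangs during POST would be classified by `classify_system_state(True, True, False, False, False, False)` as state 3.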

Page 7: Xtw01t11v0901 troubleshooting

Information Gathering and Analysis Tools

Information Gathering:

> Eyes and ears

> HMM and PDSG

> Light Path diagnostics

> BMC

> RSA

> Boot sequence options
  F1 setup, F2 diagnostics
  Adapter BIOS messages

> NOS start-up messages

> NOS failure messages

> Dynamic System Analysis

> NOS event logs

Information Analysis:

> HMM and PDSG

> Light Path diagnostics

> BIOS messages
  Checkpoint codes
  Adapter BIOS warnings

> SvcCon, SMBridge, F1 setup and F2 diagnostics
  Access BMC event logs

> Web browser
  Access RSA event logs

> RETAIN tips

> IBM Support Web site

> DSA

Page 8: Xtw01t11v0901 troubleshooting

State 1 - No AC Power

System state: 1. There is no AC power
Data gathering: Visual
Data analysis: PDSG/HMM

Page 9: Xtw01t11v0901 troubleshooting

State 2 - AC Power But No DC Output

System state: 2. There is AC power but no DC output
Data gathering: BMC; RSA and AMM; Light path
Data analysis: SvcCon, SMBridge; RSA and AMM event log

Page 10: Xtw01t11v0901 troubleshooting

State 3 - System Fails To Complete POST

System state: 3. There is AC and DC power but the system fails to complete POST
Data gathering: Checkpoint codes; F1 and F2; Beep codes; Adapter BIOS messages (Adaptec, LSI, etc.)
Data analysis: PDSG; RETAIN tips; IBM support Web site

Page 11: Xtw01t11v0901 troubleshooting

State 4 - System Completes POST But NOS Fails To Start Loading

System state: 4. There is AC and DC power, the system completes POST but the NOS fails to start loading
Data gathering: ServeRAID Manager; F2 diagnostics
Data analysis: PDSG; RETAIN tips; F2 diagnostics

Page 12: Xtw01t11v0901 troubleshooting

State 5 - NOS Fails To Complete Loading

System state: 5. There is AC and DC power, the system completes POST but the NOS fails to complete loading
Data gathering: NOS boot messages; ‘Blue screen’
Data analysis: ‘Safe’ mode; NOS vendor messages

Page 13: Xtw01t11v0901 troubleshooting

State 6 - NOS Loads But Stops During Normal Operations

System state: 6. There is AC and DC power, the system completes POST and the NOS completes loading but stops during operation
Data gathering: DSA; NOS event logs
Data analysis: DSA
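
The state-to-tool table on these slides can be treated as a simple lookup. The sketch below transcribes it into a Python dictionary. The names `TOOLS_BY_STATE` and `tools_for_state` are hypothetical, and the split between the gathering and analysis columns is an assumption wherever the slides do not make the column boundary explicit (notably for states 4 and 5).

```python
# Tools per system state, transcribed from the table on these slides.
# The gathering/analysis split is an assumption where the slide layout
# does not make the column boundary explicit.
TOOLS_BY_STATE = {
    1: {"gathering": ["Visual"],
        "analysis": ["PDSG/HMM"]},
    2: {"gathering": ["BMC", "RSA and AMM", "Light path"],
        "analysis": ["SvcCon, SMBridge", "RSA and AMM event log"]},
    3: {"gathering": ["Checkpoint codes", "F1 and F2", "Beep codes",
                      "Adapter BIOS messages"],
        "analysis": ["PDSG", "RETAIN tips", "IBM support Web site"]},
    4: {"gathering": ["ServeRAID Manager", "F2 diagnostics"],
        "analysis": ["PDSG", "RETAIN tips"]},
    5: {"gathering": ["NOS boot messages", "'Blue screen'"],
        "analysis": ["'Safe' mode", "NOS vendor messages"]},
    6: {"gathering": ["DSA", "NOS event logs"],
        "analysis": ["DSA"]},
}


def tools_for_state(state):
    """Return (gathering, analysis) tool lists for a state number 1-6."""
    if state not in TOOLS_BY_STATE:
        raise ValueError(f"unknown system state: {state!r}")
    entry = TOOLS_BY_STATE[state]
    # Return copies so callers cannot modify the shared table.
    return list(entry["gathering"]), list(entry["analysis"])
```

Calling `tools_for_state(2)` returns the gathering tools for a system with AC power but no DC output, followed by the matching analysis tools.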

Page 14: Xtw01t11v0901 troubleshooting

Gathering Information - Tip

If multiple sources are available, look for confirmations

> Two sources pointing at the same probable cause increase confidence in the information

> Two sources pointing at different probable causes reduce confidence in the information
  Search for a third source to clarify the information being presented
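
The confirmation rule above can be sketched as a small tally over what each source reports. This is an illustration only: the function name, the return strings, and the example source and cause names are hypothetical.

```python
from collections import Counter


def assess_sources(probable_causes):
    """Summarise agreement between diagnostic sources.

    probable_causes maps a source name (for example "Light path") to the
    probable cause it points at. Returns (cause, advice), where cause is
    None unless one cause is backed by at least two sources and no other
    cause has as many.
    """
    counts = Counter(probable_causes.values())
    if not counts:
        return None, "no information gathered yet"
    ranked = counts.most_common()
    cause, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes >= 2 and not tied:
        return cause, "confirmed by multiple sources"
    if len(counts) == 1:
        return None, "single source only; look for a confirmation"
    return None, "sources disagree; search for another source"
```

With two sources that disagree, the function returns no cause. Adding a third source that agrees with one of them breaks the tie and returns that cause.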

Page 15: Xtw01t11v0901 troubleshooting

Analyzing Information - Tip

Formal reference points are proven

> RETAIN tips are based on factual evidence from previous case histories

> The PDSG is based on the collective knowledge of the system designers and senior support teams

Guessing is NOT an option

> If the information is unclear, seek help

Experience is very valuable

> Consult with team members if you are unsure of what the information is telling you

> Offer guidance to less experienced co-workers

Page 16: Xtw01t11v0901 troubleshooting

> IBM System x Troubleshooting Questions

> Six System States

> Data Gathering Diagnostic Tools

* Light Path Diagnostic *

BMC, RSA and AMM

Dynamic System Analysis (DSA)

Topic 11 - Course Agenda

Page 17: Xtw01t11v0901 troubleshooting

Light Path Diagnostics

> Allows quick diagnosis of any type of server error
  Introduced in 1998, now included in most System x, BladeCenter, and Blade Servers

> Level 1 – Drop-down panel containing system status LEDs
  LEDs that correspond to major server components
  Includes Remind and Reset buttons

> Level 2 – LED identifying suspect component
  LEDs placed throughout server next to individual server components
  Even without power to server, can be used for up to 12 hours

[Figures: pop-out Operator Information Panel; blade server front panel LEDs]

Page 18: Xtw01t11v0901 troubleshooting

> IBM System x Troubleshooting Questions

> Six System States

> Data Gathering Diagnostic Tools

Light Path Diagnostic

* BMC, RSA and AMM *

Dynamic System Analysis (DSA)

Topic 11 - Course Agenda

Page 19: Xtw01t11v0901 troubleshooting

IBM Systems Management Hardware Portfolio

Mini-BMC (Mini Baseboard Management Controller)
• IPMI 1.5 compliant
• Monitor voltages, temps, battery
• Drive system LEDs except Light Path
• Power control, system reset, and reboot
• Used in value servers

BMC (Baseboard Management Controller)
• Same features as mini-BMC plus the following:
• IPMI 1.5 or 2.0 compliant, depending on system
• Serial over LAN (SOL)
• Drives Light Path
• On all but value servers

Remote Supervisor Adapter
• Web interface and full SSL and other security module integrations
• LDAP integration for authentication
• Remote KVM support
• Remote disk support
• DNS, DHCP, SNMP, SLP
• Standard in select servers and optional for most other servers in portfolio

BladeCenter Advanced Management Module
• Web interface and full SSL and other security module integrations
• LDAP integration for authentication
• Remote KVM support
• Remote disk support
• DNS, DHCP, SNMP, SLP
• USB Virtualization
• With concurrent capable blade:
  Concurrent KVM capable
  Concurrent Remote Drive capable
  Concurrent Media Tray capable

Page 20: Xtw01t11v0901 troubleshooting

> IBM System x Troubleshooting Questions

> Six System States

> Data Gathering Diagnostic Tools

Light Path Diagnostic

BMC, RSA and AMM

* Dynamic System Analysis (DSA) *

Topic 11 - Course Agenda

Page 21: Xtw01t11v0901 troubleshooting

Dynamic System Analysis

Product download page: http://www.ibm.com/systems/management/dsa.html

DSA collects and analyzes information about various aspects of a system to aid in troubleshooting

Creates a merged log with all the retrieved information

> Compressed XML file for IBM Support personnel

> Optionally, HTML pages can be created for all users

Portable Edition

> Runs without altering target system

> Removes any created temporary files

Installable Edition

> Permanent

> Integrates with UpdateXpress input to rapidly identify down-level firmware and drivers

Analyzed components:

> System configuration

> Installed applications and hot fixes

> Device drivers and system services

> Network interfaces and settings

> Performance data and details for running processes

> Hardware inventory, including PCI information

> Vital product data, firmware, and basic input/output system (BIOS) information

> SCSI device sense data

> EXA chipset uncorrectable error register information

> ServeRAID configuration

> Event logs for the operating system, applications, security, ServeRAID controllers, and service processors

Page 22: Xtw01t11v0901 troubleshooting

Dynamic System Analysis - Portable Edition

Page 23: Xtw01t11v0901 troubleshooting

Dynamic System Analysis - Installable Edition

Page 24: Xtw01t11v0901 troubleshooting

> Provide problem isolation, configuration analysis, error log collection

> Primary method of testing the major components

> Viewed locally or uploaded to an internal FTP server

> Standard for System x and BladeCenter servers

New Preboot Dynamic System Analysis

Page 25: Xtw01t11v0901 troubleshooting

Preboot DSA - Access

> Press F2 key during POST

> By default, it takes you to the IBM Memory Test
  Select Quit to exit to DSA

> Can take up to 10 minutes to load

> Power on all attached devices before powering on the server

[Figure: Preboot DSA memory tests]

Page 26: Xtw01t11v0901 troubleshooting

Preboot DSA - Command Line

> Preboot DSA offers several options in a command line menu system

> IBM DSA Interactive
  Several command line instructions are available

Page 27: Xtw01t11v0901 troubleshooting

Selecting ‘Diagnostics’ from the main menu will load the diagnostic tests page

Preboot DSA - Graphical Diagnostics

Page 28: Xtw01t11v0901 troubleshooting

Preboot DSA - Graphical Interface

Select System Information GUI to enter the Graphical User Menu

Page 29: Xtw01t11v0901 troubleshooting

Problem Determination - Information Gathering

> Machine type and model

> Microprocessor or hard disk upgrades

> Failure symptom
  Do diagnostics fail?
  What, when, where, single, or multiple systems?
  Is the failure repeatable?
  Has this configuration ever worked?
  If it has been working, what changes were made prior to it failing?
  Is this the original reported failure?

> Diagnostics version — type and version level

> Hardware configuration
  Print (print screen) configuration currently in use
  BIOS level

> Operating system software — type and version level
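
One way to make sure nothing on this checklist is skipped is to capture it in a structured record. The sketch below is a hypothetical Python dataclass. The class name, field names, and example values are assumptions made for illustration and do not correspond to any IBM form or tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemRecord:
    """Hypothetical record of the information listed on this slide."""
    machine_type_model: str = ""
    upgrades: str = ""                # microprocessor or hard disk upgrades
    failure_symptom: str = ""
    diagnostics_version: str = ""     # type and version level
    hardware_configuration: str = ""  # configuration currently in use
    bios_level: str = ""
    operating_system: str = ""        # type and version level

    def missing(self):
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

Creating a record with only some fields filled in and calling `missing()` lists what still has to be collected before the problem is escalated.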

Page 30: Xtw01t11v0901 troubleshooting

When Solving Problems

> When solving problems, especially ones that involve a component replacement, ensure the following:
  > Apply code updates to ensure that all code across all boards is matched for levels and will provide a working system
  > Run the embedded diagnostics program to test the new component
  > Run a “quick test” on the entire system
  > Clear the BMC event log in readiness for any subsequent events

> The embedded diagnostics programs are the primary method of testing the major components of the server following parts replacement

> Event logs are limited in capacity
  Once a problem has been resolved, clear the logs so that useful information can be captured, should another fault occur
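
The advice to clear event logs follows from their limited capacity. The toy Python model below illustrates the effect. It assumes that a full log drops new events rather than overwriting old ones; actual behavior depends on the service processor, and the class and method names are hypothetical.

```python
class BoundedEventLog:
    """Toy fixed-capacity event log.

    Assumption for this sketch: once the log is full, new events are
    dropped rather than overwriting old ones.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = []

    def record(self, event):
        """Store the event; return False if the log is full and it was lost."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(event)
        return True

    def clear(self):
        """Empty the log, as recommended once a problem has been resolved."""
        self.entries.clear()
```

In this model, a log left full of entries from a resolved problem loses the first event of the next fault, while a cleared log captures it.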

Page 31: Xtw01t11v0901 troubleshooting

Glossary of terms

Advanced Management Module (AMM)

Baseboard Management Controller (BMC)

Common Information Model (CIM)

Dynamic System Analysis (DSA)

Intelligent Platform Management Interface (IPMI)

Light Path Diagnostic

Multiple processing (MP)

Problem Determination and Service Guide (PDSG)

Remote Supervisor Adapter (RSA) II

Page 32: Xtw01t11v0901 troubleshooting

Course Summary

Having completed this topic, you should be able to:

> Identify basic troubleshooting questions to consider

> Identify the six possible states of a system

> Identify diagnostic tools that are available to gather and analyze information for each given system state

Page 33: Xtw01t11v0901 troubleshooting

Additional Resources

IBM STG SMART Zone for more education (webinars, Web lectures, etc.):

> Internal: http://lt.be.ibm.com/smartzone/modulartechnical

> BP: http://www.ibm.com/services/weblectures/dlv/partnerworld

IBM System x

> http://www-03.ibm.com/systems/x/

IBM BladeCenter Chassis

> http://www-03.ibm.com/systems/bladecenter/

IBM BladeCenter Blade Servers

> http://www-03.ibm.com/systems/bladecenter/hardware/servers/index.html

IBM BladeCenter Redbooks

> http://www.redbooks.ibm.com/

IBM ServerProven

> http://www-03.ibm.com/servers/eserver/serverproven/compat/us/

IBM System x Support

> http://www-304.ibm.com/systems/support/supportsite.wss/brandmain?brandind=5000008

Page 34: Xtw01t11v0901 troubleshooting

End of Presentation