OTF102302 OptiX RTN 910/950 Troubleshooting ISSUE 1.00
www.huawei.com
Copyright © 2009 Huawei Technologies Co., Ltd. All rights reserved.
OptiX RTN 910/950 Troubleshooting
Objectives
Upon completion of this course, you will be able to:
Describe the troubleshooting flow of OptiX RTN 910/950
Explain the alarms and outline their causes
Troubleshoot faults on OptiX RTN 910/950
Contents
1. Fault Handling Flow Introduction
2. Methods of Analyzing and Locating Faults
3. Classified Troubleshooting Analysis
General Fault Handling Flow
[Flowchart]
1. Start: observe and record the fault phenomenon.
2. If the fault has an external cause, follow the other handling flow.
3. Otherwise, find the cause, locate the fault, and rectify it.
4. If the fault cannot be rectified, contact Huawei for technical support, find a solution together, and rectify the fault.
5. Write a fault handling report. End.
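The general flow can be sketched as a small Python function, assuming each fault is summarized by two yes/no answers: whether the cause is external, and whether local staff could rectify it. `FaultCase` and `handle_fault` are hypothetical names for illustration; they are not part of any Huawei NMS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FaultCase:
    """Hypothetical record of one fault; the field names are illustrative only."""
    phenomenon: str
    external_cause: bool      # True if the cause lies outside the equipment
    rectified_locally: bool   # True if local O&M staff rectified the fault
    steps: List[str] = field(default_factory=list)


def handle_fault(case: FaultCase) -> List[str]:
    """Walk the general fault handling flow and return the steps taken, in order."""
    case.steps.append(f"Observe and record the fault phenomenon: {case.phenomenon}")
    if case.external_cause:
        # External causes leave this flow and follow a separate handling flow.
        case.steps.append("Follow the other handling flow")
        return case.steps
    case.steps.append("Find the cause, locate the fault, and rectify it")
    if not case.rectified_locally:
        case.steps.append("Contact Huawei for technical support")
        case.steps.append("Find a solution together and rectify the fault")
    case.steps.append("Write a fault handling report")
    return case.steps
```

The returned list doubles as the skeleton of the fault handling report written in the last step.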
Emergency Handling Flow
[Flowchart, part 1] Checks and the action taken when the answer is Yes:
Wrong operation? Perform the reverse operation to restore the service.
Equipment alarm? Reset, re-insert, or replace the board.
Signal loss alarm? Perform a loopback on the opposite port, then either reset or replace the board of the opposite NE, or handle the anomaly of the interconnected equipment.
The flow continues through connectors 1 and 2.
Emergency Handling Flow (Cont.)
[Flowchart, part 2, entered through connectors 1 and 2]
Checks: Line alarm? Protection switching configured? Line alarm on the adjacent NE?
Actions: change the service route or use a standby route; check and use a standby route; reset the faulty board or the protection protocol; handle the fiber cut, board fault, or power supply problem.
The flow continues through connectors 3 and 4.
Emergency Handling Flow (Cont.)
[Flowchart, part 3, entered through connectors 3 and 4]
Any loopback? Change the port loopback configuration.
Service configuration error? Change the service configuration.
Fault rectified? If yes, end; if no, contact Huawei for technical support.
2. Methods of Analyzing and Locating Faults
Basic Principles of Fault Locating
1. External first, then internal
2. Site first, then board
3. High-severity alarms first, then low-severity alarms
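One possible reading of the three principles is a sort order for the alarms listed on the NMS. The sketch below assumes each alarm has been tagged by hand as external or internal and as site-level or board-level; the `Alarm` fields and the four-level severity ranking are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed severity ranking for this sketch (highest first).
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass(frozen=True)
class Alarm:
    name: str
    severity: str     # a key of SEVERITY_RANK
    external: bool    # cause lies outside the NE (power, fiber, interconnected equipment)
    site_level: bool  # points to a site/NE as a whole rather than to one board


def triage_key(alarm: Alarm) -> Tuple[int, int, int]:
    # Lower tuples sort first: external before internal, site before board,
    # then higher severity before lower severity.
    return (
        0 if alarm.external else 1,
        0 if alarm.site_level else 1,
        SEVERITY_RANK[alarm.severity],
    )


def triage(alarms: List[Alarm]) -> List[str]:
    """Return alarm names in the order they should be investigated."""
    return [a.name for a in sorted(alarms, key=triage_key)]
```

For example, with illustrative tags POWER_FAIL (external, site, critical), SYN_BAD (internal, site, major), HARD_BAD (internal, board, critical), and MW_FECUNCOR (internal, board, minor), `triage` returns them in exactly that order.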
Common Methods of Fault Locating
Principle: analyze first, then perform loopbacks, and finally replace the board.
Methods: alarm analysis, loopback, replacement, testing with instruments, resetting, and Ethernet/MPLS OAM testing and RMON monitoring.
Alarm Analysis
Using the NMS:
Comprehensive: all alarms and performance events of the whole network are available.
Accurate: current alarms, history alarms, their occurrence times, and performance event data can be queried.
Observing board indicators: no alarm details or history alarms are available.
Note:
Besides the alarms, querying the transmit and receive power is also important and useful in the IP radio system.
Common Alarm Description
Equipment alarms (alarm name: indication):
POWER_FAIL: The power supply is in an abnormal state.
FAN_FAIL: The fan is faulty.
BDSTATUS: The board is off-position.
NO_BD_SOFT: The board has no software.
HARD_BAD: The board hardware is faulty.
SYN_BAD: The clock synchronization source is degraded.
NESTATE_INSTALL: The NE is in the installation state.
Common Alarm Description (Cont.)
Microwave link alarms (alarm name: indication):
MW_LOF: Loss of microwave Reed-Solomon frames.
MW_FECUNCOR: Bit errors in microwave frames that FEC cannot correct.
CONFIG_NOSUPPORT: Wrong parameter setting in the ODU.
RADIO_RSL_LOW / RADIO_RSL_HIGH: The ODU receive power is too low / too high.
RADIO_MUTE: The ODU is muted.
SYN_BAD: The clock synchronization source is degraded.
Common Alarm Description (Cont.)
Microwave link alarms (alarm name: indication):
IF_CABLE_OPEN: The IF cable is open (not connected).
MW_LIM: The link ID in the microwave frame is mismatched.
MW_RDI: Microwave remote defect indication.
RPS_INDI: Radio protection (1+1) switching has occurred.
LOOP_ALM: The ODU/IF port is looped back.
TEMP_ALARM: The ODU/IF temperature is abnormal.
Common Alarm Description (Cont.)
Service alarms (alarm name: indication):
ETH_LOS: The network interface is disconnected.
ETH_LINK_DOWN: The Ethernet port connection is faulty.
ALM_IMA_LIF: The received IMA frames are lost.
CES_MISORDERPKT_EXC: The number of out-of-order CES packets exceeds the specified threshold.
MP_DOWN: The MP group has failed.
MPLS_TUNNEL_LOCV: Loss of tunnel connectivity verification.
PW_DOWN: A PW service connection is down.
Case Analysis
Description:
NE1 and NE2 are in a 1+1 HSB configuration.
NE1 reports the MW_LOF alarm.
NE2 reports the MW_RDI and RPS_INDI alarms.
[Figure: network diagram of NE1, NE2, and NE3, showing MW_LOF on NE1 and MW_RDI and RPS_INDI on NE2.]
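A common reading of this alarm combination: NE1 reports MW_LOF because it cannot receive valid microwave frames from NE2, and NE2 reports MW_RDI because NE1 signals that defect back. The suspect direction is therefore NE2 to NE1 (the NE2 transmit side, the path, or the NE1 receive side). The helper below is a sketch that encodes only this rule; it does not model RPS_INDI or the HSB switching itself, and `suspect_direction` is a hypothetical name.

```python
from typing import Dict, Optional, Set


def suspect_direction(alarms: Dict[str, Set[str]], ne_a: str, ne_b: str) -> Optional[str]:
    """Return the likely faulty transmission direction of the hop ne_a <-> ne_b."""
    for near, far in ((ne_a, ne_b), (ne_b, ne_a)):
        # The receiving NE loses frames (MW_LOF) and signals the defect back,
        # so its peer reports MW_RDI: the suspect direction is far -> near.
        if "MW_LOF" in alarms.get(near, set()) and "MW_RDI" in alarms.get(far, set()):
            return f"{far} -> {near}"
    return None
```

For the case above, `suspect_direction({"NE1": {"MW_LOF"}, "NE2": {"MW_RDI", "RPS_INDI"}}, "NE1", "NE2")` points at the NE2 to NE1 direction.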
Loopback
Loopback is useful for checking physical-layer availability, for example when “signal loss” or “loss of frame” alarms are reported.
Do not use loopbacks on NNI ports or for E-LAN services. A loopback interrupts the traffic and the inband DCN, so use it carefully.
[Figure: inloop and outloop points on the Ethernet ports, the ODU/IF ports, and the CES ports (E1, cSTM-1) of the RTN 910/950.]
Replacement
If a component is suspected to be faulty, replace it to locate the fault.
Replacement means using a component that works normally in place of a possibly faulty one, in order to locate and rectify the fault.
The replaceable components include the equipment, boards, and cables.
Test with instrument
This method is the most authoritative, but it requires the instruments to be on hand.
Instrument: Test items
Bit error testing device: bit errors, traffic
Optical power meter: optical power
SDH analyzer: bit errors, traffic, overhead bytes ……
SmartBits: Ethernet service
Resetting
Resetting is a restoration method for application programs and data configurations. When a component is not running properly, resetting it can return it to the normal state.
Resetting boards
Resetting equipment by powering it off and on
Resending the configuration
Reset modes:
Warm reset: loads the correct programs and data on the equipment.
Cold reset: restores the correct programs and data as they were before the CPU power failure.
OAM Testing for Ethernet
Performing the LB (loopback) Test
OAM Testing for Ethernet (Cont.)
Performing the LT (linktrace) Test
OAM Testing for Ethernet (Cont.)
Performing the CC (continuity check) Test
OAM Testing for MPLS Tunnel
Performing the LSP Ping Test
OAM Testing for MPLS Tunnel (Cont.)
Performing the LSP Traceroute Test
Contents
3. Classified Troubleshooting and Locations
3.1 Microwave Link Troubleshooting
3.2 CES Service Troubleshooting
3.3 Ethernet Service Troubleshooting
3.4 IMA Troubleshooting
3.5 LAG Troubleshooting
3.6 ML-PPP Troubleshooting
3.7 MPLS APS Troubleshooting
3.8 QoS Troubleshooting
Microwave Link Troubleshooting
Alarms to pay attention to:
•HARD_BAD
•TEMP_ALARM
•IF_INPWR_ABN
•RADIO_MUTE
•RADIO_TSL_HIGH
•RADIO_TSL_LOW
•RADIO_RSL_HIGH
•IF_CABLE_OPEN
Symptom: Alarms such as MW_LOF and MW_FECUNCOR are reported on the IF or ODU units.
Cause: The power is abnormal because of a wrong setting, fading, interference, or a hardware failure.
Impact: The service is interrupted, or bit errors or packet loss occur; microwave link protection switching may be triggered.
Microwave Link Troubleshooting (Cont.)
Fault in the microwave link: Common fault causes
The transmit power is abnormal: The ODU is faulty, or the frequency or power is set incorrectly.
The receive power is permanently lower than the ideal value:
•The antenna direction is not properly adjusted, or the antenna has been moved.
•The antennas have different polarization directions, either since installation or after the ODU was replaced.
•There is an obstacle in the transmit direction.
•The connection between the antenna and the ODU is abnormal (loose).
•The ODU is faulty, or the transmit power of the opposite ODU is abnormal.
Microwave Link Troubleshooting (Cont.)
Fault in the microwave link: Common fault causes
The receive power is abnormal due to slow (down) fading: The fading margin is insufficient.
The receive power is abnormal due to fast fading: Fast multipath fading occurs.
The power is normal, but the service is down with the MW_LIM alarm: The link IDs at the two ends of the hop are inconsistent.
The receive power is always normal, but the microwave link becomes faulty occasionally: There is external interference.
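The two cause tables can be condensed into a lookup, sketched below under the assumption that the observations (whether the transmit power is normal, how the receive power behaves, whether MW_LIM is present) have already been collected. Labels such as "slow_fading" are hypothetical. The `fade_margin_db` helper uses the usual definition of flat fade margin, nominal received level minus receiver threshold, both in dBm; the threshold for a given ODU and modulation must come from its datasheet.

```python
from typing import List


def likely_causes(tx_power_normal: bool, rx_power: str,
                  mw_lim: bool = False, intermittent: bool = False) -> List[str]:
    """Map observations to the common causes in the cause tables.

    rx_power is a hypothetical label: "normal", "permanently_low",
    "slow_fading", or "fast_fading".
    """
    if not tx_power_normal:
        return ["ODU faulty", "wrong frequency or power setting"]
    if rx_power == "permanently_low":
        return [
            "antenna misaligned or moved",
            "polarization mismatch between the antennas",
            "obstacle in the transmit direction",
            "loose antenna-ODU connection",
            "opposite ODU faulty or its transmit power abnormal",
        ]
    if rx_power == "slow_fading":
        return ["insufficient fading margin"]
    if rx_power == "fast_fading":
        return ["fast multipath fading"]
    if rx_power == "normal":
        if mw_lim:
            return ["link IDs at the two ends of the hop are inconsistent"]
        if intermittent:
            return ["external interference"]
        return []  # no cause from these tables applies
    raise ValueError(f"unknown rx_power label: {rx_power!r}")


def fade_margin_db(rsl_dbm: float, rx_threshold_dbm: float) -> float:
    """Flat fade margin: nominal received level minus the receiver threshold."""
    return rsl_dbm - rx_threshold_dbm
```

For example, a nominal received level of -45 dBm against an assumed -80 dBm threshold gives a 35 dB margin.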
Microwave Link Troubleshooting (Cont.)
Fault Locating Methods:
1. Check whether the ODU is muted, powered off, or looped back, and whether the data configuration is correct.
2. Check whether the ODU and the IF board are faulty.
3. If the transmit power is abnormal, replace the ODU.
4. If the receive power is abnormal, determine the possible causes based on the fading type.
5. If the receive power is always normal but the microwave link becomes faulty occasionally, check for interference before you proceed.
6. If the transmit and receive power are normal, perform loopback operations.
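The six steps are applied in order, and the first one that matches decides the next action. A sketch of that ordering as a function; the `LinkObservations` fields are hypothetical inputs gathered from the NMS or on site, not NMS parameter names.

```python
from dataclasses import dataclass


@dataclass
class LinkObservations:
    """Hypothetical inputs for one microwave hop."""
    odu_muted: bool = False
    odu_powered_off: bool = False
    odu_looped_back: bool = False
    config_correct: bool = True
    hardware_faulty: bool = False     # ODU or IF board found faulty
    tx_power_normal: bool = True
    rx_power_normal: bool = True
    intermittent_fault: bool = False  # link fails occasionally


def next_action(obs: LinkObservations) -> str:
    """Return the first of the six fault locating steps that applies."""
    if obs.odu_muted or obs.odu_powered_off or obs.odu_looped_back or not obs.config_correct:
        return "Step 1: unmute or power on the ODU, release the loopback, or correct the configuration"
    if obs.hardware_faulty:
        return "Step 2: handle the faulty ODU or IF board"
    if not obs.tx_power_normal:
        return "Step 3: replace the ODU"
    if not obs.rx_power_normal:
        return "Step 4: find the cause based on the fading type"
    if obs.intermittent_fault:
        # Reached only when the receive power is normal.
        return "Step 5: check for interference"
    return "Step 6: perform loopback operations"
```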
CES Service Troubleshooting
Corresponding boards:
•CXPAR
•ML1
•ML1A
•CD1
Symptom: Alarms such as T_ALOS and AIS are reported on the corresponding CES ports.
Cause: A hardware failure or client signal loss, or a failure in the PW, tunnel, or radio link.
Impact: Bit errors occur, or the service is interrupted.
CES Service Troubleshooting (Cont.)
Symptom: The CES service is interrupted.
HARD_BAD, TEMP_OVER, or BUS_ERR: reported by CXPAR, ML1, or ML1A
COMMUN_FAIL: reported by CXPAR
T_ALOS: reported by CXPAR, ML1, or ML1A
UP_E1_AIS or DOWN_E1_AIS: reported by ML1 or ML1A
MPLS_TUNNEL_LOCV or PW_DOWN: reported by CXPAR
Symptom: The CES service has bit errors, and the communication is degraded.
HARD_BAD, TEMP_OVER, or BUS_ERR: reported by CXPAR, ML1, or ML1A
SYNC_C_LOS or LTI: reported by CXPAR
CES_LOSPKT_EXC, CES_MISORDERPKT_EXC, CES_MALPKT_EXC, CES_STRAYPKT_EXC, CES_JTRUDR_EXC, or CES_JTROVR_EXC: reported by CXPAR, ML1, or ML1A
CES Service Troubleshooting (Cont.)
Fault Locating Methods
Meter testing: optical power meter, SDH analyzer, BER tester
Replacing boards: for HARD_BAD, COMMUN_FAIL, or BUS_ERR
Client side: laser, cable, loop
Other layers: PWE3, MPLS tunnel, radio link, clock
Ethernet Service Troubleshooting
Corresponding boards:
•CXPAR
•AUXQ
•EF8T
•EF8F
•EG2
Symptom: Alarms such as ETH_LINK_DOWN are reported on the Ethernet board, or the client side reports errored or lost packets.
Cause: A hardware failure or client signal problems, wrong data settings, or a failure in the PW, tunnel, or radio link.
Impact: Errored packets, packet loss, or service interruption.
Ethernet Service Troubleshooting (Cont.)
Symptom: The Ethernet service is interrupted.
HARD_BAD, TEMP_OVER, or BUS_ERR: reported by CXPAR, EF8T, EF8F, or EG2
COMMUN_FAIL: reported by CXPAR
ETH_LOS, ETH_LINK_DOWN, ETH_AUTO_LINK_DOWN, LOOP_ALM, or MAC_FCS_EXC: reported by CXPAR, EF8T, EF8F, or EG2
LASER_SHUT or LSR_WILL_DIE: reported by EF8F or EG2
Symptom: The Ethernet service loses packets or has errored packets.
HARD_BAD, TEMP_OVER, or BUS_ERR: reported by CXPAR, EF8T, EF8F, or EG2
LSR_WILL_DIE: reported by EF8F or EG2
MAC_FCS_EXC or FLOW_OVER: reported by CXPAR, EF8T, EF8F, or EG2
Ethernet Service Troubleshooting (Cont.)
Fault Locating Methods
Meter testing: optical power meter, Ethernet analyzer, BER tester
Replacing boards: for HARD_BAD, COMMUN_FAIL, or BUS_ERR
Client side: laser and cable, negotiation and MTU, loop
Other layers: PWE3, MPLS tunnel, radio link
IMA Troubleshooting
Symptom: The IMA group is invalid, and the service is interrupted.
IMA_GROUP_LE_DOWN or IMA_GROUP_RE_DOWN: reported by CXPAR, ML1, or ML1A
Symptom: One IMA group member link is invalid, and the service on the faulty link is shared by the other member links. The IMA port is congested, and service packets are lost.
ALM_IMA_LIF, ALM_IMA_RFI, ALM_IMA_LODS, ALM_IMA_RE_RX_UNUSABLE, or ALM_IMA_RE_TX_UNUSABLE: reported by CXPAR, ML1, or ML1A
IMA Troubleshooting (Cont.)
Possible Causes:
The IMA group is not enabled, or the IMA group members are invalid.
IMA group negotiation fails.
The interface settings of the IMA member links are wrong.
Other layers: PWE3, MPLS tunnel, radio link.
LAG Troubleshooting
Symptom: The LAG is invalid, none of the member ports can be used, and the services are interrupted.
LAG_DOWN: reported by CXPAR
Symptom: Member ports in the LAG cannot be used, and service packets are lost.
LAG_MEMBER_DOWN: reported by CXPAR
LOOP_ALM, ETH_LOS, or ETH_LINK_DOWN: reported by CXPAR, EF8T, EF8F, or EG2
LAG Troubleshooting (Cont.)
Possible Causes:
Cause 1: The NEs at the two ends of the LAG are configured incorrectly.
Cause 2: The working mode of the member ports in the LAG is set to half-duplex.
Cause 3: A loopback is configured on member ports in the LAG.
Cause 4: The member ports in the LAG are improperly connected or disconnected.
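The four LAG causes can be checked mechanically once the configuration and port states of both ends are known. A minimal sketch, assuming hypothetical `MemberPort` and `LagConfig` records filled in from the NMS; Cause 1 is reduced here to comparing the LAG type and the number of members, which is only a subset of what must match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MemberPort:
    """Hypothetical snapshot of one LAG member port at one end."""
    name: str
    duplex: str      # "full" or "half"
    loopback: bool   # True if a loopback is configured on the port
    link_up: bool    # False if the port reports ETH_LOS / ETH_LINK_DOWN


@dataclass(frozen=True)
class LagConfig:
    lag_type: str                     # illustrative values, e.g. "static"
    members: Tuple[MemberPort, ...]


def lag_findings(local: LagConfig, remote: LagConfig) -> List[str]:
    """Check the two ends of a LAG against the four possible causes."""
    findings = []
    if local.lag_type != remote.lag_type or len(local.members) != len(remote.members):
        findings.append("Cause 1: LAG configuration differs between the two ends")
    for end, cfg in (("local", local), ("remote", remote)):
        for port in cfg.members:
            if port.duplex == "half":
                findings.append(f"Cause 2: {end} {port.name} works in half-duplex mode")
            if port.loopback:
                findings.append(f"Cause 3: {end} {port.name} has a loopback configured")
            if not port.link_up:
                findings.append(f"Cause 4: {end} {port.name} is down; check its connection")
    return findings
```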
ML-PPP Troubleshooting
Symptom: The MP group is invalid, and the service is interrupted.
MP_DOWN: reported by CXPAR, ML1, or ML1A
Symptom: An MP group member is invalid, and service packets are lost.
PPP_LCP_FAIL, PPP_NCP_FAIL, or T_ALOS: reported by CXPAR, ML1, or ML1A
Symptom: An MP group member is delayed, and service packets are lost.
MP_DELAY: reported by CXPAR, ML1, or ML1A
ML-PPP Troubleshooting (Cont.)
Possible Causes:
Cause 1: The MP group is invalid.
Cause 2: The protocol negotiation at the two ends of an MP group member fails.
Cause 3: The received signals of an MP group member port are lost.
Cause 4: The delay of an MP group member exceeds the threshold.
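A sketch that maps the state of each MP group member to Causes 2 to 4, assuming hypothetical per-member fields. Signal loss is checked first because PPP negotiation cannot succeed on a member that receives no signal. The delay threshold is an input to the sketch, not a documented default.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MpMember:
    """Hypothetical state of one MP group member link."""
    name: str
    signal_present: bool  # False corresponds to T_ALOS on the member port
    lcp_up: bool          # PPP LCP negotiation succeeded
    ncp_up: bool          # PPP NCP negotiation succeeded
    delay_ms: float       # measured member delay


def diagnose_members(members: List[MpMember], delay_threshold_ms: float) -> Dict[str, str]:
    """Return the first matching cause for each faulty member; healthy members are omitted."""
    result = {}
    for m in members:
        if not m.signal_present:
            result[m.name] = "Cause 3: received signal lost (T_ALOS)"
        elif not (m.lcp_up and m.ncp_up):
            result[m.name] = "Cause 2: PPP negotiation failed (PPP_LCP_FAIL or PPP_NCP_FAIL)"
        elif m.delay_ms > delay_threshold_ms:
            result[m.name] = "Cause 4: delay exceeds the threshold (MP_DELAY)"
    return result
```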
MPLS APS Troubleshooting
Symptom: The APS protection group is incorrectly configured, or the APS frames cannot be received. In this case, the protection fails.
ETH_APS_PATH_MISMATCH, ETH_APS_LOST, ETH_APS_SWITCH_FAIL, or ETH_APS_TYPE_MISMATCH: reported by CXPAR
Symptom: When the working tunnel or bypass tunnel is faulty, the switching fails.
MPLS_TUNNEL_LOCV, MPLS_TUNNEL_MISMERGE, MPLS_TUNNEL_MISMATCH, MPLS_TUNNEL_Excess, MPLS_TUNNEL_SD, MPLS_TUNNEL_SF, or MPLS_TUNNEL_UNKNOWN: reported by CXPAR
MPLS APS Troubleshooting (Cont.)
Possible Causes:
Cause 1: The configurations at the two ends of the APS protection group are inconsistent.
Cause 2: The protocols at the two ends of the APS protection group are in the inactive state.
Cause 3: The optical fibers or cables are incorrectly connected.
Cause 4: A hardware alarm exists on the board where the bypass tunnel resides, so the APS frames cannot be transmitted.
Cause 5: Clock alarms exist in the system.
Cause 6: The working tunnel or bypass tunnel is faulty.
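Causes 1 and 2 lend themselves to a side-by-side comparison of the two ends. A minimal sketch; the parameter names in `APS_KEYS` and the `protocol_enabled` flag are hypothetical placeholders for whatever the NMS shows for the APS protection group.

```python
from typing import Any, Dict, List

# Hypothetical names for parameters that must match at both ends of the APS group.
APS_KEYS = ("protection_type", "switching_mode", "revertive", "wtr_s")


def aps_findings(local: Dict[str, Any], remote: Dict[str, Any]) -> List[str]:
    """Check Cause 1 (inconsistent configuration) and Cause 2 (inactive protocol)."""
    findings = [
        f"Cause 1: {key} differs ({local.get(key)!r} vs {remote.get(key)!r})"
        for key in APS_KEYS
        if local.get(key) != remote.get(key)
    ]
    for end, cfg in (("local", local), ("remote", remote)):
        if not cfg.get("protocol_enabled", False):
            findings.append(f"Cause 2: the APS protocol is inactive at the {end} end")
    return findings
```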
QoS Troubleshooting
Symptom: The traffic is heavy, and congestion occurs.
FLOW_OVER: reported by CXPAR, EF8T, EF8F, or EG2
Symptom: The service bandwidth is pre-empted, and service packets are lost or bit errors occur.
CES_LOSPKT_EXC, CES_JTROVR_EXC, or CES_JTRUDR_EXC: reported by CXPAR, ML1, or ML1A
QoS Troubleshooting (Cont.)
Possible Causes:
Cause 1: The NE is not configured with a QoS policy.
Cause 2: An incorrect QoS policy is selected during the service configuration.
Cause 3: The bandwidth configured for the tunnel or PW is too small.
Cause 4: The board is faulty, and the configuration data is not delivered to the board.
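Cause 3 can be checked by comparing the configured bandwidth with the traffic actually carried. A sketch under assumptions: the tunnel bandwidth, per-PW bandwidth, and measured per-PW peak rates (in Mbit/s) are hypothetical inputs, and treating PW bandwidth that adds up to more than the tunnel bandwidth as a finding is one possible policy, not an RTN rule.

```python
from typing import Dict, List


def bandwidth_findings(tunnel_bw_mbps: float,
                       pw_bw_mbps: Dict[str, float],
                       pw_peak_mbps: Dict[str, float]) -> List[str]:
    """Flag Cause 3: tunnel or PW bandwidth configured too small for the traffic."""
    findings = []
    total = sum(pw_bw_mbps.values())
    if total > tunnel_bw_mbps:
        findings.append(f"tunnel: {total:g} Mbit/s of PW bandwidth exceeds {tunnel_bw_mbps:g} Mbit/s")
    for pw, bw in sorted(pw_bw_mbps.items()):
        peak = pw_peak_mbps.get(pw, 0.0)  # PWs without a measurement are treated as idle
        if peak > bw:
            findings.append(f"{pw}: peak {peak:g} Mbit/s exceeds configured {bw:g} Mbit/s")
    return findings
```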
Summary
Fault Handling Flow Introduction
Methods of Analyzing and Locating Faults
Classified Troubleshooting Analysis
Thank you
www.huawei.com