Checkup on ZXSS10 Hardware Fault


ZXJ10

Prepared by: Liu Fen
Reviewed by: Jiang Wei

ZTE FN Switch Customer Service Dept. Internal Use Only


ZTE Confidential Proprietary 2011 ZTE CORPORATION. All rights reserved.


Revision History

SN | Product Version | Prepared by/Revised by | Reviewed by | Date | Reason for Revision | Contents
No | V1.00 | Liu Fen | Jiang Wei | 2009-12-21 | Document Development | First Draft

TABLE OF CONTENTS

Chapter 1 Checkup on Foreground Card Fault
1.1 Fault
1.2 Analysis on Fault
1.2.1 SC Cannot Start Normally
1.2.2 SPC Cannot Start Normally
1.2.3 NIC Cannot Start Normally
1.2.4 SSN Cannot Start Normally
1.2.5 SSNI Cannot Start Normally
1.2.6 TIC Cannot Start Normally
1.2.7 Power Module is Abnormal
1.2.8 Fan is Abnormal
1.3 Case

Chapter 2 Checkup on Background Server Fault
2.1 Fault
2.2 Analysis on Fault
2.2.1 Background Server Works Abnormally
2.2.2 Background Operation & Maintenance Window Cannot Be Accessed
2.2.3 System Reports Man-machine Command Script Error When Running O&M Window
2.2.4 System Reports Run Timeout When Running O&M Window
2.2.5 System Reports CLIS Block When Running O&M Window

Chapter 1 Checkup on Foreground Card Fault

1.1 Fault

Common foreground card faults:
1. Cards cannot start normally, including SC, SPC, NIC, and SSN.
2. Cards cannot work normally, including SSNI and TIC.
3. Power modules and fans cannot work normally.

1.2 Analysis on Fault

The analysis and handling procedure for each card type is as follows.

1.2.1 SC Cannot Start Normally

1. Check whether the cabinet power is normal.
2. Observe the yellow indicator. If it is still on after a few tens of seconds, cut the power, wait a while, and reboot. If it still does not go off, the SC probably has a hardware fault and should be replaced. If the indicator is always off, the SC should also be replaced.
Note: Back up the data before any operation and replace the card at a time when few users are active. Wear an antistatic strap so that the SoftSwitch control equipment keeps running without disturbance.
3. Connect a computer to the SC with a serial cable, reset the SC, and read the foreground printout in HyperTerminal. Confirm the following:
(1) Whether the SC version is loaded. If it is, the SC can be reached by telnet at 168.5.1.1. If not, check whether the configuration is correct and whether a foreground version exists in /disks/version. If the configuration is incorrect, correct it and reboot. If the foreground version is missing or incorrect, fetch the version from the PC through the SC debugging network port and transfer it to the foreground by FTP (bin mode).
(2) After the version starts, whether the SC connects to the background DB server (the printout shows the connection information between module 0 and the DB server module). If the connection fails, telnet to the SC, enter the user name and password, and ping the DB server IP. If the ping fails, check the related cable and the port on the Ethernet switch. If the ping succeeds, the DB server is abnormal: check whether all.out is running on the background DB server (use ps -ef | grep out to check for the all.out and Monitor.out processes) and whether each parameter in Config/ss.ini is correct.
(3) After the connection to the DB is established, whether the SC can get data from the background DB (the printout shows the records of each table obtained by the SC). If the printout keeps showing a request for the license, check whether the license is installed on the background DB and run the license script again. If it fails again, check whether r_config contains a record. If not, insert one or run sscfg020000tables.sql again, and then run the license script again.

1.2.2 SPC Cannot Start Normally

1. Check whether the cabinet power is normal.
2. Observe the yellow indicator. If it is still on after a few tens of seconds, cut the power, wait a while, and reboot. If it still does not go off, the SPC probably has a hardware fault and should be replaced. If the indicator is always off, the SPC should also be replaced.
Note: Back up the data before any operation and replace the card at a time when few users are active. Wear an antistatic strap so that the SoftSwitch control equipment keeps running without disturbance.
3. Connect a computer to the SPC with a serial cable, reset the SPC, and read the foreground printout in HyperTerminal. Confirm the following:
(1) Whether the SPC version is loaded. If it is, the SPC can be reached by telnet at 168.2.1.X, where X is the slot No. If not, check whether the configuration is correct, whether a foreground version exists in /disks/version, and whether the connection with the DB server is successful (the printout shows the connection information between module 0 and the related module). If the configuration is incorrect, correct it and reboot. If the foreground version is missing or incorrect, transfer the version to the SC by FTP (bin mode). If the connection fails, telnet to the SPC, enter the user name and password, and ping the SC at 168.2.1.1 and the DB server IP. If the ping fails, check the related cable and the port on the Ethernet switch, and check whether the SC, SSN, and DB server are normal. If the SC is abnormal, refer to 1.2.1. If the SSN is abnormal, refer to 1.2.4. If the DB server is abnormal, refer to 2.1.
(2) Whether the SPC can get data from the background DB (the printout shows the records of each table obtained by the SPC).

1.2.3 NIC Cannot Start Normally

1. Check whether the cabinet power is normal.
2. Observe the yellow indicator. If it is still on after a few tens of seconds, cut the power, wait a while, and reboot. If it still does not go off, the NIC probably has a hardware fault and should be replaced. If the indicator is always off, the NIC should also be replaced.
Note: Back up the data before any operation and replace the card at a time when few users are active. Wear an antistatic strap so that the SoftSwitch control equipment keeps running without disturbance.
3. Connect a computer to the NIC with a serial cable, reset the NIC, and read the foreground printout in HyperTerminal. Confirm the following:
(1) Whether the NIC version is loaded. If it is, the NIC can be reached by telnet at 168.2.1.X, where X is the slot No. If not, check whether the configuration is correct, whether a foreground version exists in /disks/version, and whether the connection with the DB server is successful (the printout shows the connection information between module 0 and the related module). If the configuration is incorrect, correct it and reboot. If the foreground version is missing or incorrect, transfer the version to the SC by FTP (bin mode). If the connection fails, telnet to the NIC, enter the user name and password, and ping the SC at 168.2.1.1 and the DB server IP. If the ping fails, check the related cable and the port on the Ethernet switch, and check whether the SC, SSN, and DB server are normal. If the SC is abnormal, refer to 1.2.1. If the SSN is abnormal, refer to 1.2.4. If the DB server is abnormal, refer to 2.1.
(2) Whether the NIC can get data from the background DB (the printout shows the records of each table obtained by the NIC).

1.2.4 SSN Cannot Start Normally

1. Check whether the cabinet power is normal.
2. Re-plug the SSN to rule out poor contact.
3. Replace the SSN. If the fault clears, the original SSN is broken.
Note: Wear an antistatic strap when replacing the card so that the SoftSwitch control equipment keeps running without disturbance.

1.2.5 SSNI Cannot Start Normally

1. Check whether the cabinet power is normal.
2. Re-plug the SSNI to rule out poor contact.
3. Replace the SSNI. If the fault clears, the original SSNI is broken.
Note: Wear an antistatic strap when replacing the card so that the SoftSwitch control equipment keeps running without disturbance.

1.2.6 TIC Cannot Start Normally

1. Check whether the cabinet power is normal.
2. Re-plug the TIC to rule out poor contact.
3. Replace the TIC. If the fault clears, the original TIC is broken.
Note: Wear an antistatic strap when replacing the card so that the SoftSwitch control equipment keeps running without disturbance.

1.2.7 Power Module is Abnormal

1. Use a multimeter to test whether the voltage at the power module input is normal.
2. Replace the power module. If the fault clears, the original module is broken.
Note: Wear an antistatic strap when replacing the module so that the SoftSwitch control equipment keeps running without disturbance.

1.2.8 Fan is Abnormal

1. Check whether the power module is normal, because the fan is powered by the power module.
2. Check whether the fan motor is faulty or the fan has a mechanical fault. If so, replace the fan.
Note: Wear an antistatic strap when replacing the fan so that the SoftSwitch control equipment keeps running without disturbance.

1.3 Case

Problem: After the version of the SS1b foreground is upgraded and the frame is rebooted, no card except the SC can start normally.

Analysis & Handling: Because the SC starts and the other cards do not, first check the loading of the SSN version. After the SSN is rebooted, it still cannot start. After the old SSN version is reloaded through the SC 100M port and the SSN is rebooted, the SSN starts normally. Consultation with back-end support indicated that the Bootrom probably needed to be upgraded.

Solution: Telnet in through the SC 100M port, change the SSN version back to the old one, and reboot. Then log in to the SSN and run the command that updates its Bootrom version, reload the new SSN version, and reboot the SSN. The SSN then starts normally. After startup, telnet to each of the other cards, run the Bootrom update command so that the new Bootrom takes effect, and reboot the card. The card then works normally. Finally, reboot the whole frame. All cards are normal and the SS foreground upgrade is complete.

Inner IP addresses of SS1b foreground cards: the SC is 168.2.1.1/168.2.1.2. Every other card is 168.2.1.x, where x is the slot No., except the SSN, which is 168.4.1.16/168.6.1.17.

Telnet login: the user name is ZXSSTelnet and the password is the date (for example, 20081124).

Reaching the other cards through the SC network port takes two logins:
1. Telnet to 168.2.1.1 with user name ZXSSTelnet and password 20081124, then run rlogin 168.4.1.16 (taking SSN1 as an example) and log in with ZXSSTelnet and 20081124 again.
2. Run the Bootrom update command. On the SC, enter s_SCUpdateBootromFlash(). On an SPC, NIC, or SSN, enter s_OtherUpdateBootromFlash().

Analysis: In this case the SC can start and the other cards cannot, so check whether the SSN is normal. Before a version upgrade, always confirm whether the card Bootrom also needs to be upgraded. Under some conditions (for example, when the upgrade spans many versions) the card Bootrom must be upgraded as well, so onsite personnel should understand the method before starting the upgrade.
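The addressing and login conventions from the case above can be collected in a small helper for onsite checklists. This is only a sketch under stated assumptions: the function names are hypothetical, the slot range check is a generic sanity check rather than a documented limit, and the case does not say which date the password uses, so the caller passes the date explicitly.

```python
from datetime import date

# Values quoted in the case above; everything else here is illustrative.
TELNET_USER = "ZXSSTelnet"
SSN_INNER_IPS = ("168.4.1.16", "168.6.1.17")


def inner_ip(card, slot):
    """Return the inner IP of an SC, SPC, or NIC card from its slot No."""
    name = card.upper()
    if name == "SSN":
        # The SSN does not follow the 168.2.1.x rule; see SSN_INNER_IPS.
        raise ValueError("SSN uses fixed inner IPs: " + "/".join(SSN_INNER_IPS))
    if name not in ("SC", "SPC", "NIC"):
        raise ValueError("unknown card type: " + card)
    if not 1 <= slot <= 254:  # assumption: generic host-octet sanity check
        raise ValueError("slot out of range: %d" % slot)
    return "168.2.1.%d" % slot


def telnet_password(day):
    """Format a date as YYYYMMDD, the password pattern shown in the case."""
    return day.strftime("%Y%m%d")


def bootrom_update_command(card):
    """Pick the Bootrom update command for the card that is logged in to."""
    if card.upper() == "SC":
        return "s_SCUpdateBootromFlash()"
    return "s_OtherUpdateBootromFlash()"


if __name__ == "__main__":
    print(inner_ip("SPC", 5), TELNET_USER, telnet_password(date(2008, 11, 24)))
    print(bootrom_update_command("SSN"))
```

The helper only formats values. It does not open any connection, so it can be used to prepare an upgrade checklist before anyone touches the equipment.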

Chapter 2 Checkup on Background Server Fault

2.1 Fault

1. The background database server cannot work normally.
2. The background O&M interface cannot be accessed.
3. The system reports faults when running man-machine commands.

2.2 Analysis on Fault

2.2.1 Background Server Works Abnormally

1. Check the power supply of the database server.
2. Telnet to the database server from a terminal. If the connection fails, check whether the database server can be pinged. If not, check the network cable and the switch port.
3. Connect to the database server through a serial cable and HyperTerminal. Check whether an abnormal shutdown caused the fault. If so, run fsck and then reboot the Solaris system.
4. If the root user cannot telnet to the database server, edit the file /etc/default/login, add # before CONSOLE=/dev/console, save the file, and reboot the system.
5. If the background database cannot start automatically after Oracle is installed, check whether the settings in the file /etc/system are reasonable, and confirm the following:
- The Oracle installation directory and its sub-directories belong to the corresponding Oracle group and oracle user.
- The S99dbora file is saved in /etc/rc2.d.
- In /var/opt/oracle/oratab, the entry for the database instance is set to Y.
- The files in the working directory of the background database server are complete.
- The files on the background database server have execute permission.
Note: Do not forget to back up the data before operating, and avoid forcibly shutting down the server.

2.2.2 Background Operation & Maintenance Window Cannot Be Accessed

1. Open the file zxss10.lax in the installation directory and check the configured IP address. If the terminal is connected to the inner network, it should be the address of the SC. If the terminal is connected to the outer network, it should be the outer address of the O&M NIC of the SS control equipment.
2. If the maintenance terminal is connected to the inner network, check whether it can ping the IP of the SC. If not, check the network connection.
3. If the maintenance terminal is connected to the outer network, check whether it can ping the IP of the NIC. If not, check the network connection. If so, check the routing of the SC. If the routing is incorrect, re-configure it.

2.2.3 System Reports Man-machine Command Script Error When Running O&M Window

1. Check whether DbCDInfo.INI exists in the working directory Config on the background database server. If not, re-upload the file.
2. Check whether DbCDInfo.INI is consistent with the background and foreground versions. If not, re-upload the correct file.

2.2.4 System Reports Run Timeout When Running O&M Window

1. Check the connection between the SC and the background database. If it is abnormal, check the network cable between the SC and the background database.
2. Check the processes of the background database. Telnet to the background database server and run ps -ef | grep out to check for the all.out and monitor.out processes. If they are missing, run ss_start. Once the process is started, the relevant license can be viewed: when the SC queries the license from the background database, the background database process prints the licensed number of trunks and users. If the license cannot be viewed, run the license script again. Check whether the table r_config contains a record. If not, insert a record or run the background database script sscfg020000tables.sql again. After running the script, run the license script again.
3. When users or nodes are being created, the data traffic is high and the response time is therefore long. Choose Yes, and the system then reports that the data was created successfully.

2.2.5 System Reports CLIS Block When Running O&M Window

1. Check whether the SC hard disk holds the man-machine command script (*.ini) under the directory /diskc. If not, re-upload the file in bin mode. Check whether the remaining memory of the SC is too small. If so, adjust the data and reboot the SC.
2. When users or nodes are being created, the data traffic is high and the response time is therefore long. Choose Yes, and the system then reports that the data was created successfully.
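The process check used in this chapter (ps -ef | grep out, looking for all.out and monitor.out) can also be scripted against captured ps output. The sketch below is a minimal illustration, not part of the product tooling: the function name and the sample output, including the user name zxss, are hypothetical. Because the guide writes the monitor process both as Monitor.out and monitor.out, the comparison is case-insensitive by assumption.

```python
# Process names taken from the guide; matching is case-insensitive by assumption.
REQUIRED_PROCESSES = ("all.out", "monitor.out")


def missing_processes(ps_output, required=REQUIRED_PROCESSES):
    """Return the required process names that do not appear in ps -ef output."""
    seen = set()
    for line in ps_output.splitlines():
        tokens = line.lower().split()
        if "grep" in tokens:
            # Skip the grep command itself, which always matches its own pattern.
            continue
        for token in tokens:
            # Keep only the file name, so ./all.out and /opt/x/all.out both match.
            seen.add(token.rsplit("/", 1)[-1])
    return [name for name in required if name.lower() not in seen]


if __name__ == "__main__":
    # Hypothetical capture of: ps -ef | grep out
    sample = (
        "zxss 1201 1 0 09:00:01 ? 0:12 ./all.out\n"
        "zxss 1350 1201 0 09:00:05 pts/1 0:00 grep out\n"
    )
    print(missing_processes(sample))
```

An empty result means both processes were found. Any name in the result corresponds to the situation in which the guide says to run ss_start.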