Module 6 - LTE Troubleshooting Guideline

www.DigiTrainee.com Company Confidential

LTE Troubleshooting Guideline

Section-6

RNO Consultant : Ray Khastur

Version: V 1.0 (20151028)


Objectives

Upon completion of this course, you will be able to:

Handle accessibility issues.

Handle mobility issues.

Handle retainability issues.



Contents

1. Access Problem

2. Handover Problem

3. Service Drop Problem


Access Problem


Introduction to the Access Procedure – Attach Procedure

Upon power-on, a UE first selects a cell to camp on and then initiates the Attach procedure.

The RRC connection setup cause value is mo-Signalling.

The Attach procedure consists of four steps:

- Random access

- RRC connection setup

- NAS procedure

- e-RAB setup

During the Attach procedure, a data card terminal usually sets up only a default bearer. LTE terminals supporting VoIP and some smartphones, such as HTC models, also set up a dedicated bearer.

Figure: Attach procedure signaling between the UE, eNodeB, and MME, grouped into RRC connection setup, e-RAB setup, and dedicated bearer setup. Messages shown: RRC_Conn_Req (msg3), RRC_Conn_Setup (msg4), RRC_Conn_Setup_Cmp (msg5), INITIAL UE MESSAGE, direct transmission (authentication and service negotiation), INITIAL UE CONTEXT SETUP REQ, RRC SECURITY MODE CMD, RRC SECURITY MODE CMP, RRC_UE_Cap_Enquiry, RRC_UE_Cap_Info, RRC CONN RECFG, RRC CONN RECFG CMP, INITIAL UE CONTEXT SETUP RSP, direct transmission (service negotiation and notification), SAEB SETUP REQ, RRC CONN RECFG, RRC CONN RECFG CMP, SAEB SETUP RSP.
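To make the grouping in the figure concrete, the following minimal Python sketch tags each message of a decoded Attach trace with the step it belongs to and reports which steps were reached. The message strings mirror the figure labels; `ATTACH_PHASES`, `phases_reached`, and the `"DIRECT TRANSFER"` label (standing in for "direct transmission") are hypothetical and do not correspond to any trace tool's export format.

```python
from typing import Dict, List

# Hypothetical lookup from figure labels to the Attach step they belong to.
# msg1/msg2 (preamble and RAR) never appear in an L3 trace, so random access
# is only visible indirectly through msg3 (RRC_Conn_Req).
# RRC CONN RECFG is also used for the dedicated bearer; this flat lookup
# simply files it under e-RAB setup.
ATTACH_PHASES: Dict[str, str] = {
    "RRC_Conn_Req": "RRC connection setup",
    "RRC_Conn_Setup": "RRC connection setup",
    "RRC_Conn_Setup_Cmp": "RRC connection setup",
    "INITIAL UE MESSAGE": "NAS procedure",
    "DIRECT TRANSFER": "NAS procedure",
    "INITIAL UE CONTEXT SETUP REQ": "e-RAB setup",
    "RRC SECURITY MODE CMD": "e-RAB setup",
    "RRC SECURITY MODE CMP": "e-RAB setup",
    "RRC_UE_Cap_Enquiry": "e-RAB setup",
    "RRC_UE_Cap_Info": "e-RAB setup",
    "RRC CONN RECFG": "e-RAB setup",
    "RRC CONN RECFG CMP": "e-RAB setup",
    "INITIAL UE CONTEXT SETUP RSP": "e-RAB setup",
    "SAEB SETUP REQ": "dedicated bearer setup",
    "SAEB SETUP RSP": "dedicated bearer setup",
}


def phases_reached(trace: List[str]) -> List[str]:
    """Distinct Attach steps seen in a trace, in order of first appearance.

    Labels missing from the table are ignored.
    """
    reached: List[str] = []
    for message in trace:
        phase = ATTACH_PHASES.get(message)
        if phase is not None and phase not in reached:
            reached.append(phase)
    return reached
```

The last entry of the returned list indicates the step at which an incomplete Attach stopped.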



Service Request

Figure: Service Request signaling between the UE, eNodeB, and MME. Messages shown: PAGING, RRC PAGING, RRC CONN SETUP REQ, RRC CONN SETUP, RRC CONN SETUP CMP, uplink information transfer (UPLINK NAS TRANSPORT), INITIAL UE MESSAGE, direct transmission (authentication and service negotiation), INITIAL UE CONTEXT SETUP REQ, RRC SECURITY MODE CMD, RRC SECURITY MODE CMP, RRC CONN RECFG, RRC CONN RECFG CMP, INITIAL UE CONTEXT SETUP RSP, direct transmission (service negotiation and notification), SAEB SETUP REQ, RRC CONN RECFG, RRC CONN RECFG CMP, SAEB SETUP RSP.

The RRC connection setup cause values are:

- mo-Data (UE-initiated service)

- mt-Access (response to paging)

The Service Request procedure consists of three steps:

- Random access

- RRC connection setup

- e-RAB setup

The EPC has already obtained the registration and capability information of the UE. Therefore, the Service Request procedure usually does not include authentication or the UE capability query.

After attaching to the network, if the UE has returned to idle mode, it initiates the Service Request procedure whenever it needs to perform a service.



TAU Procedure

A tracking area (TA) is used to manage the UE location. Multiple TAs constitute a tracking area list (TAL). After the UE attaches to the network, the MME assigns a TAL to the UE. If the UE moves into a TA outside its TAL, it performs a TAU. A UE in idle mode also performs periodic TAU.

The TAU procedure normally requires neither authentication nor bearer setup. After the TAU procedure is complete, the connection is released.

The RRC connection setup cause value is mo-Signalling.

The TAU procedure consists of three steps:

- Random access

- RRC connection setup

- TAU


Details of the Access Procedure – Random Access Procedure (I)

Objectives of random access

- Synchronizing uplink transmission

- Obtaining uplink scheduling resources

Scenarios of random access

- Initial access in idle mode

- RRC connection re-establishment after radio link failure

- Handover to a new cell

- Downlink data arrival while the uplink is unsynchronized

- Uplink data arrival while the uplink is unsynchronized

Two types of random access

- Contention-based (applicable to all scenarios)

- Contention-free (applicable to handover or downlink data arrival)
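The applicability rule above can be written down as a small lookup. This Python sketch only encodes the statement in the text; the enum and function names are illustrative, not part of any eNodeB interface.

```python
from enum import Enum
from typing import Set


class RaType(Enum):
    CONTENTION_BASED = "contention-based"
    CONTENTION_FREE = "contention-free"


class Scenario(Enum):
    INITIAL_ACCESS = "initial access in idle mode"
    RRC_REESTABLISHMENT = "RRC connection re-establishment after radio link failure"
    HANDOVER = "handover to a new cell"
    DL_DATA_UNSYNC = "downlink data arrival, uplink unsynchronized"
    UL_DATA_UNSYNC = "uplink data arrival, uplink unsynchronized"


# Scenarios in which the network can hand the UE a dedicated preamble in advance.
CONTENTION_FREE_SCENARIOS = {Scenario.HANDOVER, Scenario.DL_DATA_UNSYNC}


def allowed_ra_types(scenario: Scenario) -> Set[RaType]:
    """Contention-based RA is always possible; contention-free RA only where
    the network can assign a preamble first (handover, downlink data)."""
    types = {RaType.CONTENTION_BASED}
    if scenario in CONTENTION_FREE_SCENARIOS:
        types.add(RaType.CONTENTION_FREE)
    return types
```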



Random Access Procedure (II)

Figure: Contention-based random access (UE and eNB): 1 Random Access Preamble, 2 Random Access Response, 3 Scheduled Transmission, 4 Contention Resolution. Contention-free random access (UE and eNB): 0 RA Preamble Assignment, 1 Random Access Preamble, 2 Random Access Response.

Differences between contention-based and contention-free random access

Preamble selection

- Contention-free: the preamble is assigned by the network.

- Contention-based: the preamble is randomly selected by the UE.

Collision risk

- Contention-free: the dedicated preamble is reserved for the UE for a limited time, so no collision occurs.

- Contention-based: collisions can occur when several UEs select the same preamble in the same PRACH occasion; they are resolved in step 4 (contention resolution).
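To give a feel for the contention-based collision risk, the following sketch treats it as a birthday problem. It assumes that every UE picks uniformly and independently from the same pool of contention-based preambles in the same PRACH occasion. The split of 54 contention-based preambles out of the cell's 64 used in the usage example is an assumed configuration, not a value from this guide.

```python
from math import prod


def collision_probability(num_ues: int, num_preambles: int) -> float:
    """Probability that at least two of `num_ues` UEs choose the same preamble.

    Assumes every UE picks uniformly and independently from `num_preambles`
    contention-based preambles in the same PRACH occasion (birthday problem).
    """
    if num_ues > num_preambles:
        return 1.0  # pigeonhole: a repeat is certain
    p_all_distinct = prod(
        (num_preambles - i) / num_preambles for i in range(num_ues)
    )
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Assumed split: 54 of the cell's 64 preambles left for contention-based access.
    for ues in (2, 5, 10):
        print(ues, round(collision_probability(ues, 54), 3))
```

The risk grows quickly with the number of simultaneous attempts, which is why contention resolution (step 4) is needed at all.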



Random Access Procedure (III)

Huawei eNodeB supports the following configurations:

- Preamble formats 0 to 3

- PRACH periods: 10 ms, 5 ms

- Random access procedure: contention-based and contention-free

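Figure: PRACH time-frequency layout for PRACH CONFIGURATION INDEX = 6. A preamble format 0 RACH slot occupies 1 ms and 6 RBs, placed between the PUCCH regions at the band edges alongside the PUSCH. The RACH period is 5 ms, that is, two RACH slots per 10 ms frame.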
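The link between the configuration index and the PRACH period can be sketched as a table lookup. This assumes FDD and preamble format 0. The table is a partial excerpt (indices 0 to 8) based on our reading of 3GPP TS 36.211 Table 5.7.1-2 and should be checked against the specification before use. `PRACH_FDD_FORMAT0` and `prach_period_ms` are illustrative names.

```python
from typing import Dict, Tuple

# Partial excerpt (assumed; verify against 3GPP TS 36.211 Table 5.7.1-2) of the
# FDD PRACH configuration table for preamble format 0:
# index -> (system frames carrying PRACH, subframe numbers carrying PRACH)
PRACH_FDD_FORMAT0: Dict[int, Tuple[str, Tuple[int, ...]]] = {
    0: ("even", (1,)), 1: ("even", (4,)), 2: ("even", (7,)),
    3: ("any", (1,)), 4: ("any", (4,)), 5: ("any", (7,)),
    6: ("any", (1, 6)), 7: ("any", (2, 7)), 8: ("any", (3, 8)),
}


def prach_period_ms(config_index: int) -> float:
    """Average spacing between PRACH occasions in ms (10 ms radio frames)."""
    frames, subframes = PRACH_FDD_FORMAT0[config_index]
    frames_per_cycle = 2 if frames == "even" else 1  # "even" = every other frame
    return frames_per_cycle * 10 / len(subframes)
```

For index 6, the sketch yields 5.0 ms (subframes 1 and 6 of every frame), which matches the 5 ms RACH period shown in the figure.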


Details of the Access Procedure – RRC Connection Setup Procedure (I)

Objectives

- To set up SRB1.

- To send the initial NAS message of the UE to the network.

Key Information Elements

- UE-identity (RRCConnectionRequest and RRCConnectionSetup)

- establishmentCause (RRCConnectionRequest)

- radioResourceConfiguration, for SRB1 only (RRCConnectionSetup)

- selectedPLMN-Identity (RRCConnectionSetupComplete)

- nas-DedicatedInformation (RRCConnectionSetupComplete)

Figure: RRC connection success procedure (RRCConnectionRequest, RRCConnectionSetup, and RRCConnectionSetupComplete between the UE and EUTRAN) and RRC connection failure procedure (RRCConnectionRequest and RRCConnectionReject).



RRC Connection Setup Procedure (II)

Figure: Content of the RRC_Conn_Req message; cause values of the RRC_Conn_Req message.

The ue-Identity of the RRC_Conn_Req message is the S-TMSI if the S-TMSI stored in the UE is valid; otherwise, it is a random value.

The establishmentCause of the RRC_Conn_Req message depends on the type of NAS procedure: different NAS procedures correspond to different establishmentCause values.

The NAS Extended Service Request procedure is used for CS fallback (CSFB) of voice services.
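The mapping between NAS procedure and establishmentCause described in this module can be sketched as follows. The sketch covers only normal-priority UEs and the three procedures discussed here (Attach, TAU, Service Request). It deliberately leaves out CSFB, emergency, and high-priority access causes. The function and enum names are illustrative.

```python
from enum import Enum


class NasProcedure(Enum):
    ATTACH = "attach"
    TAU = "tracking area update"
    SERVICE_REQUEST = "service request"


def establishment_cause(procedure: NasProcedure, paged: bool = False) -> str:
    """establishmentCause per this module (normal-priority UEs only).

    Attach and TAU use mo-Signalling; a Service Request uses mt-Access when it
    answers a paging message and mo-Data when the UE itself starts the service.
    """
    if procedure in (NasProcedure.ATTACH, NasProcedure.TAU):
        return "mo-Signalling"
    return "mt-Access" if paged else "mo-Data"
```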



RRC Connection Setup Procedure (III)

Counters measured during the RRC connection setup procedure

[Point A] When the cell receives the RRC Connection Request message, the counter L.RRC.ConnReq.Att increments by 1.

[Point B] When the cell receives the RRC Connection Request message and delivers the RRC Connection Setup message to the UE, the counter L.RRC.ConnSetup increments by 1.

[Point C] When the cell receives the RRC Connection Setup Complete message, the counter L.RRC.ConnReq.Succ increments by 1.
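The three measurement points can be combined to locate where RRC setups are lost. The following sketch assumes that the success rate is computed as L.RRC.ConnReq.Succ / L.RRC.ConnReq.Att. The counter names come from the text, but the formula and the helper names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RrcSetupCounters:
    att: int    # L.RRC.ConnReq.Att  (Point A)
    setup: int  # L.RRC.ConnSetup    (Point B)
    succ: int   # L.RRC.ConnReq.Succ (Point C)

    def success_rate(self) -> Optional[float]:
        """Succ / Att in percent; None when the cell saw no attempts."""
        return None if self.att == 0 else 100.0 * self.succ / self.att

    def requests_without_setup(self) -> int:
        """A - B: requests that never got an RRC Connection Setup
        (for example, rejected requests)."""
        return self.att - self.setup

    def setups_without_complete(self) -> int:
        """B - C: setups delivered but never completed by the UE."""
        return self.setup - self.succ
```

A large A - B gap points at the eNodeB side (rejects), while a large B - C gap points at the UE not answering the setup, for example because of coverage or uplink problems.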


Details of the Access Procedure – NAS Procedure

The NAS procedure is an interaction between the UE and the EPC, including the authentication, security-mode, identity, and APN procedures. The authentication procedure generates a new set of keys; the security-mode procedure activates the security context derived from the new keys; in the identity procedure, the EPC obtains necessary information from the UE.

During the NAS procedure, the eNodeB transparently forwards the uplink and downlink messages, except that it must select an EPC node in an S1-flex or MOCN network.

The authentication and security-mode procedures work as follows:

1. The MME initiates the AKA procedure and sends the AUTH REQ message, which contains the RAND and AUTN needed for authentication.

2. The UE receives the AUTH REQ message and sends the AUTH RES message containing the RES parameter.

3. If the RES in the AUTH RES message matches the expected response (XRES), the MME triggers the security-mode procedure; otherwise, it sends the AUTH REJ message.

4. Upon reception of the SMC message, the UE does the following:

a) Calculates KnasEnc and KnasInt according to the Selected NAS security algorithms IE of the SMC message.

b) Checks the validity of the UE security capabilities and KSI IEs. If they are valid, the UE sends the SECURITY MODE COMPLETE message to the MME; if not, it sends the SECURITY MODE REJECT message.

Figure: NAS procedure signaling between the UE, eNodeB, and MME: RRC CONN SETUP REQ, RRC CONN SETUP, RRC CONN SETUP CMP, and INITIAL UE MESSAGE, followed by S1AP_DL_NAS_TRANS and S1AP_UL_NAS_TRANS exchanges for authentication and encryption (security mode), ending with Initial_Context_Setup_request.


Details of the Access Procedure – e-RAB Setup Procedure (I)

Key information elements

- SAE Bearer Level QoS parameters (contained in the Initial Context Setup Request message)

- Transport Layer Address (contained in the Initial Context Setup Request and Response messages)

- NAS-PDU (contained in the Initial Context Setup Request message)

- Security key (contained in the Initial Context Setup Request message)

- UE Radio Capability (contained in the Initial Context Setup Request message, optional)

Counters measured during e-RAB setup

[Point A] When the eNodeB receives the INITIAL CONTEXT SETUP REQUEST or E-RAB SETUP REQUEST message from the MME, the number of e-RAB setup attempts increments by 1. If the message requires setup of multiple e-RABs, the counter is calculated separately for each QCI and the results of all QCIs are summed.

[Point B] When the eNodeB sends the INITIAL CONTEXT SETUP RESPONSE or E-RAB SETUP RESPONSE message to the MME, the number of successful e-RAB setups increments by 1. If multiple e-RABs are set up, the counter is calculated separately for each QCI and the results of all QCIs are summed.
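The per-QCI counting rule can be sketched as follows. Each input item is assumed to be a pair of an S1AP message name and the QCIs of the e-RABs it requests (or, for a response, successfully sets up). This simplified input format and the function name are hypothetical; real responses also list failed e-RABs, which are assumed to be filtered out beforehand.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

REQUESTS = ("INITIAL CONTEXT SETUP REQUEST", "E-RAB SETUP REQUEST")
RESPONSES = ("INITIAL CONTEXT SETUP RESPONSE", "E-RAB SETUP RESPONSE")


def erab_counters(messages: Iterable[Tuple[str, List[int]]]) -> Dict[str, Counter]:
    """Count e-RAB setup attempts and successes per QCI.

    Each item is (message_name, qcis): one QCI per e-RAB requested, or, for a
    response, one QCI per e-RAB successfully set up.
    """
    att: Counter = Counter()
    succ: Counter = Counter()
    for name, qcis in messages:
        if name in REQUESTS:
            att.update(qcis)   # Point A: one attempt per e-RAB, counted per QCI
        elif name in RESPONSES:
            succ.update(qcis)  # Point B: one success per e-RAB set up
    return {"att": att, "succ": succ}
```

Summing the values of each `Counter` gives the cell-level totals described above.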



e-RAB Setup Procedure (II)

- When the UE initiates the Attach procedure, the Initial Context Setup Request message sent by the EPC does not contain the UE capability. The eNodeB queries the UE capability, the UE reports it to the eNodeB, and the eNodeB forwards it to the EPC in the UE Capability Info Indication message over the S1 interface.

- During the Attach procedure, failure of the UE capability query procedure causes e-RAB setup failure.

- During the idle-to-active procedure, the EPC sends the Initial Context Setup Request message containing the UE capability to the eNodeB. The eNodeB therefore does not need to query the UE capability, which saves Uu interface resources.

Figure: UE capability procedure between the UE, EUTRAN, and MME: UECapabilityEnquiry, UECapabilityInformation, and UE Capability Ind.



e-RAB Setup Procedure (III)

Objectives

- The security mode procedure activates encryption and integrity protection at the access stratum (AS). Note that AS security mode and NAS security mode are two independent procedures.

- Three encryption algorithms are available: null encryption (EEA0), SNOW 3G (128-EEA1), and AES (128-EEA2).

Time to start the security mode

- After SRB1 is set up and before SRB2 is set up

- Integrity protection starts with the security mode command message (the security mode complete message is also integrity-protected); encryption starts with the first message after the security mode procedure.

- Integrity protection applies to SRBs only; encryption applies to both SRBs and DRBs.

Figure: Security mode success procedure (SecurityModeCommand and SecurityModeComplete between the UE and EUTRAN) and security mode failure procedure (SecurityModeCommand and SecurityModeFailure).
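A common cause of security mode failure is a mismatch between the algorithms configured on the eNodeB and those supported by the UE. The following sketch assumes that the eNodeB walks its configured priority list and picks the first algorithm the UE reports as supported. It treats "no common algorithm" as a configuration mismatch. The identifiers EEA0, 128-EEA1, and 128-EEA2 are the 3GPP names; `select_algorithm` is illustrative.

```python
from typing import Iterable, Optional, Set


def select_algorithm(enodeb_priority: Iterable[str], ue_supported: Set[str]) -> Optional[str]:
    """Return the highest-priority eNodeB algorithm that the UE also supports.

    None means there is no common algorithm: a configuration mismatch that
    would surface as a security mode failure in this simplified model.
    """
    for algorithm in enodeb_priority:
        if algorithm in ue_supported:
            return algorithm
    return None
```

For example, with an eNodeB priority of `["128-EEA2", "128-EEA1", "EEA0"]` and a UE supporting only `{"EEA0", "128-EEA1"}`, the sketch selects `"128-EEA1"`.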



e-RAB Setup Procedure (IV)

Objectives

- During the access procedure, SRB2 and the DRBs are set up in the RRC connection reconfiguration procedure.

- If the reconfiguration fails, the UE initiates the RRC connection re-establishment procedure.

Key information elements

- radioResourceConfiguration (for SRB2 and possibly DRBs) (contained in the default bearer setup)

- nas-DedicatedInformation (contained in the default bearer setup)

RRC connection reconfiguration is also used to configure the following:

- measurementConfiguration (contained in the measurement control)

- mobilityControlInformation (contained in the handover command)

Figure: RRC connection reconfiguration (RRCConnectionReconfiguration and RRCConnectionReconfigurationComplete between the UE and EUTRAN) and, on failure, RRC connection re-establishment.


Contents

Introduction to Access Procedure

Symptoms of Access Problems

Analyzing Causes of an Access Problem and Processing Data Sources

Checklist and Deliverables of an Access Problem

Case Study


Overview of Access Problems

An access failure occurs if a UE initiates a service but fails to set it up.

Measurement of access failures

Access failures are measured by two KPIs: the RRC connection setup success rate and the e-RAB setup success rate. The access success rate is the product of the two.

Random access is not counted in the access success rate because of its random nature.

NAS failures are not counted in the RRC connection setup success rate.

Therefore, the access success rate in the traffic statistics cannot fully reflect the user experience.
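The product rule can be written as a short function. This is a sketch only: the inputs are assumed to be cell- or network-level success and attempt counts for the two procedures, and the function names are illustrative.

```python
def ratio(succ: int, att: int) -> float:
    """Success ratio in [0, 1]; an attempt count of zero is treated as an error."""
    if att <= 0:
        raise ValueError("attempt count must be positive")
    return succ / att


def access_success_rate(rrc_succ: int, rrc_att: int, erab_succ: int, erab_att: int) -> float:
    """Access success rate (%) = RRC setup success rate x e-RAB setup success rate.

    Random access and NAS failures are not reflected in either factor.
    """
    return 100.0 * ratio(rrc_succ, rrc_att) * ratio(erab_succ, erab_att)
```

With 980 of 1000 RRC setups and 990 of 1000 e-RAB setups succeeding, the access success rate is about 97.02%, lower than either factor alone.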

Measurement of access failures during a drive test

In a drive test, messages are traced on both the eNodeB and the UE, so an access success or failure can be determined by checking the signaling messages.

Drive test software such as Huawei UE Probe automatically identifies access failures and calculates the access success rate.

Unlike traffic measurement, drive test measurement also identifies access failures caused by NAS failure or by random access failure.


Symptoms of Access Problems – Random Access Failure

Symptoms of a random access failure

The symptom is that the eNodeB fails to receive the RRC Connection Request message. Because the eNodeB traces no L3 message in this case, a random access failure can be inferred only from the traffic statistics. Some details of a random access failure can be observed on a test UE.

Causes of a random access failure

- The UE does not support a specific band.

- The UE is frequency-locked, or the test UE uses special bandwidth parameters.

- The UE is at the cell edge, and the uplink and downlink path loss is large.

- The cell is sleeping.

Symptoms of a sleeping cell

The DSP CELL command output shows that the cell status is normal, yet no user accesses the cell and no alarm is reported. Traffic measurement shows that the number of RRC connections is 0, which indicates either a cell exception or no users in the cell. Historical traffic measurement shows that UEs used to access the cell but, beginning from a certain moment, no UE accesses it.


Symptoms of Access Problems – RRC Connection Setup Failure

The symptoms of an RRC connection setup failure on the eNodeB are as follows:

- After delivering the RRC_CONN_SETUP message, the eNodeB fails to receive the RRC_CONN_SETUP_CMP message.

- The eNodeB sends the RRC_CONN_REJ message, indicating a problem on the eNodeB side.

The following figure shows the messages of these two failures over the Uu interface.

Counters of the RRC connection setup failures


Symptoms of Access Problems – NAS Failure

The NAS procedure consists of all interactions from the Ue_Initial_Message sent by the eNodeB to the Initial_Ue_Context_Setup_Req message sent by the EPC.

The symptoms are as follows:

- In case of an authentication failure, the EPC sends the release message; the eNodeB does not detect the failure itself.

- If a direct transfer message between the UE and EPC fails to be transmitted over the Uu interface, the eNodeB detects the failure and sends a release request to the EPC.

- If the EPC does not respond or responds too slowly, the eNodeB detects this and sends a release request to the EPC.


Symptoms of Access Problems – e-RAB Setup Failure (I)

An e-RAB setup failure occurs if any step of the e-RAB setup procedure fails, from reception of the Initial_Ue_Context_Setup_Req or E-RAB SETUP REQUEST message to sending of the response message.

Symptoms of an e-RAB setup failure over the Uu interface are as follows:

- During the security procedure, the UE does not send the Complete message or sends a failure message.

- During the DRB setup reconfiguration, the UE does not send the Complete message or initiates RRC connection re-establishment.

- During the UE capability query, the UE does not reply.

Counters of the e-RAB setup failure


Symptoms of Access Problems – e-RAB Setup Failure (II)

Symptoms of an e-RAB setup failure over the S1 interface are as follows:

- The GTP-U resource request fails.

- The EPC behaves abnormally, for example, by delivering incorrect parameters.

- The radio resource request fails.









Troubleshooting the Access Problem by Analyzing the Data Sources

Step 1: Determine the scope of the access problem. Analyze the traffic statistics to determine whether it is a top-cell or top-site problem, an entire-network problem, a comprehensive problem, or a top-terminal/top-UE problem.

Note: 1. The analysis method varies by scenario. In a scenario of degraded performance after an upgrade, compare the performance before and after the upgrade to determine the scope of the degradation. In an inventory optimization scenario, where the access performance is below expectation or needs to be improved, determine the region of poor performance.

2. The access problem of a top cell or the entire network, or a comprehensive problem, can be analyzed by using the traffic statistics. The performance degradation of some terminal types or some UEs is analyzed by using the CHR.

Step 2: Classify the causes of the access problem. Analyze the data sources to classify the causes.

Step 3: Work through the checklist to determine the root cause and the closing action.

Note: The checklist is described in the next chapter.

Step 4: Close the problem and evaluate the result. If the result is unsatisfactory, repeat the preceding steps.


Determining the Scope of an Access Problem – Principles of Selecting Top Cells, Sites, etc.

The principles of selecting top cells or sites vary by scenario.

Scenario 1: Performance degradation in the time dimension. After an upgrade, the access performance degrades, or it degrades suddenly for unknown reasons.

Principles: For each cell, calculate the change in the KPIs (access success rate and access failure count) between before and after the upgrade. Sort the cells by the change in access success rate and by the change in access failure count to obtain the top cells of degraded access success rate and the top cells of access failure count. Top terminal types and top UEs are selected in the same way.

Scenario 2: Performance degradation in an inventory optimization. The access performance of the live network is below expectation and needs to be optimized to the target value.

Principles: Sort the cells by access failure percentage and access failure count to obtain the top cells of low access success rate and the top cells of access failure count. Top terminal types and top UEs are selected in the same way.
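The Scenario 1 principle can be sketched as a sort on the per-cell change in access success rate. The input format (a mapping from cell name to access success rate in percent, taken before and after the upgrade) and the function name are assumptions. The ranking by access failure count works the same way, sorting by the increase in failures instead.

```python
from typing import Dict, List, Tuple


def top_degraded_cells(
    before: Dict[str, float], after: Dict[str, float], n: int
) -> List[Tuple[str, float]]:
    """Return the n cells whose access success rate dropped most.

    `before` and `after` map cell name -> access success rate (%).
    Only cells present in both periods are compared.
    """
    deltas = [(cell, after[cell] - before[cell]) for cell in before if cell in after]
    deltas.sort(key=lambda item: item[1])  # most negative change (largest drop) first
    return deltas[:n]
```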


Determining the Scope of an Access Problem – Criteria

Top-cell problem: If removing the top one-fifth of cells (those with low access success rate and high access failure count) from the calculation of the entire-network access performance significantly improves the performance to the expected value, the access problem is defined as a top-cell problem.

Entire-network problem: If removing the top one-fifth of cells does not significantly improve the entire-network access performance, the problem is defined as an entire-network problem.

Comprehensive problem: If removing the top one-fifth of cells improves the access performance only slightly, to a value still below the expected value, the problem is defined as a comprehensive (top-cell plus entire-network) problem.

Top-terminal or top-UE problem: If removing the top one-fifth of terminals or UEs from the calculation of the entire-network access performance significantly improves the performance to the expected value, the problem is defined as a top-terminal or top-UE problem.

Note: Currently, the CHR of the LTE system provides no information about the terminal type. The terminal type is provided by complaining users or inferred from the symptoms.
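The one-fifth removal test can be sketched as follows. The sketch makes several simplifying assumptions. It ranks cells by access failure count only, and it computes the network rate from pooled success and attempt counts. It also uses a hypothetical `min_gain` threshold (in percentage points) to stand in for "significantly improved", a term the guide does not quantify. The names are illustrative.

```python
from typing import Dict, Tuple

Cells = Dict[str, Tuple[int, int]]  # cell -> (successful accesses, access attempts)


def pooled_rate(cells: Cells) -> float:
    """Network access success rate (%) over the pooled counts of the given cells."""
    succ = sum(s for s, _ in cells.values())
    att = sum(a for _, a in cells.values())
    return 100.0 * succ / att


def classify_scope(cells: Cells, target: float, min_gain: float = 0.1) -> str:
    """Remove the top one-fifth of cells by failure count and compare rates.

    `target` is the expected access success rate (%); `min_gain` is a
    hypothetical threshold (percentage points) for a significant improvement.
    """
    if len(cells) < 2:
        raise ValueError("need at least two cells")
    ranked = sorted(cells, key=lambda c: cells[c][1] - cells[c][0], reverse=True)
    k = max(1, len(ranked) // 5)  # top one-fifth, at least one cell
    remaining = {c: cells[c] for c in ranked[k:]}
    before, after = pooled_rate(cells), pooled_rate(remaining)
    if after >= target:
        return "top-cell problem"
    if after - before < min_gain:
        return "entire-network problem"
    return "comprehensive problem"
```

The same function applies to the top-terminal or top-UE test if the keys are terminal types or UEs instead of cells.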


Classifying the Causes of Access Problems

After determining the scope of the access problem, analyze the following data sources to infer the causes of the problem:

- Traffic measurement

- Signaling

- Drive test data


Analyzing the Traffic Statistics to Infer Causes

Analyzing the traffic statistics

- Determine whether the RRC connection setup procedure, the e-RAB setup procedure, or both are faulty.

- If the RRC connection setup procedure is faulty, analyze the traffic statistics to derive the causes of the failure.

- If the e-RAB setup procedure is faulty, analyze the traffic statistics to derive the causes.


Detecting Sleeping Cells by Analyzing Traffic Statistics (I)

Obtain the following counters from the M2000 at hourly granularity for one week.

Analyze the traffic statistics

Check the traffic statistics of the latest week for changes in user access, taking into account the differences between weekdays and weekends. If the cell used to work normally but, beginning from a certain moment, user access suddenly disappears or gradually decreases to zero while the number of random access preambles is unchanged, the cell is very likely a sleeping cell.

Counter ID | Counter name | Description
1526726658 | L.RRC.ConnReq.Att | Number of received RRC Connection Request messages (excluding retransmission)
1526727215 | L.RA.GrpA.Att | Number of received contention-based preambles (Group A)
1526727218 | L.RA.GrpB.Att | Number of received contention-based preambles (Group B)
1526727216 | L.RA.GrpA.Resp | Number of transmitted RARs to contention-based preambles (Group A)
1526727219 | L.RA.GrpB.Resp | Number of transmitted RARs to contention-based preambles (Group B)
1526727221 | L.RA.Dedicate.Att | Number of received contention-free preambles
1526727222 | L.RA.Dedicate.HO.Att | Number of received contention-free preambles (handover)
1526727223 | L.RA.Dedicate.Resp | Number of transmitted RARs to contention-free preambles
1526727224 | L.RA.Dedicate.HO.Resp | Number of transmitted RARs to contention-free preambles (handover)
1526727225 | L.RA.Dedicate.HO.Msg3Rcv | Number of received MSG3 messages triggered by handover












Detecting Sleeping Cells by Analyzing Traffic Statistics (II)

Examples of traffic statistics indicating sleeping cells

Example 1: From a certain moment onward, access requests are absent; the number of contention-based random access preambles increases abruptly and the number of RARs is 0.

Example 2: From a certain moment onward, access requests are absent; the number of contention-based random access preambles and the number of RARs are unchanged.


Detecting Sleeping Cells by Analyzing Traffic Statistics (III)

Examples of traffic statistics indicating sleeping cells

Example 3: From a certain moment onward, user access is absent; the number of contention-based random access preambles decreases to 0, the number of dedicated preambles increases abruptly, and the number of RARs to dedicated preambles is 0.
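The three example patterns can be matched mechanically against hourly counters. This is a rough sketch under stated assumptions. The current hour is compared with a baseline hour from when the cell worked normally (same weekday and hour). "Increases abruptly" is simplified to "exceeds the baseline", and "unchanged" is simplified to "still non-zero". Real thresholds would need tuning. The class, function, and field names are hypothetical; the field comments map them to the counters in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HourlyCounters:
    rrc_req: int       # L.RRC.ConnReq.Att
    cb_preambles: int  # L.RA.GrpA.Att + L.RA.GrpB.Att
    cb_rars: int       # L.RA.GrpA.Resp + L.RA.GrpB.Resp
    cf_preambles: int  # L.RA.Dedicate.Att
    cf_rars: int       # L.RA.Dedicate.Resp


def sleeping_cell_pattern(baseline: HourlyCounters, now: HourlyCounters) -> Optional[str]:
    """Match an hour against Examples 1 to 3, given a normal baseline hour.

    Returns None if access requests have not disappeared.
    """
    if baseline.rrc_req == 0 or now.rrc_req > 0:
        return None
    if now.cb_preambles > baseline.cb_preambles and now.cb_rars == 0:
        return "example 1: more contention-based preambles, no RAR sent"
    if now.cb_preambles == 0 and now.cf_preambles > baseline.cf_preambles and now.cf_rars == 0:
        return "example 3: only dedicated preambles, no RAR sent"
    if now.cb_preambles > 0 and now.cb_rars > 0:
        return "example 2: preambles answered with RARs, but no RRC request"
    return "no random access activity: the cell may simply be empty"
```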

After detecting a sleeping cell, send the original traffic statistics of the latest week, the Uu interface trace, S1 interface trace, X2 interface trace, and one-click logs of the LMPT and LBBP to the R&D department for technical support.


Analyzing the Signaling Trace to Derive Causes of Access Failures

The signaling trace clearly shows at which step the access procedure fails and is very effective for diagnosing a drive test problem or a reproducible problem. It has two constraints: the trace must be started before the problem occurs, and manual analysis is required.

- Standard interface trace (the major means): Analyze the traffic statistics to find the top cells and top time segments. Start standard interface trace for the top cells during the top time segments to check at which step the access procedure fails.

- Single-UE entire-network trace (a minor means): Use the TMSI of a top UE as input to obtain the IMSI from the EPC, and then start a user trace across the entire network. This means is effective for guaranteeing services to VIP users.

- Cell trace (a minor means): Start cell trace for the top cells during the top time segments to determine the link quality and scheduling of the failed UE.
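Finding "at which step the access procedure fails" amounts to walking the expected message sequence and stopping at the first milestone that never appears. The sketch below uses hypothetical normalized labels for the milestones of a Service Request style access (no UE capability query). It assumes that the trace has already been decoded into such labels; neither the labels nor the function come from a trace tool.

```python
from typing import List, Optional, Sequence

# Hypothetical normalized labels for the milestones of a successful access.
EXPECTED_STEPS: Sequence[str] = (
    "RRC_CONN_REQ", "RRC_CONN_SETUP", "RRC_CONN_SETUP_CMP",
    "INITIAL_UE_MESSAGE", "INITIAL_CONTEXT_SETUP_REQ",
    "SECURITY_MODE_CMD", "SECURITY_MODE_CMP",
    "RRC_CONN_RECFG", "RRC_CONN_RECFG_CMP", "INITIAL_CONTEXT_SETUP_RSP",
)


def first_missing_step(trace: List[str], expected: Sequence[str] = EXPECTED_STEPS) -> Optional[str]:
    """Return the first expected milestone not found (in order) in the trace.

    None means every milestone was seen, that is, the access completed.
    """
    pos = 0
    for step in expected:
        try:
            pos = trace.index(step, pos) + 1  # search only after the previous milestone
        except ValueError:
            return step
    return None
```

For example, a trace that ends with a security mode failure reports `SECURITY_MODE_CMP` as the missing step, which points the analysis at the security procedure.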


Analyzing Drive Test Data to Derive Causes of Access Failures

Compared with the signaling trace of the eNodeB, drive test data provides signal strength and scheduling information in addition to signaling, depending on the drive test software and terminal type. The disadvantage is that only the Uu interface signaling is available. Therefore, signaling trace and drive test are usually used together.

- Determine whether it is a NAS or AS problem: Analyze the signaling procedures. A NAS problem is indicated by a failure at the NAS, such as an authentication failure, and is strongly correlated with subscription.

- In case of an AS problem, determine whether it is an L3 problem. An L3 problem is indicated by a failure message or no reply. A problem below L3 is indicated by scheduling failure or poor signal strength that leads to message transmission failure.

- In case of an L3 problem, a common cause is failure of the security procedure. Check the consistency of the security algorithm settings on the eNodeB and UE.

- In case of a problem below L3, check the RSRP and SINR at the location to determine whether the problem is caused by interference or weak coverage. If the RSRP and SINR are normal, send the trace result to the R&D department for further analysis.
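The decision order above (NAS, then L3, then below L3) can be expressed as a small classifier. The RSRP and SINR thresholds are hypothetical values chosen only for illustration, because the guide does not define "normal". The NAS and L3 flags are assumed to be set by whoever reads the drive test signaling.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration; tune per network.
WEAK_RSRP_DBM = -110.0
POOR_SINR_DB = 0.0


@dataclass
class FailedAccess:
    nas_failure: bool  # e.g. authentication failure seen in the NAS messages
    l3_failure: bool   # failure message or missing reply at L3
    rsrp_dbm: float
    sinr_db: float


def classify_failure(f: FailedAccess) -> str:
    """Follow the NAS -> L3 -> below-L3 order described in the text."""
    if f.nas_failure:
        return "NAS problem: check subscription"
    if f.l3_failure:
        return "L3 problem: check security algorithm consistency"
    if f.rsrp_dbm < WEAK_RSRP_DBM:
        return "below L3: weak coverage"
    if f.sinr_db < POOR_SINR_DB:
        return "below L3: interference"
    return "below L3, RSRP/SINR normal: send trace to R&D"
```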









TOP-Cell Access Problem – Checklist

Each standard action below lists its analysis actions, deliverables, and closing actions.

Standard action: Check device, alarm, and version

- Analysis actions: 1. Use the OMStar to quickly check device faults and alarms; where OMStar is not installed, manually check the top sites for alarms. 2. Check for CPU overload, which leads to flow control and access failure. 3. Check whether the versions of the BBU and RRU of the top cells are special.

- Deliverables: 1. Fault and alarm analysis result; 2. Device fault and alarm clearance methods; 3. Version check result

- Closing actions: 1. Remove the device fault and alarm. 2. Write a summary and case study.

Standard action: Classify failure causes by using traffic measurement and signaling trace

- Analysis actions: 1. Analyze the traffic statistics to determine the scope and causes of the failure. 2. Analyze the signaling trace to determine the failure step.

- Deliverables: 1. Scope and causes of the access failure; 2. Closing actions or extra analysis actions to be taken

- Closing action: Remove the fault according to the scope and cause of the problem.

Standard action: Check all parameters

- Analysis action: Check the correctness and consistency of the parameters.

- Deliverables: 1. Parameter correctness and consistency analysis report; 2. Parameter adjustment and optimization solution

- Closing actions: 1. Adjust and optimize parameters. 2. Write a summary and case study.

Standard action: Check coverage

- Analysis actions: 1. Analyze the Dmrs_sinr contained in the CHR to check for weak coverage. 2. Analyze the drive test data, coverage evaluation report, and regional analysis result to check for coverage overlap and weak coverage.

- Deliverables: 1. Signal strength of top UEs; 2. Coverage evaluation and analysis result

- Closing actions: 1. Optimize the coverage. 2. Write a summary and case study.

Standard action: Check interference

- Analysis action: Check for inter-modulation interference and external interference of the top cells. For details, see the LTE RF Channel Inspection and Troubleshooting Guide.

- Deliverable: Result of interference check

- Closing action: Determine and clear the interference source.

Standard action: Check special scenarios and analyze KPI change

- Analysis actions: 1. Check the special features of the site, for example, dual-band networking, SingleRAN, UMTS/LTE co-antenna, macro site, special frequency, wide coverage, hot spot, or special transmission region. 2. Analyze the trend of KPI deterioration and the trend of traffic increase.

- Deliverables: 1. Analysis of special scenarios; 2. Comparative chart of KPIs and traffic

- Closing action: Analyze the root cause.


Entire-Network Access Problem – Checklist

Each standard action below lists its analysis actions, deliverables, and closing actions.

Standard action: Classify failure causes by analyzing the traffic statistics

- Analysis actions: 1. Analyze the traffic statistics to determine the scope and cause of the access failure. 2. Observe the signaling procedures to determine at which step the failure occurs.

- Deliverables: 1. Scope and causes of the access failure; 2. Closing actions or extra analysis actions to be taken

- Closing action: Remove the fault according to the scope and cause of the problem.

Standard action: Check all parameters

- Analysis action: Check the correctness and consistency of the parameters.

- Deliverables: 1. Parameter correctness and consistency analysis report; 2. Parameter adjustment and optimization solution

- Closing actions: 1. Adjust and optimize parameters. 2. Write a summary and case study.

Standard action: Check for alarms

- Analysis action: Use the OMStar to quickly check device faults and alarms. Where OMStar is not installed, manually check the top sites for alarms.

- Deliverables: 1. Fault and alarm analysis result; 2. Device fault and alarm clearance methods

- Closing actions: 1. Remove device faults and alarms. 2. Write a summary and case study.

Standard action: Check events and operations

- Analysis actions: 1. Analyze events that cause access performance deterioration, such as EPC upgrade, change of the transmission topology, upgrade of transmission devices, release of new terminal types, release of new services, and construction of new frequencies. 2. Check whether a traffic increase is caused by holidays or carnivals and whether a traffic decrease is caused by bad weather or disaster.

- Deliverable: Scenario report

- Closing action: Analyze the events and operations in the context of the scenario. If the problem is caused by the transmission network or EPC, ask the concerned departments for support.

Standard action: Evaluate traffic trend

- Analysis action: Analyze the trend of KPI deterioration and the trend of traffic increase.

- Deliverable: Comparative chart of the KPI and traffic

- Closing actions: 1. Analyze the root cause. 2. Consider the need for capacity expansion.

The checklist of a comprehensive problem (entire-network plus top-cell access problem) is a combination of the checklist for the entire-network problem and the checklist for the top-cell problem.


Checklist for the Access Problem in a Beta Office

Access problems are common in a beta office because of configuration errors. Though easy to diagnose, they are time-consuming to resolve. The following list gives troubleshooting actions that quickly solve most of these problems.

Symptom: After the eNodeB reports the initial UE message, the EPC delivers the release command.

Troubleshooting: Check whether the TAC configuration on the eNodeB is consistent with that on the EPC. If it is inconsistent, the access fails.

Symptom: After the RRC connection is set up successfully and the NAS procedure is initiated, the EPC delivers the release command.

Troubleshooting:

- Check the correctness of the subscription information.

- Check whether the authentication switch is on. Some UEs work only if the authentication switch is on, as specified by the protocols.

- Check whether the UE supports the encryption and integrity protection algorithms configured on the EPC.

- If CSFB is disabled, check whether the UE is set to PS only. If it is not, the UE performs a combined Attach procedure.

Symptom: After the eNodeB delivers the security mode command, the UE returns security mode failure or does not reply.

Troubleshooting: Run the LST ENODEBCIPHERCAP and LST ENODEBINTEGRITYCAP commands on the eNodeB to check whether the integrity protection and encryption algorithms on the eNodeB are consistent with those on the UE.

Symptom: The access succeeds, but the user plane is not connected or is released within 5 minutes.

Troubleshooting: Check whether the IP path of the S1 interface is consistent with the configuration on the EPC.


Suggestions for Solving a Coverage Problem

The symptom is poor link quality caused by unbalanced uplink and downlink or by weak coverage.

- The symptoms of a poor uplink, as shown in the CHR, are the minimum RB count, MCS 0, a PHR below 0 dB, a high uplink BLER, a high CRC error rate, and a negative SINR.

- The symptoms of a poor downlink are a poor CQI or a large number of DTX and NACK messages received by HARQ from the UE.

- Insufficient uplink means that the uplink is poor and the downlink is satisfactory; insufficient downlink means that the uplink is satisfactory and the downlink is poor. Weak coverage means that both the uplink and downlink are poor.

In case of insufficient uplink, the solutions are as follows:

- Add eNodeBs, reduce uplink path loss, add TMAs, or add uplink signal compensation.

In case of insufficient downlink, the solutions are as follows:

- Add eNodeBs, reduce downlink path loss, increase the pilot power, or increase the radius of downlink cell coverage.

In case of weak coverage, the solutions are as follows:

- Add eNodeBs or increase coverage.
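The uplink/downlink classification can be written as a small decision function. This is a sketch under stated assumptions. The numeric limits used to call a link "poor" are hypothetical placeholders, not values from this guide, and should be replaced by the operator's own criteria:

- BLER above 10%
- average CQI below 5
- DTX/NACK ratio above 20%

```python
def uplink_poor(phr_db: float, ul_sinr_db: float, ul_bler: float,
                bler_limit: float = 0.10) -> bool:
    """Hypothetical rule: PHR below 0 dB, negative SINR, or high uplink BLER."""
    return phr_db < 0 or ul_sinr_db < 0 or ul_bler > bler_limit


def downlink_poor(avg_cqi: float, dtx_nack_ratio: float,
                  cqi_limit: float = 5, dtx_nack_limit: float = 0.20) -> bool:
    """Hypothetical rule: poor CQI or many HARQ DTX/NACK feedbacks."""
    return avg_cqi < cqi_limit or dtx_nack_ratio > dtx_nack_limit


REMEDIES = {
    "insufficient uplink": "Add eNodeBs, reduce uplink path loss, add TMAs, "
                           "or add uplink signal compensation.",
    "insufficient downlink": "Add eNodeBs, reduce downlink path loss, increase "
                             "the pilot power, or increase the coverage radius.",
    "weak coverage": "Add eNodeBs or increase coverage.",
}


def classify_coverage(ul_is_poor: bool, dl_is_poor: bool) -> str:
    """Map the uplink/downlink state to the problem categories defined above."""
    if ul_is_poor and dl_is_poor:
        return "weak coverage"
    if ul_is_poor:
        return "insufficient uplink"
    if dl_is_poor:
        return "insufficient downlink"
    return "no coverage problem"


if __name__ == "__main__":
    kind = classify_coverage(uplink_poor(-3.0, -2.5, 0.25), downlink_poor(9, 0.05))
    print(kind, "->", REMEDIES.get(kind, "No action needed."))
```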


Deliverables

If the front-line engineers are unable to solve the problem, they must collect the following deliverables and submit the problem to the R&D engineers for support:

- Results of the troubleshooting actions required in the checklist
- Latest data configuration files and engineering parameters of the top sites
- eNodeB version and patch information, and terminal information
- One week of original traffic measurement data of the top sites
- Standard interface traces of the top sites covering 2 to 3 peak hours
- Time of changes or operations made on the live network
- One-click logs of the top sites
- For a drive test, both the drive test log and the eNodeB standard interface trace








Case Study


Case Study 1: KPI Deterioration Caused by Inter-Modulation Interference (I)

The access KPI of cell 1 of an eNodeB in Germany is poor. The figure on the right shows the traffic statistics.

1. The traffic statistics show that the cause of all RRC connection setup failures is L.RRC.Setup.NoReply. The possible cause is weak coverage or uplink interference.

2. The alarm console shows that the channel unbalance alarm is reported for cell 1 several minutes after cell activation and is cleared after cell deactivation. The customer has replaced the RRU three times, but the channel unbalance alarm persists.

3. Online spectrum scanning shows that the uplink has severe inter-modulation interference and channel unbalance, leading to degraded KPIs.


Case Study 1: KPI Deterioration Caused by Inter-Modulation Interference (II)

The spectrum scanning shows that channel 1 has severe inter-modulation interference. The operating frequency of the cell is 842 to 852 MHz.

The red box in the upper figure is magnified in the lower figure. The green curve is the maximum reception level of channel 0, which does not fluctuate significantly; the white curve is the maximum reception level of channel 1. Across the entire reception band, the level is high on the left and low on the right, a characteristic of inter-modulation interference in the 800-MHz band.


Case Study 2: Incorrect TAC Planning Causes Increased Number of RRC Connection Setups (I)

The number of RRC connection setups in Dresden, Germany increases tenfold after December 13, 2011, but the number of e-RAB setups does not increase.

1. The traffic statistics show that the number of RRC connection setups increases while the RRC setup success rate remains stable. This indicates that the increase is not caused by RRC connection setup failures. The traffic statistics also show that the problem is not global but is confined to BXL641, BXLL70, OXLG63, and OXL529.

2. The alarm console and operation logs of the top three sites show no exception.

3. The Uu interface trace and S1 interface trace of the top three sites show that the UEs in OXLG63 repeatedly perform TAU from TAC 43000 to TAC 43620, as shown in the figure on the right.

4. This analysis shows that the abrupt increase in the number of RRC connection setups is caused by frequent TAU.






Case Study 2: Incorrect TAC Planning Causes Increased Number of RRC Connection Setups (II)

Possible causes of frequent TAU are:

1. Incorrect TAC planning, which leads to ping-pong TAU
2. A short TAU period
3. An abnormal terminal

The root cause can be found from the topology of the top sites. Due to incorrect TAC planning, the UE repeatedly performs cell reselection and TAU.
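In a per-UE trace, ping-pong TAU of the kind found in OXLG63 shows up as the TAC alternating A, B, A within a short time. The sketch below counts such reversals. It assumes the TAU events (timestamp and new TAC) have already been extracted from the S1 trace. The 60-second window is a hypothetical choice.

```python
from typing import Sequence, Tuple

TauEvent = Tuple[float, int]  # (timestamp in seconds, TAC reported in the TAU)


def count_pingpong(taus: Sequence[TauEvent], window_s: float = 60.0) -> int:
    """Count A -> B -> A TAC reversals whose return leg occurs within window_s."""
    count = 0
    for (t0, first), (_, middle), (t2, last) in zip(taus, taus[1:], taus[2:]):
        if first == last and first != middle and t2 - t0 <= window_s:
            count += 1
    return count


if __name__ == "__main__":
    trace = [(0.0, 43000), (12.0, 43620), (25.0, 43000), (40.0, 43620)]
    print(count_pingpong(trace))
```

A high count concentrated on a few cells at a TA border points to TAC planning rather than to the TAU period or a single terminal.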








Case Study 3: IP Path Is Not Configured, Leading to e-RAB Setup Failure (I)

The traffic statistics of a site show that the e-RAB setup success rate is low. The following steps are performed according to the checklist:

1. Check for alarms. There is no alarm.

2. Check the transmission configuration. The IP path configuration is consistent with the planning.

3. Start the S1 interface trace. The trace result shows that the e-RAB setup fails for some UEs.

4. Observe the S1AP_INIT_CONTEXT_SETUP_FAIL message. The cause value is transport resource unavailable(0).


Case Study 3: IP Path Is Not Configured, Leading to e-RAB Setup Failure (II)

Further analysis is as follows:

1. The cause value indicates that the failure is caused not by the RAN but by the transmission network. The IP address contained in the INIT_CONTEXT_SETUP_REQ message is BA 14 05 14 (hexadecimal). This is inconsistent with the peer IP address configured on the eNodeB, which is BA 14 05 13. The figure on the right shows the message.

2. Next, determine whether the root cause is missing configuration on the eNodeB or an incorrect address filled in by the EPC. According to the EPC engineers, the interface address of the UGW belongs to a logical network segment containing more than one address. Only one IP path is configured on the eNodeB, so e-RAB setup fails for the UEs whose bearers use the other UGW addresses.
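The S1AP message shows the addresses as hexadecimal octets. Converting them to dotted-decimal notation makes the mismatch easier to compare against the eNodeB IP path configuration. The sketch uses only Python's standard `ipaddress` module. `has_ip_path` is a hypothetical helper, and the configured path list is taken from this case.

```python
import ipaddress


def ip_from_hex(octets: str) -> ipaddress.IPv4Address:
    """Convert a hex dump such as 'BA 14 05 14' into an IPv4 address."""
    return ipaddress.IPv4Address(bytes.fromhex(octets))


def has_ip_path(peer: ipaddress.IPv4Address, configured_paths: list[str]) -> bool:
    """Hypothetical check: is there an IP path whose peer address equals `peer`?"""
    return peer in {ipaddress.IPv4Address(p) for p in configured_paths}


if __name__ == "__main__":
    requested = ip_from_hex("BA 14 05 14")        # address in INIT_CONTEXT_SETUP_REQ
    configured = str(ip_from_hex("BA 14 05 13"))  # only IP path on the eNodeB
    print(requested, configured, has_ip_path(requested, [configured]))
```

Running it prints `186.20.5.20 186.20.5.19 False`, which confirms that no IP path exists toward the UGW address requested by the EPC.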


Handover


Content

Handover Principle and Signaling Procedure

Symptoms of Handover Problems

Related Tools and Data Collection

Handover Fault Location and Troubleshooting

Routine Troubleshooting Operations and Deliverables for Handover Problems


Overview

Handover is the process of signaling exchanges between the UE and the network that transfers the UE's connection from one cell to another as the UE moves, as shown in the following figure.

The whole handover process in the Long-Term Evolution (LTE) system is controlled by the eNodeB, which monitors the radio quality of the environment where the UE is located. The eNodeB sends a measurement configuration message to the UE. The UE then sends a measurement report to the eNodeB once the conditions for triggering a measurement report are met.

Triggering a handover: Currently, Huawei eNodeBs use event A3 to trigger an intra-frequency handover and events A2 and A4 to trigger an inter-frequency handover.

Implementing a handover: The eNodeB sends a handover command to the UE. After receiving the handover command, the UE disconnects from the serving cell and accesses the target cell.


Handover Type

Handover in the LTE system

- Intra-RAT handover

  - By carrier frequency relationship:
    - Intra-frequency handover
    - Inter-frequency handover

  - By signaling bearing mode:
    - Intra-eNodeB handover
    - Intra-MME X2 handover (if the X2 interface is available)
    - Intra-MME S1 handover (if the X2 interface is unavailable)
    - Inter-MME S1 handover over the X2 interface (if the X2 interface is available)
    - Inter-MME S1 handover over the S1 interface (if the X2 interface is unavailable)

- Inter-RAT handover


Measurement Event


Parameter Configuration (Intra-Frequency Handover)

Handover-related parameters control when the UE sends a measurement report and how easily a handover is triggered. For details about handover parameter configuration, see eRAN3.0 Handover-related MML Command Configuration Guide.

Intra-frequency handover triggering process (event A3):

- Mn indicates the measured Reference Signal Received Power (RSRP) or Reference Signal Received Quality (RSRQ) of the neighboring cell.
- Ofn indicates the frequency-specific offset of the neighboring cell's frequency.
- Ocn indicates the neighboring-cell offset, which is configured through the neighboring relationship.
- Ms indicates the measured RSRP or RSRQ of the serving cell.
- Ofs indicates the frequency-specific offset of the serving frequency.
- Ocs indicates the offset of the serving cell.
- Hys indicates the hysteresis. It is closely related to the service feature and the mobility speed, and it reduces the probability of the ping-pong effect.
- Off indicates the a3-Offset.

Mn and Ms are expressed in dBm (RSRP) or dB (RSRQ). Other parameters are expressed in dB.

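Figure: A3-triggered intra-frequency handover between the UE, source eNB, and target eNB. The UE measures RSRP/RSRQ, and the steps are:

1. A3 measurement control
2. A3 measurement reports (A3 event triggered)
3. Handover preparation
4. RRC Conn. Reconf. incl. mobilityControlInformation
5. Random access procedure
6. RRC Conn. Reconf. Complete

A3 entering condition:

$$M_n + O_{fn} + O_{cn} - Hys > M_s + O_{fs} + O_{cs} + Off$$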
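The A3 entering condition and the time-to-trigger behavior can be checked offline against logged measurements, for example when tuning Ocn or Off. The Python sketch below assumes a list of (time in ms, Mn, Ms) samples. The parameter names mirror the symbols above, and the sample values are invented for illustration.

```python
def a3_entering(mn: float, ms: float, ofn: float = 0.0, ocn: float = 0.0,
                ofs: float = 0.0, ocs: float = 0.0, hys: float = 0.0,
                off: float = 0.0) -> bool:
    """Event A3 entering condition: Mn + Ofn + Ocn - Hys > Ms + Ofs + Ocs + Off."""
    return mn + ofn + ocn - hys > ms + ofs + ocs + off


def a3_triggered(samples, ttt_ms: int, **offsets) -> bool:
    """Return True once the entering condition has held continuously for ttt_ms.

    samples: time-ordered (t_ms, mn, ms) tuples; offsets are passed to a3_entering.
    """
    start = None
    for t_ms, mn, ms in samples:
        if a3_entering(mn, ms, **offsets):
            if start is None:
                start = t_ms  # condition just became true
            if t_ms - start >= ttt_ms:
                return True
        else:
            start = None  # condition broken: restart the time-to-trigger
    return False


if __name__ == "__main__":
    rsrp = [(0, -94.0, -98.0), (160, -94.0, -98.0), (320, -94.0, -98.0)]
    print(a3_triggered(rsrp, ttt_ms=320, hys=1.0, off=2.0))
```

Raising Ocn (or lowering Hys or Off) makes the inequality hold earlier, which is the lever used later for non-timely handovers.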


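Parameter Configuration (Inter-Frequency Handover)

Inter-frequency handover triggering process:

- Event A2 triggers gap measurement. Two gap patterns are available, with a period of 40 ms (default) or 80 ms.
- Event A4 triggers the inter-frequency handover.

Figure: A2/A4-triggered inter-frequency handover between the UE, source eNB, and target eNB. The UE measures RSRP/RSRQ, and the steps are:

1. A3/A1/A2 measurement control
2. A2 report (A2 event triggered)
3. A4 measurement control and gap measurement
4. A4 reports (A4 event triggered)
5. Handover preparation

A2 entering condition:

$$M_s + Hys < Thresh$$

A4 entering condition:

$$M_n + O_{fn} + O_{cn} - Hys > Thresh$$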
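The A2-then-A4 sequence can be sketched the same way. A2 on the serving cell starts gap measurement, and A4 then selects inter-frequency candidates. The thresholds and neighbor RSRP values below are hypothetical, and using one Hys for both events is a simplification.

```python
def a2_entering(ms: float, thresh: float, hys: float = 0.0) -> bool:
    """Event A2 entering condition: Ms + Hys < Thresh."""
    return ms + hys < thresh


def a4_entering(mn: float, thresh: float, ofn: float = 0.0, ocn: float = 0.0,
                hys: float = 0.0) -> bool:
    """Event A4 entering condition: Mn + Ofn + Ocn - Hys > Thresh."""
    return mn + ofn + ocn - hys > thresh


def inter_freq_candidates(ms: float, neighbours: dict[int, float],
                          a2_thresh: float, a4_thresh: float,
                          hys: float = 1.0) -> list[int]:
    """Return PCIs meeting A4, strongest first, or [] if A2 has not fired."""
    if not a2_entering(ms, a2_thresh, hys):
        return []  # serving cell still good: no gap measurement is started
    hits = [pci for pci, mn in neighbours.items()
            if a4_entering(mn, a4_thresh, hys=hys)]
    return sorted(hits, key=lambda pci: neighbours[pci], reverse=True)


if __name__ == "__main__":
    print(inter_freq_candidates(-112.0, {10: -100.0, 20: -108.0, 40: -95.0},
                                a2_thresh=-110.0, a4_thresh=-106.0))
```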


Intra-eNodeB Inter-Cell Handover

Figure: Intra-eNodeB inter-cell handover between the UE, source cell, target cell, MME, and serving gateway (legend: L3 signaling, L1/L2 signaling, user data). The MME provides the area restriction.

- Handover preparation:
  1. Measurement Control
  2. Measurement Reports
  3. HO decision
  4. Handover Command
- Handover execution: the UE detaches from the old cell and synchronizes to the new cell while packets are buffered.
  5. Synchronization
  6. UL allocation + TA for UE
  7. Handover Confirm
- Handover completion: the DL buffer is flushed and in-transit packets continue to be delivered.

- Measurement Reports: The UE sends a measurement report to the eNodeB in the serving cell.
- Handover Command (RRC_CONN_RECFG): The eNodeB sends a Handover Command message after completing admission control and radio resource allocation in the target cell.
- Handover Confirm (RRC_CONN_RECFG_CMP): The UE accesses the target cell. After the handover is complete, resources of the serving cell are released.


Inter-eNodeB X2 Handover (I)

Figure: Inter-eNodeB X2 handover, preparation and execution phases, between the UE, source eNB, target eNB, MME, and serving gateway (legend: L3 signaling, L1/L2 signaling, user data).

The steps are as follows:

0. Area Restriction Provided: The MME provides the area restriction.

1. Measurement Control: The UE has accessed the cell and performs services. The source eNodeB sends a measurement configuration message to the UE, instructing the UE to start neighboring-cell measurement.

2. Measurement Reports: The UE sends a measurement report to the eNodeB after detecting a cell that meets the handover conditions.

3. HO decision: The source eNodeB determines a handover based on the handover algorithm and the current state.

4. Handover Request: The source eNodeB sends a Handover Request message to the target eNodeB, starting the handover preparation. The message carries the service information and other access stratum information (encryption, integrity, and measurement information) of the current UE.

5. Admission Control: After receiving the Handover Request message from the source eNodeB, the target eNodeB performs admission control based on the service information and radio resource configuration carried in the message.

6. Handover Request Ack: The target eNodeB sends a message carrying the admission result and radio resource configuration information to the source eNodeB, completing the handover preparation.

7. Handover Command (RRC_CONN_RECFG): The source eNodeB forwards the container from the target eNodeB to the UE in a radio message to notify the UE of the handover. The UE detaches from the old cell and synchronizes to the new cell.

8. SN Status Transfer: The source eNodeB transfers the sequence number (SN) status and hyper frame number (HFN) information of the UE to the target eNodeB. The target eNodeB uses this information for data retransmission and for encryption and integrity protection. In Huawei's implementation, SN status information applies only to RLC AM mode.

Data forwarding: The source eNodeB starts delivering buffered and in-transit packets to the target eNodeB, which buffers the packets received from the source eNodeB.

9. Synchronisation: The UE synchronizes to the target cell.


Inter-eNodeB X2 Handover (II)

Figure: Inter-eNodeB X2 handover, completion phase, between the UE, source eNB, target eNB, MME, and serving gateway.

The steps are as follows:

11. Handover Confirm: After completing the random access procedure to access the target eNodeB, the UE sends a Handover Confirm message to the target eNodeB.

12. Path Switch Request: After receiving the Handover Confirm message from the UE, the target eNodeB initiates a path switch to the MME so that user data is transmitted through the target eNodeB.

13. User Plane Update Request, 14. Switch DL path (End Marker), and 15. User Plane Update Response: The MME and the S-GW switch the downlink user-plane path to the target eNodeB. The End Marker marks the end of the data sent over the old path. The source eNodeB flushes its DL buffer and continues delivering in-transit packets.

16. Path Switch Request Ack: After completing the user-plane switch on the S-GW, the MME sends a PathSwitchRsp message to the target eNodeB.

17. Release Resource: After the path switch procedure, the target eNodeB instructs the source eNodeB to release related resources.

18. Release Resources: The source eNodeB releases the resources, completing the whole handover procedure.


Inter-eNodeB S1 Handover (I)

Figure: Inter-eNodeB S1 handover, preparation and execution phases, between the UE, source eNB, target eNB, MME, and serving gateway (legend: L3 signaling, L1/L2 signaling, user data). The MME provides the area restriction.

The steps are as follows:

3. HO decision: The source eNodeB decides to perform a handover.

4. Handover Required: The source eNodeB sends a Handover Required message to the MME.

5. Handover Request: The MME transfers the handover request to the corresponding target eNodeB for handover preparation.

Admission Control: The target eNodeB performs admission control and radio resource configuration.

7. Handover Request Acknowledge: After completing admission control and radio resource configuration, the target eNodeB returns a Handover Request Acknowledge message to the MME.

8. Handover Command: The MME transfers the acknowledgement to the source eNodeB in a Handover Command message.

9. Handover Command (RRC_CONN_RECFG): The source eNodeB sends the handover command to the UE. The UE detaches from the old cell and synchronizes to the new cell.

10. eNB SN Status Transfer and 11. MME SN Status Transfer: Similar to the handover over the X2 interface, the source eNodeB sends the SN information to the MME, and the MME forwards it to the target eNodeB.

Data forwarding: Buffered and in-transit packets are delivered to the target eNodeB, which buffers them.

12. Synchronization: The UE synchronizes to the target cell.



Inter-eNodeB S1 Handover (II)

Figure: Inter-eNodeB S1 handover, completion phase, between the UE, source eNB, target eNB, and MME.

The steps are as follows:

14. Handover Confirm: The UE accesses the target eNodeB.

15. Handover Notify: The target eNodeB notifies the MME of the handover completion.

16. UE Context Release Command: The MME instructs the source eNodeB to release resources.

UE Context Release Complete: The source eNodeB returns a resource release completion message.


Querying the Inter-frequency Handover Capability of the UE

To query the UE capability in initial access, view the feature group indicators information element (IE) in the UE CAPABILITY INFO IND message.

Bits 13, 14, and 25 indicate the inter-frequency handover capability of the UE. For details, see 3GPP TS 36.331.
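The decoded UE capability may show featureGroupIndicators as a 32-bit value. In that case, the bits named above can be read as in the sketch below. It assumes the TS 36.331 convention that bit 1 is the leftmost (most significant) bit of the bit string. Requiring all three bits is a simplifying assumption, and the exact meaning of each bit should be checked in the specification for the UE's release.

```python
INTER_FREQ_HO_BITS = (13, 14, 25)  # FGI bits cited in this guide


def fgi_bit(fgi: int, bit: int, width: int = 32) -> bool:
    """Return FGI bit `bit`, numbered from 1 at the leftmost (most significant) bit."""
    if not 1 <= bit <= width:
        raise ValueError(f"bit must be in 1..{width}")
    return (fgi >> (width - bit)) & 1 == 1


def supports_inter_freq_ho(fgi: int) -> bool:
    """Assumption: treat the UE as capable only if all three bits are set."""
    return all(fgi_bit(fgi, b) for b in INTER_FREQ_HO_BITS)


if __name__ == "__main__":
    fgi = int("000C0080", 16)  # illustrative value with bits 13, 14, and 25 set
    print([b for b in range(1, 33) if fgi_bit(fgi, b)], supports_inter_freq_ho(fgi))
```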


Traffic Measurement Counters for Handovers (Outgoing Handover)

Figures: Counter measurement for intra-eNodeB handovers; counter measurement for inter-eNodeB S1 handovers.

The following description of counter measurement uses the intra-eNodeB handover and the inter-eNodeB S1 handover as examples.

Point A: measures the number of outgoing handover attempts. This counter is incremented by 1 after the eNodeB receives the measurement report and decides to perform a handover.

Point B: measures the number of outgoing handover executions. This counter is incremented by 1 after the eNodeB sends a handover command to the UE.

Point C: measures the number of successful outgoing handovers.
- Intra-eNodeB handover: incremented by 1 after the eNodeB receives the handover response message (RRC Connection Reconfiguration Complete) from the UE.
- Inter-eNodeB S1 handover: incremented by 1 after the eNodeB receives the UE release message from the MME.
- Inter-eNodeB X2 handover: incremented by 1 after the eNodeB receives the UE release message from the target eNodeB.


Traffic Measurement Counters for Handovers (Incoming Handover)

Figures: Counter measurement for inter-eNodeB X2 handovers; counter measurement for inter-eNodeB S1 handovers.

The following description of counter measurement uses the inter-eNodeB S1 handover and the inter-eNodeB X2 handover as examples.

Point A: measures the number of incoming handover attempts. This counter is incremented by 1 after the target eNodeB receives a handover request.

Point B: measures the number of incoming handover executions. This counter is incremented by 1 after the target eNodeB sends a handover request acknowledge.

Point C: measures the number of successful incoming handovers.
- Inter-eNodeB X2 handover: incremented by 1 after the target eNodeB sends a UE release message to the source eNodeB.
- Inter-eNodeB S1 handover: incremented by 1 after the target eNodeB sends the handover notification to the MME.
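The Point B and Point C counters feed the handover success rate used later in the routine checklist, which is successful handovers divided by handover executions. The sketch below shows that calculation and how to aggregate several handover types correctly: sum the counters first, then divide. The `HoCounters` structure and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HoCounters:
    attempts: int    # Point A
    executions: int  # Point B
    successes: int   # Point C


def ho_success_rate(*groups: HoCounters) -> float:
    """Successful handovers / handover executions, summed over all given groups."""
    executions = sum(g.executions for g in groups)
    successes = sum(g.successes for g in groups)
    return successes / executions if executions else float("nan")


if __name__ == "__main__":
    x2 = HoCounters(attempts=100, executions=98, successes=97)
    s1 = HoCounters(attempts=20, executions=20, successes=18)
    print(f"X2 {ho_success_rate(x2):.2%}, S1 {ho_success_rate(s1):.2%}, "
          f"inter-eNodeB {ho_success_rate(x2, s1):.2%}")
```

Dividing the summed counters weights each handover type by its traffic volume. Adding or averaging the per-type rates would not.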




Symptoms of Handover Problems (1/5)

A handover problem occurs if a UE sends a measurement report based on the configurations on the eNodeB but the handover procedure fails. By the step at which the handover fails, handover problems can be classified as follows:

- The UE sends a measurement report but does not receive a handover command.
  - The eNodeB fails to receive the measurement report. The uplink signals or uplink messages in the serving cell are faulty.
  - The eNodeB receives the measurement report but does not send a handover command. The cause is internal admission failure, lost handover messages over the S1 or X2 interface, or handover punishment. This problem is caused by a system problem and has nothing to do with the UE or the Uu interface.
  - The eNodeB sends a handover command, which the UE fails to receive. The downlink signals or downlink messages in the serving cell are faulty.
- The UE receives a handover command, but the eNodeB does not receive a handover completion message.
  - The UE performs random access to the target cell, and the eNodeB does not receive message 1.
  - The UE performs random access to the target cell. The eNodeB receives message 1, but the UE does not receive message 2.
  - The UE performs random access to the target cell. The UE receives message 2, but the eNodeB does not receive message 3.
- The eNodeB receives a handover completion message, but the subsequent procedure fails. This problem seldom occurs and is caused by a system problem, having nothing to do with the UE or the Uu interface.

If a handover fails, a service drop or RRC connection reestablishment occurs in most cases.
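When UE-side and eNodeB-side traces are compared, the classification above reduces to checking which messages each side saw, in procedure order. The sketch below assumes the traces have been reduced to a set of hypothetical event labels. For example, `enb_rx_mr` stands for "the eNodeB received the measurement report". These labels are not fields of any real trace tool.

```python
def classify_ho_failure(events: set[str]) -> str:
    """Return the failure step for one handover, following the classification above."""
    if "enb_rx_ho_complete" in events:
        return "post-completion failure (system problem)"
    if "ue_rx_ho_cmd" in events:
        if "enb_rx_msg1" not in events:
            return "msg1 not received by target eNodeB"
        if "ue_rx_msg2" not in events:
            return "msg2 not received by UE"
        if "enb_rx_msg3" not in events:
            return "msg3 not received by target eNodeB"
        return "completion not received after random access"
    if "ue_tx_mr" in events:
        if "enb_rx_mr" not in events:
            return "measurement report lost (uplink fault)"
        if "enb_tx_ho_cmd" not in events:
            return "no handover command sent (system problem)"
        return "handover command lost (downlink fault)"
    return "no measurement report sent"


if __name__ == "__main__":
    print(classify_ho_failure({"ue_tx_mr", "enb_rx_mr", "enb_tx_ho_cmd"}))
```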



In this example, compare logs between the UE and the eNodeB to analyze the symptoms on the UE and the eNodeB in case of a handover failure.

The eNodeB does not receive the measurement report from the UE.

The UE sends a measurement report, which the eNodeB does not receive. Symptoms on the UE and the eNodeB are as follows:

Figures: Signaling on the UE; signaling on the eNodeB.

The UE does not receive a handover command from the eNodeB.

After the UE sends a measurement report, the eNodeB receives the measurement report and sends a handover command, which the UE does not receive. Symptoms on the UE and the eNodeB are as follows:

Figures: Signaling on the UE; signaling on the eNodeB.



The eNodeB does not receive a handover completion message from the UE.

After the UE sends a measurement report, the eNodeB receives the measurement report and sends a handover command. After the UE receives the handover command and initiates access to the target eNodeB, the target eNodeB does not receive the handover completion message. Symptoms on the UE and the eNodeB are as follows:

The UE sends a handover completion message (RRC_Connection_Reconfiguration_Complete) to the eNodeB. However, this message is lost at the lower layers of the Uu interface.

Figure: Signaling on the UE. Callouts: the serving cell sends a handover command; the target cell does not receive a handover completion message.



Summary: Uu interface problems that cause handover failures have many symptoms on the UE, but they share a common characteristic. Shortly (within 2 s) after sending a measurement report, the UE does one of the following:

- sends an RRC_Connection_Request message
- sends an RRC_Connection_Reestablishment_Request message
- directly enters the idle state, in which it can only receive paging and system information
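The 2-second rule of thumb in the summary can be applied automatically to a decoded UE log to flag candidate Uu-related handover failures. In this sketch, the log is assumed to be a time-ordered list of (seconds, message name) pairs. `IDLE` is a hypothetical pseudo-message marking the UE's transition to the idle state.

```python
RECOVERY_MESSAGES = {
    "RRC_Connection_Request",
    "RRC_Connection_Reestablishment_Request",
    "IDLE",  # hypothetical marker for entering the idle state
}


def suspected_uu_failures(ue_log: list[tuple[float, str]],
                          window_s: float = 2.0) -> list[float]:
    """Return timestamps of measurement reports followed within window_s by recovery."""
    hits = []
    for i, (t_mr, name) in enumerate(ue_log):
        if name != "Measurement_Report":
            continue
        for t, later in ue_log[i + 1:]:
            if t - t_mr > window_s:
                break  # log is time-ordered, nothing later can match
            if later in RECOVERY_MESSAGES:
                hits.append(t_mr)
                break
    return hits


if __name__ == "__main__":
    log = [(0.0, "Measurement_Report"),
           (0.4, "RRC_Connection_Reconfiguration"),
           (1.3, "RRC_Connection_Reestablishment_Request"),
           (10.0, "Measurement_Report"),
           (13.0, "RRC_Connection_Request")]
    print(suspected_uu_failures(log))
```

Flagged timestamps are only candidates. The corresponding eNodeB trace is still needed to tell which of the symptoms above occurred.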



Several tools are used for problem analysis. For example, the M2000 is used to trace signaling and replay service data on the eNodeB side, and the Probe is used on the UE side. UEs from other manufacturers have their own analysis tools.

Common tracing tools: standard interface tracing on the Web LMT, and the Probe

Figures: Web LMT interface; Probe interface.



Common tracing tool: signaling tracing on the M2000

Figure: Signaling tracing interface on the M2000.



Data analysis tools:

- Probe: used to trace and analyze the data of Huawei UEs
- Traffic review tool: used to analyze traced eNodeB data



Confirming the Handover Measurement Configuration and Handover Measurement Report Messages

Use the message query software to display the message details.

Display the RRC_Connection_Reconfiguration message. If it contains the measConfig IE, it is a measurement configuration message, and the measId mapped to the handover reportConfigId is the handover measurement ID.

Display the MeasurementReport message. If the measId in this message is the same as the handover measurement ID in the measurement configuration message, the measurement report corresponds to the handover measurement event. In that case, the physCellId is the physical cell identifier (PCI) of the target cell.
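The measId matching described above can be scripted once messages are decoded into nested dictionaries. The dictionary layout here is an assumption that mirrors the TS 36.331 IE names (measIdToAddModList, reportConfigToAddModList, measResultNeighCells). Real decoder output will differ, and the `eventId` field is a simplified stand-in for the reportConfig's trigger type.

```python
def handover_meas_ids(meas_config: dict, event: str = "A3") -> set[int]:
    """Return the measIds whose reportConfig uses the given event."""
    report_ids = {rc["reportConfigId"]
                  for rc in meas_config.get("reportConfigToAddModList", [])
                  if rc.get("eventId") == event}
    return {m["measId"]
            for m in meas_config.get("measIdToAddModList", [])
            if m["reportConfigId"] in report_ids}


def report_targets(meas_report: dict, ho_meas_ids: set[int]) -> list[int]:
    """Return neighbor PCIs if the report belongs to a handover measurement."""
    if meas_report.get("measId") not in ho_meas_ids:
        return []
    return [n["physCellId"] for n in meas_report.get("measResultNeighCells", [])]


if __name__ == "__main__":
    config = {
        "reportConfigToAddModList": [{"reportConfigId": 1, "eventId": "A3"},
                                     {"reportConfigId": 2, "eventId": "A2"}],
        "measIdToAddModList": [{"measId": 1, "measObjectId": 1, "reportConfigId": 1},
                               {"measId": 2, "measObjectId": 1, "reportConfigId": 2}],
    }
    report = {"measId": 1,
              "measResultNeighCells": [{"physCellId": 101}, {"physCellId": 205}]}
    print(report_targets(report, handover_meas_ids(config)))
```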



Confirming the Handover Command Message

Use the message query software to display the last RRC_Connection_Reconfiguration message following the handover measurement report. The following uses a message traced on the UE as an example.

If the targetPhysCellId IE exists in the message, the message is a handover command.
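Picking the handover command out of a decoded UE log follows the same idea. The sketch keeps the last RRCConnectionReconfiguration that carries mobilityControlInfo with a targetPhysCellId. The message layout is again a hypothetical dictionary form, not the output format of the message query software.

```python
from typing import Optional


def find_handover_command(messages: list[dict]) -> Optional[tuple[int, int]]:
    """Return (index, targetPhysCellId) of the last handover command, or None."""
    found = None
    for i, msg in enumerate(messages):
        if msg.get("name") != "RRCConnectionReconfiguration":
            continue
        mci = msg.get("mobilityControlInfo") or {}
        if "targetPhysCellId" in mci:
            found = (i, mci["targetPhysCellId"])
    return found


if __name__ == "__main__":
    log = [
        {"name": "RRCConnectionReconfiguration", "measConfig": {}},
        {"name": "MeasurementReport", "measId": 1},
        {"name": "RRCConnectionReconfiguration",
         "mobilityControlInfo": {"targetPhysCellId": 205}},
    ]
    print(find_handover_command(log))
```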



Confirming the Cell to Which the Handover Completion Message Is Sent

The cell to which the UE sends the handover completion message can be confirmed in two ways: by viewing the traced file on the network side, or by using the message query software on the UE side. On the UE side, display the SystemInformationBlockType1 (SIB1) message received over the Uu interface in the handover part of the log.

Double-click the SIB1 message and view the PLMN ID and cell identity. The cell identity consists of the eNodeB ID and the cell ID, which correspond to the values configured by MML commands on the eNodeB.



Tool Name | Function | Approach
LMT | Used to perform X2 tracing, Uu tracing, S1 tracing, and single-UE tracing. | http://support.huawei.com/support/
TraceViewer | Used to replay signaling messages traced on the Web LMT. | /
PROBE | Used to trace information of Huawei UEs, including signaling information, scheduling information, and signal quality. | /
LTE traffic measurement tool | Used to parse traffic measurement data of the eNodeB. | /




Handover Fault Location and Troubleshooting: Missing Neighboring Cells

Handover failures caused by missing neighboring cells

Symptom: As the RSRP and SINR of the serving cell deteriorate, the RSRP of the neighboring cell improves.

Solution: Manually add the missing neighboring cells.

- On the UE side, the UE sends a measurement report but does not receive a handover command.
- On the eNodeB side, the eNodeB receives the measurement report but does not initiate a handover. No handover request is sent over the X2 interface, and no handover command is sent over the Uu interface.

Currently, many E-UTRANs are at the construction stage, and missing neighboring cells are a serious problem. This is especially because some eNodeBs were not yet running when neighboring cells were planned, so some neighboring relationships are missing. Missing neighboring cells are the top reason for handover failures.


Fault Location and Troubleshooting: Non-Timely Handover (1/2)

Handover failures caused by non-timely handovers

Symptom: When the radio quality of the neighboring cell meets the handover threshold, the RSRP of the serving cell suddenly drops. Generally, such a handover failure is caused by a problem in the serving cell. For example, the eNodeB does not receive the measurement report from the UE, or it fails to send a handover command.

- Tracing results of the eNodeB show one of the following:
  - After sending a handover command, the eNodeB does not receive a handover completion message.
  - The eNodeB does not receive the measurement report from the UE.
- Tracing results of the UE show one of the following:
  - After receiving the handover command and sending a handover completion message, the UE initiates RRC connection reestablishment.
  - The UE does not receive a handover command.

The E-UTRAN works in intra-frequency networking mode and does not support soft handovers, so intra-frequency interference is the largest challenge. Compared with the GERAN and UTRAN, the handover area is much smaller, and handover failures easily occur if handovers are not completed in time.


Fault Location and Troubleshooting: Non-Timely Handover (2/2)

Solutions

- Adjust CellIndividualOffset (used in most cases). Apply this method when both of the following intervals are true:
  - The interval from the time the neighboring-cell radio quality meets the handover threshold to the time the serving-cell radio quality suddenly drops is excessively short (for example, shorter than 1 s).
  - The interval from the time the neighboring-cell radio quality becomes better than the serving-cell radio quality to that drop is excessively long (for example, longer than 2 s).

  In this case, set the CellIndividualOffset between the serving cell and the neighboring cell to a value larger than 0 for an earlier handover.

- Decrease IntraFreqHoA3TimeToTrig (not recommended). Apply this method when the interval from the time the neighboring-cell radio quality becomes better than the serving-cell radio quality to the time the serving-cell radio quality suddenly drops is excessively short (for example, shorter than 0.5 s). Decreasing the time-to-trigger gives an earlier handover.

- Decrease IntraFreqHoA3Hyst and IntraFreqHoA3Offset (not recommended). Apply this method when the same CellIndividualOffset change would be needed for both the serving cell and the neighboring cell. Decreasing these parameters gives an earlier handover.
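The first two rules above compare three moments in a drive-test log. The sketch below assumes the three timestamps (in seconds) have already been read from the UE log. The 1 s, 2 s, and 0.5 s limits are the example values given above, not fixed thresholds. The third rule is left as a manual decision.

```python
def earlier_ho_actions(t_threshold_met: float, t_neighbour_better: float,
                       t_serving_drop: float) -> list[str]:
    """Suggest parameter actions for a non-timely intra-frequency handover.

    t_threshold_met: neighbor quality meets the handover threshold.
    t_neighbour_better: neighbor quality becomes better than the serving cell.
    t_serving_drop: serving-cell quality suddenly drops.
    """
    after_threshold = t_serving_drop - t_threshold_met
    after_better = t_serving_drop - t_neighbour_better
    actions = []
    if after_threshold < 1.0 and after_better > 2.0:
        actions.append("Set CellIndividualOffset > 0 for this neighbor (preferred).")
    if after_better < 0.5:
        actions.append("Decrease IntraFreqHoA3TimeToTrig (not recommended).")
    return actions


if __name__ == "__main__":
    print(earlier_ho_actions(t_threshold_met=2.5, t_neighbour_better=0.0,
                             t_serving_drop=3.0))
```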



Handover Fault Location and Troubleshooting: Weak Coverage

Handover failures caused by weak coverage

Symptom: When the radio quality of the neighboring cell meets the handover threshold, the RSRP of both the serving cell and the neighboring cell is weak.

Solutions: Adjust the power ratio, adjust the antenna tilt angle, or add eNodeBs or carriers.

- Tracing results of the eNodeB show one of the following:
  - After sending a handover command, the eNodeB does not receive a handover completion message.
  - The eNodeB does not receive the measurement report from the UE.
- Tracing results of the UE show one of the following:
  - After receiving the handover command and sending a handover completion message, the UE initiates RRC connection reestablishment.
  - The UE does not receive a handover command.

Weak coverage is another major cause of handover failures on the live E-UTRAN. Currently, most E-UTRANs are under construction, and the coverage is weak.


Handover Fault Location and Troubleshooting: Interference

Handover failures caused by interference

Symptom: Although the RSRP is satisfactory, the throughput is not as good as expected, and problems such as handover failures and service drops occur. For details about how to observe uplink and downlink interference, see LTE RF Channel Test and Check Manual.

Solution: Clear the interference sources. For details, see LTE RF Channel Test and Check Manual.

- The Received Signal Strength Indicator (RSSI) of the interfered RB traced on the network side is obviously larger than those of other RBs.
- The subband channel quality indicator (CQI) reported by the UE is obviously smaller than those of other subbands.
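Both interference indicators above check for one value that is obviously different from the rest. The minimal sketch below assumes per-RB RSSI (dBm) and per-subband CQI lists have been exported from the traces. Using the median as the reference, and the 6 dB and 3-step margins, are hypothetical choices, not criteria from the LTE RF Channel Test and Check Manual.

```python
from statistics import median


def interfered_rbs(rssi_dbm: list[float], margin_db: float = 6.0) -> list[int]:
    """Indices of RBs whose RSSI exceeds the median RSSI by more than margin_db."""
    ref = median(rssi_dbm)
    return [i for i, r in enumerate(rssi_dbm) if r - ref > margin_db]


def weak_subbands(cqi: list[int], margin: int = 3) -> list[int]:
    """Indices of subbands whose CQI is more than `margin` below the median CQI."""
    ref = median(cqi)
    return [i for i, c in enumerate(cqi) if ref - c > margin]


if __name__ == "__main__":
    rssi = [-118.0] * 10
    rssi[4] = -100.0  # one RB with a strong narrowband interferer
    print(interfered_rbs(rssi), weak_subbands([12, 12, 11, 5, 12, 12]))
```

The median is used rather than the mean so that a single strongly interfered RB does not raise the reference level.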





Routine Troubleshooting Procedure for Handover Problems

Check a handover problem by following the procedure in the figure on the right, and then follow the checklist to check items. If necessary, submit related deliverables, including required information and data, to Huawei headquarters for further analysis.


Routine Troubleshooting Operation Checklist

1. View network KPIs.

1. Use the M2000 or PRS to export handover-related KPIs.

2. Sum up the KPIs of the cells to obtain the number of handover attempts, number of handover executions, and number of successful handovers of each type on the entire network.

3. Calculate the intra-eNodeB handover success rate, inter-eNodeB handover success rate, X2 handover success rate, and S1 handover success rate. Formula:

Handover success rate = Number of successful outgoing handovers/Number of outgoing handover executions

Calculate the inter-eNodeB handover success rate from the summed counters, not by adding the X2 and S1 success rates:

Inter-eNodeB handover success rate = (X2 successful handovers + S1 successful handovers)/(X2 handover executions + S1 handover executions)

4. Check whether the handover success rate of each type meets the KPI standard (generally 98.5% for type-A sites; can be defined as required). If a handover success rate does not meet the standard, analyze problems of this type of handover (intra-eNodeB handover, X2 handover, or S1 handover).

Output: KPI report, top failed-handover types



Checklist

2. Check top cells.

Verify top cells based on KPIs and inter-specific-cell handover information.

Figure: Traffic measurement result of inter-specific-cell handover.

1. Sort the cells by the number of handover failures (Number of outgoing handover executions - Number of successful outgoing handovers) in descending order.

2. Select the top 5 cells whose handover success rate is smaller than the average value.

3. If the field personnel report the top n cells with handover failures, include these cells in the top cell list.

Output: Top cell list
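The top-cell selection steps can be expressed as a short ranking function. This sketch assumes per-cell outgoing handover executions and successes are available. It interprets "the average value" as the network-wide success rate (total successes divided by total executions). Both the cell names and that interpretation are assumptions.

```python
def top_cells(cells: dict[str, tuple[int, int]], n: int = 5,
              field_reported: tuple[str, ...] = ()) -> list[str]:
    """cells maps cell name -> (outgoing HO executions, successful outgoing HOs)."""
    total_exec = sum(e for e, _ in cells.values())
    total_succ = sum(s for _, s in cells.values())
    average = total_succ / total_exec if total_exec else 0.0
    # Step 1: sort by number of failures (executions - successes), descending.
    ranked = sorted(cells, key=lambda c: cells[c][0] - cells[c][1], reverse=True)
    # Step 2: keep the first n cells whose success rate is below the average.
    top = [c for c in ranked
           if cells[c][0] and cells[c][1] / cells[c][0] < average][:n]
    # Step 3: add cells reported by field personnel.
    top += [c for c in field_reported if c not in top]
    return top


if __name__ == "__main__":
    stats = {"A": (1000, 990), "B": (200, 150), "C": (500, 480),
             "D": (50, 50), "E": (300, 270)}
    print(top_cells(stats, field_reported=("C",)))
```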

3. Check the equipment status.

1. Verify that handover-related cells are in the activated state.

2. Query eNodeB and cell alarms to check whether abnormal alarms (for example, X2 link disconnection alarms and RRU alarms) are cleared.

3. Check whether the test UE works properly and supports inter-frequency or inter-RAT reselection and handovers. For details, see Querying the Inter-frequency Handover Capability of the UE.



Checklist

4. Check eNodeB data configurations.

1. Check the mapping between versions.

2. Check whether the handover switch is turned on.

3. Confirm the neighboring cell configuration and parameter configuration (neighboring relationships, X2 interface configuration, and transport configuration).

4. Check the handover threshold and time-to-trigger settings. For details, see Parameter Configuration (Intra-Frequency Handover) and Parameter Configuration (Inter-Frequency Handover).

Note: For details about MML commands, see eRAN3.0 Handover-related MML Command Configuration Guide.

5. Perform cell tracing and standard interface tracing.

1. Perform S1 tracing, Uu tracing, and X2 tracing on the M2000 or Web LMT.

2. Use the test UE to perform drive tests and capture the corresponding logs on the Probe.

3. Stop the drive tests after sufficient logs are captured, and then save the logs.

Deliverables: eNodeB standard interface tracing results; drive test results of the Probe

6. Determine a fault.

1. Follow the standard handover procedure of the relevant type to locate the faulty point causing the handover failure.

2. If a fault occurs on the Uu interface, it is a radio-interface fault.

3. If a fault occurs on the S1 or X2 interface, it is a non-radio-interface fault.
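Step 6 amounts to walking the expected message sequence and stopping at the first message missing from the traces. The sketch below does that for the X2 handover procedure described earlier, and the message names are taken from that procedure. The list is simplified: for example, SN Status Transfer is omitted because it applies only to RLC AM bearers. The observed list would come from your own S1, X2, and Uu traces.

```python
# Simplified X2 handover sequence: (message, interface it is carried on).
X2_HO_SEQUENCE = [
    ("Measurement Report", "Uu"),
    ("Handover Request", "X2"),
    ("Handover Request Ack", "X2"),
    ("Handover Command", "Uu"),
    ("Handover Confirm", "Uu"),
    ("Path Switch Request", "S1"),
    ("Path Switch Request Ack", "S1"),
    ("Release Resource", "X2"),
]


def locate_fault(observed: list[str], sequence=X2_HO_SEQUENCE):
    """Return (missing message, interface, fault class) or None if all were seen."""
    seen = set(observed)
    for message, interface in sequence:
        if message not in seen:
            kind = ("radio-interface fault" if interface == "Uu"
                    else "non-radio-interface fault")
            return message, interface, kind
    return None


if __name__ == "__main__":
    trace = ["Measurement Report", "Handover Request", "Handover Request Ack",
             "Handover Command"]
    print(locate_fault(trace))
```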



Checklist

7. Check the radio-interface fault or non-radio-interface fault.

1. In case of a radio-interface fault, see section 4 of Handover Fault Location and Troubleshooting to

check it based on different symptoms.

2. In case of a non-radio-interface fault (X2 interface fault), collect and send BRD logs to the R&D

personnel at Huawei headquarters for further analysis.

3. In case of a non-radio-interface fault (S1 interface fault), analyze the problem with the EPC

personnel, and then collect and send BRD logs to the R&D personnel at Huawei headquarters for

further analysis.

8. Confirm problem-closing operations.

1. For a radio-interface fault, close the problem by referring to Handover Fault Location and

Troubleshooting.

2. For a fault which cannot be located, collect and send deliverables required in the Handover Fault

Deliverables to Huawei headquarters.

Deliverables: Handover Fault Deliverables

9. Implement problem-closing operations.

1. If parameters need to be modified, back up current configurations.

2. Wait for idle hours on the live network.

3. Implement problem-closing operations.

Deliverables: Configuration file that is backed up and operation record



Checklist

10. Confirm the troubleshooting effect.

1. Repeat the drive tests and check whether the handover counters in the affected areas have improved.

2. Track KPI changes for one week and check whether the related counters meet the targets or the fault no longer occurs. Also confirm that no faults have appeared in other features.

3. If the fault persists, locate it again by repeating the operations of the fault location stage, or submit the required deliverables to the R&D personnel for further analysis.

Deliverables: KPI data

11. Write a conclusion report and provide cases.

1. Organize related materials and, if the operator attends the troubleshooting

procedure, provide clarification materials.

2. Conclude the troubleshooting procedure and provide related cases.

Deliverables: Clarification material (optional), cases


Handover Fault Deliverables

The field personnel provide required deliverables when reporting

handover faults to Huawei headquarters.

Description of handover faults, including fault information in the field, for example, whether

upgrades are performed, whether network configurations are modified, whether telephone

numbers are released by the operator, or whether a special test is performed.

Network configuration, including network scale, number of eNodeBs, inter-eNodeB

distance, site height, eNodeB distribution map, and frequency configuration

Network parameter configurations, configuration files of eNodeBs related to handover

faults

BRD logs of eNodeBs, including standard interface logs, CHRs, KPIs, and traffic

measurement results of inter-specific-cell handovers

Drive test data recorded by using the Probe if Huawei UEs are used, single-UE tracing

data on the network side (The initial single-UE tracing only records standard interface

signaling; other data is recorded based on requirements from Huawei headquarters.)


X2 by OSS

Drive Test Result: Cikarang Area


Drive Test Result (figure panels: DL RSRP, UL TX Power)


L3 Message Analysis

(Figure annotation: 6.013 seconds)

Before TX2RelocOverall expires, the UE sends an “RRC_Connection_Reestablishment_Request” to the target eNB.

If the RRC connection reestablishment succeeds, the handover is counted as a success and the HO Too Late counter is incremented.

If the RRC connection reestablishment does not succeed before TX2RelocOverall expires, the handover is counted as a failure.


Intra Freq LTE HO Duration by X2

Intra-frequency HOSR: 100%

Faster than HO via S1


Intra Freq LTE HO Duration by S1

Intra-frequency HOSR: 100%

Slower than HO via X2


(Figure: RRC Connection Reconfiguration Complete, X2 vs. S1)

The drive test results show that, during an X2 handover, the UE always receives the “RRC Connection Reconfiguration” message from the source eNB and always sends “RRC Connection Reconfiguration Complete” to the target eNB.


OSS KPI Analysis

Handover by X2 shows a 100% success rate. If the target eNB does not receive “RRC_Connection_Reconfiguration_Complete”, the TX2RelocOverall timer runs for up to 10.7 seconds. If the target eNB receives an “RRC_Reestablishment_Request” from the UE within those 10.7 s, the handover is counted as a Too Late HO and declared an X2 handover success. If the target eNB receives neither message, the handover is counted as “HOExeX2FailIntra”.
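The counting rule above, together with the reestablishment case described earlier, can be summarized as a small decision function. This is a sketch under stated assumptions: the function and argument names are hypothetical, times are assumed to be measured from the start of TX2RelocOverall, and it models only the outcomes described in this guide, not the eNodeB's actual counter implementation. The default timer is the 10.7 s value quoted in this guide.

```python
from typing import Optional

# T304 + T311 + T301 with the values quoted in this guide (500 + 10000 + 200 ms).
DEFAULT_TX2_RELOC_OVERALL_MS = 500 + 10_000 + 200


def classify_x2_ho(reconfig_complete_ms: Optional[float],
                   reestab_request_ms: Optional[float],
                   reestab_success: bool = False,
                   tx2_reloc_overall_ms: float = DEFAULT_TX2_RELOC_OVERALL_MS) -> str:
    """Classify one X2 handover attempt following the counting rule described above.

    Times are in ms from the start of TX2RelocOverall; None means the
    message was never received by the target eNB.
    """
    # Normal case: Reconfiguration Complete reaches the target before the timer expires.
    if reconfig_complete_ms is not None and reconfig_complete_ms <= tx2_reloc_overall_ms:
        return "HO success"
    # Fallback: a successful reestablishment before expiry is a success, but "too late".
    if (reestab_request_ms is not None
            and reestab_request_ms <= tx2_reloc_overall_ms
            and reestab_success):
        return "HO success (HO Too Late counted)"
    # Neither message in time: execution failure.
    return "HOExeX2FailIntra"
```

For example, a successful reestablishment request arriving 6.013 s into the timer gives `classify_x2_ho(None, 6013, reestab_success=True)`, which returns `"HO success (HO Too Late counted)"`.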


OSS KPI Analysis

Too Late HO Increase


Normal Condition X2 Handover


HO Too Late X2


HO Too Early X2


X2 HO with one S-GW

This procedure is used to hand over a UE from the source eNodeB to the target eNodeB through the X2 interface when neither the MME nor the S-GW changes. It assumes that IP connectivity exists between the source eNodeB and the S-GW and between the target eNodeB and the S-GW.

1. The target eNodeB sends the Path Switch Request message to the MME, indicating that the UE has been handed over. The message contains the TAI + ECGI of the target cell and the list of rejected EPS bearers. The MME determines whether the S-GW can continue serving the UE.

2. The MME sends the User Plane Update Request message to the S-GW to update the downlink eNodeB addresses and TEIDs of the S1-U.

3. The S-GW sends the Update Bearer Request message to the related P-GW. The P-GW updates the bearer context and returns the Update Bearer Response message to the S-GW.

4. After obtaining the eNodeB address and TEIDs of the target eNodeB, the S-GW starts sending downlink data packets to the target eNodeB. The S-GW sends the User Plane Update Response message to the MME.

5. To support reordering in the target eNodeB, the S-GW sends one or more "end marker" packets to the old eNodeB over the old path immediately after switching the path.

6. The MME sends the Path Switch Request Ack message to the target eNodeB to confirm that the Path Switch Request message has been received. The MME provides the handover restriction list to the eNodeB.

7. The target eNodeB sends the Release Resource message (the X2 UE CONTEXT RELEASE message) to inform the source eNodeB that the handover has succeeded and to trigger the release of resources.

The UE initiates the TAU procedure when the TAU trigger conditions are met.
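When locating the faulty step in a trace, it helps to check the captured messages against the expected order of steps 1 to 7. A minimal sketch, assuming the trace has already been reduced to a list of message names (the export format and these Python names are hypothetical; message names follow this guide). An eNodeB S1/X2 trace only shows the eNodeB-facing messages, so that subset is the default.

```python
from typing import Iterable, List, Optional

# Completion phase of an X2 handover with one S-GW, steps 1-7 above
# (message names as used in this guide; the user-plane "end marker" of
# step 5 is not a signaling message and is left out).
FULL_SEQUENCE = [
    "Path Switch Request",          # target eNodeB -> MME
    "User Plane Update Request",    # MME -> S-GW
    "Update Bearer Request",        # S-GW -> P-GW
    "Update Bearer Response",       # P-GW -> S-GW
    "User Plane Update Response",   # S-GW -> MME
    "Path Switch Request Ack",      # MME -> target eNodeB
    "Release Resource",             # target eNodeB -> source eNodeB
]

# Subset visible in an eNodeB S1/X2 trace.
ENODEB_VISIBLE = ["Path Switch Request", "Path Switch Request Ack", "Release Resource"]


def first_missing_step(trace: Iterable[str],
                       expected: List[str] = ENODEB_VISIBLE) -> Optional[str]:
    """Return the first expected message not found, in order, in the trace."""
    remaining = iter(trace)
    for step in expected:
        # `in` consumes the iterator up to the match, so each step must
        # appear after the previous one.
        if step not in remaining:
            return step
    return None
```

Because the iterator is consumed as it is searched, an out-of-order message is reported as missing, which points at the step where the procedure deviated.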


X2-based handover execution timeout procedure

As shown at point B in the adjacent figure, the eNodeB increments this base counter by one each time the source eNodeB, after X2-based handover preparation is complete, sends an RRC Connection Reconfiguration message to the UE and then times out while waiting for the UE CONTEXT RELEASE message (release cause: handover success) from the target eNodeB. This applies when a neighbor relationship has been established between the source cell and the target cell.


Timer Definition on Handover X2

$$T_{\mathrm{X2RelocOverall}} = T_{304} + T_{311} + T_{301} = 500\,\mathrm{ms} + 10000\,\mathrm{ms} + 200\,\mathrm{ms} = 10700\,\mathrm{ms} = 10.7\,\mathrm{s}$$

This timer is related to X2 handover.


Pegging Information

After activating X2 by OSS


UE Simulation

Based on planning-tool simulation, the maximum UE uplink transmission range is approximately 800 meters. In an X2 handover scenario, the target eNodeB will have difficulty receiving “RRC Connection Reconfiguration Complete” from the UE if the distance exceeds 800 meters.
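The 800 m figure comes from planning-tool simulation. The sketch below only illustrates how such a range follows from an uplink link budget: it computes the maximum allowable path loss (MAPL) and inverts a simple log-distance path-loss model. The function names and all example numbers are hypothetical; planning tools use calibrated propagation models, so these defaults are not suitable for planning. For reference, 23 dBm is the nominal maximum output power of a power-class-3 LTE UE.

```python
def max_allowable_path_loss_db(ue_tx_power_dbm: float,
                               enb_sensitivity_dbm: float,
                               gains_db: float = 0.0,
                               losses_db: float = 0.0,
                               margin_db: float = 0.0) -> float:
    """Uplink MAPL = UE Tx power - eNodeB sensitivity + gains - losses - margins."""
    return ue_tx_power_dbm - enb_sensitivity_dbm + gains_db - losses_db - margin_db


def max_range_m(mapl_db: float, pl_ref_db: float, exponent: float,
                d_ref_m: float = 1.0) -> float:
    """Solve PL(d) = PL_ref + 10 * n * log10(d / d_ref) for d at PL(d) = MAPL."""
    return d_ref_m * 10 ** ((mapl_db - pl_ref_db) / (10 * exponent))
```

Because the range grows as a power of ten in (MAPL − PL_ref)/(10n), a few dB of extra uplink loss or interference margin shrinks the reachable distance noticeably, which is consistent with the uplink being the limiting link at the cell edge.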


Cikarang Cluster site to site distance

Some sites have a site-to-site distance of more than 2 km.

If the site-to-site distance is greater than 1.6 km (that is, twice the approximately 800 m UE uplink range), handover problems occur.
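A quick way to find the site pairs that this rule affects is to compute the distance between each pair of neighboring sites and flag pairs beyond twice the UE uplink range. The sketch below assumes that site coordinates and neighbor pairs are already available as Python data (names and coordinates are hypothetical). It uses the haversine great-circle formula on a spherical Earth, which is accurate enough at these distances.

```python
import math
from typing import Dict, Iterable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def far_neighbor_pairs(sites: Dict[str, Tuple[float, float]],
                       neighbor_pairs: Iterable[Tuple[str, str]],
                       ue_ul_range_m: float = 800.0) -> List[Tuple[str, str, int]]:
    """Return neighbor site pairs farther apart than twice the UE uplink range."""
    limit_m = 2 * ue_ul_range_m  # 1.6 km for the 800 m range quoted above
    result = []
    for a, b in neighbor_pairs:
        d = haversine_m(*sites[a], *sites[b])
        if d > limit_m:
            result.append((a, b, round(d)))
    return result
```

Usage: with hypothetical sites S1 (-6.30, 107.15) and S3 (-6.33, 107.15), about 3.3 km apart, the pair (S1, S3) is flagged while a 1.1 km pair is not.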


Animation of Handover by X2

Source eNB and Target eNB, connected by the X2 link. Animation frames:

1. Handover Request is sent over the X2 link.

2. Handover Response is returned over the X2 link.

3. RLF: a radio link failure occurs because the target eNB does not receive the UE's response. The DT log shows that the UE is already transmitting at full power (more than 20 dBm) but still cannot reach the target eNB because of an inadequate UL link budget.

4. Before TX2RelocOverall expires, the UE sends a new message to the target eNB. If the timer expires first, the attempt is counted as HOExeX2FailIntra.

5. UE Context Release is sent over the X2 link, and the handover is counted as HO TOO LATE.


Summary

1. X2 setup and deletion by the OSS SON function creates only one-way relations, not two-way relations.

2. X2 setup and deletion by the OSS SON function does not create X2 relations properly (the relations are not based on the link budget).

3. If the NCL/NRT contains far-away neighbors (for example, more than 5 km away), X2 by OSS still creates the link. This causes handover failures because the UE transmit power cannot reach the target cell with the first message (RRC_Connection_Reconfiguration_Complete). If the UE then sends RRC_Connection_Reestablishment_Request and the handover succeeds, it is counted as HO Too Late. If the UE still cannot reach the target eNB with this second message, it is counted as HO Fail.

4. Based on drive tests, HO over X2 has lower delay than HO over S1.

5. UE UL transmit power is one limitation that causes RLF during HO by X2.
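Summary item 1 can be checked with a simple audit of exported neighbor relations. A minimal sketch, assuming the NCL/NRT has been exported into a Python dict that maps each cell ID to its set of neighbor cell IDs (the export format and the function name are hypothetical).

```python
from typing import Dict, List, Set, Tuple


def one_way_relations(ncl: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) relations that exist in one direction only.

    `ncl` maps each cell to the set of cells configured as its neighbors.
    """
    one_way = []
    for src, targets in ncl.items():
        for tgt in targets:
            # The reverse relation is missing if tgt is absent or does not list src.
            if src not in ncl.get(tgt, set()):
                one_way.append((src, tgt))
    return sorted(one_way)
```

Each reported pair is a candidate for adding the reverse relation manually; combining this audit with a distance check on the same relations also helps spot the far-away relations described in item 3.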


Service Drop


Contents

Formulas of Service-Drop-Related Counters

Common Symptoms of Service Drops

Causes of Service Drops and Data Handling

Checklist and Deliverables for Service Drops

Service Drop Cases

Page 119


Formulas of Service-Drop-Related Counters on the UE Side (1/2)

On the UE side

- Call Drop Rate = eRAB AbnormRel / eRAB Setup Success × 100%

- eRAB AbnormRel: indicates the number of abnormal E-RAB releases.

- eRAB Setup Success: indicates the number of successful E-RAB setups.

Definition Stated in Huawei Genex PA

- 1. The UE receives the RRC Connection Reconfiguration message in a scenario where no Non-

Access Stratum (NAS) message "DEACTIVATE EPS BEARER CONTEXT REQUEST" is received,

no NAS message "DETACH REQUEST" is received from the MME, and no NAS message

"DETACH REQUEST" is sent to the network side. The RRC Connection Reconfiguration message

carries a "drb-ToReleaseList" information element (IE) and the ERABAbnormalRel counter is

incremented by 1. The number of eps-BearerIdentitys under the Releaselist is recorded. ERAB

num indicates the number of released E-RABs. The E-RAB num is subtracted by 1 for each

abnormal release. If the E-RAB number becomes 0, the UE state becomes RRC_IDLE; otherwise,

the UE state does not change.

Page 120


Formulas of Service-Drop-Related Counters on the UE Side (2/2)

- 2. The UE receives the RRC connection release message in a scenario where no NAS message

"DEACTIVATE EPS BEARER CONTEXT REQUEST" is received, no NAS message "DETACH REQUEST" is

received from the MME, and no NAS message "DETACH REQUEST" is sent to the network side. In this case,

an abnormal release is counted into the ERABAbnormalRel counter if RLC transmission exists in 4s before

receiving the RRC connection release message (both uplink and downlink transmission must be considered;

the condition is met as long as data transmission is performed in either direction). Then, the UE state

becomes RRC_IDLE.

- 3. An abnormal release is counted into the ERABAbnormalRel counter if the UE is in the RRC_IDLE state

before receiving the RRC connection release message. The ERABAbnormalRel counter is incremented by 1

and the E-RAB num is incremented based on the number of releases.

- 4. An abnormal release is counted into the ERABAbnormalRel counter if the UE sends an RRC connection

request message in a scenario where no RRC Connection Reconfiguration, DEACTIVATE EPS BEARER

CONTEXT REQUEST, DETACH REQUEST, RRC State, and RRC Connection release message is received.

- 5. An Abnormal E-RAB release event is simultaneously recorded along with an RRC connection

reestablishment failure event.

Note that some sites may have the UE-initiated reestablishments counted into service drops because

different acceptance conditions are used in various sites.
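UE-side rules 1 and 2 can be modeled as a small event-driven counter. This is a simplified sketch, not Genex PA's implementation: the class and its event methods are hypothetical, `nas_release_seen` stands for "a DEACTIVATE EPS BEARER CONTEXT REQUEST or DETACH REQUEST was exchanged in either direction", and rules 3 to 5 are omitted.

```python
from dataclasses import dataclass


@dataclass
class UeDropCounter:
    """Simplified model of UE-side rules 1 and 2 above (rules 3-5 omitted)."""
    active_erabs: int = 0
    erab_setup_success: int = 0
    abnormal_rel: int = 0                   # ERABAbnormalRel
    rrc_state: str = "RRC_IDLE"
    last_rlc_tx_s: float = float("-inf")    # last RLC data in either direction

    def on_erab_setup(self, n: int = 1) -> None:
        self.erab_setup_success += n
        self.active_erabs += n
        self.rrc_state = "RRC_CONNECTED"

    def on_rlc_data(self, t_s: float) -> None:
        # Uplink or downlink both count, so one timestamp is enough.
        self.last_rlc_tx_s = t_s

    def on_reconfig_drb_release(self, released: int, nas_release_seen: bool) -> None:
        """Rule 1: RRC Connection Reconfiguration carrying drb-ToReleaseList."""
        self.active_erabs = max(0, self.active_erabs - released)
        if nas_release_seen:
            return  # normal release: not counted as a drop in this sketch
        self.abnormal_rel += 1
        if self.active_erabs == 0:
            self.rrc_state = "RRC_IDLE"

    def on_rrc_release(self, t_s: float, nas_release_seen: bool,
                       window_s: float = 4.0) -> None:
        """Rule 2: RRC connection release with RLC data in the preceding 4 s."""
        if not nas_release_seen and t_s - self.last_rlc_tx_s <= window_s:
            self.abnormal_rel += 1
        self.active_erabs = 0
        self.rrc_state = "RRC_IDLE"

    def call_drop_rate_pct(self) -> float:
        """eRAB AbnormRel / eRAB Setup Success * 100%."""
        if not self.erab_setup_success:
            return 0.0
        return 100.0 * self.abnormal_rel / self.erab_setup_success
```

The 4 s activity window in rule 2 is what separates a drop (release while data is flowing) from a release after the user has stopped transferring data.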

Page 121


Formulas of Service-Drop-Related Counters on the Network Side

On the network side

- Call Drop Rate = L.E-RAB.AbnormRel / (L.E-RAB.NormRel + L.E-RAB.AbnormRel) × 100%

- L.E-RAB.AbnormRel: indicates the total number of abnormal E-RAB releases.

- L.E-RAB.NormRel: indicates the total number of normal E-RAB releases.
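The network-side formula can be computed as follows. This is a minimal sketch with hypothetical function names. Note that its denominator is the total number of E-RAB releases, whereas the UE-side formula divides by successful E-RAB setups, so the two rates are not directly comparable. When aggregating over a cluster, the sketch sums the counters before dividing instead of averaging per-cell rates.

```python
from typing import Iterable, Tuple


def network_call_drop_rate(abnorm_rel: int, norm_rel: int) -> float:
    """L.E-RAB.AbnormRel / (L.E-RAB.NormRel + L.E-RAB.AbnormRel) * 100%."""
    total = norm_rel + abnorm_rel
    return 100.0 * abnorm_rel / total if total else 0.0


def aggregate_call_drop_rate(cells: Iterable[Tuple[int, int]]) -> float:
    """Cluster- or network-level rate from per-cell (AbnormRel, NormRel) pairs.

    Counters are summed before dividing, so busy cells weigh more than idle ones.
    """
    abnorm = norm = 0
    for a, n in cells:
        abnorm += a
        norm += n
    return network_call_drop_rate(abnorm, norm)
```

For example, cells with (5, 95) and (0, 900) give an aggregate rate of 0.5%, while averaging their per-cell rates (5% and 0%) would report 2.5% and overstate the quiet cell's weight.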

Page 122


Abnormal Release Counter on the Network Side

As shown by point A in figure 1, when the eNodeB sends an E-RAB Release Indication with a cause value other than Normal Release, User Inactivity, cs-fallback-triggered, or inter-RAT redirection, the L.E-RAB.AbnormRel counter is incremented by 1. If the E-RAB Release Indication requires the release of multiple E-RABs, the related counters are incremented by the number of released E-RABs.

As shown by point A in figure 2, after the eNodeB sends a UE Context Release Request to the MME, all E-RABs of the UE are released. If the release cause value is not Normal Release, User Inactivity, cs fallback triggered, or Inter-RAT redirection, the related counters are incremented.

Page 123

Note: In the E-RAB release procedure, one or multiple E-RABs are released. At least one default bearer

remains after the E-RAB release procedure is complete.

In the UE Context Release procedure, all E-RABs of the UE are released. No bearer remains after the UE Context Release procedure is complete, not even the default bearer.


Counters Indicating Causes of Abnormal Releases on the Network Side (1/2)

By abnormal-release cause, the counters can be classified into five types:

- L.E-RAB.AbnormRel.Radio: number of abnormal E-RAB releases caused by radio-layer

problems

- L.E-RAB.AbnormRel.TNL: number of abnormal E-RAB releases caused by transport-layer

problems

- L.E-RAB.AbnormRel.Cong: number of abnormal E-RAB releases caused by network

congestion

- L.E-RAB.AbnormRel.HOFailure: number of abnormal E-RAB releases caused by handover

failures

- L.E-RAB.AbnormRel.MME: number of abnormal E-RAB releases caused by EPC problems

Abnormal E-RAB releases caused by EPC problems

- As shown by points A in figures 1 and 2 on the right, the MME initiates an E-RAB or UE context release procedure. If the cause value of the E-RAB Release Command or UE Context Release Command message received by the eNodeB from the MME is not Normal Release, Detach, User Inactivity, cs fallback triggered, or inter-RAT redirection, the release is counted into the L.E-RAB.AbnormRel.MME counter.

Note: Starting from eRAN2.1 V100R003C00SPC400, the L.E-RAB.AbnormRel.MME counter is not included in the L.E-RAB.AbnormRel counter; that is, abnormal E-RAB releases caused by EPC problems are not recorded as service drops.

Page 124


Counters Indicating Causes of Abnormal Releases on the Network Side (2/2)

Abnormal E-RAB releases caused by non-EPC problems

- As shown by point A in figure 3, when the eNodeB sends an E-RAB Release Indication to the MME with a cause value indicating a radio error, the L.E-RAB.AbnormRel.Radio counter is incremented; if the cause value indicates a transport-layer problem, the L.E-RAB.AbnormRel.TNL counter is incremented; if the cause value indicates congestion, the L.E-RAB.AbnormRel.Cong counter is incremented. If the E-RAB Release Indication requires the release of multiple E-RABs, the related counters are incremented by the number of releases for each cause.

- As shown by point A in figure 4, after the eNodeB sends a UE Context Release Request to the MME, all E-RABs of the UE are released. If the cause value indicates a radio error, the L.E-RAB.AbnormRel.Radio counter is incremented; if the cause value indicates a transport-layer problem, the L.E-RAB.AbnormRel.TNL counter is incremented; if the cause value indicates congestion, the L.E-RAB.AbnormRel.Cong counter is incremented (this counter records abnormal releases caused by preemption and resource congestion); if the cause value indicates a handover failure, the L.E-RAB.AbnormRel.HOFailure counter is incremented. The related counters are incremented by the number of releases for each cause. Releases are not counted again when the MME responds with a UE Context Release Command message.
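The counter rules on the last two pages can be sketched as a mapping from release initiator and cause to counters. This is an illustration under stated assumptions: the cause strings and the function name are hypothetical (real S1AP cause values are structured differently), MME-initiated releases with a normal cause are assumed to count as normal releases, and each release is passed in once, so the MME's UE Context Release Command response is not counted again.

```python
from collections import Counter

# Causes excluded from abnormal-release counting (as listed in this guide).
NORMAL_CAUSES = {"normal-release", "user-inactivity",
                 "cs-fallback-triggered", "inter-rat-redirection"}
MME_NORMAL_CAUSES = NORMAL_CAUSES | {"detach"}

# Hypothetical cause categories for eNodeB-initiated abnormal releases.
ENB_CAUSE_TO_COUNTER = {
    "radio": "L.E-RAB.AbnormRel.Radio",
    "transport": "L.E-RAB.AbnormRel.TNL",
    "congestion": "L.E-RAB.AbnormRel.Cong",
    "ho-failure": "L.E-RAB.AbnormRel.HOFailure",
}


def count_release(counters: Counter, initiator: str, cause: str, n_erabs: int = 1) -> None:
    """Update counters for one release covering n_erabs E-RABs."""
    if initiator == "mme":
        if cause in MME_NORMAL_CAUSES:
            counters["L.E-RAB.NormRel"] += n_erabs  # assumption of this sketch
        else:
            # EPC-caused drops stay out of L.E-RAB.AbnormRel.
            counters["L.E-RAB.AbnormRel.MME"] += n_erabs
        return
    if cause in NORMAL_CAUSES:
        counters["L.E-RAB.NormRel"] += n_erabs
        return
    counters["L.E-RAB.AbnormRel"] += n_erabs
    sub_counter = ENB_CAUSE_TO_COUNTER.get(cause)
    if sub_counter is not None:
        counters[sub_counter] += n_erabs
```

Keeping the MME branch separate mirrors the note above: EPC-caused releases are visible in their own counter but do not raise the service drop rate.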

Page 125


Contents

Definition of Service-Drop-Related Counters




Service Drop Cases

Page 126


Symptoms of Service Drops Observed in Drive Tests

In a drive test, use the Probe, Huawei test UEs or Huawei data card (if a commercial

UE is used, install the corresponding UE signaling tracing software), and traffic

monitoring software installed in the drive test PC to observe the following information.

The traffic volume suddenly drops to zero.

The UE receives system messages when no handover or reestablishment is in progress.
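The first symptom can be found automatically in an exported throughput log. A minimal sketch, assuming the drive test data has been exported as time-ordered (timestamp, throughput) samples. The function name and the three-sample threshold are hypothetical, and a zero-throughput gap can also come from a paused test script, so each hit should be checked against the signaling log.

```python
from typing import List, Sequence, Tuple


def zero_throughput_events(samples: Sequence[Tuple[float, float]],
                           min_zero_samples: int = 3) -> List[float]:
    """Return timestamps where throughput falls from >0 to 0 and stays at 0.

    `samples` is a time-ordered list of (timestamp_s, throughput_kbps) pairs.
    """
    events = []
    i = 1
    while i < len(samples):
        if samples[i][1] == 0 and samples[i - 1][1] > 0:
            # Measure how long the zero run lasts.
            j = i
            while j < len(samples) and samples[j][1] == 0:
                j += 1
            if j - i >= min_zero_samples:
                events.append(samples[i][0])
            i = j
        else:
            i += 1
    return events
```

Requiring several consecutive zero samples filters out short scheduling gaps that are not service drops.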


Page 127


Symptoms of Service Drops Observed in the Traffic Measurement Data

Service drops are monitored by means of traffic measurement on commercial networks. The service

drop rate and number of service drops are observed for determining a fault. The traffic

measurement result exported from the M2000 displays the following information.

Entire-network service drop rate, number of service drops, and number of successful connection establishments

Service drop rate, number of service drops, and service drop time of top cells

(Figure callouts: The entire-network service drop rate is high. Top cells contribute a large share of service drops. Service drop occurrence period of top cells.)

Page 128


Contents





Service Drop Cases

Page 129


Procedure of Analyzing Service Drops

Step 1: Identify the range of service drops. Analyze the traffic measurement data or CHR data to confirm

the range where service drops occur, that is, to check whether it is a top-cell or top-eNodeB problem,

entire-network problem, a comprehensive problem, or a top-UE-type or top-UE problem.

Note 1: The method of analyzing service drops varies between different scenarios.

If the service drop rate deteriorates after the upgrade, compare the difference of the service drop rate before

and after the upgrade and analyze the overall range where the deterioration occurs.

In an existing site to be optimized (counters related to the service drop rate do not meet requirements or need to be improved), analyze only the range with a high service drop rate; there is no need to compare the service drop rate before and after an upgrade.

Step 2: Break down causes of service drops. Use various data sources to identify major causes of

service drops.

Step 3: Perform routine troubleshooting operations for service drops. Follow the routine troubleshooting

operation checklist to locate root causes and determine rectification measures to solve this problem.

Note that the routine troubleshooting operations for service drops are described in detail in the next section.

Step 4: Perform rectification measures. Perform rectification measures to solve the problem and

evaluate the effect. If the rectification target is not met, repeat the preceding steps for further analysis.

Page 130


Determining the Range of Service Drops: Top Cell Selection Principle

Top cells are selected according to different principles in different scenarios.

Scenario 1: The service drop rate deteriorates, for example, after an upgrade or suddenly for an unknown reason.

Top cell selection principle: For each cell, calculate the change in the service drop rate and in the number of abnormal E-RAB releases before and after the specified time (value after deterioration minus value before deterioration). Sort these changes in descending order to determine the top cells with service drop rate deterioration and the top cells with abnormal E-RAB releases.

Scenario 2: Existing sites are to be optimized. Counters related to the service drop rate do not meet

requirements or need to be improved to reach target values.

Top cell selection principle: Sort the service drop rate and the number of abnormal E-RAB releases in descending order to determine the top cells by service drop rate and the top cells by abnormal E-RAB releases.
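Both selection principles can be sketched as two ranking functions. This is a minimal illustration, assuming per-cell data in a dict of cell → (drop rate %, abnormal E-RAB releases); the function names are hypothetical. Each function returns two lists, because ranking by rate alone tends to favor low-traffic cells, while ranking by count favors busy cells.

```python
from typing import Dict, List, Tuple

CellKpi = Dict[str, Tuple[float, int]]  # cell -> (drop_rate_pct, abnormal_releases)


def top_cells_by_delta(before: CellKpi, after: CellKpi,
                       top_n: int = 10) -> Tuple[List[str], List[str]]:
    """Scenario 1: rank cells by deterioration (after minus before)."""
    cells = before.keys() & after.keys()
    d_rate = {c: after[c][0] - before[c][0] for c in cells}
    d_abn = {c: after[c][1] - before[c][1] for c in cells}
    by_rate = sorted(cells, key=lambda c: d_rate[c], reverse=True)[:top_n]
    by_abn = sorted(cells, key=lambda c: d_abn[c], reverse=True)[:top_n]
    return by_rate, by_abn


def top_cells_absolute(current: CellKpi,
                       top_n: int = 10) -> Tuple[List[str], List[str]]:
    """Scenario 2: rank cells directly by drop rate and by abnormal releases."""
    by_rate = sorted(current, key=lambda c: current[c][0], reverse=True)[:top_n]
    by_abn = sorted(current, key=lambda c: current[c][1], reverse=True)[:top_n]
    return by_rate, by_abn
```

Only cells present in both periods are compared in scenario 1, so newly added or removed cells need to be reviewed separately.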

Page 131


Determining the Range of Service Drops: Criteria

Top-cell problem: If the top 20% of cells with service drop rate deterioration and the top 20% of cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters improve significantly, reaching the original or target values, the service drops are caused by top-cell problems.

Entire-network problem: If the top 20% of cells with service drop rate deterioration and the top 20% of cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters do not improve, the service drops are caused by entire-network problems.

Comprehensive problem: If the top 20% of cells with service drop rate deterioration and the top 20% of cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters improve to some extent but remain worse than the original values (still not meeting the target values), the service drops are caused by comprehensive (top-cell + entire-network) problems.

Top-UE problem: If the top 20% of UEs with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters improve significantly, reaching the original or target values, the service drops are caused by top-UE problems.
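The first three criteria can be expressed as one decision function. This is a sketch under stated assumptions: rates are in percent (lower is better), `without_top_pct` is the entire-network rate recomputed from counter totals after removing the top 20% of cells, and `tol_pct`, the improvement below which the rate counts as "not improved", is a hypothetical threshold that each project must set.

```python
def classify_drop_range(original_pct: float, target_pct: float,
                        current_pct: float, without_top_pct: float,
                        tol_pct: float = 0.01) -> str:
    """Apply the range criteria to entire-network service drop rates."""
    # Reaching either the original value or the target value is enough.
    if without_top_pct <= max(original_pct, target_pct):
        return "top-cell problem"
    # Removing the top cells barely changes the rate.
    if current_pct - without_top_pct < tol_pct:
        return "entire-network problem"
    # Some improvement, but not back to the original or target value.
    return "comprehensive problem"
```

The rate without the top cells should be recomputed from summed counters rather than by averaging per-cell rates, so that the comparison with the entire-network value is like for like.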

Note:

Currently, the UE type cannot be obtained from the CHR. Check user complaints to determine whether this type of problem occurs, and then analyze the symptoms to check whether known problems exist on the related terminals.

The eNodeB cannot obtain the international mobile subscriber identities (IMSIs) of top UEs due to security restrictions and must use temporary mobile subscriber identities (TMSIs) to determine top UEs.
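Top UEs can be found by counting abnormal releases per TMSI. A minimal sketch, assuming a list of TMSIs extracted from abnormal-release records; the function name and both thresholds are hypothetical. A TMSI can be reallocated by the network, so per-TMSI counts are approximate, and the IMSI must be obtained from the EPC before single-UE tracing.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ues(abnormal_release_tmsis: Iterable[str],
            share_threshold: float = 0.05,
            min_count: int = 3) -> List[Tuple[str, int]]:
    """Return (tmsi, count) for UEs whose share of abnormal releases exceeds the threshold."""
    counts = Counter(abnormal_release_tmsis)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() sorts by count, so the worst UEs come first.
    return [(tmsi, n) for tmsi, n in counts.most_common()
            if n >= min_count and n / total > share_threshold]
```

The minimum count keeps a single drop in a small sample from qualifying a UE as a top UE.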

Page 132


Classification of Service Drop Causes: Obtaining Data Sources

If the service drop range is determined, use various data sources to locate

causes of the service drop. Data sources include:

Traffic measurement data

Export the traffic measurement data file from the M2000 or PRS. For details, see

section 2.3.3 in eRAN2.1 Service Drop Troubleshooting and Optimization Guide.

Signaling tracing result on the network side

Perform signaling tracing on the M2000 to obtain the signaling tracing result.

For details, see section 2.2.2 in eRAN2.1 Service Drop Troubleshooting and

Optimization Guide.

Drive test data

Perform drive tests to obtain related data. For details, see section 2.1.3 in

eRAN2.1 Service Drop Troubleshooting and Optimization Guide.

Page 133


Classification of Service Drop Causes: Tools

Available tools, tool functions, and tool-obtaining approaches:

TraceViewer
- Function: Used to replay signaling messages traced on the LMT.
- Approach: Released along with the version and contained in the OfflineTool package.

PROBE
- Function: Used to trace information of Huawei UEs, including signaling information, scheduling information, and signal quality.
- Approach: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001099409&colID=ROOTENWEB|CO0000000174

ASSISTANT
- Function: Used to measure and analyze information of Huawei UEs, including signaling information, scheduling information, and signal quality.

NIC
- Function: Used to collect data in batches.

PRS
- Function: Used to parse traffic measurement data of the eNodeB.
- Approach: 00001430110&colID=ROOTWEB|CO0000000065

OMstar
- Function: Used to parse and analyze original traffic measurement data and CHR data, and to compare parameters.


Page 134


Classification of Service Drop Causes: Tracing Tool Interface

(Figure: Probe interface; signaling tracing interface on the M2000)

Page 135


Classification of Service Drop Causes: Analysis Tool Interface

(Figure: Probe, used to trace and analyze the data of Huawei UEs; TrafficReview, used to analyze the eNodeB tracing data)

Page 136


Classification of Service Drop Causes: Identifying Reconfiguration Messages

RRC RECONFIGURATION

Use the message query software to display the message details.

If the measConfig IE exists, the message is a measurement configuration message.

If the cqi-ReportConfig IE exists, the message is a Channel Quality Indicator (CQI) reconfiguration message.

If the targetPhysCellId IE exists, the RRCConnectionReconfiguration message is a handover command.
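The three identification rules can be applied to a decoded message with a recursive IE search. A minimal sketch, assuming the message has been decoded into nested Python dicts and lists keyed by 3GPP IE names; the decode format and function names are hypothetical, and real tools export messages differently. In TS 36.331, targetPhysCellId sits inside mobilityControlInfo and cqi-ReportConfig inside physicalConfigDedicated, so a recursive search is needed. One message can carry several of these IEs, so all matching labels are returned.

```python
from typing import Any, List


def has_ie(msg: Any, ie_name: str) -> bool:
    """Depth-first search for an IE name in a nested dict/list decode."""
    if isinstance(msg, dict):
        if ie_name in msg:
            return True
        return any(has_ie(v, ie_name) for v in msg.values())
    if isinstance(msg, list):
        return any(has_ie(v, ie_name) for v in msg)
    return False


def classify_reconfig(msg: Any) -> List[str]:
    """Label an RRCConnectionReconfiguration by the IEs it carries."""
    labels = []
    if has_ie(msg, "targetPhysCellId"):
        labels.append("handover command")
    if has_ie(msg, "measConfig"):
        labels.append("measurement configuration")
    if has_ie(msg, "cqi-ReportConfig"):
        labels.append("CQI reconfiguration")
    return labels
```

Checking for the handover IE first puts the most important label at the front of the list when a handover command also carries a new measurement configuration.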

Page 137


Classifying Service Drop Causes Based on Traffic Measurement Data

Trend Analysis

- Obtain the entire-network service drop rate for at least one to two weeks. If an upgrade is performed, collect and analyze the service drop rate for the two weeks before the upgrade and the week after the upgrade, as shown in the figure on the right.

Cause Analysis

- Analyze traffic measurement counters to check whether E-RAB releases are caused by radio faults or cell resource problems, as shown in the figure on the bottom left.

Top cell analysis

- Analyze traffic measurement data to determine the top cells and top periods of RRC connection or E-RAB establishment failures, as shown in the figure on the bottom right.

Page 138
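The turning point of the trend can be estimated with a simple single change-point search: choose the split that maximizes the increase in the mean daily drop rate. This is one possible heuristic, not a Huawei method. The function name and `min_segment` are hypothetical, and each candidate should be confirmed against the operation records for that day.

```python
from statistics import mean
from typing import Optional, Sequence


def turning_point(daily_rates: Sequence[float], min_segment: int = 3) -> Optional[int]:
    """Return the index k maximizing mean(rates[k:]) - mean(rates[:k]).

    k is the first day of the deteriorated period; None if there are too
    few samples for two segments of at least min_segment days.
    """
    best_k, best_gain = None, float("-inf")
    for k in range(min_segment, len(daily_rates) - min_segment + 1):
        gain = mean(daily_rates[k:]) - mean(daily_rates[:k])
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k
```

For example, two weeks at 0.5% followed by one week at 1.0% return index 14, which is the first day after the upgrade.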


Analyzing Service Drop Causes by Using Signaling Tracing

Signaling tracing can be used to locate the procedure in which a service drop occurs, and it is especially effective for locating drive test problems and repeatable problems. However, signaling tracing must be started before a problem occurs and requires manual analysis. Therefore, signaling tracing is not suitable for unrepeatable or low-probability problems.

- Standard interface tracing (major): After top cells and top periods are determined by using traffic measurement, perform standard interface tracing for the corresponding cells and periods to check which step triggers the service drop.

- Single-UE entire-network tracing (minor): Obtain the IMSI of a top UE from the EPC based on the known TMSI, and then perform entire-network tracing on the UE. This method is especially effective for subsequent VIP maintenance. For details about the operation method, see chapter 6 in LTE OM Tracing and Data Collection Guide.doc.

Page 139


Analyzing Service Drop Causes by Using Drive Test Data

Compared with eNodeB signaling tracing, the advantage of a drive test is that it obtains not only signaling messages but also the uplink signal strength, uplink transmit power, bit error rate, and scheduling information (the available information depends on the drive test software and UE). The disadvantage of a drive test is that only Uu tracing results (RRC and NAS messages) are available, so they need to be analyzed together with the eNodeB signaling tracing results.

- Differentiating an uplink problem from a downlink problem

- The drive test software can be used to determine whether the UE does not receive a message from the eNodeB or the eNodeB does not receive the response from the UE. The downlink RSRP and SINR can be observed to check the quality of the downlink channel. The uplink transmit power can be observed to check whether uplink signal demodulation is limited.

- Isolating UE faults from non-UE faults

- Logs are analyzed to determine whether received signaling messages are properly processed or

the UE encounters faults such as suddenly stopped data transmissions.
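The uplink and downlink checks above can be combined into a rough first-pass classifier for the moment of a drop. This is a heuristic sketch: the function name and all thresholds are hypothetical defaults chosen for illustration, with the 20 dBm transmit-power threshold echoing the full-power observation in the X2 drive test earlier in this guide.

```python
def likely_link_direction(ue_tx_power_dbm: float, dl_rsrp_dbm: float, dl_sinr_db: float,
                          max_tx_dbm: float = 23.0, tx_headroom_db: float = 3.0,
                          rsrp_floor_dbm: float = -110.0, sinr_floor_db: float = 0.0) -> str:
    """Rough classification of the limiting link at the moment of a drop."""
    # UE at or near maximum power suggests the uplink budget is exhausted.
    ul_limited = ue_tx_power_dbm >= max_tx_dbm - tx_headroom_db
    # Weak or noisy downlink suggests a downlink coverage or interference problem.
    dl_limited = dl_rsrp_dbm < rsrp_floor_dbm or dl_sinr_db < sinr_floor_db
    if ul_limited and dl_limited:
        return "both"
    if ul_limited:
        return "uplink"
    if dl_limited:
        return "downlink"
    return "neither (check UE or signaling)"
```

A result of "neither" points toward the second check above, isolating UE faults such as suddenly stopped data transmission.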

Page 140


Contents





Service Drop Cases

Page 141


Entire-Network Service Drop: Routine Operation Checklist

Page 142

Note: For details about routine troubleshooting operations for a comprehensive (entire-network + top-cell) problem, see the checklists of the top-

cell problem and the entire-network problem.

Checklist columns: Routine Operation, Analysis Operation, Operation Deliverables, Solution Operation.

Preliminary analysis of traffic measurement data related to service drops
- Analysis operation: 1. Quickly analyze the traffic measurement data and export the range and causes of service drops. 2. Analyze the service drop rate trend to identify the turning point.
- Deliverables: 1. Distribution of service drop causes and top causes. 2. Operations performed at the turning point of the service drop rate.
- Solution operation: 1. Perform optimization operations based on the top service drop causes. 2. Provide the operations performed at the turning point of the service drop rate and evaluate the impact of each operation on the service drop rate.

Version check
- Analysis operation: 1. Check whether the eNodeB has been upgraded or patched. 2. Check whether the EPC has been upgraded or patched.
- Deliverables: Version numbers before and after the upgrade.
- Solution operation: Provide the modifications made in the upgrade that may affect the service drop rate, referring to the release notes.

Equipment and transport alarms
- Analysis operation: Check alarms on the entire network.
- Deliverables: List of critical and major alarms.
- Solution operation: Analyze the impact of alarms on the service drop rate and check whether the service drop rate recovers after the alarms are cleared.

Data configuration check
- Analysis operation: 1. Check parameter settings on the entire network. 2. Check modified parameters on the EPC.
- Deliverables: 1. Parameter differences before and after the upgrade. 2. Parameter differences compared with the baseline parameters of the new version. 3. Objective and impact of parameter modifications on the EPC.
- Solution operation: 1. Check whether parameter modifications affect the service drop rate. 2. Revert parameters and check whether the service drop rate recovers.

Operation record check
- Analysis operation: Check whether a large number of operation records exist on the entire network and whether neighboring cells and PCIs have been replanned.
- Deliverables: Records of operations on the entire network.
- Solution operation: Analyze the impact of the operations on the service drop rate and check whether the operations can be reverted.

Neighboring relationship check
- Analysis operation: Check whether neighboring cells are missing. Deploying a large number of new eNodeBs scattered among existing eNodeBs may make the neighbor relationships of many adjacent sites improper.
- Deliverables: Information on missing neighboring cells.
- Solution operation: Add the missing neighboring cells and check whether the service drop rate recovers.

Major events check
- Analysis operation: Check whether a large-scale telephone number release has been implemented or other important activities, such as ceremonies, holidays, and sports events, are being held.
- Deliverables: 1. The UE types involved in the telephone number release, the quantity of numbers released, and the subscription policies. 2. The range and period of the important activities.
- Solution operation: Confirm the relationship between the important event and the deterioration of the service drop rate.


Top-Cell Service Drop: Routine Troubleshooting Operation Checklist

Page 143

Routine Operation Analysis Operation deliverables Solution Operation

Preliminary analysis on the traffic measurement data related to top-eNodeBservice drops

1. Quickly analyze the traffic measurement data and export the range and causes of service drops. 2. Analyze the service drop rate trend to identify the turning point.

1. Distribution of service drop causes and top causes; 2. Operations performed at the turning point of the service drop rate

1. Perform corresponding optimization operations based on top service drop causes. 2. Provide operations performed at the turning point of the service drop rate and evaluate the impact of each operation on the service drop rate.

Top-eNodeB version check

Check whether the eNodeB is upgraded or has patches installed patches.

Version No. before and after the upgrade Provides modifications before and after the upgrade possibly affecting the service drop rate by referring to the release notes.

Equipment and transport alarms of top eNodeBs

1. Check alarms of top eNodeBs. List critical and major alarms. Analyze the impact of alarms on the service drop rate and check whether the service drop rate is recovered after alarms are cleared.

Top-eNodeB parameter settings check

Check parameter settings of top eNodeBs. 1. Parameter differences before and after the upgrade; 2. Parameter differences in comparison with the baseline parameters of the new version.

1. Check whether parameter modification affects the service drop rate. 2. Revert parameters and check whether the service drop rate is recovered.

Top-eNodeB operation record check

Check whether a great amount of operation records exist on the entire network and whether neighboring cells and PCIs are replanned.

Records of operations on the entire network

Analyze the impact of operations on the service drop rate and check whether the operations can be reverted.

Top-eNodeB neighboring relationship check

Check whether neighboring cells are missing. Deployment of a great number of eNodeBsbetween existing eNodeBs in a scattered manner may make the neighboring relationships of many adjacent sites become improper.

Information about missing neighboring cells

Add the missing neighboring cells and check whether the service drop rate recovers.

Top-cell coverage check

Analyze the MCS and CQI information in the traffic measurement data, CHR data, and drive test data to check whether top cells encounter cross coverage or weak coverage.

Top-cell coverage evaluation report

If weak coverage exists, adjust the coverage by means of network optimization.

Top-cell interference check

Analyze the real-time tracing data to check whether top cells encounter intermodulation interference or external interference.

Top-cell interference evaluation report

If interference exists, solve the problem by referring to the interference check manual.

Major events check

Check whether a large-scale telephone number release has been implemented or other important activities such as ceremonies, holidays, and sport events are being held.

1. Verify the UE type involved in the telephone number release, number release amount, and subscription policies. 2. Confirm the range and period of time of important activities.

Confirm the relationship between the important event and the deterioration of service drop rate.
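The first checklist item (cause distribution and the turning point of the drop-rate trend) can be sketched as below. This is a minimal illustration, not a vendor tool: the counter export is assumed to be a plain dict, the drop-rate formula abnormal / (abnormal + normal) E-RAB releases is an assumption, and the "turning point" is taken to be the largest day-over-day change. L.E-RAB.AbnormRel.MME is left out of the distribution because, as noted under MME Problems, those releases are not counted into L.E-RAB.AbnormRel.

```python
# Sketch: preliminary analysis of service-drop counters for top eNodeBs.
# Assumptions (not from the guideline): counters arrive as a dict keyed by
# counter name, and the drop rate is abnormal / (abnormal + normal) releases.

# eNodeB-side sub-cause counters named in this guideline. The MME counter is
# excluded because MME-initiated releases are not part of L.E-RAB.AbnormRel.
ENB_CAUSES = ("Radio", "HOFailure", "TNL", "Cong")


def cause_distribution(counters: dict[str, int]) -> list[tuple[str, float]]:
    """Return eNodeB-side drop causes sorted by their share of abnormal releases."""
    values = {c: counters.get(f"L.E-RAB.AbnormRel.{c}", 0) for c in ENB_CAUSES}
    total = sum(values.values())
    if total == 0:
        return []
    shares = [(cause, count / total) for cause, count in values.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


def drop_rate(abnormal: int, normal: int) -> float:
    """Assumed formula: abnormal / (abnormal + normal) E-RAB releases."""
    total = abnormal + normal
    return abnormal / total if total else 0.0


def turning_point(daily_rates: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the day with the largest day-over-day change in drop rate."""
    best_day, best_delta = daily_rates[0][0], 0.0
    for (_, prev), (day, rate) in zip(daily_rates, daily_rates[1:]):
        delta = rate - prev
        if abs(delta) > abs(best_delta):
            best_day, best_delta = day, delta
    return best_day, best_delta


if __name__ == "__main__":
    # Illustrative numbers only.
    sample = {"L.E-RAB.AbnormRel.Radio": 120, "L.E-RAB.AbnormRel.HOFailure": 60,
              "L.E-RAB.AbnormRel.TNL": 15, "L.E-RAB.AbnormRel.Cong": 5}
    print(cause_distribution(sample)[0])  # ('Radio', 0.6)
    print(turning_point([("d1", 0.006), ("d2", 0.015), ("d3", 0.014)])[0])  # d2
```

The day returned by `turning_point` is only a pointer: the operations performed on that day (upgrades, parameter changes, replanning) still have to be reviewed against the remaining checklist items.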


Fault Location: Radio Problems

Symptom:

- According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.Radio counter, the service drop is caused by a radio interface problem on the wireless network side.

Possible causes

- A service drop with the radio cause value occurs when RLC retransmissions reach the maximum number, out-of-synchronization occurs, or signaling message exchange fails due to weak coverage, uplink interference, or UE faults.

For details about interference elimination, see LTE RF Channel Test and Check Manual.

Handling procedure

- Analyze the CHR data to check whether top UEs exist.

- Analyze the CHR data to verify inner causes of abnormal releases.

- If a service drop is caused by a failure in exchanging non-procedure messages, view the L2 DRB scheduling data to check whether weak coverage or interference occurs.

- If a procedure message exchange fails, observe the last ten messages to locate the faulty point and determine whether the UE did not receive the message from the eNodeB, the UE received but did not process the message, or the eNodeB did not receive the response from the UE.

- Inner release cause values in the CHR are UEM_UECNT_REL_UE_RLC_UNRESTORE_IND, UEM_UECNT_REL_UE_RESYNC_TIMEROUT_REL_CAUSE, UEM_UECNT_REL_UE_RESYNC_DATA_IND_REL_CAUSE, UEM_UECNT_REL_UE_RLF_RECOVER_FAIL_REL_CAUSE, UEM_UECNT_REL_RRC_REEST_SRB1_FAIL, and UEM_UECNT_REL_RB_RECFG_FAIL_RRC_CONN_RECFG_CMP_FAIL.
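The first two handling steps (finding top UEs and checking inner causes) amount to filtering CHR records by the radio-related cause values above and counting them per TMSI. The sketch below assumes a hypothetical CHR export of `(tmsi, inner_cause)` tuples; the short descriptions are paraphrased from this guideline and from the identifier names, and the `min_share` and `min_count` thresholds are illustrative, not vendor defaults.

```python
from collections import Counter

# Radio-related inner CHR release causes listed above. Descriptions are
# paraphrased from this guideline and the identifier names (an assumption).
RADIO_CAUSES = {
    "UEM_UECNT_REL_UE_RLC_UNRESTORE_IND": "maximum RLC retransmissions reached",
    "UEM_UECNT_REL_UE_RESYNC_TIMEROUT_REL_CAUSE": "resync timer expired after out-of-sync",
    "UEM_UECNT_REL_UE_RESYNC_DATA_IND_REL_CAUSE": "data-triggered resync after out-of-sync",
    "UEM_UECNT_REL_UE_RLF_RECOVER_FAIL_REL_CAUSE": "radio link failure recovery failed",
    "UEM_UECNT_REL_RRC_REEST_SRB1_FAIL": "SRB1 restoration failed in reestablishment",
    "UEM_UECNT_REL_RB_RECFG_FAIL_RRC_CONN_RECFG_CMP_FAIL": "RB reconfiguration not completed",
}


def top_ues(records, min_share=0.2, min_count=5):
    """Return TMSIs that account for a large share of radio-cause drops.

    records: iterable of (tmsi, inner_cause) tuples (hypothetical CHR export).
    A UE is flagged when it has at least min_count radio-cause drops AND at
    least min_share of all radio-cause drops. Both thresholds are illustrative.
    """
    radio = [(tmsi, cause) for tmsi, cause in records if cause in RADIO_CAUSES]
    if not radio:
        return []
    per_ue = Counter(tmsi for tmsi, _ in radio)
    total = len(radio)
    return [tmsi for tmsi, n in per_ue.most_common()
            if n >= min_count and n / total >= min_share]


def cause_breakdown(records, tmsi):
    """Count the radio-related inner causes recorded for one UE."""
    return Counter(c for t, c in records if t == tmsi and c in RADIO_CAUSES)
```

Non-radio causes (for example UEM_UECNT_REL_MME_CMD) are ignored here on purpose, so a UE that is only released by the EPC does not show up as a radio top UE.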



Fault Location: Handover Failures

Symptom:

- According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.HOFailure counter, service drops are caused by handover failures.

Possible causes

- A service drop with the handover failure cause value is an abnormal release caused by a failed outgoing handover from the serving cell.

Handling procedure

- Use the outgoing handover counters between specific cells to identify the target cell with the highest service drop rate.

- Analyze the CHRs of the serving cell and the target cell to check whether the UE fails to receive the handover command or fails in random access to the target cell. The corresponding inner release cause values in the CHR are UEM_UECNT_REL_HO_OUT_X2_REL_BACK_FAIL and UEM_UECNT_REL_HO_OUT_S1_REL_BACK_FAIL.

- Optimize the handover relationships, including handover parameters and neighboring relationships, and then check whether the related counters recover.
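Ranking target cells from the per-cell outgoing handover counters can be sketched as follows. The sketch uses the outgoing handover failure rate per target cell as a proxy for "the target cell with the highest service drop rate"; the `NcellHoStats` record layout and the `min_attempts` and `top_n` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NcellHoStats:
    """Outgoing handover counters toward one target cell (hypothetical layout)."""
    target_cell: str
    attempts: int
    successes: int

    @property
    def failure_rate(self) -> float:
        return 1 - self.successes / self.attempts if self.attempts else 0.0


def worst_targets(stats, min_attempts=50, top_n=3):
    """Rank target cells by outgoing handover failure rate.

    Cells with fewer than min_attempts attempts are skipped so that a single
    failure toward a rarely used neighbor does not dominate the ranking.
    min_attempts and top_n are illustrative values.
    """
    eligible = [s for s in stats if s.attempts >= min_attempts]
    return sorted(eligible, key=lambda s: s.failure_rate, reverse=True)[:top_n]


if __name__ == "__main__":
    cells = [NcellHoStats("A", 100, 90), NcellHoStats("B", 200, 150),
             NcellHoStats("C", 10, 0)]
    print([s.target_cell for s in worst_targets(cells)])  # ['B', 'A']
```

The cells returned here are the candidates whose CHRs are then compared with those of the serving cell.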



Fault Location: Transport Problems

Symptom:

- According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.TNL counter, service drops are caused by transport-layer problems.

Possible causes

- A service drop with the TNL cause value is caused by a transport fault between the eNodeB and the MME, for example, an intermittently disrupted S1 link.

Handling procedure

- Query alarms to check whether there are transport-related alarms, clear any such alarms, and then check whether the related counters recover.

- On the M2000, check whether the eNodeB has reported transport-related alarms.

- Clear alarms by referring to the alarm help.

- If the alarms are cleared but the L.E-RAB.AbnormRel.TNL counter still has a large value, collect the related information and send it to the next-level fault location team.
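The decision "alarms cleared, but the TNL counter still has a large value" can be made explicit by comparing the counter before and after alarm clearance. In this sketch, the counter counts as recovered when the post-clearance mean drops below `max_ratio` of the pre-clearance mean or below an absolute `floor`; both thresholds, and the choice of a simple mean, are assumptions for illustration.

```python
from statistics import mean


def needs_escalation(tnl_before, tnl_after, max_ratio=0.5, floor=10):
    """Decide whether TNL-caused drops persist after transport alarms are cleared.

    tnl_before / tnl_after: L.E-RAB.AbnormRel.TNL values per measurement period
    before and after clearance. Returns True when the counter has NOT recovered,
    that is, the post-clearance mean is at or above `floor` and at or above
    `max_ratio` times the pre-clearance mean. Thresholds are illustrative.
    """
    after = mean(tnl_after)
    if after < floor:
        return False
    return after >= max_ratio * mean(tnl_before)


if __name__ == "__main__":
    print(needs_escalation([40, 50, 60], [5, 3, 4]))     # False: recovered
    print(needs_escalation([40, 50, 60], [35, 45, 40]))  # True: escalate
```

The same before/after comparison applies to the other "check whether the related counters recover" steps in this section.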



Fault Location: Congestion Problems

Symptom

- According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.Cong counter, service drops are caused by congestion problems.

Possible Causes

- A service drop with the congestion cause value is caused by congestion of radio resources on the eNodeB, for example, when the maximum number of users is reached.

Handling Procedure

- If a top cell encounters service drops caused by long-term congestion, enable the load balancing or interoperation function to reduce the load of the serving cell as a short-term solution. As a long-term solution, expand the capacity. After solving the problem, check whether the related counters recover.

- Turn on the MLB algorithm switch and check whether the situation is improved.
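The short-term versus long-term choice above can be sketched as a rule over daily L.E-RAB.AbnormRel.Cong values. Here "long-term congestion" is assumed to mean that the counter exceeds `threshold` on at least `long_term_days` of the observed days; both numbers and the returned action strings are illustrative, not vendor criteria.

```python
def congestion_action(daily_cong_drops, threshold=20, long_term_days=5):
    """Suggest a handling option from daily L.E-RAB.AbnormRel.Cong values.

    Following the guideline: load balancing (MLB) or interoperation is the
    short-term measure, and capacity expansion is the long-term fix for
    persistent congestion. The threshold and day count are assumptions.
    """
    congested_days = sum(1 for v in daily_cong_drops if v > threshold)
    if congested_days == 0:
        return "no action"
    if congested_days >= long_term_days:
        return "enable MLB/interoperation now; plan capacity expansion"
    return "enable MLB and monitor"


if __name__ == "__main__":
    print(congestion_action([25, 30, 0, 0, 0, 0, 0]))  # enable MLB and monitor
```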



Fault Location: MME Problems

Symptom

- According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.MME counter, the service drop is caused by an abnormal release initiated by the EPC. This type of abnormal release is not counted into the L.E-RAB.AbnormRel counter.

Possible Causes

- A service drop with the cause value being MME is caused by an abnormal release initiated by the EPC.

Handling Procedure

- This type of service drop is caused by non-eNodeB problems and needs to be located by using EPC-related information.

- The inner release cause value in the CHR is UEM_UECNT_REL_MME_CMD, which indicates that the service drop is caused by a release initiated by the EPC. Work with the EPC technical support personnel to solve this problem.

- Obtain the S1 tracing results of top cells and analyze the distribution of the causes of abnormal releases initiated by the EPC.

- Send the measurement results and related signaling procedures to the EPC technical support personnel for further analysis.
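Analyzing the distribution of EPC-initiated release causes in an S1 trace can be sketched as below. The trace is assumed to be flattened into time-ordered `(ue_id, message_name, cause_group, cause_value)` tuples, which is a hypothetical format. A UE CONTEXT RELEASE COMMAND that answers an earlier UE CONTEXT RELEASE REQUEST from the eNodeB is treated as eNodeB-triggered; any other command is counted as MME-initiated. The cause groups (`radioNetwork`, `transport`, `nas`, `protocol`, `misc`) follow the S1AP Cause IE.

```python
from collections import Counter


def mme_initiated_causes(s1_messages):
    """Count S1AP causes of MME-initiated UE context releases.

    s1_messages: time-ordered (ue_id, message_name, cause_group, cause_value)
    tuples from a hypothetical flattened S1 trace.
    """
    pending_request = set()  # UEs for which the eNodeB has requested a release
    causes = Counter()
    for ue, name, group, value in s1_messages:
        if name == "UE CONTEXT RELEASE REQUEST":
            pending_request.add(ue)
        elif name == "UE CONTEXT RELEASE COMMAND":
            if ue in pending_request:
                pending_request.discard(ue)  # answer to an eNodeB request
            else:
                causes[(group, value)] += 1  # release initiated by the MME
    return causes


if __name__ == "__main__":
    trace = [
        ("ue1", "UE CONTEXT RELEASE REQUEST", "radioNetwork", "user-inactivity"),
        ("ue1", "UE CONTEXT RELEASE COMMAND", "nas", "normal-release"),
        ("ue2", "UE CONTEXT RELEASE COMMAND", "nas", "detach"),
        ("ue3", "UE CONTEXT RELEASE COMMAND", "nas", "detach"),
    ]
    print(mme_initiated_causes(trace))  # Counter({('nas', 'detach'): 2})
```

The resulting cause counts, together with the signaling samples behind them, are what would be handed to the EPC support personnel.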



Deliverables for Service Drops

• Check result based on the routine troubleshooting operation checklist for service drops

For some difficult problems, collect more logs for further fault location.

- BRD log (mandatory)

- Indicates logs of the LMPT and LBBP on the eNodeB to which top cells belong.

- Standard interface signaling (mandatory)

- Indicates S1, X2, and Uu interface tracing results.

- Network configuration (mandatory)

- Includes networking information, engineering parameters, and configuration files of top eNodeBs.

- TTI tracing (optional; depending on fault location requirements)

- Indicates IFTS tracing results and cell tracing results. Because the data volume is large, collect only the information of top cells in top periods.

- Single-UE tracing (optional; depending on fault location requirements)

- Used for in-depth location of top UEs. It is performed on the entire network by using the IMSI obtained from the EPC based on the TMSI of the top UE.



Contents





Service Drop Cases



Cases: Overview


After the D2 network in Germany is upgraded to eRAN2.1 V100R003C00SPC420, the R&D personnel analyze the service drop rate of this network. This document uses this analysis as an example to describe the procedure of analyzing service drops and their causes.

After D2 is upgraded, some problems encountered in the old version are solved and the average service drop rate decreases to 0.6%. Because the network is upgraded segment by segment, the service drop rate decreases gradually from Dec 5th to Dec 10th. The whole network is upgraded by Dec 12th.


Case 1: Service drops are caused by top UEs continuously failing in reestablishment.

As shown in the figure on the upper right, most abnormal releases on the eNodeB are caused by failures in exchanging the first three signaling messages during the reestablishment process.

As shown in the figure on the middle right, from the perspective of fault occurrence time, most service drops occur continuously within the period from 11:51 to 18:49 in cell 0.

As shown in the figure on the bottom right, from the perspective of TMSI information, service drops are caused by a single UE (TMSI C2 B0 B0 40), and the main reestablishment cause value is reconfiguration failure.

As shown in the figure on the bottom left, from the perspective of reconfiguration message type, the reconfiguration messages are not handover commands or measurement configuration messages; they may be CQI, sounding, or transmission mode (TM) reconfiguration messages. In addition, the UE does not respond to the RRC CONN REESTAB message, so the eNodeB releases the E-RABs 5s later.



Case 2: Top UEs encounter continuous faults.

- The CHR of the eNodeB shows that most abnormal releases are caused by RLC retransmissions reaching the maximum number of times, that is, DRB retransmissions reaching the maximum number of times (8 retransmissions).

- From the perspective of fault occurrence time, most service drops occur continuously within the period from 10:51 to 13:49 in cell 2.

- From the perspective of TMSI information, service drops are caused by a single UE (TMSI C2 7F 20 56).

- The last 16 64-ms messages of DRB scheduling information show a similar pattern: a fault resembling a sudden stop in data transmission occurs soon after access. The release occurs between tens of seconds and two minutes after access, so it is unlikely to be caused by a test using commands. In addition, the access type is MO-DATA, which indicates that these releases occur during actual service use.
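The "suddenly stopped data transmission" pattern in the last DRB scheduling records can be flagged with a simple rule: data was scheduled in earlier periods, and the final few periods carry nothing. The input format (scheduled bytes per period, oldest first) and the `active_min` and `quiet_tail` thresholds are assumptions for illustration; the real CHR fields may differ.

```python
def sudden_stop(scheduled_bytes, active_min=1, quiet_tail=4):
    """Flag a 'suddenly stopped data transmission' pattern.

    scheduled_bytes: scheduled data volume per period from the last DRB
    scheduling records, oldest first (for example, the last 16 64-ms periods).
    Returns True when at least one earlier period carried >= active_min bytes
    and the last `quiet_tail` periods carried nothing. Thresholds are assumed.
    """
    if len(scheduled_bytes) <= quiet_tail:
        return False
    head, tail = scheduled_bytes[:-quiet_tail], scheduled_bytes[-quiet_tail:]
    was_active = any(b >= active_min for b in head)
    return was_active and all(b == 0 for b in tail)


if __name__ == "__main__":
    print(sudden_stop([1200] * 12 + [0] * 4))  # True
    print(sudden_stop([0] * 16))               # False: never active
```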



Case 3: The uplink quality is poor.

The figure on the right shows that, from the last four 512-ms messages of DRB scheduling information to the last 16 64-ms messages of DRB scheduling information, the uplink RSRP and SINR are poor. The uplink RSRP reaches -135 dBm or below, and the sounding SINR and demodulation reference signal (DMRS) SINR are -3 dB or less. The service drop is possibly caused by uplink weak coverage.


• The figure on the left shows that, from the last four 512-ms messages to the last 16 64-ms messages, the uplink RSRP is around -130 dBm, and the sounding SINR and DMRS SINR are -3 dB or less. The service drop is possibly caused by slight uplink interference in a weak-coverage area.
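The two examples in Case 3 suggest a rough heuristic for classifying the last uplink scheduling records. In the sketch below, the -135 dBm and -3 dB thresholds come from the figures described above; the -128 dBm upper bound for "around -130 dBm" and the final fallback branch are assumptions, and none of this is a vendor rule.

```python
def classify_uplink(rsrp_dbm, sounding_sinr_db, dmrs_sinr_db):
    """Roughly classify uplink conditions from the last DRB scheduling records.

    Thresholds mirror the Case 3 examples: RSRP at or below -135 dBm with
    sounding and DMRS SINR at or below -3 dB suggests weak coverage; RSRP
    around -130 dBm (assumed here: above -135 and at or below -128 dBm) with
    the same SINR suggests slight interference in a weak-coverage area.
    """
    low_sinr = sounding_sinr_db <= -3 and dmrs_sinr_db <= -3
    if not low_sinr:
        return "uplink quality acceptable"
    if rsrp_dbm <= -135:
        return "uplink weak coverage"
    if rsrp_dbm <= -128:  # assumed band for "around -130 dBm"
        return "slight uplink interference in weak-coverage area"
    return "low SINR with adequate RSRP: check uplink interference"


if __name__ == "__main__":
    print(classify_uplink(-137, -5, -4))  # uplink weak coverage
```

The last branch simply points back to the interference check in the checklist when SINR is poor despite reasonable RSRP.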


Case 4: Reconfiguration of the target cell fails.

Release cause (Unspecified displayed in the S1 tracing result)

- TGT_ENB_RB_RECFG_FAIL indicates an abnormal release caused by an RB reconfiguration failure on the target eNodeB during the handover process.

- After the UE successfully hands over to the target cell, the target eNodeB receives a PATH SWITCH REQ ACK message from the MME and, about 100 ms later, sends a UE CONTEXT REL REQ message carrying the S1-AP cause value Unspecified. The figure on the left displays the last ten messages.

Problem analysis

- During the handover process, the MME sends a PATH_SWITCH_ACK message carrying a downlink AMBR value that is inconsistent with the value carried in the S1 or X2 handover request. The release results from a defect in the RR module: the upper-layer RR control module sends an AMBR update message to the lower-layer RB module; the RB module determines not to send a Uu reconfiguration message to the UE and responds with a null value; the upper-layer RR control module then treats this response as a fault and releases the UE. The fix for this problem is included in eRAN2.1 V100R003C00SPC430.
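The defect described above is essentially a misread "nothing to do" result. The simplified model below is only an illustration: the `RbResult` outcomes, the `uu_reconfig_required` flag, and both handler functions are hypothetical and are not the eNodeB's actual software. It contrasts the described behavior (any reply other than a completed reconfiguration releases the UE) with the handling one would expect from a fix (release only on a real failure).

```python
from enum import Enum


class RbResult(Enum):
    """Possible outcomes of an AMBR update in the RB module (simplified model)."""
    RECONFIG_SENT = "Uu reconfiguration sent and completed"
    NOT_NEEDED = "no Uu reconfiguration required (null response)"
    FAILED = "Uu reconfiguration failed"


def rb_handle_ambr_update(uu_reconfig_required: bool) -> RbResult:
    # Hypothetical rule: when the new AMBR needs no Uu signaling, the RB module
    # replies with a null result instead of sending a reconfiguration.
    return RbResult.RECONFIG_SENT if uu_reconfig_required else RbResult.NOT_NEEDED


def rr_defective(result: RbResult) -> str:
    # Behavior described in Case 4: anything other than a completed
    # reconfiguration is handled as a fault, so a null reply releases the UE.
    return "continue" if result is RbResult.RECONFIG_SENT else "release UE"


def rr_fixed(result: RbResult) -> str:
    # Expected handling after a fix: only a real failure triggers a release.
    return "release UE" if result is RbResult.FAILED else "continue"


if __name__ == "__main__":
    reply = rb_handle_ambr_update(uu_reconfig_required=False)
    print(rr_defective(reply), "/", rr_fixed(reply))  # release UE / continue
```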



Case 5: A service drop is caused by inter-RAT redirection.

Release cause (Inter-RAT redirection displayed in the S1 tracing result)

- IRHO_REIDRECTION_TRIGER indicates a release caused by inter-RAT redirection. In eRAN2.1 V100R003C00SPC400 and eRAN2.1 V100R003C00SPC401, releases with this cause are mistakenly counted as service drops. The following figure shows the related messages.

- This problem will be solved in eRAN2.1 V100R003C00SPC420.
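When comparing drop counts across versions, the miscounted redirection releases can be excluded for the two affected versions. The sketch assumes a hypothetical export listing the CHR inner cause of each release that was counted as abnormal; the version strings and the cause identifier are taken from this case as written.

```python
# Versions in which inter-RAT redirection releases are miscounted as drops.
AFFECTED_VERSIONS = {"eRAN2.1 V100R003C00SPC400", "eRAN2.1 V100R003C00SPC401"}
REDIRECTION_CAUSE = "IRHO_REIDRECTION_TRIGER"  # identifier as shown in the CHR


def corrected_abnormal_releases(version, inner_causes):
    """Count abnormal releases, excluding miscounted redirection releases.

    inner_causes: CHR inner causes of the releases counted as abnormal
    (hypothetical export). Only the affected versions need correcting.
    """
    if version in AFFECTED_VERSIONS:
        return sum(1 for c in inner_causes if c != REDIRECTION_CAUSE)
    return len(inner_causes)
```

Applying the correction to pre-upgrade data makes the before/after comparison with SPC420 less biased by this counting defect.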



Case 6: Releases are counted into the L.E-RAB.AbnormRel.TNL counter due to transport faults.

On Dec 11th, 2011, the entire-network service drop rate on 900 MHz and 2.6 GHz deteriorated for Tele2 and Telenor, as shown in the following figure.

The field personnel have discussed this problem with the operator. This problem is likely caused by EPC faults; however, no response has been received from the operator.



Case 7: Service drops are caused by radio problems.

Release cause

- UE_RESYNC_TIMEROUT_REL_CAUSE (Radio Connection With UE Lost displayed in the S1 tracing result): indicates an L2-reported release caused by expiry of the resynchronization timer after the UE goes out of synchronization.

- UE_RLC_UNRESTORE_IND (Radio resources not available displayed in the S1 tracing result): indicates the L2-reported RLC unrestore indication that is sent after the maximum number of RLC retransmissions is reached.

- UE_RESYNC_DATA_IND_REL_CAUSE (Unspecified displayed in the S1 tracing result): indicates an L2-reported release caused by data-triggered resynchronization after the UE goes out of synchronization.

Cause analysis

- From the last four 512-ms messages of DRB scheduling information to the last 16 64-ms messages of DRB scheduling information, most abnormal releases are caused by faults resembling a sudden stop in data transmission. Possibly, the SIM card is removed or the UE is faulty. The following figure shows information recorded in the CHR.



Case 8: The reestablishment procedure fails.

Release cause (Radio Connection With UE Lost displayed in the S1 tracing result)

- RRC_REEST_SRB1_FAIL: indicates a release occurring at the SRB 1 restoration stage during RRC connection reestablishment.

- As the last ten messages in the following figure show, after the eNodeB sends an RRC_CONN_REESTAB message, it does not receive the RRC_CONN_REESTAB_CMP message from the UE before the 5s radio interface timer expires.

- From the perspective of L2 scheduling, the UE responds with an ACK message after receiving the RRC_CONN_REESTAB message from the eNodeB.

- This is possibly because some UEs do not send the RRC_CONN_REESTAB_CMP message; for example, Samsung UEs have this problem.
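Checking the last ten messages for this symptom can be sketched as a scan for RRC_CONN_REESTAB messages that are not followed by RRC_CONN_REESTAB_CMP within the 5s timer quoted above. The input format `(timestamp_s, direction, name)` is a hypothetical flattening of the CHR message list, and the choice to count a trace that ends while still waiting as a timeout is an assumption (the CHR record ends with the release).

```python
REEST_TIMER_S = 5.0  # radio interface timer quoted in this case


def reestablishment_timeouts(messages, timer_s=REEST_TIMER_S):
    """Return send times of RRC_CONN_REESTAB messages whose completion never arrived.

    messages: time-ordered (timestamp_s, direction, name) tuples, a hypothetical
    flattening of the CHR "last ten messages".
    """
    timeouts = []
    pending = None  # send time of the outstanding RRC_CONN_REESTAB, if any
    for ts, direction, name in messages:
        if pending is not None and ts - pending > timer_s:
            timeouts.append(pending)  # timer expired before any completion
            pending = None
        if name == "RRC_CONN_REESTAB" and direction == "DL":
            pending = ts
        elif name == "RRC_CONN_REESTAB_CMP" and direction == "UL" and pending is not None:
            pending = None  # completed in time
    if pending is not None:
        timeouts.append(pending)  # assumption: trace ended while still waiting
    return timeouts


if __name__ == "__main__":
    trace = [(0.0, "UL", "RRC_CONN_REESTAB_REQ"), (0.02, "DL", "RRC_CONN_REESTAB"),
             (5.1, "DL", "RRC_CONN_REL")]
    print(reestablishment_timeouts(trace))  # [0.02]
```

A timeout found here, combined with an L2 ACK for the RRC_CONN_REESTAB message, matches the UE-side behavior described in this case.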



End of Section
