Case Study of WCDMA Optimization (Performance Analysis of Nastar)-20060908-A-1.0

1 Performance Analysis of Nastar
*
*
*
*
*
*
Traffic statistics analysis
Major Functions of Performance Analysis Module
The above functions are often used during analysis and their specific applications are described in the typical cases that will follow. Below is a brief introduction to the operations for various functions. Please refer to the online help or relevant operation guide of Nastar for details.
*
*
*
Operation Overview of Performance Analysis Module
Performance Query provides flexible querying of traffic measurement indices. Before querying, we must first create an analysis theme (a table composed of related traffic measurement indices), which defines the indices that the query returns.
There are some default analysis themes in Nastar, as shown in the figure on the right.
Operation overview of performance query
The default analysis themes are simple and inflexible. Performance problems, however, cover a wide scope and their features are correlated, so the default analysis themes cannot satisfy the needs of routine problem monitoring and analysis. The analyst therefore often needs to create custom analysis themes.
*
*
*
Operation Overview of Performance Analysis Module (Continued)
Step 1: Right click “Performance Query” and then select “New Perf Func”, as shown in the right figure.
*
*
*
Step 3: Input the name of the analysis theme in the “Name” text box and then click “Query List Setting…”, as shown in the right figure.
*
*
*
Step 5: The “Query List” box shows the selected indices. Click “OK”, as shown in the right figure.
*
*
*
Step 7: The “Performance Query” node on the navigation tree shows the analysis theme you have just made. Double click the analysis theme. The “Query” dialog box pops up, as shown in the right figure.
*
*
*
Step 9: Select the cell(s) to be analyzed on the “Query Object” tab page, as shown in the right figure. Finally, click “OK”. The program will start executing the query according to the analysis theme you have specified.
*
*
*
We may combine analysis themes into templates to improve the efficiency of analysis. Because these templates can be saved and transferred, they avoid repeated operations and make it convenient to accumulate and share experience. For each analysis, you only need to output reports according to the templates you specify. This is the traffic statistics analysis mode of Performance Report.
Nastar has integrated some report templates based on experience, as shown in the following figure. It also provides the custom report template function, that is, you can make report templates according to your own experience. To distinguish them, we call the integrated templates “Self-contained report templates of Nastar” and the custom templates “Custom report templates of Nastar”.
Operation overview of performance report
Whether the template is a self-contained or a custom report template of Nastar, the operation for outputting the report is basically the same. Below we take the RNC Weekly Report as an example.
*
*
*
Step 1: Double click “RNC Weekly Report”, as shown in the right figure.
*
*
*
Nastar provides the CHR analysis function, which lets us further analyze the failures counted under “Other” causes in the traffic measurement, as well as doubtful or exceptional traffic statistics, so as to obtain more detailed failure causes and understand the exceptional procedure in detail.
Below is a brief description of relevant operations of CHR analysis:
Operation overview of CHR analysis
*
*
*
Step 1: Double click “SPU Subscribers Log Analysis”, as shown in the right figure.
*
*
*
*
*
*
Cell analysis
Description of the performance analysis process
Network-wide KPI monitoring: Monitor the KPIs of the entire network. Exceptions indicate that the network has severe problems. For example, a KPI displayed in red in the weekly/daily report means that the index is abnormal across the entire network.
Cell TOPN monitoring: Monitor the TOPN distribution of cell KPIs at a time granularity of hours, so that the performance deterioration of a certain cell in a certain time span is not concealed by averaging over space and time (otherwise the KPIs of the entire network may all look normal while the performance of one cell has severely deteriorated for a few hours). We can also monitor some non-KPI metrics, e.g. uplink RTWP, cell out of service, intra-frequency/inter-frequency/inter-system handover preparation success rate, board CPU utilization, etc.
*
*
*
Parameter configuration analysis: Check whether the relevant parameter configuration is abnormal.
CHR analysis: Further analyze the causes left unknown by cell analysis, e.g. “Other” causes; or check whether a specific procedure is abnormal when cell analysis gives a specific failure cause but the correlation analysis results indicate that the cause may be wrong (possibly due to statistical error or other problems).
Problem location test and signaling trace analysis: Perform the location test on the severest cell(s), trace the signaling of various interfaces and all the other data that may help problem analysis & location, and then comprehensively analyze the collected data till the problem is solved. This process is called “troubleshooting”.
Solution: Put forward the solution.
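The cell TOPN monitoring step above can be illustrated with a small sketch. The sample rows, cell names and function names below are hypothetical and only show why an hourly per-cell ranking can expose a problem that a network-wide average hides; this is not how Nastar computes its reports.

```python
# Hypothetical (cell, hour, attempts, successes) samples; not real network data.
samples = [
    ("cell_A", 9, 1000, 990), ("cell_A", 10, 1000, 985),
    ("cell_B", 9, 1000, 992), ("cell_B", 10, 1000, 988),
    ("cell_C", 9, 50, 49), ("cell_C", 10, 60, 20),
]

def network_success_rate(rows):
    """Success rate aggregated over all cells and hours."""
    attempts = sum(row[2] for row in rows)
    successes = sum(row[3] for row in rows)
    return successes / attempts

def worst_cell_hours(rows, n):
    """Return the n (cell, hour, success_rate) entries with the lowest rate."""
    ranked = sorted(rows, key=lambda row: row[3] / row[2])
    return [(cell, hour, succ / att) for cell, hour, att, succ in ranked[:n]]

print(f"network-wide: {network_success_rate(samples):.1%}")
for cell, hour, rate in worst_cell_hours(samples, 2):
    print(f"{cell} at {hour}:00 -> {rate:.1%}")
```

In this made-up data the network-wide rate stays high, while the ranking puts the single bad cell-hour at the top.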
Performance Analysis Process (Continued)
Mapping from the performance analysis process to Nastar performance analysis function
Nastar flexibly supports network-wide KPI monitoring, cell TOPN monitoring, cell analysis and CHR analysis through various analysis functions. The figure below shows the correspondence:
Network-wide KPI monitoring
*
*
*
Case 1 — Plenty of Non-Service RRC Requests due to Wrong Setting of CN Denial Cause Value
A network has little traffic but there are plenty of non-service RRC setup requests, as shown in the following table:
As can be seen from the above table, the number of non-service RRC setup requests is about 33 times that of service RRC setup requests (112391/3389 ≈ 33). Such a ratio is likely abnormal.
*
*
*
Cell analysis
*
*
*
RNC signaling analysis
Through further RNC LMT interface signaling trace, we find that there are indeed quite many RRC connection setup requests of registration type and most of them are repeated requests from the same IMSI, as shown in the following figure:
*
*
*
After querying the 3G HLR, we know that the IMSI 460019115045990 shown in the above figure is not a registered 3G user, so the user’s registration request will always be denied and the registration will always fail. The question is why RRC requests from the same IMSI keep appearing. According to protocol analysis, when a UE enters the 3G system and its registration request is denied by the CN, the UE may take one of the following actions:
a) If the denial cause is cause value #17 (network failure), the UE starts the T3211 timer (15 s). After this timer expires, the UE retries; after three retries (the attempt counter is then 4) it enters the restricted state. If the T3212 timer expires in this period, the above procedure is triggered again. If the LAI changes, the UE makes the above attempts immediately in the new LAI.
b) If the denial cause is cause value #15 (no suitable cells in LA), the UE records this LAI in the ‘Forbidden LA’ list and does not make multiple attempts as in the #17 case. The UE does not initiate the registration procedure again until an LA not in the ‘Forbidden LA’ list appears or the UE is switched on again.
After confirming with the CN side, we know that the cause value is indeed set to #17. So the root cause is found.
In general, if 2G and 3G use the same PLMN ID in the 3G coverage area and a 2G subscriber (2G IMSI) uses a 3G UE, the above problem occurs as soon as the user drops off the 2G network. Plenty of non-service RRC setup requests then arise, consume power, code, transport and other resources, and may cause congestion.
Problem analysis & location
*
*
*
Modify the cause value of CN denial from #17 (network failure) to #15 (no suitable cells in LA). After the modification, the ratio of non-service RRC setup requests to service RRC setup requests in the entire network becomes 15912/3467 = 4.59, which is normal (the normal ratio should be less than 10 according to the data statistics of various commercial offices), as shown in the following table:
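The ratio check used in this case can be written as a short sketch. The counts and the limit of 10 are the figures quoted in this case; the function and constant names are illustrative assumptions.

```python
# Upper bound for a normal ratio, as quoted in this case study.
NORMAL_RATIO_LIMIT = 10.0

def non_service_ratio(non_service_requests: int, service_requests: int) -> float:
    """Ratio of non-service to service RRC setup requests."""
    if service_requests <= 0:
        raise ValueError("service_requests must be positive")
    return non_service_requests / service_requests

before = non_service_ratio(112391, 3389)  # counts before the CN cause value change
after = non_service_ratio(15912, 3467)    # counts after the CN cause value change

for label, ratio in (("before", before), ("after", after)):
    status = "normal" if ratio < NORMAL_RATIO_LIMIT else "abnormal"
    print(f"{label}: {ratio:.2f} ({status})")
```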
Solution
*
*
*
Case 2 — Transport Congestion due to IUB Bandwidth Configuration Error
Both the service RRC connection setup success rate and the non-service RRC connection setup success rate are low in a network, as shown in the following table:
Note: You can get the above table via “RNC Daily Report/RNC Weekly Report” or custom report of Nastar.
Access: 89.17% (10700/12000); 91.05% (118396/130038)
Cell analysis
As can be seen from the right table, the major cause of RRC connection setup failure is that plenty of RRC connection requests are denied when Cells 0, 1 and 2 are busy; the RRC connection denial rate is up to 62%. In most cases, the denial cause is AAL2 setup failure. According to routine experience and the reply from R&D, AAL2 setup failure is mostly caused by transport congestion. The three cells belong to the same Node B.
Note: You can get the above table via custom report or “Performance Query” of Nastar.
*
*
*
Parameter configuration analysis
As can be seen from the indices in green on the right side of the above table, the uplink/downlink RLC mean throughput and the maximum number of uplink/downlink CE resources occupied are not large when the transport congestion occurs, so the transport congestion is possibly caused by an IUB bandwidth configuration error.
Open the RNC MML configuration script and find the IUB configuration of the Node B with transport congestion as follows:
The Node B has two E1 links and the bearer type is IMA.
The total bandwidth (PCR) configured for NCP, CCP, ALCAP and IPOA is 302 kbps.
There are two AAL2 paths. One is configured for HSDPA service with a bandwidth of 2442 kbps; the other is for R99 service with a bandwidth of 891 kbps.
The two E1 links in IMA bearer mode can provide 1860 × 2 = 3720 kbps of ATM transport capability. Excluding the common bandwidth occupied by NCP, CCP, ALCAP and IPOA, they should be able to provide 3720 - 302 = 3418 kbps of bandwidth for traffic channels. Moreover, there are two AAL2 paths and the sharing mode should be configured between them, that is, the bandwidth of each AAL2 path should be set to the maximum transport capability of the traffic channels.
Therefore, the bandwidth of both the HSDPA and the R99 AAL2 paths should be set to 3418 kbps. The R99 AAL2 path, however, is set to only 891 kbps, which is too small. Even without considering soft handover, 891 kbps cannot satisfy the access needs of two 384 kbps R99 services, so transport congestion may occur.
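The bandwidth arithmetic above can be reproduced with a minimal sketch. The per-E1 rate, the common-channel PCR and the configured path bandwidths come from this case; the function names and the dictionary layout are assumptions made for illustration.

```python
E1_ATM_KBPS = 1860      # ATM transport capability per E1 in IMA mode (from this case)
COMMON_PCR_KBPS = 302   # NCP + CCP + ALCAP + IPOA (from this case)

def traffic_bandwidth_kbps(num_e1: int) -> int:
    """Bandwidth left for traffic channels after the common channels."""
    return num_e1 * E1_ATM_KBPS - COMMON_PCR_KBPS

def aal2_shortfall_kbps(paths: dict, num_e1: int) -> dict:
    """Shortfall of each AAL2 path against the shared traffic bandwidth."""
    target = traffic_bandwidth_kbps(num_e1)
    return {name: target - bw for name, bw in paths.items() if bw < target}

configured = {"HSDPA": 2442, "R99": 891}
print(traffic_bandwidth_kbps(2))           # 3418
print(aal2_shortfall_kbps(configured, 2))  # {'HSDPA': 976, 'R99': 2527}
```

Under the sharing mode described above, both paths would be raised to the same 3418 kbps target, which is why each shows a shortfall here.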
*
*
*
Solution
Configure the AAL2 path bandwidth of HSDPA and that of R99 to 3418 kbps. After the modification, the service RRC connection setup success rate and the non-service RRC connection setup success rate both reach the normal KPI requirement and the problem is thus solved, as shown in the following table:
*
*
*
Case 3 — Low Paging Success Rate due to Paging Cycle Coefficient Setting Error
The paging success rate of a certain network is less than 35% (very low), as shown in the following figure:
Note: You can get the above table via “RNC Weekly Report” of Nastar.
*
*
*
Open the RNC MML script. The CN domain paging cycle coefficient “DrxCycleLenCoef” is set to 8 and the paging resend count “MACCPageRepeat” is set to 1.
Moreover, we know from the CN side that the CS paging resend count is set to 3, the interval between the first paging and the second paging is 3 seconds, and the resend interval of the third paging is 4 seconds.
*
*
*
As can be seen from the above settings, the paging detection cycle of the UE in idle mode, that is, the DRX (discontinuous reception) cycle, is 2^8 radio frames × 10 ms = 2.56 s. Each paging message from the CN is sent twice by the RNC (the original plus one repeat), and the interval between the two is one DRX cycle (2.56 s). In other words, at least 2 × 2.56 s = 5.12 s is needed before the UE can respond to every paging message sent and resent by the RNC. Generally, the paging resend count and resend interval of the CN must be considered together with the resend of the UTRAN. If the UTRAN resends the paging once, the resend interval of the CN should be more than two DRX cycles. Here the resend interval of the CN (3 s) is less than two DRX cycles (5.12 s), so the CN starts to resend the next paging message before the UTRAN finishes sending and resending the preceding one. Therefore, no paging response is obtained and the problem appears as paging failure in the traffic statistics of the RNC.
For a UE in idle mode, the DRX cycle is 2^8 radio frames × 10 ms = 2.56 s, so the UE’s paging response time can exceed 2.56 s, and subscribers also perceive the paging response as slow.
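The timing relationship described above can be checked with a short sketch. It assumes a 10 ms radio frame and that a repeat count of 1 means two transmissions in total, as the text describes; the function names are illustrative and not RNC parameters.

```python
FRAME_MS = 10  # duration of one radio frame in milliseconds

def drx_cycle_s(drx_cycle_len_coef: int) -> float:
    """DRX cycle in seconds: 2^coef radio frames of 10 ms each."""
    return (2 ** drx_cycle_len_coef) * FRAME_MS / 1000

def utran_paging_window_s(drx_cycle_len_coef: int, page_repeat: int) -> float:
    """Time needed for the original paging plus all repeats."""
    return (1 + page_repeat) * drx_cycle_s(drx_cycle_len_coef)

def cn_resends_too_early(drx_cycle_len_coef: int, page_repeat: int,
                         cn_resend_interval_s: float) -> bool:
    """True if the CN resends before the UTRAN has finished its own repeats."""
    return cn_resend_interval_s < utran_paging_window_s(drx_cycle_len_coef, page_repeat)

print(drx_cycle_s(8), cn_resends_too_early(8, 1, 3.0))  # 2.56 True
print(drx_cycle_s(6), cn_resends_too_early(6, 1, 3.0))  # 0.64 False
```

With the coefficient at 8 the 3 s CN interval falls inside the 5.12 s UTRAN window; with the coefficient at 6 the window shrinks to 1.28 s and the conflict disappears.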
*
*
*
Solution
Modify the paging cycle coefficient of the CN domain “DrxCycleLenCoef” from 8 to 6 (baseline). After the modification, the DRX cycle is reduced from 2.56 s to 0.64 s and the paging success rate of the entire network exceeds 85%, as shown in the following figure. The problem is thus solved.
*
*
*
Case 4 — Low Paging Success Rate due to 2G&3G Combined Paging
The paging success rate of a certain network is less than 10% (very low), as shown in the following figure:
*
*
*
Because the paging success rate kept being rather low for a number of days, we preliminarily determined that there was little possibility of weak coverage. We checked the CN and found that the 2G&3G combined paging policy was set in the MSC, that is, any paging message destined to 2G or 3G would be initiated to all the LACs in the 2G and 3G networks so as to guarantee paging response.
Site investigation
Check the RNC MML script. The parameter configurations related to paging are found normal.
*
*
*
The above combined paging policy inevitably lowers the 3G paging success rate. Because there are many 2G subscribers at the site, plenty of paging messages destined to 2G subscribers are also sent in the 3G network, and since the called subscribers are 2G subscribers, the 3G network receives no paging response to them. The more paging messages to 2G subscribers, the lower the 3G paging success rate.
The 2G/3G combined paging policy may be used in some scenarios to improve the paging response probability to some extent, but it may also bring the following troubles:
Paging channel congestion;
Increased air interface interference.
In sum, the 2G&3G combined paging policy has little gain but may easily bring other severe performance problems. It is seldom used.
*
*
*
Solution
Change the CN paging policy to the normal policy, that is, paging by LAC. After the modification, the paging success rate of the entire network is higher than 85% and the problem is solved, as shown in the following figure:
*
*
*
Case 5 — Abnormal Load due to Unreasonable Common Channel Power Ratio Configuration
In a certain network, the empty-load power ratio of many cells (the ratio of transmit power to total power when the cell carries no traffic, i.e. the downlink load) ranges from 3% to 38%, as shown in the right table. Normally, the load of an empty-loaded cell should be about 20%, so a spread from 3% to 38% is severely abnormal.
Cell TOPN monitoring
Note: You can get the above table by combining “Performance Query” of Nastar and Excel.
Table columns: Cell Max. Tx Power; Empty Load Power Ratio
*
*
*
The load abnormality in the case of empty load is usually caused by unreasonable common channel power ratio configuration. Below is a comprehensive analysis of the script settings:
In the above table, the empty load of some cells is very high. Take 38% as an example. After deducting the empty load (38%), the admission margin (20%) and the power statically allocated to HSDPA (30%), the power left for R99 services is only 12% (100% - 38% - 20% - 30% = 12%). This is very little and may easily cause power congestion, that is, the capacity is restricted.
In addition, the empty load of some cells is very low, e.g. 3%, which is also abnormal. Calculation shows that such cells can no longer be accessed once their load reaches about 30%, because the pilot Ec/Io deteriorates, so their maximum load is about 30%. The admission algorithm (Algorithm 1), the load control algorithm and other algorithms compare the current load with a threshold (e.g. 75%) to decide whether to act. Because the load of these cells may never exceed the threshold, these algorithms may fail to take effect.
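The headroom calculation for the high empty-load case can be sketched as follows. The 20% admission margin and 30% static HSDPA allocation are the values quoted in this case; the function name and default arguments are illustrative assumptions.

```python
def r99_power_headroom_pct(empty_load_pct: float,
                           admission_margin_pct: float = 20.0,
                           hsdpa_static_pct: float = 30.0) -> float:
    """Share of cell power left for R99 services, in percent."""
    return 100.0 - empty_load_pct - admission_margin_pct - hsdpa_static_pct

print(r99_power_headroom_pct(38))  # 12.0, the abnormal case above
print(r99_power_headroom_pct(20))  # 30.0, with the normal empty load of about 20%
```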
We further check the script configuration and find that the following problems exist for the power ratio configuration of the cells with abnormal load:
The pilot power configuration is from 27 dBm to 37.8 dBm whereas the maximum transmit power of the cells is mostly set to 44.8 dBm.
The power ratio configuration of common channels such as PSCH, SSCH, PCH and FACH is far larger than the baseline configuration.
*
*
*
Because the power of the other channels is set relative to the pilot power, pilot power that is too high or too low is the major reason why the empty load of a cell is too high or too low. The pilot power should be set according to the maximum transmit power of the cell: according to the planning (simulation) requirements, it should be 10 dB below the maximum transmit power. Too high a pilot power results in the capacity loss described above, whereas too low a pilot power causes the algorithms to fail and may bring other problems such as mute. If we need to reduce the pilot power in special cases, we should also lower the configured maximum transmit power of the cell so that the pilot power stays 10 dB below it. For instance, if we set the pilot power to 27 dBm, we should configure the maximum transmit power of the cell to 37 dBm.
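To see how far the configured pilot powers are from the 10 dB rule, the pilot share of the maximum transmit power can be computed from the dBm values. This is a minimal sketch using the standard dB-to-linear conversion; the function names are illustrative, and the result covers the pilot channel only, not the other common channels.

```python
def pilot_share_pct(pilot_dbm: float, max_tx_dbm: float) -> float:
    """Pilot power as a percentage of the cell maximum transmit power."""
    return 100 * 10 ** ((pilot_dbm - max_tx_dbm) / 10)

def max_tx_for_pilot_dbm(pilot_dbm: float, offset_db: float = 10.0) -> float:
    """Maximum transmit power that keeps the pilot offset_db below it."""
    return pilot_dbm + offset_db

# Pilot settings found in the script, against a 44.8 dBm maximum transmit power.
for pilot_dbm in (27.0, 37.8):
    print(f"{pilot_dbm} dBm pilot -> {pilot_share_pct(pilot_dbm, 44.8):.1f}% of max power")

print(max_tx_for_pilot_dbm(27.0))  # 37.0
```

A pilot 10 dB below the maximum corresponds to a 10% share, so the two extremes in the script sit well below and well above that target.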
Some guidance documents about RF optimization recommend modifying the pilot channel power as an optimization means. We do not recommend this, because modifying the pilot power on a large scale causes numerous problems such as uplink-downlink imbalance, service coverage holes in the soft handover area and severe adjacent-cell interference in the uplink. We should therefore start with antenna parameters to solve RF problems such as weak coverage and pilot pollution, and adjust the pilot power only in special cases.
When the power ratio configuration of common channels such as PSCH, SSCH, PCH and FACH is far larger than the baseline configuration, the load of cells empty-loaded may be too high. The baseline configuration of power ratio for common channels has been verified in Beta tests and commercial offices. We should not modify the baseline configuration as a routine RF optimization means. Moreover, we should use the smaller power ratio to attain a better balance between common channel coverage and traffic channel coverage during the optimization, so that the common channel power ratio after the optimization is not far larger than the baseline configuration.
*
*
*
Take the following measures:
Optimize the configurations of the pilot power and cell maximum transmit power so that the pilot power is at least 10 dB below the cell maximum transmit power, for example, Pcpich = 27 dBm with TCP = 37 dBm, or Pcpich = 34.8 dBm with TCP = 44.8 dBm.
Restore the power ratio configuration of common channels (PSCH, SSCH, PCH, FACH, etc.) to the corresponding baseline configuration.
After the above measures are taken, the cell loads restore to normal and the problem is thus solved without bringing any other troubles, as shown in the following table:
Note: You can get the above table by combining “Performance Query” of Nastar and Excel.
Table columns: Cell Max. Tx Power; Empty Load Power Ratio
*
*
*
Case 6 — Access Failure and Call Drop due to External Interference
In a certain network, the average RTWP of many cells is above –85 dBm on some days, as shown in the following table:
As can be seen from the above table, the cells with a high average RTWP have normal services (they are not out of service) and very little traffic, so external interference is very probably causing the RTWP to rise.
*
*
*
Correlation analysis of RRC setup failure and interference
As can be seen from the above table, the cause of RRC setup failure is that the UE does not reply, and for the cells with a high failure rate and many failures, the average RTWP is above –95 dBm (rather high). Moreover, there is very little traffic in these cells. Therefore, the abnormal rise of RTWP is very probably causing plenty of RRC setup failures.
*
*
*
As can be seen from the above table, the cause of CS RAB setup failure is air interface failure (mostly RB no response), and for the cells whose failure rate is not high but which have many failures, the average RTWP is above –92 dBm (very high). Moreover, there is very little traffic in these cells. Therefore, the abnormal rise of RTWP is very probably causing CS RAB setup failure.
Correlation analysis of RAB setup failure and interference
*
*
*
As can be seen from the above table, the cause of CS call drop is RF failure (mostly uplink synchronization failure and UU interface no response). For the cells with a high failure rate and many failures, the average RTWP is above –92 dBm (very high). Moreover, there is very little traffic in these cells. Therefore, the abnormal rise of RTWP is very probably causing CS call drop.
Correlation analysis of CS call drop and interference
*
*
*
Correlation analysis of PS call drop and interference
As can be seen from the above table, the cause of PS call drop is RF failure (mostly uplink synchronization failure and UU interface no response). For the cells with a high failure rate and many failures, the average RTWP is above –95 dBm (rather high). Moreover, there is very little traffic in these cells. Therefore, the abnormal rise of RTWP is very probably causing PS call drop.
*
*
*
According to the routine interference monitoring results and the correlation analysis between RRC setup failure/RAB setup failure/call drop and the average RTWP, we may infer that strong external interference is possibly causing the RTWP of some cells to rise abnormally. The many RRC setup failures, RAB setup failures and call drops are all closely linked to the abnormal rise of RTWP.
For this reason, we carefully surveyed the radio environment at the site and searched for interference. Through tests and problem location, we found that the strong interference came from the radio equipment of the army. With our efforts, the customer requested the local radio commission to remove the interference source or lower the power of the radio equipment. This external interference was gradually reduced. The latest statistics show that the number of cells with an abnormal RTWP rise is decreasing as the average RTWP falls back.
*
*
*
Solution
Continue to push the customer and the local radio commission to shut off the interference source or reduce the transmit power of the interference source so that the interference is within the acceptable range.
*
*
*
Case 7 — Power Congestion due to Resource Restriction
Power congestion occurs to many cells in a network, as shown in the following table:
As can be seen from the above table, power congestion mainly occurs in the PS RAB setup phase and accounts for over 50% of all congestions. The congestion rate during PS RAB setup is as high as 16.55%. Obviously, power congestion is quite severe.
*
*
*
Open the MML script to see the call admission parameter configuration of the cells with power congestion:
The downlink call admission algorithm uses Algorithm 1 (NBMDLCACALGOSELSWITCH=ALGORITHM_FIRST)
The maximum transmit power of the cell is 20W (MAXTXPOWER=430)
The call admission threshold of PS services is 75% (DLOTHERTHD=75)
Therefore, the power threshold for access denial of PS services is 10 × lg(20000 mW × 75%) = 41.76 dBm. We can see from the above table that the transmit power of the cells with power congestion is above 41.92 dBm, which is more than the admission threshold of 41.76 dBm. The power congestion is therefore expected behavior. Moreover, the maximum number of CEs in the downlink reaches 97 when congestion occurs, indicating that traffic is indeed heavy and that the power resources are genuinely congested.
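The threshold conversion above can be reproduced with a short sketch. It takes the 20 W maximum power and the 75% threshold from this case; the function name is an illustrative assumption and not an RNC parameter.

```python
import math

def admission_threshold_dbm(max_tx_power_w: float, threshold_pct: float) -> float:
    """Admission threshold in dBm for a given maximum power and load threshold."""
    max_tx_power_mw = max_tx_power_w * 1000
    return 10 * math.log10(max_tx_power_mw * threshold_pct / 100)

threshold_dbm = admission_threshold_dbm(20, 75)
observed_dbm = 41.92  # transmit power of the congested cells, from the table

print(f"{threshold_dbm:.2f} dBm")    # 41.76 dBm
print(observed_dbm > threshold_dbm)  # True: admission requests are denied
```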
*
*
*
Increase carrier frequencies
Cell splitting
Introduce HSDPA
Other means
*
*
*
Case 8 — Code Congestion due to Resource Restriction
Code congestion occurs in many cells in a network, as shown in the following table:
As can be seen from the above table, code congestion mainly occurs in the PS RAB setup phase and accounts for 100% of all congestions. The congestion rate during PS RAB setup is as high as 7.78%. Obviously, code congestion is quite severe.
*
*
*
As can be seen from the right green part in the above table, the average code utilization rate is over 60% when code congestion occurs. Therefore, code congestion may easily occur. Moreover, the maximum number of CEs in the downlink reaches 105 when the congestion occurs, indicating that there is indeed big traffic and code resource congestion is a fact.
Correlation analysis of code congestion, code utilization rate and traffic
*
*
*
As with power resource congestion, we may solve code resource congestion by the following means:
Optimize PS policies (e.g. DCCC, state transition, etc.)
Increase carrier frequencies
Cell splitting
Introduce HSDPA
Other means
*
*
*
Case 9 — IUB Transport Congestion due to Resource Restriction
Transport congestion occurs in many cells in a network, as shown in the following table:
As can be seen from the above table, IUB transport congestion mainly occurs in the PS RAB setup phase and accounts for over 90% of all congestions. The congestion rate during PS RAB setup is as high as 31.33%. Obviously, IUB transport congestion is quite severe.
*
*
*
Here we select Cell1211 for analysis. Open the MML script to see the relevant IUB transport bandwidth configuration:
The Node B to which the cell belongs has one E1 link and the bearer type is UNI
There is one AAL2 path for R99 services and the bandwidth is 1812 kbps
As can be seen from the above configuration, the IUB bandwidth configuration of the Node B is correct for the physical bandwidth of one E1. However, the green part on the right of the above table shows that the average downlink throughput is as high as 104.85 kbps when IUB transport congestion occurs in the cell, so the cell traffic is rather high. Moreover, the Node B has two cells and their combined traffic is even higher. Therefore, one E1 cannot satisfy the actual bearer requirements and IUB transport congestion may easily occur.
*
*
*
Solve transport resource congestion by the following means:
Optimize PS policies (e.g. DCCC, state transition, etc.)
Expand the capacity of transport resources
Increase micro cells in hot spots
Other means
*
*
*
Case 10 — Plenty of RRC Connection Setup Failures due to Node B Version Defect
The RRC connection setup success rate in a network is rather low, as shown in the following table:
*
*
*
RRC connection setup failure mostly occurs in cells such as 30843, 30863, 30252 and 30382. The failure rate is nearly 100% for these cells and the major failure cause is that the UE does not reply, as shown in the following table:
Cell analysis
*
*
*
As can be seen from the green part on the right of the above table, when RRC connection setup failed the cells were in normal service, the RTWP was about -106 dBm and the maximum transmit power of the cells was about 36 dBm, so there is no transport problem or uplink interference.
Assisted by the onsite engineers, we performed a dialing test on Cell 30843 and started IOS tracing. The failure symptom is as follows: after receiving RRC CONNECT REQ, the RNC sends RRC CONNECT SETUP normally. However, RRC connection setup fails because RRC CONNECT SETUP COMPLETE is not received. Finally, the RNC initiates RL release. See the figure below:
*
*
*
Open the RRC CONNECT REQ message, as shown in the following figure:
As can be seen from the above figure, the downlink Ec/No is about (44-49)/2 = -2.5 dB when the RRC connection setup request is initiated. Therefore, the downlink signal quality is OK and there should be no downlink weak coverage or downlink interference problem.
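The conversion used above, from the value reported in the message to dB, can be written as a small helper. The formula (value - 49) / 2 is the one applied in the text; the valid range of 0 to 49 for the reported value is an assumption based on the usual CPICH Ec/No reporting range, and the function name is illustrative.

```python
def cpich_ecno_db(reported_value: int) -> float:
    """Convert a reported CPICH Ec/No value to dB using (value - 49) / 2."""
    # Assumption: reported values lie in the range 0..49.
    if not 0 <= reported_value <= 49:
        raise ValueError("reported value expected in the range 0..49")
    return (reported_value - 49) / 2

print(cpich_ecno_db(44))  # -2.5
```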
*
*
*
Try deactivating the cells and then activating them. The cells can be connected, as shown in the following figure:
Therefore, the problem is caused by a Node B equipment fault.
R&D confirmed that the problem was due to a defect in software version V16 041 of the BTS3812E Node B and could be solved by upgrading the software to V16 061.
*
*
*
Solution
Upgrade the software version of the Node B of BTS3812E type to V16 061. After the software version upgrade, the RRC connection setup success rate in the network reaches the normal KPI requirements and thus the problem is solved, as shown in the following table:
*
*
*
Case 11 — Power Congestion due to Unreasonable Admission Parameter Setting
Power congestion occurs to the cells in the right figure (mainly 50201 and 50203) in the RRC connection setup phase and RAB setup phase. According to the maximum number of CEs when the congestion occurs, we know that the traffic is not very big.
Note: You can get the table on the right via custom report or “Performance Query” of Nastar.
Power congestion in RRC setup
Power congestion in RAB setup
*
*
*
Open the MML script. The uplink call admission algorithm of the two cells uses Algorithm 2. We can see from the traffic that there is no uplink congestion. The downlink call admission algorithm uses Algorithm 1. The downlink threshold of the conversational AMR voice service is 70%, that of conversational non-AMR voice service is 70% and that of other services is 60%. Below are relevant MML commands:
ADD CELLCAC:CELLID = 50201, DLCONVAMRTHD = 70, DLCONVNONAMRTHD = 70, DLOTHERTHD = 60
ADD CELLCAC:CELLID = 50203, DLCONVAMRTHD = 70, DLCONVNONAMRTHD = 70, DLOTHERTHD = 60
Comparing the above with the baseline configurations and the corresponding settings of various commercial offices, we can see that the values of the above call admission thresholds are too small.
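The comparison against the baseline can be automated with a small script check. This is a sketch only: the baseline values are the RNC1.5 figures given in the Solution of this case, the parsing handles just the simple `NAME = VALUE` form shown in the two commands, and the function names are hypothetical.

```python
import re

# Baseline downlink admission thresholds (RNC1.5 values quoted in this case).
BASELINE = {"DLCONVAMRTHD": 80, "DLCONVNONAMRTHD": 80, "DLOTHERTHD": 75}

def parse_mml(line: str):
    """Split an MML command into its name and a dict of parameter strings."""
    command, _, params = line.partition(":")
    pairs = re.findall(r"(\w+)\s*=\s*([\w-]+)", params)
    return command.strip(), dict(pairs)

def below_baseline(line: str) -> dict:
    """Return {parameter: (configured, baseline)} for thresholds under baseline."""
    _, params = parse_mml(line)
    return {
        name: (int(params[name]), base)
        for name, base in BASELINE.items()
        if name in params and int(params[name]) < base
    }

cmd = "ADD CELLCAC:CELLID = 50201, DLCONVAMRTHD = 70, DLCONVNONAMRTHD = 70, DLOTHERTHD = 60"
print(parse_mml(cmd)[1]["CELLID"])  # 50201
print(below_baseline(cmd))
```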
*
*
*
Solution
Change the downlink thresholds of conversational AMR voice service, conversational non-AMR voice service and other services respectively to 80%, 80% and 75% (the baseline values of RNC1.5). Then the power congestion problem disappears.
*
*
*
Case 12 — Code Congestion due to Unreasonable Setting of DCCC and Soft Handover Parameters
Code congestion often occurs to Cells 50201 and 50203.
*
*
*
The RB distribution shows that at 20:00 the two cells carried many DL PS384k services, some PS144k streaming services and some other services in the downlink:
Correlation analysis (1)
Note: You can get the above table via “Performance Query” of Nastar.
*
*
*
Further analysis reveals that the actual average RB traffic of PS high-speed service is not high:
So the DCCC related parameter configuration may be unreasonable.
*
*
*
The DCCC monitoring results show that there is no channel switching and there is no DCCC-related RB reconfiguration in the entire network, as shown in the following table:
Possibly the DCCC switch is off.
*
*
*
Parameter configuration analysis (1)
Open the MML script. The DCCC switch is indeed off (SET CORRMALGOSWITCH:CHSWITCH = DCCC_SWITCH-0).
We recommend turning the DCCC switch on (the DCCC-related parameters may temporarily use the default values, or they can be optimized as needed so that users do not noticeably feel the change).
When the DCCC switch is on, the code resources that do not need to be occupied will be released and the corresponding power resources will also be released. In this way, code congestion and power congestion can be alleviated to a certain extent.
*
*
*
As can be seen from the following table, the ratio of 1A events to 1B events is even higher than 100.
Correlation analysis (2)
*
*
*
Most of the weak links in the active set release resources by the out-of-sync mechanism, as can be verified in the following table:
*
*
*
Parameter configuration analysis (2)
Open the MML script. The relative threshold of RNC-level 1B is set to 14 (7 dB) and the trigger time is set to 2560 (2560 ms), which makes the 1B event hard to trigger.
When the 1B event cannot be timely triggered, some links cannot be timely released and as a result some code resources that do not need to be occupied will be occupied, thus easily causing code congestion. Moreover, transport congestion and power congestion are likely to occur, too.
Let’s have a look at Hong Kong’s settings: The relative threshold of RNC-level 1B is set to 14 (7 dB) but the trigger time is set to 640 (640 ms). The ratio of 1A events to 1B events is within 2, so the set values are reasonable.
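The ratio check used in this case can be sketched as follows. This is a minimal illustration under stated assumptions: the 0.5 dB step is inferred from the value pairs quoted in this document (14 for 7 dB, 8 for 4 dB), the limit of 2 is taken from the Hong Kong reference above, and the function names and the event counts in the usage lines are hypothetical.

```python
def raw_to_db(raw_value):
    """Convert a raw threshold or hysteresis setting to dB.

    Assumes 0.5 dB per step, inferred from the pairs quoted in this
    document (14 -> 7 dB, 8 -> 4 dB).
    """
    return raw_value * 0.5


def event_ratio(count_1a, count_1b):
    """Ratio of 1A events to 1B events; infinite if no 1B event was seen."""
    if count_1b == 0:
        return float("inf")
    return count_1a / count_1b


def is_1b_hard_to_trigger(count_1a, count_1b, max_ratio=2.0):
    """Flag a cell or RNC whose 1A/1B ratio exceeds the reference limit.

    max_ratio=2.0 mirrors the Hong Kong figure; treat it as a rule of
    thumb, not a product specification.
    """
    return event_ratio(count_1a, count_1b) > max_ratio


# Hypothetical counts, for illustration only.
print(raw_to_db(14))                       # relative threshold in dB
print(is_1b_hard_to_trigger(5000, 40))     # ratio above 100
print(is_1b_hard_to_trigger(1200, 800))    # ratio within 2
```

The same ratio helper could be reused for the 1D/1A comparison in Case 13, with a different reference value.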
*
*
*
Open the DCCC switch.
Modify 1B parameters with reference to Hong Kong’s relevant parameter configuration: the relative threshold of RNC-level 1B is 14 (7 dB) and the trigger time is 640 (640 ms).
After these changes were made, the code congestion disappeared.
*
*
*
Case 13 — High Call Drop Rate due to Unreasonable Setting of 1D Parameters and Inter-System Handover Parameters
Call drop occurs in the following cells:
*
*
*
The above call drop is usually caused by weak coverage. We can find some problems from the MML:

*
*
*
Let’s have a look at the settings in Hong Kong and Brunei: 1D hysteresis is set to 8 (4 dB) and the 1D trigger time is set to 640 (640 ms), so the 1D event is easier to trigger, which can be verified by the ratio of 1D events to 1A events. In Hong Kong and Brunei, 1D events are about half of 1A events, but in this case the ratio is 1/8. So we recommend changing the settings to match those in Hong Kong and Brunei, so that soft handover becomes smoother and the call drop rate improves.

*
*
*
Modify 1D parameters with reference to relevant parameter configuration in Hong Kong and Brunei: 1D hysteresis is set to 8 (4 dB) and the 1D trigger time is set to 640 (640 ms).
Change the value of INTERRATPSTHD2DRSCP to -105 dBm or a greater value, so as to avoid call drop due to weak signals during signal switching.
After the above are done, both the CS call drop rate and the PS call drop rate are improved, as shown in the following table:
*
*
*
Case 14 — High Call Drop Rate due to RNC Traffic Measurement Defect
Call drop occurs in the following cells:
*
*
*
Because the call drop cause is “Other”, we should first check the relevant CDL log. Below is the log of Cell 1403 on May 3:
CDL analysis
======No. 57======
Interface: RNCAP_NBM_NBAP_INTERFACE
Msg: NBAP_RL_RECFG_READY
======No. 58======
Interface: RNCAP_INTRA_INTERFACE
Msg: RNCAP_RL_SYNC_RECFG_RSLT
======No. 59======
FSM ID:RNCAP_RB_FSM_ID
CSS:RB SETUP
======No. 61======
Interface: RNCAP_INTRA_INTERFACE
Msg: RNCAP_MAIN_RUNTIME_ABNORMAL_MSG
AL configuration failed
*
*
*
RB setup is complete
Signaling plane setup succeeded and the RNC returns the RAB setup success message to the CN
*
*
*
======No. 76======
Err in RNCAP_CcbCheckAbnormalFlags: User Plane Fail!RAB Fail Cn Domain id = 1, Rab Id = 5
======No. 77======
Err In RNCAP_CcbCheckAbnormalFlags: User Plane Fail! Table Type is : 9, Table Index is 2252
======No. 78======
======No. 79======
Enter in RNCAP_RabRelReq for PS: Cause = 184945367, enRabRelReqType = 4.
According to the above CDL procedure analysis: a) the RNC returns the RAB setup success message to the CN after completing RL reconfiguration and RB setup (this is merely setup success in the NBAP signaling plane); b) the limited transport bandwidth caused a user plane setup failure during the NBAP user plane setup, so the RNC then initiated the RAB release procedure. The “Other” cause in the above traffic statistics refers to this abnormal release.
User plane setup failed
The error code 184945367 can be interpreted as RR_ERR_RNCAP_ALCFG_IUB_AAL2_MAX_BIT_RATE_FOR_FW_NOT_AVAIL, that is, the limited transport bandwidth caused the user plane setup failure. The RNC enters the RAB release procedure.
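When many CDL logs have to be checked, the release cause can be pulled out automatically. The sketch below is only an illustration: it assumes the release line always has the form shown in the log above, the function name `find_release_causes` is hypothetical, and the lookup table contains only the single code that this case interprets.

```python
import re

# Only the code interpreted in this case; other codes are reported as unknown.
CAUSE_NAMES = {
    184945367: "RR_ERR_RNCAP_ALCFG_IUB_AAL2_MAX_BIT_RATE_FOR_FW_NOT_AVAIL",
}

# Matches lines such as:
#   Enter in RNCAP_RabRelReq for PS: Cause = 184945367, enRabRelReqType = 4.
RELEASE_RE = re.compile(r"RNCAP_RabRelReq for (\w+): Cause = (\d+)")


def find_release_causes(log_text):
    """Return a list of (domain, cause_code, cause_name) found in a CDL log."""
    results = []
    for match in RELEASE_RE.finditer(log_text):
        domain = match.group(1)
        code = int(match.group(2))
        results.append((domain, code, CAUSE_NAMES.get(code, "UNKNOWN")))
    return results


SAMPLE_LOG = """\
======No. 78======
======No. 79======
Enter in RNCAP_RabRelReq for PS: Cause = 184945367, enRabRelReqType = 4.
"""

for domain, code, name in find_release_causes(SAMPLE_LOG):
    print(f"{domain} RAB release, cause {code}: {name}")
```

Counting the extracted causes per cell would show quickly whether the “Other” call drops of a cell are dominated by this transport-related release.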
*
*
*
The above analysis shows that the call drop in the traffic statistics is actually an access failure, and the RNC should not count the “RAB release procedure” that follows as a call drop.
R&D has confirmed that this is a known bug of the current RNC version (R005C03B065), which can be avoided by installing the SP05 patch.
*
*
*
After the SP05 patch is installed for the RNC, almost all the call drops with the cause being “Other” have disappeared and the PS call drop rate is obviously lower, as shown in the following table. The problem is thus solved.
Solution
*
*
*
Case 15 — Power Congestion due to HSDPA Measurement Switch Off
The PS RAB setup success rate in a network is 79% (extremely low), as shown in the following table:
*
*
*
Cell analysis
The cell analysis results show that the PS RAB setup failure mostly occurs in cells such as 8881, 8882 and 25282 and that the major failure cause is power congestion, as shown in the following table:
*
*
*
As shown in the above table, when power congestion occurs, the maximum transmit power of the cell is 39 dBm (not high), which should not cause power congestion. Further analysis of the CDL reveals that the PS RAB setup failure is due to power congestion caused by access denial (in many cases a denial based on the equivalent user number). Because the script parameters use ‘Alg_First’, there should not be any judgment based on the equivalent user number. The only possibility is that the power cannot be predicted.
Because the cell is an H cell and the power occupied by the H channel needs to be known during call admission, the algorithm provides a switch, ‘HSDPA measurement’. The measurement can be made, and call admission based on power prediction can be performed, only when this switch is on. In the script, this switch is off for the network, so call admission uses the equivalent user number to make the power prediction. A big deviation arises (this is also why we should avoid using the equivalent user number for call admission) and causes denials that should not happen. That is why we see power congestion in the traffic statistics although the actual power is not high.
*
*
*
To solve the problem, turn on the HSDPA measurement switch, as shown in the following figure.
Solution
Note: The NCP bandwidth should be above 100 kbps if we want to open the HSDPA measurement switch.
After the switch is on, the PS RAB setup success rate of the entire network reaches the normal KPI requirements and the problem is solved, as shown in the following table:
*
*
*
The PS call drop rate of a network is larger than 30% (extremely high), as shown in the following table:
*
*
*
Cell analysis
The cell analysis results show that the cells with severe call drop are 24181, 3783, 19083, etc. and that the major causes of call drop are RLC reset, uplink synchronization failure, no response on the Uu interface or other RF problems, as shown in the following table:
*
*
*
Call drop with the cause “RF problems” is often due to weak coverage. However, checking and analyzing the MML script shows that the 2G cell capability required for inter-system handover of PS services in the network is EDGE (ADD TYPRABBASIC REQ2GCAP=EDGE), whereas the GSM cell attribute configuration is GPRS (ADD GSMCELL RATCELLTYPE=GPRS). Because the capability required by the services is higher than the capability supported by the GSM cells, PS services will not start the compressed mode for inter-system handover. Therefore, PS service call drop may easily occur at the edge of the network, with the call drop cause being “RF problems”.
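The mismatch described above lends itself to a simple consistency check between the service attributes and the GSM neighbor configuration. The sketch below is an illustration only: the capability ordering (EDGE above GPRS) follows the reasoning in this case, while the function name, the rank table and the cell names in the usage lines are hypothetical.

```python
# Assumed ordering of 2G capabilities, lowest to highest. Only the two
# values that appear in this case are listed.
CAPABILITY_RANK = {"GPRS": 1, "EDGE": 2}


def find_capability_mismatches(required_cap, gsm_cells):
    """Return the GSM cells whose RATCELLTYPE is below the required capability.

    required_cap: value of REQ2GCAP in the PS service attributes.
    gsm_cells:    {cell_name: RATCELLTYPE} taken from the GSM cell configuration.
    """
    required = CAPABILITY_RANK[required_cap]
    return [
        cell
        for cell, cell_type in gsm_cells.items()
        if CAPABILITY_RANK[cell_type] < required
    ]


# Hypothetical neighbor list, for illustration only.
cells = {"GSM_CELL_A": "GPRS", "GSM_CELL_B": "EDGE"}
print(find_capability_mismatches("EDGE", cells))  # cells that block the handover
print(find_capability_mismatches("GPRS", cells))  # no mismatch after the change
```

A cell returned by the check is one toward which PS services would not start the compressed mode, so it is a candidate for the edge-of-network call drops described in this case.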
*
*
*
Change “REQ2GCAP=EDGE” in the PS service attributes to “REQ2GCAP=GPRS”. The PS service call drop rate is improved to a certain extent, as shown in the following table:
Solution
*
*
*
Case 17 — PS Inter-System Handover Success Rate Is Zero due to RNC Traffic Measurement Defect
The PS inter-system handover success rate of a network is 0, as shown in the following table:
*
*
*
Cell analysis
The cell analysis results show that the PS inter-system handover failure mostly occurs in cells such as 45552, 5552, 25652 and 45151 and that the major failure cause is “Other”, as shown in the following table:
*
*
*
We tested the cells with frequent failures at the site and found that the PS inter-system handover actually succeeded but the RNC PS traffic statistics indicated failure.
Through further signaling trace and analysis, we found that the CN did not send the SRNC CONTEXT REQUEST message during PS inter-system handover, whereas at present the RNC does not count a success unless it receives the SRNC CONTEXT REQUEST message (so the PS inter-system handover success count is always 0 in the statistics). Our own CN does send the SRNC CONTEXT REQUEST message, so we failed to discover this problem during the test of the V15 office. (In the early days, the RNC directly judged the cause value of IU REL CMD. Later, in the Uruguay Beta Test, we discovered that the CN (not our CN) always sent the release command even if the 2G system did not support inter-system handover, and the RNC counted the handover as successful as long as the release cause value was “Normal Release”. To avoid this incorrect measurement, we changed the rule to the present one: the RNC does not count the handover as successful unless it receives the SRNC CONTEXT REQUEST message.)
It is not stipulated in the protocols that the CN should send the SRNC CONTEXT REQUEST message during PS inter-system handover. Instead, the CN does not need to send this message if it is not necessary to restore the PDP context.
*
*
*
Change the measurement method of PS inter-system handover success as follows: During the PS inter-system handover, if the RNC receives the IU RELEASE COMMAND message after sending the CELL CHANGE ORDER FROM UTRAN message, and if the cause value in the IU RELEASE COMMAND message is “Successful Relocation”, “Normal Release” or another normal cause value, then the PS inter-system handover procedure has succeeded and the success should be counted. Merge this change into the RNC V17 version.
There still exists this problem: When the CN sends the IU RELEASE COMMAND message that carries the normal cause value in the case of inter-system handover failure, the RNC will also count the handover as being successful. This problem cannot be avoided and at present our RNC cannot solve it.
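The revised counting rule can be expressed as a small decision function. This is a minimal sketch, not the RNC implementation: the message trace is modeled as a list of (message name, cause) pairs, the set of normal cause values contains only the two named in this case, and the function name is hypothetical.

```python
# Normal cause values named in this case; a real list would include the
# other normal cause values as well.
NORMAL_CAUSES = {"Successful Relocation", "Normal Release"}


def is_ps_irat_success(messages, normal_causes=NORMAL_CAUSES):
    """Apply the revised rule to one handover attempt.

    messages: ordered list of (message_name, cause) tuples; cause is None
              for messages that carry no cause value.
    Success is counted when an IU RELEASE COMMAND with a normal cause is
    received after CELL CHANGE ORDER FROM UTRAN has been sent.
    """
    order_sent = False
    for name, cause in messages:
        if name == "CELL CHANGE ORDER FROM UTRAN":
            order_sent = True
        elif name == "IU RELEASE COMMAND" and order_sent:
            return cause in normal_causes
    return False


trace = [
    ("CELL CHANGE ORDER FROM UTRAN", None),
    ("IU RELEASE COMMAND", "Normal Release"),
]
print(is_ps_irat_success(trace))
```

The sketch also makes the remaining limitation visible: a trace in which the CN sends a normal cause value after a failed handover is indistinguishable from a successful one, so it is still counted as a success.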
Solution
*
*
*
Case 18 — High PS Call Drop Rate due to FACH State Support Defect
The PS call drop rate of a certain network is higher than 30% (extraordinarily high), as shown in the following table:
*
*
*
According to the analysis of RB reconfiguration behavior, channel switching between common channels (FACH) and dedicated channels occurred many times:
*
*
*
*
*
*
Close the BE service state transition switch. The PS service call drop rate is greatly lowered, as shown in the following table:
Solution
*
*
*
Case 19 — Low CS Inter-System Handover Success Rate due to 2G Parameter Configuration Error
CS inter-system handover failure frequently occurs in the following cells. The failure rate is as high as 100%, with the major failure cause being physical channel failure, as shown in the following table:
*
*
*
Physical channel failure is generally caused by weak signals of 2G cells, interference at the 2G side or other problems. These causes were later ruled out at the site, and ultimately we found that the problem arose because the 2G MSC did not set cell encryption in the handover response message after the AMR function was provisioned for the customer's GSM network.
At this point, we were sure that the problem was caused by the 2G MSC.
*
*
*
Because the 2G network was provided by our competitor E, we asked the customer to push E to handle the problem. After that, the CS inter-system handover failure rate obviously decreased and the problem was thus solved, as shown in the following table:
Solution
*
*
*
Case 20 — High VP Service Call Drop Rate due to Too High Logic Channel Priority
*
*
*
Cell analysis
The cell analysis results show that VP call drops are randomly distributed in space and time and the call drop is due to RF problems. The cells are in normal service and the RTWP is also normal when the call drop occurs, as shown in the following table:
*
*
*
The RF-attributable call drop is generally caused by weak coverage, but there was little possibility that the coverage changed a lot over a large area within the two days of the upgrade, so we suspected that the rise of the VP call drop rate was closely related to the RNC upgrade.
Through troubleshooting, we found that BSC6800V100R003 and BSC6800V100R005 differ slightly in the specific implementation. In RNC V1.3, the priority allocation of logical channels is hard-coded and cannot be modified via any command, and signaling priority is higher than service priority. In RNC V1.5 and later versions, flexibility is added: the priority allocation is configurable via the background, with such factors as service type differentiation fully considered, and default configurations are provided for the priority parameters.
We found through the verification test on the simulation platform that the service priority in the default configurations of RNC V1.5 was too high and the logical channel priority of some services was even higher than the signaling priority, which caused problems at cell coverage edges: when the transmit power of the UE is close to the maximum value, the UE enters the uplink TFC selective sending state and the uplink signaling cannot be sent, thus causing call drop. The relationship between logical channel priority and transport channels is not clarified in the protocols. According to the test results, adjusting this parameter helped decrease the VP call drop rate at various commercial offices.
*
*
*
Change the default value of the logical channel priority of the VP service RB in BSC6800V100R005 from 1 to 4 (SET LOCHPRIO: CSCONVLOCHPRIO=4), so that the service priority is not higher than the signaling priority. The VP call drop rate obviously decreases and the problem is thus solved, as shown in the following table:
Solution
*
*
*
Experience Summary
Performance analysis is largely based on traffic statistics, so we should be familiar with the traffic measurement indices. We can decide which indices to analyze only after we know the traffic measurement indices, their meanings and their measuring time.
The Help file of RNC traffic measurement provides the structure diagram of traffic measurement indices (index tree) as well as a description of the index meanings and measuring time. We should understand all the indices covered in this Help file. Of all the indices, those of “cell measurement” and “RNC overall performance measurement” are especially important. We must completely master them.
*
*
*
Experience 2: Be good at designing analysis themes
As we mentioned before, a table composed of relevant traffic measurement indices is called an “analysis theme”. There is no constraint for the design of analysis themes. Any group of relevant traffic measurement indices can become an analysis theme, as long as they can guide the discovery or analysis of problems. The aforesaid cases have given many examples of analysis themes.
Although there is no constraint on the design of analysis themes, analysis themes have a lot in common in terms of the analysis principle. For example, most performance problems are related to equipment state, interference, capacity and coverage, so we may combine the indices related to equipment state, interference, capacity and coverage into a standard “correlated index set” to serve as a reference for analyzing a specific performance problem. Practice shows that the correlated index set can play a great role: It enables the analyzer to analyze problems from a wider angle and to further confirm problems or discover new exceptions.
Below is the correlated index set often used in “Cell TOPN monitoring” and “Cell analysis”. It is given here for your reference only.
Experience Summary (Continued)
*
*
*
How well you design analysis themes and how efficiently they help problem discovery and problem analysis depend on how well you understand the traffic measurement indices, master the systematic knowledge and accumulate analysis experience. Moreover, as mentioned before, analysis themes can be integrated in templates to enable knowledge transfer and experience sharing. Therefore, the analysis theme function of Nastar is a powerful weapon for broadening your thinking and bringing your wisdom and potential into play. It enables your experience to be conveniently shared with others.
Below are some common analysis themes based on a summary of past experience. There are already report templates accordingly for your reference, which provide the basic information needed for routine monitoring and analysis of performance problems and help you efficiently complete performance monitoring & analysis.
A link to report template. It is for your reference only.
*
*
*
Experience 3: Get familiar with auxiliary problem analysis & location methods
As we have mentioned before, Nastar supports network-wide KPI monitoring, cell TOPN monitoring, cell analysis and CHR analysis. However, with these monitoring and analysis methods alone we can only discover problems and troubleshoot them preliminarily. Plenty of other means, e.g. IOS, CDR, alarms and other problem location tests and data collection & analysis means, are still needed to solve some hard-to-crack problems.
Moreover, we may improve the problem analysis efficiency by using more auxiliary analysis tools. For instance, OMStar can clearly display the network topology and transport layer configuration and can quickly complete the comparative check of radio layer parameters. Microsoft Office Excel has many functions, including macros, that can greatly improve the analysis efficiency.
We should be familiar with these auxiliary analysis means. For details, please refer to the relevant guidance book or online help document.
Experience 4: Get familiar with parameter configuration
Parameter configuration is a key factor influencing the network performance. At present, most of the performance problems we have discovered are caused by parameter setting errors. What’s more, we often need to combine parameter configuration to make the analysis during problem location. Therefore, we should get familiar with and master the configuration of various parameters. Here the parameter configuration mainly refers to radio layer related parameters and among them the parameters related to network planning and management are especially important. In addition, because we should focus on the capacity expansion of transport resources during network planning, we must also master the configuration of transport layer parameters.
There are quite many documents for guiding parameter configuration. You can consult them if necessary.
Experience 5: View problems from system perspective
Many factors influence performance, and they are complex. For example, access problems and call drop problems may be related to coverage problems, interference problems, transport problems, equipment function defects and/or parameter configuration errors. Therefore, we cannot say that an access or call drop problem whose symptom indicates an RF problem is always caused by a coverage problem. As another example, resource access and congestion problems may be related to the actual resource condition, admission policy, DCCC policy, state transition policy, HSDPA bearer policy and/or congestion control policy, so we cannot simply conclude that capacity expansion is needed whenever a resource access or congestion problem arises.
In sum, we should view problems from the system perspective so as to avoid noticing only the superficial symptom and being blind to the problem nature.
How to do this? First, we should learn the coverage, load and resource configuration & utilization of the whole network, understand the network behavior (e.g. QoS negotiation, RB distribution, cell throughput, traffic distribution, etc.) and master the parameter configurations (such as function switches, admission control, handover & roaming policies and PS policies of the whole network). Only then should we analyze the specific problem from a global point of view.
Always viewing problems from the system perspective will help you avoid detours during problem analysis and help you efficiently solve problems.
Experience 6: Dare to cast doubt
Most performance issues are end-to-end and involve many intermediate NEs, and every NE may be problematic. Therefore, during in-depth analysis of a problem we should dare to cast doubt on each link based on the actual data, and should be good at organizing relevant resources to troubleshoot the problem. In this way, we can achieve twice the result with half the effort during problem location and solving.
