Kapil Ramlal (KappA) Escalation Engineer Troubleshooting Tools and Methodology in a Citrix XenApp 5.0 Environment

XenApp troubleshooting

Agenda

The right tool, right place at the right time

Troubleshooting scenarios

Top utilities

Case studies

Additional resources/Q&A

XenApp troubleshooting

Understanding the infrastructure

The anatomy of a XenApp farm

• Information: Static and Dynamic

• Components: Where to focus troubleshooting

Understanding what happens from logon to launch

• Types of issues: Denial of service, bottlenecks

• Troubleshooting: MedEvac, performance monitoring, CDF…

Types of Information

Dynamic

• Dynamic Store

• Constantly changing information

• Load management

• Information required for application launch

Static

• Data Store

• Does not change frequently

• Farm configuration

• Changes made in the Management Console

[Diagram labels: LHC, DATA STORE]

Logon to launch

[Diagram components: Client, Web Interface, XML Broker, Zone Data Collector, Data Store, Active Directory, Least Loaded Server]

MedEvac (CTX107935)

• The XML Broker tests

  • Verifies that the XML Service is able to respond to an XML / client request

  • XML is able to contact the Zone Data Collector

• Zone Data Collector tests

  • Verifies that the ZDC can provide the address of the least loaded server for the requested app

  • The IMA Service is able to respond

  • The IMA Service can read the Local Host Cache

  • The IMA Service can read its Dynamic Store

• Least Loaded Server tests

  • Verifies that Terminal Service is able to respond

  • Verifies that the RPC Service is able to respond
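The "is able to respond" checks above can be approximated with a plain TCP connect test. This is a minimal Python sketch and not how MedEvac itself is implemented; the function name `can_connect` and the host name in the usage note are illustrative assumptions.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_connect("xmlbroker01", 80)` would test whether a hypothetical XML Broker accepts connections on the port its XML Service is configured to use. A successful connect only shows that something is listening; it does not prove the service can answer a request.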

How to Monitor Farm Health using MedEvac?

• See Knowledge Center article CTX119899

Monitoring

[Diagram: monitoring points across Client, Web Interface, XML Broker, Zone Data Collector and Active Directory. Labels: IMA Work Item Queues, IMA %CPU time, Zone Elections Won, ASP Requests, XML Threads, RSOP, CDF]

Citrix counters: description, threshold, and server to monitor

Application Resolution Time (ms)

• Description: Time to resolve LLS

• Threshold: Determine baseline

• Server to monitor: All XML Brokers

Data Store Connection Failure

• Description: Number of minutes the server has been disconnected from the Data Store

• Threshold: Determine threshold considering scheduled reboots and maintenance

• Server to monitor: All XenApp servers

Number of Busy XML Threads

• Description: Number of XML requests currently being processed (Max=16)

• Threshold: 16 sustained for 1 min or longer

• Server to monitor: All XML Brokers

WorkItem Queue Ready Count

• Description: Number of work items that are ready and waiting to be processed by IMA

• Threshold: Sustained above 0 for 1 min or longer

• Server to monitor: All XML Brokers; Most Preferred and Preferred Data Collectors

Resolution WorkItem Queue Ready Count

• Description: Number of work items (related to application launches) waiting to be processed by IMA

• Threshold: Sustained above 0 for 1 min or longer

• Server to monitor: All XML Brokers; Most Preferred and Preferred Data Collectors

Zone Elections Won

• Description: Number of times this server won an election

• Threshold: If this counter increments by 2 in a 1 hour period

• Server to monitor: Most Preferred and Preferred Data Collectors
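Several of the thresholds above are of the form "sustained for 1 min or longer". One way to evaluate such a rule against collected samples is sketched below in Python. The function name `sustained` and the sample format (timestamp in seconds, counter value) are assumptions made for illustration, not part of any Citrix tool.

```python
def sustained(samples, breached, min_seconds=60):
    """Check whether a condition held continuously for at least min_seconds.

    samples: list of (timestamp_seconds, value) tuples in time order.
    breached: function returning True when a value violates the threshold.
    """
    start = None  # timestamp at which the current breach began
    for ts, value in samples:
        if breached(value):
            if start is None:
                start = ts
            if ts - start >= min_seconds:
                return True
        else:
            start = None  # the condition cleared, so the clock restarts
    return False
```

With samples taken every 15 seconds, `sustained(samples, lambda v: v >= 16)` would flag the Number of Busy XML Threads rule, and `sustained(samples, lambda v: v > 0)` the two WorkItem Queue rules.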

XenApp 5.0 Health Monitoring and Recovery

• Enterprise & Platinum Editions of XenApp

• Performs tests to monitor state and identify health risks

• Terminal Services tests

• XML Service test

• Citrix IMA Service test

• Logon Monitor test

• Check DNS test

• Local Host Cache test

• XML threads test

• Citrix Print Manager Service test

• Microsoft Print Spooler test

• ICA Listener test

• See page 307 of the XenApp 5.0 Administrator’s Guide (CTX115519) for information

Large Farm Tips

• Limit additional roles on Zone Data Collectors

• Limit the number of zones in the environment

• Do not run management consoles on or pointed to the ZDCs

• Read the Key Infrastructure Tuning article: CTX116492

Free the ZDC!

The evolution continues!

• Citrix XenApp 5.0 opens the door for delivering resources on Windows Server 2008

• Customers also have a growing number of Windows Vista users

• Say hello to the next-generation troubleshooting artillery for the XenApp 5 environment

• Existing tools have been updated, and new tools introduced

• The evolution continues!

The right tool, right place at the right time

• DON'T

  • Use troubleshooting tools just because you can

  • Recommend tools that are not relevant to the problem

  • Use troubleshooting tools without understanding their impact on the environment

• DO

  • Use tools to help automate time-consuming tasks

  • Use tools at the right time, such as when the problem is occurring and not afterwards

  • Understand what the tool is trying to accomplish, so that the right data is obtained

  • Use tools with a clear purpose

  • Maintain a local toolkit, so that the right tools are always available in times of crisis

CDF Tracing & CDFControl 2.5

Common Diagnostic Facility (CDF)

• Provides the ability to collect traces for problem diagnosis on Citrix binaries without disrupting the services or users

• Citrix’s standard debug tracing facility

• Efficient and non-intrusive data collection process

• Enabled without stopping and starting services

• Faster & easier tracing for retail modules

• Flexible & customizable troubleshooting facility

• Consistency across most Citrix products

CDF Basics

• To better understand what a CDF trace message is, let’s look at the following pseudo code example

• In the example, the function belongs to a service, which can be considered to be a Trace Provider (more on this later)
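A function that emits trace messages while loading a feature DLL might look like the following Python sketch. It is illustrative only: the logger name `ExampleService`, the function `load_feature`, and the injected `loader` callable are hypothetical, and Python's `logging` module stands in for the CDF trace statements a real Citrix service would contain.

```python
import logging

# Hypothetical trace provider: the service that owns the function below.
trace = logging.getLogger("ExampleService")


def load_feature(loader, dll_name="CitrixFeatureDLL.dll"):
    """Try to load dll_name with loader, tracing each step of the attempt."""
    trace.debug("load_feature: entry, attempting to load %s", dll_name)
    try:
        handle = loader(dll_name)
    except OSError as err:
        # Without this message the failure would be invisible from outside.
        trace.error("load_feature: failed to load %s: %s", dll_name, err)
        return None
    trace.debug("load_feature: %s loaded successfully", dll_name)
    return handle
```

Each `trace` call corresponds to one trace message: when tracing is enabled, the sequence of messages shows whether the load was attempted and whether it succeeded.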

The moral of the story

• We could capture a CDF trace to determine if the CitrixFeatureDLL.dll loaded successfully

• How difficult would it be to debug without this tracing?

• You need special symbol files to be able to read the trace messages (TMF files)

• This allows certain information to remain private as needed (similar to .pdb files)

• You get more by default!

CDF Internals

• To better understand CDF, let’s take a quick look at how the operating system supports Event Tracing for Windows (ETW)

[Diagram: ETW components. Labels: CONTROLLER (CDFCONTROL), Enable/Disable, Buffers, Trace File, Events, CONSUMER; example providers: CDM.sys, RadeSvc.exe, WFShell.exe]

ETW Components

Providers:

• Modules containing tracing, that can be enabled or disabled

• Example: MF_Driver_Cdm (Cdm.sys)

Controllers:

• Enables/Disables a provider

• Configures trace capture settings

• Starts/Stops a trace

Consumer:

• Reads trace events from log file

• Reads trace events real-time from a trace session
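The relationship between the three roles can be modelled in a few lines of Python. This is a conceptual sketch only and does not call the real ETW API; the class and function names are invented for illustration, and a plain list stands in for the session buffers and trace file.

```python
class Provider:
    """A module that contains trace statements and can be switched on or off."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.session = None

    def trace(self, message):
        # Events are only emitted while a controller has enabled the provider.
        if self.enabled and self.session is not None:
            self.session.append((self.name, message))


class Controller:
    """Owns a trace session and enables/disables providers."""

    def __init__(self):
        self.session = []  # stands in for the buffers / trace file

    def enable(self, provider):
        provider.session = self.session
        provider.enabled = True

    def disable(self, provider):
        provider.enabled = False


def consume(session):
    """Consumer: read the recorded events back as formatted lines."""
    return [f"{name}: {message}" for name, message in session]
```

The point of the model is that a provider's trace statements cost almost nothing until a controller enables it, which is why tracing can be switched on without restarting the service.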

CDFControl v2.5

• CDFControl is a hybrid controller and consumer

• It can start/stop/enable and configure an ETW/CDF trace session

• It can consume (read) trace events from a log file, or from a live real-time trace session

• The original version operated only as an ETW Controller, and was published under CTX111961

CDFControl 2.5 Demo

Troubleshooting Scenarios

Troubleshooting scenarios

• Application Streaming

• Seamless/Multi-Monitor

• 3rd Party Applications

• CPU Spikes

• Deadlocks/Hangs

• Database

• Network

• Black Hole Effect

• XenApp Plugin (PNA)

• Debugging

Application Streaming

What happens on the client side?

1. End user launches app from WI or PN Agent

2. RAD file is downloaded

3. RAD file launches client Application Isolation Environment (AIE)

4. RAD file instructs streaming client to download:

  • Manifest file | AIE rules | Application executable | Pre and post execution scripts

5. Streaming client launches executable according to instructions in manifest file and AIE rules, including pre and post execution scripts, and registers with the ctxsbx.sys (redirector)

6. Application is available to user

7. Streaming Client requests additional files as required, checking first in the client cache, then if necessary, downloading additional files from the file server

[Diagram: End User, Streaming Client and AIE, RAD file, and Network File Servers holding the manifest file, executable, AIE rules, .dll’s, data files and other .exe’s]
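The cache-first lookup in step 7 follows a common pattern, sketched here in Python. The function name `fetch_file` and the use of two local directories to stand in for the client cache and the file server are assumptions for illustration; the real streaming client's internals are not shown in the slides.

```python
import shutil
from pathlib import Path


def fetch_file(name, cache_dir, server_dir):
    """Return the cached path for name, copying it from server_dir on a miss."""
    cached = Path(cache_dir) / name
    if cached.exists():
        return cached  # cache hit: nothing is downloaded
    source = Path(server_dir) / name
    cached.parent.mkdir(parents=True, exist_ok=True)
    # Raises FileNotFoundError if the file server does not have the file.
    shutil.copyfile(source, cached)
    return cached
```

One troubleshooting consequence of this pattern is that a stale or damaged cached copy keeps being served until the cache is cleared, even after the copy on the file server has been corrected.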

Application Streaming

• Isolate the Issue

  • When?

    • Profiling

    • Publishing

    • Streaming

  • How?

    • Streaming to Server

    • Streaming to Client

  • Versions?

    • WI 4.5, 5.0

    • License server 4.5, 5.0

    • Client

Application Streaming

Streaming Client Troubleshooting:

• Client installation is required on workstations

• Verify the Citrix Streaming Service is started, or restart it

• Reference CTX116483 – required permissions

• Enable debug console

  • HKEY_LOCAL_MACHINE\Software\Citrix\Rade

  • REG_DWORD: “EnableDebugConsole”

  • Value: 1 to switch on, 0 to switch off
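The registry setting above can be captured as a .reg file so it is easy to apply and revert. The Python sketch below only builds the file text; the function name `debug_console_reg` is an illustrative assumption, while the key path and value name are taken from the slide.

```python
def debug_console_reg(enabled):
    """Build .reg file text that sets EnableDebugConsole to 1 (on) or 0 (off)."""
    value = 1 if enabled else 0
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        "[HKEY_LOCAL_MACHINE\\Software\\Citrix\\Rade]\r\n"
        f'"EnableDebugConsole"=dword:{value:08x}\r\n'
    )
```

Writing `debug_console_reg(True)` to a file such as `enable_console.reg` (a hypothetical name) and importing it on the client switches the console on; the `False` variant switches it off again.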

Application Streaming

Leverage realtime CDF tracing!

• Run CDFControl on the client (where client is installed)

• Choose the Application Streaming category

• Enable realtime tracing

• Provide a TMF path (CTX106233)

• Start tracing and reproduce the launch failure

Seamless/Multi-Monitor

[Diagram: SEAMLESS HOST COMPONENTS. Labels: Winlogon, Default, ICA Client, winlogon.exe, icast.exe, wfshell.exe, TWIWorker, TWISysTrayAgent, TWIReader, seamls20.dll, icactls.dll, sehook20.dll]

Seamless/Multi-Monitor

[Diagram: SEAMLESS CLIENT COMPONENTS. Labels: wfica32.exe, vdtwin30.dll, vdtwn.dll, ctxsrcc.lib, GAI, LVB]

Seamless/Multi-Monitor

Multi-Monitor

• An optional component

• Client provides a monitor layout via the thinwire channel, which is shared through shared memory by all processes that load mmhook.dll

• Work area change is always posted to host. This could be due to a change in the existing work area or a change in virtual screen size due to addition/deletion of monitors.

• API hooks are controlled by flags and can be customized per process. Refer to CTX115637 for various configuration options

Seamless/Multi-Monitor

• Shift+F2 to change to Full Screen mode

• Reconnect as fixed size window session

• Set global flags, 0x26DEA7, to see if it fixes the issue.

  • This is a combination of the following flags (see CTX101644 for details of each bit)

  • 0x1 (Disable session sharing), 0x2 (Disable modality check), 0x4 (Disable AA hook)

• Analyze CDF trace for MF_DLL_CTXNOTIF and MF_SESSION_TWI

• Analyze window information using SPY++/Window History/Message History

• Try per-window exception flags

• Analyze application logic (API flow) using TracePlus utility
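A flag value such as 0x26DEA7 is a bitmask, so it can be split into its individual bits to see which options are switched on. The Python sketch below names only the three bits listed on the slide; every other bit is left for lookup in CTX101644. The function name `decode_flags` is an illustrative assumption.

```python
# Names for the three bits listed on the slide; other bits are in CTX101644.
KNOWN_FLAGS = {
    0x1: "Disable session sharing",
    0x2: "Disable modality check",
    0x4: "Disable AA hook",
}


def decode_flags(value):
    """Return [(bit, name)] for every bit set in value, lowest bit first."""
    result = []
    bit = 1
    while bit <= value:
        if value & bit:
            result.append((bit, KNOWN_FLAGS.get(bit, "see CTX101644")))
        bit <<= 1
    return result
```

Decoding the value first makes it practical to re-enable flags one at a time and narrow the problem down to a single hook or check.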

Seamless/Multi-Monitor

• Get the window class name that is exhibiting the problem

• Collect the CDF traces for the concerned modules ONLY

  • CTXNOTIF, MMHOOK, TWCDS, TWI, TWI_HOOK

• Analyze the behavioral aspects that could be affected by hooks

• Enable/disable: does it happen on a single monitor too? If yes, the chances that it is multi-monitor related are very small. Disable mmhook and see what happens.

• Compare the window styles at host and client

• For seamless-specific issues, verify whether it also happens in ICA Desktop/RDP.

3rd Party Applications

• How does the application work?

  • Is it Native, or does it run on a Framework, such as .NET or Java?

  • Do you have the right versions of the Framework installed?

  • Are the correct dependencies present, and does it work at the console?

  • Does it require certain file and registry access? (Does it need Write permissions etc.?)

  • Does it require component registration?

• Inspect core functionality

  • View the application/process under an analysis tool such as ProcessExplorer or WinDbg

  • Inspect all modules (DLLs) loaded by the application

  • Validate any dependencies (missing DLLs?)

  • Inspect named events and handle usage (synchronization/resource problems?)

  • Validate file and registry access using ProcessMonitor

  • Run application under the AppVerifier utility to check for a multitude of issues

3rd Party Applications

• Leverage the Global Flags for user-mode applications using the Gflags utility

• Set 3rd party application to run under Image File Execution Options

• Configure a debugger to invoke the application (such as WinDbg)

• When the application launches, the debugger will automatically attach to the process and halt its execution!

• This gives the opportunity to explore all application threads from process initialization (~*kb)

• From here the internals of the application can be understood at the Native Windows API level (i.e. Which Windows API's are being used)

3rd Party Applications

• Use ProcessExplorer to view the loaded modules for a process, and check for the presence of any hook modules (hooking DLLs)

• Hook modules can alter the natural behavior of applications, which can sometimes cause problems

• Try excluding the problem application from all Citrix hooks (CTX107825)

CPU Spikes

• Try to define a pattern (leverage perfmon)

• Determine offending Thread ID causing the spike (Process Explorer, QSlice)

• Obtain userdump of offending process immediately after (Userdump.exe, WinDbg.exe)

• Check CDF trace for repeated (looping) messages (if Citrix component)

• Use application spy to look at what the application is doing (TracePlus, Logger)
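Spotting repeated (looping) messages in a large trace is easier with a quick count. The Python sketch below assumes the trace has been exported to text with one message per line and with timestamps and other per-event fields already removed; the function name `looping_messages` and the default of 100 repeats are illustrative assumptions.

```python
from collections import Counter


def looping_messages(lines, min_repeats=100):
    """Return (message, count) pairs for messages seen at least min_repeats times."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return [(msg, n) for msg, n in counts.most_common() if n >= min_repeats]
```

A message that dominates the count during the spike window is a good starting point for matching against the offending thread's call stack.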

Deadlocks

• Windows Vista and Server 2008 offer the new Wait Chain Traversal (WCT) API!

• This offers applications a mechanism to check internally for wait conditions, and also allows for custom tools to be created which can also check for application hangs – LIVE!

• No cool WCT tools available? The debugger is your friend!

• Attach to hung process/service and generate a dump for post-mortem analysis:

  • .dump /ma c:\PathToDump\DeadlockedApp.dmp

• Manually inspect thread states, and get the debugger's opinion with:

  • !analyze -hang -v

THE WINDOWS TASK MANAGER CAN CAPTURE USER DUMPS IN VISTA & 2008!!!
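Conceptually, a deadlock is a cycle in the "who is waiting on whom" relationship between threads. The Python sketch below shows that idea on a simplified model in which each thread waits on at most one other thread; it does not call the WCT API, and the function name `find_deadlock` and the thread labels are illustrative assumptions.

```python
def find_deadlock(waits_for):
    """Find a cycle of threads that are all waiting on each other.

    waits_for maps each thread to the thread it is blocked on, or None.
    Returns the list of threads forming a cycle, or None if there is none.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the wait chain until it ends or revisits a thread.
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current is not None:
            return seen[seen.index(current):]
    return None
```

When inspecting thread states manually in a dump, the same reasoning applies: for each blocked thread, identify the owner of the resource it is waiting on and follow the chain until it either ends or loops back.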

Slow logons

• Understand the logon process and Identify the slowdown!

• Validate via network trace that the connection between server and client is good

• If the connection makes it to the server, check which processes exist

• Use Task Manager and sort by session ID

• Gather userdumps for each process for the slow session to try to identify any synchronization problems, such as LPC and ALPC wait chain conditions

• Ensure Terminal Services is running (svchost.exe) and that the thread count appears normal

• Ensure critical Citrix processes are okay, such as IMA, CpSvc and XML

The XenApp client

• PNAgent.exe starts up and communicates with PNAMain.exe to share application launch and shortcut details

• PNAMain.exe initiates communication with the Web Server for application requests and config.xml settings

• WFCRun32.exe works with WFICA32.exe to launch an application

• Best to use a live-debug approach as there is no inherent tracing readily available on the client

The XenApp client

For single sign-on problems ensure:

• PNSSON is at the top of the network provider list

• SSONSVR is running

• Nothing is causing any logon delays (such as 3rd party monitoring applications etc.) as this would cause the SSON ticket to expire, therefore causing SSONSVR to exit

• Enable a default debugger to look out for any unexpected termination of the client processes

Debugging

• User Mode versus Kernel Mode

• The Windows operating system can be conceptually divided into 2 parts:

  • User Space (User Mode)

  • Kernel Space (Kernel Mode)

• Applications run in User Mode

• System drivers run in Kernel Mode (Privileged Mode)

[Diagram: USER MODE / USER SPACE containing multiple user applications, above KERNEL SPACE containing drivers such as keyboard.sys, win32k.sys, tcpip.sys, rusb2w2k.sys […]]

Debugging

• Windows Vista and Server 2008 no longer rely on boot.ini for debug settings

• Say hello to the BCDEDIT utility!

  • (http://technet.microsoft.com/en-us/library/cc721886.aspx)

• To do a live local debug, you need to first enable debugging on the server

  • bcdedit /debug on (requires reboot)

Debugging

In the event of a system crash (BSOD), ensure that:

1. The Pagefile (pagefile.sys) is configured to run on the system drive (where Windows is installed)

2. The Pagefile is larger than the amount of physical RAM on the server

3. Startup and recovery options are set for a kernel or complete memory dump

4. Enough space exists to write the dump file
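The four checks above can be written down as a small checklist function, for example to include in a server health report. This Python sketch takes the facts as plain arguments rather than querying Windows, and it assumes that "enough space" means at least as much free space as physical RAM; that figure, the function name `crash_dump_issues`, and the parameter names are assumptions for illustration.

```python
def crash_dump_issues(pagefile_on_system_drive, pagefile_mb, ram_mb,
                      dump_type, free_space_mb):
    """Return the checklist items that are not satisfied (empty list = ready)."""
    issues = []
    if not pagefile_on_system_drive:
        issues.append("pagefile is not on the system drive")
    if pagefile_mb <= ram_mb:
        issues.append("pagefile is not larger than physical RAM")
    if dump_type not in ("kernel", "complete"):
        issues.append("recovery options are not set for a kernel or complete dump")
    # Assumption: a dump may need up to the size of physical RAM on disk.
    if free_space_mb < ram_mb:
        issues.append("not enough free space to write the dump file")
    return issues
```

An empty result means the server meets all four conditions; anything else lists what to fix before the next crash occurs.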

Debugging

• To debug application crashes, configure a default application debugger to handle fatal application errors!

• Dr. Watson is gone in Vista and Server 2008

• Manually configure a default application debugger (CTX105888)

• Use the TestDefaultDebugger tool to ensure that server is able to capture userdumps (CTX111901)

Debugger Basics

• NTSD -pn ProcessName (attaches to a running process)

• ~*kb (lists the call stack of every thread)

• x *!*Symbol* (searches for a symbol matching the one specified)

• bp (sets a breakpoint, typically used with a symbol)

• kb (dumps the call stack of the current thread)

• !analyze -v (scans for exceptions)

• !analyze -hang -v (scans for wait chains)

Debugger Basics (The Call Stack)

[Annotated debugger output. Callouts: Thread #, PID, TID, Function Parameters (First Parameter, Second Parameter), Module Name, Function Name, Offset, Switch to thread 4, First Parameter off stack]

Case Studies

Introducing the Citrix Symbol Server

• #1 feedback during SMART post incident reviews

  • Traditional data collection/upload/analysis cycle takes too long

• Live debugging while problem is occurring

  • Significant delays introduced when waiting on large uploads to complete

  • Resources are strained during CritSits – keep focus on issue resolution

• 64-bit adoption increasing

  • Full system dump files will get larger

  • Significantly longer upload times

Citrix Symbol Server – The Payoff – A Case Study

• A critical Citrix service is crashing on startup

  • Users unable to connect

• Debugger attached to process at startup

  • Crash caused by heap corruption

• Full page heap enabled

  • New stack trace points to root cause

• Case archives reveal that problem is resolved with an existing hotfix

• Time to resolve

  • With symbol server: less than 1 hour

  • Estimated time without symbol server: more than 1 business day

Using the Citrix Symbol Server

• Products supported

  • Citrix Presentation Server 3.0, 4.0 and 4.5 – all languages / hotfixes

  • XenApp 5.0 – all languages / hotfixes

• Location

  • Add http://ctxsym.citrix.com/symbols to your symbol path

• Questions / Feedback

  • Article CTX118622 on Citrix Knowledge Base (http://support.citrix.com)

  • Send additional feedback to [email protected]
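Adding a symbol server to the symbol path usually means writing an entry of the form `srv*<local cache>*<server URL>`, which the Windows debuggers accept in `.sympath` or in the `_NT_SYMBOL_PATH` environment variable. The Python sketch below just assembles that string; the function name `symbol_path` and the cache folder `c:\symbols` are illustrative assumptions.

```python
def symbol_path(local_cache, servers):
    """Build a symbol path that caches downloads from each server locally."""
    return ";".join(f"srv*{local_cache}*{url}" for url in servers)
```

For example, `symbol_path("c:\\symbols", ["http://ctxsym.citrix.com/symbols"])` yields a single entry for the Citrix server. Further servers, such as the Microsoft public symbol server, can be appended to the list so that both Citrix and Windows modules resolve in the same debugging session.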

Case Study – CDFControl Realtime Tracing Demo

Questions?
