Advanced SharePoint Troubleshooting

Donald Hessing, @dhessing, #SPCon14

Who am I?

Donald Hessing Principal Consultant | Thought Leader SharePoint @Capgemini

Netherlands

Microsoft Certified Master (MCM) - SharePoint

(Virtual) Technology Solution Professional for Microsoft

Working full time on SharePoint since 2007 | #DEV | #ITPRO | #STRATEGY

donald.hessing@capgemini.com | @dhessing | #SPCon14

Let’s investigate….

Who

What

What

When

Where

Why

…defines the scope

Lab\User3

30 seconds render time

Mon-Fri between 0900-1100 CST, but not on weekends

WFE1

Typically achieves results between 5-7 seconds

When user is searching for “Widget” and hits WFE1

It’s not all about SharePoint!

Load Balancer

Understand response time

Response Time

1414 KB    1024 KB (10Mb)    20 ms     (44/2)    300 ms    300 ms    2.6 sec.
1414 KB    100 KB (1Mb)      150 ms    (44/2)    300 ms    300 ms    15.42 sec.

IE8, FireFox 3 – 6 connections max.
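The figures on this slide resemble the inputs of a commonly cited response-time approximation: transfer time, plus round trips spread over parallel connections, plus server and client processing time. The sketch below assumes that reading (payload, bandwidth, round-trip time, 44 application turns over 2 connections, 300 ms server and 300 ms client time). The column meanings are an assumption, since the slide carries no headers, and the function is not expected to reproduce the slide's totals exactly.

```python
def estimated_response_time(payload_kb, bandwidth_kb_per_s, rtt_ms,
                            app_turns, parallel_connections,
                            server_ms, client_ms):
    """Rough page response time in seconds (an approximation, not a measurement)."""
    # Time to push the payload through the available bandwidth.
    transfer_s = payload_kb / bandwidth_kb_per_s
    # Round trips, reduced by the number of connections the browser opens in parallel.
    turns_s = (app_turns / parallel_connections) * (rtt_ms / 1000.0)
    # Processing time on the server and in the browser.
    compute_s = (server_ms + client_ms) / 1000.0
    return transfer_s + turns_s + compute_s


# Hypothetical usage with the values read off the slide.
fast_link = estimated_response_time(1414, 1024, 20, 44, 2, 300, 300)
slow_link = estimated_response_time(1414, 100, 150, 44, 2, 300, 300)
```

The point of the comparison is that the same 1414 KB page becomes several times slower once bandwidth drops and latency rises, without any change on the SharePoint side.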

TCP Slow Start

ULS Viewer

Fiddler

Fiddler – Troubleshooting WCF services

http://www.andrewconnell.com/blog/SP2013-Workflow-Advanced-Workflow-Debugging-with-Fiddler

http://127.0.0.1:8888

Fiddler – Troubleshooting Search

LOGPARSER 2.2

LOGPARSER

IIS LOGS

CS-URI Client Side URI

Additional Fields

Example – Find slow pages
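A minimal sketch of the same idea outside LogParser, assuming IIS W3C logs that include the cs-uri-stem and time-taken fields (time-taken is logged in milliseconds). The function name slowest_pages is hypothetical; it reads the #Fields: header to locate the columns and ranks pages by average time taken.

```python
from collections import defaultdict


def slowest_pages(lines, top=10):
    """Return (uri, average ms, max ms, hits) tuples, slowest average first."""
    fields = []
    stats = defaultdict(lambda: [0, 0, 0])  # hits, total ms, max ms
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the entries that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip truncated or malformed entries
        row = dict(zip(fields, values))
        try:
            taken = int(row["time-taken"])
        except (KeyError, ValueError):
            continue
        entry = stats[row.get("cs-uri-stem", "-").lower()]
        entry[0] += 1
        entry[1] += taken
        entry[2] = max(entry[2], taken)
    ranked = sorted(
        ((uri, total / hits, peak, hits) for uri, (hits, total, peak) in stats.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]
```

Pass it an open log file (or any iterable of lines); comparing average against maximum per page helps separate pages that are always slow from pages with occasional spikes.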

Join data with AD with PowerQuery

Customer Case

Questions?

LOGPARSER "SELECT QUANTIZE(TO_LOCALTIME(TO_TIMESTAMP(date, time)), 3600) AS Hour, DIV(SUM(sc-bytes),1048576) AS TotalMBSent, DIV(SUM(cs-bytes),1048576) AS TotalMBReceived FROM *.log GROUP BY Hour ORDER BY Hour" -i:w3c

Determine network capacity..

Number of users       5,000    10,000    20,000    50,000
Bandwidth (Mbit/s)    25       50        100       250
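To compare the hourly totals from the LogParser query with a bandwidth figure in Mbit/s, the megabytes per hour have to be converted to an average rate. A minimal sketch, assuming the query's division by 1048576 (so the totals are in units of 1,048,576 bytes) and a one-hour bucket; the function name is hypothetical.

```python
def average_mbit_per_second(megabytes, seconds=3600):
    """Average rate in Mbit/s for a volume given in units of 1,048,576 bytes."""
    bits = megabytes * 1048576 * 8
    return bits / 1_000_000 / seconds


# Hypothetical example: 9,000 MB sent in the busiest hour.
peak_rate = average_mbit_per_second(9000)
```

This is an hourly average, so short peaks inside the hour will be higher than the result suggests.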

Response times

SC-STATUS    AVGTIME    MAXTIME    HITS
200          215        33882      30581
302          56         3104       2949
400          58         78         13
401          16         3166       14811
403          31         31         1
404          81         2745       1472
500          63         109        12

Response Times Web Frontend Servers

Response times per DOMAIN

Response times for DOMAINB per WFE

Why are my responses for DOMAINB users SLOW on WFE2?

Win32Status

System outage analysed - Win32Status

Win32-Status    Description                                                                                      Occurred
2               The system cannot find the file specified.                                                       4788
64              The specified network name is no longer available.                                               1663
121             The semaphore timeout period has expired.                                                        964
995             The I/O operation has been aborted because of either a thread exit or an application request.    85
1236            The network connection was aborted by the local system.                                          3934
1330            The password for this account has expired.                                                       6
2148074252      The logon attempt failed                                                                         271
2148074254      No credentials are available in the security package                                             117831
2148074257      No authority could be contacted for authentication.                                              9379

LogParser.exe -i:w3c "SELECT sc-win32-status, WIN32_ERROR_DESCRIPTION(sc-win32-status) AS Description, COUNT(*) AS Occurred FROM *.log WHERE sc-win32-status > 0 GROUP BY sc-win32-status ORDER BY Occurred DESC"

Performance Counters

SharePoint Servers                                              Threshold
% CPU SharePoint Servers                                        < 75%
# Available Mbytes                                              < 20%
% Network bandwidth – bytes total                               < 41 - 65%
Disk (Instance) % Idle Time                                     < 20%

SQL Server                                                      Threshold
% CPU SQL Server                                                < 75%
Memory: SQLServer:Buffer Manager--Buffer Cache hit ratio        < 90%
Memory: SQLServer:Buffer Manager--Page Life Expectancy (PLE)    > (Total Mem in GB / 4) * 300; for 16 GB: > (16/4) * 300 = 1200
Network bandwidth – bytes total                                 < 40%
Avg. Disk Write Queue Length                                    < 2 per spindle
Avg. Disk Read Queue Length                                     < 2 per spindle
Avg. Disk Sec/Read SQL Server Disks                             < 12 ms – 0.012
Avg. Disk Sec/Write SQL Server Disks                            < 12 ms – 0.012
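The Page Life Expectancy rule of thumb above, (Total Mem/4) * 300, can be worked through in a few lines. This sketch assumes memory is given in GB, as in the 16 GB example; the function names are hypothetical.

```python
def ple_threshold_seconds(total_memory_gb):
    """Minimum healthy Page Life Expectancy using (Total Mem / 4) * 300."""
    return (total_memory_gb / 4) * 300


def ple_is_healthy(observed_ple_seconds, total_memory_gb):
    """True when the observed counter value is above the rule-of-thumb threshold."""
    return observed_ple_seconds > ple_threshold_seconds(total_memory_gb)


# The 16 GB example: the counter should stay above 1200 seconds.
threshold_16gb = ple_threshold_seconds(16)
```

Scaling the threshold with memory matters because a fixed value of 300 seconds is far too lenient on servers with large buffer pools.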

Virtualization - Hypervisor

Host - Counter                                                   Healthy    Caution     Critical
Hyper-V Hypervisor Logical Processor(_Total)\% Total Runtime     < 60%      60-89%      90-100%
Memory\Available Mbytes                                          50%        25%         10%
Memory\Pages/sec (swap rate)                                     < 500      500-1000    > 1000
% Network bandwidth – bytes total                                < 40%      41 – 64%    65% - 100%
Network Interface(*)\Output Queue Length                         0          1-2         > 2
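The Healthy / Caution / Critical bands lend themselves to a simple lookup when reviewing collected counter values. A minimal sketch covering three of the counters from the table; the short counter keys and the classify function are hypothetical names, and the band edges are copied from the table.

```python
# Each rule is a pair: (is the value healthy?, is the value critical?).
RULES = {
    "total_run_time_pct": (lambda v: v < 60, lambda v: v >= 90),
    "pages_per_sec": (lambda v: v < 500, lambda v: v > 1000),
    "output_queue_length": (lambda v: v == 0, lambda v: v > 2),
}


def classify(counter, value):
    """Map a counter sample to Healthy, Caution or Critical."""
    is_healthy, is_critical = RULES[counter]
    if is_healthy(value):
        return "Healthy"
    if is_critical(value):
        return "Critical"
    # Anything between the two bands is a warning.
    return "Caution"
```

For example, classify("total_run_time_pct", 75) falls between the two bands and is reported as "Caution".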

Logical versus Virtual Processor

Inaccurate perfmon results - VMWare

Validate disk configuration prior to deployment

Numbers to Remember - Spindles

Determine maximum throughput

Kind    Threads    Seconds    Drive    Stripe    Outstanding    Size    IOs       MBs     Latency_min    Latency_avg    Latency_max
R       2          60         G        random    1              8       105.05    0.82    4              18             666
R       2          60         G        random    2              8       106.96    0.83    6              36             684
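The IOs and MBs columns are related: throughput is roughly IOs per second multiplied by the IO size. A minimal sketch, assuming Size is in KB and IOs is a per-second rate (the table itself does not state the units).

```python
def throughput_mb_per_sec(ios_per_sec, io_size_kb):
    """Approximate throughput in MB/s from IO rate and IO size."""
    return ios_per_sec * io_size_kb / 1024


# First row of the table: 105.05 IOs at 8 KB each.
first_row = throughput_mb_per_sec(105.05, 8)
```

With 8 KB random reads, about 105 IOs per second works out to under 1 MB/s, which is why random IO capacity is expressed in IOs rather than in MB/s.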

SQL IO Characteristics

Operation      Random / Sequential    Read/Write      Size Range
OLTP – LOG     Sequential             Write           512 bytes – 64KB
OLTP – Data    Random                 Read / Write    8KB
Bulk Insert    Sequential             Write           8KB – 128KB (in multiples of 8KB)
Read Ahead     Sequential             Read            8KB – 128KB (in multiples of 8KB)
Backup         Sequential             Read/Write      1MB
Restore        Sequential             Read/Write      64KB

Validate configurations prior to deployment

Customer Example – IOMonitor.exe

Design TEST - 2 KB Random - Read/Write (67%/33%)

Drive    Expected IO    Total IO    Average Read Time (ms)    Average Write Response Time (ms)    Maximum Read Time (ms)    Maximum Write Time (ms)
C:\      60             314         3                         1                                   197                       6
L:\      300            419         2                         3                                   31                        11
S:\      600            386         3                         3                                   115                       28
T:\      1200           51          35                        52                                  89                        2501
U:\      1000           43          5                         59                                  41                        1817
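Reading the table is easier with the outliers flagged. The sketch below copies the rows from the table and marks drives whose measured IO falls short of the expected IO, or whose average latency exceeds 12 ms. Reusing the 12 ms SQL Server disk threshold here is an assumption, as is treating Total IO as directly comparable to Expected IO; the names are hypothetical.

```python
# drive, expected IO, total IO, average read ms, average write ms
RESULTS = [
    ("C:", 60, 314, 3, 1),
    ("L:", 300, 419, 2, 3),
    ("S:", 600, 386, 3, 3),
    ("T:", 1200, 51, 35, 52),
    ("U:", 1000, 43, 5, 59),
]

LATENCY_LIMIT_MS = 12  # assumption: same limit as the SQL Server disk counters


def review(results, latency_limit_ms=LATENCY_LIMIT_MS):
    """Return {drive: [reasons]} for drives that miss the design targets."""
    findings = {}
    for drive, expected, total, avg_read, avg_write in results:
        reasons = []
        if total < expected:
            reasons.append(f"IO {total} below expected {expected}")
        worst = max(avg_read, avg_write)
        if worst > latency_limit_ms:
            reasons.append(f"average latency {worst} ms above {latency_limit_ms} ms")
        if reasons:
            findings[drive] = reasons
    return findings


flagged = review(RESULTS)
```

Under these assumptions S:, T: and U: are flagged, with T: and U: missing both the IO target and the latency limit.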

Monitoring Pending Disk IO

http://henkvandervalk.com/sql-under-the-hood-part-1-monitoring-current-pending-disk-ios

The common…

…Less well known

Thank you!