
Office 365 Network Performance Troubleshooting


Microsoft Office 365 Network Performance Troubleshooting

Paul Collinge, Senior Premier Field Engineer

OFC-B411

Agenda

• Data gathering
• Troubleshooting


Troubleshooting performance process, Stage 1: performance troubleshooting engineer not required

1. Get details about the specific end-user problem from the customer.
• Which Office 365 user operations does it affect specifically? Which operations are not affected? Are non-Office 365 services performing poorly?
• Are all users affected, or just some?
• What time of day does it happen? Identify the locations where users are affected. Do the users have the same issue at home?
• Did performance recently get much worse? When?
• Has any item of the customer's network configuration changed recently, or have more users been added?

2. Get details about the customer's network topology.
• Where are their offices located? Which ISPs do they use at each office, and what bandwidth is subscribed to? Which offices have Internet egress points?
• Get the internal network map and Internet proxy details.

3. Get a repro.
• Reproduce the issue so the customer can show it to you in person, over Lync screen sharing, or in a video recording.
• Get an environment where the performance troubleshooting engineer can reproduce the same issue. A repro can also be captured using PSR.EXE (Problem Steps Recorder), which is included with Windows 7 and later.

4. Form and meet the v-team.
• Meet with the performance troubleshooting engineer, the customer's network engineer, a customer representative who is familiar with the problem and, where the customer has one, their Microsoft TAM.


Troubleshooting performance process, Stage 2: performance troubleshooting engineer required

5. Run basic tests on the repro environment.
6. Form a hypothesis from the basic tests.
7. Run more in-depth tests on the repro to confirm the hypothesis.
8. Recommend action to improve performance.


Egress Point Detail

• Where is it?
• What is there?
  • Firewalls
  • NAT devices
  • Proxies
  • Web traffic scanners
• Is a direct connection available?
• Which ISP?
• How much bandwidth is available?
• Is it shared with other services?


Troubleshooting performance issues

Key issues to check for:
• Latency/round trip time (RTT)
• TCP window scaling
• Geo-DNS issues
• NAT, proxy and firewall port exhaustion
• Packet loss
• Routing and peering
• TCP idle time settings
• Proxy authentication
• DNS performance
• SACK and TCP MSS


Key Tools

• Network capture tools: Netmon, Wireshark, TCPDump, Microsoft Message Analyzer
• PSPing
• TraceRt
• Application-level tools: HTTPWatch, Fiddler, Outlook Connection Status


Measuring Latency

Measuring Latency (RTT)

• Use PSPing.
  • It creates a TCP session to a chosen IP address and port, so it still works where ICMP ping or other ports are blocked.
• If a proxy or similar device is in use, we need to measure to its address.
  • This gives us the RTT to the managed egress point.
  • It indicates whether there is a problem inside the managed network.
• Next, take a network trace or run PSPing on the proxy/egress device.
  • This indicates the remaining RTT from the perimeter to Office 365.
• Also measure the RTT all the way to the Office 365 termination point, if possible.
  • A direct connection may not be available.
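PSPing's TCP mode times how long a TCP connection takes to open. The Python sketch below illustrates the same idea as a rough stand-in, not a replacement for PSPing: it resolves the name once (so DNS time is excluded), then times several connects and summarizes them. The function names are hypothetical, and each sample includes a little local overhead, so treat the results as approximate RTT.

```python
import socket
import statistics
import time


def tcp_connect_rtt(host: str, port: int, count: int = 5,
                    timeout: float = 3.0) -> list[float]:
    """Time `count` TCP connects to host:port and return each in milliseconds.

    The name is resolved once up front so DNS lookup time is not included;
    each sample then covers the SYN / SYN-ACK exchange (plus local overhead),
    which approximates one network round trip.
    """
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    samples = []
    for _ in range(count):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.connect(sockaddr)
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Minimum and median are usually more telling than the mean for RTT."""
    return {"min_ms": min(samples),
            "median_ms": statistics.median(samples),
            "max_ms": max(samples)}
```

For example, summarize(tcp_connect_rtt("proxy.contoso.local", 8080)) would measure to a hypothetical proxy on port 8080, giving the internal RTT; the same call against outlook.office365.com on port 443 would measure all the way to Office 365 where a direct connection is allowed.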

Measuring RTT from the Proxy to Office 365

• Running PSPing from the perimeter isn't always possible.
• Proxies and firewalls at the egress point can take a network trace.
• From this trace we can accurately measure the RTT from the perimeter device to Office 365.
• Once this is added to the internal RTT, we get a clear view of the internal and external RTT.
• Measure the time delta between application-level packets and their responses.
• CONNECT requests and TLS (SSL) Client Hello/Server Hello exchanges are useful samples:

22679  12:15:43 17/02/2014  12:15:43.6577060  0.0000000  10.32.1.224  10.42.59.34  HTTP  HTTP:Request, CONNECT Contoso.sharepoint.com:443 {HTTP:1786, TCP:1785, IPv4:503}
22988  12:15:44 17/02/2014  12:15:44.6042040  0.3464980  10.42.59.34  10.32.1.224  HTTP  HTTP:Response, HTTP/1.1, Status: Ok, URL: Contoso.sharepoint.com:443
23003  12:15:44 17/02/2014  12:15:44.6101790  0.0059750  10.32.1.224  10.42.59.34  TLS   TLS:TLS Rec Layer-1 HandShake: Client Hello.
23090  12:15:44 17/02/2014  12:15:44.9934880  0.3833090  10.42.59.34  10.32.1.224  TLS   TLS:TLS Rec Layer-1 HandShake: Server Hello.; TLS Rec Layer-2 HandShake: Certificate.

Here the proxy answers the CONNECT with a time delta of about 346 ms, and the Server Hello follows the Client Hello by about 383 ms. When the trace is taken on the proxy, these deltas approximate the round trip from the proxy to Office 365, because the proxy has to reach Office 365 before it can answer either request.


PSPing Demo

Putting it all together

Internal RTT (ms)   External RTT (ms)   Total RTT to Office 365 (ms)
54.88               346                 400.88

[Diagram: Client → Proxy: 54.88 ms; Proxy → Office 365 datacentre: 346 ms]

Here we can clearly see that the poor RTT is outside the customer's environment, on the ISP link to Office 365. If this RTT is unexpected, the customer can engage their ISP to investigate.

So what's a good RTT?

• From the client to the egress point of the corporate network, look for <100 ms, ideally half that.
• From an EMEA site to an EMEA datacenter, aim for <100 ms in total, ideally much less.
• As a reference, Australia <> EMEA can be done in around 300 ms.
• 300–400 ms is generally seen as the tipping point between good and noticeably delayed SharePoint performance.
• Outlook in cached mode can cope well with worse RTT.
• http://www.verizonenterprise.com/about/network/latency/ gives a good overview of Internet backbone RTTs, as a guide to what a normal RTT is.
• Having both an internal and an external RTT allows us to identify accurately whether a network latency issue is inside the managed environment or outside it.
• It's useful for customers to record these as a baseline during normal operation, so we can refer to it if issues occur.
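To make the internal/external split concrete, here is a small Python sketch that adds the two RTT segments and checks them against the guidance above: roughly 100 ms from client to egress point, and roughly 300 ms as the point where SharePoint starts to feel delayed. The thresholds are the slides' rules of thumb, not hard limits, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RttAssessment:
    total_ms: float
    larger_segment: str       # "internal" or "external"
    internal_ok: bool         # client -> egress within the ~100 ms guidance
    sharepoint_at_risk: bool  # total has reached the ~300 ms tipping zone


def assess_rtt(internal_ms: float, external_ms: float,
               internal_limit_ms: float = 100.0,
               sharepoint_tipping_ms: float = 300.0) -> RttAssessment:
    """Combine internal (client -> egress) and external (egress -> Office 365) RTT."""
    total = internal_ms + external_ms
    return RttAssessment(
        total_ms=total,
        larger_segment="internal" if internal_ms > external_ms else "external",
        internal_ok=internal_ms < internal_limit_ms,
        sharepoint_at_risk=total >= sharepoint_tipping_ms,
    )
```

With the worked example's figures, assess_rtt(54.88, 346) gives a total of about 400.88 ms, identifies the external segment as the larger one, and marks SharePoint as at risk, which matches the conclusion that the ISP link is where to look.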


TCP Window Scaling

• If TCP window scaling is disabled, it limits throughput and reduces performance, especially when RTT is high.
• Windows has it enabled by default, as does Office 365. Proxies, firewalls and NAT devices may not.
• Take a network trace on the client and also on the perimeter device (the firewall or proxy) talking to Office 365.
• A trace from the client may be sufficient: if scaling is disabled there, there is a good chance it is also disabled on the other side of the proxy/firewall.
• Without window scaling we are limited to a 64 KB (65,535-byte) TCP receive window.
• The scale factor is a shift count applied to the advertised window: effective window = window size × 2^(scale factor).

[Diagram: TCP data packets and TCP ACKs exchanged between sender and receiver]

TCP window scaling enabled?   Maximum TCP receive window (bytes)
No                            65,535 (64 KB)
Yes                           1,073,725,440 (1 GB)


Impact of TCP Window Scaling

Before & After TCP Window change - Open 14MB PDF

Round trip time (ms)   Max throughput without scaling (Mbit/s)   Max throughput with scaling (Mbit/s)
300                    1.71                                       447.36
200                    2.56                                       655.32
100                    5.12                                       1310.64
50                     10.24                                      2684.16
25                     20.48                                      5368.32
10                     51.20                                      13420.80
5                      102.40                                     26841.60
1                      512.00                                     134208.00

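Presuming a 1000 Mbps link, the table shows the maximum throughput the TCP receive window allows at each RTT, first with window scaling disabled and then with it enabled. The "with scaling" figures correspond to a window of roughly 16 MB (65,535 × 2^8 bytes, a scale factor of 8); values above 1000 Mbit/s are window limits only, since the link itself caps real throughput at 1000 Mbps.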
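The figures in the table follow from one relationship: TCP can have at most one receive window of unacknowledged data in flight per round trip, so throughput cannot exceed window ÷ RTT. The Python sketch below computes that limit, applies the 2^(scale factor) multiplier, and optionally caps the result at the link speed. Its results land close to, but not exactly on, the table's values (for example 1.75 rather than 1.71 Mbit/s at 300 ms without scaling), since the table appears to round the window size. The function names are hypothetical.

```python
from typing import Optional

MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14               # largest window scale shift allowed (RFC 7323)


def effective_window(advertised: int, shift: int) -> int:
    """Receive window in bytes after scaling: advertised * 2**shift."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError(f"shift must be between 0 and {MAX_SHIFT}")
    return advertised << shift


def window_limited_mbps(window_bytes: int, rtt_ms: float,
                        link_mbps: Optional[float] = None) -> float:
    """Maximum throughput in Mbit/s if one full window is sent per round trip.

    window_bytes * 8 / rtt_ms gives bits per millisecond (kbit/s);
    dividing by a further 1000 gives Mbit/s. Optionally capped at link speed.
    """
    mbps = window_bytes * 8 / (rtt_ms * 1000)
    return mbps if link_mbps is None else min(mbps, link_mbps)
```

For example, window_limited_mbps(effective_window(65535, 8), 100, link_mbps=1000) returns 1000: at 100 ms a scaled window of about 16 MB would allow roughly 1342 Mbit/s, more than the 1000 Mbps link can carry.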

Impact of enabling this setting

Before & after correctly enabling TCP window scaling: downloading a 14 MB PDF

An Australian PC downloaded a large PDF from Office 365 through the Australia proxy, before and after TCP window scaling was correctly set on the proxy:
• Australia proxy with incorrect window scaling settings: 507.0 seconds
• Australia proxy with TCP window scaling enabled: 21.0 seconds


TCP Window Scaling Demo


Geo DNS

Test for users outside the home region: client data flow (outside NOAM)

[Diagram: customer tenant in the US (North America datacenter, EXO MBX); a user outside North America connects to the EU datacenter (Portal, EXO CAS)]

1. The client asks the local DNS servers.
2. The client's DNS asks the Microsoft DNS servers.
3. Microsoft's DNS servers return the IP addresses of the regional datacenter.
4. The user accesses the regional datacenter.
5. Exchange Online accesses the datacenter where the tenant resides and proxies the requests.

GEO DNS issues

• Content delivery networks work on geo-DNS.
• Exchange Online uses geo-DNS.
• You get a different IP address from DNS depending on where in the world you request it.
• This impacts a multi-country corporate network with multiple Internet connection points.
• Commonly, DNS is only requested at one point and cached.
• As a result, you can get a DNS answer for a different part of the globe from where you have Internet connectivity.

[Diagram: customer network → Internet egress point → Microsoft network]
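One way to spot this is to compare what each egress location actually resolves. The Python sketch below asks the local resolver for a name and returns the canonical name and IPv4 addresses; running it (or nslookup) at each office, or through each DNS path, and comparing the results can show when a site is being handed a datacenter in another region. It is a minimal illustration under that assumption: the function name is hypothetical, and it only sees what the machine's own resolver returns.

```python
import socket


def resolve(host: str) -> tuple[str, list[str]]:
    """Return (canonical name, sorted IPv4 addresses) as the local resolver sees them."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM,
                               0, socket.AI_CANONNAME)
    # Only the first result carries the canonical name (the end of any CNAME chain).
    canonical = infos[0][3] or host
    addresses = sorted({info[4][0] for info in infos})
    return canonical, addresses
```

For example, if resolve("outlook.office365.com") run from a UK site returned a canonical name such as OUTLOOK-APACNORTH.OFFICE365.COM, that would suggest the site's DNS answer is coming from an APAC-facing resolver rather than one local to its egress point.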


Geo-DNS Demo


Routing

Test for routing

TraceRt to an Office 365 service network endpoint:

tracert -4 outlook.office365.com

• This trace is from the UK.
• Look for hosts under NTWK.MSN.NET, which is the Microsoft global network.
• In the example we hit the Microsoft network in 10 ms (igbtmdistc7503.msft.net, hop 4).
• We then traverse the USA via:
  • ASH: Virginia
  • ATB: Georgia
  • LAX: Los Angeles
• Then on to APAC via:
  • TYA: Tokyo
  • SIN: Singapore

Tracing route to OUTLOOK-APACNORTH.OFFICE365.COM [132.245.65.146]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  SkyRouter.Home [192.168.0.1]
  3    11 ms    11 ms    11 ms  ip-89-200-132-100.ov.easynet.net [89.200.132.100]
  4    10 ms    10 ms    10 ms  igbtmdistc7503.msft.net [195.66.236.140]
  5    84 ms    84 ms    84 ms  xe-0-3-2-0.ash-96cbe-1a.ntwk.msn.net [207.46.45.227]
  6    96 ms    95 ms    95 ms  ae2-0.atb-96cbe-1a.ntwk.msn.net [207.46.33.228]
  9   140 ms   142 ms   140 ms  191.234.83.150
 10   142 ms   138 ms   139 ms  ae11-0.lax-96cbe-1b.ntwk.msn.net [207.46.47.11]
 11   256 ms   256 ms   256 ms  ae2-0.tya-96cbe-1a.ntwk.msn.net [207.46.46.149]
 12   265 ms   265 ms   265 ms  ae0-0.tya-96cbe-1b.ntwk.msn.net [204.152.140.181]
 13   288 ms   290 ms   292 ms  xe-7-0-1-0.sin-96cbe-1a.ntwk.msn.net [207.46.38.252]
 14   290 ms   288 ms   287 ms  xe-5-3-1-0.sin-96cbe-1b.ntwk.msn.net [207.46.41.39]
 15   279 ms   279 ms   279 ms  ae1-0.sg2-96cbe-1a.ntwk.msn.net [191.234.80.90]
 18   280 ms   280 ms   279 ms  132.245.65.146
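When comparing many traces, it can help to find the first hop inside Microsoft's network automatically. The Python sketch below parses Windows tracert output (hop number, three probe times, host, optional [IP]) and returns the first hop whose name ends in .msft.net or .ntwk.msn.net. The regular expression is an assumption based on the format shown above, timed-out hops are skipped, and the function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# One hop line: hop number, three probes ("10 ms", "<1 ms" or "*"), host, optional [IP].
HOP_RE = re.compile(
    r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(\S+)(?:\s+\[([^\]]+)\])?\s*$")


class Hop(NamedTuple):
    number: int
    rtts_ms: list[int]
    host: str
    ip: Optional[str]


def parse_tracert(text: str) -> list[Hop]:
    """Parse tracert output; header lines and 'Request timed out.' lines are skipped."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            rtts = [int(v) for v in re.findall(r"(\d+) ms", m.group(2))]
            hops.append(Hop(int(m.group(1)), rtts, m.group(3), m.group(4)))
    return hops


def first_hop_in(hops: list[Hop],
                 suffixes: tuple[str, ...] = (".msft.net", ".ntwk.msn.net")) -> Optional[Hop]:
    """First hop whose host name ends with one of the given network suffixes."""
    for hop in hops:
        if hop.host.lower().endswith(suffixes):
            return hop
    return None
```

Run against the trace above, first_hop_in(parse_tracert(text)) returns hop 4, igbtmdistc7503.msft.net, at 10 ms.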


Egress Scalability

NAT and Firewall Port exhaustion

• This affects companies that have existing Internet connectivity but do not make extensive use of SaaS applications.
• Firewalls have limited port mapping, and SaaS use will require more than Internet browsing does.
• The primary symptom is that some users will see Outlook in a disconnected state.

Proxy Scalability

• Around 6,000 clients can be supported safely by a single public IP address.
• Err on the side of caution and estimate this to be nearer 4,000.
• This issue stems from the number of ephemeral ports available to connect to Office 365. Outlook can open many connections per user, as many as 10+ for power users.
• The issue would manifest itself as random hangs and connection issues.
• Check how many IP addresses you are accessing Office 365 from and how many sockets the proxy is using.
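The 6,000 figure lines up with simple port arithmetic: at about 10 connections per client, 6,000 clients need roughly 60,000 source ports, close to the 64,512 ports in the 1024–65535 range. The Python sketch below does that arithmetic and sizes the number of public IPs using the conservative 4,000-clients-per-IP estimate. The usable port range is an assumption (many devices use a smaller range), and the names are hypothetical.

```python
import math

# Assumption: ports 1024-65535 are usable for outbound connections from one
# public IP. Real devices often use a smaller range (the Windows default
# dynamic range is 49152-65535), so check the device's own configuration.
USABLE_PORTS_PER_IP = 65535 - 1024 + 1  # 64,512


def public_ips_needed(clients: int, clients_per_ip: int = 4000) -> int:
    """Public egress IPs needed, using the conservative ~4,000 clients per IP estimate."""
    return math.ceil(clients / clients_per_ip)


def port_headroom(clients_per_ip: int, connections_per_client: int = 10,
                  usable_ports: int = USABLE_PORTS_PER_IP) -> float:
    """Fraction of the port range still free; zero or negative means exhaustion."""
    used = clients_per_ip * connections_per_client
    return (usable_ports - used) / usable_ports
```

For example, public_ips_needed(10000) returns 3, and port_headroom(7000) is negative, indicating that 7,000 clients at 10 connections each would exhaust that assumed range on a single IP.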


Troubleshooting Packet Loss

Packet Loss

• Packet loss could cause poor performance.
• It is easy to spot with a network trace: in Netmon, the filter Property.TCPRetransmit == 1 will highlight retransmits.
• A retransmit rate consistently above 1% should be investigated.
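To apply the 1% guideline consistently, count retransmits and total TCP segments for several capture intervals (for example, using the Netmon filter above alongside a plain TCP filter) and compare the ratio in each. The Python sketch below is a minimal illustration; treating "consistently" as "every interval above 1%" is an assumption, and the function names are hypothetical.

```python
def retransmit_rate(retransmits: int, tcp_segments: int) -> float:
    """Share of TCP segments that were retransmissions (0.01 == 1%)."""
    if tcp_segments <= 0:
        raise ValueError("tcp_segments must be positive")
    return retransmits / tcp_segments


def consistently_lossy(samples: list[tuple[int, int]], threshold: float = 0.01) -> bool:
    """True if every (retransmits, tcp_segments) interval exceeds the threshold."""
    return bool(samples) and all(
        retransmit_rate(r, t) > threshold for r, t in samples)
```

A single interval above 1% can be a transient burst; requiring it across several intervals is what separates a persistent problem worth escalating.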


Packet Loss Demo


TCP Idle Time

TCP Idle time settings

• Especially important for Outlook.
• Outlook is a new usage for customer perimeter devices, which were designed for transient web access.
• Outlook leaves connections open for extended periods, and they can be quiet for long periods.
• Many firewalls/proxies kill idle TCP connections after a brief period.
• This can cause Outlook disconnects, hangs and authentication prompts.
• It is easily resolved with keep-alives and by adjusting the idle timeout on the perimeter devices.
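Keep-alives work by having an endpoint send small probes on an idle connection, so the perimeter device sees traffic before its idle timer expires. The Python sketch below shows the mechanism on a generic socket; it is not how Outlook or Windows configure their own keep-alives, and the defaults (60 s idle, 10 s interval, 5 probes) are hypothetical values that should sit below the device's actual idle timeout. It sets the per-connection options Python exposes where available and falls back to SIO_KEEPALIVE_VALS on Windows builds without them.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keep-alive probes for an otherwise idle connection.

    idle_s: seconds of silence before the first probe (keep it below the
    perimeter device's idle timeout); interval_s: gap between probes;
    probes: unanswered probes before the OS gives up on the connection.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-connection tuning, where this platform/Python build exposes the options.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
    # Older Windows builds: (enabled, idle ms, interval ms) via ioctl instead.
    if not hasattr(socket, "TCP_KEEPIDLE") and hasattr(socket, "SIO_KEEPALIVE_VALS"):
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
```

If the firewall drops idle connections after, say, 5 minutes, any idle_s comfortably below that keeps its connection-table entry alive; raising the device's idle timeout addresses the same problem from the perimeter side.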


Proxy Authentication

Proxy Authentication

• It is advisable to disable proxy authentication for Office 365 traffic.
• It can cause delays in connection setup.
• Look for "Proxy authentication required" in response to the first CONNECT request.
• The following request will carry the authentication. Look for delays like the one shown below.

Initial connect:
14:12:24.6483418 19.0046514 0.0003578 iexplore.exe 10.200.30.40 btssig-msl.bcp-01.Contoso.sig HTTP:Request, CONNECT Contosoemeamicrosoftonlinecom-3.sharepoint.emea.microsoftonline.com:443 , Using NTLM Authorization NTLM NEGOTIATE MESSAGE

Proxy response:
14:12:24.6876389 19.0439485 0.0283000 iexplore.exe btssig-msl.bcp-01.Contoso.sig 10.200.30.40 HTTP:Response, HTTP/1.1, Status: Proxy authentication required, URL: Contosoemeamicrosoftonlinecom-3.sharepoint.emea.microsoftonline.com:443 NTLM CHALLENGE MESSAGE

We then send the request again, this time with NTLM authentication for the proxy.

Second request with NTLM auth:
14:12:24.6883198 19.0446294 0.0004838 iexplore.exe 10.200.30.40 btssig-msl.bcp-01.Contoso.sig HTTP HTTP:Request, CONNECT Contosoemeamicrosoftonlinecom-3.sharepoint.emea.microsoftonline.com:443 , Using NTLM Authorization NTLM AUTHENTICATE MESSAGE Version:NTLM v2, Domain: headoffdom, User: paul.collinge, Workstation: W7TEST20

The proxy returns a 200 OK response, but this takes 3 seconds:
14:12:27.7859643 22.1422739 3.0878394 iexplore.exe btssig-msl.bcp-01.Contoso.sig 10.200.30.40 HTTP HTTP:Response, HTTP/1.1,
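To quantify the delay, subtract each request's timestamp from its response's timestamp. The Python sketch below does this with the timestamps copied from the trace above. It pairs frames explicitly rather than reading the time-delta column, which is measured from the previous frame and so can differ slightly (3.088 s in the trace versus about 3.098 s here). The helper names are hypothetical, and midnight rollover is not handled.

```python
def to_seconds(clock: str) -> float:
    """Convert a Netmon time of day such as '14:12:24.6883198' to seconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def gap(request_time: str, response_time: str) -> float:
    """Seconds from a request frame to its response (no midnight rollover handling)."""
    return to_seconds(response_time) - to_seconds(request_time)


# (request, response) timestamps copied from the proxy authentication trace.
EXCHANGES = {
    "CONNECT, NTLM NEGOTIATE -> 407": ("14:12:24.6483418", "14:12:24.6876389"),
    "CONNECT, NTLM AUTHENTICATE -> 200": ("14:12:24.6883198", "14:12:27.7859643"),
}

for label, (request, response) in EXCHANGES.items():
    print(f"{label}: {gap(request, response):.3f} s")
```

The unauthenticated CONNECT gets its 407 challenge back in about 39 ms, while the authenticated CONNECT waits about 3.1 seconds for its 200 OK. That suggests the time is being spent on the proxy after authentication (validating credentials and/or setting up the upstream connection) rather than on the client-to-proxy link.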

Proxy Authentication

• From an Office 365 perspective, we can easily remove this problem by following the recommended setup and making an exception in the proxy for the Office 365 URLs, as per http://support.microsoft.com/kb/2637629
• Where firewall or proxy servers require additional authentication, configure an exception for the Microsoft Office 365 URLs and applications from the authentication proxy. For example, if you are running Microsoft Internet Security and Acceleration Server (ISA) 2006, create an "allow" rule that meets the following criteria:
  • Allow outbound connections to the following destinations:
    • *.microsoftonline.com
    • *.microsoftonline-p.com
    • *.sharepoint.com
    • *.outlook.com
    • *.lync.com
    • osub.microsoft.com
  • Ports 80/443
  • Protocols TCP and HTTPS
  • The rule must apply to all users.
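When reviewing proxy logs or rules, it can help to check which hosts the exception list above would cover. The Python sketch below matches host names against those wildcard destinations with fnmatchcase. Shell-style matching, where * also spans dots, is an assumption about how the proxy interprets wildcards, and the names are hypothetical.

```python
from fnmatch import fnmatchcase

# Destinations from the ISA 2006 allow-rule example above.
OFFICE365_EXCEPTIONS = (
    "*.microsoftonline.com",
    "*.microsoftonline-p.com",
    "*.sharepoint.com",
    "*.outlook.com",
    "*.lync.com",
    "osub.microsoft.com",
)


def needs_auth_exception(host: str, patterns: tuple[str, ...] = OFFICE365_EXCEPTIONS) -> bool:
    """True if host matches one of the wildcard destinations (case-insensitive)."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

For example, the SharePoint host from the proxy authentication trace, Contosoemeamicrosoftonlinecom-3.sharepoint.emea.microsoftonline.com, matches *.microsoftonline.com, so requests to it would bypass authentication under this rule.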


DNS Performance

DNS Performance

• DNS performance can affect Office 365 traffic.
• It is easy to check:
  • Start Netmon.
  • Run IPCONFIG /FLUSHDNS.
  • Filter by DNS conversation and look at the time delta column.
• Normal response times should be in milliseconds. Below is an example of a slow response, taking about 3.8 seconds:

13:52:52 16/04/2013 31.2765664 0.0000000 10.200.30.40 10.214.2.129 DNS:QueryId = 0xE41, QUERY (Standard query), Query for Contosoemeamicrosoftonlinecom-3.sharepoint.emea.microsoftonline.com of type A on class Internet
13:52:56 16/04/2013 35.0579179 3.7813515 10.214.2.129 10.200.30.40 DNS:QueryId = 0xE41, QUERY (Standard query), Response - Success, 157.55.232.50, 2.22.230.131 ...
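Alongside the Netmon check, a quick client-side timing of name resolution can confirm whether lookups are slow. The Python sketch below times a single lookup through the operating system's resolver. Run it after IPCONFIG /FLUSHDNS; otherwise the local cache answers and the time looks artificially fast. The function name is hypothetical.

```python
import socket
import time


def timed_lookup(host: str, port: int = 443) -> tuple[float, list[str]]:
    """Resolve host with the OS resolver; return (elapsed ms, sorted addresses)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, sorted({info[4][0] for info in infos})
```

For example, timed_lookup("contoso.sharepoint.com") (a hypothetical tenant name) taking several thousand milliseconds would correspond to the kind of slow response shown in the trace above.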


Page Load Performance Monitoring

Page Load troubleshooting

• Multiple tools are available: Fiddler, HTTPWatch, IE tools, Microsoft Message Analyzer.
• As SharePoint in Office 365 is delivered over SSL, it's difficult to look at page load performance with a packet analyser.
• HTTPWatch or Fiddler allows us to measure the elements of a page and how long each takes to load.
• This helps identify which elements are taking time, and gives us information such as the TCP port of a slow-loading element, which we can then investigate with Netmon.
• Demo
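Many of these tools, as well as browser developer tools, can export a session as a HAR (HTTP Archive) file, which is JSON. The Python sketch below assumes such an export and lists the slowest requests together with the HAR "connection" field, which, where the exporting tool fills it in, usually holds a TCP port number that can then be located in a Netmon trace. The function names are hypothetical.

```python
import json
from typing import NamedTuple


class SlowEntry(NamedTuple):
    url: str
    time_ms: float
    connection: str  # HAR "connection": often the client or server TCP port


def slowest_entries(har: dict, top: int = 5) -> list[SlowEntry]:
    """Return the `top` slowest requests in a parsed HAR log, slowest first."""
    entries = har["log"]["entries"]
    ranked = sorted(entries, key=lambda e: e.get("time", 0), reverse=True)
    return [SlowEntry(e["request"]["url"], e.get("time", 0),
                      str(e.get("connection", "")))
            for e in ranked[:top]]


def load_har(path: str) -> dict:
    """Load a HAR export; utf-8-sig tolerates a byte-order mark if present."""
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)
```

A typical flow would be slowest_entries(load_har("sharepoint-page.har")) on a hypothetical export, then filtering the Netmon trace on the reported port to see whether the slow element suffers from RTT, loss or window problems.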

Other TCP settings to check

• Several things in the three-way handshake to the proxy/Office 365 should be checked to ensure they are optimal.
• Check the TCP options in both the SYN and the SYN-ACK:

TCPOptions:
  - MaxSegmentSize: 1
      type: Maximum Segment Size. 2(0x2)
      OptionLength: 4 (0x4)
      MaxSegmentSize: 1460 (0x5B4)
  + NoOption:
  + WindowsScaleFactor: ShiftCount: 8
  + NoOption:
  + NoOption:
  + SACKPermitted:

• MaxSegmentSize: the maximum is 1460 bytes on Ethernet, and it shouldn't be much lower than this. If it is, it will cause poor performance.
• SACKPermitted: this should be enabled, as it allows us to handle dropped packets better.
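The options Netmon decodes above come from a few bytes in the SYN header. The Python sketch below decodes MSS (kind 2), window scale (kind 3) and SACK-permitted (kind 4) from raw option bytes and flags the issues this slide calls out. The 1400-byte floor for "much lower than 1460" is an assumed threshold, and the names are hypothetical.

```python
def parse_tcp_options(raw: bytes) -> dict:
    """Decode MSS, window scale and SACK-permitted from TCP header option bytes."""
    opts = {"mss": None, "window_scale": None, "sack_permitted": False}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:              # End of Option List
            break
        if kind == 1:              # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError(f"truncated option at offset {i}")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise ValueError(f"bad option length {length} at offset {i}")
        body = raw[i + 2:i + length]
        if kind == 2 and length == 4:      # Maximum Segment Size
            opts["mss"] = int.from_bytes(body, "big")
        elif kind == 3 and length == 3:    # Window Scale (shift count)
            opts["window_scale"] = body[0]
        elif kind == 4 and length == 2:    # SACK Permitted
            opts["sack_permitted"] = True
        i += length                        # move past this (or any unknown) option
    return opts


def syn_warnings(opts: dict, min_mss: int = 1400) -> list[str]:
    """Flag the issues from this slide. min_mss is an assumed threshold.

    Window scaling and SACK are only used if both the SYN and the SYN-ACK
    carry the option, so run this on both packets of the handshake.
    """
    warnings = []
    if opts["mss"] is None:
        warnings.append("no MSS option")
    elif opts["mss"] < min_mss:
        warnings.append(f"MSS {opts['mss']} is well below 1460")
    if opts["window_scale"] is None:
        warnings.append("window scaling not offered")
    if not opts["sack_permitted"]:
        warnings.append("SACK not permitted")
    return warnings
```

The bytes 02 04 05 b4 01 03 03 08 01 01 04 02 are a reconstruction of the option sequence shown above (MSS 1460, NOP, window scale 8, NOP, NOP, SACK permitted); they decode to those values and produce no warnings.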

Related content

Breakout sessions
• OFC-B343 SharePoint Online Performance: Designing Your Pages to be Fast (Friday 8:30am in Hall 8.0, Room D3)
• OFC-B335 Office 365 Network Topology & Performance Planning

Online resources
• TechNet blog: http://blogs.technet.com/b/onthewire/
• Office 365 Network Analysis tool
• TechNet landing page: http://msdn.microsoft.com/en-us/library/dn850362.aspx

Resources

• Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
• Developer Network: http://developer.microsoft.com
• TechNet, resources for IT professionals: http://microsoft.com/technet
• Sessions on Demand: http://channel9.msdn.com/Events/TechEd

Technical Network: join the conversation!
Share tips and best practices with other Office 365 experts: http://aka.ms/o365technetwork

[Chart: Office 365 certification and training paths, covering Introduction to Office 365, Office 365 Fundamentals, Managing Office 365 Identities and Requirements, Managing Office 365 Identities and Services, Deploying Office 365 Services and Designing for Office 365 Infrastructure, through online training (MVA), classroom training (MOC 20346, MOC 10968, FLC 40041) and Exams 346 and 347]

• http://bit.ly/O365-Cert
• http://bit.ly/O365-MVA
• http://bit.ly/O365-Training
• Get certified for 1/2 the price at TechEd Europe 2014! http://bit.ly/TechEd-CertDeal

We value your feedback!

Submit your TechEd evaluations. The TechEd mobile app for session evaluations is currently offline, so fill out an evaluation via a CommNet station/PC in Schedule Builder. Log in: europe.msteched.com/catalog


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.