SharePoint Disaster Avoidance Architecture for Large Scale Enterprises
Cornelius J. van Dyk (Crayveon Corporation, @cjvandyk) and Jason Himmelstein (Sentri, @sharepointlhorn)

DESCRIPTION

SharePoint best practices dictate that a proper disaster recovery plan should be in place before the launch of your SharePoint farm. Standard disaster planning methodologies for SharePoint deal with the traditional scenario in which your datacenter is a smoldering hole in the ground. Processes such as SQL Server database backups or STSADM backups for site collections are often employed to cater to such scenarios. When something seemingly benign strikes, like corruption of a Secure Store Service Application, architects and administrators often come to the sad conclusion that a complete farm rebuild is their only recourse. Additionally, the risks associated with applying the regular bi-monthly SharePoint Cumulative Updates and periodic service packs, none of which have uninstall or undo features, increase the probability of experiencing a complete emergency farm rebuild at some point in an architect's or administrator's career. Long after a rebuild is completed and business has been restored to "almost" normal status, you'll still be troubleshooting server configurations and tweaking the environment to get back to your pre-disaster level. This workshop takes you through a dramatically new way of architecting your disaster plan. By applying the principles of this new methodology, you'll cut your disaster response time to the point of almost avoiding disasters entirely.

SharePoint Disaster Avoidance Architecture for Large Scale Enterprises

Cornelius J. van Dyk

Crayveon Corporation

[email protected]

@cjvandyk

Jason Himmelstein

Sentri

[email protected]

@sharepointlhorn

About Cornelius

• Chief Architect, Crayveon Corporation

• 7 time MVP, MCITP, MCTS

• Blog: www.cjvandyk.com/blog

• Twitter: @cjvandyk

• LinkedIn: http://www.linkedin.com/in/cjvandyk

About Jason

• SharePoint Practice Director, Sentri Inc.

• MCITP, MCTS SharePoint 2010

• Microsoft vTSP

● virtual Technology Solutions Professional

• SharePoint Foundation Logger (http://spflogger.codeplex.com)

• Web: www.sentri.com

• Blog: www.sharepointlonghorn.com

• Twitter: @sharepointlhorn

• LinkedIn: www.linkedin.com/in/jasonhimmelstein

Why do we do this?

• Jason’s Family

• Cornelius’ Family

GET TO KNOW YOU

• Name

• Company

• What you do with SharePoint

• Something interesting about yourself

DISASTER

• Outage vs Disaster

• When is a disaster actually a disaster?

• Traditional disaster planning

DISCUSSION GROUP BREAKOUT

• What is disaster planning to you?

• In the context of SharePoint

• Critical points

BUSINESS CONTINUITY PLANNING

• Business continuity planning identifies an organization's exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, whilst maintaining competitive advantage and system integrity.

• Components

● Planning

● Testing

● Validation

STRATEGIES

• Recovery Point Objective (RPO)

• Recovery Time Objective (RTO)

• Tolerance for down time
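
One way to make RPO and RTO concrete is to compare them against the backup schedule and an estimated restore time. The sketch below is illustrative only: the class, the function name, and all figures are assumptions and not part of the deck.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    rpo_minutes: float  # maximum acceptable data loss, expressed as time
    rto_minutes: float  # maximum acceptable time to restore service


def check_plan(targets, backup_interval_minutes, estimated_restore_minutes):
    """Return a list of gaps between a plan and the stated objectives."""
    gaps = []
    # Worst case, the failure hits just before the next backup runs,
    # so the data lost equals one full backup interval.
    if backup_interval_minutes > targets.rpo_minutes:
        gaps.append("RPO missed: backups are too far apart")
    # The restore estimate should come from a tested restore, not a guess.
    if estimated_restore_minutes > targets.rto_minutes:
        gaps.append("RTO missed: restore takes too long")
    return gaps


# Hypothetical targets: lose at most 1 hour of data, be back within 4 hours.
targets = RecoveryTargets(rpo_minutes=60, rto_minutes=240)
print(check_plan(targets, backup_interval_minutes=1440,
                 estimated_restore_minutes=480))
```

With nightly backups and an eight-hour restore, both objectives are missed, which is the kind of gap the tolerance-for-downtime discussion is meant to surface.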

DISASTER PLANNING STEPS

• Executive Management Commitment

● This costs money

● Must invest to protect

● Think of Insurance

DISASTER PLANNING STEPS

• Planning Committee

● All business units represented

● One person to lead – think Chief Justice

● Responsibility

● Authority

DISASTER PLANNING STEPS

• Risk Assessment

● Business Impact Analysis

• Natural Disasters

• Technical Disasters

• Human threats

• Terrorism

DISASTER PLANNING STEPS

• Determine SLA

● SLA for corporate users

● SLA for internal customers

● SLA for partner companies

● SLA for public

DISASTER PLANNING STEPS

• Establish Priorities for Recovery

● Critical Operations

● Key Personnel

● Vital Systems

● Documentation/Records/Policies & Procedures

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Facilities

• Destroyed

• Impaired

● Hardware

• Servers – replacement availability

• Network – service providers

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Software

• Install ISOs

• Updates

● Communications

• Inter-company

• Partners & Public

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Data

• Backups

• Availability

● Company Services

● Customer Services

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Distributed architecture

• Hot Site

• Warm Site

• Cold Site

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Vendor Agreements

• Circumstances constituting an emergency

• Contract Duration

• Termination Conditions

• Cost

• Testing

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Vendor Agreements (cont.)

• Security procedures

• System change notifications

• Hours of operation

• Hardware requirements

• Personnel requirements

DISASTER PLANNING STEPS

• Determine Recovery Strategies

● Vendor Agreements (cont.)

• Compatibility guarantee

• Availability guarantee

• Priorities with other customers

DISASTER PLANNING STEPS

• Perform Data Collection

● Critical phone numbers

● Hardware inventory

• Vendor contact and equipment information

● Software inventory

● Notification checklist

DISASTER PLANNING STEPS

• Organize & Document a Written Plan

● Plan should follow a checklist

● Think rebuild from scratch

• Notifications

• Hardware

• Software

• Restore backups

DISASTER PLANNING STEPS

• Organize & Document a Written Plan (cont.)

● Think rebuild from scratch (cont.)

• Re-establish systems

• Test & Validate

• Communicate

• After Action Review

DISASTER PLANNING STEPS

• Develop Testing Criteria & Procedures

• Test the plan

• Test the plan again

• Approve the plan

DISASTER PLANNING STEPS

• Ongoing plan validation

● Annual testing

● Scenario testing

● Testing when something changes

TRADITIONAL DISASTER PLANNING

• Backups

• Log Shipping

• SQL Replication

• Hot Site

SHAREPOINT ARCHITECTURE

• Farm configuration

• 2 WFE, 2 APP, SQL Cluster

• The role of virtualization

RECOVERY vs AVOIDANCE

• What is Disaster Avoidance?

• A new way of looking at DR

• Why another DR strategy?

• What makes SPDAALSE different?

CAUSES OF DISASTERS

• Natural disasters such as floods, hurricanes, earthquakes, tornados, storms etc.

• Human induced such as accidents, acts of terrorism etc.

• Hardware failures such as drive crashes, memory or board failures etc.

CAUSES OF DISASTERS (cont)

• Malware such as worms, viruses etc.

• The one everyone forgets about…

• Software incompatibility when upgrading:

● Operating systems

● Software service pack

● Software patches

SHAREPOINT CUMULATIVE UPDATES

• Bi-monthly

• Recommended by support

• History of hot fixes and re-releases

• Famously broke User Profile Services

CUs: A NECESSARY EVIL

• Why apply them at all?

• What’s their risk?

• Can’t we just uninstall them?

• Compared to Exchange…

HOW DOES SPDAALSE HELP?

• Farm Architecture

• SharePoint databases

• Difference between data and configuration

• What makes Large Scale Enterprises different?

TRADITIONAL ARCHITECTURE

• DEMO

SPDAALSE ARCHITECTURE

• DEMO

THINKING DIFFERENT

• Separation of data and configuration

• Performance considerations

• Adding virtualization

IN ACTION

• Building the farm based on SPDAALSE

• Preparing the farm for testing

• Snapping the farm

• Backups

IN ACTION (cont)

• Patching the farm

• Testing the patch

• Rolling back

• Validating rollback

IN ACTION (cont)

• Demo

Agenda

• Infrastructure Design

● Analyze Customer Requirements

● Hardware requirements

● Server configuration

● Network recommendations

● Virtual vs. Physical

• SQL Server Performance

● Pre-grow vs. Auto-growth

● I/O requirements

● Sizing recommendations

● Database Isolation

• SharePoint Server Performance

● Tier isolation vs. Location Proximity Requirements

● Load balancing your App Tier

● Load testing in your environment

● Governance & Troubleshooting

Infrastructure Design

• Analyze Customer Requirements

● High Availability

● Disaster Recovery

● Budget Constraints

● Location Awareness

● Number of Concurrent Users

Infrastructure Design

• Hardware requirements

● Web servers & Application servers

● SQL servers

• What constitutes a small/medium/large farm?

● Developer or Evaluation environments: CPU 4 cores (64-bit required), RAM 4 GB, hard drive space 80 GB

● Production in single server or farm environments: CPU 4 cores (64-bit required), RAM 8 GB, hard drive space 80 GB

● Small Farm: CPU 4 cores (64-bit required), RAM 8 GB, hard drive space 80 GB

● Medium Farm: CPU 8 cores (64-bit required), RAM 16 GB, hard drive space 80 GB

● Large Farm, up to 2 TB of content DBs: RAM 32 GB

● Large Farm, from 2 TB to 5 TB of content DBs: RAM 64 GB

Infrastructure Design

• Server configuration – Small Farm

Infrastructure Design

• Server configuration – Scaled Farm

Infrastructure Design

Infrastructure Design

• Network recommendations

● Traffic Isolation

• Web

• Database

• Search

• Service Applications

• Authentication

● Number of NICs per server

● Limit the number of hops

● Colocation of servers

Infrastructure Design

• Physical

● Benefits

• No virtualization overhead

• Ability to target DBs to separate physical spindles

• Only OS limits on Hardware

• Simple Networking

● Drawbacks

• Backup & recovery time

• Limited snapshot ability

• Costly & lacking Centralized Management

• Failover limitations

Infrastructure Design

• Virtualization

● Benefits

• Snapshot capability

• Rapid system deployment

• HA/DR ability

• Centralized Management

● Drawbacks

• Loss of minimum 8% compute for overhead

• Limitations on addressing full hardware

• Disks are stored as single/multi-file

• Centralized Networking
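
The compute overhead can be folded into host sizing with simple arithmetic. The sketch below assumes the 8% figure applies uniformly to CPU cores, which is a simplification; the function name and the example core counts are illustrative only.

```python
import math


def host_cores_needed(guest_cores_required, overhead_fraction=0.08):
    """Cores the host must provide so guests still get what they require.

    overhead_fraction is the share of compute lost to the hypervisor.
    The deck quotes 8% as a minimum, so the result is a lower bound.
    """
    usable_share = 1 - overhead_fraction
    return math.ceil(guest_cores_required / usable_share)


# Hypothetical example: guests that together need 4, 8, or 16 cores.
for cores in (4, 8, 16):
    print(cores, "->", host_cores_needed(cores))
```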

SQL Server Performance

• Pre-grow databases

● Requires more space initially

● Dramatic increase in performance

● Databases like contiguous space

• Auto-growth

● Immediately change from 1 MB increments

● Do not use the “Grow by %” setting

● 50-100 MB maximum growth, as required

● Schedule a maintenance task to check size & grow in off-peak hours as required
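
A back-of-the-envelope count of growth events shows why the increment matters: every auto-growth pauses writes while the file is extended and tends to fragment it. The sketch below is plain arithmetic, not a SQL Server API; the starting size and target size are assumed figures.

```python
import math


def growth_events(initial_mb, target_mb, increment_mb):
    """Number of auto-growth operations needed to reach target_mb."""
    if target_mb <= initial_mb:
        return 0  # a pre-grown file never has to auto-grow
    return math.ceil((target_mb - initial_mb) / increment_mb)


# Hypothetical database that starts at 100 MB and ends up at 10 GB.
target = 10 * 1024
print(growth_events(100, target, increment_mb=1))    # 1 MB increments
print(growth_events(100, target, increment_mb=100))  # 100 MB increments
print(growth_events(target, target, increment_mb=1)) # pre-grown to 10 GB
```

Under these assumptions the 1 MB setting triggers thousands of growth operations, the 100 MB setting about a hundred, and pre-growing none at all.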

SQL Server Performance

• I/O requirements

#   DB Files                    RAID Level  Optimization
1   TempDB data                 10          Write
2   TempDB logs                 10          Write
3   ContentDB data              10          Read/Write
4   ContentDB logs              10          Write
5   Crawl DB logs               10          Write
6   Crawl DB data               10          Read/Write
7   Property DB logs            10          Write
8   Property DB data            10          Write
9   Services DB logs            10          Write
10  Services DB data            5/10        Read/Write
11  Archive Content DB          5           Read
12  Publishing Site Content DB  5           Read

SQL Server Performance

• Sizing recommendations

● Recommended limit for ContentDBs: 200 GB

• Maximum supported: 4 TB

– Includes Remote BLOBs

● Backup/Restore timing

● Simple vs. Full recovery mode
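
The two limits above can be turned into a simple check for a capacity report. This is a minimal sketch: the function name and status labels are made up for illustration, and 4 TB is assumed to mean 4096 GB.

```python
RECOMMENDED_LIMIT_GB = 200       # recommended ContentDB limit from the slide
SUPPORTED_LIMIT_GB = 4 * 1024    # 4 TB maximum, assuming 1 TB = 1024 GB


def classify_content_db(size_gb):
    """Classify a content database size against the two limits."""
    if size_gb <= RECOMMENDED_LIMIT_GB:
        return "within recommended limit"
    if size_gb <= SUPPORTED_LIMIT_GB:
        return "supported, above recommended limit"
    return "unsupported"


# Hypothetical sizes, in GB; remote BLOBs count toward the total.
for size in (150, 500, 5000):
    print(size, classify_content_db(size))
```

Databases in the middle band are the ones where backup and restore timing deserves a closer look.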

SQL Server Performance

• Database Instance Isolation

● Secure Store Database

● SharePoint core databases

● Content Databases

● Search

● Highly Transactional non-SharePoint DBs

• Drawback

● Lose the central management in a single SQL Server Management Studio window

SharePoint Server Performance

• Tier isolation vs. Location Proximity Requirements

● Separation via vLAN

• Less chatter

• Increased hop count

● Collocating SharePoint in a single vLAN

• Increased chatter

• Lower hop count

• Key takeaway

● Know your network, determine your topology based upon traffic & requirements

SharePoint Server Performance

• Load balancing your App Tier

● Know your load

● Scale based upon need, not perception

• Find your choke point, then release the grasp

● Don’t assume, validate!

SharePoint Server Performance

• Load testing in your environment

● Example

• 2 Web Servers (4 cores, 16 GB RAM) using NLB

• 1 App Server (4 cores, 16 GB RAM)

• 1 SQL Server Instance (16 cores, 128 GB RAM)

• Simple CRUD operations

– Login, create list item, open item, modify item, save item, delete item, log out

SharePoint Server Performance

• Load testing in your environment

● Results

• Farm was completely non-responsive at ~500 concurrent users

● Root cause

• Watching this test on the server side we found that we were immediately CPU bound.

● Conclusion

• Add CPUs or Web Servers to the farm to handle additional load
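
The shape of such a test can be sketched as a small harness that runs the CRUD sequence once per simulated user, all in parallel. This is not the tooling used for the test described here; `fake_step` is a hypothetical placeholder that only sleeps, and a real run would replace it with HTTP calls against a test farm.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# The CRUD sequence from the example slide.
CRUD_STEPS = ["login", "create item", "open item", "modify item",
              "save item", "delete item", "log out"]


def crud_cycle(user_id, do_step):
    """Run one user's full CRUD sequence and return elapsed seconds."""
    start = time.perf_counter()
    for step in CRUD_STEPS:
        do_step(user_id, step)
    return time.perf_counter() - start


def run_load(concurrent_users, do_step):
    """Run one CRUD cycle per simulated user in parallel; return timings."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(crud_cycle, uid, do_step)
                   for uid in range(concurrent_users)]
        return [f.result() for f in futures]


def fake_step(user_id, step):
    # Placeholder for a real request; it only simulates a short delay.
    time.sleep(0.001)


timings = run_load(10, fake_step)
print(f"{len(timings)} users, slowest cycle {max(timings):.3f}s")
```

Stepping `concurrent_users` upward while watching CPU on the web servers is one way to find the point where response times collapse.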

References

• Jason’s Blog: http://www.sharepointlonghorn.com

• Sentri, Inc: http://www.sentri.com

• SharePoint Foundation Logger: http://spflogger.codeplex.com

• My Article on SharePoint Pro: http://www.sharepointpromag.com/content1/topic/sharepoint-performance-troubleshooting-141506/catpath/sharepoint-server-2010

• Cornelius J. van Dyk’s Blog: http://www.cjvandyk.com/blog

• Eric Shupps’s Blog: http://www.sharepointcowboy.com

• SharePoint Server 2010 Hardware and software requirements: http://technet.microsoft.com/en-us/library/cc262485.aspx

• SharePoint Server 2010 Capacity Management: Software Boundaries and Limits: http://technet.microsoft.com/en-us/library/cc262787.aspx

• Capacity Management and Sizing Overview for SharePoint Server 2010: http://technet.microsoft.com/en-us/library/ff758647.aspx

• Capacity Planning for SharePoint Server 2010: http://technet.microsoft.com/en-us/library/ff758645.aspx

• Performance Testing for SharePoint Server 2010: http://technet.microsoft.com/en-us/library/ff758659.aspx

• Storage and SQL Server Capacity Planning and Configuration: http://technet.microsoft.com/en-us/library/cc298801.aspx

• Performance and Capacity Technical Case Studies: http://technet.microsoft.com/en-us/library/cc261716.aspx

• Monitoring and Maintaining SharePoint Server 2010: http://technet.microsoft.com/en-us/library/ff758658.aspx

• The Load Testing Kit for Visual Studio Team System: http://technet.microsoft.com/en-us/library/ff823731.aspx

• Web Capacity Analysis Tool (WCAT): http://www.iis.net/community/default.aspx?tabid=34&g=6&i=1466

REFERENCES

• @cjvandyk @sharepointlhorn

• www.cjvandyk.com/blog www.sharepointlonghorn.com

• Deck download: http://aurl.to/SPDAALSE

• Painless deck: http://aurl.to/Painless

• Logging deck: http://aurl.to/logging

• PowerPivot deck: http://aurl.to/HMPP

• Versions List: http://aurl.to/v

• Corne’s Utils: http://quix.codeplex.com

Your Feedback is Important

Please fill out a session evaluation form and drop it off at the conference registration desk.

Thank you!