Sandy Arthur, Program Manager, Microsoft Corporation
Slide 1
Sandy Arthur, Program Manager, Microsoft Corporation

Slide 2
Improve reliability, availability, and serviceability of Windows Server 2008 by:
  Leveraging new technologies to reduce hardware failure rate.
  Enhancing tests to reduce software failure rate.
  Addressing problem areas exposed by OCA, PSS, and other data sources that impact Windows Server 2008 customers' typical scenarios, usages, and configurations:
    Multiple adapters, higher number of CPUs, more RAM, complex drivers such as MPIO, LBFO, TOE, anti-virus, firewall, mirroring, backup, remote management, and so on.

Slide 3
Legacy Systems
Server-Specific Device and System Logo Requirements
Server-Specific Device and System Test Requirements
Additional Qualifications
Server System Stress Test

Slide 4

Slide 5
Supported category for systems will continue to exist:
  This is forward compatibility from Windows Server 2003 to Windows Server 2008.
Supported option will NOT exist for devices:
  All server-qualified devices/drivers must be Server logo'd.
Testing and submission:
  Windows Server 2003 SID Windows Server 2008 Logo Drivers CHKLogo
32-bit only systems and devices in the system:
  Devices need an x64 version for logo and signature.
Catalog display.

Slide 6
Some Server Model Name and Number by Some Manufacturer, Inc.

Slide 7

Slide 8
All device categories supported for server:
  Must be functional in DP-capable systems:
    DP-capable systems will become more broadly available over time.
    Customers may add arbitrary devices to their DP-capable systems.
    Customers will need Hot Add for their virtual Windows instances.
    This is functionality that all drivers should have in any case.
  Must meet logo requirements that support the above:
    Hot Add CPU
    Resource Rebalance
    Hot Replace
    Quiescence/Pseudo S4

Slide 9
Requirements differ from Windows Vista, and requirements that are the same exist for different reasons.
Security:
  BitLocker (if implemented): Branch Office scenario.
Reliability and availability:
  WDT (if implemented): prevents a hung system from losing availability.
  WHEA (required).
  PCI-E (required June 2008): provides Advanced Error Reporting and improved availability.
  ECC or better (required): memory errors are a Top 20 issue for Windows Server 2003 crashes.

Slide 10
Performance:
  HPET (required): significant performance improvement for applications needing high-resolution timestamps.
  GigE or better (required).
Manageability:
  Headless, Remote, OOB (required, but specific implementation is up to vendor).
Power:
  Processor power states, if they exist, must be exposed to Windows (if implemented).
  No S3 requirement (if implemented).

Slide 11

Slide 12
All device categories:
  Must test with Windows Server 2008, not Windows Vista.
  4-core, 1-GB system required.
  DP tests:
    Hot Add CPU
    Resource Rebalance
    Hot Replace
    Quiescence/Pseudo S4
Storage and networking devices:
  4-core, 6-GB system required.
Device tests that require pools and the no-driver case:
  4-core system for DP testing must include driver and be in pool.
  4-core system for DP testing must be the sole, or first, system in pool for the case where no driver is being tested, such as HD testing.

Slide 13
CHKLogo exam of WHQL signature attribute:
  Client signature has different attributes than server.
Many/most of the current Windows Vista tests are If Implemented for server:
  Will not execute or false fail if the device category does not exist in system.
Server-client stress test.
Shutdown/restart:
  ~1% of systems fail to shut down; numerous reasons.
  Test can detect power off, dirty shutdown, WDT, and so on.

Slide 14

Slide 15
There are groups of If Implemented requirements in WLP 3.x dealing with hardware functionality not required for logo:
  If Implemented features are not required for logo.
  If Implemented features that are critical to security or reliability are tested as part of logo qualification, if present:
    Security: BitLocker
    Reliability: WDT
  If Implemented features that provide additional functionality beyond what industry-standard systems do are tested as Additional Qualifications:
    Examples: Dynamic Partitioning, Fault Tolerance, and Virtualization.
    Vendor may select none, some, or all qualification tests for additional qualifications [AQs].
Additional qualifications are the way a vendor can:
  Qualify their systems' functionality with respect to these If Implemented requirements.
  Inform customers of this functionality in the Server Catalog.

Slide 16
FT:
  None specific to FT
DP:
  Partition isolation
  Configuration persistence
  Partition unit differences between I/O and CPU-RAM
  Stability after hot add and replace operations
  Partition management, status, and UI
VM:
  Windows-compatible virtualization support

Slide 17
Fault tolerant:
  Standard server system logo test.
  Fault-tolerant AQ test.
  Test of system ability to survive FT set break and resync with no impact on stress clients.
DP:
  Standard server system logo test.
  Dynamic partitioning AQ tests.
  Tests of partition isolation, configuration persistence, partition unit differences between I/O and CPU-RAM, stability after hot add and replace operations, and partition management, status, and UI.
VM:
  Standard server system logo test.
  Virtualization AQ test.
  Tests processors for Windows-compatible virtualization support.

Slide 18

Slide 19

Slide 20

Slide 21
Customer support history of bugs that the test could have found, but did not.
OEM feedback on past test issues.
Lessons learned from previous kits.

Slide 22
Replay to reproduce failures more consistently:
  Replay log: within the ballpark, not an exact science.
  Vendor can e-mail log to Microsoft to be replayed in lab.
  Hardware and machine names need not be the same.
Stress load on processor and RAM is appropriate to system:
  Regardless of CPU count or speed, or amount of RAM.
Support multiple NICs and HBAs:
  Test many configurations at once.
Dynamically add/replace slave clients:
  The test no longer automatically fails when a client fails.
Automated setup and cleanup.
Non-HCT mode for vendor use in testing:
  Start/stop tests.
  Change stress settings through GUI or CLI.
  Use in combination with vendor-written tests.

Slide 23
CPU, RAM, paged pool, and non-paged pool:
  Exercises chipset, buses, processors, and RAM.
Network/Winsock:
  Works with TOE or non-TOE adapters/drivers.
SQL emulator:
  Designed to find corruption issues.
Client-server SMB:
  Multiple NICs, networks, and physical layers [GigE, 10GigE, FDDI] possible.
Local file system:
  Multiple HBAs, storage and physical layers [SATA, SAS, FC, iSCSI] possible.

Slide 24
Workload automatically scales to the number of network and storage adapters found in the system:
  Adapters need not be the same type, but require matching network clients or storage devices.
  Network usage is managed to average ~40% by throttling clients.
  Storage stress is provided by a single instance of stress per HBA-HD(s) pair.
  Target stress level for HBA-HD pairs is 100%.
Achieve the same relative amount of stress on the system, regardless of number or type of processors, or amount of memory:
  Test spawns as many processor-specific and memory-specific stress threads as are needed to achieve a predetermined level of processor and memory usage.
  Test will terminate those stress threads if the usage level exceeds the predetermined upper range of stress for a period of time.
  Target for processor, RAM, and pools usage is 70%.

Slide 25
| Feature | Windows Server 2003 | Windows Server 2008 |
|---|---|---|
| Dynamic Load Balancing | Fixed load generated regardless of the server's capabilities | Stress tailored to server capabilities |
| Repro-ability | Re-run and hope | Replay previous test logs |
| Dynamic Client Replacement | None. Two client machines fail and the server test fails | Uninterrupted testing and easy client addition |
| Different Modes of Operation | Only one mode | HCT for certification; Non-HCT for testing; Stress mode for single machine testing |
| Start/Stop Individual Tests | Not possible | User has full control under non-HCT mode |

Slide 26
Server (SUT: Server Under Test)
Client Master:
  Cluster/Server harness install point
  Standard server SKU
  Single process machine
Clients:
  8 clients generating load against SUT
  Standard server SKU
  Single process machine
Network Switch

Slide 27
History of starts and stops of tests
Status of what is happening at the moment

Slide 28

Slide 29

Slide 30
Start testing server device and system DTM at Beta 3.
File bugs with Microsoft WHQL if you find any issues.
Use server device and system tests to find issues in your own products.
Fix those issues so that tests can be passed at RC for server devices and server RTM for systems without costly delays.

Slide 31
http://www.microsoft.com/whdc/winlogo/WLP30.mspx
http://www.microsoft.com/whdc/system/platform/server/dhp.mspx
http://www.microsoft.com/whdc/system/pnppwr/WHEA/default.mspx

Slide 32
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
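
Illustration for Slide 24 (not part of the original deck): a minimal Python sketch of how a stress harness might spawn stress threads until usage reaches a target, and terminate them when usage stays above an upper range for a period of time. Only the 70% target comes from the slide. The 80% upper range, the three-sample grace period, the one-thread-per-sample policy, the toy load model, and every name below are assumptions made for illustration; the actual test may behave differently.

```python
from dataclasses import dataclass

TARGET_PCT = 70     # from the slide: target for processor, RAM, and pools usage
UPPER_PCT = 80      # assumption: the slide does not state the upper range
GRACE_SAMPLES = 3   # assumption: "a period of time", counted in usage samples


@dataclass
class StressController:
    """Hypothetical controller that tracks how many stress threads should run."""
    workers: int = 0
    over_count: int = 0  # consecutive samples above UPPER_PCT

    def step(self, usage_pct: int) -> int:
        """Consume one usage sample (0-100) and return the new thread count."""
        if usage_pct > UPPER_PCT:
            # Too much stress: only back off once the overshoot has persisted.
            self.over_count += 1
            if self.over_count >= GRACE_SAMPLES and self.workers > 0:
                self.workers -= 1
                self.over_count = 0
        else:
            self.over_count = 0
            if usage_pct < TARGET_PCT:
                # Below target: spawn one more stress thread.
                self.workers += 1
        return self.workers


def simulate(base_pct: int, per_worker_pct: int, samples: int,
             start_workers: int = 0) -> int:
    """Toy model: each stress thread adds a fixed percentage of usage."""
    ctl = StressController(workers=start_workers)
    for _ in range(samples):
        usage = min(100, base_pct + ctl.workers * per_worker_pct)
        ctl.step(usage)
    return ctl.workers


if __name__ == "__main__":
    # A machine idling at 10% where each thread adds 5% settles at 12 threads (70%).
    print(simulate(10, 5, 50))
    # Starting overloaded with 20 threads, the controller backs off to 14 threads (80%).
    print(simulate(10, 5, 50, start_workers=20))
```

Because the thread count is derived from measured usage rather than from CPU count or RAM size, the same controller settles at a different number of threads on a small or large machine while holding roughly the same relative stress level, which is the behaviour the slide describes.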