Click here to load reader

Network Protocol Changes Can Improve DisCom WAN ... · PDF fileStates Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any

  • View
    214

  • Download
    0

Embed Size (px)

Text of Network Protocol Changes Can Improve DisCom WAN ... · PDF fileStates Government. Neither the...

SANDIA REPORT

SAND2005-3132 Unlimited Release Printed June 2005 Network Protocol Changes Can Improve DisCom WAN Performance: Evaluating TCP Modifications and SCTP in the ASC Tri-lab Environment

Tan Chang Hu and Larwrence F. Tolendino

Prepared by Sandia National Laboratories Albuquerque, New Mexico 87185 and Livermore, California 94550 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energys National Nuclear Security Administration under Contract DE-AC04-94AL85000. Approved for public release; further dissemination unlimited.

Issued by Sandia National Laboratories, operated for the United States Department of Energy by Sandia Corporation.

NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors. Printed in the United States of America. This report has been reproduced directly from the best available copy. Available to DOE and DOE contractors from

U.S. Department of Energy Office of Scientific and Technical Information P.O. Box 62 Oak Ridge, TN 37831 Telephone: (865)576-8401 Facsimile: (865)576-5728 E-Mail: [email protected] Online ordering: http://www.osti.gov/bridge

Available to the public from

U.S. Department of Commerce National Technical Information Service 5285 Port Royal Rd Springfield, VA 22161 Telephone: (800)553-6847 Facsimile: (703)605-6900 E-Mail: [email protected] Online order: http://www.ntis.gov/help/ordermethods.asp?loc=7-4-0#online

2

SAND2005-3132 Unlimited Release Printed June 2005

Network Protocol Changes Can Improve DisCom WAN

Performance: Evaluating TCP Modifications and SCTP in the ASC Tri-lab Environment

T. C. Hu, Systems Analysis and Trouble Resolution Department

and L. F. Tolendino, Network Design Department

Sandia National Laboratories P.O. Box 5800

Albuquerque, New Mexico 87185-0806

Abstract

The Advanced Simulation and Computing (ASC) Distance Computing (DisCom) Wide Area Network (WAN) is a high performance, long distance network environment that is based on the ubiquitous TCP/IP protocol set. However, the Transmission Control Protocol (TCP) and the algorithms that govern its operation were defined almost two decades ago for a network environment vastly different from the DisCom WAN. In this paper we explore and evaluate possible modifications to TCP that purport to improve TCP performance in environments like the DisCom WAN. We also examine a much newer protocol, SCTP (Stream Control Transmission Protocol) that claims to provide reliable network transport while also implementing multi-streaming, multi-homing capabilities that are appealing in the DisCom high performance network environment. We provide performance comparisons and recommendations for continued development that will lead to network communications protocol implementations capable of supporting the coming ASC Petaflop computing environments. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockeed Martin Company for the United States Department of Energy's National Nuclear Security administration under contract DE-AC04-94AL85000.

2

3

Table of Contents

Introduction......................................................................................................................... 4

ASC/DisCom .................................................................................................................. 4 Performance Challenges of the DisCom WAN .............................................................. 5

Characterizing Usage of the DisCom WAN....................................................................... 6 File Size Distribution ...................................................................................................... 6 Temporal Distribution of Large File Transfers............................................................... 8

Transport Stack Modification ........................................................................................... 10 Modifications to TCP.................................................................................................... 10 Summary of TCP Congestion Control Algorithms....................................................... 10 High Speed TCP ........................................................................................................... 10

Congestion Avoidance .............................................................................................. 11 Additive Increase ...................................................................................................... 11

Scalable TCP................................................................................................................. 12 Congestion Avoidance .............................................................................................. 12 Multiplicative Increase.............................................................................................. 12 Multiplicative Decrease ............................................................................................ 12

SCTP Protocol .............................................................................................................. 13 Conclusions....................................................................................................................... 15

TCP Modifications........................................................................................................ 15 SCTP............................................................................................................................. 18

Recommendations............................................................................................................. 20 References......................................................................................................................... 21

Table of Figures Figure 1: ASCI/DisCom Wide Area Network .................................................................... 5 Figure 2: DisCom File Size Distribution and Performance May 2003............................... 7 Figure 3: DisCom File Size Distribution vs. Data Transferred .......................................... 8 Figure 4: August 2004 - Large File Transfers Are Rare..................................................... 9 Figure 5: Summary of Congestion Control Algorithms ................................................... 10 Figure 6: HighSpeed TCP Table....................................................................................... 11 Figure 7: SCTP vs TCP vs UDP....................................................................................... 14 Figure 8: Network Diagram - Two-to-One 1Gpbs test setup with Delay......................... 15 Figure 9: Percent Increase for Two Parallel Streams....................................................... 16 Figure 10: Percent Increase for Sixteen Parallel Streams................................................ 16 Figure 11: Network Diagram 2: 10Gbps to 1 Gbps test setup with 30 msec Delay ........ 17 Figure 12: 2 Clients to 1 Server: With 30 msec Delay .................................................... 17 Figure 13: 2 Clients to 1 Server: Overall .......................................................................... 18 Figure 14: SCTP client-server setup via 1Gbps links....................................................... 19

4

Introduction The Advanced Simulation and Computing (ASC) program was instituted to maintain a safe and effective nuclear weapon stockpile while upholding the Nuclear Test Ban Treaty. In turn, as modeling and simulation became more important in the maintenance of the nuclear stockpile, it encouraged the development of an infrastructure to support the sharing of valuable and limited computational and visualization resources. Distance Computing (DisCom) is a critical infrastructure aspect of the ASC program that makes resource sharing possible and allows users to utilize remote ASC resources as if they were local.

ASC/DisCom From an infrastructure viewpoint the ASC effort has been focused on the supercomputers, visualization systems, and assorted support systems primarily located at the three main weapons laboratories: Los Alamos National Laboratory (LANL), Lawrence Livermore National Laboratory (LLNL), and Sandia National Laboratories (SNL). Part of the implementation strategy for these expensive hardware systems was to create an environment for sharing the expensive resources throughout the weapons complex. The Distance Computing Wide Area Network (DisCom WAN) is a critical part of the infrastructure that makes resource sharing possible and allows users to utilize remote ASC resources as if they were