Network Recovery: Protection and Restoration of Optical, SONET-SDH, IP, and MPLS


Vasseur / Network Recovery Final Proof 9.6.2004 7:32pm page i

The Morgan Kaufmann Series in Networking

Series Editor, David Clark, M.I.T.

Network Recovery: Protection and Restoration of Optical, SONET-SDH, IP, and MPLS
Jean-Philippe Vasseur, Mario Pickavet, and Piet Demeester

Routing, Flow, and Capacity Design in Communication and Computer Networks
Michał Pioro and Deepankar Medhi

Wireless Sensor Networks: An Information Processing Approach
Feng Zhao and Leonidas Guibas

Communication Networking: An Analytical Approach
Anurag Kumar, D. Manjunath, and Joy Kuri

The Internet and Its Protocols: A Comparative Approach
Adrian Farrel

Modern Cable Television Technology: Video, Voice, and Data Communications, 2e
Walter Ciciora, James Farmer, David Large, and Michael Adams

Bluetooth Application Programming with the Java APIs
C Bala Kumar, Paul J. Kline, and Timothy J. Thompson

Policy-Based Network Management: Solutions for the Next Generation
John Strassner

Computer Networks: A Systems Approach, 3e
Larry L. Peterson and Bruce S. Davie

Network Architecture, Analysis, and Design, 2e
James D. McCabe

MPLS Network Management: MIBs, Tools, and Techniques
Thomas D. Nadeau

Developing IP-Based Services: Solutions for Service Providers and Vendors
Monique Morrow and Kateel Vijayananda

Telecommunications Law in the Internet Age
Sharon K. Black

Optical Networks: A Practical Perspective, 2e
Rajiv Ramaswami and Kumar N. Sivarajan

Internet QoS: Architectures and Mechanisms
Zheng Wang

TCP/IP Sockets in Java: Practical Guide for Programmers
Michael J. Donahoo and Kenneth L. Calvert

TCP/IP Sockets in C: Practical Guide for Programmers
Kenneth L. Calvert and Michael J. Donahoo

Multicast Communication: Protocols, Programming, and Applications
Ralph Wittmann and Martina Zitterbart

MPLS: Technology and Applications
Bruce Davie and Yakov Rekhter

High-Performance Communication Networks, 2e
Jean Walrand and Pravin Varaiya

Internetworking Multimedia
Jon Crowcroft, Mark Handley, and Ian Wakeman

Understanding Networked Applications: A First Course
David G. Messerschmitt

Integrated Management of Networked Systems: Concepts, Architectures, and their Operational Application
Heinz-Gerd Hegering, Sebastian Abeck, and Bernhard Neumair

Virtual Private Networks: Making the Right Connection
Dennis Fowler

Networked Applications: A Guide to the New Computing Infrastructure
David G. Messerschmitt

Wide Area Network Design: Concepts and Tools for Optimization
Robert S. Cahn

For further information on these books and for a list of forthcoming titles, please visit our website at http://www.mkp.com


Network Recovery

Protection and Restoration of Optical, SONET-SDH, IP, and MPLS

Jean-Philippe Vasseur, Mario Pickavet, Piet Demeester

AMSTERDAM • BOSTON • HEIDELBERG • LONDON

NEW YORK • OXFORD • PARIS • SAN DIEGO

SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO


Senior Editor Rick Adams

Associate Editor Karyn Johnson

Acquisitions Editor Rick Adams

Publishing Services Manager Andre Cuello

Project Manager Justin Palmeiro

Editorial Coordinator Graphic World Publishing Services

Cover Design Yvo Riezebos Design

Cover Image Brooklyn Bridge in front of Manhattan skyline at dusk. Courtesy Digital Vision and Getty Images

Composition Kolam Information Services, Pvt., Ltd.

Technical Illustration Kolam Information Services, Pvt., Ltd.

Copyeditor Graphic World Publishing Services

Proofreader Graphic World Publishing Services

Indexer Graphic World Publishing Services

Interior printer Maple-Vail Book Manufacturing Group, Pennsylvania

Cover printer Maple-Vail Book Manufacturing Group, Pennsylvania

Morgan Kaufmann Publishers is an imprint of Elsevier.

500 Sansome Street, Suite 400, San Francisco, CA 94111

This book is printed on acid-free paper.

© 2004 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, scanning, or otherwise) without prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining Permissions.”

Library of Congress Cataloging-in-Publication Data

Application submitted.

ISBN: 0-12-715051-x

For information on all Morgan Kaufmann publications, visit our website at www.mkp.com.

Printed in the United States of America.

04 05 06 07 08 5 4 3 2 1


Dedications


About the Authors

Jean-Philippe Vasseur

Jean-Philippe Vasseur has a French engineering degree in Network Computing and a Master of Science degree from the Stevens Institute of Technology, New Jersey. He worked as a network architect for several large national and international service providers in large multiprotocol environments (e.g., IP, ATM, X.25) prior to joining Cisco Systems. After two years within the EMEA technical consulting group focusing on IP/MPLS routing, VPN, and traffic engineering designs for service providers, he joined the Cisco Engineering team as a Technical Leader with a particular focus on IP, MPLS traffic engineering, and recovery mechanisms. He is a regular speaker at various international conferences and is involved in several research projects in the area of IP and MPLS. In addition, he is an active member of the Internet Engineering Task Force (IETF) and has co-authored several IETF specifications.

Mario Pickavet

Mario Pickavet received a Master of Science degree and a Doctor of Electrical Engineering degree, specialized in telecommunications, from Ghent University in 1996 and 1999, respectively. Since 2000, he has been a full-time professor at the same university. His research interests are related to broadband communication networks (i.e., IP, MPLS, WDM, SDH, ATM) and include resilience mechanisms, design, and long-term planning of core and access networks. In this context, he has been involved in European IST projects (i.e., LION, DAVID, STOLAS, ePhoton/One, LASAGNE) on IP over WDM next generation networks. He has published a number of international publications on these subjects, both in leading journals (e.g., IEEE Journal on Selected Areas in Communications and IEEE Communications Magazine) and in conference proceedings.

Piet Demeester

Piet Demeester received his doctoral degree from Ghent University at the Department of Information Technology (INTEC) in 1988. In 1993, he became a professor at Ghent University, where he is responsible for research on communication networks. He was involved in several European COST, ESPRIT, RACE, ACTS, and IST projects. He is a member of the editorial board of several international journals and has been a member of several technical program committees. His current interests are related to broadband communication networks (i.e., IP, G-MPLS, optical packet and burst switching, access and residential, active, mobile, CDN, grid) and include network planning, network and service management, telecom software, internetworking, and network protocols for QoS support. He has published over 250 journal or conference papers in this field. He has also been very active in the field of resilience in communication networks, both as founder of the DRCN conference and as editor of special issues on this subject in IEEE Communications Magazine.


Contents

Foreword xv

Preface xvii

Chapter 1 General Introduction 1
1.1 Communications Networks Today 1
1.1.1 Fundamental Networking Concepts 3
1.1.2 Layered Network Representation 5
1.1.3 Network Planes 6
1.2 Network Reliability 8
1.2.1 Definitions 9
1.2.2 Which Failures Can Occur? 12
1.2.3 Reliability Requirements for Various Users and Services 18
1.2.4 Measures to Increase Reliability 20
1.3 Different Phases in a Recovery Process 22
1.3.1 Recovery Cycle 23
1.3.2 Reversion Cycle 24
1.4 Performance of Recovery Mechanisms: Criteria 25
1.4.1 Scope of Failure Coverage 25
1.4.2 Recovery Time 26
1.4.3 Backup Capacity Requirements 26
1.4.4 Guaranteed Bandwidth 27
1.4.5 Reordering and Duplication 27
1.4.6 Additive Latency and Jitter 27
1.4.7 State Overhead 27
1.4.8 Scalability 27
1.4.9 Signaling Requirements 28
1.4.10 Stability 28
1.4.11 Notion of Recovery Class 28
1.5 Characteristics of Single-Layer Recovery Mechanisms 28
1.5.1 Backup Capacity Dedicated versus Shared 29


1.5.2 Recovery Paths: Preplanned versus Dynamic 30
1.5.3 Protection versus Restoration 31
1.5.4 Global versus Local Recovery 32
1.5.5 Control of Recovery Mechanisms 34
1.5.6 Ring Networks versus Mesh Networks 35
1.5.7 Connection-Oriented versus Connectionless 36
1.5.8 Revertive versus Nonrevertive Mode 36
1.6 Multilayer Recovery 36
1.6.1 Sequential Approach 38
1.6.2 Integrated Approach 38
1.7 Conclusion 38

Chapter 2 SONET/SDH Networks 39
2.1 Introduction 40
2.1.1 Transmission Networks 40
2.1.2 Management of (Transmission) Networks 42
2.1.3 Structuring/Modeling Transmission Networks 43
2.1.4 Summary 45
2.2 SDH and SONET Networks 45
2.2.1 Introduction 45
2.2.2 Structure of SDH Networks 46
2.2.3 SDH Frame Structure: Overhead Bytes Relevant for Network Recovery 48
2.2.4 SDH Network Elements 52
2.2.5 Summary 55
2.2.6 Differences between SONET and SDH 56
2.3 Operational Aspects 57
2.3.1 Fault Management Processes 58
2.3.2 Fault Detection and Propagation Inside a Network Element 60
2.3.3 Fault Propagation and Notification on a Network Level 70
2.3.4 Automatic Protection Switching Protocol 74
2.3.5 Summary 80
2.4 Ring Protection 81
2.4.1 Multiplex Section–Shared Protection Ring 83
2.4.2 Multiplex Section–Dedicated Protection Ring 91
2.4.3 Subnetwork Connection Protection Ring 93
2.4.4 Ring Interconnection 93
2.4.5 Summary 105
2.4.6 Differences between SONET and SDH 106
2.5 Linear Protection 107
2.5.1 Multiplex Section Protection 107
2.5.2 Path Protection 108
2.5.3 Summary 113


2.6 Restoration 113
2.6.1 Protection versus Restoration 113
2.6.2 Summary 115
2.7 Case Study 115
2.8 Conclusion 127
2.9 Recommended Reference Work and Research-Related Topics 129

Chapter 3 Optical Networks 131
3.1 Evolution of the Optical Network Layer 132
3.1.1 Wavelength Division Multiplexing in the Point-to-Point Optical Network Layer 132
3.1.2 An Optical Networking Layer with Optical Nodes 135
3.1.3 An Optical Network Layer Organized in Rings 135
3.1.4 Meshed Optical Networks 137
3.1.5 Adding Flexibility to the Optical Network Layer 139
3.2 The Optical Transport Network 139
3.2.1 Architectural Aspects and Structure of the Optical Transport Network 139
3.2.2 Structure of the Optical Transport Module 142
3.2.3 Overview of the Standardization Work on the Optical Transport Network 144
3.3 Fault Detection and Propagation 144
3.3.1 The Optical Network Overhead 145
3.3.2 Defects in the Optical Transport Network 152
3.3.3 OTN Maintenance Signals and Alarm Suppression 154
3.4 Recovery in Optical Networks 157
3.4.1 Recovery at the Optical Layer? 157
3.4.2 Standardization Work on Recovery in the Optical Transport Network 158
3.4.3 Shared Risk Group 159
3.5 Recovery Mechanisms in Ring-Based Optical Networks 160
3.5.1 Multiplex Section Protection in Ring-Based Optical Networks 163
3.5.2 Optical Channel Protection in Ring-Based Optical Networks 166
3.5.3 OMS- versus OCh-Based Approach 170
3.5.4 Shared versus Dedicated Approach 171
3.5.5 Interconnection of Rings 173
3.6 Recovery Mechanisms in Mesh-Based Optical Networks 173
3.6.1 Protection 175
3.6.2 Protection in a WP Network versus Protection in a VWP Network 176
3.6.3 Restoration 177
3.6.4 Protection versus Restoration 180


3.6.5 Protection Combined with Restoration 182
3.7 Ring-Based versus Mesh-Based Recovery Schemes 182
3.8 Availability 185
3.8.1 Availability Calculations 185
3.8.2 Availability: Some Observations 192
3.9 Recent Trends in Research 197
3.9.1 p-Cycles 197
3.9.2 Meta-Mesh Recovery Technique 199
3.9.3 Flexible Optical Networks 200
3.10 Conclusion 200

Chapter 4 IP Routing 203
4.1 IP Routing Protocols 204
4.1.1 Introduction 204
4.1.2 Distance Vector Routing Protocols Overview (“Bellman-Ford”) 204
4.1.3 Link State Routing Protocols Overview 207
4.1.4 IP Routing: A Global versus Local Restoration Mechanism? 213
4.2 Analysis of the IP Routing Recovery Cycle 214
4.2.1 Fault Detection and Characterization 214
4.2.2 Hold-Off Timer 214
4.2.3 Fault Notification Time 215
4.2.4 Computation of the Routing Table 215
4.2.5 An Example of IP Rerouting upon Link Failure 217
4.3 Failure Profile and Fault Detection 220
4.3.1 Failure Profiles 220
4.3.2 Failure Detection 222
4.3.3 Failure Characterization 224
4.3.4 Analysis of the Various Failure Types and Their Impact on Traffic Forwarding 225
4.4 Dampening Algorithms 226
4.5 FIS Propagation (LSA Origination and Flooding) 229
4.5.1 LSA Origination Process 231
4.5.2 LSA Flooding Process 233
4.5.3 Time Estimate for the LSA Origination and Flooding Process 237
4.6 Route Computation 237
4.6.1 Shortest Path Computation 238
4.6.2 The Dijkstra Algorithm 241
4.6.3 Shortest Path Computation Triggers 249
4.6.4 Routing Information Base Update 251
4.7 Temporary Loops during Network State Changes 252
4.7.1 Temporary Loops in the Case of a Link or Node Failure 253


4.7.2 Temporary Loops Caused by a Restored Network Element 257
4.8 Load Balancing 259
4.9 QoS during Failure 262
4.9.1 IP Traffic Engineering at Steady State 262
4.9.2 QoS Guarantee during Failure 264
4.10 Nonstop Forwarding: An Example with OSPF 266
4.10.1 Mode of Operation 267
4.10.2 Mode of Operation of the Restarting Router 267
4.10.3 Mode of Operation of the Restarting Router’s Neighbors 269
4.10.4 Backward Compatibility 269
4.11 A Case Study with IS-IS 270
4.12 Summary 278
4.13 Algorithm Complexity 279
4.13.1 Definition of Algorithm Complexity 279
4.13.2 NP Complete Problem 284
4.14 Incremental Dijkstra 285
4.14.1 Motivation 285
4.14.2 History 287
4.14.3 Algorithm Description 287
4.14.4 iSPF Efficiency 293
4.15 Interaction between Fast IGP Convergence and NSF 293
4.16 Research-Related Topics 295

Chapter 5 MPLS Traffic Engineering Recovery Mechanisms 297
5.1 MPLS Traffic Engineering Refresher 298
5.1.1 Traffic Engineering in Data Networks 298
5.1.2 Terminology 301
5.1.3 MPLS Traffic Engineering Components 303
5.1.4 Notion of Preemption in MPLS Traffic Engineering 305
5.1.5 Motivations for Deploying MPLS Traffic Engineering 306
5.2 Analysis of the Recovery Cycle 307
5.2.1 Fault Detection Time 307
5.2.2 Hold-Off Timer 308
5.2.3 Fault Notification Time 308
5.2.4 Recovery Operation Time 309
5.2.5 Traffic Recovery Time 309
5.3 MPLS Traffic Engineering Global Default Restoration 310
5.3.1 Fault Signal Indication 310
5.3.2 Mode of Operation 311
5.3.3 Recovery Time 313
5.4 MPLS Traffic Engineering Global Path Protection 314
5.4.1 Mode of Operation 315


5.4.2 Recovery Time 316
5.5 MPLS Traffic Engineering Local Protection 316
5.5.1 Terminology 316
5.5.2 Principles of Local Protection Recovery Techniques 317
5.5.3 Local Protection: One-to-One Backup 318
5.5.4 Local Protection: “Facility Backup” 320
5.5.5 Properties of a Traffic Engineering LSP 325
5.5.6 Notification of Tunnel Locally Repaired 327
5.5.7 Signaling Extensions for MPLS Traffic Engineering Local Protection 329
5.5.8 Two Strategies for Deploying MPLS Traffic Engineering for Fast Recovery 329
5.6 Another MPLS Traffic Engineering Recovery Alternative 333
5.7 Load Balancing 334
5.8 Comparison of Global and Local Protection 336
5.8.1 Recovery Time 336
5.8.2 Scalability 336
5.8.3 Bandwidth Sharing Capability 340
5.8.4 Summary 343
5.9 Revertive versus Nonrevertive Modes 346
5.9.1 MPLS Traffic Engineering Global Default Restoration 346
5.9.2 MPLS Traffic Engineering Global Path Protection 347
5.9.3 MPLS Traffic Engineering Local Protection 347
5.10 Failure Profile and Fault Detection 348
5.10.1 MPLS-Specific Failure Detection Hello-Based Protocols 348
5.10.2 Requirements for an Accurate Failure Type Characterization 349
5.10.3 Analysis of the Various Failure Types and Their Impact on Traffic Forwarding 353
5.11 Case Studies 354
5.11.1 Case Study 1 354
5.11.2 Case Study 2 359
5.11.3 Case Study 3 362
5.12 Standardization 370
5.13 Summary 371
5.14 RSVP Signaling Extensions for MPLS TE Local Protection 372
5.14.1 SESSION-ATTRIBUTE Object 372
5.14.2 FAST-REROUTE Object 374
5.14.3 DETOUR Object 375
5.14.4 Route Record Object 376
5.14.5 Signaling a Protected Traffic Engineering LSP with a Set of Constraints 378
5.14.6 Identification of a Signaled TE LSP 378
5.14.7 Signaling with Facility Backup 379
5.14.8 Signaling with One-to-One Backup 382


5.14.9 Detour Merging 384
5.15 Backup Path Computation 385
5.15.1 Introduction 386
5.15.2 Requirements for Strict QoS Guarantees during Failure 386
5.15.3 Network Design Considerations 387
5.15.4 Notion of Bandwidth Sharing between Backup Paths 392
5.15.5 Backup Path Computation: MPLS TE Global Path Protection 393
5.15.6 Backup Tunnel Path Computation: MPLS TE Fast Reroute Facility Backup 397
5.15.7 Backup Tunnel Path Computation with MPLS TE Fast Reroute One-to-One Backup 419
5.15.8 Summary 421
5.16 Research-Related Topics 422

Chapter 6 Multilayer Networks 423
6.1 ASON/G-MPLS Networks 424
6.1.1 The ASON/ASTN Framework 424
6.1.2 Protocols for Implementing a Distributed Control Plane 426
6.1.3 Overview of Control Plane Architectures (Overlay, Peer, Augmented) 432
6.2 Generic Multilayer Recovery Approaches 437
6.2.1 Why Multilayer Recovery? 438
6.2.2 Single-Layer Recovery Schemes in Multilayer Networks 439
6.2.3 Static Multilayer Recovery Schemes 444
6.2.4 Dynamic Multilayer Recovery 457
6.2.5 Summary 464
6.3 Case Studies 464
6.3.1 Optical Restoration and MPLS Traffic Engineering Fast Reroute 465
6.3.2 SONET/SDH Protection and IP Routing 469
6.3.3 MPLS Traffic Engineering Fast Reroute (Link Protection) and IP Rerouting Fast Convergence 471
6.4 Conclusion 476

Bibliography 479

List of Figure Sources 491

Index 497


Foreword

It was not all that long ago that the Internet was considered, at least by the traditional telecommunications world, a researcher’s toy. They scoffed at the notion of a “best effort” network being a useful way to support serious telecommunications such as “toll quality” voice or business critical data communications. Some of the traditional telecommunications carriers still think this way. But even these carriers are now planning on migrating their services to Internet Protocol (IP)-based networks. Some of the more dogmatic carriers are planning on building a multicarrier carrier-run IP network in parallel with the Internet, but most of the carriers are beginning to realize that the Internet itself is in the process of a transformation: a transformation that will render obsolete much of the traditional telecommunications infrastructure and thinking.

A key facilitator of this transformation is the subject of this book. Recovery from link failures in traditional IP networks can take a long time. This is because IP routing protocols (covered in Chapter 4) were not designed to ensure that network users would not experience significant outages while the routing protocol attempted to find a path around a link, SRLG (shared risk link group), or node failure. Internet research has shown that a single link failure can cause users to experience outages of many minutes, even when the underlying network architecture is highly redundant, with plenty of spare bandwidth available and multiple ways to route around the failure.

Outages of many minutes are not a real problem if the primary communication method is email. Such outages are more of an issue for web surfers, but considering the episodic nature of web surfing, most users will not even notice a 5-minute outage. On the other hand, even outages as short as a few tens of seconds can be a real problem when talking over the phone, and a 5-minute outage seems like forever. The same is true for critical business data communications such as stock trading systems, e-commerce servers, or process controllers. Future applications such as remote medical diagnostic or surgery systems will be even more demanding.

The network recovery technologies covered in this book are changing the perception and reality of the Internet. The IP, MPLS, SONET, and optical protection and restoration technologies explained in this book can be used to reduce outages resulting from link, SRLG, and node failures from minutes to subsecond durations (and in some cases to milliseconds). As these technologies continue to be deployed by major Internet service providers, even the most demanding traditional telecommunications engineer will be forced to take another look at replacing their existing network infrastructure with the Internet, or at least with Internet technology, if for no other reason than the realization that their competitors are already working on their transition plans.

This is the right book at the right time for anyone in the telecommunications business, or anyone who is dependent on the services provided by the telecommunications business and would like to understand the new Internet that is rapidly becoming the common reality.

Scott Bradner

Senior Technical Consultant and University Technology Security Officer at Harvard University


Preface

The range of services and applications that rely on communication networks is impressive: business critical communication, phone calls, emails, home banking, and even watching TV or listening to music, and this is undoubtedly just the very beginning. Because our professional and private lives are increasingly dependent on these communication services, the repercussions of a service interruption are severe. Hence, during the past few years, service providers and enterprises seeking to provide highly reliable networks have shown intensified interest in network reliability, and this trend is expected to continue. We have dedicated a very significant amount of our time during those years to understanding the challenges of network recovery and the existing and new requirements of operators and enterprises, in order to develop new technologies, standards, and network designs. We found that the time was overdue to devote a book to network recovery, and this book is the result of our experience.

Network recovery is undoubtedly a complex, fascinating, and rapidly evolving topic, essentially because of its truly multidimensional nature. Indeed, although the immediate criterion that comes to mind is convergence time (i.e., the time to recover the affected traffic), it is only one of several aspects we should consider. Throughout this book, we explore all the other dimensions that lead to choosing a particular recovery mechanism and selecting a specific network recovery design: Does the backup path offer a similar quality of service in terms of bandwidth and propagation delay? What are the consequences of maintaining extra network state? Is there any potential impact on network stability as a result of trying to restore the traffic upon failure and potentially reusing restored routes? What are the implications of adding extra complexity to the network, both in terms of engineering and network operation management? And finally, what are the cost implications in terms of additional required equipment and network backup bandwidth? All the above criteria must be carefully evaluated, because they lead to various trade-offs during the decision-making process of network recovery design. Moreover, one must admit that the emergence of new services and applications has resulted in increased complexity in terms of hardware and software equipment (indeed, it is not unusual to see a software program with millions of lines of code!). As a result, the potential for failures has increased drastically during the past several years, both in diversity and in identification complexity. Furthermore, both network convergence and the rapid growth of new applications such as Voice or Video over IP have led to building networks involving several layers. Each layer offers a large set of recovery mechanisms, which ineluctably interact when deployed at multiple layers. Hence, we devote an entire chapter to the subject of interlayer recovery, with the objective of highlighting the potential interactions between multiple recovery mechanisms operating at different layers.

Objectives

Our first objective in writing this book is to deliver a thorough reference to network recovery mechanisms available at various layers, highlighting the strengths, weaknesses, and applicability of each of them. This includes not only detailed coverage of the signaling and routing aspects, but also the delicate problem of understanding the underlying dynamics: In other words, what actually happens when a network recovery mechanism is triggered upon a network element failure? This usually involves a succession of rather complex steps, which are described and illustrated by means of an extensive set of examples throughout this book.

Our second objective is to go beyond the technical description of the possible network recovery mechanisms and include some network recovery design guidelines as well. Indeed, understanding a protection or restoration mechanism is not a sufficient knowledge base from which to develop an efficient network design, especially considering the large set of possible objectives and network constraints. Consequently, for each mechanism we incorporate a number of detailed case studies, starting with the constraints and a set of network recovery objectives, and then proposing some possible designs along with detailed explanations and possible alternatives.

Finally, although we have been involved in the design of some of the recovery mechanisms discussed, we have endeavored to describe each one objectively.

One of our main challenges in writing this book was to offer a reference in network recovery without requiring the reader to be an expert in every related layer. Hence, we have strived to make each chapter readable at multiple levels, from a basic understanding of the mechanisms in question to in-depth coverage that allows protocol designers and network architects to benefit from this material in their primary work.

Structure

We begin the book with an “advanced” introduction with the goal of providing an exhaustive definition of each concept used throughout the book. In particular, the literature related to network recovery often uses substantially disparate terms, acronyms, and definitions. To avoid confusion, we devote a chapter to reviewing each individual concept before digging into the detailed analysis of the network recovery mechanisms available at each layer.

After this general Chapter 1, the first approach investigated, in Chapter 2, is SONET-SDH, followed by the optical layer in Chapter 3, because a significant number of network recovery mechanisms available at the optical layer were inspired by SONET-SDH. Then IP routing is explored in detail: A large proportion of Chapter 4 emphasizes the routing dynamics, which play a crucial role in distributed routing environments and are usually not covered in detail in the literature. The MPLS-related recovery mechanisms are studied in Chapter 5, and the many significant developments in this area are covered in depth with numerous illustrations. Note that in most of the chapters, the reader can skip some parts related to more advanced aspects (usually situated at the end of each chapter) without compromising an overall understanding of the technology. For example, a detailed understanding of the signaling aspects of MPLS Fast Reroute is not required to understand the mechanisms in use and how they can be applied to any network.

Finally, Chapter 6 concludes the book with a discussion of interlayer network aspects and investigates the interaction between network recovery processes operating at different layers.

Each chapter contains a final section devoted to current research-related work. These sections may form the core of a future revision of this book.

Acknowledgments

We are greatly indebted to Didier Colle, Sophie De Maesschalck, and Ilse Lievens for their indispensable contributions to the writing and review of the book. We highly appreciate your continuous effort and outstanding expertise throughout the entire process!

We are of course extremely grateful to our reviewers for their thorough review of our book and their invaluable suggestions, which undoubtedly helped us in many ways to improve its content: Dave Cooper for his experience and expertise in data networks after several years as a lead architect for a large Service Provider; Stefano Previdi, probably one of the most recognized routing experts; Achim Autenrieth for his much appreciated expertise in transport network recovery mechanisms; Kevin D’Souza for his valuable suggestions and input from a large operator; and Maurice Gagnaire for his expertise as a widely recognized professor, researcher, and author.

Writing a book is a fascinating experience; this book would not have been possible without the support of several people at Morgan Kaufmann, in particular our editor, Rick Adams, and our development editor, Karyn Johnson, whose help and guidance throughout the writing of this book have been tremendous. We are also extremely grateful to our production editor, Denise DeLancey, for her outstanding professionalism and invaluable help during the production phase of this book.

Finally, we would like to encourage our readers to send comments, highlight errors or omissions, or support the writing of a second edition. Please contact us at: [email protected].

Special Acknowledgments

My first acknowledgment undoubtedly goes to my wife, Brigitte, without whose help and support I would not have succeeded either in the writing of this book or in my life. I would, of course, like to thank my two daughters, Manon and Eleonore, for their patience, understanding, and love.

I also wish to thank my company, Cisco Systems, Inc., for being part of it, and

in particular Bruce Davie, who helped me in many respects.

A special thank you to my close friend Stefano Previdi, not only for his review and expertise but also for several years of friendship.

Jean-Philippe Vasseur

Words are not adequate to thank my wife, Evelien, for her tremendous support

and understanding in everything I do. Thank you, Evelien, for being my soul mate

and the ultimate inspiration in my life.

I would also like to thank my colleagues at Ghent University for the fruitful

technical debates.

Mario Pickavet

Thank you, colleagues (especially those from the ACTS-PANEL project) and

PhD students, for the stimulating discussions on network resilience.

Thank you, Mieke, Bram, Anneleen, and Jozefien, for your love, patience, and

support.

Piet Demeester


CHAPTER 1

General Introduction

This chapter presents a general introduction to recovery mechanisms in data and telecommunications networks. Before delving into the core topic of this book, network recovery mechanisms, we present an overview of the main technologies in today’s broadband communications networks. The objective of Section 1.1 is to highlight the high-level characteristics of these networks and to refresh the reader’s knowledge. After this background introduction, we turn to the focus of the book in Section 1.2, which explains the crucial importance of network reliability, and hence of recovery mechanisms. Abstracting from the network technology, we enumerate and illustrate the different phases of traffic recovery in Section 1.3. To be able to compare various recovery mechanisms, we discuss a number of criteria in Section 1.4. The main characteristics and fundamental choices when constructing a single-layer recovery mechanism are elucidated in Section 1.5. This allows a classification of the plethora of recovery mechanisms and gives a better overview of the pros and cons of different mechanisms. Finally, Section 1.6 briefly touches on the issue of interlayer dependency of failures and recovery mechanisms.

In summary, the first section presents a technology overview, whereas the sections that follow lay the (technology-independent) foundations for the study of recovery mechanisms by explaining the terminology and highlighting the main characteristics of recovery mechanisms.

1.1 Communications Networks Today

It is a well-known fact that today’s society relies more and more on communications networks, both for professional and for recreational purposes. The volume of traffic to be conveyed by the communications network infrastructure has grown significantly. This traffic increase, which is expected to continue, is mainly due to the popularity of the Internet and all its related services [Rob01]. According to several sources (e.g., [McK00]), data traffic has already overtaken voice traffic in volume. In terms of revenue, however, voice traffic is still the most important source of income for telecom operators. A network model better optimized for carrying data traffic could thus help the operators to increase their revenues from data traffic. Moreover, several service providers have started to carry voice traffic over their data networks. This can reduce the operating cost, because operating a single network is typically less costly than running two networks.

The network model currently envisaged as the most suitable for the transport of large traffic volumes is an IP/MPLS-over-OTN multilayer model, as depicted in Figure 1.1¹ (in contrast to today’s IP [over ATM] over SONET/SDH over WDM networks). Because of the explosive growth of the Internet, future broadband communications networks will be based on the Internet Protocol (IP) to carry both data and voice traffic. To accommodate these huge amounts of traffic, the transport network will be based on Wavelength Division Multiplexing (WDM). WDM systems provide more bandwidth capacity over fiber-optic networks by increasing the number of usable channels per fiber. However, in point-to-point WDM systems, traffic must be converted to an electrical signal at each network node, which creates a bottleneck. This bottleneck can be overcome by introducing optics in the network nodes, thereby creating an Optical Transport Network (OTN). The introduction of Multi-Protocol Label Switching (MPLS) [Ros01] in the IP layer enhances the network’s capabilities (e.g., for traffic engineering, virtual private networking [VPN], or strict quality-of-service [QoS] support). In the OTN, a similar extension toward Generalized MPLS (G-MPLS [Man1]) provides opportunities for dynamic lightpath allocation, leading to an Automatic Switched Optical Network (ASON) [G807]. Of course, some kind of adaptation (e.g., based on a SONET- or SDH-like mechanism) is still needed between the IP/MPLS and the OTN layers to deal with issues like framing, flow control, and error correction.

¹See the first sections of Chapters 2, 3, 4, and 5 for more detailed descriptions of the different technologies.

Figure 1.1 Evolution in data-centric networks (past: Voice, IP, ... over ATM over SONET/SDH over point-to-point WDM; future: IP-MPLS over an adaptation layer over OTN with G-MPLS).

This book concentrates on the data-centric network model depicted in Figure 1.1, particularly on the reliability and recovery issues that arise in these networks. We mainly focus on the carrier infrastructure of the network, that is, the part of the network that carries the traffic exchanged between the various customers. The following chapters of this book contain an in-depth discussion of the various recovery mechanisms in every technology. The remainder of the current section introduces some general preliminary terminology on communications networks and may be skipped if you are familiar with communications networks.

1.1.1 Fundamental Networking Concepts

The technologies used in today’s communications networks differ in a number of fundamental characteristics, which are introduced below.

Symmetrical Versus Asymmetrical Traffic, Unidirectional Versus Bidirectional Traffic

In the literature, several definitions of symmetrical/asymmetrical and unidirectional/bidirectional traffic exist. In this book, we adopt the following definitions. Symmetrical services, like classic telephony, require the same bandwidth in each direction. If person A and person B want to have a telephone conversation, a connection is provided between A and B, with the same bandwidth from A to B as from B to A. Other services, like Web access, are inherently asymmetrical in nature: The bandwidth needed from server to client is typically much higher than that needed from client to server.

If the traffic from server to client follows a different route than the traffic from client to server, this is called unidirectional traffic. In the case of bidirectional traffic, the route from point A to point B in the network is the same as the route from point B to point A. The unidirectional or bidirectional nature of various services has a direct impact on network technologies, which were designed with one or more specific services in mind. For instance, SONET and SDH were originally designed with a focus on classic telephony, hence all connections are bidirectional. On the other hand, IP/MPLS is inherently unidirectional.
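To make the route-based definition concrete, the following Python sketch tests whether the traffic between two nodes is bidirectional in the sense used here. The routing table, the node names, and the helper `is_bidirectional` are hypothetical, and routes are compared at node level only (parallel links between the same pair of nodes are ignored).

```python
from typing import Dict, List, Tuple

# Hypothetical routes, keyed by (source, destination) and listed hop by hop.
Routes = Dict[Tuple[str, str], List[str]]


def is_bidirectional(routes: Routes, a: str, b: str) -> bool:
    """Return True if the B->A route visits the same nodes as the A->B route, reversed."""
    return routes[(b, a)] == list(reversed(routes[(a, b)]))


routes: Routes = {
    ("A", "B"): ["A", "X", "B"],
    ("B", "A"): ["B", "X", "A"],  # same route back: bidirectional
    ("A", "C"): ["A", "X", "C"],
    ("C", "A"): ["C", "Y", "A"],  # different route back: unidirectional
}

if __name__ == "__main__":
    for dst in ("B", "C"):
        kind = "bidirectional" if is_bidirectional(routes, "A", dst) else "unidirectional"
        print(f"A <-> {dst}: {kind}")
```

In a SONET/SDH network, every connection would pass this test by construction; in an IP network, each direction is routed independently, so the test can fail (for instance, with asymmetric link metrics).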

Ring Networks Versus Mesh Networks

A ring is defined as a set of nodes forming a closed loop, where each node is connected to two adjacent nodes. A ring network is composed entirely of interconnected rings (Figure 1.2). Whereas the traffic in a mesh network is routed through the network without such topological restrictions, all traffic in a ring network is routed from ring to ring. (See Chapters 2 and 3 for typical examples of SONET/SDH networks and OTN networks, respectively.)

Circuit Switching Versus Packet Switching

A distinction can be made with respect to the atomic entity (i.e., the smallest indivisible portion) that is switched in a network node. In circuit switching, all information is transported through the network via circuits (i.e., paths with a fixed available bandwidth). In packet switching, all information is split into packets, and these packets are sent one by one through the network. Every node reads the header or label of every incoming packet to determine where the packet should be forwarded. Because packets occupy capacity only when they are transmitted, packet switching allows for statistical multiplexing and hence is usually more efficient than circuit switching in terms of bandwidth usage. On the other hand, packet switching requires more processing in the network nodes, because the packets must be processed and switched one by one.
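The bandwidth advantage of statistical multiplexing can be illustrated with a small calculation. The Python sketch below compares the capacity a link needs when each bursty source gets its own circuit (sized for that source's peak rate) with the capacity needed when the sources share the link as packet streams (sized for the peak of the aggregate rate). The traffic traces are hypothetical, and the sketch assumes no buffering; real packet switches buffer traffic, which can lower the required capacity further at the cost of delay or loss.

```python
from typing import List, Sequence


def circuit_capacity(traces: Sequence[Sequence[float]]) -> float:
    """Circuit switching: each source reserves its own peak rate for the whole session."""
    return sum(max(trace) for trace in traces)


def packet_capacity(traces: Sequence[Sequence[float]]) -> float:
    """Packet switching without buffering: the shared link must carry the peak aggregate rate."""
    aggregate = [sum(rates) for rates in zip(*traces)]
    return max(aggregate)


# Hypothetical per-time-slot rates (e.g., in Mbit/s) of three bursty sources.
traces: List[List[float]] = [
    [10, 0, 0, 2],
    [0, 10, 1, 0],
    [1, 0, 10, 0],
]

if __name__ == "__main__":
    print("circuit-switched capacity:", circuit_capacity(traces))
    print("packet-switched capacity: ", packet_capacity(traces))
```

Because the bursts of the three sources rarely coincide, the shared packet link needs only a fraction of the sum of the per-circuit peaks.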

Connection Oriented Versus Connectionless

Switching techniques can also be categorized as connection oriented or connectionless. In connection-oriented networks, an end-to-end connection must be established before the start of each communication session. After the session, the connection is closed. In connectionless networks, communication can occur without having to establish any kind of connection. For instance, circuit-switched networks (e.g., based on SONET/SDH or OTN) only support connection-oriented operation.

Figure 1.2 Example of a ring (left) and a ring network (right), with Rings 1, 2, and 3 and a working path.



Shared multiple access approaches (e.g., Ethernet) do not involve the concept

of connection and are hence connectionless. Packet-switched networks can operate

in connectionless (e.g., IP networks) or connection-oriented mode (e.g., ATM

networks). Hybrid forms are also possible; for example, MPLS has both a

connectionless and a connection-oriented form (see Chapter 5 for a detailed

description).

1.1.2 Layered Network Representation

A communications network usually consists of a number of heterogeneous network elements, performing a large variety of communication functions. To ensure the compatibility and “interworking” of these elements, several reference models have been developed by various standardization bodies. Some examples are ITU-T Recommendation G.805 on the generic functional architecture of transport networks [G805], the OSI model in ITU-T Recommendation X.200 [X200], and the TCP/IP protocol stack [Soc91] from the Internet Engineering Task Force (IETF).

The latter model is shown in Figure 1.3. It divides the functions or tasks required for communication in IP networks into five layers. The lowest layer, or Layer 1, is the physical layer, which deals with the transmission of an unstructured bit stream over a physical link (e.g., optical fiber, coaxial cable, or wireless). It covers the characteristics needed to establish, maintain, and deactivate a physical link, such as bit duration and signal voltage swing. Layer 2, or the data link layer, attempts to make the physical link reliable and provides the means to activate, maintain, and deactivate the link. The main service provided to the higher layers is error detection and control. This implies that with full Layer 2 functionality, the next higher layer may assume virtually error-free transmission over the link. Layer 3 is the network layer, which provides the upper layers with independence from the data transmission and switching technologies used to connect systems. In TCP/IP networks, Layer 3 is realized by IP, which relieves Layer 4 of the need to know about the underlying data transmission and switching. The purpose of Layer 4, the transport layer, is to provide a reliable mechanism for the transparent exchange of data between endpoints. In TCP/IP networks, this is realized by the Transmission Control Protocol (TCP), a connection-oriented protocol that provides end-to-end error recovery and flow control and ensures that data units are delivered error free, in sequence, and without duplications or losses. The User Datagram Protocol (UDP) forms a connectionless alternative. The highest layer is the application layer, providing a means for applications to exchange information. Some typical examples of application layer protocols in TCP/IP networks are the Hypertext Transfer Protocol (HTTP, used for access to the Web), the File Transfer Protocol (FTP) to upload and download large files, the Simple Mail Transfer Protocol (SMTP) for email, and so on.

Figure 1.3 TCP/IP protocol stack (from bottom to top: 1 physical layer, 2 data link layer, 3 network layer, 4 transport layer, 5 application layer).

In practice, however, broadband communications networks typically consist of several technologies (e.g., OTN, SONET or SDH, ATM, IP/MPLS, etc.), where every technology can cover functions from different layers in the TCP/IP protocol stack. The main drivers leading to a multitechnology network are that each technology has its strengths and weaknesses (depending on the traffic type, the user requirements, etc.) and the historical evolution of a communications network where legacy equipment is used for as long as possible.

To visualize multitechnology networks in a comprehensible manner, a layered

representation is very helpful [G805], and this type of representation will be used

throughout the book. Every network technology corresponds to a network layer,

where the successive network layers usually have a client-server relationship. To

illustrate this concept, the example of a small IP-over-OTN network is shown in

Figure 1.4.

The OTN layer, consisting of five optical cross-connects (OXCs A, B, C, D, and E) that are interconnected via optical fibers, represents the physical topology of the network (lower plane of Figure 1.4). It serves as a transport network to the IP layer. For instance, the link between IP routers b and c will be realized in the OTN layer as a lightpath (i.e., a bandwidth pipe corresponding to one wavelength channel) B-D-C or B-A-E-C. In a similar way, the IP links a-b, a-c, b-d, and c-d are realized, leading to the logical topology of the IP layer (upper plane of Figure 1.4). From an IP point of view, only the logical topology of the IP layer is visible, irrespective of the exact realization of the IP links in the underlying transport network.
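Because the IP layer sees only its logical topology, a single fiber cut in the OTN layer can silently take down several IP links at once. The Python sketch below models this client-server relationship as a mapping from IP links to OTN lightpaths and lists the IP links affected by a given fiber cut. Only the route B-D-C for IP link b-c (and its alternative B-A-E-C) comes from the text; the lightpaths of the other IP links, the assumption that router x is attached to OXC X, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Fiber = FrozenSet[str]  # an undirected OTN fiber between two OXCs


def fiber(u: str, v: str) -> Fiber:
    return frozenset((u, v))


def fibers_of(path: List[str]) -> Set[Fiber]:
    """Fibers traversed by a lightpath given as a sequence of OXCs."""
    return {fiber(u, v) for u, v in zip(path, path[1:])}


# Hypothetical realization of the IP links as OTN lightpaths (only b-c is from the text).
lightpaths: Dict[str, List[str]] = {
    "a-b": ["A", "B"],
    "a-c": ["A", "E", "C"],
    "b-c": ["B", "D", "C"],
    "b-d": ["B", "D"],
    "c-d": ["C", "D"],
}

# Same mapping, but with IP link b-c realized along the alternative lightpath B-A-E-C.
alternative: Dict[str, List[str]] = {**lightpaths, "b-c": ["B", "A", "E", "C"]}


def affected_ip_links(cut: Fiber, mapping: Dict[str, List[str]]) -> List[str]:
    """IP links whose underlying lightpath traverses the cut fiber."""
    return sorted(link for link, path in mapping.items() if cut in fibers_of(path))


if __name__ == "__main__":
    print(affected_ip_links(fiber("B", "D"), lightpaths))
    print(affected_ip_links(fiber("B", "D"), alternative))
```

With the first mapping, cutting fiber B-D breaks both IP links b-c and b-d; routing b-c along B-A-E-C removes that shared dependency, which is exactly the kind of interlayer information the IP layer cannot see on its own.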

Of course, a realistic carrier-class network will typically be much more complex

than the situation shown in Figure 1.4: The network can contain hundreds of

nodes instead of only a few, often separated in multiple domains (e.g., autonomous

systems), possibly with different routing protocols in the different domains, and

so on.

1.1.3 Network Planes

To identify the main functional entities in a communications network, we make a distinction between the following (see also Figure 1.5 [I321]):

• The data or user plane transfers user information (also called the payload) through the network. Every network layer has its own user plane.

• The control plane handles, for example, signaling for connection setup, supervision, and teardown, as well as routing table updates, by transferring control information through the network. Every network layer has its own control plane. A control plane typically functions in a distributed way across the network. Typical examples are the telephony control plane (e.g., based on Signaling System 7 [SS7]) and the IP control plane.

• The management plane consists of two parts: a layer management for each network layer and a plane management to ensure the correct coordination between the different layers. A management plane usually operates in a centralized way, a typical example being the Telecommunication Management Network (TMN) [M30000].

Figure 1.4 IP-over-OTN network (IP layer with routers a, b, c, and d over an OTN layer with OXCs A, B, C, D, and E; a working path and a recovery path are indicated).

Figure 1.5 Protocol reference model (control plane and user plane spanning the layers from the physical layer up to the highest layer, with a management plane comprising plane management and layer management). (ITU-T Recommendation I.321, “B-ISDN Protocol Reference Model and its Application,” April 1991. Available at www.itu.int. Accessed May 2004.)

1.2 Network Reliability

Communications networks are subject to a wide variety of unintentional failures caused by natural disasters (earthquakes, fires, and floods), wear-out, overload, software bugs, human errors, and so on, as well as intentional failures caused by maintenance actions or sabotage. Such failures affect network facilities such as transmission or switching infrastructure, whose failure in turn disrupts communication services for business and residential users.

Communication services play an indispensable role in many of the social and economic activities of our daily lives [Dem99]. For instance, telephone services serve as a lifeline, and their interruption (even if only temporary) causes social turmoil and unrest. Strategic corporate functions also show an increasing dependence on communication services. For business customers, disruption of communication can suspend critical operations. This may cause a significant loss of revenue for the customer, to be reclaimed from the communications provider. In fact, availability guarantees (and compensations if these are not met) now form an important component of service-level agreements (SLAs) between provider and customer. Moreover, the provider is often the largest customer of its own communication services (e.g., an incumbent network operator relies heavily on its own transport infrastructure). For all parties’ sake, it is thus imperative to provide a high level of service availability. This relies on the permanence of those network functions required to make the communication services run.

Before going into more detail on the kind of failures that can happen in

a communications network and their impact on various services, we define

some terms commonly used in the context of network reliability. This terminology

is used throughout the book. Section 1.2.2 elaborates on typical examples

of network failures, to give you a better overview of what can disturb the

proper functioning of a communications network. The impact of these failures on

the plethora of services to be supported by the network is discussed in Section

1.2.3.

Considering the drastic nature of some failures and their unacceptable impact on crucial communication services, a wide variety of measures are taken to overcome or alleviate the burden caused by frequently occurring failures. Section 1.2.4 presents a general overview of these measures, describing how they function as well as their advantages and disadvantages.



1.2.1 Definitions

With respect to network reliability, a number of similar but slightly different terms are used. Network element reliability is defined as the probability that a network element (e.g., a node or a link) is fully operational throughout a certain time frame [E800]. Availability is the instantaneous counterpart of reliability: Network element availability is the probability that a network element is operational at one particular point in time. A simple numerical example is given in Figure 1.6, which shows the availability of each network node and link. If we assume that the failures of these elements are mutually independent (i.e., they do not have a shared origin), we can easily calculate the availability of a complete network path from these numbers as the product of the availabilities of all network elements along the path. For example, the availability of the path shown in the figure amounts to

$$0.99996 \times 0.9997 \times 0.99999 \times 0.9998 \times 0.99997 \times 0.9999 \times 0.99995 \times 0.9995 \times 0.99990 \approx 0.9987$$

If the failures are not mutually independent (e.g., two links can fail simultaneously as a result of a single network element failure), this simple product no longer applies, and the correlation between failures must be taken into account when computing the availability of the path.
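The product rule for a path of independently failing elements is easy to evaluate programmatically. The Python sketch below reproduces the numerical example above; the function name `path_availability` is hypothetical, and the list simply repeats the nine factors of the product in the text (the text does not specify which node or link of Figure 1.6 each factor belongs to).

```python
from typing import Iterable


def path_availability(element_availabilities: Iterable[float]) -> float:
    """Availability of a path whose elements fail independently: the product of their availabilities."""
    result = 1.0
    for a in element_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must lie in [0, 1], got {a}")
        result *= a
    return result


# The nine factors of the product given in the text (nodes and links along the path).
path = [0.99996, 0.9997, 0.99999, 0.9998, 0.99997, 0.9999, 0.99995, 0.9995, 0.99990]

if __name__ == "__main__":
    print(round(path_availability(path), 4))  # 0.9987
```

Note that the path availability is dominated by its weakest elements (here the factors 0.9995 and 0.9997), which is one reason why recovery mechanisms are needed rather than relying on highly available individual components.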

For a detailed example of availability calculations in a WDM-based network,

see Chapter 3.

Figure 1.6 Network with node and link (italics) availability numbers.

Whereas the aforementioned definitions concentrate on the statistical behavior of the network elements, a number of definitions represent the capabilities of the network as a whole. Network integrity is the ability of a network to provide the desired QoS to the services, not only in normal (i.e., failure-free) network conditions, but also when network congestion or network failures occur [Wu97].

Network survivability is a subset of integrity; it is the ability of a network to recover

the traffic in the event of a failure, causing few or no consequences for the users

[G841]. Because it is impossible for a network to be completely survivable (e.g., in

the case of dramatic events like major earthquakes, causing multiple failures in the

network), we use the degree of survivability to denote the extent to which a network

is able to recover from single and multiple network failures (considering the

probability of each individual failure to occur).

Failure terminology, too, comes with many terms and slightly different interpretations in the literature. In this book, we use the following convention [M20]:

• A network element defect is a decrease in the ability of a network element to perform a required function. For instance, a link defect may cause poor link quality (indicated by an increased bit error rate), leading to error detection and resulting in packet/frame retransmissions.

• A network element failure is the termination of the ability of a network element to perform a required function. Hence, a network failure happens at one particular moment. For example, a cable cut by an excavator is a network failure. Note that in practice, some failures do not happen instantaneously: a network element may exhibit a gradual degradation. The time of the failure is then defined as the moment the degradation reaches an unacceptable level.

• The inability of a network element itself to perform a required function is called a fault or outage. This fault lasts until the network element is repaired, implying that a network fault covers a time interval, in contrast to a network failure.

These definitions are also illustrated in Figure 1.7.

A further distinction is made between the original failure and the failures that occur as a consequence of the original failure. A root failure or primary failure is the basic, original failure occurring in the network (e.g., a cable cut). This root failure can cause many other failures to occur, the so-called secondary failures or symptoms. For example, when a cable is cut (root failure), many secondary failures such as interrupted connections in higher network layers occur.

Figure 1.7 Failure-and-repair process (timeline showing operational periods, defects 1 and 2, the failure, the resulting fault during which the element is not operational, and the repair).



Figure 1.8 shows a typical example of a SONET network that illustrates the terminology just introduced. At time t1, the cable between SONET digital cross-connect (DXC) A and regenerator R is cut (root failure). A few milliseconds later (t2), light no longer reaches regenerator R (secondary failure). Some time later (t3 and t4), DXC B and IP router b no longer receive a signal (secondary failures). Thanks to the built-in recovery mechanisms, an alternative route between DXCs A and B via DXC C is found in the SONET network, and the traffic is rerouted along that path, A-C-B (ending the fault status in DXC B and, a bit later, in router b). Meanwhile, repair crews are mending the cable that was cut. After some time, the cable is fixed, and a few moments later light again enters regenerator R.
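The propagation from a root failure to its secondary failures can be sketched as a traversal of a dependency graph. The following Python sketch encodes the scenario of Figure 1.8 as a hypothetical `feeds` relation (an entry X: [Y] means that a failure of X causes a loss of signal at Y) and lists the secondary failures in order of their distance from the root failure; this ordering is a modeling simplification, not a statement about actual detection times. A second graph models the situation after the traffic has been rerouted along A-C-B, when only regenerator R remains affected.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency graph for Figure 1.8 before recovery.
feeds: Dict[str, List[str]] = {
    "cable A-R": ["regenerator R"],
    "regenerator R": ["DXC B"],
    "DXC B": ["router b"],
    "router b": [],
}

# After rerouting along A-C-B, DXC B no longer depends on regenerator R.
rerouted: Dict[str, List[str]] = {**feeds, "regenerator R": []}


def secondary_failures(root: str, graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first traversal from the root failure; returns the symptoms it causes."""
    seen = {root}
    symptoms: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                symptoms.append(nxt)
                queue.append(nxt)
    return symptoms


if __name__ == "__main__":
    print("before recovery:", secondary_failures("cable A-R", feeds))
    print("after rerouting:", secondary_failures("cable A-R", rerouted))
```

The distinction matters for recovery: only the root failure needs repairing, whereas the symptoms in higher layers disappear as soon as the traffic is rerouted around it.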

During the life cycle of a network element, several failures may occur. This leads to an alternation of operational and fault states. To capture the temporal behavior of a network element in a probabilistic way, we use two parameters [G911]:

• The mean time between failures (MTBF) specifies the average length of the time interval that elapses between two subsequent failures of the same network element.

• The mean time to repair (MTTR) refers to the average time needed to repair the network element when it has failed.

From these two values, the availability of the network element can be

derived as

$$A = 1 - \frac{\mathrm{MTTR}}{\mathrm{MTBF}}$$

(under the assumption that the MTBF is much larger than the MTTR, a safe

assumption for most network elements).
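The following Python sketch evaluates this formula and, for comparison, the closely related expression MTBF/(MTBF + MTTR), which is also in common use; the two differ only by a second-order term in MTTR/MTBF when MTBF is much larger than MTTR. The function names are hypothetical, and the input values are the lower MTBF bound and the typical MTTR of an IP interface card from Table 1.1. The sketch also converts availability into expected downtime per year, a figure often quoted in SLAs.

```python
def availability_approx(mtbf_h: float, mttr_h: float) -> float:
    """A = 1 - MTTR/MTBF, the expression given in the text."""
    return 1.0 - mttr_h / mtbf_h


def availability_ratio(mtbf_h: float, mttr_h: float) -> float:
    """Alternative expression A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per (non-leap) year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    # Lower MTBF bound and typical MTTR of an IP interface card (Table 1.1).
    mtbf, mttr = 1e4, 2.0
    a1 = availability_approx(mtbf, mttr)
    a2 = availability_ratio(mtbf, mttr)
    print(f"1 - MTTR/MTBF      = {a1:.8f}")
    print(f"MTBF/(MTBF + MTTR) = {a2:.8f}")
    print(f"downtime: about {annual_downtime_minutes(a1):.0f} minutes per year")
```

For these values, the two expressions agree to about seven decimal places, which illustrates why the simpler formula is acceptable when MTBF is much larger than MTTR.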

Figure 1.8 Example of failure terminology (routers a and b, DXCs A, B, and C, and regenerator R; timeline: cable cut at t1, no light at R at t2, no signal at DXC B at t3, no signal at router b at t4).



1.2.2 Which Failures Can Occur?

The causes of a network failure are quite diverse. An initial distinction can be made between planned and unplanned outages. A planned outage is caused by operational or maintenance actions intentionally performed by the operator, for instance, to change the software version of a network element (if this change operation is disruptive) or to add or remove a network element. Because planned outages are known in advance, preventive techniques can be quite effective against them; for example, measures can be taken at the service level, customers can be notified in advance, operational and maintenance procedures can be timed to cause minimal impact (e.g., during the night or a weekend), and so on. In contrast, unplanned outages are by definition difficult to predict, and the operator must therefore prepare an arsenal of “defensive” measures against them.

A second distinction can be made between internal and external causes, depending on whether the failure is caused by a network-internal imperfection or by some surrounding event. Examples of internal causes are design errors, defects of electronic or optical components, a battery failure, and so on. The failure could also have an external cause such as a power outage, lightning, storm, earthquake, flood, digging accident, vandalism, or sabotage.

Failures are not restricted to hardware components. Today, advanced communication technologies and services show an increasing dependence on information technology (IT) systems, and software in particular. Typical examples of such failures are software bugs, configuration errors (e.g., an essential protocol is not enabled), routing errors (e.g., some routes are missing in the routing table), and hacker attacks. Hence, the reliability of software systems also affects network integrity. Equipment vendors are devoting substantial efforts to increasing the quality of software. Nevertheless, the measurement and control of software quality (and software performance in general) is a relatively young branch of science. Because it is impossible to simulate all possible events that could occur in a communications network, latent software defects are very difficult to detect in advance.

Commonly Occurring Failures

One of the most common failures in communications networks is a fiber-optic cable cut, and the frequency of cable cuts is related to the length of the link. Of course, the vulnerability of a cable also depends on the terrain (e.g., urban areas where a lot of civil works are carried out) and on the preventive measures taken to reduce the vulnerability of the cable system (e.g., armored casings). Most operators have a long history of cable-cut records and know more or less how many cable cuts they can expect on average per year for their long-distance links. Typical MTBF values range between 50 and 200 days per 1000 km of cable ([DeM03], [Wil01], [Bat02], [Jur98]). The time to repair the cable consists mainly of the time it takes to locate and reach the cut, because mending the cable is a relatively quick procedure. Usually, the repair team monitors the performance of the fibers before they are returned to operation. Note that most cable cuts are due to digging activities that accidentally hit the cable, so the repair can often start quickly. However, in some cases reaching the location can be quite time consuming (e.g., submarine cables). Depending on the location and the severity of the damage, MTTR values typically range from hours to several weeks ([DeM03], [Wil01], [Jur98]).
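Because cable-cut statistics are quoted per 1000 km, the MTBF of a specific cable route depends on its length. The Python sketch below assumes, as a simplifying model, that the cut rate grows linearly with length, so a route of L km has an MTBF equal to the per-1000-km value times 1000/L; it then applies A = 1 - MTTR/MTBF. The function names, the route lengths, and the 12-hour MTTR are hypothetical; only the 50 to 200 days per 1000 km range comes from the text.

```python
HOURS_PER_DAY = 24.0


def cable_mtbf_hours(length_km: float, mtbf_days_per_1000km: float) -> float:
    """MTBF of a cable route, assuming the cut rate grows linearly with its length."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    return mtbf_days_per_1000km * HOURS_PER_DAY * 1000.0 / length_km


def cable_availability(length_km: float, mtbf_days_per_1000km: float, mttr_hours: float) -> float:
    """A = 1 - MTTR/MTBF for a single cable route."""
    return 1.0 - mttr_hours / cable_mtbf_hours(length_km, mtbf_days_per_1000km)


if __name__ == "__main__":
    mttr = 12.0  # hypothetical repair time in hours
    for mtbf_days in (50, 200):  # bounds of the range quoted in the text
        for length in (100, 500, 1000):
            a = cable_availability(length, mtbf_days, mttr)
            print(f"{length:5d} km, MTBF {mtbf_days:3d} d/1000 km: A = {a:.5f}")
```

Under these assumptions, a 1000 km route with an MTBF of 50 days per 1000 km reaches only A = 0.99, illustrating why long-distance links in particular call for recovery mechanisms.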

Besides cable cuts, equipment failures also frequently occur. Unlike with cable cuts, however, it is difficult to collect realistic and accurate MTBF and MTTR values for equipment failures. Because of the rapid technological evolution, operators have little practical experience with most of the relatively novel equipment. Some typical MTBF ranges for IP, ATM, SONET/SDH, and WDM equipment are shown in Table 1.1 ([DeM03], [Wil01], [Wos01], [Lab99], [Kal96]).

The MTTR for equipment failures largely depends on the urgency, which itself depends on the amount and priority of traffic passing through the failing equipment. Some typical MTTR values for urgent faults are also shown in Table 1.1. In practice, the actual mending of the equipment may take only a few hours. However, the time spent getting spare parts to the site of the failed equipment must be added to this repair time. Major exchanges such as backbone offices usually keep spare parts (for critical equipment) on site. Other sites rely on a central depot of equipment parts; the transportation of spare parts then delays the repair. Hence, the total repair time can be much longer, in the range of 4 hours to several days for remote sites. In addition, the repair time depends on the ability to detect the defect. Some defects cause only QoS degradation and might be difficult to detect.

More Drastic Examples

Besides the more commonly occurring failures described above, a communications network can be damaged by more severe causes, leading to major disruptions in network availability. For example, in the United States, all network faults affecting at least 30,000 users for at least 30 minutes must be reported to the Federal Communications Commission (FCC) [McC95]. In the following sections, we describe some major network disruptions, highlighting the extent of the damage, the measures taken by the network operator to keep critical traffic alive, and the time it took to repair the failures and to return to normal traffic conditions.

Table 1.1 Typical MTBF and MTTR Values for Communication Equipment Failures

Equipment Type         MTBF Range (hr)     Typical MTTR (hr)
Web Server             10^4 to 10^6        1
IP Interface Card      10^4 to 10^5        2
IP Router Itself       10^5 to 10^6        2
ATM Switch             10^5 to 10^6        2
SONET DXC or ADM       10^5 to 10^6        4
SDH DXC or ADM         10^5 to 10^6        4
WDM OXC or OADM        10^5 to 10^6        6

Hanshin/Awaji Earthquake

At 5:46 am on January 17, 1995, the Japanese city of Kobe was hit by the Hanshin/Awaji earthquake (recorded as 7.2 on the Richter scale) [Kal96]; 5379 people died, 34,626 were injured, and the material damage in the city was immense. The earthquake caused widespread severe physical damage to the Hanshin area, particularly Kobe City. The damage was very localized; in many cases adjacent buildings were affected differently, and sometimes half of a building was virtually undamaged while the other half was severely affected.

All utilities such as electricity, gas, water, and telecommunications were completely disrupted (Figure 1.9). The total damage for the Japanese telephone company NTT was estimated at 30 billion yen (about U.S. $200 million). Eleven local telephone switches were put out of action for more than 24 hours (mainly because of lack of power), which cut off as many as 285,000 subscriber lines (about 20% of the subscriber lines in the Hanshin area). More than 60,000 local transmission lines were affected, as well as signaling systems, billing systems, and other network databases. It took several hours merely to identify which network infrastructure the earthquake had destroyed. Long-distance communications services in Japan were not directly hindered by the disaster, because of the automatic protection mechanisms installed in NTT’s core network.

Figure 1.9 Damage caused by the Hanshin earthquake (left: broken pole; right: extracted conduits). (G. Kalbe et al., “Operator requirements,” European ACTS project Protection Across Network Layers [PANEL], deliverable D1, December 1996.)

As a consequence of the earthquake, calling patterns within, into, and out of the Hanshin area changed drastically. Peak-hour traffic volume on January 17 was 20 times the normal volume. This resulted in severe network congestion, which NTT handled by limiting the number of nonpriority calls per minute in major telephone switches. NTT also provided 5000 emergency transmission lines to meet the critical communications needs in the region. NTT undertook intensive repair actions and managed to repair more than 50% of the damage within 2 weeks.

Erroneous Software Update in AT&T Network

In April 1998, AT&T suffered a catastrophic nationwide failure of most of its

frame-relay network [Gry01]. During the outage, more than 5000 corporations

were unable to complete network-based business operations. For example, retailers

were unable to authorize credit-card payments and financial institutions could not

complete transactions.

When the outage was detected, AT&T engineers focused first on identifying and isolating the problem. They found that it had been caused by a command to upgrade the software in a circuit card of one of the network switches. The upgrade was performed but malfunctioned; this created a faulty communication path, which generated a large volume of administrative messages to the other network switches. As a result, these switches became overloaded and stopped routing data from customers' applications. The outage lasted from 6 to 26 hours before the network was fully restored.

AT&T provided selected customers (in particular, those with critical applications) with updates every 15 or 20 minutes throughout the crisis. Although many large corporate users had backup systems in place, analysts and customers agree that an incident like this can prove devastating for companies without a contingency plan. In fact, many smaller companies were left completely without communications until the outage was rectified. After 24 hours, about 96% of the affected services were reestablished.

Submarine Cable Break

On July 5, 2002, a multiple cable failure affected the Asia Pacific Cable Network (APCN 2), the submarine system that connects the Philippines to the Internet [Lem02]. APCN 2 is a 19,000-km underwater fiberoptic cable system that stretches from Japan to Singapore. It covers major countries and territories in Asia, including China, South Korea, Hong Kong, Japan, Malaysia, Taiwan, and the Philippines. The network had been operational since December 21, 2001.

The failure caused a considerable slowdown of the affected operators' services but did not disrupt them completely. Because of poor weather conditions, the repair was delayed; the network was completely repaired on July 16.



Deriving Accounted Failure Scenarios

It is practically impossible to provide measures against all possible failures in a communications network. The impact of some dramatic failures, such as those caused by major earthquakes, is simply too great, whereas other failures are too rare to justify the extra budget needed to cover them. Therefore, a practical strategy followed by most operators and service providers is to identify the most frequently occurring failures, to classify these failures into a limited set of failure scenarios, and to provide “healing” measures to overcome these failures in a graceful, cost-effective manner. In the remainder of this book, these failures are called accounted failures; no measures are provided for unaccounted failures. We elaborate on the possible preventive measures in Section 1.2.4, but for now we concentrate on the definition of the accounted failure scenarios.

When considering the physical network layer, where cable cuts and equipment

failures typically represent the most common failures, most operators consider two

accounted failure scenarios:

1. A single-link failure is a situation in which the link between two adjacent

offices fails. As a consequence, no direct information exchange between

these two offices is possible (until the fault gets repaired). This is illustrated

in Figure 1.10.

Note that measures to heal a link failure will automatically be able to heal smaller

failures affecting only part of a link (e.g., only one direction of a bidirectional SDH

link or only one failing laser of a WDM line system). Stated otherwise, the single-

link failure scenario encompasses all single sublink failure scenarios.

2. A single-node failure is a situation in which a network element in an office fails (e.g., a hardware equipment failure or a software crash). As illustrated in Figure 1.11, the failure of a single node automatically puts all attached links out of service. Note that a node might be out of service while its neighbor nodes still see the attached links as operational.

Figure 1.10 Single-link failure scenario.

Like the single-link failure scenario, a single-node failure scenario encompasses all single subnode failure scenarios (e.g., the failure of only part of the exchange office).2

The focus on the single-link or single-node failure scenario is based on two

main assumptions:

. In most cases, the failure of a link or node in the network is statistically

independent of the failure of another link or node in the network (assuming

that dramatic outages affecting large parts of the network, such as earth-

quakes, are very unlikely).

. If the network scale is not too large, the MTTR for a single-link or single-

node failure is typically much shorter than the MTBF. Hence, the probabil-

ity that two (link or node) faults are overlapping in time can be neglected in

comparison to the probability of a single-link or single-node fault.
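The second assumption can be checked with a short calculation. Under a simplifying model of our own (not taken from the text), n identical elements fail and are repaired independently, each being unavailable a fraction u = MTTR / (MTBF + MTTR) of the time. The sketch compares the probability that exactly one element is down with the probability that two or more are down at the same moment; the MTBF of 10^5 hours and MTTR of 4 hours are picked from the ranges in Table 1.1 for illustration only.

```python
def fault_probabilities(n_elements: int, mtbf_hr: float, mttr_hr: float):
    """Return (P[exactly one element down], P[two or more elements down]).

    Assumes n identical elements that fail and are repaired independently.
    """
    u = mttr_hr / (mtbf_hr + mttr_hr)  # steady-state unavailability of one element
    a = 1.0 - u
    p_none = a ** n_elements
    p_single = n_elements * u * a ** (n_elements - 1)
    p_multi = 1.0 - p_none - p_single
    return p_single, p_multi


for n in (100, 10_000):
    p1, p2 = fault_probabilities(n, mtbf_hr=1e5, mttr_hr=4)
    print(f"n={n:6d}: P(single)={p1:.2e}  P(multiple)={p2:.2e}  ratio={p2 / p1:.3f}")
```

With 100 elements, overlapping faults are about 500 times less likely than a single fault; with 10,000 elements the ratio grows to roughly 0.2, which illustrates why the assumption is restricted to networks whose scale is not too large.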

The situation gets more complicated when considering a logical network layer.

The importance of single-link or single-node failures remains. For instance, in an IP

network layer, this corresponds to an IP link disruption and an IP router break-

down, respectively.

However, the malfunctioning of the IP network layer could also be caused by an unrecovered failure in a lower network layer (e.g., a cable cut in the physical layer), leading to multiple simultaneous link failures in the IP network layer. To model such a failure scenario, the IETF [Pap02] has defined the concept of a shared risk link group (SRLG), or more generally a shared risk group (SRG), that is, a group of resources that are affected by the same failure. In contrast to the single-link or single-node failure scenario, in which the failures of individual links or nodes are considered statistically independent, the SRG concept expresses a statistical dependence between the failures of individual links or nodes.

Figure 1.11 Single-node failure scenario.

2 In fact, even a single-link failure scenario could be seen as a component of a single-node failure scenario. However, the distinction between a single-link and a single-node failure is important when considering recovery techniques for both failure scenarios (see Section 1.2.4).
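The practical consequence of SRLGs is that two paths can be link disjoint in the IP layer and still share a single point of failure in a lower layer. The sketch below checks both properties for a pair of paths. The SRLG identifiers (such as "duct-1") and the topology are hypothetical, chosen only to show the difference.

```python
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[str]


def link(a: str, b: str) -> Link:
    """Undirected link identifier."""
    return frozenset((a, b))


def path_links(path: List[str]) -> Set[Link]:
    return {link(u, v) for u, v in zip(path, path[1:])}


def path_srlgs(path: List[str], srlg_map: Dict[Link, Set[str]]) -> Set[str]:
    """Union of the shared risk link groups of all links along the path."""
    groups: Set[str] = set()
    for lk in path_links(path):
        groups |= srlg_map.get(lk, set())
    return groups


def link_disjoint(p1: List[str], p2: List[str]) -> bool:
    return not (path_links(p1) & path_links(p2))


def srlg_disjoint(p1: List[str], p2: List[str], srlg_map: Dict[Link, Set[str]]) -> bool:
    return not (path_srlgs(p1, srlg_map) & path_srlgs(p2, srlg_map))


# Hypothetical example: IP links A-B and A-C leave office A through the same duct.
SRLG_MAP = {
    link("A", "B"): {"duct-1"},
    link("A", "C"): {"duct-1"},
    link("B", "D"): {"duct-2"},
    link("C", "D"): {"duct-3"},
}
working, recovery = ["A", "B", "D"], ["A", "C", "D"]
print(link_disjoint(working, recovery))            # True
print(srlg_disjoint(working, recovery, SRLG_MAP))  # False
```

A single cut of the duct shared by A-B and A-C would bring down both paths, even though they have no IP link in common.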

1.2.3 Reliability Requirements for Various Users and Services

The reliability requirements of a communications network depend strongly on the types of users and the types of services transported through the network.

User Types

The users can typically be classified into the following categories [OSh94]:

. Safety critical users (e.g., hospitals, police, and fire department): For safety

reasons, this group should have communications services at all times.

Service interruptions are unacceptable, especially during fault conditions

(e.g., accessibility of emergency services after an earthquake).

. Business critical users: This group suffers considerable financial losses when

service interruptions occur.

. Low cost users: This group consists of residential users demanding reasonably reliable communications services at a relatively low cost. Service interruptions cause discomfort but can be tolerated if they are not too frequent.

. Basic level users: This group receives the lowest level of support; service reliability is only a side issue. The service provider supports these users as long as the network is failure free; in case of a failure, bandwidth is taken away from these basic level users to transport services of the first three user categories. For instance, if a failure occurs in an IP network, some critical traffic flows may be rerouted along alternative paths, which may cause congestion. This in turn could result in the traffic generated by basic level users being dropped by congestion avoidance mechanisms.

Service Types

Tomorrow’s communications networks will have to carry a plethora of services,

such as the classic plain old telephone service (POTS), voice over IP (VoIP), video-

telephony and videoconferencing, teleworking, TV broadcast services, distance

learning, movies and news on demand, Internet access, teleshopping, and many

others. These services are considerably different, not only with respect to their bit

rate requirements, but also with respect to delay tolerances and the need for

recovery. This is indicated in Table 1.2 [Las99], where the service sensitivity for

delay and the need for recovery are graded on a scale from 1 (not sensitive) through

5 (highly sensitive).



Table 1.2 Overview of Application Services and Their Typical Requirements

Application                    Bit Rate             Bit Rate Variation   Delay Sensitivity   Need for Recovery
Plain Old Telephone Service    32–64 Kbps           Constant             5                   5
Voice Over IP                  8–32 Kbps            Constant             5                   5
Video-telephony                256–1920 Kbps        High                 5                   5
Videoconferencing              at least 256 Kbps    High                 5                   5
Teleworking                    64 Kbps to 2 Mbps    Very high            5                   4
TV broadcast                   2–8 Mbps             High                 4                   4
Distance Learning              64 Kbps to 2 Mbps    Very high            5                   5
Movies on Demand               750 Kbps to 4 Mbps   High                 4                   3
News on Demand                 64 Kbps              Very high            2                   2
Internet Access                64 Kbps to 2 Mbps    Very high            1                   2
Teleshopping                   64 Kbps to 2 Mbps    Very high            2                   2

Kbps, kilobits per second; Mbps, megabits per second.
From A. Lason, et al., “Network Scenarios and Requirements,” European IST project Layers Interworking in Optical Networks (LION), deliverable D6, September 1999.

For the subject of this book, the last two columns of the table are particularly important. The “need for recovery” column answers the question of whether recovery mechanisms are necessary for the applications under consideration. As the table shows, many applications depend critically on the recovery capabilities of the network. To estimate the necessary speed of the recovery process, the “delay sensitivity” column is informative, because delay-sensitive applications will be severely disturbed or even completely disrupted by failures if the recovery time becomes large. For instance, in the case of POTS, the minimal recovery time leading to service disruption ranges from 150 ms to 2 seconds [Sos94]. To avoid POTS disruptions, a recovery time of less than 150 ms is needed.

Examples of Service-Level Agreements

From the previous discussion, we know that network reliability is in many cases crucial for the customers. These expectations are translated into contracts, the SLAs, between an operator or service provider and its customers. Typically, these agreements include a rebate provision if the service level is not met during a billing period. With respect to reliability, SLAs usually specify the minimal availability of the service (e.g., a minimal availability of 99.99%) and the maximum acceptable downtime (e.g., half an hour). The more stringent the reliability requirements are, the more expensive the service provided by the operator will be. If these commitments are not met, financial compensation is usually agreed on (e.g., X% of the monthly charge is waived). The SLA may also specify how customers will be notified of outages.
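An availability figure in an SLA translates directly into a downtime budget per period, which is often easier to reason about. The sketch below computes that budget and checks one billing period against both kinds of clause mentioned above (minimal availability and maximum downtime per outage). The clause values and the 30-day month are illustrative assumptions, not terms of any particular SLA.

```python
from typing import List

HOURS_PER_YEAR = 8760
HOURS_PER_30_DAY_MONTH = 720


def allowed_downtime_minutes(availability: float, period_hours: float) -> float:
    """Downtime budget (minutes) implied by an availability target over a period."""
    return (1.0 - availability) * period_hours * 60


def sla_met(outages_min: List[float], period_hours: float,
            min_availability: float, max_outage_min: float) -> bool:
    """Check both (hypothetical) SLA clauses for one billing period."""
    measured = 1.0 - sum(outages_min) / (period_hours * 60)
    return measured >= min_availability and all(o <= max_outage_min for o in outages_min)


print(allowed_downtime_minutes(0.9999, HOURS_PER_YEAR))          # about 52.6 min per year
print(allowed_downtime_minutes(0.9999, HOURS_PER_30_DAY_MONTH))  # about 4.3 min per month
print(sla_met([3, 2], HOURS_PER_30_DAY_MONTH, 0.9999, 30))       # False
```

Note that, measured per month, a 99.99% target leaves only about 4.3 minutes of downtime, so two short outages of 3 and 2 minutes already violate it even though each is far below a half-hour limit.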

Trend of Reliability Requirements

Though dependent on the specific user and service type, the reliability of a commu-

nications network is clearly an important issue and will become even more impor-

tant in the future.

. In a liberalized telecommunications sector, the satisfaction of business and residential customers of traditional and multimedia services is increasingly in the spotlight. Price, quality, and flexibility are key. Users do not appreciate interruptions of these services, and network failures discredit operators and service providers in a commercial market.

. The total amount of data to be transported is ever increasing and the

socioeconomic life is relying more on communications services. Hence,

the consequences of network failures may be significant.

. Because of the introduction of optical fiber and digital or optical switching,

traffic is more and more concentrated in fewer network elements (e.g., a

fiber carrying thirty-two 10-Gbps wavelengths can carry about 4 million

simultaneous telephone calls). This augments the vulnerability of the

network.
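The figure of about 4 million calls per fiber can be reproduced with simple arithmetic. The sketch below assumes that each 10-Gbps wavelength is an OC-192 SONET signal carrying 64-kbps voice channels (DS0s) through the usual DS1 mapping; dividing the raw line rate by 64 kbps gives a slightly higher, overhead-free upper bound.

```python
# Assumptions: 32 wavelengths, each an OC-192 (about 10 Gbps) SONET signal,
# carrying 64-kbps voice channels (DS0) mapped via DS1s into STS-1s.
WAVELENGTHS = 32
STS1_PER_OC192 = 192
DS0_PER_STS1 = 28 * 24           # 28 DS1s per STS-1, 24 DS0s per DS1
DS0_RATE_BPS = 64_000
NOMINAL_LINE_RATE_BPS = 10e9

# Upper bound ignoring all framing and mapping overhead.
raw_estimate = WAVELENGTHS * NOMINAL_LINE_RATE_BPS / DS0_RATE_BPS
# Count based on the SONET channel structure.
sonet_estimate = WAVELENGTHS * STS1_PER_OC192 * DS0_PER_STS1

print(f"raw bound: {raw_estimate:,.0f} calls")       # 5,000,000 calls
print(f"SONET mapping: {sonet_estimate:,} calls")    # 4,128,768 calls
```

Either way, a single fiber cut can interrupt several million simultaneous calls, which is the concentration of risk the bullet refers to.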

1.2.4 Measures to Increase Reliability

As indicated in Section 1.2.2, a large variety of failure types may occur in a telecommunications network. However, many services require high network availability. Several measures can be taken to reconcile these two opposing factors.

A first possibility is to prevent failures as much as possible. For instance, the

likelihood of a cable cut can be reduced by putting the cable deeper in the ground or

by using special armored cables, and the number of failures in exchange offices can

be diminished by a fire security plan or by limited access to the building. Equipment

failures can be reduced by a safer design, an extra cover, or more testing of the

hardware and software before putting it into use. Quick detection of failures or

dangerous situations, by a smoke detection system, an automatic sprinkler system,

or a direct connection with the fire department, can increase the availability as well.

Another strategy is to duplicate vulnerable network elements. For instance, in the

case of a cross-connect failure, all traffic can be switched to an identical hot standby

cross-connect (most helpful in hardware failures). Also the network access link can

be duplicated to ensure that users are not cut off from the network by a single

failure. This dual homing principle is illustrated in Figure 1.12. When a failure

occurs, the network can still be accessed via the unaffected network access link.
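The benefit of dual homing can be estimated by treating the two access links as redundant components: access is lost only if both links are down. The sketch assumes that the two links fail independently and uses a hypothetical single-link availability of 0.999.

```python
def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant access links, assuming independent failures."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # every link must be down for access to be lost
    return 1.0 - p_all_down


single = 0.999  # hypothetical availability of one access link
dual = parallel_availability(single, single)
print(f"single-homed: {single}, dual-homed: {dual:.6f}")  # 0.999 vs 0.999999
```

The independence assumption matters: if both access links run through the same duct or office (a shared risk group), a single event can take down both, and the real gain can be far smaller than this calculation suggests.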

Although the aforementioned measures alleviate the problem to some extent, in

many cases they turn out to be insufficient to meet the network availability levels

required by the customers. Moreover, they can be quite expensive and do not allow



easy differentiation between critical traffic—requiring extremely high availability—

and less important traffic from low cost or basic level users.

To circumvent these drawbacks, most modern communications networks use

so-called network recovery or resilience schemes. As soon as a failure in the network

is detected, these mechanisms automatically divert the traffic stream affected by the

failure to another (fault-free) path in the network. This way, the traffic eventually

reaches its destination. These schemes can greatly enhance the availability of the

services transported through the network. In contrast to the aforementioned mea-

sures, recovery schemes operate on a network scale level, not on an individual

network element level.

The basic principle of a recovery scheme is illustrated in Figure 1.13. Under

normal (i.e., fault-free) conditions, the traffic is transported along the working or

primary path. If a failure is detected along that path, the recovery scheme is

activated. A part of the working path (or the whole path, depending on the recovery

technique), the recovered segment, will be bypassed by a recovery or alternative path.

Traffic that was flowing through the failed network element is redirected at the recovery head-end (RHE) toward the backup path (the switch-over operation). After

passing the recovery tail-end (RTE), the traffic is again transported along the

working path toward the destination. In most cases diverse routing is applied, that

is, the recovery path is usually resource disjoint (e.g., link and/or node disjoint)

from the working path, to ensure that a single failure will not affect both the

working and the recovery path [Sha03].
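A simple way to obtain a working path and a link-disjoint recovery path is to compute a minimum-hop path, remove its links, and compute a second minimum-hop path in what remains. This two-step heuristic is our own illustration, not a method prescribed by [Sha03]; it can fail on so-called trap topologies even when a disjoint pair exists, a case that algorithms such as Suurballe's handle correctly. The topology is hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def shortest_path(graph: Graph, src: str, dst: str,
                  banned_links: Set[FrozenSet[str]] = frozenset()) -> Optional[List[str]]:
    """Minimum-hop path (breadth-first search) that avoids the banned links."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in sorted(graph[node]):  # sorted for deterministic tie-breaking
            if nbr not in prev and frozenset((node, nbr)) not in banned_links:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def working_and_recovery(graph: Graph, src: str, dst: str
                         ) -> Tuple[Optional[List[str]], Optional[List[str]]]:
    """Two-step heuristic: working path first, then a link-disjoint recovery path."""
    working = shortest_path(graph, src, dst)
    if working is None:
        return None, None
    used = {frozenset(pair) for pair in zip(working, working[1:])}
    return working, shortest_path(graph, src, dst, banned_links=used)


GRAPH = {"S": {"A", "B"}, "A": {"S", "T"}, "B": {"S", "C"},
         "C": {"B", "T"}, "T": {"A", "C"}}
print(working_and_recovery(GRAPH, "S", "T"))  # (['S', 'A', 'T'], ['S', 'B', 'C', 'T'])
```

The recovery path is one hop longer than the working path, an example of the extra delay discussed in Section 1.4.6.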

Figure 1.12 Principle of dual homing.



Note that a recovery mechanism imposes some extra requirements on the

communications network. For any failure it wants to recover from, there must be

an alternative route in the network (topology requirement) to serve as a recovery

path. This implies that a so-called single point of failure must be avoided in the

network; it should be designed so that one single failure cannot disconnect a part of

the network from the rest. If an equivalent QoS must be offered to the rerouted

flows after the link or node failure, then other requirements on the recovery path

must be satisfied as well:

. There should be enough available bandwidth along the recovery path

(capacity requirement), so a recovery scheme will typically require some

additional capacity, the backup or spare capacity.

. A considerable rise of the propagation delay from source to destination

should be avoided.
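The topology requirement can be checked automatically. For single-link failures, a link is a single point of failure exactly when it is a bridge of the network graph, that is, when removing it disconnects the graph. The sketch below uses Tarjan's depth-first-search method to list such links; it covers link failures only (node single points of failure would require articulation points instead) and assumes a simple undirected graph without parallel links.

```python
from typing import Dict, List, Optional, Set, Tuple


def find_bridges(graph: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return links whose single failure disconnects the network (Tarjan's method)."""
    disc: Dict[str, int] = {}  # depth-first discovery order of each node
    low: Dict[str, int] = {}   # lowest discovery order reachable without the tree link
    bridges: List[Tuple[str, str]] = []
    counter = 0

    def dfs(u: str, parent: Optional[str]) -> None:
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v in sorted(graph[u]):
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # no alternative route around link u-v
                    bridges.append(tuple(sorted((u, v))))
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for node in sorted(graph):
        if node not in disc:
            dfs(node, None)
    return bridges


# Hypothetical topology: a ring A-B-C-D with node E attached to D only.
RING_WITH_SPUR = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"},
                  "D": {"A", "C", "E"}, "E": {"D"}}
print(find_bridges(RING_WITH_SPUR))  # [('D', 'E')]
```

Every link of the ring has an alternative route, but traffic to or from E cannot be recovered from a failure of link D-E.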

Note also that a recovery scheme usually forms a component of a particular

network technology; hence, the mechanism is active only in the network layer

corresponding to that technology. The most important recovery techniques in

SONET/SDH, OTN, IP, and MPLS are highlighted in Chapters 2, 3, 4, and 5,

respectively.

1.3 Different Phases in a Recovery Process

Although a wide variety of recovery schemes exists (see Section 1.5), they all show a

rather similar succession of phases, that is, the recovery cycle.

Figure 1.13 Basic principle of a recovery scheme (working path, recovered segment, recovery path, RHE, RTE).



1.3.1 Recovery Cycle

The different phases of this cycle are shown in Figure 1.14 [Sha03]. If a failure in the

network occurs, it could take some time before a node adjacent to the failure detects

the fault. This time may depend, for instance, on the frequency of signals sent, on

the speed of fault detection in a lower network layer and notification toward upper

layers, on the time it takes for the node to gather all abnormal information from

various signals, correlate this information, and derive the exact fault state from that

(diagnosis), and so on. Once the fault is detected, the node that detected the fault

may (or may not) wait some time before it starts sending notification messages

toward the other nodes in the network. For instance, this hold-off time could allow a lower layer recovery scheme to recover from the fault. For example, in an IP network supported by an optical transport network, a cable cut could quickly be bypassed by an optical recovery mechanism, so that the IP link becomes operational again shortly after the moment of failure. If the fault still exists after the hold-off time, fault

notification messages are sent throughout the network to inform the other nodes

that will be involved in the recovery action.

Note that this timer may be a static or a dynamic value. In the latter case, the

timer is a function of the number of failures within a certain period; the more

failures have been detected recently, the longer the hold-off time. This technique is

called dampening and helps stabilize the network in case of a flapping resource (i.e.,

a resource alternating quickly between the operational state and the fault state). A

typical example of dampening is discussed in Chapter 4.
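A dynamic hold-off timer of the kind just described can be sketched as follows: the timer doubles for every failure observed within a sliding window and is capped at a maximum value. The doubling rule and the parameter values are hypothetical; the mechanism discussed in Chapter 4 may use a different function (route flap damping in BGP, for instance, uses an exponentially decaying penalty).

```python
from collections import deque


class DampenedHoldOff:
    """Hypothetical dynamic hold-off timer.

    The hold-off doubles with every failure seen within the last `window_s`
    seconds (including the current one), starting at `base_s`, capped at `max_s`.
    """

    def __init__(self, base_s: float = 0.05, max_s: float = 10.0, window_s: float = 60.0):
        self.base_s = base_s
        self.max_s = max_s
        self.window_s = window_s
        self._failure_times = deque()

    def on_failure(self, now_s: float) -> float:
        """Register a failure at time `now_s` and return the hold-off time to apply."""
        # Forget failures that fell out of the observation window.
        while self._failure_times and now_s - self._failure_times[0] > self.window_s:
            self._failure_times.popleft()
        self._failure_times.append(now_s)
        recent = len(self._failure_times)
        return min(self.base_s * 2 ** (recent - 1), self.max_s)


timer = DampenedHoldOff()
print([timer.on_failure(t) for t in (0, 1, 2, 3)])  # hold-off grows while the resource flaps
print(timer.on_failure(100))                        # back to the base value after a quiet period
```

A flapping resource thus quickly reaches long hold-off times, while an isolated failure is still handled with the short base value.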

The time between the first and the last recovery action is called the recovery operation time (not to be confused with the overall recovery time; see Figure 1.14). This time span could include the exchange of messages between the different nodes involved in the recovery action to coordinate the operation. After the last recovery action, the traffic starts using the recovery path. However, it could still take some time before the traffic is completely recovered. This traffic recovery time may depend on the propagation delay along the recovery path, the location of the fault, and the recovery scheme used.

Figure 1.14 Recovery cycle (from failure to operational state: fault detection time, hold-off time, fault notification time, recovery operation time, traffic recovery time; together, the recovery time). (V. Sharma, F. Hellstrand, “Framework for MPLS-based recovery,” Internet draft, work in progress, RFC 3469, February 2003. Available at www.ietf. Accessed May 2004.)
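Because the phases of the recovery cycle follow one another, the overall recovery time is their sum, and a recovery budget such as the 150 ms quoted earlier for POTS constrains all phases together. The sketch below adds up the phases of Figure 1.14; all durations are hypothetical and serve only to show how a hold-off time that waits for a lower layer eats into the budget.

```python
from dataclasses import dataclass


@dataclass
class RecoveryCycle:
    """Phase durations of the recovery cycle in milliseconds (values hypothetical)."""
    fault_detection_ms: float
    hold_off_ms: float
    fault_notification_ms: float
    recovery_operation_ms: float
    traffic_recovery_ms: float

    def recovery_time_ms(self) -> float:
        # Overall recovery time: from the failure until traffic flows on the recovery path.
        return (self.fault_detection_ms + self.hold_off_ms + self.fault_notification_ms
                + self.recovery_operation_ms + self.traffic_recovery_ms)


POTS_BUDGET_MS = 150  # upper bound quoted earlier for POTS [Sos94]

fast = RecoveryCycle(10, 0, 5, 20, 5)
short_hold_off = RecoveryCycle(10, 100, 5, 20, 5)  # waits for a lower layer first
long_hold_off = RecoveryCycle(10, 200, 5, 20, 5)
for cycle in (fast, short_hold_off, long_hold_off):
    t = cycle.recovery_time_ms()
    print(f"{t:.0f} ms, within POTS budget: {t < POTS_BUDGET_MS}")
```

With these numbers, a 100-ms hold-off still fits in the budget, whereas a 200-ms hold-off would disrupt telephone calls even if every other phase were fast.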

1.3.2 Reversion Cycle

After the recovery cycle, the network is again fully operational. However, the new routes of the traffic along the recovery paths may be less efficient than the routes used before the failure (e.g., the recovery path may be longer than the original path, and there may be more congestion along it). Therefore, a dynamic rerouting protocol may be

initiated to optimize the usage of network resources in the new situation. Another

possibility is to wait for the repair of the fault that has occurred and to redirect

the traffic from the recovery path back to the working path once the fault is

completely repaired (in this case, the recovery technique is called revertive). This

switch-back operation also follows a succession of general phases, the so-called

reversion cycle.

The different phases of this cycle are shown in Figure 1.15. The reversion cycle

bears a strong resemblance to the recovery cycle, described earlier. Once the fault is

repaired, it could take some time (e.g., dependent on lower layer protocols) before

this repair is detected: the fault clearing time. After that, the protocol may decide to

wait for a certain time before starting the notification of the repaired fault. This

hold-off time may be needed to ensure that the path is stable. Indeed, in the case of

an intermittent fault, a quick reaction of the reversion process may lead to unstable

network conditions. As in the previous case, note that this timer may be a static or a

dynamic value (dampening).

Figure 1.15 Reversion cycle (from fault repaired and fault cleared: fault clearing time, hold-off time, fault repaired notification time, reversion operation time, traffic reversion time). (V. Sharma, F. Hellstrand, “Framework for MPLS-based recovery,” Internet draft, work in progress, RFC 3469, February 2003. Available at www.ietf. Accessed May 2004.)



After that, as in the recovery cycle, the repair of the fault is notified throughout the network and the actual reversion operation is carried out; the traffic is again switched from the recovery path to the working path. Finally, it may take some time before the traffic begins flowing on the working path again (traffic reversion time).

In contrast to the recovery cycle, which typically reacts to an unforeseen event (the failure), the reversion cycle can be planned well in advance. In a reversion, there is no need for a hasty operation; a well-controlled switch-back mechanism with minimal disruption is typically preferred.

1.4 Performance of Recovery Mechanisms: Criteria

As is pointed out in Section 1.5, a wide variety of recovery mechanisms exist,

depending on the facilities of the network technology, the priorities and desires of

the typical users of the network, and so on. Every recovery mechanism has its

strengths and weaknesses. This section elaborates on the criteria that represent the

performance components of the recovery scheme. This overview of criteria allows

us to weigh the performance and the cost of a recovery mechanism, to assess the

pros and cons of any recovery mechanism, and to make a judicious comparison

between recovery mechanisms [Sha03], [Owe02].

1.4.1 Scope of Failure Coverage

Recovery schemes may offer various types of failure coverage. The scope of failure

coverage may be defined by several metrics, which are described in the following

paragraphs.

Failure Scenarios

The recovery mechanism may be designed to cover a particular failure scenario,

such as a single-link failure (e.g., one fiber in an optical network or one OC-192 (see Chapter 2) line system in a SONET network), a single-node failure (e.g., an optical

node or an IP/MPLS node), or a single-link or single-node failure (i.e., single

failure, either a link failure or a node failure). To ensure very high availability,

you may design the recovery mechanism to cover a number of concurrent faults, for

example, a double link failure or an SRLG failure.

Percentage of Coverage

The recovery mechanism may completely or partially cover the failure scenario. For

example, the recovery mechanism may be able to recover only a percentage of the

traffic volume affected by the failure (e.g., if a certain percentage of the traffic is

high priority). Another example is the percentage of coverage of node failures. In contrast to link failures, 100% coverage of a node failure may be possible (if the node is a transit node) or not (if it is an edge node). Traffic originating or terminating in the failing node cannot be recovered (at least not by a recovery mechanism operating in this network layer only; see Chapter 6 for a more thorough discussion of this topic). As illustrated in Figure 1.16, only traffic passing through the failing node can be recovered.

1.4.2 Recovery Time

The recovery time is the time between a network failure and the point at which a

recovery path is installed and the traffic starts flowing through it. The recovery

time usually forms an important criterion for a recovery mechanism: Typically,

the smaller the recovery time, the less the services are harmed by the network

failure.

Note: This does not imply that after the recovery time, the traffic will experience

the same network conditions as before the failure occurred. For instance, the

recovery path could have more limited resources or a worse signal quality than

the working path.

1.4.3 Backup Capacity Requirements

When comparing different recovery schemes, the backup capacity that is needed to

recover from the same failure scenarios may be quite different. The capacity

requirements of the recovery scheme may depend on variables such as the algorithm

selecting the recovery paths, the traffic characteristics, or the layer in which the

recovery mechanism operates.

Figure 1.16 Incomplete coverage of a node failure (working paths and recovery path).



1.4.4 Guaranteed Bandwidth

Some recovery mechanisms inherently guarantee that the full bandwidth of the

affected traffic will be rerouted along the recovery paths. Other recovery mecha-

nisms do not provide any bandwidth guarantee, and depending on the situation,

there may or may not be enough backup capacity to reroute all affected traffic.

1.4.5 Reordering and Duplication

Even though a switch-back operation from recovery path to working path may

seem beneficial at first glance, it can have some awkward complications with respect

to the order in which traffic is delivered. For example, in packet switched networks

(e.g., IP or IP/MPLS networks), the switch back may result in a reordering of the

packets at the destination. Indeed, if the delay of the packets along both paths is

different, the switch-back operation may cause some packets to overtake others.

Similar situations can also occur during the switch-over operation. For example, in one-to-one protection (see Section 1.5.3), a switch-over operation may cause a temporary duplication of traffic.

Such reordering or duplication may have a significant impact on the complexity and cost of the destination node, because (depending on the application) the information stream usually has to be restored to its original order.
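What the destination has to do can be sketched as a small resequencing buffer: packets are released only in sequence order, and copies of already delivered packets are discarded. The sketch assumes that every packet carries a per-flow sequence number without wraparound and that the buffer is unbounded; plain IP packets carry no such number, so in practice this role falls to a higher layer (for example, a transport protocol) or to the application.

```python
from typing import Any, Dict, List


class Resequencer:
    """Restore sequence order and drop duplicates at the destination.

    Assumes a per-flow sequence number on every packet (no wraparound)
    and an unbounded reorder buffer; both are simplifications.
    """

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending: Dict[int, Any] = {}

    def receive(self, seq: int, payload: Any) -> List[Any]:
        """Accept one packet; return the payloads that can now be delivered in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate (e.g., after a switch-over): discard
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


# Packet 1 is still on the longer recovery path when packets 2 and 3
# arrive over the shorter working path after a switch-back.
rx = Resequencer()
for seq in (0, 2, 3, 1, 1):
    print(seq, rx.receive(seq, f"p{seq}"))
```

Packets 2 and 3 are held until packet 1 arrives, and the late duplicate of packet 1 is dropped; the buffering delay and memory this requires are the cost referred to above.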

1.4.6 Additive Latency and Jitter

Recovery schemes may introduce additional latency to traffic. For example, a recovery path may be significantly longer than the working path, depending on the recovery path selection algorithm. For some services, it is also important to minimize the jitter, that is, the fluctuations in the delay of data belonging to the same traffic flow.
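Both effects can be quantified with a short sketch. The added latency follows from the extra fiber length, here assuming roughly 5 microseconds of propagation delay per kilometer of fiber; the jitter seen by a flow that is switched to the longer path is estimated with the interarrival jitter estimator of RTP (RFC 3550), J = J + (|D| - J)/16, where D is the change in transit time between consecutive packets. The path lengths are hypothetical.

```python
from typing import Sequence

# Assumption: roughly 5 microseconds of propagation delay per km of optical fiber.
FIBER_DELAY_US_PER_KM = 5.0


def propagation_delay_ms(length_km: float) -> float:
    return length_km * FIBER_DELAY_US_PER_KM / 1000.0


def interarrival_jitter(transit_times_ms: Sequence[float]) -> float:
    """RFC 3550 style estimator: J += (|D| - J) / 16, D = change in transit time."""
    jitter = 0.0
    for prev, cur in zip(transit_times_ms, transit_times_ms[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    return jitter


working_ms = propagation_delay_ms(400)    # hypothetical 400-km working path
recovery_ms = propagation_delay_ms(1400)  # hypothetical 1400-km recovery path
print(f"added latency: {recovery_ms - working_ms:.1f} ms")  # 5.0 ms

# Transit times of a flow that is switched over to the longer path halfway.
transit = [working_ms] * 5 + [recovery_ms] * 5
print(f"jitter estimate after switch-over: {interarrival_jitter(transit):.3f} ms")
```

The single delay step at the switch-over shows up as a burst in the jitter estimate that decays afterward, whereas the extra 5 ms of latency persists for as long as the recovery path is used.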

1.4.7 State Overhead

As the number of recovery paths in a recovery plan grows, the state (i.e., the

information stored in the individual network elements) required to maintain them

also grows. The exact required state may depend not only on the number of

recovery paths (the state overhead is usually proportional to the number of recov-

ery paths), but also on the particular state needs of the recovery mechanism.
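A rough way to see how state accumulates is to count, per node, the recovery paths that traverse it. The sketch assumes one state entry per recovery path at every node on that path (real mechanisms may store more or less) and uses the recovery paths of Figure 1.17 as input. It also counts the recovery paths of a hypothetical plan with one recovery path per ordered node pair, whose number grows quadratically with the network size.

```python
from collections import Counter
from typing import Iterable, List


def recovery_state_per_node(recovery_paths: Iterable[List[str]]) -> Counter:
    """Count state entries per node, assuming one entry per traversed node per path."""
    state: Counter = Counter()
    for path in recovery_paths:
        state.update(path)
    return state


def full_mesh_recovery_paths(n_nodes: int) -> int:
    """One recovery path per ordered node pair (a hypothetical full-mesh plan)."""
    return n_nodes * (n_nodes - 1)


state = recovery_state_per_node([["A", "D", "E", "F", "C"], ["G", "D", "E", "F", "I"]])
print(state["E"], state["A"])                                 # 2 1
print([full_mesh_recovery_paths(n) for n in (10, 100, 1000)])  # [90, 9900, 999000]
```

Nodes in the middle of the network, such as D, E, and F, collect the state of every recovery path crossing them, which is one reason why state overhead matters for scalability.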

1.4.8 Scalability

As the network grows (i.e., more links and nodes) and the amount of traffic to be

transported by the network increases, the performance of the recovery mechanism



may change considerably. For instance, for some schemes the state overhead may

increase very fast with growing network or traffic size, whereas other recovery

schemes need only a modest state overhead increase. In addition, other perfor-

mance factors such as the recovery time or the required backup capacity may be

highly influenced by the network and traffic size.

A recovery mechanism is said to be scalable if its performance does not depend too much on the size of the network and the amount of traffic to be transported. Scalability is an important characteristic for a recovery scheme to be “future proof.”

1.4.9 Signaling Requirements

The operation of a recovery scheme might require a significant number of signaling

messages between the network nodes. For instance, the fault detection may depend

on (the absence of) messages; the fault notification is based on messages; and

signaling can also play a crucial role in the recovery operation itself.

Because some recovery schemes require many more signaling messages than others, the resources (in terms of bandwidth, central processing unit [CPU] usage, etc.) used by signaling form another criterion to judge the performance of a recovery scheme.

1.4.10 Stability

When designing a recovery mechanism, you typically will find a number of timing

parameters (e.g., time between two consecutive messages, hold-off times, etc.) that

can be tuned more or less freely within a certain range. Although small values for

these timers usually speed up the recovery, they may have a deteriorating impact on

the network stability. For instance, in the case of a flapping link, small hold-off timers for reversion may lead to a never-ending alternation of switch-over and switch-back operations, causing significant traffic disruption.
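The effect of the reversion hold-off timer on a flapping link can be illustrated with a simple count of switching operations. In this hypothetical model, the working path fails, comes back for a given up interval, and then fails again; a revertive scheme switches back only if the path stays up for at least the hold-off time, and every switch-back is followed by a new switch-over at the next failure.

```python
from typing import List


def count_switch_operations(up_intervals_s: List[float], hold_off_s: float) -> int:
    """Count switch-overs plus switch-backs for a flapping working path.

    Model (illustrative assumption): after each up interval the working path
    fails again; a switch-back happens only if the interval lasts at least
    hold_off_s, and it is always followed by another switch-over.
    """
    operations = 1  # switch-over caused by the initial failure
    for up in up_intervals_s:
        if up >= hold_off_s:
            operations += 2  # switch-back, then a new switch-over
    return operations


flapping = [2.0] * 5                                    # link comes back for 2 s, five times
print(count_switch_operations(flapping, hold_off_s=0))  # 11
print(count_switch_operations(flapping, hold_off_s=5))  # 1
```

Without a hold-off time, each of the 11 switching operations may disrupt traffic; with a hold-off time longer than the up intervals, the traffic simply stays on the recovery path.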

1.4.11 Notion of Recovery Class

Some recovery schemes make it possible to distinguish between different classes of

traffic and to take appropriate recovery actions for each individual QoS class

[Aut02]. This may be a useful feature, because different traffic classes typically

impose different recovery requirements. For instance, one traffic class may need a

very fast recovery scheme with bandwidth guarantee, whereas for another class a

slow recovery mechanism at a low cost may be sufficient.

1.5 Characteristics of Single-Layer Recovery Mechanisms

Depending on the particular application(s) a recovery mechanism is aimed at, some of the evaluation criteria described earlier can be far more important than others.




Moreover, the specific network technology typically imposes some constraints on

the implementation feasibility of a recovery scheme. Hence, a wide range of recov-

ery mechanisms exists in today’s networks. In what follows, the essential choices

when designing a recovery mechanism are enumerated and the pros and cons of

each option are elucidated.

1.5.1 Backup Capacity: Dedicated versus Shared

With respect to the allocation of backup capacity, two major options exist: dedicated or shared backup capacity.

In the case of dedicated backup capacity, a particular backup resource corresponds to one particular working path. In other words, there is a one-to-one relationship between the backup resources and the working paths: a backup resource can be used only by a particular working path. This concept is illustrated in Figure 1.17: The recovery paths A-D-E-F-C and G-D-E-F-I do not share any bandwidth on the common part D-E-F. This is the case even though the working paths A-B-C and G-H-I have no resource in common and are therefore unlikely to fail simultaneously.

Figure 1.17 Dedicated backup capacity (nodes A to I; working paths and recovery paths; Channel 1 and Channel 2 on D-E-F).

The other possibility is to share a backup resource between several working paths. If a failure occurs along one of these working paths, the backup resource is used to recover from this failure. If at another time a failure occurs along another one of these working paths, the same backup resource will be used to recover from that failure. Stated otherwise, there is a one-to-many relationship between the backup resources and the working paths. This concept is illustrated in Figure 1.18: The recovery paths A-D-E-F-C and G-D-E-F-I now share the bandwidth on the common part D-E-F. This is permitted because the probability of a simultaneous malfunction on the working paths A-B-C and G-H-I is low. Usually, therefore, at most one recovery path will be using the resources on D-E-F at a time.

The second option is clearly more complex than the first one. After a failure along a working path, you must be sure that the corresponding backup resources are still available for the recovery (i.e., not already used for the recovery of another working path). On the other hand, shared backup capacity can be used much more efficiently because of its flexible character: the use of the backup resources adapts to the failure that occurs.
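The capacity gain of sharing can be illustrated with the example of Figures 1.17 and 1.18. The Python sketch below is a hypothetical helper, not an algorithm from the text. It computes the backup capacity per link in both cases, assuming that only the listed failure scenarios (here, single-link failures on the working paths) must be survived. Dedicated capacity is the sum of the demands of all recovery paths crossing a link. Shared capacity is the worst case over the failure scenarios.

```python
from collections import defaultdict


def links(path):
    """Undirected links of a node path, e.g. 'ABC' -> {A-B, B-C}."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def backup_capacity(demands, failures):
    """demands: list of (working_path, recovery_path, bandwidth).
    failures: list of failure scenarios, each a set of failed links.
    Returns (dedicated, shared): dicts mapping a link to its backup capacity."""
    dedicated = defaultdict(int)
    for _, recovery, bw in demands:
        for link in links(recovery):
            dedicated[link] += bw  # every recovery path gets its own capacity
    shared = defaultdict(int)
    for failed in failures:
        load = defaultdict(int)
        for working, recovery, bw in demands:
            if links(working) & failed:  # this working path is hit by the scenario
                for link in links(recovery):
                    load[link] += bw
        for link, need in load.items():
            shared[link] = max(shared[link], need)  # worst case over scenarios
    return dict(dedicated), dict(shared)


demands = [("ABC", "ADEFC", 1), ("GHI", "GDEFI", 1)]
single_link_failures = [{link} for link in links("ABC") | links("GHI")]
dedicated, shared = backup_capacity(demands, single_link_failures)
print(dedicated[frozenset("DE")], shared[frozenset("DE")])  # 2 1
print(sum(dedicated.values()), sum(shared.values()))        # 8 6
```

On D-E-F, the two recovery paths need two units of dedicated capacity but only one unit of shared capacity, because no single failure in this set hits both working paths.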

1.5.2 Recovery Paths: Preplanned versus Dynamic

Another choice concerns the moment at which the path of the recovery flow is chosen. In the preplanned option, the path of the recovery flow is calculated in advance (i.e., before any failure occurs) for every failure scenario accounted for. In the dynamic option, recovery paths are not planned. Instead, they are computed ‘‘on the fly’’ once the failure is detected, for instance, by the RHE or RTE node. If a failure occurs, the recovery mechanism starts searching dynamically for possible recovery paths throughout the network.

Figure 1.18 Shared backup capacity (nodes A to I; working paths and recovery paths; one common channel on D-E-F).



The preplanned option allows fast recovery when a failure occurs, whereas a dynamic recovery mechanism may take additional time to identify suitable recovery paths. On the other hand, the preplanned option lacks flexibility for unaccounted (i.e., noncovered) failure scenarios. A dynamic recovery mechanism is able to search for recovery paths for unaccounted failures as well, although there is no guarantee that it will find one.

Because of its flexible nature, a dynamic recovery mechanism will typically lead to a situation of shared backup capacity. In a preplanned recovery mechanism, the backup capacity can be either dedicated or shared.
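The contrast between the two options can be sketched in Python for the topology of Figures 1.17 and 1.18, restricted to nodes A to F. The table contents and function names are illustrative assumptions, as is the use of a breadth-first shortest-hop search for the dynamic option. A preplanned mechanism only looks up a precomputed path, which is fast but fails for an unaccounted failure. A dynamic mechanism instead searches the surviving topology after the failure.

```python
from collections import deque


def preplanned_recovery(table, failed_link):
    """Look up a recovery path computed before any failure (None if not covered)."""
    return table.get(failed_link)


def dynamic_recovery(adj, src, dst, failed_link):
    """Breadth-first search for a shortest-hop path avoiding the failed link.
    Returns None when no recovery path exists."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev and frozenset((node, nxt)) != failed_link:
                prev[nxt] = node
                queue.append(nxt)
    return None


adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "F"],
       "D": ["A", "E"], "E": ["D", "F"], "F": ["E", "C"]}
# Hypothetical plan that only accounts for a failure of link A-B on working path A-B-C.
table = {frozenset("AB"): list("ADEFC")}
print(preplanned_recovery(table, frozenset("BC")))       # None: unaccounted failure
print(dynamic_recovery(adj, "A", "C", frozenset("BC")))  # ['A', 'D', 'E', 'F', 'C']
```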

1.5.3 Protection versus Restoration

An important distinction typically made when considering recovery mechanisms is between protection and restoration [Man2]. Both options require signaling, but the subtle difference lies in the timing of the signaling actions. In the case of protection, the recovery paths are preplanned and fully signaled before a failure occurs. Hence, when a failure occurs, no additional signaling is needed to establish the protection path.³ In the case of restoration, the recovery paths can be either preplanned or dynamically allocated. When a failure occurs, however, additional signaling is needed to establish the restoration path.

A major advantage of protection over restoration is typically its fast recovery time. Indeed, in the case of restoration, the additional signaling after a failure may consume precious time. On the other hand, restoration techniques can be more flexible with regard to the failure scenarios they can recover from. In many cases, they also require less backup capacity because of their shared nature.

Protection Variants

Among the recovery mechanisms classified as protection schemes, a further distinction can be made depending on the number of recovery entities that protect a given number of working entities [Man2].

1+1 Protection (Dedicated Protection)

One dedicated protection path protects exactly one working segment, and the normal traffic is permanently duplicated at the RHE onto both the working path and the recovery path. At the RTE, the signal with the higher quality is selected and sent to the destination. Another method consists of selecting the working path unless a signal defect is detected on it, in which case the RTE switches to selecting the traffic from the recovery path. Note that this protection strategy is very efficient in terms of recovery time but quite expensive in terms of bandwidth usage.

³ It must be noted that this does not exclude all signaling after a failure. Various other kinds of signaling may take place between RHE and RTE, for fault notification, to synchronize their use of the protecting path, for reversion, and so on [Man2].
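The two selector variants at the RTE can be sketched in Python as follows. The `Received` record, its `quality` metric and the function names are hypothetical. The sketch shows only the selection logic, assuming the RHE has already bridged the traffic onto both paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Received:
    """One copy of the bridged signal as seen by the RTE (hypothetical model)."""
    defect: bool    # e.g., loss of signal detected on this path
    quality: float  # hypothetical quality figure; higher is better


def select_best_quality(working: Received, recovery: Received) -> Optional[str]:
    """Variant 1: select the copy with the higher quality (working wins ties)."""
    candidates = [(name, sig) for name, sig in (("working", working), ("recovery", recovery))
                  if not sig.defect]
    if not candidates:
        return None  # both copies are lost
    return max(candidates, key=lambda c: c[1].quality)[0]


def select_working_unless_defect(working: Received, recovery: Received) -> Optional[str]:
    """Variant 2: stay on the working path unless a signal defect is detected on it."""
    if not working.defect:
        return "working"
    return None if recovery.defect else "recovery"


w, r = Received(defect=False, quality=0.90), Received(defect=False, quality=0.95)
print(select_best_quality(w, r), select_working_unless_defect(w, r))  # recovery working
failed = Received(defect=True, quality=0.0)
print(select_best_quality(failed, r), select_working_unless_defect(failed, r))  # recovery recovery
```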



1:1 Protection (Dedicated Protection with Extra Traffic)

One dedicated protection path protects exactly one working segment, but in failure-

free conditions the traffic is transmitted over only one path at a time. This leaves the

opportunity to transport extra traffic along the protection path in failure-free

conditions. As soon as a fault along the working segment is detected, the extra

traffic is preempted from the recovery path and the traffic affected by the failure is

switched to the protection path.

1:N Protection (Shared Recovery with Extra Traffic)

A specific recovery entity is dedicated to the protection of up to N (explicitly

identified) working entities. In failure-free conditions, the recovery entity can be

used for extra traffic.

M:N Protection (M ≤ N)

A set of M specific recovery entities protects a set of up to N specific working

entities. The two sets are explicitly identified. Extra traffic can be transported over

the M recovery entities when available.
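A minimal Python model of an M:N protection group is given below. It also covers 1:1 (M = N = 1) and 1:N (M = 1). The class and method names are hypothetical. The sketch assumes that a failed working entity takes the first free recovery entity and preempts any extra traffic on it. It also assumes that the extra traffic resumes when the working entity is repaired and the traffic reverts.

```python
class ProtectionGroup:
    """M recovery entities protecting up to N working entities (hypothetical model)."""

    def __init__(self, m, working_ids):
        if not 1 <= m <= len(working_ids):
            raise ValueError("expected 1 <= M <= N")
        self.working_ids = set(working_ids)
        # None: the recovery entity is free and may carry extra traffic;
        # otherwise: the id of the working entity whose traffic it carries.
        self.protects = [None] * m

    def carries_extra_traffic(self, index):
        return self.protects[index] is None

    def on_failure(self, working_id):
        """Switch a failed working entity to a free recovery entity.
        Returns the recovery entity index, or None if all M are in use."""
        if working_id not in self.working_ids:
            raise KeyError(working_id)
        for i, current in enumerate(self.protects):
            if current is None:
                self.protects[i] = working_id  # extra traffic on entity i is preempted
                return i
        return None

    def on_repair(self, working_id):
        """Revert to the working entity; the freed recovery entity resumes extra traffic."""
        for i, current in enumerate(self.protects):
            if current == working_id:
                self.protects[i] = None


group = ProtectionGroup(1, ["w1", "w2", "w3"])  # 1:3 protection
print(group.on_failure("w1"), group.carries_extra_traffic(0))  # 0 False
print(group.on_failure("w2"))  # None: the single recovery entity is busy
group.on_repair("w1")
print(group.carries_extra_traffic(0))  # True
```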

1.5.4 Global versus Local Recovery

To bypass the failed network facilities, recovery schemes change the route

of affected traffic. We define the recovery extent as the portion of the working

path that may be manipulated by the recovery scheme, that is, the recovered

segment.

In local recovery, only the affected network elements are bypassed. In other words, the RHE and RTE are chosen as close to the failed network element as possible (Figure 1.19). If a single link fails, a (link-disjoint) recovery path is set up between the nodes adjacent to the failure. If a single node fails, none of the links incident to the failing node can be used anymore. Hence, local recovery establishes a (node-disjoint⁴) recovery path between every two ‘‘neighbor’’ nodes of the failing node.

Figure 1.19 Local recovery for single-link (left) and single-node (right) failure (RHE and RTE adjacent to the failure; working paths and recovery paths).

The other extreme is global recovery, in which the complete working path between source and destination is bypassed by a recovery path. In other words, the RHE and RTE coincide with the source and destination of the working path, respectively (Figure 1.20). In a preplanned recovery mechanism, the global recovery path should be disjoint from the working path, which might impose some additional constraints on the computation of both paths. For instance, if the mechanism aims only at recovering from single-link failures, a recovery path that is link-disjoint from the working path will be sufficient. If, on the other hand, the mechanism must recover from single-node failures (or from both single-link and single-node failures), the recovery path must be node-disjoint⁵ from the working path.
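The two disjointness conditions can be checked with a few lines of Python. The helper names are hypothetical. Paths are given as node sequences that share the same end points (RHE and RTE). In line with the footnotes, the end points are exempt from the node-disjointness test.

```python
def path_links(path):
    """Undirected links traversed by a node path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def is_link_disjoint(working, recovery):
    """True if the two paths have no link in common."""
    return not (path_links(working) & path_links(recovery))


def is_node_disjoint(working, recovery):
    """True if no node other than the common end points is shared, and no link either."""
    shared_nodes = set(working) & set(recovery)
    return shared_nodes <= {working[0], working[-1]} and is_link_disjoint(working, recovery)


working = ["S", "A", "B", "T"]
print(is_link_disjoint(working, ["S", "C", "B", "E", "T"]))  # True
print(is_node_disjoint(working, ["S", "C", "B", "E", "T"]))  # False: both cross node B
print(is_node_disjoint(working, ["S", "C", "E", "T"]))       # True
```

A recovery path that is only link-disjoint, such as the first one, covers the failure of any working link but not the failure of node B.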

When comparing local and global recovery, several pros and cons arise:

• In local recovery, the RHE and RTE are closer to the failure. Hence, these nodes will typically detect the fault rather quickly, leading to a smaller recovery time than for global recovery. In other words, local recovery is usually much faster than global recovery, an important advantage in time-sensitive applications.

• An obvious drawback of local recovery is apparent from Figure 1.19: The route followed by the traffic after recovery is often longer than needed. The main reason for this suboptimal result is that local recovery does not consider the parts of the working path outside the recovered segment. Hence, the same traffic may cross a particular link twice. This phenomenon is called back hauling. Because of its network-wide optimizing nature, global recovery will in many cases require less backup capacity than local recovery for identical failure scenarios.

• Local and global recovery also differ slightly with respect to failure coverage. For instance, if two successive nodes along a working path fail, global recovery could still resolve the problem. Local recovery will fail, because the neighbor node it would use as RHE or RTE for one failed node is itself the other failed node.

• The number of recovery paths needed in the complete network can differ greatly between local and global recovery, resulting in different state overhead requirements. In many cases, global recovery may generate more state overhead.

⁴ Except for the RHE and RTE, of course.

⁵ Except for the RHE and RTE, of course.

Figure 1.20 Global recovery for single-link (left) and single-node (right) failure (RHE and RTE at the source and destination; working paths and recovery paths).
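Back hauling can be reproduced on a small hypothetical topology with the Python sketch below. The working path is S-R1-R2-D, and node R2 fails. Local recovery detours between the neighbors R1 and D, whereas global recovery recomputes the whole path from S to D. Shortest-hop breadth-first search is an assumption made for the illustration, not a routing rule from the text.

```python
from collections import Counter, deque


def shortest_path(adj, src, dst, banned):
    """Shortest-hop path from src to dst that avoids the banned nodes (or None)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev and nxt not in banned:
                prev[nxt] = node
                queue.append(nxt)
    return None


def local_recovery(adj, working, failed):
    """Bypass only the failed intermediate node: RHE and RTE are its path neighbors."""
    i = working.index(failed)
    detour = shortest_path(adj, working[i - 1], working[i + 1], {failed})
    if detour is None:
        return None
    return working[:i - 1] + detour + working[i + 2:]


def global_recovery(adj, working, failed):
    """Bypass the complete working path: RHE and RTE are its source and destination."""
    return shortest_path(adj, working[0], working[-1], {failed})


def repeated_links(path):
    """Links crossed more than once by the same traffic (back hauling)."""
    counts = Counter(frozenset(pair) for pair in zip(path, path[1:]))
    return {link for link, n in counts.items() if n > 1}


adj = {"S": ["R1", "X"], "R1": ["S", "R2"], "R2": ["R1", "D"],
       "X": ["S", "D"], "D": ["R2", "X"]}
working = ["S", "R1", "R2", "D"]
local = local_recovery(adj, working, "R2")
print(local)                                              # ['S', 'R1', 'S', 'X', 'D']
print(global_recovery(adj, working, "R2"))                # ['S', 'X', 'D']
print(repeated_links(local) == {frozenset(("S", "R1"))})  # True
```

The locally recovered traffic crosses link S-R1 twice, while the globally recovered route needs only two hops.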

Of course, local and global recovery represent only the two extremes of a whole

range of intermediate possibilities, where the recovery extent is longer than in the

local option, but shorter than the complete working path (global option). For

instance, in G-MPLS networks these intermediate options are denoted as ‘‘segment

recovery,’’ whereas the term subnetwork connection protection is applied for

SONET/SDH and OTN networks.

1.5.5 Control of Recovery Mechanisms

Another attribute pertains to which entity is in control of the recovery process.

Centralized recovery mechanisms depend on a central controller to determine

which recovery actions to take. The central controller has a global view of the

network status. This controller determines where and when a fault has

occurred, gathers network-wide state information, and issues (switching) com-

mands to reconfigure all of the network elements involved in the recovery

process. Network management systems based on the Telecommunications

Management Network (TMN) [M30000] form a typical example of centralized

operation systems.

Decentralized or distributed recovery mechanisms operate without the interven-

tion of a central control system. In this case, the network elements feature

intelligent control systems, which autonomously initiate and steer the recovery

actions. In other words, the control is distributed over the network elements

involved in the recovery process. In contrast to centralized mechanisms, these

distributed control systems do not have a global but only a local view of the

network status. They may have to exchange messages to provide each other

with sufficient information and coordinate their recovery actions. As such,

multiple network elements work in parallel to put disrupted traffic on an

alternative route. A typical example of a distributed system is the control

plane in IP and G-MPLS networks.

Note that the recovery path computation can be decoupled from the recovery action itself. Hence, a recovery mechanism can combine centralized and distributed aspects. As an example, in IP, once the link or node failure is detected, every IP node in the network computes the recovery path on the fly (by recomputing its routing table) as soon as it is informed of the failure. Each node then reroutes the traffic whose destination can be reached via a new path. In the case of MPLS traffic engineering, the recovery paths can be computed before the failure, either by a central system (also called path computation server [PCS]) or by the nodes themselves. Once the failure is detected, the recovery decision is taken by the node detecting the failure, not by the central system (see Chapter 5 for more details).
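The IP behavior described above, in which every node recomputes its routing table once it learns of the failure, can be sketched in Python with Dijkstra's shortest-path algorithm. The topology, link costs and function names are hypothetical. The sketch shows how the next hop of node A toward C changes after link B-C fails, so that only the traffic for that destination is rerouted.

```python
import heapq


def next_hops(graph, source):
    """Dijkstra from source; returns {destination: first hop}. graph: {node: {neighbor: cost}}."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # The first hop is inherited from the path to node (or is nbr itself).
                heapq.heappush(heap, (nd, nbr, hop if hop is not None else nbr))
    return first_hop


def without_link(graph, a, b):
    """Copy of the graph with the (bidirectional) link a-b removed."""
    g = {n: dict(nbrs) for n, nbrs in graph.items()}
    g[a].pop(b, None)
    g[b].pop(a, None)
    return g


graph = {"A": {"B": 1, "D": 1}, "B": {"A": 1, "C": 1},
         "C": {"B": 1, "D": 3}, "D": {"A": 1, "C": 3}}
before = next_hops(graph, "A")
after = next_hops(without_link(graph, "B", "C"), "A")
changed = {dst: (before[dst], after.get(dst)) for dst in before if before[dst] != after.get(dst)}
print(changed)  # {'C': ('B', 'D')}
```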

Both control systems have their strengths and weaknesses. Some examples are as follows:

• In principle, centralized mechanisms are simpler. The interaction between the individual network elements in a distributed system can be quite complex, because each network element must be given a good and up-to-date view of the network status.

• Because of the complexity of this interaction, centralized systems tend to have a better global view of the network, whereas the view of distributed systems is typically more local.

• Because of their global view of the whole network topology and all its resources, centralized systems are generally more efficient in terms of required capacity. Also, more complicated algorithms are usually easier to implement on a central PCS than in individual optical/SDH/IP-MPLS nodes.

• Distributed systems are more scalable because of the parallel processing in the individual network elements.

• In a centralized system, the control architecture itself is also a vulnerable part of the network.

• It is easier for a human expert to supervise a centralized system. This may turn out to be beneficial in case of unaccounted catastrophes, in which the human operator may intervene in the control of the recovery (remotely controlled patching).

1.5.6 Ring Networks versus Mesh Networks

In a ring network (see Section 1.1.1), the restricted routing pattern holds not only for working paths: the recovery of traffic is also carried out on a ring-by-ring basis. If a failure occurs along a ring, the traffic is rerouted along the other side of the ring. Figure 1.21 shows an example in which two simultaneous faults (in different rings) lead to recovery actions in both affected rings.

To avoid single points of failure, special recovery measures must be taken for

the interconnection of the rings. Chapters 2 and 3 further elaborate on recovery

techniques in SONET/SDH and OTN ring networks, respectively.

In a mesh network, no restriction is imposed on the routing pattern of the

recovery path(s).
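The ring behavior described above can be sketched in Python for a single ring whose nodes are numbered 0 to n-1. The function names are hypothetical. For illustration only, the sketch assumes that working traffic follows the clockwise side of the ring and that recovery reroutes it end to end along the counterclockwise side. Actual ring mechanisms may instead loop the traffic back at the nodes adjacent to the failure.

```python
def ring_arc(n, src, dst, clockwise):
    """Nodes visited from src to dst on a ring of n nodes numbered 0..n-1."""
    step = 1 if clockwise else -1
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + step) % n)
    return path


def ring_route(n, src, dst, failed_link=None):
    """Clockwise working route; if a single link on it fails, use the other side of the ring."""
    working = ring_arc(n, src, dst, clockwise=True)
    used = {frozenset(pair) for pair in zip(working, working[1:])}
    if failed_link is None or frozenset(failed_link) not in used:
        return working  # the failure (if any) does not touch this traffic
    return ring_arc(n, src, dst, clockwise=False)


print(ring_route(6, 0, 2))          # [0, 1, 2]
print(ring_route(6, 0, 2, (1, 2)))  # [0, 5, 4, 3, 2]
print(ring_route(6, 0, 2, (3, 4)))  # [0, 1, 2]
```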



1.5.7 Connection-Oriented versus Connectionless

The connection-oriented or connectionless nature of a network technology is also

reflected in its setup of recovery paths. In connectionless networks, such as an IP

network, there is no need for a new connection between RHE and RTE before the

traffic can start flowing again. In connection-oriented networks, a recovery connec-

tion must be set up first (before the occurrence of a failure in preplanned mode or

when a fault is detected in dynamic mode).

1.5.8 Revertive versus Nonrevertive Mode

As mentioned in Section 1.3.2, some recovery mechanisms switch back from the

recovery path to the working path once the fault is completely repaired. If this

revertive mode is provided, this can lead to more efficient network utilization than

leaving the traffic along the recovery path. On the other hand, the nonrevertive

mode avoids the temporary repercussions of a switch-back operation.

1.6 Multilayer Recovery

In the previous sections, the main characteristics of recovery schemes active in one network layer (technology) were presented. These schemes prove very effective in covering a number of failure scenarios. In a realistic (multilayer) network, one could imagine a situation in which every network layer has its own recovery mechanism. For instance, in an IP-over-OTN network, IP restoration could be used to recover from an IP router failure or an IP interface card failure. One-to-one optical protection could then be used to recover from an OXC failure or an optical fiber cable cut.

Figure 1.21 Recovery in ring networks (Rings 1, 2, and 3; working path; recovery paths 1 and 2).

However, not every failure in a particular network layer can be resolved by a recovery mechanism in that same layer. Consider, for instance, Figure 1.22, where OXC B is hit by a failure. The fault is detected in the optical network, and a recovery action may be initiated in the OTN layer. However, this OTN recovery action cannot recover the traffic along the working path. From the OTN layer point of view, this traffic is nothing more than two separate connections, A-B and B-D, which both terminate in the failed node B and are therefore unrecoverable in the OTN layer. From the IP point of view, a number of secondary failures (links a-b, b-c, and b-d) are noticed, isolating router b. Upon detection of these faults, the IP network layer could also initiate recovery actions (ultimately leading to the recovery path indicated in Figure 1.22).

In other situations (e.g., the failure of link A-B in Figure 1.22), both a recovery action in the OTN layer and a recovery action in the IP layer are able to resolve the problem. If these recovery mechanisms are simply triggered by the detection of a fault, uncoordinated and inefficient actions may result. These examples make clear that interworking and coordination between the network layers are needed for recovery purposes. As explained in the following sections, this interworking may take different forms [Col02].

Figure 1.22 Multilayer recovery (IP layer: routers a, b, c, d; OTN layer: nodes A, B, C, D, E).

1.6.1 Sequential Approach

Instead of uncoordinated recovery in several network layers, one could impose a chronological order on the recovery mechanisms. This ensures that a fault is not resolved in different layers at the same time, which would lead to race conditions. Such an order could be implemented with a hold-off time (see Section 1.3.1). For instance, upon detection of a fault, the server layer may start recovery immediately, whereas the recovery mechanism in the client layer has a built-in hold-off time before initiating the client recovery process. This way, if the server layer recovery mechanism fixes the fault before the hold-off time expires, no client recovery action will take place. An alternative implementation is based on a recovery token signal. The server layer recovery mechanism sends this token to the client layer as soon as it knows that it cannot recover the traffic, and receipt of the token initiates the client layer recovery mechanism. This limits the traffic disruption time when the server layer cannot recover, because the client layer does not have to wait for a hold-off time to expire.
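The difference between the two implementations of the sequential approach can be quantified with a back-of-the-envelope Python model. All timing values are hypothetical inputs. The model assumes that the traffic disruption ends as soon as one layer has completed its recovery.

```python
from typing import Optional


def disruption_with_hold_off(server_time: Optional[float], hold_off: float,
                             client_time: float) -> float:
    """server_time: server layer recovery time, or None if it cannot recover.
    The client layer starts only after hold_off and then needs client_time."""
    if server_time is not None and server_time <= hold_off:
        return server_time  # fixed in the server layer; the client layer never acts
    client_done = hold_off + client_time
    # If the server layer is slower than the hold-off, both layers act (a race).
    return client_done if server_time is None else min(server_time, client_done)


def disruption_with_token(server_time: Optional[float], give_up_time: float,
                          client_time: float) -> float:
    """The server layer sends a recovery token at give_up_time if it cannot recover."""
    if server_time is not None:
        return server_time  # no token is sent; the client layer is never triggered
    return give_up_time + client_time


# Hypothetical timings in ms: the server layer cannot recover and realizes this after 10 ms.
print(disruption_with_hold_off(None, hold_off=100, client_time=200))  # 300
print(disruption_with_token(None, give_up_time=10, client_time=200))  # 210
# When the server layer recovers in 50 ms, both variants give 50 ms.
print(disruption_with_hold_off(50, hold_off=100, client_time=200))    # 50
print(disruption_with_token(50, give_up_time=10, client_time=200))    # 50
```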

1.6.2 Integrated Approach

A more radical means to ensure coordination between the recovery mechanisms in

different layers is to combine the two mechanisms in one integrated multilayer

recovery scheme. This implies that the recovery scheme has a full overview of

both layers and that it can decide when and in which layer (or layers) to take the

appropriate recovery actions. Although this approach is clearly the most flexible

one from a recovery point of view, combining different technologies in one mech-

anism is often unrealistic from a practical point of view. In Chapter 6, we elaborate

in much more detail on multilayer recovery and the related implementation issues.

1.7 Conclusion

This chapter is intended to present a brief overview of today’s and future network

technologies and concepts in general and of recovery mechanisms to augment the

reliability of information traffic through the network. Although this chapter

focused on the generic technology-independent characteristics of recovery tech-

niques, the following chapters will go into more detail for each individual technol-

ogy: SONET and SDH recovery in Chapter 2, optical network recovery in Chapter

3, IP recovery in Chapter 4, and MPLS-based recovery in Chapter 5. These various

recovery techniques are then combined in Chapter 6, providing the big picture

for recovery in multitechnology networks.



CHAPTER 2

SONET/SDH Networks

In the last decade, Synchronous Digital Hierarchy (SDH)/Synchronous Optical NETwork (SONET) networks have been deployed in many commercial networks. SDH/SONET is a technology used in transmission networks: These networks can provide large amounts of capacity between nodes in client networks in a flexible and cost-effective way. Because Internet Protocol (IP) data traffic is becoming the dominant type of traffic, the SDH/SONET technology has been enhanced with some new features (e.g., the Link Capacity Adjustment Scheme [LCAS]) to better fulfil the needs of this traffic type. As the traffic keeps growing, one can observe a slow evolution from SDH/SONET to Optical Transport Networks (OTNs). OTNs switch complete wavelength channels as a single entity, whereas SDH/SONET networks switch at a subwavelength granularity. This chapter is devoted to the SDH/SONET technology, and Chapter 3 is dedicated to OTNs.

With respect to network recovery, the SDH/SONET technology is commonly accepted as a network technology that has already proven capable of providing very fast protection switching (on the order of 50 or 60 milliseconds [ms]). Two elements make this possible. First, sophisticated supervisory processes handle failure detection, notification, and propagation. Second, the Automatic Protection Switching (APS) protocol switches very quickly from the affected resources to dedicated, preprovisioned protection/backup resources. Although protection rings have been the strategy chosen to implement fast protection switching, there is a trend to shift from ring-based to mesh-based networks.

Section 2.1 starts by introducing the concept of transmission/transport networks, and Section 2.2 gives a brief overview of the SDH/SONET technology. While discussing the SDH frame format, Section 2.2 particularly focuses on the part of the overhead that is needed for failure detection. Section 2.3 highlights the operational aspects of Automatic Protection Switching in SDH networks. More precisely, it discusses the failure notification and propagation process plus the basics of the APS protocol. Sections 2.4, 2.5, and 2.6 describe the various recovery strategies possible in SDH networks, starting in Section 2.4 with the popular protection rings. Section 2.5 continues with linear protection switching, and Section 2.6 highlights opportunities for restoration and compares restoration with protection. Section 2.7 presents a practical case study, showing the cost advantages of hybrid protection strategies and highlighting some issues with respect to providing protection for different node architectures. The main findings are summarized in Section 2.8. Section 2.9 points to some reference material on which this chapter relies and highlights some research topics related to SDH network recovery.

We are greatly indebted to Didier Colle, INTEC, Ghent University, for his exceptional contribution to the writing of Chapter 2.

2.1 Introduction

The goal of this section is to introduce the concept of transmission networks, of which SDH/SONET networks are a particular example. Section 2.1.1 starts by positioning transmission networks in the overall network. Section 2.1.2 then briefly discusses network management, which is an important aspect of transmission networks. Section 2.1.3 highlights how to model and structure transmission networks, and finally Section 2.1.4 summarizes Section 2.1.

2.1.1 Transmission Networks

Communications networks typically consist of network nodes interconnected by network links. Based on control information provided by the end user, the network nodes know how to ‘‘route’’ or ‘‘switch’’ traffic through the network. For example, in the case of a telephone network, the calling party dials the phone number of the called party. This number dictates how the exchanges in the network should connect incoming circuits to outgoing circuits to establish an end-to-end circuit between calling and called party. In the case of an IP network, the end user adds sufficient control information (the destination address) to each packet so that the routers can forward the packets in the direction of the destination (the routers use a routing protocol).

A cost-effective network typically consists of a relatively small number of network nodes (compared with the number of end users connected to the network). Often the network is organized hierarchically (e.g., think about the regional, national, and international levels in a telephone network). For cost-efficiency reasons, this typically results in many traffic flows that are bundled (or aggregated) and routed between the same network nodes. Nevertheless, it is often not reasonable or feasible to provide a physical link for each bundle. In the core of a telephone network, for example, doing so would typically require a (dense) mesh of links.

Therefore, transmission or transport networks aim at provisioning any set of high-bandwidth bit pipes (i.e., circuits) between the network nodes independently of the underlying physical network topology. The left side of Figure 2.1 illustrates how the transmission network infrastructure can be shared among multiple networks (here, an IP-based network and a telephone network). In this example, a ring network functions as a transmission network (see top of the figure). Thus, the network links or transmission bit pipes (i.e., circuits) between the local telephone exchanges (dotted lines) or IP routers (dashed lines) are realized as connections (crossing two physical links) in the transmission network. Because these high-bandwidth bit pipes are multiplexed onto a typically sparse topology, a single failure in the transmission network can affect a lot of traffic. This explains the tremendous importance of survivability in transmission networks.

Figure 2.1 Transmission networks (legend: LEX, IP router, transmission bit pipe, transmission network, server, IP host, NMS, TMN).

The top of Figure 2.1 shows a typical transmission or transport network. The network operator configures the transport network connections statically by some means, for example, through its network management system (NMS). In other words, no control information provided by the end user is incorporated in the configuration of the transport network. A logically separated Telecommunications Management Network (TMN) allows the central NMS to configure each network element (NE) in the network.

2.1.2 Management of (Transmission) Networks

Network management is not restricted to only the provisioning of network connec-

tions, but it also involves the following items, according to the Open Systems

Interconnection (OSI) FCAPS (i.e., fault, configuration, accounting, performance,

and security) classification [X700], [X701]:

• Fault management denotes the collection of management processes responsible for identifying, locating, and reporting problems or faults in the network. Fault management may trigger resilience techniques, as shown later in this chapter.

• Configuration management denotes the collection of management processes responsible for discovering and configuring network devices and connections. With respect to network resilience, configuration management is important because it allows configuring the network with sufficient redundancy to survive network faults and/or reconfiguring the network when a failure occurs.

• Accounting management denotes the collection of management processes responsible for keeping track of the network resources being reserved or used by, for example, a particular user or a particular traffic type. Based on these statistics, the users are billed as agreed contractually. Depending on the service-level agreement (SLA), the network operator might have to pay a penalty if the service has been interrupted for a certain amount of time, which underlines the importance of network survivability.

• Performance management denotes the collection of management processes responsible for monitoring the overall performance of the network and the performance perceived by the network user. This includes the performance of hardware, software, and/or any other media. Degraded performance might eventually result in a network fault.

• Security management denotes the collection of management processes responsible for issues such as controlling access to any available network resources, exchanging keys for encrypted transport of the data, and/or preventing denial-of-service (DoS) attacks.



In this chapter, the fault, performance, and configuration management aspects

of network recovery in SONET/SDH transmission networks are discussed in more

detail.

Managing every item in the network from a central NMS would not be very

scalable. Therefore, several abstraction levels exist for managing the network,

allowing the delegation of more detailed management tasks to agents managing a

smaller part of the network. Five layers (ordered from highest to lowest level of

abstraction) can be distinguished [M3010]:

• The business management layer (BML) is responsible for the total enterprise and in particular the agreements between customers and the operator.

• The service management layer (SML) is the layer responsible for negotiating the contractual aspects of the services offered to the customer. These service-level agreements (SLAs) specify the quality-of-service (QoS) measures that must be met on the overall network connection.

• The network management layer (NML) manages the overall network and is responsible for provisioning end-to-end connections.

• The scope of the network element management layer (NEML) is restricted to the subnetwork level.

• The network element layer (NEL) is the level conceptually representing the managed network equipment/elements.

2.1.3 Structuring/Modeling Transmission Networks

To understand how network failures are detected and propagated through the

transmission network and how network recovery mechanisms react to these fail-

ures, we must have a clear view of the structure of transmission networks.

A transmission network can be decomposed into one or more layers. Each network layer can be modeled as a set of atomic functions interconnecting reference points in the network layer, as depicted in Figure 2.2. This model follows the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) recommendations G.805 [G805] and G.806 [G806] and the European Telecommunications Standards Institute (ETSI) standard EN 300 417-1-1 [ETSI1].

. Connection functions (C) represent the flexibility in the network; more

precisely, a connection function connects a set of connection points (CPs)

at its border with each other. Because the flexibility of a connection function

can be represented by a matrix (e.g., think about the ‘‘switch matrix’’ or

‘‘switch fabric’’ in a cross-connect), each cross-connection realized by the

connection function is called a matrix connection (MC).

. Link connections (LC): The interconnections of CPs at the borders of

distinct connection functions are called link connections. As Figure 2.2

illustrates, link connections can be supported by one or more server layers.

. Subnetwork connection (SNC): Like the decomposition of a network into

layers (vertical decomposition), we can also decompose the network into one



or more interconnected subnetworks (horizontal decomposition). A subnetwork consists of at least one connection function, but could also comprise multiple or even all connection functions in the network, together with the link connections between these connection functions. Of course, a ‘‘nested’’ subnetwork can be defined inside a subnetwork. A subnetwork connection is the connection created between connection points on the border of the subnetwork.

At the ingress of the network, the client information has to be transformed,

wrapped in the necessary overhead and fed into the network. At the egress of the

network, a reverse operation is needed and the integrity of the received signal

(which is important with respect to network resilience) has to be supervised.

. CPs at the edge of the layer network are called termination connection points

(TCPs) and end-to-end network connections (NCs) are established between

these TCPs. Network connections carry characteristic information (CI),

which consists of the adapted information (AI) plus some overhead infor-

mation. The trail termination (TT) functions are responsible for adding

(source direction) and removing (sink direction) this overhead information,

which allows the sink TT function to supervise the integrity of the received signal.

. Adapted information (AI), carried over so-called trails, flows through access

points (APs) instead of through connection points (CPs). Adaptation func-

tions (A) are responsible for converting the client layer information into the

appropriate format (more precisely, the AI). This conversion includes

scrambling, encoding/decoding, alignment, multiplexing/demultiplexing,

bit rate adaptation, frequency justification, timing/clock recovery, smoothing, and/or payload justification. The client layer information is retrieved from the CP in the client layer network.

[Figure 2.2: a layer network between its client layer and its server layer, showing adaptation (A) and trail termination (TT) functions at both ends, a trail between access points (APs), a network connection (NC) between termination connection points (TCPs), connection functions (C) with matrix connections (MC) joined by link connections (LC) between connection points (CPs), and a subnetwork connection (SNC). Legend: A, Adaptation Function; TT, Trail Termination Function; C, Connection Function; AP, Access Point; CP, Connection Point; TCP, Termination Connection Point; NC, Network Connection; SNC, Subnetwork Connection; LC, Link Connection; MC, Matrix Connection.]

Figure 2.2 Functional structure/model of transmission networks. (ITU-T Recommendation G.805, ‘‘Generic functional architecture of transport networks,’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.)

The fact that transmission/transport networks typically consist of multiple layers does not necessarily imply that network recovery techniques are deployed in more than one layer. The coordination of recovery techniques deployed in multiple layers is discussed extensively in Chapter 6; this chapter is therefore limited to networks featuring a recovery technique at a single layer. Nevertheless, this chapter does describe how failures might propagate from a lower layer up to the layer deploying the network recovery technique.

2.1.4 Summary

. Reference points: Characteristic information (CI) flows through connection

points (CPs), whereas adapted information (AI) flows through access points

(APs).

. Atomic functions: connection function (C): (T)CP → (T)CP; trail termination (TT) function: AP → TCP in source direction and TCP → AP in sink direction; adaptation function (A): (T)CP → AP in source direction and AP → (T)CP in sink direction.
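The atomic functions summarized above can be chained into a toy model of one trail in the source-to-sink direction. This is a minimal sketch under stated assumptions: all function names are hypothetical, a one-byte label stands in for the signal type identifier checked by the adaptation functions, and a string stands in for the trail trace identifier checked by the TT functions. Real adaptation processing (scrambling, multiplexing, justification) and consequent actions such as AIS insertion are not modeled.

```python
from dataclasses import dataclass

PAYLOAD_LABEL = b"\x01"  # hypothetical stand-in for a signal type identifier


@dataclass(frozen=True)
class CharacteristicInfo:
    """CI at a (T)CP: adapted information (AI) plus trail overhead."""
    trace_id: str  # hypothetical stand-in for a trail trace identifier
    ai: bytes


def a_source(client_info: bytes) -> bytes:
    """A_So, (T)CP -> AP: convert client information into AI (here: prepend a label)."""
    return PAYLOAD_LABEL + client_info


def tt_source(ai: bytes, trace_id: str) -> CharacteristicInfo:
    """TT_So, AP -> TCP: add the overhead that the sink will supervise."""
    return CharacteristicInfo(trace_id=trace_id, ai=ai)


def connection(ci: CharacteristicInfo) -> CharacteristicInfo:
    """C, (T)CP -> (T)CP: a matrix connection forwards CI unchanged."""
    return ci


def tt_sink(ci: CharacteristicInfo, expected_trace: str) -> tuple[bytes, list[str]]:
    """TT_Sk, TCP -> AP: remove the overhead and supervise connectivity."""
    defects = [] if ci.trace_id == expected_trace else ["dTIM"]
    return ci.ai, defects


def a_sink(ai: bytes) -> tuple[bytes, list[str]]:
    """A_Sk, AP -> (T)CP: check the payload label and recover the client information."""
    defects = [] if ai[:1] == PAYLOAD_LABEL else ["dPLM"]
    return ai[1:], defects


def run_trail(client_info: bytes, sent_trace: str, expected_trace: str):
    """Pass client information through A_So, TT_So, C, TT_Sk, and A_Sk."""
    ci = connection(tt_source(a_source(client_info), sent_trace))
    ai, tt_defects = tt_sink(ci, expected_trace)
    client_out, a_defects = a_sink(ai)
    return client_out, tt_defects + a_defects


if __name__ == "__main__":
    print(run_trail(b"voice", "NE-A/1", "NE-A/1"))  # correctly connected trail
    print(run_trail(b"voice", "NE-A/1", "NE-B/7"))  # misconnected trail
```

Keeping the trace check in the TT sink and the label check in the A sink mirrors the split of responsibilities described in the text: TT functions supervise the trail, whereas adaptation functions deal with the client format.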

2.2 SDH and SONET Networks

The goal of this section is to give a brief overview of some major aspects of SDH/

SONET networks. The introduction of Section 2.2.1 situates SDH/SONET networks in the evolution of transmission networks. Section 2.2.2 describes the structure of SDH networks, Section 2.2.3 discusses the SDH frame structure, focusing on aspects relevant to network recovery, and Section 2.2.4 describes

different types of SDH network equipment. The major items are summarized in

Section 2.2.5. Finally, Section 2.2.6 highlights some differences and similarities

between SDH and SONET transmission networks.

2.2.1 Introduction

The remainder of this chapter focuses on one particular transmission/transport

network technology: Synchronous Digital Hierarchy (SDH). The Synchronous

Optical NETwork (SONET) technology is the U.S. counterpart of the SDH tech-

nology: Both technologies are compared with each other in Section 2.2.6.

One of the major advances of SDH over its predecessor, the Plesiochronous Digital Hierarchy (PDH), is that the clocks used for

processing received signals and generating signals to be transmitted in all nodes are

synchronized with each other through a synchronization network. This allows byte-

interleaved instead of bit-interleaved multiplexing and prevents frequent justifica-

tions or stuffing to compensate for the frequency mismatch between different


2.2 SDH and SONET Networks 452.2 SDH and SONET Networks 45

clocks. Accessing each individual multiplexed signal becomes possible by means of a pointer to the appropriate byte in a repetitive frame structure (e.g., a 64-kbps stream corresponds to exactly one byte in a frame structure that is repeated every 125 µs), thereby avoiding the need to demultiplex (and re-multiplex) the high-bandwidth aggregate signal and significantly reducing the complexity of the network equipment. To cope with the explosively growing volume of data traffic, Optical Transport Networks (OTNs) (discussed in Chapter 3) are expected to transport typical connections at or beyond the bit rate of SDH lines (up to 10 gigabits per second [Gbps]).

Taking into account that OTNs have not materialized yet (because of the economic slowdown) and that SDH networks will coexist with the introduced OTNs for a long time, it is clear that there is still a lot of interest in the SDH technology. This is reflected in the recent work to develop a ‘‘next-generation’’ SDH technology: the goal is to adapt the SDH technology to highly dynamic IP networks, overcoming the limitations caused by its initial targeting of more static telephone networks. The Link Capacity Adjustment Scheme (LCAS) [G7042] allows dynamic adjustment of the capacities of the transported signals as required. The Generic Framing Procedure (GFP) [G7041], [Her02] eases the encapsulation and transport of IP-based traffic over SDH and OTN frame formats. To allow SDH connections with a capacity beyond that of individual SDH connections, the concatenation of SDH connections between the same endpoints has been standardized. Nevertheless, much of the currently deployed SDH equipment does not implement this feature; therefore, virtual concatenation is seen as the solution to overcome the bandwidth limitations of current SDH connections. Virtual concatenation is based on independent connections routed between the same endpoints, where they are inverse multiplexed into a high-bandwidth connection. Although this avoids the need to upgrade the equipment inside the SDH network, the edge nodes must at least solve any synchronization problems that might occur as a result of the independent routing of the involved connections. Finally, note that even when no intermediate SDH network elements are in place, transporting the information in an SDH frame structure might still be useful (e.g., when interconnecting IP routers through packet-over-SONET [PoS]). Indeed, the SDH frame structure has proven to include valuable overhead information for supervisory purposes and is commonly accepted as a standard.
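The virtual concatenation idea can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the member payload rates are the container rates from the SDH multiplexing hierarchy, the "synchronization problem" at the edge is treated purely as differential delay between independently routed members, and all function names are hypothetical. The sequencing and multiframe indicators that real implementations use to re-align members are not modeled.

```python
import math

# Container (payload) rates in kbps, from the SDH multiplexing hierarchy.
C_PAYLOAD_KBPS = {"VC-4": 149_760, "VC-3": 48_384, "VC-12": 2_176, "VC-11": 1_600}


def members_needed(client_kbps: float, member: str) -> int:
    """Smallest number of concatenated members whose payload covers the client rate."""
    return math.ceil(client_kbps / C_PAYLOAD_KBPS[member])


def inverse_multiplex(data: bytes, k: int) -> list[bytes]:
    """Source edge: spread client bytes round-robin over k independent members."""
    return [data[i::k] for i in range(k)]


def reassemble(members: list[bytes]) -> bytes:
    """Sink edge: re-interleave member bytes, assuming they are already aligned."""
    out = bytearray()
    for i in range(max(len(m) for m in members)):
        for m in members:
            if i < len(m):
                out.append(m[i])
    return bytes(out)


def alignment_buffer_bytes(member_delays_us: list[float], member: str) -> int:
    """Buffer for the earliest-arriving member to absorb the differential delay."""
    differential_us = max(member_delays_us) - min(member_delays_us)
    # kbps * us = 1e-3 bit: divide by 1000 for bits and by 8 for bytes.
    return math.ceil(differential_us * C_PAYLOAD_KBPS[member] / 8000)


if __name__ == "__main__":
    print(members_needed(1_000_000, "VC-4"))  # a hypothetical 1-Gbps client
    members = inverse_multiplex(b"abcdefg", 3)
    print(members, reassemble(members))
    print(alignment_buffer_bytes([1_000, 3_000], "VC-4"))
```

Because the members are only re-interleaved at the edges, the intermediate SDH equipment sees ordinary, independent VCs; the cost is the alignment buffer at the sink edge, which grows with the differential delay.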

2.2.2 Structure of SDH Networks

As mentioned in Section 2.1.3, SDH is a transmission/transport network technol-

ogy that can be decomposed into several layers [G803]. As shown in Figure 2.3,

there exist two path layers and two section layers.

. The higher and lower order path (HOP and LOP) layers provide the

flexibility in the network through connection functions. As shown in Figure

2.3, LOPs are always routed over a chain of HOPs. In other words, a HOP

network connection can serve as an LOP link connection (more than one



LOP may be routed over the same HOP). Note also that not only the LOP

layer but also other non–SDH network layers can act as a client layer of the

HOP layer.

. The multiplex section (MS) layer is responsible for multiplexing traffic and,

thus, increasing the transmission bit rate between the network elements

providing the connection functions in the path layers. The MS layer typi-

cally does not feature connection functionality, except for specific purposes

like multiplex section protection (MSP).

. The regenerator section (RS) layer is responsible for supervising the regen-

erator sections. As shown in Figure 2.3, an MS may span multiple RSs,

when regenerators are introduced in the network. Regenerators aim at

cleaning up the distorted transmission signal to extend the reach of the

transmission links. SDH regenerators perform 3R–regeneration: reamplifi-

cation, reshaping, and retiming.

An additional physical media layer (typically optical, but also electrical or radio

media are possible) is responsible for the actual transmission of the signal. Option-

ally, an optical transport network (OTN) can be deployed between the SDH

network (layers) and the optical media layer.

The format of the digital signal flowing between SDH NEs is called Synchronous Transport Module of order N (STM-N). An STM-N signal has a bit rate of N times the bit rate of an STM-1 signal: N × 155,520 kilobits per second (kbps) (as explained in Section 2.2.6, an SDH STM-N signal corresponds to a SONET Synchronous Transport Signal of level 3N [STS-3N], or to an Optical Carrier of level 3N [OC-3N] when the STS-3N is transported optically). Virtual Containers-n (VC-n) support path layer connections: a VC-n includes the Container-n (C-n), which corresponds to the payload information, and the path overhead (POH) information. Per STM-1 signal, one VC-4 or three VC-3s are transported at the HOP layer. As mentioned earlier, an HOP can carry a number of LOPs; more precisely, a VC-4 can transport up to three VC-3s, 21 VC-2s, 63 VC-12s, or 84 VC-11s.⁶ As the multiplexing hierarchy in Figure 2.4 shows, this capacity can also be allocated to a mix of these LOP signals. For example, a VC-4 is able to transport 1 VC-3 plus 7 VC-2s plus 21 VC-12s. A higher order VC-3 (HOVC-3) can accommodate only one third of a VC-4: more precisely, 7 VC-2s, 21 VC-12s, or 28 VC-11s. The fact that a VC-3 can act as an HOP or as an LOP explains the ‘‘two-way’’ arrow between the OR function and the VC-3 signal in Figure 2.4.

Figure 2.3 Layer structure of SDH networks. (ITU-T Recommendation G.803, ‘‘Architecture of transport networks based on the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.)

⁶ Sometimes VC-1 is used to refer to either a VC-11 or a VC-12.
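The capacity rules above can be checked mechanically. The following sketch borrows the Tributary Unit Group structure (TUG-2/TUG-3) from ITU-T G.707, which the text does not describe, so treat it as an assumption: a VC-4 holds three TUG-3s, a TUG-3 holds either one VC-3 or seven TUG-2s, a VC-3 used as a higher order path holds seven TUG-2s, and each TUG-2 holds one VC-2, three VC-12s, or four VC-11s without mixing types. The function names are hypothetical.

```python
import math

# Assumed from ITU-T G.707 (TUG structure), not from the text itself.
TUG3_PER_VC4 = 3
TUG2_PER_TUG3 = 7
TUG2_PER_HOVC3 = 7


def tug2_needed(vc2: int = 0, vc12: int = 0, vc11: int = 0) -> int:
    """TUG-2 slots consumed: one per VC-2, one per 3 VC-12s, one per 4 VC-11s."""
    return vc2 + math.ceil(vc12 / 3) + math.ceil(vc11 / 4)


def fits_in_vc4(vc3: int = 0, vc2: int = 0, vc12: int = 0, vc11: int = 0) -> bool:
    """Can this mix of lower order VCs be carried by a single VC-4?"""
    if vc3 > TUG3_PER_VC4:
        return False
    # Every lower order VC-3 occupies a whole TUG-3; the rest is split into TUG-2s.
    free_tug2 = (TUG3_PER_VC4 - vc3) * TUG2_PER_TUG3
    return tug2_needed(vc2, vc12, vc11) <= free_tug2


def fits_in_hovc3(vc2: int = 0, vc12: int = 0, vc11: int = 0) -> bool:
    """Can this mix be carried by a VC-3 acting as a higher order path?"""
    return tug2_needed(vc2, vc12, vc11) <= TUG2_PER_HOVC3


if __name__ == "__main__":
    print(fits_in_vc4(vc3=1, vc2=7, vc12=21))  # the mixed example from the text
    print(fits_in_vc4(vc12=64))                # one VC-12 too many
    print(fits_in_hovc3(vc11=28))
```

The example mix from the text (1 VC-3, 7 VC-2s, and 21 VC-12s) uses exactly one TUG-3 for the VC-3 and 14 TUG-2s for the rest, filling the VC-4 completely.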

2.2.3 SDH Frame Structure: Overhead Bytes Relevant for Network Recovery

Figure 2.5 shows the STM-1 frame format in more detail. This commonly known representation consists of 9 rows of 270 bytes each (i.e., 9 rows and 270 columns). As mentioned earlier, SDH networks were initially developed for supporting telephone networks. In accordance with the Nyquist-Shannon sampling theorem, telephone signals (limited to 4 kilohertz [kHz]) are digitized with a sampling frequency of 2 × 4 kHz = 8 kHz, and the duration of an STM-1 frame was chosen to equal exactly the time between two sample points: 125 microseconds (µs). This explains the bit rate of an STM-1 signal mentioned in Figure 2.4:

$$\frac{8\ \text{bits/byte} \times 9 \times 270\ \text{bytes}}{125\ \mu\text{s}} = 155{,}520\ \text{kbps}$$

The bit rate of an STM-N signal is N times the bit rate of an STM-1 signal, implying also that each STM-N frame has a duration of exactly 125 µs.
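The same arithmetic extends to every rate in the hierarchy: one byte per 125-µs frame is 8 bits × 8,000 frames per second = 64 kbps. The sketch below reproduces the STM-N rates for the values of N used in current standards, the SONET equivalents stated in the text, and checks that each rate listed in Figure 2.4 corresponds to a whole number of bytes per frame. The helper names are hypothetical; the rates are those of Figure 2.4.

```python
FRAMES_PER_SECOND = 8_000  # one frame every 125 us
BITS_PER_BYTE = 8
STM1_ROWS, STM1_COLUMNS = 9, 270


def rate_kbps(bytes_per_frame: int) -> int:
    """Bit rate of a signal occupying a fixed number of bytes in every 125-us frame."""
    return bytes_per_frame * BITS_PER_BYTE * FRAMES_PER_SECOND // 1000


def stm_n_kbps(n: int) -> int:
    """STM-N bit rate: N times the STM-1 rate."""
    return n * rate_kbps(STM1_ROWS * STM1_COLUMNS)


def sonet_equivalent(n: int) -> str:
    """STM-N corresponds to SONET STS-3N (OC-3N when transported optically)."""
    return f"STS-{3 * n}/OC-{3 * n}"


# Rates listed in the SDH multiplexing hierarchy (Figure 2.4), in kbps.
FIGURE_2_4_KBPS = {
    "VC-4": 150_336, "C-4": 149_760, "VC-3": 48_960, "C-3": 48_384,
    "VC-2": 6_848, "C-2": 6_784, "VC-12": 2_240, "C-12": 2_176,
    "VC-11": 1_664, "C-11": 1_600,
}

if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        print(f"STM-{n} ({sonet_equivalent(n)}): {stm_n_kbps(n):,} kbps")
    # 64 kbps per byte per frame, so each rate maps to whole bytes per frame.
    for name, kbps in FIGURE_2_4_KBPS.items():
        print(f"{name}: {kbps // 64} bytes per 125-us frame")
```

For example, the VC-4 rate of 150,336 kbps corresponds to 2,349 = 9 × 261 bytes per frame, which matches the 9 × 261 VC-4 area of Figure 2.5.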

Figure 2.5 shows that the first 9 columns⁷ are dedicated to the section overhead (SOH), whereas the remaining 261 columns contain the STM-1 payload. The SOH is split into three rows of regenerator section overhead (RSOH) and five rows of multiplex section overhead (MSOH). The remaining row contains a pointer (H bytes) indicating where the HOVCs start: VCs can be shifted in time compared to the underlying stream of STM-1 frames.

[Figure 2.4: SDH multiplexing hierarchy. Section layers: STM-N, N × 155,520 kbps. HOP layer: VC-4, 150,336 kbps (C-4: 149,760 kbps); VC-3, 48,960 kbps (C-3: 48,384 kbps). LOP layer: VC-2, 6,848 kbps (C-2: 6,784 kbps); VC-12, 2,240 kbps (C-12: 2,176 kbps); VC-11, 1,664 kbps (C-11: 1,600 kbps). The diagram shows the multiplexing factors (×N, ×3, ×7, ×4, ×1) and the OR branches between levels.]

Figure 2.4 SDH multiplexing hierarchy. (ITU-T Recommendation G.707/Y.1322, ‘‘Network node interface for the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.)

⁷ More precisely, all rows except the fourth row in the first nine columns.

An HOVC-n together with its pointer forms

an Administrative Unit-n (AU-n). Similarly, Tributary Unit-ns (TU-ns) exist at the

LOP layer. The path overhead (POH) of a VC-4 (see Figure 2.5) or a VC-3 signal

occupies one column (minus one H4 byte). VC-11, VC-12, and VC-2 frames span

four STM-1 frames, forming a multiframe (the seventh and eighth bits of the H4 byte in the HOVC indicate the position of the current frame in the multiframe): 4 bytes per multiframe (or 1 byte per STM frame) are allocated for VC-11, VC-12, or VC-2 POH.

The frame structure in Figure 2.5 is just an example. A VC-4 has been chosen as

an HOP. Because an STM-1 can carry only a single VC-4, only one of the three H1,

H2, and H3 bytes is used (these bytes are needed in case three VC-3s are multiplexed

in the same STM-1). Note that some of the H bytes are reserved for supporting pointer justifications (in Figure 2.5, H3 can be used to carry payload information if needed in the case of a pointer justification).

So far, we have mainly discussed STM-1 signals. Although not completely correct, we can roughly think of an STM-N signal as N byte-interleaved STM-1 signals. In the current standards, N can equal 1 (155 Mbps), 4 (622 Mbps), 16 (2.5 Gbps), 64 (10 Gbps), or 256 (40 Gbps). For N larger than 1, concatenated VCs (see also Section 2.2.1) beyond the bit rate of a VC-4 can be supported. A complete discussion of the multiplexing and frame structure is beyond the scope of this book, but more details can be found elsewhere (e.g., [G707], [Sex92]).
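The rough picture of an STM-N signal as N byte-interleaved STM-1 signals, and the earlier point that a multiplexed signal can be reached by its byte position without demultiplexing the whole aggregate, can be illustrated with a short sketch. It deliberately follows the simplified view given in the text: the way the section overhead of an STM-N frame is actually composed is ignored, and the function names are hypothetical.

```python
def byte_interleave(tributaries: list[bytes]) -> bytes:
    """Build an aggregate by taking one byte from each tributary in turn."""
    if len({len(t) for t in tributaries}) != 1:
        raise ValueError("tributaries must be non-empty and of equal length")
    return bytes(b for column in zip(*tributaries) for b in column)


def extract_tributary(aggregate: bytes, n: int, index: int) -> bytes:
    """Read tributary `index` (0-based) directly from an N-way interleaved aggregate."""
    if not 0 <= index < n:
        raise IndexError("tributary index out of range")
    return aggregate[index::n]


if __name__ == "__main__":
    stm1s = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # four hypothetical STM-1 byte streams
    stm4 = byte_interleave(stm1s)
    print(stm4)
    print(extract_tributary(stm4, 4, 2))
```

Because the interleaving is byte-wise and the frame is synchronous, one tributary is recovered with a fixed stride over the aggregate; no stuffing bits have to be removed first, which is the complexity the text attributes to bit-interleaved PDH multiplexing.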

The goal of this chapter is not to explain every single overhead byte in detail, so

our discussion here focuses on issues relevant in the context of network recovery.

First, overhead bytes are needed for fault detection. Second, some overhead bytes

propagate fault information. Third, the signaling for the Automatic Protection

Switching (APS) protocol is transported in some overhead bytes.

[Figure 2.5: STM-1 frame of 9 × 270 bytes. The first nine columns hold the section overhead: RSOH in rows 1 to 3 (A1, A2, J0, B1, E1, F1, D1 to D3, plus media-dependent and unused [NU] bytes), the pointer in row 4 (H1, H2, H3), and MSOH in rows 5 to 9 (B2, K1, K2, D4 to D12, S1, Z1, Z2, M1, E2). The remaining 261 columns carry the VC-4 (9 × 261 bytes), whose first column is the path overhead (J1, B3, C2, G1, F2, H4, F3, K3, N1).]

Figure 2.5 STM-1 frame format. (ITU-T Recommendation G.707/Y.1322, ‘‘Network node interface for the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.)



Table 2.1 summarizes, for each layer, the defects that can be declared (in accordance with [G806] and [G783]): for each defect, the table states which overhead bytes to monitor and how long it takes to declare this defect from the moment the first anomaly has been perceived.

Table 2.1 Defect Detection Times¹

| Defect | Regenerator Section | Multiplex Section | VC-4/3 | VC-2/12/11 |
|---|---|---|---|---|
| Loss of Frame (dLOF) | A1, A2: 3 ms | | | |
| Loss of Multiframe (dLOM) | | | | H4b7-8: 1-5 ms |
| Loss of Pointer (dLOP) | | | H1, H2: [8-10] × 125 µs | V1, V2: [8-10] × 500 µs |
| Trace Id Mismatch (dTIM)² | J0: < 100 ms | | J1: < 100 ms | J2: < 100 ms |
| Payload Mismatch (dPLM) | | | C2: < 100 ms | V5b5-7, K4b1: < 100 ms |
| Degraded Signal (dDEG)³ | B1: 10^(x−2) ms | B2: 10^(x−2) ms | B3: 10^(x−2) ms | V5b1-2: 4 × 10^(x−2) ms |
| Excessive Error (dEXC)⁴ | B1: 10^(x−2) ms | B2: 10^(x−2) ms | B3: 10^(x−2) ms | V5b1-2: 4 × 10^(x−2) ms |
| All Ones (dAIS) | | K2b6-8: 3 × 125 µs | C2: 5 × 125 µs; H1, H2: 3 × 125 µs | V5b5-7: 5 × 500 µs; V1, V2: 3 × 500 µs |
| Remote Defect Ind. (dRDI) | | K2b6-8: [3-5] × 125 µs | G1b5: {3, 5, 10} × 125 µs | V5b8: {3, 5, 10} × 500 µs |
| Unequipped VC (dUNEQ) | | | C2: 5 × 125 µs | V5b5-7: 5 × 500 µs |

¹ Throughout this chapter, we use the notation XXbY to indicate bit Y in the overhead byte XX.
² Data from ‘‘Transmission and Multiplexing (TM); Generic requirements of transport functionality of equipment; Part 1-1: Generic processes and performance,’’ ETSI EN 300 417-1-1 V1.2.1, October 2001.
³ A bit interleaved parity (BIP) mechanism is adopted to measure the bit error rate (BER). Assuming a Poisson distribution of errors, the values in this table indicate the maximum measuring time needed to declare and clear defects when the BER thresholds are set to 10^−x and 10^−(x+1), respectively. For dDEG, x can be configured in the range of 5 through 9. The numbers in this table are not valid when assuming a bursty distribution of errors; the measuring time is then between 2 and 10 seconds.
⁴ The process of declaring and clearing dEXC defects is similar to that for dDEG defects, except that x is configured in the range of 3 through 5. dEXC defects cannot be declared when assuming a bursty distribution of errors.

A first category of defects includes the failure to track the beginning of an STM-N frame, the position of a VC relative to such a frame, or



the position of a frame relative to a multiframe, respectively, resulting in a Loss of Frame defect (dLOF), a Loss of Pointer defect (dLOP), or a Loss of Multiframe defect (dLOM) (these defects are particular instances of the Loss of Alignment defect [dLOA]). A second category is based on signal quality, measured in terms of bit error rate (BER) by means of a bit interleaved parity (BIP) mechanism, which allows a Degraded Signal (dDEG) defect or an Excessive Error (dEXC) defect to be declared.

A third category involves payload-type supervision (a signal type identifier allows

verifying the compatibility of the adaptation functions at source and sink), connectivity supervision (a trail trace identifier allows verifying that a source TT function is not misconnected to the wrong sink TT function), and continuity

supervision (monitoring the presence/absence of the CI allows supervising the signal

integrity) resulting in a Payload Mismatch defect (dPLM), a Trace Identifier

Mismatch defect (dTIM), and an Unequipped defect (dUNEQ), respectively. Connection functions insert an unequipped VC signal (more precisely, an all-0s pattern in some overhead bytes) at those outputs that are not connected to one of their

inputs. There also exist supervisory-unequipped signals (more precisely, an un-

equipped VC with a valid trail trace identifier and Remote Defect Indication

[RDI], Remote Error Indication [REI] bytes) to test a connection between two

TT supervisory unequipped functions [ETSI1]. Finally, a fourth category involves

the supervision of maintenance signals: an alarm indication signal (AIS) (or an all

1s signal) is sent downstream to indicate an upstream failure, whereas the sink sends

a Remote Defect Indication (RDI) signal upstream to indicate that the trail in the

opposite direction is failing. This mechanism is explained in more detail in Sections

2.3.1 through 2.3.3.
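The bit interleaved parity behind the second category can be sketched in a few lines. This is a minimal illustration, assuming a BIP-8 computed over an arbitrary block of bytes; which bytes B1 or B3 actually cover, and how violations are accumulated into BER estimates against the 10^−x thresholds, are not modeled, and the function names are hypothetical.

```python
from functools import reduce


def bip8(block: bytes) -> int:
    """BIP-8: even parity over each of the 8 bit positions, i.e., the XOR of all bytes."""
    return reduce(lambda acc, byte: acc ^ byte, block, 0)


def bip8_violations(block: bytes, received_bip: int) -> int:
    """Number of bit positions (0 to 8) whose parity check fails at the sink."""
    return bin(bip8(block) ^ received_bip).count("1")


if __name__ == "__main__":
    sent = bytes(range(16))
    code = bip8(sent)  # computed at the source and carried in the overhead
    one_error = bytearray(sent)
    one_error[5] ^= 0x01  # flip one bit
    print(bip8_violations(bytes(one_error), code))
    two_errors = bytearray(one_error)
    two_errors[9] ^= 0x01  # second error in the same bit position
    print(bip8_violations(bytes(two_errors), code))
```

The second case shows an inherent limitation of parity: two errors in the same bit position of one block cancel out, so BIP counts only estimate the true number of errored bits, which is one reason the detection times in Table 2.1 are derived under an assumed error distribution.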

Table 2.1 makes it clear that the time needed to declare a defect depends on which supervision process is involved. Although at least 375 µs are needed to detect any kind of defect, the quality, connectivity, and continuity supervision processes only declare failures after 10 ms or more (up to 10,000 seconds, or about 2 hours and 47 minutes, for a dDEG with a threshold set to 10^−9). The detection times for the other defects are compared with each other in Figure 2.6. The error bars indicate the tolerable range in accordance with Table 2.1. Most of these defects are declared after the corresponding anomaly is monitored in a small number of consecutive frames; this explains the multiplication factor of 125 µs (the length of an STM-N frame, as explained earlier) or 4 × 125 µs = 500 µs in the case of a VC-2/12/11 (because each multiframe spans four STM-N frames). For example, at the VC-4/3 level, an AUdAIS (a dAIS defect based on the H1 and H2 pointer bytes) and a VCdAIS (a dAIS defect based on the C2 byte) are detected within 3 × 125 µs = 375 µs and 5 × 125 µs = 625 µs, respectively, whereas a dRDI is typically detected within 5 × 125 µs = 625 µs, in the best case within 3 × 125 µs = 375 µs, and in the worst case within 10 × 125 µs = 1.25 ms. Figure 2.6 clearly shows that, except for the worst-case dRDI detection, defects resulting from the maintenance signal supervision typically need the lowest amount of time to be declared. This is important because it suppresses most of the impact of defects that are only a side effect of the root failure. Note, however, that the underlying physical layer might detect and notify a Loss of Signal defect (dLOS), caused by a transmitter failure or an optical path break, even faster: more precisely, within 2.3 to 100 µs.
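The detection times quoted above follow directly from the frame and multiframe periods and from the measuring times in Table 2.1. The sketch below reproduces those numbers; the helper names are hypothetical, and the BER measuring time formula is taken from Table 2.1, so it only holds under the Poisson error assumption stated in its footnote.

```python
FRAME_US = 125                 # STM-N frame period
MULTIFRAME_US = 4 * FRAME_US   # a VC-2/12/11 multiframe spans four frames


def detection_time_us(frames: int, lower_order: bool = False) -> int:
    """Declaration time when a defect needs `frames` consecutive (multi)frames."""
    return frames * (MULTIFRAME_US if lower_order else FRAME_US)


def ber_measuring_time_ms(x: int, lower_order: bool = False) -> int:
    """Maximum dDEG/dEXC measuring time for a 10^-x threshold, per Table 2.1."""
    base = 10 ** (x - 2)
    return 4 * base if lower_order else base


def as_h_min_s(ms: int) -> tuple[int, int, int]:
    """Split a duration in milliseconds into hours, minutes, and seconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds


if __name__ == "__main__":
    print("AU-AIS:", detection_time_us(3), "us")
    print("VC-AIS:", detection_time_us(5), "us")
    print("dRDI best/typical/worst:", [detection_time_us(k) for k in (3, 5, 10)], "us")
    print("dDEG at 10^-9:", as_h_min_s(ber_measuring_time_ms(9)), "(h, min, s)")
```

The last line gives 10,000 seconds, that is, 2 hours, 46 minutes, and 40 seconds, which the text rounds to 2 hours and 47 minutes.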

To summarize the previous discussion, the overhead used to transport maintenance signals, namely K2(b6–8) in the MS OH, G1(b5) and C2 in the VC-4/3 OH, V5(b5–8) in the VC-2/12/11 OH, plus the AU and TU pointers (H1 and H2, and V1 and V2, respectively), is very important in the context of network recovery: for example, these maintenance signals might trigger Automatic Protection Switching (APS). In addition, the K bytes (K1 and K2[b1–5] in the MS OH, K3[b1–4] in the VC-4/3 OH, and K4[b3–4] in the VC-2/12/11 OH) transport the signaling messages of the APS protocol, which is discussed in Section 2.3.4.
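Reading these overhead bits follows the XXbY notation of Table 2.1, in which bits are numbered from 1 (most significant) to 8. The sketch below shows that convention together with a few of the maintenance signal bits listed above. It is a minimal illustration with hypothetical function names; the K2b6-8 codes (111 for MS-AIS, 110 for MS-RDI) are taken from ITU-T G.707 rather than from the text, other K2b6-8 codes are deliberately not decoded, and the APS fields in K2b1-5 are returned raw.

```python
def bits(byte: int, first: int, last: int) -> int:
    """Bits `first`..`last` of an overhead byte, ITU-T numbering (bit 1 = MSB)."""
    if not 1 <= first <= last <= 8:
        raise ValueError("bit positions must satisfy 1 <= first <= last <= 8")
    width = last - first + 1
    return (byte >> (8 - last)) & ((1 << width) - 1)


def ms_maintenance_signal(k2: int) -> str:
    """Interpret K2b6-8: 111 = MS-AIS, 110 = MS-RDI (other codes not decoded here)."""
    code = bits(k2, 6, 8)
    if code == 0b111:
        return "MS-AIS"
    if code == 0b110:
        return "MS-RDI"
    return "no AIS/RDI"


def hop_rdi(g1: int) -> bool:
    """G1b5 carries RDI for a VC-4/3 path."""
    return bits(g1, 5, 5) == 1


def lop_rdi(v5: int) -> bool:
    """V5b8 carries RDI for a VC-2/12/11 path."""
    return bits(v5, 8, 8) == 1


def ms_aps_bits(k2: int) -> int:
    """K2b1-5, part of the MS APS signaling (fields not decoded here)."""
    return bits(k2, 1, 5)


if __name__ == "__main__":
    print(ms_maintenance_signal(0b00000111), ms_maintenance_signal(0b00000110))
    print(hop_rdi(0b00001000), lop_rdi(0b00000001))
```

Because bit 1 is the most significant bit, "bits 6 to 8" are the three least significant bits of K2, while G1b5 is the bit with value 0x08.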

2.2.4 SDH Network Elements

Although it is crucial to understand how an SDH network can be decomposed into network layers consisting of atomic functions that process the overhead bytes, it is also important to understand how this model corresponds to a set of equipment (or network nodes) interconnected with each other (via network links). The example in Figure 2.7 shows that a piece of equipment typically spans multiple layers and is (logically) built out of the atomic functions described earlier.

More precisely, Figure 2.7 shows an add/drop multiplexer (ADM) that allows adding/dropping up to four VC-4 tributary signals into/from an STM-N aggregate signal. An ADM always terminates two network links: this includes termination of the physical section (we implicitly assume here an optical section [OS]), the Regenerator Section (RS), and the Multiplex Section (MS). This add/drop functionality is provided by the S4_C connection functions. By providing the appropriate S4/<client>_A adaptation function, this ADM can connect to any kind of client network equipment (e.g., a PDH device or a router).

[Figure 2.6: bar chart of detection times in microseconds (0 to 5,000 µs) for dAIS, VCdAIS, dRDI, dUNEQ, dLOF, dLOM, and dLOP, grouped by layer (RS, MS, VC-4/3, VC-2/12/11); error bars show the tolerable range.]

Figure 2.6 Comparison of short detection times.

Note also that the legend of Figure 2.7 outlines the naming convention used for

the atomic functions mentioned throughout the remainder of this section. An

optional suffix ‘‘_So’’ or ‘‘_Sk’’ may be added to indicate source or sink direction,

respectively. In Figure 2.7, the atomic functions are assumed to be bidirectional.

Note also that in accordance with the standards, this naming convention uses ‘‘Sn’’

to indicate a ‘‘VC-n path.’’ For example, an S4/S12_A_So function corresponds to

a source adaptation function from a VC-12 CP to a VC-4 AP.

SDH NEs are classified into three categories (Figure 2.8), depending on the

number of aggregate signals terminated in the NE and the flexibility provided by

the connection functions. A terminal multiplexer (TM) multiplexes a number of

tributary signals into one aggregate signal. A TM typically does not provide any

flexibility. Instead of a fixed time-slot assignment to the tributary signals, an

optional connection function (notice the dashed line) might allow a flexible time-

slot assignment. Add/drop multiplexers (ADMs) terminate two aggregate signals.

Therefore, ADMs are typically used in a ring configuration: the ‘‘ring’’ then corresponds to the aggregate signal between the ADMs. The flexibility of the

connection functions in an ADM is restricted to continuing the paths on the ring

(thus, from the aggregate port at one side into the other aggregate port) or adding/

dropping VCs into/from this signal. As is explained in Section 2.4, ring networks are

very suitable for providing network recovery. Note also that ADMs are often configured as TMs by leaving out the second aggregate port. The highest flexibility is provided by digital cross-connects (DXCs). Indeed, there is no (functional) restriction on the number of terminated aggregate ports, and the featured connection functions provide full flexibility (e.g., as illustrated in Figure 2.8, any tributary port can be connected to another tributary port).

[Figure 2.7: an STM-N ADM built from atomic functions. Each of the two STM-N aggregate signals passes through OSN_TT, OSN/RSN_A, RSN_TT, RSN/MSN_A, MSN_TT, and MSN/S4_A; an S4_C connection function interconnects them; each tributary/client signal (1 to N) uses S4_TT and S4/<Client>_A. Legend: X_TT, trail termination function in layer X; X/Y_A, adaptation function between server layer X and client layer Y; X_C, connection function in layer X.]

Figure 2.7 Example of an STM-N ADM. (ITU-T Recommendation G.806, ‘‘Characteristics of transport equipment – description methodology and generic functionality,’’ ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.)

Figures 2.7 and 2.8 might give the impression that NEs only feature connec-

tion functions at the VC-4 level, but this of course is not always the case. First,

as mentioned earlier, not only VC-4s but also VC-3s can serve as higher order

paths (in that case, all ‘‘S4’’ instances in Figure 2.7 have to be replaced by ‘‘S3’’).

Second, connection functions can be provided at the higher and/or the lower

order path layer. For example, some of the S4/<client>_A adaptation functions

in the ADM of Figure 2.7 might be S4/S12_A adaptation functions connected to

an S12_C connection function, realizing VC-12 layer functionality. A commonly used DXC naming convention reflects all these possibilities: the format is DXC-X/Y1/.../YI, where X refers to the order of the higher order paths and Y1/.../YI lists the orders of the VCs being cross-connected. For example, a DXC-4/1 terminates VC-4 higher order paths (X = 4) and cross-connects VC-11s or VC-12s (Y1 = 1); in a DXC-4/4, VC-4 higher order paths also enter the DXC (X = 4) and are cross-connected (Y1 = 4); whereas in a DXC-4/4/1, VC-4 higher order paths enter the DXC (X = 4), are cross-connected (Y1 = 4), and are possibly terminated for the cross-connection of VC-11s or VC-12s (Y2 = 1).

[Figure 2.8: functional views of a terminal multiplexer, an add/drop multiplexer, and a digital cross-connect, each built on MS, RS, and OS functions, with tributary ports at the top and aggregate ports at the bottom; the ADM distinguishes ‘‘Continue’’ and ‘‘Add/Drop’’ connections.]

Figure 2.8 Classification of SDH network elements (tributary ports: top side; aggregate ports: bottom side). (ITU-T Recommendation G.806, ‘‘Characteristics of transport equipment – description methodology and generic functionality,’’ ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.)
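The DXC-X/Y1/.../YI naming convention described above is regular enough to parse. The following sketch is purely illustrative: the names are hypothetical, the mapping of order 1 to "VC-11/VC-12" follows the text's reading of DXC-4/1, and the check that no cross-connected order exceeds the higher order path order is an assumed sanity rule, not part of any standard.

```python
import re
from dataclasses import dataclass

# VC orders used in the DXC-X/Y1/.../YI convention; order 1 covers VC-11 and VC-12.
VC_ORDER = {4: "VC-4", 3: "VC-3", 2: "VC-2", 1: "VC-11/VC-12"}


@dataclass(frozen=True)
class DxcType:
    hop_order: int                    # X: order of the terminated higher order paths
    cross_connected: tuple[int, ...]  # Y1, ..., YI: orders of the cross-connected VCs


def parse_dxc(name: str) -> DxcType:
    """Parse a name such as 'DXC-4/4/1' into its X and Y1..YI components."""
    match = re.fullmatch(r"DXC-(\d)((?:/\d)+)", name)
    if match is None:
        raise ValueError(f"not a DXC-X/Y1/.../YI name: {name!r}")
    x = int(match.group(1))
    ys = tuple(int(y) for y in match.group(2).split("/")[1:])
    # Assumed sanity rule: known orders only, none above the higher order path.
    if x not in VC_ORDER or any(y not in VC_ORDER or y > x for y in ys):
        raise ValueError(f"unsupported VC order in {name!r}")
    return DxcType(x, ys)


def describe(dxc: DxcType) -> str:
    """Human-readable summary of what the DXC terminates and cross-connects."""
    switched = ", ".join(VC_ORDER[y] for y in dxc.cross_connected)
    return f"terminates {VC_ORDER[dxc.hop_order]} paths; cross-connects {switched}"


if __name__ == "__main__":
    for name in ("DXC-4/1", "DXC-4/4", "DXC-4/4/1"):
        print(name, "->", describe(parse_dxc(name)))
```

Keeping Y1/.../YI as an ordered tuple preserves the convention's meaning that the listed orders are cross-connected at successive levels, as in the DXC-4/4/1 example.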

Finally, in Figures 2.7 and 2.8, it is implicitly assumed that client network

equipment connects to an interface compliant with client network technology and

that the adaptation (e.g., the S4/<client>_A function in Figure 2.7) is performed on

the SDH NE inside the SDH network. An alternative is illustrated in Figure 2.9, in which the adaptation function is provided on the client device, which is connected to the SDH device via STM-1 signals (the additional SDH atomic functions are shown within the dashed rectangle). The advantage is that the SDH network equipment does not need to be capable of adapting every possible client signal that might be transported over the SDH transport network infrastructure. Moreover, independent of the support for network recovery in the client network technology, this configuration allows for a standardized SDH linear multiplex section protection (MSP) of the interface between client and SDH network equipment (note that the recovery provided inside the SDH transport network does not cover the interface to the client network equipment).

For completeness, Figure 2.10 depicts the functional architecture of a regener-

ator device. As shown in Figure 2.3, a regenerator terminates only regenerator

sections (RSs) and thus features RSN_TT plus RSN/MSN_A functions, whereas

the MS signal transparently passes through the regenerator (thus, the signal leaving

the RSN/MSN_A_Sk function is directly fed into the RSN/MSN_A_So function of

the next regenerator section). Note that such an NE does not feature any connec-

tion function.

[Figure 2.9: a client NE (with S4_TT and S4/<Client>_A functions) connected to an SDH NE via four STM-1 tributary interfaces, each with OS, RS, and MS layers; inside the SDH NE, the S4_C connection function links these tributaries to the STM-N aggregate(s) toward other SDH NEs.]

Figure 2.9 STM-1 tributary interface to client network equipment. (ITU-T Recommendation G.806, “Characteristics of transport equipment – description methodology and generic functionality,” ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.)

2.2.5 Summary

• SDH network layers: regenerator section (RS), multiplex section (MS), higher order path (HOP: VC-4 and/or VC-3), and lower order path (LOP: VC-3, VC-2, VC-12, and/or VC-11) layers.

• SDH network elements: terminal multiplexers (TMs), add/drop multiplexers (ADMs), and digital cross-connects (DXCs) (plus regenerators, which offer no connection flexibility).

• SDH interfaces: STM-N signals at N × 155,520 kbps. Each STM-N frame (N × 9 × 270 bytes) lasts exactly 125 μs.

• Defect detection: defects declared by the supervision of SDH maintenance signals (dAIS and dRDI) typically take the least time to detect. dUNEQ and dLOA detection times also fall within the range of 5 ms. (The physical layer might detect dLOS before the SDH layers are able to declare any other defect.)

• Overhead bytes: from a network recovery perspective, the overhead bytes that transport the AIS and RDI maintenance signals and the automatic protection switching (APS) signaling messages are the most important ones. Table 2.2 gives an overview.
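The interface figures in the summary can be checked with a few lines of arithmetic. The Python sketch below is only illustrative (the constant and function names are ours, not taken from any standard): it derives the STM-N frame size and line rate from the 9-row by 270-column STM-1 frame and the 125 μs frame period.

```python
# Frame timing shared by all STM-N signals: one frame every 125 us.
FRAME_US = 125
FRAMES_PER_SECOND = 1_000_000 // FRAME_US  # 8000 frames per second

ROWS = 9            # rows per STM-1 frame
COLUMNS_STM1 = 270  # byte columns per STM-1 frame (9 overhead + 261 payload)


def stm_frame_bytes(n: int) -> int:
    """Size of one STM-N frame in bytes (N x 9 x 270)."""
    return n * ROWS * COLUMNS_STM1


def stm_rate_kbps(n: int) -> int:
    """Line rate of an STM-N signal in kbps (bits per frame x 8000 frames/s)."""
    bits_per_frame = stm_frame_bytes(n) * 8
    return bits_per_frame * FRAMES_PER_SECOND // 1000


for n in (1, 4, 16, 64):
    print(f"STM-{n}: {stm_frame_bytes(n)} bytes/frame, {stm_rate_kbps(n)} kbps")
```

For STM-1 this gives 2430 bytes per frame and 155,520 kbps; every higher level is an exact multiple because the frame period stays fixed at 125 μs.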

[Figure 2.10: the incoming STM-N aggregate signal is terminated by OS and RS functions on the ingress side; the MS signal passes unchanged to the RS and OS functions on the egress side, which produce the outgoing STM-N aggregate signal.]

Figure 2.10 Functional architecture of a regenerator. (ITU-T Recommendation G.806, “Characteristics of transport equipment – description methodology and generic functionality,” ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.)

Table 2.2 Overview of Overhead Bytes for Transport of Maintenance Signals and APS Signaling Messages

Signal/Level | Payload Bit Rate | AIS and RDI OH | APS OH
MS | 150,336 kbps | K2b6-8 | K1 and K2b1-5
VC-4 | 149,760 kbps | H1 and H2; G1b5 and C2 | K3b1-4
VC-3 | 48,384 kbps | H1 and H2; G1b5 and C2 | K3b1-4
VC-2 | 6,784 kbps | V1 and V2; V5b5-8 | K4b3-4
VC-12 | 2,176 kbps | V1 and V2; V5b5-8 | K4b3-4
VC-11 | 1,600 kbps | V1 and V2; V5b5-8 | K4b3-4

2.2.6 Differences between SONET and SDH

As stated earlier, the Synchronous Optical NETwork (SONET) technology is the U.S. counterpart of the Synchronous Digital Hierarchy (SDH) technology. Although both technologies adopt very similar frame structures, minor differences in frame structure and semantics prevent them from being fully compatible and thus interoperable with each other. These differences are not needed to understand this chapter and are therefore beyond its scope. The remainder of this chapter focuses on SDH, but the discussion also applies to SONET networks.

The main difference between the two technologies is their base signal. The SONET synchronous transport signal-1 (STS-1) is the base signal for SONET at a line rate of 51.84 Mbps, whereas the SDH synchronous transport module-1 (STM-1) is the base signal for SDH at a line rate of 155.52 Mbps. Thus, an SDH STM-N corresponds to a SONET STS-3N. SONET is based on the STS-1 line rate mainly to better match the highest line rate of the U.S. Plesiochronous Digital Hierarchy (PDH), the predecessor of the SONET/SDH technology: more precisely, a DS3 signal at 44.736 Mbps (whereas, e.g., the European PDH defines signals up to an E4 signal at 139.264 Mbps). Note that an optical signal carrying an STS-N signal is called an optical carrier level N (OC-N).
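The STM-N to STS-3N correspondence can be expressed directly. The following Python sketch (hypothetical helper name; rates kept as integer kbps to avoid rounding) maps an STM-N to the SONET signal and optical carrier with the same line rate.

```python
STS1_KBPS = 51_840   # SONET base signal STS-1: 51.84 Mbps
STM1_KBPS = 155_520  # SDH base signal STM-1: 155.52 Mbps (three STS-1 rates)


def sonet_equivalent(stm_n: int) -> tuple[str, str, int]:
    """Map an SDH STM-N to the SONET STS-3N / OC-3N with the same line rate."""
    sts_n = 3 * stm_n
    return f"STS-{sts_n}", f"OC-{sts_n}", sts_n * STS1_KBPS


print(sonet_equivalent(1))   # ('STS-3', 'OC-3', 155520)
print(sonet_equivalent(16))  # ('STS-48', 'OC-48', 2488320)
```

Matching line rates do not imply interoperability: as noted above, the frame semantics still differ in details.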

There is also a difference in terminology. An SDH regenerator section (RS), multiplex section (MS), and higher order path (HOP) are called, in SONET terminology, an STS section, STS line, and STS path, respectively. An SDH lower order path (LOP) is called a virtual tributary (VT) path in SONET terminology.

With respect to the remainder of this chapter (describing network recovery in SDH networks), we want to stress that for each SDH recovery technique, an equivalent SONET recovery technique exists. However, just as SDH and SONET themselves are not fully compatible and interoperable, an SDH recovery technique and its SONET equivalent are not necessarily interoperable either.

Last but not least, SONET is standardized by the American National Standards Institute (ANSI), whereas SDH is mainly standardized by the International Telecommunication Union (ITU).

2.3 Operational Aspects

This section discusses how failures are detected and propagated through SDH networks and where the network recovery techniques fit in this architecture. This section concentrates on the protocol aspects; the discussion of the different flavors of the recovery techniques (on a more abstract level) is left for Sections 2.4 through 2.6.

Section 2.3.1 gives an overview of the fault management processes in SDH

networks. Sections 2.3.2 and 2.3.3 continue with a more detailed discussion on

how failures are detected and propagated inside a network element and on a

network level (between the network elements), respectively. Then, the Automatic

Protection Switching (APS) protocol (which relies on this notification mechanism)

is discussed in Section 2.3.4. Finally, the major conclusions are recapitulated

in Section 2.3.5.



2.3.1 Fault Management Processes

As mentioned in Section 2.1.2, fault management processes are crucial from a resilience point of view, because these processes are responsible for locating and reporting network failures and possibly for triggering network recovery actions. The hierarchical structure of the fault management processes in transmission networks is illustrated in Figure 2.11.

As illustrated in Figure 2.11, each fault management level consists of one or more filters. At the lowest level, filter f1 supervises (part of the overhead bytes in) the data signal. When a certain anomaly in the data signal persists for a certain period of time or a certain number of frames or multiframes, this filter declares the defect corresponding to this anomaly. For each possible defect, Section 2.2.3 discussed which overhead byte to supervise and how long the detection lasts. Filters f2 and f3 correlate the defect declarations within an atomic function: f2 triggers the appropriate consequent actions inside the atomic function, and f3 reports the fault cause to the element management function (EMF). Based on the fault cause reports coming from all atomic functions, filter f4 in the EMF declares the failure. The EMF filters f5–f8 then generate the necessary alarms and failure reports for the higher management layers.
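As a rough illustration of how an f1 filter turns a persistent anomaly into a defect declaration, the Python sketch below counts consecutive anomalous frames. It is a simplification under stated assumptions: the function name is hypothetical, only declaration is modeled (no clearing), and the actual per-defect rules in ITU-T G.783 are more detailed (for example, dLOF relies on a 3 ms period rather than a simple frame count).

```python
from typing import Optional, Sequence


def declare_defect(anomalous: Sequence[bool], persistence: int) -> Optional[int]:
    """Return the index of the frame in which the defect is declared.

    A defect is declared once the anomaly has been observed in `persistence`
    consecutive frames (or multiframes); any normal frame resets the count.
    Returns None if the anomaly never persists long enough.
    """
    consecutive = 0
    for index, is_anomalous in enumerate(anomalous):
        consecutive = consecutive + 1 if is_anomalous else 0
        if consecutive >= persistence:
            return index
    return None


# Example: an AIS pattern (all 1s) seen from frame 2 onward, persistence of 3 frames.
frames = [False, False, True, True, True, True]
print(declare_defect(frames, persistence=3))  # 4
```

A persistence of three frames or multiframes corresponds to the dAIS declaration time used in the examples of Section 2.3.2.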

For example, based on the received alarms and failure reports from different

network elements, the central network management system (NMS) may reconfigure

the network so that affected connections are provisioned along another route. In

accordance with Chapter 1, such actions are classified under restoration actions,

which are the topic of Section 2.6.

The main focus of this chapter is network protection (restoration is not yet widely adopted in SDH networks), which relies on preprovisioned backup resources and on network elements that autonomously switch over to these backup resources, that is, without any intervention of a (central) network management system. Filters f1 and f2 therefore play a key role throughout the remainder of this chapter.

A closer look at Figure 2.11 shows that the detection process (filter f1) and the

management reporting functions (filters f3, f4, f5–f8) are only present in the sink

direction, whereas the consequent actions (requested by filter f2) are present in both

directions. The reason is that consequent actions involve the generation of the

appropriate maintenance signals needed for failure propagation.

[Figure 2.11: two network elements, each consisting of trail termination (TT_So, TT_Sk) and adaptation (A_So, A_Sk) atomic functions plus an element management function (EMF) that reports to the NMS via the TMN; filters f1, f2, f3, f4, and f5–f8 are arranged over three levels corresponding to defects (dXXX), fault causes (cZZZ), and failures (fZZZ). Legend: CI_D, characteristic information (data signal); CI_SSF, characteristic information (server signal fail signal); AI_D, adapted information (data signal); AI_TSF, adapted information (trail signal fail signal); RI_RDI, remote information (remote defect indication signal); aAIS/aRDI, consequent actions to insert an AIS/RDI signal; aSSF/aTSF, consequent actions to enable CI_SSF/AI_TSF.]

Figure 2.11 Fault management hierarchical structure. (ITU-T Recommendation G.806, “Characteristics of transport equipment – description methodology and generic functionality,” ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.)

Filters f2 in a TT sink function trigger three consequent actions when one of its f1 filters declares a defect. The first consequent action notifies downstream atomic functions and network elements that a defect affecting the corresponding data signal was detected: the content of the data signal is replaced with an all 1s signal, the alarm indication signal (AIS). As explained in Figure 2.6, downstream atomic functions will detect the presence of this AIS signal within four frames or multiframes. To avoid this additional detection time in downstream atomic functions and thus to speed up the propagation and notification process, the second consequent action in a TT sink function generates a trail signal fail (TSF) signal, an auxiliary signal internal to the network element that runs in parallel with (and thus separately from) the stream of STM-N frames. This TSF signal explicitly notifies downstream atomic functions inside the same network element whether a defect affecting the trail has already been declared in an upstream atomic function. The third consequent action notifies in the upstream rather than the downstream direction. For this purpose, TT functions connect not only to access points (APs) and (termination) connection points ([T]CPs), but also to remote reference points (RPs). Via such an RP, TT functions can exchange remote information (RI). For example, the TT sink function can request the corresponding TT source function (belonging to the opposite trail) to notify the upstream NE where the terminated trail originates that a defect affecting that trail has been detected, by sending a remote information (RI) remote defect indication (RDI) signal, RI_RDI, through the RP.

Filters f2 in an A sink function trigger two consequent actions when one of its f1 filters declares a defect, similar to the two downstream notification actions of a TT sink function. AIS generation is the same as in a TT sink function, but instead of a TSF signal, a server signal fail (SSF) signal is generated for downstream notification.

Note that generating these network element internal auxiliary parallel SSF and TSF signals (thus separated from the stream of STM-N frames) is useful only when downstream atomic functions also accept them as input. More precisely, A and TT sink functions accept a TSF and an SSF signal, respectively, to avoid additional dAIS detection time or simply to be able to learn about upstream defect declarations (when no f1 filter declaring dAIS defects is present in that particular atomic function).

As shown in Figure 2.11, an A source function can also accept an SSF signal; this allows the A source functions responsible for pointer generation to insert an AU_AIS or TU_AIS in the data signal. A TT source function should be able to receive a remote information–remote defect indication (RI_RDI) signal so that it can take the consequent action of inserting a remote defect indication (RDI) signal in the data signal of the trail opposite to the one on which a defect was declared.
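The consequent actions and auxiliary inputs described above can be summarized as a small lookup table. The Python sketch below encodes them for the four atomic function types. The table layout and function name are hypothetical, and treating an accepted fail signal like a locally declared defect is our simplification (it matches the examples in Section 2.3.2, but the recommendations define more detailed rules).

```python
# Auxiliary input each atomic function type accepts (Figure 2.11).
ACCEPTED_INPUT = {
    ("A", "Sk"): "TSF",      # from the upstream TT sink function
    ("TT", "Sk"): "SSF",     # from the upstream A sink function
    ("A", "So"): "SSF",      # lets pointer-generating A_So insert AU_AIS/TU_AIS
    ("TT", "So"): "RI_RDI",  # request from the TT sink of the opposite trail
}

# Consequent actions of sink functions on a defect or an accepted fail signal.
SINK_ACTIONS = {
    "TT": ("aAIS", "aTSF", "aRDI"),
    "A": ("aAIS", "aSSF"),
}

# What source functions do with their accepted input.
SOURCE_ACTIONS = {
    "A": ("insert AIS",),
    "TT": ("insert RDI",),
}


def consequent_actions(kind: str, direction: str, trigger: str) -> tuple[str, ...]:
    """Return the actions of an atomic function ("A" or "TT"; "Sk" or "So").

    `trigger` is either "defect" (declared by one of the function's own f1
    filters) or the name of an auxiliary input signal.
    """
    accepted = ACCEPTED_INPUT[(kind, direction)]
    if direction == "Sk":
        if trigger in ("defect", accepted):
            return SINK_ACTIONS[kind]
        return ()
    # Source functions have no f1 filters; they only react to accepted inputs.
    return SOURCE_ACTIONS[kind] if trigger == accepted else ()


print(consequent_actions("TT", "Sk", "SSF"))  # ('aAIS', 'aTSF', 'aRDI')
print(consequent_actions("A", "So", "SSF"))   # ('insert AIS',)
```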

In summary, in addition to the immediate generation and handling of the network element internal auxiliary parallel SSF and TSF signals (thus, separated from the stream of STM-N frames), two other consequent actions impact the data signal: the insertion of an AIS signal and of an RDI signal. Upon activation of aAIS, the AIS signal will be inserted within two frames or multiframes (more precisely, within 250 μs or 1 ms, respectively). Upon activation of aRDI in the TT sink function, the corresponding TT source function should insert the RDI signal within 1 ms, or within 4 ms in the case of a VC-2/12/11. The following sections elaborate on these processes and use examples to show how they interwork.
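The insertion deadlines above depend only on whether a level is timed in 125 μs frames or in 500 μs multiframes (VC-2, VC-12, and VC-11). A minimal Python sketch, with hypothetical names, that computes them:

```python
FRAME_US = 125
MULTIFRAME_US = 4 * FRAME_US  # 500 us TU multiframe
MULTIFRAME_LEVELS = {"VC-2", "VC-12", "VC-11"}


def unit_us(level: str) -> int:
    """Frame or multiframe period that governs timing at a given level."""
    return MULTIFRAME_US if level in MULTIFRAME_LEVELS else FRAME_US


def ais_insertion_deadline_us(level: str) -> int:
    """aAIS: all 1s inserted within two frames or multiframes."""
    return 2 * unit_us(level)


def rdi_insertion_deadline_us(level: str) -> int:
    """aRDI: RDI inserted by the TT source within 1 ms (4 ms for VC-2/12/11)."""
    return 4_000 if level in MULTIFRAME_LEVELS else 1_000


for level in ("MS", "VC-4", "VC-3", "VC-12"):
    print(level, ais_insertion_deadline_us(level), rdi_insertion_deadline_us(level))
```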

2.3.2 Fault Detection and Propagation Inside a Network Element

In the previous section, we highlighted the consequent actions atomic functions can

take to enable fault propagation and notification in the downstream and upstream



directions. The goal of this section is to study how a network element (consisting of

multiple atomic functions as explained in Section 2.2.4) reacts to failures and the

resulting maintenance signals. Such information is necessary to study how the

propagation and notification process looks on a network-wide level, which is

discussed in the next section.

The intention of this section is not to discuss every detail of the behavior of the various network elements, but to highlight the main mechanisms with a few simplified examples. We refer you to [G783], [G806], [ETSI1], and [Sex92] for a more complete and detailed overview of the behavior of all atomic functions. For several examples, Figures 2.12 through 2.17 show the time from left to right (remember that each frame lasts exactly 125 μs) and, from top to bottom, the path the data signal follows from ingress to egress. In the frame formats, light gray refers to digitized noise, whereas dark gray represents an all 1s signal.

The first example (Figure 2.12) shows how a regenerator (see also Figure 2.10) would react if it stops receiving an optical signal at one of its input ports, for example, as a result of a cut of the incoming fiber. The OS_TT_Sk function will almost immediately detect that optical signal power is no longer coming in and thus declare the dLOS defect. It enables the TSF signal (aTSF) and starts inserting an all 1s signal within two frames (aAIS).⁸ The downstream atomic sink functions forward the TSF (more precisely, the OS/RS_A_Sk and RS/MS_A_Sk functions translate the incoming TSF into an SSF, and the RS_TT_Sk function translates the incoming SSF signal into a TSF signal) and start “refreshing” the AIS signal within two frames by inserting an all 1s signal in the downstream direction. The first adaptation source function (here, RS/MS_A_So) is the last atomic function that refreshes the AIS signal within two frames after receiving the SSF from the last upstream adaptation sink function; it does not pass the SSF on as a TSF.

⁸ For simplicity, we consider that atomic functions start inserting the all 1s signal at the beginning of a new frame/container.

[Figure 2.12: time diagram of the regenerator's sink functions (OS_TT_Sk, OS/RS_A_Sk, RS_TT_Sk, RS/MS_A_Sk) and source functions (RS/MS_A_So, RS_TT_So, OS/RS_A_So, OS_TT_So); OS_TT_Sk declares dLOS, and TSF and SSF signals accompany the frames down to RS/MS_A_So.]

Figure 2.12 Cable cut immediately upstream of a regenerator (top → down, from ingress to egress; left → right, time; light gray, digitized noise; dark gray, all 1s).

The output signal from the first adaptation source function is then handled transparently and encapsulated by the downstream atomic source functions. For example, the RS_TT_So function adds RSOH bytes without caring whether the encapsulated payload is legal, corrupted, or all 1s. Note that in the upstream sink part, the RS_TT_Sk function already terminated the incoming RS trail by stripping off and processing the corresponding RSOH bytes. In summary, the regenerator produces an MS_AIS signal (thus, the MSOH, AU pointer, and HOP overwritten with all 1s) within two frames after the declaration of the dLOS defect in the OS_TT_Sk function. Finally, no remote defect indication (RDI) signal notifies the immediate upstream SDH network element that the corresponding RS has been affected by a failure. Indeed, an RDI at the MS level is sufficient, because (in accordance with Section 2.2.4) regenerators do not participate in the actual recovery actions (they do not feature any connection function) and all other network elements terminate the MS trails anyway.
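The translation of fail signals along the regenerator's sink chain can be mimicked with a short Python sketch. It assumes, for illustration only, that function names containing "_TT_" are trail termination functions and all others are adaptation functions; it then lists which auxiliary fail signal each sink function passes on once a given function declares a defect.

```python
# Sink functions of the regenerator in signal order (Figure 2.12).
REGENERATOR_SINKS = ["OS_TT_Sk", "OS/RS_A_Sk", "RS_TT_Sk", "RS/MS_A_Sk"]


def fail_signals(sinks: list[str], declaring: str) -> dict[str, str]:
    """Auxiliary fail signal emitted by each sink function at or after `declaring`.

    TT sink functions emit a TSF and adaptation (A) sink functions an SSF;
    sink functions upstream of the declaring one are unaffected.
    """
    emitted = {}
    active = False
    for name in sinks:
        active = active or name == declaring
        if active:
            emitted[name] = "TSF" if "_TT_" in name else "SSF"
    return emitted


# Figure 2.12: dLOS in OS_TT_Sk. Figure 2.13: dLOF in OS/RS_A_Sk.
print(fail_signals(REGENERATOR_SINKS, "OS_TT_Sk"))
print(fail_signals(REGENERATOR_SINKS, "OS/RS_A_Sk"))
```

In both cases the last SSF reaches RS/MS_A_So, the first adaptation source function, which inserts the all 1s signal and does not pass the fail signal on; the regenerator therefore emits an MS_AIS.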

The example in Figure 2.13 considers a slightly different scenario. The receiver keeps getting an optical signal, but at some point the signal gets heavily distorted (the regenerator starts receiving noise). Note the similarities and differences between Figures 2.12 and 2.13. First, in Figure 2.13, the OS_TT_Sk function does not declare a defect, which implies that it keeps converting the received (corrupted) optical signal into an electrical signal. However, the OS/RS_A_Sk function loses track of the frame alignment bytes and, after a relatively long period (3 ms, or 24 frames), gives up and declares the loss of frame defect (dLOF). From that moment on, the process is quite similar to that shown in Figure 2.12. The atomic function declaring the defect (here, the OS/RS_A_Sk function) starts sending an SSF signal (it would be a TSF in the case of a TT_Sk function) and sends out an AIS signal by inserting an all 1s signal. The second important difference between the two figures is that this process starts much later than in the previous example. In the end, the result is similar: once the regenerator has been able to declare a defect, it generates an MS_AIS signal within two frames after the defect declaration.

[Figure 2.13: same atomic functions as in Figure 2.12; the noise passes unchanged for 3 ms until OS/RS_A_Sk declares dLOF, and the first all 1s frame leaves the regenerator after 26 frames; from then on, SSF and TSF signals accompany the frames.]

Figure 2.13 A heavily distorted/noise signal entering a regenerator (top → down, from ingress to egress; left → right, time; light gray, digitized noise; dark gray, all 1s).
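The timing difference between Figures 2.12 and 2.13 is plain arithmetic: 3 ms corresponds to 3000 / 125 = 24 frames, and the all 1s signal follows within two more frames. The hypothetical Python helper below reproduces both cases; treating dLOS as declared immediately is a simplification.

```python
FRAME_US = 125
DLOF_PERIOD_US = 3_000    # OS/RS_A_Sk gives up on frame alignment after 3 ms
AIS_INSERTION_FRAMES = 2  # all 1s inserted within two frames of the declaration


def frames_until_ms_ais(defect: str) -> int:
    """Frames from failure onset until the regenerator emits MS_AIS (upper bound)."""
    if defect == "dLOS":
        declaration = 0  # simplification: loss of signal declared (almost) immediately
    elif defect == "dLOF":
        declaration = DLOF_PERIOD_US // FRAME_US  # 24 frames
    else:
        raise ValueError(f"unsupported defect: {defect}")
    return declaration + AIS_INSERTION_FRAMES


print(frames_until_ms_ais("dLOS"))  # 2
print(frames_until_ms_ais("dLOF"))  # 26
```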

Figures 2.14 through 2.17 consider the cross-connection of a VC-4 path (e.g., in a DXC-4/4; see also Figure 2.8). Note the additional MS_TT_Sk, MS/S4_A_Sk, S4_C, MS/S4_A_So, and MS_TT_So functions in the signal path from ingress to egress. Note also that the frame format has been extended with the format of the VC-4 (the column in the middle represents the VC-4 path overhead and determines the beginning of the VC-4 container).⁹

The first example (Figure 2.14) considers a scenario similar to that of Figure 2.12: a fiber cut immediately upstream of the NE interrupts the receipt of the optical signal. The process is also similar to that in Figure 2.12. The OS_TT_Sk function almost immediately detects this failure, declares the dLOS defect, enables the TSF signal, and starts sending out an AIS signal within two frames by inserting an all 1s signal. Although the principle of signal propagation is the same as in a regenerator, the observed behavior differs because of the additional atomic functions (which are responsible for processing the AU pointer and the VC-4 POH) in the signal path from ingress to egress. The MS_TT_Sk function strips off the MSOH bytes from the received signal, and the MS/S4_A_Sk function extracts the VC-4 containers while removing the AU pointers. Because the MS/S4_A_Sk function receives the TSF signal and because the VC-4 containers are not aligned with the STM frame structure, the insertion/refreshing of the all 1s signal starts before the first bytes of the all 1s signal are received (note that insertion should start within two frames and that we assume for simplicity that insertion starts on frame/container boundaries). This AIS signal travels through the HOP connection function (S4_C), together with the SSF, from the MS/S4_A_Sk to the MS/S4_A_So function. The latter is the first source function in the data signal path, and thus this adaptation source function stops forwarding the SSF signal and inserts an all 1s signal within two frames after receiving the SSF signal.

⁹ Following the STM-1 frame format depicted in Figure 2.5, the AU pointer (more precisely, the fourth row in the nine columns at the left) determines the start of the VC-4, or in other words, the position of the column with the VC-4 path overhead in the 261 columns at the right.



In contrast to the RS/MS_A_So function in the regenerator, the MS/S4_A_So function has to add the AU pointer bytes. In other words, it adds some more all 1s bytes to the all 1s that it receives. This is important because these additional all 1s bytes allow downstream NEs to declare an AUdAIS defect. Note that the MS/S4_A_So function does not insert all 1s in these additional AU pointer bytes unless it receives an SSF, or in other words, unless a defect has been detected and declared in the upstream part of the signal path.

Note also the RDI signal in Figure 2.14: the receipt of the SSF signal in the MS_TT_Sk function triggers the remote defect indication (RDI) consequent action in that atomic function, resulting in an MS_RDI signal being inserted within 1 ms in the MSOH of the opposite MS trail. In summary, the NE starts generating downstream an AU_AIS signal (all 1s in the AU pointer bytes [thus, H1 and H2] plus all VC-4 container bytes) within two frames and returning upstream an MS_RDI signal (K2b6-8¹⁰ set to 110) within 1 ms after having detected and declared the dLOS defect.

[Figure 2.14: time diagram of the sink functions OS_TT_Sk through MS/S4_A_Sk, the HOP connection function S4_C, and the source functions MS/S4_A_So through OS_TT_So; OS_TT_Sk declares dLOS, TSF and SSF signals accompany the frames to MS/S4_A_So, and an MS_RDI is returned within 1 ms.]

Figure 2.14 Cable cut immediately upstream of a VC-4 cross-connection (top → down, from ingress to egress; left → right, time; light gray, digitized noise; dark gray, all 1s).

In Figure 2.15, we assume that the inlet (the incoming fiber) is connected to the regenerator outlet (the outgoing fiber) of Figure 2.12. The MS_TT_Sk and MS/S4_A_Sk functions detect within one frame the all 1s signal in the MSOH and in the AU pointer bytes, respectively, and therefore declare within three frames the MSdAIS and AUdAIS defects, respectively; they then start refreshing the all 1s signal and sending a TSF and an SSF signal, respectively. The result is again that downstream an AU_AIS signal is generated within two frames and that upstream an MS_RDI signal is returned within 1 ms after the declaration of these defects.

The scenario considered in Figure 2.16 is slightly different: now the inlet is connected to the outlet of the VC-4 cross-connection in Figure 2.14 instead of to the outlet of the regenerator in Figure 2.12. The NE now receives an AU_AIS instead of an MS_AIS signal (the received MSOH is correct because the MS trail starts in the NE performing the VC-4 cross-connection of Figure 2.14). The main difference is that only the MS/S4_A_Sk function declares the AUdAIS defect within three frames, whereas downstream a new AU_AIS signal is still generated within two frames after the defect is declared. Note also that because the MS trail signal is received properly, there is no need to return an MS_RDI signal upstream (no consequent actions are triggered in the MS_TT_Sk function). The conclusion is thus that a VC-4 cross-connection forwards an MS_AIS or AU_AIS within 3 + 2 = 5 frames.

How would a regenerator react to the input signals considered in Figure 2.15 (MS_AIS signal) and Figure 2.16 (AU_AIS signal)? All atomic functions in a regenerator monitor only the RSOH bytes, and these AIS signals always have correct RSOH bytes, so they pass transparently through a regenerator. Note also that no RS_AIS signal exists.

Turning to lower order paths, an LOP cross-connection (e.g., a VC-12 cross-connected in a DXC-4/1) behaves similarly to an HOP cross-connection (e.g., a VC-4 cross-connected in a DXC-4/4). For the LOP cross-connection, an Sn_TT_Sk, an Sn/Sm_A_Sk, an Sm_C, an Sn/Sm_A_So, and an Sn_TT_So function (where n and m represent the order of the higher and lower order paths, respectively) are added to the signal path. The first adaptation source function is in this case the Sn/Sm_A_So function, and thus such a cross-connection generates a TU_AIS instead of the AU_AIS generated by an HOP cross-connection. In a TU_AIS, the lower order path payload and overhead plus the TU pointer bytes (the V1 and V2 bytes in the case of a VC-2/12/11, or the H1 and H2 bytes in the case of a VC-3) contain an all 1s signal. The TU_AIS is generated within two frames or multiframes (more precisely, 2 × 500 μs = 1 ms, except in the case of a VC-3, where it takes only 2 × 125 μs = 250 μs). Of course, the Sn/Sm_A_So function can generate this TU_AIS signal only after a defect has been detected and declared in the upstream sink part of the signal path. The OS_TT_Sk function might still declare a dLOS defect almost immediately, whereas the MS_TT_Sk or MS/Sn_A_Sk function might still declare an MSdAIS or an AUdAIS defect, respectively, within three frames (thus, 3 × 125 μs = 375 μs).

¹⁰ Remember that we use the notation XXbY throughout this chapter to indicate bit Y in the overhead byte XX.

[Figure 2.15: same atomic functions as in Figure 2.14; MS_TT_Sk declares MSdAIS and MS/S4_A_Sk declares AUdAIS, TSF and SSF signals accompany the frames, and an MS_RDI is returned within 1 ms.]

Figure 2.15 Incoming MS_AIS signal (= output from Figure 2.12) (top → down, from ingress to egress; left → right, time; light gray, digitized noise; dark gray, all 1s).

[Figure 2.16: same atomic functions as in Figure 2.14; only MS/S4_A_Sk declares a defect (AUdAIS), SSF signals accompany the frames through S4_C, and no MS_RDI is returned.]

Figure 2.16 Incoming AU_AIS signal (= output from Figure 2.14) (top → down, from ingress to egress; left → right, time; light gray, digitized noise; dark gray, all 1s).

In all these cases, the Sn_TT_Sk function will trigger the consequent action for returning an HOP_RDI signal upstream (which the corresponding Sn_TT_So function inserts within 1 ms). In all cases except the last one (an incoming AU_AIS), the MS_TT_Sk function also triggers the consequent action for returning an MS_RDI signal within 1 ms (as already demonstrated earlier). In addition, the Sn/Sm_A_Sk function might detect a TUdAIS within three frames or multiframes (more precisely, 3 × 500 μs = 1.5 ms in the case of a VC-2/12/11). In summary, an LOP cross-connection generates a TU_AIS within two frames or multiframes to notify that it has detected a failure or to forward any AIS signal. In the worst case (forwarding a TU_AIS signal in a VC-2/12/11 cross-connection), this might take up to (3 + 2) × 500 μs = 2.5 ms. Depending on the incoming signal, the LOP cross-connection might also return an MS_RDI and/or an HOP_RDI signal within 1 ms.
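The worst-case bounds for forwarding an AIS through a cross-connection follow from three frames or multiframes to declare the dAIS defect plus two to insert the new AIS. A minimal Python sketch under that assumption (names hypothetical; the forwarded AIS is assumed to be at the same level as the incoming one):

```python
FRAME_US = 125
MULTIFRAME_US = 500  # TU multiframe used by VC-2, VC-12, and VC-11
MULTIFRAME_LEVELS = {"VC-2", "VC-12", "VC-11"}

DAIS_DECLARATION_UNITS = 3  # dAIS declared after three frames/multiframes
AIS_INSERTION_UNITS = 2     # AIS inserted within two frames/multiframes


def worst_case_ais_forwarding_us(level: str) -> int:
    """Upper bound for a cross-connection to forward an incoming AIS at `level`."""
    unit = MULTIFRAME_US if level in MULTIFRAME_LEVELS else FRAME_US
    return (DAIS_DECLARATION_UNITS + AIS_INSERTION_UNITS) * unit


print(worst_case_ais_forwarding_us("VC-4"))   # 625 (3 + 2 = 5 frames)
print(worst_case_ais_forwarding_us("VC-12"))  # 2500 (the 2.5 ms worst case)
```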

Finally, coming back to an HOP cross-connection, Figure 2.17 shows that the process of detecting a failure and subsequently propagating it as an AIS signal through the network is not a simple sequence of actions performed one after the other. In Figure 2.17, the inlet of a VC-4 cross-connection is connected to the outlet of the regenerator of Figure 2.13. That regenerator kept forwarding the heavily distorted incoming signal before it declared the dLOF defect and started generating the MS_AIS signal. More precisely, the VC-4 cross-connection receives the first all 1s only after 26 distorted frames. However, the MS/S4_A_Sk function needs only 8 to 10 frames to decide that it lost track of the AU pointer and thus declares the AUdLOP defect. This results in the generation of an AU_AIS signal within two frames. Later, the MS_AIS signal comes in, and the MS_TT_Sk and MS/S4_A_Sk functions declare the MSdAIS and AUdAIS defects, respectively. In other words, the AUdLOP defect has been overruled by the incoming MS_AIS signal. Although the NE simply keeps generating an AU_AIS signal, only at this moment is the return of an MS_RDI signal upstream triggered.
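Because the f1 filters run in parallel, their declarations can be computed independently from the same input. The Python sketch below replays the Figure 2.17 input (26 distorted frames followed by the MS_AIS signal). It assumes, for illustration, that AUdLOP needs 8 consecutive frames with an invalid pointer (the lower end of the 8 to 10 frame range) and that MSdAIS and AUdAIS need three all 1s frames; the function and frame labels are hypothetical.

```python
from typing import Optional, Sequence


def first_declaration(frames: Sequence[str], pattern: str, persistence: int) -> Optional[int]:
    """Number of received frames after which the defect is first declared."""
    consecutive = 0
    for count, frame in enumerate(frames, start=1):
        consecutive = consecutive + 1 if frame == pattern else 0
        if consecutive >= persistence:
            return count
    return None


# Input of the VC-4 cross-connection in Figure 2.17: 26 distorted frames, then MS_AIS.
incoming = ["noise"] * 26 + ["all_ones"] * 10

# Two f1 filters evaluated independently on the same input.
au_dlop = first_declaration(incoming, "noise", persistence=8)     # invalid AU pointer
ms_dais = first_declaration(incoming, "all_ones", persistence=3)  # MSdAIS (AUdAIS likewise)

print("AUdLOP declared after", au_dlop, "frames")               # 8
print("AU_AIS emitted no later than frame", au_dlop + 2)        # 10
print("MSdAIS declared, MS_RDI triggered after", ms_dais, "frames")  # 29
```

The AU_AIS leaves the NE long before the MS_RDI is triggered, which is exactly the ordering the figure illustrates.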

To conclude this section, we can summarize the examples as follows. After an NE declares a defect, it generates an AIS signal (an all 1s signal) downstream and an RDI signal upstream. Of course, the time to detect a failure depends on the actual received signal (see also Section 2.2.3). The examples clearly show that the time to declare a dAIS defect is important, because the AIS signals are responsible for the failure propagation process in the downstream direction. Declaring a dAIS defect takes three frames or multiframes. The last example illustrates that having different active defect declaration filters (f1 filters) in an NE can significantly complicate the failure propagation process, because these filters run in parallel and independently of each other. In accordance with Section 2.3.1, once a defect has been declared, an AIS signal will be inserted in the downstream direction within two frames or multiframes and an RDI signal in the upstream direction within 1 (or 4) ms. The type of AIS signal that is inserted depends on the type of the NE. The examples show that a regenerator inserts an MS_AIS signal, whereas a higher order path cross-connection inserts an AU_AIS signal. The inserted RDI signal also depends on the type of the NE. In the examples, only an MS_RDI is inserted (by the MS_TT functions), but Sn_TT functions (not considered in the examples) can also insert HOP_RDI or LOP_RDI signals. When TT functions at multiple levels are involved, an NE can insert multiple RDI signals simultaneously (of course, each of them transported in the appropriate overhead bytes).

[Figure 2.17: same atomic functions as in Figure 2.14; MS/S4_A_Sk declares AUdLOP after 8 to 10 distorted frames, the all 1s signal arrives after 26 frames, MSdAIS and AUdAIS are then declared, and an MS_RDI is returned within 1 ms.]

Figure 2.17 Late arrival of MS_AIS signal in a VC-4 cross-connection (top → down, from ingress to egress; left → right, time; light gray, digitized noise; dark gray, all 1s).

2.3.3 Fault Propagation and Notification on a Network Level

The previous section has shown that fault propagation and notification are done by sending alarm indication signals (AISs) downstream and returning remote defect indication (RDI) signals upstream, but that the actual behavior of an NE strongly depends on what the incoming signal looks like and thus on how the NEs are interconnected. This section investigates how fault information propagates through a network. For this purpose, we consider a network example consisting of 10 NEs: five DXCs cross-connecting an LOP (DXC-4/1 A, B, H, I, and J in Figures 2.18 and 2.19 and DXC-4/3 A, B, H, I, and J in Figure 2.20), three DXC-4/4s cross-connecting a VC-4 (DXC C, F, and G) through which the lower order (LO) VC is routed, and two regenerators (D and E). The lower order path is routed from A to J and cross-connected in B, H, and I. The VC-4 under consideration is routed between DXC-4/1s B and H and cross-connected in the DXC-4/4s C, F, and G. The link between DXCs C and F is very long and therefore requires two regenerators (D and E).

Let us consider the example of a fiber cut in front of regenerator D.

The behavior of regenerator D, DXC-4/4 F, and DXC-4/4 G has already been

explained in detail in Figures 2.12, 2.15, and 2.16, respectively. Figure 2.18 gives a

detailed but incomplete (i.e., not all possible defect declaration filters are shown)

overview of the most critical/interesting processes (defect declarations and AIS and RDI signal insertion) running inside the NEs and of the signals they exchange with each other. Note that the dashed arrows represent TSF and SSF signals.

Figures 2.19 and 2.20 show the AIS and RDI signals between the NEs in a time

diagram.

Regenerator D detects the loss of signal resulting from the fiber cut, declares the dLOS defect, and generates an MS_AIS signal at its output within two frames. As explained in the previous section, this MS_AIS signal simply passes through regenerator E, because regenerator E contains only atomic functions processing RSOH bytes (which are valid, because the RS trail starts in regenerator D). DXC-4/4 F detects the MS_AIS signal and, therefore, declares the MSdAIS defect. Because this DXC terminates the MS trail, it tries to notify the origin of the MS trail (thus, DXC-4/4 C) by returning an MS_RDI signal upstream. Almost simultaneously, another atomic function in this DXC-4/4 will also declare the AUdAIS defect (indeed, an MS_AIS signal implicitly carries an AU_AIS signal). Assuming that the upstream direction is not affected by the fiber cut, DXC-4/4 C will receive this MS_RDI signal and declare the dRDI defect within three to five frames (i.e., within 375 to 625 μs).

Figure 2.18 Overview of atomic functions and their responsibility in the fault propagation and notification process. (C. Brianza, et al. “Deliverable D2a: Overall Network Protection—Version 1,” deliverable from the ACTS-project PANEL, April 1997.)

The DXC-4/4 F then forwards the AIS signal within two frames after the

declaration of the MSdAIS or AUdAIS defect. As illustrated in Figure 2.16, a

DXC receiving this AU_AIS signal (here, DXC-4/4 G) will forward this signal

within two frames after the declaration of the AUdAIS defect (which takes up to

three frames).

Although DXC-4/1 H cross-connects VC-12s, it also terminates the VC-4 originating in DXC B. Because this VC-4 is affected by the fiber cut, DXC-4/1 H will receive an AU_AIS signal after a while (more precisely, within 2 + 5 + 5 = 12 frames, or 1.5 ms when ignoring the propagation delays) and declare the AUdAIS defect within three frames. The declaration of this defect will result in returning an HOP_RDI signal upstream to notify the origin of the VC-4 (here, DXC-4/1 B) that the HOP has been affected. In parallel to this process, DXC-4/1 H will declare the TUdAIS defect. It is important to note that the TU_AIS signal, already implicitly present in the MS_AIS signal generated by regenerator D,¹¹ can simply transit the DXC-4/4s without being delayed, whereas the AU_AIS signal experiences some delay when propagating, or “rippling,” through the intermediate DXC-4/4s. This results in a race between the TUdAIS and AUdAIS defect declaration processes in DXC-4/1 H. Because the worst-case scenario has been assumed for the AU_AIS propagation process in DXC-4/4s F and G (3 + 2 = 5 frames each), the TUdAIS defect declaration process in DXC-4/1 H wins this race by a small margin. However, DXC-4/4s F and G are allowed to insert an all 1s signal in the AU pointer bytes after only one frame. In this case, the AU_AIS rippling/propagation process would win the race. The reason is that the TUdAIS defect declaration process is relatively slow when it is based on multiframes. But as illustrated in Figure 2.20, cross-connecting a VC-3 instead of a VC-12 as LOP would definitely result in the TUdAIS defect declaration process winning the race. Also, in the case of a VC-12 being cross-connected, the TUdAIS defect declaration process would definitely be faster if there were one or two more intermediate DXC-4/4s through which the AU_AIS signal has to ripple.

Figure 2.19 Time diagram for a VC-12 being cross-connected by DXC-4/1s.
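The race in DXC-4/1 H can be checked with simple frame counting. The sketch below is a rough model, not taken from the standards: it uses the worst-case values quoted in the text (two frames for MS_AIS insertion in D, 3 + 2 frames per DXC-4/4, three frames or multiframes for defect declaration, 4-frame multiframes for a VC-12), starts the clock at the dLOS declaration in regenerator D, and ignores propagation. The names `race_at_dxc_h`, `ripple_frames`, and `intermediate_dxc44` are hypothetical. It also assumes that "inserting the all 1s pointer after one frame" shortens the per-DXC ripple from 3 + 2 to 3 + 1 frames.

```python
FRAME_US = 125           # one SDH frame lasts 125 microseconds
MULTIFRAME_FRAMES = 4    # a VC-12 multiframe spans 4 frames (500 microseconds)


def race_at_dxc_h(lop="VC-12", ripple_frames=5, intermediate_dxc44=1):
    """Worst-case frame counts at which DXC-4/1 H declares (AUdAIS, TUdAIS).

    Time zero is the dLOS declaration in regenerator D; propagation is ignored.
    ripple_frames: frames an AU_AIS needs to pass one DXC-4/4
                   (3 to declare AUdAIS + 2 to insert AU_AIS in the worst case).
    intermediate_dxc44: DXC-4/4s between F and H (only G in the example).
    """
    ms_ais_out = 2  # regenerator D inserts MS_AIS within two frames
    # AU_AIS is regenerated by F and then by every intermediate DXC-4/4.
    au_arrival = ms_ais_out + ripple_frames * (1 + intermediate_dxc44)
    au_declared = au_arrival + 3  # AUdAIS: three frames
    # TU_AIS is already inside the MS_AIS and crosses the DXC-4/4s undelayed.
    unit = MULTIFRAME_FRAMES if lop == "VC-12" else 1
    tu_declared = ms_ais_out + 3 * unit  # TUdAIS: three frames or multiframes
    return au_declared, tu_declared


scenarios = {
    "VC-12, worst case (3 + 2 frames per DXC-4/4)": {},
    "VC-12, AU pointer all 1s after one frame": {"ripple_frames": 4},
    "VC-3, worst case": {"lop": "VC-3"},
    "VC-12, fast insertion, one extra DXC-4/4": {"ripple_frames": 4, "intermediate_dxc44": 2},
}
for name, kwargs in scenarios.items():
    au, tu = race_at_dxc_h(**kwargs)
    winner = "TUdAIS" if tu < au else "AUdAIS"
    print(f"{name}: AUdAIS {au * FRAME_US} us, TUdAIS {tu * FRAME_US} us -> {winner} first")
```

Under these assumptions, the worst case gives 1875 μs for AUdAIS versus 1750 μs for TUdAIS, so TUdAIS wins narrowly, as the text states. Faster pointer insertion reverses the outcome (1625 μs versus 1750 μs), while a VC-3 or one more DXC-4/4 makes TUdAIS the clear winner.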

Finally, from the moment that either of the two defects has been declared, DXC-4/1 H will forward the AIS signal as a TU_AIS signal within two frames or multiframes. Just as the AU_AIS signal had to ripple through the intermediate DXC-4/4 G, this TU_AIS signal has to ripple through the intermediate DXC I before reaching the destination DXC J. After declaring the TUdAIS defect (within three frames or multiframes), DXC J will return an RDI signal (LOP_RDI) upstream to notify the origin of the LOP (i.e., DXC A) that the LOP has been affected.

¹¹ The MS_AIS signal consists of all 1s in the MSOH bytes, the AU pointer bytes (H1 and H2), and the VC-4 path overhead and payload bytes. The VC-12, together with its TU pointer bytes (V1 and V2), is encapsulated in the payload of the VC-4 (the C-4 container) and, thus, contains an all 1s signal: the TU_AIS signal. Because the intermediate DXC-4/4s transparently cross-connect the VC-4 payload, they also leave the TU_AIS signal intact.

Figure 2.20 Time diagram for a VC-3 being cross-connected by DXC-4/3s.

A comparison of Figure 2.19 with Figure 2.20 shows that the propagation and notification process becomes relatively slow at the LOP layer when the LOPs are based on a multiframe structure (e.g., in the case of a VC-12, as illustrated in Figure 2.19). For example, in Figure 2.19, it may take up to 2.5 ms for the TU_AIS signal to ripple through the intermediate DXC-4/1 I.
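The 2.5 ms figure follows from the same worst-case rule as before (three units to declare TUdAIS plus two units to insert TU_AIS), where a unit is a 500 μs multiframe for a VC-12 and a 125 μs frame for a VC-3. The short sketch below, with the hypothetical helper `lop_ripple_us`, applies that rule to the ripple through DXC I and to the whole stretch from DXC H to the TUdAIS declaration in DXC J; propagation delays are ignored.

```python
FRAME_US = 125  # one SDH frame lasts 125 microseconds


def lop_ripple_us(lop):
    """Worst-case LOP-layer delays (microseconds) for the Figure 2.19/2.20 example.

    Returns (ripple through one intermediate LOP DXC, H's defect declaration to
    TUdAIS declaration in J). A VC-12 counts in 4-frame multiframes, a VC-3 in frames.
    """
    unit = FRAME_US * (4 if lop == "VC-12" else 1)
    ripple = (3 + 2) * unit        # DXC I: 3 to declare TUdAIS + 2 to insert TU_AIS
    h_to_j = (2 + 5 + 3) * unit    # H inserts (2), I ripples (5), J declares (3)
    return ripple, h_to_j


for lop in ("VC-12", "VC-3"):
    ripple, h_to_j = lop_ripple_us(lop)
    print(f"{lop}: {ripple / 1000} ms through DXC I, {h_to_j / 1000} ms from H to TUdAIS in J")
```

For a VC-12 this yields 2.5 ms per intermediate LOP DXC, four times the 625 μs obtained for a VC-3, which is the slowdown the comparison of the two figures points out.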

Although combining more than one recovery technique is beyond the scope of this chapter (it is the subject of Chapter 6), the race condition mentioned earlier (in DXC H) is a good illustration of the risk of triggering multiple recovery actions at almost the same time. The race in Figure 2.19 might quasi-simultaneously trigger recovery actions in DXC H at both the HOP and LOP layers, based on the declaration of the AUdAIS and TUdAIS defects, respectively. Chapter 6 will delve into the details of why such a situation is often not desirable and how it can be avoided.

2.3.4 Automatic Protection Switching Protocol

In the previous sections, we saw how the alarm indication signal (AIS) and remote

defect indication (RDI) maintenance signals allow network elements along a con-

nection to learn that this connection is affected by a failure. These signals can then

trigger recovery actions in these network elements. SDH networks typically rely on

Automatic Protection Switching (APS) techniques that assume pre-established

backup resources protecting a certain set of working resources. The protection

switching actions are coordinated through the exchange of APS protocol messages

that are transported in part of the K overhead bytes. The goal of this section is to

discuss the APS protocol more generally, and the following section highlights

the various protection strategies available for SDH networks. For complete-

ness, Section 2.6 briefly describes restoration techniques that do not rely on pre-

established backup resources. The discussion in this section has a part devoted to

trail protection and another part devoted to subnetwork connection protection

(SNCP).

Trail Protection

Within a network layer, one can choose between trail and subnetwork connection

protection (SNCP). As illustrated in Figure 2.21, a sublayer is introduced to

implement the trail protection. The characteristic information consists of the

adapted information plus some overhead bytes. Based on these overhead bytes,

the trail termination functions are able to supervise the integrity of a network

connection. In addition, maintenance signals can be transported in those overhead

bytes.



The overhead bytes also provide some overhead bits dedicated to transporting the APS protocol messages (more precisely, K1 and K2b1-5 on the MS level, K3b1-4 on the HOP level, and K4b3-4 on the LOP level). The trail termination (X_TT) functions in Figure 2.21 terminate the trails and check their integrity (possibly resulting in a TSF signal). Instead of being handed directly to the client adaptation function (the X/Client_A function), the trail signal passes through the trail protection sublayer, which consists of a connection (Xp_C) function that realizes the Automatic Protection Switching (APS) for a group of trails, a set of adaptation (X/Xp_A) functions, and trail termination (Xp_TT) functions. The protection sublayer adaptation (X/Xp_A) functions separate the APS protocol channel from the trail signal and forward it (together with the TSF from the X_TT function, translated into SSF) to the APS controller in the Xp_C function (see dashed arrows in Figure 2.21). A change in the APS request is accepted only after it has persisted for three consecutive frames or multiframes. To coordinate the actions in the different network elements, the APS controllers communicate with each other through these APS channels. The SSF signals that the APS controllers receive trigger the APS protection switching actions. Finally, the TT functions in the protection sublayer (the Xp_TT functions) forward the data and status signals (translation of SSF into TSF signal) from the trails selected in the protection connection (Xp_C) function.

Figure 2.21 Architecture for trail protection. (ITU-T Recommendation G.841, “Types and characteristics of SDH network protection architectures,” ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)
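The rule that a new APS request is accepted only after it persists for three consecutive frames or multiframes acts as a simple persistence filter. The sketch below shows that idea in isolation; the class name `ApsRequestFilter` and the tuple encoding of requests (type, requested channel, local bridge status) are illustrative assumptions, not the bit layout of the K bytes.

```python
class ApsRequestFilter:
    """Accept a new APS request only after it has been received unchanged in
    `persistence` consecutive frames (three in the text). Requests are opaque
    values here, e.g. the tuples ("SF", 2, 0) used for illustration."""

    def __init__(self, persistence=3, initial=("NoReq", 0, 0)):
        self.persistence = persistence
        self.accepted = initial       # request currently acted upon by the APS controller
        self._candidate = initial     # most recently received request
        self._count = persistence     # consecutive frames the candidate has been seen

    def on_frame(self, received):
        if received == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = received, 1
        if self._count >= self.persistence:
            self.accepted = self._candidate
        return self.accepted


f = ApsRequestFilter()
rx = [("SF", 2, 0), ("SF", 2, 0), ("NoReq", 0, 0), ("SF", 2, 0), ("SF", 2, 0), ("SF", 2, 0)]
print([f.on_frame(r) for r in rx])
```

A single deviating frame restarts the count, so the SF request in this example is accepted only in the sixth frame. This filtering is one of the components of the one-way delay discussed later for the protection switch completion time.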

Note that the previous discussion mainly highlighted the sink direction. The source direction is similar, but mirrored; a major difference is that the source signals are typically bridged onto backup/protection trails, whereas in the sink a selection is made between the working and backup/protection trails. Finally, note that Figure 2.21 illustrates the possibility of carrying extra traffic over the backup/protection trails while no failures affect the working trails, and of preempting this extra traffic when the corresponding backup/protection trails are needed after a failure.

The example in Figure 2.21 is an illustration of linear APS, in which the APS

protocol involves only two nodes. There also exist more complex APS protocols

that involve more than two nodes interconnected in a ring configuration: MS

protection rings, which have been and are still important in today’s production

networks, are an example of such APS protocols. An in-depth discussion of these ring protocols is beyond the scope of this chapter, which focuses on the protection strategies described in the following sections. The basics of the linear APS protocol

are illustrated in Figure 2.22. A detailed specification of linear and ring APS

protocols and rules is provided in [G841], and [Sex92] presents them in the form

of state diagrams and flowcharts.

Figure 2.22 Illustration of bidirectional linear 1:N APS protocol (bridge request format: type/priority, requested channel to bridge, local bridge status). (ITU-T Recommendation G.841, “Types and characteristics of SDH network protection architectures,” ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)



Figure 2.22 is an illustration of 1:N linear APS. As explained in Chapter 1, this means that one backup/protection entity protects N working entities. The 1:N protection can be generalized to M:N protection, where M backup/protection entities protect N working entities. If only one entity has to be protected, we can choose between bridging the signal at the time of the failure (1:1 protection) or permanently (1+1 protection). In the latter case, there is no opportunity to carry extra traffic on the backup/protection entity. The figure also assumes bidirectional operation (sometimes called dual-ended operation): when one direction of a bidirectional entity is switched to the backup/protection entity, the opposite direction is also switched to the backup/protection entity. Unidirectional operation (sometimes called single-ended operation) is also possible.

The main idea we want to illustrate in Figure 2.22 is that the downstream node

(or the recovery tail end [RTE], following the terminology of Chapter 1) requests

from the upstream node (or the recovery head end [RHE], following the termin-

ology of Chapter 1) to bridge one of the N working channels onto the protection

channel. The RHE should then perform this bridge operation and inform the RTE

that it has fulfilled its request. From that moment, the RTE can safely select the

signal from the backup/protection channel instead of from that particular working

channel. Requests can result from external requests (e.g., manual request from the

network operator), from locally generated requests (e.g., because of detected prob-

lems on one of the incoming channels), or from a request from the opposite side. A

priority is assigned to each request type, allowing discrimination between multiple

concurrent requests.

More concretely, Figure 2.22 assumes that channel 2 is affected only in the direction from node B to node A. Therefore, at the time of the failure (left part of the figure), node A (the downstream node) detects a signal fail (SF) condition. Because there are no other requests, node A starts asking B (the upstream node) to bridge channel 2. Once B receives this request, it verifies that this request does not conflict with any other requests and performs the bridge. It then also requests A to bridge channel 2, to complete the bidirectional protection switching, while indicating that it has already bridged channel 2 itself (note the “2” at the end of the request). Note also that this is a “REVerse” request, which has a lower priority than the SF request from A to B. When A receives this request, it fulfills it by bridging channel 2 and informs B of this bridge (note the change from “0” to “2” at the end of the request); because it also sees that B has fulfilled its bridge request, A selects the signal for channel 2 from the backup/protection channel. B also performs this selection from the moment it receives the notification of the bridge in A.

The right part of the figure illustrates the operation when channel 2 is repaired. Node A recognizes that it receives a proper signal on channel 2 again and therefore initiates the process to free the backup/protection channel and select channel 2 again. (In most cases, the APS protocol assumes a revertive mode of operation; the nonrevertive mode is typically applicable only to 1+1 protection.) Node A sends a wait-to-restore (WTR) request (which has a higher priority than the reverse request from B to A) to node B. This tells B that A intends to switch back but waits some more time to be sure that no other requests are sent (e.g., channel 2 going down again after a very short while, or a request from B with a lower priority than that of the SF request). The WTR timer should be configured in the range of 5 to 12 minutes. Once the WTR timer expires, node A selects the signal from channel 2 again and requests B to release the bridge by sending a “no request” request. Note that the bridge in A still exists (and, thus, is still reported in the “no request” request) because B is still requesting the bridge from A. Once B receives that “no request” request, it releases the bridge and, because there is no longer any need for the REVerse request, selects the signal from channel 2 again instead of that from the protection/backup channel. When A receives the “no request” from B, it also releases its bridge and updates the local bridge status in the request it is sending accordingly.
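The bridge/select handshake described above for Figure 2.22 can be replayed with a small message-passing model. This is a deliberately simplified, hypothetical sketch (the classes `Request` and `ApsNode` and their fields are not part of G.841): request priorities, conflict checks, and the three-frame acceptance filter are left out, and each node simply (1) bridges the channel the far end asks for, (2) sends its local request or, if it has none, a reverse request answering an SF or WTR request, and (3) selects from protection only once the far end reports the bridge.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    kind: str     # "NoReq", "SF", "REV" (reverse request) or "WTR"
    channel: int  # working channel the far end is asked to bridge (0 = none)
    bridged: int  # working channel currently bridged locally (0 = none)

    def __str__(self) -> str:
        return f"{self.kind}, {self.channel}, {self.bridged}"


class ApsNode:
    """Hypothetical, simplified 1:N APS end point: request priorities, conflict
    checks and the three-frame acceptance filter are left out on purpose."""

    def __init__(self) -> None:
        self.local: Optional[Tuple[str, int]] = None  # local request, e.g. ("SF", 2)
        self.rx = Request("NoReq", 0, 0)               # last request from the far end
        self.bridged = 0                               # channel bridged onto protection
        self.selected = 0                              # channel taken from protection
        self.tx = Request("NoReq", 0, 0)               # request currently sent

    def evaluate(self) -> Request:
        self.bridged = self.rx.channel                 # bridge what the far end asks for
        if self.local is not None:
            kind, channel = self.local
        elif self.rx.kind in ("SF", "WTR"):
            kind, channel = "REV", self.rx.channel     # answer with a reverse request
        else:
            kind, channel = "NoReq", 0
        # Select from protection only once the far end reports the bridge.
        self.selected = channel if channel and self.rx.bridged == channel else 0
        self.tx = Request(kind, channel, self.bridged)
        return self.tx

    def receive(self, request: Request) -> Request:
        self.rx = request
        return self.evaluate()


def figure_2_22_trace() -> List[Tuple[str, str]]:
    """Replay the exchange for channel 2 failing in the B-to-A direction only."""
    a, b = ApsNode(), ApsNode()
    trace: List[Tuple[str, str]] = []

    def snapshot() -> None:
        trace.append((str(a.tx), str(b.tx)))

    a.local = ("SF", 2)      # A detects signal fail on channel 2
    a.evaluate()
    snapshot()
    b.receive(a.tx)          # B bridges channel 2 and sends a reverse request
    snapshot()
    a.receive(b.tx)          # A bridges and selects from protection
    snapshot()
    b.receive(a.tx)          # B sees A's bridge and selects from protection
    snapshot()
    a.local = ("WTR", 2)     # channel 2 repaired: wait-to-restore
    a.evaluate()
    snapshot()
    b.receive(a.tx)          # B keeps answering with a reverse request
    snapshot()
    a.local = None           # WTR expired: A selects channel 2 again
    a.evaluate()
    snapshot()
    b.receive(a.tx)          # B releases its bridge and reselects channel 2
    snapshot()
    a.receive(b.tx)          # A releases its bridge
    snapshot()
    return trace


for a_tx, b_tx in figure_2_22_trace():
    print(f"A -> B: {a_tx:<12} B -> A: {b_tx}")
```

The printed pairs run from (SF, 2, 0 / NoReq, 0, 0) through (SF, 2, 2 / REV, 2, 2) and (WTR, 2, 2 / REV, 2, 2) to (NoReq, 0, 0 / NoReq, 0, 0), the same request sequence shown in Figure 2.22.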

From the example at the left in Figure 2.22, it is clear that it can take up to three one-way delays (along the protection/backup channel) before both nodes complete the bridge and select operation. An additional one-way delay is needed to inform the opposite side of the last status change. If the failure affected both directions of channel 2 (thus, also the direction from A to B) instead of only one, this protection switch completion time of 3 + 1 = 4 one-way delays would be reduced (in the best case) to 2 + 1 = 3 one-way delays, because B would start requesting the bridge from A at the same time that A starts requesting the bridge from B. This one-way delay consists at least of the propagation delay (0.5 ms per 100 km) plus the duration of three consecutive frames¹³ (3 × 125 μs = 375 μs) before a change in the APS request is accepted. Consider, for example, a link of 100 km: in this case, the one-way delay equals 0.875 ms and the protection switch completion time would range between (2 + 1) × 0.875 = 2.625 ms and (3 + 1) × 0.875 = 3.5 ms in the ideal case. Note, however, that in practice it takes some time to process APS requests: for example, when an APS controller is responsible for processing requests from multiple APS signaling channels, processing the APS requests sequentially might become an issue. Depending on the implementation of the network elements, performing a bridge or select might also consume some time.
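The completion-time arithmetic generalizes directly to other link lengths. The sketch below reuses only the figures given in the text (0.5 ms per 100 km, three 125 μs frames before a request is accepted, 2 + 1 or 3 + 1 one-way delays) and, like the text, describes the ideal case: processing time in the network elements is ignored. The function names are illustrative.

```python
FRAME_US = 125        # duration of one SDH frame
PROP_US_PER_KM = 5    # 0.5 ms per 100 km, as in the text
ACCEPT_FRAMES = 3     # frames an APS request must persist before acceptance


def one_way_delay_us(km):
    return km * PROP_US_PER_KM + ACCEPT_FRAMES * FRAME_US


def completion_time_us(km, both_directions_failed=False):
    """Ideal 1:N completion time: 3 (or, best case, 2) one-way delays to finish
    bridge and select, plus 1 to report the last status change. Processing time
    in the network elements is ignored."""
    one_ways = (2 if both_directions_failed else 3) + 1
    return one_ways * one_way_delay_us(km)


for km in (10, 100, 500):
    best = completion_time_us(km, both_directions_failed=True) / 1000
    worst = completion_time_us(km) / 1000
    print(f"{km:>4} km: {best:.3f} ms to {worst:.3f} ms")
```

For 100 km this reproduces the 2.625 ms to 3.5 ms range; for short links the fixed 375 μs acceptance delay dominates, whereas for long links propagation does.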

¹³ For simplicity, multiframes are excluded here in the case of VC-2/12/11 trail protection.

Subnetwork Connection Protection

As illustrated earlier, trail protection techniques have the advantage that the APS controllers coexist in the same NEs with the trail termination functions that check the integrity and allow access to the APS signaling channels (the adaptation function in the trail protection sublayer only splits the APS signaling channel from the data signal). The drawback is that in some circumstances a network operator wants to protect only part of the trail/network connection, that is, a subnetwork connection (as mentioned earlier, subnetwork connection protection [SNCP] is possible). Consider, for example, the network scenario in Figure 2.23: the cloud represents the domain administrated by one operator, and the VC-4 under consideration has to be set up through this network between nodes residing outside this network (e.g., in networks belonging to other network operators). The network operator wants to protect this

connection against failures that might occur in his or her own network. Because the

VC-4 trail is terminated outside his or her network, VC-4 trail protection is not

feasible. Because MS trails are set up between DXCs, DXC failures would not be

covered by MS trail protection. Therefore, the network operator wants to protect

the subnetwork connection corresponding to that part of the network connection

that is routed in his or her network.

The problem with subnetwork connection protection is that the subnetwork

connections have to be supervised, which is a typical responsibility of the trail

termination functions. A supervised subnetwork connection is sometimes also

called a tandem connection. There are four main methods for the supervision process, as follows:

. Inherent supervision relies on the status information collected from the lower

layers (the MS and RS layers) to estimate the status of the tandem connec-

tion.

. Nonintrusive supervision: At the downstream end of the tandem connection, a monitoring TT function (Xm_TT function) simply listens to the received signal. An Snm_TT function is a classic Sn_TT function that terminates a VC-n trail and is capable of detecting and declaring VCdAIS defects. A VCdAIS is defined by an all 1s signal in the trail signal label (TSL) POH (more precisely, the C2 byte in the case of a VC-4/3, and V5b5-7 in the case of a VC-2/12/11).

. Intrusive supervision interrupts the actual trail to set up a supervisory unequipped trail through the tandem connection.

. Sublayer supervision adds tandem connection trail termination and adaptation functions, overwriting part of the overhead. This is illustrated in the bottom part of Figure 2.23. The end-to-end VC-4 trail is set up between the two S4_TT functions. It passes through three DXCs in the network operator domain (S4_C functions). In the ingress and egress DXCs, the sublayer tandem connection trail termination (S4TC_TT) and adaptation (S4TC/S4_A) functions are added to supervise the tandem connection sublayer trail. Note that SDH provides a dedicated part of the path overhead for the supervision of tandem connections: the network operator (N) bytes.

Figure 2.23 Sublayer tandem connection monitoring. (ITU-T Recommendation G.803, “Architecture of transport networks based on the synchronous digital hierarchy (SDH),” ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.)

The last method makes an important aspect of tandem connection supervision possible: the ability of the egress node to distinguish between an AIS signal resulting from a failure upstream of the ingress of the tandem connection and one resulting from a failure that directly affects the tandem connection (thus, downstream of the ingress node). In the first case, the ingress node translates the incoming AIS signal into an IncAIS indication, which is transported through the tandem connection and translated back into the original AIS signal in the egress node before being forwarded further downstream.
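The IncAIS mechanism can be summarized as a small signal-translation chain. The sketch below is a conceptual model only: the enum `Sig` and the functions `tc_ingress`, `tc_transit`, and `tc_egress` are hypothetical, and the actual encoding of IncAIS in the tandem connection overhead (N bytes) is not modeled. It shows how the egress node can tell an upstream failure from a failure inside the tandem connection while still forwarding a plain AIS downstream in both cases.

```python
from enum import Enum


class Sig(Enum):
    OK = "normal signal"
    AIS = "AIS (all 1s)"
    INC_AIS = "IncAIS indication inside the tandem connection"


def tc_ingress(incoming):
    # An AIS arriving from upstream is carried through the tandem connection as IncAIS.
    return Sig.INC_AIS if incoming is Sig.AIS else incoming


def tc_transit(signal, failed):
    # A failure inside the tandem connection makes the downstream node insert AIS.
    return Sig.AIS if failed else signal


def tc_egress(received):
    """Return (diagnosis, signal forwarded downstream of the tandem connection)."""
    if received is Sig.INC_AIS:
        return "failure upstream of the tandem connection", Sig.AIS
    if received is Sig.AIS:
        return "failure inside the tandem connection", Sig.AIS
    return "no failure", Sig.OK


def run(upstream_failure, tc_failure):
    signal = Sig.AIS if upstream_failure else Sig.OK
    return tc_egress(tc_transit(tc_ingress(signal), tc_failure))


for up, inside in [(False, False), (True, False), (False, True)]:
    print(up, inside, run(up, inside))
```

Only a failure diagnosed as inside the tandem connection should concern the operator of that domain, which is why this distinction matters for subnetwork connection protection.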

Besides the supervision of subnetwork connections, an additional problem is the coordination of the APS actions in the RHE and RTE, because it requires an APS signaling channel. For the moment, the N bytes path overhead dedicated to tandem connections does not provide an APS channel, and overwriting the K bytes may cause conflicts with the trail protection APS protocol. Therefore, as shown in Figure 2.24, only unidirectional 1+1 subnetwork connection protection (SNCP) is supported (other modes are “for further study” according to the standards). Thus, in the RHE, the connection is permanently bridged onto the working and backup/protection subnetwork connections, and the RTE simply selects the best signal (based on the supervision processes described in Section 2.2.3). In other words, unlike in M:N linear protection, the protection switch completion time does not involve any one-way delay but depends only on the capabilities of the RTE (i.e., the time it needs to change the selection). Figure 2.24 shows the mode adopting the nonintrusive supervision method (SNCP/N). Leaving out the optional monitoring trail termination (Xm_TT) functions would result in the mode adopting the inherent supervision method (SNCP/I). According to [ETSI1], a mode adopting the sublayer supervision method (SNCP/S) is also possible.
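Because the RHE bridges permanently, the SNCP switching decision is purely local to the RTE. The following sketch models such a selector, assuming that a boolean signal fail per subnetwork connection is available (e.g., derived from SSF in SNCP/I or from the Xm_TT TSF in SNCP/N) and assuming nonrevertive behavior; the function `sncp_select` is hypothetical and omits hold-off timers and operator commands.

```python
def sncp_select(current, working_sf, protection_sf):
    """Selector in the RTE of a unidirectional 1+1 SNCP (simplified sketch).

    The RHE bridges permanently, so switching is a purely local decision: leave
    the current subnetwork connection only if it has a signal fail and the other
    one does not. Switching back automatically (revertive operation) is not modeled.
    """
    other = "protection" if current == "working" else "working"
    signal_fail = {"working": working_sf, "protection": protection_sf}
    if signal_fail[current] and not signal_fail[other]:
        return other
    return current


selected = "working"
for working_sf, protection_sf in [(False, False), (True, False), (False, False), (False, True)]:
    selected = sncp_select(selected, working_sf, protection_sf)
    print(working_sf, protection_sf, "->", selected)
```

No message exchange appears anywhere in this decision, which is why the protection switch completion time depends only on the RTE.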

2.3.5 Summary

. F1 filters: defect declaration in sink atomic functions.

. F2 filters: consequent actions: aAIS (in TT_Sk, A_Sk, and A_So): insertion of all 1s within two frames or multiframes in the downstream direction; aRDI (in TT_Sk and TT_So): insertion of a notification signal upstream within 1 or 4 ms; aTSF/aSSF (in TT_Sk/A_Sk): enabling an internal parallel auxiliary signal fail signal.

. HOP DXC: forwards MS_AIS and AU_AIS signals within three (→ detection) plus two (→ insertion) frames as an AU_AIS signal. TU_AIS transits transparently (not delayed).

. LOP DXC: forwards MS_AIS, AU_AIS, and TU_AIS signals within three (→ detection) plus two (→ insertion) frames or multiframes. MS_AIS and AU_AIS are always detected within three frames (= 375 μs).

. Race conditions: can occur between the AIS propagation processes in the HOP and LOP layers.

. Automatic Protection Switching (APS): linear versus ring, trail versus subnetwork connection protection, unidirectional versus bidirectional operation.

. 1:N APS: RTE requests bridge from RHE → RHE performs bridge and notifies RTE → RTE selects backup/protection channel.

. SNCP: only unidirectional 1+1 mode (permanent bridge). Supervision of the subnetwork connections is an issue, and several methods exist.

Figure 2.24 Subnetwork connection protection (SNCP) with nonintrusive monitoring (SNCP/N). (“Transmission and multiplexing (TM); generic requirements of transport functionality of equipment; part 1-1: generic processes and performance,” ETSI EN 300 417-1-1 V1.2.1, October 2001.)

2.4 Ring Protection

As mentioned in Section 2.3.4, SDH networks typically rely on protection techniques to increase the overall network survivability. These protection techniques can be categorized into ring and linear Automatic Protection Switching (APS) techniques. In particular, ring-based SDH networks have been very popular and remain dominant. For this reason, the overview of the various SDH recovery techniques starts in this section with the description of self-healing ring network architectures, continues with a discussion of the linear protection strategies in Section 2.5, and concludes by highlighting the possibilities for restoration-based techniques in Section 2.6.

The popularity of ring networks can be explained as follows. First, they typically feature add/drop multiplexers (ADMs) that have only two aggregate ports: compared with the more advanced DXCs (typically used in mesh networks), ADMs became commercially available sooner and have a lower cost. Second, ring networks are rather simple network architectures (e.g., routing decisions are limited to choosing the clockwise or counterclockwise direction on the ring) that are able to meet important network operator requirements (e.g., survivability, as discussed in this section). Therefore, the incentives for upgrading from a ring-based network to a meshed network (typically featuring DXCs) are not always strong or clear enough, which has contributed to the popularity of ring-based networks.

One can distinguish between three protection ring techniques: Multiplex Section–Shared Protection Rings (MS-SP Rings), Multiplex Section–Dedicated Protection Rings (MS-DP Rings), and SNCP Rings. MS-SP Rings and MS-DP Rings are similar in the sense that the nodes adjacent to a failure loop back the traffic around the opposite side of the ring. However, they differ in that the forward and backward directions of a bidirectional connection are routed along the same side of the ring in an MS-SP Ring and along opposite sides in an MS-DP Ring. This is possible because an MS-SP Ring carries 50% working capacity and 50% protection/backup capacity in both the clockwise and the counterclockwise direction, whereas an MS-DP Ring carries all working capacity in one direction and all protection/backup capacity in the counter-rotating direction. Spatial reuse is an important feature of MS-SP Rings: nonoverlapping connections can be routed in the same time slot (i.e., the same capacity) in different sections of the ring. Thus, protection/backup time slots can be shared among nonoverlapping connections in an MS-SP Ring, whereas each connection is assigned a dedicated protection/backup time slot in an MS-DP Ring. In an SNCP Ring, each connection is also assigned dedicated protection/backup capacity, because the source node bridges (copies) the signal along the opposite sides of the ring while the destination node selects the best received copy (instead of the traffic being looped back locally, as with an MS-SP Ring or an MS-DP Ring).
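Spatial reuse can be made tangible by counting time slots. In the sketch below, each connection is assumed to be routed clockwise from node s to node d on an n-node ring. For an MS-SP Ring, time slots are assigned first-fit, so a slot can hold several connections whose spans do not overlap. First-fit is a simple heuristic chosen for illustration and does not always find the minimum. For an MS-DP Ring, the forward and backward directions of a connection together cover every span, so each connection needs its own slot. The function names are hypothetical.

```python
def spans(conn, n):
    """Spans used by a connection routed clockwise from node s to node d on an
    n-node ring; span i joins node i and node (i + 1) % n."""
    s, d = conn
    return {(s + i) % n for i in range((d - s) % n)}


def mssp_first_fit(conns, n):
    """First-fit time-slot assignment with spatial reuse (MS-SP style): a slot
    can carry several connections as long as their spans do not overlap."""
    slots = []          # per slot: set of spans already occupied
    assignment = []
    for conn in conns:
        used = spans(conn, n)
        for k, occupied in enumerate(slots):
            if not (occupied & used):
                occupied |= used
                assignment.append(k)
                break
        else:
            slots.append(set(used))
            assignment.append(len(slots) - 1)
    return assignment, len(slots)


def msdp_slots(conns):
    # Forward and backward directions together occupy every span of the ring,
    # so no two connections can share a time slot.
    return len(conns)


# Ring A..H numbered 0..7: A-B, B-D, D-G, G-A, and an overlapping A-C.
conns = [(0, 1), (1, 3), (3, 6), (6, 0), (0, 2)]
print(mssp_first_fit(conns, 8), msdp_slots(conns))
```

Here the four nonoverlapping connections share a single MS-SP time slot and only A-C needs a second one, whereas an MS-DP Ring needs five slots for the same demand.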

Sections 2.4.1 through 2.4.3 describe the multiplex section–shared protection

ring (MS-SP Ring), the multiplex section–dedicated protection ring (MS-DP

Ring), and the subnetwork connection protection ring (SNCP Ring) technique,

respectively. How these ring networks can be interconnected in a reliable way is

outlined in Section 2.4.4. The discussion is summarized in Section 2.4.5, and

Section 2.4.6 highlights the analogies between SDH and SONET self-healing ring

techniques.



2.4.1 Multiplex Section–Shared Protection Ring

In an MS-SP Ring, the available capacity in the clockwise and counterclockwise direction is split into two equal parts: 50% is devoted to carrying working capacity, and the other 50% carries the spare capacity to protect the working capacity (Figure 2.25).

The operation of the MS-SP Ring of Figure 2.25 in the case of a link failure is illustrated in Figure 2.26. The nodes adjacent to the failure (here, nodes B and C) detect the failure and loop the working capacity back into the spare capacity in the opposite direction around the ring (thus, along the path B-A-H-G-F-E-D-C and vice versa). Of course, this requires that the intermediate nodes connect the spare capacity entering and leaving the ADM in the clockwise direction, and likewise the spare capacity entering and leaving the ADM in the counterclockwise direction. As the figure illustrates, both the forward and backward directions of a bidirectional connection (or HOP) are looped back, because both directions are routed along the same side of the ring.

From Figure 2.26 it is clear that there exist different states for the nodes on the

ring. The three states are illustrated in Figure 2.27. In the absence of failures, all

nodes will be in the normal state. However, from the moment a failure occurs on the

ring, the adjacent nodes will trigger the APS protocol, causing the nodes adjacent to

the failure to loop back the traffic and all the other nodes on the ring to go into the

passthrough state. To trigger the appropriate state transitions in all ring nodes, it is

necessary that the nodes detecting the failure send an APS request along the short

Figure 2.25 Multiplex section–shared protection ring in a failure-free situation. (Nodes A through H are ADMs; legend: connection, working capacity, protection/backup capacity.) (ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)



and long path to the other node adjacent to the failure. The short path (B-C in

Figure 2.26) is the segment on the ring from which the traffic is diverted by the

loop-back operation: APS requests along the short path are needed to inform the

upstream node of the status in the downstream node (this may trigger the APS

protocol in the upstream node if only one direction fails [e.g., one fiber in a fiber

pair between adjacent ADMs]). Note, however, that a node will never change from

the normal state to the bridged-and-switched state (i.e., the looped-back state; see

middle of Figure 2.27) based on the receipt of an APS request along the short path.

The long path (B-A-H-G-F-E-D-C in Figure 2.26) is the ring segment along which

traffic is looped back. The main purpose of APS requests along the long path is to

ask the other node adjacent to the failure to bridge and switch the traffic

Figure 2.26 Illustration of the operation of a multiplex section–shared protection ring. (Nodes A through H are ADMs; legend: connection, connection looped back, working capacity.) (ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)



(i.e., to activate the loop back), to request/ensure that all intermediate nodes go into the passthrough state, and to inform all nodes on the ring about the ring status (e.g., which span has failed). A detailed specification of the MS-SP Ring APS protocol is given in [G841], whereas [Sex92] summarizes the main characteristics of the protocol in the form of state diagrams and flowcharts. The detailed specification of the protocol in [G841] (K1b1-4,14 bridge request type; K1b5-8, destination node ID; K2b1-4, source node ID; K2b5, short/long path; K2b6-8, status, including MS-AIS and MS-RDI signals) directly limits the ring size: node IDs are restricted to 4 bits, so an MS-SP Ring can contain at most 16 nodes.
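The following minimal Python sketch makes the K1/K2 field layout and the state rules above more concrete. It packs and unpacks the fields listed above, under two assumptions: b1 is the most significant bit of each byte, and K2 b5 = 1 means the long path. The request-type code in the example is a hypothetical placeholder, not a value taken from [G841]. The `next_state` function is a deliberately simplified reading of the rules described in this section, not the full G.841 state machine:

- A long-path request addressed to a node makes that node bridge and switch.
- A long-path request addressed to another node puts the receiving node into passthrough.
- A short-path request never moves a node out of the normal state.

The 4-bit range check shows why a ring is limited to 16 nodes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"
    SWITCHED = "bridged-and-switched"  # i.e., looped back
    PASSTHROUGH = "passthrough"


@dataclass(frozen=True)
class ApsRequest:
    request_type: int  # K1 b1-4 (code values are not modeled here)
    dest_id: int       # K1 b5-8
    src_id: int        # K2 b1-4
    long_path: bool    # K2 b5 (assumed in this sketch: 1 = long path)
    status: int        # K2 b6-8


def encode(req: ApsRequest) -> tuple[int, int]:
    """Pack a request into (K1, K2), taking b1 as the most significant bit."""
    for name, value, width in (("request_type", req.request_type, 4),
                               ("dest_id", req.dest_id, 4),
                               ("src_id", req.src_id, 4),
                               ("status", req.status, 3)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    k1 = (req.request_type << 4) | req.dest_id
    k2 = (req.src_id << 4) | (int(req.long_path) << 3) | req.status
    return k1, k2


def decode(k1: int, k2: int) -> ApsRequest:
    """Inverse of encode()."""
    return ApsRequest(request_type=k1 >> 4, dest_id=k1 & 0x0F,
                      src_id=k2 >> 4, long_path=bool((k2 >> 3) & 1),
                      status=k2 & 0x07)


def next_state(state: NodeState, my_id: int, req: ApsRequest) -> NodeState:
    """Simplified reaction of a ring node to a received APS request."""
    if not req.long_path:
        # In this sketch a short-path request leaves the state unchanged; in
        # particular it never moves a node from normal to bridged-and-switched.
        return state
    if req.dest_id == my_id:
        # Long-path request addressed to this node: bridge and switch.
        return NodeState.SWITCHED
    if state is NodeState.NORMAL:
        # Long-path request for another node: let the traffic pass through.
        return NodeState.PASSTHROUGH
    return state
```

Under these assumptions, `encode(ApsRequest(0b1011, 3, 12, True, 0))` returns `(0xB3, 0xC8)`. A destination ID of 16 raises `ValueError` because it does not fit in four bits.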

As mentioned earlier, changes in the status of a node (see Figure 2.27) are

triggered by requests received along the long path; thus, the one-way delay on the

long path between both nodes adjacent to a failure is important. Consider, for

example, a ring containing 16 nodes interconnected by links of 100 kilometers (km);

this one-way delay equals $(16 - 1) \times 0.875 = 13.125$ ms (remember from Section 2.3.4 that the one-way delay per link of 100 km equals 875 μs). Without

going into the details of the APS protocol, it takes between one and two one-way

delays on the long path before all nodes on the ring have changed their status. An

additional one-way delay along the long path is needed to inform all nodes on

Figure 2.27 States of the ring nodes (MS/SP_A, SP_TT, MS/Sn_A: group of N/2 atomic functions for a two-fiber STM-N MS-SP Ring). (ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)



the ring about the last change in a node status. This implies in this example that the protection completion time would range between $(1 + 1) \times 13.125 = 26.25$ ms and $(2 + 1) \times 13.125 = 39.375$ ms. Note, however, that these values assume an ideal

situation in which the time needed to process APS requests and to act accordingly

can be neglected.
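The idealized timing estimate above can be reproduced with a few lines of Python. The sketch makes the same assumptions as the text: APS processing time is neglected, and the long path crosses every span except the failed one. The function names are illustrative.

```python
from __future__ import annotations


def long_path_delay_ms(num_nodes: int, link_delay_ms: float) -> float:
    """One-way delay on the long path between the two nodes adjacent to a
    failed span; the long path crosses every span except the failed one."""
    return (num_nodes - 1) * link_delay_ms


def completion_time_range_ms(num_nodes: int, link_delay_ms: float) -> tuple[float, float]:
    """Idealized protection completion time: one to two long-path delays until
    every node has changed state, plus one delay to announce the last change."""
    d = long_path_delay_ms(num_nodes, link_delay_ms)
    return (1 + 1) * d, (2 + 1) * d


# 16 nodes, 100 km links, 0.875 ms (875 us) one-way delay per link
print(long_path_delay_ms(16, 0.875))        # 13.125
print(completion_time_range_ms(16, 0.875))  # (26.25, 39.375)
```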

Figure 2.27 also illustrates that the MS-SP Ring protocol can be classified as an

MS trail protection technique (the dashed rounded rectangle represents the APS

sublayer according to Figure 2.21). The protection adaptation functions (connected to the MS_TT functions) split up the administrative units (AUs) into a working and a protection group of AUs. These groups are then cross-connected in the protection connection function, according to the state of the node. The protection sublayer trail termination functions connect the AUs to MS/Sn_A functions responsible for the pointer processing. Note that this is also true for the protection AUs connected through the node in the passthrough state (the STM-N frames do not necessarily have to be aligned with each other).

Furthermore, in the normal state extra (or unprotected and not-yet-preempted)

traffic can be routed through the spare/protection capacity, but the extra traffic will

be preempted in case of a failure. MS-SP Rings support not only extra traffic but

also Non-preemptible Unprotected Traffic (NUT) (this feature is not illustrated in

the figure). This can be achieved by removing certain AUs and the corresponding

spare/protection AUs from the groups to which the MS-SP Ring APS protocol

applies. Of course, capacity (AUs or time slots) for supporting NUT has to be

allocated on all spans on the ring and cannot be restricted to certain segments on

the ring. Section 2.4.4 shows that the support of NUT can be interesting when

interconnecting two MS-SP Rings based on the virtual ring interconnection scheme

(otherwise, this would result in double protection).

The lower part of Figure 2.27 implicitly assumes a two-fiber MS-SP Ring

configuration. Figure 2.28 compares the two-fiber configuration with the four-

fiber MS-SP Ring configuration. In a two-fiber MS-SP Ring configuration, two

fibers interconnect adjacent nodes in the ring, each carrying, in opposite directions, 50% working and 50% spare/protection capacity (i.e., one of the two fibers belongs to the clockwise ring, and the other to the counterclockwise ring). In a four-fiber MS-SP Ring configuration, four fibers (or two fiber pairs instead of one fiber

pair) interconnect two adjacent nodes in the ring: One fiber pair is dedicated to the

transport of the working capacity, and the other fiber pair is completely dedicated

to the spare/protection capacity. In other words, considering an STM-N MS-SP

Ring, a two-fiber configuration can at most transport N/2 (e.g., eight in the case of

an STM-16 ring) bidirectional protected VC-4s on each span, whereas a four-fiber

configuration is able to transport up to N bidirectional protected VC-4s.

Figure 2.29 shows that a four-fiber MS-SP Ring not only can accommodate

more traffic than a two-fiber MS-SP Ring (because although 50% of the capacity

still remains dedicated as protection/backup capacity, there is twice the amount of

capacity available in the network) but also can support span (or link) protection. In

a two-fiber configuration, each line failure will affect all working and spare/protection capacity in at least one direction; therefore, the traffic will always be looped



Figure 2.28 Two-fiber versus four-fiber multiplex section–shared protection ring. (Panels: two-fiber MS-SP Ring, in which a single fiber pair carries 50% working capacity (white) and 50% protection/backup capacity (gray); four-fiber MS-SP Ring, with a separate working fiber pair and protection/backup fiber pair. Both panels show the atomic functions MS_TT, MS/SP_A, SP_TT, SP_C, MS/Sn_A, and Sn_C, with the MS-SP Ring sublayer marked.) (ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)

Figure 2.29 Span protection in four-fiber multiplex section–shared protection ring. (Panels: 2-fiber MS-SP Ring; 4-fiber MS-SP Ring.)



back in a two-fiber configuration (left side of the figure). In a four-fiber MS-SP Ring

configuration (right side of the figure), a failure can, for example, affect only a fiber carrying working capacity (in one direction). In this case (see right upper part of the figure), the spare/protection capacity on that span remains unaffected, and thus, the working capacity can be switched over to this spare/protection capacity on that span. Of course, it is also possible that only a fiber carrying spare/protection capacity fails (see right middle part of the figure); in this case no APS will be triggered (but failure propagation and notification are still required for any extra traffic or NUT being affected). Finally, a failure can also cut both the working and the spare/protection fiber in the same direction (see right bottom part of the figure), which corresponds to a single fiber cut in the two-fiber MS-SP Ring; in this case a loop-back operation as in the two-fiber configuration is needed. Note also that span protection has the advantage of offering a propagation delay similar to that in the failure-free situation.
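The case analysis for the four-fiber configuration can be summarized in a small decision function. The Python sketch below is a simplification with a deliberately narrow scope. It looks at a single span and a single direction, and the names `Action`, `four_fiber_action`, and `two_fiber_action` are hypothetical. It does not model priorities between concurrent requests, which a real APS implementation has to arbitrate.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    SPAN_SWITCH = "span switch onto the protection fiber of the same span"
    NOTIFY_ONLY = "no APS; notify affected extra traffic/NUT"
    RING_LOOPBACK = "loop back around the ring"


def four_fiber_action(working_failed: bool, protection_failed: bool) -> Action:
    """Reaction to failures on one span, one direction, of a four-fiber MS-SP Ring."""
    if working_failed and protection_failed:
        # Both fibers in this direction are cut: behave like a two-fiber ring.
        return Action.RING_LOOPBACK
    if working_failed:
        # The protection fiber on the same span is intact: span protection.
        return Action.SPAN_SWITCH
    if protection_failed:
        # Working traffic is unaffected; only extra traffic or NUT is hit.
        return Action.NOTIFY_ONLY
    return Action.NONE


def two_fiber_action(fiber_failed: bool) -> Action:
    """In a two-fiber ring one fiber carries both working and protection
    capacity, so a fiber cut always hits both at once."""
    return four_fiber_action(fiber_failed, fiber_failed)
```

Expressing the two-fiber case through the four-fiber function mirrors the text: a single fiber cut in a two-fiber ring is equivalent to losing both the working and the protection fiber of a four-fiber ring.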

In Figure 2.26 the physical routing of the traffic in the case of a failure is shown.

From a logical point of view, Figure 2.26 shows that the MS-SP Ring APS lays out

a bypass for the working traffic through the spare/protection capacity (around the

ring) between the two nodes adjacent to the failure. This logical view (considering

the same network and failure scenario) is presented in Figure 2.30; a logical bypass

is laid out between nodes B and C.

Such a logical view may help us understand more complex failure scenarios. For example, consider the scenario depicted in Figure 2.31, in which two connections, between A and D and between H and E, are affected by two failures; the failures on spans B-C and G-F affect connections A-D and H-E, respectively.

The failure on span B-C triggers the loop-back operation in nodes B and C.

Figure 2.30 Logical view of the operation of a multiplex section–shared protection ring (same scenario as in Figure 2.26).



Similarly, the span failure G-F triggers the loop-back operation in nodes G and F.

However, as Figure 2.31 illustrates, both actions will interfere with each other; the

loop back in B and G will create a logical bypass between both nodes (physically

routed via A and H), whereas the loop back in C and F creates a logical bypass

between C and F (physically routed via D and E). Both logical bypasses will result

in the misconnection of A with H and D with E, respectively. Therefore, to deal

with failures that affect (directly or indirectly) more than one span, the network

must have the ability to ‘‘squelch’’ (i.e., to replace by an AIS signal) some connections to avoid misconnection, as illustrated in Figure 2.31. This is also true for node failures; a node failure will affect both its west and east spans. More precisely, all

connections that originate/terminate in a failing node or in a remote isolated

segment of the ring need to be squelched.
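The squelching rule just stated can be sketched in Python under a simplified model. The ring is a list of node names in ring order, and a connection is a pair of endpoints. A connection is squelched when one of its endpoints is a failed node, or when its endpoints fall into different ring segments after the failed spans and nodes are removed. The function name `squelch_list` is hypothetical. A real MS-SP Ring decides squelching per AU or time slot, not per connection as done here.

```python
def squelch_list(ring, connections, failed_spans=(), failed_nodes=()):
    """Return the connections to squelch (replace by AIS) under a simplified
    rule: an endpoint is a failed node, or the two endpoints end up in
    different ring segments once failed spans and nodes are removed."""
    down_spans = {frozenset(span) for span in failed_spans}
    down_nodes = set(failed_nodes)
    parent = {node: node for node in ring}

    def find(node):
        # Union-find root lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Join the two nodes of every span that is still usable.
    for i, a in enumerate(ring):
        b = ring[(i + 1) % len(ring)]
        if frozenset((a, b)) in down_spans or a in down_nodes or b in down_nodes:
            continue
        parent[find(a)] = find(b)

    return [
        (src, dst)
        for src, dst in connections
        if src in down_nodes or dst in down_nodes or find(src) != find(dst)
    ]


ring = list("ABCDEFGH")
# A single span failure leaves the ring in one piece: loop-back suffices.
print(squelch_list(ring, [("A", "D"), ("H", "E")], [("B", "C")]))  # []
# Failures on B-C and G-F split the ring (scenario of Figure 2.31).
print(squelch_list(ring, [("A", "D"), ("H", "E")], [("B", "C"), ("G", "F")]))
# [('A', 'D'), ('H', 'E')]
```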

Figure 2.32 considers the same scenario as in Figure 2.31, except that the two failures occur one after the other rather than simultaneously. The figure shows that

the spare/protection capacity can be shared between different connections as long as

these do not overlap. More precisely, a spare/protection AU (or time slot) allocated

along the whole ring corresponds to exactly one working AU or time slot on each

span along the ring and thus nonoverlapping higher order paths can be allocated to

that particular AU or time slot on each link they pass. Overlapping connections

would compete for the same AU or time slot on the same link, which thus prevents

them from sharing the same spare/protection AU or time slot. The ability of

nonoverlapping connections to reuse the same unit (here, AU or time slot) of

working capacity is sometimes called spatial reuse. This spatial reuse feature becomes possible because in an MS-SP Ring, the forward and backward directions of

a bidirectional connection are routed along the same side of the ring (note that this

is not the case in an MS-DP Ring, as is explained in Section 2.4.2), and thus only

occupy capacity on that segment of the ring.

Figure 2.31 Illustration of the need for squelching mechanisms (logical view).

Consider, for example, a two-fiber STM-16 MS-SP Ring containing eight nodes (A, B, C, D, E, F, G, and H) in which eight VC-4s need to be set up between each pair of neighbors; then all

working capacity will be occupied. Now, if eight VC-4s need to be set up between each pair of nodes that are adjacent to the same neighbor (thus, each VC-4 will be routed over two links), this traffic requires two stacked STM-16 rings (because per STM-16, a two-fiber MS-SP Ring can accommodate only eight VC-4s per link): one will accommodate the traffic between A and C, C and E, E and G, and G and A, and the other will accommodate the traffic between B and D, D and F, F and H, and H and B.

This holds regardless of which segment (the short or the long one) is chosen to route a connection. Typically, the short segment will be chosen, but in rare

situations the long one may be chosen to balance the traffic over the ring (a typical

design objective is to minimize the number of higher order paths routed over the

Figure 2.32 Spare/protection capacity sharing between nonoverlapping connections. (ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)



highest loaded link in the MS-SP Ring). For example, consider again a two-fiber STM-16 MS-SP Ring containing eight nodes (A, B, C, D, E, F, G, and H), and suppose that the following traffic needs to be set up: six VC-4s between B and C, four VC-4s between F and G, and four VC-4s between A and D. How should the four VC-4s between A and D be routed: via B-C or via H-G-F-E? Because on link B-C capacity for only $8 - 6 = 2$ VC-4s remains free, whereas on link G-F $8 - 4 = 4$ VC-4s remain free, the best choice (from a capacity point of view) is to route the four VC-4s along A-H-G-F-E-D.
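The routing decision in this example can be sketched in Python. The sketch rests on three assumptions:

- The design objective is the one mentioned above: minimize the load of the most heavily loaded link.
- Each link has a capacity of eight protected VC-4s, as in a two-fiber STM-16 MS-SP Ring.
- Demands are routed one at a time.

All function names are hypothetical.

```python
def ring_routes(ring, src, dst):
    """Both ways around the ring from src to dst, as node lists."""
    n = len(ring)
    i, j = ring.index(src), ring.index(dst)
    clockwise = [ring[(i + k) % n] for k in range((j - i) % n + 1)]
    counterclockwise = [ring[(i - k) % n] for k in range((i - j) % n + 1)]
    return clockwise, counterclockwise


def link_loads(demands):
    """Sum the VC-4s carried on each (undirected) link; demands are (route, vc4s)."""
    loads = {}
    for route, vc4s in demands:
        for a, b in zip(route, route[1:]):
            key = frozenset((a, b))
            loads[key] = loads.get(key, 0) + vc4s
    return loads


def best_route(ring, existing, src, dst, vc4s, capacity):
    """Pick the side of the ring that fits and minimizes the peak link load."""
    candidates = []
    for route in ring_routes(ring, src, dst):
        peak = max(link_loads(existing + [(route, vc4s)]).values())
        if peak <= capacity:
            candidates.append((peak, route))
    if not candidates:
        raise ValueError("demand fits on neither side of the ring")
    return min(candidates, key=lambda c: c[0])


ring = list("ABCDEFGH")
existing = [(["B", "C"], 6), (["F", "G"], 4)]
# Two-fiber STM-16 MS-SP Ring: 8 protected VC-4s per link.
print(best_route(ring, existing, "A", "D", 4, capacity=8))
# (8, ['A', 'H', 'G', 'F', 'E', 'D'])
```

Routing via B-C would load that link with 10 VC-4s, which exceeds its capacity. Only the long side remains, with a peak load of 8 VC-4s on link G-F.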

2.4.2 Multiplex Section–Dedicated Protection Ring

The operation of multiplex section–dedicated protection rings (MS-DP Rings) is

illustrated in Figure 2.33. The main difference between MS-DP Rings and MS-SP

Figure 2.33 Illustration of the operation of a multiplex section–dedicated protection ring. (Panels: situation without failure; situation in case of a link failure. Legend: ADM, connection, connection looped back, working capacity.)



Rings is that in an MS-DP Ring the forward and backward directions of a bidirectional connection are routed along opposite sides of the ring. More precisely, one fiber (and thus one direction of rotation) is dedicated to carrying the working capacity, and the counter-rotating fiber is dedicated to the spare capacity. As in an MS-SP Ring, the nodes adjacent to a failure loop back all traffic on the working fiber around the ring onto the protection fiber.

That the forward and backward directions of a bidirectional connection are not routed along the same side of the ring implies that a bidirectional connection will occupy capacity along the whole ring. This rules out spatial reuse as in MS-SP Rings. For example, consider again a ring containing eight nodes (A, B, C, D, E, F, G, and H) in which eight VC-4s need to be set up between each pair of neighbors. Then we would need a stack of $8 \times 8 = 64$ VC-4s (because each VC-4 occupies capacity on all links in the ring), or four STM-16 MS-DP Rings, whereas a single two-fiber STM-16 MS-SP Ring suffices (even though it can accommodate only eight VC-4s on each link compared to the 16 VC-4s in an MS-DP Ring). However, an MS-SP Ring requires a working and a protection/backup time slot in both the clockwise and the counterclockwise direction, whereas only a single time slot in each direction suffices in an MS-DP Ring. Thus, an MS-SP Ring will only outperform an MS-DP Ring (in terms of capacity efficiency) when it reuses/shares the same time slots among, on average, more than two nonoverlapping connections.
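The stacking arithmetic of both ring types can be checked with a short Python sketch. It assumes that each connection in the MS-SP Ring is routed along the shorter side of the ring and that an MS-DP Ring offers 16 VC-4s per ring (STM-16). The function names are hypothetical.

For the MS-SP Ring, the result is only a lower bound: the peak span load divided by the per-link capacity. Assigning time slots to arcs around a ring can in general require more rings than this bound. For the regular traffic patterns used in this section, the bound is reached.

```python
import math


def ms_sp_rings_lower_bound(ring, demands, vc4s_per_link):
    """Lower bound on stacked MS-SP Rings: peak span load / per-link capacity.
    Each demand (src, dst, vc4s) is routed along the shorter side of the ring."""
    n = len(ring)
    loads = [0] * n  # loads[k] is the span between ring[k] and ring[k + 1]
    for src, dst, vc4s in demands:
        i, j = ring.index(src), ring.index(dst)
        hops = (j - i) % n
        # Walk clockwise from whichever endpoint gives the shorter arc.
        start, length = (i, hops) if hops <= n - hops else (j, n - hops)
        for k in range(length):
            loads[(start + k) % n] += vc4s
    return math.ceil(max(loads) / vc4s_per_link)


def ms_dp_rings(demands, vc4s_per_ring):
    """Each connection occupies one time slot on every link of an MS-DP Ring."""
    return math.ceil(sum(vc4s for _, _, vc4s in demands) / vc4s_per_ring)


ring = list("ABCDEFGH")
neighbors = [(ring[k], ring[(k + 1) % 8], 8) for k in range(8)]
two_hop = [(ring[k], ring[(k + 2) % 8], 8) for k in range(8)]
print(ms_sp_rings_lower_bound(ring, neighbors, vc4s_per_link=8))  # 1
print(ms_dp_rings(neighbors, vc4s_per_ring=16))                   # 4
print(ms_sp_rings_lower_bound(ring, two_hop, vc4s_per_link=8))    # 2
```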

As in MS-SP Rings, the loop-back operation in nodes adjacent to a failure may

also result in misconnections in MS-DP Rings (Figure 2.34). Note that this figure

considers only a single bidirectional connection from node A to node D (whereas

Figure 2.31 considers two bidirectional connections: between A and D and between

H and E). A double-failure scenario that affects both the forward and the backward

direction of a bidirectional connection will result in incorrectly connecting the

endpoints with themselves (more precisely, node A gets connected with node A

and node D with node D).15 Although such a misconnection involves only a single


Figure 2.34 Misconnections in multiplex section–dedicated protection rings.

15This assumes that no time-slot interchange (TSI) takes place in the intermediate nodes. This means that

a connection gets assigned the same time slot on all links in the ring.



connection, such an operation should be avoided; instead, an AIS signal should be

raised and propagated to the connection endpoints (squelching).

2.4.3 Subnetwork Connection Protection Ring

SNCP and its operational aspects have already been discussed in Section 2.3.4.

In SNCP the recovery head end (RHE) bridges (or copies) the signal along two paths, and the recovery tail end (RTE) selects one of the two received signals. Of course, nothing prevents the adoption of SNCP in a ring network (Figure 2.35).

As explained earlier, currently SNCP works only in a unidirectional 1+1 mode and thus does not involve any APS signaling. This has two advantages: an SNCP Ring is not necessarily restricted to 16 nodes, as is the case for MS-SP Rings (or MS-DP Rings), and the protection switch completion time depends only on the capabilities of the downstream RTE. In terms of capacity efficiency, an SNCP Ring performs the same as an MS-DP Ring: each connection is assigned capacity (one time slot in each direction) along the whole ring, which rules out spatial reuse as in MS-SP Rings. Because the signal is bridged (or copied) onto the backup path all the time, the backup capacity is not available to support extra traffic. The SNCP Ring concept is applicable to both the higher and lower order path layers, whereas MS-SP Rings (and MS-DP Rings) assume higher order path connections (see also Figure 2.27).
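Because SNCP involves no APS signaling, the RTE's decision is purely local. Below is a minimal Python sketch of such a selector. It assumes a simple three-level signal quality (signal fail, signal degrade, OK). The names and the rule of staying on the current copy unless the other one is strictly better are illustrative choices. Real equipment also applies hold-off and wait-to-restore timers, which are omitted here.

```python
from enum import IntEnum


class SignalStatus(IntEnum):
    # The ordering is an assumption of this sketch: higher means better.
    SIGNAL_FAIL = 0
    SIGNAL_DEGRADE = 1
    OK = 2


def rte_select(status_a: SignalStatus, status_b: SignalStatus, selected: str) -> str:
    """Return which copy ('a' or 'b') the recovery tail end should select.
    Only local information is used; the selector moves only when the other
    copy is strictly better, so equal-quality copies cause no switching."""
    current, other = (status_a, status_b) if selected == "a" else (status_b, status_a)
    if other > current:
        return "b" if selected == "a" else "a"
    return selected
```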

2.4.4 Ring Interconnection

In Sections 2.4.1 through 2.4.3, we have described different self-healing ring mechanisms. Of course, it is not always desirable to build a network consisting of only

Figure 2.35 Illustration of the operation of a subnetwork connection protection ring. (Labels: bridge, switch/selector.)



one ring. Typical networks consist of multiple rings. Consequently, a connection in

such a network might cross multiple rings. For example, let us consider the connection from node A to node F in Figure 2.36. Each ring can easily guarantee survivability in case of failures inside its own ring. However, when the scope of

all protection mechanisms is restricted to a single ring, then the connections will go

from one ring to the other through a single ‘‘gateway.’’ This means that in Figure

2.36 the nodes C and H and the link between both nodes become a single point of

failure. Thus, in addition to outages of the source and sink ADMs of the end-to-end

connection, an outage of this interconnection gateway will have a large impact on

the overall unavailability of the end-to-end connection (at least, as long as double

failures within a single ring can be ignored). Nevertheless, having two or more rings

each protecting only part of a connection can improve the overall availability.

A ring that covers all the nodes in the network will typically be longer, so the

chance that this ring is affected by a double failure is higher, whereas with two or

more distinct rings, some simultaneous failures can affect, and thus be handled by, different rings (e.g., a simultaneous failure of link A-B and link J-F). In a nutshell,

dividing a network into multiple rings can be beneficial (and availability is only one

consideration), but it is crucial to ensure that the interconnection gateways do not

affect the overall availability too drastically. Other reasons could be, for example,

that a single ring simply cannot accommodate all traffic routed over a network or

that the end-to-end propagation delay becomes unacceptable.

The rings covering the whole network will typically be laid out so that as much

interring traffic (traffic crossing more than one ring) as possible is avoided. In other

words, only a small fraction of the overall traffic typically crosses multiple rings and

thus needs to be protected against failures of the interconnection gateways. For this

purpose, the interring traffic is sent through two gateways instead of one. The

following sections present different options to achieve this. The first one is the

Figure 2.36 Intrinsic vulnerability of single-node ring interconnections. (The single gateway C-H between the two rings is marked as a single point of failure.) (ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)



virtual ring (VR) interconnection scheme (see the section titled Virtual Ring Interconnection). As in regular SNCP, in the VR interconnection scheme the traffic is

sent along two paths transiting different gateways. The second option is called drop

and continue (D&C): An interring connection is bridged in the gateway node

on the source ring and continues along the ring to leave the ring also through

another interconnection gateway, whereas a gateway node on the destination ring

selects the best of both copies. Thus, the VR interconnection scheme is a global

protection technique, whereas the D&C interconnection scheme is a local protection technique. D&C can be adopted to interconnect SNCP Rings (see the section

titled Drop and Continue Interconnection of SNCP Rings), to interconnect MS-SP

Rings (see the section titled Drop and Continue Interconnection of MS-SP Rings),

or to interconnect SNCP Rings with MS-SP Rings (see the section titled Drop and

Continue Interconnection of MS-SP and SNCP Rings). D&C is not applicable to

MS-DP Rings, but an alternative for the local protection of the interconnection

gateways between two MS-DP Rings is presented in the section titled Interconnection of MS-DP Rings. Finally, the section titled Interconnection of Stacked Rings

demonstrates that ring interconnection is of particular interest in a stack of rings,

whereas the section titled Node Architectures for Gateways between Self-Healing

Rings highlights different gateway node architectures.

Virtual Ring Interconnection

Figure 2.37 shows the virtual ring interconnection scheme. As in regular SNCP the

traffic is sent along two paths transiting different gateways (here, C-H and D-I);

both paths form a virtual ring (here, A-B-C-H-G-F-J-I-D-E-A). By properly

routing the traffic over the rings (i.e., with both paths along opposite sides of each ring), any single point of failure (except, of course, in the connection endpoints)

can be avoided. Therefore, unless double-failure scenarios have to be considered,

there is no need to invest in additional protection/capacity to protect one or both

paths once again inside an individual ring. The double failure shown in Figure 2.37

is an example in which having no protection inside the individual rings (but only on

an end-to-end basis) does make a difference, because the two simultaneous failures

affect both paths and occur in different rings (whereas a simultaneous failure of link

A-B and link A-E instead of link A-B and link J-F would always lead to the

connection becoming unavailable, independent of whether the left ring protects

both paths).
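Below is a minimal Python sketch of the virtual ring argument. The connection survives as long as at least one of the two end-to-end paths (via gateway C-H or via gateway D-I) avoids every failed link. The sketch makes three simplifications:

- A link failure cuts both directions.
- Node failures are ignored.
- There is no protection inside the individual rings, as in the scheme described above.

The names are illustrative.

```python
# Paths of the virtual ring A-B-C-H-G-F-J-I-D-E-A, one per gateway.
PATH_VIA_C_H = ["A", "B", "C", "H", "G", "F"]
PATH_VIA_D_I = ["A", "E", "D", "I", "J", "F"]


def path_links(path):
    """Undirected links traversed by a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def vr_survives(failed_links):
    """True if at least one end-to-end path avoids every failed link."""
    failed = {frozenset(link) for link in failed_links}
    return any(path_links(p).isdisjoint(failed) for p in (PATH_VIA_C_H, PATH_VIA_D_I))


print(vr_survives([("A", "B")]))              # True: the path via D-I is intact
print(vr_survives([("A", "B"), ("J", "F")]))  # False: both paths are hit
```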

Considering a higher order VC (HOVC) as an interring connection, not protecting both paths inside each ring results in occupying at most one time slot in each

direction on all links (see Figure 2.37). Protecting both paths inside the individual

rings would result in requiring twice the amount of capacity. Because SNCP Rings

and MS-DP Rings do not support spatial reuse, both protected paths would require

on all links one time slot in each direction. In an MS-SP Ring both paths can reuse/

share the same (working and protection/backup) time slot because they do not

overlap when properly routed (e.g., via A-B-C and via A-E-D in the left ring).

When the MS-SP Ring protection is not required, both paths can be transported as



non-preemptible unprotected traffic (NUT), requiring only half of the capacity;

the protection/backup time slot along the ring then becomes available for the

transport of another interring HOVC protected by means of the virtual ring

interconnection scheme.

Drop and Continue Interconnection of SNCP Rings

Another option for dual-gateway ring interconnection is called drop and continue

(D&C). The D&C technique to interconnect SNCP Rings is illustrated in Figure

2.38. Instead of simply adding/dropping the signal in the gateway ADMs, the signal

is also continued to the next gateway ADM. Thus, the signal entering gateway node

C via B is continued to gateway node D and the signal entering gateway node D via

E is continued to node C. As long as at most a single failure exists in the left ring

(link A-B fails in the figure), both gateway nodes C and D are able to select a valid

copy of the signal (here, the one transiting node E because of the failure of link A-B)

and to hand it over to the other ring. The copy selected in node C reaches the

destination F via nodes H and G, whereas the copy selected in node D reaches

the destination F via I and J. Finally, the destination F selects one of the two signals (here, the one coming from node C because of the failure of link F-J). In summary, the main difference between Figure 2.37 and Figure 2.38 (both assuming the same double-failure scenario) is that in the virtual ring interconnection no valid signal is sent through the gateway C-H, whereas a valid signal is sent through it in the D&C ring interconnection. Of course, a similar reasoning holds in the opposite direction from node F to node A. Note also that the D&C operation requires some additional capacity to be allocated between the two gateways; nevertheless, still only a single time slot on each link is required.

Figure 2.37 Virtual ring interconnection. (ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)



Figure 2.39 is almost identical to Figure 2.38 except that a slightly different

failure scenario is considered; now the gateway link D-I instead of the link F-J fails

simultaneously with link A-B. The reasoning for the forward direction from node A

to node F still holds: Two valid copies of the signal are sent from the left to right

ring (one via gateway C and the other via gateway D), but in this case the copy sent from node D via I and J does not reach the destination F either (this time because of the failure of the gateway link D-I), and thus, F should select the other copy of the signal, received via node G. However, in the backward direction from node F to node A the reasoning no longer holds. Indeed, gateway nodes H and I can each select a valid copy of the signal received from node F and send it further on to the destination node A via C and B and via D and E, respectively. However, both

paths are affected (because of the failure of the link A-B and the gateway link D-I,

respectively), and thus, the destination node A does not receive a valid copy of the

signal at all.
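The failure analysis of Figures 2.38 and 2.39 can be reproduced with a small Python sketch. It encodes the topology used in these figures: left ring A-B-C-D-E, right ring H-G-F-J-I, and gateway links C-H and D-I. The sketch makes two further assumptions:

- A failed link is down in both directions.
- Every gateway node selects any valid copy it receives, including the continue signal from the other gateway.

It illustrates the reasoning above and is not an implementation of G.842.

```python
def dnc_survives(failed_links):
    """Return (forward A->F ok, backward F->A ok) for D&C between two SNCP Rings."""
    failed = {frozenset(link) for link in failed_links}

    def ok(*path):
        # True if every hop along the node sequence is intact.
        return all(frozenset(hop) not in failed for hop in zip(path, path[1:]))

    # Forward: gateways C and D each select a valid copy bridged by A;
    # the continue signal lets C receive A's copy via D, and D via C.
    c_fwd = ok("A", "B", "C") or ok("A", "E", "D", "C")
    d_fwd = ok("A", "E", "D") or ok("A", "B", "C", "D")
    forward = (c_fwd and ok("C", "H", "G", "F")) or (d_fwd and ok("D", "I", "J", "F"))

    # Backward: gateways H and I select on the right ring, then hand over.
    h_bwd = ok("F", "G", "H") or ok("F", "J", "I", "H")
    i_bwd = ok("F", "J", "I") or ok("F", "G", "H", "I")
    backward = (h_bwd and ok("H", "C", "B", "A")) or (i_bwd and ok("I", "D", "E", "A"))
    return forward, backward


print(dnc_survives([("A", "B"), ("F", "J")]))  # (True, True)   scenario of Figure 2.38
print(dnc_survives([("A", "B"), ("D", "I")]))  # (True, False)  scenario of Figure 2.39
```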

Drop and Continue Interconnection of MS-SP Rings

Figure 2.40 illustrates the D&C interconnection of two MS-SP Rings instead of two

SNCP Rings (the working path is assumed to be routed along A-B-C-H-G-F). The

gateway node C (the first gateway along the path from node A to F) bridges/copies

the signal onto two different paths toward the corresponding gateway node on the

other ring. The add/drop signal is sent from node C directly to node H while the

continue signal is routed through the second gateway D-I. Node H then selects

the best copy out of both received signals (here, the signal it directly receives from

node C, because of the failure of the gateway link D-I) before sending the signal

Figure 2.38 Drop and continue method to interconnect two SNCP Rings. (Four gateway ADMs are marked ‘‘Drop & Continue.’’) (ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)



further to nodes G and F. A similar description holds for the opposite direction

from node F to A.

Although the right MS-SP Ring protects the continue signal against the failure of the link H-I (Figure 2.41), it does not protect the continue signal against

the link failure D-I, so H should still select the signal it directly receives from node

C. In the figure, the left MS-SP Ring also simply protects the working signal against

the failure of link A-B. Nevertheless, although Figure 2.41 considers that a third

link fails (link H-I), in addition to the two simultaneous failures of Figure 2.39, the

connection survives the triple failure, whereas in Figure 2.39 the double failure is

already enough to interrupt at least the backward direction (from node F to

node A).

Figure 2.42 illustrates that the D&C and selection operations need not always be performed in the ADMs of the same gateway (see left: same-side routing); they can also be performed in different gateways on the two rings (see right: opposite-side routing).

Consider, for example, that in Figure 2.41 two interring connections have to be routed: one between nodes B and G and one between nodes E and J. Then the left side of Figure 2.43 represents the most obvious routing over the two gateways C-H and D-I. The left part of Figure 2.43 shows that on the ring links between the gateway nodes (i.e., links C-D and H-I in Figure 2.41), twice the

amount of capacity is required compared to the other links in the same ring. More

generally, D&C on MS-SP Rings may result in the links between the gateway nodes

becoming a bottleneck. In other words, this may require upgrading the whole ring

to a higher capacity (e.g., from STM-16 to STM-64). The middle part of Figure 2.43

Figure 2.39 Drop and continue method, but considering a slightly different failure scenario. (Annotations: drop and continue in four gateway ADMs; ‘‘Select signal from G’’; ‘‘No signal to select.’’) (ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)



Figure 2.40 Drop and continue to interconnect two multiplex section–shared protection rings. (ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)

B

C

D

H

E

G

FA

J

I

Drop &Continue

Drop &Continue

Figure 2.41 Drop and continue to interconnect two multiplex section–shared protection rings, butconsidering a triple failure instead of a single failure. (ITU-T Recommendation G.842,‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organ-ization, April 1997. Available at: www.itu.int. Accessed May 2004.)



Figure 2.42 Same-side versus opposite-side drop and continue routing. (ITU-T RecommendationG.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T StandardizationOrganization, April 1997. Available at: www.itu.int. Accessed May 2004.)

Figure 2.43 Scenarios for handling the additional capacity needed for the continue signal. (ITU-TRecommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ITU-T Standardization Organization, April 1997. Available at: www.itu.int. AccessedMay 2004.)



illustrates how transporting some of the continue signals as extra traffic in the MS-SP

Ring protection/backup capacity can help to leverage this bottleneck problem. Note,

however, that carrying continue signals as extra traffic in the MS-SP Ring protection/

backup capacity will reduce the set of failure scenarios that can be handled properly.

The right part of Figure 2.43 shows an alternative solution: installing a third gateway

between both gateways. In this way (because of the spatial reuse capability), the load

of the congested link between both gateway nodes can be spread over the correspond-

ing two links between the two outer gateways and the added gateway in the middle.

Drop and Continue Interconnection of MS-SP and SNCP Rings

Finally, it is also possible to use D&C to interconnect an MS-SP Ring with an SNCP Ring (Figure 2.44). The figure shows that the routing in the MS-SP Ring part is identical to the left part in Figure 2.41 (D&C and selection operation in node C), whereas the routing in the SNCP Ring part is identical to the right part of Figure 2.38 and Figure 2.39. In the forward direction from A to F, the two copies of the signal are routed via opposite sides of the ring to the destination node F, where the best copy is selected; in the backward direction from F to A, both gateway nodes H and I select the best signal out of the two copies received from F, before sending it to nodes C and D, respectively, on the left ring.

Figure 2.44 Drop and continue interconnection of a multiplex section–shared protection ring and an SNCP Ring. (ITU-T Recommendation G.842, “Interworking of SDH network protection architectures,” ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)

Note that in all scenarios described earlier, the D&C ring interconnection does

not involve more than plain SNCP, that is, upstream a permanent 1+1 bridge (the

D&C operation) and downstream a selection between both bridged signals. More

details on D&C ring interconnection schemes can be found in [G842] and [ETSI2].

Interconnection of MS-DP Rings

Figure 2.45 shows that it is also feasible to interconnect two MS-DP Rings through

two gateways. However, this technique is not based on the drop and continue

principle but features similar capabilities. The forward direction (from node A to

node F) is routed along the path A-E-D-I-J-F, whereas the backward direction

(from node F to node A) is routed along the path F-G-H-C-B-A. In contrast to the

D&C techniques, only a single copy of the signal is sent through one of the gateways

(different gateways for the forward and backward directions). The capacity that is

not used on the working fibers between both gateways (thus, on link D-C and link

H-I) and the unused capacity on the gateway links (thus, link D-I and link H-C)

make it possible to preestablish a backup loop (D-C-H-I-D) that protects against gateway

failures. For example, during a failure of the gateway link D-I, the forward direc-

tion of the signal is looped back in node D and sent along nodes C and H to node I,

where it is looped back once again to continue on its original route.


Figure 2.45 Dual-gateway interconnection of two multiplex section–dedicated protection rings.



Figure 2.46 also shows that the MS-DP Rings protect this loop against failures

of the links on the ring between both gateways (in the figure, link H-I fails, so all

traffic on the working fiber is looped back onto the protection/backup fiber from

node H, via G, F, J to node I). Finally, the left MS-DP Ring protects the backward

direction from node F to node A against the failure of link B-A.

Interconnection of Stacked Rings

Figure 2.47 illustrates that a stack of rings (e.g., when the capacity of a single ring is

not enough to accommodate all traffic to be routed) is a particular situation in which

interconnection of rings is important. The figure shows a stack of three STM-N Rings, each of them physically passing through four nodes but logically terminated by an ADM in only two nodes. The physical location in the back functions as a hub node, where each ring features an ADM. A connection between two nodes that are distinct from the hub (Figure 2.47) must be routed through the hub to go from one ring to another. Routing all STM-N Rings in the stack along the same physical path requires the least amount of cable to be installed in the ground (e.g., a single cable may accommodate 200 fibers). As an alternative to such Space Division Multiplexing (SDM), fiber exhaust problems can also be solved by multiplexing the stacked STM-N Rings onto a single fiber by means of Wavelength Division Multiplexing (WDM), rather than transporting each STM-N on its own fiber pair (or pairs).

In the literature many articles are available on techniques to minimize the number of ADMs to be installed in a stack of rings [Ari00], [Col00] and [Mod01]; this can be achieved by properly grooming connections into the appropriate rings. Unfortunately, most of these articles have so far ignored any potential need for dual-gateway interconnections in such stacked ring network designs.

Figure 2.46 Dual-gateway interconnection of two multiplex section–dedicated protection rings, but considering a triple failure instead of a single failure.

Node Architectures for Gateways between Self-Healing Rings

All the examples described thus far implicitly assume that the add/drop ports of an

ADM on one ring are hard-wired to the add/drop ports of an ADM on another

ring. The ADMs are said to be interconnected back to back (see also top part of

Figure 2.48). However, Figure 2.47 illustrates that often more than two rings will be

interconnected with each other in one location. Therefore, one might consider

increasing the flexibility of the ring interconnection by installing a central digital

cross-connect (DXC) in that location, as shown in the middle part of Figure 2.48.

Note that the DXC simply cross-connects the signals from one ring to the other and

is not involved in any D&C or other technique to protect the gateways between the

rings (this is still the responsibility of the ADMs on the rings). The bottom part of Figure 2.48 shows that such a DXC can also directly terminate the STM-N rings instead of passing through some intermediate ADMs. Of course, having a DXC that directly terminates the STM-N rings does not remove the need for D&C; for example, D&C is still needed to survive a failure scenario such as the one depicted in Figure 2.38, regardless of whether ADMs C and H and ADMs D and I are integrated into two DXCs directly terminating both rings.

Figure 2.47 An illustration of a stack of rings.

2.4.5 Summary

. Multiplex section–shared protection rings (MS-SP Rings): 50% working

and 50% protection/backup capacity in clockwise and counterclockwise

directions; two- and four-fiber modes; possibility for spatial reuse (better

capacity efficiency than in dedicated protection rings when capacity can be

shared among on average more than two nonoverlapping connections);

traffic is looped back in nodes adjacent to a failure; squelching needed to

avoid misconnections; restricted to at most 16 nodes because of the APS

protocol specs.

. Multiplex section–dedicated protection rings (MS-DP Rings): One working fiber in one direction and one protection/backup fiber in the opposite direction;

no spatial reuse; traffic is looped back in nodes adjacent to failure; squelch-

ing needed to avoid misconnections; restricted to at most 16 nodes because

of the APS protocol specs.

. Subnetwork connection protection rings (SNCP Rings): Signal permanently bridged in the RHE and both signals sent in opposite directions along the ring; RTE selects the best copy out of the two signals received from the RHE; no spatial reuse; no misconnections; in current SNCP, no APS signaling needed, so no restrictions on the number of ring nodes and shorter protection completion times than in MS-SP Rings or MS-DP Rings.

. Ring interconnection: Dual-gateway interconnection schemes prevent the ring interconnection gateways from becoming single points of failure.

. Dual-gateway interconnection schemes: virtual ring and drop and continue (D&C) to interconnect any combination of SNCP and/or MS-SP Rings; customized interconnection scheme to interconnect MS-DP Rings.

. Advantage of D&C compared to virtual ring: Rings can protect independently from each other against single failures (→ allows simultaneous single failures in distinct rings).

. D&C in MS-SP Rings: Risk of overloading the links on the ring between the gateways → transport a fraction of the continue capacity as extra traffic in the MS-SP Ring protection/backup capacity or install additional gateway nodes to spread the load on the links between the gateways.

. Interconnection of MS-DP Rings and stacked rings: Not only the interconnection of physically separated rings but also the interconnection of stacked rings is an issue.

. Gateway node architectures: Back-to-back interconnection of ADMs; increased flexibility in the form of a central DXC to which all ADMs in a gateway connect; single DXC that directly terminates all interconnected rings.

Figure 2.48 Node architectures for ring interconnection.

2.4.6 Differences between SONET and SDH

The protection rings described in the previous sections are protection rings for SDH

networks. For each of them, a counterpart exists in SONET networks (although

conceptually identical, they are not fully interoperable with SDH rings, because of

some minor differences in the APS protocol details). The main difference is that a

different terminology is adopted in SONET networks. More precisely, the format is xySR. The first character (x) indicates whether it concerns a unidirectional (x = U) or a bidirectional (x = B) ring (thus, whether all working capacity flows in one direction and all protection/backup capacity in the opposite direction). The second character (y) refers to the recovery extent; it indicates whether it concerns a line (y = L) or path (y = P) switched ring (respectively called local and global recovery in Chapter 1).

. An MS-SP Ring is called a bidirectional line switched ring (BLSR) in

SONET networks. To discriminate between two- and four-fiber configur-

ations, the number of fibers is added (respectively, BLSR/2 and BLSR/4

rings).

. An MS-DP Ring is called a unidirectional line switched ring (ULSR) in

SONET networks.

. An SNCP Ring is called a unidirectional or bidirectional path switched

ring (UPSR or BPSR) in SONET networks. Note that in a UPSR ring, the

RTE will select by default the signal received through one of its ports (e.g.,



the “west” port). Only if this signal is affected will the RTE select the signal coming in through the other port (here, the “east” port).
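The xySR naming rule and the SDH-to-SONET correspondences listed above fit in a small lookup. The following Python sketch is illustrative only: the function name `sonet_ring_name` and the dictionary `SDH_TO_SONET` are hypothetical helpers, not part of any standard or library, and the fiber-count suffix is restricted to BLSR rings as described in the text.

```python
from typing import Optional

# SDH ring types (Section 2.4) and their SONET counterparts, as listed above.
SDH_TO_SONET = {
    "MS-SP Ring (two-fiber)": "BLSR/2",
    "MS-SP Ring (four-fiber)": "BLSR/4",
    "MS-DP Ring": "ULSR",
    "SNCP Ring (unidirectional)": "UPSR",
    "SNCP Ring (bidirectional)": "BPSR",
}


def sonet_ring_name(bidirectional: bool, path_switched: bool,
                    fibers: Optional[int] = None) -> str:
    """Build a SONET ring name following the xySR pattern.

    x = 'B' (bidirectional) or 'U' (unidirectional);
    y = 'P' (path switched, global recovery) or 'L' (line switched, local recovery).
    The /2 or /4 fiber-count suffix is only used for BLSR rings.
    """
    x = "B" if bidirectional else "U"
    y = "P" if path_switched else "L"
    name = f"{x}{y}SR"
    if fibers is not None:
        if name != "BLSR" or fibers not in (2, 4):
            raise ValueError("fiber-count suffix applies only to BLSR/2 and BLSR/4")
        name += f"/{fibers}"
    return name


if __name__ == "__main__":
    print(sonet_ring_name(True, False, fibers=4))  # BLSR/4
    print(sonet_ring_name(False, False))           # ULSR
    print(sonet_ring_name(False, True))            # UPSR
```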

2.5 Linear Protection

In Section 2.3.4, APS was discussed in more detail from an architectural and protocol viewpoint, using linear protection as an example. The goal of this section is to discuss different strategies based on linear protection switching: Sections 2.5.1 and 2.5.2 deal with multiplex section protection (MSP) and path protection, respectively, and Section 2.5.3 summarizes the main conclusions. Note that the more advanced, but also widely deployed, self-healing ring protocols are the subject of Section 2.4.

2.5.1 Multiplex Section Protection

Linear protection is often applied on the multiplex section (MS) level (see Figure

2.22). Figure 2.49 illustrates the most general case: M:N (here 2:3) multiplex section

protection. A span between two network elements (e.g., two DXCs) consists of five

STM-N signals; two of these protect the span against a single or double failure on

the three working signals. A double failure is shown in the figure; that is, the failure

of STM-N working channel 1 is circumvented by routing the signal through the

protection/backup channel 1, whereas a failure of working channel 3 is circum-

vented by routing the signal through protection/backup channel 2. Note that in the

figure the bidirectional mode has been considered. Thus, although only one direc-

tion of the working channels is affected, both the forward and the backward

direction of the signal are switched over to the protection/backup channels. As

explained in Section 2.3.4, the protection/backup channels can be used for the transport of extra traffic when they are not used for protection purposes. M:N or 1:N protection with N larger than 1 assumes that not all channels will fail simultaneously; therefore, linear M:N or 1:N protection will typically not be used to protect against cable cuts (which would typically affect all channels) but to protect, for example, against line card failures.

Figure 2.49 Bidirectional linear M:N multiplex section protection; here, M = 2, N = 3. (ITU-T Recommendation G.841, “Types and characteristics of SDH network protection architectures,” ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)
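The channel bookkeeping behind M:N MSP can be sketched in a few lines of Python. This is not the G.841 APS state machine: the function `assign_protection` is hypothetical, and it assumes that lower-numbered working channels are served first, whereas real APS resolves competing requests by request priority. It reproduces the double failure of Figure 2.49 and shows why a cable cut affecting all working channels exceeds what 2:3 protection can cover.

```python
from typing import Dict, Iterable, List, Tuple


def assign_protection(n_working: int, m_protection: int,
                      failed: Iterable[int],
                      extra_traffic_on: Iterable[int] = ()
                      ) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Map failed working channels (1..N) onto protection channels (1..M).

    Assumption: failed working channels are served in increasing channel
    number; real APS uses request priorities instead.

    Returns (working -> protection assignment, working channels left
    unprotected, protection channels whose extra traffic is preempted).
    """
    failed = sorted(set(failed))
    if any(not 1 <= w <= n_working for w in failed):
        raise ValueError("unknown working channel")
    free = list(range(1, m_protection + 1))
    assignment: Dict[int, int] = {}
    unprotected: List[int] = []
    for w in failed:
        if free:
            assignment[w] = free.pop(0)
        else:
            unprotected.append(w)  # more failures than protection channels
    preempted = sorted(set(assignment.values()) & set(extra_traffic_on))
    return assignment, unprotected, preempted


# Double failure of Figure 2.49 (M = 2, N = 3), with extra traffic on both
# protection channels: working 1 -> protection 1, working 3 -> protection 2.
print(assign_protection(3, 2, failed=[1, 3], extra_traffic_on=[1, 2]))
# A cable cut hitting all three working channels leaves channel 3 unprotected.
print(assign_protection(3, 2, failed=[1, 2, 3]))
```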

In Section 2.3.4 it was pointed out that subnetwork connection protection (SNCP) relies on unidirectional linear 1+1 protection. Figure 2.50 shows that (unidirectional) linear 1+1 protection can also be applied at the multiplex section (MS) level on a span between two network elements; the recovery head end (RHE) bridges/broadcasts the signal onto two distinct STM-N channels and the recovery tail end (RTE) selects the best one. Because multiplex section protection (MSP) is typically implemented as trail protection, nothing prevents operating linear 1+1 MSP in the bidirectional mode (unidirectional operation is considered in the figure only to differentiate it from Figure 2.49).

As mentioned earlier, linear M:N or 1:N MSP with N larger than 1 is typically

applied to protect against equipment failures like line card failures. As shown in

Figure 2.51, often the network-wide recovery schemes do not cover the intercon-

nection between the client network equipment and the tributary ports to which this

client equipment is connected. Therefore, linear 1:N MSP is very often provided to

protect against failures of the tributary ports on a network element. For example

(Figure 2.51), client equipment can connect to an ADM through one or more

STM-1 ports. The ADMs support linear 1:N protection to protect these STM-1

ports, whereas inside the network they support any of the self-healing ring protocols

described in Section 2.4. As illustrated in Figure 2.9 in Section 2.2.4, the VC-n TT

will be left to the client network equipment, because the client network equipment is

connected through SDH STM-1 ports and thus there is no need for the SDH NEs to

deal with client-specific signal processing.

Figure 2.50 Unidirectional linear 1+1 multiplex section protection. (ITU-T Recommendation G.841, “Types and characteristics of SDH network protection architectures,” ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)

2.5.2 Path Protection

In Section 2.3.4 it was already explained that subnetwork connection protection (SNCP), as opposed to trail protection, currently relies on unidirectional linear 1+1 protection. As Figure 2.52 illustrates, this means that the recovery head end (RHE) bridges/broadcasts the signal onto two distinct paths (here, RHE = node A), whereas the recovery tail end (RTE) selects the best copy it receives via both paths (here, RTE = node D), based on the supervisory processes described in Section 2.2.3 and the failure notification and propagation processes described in Section 2.3 (and when adopting the revertive mode of operation, a wait-to-restore (WTR) period is applied to prevent frequent protection switching actions).
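The RTE behavior just described, including the WTR period in revertive mode, can be sketched as a small state machine. In this Python sketch the class `RevertiveSelector` and its parameters are hypothetical; the health of each received copy is assumed to be given as a boolean (in reality it results from the supervisory and failure notification processes), and `wtr_s` is an illustrative value, not a standardized one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RevertiveSelector:
    """Sketch of an SNCP tail-end (RTE) selector in revertive mode."""
    wtr_s: float = 300.0            # illustrative wait-to-restore period
    selected: str = "working"
    wtr_start: Optional[float] = None

    def update(self, now: float, working_ok: bool, protection_ok: bool) -> str:
        if self.selected == "working":
            if not working_ok and protection_ok:
                self.selected = "protection"   # select the other copy
        elif not working_ok:
            self.wtr_start = None              # working copy still bad: no WTR
        elif not protection_ok:
            # Assumption: if the selected copy fails while the working copy
            # is fine, revert immediately instead of waiting for the WTR.
            self.selected, self.wtr_start = "working", None
        elif self.wtr_start is None:
            self.wtr_start = now               # working copy recovered: start WTR
        elif now - self.wtr_start >= self.wtr_s:
            self.selected, self.wtr_start = "working", None  # WTR expired: revert
        return self.selected


sel = RevertiveSelector(wtr_s=300.0)
for t, w_ok, p_ok in [(0, True, True), (1, False, True), (2, True, True),
                      (200, True, True), (302, True, True)]:
    # prints: working, protection, protection, protection, working
    print(t, sel.update(t, w_ok, p_ok))
```

Restarting the WTR whenever the working copy fails again is what prevents the frequent switching actions mentioned in the text.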

The advantage of SNCP is that it is typically applied in the path layer (VC-n signals) instead of the section layers. Thus, SNCP can also protect against node failures, for example, against outages of node B or node C in Figure 2.52. This would not be possible when adopting linear MSP: here, linear MSP would be able to protect only the spans between nodes A and B, between nodes B and C, and between nodes C and D, but any outage of node B or C itself would make the path from node A to node D unavailable. Remember also from Section 2.3.4 that SNCP need not protect a connection completely from ingress to egress. For instance, nodes A and D in Figure 2.52 do not necessarily terminate the protected connection.

Figure 2.51 STM-1 linear 1:N MSP (here, N = 4) to protect tributary ports to/from client equipment.

Figure 2.52 End-to-end subnetwork connection protection (only one direction shown). (ITU-T Recommendation G.841, “Types and characteristics of SDH network protection architectures,” ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)

One drawback of applying SNCP on an end-to-end basis in large-scale networks is that both paths onto which the signal is bridged/broadcast in the RHE may fail simultaneously, even when they are routed completely physically disjoint from each other. For example, the failure scenario considered in Figure 2.53 would make the connection protected as illustrated in Figure 2.52 unavailable. Of course, one could divide the network into subnetworks and adopt SNCP inside each individual subnetwork. However, in that case the interconnection of the subnetworks may become a concern from an availability point of view. As Figure 2.53 illustrates, drop and continue (D&C) can be adopted to overcome this problem. Note that the routing (only one direction shown) is similar to the one illustrated in Figures 2.38 and 2.39 in Section 2.4.4 (indeed, SNCP Rings simply apply SNCP on a connection-per-connection basis inside a ring network, and nothing in the mechanism is specific to rings). As Figure 2.53 shows, there is no need to explicitly divide the network into subnetworks; one can apply SNCP enhanced with D&C to build a kind of “ladder” network on a connection-per-connection basis to increase the end-to-end availability.

Finally, Figure 2.54 illustrates that linear M:N or 1:N (here, 1:3) protection is also applicable at the path level instead of the section level. In the figure, one VC-n path protects against the failure of one out of the three working VC-n paths. The figure also shows that the protection/backup VC-n path can be used for the transport of extra traffic, but when it is needed for protection, that extra traffic will be preempted. The main difference with linear MSP is that it is applied not on a span-per-span basis but on an end-to-end basis. A drawback of linear 1:N path protection is that the VC-n trail needs to be terminated in the RHE and RTE (thus, ingress and egress, respectively), so these nodes must be capable of processing client-layer information signals. Therefore, the ports to which the client network equipment connects need not necessarily be SDH compliant, because the SDH edge network elements process the client-layer signal anyway. Nevertheless, linear M:N or 1:N path protection might be worth considering, because it also covers failures of intermediate nodes (which is not the case for linear MSP, because intermediate nodes terminate the multiplex sections [MSs]) and it outperforms SNCP in terms of capacity efficiency (the additional protection/backup capacity amounts to only 33% of the working capacity in Figure 2.54, instead of the 100% needed when adopting SNCP). Of course, a drawback of adopting 1:N instead of 1+1 path protection is that the protection switching actions in the RHE and RTE must be coordinated by means of an APS signaling protocol, and thus the protection switching completion time will be significantly longer (this is especially true in geographically large networks because of the long propagation delays).

Figure 2.53 Subnetwork connection protection with drop and continue mechanism to increase end-to-end availability (only one direction shown). (ITU-T Recommendation G.842, “Interworking of SDH network protection architectures,” ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.)

Figure 2.54 Linear 1:N path protection (here, N = 3) with support of extra traffic (only one direction shown). (ITU-T Recommendation G.841, “Types and characteristics of SDH network protection architectures,” ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.)

Of course, with linear 1:N path protection one should take care to route the 1+N paths disjoint from each other. The (node) connectivity for a particular node pair is defined as the maximum number of (node-)disjoint paths that exist between both nodes. To derive the node connectivity, the network topology needs to be translated into a dual-network representation by means of the following steps:

. Each node is represented by an “in” and an “out” vertex, interconnected by a directed edge from the “in” to the “out” vertex.

. Each link is represented by two directed edges (one per direction), each leaving its source node at the “out” vertex and entering its destination node at the “in” vertex.

. Each edge is assigned a single unit of capacity.
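The dual-network construction in the list above, combined with the maximum-flow computation described in the next paragraph, can be sketched in Python. Edmonds-Karp is used here as one standard max-flow algorithm (the text does not prescribe a particular one), and the two topologies are hypothetical examples, not the ones of Figure 2.55.

```python
from collections import defaultdict, deque


def split_graph(links):
    """Dual network: node n becomes (n,'in') -> (n,'out'); unit capacities."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in links:
        for n in (u, v):
            cap[(n, "in")][(n, "out")] = 1
        cap[(u, "out")][(v, "in")] = 1  # one directed edge per link direction
        cap[(v, "out")][(u, "in")] = 1
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= push
            residual[b][a] += push
        flow += push


def node_connectivity(links, src, dst):
    # Max flow from the source's "out" vertex to the destination's "in" vertex.
    return max_flow(split_graph(links), (src, "out"), (dst, "in"))


RING = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "F"), ("F", "E"), ("E", "A"), ("B", "E")]
BOWTIE = [("A", "B"), ("A", "C"), ("B", "X"), ("C", "X"),
          ("X", "D"), ("X", "E"), ("D", "F"), ("E", "F")]
print(node_connectivity(RING, "A", "D"))    # 2
print(node_connectivity(BOWTIE, "A", "F"))  # 1: every path crosses node X
```

The unit capacity on each internal in-to-out edge is what turns an edge-disjoint flow into a node-disjoint one: no node can carry more than one unit.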

The maximal flow that can be set up through this dual-network representation from the “out” vertex in the source node to the “in” vertex in the destination node determines the node connectivity of the network between source and destination node. Assuming that 1+N is less than or equal to the connectivity between source and destination nodes, supplying 1+N capacity units at the “out” vertex of the source node, demanding 1+N capacity units at the “in” vertex of the destination node, and solving a standard minimum cost flow problem will determine the cheapest¹⁶ mutually disjoint routing of the 1+N paths needed for linear 1:N path protection. Note also that supplying/demanding two capacity units will likewise determine the cheapest routing of both paths in SNCP protection; this is illustrated in Figure 2.55.

(Figure 2.55 content: in the original topology, each link is labeled with its cost/length; in the dual network, each directed edge is labeled [cost, capacity], with [0,1] on the internal in-to-out node edges and [1,1] or [2,1] on the link edges; supply = 2 units at the source, demand = 2 units at the destination; highlighted edges carry one capacity unit of the flow.)

Figure 2.55 Illustration of the dual-network representation for calculating disjoint paths.

¹⁶ Of course, “cheapest” refers to a cost assigned to each directed edge for carrying one unit of capacity (or, equivalently, for routing one capacity unit over a network link or through a node). This cost thus abstracts from the granularity of the network equipment.
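The minimum cost flow formulation can be sketched in Python as successive shortest paths on the dual network, using Bellman-Ford because residual reverse edges carry negative costs. This is one standard way to solve the problem, not necessarily the solver the authors used; the function `cheapest_disjoint_paths` and the topology `TRAP` are hypothetical. The topology is chosen so that routing the single shortest path first (S-A-B-T) and then removing its nodes leaves no second path, whereas the flow formulation still finds the cheapest disjoint pair.

```python
import math
from collections import defaultdict


def cheapest_disjoint_paths(links, src, dst, k):
    """Route k node-disjoint src->dst paths at minimum total cost.

    links: {(u, v): cost}, each undirected link listed once (assumption).
    Node edges get cost 0 and all edges capacity 1, as in the dual network.
    """
    cap, cost, adj = {}, {}, defaultdict(list)

    def add_edge(a, b, c):
        cap[(a, b)], cost[(a, b)] = 1, c
        cap[(b, a)], cost[(b, a)] = 0, -c  # residual reverse edge
        adj[a].append(b)
        adj[b].append(a)

    forward = []
    for n in {n for link in links for n in link}:
        add_edge((n, "in"), (n, "out"), 0)
        forward.append(((n, "in"), (n, "out")))
    for (u, v), c in links.items():
        for a, b in ((u, v), (v, u)):
            add_edge((a, "out"), (b, "in"), c)
            forward.append(((a, "out"), (b, "in")))

    s, t, total = (src, "out"), (dst, "in"), 0
    for _ in range(k):  # one unit of supply/demand per iteration
        dist = {x: math.inf for x in adj}
        dist[s], parent = 0, {}
        for _ in range(len(adj) - 1):  # Bellman-Ford relaxation rounds
            changed = False
            for a in adj:
                if dist[a] == math.inf:
                    continue
                for b in adj[a]:
                    if cap[(a, b)] > 0 and dist[a] + cost[(a, b)] < dist[b]:
                        dist[b], parent[b], changed = dist[a] + cost[(a, b)], a, True
            if not changed:
                break
        if dist[t] == math.inf:
            raise ValueError(f"fewer than {k} node-disjoint paths")
        b = t
        while b != s:  # push one capacity unit along the shortest residual path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        total += dist[t]

    nxt = defaultdict(list)  # saturated forward edges carry one unit of flow
    for a, b in forward:
        if cap[(a, b)] == 0:
            nxt[a].append(b)
    paths = []
    for node in nxt[s]:
        path = [src]
        while True:
            path.append(node[0])
            if node == t:
                break
            node = nxt[(node[0], "out")].pop()
        paths.append(path)
    return total, paths


TRAP = {("S", "A"): 1, ("A", "B"): 1, ("B", "T"): 1, ("S", "B"): 3, ("A", "T"): 3}
print(cheapest_disjoint_paths(TRAP, "S", "T", 1))  # (3, [['S', 'A', 'B', 'T']])
print(cheapest_disjoint_paths(TRAP, "S", "T", 2))  # (8, [['S', 'A', 'T'], ['S', 'B', 'T']])
```

Supplying two units corresponds to the SNCP case of Figure 2.55; supplying 1+N units gives the routing for linear 1:N path protection.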



2.5.3 Summary

. Linear multiplex section protection (MSP): Span protection (thus, excludes

node failures); protection against line card failures; often used for protect-

ing tributary ports (to client network equipment).

. Linear path protection: Often as subnetwork connection protection (SNCP), thus, unidirectional 1+1 protection; D&C increases end-to-end availability of SNCP; linear M:N or 1:N path protection can improve the capacity efficiency significantly but requires the RHE and RTE to be capable of processing client layer signals.

2.6 Restoration

The goal of this section is to investigate more flexible recovery mechanisms. Section

2.6.1 compares protection versus restoration techniques, and Section 2.6.2 summar-

izes the main conclusions.


In the previous sections, we have discussed different protection strategies based on

the Automatic Protection Switching (APS) protocol. All these strategies rely on

preestablished protection/backup resources for specific working resources. Because configuration management sets up all these working and protection/backup resources in advance, a light distributed APS protocol can, in the case of a failure, autonomously switch over from the working to the protection/backup resources (without any direct involvement of the network management system [NMS]) within a very short time frame. More precisely,

all APS-based protection techniques aim to achieve protection switching times on

the order of 50 or 60 ms. Note that even with a light APS protocol, this objective

may be difficult to achieve in (geographically) very large networks, simply because

of the significant propagation delays. For example, in Section 2.4.1, we mentioned

that the protection switch completion time in an MS-SP Ring containing 16 nodes

interconnected by links of 100 km can take up to

$$(2+1)\times\left[(16-1)\times(0.5+3\times 0.125)\right] = 3\times(15\times 0.875) = 3\times 13.125 = 39.375\ \text{ms}.$$

Links of 200 km instead of 100 km (or, thus, a ring of 3200 km instead of 1600 km) would result in a value of

$$(2+1)\times\left[(16-1)\times(1+3\times 0.125)\right] = 3\times(15\times 1.375) = 3\times 20.625 = 61.875\ \text{ms}.$$

However, remember also that these values exclude the time needed for the failure to be detected (and propagated) and assume an ideal case in which the time needed to process the APS requests and to act accordingly can be neglected.
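Both worked values can be reproduced by evaluating the expression for arbitrary ring sizes and link lengths. In this Python sketch the function and parameter names are hypothetical; the propagation delay of 0.005 ms/km is inferred from the text's 100 km → 0.5 ms, while the per-hop term 3 × 0.125 ms and the leading factor 2 + 1 are simply copied from the expression above. Like the text, it excludes detection, failure propagation, and processing times.

```python
def msspring_switch_time_ms(nodes: int, link_km: float,
                            prop_ms_per_km: float = 0.005,
                            per_hop_term_ms: float = 3 * 0.125,
                            factor: int = 2 + 1) -> float:
    """Evaluate factor * [(nodes - 1) * (link propagation + per-hop term)] in ms.

    prop_ms_per_km is inferred from 100 km -> 0.5 ms; per_hop_term_ms and
    factor are taken verbatim from the expression in the text.
    """
    hops = nodes - 1
    return factor * hops * (link_km * prop_ms_per_km + per_hop_term_ms)


for km in (100, 200):
    t = msspring_switch_time_ms(16, km)
    # prints 39.375 ms (True) for 100 km and 61.875 ms (False) for 200 km
    print(f"{km} km links: {t:.3f} ms, within 50 ms: {t <= 50}")
```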

Of course, having a preestablished standby protection/backup resource for each working resource, or for each small group of working resources, is not optimal from a capacity (and thus cost)

efficiency perspective. By setting up connections at the time of a failure along the

alternative path, the spare capacity can be used more efficiently. The process of



setting up the connections along the alternative path at the time of the failure will

typically significantly slow down the recovery cycle.

One can distinguish mainly between two approaches for setting up these

connections along the alternative route at the time of a failure. In the first approach

the whole recovery process is typically centralized in the central network manage-

ment system (NMS). The fault management process, as described in Figure 2.11 of

Section 2.3.1, will notify the NMS. Clearly, considerable time may be lost in

the management reporting functions (e.g., because of the fault correlation filter f3).

Once the NMS is aware of the fault, the configuration management process can

start setting up the connections along an alternative route. Configuration manage-

ment is typically designed to be robust instead of extremely fast. The fact that many

connections may need to be rerouted at the time of a failure will typically imply an

extra burden on the configuration management process. Computing the (alterna-

tive) routes along which the connections need to be set up at the time of a failure can

be done either in advance (to speed the recovery process) or at the time of a failure.

In the second approach, a distributed protocol suite (e.g., including a distributed routing protocol) may be responsible for signaling the setup of connections along the alternative paths if a failure occurs. In such a distributed approach, too, paths can be precomputed or computed in real time at the time of the failure. Note also that such a distributed approach does not necessarily imply that the alternative routes are computed in a distributed fashion; it is also possible to keep the path computation process centralized. To date, distributed recovery strategies have not been standardized for SDH networks. However, this

may change with the introduction of a standardized distributed control plane in

Automatically Switched Transport Networks (ASTNs), most likely based on

Generalized Multi-Protocol Label Switching (G-MPLS) protocol stacks (as

described in Chapter 6).

The most flexible recovery strategy is the one that computes the alternative

routes and possibly also some working routes at the time of a failure based on the

actual status of the network. Because nothing is precomputed, this will be one of the

slowest recovery strategies. In accordance with Chapter 1 this strategy is classified as

restoration. As with protection strategies, one can distinguish between global and local recovery strategies. In path restoration, alternative paths are computed on an end-to-end basis, so each affected connection can (but need not) be assigned a route that is completely distinct from the working route, whereas in link or node restoration all connections transiting the nodes adjacent to a failing link or node are rerouted only between these nodes. Path restoration is typically more capacity (and thus cost) efficient than link restoration because its global nature allows spreading the alternative routes over the entire network and allows finding more optimal routes (a shortest path plus a local detour can become longer than a second shortest path between the endpoints). For example (see the case study in Section 3.6.4 in Chapter 3), in some networks choosing path restoration instead of link or node restoration can save up to 10% of the amount of capacity needed in the network, whereas choosing path restoration instead of dedicated path protection (like SNCP) can result in capacity savings of up to 30%.
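The remark that a shortest path plus a local detour can be longer than a second shortest path can be checked on a small hypothetical topology. The Python sketch below uses plain Dijkstra; the topology and link lengths are invented for illustration. It compares link restoration, which splices a detour between the nodes adjacent to the failed link into the working route, with path restoration, which recomputes the route end to end. It compares route lengths only, not the network-wide capacity savings quoted above.

```python
import heapq


def build_graph(edges):
    """Undirected weighted graph as {node: {neighbor: length}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_path(graph, src, dst, failed=frozenset()):
    """Dijkstra; links in `failed` (frozensets of endpoints) are skipped."""
    dist, prev, done = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            if frozenset((u, v)) in failed:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def route_length(graph, route):
    return sum(graph[u][v] for u, v in zip(route, route[1:]))


EDGES = [("S", "A", 1), ("A", "B", 1), ("B", "T", 1),  # working route S-A-B-T
         ("A", "X", 2), ("X", "B", 2),                  # local detour around A-B
         ("S", "Y", 2), ("Y", "T", 2)]                  # second end-to-end route
graph = build_graph(EDGES)
working, _ = shortest_path(graph, "S", "T")
failed = {frozenset(("A", "B"))}

# Link restoration: reroute only between the nodes adjacent to the failed link.
i, j = working.index("A"), working.index("B")
detour, _ = shortest_path(graph, "A", "B", failed)
link_route = working[:i] + detour + working[j + 1:]

# Path restoration: recompute the whole route end to end.
path_route, path_len = shortest_path(graph, "S", "T", failed)

print(working)                                        # ['S', 'A', 'B', 'T']
print(link_route, route_length(graph, link_route))    # ['S', 'A', 'X', 'B', 'T'] 6
print(path_route, path_len)                           # ['S', 'Y', 'T'] 4
```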



The distinction between protection and restoration is not very well defined. For

example, a strategy that precomputes a backup route disjoint from the working

route for each connection and in which affected connections still need to be set up

along the backup path at the time of a failure is sometimes called path restoration or

shared path protection. The drawback of recovery strategies based on precomputed

alternative paths is that such paths can be precomputed only for a limited number of

expected failure scenarios, whereas strategies that can compute the (alternative)

routes at the time of a failure are much more flexible in the sense that they can take

into account the actual status of the network, even in the case of unexpected failure

scenarios. Thus, for example, in shared path protection, a double failure affecting

both working and backup routes will still result in the connections remaining unavail-

able. In summary, setting up connections at the time of a failure typically implies

better capacity efficiency, and route computation at the time of a failure implies

a better compromise between capacity efficiency and failure coverage.

2.6.2 Summary

. APS-based protection techniques: Preestablished backup resources → very fast protection switch completion times on the order of 50 or 60 ms.

. Set up connections along the (alternative) route at the time of a failure:

Slower, but better capacity efficiency; distributed or centralized.

. Route computation at the time of a failure: Implies slowest recovery; better

compromise between capacity efficiency and failure coverage; distributed or

centralized.

. Link versus path restoration: Path restoration typically more capacity effi-

cient.

2.7 Case Study

In this section an extensive practical case study is presented for a realistic network

scenario. The goal is to compare three protection strategies from a cost and capacity

perspective: pure end-to-end SNCP protection, pure MS-SP Ring–based protection,

and a mix of end-to-end SNCP and MS-SP Ring protection. The case study presented

in this section is part of a larger study [Col02], [Ari01], [Ari98], [Str00].

First, we list the assumptions of the case study: the network scenario, the node

architectures, and the different protection strategies. Next, the objectives of the case

study are highlighted, and the proposed design and evaluation methodologies are

described. Finally, we present the results of this case study, before the major

conclusions are recapitulated.

Assumptions: Network Scenario, Node Configurations, and Protection Strategies

In this case study a pan-European SDH-based carriers’ carrier network (a network

providing transport services to other carrier networks like PSTN/ISDN and ISP



networks) is considered. This network interconnects 15 European cities by means of

19 intercity links. The sum of the length of all these links equals 4954 km. Major

cities host two points-of-presence (PoPs) over which the client traffic is distributed

evenly to increase the network reliability. The entire network contains 25 nodes, of

which 5 are not used as PoP but only as flexibility points (thus, nodes in which

traffic only transits the node but is not added/dropped to client network equip-

ment). Eight nodes have a node degree of 3 (the number of links incident to a node),

whereas all remaining 17 nodes have a node degree of 2, resulting in an average

node degree of 2.32.
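The average node degree quoted above follows directly from the degree distribution. A quick check (the variable names are ours):

```python
from fractions import Fraction

# Degree distribution from the text: 8 nodes of degree 3, 17 nodes of degree 2.
degree_counts = {3: 8, 2: 17}

nodes = sum(degree_counts.values())                            # 25
degree_sum = sum(deg * n for deg, n in degree_counts.items())  # 58
average_degree = Fraction(degree_sum, nodes)                   # 58/25
print(nodes, degree_sum, float(average_degree))
```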

The traffic forecast considered in this case study is specified in terms of a

number of E1s17 (≈ 2 Mbps) or VC-12s, E3s (≈ 34 Mbps) or VC-3s, and E4s (≈ 140 Mbps) or VC-4s. Very roughly speaking, one can say that the total traffic

volume (in Mbps) is distributed evenly over the three traffic components in the

forecasts. The total traffic volume is equivalent to an average of 30 Mbps per node

pair.

Based on the results of the broader study [Col02], [Ari01], we have decided to present only two possible node configuration architectures. Figures 2.56 and 2.57 illustrate the node architecture without DXC for the SNCP and MS-SP Ring protection strategy, respectively. In these figures the node is incident to two network links. Thanks to wavelength division multiplexing (WDM), only a single fiber pair (one fiber in each direction) is needed per network link. Each wavelength channel carries an STM-16 signal. In addition to the WDM multiplexers and demultiplexers, such an optical link also features booster amplifiers, inline amplifiers, preamplifiers, and transponders, costing 20%, 25%, 70%, and 15%, respectively, of the cost of a WDM (de)multiplexer. A more detailed description of optical networking equipment is given in Chapter 3.

[Figure: WDM Mux on each incident link | Access Part | Flexibility Part | LO MUX: E1 → VC-12, E3 → VC-3 | E4 → VC-4 | Transit VC-4 carrying E4 or E3 & E1 traffic at the end]

Figure 2.56 Node architecture without digital cross-connect for the subnetwork connection protection strategy.

17 Ex signals are PDH signals. For backward compatibility (remember from Section 2.2 that PDH is the predecessor of SDH), the SDH C-n containers are designed so that their capacity matches that of the PDH Ex signals.

As shown in Figures 2.56 and 2.57, each STM-16 wavelength channel enters

the node through one fiber pair and passes through one or more ADMs before

leaving the node through the other fiber pair; this way, the individual VC-4s in the

STM-16 signal can be accessed (added and/or dropped) in a relatively cost-efficient

way. Note that each ADM has 16 STM-1 tributary ports.

[Figure: WDM Mux on each incident link | LO MUX: E1 → VC-12, E3 → VC-3 | E4 → VC-4 | Transit VC-4 carrying E4 or E3 & E1 traffic at the end]

Figure 2.57 Node architecture without digital cross-connect for the multiplex section–shared protection ring protection strategy.

Both figures also show that LO MUXs (de)multiplex the low order traffic (VC-12/E1s and VC-3/E3s) into VC-4s at the edge of the network. These LO MUXs are directly connected to the STM-1 ADM tributary ports. Because this multiplexing and demultiplexing takes place only at the edge of the network, VC-4s carrying lower order traffic are not terminated in intermediate nodes; the lower order traffic therefore has to be groomed into the VC-4s on a per-destination basis (thus, the capacity will not always be used completely). For example, consider a PoP with a node degree of two that has to route an SNCP-protected VC-12/E1 to each of the other 19 PoPs. Then 19 VC-4s would leave the PoP in each direction (because the VC-4s are not terminated in intermediate nodes); that is, two STM-16 wavelength channels instead of one would be needed in each direction. However, in the other node architecture (based on a central DXC-4/3/1), VC-4s carrying LO traffic are terminated in intermediate nodes, so the LO traffic can be regroomed there. In this example, a single VC-4 (which can accommodate up to 63 VC-12s) in each direction is then enough to transport the 19 VC-12s, and only a fraction (one sixteenth) of one wavelength channel will be used in each direction.
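The grooming example above reduces to a few ceiling divisions. The sketch below counts, for one direction out of the PoP, the VC-4s and STM-16 wavelength channels needed with per-destination grooming (no DXC) and with regrooming in a central DXC-4/3/1. The function names are hypothetical, and the model deliberately ignores everything except the figures of 63 VC-12s per VC-4 and 16 VC-4s per STM-16.

```python
import math

VC12_PER_VC4 = 63   # a VC-4 can carry up to 63 VC-12s
VC4_PER_STM16 = 16  # an STM-16 wavelength channel carries 16 VC-4s

def vc4s_without_dxc(vc12s_per_destination):
    """No DXC: VC-4s are not terminated in transit nodes, so each destination
    needs its own VC-4(s), even when they are almost empty."""
    return sum(math.ceil(n / VC12_PER_VC4) for n in vc12s_per_destination if n > 0)

def vc4s_with_dxc(vc12s_per_destination):
    """Central DXC-4/3/1: VC-12s for different destinations can share VC-4s,
    because every LO-carrying VC-4 is terminated and regroomed at the next node."""
    return math.ceil(sum(vc12s_per_destination) / VC12_PER_VC4)

def stm16_channels(vc4s):
    return math.ceil(vc4s / VC4_PER_STM16)

demands = [1] * 19  # one VC-12 to each of the other 19 PoPs, in one direction

print(vc4s_without_dxc(demands), stm16_channels(vc4s_without_dxc(demands)))  # 19 VC-4s, 2 channels
print(vc4s_with_dxc(demands), stm16_channels(vc4s_with_dxc(demands)))        # 1 VC-4, 1 channel
```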

An important difference between Figure 2.56 and Figure 2.57 is that in the SNCP protection strategy, some ADMs are dedicated to access (ADMs through which the local customer traffic enters and exits the network) and some are dedicated to providing the necessary flexibility in the node to allow a transit VC-4 to enter the node on one wavelength and leave it on another wavelength, whereas the same ADMs can serve both functions in the MS-SP Ring protection strategy. As Figure 2.56 illustrates for the SNCP protection strategy, although the working and backup connections may enter and leave the node on different wavelength channels, only a single ADM in the node can be responsible for the SNCP operation (the bridge/select). If one ADM were to function as both access and flexibility ADM, it would have to be possible for the customer traffic to enter the network through one of its STM-1 tributary ports while a copy of the traffic is sent through another of its STM-1 tributary interfaces to another wavelength channel. Because ADMs typically do not support such a capability, it is necessary to have a dedicated access ADM performing the SNCP operation and another ADM for flexibility purposes on the same wavelength. Finally, note that at most one access ADM per wavelength channel is needed (all 16 tributary signals can be bridged in/selected from the west and east directions), whereas up to two flexibility ADMs may be needed per wavelength channel (in the worst case, all VC-4s from both the west and the east side need to be added from/dropped to another wavelength channel).

Figure 2.57 shows the node architecture without DXC for the MS-SP Ring

protection strategy. As mentioned earlier, no separation of access and flexibility

ADMs is needed. The reason is that the traffic is not duplicated in the MS-SP Ring

protection strategy,18 so there is also no risk that both duplicates enter and leave the

node on distinct wavelength channels/rings. Even when supporting D&C ring

interconnection, this risk does not exist. For local customer traffic, there is no

need for routing the traffic from one ring to another, because this access traffic

can be directly added on/dropped from the right ring/wavelength channel. Note

that because there is no need to separate access and flexibility functions into distinct

ADMs and the MS-SP Ring protocol operates in the two-fiber19 mode, a single

ADM per wavelength is needed (at most 50% of the 16 VC-4s, that is, 8 VC-4s, can be dropped from/added in both the west and the east direction).
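The ADM counts given for both node architectures follow from the 16 STM-1 tributary ports per ADM. The sketch below (function names are hypothetical) reproduces the worst cases: one access ADM and up to two flexibility ADMs per wavelength channel with SNCP, and a single ADM per wavelength channel with the two-fiber MS-SP Ring, where at most 8 VC-4s per side carry working traffic.

```python
import math

TRIBUTARY_PORTS_PER_ADM = 16  # STM-1 tributary ports per ADM, as stated in the text
MAX_WORKING_VC4_PER_SIDE = 8  # two-fiber STM-16 MS-SP Ring: half of the 16 VC-4s

def adms_for_ports(ports):
    """Number of ADMs needed to offer `ports` STM-1 tributary ports on one channel."""
    return math.ceil(ports / TRIBUTARY_PORTS_PER_ADM)

def sncp_flexibility_adms(handover_west, handover_east):
    """SNCP: every VC-4 handed over to another wavelength channel uses one port."""
    return adms_for_ports(handover_west + handover_east)

def ms_sp_ring_adms(dropped_west, dropped_east):
    """MS-SP Ring: one ADM combines access and flexibility for a wavelength channel."""
    if max(dropped_west, dropped_east) > MAX_WORKING_VC4_PER_SIDE:
        raise ValueError("only 8 VC-4s per side carry working traffic")
    return adms_for_ports(dropped_west + dropped_east)

print(adms_for_ports(16))             # SNCP access ADM, all 16 VC-4s bridged/selected
print(sncp_flexibility_adms(16, 16))  # SNCP flexibility ADMs, worst case
print(ms_sp_ring_adms(8, 8))          # MS-SP Ring ADMs, worst case
```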

18 In each STM-16 MS-SP Ring/wavelength channel, 50% of the capacity is dedicated as protection/backup capacity. Because this protection/backup capacity is intrinsically available on the MS-SP Rings, there is no need to route a duplicate of the VC-4 path through the network.

19 Because of the wavelength division multiplexing, two-wavelength mode (one wavelength in the clockwise and one in the counterclockwise direction) would be more appropriate as terminology here.

In Figures 2.56 and 2.57, all transit VC-4s (regardless of whether they carry E4 or E3 and/or E1 traffic) entering and leaving the node on different wavelength channels are routed directly from one (flexibility) ADM to another one on the other wavelength. In other words, the STM-1 tributary ports are interconnected hard-wired back-to-back (thus, by means of a fixed fiber interconnection). Because lower order traffic tends to be more variable (frequently requiring maintenance staff to go on site to modify the back-to-back interconnections) and there is a discrepancy between the average traffic volume per node pair and the capacity of a VC-4, the option to invest in a DXC-4/3/1 to cross-connect the lower order traffic was investigated. Note that such DXCs are quite expensive: it is assumed that a DXC with 56, 112, or 224 STM-1 ports costs 941%, 1647%, or 2824% of the cost of an ADM, respectively, whereas a single LO MUX costs only 29% of the cost of an ADM. Nevertheless, it was judged that the rare modifications needed for the E4 traffic do not justify investing in DXC equipment to eliminate the manual interventions.
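Reading all the DXC figures above as percentages of the ADM cost (our interpretation of the text), the larger DXCs are cheaper per port. The hypothetical helper below picks the smallest listed DXC that offers enough STM-1 ports and shows the relative cost per port.

```python
# Relative costs from the text, in % of the cost of one ADM (assumed common base).
DXC_COST = {56: 941, 112: 1647, 224: 2824}  # STM-1 ports -> relative cost
LO_MUX_COST = 29

def smallest_dxc(stm1_ports_needed):
    """Return (ports, cost) of the smallest listed DXC-4/3/1 with enough STM-1 ports."""
    for ports in sorted(DXC_COST):
        if ports >= stm1_ports_needed:
            return ports, DXC_COST[ports]
    raise ValueError("more STM-1 ports needed than the largest listed DXC offers")

cost_per_port = {ports: round(cost / ports, 1) for ports, cost in DXC_COST.items()}
print(cost_per_port)                # {56: 16.8, 112: 14.7, 224: 12.6}
print(smallest_dxc(60))             # (112, 1647)
print(DXC_COST[56] // LO_MUX_COST)  # 32: the smallest DXC costs about as much as 32 LO MUXs
```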

This leads to the node architecture with DXC presented in Figures 2.58 and 2.59. Nothing changes for the VC-4s carrying E4 traffic: access is done directly through the (dedicated access) ADMs, and the back-to-back interconnection of the ADM STM-1 tributary ports remains hard-wired. For the lower order traffic, a central DXC-4/3/1 is installed. This DXC connects to multiple ADMs on different wavelength channels through STM-1 interfaces. Customers of lower order traffic access the network directly through this DXC. To improve the capacity efficiency as much as possible, the DXC terminates all VC-4s entering the node and carrying LO traffic.

[Figure: WDM Mux on each incident link | Access Part | Flexibility Part | E4 → VC-4 | E1 → VC-12, E3 → VC-3 | DXC-4/3/1 with VC-4/STM-1 ports | Transit VC-4 carrying E4 traffic at the end | Transit VC-3 or VC-12]

Figure 2.58 Node architecture with digital cross-connect for the subnetwork connection protection strategy.

As Figure 2.58 illustrates, the SNCP operation to protect the E4 traffic is still

the responsibility of the dedicated access ADMs. However, the figure also shows

that the DXC-4/3/1 becomes responsible for the SNCP protection of the lower

order traffic. Note that in this case the SNCP protection is done at the lower order

path level (thus, VC-12 or VC-3 level), whereas in the other node architecture, the

SNCP protection is done at the VC-4 level.

Figure 2.59 illustrates the node architecture with DXC for the MS-SP Ring protection strategy. Once again, access of E4 traffic is done directly through the ADMs, whereas the access of E1 and E3 traffic passes through the DXC-4/3/1. It is important to mention that each ADM and each DXC-4/3/1 becomes a single point of failure for the LO traffic in the MS-SP Ring protection strategy; more precisely, the MS-SP Ring will recover the lower order traffic only when a link between two network nodes breaks. This is the result of the assumption that all VC-4s carrying lower order traffic entering the node are terminated by the DXC-4/3/1. In other words, a VC-4 carrying lower order traffic leaves the DXC in one node, enters a wavelength channel through the appropriate ADM, continues on a WDM link to an adjacent node where it leaves the wavelength channel through an ADM, and is then terminated in the DXC in that adjacent node. Remember from Figure 2.27 in Section 2.4.1 that the MS-SP Ring sublayer bridges/switches AU groups in the case of a failure. Thus, although a lower order path may reenter the same STM-16 ring/wavelength channel, it cannot be recovered if the ADM fails, because the corresponding VC-4s are added/dropped in the ADM and terminated in the node, so the MS-SP Ring protection of these VC-4s fails. Because only the ADMs, and not the DXC-4/3/1, participate in the MS-SP Ring protocol, a failure of the DXC will not even trigger the MS-SP Ring protection. For the same reasons, this node architecture does not support D&C interconnection of the STM-16 rings.

[Figure: WDM Mux on each incident link | E4 → VC-4 | E1 → VC-12, E3 → VC-3 | DXC-4/3/1 with VC-4/STM-1 ports | Transit VC-4 carrying E4 traffic at the end | Transit VC-3 or VC-12]

Figure 2.59 Node architecture with digital cross-connect for the multiplex section–shared protection ring strategy.

The general node architecture framework is illustrated in Figure 2.60. Figures

2.56 through 2.59 consider nodes having a node degree of 2. As mentioned earlier

and as illustrated in Figure 2.60, nodes can also have a node degree of 3; the node

architectures presented earlier for degree-2 nodes are a special case of the general

architecture for degree-3 nodes.

[Figure: WDM Mux on each of the three incident links | ADMs labeled SNCP: Access, SNCP: Flexibility, MS-SP Ring | Cross-Connecting: E4 Traffic always hard-wired; E3 & E1 Traffic hard-wired or DXC | Local Access Ports: E4, E3 and E1]

Figure 2.60 General node architecture.

As explained earlier, the incident fiber pairs are terminated by WDM (de)multiplexers. Each wavelength channel enters the node via one fiber pair and passes through a number of ADMs before leaving the node on another fiber pair; the figure shows that this remains true even in degree-3 nodes. The only exception is that at most one wavelength channel does not continue on another fiber pair; this happens in the end-to-end SNCP protection strategy when the sum of the numbers of required wavelength channels on the three incident links is odd. The figure also shows that each wavelength channel is dedicated to the transport of either SNCP- or MS-SP Ring–protected traffic only, but both types of wavelength channel can coexist in the hybrid SNCP/MS-SP Ring protection strategy. As illustrated earlier, the figure also shows that ADMs have to be dedicated to either access or flexibility purposes in the SNCP protection strategy, whereas this is not true in the MS-SP Ring protection strategy.
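The "at most one" exception can be checked with a small counting argument: every wavelength channel that passes through a degree-3 node uses one channel on each of two different incident links. The sketch below (hypothetical helper names) computes how many channel ends are left over. It assumes, as the observation in the text implicitly does, that no incident link needs more channels than the other two together; under that condition the leftover is zero for an even total and one for an odd total.

```python
def through_channels(a, b, c):
    """Maximum number of wavelength channels passing through a degree-3 node.

    a, b, c: channels required on the three incident links. Each through channel
    pairs one channel on one link with one channel on a *different* link, so the
    count is limited both by half the total and by the two smaller links together.
    """
    total = a + b + c
    return min(total // 2, total - max(a, b, c))

def leftover_channels(a, b, c):
    """Channel ends that cannot continue on another fiber pair."""
    return a + b + c - 2 * through_channels(a, b, c)

print(leftover_channels(2, 2, 2))  # even total
print(leftover_channels(3, 2, 2))  # odd total
print(leftover_channels(5, 1, 1))  # one link dominates: the assumption is violated
```

The last call returns 3, which shows why the balance assumption matters for the "at most one" statement.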

All the STM-1 tributary ports of the ADMs have to be interconnected with

each other and with the local access ports to which local customer equipment is

connected. As explained earlier, all these interconnections are hard-wired, except

for the lower order traffic (thus, VC-12/E1s and/or VC-3/E3s) when a central DXC-4/3/1 that terminates all VC-4s that carry lower order traffic is available. Note that a single DXC-4/3/1 is also foreseen in the hybrid SNCP/MS-SP Ring protection strategy. Customers of lower order traffic connect directly to the central DXC or, when there is no such DXC, their traffic is groomed into VC-4s by LO MUXs (not shown in the figure) that connect directly to a tributary port of an ADM (or access ADM). The customers of higher order traffic always connect directly to a tributary port of an ADM (or access ADM).

As mentioned earlier, this case study aims at comparing three protection

strategies, as follows:

1. Pure SNCP protection: All traffic is duplicated in the source node and

routed along disjoint paths to the destination where the best copy is selected.

Because SNCP protection is performed on an end-to-end basis, no rings

have to be defined in the network. This implies that there is no constraint

requiring that the wavelength channels are organized as rings through the

network (on each link the number of needed wavelength channels can be calculated independently). Although the risk of double failures affecting both the working and backup routes cannot be ignored for some node pairs (because of the long distances), applying drop and continue to improve the availability, as illustrated in Figure 2.53 of Section 2.5.2, has not been considered in this case study.

2. Pure MS-SP Ring protection: Traffic is routed over a set of interconnected

STM-16 rings that cover all nodes in the network. Wavelength channels

must be organized so they form the rings in the network; this implies that not all capacity of each wavelength channel is used on each link.

More than one STM-16 wavelength channel may need to transit the same

network nodes, resulting in a stack of rings, as illustrated in Figure 2.47 in

Section 2.4.4. Of course, a fiber pair (or network link) will probably carry

more than one stack of rings; the different stacks guarantee geographical

coverage. As explained earlier, the node architecture with DXC does not allow D&C ring interconnection; thus, lower order traffic cannot recover from ADM or DXC failures, but organizing the network as a set of interconnected rings allows different rings to recover simultaneously from different single link failures. In the node architecture without DXC, D&C and single-gateway ring interconnection options are considered.

3. Hybrid SNCP/MS-SP Ring protection: As explained earlier, it is possible to

have part of the wavelength channels dedicated to SNCP-protected traffic



and the other wavelength channels dedicated to MS-SP Ring–protected traffic.20 The only common equipment for the SNCP- and MS-SP Ring–protected traffic is the central DXC-4/3/1 in a node to cross-connect the lower order traffic. An important additional assumption is that no traffic is allowed to be protected partly by SNCP and partly by the MS-SP Rings: without making the routing process much more complicated, it would not be possible to avoid double protection (that is, one or both copies of an SNCP-protected signal being protected again by an MS-SP Ring).

Objective of the Case Study

In the previous section, two node architectures, one with and one without a central DXC-4/3/1, were presented, as well as three protection strategies: pure end-to-end SNCP protection, pure MS-SP Ring protection, and a hybrid SNCP/MS-SP Ring protection strategy.

The major objective of this case study is to compare the three protection

strategies with each other from a cost (and capacity) perspective. This case study

also aims at investigating the impact of the adopted node architecture in this

comparison. Finally, this case study also aims at studying the impact on the overall

network cost of different routing approaches (D&C ring interconnection or not and

balanced versus shortest path routing) with pure MS-SP Ring protection.

Proposed Network Design and Evaluation Process

The network design and evaluation methodology is split into four successive phases:

1. The traffic is routed over the network with respect to the considered protec-

tion strategy.

2. The required capacity is calculated.

3. The amount of equipment to transport this capacity is estimated.

4. The total network cost is derived.

The remainder of this section focuses mainly on the routing phase, because this

is the most relevant phase to understand the case study. As mentioned earlier, the

traffic is routed over the network before any capacity is dimensioned in the net-

work. In other words, the routing is not optimized (e.g., to minimize the network

capacity) but relies on the link lengths of the given physical topology, so each traffic

demand can be routed independently over this topology.

20 As mentioned in Section 2.4.1, some VC-4s on an MS-SP Ring can be supported as non-preemptible unprotected traffic (NUT). Therefore, one could think about routing SNCP-protected traffic through these NUT VC-4s. However, such an architecture was assumed to be too complex to be considered in this case study.

With SNCP, a connection is protected by a node-disjoint backup route. Therefore, the shortest cycle containing both endpoints is calculated (based on the methodology described at the end of Section 2.5.2). The shorter path (SP) between both endpoints along this shortest cycle (SC) is chosen as the working route and the other one as the backup route. Note that looking for the shortest cycle ensures that the sum of the lengths of the working and backup routes is minimal while guaranteeing that the working and backup routes are node-disjoint.
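The shortest cycle through both endpoints is the same as the pair of node-disjoint paths with minimum total length. We do not reproduce the Section 2.5.2 method here; instead, the sketch below uses a standard alternative: split every node into an "in" and an "out" copy joined by a unit-capacity arc, then push two units of flow along successive shortest augmenting paths (Bellman-Ford, because residual arcs have negative costs). The function name and the toy topology are hypothetical; link lengths must be positive, and at most one link per node pair is assumed.

```python
from collections import defaultdict

def disjoint_pair(links, s, t):
    """Return (working, backup): two node-disjoint s-t routes of minimum total
    length, the shorter one first; None if no such pair exists.

    links: iterable of (u, v, length) undirected links with positive lengths.
    """
    links = list(links)
    head, cap, cost = [], [], []   # arc arrays; arc i ^ 1 is the reverse of arc i
    out = defaultdict(list)        # residual-graph node -> indices of outgoing arcs

    def add_arc(u, v, capacity, length):
        for a, b, c, w in ((u, v, capacity, length), (v, u, 0, -length)):
            out[a].append(len(head))
            head.append(b)
            cap.append(c)
            cost.append(w)

    nodes = set()
    for u, v, w in links:
        nodes.update((u, v))
        add_arc((u, "out"), (v, "in"), 1, w)
        add_arc((v, "out"), (u, "in"), 1, w)
    for v in nodes:
        add_arc((v, "in"), (v, "out"), 1, 0)  # unit capacity => node-disjoint routes

    src, dst = (s, "out"), (t, "in")
    for _ in range(2):                        # augment one unit of flow, twice
        dist, via = {src: 0}, {}
        for _ in range(len(out)):             # Bellman-Ford over the residual graph
            changed = False
            for a in list(dist):
                for e in out[a]:
                    if cap[e] > 0 and dist[a] + cost[e] < dist.get(head[e], float("inf")):
                        dist[head[e]] = dist[a] + cost[e]
                        via[head[e]] = e
                        changed = True
            if not changed:
                break
        if dst not in dist:
            return None
        node = dst
        while node != src:                    # push one unit along the found path
            e = via[node]
            cap[e] -= 1
            cap[e ^ 1] += 1
            node = head[e ^ 1]

    succ = defaultdict(list)                  # forward arcs that carry flow
    for e in range(0, len(head), 2):
        if cap[e + 1] > 0:
            succ[head[e + 1]].append(head[e])
    routes = []
    for _ in range(2):
        node, route = src, [s]
        while node != dst:
            node = succ[node].pop()
            if node[1] == "in":
                route.append(node[0])
        routes.append(route)

    length = {frozenset((u, v)): w for u, v, w in links}
    routes.sort(key=lambda r: sum(length[frozenset(h)] for h in zip(r, r[1:])))
    return routes[0], routes[1]

links = [("s", "a", 1), ("a", "b", 1), ("b", "t", 1), ("s", "b", 3), ("a", "t", 4)]
working, backup = disjoint_pair(links, "s", "t")
print(working, backup)  # ['s', 'b', 't'] ['s', 'a', 't']
```

In this toy topology the unconstrained shortest path is s-a-b-t (length 3); removing its intermediate nodes disconnects s from t, so computing the shortest path first and then looking for a disjoint backup finds no backup at all. The shortest-cycle approach returns s-b-t (length 4) as working route and s-a-t (length 5) as backup route; note that this working route is longer than the unconstrained shortest path.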

In the MS-SP Ring case, the routing phase is split into two steps. First, the

traffic is routed over the given topology along a shortest path. Second, the set of

rings carrying the traffic as efficiently as possible is computed. Tabu search and

simulated annealing are applied for this optimization process [Ari96], [Ari97]. An

important constraint in this selection process is that the rings have to be intercon-

nected with each other via at least two gateway nodes, to allow D&C for the ring

interconnection.

In the hybrid SNCP/MS-SP Ring case, the routing phase is split into three

steps. The first two steps deal with the MS-SP Ring part. In contrast to the pure

MS-SP Ring case, first a set of rings is selected (step one) and then as much traffic as

possible is routed over this set of rings (step two). The ring selection is done

manually. Note also that the set of rings is not allowed to cover the whole network

(because otherwise all traffic would be routed over the rings, and thus no traffic

would be left after the first two steps to be routed via SNCP). In the second step,

the traffic that can be routed over the rings is routed over the rings. First interring

traffic (thus, demands which span at least two rings) is routed over the rings

along the shortest path. Afterwards intraring traffic is routed (over the shortest

ring going through both endpoints) so that the load on the highest loaded link is

minimized, resulting in the load on all links in the ring getting more balanced

[Car97], [Cos94], [Kar97]. Per ring, the demands are sorted by descending capacity

and routed one after the other over the ring. Finally, all the traffic that cannot be

routed over a chain of rings belongs to the SNCP part and is routed over the

meshed network, as explained earlier.
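The intraring balancing step can be sketched as a greedy heuristic that follows the description above: demands are sorted by descending capacity, and each one is sent clockwise or counterclockwise, whichever gives the lower maximum link load (ties go to the route with fewer hops). This is our reading of the procedure, not the algorithm of [Car97], [Cos94], or [Kar97]; ring positions, demand sizes in VC-4s, and all function names are hypothetical, and hop count stands in for link length in the shortest-path baseline.

```python
def ring_links(n, u, v, clockwise):
    """Indices of the ring links from node u to node v; link i joins nodes i and (i + 1) % n."""
    links, node = [], u
    while node != v:
        if clockwise:
            links.append(node)
            node = (node + 1) % n
        else:
            node = (node - 1) % n
            links.append(node)
    return links

def shortest_ring_loads(n, demands):
    """Baseline: every demand takes the direction with fewer hops."""
    load = [0] * n
    for u, v, size in demands:
        cw, ccw = ring_links(n, u, v, True), ring_links(n, u, v, False)
        for i in (cw if len(cw) <= len(ccw) else ccw):
            load[i] += size
    return load

def balanced_ring_loads(n, demands):
    """Greedy balancing: largest demands first, each in the direction that keeps
    the highest link load lowest."""
    load, routing = [0] * n, []
    for u, v, size in sorted(demands, key=lambda d: d[2], reverse=True):
        options = []
        for clockwise in (True, False):
            links = ring_links(n, u, v, clockwise)
            trial = load[:]
            for i in links:
                trial[i] += size
            options.append((max(trial), len(links), clockwise, trial))
        best = min(options, key=lambda o: (o[0], o[1]))
        load = best[3]
        routing.append(((u, v, size), "cw" if best[2] else "ccw"))
    return load, routing

demands = [(0, 2, 6), (1, 3, 6), (1, 2, 6)]  # (node, node, VC-4s) on a 6-node ring
balanced_load, routing = balanced_ring_loads(6, demands)
print(shortest_ring_loads(6, demands))  # [6, 18, 6, 0, 0, 0]
print(balanced_load)                    # [12, 12, 12, 6, 6, 6]
```

Here the greedy pass lowers the highest link load from 18 to 12 VC-4s by sending the last demand the long way round; as a greedy heuristic it carries no optimality guarantee.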

Once all traffic has been routed over the network, one can derive the amount of

capacity needed on the links and in the nodes. Note that the capacity dimensioning

relies on the considered node architecture, because with a DXC-4/3/1, all VC-4s

carrying lower order traffic and entering a node are terminated by that DXC,

whereas no transit VC-4s are terminated in the other case. Note that the capacity

for SNCP- and MS-SP Ring–protected traffic is dimensioned independently from

each other in the hybrid protection strategy. The capacity dimensioning will identify

the number of E1, E3, the number of corresponding VC-4s carrying those E1s

and E3s (which depends on the considered node architecture), and the number

of E4s routed over a link. The same information is determined for the amount of

traffic routed in a node between each pair of incident links. However, here also a

distinction needs to be made between transit traffic passing through the node and

the amount of access traffic coming from/sent to the local customers. Access traffic

routed between a pair of incident links means that both duplicates are routed over

these links in the case of SNCP-protected traffic or that it is routed over a ring that

is routed over these links. Out of this, the number of required wavelength channels

on each link or between each pair of incident links in a node can be derived. Note

that this will depend on the considered protection strategy: for SNCP-protected traffic this number can be calculated on a per-link or a per-node basis, whereas in

the MS-SP Ring protection strategy the number of stacked rings needs to be

determined per topological ring because the ring capacity is determined by the

link for which the ring carries the highest amount of traffic.
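The difference in how the two strategies are dimensioned can be sketched as follows. The helper names and loads are hypothetical; we assume loads are expressed in VC-4s per link and direction, that an STM-16 channel carries 16 VC-4s, and that, per footnote 18, only half of each STM-16 MS-SP Ring (8 VC-4s per span) carries working traffic. SNCP loads should include both copies of each demand; MS-SP Ring loads only the working traffic.

```python
import math

VC4_PER_STM16 = 16
WORKING_VC4_PER_RING_SPAN = 8  # two-fiber STM-16 MS-SP Ring: 50% kept as protection

def sncp_channels(link_loads):
    """SNCP: the number of wavelength channels is computed independently per link."""
    return {link: math.ceil(load / VC4_PER_STM16) for link, load in link_loads.items()}

def stacked_rings(ring_link_loads):
    """MS-SP Ring: each stacked ring spans every link of the topological ring,
    so the most heavily loaded link sets the stack height for the whole ring."""
    return math.ceil(max(ring_link_loads.values()) / WORKING_VC4_PER_RING_SPAN)

ring_loads = {"L1": 10, "L2": 3, "L3": 3, "L4": 2}  # working VC-4s on a 4-link ring
rings = stacked_rings(ring_loads)
print(rings, rings * len(ring_loads))  # 2 rings -> 2 channels on each of the 4 links

print(sncp_channels({"L1": 20, "L2": 5, "L3": 5, "L4": 4}))  # {'L1': 2, 'L2': 1, 'L3': 1, 'L4': 1}
```

A single heavily loaded link (L1 here) forces a second ring onto all four links of the topological ring, which illustrates the link-cost effect discussed in the cost comparison.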

Once the capacity needed in the network has been derived, the amount of

required equipment can be estimated. Despite the relatively detailed information

coming out of the capacity dimensioning phase, it is impossible to compute the exact required amount of each equipment type, and the methodology to obtain an acceptable estimate is complex and thus beyond the scope of this chapter.

A more detailed discussion on the adopted methodology in this equipment dimen-

sioning (and other) phases can be found in [Col02] and [Ari01].

Once the amount of each equipment type is estimated, the corresponding cost

can be calculated based on the relative cost figures mentioned in the previous

section.

Cost Comparison for Different Protection Strategies

Figure 2.61 compares the network cost for the pure end-to-end SNCP protection,

the pure MS-SP Ring protection, and the hybrid end-to-end SNCP/MS-SP Ring

protection strategies for the node architectures with and without a DXC-4/3/1.

For the node architecture with a central DXC-4/3/1, the SNCP protection strategy is significantly more expensive than the other strategies. As the figure shows, this is mainly a result of the significantly higher node cost in the SNCP protection strategy. This can be understood as follows: SNCP-protected lower order traffic is duplicated and thus transits many more (expensive) DXCs (say, twice as many, for simplicity), whereas this is not the case for MS-SP Ring–protected traffic. An additional consideration is that no drop and continue is allowed in this node architecture with central DXC. Thus, unless only link failures are important, the lower cost of the MS-SP Ring–protected traffic is paid for with drastically reduced overall availability.

[Figure: Comparison of Protection Strategies. Panels: Total Cost and Node Cost (0% to 110%) for NA with DXC-4/3/1 and NA without DXC-4/3/1; series: SNCP, MS-SP Ring, Hybrid SNCP/MS-SP Ring]

Figure 2.61 Relative network cost for the different protection strategies.

For the node architecture without a central DXC-4/3/1, the situation is reversed: the pure end-to-end SNCP protection strategy slightly outperforms the pure MS-SP Ring protection strategy. The higher cost of the MS-SP Ring protection strategy is due to its higher link cost. Indeed, the number of stacked rings is driven by the link that carries the largest amount of traffic, whereas in the SNCP protection strategy the number of wavelength channels can be calculated on a link-by-link basis. In addition, drop and continue ring interconnection has been considered. Thus, for a slightly higher cost, a better availability is achieved in the MS-SP Ring protection strategy. Note that, where possible, the ring selection process tried to prevent an overload of the links between the gateway nodes by selecting rings that have three or more nodes in common (see the issue raised in Figure 2.43 in Section 2.4.4).

An important conclusion from Figure 2.61 is that independent of whether the

pure SNCP protection strategy is cheaper or more expensive than the pure MS-SP

Ring protection strategy, it is always possible to outperform both protection

strategies by choosing a hybrid SNCP/MS-SP Ring protection strategy. Although the difference shown is not that large, one should not forget that the ring

selection process is done manually in the hybrid protection strategy, whereas this is

automated in the pure MS-SP Ring protection strategy. In other words, there is still

some room for improvements by automating this ring selection process in the

hybrid SNCP/MS-SP Ring protection strategy.

Figure 2.61 has considered drop and continue ring interconnection for the node architecture without DXC-4/3/1, whereas this is prevented in the node architecture with a central DXC-4/3/1. For the pure MS-SP Ring protection strategy combined with the node architecture without a central DXC-4/3/1, Figure 2.62 investigates the impact of drop and continue on the overall network cost. The figure confirms that drop and continue indeed results in a significant cost increase (approximately 10%). However, as explained previously, the traffic is first routed along the shortest path and then an appropriate set of rings is selected to accommodate this traffic. The figure shows that adding a step that balances the traffic on each individual ring yields much higher cost savings (while keeping the same overall availability) than giving up drop and continue ring interconnection (which affects the overall availability).

Summary

. Node architecture: SNCP protection requires dedicated access and flexibility

ADMs.



. Node architecture with a central DXC-4/3/1 (terminating all VC-4s carrying

lower order traffic and entering the node): Does not allow drop and continue

ring interconnections and leads to the ADMs and DXCs becoming single

points of failure (for lower order traffic).

. Protection strategies: It is possible to combine the advantages of pure

SNCP protection and pure MS-SP Ring protection strategies in a hybrid

SNCP/MS-SP Ring protection strategy with a better cost performance.

. Cost evaluation: Drop and continue ring interconnection has a significant

impact on the overall network cost, but balancing the traffic on the rings is

at least as important as the drop and continue from a cost perspective.

2.8 Conclusion

In Section 2.1 the concept of transmission/transport networks was introduced.

Transmission/transport networks are networks that can provide huge amounts of

capacity between nodes in client-layer networks in a flexible and cost-efficient way.

The modeling/structuring of such networks has been discussed. Transmission/transport networks are built from three types of atomic functions: connection functions, which provide the flexibility; trail termination functions, which provide the supervisory processes for verifying the integrity of the network connections; and adaptation functions, which adapt client-layer information so that it can be used in a network layer.

A brief overview of the SDH network technology was given in Section 2.2. SDH networks can be decomposed into four network layers: a regenerator section layer, a multiplex section layer, a higher order path layer, and a lower order path layer. The different types of network elements (terminal multiplexers, add/drop multiplexers, and digital cross-connects) and the frame format used on the interface between these network elements have also been discussed. While discussing the SDH frame format, special attention was paid to the part of the overhead that is needed for failure detection.

[Figure: Relative Network Cost for NA without DXC-4/3/1 in Case of MS-SP Ring; Link Cost and Node Cost (0% to 110%) for Shortest Path + D&C, Shortest Path without D&C, and Balanced Routing + D&C]

Figure 2.62 Impact of the routing in the multiplex section–shared protection ring protection strategy in the absence of DXC-4/3/1s.

Section 2.3 highlighted the operational aspects of automatic protection switching (APS) in SDH networks. More precisely, the failure notification and propagation process and the basics of the APS protocol have been presented. Failure notification and propagation are achieved by inserting an alarm indication signal (AIS) (an all-1s signal) in the downstream direction and a remote defect indication (RDI) signal in the upstream direction. With respect to APS, a distinction has to be made between trail protection and subnetwork connection protection (SNCP). Trail protection is realized by adding, in the trail endpoints, sublayer functionality that implements the APS protocol. The APS protocol is needed to coordinate protection switching actions in all the involved network elements. SNCP can protect just part of a network connection, but it has no access to the APS channels, which carry the APS protocol messages embedded in the path overhead. Therefore, SNCP relies on a permanent bridge in the upstream recovery head end (RHE), whereas the downstream recovery tail end (RTE) selects the best copy it receives (in this way, the APS signaling for coordinating the protection switching actions is avoided).
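The SNCP division of labor, a permanent bridge at the RHE and a purely local selector at the RTE, can be sketched in a few lines of Python. This is a minimal, hypothetical model: the names `PathStatus`, `bridge`, and `select_copy` are ours, the defect states are reduced to signal fail (e.g., AIS or loss of signal), signal degrade, and OK, and the hold-off and wait-to-restore timers of real selectors are left out.

```python
from enum import IntEnum


class PathStatus(IntEnum):
    """Simplified per-copy defect states; a higher value means a better signal."""
    SIGNAL_FAIL = 0     # e.g., AIS or loss of signal detected on this copy
    SIGNAL_DEGRADE = 1  # e.g., bit error rate above a degrade threshold
    OK = 2


def bridge(payload: str) -> dict[str, str]:
    """Permanent 1+1 bridge at the recovery head end (RHE): both paths carry the same payload."""
    return {"working": payload, "protection": payload}


def select_copy(working: PathStatus, protection: PathStatus, current: str = "working") -> str:
    """Selector at the recovery tail end (RTE), driven only by locally detected defects.

    No APS message is exchanged with the RHE. On a tie the current selection is kept
    (non-revertive behavior), so the selector does not toggle between equal copies.
    """
    if protection > working:
        return "protection"
    if working > protection:
        return "working"
    return current


copies = bridge("VC-12 payload")
choice = select_copy(PathStatus.SIGNAL_FAIL, PathStatus.OK)
print(choice, "->", copies[choice])  # protection -> VC-12 payload
```

Because the head end always transmits on both paths, the tail end never has to ask for a switch; that is exactly why no APS channel is needed.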

Another important part of this chapter described the various recovery strategies possible in SDH networks. Section 2.4 presented three protection ring types: multiplex section–shared protection rings (MS-SP Rings), multiplex section–dedicated protection rings (MS-DP Rings), and SNCP Rings. MS-SP Rings and MS-DP Rings are similar in the sense that they both rely on the nodes adjacent to a failure to loop back all traffic along the opposite side of the ring, whereas SNCP Rings protect paths on an end-to-end basis. Because the forward and backward directions of a bidirectional connection are routed along the same side of an MS-SP Ring, MS-SP Rings can profit from the spatial reuse concept: connections that do not overlap can reuse/share the same capacity on different segments of the ring. SNCP and MS-DP Rings cannot profit from this concept, because each connection occupies capacity on all links in the ring. Finally, to improve the overall availability of interring connections, dual-gateway interconnection schemes have been studied: the virtual ring and the drop and continue interconnection schemes.
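The spatial reuse argument can be made concrete with a small capacity count. The sketch below is an illustration under stated assumptions, not a ring-dimensioning method from this chapter: nodes are numbered 0 to n-1, span i joins node i and node (i + 1) mod n, each MS-SP demand is routed over the shorter side, and only working capacity is counted (the equal amount of protection capacity an MS-SP Ring reserves on each span is not modeled). All function names are hypothetical.

```python
def spans_clockwise(a: int, b: int, n: int) -> list[int]:
    """Spans crossed going clockwise from node a to node b; span i joins node i and node (i + 1) % n."""
    spans, node = [], a
    while node != b:
        spans.append(node)
        node = (node + 1) % n
    return spans


def shortest_side(a: int, b: int, n: int) -> list[int]:
    """Spans on the shorter side of the ring (clockwise on a tie)."""
    cw, ccw = spans_clockwise(a, b, n), spans_clockwise(b, a, n)
    return cw if len(cw) <= len(ccw) else ccw


def ms_sp_ring_load(demands: list[tuple[int, int, int]], n: int) -> list[int]:
    """Working capacity per span: both directions of a demand use only the spans on its route."""
    load = [0] * n
    for a, b, units in demands:
        for span in shortest_side(a, b, n):
            load[span] += units
    return load


def sncp_ring_load(demands: list[tuple[int, int, int]], n: int) -> list[int]:
    """SNCP ring: every demand occupies every span (working on one side, protection on the other)."""
    total = sum(units for _, _, units in demands)
    return [total] * n


# Four nodes, one unit of traffic between each pair of neighbors.
demands = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(ms_sp_ring_load(demands, 4))  # [1, 1, 1, 1]
print(sncp_ring_load(demands, 4))   # [4, 4, 4, 4]
```

With neighbor-to-neighbor traffic the four demands never overlap, so the MS-SP Ring carries them with one unit per span, while the SNCP Ring needs four units on every span.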

Strategies based on linear protection switching were presented in Section 2.5. Linear protection switching involves at most two network nodes (in contrast to the ring-based APS protocols). Whereas M:N linear protection switching is often applied at the multiplex section level (i.e., span protection), 1+1 SNCP is applied at the higher order or lower order path level.
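In M:N linear protection, N working channels share M protection channels, so at most M simultaneous failures can be recovered. A minimal sketch of that resource assignment follows; the function name and the priority rule (lower-numbered working channels win) are hypothetical choices for illustration, not the priority scheme of any particular APS specification.

```python
def assign_protection(failed: set[int], n: int, m: int) -> dict[int, int]:
    """Map failed working channels (numbered 1..n) onto the m shared protection channels.

    Hypothetical priority rule for illustration: when more than m working channels
    fail at once, lower-numbered working channels are protected first; the others
    remain unprotected until a protection channel becomes free.
    """
    if not failed <= set(range(1, n + 1)):
        raise ValueError("failed channels must be numbered 1..n")
    protected = sorted(failed)[:m]
    return {working: prot for prot, working in enumerate(protected, start=1)}


# 2:8 group, three working channels fail simultaneously.
print(assign_protection({7, 2, 5}, n=8, m=2))  # {2: 1, 5: 2}
```

Setting m=1 gives the 1:N case, where a single protection channel serves the highest-priority failed working channel.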

Finally, Section 2.6 highlighted opportunities for network recovery techniques (e.g., restoration instead of protection) that are more flexible than those based on an APS protocol, which switch very fast (targeting a protection switch completion time on the order of 50 or 60 ms) from the affected resources to dedicated, preestablished protection/backup resources. Signaling the establishment of the spare resources at the time of the failure slows down the recovery process but achieves a better capacity efficiency. Computing alternative paths in real time, at the time of the failure, instead of in advance allows a better compromise between capacity efficiency and failure coverage.

A practical case study was presented in Section 2.7. This case study illustrates

the advantages (from a network cost perspective) of having a hybrid SNCP/MS-SP

Ring protection strategy instead of a pure end-to-end SNCP or pure MS-SP Ring

protection strategy and highlights some issues with respect to providing protection

when considering practical node architectures.

2.9 Recommended Reference Work and Research-Related Topics

SDH/SONET has been a mature technology for more than a decade and is specified in many standardization documents. The ITU recommendations used throughout this chapter, as well as other recommendations, serve well as reference material for understanding every detail of this chapter.

The decomposition of SDH networks into path and section layers and into atomic functions (see Section 2.2.2) is specified in ITU recommendations G.803 [G803] and G.805 [G805]. The frame structure describing the interface between SDH network elements (see Sections 2.2.3 and 2.2.4) is specified in ITU recommendation G.707 [G707]. A more comprehensive overview of these network and frame structures can be found in [Sex92] and [Kar99].

A major part of this chapter is devoted to the detailed discussion in Sections 2.3.1 through 2.3.3 of the interworking of the atomic functions for fault detection, notification, and propagation. The characteristics of these atomic functions are crucial for fully understanding this discussion; they are specified in ITU recommendations G.783 [G783] and G.806 [G806] and in the ETSI document EN 300 417-1-1 [ETSI1].

The other major part of this chapter is devoted to the overview of the different SDH network recovery techniques in Sections 2.3.4, 2.4, 2.5, and 2.6. A specification of all these techniques is given in ITU recommendation G.841 [G841]. ITU recommendation G.842 [G842] and ETSI document TS 101 010 [ETSI2] are dedicated to the interconnection of SDH network recovery techniques (in particular protection rings, as described in Section 2.4.4). A more comprehensible overview of the SDH network recovery techniques can be found in [Sex92].

Taking into account that SDH/SONET is a mature technology and that it is similar to the optical transport network technologies, related research topics for those optical transport network technologies, as described in Chapter 3, will also apply to the SDH/SONET technology. It is necessary to investigate how the SDH network recovery techniques described in this chapter can be adopted in next-generation SDH networks featuring virtual concatenation (VC), the link capacity adjustment scheme (LCAS) [G7042], and/or the generic framing procedure (GFP) [G7041], [Her02].



CHAPTER 3

Optical Networks

Chapter 2 discussed the Synchronous Optical NETwork (SONET)/Synchronous Digital Hierarchy (SDH) layer of the network. The focus of this chapter is on the optical network layer. In current backbone networks, this layer lies underneath the SONET/SDH layer, and in the future it may even take over (a large part of) the functionality of the SONET/SDH layer. This chapter extends the recovery issues described in Chapter 2 for SONET/SDH to the optical layer of the backbone network.

First, a general introduction to optical networks is given. The evolution of the optical layer of the backbone network from a pure transmission-based layer with static point-to-point connections to a true managed optical networking layer with reconfigurable switching nodes is discussed. In addition, the equipment that enables the optical networking functionality is briefly described. The current standardization efforts on optical networks are highlighted in Sections 3.1 and 3.2. The last sections of this chapter elaborate on the different recovery mechanisms that can be applied in optical networks. To understand how a failure of the optical network equipment is detected and how the subsequent alarms raised by the various pieces of equipment that detect the fault are correlated and suppressed, the overhead (OH) of the Optical Transport Module (OTM) is discussed in detail in Section 3.3. Section 3.4 introduces the different recovery schemes that can be applied in Optical Transport Networks (OTNs). Recovery schemes in both ring-based networks (Section 3.5) and mesh-based networks (Section 3.6) are discussed and compared with one another (Section 3.7). Section 3.8 discusses network availability, a crucial performance factor of recovery schemes. Because a single fiber can transport 160 or more different wavelengths, each with a capacity of 10 gigabits per second (Gbps) or more, a single cable cut can affect a tremendous amount of traffic. Network availability is thus of utmost importance in the design of modern telecommunication networks. After a theoretical discussion of the calculation of the network availability, some factors that influence the availability are highlighted. In Section 3.9 some recent trends in research on optical network recovery are discussed. Finally, the summary and conclusions are formulated in Section 3.10.

We are greatly indebted to Sophie De Maesschalck, INTEC, Ghent University, for her exceptional contribution to the writing of Chapter 3.

3.1 Evolution of the Optical Network Layer

Although the optical layer of the backbone network is relatively new, it has already

lived through an impressive evolutionary path. In this first introductory section, the

evolutionary path from an optical network with point-to-point static connections to

an intelligent optical network with reconfigurable switching nodes is discussed.

3.1.1 Wavelength Division Multiplexing in the Point-to-Point Optical Network Layer

As the amount of voice and data traffic kept growing and the Internet, which turned out to generate vast amounts of traffic, came into the picture, more and more traffic needed to be conveyed over the network. Several solutions were available to tackle this increasing traffic demand. One could simply use more fibers, each transporting a single SONET/SDH signal. This is, however, not a very elegant or scalable solution, and it is a very expensive one if there is a shortage of fiber in the existing cable or duct infrastructure. Another approach would be to continually increase the bit rate (e.g., increase Time Division Multiplexing [TDM] rates above 40 Gbps), but again this is not a scalable solution. Besides, because of the dispersion effects of optical fiber, including chromatic dispersion (light at different wavelengths travels at different speeds) and polarization mode dispersion (PMD) (caused by imperfections in the fiber and also resulting in pulse broadening), increasing the bit rate is quite challenging.

Another way to proceed was to introduce Wavelength Division Multiplexing (WDM) into the network. With WDM (Figure 3.1), multiple channels, each carrying a different (e.g., SONET/SDH) signal, are transmitted at distinct wavelengths over a single optical fiber. The distinct wavelength channels are indicated in Figure 3.1 by color labels. In this way, the capacity of a single fiber is upgraded to a multiple of its original capacity. The capacity of an optical fiber is, thus, no longer limited by the bit rate of a single TDM signal, but by the number of wavelengths supported by the WDM system.

The principle of WDM is quite similar to that of Frequency Division Multiplexing (FDM): Several signals are transmitted using different carriers, each occupying a nonoverlapping part of the frequency spectrum. Most WDM systems currently use the wavelength region around 1550 nm, because this is one of the spectral regions where the signal attenuation reaches a local minimum. Besides attenuation, chromatic dispersion and PMD are a problem in optical fibers. The signals should be transported using wavelengths that are sufficiently far from each other to avoid interference. In contrast with chromatic dispersion, which can be compensated using, for example, nonzero dispersion shifted fiber or other dispersion compensating techniques, PMD is much harder to overcome. Other impairments also come into play, for example, nonlinear effects such as four-wave mixing (FWM). With FWM, three signals transported at different wavelengths interact and create a signal on a fourth wavelength, which may already be used to transport a real data signal.
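The FWM mechanism can be illustrated by listing where the mixing products fall. Three channels at frequencies f_i, f_j, and f_k generate a product at f_i + f_j - f_k (with i = j allowed, the degenerate two-signal case). The sketch below assumes integer frequency offsets in GHz and ignores product power entirely; it only checks which products land on an occupied channel. The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def fwm_products(channels: list[int]) -> dict[int, int]:
    """Count FWM mixing products f_i + f_j - f_k (k distinct from i and j) per frequency.

    Frequencies are distinct integer offsets in GHz; degenerate products (i == j) are included.
    """
    counts: dict[int, int] = {}
    for fi, fj in combinations_with_replacement(channels, 2):
        for fk in channels:
            if fk in (fi, fj):
                continue
            f = fi + fj - fk
            counts[f] = counts.get(f, 0) + 1
    return counts


def products_on_channels(channels: list[int]) -> dict[int, int]:
    """Mixing products that fall exactly on an occupied channel and thus cause crosstalk."""
    occupied = set(channels)
    return {f: c for f, c in fwm_products(channels).items() if f in occupied}


print(products_on_channels([0, 50, 100]))  # every channel is hit by one product
print(products_on_channels([0, 50, 150]))  # {}: no product lands on a channel here
```

With equally spaced channels, products coincide with the data channels themselves, which is why FWM shows up as in-band crosstalk rather than as easily filtered noise.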

With current technology, more than 160 optical wavelength channels can be multiplexed onto a single fiber, and this number is expected to increase even further. Each wavelength can transport a signal with a bit rate of 10 Gbps, and 40 Gbps is just around the corner. WDM systems with a channel spacing of 50 GHz have been developed, and channel spacings of 25 GHz are beginning to appear. When such a large number of channels can be transported by the WDM system, the term Dense Wavelength Division Multiplexing (DWDM) is used, in contrast to Coarse Wavelength Division Multiplexing (CWDM), which is considered for the metropolitan network and multiplexes a limited number of wavelengths (typically four to eight) onto a single fiber. For CWDM, the wavelength region around 1310 nm, another window with low attenuation, is commonly used.
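The figures quoted above translate into a few quick calculations: the aggregate capacity of a fiber (160 × 10 Gbps), the optical frequency of the 1550-nm window (f = c/λ), and the wavelength spacing that corresponds to a 50-GHz or 25-GHz channel spacing (Δλ ≈ λ²Δf/c). The helper names below are hypothetical; the formulas are the standard frequency/wavelength relations.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def aggregate_capacity_gbps(channels: int, rate_gbps: float) -> float:
    """Total fiber capacity when every wavelength carries the same bit rate."""
    return channels * rate_gbps


def frequency_thz(wavelength_nm: float) -> float:
    """Optical frequency f = c / λ."""
    return C / (wavelength_nm * 1e-9) / 1e12


def spacing_nm(center_nm: float, spacing_ghz: float) -> float:
    """Wavelength spacing equivalent to a frequency spacing: Δλ ≈ λ² · Δf / c."""
    lam = center_nm * 1e-9
    return lam ** 2 * (spacing_ghz * 1e9) / C * 1e9


print(aggregate_capacity_gbps(160, 10))  # 1600 Gbps, i.e., 1.6 Tbps per fiber
print(round(frequency_thz(1550), 1))     # 193.4
print(round(spacing_nm(1550, 50), 2))    # 0.4
print(round(spacing_nm(1550, 25), 2))    # 0.2
```

A 50-GHz grid therefore packs channels about 0.4 nm apart near 1550 nm, which makes clear how more than 160 channels fit into a window a few tens of nanometers wide.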

The main advantage of DWDM is its ability to increase the capacity of the infrastructure in place many-fold, without the need for expensive digging works to lay new fibers. Moreover, because the capacity is present in the already installed fibers, new wavelengths can be lit quickly. Besides this, because of the development of erbium-doped fiber amplifiers (EDFAs), WDM allows the amplifier cost to be shared by more traffic, because all signals on a single fiber can be amplified using a single EDFA (Figure 3.2). EDFAs work in the wavelength range of approximately 1530 to 1565 nm (the so-called C-band). This is one of the reasons the 1550-nm window was chosen for the WDM technique. EDFAs have also been developed for the L-band (approximately 1565 to 1625 nm). Other wavelength ranges (e.g., the S-band, 1460 to 1530 nm) could be served by other types of optical amplifiers.

The use of WDM and EDFAs dramatically decreases the cost of long-haul transmission, because a single optical EDFA replaces the array of per-channel electrical regenerators that was previously needed.
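The savings scale with the number of wavelengths, as a short count shows. This sketch assumes, purely for illustration, an 800 km link with 80 km amplifier spans carrying 40 wavelengths, one electrical regenerator per wavelength per in-line site in the non-WDM case, and one EDFA per site in the WDM case; the function names are hypothetical.

```python
import math


def inline_sites(link_km: float, span_km: float) -> int:
    """In-line repeater sites so that no fiber span exceeds span_km (end terminals excluded)."""
    return max(math.ceil(link_km / span_km) - 1, 0)


def repeater_count(link_km: float, span_km: float, wavelengths: int) -> dict[str, int]:
    """Electrical approach: one regenerator per wavelength per site; WDM + EDFA: one amplifier per site."""
    sites = inline_sites(link_km, span_km)
    return {"electrical_regenerators": sites * wavelengths, "edfas": sites}


# Hypothetical 800 km link with 80 km spans carrying 40 wavelengths.
print(repeater_count(800, 80, 40))  # {'electrical_regenerators': 360, 'edfas': 9}
```

The comparison is deliberately simplified: an EDFA only reamplifies (it does not reshape or retime), so on very long links noise accumulates and some 3R regeneration is still required.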

Figure 3.1 Wavelength division multiplexing. (Diagram: M wavelength channels λ1, λ2, ..., λM, e.g., M × 10 Gbps, each shown with a color label, are combined by a λ multiplexer onto a single fiber and separated again by a λ demultiplexer.)



At the beginning, WDM was mainly used to increase the point-to-point transmission capacity of the links in the transport network. Introducing WDM into the network provided high-capacity bit pipes between the client-layer equipment, typically SONET/SDH (see Chapter 2) or Internet Protocol (IP) (see Chapter 4) equipment. Traffic is transported optically between optical line terminals (OLTs) in such a point-to-point optical network. An OLT demultiplexes the wavelength-multiplexed signal coming from the optical network and adapts it into a signal suited for the client layer. This involves an optoelectronic conversion, because the client nodes are SONET/SDH add/drop multiplexers (ADMs) (see Chapter 2, Section 2.2.4), digital cross-connects (DXCs) (see Chapter 2, Section 2.2.4), or IP routers, and the traffic has to be processed at this digital level. After the processing (e.g., switching), the signal is converted back to the optical domain at the OLT and multiplexed for further transmission in the optical network. As WDM technology matures (160 wavelengths of 10 Gbps and up), the electronics in the SONET/SDH DXCs or ADMs or in the IP routers will not be able to keep up with the traffic coming from the optical domain. Hence, the electrical nodes in this WDM point-to-point architecture become the new bottleneck.

Figure 3.2 A single optical amplifier in the optical network replaces an entire array of electrical regenerators from the SONET/SDH network. (Diagram: transmitters Tx λ1, Tx λ2, and Tx λ3 are multiplexed in an optical line terminal, amplified by a single EDFA, and demultiplexed in the far-end optical line terminal to receivers Rx λ1, Rx λ2, and Rx λ3; in the SONET/SDH network, each transmitter/receiver pair needs its own electrical regenerator.)



3.1.2 An Optical Networking Layer with Optical Nodes

The introduction of optical nodes into the network, through which individual wavelength channels can pass or at which they can be terminated to the client-layer node, allows true optical networking. These nodes reduce the need for expensive optoelectronic conversions and electrical processing equipment. It now becomes possible to keep pass-through traffic in the optical domain by establishing lightpaths between the client-layer node equipment. At the network nodes, transit traffic is no longer always converted to the electrical domain but can stay in the optical domain until it has reached its destination, as illustrated on the right side of Figure 3.3.
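The benefit of keeping transit traffic in the optical domain can be expressed as the amount of traffic the client-layer electronics at a node must process. The sketch below is a hypothetical model: each demand is a route plus a bit rate, and the only difference between the two architectures is whether transit demands are converted to the electrical domain.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    route: list[str]  # nodes traversed, from source to destination
    gbps: float


def electrical_load(node: str, demands: list[Demand], optical_bypass: bool) -> float:
    """Traffic (Gbps) that the client-layer electronics at `node` must process.

    Without optical bypass (point-to-point WDM), every demand through the node is
    converted to the electrical domain. With optical nodes, only demands that start
    or end at the node are; transit lightpaths stay in the optical domain.
    """
    load = 0.0
    for d in demands:
        if node not in d.route:
            continue
        terminates = node in (d.route[0], d.route[-1])
        if terminates or not optical_bypass:
            load += d.gbps
    return load


demands = [Demand(["A", "B", "C"], 10), Demand(["A", "B"], 10), Demand(["B", "C"], 10)]
print(electrical_load("B", demands, optical_bypass=False))  # 30.0
print(electrical_load("B", demands, optical_bypass=True))   # 20.0
```

The A-to-C demand is the only transit traffic at node B, so optical bypass removes exactly its 10 Gbps from B's electronics; in larger networks, where most traffic at a node is transit traffic, the reduction is correspondingly larger.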

The optical nodes have a structure similar to the digital nodes introduced in

Chapter 2, Section 2.2.4 on SONET/SDH. Optical Add/Drop Multiplexers

(OADMs) allow, just as in SONET/SDH, networks to be built in rings, whereas

Optical Cross-Connects (OXCs) allow mesh optical networks or interconnected

mesh-ring optical networks. The optical infrastructure is, thus, making the transi-

tion from a pure transmission layer to a real managed optical networking layer.

3.1.3 An Optical Network Layer Organized in Rings

The introduction of OADMs in the optical layer allows the network to be configured in rings, similar to the SONET/SDH ring-based networks discussed in Chapter 2, Section 2.4. An OADM can drop the signal carried on a specific wavelength from the bundle of WDM-multiplexed signals and add another signal on this wavelength to the bundle. OADMs can be classified according to the number of wavelength channels that can be dropped and added and according to their flexibility. In a fixed OADM (see Figure 3.4 for an implementation example), a predetermined set of wavelength channels is reserved to add and/or drop data in and out of the WDM signal.

The advent of low-cost, flexible, reconfigurable OADMs (see Figure 3.5 for an implementation example), in which the dropping and adding of wavelength channels can be controlled, was very important.

Figure 3.3 From static point-to-point connections between IP routers to true optical networking. (Diagram, left: point-to-point connections between IP routers, which switch packets after opto-electrical conversion; right: optical networking with OXCs, which switch lightpaths.)



In a fully flexible OADM, the transit, add, and drop channels can be chosen without any constraint (instead of only channels out of a predetermined set, as with a fixed OADM). Note, however, that in general the number of wavelengths that can be added/dropped will be smaller than the total number of wavelengths in the WDM-multiplexed bundle. In ring-based networks, in which OADMs are used as node elements, typically only a limited part of the traffic that enters a specific OADM is nontransit traffic that has to be dropped at that node. As a consequence, it is economically more advantageous to install OADMs with limited add/drop capabilities, which are cheaper than OADMs that can add/drop all wavelengths.
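The difference between a fixed OADM and a flexible OADM with limited add/drop capability can be captured in two small classes. This is a hypothetical model, not a vendor interface: the fixed OADM can only drop the wavelengths wired in at installation, whereas the flexible OADM can drop any wavelength but only up to a fixed number of add/drop ports.

```python
class FixedOADM:
    """Add/drop wavelengths are wired in at installation time."""

    def __init__(self, add_drop: set[int]):
        self.add_drop = frozenset(add_drop)

    def can_drop(self, wavelength: int) -> bool:
        return wavelength in self.add_drop


class FlexibleOADM:
    """Any wavelength can be dropped/added, but only up to a fixed number of add/drop ports."""

    def __init__(self, total_wavelengths: int, max_add_drop: int):
        self.total = total_wavelengths
        self.max_add_drop = max_add_drop
        self.dropped: set[int] = set()

    def drop(self, wavelength: int) -> bool:
        """Reconfigure `wavelength` (1..total) as an add/drop channel; False if no port is free."""
        if not 1 <= wavelength <= self.total:
            raise ValueError("unknown wavelength")
        if wavelength in self.dropped:
            return True
        if len(self.dropped) >= self.max_add_drop:
            return False
        self.dropped.add(wavelength)
        return True

    def release(self, wavelength: int) -> None:
        """Return the wavelength to pass-through."""
        self.dropped.discard(wavelength)


fixed = FixedOADM({1, 2, 3})
flexible = FlexibleOADM(total_wavelengths=40, max_add_drop=4)
print(fixed.can_drop(17), flexible.drop(17))  # False True
```

The port budget `max_add_drop` reflects the economic argument above: since most traffic at a ring node is transit traffic, a small budget is usually enough.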

Conceptually, an optical ring does not differ greatly from a SONET/SDH ring. As in the case of SONET/SDH rings, several ring network configurations exist. The various configurations and the differences between optical rings and SONET/SDH rings are highlighted and discussed in Section 3.5. Optical rings can be interconnected using back-to-back installed OADMs or using OXCs, the optical equivalent of the DXC discussed in Chapter 2, Section 2.2.4. The introduction of OXCs in the optical network layer, however, makes it possible not only to interconnect rings but also to build meshed optical networks.

Figure 3.4 Implementation example of a fixed optical add/drop multiplexer, using a simple mux/demux pair. (Adapted from R. Ramaswami, K. Sivarajan, Optical networks: a practical perspective, 2nd ed, Morgan Kaufmann, San Francisco, CA, 2002.) (Diagram: a demultiplexer separates the incoming λ1, λ2, ..., λN; selected wavelengths are dropped and added, and the others pass through to the multiplexer.)

Figure 3.5 Implementation example of a flexible optical add/drop multiplexer using 2 × 2 switches to add and/or drop the signals. (Adapted from R. Ramaswami, K. Sivarajan, Optical networks: a practical perspective, 2nd ed, Morgan Kaufmann, San Francisco, CA, 2002.) (Diagram: each demultiplexed wavelength λ1 ... λN passes through a 2 × 2 switch that either lets it through or drops it and adds a new signal.)

3.1.4 Meshed Optical Networks

Other possible network architectures are mesh-based optical networks, or optical networks that interconnect a ring-based part with a mesh-based part. The key component in a meshed network architecture is the optical cross-connect (OXC). The basic functionality of an OXC is to switch a signal from an incoming port to the appropriate outgoing port.

Several designs of an OXC exist. A first distinction can be made based on whether the switching matrix is electrical or optical. With an electrical switching matrix, an optoelectrical conversion is needed first. The traffic is switched in the electrical domain and then converted back to the optical domain. The optoelectrical and electrooptical conversions are quite expensive, and this solution is not future proof, because the expensive transceivers and the electrical switching core have to be replaced when the data rate increases (e.g., from 10 to 40 Gbps). This type of OXC is called an opaque or optical-electrical-optical (OEO) OXC. This is the type of OXC that is available on the market at the time of publication. If the traffic is switched in the optical domain, the term transparent or OOO OXC is used.

This solution is much more attractive, because it avoids optoelectrooptical conversion and because the core switch is independent of the bit rate. Future upgrades that change the TDM rate no longer pose a problem. Because the complexity of the system is reduced, the reliability improves. Several optical switching technologies are under investigation, for example, switching based on micro-electro-mechanical systems (MEMS), bubble switching, thermooptic switching, holographic switching, switching based on beam steering, liquid crystals, and others [Ram02], [Ben01]. For the moment, the 3D variant of MEMS is the most plausible technology to be implemented at large scale, among other things because of its scalability and rather low optical loss. Another variant is the OEOEO opaque switch, a compromise between the aforementioned OEO and OOO switches. The switching matrix itself switches in the optical domain, but an OEO conversion takes place at the ingress and egress of the switch matrix. This type of OXC does not have the bandwidth limitation and power consumption of an OXC with an electrical switch matrix, and it allows wavelength conversion (fully optical wavelength conversion is not currently available) and 3R (reamplification, reshaping, and retiming) signal regeneration (but in that case the bit rate and data format transparency are lost).

Figure 3.6 shows a generic representation of an OXC. As explained earlier, the switch matrix may be electrical or optical. The transponders (OEO converters) that are depicted at the ingress and egress of the OXC may or may not be present, depending on the exact implementation (e.g., in an all-optical implementation, they will not be present).



Another classification of OXCs is based on their ability to convert the wavelength of the signal between incoming and outgoing ports. With a wavelength routing OXC (WR-OXC), an incoming signal on a certain wavelength is switched to the correct outgoing port of the OXC, but the signal remains on the same wavelength. A WR-OXC is not able to perform wavelength conversion. A wavelength translating OXC (WT-OXC), on the other hand, can translate the wavelength of an incoming signal to another wavelength before the signal leaves through the correct outgoing port. A WT-OXC is, thus, more flexible than a WR-OXC. Note that as long as wavelength converters cannot be implemented fully optically, transparent all-optical OXCs cannot perform wavelength conversion. The choice of the type of OXC installed in the network also has an impact on the network planning process and thus on the resilience planning. In a network with WR-OXCs, the lightpath between source and destination OXC has to be conveyed using the same wavelength channel on all links along the path. This is known as the wavelength continuity constraint, and it requires solving the routing and wavelength assignment problem. Networks that deploy such WR-OXCs are often denoted wavelength path (WP) networks. In a network with WT-OXCs, the wavelength continuity constraint does not apply. Such networks are called virtual wavelength path (VWP) networks. VWP networks increase the routing flexibility and the throughput to some extent, but there is a price to pay: The wavelength converting elements are currently very expensive, and they cannot be realized all optically (i.e., they cannot function completely in the optical domain). Wavelength conversion thus requires the OEO conversion of the signal, making this type of OXC opaque. A lot of work has been dedicated to comparing the pros and cons of both types of OXCs.

Under a fixed static traffic load, the difference in performance between optical networks with WT-OXCs and WR-OXCs depends on the network characteristics but is usually very small. A comparison between WP and VWP networks using path protection as a recovery scheme is discussed in Section 3.6.2. Under a dynamic traffic load, however, around 10% more traffic can be sent in a VWP network than in a WP network with the same blocking probability.

Figure 3.6 Generic representation of an optical cross-connect. (Adapted from J. Derkacz, et al. “IP/OTN Cost Model and Photonic Equipment Cost Forecast - IST LION project,” Proc. 4th Workshop on Telecommunications Technoeconomics, Rennes, France, May 2002.) (Diagram: I/O interface, transponders, a switch matrix (electrical or optical), and mux/demux and OA stages.)
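The wavelength continuity constraint, which distinguishes WP from VWP networks, shows up directly in a first-fit wavelength assignment over a fixed route. First-fit (take the lowest-indexed free wavelength) is a common heuristic; the route, link names, and occupancy below are hypothetical, and routing itself is assumed to be done already.

```python
from typing import Optional


def first_fit_wp(links: list[str], occupied: dict[str, set[int]],
                 num_wavelengths: int) -> Optional[dict[str, int]]:
    """WP network (WR-OXCs): the lowest-indexed wavelength free on every link of the route."""
    for w in range(num_wavelengths):
        if all(w not in occupied[link] for link in links):
            return {link: w for link in links}
    return None  # blocked by the wavelength continuity constraint


def first_fit_vwp(links: list[str], occupied: dict[str, set[int]],
                  num_wavelengths: int) -> Optional[dict[str, int]]:
    """VWP network (WT-OXCs): each link independently takes its lowest-indexed free wavelength."""
    assignment = {}
    for link in links:
        free = [w for w in range(num_wavelengths) if w not in occupied[link]]
        if not free:
            return None  # blocked: the link itself is full
        assignment[link] = free[0]
    return assignment


route = ["A-B", "B-C"]
occupied = {"A-B": {0}, "B-C": {1}}  # two wavelengths per link, one in use on each
print(first_fit_wp(route, occupied, 2))   # None
print(first_fit_vwp(route, occupied, 2))  # {'A-B': 1, 'B-C': 0}
```

In this small example, capacity is free on both links, yet the WP request is blocked because no single wavelength is free end to end. Under dynamic traffic, such cases are the source of the extra blocking in WP networks.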

3.1.5 Adding Flexibility to the Optical Network Layer

Sections 3.1.1 through 3.1.4 described the evolution of the optical network layer

from a layer providing high-capacity bit pipes to its client layers to a real managed

networking layer. The next step in the evolutionary path is the transition from a

static optical networking layer to a flexible and agile optical layer. Such a flexible

optical network layer enables the fast and efficient provisioning of new connections

across the network, provides the possibility to deploy flexible restoration options,

and allows efficient and high-quality network management. Part of the work on this

flexible optical network has arisen from the evolution of the IP client layer (more

precisely, from the Multi-Protocol Label Switching [MPLS] protocol). More details

on this subject are discussed in Chapter 6, Section 6.1.

3.2 The Optical Transport Network

The OTN, as defined by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), has a structure very similar to that of a SONET/SDH network. The functionality of the OTN follows the generic principles defined in ITU-T Recommendation (ITU-T Rec.) G.805 [G805]. The specific aspects concerning the optical layer of the transport network are described in ITU-T Rec. G.872 [G872] and G.709/Y.1331 [G709]. Other organizations, such as the American National Standards Institute (ANSI) workgroup T1.X1, the Optical Internetworking Forum (OIF), and the Internet Engineering Task Force (IETF), have also done a lot of work on the OTN. The IETF has historically focused on the IP client layer, but more recently IP over optical has also been under study. The OIF, on the other hand, focuses mainly on the optical layer of the network, with contributions on interoperability and interfaces between vendor domains within optical networks and between the optical network layer and client layers. It has also devoted considerable effort to making the OTN more flexible, which is further discussed in Chapter 6, Section 6.1.

3.2.1 Architectural Aspects and Structure of the Optical Transport Network

According to ITU-T Rec. G.872 [G872], an OTN is characterized by a path layer on

top of two section layers, just like the SONET/SDH network structure. These three

layers are illustrated in Figure 3.7.

The optical transmission section (OTS) layer is the lowest section layer. It provides the functionality for the transmission of optical signals on various types of optical fiber. The layer on top is the optical multiplex section (OMS) layer. It provides the networking functionality for a multiwavelength optical signal. The highest layer is the optical channel (OCh) path layer, which provides end-to-end networking functionality for optical channels so that client signals of varying format can be conveyed transparently between 3R regeneration points in the network.

Instead of the OTS and OMS layers, an optical physical section (OPS) may be

present (Figure 3.8). The OPS is a network layer that provides functionality for

transmission of a single- or multiwavelength optical signal just as the OMS and

OTS layers, but without their supervisory information. This is discussed in more

detail later in this section.
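The way the three layers nest along a path can be sketched by recording which network elements terminate each layer: an OTS runs over each fiber span, an OMS between points where the multiwavelength signal is multiplexed or demultiplexed, and an OCh between 3R regeneration points. The element names and the termination table below are simplifying assumptions for illustration (e.g., an OADM is assumed to terminate the OMS while the pass-through OCh continues), not a transcription of ITU-T Rec. G.872.

```python
# Hypothetical element types and the OTN layers whose trails they terminate.
TERMINATES = {
    "terminal": {"OTS", "OMS", "OCh"},     # end point: transponders plus mux/demux
    "amplifier": {"OTS"},                  # in-line OA: ends a fiber span only
    "oadm": {"OTS", "OMS"},                # demultiplexes the bundle; pass-through OCh continues
    "regenerator": {"OTS", "OMS", "OCh"},  # 3R point: ends the optical channel as well
}


def layer_trails(path: list[str], layer: str) -> list[tuple[int, int]]:
    """(start, end) indices of consecutive points along `path` that terminate `layer`."""
    points = [i for i, kind in enumerate(path) if layer in TERMINATES[kind]]
    return list(zip(points, points[1:]))


path = ["terminal", "amplifier", "oadm", "amplifier", "regenerator", "terminal"]
for layer in ("OTS", "OMS", "OCh"):
    print(layer, layer_trails(path, layer))
```

Each higher layer has fewer, longer trails than the layer beneath it, which is what allows each layer to be supervised, and a failure to be localized, at its own granularity.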

Currently, the only way to meet the management requirements of the optical channel (nonintrusive monitoring²¹ and management of each optical channel) is to implement the optical channel by means of a digital framed signal with digital overhead (similar to the SONET/SDH frame²² [G707], discussed in Chapter 2, Section 2.2.3). This results in the introduction of additional digital layer networks: the optical channel payload unit (OPU) layer, the optical channel data unit (ODU) layer, and the optical channel transport unit (OTU) layer (Figure 3.8). The OPU layer adapts the client signal. Multiple types of client traffic formats, including legacy traffic protocols such as SONET/SDH and ATM and newer protocols such as IP and Ethernet, can be transported. The ODU layer provides end-to-end path monitoring (PM) and tandem connection²³ monitoring (TCM). The OTU layer provides supervision between two 3R regeneration points in the OTN. The complete set of layers depicted in Figure 3.8 (from the OPU layer to the OPS layer or OTS layer) forms the optical transport module (OTM).

Figure 3.7 Optical transmission section (OTS), optical multiplex section (OMS), and optical channel (OCh) layer. (Diagram: four wavelength channels are multiplexed, amplified by an optical amplifier (OA), and demultiplexed; the OCh spans the connection end to end, the OMS spans the section from multiplexer to demultiplexer, and one OTS spans each fiber segment on either side of the OA.)

²¹ The types of connection supervision considered by the ITU-T were introduced in Chapter 2, Section 2.3.4. A short reminder:

1. Intrusive monitoring: Wavelength and fiber performance are tested for continuity by breaking the original trail and introducing a test trail that extends the connection for the duration of the test.

2. Inherent monitoring: The client layer (IP, ATM, STM) continuously monitors the state of a given connection by processing the overhead provided by the OCh layer to approximate the operational state of the client connection. Similarly, the OCh layer processes the data received from the OMS layer to approximate the operational state of each OCh channel. The overall status of the connection cannot be determined with this type of monitoring, because not all the information necessary for performance monitoring is contained in the overhead.

3. Nonintrusive monitoring: The connection monitoring capability is provided by listening to the original data and its associated overhead. The overhead information transported by a connection is also used for fault detection.

²² In fact, the concepts described in ITU-T Rec. G.709 find their roots in the corresponding standardization of SONET/SDH, ITU-T Rec. G.707 [G707].

Figure 3.9 summarizes these different layers of the OTN for a small sample

network.

Figure 3.8 Structure of the optical transport network. (Adapted from M. Vissers, Optical Transport Network & Optical Transport Module, ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004, and ITU-T Recommendation G.709/Y.1331, Interfaces for the optical transport network, ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: client signals (IP, ATM, STM-N, Ethernet, GbE) mapped into OPUk, ODUk, and OTUk, and carried over OCh, OMSn, and OTSn or, with reduced functionality, over OChr and OPSn.

23A tandem connection (TC) is an arbitrary series of contiguous link connections and/or subnetwork connections. A TC represents the part of a trail that requires monitoring independently of the monitoring of the complete trail. See also Chapter 2, Section 2.3.4.



3.2.2 Structure of the Optical Transport Module

Different OTMs, with different functionality, have been defined. Here we discuss in detail only the OTM with full functionality (OTM-n.m). Like a SONET/SDH frame, the frame structure of the OTUk within the OTM (Figure 3.10) consists of an overhead area for OA&M functions and a payload area for client data. A forward error correction (FEC) data block is appended after the payload area; it improves transmission performance by enabling error detection and correction at the receiving end of the signal.
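The frame regions of Figure 3.10 can be tallied with a few lines of Python. This is a minimal sketch assuming the 4-row by 4080-column OTUk frame and the column boundaries shown in the figure; the names `REGIONS`, `region_bytes`, and `frame_summary` are hypothetical and chosen only for illustration.

```python
# Minimal sketch: byte budget of one OTUk frame (4 rows x 4080 byte columns).
# Column boundaries (1-based, inclusive) are taken from Figure 3.10.
ROWS = 4

REGIONS = {
    "frame alignment/OTUk/ODUk OH": (1, 14),
    "OPUk overhead": (15, 16),
    "OPUk payload": (17, 3824),
    "OTUk FEC": (3825, 4080),
}


def region_bytes(first_col: int, last_col: int, rows: int = ROWS) -> int:
    """Bytes occupied by a range of columns across all rows of the frame."""
    return (last_col - first_col + 1) * rows


def frame_summary() -> dict:
    """Map each frame region to its size in bytes."""
    return {name: region_bytes(a, b) for name, (a, b) in REGIONS.items()}


if __name__ == "__main__":
    sizes = frame_summary()
    total = sum(sizes.values())
    for name, size in sizes.items():
        print(f"{name:30s} {size:6d} bytes ({100 * size / total:5.2f} %)")
    print(f"{'total':30s} {total:6d} bytes")
```

Running it shows that the payload occupies roughly 93 percent of the 16,320-byte frame and the FEC block a little over 6 percent.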

The detailed structure of the OTM-n.m is depicted in Figure 3.11. OPUk, ODUk, and OTUk overhead and FEC data are added to the client signal to form the optical channel transport unit (OTUk). Three bit rates are supported: approximately 2.5 Gbps (k = 1), 10 Gbps (k = 2), and 40 Gbps (k = 3). These correspond to the SONET/SDH data rates of an OC-48/STM-16, an OC-192/STM-64, and an OC-768/STM-256, respectively. The overhead of the OPU, ODU, and OTU layers is called the associated overhead. The specific information contained in these overheads is discussed in Section 3.3.

Figure 3.9 Layers in the optical transport network.

Figure 3.10 Optical channel frame structure. (Adapted from ITU-T Recommendation G.709/Y.1331, Interfaces for the optical transport network, ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: 4 rows of 4080 byte columns; frame alignment overhead and OTUk overhead in row 1, columns 1 to 14; ODUk overhead in rows 2 to 4, columns 1 to 14; OPUk overhead in columns 15 and 16; OPUk payload in columns 17 to 3824; OTUk FEC in columns 3825 to 4080.

An OTM-n.m supports the transport of several wavelength channels. The maximum number of supported wavelength channels (the order of the OTM) is indicated by the index n. For instance, an OTM-16.m signal can support up to 16 wavelengths. The index m defines the supported bit rates of the multiwavelength signal. It can take the values 1, 2, 3, 12, 23, or 123, meaning that the supported wavelengths can have a bit rate of only 2.5 Gbps, only 10 Gbps, only 40 Gbps, 2.5 and 10 Gbps, 10 and 40 Gbps, or 2.5, 10, and 40 Gbps, respectively. The optical channel payload is carried on a specific wavelength as the payload of an optical channel carrier (OCC), and its overhead is carried in the corresponding OCC overhead. Finally, the OMS and the OTS are constructed. The OCh, OMS, and OTS overhead, together denoted as nonassociated overhead, are transported in an additional optical wavelength channel, the optical supervisory channel (OSC).

Figure 3.11 Structure of the Optical Transport Module with full functionality OTM-n.m (OCC: optical channel carrier; OCCp: OCC payload; OCCo: OCC overhead). (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Besides the OTM-n.m, an OTM-0.m and an OTM-nr.m are defined. The index r in OTM-nr.m stands for reduced functionality: instead of an OTS and OMS layer, an OPS layer is present. Remember that the OPS is a network layer that provides functionality for the transmission of a single-wavelength optical signal (OPS0) or a multiwavelength optical signal (OPSn), just as the OMS and OTS layers, but without their supervisory information. No overhead of the OCh and OPS layers is supported; the nonassociated overhead of the OTM-n.m is, thus, not present. The OTM-0.m consists of a single optical channel; hence, there is no support for WDM. An OTM-0.1, for instance, transports a single wavelength channel of 2.5 Gbps. The OTM-0.m is also an OTM with reduced functionality.
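The OTM naming convention described above lends itself to a small decoder. The sketch below assumes a textual form `OTM-<n>[r].<m>` and uses the nominal rates quoted in the text (2.5, 10, and 40 Gbps for k = 1, 2, 3); the `parse_otm` function and `OtmSignal` class are hypothetical illustrations, not part of any ITU-T specification.

```python
import re
from dataclasses import dataclass

# Nominal per-wavelength bit rates (Gbps) for k = 1, 2, 3, as given in the text.
NOMINAL_RATE_GBPS = {1: 2.5, 2: 10.0, 3: 40.0}

# Values of the index m listed in the text.
ALLOWED_M = {"1", "2", "3", "12", "23", "123"}

# Assumed textual form of a designation: "OTM-<n>[r].<m>".
_PATTERN = re.compile(r"OTM-(\d+)(r?)\.(\d+)")


@dataclass(frozen=True)
class OtmSignal:
    n: int                       # order of the OTM (0 = single channel, no WDM)
    reduced_functionality: bool  # True for OTM-nr.m and OTM-0.m
    rates_gbps: tuple            # nominal wavelength bit rates permitted by m

    @property
    def max_wavelengths(self) -> int:
        return 1 if self.n == 0 else self.n


def parse_otm(name: str) -> OtmSignal:
    """Decode designations such as 'OTM-16.123', 'OTM-16r.2', or 'OTM-0.1'."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an OTM designation: {name!r}")
    n, reduced_flag, m = int(match.group(1)), match.group(2) == "r", match.group(3)
    if m not in ALLOWED_M:
        raise ValueError(f"unsupported index m={m} in {name!r}")
    # Assumption: the text defines no 'OTM-0r.m', so reject it.
    if n == 0 and reduced_flag:
        raise ValueError("OTM-0.m is written without the 'r' suffix")
    rates = tuple(NOMINAL_RATE_GBPS[int(digit)] for digit in m)
    # The text states that OTM-0.m is also a reduced-functionality OTM.
    return OtmSignal(n=n, reduced_functionality=reduced_flag or n == 0, rates_gbps=rates)


if __name__ == "__main__":
    for designation in ("OTM-16.123", "OTM-16r.2", "OTM-0.1"):
        signal = parse_otm(designation)
        print(designation, signal.max_wavelengths, signal.rates_gbps,
              "reduced" if signal.reduced_functionality else "full")
```

Treating m as a string of digits keeps the decoding trivial: each digit is one permitted value of k.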

3.2.3 Overview of the Standardization Work on the Optical Transport Network

A complete overview of the standardization activities on OTNs within the ITU-T is

given in [OTNTS] [G871]. A summary can be found in Figure 3.12.

3.3 Fault Detection and Propagation

In Section 3.2 the Optical Transport Module (OTM) was introduced. We have seen that several types of overhead can be used for nonintrusive monitoring and management of the optical signal. The information transported in the overhead is used for fault detection and propagation, as well as for alarm suppression, among other things. In Section 3.3.1, the OTN overhead is discussed, emphasizing the part of the overhead that is relevant to fault detection and propagation. In Section 3.3.2, we briefly list the defects that are currently defined in the OTN. Finally, Section 3.3.3 illustrates the discussion on defects, fault detection and propagation, and alarm suppression with some examples.

Figure 3.12 Overview of the available and planned ITU-T recommendations on the Optical Transport Network. (Adapted from ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.)

Figure content: Framework (G.871/Y.1301); Network Architecture (G.872); Structures and Mapping (G.709/Y.1331); Physical Layer Aspects (G.691, G.692, G.694.1, G.694.2, G.664, G.959.1, G.693, Sup. 39); Physical Layer (G.959.1, G.693); Functional Characteristics (G.798, G.806); Management Aspects (G.874, G.874.1, G.875, G.7710); Data Communication Network (G.7712/Y.1703); Error Performance (G.8201, M.24otn); Jitter and Wander Performance (G.8251); Protection Switching (G.808.1, G.873.1); Testing (O.173, O.201).

This section on fault detection and propagation gives better insight into the recovery mechanisms described in Sections 3.4 through 3.6, although these can be fully understood without it. To fully comprehend this section, we recommend that you first become acquainted with the general transmission network terminology explained in Chapter 2, Section 2.1.

3.3.1 The Optical Network Overhead

We start this section by detailing the different types of OTN overhead. As men-

tioned earlier, a distinction can be made between associated overhead and non-

associated overhead.

Associated Overhead

The associated overhead includes the OPU, ODU, and OTU overhead.

Optical Channel Payload Unit Overhead

The optical channel payload unit overhead (OPUk OH) enables the support of

various kinds of client signals. It includes information to support the client signal

adaptation.

Optical Channel Data Unit Overhead

The optical channel data unit overhead (ODUk OH) (Figure 3.13) includes infor-

mation for maintenance and operational functions to support optical channels. It

provides path layer connection monitoring functions.

Concerning monitoring and resilience, the path monitoring (PM) and tandem

connection monitoring i (TCMi) fields are of particular importance. The PM field is

dedicated to the end-to-end ODUk path and the TCMi fields allow for six levels of

TCM. Both are illustrated in detail in Figure 3.14.

Figure 3.13 Optical channel data unit overhead (ODUk OH). (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: ODUk overhead in rows 2 to 4 of columns 1 to 14, containing the RES, TCM ACT, TCM1 to TCM6, FTFL, PM, EXP, GCC1, GCC2, and APS/PCC fields.

Figure 3.14 The bytes of the path monitoring (PM) or tandem connection monitoring i (TCMi) field of the optical channel data unit overhead. (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: byte 1 carries the TTI, in which a 64-byte trail trace identifier (SAPI in bytes 0 to 15, DAPI in bytes 16 to 31, operator specific in bytes 32 to 63) is repeatedly transmitted; byte 2 carries the BIP-8; byte 3 carries BEI (PM) or BEI/BIAE (TCMi) in bits 1 to 4, BDI in bit 5, and STAT in bits 6 to 8.

The PM field contains a byte to transport the trail trace identifier (TTI), a bit interleaved parity (BIP-8) byte, a backward defect indication (BDI) field, a backward error indication (BEI) field, and a status (STAT) field. In the TTI byte, a 64-byte trail trace identifier is transmitted repeatedly to verify the connectivity of the OChs through network elements such as ODU cross-connects. The BIP-8, BDI, and BEI are used for performance monitoring. The BDI signal conveys in the upstream direction the signal fail status detected in a path termination sink function. The BEI signal conveys in the upstream direction the count of interleaved bit blocks that have been detected in error by the corresponding ODUk PM sink using the BIP-8 code. The use of the BDI and BEI is illustrated with an example in Section 3.3.3. The STAT bits indicate the presence of a maintenance signal, as shown in Table 3.1. These maintenance signals are discussed in Sections 3.3.2 and 3.3.3.
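To illustrate how the BIP-8 byte and the BEI count relate, the following sketch computes an even bit-interleaved parity over a block of bytes (which reduces to XORing all bytes) and counts how many of the eight interleaved bit blocks fail the check, the value a sink would report in the BEI field. It is a simplification: the exact monitored area and the frame offset at which the BIP-8 is carried (G.709 inserts the BIP-8 of frame i into frame i+2) are not modeled, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def bip8(block: bytes) -> int:
    """Even bit-interleaved parity over 8 bit positions.

    Bit i of the result is the parity of bit i across all bytes, which is
    exactly the XOR of all bytes in the block.
    """
    return reduce(xor, block, 0)


def bip8_violations(received_block: bytes, received_bip8: int) -> int:
    """Count the interleaved bit blocks (0..8) whose parity check fails.

    This is the error count a sink would send back upstream in the BEI field.
    """
    mismatch = bip8(received_block) ^ received_bip8
    return bin(mismatch).count("1")


if __name__ == "__main__":
    payload = bytes(range(64))            # stand-in for the monitored area
    sent_bip8 = bip8(payload)             # computed and transmitted by the source
    received = bytearray(payload)
    received[5] ^= 0b0000_0101            # two bit errors in two different bit positions
    print("BEI =", bip8_violations(bytes(received), sent_bip8))  # BEI = 2
```

Note that two errors in the same bit position of different bytes cancel out, which is why BIP-8 yields an error estimate rather than an exact bit error count.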

Each of the six TCM fields supports monitoring of an ODU connection. The tandem connection (TC) overhead is added at the source and terminated at the sink of the corresponding TC. The TCMi overhead bytes have a structure similar to that of the PM field of the ODUk overhead. The BEI/BIAE signal conveys in the upstream direction the count of interleaved bit blocks that have been detected in error by the corresponding ODUk TCM sink using the BIP-8 code. It is also used to convey in the upstream direction an incoming alignment error (IAE) condition that the corresponding ODUk TCM sink has detected from the IAE information in the overhead. The STAT bits indicate not only the presence of a maintenance signal but also whether there is an IAE at the source or whether no source is active (Table 3.2).
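Byte 3 of the PM and TCMi fields (Figure 3.14) packs the BEI, BDI, and STAT information into one octet. The sketch below decodes it, assuming the ITU-T convention that bit 1 is the most significant bit; the `decode_byte3` helper is hypothetical, and STAT codes that Tables 3.1 and 3.2 mark as reserved or do not list are reported as reserved.

```python
# STAT codes (bits 6-8 of byte 3) as listed in Tables 3.1 and 3.2.
RESERVED = "Reserved for future international standardization"

PM_STATUS = {
    0b000: RESERVED,
    0b001: "Normal path signal",
    0b101: "Maintenance signal: ODUk-LCK",
    0b110: "Maintenance signal: ODUk-OCI",
    0b111: "Maintenance signal: ODUk-AIS",
}

TCM_STATUS = {
    0b000: "No source tandem connection",
    0b001: "In use without incoming alignment error (IAE)",
    0b010: "In use with IAE",
    0b101: "Maintenance signal: ODUk-LCK",
    0b110: "Maintenance signal: ODUk-OCI",
    0b111: "Maintenance signal: ODUk-AIS",
}


def decode_byte3(value: int, tcm: bool = False) -> dict:
    """Split byte 3 of a PM (tcm=False) or TCMi (tcm=True) field.

    Bit 1 is taken as the most significant bit, so bits 1-4 are the high nibble.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("byte 3 must be a value between 0 and 255")
    bei = value >> 4            # bits 1-4: BEI (PM) or BEI/BIAE (TCMi), raw value
    bdi = (value >> 3) & 0x1    # bit 5: backward defect indication
    stat = value & 0x7          # bits 6-8: status
    table = TCM_STATUS if tcm else PM_STATUS
    return {"bei": bei, "bdi": bool(bdi), "stat": stat,
            "status": table.get(stat, RESERVED)}


if __name__ == "__main__":
    print(decode_byte3(0b0011_0001))             # 3 errored blocks, normal path signal
    print(decode_byte3(0b0000_1111))             # BDI set, ODUk-AIS
    print(decode_byte3(0b0000_0010, tcm=True))   # TCM in use with IAE
```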

Using these multiple instances of the PM functions, ODU paths can be monitored end-to-end through the public transport network or through the network of a single network operator; they can also be used for subnetwork connection monitoring or for monitoring on a ring network, among other things. Both nested (Figure 3.15) and overlapping ODU connections can be monitored. This functionality allows for true optical connection monitoring across multiple networks, independently of network operator or equipment vendor, that is, across multiple administrative domains.
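Whether two monitored tandem connections are nested, cascaded, or overlapping depends only on where their source and sink points sit along the ODUk path. The sketch below classifies a pair of connections, modeling each as a hypothetical `(source, sink)` pair of positions along the path. Because nested and overlapping connections are monitored simultaneously over part of the path, they cannot share a single TCMi field, whereas cascaded connections can.

```python
from typing import Tuple

# A monitored tandem connection, modeled as (source position, sink position)
# along the ODUk path, with source < sink. Positions are hypothetical indices.
Span = Tuple[int, int]


def relation(a: Span, b: Span) -> str:
    """Classify two tandem connections as 'cascaded', 'nested', or 'overlapping'."""
    (a_src, a_snk), (b_src, b_snk) = a, b
    if not (a_src < a_snk and b_src < b_snk):
        raise ValueError("each span needs source < sink")
    if a_snk <= b_src or b_snk <= a_src:
        return "cascaded"      # disjoint or back-to-back: never monitored on the same link
    if (a_src <= b_src and b_snk <= a_snk) or (b_src <= a_src and a_snk <= b_snk):
        return "nested"        # one connection lies entirely inside the other
    return "overlapping"       # partial overlap


def can_share_tcm_field(a: Span, b: Span) -> bool:
    """Only connections that never coexist on a link can reuse one TCMi field."""
    return relation(a, b) == "cascaded"


if __name__ == "__main__":
    outer, inner_1, inner_2, straddling = (0, 10), (1, 4), (5, 9), (3, 6)
    print(relation(outer, inner_1))        # nested
    print(relation(inner_1, inner_2))      # cascaded
    print(relation(inner_1, straddling))   # overlapping
```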

The ODUk overhead also contains a field for automatic protection switching (APS) at the path layer (the APS/PCC field). Its functionality is similar to that of the corresponding field in the SONET/SDH overhead, and it allows the implementation of path protection schemes such as optical channel shared protection rings (OCh-SP Rings; see Section 3.5.2 for details). In the ODUk overhead a byte is also allocated to transport the fault type and fault location (FTFL) message. This byte provides fault status information, including the type and location of the fault. This message is related to the TCM span. The TCM activation/deactivation control channel (TCM ACT) field is used to activate and deactivate TCM. The general communication channel (GCC) fields are communication channels that can be used to pass information between any two network elements with access to the ODUk frame.

Table 3.1 ODUk PM Status Interpretation

PM byte 3, bits 6-8    Status
000    Reserved for future international standardization
001    Normal path signal
010    Reserved for future international standardization
011    Reserved for future international standardization
100    Reserved for future international standardization
101    Maintenance signal: ODUk-LCK
110    Maintenance signal: ODUk-OCI
111    Maintenance signal: ODUk-AIS

Source: From ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001.

Figure 3.15 Nested and cascaded monitored ODUk connections. (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: monitored connections A1-A2, B1-B2, B3-B4, and C1-C2, with the TCM1 to TCM6 overhead fields marked as in use or not in use along each connection.

Table 3.2 ODUk TCMi Status Interpretation

TCMi byte 3, bits 6-8    Status
000    No source tandem connection
001    In use without incoming alignment error (IAE)
010    In use with IAE
011    Reserved for future international standardization
100    Reserved for future international standardization
101    Maintenance signal: ODUk-LCK
110    Maintenance signal: ODUk-OCI
111    Maintenance signal: ODUk-AIS

Source: From ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001.



Optical Channel Transport Unit Overhead

The optical channel transport unit overhead (OTUk OH) (Figure 3.16) enables

the transport of the digital ODU over an optical channel connection. For fault

and performance monitoring the section monitoring (SM) field is particularly

important. It has a structure similar to the PM and TCMi fields in the ODUk

overhead.

The SM field, detailed in Figure 3.17, consists of a TTI subfield, a BIP-8 subfield, a BDI subfield, a BEI/BIAE subfield, and an IAE subfield. These serve the same purpose as their counterparts in the ODUk overhead, but at the OTUk section level instead of the ODUk path level. The frame alignment signal (FAS) overhead is also part of the OTUk overhead.

Figure 3.16 OTUk overhead. (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: SM (section monitoring), GCC0 (general communication channel), and RES (reserved for future international standardization) fields.

Figure 3.17 Section monitoring field of the OTUk overhead. (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: byte 1 carries the TTI (a 64-byte identifier with SAPI, DAPI, and operator-specific parts), byte 2 the BIP-8, and byte 3 BEI/BIAE in bits 1 to 4, BDI in bit 5, IAE in bit 6, and RES in bits 7 and 8.

The OTUk FEC allows for error detection and error correction in the optical

links. The BIP-8 fields in the OTUk and the ODUk overhead only allow error

monitoring on the OTUk and ODUk payloads, respectively. FEC performs

error monitoring on the complete optical channel, including the OTUk OH. FEC

enables the detection and correction of bit errors caused by physical impairments

in the transmission medium (e.g., linear impairments such as attenuation or

dispersion and nonlinear effects such as self-phase modulation or four-wave

mixing).

Nonassociated Overhead

The nonassociated overhead of the OTM (Figure 3.18) consists of the OTS, the

OMS, and the OCh overhead and is transported by means of an optical supervisory

channel (OSC).

Optical Channel Overhead

The OCh OH includes information for maintenance functions to support fault

management. The OTM-n.m OCh OH (for each optical channel carried within an

OMS) consists of the following:

• OCh forward defect indication payload (FDI-P), used for OCh trail monitoring and defined to convey in the downstream direction the OCh payload signal status, namely normal or failed.

• OCh forward defect indication overhead (FDI-O), used for OCh trail monitoring and defined to convey in the downstream direction the OCh overhead signal status, namely normal or failed.

• OCh open connection indication (OCI), which indicates that, upstream in a connection function, the matrix connection has been opened as a result of a management command. A consequent detection of an OCh loss of signal (LOS) condition can then be related to the open matrix.

OCh overhead extensions may be expected in the future.

Figure 3.18 OTSn, OMSn, and OCh overhead within the OTM overhead signal (OOS). (ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.)

Figure content: OTSn overhead (TTI, BDI-P, BDI-O, PMI), OMSn overhead (FDI-P, FDI-O, BDI-P, BDI-O, PMI), OCh overhead for each channel 1 to n (FDI-P, FDI-O, OCI), and general management communications.

Optical Multiplex Section Overhead

The OMS OH includes information for maintenance and operational functions

to support OMSs. It consists of the following:

• OMS FDI-P, used for OMS section monitoring and defined to convey in the downstream direction the OMS payload signal status (normal or failed).

• OMS FDI-O, used for OMS section monitoring and defined to convey in the downstream direction the OMS overhead signal status (normal or failed).

• OMS backward defect indication payload (BDI-P), used for OMS section monitoring and defined to convey in the upstream direction the OMS payload signal fail status detected in the OMS termination sink function.

• OMS backward defect indication overhead (BDI-O), used for OMS section monitoring and defined to convey in the upstream direction the OMS overhead signal fail status detected in the OMS termination sink function.

• OMS payload missing indication (PMI), sent downstream as an indication that, upstream at the source point of the OMS signal, none of the OCCps contain an optical channel signal, in order to suppress the report of the consequential LOS.

Optical Transmission Section Overhead

The OTS OH includes information for maintenance and operational functions to

support optical transmission sections. It consists of the following:

• OTS trail trace identifier (TTI), used for OTS section monitoring.

• OTS backward defect indication payload (BDI-P), used for OTS section monitoring and defined to convey in the upstream direction the OTS payload signal fail status detected in the OTS termination sink function.

• OTS backward defect indication overhead (BDI-O), used for OTS section monitoring and defined to convey in the upstream direction the OTS overhead signal fail status detected in the OTS termination sink function.

• OTS payload missing indication (PMI), sent downstream as an indication that, upstream at the source of the OTS signal, no payload is added. It is defined to suppress the report of the consequential loss of signal condition.



The use of this nonassociated overhead is also clarified with some examples in

Section 3.3.3.

3.3.2 Defects in the Optical Transport Network

Section 3.3.1 discussed in detail the available overhead for fault detection and

propagation. In this section the defects in the OTN are briefly discussed. Table

3.3 enumerates the most important ones [G798], [G806]. Most defects in the OTN

are quite similar to those defined in the SONET/SDH layer (see Chapter 2, Section

2.3). After a defect has been declared, a decision has to be made on the consequent

action to be taken. An example of such a consequent action is to send out an FDI.

Also the fault cause has to be determined. In general, the defect and the consequent

action share the same name.

Table 3.3 Defects in the Optical Transport Network

Loss of Signal Payload (LOS-P): No signal is coming in. The LOS-P defect is monitored at the OTS, OMS, and OCh layers of the OTM-n.m and at the OPS and OChr layers of the OTM-nr.m and OTM-0.m.

Loss of Signal Overhead (LOS-O): No optical supervisory channel (OSC) containing the nonassociated overhead is coming in.

Loss of Tandem Connection (LTC): Detects the presence or absence of tandem connection overhead. The LTC is monitored at the tandem connection sublayer of the ODUk layer.

Trace Identifier Mismatch (TIM): Connectivity fault caused by improper routing of the connection between trail termination source and sink, or by connectivity not being maintained while the connection is active. The TIM is monitored at the OTS, OTUk, and ODUk layers.

Signal Degrade (SDEG): The signal has degraded (too many errored blocks in a monitoring interval). Monitored at the OTUk and ODUk layers.

Payload Mismatch (PLM): The payload type is not equal to the expected payload type. PLM is monitored at the path layer of the ODUk layer.

Loss of Frame (LOF): The correct pattern in the FAS bytes of the OTUk frame is not found for five consecutive frames.

Loss of Multiframe (LOM): The received multiframe alignment signal (MFAS) does not match the expected MFAS in five consecutive OTUk frames.

Forward Defect Indication Payload (FDI-P): The FDI-P signal is monitored at the OMS and OCh layers to suppress downstream alarms at the client layer caused by upstream defects detected by the server layer, which interrupt the client payload signal. The FDI signal is sent downstream as an indication that an upstream defect has been detected. When the signal is in the optical domain, the term FDI is used; in the electrical domain, the term AIS is used.

Forward Defect Indication Overhead (FDI-O): The FDI-O signal is monitored at the OMS and OCh layers to suppress downstream alarms at the client layer caused by upstream defects detected by the server layer, which interrupt the OTM overhead signal.

Alarm Indication Signal (AIS): The AIS signal is monitored at the OTUk and ODUk layers to suppress alarms at the client layer caused by upstream defects detected by the server layer and/or the client’s tandem connection sublayer, which interrupt the client payload signal. It is the electrical equivalent of the FDI-P signal in the optical domain.

Backward Defect Indication Payload (BDI-P): The BDI-P defect signal is monitored at the OTS and OMS layers. The BDI-P is sent upstream as an indication that a defect has been detected that interrupts the client payload signal.

Backward Defect Indication Overhead (BDI-O): The BDI-O defect signal is monitored at the OTS and OMS layers. The BDI-O is sent upstream as an indication that a defect has been detected that interrupts the OTM overhead signal.

Backward Defect Indication (BDI): The BDI is declared or cleared through the BDI field of the SM, TCMi, and PM overhead fields of the OTUk and ODUk layers. A BDI signal is sent upstream as an indication that a defect has been detected that interrupts the client payload signal.

Backward Error Indication (BEI): The BEI is declared or cleared through the BEI field of the SM, TCMi, and PM overhead fields of the OTUk and ODUk layers. A BEI signal is sent upstream as an indication that an error has been detected that affects the client payload signal.

Open Connection Indication (OCI): The OCI defect is monitored at the OCh layer and at the ODUk path and tandem connection layers to qualify a downstream LOS defect, indicating that the LOS is due to the signal not being connected. An OCI signal is sent downstream as an indication that the signal is not connected upstream.

Payload Missing Indication (PMI): This parameter is monitored at the OTS and OMS layers to suppress downstream LOS alarms caused by upstream defects that resulted in the missing payload. A PMI signal is sent downstream as an indication that the payload is already missing upstream, at the source point of the signal.

Locked (LCK): The LCK parameter is monitored at the ODUk path and tandem connection layers. An LCK signal is sent downstream to indicate that the connection is locked upstream, so no signal is passed through.
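The LOF and LOM definitions in Table 3.3 amount to counting consecutive bad frames. The following toy detector follows that wording literally, assuming the G.709 frame alignment pattern (three OA1 bytes of 0xF6 followed by three OA2 bytes of 0x28) and an MFAS byte that increments by one per frame modulo 256. Real equipment, as specified in ITU-T G.798, adds out-of-frame states and persistence timers that are omitted here, and the class name `AlignmentMonitor` is hypothetical.

```python
# Frame alignment pattern of the OTUk FAS bytes (OA1 = 0xF6, OA2 = 0x28).
FAS_PATTERN = bytes([0xF6, 0xF6, 0xF6, 0x28, 0x28, 0x28])
THRESHOLD = 5  # consecutive frames, following the wording of Table 3.3


class AlignmentMonitor:
    """Toy LOF/LOM detector that only counts consecutive bad frames.

    ITU-T G.798 adds out-of-frame states and persistence timers; they are
    deliberately left out of this sketch.
    """

    def __init__(self) -> None:
        self.bad_fas = 0
        self.bad_mfas = 0
        self.expected_mfas = None   # learned from the first frame
        self.lof = False
        self.lom = False

    def on_frame(self, fas: bytes, mfas: int) -> None:
        # LOF: the correct FAS pattern is missing in THRESHOLD consecutive frames.
        self.bad_fas = 0 if fas == FAS_PATTERN else self.bad_fas + 1
        self.lof = self.bad_fas >= THRESHOLD

        # LOM: the MFAS differs from the expected value in THRESHOLD consecutive frames.
        if self.expected_mfas is None:
            self.expected_mfas = mfas
        if mfas == self.expected_mfas:
            self.bad_mfas = 0
            self.lom = False
        else:
            self.bad_mfas += 1
            if self.bad_mfas >= THRESHOLD:
                self.lom = True
                self.expected_mfas = mfas   # simplification: re-align to the received count
                self.bad_mfas = 0
        # The MFAS byte increments by one per frame, wrapping from 255 to 0.
        self.expected_mfas = (self.expected_mfas + 1) % 256


if __name__ == "__main__":
    monitor = AlignmentMonitor()
    for frame in range(6):
        monitor.on_frame(b"\x00" * 6, mfas=frame)   # corrupted FAS, correct MFAS
        print(frame, "LOF" if monitor.lof else "in frame")
```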



3.3.3 OTN Maintenance Signals and Alarm Suppression

In this section we discuss how different maintenance signals can be correlated to reduce the number of alarms raised. Maintenance signals indicate defects in a connection, and the defect indications are given in both the upstream and the downstream direction. Figure 3.19 gives an overview of the maintenance signals that convey the backward (upstream) information in the different layers of the OTM. Note that in the OCh layer, no BDI or BEI is defined for the moment. In addition, for some of the client layers, the corresponding RDI24 and REI25 are not defined yet. If the client layer is a constant bit rate (CBR) signal (e.g., a SONET/SDH signal), the RDI and REI (as discussed in Chapter 2, Section 2.2.3) are used.

Figure 3.19 Optical Transport Network maintenance signals: backward information. (M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.)

Figure content: backward indications per layer, namely OTSn-BDI-P and OTSn-BDI-O, OMSn-BDI-P and OMSn-BDI-O, OTUk-BDI and OTUk-BEI, ODUk-BDI and ODUk-BEI, and RDI/REI for client layers such as CBR, ATM, IP, MPLS, and Ethernet.

24Remote defect indication is a signal that is sent upstream to indicate that a defect has been detected that interrupts the signal.

25Remote error indication is a signal that is sent upstream to signal an error condition.



Figure 3.20 summarizes the forward (downstream) maintenance information and indicates how the FDI of the optical layers, the AIS of the digital layers, and the PMI can be used to perform alarm suppression.

The FDI is used to signal downstream that a signal fail has occurred. It suppresses downstream alarms in the client layer that would otherwise be caused by an upstream server-layer defect that has already been detected and that has interrupted the client payload signal. The PMI signal is used to indicate to the termination point of the trail that there was already no payload signal at the origin point of the trail.

The use of the FDI, AIS, and PMI maintenance signals is illustrated in Figure 3.21. Let us assume that in the network depicted in Figure 3.21, each fiber can transport 200 wavelengths (an optical line system of 200 wavelength channels). Each fiber cable contains 96 fibers, and five fiber cables are grouped per duct. This means that if the duct is cut by careless digging, 5 × 96 × 200 = 96,000 wavelength channels could be affected. Without alarm suppression, this would also imply that 96,000 LOS alarms are generated: a tremendous number, far too many to be handled.
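The alarm arithmetic of this example can be checked quickly. In this sketch (the class and parameter names are illustrative only), unsuppressed operation raises one LOS alarm per affected wavelength channel, whereas with FDI, AIS, and PMI only one LOS alarm per broken fiber remains, as the text goes on to explain for Figure 3.21.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuctCut:
    """Failure scenario of Figure 3.21 (numbers taken from the text)."""
    cables_per_duct: int = 5
    fibers_per_cable: int = 96
    wavelengths_per_fiber: int = 200

    @property
    def broken_fibers(self) -> int:
        return self.cables_per_duct * self.fibers_per_cable

    def los_alarms_without_suppression(self) -> int:
        # Every affected wavelength channel raises its own LOS alarm.
        return self.broken_fibers * self.wavelengths_per_fiber

    def los_alarms_with_suppression(self) -> int:
        # FDI, AIS, and PMI leave a single LOS alarm per broken fiber.
        return self.broken_fibers


if __name__ == "__main__":
    cut = DuctCut()
    print(cut.los_alarms_without_suppression())  # 96000
    print(cut.los_alarms_with_suppression())     # 480
```

The reduction factor equals the number of wavelengths per fiber, so the benefit of suppression grows with the channel count of the line system.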

Figure 3.20 Optical Transport Network maintenance signals: forward information. (M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.)

Figure content: OTSn-PMI, OMSn-PMI, OMSn-FDI, OCh-FDI, OTUk-AIS, ODUk-AIS, and client-layer forward indications (generic AIS, VP-AIS, MPLS-FDI).



The use of the OTN maintenance signals FDI, AIS, and PMI reduces the number of alarms in the case of Figure 3.21 from 96,000 to a single one per broken fiber (at most 5 × 96 = 480 LOS alarms). Let us look at this example in more detail, focusing on the connections between DXCs 1 and 2. After the duct has been damaged, an OMS-FDI is sent from the OTS layer to the OMS client layer. At the OMS termination point, that is, at the first OXC, the OMS-FDI is converted into an OCh-FDI. This OCh-FDI signal is sent to the downstream OCh termination points. In Figure 3.21, the OCh termination points are the 3R regenerators at the second OXC encountered. At this point the OCh-FDI signal is converted into an ODUk-AIS signal. The OTS-PMI (and the OMS-PMI) signals prevent the LOS alarm from being raised at the OTS (and the OMS) layer when the wavelengths are not present.
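The chain of conversions just described can be mimicked with a small table-driven walk from the failure toward DXC 2. This is a hypothetical model, not an implementation of the G.798 atomic functions: each termination point either raises an LOS alarm (if no FDI, AIS, or PMI explains the missing signal) or stays silent, and converts the incoming indication into the one it sends downstream. The names `CONVERSIONS`, `should_raise_los`, and `walk_downstream` are illustrative.

```python
from typing import List, Tuple

# Forward conversions along the DXC 1 -> DXC 2 connections of Figure 3.21:
# (layer of the termination point, incoming indication) -> indication sent downstream.
CONVERSIONS = {
    ("OTS", "LOS"): "OMS-FDI",        # OTS sink detects the loss of light
    ("OMS", "OMS-FDI"): "OCh-FDI",    # OMS termination at the first OXC
    ("OCh", "OCh-FDI"): "ODUk-AIS",   # OCh termination at the 3R of the second OXC
}


def should_raise_los(los_detected: bool, fdi_or_ais: bool, pmi: bool) -> bool:
    """Report LOS only if no upstream FDI/AIS or PMI already explains it."""
    return los_detected and not (fdi_or_ais or pmi)


def walk_downstream(layers: List[str]) -> Tuple[List[str], List[str]]:
    """Return (alarms raised, indications sent) along a chain of termination points."""
    alarms: List[str] = []
    sent: List[str] = []
    indication = "LOS"
    for layer in layers:
        # Downstream of the cut, every sink sees its client signal interrupted.
        if should_raise_los(True, fdi_or_ais=indication != "LOS", pmi=False):
            alarms.append(f"{layer}: LOS")
        indication = CONVERSIONS.get((layer, indication), indication)
        sent.append(indication)
    return alarms, sent


if __name__ == "__main__":
    alarms, sent = walk_downstream(["OTS", "OMS", "OCh", "ODUk"])
    print(alarms)  # ['OTS: LOS']
    print(sent)    # ['OMS-FDI', 'OCh-FDI', 'ODUk-AIS', 'ODUk-AIS']
```

Only the OTS sink closest to the cut reports a loss of signal; every later termination point receives an FDI or AIS that explains the failure and therefore stays silent.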

Figure 3.22 illustrates the use of the BDI and BEI signals. After the occurrence

of the network fault, the downstream OTS termination point (at the downstream

OA) sends an OTS-BDI to the upstream OTS termination point (at the upstream

OA) to indicate that a defect has been detected that interrupts the client signal.

Likewise, the downstream OMS termination point (at the downstream demulti-

plexer) sends an OMS-BDI to the upstream OMS termination point (at the upstream

multiplexer). The downstream OTUk termination points send both OTUk-BDI and OTUk-BEI signals to the upstream OTUk termination points. Also, the downstream ODUk termination points send ODUk-BDI and ODUk-BEI signals to the upstream ODUk termination points. The signals sent in the downstream direction are similar to those in Figure 3.21.

[Figure 3.21 diagram: DXCs 1, 2, 3, and 4 with 3R regenerators along the paths; signals shown: OMS-FDI, OCh-FDI, ODUk-AIS, and OTS-PMI.]

Figure 3.21 Alarm suppression based on the Optical Transport Network maintenance signals. (M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.)



3.4 Recovery in Optical Networks

In this section, we focus on recovery schemes in the optical network layer. These recovery schemes closely resemble the schemes discussed in the previous chapter on SONET/SDH.

3.4.1 Recovery at the Optical Layer?

Why would we want to deploy recovery schemes at the optical network layer? The reason is straightforward: many failures occur at the optical network layer. Fiber cuts resulting from, for instance, excavation work, and failures of individual transmitters or receivers are quite common. As

discussed earlier, the recovery schemes at the optical layer work at the level of a

multiplex section or an optical channel. In both cases the recovery action is carried

out using the large granularity of an optical channel or even a complete multiplexed

bundle of optical channels. This means that there are fewer connections to restore. In

Chapter 6 on multilayer survivability, we will see that a root failure at the optical

network level typically results in a significant number of secondary failure indications at the higher layers (e.g., the SONET/SDH or IP layer). A recovery scheme at the client layer (e.g., SONET/SDH or IP) would need to restore many affected connections.26 In the optical layer, however, the number of connections that are

26The client layer of an Optical Transport Network typically does not consist of a single client network, but of a number of independent client networks. A single client network might not have to undertake more recovery actions than the optical network, but overall, recovering from a failure in the optical layer will typically be cheaper. A disadvantage of recovery at the optical layer is the reduced ability to differentiate the recovery scheme per client.

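[Figure 3.22 diagram: path between DXC 1 and DXC 2 through 3R regenerators; forward signals OMS-FDI, OCh-FDI, ODUk-AIS, and OTS-PMI; backward signals OTS-BDI, OMS-BDI, OTUk-BDI and OTUk-BEI, and ODUk-BDI and ODUk-BEI.]

Figure 3.22 Use of backward and forward information.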



affected by the root failure and that have to be restored is limited. Because recovery at the optical layer recovers the affected connections as a group, the recovery action is also faster and easier to manage than recovering each affected connection individually in the client layer. Recovering the affected connections in the client layer implies many individual actions to switch the traffic from its working path to its backup path.

The same recovery scheme classification as discussed in Chapter 2 can be used

for optical networks. A distinction can be made between protection and restoration schemes. The difference between the two was already explained in Chapter 1. Both

options require signaling, but the (subtle) difference lies in the timing of the

signaling actions. In the case of protection, the recovery paths are preplanned and

fully signaled before a failure occurs. Hence, when a failure occurs, no additional

signaling is needed to establish the protection path. In the case of restoration, the

recovery paths can be either preplanned or dynamically allocated, but when a

failure occurs additional signaling will be needed to establish the restoration path.

All protection schemes, except the 1+1 unidirectional protection switching scheme, rely on an APS coordination protocol, which is currently being standardized (see Section 3.4.2). It will undoubtedly closely resemble the APS scheme in the SONET/SDH layer, which was explained in detail in Chapter 2, Section 2.3.4. Other classifications that can be made are based

on whether the recovery scheme recovers from link and node (OXC or OADM)

failures or only from link failures; whether the spare resources are preplanned

and allocated off-line, or dynamic, after the failure has happened; whether the

scheme is deployed in a ring-based or mesh-based network, and so on. Survivability

schemes at the optical layer can also be classified depending on the exact sublayer

in which the recovery action is performed. More precisely, in an optical network

the recovery scheme can operate at the OCh level or at the OMS level. In the

former case, each affected lightpath is switched individually to its backup lightpath when a failure occurs. In the latter case, the whole multiplex of optical channels transmitted over a single fiber is switched over from the working path to the backup path.
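The difference in granularity translates directly into the number of switching actions. The following is a sketch under simple assumptions. A failure is described by a hypothetical mapping, failed_fibers, from each cut fiber to the number of working lightpaths it carried. OMS-level recovery performs one switch per fiber, and OCh-level recovery performs one switch per lightpath. The channel counts in the example are illustrative only.

```python
def switch_actions(failed_fibers, level):
    """Number of protection-switch actions after a failure.

    failed_fibers maps a hypothetical fiber id to the number of working
    lightpaths (optical channels) it carried.
    """
    if level == "OMS":
        # the whole multiplex on a fiber is switched in one action
        return len(failed_fibers)
    if level == "OCh":
        # every affected lightpath is switched on its own
        return sum(failed_fibers.values())
    raise ValueError(f"unknown level: {level}")


cut = {"fiber-1": 40, "fiber-2": 32}
print(switch_actions(cut, "OMS"))  # 2
print(switch_actions(cut, "OCh"))  # 72
```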

3.4.2 Standardization Work on Recovery in the Optical Transport Network

The status of the standardization work on the OTN at the ITU-T and at other standardization organizations, such as the OIF and ANSI T1X1, at the time of this publication has been discussed in Section 3.2. The topic of survivability in the OTN layer is, however, still under study within these standardization organizations.

Specification of protection switching in both ring-based and mesh-based OTNs will

soon be published (2004–2005). The following is a list of recommendations that are

currently under development concerning recovery in the OTN within the ITU-T:

. ITU-T Rec. G.808.1: Generic Protection Switching–Linear Trail and Subnet-

work Protection [G808.1]



The scope of this recommendation is the definition of the generic functional

models, characteristics, and processes associated with various linear protec-

tion schemes for connection-oriented layer networks. This recommendation

is thus not limited to the OTN only, but is also valid for the SONET/SDH

and ATM network layers. The protection schemes that are described are

trail protection (see Chapter 2, Section 2.3.4) and subnetwork connection

protection (see Chapter 2, Section 2.3.4).

. ITU-T Rec. G.808.2: Generic Protection Switching–Ring [G808.2]

. ITU-T Rec. G.873.1: Optical Transport Network (OTN)–Linear Protection

[G873.1]

This recommendation describes the APS protocol to support linear protec-

tion in the OTN at the ODUk path and ODUk TC sublayers.

. ITU-T Rec. G.873.2: Optical Transport Network (OTN)–Ring Protection

[G873.2]

This recommendation describes the APS protocol to support ring protec-

tion in the OTN.

3.4.3 Shared Risk Group

An important concept to keep in mind when discussing recovery of optical net-

works is that of a shared risk group (SRG) [Str01]. This concept is closely related to

the concept of diversity. Two lightpaths are said to be link/node diverse if they do

not share a common link/node. Diversity implies, thus, that there is no single point

of failure. However, to ensure real physical diversity, the lightpaths have to be

diverse not only on the fiber cable topology, but also on the underlying duct

topology. A short discussion of the physical placement of the optical fibers is

needed. Optical fibers that will be buried underground are grouped into a fiber

cable that is generally installed into a duct (a prefabricated pipe in which the cable is

drawn inside using a draught winch), which is in its turn placed in a trench. Such a

trench is often a right of way (ROW), which is frequently obtained from railroad

companies or electricity companies. A situation that is fairly common is depicted in

Figure 3.23.

Although in the fiber topology of Figure 3.23 both paths between node A and

node B (e.g., a working path and a dedicated protection path) seem to be diverse,

this is clearly not the case physically, in the duct topology. A duct failure would

affect both the working and the backup path. This example is typical of a submarine duct or of ducts in dense metropolitan areas. One way to solve this problem is to introduce the SRG. This is an identifier that is assigned to the common resource, that is, to the common risk. In the previous example, this is the duct: The duct is a

shared risk component, whose failure (e.g., caused by a cut from a digging accident)

will cause all fibers in the duct to be cut. All fibers that go through that duct belong

to the same SRG. During the calculation of the primary and backup paths, one can then avoid using resources with this identifier in both the primary and the backup path at the same time (SRG-diverse paths). Because a fiber typically runs through a sequence of ducts, it will usually belong to several SRGs.
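SRG diversity can be checked mechanically once every link is annotated with the SRGs it belongs to. The sketch below is a minimal illustration that assumes a hypothetical link_srgs table. The intermediate nodes C and D and the duct identifiers are invented for the example and do not come from Figure 3.23. The example shows two paths that are link diverse but not SRG diverse, because they share a duct.

```python
from itertools import chain


def srgs_of(path, link_srgs):
    """All SRG identifiers (e.g. ducts) that any link of the path runs through."""
    return set(chain.from_iterable(link_srgs[link] for link in path))


def link_diverse(primary, backup):
    """True if the two paths share no link."""
    return set(primary).isdisjoint(backup)


def srg_diverse(primary, backup, link_srgs):
    """True if no single shared risk (e.g. a duct cut) can hit both paths."""
    return srgs_of(primary, link_srgs).isdisjoint(srgs_of(backup, link_srgs))


# Hypothetical example in the spirit of Figure 3.23: two fiber routes from A to B
# via intermediate nodes C and D, whose first links share the duct "duct-1".
link_srgs = {
    "A-C": {"duct-1", "duct-2"},
    "C-B": {"duct-3"},
    "A-D": {"duct-1", "duct-4"},
    "D-B": {"duct-5"},
}
working, protection = ["A-C", "C-B"], ["A-D", "D-B"]
print(link_diverse(working, protection))            # True
print(srg_diverse(working, protection, link_srgs))  # False
```

In a path computation, the same test would typically be applied as a constraint while selecting the backup path, rather than as an after-the-fact check.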



Of course, the principle of SRG is not limited to the cable topology versus the

duct topology but can also be applied to the fiber topology versus the fiber cable

topology (e.g., in the case of fiber splicing, see Figure 3.24), to the logical network

topology versus the transport layer topology in multilayer networks (see Chapter

6), and so on. In Figure 3.24, fiber is spliced at the manhole to reach, for example,

an office with limited bandwidth needs at the upper node. Other examples are

provided in Chapter 5, in which two IP links can belong to the same shared risk link group (SRLG)27 if they are routed in the same fiber.

3.5 Recovery Mechanisms in Ring-Based Optical Networks

In this section we focus on recovery strategies in ring-based optical networks. As already explained (Chapter 2, Section 2.4), several ring-based architectures are possible. In such a ring-based architecture, the most natural way to provide recovery is a protection scheme. Each of the possible ring-based network architectures has its own way to provide recovery [Ari1/00], [Bon01], [Ram02].

[Figure 3.23 diagram, two panels: Fiber Cable Topology and Duct Topology, each showing nodes A and B.]

Figure 3.23 Fiber cable topology versus duct topology.

27Note that in the IP/MPLS world, the term shared risk link group is usually used.



A first distinction that can be made is based on the layer in which the protection

scheme is implemented: the OMS layer or the OCh layer (Figure 3.25). With a

scheme at the OCh layer, the recovery process is performed by the OADMs through

which the traffic enters and leaves the ring network. With a scheme at the OMS

layer, the recovery process is performed by the OADMs adjacent to the failure. The

choice of whether to use a scheme at the OMS or the OCh layer of course also determines the granularity of the protection action. In the OMS layer the whole bundle of

multiplexed optical channels is protected as a whole, whereas with a protection

scheme at the OCh level, each optical channel is protected individually and the

protection switching occurs at the granularity of a single optical channel. Figure 3.25 shows a ring network with four nodes in which a failure affects a single wavelength channel of the group of multiplexed optical channels of a connection between OADMs A and C. With a scheme at the OCh level (Figure 3.25, left), only the affected wavelength channel is switched over to the backup path

(dotted line), which runs between OADM A where the affected traffic flow enters

the ring and OADM C where this traffic flow exits the ring. With a scheme at the

OMS level (Figure 3.25, right), the whole group of wavelength channels is switched

over and the backup path runs between OADMs A and B, the OADMs adjacent to

the failure.
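The two switching behaviors of Figure 3.25 can be traced on a small ring model. This Python sketch makes three assumptions: the OADMs are listed in clockwise order, the working path runs clockwise from A through B to C, and the failure is on span A-B. The helper names ring_walk, och_backup, and oms_loopback are hypothetical.

```python
def ring_walk(ring, start, end, step):
    """Nodes visited from start to end, moving step=+1 (clockwise) or -1."""
    i, n = ring.index(start), len(ring)
    path = [start]
    while path[-1] != end:
        i = (i + step) % n
        path.append(ring[i])
    return path


def och_backup(ring, src, dst):
    # OCh level: the OADMs where the traffic enters and leaves the ring
    # switch it to the other side of the ring
    return ring_walk(ring, src, dst, -1)


def oms_loopback(ring, src, dst, fail_u, fail_v):
    # OMS level: the OADMs adjacent to the failed span (fail_u -> fail_v) loop
    # the whole multiplex back, so the signal runs src..fail_u, the long way
    # round to fail_v, and then on to dst along its original route
    to_u = ring_walk(ring, src, fail_u, +1)
    around = ring_walk(ring, fail_u, fail_v, -1)
    to_dst = ring_walk(ring, fail_v, dst, +1)
    return to_u + around[1:] + to_dst[1:]


ring = ["A", "B", "C", "D"]  # assumed clockwise order, working path A -> B -> C
print(och_backup(ring, "A", "C"))              # ['A', 'D', 'C']
print(oms_loopback(ring, "A", "C", "A", "B"))  # ['A', 'D', 'C', 'B', 'C']
```

In the OMS case the signal passes node C twice. It is looped back at A, travels the long way round to B, and is looped back again onto its original route toward C.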

A second distinction is based on whether the protection scheme is dedi-

cated (dedicated protection ring [DPRing]) or shared (shared protection ring

[SPRing]). In the former scheme, each working wavelength around the ring has a

dedicated protection wavelength, whereas in the latter scheme the protection cap-

acity is shared between several working paths. The shared protection scheme is

typically more complex to implement and manage but consumes fewer resources

than the dedicated approach. This distinction between shared and dedicated rings is

discussed in Chapter 2, Section 2.4.

Another distinction can be made based on the direction in which the traffic is

transmitted under normal working conditions. In a unidirectional ring, signals are

always transmitted in the same direction on the ring, whereas in a bidirectional ring,

signals are transmitted in both directions of the ring. Again, this is discussed in

[Figure 3.24 diagram, two panels: Fiber Topology and Fiber Cable Topology, with a manhole where a fiber is spliced.]

Figure 3.24 Fiber topology versus fiber cable topology. (J. Strand, A. Chiu, R. Tkach, ‘‘Issues for routing in the optical layer,’’ IEEE Communications Magazine, vol. 39, no. 2, February 2001, pp. 81–87.)



Chapter 2, Section 2.4. In addition, a further distinction can be made between ring

architectures that use two fibers along the ring and four-fiber architectures.

The recovery schemes in optical ring networks are not yet (completely) standardized (see Section 3.4.2). The expectation, however, is that the APS protocol in the optical ring must meet the 50-ms switching time, just as with SONET/SDH.

Because the recovery schemes in optical ring networks closely resemble the corresponding schemes in SONET/SDH ring networks (Table 3.4), they are not explained fully in the following sections. Instead, the emphasis is on how they differ from the corresponding SONET/SDH scheme.

Section 3.5.1 discusses OMS protection rings. Both dedicated and shared OMS

protection rings are studied. Section 3.5.2 focuses on OCh protection rings, again

discussing both SPRings and DPRings. In Section 3.5.3, the OMS-based approach

is compared with the OCh-based approach, and Section 3.5.4 compares the shared

and dedicated approaches.

[Figure 3.25 diagram, two panels: four-node rings with OADMs A, B, C, and D.]

Figure 3.25 Ring recovery scheme operating at the optical channel level (left) and the optical multiplex section level (right).



3.5.1 Multiplex Section Protection in Ring-Based Optical Networks

The first type of protection scheme in ring-based optical networks that is discussed

operates at the OMS layer and performs fiber protection switching. The protection

granularity is thus the capacity of a single fiber: the bundle of multiplexed optical

channels. Such a fiber-based protection scheme requires only simple control and

management mechanisms. The OMS level ring protection scheme can use dedicated

or shared backup capacity.

OMS Dedicated Protection Rings

With a two-fiber OMS DPRing (Figure 3.26), one fiber is dedicated for working

traffic (outer fiber in Figure 3.26) and the other counterrotating fiber is reserved for

protection traffic (inner fiber in Figure 3.26). Both directions of a bidirectional

wavelength demand are routed on different sides of the ring, using the same

wavelength. The same also applies for the protection path on the protection fiber.

There is thus absolutely no possibility of reusing wavelengths on the ring for different demands. When a failure occurs, it is detected by the two OADMs adjacent to the failure, based on the monitoring information in the OMS OH. Both OADMs loop back the affected multiplexed bundle of optical channels on the protection ring in the opposite direction (Figure 3.26). An APS protocol is required to handle the switching.

Table 3.4 Parallel Scheme for SONET/SDH Ring Networks and Optical Ring Networks

SONET/SDH | Optical | Characteristics
MS-DPRing | OMS-DPRing (OULSR)¹ | Dedicated protection, local recovery scheme performed by the OADMs adjacent to the failure
MS-SPRing (SDH), BLSR (SONET) | OMS-SPRing (OBLSR)² | Shared protection, local recovery scheme performed by the OADMs adjacent to the failure
SNCP (SDH), UPSR (SONET) | OCh-DPRing (OUPSR)³ | Dedicated protection, end-to-end recovery scheme performed by the OADMs on which the traffic enters/leaves the ring
(none) | OCh-SPRing (OBPSR)⁴ | Shared protection, end-to-end recovery scheme performed by the OADMs on which the traffic enters/leaves the ring

¹OMS-DPRing is sometimes called optical unidirectional line-switched ring.
²OMS-SPRing is sometimes called optical bidirectional line-switched ring.
³OCh-DPRing is sometimes called optical unidirectional path-switched ring.
⁴OCh-SPRing is sometimes called optical bidirectional path-switched ring.

OMS Shared Protection Rings

The implementation of SPRing schemes is more complicated than that of DPRing

schemes but is more efficient in terms of backup bandwidth usage. The shared

protection equivalent of the OMS-DPRing is the OMS-SPRing. Two implementa-

tions are used: a two-fiber implementation and an architecture with four fibers

along the ring.

In the two-fiber implementation (Figure 3.27), half of the wavelengths on each

fiber are reserved as working channels (marked in white) and the other half as

[Figure 3.26 diagram: six-node ring with OADMs A, B, C, D, E, and F.]

Figure 3.26 The two-fiber optical multiplex section dedicated protection ring architecture in a failure-free condition (top) and after a link failure (bottom). The outer fiber ring is dedicated to working traffic, and the inner fiber ring to protection traffic.



protection channels (marked in gray). Working connections in one fiber are pro-

tected by the protection capacity in the other fiber, in the opposite direction of the

ring. Both directions of a bidirectional demand are routed along the same side of

the ring, in different fibers. The same wavelength can, thus, be reused to accommo-

date a connection between other nodes, whose route does not overlap. For instance,

in Figure 3.27, besides the connection between OADMs A and D, a connection

between OADMs E and F can be accommodated on the two-fiber OMS-SPRing

using the same wavelength as connection A-D.
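Whether two demands on the same side of a two-fiber OMS-SPRing can reuse a wavelength reduces to a span-overlap test. The sketch assumes that the OADMs of Figure 3.27 are ordered A, B, C, D, E, F clockwise and that both demands are routed clockwise. The function names are hypothetical.

```python
def clockwise_spans(ring, src, dst):
    """Spans used by a demand routed clockwise from src to dst."""
    i, n = ring.index(src), len(ring)
    used = set()
    while ring[i] != dst:
        j = (i + 1) % n
        used.add(frozenset((ring[i], ring[j])))
        i = j
    return used


def can_share_wavelength(ring, demand1, demand2):
    """Two demands on the same side of the ring may reuse a wavelength
    only if their routes do not overlap."""
    return clockwise_spans(ring, *demand1).isdisjoint(clockwise_spans(ring, *demand2))


ring = ["A", "B", "C", "D", "E", "F"]  # assumed clockwise order of Figure 3.27
print(can_share_wavelength(ring, ("A", "D"), ("E", "F")))  # True
print(can_share_wavelength(ring, ("A", "D"), ("C", "E")))  # False
```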

When a failure is detected at the OMS level (link or OADM failure), the

OADMs adjacent to the failure will loop back all the affected lightpaths at once

on the protection channels of the ring. This is illustrated in Figure 3.27 for the

failure of link A-B. An APS protocol is needed to coordinate the switching actions

and ensure correct use of the shared protection capacity.

There is no dedicated protection connection per working connection. The spare

capacity in the network can be used by different working connections. In Figure

[Figure 3.27 diagram: six-node ring with OADMs A, B, C, D, E, and F.]

Figure 3.27 The two-fiber optical multiplex section shared protection ring architecture in a failure-free condition (top) and after a link failure (bottom).



3.27, for instance, the same spare capacity is used to provide recovery for connec-

tions A-D and E-F. The spare capacity is, thus, shared between several working

connections. Note that this implies that it is impossible to recover from multiple

simultaneous failures affecting more than one connection. In Figure 3.27, for

instance, when links A-B and E-F fail simultaneously, only one of the connections

A-D and E-F can be recovered. If the same wavelength is used for both working

directions of a bidirectional connection, wavelength conversion is required when a

protection switch takes place. The need for wavelength conversion in the OADMs

can be avoided by assigning different wavelengths to both directions of a working

connection.

In the four-fiber implementation (Figure 3.28), working and protection channels are carried over separate fibers in the ring. In this situation, both directions of

a bidirectional demand can, thus, always get assigned the same wavelength without

the need for wavelength converters in the OADMs (in contrast to the two-fiber

implementation depicted in Figure 3.27). Of course, the four-fiber OMS-SPRing

has twice the amount of capacity of the two-fiber implementation. However, the

four-fiber implementation can recover from more failure situations than its two-

fiber version. If only the multiplex section in the working fiber of the ring is affected,

the parallel protection fiber can be used after a simple span switch and no loop back

occurs (Figure 3.28, middle). In this way, certain multiple failures can be fully

protected. For instance, in Figure 3.28 in the event of the simultaneous failure of

a single working fiber on links A-B and E-F, both connections A-D and E-F can be

recovered. This was not possible with the two-fiber implementation of the OMS-SPRing. If both the working and the protection fiber are affected, or in the case of a node failure, a ring switch is performed (Figure 3.28, bottom).
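In the four-fiber OMS-SPRing, the choice between a span switch and a ring switch follows directly from which fibers, or which node, have failed. The sketch below encodes only the rules stated above. The function names and the (working_failed, protection_failed) tuple representation are hypothetical. The "no action" branch is our own assumption for a failure of the protection fiber alone.

```python
def four_fiber_action(working_failed, protection_failed, node_failed=False):
    """Protection action in a four-fiber OMS-SPRing, per the rules in the text."""
    if node_failed or (working_failed and protection_failed):
        return "ring switch"  # loop back around the ring at the adjacent OADMs
    if working_failed:
        return "span switch"  # move to the parallel protection fiber, no loop-back
    return "no action"        # assumption: protection fiber alone failed, working traffic intact


def recover_all(failures):
    """failures: span name -> (working_failed, protection_failed)."""
    return {span: four_fiber_action(*state) for span, state in failures.items()}


# Single working-fiber failures on links A-B and E-F, as in the example above
print(recover_all({"A-B": (True, False), "E-F": (True, False)}))
```

Because each span switch uses only the protection fiber of its own span, the two repairs do not compete for capacity. This is why both connections survive this double failure.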

The OMS-SPRing architecture needs only a limited number of protection

switches, because the OMS bundle of optical channels is switched as a whole.

However, because of this collective switching action, it cannot cope effectively

with a failure that affects only a single wavelength channel (e.g., a failure of an

optical transmitter in an opaque OXC or of a single mirror in a MEMS-based OXC

design).

3.5.2 Optical Channel Protection in Ring-Based Optical Networks

The second type of protection scheme in ring-based optical networks is deployed at

the OCh layer and performs optical channel protection switching. The protection

granularity is thus the capacity of a single wavelength. Again a distinction can be

made between DPRings and SPRings.

OCh Dedicated Protection Rings

OCh DPRing is a dedicated protection scheme that requires two fibers in the ring.

Each wavelength demand is routed on a working path along one side of the ring and a dedicated backup path along the opposite side of the ring. Bidirectional

wavelength demands are supported by two wavelengths, one in each direction.



[Figure 3.28 diagram, three panels: six-node ring with OADMs A, B, C, D, E, and F.]

Figure 3.28 The four-fiber optical multiplex section shared protection ring architecture in a failure-free condition (top), after recovering from a single fiber fault using a span switch (middle), and after recovering from an optical add/drop multiplexer fault using a ring switch (bottom).



Both working wavelengths of the bidirectional wavelength demand can be routed

along the same side of the ring, in different fibers and using the same wavelength

(Figure 3.29).

An alternative is to route both working wavelengths on different sides of the ring, so that one fiber of the two-fiber ring transports only working traffic while the other fiber transports only protection traffic. The wavelengths thus cannot be shared by wavelength demands between other node pairs. The protection switching occurs at the OCh layer. When a link or node failure occurs in the

ring, the affected traffic is switched to the protection path.

Two alternative protection schemes can be implemented: 1+1 or 1:1 dedicated

protection. In the former case, when an optical splitter is used at the sending side,

single-ended switching takes place at the receiving side, based on the monitoring

information of the optical channel. No complicated signaling protocol is required,

making the single-ended 1+1 protection scheme simple and robust. In the latter

case, the traffic at the sending side is not permanently bridged. Dual-ended switch-

[Figure 3.29 diagram: six-node ring with OADMs A, B, C, D, E, and F.]

Figure 3.29 Optical channel dedicated protection ring in a failure-free condition (top) and after a failure (bottom).



ing is required, and a switching protocol is needed to coordinate the switching

action at both ends. The advantage of deploying 1:1 instead of 1+1 protection is

that the spare capacity can be used to accommodate low-priority traffic (extra

traffic) in failure-free conditions, which can be preempted in the event of a failure

to provide the spare capacity for the high-priority failing connection. This 1:1

scheme is, however, more complex because the recovery actions of the various

OADMs on the ring must be coordinated.
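The contrast between the two dedicated schemes can be made concrete with a simplified model. This is not an APS implementation. The function select_1plus1 shows the single-ended selection at the receiving OADM, assumed here to be non-revertive. The hypothetical class OneForOne shows only how extra traffic on the spare channel is preempted when the working path fails. The coordination between both ends is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


def select_1plus1(current, working_ok, protection_ok):
    """1+1: the signal is permanently bridged onto both paths, so the receiving
    end alone picks the copy to use (single-ended, no signaling).
    Assumed non-revertive: stay on protection after the working path is repaired."""
    if current == "working" and not working_ok and protection_ok:
        return "protection"
    if current == "protection" and not protection_ok and working_ok:
        return "working"
    return current


@dataclass
class OneForOne:
    """1:1: the spare channel carries preemptible extra traffic until needed."""
    extra_traffic: Optional[str] = "low-priority traffic"
    on_protection: bool = False

    def working_failed(self):
        # In a real ring both ends switch together under a switching protocol;
        # here only the preemption of the extra traffic is modeled.
        preempted, self.extra_traffic = self.extra_traffic, None
        self.on_protection = True
        return preempted


print(select_1plus1("working", working_ok=False, protection_ok=True))  # protection
print(OneForOne().working_failed())  # low-priority traffic
```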

OCh Shared Protection Rings

The OCh-SPRing scheme (Figure 3.30) is the only ring protection scheme in the

optical network layer that has no equivalent in the SONET/SDH layer. It is

implemented as a two-fiber ring. On each fiber, half of the wavelengths are reserved

for working traffic, and the other half for protection traffic. Working channels in

one fiber are protected by protection channels in the other fiber. The protection

[Figure 3.30 diagram: six-node ring with OADMs A, B, C, D, E, and F.]

Figure 3.30 Optical channel shared protection ring in a failure-free condition (top) and after a failure (bottom).



channels travel around the ring in the opposite direction to the working channels.

The two directions of a bidirectional wavelength demand are routed on the same

side of the ring, in different fibers. The same wavelength can, thus, be reused for

another nonoverlapping demand between a different node pair. For instance, in

Figure 3.30, the nonoverlapping connections A-D and E-F can use the same

wavelength. When a failure occurs, the affected optical channels are switched at

the terminating OADMs to the other side of the ring and then use the protection

channels in the fiber (Figure 3.30). Bidirectional traffic demands will, thus, need to

be routed using different wavelengths for both directions, otherwise wavelength

conversion is required when traffic on a working channel is switched to a protection

channel in the opposite direction.

In this shared approach there is no dedicated protection wavelength for each

working path, but a pool of shared recovery resources is available for affected

working connections. The backup paths are formed only after the failure has

occurred. When the failure affects all links between two adjacent OADMs in the

ring, no loop-back switching action is performed, but a direct backup path between

source OADM and destination OADM is established. If a failure occurs, coordi-

nation is, thus, needed between the switching actions at the source and destination

OADM of the wavelength demand. Moreover, the switching must be performed for

each affected wavelength. Therefore, the OADMs must be managed by a quite

sophisticated protocol to coordinate the switching and to ensure that the protection

channels are correctly assigned under different fault conditions. Despite its complexity, the OCh-SPRing offers several advantages. Its most important feature is capacity efficiency, because the spare capacity is shared by several working connections.
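The real-time assignment of shared protection channels can be sketched as follows. The model is a deliberate simplification with three assumptions. Working paths run clockwise. Each working wavelength is protected by the same wavelength number on the counterrotating protection channels. The set reserved stands in for the state a real switching protocol would maintain. The sketch also does not recheck earlier grants against later failures, which a real protocol must do. All names are hypothetical.

```python
def arc(ring, src, dst, step):
    """Spans from src to dst, moving step=+1 (working direction) or -1."""
    i, n = ring.index(src), len(ring)
    spans = []
    while ring[i] != dst:
        j = (i + step) % n
        spans.append(frozenset((ring[i], ring[j])))
        i = j
    return spans


def och_spring_switch(ring, demands, failed_span, reserved):
    """Grant shared protection channels to demands hit by a span failure.

    demands:  name -> (src, dst, wavelength); working paths assumed clockwise.
    reserved: set of (span, wavelength) protection channels already in use;
              it is updated in place, as a real-time assignment would be.
    Returns name -> True (switched to protection) or False (blocked).
    """
    failed = frozenset(failed_span)
    outcome = {}
    for name, (src, dst, wl) in demands.items():
        if failed not in arc(ring, src, dst, +1):
            continue  # unaffected demand: no switching action
        needed = {(span, wl) for span in arc(ring, src, dst, -1)}
        if needed & reserved:
            outcome[name] = False  # protection channel already taken
        else:
            reserved |= needed
            outcome[name] = True
    return outcome


ring = ["A", "B", "C", "D", "E", "F"]
demands = {"A-D": ("A", "D", 1), "E-F": ("E", "F", 1)}  # same wavelength, reused
in_use = set()
result = och_spring_switch(ring, demands, ("A", "B"), in_use)
print(result)  # {'A-D': True}
```

Only demand A-D crosses the failed span, so only A-D is switched. It then occupies the protection channels on spans A-F, F-E, and E-D, which is exactly the shared capacity that E-F would also need if it failed.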

Mix of OCh-SPRing and OCh-DPRing

The protection schemes at the OCh level also allow the dedicated and shared ring protection approaches to be mixed and matched, providing the appropriate protection level on a per-wavelength basis. Some wavelength channels can be protected using the 1+1 or 1:1 dedicated OCh-DPRing approach. Other wavelengths can employ the shared OCh-SPRing approach, which allows wavelengths to be reused and extra traffic to be accommodated on the ring. Still other wavelengths might not be protected at all. The protection is thus selected per wavelength channel.

3.5.3 OMS- versus OCh-Based Approach

In an OMS-based protection ring, the switching action in the OADMs is based

on OMS-level failure indications. The whole multiplexed bundle of optical

channels within the OMS is switched as a group. Also the APS signaling is

supported at the OMS level. In OCh-based protection rings, on the other hand,

whether the protection is dedicated (OCh-DPRing) or shared (OCh-SPRing), not

all wavelengths belonging to the OMS need to be switched at once. This means that

failures that affect only a single wavelength of the multiplexed bundle (typically a



failing transmitter or receiver) can be handled more efficiently. OCh-based rings can also support different protection schemes in the various wavelengths of the multiplexed bundle, which better accommodates the needs of the clients.

In the OCh rings, no loop-back switching action is performed, while this is the

case in the OMS-SPRing with two fibers. An OCh-based ring protection scheme

ensures that the optical signal will never be transported over a distance longer than

the circumference of the ring. In the case of an OMS ring, the protection path can

in the worst case span almost twice the entire ring circumference. This also influ-

ences the potential size of the ring network. Optical signals suffer from signal

degradation and signal distortion, and to ensure a correct interpretation of the

signal at the endpoints, they need to be amplified or even regenerated at regular

intervals. If we assume transparent OADMs (without regeneration of the optical

signal), the total length of the OCh ring may, thus, be longer than that of the two-

fiber OMS-SPRing because no extra ring length is added for the loop-back switch.
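The length argument can be quantified with one line of algebra. Let C be the ring circumference, W the length of the working path, l the length of the failed span, and a the distance from the source to the failure. In a two-fiber OMS-SPRing, the looped-back signal travels a + (C - l) + (W - a - l) = C + W - 2l. In an OCh ring, it takes the other side of the ring, a distance of C - W. The Python function below is a sketch of these two expressions. The symbols are our own notation, and the example ring is hypothetical.

```python
def protected_signal_length(circumference, working, failed_span):
    """Distance the signal travels after a span failure on its working path.

    circumference C, working-path length W and failed-span length l share one unit.
    OCh ring:  the end nodes switch to the other side of the ring -> C - W.
    Two-fiber OMS-SPRing loop-back: a + (C - l) + (W - a - l) = C + W - 2l,
    where a (source to failure) cancels out.
    """
    if not 0 < failed_span <= working < circumference:
        raise ValueError("expected 0 < failed_span <= working < circumference")
    och = circumference - working
    oms = circumference + working - 2 * failed_span
    return och, oms


# Hypothetical 400-km ring, 350-km working path, 25-km failed span
print(protected_signal_length(400, 350, 25))  # (50, 700)
```

As W approaches C and l becomes small, the OMS loop-back length approaches 2C. The OCh backup path, by contrast, never exceeds C. For the four-node ring of Figure 3.25 with unit-length spans, the function gives 2 and 4 hops, which matches the two paths traced there.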

3.5.4 Shared versus Dedicated Approach

Because of the shared nature of the SPRings, they make more efficient use of the

ring capacity than DPRings. With SPRings, there is a pool of spare resources in the

ring that can be shared by the working connections. In contrast, in DPRings, each

working wavelength around the ring has a dedicated protection wavelength.

Table 3.5 gives an overview of the capacity requirements of an OCh-DPRing

and an OCh-SPRing with n nodes for three traffic patterns. For the star traffic

Table 3.5 The Required Number of Wavelengths in a Two-Fiber OCh Ring with n Nodes

Traffic pattern | OCh-DPRing | OCh-SPRing
Cyclic | n | 2
Star | n - 1 | n - 1 if n odd; n if n even
Full mesh | n(n - 1)/2 | (n + 1)(n - 1)/4 if n odd; n(n + 2)/4 if n even

Source: From T. Shiragaki, S. Nakamura, M. Shinta, N. Henmi, S. Hasegawa, ‘‘Protection architecture and applications of OCh shared protection rings,’’ Optical Network Magazine, Vol. 2, No. 4, July/August 2001, pp. 48–58.



demand that is typical for a metropolitan area network, both ring protection

schemes require a comparable number of wavelengths. For the full-mesh traffic

pattern, usually found in the backbone network, the OCh-SPRing performs much better in terms of capacity than the OCh-DPRing: for larger n, the required number of wavelengths approaches half that of an OCh-DPRing.
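The expressions of Table 3.5 are easy to evaluate for concrete ring sizes. The following sketch transcribes the table into Python (the function names are hypothetical). For the full-mesh pattern, it prints the ratio of OCh-SPRing to OCh-DPRing wavelengths.

```python
def och_dpring(pattern, n):
    """Wavelengths needed by a two-fiber OCh-DPRing with n nodes (Table 3.5)."""
    if pattern == "cyclic":
        return n
    if pattern == "star":
        return n - 1
    if pattern == "full-mesh":
        return n * (n - 1) // 2
    raise ValueError(pattern)


def och_spring(pattern, n):
    """Wavelengths needed by a two-fiber OCh-SPRing with n nodes (Table 3.5)."""
    odd = n % 2 == 1
    if pattern == "cyclic":
        return 2
    if pattern == "star":
        return n - 1 if odd else n
    if pattern == "full-mesh":
        # both expressions are integers: n*n - 1 is divisible by 8 for odd n
        return (n + 1) * (n - 1) // 4 if odd else n * (n + 2) // 4
    raise ValueError(pattern)


for n in (8, 16, 32, 64):
    dp, sp = och_dpring("full-mesh", n), och_spring("full-mesh", n)
    print(n, dp, sp, round(sp / dp, 3))
```

The ratio falls toward 1/2 as n grows. Both OCh-SPRing expressions behave like n²/4, while n(n - 1)/2 behaves like n²/2.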

The shared protection approach has the great advantage that it consumes less

capacity around the ring, because of the pool of spare resources available for the

backup paths. However, SPRings are more complex than DPRings. Dual-ended

switching is needed. The assignment of protection channels to the affected working

channels is done in real time. A sophisticated protection switching protocol is, thus,

required to coordinate and supervise the recovery actions of the OADMs and to

ensure that the protection channels are correctly assigned under different fault

situations. With SPRings there is also the potential problem of misconnection in

the case of an OADM failure. This can be explained using Figure 3.31. In the

failure-free condition (Figure 3.31, top), two connections, A-C and C-D, are routed

[Figure 3.31 diagram: six-node ring with OADMs A, B, C, D, E, and F.]

Figure 3.31 Misconnection in a two-fiber shared protection ring after failure of optical add/drop multiplexer C.



using the same wavelength on the fiber, and they both have OADM C as an endpoint. When this OADM fails (Figure 3.31, bottom), the loop-back procedure will try to connect both connections to each other. This establishes an unwanted connection A-D between the two surviving endpoints, neither of which is the failed OADM. To prevent this misconnection, a squelching mechanism is needed (see Chapter 2, Section 2.4, for more details).
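The squelching decision can be sketched from the information each OADM adjacent to the failure would need: which connections on the looped-back wavelength terminate at the failed node. The function below is a hypothetical illustration, not the squelch-table mechanism of any standard. For the scenario of Figure 3.31, it identifies the connections to squelch and the misconnection that would arise without squelching.

```python
def squelch_check(connections, failed_node):
    """connections: name -> (end1, end2), all using the same wavelength.

    Returns the connections whose traffic must be squelched because they
    terminate at the failed OADM, and the unwanted connection that the
    loop-back would otherwise create between their surviving endpoints.
    """
    squelched = sorted(name for name, ends in connections.items() if failed_node in ends)
    survivors = sorted(end for name in squelched
                       for end in connections[name] if end != failed_node)
    would_misconnect = tuple(survivors) if len(survivors) == 2 else None
    return squelched, would_misconnect


print(squelch_check({"A-C": ("A", "C"), "C-D": ("C", "D")}, "C"))
# (['A-C', 'C-D'], ('A', 'D'))
```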

3.5.5 Interconnection of Rings

The size (fiber length and number of nodes) of a ring is limited by a number of physical constraints, such as transmission impairments (attenuation, loss, etc.), by the time it takes for the protection switch to be executed, and by availability constraints, similar to what is described in Chapter 2 for SONET/SDH networks.

A large-scale network thus typically is not covered by a single ring but will consist

of several interconnected rings. The interconnection options for optical rings are the

same as the ones explained for SONET/SDH rings in Chapter 2, Section 2.4.4. The

simplest solution is to have two OADMs at the interconnection points, installed

back to back: The optical multiplexed signals or the optical channels that have to

change between rings are dropped at the first OADM and added at the second

OADM. A second solution is to add flexibility in the interconnection point by

installing an OXC between the two OADMs. The option with the most flexibility is,

however, to place only an OXC in the interconnection point of the ring, allowing

traffic to be added/dropped, to pass through the ring, or to change rings. Again, as

in the case of SONET/SDH, a single point of failure for ring interconnection can be

avoided by deploying a drop-and-continue (D&C) scheme. With the D&C scheme,

two rings are always interconnected by two nodes. Instead of simply dropping the

signal that has to be handed over from one ring to another at the first interconnec-

tion point between both rings, the signal also continues on the first ring and is

handed over again at the second interconnection point between both rings. In this

way, the network can always recover from single failures. For more details on D&C

and the difference between D&C in the different ring types, see Chapter 2, Section

2.4.4.

A difference from SONET/SDH-based ring networks is that optical ring networks use an analog transmission medium. The signal gets distorted by attenuation, noise, dispersion, and nonlinear effects such as self-phase modulation. The signal should thus be reamplified and even regenerated at regular intervals. Transponders can act as such 3R regenerators. If transparent OADMs are used, the ring interconnection points are a good choice to place these regenerators, so that each optical ring becomes an island of transparency.

3.6 Recovery Mechanisms in Mesh-Based Optical Networks

Recovery schemes in mesh-based optical networks are under study by standardiza-

tion organizations including the ITU-T, the OIF, and T1X1 (see Section 3.4.2).



These schemes will undoubtedly closely resemble the corresponding schemes in the SONET/SDH layer.

A first distinction that can be made concerning recovery schemes in a mesh-

based optical network is between protection and restoration schemes. For protec-

tion, the recovery paths are preplanned and fully signaled before a failure occurs.

Hence, when a failure occurs, no additional signaling is needed to establish the

protection path. For restoration, the recovery paths can be either preplanned or

dynamically allocated, but when a failure occurs additional signaling will be needed

to establish the restoration path. Protection is further discussed in Sections 3.6.1

and 3.6.2, and restoration in Section 3.6.3. The comparison of both is discussed in

Section 3.6.4.

Another distinction is based on the extent of the recovery schemes. A recovery

scheme can be implemented at the OMS layer or at the ODU layer. In the latter case,

each working optical channel is protected individually between its source node and

its destination node. This is called a path-based recovery scheme, because a recovery

path between the source and destination nodes of the working lightpath is applied. In

the former case, the complete bundle of multiplexed optical channels is protected

between the endpoints of the OMS (OXCs or OADMs). Protection at the ODU level

has the advantage that it can survive node failures. This is not the case for protection

at the OMS level, because the OMS does not transit the nodes. A recovery scheme at

the OMS layer is called a link-based recovery scheme. Only a local recovery path

between the endpoints of the failed link is used to work around it. A link recovery

mechanism replaces only the affected part of the working path, leaving the remaining

part of it unaltered. Both approaches are illustrated in Figure 3.32. Of course, the granularity of the recovery switching action differs for a link-based and a path-based recovery scheme. With link-based recovery all the lightpaths that travel along a failed link are simultaneously rerouted (the multiplexed bundle of wavelength channels is switched as a whole). Path-based recovery, on the other hand, needs to switch each affected lightpath individually onto its alternative path between the endpoints of the lightpath.

Figure 3.32 Recovery extent: link versus path recovery. [Panels: working path with a link recovery path; working path with a path recovery path.]

A local link recovery strategy has a number of disadvantages. The resulting backup path is often not the shortest alternative path. This is the case in Figure 3.32, where the complete backup path with link recovery crosses five links, whereas the backup path resulting from path recovery crosses only three links and is in fact only as long in terms of hops as the working path. Link recovery may even lead to back-hauling, because the working capacity is looped back to recover from a failure (Figure 3.33). Back-hauling increases the length of the backup path. In optical networks this influences the placement of amplifiers and transponders, which are needed to guarantee a good signal quality. With a path recovery scheme, back-hauling is avoided. These remarks already indicate that path-based recovery schemes perform better (higher capacity efficiency, better signal quality, etc.) than link-based schemes. This will be confirmed by the case study discussed in Section 3.6.4.
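The hop-count argument can be reproduced with a small Python sketch. The topology below is hypothetical, chosen only so that the numbers match the description of Figure 3.32, and minimum-hop breadth-first search stands in for the route computation; it is not the algorithm of any specific product. Link recovery keeps the working path and detours around the failed link, whereas path recovery searches a new route between the lightpath endpoints.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency lists from a list of (u, v) links."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def shortest_path(adj, src, dst, banned=frozenset()):
    """Minimum-hop path (breadth-first search) that avoids the banned links."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev and frozenset((node, nxt)) not in banned:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]

def link_recovery(adj, working, failed):
    """Keep the working path and detour only around the failed link."""
    banned = {frozenset(failed)}
    for i in range(len(working) - 1):
        if frozenset(working[i:i + 2]) == frozenset(failed):
            detour = shortest_path(adj, working[i], working[i + 1], banned)
            return None if detour is None else working[:i] + detour + working[i + 2:]
    return working  # the failed link is not on the working path

def path_recovery(adj, working, failed):
    """Compute a new end-to-end route between the lightpath endpoints."""
    return shortest_path(adj, working[0], working[-1], {frozenset(failed)})

edges = [("S", "A"), ("A", "B"), ("B", "D"),   # working path S-A-B-D
         ("A", "X"), ("X", "Y"), ("Y", "B"),   # local detour around link A-B
         ("S", "P"), ("P", "Q"), ("Q", "D")]   # alternative end-to-end route
adj = build_graph(edges)
working = ["S", "A", "B", "D"]
print(link_recovery(adj, working, ("A", "B")))  # ['S', 'A', 'X', 'Y', 'B', 'D']: 5 links
print(path_recovery(adj, working, ("A", "B")))  # ['S', 'P', 'Q', 'D']: 3 links
```

If the detour returned by `link_recovery` passes through nodes that already lie on the working path, the resulting route doubles back on itself, which is the back-hauling effect of Figure 3.33.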

Figure 3.33 Back-hauling because of loop back of traffic with link recovery (left) is avoided with path recovery (right).

3.6.1 Protection

A simple way to recover from failures in a mesh-based network is to use a protection scheme. Different protection options are available: 1+1, 1:1, 1:N, or M:N (see Chapter 1 and Chapter 2, Section 2.3.4). With 1+1 protection, the traffic signal is duplicated for protection purposes and transmitted over both a working path and a backup path. 1+1 link protection reserves, for each link of the working path, a backup path around that link. With 1+1 path protection, a working and a backup path are reserved between the source node and the destination node of the traffic demand. When both paths are link disjoint28 as well as node disjoint,29 both single link and single node failures can be recovered. When they are only link disjoint, recovery is guaranteed only in the case of single link failures. The receiving end selects a nonfailing signal from both received signals. Switching occurs solely at the receiving end. This is a single-ended switching mechanism because one switching action is sufficient to recover the affected signal. The advantage of 1+1 protection is that it is fast and simple, but a drawback is that the backup resources are permanently occupied. Path protection will typically use less capacity than link protection, because with link protection the backup paths are in general longer than with path protection.

28Link disjoint means that the working path has no links in common with the backup path. Note that node disjointness implies link disjointness, but not vice versa.
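The difference between link-disjoint and node-disjoint path pairs can be checked mechanically. The Python sketch below is a minimal model, assuming that paths are node lists and that a failure is either a node or an undirected link; it reports whether a 1+1 protected connection survives a given single failure, that is, whether the receiving end can still select an intact signal. The helper names are illustrative only.

```python
def links_of(path):
    """Set of undirected links traversed by a path given as a node list."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def is_link_disjoint(working, backup):
    return not (links_of(working) & links_of(backup))

def is_node_disjoint(working, backup):
    # Source and destination are always shared; only intermediate nodes count.
    return not (set(working[1:-1]) & set(backup[1:-1]))

def survives_single_failure(working, backup, failure):
    """True if the 1+1 connection still delivers traffic. `failure` is a
    node name, or a link given as frozenset({u, v})."""
    def hit(path):
        if isinstance(failure, frozenset):
            return failure in links_of(path)
        return failure in path
    return not (hit(working) and hit(backup))

working = ["A", "B", "C"]
backup = ["A", "D", "B", "E", "C"]  # link disjoint, but shares node B
print(is_link_disjoint(working, backup), is_node_disjoint(working, backup))  # True False
print(survives_single_failure(working, backup, frozenset({"A", "B"})))      # True
print(survives_single_failure(working, backup, "B"))                         # False
```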

With 1:1 protection, on the other hand, the backup resources are used only to

ensure recovery when a failure has occurred. The advantage is that in failure-free

condition, the spare resources can be used to accommodate so-called extra traffic.

This is additional traffic with lower priority than normal working traffic that is

preempted and dropped when the spare resources are needed to perform the

recovery action. The disadvantage of 1:1 protection is that selection and switching now have to be done at both the sending and the receiving end (dual-ended switching). An APS protocol is, thus, needed to coordinate the recovery action. The 1:1 protection scheme is therefore somewhat more complex than 1+1 protection. Moreover, if extra traffic is accommodated, preempting it takes time and may slow down the recovery process.

The 1:N and M:N protection schemes are shared variants of the 1:1 protection

scheme. With 1:N protection, one spare path is shared between N working paths.

With M:N (M<N), M spare paths are shared among N working paths, which makes this scheme rather complex to implement.
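The behavior of a shared 1:N group, including the preemption of extra traffic, can be summarized as a small state machine. The Python sketch below is a hypothetical model rather than an APS implementation: it only tracks which working path, if any, currently uses the single spare path, and it drops the extra traffic when the spare is claimed. A second failure while the spare is in use cannot be recovered, which is the price paid for sharing.

```python
class OneForNGroup:
    """1:N protection: one spare path shared by N working paths.
    (1:1 protection is the special case N = 1.)"""

    def __init__(self, n_working, extra_traffic=False):
        self.n_working = n_working
        self.extra_traffic = extra_traffic  # low-priority traffic on the idle spare
        self.spare_user = None              # working path currently using the spare

    def on_failure(self, i):
        """Handle failure of working path i; return True if it is recovered."""
        if not 0 <= i < self.n_working:
            raise ValueError(f"no working path {i}")
        if self.spare_user is not None:
            return False                    # spare busy: this failure is not recovered
        if self.extra_traffic:
            self.extra_traffic = False      # preempt (drop) the extra traffic first
        # Dual-ended switching: the head end bridges and the tail end selects the
        # spare, coordinated by an APS protocol (not modeled here).
        self.spare_user = i
        return True

    def on_repair(self, i):
        if self.spare_user == i:
            self.spare_user = None          # revert to the repaired working path

group = OneForNGroup(3, extra_traffic=True)
print(group.on_failure(0), group.extra_traffic)  # True False
print(group.on_failure(2))                       # False: spare already in use
group.on_repair(0)
print(group.on_failure(2))                       # True
```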

In optical networks that cover a large geographic area, multiple simultaneous

failures are not that uncommon for very long connections. One way to decrease the

probability that, for example, a double failure affects both the working path and the

recovery path of a long (e.g., intercoastal) connection is to break up such a long

connection into shorter connections with independent protection resources. Care should then also be taken to avoid a single point of failure in such a design.

3.6.2 Protection in a WP Network versus Protection in a VWP Network

In Section 3.1.4, different types of OXCs were discussed. A first type of OXC, called

wavelength routing OXC (WR-OXC), is not able to perform wavelength conver-

sion. A network with this type of OXCs installed is a WP network, where the

lightpath between source and destination OXC has to be conveyed using the same

wavelength channel on all links along the path. A further distinction can be made

between a WR-OXC with and without wavelength tunability at the transmitter and

receiver. With a WR-OXC without tunability, the working path and the recovery

path have to use the same wavelength. A WR-OXC with tunability does not have

29Node disjoint means that the working path has no nodes in common with the backup path, apart from the source and destination nodes. This condition is more restrictive than link disjointness.



this restriction, enabling the use of different wavelengths for the working and recovery paths. A second type of OXC, the wavelength translating OXC (WT-OXC), can perform wavelength conversion and leads to a VWP network, where the wavelength continuity constraint along a lightpath no longer has to be met.
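The wavelength continuity constraint can be illustrated with first-fit wavelength assignment, a common heuristic used here only as an example; it is not necessarily the assignment method behind Figure 3.34. In the Python sketch below, `free` is a hypothetical map from each link of a route to its set of free wavelength indices.

```python
def assign_wp(route, free):
    """WP network (WR-OXCs, no conversion): the lowest-numbered wavelength
    that is free on every link of the route, or None if blocked."""
    common = set.intersection(*(set(free[link]) for link in route))
    return min(common) if common else None

def assign_vwp(route, free):
    """VWP network (WT-OXCs): each link may use a different free wavelength."""
    if any(not free[link] for link in route):
        return None
    return [min(free[link]) for link in route]

free = {"A-B": {1, 2}, "B-C": {3}}       # free wavelength indices per link
print(assign_wp(["A-B", "B-C"], free))   # None: no common free wavelength
print(assign_vwp(["A-B", "B-C"], free))  # [1, 3]: converted at node B
```

A blocked WP request like the one in this example is the kind of wavelength conflict that forces a WP network to provide somewhat more wavelengths than a VWP network.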

Figure 3.34 illustrates the typical influence on the total network cost of using

WR-OXCs with or without tunability or using WT-OXCs, using the results

obtained for two sample networks with a static traffic demand.

From Figure 3.34, it is clear that the cost of WP and VWP path protection does not differ much. The WP network requires somewhat more wavelengths than the

VWP network to resolve wavelength conflicts: 5% for the 32-node network, 15% for

the 16-node network, because in the latter case fewer fibers per link are needed,

making the wavelength assignment problem more difficult. With wavelength tun-

ability, the difference in required wavelength channels between WP and VWP is less

than 5% for both network sizes.

Figure 3.34 Comparison of the cost for a 16-node network (left) and a 32-node network (right) between path protection with wavelength routing optical cross-connects (OXCs) with tunability (wavelength path [WP] protection + tunability) and without tunability (WP protection), and with wavelength translating OXCs (virtual WP protection). [Bar charts: cost of WP protection, WP protection + tunability, and VWP protection.] (P. Arijs, B. Van Caenegem, P. Demeester, P. Lagasse, W. Van Parys, P. Achten, "Design of ring and mesh based WDM transport networks," Optical Networks Magazine, Vol. 1, No. 2, July 2000, pp. 25–40.)

3.6.3 Restoration

Until recently, 1+1 or 1:1 dedicated protection was the only realistic choice for a network operator to make a meshed OTN resilient against failures. With 1+1 dedicated protection, everything is calculated before the failure occurs: the route of the protection path and the wavelength assignment in the case of transparent optical networking. In addition, the cross-connects on the backup route are switched beforehand. The routing and wavelength assignment problem can, thus,

be solved off-line. With the implementation of an IP-based optical control plane

(see Chapter 6, Section 6.1), however, restoration can become a real option for

providing resilience in the optical backbone network. Just as in SONET/SDH,

restoration schemes in an OTN will undoubtedly be superior in terms of capacity

efficiency compared to protection schemes, but the implementation of a mesh

restoration scheme is quite complex and requires sophisticated algorithms. Restor-

ation is usually also slower than protection. With restoration, capacity in excess of

the working capacity needed to support the normal working traffic is provided in

the network. This spare capacity, which is shared among the various working

connections, will be used to recover from failures. One must, however, keep in

mind that restoration schemes in meshed optical networks are not (yet) standard-

ized. If used today, they are based on proprietary schemes.

Several options for restoration in a meshed optical network can be envisaged.

The choice can be made between a link-based scheme (link restoration, working at

the OMS layer) and a path-based scheme (path restoration, working at the ODU

layer). This choice has a rather large influence on the recovery implementation and

requirements. In the case study explained in Section 3.6.4, we will see that link restoration typically requires more capacity than path restoration, because of the often suboptimal routes found with link restoration (e.g., because of back-hauling) and because path restoration has a broader view of the network. Path restoration will

typically distribute the backup routes of the affected connections over a larger part

of the network than link restoration, allowing more opportunities to optimize the

spare capacity needed in the network. The recovery extent also has an influence on

the complexity of the recovery scheme. With path restoration, a restoration path

has to be found for each affected working lightpath, whereas with link restoration

the affected lightpaths are switched per multiplexed bundle and only a single

restoration path per OMS has to be found. Thus, with link restoration the route

computation process is easier and the number of required switching actions is

limited compared to path restoration (often resulting in a lower recovery time),

but the capacity efficiency is lower.

An example of a shared restoration scheme [Lab02] in a meshed optical

network is illustrated in Figure 3.35. In this figure, two working paths between

different OXCs are depicted (solid line). They each have a protection path that is

node disjoint with the working path they protect. However, both backup paths are

routed on a common link, on which they can share an optical channel, reducing the capacity needed in the network to protect against single link or node failures.

This is indeed a restoration scheme because the OXCs at the ends of the link that

is shared by both backup paths need to be configured according to the failure that

has happened. For instance, on the right-hand side of Figure 3.35, when working

path 2 gets interrupted, both OXCs must be configured for backup path 2.
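The two backup paths in Figure 3.35 can share a channel because no single failure interrupts both working paths at once. A minimal Python sketch of this reasoning follows, assuming that only single link or node failures are accounted for and that each backup path is node disjoint from its working path: the number of spare channels a link needs is the largest number of backup paths on that link that any single failure activates simultaneously. The function names and topologies are hypothetical.

```python
def links_of(path):
    return {frozenset(pair) for pair in zip(path, path[1:])}

def is_hit(path, failure):
    """A failure is a link (frozenset of two nodes) or an intermediate node."""
    if isinstance(failure, frozenset):
        return failure in links_of(path)
    return failure in path[1:-1]

def shared_spare_channels(demands, link):
    """Spare channels needed on `link` so that every single failure of a
    working-path element can be restored. demands: (working, backup) pairs."""
    link = frozenset(link)
    failures = set()
    for working, _ in demands:
        failures |= links_of(working) | set(working[1:-1])
    return max((sum(1 for w, b in demands if is_hit(w, f) and link in links_of(b))
                for f in failures), default=0)

disjoint = [(["A", "B"], ["A", "X", "Y", "B"]), (["C", "D"], ["C", "X", "Y", "D"])]
print(shared_spare_channels(disjoint, ("X", "Y")))     # 1 channel instead of 2
overlapping = [(["A", "M", "B"], ["A", "X", "Y", "B"]),
               (["C", "M", "B"], ["C", "X", "Y", "B"])]
print(shared_spare_channels(overlapping, ("X", "Y")))  # 2: failure of M-B hits both
```

Dedicated protection would always reserve one channel per backup path on the shared link; the sketch shows why sharing saves capacity only when the protected working paths cannot fail together.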

Another classification basis is the route computation moment. The restoration

route can be calculated before (preplanned restoration) or after the failure occurs

(dynamic restoration). The same applies to the wavelength assignment. However,

the actual switching action in the OXCs can be performed only after the failure has



occurred, as only then the shared extra capacity is available to recover the affected

connection(s). With preplanned restoration, the network will usually recover faster

from a failure (no time needed for route calculation). All nodes will contain cross-

connection maps, which indicate the cross-connection that is required, ensuring fast

local actions in the OXCs. The preplanned recovery process is simpler and the

restoration routes used during the recovery process are the ones that were envisaged

to be used. With dynamic restoration, the restoration route resulting from the

calculation scheme may be different from the one that was envisaged during the

network design. As a consequence, the spare capacity provided in the network is not

used in the foreseen way, and some failures may not be recovered, although

sufficient spare capacity is provided in the network overall. On the other hand,

dynamic restoration will be able to react to unexpected failures, which is not the

case with preplanned restoration. The example shown in Figure 3.35 is a path

restoration scheme with precalculated backup paths. The wavelengths may or

may not be preassigned.
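The difference between preplanned and dynamic restoration at a single OXC can be sketched as follows. This is a hypothetical Python model: the cross-connection map is a dictionary from a failure identifier to the (input port, output port) pairs the node must connect, and dynamic restoration is represented by an optional route computation callback. In both cases the cross-connections are only made after the failure.

```python
class RestorationAgent:
    """Per-OXC restoration logic (hypothetical model)."""

    def __init__(self, node, cross_connection_map, route_computer=None):
        self.node = node
        # Preplanned restoration: failure id -> cross-connections (in port, out port)
        self.cross_connection_map = cross_connection_map
        # Dynamic restoration: optional callable(node, failure_id) -> list or None
        self.route_computer = route_computer
        self.active = []

    def on_failure(self, failure_id):
        """Return True if this node could set up its part of the restoration path."""
        actions = self.cross_connection_map.get(failure_id)
        if actions is None and self.route_computer is not None:
            actions = self.route_computer(self.node, failure_id)  # computed after the failure
        if actions is None:
            return False  # e.g. an unexpected failure with no preplanned entry
        # The switching itself is only done now, after the failure, because only
        # then is the shared spare capacity available for this connection.
        self.active.extend(actions)
        return True

oxc = RestorationAgent("OXC-5", {"link A-B": [("in-1", "out-3")]})
print(oxc.on_failure("link A-B"), oxc.active)  # True [('in-1', 'out-3')]
print(oxc.on_failure("node C"))                # False: not preplanned
```

Passing a `route_computer` lets the same node react to failures that were not foreseen in its map, at the cost of the extra time needed to compute the route.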

Table 3.6 gives an overview of the types of restoration schemes and compares

them with protection. A similar classification can be found in [Eli03], where even

more restoration scheme variations are distinguished.

The route computation in dynamic restoration can be done in a centralized or a distributed way. In the former case the central route computation entity must have a full overview of the network topology and state (e.g., link utilization). In the latter case the network nodes typically have only local information at their disposal. The exact performance of the restoration scheme in terms of capacity depends on this choice. In [Ell03], implementing the path restoration scheme of Figure 3.35 in a distributed manner resulted in a 10% to 15% capacity increase compared to the centralized case.

Figure 3.35 A sample path restoration scheme. [Left: working paths 1 and 2 with backup paths 1 and 2, which share a channel on a common link. Right: working path 2 is interrupted and the shared channel is used by backup path 2.]

Table 3.6 Comparison of Characteristics of Protection and Different Types of Restoration

Scheme        Backup Route Calculation   Wavelength Assignment on Backup Route   Cross-Connection on Backup Route
Restoration   Preplanned                 Preplanned                              After failure
Restoration   Preplanned                 Dynamic                                 After failure
Restoration   Dynamic                    Dynamic                                 After failure
Protection    Preplanned                 Preplanned                              Before failure

A mesh-restorable network can also accommodate extra traffic in its unused spare capacity. This implies that a preemption protocol is needed, which may

slow the recovery process. An overview of restoration algorithms can, for instance,

be found in [Gro04].


As with a SONET/SDH network, applying a protection scheme in a meshed

network requires much more installed capacity (wavelengths) in the network than

applying a restoration scheme. In Figure 3.36 this comparison has been quantified

for a pan-European network with 28 nodes connected by 41 links in a biconnected30

mesh topology. The total fiber length of the links is 25,640 km, and the average

node degree (number of adjacent links that connect a node to adjacent nodes) of the

network is 2.93.
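The quoted average node degree follows directly from the number of nodes and links, since every link contributes to the degree of both of its end nodes. A one-line Python check:

```python
def average_node_degree(num_nodes, num_links):
    # Each bidirectional link adds 1 to the degree of both of its end nodes.
    return 2 * num_links / num_nodes

print(round(average_node_degree(28, 41), 2))  # 2.93
```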

Figure 3.36 clearly illustrates that 1+1 dedicated protection requires the most wavelengths in the network. In fact, typically more than 50% of the capacity installed in the network with a dedicated protection scheme is spare wavelength capacity. This is because in almost all cases the backup path is longer than the working path, and thus uses more wavelengths. In this example, 60% of the required wavelength capacity is used for the protection paths and 40% for the working paths. There is only a very small difference between the cases where the working and backup paths are link disjoint or node disjoint. In the latter case somewhat more capacity is

needed because the network is protected against both single link and single node

failures, whereas in the former case only recovery from single link failures can be

guaranteed. Restoration lowers the required amount of spare capacity in the

network. With path restoration, in which a backup path is calculated between the

endpoints of the affected working path, less spare capacity is needed in the network

than with link restoration, in which a local backup path around the failed link from

the working path is established. Using path restoration the ratio between spare

capacity and total capacity is around 40%. Restoration is thus more capacity

efficient than protection, but the time needed to complete the recovery process is

typically much longer with restoration than with protection. The recovery time with

restoration lies between hundreds of milliseconds and tens of minutes. With protec-

tion, this is limited to tens of milliseconds.

As explained in Chapter 1, a recovery scheme is chosen to recover from the

so-called expected or accounted failures. Not all network failures are common

30Biconnected means that between each node pair two disjoint paths can be found in the network.



enough to justify the use of (often capacity hungry) recovery schemes to recover

from them (e.g., triple or quadruple network failures). In most cases the

applied recovery scheme is dimensioned to recover from single link and/or node

failures. A restoration scheme, however, shows more flexibility than a protection

scheme in dealing with unexpected failures. With 1+1 dedicated protection, the

traffic cannot reach its destination when a double failure affects both the

working path and the protection path. With restoration, there is often more than

one option for the backup path, making it possible to recover from unexpected

double failures.

Figure 3.36 Wavelength usage for different recovery schemes in a mesh optical network. [Top: 28-node pan-European topology (Oslo, Stockholm, Copenhagen, Amsterdam, Dublin, London, Brussels, Paris, Madrid, Zurich, Milan, Berlin, Athens, Budapest, Vienna, Prague, Warsaw, Munich, Rome, Hamburg, Barcelona, Bordeaux, Lyon, Frankfurt, Glasgow, Belgrade, Strasbourg, Zagreb). Bottom: number of required wavelengths, split into working path and backup path wavelengths, for no protection, 1+1 path protection (link disjoint), 1+1 path protection (node disjoint), path restoration, and link restoration.] (Top: Adapted from S. De Maesschalck, et al., "Pan-European optical transport networks: an availability based comparison," Photonic Network Communication, vol. 5, no. 3, May 2003, pp. 203–225.)



3.6.5 Protection Combined with Restoration

Instead of making a single choice between applying a restoration scheme or a

protection scheme in an OTN, both schemes can be used simultaneously. For

example, one can distinguish between different classes of traffic with different

transport requirements. One type of traffic could be high-priority traffic, which

should be recovered very quickly. Another traffic type could be normal priority

traffic, for which the recovery times are not that stringent. The latter traffic type

could be recovered using, for example, a restoration scheme, and the former traffic

type could be recovered using 1+1 protection. Another, more exotic, way of combining a restoration and a protection scheme is to use restoration to resolve failures that cannot be recovered by protection. For instance, with a double failure that affects both the working path and the backup path of a 1+1 protected connection, restoration could be used for recovery.
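The combination of 1+1 protection with restoration as a fallback can be sketched in a few lines of Python. The sketch assumes that only link failures occur, ignores the capacity available on the restoration route, and uses minimum-hop breadth-first search as a stand-in for whatever route computation a real restoration scheme would use; the topology and function names are hypothetical. It also illustrates why restoration can recover unexpected double failures as long as an alternative route remains.

```python
from collections import defaultdict, deque

def links_of(path):
    return {frozenset(pair) for pair in zip(path, path[1:])}

def restore(adj, src, dst, failed):
    """Restoration: breadth-first search for any route avoiding failed links."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in prev and frozenset((node, nxt)) not in failed:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]

def recover(adj, working, backup, failed):
    """1+1 protection first; restoration only if both protected paths are hit."""
    for path in (working, backup):
        if not links_of(path) & failed:
            return path  # the tail end selects an intact signal
    return restore(adj, working[0], working[-1], failed)

adj = defaultdict(list)
for u, v in [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("A", "E"), ("E", "C")]:
    adj[u].append(v)
    adj[v].append(u)

double = {frozenset({"A", "B"}), frozenset({"D", "C"})}
print(recover(adj, ["A", "B", "C"], ["A", "D", "C"], double))  # ['A', 'E', 'C']
```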

3.7 Ring-Based versus Mesh-Based Recovery Schemes

Sections 3.5 and 3.6 have explained in some detail the protection and restoration

schemes that can be applied in ring-based and mesh-based optical networks. In this

section, we compare these options.

Figure 3.37 illustrates the difference in cost between protection and restoration

in a mesh-based topology and the OCh-DPRings strategy in a topology that

consists of interconnected rings, for a network with 32 nodes (the same as the one

used to obtain the results of Figure 3.34). Both the link cost and node cost are

considered.

Figure 3.37 shows that the OCh-DPRings strategy using D&C results in the

highest link cost. The difference between the interconnected ring design with and

without D&C is around 5%. This is because with D&C, some connections have to

take a longer route than without D&C to make sure that all rings are interconnected

by two nodes. The link cost with 1+1 dedicated mesh protection is about 19% lower than with interconnected OCh-DPRings.

From Figure 3.37, it is clear that the node cost is significantly lower for the ring-

based schemes than for the mesh-based schemes because of the relatively low price

of OADMs compared to the expensive OXCs used in meshed networks. When

D&C is used, the node cost is 10% to 15% higher, because the traffic between two

rings is exchanged at two nodes to improve the availability. The 1+1 dedicated protection option in a meshed network is the most expensive in terms of node cost.

From Figure 3.37, we can draw a number of general conclusions. Table 3.7

summarizes the pros and cons of all recovery schemes and gives a qualitative

comparison. The conclusions reached are similar to those obtained in Chapter 2

on SONET/SDH networks.

Figure 3.37 Comparison of the cost for a 32-node network between ring-based and mesh-based recovery strategies. [Bars of node cost and link cost for mesh protection, link restoration, path restoration, OCh-DPRing, and OCh-DPRing + D&C.] (P. Arijs, B. Van Caenegem, P. Demeester, P. Lagasse, W. Van Parys, P. Achten, "Design of ring and mesh based WDM transport networks," Optical Networks Magazine, Vol. 1, No. 2, July 2000, pp. 25–40.)

Table 3.7 Qualitative Comparison between the Various Ring- and Mesh-Based Recovery Schemes

Scheme                       Link Cost  Node Cost  Management Cost  Flexibility  Availability  Recovery Time
Dedicated Protection Rings   Higher     Lowest     Low              Mid/low      High          Fast
Shared Protection Rings      Low        Lowest     Mid              Lower        High          Fast
Mesh Path Protection         High       High       Low              Mid          Mid           Fast
Mesh Link Protection         Highest    High/mid   Low              Mid          Mid           Fast
Mesh Path Restoration        Lowest     Mid/low    Higher           High         Mid/high      Slowest
Mesh Link Restoration        Low        Mid        Higher           High         Mid/high      Slowest

Source: From P. Arijs, B. Van Caenegem, P. Demeester, P. Lagasse, W. Van Parys, P. Achten, "Design of ring and mesh based WDM transport networks," Optical Networks Magazine, Vol. 1, No. 2, July 2000, pp. 25–40.

The link cost is higher for dedicated schemes than for shared schemes, because with the former the spare capacity cannot be shared by several working connections, so more capacity is needed. We have seen that link restoration typically needs more capacity than path restoration. With path restoration the spare

capacity is more balanced over the network because this is a global approach. On

the other hand, link restoration is a local scheme. The capacity needed with

DPRings typically lies in between: With DPRings, a protection path is established

within each ring that the traffic traverses (between a local and a global approach).

Restoration schemes typically need much less capacity than protection schemes. Again, restoration in a mesh network is more efficient than shared protection because of its global extent.

In ring networks the nodes are OADMs, whereas in mesh networks OXCs are needed. OADMs are typically less expensive than OXCs. The hardware implementation of the APS protocol is simpler than that of a complex restoration scheme.

Therefore, the node cost will be lower with ring-based schemes. However, in

interconnected ring networks, complex and expensive OXCs may be used, increas-

ing the node cost of ring networks, even more when D&C schemes are implemented.

In mesh networks deploying a restoration scheme, less spare capacity has to be

switched in the OXCs, lowering the node cost compared to networks using a

capacity-hungry protection scheme.

The management cost of the different recovery schemes can be roughly estimated from the amount of signaling needed. The 1+1 unidirectional protection

switching scheme requires no signaling protocol, which leads to a low management

cost. All other protection schemes need an APS protocol implementation, increas-

ing the management cost. Restoration schemes require quite complicated and thus

expensive (distributed or centralized) signaling protocols. With protection, the

backup path is fixed, whereas with restoration several backup paths may be pos-

sible. The chosen back-up path then has to be set up by configuring all the OXCs

along the restoration path. In a shared protection scheme, switching has to be

performed only in the two OADMs involved in the protection scheme.

Flexibility encompasses the ability of the recovery scheme to cope with unex-

pected or unaccounted failures, or unpredicted traffic patterns. One of the advan-

tages of restoration schemes is that they can more easily accommodate churn by

allocating unused network capacity to the restoration process. This is impossible

with a protection scheme in a ring network because 50% of the capacity is always

allocated for recovery purposes. In a ring network capacity has to be added along

the entire ring, or a completely new ring has to be added. In a mesh network the

capacity extension can take place gradually. The flexibility of restoration schemes is

also discussed in Section 3.8.2.

Network availability was introduced in Chapter 1. It reflects the portion of time

the network is operational. The availability of the different recovery schemes is

discussed in more detail in Section 3.8.2. However, a few general statements can be

made here. With ring networks, multiple failures occurring in different rings of the

network can be recovered simultaneously. Ring interconnection points are pro-

tected with D&C. Recovery schemes in mesh networks typically offer recovery only

for the expected or accounted failures (typically one, at most two simultaneous

failures). With mesh path protection schemes, if the working and dedicated protec-

tion paths are affected simultaneously, there is no recovery possible. Mesh restor-



ation schemes offer a higher flexibility, but this depends heavily on the available

spare resources in the network (see also Section 3.8.2). Ring networks, thus,

typically offer a higher availability than mesh networks.

The protection switching in ring- and path-based protection schemes should

take place in less than 50 ms, just as for SONET/SDH. Restoration schemes will

typically require more time (hundreds of milliseconds to tens of minutes).

3.8 Availability

Availability is an important performance assessment factor of recovery schemes.

We start this section with a general overview of terms and definitions used when

performing availability calculations. Next, the availability of an unprotected and

protected connection is discussed, which allows calculating the expected loss of

traffic (ELT). Also the availability when a restoration scheme is deployed is dis-

cussed. Finally, some factors influencing the availability performance, such as the

average node degree of the network topology or the characteristics of the trans-

ported traffic type, are studied.

3.8.1 Availability Calculations

The term availability was introduced in Chapter 1. As explained there, the avail-

ability, A, of an item can be expressed using its mean time to repair (MTTR) (the

time needed for the restoration of the item) and the mean time between failures

(MTBF) (the time between consecutive failures of the item):

$$A = 1 - \frac{\mathrm{MTTR}}{\mathrm{MTBF}} \qquad (3.1)$$

Of course the unavailability, U, of an item can then be expressed as

$$U = 1 - A \qquad (3.2)$$

For a complex system such as a telecommunications network, availability is quite difficult to define and evaluate. In the literature, several definitions of network availability have been presented. In this chapter, we use a straightforward method. Line and node failures are assumed to be statistically independent.

Optical Node Failures

Network elements such as OXCs and OADMs are composed of a (large) set of

different pieces of equipment, each with its own MTBF and MTTR. For more

details on the calculation of the overall MTBF of optical node equipment, we refer

to [G911]. The MTBF of node equipment is usually expressed in hours or using the metric failures in time (FITs), that is, the number of failures in 10^9 hours (roughly 114,155 years). The MTTR is expressed in hours.
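Converting between FITs and an MTBF in hours is a single division, assuming a constant failure rate as is usual for FIT figures. A minimal sketch with illustrative function names:

```python
def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a FIT rate (failures per 1e9 hours), constant rate assumed."""
    return 1e9 / fit


def mtbf_hours_to_fit(mtbf_h: float) -> float:
    """FIT rate corresponding to an MTBF given in hours."""
    return 1e9 / mtbf_h


print(mtbf_hours_to_fit(1e5))    # 10000.0 (an MTBF of 1e5 h, as for an OXC)
print(fit_to_mtbf_hours(2000))   # 500000.0
print(round(1e9 / (365 * 24)))   # 114155 (years in 1e9 hours)
```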



Line Failures

Line failures can, for instance, be caused by a failure in the fiberoptic cable or the failure of an OA or WDM line system. An assumption often made is that fiber failures within a single fiberoptic cable are completely dependent, because most failures are caused by dig-ups, affecting all fibers within the cable. For physical cables the MTBF can be specified using the cable cut (CC) metric. This is the average cable length that results in a single cable cut per year (e.g., CC = 450 km means that per 450 km of cable, there will on average be one cable cut each year). This expresses the fact that the probability of a cable cut is larger for a longer link. The MTBF of the cable is then calculated as

$$
\mathrm{MTBF}\ (\text{hours}) = \frac{\mathrm{CC} \cdot 365 \cdot 24}{\text{length of the cable}} \tag{3.3}
$$

In addition, the metric FITs/km (average number of failures in 10^9 hours per km of cable) can be used to denote the MTBF of a cable. The MTTR for a cable is usually expressed in hours. It includes the time needed to localize the fault, access the cable, repair the break, and put the cable back into service (transmission quality testing, etc.). The MTTR of an undersea cable is typically much longer than that of a terrestrial cable, because of the extra time needed to dispatch a cable ship and crew to do the repair, and the more complicated cable recovery, repair, and replacement. The MTBF of an OA and a WDM line system can again be expressed in hours or in FITs. The MTTR is expressed in hours.
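Equation 3.3 and the FITs/km metric describe the same thing, so one can be derived from the other. The sketch below (illustrative names; 365 · 24 hours per year, as in Equation 3.3) computes the cable MTBF for a 260-km cable with CC = 450 km and shows the equivalent FITs/km value, which is derived here and not given in the text.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, as in Equation 3.3


def cable_mtbf_from_cc(cc_km: float, length_km: float) -> float:
    """Equation 3.3: MTBF (h) = CC * 365 * 24 / cable length."""
    return cc_km * HOURS_PER_YEAR / length_km


def cc_to_fit_per_km(cc_km: float) -> float:
    """One cut per CC km per year, expressed as failures per 1e9 h per km."""
    return 1e9 / (cc_km * HOURS_PER_YEAR)


def cable_mtbf_from_fit_per_km(fit_per_km: float, length_km: float) -> float:
    """MTBF (h) of a cable of the given length from a FITs/km rate."""
    return 1e9 / (fit_per_km * length_km)


mtbf = cable_mtbf_from_cc(450, 260)
print(round(mtbf, 1))                                      # 15161.5
fit_km = cc_to_fit_per_km(450)
print(round(fit_km, 1))                                    # 253.7
print(round(cable_mtbf_from_fit_per_km(fit_km, 260), 1))   # 15161.5
```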

Table 3.8 [Wil01] gives an idea of the typical MTBF and MTTR of important

optical network equipment.

Table 3.8 MTTR and MTBF Values for Fiberoptic Cable, OAs and WDM Line Systems

Equipment                        MTBF (hours)    MTTR (hours)
Bidirectional OA                 5 × 10^5        24
Bidirectional WDM line system    5 × 10^5        6
OXC                              1 × 10^5        6
OADM                             1 × 10^5        6

                                 CC (km)         MTTR (hours)
Terrestrial fiberoptic cable     450             24

A single (bidirectional) line connecting two optical nodes is made up of a series of items, namely pieces of physical cable, a number of OAs (how many depends on the line length and the spacing between the OAs), and a line system at each side of the line. A series of items is available if all individual items are available. If we assume that the items fail statistically independently, we can express its availability as

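$$
\begin{aligned}
A(\text{series item}_1, \text{item}_2, \ldots, \text{item}_N) &= P\big((\text{item}_1 = \text{av}) \text{ and } (\text{item}_2 = \text{av}) \text{ and } \ldots \text{ and } (\text{item}_N = \text{av})\big)\\
&= \prod_i P(\text{item}_i = \text{av}) = \prod_i A_i = \prod_i \left[1 - U_i\right]
\end{aligned} \tag{3.4}
$$

where P(item_1 = av) stands for the probability that item_1 is available.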
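Equation 3.4 is simply a product over the items of the series. A minimal sketch, assuming independent failures as the text does (function names are illustrative):

```python
from math import prod


def series_availability(availabilities):
    """Equation 3.4: product of the item availabilities A_i."""
    return prod(availabilities)


def series_availability_from_u(unavailabilities):
    """Equation 3.4 written with unavailabilities: product of (1 - U_i)."""
    return prod(1.0 - u for u in unavailabilities)


print(round(series_availability([0.99, 0.98]), 4))         # 0.9702
print(round(series_availability_from_u([0.01, 0.02]), 4))  # 0.9702
```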

The availability of a bidirectional line, A_line, connecting nodes n_i and n_j can, thus, be expressed as

$$
A_{\text{line}} = A_{\text{cable}} \cdot A_{\text{OA}}^{N} \cdot A_{\text{line-system}}^{2} \tag{3.5}
$$

where

• A_cable is the availability of the cable between n_i and n_j,
• A_OA is the availability of a bidirectional OA (if one direction of the OA fails, the other direction also goes out of service immediately),
• N is the number of bidirectional OAs needed on this line (depends on the length of the line),
• A_line-system is the availability of a bidirectional line system (if one direction of the line system fails, the other direction also goes out of service immediately).

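The availability of the bidirectional line depicted in Figure 3.38 (260 km of cable and N = 2 OAs) can, thus, be calculated as

$$
\mathrm{MTBF}_{\text{cable}} = \frac{450\ \text{km} \cdot 365 \cdot 24\ \text{h}}{260\ \text{km}} = 15161.5\ \text{h} \tag{3.6}
$$

$$
\begin{aligned}
A_{\text{line}} &= A_{\text{cable}} \cdot A_{\text{OA}}^{N} \cdot A_{\text{line-system}}^{2}\\
&= \left(1 - \frac{\mathrm{MTTR}_{\text{cable}}}{\mathrm{MTBF}_{\text{cable}}}\right) \cdot \left(1 - \frac{\mathrm{MTTR}_{\text{OA}}}{\mathrm{MTBF}_{\text{OA}}}\right)^{2} \cdot \left(1 - \frac{\mathrm{MTTR}_{\text{line-system}}}{\mathrm{MTBF}_{\text{line-system}}}\right)^{2}\\
&= \left(1 - \frac{24\ \text{h}}{15161.5\ \text{h}}\right) \cdot \left(1 - \frac{24\ \text{h}}{5 \cdot 10^{5}\ \text{h}}\right)^{2} \cdot \left(1 - \frac{6\ \text{h}}{5 \cdot 10^{5}\ \text{h}}\right)^{2}\\
&= 0.998417 \cdot 0.999904 \cdot 0.999976\\
&= 0.998297
\end{aligned} \tag{3.7}
$$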
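The worked example of Equations 3.5 to 3.7 can be reproduced with a short script. This is a sketch using the Table 3.8 values as defaults; the function and parameter names are illustrative, not part of the original text.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24


def availability(mtbf_h, mttr_h):
    return 1.0 - mttr_h / mtbf_h                       # Equation 3.1


def line_availability(length_km, n_oa, cc_km=450.0, mttr_cable_h=24.0,
                      mtbf_oa_h=5e5, mttr_oa_h=24.0,
                      mtbf_ls_h=5e5, mttr_ls_h=6.0):
    """Equation 3.5 with Table 3.8 defaults: cable * OA^N * line_system^2."""
    mtbf_cable = cc_km * HOURS_PER_YEAR / length_km   # Equation 3.3
    return prod([
        availability(mtbf_cable, mttr_cable_h),
        availability(mtbf_oa_h, mttr_oa_h) ** n_oa,
        availability(mtbf_ls_h, mttr_ls_h) ** 2,      # one line system at each end
    ])


a_line = line_availability(260, n_oa=2)   # the line of Figure 3.38
print(round(a_line, 6))                   # 0.998297
```

Doubling the line length (and the number of OAs) lowers the availability, mainly through the cable term, which dominates the product.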



Availability of Connections and Load

Once we have the MTBF and the MTTR of the nodes and lines of the network, the

availability of the connections can be calculated. Because a connection is assumed

to be bidirectional, a connection is available only if both directions of this connec-

tion are available. Based on the availability of the individual connections, the

availability of the total traffic load can be calculated. This also allows for the calculation of the expected loss of traffic (ELT),³¹ which is the total amount of traffic that the network is expected to lose every year because of failures [Ver95], and of the average ELT (AELT) per channel. The calculation of the availability of a connection depends on the applied recovery technique (1+1 protection, link or path restoration, etc., or no protection).

Unprotected Connection

An unprotected connection is routed over a series of nodes and lines and is thus

available if all nodes and lines along the route of this connection are available.

Equation 3.4 can thus be applied. Consider the example in Figure 3.39. The connection between node A and node B is available if node A, link A-B, and node B are available.

Equation 3.4 thus leads to

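$$
\begin{aligned}
A(\text{connection A-B}) &= P\big((\text{node A} = \text{av}) \text{ and } (\text{link A-B} = \text{av}) \text{ and } (\text{node B} = \text{av})\big)\\
&= A(\text{node A}) \cdot A(\text{link A-B}) \cdot A(\text{node B})
\end{aligned} \tag{3.8}
$$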

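If link A-B corresponds to the line depicted in Figure 3.38, the availability of the unprotected connection A-B is

$$
\begin{aligned}
A(\text{connection A-B}) &= \left(1 - \frac{\mathrm{MTTR}_{\text{OXC A}}}{\mathrm{MTBF}_{\text{OXC A}}}\right) \cdot 0.998297 \cdot \left(1 - \frac{\mathrm{MTTR}_{\text{OXC B}}}{\mathrm{MTBF}_{\text{OXC B}}}\right)\\
&= 0.99994 \cdot 0.998297 \cdot 0.99994\\
&= 0.998177
\end{aligned} \tag{3.9}
$$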
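Equation 3.9 follows directly from the series rule: the connection is the chain node A, link A-B, node B. A minimal check, assuming the OXC values of Table 3.8 and the line availability from Equation 3.7:

```python
def availability(mtbf_h, mttr_h):
    return 1.0 - mttr_h / mtbf_h        # Equation 3.1


A_LINE_AB = 0.998297                    # result of Equation 3.7
a_oxc = availability(1e5, 6)            # OXC: MTBF = 1e5 h, MTTR = 6 h
a_conn = a_oxc * A_LINE_AB * a_oxc      # Equation 3.8: node A, link A-B, node B
print(round(a_conn, 6))                 # 0.998177
```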

Figure 3.38 Example of a bidirectional line. [Diagram: a WDM line system at each end, two OAs, and cable sections of 80 km, 80 km, and 100 km.]

³¹ Sometimes the term total expected loss of traffic (TELT) is used instead of expected loss of traffic.



Protected Connection

A protected connection is available if the working path or the protection path of the

connection is available. Equation 3.4 can no longer be applied, because we now

have two series of items placed in parallel: the working path and the protection

path. This is illustrated in Figure 3.40. The working path of the connection from node A to node B follows the route node A, link A-B, node B. The protection path, placed in parallel with the working path, runs from node A over link A-C, node C, and link C-B to node B.

Figure 3.39 Availability of an unprotected connection between nodes A and B (network of nodes A, B, and C). (Adapted from S. De Maesschalck, et al. "Pan-European optical transport networks: an availability based comparison," Photonic Network Communication, vol. 5, no. 3, May 2003, pp. 203–225.)

Figure 3.40 Availability of a protected connection (working path A-B, protection path A-C-B). (Adapted from S. De Maesschalck, et al. "Pan-European optical transport networks: an availability based comparison," Photonic Network Communication, vol. 5, no. 3, May 2003, pp. 203–225.)



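The availability and unavailability of items placed in parallel can be calculated as

$$
\begin{aligned}
A(\text{parallel item}_1, \text{item}_2, \ldots, \text{item}_N) &= 1 - U(\text{parallel item}_1, \text{item}_2, \ldots, \text{item}_N)\\
&= 1 - P\big((\text{item}_1 = \text{unav}) \text{ and } (\text{item}_2 = \text{unav}) \text{ and } \ldots \text{ and } (\text{item}_N = \text{unav})\big)\\
&= 1 - \prod_i P(\text{item}_i = \text{unav}) = 1 - \prod_i U_i\\
U(\text{parallel item}_1, \text{item}_2, \ldots, \text{item}_N) &= \prod_i U_i
\end{aligned} \tag{3.10}
$$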

As can be seen in Figure 3.40, the working and protection paths have the source and

destination nodes (nodes A and B) of the protected connection (connection between

A and B) in common. This means that the availability of such a protected connec-

tion must be expressed as

$$
\begin{aligned}
A(\text{protected connection}) &= A(\text{source node}) \cdot A(\text{parallel paths}) \cdot A(\text{destination node})\\
&= A(\text{source node}) \cdot \big(1 - U(\text{working}'\ \text{path}) \cdot U(\text{protection}'\ \text{path})\big) \cdot A(\text{destination node})
\end{aligned} \tag{3.11}
$$

where working' path and protection' path are the working and protection paths without the source and destination nodes of the considered connection.
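Equations 3.10 and 3.11 combine the series and parallel rules. The sketch below is illustrative: the function names are hypothetical, and the example assumes, purely for illustration, that every line in Figure 3.40 has the availability of the Figure 3.38 line (0.998297) and every node is an OXC (0.99994). The actual lines A-C and C-B are not specified in the text.

```python
from math import prod


def parallel_availability(availabilities):
    """Equation 3.10: 1 - prod(U_i), assuming independent failures."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def protected_connection(a_src, a_dst, working_inner, protection_inner):
    """Equation 3.11. *_inner hold the availabilities of the items on
    working' and protection' (paths without source and destination nodes)."""
    a_work = prod(working_inner)
    a_prot = prod(protection_inner)
    return a_src * parallel_availability([a_work, a_prot]) * a_dst


# Hypothetical values for Figure 3.40 (see lead-in)
A_LINE, A_OXC = 0.998297, 0.99994
a_protected = protected_connection(
    A_OXC, A_OXC,
    working_inner=[A_LINE],                    # link A-B
    protection_inner=[A_LINE, A_OXC, A_LINE])  # link A-C, node C, link C-B
a_unprotected = A_OXC * A_LINE * A_OXC
print(a_protected > a_unprotected)   # True
```

Under these assumed values, the remaining unavailability of the protected connection is dominated by the shared source and destination nodes, which no end-to-end path protection can remove.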

Restored Connection

Calculating the availability of connections that use a restoration mechanism (link or path restoration) to cope with failures is more complex. The recovery path is no longer uniquely defined per working path but depends on the failure. Calculating the availability of a single restored connection is therefore not as straightforward as for an unprotected or 1+1 protected connection. Instead of calculating the availability of a single connection, the availability of the total traffic load is often calculated. Let CapCon_i denote the capacity of connection i. The availability of the traffic load can then be expressed as

$$
A(\text{Load}) = \frac{\sum_i A(\text{connection } i) \cdot \mathrm{CapCon}_i}{\sum_i \mathrm{CapCon}_i} \tag{3.12}
$$

This definition, of course, cannot be applied to restored connections, because we do not know the availability of a single restored connection. The method usually applied for availability calculations that take a restoration mechanism into account is therefore as follows: for each possible failure scenario, the probability that the scenario occurs is determined, and the fraction of the traffic that is either not affected or can be restored is calculated. The availability of a load under restoration is then obtained by

$$
A(\text{load}) = 1 - \sum_x \mathrm{Prob}(\text{failure scenario } x) \cdot \left(1 - \frac{\mathrm{CapRecovered}_x}{\mathrm{CapTotal}}\right) \tag{3.13}
$$

where Prob(failure scenario x) is the probability that failure scenario x occurs, CapRecovered_x is the total capacity of the connections that are not affected or can be recovered in failure scenario x, and CapTotal is the total capacity of the complete traffic matrix.
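Equations 3.12 and 3.13 can be sketched as follows. This is a minimal illustration, not the method used in the cited studies: `surviving_capacity` is a hypothetical user-supplied function that returns the capacity that is unaffected or restored in a scenario (in practice it would run the restoration algorithm). Scenarios are enumerated up to `max_failures` simultaneous element failures, and the probability mass of all larger scenarios is counted as total loss, so the result is a lower bound on the availability.

```python
from itertools import combinations
from math import prod


def load_availability(conns):
    """Equation 3.12: conns is a list of (availability, capacity) pairs."""
    return sum(a * cap for a, cap in conns) / sum(cap for _, cap in conns)


def restored_load_availability(unavail, surviving_capacity, cap_total, max_failures):
    """Equation 3.13 truncated to scenarios with at most max_failures failures.

    unavail maps each network element to its unavailability U."""
    elements = list(unavail)
    loss, covered = 0.0, 0.0
    for k in range(max_failures + 1):
        for failed in combinations(elements, k):
            failed_set = set(failed)
            # probability of exactly this set of elements being down
            p = prod(unavail[e] if e in failed_set else 1.0 - unavail[e]
                     for e in elements)
            covered += p
            loss += p * (1.0 - surviving_capacity(failed_set) / cap_total)
    return 1.0 - loss - (1.0 - covered)   # uncovered scenarios count as lost


# Hypothetical toy case: two lines, any single failure can be restored
U = {"L1": 0.01, "L2": 0.01}

def survivors(failed):
    return 10.0 if len(failed) < 2 else 0.0

print(round(restored_load_availability(U, survivors, 10.0, max_failures=1), 6))  # 0.9999
```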

Of course, the number of possible failure scenarios grows quickly with network size, so in general the number of failure scenarios taken into account in the availability calculations needs to be limited to keep the computational effort manageable. For example, in a network with five nodes and eight lines, there are 13 failure scenarios with one failure (5 with one node fault, 8 with one line fault). There are 78 scenarios with two failures (10 with two node faults, 28 with two line faults, and 40 with one node and one line fault), 286 scenarios with triple failures, and already 715 scenarios with four failures. In practical calculations, the number of simultaneous failures taken into account will thus typically be limited to two, at most three.
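The scenario counts follow from choosing k failed elements out of all nodes and lines. A small sketch (illustrative function name) that reproduces them for the five-node, eight-line example:

```python
from math import comb


def scenario_counts(n_nodes, n_lines, max_k):
    """Number of scenarios with exactly k simultaneous element failures."""
    n = n_nodes + n_lines
    return [comb(n, k) for k in range(1, max_k + 1)]


print(scenario_counts(5, 8, 4))          # [13, 78, 286, 715]
# breakdown of the double failures: node-node, line-line, node-line
print(comb(5, 2), comb(8, 2), 5 * 8)     # 10 28 40
```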

Expected Loss of Traffic and Average Expected Loss of Traffic

Once the availability has been calculated, the ELT and the average expected loss of

traffic (AELT) can be calculated. They are used to express the availability of the

network services. The ELT of a traffic load can be calculated in the following way

[Ver95]:

The unavailability of a connection c, U_c, can be expressed as

$$
U_c = \frac{\mathrm{EDT}_c}{\text{observation time}} \tag{3.14}
$$

where EDT_c (expected downtime of connection c) is the average time that connection c is interrupted during a certain observation period. When the observation period equals 1 year, multiplying EDT_c by the capacity Cap_c of the connection gives the ELT of connection c:

$$
\begin{aligned}
\mathrm{ELT}_c &= U_c \cdot \mathrm{Cap}_c \cdot 525\,600\ \text{minutes/year}\\
&= (1 - A_c) \cdot \mathrm{Cap}_c \cdot 525\,600\ \text{minutes/year}
\end{aligned} \tag{3.15}
$$

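The ELT for the whole traffic load is then expressed as

$$
\mathrm{ELT} = \sum_{c \in \text{load}} \mathrm{ELT}_c \tag{3.16}
$$

The AELT can then be calculated as

$$
\mathrm{AELT} = \frac{\mathrm{ELT}}{\sum_{c \in \text{load}} \mathrm{Cap}_c} \tag{3.17}
$$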



As each wavelength transports one STM-X/OC-Y (with X = 4/Y = 12, X = 16/Y = 48, X = 64/Y = 192, ...), the AELT of the optical layer is usually expressed in STM-X/OC-Y hours per year. If we assume X = 16/Y = 48 (each wavelength transports 2.5 Gbps), the equation for the ELT becomes

$$
\mathrm{ELT} = \sum_{c \in \text{load}} (1 - A_c) \cdot \mathrm{Cap}_c \cdot 8760 \quad \text{(in STM-16/OC-48 h/y)} \tag{3.18}
$$
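Equations 3.15 to 3.18 can be sketched as below, with capacities counted in STM-16/OC-48 units and time in hours per year by default (pass 525,600 to get minutes as in Equation 3.15). The function names are illustrative. The example uses a single connection with the availability of Equation 3.9.

```python
MINUTES_PER_YEAR = 525_600   # Equation 3.15
HOURS_PER_YEAR = 8_760       # Equation 3.18


def elt(conns, per_year=HOURS_PER_YEAR):
    """Equations 3.15, 3.16, 3.18: sum of (1 - A_c) * Cap_c * time per year.
    conns is a list of (availability, capacity) pairs."""
    return sum((1.0 - a) * cap * per_year for a, cap in conns)


def aelt(conns, per_year=HOURS_PER_YEAR):
    """Equation 3.17: ELT divided by the total capacity of the load."""
    return elt(conns, per_year) / sum(cap for _, cap in conns)


conns = [(0.998177, 1)]        # one STM-16/OC-48 connection
print(round(elt(conns), 2))    # 15.97 (STM-16/OC-48 hours per year)
```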

3.8.2 Availability: Some Observations

Until now the discussion on availability was fairly theoretical. Without pretending

to give a complete overview of all factors influencing the availability of a connec-

tion, there are some trends that can be observed. Unless stated otherwise, the

MTTR and MTBF values indicated in Table 3.8 are used. Let us first get a glimpse

of the influence of the recovery scheme on the availability.

Availability Comparison between 1+1 Protection in Ring-Based and Mesh-Based Networks

In [Ari7/00], the availability performance of 1+1 dedicated protection in a mesh-based network and in an interconnected ring network was compared, both for link failures only and for link and node failures. The figures of Table 3.8 were used, except for the CC, which was assumed to be 300 km. This study clearly showed that when only link failures are considered, mesh protection performs much worse than dedicated ring protection. For a 32-node network, the ELT is almost twice as large for mesh protection as for OCh-DPRings (Figure 3.41). This could be expected, because in a mesh network the working path is protected by an end-to-end protection path, whereas in interconnected OCh-DPRings the working path is protected by a succession of several protection paths, one per ring. The latter can thus survive multiple link failures occurring in different rings of the interconnected ring network. When the D&C ring interconnection scheme is used, the ELT decreases slightly, because the interconnected ring network can now survive slightly more link failures (e.g., a double-link failure in a ring, with one failing link between the two ring gateway nodes).

When node failures are also taken into account, the ELT increases substantially, because a node failure causes all traffic terminating in that node to be lost. In addition, the relation between the ELTs of the mesh- and ring-based networks changes considerably, as can be seen in Figure 3.42. The interconnected ring network without D&C now performs worst. Because the gateway between rings is a single point of failure in this scenario, the 1+1 protection scheme is not able to recover from such failures. Introducing D&C into the ring network significantly lowers the ELT. Applying the 1+1 protection scheme in the mesh network yields an ELT comparable to, but somewhat larger than, that of DPRings with D&C, again because of the end-to-end protection versus the ring-by-ring protection of the working path.



Figure 3.41 Comparison of the expected loss of traffic (ELT) for a 32-node network using path protection and optical channel dedicated protection ring with and without drop and continue (link failures only). [Chart: ELT for Path Protection, OCh-DPRing, and OCh-DPRing (+D&C).] (P. Arijs, B. Van Caenegem, P. Demeester, P. Lagasse, W. Van Parys, P. Achten, "Design of ring and mesh based WDM transport networks," Optical Networks Magazine, Vol. 1, No. 2, July 2000, pp. 25–40.)

Figure 3.42 Comparison of the expected loss of traffic (ELT) for a 32-node network using path protection and optical channel dedicated protection ring with and without drop and continue (link and node failures). [Chart: ELT for Path Protection, OCh-DPRing, and OCh-DPRing (+D&C).] (P. Arijs, B. Van Caenegem, P. Demeester, P. Lagasse, W. Van Parys, P. Achten, "Design of ring and mesh based WDM transport networks," Optical Networks Magazine, Vol. 1, No. 2, July 2000, pp. 25–40.)



Availability Comparison between Protection and Restoration Schemes in Mesh-Based Networks

Another comparison can be made between restoration and protection in a meshed optical network. The results of such a comparison, using the MTBF and MTTR values of Table 3.8 on the European network depicted in Figure 3.36, are shown in Figure 3.43. As explained earlier, calculating the availability of the connections in a network that recovers from failures using link or path restoration is not straightforward. In Figure 3.43, the results shown for the restoration schemes are therefore an approximation. Because the availability of the network elements is quite high, the probability of multiple simultaneous failures is rather small. For these calculations, restoration was only evaluated for single and double link and/or node faults. For triple failure scenarios, only the nonaffected connections were counted as available; for other fault scenarios (e.g., quadruple failures), none of the connections was assumed to be available. The calculated ELT for link and path restoration is thus an upper bound of the exact value. Calculations for the case study presented in Figure 3.43 revealed that, with the MTTR and MTBF numbers used, the probability that the network suffers from more than two simultaneous failures is indeed rather small: there is only a 0.18% chance of a fault scenario with more than two simultaneous network failures.

As can be seen in Figure 3.43, the lowest ELT value is reached with the 1+1 protection scheme. There is not much difference between the node- and link-disjoint protection schemes, because the availability of the nodes is quite high, and the chance of a line failure is thus much larger than that of a node failure (e.g., for this network and traffic situation, the chance of a single node failure is 0.13%, whereas the chance of a single link failure is 18.90%). The performance of path restoration is comparable to that of 1+1 protection. Link restoration performs worse, because this recovery scheme is not able to recover from node failures.

Figure 3.43 Comparison of the expected loss of traffic and the capacity requirements for different mesh-based recovery schemes. [Chart: ELT (STM-64 hours/year) and number of wavelength channels for 1+1 protected link disjoint, 1+1 protected node disjoint, path restoration, and link restoration. Unprotected reference: ELT = 259,082 STM-64 hours/year, 8975 wavelength channels.]

Figure 3.43 also compares these different recovery schemes from a capacity

point of view. It is clear that when taking into account both the ELT and the

capacity requirements of these different schemes, path restoration seems to be a

good compromise.

The results of Figure 3.43 may seem a bit surprising. Path protection uses a dedicated backup path for each working path in the network, so the flexibility of this recovery scheme is rather small. Restoration, on the other hand, searches for the backup path only after the failure has occurred, which makes it a more flexible recovery scheme, because it is not restricted to a predefined backup path route. The results above, however, do not seem to reflect this increased flexibility of path restoration. The reason is that less capacity is installed in the network for path restoration, which limits the alternative routes it can actually use. In [Wil01], it was shown that overdimensioning the network resources by 10% to 20% in the case of path restoration significantly improves the ELT, because the path restoration scheme then has more flexibility in choosing the backup path.

Availability versus Topology

In [DeM03], the influence of the topology of the optical network on the availability of the connections was investigated. Starting from the network in Figure 3.36, links were removed or added. In this way three topologies were studied: a rather sparse one with an average node degree of 2.43, the network of Figure 3.36 with an average node degree of 2.93, and a rather dense network with an average node degree of 4.36. Traffic was protected using the 1+1 dedicated protection scheme. As can be seen in Figure 3.44, the ELT increases with decreasing average node degree of the topology. This can be explained by the fact that in the sparse topology, the connections have to follow on average a longer route (in kilometers of fiber) and have to pass through more OXCs. This means that the probability of a line or node failure along the path of the connection is higher, the availability lower, and thus the ELT higher. In the densest topology, the routes between origin and destination OXC are the shortest, and these routes thus have the highest availability (lowest ELT).
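The per-path effect behind this observation can be illustrated with the formulas of Section 3.8.1. The sketch below is not a reproduction of the [DeM03] study (which used 1+1 protection); it only compares two hypothetical unprotected routes between the same OXCs using the Table 3.8 values: a direct 500-km route, as a dense topology might offer, and a 700-km three-hop detour, as a sparse topology might force. The OA spacing of 100 km is an assumption chosen so that the 260-km line of Figure 3.38 gets two OAs.

```python
from math import ceil, prod

HOURS_PER_YEAR = 365 * 24


def avail(mtbf_h, mttr_h):
    return 1.0 - mttr_h / mtbf_h                            # Equation 3.1


def hop_availability(length_km, oa_spacing_km=100.0):
    """One line (Equation 3.5) with Table 3.8 values; OA spacing is hypothetical."""
    n_oa = max(ceil(length_km / oa_spacing_km) - 1, 0)
    a_cable = avail(450 * HOURS_PER_YEAR / length_km, 24)   # Equation 3.3, CC = 450 km
    return a_cable * avail(5e5, 24) ** n_oa * avail(5e5, 6) ** 2


def route_availability(hop_lengths_km):
    """Unprotected route: every hop and (hops + 1) OXCs in series (Equation 3.4)."""
    a_oxc = avail(1e5, 6)
    hops = prod(hop_availability(length) for length in hop_lengths_km)
    return hops * a_oxc ** (len(hop_lengths_km) + 1)


direct = route_availability([500])             # hypothetical short route
detour = route_availability([250, 250, 200])   # hypothetical longer route
print(direct > detour)                         # True
```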

Availability versus Traffic Type

Often, the total traffic demand between node pairs consists of different traffic types. In [Dwi00], for instance, a distinction is made between voice, transaction data, and IP data traffic. Each of these traffic types is typically exchanged on a different geographical scale, and each thus has a different relationship with distance. For example, voice traffic is inversely proportional to the square of the distance, transaction data traffic is inversely proportional to the distance, and IP data traffic is independent of the distance:

$$
\begin{aligned}
\text{Voice traffic} &\propto 1/D^{2}\\
\text{Transaction data traffic} &\propto 1/D\\
\text{IP data traffic} &\ \text{independent of } D
\end{aligned} \tag{3.19}
$$
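The effect of Equation 3.19 on route lengths can be illustrated by weighting a set of node-pair distances with the three demand laws and computing the demand-weighted mean distance per traffic type. The three distances below are hypothetical, and the function names are illustrative; the point is only that a stronger distance penalty concentrates demand on shorter, and therefore more available, connections.

```python
def demand_weight(distance_km, traffic_type):
    """Relative demand per node pair following Equation 3.19 (unnormalized)."""
    exponent = {"voice": 2, "transaction": 1, "ip": 0}[traffic_type]
    return distance_km ** -exponent


def mean_distance(distances_km, traffic_type):
    """Demand-weighted mean distance of the connections of one traffic type."""
    w = [demand_weight(d, traffic_type) for d in distances_km]
    return sum(wi * d for wi, d in zip(w, distances_km)) / sum(w)


pairs = [100, 400, 900]   # hypothetical node-pair distances (km)
for t in ("voice", "transaction", "ip"):
    print(t, round(mean_distance(pairs, t)))
# voice 127, transaction 220, ip 467
```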

This difference in distance dependency also influences the availability of the connections and thus the ELT. In [DeM03], this effect has been investigated, and the result is summarized in Figure 3.45. All three traffic types are protected using the 1+1 dedicated protection scheme. The volume of voice traffic between cities A and B is inversely proportional to the square of the distance between these cities. Most voice traffic connections are therefore between locations that are geographically quite close and thus follow on average a shorter path. As explained earlier, a shorter path means a more available path. For transaction data traffic, the distance dependency is weaker, because this traffic type is inversely proportional to the distance. There will be more transaction data connections between locations that are far away from each other than in the case of voice connections. This explains the longer routes that transaction data connections have to follow on average and thus the higher AELT. IP data traffic does not depend on the distance, so long connections are even more probable, which translates into a still higher AELT. Typically, a service-level agreement will be less strict for IP data traffic than for voice traffic, meaning that the higher AELT incurred by IP data traffic is not necessarily reflected in less revenue for the network operator (e.g., because of rebates; see Chapter 1).

Figure 3.44 Influence of the optical layer topology on the expected loss of traffic. [Chart: ELT (STM-1 h/y) per year, 2002 to 2008, for topologies with average node degree 2.43, 2.93, and 4.36.]

3.9 Recent Trends in Research

In this section, some recent trends in research are discussed. In Section 3.9.1, we

focus on the concept of p-cycles. Section 3.9.2 discusses the meta-mesh recovery

technique. In Section 3.9.3, we introduce flexible and intelligent optical networks

and how they can be used to provide recovery. Of course this list is not exhaustive.

Many other topics could be added to this section.

3.9.1 p-Cycles

In Sections 3.5 through 3.7, recovery schemes in optical networks were discussed. A distinction was made between protection and restoration. Recovery schemes were also classified based on the topology of the optical network: ring-based or mesh-based. Restoration in mesh-based networks is typically significantly more efficient in terms of capacity use than the protection schemes in ring-based networks. However, the latter are able to guarantee very fast switching times (50 to 60 ms), because only two nodes need to perform any action (see Section 3.5). Until recently there was a quite strict distinction between recovery schemes in mesh-based and ring-based networks. In [Gro98], however, a recovery scheme called p-cycles was proposed that offers the advantages of both the ring-based and the mesh-based recovery schemes: ringlike switching speeds with a capacity efficiency comparable to that of restorable mesh-based networks. It is based on the formation of rings in the spare capacity of a mesh-restorable network. These rings are formed in advance of any failure. The p-cycle recovery scheme is similar to a ring-based recovery scheme in that both use rings. However, unlike ring-based recovery schemes, p-cycles recover not only failures of links on the ring but also failures of straddling links (links whose two end nodes lie on the ring but which are not themselves part of it). This is the key factor for obtaining the efficiency of a mesh-based recovery scheme using a ringlike protection structure. An example of the use of p-cycles is illustrated in Figure 3.46.

Figure 3.45 Influence of the traffic type on the average expected loss of traffic. [Chart: AELT (STM-1 h/y) per year, 2001 to 2006, for voice, transaction data, and IP data traffic.]

In Figure 3.46(a), an example of a p-cycle is shown. In Figure 3.46(b), a link that is part of the ring breaks, and the surviving part of the p-cycle is used for recovery purposes. In Figure 3.46(c) and (d), although the failing link is not part of the p-cycle, the p-cycle is used to support recovery of the broken link. Moreover, not one but two recovery paths are available from the p-cycle, leading to more advantageous recovery circumstances. The difference from conventional ring-based schemes is thus that not only links on the ring but also failing straddling links are recovered by the p-cycle, and that in the latter case two recovery paths are available. With p-cycles, a single ring can provide recovery paths for many more failing links than with traditional ring-based recovery schemes, making this scheme significantly more capacity efficient, even as efficient as mesh-based recovery schemes.

Figure 3.46 Use of p-cycle as recovery scheme. (a) A p-cycle. (b) A link that is part of the p-cycle fails; the p-cycle contributes one recovery path. (c), (d) A link that is not part of the p-cycle fails; the p-cycle contributes two recovery paths. (W.D. Grover, D. Stamatelakis, "Bridging the ring-mesh dichotomy with p-cycles," Proc. of 2nd International Workshop on Design of Reliable Communication Networks (DRCN'00), Munich, Germany, April 2000, pp. 92–104.)
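The distinction between on-cycle and straddling links can be expressed as a small classification function. This is a simplified sketch that ignores capacity and routing details: it assumes the failed span exists in the network, and it follows the usual p-cycle accounting in which a unit-capacity p-cycle offers one recovery path to an on-cycle span and two to a straddling span. All names are illustrative.

```python
def pcycle_recovery_paths(cycle, span):
    """Recovery paths one p-cycle offers to a failed span (u, v).

    cycle: nodes in cycle order. Returns 1 for an on-cycle span, 2 for a
    straddling span (both end nodes on the cycle, span not on it), and 0 if
    the p-cycle cannot help. Capacity is ignored (one unit per path)."""
    u, v = span
    if u == v or u not in cycle or v not in cycle:
        return 0
    n = len(cycle)
    gap = (cycle.index(u) - cycle.index(v)) % n
    return 1 if gap in (1, n - 1) else 2


def protected_spans_weight(cycle, spans):
    """Total working units one unit-capacity p-cycle can protect over spans."""
    return sum(pcycle_recovery_paths(cycle, s) for s in spans)


cycle = ["A", "B", "C", "D", "E"]
print(pcycle_recovery_paths(cycle, ("A", "B")))  # 1 (on-cycle span)
print(pcycle_recovery_paths(cycle, ("A", "C")))  # 2 (straddling span)
print(pcycle_recovery_paths(cycle, ("A", "F")))  # 0 (F is not on the cycle)
```

With five on-cycle spans and two straddling spans, one such p-cycle covers 5 + 2 · 2 = 9 working units, which is where the capacity advantage over a plain ring (5 units) comes from.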

3.9.2 Meta-Mesh Recovery Technique

The technique called meta-mesh [Gro02] is a refinement of existing recovery schemes in mesh-based networks that increases the capacity efficiency in networks with a rather low average node degree (sparse networks). For this type of network, ring-based recovery schemes are often thought to be the best solution, because the sparseness of the topology may make a mesh-based recovery scheme equally expensive. The resulting design using the meta-mesh concept lies between pure link restoration and pure path restoration.

A sparse network typically contains chains of degree-2 nodes (nodes with two

incident links, see Figure 3.47[a]). When using link restoration, if a link between

degree-2 nodes fails, the affected working traffic is looped back using the spare

capacity (back-hauling, as discussed in Section 3.6) until it encounters a node with

degree higher than 2, at the end of the chain (Figure 3.47[b]).

Figure 3.47 Difference between link restoration and meta-mesh technique. (a) Sparse network; (b) link restoration with loop back (back-hauling); (c) meta-mesh topology of the network. Legend: working path, local restoration path.

With the meta-mesh recovery technique, each chain (a number of degree-2 nodes with a node of at least degree 3 at each end) is represented by one "meta-link," leading to a "meta-mesh" as shown in Figure 3.47(c). For a working path that contains one or more complete chains, traffic can be restored at the level of the affected meta-link instead of at the level of the individual affected link of the original topology (i.e., meta-link restoration rather than link restoration). In this way, part of the spare capacity that link restoration needs for the loop back can be avoided, leading to a more cost-efficient capacity assignment. In [Gro02], it is shown that significant capacity savings can be obtained using this meta-mesh technique.
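The first step of the technique, collapsing every chain of degree-2 nodes into one meta-link, can be sketched as a graph walk. This sketch only builds the meta-mesh topology; the capacity design of [Gro02] is not reproduced. It assumes a simple graph (no parallel links), and the function name is illustrative. A direct link between two nodes of degree 3 or more is kept as a trivial meta-link.

```python
from collections import defaultdict


def meta_mesh(edges):
    """Collapse chains of degree-2 nodes into meta-links (simple graph assumed).

    Returns a list of meta-links, each given as the node sequence of the chain,
    starting and ending at a node of degree >= 3 (or at a degree-1 dead end).
    Components without any node of degree >= 3 are not handled."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    hubs = {n for n in adj if len(adj[n]) >= 3}
    seen, meta_links = set(), []
    for hub in sorted(hubs):
        for first in adj[hub]:
            if (hub, first) in seen:
                continue                      # chain already reported from the other end
            chain, prev, cur = [hub], hub, first
            while cur not in hubs and len(adj[cur]) == 2:
                chain.append(cur)             # walk along the degree-2 chain
                n1, n2 = adj[cur]
                prev, cur = cur, (n2 if n1 == prev else n1)
            chain.append(cur)
            seen.add((hub, first))
            seen.add((cur, chain[-2]))
            meta_links.append(chain)
    return meta_links


edges = [("H1", "H2"), ("H1", "a"), ("a", "b"), ("b", "H2"), ("H1", "c"), ("c", "H2")]
for link in meta_mesh(edges):
    print(" - ".join(link))
# H1 - H2
# H1 - a - b - H2
# H1 - c - H2
```

The six original links collapse into three meta-links; a failure on any link of the chain H1 - a - b - H2 would then be handled as a failure of that single meta-link.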

3.9.3 Flexible Optical Networks

In Section 3.1.5, we briefly discussed the evolution of the OTN from a static networking layer to a flexible and agile one. In Section 3.6, we also discussed how such an intelligent and flexible OTN with its IP-based control plane will enable optical restoration. However, these flexible and intelligent optical networks benefit not only single-layer network recovery. The fast connection provisioning typical of this kind of network can also be used to provide resilience in a very capacity-efficient way in a multilayer network scenario. In such a multilayer network, resilience schemes need to be deployed in all network layers to recover from all possible network failures. The flexibility of intelligent and agile optical networks enables the reconfiguration or even reoptimization of the logical client topology (e.g., during a client node failure), which could be used to work around such a failure. This is discussed in more detail in Chapter 6, Section 6.2.4.

3.10 Conclusion

In this chapter, recovery in optical networks was studied in detail. First, in Section 3.1, an overview was given of the ongoing evolution of the optical network layer from a static point-to-point layer providing high-capacity bit pipes to the client layer, via an optical network layer with switching and management capabilities, to a fully flexible optical layer. This flexible optical network is further discussed in Chapter 6. The main network elements in the optical layer are the optical cross-connect and the optical add/drop multiplexer, which were both discussed in Sections 3.1.3 and 3.1.4.

Next, the current architecture and structure of the optical transport network was discussed in Section 3.2. Several layers can be distinguished within the optical transport network, from bottom to top: the optical transmission section, the optical multiplex section (both are sometimes replaced by the optical physical layer), and the optical channel layer, whose substructure consists of the optical channel transport unit layer, the optical channel data unit layer, and the optical channel payload unit layer. These layers form the optical transport module, with full or reduced functionality, depending on whether the associated overhead is supported or not. The different optical transport module types also differ in the number of wavelength channels they can support. An overview of the current standardization effort was also given.

In Section 3.3, the overhead of these different network layers was discussed,

emphasizing those parts of the overhead that are useful for fault detection and

propagation. In addition, the different types of defects that can be encountered in

the network were described. The use of the maintenance signals conveyed in the

overhead for alarm suppression was illustrated with some examples.

Sections 3.4 through 3.7 focused on the different recovery schemes that can be applied in optical networks. In Section 3.4, the question of why we would like to use a recovery scheme at the optical network layer was answered. Because such a recovery scheme works at a coarse granularity, it is fast, efficient, and easy to manage. Recovering in a higher network layer from a root failure in the optical layer would mean recovering from potentially many resulting secondary failures. A first distinction was made between recovery schemes in ring-based and mesh-based optical networks. Section 3.5 discussed the ring-based recovery schemes. Such a scheme is characterized by the level at which the recovery action occurs (OMS or OCh) and by whether the recovery capacity is dedicated to or shared between the working traffic (DPRing or SPRing). All four resulting schemes (OMS-DPRing, OMS-SPRing, OCh-DPRing, and OCh-SPRing) were discussed and compared in detail. Section 3.6 focused on recovery schemes (both protection and restoration) in mesh-based optical networks. The pros and cons of protection and restoration schemes were discussed. Protection is fast and easier to implement but is quite capacity consuming. Section 3.7 compared the performance of recovery schemes in ring-based and mesh-based optical networks.

Section 3.8 was dedicated to availability, an important performance parameter

of the different recovery schemes. After a theoretical introduction to availability

calculations, some factors influencing the availability (e.g., the applied recovery

scheme and the network topology) were discussed.

Finally, Section 3.9 gave a short overview of some recent trends in research.

As discussed in Chapter 1, several other layers can reside above the optical

transmission layer. The current trend, however, is to evolve to an IP/MPLS-over-

OTN multilayer network. The IP layer is discussed in Chapter 4. Chapter 5 focuses

on the Multi-protocol Label Switching (MPLS) protocol, which was introduced to

enhance the capabilities of the IP client layer.




CHAPTER 4

IP Routing

This chapter is devoted to the recovery aspects of Internet Protocol (IP) routing. Link state interior gateway protocols (IGPs) have undoubtedly been successful during the past few years and have been deployed in the vast majority of operator and large enterprise networks. Consequently, this chapter focuses mainly on link state protocols. Interestingly, the foundation of current link state protocols was laid in the late 1970s in the well-known ARPANET, but a growing interest in the fast recovery properties of link state routing protocols has driven numerous optimization techniques during the last few years, leading to fast convergence enhancements that are covered extensively throughout this chapter. The first part of this chapter, Sections 4.1 through 4.12, focuses on the fundamental aspects of link state protocols, which include the reliable network topology discovery mechanism, the distributed shortest path computation, and the routing table calculation. These are described in detail, not only from a protocol perspective but also with the objective of providing IP recovery network design rules. Because IP routing is by nature completely distributed, an important part of this chapter focuses on the dynamic aspects of distributed routing and the various steps occurring during network convergence. Throughout this chapter we demonstrate that IP routing can provide subsecond convergence while preserving network stability, even with major and multiple network failures, thanks to the use of dampening mechanisms. Furthermore, we show that the common perception that IP routing is limited to best-effort service is misleading and that optimized IGP metric algorithms allow an operator to traffic engineer an IP network both at steady state and under single network element failure. Nonstop forwarding (NSF), a recovery technique available on many platforms that ensures continued data forwarding in the event of control plane failures, is also discussed (its interaction with fast IP convergence is covered in Section 4.15). A rich set of examples is provided that highlights the



concepts introduced in this chapter; the first part concludes with a detailed case study. Finally, the second part of this chapter, Sections 4.13 through 4.15, discusses some advanced topics such as algorithm complexity, incremental shortest path first (SPF), and the potential interaction between fast IGP convergence and NSF. Note that you can skip this more advanced part and still gain a very good understanding of the IP routing recovery mechanisms covered in Sections 4.1 through 4.12. The chapter ends with a section on research-related topics.

4.1 IP Routing Protocols

We start with an introduction to IP routing protocols, followed by an overview of the principles of the two major families of routing protocols: distance vector and link state protocols. The undeniable superiority of link state protocols in terms of recovery is highlighted, which explains their wide adoption in most, if not all, operator and large enterprise networks. Finally, this section concludes with the local versus global recovery aspects of IP routing.

4.1.1 Introduction

The objective of running a routing protocol is for each node to build a routing table

that contains the shortest path32 to each reachable IP prefix. As detailed later in this

chapter, the entire path does not have to be stored to route the packet; instead the

router maintains a dedicated data structure called the forwarding information base

(FIB) that contains the next hop for each reachable IP prefix along with other

protocol information.
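The routing table versus FIB distinction can be sketched in a few lines of Python. This is a minimal illustration, not a router implementation: the `Fib` class, the prefixes, and the next-hop names are hypothetical, and the linear scan stands in for the specialized lookup structures (such as tries or TCAM) that real routers use. It shows that only a next hop is stored per prefix and that a destination address is matched against the longest (most specific) prefix containing it.

```python
import ipaddress


class Fib:
    """Toy forwarding information base: IP prefix -> next hop (hypothetical names)."""

    def __init__(self):
        self.entries = {}  # ipaddress network object -> next-hop router name

    def add(self, prefix, next_hop):
        self.entries[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        """Return the next hop of the longest prefix containing `address`, or None."""
        addr = ipaddress.ip_address(address)
        best_net, best_hop = None, None
        for net, hop in self.entries.items():
            if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
                best_net, best_hop = net, hop
        return best_hop


fib = Fib()
fib.add("10.0.0.0/8", "B")
fib.add("10.1.0.0/16", "C")
fib.add("0.0.0.0/0", "D")       # default route
print(fib.lookup("10.1.2.3"))   # C: the /16 is more specific than the /8
print(fib.lookup("10.9.9.9"))   # B
print(fib.lookup("192.0.2.1"))  # D
```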

The routing protocols designed during the last three decades fall into one of the following two categories:

1. Distance vector routing protocols

2. Link state routing protocols

4.1.2 Distance Vector Routing Protocols Overview (‘‘Bellman-Ford’’)

Distance vector routing protocols rely on the principle of periodic distribution of

the routing table to each neighbor. Upon periodic timer expiration (and when

network changes occur, like a network element failure), each node sends its routing

table to each of its participating neighbors.

The easiest way to illustrate how distance vector routing protocols work is

through an example. Consider the simple network depicted in Figure 4.1; let us see

step by step how each router builds its routing table.

32The notion of ‘‘shortest path’’ is explored in Section 4.6.



At time t0, router A boots up. As soon as the links A-B and A-C are operational (this includes the time for the underlying layer 2 protocol to become fully operational), router A sends its routing table to both B and C. At this stage, A's routing table is reduced to its directly attached links because A has not yet received any routing information from its direct neighbors. Then at time t1, B and C send their routing tables to A. For instance, A learns that D is reachable from B with a distance of 4 upon receiving B's routing table and that D is also reachable from C with a distance of 1. After adding the costs of the local links A-B and A-C, respectively, A determines that the shortest route to D is by means of C, with a distance of 1 + 1 = 2. Finally, at time t2, router A sends its new routing table (which now contains, among other things, reachability information about D) to each of its neighbors B and C. Note that before A booted, the shortest path computed by B to D was its directly connected link B-D (actually the only existing path). Upon receiving the new routing update from A, B figures out that the path cost to reach D by means of A is 2 + 1 = 3, which is shorter than the existing path; consequently, B updates its routing table and selects A as its preferred next hop to reach D. The same process occurs between each pair of neighbors, and after some time the network converges.
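The exchange just described is the distributed Bellman-Ford computation. The Python sketch below replays it in synchronous rounds on a four-router fragment whose link costs are inferred from the worked example (A-B = 1, A-C = 1, C-D = 1, B-D = 4); routers E and F of Figure 4.1 are left out, and the lockstep timing is a simplifying assumption, since real routers send updates asynchronously. Each router keeps a distance and a next hop per destination and adopts a neighbor's route whenever the link cost plus the advertised distance is smaller.

```python
import math

# Link costs inferred from the worked example; E and F are omitted (assumption).
LINKS = {("A", "B"): 1, ("A", "C"): 1, ("C", "D"): 1, ("B", "D"): 4}


def neighbor_map(links):
    """Build {router: {neighbor: link cost}} from an undirected link list."""
    nbrs = {}
    for (u, v), cost in links.items():
        nbrs.setdefault(u, {})[v] = cost
        nbrs.setdefault(v, {})[u] = cost
    return nbrs


def run_distance_vector(links):
    """Exchange routing tables in lockstep rounds until no table changes.

    Tables map destination -> (distance, next hop). Returns (tables, rounds).
    """
    nbrs = neighbor_map(links)
    tables = {r: {r: (0, r)} for r in nbrs}  # at boot, a router knows only itself
    rounds = 0
    while True:
        rounds += 1
        sent = {r: dict(t) for r, t in tables.items()}  # tables as advertised this round
        changed = False
        for router, adjacent in nbrs.items():
            for nbr, cost in adjacent.items():
                for dest, (dist, _) in sent[nbr].items():
                    if cost + dist < tables[router].get(dest, (math.inf, None))[0]:
                        tables[router][dest] = (cost + dist, nbr)
                        changed = True
        if not changed:
            return tables, rounds


tables, _ = run_distance_vector(LINKS)
print(tables["A"]["D"])  # (2, 'C'): 1 + 1 via C
print(tables["B"]["D"])  # (3, 'A'): 2 + 1 via A, better than the direct cost of 4
```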

Unfortunately, distance vector routing protocols become quite inefficient when a network element fails. Now consider the failure of the subnetwork N1 [Figure 4.1(b)] locally attached to router A (e.g., because of the failure of the interface connecting node A to N1). When node A detects the failure, it quickly updates its routing table and marks the corresponding IP prefix as unreachable (infinite cost). However, bear in mind that routers running a distance vector protocol exchange their routing tables regularly. Hence, in the absence of a network failure, A periodically receives B's routing table indicating that N1 is reachable at a distance of 2 (B's routing table selected A as the next hop to reach N1).

Figure 4.1 Distance vector routing protocols: (a) the example network of routers A through F with link costs; (b) Failure 1, loss of subnetwork N1 attached to router A; (c) Failure 2, failure of link C-E.

What would happen if A received B's routing table update just after having marked the corresponding routing table entry as unreachable? A would now select B as its preferred next hop to reach N1 with a cost of 3 (because at this point it has no other route to N1) and would send a routing update to B. Then B would reflect the cost change (its cost is now 4) and advertise the new cost to A, and so forth.

This clearly creates a loop in which the advertised cost keeps increasing; this is called the count-to-infinity problem. The solution for breaking such a loop is to consider the route unreachable once its cost reaches a large value that is deemed infinite. For instance, RIP considers the value 16 as infinite, which allows the loop to be broken relatively quickly; the downside of this approach, however, is that it bounds the network diameter33: no path can have a cost of 16 or more without being considered infinite, so with a cost of 1 per link a path can span at most 15 hops.
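The count-to-infinity behavior can be reproduced with a small Python sketch. It models only routers A and B of Figure 4.1(b), without split horizon, and assumes a cost of 1 both for the A-B link and for A's attachment to N1 (consistent with B's distance of 2), strictly alternating updates, and the rule that a router accepts any update from its current next hop even when the cost gets worse. The two routers bounce the route back and forth, incrementing its cost until it reaches RIP's infinity of 16.

```python
INFINITY = 16  # RIP's "infinite" metric


def count_to_infinity(link_cost=1, infinity=INFINITY):
    """Replay A and B exchanging updates for N1 after A has lost it.

    Once A adopts B's route, each router's next hop for N1 is the other one,
    so it accepts every update from it, even a worse one.
    Returns the list of (router, new cost).
    """
    cost = {"A": infinity, "B": 2}  # A marked N1 unreachable; B still has 2 via A
    sender, receiver = "B", "A"     # B's periodic update reaches A first
    history = []
    while True:
        cost[receiver] = min(infinity, cost[sender] + link_cost)
        history.append((receiver, cost[receiver]))
        if cost[receiver] >= infinity:
            return history
        sender, receiver = receiver, sender


history = count_to_infinity()
print(history[:4])   # [('A', 3), ('B', 4), ('A', 5), ('B', 6)]
print(history[-1])   # ('B', 16): N1 is finally declared unreachable
print(len(history))  # 14 updates before the loop is broken
```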

Various solutions have been proposed to avoid such loops. A very well known but partial solution is ‘‘split horizon.’’ The idea is that a node should never advertise a route to a neighbor X if its preferred next hop for that route is X. For instance, in the example depicted in Figure 4.1, B would not advertise N1 to A because its preferred next hop for N1 is A (the same reasoning applies to C). Unfortunately, this prevents only loops involving two nodes.

Now consider the case of a failure of the link C-E [Figure 4.1(c)]. One possible

sequence of events is the following. When C detects the failure of the link C-E, once

the periodic routing update timer expires (or immediately depending on the distance

vector protocol), it sends its routing table reflecting that the cost to reach E is now

infinite (i.e., E is no longer reachable via the node C). Both node A and node D

learn the news and update their routing tables, which are then sent to each of their

neighbors. After some period the network converges.

Now suppose a slightly different event timing: C detects the failure of link C-E and sends its routing table update to A and D. However, bear in mind that routers exchange their routing tables periodically; in the absence of any failure, both A and D advertise to B that they can reach E with a cost of 2. Suppose that B selects node D as its preferred next hop to reach E with a cost of 3. By virtue of the split horizon technique, neither router A nor router D advertises E to C in its routing table updates, because they both selected C as their preferred next hop to reach E. Now, although B does not advertise E to D (its preferred next hop to reach E), it does send an update related to E to node A with a cost of 3. At steady state, A selects C as its next hop, because the path via C is shorter. Returning to our scenario, suppose that A receives C's routing table update related to E (reporting an infinite cost for E) and immediately afterward receives B's routing update for E (this could happen if B sends its routing update before having received D's routing update reporting that E is no longer reachable). In this case, A selects B as its preferred next hop to reach E with a cost of 4 and sends a

33The diameter of the network is defined as the maximum number of routers an IP path can contain.



routing update related to E to its neighbor C, which results in a loop involving four routers; hence the statement that split horizon only partially solves the problem.

Improvements to split horizon have been proposed to speed up convergence, such as ‘‘split horizon with poison reverse,’’ in which a node always re-advertises route N to a neighbor X with an infinite cost if its preferred next hop for that route is X. Despite several such improvements, however, distance vector protocols inherently suffer from slow convergence.
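Plain updates, split horizon, and split horizon with poison reverse differ only in what a router puts into the update it sends to a given neighbor. The Python sketch below (the function name and route format are hypothetical) builds that update from a routing table mapping each destination to a (cost, next hop) pair; the sample table is B's view in the N1 example, in which A (cost 1) and N1 (cost 2) are both reached via A.

```python
INFINITY = 16  # RIP's "infinite" metric


def build_update(routes, neighbor, mode="plain"):
    """Return the {destination: cost} vector a router sends to `neighbor`.

    routes maps destination -> (cost, next_hop). Modes:
      "plain"          advertise every route as is
      "split_horizon"  omit routes whose next hop is `neighbor`
      "poison_reverse" advertise those routes with an infinite cost instead
    """
    update = {}
    for dest, (cost, next_hop) in routes.items():
        if next_hop == neighbor and mode == "split_horizon":
            continue
        if next_hop == neighbor and mode == "poison_reverse":
            update[dest] = INFINITY
        else:
            update[dest] = cost
    return update


# B's table in the N1 example: A at cost 1 and N1 at cost 2, both via A.
routes_b = {"A": (1, "A"), "N1": (2, "A")}
print(build_update(routes_b, "A"))                    # {'A': 1, 'N1': 2}
print(build_update(routes_b, "A", "split_horizon"))   # {}
print(build_update(routes_b, "A", "poison_reverse"))  # {'A': 16, 'N1': 16}
```

With split horizon, B never tells A that N1 is reachable, so the two-router loop cannot form; poison reverse goes further and explicitly tells A that the route through B is unusable. Neither rule prevents the four-router loop described above for link C-E.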

One of the popular distance vector routing protocols is the Routing Information Protocol (RIP), which was the routing protocol provided on BSD UNIX in 1982 (known as ‘‘routed’’). Several versions of RIP have been defined: RIP version 1 [RIP-1], followed by RIP version 2 [RIP-2]. Some other interesting enhancements have also been made, in particular the ‘‘triggered update’’ [RIP-TRIG], based on the principle that RIP no longer periodically sends its complete routing table to every neighbor (except when explicitly requested) but does so only when a change occurs in the network, thereby reducing unnecessary background noise. Also, other distance vector protocols such as the Enhanced Interior Gateway Routing Protocol (EIGRP) have been designed with more advanced features and are certainly more efficient than the first version of RIP, but they all rely on the basic principle that each router provides its participating neighbors with its own view of the network after having computed its routing table. So although distance vector protocols like RIP may be suitable in small networks or in some particular network topologies (like ‘‘star’’ topologies), their limitations in terms of convergence speed make them unsuitable for large, meshed networks, especially when fast convergence is required. In such circumstances, link state protocols are undoubtedly preferred.

4.1.3 Link State Routing Protocols Overview

A Brief History

The ARPANET has undoubtedly played a tremendous role in the design of current routing protocols, and link state protocols are no exception: the first link state protocol was invented and deployed in the ARPANET. The very first version of a dynamic routing protocol deployed in the ARPANET is described in [ARPA-1]. This first routing algorithm was an adaptive, dynamic, distributed routing protocol. Each term is important and must be clearly defined here:

• Dynamic: Routing tables are computed dynamically, in contrast with static routing, in which the next hop for each destination is manually configured by the network administrator.

• Distributed: Each router computes its own routing table. In other words, there is no central server that computes the routes and downloads the resulting routing tables to each node.

• Adaptive: A routing protocol is said to be adaptive when the route computation takes into account certain dynamic network state conditions, like the link load or experienced delays, to influence its routing decisions. Of course, adaptive routing protocols require some measurement process that determines or quantifies network characteristics and some way to disseminate that information to other nodes in the network. Finally, a computation module is required to compute the shortest path according to specific constraints. One of the major challenges of this class of protocols is to react sufficiently to network changes while preserving network stability, without requiring unreasonable protocol overhead. This is definitely an interesting property that current link state protocols like Open Shortest Path First (OSPF) and Intermediate System to Intermediate System (IS-IS)34 do not have, for reasons discussed later in this chapter.

The first ARPANET routing algorithm (Figure 4.2) relied on the following principle: each node maintained a table of the estimated delays to reach every other node in the network. Upon receiving the table from node X, node Y would first evaluate the delay to reach X and then compute a new table of estimated delays to every other node in the network, where the shortest path was considered to be the path with the shortest delay. Each node sent its estimated delay table to every adjacent node at a high frequency (every 128 ms).

Although the first ARPANET routing algorithm was used for several years, several issues came up:

1. Packets containing the estimated delay tables were long and kept growing as the ARPANET grew.

2. Route consistency was difficult to maintain across multiple nodes, which was inherent to the nature of the route computation: each node made its route computation based on the estimated delay tables calculated by other nodes.

3. The fast rate of exchange of estimated delay tables made the algorithm inefficient at adapting to congestion and major network changes, while at the same time it could overreact to minor changes in the network.

4. The delay measurement method was based solely on queue lengths, which was not accurate because links had different characteristics like speed, propagation delay, and packet sizes. Moreover, at that time the processing delay (independent of the queue size) was not negligible, yet it was simply ignored by the delay measurement method. Queue lengths were measured based on the instantaneous queue size, which was not an excellent indicator either.

34Although some proposals have been made to make OSPF and IS-IS adaptive.

[Map: ARPANET Geographic Map, October 1980, showing IMP, TIP, Pluribus IMP, Pluribus TIP, and C30 sites, including satellite circuits. Names shown are IMP names, not (necessarily) host names; the map does not show ARPANET experimental satellite connections.]

Figure 4.2 ARPANET map in October 1980. (M. Dodge. ‘‘Cybermap of the Month Column,’’ ARPANET, October 1980. [Illustration courtesy of the Computer Museum of History Center.] Available at http://mappa.mundi.net/maps/maps_001. Accessed May 2004.)

This justified the design of a newer version of the routing protocol, significantly different from the first one [ARPA-2]. One of the first major changes in the new version of the ARPANET routing protocol was that each node disseminated the measured delays between itself and its adjacent neighbors (instead of generating one or more packets reflecting its estimated delays to every other node in the network). Then, upon receiving the information generated by each node, each router

was able to compute the shortest path from itself to every other node using a distributed shortest path computation. Each packet was flooded throughout the network using a new reliable flooding mechanism (called the updating procedure at that time). It is worth highlighting some important properties of the flooding procedure, which was fast and reliable; this was of the utmost importance to ensure database consistency between nodes and avoid loops, as discussed in detail later in this chapter. A new updating packet was also originated and flooded throughout the network upon a link state change (see [ARPA-5] for a detailed description of the updating procedure). Another important new component of this adaptive, dynamic, and distributed routing protocol was the delay measurement method, in which the average delay (in contrast with the instantaneous queue size) was measured every 10 seconds and reported if a significant change was noticed.

Several detailed analyses were conducted to determine the efficiency of this new adaptive routing protocol, and the results were very promising:

• Quick and accurate response to topological changes.

• Dynamic packet rerouting upon network congestion.

• Efficiency in terms of shortest delay path computations.

• Routing loops were very temporary, and packets entering a loop did not traverse a router more than twice; on the other hand, several routers could be involved in a single temporary loop (see Section 4.6 for more details on temporary loops).

• The algorithm did not provoke network instabilities and oscillations, which are of course among the potential drawbacks of adaptive algorithms. Indeed, because paths were computed based on the actual traffic load, an inappropriate measurement procedure and update frequency might have led to traffic oscillations: flows routed around a congested area alleviate the congestion in that area but can create congestion on other links, which may result in a new traffic shift, hence the possible oscillation. This is why current link state protocols are not adaptive: the trade-off between traffic load efficiency and potential network oscillations has been resolved in favor of static IGP link metrics.

As described throughout this chapter, current link state protocols have strong commonalities with the routing protocols designed for the ARPANET: the foundations of link state protocols were unquestionably laid during the ARPANET years.

Several interesting references on ARPANET and Internet history milestones can be found at [HISTORY].

Link State Protocols Overview

Link state protocols rely on a fundamentally different concept than distance vector protocols. Each router is responsible for originating a link state protocol data unit (PDU) that describes its local topology (in a nutshell, its set of direct neighbors, the local link characteristics such as the metric, the local IP addresses, etc.). Link state PDUs are then disseminated throughout the network via a reliable flooding mechanism. The collection of all the link state PDUs originated by every router in the network, called the link state database (LSDB), allows every router to build a complete map of the network. Every router then runs an algorithm that computes the shortest path tree (SPT), which provides the shortest path from the computing node to every other node in the network, as well as the routing table, which contains all the reachable IP prefixes along with the corresponding preferred next hop(s) and cost. The reliable link state PDU dissemination (flooding) process and the shortest path computation are covered in detail in this chapter.
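The shortest path computation over the LSDB can be sketched with Dijkstra's algorithm, which Section 4.6 covers in detail. In this Python illustration, the LSDB is a hypothetical dictionary holding each router's advertised neighbors and metrics, reusing the four-router fragment from the distance vector example. A link is used only if both ends advertise it (the two-way check link state protocols perform), and only one next hop is kept per destination, so equal-cost multipath is ignored.

```python
import heapq

# Each router's link state PDU: its direct neighbors and link metrics (illustrative).
LSDB = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 4},
    "C": {"A": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}


def spf(lsdb, root):
    """Dijkstra over the LSDB; returns {destination: (cost, first hop)} for root."""
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, metric in lsdb.get(u, {}).items():
            if u not in lsdb.get(v, {}):
                continue  # two-way check failed: v does not report the link back
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                first_hop[v] = v if u == root else first_hop[u]
                heapq.heappush(heap, (nd, v))
    return {n: (dist[n], first_hop[n]) for n in dist if n != root}


print(spf(LSDB, "B")["D"])  # (3, 'A'): B-A-C-D beats the direct B-D link
print(spf(LSDB, "A")["D"])  # (2, 'C')
```

The result matches what the distance vector exchange eventually converged to, but here each router obtains it from its own copy of the complete topology rather than from its neighbors' computed tables.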

At steady state, routers exchange short messages called hellos that allow them to verify that their neighbors are still reachable; in link state protocol terminology, the routers' adjacencies are still up. When a router first boots, it starts exchanging hello messages to automatically discover its participating neighbors. Once a neighbor is discovered, the LSDB synchronization process starts, during which the routers exchange their LSDBs; this guarantees that all the routers share the same view of the network (they have identical LSDBs).
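Neighbor liveness at steady state boils down to remembering when the last hello arrived. The sketch below is a hypothetical Python model of one adjacency; the 10-second hello interval and 40-second dead interval mirror commonly used OSPF defaults on broadcast links and are assumptions here (IS-IS uses a holding time instead, and all of these values are configurable).

```python
from dataclasses import dataclass

HELLO_INTERVAL = 10.0  # seconds between hellos (assumed OSPF-like default)
DEAD_INTERVAL = 40.0   # seconds without hellos before the neighbor is declared down


@dataclass
class Adjacency:
    neighbor: str
    last_hello: float  # time (seconds) at which the last hello was received

    def receive_hello(self, now):
        self.last_hello = now

    def is_up(self, now, dead_interval=DEAD_INTERVAL):
        """The adjacency stays up while a hello arrived within the dead interval."""
        return now - self.last_hello < dead_interval


adj = Adjacency("B", last_hello=0.0)
adj.receive_hello(HELLO_INTERVAL)  # next hello arrives at t = 10 s
print(adj.is_up(45.0))             # True: last hello 35 s ago
print(adj.is_up(50.0))             # False: 40 s of silence, adjacency declared down
```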

When a link or node fails in the network, as soon as the failure is detected, each router detecting the failure (and thus a loss of adjacency) originates a new link state PDU that reflects the network topology change. Note that network element failure detection can be performed by means of the routing protocol's hello messages (no hello messages are received from a neighbor during a configurable period) or by the layer 1/2 protocol, which sends an alarm to explicitly indicate a link failure. Failure detection is covered in detail in Section 4.3. So, for instance, in the case of a link failure, the two routers interconnected by the failed link each originate a new link state PDU, which is reliably flooded throughout the network.
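The origination and flooding step can be sketched as follows. This Python model is a simplification under stated assumptions: the `Router` class and its methods are hypothetical, each link state PDU carries a sequence number so that a router accepts and re-floods only copies newer than the one in its LSDB, and the acknowledgment and retransmission machinery that makes real flooding reliable is omitted. The usage lines fail link B-D of the four-router fragment and check that every router ends up with B's new PDU.

```python
class Router:
    """Toy link state router: stores the newest PDU per originator and floods it."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # routers with an adjacency that is up
        self.lsdb = {}       # originator -> (sequence number, {neighbor: metric})

    def receive(self, originator, seq, links, from_router=None):
        stored = self.lsdb.get(originator)
        if stored is not None and stored[0] >= seq:
            return  # duplicate or older copy: do not flood it again
        self.lsdb[originator] = (seq, links)
        for nbr in self.neighbors:
            if nbr is not from_router:  # never send it back where it came from
                nbr.receive(originator, seq, links, from_router=self)

    def originate(self, links):
        """Build a new PDU with the next sequence number and flood it."""
        seq = self.lsdb.get(self.name, (0, None))[0] + 1
        self.receive(self.name, seq, links)


def connect(u, v):
    u.neighbors.append(v)
    v.neighbors.append(u)


a, b, c, d = (Router(n) for n in "ABCD")
for u, v in ((a, b), (a, c), (c, d), (b, d)):
    connect(u, v)
for router, links in ((a, {"B": 1, "C": 1}), (b, {"A": 1, "D": 4}),
                      (c, {"A": 1, "D": 1}), (d, {"B": 4, "C": 1})):
    router.originate(links)

# Link B-D fails: both end routers lose the adjacency and originate new PDUs.
b.neighbors.remove(d)
d.neighbors.remove(b)
b.originate({"A": 1})
d.originate({"C": 1})
print(all(r.lsdb["B"] == (2, {"A": 1}) for r in (a, b, c, d)))  # True
```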

Upon receiving a new link state PDU reflecting the network topology change, each router triggers a new routing table computation using a shortest path computation algorithm (usually referred to as the SPF algorithm), described in detail in Section 4.6. Various timers related to the origination35 of the link state PDU and to the SPF computation can be used to tune the routing protocol convergence while guaranteeing network stability; they are covered in detail in Section 4.4, as the aim of this paragraph is only to introduce the general concept of link state routing protocols.

Although temporary loops may appear during network convergence because of a temporary lack of LSDB synchronization between routers (detailed in Section 4.7), those loops are short-lived, and SPF algorithms guarantee the computation of loop-free paths once the LSDBs are consistent.

Link state routing protocols also support the notion of hierarchical routing, which allows splitting the network into multiple zones where only the routers belonging to a zone share the same LSDB. Limiting the number of routers in

35The process by which a router builds the link state PDU and floods it is called the origination: We say

that the router originates a new link state PDU.



each zone reduces the LSDB size, which in turn reduces the routers' routing-related workload (lower memory usage, faster route computation, and higher stability are examples of potential gains). In some cases, this might also be useful to isolate a part of the network that experiences regular instabilities (e.g., a region of the world where link failures occur very frequently). Then only the routers belonging to the zone where the failure occurs are affected (receipt of the new link state PDU, routing table recomputation). The reachability information for IP prefixes outside of a zone is provided by routers connected to multiple zones, called area border routers (ABRs) in OSPF and L1L2 (level 1–level 2) routers in IS-IS, which advertise the IP prefixes reachable outside of the zone along with optional metrics and various degrees of summarization (route summarization reduces the number of advertised IP prefixes by making use of the hierarchical nature of IP addresses). We must underscore that routing protocols have benefited from numerous implementation optimizations, and routers' CPUs are much more powerful than they were several years ago, so the limits in terms of number of routers per zone have drastically changed. Trying to determine a maximum number of routers per zone is meaningless, because that number depends heavily on several factors such as network stability, the routers' CPUs, and the degree of connectivity, to mention a few.
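Route summarization at a zone boundary can be illustrated with Python's standard `ipaddress` module. The prefixes below are hypothetical intra-zone prefixes; `ipaddress.collapse_addresses` merges contiguous networks, so an ABR or L1L2 router could advertise a single /22 outside the zone instead of four /24s. How the metric of a summary is derived is protocol specific and is not modeled here.

```python
import ipaddress


def summarize(prefixes):
    """Collapse contiguous prefixes into the fewest covering networks."""
    networks = (ipaddress.ip_network(p) for p in prefixes)
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


zone_prefixes = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(summarize(zone_prefixes))                    # ['10.1.0.0/22']
print(summarize(zone_prefixes + ["10.1.8.0/24"]))  # ['10.1.0.0/22', '10.1.8.0/24']
```

The second call shows the limit of the technique: a prefix that is not contiguous with the others cannot be folded into the summary and must be advertised separately.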

The two most widely used link state protocols are OSPF [OSPF] and IS-IS [ISIS]. There are significant differences between the two protocols, such as the link state PDU formats (called a Link State Advertisement [LSA] in OSPF and a Link State Packet [LSP] in IS-IS), the protocol message types, and the LSDB synchronization procedures, to mention a few. Although the list of differences is certainly quite long, the similarities between the two protocols are also numerous because they are both link state protocols. In particular, their properties in terms of recovery are very similar. The routing dynamics upon a network element failure are nearly identical: once the failure is detected, a new LSP is originated and flooded throughout the network. Then every router receiving a new LSP triggers a routing table computation, making use of a shortest path algorithm.

Hence, the set of mechanisms described in this chapter is applicable to both IS-IS and OSPF. In the rest of this chapter, the generic term LSA refers to an LSA for OSPF and an LSP for IS-IS.

Distance vector versus link state protocols: For the reasons explained earlier, in particular the scalability and convergence time aspects, link state protocols have been widely deployed, particularly in large networks.36 There might be very few exceptions, but most if not all service providers and large enterprise networks run a link state routing protocol. For that reason, the rest of this chapter is entirely devoted to the recovery aspects of link state protocols.

36It is worth mentioning that a mix of link state and distance vector routing protocols might be seen in some particular network topologies. For example, consider a network made up of a backbone of core routers interconnected in a mesh topology, with a set of remote or edge routers attached to some core routers via a single link. In such a case, running RIP between the remote routers and the core routers and OSPF or IS-IS between the core routers is an interesting option. Indeed, this reduces the size of the LSDB, because only the core routers are part of the link state domain, while providing a dynamic way of learning the IP prefixes reachable by means of the remote routers. Fast convergence is not needed between the remote routers and the core routers, because the remote routers do not have an alternate path anyway (they are attached to the core routers via a single link). Note that such a configuration is more commonly seen in large enterprise networks than in operators' networks. Such a routing design also has several drawbacks in terms of network management, because it requires routing information redistribution between routing protocols, which may be complex to configure and a source of configuration errors.

4.1.4 IP Routing: A Global versus Local Restoration Mechanism?

IP routing is fundamentally a restoration recovery mechanism (Figure 4.3). Indeed, as described in detail throughout this chapter, once the failure has been detected by the router directly attached to the failed network element (e.g., link or node), that router propagates the fault indication signal (FIS) throughout the network. More precisely, the link state protocol propagates the network topology change, which can be interpreted as an FIS in the case of a link or node failure. Then every router that receives the notification of the network change recomputes its routing table ‘‘on the fly,’’ which is by definition a restoration process. Strictly speaking, however, IP routing cannot be classified as either a global or a local restoration recovery mechanism. Indeed, the point at which the traffic affected by the failure is rerouted depends on the network topology: it can be either the router directly attached to the failed link or node (in which case the restoration is local) or a router several hops upstream of the failure.

As shown in Figure 4.3, depending on the network topology and the failure location, the rerouting node is either immediately upstream of the failure or several nodes upstream. So the degree of meshing (also sometimes referred to as the network density) determines how close to the failure the upstream rerouting node is likely to be. This has an obvious impact on the rerouting time, but increasing the degree of meshing is not always possible and has a cost. Some studies on several large IP networks show that, on average, the rerouting node is between three and six hops upstream of the failure location.

Figure 4.3 IP restoration: rerouting location in a highly meshed (dense) network and in a sparsely meshed network (routers A through H).

4.2 Analysis of the IP Routing Recovery Cycle

The aim of this section is to give an overview of each phase that takes place during

the recovery cycle (introduced in Chapter 1) in the context of IP when a network

element failure occurs in the network. Then an example is provided that illustrates

the various rerouting phases previously described. Sections 4.3 through 4.6 explore

in detail each of those phases, but it is important to first have a good understanding

of the IP routing dynamics (Figure 4.4).

4.2.1 Fault Detection and Characterization

As with any other recovery mechanism, the first task to occur in the recovery cycle

is the failure detection itself, which usually has a nonnegligible impact on the overall

convergence time. Section 4.3 is devoted to this important aspect.

4.2.2 Hold-Off Timer

Multilayer recovery mechanisms are studied in detail in Chapter 6. In a nutshell, once the failure has been detected, there are situations in which the IP layer should wait for some hold-off timer to expire before triggering an action. For instance, consider an IP-over-Dense Wavelength Division Multiplexing (DWDM) network in which some recovery mechanisms are also available at the optical layer. In other words, the DWDM links are protected and rerouted by means of some recovery mechanism (protection or restoration), as seen in Chapter 3. Then, if the rerouting time for the optical layer is bounded by some time X, the IP layer should wait for some time Y, where Y > X, before triggering any action, to avoid undesirable race conditions. If the link is still down after the hold-off timer expires, the optical recovery has probably not succeeded and the IP layer should trigger some recovery action. Note that the hold-off timer may be dynamically computed when dampening techniques are used, as discussed in Section 4.4.

Figure 4.4 Recovery cycle (timeline labels: failure, fault detected, fault detection time, hold-off time, recovery operation time, traffic recovery time, recovery time).
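The hold-off rule above can be sketched as follows. This is a minimal Python illustration, not a router implementation. The event-list representation, the function names, and the millisecond values are all hypothetical. The example assumes an optical restoration bound X of 50 ms and a hold-off Y of 100 ms.

```python
def link_state_at(events, t):
    """Return 'up' or 'down' at time t, given (time_ms, state) events sorted by time."""
    state = "up"
    for when, new_state in events:
        if when > t:
            break
        state = new_state
    return state


def ip_layer_should_react(events, detected_at_ms, optical_bound_ms, margin_ms):
    """Wait a hold-off Y = X + margin (so that Y > X) before reacting at the IP layer."""
    if margin_ms <= 0:
        raise ValueError("the hold-off must exceed the optical recovery bound")
    hold_off_ms = optical_bound_ms + margin_ms
    # React only if the link is still down once the hold-off timer has expired.
    return link_state_at(events, detected_at_ms + hold_off_ms) == "down"


# Hypothetical values: optical restoration bounded by X = 50 ms, hold-off Y = 100 ms.
restored = [(0, "down"), (40, "up")]   # the optical layer restores the link at 40 ms
not_restored = [(0, "down")]           # the optical recovery failed
print(ip_layer_should_react(restored, 0, 50, 50))      # False: leave it to the optical layer
print(ip_layer_should_react(not_restored, 0, 50, 50))  # True: trigger IP rerouting
```

A real router would arm a timer rather than replay an event list. The list form is used here only because it makes the decision easy to follow and to test.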

4.2.3 Fault Notification Time

When a link or a node fails, every node directly attached to the failed network element detects the failure after some period. For instance, when a link fails, the two nodes interconnected by the failed link detect the failure, though not necessarily at the same moment. When a node fails, all the neighbors of the failed node detect the failure.

Each node that has detected the failure sends an FIS throughout the network. In an IP network, the FIS is a new LSA, and sending the LSA is called flooding. In the rest of this chapter, we refer to this as LSA flooding.

A node may receive a new LSA, that is, one that differs from the copy stored in its local LSDB. The node must then validate the received LSA, store it in its LSDB, and flood it to each of its neighbors except the one it came from. LSA flooding is always reliable: both IS-IS and OSPF have a flooding mechanism based on the retransmission of unacknowledged LSAs.

The LSA flooding mechanism is detailed in Section 4.5, but the aim of this

paragraph is to introduce the general IP routing dynamics.
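The receive, store, and flood rule above can be sketched in Python under simplifying assumptions. An LSA instance is identified by a hypothetical (LSA id, sequence number) pair, and a higher sequence number means a newer instance. Validation is omitted. The acknowledgment and retransmission machinery that makes real flooding reliable is not modeled.

```python
from collections import deque


def flood_lsa(neighbors, lsdbs, origin, lsa_id, seq):
    """Flood one LSA instance originated by `origin`; return the number of transmissions."""
    lsdbs[origin][lsa_id] = seq              # the originator installs its own new LSA
    queue = deque((origin, nbr) for nbr in neighbors[origin])
    transmissions = 0
    while queue:
        sender, router = queue.popleft()
        transmissions += 1
        if seq <= lsdbs[router].get(lsa_id, -1):
            continue                         # same or older instance: do not re-flood
        lsdbs[router][lsa_id] = seq          # (validation omitted) store in the LSDB
        for nbr in neighbors[router]:
            if nbr != sender:                # flood to every neighbor except the sender
                queue.append((router, nbr))
    return transmissions


# Hypothetical four-router ring A-B-C-D-A.
neighbors = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
lsdbs = {router: {} for router in neighbors}
print(flood_lsa(neighbors, lsdbs, "A", "router-lsa-A", 1))    # 5
print(all(db["router-lsa-A"] == 1 for db in lsdbs.values()))  # True
```

The duplicate check is what stops the flood. Each router re-floods a given instance at most once, so each link carries that instance at most once in each direction.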

4.2.4 Computation of the Routing Table

Once a router has received a new LSA reporting a network topology change, it must compute a new routing table.37 Strictly speaking, an LSA does not report a topology change but rather the current topology state. A router receiving a new LSA must therefore first compare it with the version stored in its LSDB to determine whether the topology has changed. Note that the routing computation can be delayed for various reasons, which are explored in Section 4.6.

The routing table contains the shortest path from the computing node to each reachable IP prefix. More precisely, for each IP prefix the routing table contains only the following:

• the next hop on the shortest path
• the IP metric
• the outgoing interface
• some lower layer information related to the layer 2 protocol used on that interface

Storing the complete shortest path for each IP prefix is not needed and would unnecessarily consume memory.

37Note that the routing table is often called the Routing Information Base (RIB).

The routing computation process requires two operations:

1. The computation of the SPT, the tree of shortest paths from the computing node to every other node in the network (Figure 4.5)

2. The computation of the next-hop information for each IP prefix (next hop, metric, outgoing interface)

The SPT computed by a node38 X is the tree rooted at X that contains the shortest path from X to every other reachable node in the network.

Consider the network depicted in Figure 4.5. The SPTs computed by nodes A and G are shown on the right side of the figure. For instance, the shortest path from A to D is A-E-F-D and the shortest path from G to F is G-A-E-F. Note that there are two equal-cost paths from G to B: G-H-B and G-A-B. This allows load balancing between G and B along those two paths. In the other diagrams, equal-cost paths are not always depicted in SPTs.
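The SPT computation with equal-cost paths can be sketched with Dijkstra's algorithm. For every node, the sketch records the set of first hops, that is, the root's neighbors that begin a shortest path to that node. The exact algorithm used by routers is covered in Section 4.6, so this Python version is only an illustration.

The topology is a small hypothetical one chosen to reproduce the G-to-F and G-to-B paths quoted above. It is not the full network of Figure 4.5. Link metrics are assumed to be positive and symmetric.

```python
import heapq
from collections import defaultdict


def build_graph(links):
    """Undirected graph with symmetric link metrics: node -> {neighbor: metric}."""
    graph = defaultdict(dict)
    for u, v, cost in links:
        graph[u][v] = cost
        graph[v][u] = cost
    return graph


def spf_with_ecmp(graph, root):
    """Return (distance, set of first hops) for every node reachable from `root`."""
    dist, first_hops, done = {root: 0}, {root: set()}, set()
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)                                    # u's distance and first hops are final
        for v, cost in graph[u].items():
            nd = d + cost
            hops = {v} if u == root else first_hops[u]
            if nd < dist.get(v, float("inf")):
                dist[v], first_hops[v] = nd, set(hops)  # strictly shorter path found
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v] and v not in done:
                first_hops[v] |= hops                   # equal-cost path: ECMP
    return dist, first_hops


links = [("G", "A", 1), ("G", "H", 1), ("A", "B", 1), ("H", "B", 1),
         ("A", "E", 1), ("E", "F", 1)]
dist, first_hops = spf_with_ecmp(build_graph(links), "G")
print(dist["F"], sorted(first_hops["F"]))  # 3 ['A']       (path G-A-E-F)
print(dist["B"], sorted(first_hops["B"]))  # 2 ['A', 'H']  (paths G-A-B and G-H-B)
```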

Figure 4.5 Shortest path tree (SPT) computation (panels: network with link metrics; SPT computed by A; SPT computed by G).

38The terms router and node are used interchangeably in this chapter. When describing an algorithm, we use the generic term node because the algorithm generally applies to any kind of node, such as routers, optical switches, and SONET-SDH switches. On the other hand, when describing an action specific to IP, such as LSA flooding, the term router is more appropriate.



A note on terminology: The term equal-cost multipath (ECMP) usually describes the ability to use multiple paths with identical costs. The exact algorithm to compute the SPT is covered in Section 4.6.

Once the SPT has been computed, the next operation consists of populating the routing information base (RIB). Each IP prefix announced by each node present in the SPT is added to the RIB along with its shortest path (next hop). In other words, the SPT provides the shortest path to any reachable node in the network, whereas the RIB contains the next hop, metric, and outgoing interface for each reachable IP prefix.

For instance, in the example above, suppose node F announces the network prefix 160.92.23.0 (mask 255.255.255.0). G then adds this prefix to its RIB with A as the next hop, because the shortest path to reach F is G-A-E-F. Suppose instead that B announces a prefix 161.23.54.0 (mask 255.255.255.0). Two entries would then be added to the RIB, with next hops A and H respectively, and packets to that destination would be load balanced (see Section 4.8 for more details on load balancing).

In fact, the RIB can be populated while the SPT is being calculated. There is no need for a two-step approach in which the SPT is computed entirely before the RIB computation starts.
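Populating the RIB during the SPF run can be sketched by installing a node's prefixes as soon as Dijkstra's algorithm finalizes that node. At that point, the node's distance and first hops can no longer change. The sketch uses a small hypothetical topology and the two prefixes from the text. As in the text, the shortest path from G to F is G-A-E-F, and B is reachable from G over the equal-cost paths G-A-B and G-H-B. Outgoing interfaces and lower layer information are left out, and the data structures are illustrative only.

```python
import heapq
from collections import defaultdict


def build_graph(links):
    graph = defaultdict(dict)
    for u, v, cost in links:
        graph[u][v] = cost
        graph[v][u] = cost
    return graph


def spf_with_rib(graph, root, announced):
    """Dijkstra from `root` that fills the RIB while the SPT is being computed."""
    dist, first_hops, done, rib = {root: 0}, {root: set()}, set(), {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        # u is final: install its prefixes now (the closest announcer wins; ties not merged).
        for prefix in announced.get(u, []):
            rib.setdefault(prefix, (sorted(first_hops[u]), d))
        for v, cost in graph[u].items():
            nd = d + cost
            hops = {v} if u == root else first_hops[u]
            if nd < dist.get(v, float("inf")):
                dist[v], first_hops[v] = nd, set(hops)
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v] and v not in done:
                first_hops[v] |= hops              # equal-cost path: add its first hops
    return rib


links = [("G", "A", 1), ("G", "H", 1), ("A", "B", 1), ("H", "B", 1),
         ("A", "E", 1), ("E", "F", 1)]
announced = {"F": ["160.92.23.0/24"], "B": ["161.23.54.0/24"]}
rib = spf_with_rib(build_graph(links), "G", announced)
print(rib["160.92.23.0/24"])  # (['A'], 3)
print(rib["161.23.54.0/24"])  # (['A', 'H'], 2)
```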

4.2.5 An Example of IP Rerouting upon Link Failure

In this section, we have described the phases that occur during IP rerouting after a network failure. Let us now illustrate those steps through an example.

Again, we emphasize that the timing of events in a distributed computing environment is not predictable. Several timing sequences can occur. The actual sequence depends on factors such as the network topology, the link characteristics (propagation delay, level of congestion), and router performance and load. Figures 4.6 and 4.7 illustrate one possible event timing sequence.

Time T0: The link C-D fails. After some period, router C detects the failure. Router D also detects the link failure, but this example assumes that router C detects it first. That period depends on the layer 2 protocol, the IGP parameter settings, and the failure type. In high-speed backbone networks, routers are commonly interconnected with optical lambdas using SONET/SDH framing or with native SONET-SDH links. In that case, failure detection takes on the order of tens of milliseconds.

The detection triggers the origination of a new LSA, which is flooded to every neighbor. As discussed in detail in Section 4.5, node C may decide to delay the LSA origination by some (dynamic) period. For now, we consider only the general IP routing dynamics; the details of each phase are discussed later. Once the LSA has been originated and flooded throughout the network, each router triggers an SPF and computes a new RIB corresponding to the new network state. The dotted arrow indicates the RIB entry for destination Z. For example, the next hop computed by router C to reach subnetwork Z is now B. We say that router C has converged.

Figure 4.6 An example of IP rerouting dynamics (panels: T0, the link C-D fails and C converges; T1; T2. Legend: link cost, equal to 1 when not specified; RIB entry for Z; LSA/LSP propagation).

Figure 4.7 An example of IP rerouting dynamics (continued) (panels: T3; T4).

As with LSA origination, the SPF and RIB computation may also be delayed by

node C (this is discussed in detail in Section 4.6).

T1: B now receives the new LSA and first determines whether the LSA is a new

one. Because the LSA is new (it reflects a topology change with no link between

the routers C and D), B floods it to each of its neighbors and recomputes its

routing table. In this example, the new shortest path computed by router B to

reach Z is now through node F.

T2: H receives the LSA, floods it to I and G (its neighbors) and recomputes its

RIB. The new shortest path to Z is via node I.

T3: A receives the LSA, floods it to E and G and recomputes its RIB. The new

shortest path to Z is now via node E.

T4: Finally, G receives the LSA from A and H and recomputes its RIB. The shortest paths to Z are via nodes H and A (as before). Likewise, E receives the LSA from A and F and recomputes its RIB, with F as its next hop to reach Z (as before).

Several important points are worth noting:

1. The event timing sequence might have been different.

2. Not all nodes are affected by the link failure. For instance, node G is not affected by the failure of link C-D because its computed SPT is unchanged, and neither is node E. Some optimizations of the SPT computation detect such a condition and do not trigger a routing table computation when the failure does not affect the current SPT. This is the case for incremental SPF, as described in Section 4.14.

So which nodes are affected by a network element failure? A node is affected by a link or node failure if the failed resource belongs to its SPT. Another way to determine the set of affected nodes is to compute a reverse SPT rooted at the node terminating the failed link (Figure 4.8).

3. Another very important aspect is the location where the traffic can actually be rerouted after a network element failure. In the example above, the first node capable of rerouting the traffic coming from node A to destination Z is router B, because C does not have any alternate path to Z. This highlights an important property of IP rerouting. As already pointed out, the location of the rerouting node with respect to the failure has a direct impact on the overall convergence. The closer the rerouting node is to the failure, the smaller the impact of the failure on the traffic. For instance, the traffic sent from S to Z is dropped after link C-D fails, until node B has converged. As previously mentioned, in existing service provider networks the rerouting node is, on average, between three and six hops from the failure.
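The reverse-SPT test for affected nodes can be sketched for traffic flowing toward the node that terminates the failed link, and hence toward prefixes such as Z behind that node. Distances to D are computed with Dijkstra's algorithm rooted at D, which with symmetric metrics amounts to a reverse SPT. A node is flagged as affected if at least one of its shortest paths toward D crosses link C-D.

The six-node topology below is hypothetical and is not the network of Figure 4.8. Calling the function with the link endpoints swapped covers traffic that crosses the link in the other direction.

```python
import heapq
from collections import defaultdict


def build_graph(links):
    graph = defaultdict(dict)
    for u, v, cost in links:
        graph[u][v] = cost
        graph[v][u] = cost
    return graph


def distances_to(graph, root):
    """Dijkstra rooted at `root`; with symmetric metrics these are distances *to* root."""
    dist, heap = {root: 0}, [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def affected_by_link_failure(graph, near, far):
    """Nodes with at least one shortest path toward `far` that uses link near->far."""
    dist = distances_to(graph, far)
    affected = set()
    # Visit nodes closest to `far` first, so each node's successors are already decided.
    for node in sorted(dist, key=dist.get):
        for nbr, cost in graph[node].items():
            if dist[nbr] + cost != dist[node]:
                continue                      # nbr is not a next hop toward `far`
            if (node, nbr) == (near, far) or nbr in affected:
                affected.add(node)
                break
    return affected


# Hypothetical topology: A-B-C-D (cost 3) is the shortest way to D; A-E-F-D costs 4.
graph = build_graph([("A", "B", 1), ("B", "C", 1), ("C", "D", 1),
                     ("A", "E", 1), ("E", "F", 1), ("F", "D", 2)])
print(sorted(affected_by_link_failure(graph, "C", "D")))  # ['A', 'B', 'C']
print(sorted(affected_by_link_failure(graph, "D", "C")))  # ['D', 'F']
```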



4.3 Failure Profile and Fault Detection

The objective of this section is to answer the following set of important questions:

• What are the different failure profiles seen by the IP/Multi-Protocol Label Switching (MPLS) layer?
• What mechanisms are available for failure detection?
• How can each failure be unambiguously identified, and what are the requirements for failure characterization?
• What are the implications of the various types of failures on the traffic?

This section applies to both IP and MPLS (discussed in Chapter 5), so

although this chapter is entirely dedicated to IP, the term IP/MPLS is often used

in Section 4.3.

4.3.1 Failure Profiles

Various profiles of failures can occur in an IP/MPLS network. This section does not aim to list every possible failure type, but it is worth describing the main categories of failures.

Link Failures

Several types of failures result in an IP/MPLS link failure, as follows:

• Fiber cut
• Optical equipment failure
• SONET/SDH equipment failure
• Router interface failure (the port on a router line card or the line card itself fails)

Although these failures are not identical, they all result in a loss of connectivity between two routers and can therefore be considered a link failure from an IP/MPLS layer perspective.

Figure 4.8 Set of nodes affected by a failure (reverse spanning tree rooted at D).

Node Failure

Node failures have multiple possible causes, and their implications on traffic forwarding vary widely, as follows:

1. Router power supply outage: Some routers have a backup power supply that is automatically activated if the primary power supply fails. In other designs, a set of power supplies share the load and can absorb the extra load if one or more of them fail. In such cases, a power supply failure has no impact on the router. A Simple Network Management Protocol [SNMP] trap is usually sent to a network management station so that the failed power supply can be replaced. On the other hand, when a router has no embedded power supply redundancy, a power supply failure causes the router to stop operating completely.

2. Facility power supply failure: The router loses power when, for instance, the power supply of the building fails.

3. Route processor failure: Two families of router architectures must be considered here, as follows:
• Centralized architectures: The route processor (RP) is responsible for the control plane tasks (routing table computation, signaling, and management) and is involved in traffic forwarding.
• Distributed architectures: The RP is responsible only for the control plane tasks. Packets transit through the router via line cards without involving the RP. The line cards usually have their own processor and a set of specialized processors.
As described later in this section, the impact of an RP failure differs significantly between the two cases.

4. Software failure: A software bug may affect some specific features or cause the router operating system (OS) to crash.

5. Planned node failure: The phrase planned node failure may sound surprising because a failure is usually inherently unpredictable. During the life of a network, routers (and any other active equipment) require hardware and/or software upgrades for various reasons, as follows:
• New interfaces must be added to increase the router connectivity.
• New interface types are required to support higher speeds.
• New functionalities are required to support new services and/or to optimize the network.

Depending on the platform, some of these hardware or software upgrades require stopping the router. They can then be considered node failures with the particular property of being predictable, which obviously helps reduce or eliminate their impact on traffic forwarding. Note that router upgrades occur relatively frequently in large networks, and very high network availability requires mechanisms that minimize their impact.

4.3.2 Failure Detection

In the previous section, we saw different failure profiles that can occur in an IP/MPLS network. We now turn to the mechanisms that can detect those failures, along with their performance and scalability.

There are two families of failure detection mechanisms, as follows:

• Lower layers failure notification
• Hello-based mechanisms

Lower Layers Failure Notification

The lower layers (layers 1 and 2) play an essential role in detecting and notifying link failures, and their capabilities vary widely from one layer to another. For instance, the optical and SONET/SDH layers provide very fast link failure notifications, within tens of milliseconds and often in less than 10 ms.

By contrast, consider two routers connected via a Frame Relay Permanent Virtual Circuit (PVC). If a failure occurs in the Frame Relay network, the routers have to wait a significant amount of time (usually several seconds) to be notified of the failure. The notification is generally obtained through protocols such as the Local Management Interface (LMI), which lets a router get the PVC status from the Frame Relay switch. Some versions of LMI (e.g., T1.617 Annex D) provide asynchronous mechanisms, so the Frame Relay switch can spontaneously report a PVC status change after a network failure. This works only if the link connecting the router to the Frame Relay switch is not itself the cause of the failure.

Similarly, if two routers are interconnected via an ATM Switched Virtual Circuit (SVC), the ATM PNNI routing and signaling protocol performs the failure notification. This usually takes from a few hundred milliseconds to several seconds, depending on how the PNNI parameters are tuned.

An extreme case is when routers are interconnected via a layer 2 local area network switch. For instance, say three routers are connected to a Gigabit Ethernet switch and are IP/MPLS neighbors, with a routing adjacency active between each pair of routers. Suppose the link between one router and the switch fails (e.g., a switch port or the router interface facing the switch). The other two routers will not detect this failure at layer 2.

Some mechanisms for fast failure detection in such an environment have been proposed. In a nutshell, the Multiaccess Reachability Protocol (MARP) allows a router to be notified of a local failure between an IP neighbor and the switch. Suppose three routers R1, R2, and R3 are connected to a layer 2 switch S, and R1 establishes a router adjacency with R2 and R3. Using MARP, R1 tells S that it wants to be explicitly notified of a failure of either the connection R2-S or the connection R3-S. When link R2-S fails, S sends a failure notification to R1 as soon as it detects the failure. Such a protocol greatly improves the failure detection time in this type of environment.
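The registration and notification logic described above can be modeled in a few lines of Python. This is a hypothetical model of the behavior only. The class and method names are illustrative, and no MARP message formats or timers are represented.

```python
from collections import defaultdict


class NotifyingSwitch:
    """Hypothetical layer 2 switch offering MARP-like failure notifications."""

    def __init__(self):
        self.watchers = defaultdict(set)  # attached router -> routers to notify

    def register_interest(self, watcher, neighbors):
        """`watcher` asks to be told when any neighbor's link to the switch fails."""
        for nbr in neighbors:
            self.watchers[nbr].add(watcher)

    def port_down(self, router):
        """The switch detected that its link to `router` failed: notify the watchers."""
        return sorted((watcher, router) for watcher in self.watchers.get(router, set()))


switch = NotifyingSwitch()
switch.register_interest("R1", ["R2", "R3"])  # R1 has adjacencies with R2 and R3
print(switch.port_down("R2"))                 # [('R1', 'R2')]
```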

Mechanism Based on Hello Protocols

Hello-based failure detection has been used for several decades. Two neighbors periodically exchange hello messages. When one router stops receiving hello messages for a configurable period, it concludes that either the link between them or the neighbor itself has failed.
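The timing of hello-based detection can be sketched as follows. The receiver restarts a dead (hold) timer on every hello and declares the neighbor down when the timer expires. The Python function below is a hypothetical illustration. The 100 ms hello interval and the 300 ms dead interval (three missed hellos) are assumed values, not protocol defaults.

Assuming no hello loss or jitter, the sketch also shows the range of the detection delay. The delay after an actual failure is more than the dead interval minus one hello interval, and at most the full dead interval.

```python
def declared_down_at(hello_arrivals, dead_interval, until):
    """Time at which the neighbor is declared down, or None if not by `until`.

    hello_arrivals: sorted hello arrival times (ms); each arrival restarts the dead timer.
    """
    for prev, nxt in zip(hello_arrivals, hello_arrivals[1:]):
        if nxt - prev > dead_interval:
            return prev + dead_interval      # the timer expired before the next hello
    expiry = hello_arrivals[-1] + dead_interval
    return expiry if expiry <= until else None


# Hypothetical fast hellos every 100 ms, dead interval of 300 ms.
# The link fails at t = 450 ms, so the last hello arrives at 400 ms.
hellos = [0, 100, 200, 300, 400]
down_at = declared_down_at(hellos, 300, until=1000)
print(down_at, down_at - 450)  # 700 250  (declared at 700 ms, 250 ms after the failure)
print(declared_down_at(list(range(0, 1001, 100)), 300, until=1000))  # None
```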

IGP Hellos

Thanks to more powerful CPUs, some router architectures can send IGP hellos at a relatively high frequency, with hello intervals on the order of hundreds of milliseconds.

Scalability impact: Running IGP hellos is not a cost-free operation for the

routing task because multiple checks are performed when an IGP hello packet is

received. Thus, the frequency cannot be increased without a nonnegligible impact

on the router CPU.

Bidirectional Forwarding Detection

Other hello protocols can also be used. Bidirectional Forwarding Detection (BFD) (see [BFD]) is a lightweight and fast hello mechanism that is independent of any routing protocol. BFD is undoubtedly an interesting alternative to tuning IGP hellos to a higher frequency. Because BFD is not tied to a routing protocol, it can also be used in network areas where no IGP runs (e.g., between two autonomous system border routers [ASBRs]).

Scalability impact: Unlike IGP hellos, BFD was designed to require very little processing overhead, with the objective of quickly detecting a forwarding plane failure. Moreover, a distributed implementation drastically improves the overall scalability in environments with a large number of neighbors.

Layer 2 versus Hello-Based Failure Detection Mechanisms

As already pointed out, routing protocol fast hello mechanisms must be used with

care to avoid scalability impact. Moreover, the failure detection time is likely to be

significantly higher than with layer 2 link failure notification, although it might

be acceptable in some cases.

That said, some situations may require a combination of both failure detection mechanisms. For instance, a forwarding plane failure (e.g., a line-card processor failure) does not cause layer 2 to fail, so detecting it may require a hello-based mechanism. Even where layer 2 failure detection is available, it might therefore not be sufficient and should be complemented by a hello-based mechanism.

Choosing a hello mechanism is usually a difficult question. As mentioned earlier, there are multiple fast hello mechanisms: IGP, BFD, and RSVP.39 Clearly, IGP hellos have not been designed to run at very high frequency, because processing IGP hellos is not a cost-free operation. This remains true even though implementations have been considerably optimized over the last couple of years.

Beyond that, no general rule can be derived. Each method has pros and cons and should be selected based on the set of objectives. For instance, if the convergence time objective is on the order of several seconds, detecting the failure within a few tens of milliseconds is not a requirement. In that case, tuning the IGP hellos is an appealing approach that does not require deploying additional failure detection mechanisms. If more stringent failure detection times are required and layer 2 does not provide any fast failure detection mechanism, then BFD is certainly an interesting candidate. The final choice will depend on the implementation of the protocol in the router, the network design, and the required failure detection times.

4.3.3 Failure Characterization

This section deals with failure characterization and explains why distinguishing a link failure from a node failure is not always straightforward.

Consider three possible failure scenarios:

1. A node and its attached interfaces fail. Example: a power supply failure. In this case, all its neighbors detect the failure because all of its attached links also fail.

2. A node fails but its attached interfaces do not. Example: the node control plane (RP) fails, but the platform is distributed and the line cards keep forwarding the traffic.

3. A link between two nodes fails. Example: the line-card interface of one of two interconnected routers fails, or the link fails because of a fiber cut.

We saw in the previous section that link failures can be detected in two ways. The first is the underlying layer 1 or layer 2 failure detection (i.e., SONET-SDH or the optical layer). The second is a hello-based protocol (such as BFD or IGP hellos).

Clearly, a neighboring router cannot distinguish failure scenarios 1 and 3 above, because in both cases what it detects is a link failure. As a result, current mechanisms cannot always tell a link failure from a node failure.

39RSVP hello is a Multi-Protocol Label Switching (MPLS)-specific fast hello mechanism, which is covered in Chapter 5.



4.3.4 Analysis of the Various Failure Types and Their Impact on Traffic Forwarding

As previously mentioned, a large set of possible failures can occur in a network. For each failure profile listed earlier in this section, this section analyzes the impact on traffic forwarding and the failure detection mechanisms that can detect it.

Link Failure

Link failures always affect the data traffic until an alternate path is found and the traffic is rerouted over it.

Node Failure

As previously mentioned, node failures have multiple possible causes, and each has a different impact on the forwarded traffic.

1. Power supply outage: In the absence of power supply redundancy, a router power supply outage causes both a control plane and a forwarding plane failure. The traffic is therefore black-holed until it is rerouted over an alternate path.

2. RP failure: The impact on traffic forwarding depends on the router architecture. On some centralized platform architectures, an RP failure usually implies a failure of both the control plane (routing, signaling, and management) and the data plane, so packets sent to the failed router are simply dropped. That said, some centralized platform architectures also separate the control plane functions from the data plane.

By contrast, on some distributed platform architectures, the RP is responsible only for the control plane, and packets transit through the router via line cards without passing through the RP. An RP failure then does not affect the data plane, and the router keeps forwarding packets; only the control plane fails. The expected behavior in this case is as follows. After some period,40 the IGP adjacency goes down, and the IGP neighbors of the failed router flood an updated LSA (router link LSA for OSPF) or LSP (for IS-IS). The normal IGP rerouting operations then occur.

3. Software failure: The impact of a software failure on forwarded traffic depends heavily on the nature of the failure and on the system architecture. At one extreme, the OS simply generates a warning message and recovers automatically (via a restartable module). At the other, the router is completely affected, cannot recover from the failure, and may need a complete reinitialization. In the latter case, the traffic is black-holed until the control plane of the router's neighbors detects the node failure. Note that OSs with a modular architecture usually confine the failure to the software component that actually failed. The failure then has a limited scope, and the failed component can be restarted independently of the other modules.

40This period depends on the configuration of the IGP timers.

4. Planned node failure: Because by definition the failure is ‘‘planned,’’ various actions can be taken before the upgrade so that the traffic is gracefully rerouted around the node. Two methods can be used to meet that goal:
• The link costs to every adjacent node can be manually increased, so that the router is smoothly excluded from the shortest paths of the other routers in the network.
• The node to be upgraded can originate a new OSPF LSA or IS-IS LSP explicitly indicating that the node should be avoided in the SPT computation.41

In both cases, the consequences are that the other routers in the network will

smoothly reroute the traffic around the node in question. The node to be upgraded

will no longer carry any transiting traffic and could be safely upgraded without

risking traffic disruption. It is worth mentioning that some software and hardware

architectures support ‘‘hitless’’ software and hardware upgrades without requiring

any of the actions mentioned earlier.
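The effect of the second method on the SPT computation can be sketched as follows. The sketch is loosely modeled on the IS-IS overload bit of footnote 41. A flagged node remains reachable as a destination, but the SPF never uses it as a transit node. The Python function and the four-node topology are hypothetical illustrations, not a vendor implementation.

```python
import heapq
from collections import defaultdict


def build_graph(links):
    graph = defaultdict(dict)
    for u, v, cost in links:
        graph[u][v] = cost
        graph[v][u] = cost
    return graph


def spf_avoiding_transit(graph, root, no_transit):
    """Dijkstra distances from `root`; nodes in `no_transit` are reached but not traversed."""
    dist, done = {root: 0}, set()
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in no_transit and u != root:
            continue  # e.g., overload bit set: do not extend paths through u
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


# Hypothetical topology: A reaches C via B (cost 2) or via D (cost 4).
graph = build_graph([("A", "B", 1), ("B", "C", 1), ("A", "D", 2), ("D", "C", 2)])
print(spf_avoiding_transit(graph, "A", set())["C"])  # 2 (through B)
dist = spf_avoiding_transit(graph, "A", {"B"})
print(dist["C"], dist["B"])  # 4 1  (C now reached via D; B itself is still reachable)
```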

4.4 Dampening Algorithms

This section covers the important notion of dampening. Stability is an important property of recovery mechanisms and should always be carefully considered when trying to achieve fast convergence. Fast convergence implies reacting quickly to network changes. During network instabilities, however, the overall recovery mechanism should not overreact. An example is a network element whose state changes at a high frequency. Overreacting could have very undesirable side effects, instigating other failures that could in turn create a snowball effect.

Moreover, a resource may ‘‘flap,’’ that is, a link or node may constantly go up and down. If the traffic is systematically rerouted back through the flapping resource each time it is restored, the traffic experiences multiple consecutive failures. This can be highly undesirable because multiple failures can be even worse than fewer, longer failures. One solution is to implement some mechanisms to dampen the revertive process.

One of the virtues of dampening mechanisms is to preserve network stability

under unstable network conditions. The basic principle of dampening is to slow

down the effect of network instability. This can be achieved by means of various

algorithms at different layers and can be deployed at various locations of the

network.

41For example, with IS-IS a specific bit called the overload bit is set in the LSP.



Three dampening algorithms are described below:

1. Up-state timer: When an interface (or, more generally, a network resource) fails, it is considered down immediately, unless a ‘‘hold-off’’ time is implemented. If the interface then flaps, it is not considered operational (in an ‘‘up’’ state) again until it has remained operational for a fixed period, the value of the ‘‘up’’ timer (Figure 4.9). This kind of algorithm has frequently been used in SONET/SDH networks.

2. Interface dampening using an exponential decay algorithm: With the exponential decay algorithm, when a link goes down, the interface change is immediately reported to the routing protocol, which triggers the appropriate action. If the interface starts flapping (going ‘‘up’’ and ‘‘down’’), it accumulates penalties until a threshold is reached. The router then considers the interface down even if the interface recovers (returns to the ‘‘up’’ state). To be considered ‘‘up’’ again, the accumulated penalties must decay exponentially until they fall below a second threshold. This guarantees that an unstable interface is not advertised as ‘‘up’’ under unstable conditions (Figure 4.10).

Let us now define more precisely the set of parameters used by such a

dampening algorithm:

Suppress and reuse thresholds: When the number of accumulated penalties exceeds the ‘‘suppress’’ threshold, the router dampens the facility (e.g., an interface), that is, starts considering it ‘‘down.’’ The facility stays dampened until the number of penalties decreases and crosses the ‘‘reuse’’ threshold (see Figure 4.10 for an illustration).

Half-time period: This parameter controls the speed at which the number of penalties decays exponentially. For instance, an interface is put in ‘‘dampened’’ mode when its accumulated penalties cross the suppress threshold. If it then stops flapping, its number of penalties is halved each time a half-time period elapses. Of course, if the interface continues to flap, the penalties keep increasing.

Figure 4.9 Illustration of up-state timer dampening algorithm (panels: actual interface state and advertised state of the interface, each up or down over time; up-state timer T1).

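Maximum suppress time: This represents the maximum amount of time an interface can stay in dampened mode.

Together, this parameter, the reuse threshold, and the half-time period determine the maximum number of penalties that an interface can accumulate. To decay down to the reuse threshold within exactly the maximum suppress time, a penalty must start at reuse threshold × 2^(maximum suppress time / half-time period).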

3. Exponential back-off algorithm: The exponential back-off algorithm has

been implemented by some router vendors for both the LSA/LSP propaga-

tion and the SPF computation trigger (see Sections 4.5 and 4.6).

Let us consider the example of LSA origination. Without any particular

measure, each time a router connectivity state changes, the router generates

a new LSA. This not only generates flooding in the network but also

triggers an SPF and routing table computation on each node of the area.

Moreover, this can potentially generate extra work on the ABR for OSPF

(generating some new summary routes, ASBR summary) and L1L2 routers

for IS-IS. To achieve fast convergence, the first time a link fails, it is highly

desirable to trigger an LSA origination so that every router in the network

can quickly recompute a new routing table. On the other hand, when the

link flaps, a desirable behavior would be to slow down the generation of new

LSAs, which is achieved using the algorithm detailed below. Later in this chapter we will explain that it may also be desirable to dampen the SPF execution on the routers upon receiving successive LSAs reporting network changes.

Figure 4.10 Illustration of interface dampening. (Figure: actual interface state, accumulated penalties with the suppress threshold, reuse threshold, and maximum, and the resulting advertised interface state, all plotted over time.)



Description of the exponential back-off algorithm: The following parameters

are defined:

. X: initial time before declaring the link down (that is, before originating a new LSA) after the first failure has occurred

. Y: time before declaring the link down after the second failure

. Z: maximum wait time

When the link goes down the first time, the router waits X milliseconds before advertising a new LSA. Then, if a second state change occurs, the router waits Y milliseconds. If the link keeps on flapping, 2 * Y has to elapse before a new LSA is originated, then 4 * Y, and so on, up to a maximum of Z. The LSA origination timer is reset to its initial value if no state change occurs during a period of time equal to 2 * Z (Figure 4.11).
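The timer sequence just described can be sketched in a few lines of Python. This is a minimal illustration, not a vendor implementation. It assumes that all times are in milliseconds and that the 2 * Z quiet period is measured from the last state change. The class and method names are hypothetical.

```python
class LsaBackoffTimer:
    """Exponential back-off for LSA origination (illustrative sketch).

    x: wait after the first state change (ms)
    y: wait after the second state change (ms)
    z: maximum wait (ms); the timer resets after 2 * z ms without any change.
    """

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
        self.next_wait = y        # wait applied to the next change while flapping
        self.last_change = None   # time of the previous state change (ms)

    def on_state_change(self, now):
        """Return how long (ms) to wait before originating the new LSA."""
        quiet = self.last_change is None or now - self.last_change >= 2 * self.z
        if quiet:
            wait = self.x                        # stable link: react quickly
            self.next_wait = self.y
        else:
            wait = min(self.next_wait, self.z)   # flapping link: back off
            self.next_wait = min(2 * self.next_wait, self.z)
        self.last_change = now
        return wait
```

With X = 10, Y = 100, and Z = 1000 (hypothetical values), suppose state changes occur at 0, 50, 200, 500, 1000, 2000, and 3000 ms. The successive waits are then 10, 100, 200, 400, 800, 1000, and 1000 ms. A further change at 6000 ms comes after 3000 ms of stability, which exceeds 2 * Z, so the wait falls back to 10 ms.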

Note that there are various locations in the network where the dampening process can be implemented. Dampening can be deployed at the interface level: As the link starts flapping, the state of the interface is entirely controlled by a process in charge of managing the interface state, underneath the IGP layer, which applies a dampening algorithm. Another approach is for the process in charge of managing the interface state to transparently reflect the interface state to the IGP and let the IGP apply the dampening algorithm. These two approaches are not mutually exclusive.
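The two interface-level dampening schemes described earlier (the up-state timer and the exponential decay algorithm) can be sketched as follows. This is illustrative only. The class names, the penalty of 1000 per flap, and the default thresholds are hypothetical. The up-state timer sketch also assumes that a ‘‘down’’ transition is reported immediately. The penalty ceiling uses reuse * 2^(max suppress time / half-time), which follows from the definitions above: a penalty at that ceiling decays to the reuse threshold in exactly the maximum suppress time.

```python
class UpStateTimerDampener:
    """Up-state timer dampening (sketch).

    Assumption: 'down' is reported at once; 'up' is reported only after the
    interface has stayed up for up_timer seconds without interruption.
    """

    def __init__(self, up_timer):
        self.up_timer = up_timer
        self.up_since = None  # start of the current uninterrupted 'up' period

    def on_change(self, now, is_up):
        if not is_up:
            self.up_since = None
        elif self.up_since is None:
            self.up_since = now

    def advertised_up(self, now):
        return self.up_since is not None and now - self.up_since >= self.up_timer


class ExpDecayDampener:
    """Exponential decay dampening (sketch); all defaults are hypothetical."""

    def __init__(self, penalty_per_flap=1000, suppress=2000, reuse=1000,
                 half_time=5.0, max_suppress_time=20.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress
        self.reuse = reuse
        self.half_time = half_time
        # Ceiling that decays to the reuse threshold in exactly max_suppress_time.
        self.max_penalty = reuse * 2 ** (max_suppress_time / half_time)
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Penalties are halved every half_time seconds.
        self.penalty *= 0.5 ** ((now - self.last_update) / self.half_time)
        self.last_update = now

    def flap(self, now):
        """Record one 'up'/'down' flap at time now (seconds)."""
        self._decay(now)
        self.penalty = min(self.penalty + self.penalty_per_flap, self.max_penalty)
        if self.penalty > self.suppress:
            self.suppressed = True

    def is_dampened(self, now):
        """True while the interface must still be considered 'down'."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return self.suppressed
```

Suppose the interface flaps at t = 0, 1, and 2 seconds. The third flap pushes the penalty above the suppress threshold. The interface remains dampened at t = 5 s, with roughly 1734 penalties left. By t = 10 s the penalty has decayed below the reuse threshold and the interface is usable again.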

Figure 4.11 Illustration of exponential back-off timer algorithm. (Figure: actual interface state over time and the resulting LSA generation instants, separated by the successive waits X, Y, 2*Y, ..., with the 2*Z reset period.)

4.5 FIS Propagation (LSA Origination and Flooding)

As briefly described in Section 4.2, every router detecting a topology state change (e.g., a network element failure) will trigger the sending of a new LSA, a process called LSA origination. Then, upon receiving a new LSA, a router will flood it to each of its neighbors, a process known as the flooding procedure. Before detailing the origination and flooding mechanisms, it is worth mentioning several interesting aspects of LSA flooding:

LSA flooding is reliable: Every LSA sent to a neighbor must be acknowledged. If it is not acknowledged after some period, the LSA is retransmitted, which makes the LSA flooding process reliable.

Two-way connectivity check42: When a link fails, the two routers interconnected

to the failed link will originate a new LSA to report the loss of adjacency over

that link. So two new LSAs will be flooded throughout the network. Of course,

the two LSAs will not be simultaneously received by the other routers in the

network and the timing sequence will vary based on the LSA originating

routers, parameters setting, and the network topology, to mention a few

parameters. So a two-way connectivity check procedure is performed during the SPF operation (described hereafter): a link is considered operational only if it is reported in the LSAs of both adjacent routers. In other words, if a link L between routers X and Y is reported only in the LSA originated by X and not in the one originated by Y, the link is not taken into account in the path computation. Consequently, in case of link failure, the receipt of just one of the two LSAs is sufficient for the link to be considered ‘‘down’’.
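The two-way connectivity check can be sketched as a filter over the LSDB. This sketch assumes a deliberately simplified LSDB that maps each router to the set of neighbors listed in its own LSA. The function name is hypothetical.

```python
def usable_links(lsdb):
    """Two-way connectivity check (sketch).

    lsdb maps each router to the set of neighbors it reports in its own LSA.
    A link X-Y is kept for the SPF only if X reports Y *and* Y reports X.
    """
    links = set()
    for router, neighbors in lsdb.items():
        for nbr in neighbors:
            if router in lsdb.get(nbr, set()):
                links.add(frozenset((router, nbr)))
    return links
```

For example, suppose link X-Y has failed and only X's new LSA has arrived so far: {"X": {"Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y"}}. Y still lists X, but X no longer lists Y, so X-Y is already excluded. Only X-Z and Y-Z remain.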

LSA origination triggers and frequency: A new LSA is always originated when

one of the following events occurs: refresh, local IP prefix change, local con-

nectivity change.

Number of flooded LSAs: When a link fails in a full-mesh network topology, the number of flooded LSAs is O(n^2); indeed, the LSA will be sent to n neighbors that will themselves reflood it to n neighbors. In the case of a node failure in a fully meshed network, the number of flooded LSAs is O(n^3) because a node failure corresponds to the failure of n links, and for each link failure O(n^2) LSAs are flooded. Note that this really corresponds to a worst case; in practice, the number of flooded LSAs received by any node is much smaller and does not cause any problem. That being said, LSA flooding relies on the flooding of a newly received LSA over all the interfaces (except the interface the new LSA has been received from43), which can be significant if the degree of connectivity (number of neighbors) is large.
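The O(n^2) figure can be checked with a small simulation of one LSA flooded in a full mesh. The model is idealized and the function name is hypothetical. Every router refloods a new LSA to all its neighbors except the one it came from. Duplicates are counted but not reflooded. Acknowledgments are ignored.

```python
from collections import deque


def flooded_copies(n, origin=0):
    """Count LSA copies sent when `origin` floods one new LSA in a full mesh of n routers."""
    seen = {origin}
    queue = deque([(origin, None)])  # (router flooding the LSA, neighbor it came from)
    sent = 0
    while queue:
        router, came_from = queue.popleft()
        for nbr in range(n):
            if nbr in (router, came_from):
                continue
            sent += 1
            if nbr not in seen:  # only a newer LSA is stored and reflooded
                seen.add(nbr)
                queue.append((nbr, router))
    return sent
```

The originator sends n - 1 copies, and each of the n - 1 receivers sends n - 2 more, giving (n - 1)^2 copies per LSA (16 for n = 5). A link failure produces two such LSAs. For a node failure, each of the n - 1 surviving neighbors floods its own LSA in the remaining mesh of n - 1 routers. This gives (n - 1)(n - 2)^2 copies, which is O(n^3).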

An obvious case where such a flooding procedure may be quite inefficient is

when two routers X and Y are interconnected via multiple links L1, L2, . . . , Ln. If

X receives a new LSA from another neighbor Z, after having stored a copy of that

LSA in its local LSDB, it will flood n copies of the same LSA to its neighbor Y (one

per link L1, . . . , Ln), which has the following consequences: Although Y will just install one copy of that new LSA in its LSDB, some bandwidth will be unnecessarily consumed. Moreover, this will also consume CPU on both X and Y. Finally, upon receiving the first copy of the new LSA on a link Li, Y will acknowledge it and will retransmit the LSA over all the n-1 other links back to X. Some ideas have been proposed in [LSA-FLOOD2] to modify the flooding procedure from a per-link to a per-neighbor basis, while still preserving compatibility with existing procedures.

42 Note that the two-way connectivity check may be disabled in some very specific cases.

43 There is just one exception to that rule: in the case of OSPF, a designated router on a local area network does reflood a new LSA received from a neighbor to every other neighbor using the same interface (the local area network interface).

Because every router relies on the receipt of a new LSA to perform traffic recovery, it is of the utmost importance to prioritize the LSA origination and flooding mechanisms to reduce the total rerouting time.

4.5.1 LSA Origination Process

In this section, we will see the various events that trigger the origination of a new

LSA.

1. Link failure: When a link L between two routers X and Y fails, both X and

Y will detect the failure after the link failure detection time. This triggers the

origination of new LSAs.

2. Node failure: When a router detects the failure of one of its neighbors, this

also triggers the origination of a new LSA. Note that when an adjacency is lost, a router does not, in general, have the ability to determine whether the failure is a link or a router failure. Multiple mechanisms44 can be used to tell a link failure from a node failure, but in general link state protocols just report a loss of adjacency with the neighbor, which can be provoked by the detection of a layer 2 link failure or an IGP hello timeout.

3. IP prefix reachability change: For instance, let’s suppose that an IP address

is added (or deleted) on a router. This also triggers a new LSA flooding.

Another typical example is the case of routers that participate in more than one area. Those routers are called ABRs in OSPF (they interconnect multiple OSPF areas) and L1L245 routers in IS-IS (they interconnect multiple IS-IS levels). A topology change in one area/level can result in a reachability change. Indeed, some prefixes might no longer be reachable, new prefixes might now be reachable, or the metrics might have changed. As already mentioned, IS-IS and OSPF are slightly different in terms of LSA format. In the case of an ABR, for instance, OSPF will flood one LSA (type 3 for interarea) for each interarea route change, while IS-IS will flood an entire new LSP, which contains all the data related to the L1L2 router.

4. LSA refresh: Link state protocols periodically refresh their LSAs. Each router maintains a timer that triggers the refresh of its locally originated LSAs upon timer expiration. The LSA sequence number is incremented and the LSA is originated with the new sequence number. Note that this operation is performed regardless of whether the LSA content has changed, and it guarantees that any router that could have experienced a locally corrupted LSDB will receive a new copy of that LSA on a regular basis. In OSPF, this timer is called the LSRefreshTime and is an architectural constant, which is set to 30 minutes and cannot be changed. Some proposals have been made to overcome this limitation (see [LSA-FLOOD1]). By contrast, the corresponding IS-IS timer (called refresh time) is configurable and can have a maximum value of more than 18 hours (large IS-IS networks usually set the refresh timer to this maximum value to reduce unnecessary flooding).

5. Configuration changes: For example, a link metric change.

44 Some heuristics can be used in some particular network scenarios. For instance, let’s suppose that two routers are interconnected via multiple links. If just one link fails, this is probably due to a simple link failure and not a node failure. Other mechanisms could be used (e.g., sending a keep-alive message to the node over an alternate, diverse path using some tunneling mechanism).

45 L1L2 stands for level 1–level 2.

Hence, once an LSA origination event occurs, the router builds the new LSA

and floods it to each of its neighbors. Bear in mind that each LSA is flooded

throughout the network area (or the whole network for some types of LSA with

OSPF) and possibly triggers routing table computation on every router in the

network (if the LSA is a truly ‘‘new’’ LSA; i.e., if its content has changed and this

LSA is not just a refresh).

One can easily realize that LSA origination can have a significant impact on the

network if the frequency is too high, which could occur in the case of network instabilities if no preventive mechanisms were in place. Let’s consider the case of a link

whose state keeps changing because of some failed component (usually referred to

as a link flap). If at each link state change a new LSA is originated, the number

of flooded LSAs and SPF computations triggered on each node could provoke

a network collapse or could just needlessly hog the routers’ CPU and consume

network bandwidth. To prevent such an undesirable behavior, dampening mechan-

isms can be used by link state protocol to control the LSA origination process

(and the SPF triggering process as mentioned in Section 4.6). Various existing implementations use an exponential back-off algorithm as described in Section 4.4.

Example with IS-IS on a Cisco router:

router isis

...

lsp-gen-interval A B C /* Command line tuning the LSP generation (origination) interval */

...

The algorithm used strictly corresponds to the exponential back-off algorithm

described in Section 4.4. Note that on a Cisco router, the variables A, B, and C

correspond to the variables Z, X, and Y, respectively, described in the exponential

back-off algorithm in Section 4.4.

As already pointed out, such link state LSA dampening algorithms can be used

in conjunction with interface dampening that could itself use other dampening

algorithms (e.g., the exponential decay algorithm described in Section 4.4).

So if the link flaps, when the state of the link first changes, the router waits for

B milliseconds. When the link goes up again or another attached link fails,



the router then waits for C milliseconds. Then, the time between every LSA

origination exponentially increases up to the maximum value of A seconds.

The router returns to its initial behavior if no LSA origination trigger occurs during 2 * A seconds. This efficient algorithm allows for quick reaction to

failures while protecting the network from an LSA flooding storm in the case of

instability.
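The mapping between the Cisco parameters and the successive waits for a link that keeps flapping can be illustrated as follows. The sketch assumes, as stated above, that A is expressed in seconds and B and C in milliseconds. The function name and the sample values (A = 5, B = 50, C = 200) are hypothetical, and the 2 * A reset is not modeled.

```python
def lsp_gen_waits(a_max_s, b_initial_ms, c_second_ms, changes):
    """Waits (ms) applied to successive LSP generations while a link keeps flapping.

    First B ms, then C ms, then doubling at each change, capped at A seconds.
    """
    max_ms = a_max_s * 1000
    waits = []
    next_wait = c_second_ms
    for i in range(changes):
        if i == 0:
            waits.append(b_initial_ms)           # first change: initial wait B
        else:
            waits.append(min(next_wait, max_ms))  # C, 2*C, 4*C, ... up to A
            next_wait *= 2
    return waits
```

With the hypothetical setting "lsp-gen-interval 5 50 200", seven successive state changes yield waits of 50, 200, 400, 800, 1600, 3200, and 5000 ms.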

Parameters Tuning

How those parameters should be set highly depends on the network characteristics

and the convergence objectives. A case study is proposed in Section 4.6.

4.5.2 LSA Flooding Process

When an LSA is flooded, at each hop, multiple operations must be performed,

which all participate in the overall flooding delay:

1. The LSA processing: The receiving node performs a set of operations to

decide whether the LSA must be flooded.

2. The LSA queuing: If the LSA must be flooded, the router must transmit the

LSA on each appropriate interface.

3. Propagation delay: Time for the LSA to travel from node to node.

LSA Processing

Once a new LSA is originated, it is flooded to every neighbor. When a router

receives an LSA, it first checks whether the LSA is present in its LSDB. If the

LSA is present, the router verifies whether the LSA is an older or a newer version

(this is done by checking the sequence number; if the sequence number is identical,

the checksum field is checked to determine whether the LSA is more recent).

Multiple scenarios can occur:

. If the received LSA is older than the local copy, the LSA is acknowledged,

the local (newer) copy is sent back to the sending node, and the flooding

procedure stops.

. If the LSA is neither newer nor older (same sequence number), the router just

acknowledges the LSA receipt to its neighbor (because flooding is a reliable

mechanism, LSA receipts are always acknowledged) and does not flood the

LSA to any other neighbor.

. If the LSA is newer, an acknowledgment is sent to the sending neighbor, the

LSA is stored in the local LSDB and a copy of the LSA is flooded to every

neighbor. Then a new route computation is triggered (see Section 4.6 for

more details).

It is worth noting that a router does not flood an LSA that is not newer than

the local copy of that LSA in its LSDB; this prevents it from creating infinite

forwarding loops of LSAs!
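The processing rules above can be summarized in a short sketch. It follows the description in this section rather than any specific implementation. LSA age is ignored, and when the sequence numbers are equal the larger checksum is treated as more recent, as OSPF does. The record type, function names, and action labels are hypothetical. The newer-LSA branch floods before scheduling the route computation, which matches the ordering recommended below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lsa:
    lsa_id: str    # identifies the LSA (e.g., its originating router)
    seq: int       # sequence number
    checksum: int


def compare(received, local):
    """> 0 if received is newer, < 0 if older, 0 if it is the same instance."""
    if received.seq != local.seq:
        return received.seq - local.seq
    return received.checksum - local.checksum


def on_receive(lsdb, lsa, from_nbr, neighbors):
    """Apply the receive rules; return the actions taken, in order."""
    actions = [("ack", from_nbr)]           # receipts are always acknowledged
    local = lsdb.get(lsa.lsa_id)
    order = 1 if local is None else compare(lsa, local)
    if order > 0:                           # newer: store, flood, then compute routes
        lsdb[lsa.lsa_id] = lsa
        actions += [("flood", n) for n in neighbors if n != from_nbr]
        actions.append(("schedule_spf", None))
    elif order < 0:                         # older: send our newer copy back
        actions.append(("send_local_copy", from_nbr))
    return actions                          # same instance: ack only, no flooding
```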



Another important fact to underscore is related to the prioritization of the

flooding operation over the SPF computation. Indeed, upon receiving a newer LSA,

the router learns that it must perform a new route computation. So an implemen-

tation could decide to first update its routing table and then flood the LSA to other

neighbors. This would undoubtedly have a negative impact on the overall network

convergence so an efficient implementation should always flood the new LSA

before triggering a routing computation to make sure that the LSA (and so the

fault indication) is propagated as quickly as possible to every other node in the

network. Note that there are a few specific cases where triggering the SPF before flooding might be preferable, but in general it is better to first flood the new LSA and then trigger an SPF.

Queuing Delays

When an LSA is transmitted over an interface, it potentially competes with other

data packets. Hence, depending on the interface congestion level, the LSA packet

may experience some nonnegligible delays if no particular measures are taken to

provide the required quality of service (QoS). This is especially important during

network failure where the level of congestion is likely to increase because of some

rerouted flows.

The requirement for QoS mechanisms also depends on the network design.

For instance, if the network is highly overprovisioned (the links are not congested

even during network failures), then the queuing delays are probably negligible even

without any particular QoS mechanism (in this case, every packet is indifferently

queued in a first-in first-out (FIFO)46 queue without any particular discrimination).

On the other hand, if a link might be congested (even temporarily), appropriate QoS mechanisms should be provided to correctly handle IGP control plane packets, in particular LSA packets.47

More precisely, this usually involves putting in place the following components:

1. Packet marking: The differentiated services (DS) field48 of each IP packet is marked with a particular Diffserv code point (DSCP) value (see [RFC2474]). In a nutshell, this consists of ‘‘coloring’’ packets so that every router processing a packet can recognize the class of traffic the packet belongs to and provide the appropriate treatment in terms of QoS. Usually, packet marking is performed at the edge of the network for the user traffic. In the case of IGP control plane packets, each router originating an IGP packet is responsible for marking it with the appropriate DSCP. Note that this only applies to OSPF, which runs over IP, because IS-IS uses the connectionless network service (CLNS). That being said, an IS-IS implementation should appropriately treat the IS-IS packets using internal prioritization mechanisms.

2. Packet scheduling: When a router has to send packets out of an interface, it can use the DSCP (‘‘color’’) to provide the appropriate QoS to each packet, which in practice means two things:

. Queue the packet based on its DSCP.

. Potentially use congestion avoidance mechanisms like random early detection (RED).

46 A FIFO queue is such that packets are serviced in the order of their arrival.

47 Note that, generally speaking, control plane packets should receive the appropriate QoS. An obvious example is IGP hello packets. If link congestion is so high that a hello packet is unacceptably delayed, this can lead to undesirable loss of adjacencies and network instability.

48 In IP version 4, the DS field redefines the type-of-service (ToS) field.

Queue the Packet Based on Its DSCP

Critical packets will be queued in a high priority queue, which will guarantee that

those packets receive enough bandwidth and experience minimal jitter and loss.

Multiple queuing systems are available in modern routers. For instance, the

queuing system can be made of N queues where each queue Ni gets some fraction

of the link speed bandwidth (also usually referred to as a weight). Optionally

a maximum queue length can also be specified for each queue. Then a local policy

is defined that specifies the set of DSCP (‘‘colors’’) that match the queue. Sophisti-

cated queuing systems also provide the ability to provision preemptive queue(s) that

are always served before any other queue in the system; the queuing scheduler

serves each queue proportionally to its weight, but each time a packet is dequeued

from a queue Ni, it examines whether there are packets waiting in the priority

queue. If so, the packets in the priority queue are immediately served.

Consequently, such a priority (also called preemptive) mechanism allows offering

minimal queuing delays to the packets queued in the priority queue.
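The combination of a preemptive priority queue with weighted queues can be sketched as follows. This is a simplified illustration. Weights are counted in packets per round-robin cycle rather than in bandwidth. Real schedulers (e.g., deficit round robin) account in bytes. The class name is hypothetical.

```python
from collections import deque


class PriorityWrrScheduler:
    """Preemptive priority queue plus weighted round robin over N queues (sketch)."""

    def __init__(self, weights):
        self.priority = deque()
        self.queues = [deque() for _ in weights]
        self.weights = list(weights)
        self.current = 0                 # weighted queue being served
        self.credit = self.weights[0]    # packets it may still send this round

    def enqueue(self, packet, queue=None):
        """queue=None places the packet in the preemptive priority queue."""
        (self.priority if queue is None else self.queues[queue]).append(packet)

    def dequeue(self):
        if self.priority:                # checked before every dequeue
            return self.priority.popleft()
        for _ in range(len(self.queues) + 1):
            q = self.queues[self.current]
            if q and self.credit > 0:
                self.credit -= 1
                return q.popleft()
            # Queue empty or credit exhausted: move on and refill credit.
            self.current = (self.current + 1) % len(self.queues)
            self.credit = self.weights[self.current]
        return None                      # all queues empty
```

With weights [2, 1], data packets a, b, c in queue 0 and x, y in queue 1 are normally served as a, b, x, c, y. Suppose an LSA arrives in the priority queue after a has been sent. It is then transmitted immediately, ahead of b.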

Two interesting comments can be made at this point:

. To guarantee minimal queuing delay and jitter, the proportion of high-

priority traffic should be kept below some threshold with respect to the total

amount of traffic. Queuing is about precedence: if all the packets have a high priority, then the notion of priority no longer makes sense. Various papers have been written on this topic in order to determine what this threshold value should be to bound both the queuing delay and the jitter, but the numbers vary based on the traffic profile and the level of conservatism. Usually, values vary between 20% and 30%. Note that such values should take failure situations into account if the objective is also to guarantee the QoS in the case of a network element failure. The type of queuing system also plays an

important role here. The weights assigned to the delay-sensitive traffic

queue should be determined based on the QoS objectives at steady state

and during failure. On the other hand, if the sensitive traffic is served by

a preemptive queue, it will always receive the best treatment in steady state

and during failure.

. One possible side effect of preemptive systems is the well-known phenomenon of starvation (also called famine). Indeed, a preemptive queuing system may cause the low-priority queues to receive very little bandwidth, if any. This is particularly true with hierarchical queuing systems, in which multiple queues are served in a hierarchical preemptive mode of operation. Note that the proportion of high-priority traffic is usually kept under some threshold for the reasons mentioned earlier, so the risk of starvation is extremely limited. Furthermore, some queuing systems provide additional mechanisms (sometimes called rate limiters) that allow limiting the amount of bandwidth allocated to the priority queue.

What if a queue Ni does not use its bandwidth allocation? Suppose that a queue Ni

has been configured to get a percentage of the link capacity and does not use it just

because the traffic that matches that queue is not sufficient to use the configured

percentage of the link speed. By contrast with time division multiplexing (TDM)

systems, the bandwidth is not wasted and is usually redistributed to other queues

proportionally to their respective weight.

Congestion Avoidance Mechanisms

Congestion avoidance mechanisms like RED can be used to perform selective packet discard upon queue congestion. The idea behind congestion avoidance algorithms is the following: transport protocols like the Transmission Control Protocol (TCP) react to packet loss by reducing the rate at which they send packets. As a queue grows, packets experience higher delays. TCP takes these delays into account by dynamically increasing its retransmission timer, which determines how long the sending TCP waits before retransmitting an unacknowledged packet. When the queue reaches its maximum, all arriving packets are dropped. This provokes an ‘‘overreaction’’ of every TCP sender sending traffic over the congested link: they all suddenly and drastically reduce their sending rate. This has the undesirable effect of provoking traffic oscillation, in which a link gets congested, then its utilization drops (because all the TCP senders drastically reduce their sending rate), then the link utilization increases again, and so forth. So the idea of congestion avoidance algorithms like RED is to use a probabilistic dropping mechanism that is a function of the average queue size (computed with a low-pass filter) and starts dropping packets when the average queue size crosses some threshold. When the average queue size exceeds some maximum value, all the packets are dropped. RED thus has a progressive effect: The number of TCP senders affected increases smoothly as the average queue size grows. A variant of RED, called weighted RED (WRED), has been proposed to provide higher granularity per traffic profile (it allows different thresholds and discard rates for different DSCPs in a queue). Note that other variants of RED have also been elaborated, like flow RED (FRED), but RED and WRED are the most commonly used congestion avoidance mechanisms. For more details on the subject, see [COMP-NETWORKS], [DIFFSERV-DEPLOY], [RED], [WRED], and [FRED].
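The core of RED and WRED can be sketched with two functions: a low-pass filter for the average queue size, and a drop probability that ramps linearly between a minimum and a maximum threshold. This is the basic form only. The count-based correction of the original RED proposal is omitted. The filter weight of 0.002 and the per-DSCP thresholds used below are assumed values for illustration.

```python
def update_average(avg, queue_len, weight=0.002):
    """Low-pass (exponentially weighted moving average) filter of the queue size."""
    return (1 - weight) * avg + weight * queue_len


def red_drop_probability(avg, min_th, max_th, max_p):
    """Drop probability as a function of the average queue size (basic RED)."""
    if avg < min_th:
        return 0.0                # below the minimum threshold: never drop
    if avg >= max_th:
        return 1.0                # above the maximum: drop every packet
    # Linear ramp from 0 to max_p between the two thresholds.
    return max_p * (avg - min_th) / (max_th - min_th)


def wred_drop_probability(avg, dscp, profiles):
    """WRED: one (min_th, max_th, max_p) profile per DSCP sharing the same queue."""
    min_th, max_th, max_p = profiles[dscp]
    return red_drop_probability(avg, min_th, max_th, max_p)
```

Consider the hypothetical profiles {0: (20, 40, 0.1), 48: (35, 40, 0.02)}. With an average queue size of 30 packets, best-effort traffic (DSCP 0) is dropped with probability 0.05. Traffic marked with DSCP 48, the class selector commonly used for network control, is not dropped at all.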

An IP network should always be designed so that IGP control packets do not suffer from network congestion (i.e., they experience low delay and no drop).

Note that providing an appropriate QoS to IGP packets not only implies the

handling of the packets to be processed at the output interface level but also



internally to the router such that the internal processing of packets between inter-

faces is tuned to achieve the required level of QoS.

Propagation Delay

Once the IGP packet is serviced by the router queuing system, the LSA experiences

some incompressible propagation delay before reaching the next-hop router. At first

sight, the propagation delay might appear negligible, but this might not be the case.

A rule of thumb is to consider that the propagation delay is 5 ms per 1000 kilometers (km) of fiber. Because the IP and optical layers are generally not congruent, it is not rare to see propagation delays that are significantly higher than the geographical distance between two IP routers would suggest.

To illustrate this aspect, consider a sparse optical network providing lambdas to an IP network. Although the geographical distance between a router in New York and a router in Los Angeles could be estimated at 5000 km (hence, a propagation delay of 25 ms), the path followed by the lambda interconnecting those two routers may be significantly longer. Note that the usual one-way propagation delay

between two routers in the United States, for instance, rarely exceeds 30 ms. The

propagation delay can of course be much smaller for a domestic network in small

countries/states, for instance, and sometimes higher for an international network in

which the optical paths can be quite long.

4.5.3 Time Estimate for the LSA Origination and Flooding Process

To originate a new LSA, the IGP process must first obtain CPU resources, then run

for some period to build the new LSA and then send it to every neighbor. The first component directly depends on the router OS. The router OS must perform

various tasks and has a scheduler that is responsible for the CPU time sharing

among the various tasks/processes that require the CPU. This is similar to any OS

run on computer systems. In the case of a router, the OS must ensure that the IGP

process can quickly get access to the CPU to process the LSA origination, once

required.

Generally, on modern router architectures, the LSA origination time rarely

exceeds a few tens of milliseconds and can even be less than 10 ms; this includes

both the waiting time to get the CPU and the LSA origination process run time.

The same reasoning applies to LSA flooding. The router OS must also ensure

that the process of LSA flooding will get an appropriate treatment to make sure

that LSA flooding will not be delayed. Again, on modern router architectures, the

LSA flooding time rarely exceeds a few tens of milliseconds.
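The components discussed in this section can be combined into a rough estimate of the time a new LSA needs to reach a remote router. This is a back-of-the-envelope model, not a measurement method. It uses the 5 ms per 1000 km rule of thumb. Each hop is given a single lumped processing and queuing delay. The function name and all numbers below are hypothetical.

```python
FIBER_MS_PER_KM = 5.0 / 1000  # rule of thumb: 5 ms per 1000 km of fiber


def flooding_time_ms(origination_ms, hops):
    """Rough time (ms) for a new LSA to reach a remote router.

    origination_ms: time to obtain the CPU and build/send the LSA at the originator.
    hops: list of (per_hop_ms, fiber_km) along the flooding path, where per_hop_ms
          lumps together LSA processing and queuing for that hop.
    """
    total = origination_ms
    for per_hop_ms, fiber_km in hops:
        total += per_hop_ms + fiber_km * FIBER_MS_PER_KM
    return total
```

For example, take 10 ms of origination and two hops of 5 ms processing each, over 1000 km and 4000 km of fiber. This gives 10 + (5 + 5) + (5 + 20) = 45 ms. In this example, fiber distance accounts for more than half of the total.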

4.6 Route Computation

This section deals with all the aspects related to the computation of a shortest path between a router and every reachable IP prefix. As with any recovery mechanism, route computation in an IP network is a key component of the overall network recovery time.

4.6.1 Shortest Path Computation

Notion of Shortest Path

As previously mentioned, a link state protocol allows each node in the network to

dynamically learn the network topology (stored in an LSDB), which is used

to compute the shortest path(s) to each reachable node in the network and therefore

to each IP prefix that is advertised.

Before describing how shortest paths can be computed, the notion of shortest path must first be explored. A path cost is defined as the sum of the metrics of the links traversed along the path, where a link metric is an integer defined by the network administrator that can reflect different link attributes depending on the overall objective.

For example, a common practice is for the link metric to reflect the link speed

(link bandwidth). So the metric is inversely proportional to the link speed. For

example, if an OC192 link has a metric of 1, an OC48 link will have a metric of 4,

whereas an OC3 link will have a metric of 64. Although this approach has been widely used in many networks, with the advent of QoS-sensitive applications another scheme has also been adopted, which consists of reflecting the link propagation delay. Note that such a scheme is particularly suited to networks with homogeneous link speeds but a wide range of propagation delays.
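The bandwidth-based metrics in the example above (OC192 = 1, OC48 = 4, OC3 = 64) follow from dividing a reference rate by the link rate. The sketch below uses the standard SONET line rates. The function name and the choice of OC192 as the reference are assumptions made for illustration.

```python
# SONET line rates in Mbit/s.
LINE_RATE_MBPS = {"OC3": 155.52, "OC12": 622.08, "OC48": 2488.32, "OC192": 9953.28}


def bandwidth_metric(link, reference="OC192"):
    """Metric inversely proportional to link speed, normalized so the reference gets 1."""
    # Links faster than the reference would round to 0, so clamp to the minimum of 1.
    return max(1, round(LINE_RATE_MBPS[reference] / LINE_RATE_MBPS[link]))
```

This reproduces the metrics given in the text: 1 for OC192, 4 for OC48, and 64 for OC3 (and 16 for OC12).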

In the former case (metric is a function of link speed), the shortest path between two routers is the path that follows the links with the higher link speeds. This makes it easier to traffic engineer the network flows, because large pipes will attract more traffic than small pipes. In the latter case (metric is a function of propagation delay), the shortest path from a node A to a node B is the path that minimizes the total propagation delay across the network. Some networks use a hybrid approach in which the link metric is a polynomial function of multiple attributes like the link speed and the propagation delay.

From the shortest path computation perspective, the objective is still to com-

pute the shortest path between two nodes taking into account the link metrics.

The Notion of Multitopology Routing

The idea of being able to use multiple metrics is not really new and was introduced

some time ago in both OSPF and IS-IS. This notion has been extended to the

concept of multitopology routing in which multiple virtual topologies can be

derived from a single physical network (with the ability to assign multiple metrics

to each link).

In the case of IS-IS, some extensions have been proposed [M-ISIS] to maintain

separate topologies (called multiple topologies [MTs]) per protocol49 (IP versions 4

and 6, Multicast); even for a protocol like IP version 4, separate topologies could be maintained. In a nutshell, when forming adjacencies over a link, routers exchange the set of MTs the link belongs to (IS-IS hello packets [IIH] are used for that purpose). IS-IS LSP origination and flooding are unchanged, but new MT type-length-value (TLV) fields are carried within IS-IS LSPs to flood the multiple MTs.

49 In this context, a protocol is usually referred to as an address family.

In terms of path computation, one SPF is performed per MT and the corresponding routing table is populated. So the failure of a link belonging to multiple MTs triggers the computation of multiple SPFs (one per MT the link belongs to), but the flooding remains unchanged.
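The "one SPF per MT" behavior can be sketched with a standard Dijkstra computation run once per topology. The sketch assumes that each MT is described as a list of (node, node, metric) links, so a link excluded from an MT is simply absent from its list. All function names and the sample topology are hypothetical.

```python
import heapq


def build_adj(links):
    """Undirected adjacency map {node: {neighbor: metric}} from (a, b, metric) links."""
    adj = {}
    for a, b, metric in links:
        adj.setdefault(a, {})[b] = metric
        adj.setdefault(b, {})[a] = metric
    return adj


def spf(adj, source):
    """Dijkstra: cost of the shortest path from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in adj.get(node, {}).items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def spf_per_topology(mt_links, source):
    """One SPF, hence one routing table, per MT."""
    return {mt: spf(build_adj(links), source) for mt, links in mt_links.items()}


def mts_to_recompute(mt_links, failed_link):
    """MTs whose SPF must be rerun when failed_link = (a, b) goes down."""
    a, b = failed_link
    return sorted(mt for mt, links in mt_links.items()
                  if any({x, y} == {a, b} for x, y, _ in links))
```

Consider a hypothetical case where MT1 uses bandwidth metrics A-B 1, B-D 1, A-C 5, C-D 1. MT2 uses delay metrics A-B 10, A-C 2, C-D 3, and excludes link B-D. Router A then reaches D at cost 2 in MT1 (via B) and at cost 5 in MT2 (via C). A failure of C-D requires both SPFs to be rerun. A failure of B-D affects MT1 only.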

Several examples are provided below that illustrate the concept of MTs in

various situations.

Example 1: M-ISIS used for different protocols (address families): IP version 4 and 6. This first example is illustrated in Figure 4.12, where some links belong to either the IP version 4 or the IP version 6 topology and some other links belong to both.

Example 2: M-ISIS used for multiple topologies of the same address family (IP version 4). In this second example, depicted in Figure 4.13, a network administrator is interested in running multiple IP version 4 topologies (with multiple metrics, each metric reflecting a particular constraint). For instance, in MT1 the link metric reflects the bandwidth, whereas in MT2 the link metric is a function of the propagation delay. Note that some links may be excluded from an MT (a satellite link with a long propagation delay could be excluded from MT2).

Figure 4.12 M-ISIS used for different protocols. (Figure: a physical topology of routers A, B, D, E, F, G, H, and I in which each link carries a metric for MT1 (IPv4), a metric for MT2 (IPv6), or both, together with the resulting MT1 (IPv4) and MT2 (IPv6) topologies.)

An interesting aspect to highlight is related to forwarding: How does the router determine which routing table to consult in order to route an IP packet? In the first example, there is one MT per address family: for instance, an IP version 4 packet will be routed using the IP version 4 routing table (MT1); hence, the address family determines the MT and thus the appropriate routing table. On the other hand, if several MTs belong to the same address family, there are multiple cases to consider:

Situation 1: The MTs are fully disjoint (an interface cannot belong to more than one MT). Packets are received from an interface that unambiguously determines their MT.

Situation 2: The MTs of the same address family share some interfaces, but their address spaces do not overlap. The router can then determine the MT to which a packet belongs from its destination address.

Situation 3: The MTs belong to the same address family and share some interfaces with overlapping address spaces. This corresponds to the example depicted in Figure 4.13, in which some links belong to both MT1 and MT2 and the address spaces overlap. In this case, some additional mechanisms are required. A typical solution is to use the DSCP, which identifies the required QoS in the IP header. Each MT is then reserved for a particular DSCP, and the appropriate routing table is consulted based on the DSCP. This way, voice packets marked, for instance, with the DSCP value of 5 will be routed using MT2, which computes the shortest path based on the propagation delay metric.

Referring to the example depicted in Figure 4.13, if two packets enter node A with node D as their IP destination, the data packet (marked with a specific DSCP value) will be routed along SPT1 of MT1, whereas the voice packet, marked with a different DSCP value, will follow SPT2 of MT2.

Figure 4.13 Multitopology routing: Example 2. (Panels: Physical Topology with the metrics used for MT1 (LinkMetric = Bandwidth) and for MT2 (LinkMetric = Propagation Delay); SP1 - MT1 (IPv4, bandwidth-based metric) carrying Data (MT1); SP2 - MT2 (IPv4, propagation-delay-based metric) carrying Voice (MT2).)
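The DSCP-based table selection of Situation 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the DSCP-to-MT mapping (DSCP 5 for MT2, as in the example above), the prefixes, and the next-hop names are all hypothetical, and the lookup is a plain longest-prefix match over a small dictionary rather than a real forwarding structure.

```python
import ipaddress

# Hypothetical DSCP -> MT mapping: following the example in the text, voice
# packets marked with DSCP 5 use MT2; everything else uses MT1.
DSCP_TO_MT = {5: "MT2"}
DEFAULT_MT = "MT1"

# Hypothetical per-MT routing tables with overlapping IPv4 prefixes;
# next-hop names are placeholders.
ROUTING_TABLES = {
    "MT1": {ipaddress.ip_network("10.0.0.0/8"): "nh-data"},
    "MT2": {
        ipaddress.ip_network("10.0.0.0/8"): "nh-voice",
        ipaddress.ip_network("10.1.0.0/16"): "nh-voice-2",
    },
}


def select_mt(dscp):
    """The DSCP, not the destination address, selects the topology."""
    return DSCP_TO_MT.get(dscp, DEFAULT_MT)


def lookup(dst, dscp):
    """Longest-prefix match in the routing table of the selected MT."""
    table = ROUTING_TABLES[select_mt(dscp)]
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]
```

The same destination (10.1.2.3) therefore resolves to different next hops depending on the DSCP, which is exactly why overlapping address spaces need this extra selection step.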

The case of OSPF is slightly different. The support of IP version 6, for instance, requires OSPF version 3 ([OSPFv3]), but the capability to support multiple topologies (with multiple metrics per link) is part of the current protocol specification. When an adjacency is formed between two neighbors, the link can be advertised with more than one metric in the corresponding type 1 LSA. Also, when interarea or external routes are advertised in type 3 and type 5 LSAs, respectively, multiple metrics can be associated with each route. The protocol can support up to one metric per IP ToS value. Each router then computes one SPF per ToS, and packets are routed using the appropriate routing table based on the ToS value in their IP header. This is equivalent to Example 2 presented earlier. Note that OSPF routers can be configured to route all IP packets on the ToS 0 path only. When routers supporting ToS routing are combined with routers that only support the ToS 0 path, the routers that only support ToS 0 routing should be avoided during the SPF computation when routing non-ToS 0 IP packets.
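The per-ToS SPF and the avoidance of ToS-0-only routers can be sketched as below. Two simplifying assumptions are made that do not come from the text: a link that does not advertise a metric for the requested ToS falls back to its ToS 0 metric, and for a non-zero ToS the ToS-0-only routers are simply pruned from the graph. The topology and names are hypothetical.

```python
import heapq
from collections import defaultdict


def tos_spf(links, source, tos, tos0_only=frozenset()):
    """One SPF for a given ToS.

    links: list of (u, v, {tos: metric}). Assumptions of this sketch: a link
    without a metric for `tos` falls back to its ToS 0 metric, and for
    tos != 0 the routers in `tos0_only` are pruned from the graph.
    """
    adj = defaultdict(list)
    for u, v, metrics in links:
        if tos != 0 and (u in tos0_only or v in tos0_only):
            continue  # avoid ToS-0-only routers for non-ToS 0 traffic
        cost = metrics.get(tos, metrics[0])
        adj[u].append((v, cost))
        adj[v].append((u, cost))
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, cost in adj[u]:
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist


# Hypothetical topology: B only supports ToS 0; the A-C-D links advertise a
# separate metric for ToS 4.
LINKS = [
    ("A", "B", {0: 1}),
    ("B", "D", {0: 1}),
    ("A", "C", {0: 3, 4: 2}),
    ("C", "D", {0: 3, 4: 2}),
]
```

With B pruned, ToS 4 traffic from A reaches D over A-C-D at cost 4, while ToS 0 traffic keeps using the cheaper path through B.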

4.6.2 The Dijkstra Algorithm

The famous mathematician Edsger Dijkstra, a pioneer in computer science, gave his name to an algorithm that computes loop-free shortest paths and that has been used for several decades in a wide range of contexts. The algorithm is described in detail in this section, but it is quite interesting first to read a quote from Dijkstra about this invention (from [EWD-1166]):

I designed my first nontrivial algorithms. The algorithm for The Shortest Path was designed for the purpose of demonstrating the power of the ARMAC at its official inauguration in 1956, the one for The Shortest Spanning Tree was designed to minimize the amount of copper in the backpanel wiring of the X1. In retrospect, it is revealing that I did not rush to publish these two algorithms: at that time, discrete algorithms had not yet acquired mathematical respectability, and there were no suitable journals. Eventually they were offered in 1959 to "Numerische Mathematik" in an effort of helping that new journal to establish itself. For many years, and in wide circles, The Shortest Path has been the main pillar for my name and fame, and then it is a strange thought that it was designed without pencil and paper, while I had a cup of coffee with my wife on a sunny cafe terrace in Amsterdam, only designed for a demo. . . .



Algorithm Description

The Dijkstra algorithm (also referred to as SPF) finds the shortest paths from one source S to every other router in a network with nonnegative arcs (note that solving the shortest path problem in networks with negative arcs is more complicated: the Dijkstra algorithm no longer applies, and when negative cycles exist, finding a shortest loop-free path is NP-hard; see Section 4.13 for a discussion on algorithm complexity). In the particular case of IP routing, arcs represent links whose cost is always positive. The result of the SPF algorithm is the shortest path tree (SPT), the tree formed by the set of shortest paths from S.

Dijkstra Algorithm

Before describing the Dijkstra algorithm, let's first start with a few definitions. The network can be represented as a directed⁵⁰ graph noted G = (N, L), where

N: the set of nodes (routers).
L: the set of links.
S: the source (the computing node).
|LIST|: the number of elements of the list LIST.
n: the number of nodes in the network.
Lij: the link between node i and node j.
c(Lij): the cost of link Lij.
d(i): the current distance between the source S and node i (the sum of the costs of the individual links along the shortest path found so far), with d(S) = 0.

Three lists are then defined:

REM (REMAINING): list of nodes for which no path has been found yet. This list is also called the UNKNOWN list.
PATHS: list of nodes for which a shortest path has been found.
TENT: tentative list. List of nodes for which at least one path (possibly not yet the shortest) has been found.

Note that every node belongs to exactly one of the three lists, so |REM| + |TENT| + |PATHS| = n.

Step 1: Initialize the three lists:
  PATHS = {} (empty)
  TENT = {S}
  REM = N \ {S} (all the other nodes in the network)
While TENT is not empty
  Move from TENT to PATHS the node i such that d(i) = min {d(k) for k ∈ TENT}
  For each neighbor j of node i that is not in PATHS
    If node j is not already in TENT
      Remove j from REM and move it to TENT
      Compute d(j) = d(i) + c(Lij)
      Record i as its predecessor
    If node j is already in TENT
      If d(i) + c(Lij) < d(j), set d(j) = d(i) + c(Lij) and record i as its new predecessor
    Compute the next hops of j
End

⁵⁰ Generally, two routers are connected via a bidirectional link. In this case, the link is represented as two directed arcs, which may or may not have the same cost.

A detailed step-by-step example of the Dijkstra algorithm is provided later in this

section.

Dijkstra Algorithm Complexity

Algorithm complexity is undoubtedly a key topic because it has a direct impact on

the required amount of time for an IP router to compute an alternate path. Section

4.13 is devoted to algorithm complexity, but in a nutshell this refers to the number

of operations required by an algorithm to provide an output as a function of the

problem size.

The complexity of the Dijkstra algorithm is easy to compute. As covered in Section 4.13, algorithm complexity is computed by considering the worst case. The algorithm actually performs two different sets of tasks:

1. Selection from the TENT list of the next node to move to the PATHS list.
2. For each node moved to the PATHS list, each of its neighbors is moved to the TENT list (if it is not already there), and for each of them the distance d(i) + c(Lij) is computed.

In the worst case, operation (1) is performed n times (where n is the problem size, i.e., the number of nodes in the network), and at each step up to n nodes are scanned (more precisely, at step k, the TENT list contains at most n - k nodes). So the total number of tasks turns out to be

$\sum_{k=1}^{n} (n-k) = \frac{n(n-1)}{2} = O(n^2)$

(see Section 4.13 for a detailed definition of algorithm complexity). Operation (2) is performed once per link, i.e., |L| times in total, where |L| is the number of links in the network; since |L| is at most n(n - 1), this term is also $O(n^2)$. Hence, the resulting complexity of the Dijkstra algorithm is $O(n^2)$.

An Example Step by Step

This section walks step by step through a detailed example of the Dijkstra algorithm computing a set of shortest paths. Let's consider the network depicted in Figure 4.14.

Initial step: The three lists are initialized as follows:
PATH = {}
TENT = {A} (the computing node is A)
REM = {B, C, D, E, F, G, H, I} (all the nodes in the network excluding the root A)



Step 1 The closest node to the root A is A itself because d(A) = 0. So i = A (Figure 4.14).
PATH = {A}
TENT = {B, E, H} (all the neighbors of A are moved to the TENT list⁵¹, and their shortest distance from the root is computed, as well as their predecessor, noted P(X))
d(B) = 3 (P(B) = A), d(E) = 6 (P(E) = A), d(H) = 5 (P(H) = A)
REM = {C, D, F, G, I}

Step 2 The node of the TENT list closest to A is selected: i = B, and its neighbors are moved to the TENT list (new d(i) are also computed) (Figure 4.15).
PATH = {A, B}
TENT = {E, H, C, F} (all the neighbors of B are moved to the TENT list, and their shortest distance from the root is computed, as well as their predecessor, noted P(X))
d(E) = 6 (P(E) = A), d(H) = 5 (P(H) = A), d(C) = 7 (P(C) = B), d(F) = 5 (P(F) = B) (note that node E keeps the same predecessor since the path from A to E via the direct link A-E, with a cost of 6, is shorter than the path A-B-E, whose cost is equal to 11)
REM = {D, G, I}

⁵¹ As already mentioned in the Dijkstra algorithm description, some neighbors may already be in the TENT list.

Figure 4.14 An example of the Dijkstra algorithm. (Panels: Initial Step and Step 1; the number next to each node in the PATH list indicates d(i), its distance from A.)

Step 3 The node of the TENT list closest to A is selected: i = H (node F, also at distance 5, could equally have been selected at this stage without changing the resulting SPT), and its neighbors are moved to the TENT list (new d(i) are also computed).
PATH = {A, B, H}
TENT = {E, C, F, I} (all the neighbors of H are moved to the TENT list, and their shortest distance from the root is computed, as well as their predecessor, noted P(X))
d(E) = 6 (P(E) = A), d(C) = 7 (P(C) = B), d(F) = 5 (P(F) = B), d(I) = 7 (P(I) = H) (as in the previous case, node E keeps the same predecessor since the path from A to E via the direct link A-E, with a cost of 6, is shorter than the path A-H-E, whose cost is equal to 9)
REM = {D, G}

Step 4 The node of the TENT list closest to A is selected: i = F, and its neighbors are moved to the TENT list (new d(i) are also computed) (Figure 4.16).
PATH = {A, B, H, F}
TENT = {E, C, I, D, G}
d(E) = 6 (P(E) = A), d(C) = 6 (P(C) = F), d(I) = 7 (P(I) = F), d(G) = 13 (P(G) = F), d(D) = 6 (P(D) = F) (this time, the distance from A to C decreases, so C's predecessor is updated)
REM = {}

Figure 4.15 An example of the Dijkstra algorithm (steps 2 and 3).

Step 5 The node of the TENT list closest to A is selected: i = E. Note that node C or D could also have been selected at this stage, but this does not change the resulting SPT.
PATH = {A, B, H, F, E}
TENT = {C, I, D, G} (no new node is added: all of E's neighbors, A, B, F, and H, are already in PATH)
d(C) = 6 (P(C) = F), d(I) = 7 (P(I) = F), d(G) = 13 (P(G) = F), d(D) = 6 (P(D) = F)
REM = {}

Figure 4.16 (panels: Step 4 and Step 5)

Step 6 The node of the TENT list closest to A is selected: i = C. Note that D could also have been selected at this stage, but this does not change the resulting SPT (Figure 4.17).
PATH = {A, B, H, F, E, C}
TENT = {I, D, G} (all the neighbors of C are already in the TENT or PATH lists, and their shortest distance from the root and their predecessor, noted P(X), have already been computed)
d(I) = 7 (P(I) = F), d(G) = 13 (P(G) = F), d(D) = 6 (P(D) = F)
REM = {}

Step 7 The node of the TENT list closest to A is selected: i = D.
PATH = {A, B, H, F, E, C, D}
TENT = {I, G} (all the neighbors of D are already in the TENT or PATH lists; note that node C, a neighbor of D, is already in PATH)
d(I) = 7 (P(I) = F), d(G) = 11 (P(G) = D) (a new shortest distance from A to G is computed (11), with D as the predecessor)
REM = {}

Step 8 The node of the TENT list closest to A is selected: i = I.
PATH = {A, B, H, F, E, C, D, I}
TENT = {G} (all the neighbors of I are already in the TENT or PATH lists)
d(G) = 11 (P(G) = D)
REM = {}

Step 9 The last node in TENT, node G, is added to PATH (Figure 4.18).

Figure 4.17 (panels: Step 6 and Step 7)
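The walkthrough above can be replayed with a direct Python transcription of the PATHS/TENT/REM pseudocode. The link costs are inferred from the d(i) values and predecessors given in the text, so this is only a partial reconstruction of Figure 4.14: links whose cost the text does not state (E-F, C-D, and I-G) are left out. With these links, the sketch visits the nodes in the order A, B, H, F, E, C, D, I, G and obtains the same final distances as the walkthrough.

```python
from collections import defaultdict


def dijkstra_lists(links, source):
    """Dijkstra with explicit PATHS, TENT, and REM lists (linear TENT scan)."""
    adj = defaultdict(list)
    nodes = set()
    for u, v, cost in links:  # a bidirectional link = two arcs, same cost here
        adj[u].append((v, cost))
        adj[v].append((u, cost))
        nodes.update((u, v))
    paths = []            # nodes in the order they enter PATHS
    tent = [source]       # TENT, kept in insertion order
    rem = nodes - {source}
    dist, pred, next_hop = {source: 0}, {}, {}
    while tent:
        # min() returns the first minimum, so ties go to the node that
        # entered TENT first.
        i = min(tent, key=lambda k: dist[k])
        tent.remove(i)
        paths.append(i)
        for j, cost in adj[i]:
            if j in paths:
                continue
            candidate = dist[i] + cost
            if j in rem:                  # first path found to j
                rem.remove(j)
                tent.append(j)
            elif candidate >= dist[j]:    # j in TENT, no strictly shorter path
                continue
            dist[j] = candidate
            pred[j] = i
            # Neighbors of S are their own next hop; others inherit i's.
            next_hop[j] = j if i == source else next_hop[i]
    return paths, dist, pred, next_hop


# Link costs inferred from the walkthrough (partial reconstruction of
# Figure 4.14; E-F, C-D, and I-G are omitted because their costs are not given).
LINKS = [
    ("A", "B", 3), ("A", "E", 6), ("A", "H", 5),
    ("B", "C", 4), ("B", "E", 8), ("B", "F", 2),
    ("H", "E", 4), ("H", "I", 2),
    ("F", "C", 1), ("F", "I", 2), ("F", "G", 8), ("F", "D", 1),
    ("D", "G", 5),
]
```

One difference is worth pointing out: node I is reachable at cost 7 both via H and via F. With the strict "<" update rule of the pseudocode, the sketch keeps H as I's predecessor, whereas the walkthrough records F; an implementation supporting equal-cost multipath would keep both.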




Some Performance Numbers

As mentioned earlier, a "naive" implementation of the Dijkstra algorithm has a complexity of $O(n^2)$. It is worth mentioning that various optimizations can drastically reduce the running time of the SPF computation. Some existing implementations have a complexity of $O(n \log n)$.
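The text does not say which optimizations existing implementations use. One classic technique, shown here as an assumption rather than as any vendor's method, keeps the TENT list in a binary heap: each selection then costs $O(\log n)$ instead of a linear scan, for a total of $O((n + |L|) \log n)$, which is $O(n \log n)$ when the number of links grows linearly with n. The sketch checks the heap-based version against a naive linear-scan version on random, hypothetical topologies.

```python
import heapq
import random


def dijkstra_heap(adj, source):
    """Binary-heap Dijkstra: O((n + L) log n) rather than O(n^2)."""
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry superseded by a shorter distance
        settled.add(u)
        for v, cost in adj[u]:
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist


def dijkstra_naive(adj, source):
    """Reference O(n^2) version: linear scan of the tentative set."""
    dist = {source: 0}
    tent, settled = {source}, set()
    while tent:
        u = min(tent, key=dist.get)
        tent.remove(u)
        settled.add(u)
        for v, cost in adj[u]:
            if v not in settled and dist[u] + cost < dist.get(v, float("inf")):
                dist[v] = dist[u] + cost
                tent.add(v)
    return dist


def random_graph(n, extra_links, seed):
    """Connected random graph: a random spanning tree plus extra links."""
    rng = random.Random(seed)
    adj = {i: [] for i in range(n)}

    def link(u, v):
        cost = rng.randint(1, 10)
        adj[u].append((v, cost))
        adj[v].append((u, cost))

    for i in range(1, n):
        link(rng.randrange(i), i)
    for _ in range(extra_links):
        u, v = rng.sample(range(n), 2)
        link(u, v)
    return adj
```

The heap may hold several entries for the same node; skipping already settled nodes when they are popped avoids the need for a decrease-key operation.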

Figure 4.19 shows the algorithm complexity as a function of the problem size (the number of routers) for algorithms having a complexity of $O(n^2)$ and $O(n \log n)$. The two plots show the same functions with different scales on the axes.

Of course, the router's CPU largely determines the overall computation time, but to give a rough estimate, existing optimized implementations running on core routers are able to compute an SPT in a few tens of milliseconds for networks having hundreds of routers. As already mentioned, the routing table computation requires not only the computation of the SPT but also that of the RIB, which accounts for a non-negligible share of the total routing table computation time.

A very interesting optimization of the original SPF algorithm, called incremental SPF, consists of limiting the SPT computation to some part of the tree and is covered in detail in Section 4.14.

Figure 4.18 (panels: Step 8 and Step 9, final step)




4.6.3 Shortest Path Computation Triggers

As already mentioned, an SPF computation must be triggered each time a new LSA

is received, which happens whenever, for instance, a topology change occurs or an

IP prefix is added or deleted locally on a router (in that case, the network topology

does not change, but the router advertises a change of IP address reachability).

Both events provoke the origination of a new LSA, but OSPF and IS-IS handle

the two events differently. Regardless of the event that caused the new LSA to be

originated, OSPF systematically triggers a new SPF computation upon receiving a

new LSA. On the other hand, some IS-IS implementations trigger a new SPT and RIB computation only if the LSA reflects a topology change. If the LSA reports new IP prefix reachability information (so there is no topology change), IS-IS performs only a new RIB computation (called a partial route computation [PRC]), which saves the cost of an SPT computation. This is, for example, the case of the Cisco IS-IS implementation.
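The SPF-versus-PRC decision described above can be sketched as follows, assuming a simplified, hypothetical LSA content made of adjacencies and prefixes (real LSAs and LSPs carry much more). A change in adjacencies triggers a full SPT and RIB computation; a change limited to prefixes triggers only a partial route computation. The `prc_capable=False` case mirrors the OSPF behavior described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lsa:
    """Hypothetical, simplified content of an LSA (or IS-IS LSP)."""
    origin: str
    neighbors: frozenset = frozenset()  # {(neighbor_id, metric), ...}
    prefixes: frozenset = frozenset()   # {(prefix, metric), ...}


def computation_for(old, new, prc_capable=True):
    """Return 'SPF', 'PRC', or None for a newly received LSA.

    prc_capable=False models OSPF as described in the text (any new LSA
    triggers an SPF); True models an IS-IS implementation that runs only a
    partial route computation when just the prefixes change.
    """
    if not prc_capable:
        return "SPF"
    if old is None or old.neighbors != new.neighbors:
        return "SPF"   # topology changed: SPT + RIB
    if old.prefixes != new.prefixes:
        return "PRC"   # reachability only: RIB update, no new SPT
    return None        # same content: nothing to recompute in this sketch
```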

Once a node receives a new LSA and determines that an SPT and/or RIB computation must be triggered, it might be desirable to delay the computation. An efficient way to handle the delay between the triggering event and the actual task execution is to use a dampening mechanism (like the exponential back-off algorithm described in Section 4.4), as in the case of LSA origination. This preserves the router's CPU in the case of network instability while allowing fast SPF triggering in the case of limited network changes (which correspond to the vast majority of failure scenarios).

Figure 4.19 Algorithm complexities of $n^2$ and $n \log n$. (Two panels plotting the complexity against the number of routers n over different ranges, roughly 1 to 20 and 1 to 1,000.)

Example on a Cisco router:

Router isis

...

spf-interval A B C
prc-interval A B C

...

Router ospf

...

timers throttle spf A B C

...

The algorithm used corresponds to the exponential back-off algorithm

described in Section 4.4. Just note that on a Cisco router the variables A, B, and

C correspond to the variables Z, X, and Y, respectively.

As with LSA origination, such a dampening algorithm allows a router to

quickly react to a single failure while protecting it from triggering too many SPT

and RIB computations in the case of network instabilities. Indeed, although the

SPT and RIB computation can be fast, it is clearly undesirable to run tens of SPT

and RIB computations back to back if the router keeps on receiving new LSAs

from a flapping router in the network that would not have any LSA dampening

mechanism.
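The exponential back-off that throttles SPF runs can be sketched as below. The parameter names `initial`, `increment`, and `maximum` are hypothetical and are meant to play roles similar to A, B, and C in the tuning discussion that follows (A short, B and C slowing things down during instability); the reset after a quiet period of twice the maximum wait is an assumption of this sketch, not a statement about any particular router's timers.

```python
class SpfThrottle:
    """Exponential back-off sketch; all times are in milliseconds.

    initial:   wait before the first SPF after a quiet period
    increment: wait before the next SPF; doubled for each further SPF
    maximum:   upper bound on the wait
    """

    def __init__(self, initial, increment, maximum):
        self.initial, self.increment, self.maximum = initial, increment, maximum
        self.current = None   # None means "quiet": next wait is `initial`
        self.last_run = None  # time at which the last SPF was scheduled to run

    def schedule(self, now):
        """Return the delay applied to an SPF triggered at time `now`."""
        # Assumption: a quiet period of 2 * maximum resets the back-off.
        if self.last_run is None or now - self.last_run >= 2 * self.maximum:
            self.current = None
        if self.current is None:
            delay = self.initial
            self.current = self.increment
        else:
            delay = self.current
            self.current = min(2 * self.current, self.maximum)
        self.last_run = now + delay
        return delay
```

With `SpfThrottle(initial=10, increment=100, maximum=1000)`, a burst of triggers gets delays of 10, 100, 200, 400, 800, and then 1000 ms, while an isolated trigger after a long quiet period is handled after only 10 ms. Triggers that arrive while an SPF is already pending are not modeled; an implementation would typically coalesce them into the pending run.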

Parameters Tuning

Similarly to the case of LSA origination parameter tuning, the setting of the SPF-related parameters depends heavily on the network characteristics and the rerouting time objectives. A case study is proposed in Section 4.6.

That being said, a good practice is generally to set variable A to a short value to get fast convergence upon a single failure and then rely on variables B and C to slow down the SPT and RIB computation triggering in the case of network instability. The case of a network having multiple SRLGs⁵² is quite interesting, though. Suppose a network with a large number of SRLGs in which the propagation delays between links belonging to common SRLGs are not negligible (Figure 4.20).

Let's consider the network depicted in Figure 4.20: links B-C and C-H belong to the same SRLG (e.g., these links share a common resource like a fiber). In the case of an SRLG failure (fiber cut), both link B-C and link C-H fail simultaneously. Upon receiving node B's LSA, A may want to quickly trigger an SPF to improve the convergence, and in this particular example, A will likely select the path A-G-H-C to reach node C (if we suppose that all the links have a cost equal to 1). Unfortunately, link H-C has also failed, but A will get an accurate updated topology view only after having received node H's LSA (or node C's LSA), which may be delayed by the propagation delay along the H-G-A path. Then, a second SPF must be triggered by node A (after the timer Y has expired). So in such networks, there are multiple strategies to handle such situations. The first strategy is to slightly increase the value of A, so the SPF is triggered after all the LSAs have been received. Another approach is to set B to a small value so a second SPF can be triggered quickly if another LSA reflecting another topology change arrives shortly afterward (actually, a third SPF may be required to obtain the actual topology because each link failure triggers the origination of two LSAs).

⁵² The notion of SRLG has already been covered in Chapter 3 and is examined again in Chapter 5.

4.6.4 Routing Information Base Update

Computing the shortest path between the computing node and every other node in the network is one thing, but the ultimate goal is obviously to compute the routing table, also called the RIB.

Although the RIB computation can be performed during the SPT computation,

we can consider the RIB computation as a separate task. The SPT provides the

shortest path between the computing node and every other reachable node in

the network: The RIB computation includes populating a table that contains the

shortest path to reach the various IP prefixes. Then, usually, routers will compute

another table, called the forwarding information base (FIB), which will contain

the minimum required set of information to forward an IP packet. For instance,

keeping track of the whole path and the corresponding metric to reach an IP prefix

IP1 in the FIB is not really necessary; the only useful information from a forward-

ing perspective is the next hop and the outgoing interface for IP1. Consequently, the

FIB will contain the list of IP prefixes, and for each of them, the next hop along with

some other low-level information related to the layer 2 protocol in use on the

outgoing link.
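The reduction from RIB to FIB can be illustrated with a minimal sketch. The `RibEntry` record and the interface names are hypothetical; the point is only that the FIB drops the path and the metric and keeps, per prefix, the next hop and the outgoing interface, which is all a longest-prefix-match lookup needs. Layer 2 information is omitted.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class RibEntry:
    """Hypothetical RIB record: more than forwarding actually needs."""
    prefix: str
    path: list       # full path from the SPT
    metric: int
    next_hop: str
    interface: str


def build_fib(rib):
    """Keep only prefix -> (next hop, outgoing interface).

    Layer 2 rewrite information, which a real FIB also holds, is omitted.
    """
    return {ipaddress.ip_network(e.prefix): (e.next_hop, e.interface) for e in rib}


def fib_lookup(fib, dst):
    """Longest-prefix match on the destination address."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net in fib:
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return None if best is None else fib[best]
```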

Figure 4.20 Case of an SRLG failure.



The computation time of the RIB is a component of the overall IGP convergence, which may not be negligible. Indeed, at first glance, one might think that the SPT computation, which is a function of the network topology as shown earlier, is the predominant factor of the routing table computation, but this might not be the case, especially on route processors (RPs) with very powerful CPUs (this is even more true when techniques like iSPF⁵³ are used). So both the network topology and the number of routes are important factors of the routing table computation time. This is why an advisable and very common practice consists of reducing the RIB size and the number of IP prefixes flooded by the IGP.

Another interesting approach is to use mechanisms to prioritize the update of

the important IP prefixes. But what is an important IP prefix and how can such

prefixes be identified?

Let us consider the two ends of the spectrum: a link IP address and a BGP peer address. IP link address usage is usually limited to the "traceroute" application, so losing connectivity to an IP link address for a short period (until the routing protocol has converged) is certainly not an issue. On the other hand, a BGP peer IP address is used to forward all the traffic announced by that BGP peer. If router A has a peering session with router B, all the IP prefixes (Internet and/or virtual private network [VPN] in the case of MPLS VPN) announced by B are resolved via a mechanism called route recursion, where A first tries to find the path to reach router B; consequently, being able to find an alternate path to reach router B upon a network failure is of the utmost importance! Hence, a BGP peer IP address is typically an important prefix, and, as mentioned earlier, prioritizing the treatment of such important prefixes is a good idea. Now, how can such important IP prefixes be identified? IS-IS proposes a mechanism that allows certain routes to be tagged ("colored") (see [IS-IS-TAG]). The network administrator is of course responsible for assigning particular tags to the "important" prefixes. Equivalent mechanisms for OSPF are under development. Another approach, of course, is to limit the IP prefixes carried in the IGP to the important prefixes (e.g., link IP addresses do not need to be advertised within the IGP).
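Prioritizing important prefixes during the RIB update can be sketched as a stable ordering of the changed prefixes. The tag value 100 and the prefixes below are hypothetical examples; how tags are actually carried ([IS-IS-TAG]) and how an implementation schedules its RIB work are not modeled.

```python
# Hypothetical tag value that the administrator assigns to "important"
# prefixes such as BGP peer addresses; 100 is an arbitrary example.
IMPORTANT_TAG = 100


def rib_update_order(changed):
    """Order changed prefixes so that tagged (important) ones are updated first.

    changed: list of (prefix, tag or None). sorted() is stable, so prefixes
    keep their original relative order within each class.
    """
    return [prefix for prefix, tag in
            sorted(changed, key=lambda entry: entry[1] != IMPORTANT_TAG)]
```

For instance, loopback addresses of BGP peers tagged with 100 would be installed before the link addresses, which only matter for traceroute.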

4.7 Temporary Loops during Network State Changes

Link state protocols compute loop-free shortest paths between the various source-destination pairs under steady state. But when failures occur in the network, the momentary lack of synchronization between the various routers' LSDBs may lead to the creation of temporary loops. Such loops may cause the traffic traversing the links involved to be dropped and may substantially increase some link loads, which may affect other traffic traversing those links even though that traffic does not follow a path affected by the failure. Moreover, such loops can also be observed in other situations, in particular when a link cost is increased (by manual configuration) or when a link is restored. The rest of this section describes the behavior of these temporary loops and their characteristics.

⁵³ The iSPF algorithm is described in detail in Section 4.14.

4.7.1 Temporary Loops in the Case of a Link or Node Failure

In a distributed routing environment, the timing sequence of events is not deterministic and depends on various factors: the IGP configuration and the set of associated timers, the propagation and queuing delays experienced by an LSA flooded throughout the network, and the router implementations, to mention just a few. Hence, it is virtually impossible to predict the exact timing of the event sequence. That being said, one can analyze a possible sequence of events that could lead to temporary loops upon a link or node failure.

To better illustrate how temporary loops can appear in a converging network

upon network element failure, consider the example in Figure 4.21. In Figure 4.21,

the assumption is made that the IGP timers are tuned to provide fast convergence.

So in this example, S is the source of the traffic and Z the destination. NH(X,Z) is

the next hop computed by node X to reach destination Z. So, for instance, at steady

state (no failure)

NH(A,Z) = B because the shortest path from A to Z is A-B-C-D
NH(G,Z) = H and A because the two shortest paths from G to Z are G-H-I-D and G-A-B-C-D

Now consider the following sequence of events:

Time T0: The link C-D fails. The IP packets traveling along this link start to be dropped. After some period (the link failure detection time), router C originates a new LSA and triggers a new SPF to recompute its routing table. Once C has converged, NH(C,Z) = B, because the next hop along the shortest path from C to reach Z is now B. This leads to the first temporary loop, which lasts until node B has itself converged (which requires receiving the LSA of node C, or the LSA of node D, reporting that the link C-D has failed, and recomputing its routing table). Indeed, before B receives the new LSA and recomputes its routing table, NH(B,Z) = C, and thus there is a temporary loop B-C-B.

Figure 4.21 Illustration of a temporary loop.

Time T1: B then receives the LSA originated by C. A very useful optimization is to always flood a received LSA before triggering an SPF (this ensures that the LSA flooding is not delayed by the SPF computation). Once the LSA is flooded, B triggers an SPF and converges; then NH(B,Z) = A. Now a new temporary loop appears between nodes A and B because NH(A,Z) = B (A is not yet aware of the failure of the link C-D and thus has not converged yet). This second temporary loop is illustrated in Figure 4.22.

Time T2: A receives the LSA, floods it to G and E, and triggers an SPF: NH(A,Z) = G. The two previous micro-loops, between (A,B) and (B,C), disappear, but a new temporary loop appears, A-G-A, since NH(G,Z) = A (for the traffic selecting this path, since there are two equal-cost paths from G to Z).

Time T3: H receives the LSA, floods it to I and G, and triggers an SPF: NH(H,Z) = I. Note that there is no change in the path to reach Z from H, so no new temporary loop appears (Figure 4.23).

Important note: Once again, one cannot predict whether T2 would occur before or after T3; this sequence just highlights a possible routing dynamic.

Figure 4.22 Illustration of temporary loops (continued). (Panels: T0 and T1.)



Time T4: Finally, G receives one of the LSAs originated by C or D, either from A or from H, triggers an SPF, and converges: NH(G,Z) = H. The temporary loop between G and A now disappears. This final state is depicted in Figure 4.24.
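The successive micro-loops of this example can be checked mechanically by following the next hops NH(X,Z) of each snapshot until either Z is reached or a router is revisited. The snapshots below are taken from the sequence T0 to T4 described above; modeling G's equal-cost choice as always picking A before T4, and H and I as forwarding via I and D throughout, are assumptions of this sketch.

```python
def follow(next_hop, src, dst):
    """Follow next hops from src toward dst.

    Returns (path, loop): loop is the list of routers forming a cycle,
    or None if dst is reached. next_hop maps router -> NH(router, dst).
    """
    path, position = [src], {src: 0}
    node = src
    while node != dst:
        node = next_hop[node]
        if node in position:  # revisited a router: forwarding loop
            return path, path[position[node]:]
        position[node] = len(path)
        path.append(node)
    return path, None


# NH(X, Z) snapshots from the sequence described above (assumption: G's
# equal-cost choice is modeled as A until G converges at T4).
STEADY = {"A": "B", "B": "C", "C": "D", "D": "Z", "G": "A", "H": "I", "I": "D"}
T0 = dict(STEADY, C="B")     # C has converged, B has not
T1 = dict(T0, B="A")         # B has converged, A has not
T2 = dict(T1, A="G")         # A has converged, G still uses A
T4 = dict(T2, G="H")         # G has converged (T3 changes nothing for H)
```

Following the packets from A shows the loop B-C at T0 and A-B at T1; from B at T2 the packets end up in the loop A-G; at T4 every router forwards along the final path.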

Loop Duration and Number of Routers Involved

As pointed out earlier, the loop duration depends on the event sequence timing, which is highly unpredictable. Transient loops are caused by the lack of synchronization between the LSDBs of various routers. The duration of this lack of synchronization is essentially driven by several clearly identified factors:

1. The flooding of the newly originated LSA: The longer the LSA flooding takes, the higher the probability of a lack of synchronization between routers' databases and, consequently, of temporary loops. This highlights the fact that flooding should receive appropriate treatment; in particular, a received LSA should always be flooded as fast as possible, and LSA packets should receive an appropriate QoS. The only incompressible component is the propagation delay along the links.

2. The IGP timers: A homogeneous set of IGP timers is usually recommended because heterogeneous timers may have the undesirable effect of increasing temporary loop durations. For instance, going back to the previous example, it can easily be seen that if, upon receiving the LSA originated by C, node B delays the triggering of its SPF (because different timers are used), then the temporary loop between B and C would last longer than necessary. Hence, it is desirable and recommended to configure homogeneous timers (although under some specific circumstances or designs heterogeneous timers may be used). So either both B and C converge slowly, or both are configured to converge quickly; the situation of one node (e.g., C) converging rapidly while other nodes converge slowly is not desirable.

Figure 4.23 (panels: T2 and T3)

The number of routers involved in a loop also depends heavily on the timing sequence. The example above showed a set of concatenated temporary loops involving several pairs of routers, but a different sequence of events could have led to a larger loop involving more routers.

It is worth noting that when the IGP is appropriately tuned, temporary loops are short-lived. However, although their effect can be mitigated by high IP time-to-live (TTL) values and large router buffers, at high link speeds the packets entering a temporary loop rarely exit the loop.

Administrative Link Cost Increase

An administrative link cost increase can also imply the creation of temporary loops,

although there is no network element failure. Indeed, if a link cost is increased,

some temporary lack of synchronization may result between routers’ LSDBs,

thereby provoking some temporary loops. Note that a link failure can be seen as a

cost increase to infinity.
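The loop mechanism described above can be reproduced with a small simulation. The sketch below is a minimal illustration on a hypothetical four-node topology (not the topology of Figure 4.22): it computes NH(X,D) for every router before and after the cost of link C-D is raised from 1 to 10, then follows the next hops in a transient state in which C has already converged on the new LSDB while A and B still forward with the old one. The helper names (`next_hops`, `trace`, `topology`) are illustrative only.

```python
import heapq


def dist_to(graph, dest):
    """Dijkstra from dest; links are symmetric, so this is each node's distance to dest."""
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def next_hops(graph, dest):
    """NH(X, dest) for every router X (ties broken by neighbor name)."""
    dist = dist_to(graph, dest)
    return {u: min(graph[u], key=lambda v: (graph[u][v] + dist[v], v))
            for u in graph if u != dest}


def trace(fib, src, dest):
    """Follow next hops from src; return (path, True) as soon as a router repeats."""
    path, seen = [src], {src}
    while path[-1] != dest:
        nxt = fib[path[-1]]
        path.append(nxt)
        if nxt in seen:
            return path, True
        seen.add(nxt)
    return path, False


def topology(cost_cd):
    """Hypothetical topology: A-B (1), B-C (1), C-D (cost_cd), A-D (5)."""
    links = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): cost_cd, ("A", "D"): 5}
    graph = {n: {} for n in "ABCD"}
    for (u, v), w in links.items():
        graph[u][v] = w
        graph[v][u] = w
    return graph


old = next_hops(topology(1), "D")   # before the increase: A->B->C->D
new = next_hops(topology(10), "D")  # after: C-D cost raised to 10
# Transient state: C has converged, A and B still forward with the old next hops.
fib = {"A": old["A"], "B": old["B"], "C": new["C"]}
print(trace(fib, "A", "D"))  # (['A', 'B', 'C', 'B'], True): loop B-C-B
print(trace(new, "A", "D"))  # (['A', 'D'], False) once every router has converged
```

Once every router uses the new next hops, the trace reaches D directly over link A-D and the temporary loop disappears.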

An Undesirable Effect of Temporary Loops

A potentially undesirable side effect of temporary loops is the increase in link load caused by the looping packets, which can lead to link congestion and thereby affect traffic that is a priori unaffected by the link failure. To illustrate, consider the example in Figure 4.22. At time T1, the temporary loop between nodes A and B necessarily affects the utilization of link A-B. The traffic

routed from S to some destination Z’ directly connected to F would follow the path

A-B-F. Although this traffic should not be affected by the failure of link C-D because

it does not follow this path, it suffers from the potential congestion created by the

temporary looping traffic from S to Z. Now, things should be put in perspective:

Such temporary loops last a short period, so the impact is usually minimal.

4.7.2 Temporary Loops Caused by a Restored Network Element

As pointed out earlier, the cause of temporary loops lies in the lack of synchronization between routers' LSDBs, which also occurs upon the restoration of a network element. Consider the case of a restored link. As stated earlier, the sequence of events is not deterministic, but consider the following two situations:

Situation 1: The diagram in Figure 4.25A shows the situation after link C-D has failed and all the routers have converged. At time T0, link C-D is restored. Node C establishes an IGP adjacency with node D and originates a new LSA to reflect the topology change (note that node D also floods a new version of its LSA). C converges and NH(C,Z) = D. Then at time T1, B receives the new LSA, floods it to each of its neighbors (H and F), triggers an SPF, and finally converges: NH(B,Z) = C. Note that at this point, node B does not receive any traffic to Z from any of its neighbors. At time T2, A receives the new LSA, floods it to each of its neighbors (G and E), triggers an SPF, and converges: NH(A,Z) = B. At this point, the traffic sent by S to Z starts flowing along the A-B-C-D path. This sequence of events did not create any temporary loop, but now consider a different event timing.

Situation 2: Suppose now that, for some reason, the following sequence of events occurs.

At time T0, link C-D is restored. Node C establishes an IGP adjacency with node D and originates a new LSA to reflect the topology change (note that node D also floods a new version of its LSA). At time T1, B receives the new LSA and floods it to each of its neighbors, H and F. At time T2, A receives the new LSA, floods it to each of its neighbors, G and E, triggers an SPF, and converges: NH(A,Z) = B. Because B has not yet converged (NH(B,Z) = A), a microloop A-B-A is created. At time T3, B converges and NH(B,Z) = C. This removes the previous temporary loop, but another temporary loop, B-C-B, appears because C has not yet converged (NH(C,Z) = B). Finally, at time T4, C converges and NH(C,Z) = D. Of course, all the temporary loops eventually disappear.

The reason such temporary loops are created with this event timing sequence in the case of a link restoration is that a node farther from the restored link converges before a node closer to that restored link. Consequently, it starts using the restored link in its SPT before the downstream node has had time to converge, hence the temporary loop.

Although this illustrates that such temporary loops can appear when a network element is newly restored, this case differs greatly from the previous case in two respects. First, the probability of such an event timing sequence is not very high. Second, in the case of a link or node failure, the microloops are unavoidable but do not affect the convergence. By contrast, in the case of a restored network element, the traffic is being dropped needlessly, and there is a solution to this issue. In fact, there are many possible solutions. One of them is to devise a distributed algorithm that guarantees the order in which nodes converge. Indeed, such temporary loops can be avoided only if a node does not converge before a node closer to the restored network element. The idea is thus to use incremental delays in the SPF computation or routing table update to achieve that objective.
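As a rough illustration of the incremental-delay idea, the sketch below assigns each router a delay that grows with its hop distance from the restored link, so routers closer to the link converge first. The hop-count distance, the `base_ms` and `step_ms` values, and the chain topology are all assumptions made for illustration; in practice the step would have to exceed the time a single router needs to run its SPF and update its FIB.

```python
from collections import deque


def hops_from_link(graph, link):
    """Breadth-first hop count from the nearest endpoint of the restored link."""
    dist = {node: 0 for node in link}
    queue = deque(link)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def spf_delays(graph, link, base_ms=50, step_ms=200):
    """Hypothetical policy: SPF/FIB-update delay grows with distance from the link."""
    return {node: base_ms + step_ms * hops
            for node, hops in hops_from_link(graph, link).items()}


# Hypothetical chain S-A-B-C-D-Z in which link C-D has just been restored.
graph = {"S": ["A"], "A": ["S", "B"], "B": ["A", "C"],
         "C": ["B", "D"], "D": ["C", "Z"], "Z": ["D"]}
delays = spf_delays(graph, ("C", "D"))
for node in sorted(delays, key=delays.get):
    print(node, delays[node])  # C and D first, then B and Z, then A, then S
```

With this ordering, B converges before A and C converges before B, which is exactly the condition stated above for avoiding the A-B-A and B-C-B microloops of Situation 2.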

Figure 4.25 An example of temporary loops on link-up event (panels (a), (b), and (c) illustrate Situation 1 and Situation 2).



4.8 Load Balancing

Load balancing undoubtedly plays a key role in IP networks and refers to the

ability of a router to balance the traffic load to a destination X among a set of N

equal cost paths. This is also called equal cost multiple paths (ECMPs); both OSPF

and IS-IS support the computation of equal cost paths.

Symmetrical networks offering a large number of equal cost paths between each pair of routers are not uncommon. Figure 4.26 shows a typical example of such a symmetrical network.

In the simple network depicted in Figure 4.26, every edge router is dual-attached to two core routers. In this simple example, two equal cost paths exist between each pair of edge devices, but of course there might be many more ECMPs between pairs of routers.
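Computing ECMP next hops only requires keeping every neighbor that lies on a shortest path rather than a single one. The following sketch assumes a simplified wiring loosely inspired by Figure 4.26 (E1 attached to C1 and C4, E2 attached to C3 and C6, all links with a cost of 1); it does not reproduce the figure exactly. It runs Dijkstra from the destination and keeps each neighbor v of the source for which cost(src, v) + dist(v) equals dist(src).

```python
import heapq


def dist_to(graph, dest):
    """Dijkstra from dest over symmetric links."""
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def ecmp_next_hops(graph, src, dest):
    """Every neighbor of src that lies on at least one shortest path to dest."""
    dist = dist_to(graph, dest)
    return {v for v, w in graph[src].items() if w + dist[v] == dist[src]}


def build(links, cost=1):
    """Undirected graph in which every link has the same cost."""
    graph = {}
    for u, v in links:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


graph = build([("E1", "C1"), ("E1", "C4"), ("C1", "C2"), ("C2", "C3"), ("C3", "E2"),
               ("C4", "C5"), ("C5", "C6"), ("C6", "E2")])
print(sorted(ecmp_next_hops(graph, "E1", "E2")))  # ['C1', 'C4']
```

The two next hops correspond to the ECMP paths E1-C1-C2-C3-E2 and E1-C4-C5-C6-E2 discussed later in this section.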

Per-packet versus per-destination load balancing: Once N equal cost paths have been computed by the routing protocol, there are two modes of operation to load balance the traffic among the set of N paths, known as per-packet and per-session load balancing:

1. Per-packet load balancing: In this mode, packets are distributed among the N paths in a round-robin fashion. Although quite efficient in terms of load sharing, this mode has the drawback of introducing packet reordering within microflows. Indeed, the packets belonging to a single flow/conversation between a pair of hosts are likely to follow different paths, which may have different delay characteristics (e.g., different propagation or queuing delays along the paths). The immediate consequence is that they may be delivered out of order, especially if the delays among the set of N paths differ significantly.

Figure 4.26 Symmetrical networks with ECMP paths (edge routers E1 to E4, core routers C1 to C6; all links have a cost of 1).



The impact of packet reordering is highly application dependent, but even a small reorder rate can have a very significant impact on the traffic throughput. An extensive analysis of the impact of reordering on TCP traffic can be found in [REORDERING]; it shows a nonnegligible TCP throughput drop for a reorder rate between 0.1% and 1.0%, especially for long-lived flows and when the delays experienced along the path are high. When the reorder rate approaches 10%, the application throughput is close to the minimal utilization. Video applications are also quite sensitive to packet reordering, which essentially results in an increase in packet loss. In addition to the packet reordering problem, sessions may experience increased jitter.

2. Per-session load balancing: One way to alleviate the packet reordering

problem is to ensure that the packets belonging to a single flow always

follow the same path.54 This relies on a hashing mechanism in which a set of K buckets is used to select one of the N ECMP paths, and the hash function is computed over a set of IP header fields. The idea is to ensure that the packets belonging to a session between two hosts are assigned to a single path while still achieving load balancing, because

multiple sessions will be assigned to different paths. The function in

charge of assigning a session to a particular path among the set of N

candidate paths uses a hashing function involving K hash buckets that

takes into account the source and destination IP address of each packet.

Then each of the K hash buckets points to a particular active path (between 1 and N). A well-known issue with such hashing functions is the polarization effect, whereby traffic gets polarized along the same path if the same hashing function is used at each hop along the path. Existing commercial routers implement solutions to the polarization effect in which the hashing function is not identical at each hop. In some cases (in particular when the

number of sessions is small), the load share among the set of N paths

may not be even; in particular, when tunneling mechanisms are used in

the network, the number of sessions may be relatively small, which may

lead to unequal load sharing if the amount of traffic carried over those

tunnels differs significantly. Enhanced hashing algorithms make it possible to overcome this potential limitation. Further analysis of hashing-based

schemes can be found in [HASH].
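The hashing scheme described in item 2 can be sketched as follows. This is an illustration under stated assumptions, not a router implementation: K = 16 buckets are spread round-robin over the N paths, the hash input is the source and destination IP addresses, and SHA-256 from the Python standard library stands in for the much cheaper hash a router would compute in hardware. The hypothetical `router_seed` parameter shows one way to make the hash differ at each hop, which is the kind of fix used against the polarization effect.

```python
import hashlib
import ipaddress


def build_buckets(paths, k=16):
    """Spread K hash buckets over the N active paths (bucket i -> path i mod N)."""
    return [paths[i % len(paths)] for i in range(k)]


def pick_path(buckets, src, dst, router_seed=0):
    """Per-session choice: hash the (src, dst) pair, salted with a per-router seed."""
    key = router_seed.to_bytes(4, "big")
    key += ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha256(key).digest()
    return buckets[int.from_bytes(digest[:4], "big") % len(buckets)]


buckets = build_buckets(["path1", "path2"])
flows = [(f"10.0.0.{i}", "192.0.2.10") for i in range(1, 201)]

# A given session is always mapped to the same path...
assert all(pick_path(buckets, s, d) == pick_path(buckets, s, d) for s, d in flows)
# ...while different sessions are spread over both paths.
counts = {p: sum(pick_path(buckets, s, d) == p for s, d in flows) for p in ("path1", "path2")}
print(counts)
```

With N = 3 paths and K = 16 buckets, the buckets cannot be split evenly (6, 5, and 5), one reason to choose K large relative to N.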

54Note that the queuing system should also ensure that packets from a single flow do not get reordered, which could typically occur in the case of packet re-marking. Indeed, suppose that the packets between hosts X and Y are marked with a DSCP D1. If at some point some packets are re-marked with a different value D2 (e.g., because the flow's bandwidth does not comply with the QoS contract), then packet reordering could also occur if packets with DSCP D1 and D2 match different queues, even if all the packets follow the same path in the network. So a good practice is to ensure that packet re-marking does not imply selecting a different queue in any node of the network.

Symmetrical versus asymmetrical load balancing: It is worth noting that only symmetrical load balancing is supported by both IS-IS and OSPF (some distance vector protocols like EIGRP do support asymmetrical load balancing, though). Indeed, asymmetrical load balancing requires some extra precaution; consider Figure 4.27.

As depicted in Figure 4.27, there are two paths between routers E1 and E2:

Path 1: E1-C1-C2-E2, cost = 3

Path 2: E1-C3-C4-C5-E2, cost = 6

Because the cost of path 2 is twice the cost of path 1, the idea of asymmetrical load balancing is to load balance the traffic between E1 and E2 with shares inversely proportional to the respective costs; in this example, twice as many packets would be sent on path 1 as on path 2. Suppose that 99 packets are sent from E1 to E2. According to this rule, 66 packets would be sent along path 1 and 33 packets would follow path 2. But node C3 also has two paths to E2: path 3 (C3-E1-C1-C2-E2, with a cost of 5) and path 4 (C3-C4-C5-E2, with a cost of 4). If C3 applies the same rule, it would send approximately (4/9) * 33 ≈ 15 packets along path 3 and 18 packets along path 4. So 15 of the 99 packets originally sent by E1 would loop between E1 and C3. Then, of course, (1/3) * 15 = 5 of those packets would be sent again along path 2, and so forth. Note that the loop is partial and the packets will eventually be delivered to the destination (provided that their IP TTL does not expire first), but such a routing decision is of course highly undesirable. There could be some partial solutions to this issue, such as never sending a packet over the interface on which it was received, but there are more complicated cases with partial loops involving N routers, especially with asymmetrical link costs, that would require further protocol modifications.
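The arithmetic of this partial loop can be checked with exact fractions. The sketch below applies the inverse-cost rule at E1 (paths 1 and 2) and at C3 (paths 3 and 4), using the path costs given above, and follows the packets bounced back from C3 to E1 for a few rounds. It deliberately ignores TTL expiry and any loop-prevention rule.

```python
from fractions import Fraction


def inverse_cost_shares(costs):
    """Share of the traffic sent on each path, inversely proportional to its cost."""
    inverse = [Fraction(1, c) for c in costs]
    total = sum(inverse)
    return [x / total for x in inverse]


e1_split = inverse_cost_shares([3, 6])  # at E1: path 1, path 2 (via C3)
c3_split = inverse_cost_shares([5, 4])  # at C3: path 3 (back via E1), path 4

at_e1, delivered = Fraction(99), Fraction(0)
for rnd in range(1, 6):
    direct = at_e1 * e1_split[0]   # sent on path 1 (E1-C1-C2-E2)
    to_c3 = at_e1 * e1_split[1]    # sent toward C3 on path 2
    back = to_c3 * c3_split[0]     # C3 returns this part to E1 (path 3)
    delivered += direct + to_c3 * c3_split[1]  # path 4 reaches E2
    print(f"round {rnd}: path1={float(direct):.2f} "
          f"to_C3={float(to_c3):.2f} back_to_E1={float(back):.2f}")
    at_e1 = back
```

The first round reproduces the figures above (66, 33, and about 14.7, which the text rounds to 15). In each round, the fraction of the traffic returned to E1 is (1/3) * (4/9) = 4/27, so the looping traffic decays geometrically while still consuming bandwidth on link E1-C3.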

Figure 4.27 Asymmetrical load balancing (edge routers E1 to E4, core routers C1 to C5; asymmetrical path costs).

Now a legitimate question is whether there is any relationship between IP load balancing and recovery upon a network element failure. In fact, symmetrical networks with equal cost paths have some interesting properties as far as recovery is concerned:

1. Reduce the failure impact to a subset of the flows between a pair of routers. Indeed, if the flows between router X and router Y are load balanced among paths P1 and P2, the impact of a failure of one path is limited to a subset of the flows between routers X and Y, provided that the failure does not simultaneously affect both paths.

2. Improve the convergence time in the case of a failure: Consider the case of the

network depicted in Figure 4.26. Because there are two ECMP paths from

node E1 to E2 (E1-C1-C2-C3-E2 and E1-C4-C5-C6-E2), the traffic between

E1 and E2 is load balanced among the two paths according to one of the

load balancing methods mentioned earlier. In the case of a local link failure,

the traffic from E1 to E2 could immediately be switched from one path to

another without waiting for the recomputation of the routing table.
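The local reaction described in item 2 can be illustrated with the hash-bucket table used for per-session load balancing. In this sketch (hypothetical function and path names; actual FIB implementations differ), when one ECMP path fails locally, only the buckets pointing to it are reassigned to the surviving paths, so sessions hashed to other paths keep their path (item 1) and traffic is moved without waiting for a routing table recomputation (item 2).

```python
def repair_buckets(buckets, failed_path):
    """Reassign only the buckets of the failed path, round-robin over the survivors."""
    survivors = [p for p in dict.fromkeys(buckets) if p != failed_path]
    if not survivors:
        raise ValueError("no surviving equal cost path; wait for the new SPF")
    repaired, j = [], 0
    for path in buckets:
        if path == failed_path:
            repaired.append(survivors[j % len(survivors)])
            j += 1
        else:
            repaired.append(path)
    return repaired


buckets = ["P1", "P2", "P3"] * 4          # hypothetical 12-bucket table, 3 ECMP paths
after = repair_buckets(buckets, "P2")     # local failure detected on path P2
moved = [i for i, (a, b) in enumerate(zip(buckets, after)) if a != b]
print(moved)  # [1, 4, 7, 10]: only the sessions that were using P2 are moved
```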

4.9 QoS during Failure

We saw in detail in the previous sections that upon a network element failure

detection, each node in the network recomputes its routing table to determine the

shortest path to every other node in the network according to the new network

topology.

The objective of IP routing is not to route particular traffic based on some

traffic constraint characteristics other than the IP destination. In other words, by

contrast with other technologies like enhanced optical networks, ATM, or MPLS

traffic engineering, there is no constraint taken into account during the shortest

path computation like bandwidth requirement and resource class affinities. Conse-

quently, IP routing does not try to achieve any goal of QoS guarantee upon a

network element failure. Instead, a new SPT is computed based on the updated

topology that selects the shortest IGP path to every other node in the network.

Because the path computation is exclusively based on the destination, all the traffic to a particular destination follows the same shortest path regardless of the traffic requirements, but that does not mean that QoS objectives cannot be met with IP routing, both at steady state and under failure.

4.9.1 IP Traffic Engineering at Steady State

We will first discuss how traffic engineering techniques can be used at steady state

(in the absence of failure) to guarantee some QoS. It is important to clarify the concept of QoS here: strictly speaking, IP routing does not deal

with QoS, which relates to the set of mechanisms to handle traffic prioritization or

congestion avoidance as described in Section 4.5. That being said, IP routing

determines the traffic paths and so the link loads. Hence, in this section, as far as



IP routing is concerned, the objective of QoS is explicitly related to the link loads

both at steady state and under failure, which inevitably affect the traffic QoS.

The notion of traffic engineering is discussed in detail in Chapter 5 in the

context of MPLS traffic engineering, but in a nutshell, the aim of traffic engineering

is to efficiently use the network resources and try to reduce the link load utilization.

Because the paths followed by the packets between routers are determined by the

shortest path computation, one way to perform traffic engineering consists of running an off-line55 algorithm that determines the IGP metrics that will result

in more efficient network resource usage.

A common practice is to use IGP metrics inversely proportional to the link

speed. So, for instance, if an OC192 link has an IGP metric of 1, an OC3 link would

have a metric of 64. Although such an approach has the obvious benefit of being

extremely simple and straightforward, it also suffers from some obvious limitations and may not allow the IP backbone to be traffic engineered very efficiently.
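The inverse-speed rule is easy to express. The sketch assumes a reference bandwidth equal to the OC192 rate, so that an OC192 link gets a metric of 1, and uses the SONET OC-n line rate of n * 51.84 Mb/s; the function name is illustrative.

```python
OC1_KBPS = 51_840  # SONET OC-1 line rate (51.84 Mb/s), in kb/s


def inverse_speed_metric(oc_level, reference_oc=192):
    """IGP metric = reference bandwidth / link bandwidth, never below 1."""
    reference_kbps = reference_oc * OC1_KBPS
    return max(1, reference_kbps // (oc_level * OC1_KBPS))


for level in (192, 48, 12, 3):
    print(f"OC{level}: metric {inverse_speed_metric(level)}")
```

This yields metrics of 1, 4, 16, and 64 for OC192, OC48, OC12, and OC3, the last one matching the example above.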

At this point, it is probably worth elaborating the notion of ‘‘efficient traffic

engineering techniques for IP’’: This usually refers to the ability to minimize the

maximum/average link utilization rate to improve the QoS and reduce queuing

delays, jitter, and packet loss. Other constraints may also be added like the mini-

mization of the propagation delay experienced by the packets traveling across the

network, or some bound on the maximum tolerable propagation delay. Trying to

minimize the maximum link utilization has the obvious implication of increasing

some path lengths, which implies a higher propagation delay. Hence, the ability to

specify that the propagation delay increase after optimization should not exceed some threshold is clearly useful.
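To make these objectives concrete, the sketch below evaluates one candidate metric set: it routes each demand of a traffic matrix on a single IGP shortest path, computes the maximum per-direction link utilization, and rejects the candidate if any demand's propagation delay grows by more than a given bound. The square topology, capacity, delays, and demands are hypothetical, ECMP splitting is ignored, and every link is assumed to have the same capacity in each direction.

```python
import heapq


def shortest_path(metrics, src, dst):
    """Single IGP shortest path (ties broken by node name; ECMP ignored)."""
    adj = {}
    for (u, v), m in metrics.items():
        adj.setdefault(u, []).append((v, m))
        adj.setdefault(v, []).append((u, m))
    heap, done = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr, m in adj[node]:
            heapq.heappush(heap, (cost + m, nbr, path + [nbr]))
    raise ValueError(f"{dst} unreachable from {src}")


def route_stats(metrics, demands, capacity, delay_ms):
    """Max per-direction link utilization and per-demand propagation delay."""
    load, delays = {}, {}
    for (src, dst), volume in demands.items():
        path = shortest_path(metrics, src, dst)
        hops = list(zip(path, path[1:]))
        for hop in hops:
            load[hop] = load.get(hop, 0) + volume
        delays[(src, dst)] = sum(delay_ms[frozenset(hop)] for hop in hops)
    return max(load.values()) / capacity, delays


def acceptable(old, candidate, demands, capacity, delay_ms, max_extra_ms):
    """Accept the candidate only if it lowers the maximum utilization and no
    demand's delay grows by more than max_extra_ms."""
    old_util, old_delay = route_stats(old, demands, capacity, delay_ms)
    new_util, new_delay = route_stats(candidate, demands, capacity, delay_ms)
    within_bound = all(new_delay[d] - old_delay[d] <= max_extra_ms for d in demands)
    return new_util < old_util and within_bound


old = {("X", "A"): 1, ("A", "Y"): 1, ("X", "B"): 1, ("B", "Y"): 2}
candidate = {**old, ("X", "A"): 3}                  # steer X->Y away from A-Y
delay_ms = {frozenset(("X", "A")): 2, frozenset(("A", "Y")): 2,
            frozenset(("X", "B")): 5, frozenset(("B", "Y")): 5}
demands = {("X", "Y"): 6, ("A", "Y"): 6}
print(route_stats(old, demands, 10, delay_ms))
print(route_stats(candidate, demands, 10, delay_ms))
print(acceptable(old, candidate, demands, 10, delay_ms, max_extra_ms=5))  # False
print(acceptable(old, candidate, demands, 10, delay_ms, max_extra_ms=8))  # True
```

Raising the X-A metric halves the maximum utilization (from 1.2 to 0.6) but lengthens the X-to-Y path from 4 ms to 10 ms, so the candidate is accepted only if a 6 ms increase is tolerable, which is precisely the trade-off discussed above.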

Another interesting objective of the IGP metric optimization is to avoid drastic

IGP metric changes. As already discussed in detail in Section 4.7, temporary loops

may result from link cost changes (increase and decrease). Consequently, the

computation of a new set of IGP metrics should try to minimize the number of

IGP metric changes and the order of magnitude of those changes. Indeed, the larger

the changes are, the more likely temporary loops will appear during the network

convergence period. So one should bear in mind that such techniques relying on IGP metric changes are not entirely nondisruptive to traffic, because every IGP metric change may lead to some temporary loops. Sophisticated IGP metric computation

algorithms should try to limit the number and the degree of those changes to

minimize the likelihood of temporary loops and their diameter.

Note also that such optimization techniques usually try to make extensive use of ECMPs. If unequal load balancing could be supported by link state protocols, this would certainly make those optimization techniques even more efficient. As a side note, the projected results may not be entirely accurate because in most cases the assumption of per-packet load balancing is made, which does not correspond to the most common load balancing techniques for the reasons mentioned in Section 4.8 (such as packet reordering).

55Note that the off-line nature of such an algorithm is an important fact to underscore. By contrast with adaptive routing mechanisms, current link state protocols use static metrics. When the traffic matrix is not expected to change too often, those algorithms that compute a set of metrics meeting some expected target are run off-line, and the IGP metric values are then manually configured on each router.

Several interesting IP traffic engineering approaches have been proposed during the past few years, offering algorithms to determine the set of IGP metrics that would allow an IP backbone to be traffic engineered efficiently, provided that the traffic

matrix is known. Note that this latter assumption is of the utmost importance to

produce efficient results. That being said, when the traffic matrix is not known, some

algorithms exist that try to infer the traffic matrix from the observation of some link

loads (see [TRAF-EST] for a review of some techniques to perform traffic matrix

estimation); those link load statistics can usually be easily gathered from the routers' SNMP management information bases (MIBs). Note that some of those traffic

estimation algorithms also provide some level of confidence about the output.

Depending on the level of confidence, the network administrator may decide either

to perform further investigations, to increase the traffic matrix by some fudge factor

to reduce the risk of traffic underestimations, or to just make some projection of

future traffic growth. It is worth noting also that the accuracy of such algorithms is

usually a function of the network topology; in other words, traffic estimation

algorithms are more efficient on some network topologies than others. Another

alternative is to use some management functionality provided by some router

vendors that allow gathering the required statistics to build traffic matrices. Note

also an interesting paper related to IP traffic characterization [IP-TRAF].

4.9.2 QoS Guarantee during Failure

The observation that IP routing provides no QoS guarantee is even truer upon network element failure. As already stated, IP routing does not try to achieve a QoS guarantee along

the backup path upon a network element failure but tries to quickly compute the

new shortest path. That being said, some IGP optimization techniques can also be

used to determine a set of IGP metrics that tries to efficiently traffic engineer the IP

traffic both at steady state and under any single failure scenario. One of the

challenges of those algorithms is to find a set of IGP metrics that does not have

the side effect of routing the traffic at steady state in a nonoptimal way (or at least in a way too far from optimal) because of the constraint of minimizing the

maximum link utilization under link failure.

Such an objective function of minimizing the maximum/average link utilization at steady state and under failure is undoubtedly more difficult to compute, and the level of efficiency may vary substantially depending on the network topology and traffic matrices. It is also difficult to find a set of IGP metrics that optimizes the network load both at steady state and upon network failure, essentially because the same set of IGP metrics must handle any network failure.

In some network topologies, several alternate paths may exist between a pair of

nodes, but because link state protocols like OSPF and IS-IS do not support

asymmetrical load balancing, the available bandwidth may not be easily usable,

unlike some other connection-oriented technologies in which multiple asymmetrical

backup paths may be used to protect a particular network resource. That being



said, several algorithms have been designed that try to compute a set of metrics so

the maximum link utilization under steady state and single network element failure

is minimized. The additional constraint of delay mentioned earlier can also be taken

into account, as well as the objective of minimizing the number of IGP metric

changes and their order of magnitude.

Link Metric Manipulation

Consider a set S of IGP metrics and an optimization function computing another

set S' of IGP metrics to meet the objectives mentioned earlier. Let us also suppose

that the number of changed IGP metrics is K. Then, from a network operation

standpoint, the number of steps to change the K metrics is K. Why? Because a good

practice will likely be to change one metric at a time, let the routers converge and

then change another IGP metric. Some optimization could actually be done to try

to minimize the number of steps. Furthermore, ideally, the optimization function

should also be able to make sure that the network objectives (like the maximum link

utilization or the delays) are also met during each transition. Indeed, without that objective in mind, the new set of metrics could be such that the objectives are met in the final network state (once all the IGP metrics have been changed and the network has converged) but not at each intermediate step, which is obviously undesirable, unless such changes are performed during network maintenance when the traffic load is lower.56 Note that

IP traffic engineering techniques relying on IGP metric changes are not expected to

be performed very often, considering the nonnegligible amount of work and the

potential risks on the network.
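The one-metric-at-a-time procedure, together with the check that the objectives hold after each step, can be sketched as a brute-force search over the order of the K changes. The topology, demands, capacity, and the 90% utilization limit below are hypothetical; trying every permutation is only practical for a handful of changes, and a real tool would rely on heuristics instead.

```python
import heapq
from itertools import permutations


def shortest_path(metrics, src, dst):
    """Single IGP shortest path (ties broken by node name; ECMP ignored)."""
    adj = {}
    for (u, v), m in metrics.items():
        adj.setdefault(u, []).append((v, m))
        adj.setdefault(v, []).append((u, m))
    heap, done = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr, m in adj[node]:
            heapq.heappush(heap, (cost + m, nbr, path + [nbr]))
    raise ValueError(f"{dst} unreachable from {src}")


def max_util(metrics, demands, capacity):
    """Highest per-direction link utilization; every link has the same capacity."""
    load = {}
    for (src, dst), volume in demands.items():
        path = shortest_path(metrics, src, dst)
        for hop in zip(path, path[1:]):
            load[hop] = load.get(hop, 0) + volume
    return max(load.values()) / capacity


def safe_order(old, new, demands, capacity, limit):
    """Return an order of the K changed metrics that keeps utilization <= limit
    after every single change, or None. Brute force: only for small K."""
    changed = sorted(link for link in new if new[link] != old[link])
    for order in permutations(changed):
        metrics = dict(old)
        for link in order:
            metrics[link] = new[link]
            if max_util(metrics, demands, capacity) > limit:
                break  # this order overloads a link at an intermediate step
        else:
            return list(order)
    return None


old = {("X", "A"): 1, ("A", "Y"): 1, ("X", "B"): 2, ("B", "Y"): 1,
       ("Z", "B"): 1, ("Z", "C"): 2, ("C", "Y"): 1}
new = {**old, ("X", "A"): 3, ("Z", "B"): 3}   # K = 2 metric changes
demands = {("X", "Y"): 8, ("Z", "Y"): 8}
print(safe_order(old, new, demands, capacity=10, limit=0.9))
# [('Z', 'B'), ('X', 'A')]
```

Applying the X-A change first would move the X-to-Y demand onto link B-Y while the Z-to-Y demand still uses it, briefly loading B-Y at 160%; changing Z-B first avoids that transient overload even though both orders end in the same final state.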

Algorithm Complexity

The problem of computing a set of IGP metrics that tries to meet a set of objectives like minimizing the maximum link utilization at steady state and under single failure, along with other constraints, is NP-hard. Such algorithms make an

extensive use of heuristics that drastically reduce the computation time while trying

to approach the optimal solution (which can be defined as the solution obtained by

solving a multicommodity flow problem). The art of designing such algorithms is in

the ability to come up with a set of appropriate and efficient heuristics.

Definition of efficiency: The degree of efficiency of such IP traffic engineering techniques compared to other technologies making use of call admission control mechanisms with constraint-based routing, for example, is still difficult to determine for several reasons. First, the degree of efficiency is directly

tied to the algorithms in use to compute the IGP metrics or the TE LSP paths in the

case of MPLS (on-line or off-line). Furthermore, the results are generally quite

topology dependent.

Further readings on the topic of IP traffic engineering are [IP-TE-1] to [IP-TE-6].

56Note that this may be more and more difficult because service providers tend to optimize their network

utilization by running various traffic types at different periods, which progressively leads to the absence

of periods where the network is quiet.



4.10 Nonstop Forwarding: An Example with OSPF

Several modern router architectures have separate, dedicated hardware for control plane and data plane operations. Hence, such architectures can handle a control plane failure without any impact on the data plane. The objective of nonstop forwarding (NSF) is to handle control plane failures on such platform architectures while preserving the data plane. NSF requires some IGP extensions, which are detailed in this section.

The usual procedure in the case of control plane (routing) failure is for every

neighboring router detecting the failure to trigger the normal IP rerouting process

described earlier in this chapter (every neighbor of the router whose IGP control

plane has failed originates a new LSA reporting the new topology state) and so all

the routers in the routing domain trigger a new SPF, resulting in the exclusion of the

failing router from the forwarding paths of every other router. As mentioned earlier, several router architectures allow packet forwarding to continue even in the case of a control plane failure. For instance, a router experiencing the failure of its route processor (RP), which is in charge of the control plane functions including routing and signaling, continues to forward traffic based on the last state of its routing table while a standby RP takes control upon the failure of the primary RP. The period during which the backup RP takes control and resynchronizes its control plane states is called the restarting period.

As pointed out earlier, under normal IGP operation, the restarting router would establish new routing adjacencies, which would trigger a network convergence as described above. So both OSPF and IS-IS have been enhanced to support a procedure called ‘‘graceful restart’’ or ‘‘NSF’’57 that does not impact packet forwarding in the network (see [OSPF-GR] for OSPF and [ISIS-GR] for IS-IS).

Note that a control plane failure covers the case of both a planned and an

unplanned control plane failure. In the former case, the network operator may

have to upgrade the hardware of the RP, for example, with the objective of

not impacting the traffic forwarding in the network. The latter case (unplanned

control plane failure) can occur in the case of hardware or software failure of

the primary RP. If such an unplanned failure occurs, the NSF procedure

will guarantee that traffic forwarding is not impacted; this latter mode can be

supported only on platforms that preserve the forwarding states upon a control

plane failure.

As described later in this section, there is another condition for the NSF procedure to complete without any forwarding change during the control plane failure of the failing router: the absence of other network changes during that period. This is to reduce the probability of creating temporary loops during the restarting period.

We study the example of OSPF in this section.

57In the rest of this section, the term NSF is used.



As already mentioned, the restarting operation may be triggered either because

of an unplanned RP failure (hardware or software) or to proceed to a hardware

upgrade (e.g., a memory upgrade on the primary RP).

In the rest of this section we will see the general mode of operation, followed by

the detailed mode of operation of the restarting router and its neighbors.

4.10.1 Mode of Operation

When the backup RP takes control upon detecting the failure of the primary RP, the failing router (also called the restarting router) first sends a notification to each of its neighbors indicating that it is entering restart mode and requesting a ‘‘grace period.’’ For OSPF, this notification is a particular LSA (an opaque LSA with a link-local flooding scope, which restricts the flooding of the LSA to the direct neighbors). During the ‘‘grace period,’’ each neighbor continues to advertise the restarting router in its LSA (i.e., no topology state change is reported). There is one exception to that rule: If a network topology change occurs during the ‘‘grace period,’’ every neighbor of the restarting router switches back to the regular OSPF mode of operation. This is a safe procedure that limits the risk of loops. Until the restarting node has converged, it relies on its previous routing information to forward packets. Hence, if a network topology change occurs during the grace period, because the restarting node is not capable of recomputing its routing table, not aborting the NSF procedure could lead to routing loops.
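The neighbor-side behavior described above can be summarized as a small state machine. The following sketch is a hypothetical model (class and method names are illustrative, not taken from any OSPF implementation): a neighbor keeps advertising the restarting router while the grace period runs, and stops helping as soon as another topology change occurs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HelperNeighbor:
    """Hypothetical state kept by a neighbor of the restarting router."""
    grace_end: Optional[float] = None  # absolute time at which the grace period ends

    def on_grace_lsa(self, now: float, grace_period: float) -> None:
        """A grace LSA was received: start (or refresh) helping."""
        self.grace_end = now + grace_period

    def on_topology_change(self) -> None:
        """Any other topology change aborts helping: back to regular OSPF."""
        self.grace_end = None

    def keep_advertising(self, now: float, adjacency_up: bool) -> bool:
        """Should this neighbor's LSA still report the restarting router?"""
        in_grace = self.grace_end is not None and now < self.grace_end
        return in_grace or adjacency_up


helper = HelperNeighbor()
helper.on_grace_lsa(now=0.0, grace_period=120.0)
print(helper.keep_advertising(now=30.0, adjacency_up=False))  # True: still in grace period
helper.on_topology_change()
print(helper.keep_advertising(now=31.0, adjacency_up=False))  # False: NSF aborted
```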

4.10.2 Mode of Operation of the Restarting Router

Entering graceful restart mode: Once the backup RP detects the failure of the primary RP, or if the network administrator forces the restarting mode, the node enters graceful restart mode. Note that the ‘‘grace period’’ can either be dynamically computed by the backup RP in the former case or manually specified by the network operator in the latter case.

The situations of planned versus unplanned control plane failure are handled in a

slightly different way.

1. Planned control plane failure: After the restarting router has determined that the forwarding states are operational, it originates the grace LSA, which contains the ‘‘grace period’’ value and the reason for the graceful restart, among other parameters. The flooding scope of the grace LSA is local: it is flooded to each directly connected neighbor (more accurately, out of every local interface on which there is a neighbor) and is not flooded beyond the direct neighbors.

2. Unplanned control plane failure: In the case of an unplanned failure, once the backup RP has detected the primary RP failure, it must originate the grace LSA before starting to send any hello packets to its neighbors. The restarting router must send the grace LSA in an OSPF link state update packet even though it has not yet reestablished any adjacency with its neighbors. The restarting router may decide to send multiple copies of the grace LSA to its neighbors to increase the probability of successful delivery.

During the graceful restart period (between the time the backup RP becomes active

and the time all the adjacencies are re-established), the restarting router performs

the following set of actions:

1. The restarting router must not generate any new LSA and must not flush or

modify any received self-originated LSA.

2. OSPF calculations can be performed, but no new OSPF routes may be installed. During the restarting period, the restarting router must rely on the forwarding states computed before the failure.

3. OSPF uses the notion of a designated router (DR) on multiaccess subnetworks. In a nutshell, on such subnetworks it would be expensive to maintain a full mesh of adjacencies between routers (n routers on a LAN, for instance, would require n(n - 1)/2 adjacencies). To solve this issue, a designated router (DR) and a backup designated router (BDR) are dynamically elected. Each router on the multiaccess subnetwork then maintains an adjacency with the DR and the BDR, which are responsible for flooding the LSA updates. For instance, if a router X receives a new LSA, it sends it to the DR and BDR (using a specific multicast address), and the DR then refloods it, using another multicast address, to all the routers it has an adjacency with. Consequently, if during the restart period the restarting router determines that it was the DR before the failure (that is, if it is listed as the DR in the OSPF hello packets received from its neighbors), it must immediately consider itself the DR for the multiaccess network.
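To make the savings concrete, the following Python sketch counts the adjacencies on a LAN of n routers with and without a DR/BDR. It assumes that every router other than the DR and the BDR forms exactly two adjacencies (one with each) and that the DR and the BDR are adjacent to each other; the function names are illustrative only.

```python
def full_mesh_adjacencies(n: int) -> int:
    """Adjacencies if every router on the LAN peers with every other router."""
    return n * (n - 1) // 2


def dr_bdr_adjacencies(n: int) -> int:
    """Adjacencies with a DR and a BDR: each of the other n - 2 routers peers
    with both of them, plus one adjacency between the DR and the BDR."""
    if n < 2:
        return 0
    return 2 * (n - 2) + 1


for n in (5, 10, 50):
    print(f"n={n:2d}: full mesh {full_mesh_adjacencies(n):4d}, "
          f"with DR/BDR {dr_bdr_adjacencies(n):3d}")
```

For 50 routers the count drops from 1225 adjacencies to 97, which is why the DR state must survive a graceful restart.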

Exiting from graceful restart mode: A restarting router exits from graceful restart mode if one of the three conditions below is met:

1. All the adjacencies have been reestablished: Upon reestablishing its adjacencies, the LSDB resynchronization process allows the restarting router to retrieve a complete LSDB. Analyzing the LSAs that the restarting router originated before the failure tells it how many adjacencies to expect; hence, the restarting router can determine when all the adjacencies have been reestablished.

2. An inconsistent LSA is received from a neighboring router: Suppose that the restarting router A is adjacent to a router B that either does not support the graceful restart procedure described here or has not received the grace LSA. Then B will have considered its adjacency with router A as down, and A may receive from a neighbor C its own self-originated LSA reporting an adjacency between routers A and B, as well as an LSA from node B that does not report this A-B adjacency. This inconsistency requires the node to exit from graceful restart mode. Another such situation occurs when A does not receive its self-originated LSA from B after the adjacency has been reestablished.

3. The grace period ends.

As soon as the restarting router exits from the graceful restart mode, normal

OSPF operation is resumed (the router reoriginates its LSA, OSPF calculations are

performed, and grace LSAs are flushed).
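The three exit conditions can be summarized as a small decision function. This is a minimal Python sketch, not an OSPF implementation: the `RestartState` fields (the expected neighbor set derived from the pre-failure self-originated LSAs, the set of reestablished adjacencies, a flag for inconsistent LSAs, and the elapsed time) are hypothetical inputs that a real implementation would derive from its LSDB and timers, and the 60-second grace period is arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RestartState:
    """Hypothetical snapshot of a restarting OSPF router (illustrative only)."""
    grace_period_s: float                 # value advertised in the grace LSA
    elapsed_s: float = 0.0                # time since the backup RP became active
    # Neighbors listed in the LSAs this router originated before the failure.
    expected_neighbors: set[str] = field(default_factory=set)
    # Neighbors with which the adjacency is fully reestablished.
    full_neighbors: set[str] = field(default_factory=set)
    inconsistent_lsa_seen: bool = False   # e.g., A-B reported by C but not by B


def exit_reason(state: RestartState) -> Optional[str]:
    """Return why graceful restart ends, or None to remain in restart mode."""
    if state.expected_neighbors <= state.full_neighbors:
        return "all adjacencies reestablished"
    if state.inconsistent_lsa_seen:
        return "inconsistent LSA received"
    if state.elapsed_s >= state.grace_period_s:
        return "grace period expired"
    return None


# Example: two adjacencies expected, only one back so far.
s = RestartState(grace_period_s=60, expected_neighbors={"B", "C"}, full_neighbors={"B"})
print(exit_reason(s))        # None
s.full_neighbors.add("C")
print(exit_reason(s))        # all adjacencies reestablished
```

Whatever the reason returned, the router then resumes normal operation as described above: it reoriginates its LSAs, runs its OSPF calculations, and flushes its grace LSAs.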

4.10.3 Mode of Operation of the Restarting Router’s Neighbors

The neighbors of the restarting router are also called helper neighbors because they help the restarting router restart its control plane. As long as no network topology change occurs, helper neighbors continue to advertise LSAs reporting an active adjacency with the restarting router.

Entering helper mode: When a router B receives a grace LSA from the restarting router A, it enters helper mode, provided that B supports the graceful restart procedure. In addition, router B must have an adjacency with router A, the grace period must not have expired, there must not be any network topology change, and there must not be any locally configured policy preventing node B from acting as a helper node. Finally, to enter helper mode, node B itself must not be in restarting mode.

Exiting helper mode: A node in helper mode exits from that mode if one of the conditions below is met:

a. The restarting node exits from restarting mode by flushing the grace LSA.

b. The grace period expires.

c. A topology change occurs in the network. More precisely, node B receives a new LSA (an LSA with new content, which excludes LSAs received as the result of a refresh) that node B would have flooded to node A under normal circumstances.

Upon exiting helper mode, the node reoriginates its LSA based on the current state of its adjacency with the restarting router A.
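The helper-side rules can be expressed the same way. In this minimal Python sketch, the `HelperInputs` fields and both function names are hypothetical; they simply encode the entry conditions and the exit conditions a, b, and c listed above, as they would be evaluated by router B for the restarting router A.

```python
from dataclasses import dataclass


@dataclass
class HelperInputs:
    """Hypothetical inputs router B evaluates for restarting router A."""
    supports_graceful_restart: bool
    adjacent_to_restarter: bool
    grace_period_remaining_s: float
    topology_changed: bool
    policy_forbids_helping: bool
    self_restarting: bool


def may_enter_helper_mode(x: HelperInputs) -> bool:
    # Every entry condition listed in the text must hold at the same time.
    return (x.supports_graceful_restart
            and x.adjacent_to_restarter
            and x.grace_period_remaining_s > 0
            and not x.topology_changed
            and not x.policy_forbids_helping
            and not x.self_restarting)


def must_exit_helper_mode(grace_lsa_flushed: bool,
                          grace_period_remaining_s: float,
                          changed_lsa_for_restarter: bool) -> bool:
    # (a) A flushed its grace LSA, (b) the grace period is over, or
    # (c) B received an LSA with new content (not a mere refresh) that it
    #     would normally have flooded to A.
    return (grace_lsa_flushed
            or grace_period_remaining_s <= 0
            or changed_lsa_for_restarter)


ok = HelperInputs(True, True, 45.0, False, False, False)
print(may_enter_helper_mode(ok))                    # True
print(must_exit_helper_mode(False, 45.0, False))    # False: keep helping
```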

4.10.4 Backward Compatibility

Note that the graceful restart procedure described above is fully compatible with

existing implementations. Indeed, if a node receives a grace LSA and does not

understand this LSA because it does not support the graceful restart procedure, it

will simply ignore it. Then it will originate a new LSA, reporting the lost adjacency

with the restarting router. Every other neighbor of the restarting node A will then

interpret the receipt of the new LSA as a topology change and will exit from the

helper mode. As for the restarting router, upon reestablishing new adjacencies, it

will receive inconsistent LSAs and will exit the restarting mode, reverting back to



the normal OSPF mode. This also means that in order for a restarting node to

perform the graceful restart operation, all its neighbors must also support the

graceful restart mode.

However, what if the secondary RP cannot switch over? In the case of an

unplanned router failure, the grace LSA is sent by the secondary RP. Consequently,

if the secondary RP is not in service, no grace LSA will be sent, and normal OSPF

convergence procedures will apply. If the secondary RP can send a grace LSA but

cannot successfully perform the graceful restart procedure, the adjacency will be

maintained until the grace period has expired, which is not an issue because the

assumption is made that the forwarding states are preserved.

4.11 A Case Study with IS-IS

This chapter concludes with a case study that has the following structure:

Assumptions: Network topology, layer 2/3 protocols, and so on.

Objectives: Convergence time, failure coverage, performance, and so on.

Proposed design: There are obviously several possible network designs that

satisfy a given set of requirements. One specific design (with some variations)

is proposed that meets the set of requirements. For the sake of illustration,

IS-IS configuration examples provided in this case study correspond to the

configuration of Cisco routers.

Assumptions

We consider the network depicted in Figure 4.28.

The network is made of three layers:

• An optical layer providing unprotected optical lambdas only

• A SONET layer that provides protected VCs up to OC3

• An IP layer

The IP routers are interconnected by various link types:

• Within a point of presence (POP): Routers are interconnected by means of Gigabit Ethernet (GE) switches. A typical POP infrastructure is represented in Figure 4.28. All the POPs have an identical infrastructure: a set of edge routers and two core routers, which are interconnected with the other core routers via the wide area network (WAN) links depicted in the diagram. Within a POP, all the routers (edge and core) are interconnected via one or several layer 2 switches. In a service provider network, the edge routers aggregate the customer traffic; in other words, the customer routers are connected to the service provider network via one or several links (sometimes there might be multiple customer routers in the same customer site, each connected to a different edge router of the service provider for redundancy). In an enterprise network, the edge routers may be used to aggregate the traffic coming from remote sites. Alternatively, the remote sites may be directly connected to the core routers, in which case the POP infrastructure is reduced to the core routers.

• Up to OC3, the links are provided by the SONET layer and are always protected.

• OC48 and OC192 links are unprotected lambdas provided by the optical layer.

No change can be made to the layer 1/2 protection in the network. In other words, to keep costs down, the unprotected links cannot be protected by the optical layer.

Three types of traffic are carried in the network:

• Internet traffic

• VPN traffic (IPsec, L2TPv3, etc.)

• Voice traffic

The network is Diffserv enabled, and two classes of service are configured in the core:

• An EF class is used for the voice traffic. The voice traffic is marked (colored) at the edge of the network with the EF code point (DSCP 46, which corresponds to IP precedence 5). IP packets carrying voice are queued in a preemptive queue and are always served with the highest priority.

• An AF class is used for the data traffic. A congestion avoidance mechanism such as WRED is configured so that Internet traffic is dropped more aggressively than VPN traffic in the case of network congestion.

Figure 4.28 IP case study with IS-IS. (Network of POPs SEA, SFO, LAX, PHX, SLC, DEN, HEL, MIN, KAN, DAL, HOU, CHI, ONT, BOS, NYC, WAS, ATL, and MIA; legend: edge routers, core routers, layer 2 switch, OC3 link (SONET, protected), OC48 link (unprotected), OC192 link.)

The process of flooding always takes precedence over SPF triggering: As discussed earlier in this chapter, a proper IGP implementation that receives a new LSP should always flood it to its other neighbors as quickly as possible. In other words, upon receiving a new LSP, a router should not first trigger an SPF and then flood the LSP to its other neighbors, because this would slow down the overall convergence.

IGP control packets receive an appropriate QoS: The queuing delays

experienced by IGP packets can be considered negligible.

All the routers have a distributed architecture and are equipped with a secondary

(backup) RP.

Only the edge routers are NSF capable. In this network, the choice is made to always trigger an IGP convergence upon a core router failure (control or forwarding plane) because the network has been designed to absorb the rerouted flows after a node failure without QoS degradation. In contrast, NSF is used on the edge routers because a customer router attached to a single edge router has no alternate path into the network. In conclusion, all the edge routers are NSF capable, and the core routers only act in helper mode.

Objectives

Objective 1: The targeted convergence time upon both link and node failure is

1 second for all traffic.

Objective 2: Both at steady state and under single failure scenarios, the total

proportion of voice traffic should never exceed 50% of any link capacity, to

preserve a low delay and jitter for the voice traffic. As already discussed, there is

no ‘‘magic’’ number that should not be exceeded for the maximum proportion

of voice traffic, but 50% is given here for the sake of the example.

Proposed Design

Achieving objective 1 requires tuning the IS-IS parameters so that the rerouting time does not exceed 1 second upon link or node failure. This implies that the following set of tasks must be completed within 1 second:

1. Network element failure detection

2. LSP origination from the nodes detecting the failure (because this case study

is devoted to IS-IS, the IS-IS terminology is used in the rest of this section).

3. LSP propagation throughout the entire network.

4. Routing table update (SPF and RIB computation).



Let us now analyze each component separately and propose an appropriate IGP

parameter tuning.

Network element failure detection: On the WAN links (OC3, OC48, and OC192), the regular SONET alarms are used, which usually allow a link failure to be detected within a very short period, on the order of a few milliseconds. On the other hand, within a POP, routers are interconnected by means of layer 2 switches, so the failure of a link between an edge router and the layer 2 switch requires some other mechanism for the other neighboring routers to detect the failure (which might be a failure of the local link or of the router itself). Several link/node failure detection mechanisms have been described in Section 4.3. The current solution is to increase the IS-IS hello frequency (that is, reduce the hello interval) and reduce the hold time, but this has some nonnegligible implications in terms of processing in the router. For instance, if a POP contains 50 routers, tuning the hello interval and the hold time too aggressively (say, a hello interval of 200 ms and a hold time of 1 second) would have an impact on the core router, which would have to send a hello message every 4 ms on average. Another alternative is to rely on some other, much more scalable hello mechanism. Alternatively, the network administrator may assume that the probability of a link failure within a POP or of a core router failure (core routers are usually highly redundant) is sufficiently low to be neglected. In this case study, the following design choices are made:

On the intra-POP interface, the hello interval and the hold-timer are set to

1 and 3 seconds, respectively.

On the inter-POP interface (link between two core routers), the hello interval

and hold-timer are set to 3 and 10 seconds, respectively.

When BFD becomes available, the hello interval and the hold timer could easily be increased to 10 and 30 seconds, respectively, which are usually the default values. BFD, by contrast, will use very short hello intervals (on the order of a few tens of milliseconds).

This means that any link failure within a POP will be detected in at most 3

seconds. Likewise, the failure of the core router will be detected within 3 seconds by

an edge router, but the likelihood of such a failure is considered sufficiently low to be acceptable. WAN link failures, which are much more common, will be detected within a few tens of milliseconds.
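The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as the example in the text implicitly does, that the core router sends one hello per neighbor per interval (so 50 neighbors at a 200 ms interval means one hello every 4 ms), and that detection within a POP relies solely on the hold timer; the `SETTINGS` table is an illustrative restatement of the values discussed above.

```python
def hello_spacing_ms(neighbors: int, hello_interval_ms: int) -> float:
    """Average gap between consecutive hellos sent by a router that keeps a
    separate hello stream per neighbor (the assumption behind the 4 ms figure)."""
    return hello_interval_ms / neighbors


# (hello interval ms, hold time ms) for a 50-router POP: the chosen intra-POP
# setting and the aggressive setting that the text rejects.
SETTINGS = {
    "chosen intra-POP": (1000, 3000),
    "aggressive intra-POP": (200, 1000),
}

for name, (hello_ms, hold_ms) in SETTINGS.items():
    # Without a lower-layer alarm, a dead neighbor is declared only when the
    # hold timer expires, so the hold time bounds the detection delay.
    print(f"{name:22s} detection <= {hold_ms / 1000:3.1f} s, "
          f"one hello every {hello_spacing_ms(50, hello_ms):4.1f} ms")
```

The trade-off is visible directly: cutting the worst-case detection time from 3 s to 1 s multiplies the hello load on the core router by five.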

In the case of a core node failure, the node failure detection time will vary according

to the node failure type:

• Node power supply failure: In this case, the attached links also fail, and the neighboring nodes quickly detect the failure via the SONET-SDH framing alarms.

• Node control plane failure: A control plane failure (which may not affect the data plane) will be detected in at most 3 seconds. If the failing node is NSF capable and its neighboring nodes support helper mode, such a control plane failure will not have an impact on traffic forwarding. In the absence of NSF (the case of the core routers in this case study), the neighboring nodes, upon detecting the control plane failure (within at most 3 seconds), will start rerouting the traffic around the node, but no traffic will be dropped. On the other hand, if a control plane failure also affects the forwarding state (this depends on the router architecture), traffic will be dropped for up to 3 seconds; reducing that time requires a fast detection mechanism such as BFD. Because this case study assumes that the core routers preserve their forwarding state upon a route processor failure, no fast hello keep-alive mechanism is required in the core.

Configuration of the SPF, PRC, and LSP Origination Dampening Mechanisms

The following LSP origination, SPF, and PRC dampening mechanisms are configured:

lsp-gen-interval 5 20 50 (A B C): As explained in Section 4.5, the parameters 5, 20, and 50 have the following effects:

B = 20 ms is the amount of time the router waits after the first link failure has been detected before originating a new LSP. If the router has several local links sharing an SRLG (multiple local links belonging to the same SRLG), waiting for 20 ms maximizes the probability of originating an LSP that reflects the actual local state of the router: in the case of an SRLG failure, this gives the router a chance to detect all the local failures and trigger a single LSP.

C = 50 ms is the amount of time the router waits before advertising a second LSP if a second local state change occurs.

A = 5 seconds is the maximum amount of time between two successive LSP originations according to the exponential back-off algorithm described in Section 4.3.

spf-interval 5 30 20 (A B C)

prc-interval 5 30 20 (A B C)

B = 30 ms is the amount of time a router waits before triggering a new SPF after having received a new LSP reflecting a topology change.

Important note: There are two interesting situations to consider here:

Situation 1: The network does not contain any SRLG. If a link failure occurs, any router in the network should trigger an SPF quickly after receiving the first LSP reflecting the topology change (as explained in Section 4.5, the receipt of one LSP is sufficient to exclude the link from the network topology). If a node failure occurs, all the neighbors of the failing node will originate a new LSP. Depending on the network topology, one can expect these LSPs to be received within a short time, so waiting for 30 ms is probably long enough to maximize the chance of running the new SPF on an up-to-date LSDB. If link failures are much more frequent than node failures, reducing that timer to a shorter value is perfectly reasonable and helps improve the convergence time.

Situation 2: The network contains a substantial number of SRLGs. If several

links geographically distant from each other share some SRLG, then a slightly

higher value for B may be advisable. Indeed, various routers will originate new

LSPs, and because they can be distant from each other, waiting for a slightly

longer period than 30 ms may be desirable.

C = 20 ms is the amount of time the router waits before triggering a second SPF if a second new LSP is received.

A = 5 seconds is the maximum amount of time between two successive SPFs according to the exponential back-off algorithm described in Section 4.4.
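The effect of the three parameters can be visualized by listing the waits applied to a burst of consecutive triggers. The Python sketch below assumes a common back-off model: the first trigger waits B, the second waits C, each subsequent wait doubles, and no wait exceeds A. The reset of the back-off after a quiet period is not modeled, and the exact behavior of a given implementation may differ; the values are taken from the configuration lines above, and the function name is illustrative.

```python
def backoff_waits(a_max_s: int, b_initial_ms: int, c_second_ms: int,
                  triggers: int) -> list[int]:
    """Waits (ms) before each of `triggers` back-to-back events (assumed model)."""
    cap_ms = a_max_s * 1000
    waits = []
    next_wait = c_second_ms
    for i in range(triggers):
        if i == 0:
            waits.append(min(b_initial_ms, cap_ms))   # first event: wait B
        else:
            waits.append(min(next_wait, cap_ms))      # then C, 2C, 4C, ... capped at A
            next_wait *= 2
    return waits


print(backoff_waits(5, 20, 50, 10))  # lsp-gen-interval 5 20 50
# [20, 50, 100, 200, 400, 800, 1600, 3200, 5000, 5000]
print(backoff_waits(5, 30, 20, 10))  # spf-interval 5 30 20
# [30, 20, 40, 80, 160, 320, 640, 1280, 2560, 5000]
```

A single failure therefore costs only the short initial wait, while a flapping resource quickly pushes the router toward the 5-second ceiling, which is the stability property the dampening is meant to provide.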

Important note: The values above should not be considered set in stone. Every network is different, and IGP parameter tuning must be driven by both the network constraints and the convergence time objectives.

As mentioned in Section 4.7, heterogeneous IGP timers may have the undesirable effect of increasing the duration of temporary loops during the network convergence period, so the recommendation is to use identical timer configurations on all nodes in the network. In addition, incremental SPF is configured on each router, which significantly decreases the SPF duration in the vast majority of cases.

Are the Objectives Met?

Now we analyze whether the initial objectives are met with the proposed design.

Objective 1: The targeted convergence time upon both link and node failure is

1 second for all traffic (Figure 4.29).

Case of an intra-POP link failure (failure 3 in Figure 4.29): As mentioned earlier, the worst-case failure detection time is 3 seconds, so a 1-second convergence time cannot be guaranteed for such a failure. That being said, the assumption has been made that such failures are rare enough to tolerate a longer rerouting time. The same reasoning applies to hardware core node failures with respect to the traffic originated in the POP.

Case of an inter-POP link failure (failure of a WAN link, e.g., failure 2 in Figure 4.29): Thanks to SONET/optical fast failure detection, a link failure will be detected within a few tens of milliseconds. A new LSP is then originated after 20 ms and flooded throughout the network.

The following two assumptions were made:

• The process of flooding has a high priority.

• IGP control plane messages receive appropriate QoS.



This implies that the newly originated LSP will be quickly flooded throughout the network and that the delay experienced by an LSP originated by node i to reach node j is reduced to the propagation delay along the links plus some processing delay at each hop. Let us carefully analyze two failure examples demonstrating that objective 1 is met:

Failure 1 is a power supply failure of the Chicago node: All of its attached links fail, and every neighbor detects the node failure by means of link failure detection. The failure detection time will be roughly on the order of a few tens of milliseconds. After 20 ms, every neighbor of the Chicago router originates a new LSP reflecting the topology change. Let us analyze the impact of this router failure on the node of PHX (Phoenix). Assume that the first LSP received is the one from DEN (Denver) reflecting the loss of its adjacency with the node of CHI (Chicago); that new LSP will likely be received less than 10 ms after having been sent by the DEN router (because of the short propagation delay between DEN and PHX). Upon receiving the first new LSP, every node in the network waits 30 ms before triggering an SPF, according to the IGP tuning proposed in this case study; this gives the node of PHX enough time to receive the other LSPs originated by the neighbors of the failing Chicago node, that is, Kansas City (KAN), Seattle (SEA), Boston (BOS), New York City (NYC), Washington, D.C. (WAS), Atlanta (ATL), and Dallas (DAL). The fact that the node of PHX waits some time before triggering its SPF is quite interesting in this particular scenario because it will have received the LSP of the node KAN. Hence, it will avoid rerouting its traffic through KAN, which does not have any alternative path (in fact, if PHX had not waited, it would have had to rerun an SPF upon receiving KAN’s LSP to properly reroute its traffic). That being said, the vast majority of failures are link failures (usually more than 80%), so reducing the value of B to a few milliseconds should reduce the convergence time in most failure scenarios and only slightly increase it in the less frequent case of a node failure.

Figure 4.29 A case study with IS-IS link and node failures. (Same topology as Figure 4.28, with failure 1 at CHI, failure 2 on the ONT-BOS link, failure 3 within a POP, customer routers attached to the edge routers, and propagation delays annotated on several links.)

Hence, the total convergence time is the failure detection time (a few tens of milliseconds, e.g., 20 ms), plus 20 ms (for the neighbors of the Chicago node to originate a new LSP), plus a few tens of milliseconds for the PHX node to receive the LSPs (say, 20 ms), plus 30 ms (the initial wait before triggering the first SPF), plus the SPF duration.

But what is the SPF duration?

As described in Section 4.6, it consists of two components: the SPT computation and the RIB update. The first component is on the order of tens of milliseconds on a modern router architecture for a few hundred nodes. The second component (RIB update) depends on the number of routes, which can be reduced; in addition, some mechanisms can be used to prioritize the most important prefixes. In a nutshell, the total convergence time will very likely be less than 0.5 seconds, which fully meets the first objective.
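The budget for failure 1 can be tallied explicitly. In this Python sketch, the first four values are the example figures given in the text; the SPT computation and RIB update durations are assumptions chosen only for illustration, since the text gives them only as orders of magnitude.

```python
# Convergence budget for failure 1, in milliseconds.
BUDGET_MS = {
    "failure detection (example from text)": 20,
    "LSP origination delay (lsp-gen-interval B)": 20,
    "LSP flooding to PHX (example from text)": 20,
    "initial SPF wait (spf-interval B)": 30,
    "SPT computation (assumed)": 50,
    "RIB update, important prefixes (assumed)": 200,
}
TARGET_MS = 1000

total_ms = sum(BUDGET_MS.values())
for step, ms in BUDGET_MS.items():
    print(f"{step:45s} {ms:5d} ms")
print(f"{'total':45s} {total_ms:5d} ms (target {TARGET_MS} ms)")
```

Under these assumptions the total is 340 ms, consistent with the estimate of less than 0.5 seconds; most of the remaining margin is available to the RIB update.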

Now we turn to the second type of failure (failure 2 in Figure 4.29).

Failure 2 is the failure of the link between the nodes of Ontario (ONT) and Boston (BOS). Consider the rerouting time from the perspective of the node of Seattle (SEA) for traffic to the node of Boston (BOS). The noticeable difference from the previous failure scenario is that the traffic from SEA to BOS, which typically follows the path SEA-HEL-MIN-ONT-BOS (Seattle-Helena-Minneapolis-Ontario-Boston), is rerouted by a node that is three hops upstream of the failure, which adds further delay to the rerouting time. If one assumes that the propagation delay is due to the speed of light plus a few milliseconds of per-hop processing delay, one should probably add less than 100 ms to the previous convergence time. The aim of this second example is simply to highlight that in some cases (especially when propagation delays are quite high and the first node able to reroute the traffic is several hops upstream of the failure), the convergence time may easily increase by a few hundred milliseconds. This is still perfectly in line with our overall 1-second convergence target, but it highlights how difficult it is for restoration mechanisms (especially when the backup path is not local) to achieve recovery times in the range of tens of milliseconds, in contrast with local protection mechanisms such as optical protection, SONET/SDH protection, or MPLS TE Fast Reroute.
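The additional delay can be estimated from the link lengths. This Python sketch assumes that light travels at about 200,000 km/s in fiber (roughly 5 µs per km) and adds a fixed per-hop processing delay; the link lengths for the ONT-MIN-HEL-SEA path are hypothetical placeholders, not distances taken from the figure.

```python
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s in fiber, i.e., 200 km per millisecond


def lsp_travel_ms(link_lengths_km: list[float], per_hop_processing_ms: float) -> float:
    """Delay for an LSP to cross the given links: fiber propagation plus a
    fixed processing delay at each receiving hop."""
    propagation_ms = sum(km / FIBER_KM_PER_MS for km in link_lengths_km)
    return propagation_ms + per_hop_processing_ms * len(link_lengths_km)


# Hypothetical lengths (km) for ONT->MIN, MIN->HEL, HEL->SEA.
extra_ms = lsp_travel_ms([1200, 1400, 900], per_hop_processing_ms=2)
print(f"{extra_ms:.1f} ms")  # 23.5 ms
```

Even with generous per-hop processing, continental distances keep this term in the tens of milliseconds, well under the 100 ms bound given in the text.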



Important note: In the two previous failure scenarios, one should bear in mind that network convergence may vary in networks with a very large number of IP prefixes, as discussed in Section 4.6. That being said, the 1-second objective can certainly be met: in the two previous cases the rerouting time was estimated at 0.5 seconds, which leaves a time budget of 0.5 seconds for the RIB computation, a perfectly reasonable objective for most modern router architectures, at least for the most important IP prefixes (see Section 4.6 for a definition of an ‘‘important’’ prefix).

Failure 3 is a control plane failure of an edge router. For instance, consider the control plane failure of the first edge router on the left in Figure 4.29. In most cases, each customer will be attached to a single service provider edge router, and this is where NSF helps if a control plane failure (hardware or software route processor failure) occurs in a distributed router architecture or in some centralized router architectures. Indeed, in such a case, the backup route processor handles the failure without any impact on the forwarding state. This is particularly useful and avoids some very undesirable traffic disruptions. Of course, this requires the edge router to be able to maintain its forwarding state and its neighbors to help the failing router recover.

Objective 2: Both at steady state and under a single failure scenario, the total

proportion of voice traffic should never exceed 50% of any link capacity to

preserve a low delay and jitter for the voice traffic.

Typically, such an objective is met by means of an external off-line IGP metric computation tool that provides a set of IGP metrics such that the objective is met upon any single network failure. Note that other objectives, such as limiting the propagation delay increase, can also be added to the objective function of an off-line traffic engineering tool, as mentioned in Section 4.9.
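For a given set of metrics, objective 2 can be verified by routing every voice demand along its IGP shortest path in the steady state and in each single-link failure scenario, then comparing the voice load with the link capacity. The Python sketch below does this on a small hypothetical topology: the nodes, metrics, capacities, and demands are invented for illustration, equal-cost multipath and node failures are ignored, and, unlike the off-line tool mentioned above, it only checks a metric set rather than searching for one.

```python
import heapq
from collections import defaultdict

# Hypothetical topology: (node, node) -> (IGP metric, capacity in Mb/s).
LINKS = {
    ("A", "B"): (10, 1000), ("B", "C"): (10, 1000), ("C", "D"): (10, 300),
    ("D", "A"): (10, 1000), ("A", "C"): (15, 1000),
}
# Hypothetical voice demands: (source, destination) -> Mb/s.
VOICE = {("A", "C"): 300, ("B", "D"): 200}


def shortest_path(links, src, dst):
    """Dijkstra over an undirected graph; returns one shortest path as a node
    list (ties broken deterministically, no ECMP), or None if unreachable."""
    adj = defaultdict(list)
    for (a, b), (metric, _) in links.items():
        adj[a].append((b, metric))
        adj[b].append((a, metric))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in sorted(adj[u]):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def max_voice_share(links, demands):
    """Largest voice load / capacity ratio over all links, per direction."""
    load = defaultdict(float)
    for (src, dst), mbps in demands.items():
        path = shortest_path(links, src, dst)
        if path is None:
            continue  # demand disconnected by the failure
        for u, v in zip(path, path[1:]):
            load[(u, v)] += mbps
    capacity = {}
    for (a, b), (_, cap) in links.items():
        capacity[(a, b)] = capacity[(b, a)] = cap
    return max((mbps / capacity[e] for e, mbps in load.items()), default=0.0)


# Steady state plus every single-link failure.
scenarios = {"no failure": LINKS}
for failed in LINKS:
    scenarios[f"{failed[0]}-{failed[1]} down"] = {
        link: attrs for link, attrs in LINKS.items() if link != failed}

for name, topo in scenarios.items():
    share = max_voice_share(topo, VOICE)
    verdict = "OK" if share <= 0.5 else "exceeds 50%"
    print(f"{name:10s} max voice share {share:4.0%}  {verdict}")
```

With these numbers, the steady state meets the 50% bound, but the failure of A-B or D-A moves the B-to-D voice demand onto the 300 Mb/s C-D link (about 67%); an off-line tool would then look for metrics that avoid this, or the link capacity would have to be increased.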

4.12 Summary

The objective of Sections 4.1 to 4.11 was to cover in detail the properties of IP routing, a restoration protocol that relies on distributed and dynamic routing algorithms running in every node to compute the shortest path between the computing node and every other node in the network. This first part of the chapter focused on link state routing protocols, which correspond to the vast majority (if not all) of the routing protocols used in large service provider and enterprise networks.

Each component of IP routing has been studied in detail: the network topology dissemination by means of a reliable flooding protocol, the shortest path computation algorithms, and the various mechanisms allowing IP routing to react quickly and converge fast upon network element failures while preserving network stability in case of network resource oscillation, thanks to efficient dampening algorithms.



Because IP routing relies on the computation of alternate paths upon network

failure detection by using distributed algorithms, a significant part of this chapter

has been devoted to the detailed study of transient states as the network converges

(until all the nodes in the network share an identical network topology view, leading

to distributed consistent routing decisions).

Although the main aspect of IP routing in terms of recovery is related to the convergence time, it has also been shown how traffic engineering techniques based on IGP metric optimization can be used to achieve a certain QoS objective under failure conditions.

Another class of recovery mechanisms called NSF has also been studied in this

chapter, which specifically handles routing control plane failures on routers where

the forwarding states can be preserved in the event of control plane failures.

One must admit that some incompressible delay components prevent IP routing protocols from achieving rerouting times on the order of tens of milliseconds, especially in sparse network topologies in which the node capable of rerouting the traffic along an alternate path may be several nodes upstream of the failure. Moreover, because of the inherently distributed nature of the routing computation, there is some degree of nondeterminism in convergence times, which are driven by the network topology and the router architectures, to mention a few criteria. That being said, this chapter has shown that efficient routing protocol implementations, used in conjunction with appropriately designed networks, make subsecond rerouting times achievable, which will hopefully dispel the widespread misconception that link state routing protocols converge in tens of seconds at best. Finally, this part of the chapter concluded with a complete case study on the link state protocol IS-IS.

The next sections discuss some advanced topics of IP routing, starting with a section on algorithm complexity, a topic of the utmost importance for IP recovery because, upon a network failure, alternate paths are computed ‘‘on the fly,’’ as with any other restoration mechanism. Then a second important optimization of the SPF computation, known as incremental SPF, is detailed; it can drastically reduce the routing computation under various network change circumstances. Finally, the concluding section discusses the interaction between IP fast convergence techniques and NSF.

4.13 Algorithm Complexity

4.13.1 Definition of Algorithm Complexity

The complexity of an algorithm is an important notion when evaluating an algorithm and largely determines its usefulness. An algorithm is a step-by-step procedure that must be performed to solve a particular problem, where a step can be an arithmetic operation, a comparison between two numbers, or any other monolithic operation. The efficiency of an algorithm usually refers to its ability to find a solution to a particular ‘‘problem instance’’ in a reasonable time;



more precisely, one usually refers to the amount of time required by the algorithm

in order to solve the problem, in the worst case. A ‘‘problem instance’’ refers to

a particular set of variables (e.g., a particular network topology).

It is generally accepted that a good (efficient) algorithm is one that provides a solution in a reasonable time. A common way of characterizing efficient algorithms is to determine the nature of the algorithm's complexity function, and in particular whether that function is polynomial or grows faster than any polynomial.

We call n the size of the problem, which can be the number of routers in the

computation of the shortest path, for instance.

Table 4.1 and Figure 4.30 illustrate how the complexity of an algorithm grows with the size of the problem for several complexity functions.

Table 4.1 Examples of Algorithm Complexities as a Function of the Problem Size n

 n    n^2      n^3        2^n
 1      1        1          2
 2      4        8          4
 3      9       27          8
 4     16       64         16
 5     25      125         32
 6     36      216         64
 7     49      343        128
 8     64      512        256
 9     81      729        512
10    100     1000       1024
11    121     1331       2048
12    144     1728       4096
13    169     2197       8192
14    196     2744     16,384
15    225     3375     32,768
16    256     4096     65,536
17    289     4913    131,072
18    324     5832    262,144
19    361     6859    524,288
20    400     8000  1,048,576
30    900   27,000   1.07E+09
40   1600   64,000    1.1E+12
50   2500  125,000   1.13E+15
60   3600  216,000   1.15E+18



So, for instance, if one step takes $10^{-9}$ seconds (one nanosecond), then for $n = 60$ the total execution time of an algorithm with complexity $n^2$ is 3.6 microseconds, compared to about 36.5 years for an algorithm with complexity $2^n$.
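These two figures can be checked with a short Python sketch. It keeps the text's assumption of exactly 1 ns per step; converting seconds to years with a Julian year of 365.25 days is an assumption of this sketch and only affects the second decimal.

```python
STEP_TIME_S = 1e-9                          # one step = 1 ns, as in the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600       # Julian year (assumption)


def running_time_s(steps, step_time_s=STEP_TIME_S):
    """Total execution time, in seconds, for a given number of steps."""
    return steps * step_time_s


n = 60
t_quadratic = running_time_s(n ** 2)        # 3,600 steps
t_exponential = running_time_s(2 ** n)      # about 1.15E+18 steps

print(f"n^2 with n = {n}: {t_quadratic * 1e6:.1f} microseconds")
print(f"2^n with n = {n}: {t_exponential / SECONDS_PER_YEAR:.1f} years")
```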

This raises an interesting question: how much does additional CPU power help when running nonpolynomial algorithms?

There are two ways of answering such a question:

1. Consider two CPUs of different speeds and compare the sizes of the problems that each can solve with a nonpolynomial algorithm in the same amount of time.

2. Consider two CPUs of different speeds and compare their running times to solve a problem of a given size n.

First, we consider the first question with CPU2 = k × CPU1 (i.e., CPU2 can perform a basic operation k times faster than CPU1) for an algorithm with complexity $2^n$. If the running time is fixed, then the size of the problem solved with CPU2 is only $\log_2 k$ greater than the size of the problem solved with CPU1, since $2^{n_2} = k \cdot 2^{n_1}$ implies $n_2 = n_1 + \log_2 k$. To take a concrete example, if CPU2 is 10,000 times faster than CPU1, the size of the problem that can be solved with CPU2 in a fixed amount of time increases by just $\log_2(10{,}000) \approx 13.29$ compared to CPU1. So an increase in CPU power does not allow you to drastically increase the problem size with a nonpolynomial algorithm: the gain is an additive $\log_2 k$, which is small when the problem size n gets large.

[Figure 4.30: two panels titled "Polynomial Versus Exponential Algorithm Complexity as a Function of the Problem Size n", plotting the complexity against the number of routers n (n from 1 to 12 in the first panel, up to about 100 in the second) for $n^2$, $n^3$, and $2^n$.]

Figure 4.30 Algorithm complexity as a function of the problem size for three algorithm complexities: $n^2$, $n^3$, and $2^n$.

Now we consider the second question: if the size of the problem is fixed, how much faster will CPU2 solve a problem of size n than CPU1? Exactly k times faster. Unfortunately, for large values of n, this gain is negligible compared to the number of operations. Back to the same example of an algorithm with complexity $2^n$ and $n = 60$: the number of operations is about $1.15 \times 10^{18}$, so with one step taking 1 ns on CPU1, CPU2 still needs about $1.15 \times 10^{14}$ ns, roughly 32 hours. Reducing the running time by a factor of 10,000 is not sufficient.
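Both answers can be reproduced numerically. This sketch keeps the text's assumptions (k = 10,000, 1 ns per step on CPU1, n = 60). For contrast only, it also computes the corresponding figure for an $n^2$ algorithm, where a CPU k times faster multiplies the tractable problem size by $\sqrt{k}$ instead of adding $\log_2 k$ to it; the function names are illustrative.

```python
import math


def extra_size_exponential(k):
    """Extra problem size solvable in the same time by a CPU k times faster,
    for an algorithm taking 2^n steps: 2^(n + x) = k * 2^n  =>  x = log2(k)."""
    return math.log2(k)


def size_factor_quadratic(k):
    """Same question for an algorithm taking n^2 steps:
    (a * n)^2 = k * n^2  =>  the problem size is multiplied by a = sqrt(k)."""
    return math.sqrt(k)


k = 10_000                                   # CPU2 is 10,000 times faster
step_cpu1 = 1e-9                             # 1 ns per step on CPU1
steps = 2 ** 60                              # 2^n algorithm with n = 60

hours_cpu2 = steps * (step_cpu1 / k) / 3600  # running time on CPU2, in hours

print(f"2^n: n grows by only {extra_size_exponential(k):.2f}")
print(f"n^2: n is multiplied by {size_factor_quadratic(k):.0f}")
print(f"CPU2 still needs about {hours_cpu2:.0f} hours for n = 60")
```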

So the bottom line is that more CPU power does not substantially help solve a problem of nonpolynomial complexity. This clearly illustrates why algorithms with polynomial complexity are considered efficient compared to nonpolynomial algorithms. It is quite common to consider a problem for which only nonpolynomial algorithms are known as intractable.

That said, algorithm complexity is usually computed for the worst case, and fortunately many problem instances require far fewer steps. Hence, some well-known exponential-time algorithms have been used to solve various problems, but in general the ultimate goal is to find an algorithm with polynomial complexity.

Two important comments can be made here:

1. Measuring the efficiency of an algorithm by the amount of time required to solve a particular instance of a problem is particularly adequate for recovery mechanisms. Many recovery mechanisms (restoration mechanisms in particular) compute an alternate path "on the fly" upon a network element failure, that is, when the failure is detected, so the time required to find such a backup path is particularly important for them.

2. The number of steps required by an algorithm may vary with each problem instance; in other words, the same algorithm may perform well on some problem instances and badly on others. Worst-case evaluation is the most common approach, but there are multiple ways of evaluating algorithm efficiency, as follows:

• Average complexity: The algorithm complexity is evaluated by considering a probability distribution over the various possible instances of the problem. Though conceptually simple, this approach has several limitations, because the results are highly dependent on the algorithm implementation, the problem instances, and other parameters.

• Analysis driven by experience: The algorithm efficiency is determined by running the algorithm on several instances of the problem.

• Worst case: The algorithm complexity is determined by computing an upper bound on the number of required steps for any possible instance of the problem. The worst-case approach has the merit of being objective and provides strict upper bounds for any problem instance. Of course, in some cases, particular (and potentially rare) instances of a problem may push this upper bound far beyond the average case.



We just saw that algorithm complexity is usually computed by considering the worst-case scenario; other conventions of complexity analysis include the following:

• Constants are ignored.

• Only the dominant term is kept.

• The complexity is evaluated for large values of the variables.

So, for instance, consider a problem P1 with one variable n. If the worst-case complexity of an algorithm A1 solving P1 is $c_1 n^2 + c_2 \log(n) + c_3$ (where $c_1$, $c_2$, and $c_3$ are three constants), then the overall algorithm complexity is $n^2$, also noted $O(n^2)$. If the complexity of an algorithm A2 solving P1 is $c_1 n^2 + c_2 \exp(2n) + c_3$, then the exponential term dominates and the overall algorithm complexity is $\exp(2n)$, also noted $O(\exp(2n))$.

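Now consider a problem P2 with two variables n and m. If the worst-case complexity of an algorithm A1 solving P2 is $c_1 n^2 + c_2 m$, where $c_1$ and $c_2$ are two constants, then A1's complexity is $O(n^2 + m)$; it reduces to $O(n^2)$ when m cannot grow faster than $n^2$, as is the case, for instance, when m is the number of links of a network with n nodes and no parallel links (since then $m \le n(n-1)/2$).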

Another example would be a worst-case complexity of $c_1 n^2 + c_2 m^3 + c_3$. This case is quite interesting: for a given n and sufficiently large values of m, the dominant term of the complexity is $m^3$ even if $c_1 \gg c_2$, so the resulting worst-case complexity behaves as $O(m^3)$ (with both variables free, it is written $O(n^2 + m^3)$). That being said, there might be some problem instances in which considering just one term or the other is relevant, in particular if there is any control over the set of values the variables n and m can take. For instance, if one variable is restricted to a very limited subset of values, then the term in the other variable will be dominant. On the other hand, if both variables can take any arbitrary value in a wide range, only the dominant term matters.
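A quick numeric check of the last example: with hypothetical constants $c_1 = 1000$, $c_2 = 1$, and $n = 100$, the function below finds the smallest m for which the $m^3$ term overtakes the $n^2$ term (the constant $c_3$ is ignored). The function name and values are illustrative only.

```python
def crossover_m(c1, c2, n):
    """Smallest integer m for which c2 * m**3 exceeds c1 * n**2."""
    m = 1
    while c2 * m ** 3 <= c1 * n ** 2:
        m += 1
    return m


# Hypothetical constants: c1 is 1,000 times larger than c2, and n = 100.
m_star = crossover_m(c1=1000, c2=1, n=100)
print(m_star)
```

Even with a constant 1,000 times larger on the $n^2$ term, the $m^3$ term dominates as soon as m exceeds a few hundred.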

Another very important point is that algorithm efficiency is also highly driven by implementation choices. For instance, consider the cost of searching for an element in a list, which is what the Dijkstra algorithm does to select the node to remove from the TENT list and add to the PATH list. As mentioned in Section 4.6, in the worst case this task must be performed n times and the number of elements to be scanned is $n - k$ (where k is the iteration number), so the complexity is on the order of $O(n^2)$. Now, if the TENT list is sorted, the cost of finding the node in the TENT list to move to PATH is reduced to $O(1)$. Of course, this now requires keeping the TENT list sorted, using a sorting algorithm. For instance, one can use the following simple sorting algorithm:

Simple sort: Consider an array of size n and the following simple sorting algorithm:

For i = 1 to n-1
    For j = i+1 to n
        If table(i) > table(j) then swap(table(i), table(j))
    End
End

where table(i) is the element stored in the array at index i and swap(a, b) is the basic operation of swapping the elements a and b.



The number of comparisons performed by this algorithm is $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$, hence its complexity is $O(n^2)$.
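A runnable Python version of this simple sort follows, with 0-based indices (an adaptation of the pseudocode) and a comparison counter added to confirm the count. In this sketch the inner loop starts at i + 1, which is what yields exactly $n(n-1)/2$ comparisons and an ascending order.

```python
def simple_sort(table):
    """Exchange sort: returns (sorted copy, number of comparisons)."""
    a = list(table)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
            if a[i] > a[j]:
                a[i], a[j] = a[j], a[i]     # swap(table(i), table(j))
    return a, comparisons


data = [5, 3, 8, 1, 9, 2]
result, count = simple_sort(data)
print(result, count)
```

After the pass for index i, position i holds the smallest of the remaining elements, which is why a single comparison per pair is enough.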

Note that there is a multitude of sorting algorithms, which differ in terms of complexity. For instance, a very well known sorting algorithm called quicksort has a worst-case complexity of $O(n^2)$ but an average complexity of $O(n \log n)$. This clearly highlights the importance of the implementation choice and of the data structure used.
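To make the implementation point concrete, here is a Python sketch contrasting two ways of keeping the TENT list in Dijkstra's algorithm. The first scans an unsorted TENT for its minimum at each iteration, as in the $O(n^2)$ analysis above. The second keeps TENT in a binary heap (Python's `heapq` module), a priority-queue structure that the text does not mention but that is a common alternative to a fully sorted list; it runs in $O((n + m) \log n)$ for n nodes and m links. The graph and function names are hypothetical.

```python
import heapq
import math

# Hypothetical undirected topology: SAMPLE_GRAPH[u][v] is the link cost.
SAMPLE_GRAPH = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2, "D": 5},
                "C": {"A": 4, "B": 2, "D": 1}, "D": {"B": 5, "C": 1}}


def dijkstra_linear(graph, src):
    """TENT kept unsorted: selecting its minimum is a linear scan."""
    dist, tent, path = {src: 0}, {src: 0}, set()
    while tent:
        x = min(tent, key=tent.get)         # O(n) scan of TENT
        path.add(x)
        del tent[x]
        for z, c in graph[x].items():
            nd = dist[x] + c
            if z not in path and nd < dist.get(z, math.inf):
                dist[z] = tent[z] = nd
    return dist


def dijkstra_heap(graph, src):
    """TENT kept as a binary heap: selecting its minimum is O(log n)."""
    dist, heap, path = {src: 0}, [(0, src)], set()
    while heap:
        d, x = heapq.heappop(heap)
        if x in path:
            continue                        # outdated heap entry
        path.add(x)
        for z, c in graph[x].items():
            if z not in path and d + c < dist.get(z, math.inf):
                dist[z] = d + c
                heapq.heappush(heap, (dist[z], z))
    return dist


print(dijkstra_linear(SAMPLE_GRAPH, "A"), dijkstra_heap(SAMPLE_GRAPH, "A"))
```

Both functions return the same distances; only the cost of selecting the next node for PATH differs.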

4.13.2 NP Complete Problem

An NP-complete problem is a problem in NP (the class of problems whose candidate solutions can be verified in polynomial time) to which every other problem in NP can be reduced in polynomial time; no algorithm with polynomial complexity has been found (yet) for any NP-complete problem. For the last several decades, mathematicians have been trying to determine whether P (the class of problems that can be solved by a polynomial algorithm) is different from NP. An interesting property of NP-complete problems is that if a polynomial algorithm exists for any NP-complete problem, then polynomial algorithms exist for all NP-complete problems; hence the considerable amount of energy spent by mathematicians trying to find polynomial algorithms for several very well-known NP-complete problems.

Problem Reduction

"Easy" Problems (Problems that Can Be Solved in Polynomial Time)

A useful way to evaluate the complexity of a problem is problem reduction. If a problem P1 can be transformed into another problem P2 by a polynomial algorithm (we say that there exists a polynomial reduction of P1 to P2), and there is a polynomial algorithm to solve P2, then P1 is "easy" (there exists a polynomial algorithm solving P1).

NP Complete Problems

When one cannot find an efficient (polynomial) algorithm for a specific optimization problem, a very common practice is to prove that the problem is NP-complete. An efficient way of proceeding is to find a polynomial function that transforms a problem already known to be NP-complete into the problem at hand, which shows that the latter is at least as hard (it must also be shown that the problem belongs to NP).

In practice, NP-complete problems are usually tackled with heuristics that drastically reduce the search space and provide a solution close to the optimum in a reasonable amount of time. The art of finding good heuristics is indeed critical in computer science.

For instance, the Dijkstra shortest path algorithm has polynomial complexity; by contrast, the very well known Traveling Salesman problem (see [TRAVEL-SALESMAN]) is known to be NP-complete. This problem consists of finding the shortest closed path that a traveling salesman should follow to visit each city of a set exactly once, given the distances between cities. More on algorithms can be found in the following references: [ALGO-1], [ALGO-2], [ALGO-3], and [ALGO-4].
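To make the contrast with Dijkstra concrete, here is a brute-force Python sketch of the Traveling Salesman problem. It enumerates all $(n-1)!$ tours that start from a fixed city. This is the naive exact method, not a technique taken from [TRAVEL-SALESMAN], and the 4-city instance is hypothetical.

```python
import itertools
import math


def tsp_brute_force(dist):
    """Exact TSP by exhaustive search over every tour starting at city 0.

    dist is a symmetric matrix of inter-city distances. The number of tours
    examined is (n - 1)!, which is what makes this approach intractable.
    """
    n = len(dist)
    best_length, best_tour = math.inf, None
    for order in itertools.permutations(range(1, n)):
        tour = (0, *order, 0)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Hypothetical 4-city instance.
CITIES = [[0, 1, 4, 2],
          [1, 0, 2, 5],
          [4, 2, 0, 1],
          [2, 5, 1, 0]]
print(tsp_brute_force(CITIES))
print([math.factorial(n - 1) for n in (5, 10, 15, 20)])   # tours to examine
```

The second line shows how quickly the number of candidate tours explodes: already above $10^{17}$ for 20 cities.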



4.14 Incremental Dijkstra

In Section 4.6, we saw the "regular" Dijkstra algorithm that computes loop-free shortest paths in a network. In this section, a very useful optimization called incremental SPF (iSPF) is presented, which restricts the SPT computation to a subtree (instead of the complete tree); in some cases detailed in this section, the SPT computation can even be completely avoided.

4.14.1 Motivation

As described earlier in this chapter, in the normal IGP link state mode of operation, the receipt of a new LSA (an LSA with new content, not a refreshed LSA) triggers a full SPF, potentially after some delay, regardless of the network state change reported in the LSA.

To highlight the motivations for iSPF, let us analyze various situations

(Figure 4.31).

Consider the network depicted in Figure 4.31 and suppose that the link H-I fails. Upon this failure, two LSAs are originated by nodes H and I to reflect the network topology change, which triggers an SPF computation on node A (among others, but in this example we focus on node A). A simple observation shows that link H-I is not used in the SPT computed by A, and the same observation holds for several other links in the network. This means that a new SPF is needlessly computed in this case. One obvious optimization is therefore to avoid recomputing the SPT when a new LSA advertises a state change for a link that is not used in the SPT.

[Figure 4.31: two copies of the same topology with nodes A through Q, A being the computing node; all links have a cost of 1; thick lines represent the computed SPT.]

Figure 4.31 Motivation for incremental SPF.

Now let us analyze another situation, depicted in Figure 4.32.

In this second example, consider the failure of link I-L, as depicted in Figure 4.32. Unlike in the previous failure scenario, this link is used in the SPT; however, with a regular SPF the entire SPT is recomputed, although only a subtree needs to be recomputed in this case. Now suppose that the link N-R comes up. Similarly, instead of recomputing the whole SPT, only the subtree rooted at node N needs to be computed.

A final example of a useful optimization: when a new IP address is added locally on a router, this triggers the origination of a new LSA (a type 1 LSA in OSPF, a new LSP in IS-IS); hence, a new SPT is computed on every router, which is clearly not necessary in such a case because the topology itself has not changed.

In addition to the processing time required to run an SPF, systematically recomputing an entire SPT upon receiving a new LSA has another downside. It is not uncommon to have multiple equal-cost paths between two nodes, especially in highly symmetrical networks. When an entire SPT is recomputed, another path may be selected to reach a particular node even though the path used so far is still one of the equal-cost shortest paths. The drawback is that this may lead to unnecessary routing changes.

[Figure 4.32: the topology of Figure 4.31 with an additional node R, shown before and after the failure of link I-L ("New SPT After the Failure of the Link I-L"); the impacted subtrees are highlighted.]

Figure 4.32 Motivation for incremental SPF.



This highlights the key idea of iSPF: limit the SPT computation to the portion of the tree affected by the network topology change instead of systematically recomputing the entire SPT.

4.14.2 History

The original iSPF algorithm was designed by Eric Rosen in the ARPANET days, in the late 1970s (see [ARPA-2]). Since then, the algorithm has been slightly modified and optimized to handle additional network scenarios, in particular multiple equal-cost paths. The algorithm has been implemented in commercial products; for instance, Cisco Systems supports iSPF both for IS-IS and for OSPF.

4.14.3 Algorithm Description

This section describes in detail the original iSPF algorithm, proposed by E. Rosen in 1978. Because different events call for different sets of actions when performing iSPF, this section starts by listing the various situations and the required actions, followed by the iSPF algorithm itself.

We assume that node S (the source) has already computed an SPT.

Situation 1: Link Cost Increase

An LSA is received that reports a link cost increase for some link Lij (the link between node i and node j) in the network. This is also usually referred to as "bad news." Note that this covers both a link cost increase caused by a configuration change performed by the network administrator and a link failure, in which case the cost of the corresponding link is considered infinite.

As already mentioned, if link Lij does not belong to the SPT, no action is required. Indeed, after its cost increase, no shortest path to any node in the network can use this link: the link was already unused, and the cost increase only makes any path through it longer. On the other hand, if the link is in the SPT, then not only does the distance to node j increase, but so does the distance to every other node reached via node j; thus, all nodes belonging to the subtree rooted at node j are candidates for routing changes (there might be a better path via some other nodes from the source). Note that those nodes are just "candidates" for routing changes: this does not mean that the shortest path to those nodes will systematically change, but their metric will generally change. Any other node that does not belong to this subtree is not affected by the network change.

This is illustrated in the following examples:

Example 1: Bad news and the link does not belong to the SPT (all the links have a cost of 1) (Figure 4.33). In Figure 4.33, for instance, if the cost of link F-G is increased from 1 to 2 (or from 1 to infinity in the case of a link failure), the SPT does not change.

Example 2: Bad news and the link belongs to the SPT (Figure 4.34).

Let us now consider Figure 4.34 (note that a few link costs have been changed for the sake of this example). If link I-L fails (or if its cost is increased to, say, 20), then not only does the path to node L change, but so does the path to every other node that belongs to the subtree rooted at node L (i.e., nodes O and P). Conversely, any node that does not belong to the subtree rooted at node L is not affected.

This implies the following changes to the original SPF algorithm:

• Identify the nodes that belong to the subtree rooted at node j, and update their distance from source S.

• Try to find a shorter path to each subtree node K by routing K via those of its neighbors that are not in the subtree. If such a shorter path can be found, add K to the TENT list.

Situation 2: A Link Cost Decrease (Good News)

An LSA is received that reports a link cost decrease for some link Lij in the network. This is also usually referred to as "good news." Note that this covers both a link cost decrease caused by a configuration change performed by the network administrator and a link recovery.

[Figure 4.33: the topology with nodes A through Q, A being the computing node; thick lines represent the computed SPF; annotation: "There Is Absolutely No Impact on the SPT".]

Figure 4.33 Example 1 (bad news and the link does not belong to the SPT).



If link Lij belongs to the SPT, the paths to node j and to all other nodes in the subtree rooted at j do not change: the shortest paths from S to those nodes already went through j, and their cost via j simply decreases. Furthermore, any node whose distance from S is not greater than the new distance from S to j keeps an identical path.

Example 3: Good news and the link belongs to the SPT (Figure 4.35).

Figure 4.35 shows an interesting example that illustrates the possible impact on the tree of a cost decrease for a link that belongs to the SPT. Again, some link costs have been changed in this example compared to the previous ones. Let us consider, for instance, that the cost of link C-G decreases from 3 to 1. Because link C-G already belongs to the SPT, the paths to node G and to all the nodes in the subtree rooted at G (i.e., nodes J and Q) do not change. The new shortest distances from the source node A to nodes G, J, and Q are now 2, 3, and 4, respectively. All the nodes at a distance of 2 or less from node A (i.e., nodes B, D, C, E, and H) are not candidates for routing changes.

On the other hand, any node that does not belong to the subtree rooted at j and whose distance to S is greater than the new shortest distance between S and j is a candidate for a routing change, because a path shorter than its current path might now exist. Back to our example, nodes F, I, K, L, M, N, O, and P are candidates for routing changes because their distance to S is strictly greater than 2 (the new distance from A to G); indeed, a new shortest path might be found by means of G. Of course, this does not mean that those nodes will necessarily get a better path. Figure 4.35 depicts the new resulting SPT.

[Figure 4.34: the topology, with a few link costs changed for the example, before and after the failure of link I-L ("New SPT after the Failure of the Link I-L"); the impacted subtrees are highlighted.]

Figure 4.34 Example 2 (bad news and the link belongs to the SPT).

This implies the following changes to the original SPF algorithm:

• Identify the nodes that belong to the subtree rooted at node j and update their distance from the source S.

• Then, try to find a shorter path for each node that is not in the subtree rooted at j but is an immediate neighbor of a node in the subtree. If such a path is found, add the node to the TENT list. In other words, this step checks whether a node that is not in the subtree can now get a shorter path via the subtree rooted at j.

If link Lij does not belong to the SPT and its cost decreases, the algorithm must check whether a shorter path to j now exists via link Lij by calculating

$$\mathrm{Delta} = d(i) + c'(L_{ij}) - d(j)$$

where d(i) is the distance from the source S to node i, d(j) is the current distance from S to node j, and c'(Lij) is the new cost of link Lij.

If Delta > 0, this good news has no effect on the existing SPT.

On the other hand, if Delta < 0, the shortest path to j now goes through link Lij, so the first operation reattaches node j to node i (its new predecessor⁵⁸). After this first operation, the situation is identical to the previous case, in which link Lij was in the SPT and its cost decreased. An example is given in Figure 4.36.

58 We sometimes use the term parent to name the predecessor in the SPT.

[Figure 4.35: initial state and new SPT for Example 3, in which the cost of link C-G decreases from 3 to 1; link costs are shown on the links, and the impacted subtrees are highlighted.]

Figure 4.35 Example 3 (good news and the link belongs to the SPT).

Situation 3: Node failure. If node j fails, all the nodes in the subtree rooted at j must be reattached to the SPT. The rest of the required operations are similar to those of Situation 1, except that node j is now excluded from the SPT.

Situation 4: Node recovery. If node j recovers after a failure, the first operation to perform is to compute the shortest path to node j, which can easily be done by taking the minimum of d(i) + c(Lij) over each neighbor i of node j. Then the candidate nodes for routing changes are all those whose distance from the source is greater than the distance from S to node j. At this stage, one can run the SPF algorithm starting with node j in the TENT list.

Final Incremental SPF Algorithm

The aim of the previous paragraphs was to highlight the required changes in each case; it is now time to provide the complete algorithm, which consolidates all the previously described changes.

Several variables are used during iSPF:

Delta: the change in the distance from the source to node j (and therefore to every node of the subtree rooted at j) caused by the event

Lij: link between nodes i and j

c(Lij): cost of link Lij

c'(Lij): new cost of link Lij (after a change of the link cost)

d(i): shortest distance from the source to node i

S: the set of nodes of the impacted subtree (not to be confused with the source node S used in the previous paragraphs)

[Figure 4.36: initial state and new SPT for Example 4, in which the cost of link D-G decreases from 5 to 1; link costs are shown on the links, and the impacted subtrees are highlighted.]

Figure 4.36 Example 4 (good news, the link does not belong to the SPT and Delta < 0).

Step 1 If there is no existing tree, go to step 7.

Step 2 If the change is related to a node status change (node recovery or failure; see Situations 3 and 4 above), then Delta = infinite.

If the status change is a node recovery, go to step 3.

If the status change is a node failure, go to step 4.

Step 3 If the change is related to link Lij, then:

If link Lij belongs to the SPT, Delta = c'(Lij) - c(Lij).

If link Lij does not belong to the SPT, Delta = d(i) + c'(Lij) - d(j). If Delta > 0, stop; if Delta < 0, node i becomes the new predecessor of node j.

/* Comment: when link Lij does not belong to the SPT and Delta > 0, the change is either bad news or good news that is not large enough; in both cases there is no better path to node j than the existing one, so there is no impact on the SPT. */

Step 4 Put node j and all of its descendants⁵⁹ in S.

Step 5 For each node k in S, d(k) = d(k) + Delta.

/* Comment: in other words, node j and each of its descendants have their distance updated by the distance change resulting from the cost change of link Lij. */

Step 6 For each node k in S:

If Delta > 0 (bad news)⁶⁰, try to find a shorter path to node k via each of its neighbors that does not belong to S (that is, try to reattach the node to a different subtree that would provide a better path). If such a better path is found, put node k in the TENT list.

If Delta < 0, try to find, by means of node k, a shorter path to each neighbor k' of k that does not belong to S. If such a better path is found, put k' in TENT.

/* Comment: if Delta > 0 (bad news, link Lij in the SPT), the algorithm tries to reattach each node of S to another subtree offering a shorter path. If Delta < 0 (good news), the algorithm checks whether some other nodes, not in the impacted subtree, could be reattached to that subtree to follow a better path. */

Step 7 If TENT is empty, stop. Otherwise, move to PATH the node x such that d(x) = min{d(y) : y in TENT}.

Step 8 For each neighbor z of node x:

If node z is already in PATH:
  If d(z) <= d(x) + c(Lxz), do nothing.
  If d(z) > d(x) + c(Lxz), remove z from PATH, put it in TENT, and update d(z) = d(x) + c(Lxz).

If node z is already in TENT:
  If d(z) <= d(x) + c(Lxz), do nothing.
  If d(z) > d(x) + c(Lxz), update d(z) = d(x) + c(Lxz).

If node z is neither in PATH nor in TENT, put z in TENT and update d(z) = d(x) + c(Lxz).

Step 9 Go to step 7.

59 A descendant of a node j is any node that belongs to the subtree rooted at node j.

60 Note that in this case, link Lij is in the SPT and the change is bad news.

Since then, various optimizations of the original algorithm have been proposed, in particular to reduce its running time and to handle specific cases such as equal-cost paths. An excellent reference on this topic is [ROUTING-THESIS].
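The link-cost part of the final algorithm (Steps 3 to 9) can be sketched in Python. This is a minimal sketch under assumptions that are not part of the original description: links are undirected with positive costs, the graph is a dictionary of dictionaries, predecessors are kept in a parent map, and node events (Step 2, Situations 3 and 4) and equal-cost paths are not handled. Steps 7 to 9 use a binary heap in which a node whose distance improves is simply pushed again; this plays the role of moving it back from PATH to TENT in Step 8. The sample topology and all function names are hypothetical; the check compares the incremental result with a full Dijkstra run on the modified graph.

```python
import heapq
import math

# Hypothetical sample topology: undirected links (node, node, cost).
EDGES = [("A", "B", 1), ("A", "C", 1), ("B", "D", 1), ("C", "D", 3),
         ("C", "E", 1), ("D", "F", 1), ("E", "F", 4), ("B", "E", 5)]


def build_graph(edges):
    graph = {}
    for u, v, c in edges:
        graph.setdefault(u, {})[v] = c
        graph.setdefault(v, {})[u] = c
    return graph


def relax_from(graph, dist, parent, heap):
    """Steps 7-9: Dijkstra loop; improved nodes are (re)pushed onto TENT."""
    heapq.heapify(heap)
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue                                # outdated TENT entry
        for z, c in graph[x].items():
            if d + c < dist[z]:
                dist[z], parent[z] = d + c, x
                heapq.heappush(heap, (dist[z], z))


def dijkstra(graph, src):
    """Full SPF, used for the initial SPT and as a reference."""
    dist = {v: math.inf for v in graph}
    parent = {v: None for v in graph}
    dist[src] = 0
    relax_from(graph, dist, parent, [(0, src)])
    return dist, parent


def subtree(parent, root):
    """Step 4: root and all of its descendants in the SPT (the set S)."""
    children = {}
    for v, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(v)
    nodes, stack = set(), [root]
    while stack:
        v = stack.pop()
        nodes.add(v)
        stack.extend(children.get(v, []))
    return nodes


def ispf_link_change(graph, dist, parent, i, j, new_cost):
    """Incrementally update dist/parent after link i-j changes to new_cost."""
    old_cost = graph[i][j]
    graph[i][j] = graph[j][i] = new_cost
    if parent[i] == j:                              # orient link parent -> child
        i, j = j, i
    if parent[j] == i:                              # Step 3: link in the SPT
        delta = new_cost - old_cost
    else:                                           # Step 3: link not in the SPT
        if dist[j] + new_cost - dist[i] < dist[i] + new_cost - dist[j]:
            i, j = j, i                             # only one end can benefit
        delta = dist[i] + new_cost - dist[j]
        if delta > 0:
            return                                  # no better path: no change
        if delta < 0:
            parent[j] = i                           # reattach j to node i
    s = subtree(parent, j)                          # Step 4
    for k in s:                                     # Step 5
        dist[k] += delta
    tent = {}                                       # Step 6
    for k in s:
        for nbr, c in graph[k].items():
            if nbr in s:
                continue
            if delta > 0 and dist[nbr] + c < dist[k]:       # bad news
                dist[k], parent[k] = dist[nbr] + c, nbr
                tent[k] = dist[k]
            elif delta < 0 and dist[k] + c < dist[nbr]:     # good news
                dist[nbr], parent[nbr] = dist[k] + c, k
                tent[nbr] = dist[nbr]
    relax_from(graph, dist, parent, [(d, v) for v, d in tent.items()])
```

Only the nodes of the impacted subtree and their immediate neighbors are examined before the heap loop starts; when Step 6 leaves TENT empty, the update is complete without any further SPF iteration.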

4.14.4 iSPF Efficiency

It can be observed that the gain varies with the location of the failed (or newly announced) link with respect to the computing node. Indeed, when a link far downstream from a node fails, the gain of running iSPF instead of a full SPF is substantial, because the impacted subtree is small compared to the entire SPT. When the link is not used in the SPT (which is common, since an SPT spanning n nodes uses only n - 1 of the network links), the gain is maximal because no new computation is triggered.

On the other hand, the gain diminishes as the failure gets closer to the computing node, so in some cases running iSPF is not worth the slight increase in computational complexity. There might even be some very particular cases in which the computation time is slightly greater than a full SPF running time, if the failure is very close to the computing node.

Nevertheless, for most network failures, the gain of iSPF more than offsets the extra computation work. This is particularly true for large networks with hundreds of nodes. Note also that the gain is not limited to the SPT computation but also applies to the RIB computation, which is, as already pointed out, a nonnegligible component of the overall network convergence. Just to give a rough idea, some extensive tests run on large network topologies showed that the potential gain can be as large as 90%, with a very significant average gain (several tens of percent), whereas the worst-case overhead, when the failure is close to the computing node, never exceeds a few percentage points in very large topologies.

4.15 Interaction Between Fast IGP Convergence and NSF

Both the IGP convergence aspects and NSF have been studied in this chapter devoted to IP. At first sight, IGP timers tuned to achieve fast convergence and NSF may look contradictory, although they share the same goal of minimizing packet loss upon a network element failure. Indeed, the IGP tries to find an alternate path around the failed network element, whereas the NSF procedure keeps forwarding traffic to the failed node, assuming that the node has just experienced a control plane failure that does not affect the forwarding of packets.

It must first be underscored that the failure scopes are different. The IGP handles a broader range of network failures: link failures and node failures; moreover, it covers multiple types of node failures, such as control plane failures and power supply failures. By contrast, NSF only handles the case of a control plane failure, which can be recovered using a second route processor.

Furthermore, the preference for one mode over the other may be driven by several factors:

• The router's location in the network: For instance, in the case of an edge router connecting a customer premises equipment (CPE) device to the service provider (SP) network, it is not rare to have a single link between the CPE and the SP's edge router. In the case of a route processor failure on that edge router, the only usable mechanism to avoid affecting the traffic sent by and to the CPE is NSF.

• There might be some other situations in which NSF is useful in the core: for instance, when a core router is fully redundant and the alternate paths around it cannot carry the extra traffic routed through that router at steady state. In such a case, it might also be useful to keep forwarding the traffic across the restarting router.

That being said, it is expected that in most cases the IGP timers will be tuned appropriately to meet the convergence objectives in the core, whereas NSF will preferably be configured at the edge of the network.

The following questions arise regarding IGP and NSF timer tuning when both fast IGP convergence and NSF are configured on a router:

• What happens if the IGP timers (hello and hold-down timers) are set to very small values in the network to achieve fast convergence and an unplanned route processor failure occurs on an NSF-capable neighbor router?

• Does the restarting node have enough time to complete its restart procedure before its neighbors declare it down and trigger an IGP convergence?

Before trying to answer these questions, it is worth analyzing the consequences of such an event. If a neighbor incorrectly declares a restarting node down, the traffic is rerouted around the restarting node. Such an event, sometimes referred to as a false-positive condition, leads to unexpected behavior, but in most cases it has no dramatic consequences other than triggering unnecessary traffic reroutes in the network (except for the IP prefixes that are not reachable by other means, e.g., a local area network attached to the restarting router). Note also that once the restarting node has reestablished its adjacencies, the nodes in the network reconverge and start using the restarted node again in their path computation. False-positive events can also occur in several other situations, which are discussed in Chapters 5 and 6.



A false-positive event is triggered on a node A if its restarting NSF-capable neighbor B cannot send the grace LSA before A's RouterDeadInterval⁶¹ for B expires. Indeed, suppose that the RouterDeadInterval is set to x seconds on node A for its neighbor B and that B experiences an unplanned control plane failure. Then, to avoid a false-positive event on A, A must receive B's grace LSA less than x seconds after the last hello it received from B. Whether this objective is achievable depends both on the platform's ability to quickly detect the route processor failure and originate the grace LSA, and on the IGP parameter settings (in particular the RouterDeadInterval).
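The timing condition above can be expressed as a simple check. This sketch rests on a simplified model that is an assumption, not a protocol specification: the time elapsed on A is the time since the last hello received from B, plus B's failure detection and switchover time, plus the time needed to originate the grace LSA and deliver it to A. All parameter names and values are hypothetical.

```python
def is_false_positive(dead_interval_s, since_last_hello_s,
                      switchover_s, grace_lsa_delay_s):
    """True if A's dead interval for B expires before B's grace LSA arrives.

    Simplified model (assumption): A's timer started when it received
    B's last hello; the grace LSA reaches A after B detects the failure,
    switches over to the standby route processor, and originates the LSA.
    """
    elapsed = since_last_hello_s + switchover_s + grace_lsa_delay_s
    return elapsed >= dead_interval_s


# Hypothetical values, in seconds.
print(is_false_positive(40, 10, 5, 0.1))   # conservative timers
print(is_false_positive(3, 1, 5, 0.1))     # aggressive fast-convergence timers
```

With aggressive timers, the dead interval can expire while B is still switching over, which is exactly the false-positive situation described above.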

4.16 Research-Related Topics

IP routing protocols continue to evolve to meet new requirements. The following are a few ongoing research topics on IP routing:

• SRLG-aware SPF: This would allow some implementations to take into account the knowledge of SRLGs (explored in detail in Chapter 5) during the SPF computation, to minimize the routing convergence time upon an SRLG failure.

• Temporary loop reduction: Several enhancements are being designed to reduce the effect of temporary loops and, in some cases, to eliminate them entirely (e.g., in the case of a link restoration or the failure of a link protected by other recovery mechanisms).

• Local protection for IP: Some mechanisms could provide local protection for IP traffic upon a network failure, so that recovery no longer waits for the LSA to propagate to a rerouting router able to find an alternate path to the destination, nor for the RIB computation.

61 As a reminder, the OSPF RouterDeadInterval is the timer that defines the maximum amount of time a router A can wait without receiving any OSPF hello message from a neighbor B before declaring the A-B adjacency down. The corresponding IS-IS timer is called the hold-time.




CHAPTER 5

MPLS Traffic Engineering Recovery Mechanisms

Multi-Protocol Label Switching (MPLS) traffic engineering (TE) has enjoyed undeniable success in recent years, which has led to the development of a rich set of MPLS TE recovery techniques.

This chapter starts with a refresher on MPLS TE technology, followed by the motivations for deploying it in a data network. The recovery techniques are then examined in detail: their mode of operation, their respective pros and cons, the types of network design to which they best apply, and the design aspects that operators find important for deployment in their networks.

Furthermore, various properties of each recovery technique are analyzed. These properties are of the utmost importance when choosing a recovery technique for a network: the recovery time, the impact on scalability, the ability to provide quality-of-service (QoS) guarantees along the alternate path, and the efficiency of the technique with respect to the amount of bandwidth dedicated to recovery paths. These are just a subset of the aspects covered for each recovery technique.

This chapter covers the default restoration mode of operation of MPLS TE, as well as the global and local protection recovery schemes. A rich set of examples throughout this chapter illustrates the mode of operation of these recovery techniques and how they can be deployed in a network. An entire section is devoted to a complete set of case studies that show how an operator can use these MPLS recovery techniques to meet a set of recovery objectives while respecting network constraints. Most of these case studies are inspired by existing or foreseen deployment scenarios. After a summary section, this


first part of this chapter concludes with the standardization aspects of the MPLS TE recovery techniques. The second part of this chapter is then devoted to some advanced topics of MPLS recovery. Those two sections cover in detail the signaling aspects of MPLS local protection (Section 5.14) and the interesting topic of backup path computation (Section 5.15); they may be skipped without impairing the reader's understanding of the MPLS recovery techniques. Finally, this chapter concludes with a section that describes various related research topics.

5.1 MPLS Traffic Engineering Refresher

In this section, we first provide a brief refresher on the notion of traffic engineering.

Then the terminology specific to MPLS TE is shown through an example, and after

having reviewed the main components of MPLS TE, we detail the motivation for

deploying MPLS TE in a network.

5.1.1 Traffic Engineering in Data Networks

One of the major challenges of network design has always been traffic engineering; that is, how to route traffic so that network resources are used efficiently. The term “efficiently” requires some explanation, though. An obvious objective of network design is to avoid congestion. If the whole network is congested, traffic engineering cannot really help and the network has to be upgraded (i.e., bandwidth and/or switching/routing capacity must be added). On the other hand, if some regions of the network are congested while others have spare capacity, then alleviating the congestion spots by rerouting some flows along alternate paths (where capacity is available) certainly helps.

In other words, TE defines how flows should be routed to use network resources efficiently. Even in the absence of congestion, a better traffic load balance may help improve QoS. For instance, suppose that some links are used at 60% of their capacity on average, which strictly speaking does not make them congested, whereas other links are loaded at 10%. Delay-sensitive traffic traversing a link loaded at 60% may experience undesirable delay and jitter, especially in the absence of queuing mechanisms. Thus, achieving a better traffic load balance, with the objective of minimizing the average link utilization, might be another motivation for TE.

Traffic engineering is not specific to MPLS per se. TE methods have been used in various network types, such as public voice networks, ATM, Frame Relay, and Internet Protocol (IP) networks.

The Classic Fish Problem

Let us consider the following classic fish problem, which shows how an IP network can end up with congestion in some parts while other regions have spare capacity (Figure 5.1).


Figure 5.1 depicts two IP routers, R1 and R2, sending traffic to router R8 (and beyond). Both R1 and R2 compute the shortest path to R8 using a routing protocol such as Open Shortest Path First (OSPF) or Intermediate System to Intermediate System (IS-IS). Because all the links have an equal metric of 1, the flows from R1 and R2 to R8 both follow the same path (“north”). If the sum of their traffic exceeds the bandwidth capacity of the north path (R3-R4-R5), congestion results, although some capacity is still available along the south path. Changing the link metrics does not help in this case, because IP routing protocols base their routing decision on the IP destination address: whether a packet destined to R8 is received from R1 or from R2, R3 routes it along the same path. Another option in this very simple case is to set the link metrics so that the north and south paths have an equal cost and traffic is load-balanced across them, but real networks are more complicated, and if other nodes are connected to routers R4, R5, R6, and R7, load balancing becomes much more challenging to achieve.
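The effect of destination-based routing in the fish topology can be reproduced with a plain SPF computation. The sketch below is illustrative only: its link list is inferred from the north (R3-R4-R5-R8) and south (R3-R6-R7-R5-R8) paths described in the text, every link has a metric of 1, and the function names are hypothetical.

```python
import heapq

# Fish topology inferred from the text; all links have a metric of 1.
LINKS = [("R1", "R3"), ("R2", "R3"), ("R3", "R4"), ("R4", "R5"),
         ("R5", "R8"), ("R3", "R6"), ("R6", "R7"), ("R7", "R5")]


def build_graph(links):
    graph = {}
    for a, b in links:
        graph.setdefault(a, []).append((b, 1))
        graph.setdefault(b, []).append((a, 1))
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra SPF, as run by OSPF or IS-IS on each router."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in graph[node]:
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


graph = build_graph(LINKS)
for src in ("R1", "R2"):
    print(src, "->", "-".join(shortest_path(graph, src, "R8")))
# R1 -> R1-R3-R4-R5-R8
# R2 -> R2-R3-R4-R5-R8

# R3 forwards on the destination only: one next hop toward R8 for both flows.
print(shortest_path(graph, "R3", "R8")[1])  # R4
```

Both flows converge on the north path, and the south path stays idle no matter where the traffic enters R3.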

That said, TE with IP routing is of course possible and has already been

discussed in Chapter 4.

One solution to obtain better resource utilization is to use tunneling techniques

between source(s) and destination(s) so intermediate nodes do not participate in the

routing decision. ATM was extensively used to reach that goal; ATM permanent

virtual circuits (PVCs)/switched virtual circuits (SVCs) are established between

switches with characteristics based on the traffic requirements of each circuit

(e.g., bandwidth and QoS). ATM PVCs/SVCs are routed based on the network

resources and link costing using off-line or on-line path computation methods (e.g.,

Private Network–Network Interface [PNNI]). Then, once a packet (encapsulated in

ATM cells) is routed onto an ATM PVC, it strictly follows the ATM PVC path.

Figure 5.1 The classic “fish problem.” (Routers R1 and R2 send traffic to R8 through R3; all links have a metric of 1; path north: R3-R4-R5-R8; path south: R3-R6-R7-R5-R8; the routing decision is based on the IP destination address.)


Although this approach is relatively efficient at improving network bandwidth usage, it has several significant drawbacks:

. An additional layer (ATM) has to be managed and maintained in the network, which implies additional cost in terms of equipment and network operation.

. The number of routing adjacencies maintained by each router is potentially very high, because every router has a routing adjacency with every other router in the mesh, which introduces routing protocol scalability limitations. Indeed, a full mesh of n routers requires each of them to maintain n - 1 adjacencies (n(n - 1)/2 in total), and the cost of the route computation (shortest path first [SPF]) also increases significantly.
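The quadratic growth behind this scalability limitation is easy to quantify. This small sketch simply assumes a full mesh in which every router peers with every other router over a dedicated PVC/SVC; the function name is hypothetical.

```python
def full_mesh_adjacencies(n):
    """Per-router and total routing adjacencies in a full mesh of n routers."""
    per_router = n - 1
    total = n * (n - 1) // 2
    return per_router, total


for n in (10, 50, 100):
    print(n, full_mesh_adjacencies(n))
# 10 (9, 45)
# 50 (49, 1225)
# 100 (99, 4950)
```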

This is where MPLS TE comes into play. MPLS TE is also a “tunneling” mechanism, using TE Label Switched Paths (TE LSPs; the TE LSP terminology is detailed hereafter), which are established between pairs of routers.

Each TE LSP has its own set of constraints (such as bandwidth, affinities, and rerouting constraints, to mention a few), and the network topology and resources are taken into account along with this set of constraints to compute a TE LSP path that satisfies the requirements. Different path computation methods can be used to achieve that objective: distributed (each router is responsible for computing the paths of its own TE LSPs) or centralized (an off-line tool computes the paths of all the TE LSPs in the network). Once a TE LSP is established, IP packets are routed onto it and strictly follow the computed path; intermediate routers do not make any routing decision.

Figure 5.2 Optimizing network resources with MPLS traffic engineering. (Same topology as Figure 5.1, with all links having a metric of 1: one traffic engineering LSP is routed through the north path and the other through the south path.)

For instance, in Figure 5.2, suppose that the sum of the bandwidths required between R1 and R8 and between R2 and R8 exceeds the available bandwidth on the north path (R3-R4-R5). With MPLS TE, once the TE LSP between R1 and R8 is established, R2 determines that the bandwidth available on the north path is not sufficient to accommodate its traffic demand and selects the south path (R2-R3-R6-R7-R5-R8) to establish its TE LSP. This allows better network resource utilization and avoids traffic congestion.
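The behavior of R2 in this example can be sketched with a simple constrained SPF: links whose unreserved bandwidth is smaller than the request are pruned, and a shortest path is computed on what remains. This is a minimal sketch, not a vendor CSPF; the link capacities (Mbps) and the 60-Mbps requests are hypothetical values chosen so that the north links are the bottleneck, and bandwidth is tracked per direction because TE LSPs are unidirectional.

```python
import heapq


def cspf(avail, src, dst, bw):
    """Prune links with less than `bw` unreserved, then run Dijkstra
    (metric 1). Returns a node list, or None if no path fits."""
    graph = {}
    for (a, b), free in avail.items():
        if free >= bw:
            graph.setdefault(a, []).append(b)
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[node]:
            continue  # stale heap entry
        for nbr in graph.get(node, []):
            if d + 1 < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = d + 1, node
                heapq.heappush(heap, (d + 1, nbr))
    return None


def reserve(avail, path, bw):
    """Deduct the TE LSP bandwidth on each traversed link direction."""
    for a, b in zip(path, path[1:]):
        avail[(a, b)] -= bw


# Hypothetical capacities: the north links R3-R4 and R4-R5 are smaller.
CAPACITY = {("R1", "R3"): 155, ("R2", "R3"): 155, ("R3", "R4"): 100,
            ("R4", "R5"): 100, ("R5", "R8"): 155, ("R3", "R6"): 155,
            ("R6", "R7"): 155, ("R7", "R5"): 155}
avail = {}
for (a, b), cap in CAPACITY.items():
    avail[(a, b)] = avail[(b, a)] = cap

t1 = cspf(avail, "R1", "R8", 60)
reserve(avail, t1, 60)
t2 = cspf(avail, "R2", "R8", 60)
print("-".join(t1))  # R1-R3-R4-R5-R8
print("-".join(t2))  # R2-R3-R6-R7-R5-R8
```

Once T1 has reserved 60 Mbps on the north links, only 40 Mbps remain on R3-R4 and R4-R5, so these links are pruned from R2's computation and its TE LSP takes the south path.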

Note that, compared to the previous case with an ATM overlay network, just one layer (IP/MPLS) is required. Moreover, routers are not required to maintain routing adjacencies over TE LSPs. It is important to note that MPLS TE reserves resources in the control plane only; it is fundamentally a Call Admission Control (CAC) mechanism. In other words, when a TE LSP is set up, no particular resources are reserved in the data plane. The purpose of MPLS TE is to ensure that a TE LSP is not routed along a path where other TE LSPs have already reserved the bandwidth. For instance, on an OC3 link (about 155 Mbps), if three TE LSPs have already reserved a total bandwidth of 120 Mbps, the remaining available bandwidth (not already reserved in the control plane) is 35 Mbps, and a TE LSP requiring more than 35 Mbps has to be routed along another path. This is in contrast to IP, in which packets are routed along the shortest path without considering the traffic flows and the available resources along this path.
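The OC3 example boils down to a per-link admission check in the control plane. In the sketch below, the split of the 120 Mbps among the three TE LSPs is hypothetical, and 155 Mbps is the approximate OC3 rate used in the text. Note that the check looks only at reservations, never at the actual traffic load.

```python
OC3_MBPS = 155  # approximate OC3 rate, as in the text's example


def admit(capacity_mbps, reserved_mbps, request_mbps):
    """Control-plane CAC on one link: compare the request with the
    bandwidth not yet reserved by other TE LSPs."""
    available = capacity_mbps - sum(reserved_mbps)
    return request_mbps <= available, available


reservations = [50, 40, 30]  # hypothetical split of the 120 Mbps
print(admit(OC3_MBPS, reservations, 30))  # (True, 35)
print(admit(OC3_MBPS, reservations, 40))  # (False, 35)
```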

5.1.2 Terminology

Because several terms specific to MPLS TE recovery techniques are used throughout this chapter, we illustrate each of them with an example (Figure 5.3).

As depicted in Figure 5.3, three TE LSPs, called T1, T2, and T3, are signaled. For instance, the TE LSP T1 starts on R1 and terminates on R8. We say that R1 is the head-end label switched router (LSR) of T1 and R8 is its tail-end LSR. Any other LSR traversed by T1 is a midpoint LSR (e.g., R3, R4, and R5 are all midpoint LSRs). Note that an LSR can play the role of head-end LSR for one TE LSP while being a midpoint or tail-end LSR for other TE LSPs.

Notion of Disjoint Paths

Two TE LSPs are said to be link disjoint if they do not have any link in common (e.g., T1 and T2 in Figure 5.3 are link disjoint); the term link diverse is also used. Two TE LSPs are said to be node disjoint if they do not share any LSR (e.g., T1 and T3 are node disjoint), except potentially their head-end and tail-end LSRs; the term node diverse is also used.

The recovery-specific terminology aspects are covered in their respective

sections. For instance, several terms are specific to the local protection techniques,

and these are covered in the section devoted to local protection techniques.

Shared Risk Link Group

The notion of shared risk link group (SRLG) is crucial when studying network

resiliency and specifically refers to the notion of simultaneous failures of multiple



network elements that can be caused by the failure of a single element. Let us

consider the network scenario in Figure 5.4.

Figure 5.4A shows a set of six optical cross-connects, OXC1 through OXC6, interconnected by fibers; together they constitute an optical layer used to interconnect the LSRs R1 through R5. More precisely, the various links are routed in the optical layer as follows:

. Link R1-R2 follows the optical path OXC1-OXC2.

. Link R1-R4 follows the optical path OXC1-OXC4-OXC5.

. Link R5-R4 follows the optical path OXC6-OXC4-OXC5.

In this scenario, the two optical paths followed by the links R1-R4 and R5-R4 share a common resource: the optical fiber interconnecting OXC4 and OXC5. We say that the two links share an SRLG, because the failure of a single resource (the optical fiber OXC4-OXC5) would cause the simultaneous failure of both links.

By default, the IP/MPLS layer has no visibility of the optical layout, which may lead to an incorrect path selection for a TE LSP. To remedy this problem, an Interior Gateway Protocol (IGP) extension has been defined. As described in Section 5.1, the TE-related information is flooded within an OSPF area using an opaque LSA type 10 (for IS-IS, the TE-related information is flooded in a specific type-length-value [TLV]). This opaque LSA carries one top-level TLV, which can be of one of two types: router address (type 1) or link (type 2). The link TLV is made of several sub-TLVs, one of which is the SRLG sub-TLV (type 16); it has a variable length, with 4 bytes per SRLG value.
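As an illustration of this encoding, the sketch below builds and parses an SRLG sub-TLV as described above: type 16, then a list of 32-bit SRLG values. It assumes the usual OSPF TE sub-TLV header (a 2-byte type followed by a 2-byte length counting the value bytes only, in network byte order); the SRLG numbers and function names are hypothetical.

```python
import struct

SRLG_SUBTLV_TYPE = 16  # SRLG sub-TLV of the OSPF TE link TLV


def encode_srlg_subtlv(srlgs):
    """2-byte type, 2-byte length, then 4 bytes per SRLG value."""
    value = b"".join(struct.pack("!I", s) for s in srlgs)
    return struct.pack("!HH", SRLG_SUBTLV_TYPE, len(value)) + value


def decode_srlg_subtlv(data):
    tlv_type, length = struct.unpack("!HH", data[:4])
    if tlv_type != SRLG_SUBTLV_TYPE or length % 4:
        raise ValueError("not a well-formed SRLG sub-TLV")
    return list(struct.unpack(f"!{length // 4}I", data[4:4 + length]))


# A link belonging to two hypothetical SRLGs (e.g., two fiber spans).
encoded = encode_srlg_subtlv([45, 1001])
print(encoded.hex())                # 001000080000002d000003e9
print(decode_srlg_subtlv(encoded))  # [45, 1001]
```

Because every SRLG value is 4 bytes long, the length field is always a multiple of 4, which lets a link advertise any number of SRLGs.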

Figure 5.3 Illustration of MPLS traffic engineering recovery. (Three TE LSPs, T1, T2, and T3, across LSRs R1 through R8, with the head-end, midpoint, and tail-end LSRs of a TE LSP (traffic engineering label switched path) labeled.)




Important notes:

. A link may belong to multiple SRLGs.

. The IGP extensions allow SRLG values to be carried. However, knowledge of the underlying optical/SONET-SDH topology is not always available. Indeed, an operator may rely on another carrier to provide optical lambdas, in which case the service provider (SP) does not always know the actual physical path and the potential SRLGs. Moreover, an optical path may be dynamic, so its route may change over time; whenever such a change also changes the SRLGs, the advertised SRLG values must be updated.

Notion of SRLG disjoint: A TE LSP is said to be SRLG disjoint from a link L or a node R if and only if its path does not include any link or node that belongs to an SRLG of L or R. For instance, in Figure 5.4, a TE LSP T1 following the path R1-R2-R3-R4 is SRLG disjoint from the link R1-R4. Two TE LSPs are said to be SRLG disjoint if the respective sets of links they traverse do not have any SRLG in common.
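The link-based part of this definition translates into a set intersection test. In this sketch, each fiber span is treated as one SRLG. The SRLGs of R1-R2, R1-R4, and R5-R4 follow the optical paths listed above; the optical paths of R2-R3 and R3-R4 are not given in the text, so the spans assigned to them are hypothetical. Node SRLGs are not modeled.

```python
# SRLG membership per link, modeled as the fiber spans each link rides on.
LINK_SRLGS = {
    frozenset({"R1", "R2"}): {"OXC1-OXC2"},
    frozenset({"R1", "R4"}): {"OXC1-OXC4", "OXC4-OXC5"},
    frozenset({"R5", "R4"}): {"OXC6-OXC4", "OXC4-OXC5"},
    frozenset({"R2", "R3"}): {"OXC2-OXC3"},  # hypothetical
    frozenset({"R3", "R4"}): {"OXC3-OXC5"},  # hypothetical
}


def path_srlgs(path):
    """Union of the SRLGs of every link traversed by a path of LSRs."""
    result = set()
    for a, b in zip(path, path[1:]):
        result |= LINK_SRLGS[frozenset({a, b})]
    return result


def srlg_disjoint(path_a, path_b):
    """True if the two paths share no SRLG (link SRLGs only)."""
    return not (path_srlgs(path_a) & path_srlgs(path_b))


t1 = ["R1", "R2", "R3", "R4"]
print(srlg_disjoint(t1, ["R1", "R4"]))            # True
print(srlg_disjoint(["R5", "R4"], ["R1", "R4"]))  # False: OXC4-OXC5 shared
```

A single-link path such as ["R1", "R4"] represents the link itself, so the same function covers both "disjoint from a link" and "disjoint from another TE LSP".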

5.1.3 MPLS Traffic Engineering Components

The aim of this section is to review the main components of MPLS TE:

1. Configuration of TE LSPs on the head-end LSR: The first step consists of configuring the TE LSP attributes on the head-end LSR. Various attributes can be configured, such as the destination (address of the tail-end LSR), the required bandwidth, the required protection/restoration, the affinities, and others.

Figure 5.4 Shared risk link group. (An optical layer of cross-connects OXC1 through OXC6, interconnected by optical fibers, carries the links between LSRs R1 through R5; the links R1-R4 and R5-R4 are marked as belonging to the same SRLG.)



2. Topology and resource information distribution: To compute a path obeying the specified set of constraints, the head-end LSR needs to gather topology and resource information. Note that this applies only when TE LSP paths are dynamically computed by each LSR (also referred to as distributed or on-line path computation), by contrast with centralized or off-line path computation, in which the LSP paths are computed by an off-line tool. In the distributed case, the topology and resource information is distributed by a link state routing protocol (OSPF or IS-IS) with TE extensions that reflect link characteristics and reservation states. TE TLVs have been defined and are carried within IS-IS link state PDUs (also called LSPs, not to be confused with TE LSPs) and within OSPF opaque LSAs of type 10 to flood the reservation states and other parameters.

3. TE LSP computation: As already stated, the path of a TE LSP can be computed either by an off-line tool or on-line. In the former case, an external tool simultaneously computes the paths of all the TE LSPs according to the network resources. In the latter case, every router (LSR) uses its resource and topology database (IS-IS or OSPF), takes into account the set of requirements of the TE LSP, and computes the shortest path satisfying the set of constraints, usually with a constrained shortest path first (CSPF) algorithm. Various types of CSPF can be used.

4. TE LSP setup: Once the path of a TE LSP has been computed, the head-end LSR signals the TE LSP by means of the Resource Reservation Protocol (RSVP) with the corresponding set of extensions defined in [RSVP-TE]. For instance, in Figure 5.3, R1 computes the path R1-R3-R4-R5-R8 for the TE LSP T1 based on T1's attributes and on the network topology and resource information disseminated by the routing protocol. Once T1's path is computed, T1 is signaled by RSVP-TE. TE LSPs are signaled, maintained (refreshed), and potentially torn down using various RSVP messages: Path, Resv, Path Error, Path Tear, Reservation Error, Resv Confirmation, and Resv Tear. Also, various new objects have been defined in [RSVP-TE] for the purpose of MPLS TE, for example, to allocate labels to TE LSPs that are then used in the MPLS data plane. Note that labels are assigned in the upstream direction using RSVP Resv messages, and intermediate LSRs are programmed accordingly. For instance, when the TE LSP T1 is signaled, R8 provides a label to R5, R5 provides a label to R4, and so on.

Note: RSVP has often been criticized for its scalability, in particular the number of states required in the network. In practice, currently deployed networks can handle thousands of RSVP-TE reservations (TE LSPs) on a single router without any problem. Moreover, various protocol enhancements have been defined (see [REFRESH-REDUCTION]) to further increase scalability if needed. Finally, MPLS TE can be deployed with multiple levels of hierarchy, if required, in very large networks.



5. Packet forwarding: Once a TE LSP is set up, the head-end LSR can update its routing table and start using the TE LSP to forward IP packets. A 32-bit MPLS shim header, carrying a 20-bit label, is pushed onto the IP packet, which is then label switched across the network (intermediate routers do not make any routing decision).
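Steps 4 and 5 can be tied together in a small simulation: labels are handed out upstream, starting at the tail-end (as with RSVP Resv messages), and a packet is then label switched hop by hop. This is a minimal sketch with hypothetical names and label values: a single counter starting at 16 (labels 0 to 15 are reserved) is used for readability, whereas real LSRs allocate labels from their own local label space, and penultimate hop popping is not modeled. The shim header layout (20-bit label, 3-bit EXP, 1-bit bottom-of-stack flag, 8-bit TTL) follows the standard MPLS label stack entry.

```python
import struct


def encode_shim(label, exp=0, bottom=True, ttl=64):
    """32-bit label stack entry: label(20) | EXP(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def signal_lsp(path, first_label=16):
    """Emulate upstream label assignment: from the tail-end back to the
    first midpoint, each LSR picks a local label and gives it upstream.
    Returns the label the head-end pushes and the per-LSR swap tables."""
    next_label = first_label   # hypothetical shared allocator
    lfib = {}                  # lsr -> {in_label: (out_label, next_hop)}
    downstream_label = None    # label received from the downstream LSR
    for i in range(len(path) - 1, 0, -1):
        lsr, local = path[i], next_label
        next_label += 1
        if downstream_label is None:
            lfib.setdefault(lsr, {})[local] = (None, None)  # tail-end: pop
        else:
            lfib.setdefault(lsr, {})[local] = (downstream_label, path[i + 1])
        downstream_label = local
    return downstream_label, lfib


def forward(path, push_label, lfib):
    """Label switch a packet from the head-end; returns (LSR, in_label)."""
    label, node, trace = push_label, path[1], []
    while True:
        out_label, next_hop = lfib[node][label]
        trace.append((node, label))
        if next_hop is None:
            return trace
        label, node = out_label, next_hop


path = ["R1", "R3", "R4", "R5", "R8"]  # T1 in Figure 5.3
push, lfib = signal_lsp(path)
print(push)                       # 19
print(forward(path, push, lfib))  # [('R3', 19), ('R4', 18), ('R5', 17), ('R8', 16)]
print(encode_shim(push).hex())    # 00013140
```

Only the head-end looks at the IP destination; every other LSR forwards on the incoming label alone, which is why intermediate routers make no routing decision.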

5.1.4 Notion of Preemption in MPLS Traffic Engineering

One property defined in MPLS TE, called “preemption,” deserves some elaboration in this chapter, because preemption mechanisms may be triggered upon network element failure. [RSVP-TE] defines the notion of preemption, or priority, for a TE LSP. This parameter is signaled in the SESSION-ATTRIBUTE object of the RSVP-TE Path message (more precisely, the RFC defines two priorities, known as the “setup” and “holding” priorities, which define the priority of a TE LSP with respect to taking and holding resources, respectively).

When a new TE LSP is signaled, an LSR considers its admission by comparing the requested bandwidth with the bandwidth available at the priority specified in the setup priority. If the requested bandwidth is available but admitting the TE LSP requires preempting other TE LSPs having a lower priority, the newly signaled TE LSP is admitted and one or more lower-priority TE LSPs are preempted. Note that the selection of the set of lower-priority TE LSPs to preempt is a local decision and is generally implementation specific. More details on preemption policies can be found in [PREEMPTION-POL].
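The admission logic above can be sketched for a single link. The sketch assumes the RSVP-TE convention that priority 0 is the highest and 7 the lowest, so a new TE LSP can take bandwidth only from TE LSPs whose holding priority is numerically greater than its setup priority. Since the choice of TE LSPs to preempt is implementation specific, the policy used here (lowest priority first, larger reservations first within a priority) is a hypothetical example, as are the names and bandwidth values.

```python
from dataclasses import dataclass


@dataclass
class TeLsp:
    name: str
    bw: float      # reserved bandwidth (Mbps)
    holding: int   # holding priority: 0 = highest, 7 = lowest


def admit_with_preemption(capacity, lsps, req_bw, setup):
    """Return (admitted, names_of_preempted_lsps) for one link."""
    # Bandwidth held at an equal or better priority cannot be taken.
    protected = sum(l.bw for l in lsps if l.holding <= setup)
    if req_bw > capacity - protected:
        return False, []
    free = capacity - sum(l.bw for l in lsps)
    # Hypothetical local policy: lowest priority first, then largest first.
    candidates = sorted((l for l in lsps if l.holding > setup),
                        key=lambda l: (-l.holding, -l.bw))
    preempted = []
    for lsp in candidates:
        if free >= req_bw:
            break
        free += lsp.bw
        preempted.append(lsp.name)
    return True, preempted


link = [TeLsp("A", 50, 1), TeLsp("B", 60, 5), TeLsp("C", 30, 7)]
print(admit_with_preemption(155, link, 40, setup=3))   # (True, ['C'])
print(admit_with_preemption(155, link, 120, setup=3))  # (False, [])
```

With 140 Mbps reserved, only 15 Mbps are free, so admitting the 40-Mbps request at setup priority 3 requires preempting C; the 120-Mbps request is rejected because A's 50 Mbps are held at a better priority.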

The preemption process involves the following actions for each preempted TE LSP:

. The corresponding local RSVP states are cleared and the traffic is no longer forwarded.

. Messages are sent both upstream (RSVP Path Error message) and downstream (RSVP Resv Error message) so that all the states corresponding to the preempted TE LSP are cleared along its path. The head-end LSR of a preempted TE LSP then initiates a TE reroute procedure to reroute the TE LSP along another path.

This means that hard preemption is by nature disruptive. The concept of soft preemption, introduced in [SOFT-PREEMPTION], proposes a different mode of preemption. If a TE LSP must be preempted to accommodate a higher-priority TE LSP request, the preempting LSR performs the following actions:

. It signals to the head-end LSR of the preempted TE LSP the need to reroute that TE LSP in a nondisruptive fashion (the so-called “make before break” procedure).

. The local states of the soft preempted TE LSP are not cleared, and no RSVP Path Error/Resv Error messages are sent.



Hence, the preempting node keeps forwarding the traffic of a soft preempted TE LSP for a certain period. This gives the head-end LSRs of the soft preempted TE LSPs a chance to reroute them along an alternate path without disrupting the traffic flow.

It is worth pointing out that this temporarily causes reservation overbooking on some links: until the soft preempted TE LSPs are rerouted by their respective head-end LSRs, the sum of admitted bandwidth exceeds the maximum allowed. Note that some algorithms can be carefully designed to preempt hard preemptable TE LSPs first (see footnote 62). Moreover, appropriate MPLS Diffserv mechanisms can be used to make sure that high-priority traffic is served adequately.

5.1.5 Motivations for Deploying MPLS Traffic Engineering

Now that the concept of TE and the main components of MPLS TE have been reviewed, it is time to highlight the various motivations for deploying MPLS TE in a network.

1. Bandwidth optimization: As pointed out in Section 5.1, MPLS TE can be

deployed to achieve better network resource utilization, usually referred to

as bandwidth optimization.

2. Strict QoS guarantees: Another motivation for deploying MPLS TE in a network is to enforce strict QoS guarantees for various service types, including sensitive traffic flows such as voice, video, and circuit emulation. As already mentioned, MPLS TE acts on the control plane and as such takes care of the routing decision. For instance, in a network with a single class of service (CoS), MPLS TE allows an operator to reduce the average and maximum link utilization. A direct implication is that the probability of queuing delay decreases, which translates into better QoS. Another example is a network with multiple classes of service. Ensuring that sensitive flows receive appropriate treatment requires various data plane mechanisms such as marking, queuing, and congestion avoidance. In such networks, MPLS TE allows control over the proportion of high-priority traffic versus medium- and low-priority traffic on a per-link basis, which improves QoS. As already highlighted, providing QoS guarantees between two nodes also requires specific actions in the IP/MPLS data plane, implementing the Differentiated Services (Diffserv) model. Indeed, MPLS TE is responsible for finding a path obeying a set of constraints, but once the packets are sent onto that TE LSP, each node along the path has to serve them appropriately according to the required CoS.

3. Fast recovery: Several mechanisms for MPLS TE described throughout this chapter allow for fast recovery along with other requirements such as QoS protection during failure. These mechanisms have generated growing interest in MPLS TE, and the need for fast convergence alone, even if bandwidth optimization or strict QoS guarantees are not required, may justify the deployment of MPLS TE. Several large networks have deployed MPLS TE to benefit from its set of fast recovery mechanisms.

62 The hard/soft preemptable property of a TE LSP is explicitly signaled in the RSVP Path message.

The previous paragraphs introduced the motivations for deploying MPLS TE in a network: bandwidth optimization, strict QoS guarantees, and fast recovery. These motivations are of course not mutually exclusive. For example, consider an IP/MPLS network where resource utilization is not optimal and fast recovery is desired: MPLS TE can then be deployed with, for instance, any of the fast recovery techniques described in this chapter. Another example is an IP/MPLS network where strict QoS guarantees are required for voice traffic, as well as fast recovery for both virtual private network (VPN) traffic and voice traffic. Finally, as already pointed out, MPLS TE can be deployed for the sole purpose of benefiting from fast recovery. Consider an overprovisioned network in which neither bandwidth optimization nor strict QoS guarantees are necessary (QoS guarantees are achieved by overprovisioning), but fast recovery is a must. MPLS TE is then a good candidate for its fast recovery properties.

5.2 Analysis of the Recovery Cycle

Before studying the various recovery techniques used in IP/MPLS networks, it is worth spending some time on the recovery cycle analysis introduced in Chapter 1 and depicted in Figure 5.5.

5.2.1 Fault Detection Time

As with any recovery technique at any layer, the fault detection time is a key component of the total recovery time, and it varies widely depending on the fault detection mechanism in use and on the underlying layers 1 and 2. For instance, the fault detection time can vary from a few tens of milliseconds when two LSRs are interconnected via a SONET/SDH VC or an optical lightpath, to a few hundreds of milliseconds or even seconds when hello mechanisms are required. (Section 4.3 in Chapter 4 is entirely devoted to failure profiles and fault detection.)

Figure 5.5 Recovery cycle. (Timeline starting at the failure: the fault detection time runs until the fault is detected and is followed by the hold-off time, the recovery operation time, and the traffic recovery time; the overall span is labeled recovery time.)

5.2.2 Hold-Off Timer

A hold-off timer can be very useful if the underlying layer has a recovery scheme. These aspects of multilayer protection/restoration strategies are covered in detail in Chapter 6. In a nutshell, consider a multilayer network where fast recovery mechanisms are implemented both at the optical layer and at the MPLS layer. When a failure occurs, one should generally avoid race conditions in which both recovery mechanisms simultaneously try to reroute along an alternate path. In that case, a bottom-up timer-based approach can be adopted: the MPLS layer waits for a hold-off timer to expire before trying to reroute, to give the optical layer a chance to restore the failed resources. If the optical layer does not succeed in restoring the failed resource before the hold-off timer expires, the MPLS recovery mechanism is triggered to restore the failed resource at the MPLS layer.
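The bottom-up timer-based approach can be summarized as a simple decision, which also shows its cost: when the optical layer cannot restore the failure, the hold-off time is added to the MPLS recovery time. This is a minimal sketch with hypothetical names and millisecond values; it assumes that an optical restoration time of None means the optical layer cannot restore the failure at all.

```python
def recovering_layer(hold_off_ms, optical_restore_ms, mpls_recovery_ms):
    """Return (layer that restores traffic, time after fault detection)."""
    if optical_restore_ms is not None and optical_restore_ms <= hold_off_ms:
        return "optical", optical_restore_ms  # MPLS layer never reacts
    # Hold-off expired first: MPLS recovery starts only now.
    return "mpls", hold_off_ms + mpls_recovery_ms


print(recovering_layer(100, 60, 50))    # ('optical', 60)
print(recovering_layer(100, None, 50))  # ('mpls', 150)
```

Choosing the hold-off value is therefore a trade-off: long enough for the optical layer to complete its restoration, but short enough not to penalize failures that only the MPLS layer can recover.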

5.2.3 Fault Notification Time

To perform traffic recovery, an LSR must first be informed of the failure. As we will see in this chapter, depending on the MPLS TE recovery mechanism used, traffic recovery may be performed on the node immediately upstream of the failure or on the head-end LSR (the LSR originating the TE LSP). We call fault indication signal (FIS) the signal that notifies the failure to the node in charge of performing the traffic recovery. Hence, once the fault has been detected by an LSR R, the FIS is propagated until it reaches an LSR that has the ability to reroute the TE LSP affected by the failure. The fault notification time (the time for the FIS to be received by the node in charge of the traffic recovery) varies depending on whether the recovery technique is local or global, as shown in Chapter 1, Section 1.5.4.

It is usually desirable to ensure, through appropriate scheduling on the various LSRs, that the FIS receives the proper QoS, so as to minimize and bound the fault notification time. For instance, as mentioned in Chapter 4, IGP flooding should be prioritized. In addition, IGP and RSVP messages should be queued appropriately and should never be dropped in the case of congestion. Refer to Chapter 4, Section 4.5, for further details on QoS mechanisms.

RSVP Reliable Messaging

As we saw in Chapter 4, IGP updates are always sent in reliable mode; this is inherent to link state routing protocols. By contrast, RSVP messages are sent by default in nonreliable mode. So the loss of a Path Error message (which is used to report an LSP failure to upstream nodes) may significantly increase the fault notification time, especially if the IGP has not been tuned to provide fast notification (see Chapter 4 for details). [REFRESH-REDUCTION] proposes a mechanism to send RSVP messages in reliable mode.

Two additional RSVP objects are defined: the MESSAGE-ID and MESSAGE-ID-ACK objects. Each RSVP message sent in reliable mode contains a unique MESSAGE-ID object and is acknowledged by a MESSAGE-ID-ACK object (which may be piggybacked onto any other RSVP message or carried in an RSVP acknowledgment message). The retransmission of an unacknowledged message for which an explicit acknowledgment was requested is based on an exponential back-off procedure. When an LSR has to send a message in reliable mode, it inserts a MESSAGE-ID object in the RSVP message and sets a particular flag in the MESSAGE-ID header, called the ACK-Desired flag.

Upon receiving the RSVP message, a neighboring LSR sends back an RSVP message containing a MESSAGE-ID-ACK object. When the message is acknowledged, the transmission procedure terminates. If the sending LSR does not receive any acknowledgment before a dynamic timer has elapsed, the message is retransmitted. The dynamic timer Tk is first set to an initial retransmission value (generally a short value) and is then increased exponentially until a maximum value is reached.

For example, let us suppose that a message is sent for the first time and that Tk = T1 is set to the initial timer value (the recommended value is 500 ms).

. If the message is not acknowledged after T1, it is retransmitted. Otherwise the procedure stops.

. Tk is then set to Tk-1 × (1 + delta) (the recommended value for delta is 1).

. The maximum value for k is set to a fixed value (k = 3 is recommended).

In summary, the sending LSR waits 500 ms and then retransmits the message, then waits 500 ms × 2, then 500 ms × 4, with exponentially increasing waiting times. If the maximum retransmission value is set to 3, the message is no longer retransmitted after three trials.
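The schedule just described can be computed directly from the recurrence Tk = Tk-1 × (1 + delta). This sketch follows the chapter's description with the recommended values (T1 = 500 ms, delta = 1, k at most 3); it does not model acknowledgments, and the function name is hypothetical. The cumulative values give, under this reading, the instant at which each timer expires relative to the first transmission.

```python
from itertools import accumulate


def retransmission_intervals(initial_ms=500, delta=1, max_k=3):
    """Waiting intervals T1..Tk with Tk = Tk-1 * (1 + delta)."""
    intervals = [initial_ms]
    while len(intervals) < max_k:
        intervals.append(intervals[-1] * (1 + delta))
    return intervals


waits = retransmission_intervals()
print(waits)                    # [500, 1000, 2000]
print(list(accumulate(waits)))  # [500, 1500, 3500]
```

If no acknowledgment ever arrives, the sender thus gives up about 3.5 s after the first transmission, which bounds the extra fault notification delay a lost Path Error message can cause.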

5.2.4 Recovery Operation Time

Any recovery technique involves a set of actions to be completed, which may include synchronization between network elements to coordinate their actions.

5.2.5 Traffic Recovery Time

The traffic recovery time represents the time between the last recovery action and the time the traffic is completely recovered.

We have now briefly described each phase of the recovery cycle; each of these components is analyzed for the various recovery techniques described in this chapter.

There are multiple types of MPLS TE recovery techniques (Table 5.1):


• MPLS TE global default restoration (Section 5.3): This is the default mode of recovery of MPLS TE, whereby the failure is notified to the head-end LSR by means of RSVP and the routing protocol; the head-end LSR in turn recomputes a new path and finally resignals the TE LSP along that new path.
• MPLS TE global protection (Section 5.4): The basic principle is that two TE LSPs are set up by the head-end LSR: a primary LSP and a backup LSP. Once the head-end LSR is notified of a failure along the primary LSP path, it starts using the backup LSP.
• MPLS TE local protection (Fast Reroute; Section 5.5): A local repair recovery scheme in which, upon failure detection, the LSPs affected by the failure are locally rerouted by the node immediately upstream of the failure.

5.3 MPLS Traffic Engineering Global Default Restoration

MPLS TE global default restoration is the default recovery technique. Once a

failure is detected by some downstream node, the head-end LSR is notified by

means of RSVP and the routing protocol (FIS). Upon receiving the notification, the

head-end LSR recomputes the path and signals the LSP along an alternate path.

5.3.1 Fault Signal Indication

It is probably worth elaborating on the nature of the FIS in the context of MPLS TE, because this aspect might be a source of confusion. In the context of an IP/MPLS TE network, the FIS is either an IGP update⁶³ or an RSVP Path Error message; in practice, both are generated independently. In the case of the IGP, a node detecting a loss of routing adjacency generates an LSA/LSP update (see Chapter 4 for a detailed description of IP routing from a recovery perspective). When a link fails between two nodes, the nodes attached to the failed link send an IGP update. In the case of a node failure, all the neighbors of the failed node send an IGP update. As discussed in Chapter 4, the timing sequence depends heavily on the failure detection time and on IGP parameter tuning. Moreover, every node detecting a failure also generates an RSVP Path Error message, sent to each head-end LSR having a TE LSP that traverses the failed resource. For instance, in Figure 5.3, if the link R3-R4 fails, as soon as the node R3 detects the link failure, it sends a notification (an RSVP Path Error message) to R1, the head-end LSR of T1, because T1 traverses the failed link. In addition, an IGP update is sent by both R3 and R4 to reflect the new topology. Again, the timing sequence depends on the IGP tuning (see Chapter 4). Usually, the RSVP Path Error message is received by the head-end LSRs within a few tens of milliseconds, and hence generally before the IGP update; but regardless of which FIS is received first, the head-end LSR gets notified. As pointed out in Section 5.2, FIS delivery is of the utmost importance with MPLS TE global default restoration, because it triggers the rerouting of the affected LSPs by the head-end LSR.

⁶³In the rest of this chapter, the term IGP will be used in place of routing protocol.

Table 5.1 Categories of MPLS Recovery Mechanisms

                   Protection                        Restoration
Local recovery     Local protection (Section 5.5)    -
Global recovery    Global protection (Section 5.4)   Global default restoration (Section 5.3)


When a TE LSP is configured on a head-end LSR, its set of attributes is specified: destination (IP address of the tail-end LSR), bandwidth, priority, protection/restoration requirements, and other MPLS TE parameters. As far as recovery is concerned, an important parameter is the TE LSP path. As mentioned in Section 5.1, the path of a TE LSP can be computed in either a distributed or a centralized fashion. In the former case, the configuration does not specify any particular path, and the head-end LSR dynamically computes the LSP path, taking into account the constraints and the available resources in the network. In the latter case, the path of the TE LSP is statically configured on the head-end LSR. Some MPLS TE implementations allow both options to be configured, with an order of preference.

In Table 5.2, a TE LSP is defined with its corresponding parameters/constraints: destination address (10.0.1.100), bandwidth (10000), and priority (1). In addition, the notion of path-option allows the list of paths that the LSP should follow to be specified in order of preference. In this example, the preferred path is a static path (path 1), whose set of hops is statically configured on the head-end LSR. If path 1 is not available (because the path is broken, or because not all the required constraints can be satisfied along it), path 2 is the second preferred path. Note that this corresponds to the off-line path computation method already mentioned for MPLS TE, in which the LSP path is computed by some other tool (not by the head-end LSR itself). Then, if none of the static paths is available, the head-end LSR tries to find a path that complies with the requested constraints using the CSPF algorithm (this is path option 3). Note that it might also be possible to have different sets of constraints for different path options. For example, suppose that no path satisfying the bandwidth constraint (10000) can be found; one solution could then be to try a lower value. Of course, this example configuration combines static and dynamic paths for the sake of illustration; a single dynamic path, or one or more static paths, could have been configured instead.


Recovery Cycle with Global Default Restoration

The mode of operation of global default restoration is relatively simple. When the head-end LSR is informed of the link/node failure, if an alternate path is specified, the head-end LSR checks whether the configured path satisfies the constraints of the TE LSP. If so, the TE LSP is reestablished along that path. If no usable preconfigured path is specified on the head-end LSR and dynamic path computation is configured, the head-end LSR triggers a new path computation for the set of affected TE LSPs by calling the CSPF process. This corresponds exactly to the example in Table 5.2: If a notification is received reporting that path 1 is unavailable, the head-end LSR determines whether it can use path 2, and if path 2 is not valid for some reason, it tries to compute a path itself.
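The path-option logic of Table 5.2 can be sketched as an ordered search. In this minimal Python sketch, `PathOption`, `path_is_usable`, and `run_cspf` are hypothetical names standing in for the router's configuration, its constraint check along an explicit path, and its CSPF process; real implementations differ in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Path = List[str]  # a path as a list of hops (hypothetical representation)


@dataclass
class PathOption:
    preference: int                       # lower value = preferred
    explicit_path: Optional[Path] = None  # None means a "dynamic" option


def select_path(
    options: List[PathOption],
    path_is_usable: Callable[[Path], bool],
    run_cspf: Callable[[], Optional[Path]],
) -> Optional[Tuple[int, Path]]:
    """Return (preference, path) for the first usable option, or None."""
    for opt in sorted(options, key=lambda o: o.preference):
        if opt.explicit_path is not None:
            # Static option: the path must be intact and meet the constraints.
            if path_is_usable(opt.explicit_path):
                return opt.preference, opt.explicit_path
        else:
            # Dynamic option: ask CSPF for a path satisfying the constraints.
            path = run_cspf()
            if path is not None:
                return opt.preference, path
    return None  # no option succeeded; the TE LSP cannot be reestablished
```

Trying the options strictly in order of preference mirrors the path-option numbering in the configuration: a dynamic option is only reached once every preferred explicit path has been rejected.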

Table 5.2 An example of MPLS Traffic Engineering TE LSP Configuration

interface Tunnel1
 ip unnumbered Loopback0
 no ip directed-broadcast
 tunnel destination 10.0.1.100
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng priority 1 1
 tunnel mpls traffic-eng bandwidth 10000
 tunnel mpls traffic-eng record-route
 tunnel mpls traffic-eng path-option 1 explicit name path1
 tunnel mpls traffic-eng path-option 2 explicit name path2
 tunnel mpls traffic-eng path-option 3 dynamic
!
ip explicit-path name path1 enable
 next-address 192.170.14.2
 next-address 192.170.10.1
 next-address 192.170.4.5
!
ip explicit-path name path2 enable
 next-address 192.170.13.2
 next-address 192.170.17.1
 next-address 192.170.20.5

Note 1: Various existing MPLS TE implementations allow constraints to be relaxed upon failure, which might sometimes be necessary. A slightly more complicated example could be given in which a different set of constraints is specified for each path option. For instance, consider a network with relatively high link utilization in terms of bandwidth reservation: a major node failure may leave several TE LSPs unable to find an alternative path. In this case, one option is to relax some constraints, such as the bandwidth constraint, so that the TE LSP can be routed. There is one undesirable side effect, though: Allowing a TE LSP to be rerouted as a zero-bandwidth TE LSP implies that traffic will flow over this tunnel without any CAC, so no bandwidth can be guaranteed in this case. Bandwidth is just one of the various constraints a TE LSP can be configured with. Another example is affinities, which make it possible, for instance, to ensure that some TE LSPs avoid particular network resources, using bit masks. This can be seen as coloring links. For example, some network links might be colored red (with red meaning "high propagation delay" or "poor quality"). This link affinity property is propagated through the IGP TE extensions (see [OSPF-TE] and [IS-IS-TE]). A TE LSP carrying very sensitive traffic such as voice-over-IP (VoIP) can then be configured so that red links are excluded from path selection. In such a case, after a major network failure, the affected TE LSP might not be reroutable without crossing one or more red links, and it might be desirable to relax the affinity constraint.
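The constraint relaxation described in Note 1 can be illustrated with a small Python sketch. It assumes an affinity value/mask convention (a link is acceptable when its attribute bits, under the mask, equal the requested affinity bits), which is one common way of expressing affinities; the `RED` bit, the candidate paths, and all names are hypothetical. Constraint sets are tried from strictest to most relaxed; a real head-end would rerun CSPF with each relaxed set rather than scan a fixed list of candidate paths.

```python
from typing import List, NamedTuple, Optional, Tuple

RED = 0x1  # hypothetical attribute bit: "high propagation delay" / "poor quality"


class Link(NamedTuple):
    name: str
    attr_flags: int   # link attribute bits (advertised via IGP TE extensions)
    avail_bw: float   # unreserved bandwidth, same unit as the LSP bandwidth


class Constraints(NamedTuple):
    bandwidth: float  # 0 means no bandwidth reservation (no CAC)
    affinity: int
    mask: int         # which attribute bits are checked; 0 disables the check


def link_ok(link: Link, c: Constraints) -> bool:
    colour_ok = (link.attr_flags & c.mask) == (c.affinity & c.mask)
    return colour_ok and link.avail_bw >= c.bandwidth


def first_feasible(candidates: List[List[Link]],
                   constraint_sets: List[Constraints]) -> Optional[Tuple[int, List[Link]]]:
    """Try constraint sets from strictest to most relaxed; return the index of
    the set that succeeded and the first candidate path satisfying it."""
    for i, c in enumerate(constraint_sets):
        for path in candidates:
            if all(link_ok(link, c) for link in path):
                return i, path
    return None
```

For a VoIP TE LSP, the strict set would be `Constraints(bandwidth, 0, RED)` (red links excluded); relaxing the affinity means retrying with `mask=0`, and relaxing the bandwidth means retrying with `bandwidth=0`, with the loss of CAC noted above.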

Note 2: A large proportion of deployed MPLS TE networks rely on distributed computation in which no static path is configured; in this case, only a dynamic path is configured, and the head-end LSR simply recomputes a new path based on the LSP constraints and on its knowledge of the network and resource topology provided by the IGP.

A common question is: How long does a CSPF computation take? The invariable answer is: It depends. The CSPF computation time is a function of the network size and of the CSPF algorithm in use. Finding the shortest constrained path in a very large network obviously requires more time than in a small network. Furthermore, the CSPF complexity varies with the algorithm in use. Finally, the router CPU should also be taken into account. That said, as an order of magnitude, an average CSPF computation using a classic CSPF algorithm on a network with hundreds of nodes rarely exceeds a few milliseconds. It is worth noting that one CSPF computation must be triggered per affected TE LSP: if N LSPs starting on a head-end LSR R1 traverse a failed link, R1 has to compute a new path for each of them.
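The two ideas in this paragraph, pruning the links that violate the constraints before running a shortest-path computation and running one CSPF per affected TE LSP, can be sketched as follows. This is a simplified Python illustration (prune-then-Dijkstra on a bandwidth constraint only, over a hypothetical topology); it does not update bandwidth reservations between runs, which a real head-end would do.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

# node -> {neighbor: (IGP metric, unreserved bandwidth)}
Graph = Dict[str, Dict[str, Tuple[float, float]]]


def build(links: List[Tuple[str, str, float, float]]) -> Graph:
    """Build a bidirectional topology from (a, b, metric, bandwidth) tuples."""
    g: Graph = {}
    for a, b, metric, bw in links:
        g.setdefault(a, {})[b] = (metric, bw)
        g.setdefault(b, {})[a] = (metric, bw)
    return g


def cspf(g: Graph, src: str, dst: str, bandwidth: float,
         excluded: FrozenSet[Tuple[str, str]] = frozenset()) -> Optional[List[str]]:
    """Prune links lacking `bandwidth` (or excluded), then run Dijkstra."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, (metric, bw) in g.get(u, {}).items():
            if bw < bandwidth or (u, v) in excluded:
                continue  # constraint pruning
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def reroute_affected(g: Graph, lsps: Dict[str, Tuple[str, str, float, List[str]]],
                     failed: Tuple[str, str]) -> Dict[str, Optional[List[str]]]:
    """One CSPF run per TE LSP whose current path crosses the failed link."""
    excluded = frozenset({failed, (failed[1], failed[0])})
    new_paths: Dict[str, Optional[List[str]]] = {}
    for name, (src, dst, bw, path) in lsps.items():
        if set(zip(path, path[1:])) & excluded:
            new_paths[name] = cspf(g, src, dst, bw, excluded)
    return new_paths
```

Because `reroute_affected` calls `cspf` once per affected TE LSP, the computation cost at the head-end grows with the number of LSPs crossing the failed link, as noted above.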

Once a new path has been computed, the TE LSP is signaled along it. The final operation before any traffic can be routed over the newly signaled TE LSP consists of updating the routing table for the destinations that can be reached via the TE LSP.

5.3.3 Recovery Time

Providing hard numbers is not a realistic exercise because a significant number of

factors influence the rerouting time, but we describe the different components of the

recovery cycle with global default restoration through an example. Figure 5.6 shows

the different steps of the recovery cycle with MPLS TE global default restoration.

Step 1: The link R3-R4 fails, and an FIS (RSVP Path Error and IGP update) is sent to the head-end LSR. As already pointed out, the relative timing of the IGP update and the RSVP Path Error depends on many factors. Receipt of either one is sufficient for the head-end LSR to be notified of the failure.

Step 2: The FIS is sent to the head-end LSR. Note that the FIS delivery delay might be nonnegligible; it is made up of two components: the propagation delay (on wide area networks, this can be on the order of tens of milliseconds and can reach 100 ms between two continents, where the optical path can be very long) and the queuing and processing delays for the FIS to reach the head-end router. As mentioned in Section 5.2, appropriate marking and scheduling in the forwarding path are highly recommended to ensure that the queuing and processing delays are both minimized.

Step 3: Upon receiving the failure notification, the head-end LSR (R1 in this

example) tries to find an alternate path satisfying the set of constraints for each

TE LSP affected by the failure.

Step 4: The TE LSP is signaled along the new path. The RSVP signaling setup time also comprises several components: the propagation delay along the path (round trip) and the queuing and processing delays at each hop in both directions (upstream and downstream).

Step 5: The routing table of R1 is updated to use the newly signaled LSP.

In conclusion, because the different components of the recovery time are highly dependent on the network characteristics, the resulting recovery time may vary from a few milliseconds to hundreds of milliseconds, and sometimes a few seconds. Testing MPLS TE reroute in a lab made of a few routers will probably result in a very short convergence time (a few milliseconds): the propagation delays are negligible, as is the FIS processing delay, the CSPF computation is very short because the network size is limited, and the setup time is also negligible. In contrast, a network with 1000 nodes, links with high propagation delays, and hundreds of TE LSPs to reroute will require a much more significant amount of time to converge.
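The steps above can be combined into a rough additive estimate. The Python sketch below is only a back-of-the-envelope model under stated assumptions: failure detection (Step 1) is excluded, the CSPF runs for the affected TE LSPs are assumed to be serialized, and the signaling cost is modeled as the round-trip propagation plus a fixed per-hop processing delay in each direction. All field names and the numeric inputs in the example are hypothetical, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RestorationInputs:
    fis_propagation_ms: float   # Step 2: propagation of the FIS to the head-end
    fis_queue_proc_ms: float    # Step 2: queuing and processing of the FIS
    cspf_ms_per_lsp: float      # Step 3: one CSPF run per affected TE LSP
    affected_lsps: int
    path_rtt_ms: float          # Step 4: round-trip propagation along the new path
    per_hop_proc_ms: float      # Step 4: queuing + processing per hop, per direction
    hops: int
    rib_update_ms: float        # Step 5: routing table update


def estimate_restoration_ms(x: RestorationInputs) -> float:
    """Rough time until the last affected TE LSP is restored (Steps 2-5).

    Assumes, hypothetically, that CSPF runs are serialized and that the last
    TE LSP is signaled only after all CSPF runs have completed.
    """
    fis = x.fis_propagation_ms + x.fis_queue_proc_ms
    cspf = x.cspf_ms_per_lsp * x.affected_lsps
    signaling = x.path_rtt_ms + 2 * x.hops * x.per_hop_proc_ms
    return fis + cspf + signaling + x.rib_update_ms


if __name__ == "__main__":
    lab = RestorationInputs(0.1, 0.5, 1.0, 1, 0.2, 0.5, 3, 1.0)
    wan = RestorationInputs(40.0, 5.0, 2.0, 200, 80.0, 1.0, 8, 50.0)
    print(f"lab: {estimate_restoration_ms(lab):.1f} ms")  # about 5.8 ms
    print(f"wan: {estimate_restoration_ms(wan):.1f} ms")  # about 591.0 ms
```

With the hypothetical lab inputs the estimate stays under 6 ms; with the hypothetical WAN inputs, the serialized CSPF runs for 200 TE LSPs dominate and the estimate approaches 600 ms.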

5.4 MPLS Traffic Engineering Global Path Protection

MPLS TE global path protection (also usually referred to as path protection) is a

global 1:1 protection recovery mechanism. As defined in Chapter 1, Section 1.5.4,

this implies that the head-end LSR performs the rerouting (global recovery) and a

presignaled backup LSP is used (protection) if the protected LSP fails.

Figure 5.6 Event scheduling in the case of link/node failure with MPLS TE reroute. (Figure labels: routers R1-R8; TE LSP T1; steps 1-4; "IGP Update/RSVP Path Error"; "New Path Computation for the Set of Affected TE LSPs"; "TE LSP Signalled".)




Figure 5.7 describes the mode of operation of global path protection. In this figure,

there are two primary TE LSPs, T1 (which follows the path R2-R3-R4-R5-R6) and

T2 (which follows the path R7-R8-R9-R6). For each primary TE LSP, a dedicated

backup LSP is set up, before any failure occurs. It is worth noting that a backup TE

LSP (also called secondary TE LSP) must be link diverse or node diverse from the

primary TE LSP. In this example, the backup LSP of T1 follows the path R2-R1-

R10-R11-R5-R12-R6, which is link diverse64 from T1. By contrast, the backup LSP

of T2 follows the path R7-R2-R3-R4-R5-R6 and is node diverse from T2. The

aspects related to the backup path computation are covered in Section 5.15.
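Link and node diversity, as used in the Figure 5.7 example, can be checked mechanically. The Python sketch below treats a path as a list of router names (a hypothetical representation) and exempts the head-end and tail-end, which the primary and backup necessarily share, from the node-diversity test.

```python
from typing import FrozenSet, List, Set


def links(path: List[str]) -> Set[FrozenSet[str]]:
    """Undirected links traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def is_link_diverse(primary: List[str], backup: List[str]) -> bool:
    return not (links(primary) & links(backup))


def is_node_diverse(primary: List[str], backup: List[str]) -> bool:
    # Compare transit nodes only; the link check also rules out a shared
    # direct head-end to tail-end link.
    no_shared_transit = not (set(primary[1:-1]) & set(backup[1:-1]))
    return no_shared_transit and is_link_diverse(primary, backup)


if __name__ == "__main__":
    t1 = ["R2", "R3", "R4", "R5", "R6"]
    b1 = ["R2", "R1", "R10", "R11", "R5", "R12", "R6"]
    t2 = ["R7", "R8", "R9", "R6"]
    b2 = ["R7", "R2", "R3", "R4", "R5", "R6"]
    print(is_link_diverse(t1, b1), is_node_diverse(t1, b1))  # True False
    print(is_link_diverse(t2, b2), is_node_diverse(t2, b2))  # True True
```

The backup of T1 shares the transit node R5 with T1, so it is link diverse only; the backup of T2 shares no transit node with T2 and is therefore node diverse.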

A backup (secondary) TE LSP is a regular TE LSP; that is, as far as RSVP signaling

is concerned, a backup TE LSP is signaled as any other TE LSP and the backup TE

LSP can be configured with either the same attributes as the primary TE LSP (in

this case, the backup TE LSP satisfies the same set of constraints as the primary TE

LSP) or with different constraints (e.g., no affinities, less bandwidth [say, 50% of the

primary TE LSP]). For instance, if the backup TE LSP is configured with 50% of

the primary TE LSP bandwidth, when used, the traffic will be forwarded along a

path where 50% of the bandwidth has been reserved. This does not necessarily mean that the traffic will suffer QoS degradation; that depends on the actual use of the network resources by the other LSPs sharing them along its backup path.

64The terms disjoint and diverse are used interchangeably.

Figure 5.7 MPLS traffic engineering path protection. (Figure labels: routers R1-R12; primary TE LSPs T1 and T2; backup (secondary) TE LSPs "Backup T1" and "Backup T2".)

The mode of operation is quite straightforward: Once the failure is detected by

some downstream node, an FIS is sent to the head-end LSR of each affected LSP

(by affected LSP, we mean each LSP traversing the failed resource).

Note that all the aspects related to FIS delivery described in Section 5.3 apply identically here, because both global default restoration and global path protection rely on FIS delivery to trigger an LSP recovery.

Then upon receiving the FIS, the head-end LSR immediately switches the

traffic onto the backup TE LSP and updates its routing table accordingly.
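The head-end behavior just described can be sketched as a simple switchover. In this hypothetical Python model, the head-end holds a primary and a presignaled backup path and, on receiving an FIS for a link on the active primary, switches to the backup unless the backup also crosses the failed link; no CSPF and no signaling take place at switchover time.

```python
from dataclasses import dataclass
from typing import List, Tuple


def traverses(path: List[str], link: Tuple[str, str]) -> bool:
    """True if the path uses the (undirected) link."""
    return any({a, b} == set(link) for a, b in zip(path, path[1:]))


@dataclass
class ProtectedTunnel:
    primary: List[str]
    backup: List[str]          # presignaled before any failure
    active: str = "primary"

    def on_fis(self, failed_link: Tuple[str, str]) -> str:
        """Head-end reaction to an FIS reporting `failed_link` (hypothetical API)."""
        if self.active == "primary" and traverses(self.primary, failed_link):
            if traverses(self.backup, failed_link):
                self.active = "down"    # backup not diverse from this failure
            else:
                self.active = "backup"  # no CSPF, no signaling: just switch
        return self.active
```

Applied to T1 of Figure 5.7, an FIS for the link R3-R4 moves the traffic to the backup R2-R1-R10-R11-R5-R12-R6, while an FIS for a link that T1 does not use leaves T1 on its primary path.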

5.4.2 Recovery Time

Compared to global default restoration, no routing computation has to be done "on the fly" to find an alternate route for the failed TE LSP. Moreover, with global path protection, the backup tunnel is already signaled, so no signaling round is required to set up the backup TE LSP. It is important to note that the saving in convergence time comes predominantly from the presignaling of the backup TE LSP.

5.5 MPLS Traffic Engineering Local Protection

After a brief introduction to the terminology specific to MPLS TE local protection, we describe the principles and mode of operation of two local protection techniques, both called MPLS TE Fast Reroute. The last section describes two deployment strategies for local protection recovery techniques. Note that the terms MPLS TE local protection and Fast Reroute are used interchangeably throughout this chapter.

5.5.1 Terminology

We begin this section by defining the terminology specific to MPLS TE Fast

Reroute through an example (Figure 5.8).

As shown in Figure 5.8, an LSP T1 is signaled that follows the path R1-R2-R3-R4-R5. T1 is said to be "fast reroutable" if it is signaled with a specific attribute set in the RSVP Path message, indicating its desire to benefit from local recovery in the case of a failure.⁶⁵ As shown in the following subsections, Fast Reroute is a local protection recovery scheme; hence, the LSPs affected by a failure are locally rerouted by the node immediately upstream of the failure. This node is called the point of local repair (PLR). For instance, the node R2 is a PLR if the link R2-R3 or the node R3 fails. Fast Reroute uses backup tunnels to reroute affected LSPs. When a backup tunnel terminates at the PLR's next hop (its directly adjacent neighbor), it is an NHOP backup tunnel. When the backup tunnel terminates at the neighbor of the PLR's neighbor (the next-next hop), it is an NNHOP backup tunnel. Back to our example, B1 is an NHOP backup tunnel of the PLR R2 and B2 is an NNHOP backup tunnel of R2. The node where the backup tunnel terminates is called the merge point (MP); hence, R4 is the MP of B2. Finally, a fast-reroutable LSP is said to be protected at a node R if there exists a backup tunnel that can be used in the case of a failure. T1 is protected at R2 by B1 and B2.

⁶⁵See Section 5.14 for the details on RSVP signaling for Fast Reroute.
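The NHOP/NNHOP distinction depends only on where the backup tunnel terminates relative to the PLR along the protected LSP. A minimal Python sketch using the Figure 5.8 example; the function name and return format are hypothetical:

```python
from typing import List, Optional, Tuple


def classify_backup(protected_path: List[str], plr: str,
                    backup_tail: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Classify a backup tunnel at `plr` by its tail (the merge point).

    Returns (kind, protected elements), or None if the tail is neither the
    PLR's next hop nor its next-next hop on the protected LSP.
    """
    i = protected_path.index(plr)
    nhop = protected_path[i + 1] if i + 1 < len(protected_path) else None
    nnhop = protected_path[i + 2] if i + 2 < len(protected_path) else None
    link = f"link {plr}-{nhop}"
    if backup_tail == nhop:
        return "NHOP", (link,)                  # link protection
    if nnhop is not None and backup_tail == nnhop:
        return "NNHOP", (link, f"node {nhop}")  # node (and link) protection
    return None


if __name__ == "__main__":
    t1 = ["R1", "R2", "R3", "R4", "R5"]
    print(classify_backup(t1, "R2", "R3"))  # ('NHOP', ('link R2-R3',))
    print(classify_backup(t1, "R2", "R4"))  # ('NNHOP', ('link R2-R3', 'node R3'))
```

The second result reflects the point made for facility backup below: an NNHOP backup tunnel protects both the bypassed node and the link leading to it.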

The terminology of detour merge point used in one Fast Reroute technique

(one-to-one protection) is discussed in Section 5.14.

5.5.2 Principles of Local Protection Recovery Techniques

We use the generic term MPLS TE Fast Reroute, or simply Fast Reroute, to describe local protection techniques. Two Fast Reroute techniques (both local protection techniques) are described in this chapter:

• Facility backup (also referred to as bypass)
• One-to-one backup (also referred to as detour)

Although the terminology might seem difficult to understand, the terms used in this section are in line with the corresponding standards documents.

Both methods are local repair techniques using local protection:

• Local: In the case of a link or node failure, a TE LSP is rerouted by the node immediately upstream of the failed link or node. In contrast with global default restoration and global path protection, where the TE LSP is rerouted by the head-end LSR, with local protection the protected LSP is rerouted at the closest location upstream of the failure. This has the very significant advantage of eliminating the need for the FIS to reach the head-end LSR before the affected TE LSP can be rerouted along an alternate path.

Figure 5.8 Terminology (MPLS local protection). (Figure labels: routers R1-R8; fast-reroutable (protected) LSP T1; NHOP backup LSP B1; NNHOP backup LSP B2; PLR; merge point.)

• Protection: As seen in Chapter 1, with protection recovery mechanisms, a backup resource is preallocated and signaled before the failure. With both local protection recovery methods (facility backup and one-to-one backup), the backup LSPs are established before the failure occurs. When a failure occurs and is detected, every protected TE LSP traversing the failed resource (usually referred to as an affected TE LSP) is rerouted over a backup TE LSP without having to compute a backup path "on the fly."

Although both methods are local repair techniques, they differ significantly in terms of backup LSPs. With facility backup, a single backup LSP (or a very limited number of them) is used to protect all the fast-reroutable TE LSPs from the failure of a link or node, a major benefit made possible by the MPLS label stacking property. By contrast, one-to-one backup creates a separate backup LSP for each protected TE LSP at each hop. More details about their respective scalability are provided in Section 5.5.8.

To ease the understanding of each local protection technique, the following approach is followed: First, a quick overview of each local protection method is provided via an example; then each method is described in detail in subsequent subsections.

5.5.3 Local Protection: One-to-One Backup

As depicted in Figure 5.9, with one-to-one backup, one backup LSP (called a Detour LSP) is created at each hop for each fast-reroutable TE LSP. So, for instance, at the node R3, to protect the set of fast-reroutable TE LSPs T1, T2, and T3, the following set of backup TE LSPs is set up:

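• One Detour LSP D1 for the protected TE LSP T1, following the path R3-R10-R11-R5-R6
• One Detour LSP D2 for the protected TE LSP T2, following the path R3-R8-R5-R9
• One Detour LSP D3 for the protected TE LSP T3, following the path R3-R10-R11-R12

Figure 5.9 Illustration of the Detour LSP with one-to-one backup. (Figure labels: routers R1-R12; fast-reroutable TE LSPs T1, T2, T3; Detour LSPs D1, D2, D3.)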

Note that these Detour LSPs protect the fast-reroutable TE LSPs T1, T2, and T3 only against a failure of the link R3-R4 or of the node R4. Similarly, each node along the fast-reroutable TE LSP paths performs the same operation.

At each PLR along the fast-reroutable TE LSP path, a local backup tunnel called a Detour LSP is set up; it avoids the protected resource and terminates on the tail-end LSR of the fast-reroutable TE LSP. In the previous example, for the fast-reroutable TE LSP T1, R3 sets up a Detour LSP D1 originating at R3 and terminating at R6 that avoids both the link R3-R4 and the node R4.

Figure 5.10 shows the label allocation for both the primary TE LSP T1 and the Detour LSP D1, originated on R3, that protects T1 against a failure of either the link R3-R4 or the node R4. For example, when a failure of the node R4 occurs, as soon as the PLR R3 detects the failure, the fast-reroutable TE LSP T1 is locally rerouted by the PLR onto the Detour LSP, as shown in Figure 5.11. It is worth noting the change in label swapping here: Once R3 detects the failure of the node R4, the incoming label 1 is no longer swapped to 2 and forwarded on the R3-R4 interface; it is now swapped to 10 and sent on the outgoing interface R3-R10.
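The forwarding change at the PLR can be modeled as one label-forwarding entry whose output is switched once the failure is detected. In this hypothetical Python sketch, the class names are illustrative and the labels and interfaces in the usage example are those of Figures 5.10 and 5.11 as described in the text. With one-to-one backup, the label stack depth does not change, because the Detour LSP is a separate LSP with its own label.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LfibEntry:
    out_label: int      # label used along the protected LSP
    out_if: str
    detour_label: int   # label advertised for the Detour LSP
    detour_if: str
    on_detour: bool = False


class OneToOnePlr:
    """Hypothetical model of a PLR label table with one Detour LSP per protected LSP."""

    def __init__(self, entries: Dict[int, LfibEntry]):
        self.lfib = entries

    def on_failure(self, failed_if: str) -> None:
        # Every protected LSP leaving through the failed interface moves onto its own detour.
        for entry in self.lfib.values():
            if entry.out_if == failed_if:
                entry.on_detour = True

    def forward(self, in_label: int) -> Tuple[int, str]:
        e = self.lfib[in_label]
        if e.on_detour:
            return e.detour_label, e.detour_if  # single swap onto the detour
        return e.out_label, e.out_if            # steady-state swap


if __name__ == "__main__":
    r3 = OneToOnePlr({1: LfibEntry(2, "R3-R4", 10, "R3-R10")})
    print(r3.forward(1))   # (2, 'R3-R4')
    r3.on_failure("R3-R4")
    print(r3.forward(1))   # (10, 'R3-R10')
```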

Detour LSP merging: Various merging rules allow for the reduction of the

number of Detour LSPs and are described in Section 5.14.

Figure 5.10 Mode of operation of one-to-one backup. (Figure labels: routers R1-R12; protected TE LSP T1; Detour LSP D1; labels 1, 2, 3, 10, 11, 12.)

5.5.4 Local Protection: "Facility Backup"

By contrast with one-to-one backup, with facility backup just one backup tunnel per NHOP is required to protect against a link failure, and one NNHOP backup tunnel is required to protect against a node failure. Of course, an NNHOP backup tunnel protects not only against the failure of a node (the bypassed node) but also against the failure of the link between the immediately upstream node and the bypassed node. As discussed later, there are some benefits in setting up both NHOP and NNHOP backup tunnels. More accurately, a small set of backup tunnels may be required if bandwidth protection must be guaranteed (see Section 5.15 for more details on bandwidth protection), but the key point is that the number of required backup tunnels is not a function of the number of TE LSPs in the MPLS network, which is a crucial property to preserve scalability.

In Figure 5.12, a single NNHOP backup tunnel (bypass) is configured on R3

(PLR) to protect any fast reroutable TE LSP traversing the node R3 and following

the R3-R4-R5 path against a failure of the link R3-R4 or the node R4 (indeed, the

same NNHOP backup tunnel can be used in both failure scenarios). R5 is the merge

point. Hence, for instance, the two fast-reroutable TE LSPs T1 and T2 are pro-

tected by the NNHOP bypass tunnel B1 that follows the path R3-R10-R11-R5.

Let us now consider a fast-reroutable TE LSP T1 that follows the path R2-R3-R4-R5-R6. As shown in Figure 5.12, the corresponding labels are distributed in RSVP Resv messages (R5 distributes the label "3" to R4, R4 distributes the label "2" to R3, and R3 distributes the label "1" to R2). In this example, a bypass tunnel B1 starting at the PLR R3 is also set up to protect against a failure of the link R3-R4 and a node failure of R4. The corresponding labels are depicted in Figure 5.12.

Note: In the case of an NHOP backup tunnel, this is often referred to as MPLS TE

Fast Reroute link protection. When the backup tunnel is an NNHOP backup tunnel,

this is usually called MPLS TE Fast Reroute node protection.

Figure 5.11 One-to-one backup: Example of the mode of operation when the node R4 fails and the protected TE LSP T1 is locally rerouted by the PLR R3 onto its Detour LSP D1. (Figure labels: routers R1-R9; T1; D1; labels 1, 10, 11, 12.)

A PLR can have NHOP and NNHOP backup tunnels. Furthermore, a PLR

can have multiple NHOP backup tunnels and multiple NNHOP backup tunnels

between a pair of LSRs to guarantee the bandwidth to the protected LSPs. This is

discussed in detail in Section 5.15.

Let us now consider a node failure and see the mode of operation of facility backup (Figure 5.13). As shown in Figure 5.13, in the case of a node failure of R4, as soon as the failure is detected by the PLR (R3), each protected TE LSP following the path R3-R4-R5 is rerouted onto the bypass tunnel B1. The rerouting operation consists of swapping the incoming label to the appropriate outgoing label, pushing an additional label corresponding to the backup tunnel, and redirecting the traffic onto the outgoing interface of the backup tunnel. The "appropriate" label is the label expected at the MP for the protected TE LSP.

Figure 5.12 Facility backup operation. (Figure labels: routers R1-R12; protected TE LSPs T1 and T2; bypass tunnel B1; labels 1, 2, 3, 10, 11; IP packet.)

Figure 5.13 Facility backup: Example of the mode of operation when the node R4 fails and the protected TE LSP T1 is locally rerouted by the PLR R3 onto the NNHOP backup tunnel B1. (Figure labels: routers R1-R12; T1 and T2; bypass tunnel B1; labels 1, 3, 10, 11; IP packet.)

It is worth elaborating on what the expected label is. Let us consider the following two situations:

Situation 1: The backup tunnel is an NHOP backup tunnel, in which case the MP is also the PLR's NHOP for the protected LSP before the failure occurs. Upon link failure, the PLR performs the same swap as before the failure (no change of outgoing label); the MP then receives the same label as before the failure, but on a different interface.

This is illustrated in Figure 5.14.

In Figure 5.14, an NHOP backup tunnel B1 is set up from R3 to R4 along the path R3-R8-R4, protecting against a failure of the link R3-R4. The backup label distributed by R8 to R3 is 10, and penultimate hop popping (PHP) is performed between R8 and R4. Once the link failure is detected by the PLR (R3 in this example), for all the protected TE LSPs traversing the link R3-R4, the PLR R3 performs the following operations:

• Label swap of the protected TE LSP, using the same label as before the failure
• Push of the label corresponding to the NHOP backup tunnel
• Redirection of the traffic onto the backup tunnel's outgoing interface

Figure 5.15 shows the situation after the link R3-R4 has failed and the PLR R3 has

locally rerouted the protected TE LSP T1 onto the NHOP backup tunnel B1.

Figure 5.14 Facility backup: Example of the mode of operation when the node R4 fails and the protected TE LSP T1 is locally rerouted by the PLR R3 onto the NHOP backup tunnel B1. (Figure labels: routers R1-R12; protected TE LSP T1; NHOP backup tunnel B1; labels 1, 2, 3, 10; IP packet.)

The PLR R3 performs the following operations to locally reroute the protected

TE LSP T1 onto the NHOP backup tunnel B1: R3 swaps 1 to 2 (as before), pushes

the label 10 and redirects the traffic onto B1’s outgoing interface (R3-R8). R4

(the MP) receives a label-switched packet containing the same label as before the

failure but from a different interface.

Situation 2: With an NNHOP backup tunnel, the MP is the PLR's next-next hop for the protected LSP before the failure. So the PLR must perform a swap such that the MP receives a label-switched packet with the expected label (but on a different interface).

To highlight this mechanism, let us consider the example depicted in Figure

5.16. Remember, at steady state (without any failure) the label swapping operation

performed by R3 for the fast-reroutable TE LSP T1 is 1 to 2. In the case depicted in

Figure 5.16, the MP R5 expects to receive a label 3 (label distributed by R5 to R4

for T1). So when the failure of the link R3-R4 or the node R4 occurs, R3 must swap

1 to 3 (instead of 2 before the failure), push the label 10, and redirect the traffic onto

B1's outgoing interface (R3-R10). This way, R5 (the MP) receives the same packet as before the failure, but on a different interface. By default, the PLR does not know the label used between its NHOP LSR and its NNHOP LSR; it only learns, from its direct downstream neighbor, the label it must use for the TE LSP. An extension to an existing RSVP object (the RRO) is used to learn the label used between the NHOP and the NNHOP LSR (that signaling extension is described in Section 5.14).
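Both situations follow the same rule: swap to the label the MP expects, then push the bypass label. A minimal Python sketch, assuming the NNHOP label has already been learned (for example through the RRO extension just mentioned); the class and field names are hypothetical, the labels in the usage example are those of Figures 5.14 to 5.16 as described in the text, and the label stack is written top label first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProtectedLsp:
    in_label: int               # label received from the upstream LSR
    nhop_label: int             # label advertised by the NHOP (used before the failure)
    nnhop_label: Optional[int]  # label the NNHOP expects, assumed learned via the RRO


@dataclass
class Bypass:
    kind: str                   # "NHOP" or "NNHOP"
    label: int                  # label for the first hop of the bypass tunnel
    out_if: str                 # outgoing interface of the bypass tunnel


def local_repair(lsp: ProtectedLsp, bypass: Bypass) -> Tuple[List[int], str]:
    """Label stack (top first) and interface used by the PLR after the failure."""
    if bypass.kind == "NHOP":
        inner = lsp.nhop_label          # same swap as before the failure
    elif bypass.kind == "NNHOP":
        if lsp.nnhop_label is None:
            raise ValueError("label expected by the NNHOP is unknown")
        inner = lsp.nnhop_label         # swap to the label expected by the MP
    else:
        raise ValueError(f"unknown bypass kind: {bypass.kind}")
    return [bypass.label, inner], bypass.out_if  # bypass label pushed on top


if __name__ == "__main__":
    t1 = ProtectedLsp(in_label=1, nhop_label=2, nnhop_label=3)
    print(local_repair(t1, Bypass("NHOP", 10, "R3-R8")))    # ([10, 2], 'R3-R8')
    print(local_repair(t1, Bypass("NNHOP", 10, "R3-R10")))  # ([10, 3], 'R3-R10')
```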

Figure 5.15 Situation after the failure of the link R3-R4 and the PLR R3 has locally rerouted the protected TE LSP T1 onto the NHOP backup tunnel B1. (Figure labels: routers R1-R12; T1; NHOP backup tunnel B1; labels 1, 2, 3, 10; IP packet.)

Important notes:

Note 1: An identical operation is performed for every protected LSP rerouted

onto the same backup tunnel; indeed, with facility backup, the same backup

LSP is used for all the rerouted TE LSPs that intersect the backup tunnel on

both the PLR and the MP.

This is illustrated in Figure 5.17, which shows two primary tunnels T1 and T2 that followed the paths R1-R3-R4-R5-R6 and R7-R3-R4-R5-R6, respectively, before the failure. The labels in use for T1 are 100 (between R1 and R3), 101 (between R3 and R4), 102 (between R4 and R5), and PHP (between R5 and R6); for T2, they are 110 (between R7 and R3), 111 (between R3 and R4), 112 (between R4 and R5), and PHP (between R5 and R6). Because both T1 and T2 intersect at R3 and R5, the same NNHOP backup tunnel B1 can be used in the case of failure of the link R3-R4 or of the node R4. This is of course a very important scaling property of facility backup, which relies on MPLS label stacking. Note also that the same property applies to NHOP backup tunnels.

Note 2: In both cases (NHOP and NNHOP bypass tunnels), no additional RSVP states are created along the backup paths for the rerouted TE LSPs. In other words, as far as the control plane is concerned, the LSRs along the backup path do not ‘‘see’’ the rerouted TE LSPs. This is also crucial for the scalability of this solution.
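Notes 1 and 2 can be illustrated together with a small Python sketch of the PLR's post-failure forwarding entries for facility backup, using the T1 and T2 labels given for Figure 5.17. This is a simplified model, not a router's actual data structure: `facility_backup_entries` is a hypothetical helper and the bypass label 10 is an arbitrary example value. Every protected TE LSP gets the same bypass label on top of its stack, so the LSRs along B1 switch only on that top label and hold no per-LSP state.

```python
from typing import Dict, List, Tuple


def facility_backup_entries(
    protected: Dict[str, Tuple[int, int]], bypass_label: int
) -> Dict[int, List[int]]:
    """Post-failure forwarding entries at the PLR for facility backup.

    protected maps a TE LSP name to (incoming label at the PLR, label the
    merge point expects). All entries share one bypass label, so a single
    backup tunnel serves every protected TE LSP crossing both the PLR and the MP."""
    entries: Dict[int, List[int]] = {}
    for in_label, mp_label in protected.values():
        # Swap to the MP's label, then push the shared bypass label (top first).
        entries[in_label] = [bypass_label, mp_label]
    return entries


# T1 and T2 at R3 in Figure 5.17: incoming label -> label expected by R5.
lsps = {"T1": (100, 102), "T2": (110, 112)}
print(facility_backup_entries(lsps, bypass_label=10))
# {100: [10, 102], 110: [10, 112]}
```

Adding a third protected TE LSP would add one more entry at the PLR but no new backup tunnel and no new state on the bypass path.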

Figure 5.16 Situation after the failure of the link R3-R4, once the PLR R3 has locally rerouted the protected TE LSP T1 onto the NNHOP backup tunnel B1.



5.5.5 Properties of a Traffic Engineering LSP

When using MPLS TE local protection, there are three properties a TE LSP can

have:

1. Fast Reroute desired

2. Bandwidth protection desired

3. Node protection desired

Fast Reroute desired TE LSP: Fast Reroute can be used for some TE LSPs only (as already stated, such TE LSPs are called fast-reroutable TE LSPs), so if a backup tunnel has been configured on a PLR, only the TE LSPs signaled as ‘‘fast reroutable’’ will be fast rerouted in the case of a failure. Typically, this provides fast recovery using local protection to a subset of TE LSPs having stringent recovery requirements (e.g., the TE LSPs carrying sensitive traffic like VoIP or ATM-over-MPLS), whereas other TE LSPs carrying less sensitive traffic (e.g., Internet traffic) will be rerouted using TE LSP reroute. This requires the ability to explicitly signal the fast-reroutable property of a TE LSP. The details of the signaling aspects are covered in Section 5.15.

Bandwidth protection desired: The notion of bandwidth protection is extensively covered in Section 5.15, but here is a high-level description of this important notion. The previous section described the mode of operation of Fast Reroute for both the facility backup and the one-to-one backup methods. When a TE LSP is signaled, one of its attributes is its bandwidth. A TE LSP is said to be bandwidth protected at a node R only if it can be fast rerouted and the selected backup tunnel offers a bandwidth equivalent to what the TE LSP used to receive along the primary path (before the failure). In other words, the TE LSP does not suffer any QoS degradation along the alternate path. Note that the QoS may be a function not just of the bandwidth but also of the propagation delay or jitter. Section 5.15 details how backup paths can be computed to provide such guarantees. When signaled, a protected TE LSP can explicitly request bandwidth protection.

Figure 5.17 Illustration of the use of the MPLS stacking property by facility backup: Several protected TE LSPs are rerouted onto a single NNHOP backup tunnel B1 upon R4 node failure.

Node protection desired: In some cases, also further discussed in Section 5.15, it might not be possible for a PLR to find both an NHOP and an NNHOP backup tunnel offering full bandwidth protection. For example, consider the simple case of three routers R1, R2, and R3 connected in a row, where the R1-R2 link bandwidth is 20 Mbps and the R2-R3 link bandwidth is 10 Mbps. The PLR (R1) may then try to find an NHOP backup tunnel with 20 Mbps worth of bandwidth and an NNHOP backup tunnel with min(20, 10) = 10 Mbps worth of bandwidth. Suppose that no such NNHOP backup tunnel can be found, but only an NNHOP backup tunnel of 5 Mbps. Then, as new TE LSPs requesting bandwidth protection are signaled, it may happen that no NNHOP backup tunnel offering bandwidth protection can be found for some of them. In this case, an additional signaled parameter explicitly requesting node protection is desirable and can be used as a tie-breaker. So if the PLR has two requests for bandwidth protection and cannot select the NNHOP backup tunnel for both of them because of insufficient bandwidth on that tunnel, it can preferably select the NNHOP backup tunnel for the TE LSP that has expressed a desire to get node protection in addition to bandwidth protection. Such a parameter has been standardized in [FAST-REROUTE] and is described in Section 5.15.
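The tie-break in the R1-R2-R3 example can be sketched as a greedy assignment. This is an illustration under stated assumptions, not the selection algorithm of any implementation: `ProtectionRequest` and `select_backups` are hypothetical names, and the two 4 Mbps requests are invented to fit the 5 Mbps NNHOP backup tunnel of the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProtectionRequest:
    name: str
    bandwidth_mbps: float
    node_protection_desired: bool


def select_backups(requests: List[ProtectionRequest],
                   nnhop_free_mbps: float,
                   nhop_free_mbps: float) -> Dict[str, str]:
    """Greedy tie-break: requests flagged node-protection-desired get the first
    claim on scarce NNHOP backup bandwidth; others fall back to the NHOP tunnel."""
    choice: Dict[str, str] = {}
    # sorted() is stable: flagged requests first, each group in arrival order.
    for req in sorted(requests, key=lambda r: not r.node_protection_desired):
        if req.bandwidth_mbps <= nnhop_free_mbps:
            nnhop_free_mbps -= req.bandwidth_mbps
            choice[req.name] = "NNHOP"
        elif req.bandwidth_mbps <= nhop_free_mbps:
            nhop_free_mbps -= req.bandwidth_mbps
            choice[req.name] = "NHOP"
        else:
            choice[req.name] = "none"
    return choice


# NHOP backup: 20 Mbps; NNHOP backup: only 5 Mbps instead of min(20, 10) = 10 Mbps.
reqs = [ProtectionRequest("LSP-A", 4, False), ProtectionRequest("LSP-B", 4, True)]
print(select_backups(reqs, nnhop_free_mbps=5, nhop_free_mbps=20))
# {'LSP-B': 'NNHOP', 'LSP-A': 'NHOP'}
```

LSP-A still gets bandwidth protection against the link failure through the NHOP backup tunnel; only node protection is sacrificed.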

Notion of Class of Recovery

The various TE LSP recovery requirements mentioned earlier allow an operator

to define multiple CoRs and assign a different CoR to each TE LSP according to

its recovery requirements. For instance, very sensitive traffic like voice-over-IP/

MPLS or ATM-over-MPLS could be routed over protected TE LSPs with

bandwidth and node protection. In the case of a link or node failure, those TE

LSPs would be very quickly rerouted, while maintaining an equivalent QoS. On the

other hand, MPLS VPN traffic could be routed onto protected TE LSPs without bandwidth protection. Finally, less sensitive traffic could be routed over nonprotected TE LSPs.

Defining multiple classes of recovery provides the following two benefits:

. The set of rerouting operations can be prioritized. Indeed, every LSR

will preferably start to recover the TE LSPs that belong to the highest

CoR.

. When bandwidth protection is required, some backup capacity must be reserved in the network. With multiple CoRs, backup capacity needs to be reserved only for the TE LSPs that belong to the CoRs for which bandwidth protection is required. This can significantly reduce the required backup capacity.
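Both benefits can be made concrete with a short Python sketch. It assumes a hypothetical numbering in which CoR 0 is the highest priority and only CoR 0 requires bandwidth protection; the TE LSP names and bandwidths are invented for illustration. Nonprotected TE LSPs (the Internet traffic in the example above) are simply absent, since no backup is set up for them.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ProtectedLsp:
    name: str
    bandwidth_mbps: float
    cor: int  # class of recovery; 0 = highest priority (an assumed convention)


def reroute_order(lsps: List[ProtectedLsp]) -> List[str]:
    """Order in which an LSR processes rerouting: highest CoR first."""
    return [lsp.name for lsp in sorted(lsps, key=lambda lsp: lsp.cor)]


def backup_capacity_mbps(lsps: List[ProtectedLsp], bw_protected_cors: Set[int]) -> float:
    """Backup bandwidth to reserve: only CoRs requiring bandwidth protection count."""
    return sum(lsp.bandwidth_mbps for lsp in lsps if lsp.cor in bw_protected_cors)


lsps = [ProtectedLsp("vpn-1", 200, cor=1), ProtectedLsp("voice-1", 50, cor=0)]
print(reroute_order(lsps))              # ['voice-1', 'vpn-1']
print(backup_capacity_mbps(lsps, {0}))  # 50
```

With a single CoR covering both TE LSPs, 250 Mbps of backup capacity would have to be reserved instead of 50 Mbps.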



5.5.6 Notification of Tunnel Locally Repaired

As described earlier, upon detection of a link/node failure, the PLR immediately

starts rerouting the set of protected TE LSPs over their respective backup tunnels

(bypass tunnels or Detour LSPs). The rerouted TE LSPs may thus follow a suboptimal end-to-end path. Consequently, in addition to performing the local reroute, the PLR sends a specific RSVP Path Error message for each rerouted TE LSP to its head-end LSR to indicate that a local reroute has occurred. This type of RSVP Path Error is sometimes qualified as nondisruptive because no RSVP states are cleared; it serves as a pure indication to the head-end LSR. The receipt of such a message then triggers a reoptimization of the affected TE LSP on the head-end LSR. Indeed, as previously mentioned, MPLS TE Fast Reroute is a temporary network recovery mechanism: the protected TE LSPs are quickly and locally rerouted onto backup tunnels using a local protection technique, but the path followed by the rerouted flows might no longer be optimal. This is illustrated in Figure 5.18.

In Figure 5.18, a protected TE LSP, T1, following the path R0-R1-R2-R8 is set

up. At router R1 (PLR), T1 is protected by an NHOP backup tunnel B1 against

a failure of the link R1-R2 (B1 follows the path R1-R3-R4-R5-R2). When the link

R1-R2 fails, upon detecting the link failure, the PLR (R1) reroutes the LSP T1

onto B1 and sends a Path Error ‘‘tunnel locally repaired’’ to T1’s head-end

LSR (R0). As you can see in Figure 5.18, the path followed by T1 is not optimal

(R0-R1-R3-R4-R5-R2-R8). The receipt of the Path Error triggers a reoptimization on R0, which in turn reroutes the TE LSP T1 along the path R0-R3-R4-R5-R2-R8, which is shorter than the path followed by the rerouted flows during the failure (R0-R1-R3-R4-R5-R2-R8). In this example, we assume that all the links have the same metric. Of course, the TE LSP reoptimization should always be performed using the ‘‘make before break’’ procedure, avoiding any traffic disruption.

Figure 5.18 Notification of local repair followed by head-end reoptimization.
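The path comparison in this example can be reproduced with a plain shortest-path computation. This sketch assumes a topology made only of the links named in the text for Figure 5.18 (the figure may contain more), all with metric 1; `shortest_path` is a hypothetical helper that stands in for the head-end's path computation and ignores any other TE constraint.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

# Links named in the Figure 5.18 discussion; every metric is 1, as assumed in the text.
LINKS: List[Tuple[str, str]] = [
    ("R0", "R1"), ("R1", "R2"), ("R2", "R8"), ("R1", "R3"),
    ("R3", "R4"), ("R4", "R5"), ("R5", "R2"), ("R0", "R3"),
]


def shortest_path(src: str, dst: str,
                  failed: FrozenSet[Tuple[str, str]] = frozenset()) -> Optional[List[str]]:
    """Dijkstra over the links above, skipping failed links (either direction)."""
    graph: Dict[str, List[str]] = {}
    for a, b in LINKS:
        if (a, b) in failed or (b, a) in failed:
            continue
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr in graph.get(node, []):
            if nbr not in done:
                heapq.heappush(heap, (cost + 1, nbr, path + [nbr]))
    return None


# Path T1 follows while locally rerouted: R0-R1, then B1 (R1-R3-R4-R5-R2), then R2-R8.
during_frr = ["R0", "R1", "R3", "R4", "R5", "R2", "R8"]
after_reopt = shortest_path("R0", "R8", failed=frozenset({("R1", "R2")}))
print(len(during_frr) - 1, len(after_reopt) - 1)  # 6 5
```

The reoptimized path saves one hop; in a real deployment, R0 signals it with make-before-break before tearing down the rerouted instance.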

The head-end will also be informed of the link failure via the receipt of an IGP

update from one of the routers adjacent to the failed link. Either upon the receipt of

an RSVP Path Error notify message ‘‘tunnel locally repaired’’ or an IGP update,

the head-end triggers a TE LSP reoptimization.

Case of a Multiarea (OSPF) or Multilevel (IS-IS) Network

In the case of a multiarea (OSPF), multilevel (IS-IS), or multi-autonomous-system network, if the failure does not occur in the head-end LSR's area/level, the head-end LSR receives no IGP notification. This means that the head-end LSR relies exclusively on the receipt of the RSVP Path Error message to be informed that a local repair has been performed on a downstream node. Consider the network depicted in Figure 5.19.

In Figure 5.19, a fast-reroutable interarea TE LSP (T1) is routed from R0 to R4 and spans multiple areas. On R2, an NHOP backup tunnel B1 that follows the path R2-R5-R6-R7-R3 protects any fast-reroutable TE LSP traversing the link R2-R3 against the failure of that link. When the link R2-R3 fails, the TE LSP T1 is rerouted onto the backup tunnel B1, but in this case the head-end LSR R0 does not receive any IGP update. Indeed, the failure occurred in the backbone area, and R0, which does not reside in that area, has no visibility of its topology. A failure in the backbone area is therefore invisible to R0 (R2 might send a new summary LSA if some addresses are no longer reachable, but generally the address aggregation scheme will be such that no summary LSA will be flooded into area 0.0.0.1). Because the RSVP Path Error notify message is the only mechanism allowing the head-end LSR to be informed of a local repair that occurred on a downstream node residing outside the head-end area, a common best practice consists of sending the RSVP Path Error message in reliable mode.

Figure 5.19 Notification of local repair followed by head-end reoptimization in a multiarea routing domain.



5.5.7 Signaling Extensions for MPLS Traffic Engineering Local Protection

By contrast with MPLS global default protection and MPLS TE global protection, which do not require any signaling protocol extensions beyond those of RSVP TE defined in [RSVP-TE] for the signaling of MPLS TE LSPs, MPLS TE local protection (Fast Reroute) requires several signaling extensions. Although these extensions are undoubtedly important, a detailed understanding of them is not a prerequisite for grasping how local protection works. Consequently, the signaling aspects of Fast Reroute are covered in detail in Section 5.14.

5.5.8 Two Strategies for Deploying MPLS Traffic Engineering for Fast Recovery

As mentioned in Section 5.1, there might be several motivations for deploying

MPLS TE:

. Bandwidth optimization: So that the network resources are used in a more

efficient way. This also helps in providing better QoS.

. Providing strict QoS guarantees to some specific traffic flows.

. Fast recovery.

In some networks, there might be an interest in MPLS TE for its fast recovery

property only. In other words, bandwidth optimization and/or strict QoS guaran-

tees are not required, but the operator would like to benefit from the fast recovery

property of Fast Reroute without tuning its IGP parameters as described in

Chapter 4. This section proposes two strategies for deploying MPLS TE when the

only objective is to get fast recovery by using Fast Reroute.

For instance, consider an underutilized (or overprovisioned) network. Such a

network does not require any bandwidth optimization because it is not congested.

Also, depending on the network load, QoS guarantees could rely on the simple

assumption that no link is congested and the link loads are very low. In such a

situation, MPLS TE is not required, and paths computed by the routing protocol

are perfectly satisfactory. However, such a network may require fast recovery of

link or node failures, making Fast Reroute a good candidate. Because Fast Reroute

requires TE LSPs, the solution includes deploying TE LSPs but in a quite specific

way, which we describe in this section.

There are two strategies for deploying MPLS TE when the sole objective of the

operator is to use Fast Reroute:

1. With a full mesh of unconstrained TE LSPs

2. With one-hop unconstrained TE LSPs

Network Design with a Full Mesh of Unconstrained TE LSPs

A simple and efficient strategy is to deploy a full mesh of unconstrained TE

LSPs. An unconstrained TE LSP is an LSP without any constraint. For instance,



the required bandwidth is 0, and no affinities are defined. The only property of such a TE LSP is to be fast reroutable. Indeed, the objective is not to use the traffic engineering property of MPLS TE (in the sense of traffic engineering the flows across the network). So the available bandwidth and other TE link-related information are still flooded by the IGP TE extensions but never change.

When a head-end LSR computes a path for an unconstrained TE LSP, the

same CSPF algorithm is used as with any other TE LSP, but the obvious

outcome is that the TE LSP will follow the IGP shortest path. In other words,

the traffic routed onto unconstrained TE LSPs will follow the same paths as

IP routed traffic, but in the case of link and/or node failures, fast-reroutable

TE LSPs will be rerouted by MPLS TE Fast Reroute, which was the initial

objective.
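Why an unconstrained TE LSP follows the IGP shortest path can be seen in a minimal CSPF sketch: CSPF first prunes the links that violate the constraints, then runs a shortest-path computation on what remains. With a bandwidth requirement of 0, nothing is pruned, so the result is the plain shortest path. The `cspf` function and the three-node topology are hypothetical, and affinities are left out for brevity (an unconstrained TE LSP has none anyway).

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

Link = Tuple[str, str, int, float]  # (end A, end B, TE metric, available bandwidth in Mbps)


def cspf(links: Iterable[Link], src: str, dst: str,
         bw_required: float = 0.0) -> Optional[Tuple[int, List[str]]]:
    """Prune links that cannot satisfy the bandwidth constraint, then run Dijkstra."""
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for a, b, metric, available in links:
        if available < bw_required:  # never true for an unconstrained (0 Mbps) TE LSP
            continue
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    best = {src: 0}
    heap = [(0, src, [src])]
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, metric in graph.get(node, []):
            candidate = dist + metric
            if candidate < best.get(nbr, float("inf")):
                best[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr, path + [nbr]))
    return None


topology = [("A", "B", 1, 10.0), ("B", "C", 1, 10.0), ("A", "C", 5, 100.0)]
print(cspf(topology, "A", "C"))                  # (2, ['A', 'B', 'C'])
print(cspf(topology, "A", "C", bw_required=50))  # (5, ['A', 'C'])
```

The first call matches what the IGP would compute; only the second, constrained call diverges from it. This is also why the unconstrained mesh can later be migrated to real traffic engineering by simply adding constraints.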

Network Design with Unconstrained One-Hop TE LSPs

If the requirement is to use Fast Reroute for link protection only, then exactly one

primary unconstrained TE LSP plus one single NHOP backup tunnel are required

for every link to protect.

The idea is to set up a one-hop tunnel following the same path as the link to protect. One way of achieving this is to set up an unconstrained TE LSP: the CSPF algorithm will then simply follow the most direct path between the head-end LSR and the tail-end LSR (the next hop of the head-end LSR in this case). Note that in this case the PLR node is also the head-end LSR. The one-hop primary TE LSP must then be configured so that all the traffic follows the TE LSP.

It is important to note that because the TE LSP is a one-hop LSP, if PHP is

used, no label is added once the traffic is routed over the primary TE LSP. Such a

strategy is depicted in Figure 5.20.

In the example shown in Figure 5.20, the objective is to protect the link R2-R3.

So a single-hop tunnel (T1) is configured from R2 to R3 and all the traffic is routed

onto this one-hop primary TE LSP through this link. T1 has no constraint, so this

TE LSP follows the path R2-R3. An NHOP backup tunnel B1 is configured between R2 and R3 with the constraint of being diversely routed from the protected link, and it follows the path R2-R8-R3. As discussed in Section 5.15, additional constraints may be added to also provide bandwidth protection. In the case of failure of the link R2-R3, the PLR (R2) will trigger Fast Reroute, and all the traffic that used to be routed over the link R2-R3 will be rerouted over B1, following the path R2-R8-R3. Then the primary TE LSP T1 will be rerouted (reoptimized) and will follow the new shortest path between R2 and R3. Finally, the routing protocol will be informed of the link failure and will recompute a new path, which may or may not follow B1’s path.
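The label operations of this design can be sketched in Python. The sketch assumes PHP is used on both T1 and B1, so R3 advertises implicit null (the reserved label value 3) for T1 and R8, B1's penultimate hop, pops B1's label. The bypass label 20 and the helper name `labels_imposed` are hypothetical.

```python
from typing import List, Optional

IMPLICIT_NULL = 3  # reserved MPLS label value used to request penultimate hop popping


def labels_imposed(t1_out_label: int, bypass_label: Optional[int] = None) -> List[int]:
    """Labels R2 puts on a packet routed onto the one-hop TE LSP T1, top first.

    Normally R3 is both T1's tail-end and R2's next hop, so with PHP it
    advertises implicit null and no label is added. During Fast Reroute, R2
    pushes B1's label on top of whatever T1 carries."""
    stack = [] if t1_out_label == IMPLICIT_NULL else [t1_out_label]
    if bypass_label is not None:
        stack.insert(0, bypass_label)
    return stack


print(labels_imposed(IMPLICIT_NULL))                   # []: plain IP on R2-R3
print(labels_imposed(IMPLICIT_NULL, bypass_label=20))  # [20]: on R2-R8 during Fast Reroute
```

So in normal operation the one-hop LSP costs nothing in encapsulation, and during the failure R3 still receives plain IP packets once R8 has popped B1's label.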

The same configuration has to be repeated for each link to protect using Fast Reroute. Note that if the link R2-R3 is protected using SONET/SDH protected VCs, Fast Reroute may also be used to protect against a router interface failure on the R2 or R3 side. In that case, one must ensure that both mechanisms are not simultaneously triggered. This aspect is covered in Chapter 6. To alleviate the configuration burden, existing implementations support mechanisms that automate the creation of both the primary and the backup TE LSPs, because in this case their set of attributes is known in advance. The only constraint of the backup tunnel is to be diversely routed from the link to protect (some implementations support the computation of SRLG-diverse paths).

Protection against link and node failures: To guard against both link and node

failures, a similar approach is followed, with the only difference that at each

hop, both one unconstrained TE LSP and one NNHOP backup tunnel per

next-next hop must be configured.

Why are one primary and one NNHOP backup tunnel required per NNHOP?

Let us consider the example in Figure 5.21. As shown in Figure 5.21, in the case

of a node failure of R3, all the traffic traversing the protected LSR needs to be

rerouted onto some appropriate backup tunnels. That requires setting up one

primary TE LSP for each possible traffic path traversing the protected node. In

Figure 5.21, there are three paths leaving R2 that traverse the node R3 to consider:

R2-R3-R4, R2-R3-R7 and R2-R3-R8. So three unconstrained TE LSPs are config-

ured and set up on R2: T1, T2, and T3. Because each of these tunnels needs to be

rerouted over a diversely routed backup tunnel, three NNHOP backup tunnels are

configured: B1 protecting the traffic following the path R2-R3-R4 and routed onto

the tunnel T1, B2 protecting T2, and finally B3 protecting T3. As in the case of link

protection, the protected TE LSPs are unconstrained and follow the shortest IGP

path.

This explains the requirement for one unconstrained TE LSP and one backup tunnel per NNHOP. In the previous example, the number of NNHOPs of R2 through R3 is equal to 3: R4, R7, and R8.
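The counting rule can be sketched in a few lines of Python. The adjacency list below is hypothetical and only encodes what the text states about Figure 5.21 (R3's neighbors are R2, R4, R7, and R8); the helper names are invented for illustration. For comparison, link protection alone needs two TE LSPs per protected link.

```python
from typing import Dict, List


def nnhops_through(adjacency: Dict[str, List[str]], plr: str, protected: str) -> List[str]:
    """Next-next hops the PLR reaches through the protected node."""
    return sorted(n for n in adjacency[protected] if n != plr)


def tunnels_for_node_protection(adjacency: Dict[str, List[str]], plr: str, protected: str) -> int:
    """One unconstrained two-hop primary plus one NNHOP backup per next-next hop."""
    return 2 * len(nnhops_through(adjacency, plr, protected))


# R3's neighbors as described for Figure 5.21 (hypothetical adjacency list).
adjacency = {"R3": ["R2", "R4", "R7", "R8"]}
print(nnhops_through(adjacency, "R2", "R3"))               # ['R4', 'R7', 'R8']
print(tunnels_for_node_protection(adjacency, "R2", "R3"))  # 6
```

The six TE LSPs correspond to T1, T2, T3 and B1, B2, B3 in the example.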

Figure 5.20 Deploying MPLS TE Fast Reroute with one-hop tunnel to protect against link failure.



Important note: Unlike the previous case of link protection, the traffic must start flowing onto the primary two-hop tunnels only when the failure occurs.

Comparison of Both Approaches

Both the ‘‘unconstrained full mesh TE LSPs’’ and the ‘‘one-hop unconstrained TE LSPs’’ approaches can be used, and each has its pros and cons. The one-hop unconstrained approach clearly has the advantage of requiring the configuration and setup of a very limited number of TE LSPs. If just link protection is required, for every link to protect with Fast Reroute, just two TE LSPs are required: the primary one-hop TE LSP and an NHOP backup tunnel diversely routed from the link to protect. If node protection is required, one pair of TE LSPs (primary and backup) is needed for every next-next hop, as described earlier, which is still a very reasonable number. Note that at the time of publication, commercial implementations support only the one-hop unconstrained approach. Moreover, some implementations ease the configuration process by automating the configuration of such primary and backup TE LSPs with very few commands.

On the other hand, the unconstrained full mesh TE LSPs approach also offers a

very easy migration path to the use of MPLS TE for other purposes like bandwidth

optimization and strict QoS guarantees. Indeed, if at some point, one of those

requirements appears, the operator will just need to set constraint(s) on the TE

LSPs. For instance, bandwidth can be configured and then the TE LSPs will start

using alternate path(s), if required.

In terms of existing implementations, some solutions are available that auto-

mate the configuration process when setting up a full mesh of TE LSPs. In a

nutshell, those solutions rely on several components:

. A discovery process is in charge of discovering the members of a mesh. In some MPLS TE networks, there might be multiple TE LSP meshes: for instance, one mesh of TE LSPs between LSRs acting as VoIP gateways, and another full mesh of TE LSPs between routers carrying the Internet traffic. Each TE mesh has its own set of characteristics in terms of bandwidth, priority, and protection/restoration, to mention just a few requirements. Each router then uses an IGP extension (OSPF or IS-IS) to advertise that it is a member of one or multiple TE meshes. This mechanism allows every router to discover all the other routers that belong to the same TE mesh.

. Then, once a router has discovered all the routers that belong to the same mesh, it can use a ‘‘template’’ (where the constraints specific to that particular mesh are locally specified) to set up the mesh of TE LSPs. Note that in this particular context of using MPLS TE for fast recovery only, the template is very restricted because the primary TE LSPs are unconstrained.

Figure 5.21 Deploying MPLS TE Fast Reroute with two-hop tunnel to protect against both link and node failures.
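The two components (discovery, then template) can be sketched in Python. This is a hypothetical model, not a real IGP encoding: `memberships` stands for what each router advertised in the IGP, and `MeshTemplate` and `lsps_to_signal` are invented names. In a fast-recovery-only deployment the template carries no constraint beyond the fast-reroutable property.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class MeshTemplate:
    bandwidth_mbps: float = 0.0  # 0: unconstrained, fast-recovery-only mesh
    fast_reroute: bool = True


def lsps_to_signal(me: str, memberships: Dict[str, Set[str]],
                   templates: Dict[str, MeshTemplate]) -> List[Tuple[str, str, MeshTemplate]]:
    """memberships: router -> mesh groups it advertised in the IGP (discovery step).
    Returns (mesh, tail-end, template) for every TE LSP this head-end must set up."""
    lsps = []
    for mesh in sorted(memberships.get(me, set())):
        for router in sorted(memberships):
            if router != me and mesh in memberships[router]:
                lsps.append((mesh, router, templates[mesh]))
    return lsps


memberships = {"R1": {"voice", "internet"}, "R2": {"voice"},
               "R3": {"internet"}, "R4": {"voice", "internet"}}
templates = {"voice": MeshTemplate(), "internet": MeshTemplate()}
for mesh, tail, _ in lsps_to_signal("R1", memberships, templates):
    print(mesh, tail)
# internet R3
# internet R4
# voice R2
# voice R4
```

When a new router joins a mesh and advertises it, rerunning the computation on every member yields the additional TE LSPs without any per-router configuration.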

In terms of IGP, both methods are equivalent. The TE-related information is flooded by the IGP but never changes, because the TE LSPs are unconstrained and never reserve bandwidth.

5.6 Another MPLS Traffic Engineering Recovery Alternative

Another MPLS TE recovery alternative has been proposed but never got any

traction in the industry because of severe limitations: 1+1 packet protection, whose principle is to permanently bridge the IP/MPLS traffic over two diversely

routed TE LSPs. The traffic bridging is made on the head-end LSR, and the

decision to switch the traffic is performed by the tail-end LSR, which permanently

compares the two identical received flows from the primary and secondary TE

LSPs. When a failure occurs in the network, the traffic received from one of the

TE LSPs is affected. Once the tail-end LSR detects the failure, it switches to the

secondary TE LSP. Note that such a mechanism is also called a single-ended

protocol because the switching decision process is made by a single entity (the tail-

end LSR in this case) without requiring any signaling exchange between the nodes.

A failure may be a traffic interruption, an unacceptable error rate, or any other kind of defect. Once the tail-end LSR has performed the switch, it can either stay indefinitely on the newly selected TE LSP (switching to the other TE LSP, once restored, only if the currently selected TE LSP fails) or switch back to the original TE LSP once it is restored.
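The tail-end decision logic, including the two behaviors after a switch (staying on the new TE LSP or switching back), can be sketched as a small state machine. `OnePlusOneSelector` is a hypothetical name, and defect detection is abstracted into a per-LSP health flag supplied by the caller.

```python
from typing import Dict


class OnePlusOneSelector:
    """Tail-end selector for 1+1 packet protection: the head-end bridges the
    traffic onto both TE LSPs, and the tail-end alone picks which copy to keep."""

    def __init__(self, revertive: bool) -> None:
        self.revertive = revertive
        self.active = "primary"

    def update(self, healthy: Dict[str, bool]) -> str:
        other = "secondary" if self.active == "primary" else "primary"
        if not healthy[self.active] and healthy[other]:
            self.active = other  # switch away from the defective TE LSP
        elif self.revertive and self.active == "secondary" and healthy["primary"]:
            self.active = "primary"  # revertive mode: return once primary is restored
        return self.active


events = [{"primary": False, "secondary": True},  # defect on the primary TE LSP
          {"primary": True, "secondary": True}]   # primary restored
for mode in (False, True):
    selector = OnePlusOneSelector(revertive=mode)
    print(mode, [selector.update(e) for e in events])
# False ['secondary', 'secondary']
# True ['secondary', 'primary']
```

No message is exchanged with the head-end in either mode, which is what makes this a single-ended scheme.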

Although this mechanism is simple and efficient in terms of recovery time, it has

two major drawbacks that drastically limit its applicability:

. The amount of traffic forwarded in the network is doubled for each TE LSP protected with this 1+1 mechanism. This is a serious issue because it basically implies a bandwidth wastage of at least 50%.⁶⁶

. The failure discovery at the tail-end LSR usually requires some hardware changes and thus equipment replacement, which can also be expensive.

⁶⁶ This technique implies a bandwidth wastage of at least 50% because one of the constraints of the backup TE LSP is to be disjoint from the protected TE LSP, which usually means that it will follow a longer path.

For those reasons, such a mechanism has never been implemented or deployed but

is just mentioned here for the sake of completeness in describing MPLS TE recovery

techniques.

5.7 Load Balancing

Load balancing is a technique to forward the traffic from a source to a destination

across multiple paths. With equal load balancing, the traffic is balanced across

multiple equal-cost paths. IGPs such as OSPF and IS-IS perform equal load balancing. This can be done on a per-packet basis (packets are sent along N equal-cost paths using a round-robin algorithm) or via more sophisticated techniques, described in Chapter 4, that avoid packet reordering. With MPLS TE, both equal and unequal

load balancing are supported. For instance, if there are two TE LSPs, T1 and T2

between two LSRs, LSR1 and LSR2, with respective bandwidth Bw1 and Bw2,

then LSR1 can decide to balance the traffic whose destination is LSR2 (or beyond)

in proportion to the respective bandwidths Bw1 and Bw2. Usually load balancing in

MPLS TE–enabled networks is used when a single path obeying the set of con-

straints cannot be found between two LSRs. For instance, a TE LSP of B Mbps is

required and no path with the required amount of bandwidth is available. Then the

solution is to set up N LSPs so that the sum of their bandwidths is equal to B. An additional constraint, such as path diversity (the sets of network elements traversed by the TE LSPs are disjoint), can be added when the path computation of the set of N LSPs is performed.
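Unequal load balancing in proportion to Bw1 and Bw2 can be sketched with a flow hash, so that all packets of a flow stay on the same TE LSP and are not reordered. This is one possible technique, assumed here for illustration rather than taken from a specific implementation; `pick_lsp`, the flow keys, and the 30/10 Mbps values are hypothetical.

```python
import bisect
import hashlib
from itertools import accumulate
from typing import List


def pick_lsp(flow_key: bytes, bandwidths_mbps: List[float]) -> int:
    """Index of the TE LSP carrying a flow, chosen in proportion to bandwidth.

    Hashing the flow key keeps every packet of a flow on the same TE LSP,
    which avoids reordering within the flow."""
    cumulative = list(accumulate(bandwidths_mbps))
    digest = hashlib.sha256(flow_key).digest()
    fraction = (int.from_bytes(digest[:8], "big") % 1_000_000) / 1_000_000  # in [0, 1)
    return bisect.bisect_right(cumulative, fraction * cumulative[-1])


# Hypothetical case: Bw1 = 30 Mbps (T1) and Bw2 = 10 Mbps (T2) between LSR1 and LSR2.
counts = [0, 0]
for i in range(10_000):
    counts[pick_lsp(f"flow-{i}".encode(), [30.0, 10.0])] += 1
print(counts)  # close to a 3:1 split
```

Balancing per flow rather than per packet matches the bandwidth ratio only statistically, since flows differ in size; that is the price paid for avoiding reordering.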

Strictly speaking, load balancing is not an MPLS TE recovery technique, so why

dedicate a section to it?

Because a positive side effect of load balancing is that when the flow between two

points is balanced across multiple paths, the probability of simultaneous failures of

all those paths is reduced compared to a single path, especially if those paths are

explicitly diversely routed. Hence, the overall availability is increased. This property

has been used by some operators to reduce the impact of network failure on specific

flows.

Let us illustrate that statement through the example in Figure 5.22. In this case,

strictly speaking, the network availability is not increased but the impact of a

network element failure on the traffic flows between two points is reduced.

In Figure 5.22, the two POPs of Sevilla and Barcelona each consist of two VoIP gateways and one LSR connected to the core of the network. The VoIP traffic is carried onto TE LSPs. In this case, even if all the traffic between LSR1 and LSR2 could be carried onto a single TE LSP, two diversely routed TE LSPs (T1 and T2) are established between LSR1 and LSR2 (with the same bandwidth or different bandwidths), and the traffic is balanced onto those two TE LSPs. In the case of a network failure, only a portion of the traffic between LSR1 and LSR2 is affected (e.g., only the traffic carried onto T1 if the failure affects T1).

That said, we must admit that such a design choice has the following two drawbacks:

. The amount of state in the network increases noticeably: at least two TE LSPs are required between two LSRs.

. The constraint of computing diverse paths may result in nonoptimal paths compared to a single TE LSP.

On the other hand, the impact of a single element failure on the voice traffic between the two POPs is reduced.

Note: One can, for example, increase the capacity of each TE LSP to absorb the excess traffic resulting from the failure of one TE LSP. For instance, if N TE LSPs of B Mbps are set up between two routers R1 and R2 (let us call it a bundle of N TE LSPs), allocating B * N/(N-1) Mbps to each of them instead of B Mbps allows the bundle to survive the failure of any one of them. In this case, the backup capacity reserved in the network is exactly equal to the capacity of one TE LSP in the bundle.
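The arithmetic behind this note can be checked with exact fractions: the N-1 surviving TE LSPs must still carry N * B Mbps, so each needs B * N/(N-1) Mbps, and the extra capacity over the whole bundle is N * B/(N-1), that is, exactly one inflated TE LSP. `bundle_sizing` is a hypothetical helper and the values N = 4, B = 30 are arbitrary.

```python
from fractions import Fraction
from typing import Tuple


def bundle_sizing(n: int, b_mbps: int) -> Tuple[Fraction, Fraction]:
    """Inflate each of N TE LSPs from B to B*N/(N-1) Mbps so that the N-1
    survivors can still carry N*B Mbps after one of them fails.
    Returns (per-LSP reservation, total extra capacity reserved)."""
    if n < 2:
        raise ValueError("a bundle needs at least two TE LSPs")
    per_lsp = Fraction(b_mbps) * n / (n - 1)
    extra = n * per_lsp - n * b_mbps
    return per_lsp, extra


per_lsp, extra = bundle_sizing(4, 30)
print(per_lsp, extra)  # 40 40
```

With N = 4 and B = 30 Mbps, each TE LSP reserves 40 Mbps, and the 40 Mbps of extra capacity equals one TE LSP of the bundle.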

Figure 5.22 An example of MPLS TE load balancing.



5.8 Comparison of Global and Local Protection

As previously described in Chapter 1, the evaluation of a recovery mechanism

requires the consideration of several parameters: scope of recovery (link, node,

SRLG), recovery time, guaranteed bandwidth, backup capacity requirements, state

overhead, scalability, reordering, additive latency and jitter, signaling requirements,

stability, and others. Throughout this chapter, we saw several MPLS TE recovery techniques, so the natural question that comes to mind is: which one to use? Although there is no unique answer because each network has its own constraints and requirements, the aim of this section is to provide a comparison of the global protection and local protection techniques with a particular focus on three key performance aspects:

. The recovery time

. The state overhead, which is directly correlated to the scalability

. The ability to perform bandwidth sharing when bandwidth protection is

required

5.8.1 Recovery Time

With global protection, rerouting is performed by the head-end LSR, which therefore must receive the failure notification before it can reroute the affected traffic onto the respective backup paths (which have been precomputed and signaled). So in terms of recovery time, the delta between global and local protection is the propagation time of the failure indication signal to the head-end LSR. How large this delta is depends highly on the network characteristics. For instance, a network confined to a small country generally implies short propagation delays (less than 10 ms); on the other hand, an international network may easily experience much longer propagation delays, up to a few hundred milliseconds. In that case, convergence within a few tens of milliseconds requires the use of local protection techniques. Furthermore, queuing delays to process the control plane notification (RSVP and/or IGP) messages can be reduced via the use of QoS mechanisms.

Note that in terms of recovery time, the two local protection schemes described

earlier (i.e., ‘‘one-to-one’’ and ‘‘facility backup’’) are equivalent; they both rely on

local protection where fast-reroutable TE LSPs are locally rerouted on presignaled

backup tunnels and then reoptimized by their respective head-end LSR.

In summary, as far as the recovery time is concerned, the key difference between local and global protection is the time it takes for the failure notification to propagate to the head-end LSR, which, in the case of global protection, is made of incompressible propagation delays plus queuing delays that can be reduced by means of QoS mechanisms.
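This comparison can be turned into a back-of-the-envelope model. It uses the physical rule of thumb that light travels through fiber at roughly 200,000 km/s (about 5 µs per km). The detection time (10 ms), switchover time (5 ms), queuing delay (2 ms), and fiber distances are hypothetical values, and per-hop control plane processing is ignored. `recovery_time_ms` is an invented helper.

```python
FIBER_MS_PER_KM = 0.005  # ~5 microseconds per km: light covers ~200,000 km/s in fiber


def recovery_time_ms(detection_ms: float, switchover_ms: float,
                     notification_km: float = 0.0, queuing_ms: float = 0.0) -> float:
    """Rough recovery time. Local protection: the PLR repairs on its own, so
    notification_km = 0. Global protection: add the time the failure notification
    takes to reach the head-end LSR (propagation plus control plane queuing)."""
    return detection_ms + notification_km * FIBER_MS_PER_KM + queuing_ms + switchover_ms


local = recovery_time_ms(10, 5)
national = recovery_time_ms(10, 5, notification_km=1_000, queuing_ms=2)
intercontinental = recovery_time_ms(10, 5, notification_km=15_000, queuing_ms=2)
print(local, national, intercontinental)
```

Under these assumptions, local protection recovers in about 15 ms everywhere, while global protection needs roughly 22 ms when the head-end is 1,000 km of fiber away but about 92 ms at 15,000 km, which is where local protection becomes necessary for tens-of-milliseconds targets.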

5.8.2 Scalability

Scalability is undoubtedly one of the major aspects to consider when evaluating a

recovery mechanism, and in that respect, global path protection, one-to-one

backup, and facility backup local protection differ very significantly.



Scalability is a relatively generic term that requires clarification in this context.

Protection mechanisms require setting up backup tunnels before any failure to

provide fast convergence (by contrast with global default restoration, the backup

path is already computed and signaled). The configuration of backup tunnels can be

facilitated via an automatic process, but setting up backup tunnels in a network is

not entirely cost free. Although the scalability of RSVP is very high, in large networks the number of backup tunnels can be significant, as shown below, which may require routers to handle a large number of states. Moreover, troubleshooting becomes more complicated for the team in charge of operating the network. So, in this context, scalability is considered in terms of the number of required backup tunnels.

Let us evaluate the number of required backup tunnels with global path

protection, Fast Reroute facility backup, and one-to-one, based on the following

assumptions:

D: network diameter (average number of hops between a head-end LSR and a

tail-end LSR)

C: degree of connectivity (average number of neighbors)

L: total number of links to be protected with Fast Reroute⁶⁷

N: total number of nodes (LSRs)

T: total number of protected TE LSPs in the MPLS network

Bu: number of backup tunnels required

K: number of classes of recovery (as mentioned in Section 5.5.5, there might be several classes of TE LSPs, each requiring a different CoR; in this case, each CoR has a dedicated set of backup tunnels)

S: average number of splits (as discussed in Section 5.15, in some cases where

bandwidth protection is required and backup bandwidth is a very scarce

resource, more than one backup tunnel per protected link/node may be

required if a single backup tunnel with enough bandwidth cannot be found)

Note: Realistic assumptions for S and K are as follows:

. $S < 4$: Generally S will very rarely exceed 3. In a network where bandwidth protection is required but backup capacity is not a very scarce resource, $S = 1$. If bandwidth protection is not required, then $S = 1$.

. Also $K < 3$.

M: number of meshes in the network (e.g., there may be multiple meshes of TE

LSPs in a network serving different purposes: one mesh for the voice traffic and

one mesh for the data traffic).

It follows that

$$L \le N \times C \quad \text{(because some links may not be protected by Fast Reroute)}$$

$$T = M \times N \times (N-1) \quad \text{(assuming a full mesh TE deployment)}$$

⁶⁷ Some links may be protected via other means like SONET/SDH and optical protection/restoration.



Let us now compute the total number of required backup tunnels Bu with global

path protection, Fast Reroute one-to-one, and facility backup.

1. Computation of Bu with global path protection

The number of backup tunnels is equal to the number of primary TE LSPs:

$Bu = T = M \cdot N \cdot (N-1)$

One has to keep in mind that the number of backup TE LSPs grows proportionally with the number of primary TE LSPs and, in a full mesh scenario, as the square of the number of LSRs. This can have a nonnegligible impact on overall network scalability. Consider a full mesh of 200 LSRs: the total number of primary TE LSPs in the network will be 199 × 200 = 39,800. Using path protection in this context doubles the number of TE LSPs, giving a total of 79,600 TE LSPs.
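The arithmetic of this example can be checked with a few lines of Python. The function name `full_mesh_lsps` is ours, introduced only for illustration.

```python
def full_mesh_lsps(num_lsrs: int, meshes: int = 1) -> int:
    """Primary TE LSPs in `meshes` full meshes of `num_lsrs` LSRs."""
    # In a full mesh, every LSR is the head-end of one TE LSP toward each
    # of the other (num_lsrs - 1) LSRs, and this is repeated for each mesh.
    return meshes * num_lsrs * (num_lsrs - 1)


primary = full_mesh_lsps(200)
# Global path protection adds one backup TE LSP per primary TE LSP.
total_with_path_protection = 2 * primary
print(primary, total_with_path_protection)  # 39800 79600
```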

This has nonnegligible consequences on state overhead: every head-end LSR sees the number of TE LSPs it manages doubled. This is not a major concern, however, because the number of TE LSPs on each head-end LSR is generally limited (one per other LSR in each mesh to which the head-end LSR belongs). By contrast, especially in networks that are sparsely interconnected from a layer 1/layer 2 perspective, the number of TE LSPs per midpoint LSR can be very large.

Consider the example of the network depicted in Figure 5.23. This simple network has two levels of hierarchy:

• A high-speed core backbone with high-capacity LSRs interconnected by high-speed links (OC48, OC192)

• An edge layer with a large number of smaller LSRs connected (or dual-connected) to the high-speed core via medium-speed links

The edge LSRs are fully meshed with each other (for the sake of readability, only the TE LSPs from R1, R2, and R3 to R4 are represented). Observe the number of TE LSPs traversing the high-speed core LSRs. This example shows that the number of TE LSPs per midpoint LSR can be quite high in such a network and that the proportion of TE LSPs passing through those high-speed nodes can be substantial compared to the total number of TE LSPs in the network. In typical existing networks, this proportion can be as high as 20% to 30% at steady state. In case of failure of a high-speed core link or LSR, this number would increase even further.

The scalability impact can be characterized through various aspects:

• Memory impact on the midpoint LSR: each TE LSP requires some memory to hold its RSVP states.

• State refresh: RSVP is a soft-state protocol, so the RSVP states of each TE LSP must be refreshed by exchanging RSVP Path and Resv messages at regular intervals between neighbors. Note that the impact of TE LSP refresh can be drastically reduced using two methods:

Refresh reduction: this mechanism, described in [REFRESH-REDUCTION], uses specific messages (Srefresh) so that an LSR sends a single message to its neighbor to refresh a large set of TE LSPs, instead of sending an individual RSVP Path message per TE LSP.

Refresh interval: the RSVP refresh frequency can also be decreased; in this case, other liveness mechanisms such as RSVP hellos (see Section 5.10) can be used.

• Recovery time on the midpoint LSR: when local recovery mechanisms are used on the midpoint LSRs, the number of TE LSPs to reroute may have an impact on the recovery time.
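A back-of-the-envelope sketch can estimate the effect of the two refresh methods listed above. It counts only Path refreshes in one direction toward one neighbor, and the 30 s interval, the 2000 TE LSPs, and the 300 Message_IDs carried per Srefresh message are illustrative assumptions, not values taken from [REFRESH-REDUCTION].

```python
import math
from typing import Optional


def refresh_messages(num_lsps: int, ids_per_srefresh: Optional[int] = None) -> int:
    """Refresh messages sent to one neighbor per refresh interval (one direction)."""
    if ids_per_srefresh is None:
        # Standard soft-state refresh: one Path message per TE LSP
        # (Resv refreshes in the opposite direction are ignored here).
        return num_lsps
    # Summary refresh: each Srefresh message lists many Message_IDs,
    # so the TE LSPs are refreshed in batches (ids_per_srefresh is assumed).
    return math.ceil(num_lsps / ids_per_srefresh)


def refresh_rate(num_lsps: int, interval_s: float,
                 ids_per_srefresh: Optional[int] = None) -> float:
    """Average refresh messages per second toward one neighbor."""
    return refresh_messages(num_lsps, ids_per_srefresh) / interval_s


# Hypothetical midpoint load: 2000 TE LSPs toward one neighbor.
print(refresh_rate(2000, 30.0))                        # about 66.7 msg/s
print(refresh_rate(2000, 30.0, ids_per_srefresh=300))  # about 0.23 msg/s
print(refresh_rate(2000, 90.0))                        # longer interval: about 22.2 msg/s
```

Batching and lengthening the interval act independently, which is why they can be combined; lengthening the interval is what makes a separate liveness mechanism necessary.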

2. Computation of Bu with local protection: facility backup

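Situation 1: If only links are protected with Fast Reroute, then $Bu = L \cdot K \cdot S$.

Situation 2: If both links and nodes are protected with Fast Reroute, each of the N LSRs additionally needs one NNHOP backup tunnel per (next hop, next-next hop) pair, that is, $C \cdot (C-1)$ tunnels (each of its C neighbors has C − 1 other neighbors), so:

$Bu = L \cdot K \cdot S + N \cdot C \cdot (C-1) \cdot K \cdot S = (L + N \cdot C \cdot (C-1)) \cdot K \cdot S$

3. Computation of Bu with local protection: one-to-one backup

Without merging, one Detour LSP is needed per hop of each protected TE LSP, so $Bu = T \cdot D = M \cdot N \cdot (N-1) \cdot D$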
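The three formulas can be written as small Python functions, which makes it easy to evaluate them for other parameter values. The function names are ours. The one-to-one function ignores Detour merging, as the formula does.

```python
def bu_global(m: int, n: int) -> int:
    """Global path protection, full mesh: one backup TE LSP per primary."""
    return m * n * (n - 1)


def bu_facility(l: int, n: int, c: int, k: int, s: int,
                node_protection: bool = True) -> int:
    """Facility backup: one NHOP tunnel per protected link and, with node
    protection, one NNHOP tunnel per (next hop, next-next hop) pair of each
    LSR; every tunnel is replicated per class of recovery (K) and split (S)."""
    nhop = l
    nnhop = n * c * (c - 1) if node_protection else 0
    return (nhop + nnhop) * k * s


def bu_one_to_one(m: int, n: int, d: int) -> int:
    """One-to-one backup without merging: one Detour LSP per hop of each
    protected TE LSP in a full mesh (T * D)."""
    return bu_global(m, n) * d


# Link protection only, 20 LSRs of degree 3, all links protected:
print(bu_facility(l=20 * 3, n=20, c=3, k=1, s=1, node_protection=False))  # 60
```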

Figure 5.23 State overhead with MPLS traffic engineering path protection. (Edge LSRs R1 to R5 attach to high-speed core LSRs; the figure shows primary and backup TE LSPs and highlights the high number of midpoint TE LSPs on the core LSRs.)



Now that we have the theoretical formulas, let us make a few (realistic) assumptions that will help quantify the scalability impact.

We consider a fully meshed network with the following characteristics:

• D (diameter) = 5

• C (degree of connectivity) = 4

• M (number of meshes) = 2 (one mesh for voice and one mesh for data traffic)

• K = 2 (two classes of recovery: one for voice with bandwidth protection and one for data without bandwidth protection)

• S = 2 (on average, two backup tunnels are necessary to get the required backup bandwidth between a PLR and an MP)

• All links must be protected by Fast Reroute: L = N · C

Let us now compare Bu for global path protection, Fast Reroute one-to-one, and

facility backup, using the previous formulas:

Global path protection: $Bu = M \cdot N \cdot (N-1) = 2 \cdot N \cdot (N-1)$

Local protection, facility backup (node protection): $Bu = (N \cdot C + N \cdot C \cdot (C-1)) \cdot K \cdot S = N \cdot C^2 \cdot K \cdot S = 64 \cdot N$

Local protection, one-to-one backup: $Bu = M \cdot N \cdot (N-1) \cdot D = 10 \cdot N \cdot (N-1)$

Figure 5.24 shows the value of Bu for the three MPLS recovery methods as a

function of the number of nodes in the network (from 10 to 50 nodes and from 10 to

150 nodes).

Figure 5.24 clearly shows that both global path protection and Fast Reroute one-to-one backup scale poorly in large environments. The number of backup tunnels per midpoint LSR can rapidly cause scalability issues. Indeed, in a full mesh network of very reasonable size (50 nodes), under the assumptions made above, the total number of primary TE LSPs is 4900, and the number of backup tunnels required by each MPLS TE recovery technique is as follows:

• With global path protection: 4900

• With local protection facility backup: 3200

• With local protection one-to-one backup: 24,500

Although merging of Detour LSPs can help reduce the number of backup

tunnels, their number stays very high in large networks.
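Plugging the example's assumptions into the formulas reproduces the 50-node figures. The sketch below also locates, under the same assumptions, the mesh size at which global path protection starts to need more backup tunnels than facility backup. That crossover is our own derivation from the formulas and is not a figure stated in the text.

```python
# Assumptions of the worked example above.
D, C, M, K, S = 5, 4, 2, 2, 2


def backup_tunnels(n: int) -> dict:
    """Bu for each method in a full mesh of n LSRs, with L = N * C."""
    return {
        "global": M * n * (n - 1),
        "facility": (n * C + n * C * (C - 1)) * K * S,  # = 64 * n
        "one_to_one": M * n * (n - 1) * D,               # = 10 * n * (n - 1)
    }


print(backup_tunnels(50))
# {'global': 4900, 'facility': 3200, 'one_to_one': 24500}

# Smallest mesh for which global path protection needs more backup
# tunnels than facility backup: 2N(N - 1) > 64N holds for N > 33.
crossover = next(n for n in range(2, 1000)
                 if backup_tunnels(n)["global"] > backup_tunnels(n)["facility"])
print(crossover)  # 34
```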

5.8.3 Bandwidth Sharing Capability

The last criterion we want to evaluate in this comparison is the ability to perform bandwidth sharing with global and local protection. To be cost-effective, the backup capacity (bandwidth reserved for backup tunnels) should of course be minimized. Section 5.15 will show that this goal can be efficiently met thanks to the property of bandwidth sharing under the single failure assumption. Drawing general conclusions about the respective efficiency of global and local protection as far as bandwidth sharing is concerned is almost impossible, because their relative performance is highly driven by the algorithms in place and, even more importantly, by the network topology. So the objective of this section is to provide some general facts about each of them with respect to bandwidth sharing capability.

Global Path Protection

Performing bandwidth sharing between backup paths protecting independent resources is of course possible. By contrast with local protection, which protects a single local resource, a path completely diverse/disjoint from the primary TE LSP must be found; on the other hand, the level of granularity is higher (a TE LSP is protected instead of a link or a node) and the backup capacity can be spread across the entire network.

One of the major constraints with global path protection is that it requires an

off-line computation for both the primary and the backup TE LSP when the

objective is to achieve optimal bandwidth sharing.

As already pointed out, with MPLS TE, TE LSP path computation can be performed either by an off-line tool or in a distributed fashion. If primary TE LSP path computation is done off-line, the tool can find a TE LSP placement satisfying the set of constraints while computing the corresponding backup paths so as to maximize the degree of bandwidth sharing. Although this problem is NP-complete, sophisticated algorithms have been proposed, along with a large set of heuristics, to achieve that goal. On the other hand, if primary TE LSP path computation is done in a distributed fashion, achieving bandwidth sharing between backup paths protecting independent resources would require some synchronization between all head-end LSRs, which a distributed computation does not provide by default. This would require very extensive signaling extensions and additional control plane overhead (signaling and routing), which makes this option virtually impossible and certainly not desirable.

Figure 5.24 Comparison of the set of required backup tunnels with global protection, local protection ‘‘facility backup,’’ and ‘‘one-to-one.’’ (Two panels plot the number of backup tunnels Bu for global protection, facility backup, and one-to-one backup against the number of nodes, from 10 to 50 and from 10 to 150. Assumptions: diameter = 5, degree of connectivity = 4, number of meshes = 2, number of splits = 2, number of classes of recovery = 2.)

The bottom line is that if one decides to use global path protection with bandwidth guarantees and needs to minimize the backup capacity via bandwidth sharing, the only possible option is to compute both the primary and the backup paths with an external off-line tool. Also, as the set of constraints is quite strict (primary and backup paths are computed simultaneously), this sort of solution is generally not very flexible. Indeed, a change in the bandwidth requirement of a specific subset of TE LSPs may result in a relatively large set of changes to other primary and backup TE LSPs. Algorithms may try to minimize the set of changes (this is known as the minimal perturbation problem), but this is not always possible. For instance, suppose a set of 5000 TE LSPs with their corresponding backup tunnels, for a total of 10,000 TE LSPs, all up and running in the network. After some time, the operator requests more bandwidth for a few TE LSPs (because of traffic growth in a specific region of the network). The bandwidth increase of those few TE LSPs may require displacing a significant number of other TE LSPs, especially if bandwidth is scarce. Moreover, path protection adds the constraint that diverse paths must be found while achieving optimal bandwidth sharing; these additional constraints amplify the phenomenon and the risk of ending up with significant changes, which represents a nonnegligible constraint in terms of network operations.

Local Protection: ‘‘Facility Backup’’ and ‘‘One-to-One Backup’’

As shown in Section 5.15, bandwidth sharing is perfectly achievable with local protection facility backup using either a centralized or a distributed backup path computation model. Here, too, bandwidth sharing performance depends on the efficiency of the path computation algorithm. To give some rough estimates, the numbers obtained on several large networks using off-line backup tunnel path computation tools showed a degree of bandwidth sharing of up to 5. In other words, thanks to the single failure assumption, which allows a high degree of bandwidth sharing, the sum of the bandwidth of the backup tunnels on a link was on average five times the actual backup capacity. This means that the independent CSPF-based model described in Section 5.15 would have required five times more backup bandwidth on each link. This degree of efficiency is of course a function of various aspects: the network topology (degree of connectivity, number of SRLGs, elements to protect, and their protected bandwidth, to mention a few), the backup bandwidth capacity, and in particular the efficiency of the backup tunnel path computation algorithm.
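The degree of bandwidth sharing described above can be illustrated with a small sketch. Each backup tunnel is described by the resource it protects, its bandwidth, and the links it crosses. Without sharing, a link must carry the sum of all backup bandwidth. Under the single failure assumption, only the tunnels protecting one resource are active at a time, so the link needs only the worst case over protected resources. The topology and values are hypothetical, and each protected resource is treated as an independent failure; the link/node ambiguity discussed in Section 5.10.2 is ignored here.

```python
from collections import defaultdict


def backup_bandwidth_per_link(tunnels):
    """tunnels: list of (protected_resource, bandwidth, links_traversed).

    Returns (without_sharing, with_sharing), each a dict link -> bandwidth.
    """
    without_sharing = defaultdict(float)
    # link -> protected resource -> bandwidth of the tunnels protecting it
    per_resource = defaultdict(lambda: defaultdict(float))
    for resource, bandwidth, links in tunnels:
        for link in links:
            without_sharing[link] += bandwidth
            per_resource[link][resource] += bandwidth
    # Single failure assumption: only the tunnels protecting one resource
    # are active at a time, so a link needs the worst case over resources.
    with_sharing = {link: max(load.values()) for link, load in per_resource.items()}
    return dict(without_sharing), with_sharing


tunnels = [
    ("R1-R2", 100.0, ["X-Y", "Y-Z"]),  # hypothetical topology and values
    ("R3-R4", 100.0, ["X-Y"]),
    ("R5", 80.0, ["X-Y", "Y-Z"]),
]
without_sharing, with_sharing = backup_bandwidth_per_link(tunnels)
print(without_sharing)  # {'X-Y': 280.0, 'Y-Z': 180.0}
print(with_sharing)     # {'X-Y': 100.0, 'Y-Z': 100.0}
print(without_sharing["X-Y"] / with_sharing["X-Y"])  # 2.8
```

The last ratio is the degree of bandwidth sharing on link X-Y in this toy case.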

By contrast, bandwidth sharing is very difficult to achieve with one-to-one backup, because an individual backup tunnel is set up for each individual protected TE LSP. Bandwidth can be shared via merging but not between backup tunnels protecting independent resources. That would require very extensive signaling and routing overhead, as well as synchronization between various PLRs, which would increase the scalability impact even more. With facility backup, by contrast, once the backup tunnel paths have been computed, a new backup tunnel path computation does not need to be triggered (if a bandwidth pool is protected) each time a new TE LSP is set up or torn down; a facility (such as a link or a node) can be protected by a set of backup tunnels regardless of the set of TE LSPs actually traversing the protected resource.

5.8.4 Summary

In the previous sections of this chapter, we examined the various MPLS TE recovery techniques in detail from several angles: the protocol extensions, the modes of operation, and the capabilities of each technique, along with several other aspects. The aim of this section is to highlight the main advantages and disadvantages of each of them, with the objective of providing some guidance on where each recovery mechanism preferably applies.

MPLS Traffic Engineering Default Global Restoration

Quick Summary

Global default restoration is the default mode of MPLS TE. When a failure is

detected by the head-end LSR of one or several TE LSPs, for each affected TE LSP,

a new path is computed ‘‘on the fly’’ (using CSPF to find a path that obeys the

constraints or using some preconfigured alternate paths), and if a new path can be

found, the TE LSP is signaled along that path. The traffic is then restored using the

new TE LSP.

Advantages and Drawbacks

Advantages

• Global default restoration does not require any additional configuration of a backup path (unless the network administrator decides to explicitly configure the backup path). So, for instance, if a TE LSP is configured as dynamic (the path is computed using a CSPF algorithm), no other configuration is required.

Drawbacks

• Global default restoration is the slowest recovery mechanism compared to the other protection mechanisms, because it requires the FIS to propagate to the head-end LSR, a dynamic path computation (whose duration grows with the number of TE LSPs to reroute and with the network complexity), and TE LSP signaling. It cannot be used to provide recovery times on the order of tens of milliseconds. Note that a separate CSPF must be computed per TE LSP to reroute.

• Lack of predictability: in some cases, there is no guarantee that the TE LSP can be rerouted upon failure. A last-resort option is to relax all the TE LSP constraints, which guarantees that CSPF will always find a path for the TE LSP (the IGP shortest path), so the TE LSP stays ‘‘up’’ (provided that a path exists between the TE LSP’s head-end and tail-end LSRs).
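The on-the-fly recomputation described above relies on CSPF. A minimal sketch of a bandwidth-constrained CSPF prunes the links that cannot satisfy the constraint and then runs Dijkstra on what is left. It also shows the last-resort relaxation: requesting zero bandwidth leaves the topology unpruned. The link table, metrics, and bandwidth values are hypothetical, and real CSPF implementations handle more constraints (affinities, SRLGs, tie-breaking) than this sketch.

```python
import heapq


def cspf(links, src, dst, bandwidth):
    """links: {(u, v): (metric, available_bw)} for unidirectional links.

    Returns the list of nodes of a minimum-metric path whose links all have
    at least `bandwidth` available, or None if no such path exists.
    """
    # Step 1: prune the links that cannot satisfy the bandwidth constraint.
    adj = {}
    for (u, v), (metric, available) in links.items():
        if available >= bandwidth:
            adj.setdefault(u, []).append((v, metric))
    # Step 2: plain Dijkstra on the pruned topology.
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, metric in adj.get(u, []):
            if d + metric < dist.get(v, float("inf")):
                dist[v], prev[v] = d + metric, u
                heapq.heappush(heap, (d + metric, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


links = {  # hypothetical unidirectional links: (metric, available bandwidth)
    ("A", "B"): (1, 1000), ("B", "D"): (1, 50),
    ("A", "C"): (2, 1000), ("C", "D"): (2, 1000),
}
print(cspf(links, "A", "D", bandwidth=100))  # ['A', 'C', 'D']
del links[("A", "C")]                        # failure of link A-C
print(cspf(links, "A", "D", bandwidth=100))  # None: no path meets the constraint
print(cspf(links, "A", "D", bandwidth=0))    # ['A', 'B', 'D'] once constraints are relaxed
```

A head-end LSR would repeat such a computation for every affected TE LSP, which is one reason global default restoration gets slower as the number of TE LSPs to reroute grows.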

MPLS Traffic Engineering Global Path Protection

Quick Summary

With MPLS TE global path protection, a diversely routed (see footnote 68) backup TE LSP is computed and signaled for each primary TE LSP before any failure. The constraints for the backup TE LSP can be identical to or different from those of the primary TE LSP. In the case of a failure along the path, once the failure notification is received by the head-end LSR, it switches the traffic over the backup TE LSP and the traffic is recovered.
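A simple way to obtain a diversely routed backup is sketched below. It first computes the primary path, then removes that path's links and intermediate LSRs and computes the backup on what remains, so the backup is link and node disjoint. This two-step approach is only an illustration, with a hypothetical topology. It can fail to find a disjoint pair on some "trap" topologies even when one exists (algorithms such as Suurballe's compute the pair jointly), and it models neither SRLG diversity nor bandwidth constraints.

```python
import heapq


def shortest_path(adj, src, dst, banned_links=frozenset(), banned_nodes=frozenset()):
    """Dijkstra on adj = {node: {neighbor: metric}}; returns a node list or None."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, metric in adj.get(u, {}).items():
            if (u, v) in banned_links or v in banned_nodes:
                continue
            if d + metric < dist.get(v, float("inf")):
                dist[v], prev[v] = d + metric, u
                heapq.heappush(heap, (d + metric, v))
    return None


def primary_and_backup(adj, head, tail):
    """Two-step computation: shortest primary, then a node- and
    link-disjoint backup on the topology pruned of the primary path."""
    primary = shortest_path(adj, head, tail)
    if primary is None:
        return None, None
    backup = shortest_path(adj, head, tail,
                           banned_links=frozenset(zip(primary, primary[1:])),
                           banned_nodes=frozenset(primary[1:-1]))
    return primary, backup


adj = {  # hypothetical topology, values are link metrics
    "H": {"A": 1, "C": 1},
    "A": {"B": 1},
    "B": {"T": 1},
    "C": {"D": 2},
    "D": {"T": 2},
}
print(primary_and_backup(adj, "H", "T"))
# (['H', 'A', 'B', 'T'], ['H', 'C', 'D', 'T'])
```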


Advantages

• In networks with many links and nodes and a limited number of TE LSPs to protect, this mechanism is easy to deploy and requires a limited amount of provisioning. For instance, suppose a very large network where only a limited number of TE LSPs must be protected. With global path protection, only a few diversely routed TE LSPs must be configured and set up. By contrast, local protection would require protecting every network element with backup tunnels along any potential primary path.

• Because the backup tunnel is signaled before the failure, its path is deterministic, which provides strict control of the backup tunnel path.

Drawbacks

• Global path protection requires doubling the number of TE LSPs, which has a significant scalability impact in full mesh networks, as shown earlier.

• In most cases (especially in international networks), global path protection cannot provide recovery times of tens of milliseconds, which might be an issue when protecting very sensitive traffic such as voice or ATM/frame relay over IP/MPLS networks. This is due to the need for the failure notification to be received by the head-end LSR before the traffic is switched over the backup path.

• If bandwidth guarantees are required, achieving bandwidth sharing with path protection requires an external off-line tool to compute both the primary and secondary TE LSPs.

• The requirement for an end-to-end diversely routed path may, in some cases, imply selecting a nonoptimal path for the primary TE LSPs.

68 Diversely routed means link, node, or SRLG disjoint.

MPLS Traffic Engineering Local Protection

Quick Summary

MPLS TE Fast Reroute is a local protection recovery mechanism. There are two

flavors of Fast Reroute:

• Facility backup: for each protected network element, a backup tunnel is set up before any failure. The number of backup tunnels is 1 for link protection (potentially more if bandwidth protection is required and a single backup tunnel with the required capacity cannot be found) and N for node protection (where N is the number of next-next-hops of each LSR). This applies to each LSR in the network where protection is required. As in the link protection case, more than one backup tunnel might be required per next-next-hop if a single backup tunnel with the required capacity cannot be found. When the link or node fails, upon failure detection, the node immediately upstream of the failure switches all the fast-reroutable TE LSPs onto their appropriate backup tunnel (using the MPLS label stacking property).

• One-to-one backup: for each fast-reroutable TE LSP using the one-to-one backup method, a separate diversely routed TE LSP that terminates at the tail-end LSR is set up at each hop. The number of backup TE LSPs (called Detour LSPs) is a function of the number of fast-reroutable TE LSPs and the network diameter. Merging rules can help reduce the number of Detour LSPs in the network. When a link or a node fails, upon failure detection, the node immediately upstream of the failure switches all the fast-reroutable TE LSPs onto their Detour LSPs.

With both methods, once the PLR (the node immediately upstream of the failure) has locally rerouted the protected TE LSPs affected by the failure onto their respective backup tunnels, it sends a notification to every head-end LSR of the fast-rerouted TE LSPs, so that the head-end LSR(s) can trigger a reoptimization and reroute the TE LSPs over a more optimal path in a nondisruptive fashion.


Advantages

• MPLS TE Fast Reroute is a local protection mechanism and can provide very fast recovery times, equivalent to SONET-SDH/optical protection. This is particularly important to protect TE LSPs carrying very sensitive traffic. Facility backup and one-to-one backup are equivalent in terms of recovery time.

• Fast Reroute can provide bandwidth, propagation delay, and jitter guarantees in the case of link/SRLG/node failure. In the case of facility backup, the required backup capacity can be drastically reduced thanks to bandwidth sharing between backup tunnels protecting independent resources.

• High granularity: the concept of CoR makes it possible to offer a wide range of protection coverage with high granularity, because the CoR is a per-TE LSP property.

• The facility backup method scales well because the number of backup tunnels is a function of the number of network elements to protect and does not grow with the number of fast-reroutable TE LSPs.

• Fast Reroute can easily be used even in networks where a full mesh of TE LSPs is not deployed (see Section 5.8).

Drawbacks

• Requires configuring and setting up a number of backup TE LSPs, which can be nonnegligible in large networks.

• Might be more complex to troubleshoot.

• The Fast Reroute one-to-one backup method has limited scalability in large networks.

5.9 Revertive versus Nonrevertive Modes

There is another important aspect of MPLS TE recovery that we have not discussed so far, introduced in Chapter 1: the notion of revertive versus non-revertive mode. Once a network element failure occurs, recovery mechanisms are responsible for finding an alternate path. But once the resource is restored, how is the traffic rerouted onto that resource? This depends on whether the recovery mechanism is revertive or non-revertive, which is the subject of this section.

5.9.1 MPLS Traffic Engineering Global Default Restoration

With MPLS TE global default restoration, when a link, an SRLG, or a node fails,

each TE LSP affected by the failure is rerouted over an alternative path determined

by its head-end LSR. When the failed resource is restored, any head-end LSR has

the possibility to reuse the restored resource. This relies on the reoptimization

process by which a head-end LSR tries to evaluate for each of its TE LSPs whether

a better path exists.

There are several possible configurations for a TE LSP:

• Several static paths are configured: the head-end LSR reevaluates whether a preferred path (different from the path in use) is available.

• The TE LSP is configured as purely dynamic (no static path is specified): the head-end LSR reevaluates whether a more optimal path exists (more optimal usually means a ‘‘shorter’’ path using either the IGP or the TE metric; see [SECOND-METRIC]).

When is the reoptimization task performed?

Several existing implementations support multiple reoptimization triggers:

• Event driven: a new OSPF LSA or IS-IS LSP has been received, and the head-end LSR determines that triggering a reoptimization may be appropriate because a better path may have appeared.

• Timer driven: every x seconds, the head-end LSR reevaluates whether a more optimal path can be found.

5.9.2 MPLS Traffic Engineering Global Path Protection

With MPLS TE global path protection, upon link or node failure notification, the

head-end LSR switches the traffic onto the backup LSP. When the link/node

recovers, the head-end LSR can trigger either a revertive or a non-revertive action.

In the former mode, the traffic is immediately switched back to the primary TE LSP once the primary TE LSP is restored (successfully resignaled). This should be done without traffic disruption but may cause some packet reordering. In the latter mode, the traffic keeps flowing over the backup TE LSP; this option is best avoided if the backup TE LSP is less constrained than the primary TE LSP (i.e., has less bandwidth or follows a longer path). In the revertive mode, the switch-back action can be either event driven or timer driven.

Important note: A side effect of trying to reuse a restored resource is the risk of multiple traffic disruptions in case of resource flapping.

5.9.3 MPLS Traffic Engineering Local Protection

There are actually two kinds of revertive modes with MPLS TE Fast Reroute, both specified in the Internet Engineering Task Force (IETF) specification [FAST-REROUTE]:

1. The globally revertive mode: the decision to reuse a restored resource is left to the head-end LSR of the TE LSP upon reoptimization (which can be event or timer driven, as previously mentioned).

2. The locally revertive mode: when the PLR detects that the link/node is restored, it tries to resignal all the TE LSPs currently rerouted over a backup tunnel along the restored resources. If the resignaling attempt fails, the fast-rerouted TE LSPs keep using the backup tunnel; if the attempt succeeds, the TE LSPs are switched back to their original path.



Note that the locally revertive mode tries to switch back all the TE LSPs along the restored path, contrary to the globally revertive mode, in which the head-end LSR can decide to reuse the restored resource on a per-TE LSP basis, depending on the TE LSP attributes.

It is worth noting that the locally revertive mode may have undesirable effects:

• In case of resource flapping, the revertive mode could cause multiple traffic disruptions; consequently, a locally revertive mode should implement some revertive dampening mechanism, as described in Chapter 4. Otherwise, if the resource flaps, the PLR constantly switches the TE LSPs between the primary link and the backup tunnels, which results in multiple traffic disruptions.

• Limited TE LSP attributes view: contrary to the globally revertive mode, the PLR performs the switch-back operation without complete knowledge of the TE LSP attributes. Suppose the following situation: a TE LSP T1 is signaled along a path P1. A link L1 along P1 fails and Fast Reroute is triggered. A new link along another (shorter) path between T1’s head-end and tail-end LSRs then comes up. When the failed link L1 is restored, a locally revertive mode would switch the traffic back to the restored link even though a better path exists between T1’s head-end and tail-end LSRs. The globally revertive mode would have been more efficient in this case.
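To make the dampening idea concrete, here is a minimal sketch of one possible revertive hold-down a PLR could apply. It is a hypothetical exponential-backoff scheme of our own, not the mechanism described in Chapter 4 or one mandated by [FAST-REROUTE]. The PLR reverts only after the restored resource has stayed up for the current hold-down, and each flap (a new failure before the hold-down expires) doubles the hold-down up to a cap. The class name and timer values are illustrative.

```python
class RevertDampener:
    """Hypothetical hold-down timer for a locally revertive PLR.

    The PLR only switches TE LSPs back onto a restored resource once that
    resource has stayed up for `hold_s` seconds; each flap (a new failure
    before the hold-down expires) doubles the hold-down, up to a maximum.
    """

    def __init__(self, initial_hold_s: float = 1.0, max_hold_s: float = 64.0):
        self.initial_hold_s = initial_hold_s
        self.max_hold_s = max_hold_s
        self.hold_s = initial_hold_s
        self.up_since = None  # time at which the resource last came back up

    def on_up(self, now: float) -> None:
        self.up_since = now

    def on_down(self, now: float) -> None:
        if self.up_since is not None and now - self.up_since < self.hold_s:
            # Failed again before being declared stable: treat it as flapping.
            self.hold_s = min(2 * self.hold_s, self.max_hold_s)
        else:
            # The resource had been stable: restart from the initial hold-down.
            self.hold_s = self.initial_hold_s
        self.up_since = None

    def may_revert(self, now: float) -> bool:
        """True once the resource has been up for the current hold-down."""
        return self.up_since is not None and now - self.up_since >= self.hold_s


d = RevertDampener()
d.on_up(0.0)
d.on_down(0.5)  # flap: hold-down grows to 2 s
d.on_up(1.0)
d.on_down(2.0)  # flap again: hold-down grows to 4 s
d.on_up(3.0)
print(d.may_revert(6.0), d.may_revert(7.0))  # False True
```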

That said, there are some circumstances in which a locally revertive mode might be useful.

For the reasons mentioned earlier, the MPLS TE Fast Reroute specification

([FAST-REROUTE]) recommends the globally revertive mode, whereas the locally

revertive mode is optional.

5.10 Failure Profile and Fault Detection

Section 4.3 of Chapter 4 is devoted to failure profile and fault detection; it was pointed out there that the subjects discussed apply to both IP and MPLS. So only the MPLS-specific aspects will be covered in this section.

5.10.1 MPLS-Specific Failure Detection Hello-Based Protocols

In the context of MPLS TE, another hello-based protocol has been defined, called the ‘‘RSVP hello protocol extension’’ (see [RSVP-TE]). Its basic mode of operation is similar to any other hello mechanism. RSVP hello messages are sent at a certain frequency, and if no RSVP hello message has been received during a configurable amount of time (usually some multiple of the hello interval), the RSVP hello adjacency is considered down. It is worth noting that RSVP hello is a TE LSP property, but a proper implementation needs to ensure that only one RSVP hello adjacency is activated per set of TE LSPs traversing the same interface. To illustrate that statement, let us consider the case of two routers R1 and R2 interconnected via n links L1, L2, ..., Ln, where a set Si of TE LSPs traverses each link Li. A very poorly scalable solution is to activate one RSVP hello adjacency per TE LSP. Instead, for each set Si, the routers R1 and R2 should select one TE LSP for which the RSVP hello adjacency will be activated. If the link Li fails, the RSVP hello adjacency of the selected TE LSP goes down and the router declares all the TE LSPs traversing the link Li as impacted by the failure. So the total number of RSVP hello adjacencies will be n in this case.
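The per-interface arrangement just described can be sketched as follows. Instead of one hello adjacency per TE LSP, one adjacency is kept per interface (in RSVP terms it rides on the one selected TE LSP of each set Si). When it times out, every TE LSP on that interface is declared impacted. The dead-interval multiplier of 3, the 100 ms hello interval, and all names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HelloAdjacency:
    interface: str
    hello_interval_s: float
    dead_multiplier: int = 3  # assumption: declared down after 3 missed hellos
    last_rx_s: float = 0.0
    lsps: List[str] = field(default_factory=list)  # TE LSPs sharing this adjacency

    def on_hello(self, now_s: float) -> None:
        self.last_rx_s = now_s

    def is_down(self, now_s: float) -> bool:
        return now_s - self.last_rx_s > self.dead_multiplier * self.hello_interval_s


def build_adjacencies(lsp_interface: Dict[str, str],
                      hello_interval_s: float) -> Dict[str, HelloAdjacency]:
    """One RSVP hello adjacency per interface instead of one per TE LSP."""
    adjacencies: Dict[str, HelloAdjacency] = {}
    for lsp, interface in lsp_interface.items():
        if interface not in adjacencies:
            adjacencies[interface] = HelloAdjacency(interface, hello_interval_s)
        adjacencies[interface].lsps.append(lsp)
    return adjacencies


def impacted_lsps(adjacencies: Dict[str, HelloAdjacency], now_s: float) -> List[str]:
    """All TE LSPs traversing an interface whose hello adjacency timed out."""
    return sorted(lsp for adj in adjacencies.values() if adj.is_down(now_s)
                  for lsp in adj.lsps)


adjs = build_adjacencies({"T1": "L1", "T2": "L1", "T3": "L2", "T4": "L2", "T5": "L3"},
                         hello_interval_s=0.1)
for name, last in (("L1", 0.95), ("L2", 0.60), ("L3", 0.95)):
    adjs[name].on_hello(last)
print(len(adjs), impacted_lsps(adjs, now_s=1.0))  # 3 ['T3', 'T4']
```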

As with any other hello-based protocol, the important question of scalability impact arises, and RSVP hellos are no exception: sending RSVP hello messages requires some processing by an LSR, which might not be an inexpensive operation. This explains why running fast hellos at very high frequencies, such as every 5 ms, must be avoided. Moreover, a large number of neighbors also affects the scalability of such a solution.

Hence, if the number of neighbors is not too high and the RSVP hello frequency is reasonable, RSVP hellos may be a candidate for failure detection when lower layer fast detection mechanisms are not available.

Of course, such figures are highly platform dependent, but to give some rough numbers, at the time of writing some routers can support 20 neighbors with RSVP hello messages sent every 100 to 200 ms without any problem.

Note that the risk with platforms unable to sustain the RSVP hello rate is the triggering of false-positive alarms. A false-positive alarm occurs when Fast Reroute is inappropriately triggered by a loss of RSVP hello adjacency caused not by a failure but simply by the neighboring router being too busy to echo the RSVP hello messages. This would not create any traffic black-holing, but the protected TE LSPs would be rerouted onto their backup tunnels although this was not required. Moreover, if the backup tunnel does not offer an equivalent QoS, the rerouted traffic may experience some performance degradation. The TE LSPs would then very likely be reoptimized by their respective head-end LSRs along the initial path, but clearly this is not very desirable and should not happen too frequently.

5.10.2 Requirements for an Accurate Failure Type Characterization

In the context of MPLS TE local protection, being able to differentiate a link from a

node failure may be particularly useful. In Section 4.3 of Chapter 4, we saw why

such a differentiation is not always obvious. In this section we will see why such a

capability can be very useful and we will describe some potential solutions.

Let us now analyze the situations where being able to differentiate a link from a

node failure may be desirable:

Situation 1: Optimal Backup Path Selection

Let us consider the network depicted in Figure 5.25, where two backup tunnels are configured on the PLR R0: the NHOP backup tunnel B1 and the NNHOP backup tunnel B2. A conservative approach might be to systematically select B2 upon link failure detection, because the PLR cannot tell a link failure from a node failure when it detects the link failure. This way, if the failure was a node failure, the decision was correct. On the other hand, if it was a link failure, a better choice would have been to reroute the set of protected TE LSPs traversing the failed link onto B1. The TE LSPs fast-rerouted onto B2 could have followed a shorter path if B1 had been selected in this case. This is mainly due to the additional constraints imposed on the NNHOP backup tunnel path computation. It is worth highlighting that this drawback might be relevant only in some networks; typically, in a national network that is not heavily loaded and where propagation delays are not significant, choosing a slightly longer path for a short period (until the TE LSPs are rerouted by their respective head-end LSRs) is not necessarily an issue. On the other hand, in a poorly connected network with international links (having significant propagation delays), rerouting along a longer path is not desirable. This is even more true if the network is congested, because the temporary rerouting along the B2 backup tunnel is likely to increase the level of congestion over a larger number of links.

Situation 2: Bandwidth Protection Violation

As we will see in Section 5.15, one can benefit from the single failure assumption to achieve bandwidth sharing between backup tunnels protecting independent resources. Unfortunately, the inability to differentiate a link failure from a node failure can lead to situations where backup tunnels protecting independent resources are used simultaneously, resulting in a bandwidth protection violation.

Let us consider the network depicted in Figure 5.26, where B1 is an NNHOP backup tunnel originating on R1 and terminating on R3, protecting against a failure of the node R2, and B2 is an NNHOP backup tunnel originating on R2 and terminating on R0, protecting against a failure of the node R1. Because

those two backup tunnels protect independent resources (R1 and R2), by virtue of the single failure assumption, they can share bandwidth because they are never simultaneously active (see Section 5.15 for further details). This is true in particular on the link R4-R5. Adopting the same backup tunnel selection strategy as in Situation 1, as soon as the failure of the link R1-R2 is detected by the PLRs R1 and R2, they will both start rerouting protected TE LSPs, onto B1 and B2 respectively, which would result in a bandwidth protection violation on the link R4-R5.

Figure 5.25 Optimal backup tunnel selection (B1: NHOP backup tunnel; B2: NNHOP backup tunnel; nodes R0, R1, R2).

This example clearly illustrates the need for a mechanism that can unambiguously differentiate a link failure from a node failure. An alternative (only available if the set of backup tunnel paths is computed by a central entity) is to make sure, when computing NNHOP backup tunnels, that two NNHOP backup tunnels protecting adjacent nodes never collide (i.e., never share bandwidth on their common section). The price of such an additional constraint is an increase in path computation complexity and a lower bandwidth sharing efficiency.
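As a minimal sketch of the centralized alternative, the check below flags pairs of NNHOP backup tunnels that protect adjacent nodes and traverse a common link in the same direction (and would therefore share bandwidth). The hops through R4 and R5 are hypothetical, chosen so that both tunnels of Figure 5.26 traverse R4-R5 in the same direction; the function names are illustrative only.

```python
from itertools import combinations


def directed_links(path):
    """Return the set of directed links (u, v) traversed by a path given as a node list."""
    return {(path[i], path[i + 1]) for i in range(len(path) - 1)}


def colliding_pairs(backups, adjacent):
    """Find pairs of NNHOP backup tunnels that protect adjacent nodes and share a link.

    backups:  dict name -> (protected_node, path as a list of nodes)
    adjacent: set of frozensets {a, b} for node pairs joined by a link
    """
    collisions = []
    for (n1, (p1, path1)), (n2, (p2, path2)) in combinations(sorted(backups.items()), 2):
        if frozenset((p1, p2)) in adjacent:          # protected nodes are neighbors
            shared = directed_links(path1) & directed_links(path2)
            if shared:                               # same link, same direction
                collisions.append((n1, n2, sorted(shared)))
    return collisions


# Hypothetical paths loosely following Figure 5.26 (hops via R4/R5 assumed).
backups = {
    "B1": ("R2", ["R1", "R4", "R5", "R3"]),  # protects node R2
    "B2": ("R1", ["R2", "R4", "R5", "R0"]),  # protects node R1
}
adjacent = {frozenset(("R1", "R2"))}
print(colliding_pairs(backups, adjacent))  # [('B1', 'B2', [('R4', 'R5')])]
```

A central tool would reject or recompute one of the two paths whenever this list is non-empty, which is precisely the extra constraint, and the loss of sharing efficiency, described above.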

So the two examples clearly highlight the benefits of a solution that would allow

a PLR to differentiate a link from a node failure.

A solution has been proposed in [linknode-failure], which relies on sending hello

messages along a link diverse path upon link failure detection; typically, an obvious

candidate for the alternate path is the NHOP backup tunnel itself. In the example in

Figure 5.27, several backup tunnels are configured: On R1, there is one NNHOP

backup tunnel B1 protecting against a failure of the node R2 and one NHOP backup

tunnel B3 protecting against a failure of the link R1-R2. Likewise, on R2, there is one

NNHOP backup tunnel B2 protecting against a failure of the node R1 and one

NHOP backup tunnel B4 protecting against a failure of the link R2-R1.

Figure 5.26 Bandwidth protection violation (B1 and B2 share bandwidth on the link R4-R5).

Mode of operation: Upon link failure detection (by means of layer 2 link failure notification or RSVP/IGP hello timeout), R1 (R2 performs the same set of operations) starts sending hello messages to R2 via the NHOP backup tunnel B3. If a response is received from the adjacent node (R2), R1 can conclude that the failure is just a link failure and not a node failure. Conversely, if no response is received from R2, the failure is a node failure.
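The probing step can be sketched as a small classification routine. Here `probe_via_nhop_backup` is a hypothetical callable that sends one hello through the NHOP backup tunnel and reports whether a reply arrived before its timer expired; the number of attempts is likewise an assumption, not something specified in the text.

```python
from typing import Callable


def characterize_failure(probe_via_nhop_backup: Callable[[], bool], attempts: int = 3) -> str:
    """Classify a detected link failure as a 'link' or a 'node' failure.

    probe_via_nhop_backup (hypothetical) sends one hello to the adjacent node
    through the NHOP backup tunnel and returns True if a reply arrives in time.
    """
    for _ in range(attempts):
        if probe_via_nhop_backup():
            return "link"   # neighbor still answers: only the link is down
    return "node"           # no reply after all attempts: treat as a node failure


# Simulated neighbors: R2 alive (replies) vs. R2 down (never replies).
print(characterize_failure(lambda: True))   # link
print(characterize_failure(lambda: False))  # node
```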

If we assume that such a failure characterization scheme is available, there

are two strategies that can be put in place in terms of MPLS TE Fast Reroute

decision:

Option 1: Start using the NNHOP backup tunnel and switch back if required:

In this option, as soon as the link failure is detected by the PLR (R1), all the

protected TE LSPs traversing the failed link are rerouted onto the NNHOP

backup tunnel. Then the failure characterization mechanism mentioned earlier

is activated. If it turns out that the failure is a link failure, the rerouted TE LSPs

are switched from their NNHOP backup tunnel to their NHOP backup tunnel.

If the failure is characterized as a node failure, no particular action is required.

Option 2: Start using the NHOP backup tunnel and switch back if required:

This option basically does the opposite: Upon detecting the link failure, the

PLR starts rerouting the protected TE LSPs traversing the failed link onto the

NHOP backup tunnel. Then the failure characterization mechanism is acti-

vated; if it turns out that the failure is a link failure, no particular action is

required. On the other hand, if the failure is characterized as a node failure, the

rerouted TE LSPs are switched from their NHOP backup tunnel to their

NNHOP backup tunnel.

Pros and cons of each approach: The failure characterization process takes some time Tc (which depends on the protocol and timers used). With option 1, in the case of a link failure, this might cause temporary bandwidth protection violation and/or nonoptimal backup path selection for the reasons mentioned earlier, but this option always minimizes packet loss. With option 2, in the case of a node failure, the duration of traffic loss is increased by Tc, but bandwidth protection is always preserved and a more optimal backup path is selected in the case of a link failure. Hence, depending on the network objectives, one may prefer one option or the other, provided that a failure characterization mechanism is available.

Figure 5.27 Mechanism allowing a PLR to differentiate a link from a node failure (B1, B2: NNHOP backup tunnels; B3, B4: NHOP backup tunnels).
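The trade-off between the two options can be summarized in a small model that returns which backup tunnel is used first, which one is used once the failure is characterized, and how long traffic suffers from each drawback. Tc is a parameter; the 200 ms value in the usage lines is purely illustrative.

```python
def frr_actions(option, actual_failure, tc_ms):
    """Model options 1 and 2 for a PLR that can characterize failures.

    Returns (first tunnel, final tunnel, extra traffic loss in ms,
             time spent on a nonoptimal tunnel in ms).
    """
    if actual_failure not in ("link", "node"):
        raise ValueError("actual_failure must be 'link' or 'node'")
    if option == 1:
        if actual_failure == "link":
            # NNHOP used during Tc: possible bandwidth violation / longer path.
            return ("NNHOP", "NHOP", 0.0, tc_ms)
        return ("NNHOP", "NNHOP", 0.0, 0.0)
    if option == 2:
        if actual_failure == "node":
            # The NHOP tunnel ends on the failed node: traffic is lost during Tc.
            return ("NHOP", "NNHOP", tc_ms, 0.0)
        return ("NHOP", "NHOP", 0.0, 0.0)
    raise ValueError("option must be 1 or 2")


print(frr_actions(1, "link", 200.0))  # ('NNHOP', 'NHOP', 0.0, 200.0)
print(frr_actions(2, "node", 200.0))  # ('NHOP', 'NNHOP', 200.0, 0.0)
```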

In summary, being able to differentiate a link from a node failure is desirable to

optimally select the backup tunnel to use in the case of MPLS TE Fast Reroute

local protection ‘‘facility backup.’’ That said, this is certainly not an absolute

requirement and should just be considered an optimization.

5.10.3 Analysis of the Various Failure Types and Their Impact on Traffic Forwarding

A large set of possible failures can occur in a network. Section 4.3 of Chapter 4

provides an analysis of the impact on the forwarded traffic of various failure

profiles and the set of failure detection mechanisms that can be used to detect

those failures. Because most of the material covered in Chapter 4 also applies to

MPLS TE, we will just focus on the MPLS TE specific aspects here:

1. Link failure: Link failures always affect the data traffic until an alternate

path is found and data traffic is rerouted over some backup paths. Various

mechanisms have been described in this chapter to handle link failures and

find an alternate path.

2. Node failure: As mentioned in Chapter 4, there are multiple possible causes

of node failures, and their nature has a different impact on the forwarded

traffic.

. Power supply outage: The traffic is black-holed until it is rerouted over a

backup path.

. Route processor failure: In centralized platform architectures, a route processor failure usually implies that packets forwarded to the failed router are dropped. On distributed platform architectures, by contrast, this type of failure usually does not affect the data plane: packets are still forwarded by the router, and only the control plane fails. The expected behavior in this case is that after some period (which depends on the IGP or RSVP timer configuration), either the IGP or the RSVP hello adjacency will go down. In the former case (IGP adjacencies go down), the IGP neighbors of the failed router will flood an updated LSA (router link LSA for OSPF) or LSP (for IS-IS). Every head-end LSR will detect that one or more of its TE LSPs traverse a failed LSR and should take the appropriate action (usually a graceful TE LSP reroute will be triggered in a nondisruptive fashion). In the latter case (the RSVP hello adjacency goes down), the node immediately upstream of the failed node will send an RSVP notification (an RSVP Path Error message) to every head-end LSR having one or more TE LSPs passing through the failed node. Every head-end LSR should then in turn take an appropriate action. This description does not apply to graceful restart procedures.

. Software failure: The impact of a software failure on forwarded traffic is

highly coupled to the nature of the software failure, which can vary from

the simple generation of a warning message followed by an automatic

recovery (via restorable module) handled by the operating system to a

situation where the router is completely hosed and can no longer recover

from the failure, which might require a complete reinitialization. In the

latter case, the traffic is black-holed until the control plane detects the

node failure.

. Planned node failure: Because the failure is ‘‘planned,’’ various ac-

tions can be taken before performing the upgrade. The traffic may be

gracefully rerouted around the node. Various methods can be used to

meet that goal: For instance, the link costs of every adjacent node can be

manually increased or an updated IGP LSA for OSPF or IS-IS LSP can

be flooded by the node to be upgraded. The consequences will be that

the IGP will smoothly reroute the traffic around this node and every

head-end LSR upon triggering a TE LSP reoptimization will likely

reroute its TE LSPs along some other path in a nondisruptive fashion.

The node to be upgraded will no longer carry any transiting traffic and

could be safely upgraded without risking any traffic disruption.

Important note: Some software and hardware architectures support ‘‘hitless’’ software and hardware upgrades without requiring any of the actions mentioned above.

5.11 Case Studies

This section is entirely devoted to case studies where the various concepts covered in

this chapter will be illustrated.

Each case study will have the following structure:

. Assumptions: network topology, layer 2/3 protocols, . . .

. Objectives: convergence time, failure coverage, performance, . . .

. Proposed design: There is obviously no unique possible design to address a

specific set of requirements. At least one possible design will be proposed for

each case study.

Three case studies of increasing complexity are proposed in this section.

5.11.1 Case Study 1

Assumptions

Let us consider the following network made of 12 nodes (LSRs) in the United

Kingdom (Figure 5.28).



. The network is made of several layers:

. An optical layer, having the capability to offer protected or unprotected

lambdas

. An SDH layer, offering both protected and unprotected VCs

. An IP/MPLS layer

. As shown in Figure 5.28, the LSRs are interconnected by different types of

links with various levels of protection:

. Unprotected lambda—e.g., link R1-R3

. Protected lambdas or protected SDH VCs—e.g., link R9-R10

. Gigabit Ethernet links—e.g., a layer 2 switch is used to interconnect

the two LSRs R11 and R12, which are co-located within the same

POP

. The IGP is OSPF, configured with the following timer values:

. OSPF hello interval is 10 seconds

. OSPF RouterDeadTimer is 40 seconds

. No incremental SPF, no fast LSA propagation, no fast SPF triggering

. The vast majority of network element failures are link failures. Node failure

is a rare event, so the current IGP convergence to handle node failure is

sufficient.

Figure 5.28 Case Study 1 (legend: unprotected lambda; protected lambda or protected SDH; Gigabit Ethernet; LSRs R1 to R12).



. No change in terms of layer 1-2 protection can be made on the network. In

other words, an unprotected link cannot be protected by the optical or SDH

layer to optimize cost. Likewise, there is no desire to unprotect a link that is

already protected (e.g., because the optical/SDH equipment in place is

already paid off).

. Every link of this network is independent of other links. In other words,

there is no SRLG (no single point of failure between the various links).

. The network has enough capacity to carry the traffic with the required QoS

even during a single failure using OSPF routing. MPLS TE is not deployed

in this network for bandwidth optimization and/or strict QoS guarantee.

. The network is not Diffserv enabled.

Objectives

. The objective is to get fast recovery (in less than 50 ms) in the case of link

failure of unprotected links. All the IP/MPLS traffic must be protected

without any differentiation.

Proposed Design

The requirement for fast recovery implies the use of Fast Reroute. Facility backup

will be selected for its scalability property.

Because MPLS TE is not deployed in this network, the most appropriate design

to benefit from the fast protection of Fast Reroute is to deploy one-hop tunnels,

where Fast Reroute is required (where links are unprotected at the optical or

SDH layer). So for each link to protect with Fast Reroute, two TE LSPs will be

configured:

. An unconstrained primary TE LSP configured as ‘‘fast reroutable’’: No

constraint will be applied to this TE LSP because the objective is just to get

a tunnel to protect. Because no constraint is applied to this tunnel, it will

follow the shortest path, that is, the direct link. Note also that because this

tunnel is a one-hop tunnel, when an IP or MPLS packet is routed onto this

tunnel, no label is pushed on it. This is because the head-end LSR also acts as the penultimate hop pop (PHP) LSR; that is, no label is used between the penultimate hop LSR and the tunnel destination. All the traffic will be routed

onto this tunnel so that all the traffic routed over this link according to OSPF

is protected via MPLS TE Fast Reroute in the case of a failure.

. A backup TE LSP will be used to reroute all the protected TE LSPs in the

case of a link failure. The path for this backup tunnel is link diverse from

the link to protect and is either statically configured on the LSR or dynam-

ically computed by the LSR. As mentioned earlier, the network has enough

capacity to satisfy the required QoS even during a single failure. This

implies that the only constraint that should be taken into account for the

backup tunnel path computation is route diversity.



Example: To protect the link R1-R5 with Fast Reroute, two TE LSPs are

configured:

. A one-hop primary LSP without any constraint (0 bandwidth, no affin-

ities, . . . ). All the traffic routed according to OSPF onto this link is routed

onto this primary tunnel.

. A backup tunnel B1 is configured from R1 to R5 and follows the R1-R2-R5

path.

For each link not protected by a lower-layer protection/restoration mechanism, this must be done in both directions because TE LSPs are unidirectional.

The same configuration is then applied to every unprotected link in the network: the

links R1-R2, R2-R5, R1-R5, R1-R3, R3-R4, R4-R7, R4-R5, R5-R7, R2-R6, R6-

R7, R7-R8, R2-R9, R9-R8 and R11-R12.
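The per-link configuration above can be sketched as follows. This is a minimal illustration, assuming the topology is reduced to the unprotected links listed in the text (the protected links of Figure 5.28 are not modeled, so R11-R12 has no alternate path here) and that the backup path is simply the fewest-hop path avoiding the protected link, since route diversity is the only constraint in this case study. Function names are hypothetical.

```python
from collections import deque

# Unprotected links listed in the text; protected links of Figure 5.28 are omitted.
UNPROTECTED = ["R1-R2", "R2-R5", "R1-R5", "R1-R3", "R3-R4", "R4-R7", "R4-R5",
               "R5-R7", "R2-R6", "R6-R7", "R7-R8", "R2-R9", "R9-R8", "R11-R12"]


def build_graph(links):
    """Undirected adjacency lists built from 'A-B' link names."""
    graph = {}
    for name in links:
        a, b = name.split("-")
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def diverse_path(graph, src, dst):
    """Fewest-hop path from src to dst avoiding the direct link src-dst (BFS)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:      # walk parents back to src
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if {node, nxt} == {src, dst}:
                continue                 # never use the protected link itself
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None                          # no alternate path in this partial graph


def one_hop_plan(links):
    """One primary and one NHOP backup TE LSP per link and per direction."""
    graph = build_graph(links)
    plan = []
    for name in links:
        a, b = name.split("-")
        for head, tail in ((a, b), (b, a)):
            plan.append(("primary", head, tail, [head, tail]))
            plan.append(("backup", head, tail, diverse_path(graph, head, tail)))
    return plan


plan = one_hop_plan(UNPROTECTED)
print(len(plan))                                            # 56
print(diverse_path(build_graph(UNPROTECTED), "R1", "R5"))   # ['R1', 'R2', 'R5']
```

The 56 TE LSPs are 14 links x 2 directions x 2 LSPs (primary and backup); the R1-R5 backup found by the search matches the R1-R2-R5 path of B1.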

Note that because the network has enough capacity to satisfy QoS re-

quirements even under a single failure scenario, the backup tunnel path does

not need to be constrained to make sure enough capacity is available along the

backup path. Also, because this network is a domestic U.K. network, the increase

of the propagation delay along the backup path is considered negligible and

acceptable.

For instance, in the case of failure of the link R1-R5, once the failure is detected by R1, the fast-reroutable one-hop TE LSP T1 is rerouted onto the

backup tunnel B1 within 50 ms. All the traffic that used to be routed onto

the link R1-R5 will now be rerouted along the path R1-R2-R5; note that the

rerouted traffic is label switched at R2, which does not make any routing decision

(Figure 5.29).

Then T1 is reoptimized by R1 to follow the new OSPF shortest path from R1 to R5. This shortest path could be R1-R2-R5 or R1-R3-R4-R5, depending on the OSPF metrics; the assumption made here is that the shortest OSPF path is R1-R3-R4-R5 (Figure 5.30).
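To illustrate why the reoptimized path depends on the OSPF metrics, here is a plain Dijkstra run on a fragment of the topology with the link R1-R5 removed. The metric values are hypothetical, chosen so that R1-R3-R4-R5 wins as assumed in the text; with different metrics, R1-R2-R5 could be selected instead.

```python
import heapq


def shortest_path(graph, src, dst):
    """Plain Dijkstra over {node: {neighbor: metric}}; returns (cost, path) or None."""
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric in graph[node].items():
            if nxt not in seen:
                heapq.heappush(heap, (cost + metric, nxt, path + [nxt]))
    return None


# Hypothetical OSPF metrics on part of Figure 5.28; link R1-R5 has failed.
metrics = {("R1", "R2"): 20, ("R2", "R5"): 20,
           ("R1", "R3"): 10, ("R3", "R4"): 10, ("R4", "R5"): 10}
graph = {}
for (a, b), m in metrics.items():
    graph.setdefault(a, {})[b] = m
    graph.setdefault(b, {})[a] = m

print(shortest_path(graph, "R1", "R5"))  # (30, ['R1', 'R3', 'R4', 'R5'])
```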

As mentioned above, every link is protected using the same configuration.

There is just one exception in this network: the link R11-R12. Indeed, in contrast

with the other links of this network that provide fast failure detection, this link is a

Gigabit Ethernet link with a layer 2 switch between the LSRs R11 and R12. So an

interface failure of R12 cannot be detected by R11 other than by means of a hello

protocol. In this case, OSPF will eventually detect the failure after 40 seconds,

which does not satisfy the fast convergence requirement. The solution is to run RSVP hellos between R11 and R12. The hello interval will depend on the router implementation and on the number of RSVP hello adjacencies (one in this case). For instance, a hello interval of 100 ms and a threshold of four missed hellos could be configured. In the case of an interface failure on R12, the link failure will then be detected by R11 within 400 ms.
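The detection times above follow from the product of the hello interval and the number of missed hellos tolerated. The sketch below assumes that simple model (real implementations may add jitter or count slightly differently) and applies it to both the RSVP hellos and the OSPF timers of this case study (10 s hellos, 40 s RouterDeadInterval, i.e., four intervals).

```python
def detection_time_ms(hello_interval_ms: int, missed_hellos: int) -> int:
    """Approximate time for a hello protocol to declare its neighbor down."""
    return hello_interval_ms * missed_hellos


rsvp = detection_time_ms(100, 4)      # RSVP hellos: 100 ms interval, 4 missed
ospf = detection_time_ms(10_000, 4)   # OSPF: 10 s hellos, 40 s dead interval
print(rsvp, ospf, ospf // rsvp)       # 400 40000 100
```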



[Figure 5.29: the one-hop primary TE LSP T1 is rerouted onto the backup TE LSP B1 used to protect the R1-R5 link; an RSVP hello session runs between R11 and R12.]

[Figure 5.30: T1 is then reoptimized to follow the shortest OSPF path.]




5.11.2 Case Study 2

In this case study, the following assumptions and objectives are added to those of Case Study 1:

Additional Assumptions

. As depicted in Figure 5.31, some links share SRLGs. For instance, the links

R1-R2 and R1-R5 are in the same SRLG. Likewise, the links R1-R5 and

R1-R4 belong to the same SRLG. This means that the failure of a single

component in the network (e.g., an optical equipment or a fiber) would

result in the simultaneous failure of both links.

Note: Having SRLG-diverse links is not always possible. Indeed, an operator may

not have a sufficiently meshed optical network. When leasing optical lambdas to

another operator, SRLG diversity is also generally a more expensive option. In the example above, it was not possible to have SRLG-diverse links between R1 and R5 and between R1 and R2, but the links R1-R5 and R1-R3 are SRLG diverse. At a minimum, one should make sure that a node cannot be isolated as the result of a single failure. For

example, having the four links R1-R3, R1-R4, R1-R5, and R1-R2 in the same

SRLG would have completely isolated the node R1 in the case of a failure of that

SRLG.

. MPLS TE is deployed in the network for network resource optimization.

Hence, the 12 LSRs are fully meshed with TE LSPs and routed using

distributed CSPF (132 TE LSPs are up and running). There is no specific

constraint of bandwidth protection; upon a network element failure, pro-

tected TE LSPs are rerouted onto their respective backup tunnel for a short

period (until they are rerouted/reoptimized by their head-end LSR). QoS

degradation during that short period (generally on the order of a few

hundreds of milliseconds) is considered acceptable.

Additional Objectives

An additional objective is added: Both links and nodes should be protected by

MPLS TE Fast Reroute. Even if node failures are not very frequent, the volume of traffic affected by a node failure is significant. Fast recovery upon node failure is a requirement.

Proposed Design


As in Case Study 1, Fast Reroute facility backup will be selected for its scalability property. MPLS TE is deployed in this network, so

contrary to the previous case study, configuring one-hop primary tunnels is not

required. Note that primary TE LSPs may be configured to carry all the traffic



routed in this network and all the traffic carried onto protected TE LSPs will be

protected by Fast Reroute. Another option is to carry some traffic onto protected

TE LSPs (like the voice traffic for instance): In this case, data traffic is routed using

OSPF routing and so relies on OSPF as the recovery mechanism whereas voice

traffic is protected by Fast Reroute. Both links and nodes must be protected

according to the set of requirements listed above.

Example: For the sake of simplicity, only the backup tunnels originated by the node R1 to protect the fast-reroutable TE LSPs against a failure of

the link R1-R5 and the node R5 are shown in Figure 5.32; a similar approach is

followed for the other links and nodes of the network.

Link protection: For each unprotected link, an NHOP backup tunnel is configured. No QoS constraint applies to the backup tunnel path because QoS degradation during failure is considered acceptable. For instance, the link R1-R5 is protected by the

backup tunnel B1. It is worth mentioning that B1’s path must be SRLG diverse

from the link R1-R5. Hence, B1’s path cannot follow the paths R1-R2-R5 and R1-

R4-R5, because:

. The links R1-R2 and R1-R5 share the same SRLG.

. The links R1-R4 and R1-R5 also share the same SRLG.

[Figure 5.31: Case Study 2 topology with SRLGs (legend: unprotected lambda, Gigabit Ethernet; SRLG: shared risk link group). A full mesh of TE LSPs is configured; only the TE LSPs from R1 to the other LSRs are represented.]




So in the case of a fiber cut, both the link R1-R5 and B1 would fail, which would

make Fast Reroute protection ineffective. Hence, the only SRLG-diverse path for

B1 is R1-R3-R4-R5.
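A minimal sketch of SRLG-aware backup path computation, assuming a hypothetical fragment of the Figure 5.31 topology (including the link R1-R4 mentioned in the SRLG description) and the two SRLGs stated in the text. Every link sharing an SRLG with the protected link is pruned before a fewest-hop search; without the SRLG information, the same search returns R1-R2-R5, which would fail together with R1-R5.

```python
from collections import deque


def link(a, b):
    """Undirected link identifier."""
    return frozenset((a, b))


def srlg_diverse_path(links, srlgs, protected, src, dst):
    """BFS path src->dst avoiding the protected link and every link sharing an SRLG with it."""
    banned = {protected}
    for members in srlgs.values():
        if protected in members:
            banned |= members            # prune the whole SRLG
    graph = {}
    for lk in links:
        if lk in banned:
            continue
        a, b = sorted(lk)
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    parent, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical topology fragment and the SRLGs described in the text.
LINKS = [link(a, b) for a, b in [("R1", "R2"), ("R2", "R5"), ("R1", "R5"), ("R1", "R3"),
                                 ("R1", "R4"), ("R3", "R4"), ("R4", "R5"), ("R4", "R7"),
                                 ("R5", "R7"), ("R2", "R6"), ("R6", "R7")]]
SRLGS = {"SRLG-A": {link("R1", "R2"), link("R1", "R5")},
         "SRLG-B": {link("R1", "R5"), link("R1", "R4")}}

print(srlg_diverse_path(LINKS, SRLGS, link("R1", "R5"), "R1", "R5"))  # ['R1', 'R3', 'R4', 'R5']
print(srlg_diverse_path(LINKS, {}, link("R1", "R5"), "R1", "R5"))     # ['R1', 'R2', 'R5']
```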

Node protection: The number of NNHOP backup tunnels that must be configured on each neighbor of the protected node is equal to n - 1, where n is the number of neighbors of the protected node. In this example, R1 has three NNHOP neighbors through R5: R2, R4, and R7. Hence, the following NNHOP backup tunnels are configured:

. B2-path: R1-R2. Protects the TE LSPs that follow the path R1-R5-R2

against a failure of R5.

. B3-path: R1-R3-R4. Protects the TE LSPs that follow the path

R1-R5-R4 against a failure of R5.

. B4-path: R1-R2-R6-R7. Protects the TE LSPs that follow the path

R1-R5-R7 against a failure of R5.
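The n - 1 rule can be expressed directly: each neighbor of the protected node, acting as PLR, needs one NNHOP backup tunnel toward every other neighbor of that node. The sketch assumes, from the text, that R5's neighbors are R1, R2, R4, and R7; it only enumerates tunnel endpoints, not their paths.

```python
def nnhop_tunnels(neighbors, protected):
    """For each neighbor (PLR) of the protected node, list its NNHOP targets:
    the protected node's other neighbors, i.e., n - 1 tunnels per PLR."""
    nbrs = neighbors[protected]
    return {plr: [nnhop for nnhop in nbrs if nnhop != plr] for plr in nbrs}


neighbors = {"R5": ["R1", "R2", "R4", "R7"]}  # R5's neighbors, as stated in the text
plan = nnhop_tunnels(neighbors, "R5")
print(plan["R1"])                             # ['R2', 'R4', 'R7']
print(sum(len(v) for v in plan.values()))     # 12, i.e., n * (n - 1) with n = 4
```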

[Figure 5.32: backup tunnels originated by R1. B1 protects against a failure of the link R1-R5; B2, B3, and B4 protect the TE LSPs following the paths R1-R5-R2, R1-R5-R4, and R1-R5-R7, respectively, against a failure of the node R5. Legend: unprotected lambda, Gigabit Ethernet; SRLG: shared risk link group.]

Notes:

. The same approach is followed for the backup tunnels originated on R5's other neighbor nodes (e.g., R2, R4, and R7). For instance, on the node R2, one NHOP backup tunnel is configured that follows the path R2-R6-R7-R5 to protect the TE LSPs that follow the path R2-R5, and three NNHOP backup tunnels are configured that follow the paths R2-R6-R7, R2-R1, and R2-R1-R3-R4.

. When a backup tunnel is configured, some implementations allow configuring multiple static paths. For instance, the NNHOP backup tunnel B4 could be configured with two paths:

Path 1 (preferred): R1-R2-R6-R7

Path 2: R1-R2-R9-R8-R7

. If the link R6-R7 fails, B4 is rerouted along path 2 and is still usable to protect the primary LSPs that follow the path R1-R5-R7 against a failure of the node R5. Another mode consists of configuring the NNHOP backup tunnel as purely dynamic, in which case the PLR computes its path using CSPF. Both modes can also be combined: a set of static paths is given in order of preference, and if none of the static paths is available, the PLR computes the path itself.

. In the absence of a mechanism to differentiate a link failure from a node failure, a perfectly valid approach when both NHOP and NNHOP backup tunnels are configured consists of systematically using the NNHOP backup tunnel in the case of a link failure (because the PLR does not know a priori whether the failure is a link or a node failure). The only exception is for the protected TE LSPs that terminate on the NHOP LSR, which are rerouted using the NHOP backup tunnel.
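The last two notes (ordered static path options with a dynamic fallback, and the default NNHOP selection rule) can be sketched as follows. The `cspf` callable is a hypothetical hook standing in for the PLR's own path computation; real implementations express path options through their own configuration syntax.

```python
from typing import Callable, List, Optional, Set


def select_backup_path(static_paths: List[List[str]], failed_links: Set[frozenset],
                       cspf: Callable[[], Optional[List[str]]]) -> Optional[List[str]]:
    """Return the first usable static path (in order of preference), else a dynamic one."""
    for path in static_paths:
        hops = {frozenset(pair) for pair in zip(path, path[1:])}
        if not hops & failed_links:
            return path
    return cspf()  # hypothetical CSPF hook, used when no static path survives


def tunnel_for(lsp_tail: str, nhop: str) -> str:
    """Without failure characterization: use NNHOP unless the LSP ends on the NHOP."""
    return "NHOP" if lsp_tail == nhop else "NNHOP"


b4_paths = [["R1", "R2", "R6", "R7"], ["R1", "R2", "R9", "R8", "R7"]]
print(select_backup_path(b4_paths, set(), lambda: None))
# ['R1', 'R2', 'R6', 'R7']
print(select_backup_path(b4_paths, {frozenset(("R6", "R7"))}, lambda: None))
# ['R1', 'R2', 'R9', 'R8', 'R7']
print(tunnel_for("R5", nhop="R5"), tunnel_for("R7", nhop="R5"))  # NHOP NNHOP
```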

5.11.3 Case Study 3

This case study is the most complex one; it combines the most complete set of requirements and highlights the full set of mechanisms provided by MPLS TE Fast Reroute.

Assumptions

. The network is made of two layers:

. An optical layer providing unprotected optical lambdas only

. An IP/MPLS layer

. The LSRs are interconnected by unprotected lambdas of different speeds:

OC3, OC48, and OC192

. The IGP is IS-IS, configured with default values:

. IS-IS hello interval is 10 seconds.

. IS-IS hold time is 30 seconds.

. No incremental SPF, no fast LSP propagation, no fast SPF triggering.

. No change in terms of layer 1-2 protection can be made on the network. In

other words, unprotected links cannot be protected by the optical layer (to

reduce cost).



. Some links share a common SRLG, as shown in Figure 5.33.

. Three types of traffic are carried in the network:

. Internet traffic is IP routed (not MPLS switched) according to the routes

computed by IS-IS and BGP.

. VPN traffic: The network supports VPN traffic (might be MPLS VPN,

IPSec, . . . ).

. Voice traffic.

. The network is Diffserv enabled and two classes of services are configured

in the core:

. An EF class: used for the voice traffic.

. An AF class for the data traffic: A congestion avoidance mechanism like

WRED is configured so Internet traffic is more aggressively dropped

than the VPN traffic in the case of network congestion.

. MPLS TE is deployed in this network for bandwidth optimization and

service differentiation (strict QoS guarantees). Two meshes of TE LSPs

are set up in the network:

. One mesh of TE LSPs for the data traffic (called data primary TE LSP).

. One mesh of TE LSPs for the voice traffic (called voice primary TE

LSP).

[Figure 5.33: Case Study 3 topology: MIA, BOS, NYC, WAS, CHI, SLC, DEN, LAX, SFO, SEA, PHX, ATL, HOU, and DAL interconnected by OC3, OC48, and OC192 links.]




Important note: Diffserv-aware MPLS TE (also called DS-TE) could be used in this network if different CAC mechanisms were required for the data and voice primary TE LSPs. DS-TE allows applying different underbooking/overbooking ratios to different types of traffic. This way, one can ensure that no more than x% of voice TE LSP bandwidth will be routed on any link, whereas up to y% of overbooking will be allowed for data traffic. For instance, an operator could decide that to provide a

strict QoS to the voice traffic, an appropriate queuing discipline must be configured

for voice (e.g., priority queuing) and the proportion of voice traffic routed on every

link must be limited to 30%. The limitation of voice traffic ensures that when a

failure occurs, the offered rate of voice will not exceed the service rate of any link

across the alternate path. On the other hand, an overbooking of 150% is perfectly

tolerable for data traffic because of the statistical multiplexing property and lack of

strict QoS guarantees. This case study could easily be extended to the Diffserv-

aware MPLS TE case.
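A simplified per-class admission check makes the underbooking/overbooking idea concrete. This is only a sketch, assuming one booking percentage per class and interpreting the 150% overbooking as reservations of up to 1.5 times the link capacity; it is not one of the DS-TE bandwidth constraints models, and the reservation figures are made up.

```python
def admit(capacity_mbps: int, reserved_mbps: int, request_mbps: int, booking_pct: int) -> bool:
    """Per-class admission: accept the TE LSP only if the class's total reservation
    stays within booking_pct % of the link capacity
    (booking_pct < 100: underbooking, > 100: overbooking)."""
    return (reserved_mbps + request_mbps) * 100 <= booking_pct * capacity_mbps


# Percentages from the note: voice limited to 30%, data overbooked up to 150%.
BOOKING_PCT = {"voice": 30, "data": 150}
OC48_MBPS = 2500  # rounded OC48 rate, as used later in this case study

print(admit(OC48_MBPS, 600, 200, BOOKING_PCT["voice"]))  # False: 800 > 750 Mbps
print(admit(OC48_MBPS, 3000, 500, BOOKING_PCT["data"]))  # True: 3500 <= 3750 Mbps
```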

. The traffic matrix is such that a maximum of 20% of voice traffic is routed

onto every link.

. Requirements for bandwidth and propagation delay protection apply only to the case of a single failure. In other words, multiple simultaneous failures are considered rare enough to be ignored.

Abbreviations

The following abbreviations are used in Figure 5.33:

. Phoenix: PHX

. Dallas: DAL

. Washington: WAS

. Miami: MIA

. Houston: HOU

. New York: NYC

. Atlanta: ATL

. Chicago: CHI

. Los Angeles: LAX

. Seattle: SEA

. San Francisco: SFO

. Salt Lake City: SLC

Objectives

. Objective 1: Both link and node failures must be protected with very fast convergence time.

. Objective 2: Voice and data traffic have different classes of recovery (CoR):

. Data traffic must be rerouted within 50 ms in the case of a link and/or

node failure and QoS degradation is acceptable both in terms of propa-

gation delay increase and bandwidth protection.

. Voice traffic must be rerouted within 50 ms in the case of a link and/or

node failure and the offered QoS for voice must be guaranteed during



failure (time during which protected TE LSPs are rerouted over backup

tunnels). Protected voice TE LSPs must be rerouted onto backup tun-

nels offering an equivalent bandwidth and a propagation delay increase

bounded to 50%. Also the maximum amount of total voice traffic during

failure must not exceed 30% of the link capacity (this is to ensure that

the proportion of voice traffic is bounded to guarantee a strict QoS to

voice traffic). Moreover, a voice TE LSP must always be soft preempted

(if required).

. Objective 3: The backup bandwidth should be minimized (the network

backup capacity is the bandwidth reserved to place the backup tunnels

dedicated to reroute voice LSPs).

Proposed Design


As in the previous case studies, Fast Reroute facility backup will be selected for its scalability property.

Various backup tunnel path computation tools can be used to compute the

backup tunnel paths to fulfill the set of requirements. What follows is an example of

potential output that highlights how the requirements can be met by a set of

appropriate backup tunnels.

Examples are provided for the backup tunnels required to protect against a

failure of the node of Dallas and its adjacent links. Similar approaches are taken for

the other links and nodes.

The Internet traffic is IP routed (not carried onto TE LSPs); hence, no particular

configuration is required for the Internet traffic. Note that tuned IS-IS timers allow

a better recovery time in the case of failure, as mentioned in Chapter 4.

The primary TE LSPs are configured in the following manner:

. Data TE LSP: fast-reroutable LSP

The following bits of the SESSION-ATTRIBUTE object of the RSVP Path message are set/cleared:

‘‘Local protection desired’’ = 1

‘‘Bandwidth protection desired’’ = 0

‘‘Node protection desired’’ = 0

‘‘Soft preemption desired’’ = 0

. Voice TE LSP: fast-reroutable LSP with bandwidth protection and soft preemption

The following bits of the SESSION-ATTRIBUTE object of the RSVP Path message are set/cleared:

‘‘Local protection desired’’ = 1

‘‘Bandwidth protection desired’’ = 1

‘‘Node protection desired’’ = 1

‘‘Soft preemption desired’’ = 1

Alternatively, the FAST-REROUTE object can be included in the RSVP Path message.
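For reference, the flag settings above can be encoded as a bitmask. The values below are those we believe are registered for the SESSION_ATTRIBUTE flags (0x01 local protection desired per RFC 3209; 0x08 bandwidth protection desired and 0x10 node protection desired per RFC 4090; 0x40 soft preemption desired per RFC 5712); treat them as assumptions to be checked against the IANA RSVP-TE parameters registry.

```python
# SESSION_ATTRIBUTE flag bits (assumed values; verify against the IANA registry).
LOCAL_PROTECTION_DESIRED = 0x01
BANDWIDTH_PROTECTION_DESIRED = 0x08
NODE_PROTECTION_DESIRED = 0x10
SOFT_PREEMPTION_DESIRED = 0x40


def session_attribute_flags(local=False, bandwidth=False, node=False, soft_preempt=False):
    """Build the flags octet from the four 'desired' bits used in this case study."""
    flags = 0
    if local:
        flags |= LOCAL_PROTECTION_DESIRED
    if bandwidth:
        flags |= BANDWIDTH_PROTECTION_DESIRED
    if node:
        flags |= NODE_PROTECTION_DESIRED
    if soft_preempt:
        flags |= SOFT_PREEMPTION_DESIRED
    return flags


data = session_attribute_flags(local=True)
voice = session_attribute_flags(local=True, bandwidth=True, node=True, soft_preempt=True)
print(hex(data), hex(voice))  # 0x1 0x59
```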



Link Protection

Every link is protected by two types of NHOP backup tunnels:

. A data NHOP backup tunnel: The only constraint for data backup tunnel is

to be SRLG diverse from the link to protect. Indeed, QoS degradation is

acceptable for data traffic based on the set of requirements. No bandwidth

needs to be reserved for data NHOP backup tunnels.

. A set of voice NHOP backup tunnels: Two constraints are taken into

account for voice backup tunnels: the bandwidth and the propagation

delay increase, which must be bounded to 50% (Figure 5.34).

Example: The PHX-DAL link is protected by the following set of NHOP backup

tunnels:

Data backup tunnels: Because bandwidth/propagation delay protection is not required for data TE LSPs, a single backup tunnel BD1 (backup data) is configured and follows the path PHX-MIA-DAL. Note that BD1 cannot follow the path PHX-DEN-DAL (although this is a shorter path) because the links DEN-DAL and PHX-DAL belong to the same SRLG. Note also that in the case of failure of the link PHX-DAL, traffic congestion may occur for data TE LSPs; indeed, no particular bandwidth constraint has been applied to the BD1 path computation. For example, the node DEN could also decide to route its NHOP backup tunnel protecting the TE LSPs traversing the link DEN-DAL along the path DEN-PHX-MIA-DAL. In the case of failure of the SRLG containing the links DEN-DAL and PHX-DAL, both backup tunnels would be simultaneously active. In addition, all the primary TE LSPs routed along the link PHX-MIA may send traffic; this just shows how congestion could occur for the data traffic without any particular precaution regarding the data backup tunnel path. As mentioned in the set of requirements, however, this is considered perfectly acceptable for the data traffic in this case study (the backup tunnels are used for a short period [until the primary TE LSPs using them are reoptimized by their head-end LSR], so the potential congestion will also be temporary). Another alternative would have been to select the path PHX-LAX-SFO-SEA-BOS-NYC-WAS-DAL, where the minimum link bandwidth is OC48 (compared to OC3 along the path PHX-MIA-DAL), but then the propagation delay is much longer; this is a trade-off.

Figure 5.34 Link protection for data and voice backup tunnels (BD1: PHX-MIA-DAL; BV1 (1 Gbps): PHX-DEN-CHI-DAL; BV2 (250 Mbps): PHX-DEN-WAS-DAL; BV3 (500 Mbps): PHX-HOU-MIA-DAL; BV4 (250 Mbps): PHX-MIA-DAL; OC3, OC48, and OC192 links).

Voice backup tunnel: First the required amount of bandwidth (protected bandwidth)
must be computed. Because no more than 20% of voice is routed on
every link (either because of the traffic matrix or because DS-TE is deployed in the
network), the required capacity for the set of voice backup tunnels protecting
the link PHX-DAL is 0.2 * OC192 = 2 Gbps. Because the other adjacent links
are OC48, establishing a single 2 Gbps NHOP voice backup tunnel from PHX
to DAL would result in carrying more than 30% of voice traffic during failure
on every link.

Let us consider the path PHX-DEN-CHI-DAL: Every link is OC192 and
can carry up to 20% of voice at steady state (2 Gbps) and 30% during failure
(3 Gbps). Routing a single voice backup tunnel of 2 Gbps would result in
2 Gbps + 2 Gbps = 4 Gbps worth of voice traffic: 40% of voice traffic! Hence, the
maximum backup capacity that can be used for voice backup on those links is
1 Gbps, to respect the constraint of 30% of voice traffic on every link during failure.

Hence, multiple NHOP voice backup tunnels (noted as BV) are configured:

. BV1: 1 Gbps follows the path PHX-DEN-CHI-DAL.

. BV2: 250 Mbps follows the path PHX-DEN-WAS-DAL (the link WAS-DAL
is an OC48 link, so the maximum backup capacity it can offer is
(30% - 20%) = 10% of OC48, which is 250 Mbps).

. BV3: 500 Mbps follows the path PHX-HOU-MIA-DAL (along this path,
the maximum backup capacity is 10% of 2 * OC48 [there are two OC48
links between HOU and MIA], which gives a total of 500 Mbps).

. BV4: 250 Mbps follows the path PHX-MIA-DAL.

So the sum of the bandwidths of BV1, BV2, BV3, and BV4 is 2 Gbps (as required),
and during failure, no link carries more than 30% of voice traffic. The propagation
delay increase constraint is also satisfied, as the backup paths never increase the
propagation delay by more than 50%, which was another requirement. Note that the path
PHX-SFO-SEA-BOS-CHI-DAL offers 1 Gbps of backup bandwidth but does not
satisfy the propagation delay increase constraint.
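The arithmetic above can be checked with a short script. The sketch below is a minimal illustration under the case study's rounded figures (OC192 taken as 10 Gbps, OC48 as 2.5 Gbps); the helper names are hypothetical, and BV4's 250 Mbps is taken directly from the text because its bottleneck link is not stated.

```python
# Rounded capacities in Mbps, as used in the case study (assumption:
# OC192 is about 10 Gbps and OC48 about 2.5 Gbps).
OC192 = 10_000
OC48 = 2_500

STEADY_PCT = 20   # max share of voice on any link at steady state
FAILURE_PCT = 30  # max share of voice on any link during a failure


def voice_backup_headroom(capacity_mbps):
    """Voice bandwidth a link can absorb from backup tunnels, in Mbps."""
    return capacity_mbps * (FAILURE_PCT - STEADY_PCT) // 100


def path_headroom(capacities_mbps):
    """A backup tunnel is bounded by the most constrained link on its path."""
    return min(voice_backup_headroom(c) for c in capacities_mbps)


# Bandwidth to protect on PHX-DAL: 20% of an OC192.
required = OC192 * STEADY_PCT // 100

# A single 2 Gbps tunnel over an all-OC192 path: steady voice + backup voice.
share_single_pct = 100 * (OC192 * STEADY_PCT // 100 + required) // OC192

# Per-tunnel limits, modeling only the bottleneck the text identifies.
# Note: this does not check links shared by several tunnels.
tunnels = {
    "BV1 PHX-DEN-CHI-DAL": path_headroom([OC192, OC192, OC192]),
    "BV2 PHX-DEN-WAS-DAL": voice_backup_headroom(OC48),      # WAS-DAL is OC48
    "BV3 PHX-HOU-MIA-DAL": voice_backup_headroom(2 * OC48),  # 2 x OC48 HOU-MIA
    "BV4 PHX-MIA-DAL": 250,                                  # value from the text
}

print(f"required = {required} Mbps, single tunnel -> {share_single_pct}% voice")
for name, bw in tunnels.items():
    print(f"{name}: {bw} Mbps")
print(f"total = {sum(tunnels.values())} Mbps")
```

The script reproduces the 40% figure for a single tunnel and confirms that the four tunnels together provide the required 2 Gbps.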

Once the set of data and voice backup tunnels is computed and signaled, when
a data fast-reroutable TE LSP is signaled across the PHX-DAL link, the BD1
backup tunnel is selected. If the link PHX-DAL fails, the data TE LSPs are rerouted
within 50 ms, without any QoS guarantees during the failure. The situation for voice
TE LSPs is slightly different because the node PHX (acting as a PLR) must select a
voice backup tunnel among the set of available backup tunnels BV1, BV2, BV3, and
BV4. Various algorithms can be implemented to make the appropriate selection. It is
worth underscoring that a fast-reroutable TE LSP is always rerouted onto a single
backup tunnel; its traffic is not load balanced over multiple backup tunnels.
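The text leaves the selection algorithm open. As one possible policy, purely illustrative and not the one prescribed by the case study or by any vendor, the sketch below uses a best-fit rule: the PLR picks the backup tunnel whose remaining bandwidth fits the voice TE LSP most tightly and places the whole TE LSP on that single tunnel. The class name, function name, and TE LSP sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupTunnel:
    name: str
    capacity_mbps: int   # bandwidth reserved by the backup tunnel
    used_mbps: int = 0   # bandwidth already promised to protected TE LSPs

    @property
    def free_mbps(self) -> int:
        return self.capacity_mbps - self.used_mbps


def select_backup(tunnels: List[BackupTunnel],
                  lsp_bw_mbps: int) -> Optional[BackupTunnel]:
    """Best-fit choice of ONE backup tunnel for a voice TE LSP (hypothetical)."""
    candidates = [t for t in tunnels if t.free_mbps >= lsp_bw_mbps]
    if not candidates:
        return None  # no single tunnel can grant bandwidth protection
    best = min(candidates, key=lambda t: t.free_mbps)  # tightest fit
    best.used_mbps += lsp_bw_mbps  # the whole TE LSP goes onto one tunnel
    return best


tunnels = [BackupTunnel("BV1", 1000), BackupTunnel("BV2", 250),
           BackupTunnel("BV3", 500), BackupTunnel("BV4", 250)]

assignments = []
for lsp_bw in (600, 300, 200, 250, 400, 200):  # hypothetical TE LSP sizes
    chosen = select_backup(tunnels, lsp_bw)
    assignments.append((lsp_bw, chosen.name if chosen else None))

print(assignments)
```

With these inputs the last 200 Mbps TE LSP finds no tunnel: 250 Mbps remains in total, but no single tunnel has 200 Mbps free, and the TE LSP cannot be split across tunnels. Best fit keeps large free blocks (here BV1 and BV3) available for large TE LSPs, at the cost of such fragmentation.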

Node Protection

A set of data and voice backup tunnels must also be configured to protect the TE
LSPs traversing the node of Dallas from a node failure. As in the case of link
protection, for every NNHOP neighbor, a set of NNHOP backup tunnels must be
computed. There are two types:

. Data NNHOP backup tunnel: The only constraint for the data backup
tunnels is to be SRLG diverse from the node to protect. Indeed, QoS
degradation is acceptable for data traffic based on the set of requirements.

. Set of voice NNHOP backup tunnels: Two constraints are taken into account
for voice backup tunnels: the bandwidth and the propagation delay
increase, which must be bounded to 50%.

A similar approach to that used for link protection is taken for the computation of
the set of required NNHOP backup tunnels.

Illustration of the Concept of Bandwidth Sharing

The concept of bandwidth sharing between backup tunnels protecting independent

resources is extensively covered in Section 5.15. Objective 3 of this case study clearly

states that the network bandwidth capacity must be minimized. Figure 5.35 shows

how the backup bandwidth is shared between voice backup tunnels protecting

independent network resources. In Figure 5.35 some link capacities have been

modified compared to Figure 5.33.

Let us consider the set of voice backup tunnels from the node of Phoenix
required to protect the voice TE LSPs following the paths PHX-DAL-DEN and
PHX-DAL-WAS in the case of a failure of the node Dallas. From the node of
Phoenix, several sets of backup tunnels must be computed:

. BV1: from PHX to DEN to protect the TE LSPs following the path

PHX-DAL-DEN against a node failure of DAL

. BV2: from PHX to WAS to protect the TE LSPs following the path

PHX-DAL-WAS against a node failure of DAL

. BV3: from PHX to CHI to protect the TE LSPs following the path

PHX-DAL-CHI against a node failure of DAL



. BV4: from PHX to MIA to protect the TE LSPs following the path PHX-

DAL-MIA against a node failure of DAL

Let us just consider the two sets of backup tunnels (BV1 and BV2) to simplify

the example. One can make several interesting observations:

. Under the assumption that no more than 20% of voice traffic can be routed
on every link, the required capacity for BV1 is
0.2 * min(PHX-DAL link capacity, DAL-DEN link capacity) = 0.2 * min(OC192, OC48) = 500 Mbps.

. Under the assumption that no more than 20% of voice traffic can be routed
on every link, the required capacity for BV2 is
0.2 * min(PHX-DAL link capacity, DAL-WAS link capacity) = 0.2 * min(OC192, OC48) = 500 Mbps.

. In the case of failure of the Dallas node, both BV1 and BV2 are simultaneously
active, so they cannot share any bandwidth along any path that they
have in common.

So the computed paths for the voice backup tunnels BV1 and BV2 are PHX-DEN
and PHX-DEN-WAS, and the amount of voice bandwidth reserved from the backup
bandwidth pool on the link PHX-DEN is 500 Mbps + 500 Mbps = 1 Gbps.

Figure 5.35 Bandwidth sharing between voice backup tunnels protecting different nodes. (Backup tunnels shown: BV1, BV2, and BV5; links are OC3, OC48, or OC192.)



Let us now consider the set of voice backup tunnels required to protect the
TE LSPs following the SEA-CHI-NYC path in the case of a failure of
the node of Chicago. The required capacity is 20% of min(SEA-CHI link capacity,
CHI-NYC link capacity) = 500 Mbps. Hence, a single NNHOP voice backup tunnel
BV5 that follows the path SEA-SLC-DEN-WAS-NYC satisfies the constraints in
terms of required bandwidth, maximum amount of voice traffic routed on
every link during failure (30%), and propagation delay increase bound (50%).

Because BV1 + BV2 and BV5 protect voice TE LSPs against the failure of
independent resources (respectively the nodes of Dallas and Chicago), thanks to
the single failure assumption (valid in this case study, as mentioned in the set of
assumptions), they can share the backup bandwidth! So the amount of capacity
required by both BV2 and BV5 on the DEN-WAS link is max(500 Mbps, 500 Mbps)
= 500 Mbps and not the sum of their bandwidths. This very significantly
reduces the required backup capacity in the network. Note that the maximum
amount of backup bandwidth available on the link DEN-WAS is (30% - 20%) = 10%
of the link capacity, that is, 500 Mbps. Without this property of bandwidth sharing, the
link DEN-WAS could not have accommodated the bandwidth requirements of both BV2
and BV5.
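The backup reservation rule behind this sharing can be sketched as follows. Under the single failure assumption, tunnels protecting the same resource add up on a link, while tunnels protecting different resources only require the maximum. The tunnel data mirrors the BV1, BV2, and BV5 example; the function names are hypothetical, and the 500 Mbps DEN-WAS headroom is the figure stated in the text.

```python
from collections import defaultdict

# (tunnel, protected resource, bandwidth in Mbps, path) from the case study.
TUNNELS = [
    ("BV1", "node DAL", 500, ["PHX", "DEN"]),
    ("BV2", "node DAL", 500, ["PHX", "DEN", "WAS"]),
    ("BV5", "node CHI", 500, ["SEA", "SLC", "DEN", "WAS", "NYC"]),
]
DEN_WAS_HEADROOM = 500  # backup bandwidth available on DEN-WAS, per the text


def links(path):
    """Undirected link keys along a path, e.g. ('DEN', 'WAS')."""
    return [tuple(sorted(hop)) for hop in zip(path, path[1:])]


def backup_reservation(tunnels, shared=True):
    """Backup bandwidth (Mbps) to reserve on each link."""
    per_link = defaultdict(lambda: defaultdict(int))  # link -> resource -> Mbps
    for _name, resource, bw, path in tunnels:
        for link in links(path):
            per_link[link][resource] += bw
    # Single failure assumption: tunnels protecting the SAME resource are
    # active together (their bandwidth adds up); tunnels protecting different
    # resources are never active at the same time (the max is enough).
    combine = max if shared else sum
    return {link: combine(by_res.values()) for link, by_res in per_link.items()}


with_sharing = backup_reservation(TUNNELS, shared=True)
without_sharing = backup_reservation(TUNNELS, shared=False)
DEN_WAS = ("DEN", "WAS")
print(with_sharing[DEN_WAS], without_sharing[DEN_WAS])  # 500 1000
print(with_sharing[DEN_WAS] <= DEN_WAS_HEADROOM,
      without_sharing[DEN_WAS] <= DEN_WAS_HEADROOM)     # True False
print(with_sharing[("DEN", "PHX")])                     # 1000
```

Note that PHX-DEN still needs 1 Gbps even with sharing, because BV1 and BV2 protect the same node and are active simultaneously.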

5.12 Standardization

Before we elaborate on the standardization aspects of the MPLS traffic recovery
mechanisms described in this chapter, one important point is worth highlighting:
You might have noticed that several of the references provided refer
to IETF drafts that are not yet RFCs (Requests for Comments). Strictly speaking,
even if an IETF draft is not yet a standard, this does not preclude it from being a
technology already available in commercial products and deployed in existing
networks. A good illustration of this statement is MPLS TE local protection
(Fast Reroute). At the time of writing, this is still an IETF draft, draft-ietf-mpls-rsvp-lsp-fastreroute,
which will likely become an RFC soon. Some other drafts are
still individual submissions that will potentially never become RFCs.
Note that the IP and MPLS standards are specified by the IETF.

The aim of this section is not to provide an exhaustive list of all the standards
related to MPLS TE recovery but to highlight the most important ones. Several
IETF drafts have been listed throughout the chapter in the related sections. The
reference web site where all the related IETF drafts and RFCs can be consulted is the
IETF web site, at www.ietf.org.

We first saw MPLS TE global default restoration: By definition, this does not
require any standard other than the RFC that defines the signaling extensions for
MPLS TE: [RSVP-TE]. In addition, an interesting document to consult is [FM-RECOV],
which defines a framework for MPLS-based recovery.

Then, the next MPLS TE recovery mechanism covered in this chapter was
MPLS TE global path protection. Because it simply relies on the setup of diversely
routed TE LSPs, there is no specific standard in addition to MPLS TE. Indeed, the
path computation of diversely routed TE LSPs does not need to be standardized.

Finally, the last MPLS TE recovery mechanism studied in
detail is MPLS TE local protection (Fast Reroute): The main IETF draft specifying
both the facility backup and the one-to-one backup local protection recovery
techniques is [FAST-REROUTE]. In addition, other drafts related to
backup path computation models include [FACILITY-BACKUP], [KINI], and
[BP-PLACEMENT]. Several other IETF drafts have been listed throughout this
chapter.

Usually, the question that immediately arises is: Does a protocol specification
have to be a standard to be implemented in commercial products? The answer is a
definite no, and we saw several examples in this chapter. Indeed, vendors may
decide to implement a protocol that is not yet an RFC based on customers’
demands or on their confidence that the IETF draft will become an RFC.

5.13 Summary

The first part of this chapter was devoted to the study of the MPLS TE protection
and restoration mechanisms: Global default restoration, the default rerouting
mode of MPLS TE, was first introduced. Then various protection mechanisms
were covered: global path protection, which provides a substantially faster
convergence time than global default restoration but adds a significant amount of backup
state, which may be a limitation in large networks. Moreover, in networks where
the propagation delay can be significant, a convergence time of a few tens of
milliseconds is not achievable with global path protection. Then a large part of this section was dedicated to Fast
Reroute, which provides not only a fast convergence time (tens of milliseconds) upon
a link or node failure but also strict QoS during failure in terms of bandwidth
and propagation delay guarantees. Furthermore, MPLS TE allows the use of different
classes of recovery assigned on a per TE LSP basis, with a high degree of
granularity. As mentioned in this chapter, strict QoS during failure is required neither in
every network (only where bandwidth is very scarce) nor for every traffic type.

Finally, for the sake of reference, one last recovery mechanism (not really
pursued in the industry) was briefly presented: 1+1 protection. Then a
comparison of the different sets of mechanisms was provided, with their
respective advantages and drawbacks.

Strictly speaking, load balancing cannot be considered a recovery technique.

That said, it has been shown that load balancing can contribute to reducing the

impact of a network failure on the traffic flow between two points, with some

drawbacks though.

An entire section dealt in detail with the delicate problem of failure detection

and characterization and the impact of each failure profile on the forwarded traffic.

Finally, to conclude the first part of this chapter, three case studies were
proposed, each having a different set of assumptions and objectives: from a simple
objective of fast convergence upon link failure in an IP network to a more complex
network with a wide set of objectives, including fast convergence upon link or node
failures with different classes of recovery, where network backup bandwidth has to
be minimized.

In the second part of this chapter (Sections 5.14 and 5.15), we explore some
specific advanced topics, which are not required to understand the MPLS recovery
mechanisms but might be interesting if you want to read advanced material on the
subject. First, we investigate in detail the signaling extensions that have been
specified for MPLS TE Fast Reroute. The reader interested in the mechanisms of
Fast Reroute but not in the detailed signaling aspects may want to skip that section.

In the first part of this chapter, we saw that both MPLS TE global path

protection and local protection use preestablished backup tunnels; we will see

that there are several techniques to compute the path of those backup tunnels

depending on the set of recovery objectives and network constraints.

5.14 RSVP Signaling Extensions for MPLS TE Local Protection

This section defines the set of RSVP signaling extensions for the two local repair

techniques: facility backup and one-to-one backup. Note that some RSVP exten-

sions are common to both techniques, whereas others are specific to either facility

backup or one-to-one backup.

5.14.1 SESSION-ATTRIBUTE Object

The format of the RSVP SESSION-ATTRIBUTE object carried in an RSVP Path

message is depicted in Figure 5.36. Let us detail the various flags defined in the

RSVP SESSION-ATTRIBUTE object:

Figure 5.36 RSVP SESSION-ATTRIBUTE object. (There are two SESSION-ATTRIBUTE formats, with and without resource affinities; the figure shows the format with resource affinities, Class = 207, C-Type = 1. Fields, in 4-byte rows: Exclude-any; Include-any; Include-all; Setup Prio, Hold Prio, Flags, Name Length; Session Name. Flags: Local protection desired, Label recording desired, SE style desired, Bandwidth protection desired, Node protection desired, Soft preemption desired, ERO object expansion request.)



1. Setup and holding priorities: These fields are not specific to Fast Reroute.
The setup priority characterizes the ability of a TE LSP to obtain resources,
potentially preempting existing TE LSPs with lower priority. The holding
priority defines the priority of a TE LSP once set up (used to decide whether
the TE LSP can be preempted by another TE LSP). The reader may refer to
[RSVP-TE] for details.

2. Flags field.

Local protection desired: 0x01

When set, this flag signals a fast-reroutable TE LSP.

Label recording desired: 0x02

This flag indicates that the labels used for this TE LSP must be recorded in the
RSVP RRO object carried in the RSVP Resv message. The RRO object is described
in Section 5.14.4. The label recording flag must be set for a fast-reroutable TE LSP.
As mentioned earlier, label recording is necessary to discover the label used between
the NHOP and NNHOP LSRs in the case of facility backup with NNHOP backup
tunnels. So when this flag is set, every node along the TE LSP path inserts a
Label subobject, in addition to its IPv4 subobject, in the RRO object carried in the
RSVP Resv message, which travels in the upstream direction (from the tail-end LSR
to the head-end LSR). This provides the required information about the label used
by downstream nodes.

SE (Shared Explicit) style desired: 0x04

The ‘‘Shared Explicit’’ flag allows two TE LSPs to share some reservation and is
used when a TE LSP is rerouted (e.g., when a TE LSP is reoptimized along a shorter
path, the new TE LSP shares its reservation with the ‘‘old’’ one before this ‘‘old’’
reservation is torn down: This is known as the make-before-break procedure). When
requesting Fast Reroute, the head-end LSR should set this flag.

Bandwidth protection desired: 0x08

When set, this flag indicates that the TE LSP requests bandwidth guarantees during
failure (the period during which the fast-reroutable TE LSP is rerouted onto its backup
tunnel) and so should not suffer from QoS degradation during failure. If a different
bandwidth value is requested during failure (less than the original bandwidth),
then the bandwidth to use in case of failure is specified in the FAST-REROUTE
object defined below.

Node protection desired: 0x10

When set, this signals to the LSRs along the path that an NNHOP backup tunnel
should preferably be selected over an NHOP backup tunnel.

Soft preemption desired: 0x40

The soft preemption flag is used to indicate that soft preemption is desired (as

opposed to ‘‘hard’’ preemption).
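The Flags octet is a plain bitmask, so the values listed above can be combined and decoded with bitwise operations. A minimal Python sketch, assuming the flag values given in this section; the voice TE LSP request and the `decode` helper are hypothetical examples.

```python
from enum import IntFlag


class SessionAttributeFlags(IntFlag):
    """Flag values of the SESSION-ATTRIBUTE object listed in the text."""
    LOCAL_PROTECTION_DESIRED = 0x01
    LABEL_RECORDING_DESIRED = 0x02
    SE_STYLE_DESIRED = 0x04
    BANDWIDTH_PROTECTION_DESIRED = 0x08
    NODE_PROTECTION_DESIRED = 0x10
    SOFT_PREEMPTION_DESIRED = 0x40


F = SessionAttributeFlags

# Hypothetical head-end request for a voice TE LSP: Fast Reroute with
# label recording, SE style, bandwidth protection, and node protection.
voice_flags = (F.LOCAL_PROTECTION_DESIRED | F.LABEL_RECORDING_DESIRED
               | F.SE_STYLE_DESIRED | F.BANDWIDTH_PROTECTION_DESIRED
               | F.NODE_PROTECTION_DESIRED)


def decode(flags_byte):
    """Names of the flags set in a received Flags octet (unknown bits ignored)."""
    return [flag.name for flag in SessionAttributeFlags if flags_byte & flag]


print(hex(voice_flags))  # 0x1f
print(decode(0x19))      # local, bandwidth, and node protection desired
```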


Important note: It is important to underscore the term desired. If the request cannot
be satisfied, the PLR can decide either not to set up the TE LSP or to select a
backup tunnel not satisfying the request. This is a local decision. For instance,
suppose a TE LSP carrying voice traffic requests both Fast Reroute and
bandwidth protection. If a PLR along the path cannot find a backup tunnel with the
requested amount of bandwidth, the PLR may select a backup tunnel to fast
reroute the TE LSP in the case of failure, even if the bandwidth request in the
case of failure is not satisfied. Note that additional mechanisms should be used to
ensure that this decision preserves the bandwidth guarantees that might have
been provided to other TE LSPs.

Another object, the RRO object, described later in this section, is used by each PLR
to indicate whether the request is satisfied; in other words, whether a backup tunnel has
been selected, the nature of the selected backup tunnel (NHOP or NNHOP backup
tunnel), and finally whether the bandwidth protection request could be satisfied.

5.14.2 FAST-REROUTE Object

The purpose of the FAST-REROUTE object is to signal the requirements of the

backup tunnel to use for a fast reroutable TE LSP. The FAST-REROUTE object is

carried in RSVP Path messages and its format is described in Figure 5.37.

Each RSVP object has a class and a C-type. The class of the FAST-REROUTE
object had not been assigned at the time of writing but will use the form 11bbbbbb for
compatibility (this allows an RSVP implementation that does not recognize
this object to just ignore it and forward it unchanged to the downstream nodes).
The C-type value is 1. Let us now detail the different fields of the
FAST-REROUTE object depicted in Figure 5.37(a).

Setup and holding priorities: Both the setup and the holding priorities are used
to specify the priorities of the backup tunnel. They are used exactly as for
any other TE LSP, as defined in [RSVP-TE].

Figure 5.37 FAST-REROUTE and DETOUR objects. ((a) FAST-REROUTE object, in 4-byte rows: Length (bytes), Class-num, C-Type; Setup Prio, Hold Prio, Hop Limit, Flags; Bandwidth; Include-any; Exclude-any; Include-all. (b) DETOUR object, in 4-byte rows: Length (bytes), Class-num, C-Type; PLR ID 1, Avoid Node ID 1 through PLR ID n, Avoid Node ID n.)



Hop-limit: The hop-limit field specifies the maximum number of hops between a

PLR and an MP (a value of 0 means that just direct links can be used).

Flags: Two methods for Fast Reroute local repair were presented in the
first part of this chapter: facility backup and one-to-one backup. This field
allows the head-end to specify the method requested at each PLR along the path:

One-to-one Backup Desired: 0x01

Facility Backup Desired: 0x02

Bandwidth: This field indicates the bandwidth required to protect the TE LSP.
It is a 32-bit IEEE floating-point number, in bytes per second.

Exclude-any: 32-bit vector representing a set of attribute filters associated with

a backup path any of which renders a link unacceptable.

Include-any: 32-bit vector representing a set of attribute filters associated with a

backup path any of which renders a link acceptable (with respect to this test).

A null set (all bits set to zero) automatically passes.

Include-all: 32-bit vector representing a set of attribute filters associated with a

backup path all of which must be present for a link to be acceptable (with

respect to this test). A null set (all bits set to zero) automatically passes.
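To make the layout concrete, the sketch below encodes a FAST-REROUTE object with Python's `struct` module. It assumes the word order of Figure 5.37(a) (priorities, hop limit, and flags; bandwidth; Include-any; Exclude-any; Include-all) and a 4-byte RSVP object header (length, class-num, C-type). Because the class-num was not assigned at the time of writing, 0xCD is used only as an example value of the required 11bbbbbb form; the function name and the RED affinity bit are hypothetical.

```python
import struct

ONE_TO_ONE_BACKUP_DESIRED = 0x01
FACILITY_BACKUP_DESIRED = 0x02


def pack_fast_reroute(class_num, setup_prio, hold_prio, hop_limit, flags,
                      bandwidth_bytes_per_s, include_any=0, exclude_any=0,
                      include_all=0):
    """Encode a FAST-REROUTE object (C-Type 1) in network byte order."""
    if class_num >> 6 != 0b11:
        raise ValueError("FAST-REROUTE class-num must have the form 11bbbbbb")
    body = struct.pack(
        "!BBBBfIII",
        setup_prio, hold_prio, hop_limit, flags,
        bandwidth_bytes_per_s,                  # 32-bit IEEE float, bytes/s
        include_any, exclude_any, include_all,  # assumed order, Fig. 5.37(a)
    )
    c_type = 1
    length = 4 + len(body)  # object length includes the 4-byte header
    return struct.pack("!HBB", length, class_num, c_type) + body


RED = 0x00000001  # hypothetical affinity bit for long-delay links

frr = pack_fast_reroute(class_num=0xCD, setup_prio=7, hold_prio=7,
                        hop_limit=16, flags=FACILITY_BACKUP_DESIRED,
                        bandwidth_bytes_per_s=500e6 / 8,  # 500 Mbps
                        exclude_any=RED)
print(len(frr), frr.hex())
print(struct.unpack("!f", frr[8:12])[0])  # 62500000.0
```

Note the unit conversion: a 500 Mbps requirement is carried as 62,500,000 bytes per second.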

Note: Using attribute filters can be very useful. Indeed, MPLS TE allows the use of
colors (also called affinities; see [TE-REQ] for detailed requirements). Affinities can
be used during path computation to include or exclude some particular links. For
instance, let us suppose that links with long propagation delays are marked with the
color red (this would correspond to a particular bit of the resource class affinity
vector). This property is carried within the IGP TE extensions. Then one of the
attributes of a primary TE LSP carrying voice traffic will be to exclude from its
path all the red links (links with long propagation delays). The same set of rules
applies to the backup tunnel. If the PLR needs to set up a backup tunnel to
protect fast-reroutable TE LSPs requesting bandwidth protection, for instance,
it can use affinities to avoid red links. Note that the affinity constraints of the
fast-reroutable TE LSP may differ from those of the corresponding backup
tunnel.
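The semantics of the three affinity filters described above can be expressed as a single predicate over a link's 32-bit attribute vector. A minimal sketch; the RED and BLUE bit assignments and the link names are hypothetical.

```python
RED = 0x00000001   # hypothetical: links with long propagation delay
BLUE = 0x00000002  # hypothetical: some other administrative color


def link_acceptable(link_attrs, include_any=0, exclude_any=0, include_all=0):
    """Apply the Exclude-any, Include-any, and Include-all filters to a link."""
    if link_attrs & exclude_any:
        return False  # the link carries at least one excluded color
    if include_any and not (link_attrs & include_any):
        return False  # none of the acceptable colors is present
    if (link_attrs & include_all) != include_all:
        return False  # a required color is missing
    return True       # null filters (all bits zero) pass automatically


LINKS = {"link-a": 0, "link-b": RED, "link-c": BLUE, "link-d": RED | BLUE}

# A PLR computing a voice backup tunnel that must avoid red links:
usable = [name for name, attrs in LINKS.items()
          if link_acceptable(attrs, exclude_any=RED)]
print(usable)  # ['link-a', 'link-c']
```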

When used, the FAST-REROUTE object can only be inserted by the head-end

of a TE LSP and cannot be changed by any other LSR along the TE LSP path.

5.14.3 DETOUR Object

The RSVP DETOUR object (whose format is depicted in Figure 5.37(b)) is specific
to the one-to-one backup method and is used to identify Detour LSPs. The
DETOUR object did not have a standardized class-num at the time of writing,
but it will have the form 0bbbbbbb. It is worth noticing at this point that the
high-order bit of the class-num is 0, which implies that a node receiving a Path message
with a DETOUR object must reject the Path message if it does not support that
object, and it must send an RSVP Path Error message to the PLR. The C-type is 7.
Let us now describe the different fields of the DETOUR object.



PLR ID and Avoid Node ID: The PLR ID is an IPv4 address of the PLR and

the ‘‘Avoid Node ID’’ contains an IPv4 address of the immediate downstream

neighbor (preferably its router-ID) that the PLR wants to avoid. The reason for

multiple possible (PLR ID, Avoid Node ID) pairs is that Detour LSPs might be

merged to reduce the total number of Detour LSPs in a network. In that case,

when multiple Detour LSPs are merged by the Detour Merge Point (DMP), the

DETOUR object of the merged Detour LSP contains all the pairs of (PLR ID,

Avoid Node ID) of the merged Detour LSPs. An example will be provided later

in this section that illustrates the use of the PLR ID and Avoid Node ID.
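The high-order bits of the class-num thus determine how a node that does not recognize an object reacts. The sketch below encodes the three cases defined by the RSVP specification; the text above relies on two of them (0bbbbbbb for DETOUR and 11bbbbbb for FAST-REROUTE), and the third (10bbbbbb: ignore the object and do not forward it) is included for completeness. The enum, function name, and example class-num values are hypothetical.

```python
from enum import Enum


class UnknownClassAction(Enum):
    REJECT_AND_SEND_ERROR = 1  # class-num 0bbbbbbb
    IGNORE_AND_DROP = 2        # class-num 10bbbbbb
    IGNORE_AND_FORWARD = 3     # class-num 11bbbbbb


def action_for_unknown(class_num):
    """What a node that does not recognize an object does with it."""
    if not 0 <= class_num <= 0xFF:
        raise ValueError("class-num is an 8-bit field")
    if (class_num & 0x80) == 0:
        return UnknownClassAction.REJECT_AND_SEND_ERROR
    if (class_num & 0x40) == 0:
        return UnknownClassAction.IGNORE_AND_DROP
    return UnknownClassAction.IGNORE_AND_FORWARD


print(action_for_unknown(0x3F))  # a 0bbbbbbb value, as for DETOUR
print(action_for_unknown(0xCD))  # an 11bbbbbb value, as for FAST-REROUTE
```

This is why an LSR without one-to-one backup support stops a Detour LSP, while an LSR that does not understand FAST-REROUTE simply passes the object along.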

5.14.4 Record Route Object

Another important object to describe is the RRO object (Figure 5.38). This object
was not explicitly defined for Fast Reroute, but several new flags required for
Fast Reroute have been added. The RSVP RRO object has a Class-num = 21 and a
C-Type = 1.

The RRO object is used to record the route, labels, and other useful information
(detailed hereafter) along a TE LSP path, and it is made of variable-length subobjects:

The IPv4 address subobject is quite simple: The type 0x01 defines an IPv4 address
subobject, and the IPv4 address field specifies a regular IPv4 address of the recording node.

Then several important flags for Fast Reroute are defined:

0x01 Local protection available: This flag indicates that a backup tunnel is

available at the PLR adding the subobject.

0x02 Local protection in use: When Fast Reroute is triggered on a PLR because
of a link or node failure, the PLR sets this flag in the corresponding IPv4
subobject. This indicates that Fast Reroute is in use and that the protected TE
LSP is rerouted over a backup tunnel at this node. Before any failure occurs,
this flag must be cleared.

Figure 5.38 RRO object and subobjects. (The RRO object is a sequence of subobjects; the IPv4 address subobject carries Type, Length, IPv4 Address, Prefix Length, and Flags; the Label subobject carries a Label. Each row is 4 bytes wide.)

0x04 Bandwidth protection: As mentioned above, a TE LSP has the option
to signal its desire to be protected with a backup tunnel offering an equivalent
bandwidth (the TE LSP is then said to be ‘‘bandwidth protected’’), either by setting
the ‘‘bandwidth protection desired’’ bit in the SESSION-ATTRIBUTE object
or by including a FAST-REROUTE object in the RSVP Path message.
When the bandwidth protection request can be satisfied (a backup tunnel
offering an equivalent bandwidth can be selected by the PLR), the ‘‘bandwidth
protection’’ flag of the IPv4 subobject is set. If bandwidth protection is
requested, then each PLR must set this flag appropriately. If bandwidth
protection is not explicitly requested, the PLR may choose whether or not to
set the bit.

0x08 Node protection: Protection from node failure can be explicitly
requested for a particular TE LSP by setting the ‘‘node protection desired’’
bit in the SESSION-ATTRIBUTE object. If the PLR can find an NNHOP
backup tunnel, then the ‘‘node protection’’ bit is set; otherwise (an NHOP
backup tunnel has been selected), this bit is cleared, and just the
‘‘Local protection available’’ bit is set. Similar to the previous case, if ‘‘Node
protection’’ is requested, each PLR must set this flag appropriately. If
node protection is not explicitly requested, the PLR may choose whether or not
to set the bit.

As already mentioned, there may be some situations where a request cannot be
fully satisfied. Suppose, for instance, that a TE LSP requests local protection
(setting the ‘‘Local protection desired’’ bit of its SESSION-ATTRIBUTE
object or using the FAST-REROUTE object) along a path R1-R2-R3-R4-R5.
If all the nodes can select a backup tunnel in case of link/node failure except the
node R3 (because its backup tunnel is down or just not configured), the RRO
object carried in the RSVP Resv message sent from R5 to R1 (in the upstream
direction) will contain a set of IPv4 subobjects listing all the nodes from R5 to
R1, with the IPv4 subobject of R3 having its ‘‘Local protection available’’ bit
cleared. Another example is if the TE LSP has requested bandwidth protection
and the node R2 can find a backup tunnel, but not one offering an equivalent
bandwidth. In that case, the ‘‘Bandwidth protection’’ bit of the IPv4
subobject of R2 will be cleared. The RRO object is a very efficient way of
signaling the protection status at each hop. This can be used for troubleshooting
on the head-end LSR or to take some appropriate actions at the head-end LSR.

Label subobject: This subobject contains a 32-bit label and is used to learn
downstream labels; it must be included by each node if the ‘‘label recording desired’’ bit
of the SESSION-ATTRIBUTE object carried in the RSVP Path message has been
set. Note that the presence of this subobject is of the utmost importance for Fast
Reroute facility backup, so that the PLR learns the label to use when rerouting
protected TE LSPs onto an NNHOP backup tunnel, as previously explained in
detail.
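A head-end (or a troubleshooting tool) can derive the per-hop protection status, and a PLR the label recorded by its NNHOP, by walking the RRO subobjects. The sketch below assumes the RSVP-TE subobject layouts (IPv4: type 1, length 8, address, prefix length, flags; Label: type 3, length 8, flags, C-type, label) and that each node's Label subobject follows its IPv4 subobject. Addresses, labels, and helper names are hypothetical; the flags model the R1-R2-R3-R4-R5 example above.

```python
import ipaddress
import struct

# RRO IPv4 subobject flags described in the text.
LOCAL_PROTECTION_AVAILABLE = 0x01
LOCAL_PROTECTION_IN_USE = 0x02
BANDWIDTH_PROTECTION = 0x04
NODE_PROTECTION = 0x08

IPV4_SUBOBJECT = 0x01
LABEL_SUBOBJECT = 0x03


def ipv4_subobject(addr, flags):
    # Type, Length (8), IPv4 address, Prefix Length (32), Flags
    return (bytes([IPV4_SUBOBJECT, 8]) + ipaddress.IPv4Address(addr).packed
            + bytes([32, flags]))


def label_subobject(label):
    # Type, Length (8), Flags, C-Type (1), 32-bit label
    return bytes([LABEL_SUBOBJECT, 8, 0, 1]) + struct.pack("!I", label)


def parse_rro(body):
    """Return one dict per recording node: address, flags, and label."""
    hops, i = [], 0
    while i < len(body):
        sub_type, length = body[i], body[i + 1]
        if length < 4 or i + length > len(body):
            raise ValueError("malformed RRO subobject")
        sub = body[i:i + length]
        if sub_type == IPV4_SUBOBJECT:
            hops.append({"addr": str(ipaddress.IPv4Address(sub[2:6])),
                         "flags": sub[7], "label": None})
        elif sub_type == LABEL_SUBOBJECT and hops:
            # Assumption: a node's Label subobject follows its IPv4 one.
            hops[-1]["label"] = struct.unpack("!I", sub[4:8])[0]
        i += length  # skip subobjects we do not interpret
    return hops


# R2: backup tunnel but no equivalent bandwidth; R3: no backup tunnel;
# R4: bandwidth-protected backup; R5: tail-end (nothing to protect).
rro = (ipv4_subobject("192.0.2.2", LOCAL_PROTECTION_AVAILABLE)
       + label_subobject(1002)
       + ipv4_subobject("192.0.2.3", 0) + label_subobject(1003)
       + ipv4_subobject("192.0.2.4",
                        LOCAL_PROTECTION_AVAILABLE | BANDWIDTH_PROTECTION)
       + label_subobject(1004)
       + ipv4_subobject("192.0.2.5", 0) + label_subobject(1005))

hops = parse_rro(rro)
for hop in hops:
    protected = bool(hop["flags"] & LOCAL_PROTECTION_AVAILABLE)
    bw = bool(hop["flags"] & BANDWIDTH_PROTECTION)
    print(hop["addr"], "protected" if protected else "unprotected",
          "bw-protected" if bw else "no bw guarantee")

# R2, with an NNHOP backup tunnel bypassing R3, needs the label R4 recorded.
nnhop_label = next(h["label"] for h in hops if h["addr"] == "192.0.2.4")
print(nnhop_label)  # 1004
```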



5.14.5 Signaling a Protected Traffic Engineering LSP with a Set of Constraints

As already mentioned, a head-end LSR can use either the ‘‘Local protection
desired’’ bit of the SESSION-ATTRIBUTE object or the FAST-REROUTE object
to signal the fast-reroutable property of a TE LSP. Note that even if the
FAST-REROUTE object is used, [FAST-REROUTE] recommends also setting the ‘‘Local
protection desired’’ bit of the SESSION-ATTRIBUTE object.

Some other parameters/constraints pertaining to the protected TE LSP can also
be signaled: the request for bandwidth protection and/or node protection. This simply
requires setting the ‘‘bandwidth protection desired’’ and ‘‘node protection desired’’
bits, respectively, in the SESSION-ATTRIBUTE object of the RSVP Path message.

If additional control over the backup tunnel is required, the head-end can also
include a FAST-REROUTE object in the Path message, specifying the bandwidth,
attribute filters, hop limit, and priorities that apply to the backup tunnel.

If the head-end requires the PLR along the TE LSP path to use a particular

local repair technique (facility backup or one-to-one backup), the corresponding

flag should be set in the FAST-REROUTE object.

An example of the mode of operation for facility backup with node protection
has been provided in Section 5.5.4. It was mentioned that in this case, a label
discovery process is required so the PLR can discover the label used between the NHOP
and NNHOP to perform the appropriate label operation when Fast Reroute is
activated. The complete backup label discovery process is described below. At this
point, one just needs to mention that the ‘‘Label recording desired’’ bit must be set in
the SESSION-ATTRIBUTE object of the RSVP Path message. This triggers the label
recording process at each hop from the TE LSP tail-end LSR to the head-end LSR.

5.14.6 Identification of a Signaled TE LSP

A TE LSP is uniquely identified by two objects carried in the RSVP Path message:
the SESSION and the SENDER_TEMPLATE objects. More precisely, the
following fields present in those two objects uniquely identify the TE LSP:

. The IPv4 (or IPv6) tunnel endpoint address (IPv4 [or IPv6] address of the

egress node for the tunnel).

. The Tunnel ID (a 16-bit identifier used in the SESSION object that remains

constant over the life of the tunnel).

. The Extended Tunnel ID (a 32-bit [IPv4] or 128-bit [IPv6] identifier used in
the SESSION object that remains constant over the life of the tunnel). It is
normally set to all zeros. Ingress nodes that wish to narrow the scope of a
SESSION to the ingress-egress pair may place their IP address here as a
globally unique identifier.

. The IPv4 (or IPv6) tunnel sender address (IPv4 [or IPv6] address for a

sender node).

. The LSP ID (a 16-bit identifier used in the SENDER_TEMPLATE and the

FILTER_SPEC that can be changed to allow a sender to share resources

with itself).




With one-to-one backup, the backup LSP (also called Detour LSP) must be
differentiated from the protected LSP. Likewise, when a protected TE LSP is fast
rerouted using the facility backup method, the signaling must be updated so one can
differentiate the fast rerouted TE LSP from the original one. This differentiation is
necessary for merging and for the appropriate treatment of states.

Two methods have been defined to achieve this objective:

Method 1: The Sender-Template-Specific method (referred to as STS): With
this method, when the RSVP Path message of the rerouted TE LSP is sent
along the backup path, the five attributes mentioned above are unmodified,
except the ‘‘IPv4 tunnel sender address,’’ which is set by the PLR to one of its
local addresses (if the PLR is also the head-end LSR, this address must be
different from the original one).

Method 2: The Path-Specific method (referred to as PS): With this second
method, both the SESSION and the SENDER_TEMPLATE objects are
unchanged, but an additional object (the DETOUR object) is added. This
way, the protected TE LSP (also called the fast-reroutable TE LSP), which
contains a FAST-REROUTE object or has the ‘‘Local protection desired’’ bit of its
SESSION-ATTRIBUTE object set, can be differentiated from the backup LSP,
which contains a DETOUR object.

Facility backup always uses the STS method, whereas one-to-one backup may
use either the STS or the PS method.
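The difference between the two methods can be captured by treating the five identifying fields as a key. A minimal sketch, with hypothetical addresses and class names: STS changes only the sender address, while PS keeps the key and attaches DETOUR (PLR ID, Avoid Node ID) pairs, so in both cases the backup state compares unequal to the protected one.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LspKey:
    """The five fields that identify a TE LSP."""
    endpoint: str       # tunnel endpoint address (SESSION)
    tunnel_id: int      # 16-bit Tunnel ID (SESSION)
    ext_tunnel_id: str  # Extended Tunnel ID (SESSION)
    sender: str         # tunnel sender address (SENDER_TEMPLATE)
    lsp_id: int         # 16-bit LSP ID (SENDER_TEMPLATE)


@dataclass(frozen=True)
class PathState:
    key: LspKey
    # DETOUR object: (PLR ID, Avoid Node ID) pairs, PS method only.
    detour: Optional[Tuple[Tuple[str, str], ...]] = None


def sts_backup(protected, plr_address):
    """Sender-Template-Specific: same fields except the tunnel sender address."""
    if plr_address == protected.key.sender:
        raise ValueError("STS needs a sender address different from the original")
    return PathState(replace(protected.key, sender=plr_address))


def ps_backup(protected, plr_id, avoid_node_id):
    """Path-Specific: unchanged SESSION/SENDER_TEMPLATE plus a DETOUR object."""
    return PathState(protected.key, detour=((plr_id, avoid_node_id),))


protected = PathState(LspKey("192.0.2.5", 12, "192.0.2.1", "192.0.2.1", 1))
sts = sts_backup(protected, "192.0.2.2")             # R2's local address
ps = ps_backup(protected, "192.0.2.2", "192.0.2.3")  # R2 avoiding R3

print(sts != protected, ps != protected)  # True True
print(ps.key == protected.key)            # True
```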

5.14.7 Signaling with Facility Backup

Earlier in this chapter, we described the mode of operation of Fast Reroute facility
backup: In a nutshell, to protect a facility such as a link or a node, one or more backup
tunnels are preestablished and maintained by a PLR. When a TE LSP is first
signaled, a PLR analyzes the signaled parameters and selects the appropriate
backup tunnel to use in case of a failure. All these operations are performed before any
failure; upon a link or node failure, the protected TE LSPs are rerouted
onto their backup tunnels. This section details the signaling operations performed by
the PLR and the MP at each step of the rerouting procedure.

Point of Local Repair Behavior before the Failure

To select a backup tunnel for a TE LSP, when the TE LSP is first set up, any PLR

along the path first determines the TE LSP properties and requested attributes

explicitly signaled through RSVP:

1. Label recording desired (mandatory with facility backup): If set, the PLR

must insert a label subobject in the RRO object carried in the RSVP Resv

message sent upstream.

2. Local protection desired: If the ‘‘Local protection desired’’ bit of the SESSION-ATTRIBUTE object of the corresponding RSVP Path message is set and/or a FAST-REROUTE object is present in the RSVP Path message (in the latter case, an optional preference for the facility backup or one-to-one local protection technique may be signaled), then the TE LSP is said to be ‘‘fast reroutable’’ and a backup tunnel must be selected. If the PLR successfully selects a backup tunnel for the TE LSP, it must reflect this in the RRO object carried in the corresponding RSVP Resv message forwarded upstream by setting the ‘‘Local protection available’’ bit of the IPv4 RRO subobject; otherwise, the bit must be cleared. For example, if the backup tunnel selected for a protected TE LSP goes down, the bit must be cleared in the subsequent Resv messages sent upstream.

3. Bandwidth protection desired: If the ‘‘Bandwidth protection desired’’ bit of the SESSION-ATTRIBUTE object is set and/or a FAST-REROUTE object is present in the RSVP Path message with a ‘‘bandwidth’’ field set to the bandwidth required during failure, then a backup tunnel guaranteeing an equivalent QoS during failure should be selected. If the request can be satisfied, then the ‘‘Bandwidth protection’’ bit of the IPv4 RRO subobject carried in the corresponding RSVP Resv messages forwarded upstream must be set.

4. Node protection desired: If the ‘‘Node protection desired’’ bit of the SESSION-ATTRIBUTE object is set and/or a FAST-REROUTE object is present (with hop limit > 0), the PLR should try to find a backup tunnel that does not terminate at the NHOP (i.e., a backup tunnel that protects against more than just a link failure). If this is not possible, the ‘‘Node protection’’ bit of the IPv4 RRO subobject carried in the RSVP Resv message forwarded upstream must be cleared.
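The rules above can be summarized as a small function that derives the IPv4 RRO subobject flags a PLR reports upstream once it has (or has not) selected a backup tunnel. This is a minimal sketch, not an implementation: the flag values are the ones we understand RFC 3209 and RFC 4090 to assign (worth checking against the specifications), and the `FastRerouteObject` and `BackupTunnel` classes are hypothetical simplifications. Backup selection itself, where the desired bits actually matter most, is a local decision and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# SESSION_ATTRIBUTE flag values (as we understand RFC 3209 / RFC 4090).
SA_LOCAL_PROTECTION_DESIRED = 0x01
SA_BW_PROTECTION_DESIRED = 0x08

# IPv4 RRO subobject flag values (as we understand RFC 3209 / RFC 4090).
RRO_LOCAL_PROTECTION_AVAILABLE = 0x01
RRO_BW_PROTECTION = 0x04
RRO_NODE_PROTECTION = 0x08


@dataclass
class FastRerouteObject:
    """Hypothetical, simplified view of a signaled FAST-REROUTE object."""
    bandwidth: float  # bandwidth required during failure (0 if not set)


@dataclass
class BackupTunnel:
    """Hypothetical local view of the backup tunnel selected by the PLR."""
    up: bool
    terminates_at_nnhop: bool  # True for an NNHOP (node-protecting) tunnel
    free_bandwidth: float


def rro_flags(session_flags: int, frr: Optional[FastRerouteObject],
              lsp_bandwidth: float, backup: Optional[BackupTunnel]) -> int:
    """IPv4 RRO subobject flags the PLR reports in the Resv sent upstream."""
    fast_reroutable = bool(session_flags & SA_LOCAL_PROTECTION_DESIRED) or frr is not None
    if not fast_reroutable or backup is None or not backup.up:
        return 0  # no usable backup: "Local protection available" is cleared
    flags = RRO_LOCAL_PROTECTION_AVAILABLE

    frr_bw = frr.bandwidth if frr is not None else 0.0
    wants_bw = bool(session_flags & SA_BW_PROTECTION_DESIRED) or frr_bw > 0
    # Assumption: without an explicit FAST-REROUTE bandwidth, protect the LSP bandwidth.
    needed_bw = frr_bw if frr_bw > 0 else lsp_bandwidth
    if wants_bw and backup.free_bandwidth >= needed_bw:
        flags |= RRO_BW_PROTECTION

    # Node protection reflects what the selected tunnel offers: an NNHOP
    # tunnel survives the failure of the NHOP node, an NHOP tunnel does not.
    if backup.terminates_at_nnhop:
        flags |= RRO_NODE_PROTECTION
    return flags
```

For instance, a TE LSP requesting local and bandwidth protection over an NHOP backup tunnel with enough free bandwidth would be reported with the ‘‘Local protection available’’ and ‘‘Bandwidth protection’’ bits set, but not the ‘‘Node protection’’ bit.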

It is worth reemphasizing that backup tunnel selection is a local decision and that different implementations may make different choices; the bits defined above only express a ‘‘desire.’’ For instance, an implementation may decide to provide bandwidth guarantees to a fast-reroutable TE LSP, if such a service can be offered, even though bandwidth protection has not been explicitly requested, provided that the TE LSPs that have explicitly requested bandwidth protection can also be satisfied.

As already mentioned, if facility backup is in use, another task that the PLR

must perform is to identify the label used between the NHOP and the NNHOP LSR

for the protected TE LSPs for which an NNHOP backup tunnel has been selected.

Let us consider Figure 5.39.

As illustrated in Figure 5.39, when the TE LSP T1 is first signaled, because the ‘‘label recording desired’’ bit of the SESSION-ATTRIBUTE object carried in the RSVP Path message is set, each node includes both an IPv4 subobject and a label subobject in the RRO of the RSVP Resv message traveling in the upstream direction (from R6 to R2 in the example). This way, the PLR R3, for instance, learns the label used between R4 (NHOP) and R5 (NNHOP). Note that this is only required with facility backup when the selected backup tunnel is an NNHOP backup tunnel. Indeed, in the case of an NHOP backup tunnel, the label used is the same as that of the fast-reroutable TE LSP.
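The label discovery step can be illustrated with a minimal sketch. It assumes a hypothetical, already-normalized representation of the RRO (one entry per hop, in path order starting at the PLR) rather than the on-the-wire subobject encoding. The key point is that the label recorded by a node is the incoming label it allocated for the LSP, so the label recorded by the NNHOP is the one the NHOP uses toward it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RroHop:
    """Hypothetical normalized RRO entry: a hop and the label it recorded."""
    node: str
    ipv4: str
    label: Optional[int]  # from the Label subobject (None if not recorded)


def label_expected_by_nnhop(rro: List[RroHop], plr: str) -> Optional[int]:
    """Label the NNHOP (merge point) expects to receive from the NHOP.

    Assumption: the RRO has been normalized into path order, starting at
    the PLR; the on-the-wire ordering of subobjects is not modeled here.
    """
    nodes = [hop.node for hop in rro]
    i = nodes.index(plr)
    if i + 2 >= len(rro):
        return None  # the PLR is the penultimate hop: there is no NNHOP
    return rro[i + 2].label
```

With T1 recorded as R3, R4, R5, R6, the PLR R3 would obtain the label recorded by R5. During failure, the PLR would swap the incoming label of T1 for this value and push the label of the NNHOP backup tunnel on top of it.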

There is just one exception to this discovery procedure, related to platforms that use per-interface label spaces. By contrast with global label space platforms, where the label space is shared among all interfaces, some platforms (e.g., ATM LSR platforms) have a different label space per interface. Consequently, an MP may use different labels for a TE LSP on different interfaces. With a global label space platform, an MPLS packet with a given incoming label is switched identically regardless of the incoming interface. For instance, in Figure 5.39, when T1 is fast rerouted by the PLR R3, the traffic of T1 is received either from the link R4-R5 (prior to failure) or from the link R11-R5 (during failure), but always with the same incoming label; hence, it is forwarded onto the link R5-R6 in both cases. With a per-interface label space platform, the PLR has to perform a specific procedure before any failure: it sends a Path message onto the backup tunnel (as if the protected LSP were fast rerouted) to discover the label that the MP (R5) expects to receive for T1 when Fast Reroute is triggered. Note that the vast majority of packet LSRs use a global label space.

Point of Local Repair Behavior during Failure

Upon failure detection, the PLR triggers Fast Reroute and the protected TE LSPs

are rerouted onto their respective backup tunnels. Besides the traffic rerouting, the

PLR must also perform a set of control plane operations.

Because RSVP is a soft state protocol, the RSVP Path messages for the

rerouted TE LSP(s) must be sent onto the backup tunnel to refresh the TE LSP

states on downstream nodes. Indeed, without any specific action, the RSVP states

for the rerouted TE LSP would not be refreshed and would time out; after a certain

period, downstream nodes would tear down the TE LSPs.

Figure 5.39 Illustration of the label discovery process with Fast Reroute facility backup. (R3 needs to discover the backup label used between R4 and R5. RRO subobjects: an IPv4 subobject with Type, Length, IPv4 address, and Prefix length, followed by a Label subobject.)

In the previous example, after Fast Reroute has been triggered, the PLR (R3) sends the RSVP messages of

T1 onto the backup tunnel. Note that the intermediate nodes (R10 and R11) do not see those control messages because the messages are label switched through them. The MP (R5 in this example) therefore continues to receive RSVP Path messages and can refresh the corresponding RSVP states. Compared to the original RSVP Path message that was forwarded on the R3-R4 link before the failure, the RSVP Path message sent onto the backup tunnel contains the following changes:

. The ‘‘Local protection desired,’’ ‘‘Bandwidth protection desired,’’ and ‘‘Node protection desired’’ bits are cleared.

. The IPv4 tunnel sender address of the SENDER-TEMPLATE object is changed and set to a local address of the PLR (the STS method is always used by facility backup).

. The RSVP-HOP object is set to a local IPv4 address of the PLR.

. The ERO (Explicit Route Object) is updated. The RSVP ERO object carried in an RSVP Path message of a TE LSP always contains the list of hops that the TE LSP must follow. When a node receives an ERO object, it first checks that the first node listed in the ERO object corresponds to one of its local interfaces. So in Figure 5.39, without any specific ERO object update, R5 would receive an ERO object listing an address of R4 rather than one of its own addresses. The PLR (R3) therefore needs to update the ERO object such that the next listed node is the MP (R5) before sending the RSVP Path message onto the backup tunnel.

. The RRO object is updated: The RRO object sent in Resv messages in the

upstream direction (to R2) is updated as already mentioned, and the ‘‘local

protection in use’’ bit of the IPv4 subobject is set.
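The Path message changes listed above can be sketched as a pure function from the original Path state to the one sent through the backup tunnel. This is a hypothetical, heavily simplified model (node names stand in for ERO addresses, and only the fields discussed above are represented); the flag values are as we understand them from RFC 3209 and RFC 4090. The RRO change happens in the Resv direction and is not modeled.

```python
from dataclasses import dataclass, replace
from typing import List

# SESSION_ATTRIBUTE flag values (as we understand RFC 3209 / RFC 4090).
SA_LOCAL_PROTECTION_DESIRED = 0x01
SA_LABEL_RECORDING_DESIRED = 0x02
SA_BW_PROTECTION_DESIRED = 0x08
SA_NODE_PROTECTION_DESIRED = 0x10


@dataclass
class PathMsg:
    """Hypothetical, heavily simplified view of an RSVP Path message."""
    session_flags: int   # SESSION-ATTRIBUTE flags
    sender_address: str  # IPv4 tunnel sender address (SENDER-TEMPLATE)
    rsvp_hop: str        # RSVP-HOP object
    ero: List[str]       # remaining hops; ero[0] must be the receiving node


def path_onto_bypass(orig: PathMsg, plr_address: str, mp: str) -> PathMsg:
    """Path message a PLR sends through a facility backup tunnel (STS method)."""
    flags = orig.session_flags & ~(SA_LOCAL_PROTECTION_DESIRED
                                   | SA_BW_PROTECTION_DESIRED
                                   | SA_NODE_PROTECTION_DESIRED)
    # Strip the hops the rerouted LSP no longer visits, so that the MP
    # finds one of its own addresses at the head of the ERO.
    ero = orig.ero[orig.ero.index(mp):]
    return replace(orig, session_flags=flags, sender_address=plr_address,
                   rsvp_hop=plr_address, ero=ero)
```

`dataclasses.replace` returns a new object, so the original Path state of T1 is left untouched; a PLR would typically keep it to resume normal signaling once the protected LSP is reoptimized.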

Note: The RSVP messages sent onto the backup tunnel are Path, PathTear, and ResvConf messages.

Merge Point Behavior during Failure

Once Fast Reroute becomes active, the PLR sends Path messages onto the backup tunnel for every rerouted TE LSP. These messages are received by the MP, which in turn refreshes the corresponding states.

5.14.8 Signaling with One-to-One Backup

With one-to-one backup, the procedure is significantly different. With facility backup, no signaling occurs for the protected TE LSPs before the failure: the backup tunnel is maintained like any other TE LSP, and there is no signaling for the set of protected TE LSPs that may use this backup tunnel. By contrast, with one-to-one backup, each protected TE LSP has a Detour LSP originated at the PLR and terminating at the tail-end LSR, which must be set up and maintained. In this section, we describe the signaling operations used to set up and maintain those Detour TE LSPs.



1. Remember, the one-to-one backup technique may either use the sender-

template specific or the path specific method to identify a protected TE

LSP and its Detour LSP:

. If the sender-template specific method is used, then when signaling the Detour LSP, the PLR replaces the IPv4 (IPv6) address present in the SENDER-TEMPLATE object with one of its local addresses (which must be different from the one used in the protected TE LSP). A DETOUR object may also be added, but this is not mandatory because the new address in the SENDER-TEMPLATE object is sufficient to differentiate the Detour LSP from the protected TE LSP.

. If the path specific method is used, the PLR adds a DETOUR object in

the path message of the Detour LSP.

2. The ‘‘Local protection desired,’’ ‘‘Bandwidth protection desired,’’ and ‘‘Node protection desired’’ bits are cleared.

3. The PLR also removes the FAST-REROUTE object that may have been

present in the original protected TE LSP.

4. The RSVP-HOP object is set to a local IPv4 address of the PLR.

5. The ERO object is updated. The ERO object of the protected TE LSP contains the list of hops to follow from the PLR to the tail-end LSR along the protected path, so the PLR replaces it with the path computed for the Detour LSP.

Let us illustrate the ERO object update operation through the example shown in Figure 5.40.

In this example, the primary TE LSP T1 follows the path R2-R3-R4-R5-R6. When the PLR R3 signals its Detour LSP (called D1 in Figure 5.40), the ERO object is updated from R4-R5-R6 to R10-R11-R5-R6, which is the path computed by the PLR for the Detour LSP.

Figure 5.40 ERO object calculated for the detour LSP with Fast Reroute ‘‘one-to-one backup.’’ (ERO object for the Detour LSP D1: R10-R11-R5-R6.)



6. The bandwidth advertised in the Sender-TSPEC object reflects the bandwidth of the Detour LSP, which is equal either to the bandwidth of the protected TE LSP, if no FAST-REROUTE object was present and the ‘‘Bandwidth protection desired’’ bit was set in the SESSION-ATTRIBUTE object of the RSVP Path message, or to the bandwidth explicitly specified in the FAST-REROUTE object signaled in the RSVP Path message of the protected TE LSP.

7. The RRO object is updated: the RRO object sent in Resv messages in the upstream direction (to R2) is updated as already mentioned, and the ‘‘local protection in use’’ bit of the IPv4 subobject is set when Fast Reroute is triggered.
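Steps 1 to 6 above can be condensed into a sketch that builds the Path state of a Detour LSP from that of the protected TE LSP. It is a minimal model under stated assumptions: the classes are hypothetical, the flag values are as we understand them from RFC 3209 and RFC 4090, the PS method is assumed to keep the protected LSP's sender address, and a detour with neither a FAST-REROUTE bandwidth nor the ‘‘Bandwidth protection desired’’ bit is assumed to reserve no bandwidth.

```python
from dataclasses import dataclass
from typing import List, Optional

SA_LOCAL_PROTECTION_DESIRED = 0x01
SA_LABEL_RECORDING_DESIRED = 0x02
SA_BW_PROTECTION_DESIRED = 0x08
SA_NODE_PROTECTION_DESIRED = 0x10


@dataclass
class ProtectedLsp:
    """Hypothetical PLR-side view of the protected TE LSP's Path state."""
    session_flags: int
    sender_address: str
    bandwidth: float                # Sender-TSPEC bandwidth
    frr_bandwidth: Optional[float]  # None when no FAST-REROUTE object was signaled


@dataclass
class DetourPath:
    """Hypothetical simplified Path message of the Detour LSP (no FAST-REROUTE object)."""
    session_flags: int
    sender_address: str
    rsvp_hop: str
    bandwidth: float
    ero: List[str]
    detour_object: bool


def detour_bandwidth(p: ProtectedLsp) -> float:
    if p.frr_bandwidth is not None:
        return p.frr_bandwidth  # bandwidth explicitly given in FAST-REROUTE
    if p.session_flags & SA_BW_PROTECTION_DESIRED:
        return p.bandwidth      # same bandwidth as the protected TE LSP
    return 0.0                  # assumption: no bandwidth requested


def build_detour(p: ProtectedLsp, plr_address: str, detour_ero: List[str],
                 method: str = "sts") -> DetourPath:
    flags = p.session_flags & ~(SA_LOCAL_PROTECTION_DESIRED
                                | SA_BW_PROTECTION_DESIRED
                                | SA_NODE_PROTECTION_DESIRED)
    if method == "sts":
        # Sender-template specific: a different sender address identifies the detour.
        if plr_address == p.sender_address:
            raise ValueError("STS needs a sender address different from the protected LSP")
        sender, has_detour = plr_address, False  # a DETOUR object is optional here
    elif method == "ps":
        # Path specific: sender address kept, DETOUR object added.
        sender, has_detour = p.sender_address, True
    else:
        raise ValueError(f"unknown method {method!r}")
    return DetourPath(flags, sender, plr_address, detour_bandwidth(p),
                      list(detour_ero), has_detour)
```

Applied to Figure 5.40, R3 would call `build_detour` with the detour ERO R10-R11-R5-R6 computed for D1.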

5.14.9 Detour Merging

As pointed out previously, MPLS TE Fast Reroute one-to-one backup has the drawback of potentially generating a considerable number of TE LSPs, because the number of required backup tunnels (Detour LSPs) is a function of the number of protected TE LSPs and of the network diameter (the number of hops traversed by each protected TE LSP). One way to alleviate this concern is LSP merging. Several rules are defined in [FAST-REROUTE] to handle Detour LSP merging, but the concept is quite simple, as illustrated in Figure 5.41.

Let us consider the network depicted in Figure 5.41. A protected TE LSP T1 is set up and follows the path R0-R1-R2-R3-R4-R5, and the local repair technique used in this network is one-to-one backup. Taking the PLR R0 as an example, R0 computes a Detour LSP D0 following the path R0-R6-R7-R8-R2-R10-R11-R4-R5, according to the requirements for the protection of T1 and the topology and resource information flooded by the IGP. Likewise, R1 computes a Detour LSP D1 following the path R1-R7-R8-R9-R4-R5, and R2 computes a Detour LSP D2 following the path R2-R8-R9-R4-R5. R3 and R4 perform the same operation, but for the sake of simplicity their respective Detour LSPs are not represented in the diagram.

Figure 5.41 Backup tunnel (Detour LSP) path computation with MPLS TE Fast Reroute ‘‘one to one.’’ (Situation prior to merging. D0 ERO: R0-R6-R7-R8-R2-R10-R11-R4-R5; D1 ERO: R1-R7-R8-R9-R4-R5; D2 ERO: R2-R8-R9-R4-R5.)

As shown in Figure 5.42, R7 detects the presence of two Detour LSPs, D0 and D1, that both protect the same TE LSP, T1, so they can be merged. Because D1 has a shorter path than D0, the resulting merged Detour LSP is D1 (note that when Detour LSPs are merged, there are some additional rules to compute the resulting DETOUR object, which are defined in [FAST-REROUTE]). Likewise, R8 can merge D1 and D2. Finally, a third merging operation can be performed by R4, but in this case the situation is slightly different: when the LSR R7, for example, performs a detour merging operation, it merges two Detour LSPs, whereas the LSR R4 merges a Detour LSP with the protected TE LSP. When an LSR merges the protected TE LSP with a Detour LSP, the result is always the protected TE LSP. Figure 5.42 shows the result after merging.
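The merging decision described above can be sketched as follows. This deliberately keeps only the two rules stated in the text (the protected TE LSP always wins; otherwise the Detour LSP with the shortest remaining path survives); the complete rule set, including how the merged DETOUR object is built, is in [FAST-REROUTE]. The `IncomingLsp` class and the use of remaining hop count as "path length" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IncomingLsp:
    """Hypothetical view of one path of the same protected LSP at a merging node."""
    name: str
    is_detour: bool
    remaining_ero: List[str]  # hops still to traverse after the merging node


def merge_survivor(candidates: List[IncomingLsp]) -> IncomingLsp:
    """LSP kept when several paths protecting the same TE LSP meet at a node."""
    protected = [c for c in candidates if not c.is_detour]
    if protected:
        return protected[0]  # merging with the protected LSP yields the protected LSP
    # Among Detour LSPs only, keep the one with the shortest remaining path.
    return min(candidates, key=lambda c: len(c.remaining_ero))
```

At R7 in Figure 5.41, D0 still has to traverse R8-R2-R10-R11-R4-R5 while D1 only has R8-R9-R4-R5 left, so D1 is kept; at R4, merging D1 with T1 keeps T1.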

Figure 5.42 Illustration of the merging rules with Fast Reroute ‘‘one-to-one backup.’’ (Situation after merging: Merging 1 at R7, D0 with D1, where ERO(D0) > ERO(D1); Merging 2 at R8, involving D2; Merging 3 at R4, D1 with the protected TE LSP.)

5.15 Backup Path Computation

In this section, we cover the aspects related to backup path computation for each of the MPLS TE recovery techniques studied in this chapter. This section is quite dense because of the complexity of the problem to solve, which varies greatly with the set of objectives. Simple algorithms can be used to compute a diverse path for global path protection or local protection. By contrast, the algorithms to find a set of backup tunnels for Fast Reroute that provide bandwidth guarantees and a bounded increase of the propagation delay, while trying to minimize the required amount of backup network capacity, can be very complex. This section deals with all the issues of backup tunnel path computation with respect to the set of recovery objectives.

5.15.1 Introduction

As previously mentioned, several aspects must be considered when evaluating a

protection/restoration scheme. In the previous sections, we saw various MPLS

traffic recovery techniques: global default restoration, global path protection, and

local protection. Each recovery technique requires the computation of a backup path, which can be calculated ‘‘on the fly’’ with restoration techniques or precomputed when using protection techniques such as global path protection and local protection. In this section, we focus on the backup path computation aspects of MPLS TE protection techniques.

The first aspect that comes to mind about recovery techniques is the recovery time, which is crucial but not the only aspect. The QoS during failure, in other words, the QoS provided to the rerouted flows along the backup path, is also very important and is directly correlated to the backup path computation.

The backup tunnel path computation complexity is essentially driven by the set

of objectives and increases nonlinearly with the set of associated constraints. So for

instance, in the case of Fast Reroute, if the only objective is to compute a diversely

routed backup tunnel from the protected section (link, node, SRLG) to provide fast

convergence in case of resource failure, then the path computation complexity is not

very high (rerunning a regular CSPF on a subgraph is usually sufficient). On the

other hand, if the objective is to provide a recovery mechanism offering fast

convergence and strict QoS guarantees during failure (e.g., bandwidth guarantee

and bounded increase of the propagation delay) while trying to minimize the

required backup capacity, then this increases the backup tunnel path computation

complexity by an order of magnitude.
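To make the simple case concrete, here is a minimal sketch of "rerunning a regular CSPF on a subgraph": prune the protected node or link (and, optionally, links without enough available bandwidth), then run Dijkstra's algorithm on what remains. The graph representation and the example topology, loosely modeled on Figure 5.40, are hypothetical; real CSPF implementations handle many more constraints (affinities, SRLGs, and so on).

```python
import heapq
from typing import AbstractSet, Dict, List, Optional, Tuple

# node -> neighbor -> (TE metric, available bandwidth)
Graph = Dict[str, Dict[str, Tuple[float, float]]]


def add_link(g: Graph, a: str, b: str, metric: float, free_bw: float) -> None:
    """Add a bidirectional link with the same metric and bandwidth both ways."""
    g.setdefault(a, {})[b] = (metric, free_bw)
    g.setdefault(b, {})[a] = (metric, free_bw)


def backup_path(g: Graph, src: str, dst: str,
                avoid_nodes: AbstractSet[str] = frozenset(),
                avoid_links: AbstractSet[Tuple[str, str]] = frozenset(),
                min_bw: float = 0.0) -> Optional[List[str]]:
    """CSPF-style shortest path from src to dst on a pruned subgraph."""
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:  # rebuild the path from the predecessor map
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, (metric, free_bw) in g.get(u, {}).items():
            if v in avoid_nodes or (u, v) in avoid_links or (v, u) in avoid_links:
                continue  # prune the protected node or link
            if free_bw < min_bw:
                continue  # prune links that cannot carry the backup bandwidth
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None
```

In a topology where T1 follows R2-R3-R4-R5-R6 and an alternate path R3-R10-R11-R5 exists, an NNHOP backup from R3 avoiding node R4 comes out as R3-R10-R11-R5, and an NHOP backup avoiding link R3-R4 as R3-R10-R11-R5-R4. Adding a bandwidth constraint the alternate links cannot meet leaves no path, which is where the harder problems discussed next begin.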

This section explores those different requirements and details for each of them

some possible backup tunnel path computation techniques.

5.15.2 Requirements for Strict QoS Guarantees during Failure

Typically, voice traffic cannot tolerate QoS degradation for a long period without users perceiving it. So an operator may decide to provide QoS guarantees to the TE LSPs carrying voice traffic, even during failure. The same reasoning is likely to apply to ATM CBR traffic carried over MPLS. On the other hand, other TE LSPs carrying less sensitive traffic could tolerate a QoS degradation during failure (until they are reoptimized by their respective head-end LSR). This can be part of the service-level agreement (SLA) between an operator and its customers, and it highlights the notion of CoR introduced earlier. As previously mentioned, in the case of Fast Reroute, the requirement for bandwidth guarantee during failure is explicitly signaled and so can be applied on a per-TE LSP basis, providing fine granularity.

5.15.3 Network Design Considerations

Every network is different, and the constraints on backup path computation are driven not just by the set of objectives but also by network design considerations. The aim of this section is to describe several typical network designs to illustrate the implications of network design on backup path computation for a defined set of recovery objectives in terms of QoS during failure.

QoS Considerations in Typical Backbone Network Profiles

We can list three typical network designs:

1. Overprovisioned networks: A simple strategy to provide QoS is to put strict planning rules in place and make sure that the network always has enough bandwidth to accommodate the traffic demand while respecting the QoS objectives. For instance, if at any time the maximum utilization of any link is less than 20% (this is of course just an example for the sake of illustration), there is no need for any particular QoS and/or TE mechanism in the network. Note that failure simulation should help determine whether the utilization rule mentioned above still holds under various failure scenarios. In such a network, the backup tunnel path just needs to be diverse from the protected section (link, node, or SRLG). This approach is simple and efficient but expensive (and requires some network planning tools and reasonably accurate knowledge of the traffic matrix).

2. MPLS Diffserv-aware networks: In networks where multiple classes of service must be provided with different QoS objectives, one can use the Diffserv architecture, where the traffic is marked (colored) based on its CoS and then queued appropriately in the data plane to meet the QoS objectives on a per-CoS basis. In addition to queuing, congestion avoidance disciplines such as WRED can be used. This ensures that delay-sensitive traffic is serviced appropriately, while best-effort traffic gets a lower priority and can potentially suffer from some congestion (of course, the number of CoS is not limited to two). It is well known that strictly bounded delay and jitter can be provided to high-priority traffic (such as voice), provided that appropriate queuing mechanisms are deployed in the network and that the proportion of high-priority traffic served by the high-priority queue (usually a preemptive queue) is limited to a fixed percentage of the total amount of traffic forwarded on a given link. So if the network is designed such that the proportion of voice traffic on every link is bounded to, for instance, 30% in both steady state and failure scenarios, then no particular constraint needs to be applied to the backup tunnel path computation; the backup tunnel just needs to be diversely routed from the protected section (link, node, or SRLG).

3. Traffic-engineered networks: In some other networks, there is clearly a need for traffic engineering (voice, ATM, IP, and MPLS) to optimize network resource utilization. Various studies have been conducted during the last 20 years to propose IGP metric computation algorithms such that the traffic is routed in an ‘‘optimal’’ way, preventing situations where some links are heavily congested while other links in the network have spare capacity; these are IP traffic engineering techniques, which were covered in Chapter 4. Another way of achieving traffic engineering in MPLS-enabled networks is to rely on MPLS TE, where the traffic is routed on TE LSPs whose path computation is based on the network topology and available resources, with call admission control schemes; in that case, one makes sure that a TE LSP is routed in the network such that every traversed link can accommodate the traffic demand.
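The failure simulation suggested for overprovisioned networks (item 1) can be sketched in a few lines: route every demand on its IGP shortest path, compute the highest link utilization at steady state, then repeat with each link removed in turn. This is a minimal sketch under simplifying assumptions (single shortest path per demand, no ECMP, single link failures only); the three-node topology and demand values are purely illustrative.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]


def shortest_path(metrics: Dict[Link, float], src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra over undirected links weighted by their IGP metric."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), m in metrics.items():
        adj.setdefault(a, []).append((b, m))
        adj.setdefault(b, []).append((a, m))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, m in adj.get(u, []):
            if d + m < dist.get(v, float("inf")):
                dist[v], prev[v] = d + m, u
                heapq.heappush(heap, (d + m, v))
    return None


def max_utilization(metrics: Dict[Link, float], capacity: Dict[Link, float],
                    demands: Dict[Tuple[str, str], float]) -> float:
    """Highest link utilization when every demand follows its IGP shortest path."""
    load = {link: 0.0 for link in metrics}
    for (src, dst), rate in demands.items():
        path = shortest_path(metrics, src, dst)
        if path is None:
            continue  # demand disconnected by the failure
        for a, b in zip(path, path[1:]):
            load[(a, b) if (a, b) in load else (b, a)] += rate
    return max((load[l] / capacity[l] for l in load), default=0.0)


def worst_case_utilization(metrics: Dict[Link, float], capacity: Dict[Link, float],
                           demands: Dict[Tuple[str, str], float]) -> float:
    """Worst utilization over steady state and every single-link failure."""
    worst = max_utilization(metrics, capacity, demands)
    for failed in metrics:
        survivors = {l: m for l, m in metrics.items() if l != failed}
        worst = max(worst, max_utilization(survivors, capacity, demands))
    return worst
```

In a triangle A-B-C with 100-unit links and demands of 15 (A to C) and 10 (A to B), the steady-state maximum is 15%, which satisfies a 20% rule, but the failure of A-C pushes A-B to 25%: exactly the kind of situation that failure simulation is meant to reveal.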

Of course, traffic-engineered networks can also be Diffserv aware. Diffserv mechanisms ensure that each class of traffic receives the required level of QoS, whereas traffic engineering is in charge of computing a path that can meet the bandwidth and other requirements. Furthermore, MPLS Diffserv-aware TE makes it possible to enforce different CAC schemes (and hence underbooking/overbooking) on a per-class-type basis, which provides a very high degree of granularity.

Guaranteeing QoS during Failure

Things get a bit more complicated when QoS objectives must also be met during non–steady state periods. What if a link or a node fails? An operator may simply decide that the QoS objectives need not be respected in the case of a failure in its network. Let us first consider the simple case of an overprovisioned network (Figure 5.43).


Figure 5.43 Bandwidth guarantee during failure in an overprovisioned network. (All links are OC3 links, except R1-R2, which is OC48.)



Although this chapter is dedicated to MPLS TE, let us consider the case of the

pure IP (non-MPLS) overprovisioned network depicted in Figure 5.43. Clearly, in

such a network, even if IP traffic engineering techniques are used (tuning of the IGP

metrics) to avoid congestion on any link at steady state, a link failure is likely to

provoke congestion on alternate paths. As depicted in Figure 5.43, all the IP flows destined to R2 and beyond and traversing the nodes R3, R0, and R10 will be rerouted along their next shortest path (through the south path) in case of failure of the link R1-R2; even a steady-state utilization of at most 30% could potentially result in congestion of the links R10-R11, R11-R12, R12-R13, and R13-R2 in the case of failure of the link R1-R2. As pointed out in Chapter 4, some IGP metric optimization techniques try to solve that issue for both steady state and single network failure scenarios. The result varies with the effectiveness of the algorithm in use and the network topology. Also, the degree of granularity is relatively poor, because all the IP traffic must be rerouted along the same alternate path, whereas with MPLS TE several backup paths (backup tunnels) can be computed to reroute different subsets of the traffic.

Hence, MPLS TE provides greater flexibility and finer granularity, which makes it easier to find appropriate backup paths that provide QoS guarantees during failure. For instance, back to our previous example in Figure 5.43, in the case of failure of the link R1-R2, the TE LSPs originally routed through the link R1-R2 will be rerouted along alternate paths obeying the set of required constraints; TE thus plays its role by trying to avoid congestion, and if necessary, multiple backup tunnels will be used to reroute the TE LSPs requiring an equivalent QoS during failure.

During failure, protected TE LSPs are rerouted over their respective backup tunnels. As illustrated in Figure 5.44, in this particular example, if a single NHOP backup tunnel is provisioned to reroute all the protected TE LSPs traversing the link R1-R2, fast recovery is certainly achieved in case of failure of this link, but without any QoS guarantee.

Figure 5.44 Bandwidth guarantee during failure with Fast Reroute, using multiple backup tunnels. (Backup tunnels 1, 2, and 3 are used in case of failure of the link R1-R2.)

Hence, the solution consists of provisioning multiple backup tunnels (Figure 5.44). As shown in Figure 5.44, in the case of failure of the link R1-R2, three backup tunnels are used to reroute the set of primary TE LSPs traversing the link R1-R2 that require fast recovery and bandwidth protection. This example thus illustrates how the use of multiple backup tunnels can help achieve the goal of QoS guarantees during failure.
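One simple way to picture how several backup tunnels jointly absorb the protected TE LSPs of a failed link is a packing heuristic. The sketch below uses first-fit decreasing, a generic bin-packing heuristic chosen here for illustration only; the text does not prescribe any particular assignment algorithm, and the tunnel names and bandwidth values are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_to_backups(lsps: Dict[str, float],
                      backups: Dict[str, float]) -> Tuple[Dict[str, str], List[str]]:
    """First-fit decreasing placement of protected TE LSPs onto backup tunnels.

    lsps maps each protected TE LSP to the bandwidth it needs during failure;
    backups maps each backup tunnel to the bandwidth it can guarantee.
    Returns the placement and the LSPs that could not get bandwidth protection.
    """
    free = dict(backups)
    placement: Dict[str, str] = {}
    unprotected: List[str] = []
    for lsp, bw in sorted(lsps.items(), key=lambda kv: kv[1], reverse=True):
        for tunnel in backups:  # try the tunnels in their configured order
            if free[tunnel] >= bw:
                free[tunnel] -= bw
                placement[lsp] = tunnel
                break
        else:
            unprotected.append(lsp)  # no tunnel has enough remaining bandwidth
    return placement, unprotected
```

With tunnels of 100, 80, and 50 units and protected TE LSPs of 70, 60, 40, 30, and 20 units, all 220 units fit across the three tunnels, whereas a single 100-unit backup tunnel could only have protected a subset of them.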

Let us now consider another example (Figure 5.45). Figure 5.45 shows the

situation of an MPLS TE network using Fast Reroute. Let us suppose that the

NNHOP backup tunnels originated at R3, R0, R10, R11, and R12 are computed

without trying to ensure bandwidth guarantees. What could happen (as depicted in Figure 5.45) is that those NNHOP backup tunnels may be routed over the same path (the IGP shortest path). In this case, the sum of the traffic carried by the protected TE LSPs rerouted onto those tunnels will very likely provoke some congestion along the south path. This example highlights the fact that node failures usually have a greater impact than link failures, so the observations made for link failures apply even more strongly here.

Figure 5.45 Bandwidth guarantee during failure.

The examples above brought out several important considerations, which are worth summarizing before considering the backup path computation aspects in more detail.

Let us again briefly consider the following steps of a failure process when using local protection:

. t0: The network element (link, node, or SRLG) failure occurs.

. t1: Protected TE LSPs are rerouted onto their respective backup tunnels.

. t2: TE LSPs are reoptimized by their respective head-end LSR along a new path satisfying their respective constraints (if such a path exists).

. t3: The failed resource is restored.

. During t1-t0: The traffic is dropped (t1-t0 is the recovery time).

. During t2-t1 (also called during failure): Protected TE LSPs are rerouted onto their respective backup tunnels.

. During t3-t2 (also called after failure): TE LSPs are rerouted over an alternate path (if such a path exists).

. After t3: The initial network capacity is restored.

Situation 1: The network is overprovisioned, and QoS objectives can be met at steady state, during, and after the occurrence of a failure. For those networks, a perfectly reasonable approach consists of provisioning the backup tunnels without applying any constraint, except of course that of being diversely routed from the link/SRLG/node that they protect. In the case of failure, the traffic is quickly rerouted and does not suffer from any QoS degradation.

Situation 2: The network is overprovisioned at steady state, but upon a link or a

node failure, congestion may appear:

If QoS degradation during failure (t2-t1) is acceptable but not after failure (beyond t2), then a reasonable approach is to limit the backup path computation to a single constraint: being diversely routed. In this case, during failure, the rerouted TE LSPs may suffer from QoS degradation, but this is considered acceptable. After a short period, they will be rerouted along an alternate path (if such a path exists) that offers the required QoS.

If QoS must also be guaranteed during failure, then the additional constraint is to compute backup tunnel paths such that the QoS is preserved along the backup path, at least for some Classes of Recovery.

Situation 3: The network uses MPLS TE for network resource optimization and/or strict QoS guarantees at steady state. The same conclusions as with Situation 2 apply.

In summary, the previous discussion demonstrates that the constraints on the backup paths are driven by both the QoS objectives and the network design. In overprovisioned networks (at steady state and under failure), or in networks where QoS degradation during failure is acceptable, the constraints on the backup path are minimal: the backup tunnel path just needs to be diversely routed from the protected section, a problem whose complexity is no greater than computing a regular TE LSP path on a subgraph. On the other hand, in non–overprovisioned networks where QoS guarantees must be ensured during failure for some traffic, backup paths must satisfy additional constraints. Undoubtedly, MPLS TE makes those objectives more achievable by making it possible to restrict those requirements to a subset of the traffic and by using multiple backup paths for the different TE LSPs to reroute.

Before covering the backup path computation aspects in detail, there is another important fact to notice. As already pointed out, in some networks, MPLS TE is deployed for the sole purpose of fast recovery, and several deployment scenarios have been described in Section 5.5. Let us consider the very realistic scenario where, at steady state, no particular traffic engineering measures need to be taken: the traffic load on every link is perfectly acceptable and the QoS objectives are met.




This does not mean that congestion cannot appear in some regions of the network under failure. For instance, a fiber cut or a core router failure can sometimes result in severe congestion spots, even though the network load at steady state was perfectly acceptable without any need for traffic engineering. Hence, an interesting strategy can consist of deploying MPLS TE where the TE LSPs are configured with their respective bandwidth but follow the IGP shortest path at steady state (because every IGP shortest path has enough capacity to accommodate the traffic demand). As pointed out, this may no longer be true during failure; the TE LSPs will then be rerouted over non-IGP shortest paths, and congestion will be avoided or at least reduced. This is another application of MPLS TE: bandwidth optimization after failure (until the resource is restored). Note that some failures may last several hours or even days before being fixed.

5.15.4 Notion of Bandwidth Sharing between Backup Paths

The previous section provided several examples where backup tunnels must follow a path offering QoS guarantees (in terms of bandwidth and sometimes propagation delay). Backup tunnels are regular TE LSPs, so a simple approach consists of setting up backup tunnels with bandwidth like any other primary TE LSP. But this may lead to very inefficient backup bandwidth usage, as shown in Figure 5.45. So at this point, the very important and simple notion of bandwidth sharing is introduced: two backup tunnels can share some bandwidth only if they cannot be simultaneously active. For instance, as depicted in Figure 5.45, if two backup tunnels T1 and T2 protect two independent resources R2 and R7, and one assumes that R2 and R7 cannot fail simultaneously, then the total amount of bandwidth that must be reserved on the links they both traverse is the maximum of their bandwidths, not the sum. This highlights why simply setting up backup tunnels with the required bandwidth would be quite inefficient in terms of network bandwidth usage (Figure 5.46).

In Figure 5.46, T1 and T2 are backup tunnels used in the context of local protection to protect against a failure of the nodes R2 and R7, respectively. Suppose also that QoS guarantees during failure are required. If T1 and T2 require X and Y Mbps, respectively, one might at first think that the amount of bandwidth required for T1 and T2 on the link R4-R5 is X + Y. But if we assume that either R2 or R7 can fail, but not both simultaneously, then T1 and T2 are never simultaneously active; hence, the bandwidth required for T1 and T2 on the link R4-R5 is max(X, Y) instead of X + Y, which results in a considerable saving in required network backup capacity.
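As a rough illustration of the rule above, the following Python sketch computes the backup bandwidth needed on a single link under the single failure assumption. The input format (a list of (protected resource, bandwidth) pairs for the backup tunnels crossing the link), the function names, and the values chosen for X and Y are assumptions made for this example only.

```python
from collections import defaultdict

def naive_backup_bw(tunnels):
    """No sharing: every backup tunnel crossing the link reserves its own bandwidth."""
    return sum(bw for _, bw in tunnels)

def shared_backup_bw(tunnels):
    """Backup bandwidth needed on one link under the single failure assumption.

    tunnels: list of (protected_resource, bandwidth_mbps) for the backup tunnels
    that traverse the link. Tunnels protecting the same resource become active
    together, so their bandwidths add up; tunnels protecting different resources
    are never active together, so only the worst single failure matters.
    """
    per_failure = defaultdict(float)
    for resource, bw in tunnels:
        per_failure[resource] += bw
    return max(per_failure.values(), default=0.0)

# Link R4-R5 of Figure 5.46 with hypothetical values X = 40 Mbps and Y = 25 Mbps.
link_r4_r5 = [("R2", 40.0), ("R7", 25.0)]
print(naive_backup_bw(link_r4_r5))   # 65.0, that is, X + Y
print(shared_backup_bw(link_r4_r5))  # 40.0, that is, max(X, Y)
```

Grouping by protected resource matters: two tunnels protecting the same node are summed before the maximum is taken, because a single failure activates both.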

It is probably worth defining more accurately what "simultaneously fail" means. When a link or a node fails (at time t0), the protected TE LSPs traversing the failed resource are rerouted onto their backup tunnels. Those TE LSPs are then rerouted by their respective head-end LSRs along an alternate path (at time t2, according to the terminology previously introduced), if such a path exists. After a period of ta = t2 − t0, those TE LSPs are no longer rerouted over their backup tunnels. The single failure assumption assumes that a second failure will not happen during ta, so




two backup tunnels protecting independent resources cannot be simultaneously active. If a second failure occurs after ta, bandwidth protection can still be ensured (provided that the backup tunnels that used to be routed over the previously failed resources have been reestablished). Hence, the benefit of the single failure assumption is straightforward: bandwidth sharing between backup tunnels protecting independent resources is possible and results in very significant bandwidth savings, as the required amount of backup capacity in the network is drastically reduced. Moreover, the single failure assumption is considered perfectly realistic in many networks, especially because the probability of a router node failure is generally very low.

Note also that the diagram depicted in Figure 5.46 is generic and applies equally to global and local protection (facility backup). In the former case, the backup TE LSPs are end to end (between the head-end LSR and the tail-end LSR). In the latter case, the two backup tunnels are between a PLR and an MP.

Now that the general concepts of QoS guarantees and bandwidth sharing have

been illustrated, it is time to describe how those concepts apply to the backup path

computation in the context of global path protection and local protection.

5.15.5 Backup Path Computation: MPLS TE Global Path Protection

As described in Section 5.4, global path protection requires the ability to compute diversely routed paths: the backup path must be diversely routed from the primary TE LSP path. As already mentioned, two paths can be either link or node disjoint. Two node-diverse paths are necessarily link diverse, but the converse is not true; hence, the constraint of finding node-diverse paths is stricter than that of finding link-diverse paths. Multiple algorithms have been proposed to compute link- or node-diverse paths (see [SURVIVABLE]), and a simple algorithm (referred to as the two-step approach) is described here.

Figure 5.46 Notion of bandwidth sharing between two backup tunnels protecting independent resources. (The backup tunnel T1 protects R1 from a failure of the LSR R2; the backup tunnel T2 protects R6 from a failure of the LSR R7. Single failure assumption: a simultaneous failure of the nodes R2 and R7 is assumed not likely to happen.)

A Simple Algorithm for Diverse Path Computation: The Two-Step Approach

A simple approach for computing two diverse paths is the two-step approach algorithm (referred to as the 2SA algorithm). This algorithm consists of first running CSPF to find the first path, then pruning every link (for link-diverse paths) or node (for node-diverse paths) traversed by this shortest path, and finally running a second iteration of CSPF to find the second path.
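The two-step approach can be sketched as follows. This is a minimal Python illustration, not a CSPF implementation: plain Dijkstra with pruned nodes and links stands in for CSPF (no bandwidth or other TE constraints), and the topology and link costs are a hypothetical "trap" ladder chosen to reproduce the kind of failure discussed for Figure 5.47, not the actual figure.

```python
import heapq

def shortest_path(graph, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Dijkstra on an undirected graph {node: {neighbor: cost}}.

    Pruned nodes and links are skipped; this stands in for CSPF here.
    Returns (cost, path) or None when dst cannot be reached.
    """
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v, cost in graph[u].items():
            if v in banned_nodes or frozenset((u, v)) in banned_links:
                continue
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

def two_step(graph, src, dst, node_diverse=True):
    """2SA: compute a shortest path, prune what it uses, compute again."""
    first = shortest_path(graph, src, dst)
    if first is None:
        return None, None
    path = first[1]
    used_links = frozenset(frozenset(hop) for hop in zip(path, path[1:]))
    # Transit nodes are pruned only for node diversity; links are always pruned
    # so that a direct src-dst link cannot be reused by the second path.
    used_nodes = frozenset(path[1:-1]) if node_diverse else frozenset()
    return first, shortest_path(graph, src, dst, used_nodes, used_links)

def undirected(edges):
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

# Hypothetical "trap" ladder (costs assumed): R1-R2-R3 on top, R4-R5-R6 below.
trap = undirected([("R1", "R2", 2), ("R2", "R3", 1), ("R4", "R5", 1), ("R5", "R6", 2),
                   ("R1", "R4", 2), ("R2", "R5", 1), ("R3", "R6", 2)])
first, second = two_step(trap, "R4", "R3")
print(first)   # (3, ['R4', 'R5', 'R2', 'R3'])
print(second)  # None, although R4-R1-R2-R3 and R4-R5-R6-R3 are node diverse
```

The first path grabs the cheap middle rung R5-R2; once it is pruned, R4 is cut off from R3, which is exactly the first limitation listed below.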

Although very simple and fast, this algorithm has the following limitations:

. It may fail to find two link- or node-diverse paths for some pairs of nodes even if such a solution actually exists.

. The resulting solution may be suboptimal when the objective is to find two diverse paths whose total cost is minimal (Figure 5.47).

To illustrate this statement, let us consider the double-square network diagram depicted in Figure 5.47A and the two pairs of LSRs (R1-R3) and (R4-R3). This network has links with costs 1 or 2, as shown in the figure.

Using the 2SA algorithm, two link- and node-diverse paths can easily be found between R1 and R3 (Figure 5.47A). On the other hand, the situation is different between the LSRs R4 and R3: the 2SA algorithm fails to find two link- or node-diverse paths (Figure 5.47B).

Figure 5.47 Computation of diversely routed paths using the 2SA algorithm. (a) Two link- and node-diverse paths can be found between R1 and R3 using the 2SA algorithm. (b) Two link- and node-diverse paths cannot be found between R4 and R3 using the 2SA algorithm. Link costs are 1 or 2.

This raises another interesting question related to the objective that the computation of two diverse paths has to meet: how should the optimality criteria of the path computation be defined? To illustrate this, let us go back to Figure 5.47B, where it can easily be seen that two node-diverse paths between R4 and R3 do exist. They would follow the paths R4-R1-R2-R3 and R4-R5-R6-R3.

Remember, there may be two situations in which two diverse paths are

required:

. Situation 1: MPLS TE global path protection is used (Section 5.4).

. Situation 2: The traffic between R4 and R3 is load balanced between two

diverse paths (Section 5.7).

In situation 1, using these two paths means that the traffic follows a nonoptimal path at steady state (with a cost of 5 instead of 3 for the shortest path between R4 and R3). This is clearly a trade-off that should be considered when evaluating MPLS TE global path protection: at steady state, traffic may not follow an optimal path, just to satisfy the requirement of having a diverse path for the backup tunnel. Furthermore, the backup tunnel will be used only for a short period in the case of a failure along the primary TE LSP path, so is it worth following a nonoptimal path at steady state (that is, most of the time) just to get a diversely routed path under failure? A possible compromise is to add a constraint to the diverse path computation: a bound on the cost increase of the primary TE LSP path compared with the shortest path obeying the set of constraints. If no such pair of paths can be computed, the head-end LSR falls back to the global repair mechanism. For instance, if satisfying the constraint of finding two diversely routed paths results in a path cost increase of 50% for the primary TE LSP (compared with the shortest possible path satisfying the set of constraints), then global default restoration should be used instead of global path protection.
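The compromise described above can be expressed as a simple decision rule. In the following Python sketch, the function name, the string results, and the default threshold (0.5, mirroring the 50% example) are illustrative assumptions; an operator would choose its own bound.

```python
def choose_global_scheme(shortest_cost, diverse_primary_cost, max_increase=0.5):
    """Decide between global path protection and global default restoration.

    shortest_cost: cost of the shortest path satisfying the TE constraints.
    diverse_primary_cost: cost of the primary path when it must belong to a
    diverse pair, or None if no diverse pair was found.
    max_increase: tolerated relative cost increase (0.5 mirrors the 50% example).
    """
    if diverse_primary_cost is None:
        return "global default restoration"
    increase = (diverse_primary_cost - shortest_cost) / shortest_cost
    if increase >= max_increase:
        return "global default restoration"
    return "global path protection"

# R4 -> R3 in Figure 5.47: cost 3 for the shortest path, 5 within the diverse pair.
print(choose_global_scheme(3, 5))  # global default restoration (about 67% increase)
```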

In situation 2 (load balancing), it would then be interesting to find a solution where the sum of the costs of the two diverse paths is minimized, something the 2SA algorithm cannot guarantee either. Let us consider Figure 5.48 and study how the 2SA algorithm performs when trying to find two diverse paths whose sum of costs is minimal.

Let us now run the 2SA algorithm and determine the sum of the costs of the two diverse paths between the node pair (R4,R3). Figure 5.49 shows the two diverse paths that would be obtained by running the 2SA algorithm described above. The sum of their costs is 4 + 13 = 17, which is clearly not the best set of diverse paths that could have been obtained (here, the best set of diverse paths has a sum of costs of 11 instead of 17).

What does this highlight? The two examples above show that the 2SA algorithm, though very simple, is not always efficient: it sometimes fails to find two diverse paths even though such paths exist, and when the additional objective of minimizing the sum of the costs of the two diverse paths is added (which can be useful for load balancing, for example), it cannot meet that objective either.

Figure 5.48 Optimization of the sum of the cost of two diversely routed paths.

Figure 5.49 Sum of the cost of two diversely routed paths using the 2SA algorithm. (Left: with the 2SA algorithm, sum of the costs = 4 + 13 = 17. Right: with an optimized algorithm, sum of the costs = 7 + 4 = 11.)

Hence, more efficient diverse path computation algorithms have been proposed that always find diverse paths when such paths exist and that can compute two diverse paths whose sum of costs is minimal (see [SURVIVABLE]); for instance, two runs of (modified) Dijkstra algorithms71 allow for the computation of optimal diverse paths. Additional constraints, such as a trade-off between path diversity and path cost increase, can be added, but at the cost of increased algorithm complexity.
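One well-known way to obtain two link-disjoint paths of minimum total cost is to treat the problem as a minimum-cost flow of two units and use successive shortest paths. The Python sketch below does that with Bellman-Ford on the residual graph (the formulation popularized by Bhandari), which is shorter to write than Suurballe's algorithm, where the two runs use Dijkstra on reweighted costs; both rely on the same idea of letting the second path cancel links used by the first. It handles link diversity only (node diversity would require splitting each node into an in/out pair, not shown), and the topology is a hypothetical ladder with assumed costs on which a plain two-step computation finds no second path, not the network of Figure 5.48.

```python
INF = float("inf")

def undirected(edges):
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def min_sum_link_disjoint(graph, src, dst):
    """Two link-disjoint paths whose total cost is minimal, or None.

    Each undirected link is split into two directed unit-capacity arcs. Each of
    the two rounds runs Bellman-Ford on the residual graph, where an arc already
    used can be walked backward at negative cost, so the second path can cancel
    part of the first.
    """
    arcs = {(u, v): c for u in graph for v, c in graph[u].items()}
    used = dict.fromkeys(arcs, False)
    total = 0
    for _ in range(2):
        dist = dict.fromkeys(graph, INF)
        dist[src], pred = 0, {}
        for _ in range(len(graph) - 1):
            changed = False
            for (u, v), c in arcs.items():
                if not used[(u, v)] and dist[u] + c < dist[v]:
                    dist[v], pred[v], changed = dist[u] + c, ((u, v), True), True
                elif used[(u, v)] and dist[v] - c < dist[u]:
                    dist[u], pred[u], changed = dist[v] - c, ((u, v), False), True
            if not changed:
                break
        if dist[dst] == INF:
            return None  # fewer than two link-disjoint paths exist
        node = dst
        while node != src:
            (u, v), forward = pred[node]
            used[(u, v)] = forward  # push flow on the arc, or cancel it
            node = u if forward else v
        total += dist[dst]
    # Decompose the two units of flow into two paths.
    nxt = {}
    for (u, v), flag in used.items():
        if flag:
            nxt.setdefault(u, []).append(v)
    paths = []
    for _ in range(2):
        path = [src]
        while path[-1] != dst:
            path.append(nxt[path[-1]].pop())
        paths.append(path)
    return total, paths

trap = undirected([("R1", "R2", 2), ("R2", "R3", 1), ("R4", "R5", 1), ("R5", "R6", 2),
                   ("R1", "R4", 2), ("R2", "R5", 1), ("R3", "R6", 2)])
total, paths = min_sum_link_disjoint(trap, "R4", "R3")
print(total, sorted(paths))  # 10 [['R4', 'R1', 'R2', 'R3'], ['R4', 'R5', 'R6', 'R3']]
```

The first round picks R4-R5-R2-R3 (cost 3); the second round walks R2-R5 backward at cost -1 (total 7), which removes that rung and leaves two disjoint paths of total cost 10.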

5.15.6 Backup Tunnel Path Computation: MPLS TE Fast Reroute Facility Backup

Let us now describe the backup path computation in the context of MPLS TE local

protection.

Backup Tunnel Path Computation without QoS Guarantee during Failure

The simpler case of backup tunnel computation without QoS guarantees during failure is considered first. As already discussed, in several deployment scenarios, the only constraint that must be considered for the backup tunnel computation is to find a path diversely routed from the protected facility (link/node/SRLG).

This can be for one of the following reasons:

. The network is overprovisioned: In this case, regardless of the backup tunnel path, the rerouted TE LSPs will follow a noncongested path, which ensures QoS guarantees during failure.

. QoS guarantee during failure is simply not a requirement: Fast recovery is the only constraint, and the flows rerouted over a backup path can suffer from QoS degradation during failure for a limited period (until they are rerouted along another path by their respective head-end LSRs, provided such a path can be found).

Because the complexity of backup path computation is drastically reduced in those cases, just two aspects need to be discussed:

1. Manual configuration versus dynamic backup tunnel path computation

2. Backup tunnel path computation triggers

Manual configuration versus dynamic backup tunnel path computation: As previously discussed, with MPLS TE Fast Reroute facility backup, the number of backup tunnels is a function of the number of protected resources, not the number of protected TE LSPs. Because the number of backup tunnels that must be configured is limited, the network administrator may decide to configure the backup tunnel paths manually; in this case, no dynamic computation is performed by the LSRs.

On the other hand, as stated earlier, the backup tunnel path computation is, in this case, quite straightforward and not CPU intensive, so another option is to rely on distributed path computation, where each PLR computes its own set of backup tunnels.

71Other algorithms can also be used.




Let us consider the example in Figure 5.50.

Figure 5.50 depicts a simple network where the PLR R0 needs to set up the following set of backup tunnels:

. An NHOP backup tunnel to protect against a failure of the link R0-R1 (no constraint other than computing a diversely routed path).

. A set of NNHOP backup tunnels to protect against a failure of the node R1 (no constraint other than computing a diversely routed path). Note that one NNHOP backup tunnel is required per NNHOP.

The LSR R0 needs to perform the following steps:

Step 1: Compute an NHOP backup tunnel path to protect the link R0-R1. Because no constraint other than the diverse route computation is required for the NHOP backup tunnel, a simple algorithm consists of pruning the protected resource (link R0-R1 in this case) and running CSPF over the remaining topology. The selected path will be the shortest path, taking into account either the IGP or the MPLS TE metric, because no bandwidth is required for the backup tunnel. The resulting NHOP backup tunnel is depicted in Figure 5.50.

Step 2: Compute a set of NNHOP backup tunnel paths, one for each NNHOP. In this particular example, R0 has four NNHOPs: R6, R7, R2, and R4. For each of them, the PLR R0 performs a CSPF computation over the remaining topology (after having pruned the protected resource R1).

Figure 5.50 Backup path computation of NHOP and NNHOP backup tunnels without strict QoS guarantee. (Panels: computation of an NHOP backup tunnel; computation of NNHOP backup tunnels.)



Such a backup tunnel path computation is straightforward and several existing

implementations support dynamic backup tunnel computation.
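The two steps performed by R0 can be sketched in Python as follows. This assumes a plain shortest-path computation with the protected link or node pruned (no bandwidth constraint, consistent with the case described here); the function names, the topology, and the unit metrics are hypothetical and only loosely inspired by Figure 5.50, chosen so that R1 has the four NNHOPs R2, R4, R6, and R7.

```python
import heapq

def shortest_path(graph, src, dst, pruned_nodes=frozenset(), pruned_links=frozenset()):
    """Dijkstra over {node: {neighbor: metric}}, skipping pruned nodes and links."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        if u == dst:
            break
        for v, metric in graph[u].items():
            if v in pruned_nodes or frozenset((u, v)) in pruned_links:
                continue
            if d + metric < dist.get(v, float("inf")):
                dist[v], prev[v] = d + metric, u
                heapq.heappush(heap, (d + metric, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def facility_backup_paths(graph, plr, nhop):
    """Backup tunnel paths a PLR needs to protect the link plr-nhop and the node nhop."""
    paths = {}
    # Step 1: NHOP backup tunnel, pruning only the protected link.
    paths[("NHOP", nhop)] = shortest_path(
        graph, plr, nhop, pruned_links=frozenset({frozenset((plr, nhop))}))
    # Step 2: one NNHOP backup tunnel per next-next hop, pruning the protected node.
    for nnhop in graph[nhop]:
        if nnhop != plr:
            paths[("NNHOP", nnhop)] = shortest_path(
                graph, plr, nnhop, pruned_nodes=frozenset({nhop}))
    return paths

# Hypothetical topology loosely inspired by Figure 5.50 (all metrics assumed = 1).
links = [("R0", "R1"), ("R1", "R2"), ("R1", "R4"), ("R1", "R6"), ("R1", "R7"),
         ("R0", "R5"), ("R5", "R6"), ("R6", "R7"), ("R7", "R2"), ("R5", "R3"),
         ("R3", "R4")]
graph = {}
for a, b in links:
    graph.setdefault(a, {})[b] = 1
    graph.setdefault(b, {})[a] = 1

for tunnel, path in facility_backup_paths(graph, "R0", "R1").items():
    print(tunnel, path)
```

With this topology, the NHOP backup tunnel follows R0-R5-R6-R1, and the four NNHOP backup tunnels all leave R0 through R5 since R1 is pruned.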

Important note: So why not always adopt a dynamic backup tunnel path computation scheme?

Although such a backup path computation can easily be handled in a distributed fashion, there might be another reason why manual configuration is required: the IGP TE extensions that specify the SRLG may be unsupported or not configured. In this case, the PLR does not have the knowledge required to compute an SRLG-diverse path. To illustrate this issue, let us consider Figure 5.51.

In the example depicted in Figure 5.51, the two lightpaths interconnecting the pairs of LSRs (R1,R4) and (R4,R5) belong to the same SRLG. In other words, they have some equipment in common (at the optical layer in this case) whose failure would cause both lightpaths to fail. By default, the IP/MPLS layer does not have such visibility, and the topology seen by the IP/MPLS layer is reduced to the one described in Figure 5.51B.

Computing the backup tunnel path to protect the link R1-R4 from a link failure would result (in this simple example) in selecting the shortest path diversely routed from the protected link, hence the path R1-R5-R4 (supposing that all the links have an equal metric and the path satisfies the other constraints), as shown in Figure 5.52A.

Figure 5.51 IP/MPLS logical view. (Panels: optical layer with OXC1 to OXC6, where two lightpaths belong to the same SRLG; IP/MPLS topology view with R1 to R5.)



Unfortunately, such a backup tunnel path would not be the right choice. Indeed, a failure of the SRLG shared by the links R1-R4 and R5-R4 would cause the failure of both the protected link R1-R4 and its associated backup tunnel, because the link R5-R4 would also fail. This highlights the importance of being able to compute an SRLG-diverse path for the backup tunnel by means of, for example, distributed CSPF-based SRLG-diverse backup path computation algorithms.
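A minimal sketch of an SRLG-aware backup computation follows: besides the protected link, every link that shares an SRLG with it is pruned before the shortest-path run. The topology below is a hypothetical reconstruction of the IP/MPLS view of Figure 5.51 with equal metrics, and the SRLG identifier "S1", the dictionary format, and the function names are assumptions; a real PLR would learn SRLG membership through the IGP TE extensions.

```python
import heapq

def shortest_path(graph, src, dst, pruned_links=frozenset()):
    """Dijkstra over {node: {neighbor: metric}}, skipping pruned links."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        if u == dst:
            break
        for v, metric in graph[u].items():
            if frozenset((u, v)) in pruned_links:
                continue
            if d + metric < dist.get(v, float("inf")):
                dist[v], prev[v] = d + metric, u
                heapq.heappush(heap, (d + metric, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def srlg_diverse_backup(graph, srlg, protected_link):
    """Prune the protected link and every link sharing an SRLG with it, then run SPF."""
    groups = srlg.get(frozenset(protected_link), set())
    pruned = {link for link, ids in srlg.items() if ids & groups}
    pruned.add(frozenset(protected_link))
    a, b = protected_link
    return shortest_path(graph, a, b, frozenset(pruned))

# Hypothetical IP/MPLS view with equal metrics; R1-R4 and R5-R4 share SRLG "S1".
graph = {}
for a, b in [("R1", "R4"), ("R1", "R5"), ("R5", "R4"), ("R1", "R2"), ("R2", "R3"), ("R3", "R4")]:
    graph.setdefault(a, {})[b] = 1
    graph.setdefault(b, {})[a] = 1
srlg = {frozenset(("R1", "R4")): {"S1"}, frozenset(("R5", "R4")): {"S1"}}

ip_diverse = shortest_path(graph, "R1", "R4", frozenset({frozenset(("R1", "R4"))}))
print(ip_diverse)                                      # ['R1', 'R5', 'R4']
print(srlg_diverse_backup(graph, srlg, ("R1", "R4")))  # ['R1', 'R2', 'R3', 'R4']
```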

Backup path computation triggers: Another interesting question now arises: when should a backup tunnel path computation (with facility backup) be triggered?

The backup tunnel path computation and establishment can be triggered either when the link comes up (for an NHOP backup tunnel) or when the neighbor adjacency is first established (for an NNHOP backup tunnel). Another alternative is to set up an NHOP or NNHOP backup tunnel when the first protected TE LSP traversing the protected resource is signaled.

Furthermore, a PLR can trigger backup tunnel path reoptimization at regular intervals to determine whether a better (shorter) path exists.

Backup Tunnels Path Computation with Strict QoS Guarantees during Failure

Undoubtedly, when QoS guarantees during failure are required, backup tunnel path computation becomes significantly more complicated because of the added requirement that the backup tunnel paths offer QoS guarantees (at least for some CoRs). This section explores the various aspects of the backup tunnel path computation needed to satisfy such a set of constraints. Strict QoS guarantees can be reduced to the ability to reroute TE LSPs over a backup tunnel providing an equivalent bandwidth and sometimes a bounded increase of the propagation delay. This is why the terms QoS guarantee and bandwidth protection are used interchangeably throughout this section.

Figure 5.52 Computation of an SRLG diverse path. (Panels: "IP-diverse" path; SRLG-diverse backup path.)

It is worth emphasizing that Fast Reroute is a temporary mechanism (i.e., a protected TE LSP is rerouted onto a backup tunnel until it gets reoptimized by its head-end LSR). Therefore, while the protected TE LSPs are rerouted over their backup tunnel, the QoS provided to those TE LSPs is dictated by the amount of bandwidth of the backup tunnel and the propagation delay experienced along the backup tunnel path.

To compute a set of backup tunnels that satisfy such a set of requirements, one

must follow several steps:

Step 1: First answer the following set of questions:

1. What is the amount of bandwidth to protect?

2. What is the network backup capacity?

3. What are the backup tunnel path computation triggers?

Step 2: Choose a backup tunnel path computation model.

Step 1: Answer the Following Set of Questions

1. What is the amount of bandwidth to protect?

When trying to achieve bandwidth protection with Fast Reroute, one must first determine the amount of bandwidth to protect (also called the protected bandwidth), that is, the amount of bandwidth required for the backup tunnel(s).

At first glance, this seems a rather obvious question. That said, there are two approaches that can be taken here, each with its respective pros and cons:

Approach 1: Protect the actual reserved bandwidth. To illustrate this first approach, let us consider an OC3 link traversed by only 10 signaled TE LSPs whose bandwidths sum to 50 Mbps. Suppose also that only a subset of them requires bandwidth protection and that the sum of their bandwidths is 30 Mbps. In this model, the idea consists of computing an NHOP or NNHOP backup tunnel with a capacity of 30 Mbps: the protected bandwidth is 30 Mbps. Indeed, why try to protect the entire OC3 capacity if only 30 Mbps worth of traffic must be protected?

The advantage of this approach is that only the required amount of bandwidth is reserved, not more, which allows optimal backup bandwidth usage in the network. The immediate downside is the need for more frequent backup tunnel path computations. Indeed, the protected bandwidth changes as new TE LSPs are signaled and torn down. If a new backup tunnel path computation is triggered each time the protected bandwidth changes in the network, new backup tunnels will be computed and signaled more frequently. One might try to limit this frequency by introducing a threshold mechanism, for instance, for an OC3 link, a threshold every 20 Mbps (a more efficient mechanism would probably use nonlinearly spaced thresholds, though). When the protected bandwidth crosses a threshold, a new backup tunnel path computation is triggered. Another set of thresholds is, of course, defined for when the protected bandwidth decreases.

Approach 2: Protect a bandwidth pool regardless of the actual amount of reserved bandwidth. The protected bandwidth does not depend on the amount of bandwidth actually reserved by the protected TE LSPs requesting bandwidth protection that traverse a protected resource. So typically, if an OC3 link has a capacity of 155 Mbps, one tries to find a set of backup tunnels for 155 Mbps. Similarly, a protected SDH-SONET VC of 155 Mbps reserves 155 Mbps of backup capacity in the network, whether or not the protected VC carries traffic.

Important notes:

. Because bandwidth protection can be requested on a per–TE LSP basis, if the operator knows a priori that the proportion of TE LSPs requesting bandwidth protection will never exceed x% of each link capacity, then the protected bandwidth can be limited to x% of each link capacity. For example, if bandwidth protection is required only for the voice traffic and the operator knows a priori that no link will ever carry more than x% of voice traffic, then the required protected bandwidth for each facility to protect is limited to x% of the link capacity.

. When MPLS DiffServ-aware TE is configured in the network, more than one bandwidth pool can be configured. The aim of such a model is to allow different CACs for different classes of traffic. For instance, an OC3 link can be configured so the maximum amount of voice traffic does not exceed a fixed portion of the link capacity (for instance, 50 Mbps) and the maximum amount of data traffic does not exceed 200 Mbps. This interesting model can guarantee different overbooking/underbooking ratios per class of traffic. In the example mentioned above, the total bandwidth admitted for the TE LSPs carrying voice will never exceed 50 Mbps, whereas up to 200 Mbps of TE LSPs carrying data traffic can be admitted on this OC3 link. A proper scheduling mechanism then needs to be configured to guarantee that each class of traffic will be served appropriately. Hence, the network administrator may decide to protect the bandwidth of a certain pool, for instance, the bandwidth pool dedicated to the voice traffic. This provides fast recovery with Fast Reroute for the data traffic and fast recovery with bandwidth guarantees for the voice traffic. So the protected bandwidth is in this case limited to a specific bandwidth pool, which reduces the amount of required backup capacity.



Though a bit less optimal because it potentially requires more protected bandwidth than necessary, this approach is more scalable than the previous one, as backup tunnel path computation is triggered much less frequently.
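The threshold mechanism mentioned for Approach 1 can be sketched as follows. The 20 Mbps spacing comes from the OC3 example; the 5 Mbps margin used for the decreasing-threshold set, the function names, and the sample sequence are hypothetical. Thresholds are spaced linearly here for simplicity.

```python
import math

STEP_MBPS = 20        # threshold spacing taken from the OC3 example in the text
DOWN_MARGIN_MBPS = 5  # hypothetical margin defining the decreasing-threshold set

def backup_size(protected_mbps):
    """Round the protected bandwidth up to the next threshold (at least one step)."""
    return max(STEP_MBPS, math.ceil(protected_mbps / STEP_MBPS) * STEP_MBPS)

def needs_recompute(current_size, protected_mbps):
    """True when a threshold is crossed upward, or clearly crossed downward."""
    if protected_mbps > current_size:
        return True
    lower = current_size - STEP_MBPS
    # Shrink only when the protected bandwidth falls well below the next lower level.
    return lower >= STEP_MBPS and protected_mbps < lower - DOWN_MARGIN_MBPS

def replay(samples):
    """Replay successive protected-bandwidth values; count backup recomputations."""
    size = backup_size(samples[0])
    recomputations = 0
    for bw in samples[1:]:
        if needs_recompute(size, bw):
            size = backup_size(bw)
            recomputations += 1
    return size, recomputations

print(replay([30, 35, 41, 45, 38, 30]))  # (40, 2)
```

In this sample sequence the protected bandwidth changes five times, but only two backup tunnel recomputations are triggered (growing to 60 Mbps at 41 Mbps, shrinking back to 40 Mbps at 30 Mbps).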

2. What is the network backup capacity?

The backup capacity is defined as the network capacity dedicated to backup tunnels requiring bandwidth, which cannot be used by primary TE LSPs. The ratio of the required backup capacity to the total network capacity is an important efficiency factor of a recovery mechanism. Typically, if the capacity required to provide bandwidth protection is 20% of the total network capacity, the recovery mechanism can be considered extremely efficient as far as bandwidth usage is concerned. Indeed, this means that just 20% of the network capacity is dedicated to backup while bandwidth protection can still be provided when required, in contrast with SONET-SDH, for instance, where a protected VC requires twice the VC bandwidth to be allocated: once for the primary VC and the same amount again for the backup.

Hence, for each link, the network administrator defines the following:

. The primary bandwidth pool(s):72 This determines the maximum

amount of bandwidth that can be admitted on the given resource for

primary TE LSPs.

. The backup pool: Total amount of bandwidth that can be used by

backup tunnels.

This is illustrated in Figure 5.53, where the network administrator configures the proportion of the link that can be used by regular TE LSPs and the amount of bandwidth reserved for the backup tunnels used by TE LSPs requiring bandwidth protection. The overlay backup network is the network whose link capacity equals the backup bandwidth pool on each link.

Important note: An important aspect of the reserved backup capacity in an IP/MPLS network is that this bandwidth is unavailable in the control plane but still fully available in the data plane. For instance, if an OC3 link is configured with a backup pool of 30 Mbps and a reservable bandwidth pool for the primary TE LSPs of 125 Mbps, no more than 125 Mbps of bandwidth can be reserved by all the primary TE LSPs traversing the link (CAC function). That said, the full bandwidth is still available in the data plane: the packets forwarded onto the link will be served at link speed. This offers a higher QoS at steady state (when the backup tunnels making use of the backup pool are not active), a major difference from recovery mechanisms at lower layers, where the backup capacity cannot easily be used by primary traffic. For instance, in the optical plane, the optical backup capacity cannot be used by the active primary optical paths. So to avoid some bandwidth waste, one technique consists of allocating the

72Potentially, multiple bandwidth pools will be defined on a per-class-type basis if MPLS DiffServ-aware TE is used. For the sake of simplicity, we consider that a single pool is defined.



backup capacity to low-priority optical paths that are preempted in the case of failure by the high-priority rerouted optical paths.

Overlay backup capacity network discovery: We describe below a backup tunnel path computation model whereby the entity responsible for the backup tunnel path computation must first acquire knowledge of the backup capacity on each link (i.e., the backup network capacity). There are two ways for this entity to acquire knowledge of the overlay backup network:

1. Via a local static configuration: The network administrator manually configures the amount of backup capacity for each link in the network. An alternative is to use, as the backup capacity, the difference between the actual link speed and the maximum reservable bandwidth. Indeed, when a link is configured with MPLS TE, the network administrator configures the maximum reservable bandwidth, as already mentioned. Furthermore, the link speed is advertised by the IGP. So the entity could implicitly conclude that the backup capacity is equal to the link speed minus the maximum reservable bandwidth. For example, if an OC3 link is configured with a maximum reservable bandwidth of 120 Mbps, the entity in charge of the backup tunnel path computation could implicitly deduce that the backup capacity on this link is 155 Mbps − 120 Mbps = 35 Mbps. Unfortunately, this approach does not work with overbooking. Suppose that the network administrator decides to apply overbooking on some links: if the maximum reservable bandwidth is set to 200 Mbps to allow for an overbooking ratio, this strategy no longer works.

Figure 5.53 Illustration of the network backup capacity. (Labels: primary reservable pool; backup capacity; overlay backup network.)

2. Via an automatic IGP discovery: This just requires a simple and straightforward IGP (OSPF or IS-IS) extension so that every node can explicitly signal through its IGP the amount of backup capacity on each of its attached links. Such an extension has been proposed in [FACILITY-BACKUP], where a new sub-TLV (called the backup bandwidth pool sub-TLV) has been defined. An important aspect that must be highlighted is that this IGP extension has no IGP scalability impact. Indeed, every router advertises the backup bandwidth pool for each of its attached links, and in the FACILITY-BACKUP model studied in this section, this value does not change as new backup tunnels are dynamically signaled.
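The implicit deduction in option 1 above is simple arithmetic; the Python sketch below (hypothetical function name) also shows why it is fragile: it can only flag the obvious overbooking case in which the maximum reservable bandwidth exceeds the link speed.

```python
def implicit_backup_capacity(link_speed_mbps, max_reservable_mbps):
    """Backup capacity deduced as link speed minus maximum reservable bandwidth.

    Returns None when the maximum reservable bandwidth exceeds the link speed,
    the obvious symptom of overbooking. An overbooked link whose maximum
    reservable bandwidth is still below the link speed would go undetected.
    """
    spare = link_speed_mbps - max_reservable_mbps
    return spare if spare >= 0 else None

print(implicit_backup_capacity(155, 120))  # 35
print(implicit_backup_capacity(155, 200))  # None: overbooked link
```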

3. What are the backup tunnel path computation triggers?

The same backup path computation triggers as in the previous case (backup tunnel

computation without QoS) are valid here. In addition, a backup tunnel path

computation is also triggered when there is a change in the protected bandwidth

and/or the network backup capacity.

Step 2: Choose a Backup Tunnel Path Computation Model

Once the protected bandwidth on each link is determined and the network backup capacity is known, the next step is to choose a backup tunnel path computation model. Several models have been proposed, and listing all of them is virtually impossible. Some of these models rely on distributed backup tunnel path computation (each PLR is responsible for computing its own set of backup tunnels), whereas others explicitly rely on centralized backup tunnel path computation. They differ in their degree of efficiency, required set of signaling protocol extensions, complexity, and scalability, among other criteria.

Two models, known as the independent CSPF-based model and the facility backup model, are described in detail in the rest of this section, but bear in mind that they are not the only backup tunnel path computation models available.

Model 1: the independent CSPF-based model: A simple approach to providing fast recovery and bandwidth guarantees during a failure is to set up a backup tunnel with a bandwidth equal to the protected bandwidth. In this model, each PLR executes the following set of tasks:

. Determine the amount of bandwidth required73 for the NHOP or NNHOP backup tunnel (the protected bandwidth).

. Compute a path for the backup tunnel, applying the bandwidth constraint as for any regular TE LSP (using either the regular reservable bandwidth or the backup bandwidth).

. Set up the backup tunnels with their associated bandwidth.
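The three tasks of the independent CSPF-based model can be sketched in Python as follows. This is an illustration under assumptions: a single table of available bandwidth per link stands in for whichever pool (regular reservable or backup) an implementation would use, and the function names, node names (A, X, S1, and so on), pool values, and tunnel sizes are hypothetical. Because each PLR computes and reserves on its own, reservations on a shared link simply add up.

```python
import heapq

def cspf(graph, avail, src, dst, bw, pruned_nodes=frozenset()):
    """Shortest path using only links with at least bw Mbps still available."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        if u == dst:
            break
        for v, cost in graph[u].items():
            if v in pruned_nodes or avail[frozenset((u, v))] < bw:
                continue
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def setup_nnhop_backup(graph, avail, plr, nnhop, protected_node, bw):
    """Independent CSPF model: the PLR computes and reserves on its own."""
    path = cspf(graph, avail, plr, nnhop, bw, frozenset({protected_node}))
    if path is not None:
        for hop in zip(path, path[1:]):
            avail[frozenset(hop)] -= bw  # no sharing: reservations simply add up
    return path

# Hypothetical topology: backup pool per link in Mbps (all values assumed).
pools = {("A", "X"): 10, ("X", "B"): 10, ("C", "Y"): 10, ("Y", "D"): 10,
         ("A", "S1"): 155, ("C", "S1"): 155, ("S1", "S2"): 30,
         ("S2", "B"): 155, ("S2", "D"): 155}
graph, avail = {}, {}
for (a, b), pool in pools.items():
    graph.setdefault(a, {})[b] = 1
    graph.setdefault(b, {})[a] = 1
    avail[frozenset((a, b))] = pool

print(setup_nnhop_backup(graph, avail, "A", "B", "X", 25))  # ['A', 'S1', 'S2', 'B']
print(setup_nnhop_backup(graph, avail, "C", "D", "Y", 20))  # None: only 5 Mbps left on S1-S2
```

The second backup tunnel cannot be placed because 25 Mbps have already been deducted from S1-S2, even though the two tunnels protect independent nodes X and Y and, under the single failure assumption, max(25, 20) = 25 Mbps would fit within the 30 Mbps pool.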


73The determination of the required amount of bandwidth to be protected is discussed later in this chapter.



Although this approach is certainly simple and meets the requirements, it suffers from several limitations:

1. Bandwidth sharing between backup tunnels protecting independent resources cannot be performed, so much more backup bandwidth than necessary is required in the network.

2. In some cases, no placement of the backup tunnels can be found even though a solution exists.

Let us illustrate those limitations through two examples:

1. Inability to perform bandwidth sharing: Although the concept of band-

width sharing has already been introduced, it is worth providing

another example to highlight its benefits.

As explained previously, under the single failure assumption, two backup

tunnels protecting independent resources can share bandwidth. By contrast, two

backup tunnels originated by two LSRs that protect against the failure of the same

resource (link or node) cannot share their bandwidth because upon failure of the

resource they protect, they will be simultaneously active.

Figure 5.54 Illustration of the inability to perform bandwidth sharing with the independent CSPF model.

In Figure 5.54, B1 and B2 are NNHOP backup tunnels originated on the PLRs R0 and R8, respectively, to protect against a failure of the node R1, whereas B3 is an NNHOP backup tunnel originated at the PLR R5 to protect against a failure of the node R6. As mentioned in Figure 5.54, all links are OC3 except R0-R1 (10 Mbps), R8-R1 (15 Mbps), and R5-R6 (30 Mbps). In this example, we assume

that the protected bandwidth is equal to the link bandwidth (e.g., a backup tunnel of 10 Mbps is required to protect from a failure of the link R0-R1 or the node R1). Moreover, we make the single failure assumption: two LSRs cannot fail simultaneously. In this example, B1 and B2 both protect against the failure of the same resource (node R1). This means that upon R1’s failure, both B1 and B2 will be active, so the bandwidth required for both of them on the link R3-R4, for instance, is 10 Mbps + 15 Mbps = 25 Mbps. On the other hand, because the backup tunnel B3 protects R5 from a failure of the node R6, B3 can share bandwidth with B1 and B2, so the bandwidth required on the link R3-R4 is max(B1 + B2, B3) = 30 Mbps and not 10 Mbps + 15 Mbps + 30 Mbps = 55 Mbps. The single failure assumption therefore enables bandwidth sharing, and 25 Mbps of backup bandwidth is saved on the link R3-R4.
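The arithmetic above can be captured in a few lines. In this sketch (the function name and tuple layout are illustrative assumptions), the backup bandwidth needed on a link under the single failure assumption is the largest sum over tunnels protecting the same resource, whereas without sharing every tunnel is simply added up.

```python
from collections import defaultdict

def required_backup_bw(tunnels, shared=True):
    """Backup bandwidth (Mbps) needed on one link crossed by `tunnels`.

    tunnels: list of (name, protected_resource, bandwidth_mbps).
    With sharing, only tunnels protecting the same resource can be active
    together, so the requirement is the largest per-resource sum.
    """
    if not shared:
        return sum(bw for _, _, bw in tunnels)
    per_resource = defaultdict(int)
    for _, resource, bw in tunnels:
        per_resource[resource] += bw
    return max(per_resource.values(), default=0)

# Link R3-R4 in Figure 5.54: B1 and B2 protect node R1, B3 protects node R6.
r3_r4 = [("B1", "R1", 10), ("B2", "R1", 15), ("B3", "R6", 30)]
print(required_backup_bw(r3_r4))                # 30
print(required_backup_bw(r3_r4, shared=False))  # 55
```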

Another interesting fact is that in some scenarios, the amount of bandwidth that must actually be reserved for B1 and B2 may be less than the sum of their bandwidths. Suppose now that the bandwidth of the link R1-R2 (or bandwidth pool; see below) is equal to 5 Mbps. The maximum amount of traffic originated by R0 and R8 that can traverse the link R1-R2 is bounded by the R1-R2 bandwidth pool: 5 Mbps. So in this case, the total amount of protected bandwidth for B1 and B2 on the link R3-R4 is only 5 Mbps, not 10 Mbps + 15 Mbps = 25 Mbps.

Therefore, this example clearly highlights the benefit of bandwidth sharing

under the assumption of a single failure. Unfortunately, with the independent

CSPF model, each PLR determines the amount of bandwidth to be protected and

sets up its own backup tunnel; there is no synchronization between PLRs. This is

why no bandwidth sharing can be achieved. With the independent CSPF model, B1,

B2, and B3 are signaled with 10 Mbps, 15 Mbps, and 30 Mbps, respectively, and the

amount of reserved bandwidth on the link R3-R4 is 55 Mbps.

2. Inability to find a placement of the backup tunnels even if a solution exists (in some cases): This is the second limitation of the independent CSPF-based model. By definition, the independent CSPF model relies on uncoordinated backup tunnel path computation by the various LSRs; consequently, the order in which the backup tunnels are set up is arbitrary, which can result in the inability to find a backup tunnel placement even though a solution exists. Let us illustrate this statement through the example depicted in Figure 5.55.

In Figure 5.55, R0 requires an NNHOP backup tunnel of 10 Mbps (capacity of

the link R0-R1) to protect against a failure of R1, and R8 requires an NNHOP

backup tunnel of 20 Mbps (capacity of the link R8-R1) to protect against a failure

of R1. Suppose also that the backup bandwidths on the links R3-R4 and R8-R9 are

20 Mbps and 10 Mbps, respectively.

If the first node starting its backup tunnel computation is R0, it will likely select the shortest path satisfying the constraints for its backup tunnel: R0-R3-R4-R2. R8 then cannot find any path satisfying its 20 Mbps bandwidth constraint, although a valid backup tunnel placement exists, as depicted in Figure 5.56.

Of course, a solution could have been found even with the first placement by allowing load balancing, with two backup tunnels of 10 Mbps each following the paths R8-R0-R3-R4 and R8-R9-R10-R11. But with the independent CSPF model, for any fixed number of backup tunnels (also called splits), a similar example could be constructed in which no placement satisfying the requirement of protecting X Mbps is found, even though one exists.

Figure 5.55 Illustration of the potential inability to find a placement of backup tunnels with the independent CSPF model.

Figure 5.56 Illustration of the potential inability to find a placement of backup tunnels with the independent CSPF model, although a solution exists.
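The ordering effect can be reproduced with a small simulation. The sketch below is a hypothetical reconstruction: the links R0-R8, R4-R2, R9-R10, R10-R11, and R11-R2 are inferred from the paths named in the text, every link whose backup bandwidth is not stated is assumed to offer 155 Mbps, and each PLR is assumed to pick a fewest-hop feasible path with breadth-first search rather than running a real CSPF on TE metrics. Each PLR computes and reserves in turn, with no coordination, as in the independent CSPF model.

```python
from collections import deque

def bfs_cspf(avail, src, dst, bw, exclude):
    """Fewest-hop path whose links all have at least `bw` Mbps left, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for (a, b), left in avail.items():
            if left < bw or u not in (a, b):
                continue
            v = b if u == a else a
            if v not in prev and v not in exclude:
                prev[v] = u
                queue.append(v)
    return None

def place_in_order(capacity, requests, exclude):
    """Independent CSPF model: PLRs compute and reserve one after another."""
    avail = dict(capacity)
    result = {}
    for plr, merge_point, bw in requests:
        path = bfs_cspf(avail, plr, merge_point, bw, exclude)
        result[plr] = path
        for a, b in zip(path or [], (path or [])[1:]):
            key = (a, b) if (a, b) in avail else (b, a)
            avail[key] -= bw  # reservation made by this PLR
    return result

# Backup bandwidth (Mbps) per link; topology assumed from the text of Fig. 5.55.
capacity = {
    ("R0", "R1"): 155, ("R8", "R1"): 155, ("R1", "R2"): 155,
    ("R0", "R3"): 155, ("R3", "R4"): 20, ("R4", "R2"): 155,
    ("R0", "R8"): 155, ("R8", "R9"): 10, ("R9", "R10"): 155,
    ("R10", "R11"): 155, ("R11", "R2"): 155,
}
r0_first = place_in_order(capacity, [("R0", "R2", 10), ("R8", "R2", 20)], {"R1"})
r8_first = place_in_order(capacity, [("R8", "R2", 20), ("R0", "R2", 10)], {"R1"})
print(r0_first["R8"])  # None: R8 finds no 20 Mbps path
print(r8_first)        # both PLRs obtain a backup tunnel
```

With R0 first, R8 is left without a 20 Mbps path; simply reversing the order lets both PLRs succeed. The feasible placement found this way is one possibility and is not necessarily the one drawn in Figure 5.56.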

Model 2: The facility-based computation model provides strict QoS guarantees to a specific set of protected TE LSPs requesting bandwidth protection, while using backup bandwidth efficiently. This aspect is extremely important for cost effectiveness. The “facility-based computation” model is described in [FACILITY-BACKUP].

1. Centralized versus distributed path computation models: The facility-based computation model specifies two possible methods for the computation of the set of required backup tunnel paths:

a. The centralized model, in which a central server (also called a path computation element [PCE]74) computes the paths for the set of backup tunnels that protect all the network resources

b. The distributed model, in which each router (LSR) is responsible for the computation of a subset of backup tunnel paths

In either case, there is a set of variables that the PCE must take as input to perform

backup tunnel path computation:

1. The amount of protected bandwidth

2. The backup capacity

3. The network topology and resources

Centralized backup tunnel path computation: There are actually two subcases that must be considered independently, depending on whether the central PCE is responsible for both the primary and the backup tunnel path computations or just the backup tunnel path computation.

Situation 1: The PCE is responsible for both the primary and the backup tunnel

path computations.

This assumes that MPLS TE is used in the network for bandwidth optimization

and/or strict QoS guarantees. In addition, Fast Reroute is deployed for fast

recovery.

In this case, the PCE knows both the amount of protected bandwidth, which is equal to the actual reserved bandwidth (because it is also responsible for the primary tunnel placement), and the backup capacity, which is simply the remaining capacity once all the primary TE LSPs have been placed. So the PCE can protect one element at a time (an element being either a link, a node, or an SRLG), using all the network backup capacity for each element. This ensures that bandwidth is shared between backup tunnels protecting independent resources. Indeed, suppose that the PCE computes the set of backup tunnels protecting all the TE LSPs that request bandwidth protection and traverse a node R1, against the failure of the node R1. The protected bandwidth is equal to the sum of their bandwidths, and the PCE can use all the available bandwidth on every link not consumed by primary TE LSPs. Once this set of backup tunnels has been computed, the PCE can start considering the protection of the TE LSPs traversing another node R2. The amount of backup capacity available for that new set of backup tunnels is strictly equal to the amount of bandwidth considered in the previous case. Why? Simply because under the single failure assumption, the resources R1 and R2 cannot fail simultaneously, so their respective sets of backup tunnels cannot be simultaneously active and can therefore share the backup bandwidth.

74There are several possible terms to refer to the capability of computing a TE LSP path for a client LSR: path computation server, path computation element, and path computation router.
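The per-element procedure can be sketched as follows, assuming (hypothetically) that candidate paths for each backup tunnel are already known and that the PCE places tunnels first fit. The key line is the one that resets the residual capacity for every protected element: under the single failure assumption, each element may use the full backup capacity. The `share=False` variant, which never resets, is included only for comparison. The capacities and paths are illustrative values loosely based on Figure 5.54, not taken from the book.

```python
def protect_all(backup_capacity, demands, share=True):
    """Place backup tunnels one protected element at a time (first fit).

    backup_capacity: {link: Mbps left after primary TE LSP placement}.
    demands: {element: [(tunnel, Mbps, [candidate path as list of links])]}.
    share=True restarts every element from the full backup capacity, which
    is valid under the single failure assumption; share=False keeps
    subtracting, i.e. no bandwidth sharing at all.
    """
    placement = {}
    residual = dict(backup_capacity)
    for element, tunnels in demands.items():
        if share:
            residual = dict(backup_capacity)  # fresh budget for this element
        for name, bw, candidates in tunnels:
            placement[name] = None
            for path in candidates:
                if all(residual[link] >= bw for link in path):
                    for link in path:
                        residual[link] -= bw
                    placement[name] = path
                    break
    return placement

# Hypothetical capacities (Mbps) and candidate paths loosely based on Fig. 5.54.
capacity = {"R0-R3": 155, "R8-R0": 155, "R3-R4": 30, "R4-R2": 155,
            "R5-R3": 155, "R4-R7": 155}
demands = {
    "node R1": [("B1", 10, [["R0-R3", "R3-R4", "R4-R2"]]),
                ("B2", 15, [["R8-R0", "R0-R3", "R3-R4", "R4-R2"]])],
    "node R6": [("B3", 30, [["R5-R3", "R3-R4", "R4-R7"]])],
}
print(protect_all(capacity, demands))               # B1, B2 and B3 all placed
print(protect_all(capacity, demands, share=False))  # B3 is None
```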

A few comments can be made at this point:

Comment 1: The backup tunnel path computation is an NP-complete problem

whose complexity renders its computation intractable without the use of some

heuristics to speed up the path computation.

Comment 2: Some complex algorithms can be used to find an optimal

placement for the primary TE LSPs while trying to fully protect bandwidth

and achieve optimized bandwidth sharing, but this might not always be

possible. For the sake of illustration, if the bandwidth is a scarce resource

and the bandwidth cannot be fully protected if the primary TE LSPs are

placed in an optimal fashion, then the PCE may decide (based on some

preconfigured local policy) to displace some primary TE LSPs from their

optimal path to free up some bandwidth on some path to get a complete

bandwidth protection.

As far as the network topology is concerned, the PCE can acquire it either via routing or via a connection to a seed router. In the first case, the PCE can be adjacent to any LSR in the network and run an IGP such as IS-IS or OSPF. The only requirement is to make sure that the PCE sets the ‘‘Overload bit’’ for IS-IS or advertises a ‘‘Max metric’’ for OSPF, so that other routers never use the PCE as a transit node in their SPT. Another possibility is for the PCE to acquire the network topology (IGP database) via a Telnet session or an SNMP management information base (MIB). The PCE can collect the IGP database from any router in a routing area because

all routers of the same routing area share an identical IGP database (which is a

fundamental property of link state routing protocols). If the autonomous system is

made of several areas, then the PCE needs to have at least one connection to a router

in each area. It is worth pointing out that the acquisition of the network topology via routing offers a significant advantage: a real-time view of the network topology. As a reminder, bandwidth sharing relies on the single failure assumption (i.e., backup capacity cannot be shared by backup tunnels protecting nonindependent resources [resources that can fail simultaneously]). Once a failure occurs, the network is exposed to a second failure until the backup tunnels have been recomputed; a real-time topology view allows rapid backup tunnel recomputation, which shortens this window and makes the single failure assumption more reliable.

Situation 2: The PCE is responsible only for the backup tunnel path computation.



This case typically applies to two scenarios:

. Scenario 1: The primary TE LSP paths are computed in a distributed fashion (by each head-end LSR using a CSPF algorithm), whereas the backup tunnel paths are computed by the (centralized) PCE.

. Scenario 2: Separate centralized PCEs are used to compute primary and

backup tunnel paths.

Scenario 1: Because the PCE is responsible only for the backup tunnel path computation, it cannot use the unreserved bandwidth (the bandwidth not used by the primary TE LSPs) as the backup capacity. Why? Suppose that the PCE uses the bandwidth left unreserved by the primary TE LSPs to compute the backup tunnel paths. It will be shown hereafter that backup tunnels are signaled with 0 bandwidth to avoid extending the CAC process; for the moment, let us simply assume that this is the case. Under this assumption, the PCE computes the set of backup tunnels using the currently available bandwidth. However, the head-end LSRs compute the paths of their primary TE LSPs without any knowledge of the backup tunnels in place or of their computed bandwidth (because backup tunnels are signaled with 0 bandwidth). They may therefore draw bandwidth from the reservable bandwidth pool at any time, invalidating the backup tunnel path computation. This explains why, when the PCE is responsible only for the backup tunnel path computation, it cannot consider the unreserved bandwidth as the backup capacity.

The solution is to use non-overlapping bandwidth pools: one pool is defined for the primary TE LSPs and another for the backup tunnels. An LSR can then use the bandwidth pool reserved for primary tunnels while the PCE uses the pool reserved for backup tunnels; because the pools do not overlap, setting up a new primary TE LSP does not invalidate previously computed backup TE LSPs.
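The effect of non-overlapping pools can be illustrated with a tiny link model. In this sketch, the `TeLink` class and the 100/55 Mbps split of an OC3 link are hypothetical; the point is only that primary TE LSP admission never touches the pool the PCE used to size the backup tunnels.

```python
from dataclasses import dataclass

@dataclass
class TeLink:
    """Hypothetical TE link model with non-overlapping bandwidth pools."""
    primary_pool: int          # Mbps for primary TE LSPs (head-end CSPF/CAC)
    backup_pool: int           # Mbps reserved for the backup tunnel PCE
    primary_reserved: int = 0

    def admit_primary(self, bw: int) -> bool:
        # CAC for primary TE LSPs only ever consumes the primary pool.
        if self.primary_reserved + bw > self.primary_pool:
            return False
        self.primary_reserved += bw
        return True

    def backup_plan_fits(self, backup_load: int) -> bool:
        # The PCE sized its backup tunnels against the backup pool alone,
        # so primary setups can never invalidate this check.
        return backup_load <= self.backup_pool

# Hypothetical split of an OC3 link: 100 Mbps primary, 55 Mbps backup.
link = TeLink(primary_pool=100, backup_pool=55)
planned_backup_load = 30                           # computed earlier by the PCE
print(link.admit_primary(60))                      # True
print(link.admit_primary(50))                      # False: primary pool exhausted
print(link.backup_plan_fits(planned_backup_load))  # still True
```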

Scenario 2: Scenario 2 is somewhat similar to scenario 1 because the PCE in charge of computing the backup tunnel paths cannot use the unreserved bandwidth (known by the other PCE, which is responsible for the primary TE LSP path computation); hence, non-overlapping bandwidth pools are also required.

Distributed model: The aim of the distributed model is to distribute the backup tunnel path computation among several LSRs instead of relying on a central PCE. To avoid confusion, it is worth clarifying the notion of “distributed computation.” In the computer science world, distributed computation usually refers to the ability to involve several processors in a single computation task. In the distributed facility computation model, by contrast, the computation of the set of backup tunnels protecting a particular resource is always performed by a unique entity (in this case an LSR). “Distributed” refers to the fact that the computation of the backup tunnels protecting a set of N resources is shared among several entities, but the set of backup LSPs required to protect a particular resource R is always computed by a unique PCE (an LSR in this case).

Let us now consider the situations in which a set of backup tunnels must be

computed to protect a node, a link, and an SRLG:

Situation 1: protection against a node failure

As previously pointed out, the set of backup tunnels that needs to be computed by a

unique entity (PCE) is the set of backup tunnels protecting against the failure of a

resource R. In other words, the set S of backup tunnels protecting against the failure of a resource R cannot be computed by different entities. Why? Because they

cannot share bandwidth. Let us go back to the diagram depicted in Figure 5.54

for a moment. In the case of failure of the node R1, the backup tunnels B1 and B2

are simultaneously active. Moreover, backup tunnels are signaled with 0 bandwidth

for a reason detailed later in this section. So the implication is that a unique entity

must be responsible for the computation of all the backup tunnels that protect

against the failure of R1 (this is required to make sure that the backup tunnel paths

offer the required bandwidth). A very natural choice for this entity is the node R1

itself! In the distributed model, to protect against a node failure (the failure of R1),

R1 will compute all the backup tunnels from every neighbor to their set of next-next

hops: from R0 to R2, R0 to R8, R0 to R9, R0 to R10, R8 to R0, R8 to R2, R8 to

R9, R8 to R10, R9 to R8, R9 to R0, R9 to R2, R9 to R10, R10 to R9, R10 to R8,

R10 to R0, R10 to R2, R2 to R10, R2 to R9, R2 to R8, and R2 to R0. Likewise, R6

performs the computation of backup tunnels from each of its neighbors to their

NNHOP in the case of its own failure (from R5 to R7 and R7 to R5). Neither

synchronization nor communication is required between the two PCEs R1 and R6

because they compute backup tunnels to protect independent resources (R1 and

R6). Each of them may use the whole backup network capacity, which allows them

to naturally perform bandwidth sharing.
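The list of tunnels R1 must compute is simply every ordered pair of distinct neighbors. The helper below is an illustrative sketch that, like the text, assumes a backup tunnel is needed between every pair of neighbors; a real implementation might restrict this to pairs actually used by protected TE LSPs.

```python
from itertools import permutations

def nnhop_pairs(neighbors):
    """Ordered (PLR, merge point) pairs for which a node must compute backup
    tunnels to protect against its own failure: every neighbor to every
    other neighbor."""
    return list(permutations(sorted(neighbors), 2))

r1_pairs = nnhop_pairs({"R0", "R2", "R8", "R9", "R10"})
print(len(r1_pairs))  # 20
r6_pairs = nnhop_pairs({"R5", "R7"})
print(r6_pairs)       # [('R5', 'R7'), ('R7', 'R5')]
```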

As in the case of the centralized model with the PCE responsible for the backup

tunnel path computation only, a separate backup bandwidth pool is required.

Communication between a node acting as a PCE and its neighbors requires some

signaling protocol detailed later in this section.

Situation 2: protection against a link failure

To protect a link L, if unidirectional TE LSPs are used, two NHOP backup tunnels

are required (one in each direction). If the link fails in one direction (e.g., a laser on

the sender side or a photodiode on the receiver side fails), then one NHOP backup tunnel is used. On the other hand, in the case of a bidirectional link failure (e.g., a fiber cut), both NHOP backup tunnels will be used. This requires the two NHOP backup tunnels offering bandwidth protection to be computed by a single PCE to avoid a bandwidth protection violation. This is illustrated in Figure 5.57.

So let us consider the network depicted in Figure 5.57: An NHOP backup tunnel B1 protects the fast-reroutable TE LSPs traversing the link R1-R2 against a failure of the link R1-R2. Another NHOP backup tunnel B2 protects the fast-reroutable TE LSPs traversing the link R2-R1 against a failure of the link R2-R1.

Let us now suppose that the two NHOP backup tunnels B1 and B2 are computed independently. Because each path computation ignores the other, B1 and B2 may end up sharing bandwidth! But in the case of a bidirectional link failure, both NHOP backup tunnels will be active, which will result in a bandwidth protection violation; hence the requirement for the two NHOP backup tunnels to be computed by a single PCE. This can clearly be seen in Figure 5.57: in the case of a bidirectional failure of the link R1-R2, both B1 and B2 are simultaneously active on the link R4-R5, which results in a bandwidth protection violation. A simple solution consists of electing one of the two ends of the link as the PCE for the computation of the set of NHOP backup tunnel paths protecting against a bidirectional link failure (e.g., the LSR with the smaller router ID could be selected).

Situation 3: protection of an SRLG

Likewise, the protection of an SRLG requires electing a PCE to compute the set of required backup tunnels in the case of failure of this SRLG (Figure 5.58).

As shown in Figure 5.58, the set of backup tunnels required to protect against a failure of SRLG S1 must be computed by a unique PCE elected among the LSRs R1, R4, and R5.
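The election rule suggested for the link case (smallest router ID) is easy to express. The sketch below assumes, hypothetically, that the same rule is reused for the SRLG members R1, R4, and R5, and uses made-up router IDs from the 192.0.2.0/24 documentation range. Router IDs are compared numerically via Python's standard `ipaddress` module.

```python
import ipaddress

def elect_pce(members):
    """Elect the LSR with the numerically smallest router ID.

    members: {LSR name: router ID as a dotted-quad string}.
    Router IDs are compared as 32-bit integers, not as strings
    ("192.0.2.11" sorts before "192.0.2.2" as text, but not numerically).
    """
    return min(members, key=lambda name: int(ipaddress.IPv4Address(members[name])))

# Hypothetical router IDs.
print(elect_pce({"R1": "192.0.2.11", "R2": "192.0.2.2"}))  # R2 (link R1-R2)
print(elect_pce({"R1": "192.0.2.1", "R4": "192.0.2.4",
                 "R5": "192.0.2.5"}))                      # R1 (SRLG S1)
```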

Signaling of backup tunnels with 0 bandwidth

Several times in this section, we made the statement that backup tunnels providing

QoS guarantees are signaled with 0 bandwidth in the ‘‘facility-based computation’’

model. To illustrate why backup tunnels are signaled with 0 bandwidth

(although their path is computed to provide bandwidth guarantees), let us consider

Figure 5.59.

Figure 5.57 Computation of NHOP backup tunnels with bandwidth protection with the facility backup model.

Figure 5.58 Protection of an SRLG with the facility backup model.

Figure 5.59 Signaling backup tunnels with 0 bandwidth with the facility backup model.



In Figure 5.59, a set of backup tunnels have been computed:

. B1 (30 Mbps) and B2 (50 Mbps) protect fast-reroutable TE LSPs traversing

the path R8-R1-R2 against a node failure of R1.

. B3 (50 Mbps) protects fast-reroutable TE LSPs traversing the path R0-R1-

R2 against a node failure of R1.

. B4 (20 Mbps), following the path R5-R3-R4-R7, protects fast-reroutable TE LSPs traversing the path R5-R6-R7 against a node failure of R6.

For the sake of simplicity, just a few backup tunnels are shown in Figure 5.59

(e.g., there are other backup tunnels: B5, from R9 to protect the fast-reroutable TE

LSPs traversing the path R9-R1-R2 against a failure of the node R1, to mention

one of them).

B1, B2, and B3 cannot share the bandwidth because they protect different TE

LSPs from the failure of the same resource (node R1 in this case). On the other

hand, under the single failure assumption, they can share bandwidth with B4

because B4 protects from the failure of a different resource (node R6). The paths

of B1, B2, B3, and B4 have been computed to ensure bandwidth protection.

Now, as far as the signaling is concerned, there are actually two options:

. Option 1: Signal backup tunnels with their bandwidth

. Option 2: Signal backup tunnels with 0 bandwidth

Let us now see each option and the respective pros and cons: First, it is worth

reiterating here that a backup tunnel LSP is just a regular TE LSP.75 In other

words, when a backup tunnel is signaled, any LSR along its path performs the same

operation as with any other TE LSP, in particular the CAC checking against the

available bandwidth on the link for the priority signaled in the RSVP Path message.

Option 1: Backup tunnels are signaled with their respective bandwidth.

Although this seems to be the most natural approach, it would require some CAC modifications on midpoint LSRs to allow for bandwidth sharing. Going back to our previous example depicted in Figure 5.59, R3 would need to figure out that B4 protects a different resource than B1 and B3, so that they can share bandwidth. Its CAC function would need to count 80 Mbps (max(B1 + B3, B4)) instead of B1 + B3 + B4 = 100 Mbps. This would require not only some modification of the CAC module on the midpoint LSRs but also RSVP signaling extensions, so that a backup TE LSP can be identified as a backup tunnel, along with the resource it protects.
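To make Option 1 concrete, here is a sketch of the bookkeeping a modified midpoint CAC would need. It assumes the hypothetical signaling extensions are in place, so that each admission request carries the resource the backup tunnel protects, and it uses an illustrative 80 Mbps backup budget at R3. The class name and interface are invented for this example.

```python
from collections import defaultdict

class SharedBackupCac:
    """Hypothetical midpoint CAC for Option 1, assuming RSVP extensions that
    tell the midpoint which resource each backup tunnel protects."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.per_resource = defaultdict(int)  # protected resource -> Mbps

    def booked(self):
        # Single failure assumption: only one resource fails at a time,
        # so the link must carry the largest per-resource sum.
        return max(self.per_resource.values(), default=0)

    def admit(self, protected_resource, bw):
        if self.per_resource[protected_resource] + bw > self.capacity:
            return False
        self.per_resource[protected_resource] += bw
        return True

# CAC at R3 in Figure 5.59, with a hypothetical 80 Mbps backup budget.
cac = SharedBackupCac(capacity_mbps=80)
print(cac.admit("R1", 30))  # B1 -> True
print(cac.admit("R1", 50))  # B3 -> True
print(cac.admit("R6", 20))  # B4 -> True (a plain CAC would need 100 Mbps)
print(cac.booked())         # 80
```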

Option 2: This approach consists of signaling backup tunnels with 0 bandwidth, which avoids having to implement any RSVP signaling extensions or CAC modifications. Of course, the fact that a backup tunnel is signaled with 0 bandwidth is completely independent of the bandwidth this TE LSP actually gets. Remember that the backup tunnel path has been computed to ensure that the backup tunnel will get the required bandwidth in case of a failure. So by virtue of the backup tunnel computation, backup tunnels will have the required bandwidth along their respective paths, even though they are signaled with 0 bandwidth.

75This applies to the facility backup method of MPLS TE Fast Reroute.

The distributed backup tunnel path computation model presented here is just

one model among others. At the time of publication, other distributed models were under investigation that could allow for some degree of bandwidth sharing with limited routing and signaling extensions.

Backup tunnel selection: At this point, a set of backup tunnels has been computed to provide fast recovery and bandwidth protection. The next interesting question is: How are these backup tunnels selected when primary TE LSPs requesting Fast Reroute and bandwidth protection are signaled?

When a new TE LSP explicitly requiring local protection and bandwidth

protection is signaled, a backup tunnel satisfying the request must be selected

by each PLR along the path. In the example of Figure 5.59, the paths of the backup

tunnels B1 and B2 have been computed to provide 30 Mbps and 50 Mbps

of bandwidth, respectively. Each time a fast reroutable TE LSP traversing the

node R8 and requesting bandwidth protection is signaled, the PLR R8 has to

select a backup tunnel (B1 or B2) satisfying the bandwidth constraint. So the

PLR has to keep track of the total amount of available bandwidth per backup

tunnel, which is equal to the backup tunnel bandwidth minus the sum of the

bandwidths of all the protected LSPs that have selected the backup tunnel.

A detailed example follows.

The algorithm in charge of the backup tunnel selection, called the packing

algorithm, is usually implementation specific.

At this point, it is worth mentioning a few issues that the packing algorithm

must resolve:

1. Constraints prioritization: When a protected TE LSP that explicitly requires bandwidth protection is signaled, its set of requirements must be taken into account in the backup tunnel selection: the amount of requested bandwidth, link versus node protection, and others. Furthermore, if several backup tunnels exist at the PLR, they may have different properties, such as NHOP versus NNHOP, different available bandwidth, or different path lengths. The backup selection algorithm must consider the set of required constraints (potentially with some hierarchy between them) and the backup tunnel properties to perform an appropriate selection. For instance, suppose there are two backup tunnels: B1 is an NNHOP backup tunnel without enough bandwidth to satisfy the bandwidth requirement, and B2 is an NHOP backup tunnel with enough bandwidth. If the protected TE LSP requires both bandwidth and node protection, a choice must be made about which constraint will be satisfied first.

2. Bandwidth fragmentation: Various algorithms can be designed for the backup tunnel selection. Here is just a subset of possible implementations, for the sake of illustration:

. A1: always select the backup tunnel with the smallest available bandwidth that meets the bandwidth protection requirement

. A2: ‘‘load balance’’

. A3: always select the backup tunnel with the highest available bandwidth that meets the bandwidth protection requirement

Let us illustrate the challenge of the packing algorithm (which is also known as the ‘‘knapsack’’ problem) through an example. Consider a link R0-R1 with a protected bandwidth of 25 Mbps (i.e., R0 requires a set of backup tunnels such that the sum of their bandwidth is equal to 25 Mbps). Because no single backup tunnel with a 25 Mbps capacity can be found, the backup tunnel path computation algorithm has calculated two backup tunnels B1 and B2, with capacities of 10 Mbps and 15 Mbps, respectively (B1 and B2 follow different paths). We denote by [X,Y] the remaining backup capacity (RBC) on the backup tunnels B1 and B2, respectively. In the example, we illustrate the outputs of each packing algorithm described above for a specific sequence of events. Note that we assume in this example that all the signaled TE LSPs request Fast Reroute and bandwidth protection.

Time t0: A first TE LSP (LSP 1) is signaled with a bandwidth requirement of 4 Mbps.

With A1: RBC = [6,15]

With A3: RBC = [10,11]

Time t1: TE LSP 2 is signaled with a bandwidth requirement of 4 Mbps.

With A1: RBC = [2,15]

With A3: RBC = [10,7]

Time t2: TE LSP 3 is signaled with a bandwidth requirement of 12 Mbps.

With A1: RBC = [2,3]

With A2: FAILS, no backup tunnel can be selected

With A3: FAILS, no backup tunnel can be selected
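The sequence of events above can be replayed with a short simulation of strategies A1 and A3. A2 is left out because the text only labels it ‘‘load balance’’ without defining how it breaks ties, so any concrete version would be a guess. The function names are illustrative.

```python
def select_tunnel(rbc, bw, strategy):
    """Index of the backup tunnel chosen for a request of `bw` Mbps, or None."""
    fits = [i for i, left in enumerate(rbc) if left >= bw]
    if not fits:
        return None
    if strategy == "A1":  # tightest fit: smallest remaining capacity that fits
        return min(fits, key=lambda j: rbc[j])
    if strategy == "A3":  # loosest fit: largest remaining capacity
        return max(fits, key=lambda j: rbc[j])
    raise ValueError(f"unsupported strategy: {strategy}")

def run(strategy, capacities, requests):
    """Replay a sequence of bandwidth protection requests; return RBC history."""
    rbc = list(capacities)
    history = []
    for bw in requests:
        i = select_tunnel(rbc, bw, strategy)
        if i is None:
            history.append("FAILS")
        else:
            rbc[i] -= bw
            history.append(list(rbc))
    return history

# B1 = 10 Mbps, B2 = 15 Mbps; LSP 1, LSP 2, LSP 3 request 4, 4 and 12 Mbps.
print(run("A1", [10, 15], [4, 4, 12]))  # [[6, 15], [2, 15], [2, 3]]
print(run("A3", [10, 15], [4, 4, 12]))  # [[10, 11], [10, 7], 'FAILS']
```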

This simple example shows that A1 is the best of the three strategies for avoiding bandwidth fragmentation in this case; unfortunately, as shown below, it does not remove the need for a bandwidth defragmentation strategy as new TE LSPs are signaled and torn down. An interesting analogy is the defragmentation of a hard disk, with one noticeable difference: In the case of a file that must be stored on a hard disk, if no single contiguous set of blocks with the requested file size can be found, the operating system will store the file in multiple noncontiguous blocks whose addresses are stored in a file allocation table (FAT). By contrast, when the PLR has to select a backup tunnel, it must find a single backup tunnel with enough bandwidth to satisfy the requirement of the newly set up protected TE LSP. Using multiple backup tunnels to reroute a single protected TE LSP is not desirable: if multiple backup tunnels with significantly different propagation delays are used to reroute the same TE LSP, packets may be delivered out of order.

Let us now continue the example with the assumption that the algorithm A1 is

chosen.

Time t3: A new TE LSP (LSP 4) is signaled with a protection bandwidth requirement of 3 Mbps. RBC = [2,0].

Time t4: LSP 1 is torn down. RBC = [6,0].

Time t5: LSP 3 is torn down. RBC = [6,12].

Time t6: A new TE LSP (LSP 5) is signaled with a protection bandwidth requirement of 14 Mbps. The backup tunnel selection algorithm fails, as no backup tunnel with 14 Mbps of available bandwidth can be found. This shows that a bandwidth defragmentation procedure must be triggered at this point to satisfy the new request. The backup assignment must become:

B1: LSP 2, LSP 4

B2: no TE LSP

So RBC = [3,15]. This allows accommodating the request of LSP 5.

Note that the backup bandwidth defragmentation procedure may be triggered

by a timer or a backup tunnel selection failure event. In the former case, the PLR

performs a defragmentation when a timer expires, whereas in the latter case, the

defragmentation is triggered when a new bandwidth protection request cannot be

satisfied. Both approaches can also be combined.
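The text does not specify how the defragmentation itself is performed. One plausible heuristic, shown here purely as an assumption, is to repack all currently protected LSPs with best-fit decreasing (largest LSP first, into the tightest tunnel that can hold it). Applied to the state at t6, it reproduces the assignment given above.

```python
def defragment(capacities, lsps):
    """Repack protected LSPs onto backup tunnels (best-fit decreasing).

    capacities: backup tunnel bandwidths in Mbps, e.g. [B1, B2].
    lsps: {LSP name: protected bandwidth in Mbps}.
    Returns (LSP names per tunnel, remaining backup capacity), or None
    if this heuristic cannot place every LSP.
    """
    rbc = list(capacities)
    assignment = [[] for _ in capacities]
    # Largest LSPs first, each into the tightest tunnel that still fits,
    # which tends to leave one large gap for future requests.
    for name, bw in sorted(lsps.items(), key=lambda kv: kv[1], reverse=True):
        fits = [i for i, left in enumerate(rbc) if left >= bw]
        if not fits:
            return None
        best = min(fits, key=lambda j: rbc[j])
        rbc[best] -= bw
        assignment[best].append(name)
    return assignment, rbc

# At t6, LSP 2 (4 Mbps) and LSP 4 (3 Mbps) are still protected.
assignment, rbc = defragment([10, 15], {"LSP2": 4, "LSP4": 3})
print(assignment, rbc)  # [['LSP2', 'LSP4'], []] [3, 15]
print(max(rbc) >= 14)   # True: LSP 5 now fits on B2
```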

Path computation client (PCC) to PCE signaling protocol

As already mentioned, the PCE (in this case the entity in charge of the backup

tunnel path computation) can be either a central PCE (in the centralized model) that performs the backup tunnel computation for all the protected resources in the

network or an LSR (in the distributed model). In both cases, a signaling protocol is

required such that an LSR (a Path Computation Client, or PCC) can request the

computation of a set of backup tunnels to protect its TE LSPs traversing a

particular resource in the case of failure of this resource and the PCE can provide

the set of computed backup tunnels.

Such a signaling protocol has been proposed in [PATH-COMP] and defines

some RSVP TE extensions to address this requirement. [PATH-COMP] defines a

new RSVP message type called ‘‘Path computation’’ message (a specific flag defines

whether the Path computation message is a request or a reply). Then multiple

optional objects have been specified for various purposes. Indeed, the scope of

this protocol is quite wide, and its applicability is not restricted to offloading backup tunnel path computation. The interesting fact is that the RSVP Path

computation messages will reuse all the RSVP objects carried in the RSVP

Path message to signal a TE LSP. Those objects specify the TE LSP attributes.

For instance, the RSVP Path computation message will carry the SESSION, SESSION_ATTRIBUTE, and SENDER_TEMPLATE objects, to mention a few of the objects that define the set of attributes of the TE LSP. Just a few additional objects are added to characterize the request. In the particular context of Fast Reroute, a specific object has been defined in [FACILITY-BACKUP] that allows specifying the resource to protect, the destination of the set of backup tunnels, optional resource classes, and the maximum number of backup tunnels in the set, along with the minimum required bandwidth for each of them. Note that at the time of publication, such a signaling protocol has not yet been standardized; hence the reference [PATH-COMP] is given for the sake of illustration only.

For instance, let us consider the computation of the set of NNHOP backup

tunnels between two LSRs R0 and R2. Suppose that the protected bandwidth is

50 Mbps and that no single NNHOP backup tunnel of 50 Mbps can be found.

Although 5000 backup TE LSPs of 10 kbps each would give a total of 50 Mbps, this is certainly not a desirable scenario. Indeed, a protected TE LSP can use only a single backup tunnel at a time; furthermore, the number of backup tunnels would be considerably high. Another scenario would be to get three backup tunnels with the following bandwidths: 49 Mbps, 500 kbps, and 500 kbps. In some networks, the last two backup tunnels would be too small to protect any LSP requesting bandwidth protection. This explains why it is desirable to be able to specify constraints on both the number of backup tunnels and their minimum bandwidth in the RSVP Path computation request.
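The constraints in such a request can be checked mechanically. The sketch below tests a candidate set of backup tunnel bandwidths against a maximum number of tunnels and a per-tunnel minimum; the function name and the limits (at most 4 tunnels of at least 5 Mbps each) are hypothetical and are not fields defined by [PATH-COMP] or [FACILITY-BACKUP].

```python
def acceptable_split(tunnel_bws, protected_bw, max_tunnels, min_bw):
    """Check a candidate set of backup tunnel bandwidths (Mbps) against a
    maximum tunnel count and a per-tunnel minimum bandwidth."""
    return (
        len(tunnel_bws) <= max_tunnels
        and all(bw >= min_bw for bw in tunnel_bws)
        and sum(tunnel_bws) >= protected_bw
    )

# Hypothetical request: 50 Mbps to protect, at most 4 tunnels, each >= 5 Mbps.
print(acceptable_split([0.01] * 5000, 50, max_tunnels=4, min_bw=5))  # False
print(acceptable_split([49, 0.5, 0.5], 50, max_tunnels=4, min_bw=5))  # False
print(acceptable_split([30, 20], 50, max_tunnels=4, min_bw=5))        # True
```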

5.15.7 Backup Tunnel Path Computation with MPLS TE Fast Reroute One-to-One Backup

As described previously, Fast Reroute one-to-one backup requires one backup tunnel (Detour LSP) per protected TE LSP at each hop acting as a PLR. In other words, each LSR along the path of a protected TE LSP using one-to-one backup has to compute a Detour LSP path originating at this node and ending at the egress LSR of the protected TE LSP (destination).

This requires collecting a set of information:

1. The list of LSRs traversed by the protected TE LSP: This information is available in the RSVP RRO object carried in the Resv message traveling in the upstream direction (from the tail-end LSR to the head-end LSR). One may think that the ERO object always contains the complete list of downstream nodes, but this is not always the case, for instance, if the ERO object contains some loose hop(s). A loose hop address is an address that is not necessarily directly connected to the previous hop. This is illustrated in Figure 5.60.

The example depicted in Figure 5.60 shows a situation where a TE LSP path is specified as a mix of strict and loose hops. In that example, a TE LSP T1 is set up from the LSR R2 to R6 with the following ERO object: R3(strict)-R5(loose)-R6(strict). Loose hops are typically used when the head-end LSR cannot compute the whole path of a TE LSP because it lacks topology and resource information, for example, when the TE LSP spans multiple routing areas. In this case, one solution is to specify a list of strict hops in the head-end area followed by a list of loose hops (the area border routers [ABRs]). Each ABR is then responsible for a partial route computation, up to the next loose hop (in general, another ABR connected to the


next-hop routing area). Back to our example of Figure 5.60: the PLR R2 does not compute the path between the LSRs R3 and R5 (R5 is specified as a loose hop); that path is computed by R3. Thus the PLR (R2 in this example) may not learn the full list of hops traversed by the protected TE LSP by observing the ERO object; in this case, the information is obtained from the RRO object carried in the RSVP Resv message. More details on the signaling can be found in Section 5.14.

2. The list of downstream links and nodes that the PLR wants to protect. This

information is also available in the RRO object carried in the RSVP Resv

message forwarded in the upstream direction.

3. In addition, the PLR must learn the list of upstream links that the protected TE LSP traverses. Likewise, this information is available in the RRO object. The Detour LSP and the protected TE LSP should not share a common next hop upstream of the failure:

• With the path-specific method, the Detour LSP must not pass through the same links as the protected TE LSP, to avoid an early LSP merging.

• With the sender template-specific method, the reason is that the Detour LSP and the protected TE LSP would share the bandwidth, even though in case of failure they would be simultaneously active, resulting in a bandwidth violation.

4. The required protected bandwidth (i.e., the amount of bandwidth required for the protected TE LSP, which will be the bandwidth of the Detour LSP). The head-end LSR of a protected TE LSP can request a backup tunnel with an equivalent bandwidth or with a percentage of the primary TE LSP bandwidth.76

[Figure 5.60: TE LSP T1 from R2 to R6 along R2-R3-R4-R5-R6. ERO sent by R2: R3 (strict)-R5 (loose)-R6 (strict). ERO received by R3: R5 (loose)-R6 (strict), which R3 expands into the computed ERO R4 (strict)-R5 (strict)-R6 (strict). ERO received by R4: R5 (strict)-R6 (strict); ERO received by R5: R6 (strict). Inset: IPv4 prefix ERO subobject format (L bit, Type, Length, IPv4 address (4 bytes), Prefix Length, Reserved); the L bit is set if the subobject represents a loose hop in the ERO object.]

Figure 5.60 Illustration of an ERO object specified as a mix of strict and loose hops, which shows why the RRO object must be used in some cases to learn the TE LSP path on downstream nodes.

5. The maximum number of hops the backup tunnel path can have between the PLR and the merge point (MP), if a FAST-REROUTE object is present. A value of 0 indicates that the backup tunnel protects against link failure only.

6. Finally, some link attribute filters that may be applied to the backup tunnel

path if, for instance, some links should be avoided (e.g., long propagation

delay links).

Once the PLR has gathered all that information, it tries to find the shortest path for the backup tunnel that satisfies the constraints mentioned earlier, by running a CSPF on the remaining topology once the protected section has been pruned. Note that in the context of Fast Reroute one-to-one backup, the destination address of the backup tunnel is the egress LSR (the destination of the protected TE LSP).
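The computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an LSR implementation: the topology, link costs, bandwidths, and the "long-delay" attribute are hypothetical; `max_hops` is assumed here to bound the whole detour path (a simplification of item 5); and instead of a plain Dijkstra-based CSPF, the sketch uses a hop-bounded Bellman-Ford variant so that the hop limit is enforced exactly.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical TE link record: (IGP cost, available bandwidth in Mbps, attribute flags).
LinkAttrs = Tuple[int, int, FrozenSet[str]]


def link(a: str, b: str) -> FrozenSet[str]:
    """Undirected link identifier."""
    return frozenset((a, b))


def detour_cspf(
    links: Dict[FrozenSet[str], LinkAttrs],
    plr: str,
    egress: str,
    required_bw: int,
    pruned_nodes: Set[str],
    pruned_links: Set[FrozenSet[str]],
    avoid_attrs: Set[str],
    max_hops: int,
) -> Optional[List[str]]:
    """Cheapest Detour LSP path from the PLR to the egress LSR, or None."""
    # Prune the protected section and the upstream links, then apply the
    # bandwidth and link-attribute constraints.
    edges: List[Tuple[str, str, int]] = []
    for lk, (cost, bw, attrs) in links.items():
        if lk in pruned_links or lk & pruned_nodes:
            continue
        if bw < required_bw or attrs & avoid_attrs:
            continue
        a, b = tuple(lk)
        edges += [(a, b, cost), (b, a, cost)]  # both directions usable

    # Hop-bounded Bellman-Ford: after k rounds, best[v] holds the cheapest
    # path from the PLR to v that uses at most k hops.
    best: Dict[str, Tuple[float, List[str]]] = {plr: (0, [plr])}
    for _ in range(max_hops):
        nxt = dict(best)
        for u, v, cost in edges:
            if u in best and v not in best[u][1]:
                cand = best[u][0] + cost
                if cand < nxt.get(v, (math.inf, []))[0]:
                    nxt[v] = (cand, best[u][1] + [v])
        best = nxt
    return best[egress][1] if egress in best else None


# Protected TE LSP R1-R2-R3-R4; R2 is the PLR protecting node R3.
topology = {
    link("R1", "R2"): (1, 100, frozenset()),
    link("R2", "R3"): (1, 100, frozenset()),
    link("R3", "R4"): (1, 100, frozenset()),
    link("R2", "R5"): (1, 100, frozenset()),
    link("R5", "R4"): (1, 20, frozenset()),  # too little bandwidth
    link("R2", "R6"): (1, 100, frozenset()),
    link("R6", "R4"): (1, 100, frozenset({"long-delay"})),
    link("R6", "R7"): (1, 100, frozenset()),
    link("R7", "R4"): (1, 100, frozenset()),
}
path = detour_cspf(
    topology, plr="R2", egress="R4", required_bw=50,
    pruned_nodes={"R3"}, pruned_links={link("R1", "R2"), link("R2", "R3")},
    avoid_attrs={"long-delay"}, max_hops=3,
)
print(path)  # ['R2', 'R6', 'R7', 'R4']
```

With `max_hops=2` the same request returns `None`, which is the case where the PLR must retry later.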

Note that the PLR may or may not succeed in finding a path for the backup tunnel that satisfies the set of requirements. If it does not, the PLR can start a timer and retry when the timer expires. Furthermore, a PLR can trigger a backup tunnel path reoptimization at regular intervals to determine whether a better path (i.e., a shorter one) exists.
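The retry timer and the periodic reoptimization can be sketched as two small functions. This is a hypothetical illustration: `compute` stands for any backup path computation (for instance a CSPF run), `sleep` is injected so the example runs instantly (a real process might pass `time.sleep`), and the costs and paths are invented.

```python
from typing import Callable, List, Optional, Tuple

# A computed backup path with its total cost; None means "no path found".
PathResult = Optional[Tuple[int, List[str]]]


def compute_with_retry(
    compute: Callable[[], PathResult],
    retry_interval_s: float,
    max_attempts: int,
    sleep: Callable[[float], None],
) -> PathResult:
    """Retry the backup tunnel path computation each time a fixed timer expires."""
    for attempt in range(max_attempts):
        result = compute()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(retry_interval_s)  # wait for the retry timer to expire
    return None


def reoptimize(current: Tuple[int, List[str]], compute: Callable[[], PathResult]) -> Tuple[int, List[str]]:
    """Periodic reoptimization: move to a new path only if it is strictly shorter."""
    candidate = compute()
    if candidate is not None and candidate[0] < current[0]:
        return candidate
    return current


# Simulated computations: the first two fail, the third finds a path.
attempts = iter([None, None, (3, ["R2", "R6", "R7", "R4"])])
waits: List[float] = []
found = compute_with_retry(lambda: next(attempts), 30.0, 5, waits.append)
print(found, waits)  # (3, ['R2', 'R6', 'R7', 'R4']) [30.0, 30.0]

# Later, a reoptimization run discovers a shorter path and the PLR switches to it.
better = reoptimize(found, lambda: (2, ["R2", "R6", "R4"]))
print(better)  # (2, ['R2', 'R6', 'R4'])
```

Requiring a strictly lower cost avoids needlessly resignaling a backup tunnel when the new path is merely as good as the current one.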

5.15.8 Summary

In Sections 5.14 and 5.15, we saw the signaling aspects of MPLS TE local protection, which include several RSVP-TE extensions, as well as the Fast Reroute mode of operation as far as signaling is concerned. Another key aspect covered in detail is the backup path computation for the MPLS TE protection techniques (global and local). Although many algorithms and modes have been proposed for the computation of the backup path, only a few of them have been presented here, which can be implemented either on a central server (usually referred to as the off-line approach) or on the LSRs (distributed computation). The choice between centralized and distributed backup tunnel path computation is, as usual, driven by the tradeoff between optimality (centralized computation) and flexibility and reactiveness (distributed computation). That said, the relative degree of optimality of both approaches really depends on the algorithms in use and on the network topology. This area is constantly evolving, and new algorithms will undoubtedly be designed that take new constraints into account with ever-increasing efficiency; however, it has been shown that the level of complexity grows nonlinearly with the number of required constraints.

76This information can be derived either from the SESSION-ATTRIBUTE object ("Bandwidth protection desired" bit) or from the FAST-REROUTE object, if present. In the former case, if the bandwidth protection desired bit is set, the requirement is for full protection; in other words, the bandwidth requirement for the backup tunnel is identical to the protected TE LSP bandwidth. In the latter case, the bandwidth field of the FAST-REROUTE object specifies the bandwidth requirement for the backup tunnel, which may be a fraction of the protected TE LSP bandwidth. Again, see Section 5.14 for a detailed description of the signaling aspects.



5.16 Research-Related Topics

MPLS TE recovery techniques have considerably evolved during the last few years and are undoubtedly a mature technology, which does not mean that they have stopped evolving! Consequently, there are several active research topics that we can mention:

1. Multiple failures: Throughout this chapter, we usually assumed single failures, which, for instance, made it possible to share bandwidth between backup tunnels protecting against the failure of independent resources. Bear in mind that an SRLG failure, which effectively results in the failure of multiple links, is considered a single failure. There are several ongoing investigations on the topic of multiple failures, both to study whether multiple failures occur in existing networks and to propose various backup tunnel path computation models in this context. For instance, one could extend the notion of SRLG to include some probability of multiple failures of a set of elements. Such information could then be taken into account to compute the backup tunnel paths, or the paths of multiple backup tunnels, each one protecting against a set of multiple failures.

2. Fast Reroute extensions for point-to-multipoint LSPs: There are several proposals to extend the concept of MPLS TE LSP to point-to-multipoint LSPs, where packet replication would be performed in the core. Fast Reroute might require extensions to protect such point-to-multipoint LSPs against network element failures.

3. Centralized and distributed backup path computation algorithms: This has obviously been a constant topic of research, and new centralized and distributed algorithms are regularly proposed to improve efficiency with respect to various objective criteria.



C H A P T E R 6

Multilayer Networks

In the previous chapters, we discussed recovery from the viewpoint of a single network technology (such as Internet Protocol [IP], Synchronous Digital Hierarchy [SDH], or Optical Transport Network [OTN] networks). This chapter presents recovery mechanisms and strategies for multilayer networks. It has three main parts. The first part (Section 6.1) highlights the current evolution from static to intelligent optical networks (IONs) based on a distributed control plane (CP). This includes the Automatic Switched Optical Network (ASON) framework, the protocols (mainly based on Generalized Multi-Protocol Label Switching [G-MPLS]) currently pushed forward to implement such a distributed CP, and different CP architectures. This information is needed later in the chapter, when dynamic survivability mechanisms in multilayer networks, which require a distributed CP, are discussed. The second part of this chapter (Section 6.2) discusses the generic issues of multilayer recovery strategies. It describes three general categories for providing recovery in multilayer networks: single-layer recovery schemes in multilayer networks, where the important issue is in which layer of the network to provide recovery; static multilayer recovery schemes, where recovery schemes can be provided at several network layers and the important issue is how to make them interwork; and dynamic multilayer recovery strategies, which use dynamic logical topologies for survivability purposes. The last part of this chapter (Section 6.3) gives some concrete examples and case studies of recovery in multilayer networks: optical restoration and MPLS Traffic Engineering (TE) Fast Reroute (FRR); SONET-SDH protection and IP routing; and MPLS TE Fast Reroute and IP routing.



6.1 ASON/G-MPLS Networks

In this section, we highlight the current evolution from static to flexible and intelligent optical networks based on a distributed CP. Section 6.1.1 describes the ASON framework (a kind of meta-model) under standardization by the International Telecommunication Union (ITU), and Section 6.1.2 discusses the main candidate protocols (mainly a G-MPLS-based solution steered by the Internet Engineering Task Force [IETF]) to implement such a distributed CP. Finally, Section 6.1.3 illustrates different CP architectures for multilayer networks, assuming different levels of integration between the CPs of the different network layers.

6.1.1 The ASON/ASTN Framework

In Chapters 2 and 3, SDH and OTN technologies were considered inflexible; they provide fixed transmission links between the client network equipment. However, the traffic offered to the client networks is becoming more and more dynamic: not only does the traffic pattern change continuously over time, but the locations between which traffic is routed also change continuously (traffic churn). Therefore, being able to reshuffle the transmission capacity in a client network becomes more critical. This requires that the transport network or optical transport network allow customers to set up and tear down connections on demand and in an automated way. Such flexibility of course requires pushing more intelligence into the optical transport network, leading to the concept of intelligent optical networks (IONs).

In [DeM04], the advantages of IONs, the drivers for them, and the opportunities they bring have been studied. Storage area networks (SANs), for example, are just one of the applications requiring bandwidth on demand that drive the development of IONs. IONs can also be beneficial from a network perspective. For example, as mentioned in Chapters 2 and 3, in terms of network recovery it becomes possible to set up and tear down connections on demand at the time of a failure, thereby enabling restoration instead of protection in transport networks or optical transport networks. Although restoration can outperform protection in terms of capacity efficiency, it cannot achieve recovery completion times as short as those of protection. As Section 6.2.4 shows, IONs can not only provide restoration in an optical transport network but also provide on-demand spare capacity for recovery in a client network. The ability to set up and tear down connections on demand lets one allocate capacity in the network only as long as it is needed; consequently, for a highly dynamic traffic pattern, this can result in a significant reduction in network capacity and thus in CAPital EXpenditure (CAPEX). In addition to this CAPEX reduction, automating the setup and teardown of connections also implies an important reduction in OPeration EXpenditure (OPEX).
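The capacity argument can be made concrete with a toy calculation. This sketch assumes a single link shared by three client connections with hypothetical hourly demands (in wavelengths) that peak at different times, and it ignores setup delay and blocking: static provisioning must size each connection for its own peak, whereas on-demand provisioning only needs the peak of the aggregate demand.

```python
from typing import Dict, List

# Hypothetical hourly demand (in wavelengths) of three client connections
# sharing one link; their peaks occur at different times.
demands: Dict[str, List[int]] = {
    "backup-window": [8, 8, 1, 1, 1, 1],
    "office-hours": [1, 1, 6, 6, 1, 1],
    "evening-video": [1, 1, 1, 1, 7, 7],
}

# Static provisioning: each connection is sized for its own peak.
static_capacity = sum(max(profile) for profile in demands.values())

# On-demand provisioning: capacity follows the aggregate demand over time,
# so the link only needs the peak of the sum.
dynamic_capacity = max(sum(hour) for hour in zip(*demands.values()))

print(static_capacity, dynamic_capacity)  # 21 10
```

The gap between the sum of the peaks and the peak of the sum is what on-demand connections can save; it shrinks when all demands peak at the same time.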

Major contribution to Section 6.1 is credited to Didier Colle, INTEC, Ghent University



Within the ITU-T, a framework for Automatic Switched Optical Networks (ASONs) has been defined. More precisely, ITU-T G.807 specifies the requirements for Automatic Switched Transport Networks (ASTNs) [G807], whereas ITU-T G.8080 specifies the architecture of an ASON [G8080]. Note that the ASON/ASTN framework is applicable to any transport network technology and thus is not restricted to optical transport networks. Internet drafts [Ala03] and [Pap03] specify the requirements for the Generalized Multi-Protocol Label Switching (G-MPLS) routing and signaling protocols, respectively, to support this ASON framework (as specified in the previously mentioned ITU-T recommendations).

Figure 6.1 illustrates the ASON framework. The important point is that in this framework a distributed control plane has been added to the management and transport planes already present in classic transport networks. This control plane (CP) consists of a set of optical connection controllers (OCCs) that are connected to, and control, the switches in the transport plane (TP). The General Switch Management Protocol (GSMP) is one example of a protocol that can implement this connection control interface (CCI). The OCCs are connected with each other via network-to-network interfaces (NNIs), whereas the switches in the TP are connected via physical interfaces (PIs). To route and to set up or tear down the optical channels (OChs), the OCCs run a routing protocol and a signaling protocol, respectively, over these NNIs.

[Figure 6.1: control plane of optical connection controllers interconnected by network-network interfaces; management plane with network management system, network element management agents, MIBs, and telecommunication management network, attached to the control plane via NMI-A and to the transport plane via NMI-T; transport plane of switch fabrics interconnected by physical interfaces carrying optical channels, controlled through the connection control interface; customer premise equipment whose request agent attaches via the user-network interface.]

Figure 6.1 The Automatic Switched Optical Network framework.

More precisely, an

interior-NNI (I-NNI) is an NNI between two OCCs inside the same administrative domain (AD) that exchanges at least topology or routing information and service connection messages, and optionally network resources control information, whereas an exterior-NNI (E-NNI) connects OCCs residing in distinct ADs and supports the exchange of reachability or summarized network address information, authentication and connection admission control (CAC) information, and connection service messages. Although a protocol suite based on Generalized Multi-Protocol Label Switching (G-MPLS) [Ala03], [Pap03] is the option being pushed forward, it should also be possible to adopt, for example, a protocol suite based on the Private Network-to-Network Interface (PNNI). The client requests the setup of an OCh when its Request Agent (RA) (e.g., the user-to-network interface client [UNI-C] in the Optical Internetworking Forum [OIF] UNI 1.0 specification) sends the appropriate message over the user-to-network interface (UNI) to an OCC. The UNI should at least support the exchange of naming and addressing information, authentication and connection admission control (CAC) information, and connection service messages. It is important to note that no internal routing or topology information is disclosed to clients or to other administrative domains. Section 6.1.2 gives more details on these protocols.
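The information classes listed above for each interface can be summarized as a filter that an OCC might apply before sending anything. This is a hypothetical sketch: the enum names and the `may_send` helper are illustrative only, and the allowed sets simply restate the lists in the text, which is why topology and routing information can cross only an I-NNI.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Info(Enum):
    TOPOLOGY_ROUTING = "topology/routing"
    RESOURCE_CONTROL = "network resources control"
    REACHABILITY = "reachability/summarized addresses"
    NAMING_ADDRESSING = "naming/addressing"
    AUTH_CAC = "authentication and CAC"
    CONNECTION_SERVICE = "connection service messages"


# Information classes exchanged over each interface, as listed in the text.
ALLOWED: Dict[str, FrozenSet[Info]] = {
    "I-NNI": frozenset({Info.TOPOLOGY_ROUTING, Info.RESOURCE_CONTROL, Info.CONNECTION_SERVICE}),
    "E-NNI": frozenset({Info.REACHABILITY, Info.AUTH_CAC, Info.CONNECTION_SERVICE}),
    "UNI": frozenset({Info.NAMING_ADDRESSING, Info.AUTH_CAC, Info.CONNECTION_SERVICE}),
}


def may_send(interface: str, info: Info) -> bool:
    """Return True if this class of information may cross the interface."""
    return info in ALLOWED[interface]


print(may_send("I-NNI", Info.TOPOLOGY_ROUTING))  # True
print(may_send("E-NNI", Info.TOPOLOGY_ROUTING))  # False
print(may_send("UNI", Info.TOPOLOGY_ROUTING))    # False
```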

Although the control plane may become more and more important, the management plane (MP) and the network management system (NMS) will not disappear completely. For example, most operators will remain interested in billing and accounting (typically an MP functionality). The MP is connected to the CP components via the network management interface for ASTN control plane components (NMI-A). Similarly, it is connected to the TP network elements via the network management interface for transport network elements (NMI-T).

One of the main goals of the CP of an ASTN is to provide a switched connection (SC) service. But an ASTN should also be able to provide a (hard) permanent connection ([H]PC) service. The NMS can choose to provision the connection itself via the NMI-T or to request the setup from the control plane via the NMI-A. In the latter case, a soft permanent connection (SPC) service is provided.
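The three connection services differ only in who requests the connection and over which interface, which can be captured in a small decision function. This is an illustrative sketch: the string labels and the `classify` function are hypothetical, not a standard API.

```python
from enum import Enum


class Service(Enum):
    SC = "switched connection"
    PC = "(hard) permanent connection"
    SPC = "soft permanent connection"


def classify(requester: str, interface: str) -> Service:
    """Map who asks for a connection, and over which interface, to the
    ASTN connection service (illustrative labels only)."""
    if requester == "client" and interface == "UNI":
        return Service.SC   # the client's request agent signals the control plane
    if requester == "NMS" and interface == "NMI-T":
        return Service.PC   # the NMS provisions the transport plane elements itself
    if requester == "NMS" and interface == "NMI-A":
        return Service.SPC  # the NMS delegates the setup to the control plane
    raise ValueError(f"unsupported combination: {requester} via {interface}")


for who, via in [("client", "UNI"), ("NMS", "NMI-T"), ("NMS", "NMI-A")]:
    print(who, via, "->", classify(who, via).value)
```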

The main function of the control plane is connection management and control. The CP should be able to control and manage (switched and soft permanent) connections that are either unidirectional or bidirectional point-to-point connections, or unidirectional point-to-multipoint connections. The CP should also be able to support multihoming, diversity, and other services (such as the establishment of closed user groups).

6.1.2 Protocols for Implementing a Distributed Control Plane

The previous section mentioned that G-MPLS is the protocol suite currently pushed forward to implement a distributed CP. As explained in Chapter 5, labels in regular MPLS are integers that, in most cases, are attached to an IP packet in an additional shim header. However, a wavelength channel (color), for example, can also be interpreted as a label. Therefore, Generalized MPLS (G-MPLS) allows a label to be represented as an integer, a time slot in a Time Division Multiplexing (TDM) frame, a wavelength or waveband on a fiber, a fiber in a cable, and so on. The idea is thus to reuse the protocol suite that standard MPLS uses to set up and tear down label switched paths (LSPs), in order to control switched instead of permanent connections through transport networks or optical transport networks. Figure 6.2 illustrates this principle of generalizing the MPLS paradigm.
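The generalization can be illustrated by a single cross-connect lookup that works for any label type. In this sketch, the packet entries are those of the label table in Figure 6.2, while the `Wavelength` and `TimeSlot` classes, their values, and the `switch` function are hypothetical, chosen only to show that the forwarding logic stays the same when the label becomes a wavelength or a time slot.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Wavelength:
    nm: float  # carrier wavelength in nanometers (illustrative encoding)


@dataclass(frozen=True)
class TimeSlot:
    slot: int  # position of the channel in a TDM frame


# A generalized label: a plain MPLS integer, a wavelength, or a TDM time slot.
Label = Union[int, Wavelength, TimeSlot]

# Cross-connect table: (incoming interface, incoming label) -> (outgoing interface, outgoing label).
Table = Dict[Tuple[str, Label], Tuple[str, Label]]

# Packet LSR entries from Figure 6.2.
lsr_table: Table = {("A", 2): ("D", 3), ("B", 5): ("C", 7), ("B", 9): ("D", 7)}

# Hypothetical optical and TDM entries: the "label" is the wavelength or time slot itself.
oxc_table: Table = {("A", Wavelength(1550.12)): ("D", Wavelength(1550.92))}
adm_table: Table = {("A", TimeSlot(4)): ("C", TimeSlot(9))}


def switch(table: Table, in_if: str, in_label: Label) -> Tuple[str, Label]:
    """Same lookup for every technology; only the label type differs."""
    return table[(in_if, in_label)]


print(switch(lsr_table, "B", 9))                     # ('D', 7)
print(switch(oxc_table, "A", Wavelength(1550.12)))   # ('D', Wavelength(nm=1550.92))
print(switch(adm_table, "A", TimeSlot(4)))           # ('C', TimeSlot(slot=9))
```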

As explained in Chapter 5, the Resource reSerVation Protocol (RSVP) [RFC2205] with Traffic Engineering extensions (RSVP-TE) has been adopted as the signaling protocol to set up and tear down MPLS traffic engineering LSPs; the required extensions for TE LSPs are specified in [RFC3209]. [RFC3471] specifies the signaling protocol extensions required specifically for supporting G-MPLS, and [RFC3473] maps these requirements to the necessary RSVP-TE protocol extensions. The generalized label request and generalized label objects are critical extensions to the RSVP-TE protocol to enable G-MPLS support. The generalized label request consists of an LSP encoding type (e.g., packet versus digital wrapper versus lambda LSPs), a switching type (e.g., switch the LSP as a TDM circuit or as a complete wavelength channel), and a generalized payload identifier (G-PID) (e.g., packet-over-SONET [PoS]). The encoding of the generalized label object depends on the link on which the label is used. In addition to the generalized label request and generalized label objects, the signaling protocol extensions provide support for suggesting a label to be chosen by a downstream node, for restricting the label set from which a downstream node can choose, and for setting up bidirectional LSPs.

[Figure 6.2: left, an MPLS LSR with interfaces A to D forwards labeled IP packets (MPLS label, IP header, IP payload) according to the table IN IF/IN LABEL/OUT IF/OUT LABEL: A/2/D/3, B/5/C/7, B/9/D/7; right, an optical cross-connect with ports A to D switches an incoming wavelength λIN to an outgoing wavelength λOUT.]

Figure 6.2 The Generalized Multi-Protocol Label Switching concept.
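The interplay of a suggested label and a label set can be sketched as a downstream selection rule. This is a simplified, hypothetical policy: labels are modeled as wavelength channel numbers, the downstream node only picks among its free labels that also belong to the upstream label set (as an upstream node without wavelength conversion would require), prefers the suggested label when usable, and otherwise falls back to the lowest usable label, a tie-breaking choice assumed here rather than mandated by the specifications.

```python
from typing import Optional, Set


def choose_label(
    available: Set[int],
    label_set: Optional[Set[int]] = None,
    suggested: Optional[int] = None,
) -> Optional[int]:
    """Downstream label selection (simplified): honor the upstream label set,
    prefer the suggested label when it is usable, else take the lowest usable one."""
    usable = available if label_set is None else available & label_set
    if not usable:
        return None  # no acceptable label: the LSP setup would fail at this hop
    if suggested is not None and suggested in usable:
        return suggested
    return min(usable)


free_channels = {3, 5, 8, 12}
print(choose_label(free_channels, label_set={5, 8}, suggested=8))  # 8
print(choose_label(free_channels, label_set={5, 8}, suggested=3))  # 5
print(choose_label(free_channels, label_set={1, 2}))               # None
```

Honoring the suggested label lets the upstream node configure its switch before the answer arrives; if the downstream node picks another label, that early configuration has to be redone.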

In Chapter 4, the principle of link state routing protocols was covered in detail. Figure 6.3 summarizes this for a G-MPLS-capable network. Thanks to the neighbor discovery process, each node in the network learns about all its neighbors. Moreover, a flooding procedure allows each node to periodically flood this information throughout the network. Each node processes all incoming link state packets and stores the received status information of each link in its link state database. Because every network node has such a link state database, all nodes share a common view of the network topology and resources and can thus compute the constrained shortest path along which an LSP has to be set up. To distinguish between different link types, the link state contains a field indicating the multiplexing/switching capability of the advertised link. The third column of the link state database illustrates this: all links are optical links (thus, lambda switch capable [LSC]) except link CD, which is a link at the IP level (thus, packet switch capable [PSC]). Consequently, the lightpath between C and E, which may use LSC links only, is set up via B and D.
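The role of the switching-capability field can be shown with a CSPF sketch over the link state database of Figure 6.3 (before the C-E lightpath exists). This is a minimal Dijkstra-based illustration, assuming unit costs as in the figure; `cspf` is a hypothetical helper. Without the constraint, the shortest route C-D-E would use the PSC link CD, which cannot carry a lightpath. With the LSC constraint, C-B-D-E and C-B-A-E tie at cost 3; the figure shows the route via D, while this sketch breaks ties by comparing paths lexicographically and therefore returns the route via A. Either way, link CD is excluded.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Link-state database of Figure 6.3 before the C-E lightpath exists:
# link -> (cost, switching capability).
LSDB: Dict[Tuple[str, str], Tuple[int, str]] = {
    ("A", "B"): (1, "LSC"), ("A", "E"): (1, "LSC"), ("B", "D"): (1, "LSC"),
    ("B", "C"): (1, "LSC"), ("C", "D"): (1, "PSC"), ("D", "E"): (1, "LSC"),
}


def cspf(
    lsdb: Dict[Tuple[str, str], Tuple[int, str]],
    src: str,
    dst: str,
    switch_cap: Optional[str],
) -> Optional[Tuple[int, List[str]]]:
    """Dijkstra over the links whose switching capability matches switch_cap
    (None disables the constraint). Equal-cost ties are broken by comparing
    the paths lexicographically."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for (a, b), (cost, cap) in lsdb.items():
        if switch_cap is None or cap == switch_cap:
            adj.setdefault(a, []).append((b, cost))
            adj.setdefault(b, []).append((a, cost))
    queue: List[Tuple[int, List[str]]] = [(0, [src])]
    visited = set()
    while queue:
        dist, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, cost in adj.get(node, []):
            if nbr not in visited:
                heapq.heappush(queue, (dist + cost, path + [nbr]))
    return None


print(cspf(LSDB, "C", "E", None))   # (2, ['C', 'D', 'E']): not valid for a lightpath
print(cspf(LSDB, "C", "E", "LSC"))  # (3, ['C', 'B', 'A', 'E'])
```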

An important aspect of G-MPLS is that a generalized label can represent different types of LSPs. For example, in an IP-over-WDM network, an LSP can be either a lightpath (labels are wavelengths) or a regular LSP in the IP layer (labels are integers, possibly carried in a shim header). The question is then how to route the latter LSPs over the set of former LSPs (lightpaths) that form a logical network topology. For this purpose, as soon as an LSP that has to function as a logical link has been set up, it is advertised by means of a link state packet to all other nodes in the network. In Figure 6.3, the lightpath being set up between nodes C and E creates a logical link at the IP level between nodes C and E. Thus an integrated view of a multilayer network can be obtained [Kom02].

[Figure 6.3: IP/optical network elements A to E. Incoming link state packets: [AB,BD,BC], [BD,CD,DE], [AE,DE] (later [AE,DE,CE]), and [AB,AE]. Link state database (link, cost, switching capability): AB 1 LSC; AE 1 LSC; BD 1 LSC; BC 1 LSC; CD 1 PSC; DE 1 LSC; CE 1 PSC. This knowledge of the network topology is used for the CSPF routing of the lightpath between C and E.]

Figure 6.3 Principle of link state routing in Generalized Multi-Protocol Label Switching networks.

There is no need to bring up a routing adjacency (e.g., by launching the discovery protocol) between the two endpoints of such a logical TE link (which is actually an LSP in the underlying network layer or sublayer) after it has been advertised; therefore, such an adjacency is also called a forwarding adjacency (FA). The creation of FAs is not the only case resulting in TE links over which no routing adjacency is brought up: in G-MPLS, some technologies simply do not allow transporting the control information in-band (thus requiring out-of-band control channels), and sometimes a bundle of links is advertised as a single TE link to improve scalability. These and other issues are still under standardization [Kom03].
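The effect of advertising a lightpath as a forwarding adjacency can be sketched with the link state database of Figure 6.3. This is a hypothetical illustration: `route` is a fewest-hop search (all costs in the figure are 1) restricted to links of one switching capability. Before the C-E lightpath is advertised, no packet-switched path exists from C to E; once it is advertised as a PSC TE link, an IP-layer LSP can use it without any routing adjacency being brought up.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Figure 6.3 link-state database: link -> switching capability (all costs 1).
lsdb: Dict[Tuple[str, str], str] = {
    ("A", "B"): "LSC", ("A", "E"): "LSC", ("B", "D"): "LSC",
    ("B", "C"): "LSC", ("C", "D"): "PSC", ("D", "E"): "LSC",
}


def route(links: Dict[Tuple[str, str], str], src: str, dst: str, cap: str) -> Optional[List[str]]:
    """Fewest-hop path over links of one switching capability (BFS)."""
    adj: Dict[str, List[str]] = {}
    for (a, b), link_cap in links.items():
        if link_cap == cap:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


before = route(lsdb, "C", "E", "PSC")  # None: only the PSC link CD exists
lsdb[("C", "E")] = "PSC"               # lightpath C-E advertised as a TE link (FA)
after = route(lsdb, "C", "E", "PSC")   # the IP-layer LSP now uses the FA
print(before, after)  # None ['C', 'E']
```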

In Chapter 3, Section 3.6.3, we discussed how the introduction of an IP-based optical control plane will make restoration a realistic recovery option for the optical transport network. Figure 6.4 illustrates how restoration can be provided in G-MPLS networks. This example assumes that, after the lightpath between C and E has been established in Figure 6.3, link BD fails, affecting this lightpath. We assume that the ingress node C is notified that the lightpath has been affected by a failure and that node C acts as recovery head-end (RHE). Depending on the information the RHE receives, it may recalculate a new CSPF route right away, before receiving updated link state packets, or it may wait a while to receive those updated link state packets. Once it has an updated view of the network topology and resources, the RHE can recalculate a new CSPF route for the lightpath (in this example via node A instead of via node D) and reestablish the lightpath along this new route.
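The ingress-based restoration of this example can be sketched as follows. This is a minimal illustration under stated assumptions: only the optical (LSC) links of Figures 6.3 and 6.4 are kept, all with cost 1; the working lightpath is C-B-D-E as in the figure; and `restore_at_ingress` is a hypothetical helper that recomputes the whole route from the ingress once the RHE's view no longer contains the failed link.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Optical (LSC) links of Figures 6.3 and 6.4, all with cost 1; each string names
# the two end nodes. The PSC links CD and CE cannot carry a lightpath.
LSC_LINKS: Set[FrozenSet[str]] = {frozenset(pair) for pair in ("AB", "AE", "BD", "BC", "DE")}


def shortest_path(links: Set[FrozenSet[str]], src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra with unit link costs over the given undirected links."""
    adj: Dict[str, List[str]] = {}
    for lk in links:
        a, b = sorted(lk)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue: List[Tuple[int, List[str]]] = [(0, [src])]
    done: Set[str] = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr in adj.get(node, []):
            if nbr not in done:
                heapq.heappush(queue, (cost + 1, path + [nbr]))
    return None


def restore_at_ingress(working: List[str], failed: FrozenSet[str], links: Set[FrozenSet[str]]) -> List[str]:
    """Ingress node acting as RHE: recompute the whole lightpath if it is affected."""
    hops = {frozenset(hop) for hop in zip(working, working[1:])}
    if failed not in hops:
        return working  # lightpath not affected by this failure
    new_path = shortest_path(links - {failed}, working[0], working[-1])
    if new_path is None:
        raise RuntimeError("no restoration route available")
    return new_path


working = ["C", "B", "D", "E"]  # lightpath set up in Figure 6.3
print(restore_at_ingress(working, frozenset("BD"), LSC_LINKS))  # ['C', 'B', 'A', 'E']
```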

[Figure 6.4: same IP/optical network and link state database as in Figure 6.3 (incoming link state packets [AB,BD,BC], [BD,CD,DE], [AE,DE,CE], [AB,AE]); after the failure of link BD, the RHE uses its knowledge of the network topology to recalculate the CSPF route of the lightpath between C and E.]

Figure 6.4 Illustration of restoration in Generalized Multi-Protocol Label Switching networks.



As elaborated in Chapter 3, Section 3.6.3, there are many ways to implement restoration in a network; the above example illustrates only one possibility. For example, instead of node C acting as the RHE (which therefore needs to be notified of the failure), node B could act as the RHE. Assuming only single link or node failures, node B should be capable of recalculating a new alternative path from itself to the destination node E for the affected lightpath (in this case, the alternative path goes from node B via node A to node E) as soon as node B detects the failure. Because node B detects the failure itself and already has an overview of the overall network status in its link state database, it does not need to wait for any updated link state packet or any other failure indication signal before it can start calculating the alternative path. This should result in a significant improvement of the restoration time. We call this technique fast topology-driven constraint-based rerouting (FTCR) [Vhe00], [ColPNC011]: it relies on the topology information in the link state database of the RHE (the topology-driven part) and requires explicit routing (the constraint-based part), because the RHE may already start the signaling process to set up the affected lightpath along the alternative route before the link state databases in the intermediate nodes along this alternative path are updated.
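The FTCR variant can be sketched in the same setting. This is a hypothetical illustration, not the algorithm of [Vhe00]: the node detecting the failure (the upstream end of the failed link) acts as RHE, removes only the locally detected link from its own, not yet updated, link state database, and computes a fewest-hop segment to the destination. Avoiding the nodes upstream of the RHE is an added assumption to prevent loops. The result is the explicit route B would signal and the resulting end-to-end lightpath.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Optical (LSC) links of Figure 6.4; each string names the two end nodes.
LSC_LINKS: Set[FrozenSet[str]] = {frozenset(pair) for pair in ("AB", "AE", "BD", "BC", "DE")}


def bfs(links: Set[FrozenSet[str]], src: str, dst: str, avoid: Set[str]) -> Optional[List[str]]:
    """Fewest-hop path from src to dst that never enters a node in avoid."""
    adj: Dict[str, List[str]] = {}
    for lk in links:
        a, b = sorted(lk)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in parent and nbr not in avoid:
                parent[nbr] = node
                queue.append(nbr)
    return None


def ftcr(working: List[str], failed: FrozenSet[str],
         lsdb_links: Set[FrozenSet[str]]) -> Optional[Tuple[List[str], List[str]]]:
    """Return (explicit route signaled by the RHE, new end-to-end lightpath)."""
    idx = next((i for i, hop in enumerate(zip(working, working[1:]))
                if frozenset(hop) == failed), None)
    if idx is None:
        return None  # the failed link is not on this lightpath
    rhe, dst = working[idx], working[-1]
    segment = bfs(lsdb_links - {failed}, rhe, dst, avoid=set(working[:idx]))
    if segment is None:
        return None
    return segment[1:], working[:idx] + segment


ero, lightpath = ftcr(["C", "B", "D", "E"], frozenset("BD"), LSC_LINKS)
print(ero)        # ['A', 'E']: strict hops signaled by B
print(lightpath)  # ['C', 'B', 'A', 'E']
```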

Table 6.1 summarizes all possibilities mentioned in Chapter 3, Section 3.6.3, to implement network recovery. The main difference between protection and restoration is that with protection the cross-connection on the backup route takes place before the failure occurs, whereas with restoration the OXCs on the backup route are only cross-connected after the failure. The calculation of the restoration backup route can be preplanned or dynamic, as can the wavelength assignment on the backup route.

As the control plane becomes more and more important in transport networks, it is expected to become logically separated from the transport or data plane instead of being a simple add-on or feature of the transport plane. In other words, the development and upgrade tracks of the transport plane and the control plane functionality become decoupled from each other. Moreover, it should be possible for the control plane functionality to be supplied by a vendor other than the one supplying the hardware for the data or transport plane: for example, it is not unimaginable that developing a full-fledged control plane grows beyond the expertise and capabilities of a hardware supplier.

Table 6.1 Comparison of Protection and Restoration

               Backup Route    Wavelength Assignment    Cross-Connection
               Calculation     on Backup Route          on Backup Route
Restoration    Preplanned      Preplanned               After failure
Restoration    Preplanned      Dynamic                  After failure
Restoration    Dynamic         Dynamic                  After failure
Protection     Preplanned      Preplanned               Before failure



Of course, the mission of the control plane remains controlling the equipment in the transport or data plane; thus, logically separating the control plane from the transport or data plane requires a standardized interface between both planes. This interface is called the Connection Control Interface (CCI) in the ASON framework, as depicted in Figure 6.1. The General Switch Management Protocol (GSMP) [RFC3292] is "a general purpose protocol to control a label switch" that allows separating the switch control from the switch forwarding elements; at the time of writing, the necessary extensions for supporting generalized labels are under standardization [Cho03]. In short, GSMP allows a controller to establish and release connections across the switch, add and delete leaves on a multicast connection, manage switch ports, request configuration information, request and delete reservations of switch resources, request statistics, and get informed of asynchronous events such as a link going down. Connection management/control is one of the most important functions supported by GSMP.

GSMP is by nature a master-slave protocol: the switch controller acts as master, issuing requests that have to be performed, and the switch itself acts as slave, responding to (acknowledging) these requests after taking the necessary actions. The only exception is the slave (the switch) informing the master (the switch controller) of an asynchronous event. A Transaction ID (TID) carried in the protocol messages allows responses to be correlated with the corresponding requests. Response messages report either the success or the failure of the requested operation; therefore, the slave can only generate a response message after it has performed the requested operation.
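The master-slave pattern and the role of the Transaction ID can be sketched in a few classes. This is not the GSMP message format of [RFC3292]: the `Request`/`Response` structures, the operation names, the port/label strings, and the rule that an input already in use makes an add request fail are all hypothetical simplifications. The sketch only shows requests carrying a TID, responses echoing it after the action has been attempted, and the switch initiating a message only for an asynchronous event.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    tid: int  # transaction ID chosen by the master
    op: str   # "add_connection" or "delete_connection" (illustrative names)
    src: str  # input port/label
    dst: str  # output port/label


@dataclass
class Response:
    tid: int      # copied from the request so the master can correlate it
    success: bool


class Switch:
    """Slave: performs each request, then reports success or failure."""

    def __init__(self) -> None:
        self.cross_connects: Dict[str, str] = {}
        self.events: List[str] = []  # unsolicited asynchronous events

    def handle(self, req: Request) -> Response:
        if req.op == "add_connection" and req.src not in self.cross_connects:
            self.cross_connects[req.src] = req.dst
            ok = True
        elif req.op == "delete_connection" and self.cross_connects.get(req.src) == req.dst:
            del self.cross_connects[req.src]
            ok = True
        else:
            ok = False
        # The response is built only after the operation has been attempted.
        return Response(tid=req.tid, success=ok)

    def port_down(self, port: str) -> None:
        self.events.append(f"port {port} down")  # the only slave-initiated message


class Controller:
    """Master: issues requests and matches responses by transaction ID."""

    def __init__(self, switch: Switch) -> None:
        self.switch = switch
        self.next_tid = itertools.count(1)
        self.pending: Dict[int, Request] = {}

    def request(self, op: str, src: str, dst: str) -> bool:
        req = Request(next(self.next_tid), op, src, dst)
        self.pending[req.tid] = req
        resp = self.switch.handle(req)  # a real controller receives this over a link
        self.pending.pop(resp.tid)      # correlate by TID; KeyError if unknown
        return resp.success


switch = Switch()
controller = Controller(switch)
print(controller.request("add_connection", "in1/w1", "out3/w1"))     # True
print(controller.request("add_connection", "in1/w1", "out4/w2"))     # False: input in use
print(controller.request("delete_connection", "in1/w1", "out3/w1"))  # True
switch.port_down("out3")
print(switch.events)  # ['port out3 down']
```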

As depicted in Figure 6.1, the Request Agent of a client requests the setup and teardown of switched connections through the ASON network via the User Network Interface (UNI). The Optical Internetworking Forum (OIF) has already standardized a first version of the UNI protocol and is continuing to develop it.

First, a distinction has to be made between the UNI-client (UNI-C) and the UNI-network (UNI-N) sides of the UNI. The UNI-C has its own address; however, connection establishment requests should use UNI-N addresses. To allow a UNI-C to translate the UNI-C address of the destination into its UNI-N address, the UNI supports an address resolution service, which must handle both address resolution queries and address registration messages. In addition to the address resolution service, the UNI also runs a service discovery procedure (SDP), which negotiates whether LDP or RSVP will be supported as the signaling protocol, whether address resolution is supported, which framing (e.g., SONET or SDH) and port-level granularity are used, and what transparency is supported.
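The address registration and resolution service can be pictured as a directory keyed by client addresses. This is a toy sketch, not the OIF message set: `AddressDirectory` and its methods are hypothetical, and the addresses come from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24.

```python
from typing import Dict, Optional


class AddressDirectory:
    """Toy address registration/resolution service: each UNI-C address is
    registered against the UNI-N address through which it is reached."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, uni_c_addr: str, uni_n_addr: str) -> None:
        self._table[uni_c_addr] = uni_n_addr  # address registration message

    def resolve(self, uni_c_addr: str) -> Optional[str]:
        return self._table.get(uni_c_addr)    # address resolution query


directory = AddressDirectory()
directory.register("192.0.2.10", "198.51.100.1")  # client router behind UNI-N 1
directory.register("192.0.2.20", "198.51.100.7")  # client router behind UNI-N 7
dest = directory.resolve("192.0.2.20")
print(dest)                             # 198.51.100.7: used in the connection request
print(directory.resolve("192.0.2.99"))  # None: not registered
```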

Even before any service discovery or address registration can take place, the Neighbor Discovery Procedure (NDP) and control channel establishment must be completed. Indeed, a control channel across the UNI is required to carry the signaling messages exchanged between the UNI-C and UNI-N entities.

The control channel is an IP-based control channel (IPCC). There are three possibilities to support the IPCC (Figure 6.5). The in-fiber/in-band option establishes the IPCC over the data communication channel (DCC)77 of the data wavelengths. In the in-fiber/out-of-band mode, the IPCC occupies a separate wavelength or TDM channel on the UNI link. Finally, in the out-of-fiber/out-of-band configuration, the IPCC may be routed over an IP transport network. Once the IPCC has been established, NDP, SDP, and address registration can run, and the client can then start using the UNI to request the setup and teardown of lightpaths through the network.
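The ordering constraints between the UNI procedures can be expressed as a dependency graph and resolved with a topological sort (Kahn's algorithm). This is an illustrative sketch: the step names are hypothetical, and since the text does not fix the relative order of SDP and address registration, the alphabetical tie-break below is an arbitrary assumption.

```python
from typing import Dict, List, Set

# Prerequisites of each UNI procedure, following the order given in the text.
PREREQS: Dict[str, Set[str]] = {
    "ipcc": set(),                       # control channel (in-band, out-of-band, or out-of-fiber)
    "ndp": {"ipcc"},                     # neighbor discovery runs over the IPCC
    "sdp": {"ndp"},                      # service discovery
    "address_registration": {"ndp"},
    "connection_requests": {"sdp", "address_registration"},
}


def startup_order(prereqs: Dict[str, Set[str]]) -> List[str]:
    """Topological sort (Kahn's algorithm); ties broken alphabetically."""
    remaining = {step: set(deps) for step, deps in prereqs.items()}
    order: List[str] = []
    while remaining:
        ready = sorted(step for step, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency")
        step = ready[0]
        order.append(step)
        del remaining[step]
        for deps in remaining.values():
            deps.discard(step)
    return order


print(startup_order(PREREQS))
# ['ipcc', 'ndp', 'address_registration', 'sdp', 'connection_requests']
```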

Note also that the UNI 1.0 specification considers three reference configurations, as illustrated in Figure 6.6. The client to optical network element (ONE) direct service invocation configuration (bottom of Figure 6.6) is the most intuitive one: both the client device and the ONE terminate the UNI data and control channels. The client to network agent direct service invocation configuration (left side of Figure 6.6) is very similar, except that the UNI-N signaling functionality is shifted from the ONE to a central network agent. This network agent configures the ONE via signaling over an internal signal interface (ISI). The last reference configuration is called the client agent to network agent third-party service invocation (right side of Figure 6.6). In this configuration, not only is the UNI-N signaling functionality shifted to a separate network agent, but the UNI-C signaling functionality is also shifted to a separate agent, called the UNI-C proxy. These agents drive the configuration of both the ONE and the client device through an ISI. The use of such signaling agents may ease the migration from a traditional OTN toward an ASON (e.g., the Network Management System [NMS] may serve as such a network agent).

77 The DCC is an auxiliary communication channel that is provided by some dedicated overhead bytes in the framing overhead of the wavelength channel.

Figure 6.5 Support of the IP-based control channel (panels: in-fiber/in-band, IPCC in DCC bytes; in-fiber/out-of-band, IPCC in a separate (sub)channel; out-of-fiber/out-of-band). (‘‘User network interface [UNI] 1.0 Signaling Specification,’’ Optical Internetworking Forum/User Network Interface Specifications [OIF2000.125.5], June 2001. Available at www.oiforum.com. Accessed May 2004.)

6.1.3 Overview of Control Plane Architectures (Overlay, Peer, Augmented)

Section 6.1.1 explains that the ASON framework assumes that clients request, via the UNI, the setup and teardown of connections through the ASON network; in an IP-over-OTN network, this means that the IP network acts as a client of the OTN server network layer. However, Section 6.1.2 shows that the G-MPLS protocol suite is able to obtain an integrated view of a multilayer network and can thus control this network as a single network. Section 6.1.2 also shows that, as the control plane becomes more and more important in transport networks, it becomes logically separated from the transport or data plane instead of being a simple add-on or feature of the transport plane. This would, for example, allow a vendor other than the one supplying the data or transport plane hardware to supply the control plane functionality.

This section aims to describe different control plane architectures assuming

different integration levels of control planes of the client and server network layers.

Although the discussion is generally applicable, it focuses on an IP-over-OTN

network scenario.

The first control plane model is the overlay model. As shown in Figure 6.7, both layers run their own control plane in this model. The IP layer acts as the client (or user) layer and the OTN acts as the server layer; a User Network Interface (UNI) between both layers allows the client IP layer to request capacity (i.e., lightpaths) from the server OTN network. Both control planes are completely independent of each other. In other words, the client layer's routing (IP routing such as OSPF) and possibly its MPLS signaling are independent of the optical layer's control plane signaling (and routing). Of course, both control planes can be instantiated from the same control plane type (e.g., G-MPLS). However, this independence also allows the OTN to run other ASON-compliant protocols.

Figure 6.6 User-to-network interface (UNI) reference configurations (bottom, client to ONE direct service invocation; left, client to network agent direct service invocation; right, client agent to network agent third-party service invocation). Labels: UNI data and UNI control channels between UNI-C and UNI-N, a UNI-C proxy, and ISIs.



To couple both layers, an appropriate protocol (or common set of assumptions) is provided that allows the control planes of both layers to communicate. This protocol provides, for example, address resolution between the layers and/or initiates connection requests and releases.

One of the drawbacks of the overlay model is the duplication of control

functionality (e.g., two separate routing protocols are running in the two layers).

Another disadvantage is the scalability problem; for each established lightpath a

corresponding IP routing adjacency has to be established. This was also a problem

in classic IP-over-ATM because of the increased amount of state and information in

the routing databases. MPLS was able to overcome these issues by using TE

shortcuts, as opposed to establishing adjacencies across each distinct LSP. A final

drawback of the overlay model is that there is a clear client-server relationship; for

example, address resolution is required because of separate address spaces (as

explained in the following discussion, this issue is solved in the augmented model).
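To make the scalability concern concrete, the sketch below counts routing adjacencies in the worst case, assuming the operator sets up a full mesh of lightpaths between N routers and that each lightpath brings one IP routing adjacency, as described above. The function name is hypothetical. The total then grows as N(N-1)/2, and each router must maintain N-1 adjacencies.

```python
def overlay_adjacencies(num_routers: int) -> tuple[int, int]:
    """Routing adjacencies when every router pair gets its own lightpath.

    Assumes a full mesh of lightpaths and one IP routing adjacency per
    lightpath. Returns (total adjacencies, adjacencies per router).
    """
    if num_routers < 2:
        return 0, 0
    total = num_routers * (num_routers - 1) // 2  # one per unordered router pair
    per_router = num_routers - 1                  # each router peers with all others
    return total, per_router


for n in (10, 50, 100):
    total, per_router = overlay_adjacencies(n)
    print(f"{n:>3} routers: {total:>5} adjacencies, {per_router} per router")
```

Going from 10 to 100 routers multiplies the number of adjacencies by more than 100, which illustrates why avoiding one adjacency per LSP (as TE shortcuts do) matters.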

On the other hand, the advantage of separating the two control plane instances in the overlay model is that no confidential information from the transport network is disclosed to any client network (operator). The overlay model also seems most suited for interconnection with legacy SDH networks.

A second control plane model is the peer model, shown in Figure 6.8. In this model, a single control plane controls both the IP and the OTN layer. As a result, IP router forwarding engines and OXC switch fabrics are logically integrated from a control plane viewpoint. Because the current MPLS control protocols would only require minor modifications to become G-MPLS compliant, it would be an interesting scenario to have the LSRs take over the control of the optical cross-connects. With a standardized control interface (e.g., GSMP), such a scenario is not that unrealistic; a standardized interface is particularly attractive when a single vendor does not supply both equipment types.

Figure 6.7 The overlay model. (D. Colle, et al. ‘‘Developing control plane models for optical networks,’’ Technical Digest, 2002 Optical Fiber Communication Conference [OFC2002], Anaheim, CA, March 17–22, 2002, pp. 757–759.)

So-called IP/OTN control channels are realized over the physical links between these logical IP/OTN entities. In the case of G-MPLS, lightpaths are treated as optical LSPs and thus do not result in a new peering session between their endpoints (i.e., no control channel is established over the lightpaths). Note, however, that this does not prevent the lightpaths from being advertised into the routing protocol as direct TE links (or Forwarding Adjacencies [FAs]) between their endpoints.

The peer model has the following advantages. First, duplication of functionality is avoided. Second, the disadvantages of the client-server relationship between IP and OTN (e.g., problems with address resolution) no longer exist. Third, no additional peering session is required per established lightpath, which may solve some scalability problems (e.g., no hello messages have to be processed for each lightpath), although each lightpath still has to be advertised to the network as a logical link.

The peer model also has clear drawbacks. It is not applicable to every business model. For example, in an IP-over-OTN network scenario, the optical transport network operator may not accept that the ISP (e.g., a competitor) takes over the control of the OTN (or vice versa). The peer model is also limited to a single domain or autonomous system.

Figure 6.8 The peer model (labels: integrated IP/OTN boxes, each combining an IP-router forwarding engine and an OXC switch fabric in the data plane under a single IP/OXC controller in the control plane; control channels; customer premise equipment; physical (= fiber) topology). (D. Colle, et al. ‘‘Developing control plane models for optical networks,’’ Technical Digest, 2002 Optical Fiber Communication Conference [OFC2002], Anaheim, CA, March 17–22, 2002, pp. 757–759.)

The third control plane model is the augmented model, illustrated in Figure 6.9. This model is a compromise between the overlay and the peer model. It is quite similar to the overlay model, in the sense that both layers may have their own control plane instance. However, some control information, such as reachability information, may leak through the interface between both layers. Put more practically, [Raj00] states, in what its authors call the ‘‘interdomain interconnection model,’’ that the client-layer reachability information is carried through the OTN, but OTN addresses are not propagated to the client network.

The principle of leaking client-layer reachability information from one side of the network to the other is similar to the principle of MPLS/BGP VPNs [RFC2547] and is illustrated in Figure 6.10. Consider that IP router rA is attached via port opA to the OXC oxA and that IP router rB is attached via port opB to the OXC oxB. When router rB and OXC oxB run an E-BGP session over the UNI, OXC oxB learns the address of rB; more precisely, OXC oxB then knows that rB can be reached via its port opB. It advertises this relation via an I-BGP session to OXC oxA. OXC oxA forwards this BGP route over an E-BGP session to router rA, after removing any optical address from the route. In other words, router rA easily learns the address of router rB, while the address resolution is kept inside the transport network. From this moment, router rA can simply ask OXC oxA to establish a lightpath to router rB. It is then the responsibility of OXC oxA to translate the address rB in the connect request into the appropriate optical port address.
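The exchange just described can be mimicked with a toy model in which plain dictionaries stand in for real BGP sessions. The class `Oxc` and its methods are hypothetical; the sketch only shows which information crosses which boundary: optical addresses travel over I-BGP inside the transport network but are stripped before anything is sent to a client router.

```python
from dataclasses import dataclass, field


@dataclass
class Oxc:
    name: str
    # client address -> (egress OXC name, optical port address)
    reachability: dict = field(default_factory=dict)

    def learn_from_client(self, client_addr: str, port: str) -> None:
        # E-BGP over the UNI: the attached router announces its address,
        # and this OXC records the optical port it sits on.
        self.reachability[client_addr] = (self.name, port)

    def advertise_to_peer(self, peer: "Oxc") -> None:
        # I-BGP inside the transport network: optical addresses are included.
        peer.reachability.update(self.reachability)

    def advertise_to_client(self) -> list:
        # E-BGP toward the attached router: only client addresses are sent,
        # every optical address is removed.
        return sorted(self.reachability)

    def connect(self, client_addr: str) -> tuple:
        # Translate the client address in a connect request into the
        # optical port address on the egress OXC.
        return self.reachability[client_addr]


ox_a, ox_b = Oxc("oxA"), Oxc("oxB")
ox_b.learn_from_client("rB", port="opB")
ox_b.advertise_to_peer(ox_a)

print(ox_a.advertise_to_client())  # what router rA sees
print(ox_a.connect("rB"))          # what oxA uses internally for "Connect rB"
```

Router rA only ever sees the name rB; the translation to the port opB on oxB happens entirely inside `connect`, mirroring how the address resolution stays inside the transport network.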

Finally, note that although everyone agrees that the augmented model is

situated somewhere in between the two extreme overlay and peer models, there is

not yet a clear understanding or definition of this augmented model. Nevertheless, it

is clear that the augmented model tries to find a compromise between the advan-

tages and disadvantages of both extremes.

Figure 6.9 The augmented model (labels: IP and OTN domains interconnected by NNIs; IP-router controllers and forwarding engines over the logical (= lightpath) topology; OXC controllers and switch fabrics over the physical (= fiber) topology; OTN and IP control channels; enhanced UNI).



6.2 Generic Multilayer Recovery Approaches

In the previous chapters, survivability and recovery mechanisms have been discussed from the viewpoint of a single network technology, and thus within one network layer (e.g., IP routing in the IP layer, or 1+1 optical protection in the OTN layer). As shown, these schemes can effectively handle a large number of failure scenarios. The integration of different network technologies, for example, IP and OTN, into (realistic) multilayer transport networks (see also Chapter 1) leads to new opportunities and new challenges concerning the survivability of such multilayer networks. The opportunities lie in the fact that recovery techniques from the different network layers can cooperate to recover more efficiently or faster from a network failure. The challenges lie in coordinating those mechanisms across the different layers. This section gives a generic description of survivability in multilayer networks; more concrete and specific case studies of multilayer recovery mechanisms are discussed in Section 6.3.

This section starts by illustrating why attention should be paid to multilayer recovery (see Section 6.2.1). Then three generic categories for providing recovery in multilayer networks are discussed: single-layer recovery schemes in multilayer networks (see Section 6.2.2), where the key issue is in which layer of the network to provide the recovery scheme; static multilayer recovery schemes (see Section 6.2.3), where recovery schemes can be provided at several network layers and the key issue is how to make them interwork; and dynamic multilayer recovery strategies (see Section 6.2.4), which use dynamic logical topologies for survivability purposes. A summary of the multilayer recovery strategies is given in Section 6.2.5.

Figure 6.10 Illustration of how the ASON carries the client-layer reachability information from one side of the network to the other. (Routers rA and rB attach over UNIs to OXC oxA and OXC oxB at optical port addresses opA and opB. Router rB announces ‘‘HELLO (I’m rB)’’; oxB advertises ‘‘rB sits on port opB on oxB’’; oxA tells rA ‘‘I can reach rB’’; rA concludes ‘‘OK, I can ask the transport network to connect me to router rB!’’ and sends ‘‘Connect rB,’’ which is translated into ‘‘Connect opB on oxB.’’)

Major contribution to Section 6.2 is credited to Ilse Lievens, INTEC, Ghent University.

6.2.1 Why Multilayer Recovery?

A multilayer (or multitechnology) transport network can be viewed as consisting of

a stack of single-layer (or single-technology) networks. Between the adjacent layers

of this stack typically a client-server relationship exists. If we consider an IP-over-

OTN multilayer network, the IP network layer is the client layer of the underlying

OTN network layer, whereas the OTN layer acts as a server layer to the IP layer,

providing, for instance, transport functionality to the higher client layer.

Because each of these network layers has its own single-layer recovery schemes, one may wonder why it is not sufficient to simply deploy a recovery scheme in only one layer of the multilayer network. One could think of a situation in which IP routing is deployed in the IP client layer to handle the failure of an IP router or of an IP interface card, or in which 1+1 optical protection is deployed in the OTN server layer to handle the failure of an OXC or an optical fiber cut.

Unfortunately, not every failure in a particular network layer can be handled by the recovery mechanism in that same network layer. Consider, for example, Figure 6.11, which depicts an IP-over-OTN multilayer network and the failure of OXC B. This failure is detected in the optical network, and a recovery action may be initiated in the OTN layer. However, the OTN recovery action cannot recover the traffic along the working path (which goes from IP router a to IP router d), because from the OTN layer point of view, this traffic is nothing more than two separate connections, A-B and B-D, which are both unrecoverable in the OTN layer (as those connections terminate in the failing OXC). From the IP point of view, a number of secondary failures (links a-b, b-c, and b-d) are detected, isolating router b. Upon detection of these faults, the IP network layer can initiate its own recovery actions, eventually leading to the recovery path indicated in Figure 6.11, where the traffic from router a to router d that traversed router b is rerouted by the IP layer via other paths (e.g., via the path a-c-d). The failure of router b itself leads to the same situation and can only be recovered in the IP layer (of course, in both cases, the traffic destined for router b cannot be recovered).

Figure 6.11 Why multilayer recovery? (IP layer with routers a, b, c, and d over an OTN layer with OXCs A, B, C, D, and E; the working path and the recovery path are shown.)

This example illustrates that deciding in which layer of the multilayer network to provide and deploy a recovery scheme is not straightforward; it might even be beneficial to deploy recovery schemes in multiple layers of the network. As the following sections show, it is important to be able to combine recovery schemes in more than one layer to benefit from the advantages of each layer. Note, however, that multilayer recovery strategies should only be applied if they are indeed more beneficial than single-layer recovery. Implementing a multilayer recovery strategy does not mean that all the recovery mechanisms will be used at every layer, as shown in Section 6.3, where some case studies are discussed.

Let us illustrate the complexity of the trade-off that must be made when deciding in which layers to provide recovery schemes with the following reflections:

• Recovery at higher network layers is desired because lower layers will not notice failures of higher layer equipment.
• Recovery at higher layers is desired because higher layer equipment can become isolated because of a failure in the lower network layer (e.g., the failure of an OXC in the optical network layer). Only a recovery scheme in the higher layer is then able to recover the traffic that transits this isolated equipment.
• Recovery at lower layers is desired because native traffic that is injected in lower layers cannot be recovered by higher layer recovery strategies.
• Multilayer recovery is typically more complex to monitor, operate, and design than single-layer recovery.

When discussing multilayer networks and their survivability, two crucial questions should be answered:

• In which layer or layers should recovery schemes be provided?
• If multiple layers are chosen for this purpose, how is the survivability in these layers coordinated?

6.2.2 Single-Layer Recovery Schemes in Multilayer Networks

This section discusses the provisioning of recovery functionality in multilayer networks, starting from single-layer recovery schemes. The concepts and discussions focus on a two-layer network but are generic and can thus be applied to any multilayer network. We look at how a recovery scheme in one network layer can be deployed to provide survivability in the multilayer network. This basically comes down to answering the question of in which network layer a recovery scheme should be deployed, and what the consequences of that decision are.

Survivability at the Bottom Layer

In this recovery approach, denoted survivability at the bottom layer, failures are recovered at the bottom layer of the multilayer network. In an IP-over-OTN network, for example, this implies that the 1+1 optical protection scheme, or any other recovery scheme deployed in the OTN layer, attempts to restore the affected traffic when a failure occurs.

By recovering a failure at the bottom layer, this strategy has the benefit that only a single root failure has to be treated and that the number of required recovery actions is minimal (the recovery actions are performed at the coarsest granularity). In addition, failures do not need to propagate through multiple layers before they trigger any recovery action.

However, one of the major drawbacks of this recovery strategy is its inability to cope with failures that occur in a higher network layer, above the bottom layer in which the recovery scheme is deployed. In addition, there are situations in which the recovery process in the bottom layer cannot restore all traffic, whereas a higher layer recovery mechanism could. For example, consider an IP-over-OTN network in which a node failure (an OXC failure) occurs in the optical layer. The optical layer recovery mechanism will only be able to restore the affected traffic that merely transits the failed bottom-layer node (the OXC). The co-located higher-layer node (an IP router in this case) becomes isolated because of the failure of the OXC underneath, and thus none of the traffic that transits this IP router can be restored in the lower optical layer.

An example is given in Figure 6.12, with a two-layer network. The considered network carries two traffic flows between client-layer nodes a and c. One traffic flow (a-d-c, indicated with a full line in the left part of the figure) transits the client-layer node d (using two logical links a-d and d-c), and the other traffic flow (a-c, indicated with a thin solid line in the left part of the figure) uses a direct logical link from a to c and only transits the server-layer node D. A failure occurs in the bottom layer, for example, the failure of node D. The left part of the figure illustrates that the server layer cannot recover the first traffic flow a-d-c, because the client-layer node d is isolated by the failure of D, which terminates both logical links a-d and d-c. The client layer therefore has to recover this flow, as shown in the right part of Figure 6.12 (client-layer recovery path a-b-c, using two logical links a-b and b-c). The second traffic flow a-c, however, is routed over a direct logical link between nodes a and c. This logical link merely transits the failing node D in the bottom layer, which means that this traffic flow can be restored by the bottom-layer recovery scheme, as shown in the left part of the figure with the dotted line.
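The rule applied in this example can be written down as a small classification routine: a logical link whose server-layer route ends in the failed node cannot be recovered by the server layer, whereas one that only transits it can be rerouted there. The sketch below assumes that each logical link is described by the list of server nodes it traverses, endpoints included; the function names and the exact routes (read off Figure 6.12) are assumptions for illustration.

```python
def classify_links(link_routes, failed_node):
    """Split logical links hit by a server-node failure into two groups.

    link_routes maps a logical link name to the list of server nodes it
    traverses, endpoints included.
    """
    terminated, transiting = set(), set()
    for link, route in link_routes.items():
        if failed_node not in route:
            continue  # link not affected by this failure
        if failed_node in (route[0], route[-1]):
            terminated.add(link)  # endpoint lost: server layer cannot help
        else:
            transiting.add(link)  # only transits: server layer can reroute
    return terminated, transiting


def recovering_layer(flow_links, terminated, transiting):
    """Layer that must recover a client flow made of the given logical links."""
    if any(link in terminated for link in flow_links):
        return "client"
    if any(link in transiting for link in flow_links):
        return "server"
    return "unaffected"


# Server-layer route of each logical link (assumed from Figure 6.12).
link_routes = {
    "a-d": ["A", "D"],
    "d-c": ["D", "C"],
    "a-c": ["A", "D", "C"],
    "a-b": ["A", "B"],
    "b-c": ["B", "C"],
}
flows = {"a-d-c": ["a-d", "d-c"], "a-c": ["a-c"]}

terminated, transiting = classify_links(link_routes, failed_node="D")
for flow, links in flows.items():
    print(flow, "->", recovering_layer(links, terminated, transiting))
```

Under a pure survivability at the bottom layer strategy, every flow classified as "client" is simply lost, which is exactly the weakness described above.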

Figure 6.12 Survivability at the bottom layer: Illustration of the impact of a node failure on two traffic flows between the client-layer nodes a and c. (Client layer with nodes a, b, c, and d over a server layer with nodes A to E; client-layer primary and recovery paths 1 and 2. Annotations: logical links terminated in a failing node cannot be recovered; transit traffic in the isolated client node needs recovery in the client layer; one flow is recovered by server-layer recovery, while the other is not recovered by server-layer recovery and is recovered by client-layer recovery.)

Survivability at the Top Layer

Another strategy for providing survivability in a multilayer network is to provide the survivability at the top layer of the network. The advantage of this strategy is that it can cope more easily with higher layer failures or node failures, as also illustrated in Figure 6.12. A major drawback of this strategy is that it typically requires many recovery actions, because of the finer granularity of the flow entities in the top layer. However, treating each individual flow at the top layer makes it possible to differentiate between these flows based on their (service) importance. In other words, the top layer may choose to recover the critical, high-priority traffic before any recovery action is taken for the low-priority flows. Such service differentiation based on the order of the recovery actions is not possible in lower layers, because the lower layers switch every flow in an aggregated signal with one single recovery action. Indeed, the granularity of recovery is a lambda at the optical layer, a VC at the SONET-SDH layer, a class of service at the IP layer, and a traffic engineering LSP at the MPLS TE layer.
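The contrast in granularity can be sketched by comparing the recovery actions each layer performs when one wavelength carrying several flows fails. In this minimal sketch, the flow names, the priority values, and the convention that a lower value means more important are all hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    name: str
    priority: int  # lower value = more important (assumption of this sketch)


def bottom_layer_actions(flows_on_lambda):
    # One switch-over moves the whole aggregate: a single action, with no
    # way to favour any individual flow.
    return [tuple(f.name for f in flows_on_lambda)]


def top_layer_actions(flows_on_lambda):
    # One action per flow, executed in priority order (sorted() is stable,
    # so flows of equal priority keep their original order).
    return [f.name for f in sorted(flows_on_lambda, key=lambda f: f.priority)]


wavelength_flows = [
    Flow("best-effort-1", 3),
    Flow("voice", 0),
    Flow("vpn", 1),
    Flow("best-effort-2", 3),
]
print(len(bottom_layer_actions(wavelength_flows)), "bottom-layer action")
print(len(top_layer_actions(wavelength_flows)), "top-layer actions:",
      top_layer_actions(wavelength_flows))
```

The top layer pays with more actions (one per flow instead of one per wavelength) but gains control over which flows are restored first.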

Under certain conditions, this finer granularity may also lead to more efficient capacity usage. First, aggregated signals that are poorly filled with working traffic have enough unused capacity left to accommodate spare resources. Second, the finer granularity allows flows to be distributed over more alternative paths. However, when comparing the survivability at the top layer strategy with the survivability at the bottom layer strategy, a trade-off exists between a better filling of the capacity of the logical links on the one hand and the potentially larger amount of higher layer equipment required on the other hand.

The potential difference in granularity between the failing equipment in a lower network layer and the corresponding affected entities in the top layer (which requires more recovery actions) is not the only issue. The typically complex secondary failure scenarios in the top network layer, caused by a single root failure in a lower layer, can also become a problem. This is illustrated in Figure 6.13, where the failure of an optical link in the bottom layer corresponds to the simultaneous failure of three logical IP links in the top layer (see also Chapter 5, Section 5.1.2, in which Shared Risk Link Groups (SRLGs) are discussed in this context). The recovery scheme in the top layer therefore has to recover from three simultaneous link failures, which is quite complex. A recovery scheme at the bottom layer (see the section Survivability at the Bottom Layer), however, would only have to cope with the simpler failure scenario of a single link failure.
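The propagation of one root failure into several secondary failures follows directly from the mapping of logical links onto physical links. The sketch below assumes that mapping is known as a set of physical links per logical link; the topology is hypothetical (it is not the one of Figure 6.13) and only reproduces the situation of one fiber carrying three logical IP links.

```python
def secondary_failures(logical_to_physical, failed_physical_link):
    """Logical links that fail together when one physical link fails.

    logical_to_physical maps each logical (client-layer) link to the set of
    physical (server-layer) links its lightpath uses. The result is the group
    of logical links sharing the failed physical link, i.e., the members of
    one shared risk link group.
    """
    return sorted(
        logical
        for logical, physical in logical_to_physical.items()
        if failed_physical_link in physical
    )


# Hypothetical mapping: three lightpaths share the fiber A-B.
mapping = {
    "a-b": {"A-B"},
    "a-c": {"A-B", "B-C"},
    "a-d": {"A-B", "B-D"},
    "c-d": {"C-D"},
}

print(secondary_failures(mapping, "A-B"))  # top layer sees three link failures
print(secondary_failures(mapping, "C-D"))  # a single logical failure
```

The bottom layer only has to handle the single failed fiber, whereas the top layer must react to every logical link returned by this function at the same time.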

Slightly Different Variants: Survivability at the Lowest Detecting Layer and Survivability at the Highest Possible Layer

A slightly different variant of the survivability at the bottom layer strategy is the survivability at the lowest detecting layer strategy. The lowest detecting layer is the lowest layer in the layered network hierarchy that is able to detect the failure. This implies that multiple layers in the network will deploy a recovery scheme, but that the (single) layer that detects the root failure is still the only layer that takes any recovery actions. With this kind of strategy, the problem that the bottom-layer recovery scheme does not detect a higher layer failure is avoided, because the higher layer that detects the failure will recover the affected traffic. However, although this strategy can ensure that traffic transiting the failing equipment in the detecting layer is restored, it still cannot restore any traffic transiting higher layer equipment that is isolated by a node failure in the detecting layer. With the survivability at the lowest detecting layer strategy, the client layer in the example of Figure 6.12 will deploy a recovery scheme, but the considered traffic flow a-d-c is still lost, because this client-layer recovery scheme is not triggered by the node failure in the server layer. So, although this strategy deploys recovery schemes in multiple layers, it is still considered a single-layer survivability strategy in a multilayer network, because for each failure scenario the responsibility to recover all traffic lies in only one layer (i.e., the one detecting the failure).

A slightly different variant of the survivability at the top layer strategy is the survivability at the highest possible layer strategy. Because not all traffic is necessarily injected (by the customer) at the top layer, with this strategy a traffic flow is recovered in the layer in which it is injected, in other words, the highest possible layer for this traffic flow. This highest possible layer is thus determined on a per-traffic-flow basis. For example, a data-centric optical network (IP-over-OTN network) may also support a leased optical channel service; the traffic of such a leased channel enters the network at the optical layer and is therefore recovered there. The survivability at the highest possible layer strategy is also considered a single-layer survivability strategy for providing survivability in a multilayer network, even though it deploys recovery schemes in multiple layers. Indeed, survivability at the highest possible layer may lead to recovery schemes in multiple layers, but these will never recover the same traffic flow. Actually, this strategy applies the survivability at the top layer strategy to each traffic flow individually (which implies that, in essence, both strategies do not differ from each other).

Figure 6.13 Survivability at the top layer: A single root failure may propagate to many so-called secondary failures. (Client layer over server layer.)

6.2.3 Static Multilayer Recovery Schemes

In the previous section, some strategies were discussed that apply a single-layer recovery mechanism (meaning that recovery is strictly limited to one layer of the network when coping with network failures) to provide survivability in the multilayer network. As shown there, both the strategies with survivability at the bottom or lowest detecting layer and those with survivability at the top or highest possible layer have their advantages and drawbacks. The advantages of these approaches can be combined, which implies that recovery mechanisms will run in different layers of the network in reaction to a single network failure. More generally speaking, the choice of the layer in which to recover the traffic affected by a failure will depend on the circumstances, such as which failure scenario occurred.

However, this requires some rules or coordination actions to ensure efficient interworking and coordination between the network layers that are involved in the recovery process (e.g., see the discussion in Chapter 2, Section 2.3.3, on SDH networks). This interworking is governed by a so-called escalation strategy, which strictly defines how the layers, and the recovery mechanisms within those layers, react to different failure scenarios. This section discusses several existing escalation strategies or approaches: uncoordinated, sequential, and integrated. Then the issue of how and where to provide the spare capacity in multilayer networks is discussed, followed by the issues of network stability, network operation complexity, and revertive operation for multilayer recovery. The section ends with a qualitative comparison of the discussed survivability strategies.

Uncoordinated Approach

The easiest way of providing an escalation strategy is to simply deploy recovery

schemes in the multiple layers without any coordination. This will result in parallel

recovery actions at distinct layers. Consider again the two-layer network of Figure

6.14, with, for instance, the failure of the physical link A-D in the server layer. This

failure of the physical link will also affect the corresponding logical link a-d in the

client layer and the considered traffic flow a-c. Because the recovery actions in both

layers are not coordinated, both recovery strategies in the client and the server layer

will attempt recovery of the affected traffic. This implies that in the client layer the

traffic flow a-c is rerouted by the recovery mechanism of the client layer (e.g., IP

routing in an IP-over-OTN network), resulting in a replacement of the failed path

a-d-c by, for instance, a new path a-b-c. At the same time, the server layer recovers

the logical link a-d of the client-layer topology by rerouting all traffic on the failing

link A-D through node E. In this example, recovery actions in a single layer would

have been sufficient to restore the affected traffic.

The main advantage of the uncoordinated approach is that this solution is

simple and straightforward from an implementation and operational point of view



(e.g., no standardization of coordination signals between both layers is required).

However, Figure 6.14 also shows the drawbacks of this strategy. Both recovery

mechanisms occupy spare resources during the failure (the server layer along A-E-D

and the client layer along a-b-c, implying the occupation of spare resources on A-B

and B-C in the server layer), although one recovery scheme occupying spare

resources would be sufficient. This implies that potentially more extra traffic

(being unprotected preemptable traffic) is squelched or disrupted than necessary.

The situation can be even worse. Suppose, for example, that the server layer reroutes the logical link a-d over the path A-B-C-D instead of A-E-D; then both recovery mechanisms need spare capacity on the links A-B and B-C. If these higher layer spare resources are supported as extra traffic in the lower layer, there is a risk that these client-layer spare resources are preempted by the recovery action in the server layer, resulting in "destructive interference." In other words, neither of the two recovery actions restores the traffic, because the client layer reroutes the considered flow over the path a-b-c, which has been disrupted by the server-layer recovery. Reference [Wau99] illustrates that these risks may exist in real networks; the authors show that a switchover in the optical domain (e.g., for protection purposes in the optical network) may trigger traditional SDH protection. Furthermore, as discussed in the section Trade-Off between Rerouting Time and Network Stability for Recovery in Multilayer Networks, such a multilayer recovery strategy can have a significant impact on overall network stability and can introduce undesirable race conditions in the network. In summary, although simple and straightforward, just letting the recovery mechanisms in each layer run without a coordinating escalation strategy has consequences for efficiency, capacity requirements, and even the ability to restore the traffic.

Figure 6.14 The uncoordinated multilayer survivability strategy. (Client layer: nodes a, b, c, d; server layer: nodes A, B, C, D, E; legend: client-layer primary path, client-layer recovery path, server-layer recovery path.)
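The destructive interference described above can be illustrated with a minimal Python sketch. It assumes a hypothetical mapping of the client-layer recovery links of Figure 6.14 onto server-layer links (a-b carried on A-B, b-c carried on B-C) and models client-layer spare capacity carried as extra traffic, which the server layer preempts on every link its own recovery path claims. The function name and data layout are illustrative only.

```python
# Hypothetical mapping of client-layer (IP) links onto the server-layer
# (optical) links that carry them, loosely following Figure 6.14.
CLIENT_TO_SERVER = {
    "a-b": {"A-B"},
    "b-c": {"B-C"},
}


def client_recovery_survives(client_path, server_recovery_path, spare_is_extra_traffic):
    """Return True if the client-layer recovery path is still usable once the
    server layer has activated its own recovery path.

    When client-layer spare capacity is carried as extra (preemptable) traffic,
    every server link claimed by the server-layer recovery preempts it."""
    if not spare_is_extra_traffic:
        return True  # dedicated spare capacity: nothing is preempted
    claimed = set(server_recovery_path)
    # Any overlap between the client path's server links and the claimed
    # links means the client spare capacity on that link is preempted.
    return all(not (CLIENT_TO_SERVER[link] & claimed) for link in client_path)


client_path = ["a-b", "b-c"]       # client recovery path a-b-c
via_e = ["A-E", "E-D"]             # server recovery path A-E-D
via_b_c = ["A-B", "B-C", "C-D"]    # server recovery path A-B-C-D

print(client_recovery_survives(client_path, via_e, True))    # True
print(client_recovery_survives(client_path, via_b_c, True))  # False: destructive interference
```

The sketch deliberately ignores capacity counts and only checks link overlap, which is enough to show why rerouting over A-B-C-D defeats both recovery actions.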

Sequential Approach

A more efficient escalation strategy, in comparison with the uncoordinated

approach, is the sequential approach. Here the responsibility for the recovery is

handed over to the next layer when it is clear that the current network layer is not

able to do the recovery task. Instead of running uncoordinated recovery in several network layers, one ensures that a fault is not resolved in different layers at the same time (which could lead to race conditions) by imposing a chronological order on the recovery mechanisms. For this escalation strategy, two questions must be answered: in which layer to start the recovery process and when to escalate to the next layer. Two approaches exist, the bottom-up escalation strategy and the top-down escalation strategy, each having different variants.

Bottom-Up Escalation

With the bottom-up escalation strategy, the recovery starts in the bottom or lowest

detecting layer (being the network layer where the failure is detected, ensuring a fast

activation of the recovery mechanism) and escalates upwards. All affected traffic

that cannot be restored in this layer (e.g., because of capacity shortage) will be

restored in a higher layer. The advantage of this approach is that recovery actions

are taken at the appropriate granularity: First the coarse granularities are handled

(restoring the big connections), recovering as much traffic as soon as possible, and

recovery actions on a finer granularity (i.e., in a higher layer) only have to recover a small fraction of the affected traffic. This also implies that complex

secondary failures (as a consequence of the propagation of a root failure in the

higher network layers; see the section Survivability at the Top Layer) are handled

only when and if needed (e.g., if recovery in the lower layer because of the root

failure is not possible). In the client-server example of Figure 6.12, for instance, the root failure is the failure of OXC D. This corresponds to the

simultaneous failure of three IP links (a-d, a-c, and d-c) in the client layer. If the

server-layer recovery mechanism copes with the failure of OXC D, then the client-

layer recovery mechanism will only have to handle the recovery of the traffic over

the links a-d and d-c, being less complex than the simultaneous failure of three

links.
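The split of work in the Figure 6.12 example can be sketched as follows. This is a minimal Python illustration, assuming that each IP link is identified by the pair of OXCs at which it terminates, that link a-c transits OXC D without terminating there, and that the server layer always has enough spare capacity to reroute any lightpath that does not terminate on the failed OXC. All names are hypothetical.

```python
# Hypothetical view of the Figure 6.12 example: each IP link maps to the pair
# of OXCs at which it terminates (router x is attached to OXC X). Link a-c is
# assumed to transit OXC D without terminating there.
LOGICAL_LINKS = {
    "a-d": ("A", "D"),
    "a-c": ("A", "C"),
    "d-c": ("D", "C"),
}


def bottom_up(affected_links, failed_oxc):
    """Split the affected IP links between the two escalation phases.

    Phase 1: the server (optical) layer reroutes every affected lightpath
    around the failed OXC (spare capacity is assumed to be available).
    Phase 2: links that terminate on the failed OXC cannot be rerouted in the
    server layer and are escalated to the client (IP) layer."""
    server, client = [], []
    for link in affected_links:
        (client if failed_oxc in LOGICAL_LINKS[link] else server).append(link)
    return server, client


print(bottom_up(["a-d", "a-c", "d-c"], failed_oxc="D"))
# (['a-c'], ['a-d', 'd-c'])
```

The client layer is left with the two links that terminate on router d, rather than the three simultaneous link failures it would face without server-layer recovery.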

An example of the bottom-up interworking approach is shown in Figure 6.15,

where node D in the server layer fails. The server layer starts with the recovery

process, attempting to restore the logical link a-d. The server layer fails in this

recovery because this logical link terminates on the failing node D. As such, the

client-layer recovery scheme is triggered (the implementation of this trigger mechanism is discussed at the end of this subsection) to restore the corresponding affected traffic flow a-c (originally following the route a-d-c) by rerouting it over node b instead of node d.

An issue that must be handled in the bottom-up escalation strategy involves

how a network layer knows whether it is the lowest layer that detects the failure so

it can start the recovery or has to wait for a lower layer. Typically the fault signals

that are exchanged to indicate a failure will carry sufficient information so it can

be derived in which layer the failure occurred, and the recovery process can

start. Suppose, however, that this is not the case. Assume that we have a four-layer network in which a failure occurs in the bottom layer, that the failure is detected in all four layers simultaneously (i.e., with no delay in the propagation of the failure signals), and that it cannot be derived from those signals in which layer the failure occurred. Each of the higher layers may then consider itself the lowest detecting layer and start the recovery. This can be overcome by appropriately using hold-off timers (see the discussion later in this section), which are set progressively higher as we move upwards in the stack of layers: 0 milliseconds (ms) in the bottom layer, 20 ms in the second layer, 40 ms in the third layer, and 60 ms in the top layer. In this way, the recovery mechanisms in the higher layers wait, which gives the bottom layer (where the failure occurred) the chance to do the recovery.

Figure 6.15 The bottom-up escalation approach (phase 1: the recovery action in the server layer fails; phase 2: recovery action in the client layer).

Top-Down Escalation

With top-down escalation, it is the other way around. Recovery actions are now initiated in the top or highest possible layer, and the escalation goes downwards in the layered network. Only if the higher layer cannot restore all traffic are recovery actions in the lower network layer triggered. An advantage of this approach is that a higher layer can more easily differentiate traffic with respect to service types

so it can try to restore high-priority traffic first. A drawback of this approach,

however, is that a lower layer has no easy way to detect on its own whether a higher

layer was able to restore traffic (an explicit signal is needed for this purpose). So here the implementation is somewhat more complex, and it is not currently implemented. There is also an efficiency problem, because it is quite possible that, for example, 50% of the traffic carried by a wavelength channel in an optical network has already been restored by a higher network layer recovery mechanism; protecting this wavelength in the optical layer as well is then only useful for the other 50% of the carried traffic.

Implementation of an Escalation Strategy

In the previous subsections on escalation strategies it was mentioned that at one

point in the recovery process the recovery is handed over from one network layer to

another. The actual implementation of these escalation strategies (referring

to handing over the responsibility for recovery from one layer to the other one) is

another issue. Two possible solutions are described here (for ease of explanation, the bottom-up escalation strategy is assumed in what follows).

A first possible implementation solution is based on a hold-off timer Tw. Upon

detection of a failure, the server layer can start the recovery immediately, whereas

the recovery mechanism in the client layer has a built-in hold-off timer that must

expire before initiating the client-layer recovery process. In this way, if the fault is

already fixed by the server-layer recovery mechanism before the hold-off timer

expires, no client recovery action will take place. If this hold-off timer expires and

all or part of the traffic is not restored, then the client layer will take over

the recovery actions. This is very straightforward; however, the main drawback of

a hold-off timer is that recovery actions in a higher layer are always delayed,

independent of the failure scenario, because the hold-off timer must expire first.

Moreover, as discussed in a later section, one of the challenges is to determine the optimal value for Tw, a choice driven by a trade-off between recovery time on the one hand and network stability and performance on the other.

The second possible escalation implementation overcomes this delay by using a

recovery token signal between layers. This means that the server layer sends the

recovery token (by means of an explicit signal) to the client layer as soon as it knows it cannot recover (all or part of) the traffic. Upon receipt of this token, the client-layer recovery mechanism is initiated. This limits the traffic disruption time in case the server layer is unable to do the recovery. A disadvantage,

compared with the hold-off timer interworking, is that a recovery token signal

needs to be included in the standardization of the interface between network layers.

For the top-down approach, a hold-off timer is probably less appropriate, because

the lower layer must be notified with an explicit signal whether the higher layer

managed to restore the traffic or not.

Note that at the time of publication the timer-based approach is the only

approach currently available in commercial products and therefore used in

deployed networks.
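The difference between the two escalation implementations can be summarized as the moment at which the client layer starts recovering. The following minimal Python sketch assumes Tw = 60 ms and hypothetical server-layer timings (giving up after 5 ms for a node failure, repairing a span failure in 40 ms); it is an illustration, not a model of any product.

```python
def client_start_ms(scheme, server_recovery_ms=None, server_gives_up_ms=None, tw_ms=60):
    """Time (ms after failure detection) at which client-layer recovery starts,
    or None if the client layer does not act.

    scheme             -- "timer" (hold-off timer Tw) or "token" (recovery token)
    server_recovery_ms -- when the server layer completes recovery, if it can
    server_gives_up_ms -- when the server layer knows it cannot recover
                          (all or part of) the traffic, if it cannot"""
    if scheme == "timer":
        # The client only checks whether the fault persists when Tw expires.
        if server_recovery_ms is not None and server_recovery_ms <= tw_ms:
            return None
        return tw_ms
    if scheme == "token":
        # The client acts only on an explicit token from the server layer.
        return server_gives_up_ms
    raise ValueError(f"unknown escalation scheme: {scheme!r}")


# Hypothetical node failure: the server layer knows after 5 ms that it cannot help.
print(client_start_ms("timer", server_gives_up_ms=5))   # 60
print(client_start_ms("token", server_gives_up_ms=5))   # 5
# Hypothetical span failure that the server layer repairs in 40 ms.
print(client_start_ms("timer", server_recovery_ms=40))  # None
print(client_start_ms("token", server_recovery_ms=40))  # None
```

With the timer, the client always pays the full Tw when the server layer cannot help; with the token, it starts as soon as the server layer gives up, at the price of a standardized inter-layer signal.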



Integrated Approach

A more radical means to ensure coordination between the recovery mechanisms in

different layers is to combine the different recovery mechanisms in one integrated

multilayer recovery scheme. This implies that this recovery scheme has a full

overview of all the network layers and that it can decide when and in which layer

(or layers) to take the appropriate recovery actions. Although this approach is

clearly the most flexible from the recovery point of view, combining different

technologies in one mechanism is often unrealistic from a practical point of view.

Indeed, to profit from this high flexibility, one has to provide the necessary algo-

rithmic intelligence and/or complexity. Another issue is the implementation and

realization of such an integrated approach. It is unlikely that a single recovery scheme, controlling and having an overview of all network layers, will be developed for current overlay networks. However, this may become more feasible when considering peer-to-peer networks.

Supporting Spare Resources for Multilayer Recovery

Multilayer survivability involves more than just coordinating the recovery actions

in multiple layers. There is also the issue of the spare resources, and how they have

to be provided and used in an efficient way in the different layers of the network.

Several examples are given in Section 6.3, which elaborates on some case studies.

One way or another the logical (spare) capacity assigned to the recovery mecha-

nisms that are deployed at higher network layers must be transported at the lower

layer. Several ways exist to do this.

The most straightforward option is called double protection and is depicted in

Figure 6.16 for an IP-over-OTN network. Note that this figure (as well as Figures

6.17 and 6.18) is highly simplified to focus the discussion on the relevant aspects of

the spare capacity provisioning in multilayer networks. In reality of course there

will be more OXCs and fibers connecting them. Here the spare capacity that is

provisioned in the logical IP network to be used by the IP routing mechanism is

simply protected again in the underlying optical layer. Despite the reduced com-

plexity, this double protection is a rather expensive solution. In Figure 6.16, a

working logical IP link between the outer two router line cards (full line in the

figure) is protected by the logical IP spare link between the inner two router line

cards (dashed line in the figure). These logical IP links are implemented by two

lightpaths in the optical layer. Both these lightpaths are also protected in the optical

layer (top and bottom dashed lines in the OTN layer in the figure). A failure of the

fiber (carrying the lightpath of the working IP link) interconnecting both OXCs

would result in using the backup lightpath of the lightpath implementing the

working logical IP link. Only in the case that, for example, one of the outer router

line cards (on the working IP link) fails would the spare logical IP link be needed.

The added value of protecting and thus investing in an additional amount of spare

capacity in the optical layer is expected to be very low. Only if, for example, one of the outer router line cards and the top fiber (interconnecting both OXCs) fail simultaneously would this double protection add value. In some network scenarios (e.g., pan-European networks consisting of 20,000 kilometers [km] of fiber), simultaneous cuts of two fibers might become a concern. To benefit from the expensive double protection in such a situation, the lightpaths implementing the two logical IP links would need to overlap less. However, (optical) transport networks are typically sparse (e.g., having an average connectivity of less than three) and would not allow such non-overlapping routing. In summary, investing in double protection is very debatable and probably only meaningful in a few exceptional network scenarios.

Figure 6.16 shows one point-to-point example of this double protection. This is

of course valid for an entire network: if each IP link is protected in the optical layer and the IP network is traffic engineered to survive any single failure (implying that there is also spare capacity in the IP layer, which is in turn protected in the optical layer), the required capacity is more than twice what is actually needed, because backup capacity is provisioned in both network layers.

Figure 6.16 Option 1: Double protection. (IP layer over OTN layer; legend: working IP link, spare IP link, backup lightpath for the lightpath of the working IP link, backup lightpath for the lightpath of the spare IP link.)

A first possibility to save investment in physical capacity is to carry the spare capacity of the logical higher-layer network (allocated to the higher-layer network recovery techniques) as unprotected traffic in the underlying network layer or layers (see Figure 6.17 for the IP-over-OTN example). This strategy, called logical spare unprotected, still protects against any single failure: A cut of the bottom fiber (carrying the lightpath of the working IP link) would trigger the optical network recovery, whereas a failure of one of the outer router line cards would trigger the IP layer network recovery. A prerequisite for such a scenario is that the optical network supports both protected and unprotected lightpaths. It is crucial to guarantee that the unprotected spare lightpath carrying the spare capacity of the logical higher network layer (in Figure 6.17, these are the logical spare IP links) is not affected by the failure that triggers the IP layer network recovery (which actually uses this unprotected spare lightpath). Otherwise, the spare IP capacity would also become unavailable for the recovery of this failure, and the recovery process would fail.
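The requirement that the unprotected spare lightpath survive every failure that triggers IP-layer recovery amounts to a shared-risk check. A minimal Python sketch follows, assuming each failure is described by the set of resources (fibers, OXC ports, line cards) it disables. All resource and failure names are hypothetical, including the shared router slot used to show an unsafe case.

```python
def unsafe_failures(spare_resources, failures):
    """Return the failures (among those that trigger IP-layer recovery) that
    would also disable the unprotected spare lightpath.

    spare_resources -- resources used by the spare lightpath (fibers, ports, cards)
    failures        -- failure name -> set of resources it disables"""
    return sorted(name for name, hit in failures.items() if hit & spare_resources)


# Hypothetical resource names for the point-to-point example.
spare = {"fiber-top", "oxc1-port3", "oxc2-port3", "lc-inner-1", "lc-inner-2"}
ip_triggering_failures = {
    "outer line card 1": {"lc-outer-1"},
    "outer line card 2": {"lc-outer-2"},
    "shared router slot": {"lc-outer-1", "lc-inner-1"},  # shared-risk case
}
print(unsafe_failures(spare, ip_triggering_failures))  # ['shared router slot']
```

An empty result means that the IP-layer recovery can rely on the spare lightpath for every listed failure.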

One step beyond simply carrying the spare capacity of the logical higher network

layers as unprotected traffic in the underlying layer is to preempt this unprotected

traffic by the network recovery technique of the underlying network layer. This is the

common pool strategy, and an example is given in Figure 6.18 for an IP-over-OTN

network. The lightpath implementing the working logical IP link is optically pro-

tected. The lightpath implementing the spare logical IP link is then routed in the

(optical) spare capacity, which is needed to protect the aforementioned lightpath

(the one that implements the working logical IP link). Thus the backup lightpath of

the working IP link overlaps with the lightpath of the spare logical IP link. In case of a

failure of the fiber carrying the working logical IP link, the optical protection will be

triggered, preempting the lightpath implementing the spare logical IP link. In that

case, there is no problem preempting this lightpath because it is not needed in the

failure scenario. However, the preemption of lightpaths carrying logical spare cap-

acity requires additional complexity. In summary, the common pool strategy pro-

vides a pool of physical spare capacity that can be used by the recovery technique in

either the IP or the optical layer (but not simultaneously).

Figure 6.17 Option 2: Logical spare unprotected (IP-over-OTN example with a spare IP link).

The options logical spare unprotected and common pool, discussed above, were developed to reduce the amount of capacity to invest in when deploying a static multilayer survivability strategy. Let us give a flavor of these cost savings with

the following example on a realistic network scenario, being the Italian backbone

network (see left side of Figure 6.19). A simplified/linear cost model has been

adopted to quantify the network cost. Each fiber has been assigned a cost per

wavelength channel per kilometer to quantify the cost for the optical fibers and

the inline amplifiers that are installed every 70 km. This means that the total cost to

equip the fiber is evenly distributed over all wavelength channels possibly multi-

plexed. In the nodes, a cost per wavelength channel port has been assigned. This

includes the cost for an OXC (averaged over the three OXC sizes for which data was

available) plus a cost for the WDM line system. Finally, an IP layer cost has been

assigned; this is a cost per terminated wavelength channel and includes a router line

card cost and the cost for the OXC port to which the router line card is connected.

In summary, this cost model does not explicitly represent or take into account the

granularity of the system sizes (except the wavelength channels that are assumed to

have a capacity of 2.5 Gbps).
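The linear cost model described above can be sketched in Python. The unit costs below are hypothetical placeholders rather than the values used in the study, and the port counting (one wavelength port at each end of every fiber hop) and the treatment of a 1+1 backup that reuses the IP terminations are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    """Hypothetical unit costs (arbitrary currency units) for a linear model."""
    line_per_wl_km: float    # fiber + inline amplifiers, per wavelength channel per km
    node_per_wl_port: float  # OXC + WDM line system, per wavelength channel port
    ip_per_wl: float         # router line card + its OXC port, per terminated channel


def lightpath_cost(hop_km, costs, ip_terminated=True):
    """Cost of one wavelength channel routed over fiber hops of the given lengths.

    Every hop uses one channel on the fiber and one port at each of its two
    end nodes; an IP-terminated lightpath also pays the IP cost at both ends."""
    line = costs.line_per_wl_km * sum(hop_km)
    node = costs.node_per_wl_port * 2 * len(hop_km)
    ip = 2 * costs.ip_per_wl if ip_terminated else 0.0
    return line + node + ip


def protected_cost(working_km, backup_km, costs):
    """1+1 protected lightpath: the backup reuses the IP terminations."""
    return lightpath_cost(working_km, costs) + lightpath_cost(backup_km, costs, ip_terminated=False)


c = UnitCosts(line_per_wl_km=1.0, node_per_wl_port=50.0, ip_per_wl=400.0)
print(lightpath_cost([100, 200], c))              # 1300.0
print(protected_cost([100, 200], [150, 250], c))  # 1900.0
```

Because every term is linear, the cheapest 1+1 protected lightpath between two nodes corresponds to the shortest cycle through both of them, which is the property the design methodology relies on.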

Figure 6.18 Option 3: Common pool (IP-over-OTN example with a spare IP link).

The adopted design methodology is as follows. First, for each node pair, the cost to establish a 1+1 protected lightpath is computed. Because a linear cost model is applied, the shortest, and thus the cheapest, cycle containing both nodes has to be found. Knowing the cost for each possible logical IP link, the logical IP network topology is optimized to transport the offered traffic demand, the result of which is depicted on the right side of Figure 6.19. This result of course depends on the considered traffic pattern and volume; the traffic pattern is based on the assumption that most of the traffic is generated in the four major cities that have a connection to the commodity Internet and where the content servers are installed. Second, each IP router failure scenario is simulated; traffic is always rerouted along the remaining

shortest path according to the logical topology. The links in the logical network are

then dimensioned so that sufficient capacity is available to cope with each router

failure. Based on the design of the logical IP network, two optical traffic demands

are generated. The first one is called the logical working capacity (corresponding

to the link capacities, represented as thickness of the lines at the right side of

Figure 6.19). The second one contains the remaining capacity including the logical

spare capacity that is only needed in case of at least one router failure. In a final

step, the underlying network is dimensioned; for this purpose the shortest cycle

routing is adopted to compute the cost for each possible logical link. Both the

working and the backup path are equipped for the logical working capacity. In

addition, the capacity to support the logical spare capacity is also equipped. With

double protection, capacity is added on both the working and the backup path,

whereas with unprotected logical spare capacity, the capacity is only added on the

working paths. For the common pool strategy, the capacity along the working

paths is computed and compared to the optical spare capacity that has already been

installed to support the logical working capacity. Only the part of the former

capacity that cannot be transported in the latter capacity is added to the network

(thus, on each link the maximum instead of the sum of both capacities is installed).
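The per-link dimensioning rules of the three strategies can be written down compactly. This minimal Python sketch assumes that, on each optical link, the four loads (working and backup paths of the logical working capacity, and working and backup paths of the logical spare capacity) are already known; the three-link loads are invented for illustration and are not the Italian backbone results.

```python
def link_channels(w, b, s_work, s_backup, strategy):
    """Wavelength channels installed on one optical link.

    w        -- working paths of the logical working capacity
    b        -- backup paths of the logical working capacity (optical spare)
    s_work   -- logical spare capacity carried on its working paths
    s_backup -- protection of that logical spare capacity (double protection only)"""
    if strategy == "double protection":
        return w + b + s_work + s_backup
    if strategy == "logical spare unprotected":
        return w + b + s_work
    if strategy == "common pool":
        # Logical spare rides in the optical spare: maximum instead of sum.
        return w + max(b, s_work)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical per-link loads (w, b, s_work, s_backup) in wavelength channels.
LINKS = {"L1": (8, 4, 3, 2), "L2": (6, 5, 2, 3), "L3": (7, 2, 4, 1)}

for strategy in ("double protection", "logical spare unprotected", "common pool"):
    total = sum(link_channels(*loads, strategy) for loads in LINKS.values())
    print(strategy, total)
# double protection 47
# logical spare unprotected 41
# common pool 34
```

The ordering of the totals follows directly from the rules: dropping the protection of the logical spare removes one term, and the common pool replaces a sum by a maximum.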

The cost comparison between the three static multilayer survivability strategies is depicted in Figure 6.20; the nominal cost refers to the case where no recovery against router failures/isolations is considered and serves as the base for the comparison (100%). The figure clearly shows that an overall network cost reduction of 10% can be achieved by transporting the logical spare capacity as unprotected or as extra (i.e., unprotected and preemptable, as in the common pool strategy) traffic in the underlying optical network. Note, however, that this does not help reduce the dominant cost of connecting the IP routers to the optical network.

Figure 6.19 Network scenario: Optical transport network layer topology (left), and nominal Internet Protocol layer topology (right). Left panel: physical topology; right panel: nominal topology (optimal logical topology in failure-free condition). Nodes: Torino, Genova, Milano, Venezia, Trento, Bologna, Roma, Cagliari, Napoli, ReggioC, Palermo, Bari, Firenze, Pescara.

Figure 6.20 Static multilayer survivability strategies: A comparison based on capacity requirements. (Relative optical network cost, as a percentage of the nominal case, for the strategies IP spare protected, IP spare unprotected, and common pool; cost broken down into line cost, node cost, and trib cost.)

Trade-Off between Rerouting Time and Network Stability for Recovery in Multilayer Networks

In multilayer networks where recovery mechanisms are used at multiple layers, achieving the recovery objectives while ensuring network stability is of utmost importance. As described earlier, the uncoordinated approach implies that the client layer tries to recover from the failure as soon as the failure is detected, independently of the recovery actions triggered at the server layer. As already pointed out, this approach has many drawbacks. A more viable approach is the sequential bottom-up escalation approach. Only the timer-based variant is currently available in commercial products; the client layer waits for a hold-off timer (Tw) to elapse before triggering any recovery action, to give the server layer the opportunity to recover from the failure.

This illustrates the trade-off between recovery time on the one hand and network stability on the other. Indeed, if the timer Tw is set too small, this may lead to a so-called false-positive recovery action, where the client layer triggers its recovery mechanism before the server layer has completed its set of recovery actions. For instance, consider the example of an IP layer (as client layer) and a SONET network layer (as server layer). A more elaborate example is given in Section 6.3 with optical restoration (in the optical server layer) and MPLS TE Fast Reroute (in the IP/MPLS client layer). Suppose that the hold-off timer Tw is set to 60 ms and it turns out that the SONET protection cannot be completed within 60 ms (e.g., because the ring has long propagation delays and contains a large number of stations). Upon a link failure and after the timer Tw has elapsed,

the IP layer will trigger network convergence before the SONET layer has recovered the failed link. Consequently, every IP router will recompute its routing table, avoiding the failed link. This failed link, however, will effectively be restored a few tens of milliseconds later by the SONET protection. Once the link is restored, the network will have to reconverge so this link can be reused. If the timer had been set to a larger value, the SONET protection would have been able to do the recovery job, and the IP layer would not have been obliged to initiate any recovery and reconvergence actions. So, giving the timer Tw too small a value has the undesirable effect of two unnecessary network convergences and may provoke network congestion for some time (up to several seconds, depending on the revertive strategy) if the IP layer is not dimensioned to handle a single link failure without traffic congestion (at least for some flows). Moreover, it is usually highly desirable to use a dampening mechanism (which can be implemented at various layers), whose aim is to slow down the rate of state changes of a network element that is constantly flapping. Several examples have been given throughout this book. A typical example is the case of a flapping link (see Chapter 4, Section 4.4). In the IP layer, it is undesirable to generate a new IGP LSP at each link state change, so the LSP origination process is dampened until the network resource is considered sufficiently stable (various dampening algorithms have been described in Chapter 4, Section 4.4). In other words, because dampening is necessary to preserve network stability in case of unstable network elements, inappropriately declaring a resource down may lead to a situation where the resource, once restored, is not reused for some time.
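To make the dampening side effect concrete, here is a minimal Python sketch of penalty-based dampening in the style of BGP route flap damping (RFC 2439). This technique is swapped in purely for illustration; Chapter 4 describes the dampening algorithms the book actually discusses, and all parameter values (penalty 1000, suppress 2000, reuse 750, 15 s half-life) are assumptions.

```python
import math


class FlapDamper:
    """Penalty-based dampening with illustrative parameters (not from the text).

    Each 'down' event adds a fixed penalty that decays exponentially with the
    given half-life. Above `suppress`, state changes are no longer advertised
    until the penalty has decayed below `reuse`."""

    def __init__(self, penalty=1000.0, suppress=2000.0, reuse=750.0, half_life_s=15.0):
        self.step = penalty
        self.suppress = suppress
        self.reuse = reuse
        self.rate = math.log(2) / half_life_s
        self.penalty = 0.0
        self.last_t = 0.0
        self.suppressed = False

    def _decay_to(self, t):
        self.penalty *= math.exp(-self.rate * (t - self.last_t))
        self.last_t = t

    def link_down(self, t):
        """Record a failure of the link at time t (seconds)."""
        self._decay_to(t)
        self.penalty += self.step
        if self.penalty > self.suppress:
            self.suppressed = True

    def may_advertise(self, t):
        """True if a state change at time t may trigger a new LSP."""
        self._decay_to(t)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return not self.suppressed


d = FlapDamper()
for t in (0.0, 1.0, 2.0):      # three flaps in two seconds
    d.link_down(t)
print(d.may_advertise(2.0))    # False: suppressed
print(d.may_advertise(22.0))   # False: penalty still above the reuse threshold
print(d.may_advertise(32.0))   # True: penalty decayed below 750
```

A link that is declared down unnecessarily a few times in quick succession, for example because of an overly small Tw, is kept out of use for roughly half a minute in this sketch, even though the server layer restored it within milliseconds.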

As illustrated, setting Tw too small may have a negative impact on the network. On the other hand, setting Tw too large results in longer recovery times if the fault cannot be recovered in the server layer. Such a compromise is not always straightforward, and the recovery time objectives must be carefully balanced against network stability and performance in multilayer recovery networks. Of course, this is particularly true when the different recovery mechanisms operate on similar timescales (like SONET-SDH, optical, and MPLS TE Fast Reroute). The case of SONET-SDH protection in combination with IP routing not tuned for fast convergence is less problematic.
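The compromise can be made explicit by scoring candidate values of Tw against a set of failure scenarios. This minimal Python sketch assumes hypothetical server-layer recovery times (two fast rings, a long ring with many stations in two cases, and one failure the server layer cannot recover); it counts false positives and reports the delay Tw imposes on client recovery when the server layer cannot help.

```python
def evaluate_tw(tw_ms, server_recovery_ms):
    """Score one hold-off timer value Tw over a set of failure scenarios.

    server_recovery_ms -- per scenario, the time the server layer needs to
                          complete its recovery, or None if it cannot recover.
    Returns (false_positives, client_delay_ms):
      false_positives -- scenarios in which the client layer reacts although the
                         server layer recovers later (two needless convergences)
      client_delay_ms -- wait imposed on client recovery when the server layer
                         cannot recover (None if no such scenario)"""
    false_positives = sum(1 for t in server_recovery_ms if t is not None and t > tw_ms)
    has_unrecoverable = any(t is None for t in server_recovery_ms)
    return false_positives, (tw_ms if has_unrecoverable else None)


# Hypothetical scenarios: server-layer recovery times in ms, None = unrecoverable.
scenarios = [45, 55, 70, 80, None]
for tw in (50, 60, 100):
    print(tw, evaluate_tw(tw, scenarios))
# 50 (3, 50)
# 60 (2, 60)
# 100 (0, 100)
```

Raising Tw removes false positives only by lengthening the outage of every failure that must be handled in the client layer, which is exactly the trade-off described above.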

Network Operation Complexity

Although usually difficult to quantify, multilayer recovery approaches inevitably

increase the network operation complexity. Indeed, the various layers (optical, SONET-SDH, IP/MPLS) are usually managed by different teams or organizations within the operator. Hence, a network element failure requires close collaboration between these teams. Furthermore, the root failure may be quite difficult to determine during troubleshooting, because several recovery mechanisms are triggered at different layers. This might not be a showstopper but should be taken into account when opting for a multilayer recovery strategy.



Revertive Operation in Multilayer Networks with Multilayer Survivability

Various challenges of implementing multilayer recovery in multilayer networks

have been discussed in the previous subsections. Another interesting aspect relates to the so-called revertive operation, which concerns the reuse of a restored resource (i.e., after a failed resource has been repaired) in the network.

Usually a resource is reused at a client layer once the server layer has declared it

as operational again. For instance, if a SONET-SDH VC is restored, it is quite

common for the SONET-SDH layer to wait for 10 seconds before announcing the

link as operational to the IP layer. Consequently, the IP layer will start reusing

the link once a new IGP adjacency is reestablished with the adjacent router over the

restored link. Then if an additional layer like MPLS TE exists, the link will

eventually be reused to route TE LSPs if it offers a better path for some existing TE LSPs or if new TE LSPs are established in the network.

Network stability is of the utmost importance in data networks. Hence, it is

usually desirable to wait some time before reusing a restored resource, to ensure

that the resource is not in an unstable state and is not likely to fail again shortly.

Various mechanisms can be used: wait a fixed period of time without any failure (e.g., 10 seconds) before reusing a restored link, or wait for a dynamic timer that takes the link failure history into account.
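The two reuse policies can be contrasted in a few lines of Python. The fixed policy uses the 10-second example from the text; the dynamic policy (doubling the delay per recent failure within a 300 s window, capped at 160 s) is a hypothetical rule chosen for illustration, not a standardized or vendor-specific mechanism.

```python
def reuse_delay_s(failure_times_s, now_s, base_s=10.0, window_s=300.0, max_s=160.0):
    """Seconds to wait before reusing a restored link.

    A fixed policy would always return base_s (10 s in the text's example).
    This hypothetical dynamic policy doubles the delay for each additional
    failure recorded in the last window_s seconds, capped at max_s."""
    recent = sum(1 for t in failure_times_s if now_s - window_s <= t <= now_s)
    return min(base_s * 2 ** max(recent - 1, 0), max_s)


print(reuse_delay_s([100.0], now_s=110.0))              # 10.0: isolated failure
print(reuse_delay_s([20.0, 60.0, 100.0], now_s=110.0))  # 40.0: link has flapped
print(reuse_delay_s([100.0] * 6, now_s=110.0))          # 160.0: capped
```

An isolated failure falls back to the fixed 10-second wait, while a link with a recent history of failures is kept out of service progressively longer.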

Qualitative Performance Comparison of Some Recovery Strategies for Multilayer Networks

In Section 6.2.2, the use of single-layer survivability strategies in multilayer net-

works is discussed, highlighting a few shortcomings. Section 6.2.3 illustrates how

these disadvantages can be overcome by providing recovery mechanisms in differ-

ent layers and implementing an escalation strategy to ensure the correct interwork-

ing of these strategies.

The qualitative performance of the discussed survivability strategies can be

compared using a number of intuitive criteria, as shown in Table 6.2. Four strat-

egies are compared: bottom layer, bottom-up, top layer, and integrated approach.

Each of the used performance criteria (left-most column) indicates and measures a

certain aspect of the recovery process and is given a qualitative value. For example,

the failure coverage of a survivability strategy can be low, indicating that a small

number of failure types can be handled by the strategy, or high if a broad range of

failure types can be handled. Another performance criterion, the required bandwidth resources, indicates the extra capacity that is needed in the network to restore the traffic compared with the situation in which there is no extra capacity, and thus no traffic can be restored. The performance parameter coordination and

management refers to the escalation approach: It is high if an escalation approach is

in place. The strategy complexity, on the other hand, refers more to the complexity

of the survivability strategy when dealing with an actual failure. For example, with

the bottom-up strategy, an escalation strategy is installed, but then the survivability

process falls back to two single-layer restoration strategies (because the escalation

strategy defines and performs the transfer of the restoration process from one layer



to another). This makes the strategy complexity medium, but the coordination and management high (as a result of installing an escalation strategy). The preferred value, from the viewpoint of a network operator, for each of the performance criteria is shown in the right-most column of the table. This can be used as a benchmark for comparing the different strategies. Because most strategies have advantages and disadvantages, strategies can be selected according to their behavior on specific criteria that are important from the viewpoint of the operator or decision maker.

References [Gry98] and [Dem99] illustrate that the spare resource requirements

can be reduced for the case of multilayer survivability, by supporting higher layer

spare resources as extra traffic in the lower layer spare resources (this is the common

pool of spare resources). However, as mentioned in Section 6.2.2, a proper coordin-

ation of the recovery schemes becomes absolutely necessary in such a case.

6.2.4 Dynamic Multilayer Recovery

In the previous section, static multilayer recovery strategies have been discussed.

They are called static, because at the time of a failure the logical network topology

(in an IP-over-OTN network, this is the IP layer topology) is left unchanged (i.e.,

static), and no specific actions are taken to modify it. As such the logical network

must be provided with a recovery technique and the required spare resources, to be

able to survive router failures, for example.

Table 6.2 Comparison and Summary of Several Qualitative Performance Parameters for Some Significant Recovery Strategies

Performance Criteria           Bottom Layer  Bottom-Up  Top Layer  Integrated Approach  Preferred Value
Switching Granularity          Coarse        Coarse     Fine       Coarse               Coarse
Failure Scenario               Simple        Simple     Complex    Simple               Simple
Recovery Close to Root         Yes           Yes        No         Yes                  Yes
Failure Coverage               Low           High       High       High                 High
Coordination, Management       Low           High       Low        Low                  Low
Required Bandwidth Resources   Low           High       Low        Low/high             Low
Service Differentiation        Difficult     Difficult  Easy       Easy                 Easy
Strategy Complexity            Low           Medium     Low        High                 Low

(Adapted from D. Colle et al., "Data-centric optical networks and their survivability," IEEE Journal on Selected Areas in Communications, vol. 20, no. 1, January 2002.)

Dynamic multilayer survivability strategies, which are the subject of this section, differ from such static strategies in the sense that they actually use logical topology modification for recovery purposes. This requires the ability to set up and tear down, flexibly and in real time, lower layer network connections that implement

logical links in the higher network layer. As was discussed in Section 6.1, for

instance, optical networks will be enhanced with a control plane that gives client networks the ability to initiate the setup and teardown of lightpaths through the optical network. This capability could be used to reconfigure the logical IP network when it is affected by a network failure. This approach has the advantage that spare resources need not be established in advance in the logical IP network (at least no spare line capacity), so the underlying optical network need not care about how to treat (as protected, unprotected, etc.) these client-layer spare resources. In contrast with the static multilayer resilience schemes discussed in the previous section, there is thus no longer a requirement for established spare capacity in the logical IP layer. In the optical layer, however, spare capacity still has to be provided to deal with lower-layer failures such as cable cuts or OXC failures. Enough capacity is also needed in the optical layer to support the reconfiguration of the logical IP network topology and the traffic routed on that topology.

An illustration of such a dynamic reconfiguration of the logical higher-layer

topology in case of failures is given in Figure 6.21 for an IP-over-OTN network.

Initially, in a failure-free situation, the traffic flow from router a to router c is forwarded via the intermediate router b. To this end, the logical IP network contains the IP links a-b and b-c, implemented by the lightpaths A-B and B-C in the OTN network. When router b fails, routers a and c detect this failure and conclude that these two logical links are useless and can be torn down. They request this from the optical layer by sending a request through the UNI. This releases some capacity in the optical layer that can be used to set up a direct logical IP link from router a to router c. This setup is also requested from the underlying optical network through the UNI, by signaling the setup of a lightpath between OXCs A and C. So, at the time of the failure, the logical IP network topology is reconfigured. As mentioned before, this requires a special feature of the underlying optical network: it must be able to provide an SC service to the client network. ASONs, or more generally IONs, have this particular feature.
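The reconfiguration sequence of Figure 6.21 can be sketched as a small Python model. This is a toy illustration under stated assumptions, not an ASON/GMPLS implementation: the class `MultilayerNetwork`, its `uni_request` method, and the `transit_pairs` input are hypothetical names; UNI messages are only recorded in a log, and the optical route of the new lightpath is not computed.

```python
from dataclasses import dataclass, field


@dataclass
class MultilayerNetwork:
    """Toy IP-over-OTN model: each logical IP link is carried by one lightpath."""

    attachment: dict  # router -> OXC it is attached to, e.g. {"a": "A"}
    ip_links: set = field(default_factory=set)  # frozensets of two routers
    uni_log: list = field(default_factory=list)  # hypothetical UNI requests

    def uni_request(self, action, r1, r2):
        # Stand-in for a UNI message: only the lightpath end OXCs are logged.
        self.uni_log.append((action, self.attachment[r1], self.attachment[r2]))

    def on_router_failure(self, failed, transit_pairs):
        """Tear down the logical links of the failed router, then request
        direct links for router pairs whose traffic transited it."""
        for link in sorted(self.ip_links, key=sorted):  # iterate over a copy
            if failed in link:
                self.ip_links.discard(link)
                (survivor,) = link - {failed}
                self.uni_request("teardown", survivor, failed)
        for r1, r2 in transit_pairs:
            self.ip_links.add(frozenset((r1, r2)))
            self.uni_request("setup", r1, r2)


# Figure 6.21 scenario restricted to routers a, b, c: router b fails.
net = MultilayerNetwork(
    attachment={"a": "A", "b": "B", "c": "C"},
    ip_links={frozenset("ab"), frozenset("bc")},
)
net.on_router_failure("b", transit_pairs=[("a", "c")])
print(net.ip_links)
print(net.uni_log)
```

In a real network, the pairs whose traffic transited router b would come from the IP routing state rather than being passed in by hand.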

A key issue with this dynamic multilayer recovery strategy (taking the IP-over-OTN scenario as an example) involves the actual logical IP network topologies that will be used when failures occur. This implies that for dynamic multilayer recovery strategies the logical IP layer topology has to be dimensioned several times: there is one dimensioning exercise for the failure-free case (also called the nominal case) and then one dimensioning exercise for each possible IP router failure, each yielding the reconfigured topology to be used when that particular IP router failure occurs. This is illustrated on the right side of Figure 6.22. If there are, for example, four IP routers in the network, five IP layer dimensioning exercises must be performed. For each of these IP layer dimensioning exercises, the capacity needed in the underlying optical layer is calculated. Network survivability against OTN layer failures is guaranteed by using an appropriate resilience scheme in the optical layer. The resources needed



[Figure 6.21 artwork: IP layer (routers a, b, c, d, e) over OTN layer (OXCs A, B, C, D, E), shown before and after the failure of router b. Legend: lightpath implementing IP link; OTN link (optical fiber); IP link; traffic flow; IP link established after the failure; lightpath implemented at time of the failure.]

Figure 6.21 Illustration of dynamic multilayer survivability strategy.


in the OTN layer to be able to recover from all possible single (IP or OTN) failures can then be calculated as the worst-case resource requirements (e.g., IP router cards) of the OTN network over the failure-free scenario and all IP failure scenarios (i.e., the maximum capacity requirement over all these scenarios gives the actual dimensioning result).
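The worst-case rule can be written as a per-link maximum over scenarios. The following is a minimal Python sketch. It assumes that each scenario's OTN capacity needs are already known per link. The function name `worst_case_capacity` and all numbers are made up for illustration: four IP routers, hence five scenarios.

```python
def worst_case_capacity(scenarios):
    """Capacity to install on each OTN link: the maximum, over the
    failure-free scenario and every single IP router failure scenario,
    of what that scenario needs on the link."""
    installed = {}
    for needs in scenarios.values():
        for link, units in needs.items():
            installed[link] = max(installed.get(link, 0), units)
    return installed


# Made-up wavelength counts per OTN link for a four-router network:
# 1 nominal scenario + 4 router-failure scenarios = 5 dimensioning exercises.
scenarios = {
    "nominal": {"A-B": 2, "B-C": 2, "C-D": 1, "A-D": 1},
    "router a fails": {"A-B": 1, "B-C": 2, "C-D": 2, "A-D": 1},
    "router b fails": {"A-B": 1, "B-C": 1, "C-D": 2, "A-D": 3},
    "router c fails": {"A-B": 3, "B-C": 1, "C-D": 1, "A-D": 1},
    "router d fails": {"A-B": 2, "B-C": 3, "C-D": 1, "A-D": 1},
}
print(worst_case_capacity(scenarios))
# {'A-B': 3, 'B-C': 3, 'C-D': 2, 'A-D': 3}
```

No single scenario needs all of these maxima at once; the installed capacity is the envelope over all scenarios.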

In comparison, the left side of Figure 6.22 shows the way of calculating the

required OTN resources for a static multilayer recovery scheme (in the IP layer

some working and spare LSPs are shown; the topology has to be biconnected to

allow MPLS recovery of router failures). The bottom part of the figure then shows

the actual resource requirements on the OTN links for both strategies, showing an

improvement in the case of dynamic multilayer recovery. This theoretical result will

be confirmed by a case study at the end of this section.

We assume that the IP topology used during failure-free conditions is as

optimal as possible with respect to the traffic pattern and delay constraints. When

an IP router failure occurs, the network has to carry less traffic than in the failure-free nominal case, because all traffic originating or terminating in the failing router cannot be restored.

There are two possible approaches for the reconfiguration of the IP topology

during such a failure condition, the so-called global reconfiguration option and the

local reconfiguration option. In global reconfiguration, the goal is to have, at each moment, the optimal topology with respect to the new traffic pattern, that is, the pattern without the traffic entering or leaving the network via the failing router. For

every scenario (failure-free and every IP router failure), the IP topology is com-

pletely recomputed from scratch to obtain a new optimal topology that copes with

the particular failure. The remaining IP traffic is then rerouted over this new logical

IP topology. The local reconfiguration option potentially involves less reconfigu-

ration of the IP topology under failure conditions. In this case, when an IP router

fails, this router and its incident links are removed from the logical IP topology and

the remaining IP traffic (all IP traffic that did not terminate in the failing router) is

rerouted on this reduced topology. Link capacities can be upgraded or downgraded

as needed by the new routing of the (affected) traffic. For example, if we consider a

logical ring for the nominal or failure-free situation, this logical ring can become a

star topology when applying global reconfiguration. In the case of local reconfigur-

ation, however, this will remain a logical ring topology.
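Local reconfiguration can be sketched as follows. The sketch makes two assumptions: fewest-hop routing on the reduced logical topology (the text does not prescribe a routing rule), and one unit of traffic between every router pair of a four-router logical ring. The function names are hypothetical. Global reconfiguration, which would recompute the whole topology from scratch, is not modeled.

```python
from collections import deque
from itertools import combinations


def fewest_hop_path(adj, src, dst):
    """Breadth-first search; returns a list of routers or None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def local_reconfiguration(links, demands, failed):
    """Drop the failed router and its incident links, discard traffic that
    starts or ends there, and reroute the rest on the reduced topology.
    Returns the resulting load per remaining logical link."""
    adj = {}
    for u, v in links:
        if failed not in (u, v):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    load = {}
    for (src, dst), volume in demands.items():
        if failed in (src, dst):
            continue  # traffic of the failed router cannot be restored
        path = fewest_hop_path(adj, src, dst)
        if path is None:
            continue  # reduced topology is partitioned for this pair
        for u, v in zip(path, path[1:]):
            key = tuple(sorted((u, v)))
            load[key] = load.get(key, 0) + volume
    return load


ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
demands = {pair: 1 for pair in combinations("abcd", 2)}
print(local_reconfiguration(ring, demands, failed="b"))
# {('a', 'd'): 2, ('c', 'd'): 2}
```

In this ring example, the reduced topology is the remaining chain a-d-c, so each surviving link carries two units. The link capacities would then be upgraded or downgraded to match.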

With both reconfiguration approaches, the capacity requirements for the OTN

layer are determined for each failure and the failure-free scenario, and the resources

needed on each link in the optical layer are then calculated as the maximum of the

resources needed on that link over each of those scenarios.

Let us now compare the static multilayer recovery schemes described in Section 6.2.3 with the dynamic (ION) recovery schemes discussed above, using the same network scenario as in Figure 6.19. To obtain

the results for the dynamic multilayer recovery schemes (ION local reconfiguration

and ION global reconfiguration), the capacity demands of the IP layer topology

and related traffic pattern on the optical layer network are calculated for

the optimal nominal (failure-free) IP topology and for each of the 14 possible IP



[Figure 6.22 artwork: left, static multilayer recovery scheme (IP layer with working and spare LSPs over the OTN layer); right, dynamic (ION-based) multilayer recovery scheme, with one IP topology for the failure-free scenario and one for each single IP router failure scenario. Bottom: capacity needed on the OTN links, i.e., the worst-case capacity and resource requirements over all scenarios, to compare the static and dynamic results.]

Figure 6.22 Static multilayer resilience scheme (left) versus dynamic multilayer resilience scheme using ION flexibility (right). (Note that for reasons of simplicity, the [physical] OTN topology is assumed to be uniconnected, so a recovery scheme in the OTN layer is not possible. In reality, however, the OTN topology will be biconnected, which enables the use of an appropriate resilience scheme also in that layer.) (S. De Maesschalck, et al, ‘‘Intelligent optical networking for multilayer survivability,’’ IEEE Communications Magazine, vol. 40, no. 1, pp. 42–49, January 2002.)


router failure conditions. The underlying optical layer should be able to support

each of these failure conditions and the failure-free condition. Thus, the capacity

that needs to be installed on the links in the optical network is the maximum

capacity needed on those links over all these failure and the failure-free cases.

Figure 6.23 shows a cost comparison (relative to the nominal failure-free situation)

for the static recovery options using MPLS rerouting to protect against IP

router failures (see [Dem99] and [ColONDM01] for more information on this

recovery scheme in the MPLS layer) and for the dynamic options using ION

flexibility. In all options, recovery against single optical node or link failures is

provided using path protection in the optical layer. The total network cost is split into three parts: a line cost proportional to the length of the links, a node cost proportional to the number of wavelengths entering or leaving an OXC via an aggregate port, and a tributary cost for each IP router line card connected to an OXC.
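The three-part cost structure can be sketched as a small cost model. This is a hypothetical illustration: the unit prices, the class `CostModel`, the function `relative_cost`, and the network figures are assumptions, because the text gives only the structure of the cost, not its values. The line cost depends only on link length, as stated above.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Hypothetical unit prices; the text gives the cost structure, not values.
    per_km: float          # line cost per km of link
    per_wavelength: float  # node cost per wavelength on an OXC aggregate port
    per_line_card: float   # tributary cost per IP router line card on an OXC

    def total(self, link_km, aggregate_wavelengths, line_cards):
        line = self.per_km * sum(link_km)
        node = self.per_wavelength * sum(aggregate_wavelengths)
        tributary = self.per_line_card * line_cards
        return line + node + tributary


def relative_cost(model, scheme, nominal):
    """Cost of a scheme as a percentage of the nominal (failure-free) case."""
    return 100 * model.total(**scheme) / model.total(**nominal)


model = CostModel(per_km=1.0, per_wavelength=2.0, per_line_card=10.0)
nominal = {"link_km": [100, 200], "aggregate_wavelengths": [10, 10, 20], "line_cards": 12}
scheme = {"link_km": [100, 200], "aggregate_wavelengths": [15, 15, 30], "line_cards": 18}
print(relative_cost(model, scheme, nominal))
# 120.0
```

When `per_line_card` is high relative to the other prices, the tributary term dominates, so schemes that need fewer router line cards gain the most.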

Figure 6.23 confirms that for all strategies, the optical network needs to install

more capacity than for the support of the nominal logical IP network. In addition,

ION local reconfiguration is clearly the most cost-efficient multilayer recovery

scheme. The decreasing cost trend from ‘‘double protection’’ to ‘‘IP spare not

protected’’ to ‘‘common pool’’ was expected as the IP spare resources are supported

more and more efficiently by the OTN resources. The higher flexibility needed to

optimize the logical IP topology in each particular fault scenario in ‘‘ION global

reconfiguration’’ requires a higher amount of installed capacity and equipment in

the optical layer than ‘‘ION local reconfiguration,’’ making this global strategy

more expensive (even as expensive as the quite inefficient static ‘‘double protection’’

strategy). The ‘‘ION local reconfiguration’’ solution is less expensive than the ‘‘common pool’’ one. The main cost difference lies in the tributary cost: ‘‘ION local reconfiguration’’ needs fewer IP router line cards, and because this equipment is relatively expensive, this saving results in a considerable cost reduction.

[Figure 6.23 artwork: bar chart of the relative optical layer cost (percentage of the nominal case, axis from 0% to 160%) for each multilayer resilience scheme: ION global reconfiguration, double protection, IP spare not protected, common pool, and ION local reconfiguration. Each bar is split into line cost, node cost, and tributary cost.]

Figure 6.23 Cost comparison between static and dynamic multilayer resilience schemes. (S. De Maesschalck, et al, ‘‘Intelligent optical networking for multilayer survivability,’’ IEEE Communications Magazine, vol. 40, no. 1, pp. 42–49, January 2002.)

When looking at these results, however, one must take into account that the simple but straightforward methodology used to compute the needed resources in the ‘‘common pool’’ approach cannot always guarantee correct functioning of this multilayer recovery scheme. As long as a single line or router failure occurs, only one of the two recovery mechanisms is activated and there is no risk of interference. However, in the case of an OXC failure, the protection mechanisms in both layers are triggered: the optical path protection scheme for the flows transiting the failing OXC, and the IP/MPLS recovery for the flows transiting the isolated router. Simply taking the maximum of the protection capacity over both recovery schemes to calculate the needed spare resources in ‘‘common pool’’ (as we do) is thus not always appropriate, because both recovery schemes may compete for the same resources at the same time. Besides this, in the case of an OXC failure in the options ‘‘IP spare not protected’’ and ‘‘common pool,’’ the optical routes of the spare and working IP capacity may overlap. A proper, but surely more sophisticated, design approach could solve these problems. The corresponding additional operational complexity may not be that critical for static networks (e.g., manual provisioning) but becomes an increasingly important issue when evolving to more dynamic networks. The dynamic multilayer recovery schemes, however, do not suffer from these design disadvantages and can guarantee better fault coverage.

The reason is that in this case the spare capacity in the logical network does not

have to be designed in advance, but capacity is provisioned as needed and always

optically protected.
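The interference problem of the ‘‘common pool’’ sizing can be sketched for a single link. The sketch makes a strong simplification: each layer's spare need on that link is reduced to one number, and all values and the function name `common_pool_check` are hypothetical. Under an OXC failure, only some flows may actually be affected, so the simultaneous demand shown here is a worst case.

```python
def common_pool_check(optical_spare, ip_spare, failure):
    """Spare capacity on one link under a common pool sized as the maximum
    of the two layers' needs, versus what the failure actually triggers."""
    # Which layer's recovery fires, per failure type (from the text).
    triggered = {
        "line": (True, False),    # optical path protection only
        "router": (False, True),  # IP/MPLS recovery only
        "oxc": (True, True),      # both layers at the same time
    }
    optical_on, ip_on = triggered[failure]
    pool = max(optical_spare, ip_spare)
    demand = (optical_spare if optical_on else 0) + (ip_spare if ip_on else 0)
    return pool, demand, demand <= pool


for failure in ("line", "router", "oxc"):
    print(failure, common_pool_check(optical_spare=4, ip_spare=3, failure=failure))
```

Only the OXC case can exceed a pool sized with the maximum rule, because it is the only case in which both demands arise at the same time.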

Such a dynamic approach has the advantage of being highly efficient in terms of

required backup capacity. There are of course also some issues and challenges.

Compared to the static multilayer approach, the recovery time when a failure requires an IP network reconfiguration is likely to be significantly higher. Indeed, if we go back to the example of a router failure, with the static multilayer approach the routers adjacent to the failed or isolated router can quickly detect the failure, and the network can converge (find alternate paths for the impacted traffic flows) in a short period of time by means of fast IP routing or MPLS TE Fast Reroute. The dynamic multilayer recovery approach requires the IP router (or routers) to signal via the UNI the setup of new IP links, followed by routing and signaling in the optical layer and finally the setup of IGP router adjacencies over the newly established IP link (or links). In particular, such an approach requires several rules to prevent ‘‘false-positive’’ alarms that could lead to network instability. Indeed, upon a router failure, the network should be quickly reconfigured to limit the impact of traffic disruption and/or quality-of-service (QoS) degradation caused by congestion, but at the same time it would be undesirable to trigger a complex set of recovery mechanisms involving several layers for a temporary router failure. The trade-off between fast recovery time and network stability is therefore difficult to strike. Note also that such a dynamic multilayer recovery mechanism would still require some extra equipment capacity in the IP layer.



6.2.5 Summary

In the previous sections we discussed generic strategies for survivability in multi-

layer networks. These range from single-layer recovery schemes for multilayer

survivability to static multilayer recovery strategies to dynamic multilayer recovery

approaches. Figure 6.24 illustrates the different options and building blocks that are

possible for recovery in multilayer networks (based on reference [Dem99]).

6.3 Case Studies

In the previous sections of this chapter, we saw the various possible models of multilayer recovery networks, in which multiple recovery techniques can be combined to recover from network element failures. In this section, we propose several case studies corresponding to existing multilayer recovery networks.

Three case studies of interlayer recovery mechanisms are proposed. As already mentioned, at the time of publication, the only viable and deployed interlayer recovery strategy in use is the timer-based sequential bottom-up escalation approach, whereby each layer having a recovery mechanism, starting at the bottom layer, tries to recover from the detected failure. Once a failure is detected, each layer waits for a configurable timer to elapse before triggering any recovery action, to give a lower layer a chance to recover the fault. Taking the example of two layers (called the top and bottom layers), if the fault can be recovered in the bottom layer, the recovery time is as fast as possible. On the other hand, if the fault can only be recovered at the top layer, this induces some additional delays because

[Figure 6.24 artwork: building blocks for recovery in multilayer networks. Recovery Interworking Strategies; Static Multi-Layer Recovery Strategy (recovery at lowest layer, recovery at highest layer, recovery at multiple layers); Interworking Strategy (sequential: bottom-up, top-down, diagnostic; uncoordinated; integrated); Multi-layer Spare Capacity Design (common pool, logical spare unprotected, double protection, local reconfiguration, global reconfiguration); Single-Layer Recovery Options (link vs. path-based, centralized vs. distributed control, preplanned vs. dynamic route calculation, dedicated vs. shared backup facilities).]

Figure 6.24 Generic framework for multilayer survivability. (Adapted from P. Demeester, et al, ‘‘Resilience in multi-layer networks,’’ IEEE Communications Magazine, vol. 37, no. 8, August 1999, pp. 70–76.)



the timer has to elapse at the top layer before a recovery action is triggered. As already pointed out, the more predictable the maximum time the bottom layer needs to recover from a fault, the more adequately the timer can be adjusted. Undoubtedly, this approach offers the best guarantees in terms of network stability, avoiding any race condition between recovery mechanisms triggered simultaneously at different layers.
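The recovery-time behavior of the timer-based sequential bottom-up escalation approach can be sketched for two layers. This is a simplified model with assumptions: both layers are taken to detect the failure at the same instant, the function name `escalation_recovery_ms` is hypothetical, and the timing values in the example are made up.

```python
def escalation_recovery_ms(detect_ms, tw_ms, bottom_ms, top_ms):
    """Recovery time with timer-based sequential bottom-up escalation (two layers).

    bottom_ms: time the bottom layer needs to recover, or None if it cannot.
    top_ms: time the top layer needs once its timer Tw has expired.
    """
    if bottom_ms is not None and bottom_ms <= tw_ms:
        # Bottom layer recovers first; the top layer just clears its alarm.
        return detect_ms + bottom_ms
    # Bottom layer cannot recover (or is slower than Tw): the top layer acts
    # after Tw. In the slower-than-Tw case both layers react, which is the
    # race condition the timer is meant to avoid.
    return detect_ms + tw_ms + top_ms


print(escalation_recovery_ms(10, 60, 50, 20))    # fault recovered at bottom layer
print(escalation_recovery_ms(10, 60, None, 20))  # fault only recoverable at top
```

The second call shows the price of the approach: a fault that only the top layer can recover always pays the full Tw on top of detection and top-layer recovery.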

Three case studies are covered in this section:

1. Optical restoration and MPLS TE Fast Reroute

2. SONET-SDH protection and IP routing

3. MPLS TE Fast Reroute and IP routing

6.3.1 Case Study 1: Optical Restoration and MPLS Traffic Engineering Fast Reroute

In this case study, the optical network will use a restoration recovery mechanism to

handle both fiber failures and other optical equipment failures. In addition, MPLS

TE Fast Reroute is used as a protection mechanism to handle link failures occurring at the IP/MPLS layer (router interface failures) and IP/MPLS router failures. We

first describe each single-layer recovery mechanism, followed by the multilayer

aspects (Figure 6.25).

Single-Layer Recovery Mechanisms

1. Optical restoration: In this example, the optical network provides a restoration mechanism whereby, upon detection of a network element failure, a dynamic routing and signaling mechanism is responsible for restoring the affected set of lightpaths. To minimize the required backup capacity in the optical layer, the network is dimensioned to survive a single fiber or optical node failure. The use of an optical restoration mechanism certainly has the advantage of supporting shared optical backup capacity, which reduces the required backup capacity compared to a protection mechanism (e.g., 1+1 optical protection) for the same failure coverage. On the other hand, as with any other restoration mechanism, the recovery time is longer and less deterministic than with a protection mechanism. Indeed, once the fiber or optical network equipment failure has been detected and localized, the FIS must be flooded throughout the network until it reaches the node capable of recovering the traffic, which in turn recomputes an alternative path and finally resignals the optical path. Several commercial implementations exist that provide optical restoration based on proprietary protocols or G-MPLS/ASON. Note that the backup path computation time increases (sometimes nonlinearly) with the number of optical nodes, the network topology complexity, and the number of constraints taken into account when computing the path.

2. MPLS TE Fast Reroute: In this case study, a full mesh of MPLS TE LSPs is

established in the network (note that these TE LSPs can be with or without



constraints depending on the network requirements; see Chapter 5 for

a detailed discussion on the use of MPLS TE). These TE LSPs are signaled

as fast reroutable (i.e., local protection using MPLS TE Fast Reroute is

required in the case of network element failure). Hence, in this case study, at

each hop, a set of backup tunnels protecting against link and node failure is

presignaled. In the case of link or IP/MPLS node failure, the TE LSPs are

locally rerouted onto their respective backup tunnels (selected when the TE

LSPs are first signaled) within a very short time (on the order of 50 ms). In a

second step, these TE LSPs are potentially reoptimized along a more opti-

mal path by their respective head-end LSR.
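The presignaled backup tunnels can be illustrated with a short sketch. For each point of local repair (PLR) along a TE LSP, it computes an NHOP bypass path that avoids the protected link and an NNHOP bypass path that avoids the protected next-hop router. This is a simplification: it finds fewest-hop paths by breadth-first search on a hypothetical four-router topology. Actual implementations compute backup tunnels with CSPF and can take bandwidth and other constraints into account. The function names are illustrative only.

```python
from collections import deque


def bypass_path(adj, src, dst, avoid_node=None, avoid_link=None):
    """Fewest-hop path from src to dst that avoids a node and/or a link."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt in prev or nxt == avoid_node:
                continue
            if avoid_link is not None and {node, nxt} == set(avoid_link):
                continue
            prev[nxt] = node
            queue.append(nxt)
    return None


def backup_tunnels(adj, lsp):
    """For each hop of the LSP, the NHOP tunnel (protects the outgoing link)
    and, where a next-next hop exists, the NNHOP tunnel (protects the next router)."""
    plan = {}
    for i, plr in enumerate(lsp[:-1]):
        nhop = lsp[i + 1]
        plan[plr] = {"nhop": bypass_path(adj, plr, nhop, avoid_link=(plr, nhop))}
        if i + 2 < len(lsp):
            plan[plr]["nnhop"] = bypass_path(adj, plr, lsp[i + 2], avoid_node=nhop)
    return plan


# Hypothetical topology: a-b, b-c, c-d, d-a plus the diagonal b-d.
adj = {"a": {"b", "d"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"a", "b", "c"}}
print(backup_tunnels(adj, ["a", "b", "c"]))
```

For the LSP a-b-c, router a gets both an NHOP tunnel (a-d-b) and an NNHOP tunnel (a-d-c). Router b, whose next hop is the tail-end, gets only an NHOP tunnel.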

Interlayer Recovery Mechanisms

Let us now focus on the interlayer recovery aspects, and in particular on the following two:

• Set of recovery actions

• Required backup capacity

Set of Recovery Actions

The interlayer strategy adopted in this case study is the timer-based sequential bottom-up escalation approach, in which, upon detection of a link failure, the client layer (IP/MPLS in this case) waits for some timer Tw to elapse before triggering any recovery action, which gives the server layer (the optical layer) a chance to restore

[Figure 6.25 artwork: IP/MPLS layer (MPLS TE Fast Reroute; routers a, b, c, d) over optical layer (restoration; OXCs A, B, C, D, E). Legend: working path; recovery path of the a-b link after a failure of the fiber A-B; NHOP (next-hop) FRR backup tunnel protecting against a failure of the link a-b; NNHOP (next-next-hop) FRR backup tunnel protecting against a failure of the router b.]

Figure 6.25 Case Study 1: optical restoration with IP/MPLS FRR.



the failed resource. As already discussed, determining the optimal value of Tw may not be entirely straightforward. Ideally, Tw should be set to the upper bound of the restoration time, which is itself a function of the network topology and set of constraints (e.g., the set of affinities and minimization of the propagation delay), plus some fudge factor to take into account unpredictable additional delays that can occur with any restoration protocol. If Tw is set too small, there is clearly a risk of triggering IP/MPLS protection even though the link is about to be restored by the optical layer. On the other hand, if Tw is set too large, then failures that cannot be recovered in the optical layer (like an IP/MPLS node failure) will suffer from unnecessary additional recovery delays.
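The Tw trade-off can be made concrete with a small sketch. It assumes Tw is computed as the bounded restoration time plus a fudge factor, and checks the choice against a sample of observed restoration times. The function names, the sample values, and the 200 ms bound with a 50 ms margin are all made up for illustration.

```python
def choose_tw(restoration_bound_ms, fudge_ms):
    """Tw = bounded restoration time + a safety margin (fudge factor)."""
    return restoration_bound_ms + fudge_ms


def premature_frr_fraction(tw_ms, restoration_samples_ms):
    """Share of optical restorations that would complete only after Tw,
    i.e., cases where IP/MPLS FRR would fire needlessly."""
    late = sum(1 for t in restoration_samples_ms if t > tw_ms)
    return late / len(restoration_samples_ms)


samples = [120, 150, 180, 250, 400]  # made-up restoration times (ms)
for tw in (150, choose_tw(200, 50)):
    # Larger Tw: fewer needless FRR triggers, but each failure that only the
    # IP/MPLS layer can recover waits tw ms longer.
    print(tw, premature_frr_fraction(tw, samples))
```

Raising Tw from 150 ms to 250 ms cuts the share of needless FRR triggers in this sample from 0.6 to 0.2. In exchange, every failure that only the IP/MPLS layer can recover is delayed by an extra 100 ms.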

Let us analyze the different possible failures that can occur in the network and the set of recovery actions triggered in each case:

1. Link failure: As already mentioned, the optical restoration mechanism has

been given enough backup capacity to restore any affected lightpath upon a

single fiber or optical network element failure. Consequently, when a link

failure occurs, both the optical and the IP/MPLS layers will detect the failure (e.g., in the case of a fiber cut, the optical layer will detect the failure first and will immediately inform the IP/MPLS layer), but only the optical layer will trigger a restoration process, while the IP/MPLS layer will start the timer Tw. In the case of a single link failure, the optical layer will succeed in restoring the set of affected lightpaths before Tw expires, and the IP/MPLS layer will just clear the alarm.

2. Optical node failure: If the failed optical node does not have any IP/MPLS

router attached to it, then the optical layer will be able to restore all the

lightpaths traversing the optical node. On the other hand, if some routers are

connected to that optical node, the router may be isolated or may suffer

multiple link failures depending on the network configuration. In the former

case (IP/MPLS router isolated), the failure cannot be recovered in the

optical layer. After Tw has elapsed, the IP/MPLS Fast Reroute protection

will be triggered and the LSPs traversing the failing node will be rerouted

onto their respective next-next hop (NNHOP) backup tunnel. Of course the

traffic directed to the isolated node will be dropped because it cannot be

restored. The recovery time Tr in this case will be equal to the failure detection time plus Tw plus, potentially, the time for the IP/MPLS layer to effectively reroute the set of affected TE LSPs onto their respective backup tunnels.

3. Double link failures, IP/MPLS link failure, or IP/MPLS node failure: The

assumption has been made that the optical layer can recover from single

optical layer link failures. So if a second failure occurs, the recovery process

will have to be performed at the IP/MPLS layer and the recovery time will

still be equivalent to the previous case (Tr). The case of an IP/MPLS link

failure (e.g., caused by a router interface failure) is quite interesting; indeed,

the optical layer cannot recover from such a failure, although it can usually

detect it. In this case, the IP/MPLS layer will trigger FRR but still after Tw



because the IP/MPLS layer cannot unambiguously differentiate such a link

failure from a link failure that can be recovered in the optical layer. Finally,

let us now consider the case of an IP/MPLS node failure. We saw in Chapter

4 that there are several possible node failure scenarios that require different

sets of failure detection mechanisms (see Chapter 4 for an exhaustive list).

For instance, in the case of a power supply failure, all the attached links will

also fail; hence, the adjacent routers will detect the failure and will trigger

FRR after the timer Tw has elapsed (consequently the rerouting time will be

equal to Tr). Now, if the control plane of a centralized architecture IP/MPLS

router fails (which affects both the control plane and the data plane), the

links will not fail and other hello-based protocol mechanisms are required,

which will determine the total rerouting time, as discussed in Chapter 4

(in this case, the timer Tw does not come into play).
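The failure analysis above can be summarized as a lookup from failure type to recovering layer, plus the corresponding recovery time. The failure labels, the function name `case_study_1`, and all timing values are hypothetical. `hello_ms` stands for the detection time of the hello-based mechanisms of Chapter 4, and the optical restoration is assumed to finish before Tw.

```python
# Recovering layer per failure type in Case Study 1 (labels are hypothetical).
RECOVERING_LAYER = {
    "single_fiber": "optical",
    "optical_node_without_router": "optical",
    "optical_node_isolating_router": "ip_mpls",  # FRR onto NNHOP tunnels
    "double_link": "ip_mpls",
    "router_interface": "ip_mpls",  # optical layer detects but cannot recover
    "router_power_supply": "ip_mpls",  # all attached links go down
    "router_control_plane": "ip_mpls",  # links stay up; hellos detect it
}


def case_study_1(failure, detect_ms, tw_ms, optical_ms, frr_ms, hello_ms):
    """Return (recovering layer, recovery time in ms) for one failure."""
    layer = RECOVERING_LAYER[failure]
    if layer == "optical":
        return layer, detect_ms + optical_ms  # assumes optical_ms < tw_ms
    if failure == "router_control_plane":
        return layer, hello_ms + frr_ms  # no link alarm, so Tw never starts
    return layer, detect_ms + tw_ms + frr_ms  # Tr = detection + Tw + FRR switchover


timings = dict(detect_ms=10, tw_ms=300, optical_ms=200, frr_ms=20, hello_ms=1000)
for failure in RECOVERING_LAYER:
    print(failure, case_study_1(failure, **timings))
```

Only the control-plane case escapes Tw. In that case the hello-based detection time dominates instead.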

The set of recovery actions is illustrated in Figure 6.25: upon a fiber cut between the optical nodes A and B, the optical layer can restore the lightpath between the routers a and b (along the path A-E-C-B in the optical layer). If a second link failure occurs, or the optical node B fails, or the LSR b fails, then after the timer Tw has elapsed the LSR a triggers FRR.

Required Amount of Backup Capacity

The second aspect of such an interlayer recovery approach is the amount of backup capacity required in the network. Indeed, some network backup

capacity is required in the optical layer to restore the affected lightpaths from a

single fiber failure or an optical network element failure, but some backup capacity

is also required at the IP layer. Let us take the example of a link failure. Because

some link failures may only be recovered in the IP/MPLS layer (e.g., a router

interface failure), this requires provisioning some backup capacity not only in the

optical layer but also in the IP/MPLS layer. The case of double failures is another

example because the optical layer cannot recover from double failures in this case

study. The immediate consequence is that a protected network element like a link

requires some backup capacity in both layers. Strictly speaking, this is not what is

referred to as double protection and the amount of required backup capacity to

protect a link L can be reduced thanks to the notion of shared capacity in the

optical layer and the IP/MPLS layer. Moreover, as discussed in Chapter 5, an

interesting option is to protect just a proportion of the link capacity at the IP/

MPLS layer (e.g., if x% of the link capacity is used for the QoS-sensitive traffic, an

alternative is to compute a set of backup tunnels offering a capacity of x% instead

of the complete link capacity). However, unavoidably, some backup capacity will

be required in both layers to protect the same network element.
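The partial-protection option is simple arithmetic, sketched below. The function name and the numbers (a 10 Gb/s link and a 25% QoS-sensitive share) are assumptions for illustration. The optical-layer backup capacity for the same link comes on top of this figure.

```python
def ip_backup_bandwidth(link_capacity_gbps, protected_fraction=1.0):
    """Bandwidth of the backup tunnels protecting one link at the IP/MPLS
    layer when only a fraction x of its capacity (e.g., the QoS-sensitive
    share) is protected."""
    if not 0.0 <= protected_fraction <= 1.0:
        raise ValueError("protected_fraction must lie in [0, 1]")
    return link_capacity_gbps * protected_fraction


print(ip_backup_bandwidth(10.0))        # full protection
print(ip_backup_bandwidth(10.0, 0.25))  # only 25% QoS-sensitive traffic
```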

Important note: In terms of implementation, you may decide to implement the timer Tw either at the optical/SONET/SDH layer, where the server layer waits for Tw before informing the client layer of the failure, or at the client layer (IP/MPLS), where the IP/MPLS layer is immediately informed of the failure but waits for Tw before triggering any recovery action. Both approaches are functionally identical.

Such an interlayer recovery approach has been demonstrated in the context of the

European LION project (see [Cav IEEE]).

Summary

It has been shown that optical restoration and MPLS TE Fast Reroute can be used

in combination by using a timer-based sequential bottom-up escalation approach.

Such an application has several challenges, particularly in the evaluation of the timer value Tw, which may be quite difficult to determine optimally, but it has the benefit of avoiding highly undesirable race conditions between recovery

mechanisms acting at different layers, hence providing a solution that does not

compromise network stability, if adequately designed. It has also been shown that

careful design must be performed to minimize the required amount of backup

capacity at each layer to protect the same set of network elements.

6.3.2 Case Study 2: SONET/SDH Protection and IP Routing


Such a recovery strategy has been widely deployed in many IP/MPLS networks during the past several years: link failures are handled by the SONET/SDH layer, and other failures, like router interface failures or router failures, rely on IP routing to find an alternate path.

The trend is to move toward different network architectures not involving any

protection at the SONET/SDH layer for several reasons:

• It is not rare to have a high proportion of high-speed links (OC-48 and OC-192) in operators’ backbone networks. Relying on SONET/SDH usually requires some relatively expensive equipment, and the optical layer is more suitable to deliver such high-speed links.

• SONET/SDH protection (as described in Chapter 2) implies dedicating a significant proportion of the total bandwidth to protection.

• Fast recovery techniques such as fast IP routing and MPLS TE Fast Reroute have emerged that provide fast recovery times (in the case of MPLS TE Fast Reroute, similar to those of SONET/SDH protection).

That said, though relatively expensive, such a recovery strategy (relying on SONET/

SDH protection) has proven its efficiency.

As in the previous case study, a timer-based sequential bottom-up escalation approach is adopted, in which a timer Tw is started at the IP layer once the alarm is received from the SONET/SDH layer. Compared to the previous case (where a restoration mechanism was used in the server layer), the value of the timer Tw is easier to determine because the server-layer recovery time is more deterministic. As mentioned in the SONET/SDH specification, the maximum recovery time in an MS-SPRing is 60 ms (10 ms of detection + 50 ms of recovery time), provided that the ring distance does not exceed 1200 km, the number of stations is less than 16, and the ring is idle before the protection. If those conditions are respected, then Tw can be safely set to 60 ms.

If not, Tw must be increased accordingly, but in any case, the recovery time with a

protection scheme will be more deterministic than with a restoration mechanism.
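The rule for choosing Tw can be written as a small function. The following Python sketch is illustrative only. It hard-codes the figures quoted above: 10 ms detection, 50 ms switching, at most 1200 km, fewer than 16 stations and an idle ring. The `RingParams` type and the `hold_off_timer_ms` name are assumptions, not part of any SONET/SDH specification or router configuration. So is the fallback to a measured worst-case recovery time for out-of-spec rings.

```python
from dataclasses import dataclass
from typing import Optional

# Figures quoted in the text for an MS-SPRing (assumed constants).
DETECTION_MS = 10
SWITCH_MS = 50
MAX_RING_KM = 1200
MAX_STATIONS_EXCLUSIVE = 16


@dataclass
class RingParams:
    """Hypothetical description of the ring that protects an IP link."""
    ring_distance_km: float
    num_stations: int
    idle_before_switch: bool


def hold_off_timer_ms(ring: RingParams,
                      measured_worst_case_ms: Optional[int] = None) -> int:
    """Return the IP-layer hold-off timer Tw in milliseconds."""
    in_spec_tw = DETECTION_MS + SWITCH_MS  # 60 ms
    within_spec = (ring.ring_distance_km <= MAX_RING_KM
                   and ring.num_stations < MAX_STATIONS_EXCLUSIVE
                   and ring.idle_before_switch)
    if within_spec:
        return in_spec_tw
    if measured_worst_case_ms is None:
        # Outside the specified conditions the 60 ms bound is not guaranteed.
        raise ValueError("ring outside spec: supply a measured worst-case recovery time")
    # Increase Tw to cover the measured recovery time, never going below 60 ms.
    return max(in_spec_tw, measured_worst_case_ms)


print(hold_off_timer_ms(RingParams(900, 8, True)))  # 60
```

Raising an error for an out-of-spec ring without measurements is deliberate: the text only says that Tw "must be increased accordingly," so the sketch refuses to guess a value.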

Interlayer Recovery Mechanisms

Set of recovery actions: The set of recovery actions upon a link failure is quite straightforward. The SONET/SDH layer recovers the affected VCs within a short time frame (usually within 60 ms, as mentioned earlier). Some failures cannot be recovered by the SONET/SDH layer: a router interface failure, a router node failure or multiple failures in the SONET/SDH layer. In those cases, IP routing triggers a network convergence, as described in Chapter 4.
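The sequential bottom-up escalation described above can be sketched as follows. This is a hypothetical model, not a router implementation. The parameter `igp_convergence_ms` stands for whatever time IP routing needs once it is triggered. The "race" outcome shows what happens when Tw is set shorter than the SONET/SDH recovery time, which is exactly what a correctly chosen Tw avoids.

```python
from typing import Optional, Tuple


def escalation_outcome(tw_ms: int,
                       sonet_recovery_ms: Optional[int],
                       igp_convergence_ms: int) -> Tuple[str, int]:
    """Which layer(s) react to a failure, and when traffic is restored (ms).

    sonet_recovery_ms is None when SONET/SDH cannot recover the failure
    (router interface or router failure, multiple SONET/SDH failures).
    """
    if sonet_recovery_ms is None:
        # Tw expires with the failure still present: escalate to IP routing.
        return "IP only", tw_ms + igp_convergence_ms
    if sonet_recovery_ms <= tw_ms:
        # The server layer recovered first; IP never reacts.
        return "SONET/SDH only", sonet_recovery_ms
    # Tw too short: IP starts reconverging although SONET/SDH also recovers.
    return "race: both layers react", min(sonet_recovery_ms, tw_ms + igp_convergence_ms)


print(escalation_outcome(60, 45, 500))    # ('SONET/SDH only', 45)
print(escalation_outcome(60, None, 500))  # ('IP only', 560)
```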

Backup capacity: In the previous case study, we mentioned that provisioning backup capacity in both layers may be necessary because not all failures can be recovered in a single layer. The operator must make this decision. One could instead decide that, in the vast majority of cases, failures are link failures in the server layer (SONET/SDH), so no backup capacity is required in the IP layer.

Under that choice, the following events are assumed to be rare enough that dedicating backup bandwidth in the IP layer is not justified:
• multiple failures in the SONET/SDH layer;
• router interface failures (see note 79);
• IP router failures.

If such an event occurs, the IP layer may suffer from congestion. Its effects on sensitive traffic (such as voice) can be reduced by QoS mechanisms (see Chapter 4 for more details).

Revertive mode: In most cases, SONET/SDH alarms that result from defects are held for 10 seconds after the defect clears. In other words, the SONET/SDH layer waits 10 seconds after the VC has recovered before declaring it "up." This ensures that link instability (sometimes referred to as flapping) does not cause network instabilities.
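The 10-second hold-off on alarm clearing behaves like a simple damping state machine. The Python class below is a minimal sketch under that reading. The class and method names are hypothetical: real equipment implements this in its SONET/SDH alarm-processing logic, not through any such API.

```python
from typing import Optional


class ClearHoldOff:
    """Declare a VC up only once its defect has stayed clear for hold_s
    seconds (10 s in the text). All times are in seconds."""

    def __init__(self, hold_s: float = 10.0) -> None:
        self.hold_s = hold_s
        self.defect_present = False
        self.cleared_at: Optional[float] = None
        self.up = True

    def defect(self, t: float) -> None:
        # A new defect takes the VC down and cancels any pending clear.
        self.defect_present = True
        self.cleared_at = None
        self.up = False

    def clear(self, t: float) -> None:
        # Start the hold-off period; the VC is not declared up yet.
        self.defect_present = False
        self.cleared_at = t

    def is_up(self, t: float) -> bool:
        if (not self.up and not self.defect_present
                and self.cleared_at is not None
                and t - self.cleared_at >= self.hold_s):
            self.up = True
        return self.up
```

The key behavior is shown by a flap: a defect at 0 s clears at 1 s, a new defect at 6 s clears at 7 s, and the VC is only reported up from 17 s onward. The second defect restarts the hold-off period.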

Summary

Though usually expensive, this interlayer recovery scheme has been widely deployed in many networks. Because the recovery time of SONET/SDH protection is relatively predictable and deterministic, Tw can be computed quite easily. In the case of a link failure in the SONET/SDH layer, the link is restored without any impact on IP routing. If a failure cannot be recovered in the SONET/SDH layer, IP routing is triggered and the network converges. As for backup capacity, this case study assumed that failures not recoverable in the SONET/SDH layer are rare enough that dedicating backup capacity in the IP layer is not justified.

79 Note that router interface failure may also be handled using Automatic Protection Switching.



6.3.3 Case Study 3: MPLS Traffic Engineering Fast Reroute (Link Protection) and IP Rerouting Fast Convergence


In this case study, the routers are interconnected by unprotected lightpaths: a fiber cut or an optical network element failure is not protected at the optical layer. Furthermore, there are several SRLGs in the network (several lightpaths are routed through common equipment).

MPLS TE Fast Reroute handles link failures in either of two layers:
• the optical layer (e.g., fiber cut, optical equipment failure);
• the IP/MPLS layer (e.g., router interface failure).

This recovery strategy is very likely to become widely adopted. Several existing networks have already deployed this model, with variants in whether Fast Reroute protects links or nodes and in the IP parameter settings.

We saw in Chapter 5 that there are several possible deployment scenarios for

MPLS TE Fast Reroute.

Scenario 1: The first option is to have a full mesh of MPLS TE LSPs between

the core routers. Note that these TE LSPs may have multiple constraints (e.g.,

bandwidth and affinities) or could just be unconstrained, in which case they just

follow the IGP shortest path.

Scenario 2: The second option is to deploy unconstrained one-hop primary

tunnels (for link protection) that will be quickly fast rerouted onto presignaled

backup tunnels in the case of a failure.

In this case study, MPLS TE is not required for bandwidth optimization or strict QoS guarantees. Moreover, fast recovery is required only in the case of a link failure. Hence, the decision is made to use scenario 2. For each link to be protected, the following set of MPLS TE tunnels is deployed:

• An unconstrained one-hop primary tunnel routed onto the protected link.

• A next-hop (NHOP) backup tunnel whose path is computed automatically by the point of local repair (PLR) so that the backup tunnel is SRLG diverse from the protected link. In other words, the NHOP backup tunnel follows the shortest path (based on the IGP metric) between the PLR and the next hop that avoids every link sharing at least one SRLG with the protected link. This should be treated as an additional constraint in CSPF. It is mandatory because fast recovery is required in the case of an SRLG failure.
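The SRLG-diverse NHOP backup path computation amounts to a shortest path search on a pruned graph. The sketch below assumes that each link is described by its IGP metric and a set of SRLG identifiers, and that links are bidirectional with symmetric metrics. The function name and data layout are illustrative, not a vendor CSPF API. The protected link must be passed with the same orientation as its key in `links`.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

Link = Tuple[str, str]
Links = Dict[Link, Tuple[int, FrozenSet[int]]]  # link -> (IGP metric, SRLG ids)


def srlg_diverse_nhop_path(links: Links, plr: str, nhop: str,
                           protected: Link) -> Optional[List[str]]:
    """Shortest IGP path from plr to nhop that avoids the protected link and
    every link sharing at least one SRLG with it (the extra CSPF constraint)."""
    excluded_srlgs = links[protected][1]
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for (u, v), (metric, srlgs) in links.items():
        if (u, v) == protected or srlgs & excluded_srlgs:
            continue  # pruned: not usable by the backup tunnel
        adj.setdefault(u, []).append((v, metric))  # links are bidirectional
        adj.setdefault(v, []).append((u, metric))

    dist = {plr: 0}
    prev: Dict[str, str] = {}
    heap = [(0, plr)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == nhop:  # first pop of the destination gives its shortest path
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in adj.get(node, []):
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None  # no SRLG-diverse path exists


# Hypothetical topology: D-H shares SRLG 1 with the protected link D-E.
links = {("D", "E"): (1, frozenset({1})), ("D", "H"): (1, frozenset({1})),
         ("H", "E"): (1, frozenset()), ("D", "C"): (1, frozenset()),
         ("C", "B"): (1, frozenset()), ("B", "G"): (1, frozenset()),
         ("G", "H"): (1, frozenset())}
print(srlg_diverse_nhop_path(links, "D", "E", ("D", "E")))  # ['D', 'C', 'B', 'G', 'H', 'E']
```

The shorter path D-H-E is rejected because D-H shares SRLG 1 with D-E. A `None` result is the situation covered by the alternatives discussed next.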

Note: Networks are usually designed to survive a single SRLG failure. In other words, an SRLG failure should not result in a disconnected graph in which some destinations become unreachable. However, in some double-failure situations, no SRLG-diverse path may exist. There are then several possible alternatives:

• The PLR relaxes the SRLG-diversity constraint to find a path for the NHOP backup tunnel. Such a backup tunnel is still useful in the case of a router interface failure or a single link failure.

• The PLR tries to find a path that minimizes the number of links having at least one SRLG in common with the protected link. Alternatively, it avoids paths that have a high number of SRLGs in common with the protected link.
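The second alternative can be approached with a lexicographic shortest path search. It first minimizes the number of links that share an SRLG with the protected link, then the IGP cost. This is only one possible way to realize the idea, sketched with the same hypothetical data layout (each link mapped to its metric and SRLG set). It is not presented as the algorithm any particular PLR uses.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

Link = Tuple[str, str]
Links = Dict[Link, Tuple[int, FrozenSet[int]]]  # link -> (IGP metric, SRLG ids)


def least_shared_srlg_path(links: Links, plr: str, nhop: str,
                           protected: Link) -> Optional[List[str]]:
    """Fallback NHOP backup path avoiding only the protected link itself,
    ranked by (links sharing an SRLG with it, IGP cost), compared lexicographically."""
    risky = links[protected][1]
    adj: Dict[str, List[Tuple[str, int, int]]] = {}
    for (u, v), (metric, srlgs) in links.items():
        if (u, v) == protected:
            continue
        shared = 1 if srlgs & risky else 0  # 1 if this link shares an SRLG
        adj.setdefault(u, []).append((v, shared, metric))
        adj.setdefault(v, []).append((u, shared, metric))

    best: Dict[str, Tuple[int, int]] = {plr: (0, 0)}
    prev: Dict[str, str] = {}
    heap = [(0, 0, plr)]
    while heap:
        shared, cost, node = heapq.heappop(heap)
        if node == nhop:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if (shared, cost) > best[node]:
            continue  # stale heap entry
        for nbr, s, metric in adj.get(node, []):
            cand = (shared + s, cost + metric)
            if cand < best.get(nbr, (float("inf"), float("inf"))):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand[0], cand[1], nbr))
    return None
```

Counting shared links is a design choice. Weighting each link by the size of its SRLG intersection with the protected link would capture the "number of SRLGs in common" variant instead.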

The assumption is made that node failures are rare enough to tolerate longer recovery times, so IP routing is used to handle IP/MPLS node failures. The required IP rerouting times are obtained by tuning the IS-IS parameters. As explained in Chapter 4, a few parameters must be tuned to meet the rerouting time objective, as follows:

lsp-gen-interval 5 200 500

The command takes three parameters, written here as A B C (A = 5, B = 200, C = 500). They have the following effects:

B = 200 ms is the amount of time the router waits after the first link failure has been detected before originating a new link state packet (LSP; see note 80). This value is appropriate because there are multiple SRLGs in this network. Waiting 200 ms before originating a new LSP maximizes the chance of capturing an accurate network topology change in a single link state packet.

C = 500 ms is the amount of time the router waits before advertising a second LSP if a second local state change occurs.

A = 5 seconds is the maximum amount of time between two successive LSP originations under the exponential back-off algorithm described in Chapter 4.

spf-interval 5 100 200

prc-interval 5 100 200

Because there are multiple SRLGs in this network, it is advisable to set the initial wait before triggering the SPF computation to 100 ms. This increases the probability that the computing router has received all the LSPs resulting from an SRLG failure before recomputing its routing table.
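The lsp-gen-interval, spf-interval and prc-interval settings all follow the exponential back-off pattern described in Chapter 4. The sketch below assumes the following behavior:
• the first event waits for the initial wait;
• the second event waits for the second wait;
• each subsequent wait doubles until it is capped at the maximum wait.

The reset of the timer after a quiet period is deliberately not modeled, and the function name is hypothetical.

```python
from typing import List


def backoff_waits(n_triggers: int, max_wait_ms: int,
                  initial_wait_ms: int, second_wait_ms: int) -> List[int]:
    """Wait applied before each of n_triggers back-to-back triggering events
    (assumed doubling behavior; quiet-period reset not modeled)."""
    waits = []
    for i in range(n_triggers):
        if i == 0:
            wait = initial_wait_ms
        else:
            # Double from the second wait, capped at the maximum wait.
            wait = min(second_wait_ms * 2 ** (i - 1), max_wait_ms)
        waits.append(wait)
    return waits


# lsp-gen-interval 5 200 500 (maximum wait in seconds, the others in ms)
print(backoff_waits(6, 5 * 1000, 200, 500))  # [200, 500, 1000, 2000, 4000, 5000]
# spf-interval 5 100 200
print(backoff_waits(7, 5 * 1000, 100, 200))  # [100, 200, 400, 800, 1600, 3200, 5000]
```

Short initial waits give fast reaction to an isolated failure. The doubling protects the control plane when a link keeps flapping.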

It is also recommended to enable incremental SPF (iSPF). In most cases, it significantly reduces the SPF computation time after a network topology change, and consequently helps reduce the IP recovery time.

Interlayer Recovery Mechanisms

Set of Recovery Actions

So what happens when a link fails? The IP/MPLS layer detects the failure through the optical layer, which may itself use SONET/SDH framing. As soon as it does, MPLS TE Fast Reroute is triggered. Unlike in the two previous case studies, no recovery mechanism is deployed in the optical or SONET/SDH layer. Therefore no timer-based delay is required, and FRR protection must be triggered immediately.

Because all the traffic is carried over a single one-hop LSP, this LSP is rerouted onto its SRLG-diverse NHOP backup tunnel within a few tens of milliseconds. In a second step, the protected LSP is reoptimized to follow the shortest path between the PLR and its neighbor. This operation occurs immediately after the LSP has been locally rerouted onto its backup tunnel. It does not disrupt traffic, thanks to the "make-before-break" procedure detailed in Chapter 5.

80 Note that we use the IS-IS terminology here.

A node failure also implies the failure of its attached links. MPLS TE Fast Reroute is therefore triggered as soon as the failure is detected, since the PLR cannot distinguish a link failure from a node failure. Because FRR is used for link protection only, only NHOP backup tunnels have been configured in the network. In the case of a node failure, MPLS TE Fast Reroute therefore simply reroutes the one-hop TE LSP onto its NHOP backup tunnel. Obviously, this does not recover the traffic, because that backup tunnel terminates on the failed node.

In this case study, node failures are handled by IP routing, in three steps:
1. As soon as the IGP detects the failure, each router adjacent to the failed node originates a new link state packet (LSP). It does so after a delay set by the IGP timer settings described earlier.
2. The new LSP is flooded throughout the network.
3. Each router receiving the new LSP waits 100 ms, then triggers a new SPF and recomputes its routing table.

At this point, the network has converged (Figure 6.26).
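The node-failure convergence sequence can be turned into a back-of-the-envelope estimate by summing its sequential steps. Only the 200 ms LSP generation wait and the 100 ms SPF wait come from the configuration above. The detection, flooding, SPF run and FIB update times are hypothetical inputs that depend on the network and the platform.

```python
def node_failure_convergence_ms(detection_ms: float,
                                lsp_gen_wait_ms: float = 200,
                                flooding_ms: float = 0,
                                spf_wait_ms: float = 100,
                                spf_run_ms: float = 0,
                                fib_update_ms: float = 0) -> float:
    """Estimated IP convergence time after a node failure, as the sum of the
    sequential steps: detection, LSP origination wait, flooding, SPF wait,
    SPF computation, and forwarding table update."""
    return (detection_ms + lsp_gen_wait_ms + flooding_ms
            + spf_wait_ms + spf_run_ms + fib_update_ms)


# Hypothetical figures: 20 ms detection, 30 ms flooding, 50 ms SPF, 100 ms FIB update.
print(node_failure_convergence_ms(20, flooding_ms=30, spf_run_ms=50, fib_update_ms=100))  # 500
```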

[Figure 6.26 panels: first, the 1-hop fast reroutable primary tunnel carrying the traffic from D to E, and B1, the NHOP backup tunnel protecting against a failure of the link D-E (B1 is SRLG diverse from the link D-E); second, the 1-hop primary tunnel rerouted onto B1 and then reoptimized along the path D-C-B-G-H-E. Nodes A to K and S; legend: SRLG (Shared Risk Link Group), routing decision, data flow.]

Figure 6.26 Case Study 3: Fast Reroute link protection + IP routing fast convergence.

The link failure case has a particularly interesting aspect. Without any specific countermeasure, a second traffic disruption occurs after FRR has recovered the link failure. It is caused by loops that result from a temporary lack of synchronization between the routers' link state databases (such temporary loops are studied in detail in Chapter 4).

As shown in Figure 6.26, when the link fails, the protected one-hop primary tunnel between nodes D and E is fast rerouted onto the NHOP backup tunnel within 50 ms. Hence, the traffic traversing the link D-E is recovered within 50 ms. After some time determined by the IGP settings mentioned earlier, IS-IS converges, but temporary loops may occur for a short period. For example, in Figure 6.26, router D may converge before router C, which results in a temporary loop. As explained in detail in Chapter 4, traffic may be dropped for as long as such a temporary loop persists.
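The temporary loop can be reproduced by tracing packets through forwarding tables in which one router has converged and its neighbor has not. The forwarding entries below are hypothetical. They are chosen only to match the sequence of events described for Figure 6.26: D already routes to E via C, while C still routes to E via D.

```python
from typing import Dict, List, Tuple

Fib = Dict[str, Dict[str, str]]  # router -> {destination: next hop}


def trace(fibs: Fib, src: str, dst: str) -> Tuple[List[str], str]:
    """Follow next hops from src toward dst and report the outcome:
    'delivered', 'loop' (a router is revisited) or 'blackhole' (no route)."""
    path = [src]
    node = src
    while node != dst:
        nxt = fibs.get(node, {}).get(dst)
        if nxt is None:
            return path, "blackhole"
        path.append(nxt)
        if nxt in path[:-1]:
            return path, "loop"
        node = nxt
    return path, "delivered"


# D has converged (E via C); C still uses its pre-failure route (E via D).
during = {"D": {"E": "C"}, "C": {"E": "D"}}
# Once C has converged as well.
after = {"D": {"E": "C"}, "C": {"E": "B"}, "B": {"E": "G"},
         "G": {"E": "H"}, "H": {"E": "E"}}
print(trace(during, "D", "E"))  # (['D', 'C', 'D'], 'loop')
print(trace(after, "D", "E"))   # (['D', 'C', 'B', 'G', 'H', 'E'], 'delivered')
```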

This is an interesting fact because it highlights dependencies between the IP and MPLS TE Fast Reroute recovery mechanisms. The goal is to limit the traffic disruption upon a link failure to 50 ms, with no additional disruption caused by IP routing. To guarantee this, the IGP must be enhanced to avoid creating temporary loops when a link protected by a local protection mechanism such as MPLS TE Fast Reroute fails. Such an enhancement requires two things: the ability for IP to signal that a link is protected by FRR, and SPF algorithm enhancements that avoid creating temporary loops in such cases. The ability to signal that a link is protected by a local protection mechanism has been proposed in [FRR-PROT].

This raises an interesting question: Why declare the link down in the IP layer if it is protected by Fast Reroute? Why not simply follow the backup path without triggering any IP rerouting?

This is indeed another possible solution. It can be deployed in practice with existing commercial implementations thanks to the concept of Forwarding Adjacency (FA), which allows an MPLS TE LSP to be advertised as a link in the IP layer.

Let us consider what happens with FA when a protected one-hop tunnel is rerouted upon a link failure. Once the link D-E fails, the one-hop tunnel is locally rerouted onto its backup tunnel. Shortly after, the protected one-hop TE LSP is reoptimized along another path. Throughout, the protected TE LSP stays up. Because TE LSPs are advertised as links, every other router in the network still sees the link D-E in the up state, so IP never triggers any recovery process. This is illustrated in Figure 6.27.

The drawback of this approach is that the rerouted LSPs may follow a suboptimal path compared with the one IP rerouting would have selected. For instance, assume that all links have a cost of 1. The flows between nodes B and E then follow the path B-C-D-C-B-G-H-E. There is no loop here, because the IP packets are carried over the MPLS TE LSP between nodes D and E. Without FA, by contrast, IP routing converges and B then routes the traffic to E via G, along the path B-G-H-E.

Note that if SONET/SDH protection had been used in place of MPLS TE FRR, the physical path followed by the recovered link might have been similar to the MPLS TE backup tunnel path.
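The cost of keeping the link up as a forwarding adjacency can be made concrete by expanding each FA hop of an IP route into the physical path of its TE LSP. The IP route B-C-D-E and the reoptimized tunnel path D-C-B-G-H-E are taken from the example above. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def expand_over_fa(ip_path: List[str],
                   fa_lsps: Dict[Tuple[str, str], List[str]]) -> List[str]:
    """Physical path of an IP route when some of its hops are forwarding
    adjacencies: each FA hop is replaced by the path of its TE LSP."""
    physical = [ip_path[0]]
    for hop in zip(ip_path, ip_path[1:]):
        lsp_path = fa_lsps.get(hop, list(hop))  # plain link if not an FA
        physical.extend(lsp_path[1:])
    return physical


# The D-E tunnel stays up as an FA but now follows D-C-B-G-H-E.
with_fa = expand_over_fa(["B", "C", "D", "E"], {("D", "E"): ["D", "C", "B", "G", "H", "E"]})
print(with_fa, len(with_fa) - 1)  # ['B', 'C', 'D', 'C', 'B', 'G', 'H', 'E'] 7
```

The expanded path has 7 hops, against 3 hops for B-G-H-E once IP routing converges without FA.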

Backup capacity: The backup capacity required in the IP and MPLS layers has been discussed extensively in Chapters 4 and 5. As a reminder, there are several possible strategies in such an interlayer case study:

1. The IGP metrics are tuned by means of an off-line optimization tool to provide bandwidth guarantees upon link and node failures. To limit the required backup capacity, the bandwidth guarantee during failures can be restricted to some flows (sensitive traffic such as voice) using QoS mechanisms such as Diffserv. In this case, MPLS TE Fast Reroute is used only to minimize packet loss (hence guaranteeing fast recovery) upon link failure; in this case study, FRR is used for link protection only. In other words, the only constraint when computing the backup tunnel path is to find an SRLG-diverse path. Backup capacity is required only in the client layer (IP).

2. Another possibility is to reserve some backup capacity for the NHOP backup tunnels. A bandwidth guarantee is then provided along the backup tunnel paths, either for the whole link bandwidth or for some pool of bandwidth. The operator can then choose between two options. One is to decide that node failures are rare enough that dedicating backup capacity in the IP layer is not justified. The other is to also reserve some backup capacity in the IP layer for node failures, which unavoidably means reserving backup capacity in both layers to cover similar failures.
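For the second strategy, the backup capacity to reserve on each link depends on whether backup tunnels can share capacity. This sketch makes an assumption that is not stated in the text above: only one failure occurs at a time. Backup tunnels protecting different failures are then never active together, so each link needs only the largest backup load that any single failure places on it. The function name and the failure identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def shared_backup_capacity(
        backups: Dict[str, List[Tuple[List[str], float]]]) -> Dict[Link, float]:
    """Backup bandwidth to reserve per directed link under a single-failure
    assumption. backups maps a failure identifier to the (path, bandwidth)
    of every backup tunnel that this failure activates."""
    required: Dict[Link, float] = defaultdict(float)
    for tunnels in backups.values():
        # Load placed on each link when this particular failure occurs.
        load: Dict[Link, float] = defaultdict(float)
        for path, bw in tunnels:
            for link in zip(path, path[1:]):
                load[link] += bw
        # Different failures never coincide, so keep the maximum, not the sum.
        for link, bw in load.items():
            required[link] = max(required[link], bw)
    return dict(required)


cap = shared_backup_capacity({
    "D-E": [(["D", "C", "B", "G", "H", "E"], 10.0)],
    "B-H": [(["B", "G", "H"], 4.0)],
})
print(cap[("B", "G")])  # 10.0 (instead of 14.0 without sharing)
```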

[Figure 6.27 panels: the fast rerouted primary tunnel path over the physical topology (nodes A to K and S), and the IP routing topology.]

Figure 6.27 Case Study 3: Fast Reroute link protection + IP routing fast convergence with FA.



Reuse of a restored resource: In our previous example, all the traffic from router D to router E travels over the protected primary TE LSP between those two nodes. Once the link D-E is restored, a new IGP adjacency is established. However, traffic only resumes on the link D-E once the primary TE LSP is reoptimized along this shorter path. The head-end LSR D can reoptimize the primary TE LSP as soon as the link is restored and the IGP adjacency is operational. Thanks to the IGP dampening mechanism, a flapping link will not be reused immediately (see Chapter 4 for more details on the algorithm). Alternatively, the operator can adopt a timer-based approach in which the tunnel is reoptimized periodically. This reduces the risk of immediately reusing a restored link that is still flapping.

Summary

In this case study, we saw that MPLS TE Fast Reroute can provide fast recovery (within 50 ms) upon link failure, whether in the optical layer or the IP layer. It works in conjunction with IP routing, which recovers from IP/MPLS node failures. In such a multilayer recovery strategy, FRR must be triggered without any delay, as soon as the link failure is detected. If the failure cannot be recovered by FRR (in the case of a node failure), IP takes the appropriate recovery actions after some timers, determined by the IGP settings, have elapsed. Network stability is preserved by various dampening mechanisms at both layers (FRR reoptimization and IP routing). Various approaches are possible to minimize the required backup capacity. Because the backup capacity is provisioned in the upper layers, it can also be allocated at a fine granularity, with bandwidth guarantees determined on a per-flow basis, which further reduces the required backup capacity.

6.4 Conclusion

In previous chapters, survivability and recovery mechanisms were discussed from the viewpoint of one network technology, and thus within a single network layer (e.g., IP routing in the IP layer or 1+1 optical protection in the OTN layer). In the first part of this chapter, we highlighted the current evolution from static networks to intelligent optical networks (IONs) featuring a distributed control plane, which the multilayer recovery strategies later in this chapter relied on. Within the ITU-T, a framework for such Automatic Switched Optical Networks (ASONs) is under standardization. Meanwhile, the Generalized Multi-Protocol Label Switching (G-MPLS) protocol suite, under standardization in the IETF, is the most likely solution for implementing an ION. An example of optical restoration in such G-MPLS networks was given.

The integration of different network technologies, such as IP and OTN, into (realistic) multilayer transport networks offers new opportunities and challenges for the survivability of such networks. This was the subject of the second part of this chapter. A generic description of survivability in multilayer networks was given, covering three main categories:
• single-layer recovery in multilayer networks;
• static multilayer recovery in multilayer networks;
• dynamic multilayer recovery in multilayer networks.

The first category covered strategies that apply a single-layer recovery mechanism to provide survivability in the multilayer network. In these strategies, recovery is strictly limited to one layer of the network when coping with network failures.

The second category goes a step further: recovery mechanisms run in different layers of the network in reaction to a single network failure. The layer in which the affected traffic is recovered depends on the circumstances, such as the failure type or the timing constraints. This requires coordination rules, known as an escalation strategy, to ensure efficient interworking between the network layers involved in the recovery process. Several such escalation strategies were discussed.

Part of this chapter focused on the challenges an operator would face when implementing a multilayer recovery strategy. These include avoiding racing conditions between recovery mechanisms at different layers and optimizing the required backup capacity. Network stability, network operation complexity and revertive operation were also discussed. These multilayer recovery strategies are called static because the logical network topology is left unchanged and no specific actions are taken to modify it.

In the third category, that of dynamic multilayer survivability strategies, the logical topology is modified for recovery purposes. This requires the ability to set up and tear down, flexibly and in real time, the lower layer network connections that implement logical links in the higher network layer. For example, optical networks will be enhanced with a control plane that lets client networks initiate the setup and teardown of lightpaths through the optical network. This capability could be used to reconfigure the logical IP network when it is affected by a network failure. Dynamic multilayer survivability strategies have their own challenges, in particular in terms of complexity, and will not be available in the short term.

This chapter concludes with three case studies illustrating realistic multilayer recovery deployment strategies:
• The first case study combines optical restoration with MPLS TE Fast Reroute local protection.
• The second illustrates a deployment found in many networks, combining SONET/SDH protection with IP routing.
• The third shows how MPLS TE Fast Reroute can be used in conjunction with IP routing.

Each case study starts with an analysis of how each recovery mechanism operates. It then describes in detail the recovery actions taken upon a network element failure in such a multilayer recovery strategy. Throughout these case studies, particular emphasis is placed on the network design considerations needed to avoid racing conditions, maximize network stability and optimize the required amount of network capacity.




Bibliography

[Ala03] W. Alanqar, et al, "Requirements for generalized MPLS (GMPLS) routing for automatically switched optical network (ASON)," Internet draft: draft-ietf-ccamp-gmpls-ason-routing-reqts-01.txt, December 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[ALGO-1] M. Garey, D. Johnson, "Computers and intractability: a guide to the theory of NP-completeness," New York, NY, Freeman, 1979.

[ALGO-2] R. Ahuja, T. Magnanti, J. Orlin, "Network flows," Prentice Hall, Englewood Cliffs, NJ, 1993.

[ALGO-3] V. Vazirani, "Approximation algorithms," Springer Verlag, New York, NY, 2001.

[ALGO-4] C. Papadimitriou, K. Steiglitz, "Combinatorial optimization: Algorithms and complexity," Dover, Mineola, NY, 1998.

[Ari00] P. Arijs, et al, "Planning of WDM rings networks," Photonic Network Communications Magazine, vol. 2, no. 1, January 2000.

[Ari01] P. Arijs, "Planning of ring-based telecommunication networks," PhD thesis, Ghent University, Ghent, Belgium, 2000–2001.

[Ari1/00] P. Arijs, M. Gryseels, P. Demeester, "Planning of WDM ring networks," Photonic Network Communications Magazine, vol. 2, no. 1, January 2000, pp. 33–51.

[Ari7/00] P. Arijs, et al, "Design of ring and mesh based WDM transport networks," Optical Networks Magazine, vol. 1, no. 2, July 2000, pp. 25–40.

[Ari96] P. Arijs, "Development of algorithms for optimal ring selection within an SDH network topology," M. Sc. Thesis, Ghent University, Ghent, Belgium, 1995–1996.

[Ari97] P. Arijs, et al, "The design of SDH ring networks using tabu-search and simulated annealing," paper presented at the 5th International Conference on Telecommunication Systems: Modelling and Analysis, Nashville, TN, March 1997.

[Ari98] P. Arijs, et al, "SDH protection in long distance networks: a practical case study," DRCN'98, Brugge, Belgium, May 17–20, 1998.

[ARPA-1] J.M. McQuillan, D.C. Walden, "The ARPANET design decisions," Computer Networks, vol. 1, no. 5, August 1977.

[ARPA-2] J.M. McQuillan, I. Richer, E. Rosen, "An overview of the new routing algorithm for the ARPANET," ACM SIGCOMM Computer Communication Review, ACM Press, vol. 25, no. 1, January 1995, pp. 54–60.



[ARPA-3] J.M. McQuillan, I. Richer, E.C. Rosen, "ARPANET routing algorithm improvements: first semiannual technical report," BBN report no. 3803, April 1978.

[ARPA-4] J.M. McQuillan, I. Richer, E.C. Rosen, D.P. Bertsekas, "ARPANET routing algorithm improvements: second semiannual technical report," BBN report no. 3940, October 1978.

[ARPA-5] E.C. Rosen, J. Herman, I. Richer, J.M. McQuillan, "ARPANET routing algorithm improvements: third semiannual technical report," BBN report no. 4088, April 1979.

[ARPA-6] J.M. McQuillan, I. Richer, E. Rosen, "ARPANET routing study: final report," BBN report no. 3641, September 1977.

[ARPA-7] W.E. Naylor, L. Kleinrock, "On the effects of periodic routing updates in packet switched networks," Conference Record, National Telecommunications.

[ARPA-8] E.C. Rosen, "The updating protocol of the new ARPANET routing algorithm," submitted to Fourth Berkeley Conference on Distributed Data Management and Computer Networks.

[Aut02] A. Autenrieth, A. Kirstadter, "Engineering end-to-end IP resilience using resilience-differentiated QoS," IEEE Communications Magazine, vol. 40, no. 1, January 2002.

[Bat02] P. Batchelor, et al, "Study on the implementation of optical transparent transport networks in the European environment: results of the research project COST 239," Journal Photonic Network Communications, vol. 2, no. 1, January-March 2002.

[Ben01] G. Bennet, "The layperson's guide to optical networking," tutorial, third workshop on Design of Reliable Communication Networks (DRCN) 2001, Budapest, Hungary, October 2001.

[BFD] W. Katz, "Bidirectional forwarding detection," Internet draft: draft-katz-ward-bfd, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Bon01] P. Bonenfant, "Short course on optical networking, architectures, standards, protection & restoration," European Conference on Optical Networking (ECOC) 2001, Amsterdam, The Netherlands, September 2001.

[BP-PLACEMENT] J.L. Le Roux, "A method for an optimized online placement of MPLS bypass tunnels," Internet draft: draft-leroux-mpls-bypass-placement, October 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Car97] T.J. Carpenter, et al, "Demand routing and slotting on ring networks," DIMACS Technical Report 97-02, January 1997.

[Cav IEEE] C. Cavazzoni, et al, "The IP/MPLS over ASON/GMPLS testbed of the IST Project LION," Journal of Lightwave Technology, vol. 11, November 2003.

[Cho03] J.K. Choi, et al, "General Switch Management Protocol (GSMP) v3 for optical support," Internet draft: draft-ietf-gsmp-optical-spec-02.txt, June 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Col00] D. Colle, et al, "Comparison of architectures for stacked ring network featuring compact add/drop multiplexers," DRCN'00, Munich, Germany, April 9–12, 2000.

[Col02] D. Colle, "Design and evolution of data-centric optical networks," PhD thesis, Ghent University, Ghent, Belgium, 2001–2002.




[Col02] D. Colle, et al, "Data-centric optical networks and their survivability," (invited) IEEE Journal on Selected Areas in Communications, vol. 20, no. 1, January 2002, pp. 6–20.

[ColONDM01] D. Colle, et al, "Porting MPLS recovery techniques to the MPLambdaS paradigm," Optical Networks Magazine, vol. 2, no. 4, July/August 2001, pp. 29–47.

[ColPNC01] D. Colle, et al, "MPLS recovery mechanisms for IP-over-WDM networks," Photonic Network Communications, Kluwer Academic Publishers, vol. 3, no. 1/2, January 2001, pp. 23–40.

[COMP-NETWORKS] L. Peterson, B. Davie, "Computer networks: a systems approach," Morgan Kaufmann, San Francisco, CA, 2003.

[Cos94] S. Cosares, I. Saniec, "An optimisation problem related to balancing loads on SONET rings," Telecommunication Systems, vol. 3, no. 2, November 1994.

[Dem99] P. Demeester, et al, "Resilience in multi-layer networks," IEEE Communications Magazine, vol. 37, no. 8, August 1999, pp. 70–76.

[Dem99] P. Demeester, IEEE Communications Magazine, special issue on survivable communication networks, vol. 37, no. 8, August 1999.

[DeM02] S. De Maesschalck, et al, "Intelligent optical networking for multilayer survivability," IEEE Communications Magazine, vol. 40, no. 1, pp. 42–49, January 2002.

[DeM03] S. De Maesschalck, et al, "Pan-European optical transport networks: an availability based comparison," Photonic Network Communication, vol. 5, no. 3, May 2003, pp. 203–225.

[DeM04] S. De Maesschalck, et al, "Advantages of intelligent optical networks," IEEE Communication Magazine, submitted.

[DIFFSERV-DEPLOY] J. Evans, C. Filsfils, "Deploying Diffserv in multiservice IP backbone networks for tight SLA."

[DS-TE] F. Le Faucheur, et al, "Requirements for support of Differentiated Services-aware MPLS Traffic Engineering," RFC 3564, Internet draft: draft-ietf-tewg-diff-te-reqts-06.txt, July 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Dwi00] A. Dwivedi, R. Wagner, "Traffic model for USA long-distance optical network," Proceedings of the Optical Fiber Conference (OFC) 2000, Baltimore, MD, March 2000, vol. 1, TuK1-1, pp. 156–158.

[E800] ITU-T Recommendation E.800, "Terms and definitions related to quality of service and network performance including dependability," ITU-T Standardization Organization, August 1994. Available at: www.itu.int. Accessed May 2004.

[Ell03] G. Ellinas, et al, "Routing and restoration architectures in mesh optical networks," Optical Network Magazine, vol. 4, no. 1, January/February 2003, pp. 91–106.

[ETSI1] "Transmission and multiplexing (TM); generic requirements of transport functionality of equipment; part 1-1: generic processes and performance," ETSI EN 300 417-1-1 V1.2.1, October 2001.



[ETSI2] "Transmission and multiplexing (TM); Synchronous Digital Hierarchy (SDH); Network protection schemes; interworking: rings and other schemes," ETSI TS 101 010 v1.1.1, November 1997.

[EWD-1166] E.W. Dijkstra, "EWD-1166," November 1993. Available at: www.cs.utexas.edu/users/EWD/ewd11xx/EWD1166.PDF. Accessed May 2004.

[FACILITY-BACKUP] J.P. Vasseur, et al, "MPLS traffic engineering fast reroute: bypass tunnel path computation for bandwidth protection," Internet draft: draft-vasseur-mpls-backup-computation, November 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[FAST-REROUTE] P. Pan, et al, "Fast reroute techniques in RSVP-TE," Internet draft: draft-ietf-mpls-rsvp-lsp-fastreroute, May 2004, work in progress. Available at: www.ietf.org. Accessed May 2004.

[FM-RECOV] V. Sharma, F. Hellstrand, RFC3469, "Framework for Multi-Protocol Label Switching (MPLS)-based recovery," Internet draft, February 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[FRED] D. Lin, R. Morris, "Dynamics of random early detection."

[FRR-IN-USE] "IS-IS Link attribute TLV," Internet draft: draft-vasseur-isis-link-attibute, May 2004, work in progress. Available at: www.ietf.org. Accessed May 2004.

[G7041] ITU-T Recommendation G.7041/Y.1303, "Generic framing procedure," ITU-T Standardization Organization. Available at: www.itu.int. Accessed May 2004.

[G7042] ITU-T Recommendation G.7042/Y.1305, "Link capacity adjustment scheme for virtual concatenated signals," ITU-T Standardization Organization, May 2002. Available at: www.itu.int. Accessed May 2004.


[G707] ITU-T Recommendation G.707/Y.1322, ‘‘Network node interface for the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.

[G709] ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.

[G783] ITU-T Recommendation G.783, ‘‘Characteristics of Synchronous Digital Hierarchy (SDH) equipment functional blocks,’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.

[G798] ITU-T Recommendation G.798, ‘‘Characteristics of optical transport network hierarchy equipment functional blocks,’’ ITU-T Standardization Organization, January 2002. Available at: www.itu.int. Accessed May 2004.

[G803] ITU-T Recommendation G.803, ‘‘Architecture of transport networks based on the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.

[G805] ITU-T Recommendation G.805, ‘‘Generic functional architecture of transport networks,’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.


[G806] ITU-T Recommendation G.806, ‘‘Characteristics of transport equipment—description methodology and generic functionality,’’ ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.

[G807] ITU-T Recommendation G.807/Y.1302, ‘‘Requirements for automatic switched transport networks (ASTN),’’ ITU-T Standardization Organization, July 2001. Available at: www.itu.int. Accessed May 2004.

[G808.1] ITU-T Recommendation G.808.1, ‘‘Generic protection switching—linear trail and subnetwork protection,’’ ITU-T Standardization Organization, under development. Available at: www.itu.int. Accessed May 2004.

[G8080] ITU-T Recommendation G.8080/Y.1304, ‘‘Architecture for the Automatic Switched Optical Network (ASON),’’ ITU-T Standardization Organization, November 2001. Available at: www.itu.int. Accessed May 2004.

[G841] ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.

[G842] ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.

[G871] ITU-T Recommendation G.871/Y.1301, ‘‘Framework for optical transport network recommendations,’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.

[G872] ITU-T Recommendation G.872, ‘‘Architecture of optical transport networks,’’ ITU-T Standardization Organization, November 2001. Available at: www.itu.int. Accessed May 2004.

[G873.1] ITU-T Recommendation G.873.1, ‘‘Optical Transport Network (OTN)—linear protection,’’ ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.

[G873.2] ITU-T Recommendation G.873.2, ‘‘Optical Transport Network (OTN)—ring protection,’’ ITU-T Standardization Organization, under development. Available at: www.itu.int. Accessed May 2004.

[G911] ITU-T Recommendation G.911, ‘‘Parameters and calculation methodologies for reliability and availability of fibre optic systems,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.

[Gro00] W.D. Grover, D. Stamatelakis, ‘‘Bridging the ring-mesh dichotomy with p-cycles,’’ proceedings of the second International Workshop on Design of Reliable Communication Networks (DRCN’00), Munich, Germany, April 2000, pp. 92–104.

[Gro02] W.D. Grover, J. Doucette, ‘‘Design of a meta-mesh of chain subnetworks: enhancing the attractiveness of mesh-restorable WDM networking on low connectivity graphs,’’ IEEE Journal on Selected Areas in Communications, vol. 20, no. 1, January 2002, pp. 47–61.

[Gro04] W.D. Grover, ‘‘Mesh-based survivable networks: Options and strategies for optical, MPLS, SONET and ATM networking,’’ Prentice Hall PTR, Upper Saddle River, NJ, 2003.


[Gro98] W.D. Grover, D. Stamatelakis, ‘‘Cycle-oriented distributed preconfiguration: ring-like speed with mesh-like capacity for self-planning network restoration,’’ proceedings of the IEEE International Conference on Communications, Atlanta, GA, June 1998, pp. 537–543.

[Gry01] M. Gryseels, ‘‘Planning of multi-technology telecommunication networks,’’ PhD thesis, Ghent University, Ghent, Belgium, January 2001.

[Gry98] M. Gryseels, K. Struyve, M. Pickavet, P. Demeester, ‘‘Common pool survivability for meshed SDH-based ATM networks,’’ proceedings of the International Symposium on Broadband European Networks (SYBEN’98), Zurich, Switzerland, May 1998, pp. 267–278.

[HASH] Z. Cao, Z. Wang, E. Zegura, ‘‘Performance of hashing-based schemes for Internet load balancing,’’ proceedings of IEEE INFOCOM 2000.

[Her02] E. Hernandez-Valencia, M. Scholten, Z. Zhu, ‘‘The Generic Framing Procedure (GFP): an overview,’’ IEEE Communications Magazine, vol. 40, no. 5, May 2002.

[HISTORY] Available at: www.cs.utexas.edu/users/chris/think/digital_archive.html. Accessed May 2004.

[I321] ITU-T Recommendation I.321, ‘‘B-ISDN Protocol reference model and its application,’’ ITU-T Standardization Organization, April 1991. Available at: www.itu.int. Accessed May 2004.

[IP-TE-1] B. Fortz, J. Rexford, M. Thorup, ‘‘Traffic engineering with traditional IP routing protocols.’’

[IP-TE-2] D. Applegate, E. Cohen, ‘‘Making intra-domain routing robust to changing and uncertain traffic demands: understanding fundamental tradeoffs.’’

[IP-TE-3] B. Fortz, M. Thorup, ‘‘Internet Traffic Engineering by optimizing OSPF weights.’’

[IP-TE-4] M. Thorup, ‘‘Fortifying OSPF/IS-IS against failure.’’

[IP-TE-5] B. Fortz, ‘‘Optimizing OSPF/IS-IS weights in a changing world.’’

[IP-TE-6] A. Nucci, et al, ‘‘IGP link weight assignment for transient link failures.’’

[IP-TRAF] N. Brownlee, K. Claffy, ‘‘Understanding Internet traffic streams: dragonflies and tortoises,’’ IEEE Communications Magazine, October 2002, pp. 110–117.

[ISIS] ISO, ‘‘Intermediate system to Intermediate system routing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473),’’ ISO/IEC 10589, 1992.

[ISIS-GR] M. Shand, L. Ginsberg, ‘‘Restart signaling for IS-IS,’’ Internet draft: draft-ietf-isis-restart-05.txt, January 2004, work in progress. Available at: www.ietf.org. Accessed May 2004.

[ISIS-MT] T. Przygienda, N. Shen, N. Sheth, ‘‘M-ISIS: multi topology,’’ Internet draft, January 2004, work in progress. Available at: www.ietf.org. Accessed May 2004.

[IS-IS-TAG] C. Martin, B. Neal, S. Previdi, ‘‘A policy control mechanism in IS-IS using administrative tags,’’ Internet draft, April 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[ISIS-TE] H. Smit, T. Li, ‘‘IS-IS extensions for traffic engineering,’’ Internet draft: draft-ietf-isis-traffic-05.txt, August 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.


[Jur98] I. Jurdana, B. Mikac, ‘‘An availability analysis of optical cables,’’ Workshop on All-Optical Networks (WAON’98), Zagreb, Croatia, May 1998.

[Kal96] G. Kalbe, et al, ‘‘Operator requirements,’’ European ACTS project Protection Across Network Layers (PANEL), deliverable D1, December 1996.

[Kar97] N. Karunanithi, T. Carpenter, ‘‘SONET ring sizing with genetic algorithms,’’ Computers and Operations Research, vol. 24, no. 6, 1997.

[Kar99] S.V. Kartalopoulos, ‘‘Understanding SONET/SDH and ATM: communications networks for the next millennium,’’ IEEE Press, Piscataway, NJ, 1999.

[KINI] Kini, et al, ‘‘Shared backup label switched path restoration,’’ Internet draft: draft-kini-restoration-shared-backup, May 2001, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Kom02] K. Kompella, Y. Rekhter, ‘‘LSP hierarchy with generalized MPLS TE,’’ Internet draft: draft-ietf-mpls-lsp-hierarchy-08.txt, September 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Kom03] K. Kompella, Y. Rekhter, ‘‘Routing extensions in support of generalized Multi-Protocol Label Switching,’’ Internet draft: draft-ietf-ccamp-gmpls-routing-09.txt, October 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Lab02] J.F. Labourdette, et al, ‘‘Routing strategies for capacity-efficient fast-restorable mesh optical networks,’’ Photonic Network Communications, vol. 4, no. 3–4, Jan–Dec 2002, pp. 219–235.

[Lab99] C. Labovitz, A. Ahuja, F. Jahanian, ‘‘Experimental study of Internet stability and wide-area backbone failures,’’ paper presented at the 29th Annual International Symposium on Fault-Tolerant Computing, Madison, WI, June 1999.

[Las99] A. Lason, et al, ‘‘Network scenarios and requirements,’’ European IST project Layers Interworking in Optical Networks (LION), deliverable D6, September 1999.

[Lem02] E. Lemuel, ‘‘Asia Pacific submarine cable network service restored,’’ Inq7.net, July 2002. Available at: www.inq7.net/inf/2002/jul/18/inf_1-1.htm. Accessed May 2004.

[LINKNODE-FAILURE] J.P. Vasseur, A. Charny, ‘‘Distinguish a link from a node failure using RSVP hellos extensions,’’ Internet draft: draft-vasseur-mpls-linknode-failure, October 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[LSA-FLOOD1] ‘‘OSPF refresh and flooding reduction in stable topologies,’’ Internet draft: draft-pillay-esnault-ospf-flooding-07.txt, June 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[LSA-FLOOD2] ‘‘Flooding optimizations in link-state routing protocols,’’ Internet draft: draft-ietf-ospf-isis-flood-opt.txt, 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[M20] ITU-T Recommendation M.20, ‘‘Maintenance philosophy for telecommunication networks,’’ ITU-T Standardization Organization, October 1992. Available at: www.itu.int. Accessed May 2004.

[M30000] ITU-T Recommendation M.3000, ‘‘Overview of TMN recommendations,’’ ITU-T Standardization Organization, 1995. Available at: www.itu.int. Accessed May 2004.


[M3010] ITU-T Recommendation M.3010, ‘‘Principles for a telecommunications management network,’’ ITU-T Standardization Organization, February 2000. Available at: www.itu.int. Accessed May 2004.

[Man1] E. Mannie, et al, ‘‘Generalized multi-protocol label switching (GMPLS) architecture,’’ Internet draft: draft-ietf-ccamp-gmpls-architecture, March 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Man2] E. Mannie, et al, ‘‘Recovery (protection and restoration) terminology for GMPLS,’’ Internet draft: draft-ietf-ccamp-gmpls-recovery-terminology, June 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[McC95] S. McCarthy, ‘‘Reliability keeps up with network growth,’’ Telephony, June 1995.

[McK00] ‘‘Backbone! How changes in technology and the rise of IP threaten to disrupt the long-haul telecom services industry,’’ September 2000. Available at: www.mckinsey.de/_downloads/knowmatters/telecommunications/backbone.pdf. Accessed May 2004.

[Mod01] E. Modiano, P.J. Lin, ‘‘Traffic grooming in WDM networks,’’ IEEE Communications Magazine, vol. 39, no. 7, July 2001.

[MPLS-DESIGN] J.P. Vasseur, J. Guichard, F. Le Faucheur, ‘‘Real world designs of converged MPLS networks—review of deployed network designs to offer L2/L3 VPNs, QoS, traffic engineering, IPv6 and multicast,’’ Cisco Press, 2004 (in press).

[MPLS-TE] E. Osborne, A. Simha, ‘‘Traffic engineering with MPLS,’’ Cisco Press, Indianapolis, IN, 2002.

[MT] ‘‘Routing in IS-IS,’’ Internet draft: draft-ietf-isis-wg-multi-topology-06.txt, work in progress. Available at: www.ietf.org. Accessed May 2004.

[OSh94] C. O’Shea, ‘‘Requirements and reference configurations for survivability,’’ European RACE project end-to-end Survivable Broadband Networks (IMMUNE), deliverable D2, June 1994.

[OSPF-TE] Y. Katz, ‘‘Traffic engineering extensions to OSPF,’’ Internet draft: draft-katz-yeung-ospf-traffic-09.txt, October 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[OSPFv2] J. Moy, ‘‘OSPF version 2,’’ RFC 2328.

[OSPG-GR] J. Moy, P. Pillay-Esnault, A. Lindem, ‘‘Graceful OSPF restart,’’ Internet draft: draft-ietf-ospf-hitless-restart-08.txt, work in progress. Available at: www.ietf.org. Accessed May 2004.

[OTNTS] ‘‘Optical Transport Networks & technologies standardization work plan,’’ ITU-T Standardization Organization, May 2002. Available at: http://www.itu.int/itudoc/itu-t/com15/otn/76091.html. Accessed May 2004.

[Owe02] K. Owens, V. Sharma, M. Oommen, ‘‘Network survivability considerations for traffic engineered IP networks,’’ Internet draft: draft-owens-te-network-survivability, May 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Pap02] D. Papadimitriou, et al, ‘‘Shared risk link groups encoding and processing,’’ Internet draft: draft-papadimitriou-ccamp-srlg-processing, June 2002, work in progress. Available at: www.ietf.org. Accessed May 2004.



[Pap03] D. Papadimitriou, et al, ‘‘Requirements for generalized MPLS (GMPLS) signaling usage and extensions for Automatically Switched Optical Network (ASON),’’ Internet draft: draft-ietf-ccamp-gmpls-ason-reqts-05.txt, November 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[PATH-COMP] J.P. Vasseur, et al, ‘‘RSVP path computation request and reply messages,’’ Internet draft: draft-vasseur-mpls-computation-rsvp, 2004, work in progress. Available at: www.ietf.org. Accessed May 2004.

[PREEMPT-POL] J. de Oliveira, J.P. Vasseur, L.C. Chen, C. Scoglio, ‘‘LSP preemption policies for MPLS traffic engineering,’’ Internet draft: draft-deoliviera-diff-te-preemption, 2003, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Raj00] B. Rajagopalan, et al, ‘‘IP over optical networks: architectural aspects,’’ IEEE Communications Magazine, vol. 38, no. 9, September 2000, pp. 44–102.

[Ram02] R. Ramaswami, K. Sivarajan, ‘‘Optical networks: a practical perspective,’’ 2nd ed,Morgan Kaufmann, San Francisco, CA, 2002.

[RED] S. Floyd, V. Jacobson, ‘‘Random early detection gateways for congestion avoidance,’’IEEE/ACM Transactions on Networking, vol. 1, no. 4, August 1993, pp. 397–413.

[REFRESH-REDUCTION] L. Berger, et al, ‘‘RSVP refresh overhead reduction extensions,’’ RFC2961, IETF Web site, April 2001. Available at: www.ietf.org. Accessed May 2004.

[REORDERING] M. Laor, L. Gendel, ‘‘Effect of packet reordering in a backbone link on applications throughput,’’ IEEE Network, vol. 16, no. 5, September 2002.

[RFC2205] R. Braden, et al, ‘‘Resource Reservation Protocol (RSVP)—version 1 functional specification,’’ RFC2205, IETF Web site, September 1997. Available at: www.ietf.org. Accessed May 2004.

[RFC2474] K. Nichols, S. Blake, F. Baker, D. Black, ‘‘Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers,’’ RFC 2474, IETF Web site. Available at: www.ietf.org. Accessed May 2004.

[RFC2547] E. Rosen, Y. Rekhter, ‘‘BGP/MPLS VPNs,’’ RFC2547, IETF Web site, March 1999. Available at: www.ietf.org.

[RFC3209] D. Awduche, et al, ‘‘RSVP-TE: extensions to RSVP for LSP tunnels,’’ RFC3209, IETF Web site, December 2001. Available at: www.ietf.org. Accessed May 2004.

[RFC3292] A. Doria, et al, ‘‘General Switch Management Protocol (GSMP) V3,’’ RFC3292, IETF Web site, June 2002. Available at: www.ietf.org. Accessed May 2004.

[RFC3471] L. Berger, ‘‘Generalized Multi-Protocol Label Switching (GMPLS) signaling functional description,’’ RFC3471, IETF Web site, January 2003. Available at: www.ietf.org. Accessed May 2004.

[RFC3473] L. Berger, ‘‘Generalized Multi-Protocol Label Switching (GMPLS) signaling Resource Reservation Protocol-Traffic Engineering (RSVP-TE) extensions,’’ RFC3473, IETF Web site, January 2003. Available at: www.ietf.org. Accessed May 2004.

[RIP 1] C. Hedrick, ‘‘Routing Information Protocol,’’ RFC1058, IETF Web site. Available at: www.ietf.org. Accessed May 2004.

[RIP 2] G. Malkin, ‘‘RIP version 2,’’ RFC1723, IETF Web site. Available at: www.ietf.org. Accessed May 2004.


[RIP-TRIG] G. Meyer, S. Sherry, ‘‘Triggered extensions to RIP to support demand circuits,’’ RFC2091, IETF Web site. Available at: www.ietf.org. Accessed May 2004.

[Rob01] L. Roberts, C. Crump, ‘‘US Internet IP traffic growth,’’ Caspian Networks, August 2001. Available at: www.caspiannetworks.com/library/presentations/traffic/Internet_Traffic_081301.ppt. Accessed May 2004.

[Ros01] E. Rosen, et al, ‘‘Multiprotocol Label Switching architecture,’’ RFC3031, IETF Web site, January 2001. Available at: www.ietf.org. Accessed May 2004.

[ROUTING-THESIS] P. Narvaez, ‘‘Routing reconfiguration in IP networks,’’ MIT, June 2000.

[RSVP-TE] D. Awduche, et al, ‘‘RSVP-TE: extensions to RSVP for LSP tunnels,’’ RFC3209,IETF Web site, December 2001. Available at: www.ietf.org. Accessed May 2004.

[SECOND-METRIC] F. Le Faucheur, et al, ‘‘Use of Interior Gateway Protocol (IGP) metric as a second MPLS traffic engineering metric,’’ Internet draft: draft-ietf-tewg-te-metric-igp, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Sex92] M. Sexton, A. Reid, ‘‘Transmission networking: SONET and the Synchronous Digital Hierarchy,’’ Artech House, Norwood, MA, 1992.

[Sha03] V. Sharma, F. Hellstrand, ‘‘Framework for MPLS-based recovery,’’ RFC 3469, IETF Web site, February 2003. Available at: www.ietf.org. Accessed May 2004.

[Shi01] T. Shiragaki, et al, ‘‘Protection architecture and applications of OCh shared protection rings,’’ Optical Networks Magazine, vol. 2, no. 4, July/August 2001, pp. 48–58.

[Soc91] T. Socolofsky, C. Kale, ‘‘A TCP/IP tutorial,’’ RFC1180, IETF Web site, January 1991. Available at: www.ietf.org. Accessed May 2004.

[SOFT-PREEMPTION] M. Meyer, et al, ‘‘MPLS traffic engineering soft preemption,’’ Internet draft: draft-ietf-mpls-soft-preemption, work in progress. Available at: www.ietf.org. Accessed May 2004.

[Sos94] J. Sosnosky, ‘‘Service applications for SONET DCS distributed restoration,’’ IEEE Journal on Selected Areas in Communications, vol. 12, no. 1, January 1994, pp. 59–68.

[Str00] K. Struyve, et al, ‘‘Application, design and evolution of WDM in GTS’s Pan-European transport network,’’ IEEE Communications Magazine, vol. 38, no. 3, 2000, pp. 114–121.

[Str01] J. Strand, A. Chiu, R. Tkach, ‘‘Issues for routing in the optical layer,’’ IEEE Communications Magazine, vol. 39, no. 2, February 2001, pp. 81–87.

[SURVIV] R. Bhandari, ‘‘Survivable networks: Algorithms for diverse routing,’’ Kluwer Academic Publishers, Amsterdam, The Netherlands, 1999.

[TE-REQ] D. Awduche, et al, ‘‘Requirements for traffic engineering over MPLS,’’ RFC2702, IETF Web site, September 1999. Available at: www.ietf.org. Accessed May 2004.

[TRAF-EST] ‘‘Traffic matrices estimation: existing techniques and new directions,’’ Available at: http://www.acm.org/sigcomm/sigcomm2002/papers/trafficmatrix.html. Accessed May 2004.


[TRAVEL-SALESMAN] Available at: http://members.cox.net/mathmistakes/travel.htm. Accessed May 2004.

[Ver95] D. Vercauteren, P. Demeester, J. Luystermans, E. Houtrelle, ‘‘Availability analysis of multi-layer networks,’’ proceedings of the third International Conference on Telecommunications System Modeling and Analysis, Nashville, TN, March 1995, pp. 483–493.

[Vhe00] P. Van Heuven, et al, ‘‘Recovery in IP based networks using MPLS,’’ paper presented at the IEEE Workshop on IP-oriented Operations & Management IPOM 2000, 2–4 September 2000, Cracow, Poland, pp. 70–78.

[Vis02] M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.

[Wau99] N. Wauters, G. Ocakoglu, K. Struyve, P.F. Fonseca, ‘‘Survivability in a new pan-European carriers’ carrier network based on WDM and SDH technology: current implementation and future requirements,’’ IEEE Communications Magazine, vol. 37, no. 8, August 1999, pp. 63–69.

[Wil01] G. Willems, et al, ‘‘Capacity versus availability trade-offs in mesh-restorable WDM networks,’’ proceedings of the third international workshop on Design of Reliable Communication Networks (DRCN’01), Budapest, Hungary, October 2001.

[Wos01] L. Wosinska, L. Thylen, R. Holmstrom, ‘‘Large-capacity strictly nonblocking optical cross-connects based on microelectrooptomechanical systems (MOEMS) switch matrices: reliability performance analysis,’’ Journal of Lightwave Technology, vol. 19, no. 8, August 2001.

[WRED] Cisco Systems. Available at: www.cisco.com/univercd/cc/td/doc/product/software/ios112/ios112p/gsr/wred_gs.htm. Accessed May 2004.

[Wu97] T.-H. Wu, N. Yoshikai, ‘‘ATM transport and network integrity,’’ Academic Press, Amsterdam, The Netherlands, 1997.

[X200] ITU-T Recommendation X.200, ‘‘Data networks and open systems communications: open systems interconnection—model and notation,’’ ITU-T Standardization Organization, July 1994. Available at: www.itu.int. Accessed May 2004.

[X700] ITU-T Recommendation X.700, ‘‘Management framework for Open Systems Interconnection (OSI) for CCITT applications,’’ ITU-T Standardization Organization, September 1992. Available at: www.itu.int. Accessed May 2004.

[X701] ITU-T Recommendation X.701, ‘‘Information technology—Open Systems Interconnection (OSI)—systems management overview,’’ ITU-T Standardization Organization, August 1997. Available at: www.itu.int. Accessed May 2004.



List of Figure Sources

Figure 1.5 ITU-T Recommendation I.321, ‘‘B-ISDN Protocol Reference Model and its Application,’’ April 1991. Available at: www.itu.int. Accessed May 2004.

Figure 1.9 G. Kalbe, et al, ‘‘Operator requirements,’’ European ACTS project Protection Across Network Layers (PANEL), deliverable D1, December 1996.

Figure 1.14 V. Sharma, F. Hellstrand, ‘‘Framework for MPLS-based recovery,’’ RFC 3469, IETF Web site, February 2003. Available at: www.ietf.org. Accessed May 2004.

Figure 1.15 V. Sharma, F. Hellstrand, ‘‘Framework for MPLS-based recovery,’’ RFC 3469, IETF Web site, February 2003. Available at: www.ietf.org. Accessed May 2004.

Figure 2.2 ITU-T Recommendation G.805, ‘‘Generic functional architecture of transport networks,’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.

Figure 2.3 ITU-T Recommendation G.803, ‘‘Architecture of transport networks based on the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.

Figure 2.4 ITU-T Recommendation G.707/Y.1322, ‘‘Network node interface for the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.

Figure 2.5 ITU-T Recommendation G.707/Y.1322, ‘‘Network node interface for the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, October 2000. Available at: www.itu.int. Accessed May 2004.

Figure 2.7 ITU-T Recommendation G.806, ‘‘Characteristics of transport equipment—description methodology and generic functionality,’’ ITU-T Standardization Organization, October 2000, and ITU-T Recommendation G.806, amendment 1, ITU-T Standardization Organization, prepublished March 2003. Available at: www.itu.int. Accessed May 2004.







Figure 2.18 C. Brianza, et al, ‘‘Deliverable D2a: Overall Network Protection—Version 1,’’ deliverable from the ACTS-project PANEL, April 1997.

Figure 2.21 ITU-T Recommendation G.841, ‘‘Types and characteristics of SDH network protection architectures,’’ ITU-T Standardization Organization, October 1998. Available at: www.itu.int. Accessed May 2004.


Figure 2.23 ITU-T Recommendation G.803, ‘‘Architecture of transport networks based on the synchronous digital hierarchy (SDH),’’ ITU-T Standardization Organization, March 2000. Available at: www.itu.int. Accessed May 2004.

Figure 2.24 ‘‘Transmission and multiplexing (TM); generic requirements of transport functionality of equipment; part 1-1: generic processes and performance,’’ ETSI EN 300 417-1-1 V1.2.1, October 2001.






Figure 2.36 (Top) ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.

Figure 2.37 ITU-T Recommendation G.842, ‘‘Interworking of SDH network protection architectures,’’ ITU-T Standardization Organization, April 1997. Available at: www.itu.int. Accessed May 2004.














Figure 3.4 Adapted from R. Ramaswami, K. Sivarajan, ‘‘Optical networks: a practical perspective,’’ 2nd ed, Morgan Kaufmann, San Francisco, CA, 2002.

Figure 3.5 Adapted from R. Ramaswami, K. Sivarajan, ‘‘Optical networks: a practical perspective,’’ 2nd ed, Morgan Kaufmann, San Francisco, CA, 2002.

Figure 3.6 Adapted from J. Derkacz, et al, ‘‘IP/OTN Cost Model and Photonic Equipment Cost Forecast-IST LION project,’’ Proc. 4th Workshop on Telecommunications Techno-economics, Rennes, France, May 2002.




Figure 3.8 Adapted from M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004; and ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.

Figure 3.10 Adapted from ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.

Figure 3.11 ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.

Figure 3.12 Adapted from M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.



Figure 3.15 ITU-T Recommendation G.709/Y.1331, ‘‘Interfaces for the optical transport network,’’ ITU-T Standardization Organization, February 2001, and amendment 1, November 2001. Available at: www.itu.int. Accessed May 2004.




Figure 3.19 M. Vissers, ‘‘Optical Transport Network & Optical Transport Module,’’ ITU-T Standardization Organization, April 2002. Available at: http://ties.itu.int/ftp/itu-t/com15/tsg15opticaltransport/tsg15opticaltransport/OTN/g709-intro-v2.ppt. Accessed May 2004.




Figure 3.24 J. Strand, A. Chiu, R. Tkach, ‘‘Issues for routing in the optical layer,’’ IEEE Communications Magazine, vol. 39, no. 2, February 2001, pp. 81–87.

Figure 3.34 P. Arijs, et al, ‘‘Design of ring and mesh based WDM transport networks,’’ Optical Networks Magazine, vol. 1, no. 2, July 2000, pp. 25–40.

Figure 3.36 (Left) Adapted from S. De Maesschalck, et al, ‘‘Pan-European optical transport networks: an availability based comparison,’’ Photonic Network Communications, vol. 5, no. 3, May 2003, pp. 203–225.


Figure 3.39 Adapted from S. De Maesschalck, et al, ‘‘Pan-European optical transport networks: an availability based comparison,’’ Photonic Network Communications, vol. 5, no. 3, May 2003, pp. 203–225.

Figure 3.40 Adapted from S. De Maesschalck, et al, ‘‘Pan-European optical transport networks: an availability based comparison,’’ Photonic Network Communications, vol. 5, no. 3, May 2003, pp. 203–225.



Figure 3.46 W.D. Grover, D. Stamatelakis, ‘‘Bridging the ring-mesh dichotomy with p-cycles,’’ proceedings of the second International Workshop on Design of Reliable Communication Networks (DRCN’00), Munich, Germany, April 2000, pp. 92–104.

Figure 4.2 M. Dodge, ‘‘Cybermap of the Month Column,’’ ARPANET, October 1980. (Illustration courtesy of the Computer Museum History Center.) Available at: http://mappa.mundi.net/maps/maps_001. Accessed May 2004.

Figure 6.5 ‘‘User network interface (UNI) 1.0 Signaling Specification,’’ Optical Internetworking Forum/User Network Interface Specifications (OIF2000.125.5), June 2001. Available at: www.oiforum.com. Accessed May 2004.

Figure 6.7 D. Colle, et al, ‘‘Developing control plane models for optical networks,’’ Technical Digest, 2002 Optical Fiber Communication Conference (OFC2002), Anaheim, CA, March 17–22, 2002, pp. 757–759.

Figure 6.8 D. Colle, et al, ‘‘Developing control plane models for optical networks,’’ Technical Digest, 2002 Optical Fiber Communication Conference (OFC2002), Anaheim, CA, March 17–22, 2002, pp. 757–759.

Figure 6.22 S. De Maesschalck, et al, ‘‘Intelligent optical networking for multilayer survivability,’’ IEEE Communications Magazine, vol. 40, no. 1, pp. 42–49, January 2002.

Figure 6.23 S. De Maesschalck, et al, ‘‘Intelligent optical networking for multilayer survivability,’’ IEEE Communications Magazine, vol. 40, no. 1, pp. 42–49, January 2002.

Figure 6.24 Adapted from P. Demeester, et al, ‘‘Resilience in multi-layer networks,’’ IEEE Communications Magazine, vol. 37, no. 8, August 1999, pp. 70–76.





Index

Numbers

1+1 packet protection in MPLS TE, 333–334

1+1 protection (dedicated), 31

1:1 protection (dedicated with extra traffic), 32

1:N linear APS, 76–78

1:N protection, 32

one-to-one backup in MPLS TE

backup tunnel path computation, 419–421

bandwidth sharing capability, 343

overview, 318–319, 345

RSVP signaling, 382–384

A

A functions. See adaptation functions (A)

access points (APs), 44, 45

accounted failure scenarios, deriving, 16–18

accounting management, 42

adaptation functions (A)

defined, 44, 45

f2 filters in sink function, 60

adapted information (AI), 44, 45

adaptive dynamic distributed routing algorithm in ARPANET

ARPA-2 version, 208, 210

ARPANET map (October 1980), 209

efficiency analyses, 210

first version, 208, 209

issues arising with, 208

terms defined, 207–208

adaptive routing protocols, 207–208

add/drop multiplexers (ADMs). See also Optical Add/Drop Multiplexers (OADMs)

DXCs versus, 82

interconnection of stacked STM-N Rings and, 103–104

in ring networks, 82

in SDH networks, 52–54

additive latency, as criterion for recovery mechanisms, 27

administrative link cost increase, temporary loops and, 256–257

ADMs. See add/drop multiplexers (ADMs)

AELT (Average Expected Loss of Traffic), 191–192, 196–197

AI (adapted information), 44, 45

Alarm Indication Signal (AIS) defect in OTNs, 153, 155–156

Alarm Indication Signal (AIS) in SDH

AU_AIS signal, 65, 67, 68, 70, 72, 73–74

fault detection and propagation inside NEs, 60–70

fault propagation and notification on network level, 70, 72–74

late arrival of MS_AIS signal in VC-4 cross connection, 68, 69

MS_AIS signal, 65, 66, 70, 72

race conditions in HOP and LOP layers, 72, 73, 74

TT sink function and, 58

TU_AIS signal, 68, 72–73

algorithm complexity

CPU power and, 281–282

defined, 279–284

Dijkstra algorithm, 243

efficiency and, 279–280, 282, 283–284

as function of problem size, 280–281

implementation choices and efficiency, 283–284

NP complete problem, 284

QoS during failure and, 265

worst-case scenario, 282–283

algorithms, routing. See routing algorithms

All Ones defect (dAIS) in SDH, 50, 51

alternate paths. See recovery paths

American National Standards Institute (ANSI), 57

APCN 2 (Asia Pacific Cable Network) submarine cable break, 15

application layer (Layer 5), 6

APs (access points), 44, 45

APS protocol. See automatic protection switching (APS) protocol



ARPANET routing protocol,

207–210, 287

Asia Pacific Cable Network

(APCN 2) submarine

cable break, 15

ASON. See Automatic Switched

Optical Network (ASON)

ASTNs (Automatic Switched

Transport Networks),

425–426

asymmetrical load balancing,

260–262

asymmetrical services, 3

AT&T, erroneous software

update in, 15

atomic functions

overview, 43, 45

responsibility in fault

propagation and

notification (SDH),

70, 71

augmented model for control

plane, 435–437

automatic protection switching

(APS) protocol


bidirectional (dual-ended)

operation, 76, 77

linear APS, 74–76

MS-SP Rings and, 83–86,

88

ring APS, 76

sublayer tandem connection

monitoring, 78–80

subnetwork connection

protection, 78–80

trail protection, 74–78

unidirectional (single-ended)

operation, 77

Automatic Switched Optical

Network (ASON)

control plane (CP), 425,

426–437

framework, 425–426

G-MPLS and, 3

optical connection controllers

(OCCs), 425

standardization, 425

transport planes (TPs), 425

Automatic Switched Transport

Networks (ASTNs),

425–426

availability. See also reliability

calculations for optical

networks, 185–192

comparison between 1+1

protection in ring-based

and mesh-based optical

networks, 192–193

comparison between

protection and

restoration in mesh-based

networks, 194–197

defined, 9

example for computing, 9

formula for, 11, 185

recovery schemes and, 21

topology versus, in mesh-

based optical networks,

195

traffic type versus, in mesh-

based optical networks,

195–197

availability calculations for

optical networks

availability of connections and

load, 188–191

ELT and AELT, 191–192

line failures, 186–188

optical node failures, 185

protected connection, 189–190

restored connection, 190–191

unprotected connection,

188–189

Average Expected Loss of

Traffic (AELT), 191–192,

196–197

B

backup capacity

as criterion for recovery

mechanisms, 26

dedicated versus shared, 29–30

overlay backup capacity

network discovery in

MPLS TE facility

backup, 404–405

required amount of backup

capacity (multilayer

recovery case study),

468–469

in single-layer recovery

mechanisms, 29–30

backup path computation in

MPLS TE

bandwidth sharing between

backup paths, 392–393

diverse path computation

algorithm, 393–394

facility backup, 397–419

global path protection,

393–397

guaranteeing QoS during

failure, 386–387, 388–392

introduction, 386

network design

considerations, 387–392

one-to-one backup, 419–421

overview, 385–386, 421

QoS considerations in

backbone network

profiles, 387–388

backup path selection in MPLS

TE, 349–350

backup tunnel path computation

in MPLS TE facility

backup

amount of bandwidth to

protect, 401–405

backup tunnels selection,

416–418

centralized computation,

409–411

distributed model,

411–416

facility-based model, 409

independent CSPF-based

model, 405–409

manual configuration versus

dynamic computation,

397–400

overlay backup capacity

network discovery,

404–405

path computation element

(PCE), 418–419



with strict QoS guarantees

during failure, 400–419

triggers, 400

without QoS guarantee during

failure, 397–400

Backward Defect Indication

(BDI) defect in OTNs,

153, 156

Backward Defect Indication

Overhead (BDI-O) defect

in OTNs, 153

Backward Defect Indication

Payload (BDI-P) defect in

OTNs, 153

Backward Error Indication

(BEI) defect in OTNs,

153, 156

bandwidth optimization using

MPLS TE, 306

bandwidth protection violation

in MPLS TE, 350–353

bandwidth sharing capability

between backup paths,

392–393

case study, 368–380

local versus global protection


basic level user reliability

requirements, 18

BDI (Backward Defect

Indication) defect in

OTNs, 153, 156

BDI-O (Backward Defect

Indication Overhead)

defect in OTNs, 153

BDI-P (Backward Defect

Indication Payload)

defect in OTNs, 153

BEI (Backward Error

Indication) defect in

OTNs, 153, 156

Bellman-Ford routing protocols.

See distance vector

routing protocols

bidirectional (dual-ended)

operation in APS, 76,

77

bidirectional connections in

SDH/SONET, 3

bidirectional forwarding

detection in hello-based

mechanisms, 223

bidirectional line switched Rings

(BLSR). See Multiplex

Section-Shared

Protection Rings (MS-SP

Rings) in SDH

bidirectional linear protection in

MSP, 107–108

bidirectional path switched

Rings (BPSR). See

Subnetwork Connection

Protection Rings (SNCP

Rings)

bidirectional traffic, 3

BML (business management

layer), 43

bottom-up escalation, 446–447

business critical user reliability

requirements, 18

business management layer

(BML), 43

bypass. See facility backup or

bypass in MPLS TE

C

C (connection function), 43, 45

cable cuts

link failure caused by, 220

overview, 12–13

preventing, 20

submarine cable break

(APCN 2), 15

CAPital EXpenditure (CAPEX),

IONs and reduction in,

424

case studies for MPLS TE

recovery mechanisms

(Case Study 1: UK

Network)

assumptions, 354–356

link protection, 356–357

objectives, 356

proposed design, 356–357

case studies for MPLS TE

recovery mechanisms

(Case Study 2: UK

Network with Shared

SRLGs)

additional assumptions, 359

additional objectives, 359


node protection, 361

case studies for MPLS TE

recovery mechanisms

(Case Study 3: Complex

US Network)

abbreviations, 364


bandwidth sharing,

368–380


node protection, 368

objectives, 364–365

proposed design, 365

case studies for multilayer

recovery (Case Study 1:

Optical Restoration and

MPLS TE Fast Reroute)

interlayer recovery

mechanisms, 466

overview, 465, 469

required amount of backup

capacity, 468–469

set of recovery actions,

466–468

single-layer recovery

mechanisms, 465–466

case studies for multilayer

recovery (Case Study 2:

SONET/SDH Protection

and IP Routing)

interlayer recovery

mechanisms, 470

overview, 470

case studies for multilayer

recovery (Case Study 3:

MPLS TE Fast Reroute

and IP Rerouting Fast

Convergence)

overview, 476

set of recovery actions,

472–476





case study for IP routing with

IS-IS

analysis, 275–277


dampening mechanisms

configuration, 272–274

objectives, 272


SPF duration time, 277–278

case study for SDH protection

strategies, 115–127


cost comparisons, 125–127

hybrid SNCP/MS-SP Ring

protection, 122–123,

124–127

network design and evaluation

process, 123–125

network scenario, 115–116

node configurations, 116–122

objective, 123

protection strategies, 122–123

pure end-to-end SNCP

protection, 122, 123–127

pure MS-SP Ring protection,

122, 124–127

CCI (Connection Control

Interface), 425, 431

centralized recovery

mechanisms, 34–35

centralized routing architectures,

RP failure and, 221,

225

characteristic information (CI),

44, 45

circuit switching, 4

classes of recovery


mechanisms, 28

TE LSPs and, 326

Coarse Wavelength Division

Multiplexing (CWDM),

133

common pool strategy, 451–454

configuration management, 42

Connection Control Interface

(CCI), 425, 431

connection function (C), 43, 45

connection points (CPs), 43, 45

connectionless networks, 4–5, 36

connection-oriented networks,

4–5, 36

control plane (CP) in ASONs

architectures, 432–437

augmented model, 435–437

Connection Control Interface

(CCI), 425, 431

defined, 425

G-MPLS and, 426–429

main function, 426

overlay model, 433–434

peer model, 434–435

protocols for implementing,

426–432

control plane overview, 7–8

count-to-infinity problem with

distance vector routing

protocols, 206

CP. See control plane (CP) in

ASONs

CPs (connection points), 43, 45

CWDM (Coarse Wavelength

Division Multiplexing),

133

D

dAIS (All Ones defect) in SDH,

50, 51

dampening algorithms in IP

routing

exponential back-off timer

algorithm, 228–229

exponential decay algorithm

for interface dampening,

227–228

fast converge and, 226

flapping resources and, 226

stability preserved by, 226

up-state timer algorithm, 227

D&C interconnection. See drop

and continue

interconnection of rings

in SDH

data link layer (Layer 2), 5

data plane, 6

data-centric networks, evolution

in, 2–3

dDEG (Degraded Signal defect)

in SDH, 50, 51

decentralized recovery

mechanisms, 34–35

dedicated backup capacity, 29,

30

dedicated protection paths

with extra traffic, 32

overview, 31

dedicated protection rings. See

Multiplex Section-

Dedicated Protection

Rings (MS-DP Rings) in

SDH; Optical Multiplex

Section-Dedicated

Protection Rings

(OMS DPRings)

dedicated recovery mechanisms

in ring-based optical

networks, 161, 171–173

defect detection times in SDH

networks, 50–52, 56

defects. See also failures

defined, 10

OTN maintenance signals and

alarm suppression,

154–157

in OTNs, 152–153


Degraded Signal defect (dDEG)

in SDH, 50, 51

degree of survivability, 10

Dense Wavelength Division

Multiplexing (DWDM),

133, 215

Detour LSP in MPLS TE,

318–319, 343, 345

Detour LSP merging, 319,

384–385

DETOUR Object (RSVP),

375–376

dEXC (Excessive Error defect)

in SDH, 50, 51

Diffserv code point (DSCP)

packet marking and, 234–235

packet scheduling and, 235



queuing packets based on,

235–236

digital cross-connects (DXCs) in

SDH

ADMs versus, 82

fault propagation and

notification on network

level, 70, 72–74

overview, 53–54

Dijkstra algorithm

complexity, 243

described, 242–243

Dijkstra quoted on, 241

incremental Dijkstra


performance, 248–249

step by step example,

243–248

Dijkstra, Edsger, 241

distance vector routing protocols

count-to-infinity problem, 206

Enhanced Interior Gateway

Routing Protocol

(EIGRP), 207

example, 204–206

inefficiency during network

element failure, 205–206

link state protocols versus,

212–213

objective, 204

overview, 204–207

Routing Information Protocol

(RIP), 207

Routing Information Protocol

with triggered update

(RIP-TRIG), 207

split horizons techniques,

206–207

distributed (decentralized)

recovery mechanisms,

34–35

distributed routing architectures

RP failure and, 221, 225

temporary loops during link

or node failure, 253–257

distributed routing tables, 207

diverse routing, 21

dLOF (Loss of Frame defect) in

SDH, 50, 51

dLOM (Loss of Multiframe

defect) in SDH, 50, 51

dLOP (Loss of Pointer defect) in

SDH, 50, 51

dLOS (Loss of Signal defect) in

SDH, 51–52

double protection, 449–454

dPLM (Payload Mismatch

defect) in SDH, 50, 51

dRDI (Remote Defect

Indication defect) in

SDH, 50, 51

drop and continue

interconnection of rings

in SDH

MS-SP and SNCP Rings,

101–102

MS-SP Rings, 97–101

overview, 95, 106

SNCP Rings, 96–97

DSCP. See Diffserv code point

(DSCP)

dTIM (Trace Id Mismatch

defect) in SDH, 50, 51

dual homing principle, 20, 21

duct topology, SRG and fiber

cable topology versus,

159–160

dUNEQ (Unequipped VC

defect) in SDH, 50, 51

duplication of packets, as

criterion for recovery

mechanisms, 27

DWDM (Dense Wavelength

Division Multiplexing),

133, 215

DXCs. See digital cross-connects

(DXCs) in SDH

dynamic multilayer recovery

global reconfiguration option,

460

for IP-over-OTN network,

458–460

local reconfiguration option,

460, 462–463

logical IP topologies and, 458,

460, 461

static schemes versus,

457–458, 460, 462–463

dynamic recovery paths, 30, 31

dynamic routing tables, 207

E

earthquake, Hanshin/Awaji,

14–15

EDFA (Erbium-doped fiber

amplifiers), 133

EIGRP (Enhanced Interior

Gateway Routing

Protocol), 207

ELT (Expected Loss of Traffic),

191–192, 194–195

Enhanced Interior Gateway

Routing Protocol

(EIGRP), 207

equipment failures

link failure caused by, 220–221

overview, 13

preventing, 20

Erbium-doped fiber amplifiers

(EDFA), 133

escalation strategy

bottom-up escalation,

446–447

defined, 444

hold-off timer

implementation, 448

recovery token signal

implementation, 448

top-down escalation, 447–448

evolution in data-centric

networks, 2–3

evolution of SDH/SONET to

OTNs, 39

evolution of the optical network

layer

adding flexibility, 139

mesh organization, 137–139

optical nodes, 135

ring organization, 135–137

WDM in point-to-point

optical network layer,

132–134

Excessive Error defect (dEXC)

in SDH, 50, 51

Expected Loss of Traffic (ELT),

191–192, 194–195



exponential back-off timer

dampening algorithm,

228–229

exponential decay algorithm for

interface dampening,

227–228

external causes of failure, 12

F

f1 filters (SDH), 58, 80

f2 filters (SDH)

maintenance signals and, 58

overview, 81

in A sink function, 60

in TT sink function, 58, 60

FA (forwarding adjacency), 429

facility backup or bypass in

MPLS TE

backup tunnel path



342–343

link failure and mode of

operation, 322–324

link protection versus node

protection, 320

node failure and mode of

operation, 321–322

overview, 345

PLR behavior before failure,

379–381

PLR behavior during failure,

381–382


single NHOP or NNHOP

backup tunnel in,

319–321

failure coverage. See scope of

failure coverage

failure detection in IP routing

hello-based mechanisms,

223–224

lower layers failure

notification, 222–223

failure profiles

link failures in IP routing,

220–221

link failures in MPLS TE, 353

node failures in IP routing,

221–222

node failures in MPLS TE,

353

failure scenarios

deriving accounted failure

scenarios, 16–18

protection versus restoration

and, 31

scope of, 25

failures. See also defects; outages

or faults; reliability;

specific kinds

accounted versus

unaccounted, 16

availability and optical line

failures, 186–188

availability and optical node

failures, 185

commonly occurring, 12–13

defects defined, 10

drastic or severe, 13–15

failure-and-repair process, 10

FCC reporting requirements,

13

internal versus external causes,

12

link failures in IP routing,

220–221, 225, 253–257

MTBFs (mean time between

failures), 11

MTTR (mean time to repair),

11

in multilayer networks,

438–439

multiple-link, 17–18

network element failure

defined, 10

node failures in IP routing,

221–222, 225–226,

253–257

preventing, 20–21

QoS during, 262–266

root or primary, 10

secondary, or symptoms, 10

single-link, 16, 17

single-node, 16–17

terminology, 10–11

time of failure, 10

unintentional versus

intentional, 8

fast converge

dampening algorithms in IP

routing and, 226

interaction between fast IGP

convergence and NSF,

293–295

fast recovery, MPLS TE for,

306–307

Fast Reroute (FRR) in MPLS

TE. See local protection

in MPLS TE

FAST-REROUTE Object

(RSVP), 374–375

fault clearing time, 24

fault detection and

characterization in IP

routing, 214

fault detection and propagation

in NEs (SDH)

cable cut upstream of

regenerator, 61–62

cable cut upstream of VC-4

cross-connection, 63–65

distorted noise/signal entering

regenerator, 62–63

incoming AU_AIS signal, 65,

67, 68

incoming MS_AIS signal, 65,

66

late arrival of MS_AIS signal

in VC-4 cross connection,

68, 69

summary, 68, 70

fault detection and propagation

in optical networks

associated overhead, 143,

145–150

defects, 152–153

maintenance signals and

alarm suppression,

154–157

nonassociated overhead,

150–152

optical channel data unit

overhead (ODUk OH),

145–148



optical channel overhead

(OCh OH), 150–151

optical channel payload unit

overhead (OPUk OH),

145

optical channel transport unit

overhead (OTUk OH),

149–150

optical multiplex section

overhead (OMS OH),

151

optical transmission section

overhead (OTS OH),

151–152

overview, 144–145

fault detection in MPLS TE

bandwidth protection

violation and, 350–353

differentiating link failures

from node failures,

349–353

optimal backup path selection

and, 349–350

RSVP hello protocol

extension, 348–349

fault detection time

overview, 23

recovery cycle and, 307–308

fault indication signal (FIS)

as IGP update message,

310–311


propagation in IP routing,

229–237

as RSVP Path Error message,

311

fault management

defined, 42

hierarchy in SDH, 58, 59

SDH processes, 58–60

fault notification time

illustrated, 23

in IP routing, 215

overview, 23

in recovery cycle, 308–309

RSVP reliable messaging,

308–309

fault repaired notification time,

24, 25

faults. See failures; outages or

faults

FCC (Federal Communications

Commission) reporting

requirements, 13

FDI (Forward Defect

Indication) in OTN,

155–156

FDI-O (Forward Defect

Indication Overhead)

defect in OTNs, 153

FDI-P (Forward Defect

Indication Payload)

defect in OTNs, 152–153

FDM (Frequency Division

Multiplexing), 132

Federal Communications

Commission

(FCC) reporting

requirements, 13

FIB (forwarding information

base), 204, 251

fiber cable topology

SRG and duct topology

versus, 159–160

SRG and fiber topology

versus, 160, 161

fiber topology, SRG and fiber

cable topology versus,

160, 161

FIS. See fault indication signal

(FIS)

fish problem in traffic

engineering, 298–301

flapping resources

dampening algorithms in IP

routing and, 226

LSA origination and, 232–233

flexible optical networks, 200

flow random early detection

(FRED), 236

Forward Defect Indication

(FDI) in OTN, 155–156

Forward Defect Indication

Overhead (FDI-O) defect

in OTNs, 153

Forward Defect Indication

Payload (FDI-P) defect in

OTNs, 152–153

forwarding adjacency (FA),

429

forwarding information base

(FIB), 204, 251

FRED (flow random early

detection), 236

Frequency Division

Multiplexing (FDM),

132

FRR (Fast Reroute) in MPLS

TE. See local protection

in MPLS TE

G

gateways

dual-gateway ring

interconnection schemes,

94–95, 106

between self-healing rings,

node architectures for,

104–105

General Switch Management

Protocol (GSMP), 425,

431

Generalized Multi-Protocol

Label Switching

(G-MPLS)

for ASON CP

implementation, 426–429

ASON standardization and,

425

for dynamic lightpath

allocation, 3

label presentation options,

426–427

link state routing in G-MPLS-

capable networks, 428

LSP representation in,

428–429

restoration, 429–430

generic multilayer recovery

approaches. See also

specific approaches

case studies, 464–476

deciding which layers get

recovery schemes, 439

dynamic multilayer recovery,

457–463



generic framework for

multilayer survivability,

464

need for multilayer recovery,

438–439

overview, 437–438, 464,

476–477

single-layer recovery schemes


439–444

static multilayer recovery

schemes, 444–457

supporting spare resources for

multilayer recovery,

449–454

global default restoration in

MPLS TE

advantages and drawbacks,

343–344

defined, 310

fault indication signal (FIS),

310–311

mode of operation, 311–313

overview, 343–344

recovery cycle with, 312–313

recovery time, 313–314

revertive versus nonrevertive

modes, 346–347


global path protection in MPLS

TE

advantages and drawbacks,

344–345

backup path computation,

393–397

bandwidth sharing capability,

341–342

defined, 310

local protection compared to,

336–346

mode of operation,

315–316

overview, 314, 344–345

recovery time, 316, 336

revertive versus nonrevertive

modes, 347

standardization, 370–371

state overhead and scalability,

336–340

global recovery, defined, 33

G-MPLS. See Generalized

Multi-Protocol Label

Switching (G-MPLS)

good news (link cost decrease),

iSPF and, 288–291

GSMP (General Switch

Management Protocol),

425, 431

guaranteed bandwidth, as

criterion for recovery

mechanisms, 27

H

Hanshin/Awaji earthquake,

14–15

hello-based mechanisms

bidirectional forwarding

detection, 223

false-positive alarms, 349

IGP hellos, 223

in IP routing, 223–224

layer 2 link failure notification

versus, 223–224


RSVP hello protocol

extension, 348–349

helper neighbors of restarting

routers, 269

higher order path (HOP) layer in

SDH. See also Virtual

Containers-n (VC-n) in

SDH

DXC (digital cross-connect),

70–74, 81

LOPs carried by, 47–48

MS-SP Rings and, 83

overview, 46–47, 55

race conditions and AIS

propagation, 72, 73, 74

hold-off time

illustrated, 23, 24

in multilayer recovery, 38, 448

in recovery cycle, 23, 308

in reversion cycle, 24

hold-off timer

escalation strategy

implementation, 448


HOP layer. See higher order

path (HOP) layer in SDH

I

IETF (Internet Engineering

Task Force), OTN work

by, 139

IGP. See interior gateway

protocols (IGP)

incremental Dijkstra algorithm

(iSPF)

efficiency, 293

final algorithm, 291–293

history, 287

link cost decrease (good news)

and, 288–291

link cost increase and,

287–288

motivation, 285–287

inherent supervision of

subnetwork connections

(SDH), 79

integrated approach to multilayer

recovery, 37, 449

integrity, defined, 9–10

intelligent optical networks

(IONs), 424, 476

interface dampening using

exponential decay

algorithm, 227–228

interior gateway protocols (IGP)

FIS as IGP update message,

310–311

hello-based mechanisms in IP

routing, 223

interaction between fast IGP

convergence and NSF,

293–295

link metric manipulation, 265

metric optimization, 263–264


planned node failure and, 226

RP failure and, 225

temporary loop duration and

timers, 255–256



Intermediate System to

Intermediate System

(IS-IS) routing protocol


multitopology routing,

238–241

overview, 212


shortest path computation

triggers, 249

TE LSPs and, 328

internal causes of failure, 12

International

Telecommunication

Union (ITU)

OTN standards, 139, 144

SDH standardized by, 57

work on OTN recovery,

158–159

Internet Engineering Task Force

(IETF), OTN work by,

139

Internet Protocol (IP) routing

algorithm complexity,

279–284

analysis of recovery cycle,

214–220

case study with IS-IS


dampening algorithms,

226–229

distance vector routing

protocols, 204–207

failure characterization,

224

failure detection, 222–224

failure profiles, 220–222

fault notification time, 215

FIS propagation, 229–237

forwarding information base

(FIB), 204

global versus local recovery

and, 213–214

hold-off timer, 214–215

impact of failure types on

traffic forwarding,

225–226

interaction between fast IGP

convergence and NSF,

293–295

link failures, 220–221, 225

link state routing protocols,

207–213

load balancing, 259–262

lower layers failure

notification, 222–223

LSA origination and flooding,

215, 229–237

multilayer recovery case

studies, 469–476

node failures, 221–222,

225–226

nonstop forwarding (NSF)

OSPF example, 266–270

overview, 278–279

protocols, 204–214

QoS during failure, 262–265

rerouting upon link failure

(example), 217–220

research-related topics, 295

route computation, 237–252

routing table computation,

215–217

temporary loops during

network state changes,

252–258

Internet Protocol/Multi-

Protocol Label Switching

(IP/MPLS)

IP/MPLS-over-OTN

multilayer model, 2–3,

203

unidirectional connections

in, 3

interworking. See escalation

strategy

intrusive supervision of

subnetwork connections

(SDH), 79–80

IONs (intelligent optical

networks), 424, 476

IP layer

failure scenarios, 17–18

in IP-over-OTN network, 6, 7

single-layer versus multilayer

recovery and, 36–37

IP routing. See Internet Protocol

(IP) routing

IP/MPLS-over-OTN multilayer

model, 2–3, 203

IP-over-OTN network

IP layer in, 6, 7

multilayer recovery

requirement in, 438–439

OTN layer in, 6, 7

IS-IS. See Intermediate System

to Intermediate System

(IS-IS) routing protocol

iSPF (incremental SPF). See

incremental Dijkstra

algorithm (iSPF)

ITU. See International

Telecommunication

Union (ITU)

J

jitter, as criterion for recovery

mechanisms, 27

L

label switched router (LSR)

configuration of TE LSP on

head-end, 303, 311–312

head-end versus midpoint

versus tail-end, 301

preemption, 305–306

latency, as criterion for recovery

mechanisms, 27

Layer 1 (physical layer), 5, 16–17

Layer 2 (data link layer), 5

Layer 3. See network layer

(Layer 3); optical

network layer

Layer 4 (transport layer), 5–6

Layer 5 (application layer), 6

layered network representation

for IP-over-OTN network, 6, 7

for multitechnology networks,

6, 7

for OTNs, 139–142

reference models, 5

for SDH networks, 46–48, 55

for SONET networks, 57

TCP/IP protocol stack, 5–6



LCK (Locked) defect in OTNs,

153

LCs (link connections), 43

line failures in optical networks,

availability and, 186–188

linear protection in SDH,

107–113

multiplex section protection

(MSP), 107–108, 109, 113

overview, 113

path protection, 108–112, 113

link connections (LCs), 43

link cost decrease (good news),

iSPF and, 288–291

link cost increase, iSPF and,

287–288

link disjoint or link diverse TE

LSPs, 301

link failures in IP routing

causes of, 220–221

detection, 222–224

failure characterization, 224

impact on traffic forwarding,

225

LSA origination and, 231

temporary loops from,

253–257

temporary loops from

restored links, 257–258

link failures in MPLS TE

Case Study 1 (UK network),

356–357

Case Study 2 (UK network

with shared SRLGs),

360–361

Case Study 3 (complex US

network), 365–368

differentiating from node

failures, 349–353


353

Link State Advertisement (LSA)

aspects of LSA flooding, 230

example of rerouting upon

link failure, 217

flooding procedure defined,

215, 229

flooding procedure overview,

233–237

impact of origination on

network, 232–233

inefficiencies in flooding,

230–231

LSA refresh, 231–232

opaque LSA, 267

origination process, 231–233

OSPF versus IS-IS routing

protocol and, 212

parameters tuning, 233,

250–251

propagation delay, 233, 237

queuing delays, 233, 234–237

temporary loop duration and

flooding, 255

time estimate for origination

and flooding process, 237

link state databases (LSDBs),

211–212

Link State Packet (LSP), 212

link state routing protocols

distance vector protocols

versus, 212–213

hierarchical routing, 211–212

history, 207–210

IS-IS, 212

link state databases (LSDBs),

211–212

objective, 204

OSPF, 212

overview, 210–213

protocol data units (PDUs),

210–211

load balancing

defined, 259

MPLS TE and, 334–335

per-packet versus

per-destination, 259–260

recovery upon network failure

and, 261–262

symmetrical versus

asymmetrical, 260–262

local protection for IP,

research on, 295

local protection in MPLS TE

advantages and drawbacks,

345–346

backup path computation,

397–421

bandwidth sharing capability,

342–343

comparison of approaches,

332–333

defined, 310

Detour LSP merging, 319,

384–385

facility backup or bypass,

320–324, 342–343, 345,

379–382, 397–419

global protection compared

to, 336–346

local defined, 317

merge point (MP), 317, 382

motivations for deploying, 329

multilayer recovery case

studies, 465–469, 471–476

network design with full mesh

of unconstrained TE

LSPs, 329–330

network design with

unconstrained one-hop

TE LSPs, 330–332

NHOP backup tunnel, 316

NNHOP backup tunnel,

316–317

notification of tunnel locally

repaired, 327–328

one-to-one backup or Detour

LSP, 318–319, 343, 345,

382–384, 419–421

overview, 345–346

point of local repair (PLR),

316, 379–382

principles of recovery

techniques, 317–318

protection defined, 318

recovery time, 336

revertive versus nonrevertive

modes, 347–348

RSVP signaling extensions,

372–385

signaling extensions, 329


state overhead and scalability,

336–340

TE LSP properties, 325–326


local recovery, defined, 32–33



Locked (LCK) defect in OTNs,

153

LOF (Loss of Frame) defect in

OTNs, 152

logical spare unprotected

strategy, 450–454

LOM (Loss of Multiframe)

defect in OTNs, 152

LOP layer. See lower order path

(LOP) layer in SDH

LOS-O (Loss of Signal

Overhead) defect in

OTNs, 152

LOS-P (Loss of Signal Payload)

defect in OTNs, 152

Loss of Frame defect (dLOF) in

SDH, 50, 51

Loss of Frame (LOF) defect in

OTNs, 152

Loss of Multiframe defect

(dLOM) in SDH, 50, 51

Loss of Multiframe (LOM)

defect in OTNs, 152

Loss of Pointer defect (dLOP) in

SDH, 50, 51

Loss of Signal defect (dLOS) in

SDH, 51–52

Loss of Signal Overhead

(LOS-O) defect in OTNs,

152

Loss of Signal Payload (LOS-P)

defect in OTNs, 152

Loss of Tandem Connection

(LTC) defect in OTNs,

152

low cost user reliability

requirements, 18

lower layers failure notification

in IP routing

layer 2 notification versus

hello-based detection,

223–224

overview, 222–223

lower order path (LOP) layer in

SDH. See also Virtual

Containers-n (VC-n) in

SDH

DXC (digital cross-connect),

70–74, 81

multiple LOPs carried in HOP

layer, 47–48


race conditions and AIS

propagation, 72, 73, 74

LSA. See Link State

Advertisement (LSA)

LSDBs (link state databases),

211–212

LSP (Link State Packet), 212

LSPs, traffic engineering. See

TE Label Switch Paths

(TE LSPs)

LSR. See label switched router

(LSR)

LTC (Loss of Tandem

Connection) defect in

OTNs, 152

M

M:N protection, 32

management plane, 8

matrix connection (MC), 43

mean time between failures

(MTBFs)

in availability formula, 11, 185

for cable cuts, 12

defined, 11

for equipment failures, 13

optical line failures and, 186,

187

optical node failures and, 185

unprotected connections and,

188

mean time to repair (MTTR)

in availability formula, 11, 185

for cable cuts, 13

defined, 11

for equipment failures, 13

optical line failures and, 186

optical node failures and, 185

unprotected connections and,

188

merge point (MP), 317, 382

mesh networks. See also mesh-

based optical networks

ring networks versus, 3–4, 35

single-layer recovery in, 35

mesh-based optical networks.

See also mesh networks

availability comparison

between 1+1 protection

in ring-based optical

networks and, 192–193

availability comparison

between protection and

restoration schemes,

194–197

availability versus topology,

195

availability versus traffic type,

195–197

optical cross-connects (OXCs)

in, 137–138

overview, 137–139


173–182

ring-based versus mesh-based

recovery schemes,

182–185

meta-mesh recovery technique in

optical networks,

199–200

MP (merge point), 317, 382

MPLS (Multi-Protocol Label

Switching) in IP layer,

2–3. See also Generalized

Multi-Protocol Label

Switching (G-MPLS);

Internet Protocol/Multi-

Protocol Label Switching

(IP/MPLS)

MPLS TE. See Multi-Protocol

Label Switching traffic

engineering (MPLS TE);

Multi-Protocol Label

Switching traffic

engineering (MPLS TE)

recovery techniques

MS layer. See multiplex section

(MS) layer in SDH

MS-DP Rings. See Multiplex

Section-Dedicated

Protection Rings

(MS-DP Rings) in SDH

MSP. See multiplex section

protection (MSP)



MS-SP Rings. See Multiplex

Section-Shared

Protection Rings (MS-SP

Rings) in SDH

MTBFs. See mean time between

failures (MTBFs)

MTTR. See mean time to repair

(MTTR)

multiautonomous systems

networks, TE LSPs and,

328

multilayer networks. See

Automatic Switched

Optical Network

(ASON); generic

multilayer recovery

approaches

multilayer recovery


common pool strategy,

451–454


deciding which layers get

recovery schemes, 439

double protection strategy,

449–450

dynamic, 457–463

generic framework for

multilayer survivability,

464

integrated approach, 37, 449


strategy, 450–454

need for, 438–439

network operation complexity

and, 455

overview, 36–37, 476–477

qualitative performance

comparison, 456–457

revertive operation, 456

sequential approach, 37,

446–448



440–444

static recovery schemes,

444–457

supporting spare resources

for, 449–454

trade-off between rerouting

time and network

stability, 454–455

uncoordinated approach,

444–446

multiple rings. See ring

interconnection in SDH

multiple-link failures, 17–18

multiplex section (MS) layer in

SDH

defect detection times, 50–52

linear protection, 107–108,

113

overhead bytes, 56

overview, 47, 55

multiplex section protection

(MSP)

bidirectional linear protection,

107–108

MS-DP Rings, 82, 91–93

MS-SP Rings, 82, 83–91

OMS DPRings, 163–164

OMS SPRings, 164–166

OMS- versus OCh-based

approach, 170–171

overview, 113

path protection versus, 110

shared versus dedicated

approach in optical

networks, 171–173

STM-1 linear protection, 108,

109

unidirectional linear

protection, 108

Multiplex Section-Dedicated

Protection Rings

(MS-DP Rings) in SDH

interconnection, 102–103

misconnections, 92–93

MS-SP Rings versus, 82

operation, 91–92

optical ring networks

compared to, 163

overview, 105

spatial reuse prevented in,

92

as unidirectional line switched

Rings (ULSR),

106

Multiplex Section-Shared


Protection Rings (MS-SP

Rings) in SDH. See also

case study for SDH

protection strategies

APS protocol and, 83–86, 88

as bidirectional line switched

Rings (BLSR), 106

drop and continue


drop and continue

interconnection with

SNCP Rings, 101–102

in failure-free situation, 83

link failure and, 83–86

logical view, 88–89

misconnections, 89

as MS trail protection

technique, 86

MS-DP Rings versus, 82

Non-preemptible Unprotected

Traffic (NUT) support,

86

one-way delay on long path,

85–86

operation, 83–86


optical ring networks

compared to, 163

overview, 105

span protection in four-fiber

ring, 86–88

spare/protection capacity

sharing between

nonoverlapping

connections, 89–91

spatial reuse feature, 82, 89–91

squelching mechanisms, 88–89

states of ring nodes, 83–86

two-fiber versus four-fiber


multiplexing. See also specific

kinds of multiplexers

byte-interleaved versus bit-

interleaved, 45–46

STM-N ADM example, 52–54

Multi-Protocol Label Switching

(MPLS) in IP layer, 2–3.

See also Generalized



508 Index

Switching (G-MPLS);

Internet Protocol/


Switching (IP/MPLS)


Multi-Protocol Label Switching

traffic engineering

(MPLS TE). See also

Multi-Protocol Label

Switching traffic

engineering (MPLS TE)

recovery techniques;

TE Label Switch Paths

(TE LSPs)

bandwidth optimization

using, 306

classical fish problem, 298–301

components, 303–305

fast recovery using, 306–307

motivations for deploying,

306–307, 329

preemption in, 305–306

QoS guarantees, 306, 386–387,

388–392, 400–419

shared risk link group (SRLG)

and, 301–303


traffic engineering in data

networks, 298–301

tunneling using TE Label

Switch Paths (TE LSPs),

300–301


Multi-Protocol Label Switching

traffic engineering

(MPLS TE) recovery

techniques

1+1 packet protection,

333–334


385–421


comparison of global and

local protection, 336–346

extensions for point-to-

multipoint LSPs, 422

failure profile and fault

detection, 348–354

global default restoration,

310–314, 343–344,

346–347

global path protection, 310,

314–316, 344–345, 347

load balancing and, 334–335

local protection, 310, 316–333,

345–346, 347–348

MPLS TE refresher, 298–307

overview, 371–372

recovery cycle analysis,

307–310

research-related topics, 422


modes, 346–348

RSVP signaling extensions for

local protection, 372–385

standardization, 370–371

multitechnology networks, 6

multitopology routing, 238–241

N

NCs (network connections), 44

neighbors of restarting routers,

269

network connections (NCs), 44

network element layer (NEL), 43

network elements (NEs) in SDH

fault detection and

propagation inside, 60–70


network layer (Layer 3). See also

optical network layer

failure scenarios, 17–18

overview, 5

in SDH networks, 46–48, 55

in SONET networks, 57

network management interface

for ASTN (NMI-A), 426

network management layer

(NML), 43

Network Management System

(NMS)

abstraction levels or layers, 43

management aspects of, 42

restoration in SDH networks

and, 113–115

in transmission networks, 42

network planes. See also control

plane (CP) in ASONs

control plane, 7–8

data or user plane, 6

illustrated, 7

management plane, 8

network reliability. See

reliability

NMI-A (network management

interface for ASTN), 426

NML (network management

layer), 43

NMS. See Network

Management System

(NMS)

node disjoint or node diverse TE

LSPs, 301

node failures in IP routing. See

also route processor (RP)

failure



225–226

LSA origination and, 231

planned, 221–222, 226

temporary loops from,

253–257

node failures in MPLS TE

Case Study 2 (UK network

with shared SRLGs), 361

Case Study 3 (complex US

network), 368

differentiating from link

failures, 349–353


353–354

planned, 354

nonintrusive supervision of


(SDH), 79

Non-preemptible Unprotected

Traffic (NUT), 86

nonrevertive mode

in MPLS TE recovery, 346–348


mechanisms and, 36


nonstop forwarding (NSF)

OSPF example

backward compatibility,

269–270

entering graceful restart mode,

267–268



nonstop forwarding (continued)

entering in helper mode, 269

exiting graceful restart mode,

268–269

grace period, 267

during graceful restart period,

268


convergence and, 293–295

mode of operation of

restarting router, 267–269

mode of operation of

restarting router’s

neighbors, 269

mode of operation overview,

267

overview, 266–267

restarting period defined, 266

NP-complete problems, 284

NSF. See nonstop forwarding

(NSF) OSPF example

NTT (Japanese telephone

company), 14–15

NUT (Non-preemptible

Unprotected Traffic), 86

O

OADMs. See Optical Add/Drop

Multiplexers (OADMs)

OCCs (optical connection

controllers) in ASONs,

425–426

OCh layer. See optical channel

(OCh) path layer of

OTNs

OCh OH (optical channel

overhead), 150–151

OCI (Open Connection


Indication) defect in

OTNs, 153

ODU (optical channel data unit)

layer of OTNs, 141

ODUk OH (optical channel data

unit overhead), 145–148

OEO (optical-electrical-optical)

OXC switches, 137

OIF (Optical Internetworking

Forum), OTN work by,

139

OMS DPRings. See Optical

Multiplex Section-


Dedicated Protection

Rings (OMS DPRings)

OMS layer. See optical multiplex

section (OMS) layer of

OTNs

OMS OH (optical multiplex

section overhead) in

optical networks, 151

OMS SPRings. See Optical

Multiplex Section-Shared

Protection Rings (OMS

SPRings)

1+1 packet protection in MPLS

TE, 333–334

1+1 protection

dedicated with extra traffic, 32

1:1 protection

dedicated, 31


1:N protection, 32

one-to-one backup in MPLS TE

backup tunnel path



343

overview, 318–319, 345


Open Connection Indication

(OCI) defect in OTNs,

153

Open Shortest Path First

(OSPF) routing protocol


nonstop forwarding (NSF)

example, 266–270

overview, 212

packet marking and, 234


triggers, 249

TE LSPs and, 328

OPeration EXpenditure

(OPEX), IONs and

reduction in, 424

Optical Add/Drop Multiplexers

(OADMs)

fixed, 135, 136

flexible, 135–136

OMS DPRings and, 164

OMS SPRings and, 165–166

optical nodes and, 135

recovery in ring-based optical

networks and, 161, 164,

165–166, 170, 171

ring organization and, 135–137

optical channel data unit (ODU)

layer of OTNs, 141


optical channel data unit

overhead (ODUk OH),

145–148

optical channel (OCh) path layer

of OTNs

defined, 140

recovery mechanisms in ring-


based optical networks,

161

optical channel overhead (OCh

OH), 150–151


optical channel payload unit

(OPU) layer of OTNs, 141


optical channel payload unit

overhead (OPUk OH),

145

optical channel protection in

ring-based optical

networks

mixed OCh DPRings and

OCh SPRings, 170


versus, 170–171

OCh DPRings, 166–169

OCh SPRings, 169–170

optical channel transport unit

overhead (OTUk OH),

149–150

optical connection controllers

(OCCs) in ASONs,

425–426

optical cross-connects (OXCs)

in mesh-based optical

networks, 137–138,

176–177

OEOEO opaque switches, 137

opaque or OEO OXC

switches, 137

restoration schemes in

mesh-based OTNs and,

178–179



SRLGs and, 302

transparent or OOO OXC

switches, 137

wavelength routing

(WR-OXC), 138, 176–177

wavelength translating

(WT-OXC), 138, 177

Optical Internetworking Forum

(OIF), OTN work by, 139

optical multiplex section (OMS)

layer of OTNs

defined, 140

link-based restoration

schemes, 178



recovery mechanisms in ring-

based optical networks,

161


optical multiplex section

overhead (OMS OH),

151

Optical Multiplex Section-


Dedicated Protection

Rings (OMS DPRings)

overview, 163–164

shared approach versus, 161,

171–173

Optical Multiplex Section-

Shared Protection Rings

(OMS SPRings)

dedicated approach versus,

161, 171–173

overview, 164–166

optical network layer


evolution, 132–139


with optical nodes, 135

recovery schemes, 157–158


WDM in point-to-point,

132–134

optical networks. See also mesh-

based optical networks;

Optical Transport

Networks (OTNs); ring-

based optical networks



availability and 1+1

protection in ring-based

versus mesh-based optical

networks, 192–193

availability and protection

versus restoration in

mesh-based networks,

194–197

availability calculations,

185–192

defects, 152–153

evolution of the optical

network layer, 132–139

fault detection and

propagation, 144–157


alarm suppression,

154–157


optical nodes, 135

overhead, 145–152

overview, 200–201

recovery mechanisms in mesh-

based networks, 173–182


recovery mechanisms in ring-

based networks, 160–173

recovery schemes in the


optical network layer,

157–160

research trends, 197–200


ring-based versus mesh-based

recovery schemes,

182–185

WDM in point-to-point


132–134

optical nodes

failures and availability, 185

overview, 135

optical physical section (OPS)

layer of OTNs

defined, 140

optical transport module and,

143–144


optical transmission section

overhead (OTS OH),

151–152

optical transport module (OTM)

frame structure of OTUk, 142

OPS layer and, 143–144

order of (maximum supported

wavelength channels), 143

structure, 142–144

Optical Transport Networks

(OTNs)

architectural aspects and

structure, 139–142

associated overhead, 143,

145–150

bottleneck at nodes overcome

by, 2

extension toward G-MPLS, 3


alarm suppression,

154–157

nonassociated overhead,

150–152


optical channel data unit

(ODU) layer, 141

optical channel (OCh) path

layer, 140


optical channel payload unit

(OPU) layer, 141


optical multiplex section

(OMS) layer, 140

optical physical section (OPS)

layer, 140, 143–144


optical transmission section

(OTS) layer, 139–140

optical transport module

(OTM) structure,

142–144

overview, 139

SDH/SONET network

evolution to, 39

standardization, 139, 144

standardization work on

recovery, 158–159

traffic volumes for, 46

optical-electrical-optical (OEO)

OXC switches, 137

OPU (optical channel payload

unit) layer of OTNs, 141

OPUk OH (optical channel

payload unit overhead),

145

OSPF. See Open Shortest Path

First (OSPF) routing

protocol



OTM. See optical transport

module (OTM)

OTN layer

in IP-over-OTN network,

6, 7

single-layer versus multilayer

recovery and, 36–37

OTNs. See Optical Transport

Networks (OTNs)

OTS OH (optical transmission

section overhead),

151–152

OTUk OH (optical channel

transport unit overhead),

149–150

outages or faults. See also

failures; reliability

defined, 10

detection and propagation

inside NEs (SDH), 60–70

drastic or severe, 13–15

FCC reporting requirements,

13

information propagation

through SDH network,

70–74

planned versus unplanned, 12

router power supply outage,

221, 225

overlay model for control plane,

433–434

OXCs. See optical

cross-connects (OXCs)

P

packet switching, 4

path protection in MPLS TE.

See global path

protection in MPLS TE

path protection in SDH

drop and continue mechanism

in, 110

dual-network representation

for disjoint paths, 112

end-to-end SNCP, 108–110

linear 1:N, 110–112


multiplex section protection

(MSP) versus, 110

Path-Specific (PS) method of

identifying signaled TE

LSP, 379

payload, 6

Payload Mismatch defect

(dPLM) in SDH, 50, 51

Payload Mismatch (PLM) defect

in OTNs, 152

Payload Missing Indication

(PMI) defect in OTNs,

153, 155–156

p-cycles, 197–199

PDH (Plesiochronous Digital

Hierarchy), 45

PDUs (protocol data units),

210–211

peer model for control plane,

434–435

performance

criteria for recovery

mechanisms, 25–28

of Dijkstra algorithm,

248–249

of multilayer recovery


performance management, 42

per-packet load balancing,

259–260

physical layer (Layer 1), 5,

16–17

planes. See network planes

planned node failure

hitless upgrades and, 354

in IP routing, 221–222, 226

in MPLS TE, 354

Plesiochronous Digital

Hierarchy (PDH), 45

PLM (Payload Mismatch) defect

in OTNs, 152

PLR. See point of local repair

(PLR)

PMD (polarization mode

dispersion), 132

PMI (Payload Missing


Indication) defect in

OTNs, 153, 155–156

point of local repair (PLR)

behavior before failure,

379–381

behavior during failure,

381–382

defined, 316

polarization mode dispersion

(PMD), 132

power supply failure

facility failure, 221


225, 353

in IP routing, 221, 225

in MPLS TE, 353

node failure caused by, 221

router power supply outage,

221, 225

preplanned recovery paths

dynamic recovery paths

versus, 31

overview, 30


and, 31

pre-session load balancing,

260

preventing failures, 20–21

primary failure, 10

primary path, recovery schemes

and, 21, 22

propagation delay in LSA

flooding, 233, 237

protection. See also specific kinds

1+1 (dedicated), 31

1:1 (dedicated with extra

traffic), 32

1:N (shared recovery with

extra traffic), 32

case study for SDH protection


M:N, 32


networks, 175–177,

180–182

in optical networks, 158

restoration compared to,

430

restoration versus, 31,

113–115, 158

protection rings. See ring

protection in SDH

protocol data units (PDUs),

210–211



PS (Path-Specific) method of

identifying signaled TE

LSP, 379

Q

quality of service (QoS)

algorithm complexity and, 265

backbone network profile

considerations, 387–388

guarantee during failure in IP

routing, 264–265

guarantee during failure in

MPLS TE, 306, 386–387,

388–392, 400–419

link metric manipulation and,

265

in MPLS Diffserv-aware

networks, 387–388

during non-steady state

periods in MPLS TE,

388–392

overprovisioned networks

and, 387, 397

queuing delays in LSA

flooding and, 234–235


in traffic engineered networks,

388

traffic engineering at steady

state and, 262–264

queuing delays in LSA flooding

congestion avoidance


packet marking, 234–235

packet scheduling, 235

QoS and, 234–235

queuing packets based on

DSCP, 235–236

queuing process described, 233

random early detection (RED)

and, 236

R

random early detection (RED),

236

RDI. See remote defect

indication (RDI) signal in

SDH

recovery cycle

criteria for performance,

25–28

fault detection time, 307–308

fault notification time,

308–309

with global default restoration


hold-off timer, 308

illustrated, 23


overview, 23–24, 307–310

recovery operation time, 309

traffic recovery time, 309–310

recovery cycle in IP routing


link failure, 217–220

fault detection and

characterization, 214

fault notification time, 215

hold-off timer, 214–215

routing table computation,

215–217

recovery extent

defined, 32

global versus local recovery,

32–34

recovery head-end (RHE)

in APS subnetwork

connection protection, 80

in APS trail protection, 77

in global recovery, 33

in local recovery, 32

local versus global recovery

and, 33


in SNCP Rings, 93

recovery mechanisms in mesh-







between protection and

restoration schemes,

194–197

link-based recovery schemes,

174–175

link-based restoration

schemes, 178

overview, 173–175

path-based recovery schemes,

174

p-cycles, 197–199

preplanned versus dynamic,

178–180

protection combined with

restoration, 182

protection in WP versus VWP

networks, 176–177

protection options, 175–176

protection versus restoration,

174, 180–181


ring-based schemes versus,

182–185

shared restoration schemes,

178

recovery mechanisms in MPLS

TE

global default restoration,

310–314

global path protection, 310,

314–316

local protection, 310, 316–333







dedicated versus shared

schemes, 161, 171–173

interconnection of rings, 173

layer of implementation and,

161

mesh-based schemes versus,

182–185

meta-mesh recovery

technique, 199–200


OCh SPRings, 170

multiplex section protection,

163–166








(continued)



approach, 170–171

optical channel protection,

166–170

overview, 160–162

SONET/SDH networks

compared to, 163

two-fiber versus four-fiber

configuration, 166, 167

unidirectional versus

bidirectional rings and,

161

recovery operation time, 23–24,

309

recovery paths


and, 33–34

preplanned versus dynamic,

30–31

recovery schemes and, 21–22

in single-layer recovery

mechanisms, 30–31

recovery schemes (basic

principle), 21–22

recovery tail-end (RTE)

in APS subnetwork

connection protection,

80

in APS trail protection, 77

in global recovery, 33

in local recovery, 32


and, 33


in SNCP Rings, 93

recovery time


as criterion for recovery

mechanisms, 26

defined, 26

with global default restoration


with global path protection in

MPLS TE, 316


in MPLS TE, 336


and, 31

recovery token signal

escalation strategy

implementation, 448

in multilayer recovery, 38

RED (random early detection),

236

regenerator (SDH)

cable cut upstream of, 61–62,

70, 72

distorted noise/signal entering,

62–63

incoming AU_AIS signal, 65,

67, 68

incoming MS_AIS signal, 65,

66

regenerator section (RS) layer in

SDH

defect detection times,

50–52

overview, 47, 55

reliability. See also failures;

outages or faults

definitions, 9–11

importance of, 8, 20

measures to increase, 20–22

overview, 8–22

requirements for services,

18–19

requirements for users, 18

SLA examples, 19–20

trend of requirements, 20

Remote Defect Indication defect

(dRDI) in SDH, 50, 51

Remote Defect Indication (RDI)

signal in SDH

fault management processes

and, 60



level, 70, 72

HOP_RDI signal, 72

MS_RDI signal, 64–65,

70, 72

RI_RDI (remote information

–remote defect

indication), 60

SSF signal triggering, 64–65

reordering of packets, as


criterion for recovery

mechanisms, 27

research-related topics

flexible optical networks, 200

on IP routing, 295

meta-mesh recovery

technique, 199–200

MPLS TE, 422

p-cycles, 197–199

trends in optical networking,

197–200

Resource Reservation Protocol

(RSVP)

FIS as RSVP Path Error

message, 311

hello protocol extension,

348–349

reliable messaging mode,

308–309

scalability issues, 304

signaling extensions for MPLS

TE local protection,

372–385

TE LSP setup, 304

Traffic Engineering extensions

(RSVP-TE), 427

restarting router

defined, 267

entering graceful restart mode,

267–268

exiting graceful restart mode,

268–269

during graceful restart period,

268

restoration

in G-MPLS networks,

429–430


networks, 177–182

optical, multilayer recovery


in optical networks, 158

protection compared to, 430

protection versus, 31,

113–115, 158


reversion cycle

criteria for performance, 25–28



illustrated, 24

overview, 24–25


mechanisms and, 36

reversion operation time, 24, 25

RHE. See recovery head-end

(RHE)

RIB. See routing information

base (RIB) or routing

table

ring interconnection in mesh-


173

ring interconnection in SDH

drop and continue

interconnection of

MS-SP and SNCP Rings,

101–102

drop and continue

interconnection of


drop and continue

interconnection of SNCP

Rings, 96–97

dual-gateway schemes, 94–95,

106

global versus local protection

techniques, 95

MS-DP Rings, 102–103

node architectures for

gateways, 104–105

overview, 93–95, 105–106

of stacked STM-N Rings,

103–104

virtual ring (VR)

interconnection, 95–96


vulnerability of single-node

interconnections, 94

ring networks. See also ring-

based optical networks


defined, 3

mesh networks versus, 3–4, 35

popularity of, 82

single-layer recovery in, 35, 36

SONET/SDH compared to

optical, 136–137

as transmission networks, 41

ring protection in SDH. See also

specific kinds

MS-DP Rings (Multiplex

Section-Dedicated

Protection Rings), 82,

91–93

MS-SP Rings (Multiplex

Section-Shared

Protection Rings), 82,

83–91

overview, 81–82, 105–106

ring interconnection, 93–105

SNCP Rings (Subnetwork

Connection Protection

Rings), 82, 93

in SONET versus SDH,

106–107

ring-based optical networks. See

also ring networks





interconnection of rings, 173

mesh-based versus ring-based

recovery schemes,

182–185


OCh SPRings, 170


multiplex section protection,

163–166

OADMs and, 135–137






approach, 170–171

optical channel protection,

166–170

overview, 135–137


160–173

shared versus dedicated

approach, 161, 171–173

SONET/SDH networks

compared to, 136–137

RIP (Routing Information

Protocol), 207

RIP-TRIG (Routing

Information Protocol

with triggered update),

207

root failure, 10

Rosen, Eric, 287

route computation

Dijkstra algorithm, 241–249

routing information base

(RIB) update, 251–252

shortest path computation,

238–241


triggers, 249–251

route processor (RP) failure

centralized versus distributed

architectures and, 221,

225


225, 353–354

in IP routing, 221, 225


Route Record Object (RRO) of

RSVP, 376–377

route recursion, 252

router interface failure, 221

router power supply outage


225


routing algorithms. See also

Internet Protocol (IP)

routing

adaptive dynamic distributed

algorithm in ARPANET,

207–210

complexity, 243, 265, 279–284

congestion avoidance


dampening algorithms,

226–229

Dijkstra algorithm for

shortest path, 241–249

for IGP metric optimization,

264



QoS during failure and

algorithm complexity, 265

routing information base (RIB)

or routing table



routing information base (RIB)

or routing table

(continued)



link failure, 217–220

populating, 217

route recursion, 252

shortest path tree (SPT)


updating, 251–252


Routing Information Protocol

(RIP), 207


Routing Information Protocol

with triggered update

(RIP-TRIG), 207

routing table. See routing

information base (RIB)

or routing table

RP failure. See route processor

(RP) failure

RRO (Route Record Object) of

RSVP, 376–377

RS layer. See regenerator section

(RS) layer in SDH

RSVP. See Resource Reservation

Protocol (RSVP)

RSVP signaling extensions for

MPLS TE local

protection

detour merging, 384–385

DETOUR Object, 375–376

FAST-REROUTE Object,

374–375

identification of a signaled TE

LSP, 378–379

Route Record Object (RRO),

376–377

SESSION-ATTRIBUTE

Object, 372–374

signaling a protected TE LSP

with a set of constraints,

378

signaling with facility backup,

379–382

signaling with one-to-one

backup, 382–384

RTE. See recovery tail-end

(RTE)

S

safety critical user reliability

requirements, 18

scalability


as criterion for recovery

mechanisms, 27–28

hello-based mechanisms in IP

routing and, 223



of RSVP, 304

scope of failure coverage


as criterion for recovery

mechanisms, 25–26

failure scenarios, 25


and, 34

percentage of coverage, 25–26

SDEG (Signal Degrade) defect

in OTNs, 152

SDH. See Synchronous Digital

Hierarchy (SDH)

secondary failures or symptoms,

10

security management, 42

self-healing ring mechanisms.

See ring protection in

SDH

Sender-Template-Specific (STS)

method of identifying

signaled TE LSP, 379

sequential approach to

multilayer recovery

bottom-up escalation,

446–447

escalation strategy

implementation, 448

overview, 37, 446


server signal fail (SSF) signal in

SDH

MS_TT_Sk function and,

64–65

A sink function and, 60

A source function and, 60

service management layer

(SML), 43

service-level agreements (SLAs)

overview, 19–20

SML layer and, 43

services

reliability requirements and

types of, 18–19

SLA examples, 19–20

SESSION-ATTRIBUTE object

(RSVP), 372–374

shared backup capacity, 29–30

shared recovery

in one-to-N protection, 32


networks, 161, 171–173

shared risk group (SRG)

defined, 18

optical network recovery and,

159–160

shared risk link group (SRLG)

defined, 18

LSA origination parameter

tuning and, 250–251

MPLS TE and, 301–303

research, 295

SRLG disjoint TE LSPs,

303


shortest path computation

Dijkstra algorithm, 241–249



multitopology routing and,

238–241

shortest path defined, 238

triggers, 249–251

shortest path tree (SPT)

computation for routing

tables, 216–217

Signal Degrade (SDEG) defect

in OTNs, 152

signaling. See also specific signals

extensions for local protection

in MPLS TE, 329

fault detection and

propagation inside NEs

and (SDH), 60–70

fault management processes

and (SDH), 58–60

OTN maintenance signals and

alarm suppression,

154–157




and, 31

recovery token signal, 38

requirements as criterion for

recovery mechanisms, 28

SDH versus SONET, 57

STM-N signal (SDH), 47–48

STS-3N signal (SONET), 47

single point of failure, recovery

schemes and, 22


single-layer recovery

mechanisms

backup capacity, dedicated

versus shared, 29–30

centralized versus

decentralized, 34–35

characteristics, 28–36

connection-oriented versus

connectionless networks,

36

control of recovery process,

34–35

global versus local recovery,

32–34


439–444


in, 31–32

recovery paths, preplanned

versus dynamic, 30–31


mode, 36

ring versus mesh networks,

35–36

single-layer recovery schemes in

multilayer networks

overview, 439–440

survivability at the bottom

layer, 440, 441

survivability at the highest

possible layer, 443–444

survivability at the lowest

detecting layer, 442–443

survivability at the top layer,

440, 442

single-link failures

defined, 16

focus on, 17

global recovery, 33

local recovery, 32

single-node failures

defined, 16–17

focus on, 17

global recovery, 33

local recovery, 32–33

sink functions, f2 filters in, 58, 60

SLAs. See service-level

agreements (SLAs)

SML (service management

layer), 43

SNC (subnetwork connection),

44

SNCP. See subnetwork

connection protection

(SNCP)

SNCP Rings. See Subnetwork

Connection Protection

Rings (SNCP Rings)

software failures

AT&T erroneous software

update, 15

hitless upgrades and, 354


225–226, 354

in IP routing, 221, 225–226

in MPLS TE, 354


overview, 12

SONET. See Synchronous

Optical NETwork

(SONET)

spatial reuse

MS-SP Ring feature, 82,

89–91

prevented in MS-DP Rings, 92

prevented in SNCP Rings, 92

SPT (shortest path tree)

computation for routing

tables, 216–217

SRG. See shared risk group

(SRG)

SRLG. See shared risk link

group (SRLG)

SSF signal. See server signal fail

(SSF) signal in SDH

stability


as criterion for recovery

mechanisms, 28


routing and, 226


trade-off between rerouting

time and network

stability in multilayer

recovery, 454–455

standardization

ASON, 425

ASTNs, 425

MPLS TE recovery, 370–371

OTNs, 139, 144

SDH, 57

SONET, 57

work on OTN recovery,

158–159

state overhead, as criterion for

recovery mechanisms, 27

static multilayer recovery

schemes

common pool strategy,

451–454


double protection strategy,

449–454

dynamic multilayer recovery

versus, 457–458, 460,

462–463

escalation strategy defined,

444

integrated approach, 449


strategy, 450–454

network operation complexity

and, 455

overview, 444

qualitative performance

comparison, 456–457

revertive operation, 456

sequential approach, 446–448

supporting spare resources for


449–454


trade-off between rerouting

time and network

stability, 454–455

uncoordinated approach,

444–446

STM-N Rings, interconnection

of stacked rings, 103–104



STM-N signal. See Synchronous

Transport Module of

order N (STM-N) signal

in SDH

STS (Sender-Template-Specific)

method of identifying

signaled TE LSP, 379

STS-3N (Synchronous

Transport Signal of level

3N) signal, 47

sublayer supervision of


(SDH), 80

sublayer tandem connection

monitoring with APS,

78–80

submarine cable break (APCN2),

15



Subnetwork Connection

Protection Rings (SNCP

Rings). See also case

study for SDH protection

strategies

drop and continue


drop and continue

interconnection with



optical ring networks

compared to, 163

overview, 82, 93, 105–106

as unidirectional or

bidirectional path

switched Rings (UPSR or

BPSR), 106–107

subnetwork connection

protection (SNCP)

APS protocol and, 78–80


108

path protection, 108–110

subnetwork connection (SNC),

44

survivability

defined, 10

degree of, 10


in multilayer networks

and, 440–444

switch-back operation

defined, 24

reordering of packets from, 27

switch-over operation, 21

symmetrical load balancing,

260–262

symmetrical services, 3

symptoms or secondary failures,

10

Synchronous Digital Hierarchy

(SDH). See also ring

protection in SDH

ADM (add/drop multiplexer),

52–54

APS protocol, 49, 74–80

base signal, 57

bidirectional connections in, 3


defect detection times, 50–52,

56

DXCs, 53–54, 70, 72–74, 82

evolution to OTNs, 39

fault detection and

propagation inside NEs,

60–70

fault management hierarchy,

58, 59

fault management processes,

58–60



level, 70–74

frame structure, 48–52

interfaces, 56

linear protection, 107–113

MS-DP Rings, 82, 91–93

MS-SP Rings, 82, 83–91

multilayer recovery case study,

469–470


multiplex section protection

(MSP), 107–108, 109, 113

network elements (NEs),

52–55, 56

network layers, 46–48, 55

operational aspects, 57–81

optical rings compared to

SDH rings, 136–137

overhead bytes relevant for

recovery, 48–52, 56

overview, 127–129

path protection, 108–112, 113

references and research-

related topics, 129–130


ring interconnection, 93–105

SNCP Rings, 82, 93

SONET compared to, 56–57,

106–107

standardization, 57

TM (terminal multiplexer), 53

VC-n, 47–48, 55

Synchronous Optical NETwork

(SONET). See also

Synchronous Digital

Hierarchy (SDH)

base signal, 57

bidirectional connections in, 3

evolution to OTNs, 39

multilayer recovery case study,

469–470

network layers, 57

optical rings compared to

SONET rings, 136–137

references and research-

related topics, 129–130

SDH compared to, 56–57,

106–107

standardization, 57

STS-3N signal, 47

Synchronous Transport Module

of order N (STM-N)

signal in SDH

ADM example, 52–54

in multiplex section

protection, 107–108

overview, 47

STM-1 frame format, 48–49

STM-1 tributary interface to

client, 55

Synchronous Transport Signal

of level 3N (STS-3N)

signal (SONET), 47

T

TCP/IP protocol stack, 5–6

TCPs (termination connection

points), 44




TE Label Switch Paths

(TE LSPs)

affected, defined, 316

bandwidth protection desired,

325–326, 373, 377, 378,

380, 383

classes of recovery, 326

classical fish problem and,

300–301

configuration on head-end

LSR, 303, 311–312

Detour LSP in MPLS TE,

318–319

extensions for point-to-

multipoint LSPs, 422

in facility backup, 320–324,

345

fast-reroutable, 325

global default restoration and,

311, 312–313, 314,

343–344

global path protection and,

315–316, 344–345

identification of a signaled TE

LSP, 378–379

label recording desired, 373,

379

link disjoint or link diverse,

301

local protection desired, 373,

378, 379–380, 383

multiarea (OSPF), or

multilevel (IS-IS), or

multiautonomous

systems networks and, 328

network design with full mesh

of unconstrained TE

LSPs, 329–330, 332–333

network design with

unconstrained one-hop

TE LSPs, 330–333

node disjoint or node diverse,

301

node protection desired, 326,

373, 377, 380, 383

notification of tunnel locally

repaired, 327–328

in one-to-one backup,

318–319, 345

packet forwarding, 305

path computation, 304

preemption, 305–306

properties in MPLS TE,

325–326


modes and, 346–348

RSVP hello protocol

extension and, 348–349

secondary, in global path

protection, 315

setup, 304

signaling a protected TE LSP

with a set of constraints,

378

soft preemption desired, 373

SRLG disjoint, 303

TE LSPs. See TE Label Switch

Paths (TE LSPs)

Telecommunications

Management Network

(TMN), 42

temporary loops

administrative link cost

increase and, 256–257


duration and number of

routers involved,

255–256

illustrated, 253, 254, 255, 256,

258

link or node failures and,

253–257

link-load increase from, 257

research, 295

restored network elements

and, 257–258

terminal multiplexer (TM) in

SDH networks, 53

termination connection points

(TCPs), 44

TID (Trace Identifier Mismatch)

defect in OTNs, 152

time of failure, 10

TMN (Telecommunications

Management Network),

42


topology of optical networks

availability versus, in mesh-

based networks, 195

SRG and, 159–161

Trace Id Mismatch defect

(dTIM) in SDH, 50, 51

Trace Identifier Mismatch (TID)

defect in OTNs, 152

traffic

availability versus traffic type


networks, 195–197

data versus voice, 1–2

importance of reliability and,

20

increase in, 1

IP/MPLS-over-OTN

multilayer model for

large volumes, 2–3

optical technology and

concentration of, 20

symmetrical versus

asymmetrical, 3

unidirectional versus

bidirectional, 3

WDM as solution for, 132–134

traffic engineering. See also


Multi-Protocol Label

Switching traffic

engineering (MPLS TE)


applicability of, 298

classical fish problem, 298–301

in data networks, 298–301

in non-MPLS networks, 298

at steady state, QoS and,

262–264

traffic forwarding in IP routing

failure types and, 225–226


forwarding information base

(FIB), 204

multitopology routing and,

240–241


nonstop forwarding (NSF)

OSPF example, 266–270

traffic recovery time

illustrated, 23

MPLS TE recovery

mechanisms, 310

overview, 24, 309

in recovery cycle, 309–310


traffic reversion time, 24, 25

trail protection in SDH


architecture for, 74–76

linear APS, 74–76

MS-SP Rings and, 86

overhead bytes and bits, 74–75

sink direction, 74–76

source direction, 76

trail signal fail (TSF) signal in SDH

cable cut upstream of regenerator and, 61–62

TT sink function and, 60

trail termination (TT) functions

defined, 44, 45

f2 filters in sink function (SDH), 58, 60

trail protection and (SDH), 73–75

transmission networks. See also Synchronous Digital Hierarchy (SDH); Synchronous Optical NETwork (SONET)

atomic functions, 43, 45

illustrated, 41

management of, 42–43


reference points, 45

structuring/modeling, 43–45

transport layer (Layer 4), 5–6

TSF signal. See trail signal fail (TSF) signal in SDH

TT functions. See trail termination (TT) functions

U

ULSR (unidirectional line switched Rings). See Multiplex Section-Dedicated Protection Rings (MS-DP Rings) in SDH

unaccounted failures, 16

uncoordinated approach to


444–446

Unequipped VC defect (dUNEQ) in SDH, 50, 51

unidirectional (single-ended) operation in APS, 77

unidirectional connections in IP/MPLS, 3

unidirectional line switched Rings (ULSR). See Multiplex Section-Dedicated Protection Rings (MS-DP Rings) in SDH

unidirectional linear protection in MSP, 108

unidirectional path switched Rings (UPSR). See



Rings)

unidirectional traffic, 3

UPSR (unidirectional path switched Rings). See



Rings)

up-state timer dampening algorithm, 227

user plane, 6

users

reliability requirements and types of, 18

trend of reliability requirements, 20

V

Virtual Containers-n (VC-n) in SDH

cable cut upstream of VC-4 cross-connection, 63–65

connection functions, 54–55

defect detection times, 50–52



68, 69

overhead bytes, 56


time diagram for VC-12 cross-connected by DXC-4/1s, 72

time diagram for VC-3 cross-connected by DXC-4/3s, 73

virtual ring (VR) interconnection in SDH, 95–96

virtual wavelength path (VWP) optical networks

defined, 137

protection in WP networks versus, 176–177

WT-OXCs and, 137, 177

voice traffic, data traffic volume versus, 1–2

VR (virtual ring) interconnection in SDH, 95–96

VWP optical networks. See virtual wavelength path (VWP) optical networks

W

Wavelength Division Multiplexing (WDM)

bandwidth capacity increased by, 2

Coarse (CWDM), 133

Dense (DWDM), 133

Erbium-doped fiber amplifiers and, 133

interconnection of stacked STM-N Rings and, 103

overview, 132–134

in point-to-point optical network, 132–134

wavelength path (WP) optical networks

defined, 137

protection in VWP networks versus, 176–177

WR-OXCs and, 137, 176–177


wavelength routing optical cross-connects (WR-OXCs), 138, 176–177

wavelength translating optical cross-connects (WT-OXCs), 138, 177

WDM. See Wavelength Division Multiplexing (WDM)

weighted random early detection (WRED), 236

working path, recovery schemes and, 21, 22

WP optical networks. See wavelength path (WP) optical networks

WRED (weighted random early detection), 236

WR-OXCs (wavelength routing optical cross-connects), 138, 176–177

WT-OXCs (wavelength translating optical cross-connects), 138, 177

