Restartability Management in the Cisco Core Router CRS/NG
Stefan Schaeckeler (Cisco Systems, Inc.)
Ashwin Narasimha Murthy (Google, Inc.)
Table of Contents
System Overview
CRS/NG Restartability Overview − Problem Definition and High Level Solution
Concrete Example − Statistics Resource Manager Library
Conclusion
System Overview
Core Router
Extremely complex System
  • SW: 16 MLOC
  • HW: several chassis, LCs (1 CPU, 5 NPUs, chips galore), RPs (1 CPU, chips galore), fabric cards, blade cards, …
Forms distributed System
99.9...9% Uptime
System Manager: restarts crashed Process
  • HW bug
  • SW bug
Process must maintain State (after Crash)
CRS/NG Approach
  • Key data structures in shared memory
  • Well-written algorithms guarantee consistency
CRS 1, CRS 3, CRS/NG (final name?)
CRS/NG Restartability Overview
CRS/NG runs Cisco IOS/XR
Cisco IOS/XR Abstraction Layer on Linux
  • Sophisticated IPC
  • Sophisticated shared memory API
      Special malloc for shared memory
      Static configuration file
        – Mapping identifiers to fixed virtual addresses
        – STATS_RESTART 0x50000000
      (Re)attaching to shared memory via identifier
      Previously allocated objects always available
  • …
Process requiring Restartability
  • Key data structures in shared memory
  • Careful algorithm design to avoid
      • Temporary inconsistencies: account1 := account1+X; account2 := account2-X;
      • Pointer operations (disconnection of linked lists)
      • Crashes during IPCs
      • Crashes before a return; (caller records success)
  • Optional recovery phase
  • Compromises are possible
Concrete Example: Statistics Resource Manager Library
HW: Extremely simplified View on CRS/NG
SW: Somewhat simplified View on CRS/NG Statistics Manager
Client Application / Library crashes and is restarted
Client Application: State is gone
  • Stats pointers are lost
  • Other state is lost
Stats Lib
  • State is gone
  • Stats pointers are lost
Solution for Stats Lib
  • Keep freelists in shared memory
  • Smart algorithm for keeping state consistent
Step 1: Keeping State in Shared Memory

    #include <string.h>

    stats_cl_ctx_st *mstats_cl_bind (char *name) {
        void *shmem;
        stats_cl_ctx_st *con;
        int i;

        /* open shmem at a predetermined address */
        shmem = shmwin_attach(SSE_STATS_RESTART_ADDRESS);  /* posix mmap: MAP_FIXED flag */
        /* cast to char * so the byte offset arithmetic is standard C */
        con = (stats_cl_ctx_st *)((char *)shmem + name_to_offset(name));

        if (strcmp(con->name, name)) {
            /* first bind: init "empty" context */
            for (i = 0; i <= max; i++)      /* freelist[0..max] as on the slide */
                con->freelist[i] = NULL;
            con->mutex = 0;
            strcpy(con->name, name);        /* name is written last, after the context is valid */
        } else {
            /* restart: context survived in shared memory */
            /* do nothing, just return con */
        }
        return con;
    }
Step 2a: Smart Algorithm − A pragmatic Approach (chosen for CRS/NG)
Few Concepts:
(Re-)moving nodes from freelist
  • Worst case: a page is lost (bad?)
Requesting fresh page from server
  • Worst case: page is lost (bad?)
Updating bitmap: mark some pointers as allocated − client does not pick them up
  • Worst case: some pointers are lost (bad?)
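One way to get the "worst case: a page is lost" property is to order the pointer stores so that a crash at any point leaves the freelist well-formed. The following is a minimal sketch under assumptions, not the CRS/NG implementation: `page_node`, `freelist`, `freelist_push`, and `freelist_pop` are hypothetical names, and `simulate_crash` is a fault-injection knob that exists only for illustration. The sketch also assumes that an aligned pointer store is not torn by a crash; production code would additionally need to prevent compiler and CPU reordering of the stores.

```c
#include <assert.h>
#include <stddef.h>

struct page_node {
    struct page_node *next;
    unsigned          page_id;
};

struct freelist {
    struct page_node *head;   /* would live in shared memory */
};

/* Push: complete the node first, publish it with a single store last. */
static void freelist_push(struct freelist *fl, struct page_node *n)
{
    assert(fl != NULL && n != NULL);
    n->next = fl->head;   /* crash here: n is not reachable yet, list is intact */
    fl->head = n;         /* one pointer store makes n visible */
}

/*
 * Pop: unlink with a single store first, hand the node out afterwards.
 * simulate_crash != 0 models a crash right after the unlink.
 */
static struct page_node *freelist_pop(struct freelist *fl, int simulate_crash)
{
    assert(fl != NULL);
    struct page_node *n = fl->head;
    if (n == NULL)
        return NULL;
    fl->head = n->next;   /* list is consistent before and after this store */
    if (simulate_crash)
        return NULL;      /* caller never sees n: one page leaked, none duplicated */
    n->next = NULL;
    return n;
}
```

With this ordering a crash can leak a node, but the list never contains a node that was also handed to a caller.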
Discussion of worst Case Scenarios
A page (or a few Pointers within) is lost
  • = 256 out of 8 million stats pointers in NPU memory − no big deal
  • = 80 bytes out of several GB of CPU memory for node structure − no big deal
Client frees a Pointer from a lost Page
  • Error Code is returned
  • Client is irritated but has to ignore it
We never give out the same Pointer twice
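The two guarantees above (an error code for a free on a lost page, and no pointer handed out twice) can be sketched with a per-page allocation bitmap. Everything here is an assumption made for illustration: `stats_ctx`, `stats_page`, `stats_alloc`, `stats_free`, and `MAX_PAGES` are hypothetical, and only the figure of 256 pointers per page comes from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PAGE 256u   /* pointers per page, as on the slide */
#define MAX_PAGES     4u     /* arbitrary for this sketch */

struct stats_page {
    int      known;                     /* 0 = slot empty or page lost */
    uint32_t base;                      /* first stats pointer in the page */
    uint8_t  used[PTRS_PER_PAGE / 8];   /* allocation bitmap, one bit per pointer */
};

struct stats_ctx {
    struct stats_page pages[MAX_PAGES]; /* would live in shared memory */
};

/* Returns 0 and stores a stats pointer in *out, or -1 if none is free. */
static int stats_alloc(struct stats_ctx *c, uint32_t *out)
{
    assert(c != NULL && out != NULL);
    for (size_t p = 0; p < MAX_PAGES; p++) {
        struct stats_page *pg = &c->pages[p];
        if (!pg->known)
            continue;
        for (uint32_t i = 0; i < PTRS_PER_PAGE; i++) {
            uint8_t mask = (uint8_t)(1u << (i % 8));
            if (pg->used[i / 8] & mask)
                continue;
            pg->used[i / 8] |= mask;  /* mark first: a crash here leaks the pointer */
            *out = pg->base + i;      /* hand out second: never a duplicate */
            return 0;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if the pointer is not allocated or its page is unknown. */
static int stats_free(struct stats_ctx *c, uint32_t ptr)
{
    assert(c != NULL);
    for (size_t p = 0; p < MAX_PAGES; p++) {
        struct stats_page *pg = &c->pages[p];
        if (!pg->known || ptr < pg->base || ptr - pg->base >= PTRS_PER_PAGE)
            continue;
        uint32_t i = ptr - pg->base;
        uint8_t mask = (uint8_t)(1u << (i % 8));
        if (!(pg->used[i / 8] & mask))
            return -1;                /* not allocated: reject */
        pg->used[i / 8] &= (uint8_t)~mask;
        return 0;
    }
    return -1;  /* page unknown (lost): error code, which the client ignores */
}
```

A pointer from a lost page is simply never found in any known page, so the free fails harmlessly and the pointer cannot re-enter circulation.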
Step 2b: Smart Algorithm − A perfect Approach
Complicated Algorithm / Very difficult Implementation
  • Further pointers in shared memory
  • Need to figure out where crashed and continue from there
Requirement: interacting Libraries and Processes must be "perfect" as well
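To give a feel for what "figure out where crashed and continue from there" involves, here is a small sketch that applies an intent record to the account transfer used earlier as an example of a temporary inconsistency. It is not taken from the talk: `bank`, `transfer`, `recover`, and the `crash_after_step` fault-injection parameter are all hypothetical, and a real implementation would also have to control store ordering.

```c
#include <assert.h>
#include <stddef.h>

enum { LOG_IDLE = 0, LOG_PENDING = 1 };

struct bank {               /* would live in shared memory */
    long account1;
    long account2;
    int  log_state;         /* LOG_PENDING while a transfer is in flight */
    long new1, new2;        /* target balances, valid only while PENDING */
};

/*
 * account1 := account1 + x; account2 := account2 - x;
 * crash_after_step (1..3) models a crash after that step; 0 means no crash.
 */
static void transfer(struct bank *b, long x, int crash_after_step)
{
    assert(b != NULL);
    b->new1 = b->account1 + x;
    b->new2 = b->account2 - x;
    b->log_state = LOG_PENDING;        /* step 1: intent is recorded */
    if (crash_after_step == 1) return;
    b->account1 = b->new1;             /* step 2 */
    if (crash_after_step == 2) return;
    b->account2 = b->new2;             /* step 3 */
    if (crash_after_step == 3) return;
    b->log_state = LOG_IDLE;           /* step 4: transfer is complete */
}

/* Recovery phase after a restart: redo a pending transfer (idempotent). */
static void recover(struct bank *b)
{
    assert(b != NULL);
    if (b->log_state == LOG_PENDING) {
        b->account1 = b->new1;
        b->account2 = b->new2;
        b->log_state = LOG_IDLE;
    }
}
```

Even this two-variable example needs extra state in shared memory and a recovery routine, which hints at why the approach becomes hard once several libraries and processes interact.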
Conclusion
Pragmatic Approach of CRS/NG
  + Easy to implement
  +/− Crashes: worst Case: small Mem. Leak
  + No Run-time Performance Hit
Perfect Approach
  − Very difficult to implement, error-prone
  + Crashes: no Memory Leak
  − Perhaps Run-time Performance Hit
Thank You