Open Source Development Labs Carrier Grade Linux Availability Requirements Definition Version 3.0 (Near Final Draft – 12 January 2005) Prepared by the Carrier Grade Linux Working Group

Open Source Development Labs, Inc.
12725 SW Millikan Way, Suite 400
Beaverton, OR 97005 USA

Phone: +1-503-626-2455

Copyright (c) 2005 by The Open Source Development Labs, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is available at http://www.opencontent.org/opl.shtml/). Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder.

Other company, product, or service names may be the trademarks of others.

Linux is a Registered Trademark of Linus Torvalds.

Contributors to the Availability Requirements Definition include (in alphabetical order):

Badovinatz, Peter (IBM)
Chacron, Eric (Alcatel)
Cherry, John (OSDL)
Christopher, Johnson (Sun)
Cress, Andrew (Intel)
Dake, Steven (Monta Vista)
Fleischer, Julie (Intel)
Haddad, Ibrahim (Ericsson)
** Ikebe, Takashi (NTT)
** Ishitsuka, Seiichi (NEC)
Kevin, Fox (Sun)
** Kimura, Masato (NTT Comware)
Kukkonen, Mika (Nokia)
Liu, Bing Wei (Intel)
Manas, Saksena (Timesys)
Nakayama, Mitsuo (NEC)
Sakuma, Junichi (OSDL)

* Specification editor
** Assistant specification editor

Comments on the contents of this document should be sent to [email protected] .

1 Introduction to CGL Availability Requirements

2 Document Organization

3 Requirements and Roadmap Definitions

4 Availability Requirements
AVL.1.0 Robust Mutexes
AVL.2.0 Software ECC Support
AVL.3 Forced Device Removal
AVL.3.1 Block Device Removal
AVL.3.2 Forced Unmount
AVL.4 Memory Overcommit Actions
AVL.4.1 VM Strict Over-Commit
AVL.5 Non-Intrusive Monitoring of Processes
AVL.5.1 Kernel-Level Non-Intrusive Application Monitor Without Modifying Application Code
AVL.5.2 Kernel-Level Non-Intrusive Application Monitor Using a Defined API
AVL.6.0 Disk Predictive Analysis
AVL.7 Redundant Paths to Resources
AVL.7.1 Multi-Path Access to Storage
AVL.8 Fast System Startup Within Kernel Space
AVL.8.1 Fast Linux Restart Bypassing BIOS
AVL.9.0 Boot Image Fallback Mechanism
AVL.10.0 Live Patching

5 Availability Roadmap
AVL.3 Forced Device Removal
AVL.3.3 Forced Unmount Application Notification
AVL.4 Memory Overcommit Actions
AVL.4.2 Replaceable OOM Killer
AVL.4.3 Low Memory Condition Monitor
AVL.4.4 Low Memory Notification Mechanism
AVL.5 Non-Intrusive Monitoring of Processes
AVL.5.3 Process-Level Non-Intrusive Application Monitor
AVL.7 Redundant Paths to Resources
AVL.7.2 Advanced Multi-Path Access to Storage
AVL.7.3 Redundant Communication Paths
AVL.8 Fast System Startup Within Kernel Space
AVL.8.2 Fast Linux Start Using Known-Devices Database
AVL.8.3 Parallel Driver Initialization During Startup
AVL.11.0 Fault Isolation Enabling
AVL.12.0 NFS Client Protection Across Server Failures
AVL.13 Fast System Startup Within User Space
AVL.13.1 Parallel User Initialization During Startup
AVL.14.0 Excessive CPU Cycle Usage Detection
AVL.15.0 Fast Application Restart Mechanism
AVL.16.0 Fallback Operation Mechanism
AVL.17.0 Multiple FIB Support
AVL.18.0 iSCSI Error Handling Support
AVL.19.0 Application Profiler
AVL.20.0 Kernel Resources Expansion for Threads

Appendices

A.1 General Systems References

1 Introduction to CGL Availability Requirements

This section contains requirements that apply to the Linux kernel, core libraries, and tools essential to a carrier-grade system. These Availability requirements are related to single system availability, such as support for memory failure detection. Requirements related to clustered availability, such as heartbeat monitoring and failover, are in the Clustering requirements section.

2 Document Organization

This document is a section of the OSDL Carrier Grade Linux Requirements Definition Version 3.0, which is organized into the separately published sections listed below:

Overview of Requirements Version 3.0

Availability Requirements Definition Version 3.0

Clustering Requirements Definition Version 3.0

Hardware Requirements Definition Version 3.0

Performance Requirements Definition Version 3.0

Security Requirements Definition Version 3.0 (to be released mid-2005)

Serviceability Requirements Definition Version 3.0

Standards Requirements Definition Version 3.0

Released versions of these sections can be found at http://www.osdl.org/lab_activities/carrier_grade_linux/documents.html/document_view.

3 Requirements and Roadmap Definitions

Two types of requirements are included in each section of the OSDL Carrier Grade Linux Requirements Definition Version 3.0:

Requirements – Describes requirements necessary for a CGL system

Roadmap – Highlights possible future requirements

Each requirement or roadmap item is described as follows:

ID: A unique identification number including:

An acronym identifying a category for the requirement (first field)

An ID number for the requirement (second field)

An ID number for a sub-requirement (third field). A “0” in this field indicates the requirement is a stand-alone requirement. An empty field indicates the requirement is a summary requirement with sub-requirements. A number in this field indicates this requirement is a sequentially numbered sub-requirement.

A summary requirement is also indicated by bolding the header of the requirement.

Name: Short description of the requirement.

Category: The category to which the requirement is assigned. The categories for Availability are:

AVL.x.x Availability

Description: Detailed description of the requirement.
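
The ID scheme described above can be illustrated with a small parser. This is only a sketch: the function name parse_requirement_id, the RequirementId type, and the regular expression are hypothetical and are not defined by this document.

```python
import re
from typing import NamedTuple, Optional


class RequirementId(NamedTuple):
    category: str        # first field, e.g. "AVL"
    number: int          # second field
    sub: Optional[int]   # third field; None when the field is empty
    kind: str            # "stand-alone", "summary", or "sub-requirement"


# Hypothetical pattern: ACRONYM.number with an optional .sub-number
_ID_PATTERN = re.compile(r"^([A-Z]+)\.(\d+)(?:\.(\d+))?$")


def parse_requirement_id(text: str) -> RequirementId:
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a requirement ID: {text!r}")
    category, number, sub = match.group(1), int(match.group(2)), match.group(3)
    if sub is None:
        # Empty third field: summary requirement with sub-requirements
        return RequirementId(category, number, None, "summary")
    sub_number = int(sub)
    # A "0" in the third field marks a stand-alone requirement
    kind = "stand-alone" if sub_number == 0 else "sub-requirement"
    return RequirementId(category, number, sub_number, kind)
```

For example, parse_requirement_id("AVL.3") is classified as a summary requirement, "AVL.1.0" as stand-alone, and "AVL.3.1" as a sub-requirement.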

4 Availability Requirements

ID: AVL.1.0
Name: Robust Mutexes
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide an enhancement to the POSIX Thread implementation that provides support for robust mutexes. Robust mutex support shall permit a mutex to synchronize threads, either in the same process or in different processes, even when processes or threads exit or abort unexpectedly.

A robust mutex is initialized with robust mutex attributes. It must be an inter-process shared mutex, allocated in a shared memory segment mapped into the processes that use it. Applications using a robust mutex shall be able to see various return codes that indicate whether the previous holder of the mutex terminated, and also the recovery status of the state of the mutex. The new holder of the robust mutex shall be able to detect a failure, perform cleanup actions, and re-initialize the mutex for continued use.

If a cleanup of the state protected by the mutex can't be completed, the mutex shall be marked “inconsistent” so that any future attempts to lock it will generate a status indicating that it is inconsistent. The following two modes for setting the mutex to an inconsistent state shall be provided:

Automatically mark the mutex “inconsistent” when the owner dies and a subsequent mutex lock is attempted and completed.

Provide an advisory to the next owners that the mutex needs to be explicitly marked inconsistent.

For further details, refer to http://www.humanfactor.com/pthreads/posix-threads.html .
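
The behaviour described above can be sketched as a small state model. This is a single-threaded toy model, not a real lock and not an implementation of this requirement. It assumes the first (automatic) mode, and the result names follow the EOWNERDEAD and ENOTRECOVERABLE conventions that POSIX adopted after this draft was written. All class and method names are hypothetical.

```python
from enum import Enum


class LockResult(Enum):
    OK = "ok"
    OWNER_DEAD = "owner-dead"         # lock acquired, but protected state needs cleanup
    NOT_RECOVERABLE = "inconsistent"  # mutex unusable until it is re-initialized
    BUSY = "busy"


class RobustMutexModel:
    def __init__(self):
        self.owner = None
        self.owner_died = False    # previous holder exited while holding the lock
        self.needs_repair = False  # current holder has not yet marked the state healthy
        self.inconsistent = False

    def lock(self, who):
        if self.inconsistent:
            return LockResult.NOT_RECOVERABLE
        if self.owner is not None:
            return LockResult.BUSY
        self.owner = who
        if self.owner_died:
            # The new holder learns that the previous holder terminated
            self.owner_died = False
            self.needs_repair = True
            return LockResult.OWNER_DEAD
        return LockResult.OK

    def owner_exits(self):
        """Simulate the holder terminating without unlocking."""
        if self.owner is not None:
            self.owner = None
            self.owner_died = True

    def mark_consistent(self, who):
        """The holder finished cleanup and declares the state healthy."""
        if self.owner == who:
            self.needs_repair = False

    def unlock(self, who):
        if self.owner != who:
            raise RuntimeError("unlock by non-owner")
        self.owner = None
        if self.needs_repair:
            # Cleanup was not completed: later lock attempts report inconsistency
            self.inconsistent = True
            self.needs_repair = False
```

In this model, a holder that receives OWNER_DEAD and unlocks without calling mark_consistent leaves the mutex in the inconsistent state, so every later lock attempt returns NOT_RECOVERABLE.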

ID: AVL.2.0
Name: Software ECC Support
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism for reporting when hardware error checking and correcting (ECC) detects and/or recovers from a single-bit ECC error, and a panic trigger mechanism that is activated whenever hardware ECC detects multi-bit ECC errors.

ID: AVL.3
Name: Forced Device Removal
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support for forced unmounting of a file system and block device removal. When a file system is unmounted, processes shall not be able to access or open files on the file system. When a block device is removed, a hot swap signal shall be sent to the storage controller.

ID: AVL.3.1
Name: Block Device Removal
Category: Availability

Description: OSDL CGL specifies that Linux shall allow removal of a block device while it is in use without degrading the reliability of the system. The block device shall be removable even if it has been opened by a command such as fdisk /dev/sda, it is a member of a RAID-1 volume, a file system is mounted on it, or any combination of these applies. If a file is in use and cannot be serviced by a mirrored disk, the operating system shall return an error to the system calls referencing that file.

ID: AVL.3.2
Name: Forced Unmount
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support for forced unmounting of a file system. The unmount shall work even if there are open files in the file system. Pending requests shall be ended with the return of an error value when the file system is unmounted.

ID: AVL.4
Name: Memory Overcommit Actions
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide the ability to configure a global limit on RAM utilization. This limit is a combination of physical memory and swap space. In addition, adequate information and an interface must be provided to allow a middleware component to take action before the system runs out of memory. This requirement is in addition to or a replacement for the kernel out-of-memory killer.

ID: AVL.4.1
Name: VM Strict Over-Commit
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide the ability to control kernel virtual memory allocation adjustments based on the specific needs of the system. Control of virtual memory shall include but not be limited to the following:

Strict over-commit – The total address space committed for the system is not permitted to exceed swap + a configurable percentage of physical RAM (the default is 50%).

Heuristic over-commit – Obvious over-commits of address space are refused. Limited to free physical memory + free swap.
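
The two policies above reduce to simple arithmetic. The following sketch shows one reading of them. The function names and the use of kilobytes are assumptions made for illustration, and the heuristic check is deliberately reduced to the single limit stated above.

```python
def strict_commit_limit_kb(ram_kb: int, swap_kb: int, ratio_percent: int = 50) -> int:
    """Strict mode: limit = swap + a configurable percentage of physical RAM."""
    if ratio_percent < 0:
        raise ValueError("ratio_percent must not be negative")
    return swap_kb + ram_kb * ratio_percent // 100


def strict_allows(request_kb: int, committed_kb: int, ram_kb: int,
                  swap_kb: int, ratio_percent: int = 50) -> bool:
    """Strict mode refuses any request that would push the total past the limit."""
    limit = strict_commit_limit_kb(ram_kb, swap_kb, ratio_percent)
    return committed_kb + request_kb <= limit


def heuristic_allows(request_kb: int, free_ram_kb: int, free_swap_kb: int) -> bool:
    """Heuristic mode: refuse only requests larger than free RAM + free swap."""
    return request_kb <= free_ram_kb + free_swap_kb
```

With 8,000,000 KB of RAM, 2,000,000 KB of swap, and the default 50% ratio, the strict limit is 2,000,000 + 4,000,000 = 6,000,000 KB.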

ID: AVL.5
Name: Non-Intrusive Monitoring of Processes
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a range of capabilities to enable non-intrusive monitoring of processes. To enable monitoring, some configuration actions may have to be taken to specify which processes are to be monitored. Capabilities may be limited in certain cases, as long as the limitations are known. Capabilities to be provided include the following:

Processes must be manageable and controllable even if they are not a direct child process of the tools and mechanisms provided to enable these capabilities. A carrier system consists of middleware and processes from many sources, which may be difficult to run from a single parent process, as they will usually require different userids, capabilities, permissions, etc.

The latency of event detection while processes are being monitored must be as low as possible, preferably occurring immediately upon complete failure of a process.

The overhead of monitoring the processes should be as low as possible.

Since inittab does not provide sufficient capabilities to meet this requirement, enhancements to inittab must be provided to address the following limitations:

o Monitors only processes inittab starts

o Limited reactions to process death

o No healthcheck capabilities for non-terminating processes

o No controls on respawn loops of processes

ID: AVL.5.1
Name: Kernel-Level Non-Intrusive Application Monitor Without Modifying Application Code
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a service to enable non-intrusive monitoring of processes at the kernel level. To enable monitoring, the following capabilities shall be provided:

Communication between the monitoring process and the kernel.

Registering a list of processes.

Ability to define policy based on process events including process/thread creation and exit.

Ability to take action whenever an event occurs.

ID: AVL.5.2
Name: Kernel-Level Non-Intrusive Application Monitor Using a Defined API
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a service to enable non-intrusive monitoring of processes at the kernel level through a defined API. Any application to be monitored will need to use this API.

ID: AVL.6.0
Name: Disk Predictive Analysis
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide capabilities to assist in predictive analysis of disks. The aim of this support is to assist in predicting situations likely to lead to failure of disks. This allows preventive action to be taken to avoid the failure and resulting disruption of service.

Note that this could be considered a subset of the requirement SMM.7 Diagnostics and Monitoring Framework, but since isolated mechanisms to support this requirement currently exist, it is listed as a separate requirement.

ID: AVL.7
Name: Redundant Paths to Resources
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to enable redundant access paths to system resources.

The software shall handle sending and receiving data via redundant paths without conflicts, and provide high-availability access to resources even if an error occurs in one of the redundant paths.

ID: AVL.7.1
Name: Multi-Path Access to Storage
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a single cluster node to storage devices. The software shall determine if multiple paths exist to the same port of the I/O device, and, with configurable controls, balance I/O requests across multiple host bus adapters. If multiple paths exist to the same device over two separate device ports on the same host bus adapter, those I/Os will not be balanced.

Handling a path failure must be automatic. A mechanism must be provided for the reactivation of failed paths, allowing them to be placed back in service. It must be possible to automatically determine and configure multiple paths. Automatic configuration shall allow automatic multi-path configuration of complete disks and partitions located on those disks.

A multipath device feature that allows multipath detection and mapping early in the boot process must be provided so that the root file system can exist on a multipath device.
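
The path handling described in this requirement (balancing, automatic failover, and reactivation) can be sketched as follows. The round-robin policy, the class name MultipathDevice, and the path labels are assumptions chosen for illustration. The requirement itself does not prescribe a balancing policy.

```python
class MultipathDevice:
    """Toy model of one storage device reachable over several paths."""

    def __init__(self, paths):
        self._paths = list(paths)        # e.g. one entry per host bus adapter
        self._active = set(self._paths)  # paths currently in service
        self._next = 0                   # round-robin cursor

    def fail(self, path):
        """Take a path out of service after an I/O error."""
        self._active.discard(path)

    def reactivate(self, path):
        """Place a previously failed path back in service."""
        if path in self._paths:
            self._active.add(path)

    def next_path(self):
        """Pick the path for the next I/O request, skipping failed paths."""
        for _ in range(len(self._paths)):
            candidate = self._paths[self._next]
            self._next = (self._next + 1) % len(self._paths)
            if candidate in self._active:
                return candidate
        raise OSError("no active path to device")
```

Requests alternate between paths while all of them are healthy, continue on the surviving path after a failure, and resume balancing once the failed path is reactivated.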

ID: AVL.8
Name: Fast System Startup Within Kernel Space
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide capabilities to allow a single system to move from power-on to ready in as short a time as possible.

The normal startup sequence includes:

1. Power on and boot (includes BIOS initialization)

2. Load the Linux image

3. Start and initialize Linux

A cold start (BIOS to operating system handoff) comprises steps 1 through 3. A warm start (operating system to operating system handoff) comprises steps 2 and 3.

Fast system startup capabilities include the ability to:

Bypass BIOS initialization by beginning the startup sequence at step 2 (see AVL.8.1).

Bypass initialization of the Linux image in step 3 (see AVL.8.2).

Complete a parallel initialization of device drivers in step 3 (see AVL.8.3).

ID: AVL.8.1
Name: Fast Linux Restart Bypassing BIOS
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to speed up operating system initialization by bypassing the BIOS when one instance of Linux reboots to another instance of Linux.

ID: AVL.9.0
Name: Boot Image Fallback Mechanism
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that enables a system to fall back to a previous "known good" boot image in the event of a catastrophic boot failure (i.e., failure to boot, panic on boot, or failure to initialize HW/SW). System images are captured from the "known good" system and the system reboots to the latest good image. This mechanism would provide automatic fallback to protect against problems resulting from system changes, such as program updates, installations, kernel changes, and configuration changes.
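
The selection step of such a fallback can be sketched as follows. The BootImage record, its fields, and the function select_boot_image are hypothetical. How a boot failure is detected and how images are captured are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BootImage:
    name: str
    captured_at: int   # capture order or timestamp; larger means newer
    known_good: bool   # True once the image has booted successfully


def select_boot_image(current: BootImage, images: List[BootImage],
                      boot_failed: bool) -> Optional[BootImage]:
    """Keep the current image unless it failed; then pick the latest good one."""
    if not boot_failed:
        return current
    candidates = [image for image in images
                  if image.known_good and image.name != current.name]
    # None signals that no fallback image is available
    return max(candidates, key=lambda image: image.captured_at, default=None)
```

If the current image fails to boot and two older known-good images exist, the more recently captured of the two is selected.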

ID: AVL.10.0
Name: Live Patching
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide the mechanism for dynamically replacing the symbols of a running process without restarting. Dynamic replacement of symbols allows a process to access patched functions or values without restarting and can improve process availability.

5 Availability Roadmap

ID: AVL.3
Name: Forced Device Removal
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.3.1
Name: Block Device Removal
Category: Availability

Description: OSDL CGL specifies that Linux shall allow removal of a block device while it is in use without degrading the reliability of the system. The block device shall be removable even if it has been opened by a command such as fdisk /dev/sda, it is a member of a RAID-1 volume, a file system is mounted on it, or any combination of these applies. If a file is in use and cannot be serviced by a mirrored disk, the operating system shall return an error to the system calls referencing that file.

ID: AVL.3.3
Name: Forced Unmount Application Notification
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a notification mechanism when a forced unmount of a file system occurs. The notification mechanism should send a signal or other message to a process that attempts to access a file on an unmounted volume.

ID: AVL.4
Name: Memory Overcommit Actions
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.4.2
Name: Replaceable OOM Killer
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide mechanisms to allow the replacement of the out-of-memory (OOM) killer algorithm within the kernel. In an environment in which an application is made up of many processes, the act of killing any single process may prevent the application from continuing to provide service while leaving its remaining processes running and preventing proper recovery. Hence it must be possible to provide a replacement algorithm that can take the relationships between processes into account when determining which ones to slay. By default the current algorithm in the kernel is used. The new algorithm can be activated by loading the relevant kernel module.

ID: AVL.4.3
Name: Low Memory Condition Monitor
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a low memory condition monitor. To avoid encountering a true out-of-memory (OOM) condition in the Linux kernel, a user-space facility should be provided to monitor memory usage and take action based on a configurable low-memory threshold. This threshold would be set to predict an OOM condition before it becomes critical. The threshold would apply to both physical memory and swap area.

The application should record the top N memory-consuming processes, so that when the threshold is reached, processes that are not on the user-defined do-not-kill list that are trending up in memory use can be killed. This capability would allow the application to tell the kernel to stop allocating memory to user-space processes. When applications run out of pre-allocated memory, the system could remain nominally in service until more memory becomes available.
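
The monitoring logic described above can be sketched with two small functions. Treating physical memory and swap as one combined pool is an assumption, as are the function names and the shape of the sample data. A real monitor would collect these figures from the running system.

```python
def memory_pressure(mem_total_kb, mem_free_kb, swap_total_kb, swap_free_kb):
    """Fraction of combined physical memory and swap that is in use."""
    total = mem_total_kb + swap_total_kb
    if total <= 0:
        raise ValueError("total memory must be positive")
    return (total - mem_free_kb - swap_free_kb) / total


def kill_candidates(samples, do_not_kill, top_n):
    """Pick processes to kill once the low-memory threshold is reached.

    samples maps a process name to (previous_kb, current_kb) usage.
    Only the top_n consumers are considered; processes on the
    do-not-kill list or not trending up are skipped.
    """
    top = sorted(samples, key=lambda name: samples[name][1], reverse=True)[:top_n]
    return [name for name in top
            if name not in do_not_kill and samples[name][1] > samples[name][0]]


# Usage sketch with made-up numbers and a hypothetical 75% threshold
usage = {"db": (500, 900), "cache": (800, 700), "sshd": (10, 20), "app": (300, 400)}
if memory_pressure(8000, 1000, 2000, 1000) >= 0.75:
    victims = kill_candidates(usage, do_not_kill={"db"}, top_n=3)
```

In the usage sketch, "db" is protected by the do-not-kill list and "cache" is shrinking, so only "app" is selected.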

ID: AVL.4.4
Name: Low Memory Notification Mechanism
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a low memory notification mechanism.

Whenever a low memory condition is detected, the mechanism shall generate a remote notification. Notification methods shall support enterprise-level notification protocols such as SNMP or CIM. See:

STD.7 SNMP (for IPv4 and IPv6)

STD.12.0 CIM

ID: AVL.5
Name: Non-Intrusive Monitoring of Processes
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.5.3
Name: Process-Level Non-Intrusive Application Monitor
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide control and management capabilities for processes that cannot be altered to incorporate a monitoring API. Such capabilities are known as non-intrusive monitoring. These capabilities must be implemented programmatically using commands or scripts.

Another issue for many such processes is that the start script itself may spawn an application process that is not under the control of the management process. This sub-requirement assumes that this does not happen, and the child process remains under the control of the management entity.

Capabilities required:

The following capabilities must be enabled for controlling processes:

o The ability to start a process (or a list of processes)

o The ability to stop a process (or a list of processes)

The following capabilities must be enabled for monitoring processes:

o The ability to detect the unexpected exit of a process

o The ability to configure a set of actions in response to an unexpected exit of a process

The following services must be provided beyond those currently provided by inittab:

o The ability to configure whether to restart the application if the process dies

o A configurable amount of time to wait before restarting the application

o A limit on the number of times to restart the application
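
The restart controls listed above can be sketched as a minimal supervisor loop. This sketch treats any nonzero exit status as an unexpected exit, which is an assumption. The function name supervise and its parameters are hypothetical, and a real monitor would not block on a single child process.

```python
import subprocess
import sys
import time


def supervise(command, restart=True, restart_delay_s=0.0, max_restarts=3):
    """Run command and restart it after an unexpected exit.

    Returns the list of exit codes, one per start of the process.
    """
    exit_codes = []
    while True:
        exit_codes.append(subprocess.run(command).returncode)
        unexpected = exit_codes[-1] != 0      # assumption: nonzero means failure
        restarts_done = len(exit_codes) - 1
        if not (unexpected and restart and restarts_done < max_restarts):
            return exit_codes
        time.sleep(restart_delay_s)           # configurable wait before restarting


# Usage sketch: a child that always fails is started once and restarted twice
failing_child = [sys.executable, "-c", "raise SystemExit(1)"]
history = supervise(failing_child, restart_delay_s=0.0, max_restarts=2)
```

The restart limit bounds respawn loops: after max_restarts restarts the supervisor stops and reports the exit codes it observed.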

ID: AVL.7
Name: Redundant Paths to Resources
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.7.2
Name: Advanced Multi-Path Access to Storage
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. The mechanism should implement the following features:

Ability to boot from SAN storage using the multipath mechanism.

Ability to use a swap partition on a multipath disk.

Kernel support for a path-switching policy.

Error logs must provide easy device identification.

ID: AVL.7.3
Name: Redundant Communication Paths
Category: Availability

Description: OSDL CGL specifies that Linux shall provide support for redundant communication paths between nodes to improve network availability. The system should handle sending and receiving data between nodes via redundant communication paths without any conflicts. The paths should form logical or physical end-to-end redundant paths.

ID: AVL.8
Name: Fast System Startup Within Kernel Space
Category: Availability

Description: See description in Availability Requirements section above.

ID: AVL.8.2
Name: Fast Linux Start Using Known-Devices Database
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to speed up operating system initialization. The improvement in boot speed could be achieved by using the boot loader to inform the operating system of previously connected devices, or the known devices could be derived from a previously running instance of the operating system.

ID: AVL.8.3
Name: Parallel Driver Initialization During Startup
Category: Availability

Description: OSDL CGL specifies that, if multiple drivers are compiled into the Linux kernel, the initialization or probing routines of those drivers execute in parallel. CGL further specifies that, if multiple drivers are to be loaded as modules, the driver modules are loaded in parallel. CGL further specifies that in either of these two cases, a driver is only initialized once the drivers it depends on have initialized.
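
The dependency rule in this requirement can be sketched by grouping drivers into waves, where every driver in a wave may be initialized in parallel because everything it depends on belongs to an earlier wave. The sketch uses the Python standard library graphlib module. The function name and the driver names are hypothetical.

```python
from graphlib import TopologicalSorter


def initialization_waves(depends_on):
    """Group drivers into waves that can each be initialized in parallel.

    depends_on maps a driver name to the set of drivers it depends on.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all drivers whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as initialized
    return waves


# Usage sketch with made-up driver names
example = {"pci": set(), "scsi": {"pci"}, "net": {"pci"}, "sd": {"scsi"}}
waves = initialization_waves(example)
```

For the example graph the result is three waves: "pci" alone, then "net" and "scsi" together, then "sd".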

ID: AVL.11.0
Name: Fault Isolation Enabling
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide support to report anomalies detected on a compute node. The objective in reporting these anomalies is to provide data for fault isolation mechanisms. Software-related failures may require actions like the restart or termination of a process or the unloading and reinstallation of a kernel module. Hardware-related failures may require actions to restart, turn off, or isolate a failing device.

OSDL CGL specifies that carrier grade Linux shall provide mechanisms to isolate faulty software or hardware components. These mechanisms can be activated by management middleware fault isolation policies.

ID: AVL.12.0
Name: NFS Client Protection Across Server Failures
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide mechanisms that allow an NFS server to have failover capability to provide service continuity upon a node failure. The NFS service has to be resumed on another node without any impact on NFS clients other than the retransmission of pending requests (open files must remain open). Clients authenticated on the old server must remain authenticated on the new server.

ID: AVL.13
Name: Fast System Startup Within User Space
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a variety of capabilities to allow a single system to move from a power-on state to an application-ready state in as short a time as possible.

The normal startup sequence includes:

1. Power on and boot (includes BIOS initialization)

2. Load the Linux image

3. Start and initialize Linux

4. Start application

ID: AVL.13.1
Name: Parallel User Initialization During Startup
Category: Availability

Description: OSDL CGL specifies that the user initialization procedure executed by the program /sbin/init shall provide a mechanism to allow multiple init scripts to run in parallel. CGL further specifies that a service is only started once the services it depends on have started.

ID: AVL.14.0
Name: Excessive CPU Cycle Usage Detection
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that detects excessive CPU cycle usage by any process or thread. To enable detection, the following capabilities shall be provided:

Communication between the monitoring process and the kernel.

Registering a list of processes or threads and their allowed CPU cycle thresholds.

Ability to define policy based on process events including process/thread creation and exit.

Ability to take action whenever an event occurs.

Ability to set the CPU cycle threshold to a resolution of one millisecond.

ID: AVL.15.0
Name: Fast Application Restart Mechanism
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that enables a quick application restart. Typical applications in a carrier environment use multiple processes with inter-process communications. As applications become more complex, application initialization times become longer.

To speed up application initialization, the mechanism shall provide the functionality to simultaneously save memory images of multiple processes (including the kernel resources used by each process) and to restore the images.

When the application completes initialization, including making connections between processes and setting up kernel resources for inter-process communication, the application invokes a save function that makes a copy of the memory images of the process and kernel resources. If the application hangs, the mechanism restores the memory images and kernel resources and restarts the application.

ID: AVL.16.0
Name: Fallback Operation Mechanism
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism that enables or disables specific functions that allow system fallback mode operation when an overload condition is detected. It is desirable that the mechanism provide the functions below:

A softirq-based interrupt handler.

Temporal roll-in/roll-out.

Temporal low priority daemon execution stops.

ID: AVL.17.0
Name: Multiple FIB Support
Category: Availability

Description: OSDL CGL specifies that Linux shall support multiple Forwarding Information Base (FIB) quick look-up tables with forwarding addresses to allow better server virtualization of overlapping addresses.

An FIB is a table that contains a copy of the forwarding information in the IP routing table. All hooks/changes required to support multiple FIBs shall be added.

ID: AVL.18.0
Name: iSCSI Error Handling Support
Category: Availability

Description: OSDL CGL specifies that the iSCSI initiators implemented by carrier grade Linux should support the following iSCSI options:

Header and Data Digests

Error recovery level 1 as specified by RFC 3720

ID: AVL.19.0
Name: Application Profiler
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall provide a mechanism to profile critical resources of the kernel and applications. The critical resources that are profiled by this mechanism shall include (but are not limited to):

Time used

Memory used

Number of semaphores, mutexes, sockets, and threads/child processes in use

Number of open files.

Monitoring shall happen at configurable, periodic intervals or as initiated by the user.

ID: AVL.20.0
Name: Kernel Resources Expansion for Threads
Category: Availability

Description: OSDL CGL specifies that carrier grade Linux shall expand available kernel resources to provide additional support for threads. The existing thread model is defined as a lightweight process model; therefore some thread kernel resources are missing. Threads are widely used in carrier grade level applications, so at least the following additional kernel resource functionality shall be provided to support threads:

1. Full SIGNAL support – The SIGNAL should be sent to each thread.

2. Full rlimit support – The rlimit parameter should be supported for each thread.

Appendices

A.1 General Systems References

POSIX:

http://www.opengroup.org/
http://www.unix.org/online.html
http://www.opengroup.org/onlinepubs/007908799/
http://posixtest.sf.net for more POSIX conformance data on Linux.

POSIX Technical Corrigendum 1 text: http://www.opengroup.org/pubs/catalog/u057.htm

POSIX Specification with current Technical Corrigendum: http://www.unix.org/version3/

Linux Standard Base, Free Standards Group: http://www.linuxbase.org/ http://www.freestandards.org/

Service Availability Forum: http://www.saforum.org/

IETF: http://www.ietf.org/rfc.html
