User-space System Device Enumeration (uSDE)
Mark Bellon
MontaVista Software, Inc.
uSDE
• Enumerate: to specify one after another
  – Specify/instantiate/remove system devices
    • create
    • delete
    • diagnostics
  – Deal with devices in a dynamic environment
    • system start-up
    • hot insertions and removals
uSDE (1)?
• An architecturally and philosophically neutral framework for enumerating the devices attached to a computer system
• An open, extensible implementation (even in real-time!) of device enumeration that supports one or more systems of enumeration - simultaneously if necessary!
uSDE (2)?
• Provides transaction-protected, consistent, real-time (low-latency) access to data
• Designed for carrier-grade and embedded environments; desktops fall out trivially
• Optimized for speed; can handle a huge number of devices
• Small and reliable
uSDE (3)?
• It did not start life as a specialized or limited handler; from the beginning it has been designed to handle all device types
• It does not mandate a formal database
• It operates entirely in user space
  – MVL CGE 3.1
  – Linux 2.6.0-test6 or later
uSDE Overview
[Diagram: the uSDE executive daemon, the uSDE /sbin/hotplug replacement, the uSDE scanner, uSDE agents, uSDE utilities, the configuration files, the optional backing store, the exec-cache and several policy methods]
uSDE External Stimuli (1)
[Diagram: the /sbin/hotplug replacement sends insert/remove events, the scanner sends appear events and agents send aspect-change events to the uSDE executive daemon]
uSDE External Stimuli (2)
• uSDE /sbin/hotplug replacement
  – A binary that provides the functionality of the existing shell scripts
  – Forwards all hotplug events to the uSDE executive for processing
  – Device insert and remove events are of particular interest
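In 2.6 kernels, /sbin/hotplug is called with the subsystem name as its first argument and learns the details of the event from environment variables such as ACTION and DEVPATH. The Python sketch below shows how a replacement might turn one such call into a uSDE-style external event. The name `hotplug_to_event` and the ACTION-to-event mapping are assumptions made for illustration; forwarding the result to the executive is omitted because its wire format is not described here.

```python
import os

# Hypothetical mapping from the hotplug ACTION value to a uSDE external event type.
ACTION_TO_EVENT = {"add": "insert", "remove": "remove"}


def hotplug_to_event(argv, env, sysfs_root="/sys"):
    """Translate one /sbin/hotplug invocation into (subsystem, event, sysfs_path).

    Returns None for calls this sketch does not forward.
    """
    if len(argv) < 2:
        return None
    subsystem = argv[1]
    event = ACTION_TO_EVENT.get(env.get("ACTION", ""))
    devpath = env.get("DEVPATH")
    if event is None or devpath is None:
        return None
    # DEVPATH is relative to the sysfs mount point, e.g. "/block/sda".
    return subsystem, event, os.path.join(sysfs_root, devpath.lstrip("/"))
```

A real replacement would pass `sys.argv` and `os.environ`; taking them as parameters keeps the translation testable.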
uSDE External Stimuli (3)
• uSDE scanner
  – Invoked by the uSDE executive to determine the initial ensemble of system devices
  – Scans sysfs for appropriate devices and sends “appear” events
  – Typically runs only once (when the uSDE executive runs for “the first time”)
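A scanner of this kind can be pictured as a walk over the block devices that sysfs exposes, emitting one "appear" event per device. The Python sketch below assumes that an entry under /sys/block counts as a device when it carries a `dev` attribute (its major:minor numbers); partitions, network interfaces and other classes would need additional walks, and `scan_block_devices` is a hypothetical name.

```python
import os


def scan_block_devices(sysfs_root="/sys"):
    """Return one ("appear", sysfs_path) event per block device under <sysfs_root>/block."""
    block_dir = os.path.join(sysfs_root, "block")
    events = []
    for name in sorted(os.listdir(block_dir)):
        path = os.path.join(block_dir, name)
        # Only entries that expose a "dev" attribute (major:minor) are treated as devices.
        if os.path.isfile(os.path.join(path, "dev")):
            events.append(("appear", path))
    return events
```

On a live system, `scan_block_devices()` would list events such as `("appear", "/sys/block/sda")`; the `sysfs_root` parameter lets the walk run against a test tree.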
uSDE External Stimuli (4)
• uSDE agent
  – A program, usually a daemon, that provides information necessary for the manipulation of a device that is otherwise unavailable from sysfs, /proc or the kernel
  – Commonly used to send aspect-change events
    • Multi-chassis, geographical addressing
      – ATCA
      – “well known” platforms
    • IPMI and/or networks
uSDE Executive (1)
[Diagram: the uSDE executive daemon sending internal events to several policy methods]
uSDE Executive (2)
• Loads configuration files
• Determines initial device ensemble
  – device scanner
• Initializes event/device handlers
  – sends (internal) “init” event to each handler
• Processes events
  – handles out-of-order arrival issues
uSDE Executive (3)
• Event processing
  – Classifies the device associated with an event
  – Maps the external event to an internal event
  – Queues the internal event for servicing
  – Schedules internal event processing
  – Provides logging of critical data
uSDE Executive (4)
• Device classification (phase 1)
  – Derived directly from the device’s sysfs path
    • class
      – disk, ethernet, cdrom, floppy, loop, raid, etc.
    • sub-class
      – sda -> class “disk”, sub-class “scsi”
      – eth0 -> class “ethernet”, sub-class “generic”
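Phase 1 can be pictured as a first-match rule table keyed on the kernel name at the end of the sysfs path. In the Python sketch below, only the `sda` and `eth0` results come from the slide; the remaining patterns and the name `classify_phase1` are hypothetical.

```python
import os
import re

# Hypothetical rule table: (kernel-name pattern, class, sub-class); first match wins.
PHASE1_RULES = [
    (r"sd[a-z]+$", "disk", "scsi"),
    (r"hd[a-z]+$", "disk", "ide"),
    (r"sr\d+$", "cdrom", "scsi"),
    (r"fd\d+$", "floppy", "generic"),
    (r"loop\d+$", "loop", "generic"),
    (r"md\d+$", "raid", "generic"),
    (r"eth\d+$", "ethernet", "generic"),
]


def classify_phase1(sysfs_path):
    """Return (class, sub-class) from the last component of a sysfs path, or None."""
    name = os.path.basename(sysfs_path.rstrip("/"))
    for pattern, dev_class, sub_class in PHASE1_RULES:
        if re.match(pattern, name):
            return dev_class, sub_class
    return None
```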
uSDE Executive (5)
• Device classification (phase 2)
  – sub-class from phase 1 may be updated
    • Determine parent device
    • Search for additional information and, if present, override the initial classification
      – “scsi” may become “fibrechannel”, “ieee-1394”, etc.
      – “ide” may become “eide”, “serial-ata”, etc.
  – No limitations on sub-class override
  – pci-info file provides information for this phase
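Phase 2 can consult the pci-info file, which maps a PCI vendor/device pair to a sub-class. The Python sketch below assumes the file has already been loaded into a dictionary; the IDs in `PCI_INFO` are placeholders rather than real hardware, and both function names are hypothetical. It reads the `vendor` and `device` attributes that sysfs exposes for each PCI device.

```python
import os

# Hypothetical in-memory form of pci-info: (vendor ID, device ID) -> sub-class.
# The IDs are placeholders, not real hardware.
PCI_INFO = {
    (0xAAAA, 0x0001): "fibrechannel",
    (0xAAAA, 0x0002): "ieee-1394",
}


def read_pci_ids(pci_dev_dir):
    """Read the vendor and device registers exposed in a PCI device's sysfs directory."""
    with open(os.path.join(pci_dev_dir, "vendor")) as f:
        vendor = int(f.read().strip(), 16)
    with open(os.path.join(pci_dev_dir, "device")) as f:
        device = int(f.read().strip(), 16)
    return vendor, device


def classify_phase2(dev_class, sub_class, parent_pci_ids, pci_info=PCI_INFO):
    """Override the phase-1 sub-class when the parent PCI device is listed in pci-info.

    parent_pci_ids is the (vendor, device) pair of the parent controller,
    or None when the parent is not a PCI device.
    """
    if parent_pci_ids is not None and parent_pci_ids in pci_info:
        return dev_class, pci_info[parent_pci_ids]
    return dev_class, sub_class
```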
uSDE Executive (6)
• The internal event is queued for service
  – sysfs path of device
  – internal event type
  – class and sub-class assigned to device
• Enumeration service maintains queues
  – each class has a queue
  – sub-class is ignored
uSDE Executive (7)
• Device queues are aggressively scheduled
  – All queues may be running concurrently
  – No concurrent servicing within a queue
• Events may be coalesced
  – identical event type and sub-class
  – each sysfs path is added to a list
• A service container is invoked in response to an event
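The queueing rules can be sketched with one FIFO per class. In this Python sketch an incoming event is coalesced only with the entry at the tail of its class queue, which preserves ordering; that choice, the entry layout and the `ClassQueues` name are assumptions. Locking and the worker threads that service the queues concurrently are left out.

```python
from collections import deque


class ClassQueues:
    """One FIFO queue per device class; the sub-class only matters for coalescing."""

    def __init__(self):
        self.queues = {}  # class -> deque of pending entries

    def enqueue(self, dev_class, sub_class, event_type, sysfs_path):
        q = self.queues.setdefault(dev_class, deque())
        tail = q[-1] if q else None
        if tail and tail["event"] == event_type and tail["sub_class"] == sub_class:
            tail["paths"].append(sysfs_path)  # coalesce with the pending tail entry
        else:
            q.append({"event": event_type, "sub_class": sub_class, "paths": [sysfs_path]})

    def dequeue(self, dev_class):
        """Take the next entry for one class; a class's entries are serviced one at a time."""
        q = self.queues.get(dev_class)
        return q.popleft() if q else None
```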
uSDE Executive (8)
• A service container is a list of one or more actions that are invoked in a definite order
  – a configuration file specifies the service containers
• Class and sub-class control handling
  – A service container is associated with each class and sub-class
• An internal event is sent to each action within the service container
uSDE Executive (9)
• An action contained in a service container is known as a policy method
  – implements the policies of its designer
  – Each policy method is sent the same parameters
• Policy methods must be prepared to accept multiple arguments (devices)
  – minimizes the number of invocations
  – “closeness” optimizations are possible
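Invoking a service container then amounts to running its policy methods one after another with the same arguments, passing every coalesced device in a single call. The sketch assumes the argument order class, sub-class, event type, device paths (the order in which the Policy Methods slides list them) and runs every method even if an earlier one fails, since the slides do not say how failures are handled. `run_service_container` is hypothetical; its `run` parameter defaults to `subprocess.call` and can be replaced in tests.

```python
import subprocess


def run_service_container(methods, dev_class, sub_class, event_type, sysfs_paths,
                          run=subprocess.call):
    """Invoke each policy method in order with identical arguments.

    All coalesced devices are passed in one invocation per method.
    Returns the exit status of each method, in invocation order.
    """
    statuses = []
    for method in methods:
        argv = [method, dev_class, sub_class, event_type] + list(sysfs_paths)
        statuses.append(run(argv))
    return statuses
```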
uSDE Policy Methods (1)
[Diagram: policy methods]
uSDE Policy Methods (2)
• Policy methods:
  – Are Linux programs
    • Can be written in any language you wish, including shell scripts
  – Are invoked with a standardized command line
    • class
    • sub-class
    • event type
    • device argument(s): sysfs path
    • standardized options
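Because policy methods are ordinary programs, the command-line side can be handled with any argument parser. A minimal Python sketch with `argparse` follows; the positional order mirrors the list above, while the `--verbose` flag merely stands in for the unspecified standardized options and is hypothetical.

```python
import argparse


def parse_policy_args(argv):
    """Parse the policy-method command line: class, sub-class, event type, sysfs path(s)."""
    parser = argparse.ArgumentParser(prog="example-policy")
    parser.add_argument("--verbose", action="store_true",
                        help="hypothetical stand-in for the standardized options")
    parser.add_argument("dev_class")
    parser.add_argument("sub_class")
    parser.add_argument("event_type")
    parser.add_argument("sysfs_paths", nargs="+")
    return parser.parse_args(argv)


def main(argv):
    args = parse_policy_args(argv)
    for path in args.sysfs_paths:
        # A real policy method would create, rename or remove device nodes here.
        print(f"{args.event_type} {args.dev_class}/{args.sub_class}: {path}")
    return 0
```

With a `sys.exit(main(sys.argv[1:]))` entry point added, it could be invoked as `example-policy disk scsi insert /sys/block/sda /sys/block/sdb`.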
uSDE Policy Methods (3)
• Policy methods:
  – actually enumerate a device
  – determine which instance within a class should be associated with a device
  – are free to implement whatever policies they see fit
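One policy a method might choose for the instance question: keep the instance a device had before, otherwise hand out the lowest unused number. The Python sketch below assumes previously seen devices are kept in a dictionary keyed by discriminator (in uSDE such data would belong in the optional backing store); `assign_instance` is a hypothetical name.

```python
def assign_instance(records, discriminator):
    """Return the instance number within a class for the device with this discriminator.

    records maps discriminator -> instance for devices seen before. A returning
    device keeps its old instance; a new device gets the lowest unused number,
    which is recorded in records.
    """
    if discriminator in records:
        return records[discriminator]
    used = set(records.values())
    instance = 0
    while instance in used:
        instance += 1
    records[discriminator] = instance
    return instance
```

Under this policy a replacement device gets a new number until its predecessor's record is dropped; deciding when to reuse the old number is what a replacement policy would add.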
uSDE Files (1)
[Diagram: the uSDE executive daemon and uSDE utilities, the configuration files, the optional backing store, the exec-cache and policy methods]
uSDE Files (2)
• Human readable - ASCII
• Formal grammars (YACC) for each file
  – One can be sure the file is valid
• Hand-optimized lexer for speed
  – still room for improvement
• Separate API for each file via shared library
  – No wasted memory
uSDE Files (3)
• deployment-model
  – how to handle events and permissions
• hardware-map (optional)
  – how to control your special hardware
• pci-info (optional)
  – additional information for classification
• backing-store (optional)
  – a place to retain critical information
• exec-cache (optional in the future; special case)
  – the executive caches classification here
uSDE Policy Method Toolkit (1)
• Trivial policy methods
• Persistent policy methods
• Emulation policy methods
A wonderful set of sample code to play with...
Policy Method Toolkit (2)
• disk-ide-policy
  – implements persistent device naming
    • vendor/model string, serial number
  – handles IDE, EIDE, serial ATA and USB-hosted [E]IDE devices
  – implements replacement and relocation policies for [E]IDE and mapped serial ATA
Policy Method Toolkit (3)
• disk-scsi-policy
  – implements persistent device naming
    • Vendor ID, Product ID, serial number
  – handles parallel SCSI, IEEE-1394, FibreChannel and USB-hosted SCSI devices
  – handles multi-ported storage devices
  – implements replacement and relocation policies for parallel SCSI
Policy Method Toolkit (4)
• floppy-policy
  – handles internal floppies
  – USB floppies show up as disks
• simple-device-policy
  – handles block and character devices
  – “catch all” for many device classes
Policy Method Toolkit (5)
• ethernet-policy
  – implements persistent device naming
    • initial MAC address
  – implements replacement and relocation policies
    • USB Ethernet devices not supported yet (trivial)
  – uses the hardware-map file to ensure specific interfaces retain their names regardless of device search order
Policy Method Toolkit (6)
• Emulation policies (for those that need them)
  – devfs
  – Linux Standard Base (LSB)
Policy Method Toolkit (7)
• Special purpose policies
  – disk-cs-policy
    • An example of a policy that makes use of an agent
    • Names are based on the geographical address of a disk in a chassis/slot environment
  – multipath-policy
    • automatic provisioning of multi-ported disks
    • Not limited to SCSI or FibreChannel
Where is it?
• http://sourceforge.net/projects/usde
• http://source.mvista.com/sde
Future Directions (1)
• Enough of our ideas are expressed in this prototype; it’s time to get lots of feedback and additional input
• Implementation is open source and available
• SourceForge project is up and running
Future Directions (2)
• Event mechanism is a closed socket hack. This should be replaced with an open messaging system
• grammar cleanup throughout
• classification scheme should be reviewed, simplified; scripted?
• Utilities should be improved and expanded
  – helpers for scripted policies that want retention
Future Directions (3)
• general walk-through and review
• multipathing - additional controls
• more device classes; more policies
• devfs and lsb emulation needs work
• flood of ideas from the community
• backing store content wars
Discussion Items
• Disk naming
• Multi-chassis agent example
• backing-store and deployment-model examples
• Critical definitions
• Configuration file details
• Transaction details
• More on events
A Few Definitions (1)
• Interface Technology Path (ITP)
  – The unique, unambiguous and repeatable path over which a system traverses hardware to arrive at the “location” of a device
  – Must remain constant across system crashes, resets and reboots
  – For PCI devices the ITP is the Slot Path Address (SPA) of a device
A Few Definitions (2)
• Interface Domain IDentifier (IDID)
  – The unique identification of a device within the domain managed by the device’s parent device (controller/interface/adapter)
  – Examples: address/LUN, Dev/Func
A Few Definitions (3)
• Device Discrimination (DD)
  – The ability to discern a difference between devices that on the surface appear to be identical. Specifically, it is the ability to tell one device from another when the devices share the same class, vendor and product descriptions
A Few Definitions (4)
• Device Discrimination (continued)
  – The most common form of device discrimination is implemented via a serial number
  – When a device cannot be discriminated, a useful equivalent is possible: use the ITP and IDID!
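The fallback can be expressed as a key-building function: use the serial number when there is one, otherwise the (ITP, IDID) pair. This Python sketch uses the hypothetical name `device_discriminator` and tagged tuples so that a serial number can never collide with a location key. A location-based key identifies a position rather than the device itself, so a device moved to another slot or port would look like a new one.

```python
def device_discriminator(serial, itp, idid):
    """Build the key that tells a device apart from otherwise identical ones.

    Prefer the serial number; when the device reports none (or only blanks),
    fall back to its Interface Technology Path plus Interface Domain IDentifier.
    """
    if serial and serial.strip():
        return ("serial", serial.strip())
    return ("location", itp, idid)
```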
A Few Definitions (5)
• Persistent Device Naming (PDN)
  – Associates a unique name with a device based on several of the device’s attributes
  – This differs from the current Linux device naming scheme, where the “name” of a device is actually a (shorthand) description of the data path and selection criteria used to access the device
A Few Definitions (6)
• Persistent Device Naming (PDN) (cont.)
  – Persistently named devices must provide an ensemble of attributes, including the ITP, IDID and DD, that unambiguously discriminates one device from all others. It is then possible to recognize the device and ensure that its name remains constant regardless of how the device is interfaced to the system
A Few Definitions (7)
• Persistent Device Naming (PDN) (cont.)
  – When a device’s name cannot be built directly from its attributes, some form of non-volatile storage must be available to record the unique attributes along with the name assigned (aliased) to the device
uSDE Files in Detail (1)
deployment-model File (1)
• service directive
  – Specifies which list of policy methods is associated with a given class and sub-class
• device-node-default directive
  – specifies the device node control information for a given class and sub-class
    • mode
    • group
    • owner
deployment-model File (2)
• device-node-specific directive
  – specifies the device node control information for a specific device (class and instance within class)
    • mode
    • group
    • owner
• alias directive
  – specifies an alias associated with a specific device (class and instance within class)
hardware-map File
• Optional
• map directive
  – specifies that a particular device, identified by its ITP, is to be treated as a specific instance within a class
  – e.g. force eth0 hardware to stay eth0 no matter what the discovery order
• Additional information in the future
pci-info File
• Specifies the sub-class associated with a given PCI device by mapping the PCI vendor and product registers to a sub-class
• Will be generalized to handle other interfaces in the near future
• Optional
exec-cache File
• Not a configurable file
• Used internally by the uSDE executive to cache the mapping of a sysfs path to class and sub-class
  – The executive has to remember how a device was classified so the correct service action can be invoked upon remove/disappear
• Will be made optional via a special insert/appear mode of the executive in the near future
backing-store File
• Optional file used to store non-volatile information
  – policy methods store their data, if any, here
  – simple “database”
  – hierarchical model
File Transactions (1)
• All files are protected via a transaction framework
• Transaction framework is tuned for speed and simplicity
  – lock contention is expected to be minimal
  – files are expected to be small
  – files are human readable (ASCII)
File Transactions (2)
• Serialization is performed at transaction start and end times:
  – the lock is held only within the formal transaction start and end routines
  – all of the files involved in the transaction are read into memory
  – modified files are rewritten
  – the transaction must be repeated if another thread of execution modified one of the files it modifies after the transaction started
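This is a form of optimistic concurrency control. The Python sketch below models it in memory: a thread lock stands in for whatever file locking the real framework uses, and a per-file version counter, an assumption here, detects that another thread committed a change after this transaction started. `FileSet`, `run_transaction` and `TransactionConflict` are hypothetical names.

```python
import threading


class TransactionConflict(Exception):
    """Raised when a file this transaction modifies was committed by another thread first."""


class FileSet:
    """In-memory stand-in for the uSDE files: name -> (version, text)."""

    def __init__(self, files):
        self._lock = threading.Lock()
        self._files = {name: (0, text) for name, text in files.items()}

    def start(self, names):
        """Transaction start: snapshot the named files, holding the lock only briefly."""
        with self._lock:
            return {name: self._files[name] for name in names}

    def commit(self, snapshot, changes):
        """Transaction end: rewrite the changed files, or raise on a conflict."""
        with self._lock:
            for name in changes:
                if self._files[name][0] != snapshot[name][0]:
                    raise TransactionConflict(name)
            for name, text in changes.items():
                self._files[name] = (self._files[name][0] + 1, text)

    def read(self, name):
        with self._lock:
            return self._files[name][1]


def run_transaction(fileset, names, update, retries=10):
    """Run update() until its changes commit cleanly; return False if retries run out."""
    for _ in range(retries):
        snapshot = fileset.start(names)
        texts = {name: text for name, (_, text) in snapshot.items()}
        changes = update(texts)  # {name: new_text} for the files update() modified
        try:
            fileset.commit(snapshot, changes)
            return True
        except TransactionConflict:
            continue  # another thread won the race; repeat the whole transaction
    return False
```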
uSDE External Events (1)
• Insert event
  – a device has been physically inserted into the system
• Remove event
  – a device has been physically removed from the system
uSDE External Events (2)
• Appear event
  – a device has been detected that was not inserted
    • initial device scanning
    • diagnostics (return to service)
• Disappear event
  – a device currently known to the system and in service has disappeared from the system
    • no longer in service
    • diagnostics (removal from service)
uSDE External Events (3)
• Aspect-Change Event
  – A parameter associated with a device has become available or has changed
    • information otherwise unavailable from the kernel
    • “unusual” information sources: “out of band”
Unambiguous Disk Naming (1)
• Names should be persistent
  – Name remains fixed across reboots and configuration changes
• Multi-ported disks are a challenge:
  – How is a disk named?
  – How does one unambiguously access a port?
  – How does generic SCSI logically work?
    • One node or multiple?
Unambiguous Disk Naming (2)
• /dev/sde-disk/disk-name/d<n>p<m>
  – <n> is the data port number (all disks have 0)
  – <m> is the partition number
• generic SCSI node is either:
  – generic (if one)
  – generic_d<n> (if multiple)
• multi-path nodes are “multi_p<m>”
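The naming rules translate into a few path-building helpers. In this Python sketch the function names are hypothetical, ports are numbered from 0 as stated above, and the multi-path nodes are assumed to live in the same per-disk directory as the d<n>p<m> nodes.

```python
import posixpath

BASE = "/dev/sde-disk"


def partition_node(disk_name, port, partition):
    """Node for a partition reached through one data port: d<n>p<m>."""
    return posixpath.join(BASE, disk_name, f"d{port}p{partition}")


def generic_nodes(disk_name, num_ports):
    """Generic SCSI nodes: "generic" for one port, "generic_d<n>" per port otherwise."""
    if num_ports == 1:
        return [posixpath.join(BASE, disk_name, "generic")]
    return [posixpath.join(BASE, disk_name, f"generic_d{n}") for n in range(num_ports)]


def multipath_node(disk_name, partition):
    """Multi-path node for a partition (assumed to share the per-disk directory)."""
    return posixpath.join(BASE, disk_name, f"multi_p{partition}")
```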
backing-store details (1)
object "ethernet0" {
    string "class" "ethernet"
    string "sub-class" "generic"
    string "vendor-string" "Intel Corp. 82544EI Gigabit Ethernet Controller"
    string "product-string" "Intel Corp. 82544EI Gigabit Ethernet Controller"
    string "discriminator" "00:02:b3:c3:5d:ac"
    string "interface-technology-path" "/devices/pci0000:00/0000:XX:03.0/0000:XX:1d.0/0000:XX:01.0"
    integer "class-instance" 0
    string "state" "present"
}
backing-store details (2)
object "disk0" {
    string "device-path" "/dev/sde-disk/disk0"
    string "class" "disk"
    string "sub-class" "fibrechannel"
    string "vendor-string" "IBM "
    string "product-string" "DDYF-T36950R "
    string "discriminator" "TFF6C829"
    integer "class-instance" 0
    string "state" "present"
    string "service-location" "unknown"
    object "ports" {
        object "0" {
            string "interface-technology-path" "/devices/pci0000:00/0000:XX:02.0/0000:XX:1d.0/0000:XX:01.0"
            string "interface-domain-ID" "0:9:0"
            string "sysfs-path" "/sys/block/sdd"
            integer "reference-count" 3
        }
        object "1" {
            string "interface-technology-path" "/devices/pci0000:00/0000:XX:02.0/0000:XX:1d.0/0000:XX:01.1"
            string "interface-domain-ID" "0:9:0"
            string "sysfs-path" "/sys/block/sdb"
            integer "reference-count" 3
        }
    }
}
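The two examples suggest a small grammar: an `object` has a quoted name and a brace-delimited body of `string` entries (key and value), `integer` entries (key and number) and nested objects. The Python sketch below parses that inferred grammar into nested dictionaries. uSDE defines its real grammars in YACC, so treat this as an approximation without error recovery; the function names are hypothetical.

```python
import re

# One token: a quoted string, a brace, a keyword, or an integer.
TOKEN = re.compile(r'\s*(?:"([^"]*)"|([{}])|([A-Za-z]+)|(-?\d+))')


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        quoted, brace, word, number = m.groups()
        if quoted is not None:
            tokens.append(("str", quoted))
        elif brace is not None:
            tokens.append(("brace", brace))
        elif word is not None:
            tokens.append(("word", word))
        else:
            tokens.append(("int", int(number)))
        pos = m.end()
    return tokens


def parse_object(tokens, i):
    """Parse 'object "name" { ... }' at tokens[i]; return (name, body, next index)."""
    if tokens[i] != ("word", "object") or tokens[i + 2] != ("brace", "{"):
        raise ValueError(f"malformed object at token {i}")
    name, body, i = tokens[i + 1][1], {}, i + 3
    while tokens[i] != ("brace", "}"):
        if tokens[i] == ("word", "object"):
            child, value, i = parse_object(tokens, i)
            body[child] = value
        elif tokens[i] in (("word", "string"), ("word", "integer")):
            # Both entry types are: keyword, quoted key, value token.
            body[tokens[i + 1][1]] = tokens[i + 2][1]
            i += 3
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")
    return name, body, i + 1


def parse_backing_store(text):
    """Parse a sequence of top-level objects into {name: nested dict}."""
    tokens, objects, i = tokenize(text), {}, 0
    while i < len(tokens):
        name, body, i = parse_object(tokens, i)
        objects[name] = body
    return objects
```

The tokenizer does not need whitespace between entries, so it also accepts run-together text such as `"generic"string "vendor-string"`.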
deployment-model details
service-container disk fibrechannel { disk-scsi-policy multipath-policy }
service-container disk ide { disk-ide-policy lsb-policy devfs-policy }
service-container ethernet generic { ethernet-policy }
device-node-default disk fibrechannel {
    mode 0x642
    owner "root"
    group "foo"
}
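The `service-container` lines above can be read into a table that maps a class and sub-class to its ordered list of policy methods. A regular expression is enough for this Python sketch, which handles only that one directive and is not the YACC grammar uSDE actually uses; `parse_service_containers` is a hypothetical name.

```python
import re

# service-container <class> <sub-class> { <policy method> ... }
SERVICE_CONTAINER = re.compile(r"service-container\s+([^\s{]+)\s+([^\s{]+)\s*\{([^}]*)\}")


def parse_service_containers(text):
    """Map (class, sub-class) -> policy methods, in the order they must be invoked."""
    return {(cls, sub): body.split() for cls, sub, body in SERVICE_CONTAINER.findall(text)}
```

Other directives, such as `device-node-default`, are simply skipped by this pattern.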
Multi-chassis agent example (1)
[Diagram: chassis 0x1234 (CPUs in slots 1, 2 and 3; disk in slot 4) and chassis 0x5678 (CPUs in slots 1, 3 and 4; disk in slot 2), with their disks and networks interconnected]
• Chassis have their disks and networks interconnected
• Hot swap notification is limited to the chassis (IPMI)
• A publisher agent broadcasts hot swap events to other chassis
• Each CPU runs a subscriber agent that processes hot swap events
• Each CPU is running a uSDE executive
Multi-chassis agent example (2)
[Diagram: a publisher sends an insert event for chassis 0x5678, slot 2 (with the disk ID) to the hot swap subscriber and uSDE agent on the CPUs of both chassis (chassis 0x1234, slots 1, 2 and 3; chassis 0x5678, slots 1, 3 and 4). Each agent sends an aspect-change event to its uSDE executive, whose disk-cs-policy creates the device nodes /dev/chassis5678/slot2/...]