100
1 Linux Operating System 許 許 許

1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Embed Size (px)

Citation preview

Page 1: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

1

Linux Operating System

許 富 皓

Page 2: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

2

NameSpace

Page 3: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Namespace [Michael Kerrisk]

Currently, Linux implements six different types of namespaces.

The CLONE_NEW* identifiers listed in parentheses are the names of the constants used to identify namespace types when employing the namespace-related APIs (clone(), unshare(), and setns() )

3

Page 4: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Six Linux Namespaces

Mount namespaces (CLONE_NEWNS, Linux 2.4.19) UTS namespaces (CLONE_NEWUTS, Linux 2.6.19) IPC namespaces (CLONE_NEWIPC, Linux 2.6.19) PID namespaces (CLONE_NEWPID, Linux 2.6.24) Network namespaces (CLONE_NEWNET, Linux 2.6.29)  User namespaces (CLONE_NEWUSER, Linux 3.8)

4

Page 5: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Goals of Namespace (1) [Michael Kerrisk]

The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

5

Page 6: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Goals of Namespace (2) [Michael Kerrisk]

One of the overall goals of namespaces is to support the implementation of containers, a tool for lightweight virtualization (as well as other purposes) that provides a group of processes with the illusion that they are the only processes on the system

6

Page 7: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

PID Namespace [Michael Kerrisk]

The global resource isolated by PID namespaces is the process ID number space.

This means that processes in different PID namespaces can have the same process ID.

PID namespaces are used to implement containers that can be migrated between host systems while keeping the same process IDs for the processes inside the container.

7

Page 8: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Process PID [Michael Kerrisk]

As with processes on a traditional Linux (or UNIX) system, the process IDs within a PID namespace are unique, and are assigned sequentially starting with PID 1.

Likewise, as on a traditional Linux system, PID 1—the init process—is special: it is the first process created within the namespace, and it performs certain management tasks within the namespace.

8

Page 9: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Creation of a New PID Namespace [Michael Kerrisk]

A new PID namespace is created by calling clone() with the CLONE_NEWPID flag.

child_pid = clone(childFunc, child_stack, CLONE_NEWPID | SIGCHLD, argv[1]);

9

Page 10: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

PID Namespace Hierarchy[Michael Kerrisk]

PID namespaces form a hierarchy: A process can "see" only those processes

contained in its own PID namespace and in the child namespaces nested below that PID namespace.

If the parent of the child created by clone() is in a different namespace, the child cannot "see" the parent; therefore, getppid() reports the parent PID as being zero.

10

Page 11: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

PID Namespace Hierarchy [text book]

11

Page 12: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

/proc/PID Directory[Michael Kerrisk]

Within a PID namespace, the /proc/PID directories show information only about processes within that PID namespace

or processes within one of its descendant namespaces.

12

Page 13: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Mount a proc filesystem[Michael Kerrisk]

However, in order to make the /proc/PID directories that correspond to a PID namespace visible, the proc filesystem ("procfs" for short) needs to be mounted from within that PID namespace.

From a shell running inside the PID namespace (perhaps invoked via the system() library function), we can do this using a mount command of the following form:

# mount -t proc proc /mount_point

13

Page 14: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Nested PID Namespaces[Michael Kerrisk]

PID namespaces are hierarchically nested in parent-child relationships.

Within a PID namespace, it is possible to seeall other processes in the same namespace,

as well asall processes that are members of descendant

namespaces.

14

Page 15: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

“See” a Process [Michael Kerrisk]

Here, "see" means being able to make system calls that operate on specific PIDs.e.g., using kill() to send a signal to process.

Processes in a child PID namespace cannot see processes that exist (only) in the parent PID namespace (or further removed ancestor namespaces).

15

Page 16: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

PID returned by getpid() [Michael Kerrisk]

A process will have one PID in each of the layers of the PID namespace hierarchy starting from the PID namespace in which it resides through to the root PID namespace.

Calls to getpid() always report the PID associated with the namespace in which the process resides

16

Page 17: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Traditional init Process and Signals

The traditional Linux init process is treated specially with respect to signals. The only signals that can be delivered to init are

those for which the process has established a signal handler.

All other signals are ignored. This prevents the init process—whose presence

is essential for the stable operation of the system —from being accidentally killed, even by the super user.

17

Page 18: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

init Processes of Namespaces and Signals

PID namespaces implement some analogous behavior for the namespace-specific init process. Other processes in the namespace (even privileged

processes) can send only those signals for which the init process has established a signal handler.

Note that (as for the traditional init process) the kernel can still generate signals for the PID namespace init process in all of the usual circumstances

e.g., hardware exceptions, terminal-generated signals such as SIGTTOU, and expiration of a timer.

18

Page 19: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Signals from Ancestor Namespaces

Signals can be sent to the PID namespace init process by processes in ancestor PID namespaces.

Again, only the signals for which the init process has established a handler can be sent, with two exceptions: SIGKILL

 and SIGSTOP.

19

Page 20: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

init Process and SIGKILL and SIGSTOP

When a process in an ancestor PID namespace sends SIGKILL and SIGSTOP to the init process, they are forcibly delivered (and can't be caught).

The SIGSTOP signal stops the init process; SIGKILL terminates it.

20

Page 21: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Termination of init Processes

Since the init process is essential to the functioning of the PID namespace, if the init process is terminated by SIGKILL (or it terminates for any other reason), the kernel terminates all other processes in the namespace by sending them a SIGKILL signal.

21

Page 22: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Connection between Processes and Namespaces

22

struct nsproxy *nsproxy;

Page 23: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Definition of struct nsproxy

struct nsproxy {

atomic_t count;

struct uts_namespace *uts_ns;

struct ipc_namespace *ipc_ns;

struct mnt_namespace *mnt_ns;

struct pid_namespace *pid_ns;

struct net *net_ns;

};

A nsproxy is shared by processes which share all namespaces. As soon as a single namespace is cloned or unshared, the nsproxy

is copied.23

Page 24: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

struct nsproxy

A structure to contain pointers to all per-process namespaces - fs (mount), uts, network, ipc, etc.

'count' is the number of processes holding a reference.

24

Page 25: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Initial Global Namespace

struct nsproxy init_nsproxy = {

.count = ATOMIC_INIT(1),

.uts_ns = &init_uts_ns,

#if defined(CONFIG_POSIX_MQUEUE)|| defined(CONFIG_SYSVIPC)

.ipc_ns = &init_ipc_ns,

#endif

.mnt_ns = NULL,

.pid_ns = &init_pid_ns,

#ifdef CONFIG_NET

.net_ns = &init_net,

#endif

};

25

Page 26: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Process Identification Number

Unix processes are always assigned a number to uniquely identify them in their namespace.

This number is called the process identification number or PID for short.

Each process generated with fork or clone is automatically assigned a new unique PID value by the kernel.

26

Page 27: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

27

Process ID PIDs are numbered sequentially in each PID namespace:

the PID of a newly created process is normally the PID of the previously created process increased by one.

Of course, there is an upper limit on the PID values; when the kernel reaches such limit, it must start recycling the lower, unused PIDs. By default, the maximum PID number is PID_MAX_LIMIT-1

(32,767 or 262143).

Page 28: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Maximum PID Number

#define PAGE_SHIFT 12

#define PAGE_SIZE 1UL << PAGE_SHIFT)

#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)

#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \

(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

P.S.: PID_MAX_LIMIT is equal to 215 (32768) or 224.

#define PIDMAP_ENTRIES ((PID_MAX_LIMIT + 8*PAGE_SIZE - 1)/PAGE_SIZE/8)

P.S.: PIDMAP_ENTRIES is equal to 1 or 215.

28

Page 29: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

PIDs in PID Namespaces

Namespaces add some additional complexity to how PIDs are managed.

PID namespaces are organized in a hierarchy.

29

Page 30: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

A Process May Have Multiple PIDs When a new namespace is created, all PIDs

that are used in this namespace are visible to the parent namespace, but the child namespace does not see PIDs of the parent namespace.

However this implies that some processes are equipped with more than one PID, namely, one per namespace they are visible in. This must be reflected in the data structures.

30

Page 31: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Global IDs

Global IDs are identification numbers that are valid within the kernel itself and in the initial global namespace.

For each ID type, a given global identifier is guaranteed to be unique in the whole system.

31

Page 32: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Local IDs

Local IDs belong to a specific namespace and are not globally valid.

For each ID type, they are valid within the namespace to which they belong, but identifiers of identical type may appear with the same ID number in a different namespace.

32

Page 33: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Global PID and TGID

The global PID and TGID are directly stored in the task_struct, namely, in the elements pid and tgid:

typedef int __kernel_pid_t;

typedef __kernel_pid_t pid_t;

struct task_struct {

...

pid_t pid;

pid_t tgid;

...

} 33

Page 34: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

34

PIDs and Processes

Linux associates a different PID with each process or lightweight process in the system. As we shall see later in this chapter, there is a

tiny exception on multiprocessor systems. This approach allows the maximum

flexibility, because every execution context in the system can be uniquely identified.

Page 35: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

35

Threads in the Same Group Must Have a Common PID

On the other hand, Unix programmers expect threads in the same group to have a common PID. For instance, it should be possible to send a

signal specifying a PID that affects all threads in the group.

In fact, the POSIX 1003.1c standard states that all threads of a multithreaded application must have the same PID.

Page 36: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

36

Thread Group

To comply with POSIX 1003.1c standard, Linux makes use of thread groups.

The identifier shared by the threads is the PID of the thread group leader , that is, the PID of the first lightweight process in the group.

The thread group ID of a thread group is called TGID.

Page 37: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

37

Process Groups

Modern Unix operating systems introduce the notion of process groups to represent a job abstraction. For example,

in order to execute the command line: $ ls | sort | more

a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more.

In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise).

Page 38: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Process Groups [waikato]

One important feature is that it is possible to send a signal to every process in the group.

Process groups are used for distribution of signals, andby terminals to arbitrate requests for their

input and output.

Page 39: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Process Groups [waikato]

Foreground Process Groups A foreground process has read and write access to

the terminal. Every process in the foreground receives SIGINT (^C

) SIGQUIT (^\ ) and SIGTSTP signals. Background Process Groups

A background process does not have read access to the terminal.

If a background process attempts to read from its controlling terminal its process group will be sent a SIGTTIN.

Page 40: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

40

Group Leaders and Process Group IDs Each process descriptor includes a field

containing the process group ID. Each group of processes may have a

group leader, which is the process whose PID coincides with the process group ID.

Page 41: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

41

Creation of a New Process Group [

Bhaskar]

A newly created process is initially inserted into the process group of its parent.

The shell after doing a fork would explicitly call setpgid to set the process group of the child.

The process group is explicitly set for purposes of job control.

When a command is given at the shell prompt, that process or processes (if there is piping) is assigned a new process group.

Page 42: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

42

Login Sessions

Modern Unix kernels also introduce login sessions.

Informally, a login session contains all processes that are descendants of the process that has started a working session on a specific terminal -- usually, the first command shell process created for the user.

Page 43: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

43

Login Sessions vs. Process Groups

All processes in a process group must be in the same login session.

A login session may have several process groups active simultaneously; one of these process groups is always in the

foreground, which means that it has access to the terminal.

The other active process groups are in the background. When a background process tries to access the terminal, it

receives a SIGTTIN or SIGTTOUT signal. In many command shells, the internal commands bg and fg can

be used to put a process group in either the background or the foreground.

Page 44: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Representation of a PID Namespace

struct pid_namespace {

struct kref kref;

struct pidmap pidmap[PIDMAP_ENTRIES];

int last_pid;

unsigned int nr_hashed;

struct task_struct *child_reaper;

struct kmem_cache *pid_cachep;

unsigned int level;

struct pid_namespace *parent;

:

struct user_namespace *user_ns;

struct work_struct proc_work;

kgid_t pid_gid;

int hide_pid;

int reboot; /* group exit code if this pidns was rebooted */

unsigned int proc_inum;

};

44

Page 45: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

child_reaper Field

Every PID namespace is equipped with a process that assumes the role taken by init in the global picture.

One of the purposes of init is to call wait4 for orphaned processes, and this must likewise be done by the init process of the namespace.

A pointer to the task structure of this process is stored in child_reaper. 45

Page 46: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

parent Field

parent is a pointer to the parent namespace, and level denotes the depth in the namespace hierarchy.

The initial namespace has level 0, any children of this namespace are in level 1, children of children are in level 2, and so on.

Counting the levels is important because IDs in higher levels must be visible in lower levels.

46

Page 47: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

pidmap Field

struct pidmap {

atomic_t nr_free;

void *page;

};

#define PIDMAP_ENTRIES ((PID_MAX_LIMIT + 8*PAGE_SIZE - 1)/PAGE_SIZE/8

struct pid_namespace {

:

struct pidmap pidmap[PIDMAP_ENTRIES];

:

}

47

Page 48: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

PID bitmap [1][2][3]

To keep track of which PIDs have been allocated and which are still free, the kernel uses a large bitmap in which each PID is identified by a bit.

The value of the PID is obtained from the position of the bit in the bitmap.

48

Page 49: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Allocate a Free PID

Allocating a free PID is then restricted essentially to looking for the first bit in the bitmap whose value is 0; this bit is then set to 1.

static int alloc_pidmap(struct pid_namespace *pid_ns)

49

Page 50: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Free a PID

Freeing a PID can be implemented by ‘‘toggling‘‘ the corresponding bit from 1 to 0.

static void free_pidmap(struct upid *upid)

50

Page 51: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

struct upid

struct upid represents the information that is visible in a specific namespace.

struct upid {

/* Try to keep pid_chain in the same cacheline as nr for find_vpid */

int nr;

struct pid_namespace *ns;

struct hlist_node pid_chain;

};

51

Page 52: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Fields of struct upid

nr represents the numerical value of an ID, and ns is a pointer to the namespace to which the value belongs.

All upid instances are kept on a hash table to which we will come in a moment, and pid_chain allows for implementing hash overflow lists with standard methods of the kernel.

52

Page 53: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

The Kernel-internal Representation of A PID

struct pid is the kernel-internal representation of a PID.

struct pid

{

atomic_t count;

unsigned int level;

/* lists of tasks that use this pid */

struct hlist_head tasks[PIDTYPE_MAX];

struct rcu_head rcu;

struct upid numbers[1];

}; 53

Page 54: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Type enum pid_type enum pid_type

{

PIDTYPE_PID,

PIDTYPE_PGID,

PIDTYPE_SID,

PIDTYPE_MAX

};

Notice that thread group IDs are not contained in this collection.

This is because the thread group ID is simply given by the PID of the thread group leader, so a separate entry is not necessary. 54

Page 55: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

level Field of struct pid

A process can be visible in multiple namespaces, and the local ID in each namespace will be different.

level denotes in how many namespaces the process is visible (in other words, this is the depth of the containing namespace in the namespace hierarchy).

55

Page 56: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

numbers Field of struct pid

numbers contains a upid instance for each level. Note that the array consists formally of one

element, and this is true if a process is contained only in the global namespace. Since the element is at the end of the structure, additional entries can be added to the array by simply allocating more space.

56

Page 57: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of Field struct upid numbers[]

57

atomic_t count

sruct hlist_head tasks[PIDTYPE_SID]

sruct hlist_head tasks[PIDTYPE_PGID]

sruct hlist_head tasks[PIDTYPE_PID]

int level

struct upid numbers[0]

struct upid numbers[1]

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

Page 58: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

tasks Field of struct pid

The definition of struct pid is headed by a reference counter.

tasks is an array with a list head for every ID type. This is necessary because an ID can be used for several processes.

All task_struct instances that share a given ID are linked on this list.

PIDTYPE_MAX denotes the number of ID types. 58

Page 59: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

pids Field of struct task_struct

Since all tast_struck structures that share an identifier are kept on a list headed by tasks, a list element is required in struct task_struct:

struct task_struct {

...

/* PID/PID hash table linkage. */

struct pid_link pids[PIDTYPE_MAX];

...

}; 59

Page 60: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

struct pid_link

struct pid_link

{

struct hlist_node node;

struct pid *pid;

};

60

Page 61: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of Field struct hlist_head tasks[]

61

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

pid

node

pid

node

pid

node

pid

node

pid

node

pid

node

struct task_struct struct task_struct

tasks[0]

tasks[1]

tasks[2]

count

levelint nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

tasks[0]

tasks[1]

tasks[2]

count

level

pids[0]

pids[1]

pids[2]

group_leader group_leader

Page 62: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Create a New pid Instance When a new process is created, a new pid instance is also created.

long do_fork(unsigned long clone_flags, unsigned long stack_start, unsigned long stack_size, int __user *parent_tidptr, int __user *child_tidptr)

{

:

p = copy_process(clone_flags, stack_start, stack_size, child_tidptr, NULL, trace);

:

}

static struct task_struct *copy_process(unsigned long clone_flags, unsigned long stack_start, unsigned long stack_size, int __user *child_tidptr, struct pid *pid, int trace)

{

:

if (pid != &init_struct_pid) {

retval = -ENOMEM;

pid = alloc_pid(p->nsproxy->pid_ns);

if (!pid)

goto bad_fork_cleanup_io;

}

:

} 62

Page 63: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

pids[] Fields of a New Process [source code]

When a new process is created, The pids[0] field of the task_struct of

the new process will be added to the linked list headed at the tasks[0] field of the new pid instance.

If it is NOT a Thread Group Leader (TGL), the pids[1] (i.e. pids[PIDTYPE_PGID]) and pids[2] (i.e. pids[PIDTYPE_PGID]) fields of its task_struct will not be set.

63

Page 64: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of pids[] Fields of a New non-TGL Process

64

pid

node

pid

node

pid

node

struct task_struct

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

tasks[0]

tasks[1]

tasks[2]

count

level

group_leader

pids[0]

pid

node

pid

node

pid

node

group_leader

struct task_struct

thread group leader

pids[1]

pids[2]

Page 65: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of pids[] Fields of a Session Leader Process

65

pid

node

pid

node

pid

node

struct task_struct

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

tasks[0]

tasks[1]

tasks[2]

count

level

group_leader

pids[0]

pids[1]

pids[2]

Page 66: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Relationship between a PGL and TGL [source code] A Process Group Leader (PGL) must also

be a Thread Group Leader (TGL).

66

Page 67: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of pids[] Fields of a New TGL Process [source code]

When a new process P1 is created by process P2 (P2 is

not a TGL) and P2‘s Process Group Leader (PGL) is P3, The pids[0].node field of the task_struct of P1 will be

added to the linked list headed at the tasks[0] field of the new pid instance.

If P1 is a TGL, the pids[1].node (i.e. pids[PIDTYPE_PGID].node) of its task_struct will be added to the linked list headed at the tasks[1] field of a pid instance. The pid instance is pointed by the pids[PIDTYPE_PGID].pid of P3‘s task_struct.

pids[2].node of the task_struct of P1 is handled similarly. 67

Page 68: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of Field tasks[], if P2 is not a TGL

68

struct pid

P1, struct task_struct

tasks[0]

tasks[1]

tasks[2]

count

pids[0]

pids[1]

pid

node

pid

node

pid

node

group_leader

pids[2]

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

group_leader

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

group_leader

P2, struct task_structP3, struct task_struct

struct pidstruct pid

Page 69: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of Field tasks[], if P2 is a TGL

69

struct pid

P1, struct task_struct

tasks[0]

tasks[1]

tasks[2]

count

pids[0]

pids[1]

pid

node

pid

node

pid

node

group_leader

pids[2]

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

group_leader

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

group_leader

P2, struct task_structP3, struct task_struct

struct pidstruct pid

Page 70: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Linked List That Links All Processes in a Process Group

If a process is a process group leader (P.S.: it must also be a thread group leader), the tasks[1] field of the pid instance pointed by the pids[1].pid of its task_struct is the head of the linked list that links the field pids[1].pid of the task_struct of all thread group leaders in the same process group.

70process group leader

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

group_leader group_leader group_leader

Page 71: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Thread Group

Processes in the same thread group are chained together through the thread_group field of their tast_struct structures [1][2].

struct task_struct {

:

struct list_head thread_group;

:

} 71

Page 72: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Function attach_pid()

Suppose that a new instance of struct pid has been allocated and set up for a given ID type. It is attached to a task_struct structure as follows:

int fastcall attach_pid(struct task_struct *task, enum pid_type type,

struct pid *pid)

{

struct pid_link *link;

link = &task->pids[type];

link->pid = pid;

hlist_add_head_rcu(&link->node, &pid->tasks[type]);

return 0;

}72

Page 73: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

attach_pid(p,PIDTYPE_PGID,pid)

73

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

tasks[0]

tasks[1]

tasks[2]

count

pid

node

pid

node

pid

node

group_leader group_leader group_leader

pid

p

Page 74: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

struct pid related Helper Functions

Obtain the pid instance associated with the task_struct structure.

The auxiliary functions task_pid, task_tgid, task_pgrp, and task_session are provided for the different types of IDs.

static inline struct pid *task_pid(struct task_struct *task)

{ return task->pids[PIDTYPE_PID].pid; }

static inline struct pid *task_pgrp(struct task_struct *task)

{ return task->group_leader->pids[PIDTYPE_PGID].pid; }

74

Page 75: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Numerical PID related Helper Functions (1) Once the pid instance is available, the numerical

ID can be read off from the upid information available in the numbers array in struct pid.

pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)

{

struct upid *upid;

pid_t nr = 0;

if (pid && ns->level <= pid->level) {

upid = &pid->numbers[ns->level];

if (upid->ns == ns)

nr = upid->nr;

}

return nr;

} 75

Page 76: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Numerical PID related Helper Functions (2) pid_vnr returns the local PID seen from the

namespace to which the ID belongs. pid_nr obtains the global PID as seen from the init process.

Both rely on pid_nr_ns and automatically select the proper level: 0 for the global PID,

and pid->level for the local one.

76

Page 77: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)

77

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

tasks[0]

tasks[1]

tasks[2]

count

level

:

:

struct pid_namespace

unsigned int level;

struct pid_namespace

unsigned int level;

pid

ns

match

nr

numbers[ns->level]

Page 78: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

78

Return Value of the System Call getpid( ) The getpid( ) system call returns the

value of TGID relative to the current process instead of the value of PID, so all the threads of a multithreaded application share the same identifier.

Most processes belong to a thread group consisting of a single member; as thread group leaders, they have the TGID equal to the PID, thus the getpid( ) system call works as usual for this kind of process.

Page 79: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

79

Return Value of the System Call getpid( ) pid_t pid_vnr(struct pid *pid)

{

return pid_nr_ns(pid, task_active_pid_ns(current));

}

static inline pid_t task_tgid_vnr(struct task_struct *tsk)

{

return pid_vnr(task_tgid(tsk));

}

SYSCALL_DEFINE0(getpid)

{

 return task_tgid_vnr(current);

}

Page 80: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Graphic Explanation of getpid( )

80

pid

node

pid

node

pid

node

group_leader

pid

node

pid

node

pid

node

group_leader

current

struct task_struct

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

count

level

:

:

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

count

level

:

:

numbers[level]

struct pid_namespace

unsigned int level;

unsigned int level;

pid_nr_ns

nr

Page 81: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

pid_hash Hash Table

Hash table pid_hash is used to find the pid instance that belongs to a numeric PID value in a given namespace.

static struct hlist_head *pid_hash;

81

Page 82: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Size of pid_hash

pid_hash is used as an array of hlist_head.

The number of elements is determined by the RAM configuration of the machine and lies between 24 = 16 and 212 = 4,096 (It seems that the size used in kernel 3.9 is 16[1][2]).

82

Page 83: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

pidhash_init

pidhash_init computes the apt size and allocates the required storage.

void __init pidhash_init(void)

{ unsigned int i, pidhash_size;

pid_hash = alloc_large_system_hash("PID", sizeof(*pid_hash), 0, 18,HASH_EARLY | HASH_SMALL, &pidhash_shift, NULL, 0, 4096);

pidhash_size = 1U << pidhash_shift;

for (i = 0; i < pidhash_size; i++)

INIT_HLIST_HEAD(&pid_hash[i]);

}

83

Page 84: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Hash Function pid_hashfn

static unsigned int pidhash_shift = 4;

#define pid_hashfn(nr, ns) \

hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)

hash_long returns a value between 0 and 15, if the size of the hash table is 16.

84

Page 85: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Add a upid Instance (or numbers[] field) into the Hash Table pid_hash

struct pid *alloc_pid(struct pid_namespace *ns)

{

struct pid *pid;

enum pid_type type;

int i, nr;

struct pid_namespace *tmp;

struct upid *upid;

:

for ( ; upid >= pid->numbers; --upid) {

hlist_add_head_rcu(&upid->pid_chain,&pid_hash[pid_hashfn(upid->nr, upid->ns)]);

upid->ns->nr_hashed++;

}

:

}

85

Page 86: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

86

int nr

ns

pid_chain

count

level

:

:

int nr

ns

pid_chain

int nr

ns

pid_chain

int nr

ns

pid_chain

count

level

:

:

int nr

ns

pid_chain

int nr

ns

pid_chain

int nr

ns

pid_chain

count

level

:

:

int nr

ns

pid_chain

int nr

ns

pid_chain

int nr

ns

pid_chain

count

level

:

:

int nr

ns

pid_chain

int nr

ns

pid_chain

Graphic Explanation of pid_hash

int nr

ns

pid_chain

count

level

:

:

int nr

ns

pid_chain

int nr

ns

pid_chain

int nr

ns

pid_chain

struct pidcount

level

:

:

int nr

ns

pid_chain

int nr

ns

pid_chain

pid_hash

pid_hash[0]

pid_hash[15]

pid_hash[1]

pid_hash[2]

pid_hash[8]

pid_hash[6]

pid_hash[7]

pid_hash[12]

pid_hash[3]

Page 87: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Multiple PIDs of a Process

When a new process is created, it may be visible in multiple namespaces.

For each of them a local PID must be generated.

This is handled in alloc_pid:

87

Page 88: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Excerpt of alloc_pid()struct pid *alloc_pid(struct pid_namespace *ns)

{

struct pid *pid;

enum pid_type type;

int i, nr;

struct pid_namespace *tmp;

struct upid *upid;

...

tmp = ns;

for (i = ns->level; i >= 0; i--) {

nr = alloc_pidmap(tmp);

...

pid->numbers[i].nr = nr;

pid->numbers[i].ns = tmp;

tmp = tmp->parent;

}

pid->level = ns->level;

...

}88

Page 89: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Set the values of field numbers[] of struct pid

Starting at the level of the namespace in which the process is created, the kernel goes down to the initial, global namespace and creates a local PID for each.

All upid that are contained in struct pid are filled with the newly generated PIDs.

89

Page 90: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Obtain the pid Instance from a numbers[] FieldVoid foo(struct pid_namespace *ns)

{ struct upid *pnr;

:

container_of(pnr, struct pid, numbers[ns->level]);

:

}

90

int nr

ns

pid_chain

int nr

ns

int nr

ns

pid_chain

pid_chain

struct pid

tasks[0]

tasks[1]

tasks[2]

count

level

:

:

pnr

numbers[ns->level]

Page 91: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Function find_pid_ns()

struct pid *find_pid_ns(int nr, struct pid_namespace *ns)

{

struct upid *pnr;

hlist_for_each_entry_rcu(pnr, &pid_hash[pid_hashfn(nr, ns)], pid_chain)

if (pnr->nr == nr && pnr->ns == ns)

return container_of(pnr, struct pid, numbers[ns->level]);

return NULL;

}

91

Page 92: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

92

Relationships among Processes

Processes created by a program have a parent/child relationship.

When a process creates multiple children, these children have sibling relationships.

Several fields must be introduced in a process descriptor to represent these relationships with respect to a given process P.

Processes 0 and 1 are created by the kernel. Process 1 (init) is the ancestor of all other

processes.

Page 93: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

93

real_parent:points to the process descriptor of the process

that created P or points to the descriptor of process 1 (init) if

the parent process no longer exists. Therefore, when a user starts a background

process and exits the shell, the background process becomes the child of init.

Fields of a Process Descriptor Used to Express Parenthood Relationships (1)

Page 94: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

94

Fields of a Process Descriptor Used to Express Parenthood Relationships (2) parent:

Points to the current parent of P this is the process that must be signaled when the

child process terminates. its value usually coincides with that of real_parent.

It may occasionally differ, such as when another process issues a ptrace( ) system call requesting that it be allowed to monitor P.

see the section "Execution Tracing" in Chapter 20.

Page 95: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

95

Fields of a Process Descriptor Used to Express Parenthood Relationships (3) struct list_head children:

The head of the list containing all children created by P. This list is formed through the sibling field of the child

processes. struct list_head sibling:

The pointers to the next and previous elements in the list of the sibling processes, those that have the same parent as P.

P.S.: /* * children/sibling forms the list of my natural children*/struct list_head children; /* list of my children */struct list_head sibling; /* linkage in my parent's children list

Page 96: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

Family Relationships between Processes

96

Page 97: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

97

Example

Process P0 successively created P1, P2, and P3. Process P3, in turn, created process P4.

children/sibling fields forms the list of children of P0 (those links marked with )

Page 98: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

98

Other Relationship between Processes

There exist other relationships among processes: a process can be a leader of a process

group or of a login session, it can be a leader of a thread group, and it can also trace the execution of other

processes (see the section "Execution Tracing" in Chapter 20).

Page 99: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

99

Other Process Relationship Fields of a Process Descriptor P (1) struct task_struct * group_leader

Process descriptor pointer of the thread group leader of P.

Page 100: 1 Linux Operating System 許 富 皓. 2 NameSpace Namespace [Michael Kerrisk]Michael Kerrisk Currently, Linux implements six different types of namespaces

100

Other Process Relationship Fields of a Process Descriptor P (2)

/*

* ptraced is the list of tasks this task is using ptrace on.

* This includes both natural children and PTRACE_ATTACH targets.

* p->ptrace_entry is p's link on the p->parent->ptraced list.

*/

struct list_head ptraced;

struct list_head ptrace_entry;

[example]: function __ptrace_link()