Boot

Boot Sequence in Linux

Mayur Sadavarte
Furquan Shaikh

Good morning, everyone. I am Mayur Sadavarte, this is Furquan Shaikh, and today we will be giving a detailed tutorial on the boot sequence in Linux.

Overview
- BIOS
- Boot Loaders
- Kernel Initialization

These are the three major topics that we will be discussing today.

Outline of Boot Sequence

This is the overview of the whole boot sequence. It covers the entire flow from pressing the power-on switch to the point where you see the Linux login screen.

BIOS
- BIOS: Basic Input/Output System
- First program that runs when you turn on or reset the computer
- Initial interface between the hardware and the operating system
- Responsible for allowing you to control your computer's hardware settings for booting up
- In a multi-processor or multi-core system, one CPU is dynamically chosen to be the bootstrap processor (BSP) that runs all of the BIOS and kernel initialization code; the others are called application processors (APs)

So when do these other processors come into play? Wait, we will get there!

BIOS stands for Basic Input/Output System, and this is the program that gets executed when the machine is switched on or restarted. What is the purpose of this program? It acts as the interface between the hardware and the OS, and it lets you choose your own settings for booting up your machine. What if the system is a multi-processor or multi-core system?

BIOS Components
- BIOS ROM
  - Stored on EEPROM (programmable)
  - Called flash BIOS
- BIOS CMOS Memory
  - Non-volatile (battery-backed) storage for boot-up settings
  - Needs very little power to operate
  - Powered by a lithium battery

On the last slide we said that this program gets executed first. But from where? We also said that the user can set their own boot preferences with the help of this program. How?

Memory Layout for the first 4GB in x86

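[Figure: memory layout of the first 4GB on x86, with callouts for the reset vector and the jump to the BIOS entry point]

This is the hack with the Intel architecture, called the reset vector: the CPU starts executing at a fixed address, which holds a jump to the BIOS entry point. The RAM is still not initialized, so all these critical addresses are mapped to flash memory at this point.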

BIOS Tasks
- Check CMOS setup for custom settings
- Load the interrupt handlers and device drivers
- Initialize registers and power management settings (ACPI)
- Initialize RAM
- POST (Power-On Self-Test)
- Display BIOS settings
- Determine which devices are bootable
- Initiate bootstrap sequence

POST: a faulty video card leads to beeps, as you can't display any fault information on the screen. Otherwise we see the BIOS manufacturer's logo in the case of an assembled PC, or the laptop manufacturer's logo in the case of a branded laptop. Memory is tested, and a keyboard is a must. The power interface (ACPI) builds a number of data tables that describe the devices in the computer; these tables are later used by the kernel.

Bootable devices
- To boot an operating system, the BIOS runtime searches for devices that are both active and bootable, in the order of preference defined in the CMOS settings
- A bootable device can be:
  - Floppy drive
  - CD-ROM
  - Partition on HDD
  - Device on network
  - USB flash memory stick

Let's talk a little bit about bootable devices. We just said that we can order these devices according to our priorities. Any storage device that has some special content in its first sector is a bootable device.

Master Boot Record

How does the BIOS come to know whether a particular device is bootable or not? By the MBR signature.

MBR (Master Boot Record)
- BIOS reads the first 512-byte sector of the hard disk
- Contains two important components:
  - OS-specific bootstrapping program
  - Partition table for the disk
- Loaded at location 0x7c00 in RAM, and control is given to this code
- MBR could be:
  - Windows specific
  - Linux specific
  - Some virus (favorite spot for hackers to get control right at the beginning)

Let's talk a bit more about the MBR. That's why we have an option while installing Linux to overwrite the existing MBR and install GRUB there.

Outline of Boot Sequence
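As a rough sketch of the MBR layout described above, the following C fragment checks a 512-byte sector for the boot signature and looks for the partition marked active. The function and macro names are hypothetical, chosen only for illustration. The offsets follow the classic PC MBR format: 446 bytes of boot code, four 16-byte partition entries, the bytes 0x55 0xAA at the end of the sector, and 0x80 as the active flag. It assumes the sector has already been read into a buffer.

```c
#include <stdint.h>

#define MBR_SIZE          512
#define MBR_PART_TABLE    446   /* partition table follows the boot code */
#define MBR_PART_ENTRIES  4
#define MBR_PART_ENTRY_SZ 16
#define MBR_ACTIVE_FLAG   0x80  /* first byte of an active partition entry */

/* Returns 1 if the sector ends with the 0x55 0xAA boot signature. */
int mbr_is_bootable(const uint8_t sector[MBR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Returns the index (0..3) of the first active partition, or -1 if none. */
int mbr_active_partition(const uint8_t sector[MBR_SIZE])
{
    for (int i = 0; i < MBR_PART_ENTRIES; i++) {
        const uint8_t *entry = sector + MBR_PART_TABLE + i * MBR_PART_ENTRY_SZ;
        if (entry[0] == MBR_ACTIVE_FLAG)
            return i;
    }
    return -1;
}
```

A caller would read sector 0 of the disk into a buffer, call mbr_is_bootable() on it, and then use mbr_active_partition() to decide which partition's boot sector to hand control to.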

OK, so we have covered the BIOS and the MBR up to this point. Let's delve a bit into bootloaders.

Boot Loaders
- Specialized loaders, e.g. the floppy boot sector
  - Compatible with a specific storage medium
- General loaders running under another operating system, e.g. LOADLIN
  - Use facilities given by the host OS to load the guest kernel
- File-system-aware general loaders running on firmware, e.g. GRUB
  - Almost little operating systems by themselves
  - Conversant with one or more file systems
  - Use facilities of firmware and sometimes have their own drivers
- File-system-unaware general loaders running on firmware, e.g. LILO
  - Depend on third-party software (/sbin/lilo) to create a mapping
  - Mapping stored at some predefined location

The design of bootloaders has been an evolutionary process, hence the different types of bootloaders.

File System Awareness

The kernel image on the left side is the compiled kernel build, which was put on disk using the facilities provided by the existing kernel. While loading, a bootloader that supports the file system in read-only mode fetches the kernel image using the file system metadata and puts the kernel into memory.

File System Unawareness

This is a bit more complex. The kernel image is again put on the disk using the facilities of the existing running kernel, but the similarity stops here. Now the mapper program is run, which converts the location of the kernel image into raw disk parameters (c,h,s) and puts them in one special file, the map file. This file is always kept at some predefined location on disk. The bootloader, being file-system unaware, reads this map file and fetches the location of the kernel image in raw disk format.

File-System-Unaware Loaders
- Advantage:
  - No changes required in the bootloader or map installer if the file system of a new device is supported by the Linux kernel
- Disadvantage:
  - Map installer has to run after adding a new kernel image
  - Or after moving the kernel image to a new path

A file-system-aware bootloader, by contrast, needs to have read-only support for each new file system Linux supports!
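To make the raw (c,h,s) addressing mentioned above a little more concrete, here is a minimal sketch of the standard conversion from a cylinder/head/sector triple to a linear block number. The struct and function names are hypothetical, and the geometry values would normally come from the BIOS; this is not taken from the LILO sources.

```c
#include <stdint.h>

/* Hypothetical description of a disk's geometry. */
struct disk_geometry {
    uint32_t heads_per_cylinder;
    uint32_t sectors_per_track;
};

/*
 * Convert a (cylinder, head, sector) triple to a linear block address.
 * Cylinders and heads count from 0, sectors count from 1.
 */
uint32_t chs_to_lba(const struct disk_geometry *g,
                    uint32_t c, uint32_t h, uint32_t s)
{
    return (c * g->heads_per_cylinder + h) * g->sectors_per_track + (s - 1);
}
```

With a geometry of 16 heads and 63 sectors per track, for example, (0,0,1) is block 0 and (1,0,1) is block 1008. A map file in this style has to be regenerated whenever the kernel image moves, because the triples describe disk locations rather than a path.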

I don't know why people would want to move the location of an existing kernel image! But yes, GRUB claims to support that, being file-system aware.

Boot Loaders - Linux
- A multi-stage program which eventually loads the kernel image and the initial RAM disk (initrd)
- Stage-1 boot loader is less than 512 bytes (why?)
- Just does enough to load the next stage
- Next stage can reside in the boot sector, in the partition, or in an area of the disk which is hardcoded in the MBR
- What is this next stage we are talking about?

The MBR works in a pretty straightforward way: it checks the partition table, looks for the active flag, and passes control to the bootloader stored in that partition. Bootloaders are essentially multi-stage software. Stage 1 of GRUB can reside in the MBR or in the first sector of any partition, even a logical one.

Where does the Stage-1 Boot Loader Reside?

These are the first 446 bytes of the hard disk, where the MBR boot code resides. GRUB's first stage overwrites this when we choose the option of overwriting the MBR during Linux installation.

Next stage
- Stage 1 of GRUB mostly resides in the MBR (as we just saw)
- Stage 1.5 is a crucial feature (it makes GRUB file-system aware)
- GRUB Stage 1.5 is located in the first 30 kilobytes of the hard disk immediately following the MBR, or in the Linux partition
- Stage 1.5 loads Stage 2 from /boot/grub
- What happens when this partition is corrupted?

The /boot/grub directory contains the stage1, stage1.5, and stage2 boot loaders, as well as a number of alternate loaders (for example, CD-ROMs use iso9660_stage_1_5).

Does anyone know the name of the similar stage 2 loader for Windows?

GRUB Stage-2
- Being powerful and file-system aware, it can display the boot options to the user from /boot/grub/grub.cfg

GRUB command line: you can boot a specific kernel with a named initrd image as follows:

    grub> kernel /bzImage-2.6.14.2
      [Linux-bzImage, setup=0x1400, size=0x29672e]
    grub> initrd /initrd-2.6.14.2.img
      [Linux-initrd @ 0x5f13000, 0xcc199 bytes]

Now it's time to fire up the kernel! But where are we going to place the kernel image in memory?

Boot.ini for Windows

Sample GRUB Conf file

Memory Limitations
- i386 can access only 1MB in real mode
- Some space is required for the bootloader and BIOS ROMs
- Kernel used to come as a zImage
  - Can have a maximum size of 512 KB (does seem like a constraint!)
- Hence comes the bzImage, loaded above 1MB
- How can the bootloader access memory above 1MB in real mode?
  - Switch the CPU mode back and forth (unreal mode)
  - Still uses BIOS functionality

Unreal mode is not a separate addressing mode. If you read the GRUB source code, you'll see these transitions all over the place (look under stage2/ for calls to real_to_prot and prot_to_real).

Memory Layout for the first 4GB in x86
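The 1MB limit of real mode follows from how addresses are formed there: a 16-bit segment is shifted left by four bits and added to a 16-bit offset. The helper below is a small illustrative sketch (the function name is hypothetical) that computes the resulting linear address.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, both 0x07C0:0x0000 and 0x0000:0x7C00 name the address 0x7c00 where the MBR is loaded. The largest value the formula can produce, 0xFFFF:0xFFFF, is 0x10FFEF, only slightly above 1MB, which is why a bzImage placed at 0x100000 and beyond cannot be reached without leaving plain real mode.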

Root File System
- For mounting the root fs, the kernel requires two things:
  - Media on which the root fs resides
  - Driver to access this media
- E.g. ext2 partition on an IDE disk
  - Number of the root device passed as a parameter
  - Kernel normally has the IDE driver built in

Here come the complications!
- What if the kernel has no device driver?
  - Can this happen?
  - What will be the size of the kernel if we compile it with all available device drivers?
  - Some drivers might upset other hardware while probing for their own devices

Booting from the network or any new device!

What options do we have?
- Can't include all possible drivers as part of the kernel
- Having multiple pre-compiled kernels
- Compiling a customized kernel
- Linking the pre-compiled kernel with required modules
- Detached modules - initrd

But again, how do we access detached modules? Doesn't it seem like a chicken-and-egg problem? Oh, but we already have a powerful program which can access any area of the hard disk given the path of an object, and that is the bootloader. So let the bootloader load the important modules for you and keep them in memory.

initrd

Outline of Boot Sequence

Kernel Initialization

Thank you, Mayur.

Okay. Until this point, everything that we have seen is laying the groundwork for the most important component of the operating system. From here onwards, the actual kernel takes over control.

How Operating System Starts Life?
- At this point, the processor is running in real mode
- Kernel image is loaded into memory by the boot loader using BIOS services
- The image is an exact copy of the image on the hard drive, /boot/vmlinuz-2.6.38
- Two major components of this image:
  - Small part containing real-mode kernel code, loaded below the 640K barrier
  - Bulk of the kernel code, which runs in protected mode, loaded after the first megabyte of memory

So now let's see how the operating system comes to life.

At this point, the processor is still running in real mode, i.e. the default mode kept for backward compatibility.

And as we saw, the boot loader has brought the kernel image into memory using BIOS services. This image in memory is an exact copy of the image on the hard drive, which is located at /boot/vmlinuz.

There are basically two major components of this image:
- One is a small real-mode stub, which is loaded below the 1MB barrier
- The other is the big chunk of kernel code, which runs almost entirely in protected mode and is loaded starting at the first megabyte of memory

In the beginning
[Figure: RAM layout. Labels: 0x100000, real mode stub, early stack/heap, kernel command line, compressed kernel image, initial ramdisk, ~64K]

So, in the beginning, this is what the memory would look like.

We have used different colors to indicate different kinds of components. The two light brown blocks below the 1MB mark are the BIOS code. The light blue blocks are the kernel image: the smaller box, located below the 1MB barrier, is the real-mode stub of the kernel, and the bigger box is the compressed kernel image, which is placed at 1MB.

Below the 1MB barrier, a chunk of around 32K is present for the early stack/heap. Above this, the kernel command line parameters are stored. These are the boot-time parameters passed in by the boot loader.

RAM contents after boot loader is done

This is another picture, which depicts the memory layout with the exact addresses of each component.

Major Steps in Kernel Initialization
- Platform-specific initialization (in assembly language)
- Platform-independent initialization (in the high-level language C)

An important point to note here is that there are two kinds of kernel initialization:

One is the platform-specific initialization. Most of this code is written in assembly language. It performs very critical tasks which require the fine-grained control that assembly language provides.

The other is the platform-independent initialization. This code is mostly written in C.

Kernel Initialization Timeline (Arch-dependent)

This is a high-level overview of the architecture-dependent initialization.

We enter the kernel at the first blue block. So the entry point into the kernel, the very first code that is run, is the setup assembler function. It is a label assigned to a set of assembly instructions. One more important thing to remember is that we are looking at x86-specific initialization.

Architecture Specific Setup on IA-32
setup assembler function (arch/x86/boot/header.S)
- Checks if the kernel was loaded to the correct position
- Probes hardware via BIOS
- Determines size of physical memory
- Initializes graphics card
- Switches CPU to protected mode by setting the PE bit in the cr0 register

[Figure: RAM, with the 0x100000 mark]

As the flow goes, the BIOS passed control to the boot loader, which passed control on to the kernel, and it starts executing here.

This code is present in arch/x86/boot/header.S. What this piece of code does is:
- It checks if the kernel was loaded to the correct position
- It probes the hardware via the BIOS:
  - To determine how much physical memory is present and which memory ranges are safe to access
  - To get information about the power management schemes
- It resets the graphics card

Once all this is done, it is almost ready to go to protected mode. Almost ready, because there are two important initializations that are required before switching to protected mode. Any guesses what these initializations might be?

GDT and IDT.

It reinitializes the entire IDT. Now, the IDT (Interrupt Descriptor Table) is similar to the real-mode IVT (Interrupt Vector Table), but it has descriptors instead of just function pointers. However, this IDT is initialized to just NULL right now, because we do not want to handle any interrupts until we reach a steady state.

The other structure is the GDT, the Global Descriptor Table. It deals with how we look at memory. In order to gain access to the entire 4GB of memory, it is important to set up the GDT. Now, even though Linux makes almost no use of segmentation, the x86 architecture requires that you set up the segments before any use, because that is how it provides protection. Hence, we reset the GDT with every segment, code and data, spanning the entire 4GB of memory.

Finally, we are ready to switch to protected mode. This is done by setting the PE bit in a special register, cr0.
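The switch itself amounts to setting one bit. Reading and writing cr0 requires privileged assembly instructions, so the sketch below only models the bit manipulation on a plain integer; the function names are hypothetical, while PE being bit 0 of cr0 comes from the Intel manual.

```c
#include <stdint.h>

#define CR0_PE (1u << 0)  /* Protection Enable: bit 0 of cr0 */

/* Value to write back to cr0 in order to enter protected mode. */
uint32_t cr0_enable_protected_mode(uint32_t cr0)
{
    return cr0 | CR0_PE;
}

/* Returns 1 if the given cr0 value has protected mode enabled. */
int cr0_in_protected_mode(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}
```

In real setup code the new value is written back with a mov to cr0 and is followed by a far jump that reloads the code segment from the freshly installed GDT.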

Kernel Initialization Timeline (Arch-dependent)

Now, we have reached the end of the 2nd blue block, i.e. we will now start executing in protected mode. Thus, we start execution at the first red block.

Prepare for decompression
[Figure: RAM, with the 0x100000 mark and the decompressed size]

Before proceeding with any further initialization, the first task that the code does is copy itself to some higher location in order to avoid getting in its own way while decompressing. Hence we need to move things around. Why do you think it is done this way? The boot loader could have actually loaded the compressed image at a higher location.

The answer is simplicity on the part of the boot loader. The kernel image might or might not be compressed. So, regardless of this, it is always loaded at 1MB.

Okay. So, now we move the chunk higher in memory, forget about the earlier chunk, and jump to the new chunk.

Architecture Specific Setup on IA-32
B) startup_32 assembler function (arch/i386/boot/compressed/head.S)
- Creates a provisional kernel stack
- Fills uninitialized kernel data with null bytes (data between the _edata and _end constants)
- Calls the C routine decompress_kernel

[Figure: RAM, with the 0x100000 mark and the decompressed size]

So, now we come to the startup_32 assembler function. We are running in 32-bit mode.

What it does is create a provisional kernel stack and call the function decompress_kernel
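The slide also lists clearing the uninitialized data between _edata and _end. The real code does this in assembly; the C fragment below is only an illustrative equivalent, with a hypothetical function name and with the two pointers standing in for the linker-provided symbols.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero every byte in [start, end). In the kernel, start and end would be
 * the addresses of the _edata and _end symbols; here they are parameters.
 */
void clear_uninitialized(unsigned char *start, unsigned char *end)
{
    memset(start, 0, (size_t)(end - start));
}
```

Clearing this region matters because C code such as decompress_kernel assumes that global variables without an initializer start out as zero.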

Architecture Specific Setup on IA-32
C) decompress_kernel() C routine (arch/x86/boot/compressed/misc_32.c)
- Decompresses the kernel
- Writes the uncompressed kernel code to position 0x100000, directly after the first MiB of memory
- Uncompressing is the first operation performed by the kernel
- Screen messages: "Uncompressing Linux" and "Ok, booting the kernel"

[Figure: RAM, with the 0x100000 mark and the decompressed size]

As the name suggests, it decompresses the kernel.

It basically uses an algorithm like gunzip, quite similar to the userland uncompress program. However, it does not use any user-space library; it has its own implementation in the kernel. Decompression creates a chunk much bigger than the original chunk, and this is again located at 1MB.

Now, we again jump back to the 1MB mark in memory.

This is the point where you see the messages "Uncompressing Linux" and "Ok, booting the kernel".

Architecture Specific Setup on IA-32
D) startup_32 assembler function (arch/x86/kernel/head_32.S)
- Fills the bss segments of the kernel with zeroes
- Initializes provisional kernel page tables to identically map the linear addresses to the same physical addresses
- Stores the address of the Global Page Directory in the cr3 register
- Enables paging by setting the PG bit in the cr0 register
- Puts the parameters obtained from the BIOS and the parameters passed to the OS into the first page frame
- Jumps to the start_kernel function

[Figure: RAM, with the 0x100000 mark]

So now we are back at the 1MB mark, and we start the execution of the function startup_32. Though the name of the function is the same as the earlier one, the location is different, and we jump directly to an address, so the name doesn't matter.
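The identity mapping and the PG bit from this slide can be sketched in a few lines. The fragment below assumes classic 32-bit two-level paging with 4 KB pages (no PAE); all names are hypothetical, and it only builds table contents in ordinary memory rather than loading cr3, so it illustrates the idea and does not reproduce the code in head_32.S.

```c
#include <stdint.h>

#define PAGE_SHIFT   12           /* 4 KB pages */
#define PT_ENTRIES   1024         /* entries per page table */
#define PAGE_PRESENT 0x1u
#define PAGE_RW      0x2u
#define CR0_PG       (1u << 31)   /* Paging: bit 31 of cr0 */

/* Index into the page directory: top 10 bits of the linear address. */
uint32_t pgd_index(uint32_t linear)
{
    return linear >> 22;
}

/* Index into the page table: the next 10 bits. */
uint32_t pte_index(uint32_t linear)
{
    return (linear >> PAGE_SHIFT) & 0x3FF;
}

/* Fill one page table so that linear == physical in 4 MB region `region`. */
void fill_identity_table(uint32_t table[PT_ENTRIES], uint32_t region)
{
    for (uint32_t i = 0; i < PT_ENTRIES; i++) {
        uint32_t phys = (region << 22) | (i << PAGE_SHIFT);
        table[i] = phys | PAGE_PRESENT | PAGE_RW;
    }
}

/* Value to write back to cr0 in order to turn paging on. */
uint32_t cr0_enable_paging(uint32_t cr0)
{
    return cr0 | CR0_PG;
}
```

With region 0, the entry for linear address 0x100000 (index 0x100) points at physical address 0x100000, so the code that is currently executing keeps working at the moment paging is switched on.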

What startup_32 does is:
- Again initialize the segments and stack
- Fill the bss segments with zeroes
- Set up virtual memory: it stores the address of the Global Page Directory in cr3 and enables paging by setting the PG bit in the cr0 register
- Copy the command line arguments to the first page frame
- And finally jump to the start_kernel function

Kernel Initialization Timeline (Arch-dependent)

Thus, we have arrived at the end of the two red blocks, and have completed the arch-dependent initialization. At this point, both segmentation and paging are set up.

Kernel Initialization Timeline (mostly Arch-independent)

Now, as you see, we move to the second half of the initialization: the architecture-independent initialization.

You can note the difference in the file organization too. Initially the code was located under arch/x86, and now it's init/main.c.

Let's see what this does.

High-Level Initialization
start_kernel
- Display version banner
- Arch-specific high-level setup for memory management
- Evaluate command-line arguments
- Initialize all subsystems
- Determine processor and system errors
- Start idle process and init thread

As we saw, the last part of the arch-dependent init passed control to start_kernel.

This is a very long function, covering more than 100 lines. It does a lot of initialization work.

start_kernel: a) Version banner
- First step is to output the version message
- linux_banner global variable in init/version.c

To start with, it prints the kernel version banner, which is defined in init/version.c.

start_kernel: b) setup_arch
- setup_arch
  - parse_early_params
  - setup_memory
  - paging_init
- Determine position of kernel in memory

Then, it does some more setup required for memory initialization.

start_kernel: c) Interpret command-line arguments
- Interprets command line arguments passed to the kernel at boot time
- Arguments are passed in the form of key=value pairs

It interprets the command line arguments, which are provided by the bootloader in the form of key=value pairs.

start_kernel: d) Initialize various subsystems
- Invoke subroutines to initialize almost all important kernel subsystems
- Code snippet:

    asmlinkage void __init start_kernel(void)
    {
        /* Excerpt: only some of the subsystem initializers are shown. */
        trap_init();         /* exception and trap handlers */
        mm_init();           /* memory management */
        sched_init();        /* scheduler */
        rcu_init();          /* read-copy-update */
        init_IRQ();          /* interrupt handling */
        init_timers();       /* kernel timers */
        hrtimers_init();     /* high-resolution timers */
        softirq_init();      /* soft interrupts */
        timekeeping_init();  /* timekeeping core */
        time_init();         /* architecture time setup */
    }

Now comes the main part: lots and lots of initializations. This function, as you see, makes a lot of function calls all over the kernel to initialize workqueues, locks, memory buffer pools, timers, etc. It initializes all the subsystems.

Sample Initialization

    void __init trap_init(void)
    {
        set_trap_gate(0, &divide_error);
        set_intr_gate(1, &debug);
        set_intr_gate(2, &nmi);
        set_system_gate(4, &overflow);
        set_intr_gate(14, &page_fault);
        set_system_gate(SYSCALL_VECTOR, &system_call);
    }

Here is a sample initialization, where we initialize the various traps: vector 0 for the divide-by-zero error, 0x80 (SYSCALL_VECTOR) for the system call, etc.

start_kernel: e) Search for known system errors
- Checks for bugs in the architecture (check_bugs)
- Replaces certain assembler instructions, depending on processor type, with faster, modern alternatives

Then it looks for any known errors in the processor and checks if it can use any modern and faster assembler instructions.

start_kernel: f) init
- Last two actions of start_kernel:
  - rest_init: new thread that performs some more initializations and starts the first user-space program, /sbin/init
  - The original kernel thread becomes the idle thread, which is called when the system has nothing else to do

Then it continues with calling rest_init.

The role of rest_init is to fork the first process. Till now, we were running as PID 0, which is also called the idle task.

At this moment we create a kernel thread which goes on to become the process with PID 1, which is the first process seen when you execute the ps command on Linux. What it basically does is start the user-space program /sbin/init.

Code flow diagram for init
- init
- Register as child reaper
- SMP initialization
- do_basic_setup
- prepare_namespace
- init_post
- free_initmem
- Execute userspace initialization program

Now, this init, as we all know, registers itself as the parent for all orphaned child processes.

So any process whose parent process finishes execution becomes a child of this process.

Another important task that it performs is to wake up all the other processors. At this point, the other processors are partially initialized, but they are not running yet. Now that we have a scheduler, we wake up the processors, and they wait for work to do.

Then it calls init_post, which does all the cleanup after initialization.

Thus, our system is ready with both the kernel and the user space.

Outline of Boot Sequence

Questions?

References
- http://duartes.org/gustavo/blog/post/how-computers-boot-up
- Booting Linux: The History and the Future, Werner Almesberger
- Understanding The Linux Kernel, 3rd Edition
- http://www.ibm.com/developerworks/linux/library/l-linuxboot/
- Kernel Walkthrough: The Boot Process, Bart Trojanowski
- The Linux Boot Process, Daniel Eriksen
- Intel 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes 3A, 3B, and 3C: System Programming Guide, Parts 1 and 2
