Improving MeeGo boot-up time

My talk at LinuxCon Japan 2010.

Hiroshi DOYU <Hiroshi.DOYU@nokia.com>

September 2010, LinuxCon Japan

Outline: Preface, Background, Handset boot-up status, My experiment, Further optimization ideas, Q & A

Preface

Inspired by QuickBoot

Ubiquitous QuickBoot

http://www.ubiquitous.co.jp/En/products/middleware/quickboot

Embedded Linux Wiki

Boot Time - eLinux.org

http://elinux.org/Boot_Time

Tribute to Tim Bird: "Improving Android Boot-Up Time"

Background

Impact of boot-up time

For consumer client devices

User experience: for TVs, IVI (in-vehicle infotainment) systems and cameras, immediate action is preferable right after power-on.

For tablets, netbooks and handsets: is a cold start really necessary?

More complicated S/W stacks consume more memory.

Mass production testing: the more time a device spends on the production line, the more expensive it is.

Boot-Up time definition

Until when?

When the login prompt appears.
When the desktop shows up.
When the network is available.
When the browser is ready.
When a picture can be taken.
When the CPU goes idle.

This depends on your H/W configuration, your S/W configuration and your system requirements.

The shortest isn’t always the best.

Measurement methods (kernel)

printk timestamps: show_delta (linux-2.6/scripts/show_delta), a Python script

initcall debugging (boot with initcall_debug printk.time=1):
    dmesg -s 256000 | grep "initcall" | \
        sed "s/\(.*\)after\(.*\)/\2 \1/g" | sort -r -n

bootgraph:
    dmesg | linux-2.6.git/scripts/bootgraph.pl > output.svg

ftrace
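The arithmetic behind show_delta and the initcall pipeline can be sketched in Python as follows. This is a minimal illustration, not the kernel scripts themselves: it assumes the "[ seconds.microseconds]" printk prefix and the "initcall <fn> returned <ret> after <n> usecs" wording that initcall_debug prints on 2.6.3x kernels (other versions may word it differently). The sample log and the function names foo_init and bar_init are made up.

```python
import re

# "[    1.234567] message": the prefix printk adds with printk.time=1.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")
# initcall_debug completion line (assumed 2.6.3x format), e.g.
# "initcall foo_init+0x0/0x1c returned 0 after 250 usecs"
INITCALL = re.compile(r"initcall (\S+) returned (-?\d+) after (\d+) usecs")


def printk_deltas(lines):
    """Yield (timestamp, seconds since previous timestamped line, message)."""
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # lines without a printk timestamp are skipped
        stamp = float(match.group(1))
        delta = 0.0 if previous is None else stamp - previous
        previous = stamp
        yield stamp, delta, match.group(2)


def slowest_initcalls(lines, top=10):
    """Return up to `top` (usecs, function) pairs, slowest first."""
    calls = []
    for line in lines:
        match = INITCALL.search(line)
        if match:
            calls.append((int(match.group(3)), match.group(1)))
    calls.sort(reverse=True)
    return calls[:top]


if __name__ == "__main__":
    # Invented sample log; the function names are hypothetical.
    log = [
        "[    0.000000] Linux version 2.6.35",
        "[    0.250000] initcall foo_init+0x0/0x1c returned 0 after 250 usecs",
        "[    1.750000] initcall bar_init+0x0/0x40 returned 0 after 1500000 usecs",
    ]
    for stamp, delta, message in printk_deltas(log):
        print(f"[{stamp:12.6f} < {delta:10.6f} >] {message}")
    print(slowest_initcalls(log))
```

Fed with the lines of dmesg output, slowest_initcalls gives the same kind of ranking as the sed/sort pipeline, slowest initcall first, while printk_deltas makes long gaps between consecutive messages stand out.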

Measurement methods (userland)

uptime:
    / # cat /proc/uptime
    18.73 14.24
    / # cat /proc/uptime
    20.55 16.05

bootchart: a newer version is released with MeeGo. It creates the SVG directly, with no additional tool.
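The first field of /proc/uptime is the number of seconds since boot and the second is the accumulated idle time, so two samples already tell how busy the system was in between. A minimal Python sketch using the two samples above; it assumes a single-core system such as OMAP3, since on SMP kernels the idle counter may be summed across all CPUs, which would make the busy figure meaningless as computed here.

```python
def parse_uptime(text):
    """Return (seconds since boot, idle seconds) from /proc/uptime contents."""
    up, idle = (float(field) for field in text.split()[:2])
    return up, idle


def activity_between(first, second):
    """Return (elapsed, busy) seconds between two /proc/uptime samples.

    Assumes a single-CPU system: on SMP kernels the idle counter may be
    summed over all CPUs.
    """
    up1, idle1 = parse_uptime(first)
    up2, idle2 = parse_uptime(second)
    elapsed = up2 - up1
    return elapsed, elapsed - (idle2 - idle1)


if __name__ == "__main__":
    # The two samples shown on the slide.
    elapsed, busy = activity_between("18.73 14.24", "20.55 16.05")
    print(f"elapsed {elapsed:.2f} s, busy {busy:.2f} s")
```

On the target, the input is simply the contents of /proc/uptime read at two points in time; reading it once when the desktop appears gives a rough "time since kernel start" that excludes the bootloader.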

Entire measurement, including bootloader, kernel and userland: grabserial, and show_delta again.

oprofile; ETM (Embedded Trace Macrocell), H/W assisted.
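grabserial timestamps every line of the serial console on the host side, which is why it can cover the bootloader as well as the kernel and userland. The core idea fits in a few lines of Python; this sketch is not grabserial itself, reads from any text stream (an io.StringIO stands in for the serial port here), and the sample console text is invented.

```python
import io
import time


def timestamp_lines(stream, clock=time.monotonic):
    """Yield (seconds since the first line, line) for each line in stream.

    Timestamps are taken on the host as lines arrive, so bootloader output
    is timed the same way as kernel and userland messages.
    """
    start = None
    for line in stream:
        now = clock()
        if start is None:
            start = now
        yield now - start, line.rstrip("\n")


if __name__ == "__main__":
    # A StringIO stands in for the serial port (invented sample output).
    console = io.StringIO("U-Boot\nStarting kernel ...\nlogin:\n")
    for offset, text in timestamp_lines(console):
        print(f"[{offset:10.6f}] {text}")
```

The clock parameter is injectable so the timing logic can be checked without real hardware; the resulting offsets can then be fed to a show_delta-style pass to find the longest gaps.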

Existing optimization techniques

Kernel optimization: asynchronous initcalls

asynchronous resume/suspend

misc: preset lpj (loops_per_jiffy), no probe, no console, deferred module loading

Userland optimization: init scripts via upstart or systemd, run in parallel

readahead

prelink
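To illustrate why running init scripts in parallel helps, the Python sketch below groups services into waves whose members can start concurrently once their dependencies are up. It is a simplification, not how upstart or systemd schedule jobs (they start each job as soon as its own dependencies are met, driven by events or units); the service names and dependencies are hypothetical. It relies on graphlib from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_waves(dependencies):
    """Group services into waves whose members can be started in parallel.

    `dependencies` maps each service to the services it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical services and dependencies, for illustration only.
    services = {
        "dbus": set(),
        "udev": set(),
        "connman": {"dbus"},
        "xserver": {"udev"},
        "desktop": {"dbus", "xserver"},
    }
    for number, wave in enumerate(start_waves(services), 1):
        print(number, wave)
```

Five services collapse into three sequential steps here: with enough CPU and I/O headroom, the length of the longest dependency chain, not the number of scripts, bounds how long the init sequence takes.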

Hibernation-based optimization: snapshot boot

InstantBoot

Warp2

QuickBoot

BIOS/bootloader assisted.

Is cold start still necessary?

Do we need a cold start so often? Flashing a hibernation image in advance could reduce the time a device spends on the production line.

Optimization may depend on product-specific factors: your S/W configuration, your H/W configuration and your system requirements.

Wouldn’t hibernation be ok in most cases?

Handset Boot-Up status

Handset requirements

Responsiveness of the device and applications: quick response could improve UX, especially on handsets. One touch can choose a friend from the contact list.

One touch can start the camera, just like on a digital camera.

One touch can start web browsing.

A call has to be processed within a short time, as required by operator specifications.

Resolving dynamic libraries takes more time than swapping in pages, so all major applications can be started in advance but kept invisible, then made visible upon request.

As a result, RAM is occupied by the started applications and daemons.

Handset Boot-Up time

N900 boot-up takes ~40 sec until the desktop shows up.

Number of applications: 137

Swap status

Handset Bootgraph

Handset BootChart

Handset memuse

N900 Boot time breakdown

Bootloader: 0.44 sec

Kernel: 2.68 sec, with serial console. Could be shorter without the serial console.

Desktop: 39.03 sec

My experiment

Target spec

OMAP3-based reference board, similar to the N900

512 MB RAM, MeeGo Handset, number of applications: ~161

~120 sec until all applications have finished booting

Swap status

No hibernation support for ARM

There was no hibernation support for ARM. I picked up an old patch and upgraded it to v2.6.35. It was rejected by RMK (Russell King) because:

it needs to be synced with suspend-to-RAM,

it lacks PXA support,

and coprocessor accesses differ between ARM versions, for example:

    mrc p15, 0, %0, c2, c0, 0

At least, it works! Let’s proceed.

Which hibernation method to use?

Three implementations of hibernation:

1. swsusp: included in the mainline kernel as the default.

2. uswsusp: userland implementation.

3. TuxOnIce: out of tree, but with many features: compression of images, multi-threaded I/O, readahead, LVM support.

Start with swsusp

To start hibernation:
    echo disk > /sys/power/state

swsusp/eMMC

Use mtdblock rather than eMMC

mtdblock is much faster than eMMC.

mtdblock: ~23 MB/sec (read)

eMMC: ~20 MB/sec (read)

~15 MB/sec (read)

This is a HACK, since mtdblock itself is bogus without wear-leveling support.

mtdswap is *volatile*: good performance, but it cannot be used for hibernation.

Need a non-volatile mtdswap!!

swsusp/MTD

Port TuxOnIce on ARM

TuxOnIce has many optimization features:

Compression of images, multi-threaded I/O, readahead, LVM support.

To drop the page cache:
    echo -2 > /sys/power/tuxonice/image_size_limit

To start hibernation:
    echo disk > /sys/power/tuxonice/do_hibernation

TuxOnIce/MTD

Shrink memory before hibernation

Reclaim as much memory as possible right before hibernation:
    echo 10000 > /sys/power/shrink_mem
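The hibernation steps on these slides can be strung together in a small Python helper, shown here as a sketch. It only writes the same values the echo commands write; the attribute names and values are copied from the slides, hibernate and write_attr are hypothetical names, and /sys/power/shrink_mem in particular may not exist on a stock kernel (the slides do not state the unit of its value). The root parameter lets you point it at a scratch directory for a dry run; on a real target it must run as root, and the final write starts hibernation.

```python
from pathlib import Path


def write_attr(root, name, value):
    """Equivalent of `echo value > root/name` for a sysfs attribute."""
    (Path(root) / name).write_text(f"{value}\n")


def hibernate(root="/sys/power", use_tuxonice=True, shrink_value=10000):
    """Shrink memory, then trigger hibernation through TuxOnIce or swsusp.

    Attribute names and values follow the slides; shrink_mem may not
    exist on every kernel.
    """
    write_attr(root, "shrink_mem", shrink_value)
    if use_tuxonice:
        write_attr(root, "tuxonice/image_size_limit", -2)  # drop page cache
        write_attr(root, "tuxonice/do_hibernation", "disk")
    else:
        write_attr(root, "state", "disk")  # plain swsusp
```

Keeping the order fixed (shrink first, trigger last) matters: anything written after the trigger would only run after resume.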

TuxOnIce/MTD/shrink_mem

What is the bottleneck?

The less RAM consumed, the shorter the boot time. But RAM usage cannot be squeezed any further beyond a certain size.

In our case, that size is ~110 MB.

~70% of boot time is spent on (compressed) image restoration.

meminfo/shrink_mem

What occupies RAM?

Who uses lots of memory? MeeGo's "memuse" can identify it.

Why unevictable?

Recent SoCs have smart coprocessors: GPUs, DSPs and H/W accelerators.

They may have an IOMMU, so more memory can be shared with the coprocessors.

http://en.wikipedia.org/wiki/IOMMU

Why does IOMMU have an effect?

Pages have to be DMA-able, and shared pages have to be pinned. They must not be swapped out, so they are unevictable.
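To see how much memory is still in use right before hibernation, /proc/meminfo can be summarized with a small Python helper like the sketch below. It reports MemTotal minus MemFree, Cached, and the Unevictable counter. Note that Unevictable only counts pages on the unevictable LRU (mlocked pages, for instance); buffers that drivers pin for DMA may not appear there at all, so the field is a hint rather than a full account. The sample numbers are invented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo contents into {field: first numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def footprint(info):
    """Return used (MemTotal - MemFree), cached and unevictable memory in kB."""
    return {
        "used": info["MemTotal"] - info["MemFree"],
        "cached": info.get("Cached", 0),
        "unevictable": info.get("Unevictable", 0),
    }


if __name__ == "__main__":
    # Invented sample; on a target, pass the contents of /proc/meminfo.
    sample = (
        "MemTotal:         515072 kB\n"
        "MemFree:          401920 kB\n"
        "Cached:            20480 kB\n"
        "Unevictable:       16384 kB\n"
    )
    print(footprint(parse_meminfo(sample)))
```

Comparing footprint() before and after the shrink step shows how much was actually reclaimable; whatever remains of "used" is roughly what the image has to carry.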

Further optimization idea

Linearity of hibernation method

The Linux VM tries to use as much RAM as possible (e.g. for the page cache). RAM consumption can be squeezed only down to a certain point. Beyond it, boot time increases in proportion to the size of unevictable memory.
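The linear relationship can be written as a back-of-the-envelope model: restore_time = fixed_s + image_mb * compression_ratio / read_mb_per_s. The Python sketch below encodes it. The ~110 MB size and the ~23 and ~20 MB/sec read rates come from the slides, while compression_ratio and fixed_s are free parameters (assumptions), so the output is an estimate of raw read time, not a measured boot time.

```python
def restore_seconds(image_mb, read_mb_per_s, compression_ratio=1.0, fixed_s=0.0):
    """Linear estimate of the time needed to read back a hibernation image.

    compression_ratio is compressed size / original size (1.0 = no
    compression); fixed_s lumps together everything that does not scale
    with image size. Decompression cost is not modeled.
    """
    return fixed_s + image_mb * compression_ratio / read_mb_per_s


if __name__ == "__main__":
    # ~110 MB image and read rates quoted on the slides; no compression assumed.
    for name, rate in (("mtdblock", 23.0), ("eMMC", 20.0)):
        print(f"{name}: {restore_seconds(110.0, rate):.1f} s")
```

Without compression, the read alone takes about 4.8 s from mtdblock and 5.5 s from eMMC. The model makes the two levers of the proposals below explicit: raise read_mb_per_s (faster storage) or lower image_mb (smaller image).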

For further optimization, we need something more!

Proposals

1. Increase the read performance of storage. Faster storage? mtd gives a shorter boot-up time than eMMC,

and faster mtd gives a shorter boot-up time than slower mtd.

A non-volatile mtdswap driver.

LVM swap, to improve disk performance with RAID-0.

2. Decrease the image size further. Kill and restart bloated apps if possible: maybe a bit brutal, but it certainly works.

Swap out unevictable pages. How can we ensure those pages exist when they are needed?

Page coloring, or a memory cgroup, to mark which processes' pages can be swapped out.

3. Lazy image/page loading

Aren't we forgetting system responsiveness?

Example: Ubiquitous QuickBoot

Can be considered as "Lazy image/page loading":

http://www.ubiquitous.co.jp/En/products/middleware/quickboot
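The lazy image/page loading idea can be illustrated with a toy Python model: resume returns at once, and each page is read from storage only the first time it is touched, so start-up cost no longer scales with the whole image size. This is only a sketch of the concept under that assumption; the slides do not describe how QuickBoot implements it, and LazyImage and read_page are hypothetical names.

```python
class LazyImage:
    """Toy hibernation image whose pages are loaded on first access."""

    def __init__(self, read_page):
        self._read_page = read_page  # callable: page number -> page contents
        self._resident = {}          # pages already restored into "RAM"
        self.reads = 0               # number of storage reads so far

    def page(self, number):
        """Return a page, reading it from storage only if not yet resident."""
        if number not in self._resident:
            self._resident[number] = self._read_page(number)
            self.reads += 1
        return self._resident[number]


if __name__ == "__main__":
    storage = {0: b"kernel", 1: b"dialer", 2: b"camera"}
    image = LazyImage(storage.__getitem__)
    image.page(1)  # the first touch costs a storage read
    image.page(1)  # later touches are served from memory
    print(image.reads)
```

The trade-off shows up directly: the device becomes usable before the whole image is read, but the first access to each page pays the storage latency, which is where responsiveness can suffer.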

Q & A

Thank you!

Please send comments to Hiroshi.DOYU@nokia.com
