
XPDS14: MirageOS 2.0: branch consistency for Xen Stub Domains - Anil Madhavapeddy, University of Cambridge




MIRAGEOS 2.0: BRANCH CONSISTENCY FOR XEN STUB DOMAINS

Dave Scott Citrix Systems Thomas Gazagnaire University of Cambridge

Anil Madhavapeddy University of Cambridge

@mugofsoup @eriangazag

@avsm

http://openmirage.org
http://decks.openmirage.org/xendevsummit14/



INTRODUCING MIRAGE OS 2.0

These slides were written using Mirage on OS X:

They are hosted in a 938kB Xen unikernel written in statically type-safe OCaml, including the device drivers and network stack.

Their application logic is just a couple of source files, written independently of any OS dependencies.

The unikernel runs on an ARM CubieBoard2, and is also hosted in the cloud.

Binaries small enough to track the entire deployment in Git!


INTRODUCING MIRAGE OS 2.0


NEW FEATURES IN 2.0

Mirage OS 2.0 is an important step forward, supporting more, and more diverse, backends with much greater modularity.

For information about the new components we cannot cover here, see openmirage.org:

Xen/ARM, for running unikernels on embedded devices.
Irmin, Git-like distributed branchable storage.
OCaml-TLS, a from-scratch native OCaml TLS stack.
Vchan, for low-latency inter-VM communication.
Ctypes, modular C foreign function bindings.


THIS XEN DEV SUMMIT TALK

We focus on how we have been using Mirage to improve the core Xenstore toolstack using Irmin, on a performance and distribution future for Xenstore, and on our plans for upstreaming our patches.

But first, some background...


IRMIN: MIRAGE 2.0 STORAGE

Irmin is our library database that follows the modular design principles of MirageOS: https://github.com/mirage/irmin

Runs in both userspace and kernelspace
A key = value store (sound familiar?)
Git-style: commit, branch, merge
Preserves history by default
Backend support for in-memory, Git and HTTP/REST stores.

Mirage unikernels thus version control all their data, and have a distributed provenance graph of all activities.


BASE CONCEPTS: OBJECT DAG (OR THE "BLOB STORE")

Append-only and easily distributed.
Provides stable serialisation of structured values.
Backend-independent storage:

in-memory or on-disk persistence
encryption or plaintext

Position- and architecture-independent pointers, such as the SHA1 checksum of blocks.
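To make the object DAG concrete, here is a minimal OCaml sketch of a content-addressed, append-only blob store, in which an object's key is a digest of its contents and is therefore the same wherever, and on whatever architecture, the object is stored. It is not Irmin's API: the Blob_store module and its functions are hypothetical, and the sketch uses the standard library's Digest module (MD5) in place of SHA1 so that it needs no external packages.

```ocaml
(* Hypothetical toy content-addressed, append-only blob store.
   Assumption: MD5 from the stdlib Digest module stands in for SHA1. *)
module Blob_store = struct
  type key = string (* hex digest of the contents *)

  type t = (key, string) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* The key depends only on the bytes, never on where they are stored. *)
  let key_of (contents : string) : key = Digest.to_hex (Digest.string contents)

  (* Append-only: adding the same contents twice stores one copy. *)
  let add (store : t) (contents : string) : key =
    let k = key_of contents in
    if not (Hashtbl.mem store k) then Hashtbl.add store k contents;
    k

  let find (store : t) (k : key) : string option = Hashtbl.find_opt store k
end

let () =
  let s = Blob_store.create () in
  let k1 = Blob_store.add s "name = vm1" in
  let k2 = Blob_store.add s "name = vm1" in
  (* prints: same key: true, stored objects: 1 *)
  Printf.printf "same key: %b, stored objects: %d\n" (k1 = k2) (Hashtbl.length s)
```

Because identical values share a key, deduplication and distribution come for free: two hosts that hold the same object agree on its name without coordinating.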


BASE CONCEPTS: HISTORY DAG (OR THE "GIT STORE")

Append-only and easily distributed.
Can be stored in the Object DAG store.
Keeps track of history:

Ordered audit log of all operations.
Useful for merge (3-way merge is easier than 2-way).

Snapshots and reverting operations for free.
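Why a 3-way merge is easier than a 2-way one can be seen in a small sketch: with the common ancestor available, every key can be classified as changed on the left, changed on the right, changed identically on both sides, or changed differently on both sides (a real conflict). The merge3 function below is a hypothetical illustration over string-keyed maps, not Irmin's merge implementation.

```ocaml
(* Hypothetical three-way merge of string-keyed maps. For each key:
   same value on both sides -> keep it; changed on one side only -> take
   that side; changed differently on both sides -> conflict. A missing
   key (None) is treated like any other value, so deletions merge too. *)
module SMap = Map.Make (String)

let merge3 ~ancestor ~left ~right =
  let keys m = SMap.fold (fun k _ acc -> k :: acc) m [] in
  let all_keys =
    List.sort_uniq compare (keys ancestor @ keys left @ keys right)
  in
  let step acc k =
    match acc with
    | `Conflict _ -> acc
    | `Ok merged ->
      let a = SMap.find_opt k ancestor
      and l = SMap.find_opt k left
      and r = SMap.find_opt k right in
      (* Record the chosen value, or leave the key out if it was deleted. *)
      let keep = function
        | Some v -> `Ok (SMap.add k v merged)
        | None -> `Ok merged
      in
      if l = r then keep l (* both sides agree *)
      else if l = a then keep r (* only the right side changed it *)
      else if r = a then keep l (* only the left side changed it *)
      else `Conflict k (* changed differently on both sides *)
  in
  List.fold_left step (`Ok SMap.empty) all_keys
```

Given only the two tips, a 2-way merge cannot tell which side changed a key; the ancestor makes that decision mechanical.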


BASE CONCEPTS


IRMIN TOOLING

opam update && opam install irmin

Command-line frontend that uses:
storage: in-memory format or Git
network: custom format, Git or HTTP/REST
interface: JSON interface for storing content easily

OCaml library that supplies:
merge-friendly data structures
backend implementations (Git, HTTP/REST)


XENSTORE: VM METADATA

Xenstore is our configuration database that stores VM metadata in directories (à la Plan 9).

Runs in either userspace or kernelspace (just like Mirage)
A key = value store (just like Irmin)
Logs history by default (just like Irmin...)
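Xenstore's directory-style namespace lines up with a key = value store if keys are paths. The sketch below models it as a flat map from path components to values; it is a simplification under that assumption (only leaves hold values here, whereas real Xenstore nodes can hold a value and have children), and write, read and ls are hypothetical helpers, with ls listing immediate children in the spirit of xenstore-ls.

```ocaml
(* Hypothetical toy model of Xenstore's hierarchical namespace: a flat map
   from paths (lists of components) to values. Simplification: only leaf
   paths hold values here. *)
module PMap = Map.Make (struct
  type t = string list
  let compare = compare
end)

(* "/local/domain/0" -> ["local"; "domain"; "0"] *)
let split_path p = List.filter (fun s -> s <> "") (String.split_on_char '/' p)

let write store path value = PMap.add (split_path path) value store

let read store path = PMap.find_opt (split_path path) store

(* Immediate children of [dir], in sorted order, like a directory listing. *)
let ls store dir =
  let prefix = split_path dir in
  (* The component just below [pre] in [key], if [key] lies under it. *)
  let rec child_of pre key =
    match pre, key with
    | [], c :: _ -> Some c
    | p :: pre', k :: key' when p = k -> child_of pre' key'
    | _ -> None
  in
  let children =
    PMap.fold
      (fun key _ acc ->
        match child_of prefix key with
        | Some c when not (List.mem c acc) -> c :: acc
        | _ -> acc)
      store []
  in
  List.rev children
```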



Transactions map naturally onto Irmin: TRANSACTION_START is a branch; TRANSACTION_END is a merge.

The "original plan" in 2002 was for seamless distribution across hosts/clusters/clouds. What happened? Unfortunately, the previous transaction implementations all suck.
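The idea that starting a transaction is a branch and ending it is a merge can be sketched as follows: starting a transaction snapshots the store, writes go to that private copy, and ending it performs a per-key three-way merge against whatever the store has become in the meantime. This is a hypothetical in-memory model (the store and txn types, transaction_start, txn_write and transaction_end are illustrative names), not the Irmin Xenstore code; it shows that concurrent transactions touching different keys both commit, and only genuinely overlapping writes conflict.

```ocaml
(* Hypothetical model: a transaction is a branch of the store, and ending
   it three-way merges that branch back into the current store. *)
module SMap = Map.Make (String)

type store = { mutable current : string SMap.t }

(* [base] is the snapshot taken at TRANSACTION_START; [branch] is that
   snapshot plus the transaction's own writes. *)
type txn = { base : string SMap.t; mutable branch : string SMap.t }

let transaction_start store = { base = store.current; branch = store.current }

let txn_write t key value = t.branch <- SMap.add key value t.branch

(* Per-key three-way merge of optional values; [None] is a real conflict. *)
let merge_key ancestor mine theirs =
  if mine = theirs then Some mine
  else if mine = ancestor then Some theirs
  else if theirs = ancestor then Some mine
  else None

let transaction_end store t =
  let conflict = ref false in
  let merged =
    SMap.merge
      (fun key mine theirs ->
        match merge_key (SMap.find_opt key t.base) mine theirs with
        | Some v -> v
        | None -> conflict := true; mine)
      t.branch store.current
  in
  (* Commit only if every key merged cleanly; otherwise report a conflict. *)
  if !conflict then `Conflict else (store.current <- merged; `Ok)
```

Keys deleted on both sides appear in neither map passed to SMap.merge, so they are correctly absent from the result.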


XENSTORE: CONFLICTS

Terrible performance impact: a transaction may involve 100 RPCs to set it up (one per read/write operation), only to be aborted and retried.
Longer-lived transactions have a greater chance of conflicting than shorter ones, and the whole longer transaction must then be repeated.
Concurrent transactions can lead to live-lock:

Try starting lots of VMs in parallel!
Much time was spent removing transactions (from xend).


XENSTORE: CONFLICTS

Conflicts between Xenstore transactions are so devastating that we try hard to avoid transactions altogether. However, they aren't going away.


XENSTORE: CONFLICTS

Observe: typical Xenstore transactions (e.g. creating domains) shouldn't conflict; the conflicts come from a flawed merging algorithm. If we were managing domain configurations in git, we would simply merge or rebase and it would work. Therefore the Irmin Xenstore simply does:

    (* First try to merge the transaction's view back into the store *)
    DB.View.merge_path ~origin db [] transaction >>= function
    | `Ok () -> return true
    | `Conflict msg ->
      (* if merge doesn't work, try rebase *)
      DB.View.rebase_path ~origin db [] transaction >>= function
      | `Ok () -> return true
      | `Conflict msg ->
        (* A true conflict: tell the client *)
        ...


XENSTORE: PERFORMANCE


XENSTORE: TRANSACTIONS

Big transactions give you high-level intent:

useful for debugging and tracing
minimise merge commits (1 per transaction)
minimise backend I/O (1 op per commit)
after a crash during a transaction, Xenstore can tell the client to "abort, retry"

Solving the performance problems that big transactions had in previous implementations greatly improves the overall health of Xenstore.


XENSTORE: RELIABILITY

What happens if Xenstore crashes?

Rings are left full of partially read/written packets, and there is no reconnection protocol in common use (there is a proposal on xen-devel, but it will be years before we can rely on it).

Per-connection state in Xenstore:

watch registrations
pending watch events

If Xenstore is restarted, many of the rings will be broken... you'll probably have to reboot the host.


XENSTORE: RELIABILITY

Irmin to the rescue!

Data structure libraries built on top of Irmin, for example mergeable queues. Use these for (e.g.) pending watch events.
We can persist partially read/written packets so fragments can be recovered over a restart.
We can persist connection information (i.e. ring information from an Introduce) and auto-reconnect on start.
Added bonus: it is easy to introspect state via xenstore-ls, and see each registered watch, queue, etc.
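A mergeable queue for pending watch events could work along these lines: given the common ancestor and two branches that have each popped from the front and pushed to the back, the merge drops every element consumed on either side and then appends both sides' new elements. This is a hypothetical sketch, not the Irmin library's queue; it assumes elements are unique (for example, tagged with sequence numbers) and arbitrarily places the left branch's new elements first.

```ocaml
(* Hypothetical mergeable FIFO queue, e.g. for pending watch events.
   Assumptions: elements are unique, and each branch only pops from the
   front and pushes to the back of its copy of the queue. *)

(* [drop n l] removes the first [n] elements of [l]. *)
let rec drop n l =
  if n <= 0 then l else match l with [] -> [] | _ :: t -> drop (n - 1) t

let rec is_prefix p l =
  match p, l with
  | [], _ -> true
  | x :: p', y :: l' -> x = y && is_prefix p' l'
  | _ :: _, [] -> false

(* Recover what a branch did to the ancestor: how many elements it popped,
   and which new elements it pushed. Terminates because the empty suffix
   is a prefix of every list. *)
let diff ancestor branch =
  let rec go n =
    let rest = drop n ancestor in
    if is_prefix rest branch then (n, drop (List.length rest) branch)
    else go (n + 1)
  in
  go 0

let merge ~ancestor ~left ~right =
  let popped_l, pushed_l = diff ancestor left in
  let popped_r, pushed_r = diff ancestor right in
  (* An element popped on either branch counts as consumed; the left
     branch's new elements go first (an arbitrary but fixed choice). *)
  drop (max popped_l popped_r) ancestor @ pushed_l @ pushed_r
```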


XENSTORE: TRACING

When a bug is reported, the normal procedure is:

stare at Xenstore logs for a very long time
slowly deduce the state at the time the bug manifested
(swearing and cursing is strictly optional)

With Irmin+Xenstore, one can simply:

git checkout the revision
inspect the state with ls
in the future: git bisect automation!
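Checking out a revision and automating git bisect both amount to replaying history up to some commit and inspecting the resulting state. The sketch below models each commit as a list of path writes; state_at and bisect are hypothetical helpers, and bisect assumes the "bug present" predicate is monotone over history (false before the offending commit, true from it onwards), which is the same single-transition assumption git bisect relies on.

```ocaml
(* Hypothetical tracing helpers over a Xenstore-like history in which each
   commit is a list of (path, value) writes. *)
module SMap = Map.Make (String)

type commit = (string * string) list

let apply state (c : commit) =
  List.fold_left (fun s (path, v) -> SMap.add path v s) state c

let rec take n l =
  match l with
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* "Checkout": the state after the first [n] commits. *)
let state_at (history : commit list) n =
  List.fold_left apply SMap.empty (take n history)

(* Automated bisect: the 1-based index of the first commit after which
   [is_bad] holds. Assumes [is_bad] is false on the empty state, true on
   the final state, and never flips back once true. *)
let bisect (history : commit list) is_bad =
  (* Invariant: state_at good is fine, state_at bad is bad. *)
  let rec go good bad =
    if bad - good <= 1 then bad
    else
      let mid = (good + bad) / 2 in
      if is_bad (state_at history mid) then go good mid else go mid bad
  in
  go 0 (List.length history)
```

Each probe replays from scratch for clarity; a real tool would instead check out the stored tree for that commit directly.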


XENSTORE: TRACING

    $ git log --oneline --graph --decorate --all
    ...
    | | * | 1787fd2 Domain 0: merging transaction 394
    | | |/
    | * | 0d1521c Domain 0: merging transaction 395
    | |/
    * | 731356e Domain 0: merging transaction 396
    |/
    * 8795514 Domain 0: merging transaction 365
    * 74f35b5 Domain 0: merging transaction 364
    * acdd503 Domain 0: merging transaction 363


XENSTORE: DATA STORAGE

Xenstore contains VM metadata (/vm) and domain metadata (/local/domain). But VM metadata is duplicated elsewhere and copied in and out:

xl config files, and the xapi database
(insert cloud toolstack here)

With current daemons, it is unwise to persist large data.

What if Xenstore could store and distribute this data efficiently, and if application data could be persisted reliably?


XENSTORE: THE DATA

Irmin to the rescue!

Check VM metadata in to Irmin
clone, pull and push to move it between hosts
expose it to the host via FUSE, for Plan 9 filesystem goodness
maybe one day even echo start > VM/uuid/ctl
FUSE code at https://github.com/dsheets/profuse

VM data could be checked in to Irmin
very important for unikernels that have no native storage


XENSTORE: UPSTREAMING

An advanced prototype exists using Mirage libraries, but it doesn't fully pass the unit test suite. Before upstreaming:

write a fixed-size backend for block devices
preserving history is a good default, but history does need to be squashed from time to time

Upstream patches:
switch to using opam to build Xenstore
reproducible builds via a custom Xen remote
allows using modern OCaml libraries (Lwt, Mirage, etc.)

In Xapi, delete the existing database and replace it with Xenstore 2.0.
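Squashing history can preserve the current state while discarding intermediate steps: the oldest commits are replaced by one commit containing only their net effect (the last write to each path). The sketch below is a hypothetical illustration of that idea, with a write of None standing for a deletion; it is not the planned implementation.

```ocaml
(* Hypothetical history squashing: the oldest [k] commits are folded into
   a single commit holding only their net effect, so replaying the shorter
   history yields the same final state. [None] models a deletion. *)
module SMap = Map.Make (String)

type commit = (string * string option) list

let apply state (c : commit) =
  List.fold_left
    (fun s (path, v) ->
      match v with Some v -> SMap.add path v s | None -> SMap.remove path s)
    state c

let replay (history : commit list) = List.fold_left apply SMap.empty history

let rec split_at k l =
  if k <= 0 then ([], l)
  else
    match l with
    | [] -> ([], [])
    | x :: t -> let (a, b) = split_at (k - 1) t in (x :: a, b)

let squash k (history : commit list) : commit list =
  let old, recent = split_at k history in
  (* Last write wins: later commits overwrite earlier entries per path. *)
  let net =
    List.fold_left
      (fun m c -> List.fold_left (fun m (path, v) -> SMap.add path v m) m c)
      SMap.empty old
  in
  if old = [] then recent else SMap.bindings net :: recent
```

The trade-off is the usual one: the squashed prefix no longer supports checkout or bisect at intermediate revisions, which is why preserving history remains the default.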


XENSTORE: CODE

Prototype and unit tests at:

https://github.com/mirage/ocaml-xenstore-server
(can now build without Xen on Mac OS X)

    # install dependencies and build the prototype
    opam init --comp=4.01.0
    eval `opam config env`
    opam pin irmin git://github.com/mirage/irmin
    opam install xenstore irmin shared-memory-ring xen-evtchn io-page
    git clone git://github.com/mirage/ocaml-xenstore-server
    cd ocaml-xenstore-server
    make

    # start the server in the background on a Unix socket, storing its database in /tmp/db
    ./main.native --enable-unix --path /tmp/test-socket --database /tmp/db &

    # write and read back a key, then inspect the resulting history
    ./cli.native -path /tmp/test-socket write foo=bar
    ./cli.native -path /tmp/test-socket read foo
    cd /tmp/db; git log


HTTP://OPENMIRAGE.ORG/

Featuring blog posts about Mirage OS 2.0 by Amir Chaudhry, Thomas Gazagnaire, David Kaloper, Thomas Leonard, Jon Ludlam, Hannes Mehnert, Mindy Preston, Dave Scott, and Jeremy Yallop.

Mindy Preston and Jyotsna Prakash from OPW/GSoC will also be talking about their projects in the community panel!

More Irmin+Xenstore posts with details:
Introduction to Irmin
Using Irmin to add fault-tolerance to Xenstore