MIRAGEOS 2.0: BRANCH CONSISTENCY FOR XEN STUB DOMAINS
Dave Scott Citrix Systems Thomas Gazagnaire University of Cambridge
Anil Madhavapeddy University of Cambridge
@mugofsoup @eriangazag
@avsm
http://openmirage.org
http://decks.openmirage.org/xendevsummit14/
INTRODUCING MIRAGE OS 2.0
These slides were written using Mirage on OSX:
They are hosted in a 938kB Xen unikernel written in statically type-safe OCaml, including device drivers and network stack.
Their application logic is just a couple of source files, written independently of any OS dependencies.
Running on an ARM CubieBoard2, and hosted on the cloud.
Binaries small enough to track the entire deployment in Git!
NEW FEATURES IN 2.0
Mirage OS 2.0 is an important step forward, supporting more, and more diverse, backends with much greater modularity.
For information about the new components we cannot cover here, see openmirage.org:
- Xen/ARM, for running unikernels on embedded devices.
- Irmin, Git-like distributed branchable storage.
- OCaml-TLS, a from-scratch native OCaml TLS stack.
- Vchan, for low-latency inter-VM communication.
- Ctypes, modular C foreign function bindings.
THIS XEN DEV SUMMIT TALK
We focus on how we have been using Mirage to:
- improve the core Xenstore toolstack using Irmin.
- a performance and distribution future for Xenstore.
- plans for upstreaming our patches.
But first, some background...
IRMIN: MIRAGE 2.0 STORAGE
Irmin is our library database that follows the modular design principles of MirageOS: https://github.com/mirage/irmin
- Runs in both userspace and kernelspace
- A key = value store (sound familiar?)
- Git-style: commit, branch, merge
- Preserves history by default
- Backend support for in-memory, Git and HTTP/REST stores.
Mirage unikernels thus version control all their data, and have a distributed provenance graph of all activities.
BASE CONCEPTS
OBJECT DAG (OR THE "BLOB STORE")
- Append-only and easily distributed.
- Provides stable serialisation of structured values.
- Backend independent storage
  - memory or on-disk persistence
  - encryption or plaintext
- Position and architecture independent pointers
  - such as via SHA1 checksum of blocks.
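The "position and architecture independent pointers" idea can be illustrated with a tiny content-addressed store. This is only a sketch under stated assumptions: the module name Blob_store and its functions are hypothetical (not Irmin's API), the store lives in memory, and the standard library's Digest module (MD5) stands in for the SHA1 checksum mentioned above.

```ocaml
(* Minimal content-addressed, append-only blob store (illustrative only).
   Assumption: Digest (MD5) from the OCaml standard library stands in for
   SHA1. Blob_store is a hypothetical name, not part of Irmin. *)
module Blob_store = struct
  type key = string (* hex digest of the stored bytes *)

  type t = (key, string) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* Adding is idempotent: identical content always maps to the same key,
     and existing entries are never overwritten or removed. *)
  let add (store : t) (value : string) : key =
    let key = Digest.to_hex (Digest.string value) in
    if not (Hashtbl.mem store key) then Hashtbl.add store key value;
    key

  let find (store : t) (key : key) : string option =
    Hashtbl.find_opt store key
end

let () =
  let store = Blob_store.create () in
  let k1 = Blob_store.add store "domid=3" in
  let k2 = Blob_store.add store "domid=3" in
  (* Same content gives the same pointer, whatever the host or layout. *)
  assert (k1 = k2);
  assert (Blob_store.find store k1 = Some "domid=3");
  assert (Hashtbl.length store = 1)
```

Because a key depends only on the bytes stored, two hosts that hold the same value agree on its name without coordinating, which is what makes such a store easy to distribute.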
BASE CONCEPTS
HISTORY DAG (OR THE "GIT STORE")
- Append-only and easily distributed.
- Can be stored in the Object DAG store.
- Keeps track of history.
  - Ordered audit log of all operations.
  - Useful for merge (3-way merge is easier than 2-way)
  - Snapshots and reverting operations for free.
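A history DAG can be sketched as commits that point at their parents and carry a snapshot of the store. The record fields and the functions commit, log and find_commit below are hypothetical names chosen for illustration; integer ids stand in for content hashes, and the snapshot is a plain association list rather than a pointer into the object DAG.

```ocaml
(* Illustrative commit graph; all names here are hypothetical. *)
type commit = {
  id : int;                          (* stand-in for a content hash *)
  message : string;
  parents : commit list;             (* 0 = root, 1 = normal, 2 = merge *)
  snapshot : (string * string) list; (* key/value state at this commit *)
}

let root = { id = 0; message = "init"; parents = []; snapshot = [] }

let commit ~id ~message ~parents snapshot = { id; message; parents; snapshot }

(* Follow first parents to produce an ordered audit log, newest first. *)
let rec log c =
  c.message :: (match c.parents with [] -> [] | p :: _ -> log p)

(* Look up an older commit by id, searching all ancestors. *)
let rec find_commit c id =
  if c.id = id then Some c
  else
    List.fold_left
      (fun acc p -> match acc with Some _ -> acc | None -> find_commit p id)
      None c.parents

let () =
  let c1 =
    commit ~id:1 ~message:"write /vm/1/name" ~parents:[ root ]
      [ ("/vm/1/name", "web") ]
  in
  let c2 =
    commit ~id:2 ~message:"write /vm/1/memory" ~parents:[ c1 ]
      (("/vm/1/memory", "512") :: c1.snapshot)
  in
  assert (log c2 = [ "write /vm/1/memory"; "write /vm/1/name"; "init" ]);
  (* "Reverting" is just reading an older snapshot: nothing is overwritten. *)
  match find_commit c2 1 with
  | Some old -> assert (List.assoc_opt "/vm/1/memory" old.snapshot = None)
  | None -> assert false
```

Since commits are never mutated, the snapshot of any earlier commit remains available, which is why snapshots and reverts come for free.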
IRMIN TOOLING
    opam update && opam install irmin
Command-line frontend that uses:
- storage: in-memory format or Git
- network: custom format, Git or HTTP/REST
- interface: JSON interface for storing content easily
OCaml library that supplies:
- merge-friendly data structures
- backend implementations (Git, HTTP/REST)
XENSTORE: VM METADATA
Xenstore is our configuration database that stores VM metadata in directories (à la Plan 9).
- Runs in either userspace or kernelspace (just like Mirage)
- A key = value store (just like Irmin)
- Logs history by default (just like Irmin...)
- TRANSACTION_START branch; TRANSACTION_END merge
The "original plan" in 2002 was for seamless distribution across hosts/clusters/clouds. What happened? Unfortunately the previous transaction implementations all suck.
XENSTORE: CONFLICTS
- Terrible performance impact: a transaction involves 100 RPCs to set it up (one per r/w op), only to be aborted and retried.
- A longer-lived transaction is more likely to conflict with a shorter one, and it is the longer transaction that has to be repeated.
- Concurrent transactions can lead to live-lock:
  - Try starting lots of VMs in parallel!
  - Much time wasted removing transactions (from xend)
XENSTORE: CONFLICTS
Conflicts between Xenstore transactions are so devastating, we try hard to avoid transactions altogether. However they aren't going away.
XENSTORE: CONFLICTS
Observe: typical Xenstore transactions (eg creating domains) shouldn't conflict. It's a flawed merging algorithm.
If we were managing domain configurations in git, we would simply merge or rebase and it would work.
Therefore the Irmin Xenstore simply does:

    DB.View.merge_path ~origin db [] transaction >>= function
    | `Ok () -> return true
    | `Conflict msg ->
      (* if merge doesn't work, try rebase *)
      DB.View.rebase_path ~origin db [] transaction >>= function
      | `Ok () -> return true
      | `Conflict msg ->
        (* A true conflict: tell the client *)
        ...
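To see why creating two domains need not conflict, here is a minimal three-way merge over a flat key/value map. It is a sketch of the general technique, not Irmin's merge or rebase implementation: merge_value and merge3 are hypothetical names, the store is modelled as a string map from the standard library, and the rebase fallback is not modelled.

```ocaml
module M = Map.Make (String)

(* Three-way merge of one key. [o] is the common ancestor, [a] the current
   store and [b] the transaction; [None] means the key is absent. *)
let merge_value key o a b =
  if a = b then Ok a (* both sides agree *)
  else if a = o then Ok b (* only the transaction changed it *)
  else if b = o then Ok a (* only the store changed it *)
  else Error ("conflict on " ^ key)

(* Merge every key seen in any of the three maps, stopping at the first
   conflict. *)
let merge3 ~origin ~store ~transaction =
  let add_keys m acc = M.fold (fun k _ acc -> k :: acc) m acc in
  let keys =
    List.sort_uniq compare
      (add_keys origin (add_keys store (add_keys transaction [])))
  in
  List.fold_left
    (fun acc key ->
      match acc with
      | Error _ -> acc
      | Ok merged -> (
          let find m = M.find_opt key m in
          match
            merge_value key (find origin) (find store) (find transaction)
          with
          | Error e -> Error e
          | Ok None -> Ok merged
          | Ok (Some v) -> Ok (M.add key v merged)))
    (Ok M.empty) keys

let () =
  let origin = M.singleton "/local/domain/0/name" "Domain-0" in
  (* Another transaction has already created domain 1 in the store... *)
  let store = M.add "/local/domain/1/name" "vm1" origin in
  (* ...while ours creates domain 2: disjoint keys, so the merge succeeds. *)
  let txn = M.add "/local/domain/2/name" "vm2" origin in
  (match merge3 ~origin ~store ~transaction:txn with
   | Ok m -> assert (M.cardinal m = 3)
   | Error _ -> assert false);
  (* Writing a different value to the same key is a true conflict. *)
  let txn' = M.add "/local/domain/1/name" "other" origin in
  match merge3 ~origin ~store ~transaction:txn' with
  | Ok _ -> assert false
  | Error msg -> assert (msg = "conflict on /local/domain/1/name")
```

The point of comparing against the common ancestor is that a change made on only one side is never treated as a conflict; only keys changed differently on both sides are reported to the client.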
XENSTORE: PERFORMANCE
XENSTORE: TRANSACTIONS
Big transactions give you high-level intent
- useful for debug and tracing
- minimise merge commits (1 per transaction)
- minimise backend I/O (1 op per commit)
- crash during transaction can tell the client to "abort retry"
Solving the performance problems with big transactions in previous implementations greatly improves the overall health of Xenstore.
XENSTORE: RELIABILITY
What happens if Xenstore crashes?
- Rings full of partially read/written packets. No reconnection protocol in common use.
  - proposal on xen-devel but years before we can rely on it
- Per-connection state in Xenstore:
  - watch registrations, pending watch events
- If Xenstore is restarted, many of the rings will be broken
... you'll probably have to reboot the host
XENSTORE: RELIABILITY
Irmin to the rescue!
- Data structure libraries built on top of Irmin, for example mergeable queues. Use these for (eg) pending watch events.
- We can persist partially read/written packets so fragments can be recovered over restart
- We can persist connection information (i.e. ring information from an Introduce) and auto-reconnect on start
- Added bonus: easy to introspect state via xenstore-ls, can see each registered watch, queue etc
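One way to picture a mergeable queue, such as a queue of pending watch events, is as a list of uniquely identified entries with a three-way merge. The sketch below is a simplification and not the Irmin queue library: the type mqueue and the function merge are hypothetical, and it assumes every pushed entry carries an id that is unique across branches.

```ocaml
(* A queue as (unique id, payload) pairs, oldest first. Assumption: ids are
   unique across all branches; mqueue and merge are hypothetical names. *)
type 'a mqueue = (int * 'a) list

let merge ~(origin : 'a mqueue) (a : 'a mqueue) (b : 'a mqueue) : 'a mqueue =
  let mem q (id, _) = List.mem id (List.map fst q) in
  (* Entries from the ancestor survive only if neither branch consumed them. *)
  let kept = List.filter (fun e -> mem a e && mem b e) origin in
  (* Entries a branch pushed since the ancestor. *)
  let fresh q = List.filter (fun e -> not (mem origin e)) q in
  kept @ fresh a @ fresh b

let () =
  let origin = [ (1, "watch /vm"); (2, "watch /local") ] in
  (* Branch a delivered event 1 and queued event 3. *)
  let a = [ (2, "watch /local"); (3, "watch /tool") ] in
  (* Branch b only queued event 4. *)
  let b = origin @ [ (4, "watch /vss") ] in
  assert (
    merge ~origin a b
    = [ (2, "watch /local"); (3, "watch /tool"); (4, "watch /vss") ])
```

An event popped on either branch stays popped after the merge, and events pushed on both branches are all retained, so no pending watch event is lost or delivered twice under the stated assumption.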
XENSTORE: TRACING
When a bug is reported normal procedure is:
- stare at Xenstore logs for a very long time
- slowly deduce the state at the time the bug manifested
- (swearing and cursing is strictly optional)
With Irmin+Xenstore, one can simply:
- git checkout to the revision
- Inspect the state with ls
- In the future: git bisect automation!
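The bisect idea amounts to a binary search over an ordered history for the first revision where a check fails. The following is a sketch of that search only, not of any existing tool: bisect is a hypothetical function, and it assumes revisions are ordered oldest to newest and that the bug, once introduced, is present in every later revision.

```ocaml
(* Find the first revision for which [is_bad] holds. Assumes [revs] is
   ordered oldest to newest and that every revision after the first bad
   one is also bad. *)
let bisect (is_bad : 'rev -> bool) (revs : 'rev array) : 'rev option =
  let rec go lo hi =
    (* Invariant: revisions before [lo] are good, those from [hi] on are bad. *)
    if lo >= hi then
      if hi < Array.length revs then Some revs.(hi) else None
    else
      let mid = lo + ((hi - lo) / 2) in
      if is_bad revs.(mid) then go lo mid else go (mid + 1) hi
  in
  go 0 (Array.length revs)

let () =
  (* Hypothetical history: the flag records whether a key is still present. *)
  let history =
    [| ("r1", true); ("r2", true); ("r3", false); ("r4", false) |]
  in
  let is_bad (_, key_present) = not key_present in
  match bisect is_bad history with
  | Some (rev, _) -> assert (rev = "r3")
  | None -> assert false
```

With every transaction recorded as a commit, the check only needs to inspect the store state at a revision, so the search takes a logarithmic number of checkouts.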
XENSTORE: TRACING

    $ git log --oneline --graph --decorate --all
    ...
    | | * | 1787fd2 Domain 0: merging transaction 394
    | | |/
    | * | 0d1521c Domain 0: merging transaction 395
    | |/
    * | 731356e Domain 0: merging transaction 396
    |/
    * 8795514 Domain 0: merging transaction 365
    * 74f35b5 Domain 0: merging transaction 364
    * acdd503 Domain 0: merging transaction 363
XENSTORE: DATA STORAGE
- Xenstore contains VM metadata (/vm) and domain metadata (/local/domain)
- But VM metadata is duplicated elsewhere and copied in/out
  - xl config files, and xapi database
  - (insert cloud toolstack here)
- With current daemons, it is unwise to persist large data.
What if Xenstore could store and distribute this data efficiently, and if application data could be persisted reliably?
XENSTORE: THE DATA
Irmin to the rescue!
- Check in VM metadata to Irmin
  - clone, pull and push to move between hosts
  - expose to host via FUSE, for Plan9 filesystem goodness
  - maybe one day even echo start > VM/uuid/ctl
  - FUSE code at https://github.com/dsheets/profuse
- VM data could be checked in to Irmin
  - very important for unikernels that have no native storage
XENSTORE: UPSTREAMING
Advanced prototype exists using Mirage libraries, but doesn't fully pass unit test suite. Before upstreaming:
- Write fixed-size backend for block device
- Preserving history is a good default, but history does need to be squashed from time to time.
Upstream patches:
- switch to using opam to build Xenstore
- reproducible builds via a custom Xen remote
- allows using modern OCaml libraries (Lwt, Mirage, etc...)
In Xapi, delete existing db and replace with Xenstore 2.0
XENSTORE: CODE
Prototype+unit tests at: https://github.com/mirage/ocaml-xenstore-server
(can build without Xen on MacOS X now)

    opam init --comp=4.01.0
    eval `opam config env`
    opam pin irmin git://github.com/mirage/irmin
    opam install xenstore irmin shared-memory-ring xen-evtchn io-page
    git clone git://github.com/mirage/ocaml-xenstore-server
    cd ocaml-xenstore-server
    make
    ./main.native --enable-unix --path /tmp/test-socket --database /tmp/db &
    ./cli.native -path /tmp/test-socket write foo=bar
    ./cli.native -path /tmp/test-socket read foo
    cd /tmp/db; git log
HTTP://OPENMIRAGE.ORG/
Featuring blog posts about Mirage OS 2.0 by:
Amir Chaudhry, Thomas Gazagnaire, David Kaloper, Thomas Leonard, Jon Ludlam, Hannes Mehnert, Mindy Preston, Dave Scott, and Jeremy Yallop.
Mindy Preston and Jyotsna Prakash from OPW/GSoC will also be talking about their projects in the community panel!
More Irmin+Xenstore posts with details:
- Introduction to Irmin
- Using Irmin to add fault-tolerance to Xenstore