NoSql NOW! 2013
Delivering big content at NBC News with RavenDB
A quick tour
• Schema-less document database with RESTful API.
• Fully ACID and all writes saved to disk (ESENT).
• Indexing/queries executed with Lucene.NET.
• Easily extended with custom logic using “bundles”.
• Management UI provided in Silverlight.
• Host as Windows Service, IIS app, or embedded in your app.
Raven server
• .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.
• Wraps HTTP API.
• Provides client-side caching, change notification, LINQ querying.
• Easily extended with many, many hooks into almost all operations.
Raven client
• Open source: http://github.com/ravendb/ravendb
• License is AGPL (free) or commercial (paid).
• Exception: Your project can use any OSI-approved license and still use Raven for free.
• Commercial licenses based on max parallelism and RAM.
• Windows clustering support and storage compression/encryption available with Enterprise license only.
Raven licensing
Demo
Why RavenDB?
• Includes nbcnews.com, today.com and more.
• 1.2 billion pageviews/month.
• 140 million video streams/month.
• 58 million unique users/month.
• Traffic spikes up to 100x normal when big news events happen.
NBC News Digital network
• Very fast page load required
• “Instant” publish time required
• 6 to 8 code deployments each day
• High availability: zero* downtime allowed
One of the largest US news sites
High availability
is when the answer to:
“What’s the longest outage
before you wind up
in your boss’s office?”
is < 5 seconds.
Credit: Mitch Canter @studionashvegas http://twitpic.com/z13bw
• Rolling deployments and rollbacks.
• Apps and services decoupled physically and temporally.
• Designed for both auto-failover/recovery and manual reconfiguration by ops.
• Seamless scale out by adding instances of any process.
• And more…
Some prerequisites for HA
• Data schema can evolve rapidly
• Apps shouldn’t know where data is
• Apps should talk to the closest data replica
• Apps should automatically find a new replica if the closest becomes unavailable
• Ops can add/remove replicas quickly and easily, without affecting any running apps
HA data: a private data cloud
• Schema-less document database allows rapid change.
• Fully ACID model fit business needs.
• Strong replication functionality supported HA needs.
• Easily customizable on both client and server.
• Easily deployed and managed.
• First class .NET client.
Why we chose RavenDB
• Raven used behind:
• NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, Xbox, Roku.
• Growing number of sections of nbcnews.com and today.com.
• Raven usage stats:
• ~10 million docs, +1000s of new docs/day.
• 10s of writes/sec.
• 100s of reads/sec (after 3 layers of caching).
Current* state of Raven usage
The details
• Each doc cached as long as memory available.
• Requests include If-Modified-Since header.
• 304 Not Modified response saves bandwidth.
• Aggressive caching avoids the round-trip. Tunable by ops at runtime (custom).
Client-side caching
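The caching behaviour above can be sketched in a few lines. This is a minimal illustration, not the Raven client: `FakeServer`, `CachingClient`, the integer logical clock standing in for a timestamp, and the boolean `aggressive` flag (the slides describe a tunable duration) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int               # 200 or 304
    body: Optional[str]       # None on a 304
    last_modified: int        # logical timestamp of the doc


class FakeServer:
    """Hypothetical stand-in for a document server (not the Raven HTTP API)."""

    def __init__(self):
        self.docs = {}        # doc id -> (body, last_modified)
        self.clock = 0        # logical clock used instead of wall time
        self.bodies_sent = 0  # counts full 200 responses

    def put(self, doc_id, body):
        self.clock += 1
        self.docs[doc_id] = (body, self.clock)

    def get(self, doc_id, if_modified_since=None):
        body, modified = self.docs[doc_id]
        if if_modified_since is not None and modified <= if_modified_since:
            return Response(304, None, modified)   # nothing new: no body
        self.bodies_sent += 1
        return Response(200, body, modified)


class CachingClient:
    def __init__(self, server, aggressive=False):
        self.server = server
        self.aggressive = aggressive
        self.cache = {}       # doc id -> (body, last_modified)

    def load(self, doc_id):
        cached = self.cache.get(doc_id)
        if cached and self.aggressive:
            return cached[0]  # skip the round-trip; may be out of date
        since = cached[1] if cached else None
        resp = self.server.get(doc_id, if_modified_since=since)
        if resp.status == 304:
            return cached[0]
        self.cache[doc_id] = (resp.body, resp.last_modified)
        return resp.body
```

With `aggressive=False` every load still costs a round-trip, but an unchanged doc comes back as a body-less 304. With `aggressive=True` the client answers from memory and accepts that the value may be out of date.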
• You define sharding strategy – a method.
• Raven manages storing each doc to the correct instance and fanning/merging queries.
• No auto-rebalancing of shards if you change number of instances.
Raven sharding
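A rough sketch of the idea: the strategy is a function you supply, and the store routes writes and fans out and merges queries. `ShardedStore` and `by_section` are hypothetical names for illustration only, and hashing a `section` field is just one possible strategy.

```python
import heapq
import zlib


def by_section(doc, shard_names):
    """Hypothetical strategy: hash the doc's section to pick a shard."""
    index = zlib.crc32(doc["section"].encode("utf-8")) % len(shard_names)
    return shard_names[index]


class ShardedStore:
    def __init__(self, shard_names, strategy):
        self.shard_names = list(shard_names)
        self.shards = {name: {} for name in self.shard_names}
        self.strategy = strategy

    def store(self, doc_id, doc):
        # The strategy alone decides which instance holds the doc.
        shard = self.strategy(doc, self.shard_names)
        self.shards[shard][doc_id] = doc
        return shard

    def query(self, predicate, sort_key):
        # Fan out: each shard filters and sorts its own docs.
        partials = [
            sorted((d for d in docs.values() if predicate(d)), key=sort_key)
            for docs in self.shards.values()
        ]
        # Merge: combine the already-sorted partial results.
        return list(heapq.merge(*partials, key=sort_key))
```

In this sketch the shard depends on `len(shard_names)`, so changing the number of instances changes where existing docs would be looked for. That is one way to see why docs have to be moved deliberately when nothing rebalances them for you.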
• All queries are performed against indexes.
• Indexes can be predefined or auto-created.
• Indexing/queries are executed in Lucene.NET.
• Fielded.
• Full text with built-in or custom analyzers.
• Geo-spatial.
• Map-reduce.
• Result transformers can load other docs.
• Query with LINQ or Lucene syntax.
• Indexes may be stale. Can force wait for non-stale results. (Danger! Primarily for unit tests.)
• Projections occur on server, reducing data on the wire.
• Super-cool stuff: eval patching, index scripts.
Raven indexing and querying
• Need indexes up to date before letting a client talk to a replica.
• Indexes are created by the client app:
• Static: CreateIndexes() at startup scans assemblies for index classes.
• Dynamic: when client issues a query.
Indexing catch-22
• Define new index, with no code using it.
• Deploy and allow new index to build.
• Redeploy with code using the new index.
• Redeploy after deleting old index definition.
• Delete old index on each replica.
Updating a static index – a pain
• Operations by Id are consistent (within a single Raven server):
• Load() • Store() • Delete()
• Queries are only eventually consistent (“eventually” is measured in milliseconds)
Consistency
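The difference between the two paths can be shown with a toy store. `TinyStore` is a hypothetical model, not Raven: indexing here is an explicit `run_indexing()` step instead of a background task, and waiting for non-stale results is simulated by running that step before the query.

```python
class TinyStore:
    def __init__(self):
        self.docs = {}        # doc id -> doc; updated immediately
        self.index = {}       # title -> doc id; updated later
        self.pending = []     # doc ids written but not yet indexed

    def store(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.pending.append(doc_id)

    def load(self, doc_id):
        # By-Id access reads the doc storage directly: always current.
        return self.docs.get(doc_id)

    def run_indexing(self):
        # Stand-in for the background indexing task.
        while self.pending:
            doc_id = self.pending.pop(0)
            self.index[self.docs[doc_id]["title"]] = doc_id

    def query_by_title(self, title, wait_for_non_stale=False):
        if wait_for_non_stale:
            self.run_indexing()
        is_stale = bool(self.pending)
        doc_id = self.index.get(title)
        doc = self.docs[doc_id] if doc_id else None
        return doc, is_stale
```

A query issued right after a write can miss the new doc and report the index as stale, while `load()` with the same Id already returns it.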
• Eventual consistency – replication is async in background.
• All replication is one-way and managed by source.
• Can enable transitive replication – useful for new instances.
• Set a W value so a write waits for replication to a minimum number of instances, or until a timeout (v2.5).
• Client will auto-failover to replication destinations, configurable to reads only or reads and writes.
Raven replication
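Client failover can be sketched as a loop over the primary and its replication destinations. `Node`, `FailoverClient`, and the `allow_write_failover` flag are hypothetical names chosen for the example. The real client's configuration and health checks are not shown on these slides.

```python
class Unavailable(Exception):
    pass


class Node:
    """Hypothetical stand-in for one Raven instance."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.docs = {}

    def read(self, doc_id):
        if not self.up:
            raise Unavailable(self.name)
        return self.docs.get(doc_id)

    def write(self, doc_id, doc):
        if not self.up:
            raise Unavailable(self.name)
        self.docs[doc_id] = doc


class FailoverClient:
    def __init__(self, primary, destinations, allow_write_failover=False):
        self.primary = primary
        self.destinations = list(destinations)
        self.allow_write_failover = allow_write_failover

    def load(self, doc_id):
        # Reads always fail over to the replication destinations.
        for node in [self.primary, *self.destinations]:
            try:
                return node.read(doc_id)
            except Unavailable:
                continue
        raise Unavailable("all nodes")

    def store(self, doc_id, doc):
        # Writes fail over only when explicitly allowed.
        candidates = [self.primary]
        if self.allow_write_failover:
            candidates += self.destinations
        for node in candidates:
            try:
                node.write(doc_id, doc)
                return node.name
            except Unavailable:
                continue
        raise Unavailable("no writable node")
```

Allowing writes to fail over keeps the app writable during an outage, at the cost of more potential replication conflicts once the primary returns.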
• Sequential guids.
• Unique for every write to a database.
• Used for caching in client, concurrency control, and replication.
Etags
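As an illustration of the concurrency-control use, a write can carry the etag the client last saw and be rejected if the doc has changed since. `EtagStore` and `ConcurrencyError` are hypothetical names, and a plain integer counter stands in for Raven's sequential guids.

```python
import itertools


class ConcurrencyError(Exception):
    pass


class EtagStore:
    def __init__(self):
        self.docs = {}                      # doc id -> (body, etag)
        self.counter = itertools.count(1)   # stand-in for sequential guids

    def load(self, doc_id):
        return self.docs[doc_id]

    def store(self, doc_id, body, expected_etag=None):
        current = self.docs.get(doc_id)
        if (expected_etag is not None and current is not None
                and current[1] != expected_etag):
            # Someone else wrote since this client last read the doc.
            raise ConcurrencyError(doc_id)
        etag = next(self.counter)           # every write gets a new etag
        self.docs[doc_id] = (body, etag)
        return etag
```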
Source: What’s the last etag I replicated to you?
Destination: 42
Source: I’m up to 49, so here’s a POST with some docs in it.
Destination: Got ‘em.
Source: What’s the last etag I replicated to you?
Destination: 49
The replication conversation
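The conversation above maps onto a small one-way replication loop. This is a sketch under assumptions: `Instance` and its method names are hypothetical, etags are plain integers, and the batch is a function call where the slide shows an HTTP POST.

```python
class Instance:
    def __init__(self, name):
        self.name = name
        self.docs = {}             # doc id -> (etag, body)
        self.next_etag = 1
        self.last_etag_from = {}   # source name -> last etag received

    def write(self, doc_id, body):
        etag = self.next_etag
        self.next_etag += 1
        self.docs[doc_id] = (etag, body)
        return etag

    # Destination side
    def last_replicated_etag(self, source_name):
        # "What's the last etag I replicated to you?"
        return self.last_etag_from.get(source_name, 0)

    def receive(self, source_name, batch):
        for etag, doc_id, body in batch:
            self.write(doc_id, body)                 # local etag assigned
            self.last_etag_from[source_name] = etag  # remember source etag

    # Source side
    def replicate_to(self, destination):
        last = destination.last_replicated_etag(self.name)
        batch = sorted(
            ((etag, doc_id, body)
             for doc_id, (etag, body) in self.docs.items() if etag > last),
            key=lambda item: item[0],
        )
        if batch:
            destination.receive(self.name, batch)
        return len(batch)
```

Because the destination only remembers the last etag it received from each source, the source can resume after an outage by asking the same question again.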
• Replication from each instance to all other instances.
• Any instance could receive writes.
• Reduce replication conflicts by forcing writes to single “master”.
• Handle conflicts in your app or with custom server bundle – in our case, “last in wins” bundle.
Multi-master replication
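A "last in wins" rule can be as small as picking the conflicting version with the newest modification time. The slides do not say how the bundle decides what "last" means, so the `last_modified` field, the `origin` tie-break, and the `Version` type are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    body: dict
    last_modified: float   # assumed: modification time recorded per version
    origin: str            # instance that accepted the write


def last_in_wins(versions):
    """Pick the newest version; break ties by origin so every
    instance resolves the same conflict the same way."""
    return max(versions, key=lambda v: (v.last_modified, v.origin))
```

The tie-break matters: if two instances resolved the same conflict differently, the resolution itself would produce a new conflict.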
• Null Id, and a tag can be extracted from the object: client generates the Id with Hi-Lo.
• Null Id received at server: server assigns a guid.
• Id ending in / received at server: append auto-increment integer.
• Otherwise: use the value in the object.
• Server prefix protects against edge-case failures.
Id generation
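The rules above can be sketched in two parts: a Hi-Lo generator on the client and the fallback rules on the server. `HiLoAuthority`, `HiLoGenerator`, `assign_id_on_server`, and the capacity value are hypothetical, and the server prefix is left out of this sketch.

```python
import uuid


class HiLoAuthority:
    """Hypothetical server side: hands out the next 'hi' value per tag."""

    def __init__(self):
        self.hi = {}

    def next_hi(self, tag):
        self.hi[tag] = self.hi.get(tag, 0) + 1
        return self.hi[tag]


class HiLoGenerator:
    """Client side: one server round-trip per block of `capacity` Ids."""

    def __init__(self, authority, tag, capacity=32):
        self.authority = authority
        self.tag = tag
        self.capacity = capacity
        self.hi = None
        self.lo = capacity          # forces a fetch on first use

    def next_id(self):
        if self.lo >= self.capacity:
            self.hi = self.authority.next_hi(self.tag)
            self.lo = 0
        self.lo += 1
        number = (self.hi - 1) * self.capacity + self.lo
        return f"{self.tag}/{number}"


def assign_id_on_server(doc_id, counters):
    """Server-side rules for Ids that arrive without a final value."""
    if doc_id is None:
        return str(uuid.uuid4())                 # null Id: guid
    if doc_id.endswith("/"):
        counters[doc_id] = counters.get(doc_id, 0) + 1
        return f"{doc_id}{counters[doc_id]}"     # trailing slash: increment
    return doc_id                                # otherwise: use as given
```

Two clients sharing one authority draw from disjoint blocks, so they can generate Ids without coordinating on every write.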
• Control where reads and writes go. Implemented in a custom DocumentStore wrapper.
• Control aggressive caching time.
• Deploy new instances with replication.
• Backup – but probably never restore in production.
• Copy indexes.
• Monitor with stats endpoints.
Raven operations tasks
• Modeling/versioning
• Replication
• Client failover
• Consistency
Keep in mind…
• Concurrency control
• Indexing and updates
• Id generation
• Caching
• http://ravendb.net
• GitHub: http://github.com/ravendb
• Ayende’s blog: http://ayende.com
• RavenDB Google group
• @RavenDB on Twitter
• Me: @jtbennett on Twitter
More info on Raven
Questions?
Many thanks to:
You.
NoSql NOW!
Huge.
Rhinos: @ayende, @synhershko.
Peacocks: @benlakey, @johncoder, @pkdotnet,
Colin Hicks, Peter Durham, Bryan Wheeler.
hugeinc.com 45 Main St. #220 Brooklyn, NY 11201 +1 718 625 4843