Oak, the architecture of Apache Jackrabbit 3


DESCRIPTION

Apache Jackrabbit is just about to reach the 3.0 milestone based on a new architecture called Oak. Based on concepts like eventual consistency and multi-version concurrency control, and borrowing ideas from distributed version control systems and cloud-scale databases, the Oak architecture is a major leap ahead for Jackrabbit. This presentation describes the Oak architecture and shows what it means for the scalability and performance of modern content applications. Changes to existing Jackrabbit functionality are described and the migration process is explained.


Resources

• http://jackrabbit.apache.org/oak/
• Docs: http://jackrabbit.apache.org/oak/docs/
• Code: https://svn.apache.org/repos/asf/jackrabbit/oak/trunk/ and https://github.com/apache/jackrabbit-oak
• Builds: http://ci.apache.org/builders/oak-trunk/ and https://travis-ci.org/apache/jackrabbit-oak

Outline
• Tree model
• Updating the tree
• Refresh and garbage collection
• Concurrency and conflicts
• Interlude: Implementations
• Replicas and sharding
• Access control
• Comparing revisions
• Commit hooks
• Observers
• Search
• Big picture

Tree model

[Figure: a content tree. The root has children a and d; a has children b and c.]

Paths as identifiers: /, /a, /a/b, /a/c, /d

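Below is a minimal sketch of the tree model. It assumes immutable nodes that carry only named children, with properties omitted. The class `TreeNode` and its methods are hypothetical and are not Oak's API. Built from the slide's example tree, it resolves nodes by path and lists the paths /, /a, /a/b, /a/c and /d.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical immutable tree node: nothing but named children.
// A node has no ID of its own; it is identified by its path from the root.
final class TreeNode {
    private final Map<String, TreeNode> children;

    TreeNode(Map<String, TreeNode> children) {
        // Sorted, unmodifiable copy so that iteration order is deterministic.
        this.children = Collections.unmodifiableMap(new TreeMap<>(children));
    }

    static TreeNode leaf() {
        return new TreeNode(Map.of());
    }

    // Resolve an absolute path such as "/a/c"; returns null if no node is there.
    TreeNode resolve(String path) {
        TreeNode current = this;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;          // the leading "/" yields an empty segment
            current = current.children.get(name);
            if (current == null) return null;
        }
        return current;
    }

    // List every path in this (root) subtree: "/", "/a", "/a/b", ...
    List<String> paths() {
        List<String> out = new ArrayList<>();
        collect("/", out);
        return out;
    }

    private void collect(String path, List<String> out) {
        out.add(path);
        for (Map.Entry<String, TreeNode> e : children.entrySet()) {
            String child = path.equals("/") ? "/" + e.getKey() : path + "/" + e.getKey();
            e.getValue().collect(child, out);
        }
    }
}
```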

Updating the tree

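[Figure: updating the tree. Revisions r1 and r2, with HEAD pointing at the latest revision; the diagram compares the nodes at /d and at /a/c in r1 and r2.]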
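One reading of the update figure is a copy-on-write (path-copying) update, which fits the multi-version concurrency control mentioned in the description. Under that reading, a commit that changes /a/c copies /a/c, /a and the root into a new revision r2, and /d stays shared with r1. The sketch below illustrates the idea only. `CowNode` and `RevisionStore` are hypothetical names, and real persistence formats are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical immutable node; an update never changes an existing node.
final class CowNode {
    static final CowNode EMPTY = new CowNode(Map.of());

    final Map<String, CowNode> children;

    CowNode(Map<String, CowNode> children) {
        this.children = Collections.unmodifiableMap(new TreeMap<>(children));
    }

    // Return a new node in which the descendant at segments[depth..] is replaced.
    // Only nodes on that path are copied; all other subtrees are shared by reference.
    CowNode with(String[] segments, int depth, CowNode replacement) {
        if (depth == segments.length) return replacement;
        CowNode oldChild = children.getOrDefault(segments[depth], EMPTY);
        Map<String, CowNode> copy = new TreeMap<>(children);
        copy.put(segments[depth], oldChild.with(segments, depth + 1, replacement));
        return new CowNode(copy);
    }
}

// Hypothetical store: one root node per revision; HEAD is the last one.
final class RevisionStore {
    private final List<CowNode> roots = new ArrayList<>();

    RevisionStore(CowNode initial) {
        roots.add(initial);                        // revision r1
    }

    CowNode head() {
        return roots.get(roots.size() - 1);
    }

    CowNode revision(int n) {                      // r1 is revision(1)
        return roots.get(n - 1);
    }

    // Replace the node at a non-root path such as "/a/c"; returns the new revision number.
    int commit(String path, CowNode replacement) {
        String[] segments = path.substring(1).split("/");
        roots.add(head().with(segments, 0, replacement));
        return roots.size();
    }
}
```

Because r1 is never modified, a reader that still holds r1 keeps seeing a consistent tree while HEAD moves on to r2.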

Refresh and garbage collection

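[Figure: a session is refreshed ("refresh") to a newer revision; older revisions that are no longer referenced become "garbage".]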
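Under the same copy-on-write assumption, garbage collection can be sketched as mark-and-sweep over stored nodes. Once every session has refreshed past an old revision, the nodes reachable only from that revision can be reclaimed. This illustrative sketch uses hypothetical names (`GcNode`, `Gc`). It does not describe how either MicroKernel implementation actually collects garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stored node; object identity stands in for a storage location.
final class GcNode {
    final List<GcNode> children;

    GcNode(GcNode... children) {
        this.children = List.copyOf(Arrays.asList(children));
    }
}

final class Gc {
    // Mark: everything reachable from a retained revision root is live.
    static Set<GcNode> live(Collection<GcNode> retainedRoots) {
        Set<GcNode> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<GcNode> stack = new ArrayDeque<>(retainedRoots);
        while (!stack.isEmpty()) {
            GcNode n = stack.pop();
            if (marked.add(n)) {
                stack.addAll(n.children);      // shared subtrees are traversed only once
            }
        }
        return marked;
    }

    // Sweep: stored nodes that were not marked are garbage.
    static List<GcNode> garbage(Collection<GcNode> stored, Collection<GcNode> retainedRoots) {
        Set<GcNode> marked = live(retainedRoots);
        List<GcNode> result = new ArrayList<>();
        for (GcNode n : stored) {
            if (!marked.contains(n)) result.add(n);
        }
        return result;
    }
}
```

Revisions that a session still reads must stay in the retained set. Otherwise that session's view would be collected underneath it.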

Concurrency and conflicts

[Figure: starting from revision r1, two concurrent commits create r2a and r2b, which are then merged into r3.]

Conflict handling strategies
a. Fully serialized commits
   • fail on conflict, no concurrent updates
b. Partially serialized commits
   • fail on conflict, concurrent conflict-free updates
c. Partial merge logic
   • conflict markers, manual conflict resolution
d. Full merge logic
   • conflicting changes may be lost
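Strategies (a) and (b) can be contrasted in a few lines. The sketch assumes that each commit records the revision it was based on and the set of paths it changes. As a deliberate simplification, two commits count as conflicting only when they change the same path. `Commit` and `CommitLog` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical commit: the revision it was based on and the paths it changes.
record Commit(int baseRevision, Set<String> changedPaths) {}

final class CommitLog {
    // history.get(i) = paths changed by the commit that created revision i + 1
    private final List<Set<String>> history = new ArrayList<>();

    int headRevision() {
        return history.size();                     // revision 0 is the initial tree
    }

    // (a) Fully serialized: a commit not based on the current head fails.
    synchronized int commitFullySerialized(Commit c) {
        if (c.baseRevision() != headRevision()) {
            throw new IllegalStateException("stale base revision " + c.baseRevision());
        }
        history.add(Set.copyOf(c.changedPaths()));
        return headRevision();
    }

    // (b) Partially serialized: concurrent commits succeed unless they change a path
    // that was changed by a commit made after their base revision.
    synchronized int commit(Commit c) {
        for (int rev = c.baseRevision(); rev < history.size(); rev++) {
            for (String p : history.get(rev)) {
                if (c.changedPaths().contains(p)) {
                    throw new IllegalStateException("conflict on " + p);
                }
            }
        }
        history.add(Set.copyOf(c.changedPaths()));
        return headRevision();
    }
}
```

Under (b), two concurrent commits that touch disjoint paths, such as r2a and r2b in the figure, both succeed. Under (a), whichever arrives second fails because it is no longer based on the head.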

Interlude: implementations

MicroKernel/NodeStore
• Implementation of the tree/revision model
• Responsible for: clustering, sharding, caching, conflict handling, etc.
• Not responsible for: type validation, access control, search, versioning, etc.

Current implementations

                         DocumentMK                   TarMK (SegmentMK)
Persistence backends     MongoDB, JDBC (WIP)          Local FS (tar files)
Conflict handling        Partial serialization        Full serialization
Clustering               MongoDB clustering           Simple failover
Sharding                 MongoDB sharding             N/A
Single-node performance  Moderate                     High
Key use cases            Large deployments (>1TB),    Small/medium deployments,
                         concurrent writes            mostly read

Replicas and sharding

Replicas and caches
[Figure: master copy, full replica, cache]

Sharding strategies
[Figure: sharding by path, by level, by hash, and with caching]
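The three sharding strategies can be read as different functions from a path to a shard number. The sketch below is one interpretation only. `ShardSelector` and `ShardSelectors` are hypothetical, and capping deep levels at the last shard is an assumption. It leaves out the caching variant, in which a node owned by another shard would be fetched once and then kept locally.

```java
// Hypothetical mapping from a node path to one of shardCount shards.
interface ShardSelector {
    int shardFor(String path, int shardCount);
}

final class ShardSelectors {
    // By path: every node below the same top-level node lands on the same shard.
    static final ShardSelector BY_PATH = (path, n) -> {
        String[] segments = path.split("/");
        String top = segments.length > 1 ? segments[1] : "";   // "/a/b" -> "a"
        return Math.floorMod(top.hashCode(), n);
    };

    // By level: nodes at the same depth share a shard (deeper levels use the last shard).
    static final ShardSelector BY_LEVEL = (path, n) -> Math.min(depth(path), n - 1);

    // By hash: each node is placed independently by hashing its full path.
    static final ShardSelector BY_HASH = (path, n) -> Math.floorMod(path.hashCode(), n);

    // "/" has depth 0, "/a" depth 1, "/a/b" depth 2, ...
    static int depth(String path) {
        return path.equals("/") ? 0 : path.split("/").length - 1;
    }
}
```

The trade-off is locality versus balance. By path keeps whole subtrees together, which helps traversal. By hash spreads load evenly but scatters siblings across shards.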

Access control

Accessible paths: /, /a/b, /d

Existentialism

• All (syntactically valid) paths can be traversed
• But the identified node might not exist
• For example:
    root.getChildNode("a").exists() -> false
    root.getChildNode("a").getChildNode("b").exists() -> true!
• Implemented as a decorator over the MK
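The "existentialism" rule maps naturally onto a decorator. Traversal always succeeds, and `exists()` combines the underlying answer with a permission check. In this sketch the permission model is simply the set of readable paths from the slide (/, /a/b, /d). `NodeView`, `PlainNode` and `SecureNode` are hypothetical types, not Oak's interfaces.

```java
import java.util.Set;

// Hypothetical node interface: getChildNode() always succeeds, so any syntactically
// valid path can be traversed; exists() tells whether a node is actually there.
interface NodeView {
    NodeView getChildNode(String name);
    boolean exists();
}

// Underlying, unrestricted view backed by the set of existing paths.
final class PlainNode implements NodeView {
    private final Set<String> existing;
    final String path;

    PlainNode(Set<String> existing, String path) {
        this.existing = existing;
        this.path = path;
    }

    @Override
    public PlainNode getChildNode(String name) {   // covariant return type
        return new PlainNode(existing, path.equals("/") ? "/" + name : path + "/" + name);
    }

    @Override
    public boolean exists() {
        return existing.contains(path);
    }
}

// Access-control decorator: traversal is delegated unchanged, but a node only
// exists for this session if it exists underneath AND its path is readable.
final class SecureNode implements NodeView {
    private final PlainNode delegate;
    private final Set<String> readable;

    SecureNode(PlainNode delegate, Set<String> readable) {
        this.delegate = delegate;
        this.readable = readable;
    }

    @Override
    public SecureNode getChildNode(String name) {
        return new SecureNode(delegate.getChildNode(name), readable);
    }

    @Override
    public boolean exists() {
        return delegate.exists() && readable.contains(delegate.path);
    }
}
```

With those inputs, `root.getChildNode("a").exists()` is false and `root.getChildNode("a").getChildNode("b").exists()` is true, which matches the example above.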

Comparing revisions

What changed?

Content diff

• Tells what changed between two content trees
• Cornerstone of most higher-level functionality
  • validation
  • indexing
  • observation
  • etc.

[Figure: revision r1, concurrent revisions r2a and r2b, and the merged revision r3]

Examples

• r1 -> r3: "a" modified, "b" removed, "d" modified, "e" added
• r1 -> r2a: "a" modified, "b" removed
• r1 -> r2b: "d" modified, "e" added
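A content diff over immutable trees can skip any subtree that both revisions share by reference. That shortcut is what makes the diff cheap enough to serve as a cornerstone for validation, indexing and observation. The sketch below uses hypothetical `DiffNode` and `ContentDiff` types in which nodes carry only children. It reports a node as modified when anything beneath it changed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical immutable node; in this sketch a node carries only named children.
record DiffNode(Map<String, DiffNode> children) {
    DiffNode {
        children = Collections.unmodifiableMap(new TreeMap<>(children));
    }

    static DiffNode leaf() {
        return new DiffNode(Map.of());
    }
}

final class ContentDiff {
    // Compare two trees and report "<path> added|removed|modified" entries.
    static List<String> diff(DiffNode before, DiffNode after) {
        List<String> out = new ArrayList<>();
        compare("/", before, after, out);
        return out;
    }

    private static void compare(String path, DiffNode before, DiffNode after, List<String> out) {
        if (before == after) return;                    // shared subtree: nothing changed below
        for (String name : before.children().keySet()) {
            if (!after.children().containsKey(name)) out.add(child(path, name) + " removed");
        }
        for (Map.Entry<String, DiffNode> e : after.children().entrySet()) {
            DiffNode old = before.children().get(e.getKey());
            String p = child(path, e.getKey());
            if (old == null) {
                out.add(p + " added");
            } else if (old != e.getValue() && !old.equals(e.getValue())) {
                out.add(p + " modified");               // something in this subtree changed
                compare(p, old, e.getValue(), out);
            }
        }
    }

    private static String child(String path, String name) {
        return path.equals("/") ? "/" + name : path + "/" + name;
    }
}
```

Take a tree in which r3 removes "b" from under "a" and adds "e" under "d" (placing "e" there is an assumption). Comparing r1 with r3 then yields "/a modified", "/a/b removed", "/d modified" and "/d/e added".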

Commit hooks

If this changed, commit this instead

Commit hooks

• Based on given before and after states, a hook can:
  • fail the commit, or
  • pass the commit unmodified, or
  • pass the commit with modifications
• Key plugin mechanism in Oak
  • All configured hooks are applied in sequence
  • Used for much of the higher-level functionality
• Often implemented using a content diff

Examples

• All kinds of validation
  • node types, access control, references, etc.
• Trigger-like functionality
  • autocreated content, default values, etc.
• In-content index updates
• etc.
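The hook contract can be sketched with a deliberately simplified content model, a flat map from path to value. Under that contract a hook fails the commit, passes it unmodified, or passes it with modifications, and hooks run in sequence. `SimpleHook`, `CommitFailed`, `HookChain` and the two example hooks are hypothetical and are not Oak's `CommitHook` API. The "/status" default is invented purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Thrown by a hook to fail the whole commit.
final class CommitFailed extends Exception {
    CommitFailed(String message) {
        super(message);
    }
}

// Hypothetical, simplified hook over a flat path -> value content model.
interface SimpleHook {
    // Return `after` to pass unmodified, a different map to pass with modifications,
    // or throw CommitFailed to fail the commit.
    Map<String, String> process(Map<String, String> before, Map<String, String> after)
            throws CommitFailed;
}

final class HookChain {
    // Hooks run in sequence; each one sees the output of the previous one.
    static Map<String, String> apply(List<SimpleHook> hooks, Map<String, String> before,
                                     Map<String, String> after) throws CommitFailed {
        Map<String, String> current = after;
        for (SimpleHook hook : hooks) {
            current = hook.process(before, current);
        }
        return current;
    }
}

final class ExampleHooks {
    // Trigger-like hook: add a default value for a hypothetical "/status" entry.
    static final SimpleHook DEFAULTS = (before, after) -> {
        if (after.containsKey("/status")) return after;
        Map<String, String> modified = new TreeMap<>(after);
        modified.put("/status", "new");
        return modified;
    };

    // Validating hook: rejects empty values and never modifies the commit.
    static final SimpleHook NO_EMPTY_VALUES = (before, after) -> {
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (e.getValue().isEmpty()) throw new CommitFailed("empty value at " + e.getKey());
        }
        return after;
    };
}
```

Order matters in the chain. Placing the validator after the defaults hook means the validator also checks the values the defaults hook adds.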

Types of hooks

                     CommitHook   Editor      Validator
Content diff         Optional     Always      Always
Can modify commit    Yes          Yes         No
Programming model    Simple       Callbacks   Callbacks
Performance impact   High         Medium      Low

Observers


• Based on given before and after states, an observer can:
  • observe what changed in the content tree
• Invoked after the commit, unlike commit hooks
  • Always asynchronous for changes from other cluster nodes
  • Depending on backend, can be synchronous for changes on the local cluster node
• Often implemented using a content diff

Examples

• JCR Observation
• External index updates
• Cache invalidation
• Logging
• etc.
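Unlike a hook, an observer runs only after the commit and can neither veto nor change it. The sketch below delivers notifications on a single background thread, so local observers are asynchronous and still see commits in order. `ContentObserver` and `ObservedStore` are hypothetical names, and the flat path-to-value content model is a simplification.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical observer callback, given the before and after states of a commit.
interface ContentObserver {
    void contentChanged(Map<String, String> before, Map<String, String> after);
}

// Hypothetical store with a flat path -> value content model.
final class ObservedStore {
    private final List<ContentObserver> observers = new CopyOnWriteArrayList<>();
    // A single dispatcher thread keeps notifications in commit order.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    private Map<String, String> head = Map.of();

    void addObserver(ContentObserver o) {
        observers.add(o);
    }

    // The new state becomes HEAD first; observers are notified afterwards and
    // asynchronously, so they can neither veto nor modify the commit.
    synchronized void commit(Map<String, String> newState) {
        Map<String, String> before = head;
        Map<String, String> after = Map.copyOf(newState);
        head = after;
        for (ContentObserver o : observers) {
            dispatcher.execute(() -> o.contentChanged(before, after));
        }
    }

    // Stop accepting notifications and wait for pending ones to be delivered.
    void shutdown() throws InterruptedException {
        dispatcher.shutdown();
        dispatcher.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```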

Search

[Figure: the query engine. Query strings such as "SELECT … WHERE x=y" or "/a//*" are handled by one of several parsers and evaluated against one of several indexes.]

Query processing steps

1. Parsing
   a. Select matching parser
   b. Parse the query string
2. Execution
   a. Estimate cost per index
   b. Select index with the least cost estimate
   c. Execute the query against the index
3. Post-processing
   a. Filter results on access control and additional constraints
   b. Apply sorting, grouping, faceting, etc.
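The execution and post-processing steps amount to cost-based index selection followed by filtering and sorting. The sketch below skips parsing and takes an already parsed, single-condition `Query`. Its cost model is made up: `ToyIndex` returns a fixed cost for the properties it covers and infinity for everything else. All type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical parsed query with a single "property = value" condition.
record Query(String property, String value) {}

interface QueryIndex {
    String name();
    double cost(Query q);              // estimated cost; POSITIVE_INFINITY if unusable
    List<String> execute(Query q);     // matching paths, before access control
}

// Hypothetical index over in-memory content (path -> properties) with a toy cost model.
record ToyIndex(String name, double baseCost, Set<String> coveredProperties,
                Map<String, Map<String, String>> content) implements QueryIndex {
    public double cost(Query q) {
        return coveredProperties.contains(q.property()) ? baseCost : Double.POSITIVE_INFINITY;
    }

    public List<String> execute(Query q) {
        List<String> hits = new ArrayList<>();
        content.forEach((path, props) -> {
            if (q.value().equals(props.get(q.property()))) hits.add(path);
        });
        return hits;
    }
}

final class QueryEngine {
    private final List<QueryIndex> indexes;

    QueryEngine(List<QueryIndex> indexes) {
        this.indexes = List.copyOf(indexes);
    }

    // Execution, steps a and b: estimate the cost per index and pick the cheapest.
    QueryIndex choose(Query q) {
        return indexes.stream()
                .min(Comparator.comparingDouble((QueryIndex i) -> i.cost(q)))
                .orElseThrow();
    }

    // Step c plus post-processing: run on the chosen index, filter on access control, sort.
    List<String> run(Query q, Predicate<String> canRead) {
        List<String> result = new ArrayList<>();
        for (String path : choose(q).execute(q)) {
            if (canRead.test(path)) result.add(path);
        }
        Collections.sort(result);
        return result;
    }
}
```

Suppose the engine has a cheap index that covers property x and an expensive traversal that covers everything. A query on x then uses the cheap index, and a query on an uncovered property falls back to traversal.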

Index implementations

• Property index
• Reference index
• Lucene index
  • in-content
  • local file system
• Solr index
  • embedded
  • external

Big picture

[Figure: layered architecture, from bottom to top: MicroKernel, NodeStore API, Oak Core with Plugins, Oak API, Oak JCR, JCR API.]

Questions?
