Flexible search in Apache Jackrabbit Oak Tommaso Teofili

DESCRIPTION

ApacheCon EU 2014 presentation about the flexible architecture for search in Apache Jackrabbit Oak.

Page 1: Flexible search in Apache Jackrabbit Oak

Flexible search in Apache Jackrabbit Oak

Tommaso Teofili

Apache Jackrabbit Oak

•  Scalable content repository
•  JCR 2.0
•  Designed for concurrent access (MVCC)
•  Pluggable components (storage, indexes)
•  Powering AEM 6.0

18/11/14   2  

Oak Architecture

•  Oak-JCR
•  Oak-Core
   – MVCC (node states and immutable trees)
   – Core components (Security, Query engine, …)
   – Plugins
•  Oak-MK
   – Pluggable storage

Oak – the Query Engine

•  Query languages
   – XPath
   – SQL-2
•  Selects the index(es) expected to perform best
   – Search is delegated to the underlying indexes
   – No index? The repository is traversed
•  ACLs are applied afterwards
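The selection step above can be sketched in plain Java. This is a simplified model under stated assumptions, not Oak's query engine: `CostedIndex`, `select`, `applyAcl`, and the idea of comparing each index's estimated cost against a traversal cost are hypothetical names and simplifications introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Simplified model of index selection; these are NOT Oak's real classes. */
class IndexSelectionSketch {

    /** Hypothetical stand-in for an index that can estimate a query's cost. */
    static final class CostedIndex {
        final String name;
        final double cost; // Double.POSITIVE_INFINITY = cannot answer the query

        CostedIndex(String name, double cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    /** Picks the cheapest index; falls back to traversal when nothing is cheaper. */
    static String select(List<CostedIndex> indexes, double traversalCost) {
        String best = "traverse";
        double bestCost = traversalCost;
        for (CostedIndex index : indexes) {
            if (index.cost < bestCost) {
                best = index.name;
                bestCost = index.cost;
            }
        }
        return best;
    }

    /** Access control runs after the index has returned its candidate paths. */
    static List<String> applyAcl(List<String> candidates, Predicate<String> canRead) {
        List<String> visible = new ArrayList<>();
        for (String path : candidates) {
            if (canRead.test(path)) {
                visible.add(path);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<CostedIndex> indexes = Arrays.asList(
                new CostedIndex("property", 10.0),
                new CostedIndex("lucene", 5.0));
        System.out.println(select(indexes, 1000.0));
        System.out.println(applyAcl(Arrays.asList("/a", "/secret/b"),
                path -> !path.startsWith("/secret")));
    }
}
```

Filtering after the index lookup keeps the indexes unaware of permissions, at the price of fetching candidates that may later be dropped.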

Indexing – the IndexEditor API

•  NodeState before = builder.getNodeState();
•  builder.child("a").setProperty("foo", "bar");
•  NodeState after = builder.getNodeState();
•  NodeState indexed = editorHook.processCommit(before, after, …); // who said MVCC?
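The statements above take one immutable snapshot, apply a change, take a second snapshot, and hand both to the editor hook. A minimal sketch of the diff such a hook could work from is shown here, assuming a flat map from property path to value in place of Oak's `NodeState` trees. `CommitDiffSketch` and `diff` are hypothetical names, not part of Oak.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Simplified before/after diff; a flat map stands in for Oak's node state trees. */
class CommitDiffSketch {

    /** Returns path -> "added" | "changed" | "removed", sorted by path. */
    static Map<String, String> diff(Map<String, String> before, Map<String, String> after) {
        Map<String, String> changes = new TreeMap<>();
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String old = before.get(entry.getKey());
            if (old == null) {
                changes.put(entry.getKey(), "added");
            } else if (!old.equals(entry.getValue())) {
                changes.put(entry.getKey(), "changed");
            }
        }
        for (String path : before.keySet()) {
            if (!after.containsKey(path)) {
                changes.put(path, "removed");
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        Map<String, String> after = new HashMap<>(before); // the "before" snapshot is never mutated
        after.put("/a/foo", "bar");
        System.out.println(diff(before, after));
    }
}
```

An index editor only has to react to the entries in this diff, so the cost of indexing a commit follows the size of the change rather than the size of the repository.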

Searching – the QueryIndex API

•  Filter filter = … ; // "select * from [nt:folder]"
•  filter.restrictPath("/somenode", Filter.PathRestriction.DIRECT_CHILDREN);
•  Cursor cursor = queryIndex.query(filter, nodeState); // search against a state
•  IndexRow row = cursor.next(); // results

Searching – Filters

•  Full text expressions
•  Property restrictions
•  Path restrictions
   – Exact
   – Parent
   – Child
   – Descendant
•  Node type restrictions
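The four path restriction kinds listed above can be illustrated with a small matcher. This is a sketch under assumptions: the enum below uses the slide's wording rather than Oak's `Filter.PathRestriction` constants, and the semantics (for example, that "Parent" matches the parent of the restricted path) are one reading of the list, not taken from Oak's code.

```java
/** Illustrative path-restriction matcher; the enum is hypothetical, not Oak's. */
class PathRestrictionSketch {

    enum Restriction { EXACT, PARENT, CHILD, DESCENDANT }

    /** Returns the parent path, or null for the root "/". */
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int lastSlash = path.lastIndexOf('/');
        return lastSlash == 0 ? "/" : path.substring(0, lastSlash);
    }

    /** Tests whether candidate satisfies the restriction relative to base. */
    static boolean matches(Restriction restriction, String base, String candidate) {
        switch (restriction) {
            case EXACT:
                return candidate.equals(base);
            case PARENT:
                return candidate.equals(parentOf(base));
            case CHILD:
                return base.equals(parentOf(candidate));
            case DESCENDANT: {
                String prefix = base.equals("/") ? "/" : base + "/";
                return candidate.startsWith(prefix) && !candidate.equals(base);
            }
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Restriction.CHILD, "/somenode", "/somenode/a"));
        System.out.println(matches(Restriction.DESCENDANT, "/somenode", "/somenode/a/b"));
    }
}
```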

Configuring indexes

•  Indexes are declared by adding “query index configuration” nodes in the repository
   – Type
   – Asynchronous
   – Reindex
   – Index-specific properties
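As a rough illustration, the properties of such a configuration node can be modelled as a map. The property names used here (`jcr:primaryType` set to `oak:QueryIndexDefinition`, `type`, `async`, `reindex`) follow Oak's index definition conventions as best recalled and should be checked against the Oak documentation. The helper `definition` and the class name are hypothetical, and a real definition would be created as a node in the repository rather than as a Java map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Models an index definition node as a property map (illustration only). */
class IndexDefinitionSketch {

    /**
     * Builds the properties of a definition node.
     * indexSpecific carries whatever extra properties the chosen index type reads.
     */
    static Map<String, Object> definition(String type, boolean async,
                                          Map<String, ?> indexSpecific) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("jcr:primaryType", "oak:QueryIndexDefinition");
        properties.put("type", type);
        if (async) {
            properties.put("async", "async"); // assumed value; verify against the Oak docs
        }
        properties.put("reindex", Boolean.TRUE); // request a (re)build of the index
        properties.putAll(indexSpecific);
        return properties;
    }

    public static void main(String[] args) {
        // "propertyNames" is the index-specific setting assumed for a property index.
        System.out.println(definition("property", false,
                Collections.singletonMap("propertyNames", "foo")));
    }
}
```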

In-repository indexes

•  Data structures designed as content
   – Property index
   – Ordered property index
   – Node type index
   – Reference index
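"Designed as content" means the index is itself a tree of entries stored next to the data it indexes. A minimal in-memory sketch of a property index follows, assuming a two-level layout of property value to node paths. The layout, the class, and its methods are illustrative assumptions, not Oak's storage format.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for a property index kept as a value -> paths tree. */
class PropertyIndexSketch {

    private final Map<String, SortedSet<String>> entries = new TreeMap<>();

    /** Records that the node at path carries the indexed property with this value. */
    void put(String value, String path) {
        entries.computeIfAbsent(value, key -> new TreeSet<>()).add(path);
    }

    /** Removes one entry and prunes the value branch once it is empty. */
    void remove(String value, String path) {
        SortedSet<String> paths = entries.get(value);
        if (paths != null) {
            paths.remove(path);
            if (paths.isEmpty()) {
                entries.remove(value);
            }
        }
    }

    /** Returns the paths of all nodes whose indexed property equals value. */
    SortedSet<String> lookup(String value) {
        SortedSet<String> paths = entries.get(value);
        if (paths == null) {
            return Collections.unmodifiableSortedSet(new TreeSet<String>());
        }
        return Collections.unmodifiableSortedSet(paths);
    }

    public static void main(String[] args) {
        PropertyIndexSketch index = new PropertyIndexSketch();
        index.put("bar", "/a");
        index.put("bar", "/b");
        System.out.println(index.lookup("bar"));
    }
}
```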

Lucene index

•  Full text and (sorted) property restrictions
•  Stored in repository
•  Tika for indexing binaries
•  Configurable indexing rules (boost), codec, analyzers

Lucene index

•  Interesting facts
   – DocValues for sorted property restrictions
   – Uncompressed stored fields
   – Property exists queries
      •  TermRange vs Wildcard vs Term vs MatchAll + FieldExistsFilter

Solr index

•  Full text, property, path restrictions
•  Embedded or remote Solr(Cloud)
•  Configurable
   – Mapping restriction / fields
   – Page size
   – Commit policy
•  Most configuration is done on the Solr side

Problems

•  Hard to express complex queries
•  Cannot leverage the advanced capabilities of the underlying indexes

Native language support

•  Leverage underlying index capabilities
   – Multiple query languages/parsers
•  More accurate full text queries (and results)
   – … where native('lucene', 'name:(hello world) "hello world"^3')
•  Advanced index capabilities (e.g. MLT)
   – … where native('solr', 'mlt?q=path:/content/sample1&mlt.fl=jcr:title')
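Because the native query travels inside a quoted SQL-2 string literal, any single quote it contains has to be escaped. The helper below is a sketch that builds only the `native(...)` condition and doubles single quotes, which is the usual SQL-style escaping. `NativeConditionSketch` and `nativeCondition` are hypothetical names, and the rest of the statement (the elided part before `where`) is deliberately left to the caller.

```java
/** Builds the native(...) condition of a query; the surrounding statement is up to the caller. */
class NativeConditionSketch {

    /** Doubles single quotes so the text is safe inside a '...' literal. */
    static String escape(String text) {
        return text.replace("'", "''");
    }

    /** Returns e.g. native('lucene', '<query>') with both arguments escaped. */
    static String nativeCondition(String indexType, String nativeQuery) {
        return "native('" + escape(indexType) + "', '" + escape(nativeQuery) + "')";
    }

    public static void main(String[] args) {
        System.out.println(nativeCondition("lucene",
                "name:(hello world) \"hello world\"^3"));
        System.out.println(nativeCondition("solr",
                "mlt?q=path:/content/sample1&mlt.fl=jcr:title"));
    }
}
```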

Adding more indexes

•  Create an IndexEditor
   – Turn a diff into an “indexable”
•  Create a QueryIndex
   – Turn a Filter into an index-specific query
•  “Declare” the index

Looking forward

•  Results aggregation features (e.g. facets)
•  More configuration options (Lucene, Solr)
•  Smarter index selection
•  Covering indexes

Thanks