Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch

Round 2

Battle of the Giants

Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com

Apache Solr VS ElasticSearch

I am a…

Sematext consultant & engineer
Solr Cookbook series author
„ElasticSearch Server” author
„Mastering ElasticSearch” author
Solr.pl co-founder
Father and husband

Copyright 2013 Sematext Group. Inc. All rights reserved

Under the Hood

Apache Solr: Lucene 4.3
ElasticSearch: Lucene 4.3

Expectations

Scalability
Fault tolerance
High availability
Features
Manageability
Ease of installation
Tools Support

Expectations vs Reality

Only ElasticSearch nodes
Single leader

Solr + ZooKeeper
Leader per shard

Distributed
Fault tolerant

Automatic leader election

All Time Top Committers

Active Contributors

The Code

The Mailing Lists

Trends

Collection vs Index

Collections and Indices can be spread among different nodes in the cluster

Apache Solr: Collection – main logical index

ElasticSearch: Index – main logical structure

Apache Solr Index Structure

Fields and types defined in schema
Automatic value copying
Dynamic fields
Custom similarity
Custom postings format
Multiple document types require shared schema
Can be read using API

ElasticSearch Index Structure

Schema-less
Fields and types defined with HTTP API
Multi-field support
Nested and parent-child documents
Custom similarity
Custom postings format
Multiple documents with different structure
Can be read and written using API

Shards and Replicas

Many shards
0 or more replicas
Replica can become leader
Replicas can be created on live cluster

Configuration

Apache Solr:
Static in solrconfig.xml
Can be reloaded with core reload

ElasticSearch:
Static in elasticsearch.yml
Changeable at runtime

Discovery

ElasticSearch: Zen Discovery
Apache Solr: Apache ZooKeeper

Solr & ZooKeeper

Requires additional software
Prevents split-brain situations
Holds collections configurations
ZooKeeper ensemble needed
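
Why an ensemble helps against split-brain: a ZooKeeper ensemble only makes progress while a strict majority of its nodes can talk to each other, so two disconnected halves can never both act as the cluster. The sketch below is only the majority arithmetic, not ZooKeeper code; the function names are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print the majority size and failure budget for a few ensemble sizes.
    for size in (1, 3, 4, 5):
        print(size, quorum_size(size), tolerated_failures(size))
```

Note that going from 3 to 4 nodes does not buy any extra failure tolerance, which is why odd ensemble sizes are the usual choice.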

ElasticSearch Zen Discovery

Automatic node discovery
Multicast and unicast discovery methods
Automatic master detection
Two-way failure detection

HTTP FTW

HTTP REST API in ElasticSearch, or query string for simple queries
HTTP with query string in Apache Solr
Both provide a specialized Java API

Results Grouping

Group on:
  field value
  query result
  function query
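
A rough sketch of how the three grouping modes map onto Solr request parameters (group.field, group.query, group.func). The host and handler path follow the examples elsewhere in this deck; the field names and ranges are invented for illustration, and the helper itself is hypothetical.

```python
from urllib.parse import urlencode


def grouping_url(base_url, query, field=None, group_queries=(), func=None):
    """Build a Solr select URL that asks for grouped results."""
    params = [("q", query), ("group", "true")]
    if field is not None:
        params.append(("group.field", field))        # group on field value
    for group_query in group_queries:
        params.append(("group.query", group_query))  # group on query result
    if func is not None:
        params.append(("group.func", func))          # group on function query
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(grouping_url(
        "http://localhost:8983/solr/select",
        "color:Yellow",
        field="category",                 # hypothetical field
        group_queries=["price:[0 TO 10]", "price:[10 TO *]"],
    ))
```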

Prospective Search

Called Percolator
Matches documents to stored queries
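
The idea of prospective search is easiest to see in a toy model: queries are stored first, and each incoming document is checked against them. The class below is an illustration only; it is not how ElasticSearch implements the Percolator, and "all terms must appear" is a simplifying assumption standing in for real query matching.

```python
class ToyPercolator:
    """Stores named queries and reports which ones a document matches."""

    def __init__(self):
        self.queries = {}  # query name -> list of required lowercase terms

    def register(self, name, *terms):
        self.queries[name] = [term.lower() for term in terms]

    def percolate(self, doc_text):
        # Whitespace tokenization is a stand-in for real text analysis.
        words = set(doc_text.lower().split())
        return sorted(
            name
            for name, terms in self.queries.items()
            if all(term in words for term in terms)
        )


if __name__ == "__main__":
    percolator = ToyPercolator()
    percolator.register("solr-news", "solr")
    percolator.register("es-shards", "elasticsearch", "shards")
    print(percolator.percolate("ElasticSearch moves shards between nodes"))
```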

Full Text Search Capabilities

Variety of queries
Control score calculation
Different query parsers
Advanced Lucene queries

Score Calculation

Leverage Lucene scoring
Control importance of:
  documents
  queries
  terms
  phrases
Similarity configuration

Apache Solr and Score Influence

Index-time boosting
Query-time

Term boosts
Field boosts
Phrase boosts
Function queries
Sub-queries used for boosting
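
One possible way the query-time controls above come together in a single Solr request, assuming the extended dismax parser (defType=edismax). The parameters qf, pf, bf and bq are real edismax parameters; every field name (title, body, popularity, category) is hypothetical.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query):
    """Return edismax parameters that exercise each kind of boost."""
    return [
        ("defType", "edismax"),
        ("q", user_query),            # term boosts go inline, e.g. "solr^2 search"
        ("qf", "title^3 body"),       # field boosts
        ("pf", "title^5"),            # phrase boost
        ("bf", "log(popularity)"),    # function query used for boosting
        ("bq", "category:books^2"),   # sub-query used for boosting
    ]


if __name__ == "__main__":
    print("select?" + urlencode(boosted_query_params("solr^2 search")))
```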

ElasticSearch and Score Influence

Index-time
Query-time

Different queries provide different boost controls
Can calculate distributed term frequencies
Negative and positive boosting queries
Custom score filters

Scripts

ElasticSearch Query Rescore

Reorders top N hits by using another query
Executed on shards before results are returned to the node handling the request
Not executed with scan and count
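
A toy model of what rescoring does on a single shard: only the top N hits are re-scored with a second, usually more expensive, scoring function and then re-sorted; everything below the window keeps its place. The weighted-sum combination and all names here are assumptions made for the sketch, not ElasticSearch internals.

```python
def rescore(hits, window_size, rescore_fn, query_weight=1.0, rescore_weight=1.0):
    """Reorder the first window_size hits using a second score.

    hits is a list of (doc_id, score) pairs sorted by score, best first.
    rescore_fn maps a doc_id to its score under the second query.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, query_weight * score + rescore_weight * rescore_fn(doc_id))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Hits outside the window are returned untouched, after the window.
    return rescored + rest


if __name__ == "__main__":
    first_pass = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
    phrase_bonus = {"b": 5.0}  # pretend only "b" matches the second query
    print(rescore(first_pass, 2, lambda doc_id: phrase_bonus.get(doc_id, 0.0)))
```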

ElasticSearch Nested Objects

Indexed as separate documents
Stored in the same part of index as root doc
Hidden from standard queries and filters
Need appropriate queries and filters (nested)
Top level documents can be sorted on the basis of nested ones
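
A minimal sketch of the two pieces involved: a mapping that marks a field as nested, and a nested query that reaches into it. The type and field names (post, comments, author, stars) are hypothetical, and the "string" field type reflects the ElasticSearch versions of this talk's era. The code only builds the JSON bodies; it does not contact a server.

```python
import json

# Mapping for a hypothetical "post" type whose comments are nested objects.
NESTED_MAPPING = {
    "post": {
        "properties": {
            "title": {"type": "string"},
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "string"},
                    "stars": {"type": "integer"},
                },
            },
        }
    }
}


def nested_query(path, inner_query):
    """Wrap inner_query so it runs against the nested objects under path."""
    return {"query": {"nested": {"path": path, "query": inner_query}}}


if __name__ == "__main__":
    body = nested_query("comments", {"term": {"comments.author": "rafal"}})
    print(json.dumps(body, indent=2))
```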

Solr Parent – Child Relationship

Used at query time
Multi-core joins possible

select?q={!join from=parent to=id}color:Yellow
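
The join above can be assembled programmatically. This is a small sketch of a helper (hypothetical name) that builds the same local-params string; the optional fromIndex parameter is how the join parser is pointed at another core, and the core name used below is invented.

```python
from urllib.parse import quote


def join_query(from_field, to_field, inner_query, from_index=None):
    """Build a Solr {!join} query string."""
    local_params = "!join from=%s to=%s" % (from_field, to_field)
    if from_index is not None:
        # Join against documents that live in a different core.
        local_params += " fromIndex=%s" % from_index
    return "{" + local_params + "}" + inner_query


if __name__ == "__main__":
    q = join_query("parent", "id", "color:Yellow")
    print("select?q=" + quote(q))
    print(join_query("parent", "id", "color:Yellow", from_index="children"))
```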

ElasticSearch Parent – Child

Proper indexing required
Indexed as separate documents
Standard queries don’t return child documents
Retrieve parent docs using queries and filters (has_child, has_parent, top_children)

Filters

Used to narrow down query results

Good candidates for caching and reuse

Apache Solr:
Additive
Can use different query parsers
Can use local params
Narrows down faceting results

ElasticSearch:
Defined using Query DSL
Can be used for score calculation
Doesn’t narrow down faceting results
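
Why filters are good candidates for caching, and what "additive" means, in a toy model: each filter resolves to a set of matching document ids that can be stored once and reused, and applying several filters is just intersecting those sets. This is an illustration under simplified assumptions (exact-match filters, in-memory dict), not either server's filter cache.

```python
class FilterCache:
    """Caches the doc-id set of each (field, value) filter for reuse."""

    def __init__(self, docs):
        self.docs = docs      # doc id -> dict of field values
        self.cache = {}       # (field, value) -> frozenset of doc ids
        self.misses = 0       # how many filters had to be computed

    def doc_set(self, field, value):
        key = (field, value)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items()
                if doc.get(field) == value
            )
        return self.cache[key]

    def apply(self, candidates, filters):
        # Additive filters: every extra filter narrows the result further.
        result = set(candidates)
        for field, value in filters:
            result &= self.doc_set(field, value)
        return result


if __name__ == "__main__":
    docs = {1: {"color": "yellow", "size": "L"},
            2: {"color": "yellow", "size": "S"},
            3: {"color": "red", "size": "L"}}
    cache = FilterCache(docs)
    print(cache.apply({1, 2, 3}, [("color", "yellow"), ("size", "L")]))
    print(cache.apply({1, 2, 3}, [("color", "yellow")]), cache.misses)
```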

Faceting

Terms
Range & query
Terms statistics
Spatial distance

Pivot
Histograms

Real Time Or Not ?

Get not yet indexed docs from transaction log
Don’t need searcher reopening

ElasticSearch: separate Get and Multi Get API

Apache Solr: separate Realtime Get Handler
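
A toy model of the real-time get idea: a lookup by id consults the transaction log first, so a document is retrievable before it becomes visible to search. The class and method names are invented for this sketch and the "commit" step is a stand-in for whatever makes new documents searchable.

```python
class ToyRealtimeIndex:
    """Documents are gettable immediately but searchable only after commit."""

    def __init__(self):
        self.tlog = {}         # received but not yet searchable
        self.searchable = {}   # visible to search

    def add(self, doc_id, doc):
        self.tlog[doc_id] = doc

    def commit(self):
        # Make everything in the transaction log searchable.
        self.searchable.update(self.tlog)
        self.tlog.clear()

    def get(self, doc_id):
        # Real-time get: the transaction log wins over the searchable index.
        if doc_id in self.tlog:
            return self.tlog[doc_id]
        return self.searchable.get(doc_id)

    def search_ids(self):
        return sorted(self.searchable)


if __name__ == "__main__":
    index = ToyRealtimeIndex()
    index.add("12345", {"enabled": True})
    print(index.get("12345"), index.search_ids())
    index.commit()
    print(index.get("12345"), index.search_ids())
```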

Data Handling

Single and batch indexing supported

ElasticSearch: JSON in / JSON out (and YAML)

Apache Solr: different formats allowed (XML, JSON, CSV, binary)

Partial Document Updates

Not based on LUCENE-3837
Server-side doc reindexing
Both servers use versioning
Decreases network traffic
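
What "server-side reindexing with versioning" amounts to, as a toy: the client sends only the changed fields, the server merges them into the stored document, reindexes the whole thing and bumps the version, rejecting the update if the client's expected version is stale. All names are made up; this is a sketch of the general idea, not either server's code.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is out of date."""


class ToyStore:
    def __init__(self):
        self.docs = {}  # doc id -> (version, source dict)

    def index(self, doc_id, source):
        """Store a full document and return its new version."""
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, dict(source))
        return version

    def partial_update(self, doc_id, changes, expected_version=None):
        """Merge changes into the stored document and reindex it."""
        version, source = self.docs[doc_id]
        if expected_version is not None and expected_version != version:
            raise VersionConflict("expected %s, found %s" % (expected_version, version))
        merged = dict(source)
        merged.update(changes)  # only the changed fields crossed the network
        return self.index(doc_id, merged)


if __name__ == "__main__":
    store = ToyStore()
    store.index("12345", {"title": "test", "enabled": False})
    print(store.partial_update("12345", {"enabled": True}, expected_version=1))
    print(store.docs["12345"])
```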

Apache Solr Partial Doc Update

Sent to the standard update handler
Requires _version_ field

curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]'
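
The JSON body in the curl call above can be generated rather than hand-written. A small sketch, assuming the Solr 4 update modifiers set, add and inc; only set appears in the example above, and the helper name is hypothetical.

```python
import json

ALLOWED_MODIFIERS = {"set", "add", "inc"}


def atomic_update_body(doc_id, changes):
    """Build the JSON body of a partial update.

    changes maps a field name to a (modifier, value) pair.
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in changes.items():
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError("unsupported modifier: %s" % modifier)
        doc[field] = {modifier: value}
    return json.dumps([doc])


if __name__ == "__main__":
    # Same payload as the curl example: set "enabled" to true on doc 12345.
    print(atomic_update_body("12345", {"enabled": ("set", True)}))
```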

ElasticSearch Partial Doc Update

Special endpoint exposed: _update
Supports parameters like routing, parent, replication, percolate, etc. (similar to Index API)
Uses scripts to perform document updates

curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true }}'

Solr Collections API

Collection:
  creation
  reload
  deletion
  shard splitting

ElasticSearch Indices REST API

Index:
  creation
  deletion
  closing and opening
  refreshing
  existence checking

Apache Solr Shard Splitting

admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
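
Conceptually, splitting a shard divides the range of document hashes it owns into two sub-ranges, one per new sub-shard. The sketch below shows only that arithmetic on an inclusive integer range; the even split and the small example range are assumptions for illustration, not Solr's actual hash ranges.

```python
def split_range(lo, hi):
    """Split the inclusive range [lo, hi] into two adjacent halves."""
    if lo >= hi:
        raise ValueError("range must contain at least two values")
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


if __name__ == "__main__":
    # A parent shard owning hashes 0..99 becomes two sub-shards.
    print(split_range(0, 99))
```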

Cluster State Monitoring

Multiple MBeans exposed by JMX

Multiple REST endpoints exposed to get different statistics

ElasticSearch Statistics API

Health and state check
Nodes information
Cache statistics
Segments information
Index information
Mappings information

SPM – „One to rule them all”

ElasticSearch Cluster Settings Update

Control:
  rebalancing
  recovery
  allocation
Change cluster configuration properties

ElasticSearch Custom Shard Allocation

Cluster level:

curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" }}'

Index level:

curl -XPUT localhost:9200/sematext/_settings/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"}'
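
How the two settings above can be read, as a toy decision function: an exclude rule rejects a node whose attribute matches any listed value, and an include rule accepts a node only if its attribute matches at least one listed value. The sketch handles a single attribute per rule and uses invented node data; it is not ElasticSearch's allocation code.

```python
def node_allowed(node_attrs, include=None, exclude=None):
    """Decide whether shards may be placed on a node.

    include and exclude are (attribute, set_of_values) pairs or None.
    """
    if exclude is not None:
        attr, values = exclude
        if node_attrs.get(attr) in values:
            return False
    if include is not None:
        attr, values = include
        if node_attrs.get(attr) not in values:
            return False
    return True


if __name__ == "__main__":
    node = {"tag": "nodeOne", "_ip": "192.168.2.5"}
    print(node_allowed(node, include=("tag", {"nodeOne", "nodeTwo"})))
    print(node_allowed(node, exclude=("_ip", {"192.168.2.1"})))
```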

Moving Shards and Replicas

Move shards between nodes on demand

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }'

The Verdict

And The Winner Is ?

The Users

We Are Hiring !

Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open-source?
We’re hiring world-wide!

http://sematext.com/about/jobs.html

Rafał Kuć @kucrafal rafal.kuc@sematext.com

Sematext @sematext http://sematext.com http://blog.sematext.com

ElasticSearch Server 25% off: MREESS25

Thank You !
