OCTOBER 13-16, 2015 AUSTIN, TX

Lucene/Solr spatial in 2015

David Smiley

Search Engineer/Consultant (Freelance)

About David Smiley

Freelance Search Developer/Consultant
  Expert Lucene/Solr development skills, advice (consulting), training
  Java, spatial, and full-stack experience

Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”

More Spatial Contributors!

Spatial4j Lucene Solr

David Smiley ✔️ ✔️ ✔️

Ryan McKinley ✔️

Justin Deoliveira ✔️

Mike McCandless ✔️

Nick Knize ✔️

Karl Wright ✔️

Ishan Chattopadhyaya ✔️

Agenda

New Features / Capabilities
New Approaches
Improvements
Pending

Topic: New Features

Heatmaps / grid faceting (Lucene, Solr)
Surface-of-sphere shapes, Geo3d (Lucene)
Accurate indexed geometries (Lucene, Solr)
GeoJSON read/write (Spatial4j)

Heatmaps: Spatial Grid Faceting

Spatial density summary grid faceting, also useful for point-plotting search results

Usually rendered with a gradient radius
Lucene & Solr APIs
Scalable & fast usually…

v5.2

Heatmaps Under the Hood

Requires a PrefixTreeStrategy Lucene field (grid based)
Algorithm enumerates the underlying cells/terms and accumulates the counts in a corresponding grid
  Conceptually facet.method=enum for spatial
  Works on non-point indexed shapes too
  Complexity: O(cells * cellDepthFactor) not O(docs)
  No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
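The accumulation step above can be illustrated with a minimal sketch. It is not the Lucene implementation: the class `HeatmapGridSketch`, its `Cell` type, and the `facet` method are hypothetical names, and the sketch assumes every indexed cell is no larger than one heatmap cell, so the cell center decides which counter to increment. The point it shows is that the loop runs over indexed cells (terms with a document count), not over documents.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of grid faceting; not the Lucene or Solr heatmap API. */
final class HeatmapGridSketch {

    /** One indexed grid cell (a term) plus the number of documents indexed under it. */
    static final class Cell {
        final double minX, minY, maxX, maxY;
        final int docCount;

        Cell(double minX, double minY, double maxX, double maxY, int docCount) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
            this.docCount = docCount;
        }
    }

    /** Accumulates cell counts into a columns x rows grid covering the given bounds. */
    static int[][] facet(List<Cell> cells, double minX, double minY,
                         double maxX, double maxY, int columns, int rows) {
        int[][] counts = new int[columns][rows];
        double cellWidth = (maxX - minX) / columns;
        double cellHeight = (maxY - minY) / rows;
        for (Cell cell : cells) {
            // Assumption: the cell fits inside one heatmap cell, so its center picks the bucket.
            double cx = (cell.minX + cell.maxX) / 2;
            double cy = (cell.minY + cell.maxY) / 2;
            if (cx < minX || cx > maxX || cy < minY || cy > maxY) {
                continue; // outside the requested heatmap region
            }
            int col = Math.min(columns - 1, (int) ((cx - minX) / cellWidth));
            int row = Math.min(rows - 1, (int) ((cy - minY) / cellHeight));
            counts[col][row] += cell.docCount; // one addition per cell, regardless of doc count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell(-100, -50, -99, -49, 3),
                new Cell(-101, -50, -100, -49, 4),
                new Cell(10, 10, 11, 11, 2));
        System.out.println(Arrays.deepToString(facet(cells, -180, -90, 180, 90, 2, 2)));
    }
}
```

Memory use is dominated by the `int[][]` grid, which matches the "mainly the grid of integers" remark; handling cells that span several heatmap cells is left out of this sketch.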

Solr Heatmap Faceting

On an RPT field (SpatialRecursivePrefixTreeFieldType)

prefixTree="packedQuad"

Query:
  /select?facet=true
    &facet.heatmap=geo_rpt
    &facet.heatmap.geom=["-180 -90" TO "180 90"]
    &facet.heatmap.format=ints2D (or png)

// Normal Solr response...
"facet_counts":{
  ... // facet response fields
  "facet_heatmaps":{
    "loc_srpt":[
      "gridLevel",2,
      "columns",32,
      "rows",32,
      "minX",-180.0,
      "maxX",180.0,
      "minY",-90.0,
      "maxY",90.0,
      "counts_ints2D", [null, null, [0, 0, ... ]]...
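When the request is built in code, the `facet.heatmap.geom` value needs URL encoding because of its brackets, quotes, and spaces. The following is a small sketch using only the JDK; the class name `HeatmapRequestSketch`, the host and collection in `main`, and the added `q=*:*` parameter are assumptions for illustration, while the `facet.heatmap*` parameter names come from the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that assembles a heatmap facet request URL. */
final class HeatmapRequestSketch {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** selectUrl is the /select endpoint; field is the RPT field to facet on. */
    static String heatmapUrl(String selectUrl, String field, String geom, String format) {
        return selectUrl
                + "?q=" + encode("*:*") // assumption: match all documents
                + "&facet=true"
                + "&facet.heatmap=" + encode(field)
                + "&facet.heatmap.geom=" + encode(geom)
                + "&facet.heatmap.format=" + encode(format);
    }

    public static void main(String[] args) {
        // Hypothetical host and collection name.
        System.out.println(heatmapUrl(
                "http://localhost:8983/solr/mycollection/select",
                "geo_rpt",
                "[\"-180 -90\" TO \"180 90\"]",
                "ints2D"));
    }
}
```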

Solr Heatmap Resources

Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
Live Demo: http://worldwidegeoweb.com
Open-source JavaScript Solr Heatmap Libraries
  https://github.com/spacemansteve/SolrHeatmapLayer
  https://github.com/mejackreed/leaflet-solr-heatmap
  https://github.com/voyagersearch/leaflet-solr-heatmap

Geo3D: Shapes on the Surface of a Sphere

… or Ellipsoid of configurable axis
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-line), Normal
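To make the geocentric idea concrete, here is a minimal sketch on a unit sphere. It is not the Geo3D API: `GeocentricSketch` and its methods are hypothetical names, and the ellipsoid case is ignored. It shows a latitude/longitude pair mapped to X, Y, Z, the arc (angular) and linear (chord) distances between two such points, and a circle membership test that needs only a dot product once the points and the cosine of the radius are precomputed.

```java
/** Hypothetical unit-sphere sketch of geocentric coordinates; not the Geo3D API. */
final class GeocentricSketch {

    /** Converts degrees of latitude/longitude to a unit vector {x, y, z}. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] { cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat) };
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc distance: the angle in radians between the two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double d = Math.max(-1.0, Math.min(1.0, dot(a, b))); // guard against rounding
        return Math.acos(d);
    }

    /** Linear distance: the straight-line chord through the sphere. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Circle membership with no trigonometry at query time: compare a dot product. */
    static boolean withinCircle(double[] center, double[] point, double cosRadius) {
        return dot(center, point) >= cosRadius;
    }

    public static void main(String[] args) {
        double[] a = toXyz(0, 0);
        double[] b = toXyz(0, 90);
        System.out.println(arcDistance(a, b) + " rad, chord " + linearDistance(a, b));
    }
}
```

Multiplying the arc distance by a sphere radius gives a surface distance; which radius to use is a modeling choice outside this sketch.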

All 2D Maps of the Earth Distort Straight Lines

A straight, as-the-bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!

Geo3D, continued…

Benefits
  Inherently more accurate than 2D projected spatial
    especially for big shapes or near poles
  Many computations are fast; no expensive trigonometry
  An alternative to JTS without the LGPL license (JTS is still LGPL)
Has its own Lucene module (spatial3d), and thus its own jar file
  Maven groupId: org.apache.lucene, artifact: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration

Index & Search Geo3D Geometries

Spatial4j Geo3dShape wrapper with RPT (v5.2)
  In Lucene-spatial for now
  Index Geo3d shapes
    Limited to grid accuracy
  Query by Geo3d shape
  Limited distance sort
  Heatmaps

Geo3DPointField & PointInGeo3DShapeQuery (v5.4)
  Based on a 3D BKD index
  In spatial3d module
  Index points-only
    No multi-valued
  Query by Geo3d shape
  No distance sort
  Leaner & faster than RPT

RPT/SpatialPrefixTrees and Accuracy

RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree

Thus represents shapes as grid cells of varying precision by prefix

Example, a point shape:
  D, DR, DRT, DRT2, DRT2Y
  More accuracy scales

Example, a polygon shape:
  Too many to list… 508 cells
  More accuracy does NOT scale
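The point example can be reproduced with a plain geohash encoder, shown here as a sketch under the assumption that the prefix tree is geohash based. `GeohashPrefixSketch` is a hypothetical class, it emits the conventional lowercase alphabet rather than the uppercase used on the slide, and it is not Lucene's GeohashPrefixTree. Each extra level of precision adds exactly one more term for a point, which is why accuracy scales for points but not for polygons, whose cell count grows with every level.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical geohash sketch showing how a point becomes a chain of prefix cells. */
final class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash: alternate longitude and latitude bits, 5 bits per character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true; // the first bit splits longitude
        int bits = 0;
        int ch = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; } else { ch <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; } else { ch <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    /** The terms indexed for a point: every prefix of its finest cell. */
    static List<String> prefixes(String cell) {
        List<String> terms = new ArrayList<>();
        for (int i = 1; i <= cell.length(); i++) {
            terms.add(cell.substring(0, i));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical point; the output is its cell chain from coarse to fine.
        System.out.println(prefixes(encode(42.36, -71.06, 5)));
    }
}
```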

Combining RPT with Serialized Geometry

RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
  Accuracy & speed & smaller indexes
  Optimized intersects predicate avoids some geometry checks
  > 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
Compatible with the Heatmaps feature
Includes a shape cache (per-segment); configurable

v5.2
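The idea of pairing a coarse index with exact geometry can be sketched as a two-phase intersects check. This is an illustration only, not CompositeSpatialStrategy: the class `TwoPhaseIntersectsSketch` is hypothetical, stored shapes are circles, and a bounding box stands in for the grid-cell approximation that RPT would provide. The approximation settles the clear cases (disjoint, or fully inside the query), and the exact geometry is consulted only for the remainder.

```java
/** Hypothetical two-phase intersects sketch; bounding boxes stand in for grid cells. */
final class TwoPhaseIntersectsSketch {

    /**
     * Counts circles ({centerX, centerY, radius}) that intersect the query rectangle.
     * Returns {matches, exactChecks} so the saved geometry checks are visible.
     */
    static int[] countIntersecting(double[][] circles,
                                   double qMinX, double qMinY, double qMaxX, double qMaxY) {
        int matches = 0;
        int exactChecks = 0;
        for (double[] c : circles) {
            double bMinX = c[0] - c[2], bMaxX = c[0] + c[2];
            double bMinY = c[1] - c[2], bMaxY = c[1] + c[2];
            // Phase 1a: approximation is disjoint from the query, so the shape cannot match.
            if (bMaxX < qMinX || bMinX > qMaxX || bMaxY < qMinY || bMinY > qMaxY) {
                continue;
            }
            // Phase 1b: approximation lies inside the query, so the shape must match.
            if (bMinX >= qMinX && bMaxX <= qMaxX && bMinY >= qMinY && bMaxY <= qMaxY) {
                matches++;
                continue;
            }
            // Phase 2: ambiguous case, consult the exact geometry.
            exactChecks++;
            double nearestX = Math.max(qMinX, Math.min(c[0], qMaxX));
            double nearestY = Math.max(qMinY, Math.min(c[1], qMaxY));
            double dx = c[0] - nearestX;
            double dy = c[1] - nearestY;
            if (dx * dx + dy * dy <= c[2] * c[2]) {
                matches++;
            }
        }
        return new int[] { matches, exactChecks };
    }

    public static void main(String[] args) {
        double[][] circles = { {5, 5, 1}, {20, 20, 1}, {11, 11, 1.2}, {10.5, 5, 1} };
        int[] result = countIntersecting(circles, 0, 0, 10, 10);
        System.out.println(result[0] + " matches, " + result[1] + " exact checks");
    }
}
```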

Topic: New Approaches

Lucene
  BKD Tree Indexes
  GeoPointField

BKD Tree Indexes

New numeric/spatial index approach with own file format
Not based on Lucene Terms index
  https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and more compact than Trie/PrefixTree based indexes
  Whither term auto-prefixing? LUCENE-5879
Indexed point-data only; multi-valued mostly
Intersects predicate only
Filtering only (no distance or other scoring)
Multiple implementations… (next slide)

Neat visualization https://youtu.be/x9WnzOvsGKs
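For readers new to the idea, the following in-memory sketch shows the general shape of a block k-d tree: points are split recursively at the median, alternating dimensions, until each leaf holds a small block, and a box query skips whole subtrees whose bounds miss the query. It is a simplification under stated assumptions: `BlockKdTreeSketch` and its `LEAF_SIZE` are hypothetical, and nothing here reflects Lucene's file format or the bulk-loading details of the BKD paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory block k-d tree over 2D points; not Lucene's BKD implementation. */
final class BlockKdTreeSketch {
    static final int LEAF_SIZE = 4; // assumption: tiny blocks to keep the example small

    private final double[] min = { Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY };
    private final double[] max = { Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY };
    private BlockKdTreeSketch left, right; // both null for a leaf
    private double[][] block;              // points, present only in leaves

    private BlockKdTreeSketch(double[][] pts, int from, int to, int depth) {
        for (int i = from; i < to; i++) {
            for (int d = 0; d < 2; d++) {
                min[d] = Math.min(min[d], pts[i][d]);
                max[d] = Math.max(max[d], pts[i][d]);
            }
        }
        if (to - from <= LEAF_SIZE) {
            block = Arrays.copyOfRange(pts, from, to);
            return;
        }
        final int dim = depth % 2; // alternate the split dimension per level
        Arrays.sort(pts, from, to, Comparator.comparingDouble(p -> p[dim]));
        int mid = (from + to) >>> 1; // median split keeps the tree balanced
        left = new BlockKdTreeSketch(pts, from, mid, depth + 1);
        right = new BlockKdTreeSketch(pts, mid, to, depth + 1);
    }

    /** Builds a tree over points given as {x, y}; the caller's array order is untouched. */
    static BlockKdTreeSketch build(double[][] points) {
        return new BlockKdTreeSketch(points.clone(), 0, points.length, 0);
    }

    /** Adds every point inside the inclusive query box to out. */
    void collect(double minX, double minY, double maxX, double maxY, List<double[]> out) {
        if (max[0] < minX || min[0] > maxX || max[1] < minY || min[1] > maxY) {
            return; // subtree bounds are disjoint from the query: prune
        }
        if (block != null) {
            for (double[] p : block) {
                if (p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY) {
                    out.add(p);
                }
            }
            return;
        }
        left.collect(minX, minY, maxX, maxY, out);
        right.collect(minX, minY, maxX, maxY, out);
    }

    public static void main(String[] args) {
        double[][] pts = { {1, 1}, {2, 5}, {3, 3}, {7, 8}, {8, 2}, {9, 9} };
        List<double[]> hits = new ArrayList<>();
        build(pts).collect(0, 0, 4, 4, hits);
        System.out.println(hits.size() + " points in box");
    }
}
```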

Multiple BKD Implementations

Multiple implementations of the same BKD concept:
  (1D) RangeTreeDocValuesFormat
  (2D) BKDPointField & BKD…Query
  (3D) Geo3DPointField & PointInGeo3DShapeQuery
  (ND) LUCENE-6825 (to Lucene-core) in-progress
1D, 2D, 3D implementations are either in lucene-sandbox or lucene-spatial3d for now
No Lucene-spatial module SpatialStrategy wrappers yet
  thus no Spatial4j Shape integration nor Solr integration yet

BKD 1D: RangeTree

Efficient range search on single/multi-valued numbers or terms
  Could be used for numbers, dates, IPV6 bytes, …
Alternatives: Normal number fields (trie), DateRangeField (RPT)
  Would love to see a benchmark!
How-To:
  RangeTreeDocValuesFormat
  Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery
  Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery

v5.3

BKD 2D: BKDPointField

Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
5.7x faster than RPT w/ GeoHash. Smaller indexes.
How-To:
  Use BKDPointField (requires BKDTreeDocValuesFormat)
  Query:
    BKDPointInBBoxQuery
    BKDPointInPolygonQuery
    point-radius (circle): in progress, LUCENE-6698

v5.3

GeoPointField

2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
  But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
  Configurable 2^x grid size (defaults to 512)
  Compact bit interleaved Z-order encoding
  Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
2-phase grid/postings then doc-values algorithm

v5.3
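Bit-interleaved Z-order encoding can be sketched in a few lines. This is a generic Morton code, not GeoPointField's actual encoder: the class `ZOrderSketch`, the 32 bits per dimension, and the choice of longitude for the even bits are all assumptions. Each coordinate is quantized to an unsigned 32-bit value and the two bit strings are interleaved into one 64-bit long, so points that share a grid cell share a common high-order bit prefix.

```java
/** Hypothetical Z-order (Morton) encoding sketch; not GeoPointField's actual encoder. */
final class ZOrderSketch {

    /** Scales value from [min, max] to an unsigned 32-bit integer held in a long. */
    static long quantize(double value, double min, double max) {
        return (long) ((value - min) / (max - min) * 0xFFFFFFFFL);
    }

    /** Spreads the low 32 bits of v so that bit i moves to bit 2*i. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    /** Interleaves x into the even bits and y into the odd bits. */
    static long interleave(long x, long y) {
        return spread(x) | (spread(y) << 1);
    }

    /** Assumption: longitude takes the even bits and latitude the odd bits. */
    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }

    public static void main(String[] args) {
        // Hypothetical point; prints the 64-bit code in binary.
        System.out.println(Long.toBinaryString(encode(-71.06, 42.36)));
    }
}
```

Shifting a code right by `64 - 2 * k` bits yields the cell at grid level k, which is what makes prefix-style range queries over the encoded terms possible.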

GeoPointField, continued…

Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
  No Heatmaps, no custom Shape implementations
  No Solr support yet
  No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
  doc.add(new GeoPointField(name, lon, lat, Store.YES))
  GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or GeoPointInPolygonQuery. …DistanceRangeQuery pending

Topic: Improvements

Spatial4j
  Minimal longitude bounding-box algorithm
Lucene (PrefixTree / RPT indexing)
  Leaner & faster non-point indexes
  New PackedQuadPrefixTree
Solr
  Distance units: Kilometers/Miles/Degrees
  Nicer ST_* spatial query parsers (almost done)

Topic: Some Pending Spatial TODOs

Spatial4j
  Geo3D integration (a JTS alternative)
Lucene
  FlexPrefixTree (LUCENE-4922)
  Multi-dimensional BKD (LUCENE-6825)
  SpatialStrategy adapters for GeoPointField, etc.
Solr
  Better spatial Solr QParsers (SOLR-4242)
  GeoJSON parsing
  More FieldType adapters for latest Lucene spatial
  DateRangeField faceting
  Nearest-neighbor search

Well, 2015 isn’t over yet. :-)

That’s all for now; thanks for coming!

Need Lucene/Solr guidance or custom development?

Contact me!
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley