OCTOBER 13-16, 2016 AUSTIN, TX

Lucene/Solr Spatial in 2015: Presented by David Smiley




Lucene/Solr Spatial in 2015 David Smiley

Search Engineer/Consultant (Freelance)


About David Smiley

- Freelance Search Developer/Consultant
  - Expert Lucene/Solr development skills, advice (consulting), training
  - Java, spatial, and full-stack experience
- Apache Lucene/Solr committer & PMC member
- Primary author of “Apache Solr Enterprise Search Server”


More Spatial Contributors!

Spatial4j Lucene Solr

David Smiley ✔️ ✔️ ✔️

Ryan McKinley ✔️

Justin Deoliveira ✔️

Mike McCandless ✔️

Nick Knize ✔️

Karl Wright ✔️

Ishan Chattopadhyaya ✔️


Agenda

- New Features / Capabilities
- New Approaches
- Improvements
- Pending


Topic: New Features

- Heatmaps / grid faceting (Lucene, Solr)
- Surface-of-sphere shapes, Geo3d (Lucene)
- Accurate indexed geometries (Lucene, Solr)
- GeoJSON read/write (Spatial4j)


Heatmaps: Spatial Grid Faceting

- Spatial density summary grid faceting, also useful for point-plotting search results
- Usually rendered with a gradient radius
- Lucene & Solr APIs
- Scalable & fast, usually…

v5.2


Heatmaps Under the Hood

- Requires a PrefixTreeStrategy Lucene field (grid based)
- The algorithm enumerates the underlying cells/terms and accumulates their counts in a corresponding grid
  - Conceptually facet.method=enum for spatial
  - Works on non-point indexed shapes too
  - Complexity: O(cells * cellDepthFactor), not O(docs)
  - No/low memory; mainly the grid of integers
- Solr will distribute to shards and merge
- Could be faster still; a BFS (vs DFS) layout would be perfect
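The enumerate-and-accumulate idea can be sketched in plain Java. This is only an illustration, not the Lucene implementation: it assumes quad-tree cell tokens in which each character A to D selects a quadrant (A top-left, B top-right, C bottom-left, D bottom-right), and the class and method names are hypothetical. The work is proportional to the number of cell terms visited, not to the number of documents.

```java
import java.util.Map;

// Hypothetical sketch: accumulate per-cell document counts into a heatmap grid.
final class HeatmapEnumSketch {

    /**
     * @param cellDocCounts cell token -> number of documents indexed with that cell
     * @param gridLevel     depth of the cells to report; the grid is 2^level x 2^level
     */
    static int[][] heatmap(Map<String, Integer> cellDocCounts, int gridLevel) {
        int size = 1 << gridLevel;
        int[][] counts = new int[size][size];
        for (Map.Entry<String, Integer> e : cellDocCounts.entrySet()) {
            String token = e.getKey();
            if (token.length() != gridLevel) {
                continue; // only terms at the requested level contribute in this sketch
            }
            int row = 0;
            int col = 0;
            for (int i = 0; i < token.length(); i++) {
                int q = token.charAt(i) - 'A'; // quadrant 0..3
                if (q < 0 || q > 3) {
                    throw new IllegalArgumentException("bad cell token: " + token);
                }
                row = (row << 1) | (q >> 1); // high bit picks top/bottom half
                col = (col << 1) | (q & 1);  // low bit picks left/right half
            }
            counts[row][col] += e.getValue();
        }
        return counts;
    }
}
```

Memory use is dominated by the integer grid itself, which matches the "no/low memory" point above.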


Solr Heatmap Faceting

- On an RPT field (SpatialRecursivePrefixTreeFieldType)
  - prefixTree="packedQuad"
- Query:

      /select?facet=true
        &facet.heatmap=geo_rpt
        &facet.heatmap.geom=["-180 -90" TO "180 90"]

- facet.heatmap.format=ints2D or png
- Response:

      // Normal Solr response...
      "facet_counts":{
        ... // facet response fields
        "facet_heatmaps":{
          "loc_srpt":[
            "gridLevel",2,
            "columns",32,
            "rows",32,
            "minX",-180.0,
            "maxX",180.0,
            "minY",-90.0,
            "maxY",90.0,
            "counts_ints2D", [null, null, [0, 0, ... ]]
      ...
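When the request is built in code, the geometry parameter has to be URL-encoded. A minimal sketch using only the JDK follows. The three facet.heatmap parameters come from the slide; the base URL, the q=*:* and rows=0 parameters, and the class and method names are assumptions added for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles a heatmap facet request URL.
final class HeatmapRequestSketch {

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String heatmapUrl(String baseUrl, String field, String geom, String format) {
        return baseUrl + "/select"
                + "?q=" + enc("*:*")   // assumption: match all documents
                + "&rows=0"            // assumption: only the facet is wanted
                + "&facet=true"
                + "&facet.heatmap=" + enc(field)
                + "&facet.heatmap.geom=" + enc(geom)
                + "&facet.heatmap.format=" + enc(format);
    }

    public static void main(String[] args) {
        // The host and core name are placeholders.
        System.out.println(heatmapUrl("http://localhost:8983/solr/mycore",
                "geo_rpt", "[\"-180 -90\" TO \"180 90\"]", "ints2D"));
    }
}
```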


Solr Heatmap Resources

- Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
- Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
- Live Demo: http://worldwidegeoweb.com
- Open-source JavaScript Solr Heatmap Libraries
  - https://github.com/spacemansteve/SolrHeatmapLayer
  - https://github.com/mejackreed/leaflet-solr-heatmap
  - https://github.com/voyagersearch/leaflet-solr-heatmap


Geo3D: Shapes on the Surface of a Sphere

- … or Ellipsoid of configurable axis
- Not a general 3D space geometry lib
- Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
- Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
- Distance computations: Arc (angular or surface), Linear (straight-line), Normal
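To make the geocentric representation concrete, here is a small sketch on a unit sphere. It is not the Geo3D API: the class and method names are hypothetical and an ellipsoid is not modeled. It converts latitude/longitude to an X, Y, Z unit vector, computes arc distance from a dot product, and tests circle membership by comparing a dot product against a precomputed cosine.

```java
// Hypothetical sketch of geocentric (X, Y, Z) math on a unit sphere.
final class GeocentricSketch {

    /** Converts degrees of latitude/longitude to a unit vector {x, y, z}. */
    static double[] toUnitVector(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double d = Math.max(-1.0, Math.min(1.0, dot(a, b))); // guard against rounding
        return Math.acos(d);
    }

    /** True if p lies within the circle around center; cosRadius is cos(arc radius). */
    static boolean withinCircle(double[] center, double cosRadius, double[] p) {
        return dot(center, p) >= cosRadius; // one dot product, no trigonometry per point
    }
}
```

Once points are stored as vectors, the per-point membership test needs only multiplications and a comparison.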


All 2D Maps of the Earth Distort Straight Lines

A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
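A quick way to see the distortion is to compare the great-circle midpoint with the naive average of the two latitude/longitude pairs. The sketch below assumes a unit sphere and approximate city coordinates (Anchorage about 61.22, -149.90; Miami about 25.76, -80.19); the class name is hypothetical.

```java
// Hypothetical sketch: great-circle midpoint via normalized vector sum.
final class GreatCircleMidpointSketch {

    static double[] toUnitVector(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Returns {latDeg, lonDeg} of the midpoint; undefined for antipodal inputs. */
    static double[] midpoint(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toUnitVector(lat1, lon1);
        double[] b = toUnitVector(lat2, lon2);
        double x = a[0] + b[0];
        double y = a[1] + b[1];
        double z = a[2] + b[2];
        double norm = Math.sqrt(x * x + y * y + z * z);
        double lat = Math.toDegrees(Math.asin(z / norm));
        double lon = Math.toDegrees(Math.atan2(y, x));
        return new double[] {lat, lon};
    }

    public static void main(String[] args) {
        double[] m = midpoint(61.22, -149.90, 25.76, -80.19);
        System.out.printf("great-circle midpoint: %.2f, %.2f%n", m[0], m[1]);
        System.out.printf("naive lat/lon average: %.2f, %.2f%n",
                (61.22 + 25.76) / 2, (-149.90 + -80.19) / 2);
    }
}
```

The great-circle midpoint lands several degrees further north than the naive average, which is why the true path bows toward the pole on a flat map.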


Geo3D, continued…

- Benefits
  - Inherently more accurate than 2D projected spatial, especially for big shapes or near poles
  - Many computations are fast; no expensive trigonometry
  - An alternative to JTS without the LGPL license (still)
- Has own Lucene module (spatial3d), thus jar file
  - Maven groupId: org.apache.lucene, artifactId: lucene-spatial3d
- No Solr integration yet; pending more Spatial4j integration


Index & Search Geo3D Geometries

Spatial4j Geo3dShape wrapper with RPT (v5.2)
- In Lucene-spatial for now
- Index Geo3d shapes
  - Limited to grid accuracy
- Query by Geo3d shape
- Limited distance sort
- Heatmaps

Geo3DPointField & PointInGeo3DShapeQuery (v5.4)
- Based on a 3D BKD index
- In spatial3d module
- Index points-only
  - No multi-valued
- Query by Geo3d shape
- No distance sort
- Leaner & faster than RPT


RPT/SpatialPrefixTrees and Accuracy

- RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
- Thus represents shapes as grid cells of varying precision by prefix
- Example, a point shape: D, DR, DRT, DRT2, DRT2Y
  - More accuracy scales
- Example, a polygon shape: Too many to list… 508 cells
  - More accuracy does NOT scale
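The point example can be reproduced with a standard geohash encoder: a point indexed at precision n yields exactly n terms, one per prefix, which is why extra accuracy is cheap for points. The sketch below is a plain geohash implementation written for illustration, not Lucene's SpatialPrefixTree code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: geohash encoding and the prefix cells a point would index.
final class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash: bits alternate longitude/latitude, 5 bits per character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder sb = new StringBuilder(length);
        boolean lonBit = true; // the first bit refines longitude
        int bits = 0;
        int value = 0;
        while (sb.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else            { value = value << 1;       maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else            { value = value << 1;       maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                sb.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return sb.toString();
    }

    /** All prefixes of a cell token, coarsest first: one index term per level. */
    static List<String> prefixes(String cell) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= cell.length(); i++) {
            out.add(cell.substring(0, i));
        }
        return out;
    }
}
```

A polygon, by contrast, needs every cell that covers its interior and edge at the finest level, so the term count grows quickly with precision.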


Combining RPT with Serialized Geometry

- RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
- SDV (SerializedDVStrategy) stores serialized geometry (accurate)
- RPT + SDV → CompositeSpatialStrategy
  - Accuracy & speed & smaller indexes
  - Optimized intersects predicate avoids some geometry checks
  - >80% faster intersects queries, 75% smaller index
- Solr adapter: RptWithGeometrySpatialField
  - Compatible with the Heatmaps feature
  - Includes a shape cache (per-segment); configurable

v5.2


Topic: New Approaches

- Lucene
  - BKD Tree Indexes
  - GeoPointField


BKD Tree Indexes

- New numeric/spatial index approach with own file format
  - Not based on Lucene Terms index
  - https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
- Much faster and more compact than Trie/PrefixTree based indexes
  - Wither term auto-prefixing? LUCENE-5879
- Indexed point-data only; multi-valued mostly
- Intersects predicate only
- Filtering only (no distance or other scoring)
- Multiple implementations… (next slide)
- Neat visualization https://youtu.be/
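The core idea behind a BKD-style index can be sketched as an in-memory k-d tree whose leaves hold small blocks of points. This is a simplified illustration only: the real BKD format writes blocks to disk and is built differently, and the class name, LEAF_SIZE value, and method names here are hypothetical. A bounding-box query skips whole subtrees whose bounds are disjoint from the query and scans only the leaf blocks that might match.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical in-memory sketch of a k-d tree with block leaves (2D points).
final class KdBlockTreeSketch {
    static final int LEAF_SIZE = 4; // assumed block size, chosen for illustration

    static final class Node {
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        double[][] points; // set only on leaves
        Node left, right;  // set only on inner nodes
    }

    /** Recursively splits the points at the median, alternating x and y. */
    static Node build(double[][] pts, int depth) {
        Node n = new Node();
        for (double[] p : pts) { // bounding box of everything below this node
            n.minX = Math.min(n.minX, p[0]); n.maxX = Math.max(n.maxX, p[0]);
            n.minY = Math.min(n.minY, p[1]); n.maxY = Math.max(n.maxY, p[1]);
        }
        if (pts.length <= LEAF_SIZE) {
            n.points = pts;
            return n;
        }
        final int dim = depth % 2;
        double[][] sorted = pts.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((double[] p) -> p[dim]));
        int mid = sorted.length / 2;
        n.left = build(Arrays.copyOfRange(sorted, 0, mid), depth + 1);
        n.right = build(Arrays.copyOfRange(sorted, mid, sorted.length), depth + 1);
        return n;
    }

    /** Counts points inside the inclusive query box, pruning disjoint subtrees. */
    static int countInBox(Node n, double qMinX, double qMaxX, double qMinY, double qMaxY) {
        if (n.maxX < qMinX || n.minX > qMaxX || n.maxY < qMinY || n.minY > qMaxY) {
            return 0; // node bounds do not touch the query box
        }
        if (n.points == null) {
            return countInBox(n.left, qMinX, qMaxX, qMinY, qMaxY)
                 + countInBox(n.right, qMinX, qMaxX, qMinY, qMaxY);
        }
        int c = 0;
        for (double[] p : n.points) {
            if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) {
                c++;
            }
        }
        return c;
    }
}
```

Like the index described above, this sketch supports only an intersects-style filter on points; nothing in it computes distance or a score.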


Multiple BKD Implementations

- Multiple implementations of the same BKD concept:
  - (1D) RangeTreeDocValuesFormat
  - (2D) BKDPointField & BKD…Query
  - (3D) Geo3DPointField & PointInGeo3DShapeQuery
  - (ND) LUCENE-6825 (to Lucene-core) in-progress
- 1D, 2D, 3D implementations are either in lucene-sandbox or lucene-spatial3d for now
- No Lucene-spatial module SpatialStrategy wrappers yet
  - thus no Spatial4j Shape integration nor Solr integration yet


BKD 1D: RangeTree

- Efficient range search on single/multi-valued numbers or terms
- Could be used for numbers, dates, IPv6 bytes, …
- Alternatives: Normal number fields (trie), DateRangeField (RPT)
  - Would love to see a benchmark!
- How-To:
  - RangeTreeDocValuesFormat
  - Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery
  - Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery

v5.3


BKD 2D: BKDPointField

- Efficient 2D geospatial point index
- Alternative to RPT or GeoPointField
- 5.7x faster than RPT w/ GeoHash. Smaller indexes.
- How-To:
  - Use BKDPointField (requires BKDTreeDocValuesFormat)
  - Query:
    - BKDPointInBBoxQuery
    - BKDPointInPolygonQuery
    - point-radius (circle): in progress, LUCENE-6698

v5.3


GeoPointField

- 2D geospatial point field
- Indexed point-only data, single/multi-valued
- Spatial 2D Trie/PrefixTree terms index
  - But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
  - Configurable 2^x grid size (defaults to 512)
  - Compact bit interleaved Z-order encoding
  - Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
- 2-phase grid/postings then doc-values algorithm

v5.3
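Bit-interleaved Z-order encoding can be sketched as follows. This is a generic Morton encoder, not GeoPointField's actual code: the 32-bit quantization, the choice of longitude for the even bit positions, and all names are assumptions made for illustration. Because high-order bits of both coordinates come first, nearby points tend to share a long common prefix, which is what lets a trie-style terms index narrow down a region.

```java
// Hypothetical sketch of Z-order (Morton) encoding for a lon/lat pair.
final class ZOrderSketch {

    /** Scales v from [min, max] to an unsigned 32-bit value held in a long. */
    static long quantize(double v, double min, double max) {
        return (long) ((v - min) / (max - min) * 0xFFFFFFFFL);
    }

    /** Spreads the low 32 bits of x onto the even bit positions of a long. */
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8))  & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2))  & 0x3333333333333333L;
        x = (x | (x << 1))  & 0x5555555555555555L;
        return x;
    }

    /** Interleaves two 32-bit values: lon on even bits, lat on odd bits. */
    static long interleave(long lonBits, long latBits) {
        return spread(lonBits) | (spread(latBits) << 1);
    }

    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }
}
```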


…continued

- Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
  - No Heatmaps, no custom Shape implementations
  - No Solr support yet
  - No dependencies
- Easy to use compared to RPT; simpler internally too
- How-To:
  - doc.add(new GeoPointField(name, lon, lat, Store.YES))
  - GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or GeoPointInPolygonQuery. …DistanceRangeQuery pending


Topic: Improvements

- Spatial4j
  - Minimal longitude bounding-box algorithm
- Lucene (PrefixTree / RPT indexing)
  - Leaner & faster non-point indexes
  - New PackedQuadPrefixTree
- Solr
  - Distance units: Kilometers/Miles/Degrees
  - Nicer ST_* spatial query parsers (almost done)


Topic: Some Pending Spatial TODOs

- Spatial4j
  - Geo3D integration, a JTS alternative
- Lucene
  - FlexPrefixTree (LUCENE-4922)
  - Multi-dimensional BKD (LUCENE-6825)
  - SpatialStrategy adapters for GeoPointField, etc.
- Solr
  - Better spatial Solr QParsers (SOLR-4242)
  - GeoJSON parsing
  - More FieldType adapters for latest Lucene spatial
  - DateRangeField faceting
  - Nearest-neighbor search

Well, 2015 isn’t over yet. :-)


That’s all for now; thanks for coming!

Need Lucene/Solr guidance or custom development?

Contact me!
- Email: [email protected]
- LinkedIn: http://www.linkedin.com/in/davidwsmiley
- G+: +DavidSmiley
- Twitter: @DavidWSmiley