October 13-16, 2016 • Austin, TX
Lucene/Solr Spatial in 2015
David Smiley
Search Engineer/Consultant (Freelance)
About David Smiley
Freelance Search Developer/Consultant
  Expert Lucene/Solr development skills, advice (consulting), training
  Java, spatial, and full-stack experience
Apache Lucene/Solr committer & PMC member
Primary author of “Apache Solr Enterprise Search Server”
More Spatial Contributors!
Spatial4j Lucene Solr
David Smiley ✔️ ✔️ ✔️
Ryan McKinley ✔️
Justin Deoliveira ✔️
Mike McCandless ✔️
Nick Knize ✔️
Karl Wright ✔️
Ishan Chattopadhyaya ✔️
Agenda
New Features / Capabilities
New Approaches
Improvements
Pending
Topic: New Features
Heatmaps / grid faceting: Lucene, Solr
Surface-of-sphere shapes (Geo3d): Lucene
Accurate indexed geometries: Lucene, Solr
GeoJSON read/write: Spatial4j
Heatmaps: Spatial Grid Faceting
Spatial density summary (grid faceting); also useful for point-plotting search results
Usually rendered with a gradient radius
Lucene & Solr APIs
Scalable & fast, usually…
v5.2
Heatmaps Under the Hood
Requires a PrefixTreeStrategy Lucene field (grid based)
Algorithm enumerates the underlying cells/terms and accumulates their counts in a corresponding grid
  Conceptually facet.method=enum for spatial
  Works on non-point indexed shapes too
  Complexity: O(cells * cellDepthFactor), not O(docs)
  No/low memory; mainly the grid of integers
Solr will distribute to shards and merge
Could be faster still; a BFS (vs DFS) layout would be perfect
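The accumulation step can be pictured with a small stand-alone sketch. It is not Lucene's implementation: the class name, the quadrant lettering (A = top-left, B = top-right, C = bottom-left, D = bottom-right) and the input map of per-cell document counts are all assumptions made for illustration, and it only reads cells that sit exactly at the requested grid level. The point is that the work is driven by the number of cells visited, not by the number of documents.

```java
import java.util.Map;

/** Illustrative sketch only; not Lucene's actual heatmap code. */
final class HeatmapSketch {

    /**
     * Decodes a quad-tree cell token into {column, row} at the token's level.
     * Assumed (hypothetical) quadrant letters: A = top-left, B = top-right,
     * C = bottom-left, D = bottom-right.
     */
    static int[] cellToColRow(String token) {
        int col = 0, row = 0;
        for (int i = 0; i < token.length(); i++) {
            int q = token.charAt(i) - 'A';
            if (q < 0 || q > 3) {
                throw new IllegalArgumentException("Unexpected cell token: " + token);
            }
            col = (col << 1) | (q & 1);   // B and D are the right-hand quadrants
            row = (row << 1) | (q >> 1);  // C and D are the bottom quadrants
        }
        return new int[] {col, row};
    }

    /**
     * Visits each cell (term) once and adds its document count to the grid.
     * Only cells exactly at gridLevel are read; coarser and finer cells are
     * ignored in this simplified version.
     */
    static int[][] accumulate(Map<String, Integer> cellDocCounts, int gridLevel) {
        int size = 1 << gridLevel;            // 2^gridLevel cells per side
        int[][] grid = new int[size][size];   // grid[row][col], row 0 at the top
        for (Map.Entry<String, Integer> e : cellDocCounts.entrySet()) {
            if (e.getKey().length() != gridLevel) {
                continue;
            }
            int[] colRow = cellToColRow(e.getKey());
            grid[colRow[1]][colRow[0]] += e.getValue();
        }
        return grid;
    }
}
```

Memory use is the integer grid itself, which matches the "no/low memory" remark above.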
Solr Heatmap Faceting
On an RPT field (SpatialRecursivePrefixTreeFieldType)
  prefixTree="packedQuad"
Query:
  /select?facet=true
    &facet.heatmap=geo_rpt
    &facet.heatmap.geom=["-180 -90" TO "180 90"]
    &facet.heatmap.format=ints2D (or png)
// Normal Solr response...
"facet_counts":{
  ... // facet response fields
  "facet_heatmaps":{
    "loc_srpt":[
      "gridLevel",2,
      "columns",32,
      "rows",32,
      "minX",-180.0,
      "maxX",180.0,
      "minY",-90.0,
      "maxY",90.0,
      "counts_ints2D", [null, null, [0, 0, ... ]]
...
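A client has to turn a (row, column) position in counts_ints2D back into map coordinates before it can draw anything. The sketch below shows one way to do that from the response fields listed above. It assumes that row 0 is the top-most row (at maxY) and that a null row stands for a row of zeros; check both against the Solr reference guide for your version. The class and method names are hypothetical.

```java
/** Illustrative helper for reading an ints2D heatmap response; names are hypothetical. */
final class HeatmapCells {

    /**
     * Returns {minX, minY, maxX, maxY} for one grid cell.
     * Assumes row 0 is the top-most row, i.e. it starts at maxY.
     */
    static double[] cellBounds(int row, int col, int rows, int columns,
                               double minX, double maxX, double minY, double maxY) {
        double cellWidth = (maxX - minX) / columns;
        double cellHeight = (maxY - minY) / rows;
        double left = minX + col * cellWidth;
        double top = maxY - row * cellHeight;
        return new double[] {left, top - cellHeight, left + cellWidth, top};
    }

    /** Reads one count, treating a null grid or a null row as all zeros. */
    static int count(int[][] counts, int row, int col) {
        if (counts == null || counts[row] == null) {
            return 0;
        }
        return counts[row][col];
    }
}
```

With the 32 x 32 world grid from the response above, each cell spans 11.25 degrees of longitude and 5.625 degrees of latitude.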
Solr Heatmap Resources
Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
Live Demo: http://worldwidegeoweb.com
Open-source JavaScript Solr Heatmap Libraries
  https://github.com/spacemansteve/SolrHeatmapLayer
  https://github.com/mejackreed/leaflet-solr-heatmap
  https://github.com/voyagersearch/leaflet-solr-heatmap
Geo3D: Shapes on the Surface of a Sphere
… or Ellipsoid of configurable axes
Not a general 3D space geometry lib
Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
Distance computations: Arc (angular or surface), Linear (straight-line), Normal
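To make the geocentric idea concrete, here is a minimal unit-sphere sketch. It does not use the Geo3D API; the class and method names are made up for illustration, and it ignores the ellipsoid case. It shows the lat/lon to X, Y, Z conversion, the arc and linear distances between two points, and a circle-membership test that needs only a dot product once the cosine of the radius has been precomputed.

```java
/** Unit-sphere illustration of geocentric coordinates; not the Geo3D API. */
final class SphereSketch {

    /** Converts latitude/longitude in degrees to a unit-length (x, y, z) vector. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        // Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot(a, b))));
    }

    /** Linear (straight-line chord) distance between two unit vectors. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /**
     * Tests whether p lies within a circle around center. The caller passes
     * cos(radius) computed once, so each test is a dot product and a comparison.
     */
    static boolean withinCircle(double[] center, double cosRadius, double[] p) {
        return dot(center, p) >= cosRadius;
    }
}
```

The circle test is equivalent to asking which side of a plane the point falls on, which is the kind of planar arithmetic the slide refers to.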
All 2D Maps of the Earth Distort Straight Lines
A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
Geo3D, continued…
Benefits
  Inherently more accurate than 2D projected spatial, especially for big shapes or near poles
  Many computations are fast; no expensive trigonometry
  An alternative to JTS without the LGPL license (still)
Has its own Lucene module (spatial3d), and thus its own jar file
  Maven groupId: org.apache.lucene, artifactId: lucene-spatial3d
No Solr integration yet; pending more Spatial4j integration
Index & Search Geo3D Geometries
Spatial4j Geo3dShape wrapper with RPT (v5.2)
  In Lucene-spatial for now
  Index Geo3d shapes
  Limited to grid accuracy
  Query by Geo3d shape
  Limited distance sort
  Heatmaps
Geo3DPointField & PointInGeo3DShapeQuery (v5.4)
  Based on a 3D BKD index
  In spatial3d module
  Index points-only
  No multi-valued
  Query by Geo3d shape
  No distance sort
  Leaner & faster than RPT
RPT/SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
Thus represents shapes as grid cells of varying precision by prefix
Example, a point shape: D, DR, DRT, DRT2, DRT2Y
  More accuracy scales
Example, a polygon shape: Too many to list… 508 cells
  More accuracy does NOT scale
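The point example above resembles a chain of geohash prefixes, where each extra character names a smaller cell inside the previous one. The sketch below is a plain geohash encoder (the standard public algorithm, not code taken from Lucene or Spatial4j) plus a helper that lists every prefix of a token. A point costs one term per level, which is why adding precision for points stays cheap, while a polygon needs many cells along its boundary at every finer level.

```java
import java.util.ArrayList;
import java.util.List;

/** Standard geohash encoding, used here only to illustrate prefix cells. */
final class GeohashPrefixes {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point to a geohash of the given length (bits alternate lon, lat, lon, ...). */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder sb = new StringBuilder(length);
        boolean lonTurn = true;
        int bitCount = 0, value = 0;
        while (sb.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) {             // every 5 bits become one base-32 character
                sb.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return sb.toString();
    }

    /** Lists the cells a point falls in, from coarsest to finest. */
    static List<String> prefixes(String token) {
        List<String> cells = new ArrayList<>();
        for (int i = 1; i <= token.length(); i++) {
            cells.add(token.substring(0, i));
        }
        return cells;
    }
}
```

For example, GeohashPrefixes.prefixes(GeohashPrefixes.encode(57.64911, 10.40744, 5)) yields five nested cells, one per level.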
Combining RPT with Serialized Geometry
RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
SDV (SerializedDVStrategy) stores serialized geometry (accurate)
RPT + SDV → CompositeSpatialStrategy
  Accuracy & speed & smaller indexes
  Optimized intersects predicate avoids some geometry checks
  > 80% faster intersects queries, 75% smaller index
Solr adapter: RptWithGeometrySpatialField
  Compatible with the Heatmaps feature
  Includes a shape cache (per-segment); configurable
v5.2
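The reason a coarse grid plus exact geometry can skip some geometry checks is easiest to see in a toy version. The sketch below is not CompositeSpatialStrategy: it assumes unit-square grid cells, point documents, and a circular query, and every name in it is hypothetical. A document whose cell lies entirely inside the query is accepted from the grid alone, a document whose cell is entirely outside is rejected from the grid alone, and only documents in boundary cells pay for the exact check.

```java
/** Toy two-phase filter: grid classification first, exact geometry only when needed. */
final class TwoPhaseSketch {

    /** Returns {matchCount, exactChecks} for the points against a circle query. */
    static int[] search(double[][] points, double cx, double cy, double radius) {
        int matches = 0, exactChecks = 0;
        for (double[] p : points) {
            // The grid cell containing p is [x0, x0 + 1] x [y0, y0 + 1].
            double x0 = Math.floor(p[0]), y0 = Math.floor(p[1]);
            // Nearest and farthest points of that cell from the circle center.
            double nearX = clamp(cx, x0, x0 + 1), nearY = clamp(cy, y0, y0 + 1);
            double farX = Math.abs(cx - x0) > Math.abs(cx - (x0 + 1)) ? x0 : x0 + 1;
            double farY = Math.abs(cy - y0) > Math.abs(cy - (y0 + 1)) ? y0 : y0 + 1;
            if (distance(nearX, nearY, cx, cy) > radius) {
                continue;                      // cell disjoint from query: no exact check
            }
            if (distance(farX, farY, cx, cy) <= radius) {
                matches++;                     // cell fully inside query: no exact check
                continue;
            }
            exactChecks++;                     // boundary cell: consult the exact geometry
            if (distance(p[0], p[1], cx, cy) <= radius) {
                matches++;
            }
        }
        return new int[] {matches, exactChecks};
    }

    private static double clamp(double v, double lo, double hi) {
        return Math.max(lo, Math.min(hi, v));
    }

    private static double distance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x1 - x2, y1 - y2);
    }
}
```

In this toy setup the exact check is cheap; with serialized polygons it is the expensive step, so avoiding it for interior and exterior cells is where the speedup would come from.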
Topic: New Approaches
Lucene
  BKD Tree Indexes
  GeoPointField
BKD Tree Indexes
New numeric/spatial index approach with its own file format
  Not based on Lucene Terms index
  https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
Much faster and more compact than Trie/PrefixTree based indexes
  Whither term auto-prefixing? LUCENE-5879
Indexed point-data only; multi-valued mostly
Intersects predicate only
Filtering only (no distance or other scoring)
Multiple implementations… (next slide)
Neat visualization https://youtu.be/
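The core idea behind a kd-style index can be sketched in memory: split the points recursively at the median of the wider dimension until each leaf holds a small block, then answer a box query by skipping disjoint subtrees and counting fully contained subtrees without looking at individual points. This is a simplified illustration under those assumptions, not the BKD file format or Lucene's code, and the class name is invented. Note that the constructor reorders the input array in place.

```java
import java.util.Arrays;
import java.util.Comparator;

/** In-memory kd-tree with leaf blocks; a simplified stand-in for the BKD idea. */
final class KdBlockSketch {
    private final double minX, minY, maxX, maxY; // bounding box of points under this node
    private final int size;                      // number of points under this node
    private final double[][] leafPoints;         // non-null only for leaf blocks
    private final KdBlockSketch left, right;

    /** Builds a tree over pts[from, to); sorts that range of pts in place. */
    KdBlockSketch(double[][] pts, int from, int to, int maxLeafSize) {
        double x0 = Double.POSITIVE_INFINITY, y0 = Double.POSITIVE_INFINITY;
        double x1 = Double.NEGATIVE_INFINITY, y1 = Double.NEGATIVE_INFINITY;
        for (int i = from; i < to; i++) {
            x0 = Math.min(x0, pts[i][0]); x1 = Math.max(x1, pts[i][0]);
            y0 = Math.min(y0, pts[i][1]); y1 = Math.max(y1, pts[i][1]);
        }
        minX = x0; minY = y0; maxX = x1; maxY = y1;
        size = to - from;
        if (size <= Math.max(1, maxLeafSize)) {
            leafPoints = Arrays.copyOfRange(pts, from, to);
            left = null;
            right = null;
        } else {
            final int dim = (x1 - x0 >= y1 - y0) ? 0 : 1;   // split the wider dimension
            Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[dim]));
            int mid = (from + to) >>> 1;                    // median split keeps the tree balanced
            leafPoints = null;
            left = new KdBlockSketch(pts, from, mid, maxLeafSize);
            right = new KdBlockSketch(pts, mid, to, maxLeafSize);
        }
    }

    /** Counts points inside the closed box [qx0, qx1] x [qy0, qy1]. */
    int count(double qx0, double qy0, double qx1, double qy1) {
        if (size == 0 || maxX < qx0 || minX > qx1 || maxY < qy0 || minY > qy1) {
            return 0;                                       // subtree disjoint from the query
        }
        if (minX >= qx0 && maxX <= qx1 && minY >= qy0 && maxY <= qy1) {
            return size;                                    // subtree fully inside the query
        }
        if (leafPoints != null) {
            int n = 0;
            for (double[] p : leafPoints) {
                if (p[0] >= qx0 && p[0] <= qx1 && p[1] >= qy0 && p[1] <= qy1) n++;
            }
            return n;
        }
        return left.count(qx0, qy0, qx1, qy1) + right.count(qx0, qy0, qx1, qy1);
    }
}
```

Only leaves that straddle the query boundary are scanned point by point, which is the property that makes intersects-style filtering fast.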
Multiple BKD Implementations
Multiple implementations of the same BKD concept:
  (1D) RangeTreeDocValuesFormat
  (2D) BKDPointField & BKD…Query
  (3D) Geo3DPointField & PointInGeo3DShapeQuery
  (ND) LUCENE-6825 (to Lucene-core), in progress
1D, 2D, and 3D implementations are either in lucene-sandbox or lucene-spatial3d for now
No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration nor Solr integration yet
BKD 1D: RangeTree
Efficient range search on single/multi-valued numbers or terms
Could be used for numbers, dates, IPv6 bytes, …
Alternatives: Normal number fields (trie), DateRangeField (RPT)
  Would love to see a benchmark!
How-To:
  RangeTreeDocValuesFormat
  Numbers: SortedNumericDocValuesField with NumericRangeTreeQuery
  Bytes: SortedSetDocValuesField with SortedSetRangeTreeQuery
v5.3
BKD 2D: BKDPointField
Efficient 2D geospatial point index
Alternative to RPT or GeoPointField
5.7x faster than RPT w/ GeoHash. Smaller indexes.
How-To:
  Use BKDPointField (requires BKDTreeDocValuesFormat)
  Query:
    BKDPointInBBoxQuery
    BKDPointInPolygonQuery
    point-radius (circle): in progress, LUCENE-6698
v5.3
GeoPointField
2D geospatial point field
Indexed point-only data, single/multi-valued
Spatial 2D Trie/PrefixTree terms index
  But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
  Configurable 2^x grid size (defaults to 512)
  Compact bit-interleaved Z-order encoding
  Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
  2-phase grid/postings then doc-values algorithm
v5.3
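Bit-interleaved Z-order encoding packs two quantized coordinates into one long so that a shared high-order prefix means a shared grid cell. The sketch below shows the general technique with the usual bit-spreading masks. The quantization to 32 bits per dimension, the choice of longitude for the even bits, and all names are assumptions for illustration rather than GeoPointField's exact encoding.

```java
/** Generic Z-order (Morton) encoding of two 32-bit values; not GeoPointField's exact code. */
final class ZOrderSketch {

    /** Spreads the low 32 bits of v so that bit i moves to bit 2*i. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    /** Interleaves x into the even bits and y into the odd bits. */
    static long interleave(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    /** Hypothetical quantization of a value in [min, max] to an unsigned 32-bit integer. */
    static int quantize(double value, double min, double max) {
        return (int) (long) ((value - min) / (max - min) * 0xFFFFFFFFL);
    }

    /** Encodes lon/lat; placing longitude in the even bits is an assumption of this sketch. */
    static long encode(double lon, double lat) {
        return interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0));
    }
}
```

Truncating such a value to its top 2k bits identifies a cell on a 2^k by 2^k grid, which is what lets a numeric-style prefix (precisionStep) scheme double as a spatial trie.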
…continued
Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
  No Heatmaps, no custom Shape implementations
  No Solr support yet
  No dependencies
Easy to use compared to RPT; simpler internally too
How-To:
  doc.add(new GeoPointField(name, lon, lat, Store.YES));
  GeoPointDistanceQuery (sphere only), GeoPointInBBoxQuery, or GeoPointInPolygonQuery
  …DistanceRangeQuery pending
Topic: Improvements
Spatial4j
  Minimal longitude bounding-box algorithm
Lucene (PrefixTree / RPT indexing)
  Leaner & faster non-point indexes
  New PackedQuadPrefixTree
Solr
  Distance units: Kilometers/Miles/Degrees
  Nicer ST_* spatial query parsers (almost done)
Topic: Some Pending Spatial TODOs
Spatial4j
  Geo3D integration: a JTS alternative
Lucene
  FlexPrefixTree: LUCENE-4922
  Multi-dimensional BKD: LUCENE-6825
  SpatialStrategy adapters for GeoPointField, etc.
Solr
  Better spatial Solr QParsers: SOLR-4242
  GeoJSON parsing
  More FieldType adapters for latest Lucene spatial
  DateRangeField faceting
  Nearest-neighbor search
Well, 2015 isn’t over yet. :-)
That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
  Email: [email protected]
  LinkedIn: http://www.linkedin.com/in/davidwsmiley
  G+: +DavidSmiley
  Twitter: @DavidWSmiley