Presentation from my talk at the Minneapolis Apache Lucene/Solr meetup on Sep 23, 2014 hosted by and at Target.
Who am I?
• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks Employee.
• Search and related stuff for 9+ years.
• Apache Lucene since 2006 and Solr since 2010, with consistent community involvement since 2012
• Organizations I am or have been a part of:
Apache Solr has a huge install base and tremendous momentum
• The most widely used search solution on the planet.
• 8M+ total downloads
• 250,000+ monthly downloads
• Solr is both established & growing
• Solr has tens of thousands of applications in production.
• You use Solr every day.
• 2500+ open Solr jobs.
Activity Summary
30 Day Summary: Aug 18 - Sep 17, 2014
• 128 Commits • 18 Contributors
via https://www.openhub.net/p/solr
12 Month Summary: Sep 17, 2013 - Sep 17, 2014
• 1351 Commits • 29 Contributors
Solr - Releases
Search - Until recently
• Large organizations (Enterprise)
• Expensive
• Complex
• $$$$$
“Easy is good”
–Someone
New Age Search
• Everyone… startups, websites
• Special use cases
• E-commerce
• Mails and personal data
• Personal data - Across devices
• Social and Local!
• Analytics
Decision making!
• Short time frame
• Confidence measure:
• Getting started quick
• Configure and see the tip of the iceberg
• Issues only surface later in the story
Until recently…
• Getting started:
• Download
• java -jar start.jar
• SolrCloud, getting started…
• Download
• Copy example directory ‘x’ times over.
• java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
• java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
• It runs!
Times… they are a changin…
• Download
• cd solr
• Standalone: bin/solr start
• SolrCloud, example, interactive:
• bin/solr start -e cloud (< 2 minutes!)
Let’s index some data…
• Auto Generation of Unique Key
• Solr accepts a single doc
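As a rough illustration of sending a single document without an explicit unique key, here is a minimal Python sketch that builds (but does not send) a JSON update request. The host, port, and the collection name `gettingstarted` are assumptions, and `build_update_request` is a hypothetical helper. Whether Solr actually fills in the missing key depends on how the collection's update chain is configured.

```python
import json
from urllib.request import Request


def build_update_request(doc, base_url="http://localhost:8983/solr",
                         collection="gettingstarted"):
    """Build (without sending) a JSON update request for one document.

    base_url and collection are assumed values for a local example setup.
    """
    # The /update handler accepts a JSON array of documents; here it holds one.
    body = json.dumps([doc]).encode("utf-8")
    return Request(
        url=f"{base_url}/{collection}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# No "id" field on purpose: this relies on the unique key being generated
# server-side, which only happens if the collection is configured for it.
request = build_update_request({"title": "Hello Solr", "cat": "demo"})
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running instance.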
Managed Schema
• Solr is the schema owner
• REST APIs - Hide the implementation details
• When you know what you got
• Or when you don’t! (Schema-less mode)
• Update and Addition of Fields and FieldTypes
More reading: https://lucidworks.com/blog/schemaless-solr-part-1/
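Adding a field through the schema REST API might look like the sketch below. It uses the `add-field` command form documented for later Solr releases, so the exact endpoint shape available at the time of this talk may differ. The collection name, the field name, and the `text_general` type are assumptions, and `build_add_field_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_add_field_request(name, field_type,
                            base_url="http://localhost:8983/solr",
                            collection="gettingstarted"):
    """Build (without sending) a Schema API request that adds one field."""
    # "add-field" is the command name used by the Schema API in later Solr
    # releases; the field properties here are illustrative.
    command = {"add-field": {"name": name, "type": field_type, "stored": True}}
    return Request(
        url=f"{base_url}/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "summary" and "text_general" are assumed names for illustration only.
schema_request = build_add_field_request("summary", "text_general")
print(schema_request.full_url)
print(schema_request.data.decode("utf-8"))
```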
Configuration APIs
• Configure Solr using APIs
• solrconfig.xml… What did you say?
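To give a feel for configuring Solr over HTTP instead of editing solrconfig.xml, here is a small sketch that builds a `set-property` request for the Config API as documented in later Solr releases. The collection name and the chosen commit interval are assumptions, and `build_set_property_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_set_property_request(prop, value,
                               base_url="http://localhost:8983/solr",
                               collection="gettingstarted"):
    """Build (without sending) a Config API request that sets one property."""
    command = {"set-property": {prop: value}}
    return Request(
        url=f"{base_url}/{collection}/config",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 15000 ms is an arbitrary example value, not a recommendation.
config_request = build_set_property_request(
    "updateHandler.autoCommit.maxTime", 15000)
print(config_request.full_url)
print(config_request.data.decode("utf-8"))
```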
Data Import Handler
• Rocket science no more!
• Make things work
Command Line Utils
• Ping and other tasks for an already running instance.
• Works for *nix and Windows too!
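A ping check against a running instance can also be scripted. The sketch below assumes the standard ping handler at `/admin/ping` and a collection named `gettingstarted`. The sample response is hand-written for illustration, not captured from a real server, and `ping_url` and `is_ok` are hypothetical helpers.

```python
import json


def ping_url(base_url="http://localhost:8983/solr", collection="gettingstarted"):
    """Return the URL of the ping handler for a collection (assumed names)."""
    return f"{base_url}/{collection}/admin/ping?wt=json"


def is_ok(ping_response_text):
    """Return True if a ping response body reports status OK."""
    return json.loads(ping_response_text).get("status") == "OK"


# Hand-written sample body in the shape a ping response is expected to have.
sample = '{"responseHeader": {"status": 0, "QTime": 1}, "status": "OK"}'
print(ping_url())
print(is_ok(sample))
```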
Query DSL
q=*:*&rows=0&wt=json
&facet.field=cat&indent=true
&facet.pivot=cat,popularity,inStock
&facet.pivot=popularity,cat
&facet.pivot.mincount=2
&facet.limit=5&facet=true
{ "q" : "*:*",
  "rows" : "0",
  "facet" : {
    "" : "true",
    "pivot" : {
      "" : [
        "cat,popularity,inStock",
        "popularity,cat" ],
      "mincount" : "2"
    },
    "field" : "cat",
    "limit" : "5"
  }
}
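One way to read the mapping between the flat parameters and the nested JSON form is that each dotted parameter name becomes a path, and the empty key `""` holds a value at a node that also has children. The Python sketch below implements that reading. `nest_params` and `collapse` are hypothetical helpers written for this illustration and are not part of Solr.

```python
from urllib.parse import parse_qs


def collapse(node):
    """Turn an intermediate tree into plain values where a node has no children."""
    children = {key: collapse(child) for key, child in node.items() if key != ""}
    if not children:
        # Leaf: only the value stored under "" remains.
        return node.get("", {})
    if "" in node:
        # Node with both a value and children: keep the value under "".
        return {"": node[""], **children}
    return children


def nest_params(query_string):
    """Nest flat, dot-separated query parameters into a dictionary."""
    root = {}
    for dotted_name, values in parse_qs(query_string).items():
        node = root
        for part in dotted_name.split("."):
            node = node.setdefault(part, {})
        # Repeated parameters become a list, single ones stay a string.
        node[""] = values[0] if len(values) == 1 else values
    return collapse(root)


flat = ("q=*:*&rows=0&wt=json&facet.field=cat&indent=true"
        "&facet.pivot=cat,popularity,inStock&facet.pivot=popularity,cat"
        "&facet.pivot.mincount=2&facet.limit=5&facet=true")
nested = nest_params(flat)
print(nested["facet"])
```

Unlike the JSON above, this sketch also carries `wt` and `indent` through as top-level keys, since it makes no attempt to separate response-formatting parameters from the query itself.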
Solr Scale Toolkit
• Easily deploy SolrCloud clusters
• Live patching and rolling restarts
• Dependency on AWS soon to go away
• Chef or Puppet are still valid approaches
More reading: http://lucidworks.com/blog/introducing-the-solr-scale-toolkit/
Talking about the Admin UI…
• Already improved from 3.x
• Uploading documents
• Collections API is coming soon
Collection Actions
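Collection actions like these correspond to calls against the Collections API. As a minimal sketch, the helper below builds a CREATE request URL. The collection name, shard count, and replication factor are illustrative assumptions, and `create_collection_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def create_collection_url(name, num_shards, replication_factor,
                          base_url="http://localhost:8983/solr"):
    """Build the Collections API URL that creates a collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# "products", 2 shards and 2 replicas are example values only.
url = create_collection_url("products", 2, 2)
print(url)
```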
There’s so much more…
• Self describing handlers
• Improved SolrJ API
• More support for other languages
• HDFS: Auto addition of replicas
• Cross Data-center replication
• SOLR - Make an application, not ‘war’.
It’s easy… and stable!
• Benchmarking
• Tons of users testing it
• Evolving test framework
Solr scalability is unmatched.
• 10TB+ Index Size • 10 Billion+ Documents • 100 Million+ Daily Requests
Where is it headed?
• Download
• See that server directory?
• Use start scripts
• Send a document, or a few…
• Things don’t really look the way they should?
• Use the schema APIs
• Add fields… not enough?
• Add field types and then add fields
• Configure Solr using REST APIs
For Production:
• Use Solr Scale Toolkit to deploy, patch and manage!
• Configure Solr using REST APIs
Lucidworks Fusion
[Architecture diagram. Components shown: Intelligent Search Services/API; Recommendation Module; Signal Processing; Analytics Service; Discovery Engine; Analytics Store; Enrichment Services; Analyst Workbench; eCommerce Solution; Admin/Management; SiLK Log Analysis; Search/Discovery; Partner Solutions; Connector Framework]
Connect @
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/