This is from my opening talk at the Downtown SF Apache Lucene/Solr meetup on Sep 17, 2014 at Trulia.
• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks Employee.
• Search and related stuff for 9+ years.
• Apache Lucene since 2006 and Solr since 2010, with consistent community involvement since 2012
• Organizations I am or have been a part of:
Who am I?
• Start scripts in Solr
• bin/solr start -e cloud
• Schemaless - REST APIs to manage schema
• Auto-generate a unique key in schema-less example
• New path ‘/update/json/docs’ accepts JSON documents directly, removing the restriction that they must be wrapped in an array
Ease of Use
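A minimal sketch of what posting a bare JSON document to the new path might look like. The host, port, and the collection name "gettingstarted" are assumptions for illustration, not something the slides specify. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Assumptions: a local Solr on port 8983 and a collection named
# "gettingstarted". Both are placeholders; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"


def build_json_doc_request(doc, commit=True):
    """Build (but do not send) a POST of one JSON document, not wrapped in an array."""
    url = f"{SOLR_BASE}/{COLLECTION}/update/json/docs"
    if commit:
        url += "?commit=true"
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_doc_request({"id": "talk-1", "title": "Solr 4.10 overview"})
print(req.get_method(), req.full_url)
# To actually send it against a running Solr: urllib.request.urlopen(req)
```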
• ~3 Years
• 20+ contributors
• 18 Retweets! :)
• http://searchhub.org/2014/09/02/pivot-facets/
Distributed Pivot Faceting
• Auto addition of Replicas when using shared file system (HDFS).
• New spatial BBoxField
• Exporting full sorted result sets!
And more…
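As a rough illustration of the result set export feature, the sketch below builds a request URL for an `/export` handler. The base URL, the collection name "collection1", and the field names are assumptions. As far as I know, the sort and `fl` fields need docValues enabled, so treat this as a starting point rather than a recipe.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_export_url(base, collection, query, sort, fields):
    """Build a URL asking Solr to stream a fully sorted result set."""
    params = {"q": query, "sort": sort, "fl": ",".join(fields)}
    return f"{base}/{collection}/export?{urlencode(params)}"


# Placeholder base URL, collection, and field names (assumptions).
export_url = build_export_url(
    "http://localhost:8983/solr", "collection1", "*:*", "id asc", ["id", "price"]
)
print(export_url)
```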
• Unloading/Deletion of cores that failed to initialize.
• Update request handlers are registered implicitly, no need to define them.
• Terms Query parser for efficiently filtering documents by a list of values.
• JSON loader now flattens nested JSON into multiple documents.
• Correctly decode special characters in managed stopwords and synonym endpoints.
• Facet counts are no longer duplicated in response if the request duplicates them.
Solr Core
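To make the terms query parser bullet concrete, here is a small sketch that builds a filter query from a list of values. The field name and IDs are made up for illustration. Values that themselves contain commas would need a different separator, which this sketch does not handle.

```python
from urllib.parse import urlencode, parse_qs


def terms_filter(field, values):
    """Build a {!terms} filter query matching any of the given values."""
    return "{!terms f=%s}%s" % (field, ",".join(values))


# Hypothetical field and document IDs.
fq = terms_filter("id", ["doc1", "doc5", "doc9"])
query_string = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(query_string)
```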
• The CLUSTERSTATUS API tracks and returns much more than the previous version e.g. roles, live nodes etc.
• MIGRATE Collections API
• Now works with legacyCloud=false mode
• Retrying gets better with handling of pre-existing temp collection.
• DELETEREPLICA now removes instance and data directory by default.
• distrib.singlePass parameter to make EXECUTE_QUERY phase fetch all fields and skip GET_FIELDS.
• Also, other bug fixes and slightly better logging!
SolrCloud - APIs
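The calls above can be sketched as plain HTTP requests. The helper below only builds URLs. The base URL, collection, shard, and replica names are placeholders I made up, and `wt=json` is added purely for readability of the response.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder base URL (assumption).
BASE = "http://localhost:8983/solr"


def collections_api_url(base, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{base}/admin/collections?{urlencode(query)}"


# Cluster state, including live nodes and roles, for one collection.
status_url = collections_api_url(BASE, "CLUSTERSTATUS", collection="gettingstarted")

# Remove one replica; names here are hypothetical.
delete_url = collections_api_url(
    BASE, "DELETEREPLICA",
    collection="gettingstarted", shard="shard1", replica="core_node2",
)

# A distributed query asking for the single-pass behaviour described above.
single_pass_query = urlencode(
    {"q": "*:*", "fl": "id,score", "distrib.singlePass": "true"}
)

print(status_url)
print(delete_url)
print(single_pass_query)
```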
• No more losing the Overseer with the OverseerRoles enabled.
• Distributed commit and optimize are no longer serially executed across all replicas.
• Improvements in leader initiated recovery.
• Fixed: a ZooKeeper session expiry during setup could keep LeaderElector from joining elections.
• Schemaless concurrency improvements
SolrCloud - Internals
• DistributedQueue is more efficient at creating ZK watches.
• OCP doesn’t exit on ZK connection loss, and other ZK communication is retried.
• Bug Fixes in composite id router.
SolrCloud - Internals
• Improvement in transaction log replay performance on HDFS
• HdfsDirectoryFactory uses the supplied Configuration when communicating with Kerberos-secured HDFS.
• Fixed: HdfsUpdateLog had a race condition that could expose a closed HDFS FileSystem instance.
SolrCloud - HDFS
• SolrJ is better. Support for interval faceting.
• Performance improvement in C*SS - No more spin lock.
• DIH now has onError event handler hook.
• Data Import cancel button in Admin UI
• Improvements to MailEntityProcessor
SolrJ, DIH and more…
• Solr's schema now uses DelegatingAnalyzerWrapper that uses less heap for cached TokenStreamComponents because it caches per FieldType not per Field.
• Reduce CPU usage by avoiding repeated costly calls to Document.getField inside DocumentBuilder.toDocument for use-cases with large number of fields and copyFields.
• BinaryResponseWriter no longer fetches unnecessary stored fields when only pseudo-fields are requested.
Optimizations
• CoreContainer.preRegisterInZk() and CoreContainer.register() commands are merged into CoreContainer.create().
• CoreContainer.remove() has now been replaced with CoreContainer.unload().
• Opened up "public" access to DataSource, DocBuilder, and EntityProcessorWrapper in DIH.
• Added support for multiple spellcheck collations and multi-valued field highlighting to the /browse UI.
• Improved SolrCloud cloud-dev scripts.
• Hardened tests so you can rely on this stuff even more!
Solr Developer? Other changes
• Solr 4.10.1
• Should be out anytime!
• LUCENE-5934: 4.10 broke backwards compatibility for 4.0 beta & 4.0-release indexes
• Trunk moves to Java 8 after a recent vote
• Ease of use
• Performance + benchmarking
• Stability
• Analytics
What’s next?
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/
Connect @