Enterprise Search in Plone using Solr
Calvin Hendryx-Parker
Plone Conference 2010
Wednesday, October 27, 2010
PLONE CONFERENCE 2010
What is Solr?
• Java Based
• Full-Text Search
• Web Services API
• Standards Based Interfaces
• Scalable
• XML Configuration
• Extensible
Playing with Solr
• Indexing
• Query
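Both operations can be sketched in Python without a running server. The snippet below builds the XML update message that Solr's update handler accepts and the URL of a select request. The server address and the field names (`id`, `title`) are assumptions made for illustration; they are not taken from the talk.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumed local Solr address; adjust for your installation.
SOLR_URL = "http://localhost:8983/solr"


def build_add_message(doc):
    """Return the <add><doc>...</doc></add> XML body for one document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Return the URL of a select request for the given query string."""
    return "%s/select?%s" % (SOLR_URL, urlencode({"q": query, "rows": rows}))


# Hypothetical document and query, for illustration only.
message = build_add_message({"id": "doc-1", "title": "Plone and Solr"})
url = build_select_url("title:plone")
```

In practice the message would be POSTed to the `/update` handler and followed by a `<commit/>` message before the document shows up in query results.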
Solr Features
• Data Schema
• Faceted Search
• Administrative Interface
• Incremental Updates
• Supports Sharding
• Index Databases, Local Files and Web Pages
• Supports Multiple Indexes
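As an illustration of faceted search, a request sent with `facet=true&facet.field=portal_type&wt=json` returns the facet counts as a flat list that alternates values and counts. The sketch below pairs them up. The sample response is made up for this example, and the field name `portal_type` is an assumption.

```python
import json

# Made-up response fragment in Solr's default JSON facet layout.
SAMPLE_RESPONSE = """
{
  "response": {"numFound": 3, "docs": []},
  "facet_counts": {
    "facet_fields": {
      "portal_type": ["Document", 2, "News Item", 1]
    }
  }
}
"""


def facet_counts(response_text, field):
    """Return [(value, count), ...] for one facet field."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Values sit at even positions, their counts at odd positions.
    return list(zip(flat[0::2], flat[1::2]))


counts = facet_counts(SAMPLE_RESPONSE, "portal_type")
```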
Solr Features
• Stopwords
• Synonyms
• Highlighted Context Snippets
• Spelling Suggestions
• More Like This Suggestions
• Supports Rich Documents
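Several of these features are switched on per request with query parameters. A minimal sketch of such a query string follows; the field name `SearchableText` is an assumption, and whether each parameter takes effect depends on the components configured in `solrconfig.xml`.

```python
from urllib.parse import urlencode


def feature_params(query, text_field):
    """Build a query string asking for snippets, spelling and similar docs."""
    params = [
        ("q", query),
        ("hl", "true"),          # highlighted context snippets
        ("hl.fl", text_field),   # field to take the snippets from
        ("spellcheck", "true"),  # spelling suggestions
        ("mlt", "true"),         # "more like this" suggestions
        ("mlt.fl", text_field),  # field used to judge similarity
    ]
    return urlencode(params)


# Hypothetical query term and field name.
qs = feature_params("plone", "SearchableText")
```

Stopwords and synonyms, by contrast, are configured on the analyzer of a field type in the schema rather than requested per query.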
Solr Performance
• Wiktionary Dataset
• 49.5 Million Lines of XML
• 1.3 GB of Data
• 1.7 Million Pages Indexed in 5.5 Hours
• ZODB Size after Import: 1.1 GB
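Taking the rounded figures above at face value, the indexing throughput works out to roughly 86 pages per second. The arithmetic, as a short sketch:

```python
pages = 1_700_000  # pages indexed, from the slide
hours = 5.5        # wall-clock time, from the slide

pages_per_hour = pages / hours
pages_per_second = pages / (hours * 3600)

print("%.0f pages/hour, %.1f pages/second" % (pages_per_hour, pages_per_second))
```

The slide does not state the hardware or batch size, so the number is only a rough reference point.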
Integration Options with Plone
• collective.solr
collective.solr Issues
• Monkey Patching
• Relies on collective.indexing
• Duplicates All Indexes
• Sub-Optimal Integration with Zope Transactions
• Relies on Thread Locals
What to do?
Reevaluate
Solr Integration as a Catalog Index
• No Monkey Patching
• Simpler Code
Enter alm.solrindex
• ZCatalog Index
• Doesn't Depend on Plone
• Utilizes New foreign_connections Connection Method
• Pass-Through Solr Queries
• Direct Access to the Solr Response
Sorting
• Still Handled by the ZCatalog
• Could Change in the Future
alm.solrindex Field Handlers
• Handle Parsing Attributes for Indexing
• Translate Field-Specific Queries to Solr
• Registered as Zope Utilities
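The two handler responsibilities listed here can be sketched in plain Python. Everything in this snippet is hypothetical: the class names, the `convert` method, and the dict used as a registry are illustrations of the pattern, not the alm.solrindex API, which registers its handlers as Zope utilities instead.

```python
from datetime import datetime, timezone


class DefaultHandler:
    """Hypothetical handler: index values as strings, match them exactly."""

    def convert(self, value):
        # Turn an object attribute into the value(s) sent to Solr.
        return [str(value)]

    def parse_query(self, name, value):
        # Turn a catalog-style query into Solr request parameters.
        return {"fq": '%s:"%s"' % (name, value)}


class DateHandler(DefaultHandler):
    """Hypothetical handler that emits Solr's UTC date format."""

    def convert(self, value):
        utc = value.astimezone(timezone.utc)
        return [utc.strftime("%Y-%m-%dT%H:%M:%SZ")]


# Plain dict standing in for a utility registry keyed by field type.
HANDLERS = {"date": DateHandler()}
DEFAULT = DefaultHandler()


def get_handler(field_type):
    """Look up the handler for a field type, falling back to the default."""
    return HANDLERS.get(field_type, DEFAULT)


indexed = get_handler("date").convert(
    datetime(2010, 10, 27, 9, 30, tzinfo=timezone.utc)
)
query = get_handler("string").parse_query("portal_type", "Document")
```

Keying the lookup by field type lets a new Solr field type be supported by registering one more handler, without touching the index code.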
Example Handler

    class TextFieldHandler(DefaultFieldHandler):

        def parse_query(self, field, field_query):
            name = field.name
            # Normalize the catalog query for this field into a record.
            request = {name: field_query}
            record = parseIndexRequest(request, name, ('query',))
            if not record.keys:
                return None

            # Join all query terms into a single string.
            query_str = ' '.join(record.keys)
            if not query_str:
                return None

            # Require a match on this field in the Solr query.
            return {'q': u'+%s:%s' % (name, quote_query(query_str))}
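The shape of the handler's result is easier to see with concrete values. The sketch below mirrors only the final formatting step. `quote_phrase` is a stand-in written for this example that treats the input as a quoted phrase; it is not the `quote_query` helper used on the slide, whose behavior is not shown in the deck.

```python
def quote_phrase(text):
    """Stand-in quoting helper: escape the text and wrap it as a phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def text_query(name, terms):
    """Build a 'q' parameter that requires a match on the named field."""
    query_str = " ".join(terms)
    if not query_str:
        return None
    return {"q": "+%s:%s" % (name, quote_phrase(query_str))}


# Hypothetical field name and search terms.
result = text_query("SearchableText", ["plone", "solr"])
empty = text_query("SearchableText", [])
```

The leading `+` marks the clause as required in the Lucene query syntax, and returning `None` signals that the field contributes nothing to the Solr request.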
Other alm.solrindex Features
• GenericSetup Profile
• Tests
• Uses solrpy instead of the unsupported solr.py
Tips
• Can Replace Several ZCatalog Indexes
• Remove Any Indexes You Have Replaced
• Use It for All Text Indexes
• Still Utilize the ZCatalog Indexes for Everything Else
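The division of labor in these tips can be pictured as splitting one catalog-style query into a part answered by Solr and a part answered by the remaining ZCatalog indexes. The snippet is only an illustration of that split: the set of field names is hypothetical, and this is not how ZCatalog or alm.solrindex dispatch queries internally.

```python
# Hypothetical names of the text indexes that were moved to Solr.
SOLR_FIELDS = {"SearchableText", "Title", "Description"}


def split_query(query):
    """Return (solr_part, catalog_part) for a catalog-style query dict."""
    solr_part = {k: v for k, v in query.items() if k in SOLR_FIELDS}
    catalog_part = {k: v for k, v in query.items() if k not in SOLR_FIELDS}
    return solr_part, catalog_part


solr_part, catalog_part = split_query(
    {
        "SearchableText": "solr",
        "portal_type": "Document",
        "review_state": "published",
    }
)
```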
Demo: Project Gutenberg Data
Questions?
Check out sixfeetup.com/demos