Things Made Easy: One Click CMS Integration with Solr & Drupal
Peter M. Wolanin, Ph.D., Momentum Specialist (principal engineer), Acquia, Inc.
Drupal contributor (drupal.org/user/49851), co-maintainer of the Drupal Apache Solr Search Integration module
May 10, 2012

DESCRIPTION

Presented by Peter Wolanin | Acquia, Inc - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

If you have a new web project or an existing Drupal site, the combination of Drupal and Apache Solr is both powerful and easy to set up thanks to the existing integration code. The module allows for substantial customization through the administrative UI. Drupal facilitates further customization of the UI, indexing, and boosting because its open architecture provides multiple opportunities for custom code to alter the behavior. A couple of code snippets will be followed by a review of other contributed Drupal modules that further enhance the search capability. Finally, this session will showcase some examples of Drupal sites using Solr, including Acquia's own sites and many well-known enterprise and government sites.

Page 1: Things Made Easy: One Click CMS Integration with Solr & Drupal

Things Made Easy: One Click CMS Integration with Solr & Drupal

Peter M. Wolanin, Ph.D.
Momentum Specialist (principal engineer), Acquia, Inc.

Drupal contributor drupal.org/user/49851
co-maintainer of the Drupal Apache Solr Search Integration module

May 10, 2012

Page 2: Things Made Easy: One Click CMS Integration with Solr & Drupal

Key Questions to Be Answered

• What is Drupal?
• What Apache Solr features are integrated with Drupal?
• Why is Drupal plus Apache Solr better than starting from scratch?
• What elements of the search can you configure in the UI without code?

Page 3: Things Made Easy: One Click CMS Integration with Solr & Drupal

Why Are You Here?

• You are starting a new website project?
• You are wondering how hard it is to actually integrate Apache Solr with a website?
• You already use Drupal but not with Apache Solr?
• You like things that are easy yet powerful?

Page 4: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal: Web Application Framework + CMS == Social Publishing Platform

[Diagram: Drupal at the overlap of Content Mgmt Systems and Social Software Tools. Labels: blogs / wikis, forums / comments, social ranking, social tagging, users, social networks, workflow, taxonomy, semantic web, RSS, content, analytics]

Drupal “… is as much a Social Software platform as it is a web content management system.”

CMS Watch, The Web CMS Report 2009

Page 5: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal + Solr Provides Immediate Access to Rich Search Features

• Dynamic content requires dynamic navigation - which is provided by an effective search
• Search facets mean no dead ends
• Solr provides better keyword relevancy in results
• Much faster searches for sites with lots of content
• By avoiding database queries, Drupal with Solr scales better

Page 6: Things Made Easy: One Click CMS Integration with Solr & Drupal

DEMO: A Drupal 7 partial copy of the conference site with Apache Solr integration

http://youtu.be/yY6kma_ViWc

Page 7: Things Made Easy: One Click CMS Integration with Solr & Drupal
Page 8: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal Has User Accounts, Roles & Permissions

• Define custom roles
• Set granular access controls by role
• Configure user behavior:
  – Registration
  – Email
  – Profiles
  – Pictures

Page 9: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal Modules Add Functionality

• “There’s a module for that”
• More than 4100 Drupal 7 community modules
• Often controlled by role-based permissions
• Drupal core and modules are GPL v2+, and have a huge, active community

Page 10: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal is Written in PHP, Which Makes for Easy Customization

• The Drupal architecture encourages and provides many avenues for customization by writing modules rather than patching Drupal core.
• Drupal has a huge community of users. Approximately 10,000 sites report to Drupal.org that they use the Apache Solr Search Integration module.

Page 11: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal Adapts to You!!

Page 12: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal Entities are Content + Data

[Diagram: grid of nodes, Node 1 through Node 9]

• Nodes are the basic entity used for text content
• The entity system is extensible - can represent any data
• Examples of data stored within Drupal entities
  – Text
  – Geographic location
  – Node reference

Page 13: Things Made Easy: One Click CMS Integration with Solr & Drupal

Entity Types are Enriched With User-configurable Data Fields

• Define new data fields on a node using the Field API module.
  – Text, images, integers, date, reference, etc.
• Flexible and configurable in the UI
• No programming required (many existing modules)

Page 14: Things Made Easy: One Click CMS Integration with Solr & Drupal
Page 15: Things Made Easy: One Click CMS Integration with Solr & Drupal

A Strong Framework for Content Classification

• Core taxonomy system
• Modules provide taxonomy-based appearance, access control
• Standard input options include free tagging, flat-controlled, and hierarchical-controlled

Page 16: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal + Solr Search for Business, Government and NGOs

http://www.mattel.com/search/apachesolr_search/

http://www.hrw.org/en/search/apachesolr_search/

http://www.restorethegulf.gov/search/apachesolr_search/

http://www.nypl.org/search/apachesolr_search/

http://www.mylifetime.com/community/search/apachesolr_search/

http://opensource.com/search/apachesolr_search/

https://www.ethicshare.org/publications/

http://www.poly.edu/search/apachesolr_search/

https://www.eff.org/search/site/

http://www.whitehouse.gov/search/site/

http://www.emporia.edu/search/site/

Page 17: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal Has Already Solved Many Solr Integration Challenges

• The most important - content indexing.
• Facets, sorting, and highlighting of results.
• Immediate integration with the More Like This and spell-check handlers.
• Included sub-module integrates content access permissions by indexing access information to Solr and filtering results based on the current user.

Page 18: Things Made Easy: One Click CMS Integration with Solr & Drupal

Easy Content Recommendation!

• Uses the MLT handler
• Picks fields from the currently viewed node

Page 19: Things Made Easy: One Click CMS Integration with Solr & Drupal

The Module Has a Pipeline for Indexing Drupal Content to Solr

Drupal entities are processed into one (or more) document objects. Each document object is converted to XML and sent to Solr.

[Diagram: Node object (title, nid, type) → Drupal functions → Document object (entity_type, label, entity_id, bundle) → XML string]

<doc>
  <field name="entity_type">node</field>
  <field name="label">Hello Drupal</field>
  <field name="entity_id">101</field>
  <field name="bundle">session</field>
</doc>
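The pipeline on this slide can be sketched outside Drupal. The following is a minimal Python illustration, not the module's PHP code: the helper names `node_to_document` and `document_to_solr_xml` are hypothetical, and the mapping from title/nid/type to label/entity_id/bundle is read off the slide's diagram.

```python
import xml.etree.ElementTree as ET


def node_to_document(node):
    """Map a node-like dict to the flat field set shown on the slide."""
    return {
        "entity_type": "node",
        "label": node["title"],
        "entity_id": node["nid"],
        "bundle": node["type"],
    }


def document_to_solr_xml(document):
    """Serialize a field dict into a Solr <doc> element string."""
    doc = ET.Element("doc")
    for name, value in document.items():
        # A list or tuple becomes one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(item)
    return ET.tostring(doc, encoding="unicode")


node = {"title": "Hello Drupal", "nid": 101, "type": "session"}
xml_string = document_to_solr_xml(node_to_document(node))
print(xml_string)
```

Using an XML library rather than string concatenation means characters such as `&` or `<` in a title are escaped automatically.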

Page 20: Things Made Easy: One Click CMS Integration with Solr & Drupal

Entity Meta-data Gives Automatic Facets!

• Content types
• Taxonomy terms per vocabulary
• Content authors
• Posted and modified dates
• Text and numbers selected via select list/radios/check boxes

Page 21: Things Made Easy: One Click CMS Integration with Solr & Drupal

Drupal Modules Implement hooks to Control Indexing and Display

HOOK_apachesolr_index_document_build($document, $entity, $entity_type, $env_id)

• By creating a Drupal module (in PHP), you can implement module and theme “hooks” to extend or alter Drupal behavior.
• Change or replace the data normally indexed.
• Modify the search results and their appearance.

Page 22: Things Made Easy: One Click CMS Integration with Solr & Drupal

Updates to an Entity or Related Meta-data Cause Reindexing

• Drupal entities are indexed during Drupal cron (typically invoked via *nix cron).
• By using a specialized tracking table, content can automatically be queued for reindex when changed, and subsets of content can potentially be sent to different Solr indexes.
• Entities include many ID-based reference fields (e.g. the User ID of the author). Changes to the referenced data are also watched.

Page 23: Things Made Easy: One Click CMS Integration with Solr & Drupal

Indexing Tracking Tables Maintain Order

+-------------+-----------+-------------+--------+------------+
| entity_type | entity_id | bundle      | status | changed    |
+-------------+-----------+-------------+--------+------------+
| node        |        36 | session     |      1 | 1336520756 |
| node        |        37 | session     |      1 | 1336510489 |
| node        |        38 | session     |      1 | 1336510456 |
| node        |        39 | session     |      1 | 1336510456 |
| node        |        40 | speaker_bio |      1 | 1336510456 |
+-------------+-----------+-------------+--------+------------+

• When a node is updated, the “changed” timestamp is updated.
• The indexing pipeline tracks the largest timestamp and entity_id which has been indexed.
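One way to read "tracks the largest timestamp and entity_id" is as a cursor over rows ordered by (changed, entity_id). The sketch below uses Python with an in-memory SQLite table to show that idea. The table name `tracking`, the function `next_batch`, and the exact query are assumptions for illustration, not the module's actual schema or SQL.

```python
import sqlite3

# Rows copied from the slide's tracking table.
ROWS = [
    ("node", 36, "session", 1, 1336520756),
    ("node", 37, "session", 1, 1336510489),
    ("node", 38, "session", 1, 1336510456),
    ("node", 39, "session", 1, 1336510456),
    ("node", 40, "speaker_bio", 1, 1336510456),
]


def make_tracking_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracking (entity_type TEXT, entity_id INTEGER, "
        "bundle TEXT, status INTEGER, changed INTEGER)"
    )
    conn.executemany("INSERT INTO tracking VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def next_batch(conn, last_changed, last_entity_id, limit):
    """Return rows after the (last_changed, last_entity_id) position."""
    cursor = conn.execute(
        "SELECT entity_id, changed FROM tracking "
        "WHERE changed > ? OR (changed = ? AND entity_id > ?) "
        "ORDER BY changed ASC, entity_id ASC LIMIT ?",
        (last_changed, last_changed, last_entity_id, limit),
    )
    return cursor.fetchall()


conn = make_tracking_table()
first = next_batch(conn, 0, 0, 2)
# Remember the position of the last row handled, then continue from there.
last_entity_id, last_changed = first[-1]
second = next_batch(conn, last_changed, last_entity_id, 2)
print(first, second)
```

The entity_id tie-breaker matters because several rows share the same timestamp. Without it, a batch boundary falling inside that group would skip or repeat rows.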

Page 24: Things Made Easy: One Click CMS Integration with Solr & Drupal

Example: Taxonomy Term Classifying a Node is Changed

Grapefruit → Citrus fruit

All nodes classified with this term are queued to be re-indexed by setting the “changed” column to the current time. Thus you will correctly match ‘Citrus’ instead of ‘Grapefruit’ for those documents.

function apachesolr_taxonomy_term_update($term)
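The re-queue step can be illustrated with a small Python and SQLite sketch. It is not the body of `apachesolr_taxonomy_term_update()`. The tables `tracking` and `term_node`, the sample term IDs, and the function `requeue_nodes_for_term` are hypothetical stand-ins for "set the changed column on every node classified with this term".

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracking (entity_type TEXT, entity_id INTEGER, changed INTEGER)"
    )
    # Hypothetical mapping of taxonomy term IDs to node IDs.
    conn.execute("CREATE TABLE term_node (tid INTEGER, nid INTEGER)")
    conn.executemany(
        "INSERT INTO tracking VALUES (?, ?, ?)",
        [("node", 36, 1336520756), ("node", 37, 1336510489), ("node", 38, 1336510456)],
    )
    conn.executemany(
        "INSERT INTO term_node VALUES (?, ?)", [(5, 36), (5, 38), (9, 37)]
    )
    return conn


def requeue_nodes_for_term(conn, tid, now):
    """Mark every node classified with term `tid` as changed at `now`."""
    cursor = conn.execute(
        "UPDATE tracking SET changed = ? WHERE entity_type = 'node' "
        "AND entity_id IN (SELECT nid FROM term_node WHERE tid = ?)",
        (now, tid),
    )
    return cursor.rowcount


conn = make_db()
# A fixed timestamp keeps the example deterministic; real code would use the current time.
updated = requeue_nodes_for_term(conn, tid=5, now=1336600000)
rows = conn.execute(
    "SELECT entity_id, changed FROM tracking ORDER BY entity_id"
).fetchall()
print(updated, rows)
```

Because the update only touches the timestamp, the regular cron-driven indexing run picks these nodes up with no separate queue.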

Page 25: Things Made Easy: One Click CMS Integration with Solr & Drupal

When Unpublished, Content is Purged

• Drupal core includes a simple editorial workflow where content may be toggled between published (visible) and unpublished (incomplete, removed, spam, etc).
• The module immediately removes content from the index when unpublished, and also tracks it for future removal in case the Solr server is unavailable.
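The "remove now, retry later" behavior can be sketched as a small pending-delete queue. This Python illustration makes several assumptions: the `PurgeQueue` class, the `send` callback, and the document ID format `node/101` are all hypothetical. Only the `<delete><id>…</id></delete>` message shape is standard Solr XML update syntax.

```python
import xml.etree.ElementTree as ET


def build_delete_xml(doc_id):
    """Build a Solr XML delete-by-id message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


class PurgeQueue:
    """Try to delete immediately; keep IDs that could not be sent."""

    def __init__(self, send):
        self.send = send      # callable that posts XML to Solr (assumed)
        self.pending = []     # document IDs still waiting for removal

    def unpublish(self, doc_id):
        self.pending.append(doc_id)
        self.flush()

    def flush(self):
        still_pending = []
        for doc_id in self.pending:
            try:
                self.send(build_delete_xml(doc_id))
            except ConnectionError:
                # Server unavailable: keep the ID for a later attempt.
                still_pending.append(doc_id)
        self.pending = still_pending


sent = []
server = {"up": False}


def fake_send(xml):
    if not server["up"]:
        raise ConnectionError("Solr unavailable")
    sent.append(xml)


queue = PurgeQueue(fake_send)
queue.unpublish("node/101")   # fails, stays pending
server["up"] = True
queue.flush()                 # a later run (for example on cron) retries
print(sent)
```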

Page 26: Things Made Easy: One Click CMS Integration with Solr & Drupal

Search Using Dismax Query Parsing & Boosting Features

• Dynamic fields in schema.xml used to index standard and custom entity data fields
• Dismax (or EDismax) handler used for keyword searching across multiple fields and per-field boosts
• Query-time boosting options available in the UI
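For readers unfamiliar with Dismax, per-field boosts are passed in the `qf` parameter as `field^boost` pairs. The Python sketch below only builds a query string and is not how the module assembles its requests. The field names `label` and `content`, the boost values, and the function `build_search_params` are illustrative assumptions; `defType`, `qf`, `facet`, `facet.field`, and `wt` are standard Solr parameters.

```python
from urllib.parse import parse_qs, urlencode


def build_search_params(keywords, field_boosts, facet_fields=()):
    """Build a URL query string for an EDismax keyword search."""
    # qf lists the searched fields, each with an optional ^boost suffix.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = [
        ("q", keywords),
        ("defType", "edismax"),
        ("qf", qf),
        ("wt", "json"),
    ]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per facet.
        params.extend(("facet.field", name) for name in facet_fields)
    return urlencode(params)


query_string = build_search_params(
    "hello drupal",
    {"label": 5.0, "content": 1.0},
    facet_fields=("bundle", "entity_type"),
)
decoded = parse_qs(query_string)
print(query_string)
```

A higher boost on `label` than on `content` makes a keyword match in the title count for more than the same match in the body text.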

Page 27: Things Made Easy: One Click CMS Integration with Solr & Drupal

A Query Object Is Used to Prepare and Run Searches

$query->setParam('hl.fl', $field);
$keys = $query->getParam('q');
$response = $query->search();

HOOK_apachesolr_query_prepare($query)

Page 28: Things Made Easy: One Click CMS Integration with Solr & Drupal

More Modules Available to Add More Features

A few examples:

• ApacheSolr Attachments
• Apache Solr Multisite Search
• Apache Solr Organic Groups Integration
• Apachesolr User indexing
• Apachesolr Commerce

Page 29: Things Made Easy: One Click CMS Integration with Solr & Drupal

To Wrap Up!

• Drupal has extensive Apache Solr integration already, and is highly customizable.
• The Drupal platform is widely adopted, and the Drupal community drives rapid innovation.
• Acquia provides Enterprise Drupal support and a network of partners.
• Acquia includes a secure, hosted Solr index with every support subscription.

Page 30: Things Made Easy: One Click CMS Integration with Solr & Drupal

Did I Answer These?

• What is Drupal?
• What Apache Solr features are integrated with Drupal?
• Why is Drupal plus Apache Solr better than starting from scratch?
• What elements of the search can you configure in the UI without code?

Page 31: Things Made Easy: One Click CMS Integration with Solr & Drupal

Other PHP Integration Tools

• http://www.solarium-project.org/
• http://php.net/solr
  http://pecl.php.net/package/solr
• http://code.google.com/p/solr-php-client/

Caveat: don’t use serialized PHP response format in a custom integration - use JSON writer.
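To illustrate the JSON route, here is a minimal Python sketch of reading a `wt=json` response. JSON is a plain data format that any language can parse, so the same approach applies in PHP with `json_decode()`. The sample response body is invented for illustration and follows the usual `responseHeader` / `response.docs` layout. `parse_search_response` is a hypothetical helper.

```python
import json

# Invented sample of a Solr JSON response body (not captured from a real server).
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0, "QTime": 3},
 "response": {"numFound": 1, "start": 0,
              "docs": [{"entity_type": "node", "entity_id": 101,
                        "label": "Hello Drupal", "bundle": "session"}]}}
"""


def parse_search_response(body):
    """Return (total hits, list of labels) from a JSON response body."""
    data = json.loads(body)
    if data["responseHeader"]["status"] != 0:
        raise ValueError("Solr reported an error status")
    response = data["response"]
    return response["numFound"], [doc["label"] for doc in response["docs"]]


total, labels = parse_search_response(SAMPLE_RESPONSE)
print(total, labels)
```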

Page 32: Things Made Easy: One Click CMS Integration with Solr & Drupal

Acquia is Hiring!

• Do you love Drupal, Solr, the LAMP stack, DevOps or anything related, and working at a fast-growing and successful startup?
• Boston and Portland area U.S. offices.
• Some remote opportunities as well.
• Come talk to me!

[email protected] in IRC #drupal or #solr

Page 33: Things Made Easy: One Click CMS Integration with Solr & Drupal

Resources ... Questions? !

http://drupal.org/project/apachesolr
http://drupal.org/project/apachesolr_attachments
http://archive.org/details/drupalconchi_day2_attain_apache_solr_coding_chops
http://www.acquia.com/tags/apachesolr
http://groups.drupal.org/lucene-nutch-and-solr