
Boost the Search with Apache Solr


Boost the Search using Apache Solr

View Apache Solr course details at www.edureka.co/apache-solr

For queries during the session and class recording:

Post on Twitter @edurekaIN: #askEdureka

Post on Facebook /edurekaIN

For more details please contact us:

US: 1800 275 9730 (toll free)

INDIA: +91 88808 62004

Email Us: [email protected]

Slide 2

How it Works?

LIVE Online Class

Class Recording in LMS

24/7 Post Class Support

Module Wise Quiz

Project Work

Verifiable Certificate

Slide 3

Objectives

At the end of this module, you will be able to understand:

The need for a search engine in enterprise-grade applications

The objectives and challenges of a search engine

What indexing and searching are, and why you need them

How indexing and searching are handled in Lucene

What Solr is and what features it offers

What the Solr schema is and how it is structured

How to meet Big Data/NoSQL needs using SolrCloud

How to leverage Solr capabilities with Hadoop

Job opportunities for Solr developers

Slide 4

Why Do I Need Search Engines?

Slide 5

Search Engines: Why do I need them?

1. Text Based Search

2. Filter

3. Documents

Slide 6

Search Engine – What should it be?

If you need a storage engine to search records / documents using text-based keywords, it should support the following features:

1. Should be optimized for faster text searches

2. Should have flexible schema

3. Should support sorting of documents

4. Web Scale - Should be optimized for reads

5. Should be document oriented

Slide 7

Cleartrip Spatial Search

Slide 8

What is Lucene?

Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications

Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy)

Scalable & High-performance Indexing

Powerful, Accurate and Efficient Search Algorithms

Cross-Platform Solution

» Open Source & 100% pure Java

» Implementations in other programming languages available that are index-compatible

Created by Doug Cutting

Slide 9

Indexing – How does it work?

Document - 1 (“D1”): I like edureka courses

Document - 2 (“D2”): Edureka teaches big data courses

Document - 3 (“D3”): Edureka helps learn new technologies easily

Index entries (term = documents containing it):

“edureka” = {D1, D2, D3}

“courses” = {D1, D2}

“teaches” = {D2}

“big” = {D2}

“data” = {D2}

“helps” = {D3}
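The term-to-documents mapping above can be sketched in a few lines of Python. This is a minimal illustration, not Lucene's implementation: the function name build_inverted_index is hypothetical, and the only analysis applied is lowercasing and splitting on whitespace.

```python
from collections import defaultdict

# The three sample documents from the slide.
documents = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(documents)
print(index["edureka"])  # ['D1', 'D2', 'D3']
print(index["courses"])  # ['D1', 'D2']
print(index["helps"])    # ['D3']
```

Looking up a term is then a single dictionary access, which is why searching the index is much faster than scanning every document.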

Slide 10

Lucene – Writing to Index

Figure: Classes used when indexing documents with Lucene (Document made up of Fields → Analyzer → IndexWriter → Directory)

Slide 11

Lucene – Searching In Index

Figure: Expression → QueryParser (with an Analyzer producing text fragments) → Query object → IndexSearcher

QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching

Slide 12

Scoring – Score Boosting

A document’s weight (score) can be changed from its default; this is called boosting

Lucene allows influencing search results by "boosting" at different times: index time and query time

Index-time boost: call Field.setBoost() before a document is added to the index

Query-time boost: set a boost on a query clause by calling Query.setBoost()

Note: these setters belong to older Lucene releases; newer releases drop index-time boosts and wrap a query in BoostQuery instead
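The effect of the two kinds of boost can be illustrated with a toy scorer. This is only a sketch: it is not Lucene's scoring formula, and the function score and its field_boosts / term_boosts arguments are hypothetical names chosen for the example. The score is simply term frequency multiplied by a per-field boost (standing in for an index-time boost) and a per-term boost (standing in for a query-time boost).

```python
def score(doc_fields, query_terms, field_boosts=None, term_boosts=None):
    """Toy score: term frequency times a field boost and a query-term boost."""
    field_boosts = field_boosts or {}
    term_boosts = term_boosts or {}
    total = 0.0
    for field, text in doc_fields.items():
        tokens = text.lower().split()
        for term in query_terms:
            tf = tokens.count(term)
            total += tf * field_boosts.get(field, 1.0) * term_boosts.get(term, 1.0)
    return total


doc = {"title": "Solr search", "body": "Solr extends Lucene search with Solr features"}
query = ["solr", "lucene"]

plain = score(doc, query)
title_boosted = score(doc, query, field_boosts={"title": 2.0})
lucene_boosted = score(doc, query, term_boosts={"lucene": 3.0})

print(plain)           # 4.0
print(title_boosted)   # 5.0
print(lucene_boosted)  # 6.0
```

Boosting the title field raises the score of documents that match in the title, while boosting a query term raises the score of documents that contain that term anywhere.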

Slide 13

A Search System

The first step in every search engine is indexing

Indexing is the processing of original data into a highly efficient cross-reference lookup in order to facilitate rapid searching

Analyze: a search engine does not index text directly. The text is broken into a series of individual atomic elements called tokens

Searching is the process of consulting the search index and retrieving the documents matching the query, sorted in the requested sort order

Figure: indexing side (Acquire content → Build document → Analyze document → Index document → Index) and search side (Search UI → Build query → Run query → Render results)
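The analyze step can be sketched as a small tokenizer. This is an illustration under stated assumptions, not the behaviour of any particular Lucene analyzer: the stop-word list and the function name analyze are made up for the example.

```python
import re

# Hypothetical stop-word list for illustration; real analyzers ship their own.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def analyze(text):
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(analyze("The Search Engine is a powerful tool."))
# ['search', 'engine', 'powerful', 'tool']
```

The same analysis is normally applied to the query text, so that the tokens in the query line up with the tokens stored in the index.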

Slide 14

What is Solr?

Solr is an open source enterprise search server / web application

Solr uses the Lucene search library and extends it

Solr exposes Lucene’s Java APIs as RESTful services

You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP

You query it via HTTP GET and receive XML, JSON, CSV or binary results
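A rough sketch of the two HTTP interactions, using only the Python standard library. The core name "demo" and the sample document are assumptions made for illustration; the requests are built but not sent, so the snippet runs without a Solr server.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: a local Solr on the default port and a hypothetical core named "demo".
BASE_URL = "http://localhost:8983/solr/demo"


def build_index_request(docs):
    """Build (but do not send) a POST that indexes the docs as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(query, response_format="json"):
    """Build the GET URL for a query; wt selects the response format."""
    params = urllib.parse.urlencode({"q": query, "wt": response_format})
    return BASE_URL + "/select?" + params


index_request = build_index_request(
    [{"id": "1", "title": "Boost the Search with Apache Solr"}]
)
query_url = build_query_url("title:solr")

print(index_request.get_method(), index_request.full_url)
print(query_url)

# Against a running server, the requests could be sent with:
#   urllib.request.urlopen(index_request)
#   urllib.request.urlopen(query_url)
```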

Slide 15

Solr: Key Features

Advanced Full-Text Search Capabilities

Optimized for High Volume Web Traffic

Standards Based Open Interfaces - XML, JSON and HTTP

Comprehensive HTML Administration Interfaces

Server statistics exposed over JMX for monitoring

Near Real-time indexing

Adaptable with XML Configuration

Linearly scalable, auto index replication

Extensible Plugin Architecture

Slide 16

Solr – Who is using it?

For more information, go to: http://lucidworks.com/blog/who-uses-lucenesolr/

Slide 17

Solr: Architecture

Slide 18

Solr: Search Process

Figure: components of the search process (Request Handler, Query Parser, Index, Response Writer)

qt: selects a RequestHandler for a query sent to /select (by default, the standard request handler is used)

defType: selects a query parser for the query (by default, uses whatever has been configured for the RequestHandler)

qf: selects which fields to query in the index (used by the DisMax and eDisMax parsers; if omitted, the default field df is used)

wt: selects a response writer for formatting the query response

fq: filters the results by applying an additional query to the initial query’s results; the filter results are cached

rows: specifies the number of rows to be returned at one time

start: specifies an offset (by default 0) into the query results where the returned response should begin
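As a sketch of how these parameters combine in one request, the helper below builds the parameter list for a given results page. The core name "books", the field names and the choice of the edismax parser are assumptions for illustration; search_params is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Assumption: a hypothetical core named "books" with title, author and price fields.
SELECT_URL = "http://localhost:8983/solr/books/select"


def search_params(text, page=0, page_size=10, filters=()):
    """Return the request parameters for one page of results."""
    params = [
        ("q", text),                  # the user's query text
        ("defType", "edismax"),       # query parser
        ("qf", "title author"),       # fields the parser searches
        ("wt", "json"),               # response writer
        ("start", page * page_size),  # offset of the first returned row
        ("rows", page_size),          # number of rows per page
    ]
    params.extend(("fq", f) for f in filters)  # one fq per filter query
    return params


params = search_params("solr search", page=2, filters=["price:[0 TO 50]"])
url = SELECT_URL + "?" + urlencode(params)
print(url)
```

Keeping the filter in fq rather than in q leaves the main query untouched, so the same cached filter can be reused across different searches.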

Slide 19

Velocity Search UI / Solritas

Solr includes a sample search UI based on the VelocityResponseWriter (also known as Solritas) that demonstrates several useful features, such as:

» Searching

» Faceting

» Highlighting

» Autocomplete

» Geospatial searching

You can access the Velocity sample Search UI here:

http://localhost:8983/solr/browse

Slide 20

Faceting

Faceting is the arrangement of search results into categories based on indexed terms

Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term

Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for

Slide 21

Faceting

A category is an aspect of indexed documents which can be used to classify the documents

» For example, in a collection of books at an online bookstore, categories of a book can be its price, author, publication date, binding type, and so on

Slide 22

Faceting

In faceted search, in addition to the standard set of search results, we also get facet results, which are lists of subcategories for certain categories

» For example, for the price facet, we get a list of relevant price ranges; for the author facet, we get a list of relevant authors; and so on. In most UIs, when users click one of these subcategories, the search is narrowed, or drilled down, and a new search limited to this subcategory (e.g., to a specific price range or author) is performed
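Facet counts and drill-down can be illustrated on an in-memory result set. This is a toy sketch rather than Solr's faceting engine: the book records, author names and the helpers facet_counts and drill_down are all hypothetical.

```python
from collections import Counter

# Hypothetical bookstore records for illustration.
books = [
    {"title": "Solr Basics", "author": "Rao", "binding": "paperback"},
    {"title": "Lucene Deep Dive", "author": "Rao", "binding": "hardcover"},
    {"title": "Search at Scale", "author": "Iyer", "binding": "paperback"},
    {"title": "Hadoop Indexing", "author": "Iyer", "binding": "paperback"},
]


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in results)


def drill_down(results, field, value):
    """Narrow the result set to documents whose field equals the chosen facet value."""
    return [doc for doc in results if doc[field] == value]


print(facet_counts(books, "binding"))
paperbacks = drill_down(books, "binding", "paperback")
print(facet_counts(paperbacks, "author"))
```

After the drill-down, the author counts are recomputed over the narrowed result set, which is what a facet sidebar shows once the user has clicked a subcategory.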

Slide 23

Demo

Slide 24

SolrCloud

Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability; this is called SolrCloud

SolrCloud provides flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas

Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas

Documents can be sent to any server; the cluster state kept in ZooKeeper is used to route each document to the right shard
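The idea that any server can accept a document and still place it on the right shard can be sketched with a stable hash of the document id. This is a deliberate simplification and an assumption for illustration: the shard names are hypothetical, a real cluster reads its layout from ZooKeeper, and Solr's own router differs in detail.

```python
import zlib

# Hypothetical shard names; a real cluster reads its layout from ZooKeeper.
SHARDS = ["shard1", "shard2", "shard3"]


def shard_for(doc_id, shards=SHARDS):
    """Pick a shard from a stable hash of the document id."""
    digest = zlib.crc32(doc_id.encode("utf-8"))
    return shards[digest % len(shards)]


for doc_id in ["book-1", "book-2", "book-3"]:
    print(doc_id, "->", shard_for(doc_id))
```

Because the hash depends only on the id, every node computes the same target shard for a given document, so no central coordinator is needed at indexing time.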

Slide 25

Architecture

Slide 26

Leveraging Solr Capabilities with Hadoop

Solr provides fast, efficient, powerful full-text search and near real-time indexing, and SolrCloud adds flexible distributed search and indexing with features such as automatic failover

Hence it is well suited as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS

We can do scalable indexing using a Hadoop MapReduce or Pig job and then load the indexed data into Solr

Solr integrates easily with all the major Hadoop distributions, such as Cloudera, Hortonworks and MapR
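The shape of a MapReduce indexing job can be sketched in plain Python, with the map, shuffle and reduce steps run in one process. This is a conceptual illustration only: it does not use Hadoop or produce Lucene index files, and the function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in the document."""
    return [(term, doc_id) for term in sorted(set(text.lower().split()))]


def reduce_phase(term, doc_ids):
    """Collect all document ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def run_job(docs):
    """Run map, then sort by term (the shuffle), then reduce each group."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    pairs.sort(key=itemgetter(0))
    return dict(
        reduce_phase(term, [doc_id for _, doc_id in group])
        for term, group in groupby(pairs, key=itemgetter(0))
    )


index = run_job({"D1": "big data search", "D2": "search with solr"})
print(index["search"])  # ['D1', 'D2']
print(index["solr"])    # ['D2']
```

In a real job the map calls run in parallel on different machines, which is what makes the indexing scale with the size of the input.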

Slide 27

Scalable Indexing

Figure: input data as raw files (PDF, Word, HTML, . . .) is stored in HDFS (Hadoop Distributed File System); a MapReduce indexing job turns the raw files into Lucene indexes; several Solr instances serve those indexes; a Search Web App sends queries to Solr and receives responses

Slide 28

Job trends for Apache Solr

Slide 29

Disclaimer

Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr

Slide 30

Course Topics

Module 1

» Introduction to Apache Lucene

Module 2

» Exploring Lucene

Module 3

» Introduction to Apache Solr

Module 4

» Solr Indexing

Module 5

» Solr Searching

Module 6

» Solr Extended Features

Module 7

» Solr Cloud & Administration

Module 8

» Final Project

Slide 31

Exclusive

On Apache Solr Course

To avail this offer please contact us:

US: 1800 275 9730 (toll free)

INDIA: +91 88808 62004

Email Us: [email protected]

Slide 32

References

http://www.indeed.com/jobtrends

Office.com Clip Art