www.edureka.co/apache-solr
New-Age Search through Apache Solr
View Apache Solr course details at www.edureka.co/apache-solr
For Queries: Post on Twitter @edurekaIN: #askEdureka; Post on Facebook /edurekaIN
For more details please contact us: US: 1800 275 9730 (toll free), INDIA: +91 88808 62004, Email Us: sales@edureka.co
Slide 2
How it Works
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
Slide 3
Objectives
At the end of this module, you will be able to understand:
The need for a search engine in enterprise-grade applications
The objectives & challenges of a search engine
How indexing & searching are handled in Lucene
Solr and its architecture
Near real-time search with Solr
Leveraging Solr capabilities with Hadoop
Solr with YARN
Job opportunities for Solr developers
Slide 4
Why Do I Need Search Engines?
Slide 5
Search Engines: Why Do I Need Them?
1. Text-Based Search
2. Filter
3. Documents
Slide 6
Search Engine: What It Should Be
If you need a storage engine to search records or documents using text-based keywords, it should support the following features:
1. Should be optimized for fast text searches
2. Should have a flexible schema
3. Should support sorting of documents
4. Web scale: should be optimized for reads
5. Should be document-oriented
Slide 7
Cleartrip Spatial Search
Slide 8
What is Lucene?
Lucene is a powerful Java search library that lets you easily add search, or Information Retrieval (IR), to applications.
Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy)
Scalable & High-Performance Indexing
Powerful, Accurate and Efficient Search Algorithms
Cross-Platform Solution
» Open Source & 100% pure Java
» Implementations in other programming languages available that are index-compatible
Creator: Doug Cutting
Slide 9
Indexing: How It Works
Document 1 ("D1"): I like edureka courses
Document 2 ("D2"): Edureka teaches big data courses
Document 3 ("D3"): Edureka helps learn new technologies easily
Inverted index (excerpt):
"edureka" = D1, D2, D3
"courses" = D1, D2
"teaches" = D2
"big" = D2
"data" = D2
"helps" = D3
Looking up "edureka" returns D1, D2 and D3 directly from the index, without scanning each document's text.
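A minimal sketch in plain Python (not Lucene's API) of how the postings on this slide can be built: each document is split into lowercase word tokens, and every term maps to the documents that contain it. The `\w+` regex tokenizer is an assumption made for illustration; Lucene delegates this step to an Analyzer.

```python
import re
from collections import defaultdict

# The three example documents from the slide.
DOCS = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_inverted_index(DOCS)
print(index["edureka"])  # ['D1', 'D2', 'D3']
print(index["courses"])  # ['D1', 'D2']
```

Because lookups go term-first, answering a query costs roughly one dictionary access per query term rather than a scan over every document.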
Slide 10
Lucene: Writing to the Index
[Figure: a Document made up of several Fields is passed to the IndexWriter, which uses an Analyzer to process the field text and writes the index to a Directory]
Classes used when indexing documents with Lucene
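To make the roles of these classes concrete, here is a toy model in plain Python. `analyze` and `ToyIndexWriter` are hypothetical stand-ins for Lucene's Analyzer and IndexWriter (they are not Lucene's API), and the "directory" is just an in-memory dict keyed by (field, term). The stop-word list is also an assumption chosen for the example.

```python
import re

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "i", "the"}

def analyze(text):
    """Stand-in for an Analyzer: tokenize on word characters, lowercase, drop stop words."""
    return [tok for tok in re.findall(r"\w+", text.lower()) if tok not in STOP_WORDS]

class ToyIndexWriter:
    """Hypothetical stand-in for IndexWriter; the 'directory' is an in-memory dict."""

    def __init__(self, directory, analyzer=analyze):
        self.directory = directory  # maps (field, term) -> set of document IDs
        self.analyzer = analyzer

    def add_document(self, doc_id, fields):
        """Index every field of a document given as {field name: text}."""
        for field, value in fields.items():
            for term in self.analyzer(value):
                self.directory.setdefault((field, term), set()).add(doc_id)

directory = {}
writer = ToyIndexWriter(directory)
writer.add_document("D1", {"title": "I like edureka courses", "author": "Edureka"})
print(sorted(directory))
# [('author', 'edureka'), ('title', 'courses'), ('title', 'edureka'), ('title', 'like')]
```

Keying terms by field mirrors the fact that a Lucene Document is a set of Fields, so the same word in different fields is indexed separately.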
Slide 11
Lucene: Searching the Index
[Figure: the QueryParser, using an Analyzer on text fragments, turns an Expression into a Query object, which the IndexSearcher runs against the index]
QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching.
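The sketch below imitates a small subset of the classic Lucene query syntax over the postings from the indexing slide: a bare term is optional (SHOULD), `+term` is required (MUST) and `-term` is prohibited (MUST_NOT). It is a toy, not Lucene's QueryParser. It returns an unscored set of matches, whereas Lucene builds a BooleanQuery and ranks the hits. `str.lower()` stands in for the Analyzer.

```python
# Postings taken from the indexing slide (term -> documents containing it).
INDEX = {
    "edureka": {"D1", "D2", "D3"},
    "courses": {"D1", "D2"},
    "teaches": {"D2"},
    "big": {"D2"},
    "data": {"D2"},
    "helps": {"D3"},
}

def parse(expression):
    """Parse 'term', '+term' (required) and '-term' (prohibited) tokens."""
    query = {"must": [], "should": [], "must_not": []}
    for token in expression.lower().split():  # lower() stands in for the Analyzer
        if token.startswith("+"):
            query["must"].append(token[1:])
        elif token.startswith("-"):
            query["must_not"].append(token[1:])
        else:
            query["should"].append(token)
    return query

def search(query, index):
    """Return the sorted IDs of matching documents (boolean matching, no scoring)."""
    def postings(term):
        return index.get(term, set())

    if query["must"]:
        # Required clauses are intersected; optional clauses would only affect ranking.
        hits = set.intersection(*(postings(t) for t in query["must"]))
    else:
        # With no required clauses, a document must match at least one optional clause.
        hits = set().union(*(postings(t) for t in query["should"]))
    for term in query["must_not"]:
        hits -= postings(term)  # prohibited clauses remove documents
    return sorted(hits)

print(search(parse("+edureka -courses"), INDEX))  # ['D3']
print(search(parse("teaches helps"), INDEX))      # ['D2', 'D3']
```

As in Lucene, a query made only of prohibited clauses (for example `-courses`) matches nothing here, because there is no positive clause to select documents from.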
Slide 12
What is Solr?
Solr is an open source enterprise search server / web application.
Solr uses the Lucene search library and extends it.
Solr exposes Lucene's Java APIs as RESTful services.
You put documents into it (called indexing) via XML, JSON, CSV or binary over HTTP.
You query it via HTTP GET and receive XML, JSON, CSV or binary results.
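A minimal sketch of both halves of the HTTP interface using only the Python standard library: a POST of JSON documents to the `/update` handler, and a GET against `/select`. It assumes a local Solr on the default port 8983 with a core named `techproducts` (an assumption; substitute your own core). The requests are only built here; against a running server you would send them with `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local Solr on its default port with a core named "techproducts".
SOLR_CORE_URL = "http://localhost:8983/solr/techproducts"

def index_request(docs, commit=True):
    """Build (but do not send) a POST that indexes a list of JSON documents."""
    url = SOLR_CORE_URL + "/update"
    if commit:
        url += "?" + urllib.parse.urlencode({"commit": "true"})
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def query_request(q, wt="json"):
    """Build (but do not send) a GET against the /select handler."""
    params = urllib.parse.urlencode({"q": q, "wt": wt})
    return urllib.request.Request(f"{SOLR_CORE_URL}/select?{params}", method="GET")

add = index_request([{"id": "doc1", "title": "New-Age Search through Apache Solr"}])
find = query_request("title:solr")
print(add.get_method(), add.full_url)
print(find.get_method(), find.full_url)
```

The `wt` parameter picks the response format, which is how the same query can return XML, JSON or CSV.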
Slide 13
Solr Key Features
Advanced Full-Text Search Capabilities
Optimized for High-Volume Web Traffic
Standards-Based Open Interfaces: XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-Time indexing and adaptable with XML configuration
Linearly scalable, auto index replication, auto failover and recovery
Extensible Plugin Architecture
Slide 14
Solr Architecture
Slide 15
Solr Search Process
[Figure: a request flows through the Request Handler and Query Parser to the Index, and the Response Writer formats the results]
qt: selects a RequestHandler for a query using /select (by default, the DisMaxRequestHandler is used)
defType: selects a query parser for the query (by default, whatever has been configured for the RequestHandler)
qf: selects which fields in the index to query (if absent, the default field df is used)
wt: selects a response writer for formatting the query response
fq: filters the query by applying an additional query to the initial query's results; the filter results are cached
rows: specifies the number of rows (documents) returned at one time
start: specifies an offset (0 by default) into the query results where the returned response should begin
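The parameters above combine into a single `/select` URL. The helper below is a hypothetical sketch that derives `start` from a zero-based page number (`start = page * rows`), the usual way `start` and `rows` are used for pagination. `qt` is omitted because in recent Solr versions the request handler is normally chosen by URL path (here `/select`). The core URL, field names and filter are illustrative assumptions.

```python
from urllib.parse import urlencode

def select_url(core_url, q, page=0, rows=10, def_type=None, qf=None, fq=(), wt="json"):
    """Compose a /select URL; start is computed as page * rows (page is zero-based)."""
    params = [("q", q), ("start", page * rows), ("rows", rows), ("wt", wt)]
    if def_type:
        params.append(("defType", def_type))  # which query parser to use
    if qf:
        params.append(("qf", qf))  # fields to query (used by the dismax/edismax parsers)
    for f in fq:
        params.append(("fq", f))  # each filter query is cached independently
    return f"{core_url}/select?{urlencode(params)}"

url = select_url(
    "http://localhost:8983/solr/techproducts",  # assumed core URL
    "ipod",
    page=2,
    def_type="edismax",
    qf="name features",
    fq=["inStock:true"],
)
print(url)
```

Keeping stable restrictions such as `inStock:true` in `fq` rather than in `q` lets Solr reuse the cached filter across queries without affecting relevance scores.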
Slide 16
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed: additions and updates to documents are seen in near real time.
http://localhost:8983/solr/update?stream.body=<add><doc><field name="id">testdoc</field></doc></add>&commit=true
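Sent from a program, the URL above needs its XML percent-encoded. The sketch below builds it with the standard library and optionally swaps `commit=true` for `softCommit=true`. A soft commit makes new documents visible to searchers without the cost of a full hard commit to stable storage, which is the mechanism NRT search relies on. Two caveats: the function name is hypothetical, and recent Solr releases disable `stream.body` by default, so posting the XML or JSON as the request body is usually preferable.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

def add_doc_url(solr_url, doc_id, soft_commit=False):
    """Build the slide's stream.body update URL with proper XML escaping and URL encoding."""
    xml = f'<add><doc><field name="id">{escape(doc_id)}</field></doc></add>'
    commit_param = "softCommit" if soft_commit else "commit"
    return f"{solr_url}/update?" + urlencode({"stream.body": xml, commit_param: "true"})

print(add_doc_url("http://localhost:8983/solr", "testdoc"))
print(add_doc_url("http://localhost:8983/solr", "testdoc", soft_commit=True))
```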
Slide 17
Real-Time Get
The real-time get feature allows retrieval (by unique key) of the latest version of any document without the associated cost of reopening a searcher.
This is primarily useful when using Solr as a NoSQL data store and not just as a search index.
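Real-time get is served by Solr's `/get` handler, which takes a single `id` or a comma-separated `ids` list. The helper below is a hypothetical sketch that builds such URLs. It assumes the core URL shown and that the update log (`<updateLog/>` in `solrconfig.xml`) is enabled, since real-time get reads uncommitted documents from it.

```python
from urllib.parse import urlencode

def realtime_get_url(core_url, *ids):
    """Build a URL for the /get handler: `id` for one key, `ids` for several."""
    if not ids:
        raise ValueError("at least one unique key is required")
    if len(ids) == 1:
        params = {"id": ids[0]}
    else:
        params = {"ids": ",".join(ids)}
    return f"{core_url}/get?{urlencode(params)}"

print(realtime_get_url("http://localhost:8983/solr/techproducts", "testdoc"))
print(realtime_get_url("http://localhost:8983/solr/techproducts", "doc1", "doc2"))
```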
Slide 18
Leveraging Solr Capabilities with Hadoop
Solr provides fast, efficient and powerful full-text search and near real-time indexing, and SolrCloud adds flexible distributed search and indexing with features such as automatic failover.
Hence it is well suited as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS.
We can do scalable indexing using a Hadoop MapReduce or Pig job and then load the indexed data into Solr.
Solr can be integrated easily with all the major Hadoop distributions, such as Cloudera, Hortonworks and MapR.
Slide 19
Scalable Indexing
[Figure: input data (Word, HTML and raw files) in HDFS (Hadoop Distributed File System) is indexed by a MapReduce indexing job using Lucene; the indexed files are loaded into several Solr instances, which a search web app queries (query/response)]
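The data flow of a MapReduce indexing job can be simulated in a single process. In the sketch below, each map task turns one input split into (term, document ID) pairs, and the reduce step groups the pairs by term into posting lists. Threads stand in for parallel map tasks, and the input splits reuse the documents from the indexing slide. A real job would run these phases on a Hadoop cluster and write Lucene index files rather than Python dicts.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_split(split):
    """Map phase: emit (term, doc_id) pairs for one input split."""
    pairs = []
    for doc_id, text in split:
        for term in set(re.findall(r"\w+", text.lower())):
            pairs.append((term, doc_id))
    return pairs

def reduce_pairs(mapped_outputs):
    """Shuffle + reduce: group pairs by term into sorted posting lists."""
    postings = defaultdict(set)
    for pairs in mapped_outputs:
        for term, doc_id in pairs:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Two input splits, as if the raw files had been divided across HDFS blocks.
splits = [
    [("D1", "I like edureka courses"), ("D2", "Edureka teaches big data courses")],
    [("D3", "Edureka helps learn new technologies easily")],
]
with ThreadPoolExecutor() as pool:  # threads stand in for parallel map tasks
    index = reduce_pairs(pool.map(map_split, splits))
print(index["edureka"])  # ['D1', 'D2', 'D3']
```

Because each split is mapped independently, adding more map tasks scales the expensive tokenizing work, and only the grouped pairs need to be merged.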
Slide 20
Solr with YARN
Slide 21
Job trends for Apache Solr
Slide 22
Disclaimer
Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr.
Slide 23
Course Topics
Module 1
» Introduction to Apache Lucene
Module 2
» Exploring Lucene
Module 3
» Introduction to Apache Solr
Module 4
» Solr Indexing
Module 5
» Solr Searching
Module 6
» Solr Extended Features
Module 7
» Solr Cloud & Administration
Module 8
» Final Project
Slide 2
LIVE Online Class
Class Recording in LMS
247 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
wwwedurekacoapache-solr
How it Works
Slide 3 wwwedurekacoapache-solr
Objectives
At the end of this module you will be able to understand
The need for search engine for enterprise grade applications
The objectives amp challenges of search engine
How is Indexing amp Searching Handled in Lucene
Solr and its Architecture
Near Real Time Search with Solr
Leveraging Solr Capabilities with Hadoop
Solr with YARN
About job opportunity for Solr Developers
Slide 4Slide 4Slide 4 wwwedurekacoapache-solr
Why Do I Need Search Engines
Slide 5Slide 5Slide 5 wwwedurekacoapache-solr
Search Engine Why do I need them
1 Text Based Search
2 Filter
3 Documents
1
2
3
Slide 6Slide 6Slide 6 wwwedurekacoapache-solr
Search Engine ndash What it should be
If you need a storage engine to search records documents using text-based keywords it should support following
features
1 Should be optimized for faster text searches
2 Should have flexible schema
3 Should support sorting of documents
4 Web Scale - Should be optimized for reads
5 Should be document oriented
Slide 7Slide 7Slide 7 wwwedurekacoapache-solr
Cleartrip Spatial Search
Slide 8Slide 8Slide 8 wwwedurekacoapache-solr
What is Lucene
Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
Used by LinkedIn Twitter hellip and many more (see httpwikiapacheorglucene-javaPoweredBy )
Scalable amp High-performance Indexing
Powerful Accurate and Efficient Search Algorithms
Cross-Platform Solution
raquo Open Source amp 100 pure Java
raquo Implementations in other programming languages available that are index-compatible
Doug Cutting ldquoCreatorrdquo
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 3 wwwedurekacoapache-solr
Objectives
At the end of this module you will be able to understand
The need for search engine for enterprise grade applications
The objectives amp challenges of search engine
How is Indexing amp Searching Handled in Lucene
Solr and its Architecture
Near Real Time Search with Solr
Leveraging Solr Capabilities with Hadoop
Solr with YARN
About job opportunity for Solr Developers
Slide 4Slide 4Slide 4 wwwedurekacoapache-solr
Why Do I Need Search Engines
Slide 5Slide 5Slide 5 wwwedurekacoapache-solr
Search Engine Why do I need them
1 Text Based Search
2 Filter
3 Documents
1
2
3
Slide 6Slide 6Slide 6 wwwedurekacoapache-solr
Search Engine ndash What it should be
If you need a storage engine to search records documents using text-based keywords it should support following
features
1 Should be optimized for faster text searches
2 Should have flexible schema
3 Should support sorting of documents
4 Web Scale - Should be optimized for reads
5 Should be document oriented
Slide 7Slide 7Slide 7 wwwedurekacoapache-solr
Cleartrip Spatial Search
Slide 8Slide 8Slide 8 wwwedurekacoapache-solr
What is Lucene
Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
Used by LinkedIn Twitter hellip and many more (see httpwikiapacheorglucene-javaPoweredBy )
Scalable amp High-performance Indexing
Powerful Accurate and Efficient Search Algorithms
Cross-Platform Solution
raquo Open Source amp 100 pure Java
raquo Implementations in other programming languages available that are index-compatible
Doug Cutting ldquoCreatorrdquo
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 4Slide 4Slide 4 wwwedurekacoapache-solr
Why Do I Need Search Engines
Slide 5Slide 5Slide 5 wwwedurekacoapache-solr
Search Engine Why do I need them
1 Text Based Search
2 Filter
3 Documents
1
2
3
Slide 6Slide 6Slide 6 wwwedurekacoapache-solr
Search Engine ndash What it should be
If you need a storage engine to search records documents using text-based keywords it should support following
features
1 Should be optimized for faster text searches
2 Should have flexible schema
3 Should support sorting of documents
4 Web Scale - Should be optimized for reads
5 Should be document oriented
Slide 7Slide 7Slide 7 wwwedurekacoapache-solr
Cleartrip Spatial Search
Slide 8Slide 8Slide 8 wwwedurekacoapache-solr
What is Lucene
Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
Used by LinkedIn Twitter hellip and many more (see httpwikiapacheorglucene-javaPoweredBy )
Scalable amp High-performance Indexing
Powerful Accurate and Efficient Search Algorithms
Cross-Platform Solution
raquo Open Source amp 100 pure Java
raquo Implementations in other programming languages available that are index-compatible
Doug Cutting ldquoCreatorrdquo
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 5Slide 5Slide 5 wwwedurekacoapache-solr
Search Engine Why do I need them
1 Text Based Search
2 Filter
3 Documents
1
2
3
Slide 6Slide 6Slide 6 wwwedurekacoapache-solr
Search Engine ndash What it should be
If you need a storage engine to search records documents using text-based keywords it should support following
features
1 Should be optimized for faster text searches
2 Should have flexible schema
3 Should support sorting of documents
4 Web Scale - Should be optimized for reads
5 Should be document oriented
Slide 7Slide 7Slide 7 wwwedurekacoapache-solr
Cleartrip Spatial Search
Slide 8Slide 8Slide 8 wwwedurekacoapache-solr
What is Lucene
Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
Used by LinkedIn Twitter hellip and many more (see httpwikiapacheorglucene-javaPoweredBy )
Scalable amp High-performance Indexing
Powerful Accurate and Efficient Search Algorithms
Cross-Platform Solution
raquo Open Source amp 100 pure Java
raquo Implementations in other programming languages available that are index-compatible
Doug Cutting ldquoCreatorrdquo
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 9
Indexing – How it works
Document 1 ("D1"): I like edureka courses
Document 2 ("D2"): Edureka teaches big data courses
Document 3 ("D3"): Edureka helps learn new technologies easily
Inverted index (term = documents containing it):
"edureka" = D1, D2, D3
"courses" = D1, D2
"teaches" = D2
"big" = D2
"data" = D2
"helps" = D3
A search for "edureka" looks up that term in the index and returns D1, D2 and D3.
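The inverted index on this slide can be reproduced with a few lines of Python. This is a minimal sketch assuming whitespace tokenization and lowercasing; Lucene analyzers can also strip punctuation, drop stop words or stem terms, so a real index would not match exactly. The helper names `build_inverted_index` and `search_all` are illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs):
    # Map each lowercase, whitespace-separated term to the ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, *terms):
    # AND query: keep only documents whose postings contain every term.
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}
index = build_inverted_index(docs)
print(index["edureka"])                         # ['D1', 'D2', 'D3']
print(index["courses"])                         # ['D1', 'D2']
print(search_all(index, "edureka", "teaches"))  # ['D2']
```

Looking up a term is a single dictionary access, which is why an inverted index answers keyword queries without scanning every document.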
Slide 10
Lucene – Writing to Index
[Figure: a Document made of several Fields is passed through an Analyzer to the IndexWriter, which writes the index to a Directory]
Classes used when indexing documents with Lucene
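The roles in the figure can be mimicked with toy Python classes to show how they cooperate: the analyzer turns field text into terms, the writer records which document contains each term in each field, and the directory holds the result. `ToyAnalyzer`, `ToyIndexWriter` and `ToyDirectory` are hypothetical stand-ins, not Lucene classes, and a document is simplified to a dict of field name to text (Lucene uses `Document` objects holding `Field` instances).

```python
import re


class ToyAnalyzer:
    """Stands in for an Analyzer: splits text into lowercase word tokens."""

    def tokenize(self, text):
        return [token.lower() for token in re.findall(r"\w+", text)]


class ToyDirectory:
    """Stands in for a Directory: where the index data is kept (here, in memory)."""

    def __init__(self):
        self.postings = {}  # (field name, term) -> set of document numbers
        self.stored = []    # original field values, indexed by document number


class ToyIndexWriter:
    """Stands in for an IndexWriter: analyzes each field and records postings."""

    def __init__(self, directory, analyzer):
        self.directory = directory
        self.analyzer = analyzer

    def add_document(self, document):
        # Assign the next document number and keep the stored field values.
        doc_num = len(self.directory.stored)
        self.directory.stored.append(dict(document))
        # Index every term of every field under (field, term).
        for name, value in document.items():
            for term in self.analyzer.tokenize(value):
                self.directory.postings.setdefault((name, term), set()).add(doc_num)
        return doc_num


directory = ToyDirectory()
writer = ToyIndexWriter(directory, ToyAnalyzer())
writer.add_document({"title": "Edureka courses", "body": "I like edureka courses"})
writer.add_document({"title": "Big data", "body": "Edureka teaches big data courses"})
print(sorted(directory.postings[("body", "edureka")]))  # [0, 1]
```

Keying postings by field as well as term mirrors the fact that Lucene indexes each field separately, so a title match and a body match can be told apart.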
Slide 11
Lucene – Searching In Index
[Figure: a text expression passes through the QueryParser (which uses an Analyzer) to become a Query object; the IndexSearcher runs the Query and returns matching text fragments]
QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching.
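To make the QueryParser → Query → IndexSearcher flow concrete, here is a toy parser that turns a textual expression into a small query tree and a function that evaluates the tree against an inverted index. It is a hypothetical sketch, not Lucene's query syntax: it supports only bare terms, `AND` and `OR`, evaluates strictly left to right without precedence or parentheses, treats a plain sequence of terms as `OR` (the classic Lucene QueryParser also defaults to OR), and lowercases terms in place of a real analyzer.

```python
def parse(expression):
    # Turn e.g. "edureka AND courses" into a nested tuple tree, left to right.
    tokens = expression.split()
    if not tokens:
        raise ValueError("empty query")
    query = ("term", tokens[0].lower())
    i = 1
    while i < len(tokens):
        if tokens[i] in ("AND", "OR"):
            if i + 1 >= len(tokens):
                raise ValueError(f"missing term after {tokens[i]}")
            op, term = tokens[i].lower(), tokens[i + 1]
            i += 2
        else:
            op, term = "or", tokens[i]  # bare terms are OR-ed together
            i += 1
        query = (op, query, ("term", term.lower()))
    return query


def execute(query, index):
    # Evaluate the tree against an inverted index (term -> set of doc ids).
    kind = query[0]
    if kind == "term":
        return set(index.get(query[1], ()))
    left, right = execute(query[1], index), execute(query[2], index)
    return left & right if kind == "and" else left | right


index = {
    "edureka": {"D1", "D2", "D3"}, "courses": {"D1", "D2"}, "teaches": {"D2"},
    "big": {"D2"}, "data": {"D2"}, "helps": {"D3"},
}
print(parse("edureka AND courses"))  # ('and', ('term', 'edureka'), ('term', 'courses'))
print(sorted(execute(parse("teaches OR helps"), index)))  # ['D2', 'D3']
```

Separating parsing from execution is the same split the slide shows: the parser only builds a query object, and the searcher decides how to run it against the index.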
Slide 12
What is Solr
Solr is an open-source enterprise search server, packaged as a web application.
Solr uses the Lucene search library and extends it.
Solr exposes Lucene's Java APIs as RESTful services.
You put documents into it (called indexing) via XML, JSON, CSV or binary over HTTP.
You query it via HTTP GET and receive XML, JSON, CSV or binary results.
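A minimal sketch of the two HTTP interactions described on this slide, using only the Python standard library: a POST of JSON documents to the update handler, and a GET against the select handler with `wt` choosing the response format. The core name `mycore` and the field `text` are assumptions; the requests are only constructed here and would be sent with `urllib.request.urlopen` against a running Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR = "http://localhost:8983/solr/mycore"  # hypothetical core name


def index_request(docs):
    # POST a JSON array of documents to the update handler, committing immediately.
    body = json.dumps(docs).encode("utf-8")
    return Request(f"{SOLR}/update?commit=true", data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def query_request(q, response_format="json"):
    # Plain HTTP GET; wt selects the response writer (json, xml, csv, ...).
    params = urlencode({"q": q, "wt": response_format})
    return Request(f"{SOLR}/select?{params}", method="GET")


add = index_request([{"id": "D1", "text": "I like edureka courses"}])
query = query_request("text:edureka")
print(add.get_method(), add.full_url)
print(query.get_method(), query.full_url)
```

Because both operations are ordinary HTTP, any language with an HTTP client can index into and query Solr without a Java dependency.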
Slide 13
Solr Key Features
Advanced Full-Text Search Capabilities
Optimized for High-Volume Web Traffic
Standards-Based Open Interfaces: XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server Statistics Exposed over JMX for Monitoring
Near Real-Time Indexing
Adaptable with XML Configuration
Linearly Scalable, with Automatic Index Replication and Automatic Failover
Extensible Plugin Architecture
Slide 14
Solr Architecture
Slide 15
Solr Search Process
[Figure: a query flows through the Request Handler and Query Parser to the Index, and the Response Writer formats the results]
qt: selects a RequestHandler for a query using /select (by default the DisMaxRequestHandler is used)
defType: selects a query parser for the query (by default, whatever has been configured for the RequestHandler)
qf: selects which fields to query in the index (by default all fields are required)
wt: selects a response writer for formatting the query response
fq: filters the query by applying an additional query to the initial query's results, and caches the results
rows: specifies the number of rows to be displayed at one time
start: specifies an offset (by default 0) into the query results where the returned response should begin
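The parameters on this slide can be combined into a single `/select` request. This sketch builds the query string with `urllib.parse.urlencode`; the base URL, core name, field names (`title`, `body`, `category`) and the choice of the `edismax` parser are assumptions for illustration. With `rows=5`, `start=10` asks for the third page of results.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(base, q, qf=None, fq=(), rows=10, start=0, wt="json", def_type=None):
    params = [("q", q)]
    if def_type:
        params.append(("defType", def_type))  # query parser to use
    if qf:
        params.append(("qf", qf))  # fields to search, with optional ^boosts
    params += [("fq", f) for f in fq]  # each filter query is cached on its own
    params += [("rows", rows), ("start", start), ("wt", wt)]  # paging and format
    return f"{base}/select?{urlencode(params)}"


url = select_url("http://localhost:8983/solr/mycore", "edureka courses",
                 qf="title^2 body", fq=["category:training"],
                 rows=5, start=10, def_type="edismax")
print(parse_qs(urlsplit(url).query))
```

Putting restrictions such as a category in `fq` rather than in `q` keeps them out of relevance scoring and lets Solr reuse the cached filter across queries.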
Slide 16
Near Real-Time Search
Near Real-Time (NRT) search means that documents are available for search almost immediately after being indexed: additions and updates to documents are seen in near real time.
http://localhost:8983/solr/update?stream.body=<add><doc><field name="id">testdoc</field></doc></add>&commit=true
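When this request is sent from code, the XML in `stream.body` should be URL-encoded. This sketch builds it that way and, as an assumption about the intended NRT setup, defaults to `softCommit=true`, which makes the document visible to searches without a full hard commit; `soft_commit=False` reproduces the slide's `commit=true`. The base URL is taken from the slide; newer Solr installations typically add a core name to the path and disable `stream.body` by default, so POSTing the XML to `/update` is the more common route there.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.sax.saxutils import escape


def nrt_add_url(base, doc_id, soft_commit=True):
    # XML update message for one document that only has an id field.
    xml = f'<add><doc><field name="id">{escape(doc_id)}</field></doc></add>'
    # softCommit=true opens a new searcher so the document becomes visible quickly,
    # without the durability of a hard commit; commit=true is the slide's hard commit.
    commit = {"softCommit": "true"} if soft_commit else {"commit": "true"}
    return f"{base}/update?{urlencode({'stream.body': xml, **commit})}"


url = nrt_add_url("http://localhost:8983/solr", "testdoc")
print(url)
print(parse_qs(urlsplit(url).query)["stream.body"])
```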
Slide 17
Real-Time Get
The real-time get feature allows retrieval (by unique key) of the latest version of any document without the associated cost of reopening a searcher.
This is primarily useful when using Solr as a NoSQL data store and not just as a search index.
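A sketch of building real-time get requests, assuming the `/get` handler and the update log are enabled as in Solr's example configurations: a single unique key goes in `id`, and several can be passed comma-separated in `ids`. The base URL and the core name `mycore` are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8983/solr/mycore"  # hypothetical core name


def realtime_get_url(base, *doc_ids):
    # Real-time get looks documents up by unique key rather than by query.
    if not doc_ids:
        raise ValueError("at least one unique key is required")
    if len(doc_ids) == 1:
        params = {"id": doc_ids[0]}
    else:
        params = {"ids": ",".join(doc_ids)}  # several keys, comma-separated
    return f"{base}/get?{urlencode(params)}"


print(realtime_get_url(BASE, "testdoc"))  # http://localhost:8983/solr/mycore/get?id=testdoc
print(parse_qs(urlsplit(realtime_get_url(BASE, "D1", "D2")).query))  # {'ids': ['D1,D2']}
```

Because the lookup is by key, a document added a moment ago can be read back this way even before any commit has made it visible to `/select`.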
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 7Slide 7Slide 7 wwwedurekacoapache-solr
Cleartrip Spatial Search
Slide 8Slide 8Slide 8 wwwedurekacoapache-solr
What is Lucene
Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
Used by LinkedIn Twitter hellip and many more (see httpwikiapacheorglucene-javaPoweredBy )
Scalable amp High-performance Indexing
Powerful Accurate and Efficient Search Algorithms
Cross-Platform Solution
raquo Open Source amp 100 pure Java
raquo Implementations in other programming languages available that are index-compatible
Doug Cutting ldquoCreatorrdquo
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 8Slide 8Slide 8 wwwedurekacoapache-solr
What is Lucene
Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
Used by LinkedIn Twitter hellip and many more (see httpwikiapacheorglucene-javaPoweredBy )
Scalable amp High-performance Indexing
Powerful Accurate and Efficient Search Algorithms
Cross-Platform Solution
raquo Open Source amp 100 pure Java
raquo Implementations in other programming languages available that are index-compatible
Doug Cutting ldquoCreatorrdquo
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 9Slide 9Slide 9 wwwedurekacoapache-solr
Indexing ndash How it works
I like edureka coursesEdureka teaches big
data coursesEdureka helps learn new
technologies easily
Document - 1 (ldquoD1rdquo) Document - 2 (ldquoD2rdquo) Document - 3 (ldquoD3rdquo)
ldquoedurekardquo = D1 D2 D3ldquocoursesrdquo = D1 D2ldquoteachesrdquo = D2ldquobigrdquo = D2ldquodatardquo = D2ldquohelpsrdquo = D3
ldquoedurekardquo
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 10Slide 10Slide 10 wwwedurekacoapache-solr
Lucene ndash Writing to Index
Field
Field
Field
Field
Analyzer IndexWriter Directory
Document
Classes used when indexing documents with Lucene
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 11Slide 11Slide 11 wwwedurekacoapache-solr
Lucene ndash Searching In Index
QueryParser
Analyzer
IndexSearcherExpressionQuery object
Text fragments
Query Parser translates a textual expression from the end into an arbitrarily complex query for searching
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19Slide 19Slide 19 wwwedurekacoapache-solr
Word
HTML
Raw Files
Lucene
SolR SolR SolR
Query Response
Search Web App
MapReduce Indexing Job
Raw Files Indexed
HDFS(Hadoop Distributed File System)
Scalable Indexing
Input Data
Slide 20Slide 20Slide 20 wwwedurekacoapache-solr
Solr with YARN
Slide 21Slide 21Slide 21 wwwedurekacoapache-solr
Job trends for Apache Solr
Slide 22Slide 22Slide 22 wwwedurekacoapache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change Please visit our website for latest and additional information on Apache Solr
Slide 23Slide 23Slide 23 wwwedurekacoapache-solr
Course Topics
Module 5
raquo Solr Searching
Module 6
raquo Solr Extended Features
Module 7
raquo Solr Cloud amp Administration
Module 8
raquo Final Project
Module 1
raquo Introduction to Apache Lucene
Module 2
raquo Exploring Lucene
Module 3
raquo Introduction to Apache Solr
Module 4
raquo Solr Indexing
Slide 12Slide 12Slide 12 wwwedurekacoapache-solr
Solr is an open source enterprise search server web application
Solr Uses the Lucene Search Library and extends it
Solr exposes lucene Java APIrsquos as RESTful services
You put documents in it (called indexing) via XML JSON CSV or binary over HTTP
You query it via HTTP GET and receive XML JSON CSV or binary results
What is Solr
Slide 13Slide 13Slide 13 wwwedurekacoapache-solr
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-time indexing and Adaptable with XML Configuration
Linearly scalable auto index replication auto Extensible Plugin Architecture
Solr Key Features
Slide 14Slide 14Slide 14 wwwedurekacoapache-solr
Solr Architecture
Slide 15Slide 15Slide 15 wwwedurekacoapache-solr
Request Handler
Query ParserResponse
Writer
Index
qt selects a RequestHandler for a query usingselect(by default the DisMaxRequestHandler is used)
defType selects a query parser for the query(by default uses whatever has been configured for the RequestHandler)
qf selects which fields to queryin the index(by default all fields are required)
wt selects a response writer for formatting the query response
fq filters query by applying an additional query to the initial queryrsquos results caches the results
Rows specifies the number of rows to be displayed at one time
Start specifies an offset(by default 0) into the query results where the returned response should begin
Solr Search Process
Slide 16Slide 16Slide 16 wwwedurekacoapache-solr
Near Real-Time Search
Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed additions and updates to documents are seen in near real time
httplocalhost8983solrupdatestreambody=ltaddgtltdocgtltfieldname=idgttestdocltfieldgtltdocgtltaddgtampcommit=true
Slide 17Slide 17Slide 17 wwwedurekacoapache-solr
Real-Time Get
The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher
This is primarily useful when using Solr as a NoSQL data store and not just a search index
Slide 18Slide 18Slide 18 wwwedurekacoapache-solr
Leveraging Solr Capabilities with Hadoop
Solr provides us fast efficient powerful full-text search and near real-time indexing and SolrCloud is flexible
distributed search and indexing and will do things like automatic fail over etc
Hence its very suitable as NoSQL replacement for traditional databases in many situations especially when the size of
the data exceeds what is reasonable with a typical RDBMS
We can do scalable indexing using Hadoop MapReduce or PIG job and then load the indexed data in Solr
In all the major Hadoop distribution like Cloudera Hortonworks MapR you can integrate Solr easily
Slide 19
Figure (Scalable Indexing): input data (Word, HTML and raw files) in HDFS (Hadoop Distributed File System), a MapReduce indexing job using Lucene, the resulting indexed files, three Solr instances, and a search web app receiving the query responses.
Slide 20
Solr with YARN
Slide 21
Job trends for Apache Solr
Slide 22
Disclaimer
Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr.
Slide 23
Course Topics
Module 1
» Introduction to Apache Lucene
Module 2
» Exploring Lucene
Module 3
» Introduction to Apache Solr
Module 4
» Solr Indexing
Module 5
» Solr Searching
Module 6
» Solr Extended Features
Module 7
» Solr Cloud & Administration
Module 8
» Final Project
Slide 13
Solr Key Features
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Near Real-Time Indexing
Adaptable with XML Configuration
Linearly scalable, auto index replication, auto failover
Extensible Plugin Architecture
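Because Solr speaks plain HTTP with XML or JSON responses, a client needs little more than a JSON parser. This sketch reads the usual top-level shape of a /select JSON response (responseHeader, and response with numFound and docs); the helper name summarize_select_response is illustrative, and the payloads in the tests are hand-written samples, not real server output.

```python
import json


def summarize_select_response(raw):
    """Return (numFound, [ids]) from a Solr JSON /select response."""
    payload = json.loads(raw)
    status = payload["responseHeader"]["status"]
    if status != 0:  # non-zero status signals an error
        raise RuntimeError(f"Solr returned status {status}")
    result = payload["response"]
    # numFound counts all matches; docs holds only the current page (rows/start).
    return result["numFound"], [doc.get("id") for doc in result["docs"]]
```

For a payload whose numFound is 2 and whose docs are [{"id": "a"}, {"id": "b"}], it returns (2, ['a', 'b']).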
Slide 14
Solr Architecture
Slide 15
Figure (Solr Search Process): Request Handler, Query Parser, Response Writer, Index
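To make the boxes of the search-process figure concrete, here is a toy Python model of the flow: a request handler passes q to a query parser, the parsed terms are looked up in an inverted index, and a response writer formats the hits according to wt. Everything here (the in-memory INDEX, the AND-only parser, the two writers, all function names) is a hypothetical simplification and not how Solr implements these components.

```python
import json

# Toy inverted index: term -> set of document ids (illustrative data).
INDEX = {"solr": {"1", "2"}, "lucene": {"2"}, "hadoop": {"3"}}


def query_parser(q):
    """Parse a whitespace-separated query into lowercase terms (AND semantics)."""
    return [t.lower() for t in q.split()]


def search_index(terms):
    """Intersect the posting sets of all query terms."""
    if not terms:
        return set()
    return set.intersection(*[INDEX.get(t, set()) for t in terms])


def response_writer(ids, wt):
    """Format the matching ids according to the wt parameter."""
    ids = sorted(ids)
    if wt == "json":
        return json.dumps({"numFound": len(ids), "docs": [{"id": i} for i in ids]})
    if wt == "csv":
        return "id\n" + "".join(f"{i}\n" for i in ids)
    raise ValueError(f"unsupported wt: {wt}")


def request_handler(params):
    """Dispatch one request through parser -> index -> writer."""
    terms = query_parser(params["q"])
    return response_writer(search_index(terms), params.get("wt", "json"))
```

Separating the stages this way mirrors why the same query can be answered in different formats just by changing wt, while defType would swap out only the parsing stage.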