An Overview of the Enterprise Search Market, & Current Best Practices
Iain Fletcher
April 20, 2015
Agenda
• A brief overview of the current enterprise search
market
• The convergence of search with analytics
disciplines
• Likely future architectures for search applications
High-level Search Engine Classifications
1. Part of a portfolio, many are recently acquired technologies
– E.g. SharePoint, HP Autonomy, IBM/Vivisimo, Dassault/Exalead
2. Stand-alone specialists, often bought to address specific apps
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: E.g. Lucene, Solr, Elasticsearch
– With support/add-ons: E.g. LucidWorks, Cloudera Search, Elastic
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch, Microsoft Azure Search
Market Observations
The dominant market share is with SharePoint, open
source, and the Google Search Appliance
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a
compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very widespread deployment
• The Google brand sells search – and a lot of GSAs have
been shipped during the past few years
Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing technology
• Key differences remain in peripheral functionality, such as
content processing prior to indexing. For example:
– Coveo, Attivio, Sinequa all have well-developed indexing
pipelines, UI tools, and a range of data connectors
– SharePoint and GSA have limited content processing
functionality and rely on 3rd parties for connectivity
– Solr, Elasticsearch, Amazon CloudSearch and Azure Search don’t
provide a formal indexing pipeline, UI, or connectors
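To make the term "indexing pipeline" concrete, here is a minimal sketch of content processing prior to indexing, written as a chain of small stages. The stage names (`strip_whitespace`, `add_title`) and the document shape are assumptions made for illustration; they are not taken from any of the products named above.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def strip_whitespace(doc: Document) -> Document:
    """Collapse runs of whitespace in the body text."""
    return {**doc, "body": " ".join(doc["body"].split())}


def add_title(doc: Document) -> Document:
    """Derive a title from the first sentence when none was supplied."""
    if doc.get("title"):
        return doc
    first_sentence = doc["body"].split(".")[0]
    return {**doc, "title": first_sentence[:60]}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order and return the enriched document."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Hypothetical pipeline: order matters, since add_title reads the cleaned body.
PIPELINE: List[Stage] = [strip_whitespace, add_title]

if __name__ == "__main__":
    print(run_pipeline({"body": "  Quarterly   report. Sales rose. "}, PIPELINE))
```

Keeping each stage a plain function makes it cheap to add, remove, or reorder processing steps before the document reaches the index.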
Further Observations
• The search engines with less focus on peripheral issues
(such as content processing and connectivity) have
dominant market share
• Connectivity remains challenging, especially when
combined with continual data growth
• The movement of data sets to the cloud adds further
complexity
– Hybrid indexing environments will be with us for some years
Content Processing / Text Analysis Examples
• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Additional metadata from content analysis
• Categorization
• Document vector extraction
• Splitting and concatenation
• Dupe & near-dupe detection
• Link analysis
• Ingesting external signals
• Security enforcement and analysis
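As one illustration of the list above, near-dupe detection can be sketched with word shingles and Jaccard similarity. This is a common textbook technique chosen for the example; the slides do not say which method any particular product uses, and the shingle size and threshold below are arbitrary assumptions.

```python
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, size: int = 3) -> Set[Shingle]:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.5) -> bool:
    """Flag two texts as near-duplicates when their shingle sets overlap enough."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


if __name__ == "__main__":
    a = "the quarterly sales report for europe is attached"
    b = "the quarterly sales report for asia is attached"
    print(jaccard(shingles(a), shingles(b)))
```

At scale the pairwise comparison would normally be replaced by a hashing scheme, but the similarity measure stays the same.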
[Diagram: an index record enriched with security, category, and metadata fields]
Future Directions
So what will search architectures look like in the future?
Important Influences:
• The need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and churn in repository
/ storage fashions
Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, evangelized by IBM,
Cloudera, etc.
2. Contemporary Search Architectures
Background Info
The Traditional Search Architecture
[Diagram: content sources (employee directory, CMS, file share, etc.) feed an integrated search engine comprising connectors, index pipeline, and search index, with a UI on top]
Designed for Unstructured Content
• Typical indexing throughput: a few documents per second?
• There are only about 2.6 million seconds in a month
• If you change something significant in the index
pipeline, you will need to re-index
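The arithmetic behind these bullets can be worked through directly. The sketch below assumes a 30-day month; the collection size and throughput in the example are hypothetical figures chosen only to show the scale of the problem.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. about 2.6 million


def docs_per_month(docs_per_second: float) -> float:
    """Upper bound on documents indexed in a 30-day month at a steady rate."""
    return docs_per_second * SECONDS_PER_MONTH


def reindex_days(document_count: int, docs_per_second: float) -> float:
    """Days needed to re-index a collection at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return document_count / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical example: 10 million documents at 3 documents per second.
    print(f"{reindex_days(10_000_000, 3):.1f} days")  # roughly 38.6 days
```

At a few documents per second, any pipeline change that forces a full re-index of a large collection costs weeks, which is the motivation for the architecture on the next slide.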
A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
[Diagram: content sources (employee directory, CMS, etc.), connectors, a staging repository with content processing, and the search engine (index pipeline, search index); re-indexing is driven from the staging repository, supporting iterative development]
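The idea of a staging repository can be sketched as follows: content is fetched from the source once, and every later re-index reads the staged copy instead of going back to the repository. `StagingRepository` and `CountingSource` are hypothetical names used for illustration, and the in-memory dictionary stands in for whatever storage a real deployment would use.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class CountingSource:
    """Stands in for a slow repository such as a CMS; counts how often it is read."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs
        self.reads = 0

    def crawl(self) -> List[Tuple[str, str]]:
        self.reads += 1
        return list(self.docs.items())


class StagingRepository:
    """Holds raw content so it can be re-processed without touching the source."""

    def __init__(self) -> None:
        self._raw: Dict[str, str] = {}

    def ingest(self, records: Iterable[Tuple[str, str]]) -> None:
        for doc_id, content in records:
            self._raw[doc_id] = content

    def reindex(self, process: Callable[[str], Dict[str, str]]) -> Dict[str, Dict[str, str]]:
        """Build a fresh index by running a processing function over staged content."""
        return {doc_id: process(content) for doc_id, content in self._raw.items()}


if __name__ == "__main__":
    source = CountingSource({"1": "Hello World"})
    staging = StagingRepository()
    staging.ingest(source.crawl())  # the only contact with the source

    index_v1 = staging.reindex(lambda text: {"body": text})
    index_v2 = staging.reindex(lambda text: {"body": text.lower()})  # changed pipeline
    print(source.reads, index_v2)
```

Two versions of the index are built while the source is read only once, which is what allows iterative development of the content processing.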
The Future Architecture?
[Diagram: the architecture from the previous slide, with Hadoop underpinning the staging repository and content processing; content sources (employee directory, CMS, etc.), connectors, and the search engine (index pipeline, search index) are unchanged]
• This environment will encourage ever more sophisticated content processing
• We expect much innovation in text analytics during the next few years
• Driven by cheap, easily available processing power
• The deliverable is a richer search index
The Future Architecture
• Google.com has worked something like this for 10+ years
An Integrated Search/Analytics Architecture
[Diagram: content sources (CMS, file system) arrive via connectors / crawlers, and data sources (data warehouse, log files, etc.) via ETL, into a Hadoop-based staging repository with content processing and iterative development; rapid, ad hoc indexing feeds multiple search and analysis applications, including an OSINT search app]
• Encourages agile exploitation of data and content resources
Summary
• Search and analytics are tending towards the same architecture
• Autonomous connectivity and content processing systems simplify and de-risk projects
• The “search index” is a mature technology, and becoming a commodity
– Thanks to open source alternatives setting high standards
• The centre of attention is shifting from the index to the content preparation
– This perhaps fits well with the profile of dominant market leaders: SharePoint, GSA, Solr, Elasticsearch…
Conclusion
• The foundation of great search and analytical applications
is a clean, rich and detailed index
• Much of the innovation during the next few years will be in
content analytics
– The architecture discussed makes it easy to adopt new ideas
and products
– And it promotes agility, experimentation, and innovation
• In a data-driven world, agility is vital
And finally…
The analyst quote
“Enterprise Search Can Bring Big Data Within Reach”
• Multiple, purpose-built indexes that are derived from enriched content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog
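One way to read "multiple, purpose-built indexes derived from enriched content" is sketched below: the same enriched documents feed both a full-text index and a category index. The field names (`id`, `body`, `category`) and both builder functions are assumptions made for this example, not part of the quoted source.

```python
from collections import defaultdict
from typing import Dict, List, Set

EnrichedDoc = Dict[str, str]


def build_text_index(docs: List[EnrichedDoc]) -> Dict[str, Set[str]]:
    """Inverted index for keyword search: term -> ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for term in doc["body"].lower().split():
            index[term].add(doc["id"])
    return dict(index)


def build_category_index(docs: List[EnrichedDoc]) -> Dict[str, List[str]]:
    """Index for browsing or analysis: category -> document ids, in input order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        index[doc["category"]].append(doc["id"])
    return dict(index)


if __name__ == "__main__":
    # Hypothetical enriched content, as it might leave a content processing stage.
    enriched = [
        {"id": "a", "body": "Budget forecast", "category": "finance"},
        {"id": "b", "body": "Budget review", "category": "finance"},
        {"id": "c", "body": "Hiring plan", "category": "hr"},
    ]
    print(build_text_index(enriched)["budget"])
    print(build_category_index(enriched))
```

Because both indexes are derived from the same enriched content, adding a third index for a new application does not require going back to the content sources.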
Thank you!
Reference Architecture
[Diagram: content sources, connectors, a staging repository, content processing pipelines (semantics, text mining, quality metrics) on a big data framework, indexes, a search engine with query parsing, and a web browser]