The trade-off between scale and update rate that search engines face on the Web 2.0. How enhanced indexing and smart filtering enable near-real-time engines. SolR Lucene ultra-fast search server and the user-defined "websphere" (feeds and filters).
1
A near‐real‐time search and alert service based on SolR Lucene
April 2013 www.visibium.com
2
The need
What’s new with NFC technology?
What is said about my competitors?
What is said by my competitors?
What’s said about my brand?
What’s said about key executives of my company?
What’s said about my last marketing campaign?
What’s said about my product launch?
What’s said about my last ad campaign?
3
Industry watch
Competition watch
Brand protection
Campaign impact analysis
I need to search the Web 2.0 continuously on certain topics
4
I know where to look
I know what I’m looking for…
… and I want to get an alert when new matching content is posted.
Within minutes, not the day after.
5
The Problem
I need to search the Web 2.0 continuously on certain topics.
I want to get an alert when new matching content is posted… within minutes, not the day after.
Some websites take days to get indexed by the major search engines (Google, Bing, Yahoo!…).
Alert services are only as good as their indexing rate.
A day, not a minute, is the norm (except for breaking news and weather alerts).
True “real‐time search” engines (OneRiot, Wowd, Crowdeye, Collecta) failed because the technology involved massive R&D costs.
Google closed its real‐time search service in 2011.
6
The State of the Union
… within minutes, not the day after.
Narrow look, deep digging
Broad look, shallow digging
7
Social Web Monitoring & Trending solutions
• Look at big chunks of the Web
• Detect trends, mood, new topics, influencers, etc.
Near‐real‐time search engines
• Typically look at the most popular content feeds, and run indexing at frequent intervals (hence the near‐real‐time)
• Some offer powerful query tools.
8
Social Web Monitoring & Trending solutions
• Look at big chunks of the Web
• Detect trends, mood, new topics, influencers, etc.
• Typically can’t single out contributions that match a user‐defined query.
Near‐real‐time search engines
• Typically look at the most popular content feeds, and run indexing at frequent intervals (hence the near‐real‐time)
• Some offer powerful query tools to users.
9
Let’s dig deep
Deep digging is about using powerful query tools, which require full‐text indexing (among other things).
Full‐text indexing carries a trade‐off between scale and update rate.
The less data, the “nearer” to real time.
So…
10
Let’s dig deeper
So… two directions for getting nearer to real time:
• Enhanced indexing
• Smart selection of data to index
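To make the trade‐off concrete, here is a minimal sketch of what full‐text indexing involves, written as a toy inverted index in Python. It is an illustration only, not how SolR Lucene is implemented: the class name, the whitespace tokenizer and the sample documents are all assumptions. The point it shows is that every ingested token costs indexing work, so less data means a faster refresh.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Indexing work is paid per token, so it grows with the volume of text ingested.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical sample documents.
index = InvertedIndex()
index.add(1, "NFC payments launch in Paris")
index.add(2, "New NFC chip announced")
index.add(3, "Weather alert for Paris")
matches = index.search("nfc paris")
```

Here `matches` contains only document 1, the single document that holds both terms.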
11
Enhanced indexing
What do Apple, Netflix, Wikipedia, LinkedIn, eBay and Twitter have in common?
12
Enhanced indexing with SolR Lucene
13
Picking the right tools for the job
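As a hedged illustration of what querying such a server might look like, the sketch below builds a request URL for Solr's standard `/select` handler using the common `q`, `fq`, `sort`, `rows` and `wt` parameters. The host, the core name `watch` and the field names `text` and `published_at` are assumptions made for the example; the slides do not describe Visibium's actual schema or queries.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, terms, minutes=5, rows=20):
    """Build a Solr /select URL for documents matching `terms` published recently."""
    params = {
        # Full-text query on a hypothetical `text` field.
        "q": f"text:({terms})",
        # Filter query using Solr date math: only the last `minutes` minutes.
        "fq": f"published_at:[NOW-{minutes}MINUTES TO NOW]",
        # Newest documents first.
        "sort": "published_at desc",
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical local server and core name.
url = build_solr_query("http://localhost:8983/solr", "watch", "NFC AND payment")
```

Running a query like this at short intervals is one simple way to turn a search into an alert feed.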
14
Limiting the indexed data
Basic architecture
Content feeds
• Twitter public stream (fire hose)
• Twitter private feeds
• Facebook updates
• Syndicated content (RSS)
• Blogs, forums
• News
SEARCH
• Watch
• Queries
Matching results
• Alerts
• Dispatch
15
Limiting the indexed data
Selective architecture
Content feeds
• Twitter public stream (fire hose)
• Twitter private feeds
• Facebook updates
• Syndicated content (RSS)
• Blogs, forums
• News
FILTERS
• Geo (e.g. local search engine)
• Audience (e.g. most popular)
• Buzz (e.g. #tags)
Filtered data index
SEARCH
• Watch
• Queries
Matching results
• Alerts
• Dispatch
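The selective architecture can be sketched as a set of predicates applied to incoming posts before anything reaches the index. This is a minimal Python illustration under stated assumptions: the `Post` fields, the filter functions and the choice to combine filters with AND are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    text: str
    country: str
    followers: int
    tags: set = field(default_factory=set)


# Each filter is a predicate; a post is indexed only if every filter accepts it.
def geo_filter(countries):
    return lambda post: post.country in countries


def audience_filter(min_followers):
    return lambda post: post.followers >= min_followers


def buzz_filter(tags):
    return lambda post: bool(post.tags & tags)


def select_for_indexing(posts, filters):
    """Keep only the posts that pass all filters (assumed AND combination)."""
    return [p for p in posts if all(f(p) for f in filters)]


# Hypothetical feed items.
feed = [
    Post("NFC rollout in Lyon", "FR", 12000, {"nfc"}),
    Post("NFC rumours", "US", 50000, {"nfc"}),
    Post("Lunch photo", "FR", 30, {"food"}),
]
filters = [geo_filter({"FR"}), audience_filter(1000), buzz_filter({"nfc"})]
to_index = select_for_indexing(feed, filters)
```

Only the first post survives all three filters, so only that post would incur indexing cost.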
16
Smart selection of data to index
User‐defined filters
Content feeds
• Twitter public stream (fire hose)
• Twitter private feeds
• Facebook updates
• Syndicated content (RSS)
• Blogs, forums
• News
FILTERS
• User‐defined filters
Filtered data index
SEARCH
• Watch
• Queries
• Refined queries (reprocessing)
Matching results
• Alerts
• Dispatch
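One possible reading of the watch, alert and refined‐query steps is sketched below in Python. The `Watch` class, its methods and the dictionary standing in for the filtered data index are hypothetical names chosen for illustration; the slides do not specify how reprocessing is implemented.

```python
def matches(text, required_terms):
    """True if every required term appears as a word in the text."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in required_terms)


class Watch:
    """A standing query that raises an alert once per newly matching document."""

    def __init__(self, name, required_terms):
        self.name = name
        self.required_terms = list(required_terms)
        self.alerted = set()

    def check(self, index):
        # Alert only on documents that match and have not been reported before.
        alerts = []
        for doc_id, text in index.items():
            if doc_id not in self.alerted and matches(text, self.required_terms):
                self.alerted.add(doc_id)
                alerts.append((self.name, doc_id))
        return alerts

    def refine(self, extra_terms, index):
        # Reprocessing: re-run the narrowed query over what is already indexed.
        self.required_terms += list(extra_terms)
        return [d for d, t in index.items() if matches(t, self.required_terms)]


# A plain dict stands in for the filtered data index (doc id -> text).
index = {1: "nfc payment pilot", 2: "nfc ticketing trial"}
watch = Watch("nfc-watch", ["nfc"])
first = watch.check(index)
index[3] = "new nfc payment terminal"
second = watch.check(index)
refined = watch.refine(["payment"], index)
```

The second check reports only document 3, and refining the watch with "payment" narrows the already indexed matches to documents 1 and 3 without fetching anything again.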
17
Visibium
• A near‐real‐time search and alert service
• User‐defined feeds and filters
• Full‐text indexing
• Advanced queries
• Refined search reprocessing
• Powered by SolR Lucene
Monitor the slice of the web you really care about
© Visibium, 2011‐2013
www.visibium.com