C D H
Transform Enterprise Search with FAST Search for SharePoint
Quick Facts
About Us
• 22nd Year
• Grand Rapids & Royal Oak
• 30 Staff
Approach
• Vendor Independent
• Non-reseller
• Professional Services Only
Partnerships
• Microsoft Gold
• VMware Enterprise
• Citrix Silver
• Novell Gold
• Cisco Premier
Expertise
C D H Talks Tech
About me
David Tappan
Consultant
IOAp, MCITP, MCTS
FAST Search: Better Insight
Agenda: Insight
• How FAST increases insight
• Insight into how FAST is used to solve specific business problems
• Insight into what FAST Search high availability really requires
A question
What is Search, really?
One answer
“Search is the ability to find text strings in documents”
“What should I know about selling ERP?”
- Alan Brewer, Sales Lead
“What should I know about implementing ERP?”
- Renee Lo, Consultant
The Problem: Hidden meaning in the searcher’s intent
Another answer
“Search is the ability to query any document property”
Recommended reading
• http://www.well.com/~doctorow/metacrap.htm
A better answer
Search is a service that matches what you mean with what documents mean.
How FAST Search for SharePoint enables better meaning extraction
Cool FAST solutions
F4SP Architecture Basics
In the box: Dynamic rank algorithms at query time
• Context: query terms in title vs. body
• Query term proximity: «Bill Gates» vs. «Bill saw the gates»
• Anchors match query terms: «...a page about Bill Gates...»
• Click history match: others clicked a hit for «Bill Gates»
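Query term proximity can be illustrated with a toy score: find the smallest window of tokens that contains every query term, and reward shorter windows. This is a minimal sketch assuming simple word tokenization and an arbitrary `len(terms) / span` formula; `min_span` and `proximity_score` are hypothetical names, and this is not F4SP's actual rank profile.

```python
import re
from itertools import product
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def min_span(text: str, terms: List[str]) -> Optional[int]:
    """Length (in tokens) of the smallest window containing every query term."""
    tokens = tokenize(text)
    positions = [[i for i, tok in enumerate(tokens) if tok == term.lower()]
                 for term in terms]
    if any(not p for p in positions):
        return None  # at least one query term does not occur
    # Brute force over one occurrence per term; fine for short illustrative texts.
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


def proximity_score(text: str, terms: List[str]) -> float:
    """Toy score: 1.0 when the terms are adjacent, smaller as they drift apart."""
    span = min_span(text, terms)
    return 0.0 if span is None else len(terms) / span


print(proximity_score("A page about Bill Gates", ["Bill", "Gates"]))  # 1.0
print(proximity_score("Bill saw the gates", ["Bill", "Gates"]))       # 0.5
```

Both texts contain both terms, so a plain term-match check cannot separate them; only the distance between the terms does.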
Customizable Query Processing
What is someone thinking about when they perform a query?
Looking for a knowledge management solution?!?!?
I love SharePoint
It’s the best Knowledge Management Solution in the market
Have you ever built an e-commerce solution on it?
Our focus is knowledge management, and it just works!
We use it as a web content management system, and we’re so happy with it
Great for WCM, Great for KM!
Just deployed for KM… so good, so far… will get back once the pilot is over!
Search and the activity feed
Knowledge Management
Web Content Management
E-Commerce
For the geeks…
xrank(string("fast search"),
      or(department:or(string("services"),
                       string("engineering")),
         keywords:string("knowledge management")),
      boost=10000)
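In practice the boost clause would be generated per user rather than typed by hand. A minimal sketch, assuming the user's department and interests are already known (for example, from a user profile); the managed property names `department` and `keywords` come from the query above, while `fql_string`, `scoped_any` and `profile_boost_query` are hypothetical helpers. Quote escaping inside FQL strings is deliberately not handled.

```python
from typing import Sequence


def fql_string(text: str) -> str:
    # Wrap free text in the FQL string() operator; embedded quotes are rejected.
    if '"' in text:
        raise ValueError("embedded quotes are not supported in this sketch")
    return f'string("{text}")'


def scoped_any(prop: str, values: Sequence[str]) -> str:
    # prop:string("v") for one value, prop:or(string("a"), string("b")) for several.
    parts = [fql_string(v) for v in values]
    if not parts:
        raise ValueError(f"no values given for {prop}")
    inner = parts[0] if len(parts) == 1 else f"or({', '.join(parts)})"
    return f"{prop}:{inner}"


def profile_boost_query(user_query: str, departments: Sequence[str],
                        keywords: Sequence[str], boost: int = 10000) -> str:
    """Match on the user's text; boost hits tied to the user's department or interests."""
    rank = f"or({scoped_any('department', departments)}, {scoped_any('keywords', keywords)})"
    return f"xrank({fql_string(user_query)}, {rank}, boost={int(boost)})"


print(profile_boost_query("fast search", ["services", "engineering"],
                          ["knowledge management"]))
```

xrank only re-orders results: the first operand decides what matches, and the second decides which of those matches get the extra rank.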
In the box: Static rank algorithms at content processing time
• Landing pages: prefer shallow URLs
• Authority: links from other pages
• High quality: boost sites/documents
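The three static signals can be folded into one query-independent number per document at indexing time. A minimal sketch, assuming a made-up formula: inverse URL depth for landing pages, `log1p` of the inbound link count for authority, and an additive manual boost for high-quality sites. The weights and the `url_depth`/`static_rank` names are hypothetical, not F4SP's built-in static rank.

```python
import math
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    # Count non-empty path segments: "http://intranet/" -> 0, "http://intranet/a/b" -> 2
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def static_rank(url: str, inbound_links: int, quality_boost: float = 0.0) -> float:
    """Toy query-independent score, computed once when the document is processed."""
    landing_page = 1.0 / (1 + url_depth(url))  # shallow URLs score higher
    authority = math.log1p(inbound_links)      # diminishing returns per extra link
    return landing_page + authority + quality_boost


home = static_rank("http://intranet/", inbound_links=50)
deep = static_rank("http://intranet/sites/km/Pages/default.aspx", inbound_links=50)
print(home > deep)  # True
```

Because none of these signals depends on the query, they can be computed during content processing and stored with the document instead of being recomputed for every search.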
Customizable content processing: How to Index Content by Location?
• Address, intersection, zip code, names, etc.
  – One Microsoft Way, Redmond, WA
• Geodetic coordinates (latitude & longitude)
  – 47.639767, -122.129755
  – Degrees, minutes, seconds: 47° 38′ 23.16″ N, 122° 7′ 47.1″ W
• Universal Transverse Mercator (UTM)
  – 10N 565367 5276630
• Military Grid Reference System (MGRS)
  – 10T ET 65367 76630
Index Schema (Managed Properties)
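Most of the coordinate formats above can be cross-checked with a few lines of arithmetic: decimal degrees versus degrees/minutes/seconds, and the UTM zone number plus latitude band that prefix the MGRS example ("10T"). A minimal sketch; the helper names are hypothetical, the Norway/Svalbard zone exceptions are ignored, and full UTM easting/northing conversion is left to a projection library such as pyproj.

```python
from typing import Tuple

# MGRS latitude bands are 8 degrees tall, starting at 80°S; letters I and O are skipped.
LAT_BANDS = "CDEFGHJKLMNPQRSTUVWX"


def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def decimal_to_dms(value: float) -> Tuple[int, int, float]:
    value = abs(value)  # the hemisphere (N/S, E/W) is carried separately
    degrees = int(value)
    minutes_full = (value - degrees) * 60
    minutes = int(minutes_full)
    # Seconds are rounded to 2 places; a value like 59.999 may round to 60.0 (not normalized here).
    seconds = round((minutes_full - minutes) * 60, 2)
    return degrees, minutes, seconds


def utm_zone_and_band(lat: float, lon: float) -> str:
    """UTM zone number plus MGRS latitude band letter."""
    zone = int((lon + 180) // 6) + 1
    band = LAT_BANDS[min(int((lat + 80) // 8), len(LAT_BANDS) - 1)]  # band X covers 72-84°N
    return f"{zone}{band}"


print(decimal_to_dms(47.639767))                     # (47, 38, 23.16)
print(utm_zone_and_band(47.639767, -122.129755))     # 10T
```

All of these forms describe the same point, which is why a single canonical pair of latitude/longitude managed properties is easier to query than whatever format happens to appear in the source text.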
Geographic entity extraction
• Requirement
  – Parse elements from text
  – Tag documents with the individual values
• Solution
  – Custom regular expression extraction
  – Call Bing Maps API
  – Return latitude and longitude and store them as crawled properties
{ name: 'Microsoft',
  address: 'One Microsoft Way, Redmond, WA 98052',
  phone: '1-800-Microsoft (642-7676)',
  path: 'http://www.microsoft.com',
  latitude: '47.639767',
  longitude: '-122.129755' }
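The requirement/solution pair can be sketched as one enrichment step: a regular expression splits the address into parts, and a geocoder supplies latitude and longitude, stored as strings as in the record above. This is a minimal sketch under stated assumptions: the regex handles only simple "street, city, ST 12345" US addresses, the Bing Maps call is replaced by an injected `geocode` function (a dictionary lookup in the example), and `ADDRESS_RE`, `parse_address` and `enrich` are hypothetical names. In F4SP such code would run as a custom content-processing step, whose wiring is not shown here.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical pattern for "street, city, ST 12345[-6789]" US addresses.
ADDRESS_RE = re.compile(
    r"\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})(?:-\d{4})?\s*"
)

# Stand-in for a geocoding service: address in, (latitude, longitude) or None out.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def parse_address(address: str) -> Optional[Dict[str, str]]:
    match = ADDRESS_RE.fullmatch(address)
    return match.groupdict() if match else None


def enrich(doc: Dict[str, str], geocode: Geocoder) -> Dict[str, str]:
    """Tag a document with address parts plus latitude/longitude strings."""
    out = dict(doc)
    parts = parse_address(doc.get("address", ""))
    if parts is None:
        return out  # nothing recognizable; leave the document untouched
    out.update(parts)
    coords = geocode(doc["address"])
    if coords is not None:
        out["latitude"], out["longitude"] = (f"{c:.6f}" for c in coords)
    return out


# Hypothetical lookup table standing in for the Bing Maps API call.
KNOWN = {"One Microsoft Way, Redmond, WA 98052": (47.639767, -122.129755)}
doc = enrich({"name": "Microsoft", "address": "One Microsoft Way, Redmond, WA 98052"}, KNOWN.get)
print(doc["city"], doc["latitude"], doc["longitude"])  # Redmond 47.639767 -122.129755
```

Injecting the geocoder keeps the extraction logic testable offline; a production version would also need caching and rate limiting around the remote call.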
How they did it
[Diagram: F4SP architecture. Data sources and feeders; content processor (format conversion, language detection, entity extraction, lemmatization, mapper, …) with "Geo-coding with Bing Maps API" called out; indexer; index partitions; query processor; Search Center; end users; federation to an OpenSearch source.]
Geographic queries
and(YOUR_TERM(s)_HERE,
    maxlatitude:range(LOW_LAT, max),
    minlatitude:range(min, HIGH_LAT),
    maxlongitude:range(LOW_LON, max),
    minlongitude:range(min, HIGH_LON))
e.g. and(football, maxlatitude:range(12, max), minlatitude:range(min, 34), maxlongitude:range(56, max), minlongitude:range(min, 78))
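The template is a bounding-box overlap test: a document matches when its own box (the `minlatitude`/`maxlatitude`/`minlongitude`/`maxlongitude` managed properties the template implies; a single point is a box whose min equals its max) intersects the search box. A minimal sketch that fills the template and mirrors the same four comparisons in plain Python; `Box`, `geo_fql` and `overlaps` are hypothetical names, and boundary inclusivity is ignored in the Python mirror.

```python
from typing import NamedTuple


class Box(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def geo_fql(terms: str, box: Box) -> str:
    """Fill the slide's template with a search box (LOW_LAT = box.min_lat, etc.)."""
    return (
        f"and({terms}, "
        f"maxlatitude:range({box.min_lat}, max), "
        f"minlatitude:range(min, {box.max_lat}), "
        f"maxlongitude:range({box.min_lon}, max), "
        f"minlongitude:range(min, {box.max_lon}))"
    )


def overlaps(doc: Box, query: Box) -> bool:
    # Same four comparisons as the FQL: the document box must reach into the query box.
    return (doc.max_lat >= query.min_lat and doc.min_lat <= query.max_lat
            and doc.max_lon >= query.min_lon and doc.min_lon <= query.max_lon)


print(geo_fql("football", Box(12, 34, 56, 78)))
redmond = Box(47.639767, 47.639767, -122.129755, -122.129755)  # a point
print(overlaps(redmond, Box(47.0, 48.0, -123.0, -122.0)))       # True
```

Only range filters on indexed numeric properties are needed, so no special spatial index is involved; the trade-off is that a box is a coarse stand-in for a true radius search.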
Takeaways
• Search ain’t beanbag
• http://www.well.com/~doctorow/metacrap.htm
• FAST Search for SharePoint provides tools to extract MEANING from content and queries
Scaling FAST Search: What it takes
FAST Search for SharePoint scale-out
Scale out along multiple “dimensions”:
• Content volume: 15M docs/node*
• Query volume: 25 QPS/node*
• Indexing freshness: 50 docs/sec*
Redundancy options: search, indexing
*Performance targets; depend on content and hardware specifics.
[Diagram tiers: search and indexing; crawling and content processing; query and result processing]
No theoretical upper bounds!
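The per-node targets turn into a back-of-the-envelope sizing calculation. F4SP describes its search cluster as columns (index partitions, driven by content volume) and rows (copies of the index, driven by query volume and redundancy). A minimal sketch, assuming the slide's targets apply unchanged, that the 50 docs/sec figure is per content-processing node, and a hypothetical "one extra row" rule for search redundancy; real sizing depends on content and hardware.

```python
import math
from typing import Dict

# Per-node targets from the slide (they depend on content and hardware).
DOCS_PER_NODE = 15_000_000
QPS_PER_NODE = 25
DOCS_PER_SEC_PER_NODE = 50  # assumption: read as a per-node feed rate


def size_farm(total_docs: int, peak_qps: float, feed_rate: float,
              redundant_search: bool = False) -> Dict[str, int]:
    columns = math.ceil(total_docs / DOCS_PER_NODE)  # content volume -> index columns
    rows = math.ceil(peak_qps / QPS_PER_NODE)        # query volume -> search rows
    if redundant_search:
        rows += 1                                    # hypothetical N+1 redundancy rule
    processing = math.ceil(feed_rate / DOCS_PER_SEC_PER_NODE)  # indexing freshness
    return {"index_columns": columns, "search_rows": rows,
            "search_nodes": columns * rows, "processing_nodes": processing}


print(size_farm(40_000_000, 60, 100))
# {'index_columns': 3, 'search_rows': 3, 'search_nodes': 9, 'processing_nodes': 2}
```

For 40 million documents, 60 queries per second at peak and a target feed rate of 100 docs/sec, the sketch gives 3 columns × 3 rows = 9 search nodes plus 2 processing nodes. At that feed rate an initial full feed takes 40,000,000 / 100 = 400,000 seconds, roughly 4.6 days.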
Don’t forget SharePoint!
[Diagram: FAST Content SSA crawl architecture. Labels: FAST Content SSA Admin DB; FAST Content SSA Crawl DB (crawl data, crawl history, crawl queue additions); admin component; master crawl component; crawl components; request crawl; poll request; log request; distribute work; document batches; Content Web Service; FAST Search; web crawls; database.]
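The "poll request" and "distribute work" labels describe crawl components that pull work from shared state rather than having it pushed to them. An in-memory analogy only: the `queue.Queue` stands in for the crawl database, each thread for a crawl component, and each emitted list for a document batch headed to FAST Search. The names and batch size are hypothetical; this is not how SharePoint implements crawling.

```python
import queue
import threading
from typing import List, Tuple


def run_crawl(urls: List[str], num_components: int = 3,
              batch_size: int = 4) -> List[Tuple[str, List[str]]]:
    """Crawl components poll a shared queue and emit document batches."""
    work: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        work.put(url)  # filled before any worker starts, so Empty means "all handed out"
    batches: List[Tuple[str, List[str]]] = []  # (component name, batch of URLs)
    lock = threading.Lock()

    def crawl_component(name: str) -> None:
        batch: List[str] = []
        while True:
            try:
                url = work.get_nowait()  # "poll request": ask for the next item
            except queue.Empty:
                break
            batch.append(url)  # a real component would fetch and parse the URL here
            if len(batch) == batch_size:
                with lock:
                    batches.append((name, batch))
                batch = []
        if batch:  # flush the final partial batch
            with lock:
                batches.append((name, batch))

    workers = [threading.Thread(target=crawl_component, args=(f"crawl-{i}",))
               for i in range(num_components)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return batches


for name, batch in run_crawl([f"http://intranet/page{i}" for i in range(10)]):
    print(name, len(batch))
```

Pull-based distribution balances itself: a faster component simply polls more often, so adding crawl components (as in the crawl layer build-out) needs no central rebalancing.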
SharePoint Search components
[Diagram: SharePoint server with all search components on one server (query, index partition P1, crawl, admin); database server with all databases on one instance (crawl, admin, property).]
Search deployment: Query layer build-out
[Diagram: query components on multiple SharePoint servers; index re-partitioned into P1 and P2; crawl and admin components unchanged; all databases (crawl, admin, property) on one instance.]
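Re-partitioning the index means each document lives in exactly one partition, and every query fans out to all partitions and merges their results. A minimal sketch, assuming documents are assigned by a stable hash of their ID and each already carries a relevance score for the query; SharePoint's actual assignment scheme is internal, and `partition_for`, `build_partitions` and `fan_out_query` are hypothetical names.

```python
import hashlib
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def partition_for(doc_id: str, num_partitions: int) -> int:
    # Stable hash, so a document always lands in the same partition across runs.
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_partitions(scored_docs: Dict[str, float], num_partitions: int) -> List[List[Hit]]:
    partitions: List[List[Hit]] = [[] for _ in range(num_partitions)]
    for doc_id, score in scored_docs.items():
        partitions[partition_for(doc_id, num_partitions)].append((score, doc_id))
    return partitions


def fan_out_query(partitions: List[List[Hit]], k: int) -> List[Hit]:
    # Each partition returns its local top k; the query layer merges them into a global top k.
    local_tops = [heapq.nlargest(k, part) for part in partitions]
    return heapq.nlargest(k, (hit for top in local_tops for hit in top))


docs = {f"doc{i}": float(i) for i in range(20)}
parts = build_partitions(docs, 2)  # P1 and P2
print(fan_out_query(parts, 3))     # [(19.0, 'doc19'), (18.0, 'doc18'), (17.0, 'doc17')]
```

Merging local top-k lists is exact: any document in the global top k must also be in the top k of its own partition, so no partition needs to return more than k hits.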
Search deployment: Crawl layer build-out
[Diagram: as in the query layer build-out, query components run on multiple SharePoint servers with the index re-partitioned into P1 and P2; crawl components are now also spread across multiple SharePoint servers; admin component; all databases (crawl, admin, property) on one instance.]
Thank You
Royal Oak: 306 S. Washington Ave., Suite 212, Royal Oak, MI 48067; p: (248) 546-1800
Grand Rapids: 15 Ionia SW, Suite 270, Grand Rapids, MI 49503; p: (616) 776-1600
(c) C/D/H 2007. All rights reserved. www.cdh.com