Local Touch – Global Reach

www.us.sogeti.com

SharePoint 2010 Search Deep Dive

Corey Erkes, Manager Consultant, Sogeti USA

About Me

• Manager Consultant within Sogeti SharePoint Practice

• Worked with SharePoint since V2

• MCTS: Microsoft SharePoint 2010, Configuring

• Co-Leader of Omaha SharePoint User Group

• Coauthor of SharePoint 2010 Governance Book

• Member of UNO IS&T Alumni Board

Agenda

SharePoint 2010 Search Versions
• SharePoint 2010 Foundation
• Search Server Express
• Search Server
• SharePoint 2010 Server
• FAST

Search 2010 Architecture
• How to Configure
• Crawl Component
• Query Component
• Associated Databases

How to Scale Out

SharePoint 2010 Search Versions

Wait, there are different flavors of Search?

• SharePoint Foundation 2010
• Search Server 2010 Express
• Search Server 2010
• SharePoint Server 2010
• FAST Search Server 2010 for SharePoint

Search Server 2010 Express is a separate product outside of SharePoint 2010, but when installed with SharePoint Foundation 2010, it can provide a lot of functionality.

SharePoint 2010 Search Functionality Breakdown

Feature | SharePoint Foundation 2010 | Search Server Express | Search Server 2010 | SharePoint Server 2010 | FAST Search Server 2010 for SharePoint

Visual Best Bets Limited Limited

Scopes

Search enhancements based on user context

Custom properties

Property extraction Limited Limited Limited

Query federation

Query suggestions

Similar results

Sort results on managed properties or rank profiles

Relevancy tuning by document or site promotions Limited Limited Limited

SharePoint 2010 Search Functionality Breakdown - Continued

Feature | SharePoint Foundation 2010 | Search Server Express | Search Server 2010 | SharePoint Server 2010 | FAST Search Server 2010 for SharePoint

Shallow results refinement

Deep results refinement

Document preview and thumbnails

Windows 7 federation

People search

Social search

Taxonomy integration

Multi-tenant hosting

Rich Web indexing support

SharePoint 2010 Index Size Capabilities

• SharePoint Foundation 2010 can be scaled out beyond ~10 million items by adding a search server and assigning it to crawl different content databases

Available Search Repositories

Repository | SharePoint Foundation 2010 | Search Server Express | Search Server 2010 | SharePoint Server 2010 | FAST Search Server 2010 for SharePoint

SharePoint sites

Windows file shares

Exchange public folders

Lotus Notes

Web sites

IFilters for additional systems

Structured content in databases

Search Manageability

Manageability | SharePoint Foundation 2010 | Search Server Express | Search Server 2010 | SharePoint Server 2010 | FAST Search Server 2010 for SharePoint

UI-based administration Limited

Scriptable deployment and management via PowerShell

Microsoft System Center Operations Manager Pack

Health Monitoring

Usage Reporting

So wait, Search Server Express is free?

Feature | Search Server Express | SharePoint Server 2010

Performance with sub-second response time 10 million items* 100 million items

Scriptable deployment and management via PowerShell

User interface–based (UI-based) administration

Relevancy tuning by document or site promotions

Common connector framework for indexing and federation

Search from Windows 7 and Windows Mobile

Metadata-based refinement panel

Metadata extraction on managed properties

Scriptable deployment and management using Windows PowerShell

Relevance improves with social behavior

Query suggestions, related searches, and improved “Did you mean?”

* - assumes SQL Server and not SQL Server Express

Really, Search Server Express is free?

Feature | Search Server Express | SharePoint Server 2010

People and expertise search

Taxonomy and term store integration

Phonetic and nickname search

Integration with My Site

That’s a lot of goodness for free!

Unfortunately, FAST Search is not free!

SharePoint 2010 Search Architecture

Goodbye SSP, Hello SharePoint Search Service!

Search Service Application

The Search Service Application and its proxy can be provisioned in one of three ways:

• Central Administration Manage Service Applications Page
• Central Administration Farm Configuration Wizard
• PowerShell (how the cool kids do it!)

Creation of Search Service Application PowerShell Walk-Thru
http://blogs.msdn.com/b/russmax/archive/2009/10/20/sharepoint-2010-configuring-search-service-application-using-powershell.aspx

SharePoint Search Roles

Four unique roles involved in Search
• Web server role
  • Provides interface for searching
• Query server role
  • Serves search results to web server(s)
• Crawl server role
  • Responsible for crawling content
• Database server role
  • Hosts the three databases associated with search
    • Property database
    • Crawl database
    • Search administration database

Search Components

[Diagram: Web Front End; Search Service Application Proxy (WCF call); Query Server / Query Processor; Query Component with Index; Property Store Database; Search Administration Database; Index Server with Crawler, Connector(s), and Index; Index Propagation; Crawl Database; Content Data Sources: SharePoint web sites, shared folders, external web sites, custom databases, other systems]

Database Role

A minimum of three databases is required to support Search:
• Property databases
  • Contain metadata or associated custom properties for all crawled items
• Crawl databases
  • Contain the history of crawls
  • Manage start and stop points of crawls
  • A crawl database can have more than one crawler associated with it, but a single crawler can be associated with only one database
• Search Administration database
  • Stores search configuration data such as scopes and refiners
  • Contains security information for the crawled content

Database Sizing

Calculations for sizing databases
• Property databases
  • 0.046 x (sum of content databases)
• Crawl databases
  • 0.015 x (sum of content databases)
• Search Administration database
  • Allocate 10 GB
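
The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes the multipliers apply to the combined size of all content databases in GB, and the function name and the sample content database sizes are hypothetical, not taken from the deck.

```python
def size_search_databases(content_db_sizes_gb):
    """Estimate search database sizes (GB) from the content database sizes (GB)."""
    total_content_gb = sum(content_db_sizes_gb)
    return {
        "property_db_gb": 0.046 * total_content_gb,  # 0.046 x (sum of content databases)
        "crawl_db_gb": 0.015 * total_content_gb,     # 0.015 x (sum of content databases)
        "search_admin_db_gb": 10.0,                  # flat 10 GB allocation
    }


# Hypothetical farm with three content databases (500 GB in total)
estimate = size_search_databases([100, 250, 150])
for name, size_gb in estimate.items():
    print(f"{name}: {size_gb:.1f} GB")
```

For the sample input the property database comes out at roughly 23 GB and the crawl database at roughly 7.5 GB; treat both as starting points for capacity planning rather than hard limits.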

Database Characteristics
• Property databases
  • Write-heavy, 1:2 read:write ratio
• Crawl databases
  • Read-heavy, 3:1 read:write ratio
  • Should not be collocated with Property DB
• Search Administration database
  • Equal read/write

Crawl Role

The purpose of the crawl server is to index content
• Crawl runs under mssearch.exe (SharePoint Server Search 14)
• Crawl server does not contain a copy of the index; the index is streamed/propagated to the Query server
  • No longer a single point of failure
• Crawler component needs to be mapped to a SQL crawl database
  • Possible to create multiple Crawl databases and Crawler components

Crawl Architecture

[Diagram: Web Front End; Search Service Application Proxy (WCF call); Query Server / Query Processor; Query Component with Index; Property Store Database; Search Administration Database; Index Server with Crawler, Connector(s), and Index; Index Propagation; Crawl Database; Content Data Sources: SharePoint web sites, shared folders, external web sites, custom databases, other systems]

Crawl Role – Fault Tolerance

• Can be achieved by provisioning a secondary crawl component on a secondary server
  • Can be mapped to the same SQL Crawl database
  • Having more Crawl databases than Crawl components doesn’t make sense and wastes system resources
• Crawl Database fault tolerance should be handled through SQL mirroring

Crawl Role – Performance

• Performance is improved by adding Crawl components, as two or more are crawling content instead of one
  • Load is distributed across both Crawl components
  • Overlapping would not occur, as items are crawled in batches by both crawlers

Crawl Role – Distribution

• Can be accomplished by doing the following:
  • Crawl Component 1 → Crawl DB 1
  • Crawl Component 2 → Crawl DB 2
• Each web application host is assigned to a crawl component, and the system attempts to distribute load evenly across crawl databases
  • sales.company.com → Crawl Component 1 → Crawl DB 1
  • hr.company.com → Crawl Component 2 → Crawl DB 2
• Distribution is based on the number of items/doc IDs stored in each crawl DB

Crawl Role – Distribution Example

Let’s say you have two web applications
• sales.company.com → Crawl Component 1 → Crawl DB 1
• hr.company.com → Crawl Component 2 → Crawl DB 2

Crawl DB 1 contains 3,000 items
Crawl DB 2 contains 10,000 items

New web application is provisioned: finance.company.com
• No need to create additional crawl component or crawl DB

Which crawl DB will the new host be associated with?
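
One way to reason about the question above is a small Python sketch. It assumes the new host goes to the crawl database that currently holds the fewest items, which is how the "distribution is based on number of items" rule from the previous slide can be read. The function and variable names are illustrative, and this models the idea only, not the actual SharePoint host-distribution logic.

```python
def pick_crawl_db(item_counts):
    """Return the name of the crawl DB holding the fewest items (assumed rule)."""
    return min(item_counts, key=item_counts.get)


# Item counts from the example above
crawl_dbs = {"Crawl DB 1": 3000, "Crawl DB 2": 10000}

# Existing host assignments from the example
host_map = {
    "sales.company.com": "Crawl DB 1",
    "hr.company.com": "Crawl DB 2",
}

# Assign the newly provisioned web application
host_map["finance.company.com"] = pick_crawl_db(crawl_dbs)
print(host_map["finance.company.com"])
```

Under that assumption, finance.company.com lands on Crawl DB 1, since it holds 3,000 items against 10,000 in Crawl DB 2.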

Query Role

The purpose of the query server is to serve up queries to the WFE
• Index is stored on Query server(s)
• Query server(s) contain one or more Query Components
• A Query Component is mapped to only one Property Store DB
• The Query Component is where the index propagated from the Crawler resides

Query Architecture

[Diagram: Web Front End; Search Service Application Proxy (WCF call); Query Server / Query Processor; Query Component with Index; Property Store Database; Search Administration Database; Index Server with Crawler, Connector(s), and Index; Index Propagation; Crawl Database; Content Data Sources: SharePoint web sites, shared folders, external web sites, custom databases, other systems]

Query Component – Fault Tolerance

It is highly recommended to make the index fault tolerant by mirroring a Query component onto another server in the farm.

Check “Fail-over Query Component” if you want only fault tolerance and not an increase in query performance.

Query Component – Sizing the Index

Index will be approximately 3.5% of content database size
• Don’t forget about size needed for mirror
• Additional space needed for master merge

Example:

• 100 GB Content Database
• Index partition: 100 GB x 3.5% = 3.5 GB
• Index partition mirror: 100 GB x 3.5% = 3.5 GB
• Space for master merge: All index partitions x 3
• Total Space = (3.5 GB x 2) x 3 = 21 GB

Recommend having enough memory to fit 33% of the index in RAM.
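
The worked example above can be turned into a small Python sketch for trying other content sizes. The 3.5% ratio, the x3 master merge factor, and the 33% RAM guidance come from the slide; the function names and parameters are illustrative, and the RAM figure is assumed to apply to a single copy of the index partition.

```python
def index_disk_gb(content_gb, index_ratio=0.035, mirrored=True, merge_factor=3):
    """Disk space (GB) for the index: partition plus optional mirror, times master merge headroom."""
    partition_gb = content_gb * index_ratio
    copies = 2 if mirrored else 1
    return partition_gb * copies * merge_factor


def index_ram_gb(content_gb, index_ratio=0.035, ram_fraction=0.33):
    """RAM (GB) needed to hold the recommended fraction of one index partition."""
    return content_gb * index_ratio * ram_fraction


disk = index_disk_gb(100)
ram = index_ram_gb(100)
print(f"Disk: {disk:.1f} GB, RAM: {ram:.2f} GB")
```

For a 100 GB content database this reproduces the 21 GB total from the example and suggests a little over 1 GB of RAM for the index.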

Query Component – Performance

Index size is the main bottleneck for query performance

• Index contains 10 million documents = Avg. of 2 seconds per query
• Index contains 20 million documents = Avg. of 4 seconds per query

Creating multiple index partitions is the key to reducing query times and reducing bottlenecks. A new index partition can be added through Search Application Topology in Central Administration.

Property Store DB – Fault Tolerance & Performance

Fault Tolerance
• SQL mirroring should be used to achieve fault tolerance.

Performance
• Add an additional Property Store DB if bottlenecks occur
  • Must first create the new Property Store DB, then create a new Query component and map it to the new Property Store DB
  • The additional Query component should not include a mirror if performance is wanted
• You will need to reset the index and re-crawl, as a new Query component (index partition) would be created

Property Store DB – Add Query Component

Property Store DB must be created before adding Query Component so it appears in dropdown

Query Processor

• Runs under w3wp.exe process
• Processes a query by retrieving results from the index/Query Components
• Utilizes the Property Store DB and Search Administration DB to obtain metadata and perform security trimming
• Will load balance requests if more than one Query Component (mirrored) exists within the same Index Partition

• Query Processor connects to every Property Store DB and Query Component to retrieve results

• Unlike MOSS 2007 where the Query Processor ran on the WFE, any server can run the Query Processor in SharePoint 2010

Query Processor – Fault Tolerance & Performance

• Add additional Query Processor service to another machine in farm
  • Doesn’t have to be WFE
• Requests will be load balanced in a round-robin fashion to each Query Processor

The Search Query and Site Settings Service can be found in Central Administration under Services on Server.
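
To picture what round-robin load balancing across Query Processors means, here is a minimal Python sketch. The server names are hypothetical, and the code models only the rotation idea, not how SharePoint implements it internally.

```python
from itertools import cycle

# Hypothetical servers running the Search Query and Site Settings Service
query_processors = ["APP01", "APP02", "WFE01"]

# cycle() yields the servers in order and wraps around indefinitely
next_processor = cycle(query_processors)

# Hand five incoming queries to the processors in turn
assignments = [(f"query-{i}", next(next_processor)) for i in range(1, 6)]
for query, server in assignments:
    print(query, "->", server)
```

Each server receives every third request, so adding a machine to the rotation spreads the same query load over more processors.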

Overall Search Architecture

[Diagram: Web Front End; Search Service Application Proxy (WCF call); Query Server / Query Processor; Query Component with Index; Property Store Database; Search Administration Database; Index Server with Crawler, Connector(s), and Index; Index Propagation; Crawl Database; Content Data Sources: SharePoint web sites, shared folders, external web sites, custom databases, other systems]

Scale-out Decision Points

Number of items | Action

0 – 1 million | All Search roles can coexist on one or two servers

1 – 10 million | Move crawl components to another server, while the query components remain on the Web servers.

10 – 20 million | Add a crawl server. Each crawl server has one crawler. Create another index partition with query components and distribute these across query servers.

20 – 40 million | Add index partitions with distributed query components. Add another crawl database, and then add a new associated crawler to each crawl server.

40 – 100 million | Isolate each topology layer into server groups in which each role is deployed to its own set of servers. Each server group can be scaled out to meet specific requirements for the components in that role.

http://www.microsoft.com/download/en/details.aspx?id=20066
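
The decision table above can be expressed as a simple lookup, sketched here in Python. The action text is abbreviated from the table. Since the table's ranges share their boundary values, treating each upper bound as inclusive is an assumption of this sketch, and the names are illustrative.

```python
# (upper bound on item count, abbreviated action from the table)
SCALE_OUT_STEPS = [
    (1_000_000, "All Search roles can coexist on one or two servers"),
    (10_000_000, "Move crawl components to another server; query components stay on the Web servers"),
    (20_000_000, "Add a crawl server and another index partition; distribute query components across query servers"),
    (40_000_000, "Add index partitions, another crawl database, and a new crawler on each crawl server"),
    (100_000_000, "Isolate each topology layer into its own server group"),
]


def scale_out_action(item_count):
    """Return the suggested action for a given number of indexed items."""
    for upper_bound, action in SCALE_OUT_STEPS:
        if item_count <= upper_bound:  # assumption: upper bounds are inclusive
            return action
    raise ValueError("item count is beyond the 100 million covered by the table")


print(scale_out_action(15_000_000))
```

Item counts above 100 million raise an error rather than guess, because the table does not cover that range.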

Performance Metrics Thoughts

To improve this metric… | Take these actions

Full crawl time and result freshness | Add crawl servers, crawlers, and crawl databases. Each crawl database contains content from independent sources. Each crawl database can have several crawl components associated with it, and those crawl components can be distributed among many crawl servers. If you have several content sources, multiple crawl components and associated crawl databases allow you to crawl the content concurrently.

Time required for results to be returned | If query latency is caused by high peak query load, add query servers and index partitions. Each index partition can contain up to ~10 million items. You can also add a mirror for each query component for a given index partition. Place the mirror copy on a different server. Query throughput increases when you add index partition instances. If query latency is caused by database load, isolate the property database from crawl databases by moving it to a separate database server.

http://www.microsoft.com/download/en/details.aspx?id=20066
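
As a rough planning aid, the "up to ~10 million items per index partition" guidance above can be turned into a partition count. The Python sketch below treats 10 million as a hard per-partition ceiling, which is a simplification of an approximate figure, and assumes one mirror per partition when mirroring is requested. The function names are illustrative.

```python
import math

ITEMS_PER_PARTITION = 10_000_000  # approximate ceiling quoted in the table above


def partitions_needed(item_count):
    """Minimum number of index partitions for the given number of items."""
    return max(1, math.ceil(item_count / ITEMS_PER_PARTITION))


def query_components_needed(item_count, mirrored=True):
    """Query components required: one per partition, doubled when each is mirrored."""
    copies = 2 if mirrored else 1
    return partitions_needed(item_count) * copies


items = 25_000_000
print(partitions_needed(items), query_components_needed(items))
```

For 25 million items this gives three partitions, or six query components once each partition has a mirror on a different server.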

Small Farm Topology

http://www.microsoft.com/download/en/details.aspx?id=20066

Medium Farm Topology

http://www.microsoft.com/download/en/details.aspx?id=20066

Medium Search Farm Topology

http://www.microsoft.com/download/en/details.aspx?id=20066

Medium Dedicated Search Farm Topology

http://www.microsoft.com/download/en/details.aspx?id=20066

Large Dedicated Search Farm Topology

http://www.microsoft.com/download/en/details.aspx?id=20066

References

Search Technologies for SharePoint 2010 Products
http://download.microsoft.com/download/0/0/0/00015E0A-67CD-490C-9C1B-DCFA8E9BAEFC/Search%20Model%201%20of%204%20-%20Search%20Technologies.pdf

SharePoint Brew – Search 2010 Architecture and Scale, Part 1 Crawl
http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-1-crawl.aspx

SharePoint Brew – Search 2010 Architecture and Scale, Part 2 Query
http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx