Delivered at the SharePoint Best Practices Conference in La Jolla, CA, Feb 6 to 9
Enterprise Search
ITP278
Marianne Sweeny, Ascentium
www.ascentium.com
Mariannesweeny@ascentium.com
Director of Search Services
Web producer at Microsoft for 7+ years; pointy-head, not propeller-head
Agenda
• Introduction
• MOSS 2007 Search
• Configuring MOSS Search
• Here There Be Dragons
• Resources
• Appendix
Introduction
• July 2008: Google acknowledges that its spiders have found 1 trillion unique URLs on the Web
• 2000: 1 billion pages
• 1999: 26 million pages
There Is No Magic Bullet
Susan Feldman (IDC), Enterprise Search Summit West 2008
  - Employees average 3.5 hours/week searching
  - Cost = $5000 per employee per year
• There can be no "silver bullet" solution for finding information
  - Customers don't know what they don't know
  - "Google experience" is finding what they want/need in the first few pages, and not necessarily Google itself
  - Enterprises have different lines of business and different information types
• Search of tomorrow is here today
  - Personalized to the device and user
  - Contextual
  - Flexible
  - Secure
  - Adaptable
Search Index: A Different Kind of Database
[Figure: search engine index compared with SQL Server index]
Web Search and Enterprise Search
Web Search
• Publishers want their content to be found
• Anarchistic publishing model = "anyone, anywhere, any time"
• Unlimited document set
• No real standards or code, more like guidelines
• No central authority
• Spam
• Commercialization
• Technology is agnostic
• Has to work the same for everyone worldwide
• No shared understanding
Enterprise Search
• Publishers do not think about document discoverability
• Controlled corpus of documents
• Standards and practices in place
• No spam
• Users and authors generally share contextual understanding
• Customized tagging or metadata
• Can customize search technology to enterprise themes and concepts
"Successful enterprise search efforts target corpuses of information and set search scopes appropriately. I&KM pros are wise to study information worker context before trying to 'Google-ize' their enterprises." (Forrester Search Wave, Q2 2008)
Advanced Search
• Few customers use it, and those that do are disappointed
• Boolean or SQL operators work sporadically
• Confusing message: what is "regular" search… not as effective?
• Search has progressed beyond the stages of Advanced: Filters, Facets, Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than Ever
MOSS 2007
• Relevance customizable to the enterprise content
• Automated metadata extraction
• Enhanced text analysis
• Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
• Single search system and index per server farm
• Custom content groups, Best Bets, and scheduling are now shared services
• Scopes can be tied to document properties
• Improved control over indexing
SharePoint 2003
• Relevance keyed on numeric values derived solely from document text
  - Collection frequency
  - Term frequency
  - Document length
  - Term position
• Different systems between Windows SharePoint Services and SharePoint Portal Server
• Multiple indexes
• Custom content groups, Best Bets, and scheduling configurations are portal-based
• Scopes tied to content sources
• Index propagated at completion of master crawl only
Simplified Administration UI
Search settings page at the SSP level
Managing crawls
• Content sources
• Explicit SharePoint Content Source Type
• Content source for Business Data (Enterprise CAL)
Crawl logs
• Snapshot of crawled content in your index: lists all documents found in the content source and their status
• Filters by date, site, etc.
• Summary by host name (of successes, errors, and warnings)
Crawl rules
• Included and excluded rules
• Ability to pre-test crawl rules
• Easy to change order of crawl rules
Managing scopes
• Scopes decoupled from content sources
• Scopes can span multiple content sources
• Scope by Property, Site, Content Source, and URL
Indexing Performance Improvements
• Search is a shared service
  - Unified WSS and MOSS search for 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and best bets can also be administered at the consuming sites
• Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
• Content index that holds text of pages, with Property store that holds other document values
• Propagate data incrementally to the query servers as it is being indexed
  - Propagation starts within 30 seconds of the first shadow index being written
  - No need to wait until the end of the crawl for information to be available in queries
  - No propagation of properties
• Single item add/removal without re-indexing the entire corpus, with continuous propagation
  - Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed
Relevance Types
• Dynamic ranking = relevance impacted by query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
• Static ranking = relevance independent of customer query
  - URL depth
  - Click distance
  - Authority/Demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
• Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0-result queries
• Search Alerts
  - User can subscribe to receive email when results change
• File type filtering
  - Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, txt)
  - Supports 220 file types, MS and non-MS applications
• Property weights
  - Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
• Empowers customers to refine their search most effectively
• Filters results by predetermined categories
Federated Search
• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that subscribe to OpenSearch 1.1
• Passes the query into the subscribed resource and returns results into a single interface
• Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
• Can develop own FLD files if destination subscribes to OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
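As a rough illustration of "passes the query into the subscribed resource": OpenSearch 1.1 describes a search endpoint with a URL template containing placeholders such as {searchTerms}, with a trailing "?" marking optional ones. The Python sketch below fills such a template. The endpoint URL and the helper name fill_template are hypothetical, and this is not how MOSS itself is implemented.

```python
import re
from urllib.parse import quote

def fill_template(template: str, values: dict) -> str:
    """Substitute OpenSearch 1.1 style template parameters such as {searchTerms}."""
    def replace(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")  # URL-encode the value
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{([A-Za-z:]+)(\??)\}", replace, template)

# Hypothetical federated location; only the template syntax comes from OpenSearch.
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}"
url = fill_template(template, {"searchTerms": "enterprise search"})
print(url)
```

Because the remote source answers the query itself, the order of the returned results reflects that source's own relevance criteria, as the slide notes.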
People Search
Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites
Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
• Group results by social distance
• Subscribe to People Alerts
People Search Results Page
[Figure: People Search results page, with callouts]
• Find people by project, expertise, or…
• Filter by relevant attributes
• Contact information & online availability
LOB Applications with BDC
• Extracts data from line-of-business, CRM, and other 3rd-party data stores
• Caches for indexing by search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application
FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process, in order: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why Is It Unique?
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news): excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
• Search Scopes
  - Allow users to refine search through filtering
  - Define content resources and map to business rules/key concepts
  - Focused content = shared understanding = more precise results
• Duplicate results filtering
  - Collapsing duplicates from same directory or site to leave more room for other relevant results
  - Less favoritism, more results on desired page 1
• Definitions
  - Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  - A web property on the Search Best Bets web part (can turn on/off display of definition)
  - Returned in the Query Object Model
  - Cannot be edited
• Best Bets
  - Editorially assigned results based on key concepts, assigned to selected query terms
  - Can be many-to-many
Scalability
• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  - i.e. heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware
Security
• Query-time stripping: customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
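The idea behind query-time stripping can be sketched in a few lines of Python: results are filtered against the permissions of the person asking before anything is displayed. The Hit class, the group-based ACL model, and the URLs below are assumptions made for illustration; they are not the MOSS security model or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    url: str
    allowed_groups: frozenset  # groups permitted to read the item (hypothetical ACL model)

def trim_results(hits, user_groups):
    """Keep only hits whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [h for h in hits if h.allowed_groups & user_groups]

hits = [
    Hit("http://portal/hr/salaries.xls", frozenset({"HR"})),
    Hit("http://portal/news/launch.aspx", frozenset({"Everyone"})),
    Hit("http://portal/eng/spec.doc", frozenset({"Engineering", "PM"})),
]
visible = trim_results(hits, ["Everyone", "Engineering"])
print([h.url for h in visible])
```

The crawler still needs to read everything (hence "Full Read"); the trimming happens per query, per user.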
Search Analytics
• Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
• Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
• Key Performance Indicators [KPI]
  - Create a KPI list or other measures of success
  - Default KPIs exist in OOB deployment
  - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
• Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
• Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes to map for themes
• Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
• Assemble a cross-functional team to
  - Assign relevance weighting that makes sense for the customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages structural elements like URL depth and click distance
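Reviewing search logs for pain points can start very simply. The Python sketch below assumes an exported log with one row per search (query, number of results, whether anything was clicked); that row layout, the 0.5 click-through threshold, and the function name are all assumptions, not the format MOSS exports.

```python
from collections import defaultdict

# Hypothetical log rows: (query, results_returned, clicked_a_result)
log = [
    ("vacation policy", 12, True),
    ("vacation policy", 12, True),
    ("goat rodeo", 0, False),
    ("goat rodeo", 0, False),
    ("expense report", 40, False),
    ("expense report", 40, False),
    ("expense report", 40, True),
]

def find_pain_points(rows, max_ctr=0.5):
    """Return (queries that always return 0 results, queries with low click-through)."""
    stats = defaultdict(lambda: {"searches": 0, "clicks": 0, "zero": 0})
    for query, n_results, clicked in rows:
        s = stats[query]
        s["searches"] += 1
        s["clicks"] += int(clicked)
        s["zero"] += int(n_results == 0)
    zero_result = sorted(q for q, s in stats.items() if s["zero"] == s["searches"])
    low_ctr = sorted(
        q for q, s in stats.items()
        if s["zero"] < s["searches"] and s["clicks"] / s["searches"] < max_ctr
    )
    return zero_result, low_ctr

zero_result, low_ctr = find_pain_points(log)
print("Best Bet candidates:", zero_result)
print("Low click-through:", low_ctr)
```

Zero-result queries are natural Best Bet candidates; low click-through queries suggest the right content exists but is ranked or described poorly.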
Pareto's Principle
• Known as the 80/20 rule
• Named after late 19th century economist Vilfredo Pareto
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
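Whether the 80/20 split actually holds for a given corpus can be checked from click data. The following Python sketch computes the share of distinct documents needed to account for 80% of clicks; the click log and document names are made up for illustration.

```python
from collections import Counter

def share_of_content_for(clicks, target=0.8):
    """Fraction of distinct documents needed to cover `target` of all clicks."""
    counts = Counter(clicks)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank / len(counts)
    return 1.0

# Hypothetical click log: each entry is the document a searcher opened.
clicks = (
    ["home"] * 50
    + ["benefits"] * 32
    + ["a", "b", "c", "d", "e", "f", "g", "h"] * 2
    + ["i", "j"]
)
share = share_of_content_for(clicks)
print(f"{share:.0%} of documents answer 80% of searches")
```

In this made-up log the remaining documents each attract only a click or two: that is the Long Tail the slide warns against ignoring.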
Define Content
• Define content scopes
  - Segment content into logical groups
• Create scope rule based on
  - Address
  - Property query
  - Content source
• At the SSP level or individual level
  - SSP-level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  - Terms or language proprietary to the enterprise, e.g. "goat rodeo"
  - Provides additional clarification for searcher
• Use synonym mapping for term variants
  - C# and Csharp
• Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
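Synonym mapping for term variants amounts to treating a group of spellings as one concept. A minimal Python sketch, assuming synonyms are kept as simple groups; the second group and the function name are hypothetical, and MOSS stores its keyword synonyms through the admin UI rather than in code like this.

```python
# Hypothetical synonym groups; every term in a group is treated as equivalent.
SYNONYM_GROUPS = [
    {"c#", "csharp"},
    {"pto", "vacation"},
]

def expand_query(term):
    """Return the term plus its configured variants, lower-cased and sorted."""
    key = term.strip().lower()
    for group in SYNONYM_GROUPS:
        if key in group:
            return sorted(group)
    return [key]

print(expand_query("CSharp"))
print(expand_query("goat rodeo"))
```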
Designate Authority Sites
• Hilltop Algorithm
  - Quality of links more important than quantity of links
  - Segmentation of corpus into broad topics
  - Selection of authority sources within these topic areas
  - Pre-query calculation applied at query time
• Topic-Sensitive PageRank
  - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  - Pre-query calculation of factors based on subset of corpus
    - Context of term use in document
    - Context of term use in history of queries
    - Context of term use by user submitting query
Educate: Structural Influences
• File Type Bias
  - In order of relevancy (highest to lowest)
    - HTML Web pages
    - PowerPoint presentations
    - Word documents
    - Emails
    - XML files
    - Excel spreadsheets
    - Plain text files
    - List items
• Auto Language Detect
  - Foreign-language results are less relevant than results in user's language
  - English is always considered as relevant as user's language
• URL Depth and Click Distance
  - Short URLs are like prime real estate
  - Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by the number of slash ("/") characters in the URL
  - Keywords separated by hyphens in the URL are good
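To make the URL depth point concrete, here is a small Python sketch that counts path segments (a proxy for the slash count) and orders pages from shallowest to deepest. The URLs are invented, and the real ranking combines depth with other factors that are not shown here.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Count non-empty path segments, a proxy for the number of slashes in the path."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

urls = [
    "http://portal/hr/policies/2008/leave/vacation-policy.aspx",
    "http://portal/vacation-policy.aspx",
    "http://portal/hr/vacation-policy.aspx",
]
ranked = sorted(urls, key=url_depth)  # shallowest first
for u in ranked:
    print(url_depth(u), u)
```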
Educate: Content Influences
• Anchor Link Text
  - Search indexes the anchor text from the following elements
    - HTML anchor elements
    - SharePoint Services link lists
    - SharePoint Portal Server 2003 listings
    - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
    - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
• Metadata extraction
  - Shadow title detection is provided within the body of the item
    - Primarily based on text formatting features
    - Shadow title is added automatically to the document
    - Weighted the same as the original title
    - Only for Microsoft Office file types
  - Auto Description text
• Optimized URLs
  - Enterprise Search checks URL matching at query time
  - If the query matches the host name of a page in the index, it will display as the first result
Enhanced Search Results
• Synonym Mapping
• Best Bets
• Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine
• In most cases your search index is on its own server
Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
  - Balance results freshness with load on servers
• Consider using single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also, please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they skip the documentation and do not install the prerequisite patches.
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming
Dragons 2
• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capacity is predicated on the My Sites profile
  - Employee participation is critical to optimal functionality
• Benefits of click distance are missed if Authority sites are not configured
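Why click distance depends on authority sites can be seen from a sketch: click distance is the fewest link-clicks needed to reach a page starting from a designated authority page, which is a breadth-first search over the link graph. The Python below is an illustration only; the link graph and page names are invented, and the way MOSS turns distance into a ranking score is not shown.

```python
from collections import deque

def click_distance(links, authorities):
    """Breadth-first search: fewest clicks from any authority page to each page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical link graph: page -> pages it links to.
links = {
    "home": ["hr", "eng"],
    "hr": ["benefits"],
    "benefits": ["dental-plan"],
    "eng": [],
}
print(click_distance(links, ["home"]))
print(click_distance(links, []))  # no authorities configured: nothing to measure from
```

With no authority pages configured there is no starting point, so no page receives a distance and the signal contributes nothing.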
Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search, from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
• Extracts concepts and weights their relevance to searcher query
  - Presents for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)
• Integration with MOSS
  - Extracts metadata and compound terms
  - Incorporates with existing taxonomy if one exists
  - Appends metadata and stores as MOSS property
  - Part of the main MOSS index
  - Uses standard MOSS administration features
Adjusting Relevance
• Property weights
  - Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Set the weight of the parameter named by 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it finishes
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
Push/Pull Data to Users
• Alerts
  - Same alerting infrastructure for WSS and MOSS: Timer service is used to handle all alert notifications
  - Frequency can be set to Daily/Weekly: notifications for search alerts will be sent according to the creation time
  - 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  - A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx
  - Alert "gotchas"
    - No "My Alerts Summary" web part
    - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
• RSS Feeds
  - Ability to subscribe to an RSS feed on the search results
  - 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query Object Model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
• Crawled properties
  - Emitted by iFilters and Protocol Handlers
  - Identified by a property set (GUID) and property ID (name or numeric ID)
• Managed properties
  - Mapping target for crawled properties (many-to-many)
  - Identified by internal ID
  - Friendly name used in queries
    - Can be used in the query with property:value
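The many-to-many relationship between crawled and managed properties can be pictured with a small Python sketch. The crawled property names, the "first mapped value wins" rule, and the exact-match check for a property:value restriction are simplifying assumptions for illustration, not the MOSS schema or query semantics.

```python
# Hypothetical mappings: managed property -> crawled properties that feed it.
# "mail_from" feeds two managed properties, so the mapping is many-to-many.
MAPPINGS = {
    "Author": ["doc_author", "mail_from"],
    "Sender": ["mail_from"],
}

def to_managed(crawled, mappings=MAPPINGS):
    """Build managed properties; the first mapped crawled property with a value wins."""
    managed = {}
    for name, sources in mappings.items():
        for source in sources:
            if crawled.get(source):
                managed[name] = crawled[source]
                break
    return managed

def matches(managed, restriction):
    """Evaluate one 'property:value' restriction with a simplified exact match."""
    name, _, value = restriction.partition(":")
    return managed.get(name, "").lower() == value.lower()

item = to_managed({"doc_author": "", "mail_from": "jdoe"})
print(item)
print(matches(item, "Author:JDoe"))
```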
Marianne Sweeny Ascentium wwwascentiumcom Mariannesweenyascentiumcom Director of Search Services Web producer
at Microsoft for 7+ years pointy-head not propeller-head
Agenda Introduction MOSS 2007 Search Configuring MOSS Search Here There Be Dragons Resources Appendix
Introduction
July 2008 Google acknowledges that its spiders have found 1 TRILLION unique URLs on the Web
2000 1 billion pages1999 26 million pages
There is No Magic Bullet Susan Feldman (IDC) Enterprise Search Summit West 2008
ndash Employees average 35 hoursweek searchingndash Cost = $5000 per employee per year
There can be no ldquosilver bulletrdquo solution for finding informationndash Customers donrsquot know what they donrsquot knowndash ldquoGoogle experiencerdquo is finding what they wantneed in the first
few pages and not necessarily Google itselfndash Enterprises have different lines of business and different
information types Search of tomorrow is here today
ndash Personalized to the device and userndash Contextualndash Flexiblendash Securendash Adaptable
Search Index A Different Kind of Database
Search Engine Index SQL Server Index
Web Search and Enterprise Search
Publishers want their content to be found
Anarchistic publishing model = ldquoanyone anywhere any timerdquo
Unlimited document set No real standards or code more like
guidelines No central authority Spam Commercialization Technology is agnostic Has to work the same for everyone
worldwide No shared understanding
Enterprise Search
Successful enterprise search efforts target corpuses of information and set search scopes appropriately IampKM pros are wise to study information worker context before trying to ldquoGoogle-izerdquo their enterprises Forrester Search Wave Q2 2008
Web Search Publishers do not think about
document discoverability Controlled corpus of documents Standards and practices in place No spam Users and authors generally
share contextual understanding Customized tagging or metadata Can customize search
technology to enterprise themes and concepts
Advanced Search Few customers use it and those that do are
disappointed Boolean or SQL operators work sporadically
Confusing message What is ldquoregularrdquo searchhellipnot as effective
Search has progressed beyond the stages of Advanced Filters Facets Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than EverMOSS 2007 Relevance customizable to the
enterprise content Automated metadata extraction Enhanced text analysis
Fully integrated admin experience between Windows
SharePoint Services v3 and MOSS 2007 Single search system and index
per server farm Custom content groups Best
Bets scheduling are now shared services
Scopes can be tied to document properties
Improved control over indexing
SharePoint 2003 Relevance keyed on numeric values
derived solely from document text Collection frequency Term frequency Document length Term position
Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best
Bets scheduling configurations are portal-based
Scopes tied to content sources Index propagated at completion of
master crawl only
Simplified Administration UISearch settings page at the SSP levelManaging crawls
bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)
Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the
content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)
Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules
Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL
Indexing Performance Improvements Search is a shared service
ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered
centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo
Content index that holds text of pages with Property store that holds other document values
Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties
Single item add removal without re-indexing entire corpus with continuous propagation
ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items
ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed
Relevance Types Dynamic ranking = relevance impacted by query term
ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL
Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails
XML files Excel spreadsheets Plain text List items
Relevance Enhancements
• Manually assign synonyms and editorialized results to keywords
  – Use search logs to detect popular searches, low click-through from results, or zero-result queries
• Search Alerts
  – Users can subscribe to receive email when results change
• File type filtering
  – Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, TXT)
  – Supports 220 file types from Microsoft and non-Microsoft applications
• Property weights
  – Assign different weights to properties so that important properties such as ‘Title’ have a bigger influence on ranking
  – Change default property weights through the Schema Object Model
  – Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
MOSS 2007 Faceted Search
• Facets are predetermined content categories presented to the customer to narrow search results
  – Can be presented pre- or post-query
  – Used for Advanced Search
• Empowers customers to refine their search most effectively
• Filters results by predetermined categories
Federated Search
• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that support OpenSearch 1.1
• Passes the query to the subscribed resource and returns results in a single interface
• Relevance calculation is done according to the originating resource's criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
• Can develop your own FLD files if the destination supports OpenSearch 1.1
  – Day Software has developed a standard connector for LiveLink ECM
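An OpenSearch 1.1 source advertises a URL template in which `{searchTerms}` marks where the query goes and a trailing `?` marks a parameter as optional. The Python sketch below shows how a federating client might fill in such a template; the `example.com` address and the function name are placeholders, and this is not code from MOSS or from any FLD file.

```python
import re
from urllib.parse import quote_plus


def fill_template(template, search_terms, **optional):
    """Substitute OpenSearch-style {name} and {name?} placeholders in a URL template."""
    values = {"searchTerms": search_terms, **optional}

    def replace(match):
        name, is_optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote_plus(str(values[name]))   # URL-encode the value
        if is_optional:
            return ""                              # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", replace, template)


# Hypothetical template for an external source that returns RSS
TEMPLATE = "http://example.com/search?q={searchTerms}&start={startIndex?}&format=rss"
url = fill_template(TEMPLATE, "vacation policy")
```

The remote source ranks its own results, which is why relevance for a federated location follows the originating resource's criteria rather than MOSS 2007's.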
People Search
• Build and publish rich personal profiles
  – Customize personal profile attributes
  – Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
  – Control access to information using security and privacy controls
  – Generate and display organizational charts based on directory information
  – Publish personal profiles using MOSS My Sites
• Identify people who can help
  – Find people based on keyword matches with MOSS personal profiles
  – Find people in line-of-business systems
  – Filter results by common attributes such as Job Title or Department
  – Find “in-common” connections, including managers, site memberships, distribution lists, and colleagues
  – Group results by social distance
  – Subscribe to People Alerts
People Search Results Page
• Find people by project, expertise, or…
• Filter by relevant attributes
• Contact information & online availability
LOB Applications with BDC
• Extracts data from line-of-business, CRM, and other third-party data stores
• Caches for indexing by the search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application
FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for e-commerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion -> language detection -> synonyms -> spell check -> external call-outs -> entity extraction -> categorization -> vectorization -> custom navigation -> normalizer -> alerting -> indexing
• Why is it unique?
  – Auto-classification
  – Advanced linguistics: text mining for concept and relationship mapping
  – Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
  – Precision: exact word matching, exact phrase matching, proximity, tokenization
  – Location-aware results (retail and news); excellent for mobile search
  – Recommendation engine
  – Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
• Search Scopes
  – Allow users to refine search through filtering
  – Define content resources and map them to business rules/key concepts
  – Focused content = shared understanding = more precise results
• Duplicate results filtering
  – Collapses duplicates from the same directory or site to leave more room for other relevant results
  – Less favoritism, more results on the desired page 1
• Definitions
  – Automatically extract “definitions” from indexed content and display them as matches directly on the results page
  – A web part property on the Search Best Bets web part (can turn display of definitions on/off)
  – Returned in the Query Object Model
  – Cannot be edited
• Best Bets
  – Editorially assigned results based on key concepts assigned to selected query terms
  – Can be many-to-many
Scalability
• No physical limit on the maximum number of documents in one index
• Recommended limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The ‘average document size’ depends on the corpus mix
  – i.e., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware
Security
• Query-time stripping: the customer sees only those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  – Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is “Full Read”
  – Still provides the same security trimming functionality
  – Automatically configured for new sites
• Search visibility options
  – Prevent sites/lists from appearing in search results at a site/list level
• “Security only” crawl for single item add/removal
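Query-time stripping amounts to filtering the result list against the searcher's identity before anything is displayed. A minimal Python sketch, assuming each result carries the set of principals allowed to read it (the `allowed` field and the principal names are made up for the example):

```python
def trim_results(results, user_principals):
    """Keep only results whose access list shares a principal with the user.

    results          -- list of dicts, each with an "allowed" set of user/group names
    user_principals  -- set holding the user's own name and group memberships
    """
    return [r for r in results if r["allowed"] & user_principals]


results = [
    {"url": "http://portal/hr/salaries.xls", "allowed": {"HR"}},
    {"url": "http://portal/hr/vacation.aspx", "allowed": {"HR", "All Staff"}},
]
visible = trim_results(results, {"jsmith", "All Staff"})
```

Since the check runs per query, a permissions change only needs the stored access lists refreshed, which is what makes a “security only” crawl sufficient.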
Search Analytics
• Export search logs to Excel
  – Query terms
  – Page views
  – Number of results returned
  – Volume trends
  – Query success: can define success for certain query terms
• Report Center
  – Access to MOSS 2007 BI features
  – Filters data for permissions and relevance
• Key Performance Indicators (KPIs)
  – Create a KPI list or other measures of success
  – Default KPIs exist in the out-of-the-box (OOB) deployment
  – KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
• Useful participants
  – Content creators
  – Information Architect/User Experience Architect
  – Taxonomist
• Define key enterprise themes in content
  – Map existing content to these themes
  – Create filters and scopes that map to these themes
• Get as much customer data as possible to find search pain points
  – Review search logs and customer feedback mechanisms
  – What are they trying to find?
  – What terms are they using?
• Assemble a cross-functional team to
  – Assign relevance weighting that makes sense for customer behavior and the corpus
  – Develop Best Bets for searches with zero results
  – Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  – Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  – Design a structure that leverages structural elements like URL depth and click distance
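Finding Best Bet candidates in an exported search log is a small counting exercise. The Python sketch below assumes each log row has been loaded as a dictionary with `query` and `results` keys; those field names are an assumption, not the column names of the MOSS export.

```python
from collections import Counter


def zero_result_queries(log_rows, top=10):
    """Return the most frequent queries that came back empty, most common first."""
    misses = Counter(
        row["query"].strip().lower()      # normalize case and stray whitespace
        for row in log_rows
        if row["results"] == 0
    )
    return misses.most_common(top)


log_rows = [
    {"query": "goat rodeo", "results": 0},
    {"query": "Goat Rodeo ", "results": 0},
    {"query": "w2 form", "results": 0},
    {"query": "vacation policy", "results": 42},
]
candidates = zero_result_queries(log_rows)
```

The same loop, filtered on low click-through instead of zero results, would surface the other pain point the roadmap asks about.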
Pareto’s Principle
• Known as the 80/20 rule
• Named after the late-19th-century economist Vilfredo Pareto
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don’t forget the Long Tail
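Whether the 80/20 split holds for a given portal can be checked against click data. This Python sketch assumes a plain list of clicked result URLs pulled from the search logs; the input format and function name are illustrative.

```python
from collections import Counter


def head_share(clicked_urls, coverage=0.8):
    """Fraction of distinct documents needed to account for `coverage` of all clicks."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    covered = 0
    for needed, (_, clicks) in enumerate(counts.most_common(), start=1):
        covered += clicks
        if covered >= coverage * total:
            return needed / len(counts)
    return 1.0   # empty log: nothing to measure
```

A value near 0.2 matches the rule of thumb; the documents outside that head are the Long Tail that still needs attention.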
Define Content
• Define content scopes
  – Segment content into logical groups
  – Create scope rules based on address, property query, or content source
  – At the SSP level or the individual site level
  – SSP-level scopes are shared among all sites that use the SSP
• Select authority resources
• Define special terms if needed
  – Terms or language proprietary to the enterprise, e.g., “goat rodeo”
  – Provides additional clarification for the searcher
  – Use synonym mapping for term variants, e.g., C# and Csharp
  – Two information points can be displayed for a special term: a definition of the term and a Best Bet
Designate Authority Sites
• Hilltop Algorithm
  – Quality of links is more important than quantity of links
  – Segmentation of the corpus into broad topics
  – Selection of authority sources within these topic areas
  – Pre-query calculation applied at query time
• Topic-Sensitive PageRank
  – Consolidation of Hypertext Induced Topic Selection (HITS) and PageRank
  – Pre-query calculation of factors based on a subset of the corpus
    – Context of term use in the document
    – Context of term use in the history of queries
    – Context of term use by the user submitting the query
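One reason authority sites matter is click distance, which the deck lists among the static ranking factors: it is measured outward from the pages you designate. A breadth-first search over the link graph captures the idea. The sketch below is Python with a made-up link graph; it illustrates the concept and is not the MOSS calculation.

```python
from collections import deque


def click_distances(links, authority_pages):
    """Fewest clicks from any authority page to each reachable page.

    links            -- dict mapping a page to the pages it links to
    authority_pages  -- pages designated as authoritative (distance 0)
    """
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:          # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


links = {"home": ["hr", "it"], "hr": ["vacation"], "it": ["hr"], "orphan": ["home"]}
distances = click_distances(links, ["home"])
```

Pages that no authority page can reach never receive a distance, so with no authority sites configured there is nothing to measure from.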
Educate: Structural Influences
• File Type Bias, in order of relevancy (highest to lowest)
  – HTML Web pages
  – PowerPoint presentations
  – Word documents
  – Emails
  – XML files
  – Excel spreadsheets
  – Plain text files
  – List items
• Auto Language Detect
  – Foreign-language results are less relevant than results in the user’s language
  – English is always considered as relevant as the user’s language
• URL Depth and Click Distance
  – Short URLs are like prime real estate
  – Items with shorter URLs are considered more relevant than items placed in longer URLs
    – The level is determined by counting the number of slash (“/”) characters in the URL
  – Keywords separated by hyphens in the URL are good
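Counting slashes is easy to reproduce when auditing a site structure. In this Python sketch only the path portion is counted, so the two slashes after `http:` are ignored; that choice is an assumption for the example, and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Number of slash characters in the path portion of a URL."""
    return urlsplit(url).path.count("/")


pages = [
    "http://portal/sites/hr/benefits/2008/vacation-policy.aspx",
    "http://portal/vacation-policy.aspx",
    "http://portal/hr/vacation-policy.aspx",
]
by_depth = sorted(pages, key=url_depth)   # shallowest first
```

Sorting a crawl inventory this way shows quickly which important pages are buried several levels down.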
Educate: Content Influences
• Anchor (Link) Text: search indexes the anchor text from the following elements
  – HTML anchor elements
  – SharePoint Services link lists
  – SharePoint Portal Server 2003 listings
  – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  – Any file types handled by installed third-party iFilter components that emit hyperlinks
• Metadata extraction: shadow title detection is provided within the body of the item
  – Primarily based on text formatting features
  – Shadow title is added automatically to the document
  – Weighted the same as the original title
  – Only for Microsoft Office file types
• Auto description text
• Optimized URLs
  – Enterprise Search checks URL matching at query time
  – If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
• Synonym Mapping
• Best Bets
• Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for the indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine
• In most cases, your search index is on its own server
Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review
  – Crawl rules
  – Property and schema
  – Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file and edit that file in an application such as Visual Studio 2005.
6. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also note that it is not an easy installation. Read the entire documentation before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
• Must schedule the incremental crawl to catch additions to the document set
• Must turn on the PDF indexer and stemming
Dragons 2
• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
• The expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click distance are missed if authority sites are not configured
Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search, from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
• Enterprise Search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find It All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products
• Concept Searching
  – Auto-classifies documents for MOSS 2007
  – Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
  – Extracts concepts and weights their relevance to the searcher's query
    – Presents them for search refinement
  – http://www.conceptsearching.com/conceptHMSO (insider trading)
• Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with the existing taxonomy if one exists
  – Appends metadata and stores it as a MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features
Adjusting Relevance
• Property weights
  – Assign different weights to properties so that certain properties such as ‘Title’ have a bigger influence on ranking
  – Change default property weights through the Schema Object Model (using Microsoft.Office.Server.Search.Administration)

    // appGuid, property and weight are placeholders carried over from the slide
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    // Poll once a second until the ranking update has finished
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

• Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
Push/Pull Data to Users
• Alerts
  – Same alerting infrastructure for WSS and MOSS
    – Timer service is used to handle all alert notifications
  – Frequency can be set to Daily/Weekly
    – Notifications for search alerts will be sent according to the creation time
  – ‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  – A rollup of all of a user’s alerts for a site collection
    – http://<sitecollection>/_layouts/MySubs.aspx
  – Alert “gotchas”
    – No “My Alerts Summary” web part
    – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
• RSS Feeds
  – Ability to subscribe to an RSS feed on the search results
  – ‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
• Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    // Requires: using Microsoft.Office.Server.Search.Query;
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;
    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;
    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
• Crawled properties
  – Emitted by iFilters and protocol handlers
  – Identified by a property set (GUID) and property ID (name or numeric ID)
• Managed properties
  – Mapping target for crawled properties (many-to-many)
  – Identified by internal ID
  – Friendly name used in queries
    – Can be used in the query as property:value
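The crawled-to-managed relationship can be pictured as a lookup table keyed by (property set, property ID). In the Python sketch below the property set labels, IDs and function names are placeholders chosen for illustration, not real MOSS identifiers or APIs.

```python
# Hypothetical mapping: (property set, property ID) -> managed property names
CRAWLED_TO_MANAGED = {
    ("propset-office", 2): ["Title"],
    ("propset-office", 4): ["Author"],
    ("propset-custom", "DocOwner"): ["Author"],   # two crawled properties feed one managed property
}


def to_managed(crawled_values):
    """Project the crawled properties of one document onto managed properties."""
    managed = {}
    for key, value in crawled_values.items():
        for name in CRAWLED_TO_MANAGED.get(key, []):   # unmapped properties are dropped
            managed.setdefault(name, []).append(value)
    return managed


def matches(managed, restriction):
    """Evaluate a simple property:value restriction against managed properties."""
    name, _, wanted = restriction.partition(":")
    return any(wanted.lower() == str(v).lower() for v in managed.get(name, []))


doc = to_managed({
    ("propset-office", 2): "Travel Plan",
    ("propset-office", 4): "Ann",
    ("propset-custom", "DocOwner"): "Bob",
    ("propset-other", 99): "ignored",
})
```

Searchers only ever see the friendly managed name, so `Author:bob` finds the document even though the value was emitted under a custom crawled property.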
Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo
Content index that holds text of pages with Property store that holds other document values
Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties
Single item add removal without re-indexing entire corpus with continuous propagation
ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items
ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed
Relevance Types Dynamic ranking = relevance impacted by query term
ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL
Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails
XML files Excel spreadsheets Plain text List items
Relevance EnhancementsManually assign synonyms and editorialized results to keywords
ndash Use search logs to detect popular searches low click-through from results or 0 result queries
Search Alertsndash User can subscribe to receive email when results change
File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)
than others (XML txt)ndash Supports 220 files types MS and non-MS application
Property weights ndash Assign different weights to properties so that important
properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object
Modelndash Note The weights used in the product were carefully tested
Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial
undertaking
MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results
bullCan be presented pre- or post- querybullUsed for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories
Federated Search Import or export federated locations using Federated
Location Definition (FLD) files Incorporates results from outside content sources that
subscribe to OpenSearch 11 Passes the query into the subscribed resource and
returns results into single interface Relevance calculation done according to originating
resource criteria not MOSS 2007 criteria Pre-defined FLD files found at
httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
Can develop own FLD files if destination subscribes to OpenSearch 11
ndash Day Software has developed a standard connector for LiveLink ECM
People SearchBuild and publish rich personal profiles
Customize personal profile attributes Populate personal profiles using information from Active Directory other
LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory
information Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content
• Define content scopes
  – Segment content into logical groups
  – Create scope rules based on:
    – Address
    – Property query
    – Content source
  – At the SSP level or the individual site collection level; SSP-level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  – Terms or language proprietary to the enterprise, e.g., “goat rodeo”
  – Provides additional clarification for searcher
  – Use synonym mapping for term variants, e.g., C# and Csharp
  – Two information points can be displayed for a special term:
    – Definition of the term
    – Best Bet
Designate Authority Sites
• Hilltop Algorithm
  – Quality of links more important than quantity of links
  – Segmentation of corpus into broad topics
  – Selection of authority sources within these topic areas
  – Pre-query calculation applied at query time
• Topic Sensitive PageRank
  – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  – Pre-query calculation of factors based on subset of corpus
    – Context of term use in document
    – Context of term use in history of queries
    – Context of term use by user submitting query
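The idea of a pre-query calculation biased toward authority sources can be illustrated with the textbook form of topic-sensitive PageRank: ordinary power iteration in which the random jump lands only on the topic's authority pages. This is a sketch of that general technique, not of the MOSS 2007 ranking implementation, which the deck does not describe. The class name, the page numbering, and the sample graph are hypothetical.

```csharp
using System;

public static class TopicSensitiveRank
{
    // links[i] lists the pages that page i links to.
    // topicPages are the authority pages chosen for one topic.
    public static double[] Compute(int[][] links, int[] topicPages, double damping, int iterations)
    {
        int n = links.Length;

        // The random jump lands only on the topic's authority pages.
        double[] teleport = new double[n];
        foreach (int p in topicPages)
        {
            teleport[p] += 1.0 / topicPages.Length;
        }

        double[] rank = (double[])teleport.Clone();
        for (int it = 0; it < iterations; it++)
        {
            double[] next = new double[n];
            double dangling = 0.0;

            for (int i = 0; i < n; i++)
            {
                if (links[i].Length == 0)
                {
                    // A page with no outgoing links gives its rank back to the topic pages.
                    dangling += rank[i];
                    continue;
                }

                double share = rank[i] / links[i].Length;
                foreach (int target in links[i])
                {
                    next[target] += damping * share;
                }
            }

            for (int i = 0; i < n; i++)
            {
                next[i] += (1.0 - damping + damping * dangling) * teleport[i];
            }

            rank = next;
        }

        return rank;
    }

    public static void Main()
    {
        // Sample graph: 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 0; page 0 is the authority page.
        int[][] links = { new[] { 1 }, new[] { 2 }, new[] { 0 }, new[] { 0 } };
        double[] rank = Compute(links, new[] { 0 }, 0.85, 50);
        for (int i = 0; i < rank.Length; i++)
        {
            Console.WriteLine("page " + i + ": " + rank[i]);
        }
    }
}
```

Because the scores depend only on the link graph and the chosen authority pages, they can be computed ahead of time and merely looked up when a query arrives.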
Educate: Structural Influences
• File Type Bias, in order of relevancy (highest to lowest)
  – HTML Web pages
  – PowerPoint presentations
  – Word documents
  – Emails
  – XML files
  – Excel spreadsheets
  – Plain text files
  – List items
• Auto Language Detect
  – Foreign language results are less relevant than results in user’s language
  – English language is always considered as relevant as user’s language
• URL Depth and Click Distance
  – Short URLs are like prime real estate
  – Items with shorter URLs are considered more relevant than items placed in longer URLs
    – The level is determined by reviewing the number of slash (“/”) characters in the URL
  – Keywords separated by hyphens in the URL are good
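Both structural signals are easy to estimate for your own content before it is crawled. The sketch below counts slashes in the path part of a URL and computes click distance as the fewest link hops from any authority page using a breadth-first search. Counting only the path (and ignoring a trailing slash) is an assumption about how depth is measured, and the class name, page names, and link map are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StructuralSignals
{
    // URL depth: number of "/" characters in the path portion of the URL.
    public static int UrlDepth(string url)
    {
        string path = new Uri(url).AbsolutePath.TrimEnd('/');
        return path.Count(c => c == '/');
    }

    // Click distance: fewest link hops from any authority page to each reachable page.
    // links maps a page to the pages it links to; unreachable pages are omitted.
    public static Dictionary<string, int> ClickDistance(
        Dictionary<string, string[]> links, IEnumerable<string> authorityPages)
    {
        var distance = new Dictionary<string, int>();
        var queue = new Queue<string>();

        foreach (string page in authorityPages)
        {
            if (!distance.ContainsKey(page))
            {
                distance[page] = 0;
                queue.Enqueue(page);
            }
        }

        while (queue.Count > 0)
        {
            string page = queue.Dequeue();
            string[] targets;
            if (!links.TryGetValue(page, out targets))
            {
                continue;
            }

            foreach (string target in targets)
            {
                if (!distance.ContainsKey(target))
                {
                    distance[target] = distance[page] + 1;
                    queue.Enqueue(target);
                }
            }
        }

        return distance;
    }

    public static void Main()
    {
        Console.WriteLine(UrlDepth("http://portal/hr/policies/leave.aspx"));

        var links = new Dictionary<string, string[]>
        {
            { "home", new[] { "hr", "it" } },
            { "hr", new[] { "leave-policy" } }
        };
        foreach (var pair in ClickDistance(links, new[] { "home" }))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

A page that never appears in the click-distance result is not reachable from any authority page, which is one way to spot content that is stranded in the site structure.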
Educate: Content Influences
• Anchor Link Text
  – Search indexes the anchor text from the following elements:
    – HTML anchor elements
    – SharePoint Services link lists
    – SharePoint Portal Server 2003 listings
    – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
    – Any file types handled by installed 3rd party iFilter components which emit hyperlinks
• Metadata extraction
  – Shadow title detection is provided within the body of the item
    – Primarily based on text formatting features
    – Shadow title is added automatically to the document
    – Weighted the same as the original title
    – Only for Microsoft Office file types
  – Auto Description text
• Optimized URLs
  – Enterprise Search checks URL matching at query time
  – If the query matches the host name of a page in the index, that page will display as the first result
Enhanced Search Results
• Synonym Mapping
• Best Bets
• Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
• In most cases your search index is on its own server
Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review:
  – Crawl rules
  – Properties and schema
  – Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
• Note the infrastructure update in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search ability and a unified administration dashboard. Read more here:
  http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they do not read the documentation or install the prerequisite patches.
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming
Dragons 2
• Use the Web Part that accommodates wildcard search. Found here:
  http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
• The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click-distance are missed if Authority sites are not configured
Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast: Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
• Extracts concepts and weights their relevance to searcher query
  – Presents for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)
• Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with existing taxonomy if one exists
  – Appends metadata and stores as MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features
Adjusting Relevance
• Property weights
  – Assign different weights to properties so that certain properties such as ‘Title’ have a bigger influence on ranking
  – Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are placeholders supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
Push/Pull Data to Users
• Alerts
  – Same alerting infrastructure for WSS and MOSS
    – Timer service is used to handle all alert notifications
  – Frequency can be set to Daily/Weekly
    – Notifications for search alerts will be sent according to the creation time
  – ‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  – A rollup of all of a user’s alerts for a site collection
    – http://<sitecollection>/_layouts/MySubs.aspx
  – Alert “gotchas”
    – No “My Alerts Summary” web part
    – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
• RSS Feeds
  – Ability to subscribe to an RSS feed of the search results
  – ‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site is the SPSite to query and strQuery is the user's query text;
    // both are supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
• Crawled properties
  – Emitted by iFilters and Protocol Handlers
  – Identified by a property set (GUID) and property ID (name or numeric ID)
• Managed properties
  – Mapping target for crawled properties (many-to-many)
  – Identified by internal ID
  – Friendly name used in queries
    – Can be used in the query with property:value
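To make the property:value form concrete, the sketch below assembles a keyword query string from free-text keywords plus managed-property restrictions. The KeywordQueryText helper is hypothetical and not part of the MOSS 2007 object model, the property names in the sample are illustrative, and quoting values that contain spaces is an assumption about how multi-word values should be passed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the MOSS 2007 object model.
public static class KeywordQueryText
{
    // Appends property:value restrictions to the free-text keywords.
    public static string Build(string keywords, IEnumerable<KeyValuePair<string, string>> restrictions)
    {
        var parts = new List<string>();
        if (!string.IsNullOrEmpty(keywords))
        {
            parts.Add(keywords.Trim());
        }

        foreach (KeyValuePair<string, string> restriction in restrictions)
        {
            // Assumption: values containing spaces are wrapped in double quotes.
            string value = restriction.Value.IndexOf(' ') >= 0
                ? "\"" + restriction.Value + "\""
                : restriction.Value;
            parts.Add(restriction.Key + ":" + value);
        }

        return string.Join(" ", parts.ToArray());
    }

    public static void Main()
    {
        // Illustrative managed property names and values.
        var restrictions = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Author", "Jane Doe"),
            new KeyValuePair<string, string>("FileExtension", "docx")
        };

        Console.WriteLine(Build("budget", restrictions));
    }
}
```

A string built this way would be assigned to the QueryText property shown on the Query object model slide. The friendly names must match managed properties defined in the search schema.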
Introduction
July 2008 Google acknowledges that its spiders have found 1 TRILLION unique URLs on the Web
2000 1 billion pages1999 26 million pages
There is No Magic Bullet Susan Feldman (IDC) Enterprise Search Summit West 2008
ndash Employees average 35 hoursweek searchingndash Cost = $5000 per employee per year
There can be no ldquosilver bulletrdquo solution for finding informationndash Customers donrsquot know what they donrsquot knowndash ldquoGoogle experiencerdquo is finding what they wantneed in the first
few pages and not necessarily Google itselfndash Enterprises have different lines of business and different
information types Search of tomorrow is here today
ndash Personalized to the device and userndash Contextualndash Flexiblendash Securendash Adaptable
Search Index A Different Kind of Database
Search Engine Index SQL Server Index
Web Search and Enterprise Search
Publishers want their content to be found
Anarchistic publishing model = ldquoanyone anywhere any timerdquo
Unlimited document set No real standards or code more like
guidelines No central authority Spam Commercialization Technology is agnostic Has to work the same for everyone
worldwide No shared understanding
Enterprise Search
Successful enterprise search efforts target corpuses of information and set search scopes appropriately IampKM pros are wise to study information worker context before trying to ldquoGoogle-izerdquo their enterprises Forrester Search Wave Q2 2008
Web Search Publishers do not think about
document discoverability Controlled corpus of documents Standards and practices in place No spam Users and authors generally
share contextual understanding Customized tagging or metadata Can customize search
technology to enterprise themes and concepts
Advanced Search Few customers use it and those that do are
disappointed Boolean or SQL operators work sporadically
Confusing message What is ldquoregularrdquo searchhellipnot as effective
Search has progressed beyond the stages of Advanced Filters Facets Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than EverMOSS 2007 Relevance customizable to the
enterprise content Automated metadata extraction Enhanced text analysis
Fully integrated admin experience between Windows
SharePoint Services v3 and MOSS 2007 Single search system and index
per server farm Custom content groups Best
Bets scheduling are now shared services
Scopes can be tied to document properties
Improved control over indexing
SharePoint 2003 Relevance keyed on numeric values
derived solely from document text Collection frequency Term frequency Document length Term position
Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best
Bets scheduling configurations are portal-based
Scopes tied to content sources Index propagated at completion of
master crawl only
Simplified Administration UISearch settings page at the SSP levelManaging crawls
bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)
Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the
content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)
Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules
Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL
Indexing Performance Improvements Search is a shared service
ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered
centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo
Content index that holds text of pages with Property store that holds other document values
Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties
Single item add removal without re-indexing entire corpus with continuous propagation
ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items
ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed
Relevance Types Dynamic ranking = relevance impacted by query term
ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL
Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails
XML files Excel spreadsheets Plain text List items
Relevance EnhancementsManually assign synonyms and editorialized results to keywords
ndash Use search logs to detect popular searches low click-through from results or 0 result queries
Search Alertsndash User can subscribe to receive email when results change
File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)
than others (XML txt)ndash Supports 220 files types MS and non-MS application
Property weights ndash Assign different weights to properties so that important
properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object
Modelndash Note The weights used in the product were carefully tested
Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial
undertaking
MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results
bullCan be presented pre- or post- querybullUsed for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories
Federated Search Import or export federated locations using Federated
Location Definition (FLD) files Incorporates results from outside content sources that
subscribe to OpenSearch 11 Passes the query into the subscribed resource and
returns results into single interface Relevance calculation done according to originating
resource criteria not MOSS 2007 criteria Pre-defined FLD files found at
httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
Can develop own FLD files if destination subscribes to OpenSearch 11
ndash Day Software has developed a standard connector for LiveLink ECM
People SearchBuild and publish rich personal profiles
Customize personal profile attributes Populate personal profiles using information from Active Directory other
LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory
information Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
Extracts concepts and weights their relevance to the searcher query
– Presents for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
– Extracts metadata and compound terms
– Incorporates with existing taxonomy if one exists
– Appends metadata and stores as MOSS property
– Part of the main MOSS index
– Uses standard MOSS administration features
Adjusting Relevance: Property weights
Assign different weights to properties so that certain properties such as ‘Title’ have a bigger influence on ranking
Change default property weights through the Schema Object Model
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));
    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }
    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }
    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    // Poll once per second until the ranking update has finished
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts will be sent according to the creation time
‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user’s alerts for a site collection
– http://<sitecollection>/_layouts/MySubs.aspx
Alert “gotchas”
– No “My Alerts Summary” web part
– No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed on the search results
‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;
    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;
    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
– Emitted by iFilters and Protocol Handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
– Can be used in the query with property:Value
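A minimal sketch of the many-to-many mapping described above, assuming a crawled property is keyed by (property set, property ID) and a managed property is just a friendly name. The GUID placeholders, property names and both functions are hypothetical, not the MOSS schema API:

```python
from collections import defaultdict

# Hypothetical mapping: (property set, property ID) -> managed property names
CRAWLED_TO_MANAGED = {
    ("GUID-A", 2): ["Title"],
    ("GUID-A", 4): ["Author"],
    ("GUID-B", "ows_Created_x0020_By"): ["Author", "CreatedBy"],
}

def to_managed(crawled_values, mapping):
    """Fold the crawled property values of one item into managed properties."""
    managed = defaultdict(list)
    for key, value in crawled_values.items():
        for name in mapping.get(key, []):  # unmapped crawled properties are dropped
            managed[name].append(value)
    return dict(managed)

def matches(managed_item, query):
    """Evaluate a single property:Value restriction against one item."""
    name, _, wanted = query.partition(":")
    return any(wanted.lower() == v.lower() for v in managed_item.get(name, []))

crawled = {
    ("GUID-A", 2): "Enterprise Search",
    ("GUID-A", 4): "Marianne Sweeny",
    ("GUID-B", "ows_Created_x0020_By"): "msweeny",
}
managed = to_managed(crawled, CRAWLED_TO_MANAGED)
```

Here two crawled properties feed the managed property Author, and one crawled property feeds both Author and CreatedBy, so a query such as Author:msweeny matches while Title:msweeny does not.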
Advanced Search
Few customers use it and those that do are disappointed
Boolean or SQL operators work sporadically
Confusing message: what is “regular” search…not as effective?
Search has progressed beyond the stages of Advanced: Filters, Facets, Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than Ever
MOSS 2007
Relevance customizable to the enterprise content
– Automated metadata extraction
– Enhanced text analysis
Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
Single search system and index per server farm
Custom content groups, Best Bets, scheduling are now shared services
Scopes can be tied to document properties
Improved control over indexing
SharePoint 2003
Relevance keyed on numeric values derived solely from document text
– Collection frequency
– Term frequency
– Document length
– Term position
Different systems between Windows SharePoint Services and SharePoint Portal Server
Multiple indexes
Custom content groups, Best Bets, scheduling configurations are portal-based
Scopes tied to content sources
Index propagated at completion of master crawl only
Simplified Administration UI
Search settings page at the SSP level
Managing crawls
• Content sources
• Explicit SharePoint Content Source Type
• Content source for Business Data (Enterprise CAL)
Crawl logs
• Snapshot of crawled content in your index – lists all documents found in the content source and their status
• Filters by date, site, etc.
• Summary by host name (of successes, errors and warnings)
Crawl rules
• Included and excluded rules
• Ability to pre-test crawl rules
• Easy to change order of crawl rules
Managing scopes
• Scopes decoupled from content sources
• Scopes can span multiple content sources
• Scope by Property, Site, Content Source and URL
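A minimal sketch of pre-testing a URL against ordered include/exclude crawl rules, assuming rules are evaluated top to bottom and the first matching rule decides. That evaluation order, the wildcard patterns and the URLs are assumptions for illustration, not the MOSS crawl rule engine:

```python
from fnmatch import fnmatchcase

# Hypothetical rules, most specific first: (wildcard pattern, action)
CRAWL_RULES = [
    ("http://intranet/hr/private/*", "exclude"),
    ("http://intranet/hr/*", "include"),
    ("http://intranet/*/archive/*", "exclude"),
]

def evaluate_crawl_rules(url, rules, default="include"):
    """Return the action of the first rule whose pattern matches the URL."""
    for pattern, action in rules:
        if fnmatchcase(url.lower(), pattern.lower()):
            return action
    return default  # no rule matched

private = evaluate_crawl_rules("http://intranet/hr/private/salaries.docx", CRAWL_RULES)
handbook = evaluate_crawl_rules("http://intranet/hr/handbook.pdf", CRAWL_RULES)
archived = evaluate_crawl_rules("http://intranet/it/archive/2005/old.doc", CRAWL_RULES)

# Swapping the first two rules changes the outcome for the private folder
reordered = [CRAWL_RULES[1], CRAWL_RULES[0], CRAWL_RULES[2]]
private_reordered = evaluate_crawl_rules(
    "http://intranet/hr/private/salaries.docx", reordered)
```

The reordered run shows why being able to change rule order easily matters: the same URL flips from excluded to included.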
Indexing Performance Improvements
Search is a shared service
– Unified WSS and MOSS search for 1 index per SSP
– Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
– Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a “master merge”
Content index that holds text of pages, with Property store that holds other document values
Propagate data incrementally as it is being indexed to the query servers
– Propagation starts within 30 seconds of the first shadow index written
– No need to wait till the end of the crawl for information to be available in queries
– No propagation of properties
Single item add/removal without re-indexing entire corpus, with continuous propagation
– Change Log Crawl: detects what items have changed within a WSS or a MOSS 2007 site and crawls only those items
– Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed
Relevance Types
Dynamic ranking = relevance impacted by query term
– Frequency
– Location in document
– Appearance in link text
– Appearance in URL
Static ranking = relevance independent of customer query
– URL Depth
– Click Distance
– Authority/Demoted site
– Change property weights
– Language of customer (browser setting)
– Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, Plain text, List items
Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
– Use search logs to detect popular searches, low click-through from results, or 0 result queries
Search Alerts
– User can subscribe to receive email when results change
File type filtering
– Some file types are deemed more relevant (i.e. HTML, DOC) than others (XML, txt)
– Supports 220 file types, MS and non-MS applications
Property weights
– Assign different weights to properties so that important properties such as ‘Title’ have a bigger influence on ranking
– Change default property weights through the Schema Object Model
– Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
Empowers customers to most effectively refine their search
Filters results by predetermined categories
Federated Search
Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that subscribe to OpenSearch 1.1
Passes the query into the subscribed resource and returns results into a single interface
Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop own FLD files if destination subscribes to OpenSearch 1.1
– Day Software has developed a standard connector for LiveLink ECM
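A minimal sketch of how a query is passed into an OpenSearch 1.1 source: the source publishes a URL template and the caller substitutes {searchTerms} and any optional parameters (marked with a trailing "?"). The endpoint, the helper name and the choice to leave unset optional parameters empty are assumptions for illustration:

```python
import re
from urllib.parse import quote

def fill_template(template, search_terms, **optional):
    """Substitute OpenSearch-style {name} and {name?} placeholders in a URL template."""
    values = {"searchTerms": search_terms, **optional}

    def substitute(match):
        name, is_optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")  # percent-encode the value
        if is_optional:
            return ""  # unset optional parameters become empty
        raise KeyError(name)  # required parameter missing

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)

# Hypothetical federated location template
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&format=rss"
first_page = fill_template(template, "enterprise search")
later_page = fill_template(template, "enterprise search", startIndex=11)
```

The remote source ranks and returns its own results, which is why relevance follows the originating resource rather than MOSS 2007.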
People Search
Build and publish rich personal profiles
– Customize personal profile attributes
– Populate personal profiles using information from Active Directory, other LDAP directories, or Line-of-Business systems
– Control access to information using security and privacy controls
– Generate and display organizational charts based on directory information
– Publish personal profiles using MOSS My Sites
Identify people who can help
– Find people based on keyword matches with MOSS personal profiles
– Find people in line-of-business systems
– Filter results by common attributes such as Job Title or Department
– Find “in-common” connections including managers, site memberships, distribution lists and colleagues
– Group results by social distance
– Subscribe to People Alerts
People Search Results Page
Find people by project, expertise or…
Filter by relevant attributes
Contact information & online availability
LOB Applications with BDC
Extracts data from line-of-business, CRM and other 3rd party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
FAST ESP Technology
FAST is a sophisticated search engine tailor-made for ecommerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process: conversion, language detection, synonyms, spell check, external call outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
– Auto classification
– Advanced linguistics: text mining for concept and relationship mapping
– Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
– Precision: exact word matching, exact phrase matching, proximity, tokenization
– Location-aware results (retail and news) – excellent for mobile search
– Recommendation engine
– Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
Search Scopes
– Allow users to refine search through filtering
– Define content resources and map to business rules/key concepts
– Focused content = shared understanding = more precise results
Duplicate results filtering
– Collapsing duplicates from same directory or site to leave more room for other relevant results
– Less favoritism, more results on desired page 1
Definitions
– Automatically extract “definitions” from indexed content and display them as matches directly on the results page
– A web part property on the Search Best Bets web part (can turn on/off display of definition)
– Returned in the Query Object Model
– Cannot be edited
Best Bets
– Editorially assigned results based on these key concepts, assigned to selected query terms
– Can be many-to-many
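A minimal sketch of the "collapse results from the same directory" idea, assuming a ranked list of result URLs and a limit of one visible hit per directory. The grouping key, the limit and the URLs are hypothetical; the product's own duplicate detection may work differently:

```python
from urllib.parse import urlsplit

def collapse_by_directory(ranked_urls, keep_per_directory=1):
    """Keep the top hits per host+directory; return (kept, collapsed) in rank order."""
    seen = {}
    kept, collapsed = [], []
    for url in ranked_urls:
        parts = urlsplit(url)
        # Directory = host plus the path without its last segment
        directory = parts.netloc + parts.path.rsplit("/", 1)[0]
        seen[directory] = seen.get(directory, 0) + 1
        if seen[directory] <= keep_per_directory:
            kept.append(url)
        else:
            collapsed.append(url)
    return kept, collapsed

ranked = [
    "http://intranet/hr/benefits/dental.docx",
    "http://intranet/hr/benefits/dental-v2.docx",
    "http://intranet/it/helpdesk.aspx",
    "http://intranet/hr/benefits/vision.docx",
]
kept, collapsed = collapse_by_directory(ranked)
```

Two of the three HR benefits hits are folded away, which leaves room on page 1 for a result from another part of the site.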
Scalability
No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
Large/small documents count the same
The ‘average document size’ depends on the corpus mix
– i.e. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
Security
Query time stripping – customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
– Implements ASP.NET 2.0 authentication model
Minimum crawler permission is “Full Read”
– Still provides the same security trimming functionality
– Automatically configured for new sites
Search visibility options
– Prevent sites/lists appearing in search results at a site/list level
“Security only” crawl for single item add/removal
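A minimal sketch of query-time trimming, assuming each indexed hit carries the set of principals allowed to read it and the user is described by a set of group names. The data shapes, group names and function are hypothetical, not the MOSS security trimmer:

```python
def trim_results(results, user_principals):
    """Drop every hit whose ACL shares no principal with the current user."""
    return [hit for hit in results if hit["acl"] & user_principals]

# Hypothetical hits as they might come back from the index, before trimming
results = [
    {"url": "http://intranet/hr/handbook.pdf", "acl": {"everyone"}},
    {"url": "http://intranet/hr/salaries.xlsx", "acl": {"hr-managers"}},
    {"url": "http://intranet/it/runbook.docx", "acl": {"it-staff", "hr-managers"}},
]

visible = trim_results(results, {"everyone", "it-staff"})
visible_urls = [hit["url"] for hit in visible]
```

Because the ACL is stored with the item, a permission change only needs the stored ACL refreshed, which is the idea behind a "security only" crawl.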
Search Analytics
Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
– Query success: can define success for certain query terms
Report Center
– Access to MOSS 2007 BI features
– Filters data for permissions and relevance
Key Performance Indicators [KPI]
– Create a KPI list or other measures of success
– Default KPIs exist in OOB deployment
– KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
– Content creators
– Information Architect/User Experience Architect
– Taxonomist
Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross functional team to
– Assign relevance weighting that makes sense to the customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages the structural elements like URL depth and click distance
Pareto’s Principle
Known as the 80/20 rule
Named after late 19th century economist
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don’t forget the Long Tail
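A minimal sketch of checking the 80/20 shape against an exported search log, assuming the log is simply a list of query strings. It applies the cut to query volume rather than to content, and the sample log and function name are made up for illustration:

```python
from collections import Counter

def head_queries(query_log, coverage_percent=80):
    """Return the most frequent queries that together reach the coverage target,
    plus the share of distinct queries they represent."""
    counts = Counter(q.strip().lower() for q in query_log)
    total = sum(counts.values())
    running, head = 0, []
    for query, n in counts.most_common():
        if running * 100 >= coverage_percent * total:
            break  # target volume already covered
        head.append(query)
        running += n
    return head, len(head) / len(counts)

# Hypothetical log: 100 searches over 9 distinct queries
log = (["benefits"] * 50 + ["vacation policy"] * 30 + ["expense report"] * 8
       + ["goat rodeo"] * 4 + ["parking"] * 3 + ["vpn"] * 2
       + ["badge", "401k", "cafeteria menu"])

head, share = head_queries(log)
```

Everything outside `head` is the long tail: individually rare queries that still deserve Best Bets or synonyms when they return nothing.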
Define Content
Define content scopes
Segment content into logical groups
Create scope rule based on
– Address
– Property query
– Content source
At the SSP level or individual level
SSP level scopes are shared among all sites that use the SSP
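A minimal sketch of the three scope rule types listed above, assuming each rule either includes or excludes matching items and that any exclude wins. The rule tuples, item fields and sample data are hypothetical, not the MOSS scopes object model:

```python
def in_scope(item, rules):
    """Apply (kind, value, behavior) rules: any matching exclude rejects the item,
    otherwise at least one include rule must match."""
    included = False
    for kind, value, behavior in rules:
        if kind == "address":
            hit = item["url"].lower().startswith(value.lower())
        elif kind == "property":
            name, _, wanted = value.partition("=")
            hit = str(item["properties"].get(name, "")).lower() == wanted.lower()
        else:  # "content_source"
            hit = item["content_source"] == value
        if hit and behavior == "exclude":
            return False
        if hit and behavior == "include":
            included = True
    return included

# Hypothetical "HR policies" scope
rules = [
    ("address", "http://intranet/hr/", "include"),
    ("property", "ContentType=Policy", "include"),
    ("content_source", "Archive file share", "exclude"),
]

handbook = {"url": "http://intranet/hr/handbook.pdf",
            "content_source": "Intranet", "properties": {}}
travel = {"url": "http://intranet/legal/travel.docx",
          "content_source": "Intranet", "properties": {"ContentType": "Policy"}}
old_policy = {"url": "file://archive/hr/old-policy.doc",
              "content_source": "Archive file share",
              "properties": {"ContentType": "Policy"}}
```

Because the rules test address, property and content source independently, one scope can span several content sources.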
Select Authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise, i.e. “goat rodeo”
– Provides additional clarification for searcher
Use synonym mapping for term variants
– C# and Csharp
Two information points can be displayed for a special term
– Definition of the term
– Best Bet
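A minimal sketch of synonym mapping and special term lookup, assuming synonyms are kept as groups of interchangeable variants and each special term carries a definition and a Best Bet. The groups, the placeholder definition and the URL are made up for illustration:

```python
# Hypothetical synonym groups: every member expands to the whole group
SYNONYM_GROUPS = [
    {"c#", "csharp"},
    {"moss", "sharepoint server 2007"},
]

# Hypothetical special term with its two information points
SPECIAL_TERMS = {
    "goat rodeo": {
        "definition": "(placeholder) in-house term defined by the editorial team",
        "best_bet": "http://intranet/glossary/goat-rodeo",
    },
}

def expand(term, synonym_groups):
    """Return all variants to search for, or just the term if it has no synonyms."""
    term = term.strip().lower()
    for group in synonym_groups:
        if term in group:
            return sorted(group)
    return [term]

def special_term_info(query, special_terms):
    """Return the definition and Best Bet for a special term, or None."""
    return special_terms.get(query.strip().lower())

variants = expand("CSharp", SYNONYM_GROUPS)
info = special_term_info("Goat Rodeo", SPECIAL_TERMS)
```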
Designate Authority Sites
Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on subset of corpus: context of term use in document, context of term use in history of queries, context of term use by user submitting query
Educate: Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
– HTML Web pages
– PowerPoint presentations
– Word documents
– Emails
– XML files
– Excel spreadsheets
– Plain text files
– List items
Auto Language Detect
– Foreign language results are less relevant than results in user’s language
– English language is always considered as relevant as user’s language
URL Depth and Click Distance
– Short URLs are like prime real estate
– Items with shorter URLs are considered more relevant than items placed in longer URLs
– The level is determined by reviewing the number of slash (“/”) characters in the URL
– Keywords separated by hyphens in the URL are good
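A minimal sketch of the slash-counting rule above, assuming depth is the number of "/" characters in the path portion of the URL, ignoring a trailing slash. The function name and URLs are hypothetical, and the product's exact counting may differ:

```python
from urllib.parse import urlsplit

def url_depth(url):
    """Count the slash characters in the URL path (scheme and host excluded)."""
    path = urlsplit(url).path
    return path.rstrip("/").count("/")

urls = [
    "http://intranet/hr/benefits/2008/dental-plan.aspx",
    "http://intranet/",
    "http://intranet/dental-plan.aspx",
]

# Shallowest first: the order a depth-based static rank would favor
by_depth = sorted(urls, key=url_depth)
depths = [url_depth(u) for u in by_depth]
```

Moving an important page from the four-level path to the one-level path is the "prime real estate" effect the slide describes.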
Educate: Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
– HTML anchor elements
– SharePoint Services link lists
– SharePoint Portal Server 2003 listings
– Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
– Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text
Optimized URLs
– Enterprise Search checks URL matching at query time
– If the query matches the host name of a page in the index, it will display as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
– In most cases, your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using single content access account per region
Regularly clean up and review
– Crawl rules
– Property and schema
– Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
Note the infrastructure update where Microsoft rolled the features of Search Server 2008 into MOSS 2007, which includes federated search ability and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it due to not reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Search Index A Different Kind of Database
Search Engine Index SQL Server Index
Web Search and Enterprise Search
Publishers want their content to be found
Anarchistic publishing model = ldquoanyone anywhere any timerdquo
Unlimited document set No real standards or code more like
guidelines No central authority Spam Commercialization Technology is agnostic Has to work the same for everyone
worldwide No shared understanding
Enterprise Search
Successful enterprise search efforts target corpuses of information and set search scopes appropriately IampKM pros are wise to study information worker context before trying to ldquoGoogle-izerdquo their enterprises Forrester Search Wave Q2 2008
Web Search Publishers do not think about
document discoverability Controlled corpus of documents Standards and practices in place No spam Users and authors generally
share contextual understanding Customized tagging or metadata Can customize search
technology to enterprise themes and concepts
Advanced Search Few customers use it and those that do are
disappointed Boolean or SQL operators work sporadically
Confusing message What is ldquoregularrdquo searchhellipnot as effective
Search has progressed beyond the stages of Advanced Filters Facets Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than Ever
MOSS 2007
• Relevance customizable to the enterprise content
  - Automated metadata extraction
  - Enhanced text analysis
• Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
• Single search system and index per server farm
• Custom content groups, Best Bets, and scheduling are now shared services
• Scopes can be tied to document properties
• Improved control over indexing
SharePoint 2003
• Relevance keyed on numeric values derived solely from document text
  - Collection frequency
  - Term frequency
  - Document length
  - Term position
• Different systems between Windows SharePoint Services and SharePoint Portal Server
• Multiple indexes
• Custom content groups, Best Bets, and scheduling configurations are portal-based
• Scopes tied to content sources
• Index propagated at completion of master crawl only
Simplified Administration UI
Search settings page at the SSP level
Managing crawls
• Content sources
• Explicit SharePoint Content Source Type
• Content source for Business Data (Enterprise CAL)
Crawl logs
• Snapshot of crawled content in your index: lists all documents found in the content source and their status
• Filters by date, site, etc.
• Summary by host name (of successes, errors, and warnings)
Crawl rules
• Included and excluded rules
• Ability to pre-test crawl rules
• Easy to change order of crawl rules
Managing scopes
• Scopes decoupled from content sources
• Scopes can span multiple content sources
• Scope by Property, Site, Content Source, and URL
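Why the order of crawl rules matters is easier to see in a small sketch. The Python below is an illustration only, not MOSS code. It assumes that rules are evaluated top to bottom and the first matching rule decides, and the URLs and patterns are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered crawl rules: (URL pattern, include?).
CRAWL_RULES = [
    ("http://intranet/hr/private/*", False),   # exclude
    ("http://intranet/hr/*", True),            # include
    ("http://intranet/*/archive/*", False),    # exclude
]

def should_crawl(url, rules=CRAWL_RULES, default=True):
    """Return the decision of the first rule whose pattern matches the URL.

    Matching is case-insensitive; URLs that match no rule fall back to
    `default` (an assumption of this sketch).
    """
    for pattern, include in rules:
        if fnmatchcase(url.lower(), pattern.lower()):
            return include
    return default
```

Reversing the list lets the broader http://intranet/hr/* rule match first, so the private folder would be crawled. Pre-testing a URL against the rules after any reordering is the cheap way to catch that.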
Indexing Performance Improvements
• Search is a shared service
  - Unified WSS and MOSS search for 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and best bets can also be administered at the consuming sites
• Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
• Content index that holds text of pages, with Property store that holds other document values
• Propagate data incrementally as it is being indexed to the query servers
  - Propagation starts within 30 seconds of the first shadow index written
  - No need to wait till the end of the crawl for information to be available in queries
  - No propagation of properties
• Single item add/removal without re-indexing entire corpus, with continuous propagation
  - Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed
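The change-log idea above can be sketched as follows. This is illustrative Python, not the crawler's actual implementation. The Change record, the integer token, and the rule that a planned content re-index also covers a permission change are assumptions of the sketch.

```python
from dataclasses import dataclass

@dataclass
class Change:
    token: int   # position in the change log, increasing over time
    url: str
    kind: str    # "add", "update", "delete", or "security"

def plan_incremental_crawl(change_log, last_token):
    """Return (work, new_token) for all changes logged after last_token.

    `work` maps each changed URL to the single action the crawler needs.
    """
    work = {}
    new_token = last_token
    for change in sorted(change_log, key=lambda c: c.token):
        if change.token <= last_token:
            continue  # already handled by an earlier crawl
        already_reindexing = work.get(change.url) in ("add", "update")
        if not (change.kind == "security" and already_reindexing):
            # Later entries override earlier ones, except that a
            # security-only change never downgrades a content re-index.
            work[change.url] = change.kind
        new_token = change.token
    return work, new_token
```

Only the URLs in `work` are visited, and items marked "security" need their permissions refreshed rather than their content re-read.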
Relevance Types
• Dynamic ranking = relevance impacted by query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
• Static ranking = relevance independent of customer query
  - URL depth
  - Click distance
  - Authority/demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
• Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0-result queries
• Search Alerts
  - User can subscribe to receive email when results change
• File type filtering
  - Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, TXT)
  - Supports 220 file types, MS and non-MS applications
• Property weights
  - Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
• Empowers customers to refine their search most effectively
• Filters results by predetermined categories
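A post-query facet is essentially a count per category plus a filter. A minimal Python sketch, with hypothetical result records and field names and no connection to any particular facet web part:

```python
from collections import Counter

# Hypothetical search results; the field names are illustrative only.
RESULTS = [
    {"title": "Q3 sales plan", "file_type": "ppt", "department": "Sales"},
    {"title": "Sales playbook", "file_type": "doc", "department": "Sales"},
    {"title": "Expense policy", "file_type": "doc", "department": "Finance"},
]

def facet_counts(results, facet):
    """Count how many results fall under each value of one facet."""
    return Counter(r[facet] for r in results if facet in r)

def refine(results, facet, value):
    """Keep only the results whose facet equals the value the user picked."""
    return [r for r in results if r.get(facet) == value]
```

The counts feed the refinement panel ("doc (2), ppt (1)"), and clicking a value reruns the filter over the current result set.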
Federated Search
• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that subscribe to OpenSearch 1.1
• Passes the query into the subscribed resource and returns results into a single interface
• Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
• Can develop own FLD files if destination subscribes to OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
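Passing the query to an OpenSearch 1.1 source comes down to filling in the source's URL template. The Python sketch below covers only that substitution step. The example.com endpoint is hypothetical, and the function illustrates the template syntax ({searchTerms}, with a trailing ? marking optional parameters) rather than how MOSS performs it internally.

```python
import re
from urllib.parse import quote

def fill_template(template, **params):
    """Fill an OpenSearch-style URL template.

    {name} is required, {name?} is optional and becomes an empty string
    when no value is supplied. Values are percent-encoded.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)

# Hypothetical federated location template.
TEMPLATE = "http://example.com/search?q={searchTerms}&start={startIndex?}"
```

For example, fill_template(TEMPLATE, searchTerms="goat rodeo") leaves the optional start index empty, while omitting searchTerms raises an error because the source cannot be queried without it.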
People Search
Build and publish rich personal profiles
• Customize personal profile attributes
• Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
• Control access to information using security and privacy controls
• Generate and display organizational charts based on directory information
• Publish personal profiles using MOSS My Sites
Identify people who can help
• Find people based on keyword matches with MOSS personal profiles
• Find people in line-of-business systems
• Filter results by common attributes such as Job Title or Department
• Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
• Group results by social distance
• Subscribe to People Alerts
People Search Results Page
• Find people by project, expertise, or...
• Filter by relevant attributes
• Contact information & online availability
LOB Applications with BDC
• Extracts data from line-of-business, CRM, and other 3rd-party data stores
• Caches for indexing by search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communication Server for connectivity options
• Aggregated into a single application
FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for e-commerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process (in order): conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news): excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
• Search Scopes
  - Allow users to refine search through filtering
  - Define content resources and map to business rules/key concepts
  - Focused content = shared understanding = more precise results
• Duplicate results filtering
  - Collapsing duplicates from same directory or site to leave more room for other relevant results
  - Less favoritism, more results on desired page 1
• Definitions
  - Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  - A web part property on the Search Best Bets web part (can turn on/off display of definition)
  - Returned in the Query Object Model
  - Cannot be edited
• Best Bets
  - Editorially assigned results based on these key concepts assigned to selected query terms
  - Can be many-to-many
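Duplicate collapsing, as described on this slide, can be sketched in a few lines of Python. This is only an illustration: the `digest` field stands in for whatever content fingerprint the engine computes, and treating "same host and directory plus same digest" as a duplicate is an assumption of the sketch.

```python
import posixpath
from urllib.parse import urlsplit

def collapse_duplicates(results):
    """Keep the first (highest-ranked) hit per directory and content digest."""
    seen = set()
    kept = []
    for hit in results:
        parts = urlsplit(hit["url"])
        key = (parts.netloc.lower(), posixpath.dirname(parts.path), hit["digest"])
        if key in seen:
            continue  # same content already shown from this directory
        seen.add(key)
        kept.append(hit)
    return kept

# Hypothetical ranked hits; identical digests mean identical content.
HITS = [
    {"url": "http://intranet/docs/plan.doc", "digest": "abc"},
    {"url": "http://intranet/docs/plan-copy.doc", "digest": "abc"},
    {"url": "http://intranet/archive/plan.doc", "digest": "abc"},
    {"url": "http://intranet/docs/budget.xls", "digest": "def"},
]
```

Because the input is already in rank order, the copy that survives is the one the ranker preferred, and the freed slot goes to the next distinct result.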
Scalability
• No physical limit for the maximum number of documents in one index
• Recommended limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  - i.e., heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware
Security
• Query-time stripping: customer only sees those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
• Search visibility options
  - Prevent sites/lists appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
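Query-time trimming can be sketched as a filter between the index and the results page. The Python below is a simplification for illustration. The `allowed` list on each hit and the group names are hypothetical, and real access control lists also carry deny entries and individual users.

```python
def trim_results(results, user_groups):
    """Return only the hits whose ACL shares at least one group with the user."""
    groups = set(user_groups)
    return [hit for hit in results if groups & set(hit["allowed"])]

# Hypothetical index entries with the groups allowed to read each item.
INDEXED = [
    {"url": "http://intranet/hr/handbook.doc", "allowed": ["everyone"]},
    {"url": "http://intranet/hr/salaries.xls", "allowed": ["hr-managers"]},
    {"url": "http://intranet/sales/forecast.xls", "allowed": ["sales", "executives"]},
]
```

The crawler account needs to read everything so that every item lands in the index, while each searcher sees only the subset their own memberships allow.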
Search Analytics
• Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
• Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
• Key Performance Indicators [KPI]
  - Create a KPI list or other measures of success
  - Default KPIs exist in OOB deployment
  - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
• Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
• Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes to map for themes
• Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
• Assemble a cross-functional team to
  - Assign relevance weighting that makes sense to the customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages the structural elements like URL depth and click distance
Pareto's Principle
• Known as the 80/20 rule
• Named after the late 19th century economist Vilfredo Pareto
• 20% of your content is answering 80% of your searches
• Not an excuse to stop optimizing at the top 20%
• Don't forget the Long Tail
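Whether a given search log really follows the 80/20 pattern is easy to check once the log is exported. A minimal Python sketch, assuming the export has been reduced to a plain list of query strings (the sample data is made up):

```python
from collections import Counter

def head_queries(query_log, coverage=0.8):
    """Return (head, share): the most frequent queries that together cover
    `coverage` of all searches, and the fraction of distinct queries they are.
    """
    counts = Counter(q.strip().lower() for q in query_log)
    if not counts:
        return [], 0.0
    total = sum(counts.values())
    head, covered = [], 0
    for query, n in counts.most_common():
        if covered >= coverage * total:
            break
        head.append(query)
        covered += n
    return head, len(head) / len(counts)

# Hypothetical exported log: 10 searches, 4 distinct queries.
LOG = ["vacation policy"] * 6 + ["expense form"] * 2 + ["goat rodeo", "org chart"]
```

The head list is where Best Bets and tuning pay off first. Everything outside it is the long tail, which still deserves a periodic look for zero-result queries.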
Define Content
• Define content scopes
  - Segment content into logical groups
  - Create scope rule based on address, property query, or content source
  - At the SSP level or individual level
  - SSP-level scopes are shared among all sites that use the SSP
• Select Authority resources
• Define special terms if needed
  - Terms or language proprietary to the enterprise, e.g., "goat rodeo"
  - Provides additional clarification for searcher
  - Use synonym mapping for term variants, e.g., C# and Csharp
  - Two information points can be displayed for a special term: the definition of the term and a Best Bet
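The relationship between a special term, its synonyms, its definition, and its Best Bets can be pictured as a small lookup table. The Python sketch below is illustrative only. The keyword entries and URLs are invented, and matching the whole query against the term or a synonym is an assumption about how the lookup behaves.

```python
# Hypothetical keyword table: term -> synonyms, definition, Best Bet URLs.
KEYWORDS = {
    "c#": {
        "synonyms": {"csharp", "c sharp"},
        "definition": "Programming language used for custom web parts.",
        "best_bets": ["http://intranet/dev/csharp-guidelines"],
    },
    "goat rodeo": {
        "synonyms": set(),
        "definition": "In-house slang for a chaotic project.",
        "best_bets": [],
    },
}

def lookup_keyword(query, keywords=KEYWORDS):
    """Return (term, entry) when the whole query equals a term or synonym."""
    q = query.strip().lower()
    for term, entry in keywords.items():
        if q == term or q in entry["synonyms"]:
            return term, entry
    return None
```

A search for "CSharp" resolves to the "c#" entry, so both spellings show the same definition and Best Bet above the regular results.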
Designate Authority Sites
• Hilltop Algorithm
  - Quality of links more important than quantity of links
  - Segmentation of corpus into broad topics
  - Selection of authority sources within these topic areas
  - Pre-query calculation applied at query time
• Topic-Sensitive PageRank
  - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  - Pre-query calculation of factors based on subset of corpus
    - Context of term use in document
    - Context of term use in history of queries
    - Context of term use by user submitting query
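Designating authority sites matters because click distance is measured from them. The Python sketch below is neither Hilltop nor Topic-Sensitive PageRank. It shows only the simpler idea of counting the fewest clicks from any authority page, using a breadth-first walk over a hypothetical link graph.

```python
from collections import deque

def click_distance(links, authorities):
    """Fewest clicks from any authority page to each reachable page.

    `links` maps a page to the pages it links to. Pages that cannot be
    reached from an authority are left out of the result.
    """
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical intranet link graph.
LINKS = {
    "home": ["hr", "sales"],
    "hr": ["benefits"],
    "sales": ["forecast", "hr"],
    "benefits": ["form"],
}
```

Adding "benefits" as a second authority moves "form" from three clicks away to one, which is the lever an administrator pulls when an important area is buried deep in the site.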
Educate: Structural Influences
• File Type Bias
  - In order of relevancy (highest to lowest): HTML web pages, PowerPoint presentations, Word documents, emails, XML files, Excel spreadsheets, plain text files, list items
• Auto Language Detect
  - Foreign language results are less relevant than results in user's language
  - English language is always considered as relevant as user's language
• URL Depth and Click Distance
  - Short URLs are like prime real estate
  - Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
  - Keywords separated by hyphens in the URL are good
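URL depth is simple enough to compute by hand when planning a site structure. A minimal Python sketch, which counts path segments and so ignores the slashes in "http://" and any trailing slash (a simplification, not necessarily the product's exact rule):

```python
from urllib.parse import urlsplit

def url_depth(url):
    """Number of non-empty path segments below the host name."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])

# Hypothetical URLs for the same document placed at different levels.
URLS = [
    "http://intranet/sites/hr/docs/2008/q3/plan.doc",
    "http://intranet/",
    "http://intranet/hr/benefits.aspx",
]
```

Sorting candidate locations with sorted(URLS, key=url_depth) puts the shallowest first, which is where content that should rank well belongs.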
Educate: Content Influences
• Anchor Link Text
  - Search indexes the anchor text from the following elements:
    - HTML anchor elements
    - SharePoint Services link lists
    - SharePoint Portal Server 2003 listings
    - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
    - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
• Metadata extraction
  - Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
• Auto Description text
• Optimized URLs
  - Enterprise Search checks URL matching at query time
  - If the query matches the host name of a page in the index, that page will display as the first result
Enhanced Search Results
Synonym Mapping, Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End Server for crawling
• Separate indexer machine
• In most cases, your search index is on its own server
Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using single content access account per region
• Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
• Also please note that it is not an easy installation, and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they skip the documentation and the prerequisite patches.
• Must ensure a schedule for the incremental crawl to catch additions to the document set
• Must turn on PDF indexer and stemming
Dragons 2
• Use the Web Part that accommodates wildcard search, found here: httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
• The Expert search capacity is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click distance are missed if Authority sites are not configured
Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results delayed from servers without Internet connections
• Backward compatibility
  - Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
• Microsoft Enterprise Search website: httpwwwmicrosoftcomenterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
• Tune Search Server 2008: httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
• MOSS Developer Center on MSDN: httpmsdnmicrosoftcomofficeservermossdefaultaspx
• MOSS 2007 Software Developers Kit: httpmsdn2microsoftcomen-uslibraryms550992aspx
• MOSS 2007 on TechNet: httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
• Search Optimization for a MOSS 2007 Content Management site: httpmsdnmicrosoftcomen-uslibrarycc721591aspx
• Faceted Search from the Microsoft SharePoint Team Blog: httpblogsmsdncomsharepointarchive20080317open
More Resources
• Enterprise search blog: httpblogsmsdncomenterprisesearch
• MOSS BDC Search: httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
• Find it All with SharePoint Enterprise Search: httptechnetmicrosoftcomen-usmagazinecc162512aspx
• Google Enterprise Connector for MOSS 2007: httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
• Ontolica Search for MOSS 2007: httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
• Michael Gannotti on SharePoint: httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
• Sitemap.xml Generator: httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
• SEO Advice from a Propellerhead for ...: httpwwwmossseocom
Even More Resources
• MOSS 2007 Administrator Documentation: httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: httpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
• All About SharePoint (SS Ahmed): httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
• Working with MOSS search, creating scopes: httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
• MOSS 2007 search customization: httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
• MOSS 2007 Search & Indexing: httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
• Create a custom Search Page: httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products
Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
• Extracts concepts and weights their relevance to searcher query
  - Presents for search refinement
• httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS
• Extracts metadata and compound terms
• Incorporates with existing taxonomy if one exists
• Appends metadata and stores as MOSS property
• Part of the main MOSS index
• Uses standard MOSS administration features
Adjusting Relevance
• Property weights
  - Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));
    // Dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }
    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }
    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    // Poll once a second until the ranking update has finished
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
Push/Pull Data to Users
• Alerts
  - Same alerting infrastructure for WSS and MOSS: Timer service is used to handle all alert notifications
  - Frequency can be set to Daily/Weekly: notifications for search alerts will be sent according to the creation time
  - 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  - A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx
  - Alert "gotchas"
    - No "My Alerts Summary" web part
    - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
• RSS Feeds
  - Ability to subscribe to an RSS feed on the search results
  - 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
• httpmsdnmicrosoftcomlibraryen-usspssdkhtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Web Search and Enterprise Search
Publishers want their content to be found
Anarchistic publishing model = ldquoanyone anywhere any timerdquo
Unlimited document set No real standards or code more like
guidelines No central authority Spam Commercialization Technology is agnostic Has to work the same for everyone
worldwide No shared understanding
Enterprise Search
Successful enterprise search efforts target corpuses of information and set search scopes appropriately IampKM pros are wise to study information worker context before trying to ldquoGoogle-izerdquo their enterprises Forrester Search Wave Q2 2008
Web Search Publishers do not think about
document discoverability Controlled corpus of documents Standards and practices in place No spam Users and authors generally
share contextual understanding Customized tagging or metadata Can customize search
technology to enterprise themes and concepts
Advanced Search Few customers use it and those that do are
disappointed Boolean or SQL operators work sporadically
Confusing message What is ldquoregularrdquo searchhellipnot as effective
Search has progressed beyond the stages of Advanced Filters Facets Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than EverMOSS 2007 Relevance customizable to the
enterprise content Automated metadata extraction Enhanced text analysis
Fully integrated admin experience between Windows
SharePoint Services v3 and MOSS 2007 Single search system and index
per server farm Custom content groups Best
Bets scheduling are now shared services
Scopes can be tied to document properties
Improved control over indexing
SharePoint 2003 Relevance keyed on numeric values
derived solely from document text Collection frequency Term frequency Document length Term position
Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best
Bets scheduling configurations are portal-based
Scopes tied to content sources Index propagated at completion of
master crawl only
Simplified Administration UISearch settings page at the SSP levelManaging crawls
bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)
Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the
content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)
Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules
Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL
Indexing Performance Improvements Search is a shared service
ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered
centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo
Content index that holds text of pages with Property store that holds other document values
Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties
Single item add removal without re-indexing entire corpus with continuous propagation
ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items
ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed
Relevance Types Dynamic ranking = relevance impacted by query term
ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL
Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails
XML files Excel spreadsheets Plain text List items
Relevance EnhancementsManually assign synonyms and editorialized results to keywords
ndash Use search logs to detect popular searches low click-through from results or 0 result queries
Search Alertsndash User can subscribe to receive email when results change
File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)
than others (XML txt)ndash Supports 220 files types MS and non-MS application
Property weights ndash Assign different weights to properties so that important
properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object
Modelndash Note The weights used in the product were carefully tested
Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial
undertaking
MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results
bullCan be presented pre- or post- querybullUsed for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories
Federated Search Import or export federated locations using Federated
Location Definition (FLD) files Incorporates results from outside content sources that
subscribe to OpenSearch 11 Passes the query into the subscribed resource and
returns results into single interface Relevance calculation done according to originating
resource criteria not MOSS 2007 criteria Pre-defined FLD files found at
httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
Can develop own FLD files if destination subscribes to OpenSearch 11
ndash Day Software has developed a standard connector for LiveLink ECM
People SearchBuild and publish rich personal profiles
Customize personal profile attributes Populate personal profiles using information from Active Directory other
LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory
information Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results
Search Scopes
– Allow users to refine search through filtering
– Define content resources and map them to business rules/key concepts
– Focused content = shared understanding = more precise results
Duplicate results filtering
– Collapses duplicates from the same directory or site to leave more room for other relevant results
– Less favoritism, more results on the desired page 1
Definitions
– Automatically extracts “definitions” from indexed content and displays them as matches directly on the results page
– A web part property on the Search Best Bets web part (can turn display of definitions on/off)
– Returned in the Query Object Model
– Cannot be edited
Best Bets
– Editorially assigned results based on the key concepts assigned to selected query terms
– Can be many-to-many
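The duplicate-collapsing idea from this slide can be illustrated with a small sketch. It is not the MOSS implementation: it simply assumes results arrive as a ranked list of URLs and keeps a fixed number per host, so one site cannot fill page 1. The function name, the `per_site` parameter, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def collapse_by_site(ranked_urls, per_site=1):
    """Keep at most `per_site` results per host, preserving rank order."""
    seen = {}
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= per_site:
            kept.append(url)
    return kept

# Hypothetical ranked results: two hits from the same site, one from another.
ranked = [
    "http://hr.example.com/policies/leave.aspx",
    "http://hr.example.com/policies/leave-old.aspx",
    "http://finance.example.com/forms/expenses.aspx",
]
collapsed = collapse_by_site(ranked)
```

Here the second HR hit is collapsed, which leaves room for the finance result higher on the page.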
Scalability
No physical limit on the maximum number of documents in one index
Recommended limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large and small documents count the same
The ‘average document size’ depends on the corpus mix – i.e., heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
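As a rough planning aid, the 50-million recommendation can be turned into simple arithmetic. The sketch below assumes every item type counts as one document (as the slide states); the corpus figures are invented for illustration, and real sizing also depends on hardware.

```python
import math

RECOMMENDED_DOCS_PER_INDEXER = 50_000_000  # figure quoted on the slide

def indexers_needed(document_count):
    """Smallest number of indexers that stays within the recommended limit."""
    return max(1, math.ceil(document_count / RECOMMENDED_DOCS_PER_INDEXER))

# Hypothetical corpus mix: list items dominate the document count.
corpus = {
    "office_files": 8_000_000,
    "web_pages": 2_500_000,
    "list_items": 60_000_000,
    "people_entries": 40_000,
}
total_documents = sum(corpus.values())
planned_indexers = indexers_needed(total_documents)
```

Note how heavy use of lists drives the count: the files alone would fit on one indexer, but the list items push the total past the recommended limit.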
Security
Query-time stripping – customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
– Implements the ASP.NET 2.0 authentication model
Minimum crawler permission is “Full Read”
– Still provides the same security trimming functionality
– Automatically configured for new sites
Search visibility options
– Prevent sites/lists from appearing in search results at a site/list level
“Security only” crawl for single item add/removal
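Query-time security trimming can be pictured as a filter applied to the result set just before display. The sketch below is only a conceptual model, assuming each indexed item carries a list of groups allowed to read it; the field names and group names are hypothetical and do not reflect how MOSS stores ACLs.

```python
def trim_results(results, user_groups):
    """Return only results whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [r for r in results if user_groups & set(r["allowed"])]

# Hypothetical index entries with the groups allowed to read each item.
results = [
    {"title": "Benefits overview", "allowed": ["all-staff"]},
    {"title": "Salary bands", "allowed": ["hr"]},
]
visible = trim_results(results, ["all-staff", "engineering"])
```

The crawler sees everything (hence the “Full Read” requirement), while each searcher sees only the subset they are entitled to.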
Search Analytics
Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
Query success: can define success for certain query terms
Report Center
– Access to MOSS 2007 BI features
– Filters data for permissions and relevance
Key Performance Indicators [KPI]
– Create a KPI list or other measures of success
– Default KPIs exist in OOB deployment
– KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants: content creators, Information Architect/User Experience Architect, Taxonomist
– Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes that map to these themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross-functional team to
– Assign relevance weighting that makes sense for customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages structural elements like URL depth and click distance
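One way to act on “develop Best Bets for searches with 0 results” is to mine an exported search log. The sketch below assumes the log has already been reduced to (query, result count) pairs; that row format and the sample queries are assumptions for illustration, not the layout of the MOSS export.

```python
from collections import Counter

def best_bet_candidates(log_rows, top_n=3):
    """Rank queries that returned 0 results by how often they were issued."""
    misses = Counter(
        query.strip().lower() for query, hits in log_rows if hits == 0
    )
    return misses.most_common(top_n)

# Hypothetical log rows: (query text, number of results returned).
log_rows = [
    ("Expense Form", 0),
    ("expense form", 0),
    ("vacation policy", 12),
    ("goat rodeo", 0),
]
candidates = best_bet_candidates(log_rows)
```

The most frequent misses are the first places to add a Best Bet or a synonym.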
Pareto’s Principle
Known as the 80/20 rule
Named after the late 19th-century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%. Don’t forget the Long Tail
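Whether the 80/20 split actually holds for a given corpus can be checked from click or hit counts. The sketch below assumes a mapping from each content item to the number of searches it satisfied; the item names and numbers are invented.

```python
def head_share(hits_per_item, volume_target=0.8):
    """Fraction of items needed to cover `volume_target` of all hits."""
    counts = sorted(hits_per_item.values(), reverse=True)
    total = sum(counts)
    running = 0
    for used, count in enumerate(counts, start=1):
        running += count
        if running >= volume_target * total:
            return used / len(counts)
    return 1.0

# Hypothetical hit counts per content item.
hits = {"benefits": 60, "payroll": 25, "parking": 5, "cafeteria": 5, "badges": 5}
share = head_share(hits)
```

In this invented sample, two of five items cover the target volume; the remaining items are the long tail that still needs attention.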
Define Content
Define content scopes
– Segment content into logical groups
– Create scope rules based on
  – Address
  – Property query
  – Content source
– At the SSP level or individual level; SSP-level scopes are shared among all sites that use the SSP
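The three rule types can be modeled as simple predicates over an indexed item. This is a simplified sketch: it treats a scope as “any rule matches” and ignores the include/require/exclude behaviors a real scope rule can carry. The rule tuples, field names, and sample document are hypothetical.

```python
def in_scope(doc, rules):
    """True if any rule (address, property query, content source) matches."""
    for kind, value in rules:
        if kind == "address" and doc["url"].startswith(value):
            return True
        if kind == "property":
            name, expected = value
            if doc["properties"].get(name) == expected:
                return True
        if kind == "content_source" and doc["content_source"] == value:
            return True
    return False

# Hypothetical scope: HR pages by address, or anything tagged Department=HR.
hr_scope = [
    ("address", "http://portal/sites/hr/"),
    ("property", ("Department", "HR")),
]
doc = {
    "url": "http://portal/sites/finance/leave-costs.xlsx",
    "properties": {"Department": "HR"},
    "content_source": "Local Office SharePoint Server sites",
}
doc_in_hr_scope = in_scope(doc, hr_scope)
```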
Select Authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise, e.g., “goat rodeo”
– Provides additional clarification for the searcher
Use synonym mapping for term variants
– C# and Csharp
Two information points can be displayed for a special term
– Definition of the term
– Best Bet
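Synonym mapping for term variants amounts to expanding each query term into its group of equivalents before matching. The sketch below is a conceptual illustration with a hypothetical in-memory synonym table; MOSS itself manages synonyms through keywords and thesaurus files.

```python
# Hypothetical synonym groups; each set lists interchangeable variants.
SYNONYM_GROUPS = [{"c#", "csharp"}]

def expand_terms(query):
    """Replace each query term with the sorted list of its known variants."""
    expanded = []
    for term in query.lower().split():
        group = next((g for g in SYNONYM_GROUPS if term in g), {term})
        expanded.append(sorted(group))
    return expanded

expanded = expand_terms("Csharp tutorial")
```

A search for either variant then matches documents that use the other.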
Designate Authority Sites
Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic-Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on a subset of the corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query
Educate: Structural Influences
File Type Bias
In order of relevancy (highest to lowest):
– HTML Web pages
– PowerPoint presentations
– Word documents
– Emails
– XML files
– Excel spreadsheets
– Plain text files
– List items
Auto Language Detect
– Foreign-language results are less relevant than results in the user’s language
– English is always considered as relevant as the user’s language
URL Depth and Click Distance
– Short URLs are like prime real estate
– Items with shorter URLs are considered more relevant than items placed in longer URLs
  – The level is determined by counting the number of slash (“/”) characters in the URL
– Keywords separated by hyphens in the URL are good
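Both structural signals are easy to picture in code. The sketch below is an illustration, not the ranking engine's implementation: `url_depth` counts slashes in the path only (an assumption, since the slide just says “in the URL”), and `click_distance` uses a breadth-first search over a hypothetical link graph to find the fewest clicks from any designated authority page.

```python
from collections import deque
from urllib.parse import urlsplit

def url_depth(url):
    """Count the slashes in the URL path, ignoring a trailing slash."""
    path = urlsplit(url).path.rstrip("/")
    return path.count("/")

def click_distance(links, authorities):
    """Fewest clicks from any authority page to each reachable page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical link graph: page -> pages it links to.
links = {"home": ["hr", "it"], "hr": ["leave"], "it": []}
distances = click_distance(links, ["home"])
deep = url_depth("http://portal/sites/hr/policies/leave.aspx")
shallow = url_depth("http://portal/")
```

Pages not reachable from any authority page get no distance at all, which is one way to see why click distance contributes little when no authority sites are configured.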
Educate: Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements:
– HTML anchor elements
– SharePoint Services link lists
– SharePoint Portal Server 2003 listings
– Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
– Any file types handled by installed 3rd-party iFilter components that emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text
Optimized URLs
– Enterprise Search checks URL matching at query time
– If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
Synonym Mapping, Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review: crawl rules, properties and schema, Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button. This opens the Text Entry window for the Web Part’s XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
6. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
It is not an easy installation: read the entire documentation before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
Must schedule the incremental crawl to catch additions to the document set
Must turn on the PDF indexer and stemming
Dragons 2
Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
Benefits of click distance are missed if Authority sites are not configured
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results delayed from servers without Internet connections
Backward compatibility
– Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
– Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
Extracts concepts and weights their relevance to the searcher’s query
– Presents them for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
– Extracts metadata and compound terms
– Incorporates with an existing taxonomy if one exists
– Appends metadata and stores it as a MOSS property
– Part of the main MOSS index
– Uses standard MOSS administration features
Adjusting Relevance
Property weights
– Assign different weights to properties so that certain properties such as ‘Title’ have a bigger influence on ranking
– Change default property weights through the Schema Object Model

using System;
using Microsoft.Office.Server.Search.Administration;

// appGuid, property, and weight are placeholders carried over from the slide.
Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

// Dump parameters (lookup by name)
foreach (RankingParameter param in ranking.RankingParameters)
{
    RankingParameter lookedup = ranking.RankingParameters[param.Name];
    Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
}

// Lookup by index
for (int i = 0; i < ranking.RankingParameters.Count; i++)
{
    RankingParameter param = ranking.RankingParameters[i];
    Console.WriteLine(param.Name + ": " + param.Value);
}

// Setting the weight of property 'property' to 'weight'
ranking.RankingParameters[property].Value = float.Parse(weight);

// Start the ranking update and wait until it reports Idle
ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
Console.Write("Updating ");
while (ranking.Status != RankingUpdateStatus.Idle)
{
    Console.Write(".");
    System.Threading.Thread.Sleep(1000);
}
Console.WriteLine(" Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts are sent according to the creation time
‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user’s alerts for a site collection
– http://<sitecollection>/_layouts/MySubs.aspx
Alert “gotchas”
– No “My Alerts Summary” web part
– No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed on the search results
‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and the Business Data Catalog
Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

using Microsoft.Office.Server.Search.Query;

// site and strQuery are placeholders carried over from the slide.
KeywordQuery request = new KeywordQuery(site);
request.QueryText = strQuery;
request.ResultTypes |= ResultType.RelevantResults;

// If we want to get more than one result table
request.ResultTypes |= ResultType.SpecialTermResults;

// Setting optional parameters on the Query object
request.RowLimit = 10;
request.StartRow = 0;
request.KeywordInclusion = KeywordInclusion.AllKeywords;

// Executing the query
ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
– Emitted by iFilters and protocol handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
  – Can be used in the query as property:value
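The relationship between crawled and managed properties can be sketched as a lookup table. Everything below is illustrative: the GUIDs are placeholders rather than real property sets, the mapping table is hypothetical, and the query parser assumes the keyword form `property:value`.

```python
# Placeholder property-set GUIDs (not real ones).
SET_A = "00000000-0000-0000-0000-000000000001"
SET_B = "00000000-0000-0000-0000-000000000002"

# (property set, property ID) -> managed property friendly names.
CRAWLED_TO_MANAGED = {
    (SET_A, "Author"): ["Author"],
    (SET_B, 4): ["Author"],
    (SET_B, 2): ["Title"],
}

def managed_values(crawled):
    """Fold crawled property values into managed properties by friendly name."""
    managed = {}
    for key, value in crawled.items():
        for name in CRAWLED_TO_MANAGED.get(key, []):
            managed.setdefault(name, []).append(value)
    return managed

def parse_property_query(text):
    """Split a 'property:value' restriction into its two parts."""
    name, _, value = text.partition(":")
    return name, value

doc_properties = managed_values({(SET_B, 4): "M. Sweeny", (SET_B, 2): "Enterprise Search"})
restriction = parse_property_query("Author:Sweeny")
```

Several crawled properties (here, two different “author” sources) can feed one managed property, which is what lets a single friendly name work across file types.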
Advanced Search
Few customers use it, and those that do are disappointed
Boolean or SQL operators work sporadically
Confusing message: what is “regular” search… not as effective?
Search has progressed beyond the stages of Advanced
– Filters
– Facets
– Context
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than Ever
MOSS 2007
– Relevance customizable to the enterprise content
  – Automated metadata extraction
  – Enhanced text analysis
– Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
– Single search system and index per server farm
– Custom content groups, Best Bets, and scheduling are now shared services
– Scopes can be tied to document properties
– Improved control over indexing
SharePoint 2003
– Relevance keyed on numeric values derived solely from document text
  – Collection frequency
  – Term frequency
  – Document length
  – Term position
– Different systems between Windows SharePoint Services and SharePoint Portal Server
– Multiple indexes
– Custom content groups, Best Bets, and scheduling configurations are portal-based
– Scopes tied to content sources
– Index propagated at completion of master crawl only
Simplified Administration UI
Search settings page at the SSP level
Managing crawls
• Content sources
• Explicit SharePoint Content Source Type
• Content source for Business Data (Enterprise CAL)
Crawl logs
• Snapshot of crawled content in your index – lists all documents found in the content source and their status
• Filters by date, site, etc.
• Summary by host name (of successes, errors, and warnings)
Crawl rules
• Included and excluded rules
• Ability to pre-test crawl rules
• Easy to change order of crawl rules
Managing scopes
• Scopes decoupled from content sources
• Scopes can span multiple content sources
• Scope by Property, Site, Content Source, and URL
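The order of crawl rules matters because, in the model assumed here, the first rule that matches a URL decides whether it is crawled. The sketch below illustrates that first-match behavior with wildcard patterns; the rule format, the default, and the URLs are hypothetical and this is not the MOSS rule engine.

```python
from fnmatch import fnmatchcase

def should_crawl(url, rules, default=True):
    """Apply the first rule whose wildcard pattern matches; ignore later rules."""
    for pattern, include in rules:
        if fnmatchcase(url, pattern):
            return include
    return default

# Hypothetical rules: keep one archive folder, exclude the rest of the archive.
rules = [
    ("http://portal/archive/keep/*", True),
    ("http://portal/archive/*", False),
]
kept = should_crawl("http://portal/archive/keep/a.doc", rules)
skipped = should_crawl("http://portal/archive/old.doc", rules)
# Reversing the order changes the outcome for the "keep" folder.
kept_when_reversed = should_crawl("http://portal/archive/keep/a.doc", rules[::-1])
```

Pre-testing a URL against the rule list, as the admin UI allows, is the same check run by hand.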
Indexing Performance Improvements
Search is a shared service
– Unified WSS and MOSS search, with 1 index per SSP
– Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
– Scopes and Best Bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a “master merge”
Content index holds the text of pages; property store holds other document values
Propagate data incrementally to the query servers as it is being indexed
– Propagation starts within 30 seconds of the first shadow index being written
– No need to wait until the end of the crawl for information to be available in queries
– No propagation of properties
Single-item add/removal without re-indexing the entire corpus, with continuous propagation
– Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
– Security Change Only Crawl: no need to fully index all the content of a site when permissions on the site have changed
Relevance Types
Dynamic ranking = relevance impacted by query term
– Frequency
– Location in document
– Appearance in link text
– Appearance in URL
Static ranking = relevance independent of customer query
– URL depth
– Click distance
– Authority/demoted site
– Change property weights
– Language of customer (browser setting)
– Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
– Use search logs to detect popular searches, low click-through from results, or 0-result queries
Search Alerts
– User can subscribe to receive email when results change
File type filtering
– Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, TXT)
– Supports 220 file types, from MS and non-MS applications
Property weights
– Assign different weights to properties so that important properties such as ‘Title’ have a bigger influence on ranking
– Change default property weights through the Schema Object Model
– Note: the weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
Empowers customers to refine their search most effectively
Filters results by predetermined categories
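Post-query facets boil down to counting category values across the result set and then filtering on the value the customer picks. The sketch below assumes each result is a dictionary of metadata; the facet name and sample results are hypothetical.

```python
from collections import Counter

def facet_counts(results, facet):
    """Count how many results fall under each value of a facet."""
    return Counter(r[facet] for r in results if facet in r)

def refine(results, facet, value):
    """Keep only the results tagged with the chosen facet value."""
    return [r for r in results if r.get(facet) == value]

# Hypothetical result set with a 'department' category on most items.
results = [
    {"title": "Leave policy", "department": "HR"},
    {"title": "Expense form", "department": "Finance"},
    {"title": "Hiring guide", "department": "HR"},
    {"title": "Untitled notes"},
]
counts = facet_counts(results, "department")
hr_only = refine(results, "department", "HR")
```

The untagged item shows why strong metadata standards matter: content without the category never appears under any facet.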
Federated Search
Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that support OpenSearch 1.1
Passes the query to the subscribed resource and returns results in a single interface
Relevance calculation is done according to the originating resource’s criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop your own FLD files if the destination supports OpenSearch 1.1
– Day Software has developed a standard connector for LiveLink ECM
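Passing a query to an OpenSearch 1.1 source means filling in the source's URL template. The sketch below substitutes the standard `{searchTerms}`, `{startIndex}`, and `{count}` parameters (with or without the optional `?` marker); the template URL is a made-up example and the helper is not part of any FLD tooling.

```python
from urllib.parse import quote

def build_query_url(template, terms, start_index=1, count=10):
    """Fill an OpenSearch 1.1 URL template with the query and paging values."""
    values = {
        "searchTerms": quote(terms),
        "startIndex": str(start_index),
        "count": str(count),
    }
    url = template
    for name, value in values.items():
        # Parameters may be written as {name} or, when optional, {name?}.
        url = url.replace("{" + name + "}", value)
        url = url.replace("{" + name + "?}", value)
    return url

# Hypothetical federated location template.
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&n={count?}"
query_url = build_query_url(template, "goat rodeo")
```

The remote source ranks its own results, so the federated results keep the originating resource's ordering.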
People Search
Build and publish rich personal profiles
– Customize personal profile attributes
– Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
– Control access to information using security and privacy controls
– Generate and display organizational charts based on directory information
– Publish personal profiles using MOSS My Sites
Identify people who can help
– Find people based on keyword matches with MOSS personal profiles
– Find people in line-of-business systems
– Filter results by common attributes such as Job Title or Department
– Find “in-common” connections, including managers, site memberships, distribution lists, and colleagues
– Group results by social distance
– Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2
Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and can impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile. Employee participation is critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers and thesaurus files are not upgraded
Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS
Extracts metadata and compound terms
Incorporates with existing taxonomy if one exists
Appends metadata and stores it as a MOSS property
Part of the main MOSS index
Uses standard MOSS administration features
Adjusting Relevance: Property Weights
Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // Fragment: place inside a method. appGuid, property and weight are supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it returns to Idle
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine(" Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows whereof she speaks.
Push/Pull Data to Users: Alerts
Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed of the search results
'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query Object Model

    using Microsoft.Office.Server.Search.Query;

    // Fragment: place inside a method. site and strQuery are supplied by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
  - Emitted by iFilters and Protocol Handlers
  - Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
  - Mapping target for crawled properties (many-to-many)
  - Identified by internal ID
  - Friendly name used in queries
  - Can be used in the query with property:Value
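The many-to-many relationship between crawled and managed properties, and the property:Value query form, can be pictured with a small lookup. This is only an illustrative sketch in Python, not the MOSS schema API: the property names in MAPPINGS and the helpers build_index and parse_property_query are hypothetical.

```python
from collections import defaultdict

# Hypothetical crawled -> managed mappings (names are illustrative only).
MAPPINGS = [
    ("Office:Author", "Author"),
    ("Mail:From", "Author"),
    ("Office:Title", "Title"),
    ("Office:Title", "DisplayTitle"),
]


def build_index(mappings):
    """Return (managed -> crawled names, crawled -> managed names)."""
    by_managed = defaultdict(set)
    by_crawled = defaultdict(set)
    for crawled, managed in mappings:
        by_managed[managed].add(crawled)
        by_crawled[crawled].add(managed)
    return by_managed, by_crawled


def parse_property_query(text):
    """Split a 'property:value' query term into (property, value)."""
    name, sep, value = text.partition(":")
    if not sep or not name or not value:
        raise ValueError("expected property:value")
    return name.strip(), value.strip().strip('"')


by_managed, by_crawled = build_index(MAPPINGS)
print(sorted(by_managed["Author"]))
print(parse_property_query('Author:"Marianne Sweeny"'))
```

One managed property can collect several crawled properties, and one crawled property can feed several managed ones, which is why both directions are indexed.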
MOSS 2007 Search
Query engine breaks the search terms down
Index engine stores the properties
Content index stores the text
Better Than Ever
MOSS 2007
  - Relevance customizable to the enterprise content: automated metadata extraction, enhanced text analysis
  - Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
  - Single search system and index per server farm
  - Custom content groups, Best Bets and scheduling are now shared services
  - Scopes can be tied to document properties
  - Improved control over indexing
SharePoint 2003
  - Relevance keyed on numeric values derived solely from document text: collection frequency, term frequency, document length, term position
  - Different systems between Windows SharePoint Services and SharePoint Portal Server
  - Multiple indexes
  - Custom content groups, Best Bets and scheduling configurations are portal-based
  - Scopes tied to content sources
  - Index propagated at completion of master crawl only
Simplified Administration UI
Search settings page at the SSP level
Managing crawls
  - Content sources
  - Explicit SharePoint Content Source Type
  - Content source for Business Data (Enterprise CAL)
Crawl logs
  - Snapshot of crawled content in your index: lists all documents found in the content source and their status
  - Filters by date, site, etc.
  - Summary by host name (of successes, errors and warnings)
Crawl rules
  - Included and excluded rules
  - Ability to pre-test crawl rules
  - Easy to change order of crawl rules
Managing scopes
  - Scopes decoupled from content sources
  - Scopes can span multiple content sources
  - Scope by Property, Site, Content Source and URL
Indexing Performance Improvements
Search is a shared service
  - Unified WSS and MOSS search for 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
Content index that holds text of pages, with Property store that holds other document values
Propagate data incrementally to the query servers as it is being indexed
  - Propagation starts within 30 seconds of the first shadow index written
  - No need to wait till the end of the crawl for information to be available in queries
  - No propagation of properties
Single item add/removal without re-indexing entire corpus, with continuous propagation
  - Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed
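The idea behind a change-log crawl can be sketched in a few lines. This is a conceptual illustration only, not how the MOSS crawler is implemented: the change-log tuple layout, the token numbering and the incremental_crawl function are assumptions made for the example, and the log is assumed to be sorted by token.

```python
def incremental_crawl(index, change_log, last_token):
    """Apply only the changes recorded after last_token; return the new token.

    index      -- dict mapping item id -> indexed content (modified in place)
    change_log -- list of (token, action, item_id, content), sorted by token
    last_token -- token of the last change handled by the previous crawl
    """
    for token, action, item_id, content in change_log:
        if token <= last_token:
            continue  # already processed in an earlier crawl
        if action == "delete":
            index.pop(item_id, None)
        else:  # "add" or "update"
            index[item_id] = content
        last_token = token
    return last_token


index = {"a": "old text"}
log = [
    (1, "add", "a", "old text"),
    (2, "update", "a", "new text"),
    (3, "add", "b", "other text"),
    (4, "delete", "a", None),
]
token = incremental_crawl(index, log, last_token=1)
print(token, index)
```

Only the three entries after token 1 are touched; the rest of the corpus is never re-read, which is the point of the feature.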
Relevance Types
Dynamic ranking = relevance impacted by query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
Static ranking = relevance independent of customer query
  - URL Depth
  - Click Distance
  - Authority/Demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0-result queries
Search Alerts
  - User can subscribe to receive email when results change
File type filtering
  - Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, TXT)
  - Supports 220 file types, from MS and non-MS applications
Property weights
  - Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
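Mining the search logs for keyword and Best Bet candidates, as suggested above, might look roughly like this once the logs are exported. The log layout (query, result count, clicked flag), the thresholds and the best_bet_candidates function are all assumptions for illustration; they are not part of MOSS.

```python
from collections import Counter


def best_bet_candidates(log, min_searches=2, max_ctr=0.1):
    """Flag repeated queries that return nothing or are rarely clicked.

    log -- iterable of (query, result_count, clicked) tuples
    """
    searches, clicks, zero = Counter(), Counter(), set()
    for query, result_count, clicked in log:
        q = query.strip().lower()  # treat case variants as one query
        searches[q] += 1
        if clicked:
            clicks[q] += 1
        if result_count == 0:
            zero.add(q)

    flagged = {}
    for q, n in searches.items():
        if n < min_searches:
            continue  # too rare to be worth editorial effort
        if q in zero:
            flagged[q] = "zero results"
        elif clicks[q] / n <= max_ctr:
            flagged[q] = "low click-through"
    return flagged


log = [
    ("Expense Form", 0, False),
    ("expense form", 0, False),
    ("vacation", 12, False),
    ("vacation", 12, False),
    ("vacation", 12, False),
    ("benefits", 8, True),
    ("benefits", 8, True),
    ("payroll", 0, False),
]
print(best_bet_candidates(log))
```

Queries flagged this way are the natural first entries for synonyms and editorialized results.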
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
  - Can be presented pre- or post-query
  - Used for Advanced search
Empowers customers to refine their search most effectively
Filters results by predetermined categories
Federated Search
Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that subscribe to OpenSearch 1.1
Passes the query to the subscribed resource and returns results into a single interface
Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop own FLD files if destination subscribes to OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
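Passing the query to an OpenSearch 1.1 source comes down to filling a URL template. OpenSearch 1.1 templates use placeholders such as {searchTerms}, with a trailing "?" marking an optional parameter. The sketch below illustrates that substitution in Python; the example host and the fill_template helper are hypothetical, and this is not the code MOSS uses to read FLD files.

```python
import re
from urllib.parse import quote_plus


def fill_template(template, **params):
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def repl(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(name)

    return re.sub(r"\{(\w+)(\?)?\}", repl, template)


# Hypothetical federated location; only the placeholder syntax follows OpenSearch 1.1.
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&format=rss"
print(fill_template(template, searchTerms="goat rodeo"))
```

Because the remote source ranks its own results, the federating side only has to build the request and render what comes back.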
People Search
Build and publish rich personal profiles
  - Customize personal profile attributes
  - Populate personal profiles using information from Active Directory, other LDAP directories or line-of-business systems
  - Control access to information using security and privacy controls
  - Generate and display organizational charts based on directory information
  - Publish personal profiles using MOSS My Sites
Identify people who can help
  - Find people based on keyword matches with MOSS personal profiles
  - Find people in line-of-business systems
  - Filter results by common attributes such as Job Title or Department
  - Find "in-common" connections, including managers, site memberships, distribution lists and colleagues
  - Group results by social distance
  - Subscribe to People Alerts
People Search Results Page
Find people by project, expertise, or…
Filter by relevant attributes
Contact information & online availability
LOB Applications with BDC
Extracts data from line-of-business, CRM and other 3rd-party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
FAST ESP Technology
FAST is a sophisticated search engine tailor-made for ecommerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process (in order): conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
  - Auto classification
  - Advanced linguistics: text mining for concept and relationship mapping
  - Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
  - Precision: exact word matching, exact phrase matching, proximity, tokenization
  - Location-aware results (retail and news), excellent for mobile search
  - Recommendation engine
  - Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
Search Scopes
  - Allow users to refine search through filtering
  - Define content resources and map to business rules/key concepts
  - Focused content = shared understanding = more precise results
Duplicate results filtering
  - Collapsing duplicates from the same directory or site to leave more room for other relevant results
  - Less favoritism, more results on desired page 1
Definitions
  - Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  - A web part property on the Search Best Bets web part (can turn display of definitions on/off)
  - Returned in the Query Object Model
  - Cannot be edited
Best Bets
  - Editorially assigned results based on key concepts assigned to selected query terms
  - Can be many-to-many
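The duplicate-collapsing idea from this slide (keep one hit per directory or site so page 1 has room for other sources) can be approximated as follows. This is a simplified illustration that groups purely by host and directory; MOSS's actual duplicate detection is not documented here, and collapse_by_directory is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit


def collapse_by_directory(results):
    """Keep the first (highest-ranked) URL for each host + directory pair.

    results -- URLs in rank order, best first
    """
    seen, kept = set(), []
    for url in results:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), posixpath.dirname(parts.path))
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


ranked = [
    "http://intranet/hr/forms/a.doc",
    "http://intranet/hr/forms/b.doc",
    "http://intranet/hr/policy.doc",
    "http://wiki/hr/forms/a.doc",
]
print(collapse_by_directory(ranked))
```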
Scalability
No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
Large/small documents count the same
The 'average document size' depends on the corpus mix
  - i.e., heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
Security
Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
Search visibility options
  - Prevent sites/lists appearing in search results at a site/list level
"Security only" crawl for single item add/removal
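Query-time stripping means the filter runs against the user's permissions when the query executes, not when the content is crawled. A minimal conceptual sketch follows; the (document, ACL) result layout and the security_trim function are assumptions for illustration and do not reflect the MOSS implementation.

```python
def security_trim(results, user_groups):
    """Keep only results whose ACL shares at least one group with the user.

    results     -- list of (document, acl) pairs in rank order
    user_groups -- groups the querying user belongs to
    """
    groups = set(user_groups)
    return [doc for doc, acl in results if groups & set(acl)]


results = [
    ("salary-bands.xls", ["hr"]),
    ("holiday-calendar.htm", ["hr", "all-staff"]),
    ("board-minutes.doc", ["executives"]),
]
print(security_trim(results, ["all-staff"]))
```

The index holds everything the crawler could read; the user never sees the hits that were trimmed away, and rank order among the remaining results is preserved.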
Search Analytics
Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
Key Performance Indicators [KPI]
  - Create a KPI list or other measures of success
  - Default KPIs exist in OOB deployment
  - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
Assemble a cross-functional team to
  - Assign relevance weighting that makes sense for the customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages structural elements like URL depth and click distance
Pareto's Principle
Known as the 80/20 rule
Named after the late 19th-century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
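Whether the 80/20 split actually holds for a given corpus can be checked against exported search logs. The sketch below is one way to measure it, under stated assumptions: the input is a flat list of the documents users opened from search results, and head_share is a hypothetical helper rather than a MOSS report.

```python
from collections import Counter


def head_share(clicked_docs, coverage=0.8):
    """Fraction of distinct documents that accounts for `coverage` of all clicks."""
    counts = Counter(clicked_docs)
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= coverage * total:
            return rank / len(counts)
    return 1.0  # empty log: nothing to measure


clicks = ["benefits.htm"] * 17 + ["a.doc", "b.doc", "c.doc", "d.doc"]
print(head_share(clicks))
```

A small result means a handful of documents carry most searches and deserve Best Bets first; the documents left over are the long tail the slide warns against ignoring.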
Define Content
Define content scopes
  - Segment content into logical groups
  - Create scope rules based on address, property query or content source
  - At the SSP level or individual site level
  - SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
  - Terms or language proprietary to the enterprise, e.g., "goat rodeo"
  - Provides additional clarification for searcher
Use synonym mapping for term variants
  - C# and Csharp
Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
Designate Authority Sites
Hilltop Algorithm
  - Quality of links more important than quantity of links
  - Segmentation of corpus into broad topics
  - Selection of authority sources within these topic areas
  - Pre-query calculation applied at query time
Topic-Sensitive PageRank
  - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  - Pre-query calculation of factors based on subset of corpus
    - Context of term use in document
    - Context of term use in history of queries
    - Context of term use by user submitting query
Educate: Structural Influences
File Type Bias
  In order of relevancy (highest to lowest):
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items
Auto Language Detect
  - Foreign-language results are less relevant than results in the user's language
  - English is always considered as relevant as the user's language
URL Depth and Click Distance
  - Short URLs are like prime real estate
  - Items with shorter URLs are considered more relevant than items placed at longer URLs
    - The level is determined by counting the slash ("/") characters in the URL
  - Keywords separated by hyphens in the URL are good
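Counting slashes is easy to try on your own URLs when planning a site structure. The sketch below counts only the slashes in the path (ignoring the "//" after the scheme and any trailing slash), which is an assumption about how to apply the rule; url_depth is a hypothetical helper and not the ranking code itself.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Number of '/' characters in the URL path, ignoring a trailing slash."""
    path = urlsplit(url).path.rstrip("/")
    return path.count("/")


urls = [
    "http://intranet/hr/benefits/2008/forms/enroll.aspx",
    "http://intranet/",
    "http://intranet/hr/benefits.aspx",
]
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```

Sorting a content inventory this way shows quickly which important pages are buried too deep to benefit from URL depth.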
Educate: Content Influences
Anchor Link Text
  Search indexes the anchor text from the following elements:
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
  Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
Auto Description text
Optimized URLs
  - Enterprise Search checks URL matching at query time
  - If the query matches the host name of a page in the index, that page displays as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
  - Crawl rules
  - Properties and schema
  - Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Better Than EverMOSS 2007 Relevance customizable to the
enterprise content Automated metadata extraction Enhanced text analysis
Fully integrated admin experience between Windows
SharePoint Services v3 and MOSS 2007 Single search system and index
per server farm Custom content groups Best
Bets scheduling are now shared services
Scopes can be tied to document properties
Improved control over indexing
SharePoint 2003 Relevance keyed on numeric values
derived solely from document text Collection frequency Term frequency Document length Term position
Different systems between Windows SharePoint Systems and SharePoint Portal Server Multiple indexes Custom Content groups Best
Bets scheduling configurations are portal-based
Scopes tied to content sources Index propagated at completion of
master crawl only
Simplified Administration UISearch settings page at the SSP levelManaging crawls
bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)
Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the
content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)
Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules
Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL
Indexing Performance Improvements
- Search is a shared service
  - Unified WSS and MOSS search, with 1 index per SSP
  - Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level
  - Scopes and Best Bets can also be administered at the consuming sites
- Crawl to small indexes that are then consolidated at scheduled times in a "master merge"
- Content index holds the text of pages; property store holds other document values
- Propagate data incrementally to the query servers as it is being indexed
  - Propagation starts within 30 seconds of the first shadow index being written
  - No need to wait until the end of the crawl for information to be available in queries
  - No propagation of properties
- Single item add/removal without re-indexing the entire corpus, with continuous propagation
  - Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  - Security Change Only Crawl: no need to fully index all the content of a site when permissions on the site have changed
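The idea behind a change-log crawl can be sketched as follows. This is a toy model under stated assumptions: the change log is a list of (token, action, url, text) tuples and the index is a dictionary. Neither reflects the actual WSS change log format or the MOSS index structure.

```python
def apply_change_log(index, change_log, last_token):
    """Apply only the changes recorded after last_token; return the new token."""
    newest = last_token
    for token, action, url, text in change_log:
        if token <= last_token:
            continue  # already handled by an earlier crawl
        if action == "delete":
            index.pop(url, None)
        else:  # "add" or "update"
            index[url] = text
        newest = max(newest, token)
    return newest

# Hypothetical index state after a crawl that stopped at token 1.
index = {"/sites/hr/policy.doc": "old policy text"}
change_log = [
    (1, "add", "/sites/hr/policy.doc", "old policy text"),
    (2, "update", "/sites/hr/policy.doc", "new policy text"),
    (3, "add", "/sites/hr/faq.aspx", "frequently asked questions"),
    (4, "delete", "/sites/hr/faq.aspx", None),
]
token = apply_change_log(index, change_log, last_token=1)
```

Only the three entries after token 1 are processed; the rest of the corpus is never touched, which is the point of the incremental approach.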
Relevance Types
- Dynamic ranking = relevance impacted by query term
  - Frequency
  - Location in document
  - Appearance in link text
  - Appearance in URL
- Static ranking = relevance independent of customer query
  - URL depth
  - Click distance
  - Authority/demoted site
  - Change property weights
  - Language of customer (browser setting)
  - Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
- Manually assign synonyms and editorialized results to keywords
  - Use search logs to detect popular searches, low click-through from results, or 0-result queries
- Search Alerts
  - User can subscribe to receive email when results change
- File type filtering
  - Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, TXT)
  - Supports 220 file types, from MS and non-MS applications
- Property weights
  - Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
  - Change default property weights through the Schema Object Model
  - Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
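To see why property weights shift ranking, here is a deliberately simple scoring sketch. The weights, documents, and `weighted_score` function are hypothetical; this is plain term counting, not the ranking formula MOSS 2007 uses.

```python
# Hypothetical weights; not the values MOSS 2007 ships with.
WEIGHTS = {"title": 5.0, "author": 2.0, "body": 1.0}

def weighted_score(query_terms, doc, weights):
    """Sum term occurrences per property, scaled by that property's weight."""
    score = 0.0
    for prop, weight in weights.items():
        words = doc.get(prop, "").lower().split()
        for term in query_terms:
            score += weight * words.count(term.lower())
    return score

docs = [
    {"title": "Quarterly report", "body": "search search results were discussed"},
    {"title": "Search configuration guide", "body": "How to configure crawls"},
]
ranked = sorted(docs, key=lambda d: weighted_score(["search"], d, WEIGHTS), reverse=True)
```

The guide mentions the term once, in its title, and still outranks the report that mentions it twice in the body. Raising or lowering a weight changes every ranking in the corpus, which is why the warning above deserves attention.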
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
- Can be presented pre- or post-query
- Used for Advanced Search
Empowers customers to refine their search most effectively
Filters results by predetermined categories
Federated Search
- Import or export federated locations using Federated Location Definition (FLD) files
- Incorporates results from outside content sources that subscribe to OpenSearch 1.1
- Passes the query to the subscribed resource and returns results in a single interface
- Relevance calculation done according to the originating resource's criteria, not MOSS 2007 criteria
- Pre-defined FLD files found at httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
- Can develop own FLD files if the destination subscribes to OpenSearch 1.1
  - Day Software has developed a standard connector for LiveLink ECM
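Passing the query to an OpenSearch 1.1 source comes down to filling in a URL template. The sketch below substitutes the standard {searchTerms}, {startIndex}, and {count} parameters; the template URL is a made-up example, `fill_template` is a hypothetical helper, and optional parameters written with a trailing "?" are not handled.

```python
from urllib.parse import quote

def fill_template(template, search_terms, start_index=1, count=10):
    """Substitute OpenSearch 1.1 template parameters into a query URL."""
    values = {
        "{searchTerms}": quote(search_terms, safe=""),  # URL-encode the query
        "{startIndex}": str(start_index),
        "{count}": str(count),
    }
    url = template
    for placeholder, value in values.items():
        url = url.replace(placeholder, value)
    return url

# Hypothetical federated location; not a real endpoint.
TEMPLATE = "http://search.example.com/results?q={searchTerms}&start={startIndex}&n={count}&format=rss"
url = fill_template(TEMPLATE, "enterprise search")
```

The remote source ranks and returns its own results, which is why relevance for federated results follows the originating resource rather than MOSS 2007.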
People Search
Build and publish rich personal profiles
- Customize personal profile attributes
- Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
- Control access to information using security and privacy controls
- Generate and display organizational charts based on directory information
- Publish personal profiles using MOSS My Sites
Identify people who can help
- Find people based on keyword matches with MOSS personal profiles
- Find people in line-of-business systems
- Filter results by common attributes such as Job Title or Department
- Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
- Group results by social distance
- Subscribe to People Alerts
People Search Results Page
- Find people by project, expertise, or...
- Filter by relevant attributes
- Contact information & online availability
LOB Applications with BDC
- Extracts data from line-of-business, CRM, and other 3rd-party data stores
- Caches for indexing by the search service
- Searches any data source accessible through ADO.NET or Web Services
- Uses Live Communications Server for connectivity options
- Aggregated into a single application
FAST ESP Technology
- FAST is a sophisticated search engine tailor-made for ecommerce and help desk
- Uses sophisticated linguistic processing
- Searches structured and unstructured content
- Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
- Auto classification
- Advanced linguistics: text mining for concept and relationship mapping
- Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
- Precision: exact word matching, exact phrase matching, proximity, tokenization
- Location-aware results (retail and news): excellent for mobile search
- Recommendation engine
- Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
Search Scopes
- Allow users to refine search through filtering
- Define content resources and map to business rules/key concepts
- Focused content = shared understanding = more precise results
Duplicate results filtering
- Collapsing duplicates from the same directory or site leaves more room for other relevant results
- Less favoritism, more results on the desired page 1
Definitions
- Automatically extract "definitions" from indexed content and display them as matches directly on the results page
- A web part property on the Search Best Bets web part (can turn on/off display of definition)
- Returned in the Query Object Model
- Cannot be edited
Best Bets
- Editorially assigned results based on the key concepts assigned to selected query terms
- Can be many-to-many
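Duplicate results filtering, as listed above, can be approximated by collapsing results that share a site and identical content. The sketch assumes each result is a (url, text) pair and treats "duplicate" as an exact content match on the same host; the real product's duplicate detection is not documented here and likely differs.

```python
import hashlib
from urllib.parse import urlparse

def collapse_duplicates(results):
    """Keep the first result for each (host, content hash) pair."""
    seen = set()
    kept = []
    for url, text in results:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        key = (urlparse(url).netloc, digest)
        if key in seen:
            continue  # same content already shown for this site
        seen.add(key)
        kept.append(url)
    return kept

# Hypothetical result list, already in rank order.
results = [
    ("http://intranet/hr/policy.doc", "vacation policy"),
    ("http://intranet/hr/archive/policy.doc", "vacation policy"),
    ("http://intranet/it/setup.doc", "laptop setup"),
    ("http://extranet/hr/policy.doc", "vacation policy"),
]
kept = collapse_duplicates(results)
```

The archived copy drops out, freeing a slot on page 1, while the copy on the other site is kept because it lives on a different host.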
Scalability
- No physical limit for the maximum number of documents in one index
- Recommended limit is 50 million documents per indexer
- A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
- Large and small documents count the same
- The 'average document size' depends on the corpus mix
  - e.g. heavy use of WSS 3.0 lists versus limited use
- Dependent on supporting hardware
Security
- Query-time stripping: customers see only those results that they have permission to view
- Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  - Implements the ASP.NET 2.0 authentication model
- Minimum crawler permission is "Full Read"
  - Still provides the same security trimming functionality
  - Automatically configured for new sites
- Search visibility options
  - Prevent sites/lists from appearing in search results at a site/list level
- "Security only" crawl for single item add/removal
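Query-time security trimming amounts to filtering the result set against the searcher's permissions before anything is displayed. The sketch below assumes each indexed item carries a list of allowed groups; the data, group names, and `trim_results` helper are hypothetical, not the MOSS 2007 security model.

```python
def trim_results(results, user_groups):
    """Return only the URLs whose ACL shares at least one group with the user."""
    groups = set(user_groups)
    return [r["url"] for r in results if groups & set(r["allowed_groups"])]

# Hypothetical index entries with the groups captured at crawl time.
RESULTS = [
    {"url": "/hr/salaries.xls", "allowed_groups": ["hr-managers"]},
    {"url": "/hr/handbook.doc", "allowed_groups": ["all-staff", "hr-managers"]},
    {"url": "/it/runbook.doc", "allowed_groups": ["it-ops"]},
]
visible = trim_results(RESULTS, ["all-staff"])
```

Because the ACL is stored with the item, a permission change only requires refreshing that stored ACL, which is the motivation for the "security only" crawl.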
Search Analytics
- Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success: can define success for certain query terms
Report Center
- Access to MOSS 2007 BI features
- Filters data for permissions and relevance
Key Performance Indicators [KPI]
- Create a KPI list or other measures of success
- Default KPIs exist in OOB deployment
- KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
- Content creators
- Information Architect/User Experience Architect
- Taxonomist
Define key enterprise themes in content
- Map existing content to these themes
- Create filters and scopes that map to the themes
Get as much customer data as possible to find search pain points
- Review search logs and customer feedback mechanisms
- What are they trying to find?
- What terms are they using?
Assemble a cross-functional team to
- Assign relevance weighting that makes sense for customer behavior and the corpus
- Develop Best Bets for searches with 0 results
- Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
- Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
- Design a structure that leverages structural elements like URL depth and click distance
Pareto's Principle
- Known as the 80/20 rule
- Named after a late-19th-century economist
- 20% of your content is answering 80% of your searches
- Not an excuse to stop optimizing at the top 20%
- Don't forget the Long Tail
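One way to check how the 80/20 rule plays out in your own search logs is to count how few distinct queries account for most of the traffic. The sketch assumes the exported log is simply a list of query strings; the sample log and the `head_queries` helper are hypothetical.

```python
from collections import Counter

def head_queries(query_log, coverage=0.8):
    """Return the most frequent queries that together reach the coverage share."""
    counts = Counter(q.strip().lower() for q in query_log)
    total = sum(counts.values())
    head, covered = [], 0
    for query, n in counts.most_common():
        if covered >= coverage * total:
            break
        head.append(query)
        covered += n
    return head

# Hypothetical exported log: 10 searches, 4 distinct queries.
log = ["vacation policy"] * 6 + ["expense form"] * 2 + ["goat rodeo", "org chart"]
head = head_queries(log)
```

Everything outside the returned list is the long tail: individually rare queries that still deserve Best Bets or synonym work when they return 0 results.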
Define Content
Define content scopes
- Segment content into logical groups
- Create scope rules based on
  - Address
  - Property query
  - Content source
- At the SSP level or individual level
  - SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
- Terms or language proprietary to the enterprise
  - e.g. "goat rodeo"
- Provides additional clarification for the searcher
- Use synonym mapping for term variants
  - C# and Csharp
- Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
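The special-term lookup described above can be pictured as a keyword table with synonyms, where a match returns the two information points: a definition and the Best Bets. The table contents, URL, and `lookup_special_term` helper are hypothetical; in MOSS 2007 this data is entered through the Manage Keywords page rather than in code.

```python
# Hypothetical keyword table with synonyms, a definition, and Best Bets.
KEYWORDS = {
    "c#": {
        "synonyms": {"csharp", "c sharp"},
        "definition": "Programming language used for .NET development.",
        "best_bets": ["http://intranet/dev/csharp-guidelines"],
    },
}

def lookup_special_term(query, keywords):
    """Match the query against each keyword and its synonyms, ignoring case."""
    normalized = query.strip().lower()
    for keyword, entry in keywords.items():
        if normalized == keyword or normalized in entry["synonyms"]:
            return keyword, entry["definition"], entry["best_bets"]
    return None

match = lookup_special_term("CSharp", KEYWORDS)
```

Both spellings land on the same keyword entry, so the searcher sees the same definition and Best Bet whichever variant they type.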
Designate Authority Sites
Hilltop Algorithm
- Quality of links more important than quantity of links
- Segmentation of corpus into broad topics
- Selection of authority sources within these topic areas
- Pre-query calculation applied at query time
Topic-Sensitive PageRank
- Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
- Pre-query calculation of factors based on a subset of the corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query
Educate: Structural Influences
File Type Bias
- In order of relevancy (highest to lowest)
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items
Auto Language Detect
- Foreign-language results are less relevant than results in the user's language
- English is always considered as relevant as the user's language
URL Depth and Click Distance
- Short URLs are like prime real estate
- Items with shorter URLs are considered more relevant than items placed at longer URLs
  - The level is determined by counting the slash ("/") characters in the URL
- Keywords separated by hyphens in the URL are good
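URL depth is easy to audit before content is published. The sketch below counts path segments after the host, which tracks the slash count the slide describes; the URLs and the `url_depth` helper are hypothetical, and the exact counting rule used by MOSS 2007 may differ.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Count the non-empty path segments after the host name."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

# Hypothetical intranet URLs.
urls = [
    "http://intranet/sites/hr/policies/2008/vacation.aspx",
    "http://intranet/",
    "http://intranet/hr/vacation-policy.aspx",
]
by_depth = sorted(urls, key=url_depth)
```

Sorting a site inventory this way shows which important pages are buried several levels down and could be moved closer to the root.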
Educate: Content Influences
Anchor Link Text
- Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed 3rd-party iFilter components that emit hyperlinks
Metadata extraction
- Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
Auto Description text
Optimized URLs
- Enterprise Search checks URL matching at query time
- If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
- Dedicated crawl-target servers for large sites
- Separate SQL Server instance for Search
- Fast disk for SQL, fast CPU for Indexer, more memory
- Dedicated Web Front End server for crawling
- Separate indexer machine
- In most cases your search index is on its own server
Indexing Configuration
- Use dedicated web front ends for crawling large farms/sites
- Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
- Define Crawler Impact Rules to avoid site overload
- Schedule for off-hours crawling where appropriate
  - Balance results freshness with load on servers
- Consider using a single content access account per region
- Regularly clean up and review
  - Crawl rules
  - Properties and schema
  - Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
- Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
- Also note that it is not an easy installation; users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
- Must ensure a schedule for the incremental crawl to catch additions to the document set
- Must turn on PDF indexer and stemming
Dragons 2
- Use the Web part that accommodates wildcard search. Found here: httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
- Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
- The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
- Benefits of click distance are missed if Authority sites are not configured
Dragons 3
- The value of statistical ranking can vary from the partial indexes to the master merge index
- Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
- Results delayed from servers without Internet connections
- Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch
Webcast: Installing and Configuring Search in MOSS 2007 httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources
Enterprise search blog httpblogsmsdncomenterprisesearch
MOSS BDC Search httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources
MOSS 2007 Administrator Documentation httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links httpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products
Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
- Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
- httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS
- Extracts metadata and compound terms
- Incorporates with existing taxonomy if one exists
- Appends metadata and stores it as a MOSS property
- Part of the main MOSS index
- Uses standard MOSS administration features
Adjusting Relevance
Property weights
- Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
- Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are assumed to be supplied by the caller.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Wait until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
Push/Pull Data to Users
Alerts
- Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
- Frequency can be set to Daily/Weekly
  - Notifications for search alerts are sent according to the creation time
- 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
- A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
- Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
- Ability to subscribe to an RSS feed on the search results
- 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
- Connects to a content source and enumerates the documents
- Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
- Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
- httpmsdnmicrosoftcomlibraryen-usspssdkhtml_introduction_to_a_protocol_handleraspframe=true
The Query object model

    // site and strQuery are assumed to be defined by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
- Emitted by iFilters and protocol handlers
- Identified by a property set (GUID) and a property ID (name or numeric ID)
Managed properties
- Mapping target for crawled properties (many-to-many)
- Identified by an internal ID
- Friendly name used in queries
  - Can be used in the query as property:Value
Simplified Administration UISearch settings page at the SSP levelManaging crawls
bull Content sourcesbull Explicit SharePoint Content Source Typebull Content source for Business Data (Enterprise CAL)
Crawl logsbull Snapshot of crawled content in your index ndash lists all documents found in the
content source and their statusbull Filters by date site and etcbull Summary by host name (of successes errors and warnings)
Crawl rulesbull Included and excluded rulesbull Ability to pre-test crawl rulesbull Easy to change order of crawl rules
Managing scopesbull Scopes decoupled from content sourcesbull Scopes can span multiple content sourcesbull Scope by Property Site Content Source and URL
Indexing Performance Improvements Search is a shared service
ndash Unified WSS and MOSS search for 1 index per SSPndash Crawls content sources crawl rules schema shared scopes etc are administered
centrally at the shared service levelndash Scopes and best bets can also be administered at the consuming sites
Crawl to small indexes that are then consolidated at scheduled times into a ldquomaster mergerdquo
Content index that holds text of pages with Property store that holds other document values
Propagate data incrementally as it is being indexed to the query serversndash Propagation starts within 30 seconds of the first shadow index writtenndash No need to wait till the end of the crawl for information to be available in queries ndash No propagation of properties
Single item add removal without re-indexing entire corpus with continuous propagation
ndash Change Log Crawl detects what items have changed with in a WSS or a MOSS 2007 site and crawl only those items
ndash Security Change Only Crawl no need to fully index all the content of a site when permissions on this site have changed
Relevance Types Dynamic ranking = relevance impacted by query term
ndash Frequencyndash Location in documentndash Appearance in link text ndash Appearance in URL
Static ranking = relevance independent of customer queryndash URL Depthndash Click Distancendash AuthorityDemoted sitendash Change property weightsndash Language of customer (browser setting)ndash Document type HTML files PPT Word docs emails
XML files Excel spreadsheets Plain text List items
Relevance EnhancementsManually assign synonyms and editorialized results to keywords
ndash Use search logs to detect popular searches low click-through from results or 0 result queries
Search Alertsndash User can subscribe to receive email when results change
File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)
than others (XML txt)ndash Supports 220 files types MS and non-MS application
Property weights ndash Assign different weights to properties so that important
properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object
Modelndash Note The weights used in the product were carefully tested
Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial
undertaking
MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results
bullCan be presented pre- or post- querybullUsed for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories
Federated Search Import or export federated locations using Federated
Location Definition (FLD) files Incorporates results from outside content sources that
subscribe to OpenSearch 11 Passes the query into the subscribed resource and
returns results into single interface Relevance calculation done according to originating
resource criteria not MOSS 2007 criteria Pre-defined FLD files found at
httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
Can develop own FLD files if destination subscribes to OpenSearch 11
ndash Day Software has developed a standard connector for LiveLink ECM
People SearchBuild and publish rich personal profiles
Customize personal profile attributes Populate personal profiles using information from Active Directory other
LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory
information Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites
• Hilltop Algorithm
  – Quality of links more important than quantity of links
  – Segmentation of corpus into broad topics
  – Selection of authority sources within these topic areas
  – Pre-query calculation applied at query time
• Topic-Sensitive PageRank
  – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  – Pre-query calculation of factors based on a subset of the corpus
    – Context of term use in document
    – Context of term use in history of queries
    – Context of term use by user submitting query
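Authority sites matter because click distance is counted outward from them: a page one click from an authority page outranks, all else equal, a page five clicks away. A rough sketch of that calculation as a breadth-first search follows. The link graph and page paths are invented for illustration, and the actual MOSS calculation is not documented in this deck.

```python
from collections import deque

def click_distances(links, authority_pages):
    """Breadth-first search: minimum number of clicks from any authority page."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/it/vpn", "/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/dental"],
}
distances = click_distances(links, ["/"])
print(distances)
```

Pages that cannot be reached from any authority page never appear in the result, which is one way to picture why click distance contributes nothing when no authority sites are configured.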
Educate: Structural Influences
• File Type Bias, in order of relevancy (highest to lowest):
  – HTML Web pages
  – PowerPoint presentations
  – Word documents
  – Emails
  – XML files
  – Excel spreadsheets
  – Plain text files
  – List items
• Auto Language Detect
  – Foreign-language results are less relevant than results in the user's language
  – English is always considered as relevant as the user's language
URL Depth and Click Distance
• Short URLs are like prime real estate
• Items with shorter URLs are considered more relevant than items placed at longer URLs
  – The level is determined by counting the slash ("/") characters in the URL
• Keywords separated by hyphens in the URL are good
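URL depth as described here is easy to approximate when auditing a site structure. The sketch below counts path segments, which amounts to counting the slashes after the host name; the host name and paths are made up, and the exact counting rule MOSS applies may differ in edge cases such as trailing slashes.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Number of non-empty path segments after the host name."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

# Hypothetical URLs on an intranet host named "portal".
urls = [
    "http://portal/sites/hr/docs/2008/benefits-guide.aspx",
    "http://portal/hr/benefits.aspx",
    "http://portal/",
]
# Shallowest first: these are the items favored by URL depth.
ranked = sorted(urls, key=url_depth)
for url in ranked:
    print(url_depth(url), url)
```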
Educate: Content Influences
• Anchor Link Text: Search indexes the anchor text from the following elements
  – HTML anchor elements
  – SharePoint Services link lists
  – SharePoint Portal Server 2003 listings
  – Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
  – Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
• Metadata extraction: shadow title detection is provided within the body of the item
  – Primarily based on text formatting features
  – Shadow title is added automatically to the document
  – Weighted the same as the original title
  – Only for Microsoft Office file types
• Auto Description text
• Optimized URLs
  – Enterprise Search checks URL matching at query time
  – If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
• Synonym Mapping
• Best Bets
• Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
• Dedicated crawl-target servers for large sites
• Separate SQL Server instance for Search
• Fast disk for SQL, fast CPU for Indexer, more memory
• Dedicated Web Front End server for crawling
• Separate indexer machine
• In most cases your search index is on its own server
Indexing Configuration
• Use dedicated web front ends for crawling large farms/sites
• Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
• Define Crawler Impact Rules to avoid site overload
• Schedule for off-hours crawling where appropriate
• Balance results freshness with load on servers
• Consider using a single content access account per region
• Regularly clean up and review:
  – Crawl rules
  – Property and schema
  – Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file and edit that file using an application such as Visual Studio 2005.
6. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
• Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
• Also note that it is not an easy installation: read the entire documentation for it before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
• You must schedule the incremental crawl so that it catches additions to the document set
• You must turn on the PDF indexer and stemming
Dragons 2
• Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
• Use of special characters in the thesaurus can lead to highly irrelevant results and can impair "did you mean" capabilities
• The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
• Benefits of click distance are missed if authority sites are not configured
Dragons 3
• The value of statistical ranking can vary from the partial indexes to the master merge index
• Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
• Results are delayed from servers without Internet connections
• Backward compatibility
  – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers and thesaurus files are not upgraded
Resources
• Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
• Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032325467&CountryCode=US
• Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
• Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
• MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
• MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
• MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
• Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
• Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open…
More Resources
• Enterprise search blog: http://blogs.msdn.com/enterprisesearch
• MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
• Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
• Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
• Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
• Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
• Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
• SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
• MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
• SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
• All About SharePoint (S.S. Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
• Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
• MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
• MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
• Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
• Auto-classifies documents for MOSS 2007
• Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
• Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
• http://www.conceptsearching.com/conceptHMSO (insider trading)
• Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with existing taxonomy if one exists
  – Appends metadata and stores as MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features
Adjusting Relevance
Property weights
• Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
• Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property and weight are placeholders from the slide.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        // The separator string was lost in extraction; ": " is an assumption.
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Recalculate ranking and wait until the update is finished
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
Push/Pull Data to Users
• Alerts
  – Same alerting infrastructure for WSS and MOSS; the Timer service is used to handle all alert notifications
  – Frequency can be set to Daily/Weekly; notifications for search alerts will be sent according to the creation time
  – 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  – A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx
• Alert "gotchas"
  – No "My Alerts Summary" web part
  – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
• RSS Feeds
  – Ability to subscribe to an RSS feed of the search results
  – 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
• Connects to a content source and enumerates the documents
• Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles and Business Data Catalog
• Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
• http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the calling code.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
• Crawled properties
  – Emitted by iFilters and Protocol Handlers
  – Identified by a property set (GUID) and property ID (name or numeric ID)
• Managed properties
  – Mapping target for crawled properties (many-to-many)
  – Identified by internal ID
  – Friendly name used in queries; can be used in the query as property:value
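A managed property's friendly name is what a searcher, or code that builds the query text, puts in front of the colon. The helper below is a small sketch of assembling such query text. The function names are hypothetical, and `Author` and `FileExtension` are used only as example property names, so check the managed properties actually defined on your SSP.

```python
def property_restriction(name, value):
    """Format one restriction as name:value, quoting values with whitespace."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        # Drop embedded quotes rather than guess at an escaping rule.
        text = '"' + text.replace('"', "") + '"'
    return f"{name}:{text}"

def build_query(keywords, **restrictions):
    """Join free-text keywords and property restrictions into one query string."""
    parts = list(keywords)
    parts += [property_restriction(k, v) for k, v in restrictions.items()]
    return " ".join(parts)

query = build_query(["budget"], Author="Jane Doe", FileExtension="xlsx")
print(query)
```

The resulting string is the kind of text that would be assigned to `QueryText` in the Query object model snippet above.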
Indexing Performance Improvements
• Search is a shared service
  – Unified WSS and MOSS search, with 1 index per SSP
  – Crawls, content sources, crawl rules, schema, shared scopes etc. are administered centrally at the shared service level
  – Scopes and best bets can also be administered at the consuming sites
• Crawl to small indexes that are then consolidated at scheduled times into a "master merge"
• Content index that holds the text of pages, with a property store that holds other document values
• Propagate data incrementally to the query servers as it is being indexed
  – Propagation starts within 30 seconds of the first shadow index being written
  – No need to wait until the end of the crawl for information to be available in queries
  – No propagation of properties
• Single item add/removal without re-indexing the entire corpus, with continuous propagation
  – Change Log Crawl: detects which items have changed within a WSS or MOSS 2007 site and crawls only those items
  – Security-Change-Only Crawl: no need to fully index all the content of a site when permissions on the site have changed
Relevance Types
• Dynamic ranking = relevance impacted by query term
  – Frequency
  – Location in document
  – Appearance in link text
  – Appearance in URL
• Static ranking = relevance independent of customer query
  – URL depth
  – Click distance
  – Authority/demoted site
  – Change property weights
  – Language of customer (browser setting)
  – Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
• Manually assign synonyms and editorialized results to keywords
  – Use search logs to detect popular searches, low click-through from results, or 0-result queries
• Search Alerts
  – User can subscribe to receive email when results change
• File type filtering
  – Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, TXT)
  – Supports 220 file types, from Microsoft and non-Microsoft applications
• Property weights
  – Assign different weights to properties so that important properties, such as 'Title', have a bigger influence on ranking
  – Change default property weights through the Schema Object Model
  – Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
• Can be presented pre- or post-query
• Used for Advanced search
• Empowers customers to refine their search most effectively
• Filters results by predetermined categories
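Post-query facets boil down to two operations: counting how many results fall into each category value, and filtering the result set when the customer picks one. The sketch below shows both on a made-up result set. The field names and documents are hypothetical and this is not the MOSS faceted search implementation.

```python
from collections import Counter

def facet_counts(results, facet_fields):
    """Count results per value for each facet field."""
    counts = {field: Counter() for field in facet_fields}
    for doc in results:
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts

def refine(results, field, value):
    """Keep only the results in the chosen facet value."""
    return [doc for doc in results if doc.get(field) == value]

# Hypothetical results for the query "budget".
results = [
    {"title": "Q1 budget", "department": "Finance", "filetype": "xlsx"},
    {"title": "Budget policy", "department": "Finance", "filetype": "docx"},
    {"title": "IT budget request", "department": "IT", "filetype": "docx"},
]
facets = facet_counts(results, ["department", "filetype"])
finance_only = refine(results, "department", "Finance")
print(dict(facets["department"]), [doc["title"] for doc in finance_only])
```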
Federated Search
• Import or export federated locations using Federated Location Definition (FLD) files
• Incorporates results from outside content sources that subscribe to OpenSearch 1.1
• Passes the query to the subscribed resource and returns results into a single interface
• Relevance calculation is done according to the originating resource's criteria, not MOSS 2007 criteria
• Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
• Can develop your own FLD files if the destination subscribes to OpenSearch 1.1
  – Day Software has developed a standard connector for LiveLink ECM
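Passing the query to an OpenSearch 1.1 source means substituting the search terms into that source's URL template, where `{searchTerms}` is the standard placeholder and a trailing `?` marks a placeholder as optional. The following is a simplified sketch of that substitution. The template URL is invented, prefixed parameters such as `{ns:name}` are not handled, and an FLD file carries more settings than the template alone.

```python
import re
from urllib.parse import quote

def fill_template(template, **params):
    """Substitute {name} and {name?} placeholders in an OpenSearch-style URL
    template. Optional placeholders without a value become empty strings;
    a missing required placeholder raises KeyError."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")  # percent-encode the value
        if optional:
            return ""
        raise KeyError(name)
    return re.sub(r"\{(\w+)(\?)?\}", replace, template)

# Hypothetical federated location template.
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&format=rss"
url = fill_template(template, searchTerms="goat rodeo")
print(url)
```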
People Search
• Build and publish rich personal profiles
  – Customize personal profile attributes
  – Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
  – Control access to information using security and privacy controls
  – Generate and display organizational charts based on directory information
  – Publish personal profiles using MOSS My Sites
• Identify people who can help
  – Find people based on keyword matches with MOSS personal profiles
  – Find people in line-of-business systems
  – Filter results by common attributes such as Job Title or Department
  – Find "in-common" connections, including managers, site memberships, distribution lists and colleagues
  – Group results by social distance
  – Subscribe to People Alerts
People Search Results Page
• Find people by project, expertise or…
• Filter by relevant attributes
• Contact information & online availability
LOB Applications with BDC
• Extracts data from line-of-business, CRM and other 3rd-party data stores
• Caches for indexing by the search service
• Searches any data source accessible through ADO.NET or Web Services
• Uses Live Communications Server for connectivity options
• Aggregated into a single application
FAST ESP Technology
• FAST is a sophisticated search engine tailor-made for ecommerce and help desk
• Uses sophisticated linguistic processing
• Searches structured and unstructured content
• Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it Unique
• Auto classification
• Advanced linguistics: text mining for concept and relationship mapping
• Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
• Precision: exact word matching, exact phrase matching, proximity, tokenization
• Location-aware results (retail and news), excellent for mobile search
• Recommendation engine
• Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
• Search Scopes
  – Allow users to refine search through filtering
  – Define content resources and map to business rules/key concepts
  – Focused content = shared understanding = more precise results
• Duplicate results filtering
  – Collapsing duplicates from the same directory or site to leave more room for other relevant results
  – Less favoritism, more results on desired page 1
• Definitions
  – Automatically extract "definitions" from indexed content and display them as matches directly on the results page
  – A web part property on the Search Best Bets web part (can turn on/off display of definition)
  – Returned in the Query Object Model
  – Cannot be edited
• Best Bets
  – Editorially assigned results based on these key concepts assigned to selected query terms
  – Can be many-to-many
Scalability
• No physical limit for the maximum number of documents in one index
• Recommended document limit is 50 million documents per indexer
• A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
• Large/small documents count the same
• The 'average document size' depends on the corpus mix
  – e.g. heavy use of WSS 3.0 lists versus limited use
• Dependent on supporting hardware
Security
• Query-time stripping: the customer sees only those results that they have permission to view
• Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  – Implements the ASP.NET 2.0 authentication model
• Minimum crawler permission is "Full Read"
  – Still provides the same security trimming functionality
  – Automatically configured for new sites
• Search visibility options
  – Prevent sites/lists appearing in search results at a site/list level
• "Security only" crawl for single item add/removal
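Query-time stripping can be pictured as a filter applied to the raw matches before anything is shown: a result survives only if the searcher holds at least one of the permissions recorded for it at crawl time. The sketch below illustrates the idea with made-up group names and documents. It does not reflect how MOSS stores or evaluates access control lists.

```python
def trim_results(results, user_groups):
    """Keep only results whose ACL shares at least one group with the user."""
    groups = set(user_groups)
    return [doc for doc in results if groups & set(doc["allowed_groups"])]

# Hypothetical index entries with the groups allowed to read each item.
results = [
    {"url": "/hr/handbook", "allowed_groups": ["everyone"]},
    {"url": "/hr/salaries", "allowed_groups": ["hr-managers"]},
    {"url": "/it/runbook", "allowed_groups": ["it-staff", "hr-managers"]},
]
visible = trim_results(results, ["everyone", "it-staff"])
print([doc["url"] for doc in visible])
```

Because the filter depends on the permissions stored with each item, a permissions change only needs those stored values refreshed, which is the job of the "security only" crawl.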
Search Analytics
• Export search logs to Excel
  – Query terms
  – Page views
  – Number of results returned
  – Volume trends
  – Query success: can define success for certain query terms
• Report Center
  – Access to MOSS 2007 BI features
  – Filters data for permissions and relevance
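Once the search logs are exported, one quick analysis is to list the queries that repeatedly return nothing, since those are natural candidates for Best Bets or synonym mapping. The sketch below assumes a hypothetical export reduced to (query text, number of results) pairs; the real export has more columns and a different layout.

```python
from collections import Counter

def zero_result_queries(log, min_count=2):
    """Queries that returned no results at least `min_count` times,
    most frequent first. Query text is normalized for case and spacing."""
    counts = Counter(q.strip().lower() for q, n_results in log if n_results == 0)
    return [(q, n) for q, n in counts.most_common() if n >= min_count]

# Hypothetical log rows: (query text, number of results returned).
log = [
    ("goat rodeo", 0),
    ("Goat Rodeo", 0),
    ("vpn", 12),
    ("expense form", 0),
    ("goat rodeo ", 0),
    ("vpn", 9),
]
candidates = zero_result_queries(log)
print(candidates)
```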
Key Performance Indicators [KPI]
• Create a KPI list or other measures of success
• Default KPIs exist in OOB deployment
• KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
- Emitted by iFilters and protocol handlers
- Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
- Mapping target for crawled properties (many-to-many)
- Identified by internal ID
- Friendly name used in queries
  - Can be used in the query with property:Value
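The many-to-many relationship between crawled and managed properties can be pictured with a small in-memory model. This is only an illustrative sketch: the PropertyMap class, its methods, and the GUIDs in the usage note are hypothetical and are not part of the MOSS 2007 Schema Object Model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, not the MOSS API: one managed property can be fed by
// several crawled properties, and one crawled property can feed several
// managed properties.
public sealed class PropertyMap
{
    // managed property name -> crawled property keys ("{propset-guid}:name-or-id")
    private readonly Dictionary<string, List<string>> map =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    // A crawled property is identified by its property set GUID plus a name or numeric ID.
    private static string Key(Guid propset, string nameOrId)
    {
        return propset.ToString("B") + ":" + nameOrId;
    }

    public void Map(string managed, Guid propset, string nameOrId)
    {
        List<string> crawled;
        if (!map.TryGetValue(managed, out crawled))
        {
            crawled = new List<string>();
            map[managed] = crawled;
        }
        crawled.Add(Key(propset, nameOrId));
    }

    // All managed properties fed by one crawled property, sorted by name.
    public List<string> ManagedFor(Guid propset, string nameOrId)
    {
        string key = Key(propset, nameOrId);
        return map.Where(kv => kv.Value.Contains(key))
                  .Select(kv => kv.Key)
                  .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
                  .ToList();
    }
}
```

For example, mapping a made-up crawled property "creator" to both "Author" and "Contributor" and then calling ManagedFor for it returns both managed property names.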
Relevance Types
Dynamic ranking = relevance impacted by query term
- Frequency
- Location in document
- Appearance in link text
- Appearance in URL
Static ranking = relevance independent of customer query
- URL depth
- Click distance
- Authority/demoted site
- Change property weights
- Language of customer (browser setting)
- Document type: HTML files, PPT, Word docs, emails, XML files, Excel spreadsheets, plain text, list items
Relevance Enhancements
Manually assign synonyms and editorialized results to keywords
- Use search logs to detect popular searches, low click-through from results, or 0-result queries
Search Alerts
- User can subscribe to receive email when results change
File type filtering
- Some file types are deemed more relevant (e.g., HTML, DOC) than others (XML, txt)
- Supports 220 file types, from MS and non-MS applications
Property weights
- Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking
- Change default property weights through the Schema Object Model
- Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance. Marcy Tobin wants me to tell you that this is not a trivial undertaking.
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
- Can be presented pre- or post-query
- Used for Advanced search
Empowers customers to refine their search most effectively
Filters results by predetermined categories
Federated Search
Import or export federated locations using Federated Location Definition (FLD) files
Incorporates results from outside content sources that subscribe to OpenSearch 1.1
Passes the query into the subscribed resource and returns results into a single interface
Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria
Pre-defined FLD files found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
Can develop own FLD files if destination subscribes to OpenSearch 1.1
- Day Software has developed a standard connector for LiveLink ECM
People Search
Build and publish rich personal profiles
- Customize personal profile attributes
- Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
- Control access to information using security and privacy controls
- Generate and display organizational charts based on directory information
- Publish personal profiles using MOSS My Sites
Identify people who can help
- Find people based on keyword matches with MOSS personal profiles
- Find people in line-of-business systems
- Filter results by common attributes such as Job Title or Department
- Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
- Group results by social distance
- Subscribe to People Alerts
People Search Results Page
Find people by project, expertise, or...
Filter by relevant attributes
Contact information & online availability
LOB Applications with BDC
Extracts data from line-of-business, CRM, and other 3rd-party data stores
Caches for indexing by search service
Searches any data source accessible through ADO.NET or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
FAST ESP Technology
FAST is a sophisticated search engine tailor-made for ecommerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process: conversion, language detection, synonyms, spell check, external call-outs, entity extraction, categorization, vectorization, custom navigation, normalizer, alerting, indexing
Why is it unique?
- Auto classification
- Advanced linguistics: text mining for concept and relationship mapping
- Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
- Precision: exact word matching, exact phrase matching, proximity, tokenization
- Location-aware results (retail and news): excellent for mobile search
- Recommendation engine
- Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
Search Scopes
- Allow users to refine search through filtering
- Define content resources and map to business rules/key concepts
- Focused content = shared understanding = more precise results
Duplicate results filtering
- Collapsing duplicates from same directory or site to leave more room for other relevant results
- Less favoritism, more results on desired page 1
Definitions
- Automatically extract "definitions" from indexed content and display them as matches directly on the results page
- A web part property on the Search Best Bets web part (can turn on/off display of definition)
- Returned in the Query Object Model
- Cannot be edited
Best Bets
- Editorially assigned results based on key concepts assigned to selected query terms
- Can be many-to-many
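The "collapsing duplicates from same directory" idea on the Custom Results slide can be sketched as a pass over an already ranked list that keeps only the best hit per directory. This illustrates the concept as the slide describes it and is not MOSS 2007's actual duplicate detection; ResultCollapser and the URLs in the usage note are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ResultCollapser
{
    // Assumes rankedUrls is ordered best-first. Keeps the first (highest ranked)
    // URL seen for each directory and drops later hits from the same directory.
    public static List<string> CollapseByDirectory(IEnumerable<string> rankedUrls)
    {
        HashSet<string> seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        List<string> kept = new List<string>();
        foreach (string url in rankedUrls)
        {
            // The directory is everything before the last slash.
            int lastSlash = url.LastIndexOf('/');
            string directory = lastSlash >= 0 ? url.Substring(0, lastSlash) : url;
            if (seen.Add(directory))
            {
                kept.Add(url);
            }
        }
        return kept;
    }
}
```

Given http://portal/hr/a.aspx, http://portal/hr/b.aspx, and http://portal/it/c.aspx in rank order, the sketch keeps the first and third entries, leaving room on page 1 for a result from a different directory.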
Scalability
No physical limit for the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large/small documents count the same
The 'average document size' depends on the corpus mix
- e.g., heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
Security
Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
- Implements ASP.NET 2.0 authentication model
Minimum crawler permission is "Full Read"
- Still provides the same security trimming functionality
- Automatically configured for new sites
Search visibility options
- Prevent sites/lists appearing in search results at a site/list level
"Security only" crawl for single item add/removal
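Query-time security trimming, as described on the Security slide, amounts to filtering the hit list against the searcher's permissions before anything is displayed. The sketch below is a conceptual model only: IndexedItem, SecurityTrimmer, and the group names are hypothetical, and MOSS 2007 performs this check internally against the ACLs it gathers at crawl time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical index entry: a URL plus the groups allowed to read it.
public sealed class IndexedItem
{
    public readonly string Url;
    public readonly HashSet<string> AllowedGroups;

    public IndexedItem(string url, params string[] allowedGroups)
    {
        Url = url;
        AllowedGroups = new HashSet<string>(allowedGroups, StringComparer.OrdinalIgnoreCase);
    }
}

public static class SecurityTrimmer
{
    // Keeps only the hits whose allowed groups share at least one group with the user.
    public static List<string> Trim(IEnumerable<IndexedItem> hits, IEnumerable<string> userGroups)
    {
        return hits.Where(h => h.AllowedGroups.Overlaps(userGroups))
                   .Select(h => h.Url)
                   .ToList();
    }
}
```

A user in the made-up groups "Everyone" and "Sales" would see a news page readable by "Everyone" but not a salary workbook restricted to "HR".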
Search Analytics
Export search logs to Excel
- Query terms
- Page views
- Number of results returned
- Volume trends
- Query success: can define success for certain query terms
Report Center
- Access to MOSS 2007 BI features
- Filters data for permissions and relevance
Key Performance Indicators [KPI]
- Create a KPI list or other measures of success
- Default KPIs exist in OOB deployment
- KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
- Content creators
- Information Architect/User Experience Architect
- Taxonomist
Define key enterprise themes in content
- Map existing content to these themes
- Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
- Review search logs and customer feedback mechanisms
- What are they trying to find?
- What terms are they using?
Assemble a cross-functional team to
- Assign relevance weighting that makes sense for the customer behavior and the corpus
- Develop Best Bets for searches with 0 results
- Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
- Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
- Design a structure that leverages structural elements like URL depth and click distance
Pareto's Principle
Known as the 80/20 rule
Named after a late-19th-century economist
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
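One way to check the 80/20 claim against your own corpus is to measure what fraction of clicked documents accounts for 80% of result clicks. The sketch below assumes a hypothetical list of clicked URLs taken from an exported search log; ClickLogStats and HeadShare are illustrative names, not MOSS 2007 features.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClickLogStats
{
    // Returns the fraction of distinct clicked documents (most clicked first)
    // needed to account for at least targetShare of all clicks.
    public static double HeadShare(IEnumerable<string> clickedUrls, double targetShare)
    {
        List<int> counts = clickedUrls
            .GroupBy(u => u.Trim().ToLowerInvariant())
            .Select(g => g.Count())
            .OrderByDescending(c => c)
            .ToList();

        int total = counts.Sum();
        if (total == 0) return 0.0;

        int covered = 0;
        int used = 0;
        foreach (int c in counts)
        {
            covered += c;
            used++;
            if ((double)covered / total >= targetShare) break;
        }
        return (double)used / counts.Count;
    }
}
```

A result near 0.2 for a target of 0.8 would match the slide's rule of thumb; a much larger value suggests the long tail carries more of the load in your environment.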
Define Content
Define content scopes
- Segment content into logical groups
- Create scope rules based on
  - Address
  - Property query
  - Content source
- At the SSP level or individual level
- SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
- Terms or language proprietary to the enterprise
  - e.g., "goat rodeo"
- Provides additional clarification for searcher
- Use synonym mapping for term variants
  - C# and Csharp
- Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
Designate Authority Sites
Hilltop Algorithm
- Quality of links more important than quantity of links
- Segmentation of corpus into broad topics
- Selection of authority sources within these topic areas
- Pre-query calculation applied at query time
Topic-Sensitive PageRank
- Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
- Pre-query calculation of factors based on subset of corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query
Educate: Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
- HTML Web pages
- PowerPoint presentations
- Word documents
- Emails
- XML files
- Excel spreadsheets
- Plain text files
- List items
Auto Language Detect
- Foreign language results are less relevant than results in user's language
- English language is always considered as relevant as user's language
URL Depth and Click Distance
- Short URLs are like prime real estate
- Items with shorter URLs are considered more relevant than items placed in longer URLs
  - The level is determined by reviewing the number of slash ("/") characters in the URL
- Keywords separated by hyphens in the URL are good
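The URL depth rule above (count the slashes) is easy to approximate when auditing your own site structure. This is a minimal sketch under the assumption that depth means the number of slash-separated path segments below the host name; the UrlDepth helper and the sample URLs are hypothetical, and MOSS 2007's internal calculation may differ in detail.

```csharp
using System;

public static class UrlDepth
{
    // Counts the slash-separated segments of the URL path.
    // The site root has depth 0; each extra folder or file adds one level.
    public static int Of(string url)
    {
        Uri uri = new Uri(url);
        string path = uri.AbsolutePath.Trim('/');
        if (path.Length == 0) return 0;
        return path.Split('/').Length;
    }

    public static void Main()
    {
        Console.WriteLine(Of("http://portal/"));
        Console.WriteLine(Of("http://portal/hr/benefits.aspx"));
        Console.WriteLine(Of("http://portal/sites/hr/docs/2008/benefits.aspx"));
    }
}
```

Running a check like this over a crawl log can highlight important content buried many levels deep, which is a candidate for a shorter URL.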
Educate: Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
- HTML anchor elements
- SharePoint Services link lists
- SharePoint Portal Server 2003 listings
- Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
- Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
- Primarily based on text formatting features
- Shadow title is added automatically to the document
- Weighted the same as the original title
- Only for Microsoft Office file types
Auto Description text
Optimized URLs
- Enterprise Search checks URL matching at query time
- If the query matches the host name of a page in the index, that page will display as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> "Add Keyword"
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
- Crawl rules
- Property and schema
- Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2
Use the Web Part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capacity is predicated on the My Sites profile
- Employee participation is critical to optimal functionality
Benefits of click distance are missed if Authority sites are not configured
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results delayed from servers without Internet connections
Backward compatibility
- Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
- Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for ...: http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products
Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
- Extracts concepts and weights their relevance to searcher query
  - Presents for search refinement
- http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
- Extracts metadata and compound terms
- Incorporates with existing taxonomy if one exists
- Appends metadata and stores as MOSS property
- Part of the main MOSS index
- Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Relevance EnhancementsManually assign synonyms and editorialized results to keywords
ndash Use search logs to detect popular searches low click-through from results or 0 result queries
Search Alertsndash User can subscribe to receive email when results change
File type filtering ndash Some file types are deemed more relevant (ie HTML DOC)
than others (XML txt)ndash Supports 220 files types MS and non-MS application
Property weights ndash Assign different weights to properties so that important
properties such as lsquoTitlersquo have a bigger influence on rankingndash Change default property weights through the Schema Object
Modelndash Note The weights used in the product were carefully tested
Changes to the weights may also have a negative effect on relevance Marcy Tobin wants me to tell you that this is not a trivial
undertaking
MOSS 2007 Faceted SearchFacets are predetermined content categories presented to the customer to narrow search results
bullCan be presented pre- or post- querybullUsed for Advanced search
Empowers customer to most effectively refine their search
Filters results by predetermined categories
Federated Search Import or export federated locations using Federated
Location Definition (FLD) files Incorporates results from outside content sources that
subscribe to OpenSearch 11 Passes the query into the subscribed resource and
returns results into single interface Relevance calculation done according to originating
resource criteria not MOSS 2007 criteria Pre-defined FLD files found at
httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
Can develop own FLD files if destination subscribes to OpenSearch 11
ndash Day Software has developed a standard connector for LiveLink ECM
People SearchBuild and publish rich personal profiles
Customize personal profile attributes Populate personal profiles using information from Active Directory other
LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory
information Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations
– Dedicated crawl-target servers for large sites
– Separate SQL Server instance for Search
– Fast disk for SQL, fast CPU for Indexer, more memory
– Dedicated Web Front End Server for crawling
– Separate indexer machine
  – In most cases your search index is on its own server
Indexing Configuration
– Use dedicated web front ends for crawling large farms/sites
– Upgrade WSS 2.0 (2003) sites to WSS 3.0 (2007) sites to index them faster
– Define Crawler Impact Rules to avoid site overload
– Schedule crawling for off-hours where appropriate
– Balance results freshness with load on servers
– Consider using a single content access account per region
– Regularly clean up and review:
  – Crawl rules
  – Property and schema
  – Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button. This opens the Text Entry window for the Web Part's XSL property.
6. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
– Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
  http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
– It is not an easy installation. Read the entire documentation before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches
– Must schedule the incremental crawl so that it catches additions to the document set
– Must turn on the PDF indexer and stemming
Dragons 2
– Use the Web Part that accommodates wildcard search, found here:
  http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
– Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
– The Expert search capacity is predicated on the My Sites profile; employee participation is critical to optimal functionality
– Benefits of click distance are missed if authority sites are not configured
Dragons 3
– The value of statistical ranking can vary from the partial indexes to the master merge index
– Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
– Results are delayed from servers without Internet connections
– Backward compatibility
  – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Appendix
Auto Classification Products: Concept Searching
– Auto-classifies documents for MOSS 2007
– Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
– Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
– http://www.conceptsearching.com/conceptHMSO (insider trading)
– Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with existing taxonomy if one exists
  – Appends metadata and stores it as a MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features
Adjusting Relevance
Property weights
– Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
– Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid identifies the search application, as on the original slide.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name.
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index.
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Set the weight of the parameter named by 'property' to 'weight' (both strings).
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it is idle again.
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
Push/Pull Data to Users
Alerts
– Same alerting infrastructure for WSS and MOSS
  – Timer service is used to handle all alert notifications
– Frequency can be set to Daily/Weekly
  – Notifications for search alerts will be sent according to the creation time
– 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
– A rollup of all of a user's alerts for a site collection
  – http://<sitecollection>/_layouts/MySubs.aspx
– Alert "gotchas"
  – No "My Alerts Summary" web part
  – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
– Ability to subscribe to an RSS feed on the search results
– 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
– Connects to a content source and enumerates the documents
– Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
– Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
– http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
– Emitted by iFilters and Protocol Handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
  – Can be used in the query with property:Value
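The many-to-many relationship between crawled and managed properties can be pictured with a small sketch. This is not the MOSS schema API; the table `MAPPINGS`, the function `managed_value`, and every property name below are hypothetical, chosen only to show several crawled properties feeding one managed property and one crawled property feeding two.

```python
from typing import Optional

# Hypothetical mapping table: managed property -> ordered crawled properties.
# "page:heading" appears twice to show the many-to-many relationship.
MAPPINGS = {
    "Author": ["doc:author", "mail:from"],
    "Title": ["doc:title", "page:heading"],
    "Description": ["doc:summary", "page:heading"],
}


def managed_value(managed: str, crawled: dict) -> Optional[str]:
    """Return the first non-empty crawled value mapped to a managed property."""
    for crawled_name in MAPPINGS.get(managed, []):
        value = crawled.get(crawled_name)
        if value:
            return value
    return None


# Crawled properties emitted for one hypothetical item.
item = {"mail:from": "M. Sweeny", "page:heading": "Enterprise Search"}
```

A query such as Author:Sweeny would then be evaluated against the managed property, regardless of which crawled property supplied the value.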
MOSS 2007 Faceted Search
Facets are predetermined content categories presented to the customer to narrow search results
– Can be presented pre- or post-query
– Used for Advanced search
Empowers customers to refine their search most effectively
Filters results by predetermined categories
Federated Search
– Import or export federated locations using Federated Location Definition (FLD) files
– Incorporates results from outside content sources that support OpenSearch 1.1
– Passes the query to the subscribed resource and returns results in a single interface
– Relevance calculation is done according to the originating resource's criteria, not MOSS 2007 criteria
– Pre-defined FLD files are found at http://www.microsoft.com/enterprisesearch/connectors/federated.aspx#fscp
– Can develop your own FLD files if the destination supports OpenSearch 1.1
  – Day Software has developed a standard connector for LiveLink ECM
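Passing the query to an OpenSearch 1.1 source comes down to filling in a URL template. The sketch below shows that substitution only; it is not how MOSS reads FLD files. The `{searchTerms}` and `{startIndex?}` placeholders follow OpenSearch 1.1 template syntax, while `fill_template` and the example.com endpoint are assumptions for illustration. Prefixed (namespaced) parameters are not handled.

```python
import re
from urllib.parse import quote_plus


def fill_template(template: str, search_terms: str, start_index: int = 1) -> str:
    """Substitute OpenSearch-style {name} and optional {name?} placeholders."""
    values = {
        "searchTerms": quote_plus(search_terms),
        "startIndex": str(start_index),
    }

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if optional:
            return ""  # unknown optional parameters are left empty
        raise KeyError(f"required template parameter not supplied: {name}")

    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


# Hypothetical federated location; a real FLD file would carry its own template.
TEMPLATE = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&lang={language?}"
url = fill_template(TEMPLATE, "click distance")
```

Because the remote source computes its own relevance, the only thing the caller controls is what goes into the template.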
People Search
Build and publish rich personal profiles
– Customize personal profile attributes
– Populate personal profiles using information from Active Directory, other LDAP directories, or line-of-business systems
– Control access to information using security and privacy controls
– Generate and display organizational charts based on directory information
– Publish personal profiles using MOSS My Sites
Identify people who can help
– Find people based on keyword matches with MOSS personal profiles
– Find people in line-of-business systems
– Filter results by common attributes such as Job Title or Department
– Find "in-common" connections, including managers, site memberships, distribution lists, and colleagues
– Group results by social distance
– Subscribe to People Alerts
People Search Results Page
– Find people by project, expertise, or…
– Filter by relevant attributes
– Contact information & online availability
LOB Applications with BDC
– Extracts data from line-of-business, CRM, and other third-party data stores
– Caches for indexing by search service
– Searches any data source accessible through ADO.NET or Web Services
– Uses Live Communications Server for connectivity options
– Aggregated into a single application
FAST ESP Technology
– FAST is a sophisticated search engine tailor-made for e-commerce and help desk
– Uses sophisticated linguistic processing
– Searches structured and unstructured content
– Indexing process: conversion - language detection - synonyms - spell check - external call-outs - entity extraction - categorization - vectorization - custom navigation - normalizer - alerting - indexing
Why is it unique?
– Auto classification
– Advanced linguistics: text mining for concept and relationship mapping
– Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
– Precision: exact word matching, exact phrase matching, proximity, tokenization
– Location-aware results (retail and news); excellent for mobile search
– Recommendation engine
– Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
Search Scopes
– Allow users to refine search through filtering
– Define content resources and map to business rules/key concepts
– Focused content = shared understanding = more precise results
Duplicate results filtering
– Collapses duplicates from the same directory or site to leave more room for other relevant results
– Less favoritism, more results on the desired page 1
Definitions
– Automatically extracts "definitions" from indexed content and displays them as matches directly on the results page
– A web part property on the Search Best Bets web part (can turn display of definitions on/off)
– Returned in the Query Object Model
– Cannot be edited
Best Bets
– Editorially assigned results based on key concepts assigned to selected query terms
– Can be many-to-many
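The effect of collapsing results from the same directory or site can be sketched as follows. The slide does not describe how MOSS detects duplicates, so this example simply groups results by host and directory and keeps the best-ranked entry of each group; `collapse_key`, `collapse`, and the URLs are assumptions for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def collapse_key(url: str) -> tuple:
    """Group results by host name and containing directory."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), posixpath.dirname(parts.path))


def collapse(ranked_results: list, per_group: int = 1) -> list:
    """Keep at most per_group results per directory; input is ranked best-first."""
    seen = {}
    kept = []
    for url in ranked_results:
        key = collapse_key(url)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= per_group:
            kept.append(url)
    return kept


ranked = [
    "http://intranet/hr/policies/leave.docx",
    "http://intranet/hr/policies/leave-v2.docx",
    "http://intranet/finance/expenses.xlsx",
    "http://intranet/hr/policies/leave-draft.docx",
]
page_one = collapse(ranked)
```

With three near-identical leave documents collapsed to one, the finance result moves up to second place on page 1.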
Scalability
– No physical limit on the maximum number of documents in one index
– Recommended limit is 50 million documents per indexer
– A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
– Large and small documents count the same
– The 'average document size' depends on the corpus mix
  – i.e., heavy use of WSS 3.0 lists versus limited use
– Dependent on supporting hardware
Security
– Query-time stripping (security trimming): the customer sees only those results that they have permission to view
– Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
  – Implements the ASP.NET 2.0 authentication model
– Minimum crawler permission is "Full Read"
  – Still provides the same security trimming functionality
  – Automatically configured for new sites
– Search visibility options
  – Prevent sites/lists from appearing in search results at a site/list level
– "Security only" crawl for single item add/removal
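Query-time trimming can be pictured as a filter applied to the ranked results just before display. The sketch below is a simplified model, not the MOSS implementation: it assumes each indexed item carries the set of principals allowed to read it, and the names `Result` and `trim` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    allowed: frozenset  # users or groups permitted to read the item


def trim(results: list, user: str, groups: set) -> list:
    """Return only the results the user, or one of their groups, may read."""
    principals = {user, *groups}
    return [result for result in results if result.allowed & principals]


results = [
    Result("http://intranet/hr/salaries.xlsx", frozenset({"HR Managers"})),
    Result("http://intranet/hr/leave-policy.docx", frozenset({"All Employees"})),
    Result("http://intranet/it/runbook.docx", frozenset({"IT Staff", "alice"})),
]
visible = trim(results, user="alice", groups={"All Employees"})
```

The ranking order is untouched; restricted items are simply absent, so the user never learns that the salaries workbook exists.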
Search Analytics
Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
– Query success: can define success for certain query terms
Report Center
– Access to MOSS 2007 BI features
– Filters data for permissions and relevance
Key Performance Indicators [KPI]
– Create a KPI list or other measures of success
– Default KPIs exist in OOB deployment
– KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
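Once the search log is exported, the queries that return nothing are the easiest pain points to find. The sketch below counts them from CSV text; the column names `query` and `results` and the sample rows are assumptions, since the layout of the exported log is not shown in the deck.

```python
import csv
import io
from collections import Counter

# Hypothetical export; a real log would be read from the exported file.
LOG = """query,results
vacation policy,12
goat rodeo,0
expense report,40
Goat Rodeo,0
401k,0
vacation policy,9
"""


def zero_result_queries(log_text: str) -> list:
    """Return (query, count) pairs for queries with no results, most frequent first."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        if int(row["results"]) == 0:
            counts[row["query"].strip().lower()] += 1
    return counts.most_common()


candidates = zero_result_queries(LOG)
```

The most frequent entries in `candidates` are natural places to start when assigning Best Bets or adding synonyms.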
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
– Content creators
– Information Architect/User Experience Architect
– Taxonomist
Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes to map to themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross-functional team to
– Assign relevance weighting that makes sense for the customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages structural elements like URL depth and click distance
Pareto's Principle
– Known as the 80/20 rule
– Named after the late-19th-century economist Vilfredo Pareto
– 20% of your content is answering 80% of your searches
– Not an excuse to stop optimizing at the top 20%
– Don't forget the Long Tail
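Whether the 80/20 split holds for a given corpus is easy to check from click or query counts. The sketch below computes the share of all clicks that goes to the most-clicked fifth of documents; the function name `top_share` and the counts are made up for the example.

```python
def top_share(hits: list, fraction: float = 0.2) -> float:
    """Share of all hits received by the top `fraction` of items."""
    ordered = sorted(hits, reverse=True)
    top_n = max(1, round(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)


# Hypothetical click counts for ten documents.
clicks_per_document = [400, 250, 60, 40, 30, 20, 10, 10, 5, 5]
share = top_share(clicks_per_document)
```

In this sample the top two documents take roughly 78% of the clicks, close to the rule of thumb, while the remaining eight form the long tail that still needs attention.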
Define Content
Define content scopes
– Segment content into logical groups
– Create scope rules based on
  – Address
  – Property query
  – Content source
– At the SSP level or individual site level
  – SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise
  – i.e., "goat rodeo"
– Provides additional clarification for searcher
– Use synonym mapping for term variants
  – C# and Csharp
– Two information points can be displayed for a special term
  – Definition of the term
  – Best Bet
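Special terms with synonyms behave like a small lookup table consulted alongside the main query. The sketch below models that lookup only; in MOSS the entries are maintained through the Manage Keywords page, not in code. The definitions, URLs, and the names `KEYWORDS` and `special_term` are illustrative assumptions.

```python
from typing import Optional

# Hypothetical keyword entries; definitions and URLs are illustrative only.
KEYWORDS = {
    "c#": {
        "synonyms": ["csharp", "c sharp"],
        "definition": "Programming language for the .NET Framework.",
        "best_bet": "http://intranet/dev/csharp-guidelines.aspx",
    },
    "goat rodeo": {
        "synonyms": [],
        "definition": "In-house term for a chaotic project.",
        "best_bet": "http://intranet/pmo/escalation.aspx",
    },
}


def build_index(keywords: dict) -> dict:
    """Map every keyword and synonym to its canonical keyword."""
    index = {}
    for term, entry in keywords.items():
        index[term] = term
        for synonym in entry["synonyms"]:
            index[synonym] = term
    return index


SYNONYM_INDEX = build_index(KEYWORDS)


def special_term(query: str) -> Optional[dict]:
    """Return the keyword entry for a query, matching synonyms case-insensitively."""
    normalized = " ".join(query.lower().split())
    term = SYNONYM_INDEX.get(normalized)
    return KEYWORDS[term] if term else None
```

A search for "CSharp" and a search for "C#" then surface the same definition and Best Bet above the regular results.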
Designate Authority Sites
Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic-Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on subset of corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query
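Designating authority sites matters because click distance is measured from them. The sketch below is neither Hilltop nor Topic-Sensitive PageRank; it only illustrates click distance as the fewest link hops from any authority page, computed with a breadth-first search over a hypothetical link graph. The function name `click_distance` and the paths are assumptions.

```python
from collections import deque


def click_distance(links: dict, authorities: list) -> dict:
    """Fewest link hops from any authority page to each reachable page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/policies"],
    "/hr/policies": ["/hr/policies/leave.docx"],
    "/it": ["/it/helpdesk"],
}

from_home = click_distance(links, ["/"])
with_hr_authority = click_distance(links, ["/", "/hr/policies"])
```

Adding the HR policies page as a second authority cuts the leave policy's distance from three clicks to one, which is the kind of benefit lost when no authority sites are configured.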
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Federated Search Import or export federated locations using Federated
Location Definition (FLD) files Incorporates results from outside content sources that
subscribe to OpenSearch 11 Passes the query into the subscribed resource and
returns results into single interface Relevance calculation done according to originating
resource criteria not MOSS 2007 criteria Pre-defined FLD files found at
httpwwwmicrosoftcomenterprisesearchconnectorsfederatedaspxfscp
Can develop own FLD files if destination subscribes to OpenSearch 11
ndash Day Software has developed a standard connector for LiveLink ECM
People SearchBuild and publish rich personal profiles
Customize personal profile attributes Populate personal profiles using information from Active Directory other
LDAP directories or Line-of Business systems Control access to information using security and privacy controls Generate and display organizational charts based on directory
information Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results
  Search Scopes
    - Allow users to refine search through filtering
    - Define content resources and map them to business rules/key concepts
    - Focused content = shared understanding = more precise results
  Duplicate results filtering
    - Collapses duplicates from the same directory or site to leave more room for other relevant results
    - Less favoritism, more results on the desired page 1
  Definitions
    - Automatically extracts "definitions" from indexed content and displays them as matches directly on the results page
    - A web part property on the Search Best Bets web part (can turn display of definitions on/off)
    - Returned in the Query Object Model
    - Cannot be edited
  Best Bets
    - Editorially assigned results based on these key concepts, assigned to selected query terms
    - Can be many-to-many
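The duplicate-collapsing idea above can be sketched in a few lines. This is only an illustration of the concept on the slide, not how MOSS implements it: the function name, the `per_location` parameter and the sample URLs are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit


def collapse_duplicates(results, per_location=1):
    """Keep at most `per_location` results from each (host, directory) pair.

    `results` is assumed to be a list of URLs already in rank order.
    """
    seen = {}
    kept = []
    for url in results:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), posixpath.dirname(parts.path))
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= per_location:
            kept.append(url)
    return kept


# Hypothetical ranked result list
ranked = [
    "http://intranet/hr/policies/leave.docx",
    "http://intranet/hr/policies/leave-v2.docx",
    "http://intranet/finance/expenses.xlsx",
    "http://intranet/hr/benefits/overview.aspx",
]
print(collapse_duplicates(ranked))
```

The second item from /hr/policies is collapsed, so the other locations move up onto the first page.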
Scalability
  No physical limit on the maximum number of documents in one index
  Recommended limit is 50 million documents per indexer
  A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry or an SAP customer record
  Large and small documents count the same
  The "average document size" depends on the corpus mix
    - i.e. heavy use of WSS 3.0 lists versus limited use
  Dependent on supporting hardware
Security
  Query-time security trimming: the customer sees only those results that they have permission to view
  Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
    - Implements the ASP.NET 2.0 authentication model
  Minimum crawler permission is "Full Read"
    - Still provides the same security trimming functionality
    - Automatically configured for new sites
  Search visibility options
    - Prevent sites/lists from appearing in search results at a site/list level
  "Security only" crawl for single item add/removal
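Query-time trimming is easiest to picture as a filter applied to the ranked hits just before display. The sketch below assumes each hit carries a simple list of group names as its ACL; the field names and groups are hypothetical and the real product works against Windows/SharePoint security descriptors, not Python lists.

```python
def trim_results(results, user_groups):
    """Return only the results whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [hit for hit in results if user_groups & set(hit["acl"])]


# Hypothetical hits with simplified ACLs
hits = [
    {"title": "Q3 forecast", "acl": ["finance"]},
    {"title": "Holiday schedule", "acl": ["everyone"]},
    {"title": "Reorg plan", "acl": ["executives"]},
]
visible = trim_results(hits, ["everyone", "finance"])
print([hit["title"] for hit in visible])
```

Because the filter runs per query and per user, two people typing the same query can legitimately see different result counts.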
Search Analytics
  Export search logs to Excel
    - Query terms
    - Page views
    - Number of results returned
    - Volume trends
  Query success: can define success for certain query terms
  Report Center
    - Access to MOSS 2007 BI features
    - Filters data for permissions and relevance
  Key Performance Indicators [KPI]
    - Create a KPI list or other measures of success
    - Default KPIs exist in the OOB deployment
    - KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
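Once the search log has been exported, even a small script can surface the most frequent queries and the ones that return nothing. The sketch below assumes the export was saved as CSV with `query` and `results` columns; those column names and the sample rows are assumptions for illustration, not the actual MOSS export layout.

```python
import csv
import io
from collections import Counter

# Hypothetical exported log: one row per query, with the number of results returned
SAMPLE_LOG = """query,results
vacation policy,12
expense report,8
vacation policy,12
goat rodeo,0
expense report,8
vacation policy,12
org chart,0
"""


def summarize(log_text):
    """Count how often each query was issued and which queries returned zero results."""
    volume = Counter()
    zero_hits = Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        query = row["query"].strip().lower()
        volume[query] += 1
        if int(row["results"]) == 0:
            zero_hits[query] += 1
    return volume, zero_hits


volume, zero_hits = summarize(SAMPLE_LOG)
print(volume.most_common(2))
print(sorted(zero_hits))
```

The zero-result list is a natural starting point for Best Bets and special terms.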
Configuring MOSS 2007 Search
Search Roadmap
  Useful participants
    - Content creators
    - Information Architect/User Experience Architect
    - Taxonomist
  Define key enterprise themes in content
    - Map existing content to these themes
    - Create filters and scopes to map for themes
  Get as much customer data as possible to find search pain points
    - Review search logs and customer feedback mechanisms
    - What are they trying to find?
    - What terms are they using?
  Assemble a cross-functional team to
    - Assign relevance weighting that makes sense for the customer behavior and the corpus
    - Develop Best Bets for searches with 0 results
    - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
    - Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
    - Design a structure that leverages structural elements like URL depth and click distance
Pareto's Principle
  Known as the 80/20 rule
  Named after the late 19th-century economist Vilfredo Pareto
  20% of your content is answering 80% of your searches
  Not an excuse to stop optimizing at the top 20%
  Don't forget the Long Tail
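Whether the 80/20 split actually holds for a given corpus is easy to check from click or query counts. The following is a rough sketch; the function name and the sample counts are made up for the example.

```python
def head_share(counts, target_pct=80):
    """Fraction of items (most popular first) needed to cover `target_pct` percent of total volume."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for position, count in enumerate(ordered, start=1):
        running += count
        # Integer arithmetic avoids floating-point surprises at the boundary
        if running * 100 >= target_pct * total:
            return position / len(ordered)
    return 1.0


# Hypothetical click counts for ten documents
clicks_per_document = [400, 250, 150, 50, 40, 30, 30, 20, 20, 10]
print(head_share(clicks_per_document))
```

In this sample, 3 of the 10 documents cover 80% of the clicks; the remaining 7 are the long tail that still deserves attention.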
Define Content
  Define content scopes
    - Segment content into logical groups
  Create scope rules based on
    - Address
    - Property query
    - Content source
  At the SSP level or individual level
    - SSP-level scopes are shared among all sites that use the SSP
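The three rule types can be thought of as predicates over an indexed item. The sketch below models only "include" rules combined with OR, which is a deliberate simplification; the `Item` class, the rule helpers and the sample data are all hypothetical and are not the MOSS scope API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    url: str
    content_source: str
    properties: dict = field(default_factory=dict)


def address_rule(prefix):
    """Match items whose URL starts with the given address."""
    return lambda item: item.url.lower().startswith(prefix.lower())


def property_rule(name, value):
    """Match items whose property `name` equals `value`."""
    return lambda item: item.properties.get(name) == value


def content_source_rule(source):
    """Match items crawled from the named content source."""
    return lambda item: item.content_source == source


def in_scope(item, include_rules):
    """Simplified: an item is in scope if any include rule matches."""
    return any(rule(item) for rule in include_rules)


hr_scope = [address_rule("http://intranet/hr/"), property_rule("Department", "HR")]
docs = [
    Item("http://intranet/hr/leave.docx", "Intranet"),
    Item("http://intranet/it/vpn.docx", "Intranet", {"Department": "IT"}),
    Item("file://share/handbook.pdf", "File shares", {"Department": "HR"}),
]
print([doc.url for doc in docs if in_scope(doc, hr_scope)])
```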
Select Authority resources
Define special terms if needed
  Terms or language proprietary to the enterprise
    - i.e. "goat rodeo"
  Provides additional clarification for the searcher
  Use synonym mapping for term variants
    - C# and Csharp
  Two information points can be displayed for a special term
    - Definition of the term
    - Best Bet
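Synonym mapping amounts to expanding each query term into the set of variants that should match. A minimal sketch follows, assuming a hand-maintained list of synonym sets; the data structure and function name are illustrative and do not reflect the thesaurus file format MOSS uses.

```python
# Hypothetical synonym sets; each set lists variants that should match one another
SYNONYMS = [{"c#", "csharp"}]


def expand_terms(query):
    """Return, for each query term, the sorted list of variants to search for."""
    expanded = []
    for term in query.lower().split():
        group = next((s for s in SYNONYMS if term in s), {term})
        expanded.append(sorted(group))
    return expanded


print(expand_terms("C# tutorial"))
```

A searcher typing either variant then reaches documents that use the other spelling.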
Designate Authority Sites
  Hilltop Algorithm
    - Quality of links more important than quantity of links
    - Segmentation of corpus into broad topics
    - Selection of authority sources within these topic areas
    - Pre-query calculation applied at query time
  Topic-Sensitive PageRank
    - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
    - Pre-query calculation of factors based on a subset of the corpus
      - Context of term use in document
      - Context of term use in history of queries
      - Context of term use by user submitting query
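Authority sites matter in practice because click distance is measured from them. One way to picture that calculation is a breadth-first walk over the link graph starting at the designated authority pages. This is a conceptual sketch only; the link graph, page paths and function name are invented for the example, and the product's actual ranking computation is not documented here.

```python
from collections import deque


def click_distance(links, authorities):
    """Minimum number of clicks from any authority page to each reachable page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/policies"],
    "/hr/policies": ["/hr/policies/leave"],
    "/it": [],
}
print(click_distance(links, ["/"]))
print(click_distance(links, ["/", "/hr/policies"]))
```

Designating /hr/policies as a second authority brings the leave page from three clicks away to one, which is the kind of boost that is lost when no authority sites are configured.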
Educate: Structural Influences
  File Type Bias
    - In order of relevancy (highest to lowest)
      - HTML Web pages
      - PowerPoint presentations
      - Word documents
      - Emails
      - XML files
      - Excel spreadsheets
      - Plain text files
      - List items
  Auto Language Detect
    - Foreign-language results are less relevant than results in the user's language
    - English is always considered as relevant as the user's language
  URL Depth and Click Distance
    - Short URLs are like prime real estate
    - Items with shorter URLs are considered more relevant than items placed in longer URLs
      - The level is determined by counting the number of slash ("/") characters in the URL
    - Keywords separated by hyphens in the URL are good
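Counting slashes is simple enough to check against your own site map. The sketch below counts slashes in the path portion only, ignoring the two in "http://"; that choice is an assumption made for the example, and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the slash characters in the path portion of a URL."""
    return urlsplit(url).path.count("/")


# Hypothetical URLs, sorted shallowest first
urls = [
    "http://intranet/hr/policies/2008/leave-policy.aspx",
    "http://intranet/",
    "http://intranet/hr/leave-policy.aspx",
]
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```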
Educate: Content Influences
  Anchor Link Text
    - Search indexes the anchor text from the following elements
      - HTML anchor elements
      - SharePoint Services link lists
      - SharePoint Portal Server 2003 listings
      - Word 2007, Excel 2007 and PowerPoint 2007 hyperlinks
      - Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
  Metadata extraction
    - Shadow title detection is provided within the body of the item
      - Primarily based on text formatting features
      - Shadow title is added automatically to the document
      - Weighted the same as the original title
      - Only for Microsoft Office file types
  Auto Description text
  Optimized URLs
    - Enterprise Search checks URL matching at query time
    - If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
  Synonym Mapping
  Best Bets
  Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> "Add Keyword"
Hardware Considerations
  Dedicated crawl-target servers for large sites
  Separate SQL Server instance for Search
  Fast disk for SQL, fast CPU for Indexer, more memory
  Dedicated Web Front End server for crawling
  Separate indexer machine
  In most cases your search index is on its own server
Indexing Configuration
  Use dedicated web front ends for crawling large farms/sites
  Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
  Define Crawler Impact Rules to avoid site overload
  Schedule for off-hours crawling where appropriate
    - Balance results freshness with load on servers
  Consider using a single content access account per region
  Regularly clean up and review
    - Crawl rules
    - Property and schema
    - Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
  Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
    http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
  Also note that it is not an easy installation. Read the entire documentation before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
  Must ensure a schedule for the incremental crawl to catch additions to the document set
  Must turn on PDF indexer and stemming
Dragons 2
  Use the Web Part that accommodates wildcard search, found here:
    http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
  Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
  The Expert search capability is predicated on the My Sites profile
    - Employee participation is critical to optimal functionality
  Benefits of click distance are missed if Authority sites are not configured
Dragons 3
  The value of statistical ranking can vary from the partial indexes to the master merge index
  Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
  Results are delayed from servers without Internet connections
  Backward compatibility
    - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
    - Index files, scopes, search alerts, filters, word breakers and thesaurus files are not upgraded
Resources
  Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
  Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
  Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
  Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
  MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
  MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
  MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
  Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
  Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
  Enterprise search blog: http://blogs.msdn.com/enterprisesearch
  MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
  Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
  Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
  Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
  Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
  Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
  SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
  MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
  SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
  All About SharePoint (S.S. Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
  Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
  MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
  MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
  Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products
Concept Searching
  Auto-classifies documents for MOSS 2007
  Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
  Extracts concepts and weights their relevance to the searcher's query
    - Presents them for search refinement
  http://www.conceptsearching.com/conceptHMSO (insider trading)
  Integration with MOSS
    - Extracts metadata and compound terms
    - Incorporates with the existing taxonomy if one exists
    - Appends metadata and stores it as a MOSS property
    - Part of the main MOSS index
    - Uses standard MOSS administration features
Adjusting Relevance
  Property weights
    - Assign different weights to properties so that certain properties, such as "Title", have a bigger influence on ranking
    - Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Set the weight of the parameter named by 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

    // Poll once a second until the ranking update has finished
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

  Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
Push/Pull Data to Users
  Alerts
    - Same alerting infrastructure for WSS and MOSS
      - Timer service is used to handle all alert notifications
    - Frequency can be set to Daily/Weekly
      - Notifications for search alerts are sent according to the creation time
    - "Alert Me" link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
    - A rollup of all of a user's alerts for a site collection
      - http://<sitecollection>/_layouts/MySubs.aspx
    - Alert "gotchas"
      - No "My Alerts Summary" web part
      - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
  RSS Feeds
    - Ability to subscribe to an RSS feed of the search results
    - "RSS" link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
  Connects to a content source and enumerates the documents
  Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles and Business Data Catalog
  Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven and others
  http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
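The "connect and enumerate" job of a protocol handler can be illustrated with a file share, the simplest content source in the list. The sketch below walks a temporary directory standing in for a share and yields each file with its size; it is an analogy for the enumeration step only, and the function name and sample files are made up. Real protocol handlers are COM components with their own interfaces.

```python
import os
import tempfile


def enumerate_documents(root):
    """Yield (path, size_in_bytes) for every file under `root`, in a stable order."""
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit subfolders alphabetically
        for name in sorted(files):
            path = os.path.join(folder, name)
            yield path, os.path.getsize(path)


# Build a tiny stand-in "file share" and enumerate it
with tempfile.TemporaryDirectory() as share:
    os.makedirs(os.path.join(share, "hr"))
    samples = [("readme.txt", "hello"), (os.path.join("hr", "leave.txt"), "policy")]
    for relative_path, text in samples:
        with open(os.path.join(share, relative_path), "w") as handle:
            handle.write(text)
    found = [(os.path.relpath(path, share), size)
             for path, size in enumerate_documents(share)]

print(found)
```

Everything the enumerator yields would then be handed to the matching iFilter to extract text and properties.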
The Query object model

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;

    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
  Crawled properties
    - Emitted by iFilters and Protocol Handlers
    - Identified by a property set (GUID) and property ID (name or numeric ID)
  Managed properties
    - Mapping target for crawled properties (many-to-many)
    - Identified by internal ID
    - Friendly name used in queries
      - Can be used in the query with property:value
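The crawled-to-managed mapping can be pictured as a lookup table in which each managed property lists the crawled properties that feed it. In this sketch the first mapped crawled property with a value wins; that precedence rule, the property names and the sample document are all assumptions for illustration rather than documented MOSS behavior.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order
MAPPINGS = {
    "Author": ["Office:4", "Mail:6", "ows_Author"],
    "Title": ["Office:2", "ows_Title"],
}


def managed_values(crawled, mappings=MAPPINGS):
    """Resolve each managed property to the first mapped crawled property that has a value."""
    resolved = {}
    for managed, sources in mappings.items():
        for source in sources:
            if crawled.get(source):
                resolved[managed] = crawled[source]
                break
    return resolved


# Hypothetical crawled properties emitted for one document
doc = {"ows_Author": "Pat Example", "Office:2": "Enterprise Search", "ows_Title": "ITP278"}
print(managed_values(doc))
```

A query such as Author:"Pat Example" then matches on the managed property regardless of which crawled property supplied the value.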
People Search
  Build and publish rich personal profiles
    - Customize personal profile attributes
    - Populate personal profiles using information from Active Directory, other LDAP directories or line-of-business systems
    - Control access to information using security and privacy controls
    - Generate and display organizational charts based on directory information
    - Publish personal profiles using MOSS My Sites
Identify people who can help Find people based on keyword matches with MOSS personal profiles Find people in line-of-business systems Filter results by common attributes such as Job Title or Department Find ldquoin-commonrdquo connections including managers site memberships
distribution lists and colleagues Group results by social distance Subscribe to People Alerts
People Search Results Page
Find people by project expertise orhellip
Find people by project expertise orhellip
Filter by relevant attributes
Filter by relevant attributes
Contact information amp online availabilityContact information amp online availability
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
Push/Pull Data to Users
  Alerts
    – Same alerting infrastructure for WSS and MOSS: the Timer service is used to handle all alert notifications
    – Frequency can be set to Daily or Weekly; notifications for search alerts are sent according to the creation time
    – The 'Alert Me' link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part
    – A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx
    – Alert "gotchas"
        – No "My Alerts Summary" web part
        – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
  RSS Feeds
    – Ability to subscribe to an RSS feed of the search results
    – The 'RSS' link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
  Connects to a content source and enumerates the documents
  Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
  Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
  http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query Object Model

    // KeywordQuery lives in Microsoft.Office.Server.Search.Query;
    // site and strQuery are assumed to be defined by the caller.
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
  Crawled properties
    – Emitted by iFilters and Protocol Handlers
    – Identified by a property set (GUID) and property ID (name or numeric ID)
  Managed properties
    – Mapping target for crawled properties (many-to-many)
    – Identified by internal ID
    – Friendly name used in queries
        – Can be used in the query as property:value
People Search Results Page
  Find people by project, expertise, or…
  Filter by relevant attributes
  Contact information & online availability
LOB Applications with BDC
  Extracts data from line-of-business, CRM, and other 3rd-party data stores
  Caches for indexing by the search service
  Searches any data source accessible through ADO.NET or Web Services
  Uses Live Communication Server for connectivity options
  Aggregated into a single application
FAST ESP Technology
  FAST is a sophisticated search engine tailor-made for ecommerce and help desk
    – Uses sophisticated linguistic processing
    – Searches structured and unstructured content
  Indexing process: conversion - language detection - synonyms - spell check - external call-outs - entity extraction - categorization - vectorization - custom navigation - normalizer - alerting - indexing
  Why is it unique?
    – Auto classification
    – Advanced linguistics: text mining for concept and relationship mapping
    – Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
    – Precision: exact word matching, exact phrase matching, proximity, tokenization
    – Location-aware results (retail and news): excellent for mobile search
    – Recommendation engine
    – Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
  Search Scopes
    – Allow users to refine search through filtering
    – Define content resources and map them to business rules/key concepts
    – Focused content = shared understanding = more precise results
  Duplicate results filtering
    – Collapses duplicates from the same directory or site to leave more room for other relevant results
    – Less favoritism, more results on the desired page 1
  Definitions
    – Automatically extracts "definitions" from indexed content and displays them as matches directly on the results page
    – A web part property on the Search Best Bets web part turns display of definitions on or off
    – Returned in the Query Object Model
    – Cannot be edited
  Best Bets
    – Editorially assigned results based on the key concepts assigned to selected query terms
    – Can be many-to-many
Scalability
  No physical limit on the maximum number of documents in one index
  Recommended document limit is 50 million documents per indexer
  A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
  Large and small documents count the same
  The 'average document size' depends on the corpus mix
    – i.e., heavy use of WSS 3.0 lists versus limited use
  Dependent on supporting hardware
Security
  Query-time stripping: the customer only sees those results that they have permission to view
  Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
    – Implements the ASP.NET 2.0 authentication model
  Minimum crawler permission is "Full Read"
    – Still provides the same security trimming functionality
    – Automatically configured for new sites
  Search visibility options
    – Prevent sites/lists from appearing in search results at a site/list level
  "Security only" crawl for single item add/removal
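Query-time trimming can be pictured as a filter applied to the raw hits before anything is shown to the user. The sketch below is illustrative only: the `Hit` type, the group names, and the `Trim` helper are hypothetical and are not part of the MOSS object model, which performs trimming internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical search hit: a URL plus the groups allowed to read it.
public class Hit
{
    public string Url;
    public HashSet<string> AllowedGroups;

    public Hit(string url, params string[] allowedGroups)
    {
        Url = url;
        AllowedGroups = new HashSet<string>(allowedGroups);
    }
}

public static class SecurityTrimming
{
    // Keep only the hits that at least one of the user's groups may read.
    public static List<Hit> Trim(IEnumerable<Hit> hits, ICollection<string> userGroups)
    {
        return hits.Where(h => h.AllowedGroups.Overlaps(userGroups)).ToList();
    }

    public static void Main()
    {
        var hits = new List<Hit>
        {
            new Hit("http://portal/hr/salaries.xlsx", "HR"),
            new Hit("http://portal/news/launch.aspx", "HR", "Sales", "Everyone"),
            new Hit("http://portal/sales/pipeline.docx", "Sales")
        };
        var userGroups = new List<string> { "Sales", "Everyone" };

        // The HR-only spreadsheet is dropped; the other two URLs are printed.
        foreach (Hit hit in Trim(hits, userGroups))
        {
            Console.WriteLine(hit.Url);
        }
    }
}
```

The point of the design is that the index holds everything the crawler can read, and the permission check happens per query, per user.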
Search Analytics
  Export search logs to Excel
    – Query terms
    – Page views
    – Number of results returned
    – Volume trends
    – Query success: can define success for certain query terms
  Report Center
    – Access to MOSS 2007 BI features
    – Filters data for permissions and relevance
  Key Performance Indicators [KPI]
    – Create a KPI list or other measures of success
    – Default KPIs exist in OOB deployment
    – KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
  Useful participants
    – Content creators
    – Information Architect/User Experience Architect
    – Taxonomist
  Define key enterprise themes in content
    – Map existing content to these themes
    – Create filters and scopes that map to the themes
  Get as much customer data as possible to find search pain points
    – Review search logs and customer feedback mechanisms
    – What are they trying to find?
    – What terms are they using?
  Assemble a cross-functional team to
    – Assign relevance weighting that makes sense for the customer behavior and the corpus
    – Develop Best Bets for searches with 0 results
    – Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
    – Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
    – Design a structure that leverages structural elements like URL depth and click distance
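One way to act on "Develop Best Bets for searches with 0 results" is to tally zero-result queries from an exported search log. A minimal sketch, assuming the log has already been reduced to (query term, result count) pairs; the `LogEntry` type and the sample rows are hypothetical and do not reflect the actual export format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical log row: the query text and how many results it returned.
public class LogEntry
{
    public string Query;
    public int ResultCount;

    public LogEntry(string query, int resultCount)
    {
        Query = query;
        ResultCount = resultCount;
    }
}

public static class BestBetCandidates
{
    // Returns zero-result query terms with their counts, most frequent first.
    public static List<KeyValuePair<string, int>> ZeroResultQueries(IEnumerable<LogEntry> log)
    {
        return log
            .Where(e => e.ResultCount == 0)
            .GroupBy(e => e.Query.Trim().ToLowerInvariant())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var log = new List<LogEntry>
        {
            new LogEntry("expense form", 0),
            new LogEntry("Expense Form", 0),
            new LogEntry("vacation policy", 12),
            new LogEntry("goat rodeo", 0)
        };

        foreach (var pair in ZeroResultQueries(log))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

The terms at the top of this list are the strongest candidates for a Best Bet or a synonym entry.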
Pareto's Principle
  Known as the 80/20 rule
  Named after a late 19th century economist
  20% of your content is answering 80% of your searches
  Not an excuse to stop optimizing at the top 20%
  Don't forget the Long Tail
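The 80/20 claim can be checked against your own logs rather than taken on faith. The sketch below computes what share of result clicks goes to the most-clicked 20% of documents; the click counts are made-up sample data and `TopShare` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParetoCheck
{
    // Fraction of all clicks received by the top `percent` percent of documents.
    public static double TopShare(IList<int> clicksPerDocument, int percent)
    {
        int total = clicksPerDocument.Sum();
        if (total == 0) return 0.0;

        // Round the number of "top" documents up, using integer arithmetic.
        int topCount = (clicksPerDocument.Count * percent + 99) / 100;
        int topClicks = clicksPerDocument.OrderByDescending(c => c).Take(topCount).Sum();
        return (double)topClicks / total;
    }

    public static void Main()
    {
        // Ten documents; the two most popular receive 80 of the 100 clicks.
        var clicks = new List<int> { 50, 30, 5, 4, 3, 3, 2, 1, 1, 1 };
        double share = TopShare(clicks, 20);
        Console.WriteLine("Top 20% share: " + Math.Round(share * 100) + "%");
    }
}
```

Running the same calculation for the remaining 80% of documents shows how much traffic sits in the Long Tail.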
Define Content
  Define content scopes
    – Segment content into logical groups
    – Create scope rules based on
        – Address
        – Property query
        – Content source
    – At the SSP level or individual level
    – SSP-level scopes are shared among all sites that use the SSP
  Select Authority resources
  Define special terms if needed
    – Terms or language proprietary to the enterprise
        – i.e., "goat rodeo"
    – Provides additional clarification for the searcher
    – Use synonym mapping for term variants
        – C# and Csharp
    – Two information points can be displayed for a special term
        – Definition of the term
        – Best Bet
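The three scope rule types listed above (address, property query, content source) can be thought of as predicates over indexed items. The sketch below is a simplification under stated assumptions: `IndexedItem`, the rule helpers, and the "any rule includes the item" logic are hypothetical, and the product's own rule behaviors are richer than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical indexed item, reduced to the three things a scope rule can test.
public class IndexedItem
{
    public string Url;
    public string ContentSource;
    public Dictionary<string, string> Properties = new Dictionary<string, string>();
}

public static class ScopeSketch
{
    // Address rule: the item's URL starts with the given prefix.
    public static Func<IndexedItem, bool> AddressRule(string urlPrefix)
    {
        return item => item.Url.StartsWith(urlPrefix, StringComparison.OrdinalIgnoreCase);
    }

    // Property query rule: a property on the item equals the given value.
    public static Func<IndexedItem, bool> PropertyRule(string name, string value)
    {
        return item =>
        {
            string actual;
            return item.Properties.TryGetValue(name, out actual)
                && string.Equals(actual, value, StringComparison.OrdinalIgnoreCase);
        };
    }

    // Content source rule: the item was crawled from the named content source.
    public static Func<IndexedItem, bool> ContentSourceRule(string sourceName)
    {
        return item => string.Equals(item.ContentSource, sourceName, StringComparison.OrdinalIgnoreCase);
    }

    // Simplification: an item is in scope if any rule includes it.
    public static bool InScope(IndexedItem item, IEnumerable<Func<IndexedItem, bool>> rules)
    {
        return rules.Any(rule => rule(item));
    }

    public static void Main()
    {
        var hrScope = new List<Func<IndexedItem, bool>>
        {
            AddressRule("http://portal/hr/"),
            PropertyRule("Department", "HR")
        };

        var item = new IndexedItem { Url = "http://portal/docs/leave.docx", ContentSource = "Portal" };
        item.Properties["Department"] = "HR";

        // The URL is outside /hr/, but the property rule brings the item into scope.
        Console.WriteLine(InScope(item, hrScope));
    }
}
```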
Designate Authority Sites
  Hilltop Algorithm
    – Quality of links more important than quantity of links
    – Segmentation of corpus into broad topics
    – Selection of authority sources within these topic areas
    – Pre-query calculation applied at query time
  Topic-Sensitive PageRank
    – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
    – Pre-query calculation of factors based on a subset of the corpus
        – Context of term use in document
        – Context of term use in history of queries
        – Context of term use by user submitting query
Educate: Structural Influences
  File Type Bias
    – In order of relevancy (highest to lowest)
        – HTML Web pages
        – PowerPoint presentations
        – Word documents
        – Emails
        – XML files
        – Excel spreadsheets
        – Plain text files
        – List items
  Auto Language Detect
    – Foreign-language results are less relevant than results in the user's language
    – English is always considered as relevant as the user's language
  URL Depth and Click Distance
    – Short URLs are like prime real estate
    – Items with shorter URLs are considered more relevant than items placed in longer URLs
        – The level is determined by reviewing the number of slash ("/") characters in the URL
    – Keywords separated by hyphens in the URL are good
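Both structural signals are easy to compute for your own content when planning a site structure. The sketch below is an approximation, not the ranking engine's implementation: `UrlDepth` counts slashes in the path portion only (an assumption, so the "//" after the scheme is ignored), and `ClickDistance` is a plain breadth-first search from a set of authority pages over a hypothetical link map.

```csharp
using System;
using System.Collections.Generic;

public static class StructuralSignals
{
    // URL depth: number of "/" characters in the path portion of the URL.
    public static int UrlDepth(string url)
    {
        string path = new Uri(url).AbsolutePath.TrimEnd('/');
        int depth = 0;
        foreach (char c in path)
        {
            if (c == '/') depth++;
        }
        return depth;
    }

    // Click distance: fewest link hops from any authority page (breadth-first search).
    public static Dictionary<string, int> ClickDistance(
        Dictionary<string, List<string>> links, IEnumerable<string> authorityPages)
    {
        var distance = new Dictionary<string, int>();
        var queue = new Queue<string>();
        foreach (string page in authorityPages)
        {
            if (!distance.ContainsKey(page))
            {
                distance[page] = 0;
                queue.Enqueue(page);
            }
        }

        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            List<string> targets;
            if (!links.TryGetValue(current, out targets)) continue;
            foreach (string target in targets)
            {
                if (!distance.ContainsKey(target))
                {
                    distance[target] = distance[current] + 1;
                    queue.Enqueue(target);
                }
            }
        }
        return distance;
    }

    public static void Main()
    {
        Console.WriteLine(UrlDepth("http://portal/hr/policies/leave.aspx"));

        // Hypothetical link map: page name -> pages it links to.
        var links = new Dictionary<string, List<string>>
        {
            { "home", new List<string> { "hr", "news" } },
            { "hr", new List<string> { "leave-policy" } }
        };
        foreach (var pair in ClickDistance(links, new[] { "home" }))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Pages that no authority page can reach do not appear in the result at all, which is one way to see why click distance contributes nothing when no authority pages are designated.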
Educate: Content Influences
  Anchor Link Text
    – Search indexes the anchor text from the following elements
        – HTML anchor elements
        – SharePoint Services link lists
        – SharePoint Portal Server 2003 listings
        – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
        – Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
  Metadata extraction
    – Shadow title detection is provided within the body of the item
        – Primarily based on text formatting features
        – Shadow title is added automatically to the document
        – Weighted the same as the original title
        – Only for Microsoft Office file types
    – Auto Description text
  Optimized URLs
    – Enterprise Search checks URL matching at query time
    – If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
  Synonym Mapping
  Best Bets
  Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
  Dedicated crawl-target servers for large sites
  Separate SQL Server instance for Search
  Fast disk for SQL, fast CPU for Indexer, more memory
  Dedicated Web Front End server for crawling
  Separate indexer machine
  In most cases your search index is on its own server
Indexing Configuration
  Use dedicated web front ends for crawling large farms/sites
  Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
  Define Crawler Impact Rules to avoid site overload
  Schedule crawling for off-hours where appropriate
  Balance results freshness with load on servers
  Consider using a single content access account per region
  Regularly clean up and review
    – Crawl rules
    – Property and schema
    – Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
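If you edit the XSLT in a file as step 6 suggests, it can help to try it against a small sample of results XML before pasting it back. The sketch below uses the .NET `XslCompiledTransform` class; the sample XML shape (`All_Results`/`Result` with `title` and `url`), the stylesheet, and the `XslPreview` helper are assumptions for illustration, so capture the real XML from your own results page before relying on it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XslPreview
{
    // Applies an XSLT stylesheet to an XML string and returns the output text.
    public static string Transform(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(styleReader);
        }

        using (XmlReader inputReader = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(inputReader, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Assumed sample of the results XML; check the real shape in your own farm.
        string xml =
            "<All_Results>" +
            "<Result><title>Leave policy</title><url>http://portal/hr/leave.aspx</url></Result>" +
            "</All_Results>";

        // A tiny stylesheet that renders each result as a link.
        string xslt =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
            "<xsl:output method=\"html\"/>" +
            "<xsl:template match=\"/All_Results\">" +
            "<xsl:for-each select=\"Result\">" +
            "<a href=\"{url}\"><xsl:value-of select=\"title\"/></a>" +
            "</xsl:for-each>" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        Console.WriteLine(Transform(xml, xslt));
    }
}
```

A stylesheet that fails to load here will raise an exception locally, which is easier to diagnose than a broken results page.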
Here There Be Dragons
Dragons 1
  Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
    http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
  Also note that it is not an easy installation: read the entire documentation for it before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation or install the prerequisite patches.
  Must ensure a schedule for the incremental crawl to catch additions to the document set
  Must turn on the PDF indexer and stemming
Dragons 2
  Use the Web part that accommodates wildcard search. Found here:
    http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
  Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
  The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
  Benefits of click distance are missed if Authority sites are not configured
Dragons 3
  The value of statistical ranking can vary from the partial indexes to the master merge index
  Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
  Results delayed from servers without Internet connections
  Backward compatibility
    – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
    – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
  Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
  Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
  Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
  Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
  MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
  MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
  MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
  Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
  Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
  Enterprise Search blog: http://blogs.msdn.com/enterprisesearch
  MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
  Find It All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
  Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
  Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
  Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
  Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
  SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
  MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
  SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
  All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
  Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
  MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
  MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
  Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Extracts data from line-of-business CRM and other 3rd Party data stores Caches for indexing by search
service Searches any data source
accessible through ADOnet or Web Services
Uses Live Communication Server for connectivity options
Aggregated into a single application
LOB Applications with BDC
FAST ESP TechnologyFAST is a sophisticated search engine tailor-made for ecommerce and help
desk Uses sophisticated linguistic processing Searches structured and unstructured content Indexing Process Conversion-language detection-synonyms-spell check-
external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing
Why is it Unique Auto Classification Advanced Linguistics text mining for
concept and relationship mapping Recall Lemmatization synonym
expansion wildcards anti-phrasing phonetic search
Precision Exact word matching exact phrase matching proximity tokenization
Location aware results (retail and news) ndash excellent for mobile search
Recommendation engine Increased capacity100-200 million
documents on 1 server and 150 million qsecond
Custom Results Search Scopes
Allow users to refine search through filtering Define content resources and map to business ruleskey concepts Focused content = shared understanding = more precise results
Duplicate results filtering Collapsing duplicates from same directory or site to leave more room for other
relevant results Less favoritism more results on desired page 1
Definitions Automatically extract ldquodefinitionsrdquo from indexed content and display them as
matches directly on the results page A web property on the Search Best Bets web part (can turn onoff display of
definition) Returned in the Query Object Model Can not be edited
Best Bets Editorially assigned results based on these key concepts assigned to selected
query terms Can be many-to-many
Scalability No physical limit for the maximum number of
documents in one index Recommended document limit is 50 Millions of
documents per indexer A document is anything from a Word or PowerPoint
file to a web page an individual SharePoint list item one people entry or an SAP customer record
Largesmall documents count the same The lsquoaverage document sizersquo depends on the
corpus mixndash ie heavy use of WSS 30 lists versus limited use
Dependent on supporting hardware
Security Query time stripping ndash customer only sees those results
that they have permission to view Support for pluggable authentication for content in
SharePoint Server and WSS 30 Sites Implements ASPNET 20 authentication model
Minimum crawler permission is ldquoFull Readrdquo Still provides the same security trimming functionality Automatically configured for new sites
Search visibility options Prevent siteslists appearing in search results at a
sitelist level ldquoSecurity onlyrdquo crawl for single item addremoval
Search Analytics Export search logs to Excel
Query terms Page views Number of results returned
Volume trends Query success can define success for
certain query terms Report Center
Access to MOSS 2007 BI features Filters data for permissions and relevance
Key Performance Indicators [KPI] Create a KPI list or other measures of
success Default KPIs exist in OOB deployment KPI information can be drawn from MOSS
2007 data sources SharePoint lists Excel workbooks SQL Server 2005 Analysis Services manually entered information
Configuring MOSS 2007 Search
Search Roadmap Useful participants
Content creators Information ArchitectUser Experience Architect Taxonomist
Define key enterprise themes in content Map existing content to these themes Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points Review search logs and customer feedback mechanisms What are they trying to find What terms are they using
Assemble a cross functional team to Assign relevance weighting that makes sense to the customer behavior and the corpus Develop Best Bets for searches with 0 results Create editorial guidelines and tools that enforce strong meta data standards across the
enterprise Develop controlled vocabulary that best describes enterprise key concepts and themes
and Is used as a foundation for meaningful metadata and facets Design a structure that leverages the structural elements like URL depth and click distance
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
Extracts concepts and weights their relevance to the searcher’s query
– Presents them for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
– Extracts metadata and compound terms
– Incorporates with existing taxonomy if one exists
– Appends metadata and stores it as a MOSS property
– Part of the main MOSS index
– Uses standard MOSS administration features
Adjusting Relevance
Property weights
– Assign different weights to properties so that certain properties, such as ‘Title’, have a bigger influence on ranking
– Change default property weights through the Schema Object Model
    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid identifies the search application, as on the original slide
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Set the weight of the parameter named by 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and wait until it returns to idle
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");
Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts will be sent according to the creation time
‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user’s alerts for a site collection
– http://<sitecollection>/_layouts/MySubs.aspx
Alert “gotchas”
– No “My Alerts Summary” web part
– No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed on the search results
‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model
    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
– Emitted by iFilters and Protocol Handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
– Can be used in the query with property:Value
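A minimal sketch of the many-to-many mapping between crawled and managed properties. It is plain Python, not the MOSS object model. The GUIDs, property names, sample values, and the `resolve` helper are all made up for illustration, and the "first mapped value wins" rule is an assumption.

```python
# Hypothetical crawled properties, keyed by (property set GUID, property ID).
# The GUIDs and IDs are placeholders, not real MOSS values.
CRAWLED = {
    ("00000000-0000-0000-0000-000000000001", "Title"): "Q3 Sales Plan",
    ("00000000-0000-0000-0000-000000000002", 2): "Sales plan for Q3",
    ("00000000-0000-0000-0000-000000000003", "Writer"): "J. Doe",
}

# Managed property (friendly name) -> ordered list of crawled properties.
# One crawled property can feed several managed properties (many-to-many).
MAPPINGS = {
    "Title": [
        ("00000000-0000-0000-0000-000000000001", "Title"),
        ("00000000-0000-0000-0000-000000000002", 2),
    ],
    "Author": [("00000000-0000-0000-0000-000000000003", "Writer")],
    "Description": [("00000000-0000-0000-0000-000000000002", 2)],
}


def resolve(crawled, mappings):
    """Fill each managed property from the first mapped crawled property present."""
    managed = {}
    for name, sources in mappings.items():
        for key in sources:
            if key in crawled:
                managed[name] = crawled[key]
                break
    return managed


managed = resolve(CRAWLED, MAPPINGS)
print(managed)
```

The friendly names on the left ("Title", "Author") are what a searcher would type in a property:Value query.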
FAST ESP Technology
FAST is a sophisticated search engine tailor-made for e-commerce and help desk
Uses sophisticated linguistic processing
Searches structured and unstructured content
Indexing process: conversion - language detection - synonyms - spell check - external call outs - entity extraction - categorization - vectorization - custom navigation - normalizer - alerting - indexing
Why is it unique?
Auto classification
Advanced linguistics: text mining for concept and relationship mapping
Recall: lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search
Precision: exact word matching, exact phrase matching, proximity, tokenization
Location-aware results (retail and news); excellent for mobile search
Recommendation engine
Increased capacity: 100-200 million documents on 1 server and 150 million q/second
Custom Results
Search Scopes
– Allow users to refine search through filtering
– Define content resources and map to business rules/key concepts
– Focused content = shared understanding = more precise results
Duplicate results filtering
– Collapsing duplicates from the same directory or site to leave more room for other relevant results
– Less favoritism, more results on desired page 1
Definitions
– Automatically extract “definitions” from indexed content and display them as matches directly on the results page
– A web part property on the Search Best Bets web part (can turn on/off display of definition)
– Returned in the Query Object Model
– Cannot be edited
Best Bets
– Editorially assigned results based on these key concepts assigned to selected query terms
– Can be many-to-many
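Duplicate collapsing can be pictured with a short sketch. This is not how MOSS does it internally. It assumes each hit already carries a content fingerprint (here just a made-up string) and keeps only the highest-ranked hit per site and fingerprint. The URLs and the `collapse_duplicates` name are illustrative.

```python
from urllib.parse import urlsplit


def collapse_duplicates(results):
    """Keep the first (highest-ranked) hit per (site, content fingerprint)."""
    seen = set()
    kept = []
    for url, fingerprint in results:
        site = urlsplit(url).netloc.lower()
        key = (site, fingerprint)
        if key in seen:
            continue  # same content already shown for this site
        seen.add(key)
        kept.append(url)
    return kept


# Ranked hits as (URL, fingerprint); fingerprints are placeholder strings.
ranked = [
    ("http://intranet/hr/policies/leave.doc", "a1"),
    ("http://intranet/hr/archive/leave-copy.doc", "a1"),  # same content, same site
    ("http://teams/legal/leave.doc", "a1"),               # same content, other site
    ("http://intranet/hr/policies/travel.doc", "b2"),
]
page_one = collapse_duplicates(ranked)
print(page_one)
```

The archived copy drops out, which frees a slot on page 1 for a different document.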
Scalability
No physical limit on the maximum number of documents in one index
Recommended document limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large/small documents count the same
The ‘average document size’ depends on the corpus mix
– i.e. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
Security
Query-time stripping: customer only sees those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
– Implements the ASP.NET 2.0 authentication model
Minimum crawler permission is “Full Read”
– Still provides the same security trimming functionality
– Automatically configured for new sites
Search visibility options
– Prevent sites/lists from appearing in search results at a site/list level
“Security only” crawl for single item add/removal
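Query-time trimming, reduced to its core idea: every indexed item carries an access list, and a hit is returned only if the user belongs to at least one listed group. The group names, documents, and the `trim_results` helper are hypothetical. Real MOSS trimming works against Windows/SharePoint security descriptors, which this sketch does not model.

```python
def trim_results(results, user_groups):
    """Drop every hit whose access list shares no group with the user."""
    return [title for title, acl in results if acl & user_groups]


# Hits as (title, set of groups allowed to read the item); all values invented.
index_hits = [
    ("Benefits overview", {"all-employees"}),
    ("Merger term sheet", {"executives", "legal"}),
    ("Legal hold notice", {"legal"}),
]

employee_view = trim_results(index_hits, {"all-employees"})
counsel_view = trim_results(index_hits, {"all-employees", "legal"})
print(employee_view)
print(counsel_view)
```

The same index serves both users; only the visible slice differs.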
Search Analytics
Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
Query success: can define success for certain query terms
Report Center
– Access to MOSS 2007 BI features
– Filters data for permissions and relevance
Key Performance Indicators [KPI]
– Create a KPI list or other measures of success
– Default KPIs exist in OOB deployment
– KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
– Content creators
– Information Architect/User Experience Architect
– Taxonomist
Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes to map for themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross-functional team to
– Assign relevance weighting that makes sense for the customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages structural elements like URL depth and click distance
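One way to find Best Bet candidates from an exported search log: count the queries that came back with 0 results. The log layout (query text, result count) and the sample rows are assumptions for illustration. A real export will have more columns.

```python
from collections import Counter


def zero_result_queries(log, top=3):
    """Return the most frequent normalized queries that returned no results."""
    misses = Counter(
        query.strip().lower() for query, result_count in log if result_count == 0
    )
    return misses.most_common(top)


# Hypothetical exported rows: (query text, number of results returned).
log = [
    ("Goat Rodeo", 0),
    ("expense report", 42),
    ("goat rodeo", 0),
    ("401k", 0),
    ("vacation policy", 7),
    ("goat rodeo ", 0),
]
candidates = zero_result_queries(log)
print(candidates)
```

Normalizing case and whitespace before counting keeps variants of the same query from being split across rows.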
Pareto’s Principle
Known as the 80/20 rule
Named after a late 19th century economist
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don’t forget the Long Tail
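A quick way to check the 80/20 claim against your own logs. Given one entry per clicked search result, it counts how many distinct documents account for 80% of the clicks. The click data below is invented, and `head_size` is an illustrative name.

```python
from collections import Counter


def head_size(clicks, coverage=0.8):
    """Return (documents needed to reach `coverage` of clicks, distinct documents)."""
    counts = Counter(clicks)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= coverage * total:
            return rank, len(counts)
    return len(counts), len(counts)


# Hypothetical log: one entry per clicked result.
clicks = (
    ["/hr/benefits.aspx"] * 55
    + ["/finance/expense-report.aspx"] * 30
    + [f"/archive/doc-{i}.aspx" for i in range(15)]
)
head, distinct = head_size(clicks)
print(f"{head} of {distinct} documents answer 80% of clicks")
```

Everything outside that head is the Long Tail the slide warns about.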
Define Content
Define content scopes
– Segment content into logical groups
– Create scope rules based on
  – Address
  – Property query
  – Content source
– At the SSP level or individual level
– SSP-level scopes are shared among all sites that use the SSP
Select Authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise, i.e. “goat rodeo”
– Provides additional clarification for searcher
Use synonym mapping for term variants
– C# and Csharp
Two information points can be displayed for a special term
– Definition of the term
– Best Bet
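Synonym mapping in miniature: a query term is expanded to every variant in its group before matching. The groups and the `expand` helper are hypothetical. MOSS keeps its synonyms in keyword and thesaurus configuration, whose format is not shown here.

```python
# Hypothetical synonym groups; every term in a group stands for all the others.
SYNONYM_GROUPS = [
    {"c#", "csharp"},
    {"moss", "sharepoint server 2007"},
]


def expand(term, groups=SYNONYM_GROUPS):
    """Return the term plus all configured variants, sorted for stable output."""
    term = term.strip().lower()
    for group in groups:
        if term in group:
            return sorted(group)
    return [term]


print(expand("CSharp"))
print(expand("goat rodeo"))
```

A term with no group, such as "goat rodeo", passes through unchanged.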
Designate Authority Sites
Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic-Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on subset of corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query
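Why authority sites matter for click distance can be shown with a plain breadth-first search. This is neither Hilltop nor Topic-Sensitive PageRank, only a sketch of "fewest clicks from an authority page". The link graph and page paths are invented.

```python
from collections import deque


def click_distance(links, authorities):
    """Breadth-first search: fewest clicks from any authority page to each page."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/forms"],
    "/it": ["/hr/benefits/forms"],
}
distances = click_distance(links, ["/"])
print(distances)
```

With no authority pages designated there is no starting point, so no page gets a distance at all. That is the sense in which the benefit is missed.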
Educate: Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
– HTML Web pages
– PowerPoint presentations
– Word documents
– Emails
– XML files
– Excel spreadsheets
– Plain text files
– List items
Auto Language Detect
– Foreign language results are less relevant than results in the user’s language
– English is always considered as relevant as the user’s language
URL Depth and Click Distance
– Short URLs are like prime real estate
– Items with shorter URLs are considered more relevant than items placed at longer URLs
  – The level is determined by counting the slash (“/”) characters in the URL
– Keywords separated by hyphens in the URL are good
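URL depth as described above, counted in code. The sketch counts slashes in the path only, ignoring the two after the scheme. That is an assumption about how the level is measured, and the URLs are made up.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the slash characters in the path part of the URL."""
    return urlsplit(url).path.count("/")


urls = [
    "http://intranet/hr/policies/2008/leave-policy.aspx",
    "http://intranet/leave-policy.aspx",
    "http://intranet/hr/leave-policy.aspx",
]
by_depth = sorted(urls, key=url_depth)  # shallowest first
for url in by_depth:
    print(url_depth(url), url)
```

Running this over a site map is a cheap way to spot important pages buried several levels down.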
Educate: Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
– HTML anchor elements
– SharePoint Services link lists
– SharePoint Portal Server 2003 listings
– Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
– Any file types handled by installed 3rd-party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text
Optimized URLs
– Enterprise Search checks URL matching at query time
– If the query matches the host name of a page in the index, that page will display as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
– Crawl rules
– Property and schema
– Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part’s XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
Note the infrastructure update where Microsoft rolled the features of Search Server 2008 into MOSS 2007, which includes federated search ability and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2
Use the Web Part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
The Expert search capacity is predicated on the My Sites profile. Employee participation is critical to optimal functionality
Benefits of click distance are missed if Authority sites are not configured
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results are delayed from servers without Internet connections
Backward compatibility
– Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
– Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Appendix
Auto Classification Products
Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
Extracts concepts and weights their relevance to the searcher’s query
– Presents them for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
– Extracts metadata and compound terms
– Incorporates with the existing taxonomy if one exists
– Appends metadata and stores it as a MOSS property
– Part of the main MOSS index
– Uses standard MOSS administration features
Adjusting Relevance
Property weights
Assign different weights to properties so that certain properties, such as ‘Title’, have a bigger influence on ranking
Change default property weights through the Schema Object Model

using Microsoft.Office.Server.Search.Administration;

Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

// Dump parameters (lookup by name)
foreach (RankingParameter param in ranking.RankingParameters)
{
    RankingParameter lookedup = ranking.RankingParameters[param.Name];
    Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
}

// Lookup by index
for (int i = 0; i < ranking.RankingParameters.Count; i++)
{
    RankingParameter param = ranking.RankingParameters[i];
    Console.WriteLine(param.Name + ": " + param.Value);
}

// Setting the weight of property ‘property’ to ‘weight’
ranking.RankingParameters[property].Value = float.Parse(weight);

// Start the ranking update and poll until it is idle again
ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
Console.Write("Updating ");
while (ranking.Status != RankingUpdateStatus.Idle)
{
    Console.Write(".");
    System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Done");

Remember: Marcy Tobin wants you to know that this is not a trivial matter, and she knows of what she speaks.
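To see why a heavier ‘Title’ weight changes the order of results, here is a minimal sketch in Python. It is not the MOSS ranking function: the scoring rule (weight times the number of query terms found in each field), the weights, and the sample documents are all assumptions made up for illustration.

```python
def weighted_score(query_terms, doc_fields, weights):
    """Sum, over all fields, the field weight times the number of query terms found there."""
    terms = [t.lower() for t in query_terms]
    score = 0.0
    for field, text in doc_fields.items():
        words = text.lower().split()
        hits = sum(1 for t in terms if t in words)
        # Fields without an explicit weight count once (assumed default).
        score += weights.get(field, 1.0) * hits
    return score


# Hypothetical weights: a match in the title counts five times a match in the body.
weights = {"Title": 5.0, "Body": 1.0}

doc_a = {"Title": "Expense policy", "Body": "How to file a travel claim"}
doc_b = {"Title": "Travel claim form", "Body": "See the expense policy for limits"}

score_a = weighted_score(["expense"], doc_a, weights)  # term appears in the title
score_b = weighted_score(["expense"], doc_b, weights)  # term appears only in the body
print(score_a, score_b)
```

With these assumed weights the document that mentions the term in its title outranks the one that mentions it only in the body, which is the effect the property weights are meant to give.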
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts are sent according to the creation time
‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user’s alerts for a site collection
– http://<sitecollection>/_layouts/MySubs.aspx
Alert “gotchas”
– No “My Alerts Summary” web part
– No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed of the search results
‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

KeywordQuery request = new KeywordQuery(site);
request.QueryText = strQuery;
request.ResultTypes |= ResultType.RelevantResults;

// If we want to get more than one result table
request.ResultTypes |= ResultType.SpecialTermResults;

// Setting optional parameters on the Query object
request.RowLimit = 10;
request.StartRow = 0;
request.KeywordInclusion = KeywordInclusion.AllKeywords;

// Executing the query
ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
– Emitted by iFilters and protocol handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
  – Can be used in the query as property:Value
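The many-to-many mapping above can be pictured with a small Python sketch. The crawled property names, the managed property names, and the "first mapped crawled property that has a value wins" rule are assumptions for illustration; check the managed property settings in your own SSP for the actual behavior.

```python
def resolve_managed(crawled, mappings):
    """Give each managed property the value of its first mapped crawled property that has one."""
    resolved = {}
    for managed, sources in mappings.items():
        for source in sources:
            if source in crawled:
                resolved[managed] = crawled[source]
                break
    return resolved


# Hypothetical mappings. "doc_author" feeds two managed properties, and
# "Author" accepts two crawled properties: that is the many-to-many part.
mappings = {
    "Author": ["doc_author", "mail_from"],
    "Contributor": ["doc_author"],
}

word_doc = {"doc_author": "M. Sweeny"}       # as an iFilter might emit it
email = {"mail_from": "mtobin@example.com"}  # as a protocol handler might emit it

print(resolve_managed(word_doc, mappings))
print(resolve_managed(email, mappings))
```

A searcher only ever sees the friendly managed name, so a query such as Author:Sweeny works the same way whether the value came from a document or from a mail message.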
Scalability
No physical limit on the maximum number of documents in one index
Recommended limit is 50 million documents per indexer
A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record
Large and small documents count the same
The ‘average document size’ depends on the corpus mix
– e.g. heavy use of WSS 3.0 lists versus limited use
Dependent on supporting hardware
Security
Query-time stripping (security trimming): the customer sees only those results that they have permission to view
Support for pluggable authentication for content in SharePoint Server and WSS 3.0 sites
– Implements the ASP.NET 2.0 authentication model
Minimum crawler permission is “Full Read”
– Still provides the same security trimming functionality
– Automatically configured for new sites
Search visibility options
– Prevent sites/lists from appearing in search results at the site/list level
“Security only” crawl for single item add/removal
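Query-time trimming can be sketched in a few lines of Python. This is an illustration of the idea only, under the assumption that each indexed item carries a list of groups allowed to read it; the group names and URLs are made up, and MOSS works against real Windows or pluggable-provider identities rather than plain strings.

```python
def trim_results(results, user_groups):
    """Keep only the results whose ACL shares at least one group with the user."""
    allowed = set(user_groups)
    return [r for r in results if allowed & set(r["acl"])]


# Hypothetical result set, as it might come back from the index before trimming.
results = [
    {"url": "http://intranet/hr/salaries.xlsx", "acl": ["HR"]},
    {"url": "http://intranet/news/launch.aspx", "acl": ["Everyone"]},
    {"url": "http://intranet/eng/spec.docx", "acl": ["Engineering", "HR"]},
]

visible = trim_results(results, ["Everyone", "Engineering"])
visible_urls = [r["url"] for r in visible]
print(visible_urls)
```

The point of trimming at query time is that the index holds everything the crawler could read, while each searcher sees only their own slice of it.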
Search Analytics
Export search logs to Excel
– Query terms
– Page views
– Number of results returned
– Volume trends
Query success: can define success for certain query terms
Report Center
Access to MOSS 2007 BI features
Filters data for permissions and relevance
Key Performance Indicators [KPI]
Create a KPI list or other measures of success
Default KPIs exist in the OOB deployment
KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, and manually entered information
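Once the search log has been exported, the first pass described under Search Analytics is simple counting. The Python sketch below assumes the export has been reduced to (query text, number of results) pairs; that layout and the sample rows are assumptions, not the actual MOSS log format.

```python
from collections import Counter


def summarize_log(rows):
    """Count query volume and zero-result queries from (query_text, result_count) rows."""
    volume = Counter()
    zero = Counter()
    for query, result_count in rows:
        q = query.strip().lower()  # fold case so variants of one query count together
        volume[q] += 1
        if result_count == 0:
            zero[q] += 1
    return volume, zero


# Hypothetical export rows.
log = [
    ("vacation policy", 12),
    ("Vacation Policy", 9),
    ("goat rodeo", 0),
    ("expense form", 4),
    ("goat rodeo", 0),
]

volume, zero = summarize_log(log)
print(volume.most_common(2))
print(zero.most_common())
```

The zero-result list is a natural candidate list for Best Bets or special terms, and the volume list is a reasonable starting point for a KPI on query success.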
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
– Content creators
– Information Architect/User Experience Architect
– Taxonomist
Define key enterprise themes in content
– Map existing content to these themes
– Create filters and scopes that map to the themes
Get as much customer data as possible to find search pain points
– Review search logs and customer feedback mechanisms
– What are they trying to find?
– What terms are they using?
Assemble a cross-functional team to
– Assign relevance weighting that makes sense for customer behavior and the corpus
– Develop Best Bets for searches with 0 results
– Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
– Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
– Design a structure that leverages structural elements like URL depth and click distance
Pareto’s Principle
Known as the 80/20 rule
Named after the late-19th-century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don’t forget the Long Tail
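The 80/20 split is a rule of thumb, so it is worth checking against your own logs. A minimal Python sketch, assuming you can extract one record per clicked search result (the document names and counts here are invented):

```python
from collections import Counter


def top_share(clicks, fraction=0.2):
    """Share of all clicks that land on the top `fraction` of clicked documents."""
    counts = sorted(Counter(clicks).values(), reverse=True)
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Hypothetical click log: ten documents, two of them very popular.
clicks = (
    ["a"] * 50
    + ["b"] * 30
    + ["c", "d", "e", "f"] * 3
    + ["g", "h", "i", "j"] * 2
)

share = top_share(clicks)
print(share)  # 0.8 for this made-up log: the top 2 of 10 documents take 80 of 100 clicks
```

The remaining share is the Long Tail: in this example eight documents split the other 20 clicks, and those are often the searches people complain about.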
Define Content
Define content scopes
– Segment content into logical groups
– Create scope rules based on
  – Address
  – Property query
  – Content source
– At the SSP level or individual level
– SSP-level scopes are shared among all sites that use the SSP
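The three rule types can be sketched as a small matcher in Python. This is a simplification under stated assumptions: every rule is treated as an "include" rule and an item is in scope if any rule matches, the item layout is invented, and the rule kinds are just labels for the Address, Property query, and Content source options listed above.

```python
def in_scope(item, rules):
    """Return True if any rule matches the item (simplified include-only rule set)."""
    for kind, value in rules:
        if kind == "address" and item["url"].startswith(value):
            return True
        if kind == "property":
            name, expected = value
            if item["properties"].get(name) == expected:
                return True
        if kind == "content_source" and item["content_source"] == value:
            return True
    return False


# Hypothetical "HR" scope: anything under the HR site, or anything tagged Department = HR.
hr_scope = [
    ("address", "http://intranet/hr/"),
    ("property", ("Department", "HR")),
]

benefits = {"url": "http://intranet/hr/benefits.aspx", "properties": {}, "content_source": "Intranet"}
handbook = {"url": "http://intranet/docs/handbook.docx", "properties": {"Department": "HR"}, "content_source": "Intranet"}
spec = {"url": "http://fileserver/eng/spec.docx", "properties": {"Department": "Engineering"}, "content_source": "File shares"}

print([in_scope(item, hr_scope) for item in (benefits, handbook, spec)])
```

The property rule is what lets a scope follow a theme rather than a location, which is why the metadata standards on the roadmap matter for scopes as well.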
Select authority resources
Define special terms if needed
– Terms or language proprietary to the enterprise, e.g. “goat rodeo”
– Provides additional clarification for the searcher
Use synonym mapping for term variants
– C# and Csharp
Two information points can be displayed for a special term
– Definition of the term
– Best Bet
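Synonym mapping amounts to expanding each query term into its group of variants before matching. The Python sketch below shows the idea only; the table layout and function names are assumptions and do not reflect how MOSS stores its thesaurus.

```python
def build_synonyms(groups):
    """Turn groups of equivalent terms into a lookup from each term to its whole group."""
    table = {}
    for group in groups:
        lowered = [term.lower() for term in group]
        for term in lowered:
            table[term] = lowered
    return table


def expand_query(query, synonyms):
    """Replace each query term with all of its variants, keeping order and dropping repeats."""
    expanded = []
    for term in query.lower().split():
        for variant in synonyms.get(term, [term]):
            if variant not in expanded:
                expanded.append(variant)
    return expanded


synonyms = build_synonyms([["C#", "Csharp"]])
print(expand_query("Csharp tutorial", synonyms))
print(expand_query("C# tutorial", synonyms))
```

Either spelling now retrieves documents that use the other one, which is the whole purpose of mapping term variants.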
Designate Authority Sites
Hilltop Algorithm
– Quality of links more important than quantity of links
– Segmentation of corpus into broad topics
– Selection of authority sources within these topic areas
– Pre-query calculation applied at query time
Topic-Sensitive PageRank
– Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
– Pre-query calculation of factors based on a subset of the corpus
  – Context of term use in document
  – Context of term use in history of queries
  – Context of term use by user submitting query
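Authority sites matter because click distance is counted outward from them. A minimal Python sketch, assuming click distance is the smallest number of link clicks from any authority page (computed here with a breadth-first search); the link graph is invented, and the real ranking combines this with other factors.

```python
from collections import deque


def click_distance(links, authorities):
    """Breadth-first search from all authority pages at once; returns page -> clicks away."""
    distance = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/eng"],
    "/hr": ["/hr/benefits"],
    "/eng": ["/eng/specs", "/hr/benefits"],
    "/eng/specs": ["/eng/specs/search"],
}

only_home = click_distance(links, ["/"])
with_specs = click_distance(links, ["/", "/eng/specs"])
print(only_home["/eng/specs/search"], with_specs["/eng/specs/search"])
```

Adding "/eng/specs" as a second authority moves the page below it from three clicks away to one, which is the kind of lift that is lost when no authority sites are designated.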
Educate: Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
– HTML Web pages
– PowerPoint presentations
– Word documents
– Emails
– XML files
– Excel spreadsheets
– Plain text files
– List items
Auto Language Detect
– Foreign-language results are less relevant than results in the user’s language
– English is always considered as relevant as the user’s language
URL Depth and Click Distance
Short URLs are like prime real estate
Items with shorter URLs are considered more relevant than items placed in longer URLs
– The level is determined by counting the slash (“/”) characters in the URL
Keywords separated by hyphens in the URL are good
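URL depth is easy to measure for your own content. The Python sketch below counts path segments after the host name, which is a deliberate simplification of "count the slashes": it ignores the two slashes in http:// and any trailing slash. The URLs are examples only.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the non-empty path segments after the host name."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


urls = [
    "http://intranet/hr/policies/2008/travel-policy.aspx",
    "http://intranet/",
    "http://intranet/travel-policy.aspx",
]

# Shallowest first: the order in which URL depth alone would favor these items.
by_depth = sorted(urls, key=url_depth)
print([(url_depth(u), u) for u in by_depth])
```

Running this over a crawl log is a quick way to find important pages that are buried four or five levels down.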
Educate: Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
– HTML anchor elements
– SharePoint Services link lists
– SharePoint Portal Server 2003 listings
– Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
– Any file types handled by installed third-party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text
Optimized URLs
Enterprise Search checks URL matching at query time
If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for the indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2.0 (2003) sites to WSS 3.0 (2007) sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
– Crawl rules
– Property and schema
– Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button. This opens the Text Entry window for the Web Part’s XSL property.
6. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file and edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Appendix
Auto Classification Products: Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
  - Extracts metadata and compound terms
  - Incorporates with the existing taxonomy if one exists
  - Appends metadata and stores it as a MOSS property
  - Part of the main MOSS index
  - Uses standard MOSS administration features
Adjusting Relevance: Property Weights
Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
Change default property weights through the Schema Object Model (namespace Microsoft.Office.Server.Search.Administration)

    using System;
    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters (lookup by name)
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
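To see why a heavier 'Title' weight changes the order of results, here is a small platform-neutral sketch in Python. It is a toy term-count scorer, not the MOSS ranking function; the `WEIGHTS` values, the property names, and the `score` helper are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical per-property weights; MOSS keeps its own values as ranking parameters.
WEIGHTS: Dict[str, float] = {"title": 5.0, "body": 1.0}


def score(doc: Dict[str, str], query: str, weights: Dict[str, float] = WEIGHTS) -> float:
    """Sum, over all properties, weight * occurrences of the query terms in that property."""
    terms = query.lower().split()
    total = 0.0
    for prop, weight in weights.items():
        words = doc.get(prop, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


docs = [
    {"title": "Travel guide", "body": "The expense policy covers travel and the policy is strict"},
    {"title": "Expense policy", "body": "How to file travel claims"},
]

# Highest score first: the title match outranks three body matches.
ranked = sorted(docs, key=lambda d: score(d, "expense policy"), reverse=True)
```

With these assumed weights, two hits in the title (2 x 5.0) outrank three hits in the body (3 x 1.0). Setting the title weight to 1.0 would reverse the order.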
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
  - Notifications for search alerts will be sent according to the creation time
'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed of the search results
'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query Object Model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
Crawled properties
  - Emitted by iFilters and protocol handlers
  - Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
  - Mapping target for crawled properties (many-to-many)
  - Identified by internal ID
  - Friendly name used in queries
      - Can be used in the query as property:value
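The many-to-many relationship between crawled and managed properties can be pictured with a small Python sketch. The GUIDs, property names, and helper functions are made up for illustration. This models the idea only and does not call the MOSS schema object model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CrawledKey = Tuple[str, str]  # (property set GUID, property name or numeric ID)

# Hypothetical mappings: several crawled properties can feed one managed property,
# and one crawled property can feed several managed properties.
MAPPINGS: List[Tuple[CrawledKey, str]] = [
    (("11111111-1111-1111-1111-111111111111", "Author"), "Author"),
    (("22222222-2222-2222-2222-222222222222", "4"), "Author"),
    (("22222222-2222-2222-2222-222222222222", "4"), "Contributor"),
]


def to_managed(crawled: Dict[CrawledKey, str]) -> Dict[str, List[str]]:
    """Copy each crawled value into every managed property it is mapped to."""
    managed: Dict[str, List[str]] = defaultdict(list)
    for key, managed_name in MAPPINGS:
        if key in crawled:
            managed[managed_name].append(crawled[key])
    return dict(managed)


def matches(managed: Dict[str, List[str]], restriction: str) -> bool:
    """Evaluate a single 'property:value' restriction using the friendly name."""
    name, _, value = restriction.partition(":")
    return value.lower() in [v.lower() for v in managed.get(name, [])]


doc = to_managed({("22222222-2222-2222-2222-222222222222", "4"): "Sweeny"})
```

Here one crawled property populates both `Author` and `Contributor`, so a query such as `Author:sweeny` matches the document while `Title:sweeny` does not.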
Search Analytics
Export search logs to Excel
  - Query terms
  - Page views
  - Number of results returned
  - Volume trends
  - Query success (can define success for certain query terms)
Report Center
  - Access to MOSS 2007 BI features
  - Filters data for permissions and relevance
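Once the search log has been exported, a few lines of script can surface the queries that returned nothing. These are natural candidates for Best Bets or synonyms. The sketch below is in Python and assumes a hypothetical CSV layout with `query` and `results` columns; a real export will need its own column names.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Hypothetical export: one row per submitted query with the number of results returned.
SAMPLE_LOG = """query,results
vacation policy,12
goat rodeo,0
Goat Rodeo,0
401k form,0
expense report,7
"""


def zero_result_queries(csv_text: str) -> List[Tuple[str, int]]:
    """Return (query, count) pairs for queries with zero results, most frequent first."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["results"]) == 0:
            # Normalise case and whitespace so variants of one query are counted together.
            counts[row["query"].strip().lower()] += 1
    return counts.most_common()


candidates = zero_result_queries(SAMPLE_LOG)
```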
Key Performance Indicators [KPI]
Create a KPI list or other measures of success
Default KPIs exist in the OOB deployment
KPI information can be drawn from MOSS 2007 data sources
  - SharePoint lists
  - Excel workbooks
  - SQL Server 2005 Analysis Services
  - Manually entered information
Configuring MOSS 2007 Search
Search Roadmap
Useful participants
  - Content creators
  - Information Architect/User Experience Architect
  - Taxonomist
Define key enterprise themes in content
  - Map existing content to these themes
  - Create filters and scopes that map to these themes
Get as much customer data as possible to find search pain points
  - Review search logs and customer feedback mechanisms
  - What are they trying to find?
  - What terms are they using?
Assemble a cross-functional team to
  - Assign relevance weighting that makes sense for customer behavior and the corpus
  - Develop Best Bets for searches with 0 results
  - Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
  - Develop a controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets
  - Design a structure that leverages structural elements like URL depth and click distance
Pareto's Principle
Known as the 80/20 rule
Named after late-19th-century economist Vilfredo Pareto
20% of your content is answering 80% of your searches
Not an excuse to stop optimizing at the top 20%
Don't forget the Long Tail
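The 80/20 claim is easy to check against your own logs. A minimal Python sketch, assuming you can extract one clicked document per search; the `head_share` helper and the sample data are hypothetical:

```python
import math
from collections import Counter
from typing import List


def head_share(clicked_docs: List[str], head_fraction: float = 0.2) -> float:
    """Fraction of all searches answered by the most-clicked `head_fraction` of documents."""
    if not clicked_docs:
        return 0.0
    counts = Counter(clicked_docs)
    # Always keep at least one document in the "head".
    head_size = max(1, math.ceil(len(counts) * head_fraction))
    head_clicks = sum(n for _, n in counts.most_common(head_size))
    return head_clicks / len(clicked_docs)


# 20 searches over 5 distinct documents; one document answers 16 of them.
clicks = ["home"] * 16 + ["hr", "it", "legal", "faq"]
share = head_share(clicks)
```

The remaining documents are the long tail. A low share for them is a prompt to check whether they are findable, not a reason to ignore them.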
Define Content
Define content scopes
  - Segment content into logical groups
  - Create scope rules based on
      - Address
      - Property query
      - Content source
  - At the SSP level or individual level
  - SSP-level scopes are shared among all sites that use the SSP
Select authority resources
Define special terms if needed
  - Terms or language proprietary to the enterprise, e.g. "goat rodeo"
  - Provides additional clarification for the searcher
Use synonym mapping for term variants
  - C# and Csharp
Two information points can be displayed for a special term
  - Definition of the term
  - Best Bet
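Synonym mapping for term variants can be sketched as query expansion. The Python below is an illustration only: `SYNONYM_GROUPS`, `expand_query`, and `text_matches` are hypothetical names, and this is not how MOSS stores or applies its keyword synonyms.

```python
from typing import List, Set

# Hypothetical synonym groups; every term in a group is treated as equivalent.
SYNONYM_GROUPS: List[Set[str]] = [{"c#", "csharp"}]


def expand_query(query: str) -> List[Set[str]]:
    """Return, for each query term, the set of variants that should count as a match."""
    expanded = []
    for term in query.lower().split():
        group = next((g for g in SYNONYM_GROUPS if term in g), {term})
        expanded.append(set(group))
    return expanded


def text_matches(text: str, query: str) -> bool:
    """True when every query term, or one of its synonyms, occurs in the text."""
    words = set(text.lower().split())
    return all(words & variants for variants in expand_query(query))


hit = text_matches("Csharp coding standards", "C# standards")
miss = text_matches("Java coding standards", "C# standards")
```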
Designate Authority Sites
Hilltop Algorithm
  - Quality of links more important than quantity of links
  - Segmentation of corpus into broad topics
  - Selection of authority sources within these topic areas
  - Pre-query calculation applied at query time
Topic-Sensitive PageRank
  - Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
  - Pre-query calculation of factors based on a subset of the corpus
      - Context of term use in document
      - Context of term use in history of queries
      - Context of term use by user submitting query
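Authority sites matter because click distance is measured from them: the fewer links a user must follow from an authority page to reach an item, the better. A minimal Python sketch of that idea, using a breadth-first search over a hypothetical link graph; this illustrates the concept and is not MOSS's internal calculation.

```python
from collections import deque
from typing import Dict, List


def click_distances(links: Dict[str, List[str]], authorities: List[str]) -> Dict[str, int]:
    """Breadth-first search: fewest clicks from any authority page to each reachable page."""
    dist = {page: 0 for page in authorities}
    queue = deque(authorities)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in dist:  # the first visit is the shortest path in BFS
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


# Hypothetical intranet: page -> pages it links to.
links = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "it": ["hr"],
    "orphan": ["benefits"],
}
distances = click_distances(links, ["home"])
```

Pages that no authority page links to, directly or indirectly (`orphan` here), get no distance at all. Without any authority pages configured, no page would get a distance.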
Educate: Structural Influences
File Type Bias
In order of relevancy (highest to lowest)
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items
Auto Language Detect
  - Foreign-language results are less relevant than results in the user's language
  - English is always considered as relevant as the user's language
URL Depth and Click Distance
  - Short URLs are like prime real estate
  - Items with shorter URLs are considered more relevant than items placed at longer URLs
      - The level is determined by counting the slash ("/") characters in the URL
  - Keywords separated by hyphens in the URL are good
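URL depth is simple to audit before content is published. The Python sketch below counts the path levels after the host name, which corresponds to counting slashes; the `url_depth` helper and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Number of non-empty path segments, i.e. levels separated by slashes after the host."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


urls = [
    "http://portal/sites/hr/policies/2008/travel-expense-policy.docx",
    "http://portal/travel-expense-policy.docx",
    "http://portal/hr/travel-expense-policy.docx",
]

# Shallowest first: the order in which URL depth alone would favour these items.
by_depth = sorted(urls, key=url_depth)
```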
Educate: Content Influences
Anchor Link Text
Search indexes the anchor text from the following elements
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed third-party iFilter components that emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
Auto Description text
Optimized URLs
  - Enterprise Search checks URL matching at query time
  - If the query matches the host name of a page in the index, that page is displayed as the first result
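The host-name rule can be pictured as a re-ordering step applied to an already ranked result list. A small Python sketch of the idea; `promote_host_match` and the sample URLs are hypothetical, and the real check happens inside the search engine rather than in user code.

```python
from typing import List
from urllib.parse import urlparse


def promote_host_match(query: str, urls: List[str]) -> List[str]:
    """Move URLs whose host name equals the query to the front; keep the rest in order."""
    wanted = query.strip().lower()
    # sorted() is stable, and False sorts before True, so matches come first
    # while every other result keeps its original relative position.
    return sorted(urls, key=lambda u: (urlparse(u).hostname or "").lower() != wanted)


results = [
    "http://portal/hr/benefits.aspx",
    "http://hrweb/default.aspx",
    "http://portal/search.aspx",
]
reordered = promote_host_match("hrweb", results)
```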
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End server for crawling
Separate indexer machine
In most cases, your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review
  - Crawl rules
  - Property and schema
  - Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also note that it is not an easy installation; users must read the entire documentation before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2
Use the Web Part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
Benefits of click distance are missed if authority sites are not configured
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
Results are delayed from servers without Internet connections
Backward compatibility
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Search Roadmap
  Useful participants
    – Content creators
    – Information Architect/User Experience Architect
    – Taxonomist
  Define key enterprise themes in content
  Map existing content to these themes
  Create filters and scopes that map to these themes
  Get as much customer data as possible to find search pain points
    – Review search logs and customer feedback mechanisms
    – What are they trying to find?
    – What terms are they using?
  Assemble a cross-functional team to
    – Assign relevance weighting that makes sense for customer behavior and the corpus
    – Develop Best Bets for searches with 0 results
    – Create editorial guidelines and tools that enforce strong metadata standards across the enterprise
    – Develop a controlled vocabulary that best describes the enterprise's key concepts and themes and serves as a foundation for meaningful metadata and facets
    – Design a structure that leverages structural elements such as URL depth and click distance
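One way to act on the search-log review is to list the queries that returned nothing, since those are the natural Best Bet candidates. The following is a minimal sketch in Python, not MOSS code: it assumes the log has already been exported as (query, result count) pairs, and the function name, the sample rows, and the min_count threshold are all made up for illustration.

```python
from collections import Counter

def best_bet_candidates(log_rows, min_count=2):
    """Return zero-result queries, most frequent first.

    log_rows is an iterable of (query_text, result_count) pairs; this shape
    is an assumption about an exported log, not a MOSS format.
    """
    misses = Counter(
        query.strip().lower()          # fold case and stray whitespace
        for query, result_count in log_rows
        if result_count == 0
    )
    return [(q, n) for q, n in misses.most_common() if n >= min_count]

# Hypothetical log rows for illustration only.
sample_log = [
    ("expense report", 12), ("Goat Rodeo", 0), ("goat rodeo", 0),
    ("vacation policy", 0), ("org chart", 3), ("goat rodeo ", 0),
]

for query, count in best_bet_candidates(sample_log):
    print(f"{count:>3}  {query}")
```

Queries that fail repeatedly float to the top, which gives the cross-functional team a short, ordered list to work from.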
Pareto's Principle
  Known as the 80/20 rule
  Named after the late-19th-century economist Vilfredo Pareto
  20% of your content is answering 80% of your searches
  Not an excuse to stop optimizing at the top 20%
  Don't forget the Long Tail
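To check how closely a given corpus follows the 80/20 rule, the share of result clicks that land on the most-used 20% of documents can be computed directly. This is a minimal sketch in Python with made-up click counts; the function name, the click dictionary, and the 20% default are assumptions for illustration, not something MOSS reports in this form.

```python
import math

def head_share(clicks_per_doc, head_fraction=0.2):
    """Fraction of all clicks that go to the top `head_fraction` of documents."""
    counts = sorted(clicks_per_doc.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    head_size = max(1, math.ceil(len(counts) * head_fraction))
    return sum(counts[:head_size]) / total

# Hypothetical click counts per document URL.
clicks = {
    "/hr/benefits": 400, "/it/vpn": 350, "/hr/holidays": 60,
    "/legal/nda": 50, "/fac/parking": 40, "/it/printers": 30,
    "/hr/forms": 25, "/mkt/logos": 20, "/fin/expenses": 15, "/misc/old": 10,
}

print(f"Top 20% of documents receive {head_share(clicks):.0%} of clicks")
```

Everything outside the head is the Long Tail: each document there is rarely needed, but together they still account for a real share of searches.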
Define Content
  Define content scopes
    – Segment content into logical groups
    – Create scope rules based on
      – Address
      – Property query
      – Content source
    – At the SSP level or the individual (site collection) level
    – SSP-level scopes are shared among all sites that use the SSP
  Select Authority resources
  Define special terms if needed
    – Terms or language proprietary to the enterprise, e.g. "goat rodeo"
    – Provides additional clarification for the searcher
  Use synonym mapping for term variants
    – C# and Csharp
  Two information points can be displayed for a special term
    – Definition of the term
    – Best Bet
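The synonym-mapping idea can be sketched as a lookup that expands a query term into every variant in its group. This is a Python illustration of the concept only; it is not how MOSS stores or applies its thesaurus, and the function name and the data structure are assumptions. The single synonym group is the one named on the slide.

```python
# Each set is one group of interchangeable term variants.
SYNONYM_GROUPS = [
    {"c#", "csharp"},
]

def expand_term(term, groups=SYNONYM_GROUPS):
    """Return all variants for `term`, or just the term if it has no group."""
    needle = term.strip().lower()
    for group in groups:
        if needle in group:
            return sorted(group)
    return [needle]

print(expand_term("CSharp"))
print(expand_term("goat rodeo"))
```

A searcher who types either variant then matches documents that use the other one.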
Designate Authority Sites
  Hilltop Algorithm
    – Quality of links more important than quantity of links
    – Segmentation of corpus into broad topics
    – Selection of authority sources within these topic areas
    – Pre-query calculation applied at query time
  Topic-Sensitive PageRank
    – Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
    – Pre-query calculation of factors based on a subset of the corpus
      – Context of term use in document
      – Context of term use in history of queries
      – Context of term use by user submitting query
Educate: Structural Influences
  File Type Bias
    – In order of relevancy (highest to lowest)
      – HTML Web pages
      – PowerPoint presentations
      – Word documents
      – Emails
      – XML files
      – Excel spreadsheets
      – Plain text files
      – List items
  Auto Language Detect
    – Foreign-language results are less relevant than results in the user's language
    – English is always considered as relevant as the user's language
  URL Depth and Click Distance
    – Short URLs are like prime real estate
    – Items with shorter URLs are considered more relevant than items placed in longer URLs
      – The level is determined by counting the slash ("/") characters in the URL
    – Keywords separated by hyphens in the URL are good
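The slash-counting rule is easy to try against your own URLs. The sketch below is in Python and counts only the slashes in the path, ignoring the "//" after the scheme and any trailing slash; those two choices are assumptions, since the slide does not say exactly how MOSS counts. The portal URLs are made up.

```python
from urllib.parse import urlsplit

def url_depth(url):
    """Number of "/" characters in the URL path, ignoring a trailing slash."""
    path = urlsplit(url).path
    return path.rstrip("/").count("/")

# Hypothetical URLs, listed shallowest first.
urls = [
    "http://portal/sites/hr/policies/2008/benefits-overview.aspx",
    "http://portal/",
    "http://portal/hr/benefits-overview.aspx",
]

for url in sorted(urls, key=url_depth):
    print(url_depth(url), url)
```

Running a check like this over a site map shows quickly which important pages are buried several levels down.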
Educate: Content Influences
  Anchor Link Text
    – Search indexes the anchor text from the following elements
      – HTML anchor elements
      – SharePoint Services link lists
      – SharePoint Portal Server 2003 listings
      – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
      – Any file types handled by installed third-party iFilter components that emit hyperlinks
  Metadata extraction
    – Shadow title is detected within the body of the item
      – Primarily based on text formatting features
      – Shadow title is added automatically to the document
      – Weighted the same as the original title
      – Only for Microsoft Office file types
  Auto Description text
  Optimized URLs
    – Enterprise Search checks URL matching at query time
    – If the query matches the host name of a page in the index, that page is displayed as the first result
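The host-name rule can be pictured as a re-ordering step applied to an already ranked result list. This Python sketch illustrates the behavior described on the slide and is not the MOSS implementation: the function name is made up, and treating the first label of a dotted host name as a match is an added assumption.

```python
from urllib.parse import urlsplit

def promote_host_match(query, ranked_urls):
    """Move the best-ranked URL whose host name equals the query to the top."""
    needle = query.strip().lower()
    if not needle:
        return list(ranked_urls)
    for i, url in enumerate(ranked_urls):
        host = (urlsplit(url).hostname or "").lower()
        # Match "hrweb" against "hrweb" or "hrweb.example.com" (assumption).
        if needle in (host, host.split(".")[0]):
            return [url] + ranked_urls[:i] + ranked_urls[i + 1:]
    return list(ranked_urls)

# Hypothetical ranked results.
results = [
    "http://portal/docs/hrweb-launch-plan.docx",
    "http://hrweb/default.aspx",
    "http://portal/default.aspx",
]

print(promote_host_match("hrweb", results))
```

This is why a short, memorable host name works as a search term in its own right.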
Enhanced Search Results
  Synonym Mapping
  Best Bets
    – Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (select Keywords) >> Manage Keywords >> Add Keyword
Hardware Considerations
  Dedicated crawl-target servers for large sites
  Separate SQL Server instance for Search
  Fast disk for SQL, fast CPU for Indexer, more memory
  Dedicated Web Front End server for crawling
  Separate indexer machine
  In most cases your search index is on its own server
Indexing Configuration
  Use dedicated web front ends for crawling large farms/sites
  Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
  Define Crawler Impact Rules to avoid site overload
  Schedule for off-hours crawling where appropriate
  Balance results freshness with load on servers
  Consider using a single content access account per region
  Regularly clean up and review
    – Crawl rules
    – Property and schema
    – Best Bets keywords
Customizing Results Display
  To access the XSL property of the Search Core Results Web Part:
  1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
  2. Click the Site Actions link, and then click Edit Page.
  3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
  4. Click Data Form Web Part to display the XSL Editor node.
  5. Click the Source Editor button.
  6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
  7. After you have finished editing the file, copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
  Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
    http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
  Also note that it is not an easy installation: read the entire documentation for it before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
  Must schedule incremental crawls to catch additions to the document set
  Must turn on the PDF indexer and stemming
Dragons 2
  Use the Web Part that accommodates wildcard search, found here:
    http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
  Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
  The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
  Benefits of click distance are missed if Authority sites are not configured
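Why authority sites matter for click distance becomes clearer with a small model: click distance is the fewest link clicks needed to reach a page from any authority page, which is a breadth-first search over the link graph. The Python sketch below illustrates that general idea under stated assumptions; the function name and the link graph are made up, and it does not reproduce how MOSS computes or weights the value.

```python
from collections import deque

def click_distances(links, authority_pages):
    """Fewest clicks from any authority page to each reachable page (BFS)."""
    distance = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distance:      # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/it/vpn", "/hr/benefits"],
    "/hr/benefits": ["/hr/benefits/dental"],
}

print(click_distances(links, ["/"]))
print(click_distances(links, []))   # no authority pages: no page gets a distance
```

With an empty authority list the search has no starting point, so no page earns a distance and the signal cannot separate one page from another.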
Dragons 3
  The value of statistical ranking can vary from the partial indexes to the master merge index
  Without authoritative sites configured in the relevance settings, the benefits of click distance are missed
  Results delayed from servers without Internet connections
  Backward compatibility
    – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
    – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
  Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
  Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
  Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
  Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
  MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
  MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
  MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
  Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
  Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open (link truncated)
More Resources
  Enterprise Search blog: http://blogs.msdn.com/enterprisesearch
  MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
  Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
  Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
  Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
  Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
  Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
  SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
  MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
  SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
  All About SharePoint (S.S. Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
  Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
  MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
  MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
  Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
  Auto-classifies documents for MOSS 2007
  Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
  Extracts concepts and weights their relevance to the searcher's query
    – Presents them for search refinement
  http://www.conceptsearching.com/conceptHMSO (insider trading)
  Integration with MOSS
    – Extracts metadata and compound terms
    – Incorporates with existing taxonomy if one exists
    – Appends metadata and stores it as a MOSS property
    – Part of the main MOSS index
    – Uses standard MOSS administration features
Adjusting Relevance: Property weights
  Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
  Change default property weights through the Schema Object Model

    using System;
    using Microsoft.Office.Server.Search.Administration;

    // appGuid, property, and weight are placeholders carried over from the
    // slide; their declarations are not shown there. Pass GetContext whatever
    // identifies your search application.
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

    // Dump parameters, looking each one up by name
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }

    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }

    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);

    // Start the ranking update and poll once a second until it is idle again
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

  Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
Push/Pull Data to Users
  Alerts
    – Same alerting infrastructure for WSS and MOSS
      – Timer service is used to handle all alert notifications
    – Frequency can be set to Daily/Weekly
      – Notifications for search alerts are sent according to the creation time
    – 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
    – A rollup of all of a user's alerts for a site collection
      – http://<sitecollection>/_layouts/MySubs.aspx
    – Alert "gotchas"
      – No "My Alerts Summary" web part
      – No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
  RSS Feeds
    – Ability to subscribe to an RSS feed of the search results
    – 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
  Connect to a content source and enumerate the documents
  Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and the Business Data Catalog
  Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
  http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    using Microsoft.Office.Server.Search.Query;

    // site and strQuery are supplied by the caller (not shown on the slide)
    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();
Metadata Property Mapping
  Crawled properties
    – Emitted by iFilters and Protocol Handlers
    – Identified by a property set (GUID) and property ID (name or numeric ID)
  Managed properties
    – Mapping target for crawled properties (many-to-many)
    – Identified by internal ID
    – Friendly name used in queries
      – Can be used in the query as property:Value
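The many-to-many relationship between crawled and managed properties can be modeled as a lookup from (property set, property ID) to a list of managed property names. The Python sketch below is a conceptual model only: the GUIDs are placeholders, the property names and helper functions are made up, and nothing here calls the MOSS schema object model.

```python
from collections import defaultdict

# Placeholder property-set GUIDs (not real ones).
OFFICE = "00000000-0000-0000-0000-000000000001"
WEB = "00000000-0000-0000-0000-000000000002"

# (property set GUID, property ID) -> managed property names
MAPPINGS = {
    (OFFICE, "Author"): ["Author"],
    (WEB, "meta-author"): ["Author"],          # two crawled -> one managed
    (OFFICE, 2): ["Title", "DisplayTitle"],    # one crawled -> two managed
}

def to_managed(crawled_values, mappings=MAPPINGS):
    """Collect crawled values under every managed property they map to."""
    managed = defaultdict(list)
    for key, value in crawled_values.items():
        for name in mappings.get(key, ()):
            managed[name].append(value)
    return dict(managed)

def property_query(name, value):
    """Build a property:Value restriction, quoting values that contain spaces."""
    quoted = f'"{value}"' if " " in value else value
    return f"{name}:{quoted}"

doc = {(OFFICE, "Author"): "Pat Lee", (OFFICE, 2): "Enterprise Search"}
print(to_managed(doc))
print(property_query("Author", "Pat Lee"))
```

Only the managed side of the mapping is visible to searchers, which is why the friendly name is the one used in a property:Value query.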
Paretorsquos Principle Known as the 8020 rule
Named after late 19th century economist
20 of your content is answering 80 of your searches
Not an excuse to stop optimizing at the top 20 Donrsquot forget the Long Tail
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Define Content Define content scopes
Segment content into logical groups Create scope rule based on
ndash Addressndash Property queryndash Content source
At the SSP level or individual level SSP level scopes are shared among all sites that use the SSP
Select Authority resources Define special terms if needed
Terms or language proprietary to the enterprisendash ie ldquogoat rodeordquo
Provides additional clarification for searcher Use synonym mapping for term variants
ndash C and Csharp
Two information points can be displayed for a special termndash Definition of the termndash Best Bet
Designate Authority Sites Hilltop Algorithm
Quality of links more important than quantity of links
Segmentation of corpus into broad topics
Selection of authority sources within these topic areas
Pre-query calculation applied at query time
Topic Sensitive Page Rank Consolidation of Hypertext Induced
Topic Selection [HITS] and PageRank Pre-query calculation of factors
based on subset of corpusndash Context of term use in documentndash Context of term use in history of queriesndash Context of term use by user submitting
query
Educate Structural Influences File Type Bias
In order of relevancy (highest to lowest )ndash HTML Web pagesndash PowerPoint presentationsndash Word documentsndash Emailsndash XML filesndash Excel spreadsheetsndash Plain text filesndash List items
Auto Language Detect Foreign language results are less relevant than results in userrsquos language English language is always considered as relevant as userrsquos language
URL Depth and Click Distance Short URLs are like prime real estate Items with shorter URLs are considered more relevant than items placed
in longer URLsndash The level is determined by reviewing the number of slash (ldquordquo) characters in the
URL
Keywords separated by hyphens in the URL are good
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction Shadow title detection is provided within the body of the item
ndash Primarily based on text formatting featuresndash Shadow title is added automatically to the documentndash Weighted the same as the original title ndash Only for Microsoft Office file types
Auto Description text Optimized URLs
Enterprise Search checks URL matching at query time If query matches to the host name of a page in the index it will display as
the first result
Enhanced Search Results
Synonym Mapping Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1
- Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here: http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
- It is not an easy installation. Read the entire documentation before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
- Schedule the incremental crawl so that it catches additions to the document set.
- Turn on the PDF indexer and stemming.
Dragons 2
- Use the Web Part that accommodates wildcard search, found here: http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
- Special characters in the thesaurus can lead to highly irrelevant results and impair "did you mean" capabilities.
- Expert search is predicated on the My Sites profile. Employee participation is critical to optimal functionality.
- The benefits of click distance are missed if authority sites are not configured.
Dragons 3
- The value of statistical ranking can vary from the partial indexes to the master merge index.
- Without authoritative sites configured in the relevance settings, the benefits of click distance are missed.
- Results are delayed from servers without Internet connections.
- Backward compatibility:
  - Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model.
  - Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded.
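To make the click-distance warning concrete, here is a minimal Python sketch. It is not MOSS's actual implementation: it simply assumes click distance is the fewest link hops from any configured authority page and computes it with a breadth-first search. The link graph, the page names, and the function `click_distances` are all hypothetical.

```python
from collections import deque


def click_distances(links, authority_pages):
    """Return the fewest link hops from any authority page to each reachable page.

    links: dict mapping a page to the pages it links to (hypothetical data).
    authority_pages: pages configured as authoritative (distance 0).
    """
    distances = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distances:  # first visit is the shortest path in BFS
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


# Hypothetical intranet link graph
links = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "it": ["helpdesk"],
    "benefits": ["forms"],
}

with_authority = click_distances(links, ["home"])
without_authority = click_distances(links, [])
print(with_authority)
print(without_authority)
```

In this sketch, an empty authority list leaves every page without a distance, so there is nothing for ranking to reward. That is the toy-model version of the point above.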
Resources
- Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
- Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
- Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
- Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
- MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
- MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
- MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
- Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
- Faceted Search, from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
- Enterprise Search blog: http://blogs.msdn.com/enterprisesearch
- MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
- Find It All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
- Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
- Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
- Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
- Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
- SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
- MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
- SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
- All About SharePoint (SS Ahmed): http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
- Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
- MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
- MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
- Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
- Auto-classifies documents for MOSS 2007
- Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
- Extracts concepts and weights their relevance to the searcher's query
  - Presents them for search refinement
- http://www.conceptsearching.com/conceptHMSO (insider trading)
- Integration with MOSS
  - Extracts metadata and compound terms
  - Incorporates with existing taxonomy if one exists
  - Appends metadata and stores it as a MOSS property
  - Part of the main MOSS index
  - Uses standard MOSS administration features
Adjusting Relevance: Property Weights
- Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
- Change default property weights through the Schema Object Model
using System;
using Microsoft.Office.Server.Search.Administration;

Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

// Dump parameters (lookup by name)
foreach (RankingParameter param in ranking.RankingParameters)
{
    RankingParameter lookedup = ranking.RankingParameters[param.Name];
    Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
}

// Lookup by index
for (int i = 0; i < ranking.RankingParameters.Count; i++)
{
    RankingParameter param = ranking.RankingParameters[i];
    Console.WriteLine(param.Name + ": " + param.Value);
}

// Setting the weight of property 'property' to 'weight' (both are strings supplied by the caller)
ranking.RankingParameters[property].Value = float.Parse(weight);
ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

// Poll once a second until the ranking update has finished
Console.Write("Updating ");
while (ranking.Status != RankingUpdateStatus.Idle)
{
    Console.Write(".");
    System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Done");
Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows whereof she speaks.
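As a rough illustration of why property weights matter, the Python sketch below scores documents with a weighted sum of per-property term matches. It is a toy model and not the ranking function MOSS uses. The weights, property names, documents, and the function `weighted_score` are made-up assumptions.

```python
def weighted_score(query_terms, document, weights):
    """Sum, over properties, weight * number of query terms found in that property."""
    score = 0.0
    for prop, text in document.items():
        words = set(text.lower().split())
        matches = sum(1 for term in query_terms if term.lower() in words)
        score += weights.get(prop, 1.0) * matches
    return score


# Hypothetical weights: a match in Title counts five times as much as one in Body
weights = {"Title": 5.0, "Body": 1.0}

doc_a = {"Title": "Expense policy", "Body": "How to file travel claims"}
doc_b = {"Title": "Travel guide", "Body": "See the expense policy for limits"}

query = ["expense", "policy"]
ranked = sorted(
    [("doc_a", weighted_score(query, doc_a, weights)),
     ("doc_b", weighted_score(query, doc_b, weights))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Both documents contain both query terms, but the one that carries them in its title ranks first. Setting the two weights equal removes that preference, which is why changing weights on a live index deserves care.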
Push/Pull Data to Users
Alerts
- Same alerting infrastructure for WSS and MOSS
  - Timer service is used to handle all alert notifications
- Frequency can be set to Daily/Weekly
  - Notifications for search alerts are sent according to the creation time
- 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
- A rollup of all of a user's alerts for a site collection
  - http://<sitecollection>/_layouts/MySubs.aspx
- Alert "gotchas"
  - No "My Alerts Summary" web part
  - No upgrade path from SPS 2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
- Ability to subscribe to an RSS feed of the search results
- 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
- Connect to a content source and enumerate the documents
- Ships with support for Web content, NTFS file shares, Exchange public folders, Lotus Notes databases, SharePoint content, SharePoint profiles, and Business Data Catalog
- Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
- http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query Object Model

using Microsoft.Office.Server.Search.Query;

KeywordQuery request = new KeywordQuery(site);
request.QueryText = strQuery;
request.ResultTypes |= ResultType.RelevantResults;
// If we want to get more than one result table
request.ResultTypes |= ResultType.SpecialTermResults;
// Setting optional parameters on the Query object
request.RowLimit = 10;
request.StartRow = 0;
request.KeywordInclusion = KeywordInclusion.AllKeywords;
// Executing the query
ResultTableCollection results = request.Execute();
Metadata Property Mapping
- Crawled properties
  - Emitted by iFilters and Protocol Handlers
  - Identified by a property set (GUID) and property ID (name or numeric ID)
- Managed properties
  - Mapping target for crawled properties (many-to-many)
  - Identified by internal ID
  - Friendly name used in queries
    - Can be used in the query with property:Value
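The many-to-many mapping above can be pictured with a small Python sketch. It only models the idea and does not use the MOSS schema API. The GUIDs, property IDs, managed property names, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical crawled properties, each identified by (property set GUID, property ID).
# One crawled property may feed several managed properties, and vice versa.
CRAWLED_TO_MANAGED = [
    (("00000000-0000-0000-0000-000000000001", "Author"), "Author"),
    (("00000000-0000-0000-0000-000000000002", 4), "Author"),
    (("00000000-0000-0000-0000-000000000002", 2), "Title"),
    (("00000000-0000-0000-0000-000000000002", 2), "DisplayName"),
]


def build_managed_values(crawled_values):
    """Collect crawled values under every managed property they map to."""
    managed = defaultdict(list)
    for crawled_key, managed_name in CRAWLED_TO_MANAGED:
        if crawled_key in crawled_values:
            managed[managed_name].append(crawled_values[crawled_key])
    return dict(managed)


def parse_property_query(query):
    """Split a 'property:Value' query into (managed property name, value)."""
    name, _, value = query.partition(":")
    return name.strip(), value.strip().strip('"')


# Values a hypothetical iFilter emitted for one document
crawled = {
    ("00000000-0000-0000-0000-000000000002", 4): "J. Doe",
    ("00000000-0000-0000-0000-000000000002", 2): "Enterprise Search",
}
managed = build_managed_values(crawled)
print(managed)
print(parse_property_query('Author:"J. Doe"'))
```

The query side only ever sees the friendly managed name (`Author` here), so users do not need to know which property set or ID produced the value.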
Designate Authority Sites
Hilltop Algorithm
- Quality of links more important than quantity of links
- Segmentation of corpus into broad topics
- Selection of authority sources within these topic areas
- Pre-query calculation applied at query time
Topic-Sensitive PageRank
- Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
- Pre-query calculation of factors based on a subset of the corpus
  - Context of term use in document
  - Context of term use in history of queries
  - Context of term use by user submitting query
Educate: Structural Influences
File Type Bias
- In order of relevancy (highest to lowest):
  - HTML Web pages
  - PowerPoint presentations
  - Word documents
  - Emails
  - XML files
  - Excel spreadsheets
  - Plain text files
  - List items
Auto Language Detect
- Foreign-language results are less relevant than results in the user's language
- English is always considered as relevant as the user's language
URL Depth and Click Distance
- Short URLs are like prime real estate
- Items with shorter URLs are considered more relevant than items placed at longer URLs
  - The level is determined by counting the slash ("/") characters in the URL
- Keywords separated by hyphens in the URL are good
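The slash-counting rule can be sketched in a few lines of Python. This is an illustration only: it assumes the count covers the path portion of the URL (not the two slashes after the scheme), and the URLs and the function `url_depth` are hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the slash characters in the path portion of a URL."""
    path = urlparse(url).path
    return path.count("/")


urls = [
    "http://intranet/hr/policies/2008/travel/expense-policy.aspx",
    "http://intranet/expense-policy.aspx",
    "http://intranet/hr/expense-policy.aspx",
]

# Shallowest URLs first, mirroring the idea that shorter URLs rank higher
by_depth = sorted(urls, key=url_depth)
for url in by_depth:
    print(url_depth(url), url)
```

Sorting this way puts the page at the site root ahead of the copy buried five levels deep, which is the "prime real estate" effect described above.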
Educate: Content Influences
Anchor Link Text
- Search indexes the anchor text from the following elements:
  - HTML anchor elements
  - SharePoint Services link lists
  - SharePoint Portal Server 2003 listings
  - Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks
  - Any file types handled by installed third-party iFilter components that emit hyperlinks
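For the HTML case, the following Python sketch shows what "indexing anchor text" means in practice: pairing each link target with the words authors used to describe it. It uses only the standard library's `html.parser` and is not how the MOSS crawler is implemented. The sample markup and the class `AnchorTextParser` are hypothetical.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from HTML anchor elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


html = (
    '<p>See the <a href="/hr/expense-policy.aspx">travel expense policy</a> '
    'and the <a href="/it/helpdesk.aspx">IT help desk</a>.</p>'
)
parser = AnchorTextParser()
parser.feed(html)
print(parser.anchors)
```

Descriptive link text such as "travel expense policy" gives the target page useful terms, while "click here" gives it none.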
Metadata Extraction
- Shadow title detection is provided within the body of the item
  - Primarily based on text formatting features
  - Shadow title is added automatically to the document
  - Weighted the same as the original title
  - Only for Microsoft Office file types
- Auto Description text
Optimized URLs
- Enterprise Search checks URL matching at query time
- If the query matches the host name of a page in the index, that page is displayed as the first result
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions gtgt Site Settings gtgt Modify All Site Settings gtgt Site Collection Administration (Select Keywords) gtgt Manage Keywords gtgt Add Keywordldquo gtgt
Hardware Considerations Dedicated crawl-target servers for large
sites Separate SQL Server instance for Search Fast disk for SQL fast CPU for Indexer
more memory Dedicated Web Front End Server for
crawling Separate indexer machine
In most cases your search index is on its own server
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Educate Content Influences Anchor Link Text
Search indexes the anchor text from the following elementsndash HTML anchor elementsndash SharePoint Services link listsndash SharePoint Portal Server 2003 listingsndash Word 2007 Excel 2007 and PowerPoint 2007 hyperlinks
Any file types handled by installed 3rd party iFilter components which emit hyperlinks
Metadata extraction
Shadow title detection is provided within the body of the item
– Primarily based on text formatting features
– Shadow title is added automatically to the document
– Weighted the same as the original title
– Only for Microsoft Office file types
Auto Description text
Optimized URLs
Enterprise Search checks URL matching at query time
If the query matches the host name of a page in the index, that page is displayed as the first result
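The host-name rule can be sketched as a re-ordering step applied to an already ranked list. The exact matching rule MOSS applies is not given on the slide, so the sketch assumes a match means the query equals the full host name or its first label; the function name and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def promote_host_match(query: str, ranked_urls: list) -> list:
    """Move results whose host name matches the query to the front."""
    term = query.strip().lower()
    if not term:
        return list(ranked_urls)

    def host_matches(url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        # Assumption: the query equals the host name or its first label.
        return term == host or term == host.split(".")[0]

    hits = [url for url in ranked_urls if host_matches(url)]
    rest = [url for url in ranked_urls if not host_matches(url)]
    return hits + rest


results = promote_host_match("hr", [
    "http://portal/sites/hr/policies.aspx",
    "http://hr/default.aspx",
    "http://hr.example.com/",
])
```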
Enhanced Search Results
Synonym Mapping
Best Bets
Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> Add Keyword
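What a keyword entry does at query time can be sketched as a lookup: the query is compared with the keyword phrase and its synonyms, and a match returns the Best Bets to show above the core results. The sketch assumes the whole query must equal the phrase or a synonym; the keyword, synonyms, and URL are invented for illustration.

```python
from typing import NamedTuple


class Keyword(NamedTuple):
    phrase: str
    synonyms: tuple
    best_bets: tuple  # (title, url) pairs shown above the core results


# Hypothetical keyword definition, as an administrator might enter it.
KEYWORDS = [
    Keyword("vacation", ("holiday", "pto", "time off"),
            (("Leave Policy", "http://portal/hr/leave.aspx"),)),
]


def best_bets_for(query: str) -> list:
    """Return Best Bets when the whole query equals a keyword phrase or synonym."""
    wanted = " ".join(query.lower().split())
    for keyword in KEYWORDS:
        if wanted == keyword.phrase or wanted in keyword.synonyms:
            return list(keyword.best_bets)
    return []
```

Under this assumption a longer query such as "vacation policy" returns no Best Bet, which is one reason synonym lists need regular review against the query logs.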
Hardware Considerations
Dedicated crawl-target servers for large sites
Separate SQL Server instance for Search
Fast disk for SQL, fast CPU for Indexer, more memory
Dedicated Web Front End Server for crawling
Separate indexer machine
In most cases, your search index is on its own server
Indexing Configuration
Use dedicated web front ends for crawling large farms/sites
Upgrade WSS 2003 sites to WSS 2007 sites to index them faster
Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate
Balance results freshness with load on servers
Consider using a single content access account per region
Regularly clean up and review: crawl rules, property and schema, Best Bets keywords
Customizing Results Display
To access the XSL property of the Search Core Results Web Part:
1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
2. Click the Site Actions link, and then click Edit Page.
3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
4. Click Data Form Web Part to display the XSL Editor node.
5. Click the Source Editor button.
6. This opens the Text Entry window for the Web Part’s XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
Here There Be Dragons
Dragons 1
Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
Also note that it is not an easy installation, and that users must read the entire documentation before upgrading their portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2
Use the Web part that accommodates wildcard search. Found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
Benefits of click-distance are missed if authoritative sites are not configured
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
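Why authoritative sites matter for click-distance can be seen in a small sketch. Click distance is treated here as the fewest link hops from any authoritative page, computed with a breadth-first search. This shows the general idea only, not the MOSS ranking code, and the link graph and paths are invented.

```python
from collections import deque


def click_distance(links: dict, authoritative: list) -> dict:
    """Breadth-first search: fewest clicks from any authoritative page to each page."""
    distance = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph: page -> pages it links to.
LINKS = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/it/helpdesk", "/hr/benefits"],
    "/teams/legacy": ["/teams/legacy/specs"],
}
with_root_only = click_distance(LINKS, ["/"])
with_legacy = click_distance(LINKS, ["/", "/teams/legacy"])
```

With only the root marked authoritative, the legacy team pages receive no click-distance value at all; adding their home page as a second authority puts the specs one click away.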
Resources
Microsoft Enterprise Search website http://www.microsoft.com/enterprisesearch
Webcast: Installing and Configuring Search in MOSS 2007 http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008 http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
Enterprise search blog http://blogs.msdn.com/enterprisesearch
MOSS BDC Search http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007 http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007 http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for … http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint (SS Ahmed) http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search - creating scopes http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products
Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)
Extracts concepts and weights their relevance to the searcher’s query
– Presents for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
Extracts metadata and compound terms
Incorporates with existing taxonomy if one exists
Appends metadata and stores as MOSS property
Part of the main MOSS index
Uses standard MOSS administration features
Adjusting Relevance
Property weights
Assign different weights to properties so that certain properties, such as ‘Title’, have a bigger influence on ranking
Change default property weights through the Schema Object Model (using Microsoft.Office.Server.Search.Administration)
    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));
    // dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }
    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }
    // Setting the weight of property 'prop' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    // Poll once a second until the ranking update has finished
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks
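The effect of a property weight is easiest to see with a toy score. The sketch below simply adds up the weights of the properties that contain the query term. It is a deliberately simplified stand-in, not the MOSS ranking function, and the documents and weight values are invented.

```python
def weighted_score(term: str, document: dict, weights: dict) -> float:
    """Sum the weights of every property whose text contains the term."""
    needle = term.lower()
    return sum(weight for prop, weight in weights.items()
               if needle in document.get(prop, "").lower())


doc_a = {"Title": "Expense policy", "Body": "How to file travel claims"}
doc_b = {"Title": "Travel FAQ", "Body": "Expense reports are due monthly"}

flat = {"Title": 1.0, "Body": 1.0}
title_heavy = {"Title": 5.0, "Body": 1.0}

flat_scores = (weighted_score("expense", doc_a, flat),
               weighted_score("expense", doc_b, flat))
heavy_scores = (weighted_score("expense", doc_a, title_heavy),
                weighted_score("expense", doc_b, title_heavy))
```

With flat weights the two documents tie; once Title outweighs Body, the document with the term in its title pulls ahead. A small change to one weight reorders results across the whole index, which is why this is not a trivial matter.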
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts will be sent according to the creation time
‘Alert Me’ link can be added or removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Indexing Configuration Use dedicated web front ends for crawling large
farmssites Upgrade WSS 2003 sites to WSS 2007 sites to index
them faster Define Crawler Impact Rules to avoid site overload
Schedule for off-hours crawling where appropriate Balance results freshness with load on servers
Consider using single content access account per region
Regularly cleanup and Review Crawl rules Property and schema Best Bets keywords
Customizing Results DisplayTo access the XSL property of the Search Core Results Web Part
1 In your browser navigate to the results page URLCopy Code httpltServerNamegtSearchCenterPagesresultsaspx
2 Click the Site Actions link and then click Edit Page
3 In the Search Core Results Web Part click the edit down arrow to display the Web Part menu and then click Modify Shared Web Part This opens the Search Core Results Web Part tool pane
4 Click Data Form Web Part to display the XSL Editornode
5 Click the Source Editor button
6 This opens the Text Entry window for the Web Parts XSL property You can modify the XSLT directly in this window however you may find it easier to copy the code to a file You can then edit that file using an application such as Visual Studio 2005
7 After you have finished editing the file you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part
Here There Be Dragons
Dragons 1
  Note the Infrastructure Update, in which Microsoft rolled the features of Search Server 2008 into MOSS 2007, including federated search and a unified administration dashboard. Read more here:
    http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx
  Also note that it is not an easy installation: read the entire documentation for it before upgrading your portal. More people destroy their portal than upgrade it because they did not read the documentation and install the prerequisite patches.
  Must ensure a schedule for the incremental crawl to catch additions to the document set
  Must turn on PDF indexer and stemming
Dragons 2
  Use the Web Part to accommodate wildcard search, found here:
    http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
  Use of special characters in the thesaurus can lead to highly irrelevant results and impact "did you mean" capabilities
  The Expert search capability depends on the My Sites profile; employee participation is critical to optimal functionality
  Benefits of click-distance are missed if authoritative sites are not configured
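Why authoritative sites matter for click-distance can be seen in a small sketch. Click distance is treated here as the fewest link hops from any authoritative page, computed with a breadth-first search over a link graph. This is an illustration of the idea only, not MOSS's actual implementation; the `ClickDistance` class and the page names used with it are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ClickDistance
{
    // links: page -> pages it links to.
    // authoritative: pages treated as distance 0.
    // Returns the fewest hops from any authoritative page to each reachable page.
    public static Dictionary<string, int> Compute(
        IDictionary<string, string[]> links,
        IEnumerable<string> authoritative)
    {
        Dictionary<string, int> distance = new Dictionary<string, int>();
        Queue<string> queue = new Queue<string>();

        foreach (string page in authoritative)
        {
            if (!distance.ContainsKey(page))
            {
                distance[page] = 0;
                queue.Enqueue(page);
            }
        }

        // Breadth-first search: the first time a page is reached is its shortest path.
        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            string[] targets;
            if (!links.TryGetValue(current, out targets)) continue;

            foreach (string target in targets)
            {
                if (distance.ContainsKey(target)) continue;
                distance[target] = distance[current] + 1;
                queue.Enqueue(target);
            }
        }
        return distance;
    }
}
```

In this sketch an empty list of authoritative pages yields an empty result: with no starting points, no page receives a distance, so the signal has nothing to contribute to ranking.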
Dragons 3
  The value of statistical ranking can vary from the partial indexes to the master merge index
  Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
  Results delayed from servers without Internet connections
  Backward compatibility
    – Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
    – Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
  Microsoft Enterprise Search website
    http://www.microsoft.com/enterprisesearch
  Webcast: Installing and Configuring Search in MOSS 2007
    http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
  Tune Search Server 2008
    http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
  Configuring MOSS 2007 Search (Cale Hoopes)
    http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
  MOSS Developer Center on MSDN
    http://msdn.microsoft.com/office/server/moss/default.aspx
  MOSS 2007 Software Developers Kit
    http://msdn2.microsoft.com/en-us/library/ms550992.aspx
  MOSS 2007 on TechNet
    http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
  Search Optimization for a MOSS 2007 Content Management site
    http://msdn.microsoft.com/en-us/library/cc721591.aspx
  Faceted Search from the Microsoft SharePoint Team Blog
    http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
  Enterprise search blog
    http://blogs.msdn.com/enterprisesearch
  MOSS BDC Search
    http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
  Find it All with SharePoint Enterprise Search
    http://technet.microsoft.com/en-us/magazine/cc162512.aspx
  Google Enterprise Connector for MOSS 2007
    http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
  Ontolica Search for MOSS 2007
    http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
  Michael Gannotti on SharePoint
    http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
  Sitemap.xml Generator
    http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
  SEO Advice from a Propellerhead for …
    http://www.mossseo.com
Even More Resources
  MOSS 2007 Administrator Documentation
    http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
  SharePoint Search links
    http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
  All About SharePoint (SS Ahmed)
    http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
  Working with MOSS search - creating scopes
    http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
  MOSS 2007 search customization
    http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
  MOSS 2007 Search & Indexing
    http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
  Create a custom Search Page
    http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Appendix
Auto Classification Products: Concept Searching
  Auto-classifies documents for MOSS 2007
  Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
  Extracts concepts and weights their relevance to the searcher's query
    – Presents them for search refinement
    http://www.conceptsearching.com/conceptHMSO (insider trading)
  Integration with MOSS
    – Extracts metadata and compound terms
    – Incorporates with existing taxonomy if one exists
    – Appends metadata and stores it as a MOSS property
    – Part of the main MOSS index
    – Uses standard MOSS administration features
Adjusting Relevance: Property weights
  Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
  Change default property weights through the Schema Object Model

    using Microsoft.Office.Server.Search.Administration;

    Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));
    // dump parameters
    foreach (RankingParameter param in ranking.RankingParameters)
    {
        RankingParameter lookedup = ranking.RankingParameters[param.Name];
        Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
    }
    // Lookup by index
    for (int i = 0; i < ranking.RankingParameters.Count; i++)
    {
        RankingParameter param = ranking.RankingParameters[i];
        Console.WriteLine(param.Name + ": " + param.Value);
    }
    // Setting the weight of property 'property' to 'weight'
    ranking.RankingParameters[property].Value = float.Parse(weight);
    ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);
    Console.Write("Updating ");
    // Poll once per second until the ranking update has finished
    while (ranking.Status != RankingUpdateStatus.Idle)
    {
        Console.Write(".");
        System.Threading.Thread.Sleep(1000);
    }
    Console.WriteLine("Done");

  Remember: Marcy Tobin wants me to let you know that this is not a trivial matter, and she knows of what she speaks.
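To see why a heavier 'Title' weight shifts ranking, consider a toy scoring model in which each property contributes its weight times the number of query-term hits found in it. This linear model is an assumption made purely for illustration and is not the ranking function MOSS actually uses; the `WeightedScore` class is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class WeightedScore
{
    // Toy model: score = sum over properties of (weight * hits in that property).
    // Properties with no configured weight default to 1.0.
    public static double Score(
        IDictionary<string, int> hitsByProperty,
        IDictionary<string, double> weights)
    {
        double score = 0.0;
        foreach (KeyValuePair<string, int> hit in hitsByProperty)
        {
            double weight;
            if (!weights.TryGetValue(hit.Key, out weight)) weight = 1.0;
            score += weight * hit.Value;
        }
        return score;
    }
}
```

Under this model, with Title weighted at 5, a document with one hit in its title and none in the body outscores a document with three hits in the body only. That is the kind of shift a weight change produces, and a reason to test such changes before applying them to a production index.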
Push/Pull Data to Users: Alerts
  Same alerting infrastructure for WSS and MOSS
    – Timer service is used to handle all alert notifications
  Frequency can be set to Daily/Weekly
    – Notifications for search alerts will be sent according to the creation time
  'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  A rollup of all of a user's alerts for a site collection
    – http://<sitecollection>/_layouts/MySubs.aspx
  Alert "gotchas"
    – No "My Alerts Summary" web part
    – No upgrade path from SPS2003 alerts to MOSS 2007 alerts, except for WSS alert types
RSS Feeds
  Ability to subscribe to an RSS feed on the search results
  'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
  Connects to a content source and enumerates the documents
  Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
  Partners provide support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
    http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Dragons 1 Note the infrastructure update where Microsoft rolled
the features of Search Server 2008 into MOSS 2007 that includes federated search ability and a unified administration dashboard Read more here
httpblogsmsdncomsharepointarchive20080715announcing-availability-of-infrastructure-updatesaspx
Also please note that it is not an easy installation and that users must read the entire documentation for it before upgrading their portal More people destroy their portal than upgrade it due to not
reading the documentation and installing the prerequisite patches
Must ensure a schedule for the incremental crawl to catch additions to the document set
Must turn on PDF indexer and stemming
Dragons 2 Use the Web part to accommodates wildcard
search Found here
httpwwwsharepointblogscommirrorarchive20080609new-web-part-for-wildcard-search-in-enterprise-searchaspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact ldquodid you meanrdquo capabilities
The Expert search capacity is predicated on the My Sites profile Employee participation critical to optimal functionality
Benefits of click-distance are missed if Authority sites are not configured
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
Push/Pull Data to Users
Alerts
Same alerting infrastructure for WSS and MOSS
– Timer service is used to handle all alert notifications
Frequency can be set to Daily/Weekly
– Notifications for search alerts will be sent according to the creation time
‘Alert Me’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
A rollup of all of a user's alerts for a site collection
– http://<sitecollection>/_layouts/MySubs.aspx
Alert “gotchas”
– No “My Alerts Summary” web part
– No upgrade path from SPS 2003 alerts to MOSS 2007 alerts except for WSS alert types
RSS Feeds
Ability to subscribe to an RSS feed of the search results
‘RSS’ link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog
Partners providing support for Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others
http://msdn.microsoft.com/library/en-us/spssdk/html/_introduction_to_a_protocol_handler.asp?frame=true
The Query object model

    using Microsoft.Office.Server.Search.Query;

    KeywordQuery request = new KeywordQuery(site);
    request.QueryText = strQuery;
    request.ResultTypes |= ResultType.RelevantResults;
    // If we want to get more than one result table
    request.ResultTypes |= ResultType.SpecialTermResults;

    // Setting optional parameters on the Query object
    request.RowLimit = 10;
    request.StartRow = 0;
    request.KeywordInclusion = KeywordInclusion.AllKeywords;

    // Executing the query
    ResultTableCollection results = request.Execute();

Metadata Property Mapping
Crawled properties
– Emitted by iFilters and Protocol Handlers
– Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
– Mapping target for crawled properties (many-to-many)
– Identified by internal ID
– Friendly name used in queries
– Can be used in the query as property:value
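The many-to-many relationship between crawled and managed properties, and the property:value query form, can be sketched in a few lines of Python. This is an illustration of the concept only, not how MOSS stores its schema. The property-set identifiers ("guid-office", "guid-mail"), the managed property names, and the helper functions to_managed and parse_query are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A crawled property is identified by (property set, property ID).
CrawledKey = Tuple[str, str]

# Managed property -> crawled properties that feed it (made-up values).
# "Author" draws on two crawled properties, and one crawled property
# feeds both "Author" and "Contributor": a many-to-many mapping.
MAPPINGS: Dict[str, List[CrawledKey]] = {
    "Author": [("guid-office", "Author"), ("guid-mail", "From")],
    "Title": [("guid-office", "Title")],
    "Contributor": [("guid-office", "Author")],
}


def to_managed(crawled: Dict[CrawledKey, str]) -> Dict[str, List[str]]:
    """Collect crawled values under every managed property mapped to them."""
    managed = defaultdict(list)
    for name, sources in MAPPINGS.items():
        for key in sources:
            if key in crawled:
                managed[name].append(crawled[key])
    return dict(managed)


def parse_query(query: str) -> Tuple[List[str], Dict[str, str]]:
    """Split a query into free-text keywords and property:value filters."""
    keywords: List[str] = []
    filters: Dict[str, str] = {}
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            filters[name] = value
        else:
            keywords.append(token)
    return keywords, filters


crawled_doc = {
    ("guid-office", "Author"): "Sweeny",
    ("guid-office", "Title"): "Enterprise Search",
}
print(to_managed(crawled_doc))
print(parse_query("budget Author:Sweeny"))
```

The point of the indirection is that users query by the friendly managed name, while the crawler is free to emit differently named properties from each content source.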
Dragons 2
Use the Web Part that accommodates wildcard search, found here:
http://www.sharepointblogs.com/mirror/archive/2008/06/09/new-web-part-for-wildcard-search-in-enterprise-search.aspx
Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities
The Expert search capability is predicated on the My Sites profile; employee participation is critical to optimal functionality
Benefits of click-distance are missed if authoritative sites are not configured
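Why click-distance depends on authoritative sites can be shown with a short Python sketch. It assumes click distance means the fewest link hops from any authoritative page, computed here with a plain breadth-first search. How MOSS computes and applies the value internally is not described on the slide, and the page names and link graph below are invented.

```python
from collections import deque
from typing import Dict, Iterable, List


def click_distance(links: Dict[str, List[str]],
                   authoritative: Iterable[str]) -> Dict[str, int]:
    """Multi-source BFS: fewest clicks from any authoritative page."""
    distance = {page: 0 for page in authoritative}
    queue = deque(distance)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet link graph: page -> pages it links to.
links = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "it": ["helpdesk"],
    "benefits": ["forms"],
}

print(click_distance(links, ["home"]))
print(click_distance(links, ["home", "benefits"]))
print(click_distance(links, []))
```

With no authoritative pages the result is empty: no page has a distance at all, so this signal cannot distinguish one page from another. Adding a second authoritative page ("benefits") pulls its neighbourhood closer, which is the lever an administrator has for promoting an important area of the site.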
Dragons 3
The value of statistical ranking can vary from the partial indexes to the master merge index
Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed
Results delayed from servers without Internet connections
Backward compatibility
– Custom applications using the SharePoint 2003 administrative object model must be rewritten to use the MOSS 2007 object model
– Index files, scopes, search alerts, filters, word breakers, and thesaurus files are not upgraded
Resources
Microsoft Enterprise Search website: http://www.microsoft.com/enterprisesearch
Webcast, Installing and Configuring Search in MOSS 2007: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=enUS&EventID=1032325467&CountryCode=US
Tune Search Server 2008: http://www.nonlinear.ca/blog/index.php/2008/02/27/how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes): http://calehoopes.blogspot.com/2007/11/configuring-moss-as-search-appliance.html
MOSS Developer Center on MSDN: http://msdn.microsoft.com/office/server/moss/default.aspx
MOSS 2007 Software Developers Kit: http://msdn2.microsoft.com/en-us/library/ms550992.aspx
MOSS 2007 on TechNet: http://technet2.microsoft.com/Office/en-us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx
Search Optimization for a MOSS 2007 Content Management site: http://msdn.microsoft.com/en-us/library/cc721591.aspx
Faceted Search from the Microsoft SharePoint Team Blog: http://blogs.msdn.com/sharepoint/archive/2008/03/17/open
More Resources
Enterprise search blog: http://blogs.msdn.com/enterprisesearch
MOSS BDC Search: http://blogs.msdn.com/gunterstaes/archive/2007/01/16/putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-services.aspx
Find it All with SharePoint Enterprise Search: http://technet.microsoft.com/en-us/magazine/cc162512.aspx
Google Enterprise Connector for MOSS 2007: http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html
Ontolica Search for MOSS 2007: http://www.ontolica.com/upload/pdf/factsheets/ontolicasearch_featurelist.pdf
Michael Gannotti on SharePoint: http://sharepoint.microsoft.com/blogs/mikeg/Lists/Categories/Category.aspx?Name=Search%20Technologies
Sitemap.xml Generator: http://www.thesug.org/blogs/lsuslinky/Lists/Posts/Post.aspx?ID=14
SEO Advice from a Propellerhead for …: http://www.mossseo.com
Even More Resources
MOSS 2007 Administrator Documentation: http://jamorgan.wordpress.com/2006/09/07/administrator-documentation-for-moss-2007-wss-v3
SharePoint Search links: http://www.virtual-generations.com/2007/01/29/sharepoint-moss-2007-search-links
All About SharePoint, SS Ahmed: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-1.aspx
Working with MOSS search, creating scopes: http://www.sharepointblogs.com/ssa/archive/2007/01/19/working-with-sharepoint-search-part-2.aspx
MOSS 2007 search customization: http://blogs.technet.com/pavelka/archive/2007/05/24/moss-2007-search-customization.aspx
MOSS 2007 Search & Indexing: http://www.sharepointblogs.com/zimmer/archive/2006/11/16/moss-2007-search-and-indexing.aspx
Create a custom Search Page: http://www.sharepointblogs.com/zimmer/archive/2007/08/25/moss-2007-connect-a-custom-search-page-to-a-custom-search-scope.aspx
Dragons 3 The value of statistical ranking can vary from the partial
indexes to the master merge index Without authoritative sites configured in the relevance
settings the benefits of click-distance are missed Results delayed from servers without Internet connections Backward compatibility
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Index files scopes search alerts filters word breakers thesaurus files not upgraded
Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Resources Microsoft Enterprise Search website httpwwwmicrosoftcomenterprisesearch Webcast Installing and Configuring Search in MOSS 2007
httpmseventsmicrosoftcomcuiWebCastEventDetailsaspxculture=enUSampEventID=1032325467ampCountryCode=US
Tune Search server 2008 httpwwwnonlinearcablogindexphp20080227how-to-tune-microsoft-search-server-express-2008-etc
Configuring MOSS 2007 Search (Cale Hoopes) httpcalehoopesblogspotcom200711configuring-moss-as-search-appliancehtml
MOSS Developer Center on MSDN httpmsdnmicrosoftcomofficeservermossdefaultaspx
MOSS 2007 Software Developers Kit httpmsdn2microsoftcomen-uslibraryms550992aspx
MOSS 2007 on TechNet httptechnet2microsoftcomOfficeen-uslibrary3e3b8737-c6a3-4e2c-a35f-f0095d952b781033mspx
Search Optimization for a MOSS 2007 Content Management site httpmsdnmicrosoftcomen-uslibrarycc721591aspx
Faceted Search from the Microsoft SharePoint Team Blog httpblogsmsdncomsharepointarchive20080317open
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
More Resources Enterprise search bloghttpblogsmsdncomenterprisesearch MOSS BDC Search
httpblogsmsdncomgunterstaesarchive20070116putting-it-all-together-moss-2007-business-data-catalog-search-excel-services-sql-analysis-servicesaspx
Find it All with SharePoint Enterprise Search httptechnetmicrosoftcomen-usmagazinecc162512aspx
Google Enterprise Connector for MOSS 2007 httpcodegooglecomapissearchappliancedocumentation50connector_adminsharepoint_connectorhtml
Ontologica Search for MOSS 2007 httpwwwontolicacomuploadpdffactsheetsontolicasearch_featurelistpdf
Michael Gannotti on SharePoint httpsharepointmicrosoftcomblogsmikegListsCategoriesCategoryaspxName=Search20Technologies
Sitemapxml Generator httpwwwthesugorgblogslsuslinkyListsPostsPostaspxID=14
SEO Advice from a Propellerhead for hellip httpwwwmossseocom
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties Emitted by iFilters and Protocol Handlers Identified by a property set (GUID) and property
ID (name or numeric ID) Managed properties
Mapping target for crawled properties (many-to-many)
Identified by internal ID Friendly name used in queries
ndash Can be used in the query with property Value
Even More Resources MOSS 2007 Administrator Documentation
httpjamorganwordpresscom20060907administrator-documentation-for-moss-2007-wss-v3
SharePoint Search linkshttpwwwvirtual-generationscom20070129sharepoint-moss-2007-search-links
All About SharePoint SS Ahmed httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-1aspx
Working with MOSS search - creating scopes httpwwwsharepointblogscomssaarchive20070119working-with-sharepoint-search-part-2aspx
MOSS 2007 search customization httpblogstechnetcompavelkaarchive20070524moss-2007-search-customizationaspx
MOSS 2007 Search amp Indexing httpwwwsharepointblogscomzimmerarchive20061116moss-2007-search-and-indexingaspx
Create a custom Search Page httpwwwsharepointblogscomzimmerarchive20070825moss-2007-connect-a-custom-search-page-to-a-custom-search-scopeaspx
Appendix
Auto Classification Products Concept Searching
Auto-classifies documents for MOSS 2007 Uses established probabilistic methods to distinguish
multiword concepts and weight by importance (relevance) Extracts concepts and weights their relevance to searcher
queryndash Presents for search refinement
httpwwwconceptsearchingcomconceptHMSO (insider trading)
Integration with MOSS Extracts metadata and compound terms Incorporates with existing taxonomy if one exists Appends metadata and stores as MOSS property Part of the main MOSS index Uses standard MOSS administration features
Adjusting Relevance Property weights
Assign different weights to properties so that certain properties such as lsquoTitlersquo have a bigger influence on ranking
Change default property weights through the Schema Object Model
using MicrosoftOfficeServerSearchAdministration())
Ranking ranking = new Ranking(SearchContextGetContext( appGuid ))dump parametersforeach (RankingParameter param in rankingRankingParameters) RankingParameter lookedup = rankingRankingParameters[paramName] ConsoleWriteLine(lookedupName + + lookedupValue)Lookup by indexfor (int i = 0 i lt rankingRankingParametersCount i++) RankingParameter param = rankingRankingParameters[i] ConsoleWriteLine(paramName + + paramValue) Setting the weight of property lsquoproprsquo to lsquoweightrsquorankingRankingParameters[property]Value = floatParse(weight) rankingStartRankingUpdate(RankingUpdateTypeClickDistanceUpdate)ConsoleWrite(Updating )while (rankingStatus = RankingUpdateStatusIdle) ConsoleWrite()
SystemThreadingThreadSleep(1000) ConsoleWriteLine(Done)
Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks
PushPull Data to Users Alerts
Same alerting infrastructure for WSS and MOSS ndash Timer service is used to handle all alerts notifications
Frequency can be set to DailyWeeklyndash Notifications for search alerts will be sent according to the creation time
lsquoAlert Mersquo link can be addedremoved using a web part propertyon the Search Action Links web part and on the Search CoreResults web part
A rollup of all userrsquos alerts for a site collectionndash httpltsitecollectiongt_layoutsMySubsaspx
Alert ldquogotchasrdquondash No ldquoMy Alerts Summaryrdquo web partndash No upgrade path from SPS2003 alerts to MOSS 2007 alerts except for
WSS alert types
RSS Feeds Ability to subscribe for an RSS feed on the search results lsquoRSSrsquo link can be addedremoved using a web part property on the
Search Action Links web part and on the Search Core Results web part
Protocol Handlers
Connects to a content source and enumerates the documents
Ships with support for Web Content NTFS File Shares Exchange
Public Folders Lotus Notes Databases SharePoint Content SharePoint profiles and Business Data Catalog
Partners providing support for Documentum Hummingbird OpenText
FileNet Interwoven and others httpmsdnmicrosoftcomlibraryen-usspssd
khtml_introduction_to_a_protocol_handleraspframe=true
The Query object modelKeywordQuery request = new KeywordQuery(site)requestQueryText = strQueryrequestResultTypes |= ResultTypeRelevantResults if we want to get more than one result table requestResultTypes |= ResultTypeSpecialTermResults Setting optional parameters on the Query objectrequestRowLimit = 10requestStartRow = 0requestKeywordInclusion = KeywordInclusionAllKeywords Executing the queryResultTableCollection results = requestExecute()
Metadata Property Mapping
Crawled properties
  – Emitted by iFilters and Protocol Handlers
  – Identified by a property set (GUID) and property ID (name or numeric ID)
Managed properties
  – Mapping target for crawled properties (many-to-many)
  – Identified by internal ID
  – Friendly name used in queries
    – Can be used in the query with property:value
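The many-to-many relationship above can be modeled with two lookup tables. The sketch below is a stand-alone illustration rather than the Schema object model: CrawledPropertyKey, PropertyMap and PropertyFilter are hypothetical names, the GUIDs are generated at run time instead of being real property sets, and quoting values that contain spaces is an assumption about how the filter would be typed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model for illustration; not the MOSS Schema object model types.
public struct CrawledPropertyKey : IEquatable<CrawledPropertyKey>
{
    public readonly Guid PropertySet;
    public readonly string PropertyId; // name or numeric ID, kept as text

    public CrawledPropertyKey(Guid propertySet, string propertyId)
    {
        PropertySet = propertySet;
        PropertyId = propertyId;
    }

    public bool Equals(CrawledPropertyKey other)
    {
        return PropertySet == other.PropertySet && PropertyId == other.PropertyId;
    }

    public override bool Equals(object obj)
    {
        return obj is CrawledPropertyKey && Equals((CrawledPropertyKey)obj);
    }

    public override int GetHashCode()
    {
        return PropertySet.GetHashCode() ^ (PropertyId ?? string.Empty).GetHashCode();
    }
}

public sealed class PropertyMap
{
    private readonly Dictionary<CrawledPropertyKey, HashSet<string>> managedByCrawled =
        new Dictionary<CrawledPropertyKey, HashSet<string>>();
    private readonly Dictionary<string, HashSet<CrawledPropertyKey>> crawledByManaged =
        new Dictionary<string, HashSet<CrawledPropertyKey>>();

    // Records one mapping; either side may already take part in other mappings.
    public void Map(CrawledPropertyKey crawled, string managedName)
    {
        HashSet<string> managed;
        if (!managedByCrawled.TryGetValue(crawled, out managed))
        {
            managed = new HashSet<string>();
            managedByCrawled[crawled] = managed;
        }
        managed.Add(managedName);

        HashSet<CrawledPropertyKey> sources;
        if (!crawledByManaged.TryGetValue(managedName, out sources))
        {
            sources = new HashSet<CrawledPropertyKey>();
            crawledByManaged[managedName] = sources;
        }
        sources.Add(crawled);
    }

    public int CrawledCount(string managedName)
    {
        HashSet<CrawledPropertyKey> sources;
        return crawledByManaged.TryGetValue(managedName, out sources) ? sources.Count : 0;
    }

    public int ManagedCount(CrawledPropertyKey crawled)
    {
        HashSet<string> managed;
        return managedByCrawled.TryGetValue(crawled, out managed) ? managed.Count : 0;
    }

    // Builds a property:value filter from a managed property's friendly name.
    public static string PropertyFilter(string managedName, string value)
    {
        return managedName + ":" + (value.Contains(" ") ? "\"" + value + "\"" : value);
    }
}

public static class PropertyMappingSketch
{
    public static void Main()
    {
        CrawledPropertyKey officeAuthor = new CrawledPropertyKey(Guid.NewGuid(), "4");
        CrawledPropertyKey mailFrom = new CrawledPropertyKey(Guid.NewGuid(), "From");

        PropertyMap map = new PropertyMap();
        map.Map(officeAuthor, "Author");   // two crawled properties feed one managed property
        map.Map(mailFrom, "Author");
        map.Map(mailFrom, "Sender");       // one crawled property feeds two managed properties

        Console.WriteLine(map.CrawledCount("Author"));
        Console.WriteLine(map.ManagedCount(mailFrom));
        Console.WriteLine(PropertyMap.PropertyFilter("Author", "Jane Doe"));
    }
}
```

Keeping both directions indexed means either question (which crawled properties feed this managed property, or which managed properties does this crawled property feed) is a single lookup.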
Appendix
Auto Classification Products: Concept Searching
Auto-classifies documents for MOSS 2007
Uses established probabilistic methods to distinguish multiword concepts and weight them by importance (relevance)
Extracts concepts and weights their relevance to the searcher's query
  – Presents them for search refinement
http://www.conceptsearching.com/conceptHMSO (insider trading)
Integration with MOSS
  – Extracts metadata and compound terms
  – Incorporates with the existing taxonomy if one exists
  – Appends metadata and stores it as a MOSS property
  – Part of the main MOSS index
  – Uses standard MOSS administration features
Adjusting Relevance: Property weights
Assign different weights to properties so that certain properties, such as 'Title', have a bigger influence on ranking
Change default property weights through the Schema Object Model (using Microsoft.Office.Server.Search.Administration)