Understanding and Applying Cloud Hybrid Search
@jefffried
Jeff Fried CTO, BA Insight
we love hybrid search - it's amazing how fast usage is growing
Jeff Teper @jeffteper
About Jeff Fried
Longtime Search Nerd
Focused on Search and SharePoint since 2004
• CTO, BA Insight
• Senior PM, Microsoft
• VP, FAST
• SVP, LingoMotors
Passionate About
• Search
• SharePoint
• Search-driven applications
• Information Strategy
Blog: BAinsight.com/blog
TechNet Column: “A View from the Crawlspace”
About BA Insight
– Connectivity
– Applications
– Classification
– Analytics
KCTCS (background)
Search is not stationary
Demo
Why Hybrid SharePoint?
The Evolution of SharePoint: Hybrid
[Diagram: Experiences, Management, and Extensibility, shown for Server and for Hybrid]
Team Sites
Portals
Enterprise Content Management
BI
Search Provides a Unified View
SharePoint 2013/2016 Search Architecture
Web Service (CEWS)
“Classic” Hybrid Search is Federated
not a single result set OOB
Cloud Hybrid Search
Benefits of Cloud Hybrid Search
1) Simpler, easier, and less costly to run search
2) Makes finding content easy, wherever the content lives
SharePoint Server
(On-premises or Hosted)
Office 365
SharePoint Online Content
OneDrive for Business Content
SharePoint Content
Cloud Hybrid Search
Case Study: Split Users with SharePoint
Support
Sales & Marketing
Knowledge Articles
Fileshares
OneDrive
Support forum
SPO
Search Farm
SP 2013 content SP 2010 content
On-premises
Office 365
SPO content
SP 2013/2016
Cloud SSA
Setting up Cloud Hybrid Search
Use search verticals with Cloud Hybrid Search
SharePoint Online
Custom result source using Local SharePoint results plus a filter which excludes results from on-premises
TIP: Can be used during validation of hybrid search in the production tenant.
Result source query:
{searchTerms} NOT(IsExternalContent:1)
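The result source query above is just the user's terms plus a KQL filter. As a rough illustration, here is a minimal Python sketch of composing such templates. The helper names are hypothetical, and the on-premises-only variant is an assumed inverse of the slide's filter, not something shown in the deck. Only `{searchTerms}` and `IsExternalContent` come from the slide.

```python
SEARCH_TERMS = "{searchTerms}"  # query variable as written in the slide's result source


def online_only_template() -> str:
    """Template from the slide: exclude items that were crawled on-premises."""
    return f"{SEARCH_TERMS} NOT(IsExternalContent:1)"


def on_premises_only_template() -> str:
    """Assumed inverse of the slide's filter: keep only on-premises items."""
    return f"{SEARCH_TERMS} IsExternalContent:1"


def expand(template: str, user_query: str) -> str:
    """Illustrative stand-in for the substitution SharePoint does at query time."""
    return template.replace(SEARCH_TERMS, user_query)


if __name__ == "__main__":
    print(expand(online_only_template(), "expense policy"))
```

Keeping the filter in a result source, rather than in each query, means the same vertical can be reused while validating hybrid search in the production tenant.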
Result Sources are your friend
The Support Search vertical only searches sites that are relevant to the Support team.
It uses Local SharePoint results plus a filter on which sites to include in the search results
Result source query:
{searchTerms} (
Path:"http://sp2010" OR
Path:"file://fileshare" OR
Path:"http://demohybrid.../../supportforum")
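A site-scoped vertical like Support Search comes down to OR-ing a set of `Path:` restrictions. The sketch below shows one way to generate that template from a list of locations. The function name is hypothetical, and the example uses only the two complete paths from the slide, quoted with straight double quotes as KQL expects.

```python
from typing import Sequence


def path_filter_template(paths: Sequence[str]) -> str:
    """Build a result source query that limits results to the given locations."""
    if not paths:
        raise ValueError("at least one path is required")
    clauses = " OR ".join(f'Path:"{p}"' for p in paths)
    return "{searchTerms} (" + clauses + ")"


if __name__ == "__main__":
    # Two of the Support team's sources from the slide.
    print(path_filter_template(["http://sp2010", "file://fileshare"]))
```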
SharePoint Online Support Search
Demo
Single node topology
VM: Crawler, CPC (unused), APC (unused), Indexer (unused), QPC
Multi-node topology
VM: Crawler, QPC
VM: Crawler, CPC (unused), APC (unused), Indexer (unused), QPC
Reduce your footprint
Servers needed, by volume of content (indexable items)
Volume of content       Pattern               On-prem Search Farm   Cloud Hybrid Search
0-10 million items      small                 4 App + 2 DB          1 or 2
10-40 million items     medium                12 App + 2 DB         2
40-100 million items    large                 28 App + 4 DB         2
400 million items       XL example (SP2016)   86 App + 4 DB         2 or 3
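For quick sizing conversations, the ranged rows of the footprint table can be turned into a small lookup. This is only a sketch of the slide's numbers. The names are hypothetical, treating each upper bound as inclusive is an assumption, and the 400 million item row is left out because the slide gives it as a single example rather than a range.

```python
from typing import NamedTuple, Optional


class Footprint(NamedTuple):
    pattern: str
    on_prem_servers: str
    cloud_hybrid_servers: str


# (upper bound in items, footprint), taken from the ranged rows of the slide.
TIERS = [
    (10_000_000, Footprint("small", "4 App + 2 DB", "1 or 2")),
    (40_000_000, Footprint("medium", "12 App + 2 DB", "2")),
    (100_000_000, Footprint("large", "28 App + 4 DB", "2")),
]


def footprint_for(items: int) -> Optional[Footprint]:
    """Return the slide's footprint for an item count, or None above 100M items."""
    for upper_bound, footprint in TIERS:
        if items <= upper_bound:  # inclusive bound is an assumption
            return footprint
    return None


if __name__ == "__main__":
    print(footprint_for(25_000_000))
```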
Item Limits and Pricing
Licensing: 1M items of external content in index for every 1TB storage in O365
1TB included by default
+ 0.5 GB per licensed O365 user
No limit on number of items from O365 in the index
Default throttling at 20M external items; current threshold at 25M
2000 users x 0.5 GB = 1TB
+ 1TB default = 2 TB total
-> 2M external items indexed
+ Can also buy the “Office 365 Extra File Storage” Add-on
$0.20/GB/Month = $200/TB/Month = $200/M items/Month
50,000 users x 0.5 GB = 25TB
+ 1TB default = 26 TB total
-> 26M external items indexed
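The two worked examples follow one formula: pooled storage in TB (1 TB default plus 0.5 GB per licensed user) times 1M external items per TB. Here is a small Python sketch of that arithmetic. The function and parameter names are hypothetical, and it uses 1 TB = 1000 GB as the slide does.

```python
def external_item_quota(licensed_users: int,
                        extra_tb: float = 0.0,
                        default_tb: float = 1.0,
                        gb_per_user: float = 0.5,
                        items_per_tb: int = 1_000_000) -> int:
    """External items allowed in the index, per the licensing rule on the slide."""
    user_tb = licensed_users * gb_per_user / 1000  # slide arithmetic: 1 TB = 1000 GB
    total_tb = default_tb + user_tb + extra_tb
    return round(total_tb * items_per_tb)


def addon_monthly_cost_usd(extra_tb: float, usd_per_gb: float = 0.20) -> float:
    """Monthly cost of the extra storage add-on at the slide's $0.20/GB/month."""
    return extra_tb * 1000 * usd_per_gb


if __name__ == "__main__":
    print(external_item_quota(2000))    # the 2,000-user example
    print(external_item_quota(50_000))  # the 50,000-user example
```

The quota is separate from the throttling threshold listed above, so a large tenant may be entitled to more external items than the service currently accepts.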
SharePoint 2016 Hybrid
Cloud Hybrid
Search User Profiles Following
Extranet
Compliance (DLP/eDiscovery)
Config
Experience
Built on Search
Advantages
Disadvantages
Cloud SSA Pro/Con versus on-prem
External Content
(on-premises and/or
in the cloud)
SharePoint Server
(On-premises or Hosted)
Office 365
SharePoint Online Content
OneDrive for Business Content
Connectors
SharePoint Content
Adding External Content
Cloud Hybrid Search
Also drives:
• Office Graph (Delve, ...)
• Compliance (DLP, …)
Connectors to MANY Enterprise Systems
ERP and Portal Systems
External Content in O365 UX
Unified view across all content: on-premises and online, inside and outside SharePoint
DLP Sensitive Data Search works with hybrid
Search for sensitive data across on-premises and SharePoint Online
All Built-in sensitive types
Identification and export
Extends to data in OneDrive
Sensitive Information type detection through KQL searches
Get instant statistics
Preview & export results
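Sensitive-type detection is driven by ordinary KQL text, so the queries are easy to generate. The sketch below assumes the `SensitiveType` property and its `|count..` range suffix as described in Microsoft's documentation on querying for sensitive data. The helper name is hypothetical and the type name is only an example.

```python
from typing import Optional


def sensitive_type_query(type_name: str, min_count: Optional[int] = None) -> str:
    """Build a KQL restriction for a built-in sensitive information type.

    min_count, when given, adds an open-ended count range (assumed syntax).
    """
    value = type_name if min_count is None else f"{type_name}|{min_count}.."
    return f'SensitiveType:"{value}"'


if __name__ == "__main__":
    print(sensitive_type_query("Credit Card Number"))
    print(sensitive_type_query("Credit Card Number", min_count=5))
```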
Current Caveats:
1) Don’t see thumbnails, just file icons
2) Have to query for it to show up
Case Study: Cloud SSA, external content
Large global company
in materials science
DirSync SP 2007/2010/2013 Fileshares BCS
Cloud SSA
SPO
Search Index
Logical architecture: crawling
Corporate
network
Office 365
3rd Party Connectors
External Content
(on-premises and/or
in the cloud)
Custom
Processing
CEWS
Bottlenecks:
1) Source systems
2) Content Processing
3) Indexer
….
External Content
(on-premises and/or
in the cloud)
Bottlenecks:
1) Uplink
2) Source systems
….
Performance
500K items crawled on an Azure D3
50 DPS 100 DPS
1 hour
SCS under the hood
Crawler
Content
Indexing
API
Blob store
Document state table
Work queues
Backend
API
Index/Graph
On-Premises content source
Search farm
Azure
Broker
Crawler
Content
SPO content source
What is pushed to the SCS Endpoint?
SharePoint 2013/ 2016
FileShares
Her user token gets rehydrated with her online claims as she is authenticated against Office 365.
Cloud SSA
SPO
Search Index
Logical architecture: query
Corporate network
SP 2013
Jaden issues a query from Office 365.
Her user token contains her online identity and group memberships.
Jaden issues a query from a site on-premises. This sends over her on-premises claims to SPO
Office 365
Customizations with Cloud Hybrid Search
Cloud SSA
SUPPORTED
– Custom IFilter
– BCS connectors
– Partner connectors
NOT SUPPORTED
– Content that requires custom security trimming
SCS/O365
SUPPORTED
– Tenant level schema mapping
– Query rules
– Result sources
NOT SUPPORTED
– Site collection level schema mapping
– Custom security trimming
– Custom entity extraction
– Content enrichment web service
Issues with Cloud Hybrid Search (1)
Cloud Hybrid Search "annoyances"
Performance Characteristics
• slower query latency for on-prem queries against Cloud SSA
SharePoint Online Limitations
• no synonyms
• no site-level schema
• no full trust code access
Hybrid Administration Weaknesses
• clunky metadata mapping
• can't remove on-premises search results from Cloud SSA
• trickier to test & debug crawls
• can't reset index from Cloud SSA
Be aware of these
& compensate for them
(Fixed in August PU)
(Semi-addressed in June PU)
And it’s getting better:
Should I run index reset?
NO! DeleteAllCloudHybridSearchContent()
https://blogs.technet.microsoft.com/beyondsharepoint/2016/07/07/cloud-hybrid-search-service-application-removing-items-from-the-office-365-search-index/
Issues with Cloud Hybrid Search (2)
Limitations of Cloud SSA
Content Enrichment
• no CEWS
• no Entity Extraction
Security
• no Custom Security Trimming
• can't crawl across multiple domains
• can't crawl SP in Classic Auth Mode
Data Sovereignty
• export-restricted content can't be put in O365 index
External Content
(on-premises and/or
in the cloud)
SharePoint Server
(On-premises or Hosted)
SPO Content
OneDrive Content
Connectors
SharePoint Content
Connector
Framework
Office 365
AutoClassifier
(app version)
CEWS
Custom
Processing
Case Study: Content Enrichment
Content
Cloud SSA
Connector Framework
Indexing
Connectors
Smart Pipeline
AutoClassifier
Custom Stage A
Custom Stage B
Custom Stage C
Online
On-Prem
Cloud Hybrid Search under the covers
Security = identity sync + ACL mapping
Cloud SSA
Parse
Crawl
SCS
ACL Map Process
Blob store
queue
Directory Synchronization
SID: S-1-5-21-1212121212-1212121212-1212
msOnline-OnPremiseSecurityIdentifier: S-1-5-21-1212121212-1212121212-1212
PUID: PUID-XXXX-XXXXXXXXXX
Mapping of Access Control Lists
On-premises ACL: Allow: S-1-5-21-1212121212-1212121212-1212
Mapped ACL in Office 365: Allow: PUID-XXXX-XXXXXXXXXX
• User SIDs are mapped to PUIDs
• Group SIDs are mapped to Object IDs
• "Everyone" and "Authenticated users" are mapped to "Everyone except external users"
Only AD Users and Groups,
Only from one domain
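The ACL mapping rules above can be sketched as a lookup against the synchronized directory. This is an illustration of the rules as stated on the slide, not the service's actual implementation. The types and function are hypothetical, and collecting unmapped entries separately is an assumption made so the example can show what falls outside the single synced domain.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

EVERYONE_EXCEPT_EXTERNAL = "Everyone except external users"
BROAD_PRINCIPALS = {"Everyone", "Authenticated users"}


@dataclass(frozen=True)
class SyncedPrincipal:
    kind: str      # "user" or "group"
    cloud_id: str  # PUID for users, Object ID for groups


def map_acl(on_prem_acl: List[str],
            directory: Dict[str, SyncedPrincipal]) -> Tuple[List[str], List[str]]:
    """Map on-premises ACL entries to cloud identities.

    Returns (mapped, unmapped); unmapped holds entries with no synced identity.
    """
    mapped: List[str] = []
    unmapped: List[str] = []
    for entry in on_prem_acl:
        if entry in BROAD_PRINCIPALS:
            mapped.append(EVERYONE_EXCEPT_EXTERNAL)
        elif entry in directory:
            mapped.append(directory[entry].cloud_id)
        else:
            unmapped.append(entry)  # e.g. a SID from a domain that is not synced
    return mapped, unmapped


if __name__ == "__main__":
    sid = "S-1-5-21-1212121212-1212121212-1212"
    directory = {sid: SyncedPrincipal("user", "PUID-XXXX-XXXXXXXXXX")}
    print(map_acl([sid, "Everyone"], directory))
```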
Case Study: Crawling Cross-Domain
A global single index solution
Cloud SSA
Cloud SSA
Cloud SSA
Cloud SSA
Cloud SSA
BUT export-restricted content
can’t be in the global index
Issues with Cloud Hybrid Search OOB
Limitations of Cloud SSA
Content Enrichment
• no CEWS
• no Entity Extraction
Security
• no Custom Security Trimming
• can't crawl across multiple domains
• can't crawl SP in Classic Auth Mode
Data Sovereignty
• export-restricted content can't be put in O365 index
BA Insight Solution
Connector Framework
AutoClassifier
Connector Framework
• can 'map down' to AD groups
• can 'map across' cross-domain
• can crawl and map security
Federator
Key Considerations for Hybrid: Workloads, Environment, Data, Customizations
• Availability of features Online versus On-Premises on particular workloads
• Significant investments in customization of On-Premises workloads
• Concerns over global network performance with remote sites
• Regulatory considerations
• Manageability concerns