Deep Dive Into Shredded Storage
Bill Baer, Microsoft
Chris Givens, Architecting Connected Systems
SPC416
Chris’s Background
BS Computer Science, Math, Business
5 years at IBM
Microsoft Certified Trainer (MCT) since 2007
CISSP, CCNP, JAVA, MCSD, SharePoint 4x
CEO ACS, leading SharePoint courseware provider to Microsoft Certified Training centers
Top selling titles in Development, BI and Search
SharePoint Sr. Architect: eBay, General Atomics
Bill Baer is a Senior Technical Product Manager and Microsoft Certified Master for SharePoint in the SharePoint product group in Redmond, Washington. Having previously worked at Hewlett-Packard, Bill Baer has a proven background in infrastructure engineering and enterprise deployments of SharePoint Products and Technologies. While at Hewlett-Packard, Bill Baer received the MVP award for his contributions in the Technology Solutions Group, now known as HP Enterprise Business, which encompasses server and storage hardware, technology consulting, and software sales.
Twitter @williambaer
LinkedIn /billbaer
TechNet /b/wbaer
Bill Baer (ˈbɛər)
Senior Product Marketing Manager
SharePoint, Microsoft Corporation
www.wbaer.net
Agenda
Structured and Unstructured data
SharePoint Storage Architecture history
Shredded Storage Overview
Shredded Storage Components
Testing your knowledge (be ready to vote!)
Recommendations
Data: Structured, Unstructured
“on average 20% of data is structured, 80% is unstructured or semi-structured”
Unstructured Data
No specific format or sequence
Not tied to rules
Unpredictable
Examples: Text, Video, Audio, Images, Word, PowerPoint
Structured Data
Organized in semantic chunks (entities)
Tied to relationships and has attributes
Associated with a defined schema
All entities have the defined format
Entities have a predefined length
Example: EDI
Data in SharePoint
BLOB = Binary Large Object
BLOB is the data stream associated with a file
SharePoint file metadata and BLOBs are stored in SQL databases
BLOBs do not participate in query operations
Sample BLOB operations: Get, Put, Read range, etc.
SharePoint is built around the file: Document libraries, Record Centers
BLOBs generally represent 80% of total content
SQL BLOBs
Binary large objects stored in data tables (varbinary(MAX) in 2010 and 2013, image in 2007)
Image was limited to 2GB
varbinary(MAX) is also capped at 2^31-1 bytes (about 2GB) per value, and SharePoint enforces a 2GB file limit in code
SQL BLOBs are the traditional method of storing and retrieving binary large objects with SharePoint
BLOB Storage Challenges
Storage: SQL storage is usually more expensive; SAN versus CAS stores
Performance: Impacts load on the SQL Server box
Policy requirements: Expunge, BLOB immutability
Storage Evolution
SharePoint Storage History
SharePoint Portal Server 2001 (10.145.3941): Web Storage System
SharePoint Portal Server 2003 (11.0.5704.0): Relational Database Storage
SharePoint Server 2007: External BLOB Storage (EBS)
SharePoint Server 2010: Remote BLOB Storage (RBS)
SharePoint Server 2013: Shredded Storage (Awesome sauce)
External BLOB Storage (EBS)
Introduced in SharePoint 2007
Runs parallel to the content database
Requires a COM interface (ISPExternalBinaryProvider) to coordinate data stores
Uses simple semantics to recognize file Save and Open commands, and invokes redirection calls to the BLOB store when it recognizes BLOB data streams
Remote BLOB Storage (RBS)
RBS is the technology that allows file BLOBs to be saved outside of the database
Feature of SQL Server
Designed to separate structured (metadata) and unstructured (BLOB) data
Allows for more optimized data tiering with solutions such as StorSimple (a Microsoft-owned company)
Commonly used files reside in memory/SSD
Rules push the data down to less expensive, less performant storage based on usage
Shredded Storage
Question: What is Shredded Storage?
Simple Answer: A technology that breaks files apart into smaller chunks
Advanced Answer: A platform that other, higher-level applications can take advantage of
Logic Behind The Technology
Departure from storing files as a monolithic stream or as independently addressable BLOBs
Distributes monolithic unstructured data into chunks
Chunks are uniquely identified so the data can be recompiled into a monolithic stream to service user requests
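The chunk-and-recompile idea can be sketched in a few lines of Python. This is a minimal illustration only, assuming fixed-size chunks identified by their position in the stream; the `Shred` type and the `shred_file` and `recompile` helpers are hypothetical, and SharePoint's real shredding algorithm does not split files purely by chunk size.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shred:
    sequence: int  # position of the shred in the original stream (hypothetical ID scheme)
    digest: str    # SHA-256 of the shred body, used here as an integrity check
    data: bytes


def shred_file(blob: bytes, chunk_size: int) -> List[Shred]:
    """Split a monolithic stream into fixed-size shreds (simplified sketch)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        Shred(seq, hashlib.sha256(blob[off:off + chunk_size]).hexdigest(),
              blob[off:off + chunk_size])
        for seq, off in enumerate(range(0, len(blob), chunk_size))
    ]


def recompile(shreds: List[Shred]) -> bytes:
    """Rebuild the monolithic stream, whatever order the shreds arrive in."""
    ordered = sorted(shreds, key=lambda s: s.sequence)
    for s in ordered:
        if hashlib.sha256(s.data).hexdigest() != s.digest:
            raise ValueError(f"shred {s.sequence} is corrupt")
    return b"".join(s.data for s in ordered)


doc = b"A" * 150_000
parts = shred_file(doc, 64 * 1024)
print([len(p.data) for p in parts])  # [65536, 65536, 18928]
assert recompile(list(reversed(parts))) == doc
```

Sorting by `sequence` is what lets the shreds be stored and fetched independently while the user still receives one continuous file.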
Goals
Shredded Storage Goals
Reduce Storage
Optimize Bandwidth
Optimize File I/O
Security
Reduce Storage
Shredded storage reduces storage requirements by saving only the parts of a file that have changed
This prevents the entire file from being saved over and over again when versioning is enabled
In heavily collaborative and versioned environments, this storage saving is significant
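The versioning saving can be illustrated with a toy model of one document's history. This is a sketch under stated assumptions: the `VersionedItem` class and the fixed 64K `CHUNK` size are hypothetical, shreds are matched by content hash, and an in-place edit is used because with fixed offsets an insertion would shift, and therefore change, every later shred.

```python
import random
from hashlib import sha256
from typing import Dict, List

CHUNK = 64 * 1024  # hypothetical fixed shred size for this sketch


class VersionedItem:
    """Toy model of one document's version history with shared shreds."""

    def __init__(self, chunk: int = CHUNK) -> None:
        self.chunk = chunk
        self.shreds: Dict[str, bytes] = {}   # digest -> shred body, stored once
        self.versions: List[List[str]] = []  # each version is an ordered list of digests

    def save_version(self, blob: bytes) -> int:
        """Record a new version and return how many bytes had to be written."""
        written = 0
        manifest = []
        for off in range(0, len(blob), self.chunk):
            piece = blob[off:off + self.chunk]
            key = sha256(piece).hexdigest()
            if key not in self.shreds:  # only unseen shreds cost storage
                self.shreds[key] = piece
                written += len(piece)
            manifest.append(key)
        self.versions.append(manifest)
        return written

    def read_version(self, n: int) -> bytes:
        return b"".join(self.shreds[k] for k in self.versions[n])


v1 = random.Random(42).randbytes(4 * CHUNK)  # 256K of pseudo-random content
v2 = bytearray(v1)
v2[100_000] ^= 0xFF  # edit one byte inside the second shred

item = VersionedItem()
print(item.save_version(v1))         # 262144: first version stores every shred
print(item.save_version(bytes(v2)))  # 65536: only the changed shred is new
```

Without shredding, the second version would cost another full 256K; here it costs one 64K shred.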
Optimize Bandwidth
Higher-level applications (Office Client and Office Web Apps) take advantage of Shredded Storage and Cobalt’s bandwidth optimization
Office clients save back only the parts that have changed, not the entire file
Office Web Apps request updates from multi-user sessions where parts of the file are locked and being edited
Optimize File I/O
Reduce performance penalties related to partial file updates
Smoother I/O patterns
Ensure write costs are proportional to the size of the change
Two communication paths: Client to WFE, WFE to Database
WFE to Database
Rather than sending the entire file back to save, only the shreds that have changed are sent back for persistence
This in turn generates smaller transaction log files on your database server (think disaster recovery)
Security
Shredded storage prevents previous data-leak vulnerabilities!
It is much more difficult to get a file out of the database using simple PowerShell scripts
And now… “Fort Knox”
[Diagram: Secure Shredded Store. Contoso Sales.pptx travels over SSL, is split into 64KB shreds, and each shred is stored as its own encrypted partition in AES256 encrypted storage.]
Customer Impact
Existing customers: No downtime
New and/or modified content pushed at runtime
Asynchronous jobs update existing content
Securing content
Data is chunked, and every chunk is encrypted uniquely
Keys to chunks are stored locally on ‘dedicated hardware’
Access keys per customer/account are refreshed regularly
Certs and access data are stored separately from the data and require domain accounts specific to the customer
End-to-End Security
Moving Data: Strong SSL encryption for all Server/Client and Service/Service communications
At Rest: Network and domain isolation limits access to your environment
BitLocker encryption guards against physical theft
Secure Shredded Store guards against logical theft; it encrypts individual BLOBs to limit the scope of access
Secrets (the keys to the keys) are also encrypted at rest, held in a secure store, and updated frequently
MFA
Operational Access
Time-bound approval to perform specific actions and access customer data
Scoped access to only the minimal set of actions necessary for the task
Today, 10 engineers have standing access; we are driving this to zero
Content Database Changes
Content Database Changes
Changes were needed to support Shredded Storage:
AllDocStreams -> DocStreams
DocsToStreams (NEW)
AllDocs
AllDocVersions
Non-relational binary large objects are stored in dbo.DocStreams
Enables logical transactional consistency between relational data and the associated non-relational file contents
Smaller exaggerated storage utilization until T-log is purged
BLOB Sequence Numbers (BSN)
BSNs are used to keep track of the sequence of each BLOB
A BSN field (bigint) was added to the AllDocVersions, DocsToStreams and DocStreams tables
NextBSN stores the last BSN for each file
Streams are accessed from AllDocs/AllDocVersions -> DocsToStreams -> DocStreams
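The AllDocs -> DocsToStreams -> DocStreams path can be sketched with SQLite. Treat this as a hypothetical model: only the table names and the NextBSN idea come from the slides; every column, the `add_version` and `read_version` helpers, and the rule of reusing a BSN when a document already holds identical shred content are assumptions. In this sketch NextBSN holds the number the next new shred will receive.

```python
import sqlite3
from typing import List

# Hypothetical, heavily simplified tables; the real SharePoint schema has many more columns.
SCHEMA = """
CREATE TABLE AllDocs (DocId INTEGER PRIMARY KEY, LeafName TEXT,
                      NextBSN INTEGER NOT NULL DEFAULT 1);
CREATE TABLE DocStreams (DocId INTEGER, BSN INTEGER, Content BLOB,
                         PRIMARY KEY (DocId, BSN));
CREATE TABLE DocsToStreams (DocId INTEGER, Version INTEGER, Seq INTEGER, BSN INTEGER,
                            PRIMARY KEY (DocId, Version, Seq));
"""


def add_version(conn: sqlite3.Connection, doc_id: int, version: int,
                shreds: List[bytes]) -> None:
    """Map a version to its shreds, reusing BSNs for content this document already has."""
    (next_bsn,) = conn.execute(
        "SELECT NextBSN FROM AllDocs WHERE DocId = ?", (doc_id,)).fetchone()
    for seq, data in enumerate(shreds):
        row = conn.execute(
            "SELECT BSN FROM DocStreams WHERE DocId = ? AND Content = ?",
            (doc_id, data)).fetchone()
        if row is None:  # new content: store it under the next sequence number
            bsn = next_bsn
            next_bsn += 1
            conn.execute("INSERT INTO DocStreams VALUES (?, ?, ?)", (doc_id, bsn, data))
        else:
            bsn = row[0]
        conn.execute("INSERT INTO DocsToStreams VALUES (?, ?, ?, ?)",
                     (doc_id, version, seq, bsn))
    conn.execute("UPDATE AllDocs SET NextBSN = ? WHERE DocId = ?", (next_bsn, doc_id))


def read_version(conn: sqlite3.Connection, doc_id: int, version: int) -> bytes:
    """Walk AllDocs -> DocsToStreams -> DocStreams and reassemble in shred order."""
    rows = conn.execute(
        """SELECT s.Content FROM AllDocs d
           JOIN DocsToStreams m ON m.DocId = d.DocId
           JOIN DocStreams s ON s.DocId = m.DocId AND s.BSN = m.BSN
           WHERE d.DocId = ? AND m.Version = ? ORDER BY m.Seq""",
        (doc_id, version)).fetchall()
    return b"".join(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO AllDocs (DocId, LeafName) VALUES (1, 'Contoso Sales.pptx')")
add_version(conn, 1, 1, [b"header", b"slide1", b"footer"])
add_version(conn, 1, 2, [b"header", b"slide1-edited", b"footer"])
print(read_version(conn, 1, 2))  # b'headerslide1-editedfooter'
```

Version 2 adds a single DocStreams row (BSN 4) and points at the existing BSNs 1 and 3 for the unchanged header and footer.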
Demo
Content Database Changes
Client Applications
Protocols, Impacts, Considerations
Overview (Windows Native SOAP Stack)
Windows Native SOAP Stack
Design pattern based on wire contracts and loosely coupled systems
Standards-based and interoperable
Consolidation of existing stacks: MIG, WDPG (Windows), ATL Server (VC++), SOAP Toolkit (Office), SQL Server
WCF ‘light’
Peer to WCF, not a replacement
WCF does not layer on the Windows Native SOAP Stack
Small, fast, minimal dependencies
Windows layer 20
Win32 Developer APIs
Public “flat” C API
No MFC, ATL, COM, C++
Why is Sapphire important?
Client operations
Office Backstage
Store, Share, Sync
FileWriteChunkSize versus FileReadChunkSize
Download size versus partition size
CellStorage.svc
WCF endpoint that manages download and upload of files to SharePoint
Used by Office clients and OneDrive for Business
API layer that implements locking and co-authoring of documents
Demo
CellStorage.svc
Office Web Apps
OneNote.ashx
Office documents are transformed into JSON arrays
Each section of the document is distinguished and tracked separately
This allows for multi-user editing in OWA
Shredded Storage creates new partitions and shreds based on the document sections users are editing
Demo
OneNote.ashx
Configuration
Configuration Parameters
FileWriteChunkSize: The target size of the shreds of a file binary
FileReadChunkSize: The size of the data returned from each stored procedure call to a file binary
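Because each stored procedure call returns at most FileReadChunkSize bytes, the number of round trips per download is a simple ceiling division. The `read_calls` helper below is a hypothetical illustration that ignores per-call overhead and any header or footer metadata.

```python
def read_calls(file_size: int, read_chunk_size: int) -> int:
    """Stored procedure calls needed to read file_size bytes, at most
    read_chunk_size bytes per call (integer ceiling division)."""
    if file_size < 0 or read_chunk_size <= 0:
        raise ValueError("sizes must be non-negative and chunk size positive")
    return (file_size + read_chunk_size - 1) // read_chunk_size


MB = 1024 * 1024
for chunk in (64 * 1024, 1 * MB, 4 * MB):
    print(chunk, read_calls(10 * MB, chunk))
# 65536 160
# 1048576 10
# 4194304 3
```

A 10MB file read in 64K pieces needs 160 calls versus 10 at 1MB, which is the trade-off behind the guidance on FileReadChunkSize.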
FileWriteChunkSize
This value should not exceed 4MB; above that, a significant hit on I/O will occur
The value should not be set lower than 64K
Optimal setting will be based on workload
1-4MB (depending on performance testing, RBS, Dedup)
OneDrive is set to 2MB
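The bounds above can be captured in a small check. This is a hypothetical helper, assuming 64K means 64 × 1024 bytes and 4MB means 4 × 1024 × 1024 bytes; it only encodes the slide's guidance and does not read or set the value in a farm.

```python
KB = 1024
MB = 1024 * KB


def check_write_chunk_size(size: int) -> str:
    """Classify a FileWriteChunkSize value (bytes) against the deck's guidance."""
    if size < 64 * KB:
        return "too low: below the 64K floor"
    if size > 4 * MB:
        return "too high: above 4MB, expect a significant I/O hit"
    return "within guidance: tune between 64K and 4MB by performance testing"


print(check_write_chunk_size(2 * MB))  # OneDrive's 2MB setting falls within guidance
```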
FileReadChunkSize
Recommendations (x = FileReadChunkSize as a percentage of average file size):
x > 12.5%: normal operation
6% < x < 12.5%: 10% hit on read operations
3% < x < 6%: 20% hit on read operations
x < 3%: 50% hit on read operations
The average size of files in an out-of-the-box content database is < 64K
Beware: with too high a setting, OneDrive for Business will stop working
ICsiError: csierrWebService_QuotaExceeded (0x662)
What is your average file size in your content databases?
This average will drive your setting of FileReadChunkSize
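Once you know the average file size, the read-penalty bands from the recommendations can be applied directly. The `read_penalty` helper is a hypothetical sketch: it returns the slide's approximate hit as a fraction, and it assigns values that fall exactly on a band edge to the worse band, which the slide does not specify.

```python
def read_penalty(read_chunk_size: int, avg_file_size: int) -> float:
    """Approximate read-operation hit from the slide's bands
    (0.10 means roughly 10% slower reads). Exact boundary values are
    assigned to the worse band, an assumption the slide does not settle."""
    if read_chunk_size <= 0 or avg_file_size <= 0:
        raise ValueError("sizes must be positive")
    ratio = read_chunk_size / avg_file_size
    if ratio > 0.125:
        return 0.0
    if ratio > 0.06:
        return 0.10
    if ratio > 0.03:
        return 0.20
    return 0.50


MB = 1024 * 1024
for chunk_kb in (256, 100, 40, 16):
    print(chunk_kb, read_penalty(chunk_kb * 1024, 1 * MB))
# 256 0.0
# 100 0.1
# 40 0.2
# 16 0.5
```

Put another way, staying in the normal band means FileReadChunkSize should exceed one eighth of the average file size; for the out-of-the-box average of under 64K, that works out to anything above 8K.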
Shredded Storage Testing Framework
Shredded Storage Testing Framework
Tool developed by Chris Givens with support from SharePoint ISVs
http://shreddedstorage.codeplex.com/
Monitoring features include:
Office Client to WFE
WFE to SQL traffic
SQL activities
Achieved with:
Fiddler integration
SQL Profiler integration
Supporting result tracking database
Demo
Shredded Storage Testing Framework
Knowledge Check
Question #1
Can you disable Shredded Storage?
Answer #1
No, it cannot be disabled
Setting the Write Chunk Size to 2GB will not disable it and will only cause performance issues
Any other “unknown” means will destroy your farm
Question #2
TRUE or FALSE: If you set the File Write size to 128K, will all the shreds be 128K except for the last?
Answer #2
FALSE
The algorithms do not break up the shreds based solely on the File Write Chunk size
In some cases, the header and footer of the shred will be of varying size
Question #3
TRUE or FALSE: If you set the File Write size to X, will all the shreds be smaller than X?
Answer #3
FALSE
Similarly, the algorithms do not break up the shreds based solely on the File Write Chunk size
The header and footer of the shred will be of varying size. This metadata does not count toward the File Write Chunk size
Question #4
Is a lower or higher FileReadChunkSize better for download speeds?
Answer #4
Higher
Each chunk is retrieved via a stored procedure call; the more calls you make, the more CPU and network activity is generated
But not TOO high
Question #5
TRUE or FALSE: Images in Word and PowerPoint files are broken into their own shreds
Answer #5
FALSE
Images are not distinguished from other entities inside the Office file XML; therefore, they are not shredded separately
Question #6
TRUE or FALSE: Shredded Storage applies across all instances of the same file binary (i.e., the same file binary uploaded to multiple libraries)
Answer #6
FALSE
Shredded storage works at the SPListItem level. Each time you upload a file to a document library, a new SPListItem is created; therefore, there is no dedup across libraries
Question #7
TRUE or FALSE: Changing the TITLE property of a versioned document will cause new shreds to be created
Answer #7
TRUE
Even though you did not change the file, a new set of shreds is created for the new file version!
This is a side effect of the SharePoint platform, not a bug in shredded storage
The Title property is special: SharePoint sets the property in the binary of the Office file upon modification (you didn’t change the file, but SharePoint did)
Question #8
What is the maximum FileWriteChunkSize at which everything still works?
Answer #8
8.25MB
If you exceed this value, OneDrive for Business will error out with the following:
ICsiError: csierrWebService_QuotaExceeded (0x662)
Recommendations
Don’t modify FileWriteChunkSize without justification (keep it under 4MB)
FileReadChunkSize should be proportional to your average file size (dependent on your workload)
Test your RBS and storage vendors’ hardware and software for acceptable performance
Summary
Shredded Storage is AWESOME!
Shredded Storage adds security to SharePoint: Fort Knox
Read and Write chunk sizes will differ by workload
You cannot disable Shredded Storage
Shredded Storage in combination with RBS and file dedup should always be tested for performance
Questions?
What do you want to know about Shredded Storage?
Events
Evening Event, 7pm
Survey
Don’t forget to fill out your survey!
Session SPC416
Contact
Bill Baer
Twitter: @williambaer
Email: [email protected]
Chris Givens
Twitter: @givenscj
Email: [email protected]
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.