Storage Developer Conference 2009 © 2009 Insert Copyright information here. All rights reserved.
Using Data Classification to Manage File Servers
Adi Oltean – Senior SDE, Microsoft Corporation
Ran Kalach – Principal Dev Manager, Microsoft Corporation
Agenda
Customer challenges
Solution: File Classification
• Manage data based on business value
• Grow the ecosystem in classification solutions
File Classification Infrastructure
• The classification pipeline
• Aggregation, conflict resolution
• Incremental classification
Challenges, Mitigations & Best Practices
Conclusions
Customer challenges – file servers
Storage growth
Storage cost
Compliance
Security and information leakage
Data sharing and search
Replication
Backup
HSM
Security
Archive
Encryption
Expiration
Increasing data management needs / many data management tools
File shares and business requirements
Participants: IT, Business
• Need a share per project
• Make sure high business impact files do not leak out
• Back up files with personal information to an encrypted store
• Expire low business impact files created three years ago and not touched for a year
Some time later …
Classify and apply policy
Step 1: Classify data
Classification methods:
• Manual
• Line of business application
• Automatic classification (location, content, owner)
• IT scripts
Step 2: Apply policy based on classification
Actions based on classification:
• Backup
• Archive
• Reports
• HSM
• Expiration
• Replication
• Security
• Encryption
• Search
• Leakage prevention
File shares and business requirements
Participants: IT, Business
Classification properties: Personal Information, Business Impact
• Need a share per project
• Make sure high business impact files do not leak out
• Back up files with personal information to an encrypted store
• Expire low business impact files created three years ago and not touched for a year
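The expiration requirement above reads naturally as a predicate over classification properties and file timestamps. The following is a minimal sketch, not FCI code: the `FileInfo` type, its field names, the "LBI" value and the use of last-access time for "not touched" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Hypothetical record combining classification with file timestamps."""
    path: str
    business_impact: str      # assumed values: "LBI", "MBI", "HBI"
    created: datetime
    last_accessed: datetime   # assumption: "touched" means last access


def should_expire(f: FileInfo, now: datetime) -> bool:
    """Low business impact, created >= 3 years ago, untouched for >= 1 year."""
    three_years = timedelta(days=3 * 365)
    one_year = timedelta(days=365)
    return (
        f.business_impact == "LBI"
        and now - f.created >= three_years
        and now - f.last_accessed >= one_year
    )
```

The point of the sketch is that the policy never inspects file content itself; it only consumes a property that the classification step has already set.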
Customer benefits - Summary
Reduce cost
• Expire files to reduce storage purchasing needs
• Move files to less expensive storage
• Optimize backup SLAs
• Replicate only business related files
Manage risk
• Find sensitive files on public servers
• Watermark documents
• Keep files containing personal information encrypted in backup
• Apply rights management to high secrecy files
• Comply with retention policies
Apply policies based on classification = Manage data based on business value!
File Classification Infrastructure
File Classification extensibility points:
• Discover data
• Extract classification properties
• Classify data
• Store classification properties
• Apply policy based on classification
APIs for external applications:
• Get classification properties
• Set classification properties
Classification pipeline – an example
Processes: one Classification Runtime Process, three Hosting Processes
Stages: discovery -> load properties -> classification -> save properties -> run policies
Modules, in pipeline order:
• Scanner: gets basic file properties
• Office Storage [Load]
• Folder Classifier
• Content Classifier
• Office Storage [Save]
• Reporting Engine
Each component passes property bags (property bag objects) to the next one
Most modules are hosted within a separate process
Property bags can cross processes
• Security checks are performed on cross-process data transfers
This is an example of a pipeline setup with one storage module and two classifiers
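The pipeline described above can be illustrated as a chain of functions that each receive a property bag and hand it to the next module. This is a single-process sketch only: the module names mirror the slide, but the dictionary-based bag, the in-memory `STORED` store, the example path and the matching rules are all hypothetical, and the real infrastructure hosts modules in separate processes.

```python
from typing import Callable, Dict, List

PropertyBag = Dict[str, str]
Module = Callable[[PropertyBag], PropertyBag]

# Hypothetical stand-ins for properties saved in files and for report output.
STORED: Dict[str, Dict[str, str]] = {}
REPORT: List[str] = []
CLASSIFICATION_KEYS = ("Department", "PersonalInformation")


def scanner(bag: PropertyBag) -> PropertyBag:
    # Discovery: record basic file properties only.
    bag["Name"] = bag["Path"].rsplit("/", 1)[-1]
    return bag


def storage_load(bag: PropertyBag) -> PropertyBag:
    # Load properties previously stored with the file, if any.
    bag.update(STORED.get(bag["Path"], {}))
    return bag


def folder_classifier(bag: PropertyBag) -> PropertyBag:
    # Classify by location (made-up rule).
    if bag["Path"].startswith("/shares/finance/"):
        bag["Department"] = "Finance"
    return bag


def content_classifier(bag: PropertyBag) -> PropertyBag:
    # Classify by content (made-up rule).
    if "SSN" in bag.get("Content", ""):
        bag["PersonalInformation"] = "Yes"
    return bag


def storage_save(bag: PropertyBag) -> PropertyBag:
    # Save only the classification properties back to the store.
    STORED[bag["Path"]] = {k: bag[k] for k in CLASSIFICATION_KEYS if k in bag}
    return bag


def reporting_engine(bag: PropertyBag) -> PropertyBag:
    # Run policies: here, just append a report line.
    REPORT.append(f'{bag["Path"]}: {STORED.get(bag["Path"], {})}')
    return bag


PIPELINE: List[Module] = [
    scanner, storage_load, folder_classifier,
    content_classifier, storage_save, reporting_engine,
]


def run_pipeline(path: str, content: str,
                 modules: List[Module] = PIPELINE) -> PropertyBag:
    bag: PropertyBag = {"Path": path, "Content": content}
    for module in modules:
        bag = module(bag)
    return bag
```

As on the slide, the sketch uses one storage module (split into load and save) and two classifiers; swapping or adding a classifier only changes the `PIPELINE` list.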
Aggregation and Conflict Resolution
Problem:
• A classification rule may provide a value that conflicts with the value already stored in the file
• Two classification rules may provide conflicting values for the same property
• Example:
  - Admin creates a “Business Impact” property with possible values (LBI, MBI, HBI), i.e. low, medium and high business impact
  - A file previously classified as MBI is copied to a folder x:\foo
  - The Folder rule for x:\foo classifies all files as LBI
  - The Content classifier scans the file and classifies it as HBI
  - What is the correct value?
Solution:
• Provide several types of classification rules:
  - Default: the rule runs only if the property is not present in the file
  - Otherwise: rules can either explicitly aggregate or overwrite previously stored properties
• Value aggregation depends on the property type
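The rule types above can be sketched with the Business Impact example. The mode names ("default", "overwrite", "aggregate") and the choice to aggregate an ordered property by keeping the higher-ranked value are assumptions for illustration; the slide only states that aggregation depends on the property type.

```python
from typing import Dict

# Assumed ordering for the example property: low < medium < high.
ORDER = ["LBI", "MBI", "HBI"]


def aggregate_ordered(a: str, b: str) -> str:
    """Assumed aggregation for an ordered property: keep the higher value."""
    return max(a, b, key=ORDER.index)


def apply_rule(props: Dict[str, str], name: str, value: str,
               mode: str) -> Dict[str, str]:
    """Apply one rule result to a file's properties under the given mode."""
    current = props.get(name)
    if current is None:
        props[name] = value                  # nothing stored: any rule may set it
    elif mode == "default":
        pass                                 # property present: default rule is skipped
    elif mode == "overwrite":
        props[name] = value
    elif mode == "aggregate":
        props[name] = aggregate_ordered(current, value)
    else:
        raise ValueError(f"unknown rule mode: {mode}")
    return props


# The slide's scenario: stored MBI, folder rule says LBI, content rule says HBI.
props = {"BusinessImpact": "MBI"}
apply_rule(props, "BusinessImpact", "LBI", "aggregate")
apply_rule(props, "BusinessImpact", "HBI", "aggregate")
```

Under these assumptions the aggregated result is HBI; with both rules in "default" mode the stored MBI would be left untouched, which is why the answer to "what is the correct value?" depends on the rule types the administrator picks.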
Incremental Classification
Goal: Minimize re-classification of already classified files
• Crucial for scalability (large number of files)
Automatic classification (scheduled)
• Cache classification results in an ADS (alternate data stream)
  - The ADS contains a hash of certain file properties (last-modify-time, file-path, file-id, etc.)
  - The ADS contains the last classification time
  - Together these allow determining whether the cached classification is up to date
• Re-classify the file only if:
  - The file changed or was added since the previous classification (hash is different), or
  - A rule has changed since the previous classification, or
  - The configuration of a classifier has been updated since the previous classification
Get Property API (on-demand)
• If the cache is present and up to date, return cached properties
• Otherwise (out-of-date classification), the application can choose:
  - Accuracy: classify the file “on the fly”
  - Performance: return stored properties
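The cache check above can be sketched as follows. This is an illustration under stated assumptions, not the actual ADS layout: `CacheEntry`, the SHA-256 fingerprint over path, last-modify time and file id, and the `prefer_accuracy` flag are hypothetical names chosen for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CacheEntry:
    """Hypothetical content of the cached classification record."""
    file_hash: str            # hash of selected file properties
    classified_at: float      # last classification time (epoch seconds)
    properties: Dict[str, str]


def file_fingerprint(path: str, last_modified: float, file_id: int) -> str:
    """Hash the file properties that signal a change (assumed selection)."""
    raw = f"{path}|{last_modified}|{file_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def needs_reclassification(entry: Optional[CacheEntry], fingerprint: str,
                           rules_changed_at: float,
                           classifier_config_changed_at: float) -> bool:
    if entry is None:
        return True                                   # file was added
    if entry.file_hash != fingerprint:
        return True                                   # file changed
    if rules_changed_at > entry.classified_at:
        return True                                   # a rule changed
    return classifier_config_changed_at > entry.classified_at


def get_properties(entry: Optional[CacheEntry], fingerprint: str,
                   rules_changed_at: float, classifier_config_changed_at: float,
                   classify: Callable[[], Dict[str, str]],
                   prefer_accuracy: bool) -> Dict[str, str]:
    """On-demand lookup: the caller picks accuracy or performance when stale."""
    stale = needs_reclassification(entry, fingerprint, rules_changed_at,
                                   classifier_config_changed_at)
    if not stale and entry is not None:
        return entry.properties
    if prefer_accuracy:
        return classify()                             # classify "on the fly"
    return entry.properties if entry is not None else {}
```

Hashing only cheap metadata keeps the freshness check far less expensive than re-reading file content, which is what makes scheduled runs over large file counts practical.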
Challenges, Mitigations & Best Practices
1 - Performance
Content classification is expensive (I/O, CPU)
• Must optimize to scan and classify only when needed
• Must be able to cache results
Minimize performance impact on the host of the data being classified
• Classify on another machine
• When classifying locally, throttle machine resource usage and back off when the machine becomes non-idle
• Be smart about how you schedule classification; support pause/resume
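The advice to back off when the machine is busy and to support pause/resume can be sketched as a loop that returns a resume position. The `is_idle` callback and the index-based checkpoint are assumptions for illustration; how idleness is actually detected is left open here.

```python
from typing import Callable, Sequence


def classify_while_idle(files: Sequence[str],
                        classify: Callable[[str], None],
                        is_idle: Callable[[], bool],
                        start: int = 0) -> int:
    """Classify files[start:] while the machine is idle.

    Returns the index to resume from; equal to len(files) when done.
    """
    i = start
    while i < len(files) and is_idle():
        classify(files[i])
        i += 1
    return i
```

A scheduler would store the returned index and call the function again with `start` set to it once the machine is idle, so an interrupted run never repeats work already done.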
Challenges, Mitigations & Best Practices
2 - Accuracy
Automatic classification can almost never be 100% accurate
• Tune your rules for false positives / false negatives according to the scenario
  - Example: when securing files, err toward false positives; when expiring files, err toward false negatives
• Policy execution: be able to revert in case of classification error
  - Example: back up files one last time just before you expire them
• Examine classification results periodically
  - Modify your rules or classifiers until they are optimized for your data set
• Enable manual classification
• Use a clear and consistent policy for aggregating and resolving conflicts
• Support flexible rules that allow tuning by administrator or application
  - One answer doesn’t fit all!
Challenges, Mitigations & Best Practices
3 - Real-time Classification and Policies
Some policies require real-time or near real-time execution
• Example: removing a confidential file from an unsecured share
Solution: event-based classification
• File-system activity can be a trigger
• Need a hook into file-system operations (many implementation options exist)
• Consider classifying only when the file content is “stable”
• Avoid overloading the server with overly aggressive classification
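One way to read "classify only when the file content is stable" is to wait for a quiet period after the last write event. The sketch below assumes that interpretation; the `StableFileQueue` class, its method names and the quiet-period threshold are hypothetical, and the file-system hook that would call `on_write` is not shown.

```python
from typing import Dict, List


class StableFileQueue:
    """Collect write events and release files once they stop changing."""

    def __init__(self, quiet_seconds: float) -> None:
        self.quiet_seconds = quiet_seconds
        self._last_write: Dict[str, float] = {}

    def on_write(self, path: str, now: float) -> None:
        # Called from a file-system hook; a new write restarts the quiet period.
        self._last_write[path] = now

    def pop_stable(self, now: float) -> List[str]:
        # Return (and forget) files with no writes for at least quiet_seconds.
        ready = sorted(p for p, t in self._last_write.items()
                       if now - t >= self.quiet_seconds)
        for p in ready:
            del self._last_write[p]
        return ready
```

Because repeated writes to the same file collapse into one pending entry, a burst of activity leads to a single classification, which also limits the load that event-based classification puts on the server.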
Examples of FCI-enabled solutions
Solution | Example
Classification solutions | An LOB app that maintains special classification rules for PII data it generates
Custom “classifiers” that extract metadata from files | A medical imaging classifier extracts embedded metadata from scanned images
Custom “storage modules” that load/store custom metadata in files | Load/store metadata in your custom file formats (example: videos)
Add “classification awareness” to existing data management solutions | A backup app can have special backup policies for HBI data
Build “intelligent” policy-based data management solutions | Define a policy to automatically encrypt HBI data
Opportunities for you
Why participate in the File Classification Infrastructure ecosystem?
• Use FCI for existing software
  - Enhance existing data-producing apps to also attach classification to generated files (e.g. LOB applications)
  - Enhance existing data management apps to consume classification
• Use FCI for new software solutions
  - Develop solutions on top of FCI
  - Develop components for the FCI ecosystem: classifiers, storage modules
How can I develop against it?
• File Classification Infrastructure can be consumed through a rich, scriptable COM API
• FCI can be extended using C++/C# code or PowerShell scripts
When can I start?
• Now: FCI is part of the latest Server releases (starting with Windows Server 2008 R2)
More information about FCI
General information
• Home page: http://www.microsoft.com/windowsserver2008/en/us/fci.aspx
• Team blog: http://blogs.technet.com/filecab
• API documentation on MSDN: http://msdn.microsoft.com/en-us/library/bb972746(VS.85).aspx
Sample code
• Windows SDK: http://msdn.microsoft.com/en-us/windows/bb980924.aspx
  - Sample FCI clients (C++, C#)
  - Sample classifiers (C++, C#)
• Code Gallery: http://code.msdn.microsoft.com/fci