Developing Enterprise Policy Over big Data © 2013 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
• Any slide or slides used must be reproduced in their entirety without modification.
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Total Enterprise Data Growth
IDC estimates that the volume of digital data is growing at 40% to 50% per year. By 2020, IDC predicts the total will have reached 40 zettabytes (ZB). According to Gartner, 42% of survey respondents state they will have invested in Big Data by 2014.
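To put the growth rates above in perspective, the sketch below compounds a fixed annual growth rate over several years. The 100 TB starting volume and the function names are illustrative assumptions, not figures from IDC or Gartner.

```python
import math


def projected_volume(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume (in TB) by a fixed annual growth rate."""
    return start_tb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the volume to double at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # 100 TB is a hypothetical starting point chosen only for illustration.
    for rate in (0.40, 0.50):
        volume = projected_volume(100, rate, 5)
        print(f"{rate:.0%} per year: 100 TB grows to {volume:.0f} TB in 5 years, "
              f"doubling every {doubling_time_years(rate):.1f} years")
```

At these rates a store roughly doubles every two years or less, which is why capacity planning alone tends not to keep up with unmanaged growth.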
Big Data Impact
IT Budget
IT Legal & Compliance Support
IT Satisfaction
Reporting and Analytics
IT tools are not designed to provide file-level insight and analytics against big data.
• Access logs: security information; no detailed file information
• Block level/capacity planning: no insight into file-level data
• File system metadata: light metadata
• File metadata: full metadata and duplicate information (MD5); full content (optionally)
Business units need file level knowledge to determine disposition
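As a rough illustration of what file-level metadata with duplicate information (MD5) can look like, the sketch below walks a directory tree, records basic metadata per file, and groups files that share an MD5 digest. The function names and record layout are assumptions made for this example; they are not the indexing platform described in this tutorial.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_tree(root: str) -> list[dict]:
    """Collect size, timestamps and an MD5 signature for every file under root."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            records.append({
                "path": path,
                "size": info.st_size,
                "modified": info.st_mtime,
                "accessed": info.st_atime,
                "md5": md5_of_file(path),
            })
    return records


def find_duplicates(records: list[dict]) -> dict[str, list[str]]:
    """Group paths by MD5 and keep only digests shared by more than one file."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["md5"]].append(record["path"])
    return {md5: sorted(paths) for md5, paths in by_hash.items() if len(paths) > 1}
```

MD5 is used here only because the slide names it as the duplicate signature; it identifies identical content but is not suitable as a security control.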
Big Unstructured Data
Traditional unstructured data includes machine-generated content such as transactional data and logs. Including user-generated data such as email and documents adds value and knowledge to big data. Data is an asset, not a liability. Mining value from unstructured user data will add knowledge and valuable intellectual property to Big Data analytics.
Big Data vs. Better Data
• Profile big data using reports and analysis
• Classify data into groups: Business Value, No Business Value, Redundant, Risk
• Work with business units to determine disposition
• Manage big data more intelligently
• Purge what is no longer required
• Turn Big Data into Better Data
Data Profiling Technology
High-speed indexing platform
• Integrates into storage environment
• Maintains index of unstructured content
Access to all sources
• Primary and secondary storage
• Support legacy backup tape data
Enterprise ready
• Cost effective
• Easy to incrementally deploy
Index Engines
What is Metadata?
• File properties (user files): dates (modified, accessed, created), size, file name/path, author/owner, signature
• Email properties: dates (sent, received, deleted), to/from/cc/bcc, mailboxes/folders, signature
• Backup properties: backup time, server, volume/backup set
• Content (optional): full text, PII (SSN/CCN)
• Location: servers, tapes, desktops
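The optional content pass for PII (SSN/CCN) can be approximated with pattern matching. The sketch below is a minimal example, assuming US-style SSNs written as NNN-NN-NNNN and card numbers of 13 to 19 digits checked with the Luhn algorithm; the patterns and function names are assumptions for illustration and will produce both false positives and misses on real data.

```python
import re

# Assumed formats: SSN as NNN-NN-NNNN, card numbers as 13-19 digits
# optionally separated by single spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True when the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return bool(digits) and checksum % 10 == 0


def scan_for_pii(text: str) -> dict[str, list[str]]:
    """Return candidate SSNs and Luhn-valid card numbers found in text."""
    ssns = SSN_PATTERN.findall(text)
    cards = [match.group(0) for match in CARD_PATTERN.finditer(text)
             if luhn_valid(match.group(0))]
    return {"ssn": ssns, "ccn": cards}
```

A hit from a scan like this is best treated as a flag for review (for example, tagging the file as Risk) rather than as proof that the file contains PII.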
Unstructured Data Profiling
Data sources: user shares, department servers, email, backup tapes
• Scan data sources using high-speed indexing technology
• Generate a rich metadata index that is incrementally updated
• Extract file and email metadata from unstructured user data:
  Dates: modified, accessed, created
  User: author, To/CC/BCC
  Location: path, file name, size
  Backup: backup set, backup date, backup host
• Combine queries/filters and reports on the metadata index
• Make decisions about data (disposition)
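One way to picture a metadata index that is incrementally updated is to re-scan the source and touch only entries whose size or modification time changed. The sketch below is an assumption-laden illustration: the index is a plain dictionary keyed by path, it is assumed to cover a single root, and `update_index` is a hypothetical name rather than part of any product.

```python
import os


def update_index(index: dict[str, dict], root: str) -> dict[str, list[str]]:
    """Refresh `index` in place and report what changed since the last scan.

    Assumes `index` only holds entries for files under `root`.
    """
    changes = {"added": [], "updated": [], "removed": []}
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            entry = {"size": info.st_size, "modified": info.st_mtime_ns}
            seen.add(path)
            if path not in index:
                changes["added"].append(path)
            elif index[path] != entry:
                changes["updated"].append(path)
            else:
                continue  # unchanged since the last scan, nothing to re-index
            index[path] = entry
    # Anything indexed earlier but not seen in this scan has been deleted or moved.
    for path in list(index):
        if path not in seen:
            del index[path]
            changes["removed"].append(path)
    return changes
```

Only the paths reported as added or updated would need the more expensive steps, such as hashing or full-text extraction.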
Performance/Formats
• NFS/CIFS crawling or NDMP
• Bandwidth can be throttled
• Incremental updates
• Schedule can be defined
Formats Supported
• Unstructured user data
• Email (Exchange, Notes)
Profiling Options
• Light: metadata only
• Full: full content/text
Active Directory/LDAP Integration
Integrates with Active Directory (AD) and Lightweight Directory Access Protocol (LDAP) to take advantage of user and group information
• Active users vs. inactive group
• Departmental groups
• Reports summarized based on groups
• Supports chargebacks by department
Security audits
• Query user ACLs to determine read/write/browse access
Data Classification Policies
• Abandoned: owned by ex-employees and no access in years
• Aged: not accessed in more than three years
• Redundant: duplicate content
• Personal: multimedia files such as iTunes and movies
• Risk: sensitive content such as PII and legal hold
• Archive: data with long-term business value (Value Based Archives)
• Active: manage data in place to determine future disposition
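The classification policies above can be expressed as an ordered set of rules over file metadata. The sketch below is one possible reading: the order of precedence, the three-year threshold applied to Abandoned data, the list of multimedia extensions, and the `FileRecord` fields are all assumptions made for illustration, since the slide does not specify them.

```python
import os
from dataclasses import dataclass

AGED_DAYS = 3 * 365  # "not accessed in more than three years"
PERSONAL_EXTENSIONS = {".mp3", ".m4a", ".mp4", ".mov", ".avi"}  # assumed list


@dataclass
class FileRecord:
    """Hypothetical metadata record; field names are illustrative."""
    path: str
    owner_is_active: bool
    days_since_access: int
    is_duplicate: bool = False
    has_sensitive_content: bool = False
    has_business_value: bool = False


def classify(record: FileRecord) -> str:
    """Return the first matching policy class, in an assumed order of precedence."""
    extension = os.path.splitext(record.path)[1].lower()
    if record.has_sensitive_content:
        return "Risk"
    if record.has_business_value:
        return "Archive"
    if not record.owner_is_active and record.days_since_access > AGED_DAYS:
        return "Abandoned"
    if record.is_duplicate:
        return "Redundant"
    if extension in PERSONAL_EXTENSIONS:
        return "Personal"
    if record.days_since_access > AGED_DAYS:
        return "Aged"
    return "Active"
```

Risk is checked first here so that sensitive or legal-hold content is never routed to a delete or migrate action by a later rule; a real policy would set this order together with legal and compliance.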
Data Profiling in Action
[Figure: 100TB of unstructured user data, broken down by owner: administrator, john.doe, sally.smith, bugs.bunny, clark.kent]
Cleaning Up Admin Owned Files
• Reassign owner based on metadata properties, location, content, file name, etc.
• Reassign owner based on location path, extract path info into file name.
• Tag content based on metadata properties
Owner = Admin
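Reassigning admin-owned files based on location path might look like the sketch below. It assumes a hypothetical share layout in which a user's name appears as a directory in the path, and a known-user list that would normally come from a directory service; neither is specified in this tutorial.

```python
from pathlib import PurePosixPath

# Hypothetical inputs: in practice these would come from AD/LDAP.
KNOWN_USERS = {"john.doe", "sally.smith", "bugs.bunny", "clark.kent", "daffy.duck"}
ADMIN_OWNERS = {"administrator", "admin"}


def infer_owner(path: str, current_owner: str) -> str:
    """Replace an admin owner with a user name found in the path, if any."""
    if current_owner.lower() not in ADMIN_OWNERS:
        return current_owner  # already owned by a real user, leave as is
    for part in PurePosixPath(path).parts:
        if part.lower() in KNOWN_USERS:
            return part.lower()
    return current_owner  # no evidence in the path, keep the admin owner


if __name__ == "__main__":
    print(infer_owner("/shares/users/daffy.duck/reports/q1.xls", "Administrator"))
```

Files that still report an admin owner after this pass are candidates for the other approaches listed above, such as tagging by content or file name.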
Cleaned Owners Report
[Figure: data by owner after cleanup: daffy.duck, john.doe, sally.smith, bugs.bunny, clark.kent]
Departmental Report/Chargebacks
[Figure: data by department (Manufacturing, R&D, Sales, Marketing, Legal, HR), with departments drawn from Active Directory/LDAP]
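A departmental report of this kind comes down to joining file ownership with group membership and summing capacity. In the sketch below, the `user_department` dictionary stands in for an Active Directory/LDAP lookup, and the flat rate per terabyte is a hypothetical chargeback model; both are assumptions for illustration.

```python
from collections import defaultdict

TB = 1024 ** 4  # bytes per terabyte (binary)


def usage_by_department(files, user_department, default="Unassigned"):
    """Sum file sizes per department.

    files: iterable of (owner, size_in_bytes) pairs.
    user_department: mapping of owner to department (stand-in for AD/LDAP).
    """
    totals = defaultdict(int)
    for owner, size_bytes in files:
        totals[user_department.get(owner, default)] += size_bytes
    return dict(totals)


def chargeback(totals, rate_per_tb):
    """Convert per-department byte totals into a cost at a flat rate per TB."""
    return {dept: round(size / TB * rate_per_tb, 2) for dept, size in totals.items()}
```

Owners that cannot be mapped to a department are reported under "Unassigned" rather than dropped, so the report still accounts for all capacity.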
Data Policy and Disposition
Abandoned – Defensibly Delete
Aged – Migrate to Lower Cost Storage/Cloud
Redundant – Purge and Consolidate
Personal – Notify and Enforce Policy
Risk – Secure in Legal Hold Archive
Intellectual Property – Preserve in Archive
Active – Monitor
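The policy table above maps each data class to one action, so it can be held as a simple lookup and used to summarize how much data each action would affect. The sketch below assumes files have already been classified; the "Review manually" fallback for unknown classes is an assumption added for this example.

```python
from collections import defaultdict

# Class-to-action pairs taken from the slide above.
DISPOSITION_BY_CLASS = {
    "Abandoned": "Defensibly delete",
    "Aged": "Migrate to lower cost storage/cloud",
    "Redundant": "Purge and consolidate",
    "Personal": "Notify and enforce policy",
    "Risk": "Secure in legal hold archive",
    "Intellectual Property": "Preserve in archive",
    "Active": "Monitor",
}


def summarize_dispositions(classified_files):
    """Count files and bytes per action.

    classified_files: iterable of (data_class, size_in_bytes) pairs.
    Unknown classes fall back to "Review manually" (an assumed default).
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for data_class, size_bytes in classified_files:
        action = DISPOSITION_BY_CLASS.get(data_class, "Review manually")
        summary[action]["files"] += 1
        summary[action]["bytes"] += size_bytes
    return dict(summary)
```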
Reclaim and Manage Capacity
Defensibly Deleted
Migrated to New Platform/Cloud
Consolidation of Redundant Data
Personal Files Removed
Sensitive Content Archived
Active Content Managed
Workflow: Aged Data
Filter on locations
Report on last modified age
Analyze capacity
Workflow: Aged Data
Filter on 5+ years
Report on owners
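The aged-data workflow (filter on locations, filter on 5+ years since last modification, report on owners) can be sketched as a single pass over metadata records. The record layout, the path-prefix location filter, and the function name below are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aged_data_by_owner(records, now, min_age_years=5, location_prefix=""):
    """Total the size of aged files per owner, largest first.

    records: iterable of dicts with "path", "owner", "size" and "modified"
    (a datetime). A year is approximated as 365 days.
    """
    cutoff = now - timedelta(days=365 * min_age_years)
    totals = defaultdict(int)
    for record in records:
        if not record["path"].startswith(location_prefix):
            continue  # outside the selected location
        if record["modified"] <= cutoff:
            totals[record["owner"]] += record["size"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Sorting by capacity puts the owners with the most aged data at the top, which is usually where a clean-up conversation with the business unit starts.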
Disposition Options
• Copy, delete and archive are included in the GUI
• CSV text file output
• Use detailed file listing to determine disposition of data: purge, move, copy, encrypt, etc.
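A detailed file listing in CSV form can drive disposition with ordinary tooling. The sketch below writes and reads such a listing using Python's standard `csv` module; the column names are a hypothetical layout chosen for this example and do not represent the actual export format of any product.

```python
import csv
import io

# Hypothetical column layout for a disposition listing.
FIELDS = ["path", "owner", "size", "modified", "action"]


def write_disposition_csv(rows, stream):
    """Write one line per file, with the proposed action in the last column."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def read_actions(stream):
    """Read a listing back and return a mapping of path to proposed action."""
    return {row["path"]: row["action"] for row in csv.DictReader(stream)}


if __name__ == "__main__":
    buffer = io.StringIO()
    write_disposition_csv(
        [{"path": "/shares/a.doc", "owner": "john.doe", "size": 1024,
          "modified": "2006-03-01", "action": "purge"}],
        buffer,
    )
    print(buffer.getvalue())
```

Keeping the proposed action as a column lets a business unit review and amend the listing before anything is purged, moved, copied, or encrypted.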
Case Studies
Manufacturing
• Overview: Legal issues related to unmanaged PSTs across corporate networks.
• Solution: Audit and clean up 14,000 PSTs across 500TB.
Business Services
• Overview: 550TB of legal hold data on Data Domain; upgrade required.
• Solution: Extract 1TB of actual legal hold data and reclaim capacity.
Financial Services
• Overview: Clean user share according to corporate policies.
• Solution: Execute chargeback plan on 40TB server with map of usage.
Top 5 Financial Services
• Overview: Prepare for mortgage lawsuits and archive 175,000 users' email.
• Solution: Profile 220,000 legacy tapes (17PB) and extract relevant data to an archive.
Oil and Gas
• Overview: Migrate research data to cloud archive for long-term preservation.
• Solution: Find files by type and department across 2PB of storage and migrate to cloud.
Data Profiling Advantages
Better insight into all corporate data assets
Streamline storage capacity by cleaning up unnecessary data
Legacy tape remediation
Improve support for legal and compliance
Find and manage data more effectively
Sample Workflows
• Aged Data Clean Up
• Data Tiering (Cloud On-Ramping)
• Archiving On-Ramping
• Managing Large Files (Multimedia)
• PII/Security Audit
• Email (PST) Management
• Storage Capacity Allocations/Chargebacks
• Data Center Migrations and Consolidations
• Technology Refresh/Audits
Best Practices
Start small
• User shares: a common area of unmanaged data growth
• Servers requiring capacity upgrades
• Monitor data growth by user
Engage the legal/compliance/records management team
• Communicate how data can be profiled
• Help refine/define data policies based on risk
• Work to implement and audit policies
Implement chargebacks
• Profile data by department and deliver a view into content
• Provide disposition options that allow departments to control expenses
• Get support from legal/compliance to enforce clean up
Attribution & Feedback
Please send any questions or comments regarding this SNIA Tutorial to [email protected]
The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial.
Authorship History
Original Author: Jim McGann