4/21/17
Bringing Order to Enterprise Data
Nuix at a Glance
• 350+ employees
• Original development team still in place – never lost a developer
• In-house teams of experts in security, investigations, eDiscovery, information governance, archives and migrations
• Stable, long-term executive team
• Offices in Sydney, San Francisco, LA, Washington DC, Philadelphia, New York, Cork, London
• Commercialized 2006
• Profitable since 2008
• 100% funded by cash flow
• Many of the world's regulatory agencies, competition authorities, governments, national law enforcement agencies, leading financial institutions, largest global advisories etc.
• All the largest electronic investigation cases can only be done in Nuix – Tool of choice.
• 2000+ customers in over 60 countries with 95% customer retention
• Continued growth of development resources in Australia and USA
• Large investment in developing next-gen solutions – major 2016 releases:
– Nuix 7.0 Engine: Q2
– Nuix Adaptive Security: Q2
– Nuix Web Review & Analytics V6.2.9: Q2
– Nuix Director V6.2.9: Q2
– Nuix Sensitive Data Finder V2.2: Q2
– Nuix Legal Hold: Q2
– Nuix Management Console: Q1
– Nuix Insight: Q3
• 65% growth in 2014; 57% in 2015
• 62% growth every year over past 5 years
• Growth pattern from Australia to UK, Europe, Asia, Middle East and US
The Nuix Engine
Unmatched speed and scale
• Nuix’s patented parallel processing engine attacks data at the binary level, enabling it to index virtually unlimited volumes of unstructured data with forensic certainty and unparalleled speed
Ignores proprietary APIs
• Nuix binary-level indexing ignores the APIs created by proprietary software and indexes the binary data on disk. This is key to its ability to scale and access text from virtually all file types
Thousands of file and container types
• Nuix supports thousands of file types, but most importantly it supports the containers where most data is stored: file shares, SharePoint, email archives, cloud repositories, email servers, desktops, mobile phones etc.
Patented processing technology
• The patented secret sauce of the Nuix Engine combines load balancing, fault tolerance and intelligent processing decisions. This gives Nuix the speed, scale and minimal hardware footprint to enable fact-based decisions on virtually any quantity of unstructured data.
Content Identification, Analysis, Classification, and Action
[Diagram labels: ECRM, LOAD, TOOL]
Office 365, SharePoint, S3 buckets, DropBox, Box, Google Drive, Shared Drives, Shared PCs, SharePoint, SQL Data, DCTM, Email Systems, Email Archives, PST files, IMAP, POP
• OS (name, dates, owner)
• Properties (Author, organization, metadata, policy)
• Text (Summary, Similarity, OCR)
• Patterns (Numbers)
• Other contents (color, coordinates, faces, audio transcription)
• An unprotected SSN is risky
• A Personnel Action Form contains an Employee ID and is held in HR
• A contract has a number, a date, is similar to other contracts, and is authorized by Bob
• Etc.
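The "Patterns (Numbers)" attribute and the unprotected-SSN example can be illustrated with a small pattern scan. This is a minimal sketch, not how Nuix detects sensitive data: the regular expression and the function name `find_ssn_candidates` are assumptions, and it only covers US SSNs written in hyphenated form.

```python
import re

# Assumed pattern: hyphen-delimited US SSNs such as 123-45-6789.
# Area 000, 666 and 9xx, group 00 and serial 0000 are never issued, so they are excluded.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssn_candidates(text: str) -> list[str]:
    """Return substrings shaped like SSNs; every hit still needs human review."""
    return SSN_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Personnel Action Form: employee 123-45-6789, desk phone 555-0100"
    print(find_ssn_candidates(sample))
```

A hit is only a candidate: context such as location (for example, an HR share) decides whether the item is actually a risk.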
Working Together - IGRM
Information Governance
Stakeholders / Requirements / Benefits:
• Records Management / Retention Categories / Retained Records
• Compliance / Regulation / Risk Mitigation
• Information Technologies / Policies / Information Management
• Operations / Information Needs and Issues / Improved Productivity
• ECRM / Enterprise Metadata / Minimized Migration Effort
• Legal / Preservation / Reduced Litigation Costs
Source: IGRM, www.edrm.net
• Big Data
• eDiscovery
• Investigation/Audit (internal)
• Information Protection
• Normal course information management
• Records Management
• ECRM Planning and Migration
Solutions
• ECRM, RIM – Strategy, Design, Taxonomy, Implementation, Migration to build RIM program and ECM solution
• Risk Management, Compliance – One-time or periodic sweep of content to meet regulatory, legal, or business compliance criteria
• M&A, Divestiture – Grooming, cleanup, due diligence, content merging or separation to meet business-driven organizational change
• IT Optimization, Migration – Storage- and infrastructure-driven prioritization, index, cleanup and migration to optimize IT operations
• eDiscovery, Investigation
IG for Records Management
Implementing records management policy on unstructured content
Requirements and Controls
• Retention Policy
• Retention Schedule
Benefits and Value
• Mapping content to custodians, drives, and sites
– Share/repository owners
– Biggest users
• Metrics of compliance
• Records management definitions:
– granularity and accuracy of schedule
– additional useful classifications
– optimal retention periods, lifecycles, deltas
– level of versioning
– priorities
• Content control or purge
IG for InfoTech
Managing technology infrastructure and accessibility
Requirements and Controls
• Acceptable use of corporate assets
• Storage quotas and requirements
• IT platform standards
Benefits and Value
• Map content to owners to technologies
• Tech architecture needs including:
– Email and PST
– database creation, access and protection
– web-content
– application development
• Policy and practice definitions:
– ILM – tiered storage management requirements, disaster recovery/vital records
• Content splitting, merging, or cloud
IG for Infosec and Compliance
Minimizing the impact of risk on the organization
Requirements and Controls
• Risk content specifications
• Information security classifications
• IP definitions
Benefits and Value
• Identify risk and non-risk areas
– Risk content mapping to location
• Process and ownership details
• Policy and practice definitions:
– Risk mitigation priorities and level of compliance
• Reduce risk exposure
IG for ECRM
Planning and implementing ECRM to better govern information
Requirements and Controls
• Enterprise taxonomy and metadata standards
Benefits and Value
• Plan for effort and storage requirements
• Supplements traditional design effort
– integration opportunities
– collaborative opportunities
• ECRM design elements include:
– content types
– metadata usage and standards
– security
– templates
• Migration integrated to site deployment process
IG for Legal
Improve ability to respond to preservation and litigation needs
Requirements and Controls
• Litigation hold definitions
• Client management needs
Benefits and Value
• Find the critical content
• Find the smoking gun
• Organization of content based on case/issue
• Compliant removal of non-responsive or expired content
IG for the Business
Adjusting information structures in response to critical business changes
Requirements and Controls
• Information should facilitate business decisions
– Projects, cases, products, workgroups and other structures
• Best practices
Benefits and Value
• Groom the business for better performance
• Employee productivity in searching and finding
• Determine why people can’t find things
• Merge information subjects/topics
– M&A
– Reorganizations
– New products, services
Getting to the facts
Govern unstructured data to reduce cost and risk and to add value, using a repeatable four-stage process:
• Identify and inventory repositories and content to make sense of murky pools of dark unstructured data
• Understand the age, ownership, format, and content of each item, and what that tells you about practices and issues
• Classify content based on the facts in context to determine whether information is an asset or a liability
• Act and execute on governance decisions: optimize storage systems; classify, migrate and protect data; or make it more readily available to the business
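The Identify and Understand stages can be sketched as a plain file-system inventory that records age, owner and format for each item. This is an illustrative sketch only, not the Nuix workflow: the function names `inventory` and `summarize_formats` and the record fields are assumptions, and a real inventory would also cover email stores, SharePoint and other containers.

```python
import os
import time
from collections import Counter
from pathlib import Path


def inventory(root: str) -> list[dict]:
    """Walk a directory tree and record basic facts about each file."""
    now = time.time()
    items = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # unreadable or vanished file: skip rather than fail
            items.append({
                "path": str(path),
                "extension": path.suffix.lower(),          # format
                "size_bytes": st.st_size,
                "age_days": (now - st.st_mtime) / 86400,   # age since last modification
                "owner_uid": getattr(st, "st_uid", None),  # numeric owner id where available
            })
    return items


def summarize_formats(items: list[dict]) -> Counter:
    """Count files per extension, a first rough view of what a share contains."""
    return Counter(item["extension"] for item in items)
```

Calling `summarize_formats(inventory("/some/share"))` on a hypothetical share path gives a first count by format; classification and action decisions would build on records like these.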
Classify
• Content value - Good, bad, and ugly
• Retention categories
• Information security
• Projects, cases, clients
Act
• Migrate documents to ECRM
• Register documents in ECRM
• Register documents in federated RM (MIP)
• Purge in place
• Move
• Hide
Range of actions to take
• Stuff I want to migrate and control in ECRM
• Stuff I want to schedule the retention of
• Stuff I want to keep
• Stuff I need to produce
• Stuff I can’t inventory, but can’t get rid of
• Stuff I need to ask someone’s permission
• Stuff somebody else owns that is abandoned
• Stuff I need to create a policy before I delete
• Stuff I can delete now
Methodologies
• ECRM: Strategy, Planning, Taxonomy, Implementation, Migration
• Risk Management: Identify, Assess, Rank, Monitor, Manage
• EDRM: Govern, Identify, Preserve/Collect, Process/Review/Analyze, Production
• Merger: Groom, Evaluate, Due Diligence, Integration
• Compliance: Requirements, Objectives, Rules and Constraints, Monitor, Report
• Inventory: Identify, Understand, Classify, Act
Typical Enterprise Unstructured Content
[Pie chart]
• Duplicates and Trivial: 20%
• Databases: 7%
• Applications: 12%
• Web Content: 5%
• Multimedia: 14%
• Templates: 2%
• Attribute Based Classification: 25%
• Subject Based Classification: 10%
• Identifiable Expired Records: 5%
• Other: 40%
Chart annotations: Classified by semantic analysis; Content with “text”; Addressed by Nuix
April 21, 2017 COPYRIGHT NUIX 2014
Target Mitigation aka “Defensible Deletion”
Client requests a cleanup of unstructured content
1. Conduct analysis to prioritize cleanup activities
2. Preserve
3. Remove – eTrash
– Unilateral – Enterprise-wide consistent policy-based decisions
4. Review – if potentially low value
– Uniform – RM Categories, workgroup
– Unified – Review by the custodian or owner
5. Relocate – for IT efficiency, RM expediency
6. Retain – high value, utility
Content Categories findings
Where can we take action?
[Diagram labels]
• Review, Relocate, Remove, Retain
• Recent, Voluminous, Important, Migratable
• Database, Application, Web-content, unitized record
• Identified garbage - "To be deleted"; Photos, identified low value; Policy deletes; Duplicates; Temporary; Backups; Zero content
• Throw away, Give away, Put away, Organize
Defensibility
• Provide systematized deletion process rather than individual or arbitrary process
• Implement consistent approach across the company
• Approve and document standard (reusable) queries
• Approve and document custom queries and results
• Customize company-specific forms to document approvals from Legal, RM, IT and Business Unit
• Isolate and preserve documents subject to litigation holds
• Build query architecture on approved Records Management definitions
• Human validation of query results to document “reasonableness”
• Provide audit trails of work performed and documents deleted
Duplicate factoids
• Duplicates around 20-30% of storage
– Much of that is also eTrash
– Average of 3 documents per duplicate set
– 15% of the duplicates cannot be deleted
– 85% of the duplicates were created by humans, 15% by systems
– Storage level deduplication does not benefit productivity, backups, eDiscovery, retention management
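Duplicate sets like those counted above can be found by grouping files on a content hash. A minimal sketch, assuming exact byte-for-byte duplicates and SHA-256 as the digest; the function names are hypothetical and this is not a description of the Nuix engine's own deduplication.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's content, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_sets(root: str) -> list[list[Path]]:
    """Group files under root by content hash; keep only groups with 2+ members."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Unlike storage-level deduplication, a listing like this shows which copies exist and where, which is what retention and eDiscovery decisions need. Near duplicates (edited copies) require a similarity measure and are out of scope for this sketch.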
Current practices
What do people use shared drives for?
Works well in ECRM
• Capture important data
• Create compound documents
• Collaborate
• Modify documents
• Manage versions
• Share knowledge
• Retain/dispose
Doesn’t work well
• Backup files or folders
• Develop software code
• Set up websites
• Keep personal data
• Manage databases
• Share media
• Store renditions
• Archive PSTs
Sample Classifications
• We build classifications to support
– Cleanup Categories
• eTrash
• Drafts
– Records Categories
• Information Technology – Governance – Staffing
– Content Types – ties to retention category and SharePoint metadata
– Metadata tags
• Status
• Naming convention
– Security
• Confidential
– Litigation preservation and discovery
Folder structures
[Diagram: example folder names]
• Bills files, Altos3.2, EPA Payments, JGAGDG, Final
• LDM, BJL, MMW, HGS, HSH, TKJ
• Accounting, Finance, HR, Marketing, Legal
• Duplicates: 25%, 15%, 5%
• Accuracy: 65%, 75%, 95%
10 Steps toward RM classification
1. Parse and enhance the descriptions to include real titles
2. Create taxonomy structure to the content type level
3. Identify unstructured structure by leveraging folders, acronyms, templates, nicknames, numbering systems, creators and other attributes
4. Identify near duplicate content or clusters where appropriate
5. Exclude false positives
6. Prioritize categories and content sources
7. Validate with Subject Matter Experts
8. Authorize
9. Tag and migrate OR purge
10. Continually improve
<= Engage users
• Perception: Allen and Julia have the most experience writing contracts
• Fact: Frank appears to have contributed the most content
Expertise
[Chart: Contracts and Proposals, by contributor: Bob, Julia, Frank, Alice, Debra, Alan]
• Perception: We can base our tiered storage on the last access dates of files
• Fact: Last Accessed dates are often reset by virus or backup programs (or they were)
Records Management
[Chart: monthly values, Jan to Dec, y-axis 0 to 700; series: Modified Date, Accessed Date]
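The gap between the two dates can be checked per file before trusting last-access time for tiering. A minimal sketch using only the standard library; the function name `access_vs_modified` is hypothetical, and whether a large gap means genuine use or a scanner or backup touch still has to be judged case by case.

```python
import os
from datetime import datetime, timezone


def access_vs_modified(path: str) -> dict:
    """Report a file's modified and accessed timestamps and the gap between them."""
    st = os.stat(path)
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    accessed = datetime.fromtimestamp(st.st_atime, tz=timezone.utc)
    return {
        "modified": modified,
        "accessed": accessed,
        # Positive gap: the file was read (by a person or a program) after its last edit.
        "gap_days": (accessed - modified).total_seconds() / 86400,
    }
```

If many files across a share show access dates clustered on the same day, that pattern points to a bulk process rather than user activity, which is why modified date is often the safer basis.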
• Perception: The desired business retention of XXX1020 is 10 years
• Fact: 90% of reports are never accessed after 4 days; 100% after 10 days
Content aging
[Chart: x-axis 1 to 15, y-axis 0 to 120; labels: Age Modified, Age]
Metadata
• High frequency of (non-standard) date labels on folders
• Entities based on names and acronyms in content
• Content Types – Shows commonly used types identified in content
• Sections - based on sub-folder
• Places – Shows relative usage of location terms in dataset
• Topics – Shows most appropriate keywords for clusters of related content. Helps aid in migrating
Date Folder Names
Pattern: Count
MM/DD/YYYY: 4579
FY: 2995
YYYY: 2050
NstQtr: 484
MMM: 408
MM.DD.YYYY: 327
DD_Mmm_YYYY: 324
MmmYYYY: 91
MM.YYYY: 51
QtrN: 35
DD_MmmYYYY: 32
QtrEnding: 28
YYYYMmm: 25
MM_YYYY: 25
MM_DD_YYYY: 18
MM-YY: 13
Qtrly: 13
YYYYMMDD: 12
MM_DD_YY: 6
YYYY_Mmm: 6
DDDDDD: 4
MM_YY: 4
MM.DD: 3
YYYYMmmm': 3
Qtr: 3
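A count of date-label shapes like the one above can be produced by matching folder names against a list of patterns. The sketch below covers only five of the listed shapes; the regular expressions, their ordering and the function names are assumptions made for illustration, not the rules used to build the slide's figures.

```python
import re
from collections import Counter

# Assumed regexes for a few label shapes. Order matters: more specific
# shapes are tried before the bare four-digit year.
DATE_LABEL_PATTERNS = [
    ("YYYYMMDD", re.compile(r"(?<!\d)(?:19|20)\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])(?!\d)")),
    ("MM.DD.YYYY", re.compile(r"(?<!\d)\d{2}\.\d{2}\.\d{4}(?!\d)")),
    ("MM_DD_YYYY", re.compile(r"(?<!\d)\d{2}_\d{2}_\d{4}(?!\d)")),
    ("FY", re.compile(r"(?<![A-Za-z])FY\s?\d{2,4}(?!\d)")),
    ("YYYY", re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")),
]


def classify_date_label(folder_name):
    """Return the first matching label shape for a folder name, or None."""
    for label, pattern in DATE_LABEL_PATTERNS:
        if pattern.search(folder_name):
            return label
    return None


def count_date_labels(folder_names):
    """Tally label shapes across many folder names, ignoring names with no date."""
    labels = (classify_date_label(name) for name in folder_names)
    return Counter(label for label in labels if label)
```

Digit lookarounds are used instead of `\b` because underscores count as word characters, so `\b` would miss dates embedded in names such as `scan_20110315`.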
Versions
• 128,564 documents fit into 45,395 clusters of near duplicates
• 70,559 files with version tags
– Draft, Redline, Final
– Version, Revision, v1, v_1, v001
• 12k (26%) clusters have version identifiers
• 115,000 potential “old versions” to not migrate
Related Table: Version Analysis
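Version tags of the kinds listed above can be picked out of file names with a single regular expression. This is a sketch under assumptions: the pattern covers only the tags named on the slide, and `version_tags` is a hypothetical helper, not part of any Nuix API.

```python
import re

# Assumed pattern for the tags named above: Draft, Redline, Final, Version,
# Revision, and v-numbers such as v1, v_1, v001. Letter lookarounds stop
# matches inside longer words (for example "Finalize" or "Nov12").
VERSION_TAG = re.compile(
    r"(?<![A-Za-z])(draft|redline|final|version|revision|v_?\d+)(?![A-Za-z])",
    re.IGNORECASE,
)


def version_tags(filename: str) -> list[str]:
    """Return the lower-cased version tags found in a file name, in order."""
    return [tag.lower() for tag in VERSION_TAG.findall(filename)]


if __name__ == "__main__":
    for name in ["Contract_v001_FINAL.docx", "Proposal Draft 2.doc", "Nov12 report.doc"]:
        print(name, version_tags(name))
```

Within a cluster of near duplicates, files tagged as drafts or lower v-numbers are the candidates for "old versions" that need not be migrated, subject to review.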
Prioritize classification efforts
• 10 categories = 80% of expired records
Row Labels | Event Retention Expired 2012 2011 2010 2009 2008 2007 2006 2005 2004
UNV2020 - Retain drafts and working files no longer than completion of the final version and/or completion of the related transaction document | 2 - 17,702 8,343 9,483 7,362 4,771 2,180 714 534 438 347
UHN3040 - 7 Years | 7 9,792 12,789 13,507 9,393 8,949 4,499 1,511 1,666 951 564
PRC4000 - No Longer than 3 Years | 3 7,986 1,462 3,118 1,587 2,406 1,862 951 871 559 560
AFP9900 - No Longer than 5 years; recommended retention of up to 3 years | - 5 6,101 114 72 99 67 34 8 4 1 -
LCR2010 - Contract Execution + 10 Years | 5 10 3,836 9,970 7,317 2,626 3,786 5,317 882 1,185 1,439 2,029
SMK1000 - Last Authorized Use + 6 Years | 3 6 3,325 7,922 12,515 4,418 2,281 1,588 664 1,226 563 923
AFP8000 - No Longer than 5 Years | - 5 2,288 315 491 616 1,248 574 461 169 270 865
LCR7000 - Expiration + 6 Years | 5 6 2,140 255 142 34 254 295 65 39 161 15
LCR3200 - Resolution + 10 Years | 2 10 1,906 3,947 4,317 9,496 970 1,346 330 128 223 393
OPS2000 - 10 Years | 10 1,598 6,904 8,746 8,938 7,429 9,795 923 918 920 928
ITS1300 - Until system is no longer relevant to existing data | 3 1,248 773 587 197 389 421 217 119 37 3
OPS1300 - 10 Years | 10 1,111 592 727 1,061 863 408 290 112 20 131
HPI5300 - Date of Use + 6 years | 1 6 992 1,186 1,444 1,165 2,722 3,045 2,626 60 13 17
AFP1000 - 10 Years | - 10 988 2,657 3,439 1,650 3,417 21,644 931 247 48 82
ITS3210 - 90 days to 3 Years | 3 957 1,411 914 899 548 301 75 16 3 2
AFP7000 - 10 Years | - 10 714 453 1,208 1,197 137 405 51 71 19 70
OPS1200 - 6 Years | 6 701 3,018 2,092 1,004 1,156 1,430 2,420 115 406 35
HMC1040 - Until Updated | 3 - 695 983 404 22 285 130 18 99 32 14
HPI5200 - Conclusion of Services + 6 Years | 3 6 650 9,539 12,956 4,458 3,872 2,543 828 697 1,703 209
UNV1030 - No Longer than 5 Years | 5 577 545 920 333 704 471 261 209 18 13
LCR5100 - Until Outdated + 6 Years | 2 6 526 3,418 2,065 1,383 1,792 276 87 85 593 210
AFP2010 - No Longer than 5 Years | - 5 521 179 80 41 22 12 3 1 1 1
UNV2000 - No Longer than 5 Years | 5 503 7,173 7,024 2,109 1,327 398 288 184 4 4
HMC5010 - 6 Years | 6 473 - - - 123 354 375 314 132 26
OPS3110 - Payment Date + 10 Years | 1 10 456 6,921 7,930 4,565 4,076 74,347 1,484 673 474 174
LCR5070 - Until Filed + 6 Years | 2 6 426 5,666 5,561 4,058 3,331 2,678 915 738 2,314 57
OPS1350 - Group Termination + 10 Years | 2 10 381 4,500 6,014 2,118 4,536 3,623 944 507 771 684
CSP1140 - Longer of: Majority + 10 Years or Date Last Seen + 10 Years | 4 10 329 14,187 10,759 2,447 2,148 936 416 285 553 145
80% compliance
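The prioritization rule can be expressed as a cumulative share calculation over the Expired column. A minimal sketch: `categories_to_cover` is a hypothetical helper, and the list below contains only the rows visible on the slide, so the share is relative to those rows rather than to the full retention schedule.

```python
from itertools import accumulate


def categories_to_cover(expired_counts: list[int], target: float = 0.8) -> int:
    """Smallest number of categories, largest first, whose share reaches target."""
    ranked = sorted(expired_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0
    for n, running in enumerate(accumulate(ranked), start=1):
        if running >= target * total:
            return n
    return len(ranked)


# "Expired" column as shown on the slide (visible rows only).
EXPIRED = [
    17702, 9792, 7986, 6101, 3836, 3325, 2288, 2140, 1906, 1598,
    1248, 1111, 992, 988, 957, 714, 701, 695, 650, 577,
    526, 521, 503, 473, 456, 426, 381, 329,
]

if __name__ == "__main__":
    print(categories_to_cover(EXPIRED))
```

Running the same calculation per content source would give a ranked work list for step 6 of the classification steps, "Prioritize categories and content sources".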
Benefit Summary
Benefit: Basis
• eDiscovery culling time & expense: $200 per GB per litigation, 300-1,500 active litigation cases
• Manual migration to Repository (Y/N): 30 seconds per file times 2000 files per person = 2 days
• Tagging time to repository: 60 seconds per file times 600 files per person = ~2 days
• Productivity of searching time: 30 minutes/staff/week
• Green impact of network storage: $6-30/GB/mo
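The per-person effort figures can be re-derived from the per-file timings. The sketch below assumes an 8-hour working day, which the slide does not state; under that assumption manual migration comes to about 2.1 days and tagging to about 1.25 days, so the tagging figure above appears to be rounded up or to rest on a shorter effective working day.

```python
SECONDS_PER_HOUR = 3600
WORKDAY_HOURS = 8  # assumption: the slide does not say how long a working day is


def effort_days(seconds_per_file: float, files: int,
                workday_hours: float = WORKDAY_HOURS) -> float:
    """Convert a per-file handling time into working days for one person."""
    return seconds_per_file * files / SECONDS_PER_HOUR / workday_hours


migration_days = effort_days(30, 2000)  # manual migration: 30 s x 2000 files
tagging_days = effort_days(60, 600)     # tagging: 60 s x 600 files

if __name__ == "__main__":
    print(f"migration: {migration_days:.2f} days, tagging: {tagging_days:.2f} days")
```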
Guiding Principles
• Important or voluminous information has been foldered, named, nicknamed, acronymed, numbered, templated, or isolated
• Leverage policy, communication, and SMEs to maximize benefit and minimize user impact
• A consistent approach with less accuracy is better than individual interpretation – “Don’t let perfection be the enemy of good”
• A semi-automated process, not just a tool
• Throw away, put away, organize
• Don’t boil the ocean, nail Jello to the wall, herd cats, or miss the slow fat rabbit
Critical Factors for Information Governance
• Scalable (multiple TB in parallel, manageable threads)
• Deep (sources, details from each type, and flexible settings)
• Extensible (scripts, lists, interoperability)
• Deployable (platforms, access, review)
• Repeatable (Search and Tag, Taxonomy)
FIND OUT MORE:
blog.nuix.com
facebook.com/nuixsoftware
linkedin.com/company/nuix
twitter.com/nuix
youtube.com/nuixsoftware