
Bringing Order to Enterprise Data - ARMA Silicon Valley · armasv.org/wp-content/uploads/2017/03/Cleaning-Shared... · 2017-05-12


4/21/17


Bringing Order to Enterprise Data

Nuix at a Glance

•  350+ employees
•  Original development team still in place – never lost a developer
•  In-house teams of experts in security, investigations, eDiscovery, information governance, archives and migrations
•  Stable, long-term executive team
•  Offices in Sydney, San Francisco, LA, Washington DC, Philadelphia, New York, Cork, London
•  Commercialized 2006
•  Profitable since 2008
•  100% funded by cash flow
•  Many of the world's regulatory agencies, competition authorities, governments, national law enforcement agencies, leading financial institutions, largest global advisories etc.
•  All the largest electronic investigation cases can only be done in Nuix – Tool of choice.
•  2000+ customers in over 60 countries with 95% customer retention
•  Continued growth of development resources in Australia and USA
•  Large investment in developing next-gen solutions – major 2016 releases:
   –  Nuix 7.0 Engine: Q2
   –  Nuix Adaptive Security: Q2
   –  Nuix Web Review & Analytics V6.2.9: Q2
   –  Nuix Director V6.2.9: Q2
   –  Nuix Sensitive Data Finder V2.2: Q2
   –  Nuix Legal Hold: Q2
   –  Nuix Management Console: Q1
   –  Nuix Insight: Q3
•  65% growth in 2014; 57% in 2015
•  62% growth every year over past 5 years
•  Growth pattern from Australia to UK, Europe, Asia, Middle East and US


The Nuix Engine

Unmatched speed and scale
•  Nuix’s patented parallel processing engine attacks data at the binary level, enabling it to index virtually unlimited volumes of unstructured data with forensic certainty and unparalleled speed

Ignores proprietary APIs
•  Nuix binary-level indexing ignores the APIs created by proprietary software and indexes the binary data on disk. This is key to its ability to scale and access text from virtually all file types

Thousands of file and container types
•  Nuix supports thousands of file types – but most importantly it supports the containers where most data is stored: file shares, SharePoint, email archives, cloud repositories, email servers, desktops, mobile phones etc.

Patented processing technology
•  The patented secret sauce of the Nuix Engine combines load balancing, fault tolerance and intelligent processing decisions. This gives Nuix the speed, scale and minimal hardware footprint to enable fact-based decisions on virtually any quantity of unstructured data.

Content Identification, Analysis, Classification, and Action

ECRM

Office 365, SharePoint, S3 buckets, DropBox, Box, Google Drive, Shared Drives, Shared PCs, SharePoint, SQL Data, DCTM, Email Systems, Email Archives, PST files, IMAP, POP

•  OS (name, dates, owner)
•  Properties (Author, organization, metadata, policy)
•  Text (Summary, Similarity, OCR)
•  Patterns (Numbers)
•  Other contents (color, coordinates, faces, audio transcription)

•  An unprotected SSN is risky
•  A Personnel Action Form contains an Employee ID and is held in HR
•  A contract has a number, a date, is similar to other contracts, and is authorized by Bob
•  Etc…

LOAD

TOOL
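
The examples on this slide (an unprotected SSN, a Personnel Action Form held in HR) amount to rules applied to attributes extracted from each item. A minimal Python sketch of that idea follows. The `Item` record, its field names, and the tag names are hypothetical illustrations and not a Nuix schema or API, and the SSN pattern is deliberately naive.

```python
import re
from dataclasses import dataclass

# Hypothetical item record; field names are illustrative only.
@dataclass
class Item:
    path: str
    text: str
    department: str = ""
    protected: bool = False

# Naive US SSN-like pattern (ddd-dd-dddd); real detection needs stricter checks.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMPLOYEE_ID_PATTERN = re.compile(r"\bEmployee ID\b", re.IGNORECASE)

def classify(item):
    """Return a list of illustrative tags derived from content and context."""
    tags = []
    # Pattern in the text plus a fact about protection -> risk
    if SSN_PATTERN.search(item.text) and not item.protected:
        tags.append("risk:unprotected-ssn")
    # Content marker plus where the item is held -> record type
    if EMPLOYEE_ID_PATTERN.search(item.text) and item.department == "HR":
        tags.append("record:personnel-action-form")
    return tags
```

Each rule combines something found in the content with something known about its context, which is the distinction the slide draws between patterns and properties.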


Working Together - IGRM

Information Governance

Stakeholders | Requirements | Benefits
Records Management | Retention Categories | Retained Records
Compliance | Regulation | Risk Mitigation
Information Technologies | Policies | Information Management
Operations | Information Needs and Issues | Improved Productivity
ECRM | Enterprise Metadata | Minimized Migration Effort
Legal | Preservation | Reduced Litigation Costs

Source: IGRM, www.edrm.net

Big Data

eDiscovery

Investigation/Audit (internal)

Information Protection

Normal course information management

Records Management

ECRM Planning and Migration


•  ECRM, RIM – Strategy, Design, Taxonomy, Implementation, Migration to build RIM program and ECM solution

•  Risk Management, Compliance – One-time or periodic sweep of content to meet regulatory, legal, or business compliance criteria

•  M&A, Divestiture – Grooming, cleanup, due diligence, content merging or separation to meet business driven organizational change

•  IT Optimization, Migration – Storage and Infrastructure driven prioritization, index, cleanup and migration to optimize IT operations

•  eDiscovery, Investigation

Solutions

IG for Records Management

Requirements and Controls
•  Retention Policy
•  Retention Schedule

Benefits and Value
•  Mapping content to custodians, drives, and sites
   –  Share/repository owners
   –  Biggest users
•  Metrics of compliance
•  Records management definitions:
   –  granularity and accuracy of schedule,
   –  additional useful classifications,
   –  optimal retention periods, lifecycles, deltas,
   –  level of versioning,
   –  priorities.
•  Content control or purge

Implementing records management policy on unstructured content


IG for InfoTech

Requirements and Controls
•  Acceptable use of corporate assets
•  Storage quotas and requirements
•  IT platform standards

Benefits and Value
•  Map content to owners to technologies
•  Tech architecture needs including:
   –  Email and PST,
   –  database creation, access and protection,
   –  web-content,
   –  application development
•  Policy and practice definitions:
   –  ILM – tiered storage management requirements, disaster recovery/vital records
•  Content splitting, merging, or cloud

Managing technology infrastructure and accessibility

IG for Infosec and Compliance

Requirements and Controls
•  Risk content specifications
•  Information security classifications
•  IP definitions

Benefits and Value
•  Identify risk and non-risk areas
   –  Risk content mapping to location
•  Process and ownership details
•  Policy and practice definitions:
   –  Risk mitigation priorities and level of compliance
•  Reduce risk exposure

Minimizing the impact of risk on the organization


IG for ECRM

Requirements and Controls
•  Enterprise taxonomy and metadata standards

Benefits and Value
•  Plan for effort and storage requirements
•  Supplements traditional design effort
   –  integration opportunities,
   –  collaborative opportunities
•  ECRM design elements include:
   –  content types,
   –  metadata usage and standards,
   –  security,
   –  templates
•  Migration integrated to site deployment process

Planning and implementing ECRM to better govern information

IG for Legal

Requirements and Controls
•  Litigation hold definitions
•  Client management needs

Benefits and Value
•  Find the critical content
•  Find the smoking gun
•  Organization of content based on case/issue
•  Compliant removal of non-responsive or expired content

Improve ability to respond to preservation and litigation needs


IG for the Business

Requirements and Controls
•  Information should facilitate business decisions
   –  Projects, cases, products, workgroups and other structures
•  Best practices

Benefits and Value
•  Groom the business for better performance
•  Employee productivity in searching and finding
•  Determine why people can’t find things
•  Merge information subjects/topics
   –  M&A
   –  Reorganizations
   –  New products, services

Adjusting information structures in response to critical business changes

Getting to the facts

Govern unstructured data to reduce cost and risk, and add value, using a repeatable four-stage process:
•  Identify and inventory repositories and content to make sense of murky pools of dark unstructured data
•  Understand the age, ownership, format, and content of each item, and what that tells you about practices and issues
•  Classify content based on the facts in context to determine whether information is an asset or a liability
•  Act and execute on governance decisions: optimize storage systems; classify, migrate and protect data; or make it more readily available to the business
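
The first two stages (identify and inventory, then understand age, ownership and format) can be illustrated with a small file-share walk. This is only a sketch using the Python standard library, not how the Nuix Engine works; the function names and the dictionary keys are assumptions made for the example.

```python
import os
import time
from collections import Counter

def inventory(root):
    """Walk a directory tree and record basic facts about each file."""
    now = time.time()
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable item: skipped in this sketch
            rows.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),  # format
                "size_bytes": info.st_size,
                "age_days": (now - info.st_mtime) / 86400,       # age since last modification
                "owner_uid": info.st_uid,                        # numeric owner (0 on Windows)
            })
    return rows

def summarize_by_format(rows):
    """Count inventoried files per extension."""
    return Counter(row["extension"] for row in rows)
```

A summary such as `summarize_by_format(inventory("/mnt/share"))` (the path is hypothetical) gives a first view of what a repository holds before any classification decision is made.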


Identify

[Figure labels: NAS02006, NAS01772, NAS11266; Finance, Marketing, HR, IT]

Understand


•  Content value - Good, bad, and ugly
•  Retention categories
•  Information security
•  Projects, cases, clients

Classify

•  Migrate documents to ECRM
•  Register documents in ECRM
•  Register documents in federated RM (MIP)
•  Purge in place
•  Move
•  Hide

Act


Range of actions to take

<= Stuff I want to migrate and control in ECRM
<= Stuff I want to schedule the retention of
<= Stuff I want to keep
<= Stuff I need to produce
<= Stuff I can’t inventory, but can’t get rid of
<= Stuff I need to ask someone’s permission
<= Stuff somebody else owns that is abandoned
<= Stuff I need to create a policy before I delete
<= Stuff I can delete now

Methodologies

ECRM: Strategy; Planning; Taxonomy Implementation; Migration

Risk Management: Identify; Assess; Rank; Monitor; Manage

EDRM: Govern; Identify; Preserve, Collect; Process, Review, Analyze; Production

Merger: Groom; Evaluate; Due Diligence; Integration

Compliance: Requirements; Objectives; Rules and Constraints; Monitor; Report

Inventory; Identify; Understand; Classify; Act


Typical Enterprise Unstructured Content

•  Duplicates and Trivial: 20%
•  Databases: 7%
•  Applications: 12%
•  Web Content: 5%
•  Multimedia: 14%
•  Templates: 2%
•  Attribute Based Classification: 25%
•  Subject Based Classification: 10%
•  Identifiable Expired Records: 5%
•  Other: 40%

[Chart annotations: Classified by semantic analysis; Content with “text”; Addressed by Nuix]

April 21, 2017 COPYRIGHT NUIX 2014


Target Mitigation aka “Defensible Deletion”

Client requests a cleanup of unstructured content
1.  Conduct analysis to prioritize cleanup activities
2.  Preserve
3.  Remove – eTrash
    –  Unilateral – Enterprise wide consistent policy based decisions
4.  Review – if potentially low value
    –  Uniform – RM Categories, workgroup
    –  Unified – Review by the custodian or owner
5.  Relocate – for IT efficiency, RM expediency
6.  Retain – high value, utility
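
The ordering of the steps above matters: preservation is checked before anything is removed. A minimal Python sketch of that ordering follows. The dictionary keys (`legal_hold`, `is_duplicate`, `low_value_suspected`, `kind`) and the rule for each outcome are hypothetical, chosen only to make the sequence concrete.

```python
def triage(item):
    """Assign one disposition to an item, checking the steps in slide order.

    `item` is a plain dict of facts gathered during analysis; every key used
    here is an illustrative assumption, not a defined schema.
    """
    if item.get("legal_hold"):
        return "Preserve"   # holds always win over cleanup
    if item.get("is_duplicate") or item.get("size_bytes") == 0:
        return "Remove"     # eTrash: enterprise-wide, policy-based
    if item.get("low_value_suspected"):
        return "Review"     # by RM category, workgroup, custodian or owner
    if item.get("kind") in {"database", "application", "web-content"}:
        return "Relocate"   # better managed elsewhere for IT efficiency
    return "Retain"         # high value, utility
```

For instance, a duplicate that is also under legal hold comes back as `Preserve`, never `Remove`.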


Content Categories findings Where can we take action?

Actions: Review, Relocate, Remove, Retain

[Figure labels]
•  Recent, Voluminous, Important, Migratable
•  Database, Application, Web-content, unitized record
•  Identified garbage - "To be deleted"; Photos, identified low value; Policy deletes; Duplicates
•  Temporary, Backups, Zero content

Throw away, Give away, Put away, Organize


•  Provide systematized deletion process rather than individual or arbitrary process
•  Implement consistent approach across the company
•  Approve and document standard (reusable) queries
•  Approve and document custom queries and results
•  Customize company-specific forms to document approvals from Legal, RM, IT and Business Unit
•  Isolate and preserve documents subject to litigation holds
•  Build query architecture on approved Records Management definitions
•  Human validation of query results to document “reasonableness”
•  Provide audit trails of work performed and documents deleted

Defensibility

•  Duplicates around 20-30% of storage
   –  Much of that is also eTrash
   –  Average of 3 documents per duplicate set
   –  15% of the duplicates cannot be deleted
   –  85% of the duplicates were created by humans, 15% by systems
   –  Storage level deduplication does not benefit productivity, backups, eDiscovery, retention management

Duplicate factoids
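
Figures such as "average of 3 documents per duplicate set" come from grouping files by identical content. One common way to do that is to hash each file and group by digest. The sketch below assumes exact-content duplicates and SHA-256, both choices made for illustration rather than taken from the slides, and takes an in-memory mapping of path to bytes to stay self-contained.

```python
import hashlib
from collections import defaultdict

def duplicate_sets(files):
    """Group paths whose content is byte-for-byte identical.

    `files` maps a path to its content as bytes. Returns a list of
    duplicate sets, each a sorted list of two or more paths.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]

def average_set_size(sets):
    """Mean number of documents per duplicate set (0.0 if there are none)."""
    return sum(len(s) for s in sets) / len(sets) if sets else 0.0
```

Unlike storage-level deduplication, this kind of listing shows people where the copies are, which is what productivity, eDiscovery and retention work needs.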


What do people use shared drives for?

Current practices

Works well in ECRM
•  Capture important data
•  Create compound documents
•  Collaborate
•  Modify documents
•  Manage versions
•  Share knowledge
•  Retain/dispose

Doesn’t work well
•  Backup files or folders
•  Develop software code
•  Set up websites
•  Keep personal data
•  Manage databases
•  Share media
•  Store renditions
•  Archive PSTs

•  We build classifications to support
   –  Cleanup Categories
      •  eTrash
      •  Drafts
   –  Records Categories
      •  Information Technology – Governance – Staffing
   –  Content Types – ties to retention category and SharePoint metadata
   –  Metadata tags
      •  Status
      •  Naming convention
   –  Security
      •  Confidential
   –  Litigation preservation and discovery

Sample Classifications


Folder structures

•  Bills files, Altos3.2, EPA, Payments, JGAGDG, Final – Duplicates 25%, Accuracy 65%
•  LDM, BJL, MMW, HGS, HSH, TKJ – Duplicates 15%, Accuracy 75%
•  Accounting, Finance, HR, Marketing, Legal – Duplicates 5%, Accuracy 95%


10 Steps toward RM classification

1.  Parse and enhance the descriptions to include real titles
2.  Create taxonomy structure to the content type level
3.  Identify unstructured structure by leveraging folders, acronyms, templates, nicknames, numbering systems, creators and other attributes
4.  Identify near duplicate content or clusters where appropriate
5.  Exclude false positives
6.  Prioritize categories and content sources
7.  Validate with Subject Matter Experts
8.  Authorize
9.  Tag and migrate OR purge
10.  Continually improve

<= Engage users


•  Perception: Allen and Julia have the most experience writing contracts

•  Fact: Frank appears to have contributed the most content

Expertise

[Chart: Contracts and Proposals, by contributor: Bob, Julia, Frank, Alice, Debra, Alan]

•  Perception: We can base our tiered storage on the last access dates of files

•  Fact: Last Accessed dates are often reset by virus or backup programs (or they were)

Records Management

[Chart: Modified Date and Accessed Date, by month, Jan to Dec; axis 0 to 700]


•  Perception: The desired business retention of XXX1020 is 10 years

•  Fact: 90% of reports are never accessed after 4 days; 100% after 10 days

Content aging

[Chart: Age Modified, by Age 1 to 15; axis 0 to 120]

Metadata

•  High frequency of (non-standard) date labels on folders
•  Entities based on names and acronyms in content
•  Content Types – Shows commonly used types identified in content
•  Sections - based on sub-folder
•  Places – Shows relative usage of location terms in dataset
•  Topics – Shows most appropriate keywords for clusters of related content. Helps aid in migrating

Date Folder Names

Label | Count
MM/DD/YYYY | 4579
FY | 2995
YYYY | 2050
NstQtr | 484
MMM | 408
MM.DD.YYYY | 327
DD_Mmm_YYYY | 324
MmmYYYY | 91
MM.YYYY | 51
QtrN | 35
DD_MmmYYYY | 32
QtrEnding | 28
YYYYMmm | 25
MM_YYYY | 25
MM_DD_YYYY | 18
MM-YY | 13
Qtrly | 13
YYYYMMDD | 12
MM_DD_YY | 6
YYYY_Mmm | 6
DDDDDD | 4
MM_YY | 4
MM.DD | 3
YYYYMmmm' | 3
Qtr 3
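
A tally of date labels like the one on this slide can be produced by matching folder names against a list of patterns. The sketch below covers only a handful of the labels, and the regular expressions are assumptions about what each label means (for example, that `FY` is followed by a two- to four-digit year); they are not taken from the source.

```python
import re
from collections import Counter

# (label, pattern) pairs, checked in order; a small illustrative subset.
DATE_LABELS = [
    ("YYYYMMDD", re.compile(r"^\d{8}$")),
    ("MM_DD_YYYY", re.compile(r"^\d{2}_\d{2}_\d{4}$")),
    ("MM.DD.YYYY", re.compile(r"^\d{2}\.\d{2}\.\d{4}$")),
    ("MM-YY", re.compile(r"^\d{2}-\d{2}$")),
    ("YYYY", re.compile(r"^(19|20)\d{2}$")),
    ("FY", re.compile(r"^FY\s?\d{2,4}$", re.IGNORECASE)),
]

def date_label(folder_name):
    """Return the first matching date label for a folder name, or None."""
    name = folder_name.strip()
    for label, pattern in DATE_LABELS:
        if pattern.match(name):
            return label
    return None

def count_date_labels(folder_names):
    """Tally how often each date label occurs across folder names."""
    labels = (date_label(name) for name in folder_names)
    return Counter(label for label in labels if label)
```

Folders that match no pattern are simply left out of the tally, so the counts show how much of the folder structure is date-driven and in how many competing formats.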


•  128,564 documents fit into 45,395 clusters of near duplicates
•  70,559 files with version tags
   –  Draft, Redline, Final
   –  Version, Revision, v1, v_1, v001
•  12k (26%) clusters have version identifiers
•  115,000 potential “old versions” to not migrate

Versions

Related Table: Version Analysis
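
Version tags of the kind listed on this slide (Draft, Redline, Final, Version, Revision, v1, v_1, v001) can be picked out of file names with a single regular expression. This is a rough sketch: the word-boundary handling is an assumption, and a real analysis would also need to rank the tags within each near-duplicate cluster.

```python
import re

# Matches the tag words from the slide, or "v" followed by digits
# (optionally with an underscore), when not embedded in a longer word.
VERSION_TAG = re.compile(
    r"(?<![A-Za-z])(draft|redline|final|version|revision|v_?\d+)(?![A-Za-z])",
    re.IGNORECASE,
)

def version_tag(filename):
    """Return the first version tag found in a file name (lowercased), or None."""
    match = VERSION_TAG.search(filename)
    return match.group(1).lower() if match else None
```

Files in a cluster that carry a tag other than the latest one are candidates for the "old versions" that need not be migrated.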

•  10 categories = 80% of expired records

Prioritize classification efforts

Row Labels | Event / Retention | Expired | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004
UNV2020 - Retain drafts and working files no longer than completion of the final version and/or completion of the related transaction document | 2 / - | 17,702 | 8,343 | 9,483 | 7,362 | 4,771 | 2,180 | 714 | 534 | 438 | 347
UHN3040 - 7 Years | 7 | 9,792 | 12,789 | 13,507 | 9,393 | 8,949 | 4,499 | 1,511 | 1,666 | 951 | 564
PRC4000 - No Longer than 3 Years | 3 | 7,986 | 1,462 | 3,118 | 1,587 | 2,406 | 1,862 | 951 | 871 | 559 | 560
AFP9900 - No Longer than 5 years; recommended retention of up to 3 years | - / 5 | 6,101 | 114 | 72 | 99 | 67 | 34 | 8 | 4 | 1 | -
LCR2010 - Contract Execution + 10 Years | 5 / 10 | 3,836 | 9,970 | 7,317 | 2,626 | 3,786 | 5,317 | 882 | 1,185 | 1,439 | 2,029
SMK1000 - Last Authorized Use + 6 Years | 3 / 6 | 3,325 | 7,922 | 12,515 | 4,418 | 2,281 | 1,588 | 664 | 1,226 | 563 | 923
AFP8000 - No Longer than 5 Years | - / 5 | 2,288 | 315 | 491 | 616 | 1,248 | 574 | 461 | 169 | 270 | 865
LCR7000 - Expiration + 6 Years | 5 / 6 | 2,140 | 255 | 142 | 34 | 254 | 295 | 65 | 39 | 161 | 15
LCR3200 - Resolution + 10 Years | 2 / 10 | 1,906 | 3,947 | 4,317 | 9,496 | 970 | 1,346 | 330 | 128 | 223 | 393
OPS2000 - 10 Years | 10 | 1,598 | 6,904 | 8,746 | 8,938 | 7,429 | 9,795 | 923 | 918 | 920 | 928
ITS1300 - Until system is no longer relevant to existing data | 3 | 1,248 | 773 | 587 | 197 | 389 | 421 | 217 | 119 | 37 | 3
OPS1300 - 10 Years | 10 | 1,111 | 592 | 727 | 1,061 | 863 | 408 | 290 | 112 | 20 | 131
HPI5300 - Date of Use + 6 years | 1 / 6 | 992 | 1,186 | 1,444 | 1,165 | 2,722 | 3,045 | 2,626 | 60 | 13 | 17
AFP1000 - 10 Years | - / 10 | 988 | 2,657 | 3,439 | 1,650 | 3,417 | 21,644 | 931 | 247 | 48 | 82
ITS3210 - 90 days to 3 Years | 3 | 957 | 1,411 | 914 | 899 | 548 | 301 | 75 | 16 | 3 | 2
AFP7000 - 10 Years | - / 10 | 714 | 453 | 1,208 | 1,197 | 137 | 405 | 51 | 71 | 19 | 70
OPS1200 - 6 Years | 6 | 701 | 3,018 | 2,092 | 1,004 | 1,156 | 1,430 | 2,420 | 115 | 406 | 35
HMC1040 - Until Updated | 3 / - | 695 | 983 | 404 | 22 | 285 | 130 | 18 | 99 | 32 | 14
HPI5200 - Conclusion of Services + 6 Years | 3 / 6 | 650 | 9,539 | 12,956 | 4,458 | 3,872 | 2,543 | 828 | 697 | 1,703 | 209
UNV1030 - No Longer than 5 Years | 5 | 577 | 545 | 920 | 333 | 704 | 471 | 261 | 209 | 18 | 13
LCR5100 - Until Outdated + 6 Years | 2 / 6 | 526 | 3,418 | 2,065 | 1,383 | 1,792 | 276 | 87 | 85 | 593 | 210
AFP2010 - No Longer than 5 Years | - / 5 | 521 | 179 | 80 | 41 | 22 | 12 | 3 | 1 | 1 | 1
UNV2000 - No Longer than 5 Years | 5 | 503 | 7,173 | 7,024 | 2,109 | 1,327 | 398 | 288 | 184 | 4 | 4
HMC5010 - 6 Years | 6 | 473 | - | - | - | 123 | 354 | 375 | 314 | 132 | 26
OPS3110 - Payment Date + 10 Years | 1 / 10 | 456 | 6,921 | 7,930 | 4,565 | 4,076 | 74,347 | 1,484 | 673 | 474 | 174
LCR5070 - Until Filed + 6 Years | 2 / 6 | 426 | 5,666 | 5,561 | 4,058 | 3,331 | 2,678 | 915 | 738 | 2,314 | 57
OPS1350 - Group Termination + 10 Years | 2 / 10 | 381 | 4,500 | 6,014 | 2,118 | 4,536 | 3,623 | 944 | 507 | 771 | 684
CSP1140 - Longer of: Majority + 10 Years or Date Last Seen + 10 Years | 4 / 10 | 329 | 14,187 | 10,759 | 2,447 | 2,148 | 936 | 416 | 285 | 553 | 145

80% compliance
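
The "10 categories = 80% of expired records" observation is a cumulative-share calculation over the Expired column. A short Python sketch follows; the function name is an assumption, and the counts are the Expired values for the rows transcribed on this slide, which may not be the complete table behind the original chart.

```python
def categories_for_share(expired_counts, share=0.8):
    """Smallest number of categories (largest first) covering `share` of the total."""
    ordered = sorted(expired_counts, reverse=True)
    total = sum(ordered)
    running = 0
    for n, count in enumerate(ordered, start=1):
        running += count
        if running >= share * total:
            return n
    return len(ordered)

# Expired counts per retention category, as listed on this slide.
EXPIRED = [
    17702, 9792, 7986, 6101, 3836, 3325, 2288, 2140, 1906, 1598,
    1248, 1111, 992, 988, 957, 714, 701, 695, 650, 577,
    526, 521, 503, 473, 456, 426, 381, 329,
]

print(categories_for_share(EXPIRED))  # 10
```

For these rows the ten largest categories hold 56,674 of 68,922 expired items, which is why classification effort is best spent on those categories first.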


Benefit Summary

Benefit | Basis
eDiscovery culling time & expense | $200 per GB per litigation, 300-1,500 active litigation cases
Manual migration to Repository (Y/N) | 30 seconds per file times 2000 files per person = 2 days
Tagging time to repository | 60 seconds per file times 600 files per person = ~2 days
Productivity of searching time | 30 minutes/staff/week
Green impact of network storage | $6-30/GB/mo

•  Important or voluminous information has been foldered, named, nicknamed, acronymed, numbered, templated, or isolated
•  Leverage policy, communication, and SMEs to maximize benefit and minimize user impact
•  Consistent approach and less accuracy is better than individual interpretation
   –  “Don’t let perfection be the enemy of good”
•  A semi-automated process, not just a tool
•  Throw away, put away, organize
•  Don’t boil the ocean, nail Jello to the wall, herd cats, or miss the slow fat rabbit

Guiding Principles



•  Critical Factors for Information Governance
   –  Scalable (multiple TB in parallel, manageable threads)
   –  Deep (sources, details from each type, and flexible settings)
   –  Extensible (scripts, lists, interoperability)
   –  Deployable (platforms, access, review)
   –  Repeatable (Search and Tag, Taxonomy)

Critical Factors for Information Governance

FIND OUT MORE:

blog.nuix.com

facebook.com/nuixsoftware

linkedin.com/company/nuix

twitter.com/nuix

youtube.com/nuixsoftware