Data Lake Best Practices

Data Lake BestPractices - aws-de-media.s3-eu-west-1 ...... · AWS CloudHSM Dedicated Tenancy SafeNetLuna SA HSM Device Common Criteria EAL4+, NIST FIPS 140-2 AWS Key Management Service


Agenda

• Why Data Lake
• Key Components of a Data Lake
• Modern Data Architecture
• Some Best Practices
• Case Study
• Summary
• Takeaways

What is a Data Lake?

What, why, etc.

What is a data lake?
• It is an architecture that allows you to collect, store, process, analyze, and consume all data that flows into your organization.

Why a data lake?
• Leverage all data that flows into your organization
• Customer centricity
• Business agility
• Better predictions via machine learning
• Competitive advantage

Comparison of a Data Lake to an Enterprise Data Warehouse

Data lake (EMR, S3) | Enterprise data warehouse (EDW)
Complementary to EDW (not a replacement) | Data lake can be a source for EDW
Schema on read (no predefined schemas) | Schema on write (predefined schemas)
Structured/semi-structured/unstructured data | Structured data only
Fast ingestion of new data/content | Time-consuming to introduce new content
Data science + prediction/advanced analytics + BI use cases | BI use cases only (no prediction/advanced analytics)
Data at low level of detail/granularity | Data at summary/aggregated level of detail
Loosely defined SLAs | Tight SLAs (production schedules)
Flexibility in tools (open source/tools for advanced analytics) | Limited flexibility in tools (SQL only)
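The schema-on-read row of the comparison can be illustrated with a minimal Python sketch. It assumes raw events are kept as JSON lines exactly as they arrived; the sample records, the field names (`user`, `amount`, `clicks`), and both reader functions are hypothetical and not part of the original deck.

```python
import json

# Raw events are stored untouched; no schema is enforced at write time.
# The records below are hypothetical sample data.
RAW_LINES = [
    '{"user": "a", "amount": "12.50", "ts": "2017-03-01T10:00:00"}',
    '{"user": "b", "amount": "7", "channel": "web"}',
    '{"user": "a", "clicks": 3}',
]


def read_purchases(lines):
    """Apply a 'purchases' schema at read time: keep rows that carry an amount."""
    for line in lines:
        record = json.loads(line)
        if "amount" in record:
            yield {"user": record["user"], "amount": float(record["amount"])}


def read_activity(lines):
    """Apply a different schema to the same raw data: count events per user."""
    counts = {}
    for line in lines:
        user = json.loads(line)["user"]
        counts[user] = counts.get(user, 0) + 1
    return counts


purchases = list(read_purchases(RAW_LINES))
activity = read_activity(RAW_LINES)
print(purchases)
print(activity)
```

Two consumers read the same stored bytes with different schemas, which is the property a schema-on-write warehouse gives up in exchange for predefined structure.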

Key Concepts Associated with a Data Lake

[Figure: one STORAGE element and multiple COMPUTE elements]

Components of a Data Lake

Data Storage
• High durability
• Stores raw data from input sources
• Support for any type of data
• Low cost

Streaming
• Streaming ingest of feed data
• Provides the ability to consume any dataset as a stream
• Facilitates low-latency analytics

Storage & Streams
Catalogue & Search
Entitlements
API & UI

Components of a Data Lake

Catalogue
• Metadata lake
• Used for summary statistics and data classification management

Search
• Simplified access model for data discovery

Components of a Data Lake

Entitlements system
• Encryption
• Authentication
• Authorisation
• Chargeback
• Quotas
• Data masking
• Regional restrictions

Components of a Data Lake

API & User Interface
• Exposes the data lake to customers
• Programmatically query catalogue
• Expose search API
• Ensures that entitlements are respected


The Modern Data Architecture


Why Is Amazon S3 the Fabric of a Data Lake?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• Decouple storage and compute
  • No need to run compute clusters for storage (unlike HDFS)
  • Can run transient Hadoop clusters & Amazon EC2 Spot Instances
  • Multiple & heterogeneous analysis clusters can use the same data
• Virtually unlimited number of objects and volume of data
• Very high bandwidth: no aggregate throughput limit
• Designed for 99.99% availability: can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered storage (Standard, IA, Amazon Glacier) via life-cycle policies
  • Use HDFS for very frequently accessed (hot) data
• Secure: SSL, client/server-side encryption at rest
• Low cost
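The tiered-storage bullet can be made concrete with a small sketch of a life-cycle rule. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the helper name, the prefix, the bucket name in the comment, and the 30/90-day thresholds are illustrative assumptions rather than values from the deck.

```python
import json


def build_lifecycle_configuration(prefix, days_to_ia=30, days_to_glacier=90):
    """Return a lifecycle configuration that tiers objects under `prefix`.

    Objects move to Standard-IA first, then to Glacier. The day thresholds
    are assumptions chosen for illustration only.
    """
    if not 0 < days_to_ia < days_to_glacier:
        raise ValueError("expected 0 < days_to_ia < days_to_glacier")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_configuration("raw/clickstream/")
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dictionary could be
# applied roughly like this (the bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake-bucket", LifecycleConfiguration=config)
```

Building the rule as plain data keeps it reviewable and testable before anything touches a real bucket.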


Indexing and Searching using Metadata

[Figure: ObjectCreated and ObjectDeleted events invoke an AWS Lambda function that calls PutItem on the metadata index (DynamoDB); the table's update stream invokes a second AWS Lambda function that extracts search fields and updates the search index (Amazon Elasticsearch Service or Amazon CloudSearch).]
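The first hop of this pipeline, from an S3 event to a metadata index entry, can be sketched as a Lambda-style handler. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the DynamoDB table, the item fields are hypothetical, and the optional `index` parameter exists only so the sketch can run locally. S3 event notifications report deletions under the `ObjectRemoved` event name, which is what the code matches.

```python
from urllib.parse import unquote_plus


def metadata_items(event):
    """Translate an S3 event notification into (action, item) pairs."""
    results = []
    for record in event.get("Records", []):
        name = record["eventName"]
        s3 = record["s3"]
        item = {
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        }
        if name.startswith("ObjectCreated"):
            item["size"] = s3["object"].get("size", 0)
            results.append(("put", item))
        elif name.startswith("ObjectRemoved"):
            results.append(("delete", item))
    return results


def handler(event, context, index=None):
    """Lambda-style entry point; `index` is a dict standing in for the metadata table."""
    index = {} if index is None else index
    for action, item in metadata_items(event):
        primary_key = (item["bucket"], item["key"])
        if action == "put":
            index[primary_key] = item
        else:
            index.pop(primary_key, None)
    return index
```

In a real deployment the `put` and `delete` branches would call the table's write and delete operations, and a second function subscribed to the table's stream would update the search index.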


Identity & Access Management

• Manage users, groups, and roles
• Identity federation with OpenID
• Temporary credentials with AWS Security Token Service (AWS STS)
• Stored policy templates
• Powerful policy language
• Amazon S3 bucket policies
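As a hedged illustration of the policy language and bucket policies listed above, the sketch below builds a bucket policy document as plain data. The bucket name, prefix, role ARN, statement IDs, and helper name are all hypothetical; the two statements (read access for one role on one prefix, and a deny for requests not sent over TLS) are common patterns, not something the deck prescribes.

```python
import json


def read_only_prefix_policy(bucket, prefix, role_arn):
    """Build a bucket policy: one role may read one prefix; plain HTTP is denied."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRoleReadPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Matches requests that did not arrive over SSL/TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


policy = read_only_prefix_policy(
    "example-data-lake-bucket",                    # hypothetical bucket
    "curated/",                                    # hypothetical prefix
    "arn:aws:iam::123456789012:role/AnalystRole",  # hypothetical role
)
print(json.dumps(policy, indent=2))
```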

Data Encryption

AWS CloudHSM
• Dedicated tenancy SafeNet Luna SA HSM device
• Common Criteria EAL4+, NIST FIPS 140-2

AWS Key Management Service
• Automated key rotation & auditing
• Integration with other AWS services

AWS server-side encryption
• AWS-managed key infrastructure
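To show how the server-side options differ at the point of upload, here is a small sketch that assembles the keyword arguments for boto3's S3 `put_object` call. `ServerSideEncryption` and `SSEKMSKeyId` are real parameters of that call; the helper name, bucket, object key, and KMS key alias are hypothetical.

```python
def encrypted_put_kwargs(bucket, key, body, kms_key_id=None):
    """Return put_object arguments that request server-side encryption.

    With a KMS key ID the object is encrypted under that key (SSE-KMS);
    without one, S3-managed keys are requested (SSE-S3, "AES256").
    """
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id:
        kwargs["ServerSideEncryption"] = "aws:kms"
        kwargs["SSEKMSKeyId"] = kms_key_id
    else:
        kwargs["ServerSideEncryption"] = "AES256"
    return kwargs


sse_s3 = encrypted_put_kwargs("example-data-lake-bucket", "raw/a.json", b"{}")
sse_kms = encrypted_put_kwargs(
    "example-data-lake-bucket", "raw/a.json", b"{}",
    kms_key_id="alias/example-data-lake-key",  # hypothetical key alias
)

# With boto3 installed and credentials configured:
#   boto3.client("s3").put_object(**sse_kms)
```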


Data Lake API & UI

Exposes the Metadata API, search, and Amazon S3 storage services to customers

Can be based on TVM/STS temporary access for many services, and a bespoke API for metadata

Drive all UI operations from API?

Introducing Amazon API Gateway

Host multiple versions and stages of APIs

Create and distribute API keys to developers

Leverage AWS SigV4 to authorize access to APIs

Throttle and monitor requests to protect the backend

Leverages AWS Lambda
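The throttling point can be illustrated with a token bucket, the algorithm API Gateway's documentation describes for request throttling. The class below is a generic local sketch of that idea, not API Gateway's implementation; the class name, parameters, and injectable clock are assumptions made so the example can run and be tested deterministically.

```python
import time


class TokenBucket:
    """Admit up to `burst` requests at once, refilling at `rate` tokens per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2, burst=3)
decisions = [bucket.allow() for _ in range(3)]
print(decisions)
```

Requests rejected by the bucket never reach the backend, which is how throttling protects downstream services such as Lambda functions or the metadata store.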


https://aws.amazon.com/big-data/partner-solutions/

Data Integration Partners

Reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes.


Putting it all together


Building a Data Lake on AWS

[Figure: architecture diagram with numbered steps 1 to 10; legible labels: Kinesis Firehose, Athena Query Service, Glue, Batch]


Processing Data for Analytics on your data lake

Processing & Analytics

Categories: Real-time, Batch, AI & Predictive, BI & Data Visualization, Transactional & RDBMS

Services shown:
• AWS Lambda
• Apache Storm on EMR
• Apache Flink on EMR
• Spark Streaming on EMR
• Elasticsearch Service
• Kinesis Analytics, Kinesis Streams
• DynamoDB (NoSQL DB)
• Aurora (relational database)
• EMR (Hadoop, Spark, Presto)
• Redshift (data warehouse)
• Athena (query service)
• Amazon Lex (speech recognition)
• Amazon Rekognition
• Amazon Polly (text to speech)
• Machine Learning (predictive analytics)
• Kinesis Streams & Firehose


Important considerations

Data Temperature

             | Hot data  | Warm data | Cold data
Volume       | MB–GB     | GB–TB     | PB–EB
Item size    | B–KB      | KB–MB     | KB–TB
Latency      | ms        | ms, sec   | min, hrs
Durability   | Low–high  | High      | Very high
Request rate | Very high | High      | Low
Cost/GB      | $$-$      | $-¢¢      | ¢

Which Stream/Message Storage Should I Use?

                      | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose     | Apache Kafka      | Amazon SQS (Standard)  | Amazon SQS (FIFO)
AWS managed           | Yes                     | Yes                    | Yes                         | No                | Yes                    | Yes
Guaranteed ordering   | Yes                     | Yes                    | No                          | Yes               | No                     | Yes
Delivery (deduping)   | Exactly-once            | At-least-once          | At-least-once               | At-least-once     | At-least-once          | Exactly-once
Data retention period | 24 hours                | 7 days                 | N/A                         | Configurable      | 14 days                | 14 days
Availability          | 3 AZ                    | 3 AZ                   | 3 AZ                        | Configurable      | 3 AZ                   | 3 AZ
Scale/throughput      | No limit / ~table IOPS  | No limit / ~shards     | No limit / automatic        | No limit / ~nodes | No limits / automatic  | 300 TPS/queue
Parallel consumption  | Yes                     | Yes                    | No                          | Yes               | No                     | No
Stream MapReduce      | Yes                     | Yes                    | N/A                         | Yes               | N/A                    | N/A
Row/object size       | 400 KB                  | 1 MB                   | Destination row/object size | Configurable      | 256 KB                 | 256 KB
Cost                  | Higher (table cost)     | Low                    | Low                         | Low (+admin)      | Low-medium             | Low-medium

[Slide annotations: a Hot-to-Warm temperature axis and a "New" badge]
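The delivery row matters for consumer design: with at-least-once delivery the same message can arrive more than once, so consumers are usually made idempotent. The sketch below shows one simple way to do that by remembering processed message IDs. The class name and the in-memory set are illustrative assumptions; a production consumer would keep the seen IDs in a durable store.

```python
class IdempotentConsumer:
    """Wrap a handler so that redelivered messages are processed only once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # assumption: in-memory only, lost on restart

    def process(self, message_id, payload):
        """Run the handler unless this message ID was already handled."""
        if message_id in self.seen:
            return False
        self.handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can be retried when the message is redelivered.
        self.seen.add(message_id)
        return True


totals = []
consumer = IdempotentConsumer(totals.append)

# Message "m1" is delivered twice, as at-least-once delivery permits.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]
outcomes = [consumer.process(mid, payload) for mid, payload in deliveries]
print(outcomes, sum(totals))
```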

Analytics Types & Frameworks (Process/Analyze)

Batch
• Takes minutes to hours
• Example: Daily/weekly/monthly reports
• Amazon EMR (MapReduce, Hive, Pig, Spark)

Interactive
• Takes seconds
• Example: Self-service dashboards
• Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
• Subsecond: ElastiCache (Redis 3.2 TiB, Memcached), SAP HANA

Message
• Takes milliseconds to seconds
• Example: Message processing
• Amazon SQS applications on Amazon EC2

Stream
• Takes milliseconds to seconds
• Example: Fraud alerts, 1-minute metrics
• Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda

Artificial Intelligence
• Takes milliseconds to minutes
• Example: Fraud detection, forecast demand, text to speech
• Amazon AI (Lex, Polly, ML, Rekognition), Amazon EMR (Spark ML), Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK, and Caffe)

[Figure: frameworks grouped by type (Message, Stream, Interactive, Batch, AI) along a Fast-to-Slow axis]

Which Analysis Tool Should I Use?

                    | Amazon Redshift                                     | Amazon Athena                                  | Amazon EMR (Presto, Spark, Hive)
Use case            | Optimized for data warehousing                      | Ad-hoc interactive queries                     | Presto: interactive query; Spark: general purpose (iterative ML, RT, ..); Hive: batch
Scale/throughput    | ~Nodes                                              | Automatic / no limits                          | ~Nodes
AWS managed service | Yes                                                 | Yes, serverless                                | Yes
Storage             | Local storage                                       | Amazon S3                                      | Amazon S3, HDFS
Optimization        | Columnar storage, data compression, and zone maps   | CSV, TSV, JSON, Parquet, ORC, Apache Web log   | Framework dependent
Metadata            | Amazon Redshift managed                             | Athena Catalog Manager                         | Hive Meta-store
BI tools support    | Yes (JDBC/ODBC)                                     | Yes (JDBC)                                     | Yes (JDBC/ODBC & custom)
Access controls     | Users, groups, and access controls                  | AWS IAM                                        | Integration with LDAP
UDF support         | Yes (scalar)                                        | No                                             | Yes


Case Study


“For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren’t able to do before, and that is priceless.”

- Steve Randich, CIO

Case Study: Re-architecting Compliance

What FINRA needed
• Infrastructure for its market surveillance platform
• Support of analysis and storage of approximately 75 billion market events every day

Why they chose AWS
• Fulfillment of FINRA’s security requirements
• Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3

Benefits realized
• Increased agility, speed, and cost savings
• Estimated savings of $10-20m annually by using AWS


Fraud Detection

FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion trading events per day and securely store over 5 petabytes of data, attaining savings of $10-20m per year.


Summary


• AWS enables you to build sophisticated data lakes and related analytics applications

• Retrospective, Real-time, Predictive

• You can build incrementally, adding use cases and increasing scale as you go

• AWS provides a broad range of security and auditing features to enable you to meet your security requirements

https://aws.amazon.com/big-data/


Takeaways


• Prescriptive guidance and rapidly deployable solutions to help you store, analyze, and process big data on the AWS Cloud

• Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

• Deploying a Data Lake on AWS - March 2017 AWS Online Tech Talks

• Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS

• Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly Webinar Series - YouTube

http://bit.ly/2qiElYx

http://amzn.to/2mzGppL

http://bit.ly/2qipA8h

http://amzn.to/2qpiFaK

http://amzn.to/2lpbc8p


?