© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sanjay Srivastava - Product manager, AWS Lake Formation
Mert Hocanin – Big data architect, AWS Lake Formation
August 2021
What's New with AWS Lake Formation: Securing and Governing Your Data Lake
What is a data lake?
A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for
analysis.
A data lake enables you to break down data silos and combine different types of analytics and ML to gain insights and guide better
business decisions.
The Lake House Approach
Scalable data lakes
Purpose-built data services
Automated data movement
Central governance
Performant and cost-effective
Non-relational databases
Machine learning
Data warehousing
Log analytics
Big data processing
Relational databases
Governed storage
Amazon S3
Amazon Aurora
Amazon DynamoDB
Amazon Elasticsearch Service
Amazon SageMaker
Amazon EMR
Amazon Redshift
AWS Lake Formation
Build a secure data lake in days
Build data lakes quickly
• Move, store, update, and catalog your data faster
• Automatically organize and optimize your data
Easily discover and share data
• Catalog all of your data assets and easily share datasets between consumers
Simplify security management
• Centrally define and enforce security, governance, and auditing policies
Lake Formation - Recap
Components of the security and governance layer
Data is organized in Apache Hive style tables
• S3 bucket, S3 partitions, S3 object
• Example partitions: region=Americas, year=2018, month=Nov, day=30
• S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data1.csv
• S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data2.csv
• S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data3.csv…
Data catalogs provide database and table abstraction
• AWS Glue Data Catalog: Database 1, Database 2
Lake Formation provides authorization layer over Glue Data Catalog
• Fine-grained access controls: database, table, columns, rows (preview)
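The Hive-style layout above can be illustrated with a small sketch that splits one of the example object paths into bucket, partition keys, and object name. The function name parse_hive_path is hypothetical and is not part of any AWS SDK; it only shows how the key=value folder convention maps to partition columns.

```python
from urllib.parse import urlparse


def parse_hive_path(uri: str) -> dict:
    """Split an S3 URI with Hive-style key=value folders into its parts."""
    parsed = urlparse(uri)
    segments = [s for s in parsed.path.split("/") if s]
    partitions = {}
    # Every folder of the form key=value is treated as a partition column.
    for segment in segments[:-1]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return {
        "bucket": parsed.netloc,
        "partitions": partitions,
        "object": segments[-1],
    }


info = parse_hive_path(
    "S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data1.csv"
)
print(info["bucket"], info["partitions"], info["object"])
```

A catalog table over this prefix would typically expose region, year, month, and day as partition columns, which is what lets engines skip folders that a query does not need.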
Centralized security and governance layer
AWS Lake Formation
• Engines: Amazon Redshift Spectrum, AWS Glue ETL, Amazon Athena, Amazon EMR, partner solutions
Data is organized in Apache Hive style tables
• S3 objects partitioned by region (Americas), year (2018), month (Oct., Nov., Dec.), and day (29, 30, 1)
Data catalogs provide database and table abstraction
• AWS Glue Data Catalog: Database 1, Database 2
Secure data sharing
To share data across accounts you were . . .
Producer Consumer Producer Consumer
Data sharing made simple with Lake Formation
Share entire database
Share multiple tables
Share columns & rows
AWS Lake Formation cross-account sharing
Producer: grant resources
• Producer GRANTs permissions to consumers
• Shared resources: Sales, Opportunities
Consumer: create resource links
• Consumer “soft links” shared resources
• Resource links in the EU account: eu_sales, eu_opps
• Data lake admin delegates access to users
• Permissions on shared resources
Analytic engines use resource links
• Amazon Athena, AWS Glue, Amazon EMR, Amazon Redshift
Each account has its own data catalog and AWS Lake Formation
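The producer grant and the consumer resource link can be sketched as the request payloads each side would send. This is a minimal sketch, assuming the boto3 Lake Formation grant_permissions and Glue create_table request shapes as I understand them; the account IDs and the database name sales_db are hypothetical, and nothing here calls AWS.

```python
def build_cross_account_grant(consumer_account_id: str, database: str, table: str) -> dict:
    """Producer side: payload for a cross-account SELECT grant on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
        # Lets the consumer's data lake admin delegate access to its own users.
        "PermissionsWithGrantOption": ["SELECT"],
    }


def build_resource_link(link_name: str, producer_account_id: str, database: str, table: str) -> dict:
    """Consumer side: table input that "soft links" the shared table."""
    return {
        "Name": link_name,
        "TargetTable": {
            "CatalogId": producer_account_id,
            "DatabaseName": database,
            "Name": table,
        },
    }


# Hypothetical account IDs and database name, for illustration only.
grant = build_cross_account_grant("444455556666", "sales_db", "sales")
link = build_resource_link("eu_sales", "111122223333", "sales_db", "sales")

# With credentials configured, these would be passed roughly as:
#   boto3.client("lakeformation").grant_permissions(**grant)
#   boto3.client("glue").create_table(DatabaseName="eu_db", TableInput=link)
```

Keeping the payloads as plain dictionaries makes them easy to review or log before any permission is actually granted.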
LF supports common data sharing topologies
• Centralized, hub and spoke (within an organization)
• Across orgs and companies, e.g., vendors, suppliers, aggregators, distributors
• Data mesh
Lake Formation – What’s new
Why row-level security?
Share a large fact table across groups and departments
Today, this requires multiple redacted datasets
• Admins see all records
• Doctors and nurses see their patients’ records
• Store managers see their store’s records
• Regional managers see all stores’ records
AWS Lake Formation row-level security
Row-level security permissions
• Read APIs uniformly enforce granular compliance policies
• Row filter expressions are the “WHERE” clause in PartiQL
• Supports many S3-based table formats, open and managed: Governed, Amazon Redshift data shares, Apache Hive, Apache Iceberg, Apache Hudi, Delta Lake, . . .
• Easy to audit permissions and access
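The idea of one shared table with per-principal row filters can be simulated locally. This sketch does not use Lake Formation; the principal names, the sample rows, and the reading of "regional manager" as "all stores in one region" are assumptions made only to illustrate how a WHERE-style predicate replaces multiple redacted copies.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# One predicate per principal plays the role of a row filter expression.
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "admin": lambda row: True,                                  # sees all records
    "store_manager_17": lambda row: row["store_id"] == 17,      # one store
    "regional_manager_west": lambda row: row["region"] == "West",
}


def read_rows(principal: str, table: List[Row]) -> List[Row]:
    """Return only the rows the principal's filter allows; no filter, no rows."""
    predicate = ROW_FILTERS.get(principal)
    if predicate is None:
        return []
    return [row for row in table if predicate(row)]


sales = [
    {"store_id": 17, "region": "West", "amount": 65.00},
    {"store_id": 18, "region": "West", "amount": 41.50},
    {"store_id": 21, "region": "East", "amount": 54.10},
]

print(len(read_rows("admin", sales)))
print(read_rows("store_manager_17", sales))
```

Because every reader goes through the same read path, there is a single table to maintain and a single place to audit who can see which rows.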
AWS Lake Formation cell-level security
Cell-level security permissions build on row-level security
• Restrict column access based on row predicates
• Reuse data filters to scale governance
Data filter = set of columns along with row expression
• “SELECT” columns: “*” or “Column1, Column2”
• “WHERE” clause in PartiQL
Mask out restricted data with multiple data filter grants
• Data filters: US-Non-Sensitive, UK-Non-Sensitive
• Select * where country=US
• Select * except IP where country=UK
Source table: columns Country and IP, with rows for UK, UK, US, US
Effective access with masked IP column:
Country  IP
UK       ********
UK       ********
US
US
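How two data filter grants combine into one effective view can be sketched with a local simulation. This is not the Lake Formation implementation: the DataFilter class, the filter names, and the sample IP values are hypothetical, and the masking string simply mirrors the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class DataFilter:
    """A row predicate plus the columns it hides for matching rows."""
    name: str
    row_predicate: Callable[[Dict[str, str]], bool]
    excluded_columns: FrozenSet[str] = frozenset()


def effective_view(rows: List[Dict[str, str]], filters: List[DataFilter],
                   mask: str = "********") -> List[Dict[str, str]]:
    """Union the columns granted by every filter whose predicate matches the row."""
    result = []
    for row in rows:
        matching = [f for f in filters if f.row_predicate(row)]
        if not matching:
            continue  # no grant covers this row, so it is not returned at all
        visible = set()
        for f in matching:
            visible |= set(row) - f.excluded_columns
        result.append({col: (val if col in visible else mask) for col, val in row.items()})
    return result


rows = [
    {"Country": "UK", "IP": "192.0.2.10"},
    {"Country": "UK", "IP": "192.0.2.11"},
    {"Country": "US", "IP": "198.51.100.20"},
    {"Country": "US", "IP": "198.51.100.21"},
]
filters = [
    DataFilter("us_all_columns", lambda r: r["Country"] == "US"),
    DataFilter("uk_without_ip", lambda r: r["Country"] == "UK", frozenset({"IP"})),
]

view = effective_view(rows, filters)
for row in view:
    print(row)
```

The UK rows come back with the IP cell masked while the US rows keep it, which matches the "effective access" table on the slide.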
Demo: Lake Formation cell-level security
Setup: ‘customer’ table with sensitive data for US and Canada.
- Compliance requires that the US analyst sees only US rows and the Canada analyst sees only Canada rows.
- US-Analyst: Select * where country=US
- Canada-Analyst: Select * except address where country=Canada
Summary: create named data filters specifying row and column permissions, then grant permissions on the named data filters.
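The two demo filters could be expressed as request payloads along these lines. This is a sketch, assuming the boto3 Lake Formation create_data_cells_filter and grant_permissions request shapes as I understand them; the catalog ID, the database name demo_db, the filter names, and the role ARN are hypothetical, and no AWS call is made.

```python
def build_data_filter(catalog_id, database, table, name, row_expression, excluded_columns=()):
    """Payload describing a named data filter: a row expression plus hidden columns."""
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_expression},
        # An empty exclusion list means "all columns".
        "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
    }


def build_filter_grant(principal_arn, data_filter):
    """Payload granting SELECT on a named data filter to one principal."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": data_filter["TableCatalogId"],
                "DatabaseName": data_filter["DatabaseName"],
                "TableName": data_filter["TableName"],
                "Name": data_filter["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical identifiers, for illustration only.
us_filter = build_data_filter("111122223333", "demo_db", "customer", "us_rows", "country='US'")
canada_filter = build_data_filter(
    "111122223333", "demo_db", "customer", "canada_rows_no_address",
    "country='Canada'", excluded_columns=["address"],
)
canada_grant = build_filter_grant("arn:aws:iam::111122223333:role/Canada-Analyst", canada_filter)

# With credentials configured, these would be sent roughly as:
#   lf = boto3.client("lakeformation")
#   lf.create_data_cells_filter(TableData=canada_filter)
#   lf.grant_permissions(**canada_grant)
```

Naming the filter once and granting on the name is what allows the same filter to be reused for additional analysts later.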
Lake Formation: Why LF tag-based access control?
• Difficult to scale permissions as the number of resources and principals increases
• Tight coupling: permissions cannot be granted before resources are created
• Policy explosion: every resource add requires permissions update
Lake Formation: Tag-based access control
Classify data using LF Tag ontology
Grant principals access on LF tags independently
Tag databases, tables, and columns as resources are created
Scale management of large number of resources easily with LF-tag hierarchy
Table: Sales
Item    Region  Email Address      Sales Price
141414  West    [email protected]  $65.00
124141          [email protected]  $41.50
135355  East    [email protected]  $54.10
423514  East    [email protected]  $81.43
Table: Mktg
Email Address      First Name  Last Name  channel   Acq Costs
[email protected]  Andy        McDowell   Facebook  $2.00
[email protected]  Kathy       Bates      Facebook  $1.75
[email protected]  Jenna       Bush       adwords   $1.40
Scale enforcement easily
• Sales Mgr: Dept=Sales, PII != true
• Executive: PII=true
• Marketing Exec: Dept=MKTG, PII=true
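The decoupling of tagging from granting can be simulated in a few lines. This is a local sketch, not Lake Formation itself: which columns carry which tags is an assumption, and "PII != true" is modeled as "PII is false" because the sketch only matches a tag key against a set of allowed values.

```python
from typing import Dict, List, Set, Tuple

# (table, column) -> tags assigned when the resource is created (assumed values).
COLUMN_TAGS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("Sales", "Item"): {"Dept": "Sales", "PII": "false"},
    ("Sales", "Email Address"): {"Dept": "Sales", "PII": "true"},
    ("Mktg", "Email Address"): {"Dept": "MKTG", "PII": "true"},
    ("Mktg", "channel"): {"Dept": "MKTG", "PII": "false"},
}

# principal -> tag expression: every listed key must match one of its values.
GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "Sales Mgr": {"Dept": {"Sales"}, "PII": {"false"}},
    "Executive": {"PII": {"true"}},
    "Marketing Exec": {"Dept": {"MKTG"}, "PII": {"true"}},
}


def is_allowed(expression: Dict[str, Set[str]], tags: Dict[str, str]) -> bool:
    """A resource matches when each tag key in the expression has an allowed value."""
    return all(tags.get(key) in allowed for key, allowed in expression.items())


def visible_columns(principal: str) -> List[Tuple[str, str]]:
    expression = GRANTS[principal]
    return sorted(col for col, tags in COLUMN_TAGS.items() if is_allowed(expression, tags))


print(visible_columns("Sales Mgr"))
print(visible_columns("Executive"))
```

Adding a new tagged column changes what principals can see without touching GRANTS, which is the scaling property the slide describes.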
Demo: Lake Formation Tag-based access control
Setup: Database ‘Sales’ with ‘customer’ table containing PII data.
- Compliance requires that data analysts have access to non-PII data.
- Table: Customer (columns: Country, Address, DOB)
- Business-Analyst: Tag PII=false
Summary: create an ontology, and decouple resource creation from access grants to scale governance.
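The three steps of the demo (define the tag, tag the resources, grant on the tag) could be expressed as request payloads like the following. This is a sketch, assuming the boto3 Lake Formation create_lf_tag, add_lf_tags_to_resource, and grant_permissions request shapes as I understand them; the role ARN, the lowercase database and table names, and the choice of Country as the non-PII column are assumptions, and no AWS call is made.

```python
def build_tag_definition(key, values):
    """Step 1: define a tag key and its allowed values (the ontology)."""
    return {"TagKey": key, "TagValues": list(values)}


def build_column_tagging(database, table, columns, key, value):
    """Step 2: attach a tag value to specific columns of a table."""
    return {
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def build_tag_grant(principal_arn, key, values):
    """Step 3: grant SELECT on whatever tables match the tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": key, "TagValues": list(values)}],
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical identifiers; assumes Country is the non-PII column.
pii_tag = build_tag_definition("PII", ["true", "false"])
non_pii_columns = build_column_tagging("sales", "customer", ["country"], "PII", "false")
analyst_grant = build_tag_grant(
    "arn:aws:iam::111122223333:role/Business-Analyst", "PII", ["false"]
)

# With credentials configured, these would be sent roughly as:
#   lf = boto3.client("lakeformation")
#   lf.create_lf_tag(**pii_tag)
#   lf.add_lf_tags_to_resource(**non_pii_columns)
#   lf.grant_permissions(**analyst_grant)
```

The grant never names the customer table, so it can be issued before the table exists and keeps applying as more resources are tagged PII=false.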
Challenges in managing your data
Continuous updates
• Complex ETL
• Delays in data freshness
• Expensive, brittle & error-prone
Inconsistent performance
• Over-scan data
• Lots of small files
• Partition updates
• Management overhead
Complying with regulations
• Difficult to find needle in very large haystack
Introducing Governed Tables
New type of S3 table
ACID transactions
• Metadata and data
• Multiple operations, many tables, many users, various engines
No lock-in
• Retain control over data
• Remains in your S3 buckets
• Open file formats: Parquet, CSV, JSON, . . .
Import and export
• Popular table formats: Apache Hudi, Delta Lake, Apache Iceberg
Time travel
• Access version of data lake at an earlier point in time
Reading Governed Tables with Query acceleration (In Preview)
• PartiQL
Writing to governed tables (In Preview)
• Manifest transactions and read
Catalog Transactions (In Preview)
• Create/update/delete tables with transaction semantics
• BeginTransaction, Create Table/Delete Table/Update Table, Update data.., CommitTransaction
Governed Tables: Storage optimizer (In Preview)
• Automatic Compaction
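The begin, update, commit sequence above can be wrapped so that a failure cancels the transaction instead of leaving it open. This is a minimal sketch: the slide names the first call BeginTransaction, while the method names used here (start_transaction, commit_transaction, cancel_transaction) follow the boto3 Lake Formation client as I understand it and should be treated as an assumption. FakeLakeFormationClient is a hypothetical stand-in so the example runs without AWS.

```python
from contextlib import contextmanager


@contextmanager
def governed_transaction(client):
    """Start a transaction, commit on success, cancel if the body raises."""
    tx_id = client.start_transaction(TransactionType="READ_AND_WRITE")["TransactionId"]
    try:
        yield tx_id
    except Exception:
        client.cancel_transaction(TransactionId=tx_id)
        raise
    else:
        client.commit_transaction(TransactionId=tx_id)


class FakeLakeFormationClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def start_transaction(self, TransactionType):
        self.calls.append("start")
        return {"TransactionId": "tx-1"}

    def commit_transaction(self, TransactionId):
        self.calls.append("commit")
        return {}

    def cancel_transaction(self, TransactionId):
        self.calls.append("cancel")
        return {}


ok_client = FakeLakeFormationClient()
with governed_transaction(ok_client) as tx_id:
    pass  # table and data updates would pass tx_id to each call here

failing_client = FakeLakeFormationClient()
try:
    with governed_transaction(failing_client):
        raise RuntimeError("write failed")
except RuntimeError:
    pass

print(ok_client.calls, failing_client.calls)
```

Readers only ever see the table state from before the transaction or after the commit, which is the property that makes multi-table updates safe for concurrent users and engines.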
Transactions make data lakes trustworthy
“. . . Transactional ETL processes are an important part of how we
ensure data integrity and . . . required additional development time and
complexity. We’re excited about AWS Lake Formation Transactions’
ability to simplify our ETL and reduce the overall effort needed to
produce trustworthy data in our data lake.”
Rob Hruska
Engineering Director
Hudl
Demo – Governed tables
Governed tables make it easy to perform transactional reads and writes.
In this demo we:
- Write to governed tables from Glue and Python
- Read from Athena, EMR, and a Python script
Simple and easy
No cluster to set up, no Spark runtime required
Transactions, row-level security, and acceleration
New AWS Lake Formation update and access APIs to S3 data lakes
Accelerates access to S3 data lakes
Row-level security and updates
Open and public APIs: build your own application
Integrations
AWS Lake Formation new feature availability
Lake Formation Tag-based - GA Governed Tables & Row/Cell-level security - Preview
Northern Virginia Northern Virginia
Oregon Oregon
Ohio
Ireland
Tokyo
Seoul
Singapore
Sydney
Resources to get you started
https://aws.amazon.com/blogs/big-data/category/analytics/aws-lake-formation/
• Part 1: Getting started with Governed tables
• Part 2: Creating a governed table for streaming data sources
• Part 3: Using ACID transactions on governed tables
• Part 4: Implement cell-level and row-level security
• Part 5: Securing data lakes with row-level access control
• Easily manage your data lake at scale using LF tag-based access control
Sign up for preview
AWS Lake Formation: Transactions, row-level security, and acceleration
Accelerate and govern access to your Amazon S3 data lake
Sign up here: https://aws.amazon.com/lake-formation/preview/