© 2019 Snowflake Inc. All Rights Reserved
SNOWFLAKE BEST PRACTICES
LOUIS LEE, SALES ENGINEER
CLIVE ASTBURY, REGIONAL SALES ENGINEERING MANAGER
AGENDA
Virtual Warehouse Management
Cost Management
Network Security Policies
User Authentication
Role Management
Snowflake Community
VIRTUAL WAREHOUSE MANAGEMENT
Considerations
• Key SLAs and challenges with meeting SLAs
• Data load and transformation workloads
• Reporting, ad hoc analysis, and data science workloads
• Cost management
Topics
• Sizes and approach to right-sizing
• Scaling up vs. scaling out
• Automating suspend/resume, sizing, and multi-cluster scale-out
WAREHOUSE SIZES
Size     | Servers / Cluster | Credits / Hour | Notes
X-Small  | 1                 | 1              | Default size when created using CREATE WAREHOUSE.
Small    | 2                 | 2              |
Medium   | 4                 | 4              |
Large    | 8                 | 8              |
X-Large  | 16                | 16             | Default size for warehouses created in the web UI.
2X-Large | 32                | 32             |
3X-Large | 64                | 64             |
4X-Large | 128               | 128            |
SCALE UP - LOADING 1BN RECORDS
Doubling the number of servers halves the run-time...
… but you pay per-server, per-second of compute...
… so you can get your answer 8x faster for the same cost.
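The arithmetic behind this slide can be sketched in Python. The one-hour X-Small baseline is an assumed figure for illustration only, and the perfect halving of run-time per doubling is the idealized behavior the slide describes, not a measured result.

```python
# Credits per hour by warehouse size, as listed under WAREHOUSE SIZES.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}

# Assumption for illustration: the load takes 1 hour on an X-Small warehouse
# and scales linearly, so doubling the servers halves the run-time.
BASELINE_HOURS = 1.0


def run_time_hours(size: str) -> float:
    """Idealized run-time: inversely proportional to the number of servers."""
    return BASELINE_HOURS / CREDITS_PER_HOUR[size]


def cost_credits(size: str) -> float:
    """Credits consumed: credits per hour multiplied by the hours billed."""
    return CREDITS_PER_HOUR[size] * run_time_hours(size)


for size in CREDITS_PER_HOUR:
    minutes = run_time_hours(size) * 60
    print(f"{size:8s} {minutes:5.1f} min  {cost_credits(size):.2f} credits")
```

Under these assumptions every size consumes the same number of credits, while the Large warehouse finishes in one eighth of the X-Small run-time.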
SCALE OUT - MULTI-CLUSTER WAREHOUSES
4x increase in servers
4x increase in servers (at peak load)
Both are 16 servers, in different configurations.
Multi-cluster is also half the cost of the X-Large single cluster.
Multi-cluster gives better results.
ALL TOGETHER - SCALE, ELASTICITY, COST
[Figure: warehouse configurations S, M, and MM plotted against time]
All three examples contain the same amount of work.
Using scale up and scale out, total run-time is significantly reduced.
You pay per server, per second, so they all cost the same.
AUTOMATING SUSPEND/RESUME
Auto Suspend/Resume
• On-demand, end-user workloads
• Suspend idle time setting should take into account data caching
Programmatic Suspend/Resume
• Scheduled jobs where process orchestration is controlled
• Programmatically resume at the start of processing and suspend at the end of processing to avoid idle time costs
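One way to wrap a scheduled job in a programmatic resume and suspend is sketched below. The `execute` callable and the warehouse name `LOAD_WH` are hypothetical stand-ins; in practice `execute` would send each statement to Snowflake through whichever driver the orchestration tool uses.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def running_warehouse(execute: Callable[[str], None], name: str) -> Iterator[None]:
    """Resume a warehouse for the duration of a job, then suspend it.

    `execute` is a hypothetical callable that sends one SQL statement
    to Snowflake; it is not part of any Snowflake library.
    """
    execute(f"ALTER WAREHOUSE {name} RESUME IF SUSPENDED")
    try:
        yield
    finally:
        # Suspend even if the job fails, so no idle time is billed.
        execute(f"ALTER WAREHOUSE {name} SUSPEND")


# Usage with a stand-in executor that only records the statements.
sent = []
with running_warehouse(sent.append, "LOAD_WH"):
    sent.append("-- scheduled job statements run here")
print("\n".join(sent))
```

The `finally` clause is the design choice that matters here: the suspend statement is issued whether the job succeeds or raises.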
COST MANAGEMENT
Considerations
• Compute Costs
• Storage Costs
• Service Costs
• Data Transfer (Egress) Costs
• Monitoring & Alerting
Topics
● Resources Incurring Costs
● Compute
  ○ Viewing Usage
  ○ Resource Monitors
● Storage
  ○ Time Travel & Fail-Safe
  ○ Viewing Usage
● Services
  ○ Non-warehouse compute
RESOURCES INCURRING COSTS
[Diagram: account objects color-coded by cost type]
Cost types: Compute Costs, Storage Costs, Service Costs, Pass-through Costs
Objects shown: Account, Virtual Warehouses, Databases, Schemas, Tables (Permanent, Temp/Transient), Stages (Internal), Pipes, Materialized Views, Automatic Clustering Service, Cross-Region, Extract Egress
RESOURCE MONITOR
• Align with team-by-team warehouse separation for granular cost governance
• Set at account level if specific virtual warehouse quotas are not needed
• Leverage tiered triggers with escalating actions (e.g., Notify > Notify > Suspend)
• Enable notifications using the ACCOUNTADMIN role and set an e-mail address
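The tiered-trigger idea can be modeled with a few lines of Python. The 75/90/100 percent thresholds are assumptions chosen for illustration, and the function only mimics the escalation logic; it is not how Snowflake evaluates resource monitors internally.

```python
# Hypothetical tiered triggers: (percent of credit quota used, action).
TRIGGERS = [(75, "NOTIFY"), (90, "NOTIFY"), (100, "SUSPEND")]


def fired_actions(credits_used: float, credit_quota: float) -> list:
    """Return the actions whose thresholds have been reached, in order."""
    percent_used = 100 * credits_used / credit_quota
    return [action for threshold, action in TRIGGERS if percent_used >= threshold]


print(fired_actions(80, 100))   # first notification only
print(fired_actions(100, 100))  # both notifications, then suspend
```

Placing two notification tiers ahead of the suspend tier gives the owning team a chance to react before any warehouse is stopped.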
TABLE TYPES
Temporary: Tied to an individual session and persists only for the duration of the session. Used for storing non-permanent, transitory data (e.g., ETL data, session-specific data).
Transient: Specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary tables), but does not need the same level of data protection and recovery provided by permanent tables.
Permanent: Designed for data that requires the highest level of data protection and recovery, with both a Time-Travel and a Fail-Safe period. This is the default when creating tables.
Feature     | Temporary | Transient | Permanent
Time-Travel | ✓         | ✓         | ✓
Fail-Safe   | x         | x         | ✓
TIME TRAVEL STORAGE
• Detect high churn with a ratio such as TIME_TRAVEL_BYTES / ACTIVE_BYTES from the TABLE_STORAGE_METRICS view
• For Enterprise (or higher), the retention period can be up to 90 days; verify the retention period on all large or high-churn tables
• Reduce the retention period if data can be regenerated/reloaded and the time/effort to do so is within acceptable boundaries/SLAs
• Use periodic zero-copy cloning (snapshots) instead of time travel to provide a longer retention period at discrete points in time (daily, weekly, etc.)
Areas Of Focus
• Dimensional Tables
• Persistent Staging Areas
• Materialized Relationships, Derivations, Other Business Rules
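The churn ratio mentioned above can be computed as in the following sketch. The table names, byte counts, and the cut-off of 1.0 are all made-up values for illustration; real figures would come from querying the TABLE_STORAGE_METRICS view.

```python
# Hypothetical rows shaped like TABLE_STORAGE_METRICS output:
# (table_name, active_bytes, time_travel_bytes)
ROWS = [
    ("DIM_CUSTOMER", 10_000_000, 45_000_000),
    ("FACT_SALES", 900_000_000, 30_000_000),
    ("STG_ORDERS", 0, 5_000_000),
]

# Assumed cut-off; pick a value that suits your own workloads.
CHURN_THRESHOLD = 1.0


def churn_ratio(active_bytes: int, time_travel_bytes: int) -> float:
    """TIME_TRAVEL_BYTES / ACTIVE_BYTES; an emptied table counts as infinite churn."""
    if active_bytes == 0:
        return float("inf") if time_travel_bytes > 0 else 0.0
    return time_travel_bytes / active_bytes


high_churn = [
    name for name, active, time_travel in ROWS
    if churn_ratio(active, time_travel) > CHURN_THRESHOLD
]
print(high_churn)
```

Tables flagged this way are the first candidates for a retention-period review.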
FAIL-SAFE STORAGE
• Permanent tables follow the full CDP lifecycle; temp/transient tables NEVER use fail-safe
• Utilize temp tables for session-specific intermediate results in complex data processing workflows
• Temporary tables are dropped (and storage released) as soon as the session ends
• Utilize transient tables for staging where frequent truncate/reload operations occur
• Consider designating databases/schemas as transient to simplify table creation
Areas Of Focus
• Staging Tables
• Intermediate Result Tables
• Work Areas for Developers, Analysts & Data Scientists
• Reporting Tool Materialized Results
LAYERED SECURITY
1. Network (Authenticate Connection): To restrict access to a specified IP address/range. Optionally, to restrict via a secure private network.
2. Account (Authenticate User): To authenticate users using a password, Multi-Factor Authentication, or Single Sign-On.
3. Object (Authorization): To restrict access to specific databases, schemas, tables, views, etc., using roles and privileges.
4. Data (Encryption): To protect customer data using AES 256-bit encryption and periodic re-keying.
NETWORK SECURITY
Considerations
• IP Whitelisting & Blacklisting
• Public Internet Exposure Considerations
Topics
• Network Security Policies
• AWS/Azure PrivateLink
NETWORK SECURITY POLICIES
• Managed by the ACCOUNTADMIN or SECURITYADMIN roles
• Only one network policy object can be active at any one time
• Supports IPv4 addresses & CIDR notation
• Maintain consistency with other enterprise application network security policies
• Connectivity test plan should include all networks (i.e., internal, VPN, etc.)
• Utilize IP ranges versus IP lists whenever possible (e.g., 192.168.1.0/24)
• Blocked IPs are enforced first and require careful consideration when overlapping an allowed IP range (e.g., 0.0.0.0/0 blocks all IPs)
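The "blocked first" rule is easy to misjudge when ranges overlap, so a small model can help when drafting a policy. This is a sketch using Python's standard `ipaddress` module; the function name and the sample addresses are illustrative, and it only mirrors the evaluation order described above rather than Snowflake's actual implementation.

```python
from ipaddress import ip_address, ip_network


def is_allowed(client_ip: str, allowed: list, blocked: list) -> bool:
    """Model the rule above: the blocked list is checked before the allowed list."""
    addr = ip_address(client_ip)
    if any(addr in ip_network(cidr) for cidr in blocked):
        return False
    return any(addr in ip_network(cidr) for cidr in allowed)


allowed = ["192.168.1.0/24"]
print(is_allowed("192.168.1.10", allowed, blocked=["192.168.1.99"]))  # True
print(is_allowed("192.168.1.99", allowed, blocked=["192.168.1.99"]))  # False
print(is_allowed("192.168.1.10", allowed, blocked=["0.0.0.0/0"]))     # False
```

The last call shows why 0.0.0.0/0 in the blocked list locks everyone out, regardless of what the allowed list contains.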
USER AUTHENTICATION
Considerations
• Multi-Factor Authentication
• Federated Authentication
• User Group Scenarios
• Service Account Scenarios
Topics
• Multi-Factor Authentication
• Federated Authentication & SSO
• OAuth
MULTI-FACTOR AUTHENTICATION
• Provides increased login security for users connecting to Snowflake
• Powered by Duo Security, which is managed by Snowflake
• Users can self-enroll
• Strongly recommend requiring MFA for all users with the ACCOUNTADMIN role
• A Duo-generated passcode can be used when connecting through Python, SnowSQL, JDBC, or ODBC
FEDERATED AUTH & SSO
• Enables user SSO (single sign-on) through federated authentication
• Browser-based support for most SAML 2.0-compliant identity providers (Google, Azure, OneLogin, PingOne)
• Native support for Okta and Microsoft ADFS
• Browser-based SSO can be used in combination with MFA
OAUTH
• OAuth 2.0 is an open-standard protocol that gives supported clients authorized access to Snowflake without sharing or storing user login credentials
• Supports Tableau Desktop/Server/Online and custom clients configured by your organization
• OAuth is supported with AWS PrivateLink
• ACCOUNTADMIN and SECURITYADMIN are blocked roles by default, but can be enabled by Snowflake Support
• Currently, only the user's default role is authorized, or PUBLIC if no default is set
ROLE MANAGEMENT
Considerations
• Administrators
• Developers & DevOps Flow
• End-Users
• Service Accounts
Risks
• Inappropriate or Overly Restrictive Access
• Lack of Extensibility & Control
• Burdensome Maintenance
• Future Rework & Reconfiguration
SYSTEM-DEFINED ROLES
[Diagram labels: Users & Roles, Objects]
ACCOUNTADMIN: Owns the Snowflake account and can operate on all objects in the account, view and manage Snowflake billing and credit data, and stop any running SQL statements
SECURITYADMIN: Primary role for managing users, custom roles, and object access (grants)
SYSADMIN: Primary role for creating and managing objects (i.e., warehouses, databases, tables, etc.) and administering object access through custom roles
PUBLIC: Pseudo-role that is automatically granted to every user and every role in your account
READ ONLY PATTERN: SOLUTION
EXAMPLE Functional Roles
● Analyst Team Lead
● Junior Analyst
Analyst Team Lead
● Has all (CRUD) access to a working schema
● Read access to the main schema
Junior Analyst
● Limited to read access to the main schema
Both roles share access to a Virtual Warehouse
[Diagram: users OBRIAN and WSMITH; roles Analyst Team Lead and Junior Analyst; Database: DWH with Schema: Working Area and Schema: Main, each containing a Table; a Virtual Warehouse; grants labeled Usage and Select]
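The grants behind the read-only pattern can be sketched as a small statement generator. The identifiers `JUNIOR_ANALYST`, `MAIN`, and `ANALYST_WH` are hypothetical names chosen for illustration (only the database name DWH appears in the slide), and the helper is not part of any Snowflake tooling.

```python
def read_only_grants(role: str, database: str, schema: str, warehouse: str) -> list:
    """Build the GRANT statements a read-only role needs on one schema."""
    return [
        # Usage on the shared warehouse lets the role run queries at all.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        # Usage on the database and schema makes the tables reachable.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role}",
        # Select is the only table-level privilege in a read-only pattern.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {database}.{schema} TO ROLE {role}",
    ]


grants = read_only_grants("JUNIOR_ANALYST", "DWH", "MAIN", "ANALYST_WH")
print(";\n".join(grants) + ";")
```

The same helper could be reused for the Analyst Team Lead's read access to the main schema, with the working-schema privileges granted separately.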
OTHER CONSIDERATIONS
Naming Convention
• Establish and use a consistent naming convention across the entire account
Future Grants
• Allows defining a role with an initial set of privileges on new objects of a certain type (e.g., tables or views) within a schema or database (pr-preview)
Viewing Granted Roles & Privileges
• SHOW GRANTS TO USER <user>;
• SHOW GRANTS TO ROLE <role>;
• SHOW GRANTS OF ROLE <role>;
• Query INFORMATION_SCHEMA
Managed Access Schema
• Centralizes grant management to the schema owner or a role with MANAGE GRANTS
SNOWFLAKE COMMUNITY
Snowflake Community
• We are moving our forum to Stack Overflow
• Use the existing forum for Snowflake account-related questions
• Everything else will remain the same with Snowflake Community
Stack Overflow
• Technical Q&A
• Use the “[snowflake]” tag
• Include relevant information like error messages