November 12, 2014 | Las Vegas, NV
BDT206
See How Amazon Redshift is Powering Business Intelligence in the Enterprise
Rahul Pathak, Amazon Redshift
Jason Timmes, Nasdaq
Kevin Diamond, HauteLook
[Diagram: AWS services across the Collect, Store, and Analyze stages: AWS Direct Connect, Amazon Kinesis, Amazon S3, Amazon Glacier, Amazon DynamoDB, AWS Data Pipeline, Amazon Redshift, Amazon Elastic MapReduce, Amazon EC2]
[Diagram: Amazon Redshift cluster architecture. Labels: JDBC/ODBC, Customer VPC, Internal VPC, 10 GigE (HPC), Ingestion, Backup, Restore]
[Diagram: Data Source ET, Direct Connect, Client, Forwarder, Loader, State Management, Sandbox, Amazon Redshift, S3]
• Leading index provider with 41,000+ indexes across asset classes and geographies
• Over 10,000 corporate clients in 60 countries
• Our technology powers over 70 marketplaces, regulators, CSDs, and clearinghouses in over 50 countries
• 100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries
• 26 markets, 3 clearing houses, 5 central securities depositories
• Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value
Our warehouse can be used to analyze market share, client activity, and surveillance, to power our billing, and more…
• Pay close attention to the manifest "mandatory" flag!
  – Amazon Redshift UNLOAD always sets this to false!
• TableIngestStatus
  – We originally put this table in Amazon Redshift itself
  – Turns out Amazon Redshift is not efficient on really small data sets
  – Significantly impacted performance, and increased concurrency contention
• Solution: Moved TableIngestStatus to a separate transactional RDBMS (MySQL)
  – We were already using a MySQL instance to persist workflow states
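The manifest warning above can be made concrete with a small sketch. It assumes the documented Amazon Redshift COPY manifest layout (an `entries` array whose items carry a `url` and a `mandatory` flag); the function names and the bucket and file names are hypothetical. Setting `mandatory` to true asks COPY to return an error when a listed file is missing instead of skipping it.

```python
import json


def build_copy_manifest(urls, mandatory=True):
    """Return COPY manifest JSON text for the given S3 URLs."""
    if not urls:
        raise ValueError("manifest needs at least one file")
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def force_mandatory(manifest_text):
    """Rewrite an existing manifest so that every entry is mandatory."""
    manifest = json.loads(manifest_text)
    for entry in manifest["entries"]:
        entry["mandatory"] = True
    return json.dumps(manifest, indent=2)


# Hypothetical file names, for illustration only.
manifest_text = build_copy_manifest([
    "s3://example-bucket/loads/part-0000.gz",
    "s3://example-bucket/loads/part-0001.gz",
])
```

`force_mandatory` is the piece that matters when a manifest written by UNLOAD is reused as COPY input.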
• Direct Connect (private lines)
• VPC
• Encryption in flight (HTTPS/SSL/TLS on API, JDBC)
  – Parameter Group: require_ssl = true
  – Use Amazon Redshift cluster SSL certificate to verify cluster identity
• Encryption at rest
  – AES-256 encrypt files prior to loading to S3 (not using S3 SSE)
  – Amazon Redshift encryption
    • Specified at cluster creation, applies to backups/snapshots too
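On the client side, the in-flight settings above could look like the following sketch, which builds a libpq-style connection string that requires certificate verification. The `sslmode` and `sslrootcert` keywords are standard libpq parameters; the function name, host, database, user, and certificate path are hypothetical placeholders, and whether a particular driver accepts this exact string is an assumption to check against its documentation.

```python
def redshift_dsn(host, dbname, user, ca_cert_path, port=5439):
    """Build a libpq-style DSN that refuses unverified connections.

    Values containing spaces would need quoting; this sketch assumes
    they do not.
    """
    params = {
        "host": host,
        "port": port,  # 5439 is the default Amazon Redshift port
        "dbname": dbname,
        "user": user,
        # verify-full: encrypt the session, validate the certificate
        # chain, and check that the certificate matches the host name.
        "sslmode": "verify-full",
        "sslrootcert": ca_cert_path,
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical endpoint and certificate location.
dsn = redshift_dsn(
    "example-cluster.example.com",
    "warehouse",
    "loader",
    "/etc/ssl/certs/redshift-ca.pem",
)
```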
• Amazon Redshift will store the cluster key in a single customer-premises HSM (or CloudHSM)
  – SafeNet Luna SA HSM, firmware version should match CloudHSM
  – Requires certificate exchange between cluster and HSM
  – Requires the cluster to have an EIP
    • On our side, required static 1-to-1 NAT of HSM private IP
    • VPC Security Groups still apply; can still isolate cluster from others
  – Encrypted database key decrypted in HSM, passed over encrypted channel to cluster on startup, stored in memory to decrypt data encryption (block) keys
  – If running an HSM HA group, must synchronize keys after creation
• HSM integration was critical to Nasdaq adoption
• Monitor cluster access, react to any unauthorized connections
  – STL_CONNECTION_LOG
    • Query system table on a timed basis, alert to any unexpected access
  – CloudTrail to Splunk Amazon Redshift connection & user logs
    • Captures all API calls, not activity inside Amazon Redshift
  – STL_DDLTEXT
    • Audits all schema changes in the cluster
• In response to an alert, Amazon Redshift/HSM connectivity is severed, and cluster is immediately shut down
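The timed STL_CONNECTION_LOG check described above might be sketched as follows. This is an illustration, not the speakers' implementation: the allow-lists, the function name, and the sample rows are hypothetical, and the `%s` placeholder assumes a DB-API driver that uses the `format` parameter style.

```python
# Query to run on a timer; columns follow the STL_CONNECTION_LOG system table.
RECENT_CONNECTIONS_SQL = """
SELECT recordtime, remotehost, username, event
FROM stl_connection_log
WHERE recordtime > %s
ORDER BY recordtime
"""


def unexpected_connections(rows, allowed_hosts, allowed_users):
    """Return the rows whose host or user is not on the allow-lists."""
    flagged = []
    for row in rows:
        # Character columns in system tables may be space-padded.
        host = row["remotehost"].strip()
        user = row["username"].strip()
        if host not in allowed_hosts or user not in allowed_users:
            flagged.append(row)
    return flagged


# Hypothetical rows, shaped like the query result.
sample_rows = [
    {"remotehost": "10.0.0.5   ", "username": "loader  ", "event": "initiating session"},
    {"remotehost": "203.0.113.9", "username": "loader  ", "event": "initiating session"},
]
alerts = unexpected_connections(sample_rows, {"10.0.0.5"}, {"loader"})
```

Any non-empty `alerts` list would be the trigger for the response described in the last bullet.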
• With validation, data integrity, and security requirements met, the challenge remains to optimize ingest
• Why?
  – Concurrency is a huge performance factor; can’t afford to be loading yesterday’s data when clients are running queries
[Chart: S3 (over HTTPS) Multithreaded Throughput. X-axis: Concurrent Threads (1 to 18); Y-axis: Throughput (MB/sec), scale up to 140]
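The multithreaded S3 throughput measurements point toward uploading load files on several threads at once. A minimal sketch of that pattern follows, assuming a caller-supplied `upload_one` function (hypothetical; in practice it would wrap whatever S3 client is in use). The default worker count is a placeholder to tune against measured throughput, not a recommendation from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload_one, max_workers=8):
    """Upload files concurrently and return {path: result}.

    upload_one(path) performs a single upload. An exception raised by
    any upload propagates to the caller when results are collected.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input paths.
        results = list(pool.map(upload_one, paths))
    return dict(zip(paths, results))


def fake_upload(path):
    """Stand-in uploader for illustration; returns a pretend byte count."""
    return len(path)


uploaded = upload_all(["part-0000.gz", "part-0001.gz"], fake_upload, max_workers=2)
```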
[Diagram: ingest architecture. Scopes: On premises; AWS Regional (Multi-AZ) Scope; AWS (US-East, primary AZ/VPC). Components: RMS Input Sources (multiple systems), Data Ingest Process, MySQL, HSM Key Appliance Cluster, S3 (Redshift Load files/Manifests; Redshift Snapshots/Backups), Amazon SNS (Data Loaded Topic), Redshift Database Cluster]
Kevin Diamond, Nordstromrack.com | HauteLook
[Diagram: Amazon Redshift, EMR, and Data Pipeline across Staging and Prod environments]
• medium speed, medium storage, $3.7k/month, awesome support
• small storage, $3.7k/month, awesome support
• medium concurrency, $10k/month, awesome support
Total Storage
Daily Transfer
Monthly Growth
Monthly Spend
Estimated 3yr Savings
http://bit.ly/awsevals