© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ben Hagen, Cloud Security Operations @ Netflix
June 21, 2016
Reactive Cloud Security
Toward Self-Defending Cloud Environments
Introductions, because they
matter.
Me
● Bachelor’s in Political Science, International Studies,
Minor in Mandarin Chinese
● Master’s in Information Assurance
● Security Operations Center at Motorola
● Consulting at Motorola and Neohapsis
● Security at Obama 2012
● Security Operations at Netflix
Netflix
● 81+ million members
● Supporting 1,000+ device types
● Available in every* country
● Concurrent delivery from 3 global regions
● > 1/3 of all US broadband
● 1,000+ developers / 1,000s of applications
● A very large monthly AWS bill
● High elasticity
Netflix
● Application owners “own” their own DevOps
● Immutable server pattern
● Everything scales
● The average TTL of an instance is < 3 days
Security @ Netflix
● A paved road
● Enablers not blockers
● Application owners “own” their security; Security teams
help them make the right choices
● ❤️❤️ Self-service, automation, and architecture ❤️❤️
Let’s talk about reactive cloud
security
The old model
● A network firewall blocks traffic
● An intrusion prevention system blocks traffic
● A web application firewall blocks traffic
● Authentication/authorization blocks access
Block, block, block, block ...
We can do better.
What is Reactive Cloud Security?
● Environments should be architected for change
● Security models should understand and leverage these
changes
● Reactive Cloud Security should ...
• Understand the context of events within your environment
• Automatically adapt the environment based on security
conditions
That sounds great. What are
some examples?
Environmental changes
● Scale an Auto Scaling group
● Modify security groups
● Adjust AWS Identity and Access Management (IAM)
object privileges
● Turn on/off logging
● Isolate a system
● Tag a system
● Redeploy a system
● Shift traffic
● ...
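Two of the changes above, "tag a system" and "isolate a system", can be sketched in a few lines. This is a minimal sketch, not the speaker's tooling. It assumes a boto3-style EC2 client is passed in (the `create_tags` and `modify_instance_attribute` calls follow the boto3 EC2 client), and the tag keys, the quarantine security group, and the `RecordingClient` stand-in are hypothetical names used only for illustration.

```python
QUARANTINE_TAG = "security-status"  # hypothetical tag key


def isolate_instance(ec2_client, instance_id, quarantine_sg_id, reason):
    """Tag an instance, then replace its security groups with a quarantine group."""
    # Tag first so the context survives even if the next call fails.
    ec2_client.create_tags(
        Resources=[instance_id],
        Tags=[
            {"Key": QUARANTINE_TAG, "Value": "isolated"},
            {"Key": "security-reason", "Value": reason},
        ],
    )
    # Replacing the group list cuts the instance off from its normal peers.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id],
    )
    return {"instance": instance_id, "groups": [quarantine_sg_id]}


class RecordingClient:
    """Stand-in for a real EC2 client; records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def create_tags(self, **kwargs):
        self.calls.append(("create_tags", kwargs))

    def modify_instance_attribute(self, **kwargs):
        self.calls.append(("modify_instance_attribute", kwargs))
```

Passing the client in keeps the reaction testable without touching a real account; in practice the same function would receive `boto3.client("ec2")`.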
OK. I get it. But how does it
work?
The easy stuff: binary conditions
● There are things about your environment which should
never change
● AWS CloudTrail should always be on
● Administrators should always have high privileges
● External traffic should only be terminated on Elastic
Load Balancing load balancers
● SSL certificates should always be valid
● ...
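Binary conditions lend themselves to simple pass/fail checks. The sketch below assumes a hypothetical `snapshot` dictionary that some collector has already filled in; its keys and the violation names are illustrative and not part of any AWS API.

```python
from datetime import datetime, timezone


def check_invariants(snapshot, now=None):
    """Return the names of binary conditions that the snapshot violates."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # CloudTrail should always be on.
    if not snapshot["cloudtrail_logging"]:
        violations.append("cloudtrail_off")

    # Any certificate at or past its expiry breaks the "always valid" rule.
    for name, expires in snapshot["certificates"].items():
        if expires <= now:
            violations.append("certificate_expired:" + name)

    # External traffic should terminate only on load balancers.
    for listener in snapshot["public_listeners"]:
        if listener["kind"] != "elb":
            violations.append("public_non_elb:" + listener["id"])

    return violations
```

An empty list means every condition holds; anything else is a candidate for an automatic reaction.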
Less easy stuff: fuzzy conditions
● There are things about your environment that could
change
● Web server CPU load should never exceed X%
● Patterns of inter-application traffic
● Engineers/administrators logging into systems
● API access patterns
● Inbound/outbound traffic patterns
● ...
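Fuzzy conditions need a baseline rather than a fixed rule. One possible approach, which is an assumption here and not something the talk prescribes, is to flag a value that sits several standard deviations away from recent history. The function name and the default threshold of 3.0 are illustrative.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold
```

The same check applies to CPU load, login counts, or API call rates, as long as the history reflects normal behavior for that metric.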
Laying the groundwork: AWS
● AWS CloudTrail
• Make sure CloudTrail is turned on ... for all the things
• Stream to CloudWatch Logs (> 10 min latency)
• Use CloudWatch Events when you can (< 1 min latency)
• Connect both to AWS Lambda functions monitoring for specific
conditions
● AWS Lambda functions identify, log/notify, and react to
these conditions
• Create specific “OK” conditions, break-glass buttons, etc.
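The identify, log/notify, react loop can be sketched as the core of a Lambda function that receives CloudTrail API calls through CloudWatch Events. This is a sketch under assumptions: the event fields follow the CloudTrail record layout (`eventSource`, `eventName`, `userIdentity`, `requestParameters`), while the approved-actor ARN, the `restart_logging` and `notify` callables, and the return values are hypothetical. With boto3, `restart_logging` could wrap `cloudtrail.start_logging(Name=trail)`.

```python
# Hypothetical "OK" condition: a break-glass role that may stop logging.
APPROVED_ACTORS = frozenset({"arn:aws:iam::111122223333:role/break-glass"})


def handle_event(event, restart_logging, notify, approved=APPROVED_ACTORS):
    """Identify a CloudTrail StopLogging call, notify, and react."""
    detail = event.get("detail", {})
    if detail.get("eventSource") != "cloudtrail.amazonaws.com":
        return "ignored"
    if detail.get("eventName") != "StopLogging":
        return "ignored"

    actor = detail.get("userIdentity", {}).get("arn", "unknown")
    trail = (detail.get("requestParameters") or {}).get("name")

    if actor in approved:
        notify("StopLogging by approved actor " + actor)
        return "allowed"
    if trail is None:
        notify("StopLogging by " + actor + " on an unknown trail")
        return "needs_human"

    restart_logging(trail)  # the reaction
    notify("Logging restarted on " + trail + " after StopLogging by " + actor)
    return "remediated"
```

Keeping the approved list explicit gives administrators a documented way to stop logging on purpose without fighting the automation.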
Laying the groundwork: Non-AWS events
● Requires a robust, reliable, and (programmatically)
accessible logging infrastructure
● Access logs, authentication logs, performance logs, etc.
● A leverageable pipeline ... ELK is a good start, but not
appropriate for everything
• CloudWatch Logs, CloudWatch Metrics, Datadog, StatsD,
$plunk, New Relic, etc.
● At Netflix we use Atlas, ES, and other big data pipelines
(https://github.com/Netflix/atlas)
Strategy is important.
Three categories of events
#1 Fully automatable
#2 Almost automatable
#3 Never automatable
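The three categories can be made explicit in code so that every event type has a declared handling path. The event type names and the routing table below are hypothetical examples. Defaulting unknown events to "never automatable" is a design assumption, chosen so that nothing is remediated automatically by accident.

```python
from enum import Enum


class Bucket(Enum):
    FULLY = "fully automatable"
    ALMOST = "almost automatable"
    NEVER = "never automatable"


# Hypothetical routing table; a real one reflects your own environment.
ROUTING = {
    "cloudtrail_stopped": Bucket.FULLY,
    "security_group_opened_to_world": Bucket.ALMOST,
    "suspected_account_compromise": Bucket.NEVER,
}


def triage(event_type):
    """Unknown event types fall back to the human-only bucket."""
    return ROUTING.get(event_type, Bucket.NEVER)


def dispatch(event_type, remediate, ask_human, escalate):
    """Send the event down the path that matches its bucket."""
    bucket = triage(event_type)
    if bucket is Bucket.FULLY:
        remediate(event_type)
    elif bucket is Bucket.ALMOST:
        ask_human(event_type)
    else:
        escalate(event_type)
    return bucket
```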
Please talk about some more
relevant buzzwords.
ChatOps
● Baby steps toward full reactive automation (for
managing bucket #2 type events)
● Use a single shared interface to facilitate notifications,
log work, provide context, and interact with tools
● Automation gets you the context and notification
● Humans approve and execute commands
● Two-factor is important!
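The split between automation and humans can be sketched as a small approval gate: automation prepares the proposal and its context, and nothing runs until a human approves with a second factor. All names here (`Proposal`, `approve`, `execute`, `ApprovalError`) are hypothetical, and the second-factor check is reduced to a boolean that a real chat integration would have to supply.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalError(Exception):
    """Raised when a command is approved or executed improperly."""


@dataclass
class Proposal:
    action: str                      # the command automation wants to run
    context: str                     # why, as posted to the shared channel
    approved_by: Optional[str] = None


def approve(proposal, approver, second_factor_ok):
    """Record a human approval, but only with a verified second factor."""
    if not second_factor_ok:
        raise ApprovalError("second factor required")
    proposal.approved_by = approver
    return proposal


def execute(proposal, run):
    """Run the proposed action only after a human has approved it."""
    if proposal.approved_by is None:
        raise ApprovalError("proposal has not been approved")
    return run(proposal.action)
```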
Right sizing your environment
● Monitor your environment so that security policies
match reality
• IAM roles (look out for RepoMan from Netflix)
• Security groups (working on something here too)
• Amazon S3 policies 😣
● Start off with more than you need during development
● Monitor for X days
● Adjust policy based on actual usage; expose this
information!
● Enable break-glass and self-service changes to
automation
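The right-sizing loop reduces to comparing what a policy grants with what was actually used during the monitoring window. This is a minimal sketch, not RepoMan: the function name, the input shapes, and the output keys are assumptions, and `min_days` stands in for the "X days" above.

```python
def propose_policy(granted, observed_calls, days_observed, min_days):
    """Suggest which granted actions to keep and which to remove."""
    granted = set(granted)
    if days_observed < min_days:
        # Not enough monitoring yet: leave the policy untouched.
        return {"keep": sorted(granted), "remove": []}
    used = set(observed_calls)
    return {
        "keep": sorted(granted & used),    # granted and actually used
        "remove": sorted(granted - used),  # granted but never used
    }
```

Exposing the `remove` list to application owners before applying it leaves room for a self-service override.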
In closing ...
● Cloud environments and modern
development/deployment technologies can increase
security
● Architect for flexibility and varying security conditions
● Seek to remove practices which can’t be automated
● ChatOps and right sizing are your friends
Thanks!
Feel free to reach out:
... or yell at me publicly:
● @benhagen