1. © 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Empire: Building a PaaS with Docker and AWS
2. Agenda: A little background about why we decided to build an internal PaaS. Introduction to Empire. How we're leveraging ECS as the backend. Demo. Q&A.
3. Who am I? Eric Holmes, Infrastructure Engineer at Remind. I like building things for other developers. I work mostly with Go and Ruby. You can find my open source stuff at https://github.com/ejholmes
4. What's Remind? Remind is a messaging platform for teachers, students, and parents: chat, announcements, and files. ~25 million users. ~350,000 new users per day during back-to-school (BTS). ~5 million messages per day. ~50 employees, ~30 engineers.
6. Started as a monorail (a monolithic Rails app)
7. We started growing...
8. Broke apart the monolith: Sidekiq queues were IO-bound and constantly backed up during BTS. Message delivery workers were tightly coupled to the rest of the application. It was difficult to scale out horizontally, and the database would need to be sharded. We started breaking the monolith apart into loosely coupled services and now have ~50 production services.
9. Heroku: We were entirely hosted on Heroku. Heroku has been awesome; we never needed an ops team, which allowed us to focus on building product.
10. But we ran into issues... Internal micro-services need to be exposed publicly. Databases need to be opened up to all traffic. Little visibility into performance of hosts. No control over the routing layer.
11. What do we want? To use AWS services. To maintain operational simplicity. Support for 12-factor apps (http://12factor.net/). Shared patterns for deployment. Faster iteration and build + release cycles. No ops. A smaller surface area, exposing only a single app publicly. Robust and resilient to failure; self-healing. If we can, continue to use containers as the unit of deployment.
12. Why containers? Fast to build.* They let us isolate dependencies as a portable, easy-to-distribute package. They allow us to create better development environments with more dev/prod parity. They limit the number of moving parts when we deploy. Better resource utilization and cost management.
13. We're not the first company to want a PaaS: Netflix (Asgard), SoundCloud (Bazooka), and every other company in our investors' portfolio...
14. Something we can reuse? Flynn: alpha, undergoing many architectural changes, custom load balancer. Deis: more than it needed to be; nobody was using it successfully in production (that we knew of).
15. Empire was born: It started as a management layer on top of CoreOS + fleet. Load balancing was done by nginx, configured through confd + etcd. The unit of deployment was Docker containers. It implemented a subset of the Heroku API.
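The following is a minimal Go sketch of the confd + etcd arrangement: it reads the set of registered backends and renders one nginx `upstream` block per app. It is an illustration under stated assumptions, not Empire's code or confd's actual template syntax. A plain map stands in for etcd, and the `/apps/<name>/<instance>` key layout and the nginx fragment are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
	"text/template"
)

// upstreamTmpl is a hypothetical nginx fragment: one upstream per app,
// listing every host:port currently registered for that app.
const upstreamTmpl = `{{range .}}upstream {{.Name}} {
{{range .Backends}}    server {{.}};
{{end}}}
{{end}}`

// App pairs an app name with the backends registered for it.
type App struct {
	Name     string
	Backends []string
}

// renderUpstreams turns a key/value snapshot (a stand-in for etcd) into nginx config.
// Keys are assumed to look like "/apps/<name>/<instance>" with "host:port" values.
func renderUpstreams(kv map[string]string) (string, error) {
	byApp := map[string][]string{}
	for key, addr := range kv {
		parts := strings.Split(strings.Trim(key, "/"), "/")
		if len(parts) != 3 || parts[0] != "apps" {
			continue // ignore keys outside the assumed layout
		}
		byApp[parts[1]] = append(byApp[parts[1]], addr)
	}
	apps := make([]App, 0, len(byApp))
	for name, backends := range byApp {
		sort.Strings(backends) // Go map order is random; sort so output is stable across renders
		apps = append(apps, App{Name: name, Backends: backends})
	}
	sort.Slice(apps, func(i, j int) bool { return apps[i].Name < apps[j].Name })

	tmpl, err := template.New("upstreams").Parse(upstreamTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, apps); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderUpstreams(map[string]string{
		"/apps/api/web.1": "10.0.1.5:8080",
		"/apps/api/web.2": "10.0.1.6:8080",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Every host change means rewriting this file and reloading nginx. That is one more moving part, and ELB integration later removed it.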
16. Therein lies the rub... Fleet initially worked well, until we started testing failure modes. Fleet had a lot of bugs. etcd was fragile. We needed resilience and stability. We didn't want to run and operate our own clustering.
17. ECS becomes GA: ECS became generally available (GA) while we were looking for an alternative scheduler. It looked promising as the scheduling backend.
18. What is ECS? It pools hosts together as a single compute resource. It provides a set of APIs for placing tasks on machines. The scheduler supports services, which scale tasks horizontally and maintain desired state. Services integrate with ELB for connection draining, zero-downtime deploys, and health checks.
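A single reconciliation step can illustrate how a service maintains desired state. This Go sketch is hypothetical and does not reproduce the ECS scheduler's internals. It compares a desired count with the tasks currently running, replaces unhealthy tasks, and decides how many tasks to launch and which to stop.

```go
package main

import "fmt"

// Task is a hypothetical running copy of a task definition on some host.
type Task struct {
	ID      string
	Healthy bool
}

// reconcile compares the desired count with what is actually running and
// returns how many new tasks to launch and which task IDs to stop.
// Unhealthy tasks are always stopped and replaced (the "self-healing" part).
func reconcile(desired int, running []Task) (launch int, stop []string) {
	healthy := 0
	for _, t := range running {
		if !t.Healthy {
			stop = append(stop, t.ID)
			continue
		}
		healthy++
		if healthy > desired {
			stop = append(stop, t.ID) // scale in: more healthy tasks than desired
		}
	}
	if healthy < desired {
		launch = desired - healthy // scale out or replace what was stopped
	}
	return launch, stop
}

func main() {
	launch, stop := reconcile(3, []Task{{ID: "a", Healthy: true}, {ID: "b", Healthy: false}})
	fmt.Printf("launch %d, stop %v\n", launch, stop)
}
```

Running it prints `launch 2, stop [b]`. The unhealthy task is replaced, and one more task is started to reach the desired count of 3. A real scheduler repeats a step like this continuously.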
20. ECS Resources: Task Definition, Service, Task, Cluster
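One way to see how the four resources relate is to write them as plain Go types. These are simplified stand-ins, not the AWS SDK's types:

- A task definition is an immutable, versioned `family:revision` description of what to run.
- A service keeps a desired count of one revision running.
- A task is one running copy of a revision.
- A cluster pools the hosts.

Under this model, a deploy registers a new revision and points the service at it. The `Deploy` helper, the `NewCluster` constructor, and the one-family-per-service convention are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskDefinition is a simplified, immutable description of what to run.
type TaskDefinition struct {
	Family   string
	Revision int
	Image    string
}

// ID renders the "family:revision" form used to reference a revision.
func (td TaskDefinition) ID() string { return fmt.Sprintf("%s:%d", td.Family, td.Revision) }

// Service keeps DesiredCount copies of one task definition revision running.
type Service struct {
	Name           string
	TaskDefinition string // a TaskDefinition.ID()
	DesiredCount   int
}

// Task is a single running copy of a revision, placed on one host.
type Task struct {
	TaskDefinition string
	Host           string
}

// Cluster pools hosts and holds registered definitions, services, and tasks.
type Cluster struct {
	Name        string
	Hosts       []string
	Definitions map[string][]TaskDefinition // family -> revisions, oldest first
	Services    map[string]*Service
	Tasks       []Task
}

// NewCluster returns a cluster with its maps initialized.
func NewCluster(name string, hosts ...string) *Cluster {
	return &Cluster{
		Name:        name,
		Hosts:       hosts,
		Definitions: map[string][]TaskDefinition{},
		Services:    map[string]*Service{},
	}
}

// Deploy registers a new revision with the given image and points the service
// at it; a scheduler would then replace the old revision's tasks.
func (c *Cluster) Deploy(service, image string) (TaskDefinition, error) {
	svc, ok := c.Services[service]
	if !ok {
		return TaskDefinition{}, errors.New("unknown service: " + service)
	}
	family := service // hypothetical convention: one family per service
	td := TaskDefinition{Family: family, Revision: len(c.Definitions[family]) + 1, Image: image}
	c.Definitions[family] = append(c.Definitions[family], td)
	svc.TaskDefinition = td.ID()
	return td, nil
}

func main() {
	c := NewCluster("default", "10.0.1.5", "10.0.1.6")
	c.Services["api-web"] = &Service{Name: "api-web", DesiredCount: 2}
	for _, image := range []string{"example/api:v1", "example/api:v2"} {
		td, err := c.Deploy("api-web", image)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("api-web now runs", td.ID(), "with", td.Image)
	}
}
```

Old revisions stay registered, so in this model a rollback is just pointing the service back at an earlier `family:revision`.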
21. ECS for Empire: A solid set of primitives to serve as the scheduling backend. A managed service. Failure modes behaved as we expected them to. ELB integration allowed us to remove our custom routing layer. Service discovery via DNS.
22. What is Empire? An open source internal PaaS for micro-services. A layer of usability on top of ECS for 12-factor apps. A single binary with minimal dependencies; easy to run. It provides an API and CLI to create apps, deploy Docker images, update configuration, run one-off tasks, etc. It lets you use Procfiles to build multiple ECS services.
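A Procfile lists one process type per line in `type: command` form, for example a `web` process and a `worker` process. The Go sketch below parses that format and derives one ECS service name per process type. The `<app>-<type>` naming scheme, the comment handling, and the function names are assumptions for illustration, not Empire's actual behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// Process is one Procfile entry: a process type and the command that runs it.
type Process struct {
	Type    string
	Command string
}

// ParseProcfile reads "type: command" lines, skipping blank lines and "#" comments.
// Only the first colon separates type from command, so commands may contain colons.
func ParseProcfile(s string) ([]Process, error) {
	var procs []Process
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name, cmd, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(name) == "" {
			return nil, fmt.Errorf("invalid Procfile line: %q", line)
		}
		procs = append(procs, Process{Type: strings.TrimSpace(name), Command: strings.TrimSpace(cmd)})
	}
	return procs, sc.Err()
}

// ServiceNames maps each process type to one ECS service name.
// The "<app>-<type>" convention is hypothetical, not Empire's actual naming.
func ServiceNames(app string, procs []Process) []string {
	names := make([]string, 0, len(procs))
	for _, p := range procs {
		names = append(names, app+"-"+p.Type)
	}
	sort.Strings(names)
	return names
}

func main() {
	procs, err := ParseProcfile("web: bundle exec puma -p $PORT\nworker: bundle exec sidekiq\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, name := range ServiceNames("api", procs) {
		fmt.Println(name)
	}
}
```

Because each process type becomes its own service, `web` and `worker` can be scaled independently while sharing one image.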
23. Is it ready for production? We've been running ~15 production services within ECS, managed via Empire, for a little over a month. Empire is hands-off after you've deployed; AWS services take over. Moving directly onto EC2 showed huge performance improvements for services.
25. What does Empire not do? Bring your own logging and metrics (soon?). It doesn't handle building your Docker images. It doesn't handle the creation of attached resources like databases.
26. Things to keep an eye on http://www.convox.com/
27. Thank you GH: @ejholmes Twitter: @vesirin https://github.com/remind101/empire https://github.com/ejholmes/empire-demo http://12factor.net/