Revamped and Automated the Infrastructure for NTN Buzztime
NTN Buzztime Inc. was looking for scalable infrastructure with a new platform that could support the display of real-time restaurant menus on 50,000 devices across the United States. HashedIn modernized NTN Buzztime's infrastructure and enabled them to go to market faster.
Executive Summary
Problem Statement
NTN Buzztime Inc. is an interactive gaming company known for introducing innovative dining technologies to bars and restaurants across the United States and Canada. Buzztime wanted to modernize their infrastructure to go to market faster. They wanted a new platform that would allow effective monitoring with zero downtime.
Business Requirements
Objective
To modernize the architecture and provide a stable platform that performs consistently across all environments.
Key Requirements
The client's requirements can be summarized as follows:
● Modernize the existing architecture with a current tech stack to address cost and scalability concerns
● Build a stable platform that can handle a large volume of requests with zero downtime
● Implement continuous deployment and delivery to enable ‘Release Early and Often’
● Standardize environments and version control
● Proactive infrastructure maintenance
Impact and involvement of stakeholders
Buzztime has a large user base, and a sizeable operations team works with the platform on a daily basis.
● Restaurant customers & employees: The food menu was delivered to 50K tablets, so customers and employees in restaurants used the platform to place orders.
● Website users: The same platform powered the website's data, serving approximately 200K requests a day.
● Web trivia game users: Gamers used the metadata for live web trivia games, so the platform had to scale to serve a large number of live players.
● Product managers: Focused on delivering features early and often.
● Infrastructure admins: Focused on reducing time spent on release-management tasks.
● Developers: Responsible for identifying incidents before users were impacted.
Solution Approach
Our Solution Structure
Docker: Containerization with extremely lightweight Docker containers made effective use of VM resources. The isolation and segregation Docker provides reduced the regression areas.
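To illustrate the containerization step, a minimal Dockerfile for a Python WSGI service might look like the following sketch. The base image, file names, module name, and port are assumptions for illustration, not Buzztime's actual build.

```dockerfile
# Illustrative Dockerfile for a Python WSGI service (names and ports are assumptions)
FROM python:3-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Serve the app with Gunicorn; "app:application" is a placeholder module path
EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:application"]
```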
Docker Swarm: Continuous deployment of containers was implemented on Docker Swarm clusters, and swarm failover was enabled to protect against node outages and achieve zero downtime. Docker Swarm was also set up to add or remove container instances as computing demands changed, which reduced cost.
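A setup of this shape can be sketched as a Docker Swarm stack file declaring replica counts, rolling updates, and a restart policy. The service name, image, and numbers below are illustrative assumptions, not the production configuration.

```yaml
# docker-stack.yml (illustrative) -- deploy with: docker stack deploy -c docker-stack.yml menus
version: "3.7"
services:
  web:
    image: registry.example.com/menus-api:latest   # placeholder image name
    deploy:
      replicas: 6                 # scale this number up or down as demand changes
      update_config:
        parallelism: 2            # roll containers two at a time
        order: start-first        # start new containers before stopping old ones
      restart_policy:
        condition: on-failure     # swarm reschedules failed containers on healthy nodes
```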
NGINX reverse proxying and load balancing were implemented with additional security hardening, which helped achieve 100% uptime.
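A reverse-proxy setup of this kind can be sketched in NGINX configuration as below; the upstream name, addresses, ports, and header choices are assumptions, not the actual deployment values.

```nginx
# Illustrative NGINX reverse proxy with load balancing (addresses and ports assumed)
upstream app_servers {
    server 10.0.0.11:8000;
    server 10.0.0.12:8000;
    server 10.0.0.13:8000;
}

server {
    listen 80;

    # Basic hardening: hide the server version, cap request body size
    server_tokens off;
    client_max_body_size 10m;

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```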
Smart caching with Redis enabled real-time food menus to be served to 50,000 tablets across the US. A Redis Sentinel setup with three Redis servers across three hosts, instead of a single instance, made the system more fault tolerant.
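A Sentinel deployment like this is typically configured with one sentinel process per host watching the master. The master name, address, quorum, and timeouts below are illustrative assumptions.

```conf
# sentinel.conf (illustrative) -- run one sentinel per host: redis-sentinel sentinel.conf
port 26379

# Watch a master named "menus-master"; a quorum of 2 sentinels must agree it is down
sentinel monitor menus-master 10.0.0.21 6379 2

# Consider the master down after 5 seconds without a valid reply
sentinel down-after-milliseconds menus-master 5000

# Promote a replica and reconfigure the others within 60 seconds on failover
sentinel failover-timeout menus-master 60000
```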
A Gunicorn application server was deployed and tuned for high performance and low response times.
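A Gunicorn configuration file is plain Python, so the tuning can be sketched as below. The worker count follows Gunicorn's documented (2 × CPU cores) + 1 rule of thumb; the port, worker class, and timeouts are assumptions, not Buzztime's actual settings.

```python
# gunicorn.conf.py -- illustrative tuning values, not the production settings
import multiprocessing

# Gunicorn's rule of thumb: (2 x CPU cores) + 1 worker processes
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "sync"    # assumption; async workers such as gevent are common alternatives
bind = "0.0.0.0:8000"    # bind address and port are placeholders
timeout = 30             # restart workers stuck for longer than 30 seconds
keepalive = 2            # seconds to hold idle keep-alive connections open
accesslog = "-"          # write access logs to stdout for centralized collection
```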
New Relic was set up for enhanced monitoring, which helped with proactive maintenance of the application and infrastructure.
Logentries was implemented for centralized log management of Docker and the other applications, enabling proactive alerting on live incidents and reducing debugging time by 30%.
logrotate was set up to periodically collect logs from each application node. This also included periodic cleanup of old logs to keep disk usage steady.
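A logrotate policy for this kind of cleanup might look like the following; the log path, rotation count, and schedule are assumptions for illustration.

```conf
# /etc/logrotate.d/app (illustrative)
/var/log/app/*.log {
    daily               # rotate once a day
    rotate 14           # keep two weeks of history, delete older files
    compress            # gzip rotated logs to save disk space
    delaycompress       # keep yesterday's log uncompressed for quick inspection
    missingok           # skip silently if a log file is absent
    notifempty          # do not rotate empty files
    copytruncate        # truncate in place so the app keeps its open file handle
}
```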
Beanstalk and Jenkins were used for continuous integration and orchestration, which resulted in smooth deployments and error-free builds with the ability to fall back in case of failures.
ApacheBench and Gatling enabled automated performance testing, which helped keep the system stable through each deployment. HAProxy and Docker Swarm together made the system better able to handle heavy load.
Solution Dynamics and Interactions
The overall solution is composed of the following layers:
● HAProxy Layer
○ HAProxy is the load balancer and the first point of contact for incoming requests. It routes each request to one of the swarm load balancers in the Docker Swarm cluster.
● Docker Swarm Load Balancer
○ The Docker Swarm load balancer forwards the request to one of the multiple NGINX containers across the three hosts.
● NGINX Layer
○ The NGINX layer is connected to multiple upstreams, which are essentially the containers for the different applications the platform supports. Each upstream is a set of WSGI containers running on different ports, and each request is routed to the correct container based on its URL.
● WSGI Layer
○ This is the application server layer. It runs the Gunicorn server processes that handle requests, and it interacts with multiple external services such as the database, MongoDB, and Firebase.
● Redis Layer
○ Only requests that need Redis reach this layer. Redis follows a Sentinel pattern, consisting of one master and two replicas, to avoid downtime.
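The entry layer described above can be sketched in HAProxy configuration as follows; the node addresses and ports are assumptions, and Swarm's routing mesh is assumed to publish the service on every node.

```conf
# haproxy.cfg (illustrative) -- spreads incoming traffic across the swarm nodes
frontend http_in
    bind *:80
    default_backend swarm_nodes

backend swarm_nodes
    balance roundrobin
    # Swarm's routing mesh exposes the published service port on each node
    server node1 10.0.0.1:80 check
    server node2 10.0.0.2:80 check
    server node3 10.0.0.3:80 check
```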
Technology Stack
Nginx
Docker and Docker Swarm
Gunicorn
Redis
Logentries
New Relic
Beanstalk
Jenkins
ApacheBench
HashedIn has helped many promising firms across the globe by building customized solutions that give users a completely hassle-free experience. Let us know if you have a specific problem or use case where we can provide more information or consultation.
https://hashedin.com/contact-us/
Business Outcomes
The delivered infrastructure served 2 million requests with zero downtime, and application performance improved 4X on the new infrastructure. The platform has been stable since it went live. Continuous integration with automated performance testing helped find issues earlier and shortened development cycles, and consistency across environments via Docker helped stabilize the platform.