How to witness disaster, fail big, and lose your job all in a day:

A cautionary tale about performance engineering for IT

Being popular is great, especially in business, as long as you’re ready.

Unfortunately, a lot of IT pros who launch systems don’t really understand how much traffic is coming in and how much capacity or workload a system can handle. They might not measure usage on an ongoing basis. They may also have under-performing redundancy and resiliency plans. Many don’t have an emergency protocol for regulating system usage so at least some users get access or the most critical business features are made available in a time of crisis. Some don’t have plans or the capability to quickly scale new infrastructure, even with cloud implementations.

The cure for managing surges is performance engineering. System performance engineering isn’t about functional requirements or cool features; it’s about making systems work in all situations across all platforms (e.g., the web, the mobile app, etc.).

Ideally, system performance engineering should be a formal, planned part of any software launch, whether it is a brand new system or an upgrade. It is usually thought of as a last step, but really performance considerations should be baked into the entire design and development process.

Good performance engineering practice starts with defining realistic and plausible non-functional requirements for performance, scalability, and availability. A common example is setting minimum acceptable user metrics such as “this page should always load in X seconds.” But the requirements should also cover the expected peak load for each transaction, particularly what surges in users or activity are reasonably possible. Depending on the costs, it is wise to overstate that possibility, especially if the system is critical to the business.
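As a rough illustration, a requirement like this can be written down and turned into a capacity figure. The sketch below is only an assumption about how one might do it: the `PerformanceRequirement` class, the `checkout` transaction, and every number in it are hypothetical. It applies Little’s Law (requests in flight = arrival rate × time in system), using the maximum acceptable response time so the result is a conservative upper estimate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceRequirement:
    """A non-functional requirement for one transaction type (hypothetical values)."""
    name: str
    max_response_seconds: float  # "this page should always load in X seconds"
    expected_peak_rps: float     # expected peak requests per second
    surge_factor: float          # deliberate overstatement to cover plausible surges


def required_concurrency(req: PerformanceRequirement) -> int:
    """Estimate how many requests must be handled at once at the planned peak.

    Little's Law: requests in flight = arrival rate * time each request spends
    in the system. The maximum acceptable response time is used as that time.
    """
    planned_rps = req.expected_peak_rps * req.surge_factor
    return math.ceil(planned_rps * req.max_response_seconds)


# Hypothetical example: 150 requests/s expected at peak, planned for 3x that.
checkout = PerformanceRequirement(
    name="checkout",
    max_response_seconds=2.0,
    expected_peak_rps=150.0,
    surge_factor=3.0,
)
print(required_concurrency(checkout))  # 900
```

The surge factor is where the overstatement lives: raising it for a business-critical transaction is a budget decision, not a technical one.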

During software development and implementation, a good performance engineer will perform platform/environment validation to confirm key architectural decisions early in the design phase. Performance testing and benchmarking are useful to ensure you’re meeting your performance goals and to compare yourself to the competition. Performance regression testing should make sure new software in an upgrade doesn’t degrade the performance of existing features.
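A performance regression check can be as simple as comparing response times from the current release against those from the upgrade. The following is a minimal sketch under stated assumptions: the function names, the 95th-percentile metric, and the 10% tolerance are illustrative choices, not a prescribed method, and the latency figures are made up.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def has_regressed(baseline_ms: list[float], candidate_ms: list[float],
                  pct: float = 95.0, tolerance: float = 0.10) -> bool:
    """Return True if the candidate's percentile latency exceeds the baseline's by more than the tolerance."""
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    return cand > base * (1 + tolerance)


# Made-up response times in milliseconds for the same transaction.
baseline = [100, 110, 120, 130, 140, 150, 160, 170, 180, 200]
upgrade = [100, 110, 120, 130, 140, 150, 160, 170, 180, 260]

print(has_regressed(baseline, upgrade))  # True: 260 ms is more than 10% above 200 ms
```

A percentile is used instead of an average because a handful of very slow requests, the ones users actually complain about, can hide inside a healthy-looking mean.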

Monitoring is probably the most important thing to do. If you don’t know how much activity is going through and how the system is responding, it can be very hard to understand and react to events that happen. Because monitoring is so important, it’s often the first thing we recommend when a system has already gone live. Monitoring should also be the basis for continuous improvement. As performance is monitored over time, the savvy performance engineer can get a view of reality, e.g., see that feature X is slowing down when Y happens, see how lag times vary by user population size, and so forth. The performance engineer can then identify things to change, which may include code changes, database or application server configuration changes, or improving the infrastructure.
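To see how lag times vary by user population size, monitoring data can be grouped by how many users were active when each measurement was taken. The sketch below assumes the monitoring function already produces pairs of (active users, response time in milliseconds); the function name, the bucket size, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

BUCKET_SIZE = 1000  # hypothetical width of each user-count bucket


def latency_by_user_bucket(samples, bucket_size=BUCKET_SIZE):
    """Group (active_users, latency_ms) samples into user-count buckets and average each bucket."""
    buckets = defaultdict(list)
    for active_users, latency_ms in samples:
        bucket_start = (active_users // bucket_size) * bucket_size
        buckets[bucket_start].append(latency_ms)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up monitoring samples: (active users, response time in ms).
samples = [(200, 120.0), (800, 140.0), (1500, 210.0), (1900, 250.0), (2600, 900.0)]

for start, avg in latency_by_user_bucket(samples).items():
    print(f"{start}-{start + BUCKET_SIZE - 1} users: {avg:.0f} ms average")
```

In this invented data, response times climb gently up to about 2,000 users and then jump, which is the kind of pattern that tells a performance engineer where to look first.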

Just ask the trading platform that stopped working after the Brexit announcement set off an instant, niche panic and thousands of unexpected transactions.

Just ask the retailer who had no idea that a certain promotion would be so hot that the mobile app stopped responding, preventing thousands of interested shoppers from buying, and making them frustrated with the company.

Just ask the video game company whose game launch was so popular that most people couldn’t register because of server traffic. Not only was massive revenue lost, but the Internet and YouTube also exploded with people venting their frustration.

IT events happen, and they can happen to anyone who is not prepared. Sometimes it is a market or news event that creates unexpected surges in use. Other times, the surge is self-inflicted as part of a product launch or marketing event. Such surges have become much more common now that everyone carries a smartphone with them at all times. Popularity is great, but when a system or app isn’t ready for the traffic or capacity, bad things happen. A failed system can be catastrophic in some instances. At a minimum, a down or slow system can reduce revenue, decrease productivity, and create a crisis of confidence with users. At its worst, system failure can destroy your business’s reputation. In some cases, people get fired over it.

Start listening now

If you haven’t already, it’s time to start listening to your systems now. Establish a monitoring function and see how usage and traffic are affecting performance so you can make the required changes. If you are at the beginning of the development process, get smart on software quality and performance engineering, and put a plan and program into place.

Market events happen to the best of us, and those who are prepared are the ones who survive. Be the one who makes being popular the best thing that ever could’ve happened, not the worst.

Getting your SWAT team ready

We can’t always create indestructible software, or at least not within a reasonable budget. Certainly when the video game company mentioned above released their app, they must have had some educated guess on expected usage. They may have even done primary benchmarking and research to back up their plan. But in the end, they still had issues. So when the unexpected happens, quick action is needed to keep a disruption from turning into a disaster.

Here the performance engineer can create resiliency plans to ensure any required response will be quick and effective. These plans include having a SWAT team of the right experts to intervene when an event happens. They should also cover infrastructure, like knowing how to quickly stand up new servers or buy extra on-demand capacity in the cloud. There may be other controls available so that the software prioritizes resources towards must-have functions or regulates the number of users so the app is still working for some and not completely shut down.
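One way to picture such a control is an admission gate that caps how many requests are served at once and holds some capacity back for must-have functions. This is a simplified sketch, not a production design: the `AdmissionController` class and its parameters are hypothetical, and it is not thread-safe, so a real system would need locking or would rely on the equivalent feature in its load balancer or gateway.

```python
class AdmissionController:
    """Admits requests up to a capacity limit, reserving part of it for critical functions."""

    def __init__(self, capacity: int, reserved_for_critical: int):
        if not 0 <= reserved_for_critical <= capacity:
            raise ValueError("reserved_for_critical must be between 0 and capacity")
        self.capacity = capacity
        self.reserved_for_critical = reserved_for_critical
        self.in_flight = 0

    def try_admit(self, critical: bool) -> bool:
        """Take a slot and return True if the request may proceed; otherwise return False."""
        # Non-critical requests may not use the reserved slots.
        limit = self.capacity if critical else self.capacity - self.reserved_for_critical
        if self.in_flight >= limit:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Give a slot back when a request finishes."""
        if self.in_flight > 0:
            self.in_flight -= 1


# Hypothetical limits: 3 requests at once, 1 slot held back for critical work.
gate = AdmissionController(capacity=3, reserved_for_critical=1)
print(gate.try_admit(critical=False))  # True
print(gate.try_admit(critical=False))  # True
print(gate.try_admit(critical=False))  # False: remaining slot is reserved
print(gate.try_admit(critical=True))   # True: critical work uses the reserved slot
```

Rejected users can be shown a "try again shortly" page, so the app stays up for those already inside instead of failing for everyone.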

About CGI

Founded in 1976, CGI is one of the largest IT and business process services providers in the world. We combine innovative services and solutions with a disciplined delivery approach that has resulted in an industry-leading track record of delivering 95% of projects on time and within budget. Our global reach, combined with our proximity model of serving clients from hundreds of locations worldwide, provides the scale and immediacy required to rapidly respond to client needs. Our business consulting, systems integration and managed services help clients leverage current investments while adopting technology and business strategies that achieve top and bottom line results. As a demonstration of our commitment, our client satisfaction score consistently measures 9 out of 10. Visit cgi.com for more information.

www.cgi.com

© 2017 CGI
