
WSECU

WSECU Information Technology

Jack Funk – Infrastructure Architect

Aaron Robel – Sr. Security Engineer

David Luchtel – VP – Infrastructure & Ops

A Next Generation Infrastructure


Executive Summary: The challenge: how to cost-effectively support our changing security, availability and growth requirements. The solution: a new way of managing our infrastructure, an application-centric virtualized environment.

WSECU's infrastructure is growing more than 15% year over year as we implement new member-facing systems, scale existing systems to support member growth and implement new enterprise systems. The traditional model of securing assets with physical firewalls, DMZs and subnets has become unsustainable in security, cost and complexity. Even with segmentation, shared network subnets allow unintended communication between hosts, and applications sharing a subnet must be failed over simultaneously. Our operational complexity, availability and performance suffered as we grew. As an example, WSECU typically experienced a 4 to 5 hour member outage during a DR test while we completed the network and services tasks required to bring services online, even though the failover of our virtual servers to the DR data center took only minutes.

Over time we have learned that the best way of preventing security breaches is to keep intruders out of the network at the perimeter. But when your company becomes the target of an attack and the perimeter is cracked, the attackers gain access to everything within your trusted domain. When Target was breached in 2013, the intruders exploited weak security on an HVAC system managed by an external vendor to gain access to the network, after which they were able to deploy malicious software to harvest sensitive customer data. The story repeats itself with breaches at Home Depot, the federal Office of Personnel Management, the IRS, Ashley Madison, Bank of America, and the list goes on. Many of these breaches were discovered months after they occurred, resulting in large financial penalties and lost customer loyalty. But what if they could have been prevented?

What if, instead of building more complexity and physical granularity into networking and resilience, the applications holding sensitive data became the focus of security rather than the infrastructure they run on? With this in mind, WSECU re-architected its infrastructure to be 100% virtualized and configured around each application's unique requirements, using industry-leading technology from VMware.


Transition: In 2015 WSECU began to rethink security within the data center, taking an application-focused perspective instead of making piecemeal network additions. To do this, WSECU launched the Virtual Infrastructure Project (VIP), which would redesign the data center infrastructure from the ground up.

The VIP team identified several opportunities that could bring real business value to WSECU. Interestingly, the first step was not about technology but about defining technology standards and practices that fit our longer-term architecture roadmap of simplicity and resiliency. By defining tight practices and documented standards, we found that we could achieve our security goals by isolating applications and bringing security controls next to the application hosts.

The second step was to transition to an application centric approach to infrastructure, which is like a bubble that incorporates all of the required pieces to sustain the function and security of that single application. Once this is achieved the application bubble can move anywhere, even to cloud based IaaS, if desired. To further achieve application independence within the bubble, we realized that we needed to go beyond server virtualization to virtualizing the network and security components as well. This would allow us to keep the network tightly integrated within the application bubble to help achieve our failover goals.

Each application and required networking in “bubble”

Third, we knew we needed to move away from traditional firewalls and that our strategy would require a firewall for every virtual server to provide the granularity, flexibility and control we needed. For security, we needed to move from a single hardened perimeter to a hardened perimeter around each mission-critical application (micro-segmentation).


Finally, along with all of this we needed to reduce operational overhead and provide as close to a single pane of glass for administration as possible. We see the line between network and systems administration blurring, and we wanted to use this strategy as an opportunity to give our staff a single, simplified management view. This meant eliminating redundant virtualization platforms and choosing one platform that would meet the strategic goals above. We chose VMware virtualization, with network and firewall virtualization through its NSX module. To further our strategy of holistic virtualization, we would also migrate physical network and security appliances to VMware's platform. On the operational side, this shift would help with our strategic goals of reducing the range of staff skill sets required for support and eliminating costly lifecycle hardware replacements. Finally, by virtualizing these components we would be able to shorten delivery timelines for our customers' solutions, helping WSECU be more competitive and faster to market and increasing the business value of technology.

The VIP project set out to deploy VMware ESX with network and firewall virtualization using the NSX solution and vCloud software. NSX, which came to VMware through its acquisition of Nicira before 2014, adds a private-cloud routing and firewalling layer to the virtualization platform, with a granularity otherwise seen only at large cloud providers like AWS. Virtualizing the routers, switches, load balancers, firewalls and servers greatly simplifies the underlying WSECU network. As WSECU adds new applications, all aspects of the build are done in software and can be scripted for automation. This has greatly reduced time to market for our solutions.

Virtual server guests (App1 to App4) on physical VMware hosts, each with its own firewall (FW); DMZ and Trusted zones; Member Data, PCI Data and PCS Data


Business Case: As we developed our business case for the VIP project, we identified four key areas of return: CapEx savings, OpEx savings, improved resource efficiency and assured member data security. From a cost perspective, we looked at our capital and operations budgets over a five-year time frame. The capital budget, despite a significant up-front license expense for NSX, realizes savings through reduced lifecycle replacements of hardware appliances, resulting in an anticipated 10% reduction in overall capital expense versus our current hardware-centric plan.

In the operations area, software-defined networking is anticipated to offset the growth of 4 FTEs that would otherwise be needed to keep up with infrastructure resource demand over the next 5 years. In fact, we estimate that we will reduce infrastructure resource demand by 2 FTEs, whose time will be redirected to business projects. The operations budget is also anticipated to save approximately 10% over the 5 years compared with the current hardware-centric plan.

Based on industry standards for FTE-to-server ratios and our documented improvement in security posture, the VMware NSX and vCloud investment will be a solid financial investment with a breakeven point of less than 2 years. The new architectural approach lets us protect member data immediately, wherever it is located, and the flexibility to meet business technology needs through a software-driven model will strengthen WSECU's competitiveness and continue our drive toward best-in-class member service.
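The breakeven reasoning in this business case can be illustrated with a short calculation. The sketch below is only an illustration: the function name `breakeven_year` and every dollar figure in it are hypothetical placeholders, not WSECU's actual costs or savings, which this paper does not disclose.

```python
from typing import Optional


def breakeven_year(upfront_cost: float, annual_savings: float,
                   horizon_years: int = 5) -> Optional[int]:
    """Return the first year in which cumulative savings cover the up-front cost.

    Returns None if the investment does not break even within the horizon.
    """
    cumulative = -upfront_cost
    for year in range(1, horizon_years + 1):
        cumulative += annual_savings
        if cumulative >= 0:
            return year
    return None


# Hypothetical inputs for illustration only (not figures from this paper).
example_upfront = 500_000.0   # assumed one-time license and project cost
example_savings = 300_000.0   # assumed combined yearly CapEx and OpEx savings
print(breakeven_year(example_upfront, example_savings))
```

With these placeholder inputs the up-front cost is recovered during the second year; a real model would substitute the organization's own five-year capital and operations budgets.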


Preparation: One of the first hurdles to implementing VMware NSX is the cost of licensing. Even as an Enterprise Plus customer, the cost per processor to upgrade to vCloud Standard with NSX was significant. WSECU looked for creative ways to mitigate the steep license cost and assessed our VMware environment. After running tools to measure typical workload and peak demand, we determined that quadrupling the memory in our quad-processor hosts could reduce our processor footprint by 60%, thus reducing the number of VMware NSX licenses required. We also looked at where NSX features were needed and created a dedicated cluster for the production and staging environments, where the business value of NSX existed. Dev and Test workloads were moved to a dedicated cluster of hosts whose virtual servers are isolated by a physical firewall and not exposed to the internet, removing the need for NSX there. Through these pre-project consolidation tasks, WSECU reduced the overall VMware license upgrade cost by reducing the number of physical hosts that required the more expensive NSX licenses.

Deployment: With VMware NSX, there is a logical progression to implementation. Per VMware best practices, 3 VMware clusters were created to support NSX in both the primary and disaster recovery data centers: a management cluster for VMware management servers, monitoring tools and logging; an edge cluster for virtualized internet perimeter appliances; and a compute cluster for all of the application and infrastructure servers needed to run WSECU's everyday business. Of these clusters, the compute cluster is the one licensed for NSX, and all virtual servers that require application security and data center resiliency reside in it. The NSX installation is simple and goes right into your production environment without impacting existing services.

3-Cluster VMware Best Practice Design
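The license consolidation described under Preparation comes down to simple per-processor arithmetic, sketched below. The host counts, cluster names and helper names (`Cluster`, `sockets_after_reduction`, `nsx_licenses`) are hypothetical and chosen only for illustration; only the 60% reduction and the idea of licensing NSX solely where it adds value come from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cluster:
    name: str
    hosts: int
    sockets_per_host: int
    needs_nsx: bool  # True only where NSX delivers business value


def sockets_after_reduction(total_sockets: int, reduction_pct: int) -> int:
    """Sockets left after shrinking the footprint by reduction_pct percent (rounded up)."""
    remaining_hundredths = total_sockets * (100 - reduction_pct)
    return -(-remaining_hundredths // 100)  # ceiling division on integers


def nsx_licenses(clusters: Iterable[Cluster]) -> int:
    """Per-processor NSX licenses: count sockets only in clusters that need NSX."""
    return sum(c.hosts * c.sockets_per_host for c in clusters if c.needs_nsx)


# Hypothetical starting point: 40 quad-processor hosts (not a figure from this paper).
sockets_before = 40 * 4
sockets_after = sockets_after_reduction(sockets_before, 60)  # 60% reduction from the text

# Hypothetical split of the remaining hosts between NSX and non-NSX clusters.
clusters = [
    Cluster("compute-prod-staging", hosts=12, sockets_per_host=4, needs_nsx=True),
    Cluster("dev-test", hosts=4, sockets_per_host=4, needs_nsx=False),
]
print(sockets_before, sockets_after, nsx_licenses(clusters))
```

Under these assumed numbers, consolidation and the separate Dev/Test cluster each remove sockets from the NSX license count, which is the effect the Preparation section describes.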


Security: The Distributed Firewall (DFW) component of NSX is available immediately after installing the NSX software, allowing WSECU to isolate sensitive data and protect it at its source. Since this was a key selling point for NSX, the first 45 days of the project after installation were dedicated to learning about and configuring DFW rules. Implementing firewall rules that block traffic within the application bubble created by our new strategy is a significant paradigm shift.

The real value of DFW is its dynamic grouping approach. Dynamic groups allow rules to be created and applied to servers based on name, security tag or even IP address. This allows a system administrator to add a server to an application bubble, such as online banking, by assigning a set of tags to the server. Once this is done, the server is automatically secured in line with the online banking security controls. DFW has enabled WSECU to move away from the physical appliance firewall model, which required more hardware, multiple staff for administration and time for requests to go through their lifecycle.

DFW is available on every virtual server and can be extended to physical appliances or servers using VMware's Edge Services Gateway (ESG). This has enabled a very granular security approach for applications: we have moved to a model where we allow only the specific communication required for the application to function. As a result, we can easily segment any confidential or sensitive application and its data regardless of network location. It has also given us very granular visibility into the traffic flows between workloads in the network. System administrators can easily see traffic to their servers and traffic initiated by their servers on an individual basis. This has increased the staff's ability to self-serve and has sped up migration efforts by quickly identifying application dependencies.
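The tag-driven grouping described above can be modelled in a few lines. This is a conceptual sketch only, not the NSX API: the `Server` and `Rule` classes, the tag names, the server names and the port number are all hypothetical, and the evaluation is a simple default-deny check that allows traffic only when an explicit rule matches the tags on both ends.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Server:
    name: str
    tags: FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    source_tag: str  # traffic must originate from a server carrying this tag
    dest_tag: str    # and be addressed to a server carrying this tag
    port: int


def group_members(servers: Iterable[Server], tag: str) -> Set[str]:
    """A dynamic group is simply every server that currently carries the tag."""
    return {s.name for s in servers if tag in s.tags}


def is_allowed(rules: Iterable[Rule], src: Server, dst: Server, port: int) -> bool:
    """Default deny: allow only if some rule matches both tags and the port."""
    return any(
        r.source_tag in src.tags and r.dest_tag in dst.tags and r.port == port
        for r in rules
    )


# Hypothetical online banking bubble: web tier may reach the database tier on one port.
rules = [Rule("olb-web", "olb-db", 1433)]
web = Server("web-01", frozenset({"olb-web"}))
db = Server("db-01", frozenset({"olb-db"}))

# Tagging a newly built server is all it takes to bring it under the same controls.
new_web = Server("web-02", frozenset({"olb-web"}))

print(group_members([web, db, new_web], "olb-web"))
print(is_allowed(rules, new_web, db, 1433))  # permitted by the web-to-db rule
print(is_allowed(rules, db, web, 1433))      # no rule in this direction, so denied
```

The point of the model is that the rules never name individual servers, so adding or retiring a server changes group membership without touching the rule set.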
Simplicity: Simplifying the network infrastructure is a natural outcome of NSX. In a physical world, every segment created must be added to each physical network component in order to move traffic across your environment. In NSX, a single transport network is presented to the physical network infrastructure, and the network is then virtualized within VMware so that security, routing and switching all happen within the virtual plane. Networks that support applications across your virtual farm are built in seconds, and routed interfaces are placed on virtualized routers. All of these virtual network entities can be built entirely through automated scripts. As a router is brought online, dynamic routing support allows easy route distribution to your core network, further reducing administrative effort and providing fast convergence for failover.

Since everything “network” for an application is software defined, the same software configuration is applied to the DR site, making it available to replicated virtual servers in the event of a disaster. WSECU replicates its virtual workloads to our DR data center so that servers are up and running within minutes in our DR tests. With NSX virtualized networking, the network requirements for the virtual servers are already in place at the DR site and, when activated, immediately take advantage of dynamic routing protocols to bring the entire application bubble up in seconds.

Performance: While performance evidence is often anecdotal rather than factual, we believe we have achieved better performance with our new compute cluster even though we have introduced guest-level firewall capabilities. From a virtual server preparation perspective, increasing the RAM by 4x has made us significantly more efficient in our use of available processing and has almost eliminated the chatter that we regularly saw with vMotion attempting to optimize workloads. While vMotion is a mature product and works seamlessly, it inevitably consumes processor and I/O capacity while guests are being migrated. From a network perspective, we run multiple 10 Gb/s uplinks from each host for network and storage traffic, and we have not seen any negative impact from using software-based routing within NSX. The ability to dynamically inject OSPF routes directly from NSX routing appliances makes integration into existing hardware environments simple and easy to manage.

The big performance benefit of our approach comes from mitigating the performance risk of micro-segmenting our network with firewalls. If we had taken a traditional approach, every application would have all of its traffic inspected by a physical firewall. In essence, firewalls would be the core of the network. This would have added significant overhead to each network communication, since firewalls don't work at network speed and every transaction incurs additional network hops. With DFW in NSX, all of the micro-segmentation and firewall inspection happens in the VMware hypervisor at memory speed. This eliminates the extra network hops and the physical firewall overhead of inspecting all traffic. To date, we have not detected any application performance issue due to using NSX.

Virtualization: Since VMware is the market leader in virtualization, WSECU made a strategic decision that VMware is the best platform on which to consolidate virtual network appliances. As we researched replacements for physical appliances, such as F5 load balancers, we required that they be fully supported on VMware and able to handle WSECU's capacity demands in a virtual footprint. The goal in replacing physical network appliances was to eliminate costly hardware lifecycle replacements and end up on a VMware platform.
The edge cluster, identified and deployed early in the project, is the destination for all virtual appliances that have a hardened internet-facing interface. Mail gateways, F5 load balancers, application firewalls, Citrix NetScaler, Global Traffic Managers, VPN appliances and other perimeter devices are deployed on a hardened VMware cluster. This cluster has two network connections: one for external-facing networks and one for trusted networks only. Any traffic allowed into the WSECU network passes through a virtual appliance that sits across these two networks within the VMware cluster.

The one edge device that WSECU chose to keep on dedicated physical hardware was the egress firewall. Part of the WSECU security posture improvement was SSL inspection of egress traffic, so that client SSL connections can be inspected or restricted if needed and data loss prevention (DLP) can be applied to encrypted traffic. As we worked with vendors to select our egress firewall replacement and inquired about virtual editions, we found that vendors and consultants all recommended hardware-accelerated platforms for SSL inspection of egress traffic. Based on these recommendations, we deployed traditional hardware-based egress firewalls.

Efficiency: The final deliverable of the VIP project was to shorten application delivery times and build disaster recovery into the initial deployment. With the purchase of VMware's vCloud we were able to enhance visibility and enable automation. As new applications are designed for deployment at WSECU, their specific virtualized networks are replicated to the DR site. Virtual workloads deployed at the production data center are also replicated to the secondary data center, where they start up on matching networks without any changes needed. Then, through scripting and dynamic routing, the virtual server is available in the secondary data center as if nothing had changed. New applications are deployed in a structured manner with a failover script that allows each to be failed over on demand. Applications can now be tested for failover regularly and can be failed back to the primary site quickly and easily. This removes any question as to whether critical applications can be recovered in WSECU's DR test scenarios.

Lessons Learned: As we dove into this large project to re-architect our infrastructure with a very different way of thinking, we knew challenges would come up. NSX is still very new technology with limited customer adoption due to cost. As we worked closely with VMware's product management through deployment, it became evident that components of the technology were not quite production ready. Specifically, the Distributed Firewall UI is still very early in its maturity. To our staff, working with the firewall UI is like taking a step back about 8 years. Luckily, we were able to develop a set of standards for building policies through dynamic tagging, which has eased the burden of the UI's beta feel.

We also knew this was going to take a large effort on the part of all staff. What we found is that the effort is much bigger than we had anticipated, because we needed to know each application's configuration and communication patterns in detail. Each application essentially has to be redeployed with support from development teams and vendors in order to fully understand the impact of migrating it into the new architecture. But this has given us the opportunity to better understand our applications and document them appropriately as we migrate each one.
The new architecture has also shown that documentation of our applications can follow a repeatable, templated approach, removing variance in the build-out of solutions.

Finally, the process brought to light that we have a great technical team, where everyone was willing to pitch in to help with the implementation. However, expecting staff to bridge from a network/service approach to an application-centric approach with new tools and technology has meant a steep learning curve for each technician. We have since instituted weekly discussions on VIP to bring everyone up to speed. This has required a cadence of constant conversations and even one-on-one coaching, with the architects sitting side by side with staff as an application is broken down for migration. In all, the lessons we have learned have been great opportunities and have made us stronger.

Conclusion: Virtualized networks have enabled WSECU to respond to application network demands in minutes rather than days and have shifted 80% of network implementation tasks from the physical environment to the virtual one. Once you have strong core routing, WAN and egress infrastructure in place, all other configuration for VMware-hosted virtual servers takes place within the vCenter web interface and is deployed on demand, often by server engineers and system administrators. This frees the traditional network engineers to focus on internet connectivity, infrastructure capacity, traffic inspection and security.

Resiliency has changed as well, with WSECU now viewing DR testing of applications as a repeatable task run from a service catalog rather than a yearly audit requirement. Failovers that previously took 4 to 5 hours now complete in 90 seconds per application. We no longer need to impact all departments and members, but can run application-specific DR tests to validate functionality.

The disciplines implemented in the VIP project are now ongoing processes for new application deployments. With our application-focused infrastructure, WSECU has positioned itself to be ready for the dynamic demands of our business. This was demonstrated when we deployed our new Member Identity Management (MIM) project, which contains sensitive member data and access controls for all of our online members. The MIM project infrastructure, deployed in a MIM application bubble, was documented and then deployed in hours across the development, test, staging and production environments. This removed from the project the time and cost of purchasing, configuring and implementing physical infrastructure, which is the initial critical path of a project. Having the new system environments ready in hours meant that our development, vendor and business resources could start work in the first week of the project rather than 1 to 2 months later. Now that is starting a critical business project on the right foot!