Weaver: Language and runtime for software defined environments
M. H. Kalantar
F. Rosenberg
J. Doran
T. Eilam
M. D. Elder
F. Oliveira
E. C. Snible
T. Roth
Continuous delivery of software and related infrastructure environments is a challenging proposition. Typical enterprise environments, comprising distributed software and its supporting infrastructure, exhibit non-obvious, often implicit dependencies and requirements. Further increasing this challenge is that knowledge about configuration is fragmented and informally recorded. Given this situation, we propose Weaver, a domain-specific language designed to formally specify blueprints, desired state descriptions of environments. An associated runtime executes blueprints to create or modify environments through a set of target-specific platform providers that supply cloud-specific implementations. New and existing automation to implement and maintain the desired state can be associated with a blueprint specified in Weaver. Furthermore, Weaver supports the definition of conditions to validate a blueprint at design time and deployment time, as well as to continuously validate a deployed environment. We demonstrate the use of Weaver to deploy IBM Connections, an enterprise social software platform.
Introduction
Typical enterprise systems comprise a multitude of distributed software components that exhibit non-obvious, often implicit dependencies and infrastructure requirements. The deployment and operation of such complex systems, including their applications and infrastructure, are therefore challenging tasks. Even more challenging is to support continuous delivery, that is, to continuously deploy the environment in a test environment that is reasonably similar to the actual production environment as part of development and testing efforts and to promote it to production when appropriate. Aggravating these challenges is the fact that typically, the knowledge of the configuration of running environments is fragmented and informally recorded.
We identify two types of configuration dependencies in these environments: (1) dependencies between different software components and (2) dependencies between those software components and the underlying infrastructure. In many cases, the configuration of the infrastructure depends on the requirements of the software. For example, network firewall configuration depends on the software communication requirements. The optimization of the placement of compute resources across data centers depends on bandwidth requirements and constraints on the availability of the running system. In addition, modifications to the hardware infrastructure may require software updates. For example, the addition of compute resources to increase capacity may require the reconfiguration of a load balancer.
We observe that enterprises often lack a holistic knowledge of the entire environment configuration. The system configuration knowledge is often distributed across organizational boundaries, where different teams know the details of a subset of the environment components. When recorded, configuration knowledge is kept in multiple, informal documents.
Thus, it is no surprise that environment development expectations and assumptions do not match operational reality. The barrier between development and operations teams impedes iterative environment development, increasing the risk of instability in the transition to production and in the update of the production environment.
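The second type of dependency identified above, where firewall configuration follows from software communication requirements, can be made concrete with a minimal Python sketch. It is an illustration only and is not Weaver syntax: the `Flow` and `FirewallRule` types, the tier names, and the port numbers are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A communication requirement declared by a software component (hypothetical)."""
    source_tier: str
    target_tier: str
    port: int


@dataclass(frozen=True)
class FirewallRule:
    """An allow rule between two tiers (hypothetical)."""
    from_tier: str
    to_tier: str
    port: int


def derive_firewall_rules(flows):
    """Return the sorted, de-duplicated allow rules needed for cross-tier flows.

    Flows inside a single tier do not cross a firewall, so they yield no rule.
    """
    rules = {
        FirewallRule(f.source_tier, f.target_tier, f.port)
        for f in flows
        if f.source_tier != f.target_tier
    }
    return sorted(rules, key=lambda r: (r.from_tier, r.to_tier, r.port))


# Illustrative requirements; tier names and ports are assumptions.
flows = [
    Flow("web", "app", 9080),
    Flow("web", "app", 9080),    # duplicate requirement from a second component
    Flow("app", "app", 9043),    # intra-tier traffic, no firewall rule needed
    Flow("app", "data", 50000),
]

rules = derive_firewall_rules(flows)
for rule in rules:
    print(f"allow {rule.from_tier} -> {rule.to_tier} : {rule.port}")
```

In this sketch, changing a software requirement changes the derived rules, so the infrastructure configuration never has to be recorded separately from the software that motivates it.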
Copyright 2014 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
Digital Object Identifier: 10.1147/JRD.2014.2304865
IBM J. RES. & DEV. VOL. 58 NO. 2/3 PAPER 10 MARCH/MAY 2014
0018-8646/14 © 2014 IBM
Given this situation, we propose Weaver, a language that allows one to specify a blueprint, a formal description of the desired state of an environment. By environment, we mean a distributed application and its supporting infrastructure. Weaver provides the language constructs that describe all desired aspects of an environment: compute, storage and network resources, services, and software. Automation artifacts are associated with the blueprint (or Weaver program) to implement and maintain the desired state. Blueprints specified as Weaver programs can be managed via source control; that is, they can be versioned and shared. The execution of a Weaver program validates the specified environment, and deploys or updates it.
The key goal associated with Weaver is to improve agility and reduce the risk associated with continuously delivering software. Agility is the ability to respond quickly to changing requirements by introducing new application functions or making structural changes in the infrastructure topology, such as adding a firewall for better isolation and security. With the Weaver approach, even such structural changes as inserting a virtual firewall are treated programmatically ("as code") by editing the Weaver blueprint and re-executing it. In addition, Weaver is designed to support effective collaboration between domain experts, modularity, and re-use of code. Weaver language constructs make it easier to map application components differently on different infrastructures. These design principles will be further explored in the section "Weaver language design."
The design of Weaver is motivated and influenced by the DevOps discipline. DevOps is a methodology to enhance the collaboration between development and operations teams by applying development techniques, such as iterative development, automation, automated testing, and versioning, to both application code and deployment automation code. Weaver does not replace the need for low-level automation building blocks to install and configure individual software components. Existing scripting languages, including special-purpose configuration languages such as Chef and Puppet, can be used to define automation on single nodes. The main objective associated with Weaver is to provide a programmable view of the entire environment, including software components that span systems, and the infrastructure elements that are needed to support them.
To validate the concepts of Weaver, we used the approach to automate, end to end, a large and complex social software system: IBM Connections. We use this system to exemplify the challenges and describe how Weaver successfully addresses them. The next section describes IBM Connections in more detail. The approach and the Weaver language are then described. Finally, we return to a discussion of the results of our experiment with IBM Connections and conclude.
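The desired-state idea behind a blueprint can be illustrated with a short Python sketch. This is not Weaver syntax, and it does not reflect how the Weaver runtime is implemented; the dictionary layout, the resource names, and the `plan_changes` function are hypothetical. The sketch compares a desired state with the current state and lists the actions needed, so that a structural change such as adding a firewall is made by editing the description and running it again.

```python
def plan_changes(desired, current):
    """Compare desired and current state; return a list of (action, name) pairs.

    Both arguments map a resource name to a dict of properties.
    """
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Current environment (illustrative resource names and properties).
current = {
    "web-server": {"kind": "node", "count": 2},
    "app-cluster": {"kind": "cluster", "size": 4},
}

# Edited description: a firewall is added and the cluster is resized.
desired = {
    "web-server": {"kind": "node", "count": 2},
    "app-cluster": {"kind": "cluster", "size": 6},
    "web-app-firewall": {"kind": "firewall", "between": ["web", "app"]},
}

plan = plan_changes(desired, current)
for action, name in plan:
    print(action, name)
```

Because the plan is computed from the difference between the two states, running it against an environment that already matches the description yields no actions.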
Motivating example: IBM Connections
IBM Connections consists of a set of social software applications including, for example, community, wiki, personal profile, forum, and file sharing applications. This set of hosted applications, offered as a service to all IBM employees, is extensively used. In the instance supporting IBM, the number of visitors has increased by more than 110% in less than a year, the number of user profiles currently exceeds 650,000, more than 600,000 wikis and communities have been created, and file storage is growing at 8% per month.
Typical deployments of IBM Connections are large and complex. Figure 1 shows a simplified deployment that is nonetheless sufficient to exemplify the complexity. The topology consists of two IBM HTTP (Hypertext Transfer Protocol) servers (IHS), 16 IBM WebSphere* Application Servers (WAS) grouped in four clusters that can be deployed in varying sizes, and one IBM WebSphere Deployment Manager (DMGR). All of the WAS nodes are connected to an external database (IBM DB2*) and an external network file system (NFS). The set of social applications is distributed among the four clusters; each cluster in the topology hosts a number of them.
In addition to the inherent complexity of the topology, multiple non-functional requirements must be addressed. As this is a critical business application, it must be highly available and massively scalable. In addition, there are strict network isolation and data privacy requirements. Figure 1 also illustrates the implications of the non-functional requirements on the topology. Firewalls must be present between the web and application tiers. The clusters must be configurable with varying sizes, and the IHS servers must be configured for high availability.
As with many other large distributed applications, developing, testing, deploying, and maintaining IBM Connections presents several challenges. First, system knowledge is fragmented among different teams of experts (e.g., WAS and DB2 configuration experts), all of whom must participate to implement changes and updates. Second, dependencies between different software components and between software and the supporting infrastructure are not well documented or verifiable. Figure 2 illustrates some of the temporal and data dependencies between the different steps required to install the IBM Connections stacks. Note that some of the data dependencies cut horizontally across systems, where the outcome of a step is required in order to properly complete a different step in a different stack. These data dependencies hence imply additional temporal dependencies and a need for coordination across the various systems. Finally, non-functional requirements such as availability and security pose additional requirements on the intersection between software and systems, for example, th