Weaver: Language and runtime for software defined environments
M. H. Kalantar, F. Rosenberg
J. Doran, T. Eilam
M. D. Elder, F. Oliveira
E. C. Snible, T. Roth
Continuous delivery of software and related infrastructure environments is a challenging proposition. Typical enterprise environments, comprising distributed software and its supporting infrastructure, exhibit non-obvious, often implicit dependencies and requirements. Further increasing this challenge is that knowledge about configuration is fragmented and informally recorded. Given this situation, we propose Weaver, a domain-specific language designed to formally specify blueprints, desired state descriptions of environments. An associated runtime executes blueprints to create or modify environments through a set of target-specific platform providers that supply cloud-specific implementations. New and existing automation to implement and maintain the desired state can be associated with a blueprint specified in Weaver. Furthermore, Weaver supports the definition of conditions to validate a blueprint at design time and deployment time, as well as to continuously validate a deployed environment. We demonstrate the use of Weaver to deploy IBM Connections, an enterprise social software platform.
Introduction
Typical enterprise systems comprise a multitude of distributed software components that exhibit non-obvious, often implicit dependencies and infrastructure requirements. The deployment and operation of such complex systems, including their applications and infrastructure, are therefore challenging tasks. Even more challenging is to support continuous delivery, that is, to continuously deploy the environment in a test environment that is reasonably similar to the actual production environment as part of development and testing efforts and to promote it to production when appropriate. Aggravating these challenges is the fact that typically, the knowledge of the configuration of running environments is fragmented and informally recorded.

We identify two types of configuration dependencies in these environments: (1) dependencies between different software components and (2) dependencies between those software components and the underlying infrastructure. In many cases, the configuration of the infrastructure depends on the requirements of the software. For example, network firewall configuration depends on the software communication requirements. The optimization of the placement of compute resources across data centers depends on bandwidth requirements and constraints on the availability of the running system. In addition, modifications to the hardware infrastructure may require software updates. For example, the addition of compute resources to increase capacity may require the reconfiguration of a load balancer.

We observe that enterprises often lack a holistic knowledge of the entire environment configuration. The system configuration knowledge is often distributed across organizational boundaries, where different teams know the details of a subset of the environment components. When recorded, configuration knowledge is kept in multiple, informal documents.

Thus, it is no surprise that environment development expectations and assumptions do not match operational reality. The barrier between development and operations teams impedes iterative environment development, increasing the risk of instability in the transition to production and in the update of the production environment.
Copyright 2014 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
Digital Object Identifier: 10.1147/JRD.2014.2304865
M. H. KALANTAR ET AL. 10 : 1 IBM J. RES. & DEV. VOL. 58 NO. 2/3 PAPER 10 MARCH/MAY 2014
0018-8646/14 © 2014 IBM
Given this situation, we propose Weaver, a language that allows one to specify a blueprint, a formal description of the desired state of an environment. By environment, we mean a distributed application and its supporting infrastructure. Weaver provides the language constructs that describe all desired aspects of an environment: compute, storage and network resources, services, and software. Automation artifacts are associated with the blueprint (or Weaver program) to implement and maintain the desired state. Blueprints specified as Weaver programs can be managed via source control; that is, they can be versioned and shared. The execution of a Weaver program validates the specified environment, and deploys or updates it.

The key goal associated with Weaver is to improve agility and reduce the risk associated with continuously delivering software. Agility is the ability to respond quickly to changing requirements by introducing new application functions or making structural changes in the infrastructure topology, such as adding a firewall for better isolation and security. With the Weaver approach, even such structural changes as inserting a virtual firewall are treated programmatically ("as code") by editing the Weaver blueprint and re-executing it. In addition, Weaver is designed to support effective collaboration between domain experts, modularity, and re-use of code. Weaver language constructs make it easier to map application components differently on different infrastructures. These design principles will be further explored in the section "Weaver language design."

The design of Weaver is motivated and influenced by the DevOps discipline. DevOps is a methodology to enhance the collaboration between development and operations teams by applying development techniques, such as iterative development, automation, automated testing, and versioning, to both application code and deployment automation code. Weaver does not replace the need for low-level automation building blocks to install and configure individual software components. Existing scripting languages, including special-purpose configuration languages such as Chef and Puppet, can be used to define automation on single nodes. The main objective associated with Weaver is to provide a programmable view of the entire environment, including software components that span systems, and the infrastructure elements that are needed to support them.

To validate the concepts of Weaver, we experimented with using the approach to automate end-to-end a large and complex social software system, IBM Connections. We use this system to exemplify the challenges and describe how Weaver successfully addresses them. The next section describes IBM Connections in more detail. The approach and the Weaver language are then described. Finally, we return to a discussion of the results of our experiment with IBM Connections and conclude.
Motivating example: IBM Connections
IBM Connections consists of a set of social software applications including, for example, community, wiki, personal profile, forum, and file sharing applications. This set of hosted applications, offered as a service to all IBM employees, is extensively used. For the instance supporting IBM, the number of visitors has increased by more than 110% in less than a year; user profiles currently exceed 650,000; more than 600,000 wikis and communities have been created; and file storage is growing at 8% per month.

Typical deployments of IBM Connections are large and complex. Figure 1 shows a deployment that is simplified, yet sufficient to exemplify the complexity. The topology consists of two IBM HTTP (Hypertext Transfer Protocol) servers (IHS), 16 IBM WebSphere* Application Servers (WAS) grouped in four clusters that can be deployed in varying sizes, and one IBM WebSphere Deployment Manager (DMGR). All of the WAS nodes are connected to an external database (IBM DB2*) and an external network file system (NFS). The set of social applications is distributed among the four clusters; each cluster in the topology hosts a number of them.

In addition to the inherent complexity of the topology, multiple nonfunctional requirements must be addressed. As this is a critical business application, it must be highly available and massively scalable. In addition, there are strict network isolation and data privacy requirements. Figure 1 also illustrates the implications of the nonfunctional requirements on the topology. Firewalls must be present between the web and application tiers. The clusters must be configurable with varying sizes, and the IHS servers must be configured for high availability.

As with many other large distributed applications, developing, testing, deploying, and maintaining IBM Connections presents several challenges. First, system knowledge is fragmented among different teams of experts (e.g., WAS and DB2 configuration experts), all of whom must participate to implement changes and updates. Second, dependencies between different software components and between software and the supporting infrastructure are not well documented or verifiable. Figure 2 illustrates some of the temporal and data dependencies between the different steps required to install the IBM Connections stacks. Note that some of the data dependencies cut horizontally across systems, where the outcome of a step is required in order to properly complete a different step in a different stack. These data dependencies hence imply additional temporal dependencies and a need for coordination across the various systems. Finally, nonfunctional requirements such as availability and security pose additional requirements on the intersection between software and systems, for example, the presence of firewalls and their proper port configuration and the spreading of software across physical machines and racks.
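The way cross-stack data dependencies turn into temporal ordering constraints can be illustrated with a small sketch. The step names below are hypothetical and do not reproduce Figure 2; the sketch only assumes that each install step lists the steps whose outcome it needs, and it uses Python's standard `graphlib` module (Python 3.9 or later) to group steps into waves that could run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical install steps (NOT the actual steps of Figure 2).
# Each key maps to the set of steps that must complete first.
STEP_DEPENDENCIES = {
    "db2.install": set(),
    "db2.create_databases": {"db2.install"},
    "dmgr.install": set(),
    "dmgr.create_cell": {"dmgr.install"},
    "was.install": set(),
    # Assumed cross-stack data dependency: the WAS node needs DMGR connection data.
    "was.federate_node": {"was.install", "dmgr.create_cell"},
    # Assumed cross-stack data dependency: the applications need database connection data.
    "was.install_connections": {"was.federate_node", "db2.create_databases"},
    "ihs.install": set(),
    "ihs.configure_plugin": {"ihs.install", "was.install_connections"},
}


def execution_waves(dependencies):
    """Group steps into waves; all steps in one wave have their dependencies met."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = execution_waves(STEP_DEPENDENCIES)
for number, wave in enumerate(WAVES, start=1):
    print(number, wave)
```

In this illustrative graph the four base installs can proceed in parallel, while the steps that consume data produced on another stack are forced into later waves. That is the coordination burden the text describes.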
As a result of these challenges, the IBM Connections team has not yet been successful in fully automating the deployment and update of the entire production environment. Consequently, they suffer from infrequent release cycles (two every year) and require a long manual planning phase to make any structural changes such as improved isolation between tiers.

To validate the concept of configuration as "code," we fully automated the deployment of IBM Connections for the topology shown in Figure 1. This effort required several people-months of work including consultations with various domain experts and authoring and testing the required automation building blocks (in Chef) and the Weaver blueprints that reference them. We were able to achieve a reliable and repeatable deployment of IBM Connections and to demonstrate many of the goals for which Weaver is designed: modularity and reuse, agility, and reduced risk. The lessons we learned are described in the section "Discussion."
Approach
In principle, a custom script could be developed to provision infrastructure resources in a cloud and to install software on those resources. Such an approach works well only for small systems that are deployed only once; it is not a scalable solution. Such scripts are fragile and complex to implement on large systems. Minor modifications, such as the addition of a new cloud resource, may require changes in several places. Furthermore, allowing the concurrent execution of different installation and configuration operations on different nodes may be difficult to achieve in a large-scale system due to multiple data dependencies. Critically, such an approach does not enable collaboration and code reuse among different stakeholders.

We propose Weaver as a language that allows developers to express blueprints as code. A blueprint specifies the desired state of an environment in terms of its resources, services, and software. A blueprint also references automation scripts needed to implement and maintain the desired state. An environment, comprising an application and its infrastructure, is managed as a unit. Environments may be deployed in support of development and test activities, and for running a production system.

Using Weaver, a blueprint is expressed by a set of Weaver files that comprise a Weaver program. The core language concepts express all relevant resources for an environment such as servers, storage, software components, automation scripts, etc. Weaver is an internal domain-specific language (DSL) built in Ruby. An internal DSL builds upon the host language and is therefore tightly coupled to its syntax and semantics. The use of Ruby as the host language allows the use of a well-known type system, expression syntax, and semantics that have been adopted by popular infrastructure-as-code frameworks (e.g., Chef). The Weaver language is described in detail in the next section.

Figure 1. Simplified physical topology for IBM Connections. Replicated IHSs support four WAS clusters, WAS_C1 through WAS_C4. Each cluster contains four nodes: Ci_M0 through Ci_M3. The clusters are managed by a DMGR and are supported by a database and shared file server. Lines represent valid communication paths, whereas the red boxes represent firewalls.

The Weaver runtime creates and modifies environments by executing a blueprint expressed as a Weaver program. The runtime coordinates the creation or modification of the resources described by the blueprint via a set of platform-specific providers that implement the interactions with target clouds. With reference to Figure 3, the runtime first creates an in-memory model of the desired environment from the blueprint (Parsing and Transformation component). This model is used to validate the blueprint (Validation component) prior to deployment. The model is then analyzed to identify relationships between property values and dependencies. The dependencies are used to derive coordination requirements used during software configuration. Finally, the in-memory model is traversed to create or modify resources using a set of External Services and Platform Providers that provide target-cloud-specific resource implementations. Weaver currently implements providers for the SDI (software defined infrastructure) Controller, OpenStack**, and Amazon Compute Cloud. As virtual machine (VM) instances start, they execute a startup script to configure the software. The software configuration is coordinated between instances using a Coordinator currently implemented on Apache ZooKeeper**. The coordination ensures that property values propagated between instances are available when and where they are needed. After an environment has been deployed, the Persistence component serializes the in-memory model as the actual system state in the Datastore, currently implemented using Apache CouchDB.
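The runtime stages described above (model construction, validation, dependency analysis, and provider-driven traversal) can be sketched roughly as follows. This is an illustrative Python sketch, not Weaver code: Weaver itself is a Ruby DSL, and every name here (`Resource`, `InMemoryProvider`, `deploy`, the `("ref", resource, property)` tuple convention, the `example.test` addresses) is an assumption made for the example. Coordination through ZooKeeper and persistence to CouchDB are not modeled; the returned dictionary merely stands in for the persisted state.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Resource:
    name: str
    # A property value is either a literal or ("ref", resource_name, property_name).
    properties: dict = field(default_factory=dict)


class InMemoryProvider:
    """Hypothetical stand-in for a platform provider; a real one would call a cloud API."""

    def __init__(self):
        self.created = []

    def create(self, resource, resolved):
        self.created.append(resource.name)
        # Pretend the platform assigns an address when the resource is created.
        return {**resolved, "address": f"{resource.name}.example.test"}


def is_ref(value):
    return isinstance(value, tuple) and len(value) == 3 and value[0] == "ref"


def validate(model):
    """Design-time check: every reference must name a declared resource."""
    return [
        f"{res.name}.{key} references unknown resource {value[1]}"
        for res in model.values()
        for key, value in res.properties.items()
        if is_ref(value) and value[1] not in model
    ]


def dependencies(model):
    """Derive resource dependencies from property references."""
    return {
        res.name: {v[1] for v in res.properties.values() if is_ref(v)}
        for res in model.values()
    }


def deploy(blueprint, provider):
    model = {res.name: res for res in blueprint}  # in-memory model
    errors = validate(model)
    if errors:
        raise ValueError("; ".join(errors))
    state = {}
    # Visit resources so that referenced resources are created first.
    for name in TopologicalSorter(dependencies(model)).static_order():
        resolved = {
            key: state[value[1]][value[2]] if is_ref(value) else value
            for key, value in model[name].properties.items()
        }
        state[name] = provider.create(model[name], resolved)
    return state  # stands in for the state a persistence layer would store


BLUEPRINT = [
    Resource("was_c1", {"db_host": ("ref", "db2", "address"), "size": 4}),
    Resource("db2", {"port": 50000}),
]
PROVIDER = InMemoryProvider()
STATE = deploy(BLUEPRINT, PROVIDER)
```

Because the ordering is derived from property references rather than from the order of declaration, `db2` is created before `was_c1` even though the blueprint lists it second. Swapping `InMemoryProvider` for another class with the same `create` method is the sketch's analogue of targeting a different cloud.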
Weaver language design
Desired state represent...