
Scientific Programming 22 (2014) 173–185
DOI 10.3233/SPR-140382
IOS Press

The Science DMZ: A network design pattern for data-intensive science 1

Eli Dart a,∗, Lauren Rotman a, Brian Tierney a, Mary Hester a and Jason Zurawski b

a Energy Sciences Network, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. E-mails: {eddart, lbrotman, bltierney, mchester}@lbl.gov
b Internet2, Office of the CTO, Washington DC, USA. E-mail: [email protected]

Abstract. The ever-increasing scale of scientific data has become a significant challenge for researchers that rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, that creates an optimized network environment for science. We describe use cases from universities, supercomputing centers and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow, and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.

Keywords: High performance networking, perfsonar, data-intensive science, network architecture, measurement

1. Introduction

A design pattern is a solution that can be applied to a general class of problems. This definition, originating in the field of architecture [1,2], has been adopted in computer science, where the idea has been used in software designs [6] and, in our case, network designs. The network design patterns we discuss are focused on high end-to-end network performance for data-intensive science applications. These patterns focus on optimizing the network interactions between wide area networks, campus networks, and computing systems.

The Science DMZ model, as a design pattern, can be adapted to solve performance problems on any existing network. Of these performance problems, packet loss has proven to be the most detrimental as it causes an observable and dramatic decrease in data throughput for most applications. Packet loss can be caused by many factors, including: firewalls that cannot effectively process science traffic flows; routers and switches with inadequate burst capacity; dirty optics; and failing network and system components. In addition, misconfigured data transfer hosts are often a contributing factor in poor network performance.

1. This paper received a nomination for the Best Paper Award at the SC2013 conference and is published here with permission of ACM.

* Corresponding author: Eli Dart, Energy Sciences Network, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. E-mail: [email protected].

Many of these problems are found on the local area networks, often categorized as “general-purpose” networks, which are not designed to support large science data flows. Today many scientists are relying on these network infrastructures to share, store, and analyze their data, which is often geographically dispersed.

The Science DMZ provides a design pattern developed specifically to address these local area network issues and offers research institutions a framework to support data-intensive science. The Science DMZ model has been broadly deployed and has already become indispensable to the present and future of science workflows.

The Science DMZ provides:

• A scalable, extensible network infrastructure free from packet loss that causes poor TCP performance;



• Appropriate usage policies so that high-performance applications are not hampered by unnecessary constraints;

• An effective “on-ramp” for local resources to access wide area network services; and

• Mechanisms for testing and measuring, thereby ensuring consistent performance.

This paper discusses the Science DMZ from its development to its role in future technologies. First, Section 2 discusses the Science DMZ’s original development in addressing the performance of TCP-based applications. Second, Section 3 enumerates the components of the Science DMZ model and how each component adds to the overall paradigm. Next, Sections 4 and 5 offer sample illustrations of networks that vary in size and purpose. Section 6 then discusses examples of Science DMZ implementations from the R&E community. Lastly, Section 7 highlights future technological advancements that will enhance the applicability of the Science DMZ design.

2. Motivation

When developing the Science DMZ, several key principles provided the foundation for its design. First, these design patterns are optimized for science. This means the components of the system – including all the equipment, software and associated services – are configured specifically to support data-intensive science. Second, the model is designed to be scalable in its ability to serve institutions ranging from large experimental facilities to supercomputing sites to multidisciplinary research universities to individual research groups or scientists. The model also scales to serve a growing number of users at those facilities with an increasing and varying amount of data over time. Lastly, the Science DMZ model was created with future innovation in mind by providing the flexibility to incorporate emerging network services. For instance, advances in virtual circuit services, 100 Gigabit Ethernet, and the emergence of software-defined networking present new and exciting opportunities to improve scientific productivity. In this section, we will mostly discuss the first principle since it is the driving mission for the Science DMZ model.

The first principle of the model is to optimize the network for science. To do this, there are two entities or areas of the network that should be considered: the wide area network and the local area networks. The wide area networks (or WANs) are often already optimized and can accommodate large data flows up to 100 Gbps. However, the local area networks are still a choke point for these large data flows.

Local area networks are usually general-purpose networks that support multiple missions, the first of which is to support the organization’s business operations including email, procurement systems, web browsing, and so forth. Second, these general networks must also be built with security that protects financial and personnel data. Meanwhile, these networks are also used for research as scientists depend on this infrastructure to share, store, and analyze data from many different sources. As scientists attempt to run their applications over these general-purpose networks, the result is often poor performance, and with the increase of data set complexity and size, scientists often wait hours, days, or weeks for their data to arrive.

Since many aspects of general-purpose networks are difficult or impossible to change in the ways necessary to improve their performance, the network architecture must be adapted to accommodate the needs of science applications without affecting mission-critical business and security operations. Aspects that are difficult to change include the size of the memory buffers on individual interfaces; traffic patterns that mix science data with mail and web traffic; and an emphasis on availability, i.e. what can be counted on over time, rather than on performance.

The Science DMZ model has already been implemented at various institutions to upgrade these general-purpose, institutional networks. The National Science Foundation (NSF) recognized the Science DMZ as a proven operational best practice for university campuses supporting data-intensive science and specifically identified this model as eligible for funding through the Campus Cyberinfrastructure–Network Infrastructure and Engineering Program (CC–NIE).2 This program was created in 2012 and has since been responsible for implementing approximately 20 Science DMZs at different locations – thereby serving the needs of the science community. Another NSF solicitation was released in 2013, and awards to fund a similar number of new Science DMZs are expected.

2. NSF’s CC–NIE Program: http://www.nsf.gov/pubs/2013/nsf13530/nsf13530.html.


Fig. 1. Graph shows the TCP throughput vs. round-trip time (latency) with packet loss between 10 Gbps connected hosts, as predicted by the Mathis Equation. The topmost line (shown in purple) shows the throughput for TCP in a loss-free environment. (The colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

2.1. TCP performance

The Transmission Control Protocol (TCP) [15] of the TCP/IP protocol suite is the primary transport protocol used for the reliable transfer of data between applications. TCP is used for email, web browsing, and similar applications. Most science applications are also built on TCP, so it is important that the networks are able to work with these applications (and TCP) to optimize the network for science.

TCP is robust in many respects – in particular it has sophisticated capabilities for providing reliable data delivery in the face of packet loss, network outages, and network congestion. However, the very mechanisms that make TCP so reliable also make it perform poorly when network conditions are not ideal. In particular, TCP interprets packet loss as network congestion, and reduces its sending rate when loss is detected. In practice, even a tiny amount of packet loss is enough to dramatically reduce TCP performance, and thus increase the overall data transfer time. When applied to large tasks, this can mean the difference between a scientist completing a transfer in days rather than hours or minutes. Therefore, networks that support data-intensive science must provide TCP-based applications with loss-free service if TCP-based applications are to perform well in the general case.

As an example of TCP’s sensitivity, consider the following case. In 2012, the Department of Energy’s (DOE) Energy Sciences Network (ESnet) had a failing 10 Gbps router line card that was dropping 1 out of 22,000 packets, or 0.0046% of all traffic. Assuming the line card was working at peak efficiency, or 812,744 regular sized frames per second,3 37 packets were lost each second due to the loss rate. While this only resulted in an overall drop of throughput of 450 Kbps (on the device itself), it reduced the end-to-end TCP performance far more dramatically, as demonstrated in Fig. 1. This packet loss was not being reported by the router’s internal error monitoring, and was only noticed using the owamp active packet loss monitoring tool, which is part of the perfSONAR Toolkit.4

3. Performance Metrics, http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html.
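To make the arithmetic concrete, the short sketch below (ours, not from the paper) recomputes the loss figures quoted above; the 1,500-byte frame size is an assumption based on the standard Ethernet MTU.

```python
# Back-of-envelope check of the failing line card example (illustrative only).
# Assumes ~1,500-byte (standard MTU) frames, consistent with the quoted
# 812,744 frames/s figure for a 10 Gbps link.

frames_per_second = 812_744      # peak frame rate quoted for the 10 Gbps line card
loss_ratio = 1 / 22_000          # 1 packet dropped out of every 22,000 (0.0046%)
frame_bytes = 1_500              # assumed regular-sized Ethernet frame

lost_per_second = frames_per_second * loss_ratio
lost_throughput_bps = lost_per_second * frame_bytes * 8

print(f"packets lost per second: {lost_per_second:.1f}")                 # ~37
print(f"raw throughput lost:     {lost_throughput_bps / 1e3:.0f} Kbps")  # ~443 Kbps, close to the ~450 Kbps above
```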

Because TCP interprets the loss as network congestion, it reacts by rapidly reducing the overall sending rate. The sending rate then slowly recovers due to the dynamic behavior of the control algorithms. Network performance can be negatively impacted at any point during the data transfer due to changing conditions in the network. This problem is exacerbated as the latency increases between communicating hosts. This is often the case when research collaborations sharing data are geographically distributed. In addition, feedback regarding the degraded performance takes longer to propagate between the communicating hosts.

The relationship between latency, data loss, and network capability was described by Mathis et al. as a mechanism to predict overall throughput [12]. The “Mathis Equation” states that TCP throughput is at most:

\[
\text{throughput} \le \frac{\text{maximum segment size}}{\text{round-trip time}} \times \frac{1}{\sqrt{\text{packet loss rate}}}. \tag{1}
\]

Figure 1 shows the theoretical rate predicted by the Mathis Equation, along with the measured rate for both TCP-Reno and TCP-Hamilton across ESnet. These tests are between 10 Gbps connected hosts configured to use 9 KByte (“Jumbo Frame”) Maximum Transmission Units (MTUs).

4. perfSONAR Toolkit: http://psps.perfsonar.net.
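As an illustration of how sharply loss limits TCP at high latency, the sketch below (our own, not from the paper) evaluates Eq. (1) for a few round-trip times; the 9,000-byte segment size matches the Jumbo Frame configuration described above, and the loss rate is the 0.0046% from the line card example.

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Upper bound on TCP throughput from the Mathis Equation (Eq. 1)."""
    return (mss_bytes * 8 / rtt_s) * (1 / sqrt(loss_rate))

mss = 9000          # bytes, Jumbo Frame MTU as in the ESnet tests
loss = 1 / 22_000   # 0.0046% loss, as in the failing line card example

for rtt_ms in (1, 10, 50, 100):
    bps = mathis_throughput_bps(mss, rtt_ms / 1000, loss)
    print(f"RTT {rtt_ms:>3} ms -> at most {bps / 1e9:6.2f} Gbps")
```

With these inputs the bound drops from roughly 10 Gbps at 1 ms of latency to about 0.1 Gbps at 100 ms, which is why even a tiny loss rate is so damaging for long-distance transfers.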

This example is indicative of the current operational reality in science networks. TCP is used for the vast majority of high-performance science applications. Since TCP is so sensitive to loss, a science network must provide TCP with a loss-free environment, end-to-end. This requirement, in turn, drives a set of design decisions that are key components of the Science DMZ model.

3. The Science DMZ design pattern

The overall design pattern or paradigm of the Science DMZ comprises four sub-patterns. Each of these sub-patterns offers repeatable solutions for four different areas of concern: proper location (in network terms) of devices and connections; dedicated systems; performance measurement; and appropriate security policies. These four sub-patterns are discussed in the following subsections.

3.1. Proper location to reduce complexity

The physical location of the Science DMZ (or “location design pattern”) is important to consider during the deployment process. The Science DMZ is typically deployed at or near the network perimeter of the institution. The reason for this is that it is important to involve as few network devices as reasonably possible in the data path between the experiment at a science facility, the Science DMZ, and the WAN.

Network communication between applications running on two hosts traverses, by definition, the hosts themselves and the entire network infrastructure between the hosts. Given the sensitivity of TCP to packet loss (as discussed in Section 2.1), it is important to ensure that all the components of the network path between the hosts are functioning properly and configured correctly. Wide area science networks are typically engineered to perform well for science applications, and in fact the Science DMZ model assumes that the wide area network is doing its job. However, the local network is often complex, and burdened with the compromises inherent in supporting multiple competing missions. The location design pattern accomplishes two things. The first is separation from the rest of the general network, and the second is reduced complexity.

There are several reasons to separate the high-performance science traffic from the rest of the network. The support of high-performance applications can involve the deployment of highly capable equipment that would be too expensive to use throughout the general-purpose network but that has necessary features such as high-performance filtering capabilities, sufficient buffering for burst capacity, and the ability to accurately account for packets that traverse the device. In some cases, the configuration of the network devices must be changed to support high-speed data flows – an example might be a conflict between quality of service settings for the support of enterprise telephony and the burst capacity necessary to support long-distance high-performance data flows. In addition, the location pattern makes the application of the appropriate security pattern significantly easier (see Section 3.4).

The location design pattern can also significantly reduce the complexity of the portion of the network used for science applications. Troubleshooting is time-consuming, and there is a large difference in operational cost and time-to-resolution between verifying the correct operation of a small number of routers and switches and tracing the science flow through a large number of network devices in the general-purpose network of a college campus. For this reason, the Science DMZ is typically located as close to the network perimeter as possible, i.e. close to or directly connected to the border router that connects the research institution’s network to the wide area science network.

3.2. Dedicated systems: The Data Transfer Node (DTN)

Systems used for wide area science data transfers perform far better if they are purpose-built for and dedicated to this function. These systems, which we call data transfer nodes (DTNs), are typically PC-based Linux servers constructed with high quality components and configured specifically for wide area data transfer. The DTN also has access to storage resources, whether it is a local high-speed disk subsystem, a connection to a local storage infrastructure, such as a storage area network (SAN), or the direct mount of a high-speed parallel file system such as Lustre5 or GPFS.6

The DTN runs the software tools used for high-speed data transfer to remote systems. Some typical software packages include GridFTP7 [3] and its service-oriented front-end Globus Online8 [4], discipline-specific tools such as XRootD,9 and versions of default toolsets such as SSH/SCP with high-performance patches10 applied.

5. Lustre, http://www.lustre.org/.
6. GPFS, http://www.ibm.com/systems/software/gpfs/.
7. GridFTP, http://www.globus.org/datagrid/gridftp.html.

DTNs are widely applicable in diverse science environments. For example, DTNs are deployed to support Beamline 8.3.2 at Berkeley Lab’s Advanced Light Source,11 and as a means of transferring data to and from a departmental cluster. On a larger scale, sets of DTNs are deployed at supercomputer centers (for example at the DOE’s Argonne Leadership Computing Facility,12 the National Energy Research Scientific Computing Center,13 and Oak Ridge Leadership Computing Facility14) to facilitate high-performance transfer of data both within the centers and to remote sites. At even larger scales, large clusters of DTNs provide data service to the Large Hadron Collider (LHC)15 collaborations. The Tier-116 centers deploy large numbers of DTNs to support thousands of scientists. These are systems dedicated to the task of data transfers so that they provide reliable, high-performance service to science applications.17

DTNs typically have high-speed network interfaces, but the key is to match the DTN to the capabilities of the wide area network infrastructure. For example, if the network connection from the site to the WAN is 1 Gigabit Ethernet, a 10 Gigabit Ethernet interface on the DTN may be counterproductive. The reason for this is that a high-performance DTN can overwhelm the slower wide area link, causing packet loss.

The set of applications that run on a DTN is typically limited to parallel data transfer applications like GridFTP or FDT.18 In particular, user-agent applications associated with general-purpose computing and business productivity (e.g., email clients, document editors, media players) are not installed. This is for two reasons. First, the dedication of the DTN to data transfer applications produces more consistent behavior and avoids engineering trade-offs that might be part of supporting a larger application set. Second, data transfer applications are relatively simple from a network security perspective, and this makes the appropriate security policy easier to apply (see Section 3.4).

8. Globus Online, https://www.globusonline.org/.
9. XRootD, http://xrootd.slac.stanford.edu/.
10. HPN-SSH, http://www.psc.edu/networking/projects/hpn-ssh/.
11. LBNL ALS, http://www-als.lbl.gov.
12. ALCF, https://www.alcf.anl.gov.
13. NERSC, http://www.nersc.gov.
14. OLCF, http://www.olcf.ornl.gov/.
15. LHC, http://lhc.web.cern.ch/lhc/.
16. US/LHC, http://www.uslhc.us/The_US_and_the_LHC/Computing.
17. LHCOPN, http://lhcopn.web.cern.ch/lhcopn/.
18. FDT, http://monalisa.cern.ch/FDT/.

Because the design and tuning of a DTN can be time-consuming for small research groups, ESnet has a DTN Tuning guide19 and a Reference DTN Implementation guide.20 The typical engineering trade-offs between cost, redundancy, performance and so on apply when deciding on what hardware to use for a DTN. In general, it is recommended that DTNs be procured and deployed such that they can be expanded to meet future storage requirements.
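Much of the tuning referenced above concerns host TCP settings. As a purely illustrative sketch (not taken from the ESnet guides), the snippet below reads two Linux kernel parameters commonly adjusted on data transfer hosts; the threshold values are assumptions chosen for demonstration, not official recommendations.

```python
# Illustrative check of Linux TCP socket buffer limits commonly tuned on DTNs.
# The "minimum" values below are examples only, not ESnet's official guidance.
from pathlib import Path

CHECKS = {
    # sysctl name           assumed minimum for a high-latency, high-rate path
    "net/core/rmem_max": 64 * 1024 * 1024,   # max receive socket buffer (bytes)
    "net/core/wmem_max": 64 * 1024 * 1024,   # max send socket buffer (bytes)
}

for name, minimum in CHECKS.items():
    path = Path("/proc/sys") / name
    try:
        value = int(path.read_text().split()[0])
    except (OSError, ValueError):
        print(f"{name}: could not read")
        continue
    status = "ok" if value >= minimum else f"below assumed minimum ({minimum})"
    print(f"{name} = {value} -> {status}")
```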

3.3. Performance monitoring

Performance monitoring is critical to the discovery and elimination of so-called “soft failures” in the network. Soft failures are problems that do not cause a complete failure that prevents data from flowing (like a fiber cut), but instead cause poor performance. Examples of soft failures include packet loss due to failing components; dirty fiber optics; routers forwarding packets using the management CPU rather than the high-performance forwarding hardware; and inadequate hardware configuration. Soft failures often go undetected for many months or longer, since most network management and error reporting systems are optimized for reporting “hard failures”, such as loss of a link or device. Also, many scientists do not know what level of performance to expect, and so they do not know when to alert knowledgeable staff about a potential problem.

A perfSONAR host [16] helps with fault diagnosis on the Science DMZ. It offers end-to-end testing with collaborating sites that have perfSONAR tools installed, which allows for multi-domain troubleshooting. perfSONAR is a network monitoring software suite designed to conduct both active and passive network measurements, convert these to a standard format, and then publish the data so it is publicly accessible. The perfSONAR host can run continuous checks for latency changes and packet loss using OWAMP,21 as well as periodic “throughput” tests (a measure of available network bandwidth) using BWCTL.22 If a problem arises that requires a network engineer to troubleshoot the routing and switching infrastructure, the tools necessary to work the problem are already deployed – they need not be installed before troubleshooting can begin.

19. DTN Tuning, http://fasterdata.es.net/science-dmz/DTN/tuning/.
20. Reference DTN, http://fasterdata.es.net/science-dmz/data-transfer-node-reference-implementation/.
21. OWAMP, http://www.internet2.edu/performance/owamp/.
22. BWCTL, http://www.internet2.edu/performance/bwctl/.

Fig. 2. Regular perfSONAR monitoring of the ESnet infrastructure. The color scales denote the “degree” of throughput for the data path. Each square is halved to show the traffic rate in each direction between test hosts. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

By deploying a perfSONAR host as part of the Science DMZ architecture, regular active network testing can be used to alert network administrators when packet loss rates increase or throughput rates decrease. This is demonstrated by “dashboard” applications, as seen in Fig. 2. Timely alerts and effective troubleshooting tools significantly reduce the time and effort required to isolate the problem and resolve it. This makes high performance the norm for science infrastructure, and provides significant productivity advantages for data-intensive science experiments.
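In production this alerting is handled by perfSONAR dashboards such as the one in Fig. 2. As a rough illustration of the idea only, the sketch below runs a periodic iperf3 throughput test against a hypothetical test host and flags results that fall below an assumed threshold; the host name and threshold are placeholders, and iperf3 stands in here for the BWCTL/perfSONAR tooling described above.

```python
# Minimal periodic throughput check, illustrating the alerting idea behind
# perfSONAR dashboards. Requires iperf3 installed locally and an iperf3
# server running on the (hypothetical) test host below.
import json
import subprocess
import time

TEST_HOST = "perfsonar.example.edu"   # placeholder test host
THRESHOLD_GBPS = 5.0                  # assumed alert threshold for a 10 Gbps path

def measure_gbps(host: str) -> float:
    """Run a single iperf3 TCP test and return the received rate in Gbps."""
    out = subprocess.run(
        ["iperf3", "-c", host, "-J"],   # -J: JSON output
        capture_output=True, text=True, check=True,
    )
    result = json.loads(out.stdout)
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

while True:
    try:
        rate = measure_gbps(TEST_HOST)
        if rate < THRESHOLD_GBPS:
            print(f"ALERT: throughput {rate:.2f} Gbps below {THRESHOLD_GBPS} Gbps")
        else:
            print(f"ok: {rate:.2f} Gbps")
    except subprocess.CalledProcessError as err:
        print(f"ALERT: test failed: {err.stderr.strip()}")
    time.sleep(3600)   # re-test hourly
```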

3.4. Appropriate security

Network and computer security are of critical importance for many organizations. Science infrastructures are no different from any other information infrastructure: they must be secured and defended. The National Institute of Standards and Technology (NIST) framework for security uses the CIA concepts – Confidentiality, Integrity, and Availability.23 Data-intensive science adds another dimension – performance. If the science applications cannot achieve adequate performance, the science mission of the infrastructure has failed. Many of the tools in the traditional network security toolbox do not perform well enough for use in high-performance science environments. Rather than compromise security or compromise performance, the Science DMZ model addresses security using a multi-pronged approach.

23. FIPS-199, http://csrc.nist.gov/publications/PubsFIPS.html.

The appropriate security pattern is heavily dependent on the location and the dedicated systems patterns. By deploying the Science DMZ in a separate location in the network topology, the traffic in the Science DMZ is separated from the traffic on the rest of the network (i.e., email, etc.), and security policy and tools can be applied specifically to the science-only traffic on the Science DMZ. The use of dedicated systems limits the application set deployed on the Science DMZ, and also reduces the attack surface.

A comprehensive network security capability uses many tools and technologies, including network and host intrusion detection systems, firewall appliances, flow analysis tools, host-based firewalls, router access control lists (ACLs), and other tools as needed. Appropriate security policies and enforcement mechanisms are designed based on the risk levels associated with high-performance science environments and built using components that scale to the data rates required without causing performance problems. Security for a data-intensive science environment can be tailored for the data transfer systems on the Science DMZ.

Science DMZ resources are designed to interact with external systems, and are isolated from (or have carefully managed access to) internal systems. This means the security policy for the Science DMZ can be tailored for this purpose. Users at the local site who access resources on their local Science DMZ through the lab or campus perimeter firewall will typically get reasonable performance: since the latency between the local users and the local Science DMZ is low, TCP can recover quickly even if the firewall causes some loss.

4. Sample designs

As a network design paradigm, the individual patterns of the Science DMZ can be combined in many different ways. The following examples of the overall Science DMZ model are presented as illustrations of the concepts using notional network diagrams of varying size and functionality.

4.1. Simple Science DMZ

A simple Science DMZ has several essential components. These include dedicated access to high-performance wide area networks, high-performance network equipment, DTNs, and monitoring infrastructure provided by perfSONAR. These components are organized in an abstract diagram with data paths in Fig. 3.

Fig. 3. Example of the simple Science DMZ. Shows the data path through the border router and to the DTN (shown in green). The campus site access to the Science DMZ resources is shown in red. (The colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

The DTN is connected directly to a high-performance Science DMZ switch or router, which is attached to the border router. By attaching the Science DMZ to the border router, it is much easier to guarantee a packet loss free path to the DTN, and to create virtual circuits that extend all the way to the end host. The DTN’s job is to efficiently and effectively move science data between the local environment and remote sites and facilities. The security policy enforcement for the DTN is done using access control lists (ACLs) on the Science DMZ switch or router, not on a separate firewall. The ability to create a virtual circuit all the way to the host also provides an additional layer of security. This design is suitable for the deployment of DTNs that serve individual research projects or support one particular science application. Example use cases of the simple Science DMZ are discussed in Sections 6.1 and 6.2.

4.2. Supercomputer center network

The notional diagram shown in Fig. 4 illustrates a simplified supercomputer center network. While this may not look much like the simple Science DMZ diagram in Fig. 3, the same principles are used in its design.

Many supercomputer centers already use the Science DMZ model. Their networks are built to handle high-rate data flows without packet loss, and designed to allow easy troubleshooting and fault isolation. Test and measurement systems are integrated into the infrastructure from the beginning, so that problems can be located and resolved quickly, regardless of whether the local infrastructure is at fault. Note also that access to the parallel filesystem by wide area data transfers is via data transfer nodes that are dedicated to wide area data transfer tasks. When data sets are transferred to the DTN and written to the parallel filesystem, the data sets are immediately available on the supercomputer resources without the need for double-copying the data. Furthermore, all the advantages of a DTN – i.e., dedicated hosts, proper tools, and correct configuration – are preserved. This is also an advantage in that the login nodes for a supercomputer need not have their configurations modified to support wide area data transfers to the supercomputer itself. Data arrives from outside the center via the DTNs and is written to the central filesystem. The supercomputer login nodes do not need to replicate the DTN functionality in order to facilitate data ingestion. A use case is described in Section 6.4.

Fig. 4. Example supercomputer center built as a Science DMZ. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

4.3. Big data site

For sites that handle very large data volumes (e.g., for large-scale experiments such as the LHC), individual data transfer nodes are not enough. These sites deploy data transfer “clusters”, and these groups of machines serve data from multi-petabyte data storage systems. Still, the principles of the Science DMZ apply. Dedicated systems are still used for data transfer, and the path to the wide area is clean, simple, and easy to troubleshoot. Test and measurement systems are integrated in multiple locations to enable fault isolation. This network is similar to the supercomputer center example in that the wide area data path covers the entire network front-end, as shown in Fig. 5.

Fig. 5. Example of an extreme data cluster. The wide area data path covers the entire network front-end, similar to the supercomputer center model. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

This network has redundant connections to the research network backbone, each of which is capable of both routed IP and virtual circuit services. The enterprise portion of the network takes advantage of the high-capacity redundant infrastructure deployed for serving science data, and deploys redundant firewalls to ensure uptime. However, the science data flows do not traverse these devices. Appropriate security controls for the data service are implemented in the routing and switching plane. This is done both to keep the firewalls from causing performance problems and because the extremely high data rates are typically beyond the capacity of firewall hardware. More discussion about the LHC high-volume data infrastructure can be found in Johnston et al.’s paper presented at the 2013 TERENA Networking Conference [9].

5. Network components

When choosing the network components for a Science DMZ, it is important to carefully select networking hardware that can efficiently handle the high bandwidth requirements. The most important factor is to deploy routers and switches that have enough queue buffer space to handle “fan-in” issues, and are properly configured to use this buffer space, as the default settings are often not optimized for bulk data transfers. (The fan-in issue is described in detail at the end of this section.) One should also look for devices that have flexible, high-performance ACL (Access Control List) support so that the router or switch can provide adequate filtering to eliminate the need for firewall appliances. Note that some care must be taken in reading the documentation supplied by vendors. For example, Juniper Networks’ high-performance router ACLs are actually called “firewall filters” in the documentation and the device configuration. In general, it is important to ask vendors for specifics about packet filtering, interface queues, and other capabilities.

As discussed, two very common causes of packet loss are firewalls and aggregation devices with inadequate buffering. In order to understand these problems, it is important to remember that a TCP-based flow rarely runs at its overall “average” speed. When observed closely, it is apparent that most high-speed TCP flows are composed of bursts and pauses. These bursts are often very close to the maximum data rate for the sending host’s interface. This is important, because it means that a 200 Mbps TCP flow between hosts with Gigabit Ethernet interfaces is actually composed of short bursts at or close to 1 Gbps with pauses in between.

Firewalls are often built with an internal architecture that aggregates a set of lower-speed processors to achieve an aggregate throughput that is equal to the speed of the network interfaces of the firewall. This architecture works well when the traffic traversing the firewall is composed of a large number of low-speed flows (e.g., a typical business network traffic profile). However, this causes a problem when a host with a network interface that is faster than the firewall’s internal processors emerges. Since the firewall must buffer the traffic bursts sent by the data transfer host until it can process all the packets in the burst, input buffer size is critical. Firewalls often have small input buffers because that is typically adequate for the traffic profile of a business network. If the firewall’s input buffers are too small to hold the bursts from the science data transfer host, the user will suffer severe performance problems caused by packet loss.

Given all the problems with firewalls, one might ask what value they provide. If the application set is limited to data transfer applications running on a DTN, the answer is that firewalls provide very little value. When a firewall administrator is collecting the information necessary to allow a data transfer application such as GridFTP to traverse the firewall, the firewall administrator does not configure the firewall to use a specialized protocol analyzer that provides deep inspection of the application’s traffic. The firewall administrator asks for the IP addresses of the communicating hosts, and the TCP ports that will be used by the hosts to communicate. Armed with that information, the firewall administrator configures the firewall to permit the traffic. Filtering based on IP address and TCP port number can be done on the Science DMZ switch or router with ACLs. When done with ACLs on a modern switch or router, the traffic does not need to traverse a firewall at all. This is a key point: by running a limited set of applications on the Science DMZ DTNs, the application profile is such that the Science DMZ can typically be defended well without incurring the performance penalties of a firewall. This is especially true if the ACLs are used in combination with intrusion detection systems or other advanced security tools. However, an intrusion detection system should be used even if a firewall is present.

Aggregation (“fan-in”) problems are related to the firewall problem in that they too result from the combination of the burstiness of TCP traffic and small buffers on network devices. However, the fan-in problem arises when multiple traffic flows entering a switch or router from different ingress interfaces are destined for a common egress interface. If the speed of the sum of the bursts arriving at the switch is greater than the speed of the device’s egress interface, the device must buffer the extra traffic or drop it. If the device does not have sufficient buffer space, it must drop some of the traffic, causing TCP performance problems. This situation is particularly common in inexpensive LAN switches. Since high-speed packet memory is expensive, cheap switches often do not have enough buffer space to handle anything except LAN traffic. Note that the fan-in problem is not unique to coincident bursts. If a burst from a single flow arrives at a rate greater than the rate available on the egress interface due to existing non-bursty traffic flows, the same problem exists.
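To make the fan-in arithmetic concrete, the sketch below (ours, with assumed numbers) estimates how much buffer an egress interface needs to absorb simultaneous bursts from several ingress ports; the port speeds and burst duration are illustrative assumptions, not measurements from the paper.

```python
# Rough fan-in buffering estimate (illustrative assumptions throughout).
# Several ingress ports burst simultaneously toward one egress port; any
# traffic arriving faster than the egress rate must be buffered or dropped.

ingress_ports = 4
ingress_rate_gbps = 10.0     # assumed burst rate per ingress port
egress_rate_gbps = 10.0      # assumed egress interface rate
burst_duration_ms = 5.0      # assumed duration of coincident bursts

excess_gbps = max(0.0, ingress_ports * ingress_rate_gbps - egress_rate_gbps)
buffer_needed_bytes = excess_gbps * 1e9 / 8 * (burst_duration_ms / 1e3)

print(f"excess arrival rate: {excess_gbps:.1f} Gbps")
print(f"buffer needed to avoid loss: {buffer_needed_bytes / 1e6:.1f} MB")
# With these numbers: 30 Gbps of excess for 5 ms -> ~18.8 MB of packet buffer,
# far more than a typical low-cost LAN switch provides.
```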

6. Use cases

In this section we give some examples of how elements of the Science DMZ model have been put into practice. While a full implementation of the recommendations is always encouraged, many factors influence what can and cannot be installed at a given location due to existing architectural limitations and policy. These use cases highlight the positive outcomes of the design methodology, and show that the Science DMZ model is able to satisfy both administrative and scientific constituencies.

6.1. University of Colorado, Boulder

The University of Colorado, Boulder campus was an early adopter of Science DMZ technologies. Their core network features an immediate split into a protected campus infrastructure (beyond a firewall), as well as a research network (RCNet) that delivers unprotected functionality directly to campus consumers. Figure 6 shows the basic breakdown of this network, along with the placement of measurement tools provided by perfSONAR.

The physics department, a participant in the Compact Muon Solenoid (CMS)24 experiment affiliated with the LHC project, is a heavy user of campus network resources. It is common to have multiple streams of traffic approaching an aggregate of 5 Gbps affiliated with this research group. As demand for resources increased, the physics group connected additional computation and storage to their local network. Figure 7 shows these additional 1 Gbps connections as they entered into the portion of the RCNet on campus.

Despite the initial care in the design of the network, overall performance began to suffer during heavy use times on the campus. Passive and active perfSONAR monitoring revealed low throughput to downstream facilities, as well as dropped packets on several network devices. Further investigation was able to correlate the dropped packets to three main factors:

• Increased number of connected hosts,
• Increased network demand per host,
• Lack of tunable memory on certain network devices in the path.

Fig. 6. University of Colorado campus network, showing RCNet connected at the perimeter as a Science DMZ. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

Fig. 7. University of Colorado Network showing physics group connectivity. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

24. CMS, http://cms.web.cern.ch.

Replacement hardware was installed to alleviate this bottleneck in the network, but the problem remained upon initial observation. After additional investigation by the vendor and performance engineers, it was revealed that the unique operating environment (e.g., high “fan-out” that featured multiple 1 Gbps connections feeding a single 10 Gbps connection) was contributing to the problem. Under high load, the switch changed from cut-through mode to store-and-forward mode, and the cut-through switch was unable to provide loss-free service in store-and-forward mode.

After the vendor provided a fix and additional changes to the architecture were made, performance returned to near line rate for each member of the physics computation cluster.

6.2. The Pennsylvania State University & Virginia Tech Transportation Institute

The Pennsylvania State University’s College of Engineering (CoE) collaborates with many partners on jointly funded activities. The Virginia Tech Transportation Institute (VTTI), housed at Virginia Polytechnic Institute and State University, is one such partner. VTTI chooses to collocate computing and storage resources at Penn State, whose network security and management is implemented by local staff. However, due to policy requirements for collocated equipment, a security mechanism in the form of a firewall was required to protect both the campus and VTTI equipment. Shortly after collocation, VTTI users noticed that performance for hosts connected by 1 Gbps local connections was limited to around 50 Mbps overall; this observation was true in either direction of data flow.

Using perfSONAR, network engineers discovered that the size of the TCP window was not growing beyond the default value of 64 KB, despite the fact that hosts involved in data transfer and measurement testing were configured to use auto-tuning – a mechanism that would allow this value to grow as time, capacity, and demand dictated. To find the correct window size needed to achieve network speeds close to 1 Gbps, the sites were measured at 10 ms away in terms of round-trip latency, which yielded a window size of:

\[
\frac{1000~\text{Mb/s}}{8~\text{bits/byte}} \times 10~\text{ms} \times \frac{1~\text{s}}{1000~\text{ms}} = 1.25~\text{MB}. \tag{2}
\]

Fig. 8. Penn State College of Engineering network utilization, collected passively from SNMP data. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/SPR-140382.)

This required window is roughly 20 times larger than the observed 64 KB value. Further investigation into the behavior of the network revealed that there was no packet loss observed along the path, and other perfSONAR test servers on campus showed performance to VTTI that exceeded 900 Mbps. From continued performance monitoring, the investigation began to center on the performance of the CoE firewall.
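The required window in Eq. (2) is simply the bandwidth-delay product of the path. The sketch below (ours, not from the paper) recomputes it and compares it with the observed 64 KB default, showing the roughly 20-fold shortfall noted above.

```python
# Bandwidth-delay product for the Penn State / VTTI path (Eq. 2).
link_rate_bps = 1_000e6      # 1 Gbps local connections
rtt_s = 0.010                # 10 ms round-trip time between the sites

required_window_bytes = link_rate_bps / 8 * rtt_s
observed_window_bytes = 64 * 1024          # window stuck at the 64 KB default

print(f"required window: {required_window_bytes / 1e6:.2f} MB")   # 1.25 MB
print(f"observed window is {required_window_bytes / observed_window_bytes:.0f}x too small")
# With the window pinned at 64 KB, achievable throughput is roughly
# window / RTT = 64 KB / 10 ms ≈ 52 Mbps, matching the ~50 Mbps observed.
```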

A review of the firewall configuration revealed that a setting on the firewall, TCP flow sequence checking, modifies the TCP header field that specifies window size (a clear violation of the window scaling option, tcp_window_scaling, set forth in RFC 1323 [8]). Disabling this firewall setting increased inbound performance by nearly 5 times, and outbound performance by close to 12 times the original observations. Figure 8 is a capture of overall network utilization to CoE, and shows an immediate increase in performance after the change to the firewall setting.

Because CoE and VTTI were able to utilize the Science DMZ resources, like perfSONAR, engineers were able to locate and resolve the major network performance problem. Figure 8 also shows that numerous users, not just VTTI, were impacted by this abnormality. The alteration in behavior allowed TCP to reach higher levels of throughput, and allowed flows to complete in a shorter time than with a limited window.

6.3. The National Oceanic and Atmospheric Administration

The National Oceanic and Atmospheric Administration (NOAA) in Boulder houses the Earth System Research Lab, which supports a “reforecasting” project. The initiative involves running several decades of historical weather forecasts with the same current version of NOAA’s Global Ensemble Forecast System (GEFS). Among the advantages associated with a long reforecast data set, model forecast errors can be diagnosed from the past forecasts and corrected, thereby dramatically increasing the forecast skill, especially in forecasts of relatively rare events and longer-lead forecasts.

In 2010, the NOAA team received an allocation of 14.5 million processor hours at NERSC to perform this work. In all, the 1984–2012 historical GEFS dataset totaled over 800 TB, stored on the NERSC HPSS archival system. Of the 800 TB at NERSC, the NOAA team sought to bring about 170 TB back to NOAA Boulder for further processing and to make it more readily available to other researchers. When the NOAA team tried to use an FTP server located behind NOAA’s firewall for the transfers, they discovered that data trickled in at about 1–2 MB/s.

Working with ESnet and NERSC, the NOAA team leveraged the Science DMZ design pattern to set up a new dedicated transfer node enabled with Globus Online to create a data path unencumbered by legacy firewalls. Immediately the team saw a throughput increase of nearly 200 times. The team was able to transfer 273 files with a total size of 239.5 GB to the NOAA DTN in just over 10 minutes – approximately 395 MB/s.
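The quoted figures are easy to check; the short sketch below (ours) recovers the effective rate and the speedup over the original 1–2 MB/s firewalled path, assuming a transfer duration of about 10.1 minutes for “just over 10 minutes”.

```python
# Quick check of the NOAA reforecast transfer figures.
total_bytes = 239.5e9        # 273 files totaling 239.5 GB
duration_s = 10.1 * 60       # "just over 10 minutes" (assumed ~10.1 min)
old_rate_mbytes = 2.0        # upper end of the 1-2 MB/s seen through the firewall

rate_mbytes = total_bytes / 1e6 / duration_s
print(f"effective rate: {rate_mbytes:.0f} MB/s")                              # ~395 MB/s
print(f"speedup vs. firewalled FTP: ~{rate_mbytes / old_rate_mbytes:.0f}x")   # ~200x
```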

6.4. National Energy Research Scientific Computing Center

In 2009, both NERSC and OLCF installed DTNs to enable researchers who use their computing resources to move large data sets between each facility’s mass storage systems. As a result, WAN transfers between NERSC and OLCF increased by at least a factor of 20 for many collaborations. As an example, a computational scientist in the OLCF Scientific Computing Group who was researching the fundamental nuclear properties of carbon-14, in collaboration with scientists from Lawrence Livermore National Laboratory (LLNL) and Iowa State University, had previously waited more than an entire workday for a single 33 GB input file to transfer – just one of the 20 files of similar size that needed to be moved between the sites. With the improved infrastructure, those researchers were immediately able to improve their transfer rate to 200 MB/s, enabling them to move all 40 TB of data between NERSC and OLCF in less than three days.
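As a sanity check on those numbers, the sketch below (ours) estimates the transfer times before and after the DTN deployment; the ~1 MB/s pre-DTN rate is an assumption inferred from “more than an entire workday” for a 33 GB file, not a figure from the paper.

```python
# Transfer-time estimate for the NERSC/OLCF carbon-14 data set.
total_bytes = 40e12          # 40 TB moved between the centers
new_rate_bps = 200e6         # 200 MB/s with the DTN infrastructure

hours = total_bytes / new_rate_bps / 3600
print(f"40 TB at 200 MB/s: {hours:.0f} hours (~{hours / 24:.1f} days)")  # ~56 h, under 3 days

# Assumed pre-DTN effective rate of ~1 MB/s: a single 33 GB file would take
# roughly 9 hours, i.e. "more than an entire workday".
old_file_hours = 33e9 / 1e6 / 3600
print(f"33 GB at ~1 MB/s (assumed): {old_file_hours:.0f} hours")
```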

Since 2009, several science collaborations including those in astrophysics, climate, photon science, genomics and others have benefitted from the Science DMZ architecture at NERSC. Most recently, it has enabled high-speed multi-terabyte transfers between the Linac Coherent Light Source at SLAC National Accelerator Laboratory and NERSC to support protein crystallography experiments, as well as transfers between Beamline 8.3.2 at Berkeley Lab’s Advanced Light Source and NERSC in support of X-ray tomography experiments.

7. Future technologies

In addition to solving today’s network performance problems, the Science DMZ model also makes it easier to experiment and integrate with tomorrow’s technologies. Technologies such as dynamic “virtual circuits”, software-defined networking (SDN), and 40/100 Gbps Ethernet can be deployed in the Science DMZ, eliminating the need to deploy these technologies deep inside campus infrastructure.

7.1. Virtual circuits

Virtual circuit services, such as the ESnet-developed On-demand Secure Circuits and Advance Reservation System (OSCARS) platform [7,14], can be used to connect wide area layer-2 circuits directly to DTNs, allowing the DTNs to receive the benefits of bandwidth reservation, quality of service guarantees, and traffic engineering capabilities. The campus or lab “inter-domain” controller (IDC)25 can provision the local switch and initiate multi-domain wide area virtual circuit connectivity to provide guaranteed bandwidth between DTNs at multiple institutions. An example of this configuration is the NSF-funded Development of Dynamic Network System (DYNES) [17] project that is supporting a deployment of approximately 60 university campuses and regional networks across the US. Virtual circuits also enable the use of new data transfer protocols such as RDMA (remote direct memory access) over Converged Ethernet (RoCE) [5] on the Science DMZ DTNs. RoCE has been demonstrated to work well over a wide area network, but only on a guaranteed bandwidth virtual circuit with minimal competing traffic [11]. Kissel et al. show that RoCE can achieve the same performance as TCP (39.5 Gbps for a single flow on a 40GE host), but with 50 times less CPU utilization.

25. IDC, http://www.controlplane.net/.


7.2. 100-Gigabit Ethernet

100 Gigabit Ethernet (GE) technology is being deployed by research networks around the world to support data-intensive science. The NSF CC–NIE program is increasing the rate of 100GE deployment at US campuses with solicitations offered in 2012 and 2013. While 100GE promises the ability to support next-generation instruments and facilities, and to conduct scientific analysis of distributed data sets at unprecedented scale, 100GE technology poses significant challenges for the general-purpose networks at research institutions. Once a site is connected to a 100GE backbone, it would be very costly to distribute this new increased bandwidth across internal campus infrastructure. With the Science DMZ model, all hosts needing the increased bandwidth are near the border router, making it much easier to benefit from the 100GE connection.

7.3. Software-defined networking

Testing and deploying software-defined networking, particularly the use of OpenFlow as a platform [13], is a timely example of how the Science DMZ model could be used for exploring and hardening new technologies.

Software-defined networking concepts and production uses of OpenFlow are still in their early stages of adoption by the community. Many innovative approaches are still being investigated to develop best practices for the deployment and integration of these services in production environments. ESnet and its collaborators at Indiana University have demonstrated an OpenFlow-based Science DMZ architecture that interoperates with a virtual circuit service like OSCARS. It is easy to set up an OSCARS virtual circuit across the WAN, but plumbing the circuit all the way to the end host must be done by hand. OpenFlow simplifies this process.

Another promising use of OpenFlow is as a mechanism to dynamically modify the security policy for large flows between trusted sites. Multiple groups have demonstrated the use of OpenFlow to dynamically bypass the firewall (e.g., Kissel et al.’s research on SDN with XSP [10]). Further, one could also use OpenFlow along with an intrusion detection system (IDS) to send the connection setup traffic to the IDS for analysis, and then, once the connection is verified, allow the flow to bypass the firewall and the IDS.

8. Conclusion

The Science DMZ model has its roots in operational practices developed over years of experience, and incorporates aspects of network architecture, network security, performance tuning, system design, and application selection. The Science DMZ, as a design pattern, has already been successfully deployed at multiple sites across the US, many of them through NSF funding. The Science DMZ model and its contributing technologies are well tested and have been used effectively at supercomputer centers, national laboratories, and universities, as well as in large-scale scientific experiments.

The Science DMZ model provides a conceptual framework for the deployment of networks and network-enabled tools and systems for the effective support of data-intensive science. With many science collaborations moving to large-scale or distributed experiments, sharing best practices is becoming more important. This paper shares our work in developing the Science DMZ with the larger science community.

Acknowledgements

The authors would like to thank NOAA, NERSC, the Pennsylvania State University, and the University of Colorado, Boulder, for their contributions to this work.

The authors wish to acknowledge the vision of the National Science Foundation for its support of the CC–NIE program.

This manuscript has been authored by an author at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 with the U.S. Department of Energy. The U.S. Government retains, and the publisher, by accepting the article for publication, acknowledges, that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.

Disclaimer

This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor the Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or the Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or the Regents of the University of California.

References

[1] C. Alexander, The Timeless Way of Building, Oxford Univ. Press, New York, 1979.

[2] C. Alexander, S. Ishikawa and M. Silverstein, A Pattern Language: Towns, Buildings, Construction, Oxford Univ. Press, New York, 1977.

[3] W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, C. Dumitrescu, I. Raicu and I. Foster, The Globus striped GridFTP framework and server, in: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, SC'05, IEEE Computer Society, Washington, DC, USA, 2005, p. 54.

[4] B. Allen, J. Bresnahan, L. Childers, I. Foster, G. Kandaswamy, R. Kettimuthu, J. Kordas, M. Link, S. Martin, K. Pickett et al., Software as a service for data scientists, Communications of the ACM 55(2) (2012), 81–88.

[5] InfiniBand Trade Association, InfiniBand Architecture Specification Release 1.2.1 Annex A16: RoCE, 2010.

[6] E. Gamma, R. Helm, R. Johnson and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley Longman Publishing, Boston, MA, USA, 1995.

[7] C. Guok, D. Robertson, M. Thompson, J. Lee, B. Tierney and W. Johnston, Intra and interdomain circuit provisioning using the OSCARS reservation system, in: Third International Conference on Broadband Communications, Networks and Systems, IEEE/ICST, October 2006.

[8] V. Jacobson, R. Braden and D. Borman, TCP Extensions for High Performance, RFC 1323 (Proposed Standard), May 1992.

[9] W.E. Johnston, E. Dart, M. Ernst and B. Tierney, Enabling high throughput in widely distributed data management and analysis systems: Lessons from the LHC, in: TERENA Networking Conference (TNC) 2013, June 2013.

[10] E. Kissel, G. Fernandes, M. Jaffee, M. Swany and M. Zhang, Driving software defined networks with XSP, in: Workshop on Software Defined Networks (SDN'12), International Conference on Communications (ICC), IEEE, June 2012.

[11] E. Kissel, B. Tierney, M. Swany and E. Pouyoul, Efficient data transfer protocols for big data, in: Proceedings of the 8th International Conference on eScience, IEEE, July 2012.

[12] M. Mathis, J. Semke, J. Mahdavi and T. Ott, The macroscopic behavior of the TCP congestion avoidance algorithm, SIGCOMM Comput. Commun. Rev. 27(3) (1997), 67–82.

[13] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker and J. Turner, OpenFlow: enabling innovation in campus networks, SIGCOMM Comput. Commun. Rev. 38(2) (2008), 69–74.

[14] I. Monga, C. Guok, W.E. Johnston and B. Tierney, Hybrid networks: Lessons learned and future challenges based on ESnet4 experience, IEEE Communications Magazine, May 2011.

[15] J. Postel, Transmission Control Protocol, Request for Comments (Standard) 793, Internet Engineering Task Force, September 1981.

[16] B. Tierney, J. Boote, E. Boyd, A. Brown, M. Grigoriev, J. Metzger, M. Swany, M. Zekauskas and J. Zurawski, perfSONAR: Instantiating a global network measurement framework, in: SOSP Workshop on Real Overlays and Distributed Systems (ROADS'09), Big Sky, MT, USA, ACM, October 2009.

[17] J. Zurawski, R. Ball, A. Barczyk, M. Binkley, J. Boote, E. Boyd, A. Brown, R. Brown, T. Lehman, S. McKee, B. Meekhof, A. Mughal, H. Newman, S. Rozsa, P. Sheldon, A. Tackett, R. Voicu, S. Wolff and X. Yang, The DYNES instrument: A description and overview, Journal of Physics: Conference Series 396(4) (2012), 042065.

