

AUTOTUNE E+ BUILDING ENERGY MODELS

Joshua New, Jibonananda Sanyal, Mahabir Bhandari, and Som Shrestha
Oak Ridge National Laboratory, Oak Ridge, TN

ABSTRACT

This paper introduces a novel “Autotune” methodology under development for calibrating building energy models (BEM). It is aimed at developing an automated BEM tuning methodology that enables models to reproduce measured data, such as utility bills, sub-meter, and/or sensor data, accurately and robustly by selecting best-match E+ input parameters in a systematic, automated, and repeatable fashion. The approach is applicable to a building retrofit scenario and aims to quantify the trade-offs between tuning accuracy and the minimal amount of “ground truth” data required to calibrate the model. Autotune will use a suite of machine-learning algorithms, developed and run on supercomputers, to generate calibration functions. Specifically, the project will begin with a de-tuned model and then perform Monte Carlo simulations on the model by perturbing the “uncertain” parameters within permitted ranges. Machine learning algorithms will then extract minimal perturbation combinations that result in modeled results that most closely track sensor data. A large database of parametric EnergyPlus (E+) simulations has been made publicly available. Autotune is currently being applied to a heavily instrumented residential building as well as three light commercial buildings in which a “de-tuned” model is autotuned using faux sensor data from the corresponding target E+ model.

INTRODUCTION

In 2006, the US consumed $220 billion in annual energy costs, with 39% of primary energy (73% of total electrical energy) being consumed by buildings. With less than 2% of this energy demand met by renewable resources, the US constituted 21% of worldwide CO2 emissions in 2005, with an annual growth rate of 1.2% from 1990-2005 (U.S. Dept. of Energy, 2010) (Figure 1). For reasons financial, environmental, and social, the United States Department of Energy (DOE) has set aggressive goals for energy efficiency, which constitutes the low-hanging fruit for slight to moderate energy savings in the US buildings sector.

Figure 1: Summary of US Primary Energy Consumption (U.S. Dept. of Energy, 2010) and Production (U.S. EIA, 2009).

A central challenge in the domain of energy efficiency is realistically modeling specific building types, scaling those models to the entire US building stock (Deru et al., 2011) across ASHRAE climate zones (IECC 2009 and ASHRAE 90.1-2007; Briggs et al., 2003a,b), and then projecting how specific policies or retrofit packages would maximize return-on-investment with subsidies through federal, state, local, and utility tax incentives, rebates, and loan programs. Nearly all energy efficiency projections rely upon accurate models as the central primitive by which to integrate the national impact with meaningful measures of uncertainty, error, variance, and risk. This challenge is compounded by the fact that retrofits and construction of buildings happen one at a time, and an individual building is unlikely to closely resemble its prototypical building. Unlike vehicles and aircraft, buildings are generally manufactured in the field based on one-off designs and have operational lifetimes of 50-100 years; each building would need to be modeled uniquely and more precisely to determine optimal energy efficiency practices.

This challenge has been partially addressed through the many software packages developed for energy modeling and the software tools which leverage them. There are over 20 major software tools with various strengths and weaknesses in their capability of realistically modeling the whole-building physics involved in building energy usage (Crawley et al., 2008). The major software supported by DOE is EnergyPlus (E+), constituting approximately 600,000 lines of FORTRAN code. Many tools use similar simulation engines, such as the National Renewable Energy Laboratory’s (NREL) BEopt (Christensen et al., 2006) and Lawrence Berkeley National Laboratory’s (LBNL) Home Energy Saver (HES) (Mills, 2008), in order to determine a set of optimal retrofit measures. There are many other use cases for energy simulation engines and tools, some of which are becoming required by law, such as the progressive California Legislature Assembly Bills AB1103 (California Energy Commission, 2010a) and AB758 (California Energy Commission, 2010b), which require energy modeling anytime commercial property changes owners. The increasing application of energy software and the accuracy of projected performance are entirely contingent upon the validity of input data; a sufficiently accurate input model of an individual building and its use is required.

One of the major barriers to DOE’s Building Technology Program (BTP) goals and the adoption of building energy modeling software is the user expertise, time, and associated costs required to develop a software model that accurately reflects reality (codified via measured data). The sheer cost of energy modeling makes it something that is primarily done by researchers and for large projects. It is not a cost that the retrofit market or most use cases would absorb in the foreseeable future without drastically cheaper and more accurate model generation. This weak business case, along with concerns regarding the cost for upkeep, maintenance, and support of the very capable E+ simulation engine, has driven DOE sponsors to investigate facilitating technologies that would enable the energy modeler and retrofit practitioner in the field.

Figure 2: Autotune workflow for E+ building energy models as a cost-effective solution for generating accurate input models.

The business-as-usual approach for modeling whole-building energy consumption involves a building modeler using the software tool they have the most experience with to create the geometry of a building, layer it with detailed metrics encoding material properties, and add the equipment currently or expected to be in the building, with anticipated operational schedules. An E+ building model has ∼3,000 inputs for a normal residential building, with very specific details for which most energy modelers do not have sources of data. Experimentation has established that even the ASHRAE handbook and manufacturer’s label data are not reliable due to substantial product variability for some materials (DeWit, 2001). This is compounded by the fact that there is always a gap between the as-designed and as-built structure (e.g., contractors may neglect to fill one of the corner wall cavities with insulation). Given the sources of variance involved in the input process, it should come as no surprise that building models must often be painstakingly tuned manually to match measured data. This tuning process is highly subjective and repeatable across neither modelers nor software packages. An automated self-calibration mechanism capable of handling intense sub-metering data is called for.

The development of an autotuning capability (Figure 2) to intelligently adapt building models or templates to building performance data would significantly facilitate market adoption of energy modeling software, aid in use cases such as effective retrofit strategies for existing buildings, and promote BTP’s goals of increased market penetration for energy modeling capabilities. The idea of self-calibrating energy models has been around for decades and was expertly consolidated in an ASHRAE report on the subject (Reddy et al., 2006), but prior work generally lacks the employ of machine learning algorithms or similar autonomous application of modern technology. In this initial paper, we discuss the general methodology behind the Autotune project, specific technologies enabling its implementation, and preliminary data generation results, including a large database of parametric E+ simulations available for general use.

SIMULATION/EXPERIMENT

Figure 3: A virtual building model (software space) and a real building (sensor space), when viewed as vectors of numbers, allow a mathematical mapping between vector spaces for direct comparison between simulation state and sensed world state.

The goal of the Autotune project is to save building modelers the time spent tweaking building input parameters to match ground-truth data by providing an “autotune” easy button for their computer which intelligently adjusts model inputs. In order to achieve this, the Autotune project entails running millions of parametric E+ simulations on supercomputers, multi-objective optimization of E+ variables via sensitivity analysis, using machine learning systems to characterize the effect of individual variable perturbations on E+ simulations, and adapting an existing E+ model to approximate sensor data. The system will be demonstrated using an E+ building model automatically matched to a subset of the 250+ sensors in a heavily instrumented residential research building, as well as to DOE’s commercial reference buildings (Field et al., 2010) for a medium office, stand-alone retail, and warehouse, in which 3 customized buildings will provide faux sensor data for tuning the original models. This paper summarizes the Autotune methodology, focusing primarily on the definition of parametric simulations and the accessibility of the public database.

Parametric Analysis

Sensitivity analysis is a standard statistical technique (Bradley et al., 1977) in which a large parametric sweep of possible values for each input variable in a simulation is performed, and each variable’s contribution to the variance in the final simulation result is mathematically classified. This technique has been the hallmark mathematical technique for several analyses regarding energy efficiency. In fact, the oft-referenced Building Energy Data Book (U.S. Dept. of Energy, 2010) does not use direct measurements of the reported data, but relies upon ratios developed in earlier reports (Huang et al., 1987), some of which can be traced back to reports from the Energy Crisis in the late 1970s. In Huang et al. (1987), the authors used thousands of DOE-2 simulations to establish sensitivities and develop look-up tables for practitioners in the field, since energy modeling, particularly in a mobile fashion, was inaccessible at that time. As a potential use case, DOE sponsors have considered forming a new basis consisting of hundreds of millions of E+ simulations, rather than thousands of DOE-2 runs, to develop more modern and robust data for use in a reconstruction project. As such, we are using the latest version of E+ and OpenStudio to run millions of simulations, store them in a database, and make that database publicly accessible for anyone to mine for relevant knowledge.

Figure 4: Sensitivity analysis of E+ simulations mapped to their effect in sensor space.

The computational space for this search problem is one crucial aspect of the project. While a database of millions of simulations would be a boon to the energy analysis community, it would not be sufficient for the success of this project. Domain experts have defined a set of parameters for a building model that it would be preferential to vary; however, all combinations of these variables would require 5×10^52 E+ simulations. Many techniques will be utilized to effectively prune and intelligently sample the search space. First, domain experts have identified ∼156 parameters typically used by energy modelers that need to be varied and have ranked them in several importance categories. Second, building experts have defined realistic ranges (minimum, maximum, and step size) for those variables. Third, researchers have defined meta-parameters that allow several individual parameters to be varied as a function of a single variable. Fourth, low-order Markov simulations are being conducted to determine variables with a monotonic effect on sensor data that could reliably be interpolated to estimate the impact of a given variable. Fifth, sources of variance for individual variables in the initial results will be used to guide higher sampling rates for more sensitive variables. Sixth, an expert in multi-parameter optimization will investigate computational steering algorithms to determine the optimal sampling strategy for the remaining space beyond the brute-force sampling of higher-order Markov chains of Monte Carlo simulations.
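As a minimal sketch of the range-based sampling described above, a one-at-a-time sweep over expert-specified (minimum, maximum, step) ranges can be generated as follows. The parameter names and ranges here are hypothetical placeholders, not the project’s actual ∼156 parameters:

```python
# One-at-a-time parametric sweep over expert-specified ranges.
# Parameter names and ranges are hypothetical illustrations.

def sweep_values(minimum, maximum, step):
    """Enumerate values from minimum to maximum inclusive, by step."""
    values, v = [], minimum
    while v <= maximum + 1e-9:       # tolerance guards float drift
        values.append(round(v, 10))
        v += step
    return values

# (min, max, step) ranges, as a building expert might specify them
ranges = {
    "wall_insulation_r": (2.0, 5.0, 1.0),    # m2K/W
    "infiltration_ach50": (0.5, 3.5, 1.0),   # air changes per hour
}

def one_at_a_time(base, ranges):
    """Yield (name, value, candidate) tuples, perturbing one parameter
    at a time while holding all others at their base values."""
    for name, (lo, hi, step) in ranges.items():
        for v in sweep_values(lo, hi, step):
            candidate = dict(base)
            candidate[name] = v
            yield name, v, candidate

base = {"wall_insulation_r": 3.7, "infiltration_ach50": 0.74}
runs = list(one_at_a_time(base, ranges))   # 8 candidate simulations here
```

Each `candidate` dictionary would then be written into an E+ input file and simulated; the full combinatorial product (rather than one-at-a-time variation) is what produces the intractable 5×10^52 space.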

Mapping Mechanism

In order for autotuning to work, there must be a mapping from the measured data to the corresponding state variables within the simulation (Figure 3). By defining a mathematical mapping between measurements in sensor space and simulation variables in software space, a Euclidean or similar vector-distance approach can be used to identify “how close” the software simulation is to the measured performance.

This mapping must be performed by domain experts initially, but the expert-defined mapping will be mined to discover the labeling patterns used by the domain experts. The final result will be a data dictionary through which other field experiments can easily have their sensor data mapped to internal software state using labels (e.g., Temperature °F, north wall, 3′ above grade). We also plan to investigate automating the mapping for new sensor data using machine learning techniques. This general mapping mechanism is necessary for widespread use of the autotune technology.
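The vector-space comparison can be sketched as follows. The channel labels and simulation variable names are hypothetical illustrations of the data-dictionary idea, not the project’s actual labels:

```python
import math

# Hypothetical mapping from sensor-space labels to simulation report
# variables; the real data dictionary is expert-defined.
mapping = {
    "Temperature F, north wall": "Zone:NorthWall:SurfaceTemp",
    "Whole-house electric [kWh]": "Facility:TotalElectricEnergy",
}

def distance(sensor_readings, simulation_state, mapping):
    """Euclidean distance between mapped sensor and simulation vectors."""
    return math.sqrt(sum(
        (sensor_readings[label] - simulation_state[sim_var]) ** 2
        for label, sim_var in mapping.items()
    ))

sensed = {"Temperature F, north wall": 71.2,
          "Whole-house electric [kWh]": 31.0}
simulated = {"Zone:NorthWall:SurfaceTemp": 70.2,
             "Facility:TotalElectricEnergy": 31.0}
err = distance(sensed, simulated, mapping)   # 1.0 for these values
```

Smaller `err` means the candidate model tracks the measured performance more closely; in practice the channels would be normalized so that no single unit dominates the distance.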

While vector-distance is used as an error metric, it should be noted that the search space is so large that there most likely exists a multitude of feasible solutions (buildings which match the measured data within some threshold). We anticipate eventually using clustering to present unique/representative solutions. However, as additional outputs are added (e.g., room temperatures), finding a robust match becomes more difficult, thereby reducing the number of potential solutions and allowing quantification of the tradeoffs between vector size and tuning accuracy. While the commercial buildings discussed in the Commercial Building Simulation section were selected to allow direct comparison of “actual” building properties to the tuned models, it is important to realize that the approaches employed by Autotune offer the capability of compensating not only for input errors, but also for the unavoidable algorithmic approximations required by software modeling algorithms on computing devices.

Suite of Machine Learning Algorithms

Machine learning allows the autonomous generation of algorithms by iteratively processing empirical data in order to allow repeatable detection of patterns (Figure 4). More importantly, cross-validation techniques ensure that each instance of a machine learning technique (agent) learns only from a portion of the data and that its classification accuracy is then tested on data it has not seen before. This process of validation is crucial to the generalized learning necessary for properly capturing BEM dynamics without over-fitting to a specific building. This process is rarely used by energy modelers in the manual tuning process and is the primary culprit when post-retrofit measurements fail to match an expertly tuned model.
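As a minimal illustration of the cross-validation idea, a generic k-fold split is sketched below; this is a textbook construction, not MLSuite’s actual implementation:

```python
def k_fold_splits(n_samples, k):
    """Partition sample indices into k disjoint (train, test) splits,
    so every sample is held out for testing exactly once."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    splits = []
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any remainder
        stop = n_samples if i == k - 1 else start + fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, test))
    return splits

# Each agent trains on `train` and is scored only on the held-out `test`.
splits = k_fold_splits(10, 5)
```

Averaging the held-out scores across folds gives the generalization estimate that guards against over-fitting to a single building’s data.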

Each type of learning system has its own strengths and weaknesses, making it particularly suited for solving a particular type of problem. Moreover, a given type of learning system can vary in its performance based upon its own internal variables (learning rate, etc.). We have previously developed a suite of machine learning algorithms, called MLSuite, that allows general XML-file-based definition of jobs to run on supercomputers and was previously used for testing “sensor-based energy modeling” (sBEM), in which whole-building electrical usage was predicted as a function of sensor data (Edwards et al., 2012). MLSuite currently allows various parameter settings for multiple learning systems, input orderings, cross-validation techniques, and accuracy metrics to analyze the patterns in simulation data. It includes the following 8 machine learning algorithms: linear regression, genetic algorithms, feed-forward neural networks, non-linear support vector regression, hierarchical linear regression experts, hierarchical least-squares support vector regression experts, hierarchical feed-forward neural network experts, and Fuzzy C-means with local models of feed-forward neural networks.

A massive amount of data will be generated during the parametric sensitivity analysis and mapped to the sensor data. This data captures dynamics that can quickly reveal the roles multiple simulation input variables play in the simulation output, informing the autotuning process. Three primary learning tasks have been defined for MLSuite which constitute novel and promising data mining use cases for the building community: pattern detection, simulation approximation, and inverse modeling (Kissock et al., 2003).

Pattern detection of single-variable parametric simulations (all other variables held constant) can be used to determine the sensitivity and pattern changes evoked by that “knob” of the simulation. By detecting the patterns for every pre-computed combination of parameters, a set of “knob turns” can be defined which is expected to push the simulation results into alignment with sensor data.

Figure 5: EnergyPlus model of the ZEBRAlliance house with Structurally Insulated Panels (SIP).

The primary focus of development effort in the latest E+ 7.0 was addressing the long simulation runtime. E+ simulation times vary with the amount of temporal resolution required in reporting, the algorithms used to model certain properties, the amount of equipment included, and many other properties. While an envelope-only simulation takes 2 minutes, one with ground loops and additional equipment currently takes ∼9 minutes. The parametric database stores a compressed and vectorized version of the E+ input file (*.idf) and 15-minute data for 82 E+ report variables (*.csv). By applying MLSuite to process the IDF as the input feature vector and learn to reliably match the CSV output feature vector, machine learning agents can be developed which require kilobytes (KB) of hard drive space to store and can give approximate E+ simulation results for a given input file in seconds rather than minutes. Tradeoffs between storage, runtime, and accuracy are currently under study.
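The simulation-approximation idea can be sketched with a toy surrogate. Here a one-parameter least-squares fit stands in for MLSuite’s learners, and the training pairs are fabricated for illustration, not real E+ results:

```python
# Toy surrogate: fit output = a*x + b from precomputed (input, output)
# pairs, then predict new outputs without rerunning the simulation.
# The training data below are fabricated for illustration.

def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# e.g. x = insulation R-value, y = annual heating energy (fabricated)
xs = [2.0, 3.0, 4.0, 5.0]
ys = [40.0, 35.0, 30.0, 25.0]
a, b = fit_linear(xs, ys)

def surrogate(x):
    """Millisecond-scale stand-in for a minutes-long simulation."""
    return a * x + b
```

A real agent would map the full vectorized IDF to the 82 report variables rather than one scalar to another, but the storage/runtime tradeoff is the same: the fitted coefficients occupy a few bytes and evaluate instantly.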

Inverse modeling (Kissock et al., 2003) is a method of working backwards from observed sensor data to information about a physical object/parameter; this method works even if the physical parameter is not directly observable. In the context of BEM, inverse modeling often works backwards from utility bill data, using mathematical models (primarily statistics and model assumptions) to identify a more specific breakdown of energy use within a building. By using the CSV data as the input feature vector and the IDF as the output feature vector, machine learning algorithms can be used to predict E+ input files as a function of sensor data; this is the primary autotuning technique currently being tested.
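For a single parameter with a monotonic effect (the kind identified by the low-order Markov runs above), the inverse-modeling idea reduces to inverting the forward response. The forward model below is a fabricated stand-in for an E+ simulation, used only to show the mechanics:

```python
# Toy inverse model: recover an input parameter from an observed output
# by inverting a monotonic forward response with bisection.

def forward(x):
    """Fabricated monotonic response, e.g. energy use vs. leakage rate."""
    return 20.0 + 8.0 * x

def invert(observed, lo, hi, tol=1e-6):
    """Bisection: find x in [lo, hi] such that forward(x) == observed,
    assuming forward is increasing on the interval."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if forward(mid) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

x_hat = invert(44.0, 0.0, 5.0)   # recovers x near 3.0 for this forward()
```

With thousands of interacting parameters the inversion is no longer this simple, which is why the project trains machine learning agents on the CSV-to-IDF direction instead of bisecting each input independently.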

DISCUSSION AND RESULT ANALYSIS

Residential Building Simulations

A three-level, highly energy efficient research house with a conditioned floor area of 382 m2 was selected for the initial phase of this project. This house is one of the four energy efficient ZEBRAlliance houses (http://zebralliance.com) built using some of the most advanced building technology, products, and techniques available at the time of construction. The main reasons for selecting this house were to eliminate the uncertainties of input parameters and schedules (such as lighting, plug loads, and occupancy) through emulated occupancy, and because it was very heavily instrumented for validation studies, allowing investigations into tuning capabilities with intense submetering. In this unoccupied research house, human impact on energy use is simulated to match the national average according to Building America benchmarks (Hendron and Engebrecht, 2010), with showers, lights, ovens, washers, and other energy-consuming equipment turned on and off exactly according to schedule. This house uses a structurally insulated panel (SIP) envelope with a thermal resistance of 3.7 m2K/W and very low air leakage (measured ACH50 = 0.74), and thus has very low heat gain and loss through the building envelope. The details of this house’s envelope and other characteristics are described in Miller et al. (2010) (Figure 5).

Figure 6: Large database of publicly available E+ parametric simulations.

This E+ model was created and carefully iterated on and compared to sensor data by domain experts, but many discrepancies still exist. This is compounded by the fact that there are many input parameters for which a precise value cannot be attained; examples include the conductivity of all materials used, the radiant fraction of all lighting fixtures, submetering of all individual plug loads, and the heat dissipated by the dryer to the conditioned space. Various studies, including Hopfe and Hensen (2011), highlight the danger in combining multiple uncertainties in input parameters due to their different natures (climatic, structural, or serviceability parameters), controllability, etc.; therefore, during the first part of this project the main focus is on building-envelope-related input parameter uncertainties. A set of 156 parameters was selected for the initial variation. Since many of the characteristics for this house were identified through lab tests, experts decided to manually specify a realistic range for the uncertain parameters instead of assigning a fixed percentage variation as used in several calibration and uncertainty analyses (O’Neill et al., 2011). A base, minimum, and maximum value was assigned to each of the 156 parameters. This approach allows greater specificity over the parameter values while reducing the number of parameter variations.
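Monte Carlo perturbation over such expert ranges can be sketched as follows. The paper specifies (base, min, max) triples but not a sampling distribution; the triangular distribution with its mode at the base value is an illustrative assumption here, and the two parameter entries are hypothetical:

```python
import random

# Monte Carlo perturbation of expert-ranged parameters. The triangular
# distribution (mode at the base value) is an assumed choice for
# illustration, and the parameter entries are hypothetical.
params = {
    "sip_wall_r":  {"base": 3.7,  "min": 3.0,  "max": 4.5},
    "window_shgc": {"base": 0.25, "min": 0.15, "max": 0.40},
}

def sample_model(params, rng):
    """Draw one candidate model, each parameter within its expert range."""
    return {
        name: rng.triangular(p["min"], p["max"], p["base"])
        for name, p in params.items()
    }

rng = random.Random(42)   # seeded for repeatability
candidates = [sample_model(params, rng) for _ in range(1000)]
```

Sampling from expert-specified ranges rather than a fixed percentage variation keeps every candidate physically plausible while still covering the uncertainty in each parameter.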

    Commercial Building Simulations

While the residential application allows connection to real-world sensor data and tests for practical deployment decisions, the commercial buildings were chosen to allow a cleaner approach to the testing of multiple autotuning methodologies. DOE’s reference buildings for warehouse, medium office, and stand-alone retail were selected due to their predominance in either number of buildings or square footage in the US. In an approach similar to signal processing, we have made changes to the original models to create 3 “golden” models, added noise by permuting random variables to create 3 “de-tuned” models, and then used internal E+ variables from simulation runs of the golden models as “sensor data” for tuning the “de-tuned” models back to the “golden” models.

Figure 7: Illustration of the database on the server, the user views of the data, and remote client methods for accessing the E+ simulation data.

The warehouse, retail, and office golden models have been defined to use approximately 5%, 10%, and 20% more electrical energy than the original models, respectively. These changes were created using overlapping subsets of input variables and show, in agreement with previous sensitivity analysis studies, that small changes add up quickly.

Open Research Buildings Database

In order to deploy Autotune as a desktop program in which only a limited number of E+ simulations can be run, several mechanisms are required to speed up the process. In addition to the application of machine learning, pre-computing E+ simulations using supercomputers is necessary to explore the combinatorial search space of E+ input parameters. Time on several supercomputers has been competitively awarded or used to demonstrate the ability to scale software and algorithms for these resources. Systems include the 1024-core shared-memory Nautilus, the 2048-core Frost, and the 224,256-core Jaguar, which is currently the 3rd fastest supercomputer in the world at 2.3 petaflops and is in transition to become the 299,008-core Titan. Frost is being used as a staging area to verify large computational parameter sweeps before running on Jaguar, and both are used primarily for embarrassingly parallel, compute-bound E+ simulation jobs. Nautilus’s unique shared-memory architecture allows every core to access its 4 TB (terabytes) of Random Access Memory (RAM) for processing the memory-bound jobs common in machine learning.

Figure 8: Screenshot of EplusCleaner showing desktop client upload of simulations.

The parametric simulations run by desktops and supercomputers have been uploaded to a centralized database to allow public access to this data (Figure 6). It is anticipated that this data will be of interest to researchers at universities, data-mining experts, entrepreneurs, industry, and several other organizations for a myriad of purposes. Several tools have been developed for easily defining and running parametric E+ simulations, compressing data, and sending it to a centralized server. In addition, several methods have been made available for the general public to freely access and use this data (Figure 7).

The data storage and access software has been architected as a distributed, heterogeneous, client-server framework. The embarrassingly parallel nature of the independent simulations allows us to exploit computational resources that are remote and disparate, leading to an architecture capable of collecting simulation results from individual desktop systems as well as supercomputing resources. Our experiments indicate that the system functions efficiently and is bound primarily by the network bandwidth connecting the resources or by local hard disk access. The database engine currently in use is the MyISAM relational MySQL database, although the tools have been designed in a general manner so as to allow easy interchange as database storage technologies continue to evolve. The database has been created in a manner that allows data compression and efficient retrieval. Data access patterns are being studied to allow re-architecting the database and load-balancing for higher efficiency. The internal data storage format is not tied to the format of input or output E+ variables but instead uses its own generic internal naming scheme. Depending on the current set of variables and the preferences of the users, a custom view of the data is provided that can be easily queried, summarized, and analyzed, providing the full benefits of a relational database system. Figure 7 shows the various software components of the Autotune database, illustrating the independence of the data storage mechanism from the user view of the data, the software components on the server, and the remote web-based clients. There are several methods for accessing the data: a web-based IDF reconstructor, command-line access for MySQL queries, phpMyAdmin for GUI-based data interaction, a webpage for uploading simulation data, and EplusCleaner.

One of the components of this framework is an application named EplusCleaner (Figure 8), which has been developed using the Qt Software Development Kit (SDK) (Nokia) for platform-independent support. It has been architected to provide a powerful and intuitive interface capable of cleaning up after E+ simulation runs, compressing the input IDF and the E+ output, and sending them to the server for database entry while waiting on the server for success/error status messages. It concurrently, continually, and remotely consolidates the parametric simulation data. Options for compression or deletion on the local machine keep E+ from flooding the local hard drive with stored simulation results. The EplusCleaner client was run simultaneously on several machines and exhibited proper cleaning, compressing, and parsing with no observable slow-down in the simulation process, indicating a bottleneck at the client machines’ access to the local hard drive. The database server keeps track of submissions from clients and a comprehensive log of the data provenance such that back-traces for troubleshooting may be performed if necessary. Upon receiving a compressed unit of E+ data, the server decompresses the data, creates a vector representation of the input, and commits the entire unit to the database.

A web-based method for reconstructing an IDF file from the database vector is provided, which allows users to retrieve an IDF file from a stored vectorized set of input parameters. A web interface is also available for uploading external E+ simulation input and output files to the database. External access to this database can be provided upon request using several user validation and access methods, including a command-line interface and password-protected phpMyAdmin for interactive queries, drill-down, and analysis of the simulation database. As of the time of this writing, the server hosts tens of thousands of parametric E+ simulations in 136 GB from multiple distributed workstations and supercomputers, but millions of simulations (trillions of data points) are anticipated by the time of publication. The latest Autotune project information, including database size and access methods, can be found at http://autotune.roofcalc.com.

CONCLUSION

    A heavily instrumented residential building has been selected to leverage intense submetering for the autotuning process while eliminating variability due to occupant behavior through emulated occupancy. An E+ model of this house has been iteratively refined by experts. Experts have identified 156 input parameters to be varied, with min, max, and step-sizes for the undetermined properties.
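    Each min/max/step specification defines a discrete set of candidate values for one input parameter. A helper like the following could enumerate that set (an illustrative sketch only; the function name is an assumption):

```python
def parameter_levels(min_val, max_val, step):
    """Enumerate the discrete values a single input parameter may
    take, from min to max inclusive, in fixed step increments."""
    levels = []
    v = min_val
    # Small epsilon guards against floating-point drift past max.
    while v <= max_val + 1e-9:
        levels.append(round(v, 9))
        v += step
    return levels
```

    The Cartesian product of such level sets over all varied parameters defines the (very large) space that the Monte Carlo sampling draws from.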

    DOE’s reference buildings for warehouse, stand-alone retail, and medium office have been selected for creating “golden” models that use 5%, 10%, and 20% more electrical energy, respectively. “De-tuned” models have been created by permuting an undisclosed number of overlapping subsets of E+ input parameters. E+ variables from runs of the “golden” models will be used for autotuning the “de-tuned” models back to the “golden” models.
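    A de-tuned model of this kind can be imagined as a copy of the golden model in which a random subset of inputs has been perturbed. The sketch below shows one way such a perturbation could work; the parameter names, fraction, and perturbation scheme are hypothetical, since the project's actual de-tuning procedure is undisclosed.

```python
import random

def detune(params, fraction=0.25, rel_change=0.10, seed=None):
    """Create a "de-tuned" copy of a model's input parameters by
    perturbing a random subset of them by up to +/- rel_change.
    params: dict mapping parameter name -> numeric value."""
    rng = random.Random(seed)
    detuned = dict(params)
    subset = rng.sample(sorted(params), max(1, int(len(params) * fraction)))
    for name in subset:
        # Multiply by a random factor within the allowed band.
        factor = 1.0 + rng.uniform(-rel_change, rel_change)
        detuned[name] = params[name] * factor
    return detuned
```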

    The database and submission/retrieval software tools for Autotune have been developed with generalizability and scalability in mind. Capabilities developed include a platform-independent Qt application named EplusCleaner for continual curation, compression, and upload of simulation data to a centralized MyISAM server storing tens of thousands of E+ parametric simulations, with many mechanisms allowing public access. The software system is distributed, heterogeneous, and scalable, and could potentially evolve into a full-fledged simulation, curation, and data-assimilation framework.

ACKNOWLEDGMENT

    This work was funded by field work proposal CEBT105 under the Department of Energy Building Technology Activity Number BT0201000. We would like to thank Amir Roth for his support and review of this project. We would also like to thank our collaborators, who include Mr. Richard Edwards and Dr. Lynne Parker from The University of Tennessee, Dr. Aaron Garrett from Jacksonville State University, and Mr. Buzz Karpay from Karpay Associates. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Our work has been enabled and supported by data analysis and visualization experts, with special thanks to Pragnesh Patel, at the NSF-funded RDAV (Remote Data Analysis and Visualization) Center of the University of Tennessee, Knoxville (NSF grant nos. ARRA-NSF-OCI-0906324 and NSF-OCI-1136246).

    Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Dept. of Energy under contract DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under Contract Number DE-AC05-00OR22725 with the U.S. Department of Energy.

    The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

REFERENCES

    S. Bradley, A. Hax, and T. Magnanti. Applied mathematical programming. Addison Wesley, 1977.

    R.S. Briggs, R.G. Lucas, and Z.T. Taylor. Climate classification for building energy codes and standards: Part 1–development process. ASHRAE Transactions, 109(1):109–121, 2003a.

    R.S. Briggs, R.G. Lucas, and Z.T. Taylor. Climate classification for building energy codes and standards: Part 2–zone definitions, maps, and comparisons. ASHRAE Transactions, 109(1):122–130, 2003b.

    California Energy Commission. Assembly bill no. 1103, commercial building energy use disclosure program. Docket num 09-ab1103-1, 2010a. URL http://www.energy.ca.gov/ab1103/.

    California Energy Commission. Assembly bill no. 758,2010b. URL http://www.energy.ca.gov/ab758/.

    C. Christensen, R. Anderson, S. Horowitz, A. Courtney, and J. Spencer. BEopt software for building energy optimization: features and capabilities. US DOE EERE BTP, National Renewable Energy Laboratory, 2006. URL http://www.nrel.gov/buildings/pdfs/39929.pdf.

    D.B. Crawley, J.W. Hand, M. Kummert, and B.T. Griffith. Contrasting the capabilities of building energy performance simulation programs. Building and Environment, 43(4):661–673, 2008.

    M. Deru, K. Field, D. Studer, K. Benne, B. Griffith, P. Torcellini, B. Liu, M. Halverson, D. Winiarski, M. Rosenberg, et al. US Department of Energy commercial reference building models of the national building stock. 2011.

    M.S. DeWit. Uncertainty in predictions of thermal comfort in buildings. PhD thesis, Technische Universiteit Delft, Netherlands, 2001.

    Richard E. Edwards, Joshua New, and Lynne E. Parker. Predicting Future Hourly Residential Electrical Consumption: A Machine Learning Case Study. Energy and Buildings, in press, 2012.

    K. Field, M. Deru, and D. Studer. Using DOE commercial reference buildings for simulation studies. SimBuild, New York, August 11–13, 2010.

    R. Hendron and C. Engebrecht. Building America Research Benchmark Definition. Technical report, NREL/TP-550-47246, National Renewable Energy Laboratory, Golden, CO, 2010.

    C.J. Hopfe and J.L.M. Hensen. Uncertainty analysis in building performance simulation for design support. Energy and Buildings, 43:2798–2805, 2011.

    Y.J. Huang, R. Ritschard, J. Bull, S. Byrne, I. Turiel, D. Wilson, C. Hsui, and D. Foley. Methodology and assumptions for evaluating heating and cooling energy requirements in new single-family residential buildings: technical support document for the PEAR (Program for Energy Analysis of Residences) microcomputer program. Technical report, Lawrence Berkeley National Laboratory, CA, 1987.

    IECC 2009 and ASHRAE 90.1-2007. Energy Code Climate Zones. URL http://resourcecenter.pnl.gov/cocoon/morf/ResourceCenter/article/1420.

    J.K. Kissock, J.S. Haberl, and D.E. Claridge. Inverse Modeling Toolkit: Numerical Algorithms. ASHRAE Transactions, 109(2), 2003.

    W. Miller, J. Kośny, S. Shrestha, J. Christian, A. Karagiozis, C. Kohler, and D. Dinse. Advanced residential envelopes for two pair of energy-saver homes. Proceedings of ACEEE Summer Study on Energy Efficiency in Buildings, 2010.

    Evan Mills. The Home Energy Saver: Documentation of Calculation Methodology, Input Data, and Infrastructure. Lawrence Berkeley National Laboratory, 2008. URL http://evanmills.lbl.gov/pubs/pdf/home-energy-saver.pdf.

    Nokia. Qt 4.7 Software Development Kit. URL http://qt.nokia.com/products/qt-sdk.

    Z. O’Neill, B. Eisenhower, S. Yuan, T. Bailey, S. Narayanan, and V. Fonoberov. Modeling and Calibration of Energy Models for a DoD Building. ASHRAE Transactions, 117(2):358, 2011.

    T.A. Reddy, I. Maor, S. Jian, and C. Panjapornporn. Procedures for reconciling computer-calculated results with measured energy data. ASHRAE Research Project RP-1051, 2006.

    U.S. Dept. of Energy. Building Energy Data Book. D&R International, Ltd., 2010. URL http://buildingsdatabook.eren.doe.gov/docs%5CDataBooks%5C2008_BEDB_Updated.pdf.

    U.S. EIA. Annual Energy Review. United States Energy Information Administration, August 2009.