

Future Generation Computer Systems 18 (2002) 807–812

A time cost model for distributed objects parallel computation

R. Shevchenko a,b, A. Doroshenko a,*

a Institute of Software Systems of NASU, Academician Glushkov prosp. 40, 13187 Kyiv, Ukraine

b Gradsoft, Kyiv, Ukraine

Abstract

A time cost model for parallel computation in CORBA-distributed objects is introduced and a methodology for enhancing performance of distributed applications is proposed. A new four-tiered architecture, as opposed to the traditional three-tiered one, is derived from the constructed cost model for Internet distributed applications. © 2002 Elsevier Science B.V. All rights reserved.

MSC: 68Q10; 68Q20; 68Q25

Keywords: Distributed objects; Parallel computation; CORBA

1. Introduction

The widespread emergence of computer networks, Intranet technologies and universally distributed Internet services implies essential shifts in the basic paradigms of software systems design. Essentially, these shifts are characterised by ever-growing demands for performance and ease of programming of distributed applications. These fundamental changes have led to the development of whole classes of middleware software architectures, and CORBA (Common Object Request Broker Architecture) is one of the most distinguished among them.

The CORBA standard determines the architecture of distributed objects and the interaction between them in heterogeneous networks [4]. It provides a way of organising distributed computation that has a number of properties attractive to a designer, such as a precise object model, separation of an object's description from its implementation, and call transparency.

* Corresponding author. E-mail addresses: [email protected] (R. Shevchenko), [email protected] (A. Doroshenko). URL: http://www.gradsoft.com.ua/eng/

A number of works investigate performance issues in CORBA and propose methods to improve program efficiency [3,8–10]. However, to our knowledge, these efforts fall short of constructing quantitative performance models and development methods for CORBA application design.

The purpose of this paper is to give a model and methodology for building high-performance distributed CORBA applications suitable for practical use in industrial-strength software. The work relies on our experience of research and development of a CORBA-based enterprise distributed software system [7]. We give new insight into methods of enhancing parallelism and optimising performance of distributed object-oriented computing, and illustrate the presentation with experimental performance measurements of sample applications. The paper concludes with a proposal of a new four-tiered architecture for Internet-based distributed applications, which enriches the traditional three-tiered one with an extra logic component aimed at enhancing system performance by means of various methods for minimising object interaction time.

0167-739X/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-739X(02)00053-5


2. Time cost model of CORBA objects interaction

General prerequisites of CORBA interactions are specified by the OMA architecture [5]. It defines the General Inter-ORB Protocol (GIOP) as the protocol for data transfer between object brokers. This architecture establishes the following basic constraints on client–server interactions:

• An interaction establishes a permanent connection between the server and the client for the duration of request processing; multiple requests can be multiplexed over the same connection.

• Method invocation is synchronous, i.e., once a client thread has executed a remote method invocation, it is blocked until the reply arrives. Asynchronous computation can be built on another (parallel) thread that communicates with a server thread by means of synchronous method invocation. (In this paper we do not use the new CORBA AMI feature, which provides an asynchronous method invocation interface; this is a subject for another analysis.)

• Implementation of the remote method may require sending some context needed for correct execution of the method, e.g. the identifier of the current transaction or information about the codeset used in the current session.

• The stages of request processing and their order are predefined.

Our analysis of the time costs of request brokers has shown that the performance of a broker depends mainly on the following functions, which underlie the stages of request processing.

1. Marshalling (demarshalling) function $M(x)$ ($Dm(x)$), which implements the coding (decoding) stage of processing a request $x$. These functions are almost additive in space: $M(x|y) = M(x)|M(y)|\mathrm{pad}(x, y)$, where $x|y$ is concatenation and $\mathrm{pad}(x, y)$ is the number of bytes needed to align $y$ after $x$. They are also almost linear in time: $T_M(x|y) = T_M(x) + T_M(y) + \delta(x, y)$, where $T_M(x)$ is the time for coding $x$ and $\delta(x, y)$ is negligible with respect to $T_M(x)$. Note that the size in bytes of the corresponding GIOP sequence, $|M(x)| = K_{SM}|x|$, can be considered proportional to the size of the request $|x|$ with some coefficient $K_{SM}$. The times for coding and decoding are considered nearly equal. The coefficients $K_{SM}$ do not depend on the broker (ORB) and are fully defined by the type of request and the coding used (usually GIOP). $T_M(x)$ depends on the quality of the ORB marshalling algorithm.

2. Object search function. The main parameter of this function depends on the number of objects supported in the system. The time cost of invoking this function can therefore be designated as $T_{Fo}(o, N_o)$, where $N_o$ is the size of the object table in the system.

3. Method search function, which stands for searches in the table of methods, with time cost designated as $T_{Fm}(m, N_m(o))$, where $m$ is a method and $N_m(o)$ is the size of the table of remote methods of the object.

4. Object activation and method invocation function, with time estimated as $T_I(o, m)$, which includes the time for servant invocation and, if needed, initialisation of its thread.

5. Network data transfer function, characterised by the mean value $K_s$ of the transfer time for one byte.
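The almost-additive size property in item 1 can be made concrete with a small sketch. It assumes CDR-style alignment, in which each primitive value starts at an offset that is a multiple of its own size; the names Field, pad and marshalled_size are hypothetical and do not belong to any ORB API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one primitive field: its size in bytes,
// which under the assumed CDR-style rule is also its alignment.
struct Field {
    std::size_t size;  // 1, 2, 4 or 8
};

// Padding bytes needed so that a value with the given alignment
// starts at a multiple of that alignment.
std::size_t pad(std::size_t offset, std::size_t alignment) {
    assert(alignment > 0);
    return (alignment - offset % alignment) % alignment;
}

// Size of the encoded stream when the fields are written one after
// another, starting at offset `start` within the message.
std::size_t marshalled_size(const std::vector<Field>& fields,
                            std::size_t start = 0) {
    std::size_t offset = start;
    for (const Field& f : fields) {
        offset += pad(offset, f.size);  // align the next value
        offset += f.size;               // then write it
    }
    return offset - start;
}
```

For example, a one-byte field followed by an eight-byte field occupies 16 bytes: 1 + 8 bytes of data plus 7 bytes of padding, which plays the role of the pad(x, y) term in the text.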

Consider a piece of program code y = o.m(x) that propagates a context $c$, where $m$ is a method of the remote object $o$ with input parameter $x$ and output $y$. The time cost estimate for this basic piece of distributed application code includes the following constituents, expressed in terms of the functions introduced above ($\mathrm{req}(o, m, x, c)$ denotes the request sent for the o.m(x) operation). The request coding time is $T_M(\mathrm{req}(o, m, x, c)) \approx T_M(o|m|x|c) \approx K_M \times (|o| + |m| + |x| + |c|)$. The request transfer time is $T_s(\mathrm{req}(o, m, x, c)) \approx K'_s \times (|o| + |m| + |x| + |c|)$. The request decoding time is $T_{DM}(o, m, x, c) \approx K_M \times (|o| + |m| + |x| + |c|)$. The time to find the object in the active object map, activate it, invoke the method and evaluate the request is $T_{Fo}(o, N_o) + T_{Fm}(m, N_m(o)) + T_I(o, m)$. The reply transfer time is $T_s(\mathrm{reply}(y)) \approx T_s(y) \approx K'_s \times (|y| + |c|)$. The reply decoding time is $T_{DM}(y, c) \approx K_M \times (|y| + |c|)$.

Summarising these time estimates, we can deduce the following time cost model of a CORBA remote method invocation:

$$T_{y=o.m(x)} = K_1 \times (|o| + |m| + |x| + |y|) + 2K_1 \times |c| + T_{Fo}(o, N_o) + T_{Fm}(m, N_m(o)) + T_I(o, m).$$
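As a rough illustration, the model can be evaluated numerically. The sketch below is only a direct transcription of the formula under the assumption that all times share one unit; the struct and function names (CostParams, RequestSizes, invocation_time) are hypothetical and introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter bundle for the model; all times use one unit.
struct CostParams {
    double k1;           // lumped per-byte coefficient K_1
    double t_find_obj;   // T_Fo(o, N_o), object lookup time
    double t_find_meth;  // T_Fm(m, N_m(o)), method lookup time
    double t_invoke;     // T_I(o, m), activation and invocation time
};

// Sizes in bytes of the parts of a request and its reply.
struct RequestSizes {
    std::size_t o, m, x, y, c;
};

// T = K_1 (|o| + |m| + |x| + |y|) + 2 K_1 |c| + T_Fo + T_Fm + T_I
double invocation_time(const CostParams& p, const RequestSizes& s) {
    assert(p.k1 >= 0.0);
    const double payload = static_cast<double>(s.o + s.m + s.x + s.y);
    const double context = static_cast<double>(s.c);  // sent both ways
    return p.k1 * payload + 2.0 * p.k1 * context
         + p.t_find_obj + p.t_find_meth + p.t_invoke;
}
```

With K_1 = 0.5, lookup and invocation times of 1, 2 and 3, and sizes |o| = 10, |m| = 4, |x| = 100, |y| = 50, |c| = 8, the function returns 96 time units; with empty x and y it returns 21, the cost of an empty invocation under the same parameters.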


3. Techniques to improve distributed applications performance

It is difficult, if not impossible, to give a uniform definition of performance suitable for all classes of distributed applications. In each case of interest it can be a particular set of criteria. Known criteria include: computing performance, the speed of operation execution estimated as the total time spent by the processor on the computation steps performed; application reactivity (responsiveness), the time interval between the user's input of data and the appearance of new information in the client application; and efficiency, the degree of processor time utilisation, measured as the share of actual computation time in the total time the task resides in the system. In this paper we consider timing characteristics to be the most crucial for the performance of most distributed applications.

For lack of space we briefly present a few examples of our techniques for improving performance characteristics, followed by a sample application illustrating their effect. Basically, the techniques rely on time optimisation of service interface design.

1. Using composite operations. CORBA remote and local method invocations look identical from the application programmer's point of view, but their time costs differ: in the local case the cost is $T_I$ (time of invocation), whereas in the remote case it can be dominated by $T_s$ (time of network data transfer). To reduce the overhead, multiple nonlocal method invocations can be aggregated into a composite one. For two subsequent invocations on a remote object, y1 = o.req1(x1) and y2 = o.req2(x2), we can save a time of $K_1|o| + K_2|c| + T_{Fo}(N_o) + T_{Fm}(\mathrm{req}_{12}, N_m + 1) + \Delta T_I(o, (\mathrm{req}_{12}, \mathrm{req}_1 + \mathrm{req}_2))$, which is equal to the cost of an empty remote invocation void f(void). This transformation can improve all kinds of timing characteristics of performance. Naturally, the effect of a composite operation is more significant when multiple invocations are coupled, as in the branch operation if (o.m1(x)) o.m2(x); else o.m3(x). The optimisation is most effective for loop constructs such as for (ULong i = 0; i < x.length(); ++i) r[i] = o.m(x[i]);. Examples of this technique can be found in the CORBA collection services [6]. As our model shows, this transformation is recommended if $T_s$ and $T_M$ are the main constituents of the application cost model. In implementation, the optimisation is realised as a server-side composite operation equivalent to the given sequence of method invocations on the remote object.

2. Nonblocking execution of coarse-grained computation in parallel threads. This transformation is applied primarily when it is necessary to increase the responsiveness of the application program. For example, suppose we have the client source code y = o.m(x); F; ShowY(y), where the time-consuming operation m is carried out on a server, the following piece of code F is data independent of y, and ShowY(y) is the closest operation that needs the computed value of y. Then it is reasonable not to block the client program and to transform the code by: (1) replacing the y = o.m(x) statement with the start of an equivalent operation in a parallel client thread in which the method m is actually invoked; and (2) inserting a wait-for statement just before ShowY(y) to protect the variable y from being read too early. This example exposes the simplest case of static source code transformation based on local analysis of the data independence of program statements. Advanced models with extended implications for parallelism extraction have been developed by the authors in [1].

3. Customisation of marshalling. The cost of network transfer can be decreased by replacing GIOP marshalling with a customised one with more efficient characteristics, by supplying an adapter library for coding and decoding custom-marshalled byte streams. Custom marshalling can be more efficient than GIOP because it can exploit the known structure of the transferred data. Note that it is still possible to use the CORBA network data transfer layer by encapsulating the marshalled data stream in the CORBA type sequence<octet>. This technique can also be applied in the case of object collocation [5], where we can simply skip the marshalling/demarshalling stages.

We developed our own stream format, called RC-stream, for passing relatively large data sets of known structure through a low-speed network. Adapters for writing to and reading from an RC-stream are available to the application programmer. If we denote the difference in marshalling algorithm speed as $\Delta K_M$ and the difference in the multiplier of marshalled data size as $\Delta K_t$, then the difference in execution time of two identical requests with two different sets of marshalling constants is $(\Delta K_M + \Delta K_t \cdot K'_s)(|o| + |m| + |x| + |y| + 2|c|)$. So in the ideal case the parameters of the marshalling algorithm must depend on the data transfer speed. If we increase the marshalling time by $\Delta K_M$, then the corresponding reduction in request size must be more than $\Delta K_M / K_s$, where $K_s$ is the per-byte transfer time of the communication channel. It follows that custom marshalling is useful in low-speed network environments such as the Internet.

4. Elimination of metainformation. CORBA provides rich facilities for building high-level general schemes of object interaction based on general design patterns. But exploiting them usually means expensive use of metainformation, such as passing Any-type objects with type codes or using the Interface Repository at run time. Metainformation transfer leads to significant overhead. So it is desirable to avoid high-level generic components in performance-critical subsystems and instead to use specialisations of the general schemes in which all information about object types is static, all calls to extra interfaces are known at compile time and all parameter types are concrete.

Fig. 1. Time measurement of query processing experiments.
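Returning to technique 1, the effect of a composite operation on the loop construct can be sketched with a mock object. This is an illustration only: MockRemote, m_all and call_one_by_one are hypothetical names, no real ORB is involved, and each public call is simply counted as one network round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a remote object: every public call is
// counted as one network round trip.
class MockRemote {
public:
    // Fine-grained operation: one round trip per element.
    long m(long x) {
        ++round_trips_;
        return compute(x);
    }

    // Composite operation: the loop runs on the server side, so the
    // whole sequence costs a single round trip.
    std::vector<long> m_all(const std::vector<long>& xs) {
        ++round_trips_;
        std::vector<long> r;
        r.reserve(xs.size());
        for (long x : xs) r.push_back(compute(x));
        assert(r.size() == xs.size());
        return r;
    }

    std::size_t round_trips() const { return round_trips_; }

private:
    // Placeholder for the real server-side work.
    static long compute(long x) { return x * x; }
    std::size_t round_trips_ = 0;
};

// Client-side loop from the text: r[i] = o.m(x[i]).
std::vector<long> call_one_by_one(MockRemote& o,
                                  const std::vector<long>& xs) {
    std::vector<long> r(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) r[i] = o.m(xs[i]);
    return r;
}
```

For a sequence of four elements, the element-by-element loop makes four round trips while the composite call makes one, and both produce the same results. In terms of the cost model, the per-invocation overhead is paid once instead of once per element.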

To demonstrate the effect of our optimisation techniques, some experiments were undertaken on a 10 Mb/s LAN, processing a sample SQL read request to a database consisting of 10,000 records. Identical front-end clients and a few different implementations of the CORBA middle-layer server were tested. The request was coded in different languages, with and without RC coding, and with and without collocation on one machine. The number of records retrieved during one remote method invocation was also varied. The following combinations of these options were tested (shown in Fig. 1):

1. Server (C++), client (C++), server and client are collocated in one address space on a single computer, the sequence of records is marshalled with GIOP coding.

2. Server (C++), client (C++), server and client are collocated in one address space on a single computer, the sequence of records is marshalled with RC coding.

3. Server (C++), client (C++), server and client are not collocated, invocations are executed on a single machine via the LAN interface, the sequence of records is marshalled with GIOP coding.

4. Server (C++), client (C++), server and client are not collocated, invocations are executed on a single machine via the LAN interface, the sequence of records is marshalled with RC coding.

5. Server (C++), client (C++), calls are executed via LAN, GIOP coding is used.

6. Server (C++), client (C++), calls are executed via LAN, RC coding is used.

7. Server (C++), client (Java), calls are executed via LAN, GIOP coding is used.

8. Server (C++), client (Java), calls are executed via LAN, RC coding is used.

Results of time measurement, with a Sun Enterprise 450 under Solaris 2.6 and an Oracle database acting as the server and a Pentium 300 under Windows NT acting as the client, are shown in Fig. 1. The X-axis stands for the number of records passed in one remote invocation and the Y-axis stands for the time in milliseconds spent on processing requests.

4. Four-tiered architecture for Internet applications

What distinguishes Intranet from Internet applications is the cost of network data transfer: 10–100 Mb/s for a LAN and 1–10 Kb/s for the Internet. Thus a good design of an Internet application implies minimisation of the network data transfer time $T_s$, while for an Intranet application the time of invocation $T_I$ can be more critical. One consequence for architecture design decisions is to insert an additional software layer that collects data and passes it on in large pieces, in order to speed up the overall performance of the Internet application. Such an Internet architecture with four layers (database, logic, server front-end, client) can be more efficient than the traditional three-tiered one consisting of database, server and clients. Suppose that organising the data in a single chunk requires processing $N$ requests with method invocations of approximately equal time complexity.

For the standard three-tiered architecture we then have the following time cost evaluation:

$$T_N(y = o.m(x)) = N K_1 (\mathrm{const} + |x| + |y|) + N T^L_{y=o.m(x)}.$$

With the extra layer of logic we have a heterogeneous medium, with transfer factors $K_1$ for the external medium (Internet) and $K_1^*$ for the internal one (LAN). If a WWW servlet executes all invocations in the LAN environment, collecting all needed parameters with additional information and sending them to the remote browser in a single chunk, then we obtain

$$T^*_N(o.m(x)) = K_1 (\mathrm{const} + N|x| + N|y| + |z|) + N K_1^* (\mathrm{const} + |x| + |y|) + N T^L_{y=o.m(x)} + T^L_z,$$

where $|z|$ is the size of the additional information added by the servlet and $T^L_z$ is the overhead due to servlet invocation. The difference is

$$T - T^* = K_1 ((N - 1)\,\mathrm{const} - |z|) - N K_1^* (\mathrm{const} + |x| + |y|) - T^L_z.$$

Considering that $K_1^* \ll K_1$, with a difference of about three orders of magnitude, we can conclude that the LAN expenses are usually negligible with respect to the time of network transfer. So the benefit $T - T^*$ is surely positive and can be significant if the size of the additional data is not enormous, $|z| < (|x| + |y|) \cdot N$, and if the time of operation is determined mostly by the time of network data transfer.
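The comparison can be checked numerically with a short sketch that transcribes the two cost expressions. The struct TierParams and the function names are hypothetical, and the parameter values in the note that follows are arbitrary illustrative numbers, not measurements from the experiments.

```cpp
#include <cassert>

// Hypothetical parameter bundle; all times share one unit.
struct TierParams {
    double k1;         // per-byte cost on the external medium (Internet), K_1
    double k1_lan;     // per-byte cost on the internal medium (LAN), K_1^*
    double overhead;   // fixed per-request size term, "const" in the text
    double x, y;       // request and reply sizes |x| and |y| in bytes
    double z;          // size |z| of extra data added by the servlet
    double t_local;    // per-invocation processing time T^L
    double t_servlet;  // servlet invocation overhead T^L_z
};

// Three tiers: every one of the n requests crosses the Internet.
double three_tier_time(const TierParams& p, int n) {
    assert(n > 0);
    return n * p.k1 * (p.overhead + p.x + p.y) + n * p.t_local;
}

// Four tiers: n requests run over the LAN, one chunk crosses the Internet.
double four_tier_time(const TierParams& p, int n) {
    assert(n > 0);
    return p.k1 * (p.overhead + n * p.x + n * p.y + p.z)
         + n * p.k1_lan * (p.overhead + p.x + p.y)
         + n * p.t_local + p.t_servlet;
}

// Benefit T - T* of inserting the extra logic layer.
double benefit(const TierParams& p, int n) {
    return three_tier_time(p, n) - four_tier_time(p, n);
}
```

With K_1 = 1, K_1^* = 1/1024, const = 64, |x| = |y| = 32, |z| = 128, T^L = 2, T^L_z = 4 and N = 10, the sketch gives 1300 for three tiers and 857.25 for four, a benefit of 442.75. With N = 1 the benefit is negative, since a single request gains nothing from being collected into a chunk.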

5. Conclusion

We have presented a time cost model for the performance of CORBA-distributed applications and proposed techniques for enhancing parallelism and optimising performance of distributed programs. This paper is inspired by practical experience of a CORBA-based industrial distributed software design project undertaken at GradSoft (Kiev, Ukraine). The performance measurements of experimental applications reported in this paper show that the model and techniques are a good basis for software architecture development tools in support of high-performance parallel and distributed computing [2]. In particular, a new four-tiered architecture, as opposed to the traditional three-tiered one, is proposed for Internet distributed applications.

References

[1] A. Doroshenko, Modelling synchronization and communication abstractions for dynamical parallelization, in: High-Performance Computing and Networking, Proceedings of the International Conference, Lecture Notes in Computer Science, Vol. 1225, Vienna, Austria, April 1997, pp. 752–761.

[2] A. Doroshenko, L.-E. Thorelli, V. Vlassov, Coordination models and facilities could be parallel software accelerators, in: High Performance Computing and Networking, Proceedings of the International Conference, Lecture Notes in Computer Science, Vol. 1593, 1999, pp. 1219–1222.

[3] K. Maad, Efficient bulk transfers over CORBA, Uppsala University, Sweden, 1997. http://www.docs.uu.se/kmaad/streams.ps.

[4] Object Management Group, formal/98-12-01, The common object request broker: architecture and specifications, CORBA/IIOP 2.3.1, 712 pp. ftp://ftp.omg.org/pub/formal/98-12-01.pdf.

[5] Object Management Group, OMA: a discussion of the object management architecture, January 1997, 44 pp. www.omg.org/library/oma/oma-all.pdf.

[6] Object Management Group, formal/97-12-24, CORBA services: collection service specification. ftp://ftp.omg.org/pub/formal/97-12-09.pdf.

[7] R. Shevchenko, Analysis of efficiency enhancing methods of CORBA based distributed applications, in: Proceedings of the Second International Conference on Programming, May 23–26, 2000, Kiev, Ukraine, pp. 226–240 (in Russian).

[8] D.C. Schmidt, T. Harrison, Evaluating the performance of OO network programming toolkits, C++ Report, SIGS, Vol. 8, No. 7, July/August 1996, 8 pp. http://www.cs.wustl.edu/schmidt/C++-report-doc-perf.ps.gz.

[9] S. Vinoski, Collocation optimizations for CORBA, C++ Report, SIGS, Vol. 11, No. 9, October 1999.

[10] A. Vogel, Efficient data transfer with CORBA, Java Report, Vol. 8, 1998. http://archive.javareport.com/9808/html/features/archive9806/corbatalk.html.

R. Shevchenko graduated from the National Technical University of Ukraine "Kyiv Polytechnical Institute" in 1999. Currently, he is a PhD student at the Institute of Software Systems of the National Academy of Sciences of Ukraine and Chief Software Architect at Gradsoft Ltd. (Kyiv, Ukraine). Previously, he was engaged in projects on CORBA-based high-performance middleware development and corporate information systems. His current interests include software development methodologies, distributed software systems and software architecture design.

A. Doroshenko received his Master's degree in Computer Science in 1973 from T. Shevchenko Kyiv State University (Kyiv, Ukraine). He received his PhD and higher doctorate degrees in 1989 and 1997, respectively, both from the Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine (NASU). His current position is leading research scientist at the Institute of Software Systems of NASU and visiting professor of computer science at Kyiv-Mohyla University (Kyiv, Ukraine). His professional activity includes research and development of parallel programming methods for various multiprocessor platforms and teaching parallel computer systems at the University. He is the author of more than 60 technical papers in journals and international conference proceedings. Currently, he is engaged in a project investigating high-performance computation in metacomputing architectures. His current interests concentrate on models of parallel computation, parallel programming methodologies and coordination issues in distributed software systems.