A platform for the automatic generation of attribute evaluation hardware systems

Alexandros C. Dimopoulos, Christos Pavlatos, George Papakonstantinou

National Technical University of Athens, School of Electrical and Computer Engineering, Heroon Polytechniou 9, 15780 Zografou, Athens, Greece

Computer Languages, Systems & Structures 36 (2010) 203–222, doi:10.1016/j.cl.2009.09.003

Article history: Received 6 February 2009; received in revised form 27 August 2009; accepted 30 September 2009

Keywords: Attribute grammars; FPGA; Attribute evaluation; Hardware; Semantic evaluation

Abstract

Attribute grammars (AG) allow the addition of context-sensitive properties into context free grammars, augmenting their expressional capabilities by using syntactic and semantic notations, making them in this way a really useful tool for a considerable number of applications. AGs have extensively been utilized in applications such as artificial intelligence, structural pattern recognition, compiler construction and even text editing. Obviously, the performance of an attribute evaluation system resides in the efficiency of the syntactic and semantic subsystems. In this paper, a hardware architecture for an attribute evaluation system is presented, which is based on an efficient combinatorial implementation of Earley's parallel parsing algorithm for the syntax part of the attribute grammar. The semantic part is managed by a special purpose module that traverses the parse tree and evaluates the attributes based on a proposed stack-based approach. The entire system is described in Verilog HDL (hardware description language), in a template form such that, given the specification of an arbitrary attribute grammar, the HDL synthesizable source code of the system is produced on the fly by a proposed automated tool. The generated code has been simulated for validation, synthesized and tested on a Xilinx FPGA (field programmable gate array) board for various AGs. Our method increases the performance up to three orders of magnitude compared to previous approaches, depending on the implementation, the size of the grammar and the input string length. This makes it particularly appealing for applications where attribute evaluation is a crucial aspect, as in real-time and embedded systems. Specifically, a natural language interface is presented, based on a question-answering application from the area of airline flights.

© 2009 Elsevier Ltd. All rights reserved.

Corresponding author. Tel.: +30 6945779553. E-mail addresses: [email protected] (A.C. Dimopoulos), [email protected] (C. Pavlatos), [email protected] (G. Papakonstantinou).


1. Introduction

Some 40 years ago Knuth [1] extended context free grammars (CFG) by allowing the addition of context-sensitive properties, thus introducing the appealing formalism of attribute grammars (AG). Specifically, semantic rules and attributes were added to CFGs, augmenting their expressional capabilities and therefore offering many advantages in the domain of language specification, analysis and translation. AGs allow high-level context-sensitive properties of individual constructs in a language to be described in a declarative way and to be automatically computed for any program in the language. The primary field of AG usage is in computer languages [2], but they are also convenient in fields such as Artificial Intelligence [3,4], Pattern Recognition [5] or even Biomedicine [6]. Regardless of the field of application, in an AG, knowledge is represented using syntactic and semantic (attribute evaluation rules) notation. For the first (syntax) part, a parser is responsible for the recognition of the syntax and for the construction of the parse tree.


For the second part, that of the tree traversal and semantic evaluation, referred to in the following as tree decoration, an evaluator is needed. The evaluator traverses the parse tree, which is covered with the applicable rules at each node, and executes the corresponding actions, associating attribute values with each node. Typically, the values of an attribute can be defined in terms of the values of other attributes.

Attribute grammar evaluation is an operation that is usually divided into two subparts, the syntactic and the semantic. Concerning the syntactic analysis, one significant factor that can positively influence the performance of the CFG parser is undoubtedly the selected parsing algorithm. Two well-known parsing algorithms for general CFGs are Earley's algorithm [7] and the Cocke–Younger–Kasami (CYK) algorithm [8]. Both of them are basically dynamic programming procedures and have a time complexity of O(n³|G|), where n is the length of the input string and |G| is the size of the grammar. A closer look at the previously mentioned parsing algorithms shows that there are some strong connections [9] between the two. After the introduction of Earley's and CYK algorithms, several modifications [9] and improvements via parallelization [10-13] have been proposed for these algorithms. Chiang and Fu [10] and Cheng and Fu [11] have presented designs using VLSI arrays for the hardware implementation of the aforementioned parsing algorithms, although they do not propose an efficient implementation for the ⊗ operator they use, while Ibbara [12] and Ra [13] presented software implementations running on parallel machines. None of these approaches is implemented in reconfigurable hardware, and the scale of the hardware depends on the input string length. The hardware oriented approach was reinvigorated by implementations on reconfigurable FPGA (field programmable gate array) boards of the CYK algorithm [14,15] and Earley's algorithm [16]. The early architectures proposed in [14,15] either fail to fully exploit the available parallelization of the parsing algorithms or demand excessive storage, whereas the software approaches by Ibbara [12] and Ra [13] execute parts of the parsing algorithms sequentially and thus do not achieve the maximum possible speed-up. On the other hand, existing hardware methodologies must overcome the complexity imposed by the operations of the parsing algorithms, which leads to increased storage needs. In order to relax the hardware complexity, most of the proposed architectures implement the CYK algorithm, whose basic operations are much simpler than those of Earley's. The first FPGA implementation of Earley's algorithm was given in [16]. The approach proposed in [17] uses a combinatorial circuit for the fundamental ⊗ operator of Earley's algorithm. In this manner, a decrease in time by a factor of one to two orders of magnitude is achieved, compared to previous hardware implementations [16], depending on the size of the grammar and on the input string length. The speed-up, compared to the pure software implementation, varies from two orders of magnitude for toy-scale grammars to six orders of magnitude for large real-life grammars, a speed-up which is really important in embedded real-time systems.

Concerning the semantic analysis, there are various software approaches that, apart from syntactic analysis, also tackle attribute evaluation. A well-known approach is that of Yacc [18], which generates a parser based on an analytic grammar written in a notation similar to BNF. The class of grammars accepted is LALR(1)¹ grammars with disambiguating rules, and it is based on the S-attributed approach for AGs [19]. Another, more recent, similar approach is that of Bison [20], a general-purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar. Bison [20] is based on the L-attributed [19] approach for AGs. A combined variety of standard tools that implement compiler construction strategies in a domain-specific programming environment is called Eli [21,22]. For the attribute evaluation, Eli makes use of LIDO [23], a language for the specification of computations in trees that supports both inherited and synthesized attributes. Furthermore, at Utrecht University the UUAG tool [24] has been implemented, which decorates a given parse tree, omitting the task of syntactic analysis, and supports both inherited and synthesized attributes.

Aside from the software approaches, hardware oriented approaches were recently reinvigorated by implementations on reconfigurable FPGA boards; prior to that, VLSI approaches had been presented. In [25] a hardware implementation of a shape grammar was proposed. Nevertheless, there were two main limitations: the first was that the attributes were standard and only for the proposed attribute grammar, and the second was that the scale of the hardware was dependent on the size of the problem and the input string length. In 2005 Panagopoulos [26] presented a RISC microprocessor for attribute evaluation implemented on an FPGA. It was the first effort to design a specialized microprocessor for attribute grammar evaluation and to exploit its merits in performance and programming simplicity in knowledge engineering applications. It was semantically driven, and it supported dynamic parsing by exploiting tree-pruning techniques to increase efficiency and prevent the memory explosion problem of storing all possible parse trees, while giving all possible solutions (nondeterministic), i.e. it could provide all possible parse trees for a specific input string. The main limitation is that the implementation was based on Floyd's sequential parser [27] and therefore was not efficient enough, compared to the parsing algorithms of references [7,8]. Furthermore, the attribute evaluation was executed on an extended RISC microprocessor implemented on an FPGA, hence like a common software approach but running at a lower frequency, since FPGAs cannot achieve the high frequencies of current processors. The same year another approach was proposed [28] using Earley's parallel parsing algorithm [10] implemented in a hardware module [16] mapped on an FPGA, collaborating with an external RISC microprocessor responsible for the attribute evaluation.

¹ Look-Ahead, Left-to-right, Rightmost derivation parser with 1 lookahead symbol.

This implementation allowed faster attribute evaluation due to the hardware nature of the syntactic part and the efficiency of the algorithm. Moreover, the high frequency of the external RISC increased the performance further. Of course, due to the existence of an external RISC, the proposed architecture lacks portability and requires complex hardware. In 2006 [29] the same hardware implementation of the parsing algorithm [16] was used for the construction of an attribute evaluator, where the semantic analysis took place in two SoftCore microcontrollers (PicoBlaze) embedded on the same FPGA board as the parsing module. This architecture enhanced the portability compared to [28], but inevitably the performance was reduced due to the usage of these microcontrollers. Considering all the above, in order to construct an efficient system, special care had to be given to each of the two subparts. The proposed implementation in this paper contributes to different fields, as explained next.

1. For the syntactic part, an efficient parsing algorithm had to be chosen, capable of constructing the parse tree needed for the subsequent operation of attribute evaluation. In the case of ambiguous grammars, all possible parse trees should be constructed, so that our implementation is also applicable in the field of artificial intelligence. The chosen parser was the one presented in [17], which can handle any arbitrary CFG. However, that hardware implementation does not construct a parse tree but a parse table, where data for all trees that can possibly be derived from the input string recognized so far are stored at each execution step. So the original implementation had to be extended in order to create all possible parse trees, while of course keeping its combinatorial nature, which is responsible for the high performance, and in order to allow semantic evaluation.

2. For the semantic part, a different approach than those already mentioned is proposed. The attribute evaluation does not take place on an external nor on an internal SoftCore microprocessor, but on dedicated hardware modules. These modules are designed especially for the execution of the actions necessary to evaluate attributes, both inherited and synthesized, and thus are extremely effective. The underlying model for implementing these modules is a stack-based approach that achieves the attribute evaluation mainly by the usage of simple push and pop commands on specific stacks. Hence, the entire proposed system is implemented in special purpose hardware, increasing the efficiency; more specifically, all subparts are downloaded into the same FPGA board, eliminating unnecessary time-consuming communication between distant subparts as well as providing portability. The hardware nature of the implementation allows the rapid processing of numerous inputs that need to be served concurrently. To the best of our knowledge this is the first attempt to automatically generate special purpose hardware on an FPGA, given an arbitrary L-attributed AG.

3. The final and main goal of this paper is to present a platform (see Fig. 1) that takes as input the specifications of an L-attributed grammar [19] and automatically outputs the description of the parser in synthesizable Verilog HDL (hardware description language). In order to achieve this goal, we designed general purpose architectures for the three generated submodules (Extended Parser, Parse Tree Constructor and Semantic Evaluator) that are entirely independent of a specific grammar and are composed of general purpose components (memory, multiplexors, logic gates, etc.). Therefore, these architectures can be used as templates that can be modified to build the parser for a specific AG example. Consequently, the platform takes as input the AG written in an extended BNF form (BNF rules plus semantic rules) and extracts all the necessary parameters, such as the number of terminal symbols, the number of nonterminal symbols, etc. Using these parameters, the platform properly modifies the three templates in order to produce a hardware parser for the specific input AG, as will be explained in detail later.

Fig. 1. Overview of our approach.


The proposed architecture has been tested for various AGs and the outcome was more than encouraging: compared to software approaches the speed-up² is two to three orders of magnitude, while the execution time of our implementation is about one order of magnitude slower when the software runs on contemporary fast processors operating at extremely higher frequencies. Compared to previous hardware approaches the speed-up is one to three orders of magnitude, depending on the implementation, the size of the grammar and the input string length.

The rest of the paper is organized as follows. Section 2 provides the necessary theoretical background; in Section 3 an overview of our approach is presented, our theoretical enhancements are analyzed and the implementation details are described and explained via the usage of an illustrative example. In Section 4 the overall system performance is evaluated and compared to other approaches, while in Section 5 a natural language example is illustrated. Finally, the directions for future work are outlined in Section 6.

    2. Background

In order to provide a better understanding of the implemented system and make the paper self-contained, some extra background knowledge must be given. In the following section, the nature of AGs will be clarified and a brief description of the original CFG parser [17], prior to the extension, will be given. For more details on AGs the reader is encouraged to refer to [19], and for the parser to [17].

    2.1. Attribute grammars

An attribute grammar (AG) is based upon a context free grammar (CFG) and therefore the CFG definition follows. A CFG [2] is a quadruple G = {N, T, R, S}, where N is the set of nonterminal symbols, T is the set of terminal symbols, R is the set of grammar rules and S (S ∈ N) is the start symbol, the root of the grammar. V = N ∪ T is called the vocabulary of the grammar. Grammar rules are written in the form A → α, where A ∈ N and α is a string of terminals and nonterminals, α ∈ (N ∪ T)*. Capital letters A, B, C, ... denote nonterminal symbols, lowercase a, b, c, ... terminal symbols, Greek lowercase α, β, γ, ... strings of terminals and nonterminals, and λ is the null string. The notation A →* γ means that γ can be derived from A by applying zero or more times some rules of R. A nonterminal A is called nullable if A →* λ. A string w of terminals is a sentence of G if S →* w.

An AG is also a quadruple AG = {G, A, SR, d}, where G is a CFG and A = ∪ A(X), where A(X) is a finite set of attributes associated with each symbol X ∈ V. Each attribute represents a specific context-sensitive property of the corresponding symbol. The notation X.a is used to indicate that attribute a is an element of A(X). A(X) is partitioned into two disjoint sets: the set of synthesized attributes AS(X) and the set of inherited attributes AI(X). Synthesized attributes X.s are those whose values are defined in terms of attributes at descendant nodes of node X of the corresponding decorated parse tree. Inherited attributes X.i are those whose values are defined in terms of attributes at the parent and (possibly) the left sibling nodes of node X of the corresponding decorated parse tree. The start symbol does not have inherited attributes. Each of the productions p ∈ R, p : X0 → X1X2...Xn, of the CFG is augmented by a set of semantic rules SR(p) that define attributes in terms of other attributes of the symbols appearing in the same production. The way attributes are evaluated depends both on their dependencies on other attributes in the tree and on the way the tree is traversed. Finally, d is a function that gives for each attribute a its domain d(a).

Two important categories of AGs are S-attributed and L-attributed grammars [19]. S-attributed grammars are those having only synthesized attributes; they can be evaluated in one bottom-up pass and are thus a good match for LR parsing. L-attributed grammars are those that, apart from synthesized attributes, also support inherited attributes. Each inherited attribute of Xj (1 ≤ j ≤ n) on the right side of a rule A → X1X2...Xn depends only on attributes of X1,...,Xj-1 and on inherited attributes of A. L-attributed grammars can be evaluated in a single top-down, left-to-right pass. Obviously, S-attributed grammars are a subset of L-attributed grammars.
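As a toy illustration of this single-pass property (the example and all names below are ours, not the paper's): in the following C sketch a declaration list L → D L | λ passes an inherited offset down and to the right, and synthesizes the total size back up, in one top-down, left-to-right traversal.

/* Toy L-attributed evaluation in one top-down, left-to-right pass
 * (illustrative only): each declaration D inherits the running offset
 * and the list L synthesizes the total size of all declarations. */
#include <stdio.h>

typedef struct Decl { int size; struct Decl *next; } Decl;

/* offset is the inherited attribute L.in; the return value is the
 * synthesized attribute L.syn. */
static int eval_list(const Decl *d, int offset)
{
    if (!d) return offset;                /* L -> lambda: L.syn = L.in   */
    int d_offset = offset;                /* D.in  = L.in                */
    int next_in  = d_offset + d->size;    /* L2.in = D.in + D.size       */
    return eval_list(d->next, next_in);   /* L.syn = L2.syn              */
}

int main(void)
{
    Decl c = {4, NULL}, b = {8, &c}, a = {2, &b};
    printf("total size = %d\n", eval_list(&a, 0));   /* prints 14 */
    return 0;
}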

    2.2. The combinatorial parser

In 1970 Earley [7] presented a top-down parser whose basic innovation was the introduction of a symbol, called the dot (•), that does not belong to the grammar. The utility of the dot in a rule (now called a dotted rule) is to separate the right part of the rule into two subparts. For the subpart on the left of the dot, it has been verified that it can generate the input string (or substring) examined so far. However, for the subpart on the right of the dot, it still remains to be checked whether or not it can generate the rest of the input string. When the dot is at the last position, the dotted rule is called completed. Prior to reading any input symbol, the dotted rules are in the form A → •α.

² The performance comparison is based on the clock cycles needed for the execution, assuming identical clock rates. Consequently, the speed-up is the ratio of the corresponding clock cycles consumed. Moreover, in Section 4 we also present a set of measurements based on actual execution time in order to provide a more tangible indication of acceleration.

As the reading of the input symbols commences, new dotted rules are created. If, after reading the last input symbol, a dotted rule of the form S → α• exists, then the input string is a sentence of the grammar.


The way in which dotted rules are created during the parsing of the input string can be efficiently formulated by the ⊗ operator (see Definition 2.1) that was first introduced by Chiang and Fu [10]. This is a binary operator that takes as inputs either two sets of dotted rules, or a terminal symbol and a set of dotted rules, and produces a new set of dotted rules. In this way, Chiang and Fu's [10] version of Earley's [7] parallel parsing algorithm decides whether or not an input string a1a2...an of length n is a sentence of a CFG G by constructing an (n+1)×(n+1) parsing table PT whose elements pt(i,j) are sets of dotted rules.

Let Y = Predict(N) = ∪_{A∈N} Predict(A), where

Predict(A) = {A → γ•δ | A → γδ ∈ R and γ →* λ}

Definition 2.1. Given the sets of dotted rules Q and U and the terminal symbol b, the operator ⊗ is defined as follows:

Q ⊗ U = {A → δDβ•γ | A → δ•Dβγ ∈ Q, β →* λ and D → μ• ∈ U}
        ∪ {B → φCχ•ζ | γ →* λ, B → φ•Cχζ ∈ Y, χ →* λ and C →* A}

Q ⊗ b = {A → δbβ•γ | A → δ•bβγ ∈ Q, β →* λ}
        ∪ {B → φCχ•ζ | γ →* λ, B → φ•Cχζ ∈ Y, χ →* λ and C →* A}

Details about the implementation of the ⊗ operator are given in [17]; consequently, it will not be analyzed here, since that would be a repetition of what is described in [17].
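For readers who want a purely software reference point, the following C sketch is our own simplified model of the two forms of ⊗ over the bit-vector encoding of dotted rules; it is not the combinatorial circuit of [17], and it omits the nullable-symbol and Y-set terms of Definition 2.1 for brevity. The dotted-rule description table is assumed to be filled in from the grammar.

/* Simplified software model of the ⊗ operator: one bit per dotted rule. */
#include <stdint.h>

typedef uint64_t RuleSet;               /* a set of dotted rules         */

typedef struct {
    int lhs;          /* left-hand-side nonterminal                      */
    int next_sym;     /* symbol right after the dot, -1 if completed     */
    int next_is_term; /* 1 if next_sym is a terminal                     */
    int advanced;     /* index of the same rule with the dot moved once  */
} DottedRule;

static const DottedRule *dr;            /* dotted rules of the grammar,  */
static int n_dotted;                    /* filled in elsewhere           */

/* Q ⊗ U: advance the dot over a nonterminal D whenever a completed
 * rule D -> mu. belongs to U. */
RuleSet op_sets(RuleSet q, RuleSet u)
{
    RuleSet out = 0;
    for (int i = 0; i < n_dotted; i++) {
        if (!(q & (1ULL << i)) || dr[i].next_sym < 0 || dr[i].next_is_term)
            continue;
        for (int j = 0; j < n_dotted; j++)
            if ((u & (1ULL << j)) && dr[j].next_sym < 0 &&
                dr[j].lhs == dr[i].next_sym)
                out |= 1ULL << dr[i].advanced;
    }
    return out;
}

/* Q ⊗ b: advance the dot over the terminal b. */
RuleSet op_term(RuleSet q, int b)
{
    RuleSet out = 0;
    for (int i = 0; i < n_dotted; i++)
        if ((q & (1ULL << i)) && dr[i].next_is_term && dr[i].next_sym == b)
            out |= 1ULL << dr[i].advanced;
    return out;
}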

    3. The proposed methodology

The output system consists of three major submodules: the Extended Parallel Parser, the Tree Constructor and the Semantic Evaluator (see Fig. 1), corresponding to three templates. The parser handles the recognition task and constructs the parse table, based on the given input string. When the parsing process is over, the parse tree is constructed and afterwards, while being traversed, the corresponding attributes are evaluated. The output system is automatically generated by the Automated System Generator Platform, which takes as input an AG, extracts its basic parameters and properly modifies three templates in order to generate synthesizable Verilog HDL source code for the three submodules of the output system. It must be clarified that the first two submodules are automatically produced by the proposed tool, based on the given AG, as well as the generic architecture for the attribute handling. If complex functions are dictated by the semantics, special submodules are required. Therefore, these submodules cannot yet be produced automatically but have to be application specific functions provided by the user. In case semantic rules consist only of simple arithmetic expressions, the whole system can be automatically generated, as shown in Fig. 1. Usually, in the process of analyzing large input strings, the input string is divided into smaller strings of reasonable length. Therefore, a future aspect we are currently working on is the application of pipeline techniques between the three major modules, when long input strings have to be divided and processed in real time, which will lead to a further increase of performance. Obviously, the implementation using the pipelined techniques can also be utilized in applications where successive independent input strings are to be analyzed.

In the following subsections, the architectures of the three submodules will be analyzed. An illustrative example will be used in order to clarify the proposed system. The methodology for the automatic generation of the final hardware will next be described. Finally, implementation details will be given.

    3.1. Extended Parallel Parser template

In [17] a highly efficient architecture for the hardware implementation of CFG parsers was presented. Its efficiency stems from an innovative combinatorial circuit that implements the fundamental ⊗ operator in time complexity O(log2|G|), where |G| is the size of the CFG. The parsing table (PT) is constructed following the architecture shown in Fig. 2(a). All PT cells are computed in parallel by applying the ⊗ operator, which is now implemented by the proposed combinational circuit C. All the PT cells that belong to the main diagonal are initialized to the set Y = Predict(N), where N is the set of nonterminals. Existing methodologies parallelize the construction of the parsing table with respect to the length n of the input string. This is achieved by computing at each execution step the cells pt(i,j) that belong to the same diagonal line.³ The computation of pt(i,j) is based on the equation shown in Fig. 2(b), and all the cells pt(i,m) and pt(m,j) for which i+1 ≤ m ≤ j-1 (see Fig. 2(b, c)) should already have been computed. During the k-th execution step te_k, processing element Px computes pt(x-k, x), and during the k-th communication step tc_k, Px transmits pt(x-k, x) together with pt(x-k, x-1), pt(x-k, x-2), ... via bit-vector u (see Fig. 2(a)). Additionally, all the already computed cells (by Px) are used, via bit-vector q, for the computation of the next cell that belongs to the same column. Each Px processing element gradually computes the cells of the x-th column.

³ A diagonal line is considered to be a line parallel to the main diagonal.

All ⊗ operators can be executed in parallel by multiple C circuits, and the union of their outputs produces pt(i,j) (see Fig. 2(c)). Each processing element Px takes as input the input symbol ax. Hence, two levels of parallelism can be identified: a local or cell-level parallelism that corresponds to the parallel execution of the ⊗ operators inside each cell, and a global or architecture-level parallelism that corresponds to the tabular form of Earley's algorithm. The proposed design achieves recognition of an input string of length n in O(n·log2(n·|G|)) time. Taking also into consideration the hardware nature of the implementation, the architecture presented in [17] achieves a speed-up factor that varies from two orders of magnitude for toy-scale grammars to six orders of magnitude for large real-life grammars, compared to software approaches. C is constructed from the characteristic equations of the underlying CFG G, which are algorithmically derived. An abstract implementation of the combinational circuit C architecture is illustrated in Fig. 3, where the crucial subcircuits as well as their connections are shown. For more details on the parser and the combinational circuit C the reader is encouraged to refer to [17].

Fig. 2. Parsing architecture presented in [17] for n = 3: (a) the array of processing elements Px with execution steps te_k and communication steps tc_k, (b) the cell equation pt(i,j) = [pt(i,i+1) ⊗ pt(i+1,j)] ∪ [pt(i,i+2) ⊗ pt(i+2,j)] ∪ ... ∪ [pt(i,j-1) ⊗ pt(j-1,j)] ∪ [pt(i,j-1) ⊗ a_j], (c) the parallel ⊗ circuits inside a cell.


    Fig. 3. Abstract implementation of the combinational circuit C architecture.
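To make the tabular computation concrete in software terms, the following sequential C sketch (ours, not the parallel hardware of Fig. 2) fills the parse table diagonal by diagonal using the cell equation of Fig. 2(b); op_sets and op_term are the two forms of the ⊗ operator modelled in the sketch of Section 2.2, and predict_set stands for the precomputed set Y.

/* Sequential reference model of the parse-table fill of Fig. 2:
 *   pt(i,j) = [pt(i,i+1) ⊗ pt(i+1,j)] ∪ ... ∪ [pt(i,j-1) ⊗ pt(j-1,j)]
 *                                     ∪ [pt(i,j-1) ⊗ a_j]
 * The hardware computes every cell of a diagonal in parallel; this loop
 * nest merely visits the cells in the same diagonal order, one at a time. */
#include <stdint.h>

#define MAXN 32
typedef uint64_t RuleSet;                   /* one bit per dotted rule    */

RuleSet op_sets(RuleSet q, RuleSet u);      /* Q ⊗ U (Definition 2.1)     */
RuleSet op_term(RuleSet q, int terminal);   /* Q ⊗ b                      */

/* a[1..n] holds the input symbols a_1..a_n. */
void fill_parse_table(RuleSet pt[MAXN + 1][MAXN + 1],
                      const int a[], int n, RuleSet predict_set)
{
    for (int i = 0; i <= n; i++)            /* main diagonal: the set Y   */
        pt[i][i] = predict_set;

    for (int d = 1; d <= n; d++)            /* d-th diagonal line         */
        for (int i = 0; i + d <= n; i++) {
            int j = i + d;
            RuleSet cell = op_term(pt[i][j - 1], a[j]);   /* consume a_j  */
            for (int m = i + 1; m <= j - 1; m++)
                cell |= op_sets(pt[i][m], pt[m][j]);
            pt[i][j] = cell;
        }
}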

Having chosen the parsing system presented in [17], certain modifications had to be made in order to convert it into a parser suitable for attribute evaluation. The parse tree is a necessity for the attribute evaluation and hence the combinatorial parser must construct it. In the case of ambiguous grammars, all possible parse trees should be constructed, so that our implementation is applicable in numerous fields such as artificial intelligence. As can easily be comprehended, the parser should keep its combinatorial nature [17], which provides our speed-up. Therefore, all changes are made while bearing in mind this precondition.

The most essential extension is the storage, for every created dotted rule, of its origin. Due to the nature of the ⊗ operator, a dotted rule can be the result of an operation between two sets of dotted rules or between a set of dotted rules and one terminal symbol of the input string. As mentioned before, the computation of pt(i,j) is the union of the results of a number of ⊗ operations between two sets of dotted rules and the result of one ⊗ operation between a set of dotted rules and a terminal symbol (see Fig. 2(c)). For the computation of a cell, the maximum number of ⊗ operations between two sets of dotted rules is n-1 (for pt(0,n)). Thus, for every dotted rule the maximum number of possible origins is n, where n is the input string length. For example, if n = 3, pt(0,3) = [pt(0,2) ⊗ a3] ∪ [pt(0,2) ⊗ pt(2,3)] ∪ [pt(0,1) ⊗ pt(1,3)], so the dotted rules belonging to pt(0,3) may have been created by any of the three applications of the ⊗ operator. Consequently, a bit-vector of length n, called source, is required in order to store the origin of the cell. For this example, if bit source(0) = 1, then cell pt(0,3) has been produced by the operation pt(0,2) ⊗ a3. In case the bit-vector source has more than one bit set due to operations between sets of dotted rules, the grammar is ambiguous.
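A minimal sketch (ours; it assumes the uint64_t dotted-rule sets of the earlier sketches and at most 32 origins per cell) of how the source bit-vector can be filled while pt(i,j) is assembled from its candidate ⊗ results:

#include <stdint.h>

typedef uint64_t RuleSet;

/* Assemble pt(i,j) from the results of its candidate ⊗ operations and
 * record in *source which of them actually contributed something:
 *   candidates[0]   = pt(i,j-1) ⊗ a_j
 *   candidates[m-i] = pt(i,m) ⊗ pt(m,j)   for i+1 <= m <= j-1        */
RuleSet combine_cell(const RuleSet candidates[], int n_candidates,
                     uint32_t *source)
{
    RuleSet cell = 0;
    *source = 0;
    for (int k = 0; k < n_candidates; k++)
        if (candidates[k]) {
            cell |= candidates[k];
            *source |= 1u << k;        /* remember this origin */
        }
    return cell;
}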

The parser, in every execution step, calculates the dotted rules for the cells that belong to the same diagonal. At the end of each execution step a whole column of the parse table is filled, and therefore this column's data (with the source bit-vectors) can be transmitted to the next module, the Parse Tree Constructor, as shown in Fig. 1. It must be clarified that, due to the parallel nature of the parser, at each execution step all possible rules are created and stored into the parse table. Thus the parse table contains numerous dotted rules, many of which are not even needed for the derivation of the given input string. Furthermore, for the parse tree construction only completed dotted rules are used and therefore only those are transmitted from the parser module to the tree constructor. The separation of the completed dotted rules is extremely easy and fast thanks to a bit-vector mask that has set only the bits that correspond to the positions of the completed dotted rules. The bit-vector representation of dotted rules will be explained in Section 3.1.1.

The architecture of the parser is shown in Fig. 2(a) and the architecture of the ⊗ operator is shown in Fig. 3. The proposed Platform is a program written in the C programming language that takes as input a grammar and properly modifies the architectural templates described in Verilog HDL (hardware description language), in order to build the output system for the specific grammar. Given a grammar, the proposed platform extracts all the necessary information in order to modify the parser template, such as the number of terminal symbols, the number of nonterminal symbols, the maximum input string length, the number of syntactic rules, the maximum size of a syntactic rule, the number of the dotted rules, etc. The platform also constructs the set Y (see Section 2.2) and, using the equations given in [17], constructs the combinatorial circuit C. Using these data, the platform defines the basic constants of the processing element Px (Fig. 2(a)) and creates as many instances of Px as needed. The three tasks of a processing element Px are receiving data, calculating a cell using C circuits and sending data. These three tasks are implemented by an FSM (finite state machine) whose parameters are set by the platform using the data extracted from the grammar and the position of the Px in the architecture. For example, P2 is constructed by modifying the FSM template of Px so as to calculate two cells, receive data once from P1 and send data twice to P3. All processing elements are controlled and synchronized by a Control Unit whose constants are also defined by the platform. Finally, the parse table is also defined and initialized by the platform. More details are given in Section 3.4.

Table 1. Arithmetic operation grammar Gop.

Rule number   Syntactic rule   Semantic rule in AG notation   Corresponding stack actions
0             S → E            S.s = E.s                      pop result;
1             E1 → T + E2      E1.s = T.s + E2.s              pop x; pop y; evaluate x+y; push result;
2             E → T            E.s = T.s                      –
3             T → F * T        T.s = F.s * T.s                pop x; pop y; evaluate x*y; push result;
4             T → F            T.s = F.s                      –
5             F → (E)          F.s = E.s                      –
6             F → N            F.s = N.s                      –
7             N → D N          N.s = 10*D.s + N.s             pop x; pop y; evaluate 10*y+x; push result;
8             N → D            N.s = D.s                      –
9             D → 0            D.s = 0                        push 0;
...           ...              ...                            ...
18            D → 9            D.s = 9                        push 9;

  • 3.1.1. Illustrative example of Extended Parser

In order to further describe the methodology, an illustrative example is given, based on the AG Gop (see Table 1), which describes the basic arithmetic operations of addition and multiplication between two or more operands.

The produced system can recognize input strings that describe arithmetic operations and, furthermore, evaluate the final result, resembling a common (infix notation) calculator. The last two columns of Table 1 will be analyzed in the next subsections, since the same example will also be used for the analysis of the Tree Constructor and the Semantic Evaluator.

Once the expression 35*4 is given as input to the system, the first step is the construction of the parse table by the Extended Parser, which is shown in Fig. 4, where the main diagonal is not shown, since it contains initial rules that have been precomputed. For simplicity, the representation in the figure is not in bit-vector form. Actually, the data are represented in bit-vectors of length 45, imposed by the possible number of dotted rules that can be derived from the 19 syntactic rules of the grammar, as explained in Fig. 5. This parse table contains both completed and not completed dotted rules; therefore the not completed rules must be screened out using the mask that has set only the bits that correspond to the positions of the completed dotted rules. The parse table transmitted to the Parse Tree Constructor contains only the completed rules; consequently bit-vectors of only 19 bits are needed, one bit for each rule. This mask is automatically produced and for Gop it is 101010101010101010101010010100010100010100010 (see Fig. 5). After the mask application, the parse table is transformed into the one shown in Fig. 6, where the real bit-vector representation is used. The use of a larger font size for some bits will be explained in the next section. Obviously, bit-vectors of only 19 bits are now required to encode the 19 possible (completed) rules. Additionally, 10 bit-vectors of length 4 (the input string length), called source, are required in order to store the origins of the 10 cells, as explained in Section 3.1.
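The mask step itself is a plain bitwise AND followed by a compression from one bit per dotted rule to one bit per syntactic rule; the C sketch below (ours) shows it for the 45-bit/19-bit encoding of Fig. 5, taking bit 0 as the rightmost character of the mask string quoted above.

#include <stdint.h>

/* Screen out the non-completed dotted rules of a cell and compress the
 * result to one bit per syntactic rule.  completed_mask has a 1 at every
 * dotted-rule position that is a completed rule; since each syntactic
 * rule has exactly one completed dotted rule, walking the mask in order
 * assigns consecutive rule numbers. */
uint32_t completed_rules(uint64_t cell, uint64_t completed_mask)
{
    uint64_t kept = cell & completed_mask;   /* the AND gates of the parser */
    uint32_t out = 0;
    int rule = 0;
    for (int bit = 0; bit < 45; bit++)
        if (completed_mask & (1ULL << bit)) {
            if (kept & (1ULL << bit))
                out |= 1u << rule;           /* rule 'rule' is in the cell  */
            rule++;
        }
    return out;
}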

    3.2. Parse Tree Constructor Template

Once the whole input string has been read by the parser, i.e. the final column of the parse table is filled and transmitted to the next module, the parse tree construction may begin. The entire operation starts from the top right cell of the parse table, where the starting rule of the grammar is sought.

Fig. 4. Parse table for 35*4.

Once found, it is placed into the root of the parse tree and the rules responsible for the creation of the start rule are fetched, via the usage of the source bit-vector and the application of the appropriate masks. The result is temporarily kept in the bit-vector MASKED (see Fig. 7), which will finally be stored as a tree branch. The source bit-vector may have stored either one or two cell positions, depending on whether the operands of ⊗ were two sets of dotted rules or a set of dotted rules and a terminal symbol. For every rule found, the same procedure is followed iteratively until a rule belonging to the main diagonal of the parse table is reached, which denotes the end of the particular branch, and therefore the node degenerates into a leaf. In this way, branch by branch, the whole parse tree is constructed. The flowchart of the algorithm that constructs the parse tree from the parse table is analytically shown in Fig. 7. A detail that needs to be elucidated is the way useful completed rules are separated from others existing in the same cell, and hence in the same bit-vector. In order to preserve the combinatorial nature of the parser, bit-vector masks are used for every rule. These masks are created based on the concept that a parent node in the parse tree has children nodes whose rules have as left-hand side symbol (lhss) nonterminals that exist in the right-hand side (rhss) of the parent node's rule. To clarify this concept, consider the rule A → BC at the parent node. The rules at the children nodes can only have as lhss the nonterminal B or C and no other. Therefore, the appropriate mask for this rule allows only rules with the specific lhss. Because of the bit-vector encoding followed, these masks can easily be built and applied with the aid of AND gates. In case there are more than two nonterminals at the right-hand side of the rule, still two masks are utilized, one for the rightmost nonterminal and one for all the others. The construction of all the masks is an unsophisticated task that is carried out automatically, together with the entire system construction, by the proposed platform. It must be noted that the usage of masks does not increase the parser complexity, since it only adds the need for some extra AND gates; consequently the parser retains its combinatorial nature and extreme performance. More details will be given later with examples.


    Fig. 5. Bit-vector representation for Gop.


Fig. 6. Binary parse table for 35*4.


The flowchart of Fig. 7 is the algorithmic description of the FSM which implements the Parse Tree Constructor template. The corresponding architecture of the FSM is trivially obtained from the above flowchart. This task is carried out by the platform. The bounds and the constants of the FSM (e.g. bit-vector lengths, number of cells) are set by the platform, which also constructs the necessary masks given a specific grammar.

If a completed dotted rule has been derived by more than one ⊗ operation between sets of dotted rules, i.e. it has more than one origin, the grammar is obviously ambiguous and more than one parse tree should be constructed. For that reason, the process forks for each of the stored origins. By forking it is meant that the instance of the parse tree is copied to another register memory space, together with the parameters, i.e. the current state of the FSM. When the construction of the first parse tree finishes, the second parse tree is constructed after loading the copy of the parse tree, as well as the parameters, into the FSM.
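A compact software view of one selection step of this procedure (our sketch, restricted to the unambiguous, single-parse-tree case): the completed-rule vector of the candidate cell is ANDed with the left or right mask of the parent rule (cf. Table 2), exactly as the hardware does with AND gates, and the surviving bits form the MASKED vector that the flowchart then consumes bit by bit.

#include <stdint.h>

/* Per-rule masks of Section 3.2: right_mask selects the rules allowed for
 * the rightmost child, left_mask those allowed for all the other children
 * (see Table 2 for Gop). */
typedef struct {
    uint32_t left_mask;
    uint32_t right_mask;
} RuleMasks;

/* MASKED = completed rules of the cell that may appear under 'parent'. */
uint32_t select_children(uint32_t cell_completed_rules,
                         const RuleMasks *parent, int rightmost_child)
{
    uint32_t mask = rightmost_child ? parent->right_mask : parent->left_mask;
    return cell_completed_rules & mask;
}

/* The flowchart repeatedly processes one set bit of MASKED that has not
 * been handled yet; here the highest-numbered one is returned first. */
int next_rule(uint32_t masked)
{
    for (int bit = 31; bit >= 0; bit--)
        if (masked & (1u << bit))
            return bit;
    return -1;                           /* no rule left in this cell */
}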

    3.2.1. Illustrative example of tree construction

After the completed rules mask application, the parse table is transformed into the one shown in Fig. 6, where the real bit-vector representation is used. Obviously, bit-vectors of only 19 bits are now required to encode the 19 possible (completed) rules. The next step is to create the parse tree, i.e. to keep only the useful completed rules. For this process, two sets of masks have been created for each syntactic rule that contains at least one nonterminal on the right-hand side, i.e. the first 9 rules of the grammar. These masks are automatically constructed based on the methodology explained in Section 3.2 and are presented in Table 2.


    Fig. 7. Flow chart of the parse tree construction algorithm.

Table 2. Masks for the separation of the useful completed rules in Gop.

Rule #   Rule       Left mask                  Right mask
0        S → E      –                          E: 0000000000000000110
1        E → T+E    T: 0000000000000011000     E: 0000000000000000110
2        E → T      –                          T: 0000000000000011000
3        T → F*T    F: 0000000000001100000     T: 0000000000000011000
4        T → F      –                          F: 0000000000001100000
5        F → (E)    –                          E: 0000000000000000110
6        F → N      –                          N: 0000000000110000000
7        N → DN     D: 1111111111000000000     N: 0000000000110000000
8        N → D      –                          D: 1111111111000000000

The rules that are finally used for the construction of the parse tree (see Fig. 8) are represented in the bit-vectors of Fig. 6 with a larger font size than the rest. Having kept only these necessary rules, the parse tree is stored in the table form shown in Table 3(a).

The first column is merely the index of the table, while the second column contains the rules and the third the descendant subtrees (referring to the corresponding indices in the same table). For example, the descendants of rule 3 in the tree of Fig. 8 are stored at index positions 1 (rules 6 and 7) and 2 (rules 4, 6, 8 and 13), for the left and right branch, respectively.

The numbered octagons of Fig. 8 as well as Table 3(b) will be explained in the next subsection.

    3.3. Semantic Evaluator template

The parse tree constructed by the first two submodules of the output system of Fig. 1 has no attributes at its nodes. The decoration of the tree is a task carried out by the last submodule, that of semantic evaluation, based on the stack-based approach proposed in this paper and explained next.

For each attribute imposed by the AG, synthesized or inherited, a stack is defined, having the same name as the attribute. The decoration of the tree begins from its root. At the beginning, initial values are pushed into the corresponding inherited attribute stacks. The procedure followed for every node, including the root node, is to evaluate the attributes of the children nodes and then to evaluate the current node. The evaluation of the children nodes begins from the leftmost towards the rightmost. All the AG semantic rules are converted to push and pop actions, following a methodology explained below. In every node, the push and pop actions dictated by the semantic rules are executed, beginning with the evaluation of the inherited attributes and afterwards continuing with the synthesized ones. Because of the top-down and leftmost-to-rightmost nature of the algorithm, while pushing the inherited attributes into the stacks the tree is traversed towards its leftmost leaf.

Fig. 8. Parse tree for 35*4.

Table 3. (a) Parse tree storage form. (b) Action execution.

(a)
Index   Rules          Descendants
0       0, 2, 3        1, 2
1       6, 7           3, 4
2       4, 6, 8, 13    –
3       12             –
4       8, 14          –

(b)
Node   Action                        Stack
1      push 3;                       3
2      push 5;                       3, 5
3      pop x; pop y; push 10*y+x;    35
4      push 4;                       35, 4
5      pop x; pop y; push x*y;       140
       pop result;                   –

    5 pop x; pop y; push x y; 140pop result; leftmost leaf. When reached, its synthesized attribute is evaluated and the parent (or the right sibling) node can startevaluating its synthesized attributes as well. For functions between attributes stored into the same or different stacks,dedicated components are used that output the result which is then pushed into the appropriate stack. As expected, theend of the entire procedure is denoted by the evaluation of all the root node attributes. The path followed for the decorationof a simple parse tree is shown in Fig. 9.

As already mentioned, the AG semantic rules are converted to push and pop actions. Regarding the synthesized attributes, every time the tree traversal reaches a leaf, or a node whose descendant nodes' attributes have already been evaluated, the synthesized attributes of the rhss are popped. These synthesized attributes are at the top of the corresponding stacks. Then, the synthesized attribute of the parent node is calculated according to the corresponding semantic rule and is pushed into the appropriate stack. In this way, it is ensured that at the top of the synthesized attribute stacks, the attributes of the children nodes (up to the current child) are placed in sequence. In Fig. 10(a) this procedure is illustrated for an arbitrary rule with n nonterminals on the right side, where n consecutive pops of the stack S have to be done in order to obtain all the necessary attributes for the evaluation of the synthesized attribute S of the lhss, and then this value is pushed onto the top of the stack S.

Regarding the inherited attributes, a pop is done the first time the node is traversed and the inherited attribute I is needed. If in a rule an inherited attribute is used by more than one rhss, then the same number of pushes should be done into the corresponding stack. This is reasonable, since every time a value from the stack is needed it has to be popped and therefore leaves the stack. All the above are illustrated in Fig. 10(b), where the output values of the appropriate semantic rules f1,...,fj that take as parameter the inherited attribute I of the lhss are pushed into the stack I.

    Fig. 9. Parse tree traversal.


Fig. 10. (a) Synthesized attribute S example. (b) Inherited attribute I example.

Finally, for the AG semantic rules that impose value transfer (for the same attribute), no action is required for either inherited or synthesized attributes, because the equivalent actions would be a successive pop and push into the same stack.

The architecture for the semantic evaluation stage is shown in Fig. 11. The parse tree, in the form of Table 3(a), is traversed top-down and from left to right. The branches of the tree are stored in the second column of that table. Each branch is represented by a bit-vector of length p, where p is the number of syntactic rules. For example, the first branch (rules 0, 2 and 3) of the parse tree of Fig. 8 is represented by the bit-vector 0000000000000001101. When the attributes of a branch of the tree are to be evaluated, the corresponding bit-vector is loaded into the bit-vector TreeBranch of Fig. 11. It is noted that the sequence of the syntactic rules within a branch of the tree is not indicated in the representative bit-vector. For that reason, when the platform reads the given input grammar, it sorts the syntactic rules appropriately so that the most significant bit that is set is evaluated first. Each module responsible for the attribute evaluation of a rule is enabled via signal En (Enable), which is driven by the output of an AND gate that takes as inputs the value of the corresponding bit of bit-vector TreeBranch and the signal EnN (Enable Next) that is set by the previous module. When a module finishes its execution (evaluation of the attribute) it sets signal EnN, in order to allow the next module to be executed. A module is executed when En = EnN = 1. In case the corresponding bit of bit-vector TreeBranch is 0, then En = 0, and when the input EnN is set the module will not be executed but will just set its EnN output in order to allow the next module to be executed. The input EnN of the most significant bit is always set, as it has the greatest priority.
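In software terms the enable chain behaves like the following C sketch (ours; rule_action stands for the per-rule semantic submodules of Fig. 11): the enable token ripples from the most significant bit downward, and only the modules whose TreeBranch bit is set actually execute their stack actions.

#include <stdint.h>

#define N_RULES 19                       /* p: number of syntactic rules    */

typedef void (*RuleAction)(void);        /* one semantic submodule per rule */

/* Software equivalent of the daisy-chained En/EnN signals of Fig. 11. */
void evaluate_branch(uint32_t tree_branch,
                     const RuleAction rule_action[N_RULES])
{
    for (int r = N_RULES - 1; r >= 0; r--)      /* EnN ripples downward     */
        if ((tree_branch & (1u << r)) && rule_action[r])
            rule_action[r]();                   /* En = EnN = 1: execute    */
}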


    Fig. 11. Semantic Evaluator architecture.


Each module communicates via the control and data buses with the stacks it needs in order to evaluate the attributes. The whole process is synchronized by a Control Unit which at each clock step enables one of the semantic submodules. Then the enabled submodule sends the corresponding control signals to the appropriate stacks, in order to execute the correct push or pop operations. Moreover, the Control Unit is responsible for advancing the semantic evaluation procedure.

The architecture presented in Fig. 11 has been described in Verilog as a template. The platform, using the parameters of the grammar, modifies this template by defining all needed constants, such as the length of the bit-vector TreeBranch. If complex functions are dictated by the semantics, special submodules are required. Therefore, these submodules cannot yet be produced automatically but have to be application specific functions provided by the user. In case the semantic rules consist only of simple arithmetic expressions, they can be automatically generated.

    3.3.1. Illustrative example of semantic evaluation

According to the methodology presented in the previous section, only one stack is needed, since only one synthesized attribute S is required by the grammar Gop. Furthermore, in the case of rules where a simple assignment is imposed by the semantics, no action is needed. All other semantic rules have been transformed into simple push and pop actions (see Table 1) and simple arithmetic operations.

For the final step, that of attribute evaluation, the parse tree is traversed and for each rule the corresponding push/pop action is executed (see Table 3(b)). The tree is traversed top-down and from left to right, as shown by the dashed line in Fig. 8. The action execution as well as the stack contents are shown in Table 3(b). It must be noted that the numbered octagons of Fig. 8 refer to the nodes of Table 3(b). At the end of the action execution, the final result, i.e. the number 140, is produced at the top of the stack.
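The same evaluation can be reproduced with a few lines of software (our sketch, not the generated hardware): the parse tree of Fig. 8 is written out by hand below and, since Gop is S-attributed, each node's stack actions of Table 1 are executed after those of its children, which yields exactly the action sequence of Table 3(b) and the result 140.

/* Software model of the stack-based evaluation of Gop for the input 35*4. */
#include <stdio.h>

static int stack[32], sp = 0;
static void push(int v) { stack[sp++] = v; }
static int  pop(void)   { return stack[--sp]; }

typedef struct Node {
    int rule;                  /* rule number of Table 1           */
    struct Node *child[2];     /* at most two nonterminal children */
} Node;

/* Stack actions of Table 1 for one rule of Gop. */
static void actions(int rule)
{
    int x, y;
    switch (rule) {
    case 1: x = pop(); y = pop(); push(x + y);      break;  /* E -> T+E */
    case 3: x = pop(); y = pop(); push(x * y);      break;  /* T -> F*T */
    case 7: x = pop(); y = pop(); push(10 * y + x); break;  /* N -> DN  */
    default:
        if (rule >= 9 && rule <= 18) push(rule - 9);        /* D -> digit */
        /* rules 2,4,5,6,8 only copy the attribute: no stack action;
         * rule 0 ("pop result") is performed in main below.             */
    }
}

static void evaluate(const Node *n)
{
    for (int i = 0; i < 2 && n->child[i]; i++)
        evaluate(n->child[i]);             /* children first, left to right */
    actions(n->rule);                      /* then the node's own actions   */
}

int main(void)
{
    /* Parse tree of Fig. 8 for 35*4 (rule numbers of Table 1):
     * S->E, E->T, T->F*T, where "35" is N->DN (D->3, N->D, D->5)
     * and "4" is N->D (D->4). */
    Node d3  = {12, {0, 0}}, d5 = {14, {0, 0}}, n5 = {8, {&d5, 0}};
    Node n35 = {7, {&d3, &n5}}, f35 = {6, {&n35, 0}};
    Node d4  = {13, {0, 0}}, n4 = {8, {&d4, 0}}, f4 = {6, {&n4, 0}};
    Node t4  = {4, {&f4, 0}};
    Node t   = {3, {&f35, &t4}}, e = {2, {&t, 0}}, s = {0, {&e, 0}};

    evaluate(&s);
    printf("result = %d\n", pop());        /* pop result: prints 140 */
    return 0;
}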

    3.4. Methodology for the automatic generation of the system

The proposed Platform is a program written in the C programming language that takes as input a grammar and properly modifies architecture templates described in Verilog HDL (hardware description language), in order to build the output system for the specific grammar. The templates the platform modifies are the processing element Px of the parser, the Control Unit for these processing elements that are executed in parallel, the Tree Constructor, which is an FSM, the Semantic Evaluator, and the FSM that reads the parse tree and feeds it to the Semantic Evaluator branch by branch. Additionally, the Platform constructs from scratch the combinatorial circuit C of the ⊗ operator, using the equations given in [17], and also constructs the components for the semantic rules, in case they consist only of simple arithmetic expressions.

In practice, the platform does not write into the template files themselves; it first generates a file containing all the constants and parameters that these templates use, and then creates as many instances of these modules as needed. This file is finally included in the final project, before the synthesis process, so that all the templates are properly initialized. All the templates are entirely independent of a specific grammar, are composed of general purpose components and can be fully parameterized through this file.

The proposed platform, given a grammar, extracts all the necessary information in order to modify these templates, such as the number of terminal symbols, the number of nonterminal symbols, the maximum input string length, the number of syntactic rules, the maximum size of a syntactic rule, the number of dotted rules, the form of the semantic rules, the number of processing element Px instances needed, the number of circuit C instances needed, etc. The platform also constructs the set Y (see Section 2.2) and all the masks used by the Tree Constructor. For the example used in the previous subsections, the definitions generated from the extracted parameters are 157 in total, i.e. 45 for the completed-rules mask, 38 for the remaining masks, 45 for the set Y, 19 for defining the length of each rule and 10 general purpose definitions.

    3.5. Implementation details

The output of the Platform is Verilog HDL synthesizable source code, which has been simulated for validation, synthesized and tested on a Xilinx Virtex-5 ML506 FPGA (field programmable gate array) board for various AGs. Three basic submodules are downloaded to the FPGA: the Extended Parser, the Tree Constructor and the Semantic Evaluator.

The Extended Parser creates the parse table. In practice, the parse table is not stored in a single register memory address space; instead, each processing element stores the cells of the parse table it constructs, i.e. each Px finally stores a column of the parse table. After each column is constructed, the corresponding Px stores this column, after screening out the non-completed rules, in another register memory space with negligible communication cost. In this way there is practically no communication between the Extended Parser and the Tree Constructor: the Tree Constructor, after the termination of the Extended Parser, uses the parse table from the register memory space where it was stored. Similarly, there is no communication between the Tree Constructor and the Semantic Evaluator.

The parse table is composed of (n+1)/2 cells, each one consisting of a bit-vector of length p and a source bit-vector of length n, where n is the input string length and p is the number of syntactic rules. Consequently, the size of the parse table is (n+1)(n+p)/2 bits. For example, for a grammar with 100 syntactic rules and a maximum input string length of 20, the parse table occupies 158 bytes. At the same time there is no need to store the grammar, since the grammar has been mapped directly onto hardware. The grammar characteristics are incorporated into the combinatorial circuit automatically by the platform, with the help of the binary equations described in detail in [17].
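A quick back-of-the-envelope check of the figures quoted above; the struct is only an illustration of what one cell holds, not the actual register layout.

    /* Check of the parse table size (n+1)(n+p)/2 bits for n = 20
     * (input length) and p = 100 (syntactic rules). */
    #include <stdio.h>

    struct cell {                 /* one parse-table cell (illustrative) */
        unsigned char rules[13];  /* bit-vector of length p = 100, rounded up */
        unsigned char source[3];  /* source bit-vector of length n = 20       */
    };

    int main(void)
    {
        int n = 20, p = 100;
        long bits = (long)(n + 1) * (n + p) / 2;
        printf("%ld bits = %ld bytes\n", bits, (bits + 7) / 8);  /* 1260 bits, 158 bytes */
        return 0;
    }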


The size of the table created by the Tree Constructor, as well as the maximum length of the stacks, depends on the depth of the tree; only in the case of really large grammars, which create parse trees more than 80 levels deep with complicated semantic rules, may their size exceed 2 kbytes. The total register memory requirements depend upon many parameters, such as the maximum input string length, the number of syntactic rules, the depth of the generated trees, the number and complexity of the semantic rules and the domain of the attributes. These parameters are computed by the platform for a given grammar and suitable register memory space is allocated. Taking into consideration all of the above and the simple nature of the described architectures, it is easy to see that the proposed implementation requires a single FPGA chip for most grammars. The speedup makes the implementation particularly appealing for real-time applications. The proposed implementation may communicate with a larger system via the I/O interfaces of the FPGA. The case where the implementation should be split across more than one chip has not been examined, since we have not yet encountered a grammar that does not fit in one chip.

The platform is proposed as a rapid prototyping facility for application-specific architectures, while a run-time reconfigurable system that takes grammars as input, produces a Verilog HDL description, compiles the code, loads the FPGA configuration bitstream and runs the application is also possible, e.g. for a multilingual natural language interface.

    4. Experimental results

In order to evaluate the performance of the proposed architecture, four attribute evaluation systems were created using the automated platform. The first three were used for the comparison with previous hardware implementations and the last one for the comparison with a pure software approach. All four examples were implemented and executed on the same Xilinx Virtex-5 ML506 FPGA board. For each implementation, a set of measurements was taken using as a metric the number of clock cycles required for various input string lengths.

This kind of measurement (clock cycles) was preferred in order to compare purely the architectures of the approaches, regardless of the technology used. For the software approach in particular, a second kind of measurement (actual execution time) was also taken in order to compare the proposed architecture against the software approach, providing a more tangible indication of the acceleration.

    4.1. Comparison against hardware approaches

In 2005 Panagopoulos [26] presented an extended RISC microprocessor for attribute evaluation implemented on an FPGA. It was the first effort to design a specialized microprocessor for attribute grammar evaluation and was based on Floyd's parser [27]. The performance evaluation in Ref. [26] was based on an AG describing the Logic Program/Knowledge Base of the Successor example, using both inherited and synthesized attributes. The same AG was given as input to the proposed automatic system, generating the appropriate evaluation system. The performance of our approach surpasses that of Panagopoulos by one to two orders of magnitude, as shown in Fig. 12(a).

Also in 2005, another approach was proposed [28], using Earley's parallel parsing algorithm [10] implemented in a hardware module [16] mapped on an FPGA, collaborating with an external RISC microprocessor responsible for the attribute evaluation. In that approach the performance evaluation was based on a well-known example taken from the AI domain, the Wumpus World game [30], transformed into its equivalent S-attributed AG. The speedup of our approach for the specific AG is also one to two orders of magnitude, as shown in Fig. 12(b).

In 2006 [29] the same hardware implementation of the parsing algorithm [16] was used for the construction of an attribute evaluator, where the semantic analysis took place in two soft-core microcontrollers (PicoBlaze) embedded on the same FPGA board as the parsing module. This architecture enhanced the portability compared to [28], but inevitably the performance was reduced due to the use of that particular microcontroller. The AG used for the performance evaluation described the Logic Program for finding a path in a directed acyclic graph, using both inherited and synthesized attributes. The speedup of our approach for the specific AG is two to three orders of magnitude, as shown in Fig. 12(c).

    4.2. Comparison against software approaches

Concerning software approaches, Eli [21] combines a variety [22] of standard tools that implement compiler construction strategies into a domain-specific programming environment. For the attribute evaluation, Eli makes use of LIDO [23], a language for the specification of computations in trees, which supports both inherited and synthesized attributes. The AG used for the performance evaluation was Gop, presented in Table 1. The first set of measurements was taken using as a metric the clock cycles required for various input string lengths. Our approach achieved a speed-up factor of two to three orders of magnitude compared to the software approach, as shown in the logarithmic graph of Fig. 13(a).

Additionally, a second set of measurements (execution time) was taken in order to provide a more tangible indication of the acceleration. Due to the low clock frequency of the FPGA (100 MHz), the time required could be compared to that of the software implementation executed on a faster processor (a Pentium at 2.6 GHz).


Even though the Pentium's clock frequency is an order of magnitude greater than that of the FPGA, the execution time required by our architecture is, for reasonable input string lengths, an order of magnitude less, since it requires drastically fewer clock cycles. This set of measurements is shown in the graph of Fig. 13(b).

From both figures above, it can be seen that the speed-up of our approach diminishes as the input string length grows. Thus, it must be clarified that the parser produced by Eli is a deterministic one; hence it is extremely fast, but it does not produce all the possible solutions in cases of ambiguous grammars, in contrast to our approach. Because of the deterministic nature of the Eli parser, an increase of the input string length does not cause a proportional increase in the execution time.

Fig. 12. Proposed implementation versus other hardware approaches. (a) Versus extended RISC approach. (b) Versus external RISC approach. (c) Versus approach utilizing PicoBlaze.

[Figure 13: clock cycles (panel a, logarithmic scale) and execution time (panel b) versus input string length (8 to 26), for the Eli approach and our implementation.]

Fig. 13. Proposed implementation versus Eli software approach. (a) Clock cycles. (b) Time.

Due to the close relation between attribute grammars and logic programs [3], our platform can easily be extended to be used in intelligent embedded systems for constraint logic programming applications and to support fuzziness and uncertainty. That is the reason why a nondeterministic parser has been chosen. Moreover, in the process of analyzing large input strings, the input string is usually divided into smaller strings of reasonable length. A typical example is ElectroCardioGram (ECG) recognition, where consecutive cardiac cycles have to be analyzed in real time. All measurements were taken for a maximum input string length of 26 and show a mean reduction of execution time of one order of magnitude. For larger grammars the comparison favors our hardware approach even more. It is really important to clarify that, although the time gain for a single input string may seem of little worth (from 200 to 20 ms), in real-life applications where large data sets of inputs have to be recognized in real time, this time gain accumulates into a significant decrease in total execution time.

    5. Natural language interface to a database

An extended example is given next, from the area of natural language (NL) processing. A system that allows users to access information stored in a database through requests expressed in NL is called a natural language interface to a database (NLIDB) [31]. The idea of such a system is fascinating, as it simplifies the user end of any transaction, but the computational cost can be heavy, especially in cases where numerous user requests have to be served concurrently. Software approaches [32] are popular and commercially used in applications; therefore, in this section, the proposed system is used to create a hardware interface that is capable of translating English-like sentences concerning airline flights into SQL queries [33]. The system can receive sentences (questions) belonging to a subset of NL through its network interface. When the syntactic recognition of the input sentence is completed, the semantics are evaluated using the created parse tree, and the FPGA generates and sends SQL queries to a data-management machine that has access to a database in order to produce the final result (answer), or executes them directly in an embedded system (e.g. in mobile telephones). The hardware nature of the proposed system allows it to be used in applications where many input sentences must be processed simultaneously in order to extract information.


Table 4
The attribute grammar GSQL.

Rule number | Syntactic rule                                   | Corresponding actions
0  | PR → NP VP                                               | pop x; pop y; push conc(y,x,;)
1  | NP → QS W VP                                             | pop x; pop y; push conc(select,y,where,x,AND)
2  | NP → QS                                                  | pop x; push conc(select,x,where)
3  | QS → A SET                                               | pop x; pop y; push conc(y,from,x)
4  | QS → E SET                                               | pop x; pop y; push conc(*,from,x)
5  | VP → VP1 VP2 SP                                          | pop x; pop y; pop z; pop w; push conc(z,y,x,w)
6  | VP → VP2 SP                                              | pop x; pop y; push conc(y,x)
7  | VP → VP1 SP                                              | pop x; pop y; pop z; push conc(y,x,z)
8  | VP → SP                                                  |
9  | VP1 → IP VP1                                             | pop x; pop y; pop z; pop w; push conc(w,z); push conc(y,x)
10 | VP1 → IP                                                 |
11 | VP2 → SP CNJ VP2                                         | pop x; pop y; pop z; push conc(z,y,x)
12 | VP2 → SP CNJ                                             | pop x; pop y; push conc(y,x)
13 | IP → REL QF W                                            | pop x; pop y; push ); push conc(y,IN ( select ,x,where)
14 | SP → REL SB                                              | pop x; pop y; push conc(y,=,x)
15 | SP → AT NRL N                                            | pop x; pop y; pop z; push conc(z,y,x,))
16 | SB → QF                                                  |
17 | SB → OB                                                  |
18 | QF → A SET                                               | pop x; pop y; push conc(y,from,x)
19 | QF → E SET                                               | pop x; pop y; push conc(*,from,x)
20 | OB → Object names, e.g.: Athens                          | push Athens
21 | SET → Class names, e.g.: Flights                         | push FlightNo; push Flights
   |                          City                            | push CityName; push Cities
22 | REL → Relation phrases, e.g.: Departs from               | push DepartsFrom
23 | AT → Property names, e.g.: Has population                | push Population
24 | NRL → Numerical relations, e.g.: Larger than             | push >
25 | N → Numbers, e.g.: 55                                    | push 55
26 | W → Relative pronouns, e.g.: which                       |
27 | A → Determiners, e.g.: A                                 |
28 | E → Determiners, e.g.: Each                              |
29 | CNJ → Conjunction words or symbols, e.g.: And            | push AND

In the second column of Table 4 the syntactic notation of the underlying AG GNL [33] is shown, consisting of 29 rules, where the subset of English accepted by the system uses words belonging to classes such as class names, object names, property names, etc. In that grammar [33], the semantics were described using a synthesized attribute called output for each nonterminal symbol of the underlying grammar. The sentences of the subset of English are questions concerning airline flights, and in the attribute output intermediate code is stored, to be executed by an abstract data-management machine. The only operation needed between the attributes of the nonterminals is conc(par1, ..., parn), which stands for the concatenation of the contents of par1, ..., parn. An illustrative simple question is: "A flight departs from Athens?"

This question can be syntactically analyzed into a noun phrase, consisting of a determiner and a common noun, and a verb phrase, consisting of a verb, a preposition and a proper noun. The determiner corresponds to a quantifier, the common noun to a class name, the verb and the preposition to a relation and the proper noun to an object. The nouns and the verb will be used as parameters by the intermediate code that will be generated. For a more thorough understanding of GNL and the meaning of the intermediate code stored in the attribute output, Ref. [33] is suggested.

Using our methodology, a new AG GSQL is derived from GNL, where the syntactic rules are identical and still only one synthesized attribute is evaluated, but now the output is an SQL query which can be processed by any real-life SQL server. The semantic rules correspond to simple push and pop actions, and a stack (output) is used. These actions are presented in the right column of Table 4.

The way the abovementioned question is transformed into an SQL query is shown in Fig. 14 and Table 5. In Fig. 14 the corresponding parse tree is given, which is traversed and, for each rule, the corresponding push/pop action is executed (see Table 5). The tree is traversed from top to bottom and from left to right, as shown by the dashed line. The action execution as well as the stack contents are shown in Table 5; the numbered octagons of Fig. 14 refer to the nodes of Table 5. At the end of the action execution, the SQL query "Select FlightNo From Flights Where DepartsFrom=ATH;" is created at the top of the stack (a small C walk-through of this action sequence is sketched after Table 5). Given a database that contains a Flights relation, Flights(FlightNo id, DepartsFrom char[3], ArrivesAt char[3], Airline char[5], ...), the generated system produces this SQL query, which can be executed on an SQL server.


Fig. 14. Corresponding parse tree for the input string "a flight departs from Athens".

Table 5
Action output for GSQL.

Node | Action                          | Stack contents
1    | push select FlightNo            | select FlightNo
2    | push from Flights               | select FlightNo, from Flights
3    | pop x; pop y; push conc(y,x)    | select FlightNo from Flights
4    | push where DepartsFrom=         | select FlightNo from Flights, where DepartsFrom=
5    | push ATH                        | select FlightNo from Flights, where DepartsFrom=, ATH
6    | pop x; pop y; push conc(y,x)    | select FlightNo from Flights, where DepartsFrom=ATH
7    | pop x; pop y; push conc(y,x,;)  | select FlightNo from Flights where DepartsFrom=ATH;
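The action sequence of Table 5 can be followed step by step in plain C; the string stack and the conc() helper below are illustrative stand-ins for the hardware stack and the concatenation operation, and token spacing is handled loosely.

    /* Walk-through of Table 5 in C: a small string stack and a conc()
     * helper emulate the push/pop actions that assemble the SQL query.
     * This is an illustrative sketch, not the generated hardware. */
    #include <stdio.h>
    #include <string.h>

    static char stk[8][256];
    static int  top;

    static void push(const char *s) { strcpy(stk[top++], s); }
    static void pop (char *out)     { strcpy(out, stk[--top]); }

    /* push conc(a, b, c): concatenation of the contents of its
     * parameters (c may be "" when only two parts are joined). */
    static void push_conc(const char *a, const char *b, const char *c)
    {
        char buf[256];
        snprintf(buf, sizeof buf, "%s%s%s", a, b, c);
        push(buf);
    }

    int main(void)
    {
        char x[256], y[256];

        push("select FlightNo ");             /* node 1 */
        push("from Flights ");                /* node 2 */
        pop(x); pop(y); push_conc(y, x, "");  /* node 3: conc(y,x)   */
        push("where DepartsFrom=");           /* node 4 */
        push("ATH");                          /* node 5 */
        pop(x); pop(y); push_conc(y, x, "");  /* node 6: conc(y,x)   */
        pop(x); pop(y); push_conc(y, x, ";"); /* node 7: conc(y,x,;) */

        pop(x);
        puts(x);   /* select FlightNo from Flights where DepartsFrom=ATH; */
        return 0;
    }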

The presented system was tested for various lengths of input strings, using more complex questions such as "TWA flies from Athens to New York" and "Each flight which is connected to a flight which belongs to Swissair departs from a city which is linked to each city which belongs to France", etc. The system performance was measured in clock cycles and is presented in Fig. 15(a). Furthermore, in Fig. 15(b) the percentage of the overall computation time consumed by each of the three submodules is illustrated for various input string lengths. As shown in Fig. 15(b), the least time-consuming submodule is that of the parse table construction, as expected, because of the combinatorial nature of the implementation. Nonetheless, the performance of the other two submodules, thanks to the proposed methodology, leads to efficient overall performance.

Fig. 15. (a) Cycles needed for GSQL for different lengths of input string. (b) Percentage of the overall computation time consumed by each of the three submodules.

6. Conclusion and future work

This paper presents an innovative architectural design of an AG evaluator. We have designed an automated synthesis tool that exploits its characteristics for the hardware implementation of applications that require attribute evaluation, in order to enhance their performance by integrating syntactic and semantic knowledge via AGs. Obviously, in order to construct an efficient attribute evaluation system, care should be given to both the syntactic and the semantic tasks. For the syntactic part, an efficient parsing algorithm has to be chosen, capable of constructing the parse tree needed for the subsequent operation of attribute evaluation. For the semantic part, a stack-based methodology was proposed that achieves the attribute evaluation mainly through simple push and pop commands on specific stacks. The proposed architecture has been tested for various AGs and the outcome was more than encouraging: compared to software approaches the speed-up is two to three orders of magnitude, while the execution time of our implementation is about one order of magnitude lower, even when the software runs on contemporary fast processors operating at a much higher clock frequency. Compared to previous hardware approaches the speed-up is one to three orders of magnitude, depending on the implementation, the size of the grammar and the input string length.

This work is a part of a project4 for developing a platform (based on AGs) in order to automatically generate special purpose embedded systems. We are currently working on pipelining the three major submodules for the case of long input strings that must be subdivided and executed sequentially. This will lead to a further significant reduction of the execution time. In the future this work can be combined with other embedded systems that accelerate the execution of Java applications, such as JOP [34], in order to create a hardware implementation that takes as input the Java source, compiles it to Java bytecode and executes the latter in the hardware Java Virtual Machine (JOP). Furthermore, due to the close relation between attribute grammars and logic programs [3], the proposed platform can be easily extended to be used in intelligent embedded systems for constraint logic programming applications and may also be extended to support fuzziness and uncertainty.

4 This work has been funded by the project PENED 2003. This project is part of the OPERATIONAL PROGRAMME COMPETITIVENESS and is co-funded by the European Social Fund (80%) and National Resources (20%).

References

[1] Knuth DE. Semantics of context free languages. Math Syst Theory 1968;2:127-45.
[2] Aho AV, Lam MS, Sethi R, Ullman JD. Compilers: principles, techniques, and tools. 2nd ed. Reading, MA: Addison-Wesley; 2006.
[3] Papakonstantinou G, Kontos J. Knowledge representation with attribute grammars. Comput J 1986;29(3):241-5. doi:10.1093/comjnl/29.3.241.
[4] Voliotis C, Sgouros NM, Papakonstantinou G. Attribute grammar based modeling of concurrent constraint logic programming. Int J Artif Intell Tools (IJAIT) 1995;4(3):383-411. doi:10.1142/S021821309500019X.
[5] Tsai WH, Fu KS. Attributed grammar - a tool for combining syntactic and statistical approaches to pattern recognition. IEEE Trans Syst Man Cybern 1980;10(12):873-85.
[6] Trahanias P, Skordalakis E. Syntactic pattern recognition of the ECG. IEEE Trans Pattern Anal Mach Intell 1990;12(7):648-57. doi:10.1109/34.56207.
[7] Earley J. An efficient context-free parsing algorithm. Commun ACM 1970;13(2):94-102. doi:10.1145/362007.362035.
[8] Younger DH. Context-free language processing in time n^3. In: SWAT '66: proceedings of the 7th annual symposium on switching and automata theory. Washington, DC, USA: IEEE Computer Society; 1966. p. 7-20. doi:10.1109/SWAT.1966.7.
[9] Graham SL, Harrison MA, Ruzzo WL. An improved context-free recognizer. ACM Trans Program Lang Syst 1980;2(3):415-62.
[10] Chiang Y, Fu K. Parallel parsing algorithms and VLSI implementations for syntactic pattern recognition. IEEE Trans Pattern Anal Mach Intell 1984;6(3):302-13.
[11] Cheng HD, Fu KS. Algorithm partition and parallel recognition of general context-free languages using fixed-size VLSI architecture. Pattern Recognition 1986;19(5):361-72. doi:10.1016/0031-3203(86)90003-8.
[12] Ibarra OH, Pong T-C, Sohn SM. Parallel recognition and parsing on the hypercube. IEEE Trans Comput 1991;40(6):764-70. doi:10.1109/12.90253.
[13] Ra D-Y, Kim J-H. A parallel parsing algorithm for arbitrary context-free grammars. Inf Process Lett 1996;58(2):87-96. doi:10.1016/0020-0190(96)00023-3.
[14] Ciressan C, Sanchez E, Rajman M. An FPGA-based coprocessor for the parsing of context-free grammars. In: IEEE symposium on FCCM. IEEE Computer Society Press; 2000. p. 236-45.
[15] Bordim J, Ito Y, Nakano K. Accelerating the CKY parsing using FPGAs. IEICE Trans Inf Syst 2003;E86-D(5):803-10.
[16] Pavlatos C, Panagopoulos I, Papakonstantinou G. A programmable pipelined coprocessor for parsing applications. In: Workshop on application specific processors (WASP), CODES, Stockholm, 2004.
[17] Pavlatos C, Dimopoulos AC, Koulouris A, Andronikos T, Panagopoulos I, Papakonstantinou G. Efficient reconfigurable embedded parsers. Comput Lang Syst Struct 2009;35(2):196-215. doi:10.1016/j.cl.2007.08.001.
[18] Johnson SC. Yacc - yet another compiler-compiler. Computing Science Technical Report 32, AT&T Bell Laboratories, Murray Hill, NJ; 1975.
[19] Paakki J. Attribute grammar paradigms - a high-level methodology in language implementation. ACM Comput Surv 1995;27(2):196-255.
[20] Bison - GNU parser generator. <http://www.gnu.org/software/bison/>.
[21] Gray RW, Levi SP, Heuring VP, Sloane AM, Waite WM. Eli: a complete, flexible compiler construction system. Commun ACM 1992;35(2):121-30. doi:10.1145/129630.129637.
[22] Eli: an integrated toolset for compiler construction. <http://eli-project.sourceforge.net>.
[23] Kastens U. LIDO - a specification language for attribute grammars. Betriebsdatenerfassung, Fachbereich.
[24] Utrecht University attribute grammar system. <http://www.cs.uu.nl/wiki/bin/view/HUT/AttributeGrammarSystem>.
[25] Cheng HD, Cheng X. Shape recognition using a fixed-size VLSI architecture. Int J Pattern Recognition Artif Intell 1995;9(1):1-21. doi:10.1142/S021800149500002X.
[26] Panagopoulos I, Pavlatos C, Papakonstantinou G. An embedded microprocessor for intelligent control. J Intell Robot Syst 2005;42(2):179-211. doi:10.1007/s10846-004-4107-z.
[27] Floyd RW. The syntax of programming languages - a survey. IEEE Trans Electron Comput 1964;13(4).
[28] Pavlatos C, Dimopoulos A, Papakonstantinou G. An intelligent embedded system for control applications. In: Workshop on modeling and control of complex systems, Cyprus, 2005.
[29] Dimopoulos A, Pavlatos C, Panagopoulos I, Papakonstantinou G. An efficient hardware implementation for AI applications. In: Lecture notes in computer science, vol. 3955. Berlin: Springer; 2006. p. 35-45.
[30] Russell SJ, Norvig P. Artificial intelligence: a modern approach. Englewood Cliffs, NJ: Prentice-Hall; 1995.
[31] Androutsopoulos I, Ritchie GD, Thanisch P. Natural language interfaces to databases - an introduction. CoRR cmp-lg/9503016, 1995.
[32] Technology L. English Wizard - dictionary administrator's guide. Littleton, MA, USA, 1997.
[33] Pavlatos C, Dimopoulos A, Papakonstantinou G. Hardware natural language interface. In: 4th IFIP conference on artificial intelligence applications and innovations (AIAI), Athens, Greece, 2007.
[34] Schoeberl M. A Java processor architecture for embedded real-time systems. J Syst Archit 2008;54(1-2):265-86. doi:10.1016/j.sysarc.2007.06.001.
