
ANTLR in SSP

Xingzhong Xu

Hong Man

Aug. 11


Outline

• ANTLR

• Abstract Syntax Tree

• Code Equivalence (Code Re-hosting)

• Future Work


What is ANTLR?

• ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

-- antlr.org


Why use ANTLR?

• SSP
– Looking for a framework to understand signal processing source code semantically.

• Classical analysis methods in CS
– Code recognizer: lexer and parser
– Interpreter: Cognitive Linguistic Modeling and other syntax trees
– Translator: code re-hosting, different targets


How Does ANTLR Work? - I

• Lexer
– Converts a sequence of characters into a sequence of tokens.

• Parser
– Analyzes the sequence of tokens generated by the lexer to determine its grammatical structure.

• Abstract Syntax Tree
– A tree representation of the abstract syntactic structure of source code.
– The syntax is ‘abstract’ in the sense that it does not represent every detail of the real syntax.
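The lexer step can be illustrated with a minimal hand-written sketch. This is not ANTLR output: the token names (`NUMBER`, `ID`, `EQUAL`, and so on) and the `tokenize` helper are hypothetical, chosen only to show how a character sequence becomes a token sequence.

```python
import re

# Hypothetical token names for illustration; an ANTLR grammar would define its own.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("EQUAL", r"="),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("SEMI", r";"),
    ("COMMA", r","),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Convert a character sequence into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "WS":  # whitespace is dropped (a simplification)
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("M4 = 1;"))
# [('ID', 'M4'), ('EQUAL', '='), ('NUMBER', '1'), ('SEMI', ';')]
```

A parser would then consume this token list to recover the grammatical structure.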


Example


How Does ANTLR Work? - II

• To generate the lexer, parser, and AST, we need to analyze the structure of the target code and write the corresponding ANTLR grammar.

• Example: Matrix declaration in Matlab
M1 = [1 2 3; 4 5 6];
M2 = [1,2,3;4,5,6];
M3 = [M1;M2];
M4 = 1;


ANTLR Grammar - I

M1 = [1 2 3; 4 5 6];

• Statement
– [Variable] [Equal] [Expression] [Semicolon] (optional)

• Expression
– [Left Square Bracket] [Matrix] [Right Square Bracket] or [one digit]

• Matrix
– [Line] [Semicolon] [Line] [Semicolon] …

• Line
– [digit] [comma] (optional) [digit] [comma] (optional) …
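The four rules above can be sketched as a small recursive-descent parser that builds a tree of nested tuples. This is a hand-written stand-in, not the parser ANTLR would generate: the class name, the node labels (`ASSIGN`, `MATRIX`, `ROW`, `NUM`, `VAR`), and the decision to accept variable names as matrix elements (needed for `M3 = [M1;M2];`) are all assumptions made for illustration.

```python
import re

PUNCTUATION = {"=", "[", "]", ";", ","}


def tokenize(source):
    # Numbers, identifiers, and punctuation; anything else is silently skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[=\[\];,]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, got {self.peek()!r}")
        self.pos += 1

    def statement(self):
        # Statement: [Variable] [Equal] [Expression] [Semicolon] (optional)
        name = self.peek()
        if name is None or not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected a variable name, got {name!r}")
        self.pos += 1
        self.expect("=")
        value = self.expression()
        if self.peek() == ";":
            self.pos += 1
        return ("ASSIGN", name, value)

    def expression(self):
        # Expression: '[' Matrix ']' or a single element
        if self.peek() == "[":
            self.pos += 1
            rows = self.matrix()
            self.expect("]")
            return ("MATRIX", rows)
        return self.element()

    def matrix(self):
        # Matrix: Line (';' Line)*
        rows = [self.line()]
        while self.peek() == ";":
            self.pos += 1
            rows.append(self.line())
        return rows

    def line(self):
        # Line: elements separated by optional commas
        items = [self.element()]
        while self.peek() not in ("]", ";", None):
            if self.peek() == ",":
                self.pos += 1
            items.append(self.element())
        return ("ROW", items)

    def element(self):
        token = self.peek()
        if token is None or token in PUNCTUATION:
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1
        return ("NUM", int(token)) if token.isdigit() else ("VAR", token)


def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree


print(parse("M3 = [M1;M2];"))
# ('ASSIGN', 'M3', ('MATRIX', [('ROW', [('VAR', 'M1')]), ('ROW', [('VAR', 'M2')])]))
```

In this sketch the space-separated and comma-separated forms of the same matrix produce the same tree apart from the variable name, which is the sense in which the tree is abstract.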


ANTLR Grammar - II


Abstract Syntax Tree

M1 = [1 2 3; 4 5 6];


M2 = [1,2,3;4,5,6];



M3 = [M1;M2];



M4 = 1;


AST Example from GNU Radio

• Using ANTLR, some examples from the GNU Radio code have been tested.

• http://sites.google.com/site/stevensxingzhong/





Code Equivalence

• To re-host the code, we need:
– A proper rule for abstracting the code.
– The functionality of each code segment.

• Methodology
– Abstraction
– Code segmentation
– Functionality analysis
– Replace the segment with equivalent code.


Current Methods in CS

• Syntax-tree-based comparison
– Generate an AST or another related abstract tree, then run a tree-matching algorithm.
– Use a hash function to map the tree structure and simplify the algorithm.

• Random test comparison
– Code chopper: segment the code.
– Randomly test the input/output behavior.
– Schwartz-Zippel lemma: a sufficient number of random tests can establish the functionality with high probability.
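The random test comparison can be sketched as follows. The three filter functions and the `probably_equivalent` helper are hypothetical stand-ins for code segments produced by a code chopper; integer inputs are used so that outputs can be compared exactly. For polynomial computations like these, the Schwartz-Zippel lemma bounds the chance that two different functions agree on one random input, so repeated agreement is strong (though not absolute) evidence of equivalence.

```python
import random


def fir_for(taps, samples):
    acc = 0
    for i in range(len(taps)):
        acc += taps[i] * samples[i]
    return acc


def fir_while(taps, samples):
    acc = 0
    i = 0
    while i < len(taps):
        acc += taps[i] * samples[i]
        i += 1
    return acc


def fir_reversed_taps(taps, samples):
    # Deliberately different: the taps are applied in reverse order.
    acc = 0
    n = len(taps)
    for i in range(n):
        acc += taps[n - 1 - i] * samples[i]
    return acc


def probably_equivalent(f, g, trials=200, size=8, seed=0):
    """Compare input/output behavior on random integer inputs.

    The difference of two such filters is a polynomial of degree 2 in the
    inputs. If it is not identically zero, Schwartz-Zippel bounds the chance
    that one random point (each value drawn from 2001 integers) hides the
    difference by 2/2001.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        taps = [rng.randint(-1000, 1000) for _ in range(size)]
        samples = [rng.randint(-1000, 1000) for _ in range(size)]
        if f(taps, samples) != g(taps, samples):
            return False  # a single counterexample disproves equivalence
    return True


print(probably_equivalent(fir_for, fir_while))
print(probably_equivalent(fir_for, fir_reversed_taps))
```

A `True` result only means no counterexample was found in the sampled inputs; a `False` result is conclusive.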


Simplest Filter Example

• Take the simplest filter as an example: the following code segments have exactly the same functionality.

for (i = 0; i < n; i++)
    acc0 += d_taps[i] * input[i];

// i++ is kept as its own statement: reading and incrementing i
// within one expression is undefined behavior in C.
for (i = 0; i < n; ) {
    acc0 += d_taps[i] * input[i];
    i++;
}

i = 0;
while (i < n) {
    acc0 += d_taps[i] * input[i];
    i++;
}

i = 0;
for ( ; i < n; ) {
    acc0 += d_taps[i] * input[i];
    i++;
}

$$ y_n = \sum_{i=0}^{n-1} h_i x_i $$


Ordinary AST

for (i = 0; i < n; i++) acc0 += d_taps[i] * input[i];


Modified AST

• The ordinary AST is derived at the level of the programming-language grammar.

• Following the idea of semantic signal processing, the AST can instead be abstracted in the signal processing domain. For example:
– ‘for’, ‘while’, ‘do … while’ -> ‘LOOP’
– ‘+=’, ‘VAR = VAR + whatever’ -> ‘ACCUMULATE’
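The two mapping rules can be sketched on a syntax tree as below. To keep the example runnable, Python's standard `ast` module stands in for an ANTLR-generated C parser, and the filter is written in Python; that substitution and the helper names are assumptions. Python has no `do … while`, so only `for` and `while` are mapped.

```python
import ast


def is_self_add(node):
    """Match the pattern 'VAR = VAR + whatever'."""
    return (
        isinstance(node, ast.Assign)
        and len(node.targets) == 1
        and isinstance(node.targets[0], ast.Name)
        and isinstance(node.value, ast.BinOp)
        and isinstance(node.value.op, ast.Add)
        and isinstance(node.value.left, ast.Name)
        and node.value.left.id == node.targets[0].id
    )


def domain_labels(source):
    """Return the sorted set of domain-level labels found in the source.

    A set is returned rather than counts because an index increment such as
    'i += 1' also matches the '+=' rule.
    """
    labels = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.For, ast.While)):
            labels.add("LOOP")
        elif isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Add):
            labels.add("ACCUMULATE")
        elif is_self_add(node):
            labels.add("ACCUMULATE")
    return sorted(labels)


FOR_VERSION = """
for i in range(n):
    acc0 += d_taps[i] * input[i]
"""

WHILE_VERSION = """
i = 0
while i < n:
    acc0 = acc0 + d_taps[i] * input[i]
    i += 1
"""

print(domain_labels(FOR_VERSION))    # ['ACCUMULATE', 'LOOP']
print(domain_labels(WHILE_VERSION))  # ['ACCUMULATE', 'LOOP']
```

Both variants collapse to the same label set even though their ordinary syntax trees differ. This is a deliberately coarse abstraction.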


Simplest Filter Example

for (i = 0; i < n; i++)
    acc0 += d_taps[i] * input[i];



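// i++ is kept as its own statement: reading and incrementing i
// within one expression is undefined behavior in C.
for (i = 0; i < n; ) {
    acc0 += d_taps[i] * input[i];
    i++;
}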



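// In both variants i++ is its own statement: reading and incrementing i
// within one expression is undefined behavior in C.
i = 0;
while (i < n) {
    acc0 += d_taps[i] * input[i];
    i++;
}

i = 0;
for ( ; i < n; ) {
    acc0 += d_taps[i] * input[i];
    i++;
}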


Code Equivalence

• Objective: determine from the syntax tree whether code segments are equivalent.
– Abstraction
– Tree matching

• Perform code re-hosting.
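One way to make exact tree matching cheap is to hash a canonical form of each abstracted tree and group segments by hash. The sketch below assumes trees are nested tuples of the form `(label, *children)`; the labels and segment names are hypothetical examples, not output of the actual tool.

```python
import hashlib
from collections import defaultdict


def canonical(tree):
    """Serialize a (label, *children) tuple tree into a canonical string."""
    label, *children = tree
    return label + "(" + ",".join(canonical(child) for child in children) + ")"


def tree_hash(tree):
    return hashlib.sha256(canonical(tree).encode("utf-8")).hexdigest()


def group_by_structure(segments):
    """Return lists of segment names whose abstracted trees hash identically."""
    groups = defaultdict(list)
    for name, tree in segments.items():
        groups[tree_hash(tree)].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical abstracted trees for three code segments.
SEGMENTS = {
    "for_version": ("LOOP", ("ACCUMULATE", ("MULTIPLY", ("ARRAY",), ("ARRAY",)))),
    "while_version": ("LOOP", ("ACCUMULATE", ("MULTIPLY", ("ARRAY",), ("ARRAY",)))),
    "copy_loop": ("LOOP", ("ASSIGN", ("ARRAY",), ("ARRAY",))),
}

print(group_by_structure(SEGMENTS))
# [['for_version', 'while_version']]
```

Segments that land in the same group are candidates for replacement by one equivalent implementation during re-hosting. Hashing only finds exact structural matches after abstraction.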


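Simplest Filter Example

// k, l, and j are declared outside this excerpt.
for (int i = 0; i < noutput_items; i++) {
    gr_complex sum(0.0, 0.0);
    // Dot product of the taps (in reverse order) with the input window starting at j.
    for (k = 0; k < l; k++)
        sum += d_taps[l-k-1] * in[j+k];
    out[i] = sum;
}

From gr_adaptive_fir_ccf.cc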


Abstraction

• The basic elements of the simplest filter include:
– LOOP
– ACCUMULATION
– MULTIPLY
– ARRAY
– MOVING INDEX


Similarity Tree Pattern

• No abstraction can guarantee that functionally identical code has precisely the same abstract form. Therefore, we need to perform similarity-based tree pattern recognition.

Similar enough to determine the equivalence
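As one possible reading of "similar enough", the sketch below scores two abstracted trees by the overlap of their node labels and parent-child label pairs (a Jaccard ratio over multisets). The measure, the example trees, and any threshold applied to the score are assumptions for illustration; choosing a proper method is left open by the slides.

```python
from collections import Counter


def features(tree):
    """Collect node labels and (parent, child) label pairs of a (label, *children) tree."""
    label, *children = tree
    bag = Counter([label])
    for child in children:
        bag[(label, child[0])] += 1
        bag += features(child)
    return bag


def similarity(a, b):
    """Jaccard similarity of the two feature multisets, in [0, 1]."""
    fa, fb = features(a), features(b)
    shared = sum((fa & fb).values())  # multiset intersection (minimum counts)
    total = sum((fa | fb).values())   # multiset union (maximum counts)
    return shared / total


# Hypothetical abstracted trees.
FOR_TREE = ("LOOP", ("ACCUMULATE", ("MULTIPLY", ("ARRAY",), ("ARRAY",))))
WHILE_TREE = ("LOOP", ("ACCUMULATE", ("MULTIPLY", ("ARRAY",), ("ARRAY",))), ("MOVING INDEX",))
COPY_TREE = ("LOOP", ("ASSIGN", ("ARRAY",), ("ARRAY",)))

print(round(similarity(FOR_TREE, WHILE_TREE), 3))  # 0.818
print(round(similarity(FOR_TREE, COPY_TREE), 3))   # 0.231
```

The two filter variants score much higher against each other than against the unrelated copy loop, so a threshold between those values would separate them in this toy case.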


Future Work

• Use ANTLR to generate lexers and parsers for other languages, for language recognition.

• Abstract the language into Cognitive Linguistic Modeling.

• Find a proper method for similarity-based tree pattern recognition.


References

1. Parr, T. 2007. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers.

2. http://www.antlr.org

3. http://www.stringtemplate.org

4. Jiang, L., and Su, Z. 2009. Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.

5. Gabel, M., Jiang, L., and Su, Z. 2008. Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software Engineering.

6. Roy, C. K., Cordy, J. R., and Koschke, R. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming.

7. Bertran, M., Babot, F., and Climent, A. 2005. An input/output semantics for distributed program equivalence reasoning. Electron. Notes Theor. Comput. Sci. 137, 1 (Jul. 2005).
