Session # 2221

YACC no more

Sriram Srinivasan (“Ram”)

Integrating parsers, interpreters and compilers into your application


This is he

• Sriram Srinivasan

• One of the core engineers of the WebLogic app server
  – Wrote the first commercially available EJB implementation
  – Wrote the TP engine in WLS

• Author: “Advanced Perl Programming” (O’Reilly)

Why this talk?

• Quest for higher-level programming patterns
  – More productive, faster, more maintainable, etc.

• Integrating compilers, parsers, and interpreters into your application

Embeddable Parsers

• JDK parsers for configuration data
  – java.util.Properties, XML, regex library

• java.util.Properties
  – Limited to “property = value” format
  – Takes care of comments, multi-line values, quotes

Case Study: Configuration Data

    # app server properties
    connectionPoolName = testPool
    numThreads = 10

    …
    Properties p = new Properties();
    p.load(inputStream);
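The properties example above can be tried without a file by loading from a string. This is a minimal sketch: the class name ConfigLoader and its load helper are illustrative names, not part of the talk, and the only JDK calls used are Properties.load(Reader) and getProperty.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative helper (hypothetical name) around java.util.Properties. */
final class ConfigLoader {
    /** Parses "name = value" text into a Properties object. */
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            // load() returns void, so it cannot be chained onto the constructor.
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = load(
            "# app server properties\n" +
            "connectionPoolName = testPool\n" +
            "numThreads = 10\n");
        // Values are always strings; numeric settings need explicit conversion.
        int numThreads = Integer.parseInt(p.getProperty("numThreads", "1"));
        System.out.println(p.getProperty("connectionPoolName") + " " + numThreads);
    }
}
```

Every value comes back as a String, which is one reason the format stops being enough once expressions are wanted in the configuration.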


XML parsers

• Good for structured, hierarchical data

• DOM (Document Object Model) parser
  – Converts an entire XML document into a corresponding tree of Nodes

• SAX (Simple API for XML)
  – Callback class extends DefaultHandler
  – Supplies methods for startDocument(…), startElement(…), endElement(…) etc.
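As a rough illustration of the SAX callback style, the sketch below extends DefaultHandler and records each element name with its nesting depth. The class name ElementLister and the sample XML are assumptions made for the example; the parser calls are the standard JAXP ones shipped with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative SAX client (hypothetical name): lists elements as "depth:name". */
final class ElementLister {
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(depth + ":" + qName);   // called once per opening tag
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;                          // called once per closing tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<server><pool name=\"testPool\"/><threads>10</threads></server>";
        System.out.println(elementNames(xml));    // prints [0:server, 1:pool, 1:threads]
    }
}
```

Unlike DOM, nothing is kept in memory except what the handler chooses to record.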


Adding code to data

• Problem: We want to add macros and expressions to our properties.

    numThreads = numProcessors
    # Ensure that connection pool is smaller than
    # thread pool.
    connectionPoolSize = max(numThreads - 2, 1)

• This requires an expression evaluator

Embeddable interpreters

• Plethora of free, high-quality interpreters available
  – BeanShell (Java-like syntax)
  – Rhino (JavaScript)
  – Jython (Python in Java)
  – Kawa (Scheme in Java)

• When embedded, flow of control passes easily from Java to the interpreter and back

• Command-line shell always included

BeanShell

• Expressions identical to Java

• Types are inferred dynamically

    add( a, b ) { return a + b; }

    sum = add(1, 2);            // 3
    str = add("Web", "Logic");  // "WebLogic"

Embedding BeanShell

    import bsh.Interpreter;
    import java.io.FileReader;

    Interpreter i = new Interpreter();
    i.set("foo", 5);
    i.eval("bar = foo*10");
    System.out.println("bar = " + i.get("bar"));

    i.eval(new FileReader("config.properties"));
    // get() returns Object, so the result must be cast
    Integer n = (Integer) i.get("connectionPoolSize");

• Instead of writing code to parse the properties file, just eval it!
  – Comments should be “// …”, not “# …”
  – Each property definition line should end in “;”

BeanShell features

• Strict Java expression syntax – no class declarations

• Loose convenience syntax

    b = new java.awt.Button();
    b.label = "Yo";  // equivalent to b.setLabel("Yo")
    h = new Hashtable();
    h{"spud"} = "potato";
    // Swing stuff
    b = new JButton("My Button");
    f = new JFrame("My Frame");
    f.getContentPane().add(b, "Center");
    f.pack();
    f.show();

Rhino

• Free ECMAScript interpreter from Mozilla

• Slightly more cumbersome to embed than BeanShell

• Contains a bytecode compiler that can be called from within Java

• Closures

• Regex support built in. Good for text manipulation

Case study: Command pattern

• Undo/Redo in an editor

    function insertCommand(text) {
        this.pos = buf.pos;
        buf.insert(text);
        this.len = text.length;
        this.undo = function () {
            buf.moveTo(this.pos);
            buf.erase(this.len);
        };
        undoStack.push(this);
    }

    new insertCommand("foo");
    undoStack.pop().undo();
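For comparison, the same undo stack can be sketched in Java, where a lambda plays the role of the JavaScript closure. This is only an illustration under stated assumptions: the StringBuilder stands in for the editor buffer, text is always appended at the end, and the names UndoDemo, Command and insert are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative undo stack (hypothetical names), using lambdas as closures. */
final class UndoDemo {
    /** A command only needs to know how to reverse itself. */
    interface Command { void undo(); }

    // Stand-ins for the editor buffer and undo stack used on the slide.
    static final StringBuilder buf = new StringBuilder();
    static final Deque<Command> undoStack = new ArrayDeque<>();

    /** Appends text to the buffer and records how to undo the insertion. */
    static void insert(String text) {
        final int pos = buf.length();   // captured by the lambda below
        final int len = text.length();
        buf.append(text);
        undoStack.push(() -> buf.delete(pos, pos + len));
    }

    public static void main(String[] args) {
        insert("foo");
        insert("bar");
        undoStack.pop().undo();         // removes "bar"
        System.out.println(buf);        // prints foo
    }
}
```

Each command carries its own captured position and length, so the stack needs no knowledge of what kind of edit it is reversing.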

Python

• Python (Java implementation is "Jython")
  – Powerful high-level language
  – Compiles to bytecode
  – True scripting language
  – Can extend Java classes
  – Static compilation and standalone execution

More case studies

• Embedded expressions
  – Spreadsheet formulae

• Customizable GUIs
  – Macro facility, keyboard mapping

• Remote agents

• Monitoring

• Performance through partial evaluation

Case Study: Remote Agents

• Example: Test Agents

• Can upload a script to each agent to launch processes and control them locally
  – Jython is well-suited for this kind of task

• Example: Scriptable IMAP mail server
  – "All messages that contain this regex, make a copy in this folder"

Case Study: Monitoring

• SNMP model: Obtain attributes from each node over the network, then do the calculation

• Alternatively, upload a script to each node, and let it return the result
  – Conserves network bandwidth

• Can insert any kind of probe
  – Study application data structures
  – Application-specific profiling

Case Study: Performance

• Partial evaluation can yield substantial performance benefits

• Object-RDBMS adaptors
  – Code generator studies class and db schema
  – Omits unnecessary conversions, null checks

• Vector dot product

    dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];

    // But if 'a' is fixed {16,0,4} …
    // (parentheses are required: '+' binds tighter than '<<')
    dp = (b[0] << 4) + (b[2] << 2);
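The dot-product specialization can be sketched as a small partial evaluator that inspects the fixed vector once and returns a function containing only the work that remains. This is an illustration, not the technique used in any product mentioned in the talk: it composes lambdas rather than generating source or bytecode, so it shows the idea but would not deliver the full speedup. The name DotProductSpecializer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Illustrative partial evaluator (hypothetical name) for a fixed vector a. */
final class DotProductSpecializer {
    /** Builds a dot-product function specialized for the given coefficients. */
    static ToIntFunction<int[]> specialize(int[] a) {
        final List<ToIntFunction<int[]>> terms = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            final int idx = i;
            final int coeff = a[i];
            if (coeff == 0) {
                continue;                              // zero terms are dropped
            } else if (coeff > 0 && Integer.bitCount(coeff) == 1) {
                final int shift = Integer.numberOfTrailingZeros(coeff);
                terms.add(b -> b[idx] << shift);       // power of two: shift
            } else {
                terms.add(b -> coeff * b[idx]);        // general case: multiply
            }
        }
        return b -> {
            int sum = 0;
            for (ToIntFunction<int[]> t : terms) {
                sum += t.applyAsInt(b);
            }
            return sum;
        };
    }

    public static void main(String[] args) {
        ToIntFunction<int[]> dp = specialize(new int[] {16, 0, 4});
        System.out.println(dp.applyAsInt(new int[] {1, 2, 3}));  // prints 28
    }
}
```

The decisions about zeros and powers of two are made once, when the function is built, instead of on every call.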

Generating Java

• Moving from embedded interpreters to generating Java source
  – Example: JSP

• Convert template to Java, compile and dynamically load

• BEA/WebLogic's weblogic.dtdc
  – Converts an XML DTD to a high-performance SAX parser tuned to that DTD

Generating code with Doclets

• javadoc is a general-purpose parser

    javadoc -doclet ListClass foo.java

• ListClass.start() called with a hierarchy of *Doc nodes

    import com.sun.javadoc.*;

    public class ListClass {
        public static boolean start(RootDoc root) {
            ClassDoc[] classes = root.classes();
            for (int i = 0; i < classes.length; ++i) {
                System.out.println(classes[i]);
            }
            return true;
        }
    }

• Arbitrary tags can be introduced at any level

Case study: iContract

• Pattern: doclet expressions converted to annotated Java code

    /**
     * Ensure that argument is always >= 0
     * @pre f >= 0.0
     *
     * Ensure that the function produces the sqrt
     * within a
     * @post Math.abs((return * return) - f) < 0.001
     */
    public float sqrt(float f) { ... }

Case Study: EJBGen

    /**
     * @ejbgen:entity
     *   ejb-name = AccountEJB-OneToMany
     *   data-source-name = demoPool
     *   table-name = Accounts
     */
    abstract public class AccountBean implements EntityBean {
        /**
         * @ejbgen:cmp-field column = acct_id
         * @ejbgen:primkey-field
         * @ejbgen:remote-method transaction-attribute = Required
         */
        abstract public String getAccountId();
    }

Generating bytecode

• Example: WebLogic RMI adaptors

• Sometimes, some facilities are available only in bytecode (gotos!)

• Example: fast string matching
  – Given a search string, encode the state machine into bytecode
  – Worth it if the same pattern is going to be used many times
    • Virus scanners
    • Searching genome sequences

Example: String matching

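• Problem: match "10100"
  – Convert to a state machine
  – Each state encodes a successful prefix match

[State diagram: S0 → S1 → S2 → S3 → S4 → S5 on the inputs 1, 0, 1, 0, 0; on a mismatch, an edge leads back to the state for the longest prefix still matched.]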

String matching (contd.)

• If only goto were allowed in Java …

• But gotos are allowed in bytecode!

    try {
        // buf is the buffer to be searched
        int i = -1;
        s0: i++; if (buf[i] != '1') goto s0;
        s1: i++; if (buf[i] != '0') goto s1;
        s2: i++; if (buf[i] != '1') goto s0;
        s3: i++; if (buf[i] != '0') goto s1;
        s4: i++; if (buf[i] != '0') goto s3;
        s5: i++; return i-5;
    } catch (ArrayIndexOutOfBoundsException e) {
        return -1;
    }
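Legal Java can approximate the goto version with a switch over an integer state, at the cost of one extra dispatch per character. The sketch below follows the same transitions as the pseudocode above and, like it, assumes the input contains only '0' and '1'. The class name Match10100 and the method find are illustrative.

```java
/** Illustrative state-machine matcher (hypothetical name) for the pattern "10100". */
final class Match10100 {
    /** Returns the start index of the first "10100" in buf, or -1 if absent. */
    static int find(char[] buf) {
        int state = 0;   // number of pattern characters matched so far
        for (int i = 0; i < buf.length; i++) {
            char c = buf[i];
            switch (state) {
                case 0: state = (c == '1') ? 1 : 0; break;
                case 1: state = (c == '0') ? 2 : 1; break;
                case 2: state = (c == '1') ? 3 : 0; break;
                case 3: state = (c == '0') ? 4 : 1; break;
                case 4: state = (c == '0') ? 5 : 3; break;
                default: break;
            }
            if (state == 5) {
                return i - 4;   // match ends at i and is five characters long
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("0010100".toCharArray()));  // prints 2
        System.out.println(find("1011".toCharArray()));     // prints -1
    }
}
```

The bytecode version removes the state variable entirely: the current state is simply the position in the instruction stream.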

String matching (contd.)

• Using an assembler like jasmin

        iconst_m1
        istore_1
    S0: ;; i++; if a[i] != '1' goto S0;
        iinc 1 1        ; i++
        aload_0         ; load a[i]
        iload_1
        caload
        bipush 49       ; load '1'
        if_icmpne S0    ; if .. goto S0
    S1: ;; i++; if a[i] != '0' goto S1
        iinc 1 1
        aload_0
        iload_1
        caload
        bipush 48
        if_icmpne S1

Custom languages

• Craft a language that fits the context you are working in
  – Avoid XML ugliness: SRML (Simple Rule Markup Language)
  – Instead of "if s.purchaseAmount > 100 … "

    <simpleCondition className="ShoppingCart" objectVariable="s">
        <binaryExp operator="gt">
            <field name="purchaseAmount"/>
            <constant type="float" value="100"/>
        </binaryExp>
    </simpleCondition>

Antlr Introduction

• Antlr: generates recursive-descent parsers with configurable lookahead (LL(k))

• Much, much simpler than lex/yacc
  – Yacc error messages are cryptic, tough for non-CS types to understand
  – Even the generated code is easy to understand

• Includes tree building and recognition
  – No such facility in yacc

• Lexer, parser and tree recognizer phases have similar syntax

Antlr

• Example: hierarchical property list
  – A list consists of name-value pairs
  – Names are identifiers, values are numbers or lists

    ( a 200 b (c 10 d 20))

Antlr (contd.)

    class LispLexer extends Lexer;

    ID : ('a' .. 'z')+;
    NUM: ('0' .. '9')+;
    LP : '(';
    RP : ')';

    class LispParser extends Parser;

    list : LP (nameValuePair)+ RP;
    nameValuePair : ID value ;
    value : NUM | list;
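To give a feel for what a recursive-descent parser for this grammar looks like, here is a hand-written Java sketch with one method per rule. It is not ANTLR's generated output. Two things are assumptions beyond the grammar shown: whitespace between tokens is skipped, and the result is returned as nested maps. The class name PropertyListParser is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative hand-written recursive-descent parser (hypothetical name). */
final class PropertyListParser {
    private final String src;
    private int pos = 0;

    private PropertyListParser(String src) { this.src = src; }

    /** Parses text such as "( a 200 b (c 10 d 20))" into nested maps. */
    static Map<String, Object> parse(String text) {
        PropertyListParser p = new PropertyListParser(text);
        Map<String, Object> result = p.list();
        p.skipSpaces();
        if (p.pos != text.length()) throw p.error("trailing input");
        return result;
    }

    // list : LP (nameValuePair)+ RP ;
    private Map<String, Object> list() {
        expect('(');
        Map<String, Object> map = new LinkedHashMap<>();
        do {
            String name = id();          // nameValuePair : ID value ;
            map.put(name, value());
            skipSpaces();
        } while (pos < src.length() && src.charAt(pos) != ')');
        expect(')');
        return map;
    }

    // value : NUM | list ;  (one character of lookahead picks the alternative)
    private Object value() {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == '(') return list();
        return Integer.valueOf(scan('0', '9', "number"));
    }

    // ID : ('a' .. 'z')+ ;
    private String id() {
        skipSpaces();
        return scan('a', 'z', "identifier");
    }

    /** Consumes one or more characters in the range lo..hi. */
    private String scan(char lo, char hi, String what) {
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= lo && src.charAt(pos) <= hi) pos++;
        if (start == pos) throw error("expected " + what);
        return src.substring(start, pos);
    }

    private void expect(char c) {
        skipSpaces();
        if (pos >= src.length() || src.charAt(pos) != c) throw error("expected '" + c + "'");
        pos++;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("( a 200 b (c 10 d 20))"));  // prints {a=200, b={c=10, d=20}}
    }
}
```

The method bodies mirror the grammar rules almost line for line, which is why generated recursive-descent code tends to be readable.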

Antlr (contd.)

• Adding code, arguments, return values

    nameValuePair returns [NVP ret=null]
        {Object v;}
        : t:ID v=value
          {ret = new NVP(t.getText(),v);}
        ;

    value returns [Object ret=null]
        : t:NUM {ret=t.getText();}
        | ret=list
        ;

Way out there …

• Configurable hardware
  – New circuits on the fly

• Intentional programming
  – Code not represented as a stream of characters

Summary

• Run-time evaluation gives you a lot of power

• Other languages add features (e.g. closures) to Java

• Lots of simple, free, quality parsers and interpreters

• Produce custom Java source or bytecode for performance

• Roll your own domain-specific language with ANTLR or JavaCC

• Yacc No More.

References

• Doclets
  – Doclet tools: www.doclet.com
  – EJBGen: www.beust.com, Cedric Beust
  – iContract: www.reliable-systems.com, Reto Kramer

• Languages, interpreters
  – BeanShell: www.beanshell.org
  – Rhino: www.mozilla.org/rhino
  – Python: www.python.org, www.jython.org
  – ANTLR: www.antlr.org
  – More …: flp.cs.tu-berlin.de/~tolk/vmlanguages.html

• SRML: xml.coverpages.org/srml.html

References (contd.)

• Bytecode manipulation
  – Jasmin: mrl.nyu.edu/~meyer/jasmin/
  – Jikes Bytecode Toolkit: www.alphaworks.ibm.com/tech/jikesbt
  – BCEL: bcel.sourceforge.net

• "Rapid" - Reconfigurable hardware
  – www.cs.washington.edu/research

• "The death of computer languages, the birth of intentional programming", Charles Simonyi
  – research.microsoft.com/scripts/pubs/trpub.asp
  – Microsoft tech report MSR-TR-95-52

• Thinking in Patterns with Java, Bruce Eckel
  – www.mindview.net/Books/TIPatterns