Session # 2221
YACC no more
Sriram Srinivasan (“Ram”)
Integrating parsers, interpreters and compilers into your application
This is he
• Sriram Srinivasan
• One of the core engineers of the WebLogic app server
  – Wrote the first commercially available EJB implementation
  – Wrote the TP engine in WLS
• Author: “Advanced Perl Programming” (O’Reilly)
Why this talk?
• Quest for higher level programming patterns
  – More productive / faster / maintainable etc…
• Integrating compilers, parsers, interpreters into your application
Embeddable Parsers
• JDK parsers for configuration data
  – java.util.Properties, XML, regex library
• java.util.Properties
  – Limited to “property = value” format
  – Takes care of comments, multi-line values, quotes
Case Study: Configuration Data
# app server properties
connectionPoolName = testPool
numThreads = 10
…
Properties p = new Properties();
p.load(inputStream);   // load() returns void, so it cannot be chained onto the constructor
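A minimal, self-contained sketch of the same idea, reading the sample properties from a string rather than a file. The class name PropertiesDemo and the helper loadFrom are made up for illustration; only java.util.Properties itself comes from the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesDemo {
    /** Parses "property = value" text into a Properties object (hypothetical helper). */
    static Properties loadFrom(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String text = "# app server properties\n"
                    + "connectionPoolName = testPool\n"
                    + "numThreads = 10\n";
        Properties p = loadFrom(text);
        // Values are always strings; numeric properties need explicit conversion.
        int numThreads = Integer.parseInt(p.getProperty("numThreads"));
        System.out.println(p.getProperty("connectionPoolName") + " " + numThreads);
    }
}
```

Note that every value comes back as a string, which is one reason the format cannot express computed settings.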
XML parsers
• Good for structured, hierarchical data
• DOM (Document Object Model) parser
  – Converts an entire XML document into a corresponding tree of Nodes.
• SAX (Simple API for XML)
  – Callback class extends DefaultHandler
  – Supplies methods for startDocument(…), startElement(…), endElement(…) etc.
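The SAX callback style can be sketched as follows, using only the JDK's built-in parser. The element name "property", its "name" and "value" attributes, and the class SaxConfigDemo are assumptions chosen for illustration; they are not from the talk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxConfigDemo {
    /** Collects "name=value" strings from <property name=".." value=".."/> elements. */
    static class PropertyHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Called by the parser once per opening tag, in document order.
            if ("property".equals(qName)) {
                seen.add(attrs.getValue("name") + "=" + attrs.getValue("value"));
            }
        }
    }

    static List<String> parse(String xml) {
        PropertyHandler handler = new PropertyHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration", e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        String xml = "<config>"
                   + "<property name=\"connectionPoolName\" value=\"testPool\"/>"
                   + "<property name=\"numThreads\" value=\"10\"/>"
                   + "</config>";
        System.out.println(parse(xml));
    }
}
```

Unlike DOM, nothing is kept in memory unless the handler stores it, which is why the handler here accumulates its own list.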
Adding code to data
• Problem: We want to add macros and expressions to our properties.
numThreads = numProcessors
# Ensure that connection pool is smaller than
# thread pool.
connectionPoolSize = min(numThreads - 2, 1)
• This requires an expression evaluator
Embeddable interpreters
• Plethora of free, high quality interpreters available
  – BeanShell (Java-like syntax)
  – Rhino (JavaScript)
  – Jython (Python in Java)
  – Kawa (Scheme in Java)
• When embedded, control flows easily from Java to the interpreter and back.
• Command-line shell always included
BeanShell
• Expressions identical to java
• Types are inferred dynamically
add( a, b ) { return a + b; }
sum = add(1, 2);            // 3
str = add("Web", "Logic");  // "WebLogic"
Embedding BeanShell
import bsh.Interpreter;
Interpreter i = new Interpreter();
i.set("foo", 5);
i.eval("bar = foo*10");
System.out.println("bar = " + i.get("bar"));

// Evaluate the whole configuration file as a script
i.eval(new FileReader("config.properties"));
Integer n = (Integer) i.get("connectionPoolSize");   // get() returns Object
• Instead of writing code to parse the properties file, just eval it!
  – Comments should be “// …”, not “# …”
  – Each property definition line should end in “;”
BeanShell features
• Strict Java expression syntax – no class declarations
• Loose convenience syntax
b = new java.awt.Button();
b.label = "Yo";   // equivalent to b.setLabel("Yo")
h = new Hashtable();
h{"spud"} = "potato";
// Swing stuff
b = new JButton("My Button");
f = new JFrame("My Frame");
f.getContentPane().add(b, "Center");
f.pack();
f.show();
Rhino
• Free ECMAScript interpreter from Mozilla
• Slightly more cumbersome to embed than BeanShell
• Contains bytecode compiler that can be called from within java
• Closures
• Regex support built-in. Good for text manipulation
Case study: Command pattern
• Undo/Redo in an editor
function insertCommand(text) {
    this.pos = buf.pos
    buf.insert(text)
    this.len = text.length
    this.undo = function () {
        buf.moveTo(this.pos)
        buf.erase(this.len);
    }
    undoStack.push(this);
}
new insertCommand("foo")
undoStack.pop().undo()
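For comparison, here is a rough Java sketch of the same undo stack, with a lambda standing in for the JavaScript closure. The class UndoDemo and its StringBuilder buffer are illustrative assumptions; the slide's buf object is not specified further, so this version only appends at the end of the buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UndoDemo {
    private final StringBuilder buf = new StringBuilder();
    // Each entry is a closure that reverses exactly one earlier edit.
    private final Deque<Runnable> undoStack = new ArrayDeque<>();

    /** Appends text and records how to remove it again. */
    void insert(String text) {
        final int pos = buf.length();
        final int len = text.length();
        buf.append(text);
        // The lambda captures pos and len, like this.pos / this.len on the slide.
        undoStack.push(() -> buf.delete(pos, pos + len));
    }

    /** Reverses the most recent edit, if there is one. */
    void undo() {
        if (!undoStack.isEmpty()) {
            undoStack.pop().run();
        }
    }

    String contents() {
        return buf.toString();
    }

    public static void main(String[] args) {
        UndoDemo editor = new UndoDemo();
        editor.insert("foo");
        editor.insert("bar");
        editor.undo();
        System.out.println(editor.contents());
    }
}
```

A redo stack could be added the same way, by pushing the inverse closure whenever an undo runs.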
Python
• Python (Java implementation is "Jython") – powerful high-level language
  – Compiles to bytecode
  – True scripting language
  – Can extend Java classes
  – Static compilation and standalone execution
More case studies
• Embedded expressions
  – Spreadsheet formulae
• Customizable GUIs
  – Macro facility, keyboard mapping
• Remote agents
• Monitoring
• Performance through partial evaluation
Case Study: Remote Agents
• Example: Test Agents
• Can upload script to each agent to launch processes, control them locally.
  – Jython is well-suited for this kind of task
• Example: Scriptable IMAP mail server
  – "All messages that contain this regex, make a copy in this folder"
Case Study: Monitoring
• SNMP model: Obtain attributes from each node over the network, do calculation
• Alternatively, upload script to each node, and let it return the result
  – Conserves network bandwidth
• Can insert any kind of probe
  – Study application data structures
  – Application-specific profiling
Case Study: Performance
• Partial evaluation can yield substantial performance benefits
• Object-RDBMS adaptors
  – Code generator studies class and db schema
  – Omits unnecessary conversions, null checks
• Vector dot product
dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
// But if 'a' is fixed {16,0,4} …
// (parentheses are required: in Java, + binds tighter than <<)
dp = (b[0] << 4) + (b[2] << 2);
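One way to sketch the specialization step in plain Java: inspect the fixed vector once, drop zero terms, turn power-of-two coefficients into shifts, and return a function of b alone. The class DotProductSpecializer and its method names are hypothetical. Composing closures like this only illustrates the idea; the performance benefit the slide refers to would come from generating source or bytecode for the specialized expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class DotProductSpecializer {
    /** Builds a dot-product function specialized for the fixed vector a. */
    static ToIntFunction<int[]> specialize(int[] a) {
        List<ToIntFunction<int[]>> terms = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            final int idx = i;
            final int coeff = a[i];
            if (coeff == 0) {
                continue;                       // zero terms contribute nothing
            } else if (coeff > 0 && Integer.bitCount(coeff) == 1) {
                // Power of two: multiply becomes a shift, e.g. 16 -> << 4.
                final int shift = Integer.numberOfTrailingZeros(coeff);
                terms.add(b -> b[idx] << shift);
            } else {
                terms.add(b -> coeff * b[idx]); // general case
            }
        }
        // The decisions above were made once; only the surviving terms run per call.
        return b -> {
            int sum = 0;
            for (ToIntFunction<int[]> term : terms) {
                sum += term.applyAsInt(b);
            }
            return sum;
        };
    }

    public static void main(String[] args) {
        ToIntFunction<int[]> dot = specialize(new int[] {16, 0, 4});
        System.out.println(dot.applyAsInt(new int[] {1, 2, 3}));
    }
}
```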
Generating java
• Moving from embedded interpreters to generating Java source
  – Example: JSP.
• Convert template to Java, compile and dynamically load
• BEA/WebLogic's weblogic.dtdc
  – Converts XML DTD to a high performance SAX parser tuned to that DTD
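As a small illustration of the "generate Java source" step only, the sketch below emits a class whose method is unrolled for a fixed coefficient vector. DotProductSourceGen and the generated class name are invented for this example; compiling and loading the result (for instance with the JDK's javax.tools.JavaCompiler) is deliberately left out.

```java
public class DotProductSourceGen {
    /** Returns Java source for a class whose dot(int[] b) is unrolled for the fixed vector a. */
    static String generate(String className, int[] a) {
        StringBuilder expr = new StringBuilder();
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0) {
                continue;                 // omit terms that cannot contribute
            }
            if (expr.length() > 0) {
                expr.append(" + ");
            }
            expr.append("(").append(a[i]).append(" * b[").append(i).append("])");
        }
        if (expr.length() == 0) {
            expr.append("0");             // all coefficients were zero
        }
        return "public final class " + className + " {\n"
             + "    public static int dot(int[] b) {\n"
             + "        return " + expr + ";\n"
             + "    }\n"
             + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(generate("Dot_16_0_4", new int[] {16, 0, 4}));
    }
}
```

The generator leaves constant multiplications as written and relies on the Java compiler and JIT to pick the cheapest instruction sequence.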
Generating code with Doclets
• javadoc is a general purpose parser
javadoc -doclet ListClass foo.java
• ListClass.start() called with a hierarchy of *Doc nodes
import com.sun.javadoc.*;

public class ListClass {
    // Entry point that javadoc calls once the sources have been parsed
    public static boolean start(RootDoc root) {
        ClassDoc[] classes = root.classes();
        for (int i = 0; i < classes.length; ++i) {
            System.out.println(classes[i]);
        }
        return true;
    }
}
• Arbitrary tags can be introduced at any level
Case study: iContract
• Pattern: doclet expressions converted to annotated java code
/**
 * Ensure that argument is always >= 0
 * @pre f >= 0.0
 *
 * Ensure that the function produces the sqrt
 * within a
 * @post Math.abs((return * return) - f) < 0.001
 */
public float sqrt(float f) { ... }
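To make the pattern concrete, the following is a hand-written sketch of the kind of checked code such a tool could produce from the @pre and @post tags. It is not actual iContract output: the class CheckedSqrt, the exception types, and the Newton-iteration body are all assumptions made for illustration.

```java
public class CheckedSqrt {
    /** Stand-in for the original method body (Newton's iteration); not from the talk. */
    static float sqrtBody(float f) {
        if (f == 0.0f) {
            return 0.0f;
        }
        double x = Math.max(f, 1.0);
        for (int i = 0; i < 200; i++) {
            x = 0.5 * (x + f / x);
        }
        return (float) x;
    }

    /** The same method with the @pre and @post expressions woven in as runtime checks. */
    static float sqrt(float f) {
        // @pre f >= 0.0
        if (!(f >= 0.0)) {
            throw new IllegalArgumentException("precondition violated: f >= 0.0");
        }
        float result = sqrtBody(f);
        // @post Math.abs((return * return) - f) < 0.001
        // Note: with float precision this absolute tolerance can fail for large f.
        if (!(Math.abs((result * result) - f) < 0.001)) {
            throw new IllegalStateException("postcondition violated for f = " + f);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sqrt(2.0f));
    }
}
```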
Case Study: EJBGen
/**
 * @ejbgen:entity
 *   ejb-name = AccountEJB-OneToMany
 *   data-source-name = demoPool
 *   table-name = Accounts
 */
abstract public class AccountBean implements EntityBean {
    /**
     * @ejbgen:cmp-field column = acct_id
     * @ejbgen:primkey-field
     * @ejbgen:remote-method transaction-attribute = Required
     */
    abstract public String getAccountId();
Generating bytecode
• Example: WebLogic RMI adaptors
• Sometimes, some facilities are available only in bytecode (goto's!)
• Example: fast string matching
  – Given a search string, encode the state machine into bytecode
  – Worth it if the same pattern is going to be used many times
    • Virus scanners
    • Searching genome sequences
Example: String matching
• Problem: match "10100"
  – Convert to a state machine
  – Each state encodes a successful prefix match
[Figure: state diagram with states S0 to S5; the forward transitions are labeled 1, 0, 1, 0, 0, with back edges taken on mismatches]
String matching (contd.)
• If only goto were allowed in java …
• But, goto's are allowed in bytecode!
try {
    // buf is the buffer to be searched
    int i = -1;
    s0: i++; if (buf[i] != '1') goto s0;
    s1: i++; if (buf[i] != '0') goto s1;
    s2: i++; if (buf[i] != '1') goto s0;
    s3: i++; if (buf[i] != '0') goto s1;
    s4: i++; if (buf[i] != '0') goto s3;
    s5: i++; return i-5;
} catch (ArrayIndexOutOfBoundsException e) {
    return -1;
}
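Since the listing above is not legal Java, here is a sketch of the same state machine written with a state variable and a switch, which does compile. The class name Match10100 is hypothetical, and like the slide's version it assumes the buffer contains only '0' and '1' (any other character is treated as '0' here).

```java
public class Match10100 {
    /** Returns the start index of the first "10100" in buf, or -1 if it does not occur. */
    static int find(char[] buf) {
        int state = 0;   // number of pattern characters matched so far
        for (int i = 0; i < buf.length; i++) {
            boolean one = buf[i] == '1';
            switch (state) {
                case 0: state = one ? 1 : 0; break;   // ""     : want '1'
                case 1: state = one ? 1 : 2; break;   // "1"    : want '0'
                case 2: state = one ? 3 : 0; break;   // "10"   : want '1'
                case 3: state = one ? 1 : 4; break;   // "101"  : want '0'
                case 4: state = one ? 3 : 5; break;   // "1010" : want '0'
                default: throw new IllegalStateException("unreachable state " + state);
            }
            if (state == 5) {
                return i - 4;   // i is the index of the last matched character
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("0010100".toCharArray()));
    }
}
```

The switch costs a dispatch per character that the goto version avoids, which is the motivation for dropping down to bytecode.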
String matching (contd.)
• Using an assembler like jasmin
    iconst_m1
    istore_1
S0: ;; i++; if a[i] != '1' goto S0;
    iinc 1 1        ; i++
    aload_0         ; load a[i]
    iload_1
    caload
    bipush 49       ; load '1'
    if_icmpne S0    ; if .. goto S0
S1: ;; i++; if a[i] != '0' goto S1
    iinc 1 1
    aload_0
    iload_1
    caload
    bipush 48
    if_icmpne S1
Custom languages
• Craft a language that fits the context you are working in
  – Avoid XML ugliness: SRML (Simple Rule Markup Language)
  – Instead of "if s.purchaseAmount > 100 …", SRML requires:
<simpleCondition className="ShoppingCart" objectVariable="s">
  <binaryExp operator="gt">
    <field name="purchaseAmount"/>
    <constant type="float" value="100"/>
  </binaryExp>
</simpleCondition>
Antlr Introduction
• ANTLR: generates recursive descent parsers with configurable lookahead (LL(k))
• Much, much simpler than lex/yacc
  – Yacc error messages are cryptic, tough for non-CS types to understand
  – Even the generated code is easy to understand
• Includes tree building and recognition
  – No such facility in yacc
• Lexer, parser and tree recognizer phases have similar syntax
Antlr
• Example: hierarchical property list
  – A list consists of name value pairs
  – Names are identifiers, values are numbers or lists
( a 200 b (c 10 d 20))
Antlr (contd.)
class LispLexer extends Lexer;
ID : ('a' .. 'z')+;
NUM: ('0' .. '9')+;
LP : '(';
RP : ')';
class LispParser extends Parser;
list : LP (nameValuePair)+ RP;
nameValuePair : ID value ;
value : NUM | list;
Antlr (contd.)
nameValuePair returns [NVP ret=null]
{Object v;}
: t:ID v=value
{ret = new NVP(t.getText(),v);}
;
value returns [Object ret=null]
: t:NUM {ret=t.getText();}
| ret=list
;
• Adding code, arguments, return values
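To show roughly what a recursive descent parser for this grammar looks like, here is a hand-written Java sketch with one method per grammar rule. It is not ANTLR-generated code: the class PropertyListParser, the use of a Map in place of the slide's NVP class, and the whitespace skipping are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyListParser {
    private final String src;
    private int pos = 0;

    private PropertyListParser(String src) {
        this.src = src;
    }

    /** Parses a complete list such as "( a 200 b (c 10 d 20))". */
    static Map<String, Object> parse(String src) {
        PropertyListParser p = new PropertyListParser(src);
        Map<String, Object> result = p.list();
        p.skipSpaces();
        if (p.pos != src.length()) {
            throw p.error("trailing input");
        }
        return result;
    }

    // list : LP (nameValuePair)+ RP ;
    private Map<String, Object> list() {
        expect('(');
        Map<String, Object> pairs = new LinkedHashMap<>();
        do {
            nameValuePair(pairs);
            skipSpaces();
        } while (pos < src.length() && src.charAt(pos) != ')');
        expect(')');
        return pairs;
    }

    // nameValuePair : ID value ;
    private void nameValuePair(Map<String, Object> into) {
        String name = token('a', 'z', "identifier");
        into.put(name, value());
    }

    // value : NUM | list ;   (one character of lookahead picks the alternative)
    private Object value() {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == '(') {
            return list();
        }
        return token('0', '9', "number");   // kept as text, like t.getText()
    }

    // Lexer rules ID and NUM: one or more characters in the range lo..hi.
    private String token(char lo, char hi, String what) {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= lo && src.charAt(pos) <= hi) {
            pos++;
        }
        if (start == pos) {
            throw error("expected " + what);
        }
        return src.substring(start, pos);
    }

    private void expect(char c) {
        skipSpaces();
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw error("expected '" + c + "'");
        }
        pos++;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("( a 200 b (c 10 d 20))"));
    }
}
```

Each method mirrors one rule, which is why the slides describe the generated code as easy to follow.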
Way out there …
• Configurable hardware
  – New circuits on the fly
• Intentional programming
  – Code not represented as a stream of characters
Summary
• Run-time evaluation gives you a lot of power
• Other languages add features (e.g. closures) to Java
• Lots of simple, free, quality parsers and interpreters
• Produce custom Java source or bytecode for performance
• Roll your own domain-specific language with ANTLR or JavaCC.
• Yacc No More.
References
• Doclets
  – Doclet tools: www.doclet.com
  – EJBGen: www.beust.com, Cedric Beust
  – iContract: www.reliable-systems.com, Reto Kramer
• Languages, interpreters
  – BeanShell: www.beanshell.org
  – Rhino: www.mozilla.org/rhino
  – Python: www.python.org, www.jython.org
  – ANTLR: www.antlr.org
  – More … flp.cs.tu-berlin.de/~tolk/vmlanguages.html
• SRML: xml.coverpages.org/srml.html
References (contd.)
• Bytecode manipulation:
  – Jasmin: mrl.nyu.edu/~meyer/jasmin/
  – Jikes Bytecode toolkit: www.alphaworks.ibm.com/tech/jikesbt
  – BCEL: bcel.sourceforge.net
• "Rapid" - Reconfigurable hardware
  – www.cs.washington.edu/research
• "The death of computer languages, the birth of intentional programming", Charles Simonyi
  – research.microsoft.com/scripts/pubs/trpub.asp
  – Microsoft tech report MSR-TR-95-52
• Thinking in Patterns with Java, Bruce Eckel
  – www.mindview.net/Books/TIPatterns