Extensible domain-specific programming for the sciences
The notion of scientists as programmers raises the question of what sort of programming language would be a good fit. The common answer seems to be both none of them and all of them. Many scientific applications are a combination of general-purpose and domain-specific languages: R for statistical elements, MATLAB for matrix-based computations, Perl-based regular expressions for string matching, C or FORTRAN for high-performance parallel computations, and scripting languages such as Python to glue them all together. This clumsy situation demonstrates the need for different domain-specific language features. Our hypothesis is that programming could be made easier and less error-prone, and result in higher-quality code, if programmers could easily extend languages with the domain-specific features that a programmer or scientist needs for the particular task at hand. This talk demonstrates the meta-language processing tools that support this composition of programmer-selected language features, with several extensions chosen from the previously mentioned list of features.
Extensible domain-specific programming
for the sciences
Eric Van Wyk
University of Minnesota
VBI, December 5, 2013
slides available at http://www.cs.umn.edu/~evw

Current trends/topics in PL
Formal verification
- CompCert - http://compcert.inria.fr
- Astrée - http://www.astree.ens.fr
- Hoare logic (1960's)
    {P} code {Q}
- Proof assistants (Coq, Abella, Isabelle): use required in some PL publishing venues

Current trends/topics in PL
Parallel programming - multiple cores everywhere
- "no more free lunch"
- need
  - new abstractions, e.g. Cilk, MapReduce, FP
  - new semantics, e.g. deterministic parallel Java

Current trends/topics in PL
Expressive and safe static typing
- extending richer static types, e.g.
    append ( [a], [a] ) -> [a]
- to dependent types
    append ( [a|n], [a|m] ) -> [a|n+m]
- turns array out-of-bounds and null-pointer bugs into static type errors

Extensible languages
Allow programmers to select the features to be used in their programming languages
- new syntax, notations
- new semantic analyses, error-checking
Why would anyone want to do that?

Programming language features
General purpose features
- assignment statements, loops, if-then-else statements
- functions (perhaps higher-order) and procedures
- I/O facilities
- modules
- data: integer, strings, arrays, records
Domain-specific features
- matrix operations (MATLAB)
- regular expression matching (Perl, Python)
- statistics functions (R)
- computational geometry operations (LN)
- parallel computing (SISAL, X10, NESL, etc.)
Many similarities, needless differences.
Working with multiple (domain-specific) languages is a headache.

Extensible languages
Allow programmers to select the features to be used in their programming languages
- new syntax, notations
- new semantic analyses, error-checking
Pick a general purpose host language (e.g. ANSI C), extend with domain-specific features
    myProgram.xc ⇒ myProgram.c

Regular expressions
    #include <stdio.h>
    #include <regex.h>
    int main (int argc, char *argv[])
    {
      char *text = readFileContents("X.data");
      // eukaryotic messenger RNA sequences
      regex foo = /^ATG[ATGC]{3,10}A{5,10}$/ ;
      if ( text =~ foo )
        printf ("Matches\n");
      else
        printf ("Doesn't match\n");
    }

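For comparison, the same match can be sketched in plain Python with the standard `re` module. This assumes the slide's pattern reads `^ATG[ATGC]{3,10}A{5,10}$`; the function name `looks_like_mrna` and the sample strings are made up for illustration and are not from the talk.

```python
import re

# Assumed reading of the slide's pattern: start codon ATG, then 3-10 bases,
# then a tail of 5-10 A's, anchored at both ends of the string.
MRNA_PATTERN = re.compile(r"^ATG[ATGC]{3,10}A{5,10}$")


def looks_like_mrna(text: str) -> bool:
    """Return True if the whole string matches the assumed pattern."""
    return MRNA_PATTERN.match(text) is not None


if __name__ == "__main__":
    # Hypothetical inputs; the talk reads its text from a data file instead.
    for sample in ["ATGGCTAAAAAA", "GGGATGAAAAA"]:
        print(sample, "Matches" if looks_like_mrna(sample) else "Doesn't match")
```

The point of the extension on the slide is that the regular expression is part of the language, so a malformed pattern can be reported at compile time rather than when the library call runs.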
Mining Climate Data - Ocean Eddies
- Spinning pools of water
- Transport heat, salt, and nutrients
- Learning about their behavior is difficult
A time slice for a point in the ocean

    main (int argc, char **argv)
    {
      Matrix float <3> data
        = readMatrix("ssh.data");
      Matrix float <3> scores
        = matrixMap(scoreTS, data, [2]);
      writeMatrix("temporalScores.data",
                  scores);
    }

    Matrix float <1> scoreTS (Matrix float <1> ts)
    {
      int i = 0, beginning, n = dimSize(ts, 0);
      Matrix float <1> scores
        = init(Matrix float <1>, dimSize(ts, 0));
      // skip the initial rise so that scoring starts at a peak
      while (ts[i] < ts[i+1]) i = i+1;
      Matrix float [0] trough;
      while (i < n-1) {
        (trough, beginning, i)
          = getTrough(ts, i);
        scores[beginning : i]
          = computeArea(trough);
      }
      return scores;
    }

    Matrix float <1> computeArea
      (Matrix float <1> areaOfInterest)
    {
      float y1 = areaOfInterest[0];
      float y2 = areaOfInterest[end];
      int x1 = 0;
      int x2 = dimSize(areaOfInterest, 0) - 1;
      // straight line through the two end points of the trough
      float m = (y1-y2) / ((float)(x1-x2));
      float b = y1 - m*x1;
      Matrix float <1> line = (x1 : x2)*m + b;
      float area
        = with ( x1 <= i < x2 )
            fold(+, 0.0, line - areaOfInterest);
      return
        with ( 0 <= i < dimSize(line, 0) )
          genarray([dimSize(line, 0)], area);
    }

    (Matrix float <1>, int, int) getTrough
      (Matrix float <1> ts, int i)
    {
      int beginning = i;
      int n = dimSize(ts, 0);
      // walk down to the bottom of the trough
      while (i+1 < n && ts[i] >= ts[i+1])
        i = i+1;
      // walk back up to the next peak
      while (i+1 < n && ts[i] < ts[i+1])
        i = i+1;
      return (ts[beginning : i], beginning, i);
    }

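The three functions above can be approximated in plain Python to see what the scoring computes. This is only a sketch under stated assumptions: matrices are modeled as Python lists, the range `beginning : i` is taken to be inclusive at both ends, and a bounds check is added to the initial loop of `score_ts` so that a strictly increasing series does not run off the end. The snake_case names are illustrative.

```python
from typing import List, Tuple


def get_trough(ts: List[float], i: int) -> Tuple[List[float], int, int]:
    """Return the trough starting at index i and its (inclusive) bounds."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:   # walk down
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:    # walk back up
        i += 1
    return ts[beginning:i + 1], beginning, i


def compute_area(area_of_interest: List[float]) -> List[float]:
    """Area between the trough and the line joining its end points."""
    x1, x2 = 0, len(area_of_interest) - 1
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    line = [m * x + b for x in range(x1, x2 + 1)]
    area = sum(line[i] - area_of_interest[i] for i in range(x1, x2))
    return [area] * len(line)   # every point of the trough gets the same score


def score_ts(ts: List[float]) -> List[float]:
    n = len(ts)
    scores = [0.0] * n
    i = 0
    while i + 1 < n and ts[i] < ts[i + 1]:    # skip the initial rise
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = compute_area(trough)
    return scores


if __name__ == "__main__":
    print(score_ts([3.0, 1.0, 3.0]))
```

For the single trough `[3.0, 1.0, 3.0]` the line is flat at 3.0, so the area is (3 - 3) + (3 - 1) = 2.0 and each point is scored 2.0.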
Matrix extensions
- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray

Dimension analysis
pound-seconds ≠ newton-seconds

    #include <stdio.h>
    int main (int argc, char *argv[])
    {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;
      printf ("%d\n", x + y);    // OK
      printf ("%d\n", x + z);    // Error
    }

    #include <stdio.h>
    int main (int argc, char *argv[])
    {
      int x = 34;
      int y = 56;
      int area = x * y;
      printf ("%d\n", x + y);    // OK
    }
Extensions of this form find errors but otherwise are "erased" during translation.

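The unit rules behind this check can be sketched in Python. Note the difference from the slides: the extension checks units statically and erases them, whereas this sketch checks them at run time. The `Quantity` class and the `meters` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Units = Tuple[Tuple[str, int], ...]  # e.g. (("meter", 2),) for meter^2


@dataclass(frozen=True)
class Quantity:
    """An integer value tagged with unit exponents (hypothetical helper)."""
    value: int
    units: Units = ()

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition is only meaningful when both operands carry the same units.
        if self.units != other.units:
            raise TypeError(f"unit mismatch: {self.units} vs {other.units}")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplication adds the exponents of matching units.
        exponents = dict(self.units)
        for name, exp in other.units:
            exponents[name] = exponents.get(name, 0) + exp
        units = tuple(sorted((n, e) for n, e in exponents.items() if e != 0))
        return Quantity(self.value * other.value, units)


def meters(value: int) -> Quantity:
    return Quantity(value, (("meter", 1),))


x = meters(34)
y = meters(56)
area = x * y          # units become meter^2
total = x + y         # fine: both operands are in meters
try:
    x + area          # meter + meter^2 is rejected
except TypeError as err:
    print("Error:", err)
```

A static analysis like the one in the talk reports the same mismatch before the program runs and leaves no unit bookkeeping in the generated C.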
Extension composition
- Programmers can select the extensions that they want
- May want to use multiple extensions in the same program
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them
- How can that be done?

Building translators from composable extensible languages
Two primary challenges:
1. composable syntax - enables building a scanner, parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute defining equations, on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

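The "set union of specification components" idea can be illustrated with a small Python sketch. The `Spec` class, the `compose` function, and the two toy extensions are assumptions made for this example; they are not the data structures used by Copper or Silver.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


@dataclass(frozen=True)
class Spec:
    """A language specification fragment: host language or extension."""
    nonterminals: FrozenSet[str]
    terminals: FrozenSet[str]
    productions: FrozenSet[Production]

    def union(self, other: "Spec") -> "Spec":
        # Composition is component-wise set union.
        return Spec(
            self.nonterminals | other.nonterminals,
            self.terminals | other.terminals,
            self.productions | other.productions,
        )


def compose(host: Spec, *extensions: Spec) -> Spec:
    composed = host
    for ext in extensions:
        composed = composed.union(ext)
    return composed


host = Spec(
    frozenset({"Stmt", "Expr"}),
    frozenset({"Id", "Eq", "Semi"}),
    frozenset({("Stmt", ("Id", "Eq", "Expr")), ("Expr", ("Id",))}),
)
for_ext = Spec(
    frozenset(),
    frozenset({"For"}),
    frozenset({("Stmt", ("For", "Id", "Expr", "Expr", "Stmt"))}),
)
regex_ext = Spec(
    frozenset(),
    frozenset({"RegexLit"}),
    frozenset({("Expr", ("RegexLit",))}),
)

composed = compose(host, for_ext, regex_ext)
print(len(composed.productions), "productions in the composed language")
```

Plain set union does not depend on the order in which extensions are listed. It says nothing about whether the composed grammar is conflict-free or the composed attribute grammar is complete, which is what the modular analyses cited above address.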
Generating parsers and scanners from grammars and regular expressions
    nonterminals: Stmt, Expr
    terminals: Id    /[a-zA-Z][a-zA-Z0-9]*/
               Num   /[0-9]+/
               Eq    '='
               Semi  ';'
               Plus  '+'
               Mult  '*'
    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr
    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id

Parse tree:
    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a)
        Eq
        Expr
          Id(b)
Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
Input: "x = y + 3 * z ; a = b"

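The scanning step, from input text to the token stream shown above, can be sketched with Python's `re` module. The terminal regular expressions follow the grammar slide (with the `*` on `Id` assumed); the `scan` function and its output format are illustrative and are not how Copper works internally.

```python
import re

# Terminals from the grammar slide, as (name, regular expression) pairs.
TOKEN_SPEC = [
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(text: str) -> list:
    """Turn the input text into a list of token descriptions."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():          # whitespace separates tokens
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = match.lastgroup
        # Only Id and Num carry their lexeme, as on the slide.
        tokens.append(f"{kind}({match.group()})" if kind in ("Id", "Num") else kind)
        pos = match.end()
    return tokens


print(" ".join(scan("x = y + 3 * z ; a = b")))
# prints: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
```

This scanner has no notion of parser context; every terminal is a candidate at every position. The later slides on context-aware scanning explain why that becomes a problem once extensions are composed.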
Attribute Grammars
- add semantics - meaning - to context free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

Decorated tree for "x = y + 3 * y ; x = z"
env = [x ↦ int, y ↦ int, z ↦ string] is passed down the tree
    Stmt                     errors = [ERROR]
      Stmt                   errors = [ ]
        Id(x)
        Eq
        Expr                 type = int, errors = [ ]
          Expr               type = int, errors = [ ]
            Id(y)
          Plus
          Expr               type = int, errors = [ ]
            Expr
              Num(3)
            Mult
            Expr
              Id(y)
      Semi
      Stmt                   errors = [ERROR]
        Id(x)
        Eq
        Expr                 type = string
          Id(z)

Attribute grammar specifications
Equations associated with productions define attribute values.
    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++
                 ... check that l and r are integers ...
      e.type = int ;
      l.env = e.env ;  r.env = e.env ;
    }

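To make the attribute equations concrete, here is a small Python sketch that evaluates `type`, `errors`, and `env` over the earlier example tree. It is an assumption-laden illustration: `env` is passed down as a function argument (an inherited attribute), `type` and `errors` are returned (synthesized attributes), and all class and function names are invented for this example rather than taken from Silver.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Env = Dict[str, str]  # variable name -> type name


@dataclass
class Id:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class BinOp:          # covers both Plus and Mult
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Id, Num, BinOp]


@dataclass
class Assign:
    target: str
    value: Expr


@dataclass
class Seq:
    first: "Stmt"
    second: "Stmt"


Stmt = Union[Assign, Seq]


def expr_attrs(e: Expr, env: Env) -> Tuple[str, List[str]]:
    """Return the (type, errors) attributes of an expression."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Id):
        if e.name not in env:
            return "unknown", [f"undeclared variable {e.name}"]
        return env[e.name], []
    # BinOp: children receive the same env; errors are concatenated.
    left_type, left_errors = expr_attrs(e.left, env)
    right_type, right_errors = expr_attrs(e.right, env)
    errors = left_errors + right_errors
    if left_type != "int" or right_type != "int":
        errors.append(f"operands of '{e.op}' must be integers")
    return "int", errors


def stmt_errors(s: Stmt, env: Env) -> List[str]:
    """Return the errors attribute of a statement."""
    if isinstance(s, Seq):
        return stmt_errors(s.first, env) + stmt_errors(s.second, env)
    value_type, errors = expr_attrs(s.value, env)
    if env.get(s.target) != value_type:
        errors = errors + [f"cannot assign {value_type} to {s.target}"]
    return errors


env = {"x": "int", "y": "int", "z": "string"}
program = Seq(
    Assign("x", BinOp("+", Id("y"), BinOp("*", Num(3), Id("y")))),
    Assign("x", Id("z")),
)
print(stmt_errors(program, env))
```

The first assignment type-checks, the second assigns a string to an int variable, so exactly one error reaches the root, matching the decorated tree.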
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension
    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors ++
                 ... check that i is an integer ...
      forwards to
        -- i=lower ; while (i <= upper) { body ; i=i+1 ; }
        seq ( assignment ( varRef(i), lower ),
              while ( lte ( varRef(i), upper ),
                      block ( seq ( body,
                                    assignment ( varRef(i),
                                                 add ( varRef(i), intLit("1") ) ) ) ) ) )
    }

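Forwarding can be sketched in Python as a function that builds the host-language tree the for-loop stands for, plus a tiny interpreter that only understands host constructs. The tuple-based tree encoding and the names `for_to_while`, `eval_expr`, and `exec_stmt` are assumptions made for this illustration.

```python
def for_to_while(i, lower, upper, body):
    """Build the tree the for-loop forwards to:
    i = lower; while (i <= upper) { body; i = i + 1; }"""
    return ("seq",
            ("assignment", ("varRef", i), lower),
            ("while",
             ("lte", ("varRef", i), upper),
             ("block",
              ("seq", body,
               ("assignment", ("varRef", i),
                ("add", ("varRef", i), ("intLit", "1")))))))


def eval_expr(e, store):
    tag = e[0]
    if tag == "intLit":
        return int(e[1])
    if tag == "varRef":
        return store[e[1]]
    if tag == "add":
        return eval_expr(e[1], store) + eval_expr(e[2], store)
    if tag == "lte":
        return eval_expr(e[1], store) <= eval_expr(e[2], store)
    raise ValueError(f"unknown expression {tag}")


def exec_stmt(s, store):
    """Interpreter for host statements only; it knows nothing about 'for'."""
    tag = s[0]
    if tag == "seq":
        exec_stmt(s[1], store)
        exec_stmt(s[2], store)
    elif tag == "assignment":
        store[s[1][1]] = eval_expr(s[2], store)
    elif tag == "while":
        while eval_expr(s[1], store):
            exec_stmt(s[2], store)
    elif tag == "block":
        exec_stmt(s[1], store)
    else:
        raise ValueError(f"unknown statement {tag}")


# for i = 1 to 4: total = total + i
body = ("assignment", ("varRef", "total"),
        ("add", ("varRef", "total"), ("varRef", "i")))
store = {"total": 0}
exec_stmt(for_to_while("i", ("intLit", "1"), ("intLit", "4"), body), store)
print(store["total"])
```

The interpreter never needs a case for the new construct: anything the extension does not define itself is answered by the tree it forwards to.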
Building an attribute grammar evaluator from composed specifications
$AG^H \cup^* \{AG^{E_1}, \ldots, AG^{E_n}\}$
$\forall i \in [1, n].\ \mathit{modComplete}(AG^H, AG^{E_i})$
$\implies \mathit{complete}(AG^H \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$
- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]

Challenges in scanning
Keywords in embedded languages may be identifiers in host language
    int SELECT ;
    rs = using c query { SELECT last_name
                         FROM person WHERE

Challenges in scanning
Different extensions use same keyword
    connection c "jdbc:derby:/derby/db/testdb"
      with table person [ person_id INTEGER,
                          first_name VARCHAR ] ;
    b = table ( c1 : T F ,
                c2 : F * ) ;

Challenges in scanning
Operators with different precedence specifications
    x = 3 + y * z ;
    str = /[a-z][a-z0-9]*\.java/

Challenges in scanning
Terminals that are prefixes of others
    List<List<Integer>> dlist ;
    x = y >> 4 ;

Need for context
- Traditionally, parser and scanner are disjoint.
    Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate.
    Scanner ⇄ Parser → Semantic Analysis

Context-aware scanning
- Scanner recognizes only tokens valid for current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i ; }
  - Two terminal symbols that match "in".
    - terminal IN 'in' ;
    - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to {keyword} ;
    - terminal FOR 'for' lexer class {keyword} ;
- example is part of AbleP [SPIN'11]

Parsing C as an extension to Promela
    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */
    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
                 Pexample->_pid, now.z, now.pt.x, now.pt.y);
      };
      assert(false)   /* trigger an error trail */
    }

Context-aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>> before ">", ">" is in valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

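The core rule, "maximal munch, but only among terminals the parser can accept here", can be sketched in a few lines of Python. In a real context-aware scanner the valid set comes from the current LR parser state; in this sketch it is passed in by hand. The terminal names, the `KEYWORDS` set standing in for the "submits to" declaration, and the `scan_one` function are assumptions for illustration.

```python
import re

TERMINALS = {
    "ID":     re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
    "IN":     re.compile(r"in"),
    "GT":     re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
}
# Stand-in for "ID submits to {keyword}": on a tie, a keyword wins over ID.
KEYWORDS = {"IN"}


def scan_one(text, pos, valid):
    """Return (terminal, lexeme, end) for the longest match at pos,
    considering only the terminals named in `valid`."""
    candidates = []
    for name in valid:
        match = TERMINALS[name].match(text, pos)
        if match:
            candidates.append((match.end(), name in KEYWORDS, name))
    if not candidates:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    end, _, name = max(candidates)   # longest first, then keyword preference
    return name, text[pos:end], end


text = "List<List<Integer>> dlist"
pos = text.index(">")
print(scan_one(text, pos, {"GT"}))            # inside a type: only '>' is valid
print(scan_one(text, pos, {"GT", "RSHIFT"}))  # without context: '>>' is munched
print(scan_one("in", 0, {"ID"}))              # 'chan in, out': an identifier
print(scan_one("in", 0, {"ID", "IN"}))        # 'for i in a': the keyword
```

With the valid set restricted to what the parser expects, the shorter ">" is returned where a context-free maximal-munch scanner would return ">>", and "in" can be an identifier in one place and a keyword in another.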
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications
$CFG^H \cup^* \{CFG^{E_1}, \ldots, CFG^{E_n}\}$
$\forall i \in [1, n].\ \mathit{isComposable}(CFG^H, CFG^{E_i}) \wedge \mathit{conflictFree}(CFG^H \cup CFG^{E_i})$
$\implies \mathit{conflictFree}(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$
- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context-aware scanning.
But context-aware scanning makes it practical.

Future Work
- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile-time
- language specific analysis
- new applications of AGs

Thanks for your attention.
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63-72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1-2):39-54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199-210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352-371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44-63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108-125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575-599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102-116. Springer-Verlag, March 2007.
Current trends topics in PL
Formal verification
I CompCert - httpcompcertinriafr
I Astree - httpwwwastreeensfr
I Hoare logic (1960rsquos)
P code Q
I Proof assistants Coq Abella Isabelle use required in some PL publishing venues
2 45
3 45
4 45
Current trends topics in PL
Parallel programming - multiple cores everywhere
I ldquono more free lunchrdquo
I needI new abstractions eg Cilk MapReduce FPI new semantics eg deterministic parallel Java
5 45
Current trends topics in PL
Expressive and safe static typing
I extending richer static types eg
append ( [a] [a] ) -gt [a]
I to dependent types
append ( [a|n] [a|m] ) -gt [a|n+m]
I turns array out-of-bounds and null-pointer bugs intostatic type errors
6 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Why would anyone want to do that
7 45
Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records
Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)
Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache
8 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Pick a general purpose host language (eg ANSI C)extend with domain-specific features
myProgramxc =rArr myProgramc
9 45
Regular expressions
include stdioh
include regexh
int main (int argc char argv [])
char text = readFileContents(Xdata)
eukaryotic messenger RNA sequences
regex foo = ^ATG[ATGC ]3 10A5 10$
if ( text =~ foo )
printf (Matches n)
else
printf (Doesnrsquot match n)
10 45
Mining Climate Data - Ocean Eddies
I Spinning pools of water
I Transport heat salt andnutrients
I Learning about theirbehavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc char argv)
Matrix float lt3gt data
= readMatrix(sshdata)
Matrix float lt3gt scores
= matrixMap(scoreTS data [2])
writeMatrix(temporalScoresdata
scores)
13 45
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible
languages
Two primary challenges1 composable syntax mdash enables building a scanner parser
I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper
2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and
higher-order attributesI set union of specification components
I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute
I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver
23 45
Generating parsers and scanners from grammars
and regular expressions
nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]
Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo
Stmt = Stmt Semi StmtStmt = Id Eq Expr
Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id
24 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
3 45
4 45
Current trends topics in PL
Parallel programming - multiple cores everywhere
I ldquono more free lunchrdquo
I needI new abstractions eg Cilk MapReduce FPI new semantics eg deterministic parallel Java
5 45
Current trends topics in PL
Expressive and safe static typing
I extending richer static types eg
append ( [a] [a] ) -gt [a]
I to dependent types
append ( [a|n] [a|m] ) -gt [a|n+m]
I turns array out-of-bounds and null-pointer bugs intostatic type errors
6 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Why would anyone want to do that
7 45
Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records
Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)
Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache
8 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Pick a general purpose host language (eg ANSI C)extend with domain-specific features
myProgramxc =rArr myProgramc
9 45
Regular expressions
include stdioh
include regexh
int main (int argc char argv [])
char text = readFileContents(Xdata)
eukaryotic messenger RNA sequences
regex foo = ^ATG[ATGC ]3 10A5 10$
if ( text =~ foo )
printf (Matches n)
else
printf (Doesnrsquot match n)
10 45
Mining Climate Data - Ocean Eddies
I Spinning pools of water
I Transport heat salt andnutrients
I Learning about theirbehavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc char argv)
Matrix float lt3gt data
= readMatrix(sshdata)
Matrix float lt3gt scores
= matrixMap(scoreTS data [2])
writeMatrix(temporalScoresdata
scores)
13 45
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible languages
Two primary challenges:
1. composable syntax: enables building a scanner and parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics: analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

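The "set union of specification components" idea can be illustrated with a small Python sketch. The dictionary layout and the two extension specifications below are hypothetical; Silver's actual module system is considerably richer.

```python
def compose(host, *extensions):
    """Compose specifications by taking the union of each component set."""
    composed = {kind: set(items) for kind, items in host.items()}
    for ext in extensions:
        for kind, items in ext.items():
            composed.setdefault(kind, set()).update(items)
    return composed


HOST = {
    "nonterminals": {"Stmt", "Expr"},
    "productions": {"Stmt ::= Id Eq Expr", "Expr ::= Expr Plus Expr"},
    "attributes": {"errors", "env", "type"},
}
FOR_EXT = {    # hypothetical for-loop extension
    "productions": {"Stmt ::= For Id Expr Expr Stmt"},
}
UNITS_EXT = {  # hypothetical units extension
    "nonterminals": {"Unit"},
    "productions": {"Type ::= Type Unit"},
    "attributes": {"unit"},
}
```

Taking the union is the easy part. The modular analyses listed above are what justify trusting the composed result without the extension user having to inspect it.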
Generating parsers and scanners from grammars and regular expressions
nonterminals: Stmt, Expr
terminals: Id   /[a-zA-Z][a-zA-Z0-9]*/
           Num  /[0-9]+/
           Eq   '='
           Semi ';'
           Plus '+'
           Mult '*'
Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr
Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id

Parse tree:
Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(z)
  Semi
  Stmt
    Id(a)
    Eq
    Expr
      Id(b)
Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
Input text: "x = y + 3 * z ; a = b"

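As a minimal sketch of the scanning step, the terminal declarations can be turned into a conventional regex-based scanner in Python. The names TERMINALS and scan are illustrative only, and this is an ordinary context-free scanner rather than the generated one.

```python
import re

# Terminal declarations from the grammar, in priority order.
TERMINALS = [
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TERMINALS))


def scan(text):
    """Turn the input text into a list of (terminal, lexeme) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # whitespace separates tokens
            pos += 1
            continue
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Python's alternation takes the first alternative that matches rather than the longest one. That is harmless here because no two terminals can match at the same position, but a real scanner generator has to resolve such overlaps explicitly.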
Attribute Grammars
- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type: the type of the expression
  - errors: list of error messages
  - env: mapping of variable names to their types
- Stmt may be attributed with errors and env

Parse tree for "x = y + 3 * y ; x = z", decorated with attributes:
Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(y)
  Semi
  Stmt
    Id(x)
    Eq
    Expr
      Id(z)
Attribute values shown on the tree:
- env = [x ↦ int, y ↦ int, z ↦ string] on the Stmt and Expr nodes
- type = int, errors = [ ] on the expressions of the first statement
- type = string on the expression Id(z) of the second statement
- errors = [ERROR] on two nodes, where z (a string) is assigned to x (an int)

Attribute grammar specifications
Equations associated with productions define attribute values.
abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ... ;
  e.type = int;
  l.env = e.env;  r.env = e.env;
}

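The equations above can be read as a recursive evaluation: env flows down the tree (inherited), while type and errors flow up (synthesized). The following Python sketch assumes a tiny tree representation of its own; the class names and the assignment rule are illustrative and are not taken from Silver.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Assign:
    target: str
    expr: object


def expr_attrs(e, env):
    """Return the synthesized attributes (type, errors) of expression e.

    env is the inherited attribute, passed down unchanged to the children.
    """
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Var):
        if e.name not in env:
            return "unknown", [f"{e.name} is not declared"]
        return env[e.name], []
    # Add: l.env = e.env; r.env = e.env
    ltype, lerrs = expr_attrs(e.left, env)
    rtype, rerrs = expr_attrs(e.right, env)
    errors = lerrs + rerrs           # e.errors = l.errors ++ r.errors ++ ...
    if ltype != "int" or rtype != "int":
        errors = errors + ["operands of + must be integers"]
    return "int", errors             # e.type = int


def stmt_errors(s, env):
    """Return the errors attribute of an assignment (assumed rule: types must match)."""
    etype, errors = expr_attrs(s.expr, env)
    if etype != env.get(s.target, "unknown"):
        errors = errors + [f"cannot assign {etype} to {s.target}"]
    return errors
```

With env = {x: int, y: int, z: string}, the statement x = y + 3 produces no errors, while x = z produces one.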
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension
abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++
             ... check that i is an integer ... ;
  forwards to
    -- i = lower; while (i <= upper) { body; i = i + 1; }
    seq(assignment(varRef(i), lower),
        while(lte(varRef(i), upper),
              block(seq(body,
                        assignment(varRef(i), add(varRef(i), intLit("1")))))));
}

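Forwarding can be pictured as follows: the extension construct builds a host-language tree, and any attribute it does not define itself (here, a translation to C) is computed on that tree instead. The Python sketch below uses nested tuples whose constructor names follow the slide; the to_c function and the "raw" leaf are hypothetical stand-ins for host-language attributes.

```python
def for_loop(i, lower, upper, body):
    """Build the host-language tree that the for-loop extension forwards to."""
    return ("seq",
            ("assignment", ("varRef", i), lower),
            ("while",
             ("lte", ("varRef", i), upper),
             ("block",
              ("seq",
               body,
               ("assignment", ("varRef", i),
                ("add", ("varRef", i), ("intLit", "1")))))))


def to_c(t):
    """A translation attribute defined only on host-language constructs."""
    tag, args = t[0], t[1:]
    if tag in ("varRef", "intLit", "raw"):   # "raw" is a hypothetical opaque leaf
        return args[0]
    if tag == "seq":
        return " ".join(to_c(a) for a in args)
    if tag == "assignment":
        return f"{to_c(args[0])} = {to_c(args[1])};"
    if tag == "while":
        return f"while ({to_c(args[0])}) {to_c(args[1])}"
    if tag == "block":
        return "{ " + to_c(args[0]) + " }"
    if tag == "lte":
        return f"{to_c(args[0])} <= {to_c(args[1])}"
    if tag == "add":
        return f"{to_c(args[0])} + {to_c(args[1])}"
    raise ValueError(f"unknown construct {tag}")
```

Note that to_c has no case for the for-loop itself. The extension only supplies the tree and its own error checks, and the host language's existing equations do the rest.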
Building an attribute grammar evaluator from composed specifications
$AG^H \cup^* \{AG^{E_1}, \ldots, AG^{E_n}\}$
$(\forall i \in [1, n].\ modComplete(AG^H, AG^{E_i})) \implies complete(AG^H \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$
- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [SLE'12a]

Challenges in scanning
Keywords in embedded languages may be identifiers in the host language
int SELECT;
...
rs = using c query { SELECT last_name
                     FROM person WHERE ... };

Challenges in scanning
Different extensions use the same keyword
connection c "jdbc:derby:/derby/db/testdb"
  with table person [ person_id INTEGER,
                      first_name VARCHAR ];
...
b = table ( c1 : T F ,
            c2 : F * );

Challenges in scanning
Operators with different precedence specifications
x = 3 + y * z;
...
str = /[a-z][a-z0-9]*\.java/

Challenges in scanning
Terminals that are prefixes of others
List<List<Integer>> dlist;
...
x = y >> 4;

Need for context
- Traditionally, parser and scanner are disjoint:
    Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate:
    Scanner ⇄ Parser → Semantic Analysis

Context-aware scanning
- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i; }
  - Two terminal symbols that match "in":
    - terminal IN 'in' ;
    - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword ;
    - terminal FOR 'for' lexer class keyword ;
- example is part of AbleP [SPIN'11]

Parsing C as an extension to Promela
c_decl {
  typedef struct Coord {
    int x, y; } Coord;
}
c_state "Coord pt" "Global"   /* goes in state vector */
int z = 3;                    /* standard global decl */
active proctype example()
{
  c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
      Pexample->_pid, now.z, now.pt.x, now.pt.y); };
  assert(false)   /* trigger an error trail */
}

Context-aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

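The core of the idea fits in a few lines of Python: the scanner is handed the set of terminals that are valid in the current parser state and applies maximal munch only within that set. In the sketch below the valid sets are supplied by hand, which is an assumption for illustration; in a generated scanner they come from the parse table. Ties between equally long matches (the "submits to" relation) are not modeled.

```python
import re

TERMINALS = {
    "LT":     r"<",
    "GT":     r">",
    "RSHIFT": r">>",
    "ID":     r"[a-zA-Z_][a-zA-Z_0-9]*",
    "IN":     r"in",
}


def scan_one(text, pos, valid):
    """Return (terminal, lexeme, end) for the longest match among valid terminals."""
    best = None
    for name in valid:
        m = re.compile(TERMINALS[name]).match(text, pos)
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    if best is None:
        raise SyntaxError(f"no valid terminal at position {pos}")
    name, end = best
    return name, text[pos:end], end
```

Inside a type argument list only GT is valid, so ">>" is scanned as two ">" tokens; in an expression both GT and RSHIFT are valid and the longer match wins. The same mechanism lets "in" be an ID in "chan in, out" and the IN keyword in "for i in a".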
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications
$CFG^H \cup^* \{CFG^{E_1}, \ldots, CFG^{E_n}\}$
$(\forall i \in [1, n].\ isComposable(CFG^H, CFG^{E_i}) \land conflictFree(CFG^H \cup CFG^{E_i})) \implies conflictFree(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$
- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context-aware scanning, but context-aware scanning makes it practical.

Future Work
- ableC: extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, and Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs

Thanks for your attention.
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
4 45
Current trends topics in PL
Parallel programming - multiple cores everywhere
I ldquono more free lunchrdquo
I needI new abstractions eg Cilk MapReduce FPI new semantics eg deterministic parallel Java
5 45
Current trends topics in PL
Expressive and safe static typing
I extending richer static types eg
append ( [a] [a] ) -gt [a]
I to dependent types
append ( [a|n] [a|m] ) -gt [a|n+m]
I turns array out-of-bounds and null-pointer bugs intostatic type errors
6 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Why would anyone want to do that
7 45
Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records
Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)
Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache
8 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Pick a general purpose host language (eg ANSI C)extend with domain-specific features
myProgramxc =rArr myProgramc
9 45
Regular expressions
include stdioh
include regexh
int main (int argc char argv [])
char text = readFileContents(Xdata)
eukaryotic messenger RNA sequences
regex foo = ^ATG[ATGC ]3 10A5 10$
if ( text =~ foo )
printf (Matches n)
else
printf (Doesnrsquot match n)
10 45
Mining Climate Data - Ocean Eddies
I Spinning pools of water
I Transport heat salt andnutrients
I Learning about theirbehavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc char argv)
Matrix float lt3gt data
= readMatrix(sshdata)
Matrix float lt3gt scores
= matrixMap(scoreTS data [2])
writeMatrix(temporalScoresdata
scores)
13 45
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible
languages
Two primary challenges1 composable syntax mdash enables building a scanner parser
I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper
2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and
higher-order attributesI set union of specification components
I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute
I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver
23 45
Generating parsers and scanners from grammars
and regular expressions
nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]
Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo
Stmt = Stmt Semi StmtStmt = Id Eq Expr
Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id
24 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Current trends topics in PL
Parallel programming - multiple cores everywhere
I ldquono more free lunchrdquo
I needI new abstractions eg Cilk MapReduce FPI new semantics eg deterministic parallel Java
5 45
Current trends topics in PL
Expressive and safe static typing
I extending richer static types eg
append ( [a] [a] ) -gt [a]
I to dependent types
append ( [a|n] [a|m] ) -gt [a|n+m]
I turns array out-of-bounds and null-pointer bugs intostatic type errors
6 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Why would anyone want to do that
7 45
Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records
Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)
Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache
8 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Pick a general purpose host language (eg ANSI C)extend with domain-specific features
myProgramxc =rArr myProgramc
9 45
Regular expressions
include stdioh
include regexh
int main (int argc char argv [])
char text = readFileContents(Xdata)
eukaryotic messenger RNA sequences
regex foo = ^ATG[ATGC ]3 10A5 10$
if ( text =~ foo )
printf (Matches n)
else
printf (Doesnrsquot match n)
10 45
Mining Climate Data - Ocean Eddies
I Spinning pools of water
I Transport heat salt andnutrients
I Learning about theirbehavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc char argv)
Matrix float lt3gt data
= readMatrix(sshdata)
Matrix float lt3gt scores
= matrixMap(scoreTS data [2])
writeMatrix(temporalScoresdata
scores)
13 45
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible
languages
Two primary challenges1 composable syntax mdash enables building a scanner parser
I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper
2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and
higher-order attributesI set union of specification components
I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute
I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver
23 45
Generating parsers and scanners from grammars
and regular expressions
nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]
Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo
Stmt = Stmt Semi StmtStmt = Id Eq Expr
Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id
24 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Current trends / topics in PL
Expressive and safe static typing
- extending richer static types, e.g.
    append ( [a], [a] ) -> [a]
- to dependent types
    append ( [a|n], [a|m] ) -> [a|n+m]
- turns array out-of-bounds and null-pointer bugs into static type errors
6 45
Extensible languages
Allow programmers to select the features to be used in their programming languages:
- new syntax / notations
- new semantic analyses / error-checking
Why would anyone want to do that?
7 45
Programming language features
General purpose features
- assignment statements, loops, if-then-else statements
- functions (perhaps higher-order) and procedures
- I/O facilities
- modules
- data: integer, strings, arrays, records
Domain-specific features
- matrix operations (MATLAB)
- regular expression matching (Perl, Python)
- statistics functions (R)
- computational geometry operations (LN)
- parallel computing (SISAL, X10, NESL, etc.)
Many similarities, needless differences.
Working with multiple (domain-specific) languages is a headache.
8 45
Extensible languages
Allow programmers to select the features to be used in their programming languages:
- new syntax / notations
- new semantic analyses / error-checking
Pick a general purpose host language (e.g. ANSI C) and extend it with domain-specific features.
    myProgram.xc ⟹ myProgram.c
9 45
Regular expressions
    #include <stdio.h>
    #include <regex.h>
    int main (int argc, char *argv[]) {
      char *text = readFileContents("X.data");
      // eukaryotic messenger RNA sequences
      regex foo = /^ATG[ATGC]{3,10}A{5,10}$/ ;
      if ( text =~ foo )
        printf ("Matches\n");
      else
        printf ("Doesn't match\n");
    }
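For readers who want to try the pattern, here is a minimal sketch in plain Python using the standard `re` module. It only illustrates what the pattern on the slide accepts; it is not the code the regex extension generates, and the sample strings and the `matches` helper are made up for this illustration.

```python
import re

# Same pattern as on the slide: start codon ATG, then 3-10 bases,
# then a run of 5-10 A's, anchored at both ends of the string.
MRNA = re.compile(r"^ATG[ATGC]{3,10}A{5,10}$")


def matches(text: str) -> bool:
    """Return True when the whole string fits the slide's pattern (hypothetical helper)."""
    return MRNA.search(text) is not None


if __name__ == "__main__":
    # Invented sample sequences, not data from the talk.
    for sample in ("ATGCCGTAAAAAA", "ATGCC", "GGGATGCCGTAAAAAA"):
        print(sample, "Matches" if matches(sample) else "Doesn't match")
```

The point of the extension is that the `=~` test and the regex literal are checked and translated at compile time, whereas here the pattern is an ordinary string handed to a library at run time.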
10 45
Mining Climate Data - Ocean Eddies
- Spinning pools of water
- Transport heat, salt, and nutrients
- Learning about their behavior is difficult
11 45
A time slice for a point in the ocean
12 45
    main (int argc, char **argv) {
      Matrix float <3> data
        = readMatrix("ssh.data");
      Matrix float <3> scores
        = matrixMap(scoreTS, data, [2]);
      writeMatrix("temporalScores.data",
                  scores);
    }
13 45
    Matrix float <1> scoreTS (Matrix float <1> ts) {
      int i = 0, beginning, n = dimSize(ts, 0);
      Matrix float <1> scores
        = init(Matrix float <1>, dimSize(ts, 0));
      while (ts[i] < ts[i+1]) i = i+1;
      Matrix float [0] trough;
      while (i < n-1) {
        (trough, beginning, i)
          = getTrough(ts, i);
        scores[beginning : i]
          = computeArea(trough);
      }
      return scores;
    }
14 45
    Matrix float <1> computeArea
        (Matrix float <1> areaOfInterest) {
      float y1 = areaOfInterest[0];
      float y2 = areaOfInterest[end];
      int x1 = 0;
      int x2 = dimSize(areaOfInterest, 0) - 1;
      float m = (y1-y2) / ((float)(x1-x2));
      float b = y1 - m*x1;
      Matrix float <1> Line = (x1 : x2) * m + b;
      float area
        = with ( x1 <= i < x2 )
            fold (+, 0.0, Line - areaOfInterest);
      return
        with ( 0 <= i < dimSize(Line, 0) )
          genarray ([dimSize(Line, 0)], area);
    }
15 45
    (Matrix float <1>, int, int) getTrough
        (Matrix float <1> ts, int i) {
      int beginning = i;
      int n = dimSize(ts, 0);
      while (i+1 < n && ts[i] >= ts[i+1])
        i = i+1;
      while (i+1 < n && ts[i] < ts[i+1])
        i = i+1;
      return (ts[beginning : i], beginning, i);
    }
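To make the trough-scoring idea concrete, the three functions above can be sketched in plain Python with lists standing in for the Matrix type. This is an illustration under assumptions, not the translation the matrix extension produces: ranges such as `beginning : i` are read as inclusive, the bound check in the first loop of `score_ts` is an addition, and the sample series in the usage note is invented.

```python
from typing import List, Tuple


def get_trough(ts: List[float], i: int) -> Tuple[List[float], int, int]:
    """Walk downhill, then uphill, from index i; return the slice and its bounds."""
    beginning, n = i, len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:
        i += 1
    return ts[beginning:i + 1], beginning, i  # inclusive range beginning..i


def compute_area(area_of_interest: List[float]) -> List[float]:
    """Area between the trough and the straight line joining its end points."""
    x1, x2 = 0, len(area_of_interest) - 1
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    line = [x * m + b for x in range(x1, x2 + 1)]
    area = sum(line[k] - area_of_interest[k] for k in range(x1, x2))
    return [area] * len(line)  # every point of the trough gets the same score


def score_ts(ts: List[float]) -> List[float]:
    """Score each point of a time series by the area of the trough it lies in."""
    n = len(ts)
    scores = [0.0] * n
    i = 0
    while i + 1 < n and ts[i] < ts[i + 1]:  # skip the initial rise (bound check added)
        i += 1
    while i < n - 1:
        trough, beginning, i = get_trough(ts, i)
        scores[beginning:i + 1] = compute_area(trough)
    return scores
```

For example, `score_ts([1.0, 3.0, 1.0, 0.0, 1.0, 3.0, 2.0])` gives every point of the dip between the two peaks the same score, the area enclosed under the line joining the peaks.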
16 45
Matrix extensions
- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray
17 45
Dimension analysis
pound-seconds ≠ newton-seconds
18 45
    #include <stdio.h>
    int main (int argc, char *argv[]) {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;
      printf ("%d\n", x + y);   // OK
      printf ("%d\n", x + z);   // Error
    }
19 45
    #include <stdio.h>
    int main (int argc, char *argv[]) {
      int x = 34;
      int y = 56;
      int area = x * y;
      printf ("%d\n", x + y);   // OK
    }
Extensions of this form find errors but otherwise are "erased" during translation.
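The check-then-erase idea can be sketched as a small dimension checker in Python. This is a hypothetical illustration, not the extension's implementation: the tuple-based expression encoding, the `dim_of` function and the variable `area` used in the failing example are all assumptions made for the sketch.

```python
from typing import Dict, List, Union

Dim = Dict[str, int]          # unit -> exponent, e.g. {"meter": 2} for meter^2
Expr = Union[str, tuple]      # a variable name, or (operator, lhs, rhs)


def dim_of(expr: Expr, env: Dict[str, Dim], errors: List[str]) -> Dim:
    """Compute the dimension of expr, recording unit mismatches in errors."""
    if isinstance(expr, str):
        if expr not in env:
            errors.append(f"undeclared variable {expr}")
            return {}
        return env[expr]
    op, lhs, rhs = expr
    left = dim_of(lhs, env, errors)
    right = dim_of(rhs, env, errors)
    if op == "+":
        if left != right:                      # addition needs identical units
            errors.append(f"cannot add {left} and {right}")
        return left
    if op == "*":                              # multiplication adds exponents
        merged = dict(left)
        for unit, power in right.items():
            merged[unit] = merged.get(unit, 0) + power
        return {u: p for u, p in merged.items() if p != 0}
    raise ValueError(f"unknown operator {op}")


if __name__ == "__main__":
    env = {"x": {"meter": 1}, "y": {"meter": 1}, "area": {"meter": 2}}
    errors: List[str] = []
    print(dim_of(("*", "x", "y"), env, errors))   # dimension of x * y
    dim_of(("+", "x", "area"), env, errors)       # meter + meter^2 is rejected
    print(errors)
```

Once the checker reports no errors, the unit annotations carry no run-time meaning, which is why the translated C program can simply drop them.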
21 45
Extension composition
- Programmers can select the extensions that they want
- May want to use multiple extensions in the same program
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⟹ .c translator for them
- How can that be done?
22 45
Building translators from composable extensible languages
Two primary challenges:
1. composable syntax: enables building a scanner / parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics: analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute defining equations, on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver
23 45
Generating parsers and scanners from grammars and regular expressions
    nonterminals: Stmt, Expr
    terminals: Id   /[a-zA-Z][a-zA-Z0-9]*/
               Num  /[0-9]+/
               Eq   '='
               Semi ';'
               Plus '+'
               Mult '*'
    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr
    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id
24 45
[Figure: parse tree]
    Stmt
      Stmt
        Id(x) Eq Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a) Eq Expr
          Id(b)
Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
Input text: "x = y + 3 * z ; a = b"
25 45
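The scanning step for this small grammar can be sketched in Python. This is a hand-written illustration, not what Copper generates: the `scan` function and the whitespace handling are assumptions. It reproduces the token stream shown for the input text.

```python
import re
from typing import List

# Terminal definitions from the grammar slide. These terminals never overlap,
# so Python's first-match alternation is sufficient for this sketch.
TERMINALS = [
    ("Id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num", r"[0-9]+"),
    ("Eq", r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TERMINALS))


def scan(text: str) -> List[str]:
    """Turn input text into a list of terminal names, keeping lexemes for Id and Num."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # whitespace is skipped (assumed layout rule)
            pos += 1
            continue
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"no terminal matches at offset {pos}")
        name = m.lastgroup
        tokens.append(f"{name}({m.group()})" if name in ("Id", "Num") else name)
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(" ".join(scan("x = y + 3 * z ; a = b")))
```

A conventional scanner like this one knows nothing about the parser's state, which is exactly the limitation that context-aware scanning removes.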
Attribute Grammars
- add semantics (meaning) to context free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env
26 45
[Figure: parse tree decorated with attribute values]
    Stmt
      Stmt
        Id(x) Eq Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(y)
      Semi
      Stmt
        Id(x) Eq Expr
          Id(z)
Attribute values shown on the tree:
- env = [x ↦ int, y ↦ int, z ↦ string] on the Stmt and Expr nodes
- type = int, errors = [ ] on the expressions of the first statement
- errors = [ ] on the first statement
- t = string on Id(z) in the second statement
- errors = [ERROR] on the second statement and on the root
27 45
Attribute grammar specifications
Equations associated with productions define attribute values.
    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ... ;
      e.type = int ;
      l.env = e.env ;  r.env = e.env ;
    }
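One way to read these equations is as functions over the tree: inherited attributes such as env flow down as arguments, synthesized attributes such as type and errors flow up as results. The sketch below is a hand-written Python illustration of that reading, not how Silver evaluates attributes; the node classes and error messages are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Env = Dict[str, str]   # inherited attribute: variable name -> type name


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Num, Add]


def typ(e: Expr, env: Env) -> str:
    """Synthesized attribute 'type'."""
    if isinstance(e, Var):
        return env.get(e.name, "unknown")
    return "int"   # Num, and Add as on the slide (e.type = int)


def errors(e: Expr, env: Env) -> List[str]:
    """Synthesized attribute 'errors': children's errors plus a local check."""
    if isinstance(e, Var):
        return [] if e.name in env else [f"{e.name} is not declared"]
    if isinstance(e, Num):
        return []
    # Passing env unchanged to both children mirrors l.env = e.env; r.env = e.env.
    local = [f"operand of + has type {typ(c, env)}, expected int"
             for c in (e.left, e.right) if typ(c, env) != "int"]
    return errors(e.left, env) + errors(e.right, env) + local


if __name__ == "__main__":
    env = {"x": "int", "y": "int", "z": "string"}
    print(errors(Add(Var("y"), Num(3)), env))     # no errors expected
    print(errors(Add(Var("y"), Var("z")), env))   # one error: z is a string
```

In an attribute grammar system the equations are declared per production and the evaluation order is worked out by the tool, so an extension can add new productions or new attributes without editing functions like these.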
28 45
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.
29 45
for-loop as an extension
    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors ++
                 ... check that i is an integer ... ;
      forwards to
        // i = lower; while (i <= upper) { body; i = i+1; }
        seq ( assignment ( varRef(i), lower ),
              while ( lte ( varRef(i), upper ),
                      block ( seq ( body,
                                    assignment ( varRef(i),
                                                 add ( varRef(i), intLit("1") ) ) ) ) ) ) ;
    }
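Forwarding can be sketched in Python as a fallback: when an attribute has no equation on an extension construct, the value is taken from the host-language tree that the construct forwards to. The sketch below is an illustration under assumptions, not Silver's mechanism: trees are nested tuples, `to_c` stands in for a host translation attribute, and `for_forward` and `FORWARDS` are hypothetical names.

```python
from typing import Callable, Dict

# Trees are tuples ("constructor", child, ...); leaves carry strings.


def to_c(t: tuple) -> str:
    """Host-language translation attribute, defined only on host constructs."""
    kind = t[0]
    if kind in ("varRef", "intLit"):
        return t[1]
    if kind == "add":
        return f"({to_c(t[1])} + {to_c(t[2])})"
    if kind == "lte":
        return f"({to_c(t[1])} <= {to_c(t[2])})"
    if kind == "assignment":
        return f"{to_c(t[1])} = {to_c(t[2])};"
    if kind == "seq":
        return f"{to_c(t[1])} {to_c(t[2])}"
    if kind == "block":
        return f"{{ {to_c(t[1])} }}"
    if kind == "while":
        return f"while {to_c(t[1])} {to_c(t[2])}"
    if kind in FORWARDS:
        # Extension construct: no equation here, so use the tree it forwards to.
        return to_c(FORWARDS[kind](*t[1:]))
    raise ValueError(f"unknown construct {kind}")


def for_forward(i: str, lower: tuple, upper: tuple, body: tuple) -> tuple:
    """Host tree for: i = lower; while (i <= upper) { body; i = i+1; }"""
    step = ("assignment", ("varRef", i), ("add", ("varRef", i), ("intLit", "1")))
    return ("seq",
            ("assignment", ("varRef", i), lower),
            ("while", ("lte", ("varRef", i), upper), ("block", ("seq", body, step))))


FORWARDS: Dict[str, Callable[..., tuple]] = {"for": for_forward}

if __name__ == "__main__":
    body = ("assignment", ("varRef", "s"), ("add", ("varRef", "s"), ("varRef", "i")))
    print(to_c(("for", "i", ("intLit", "0"), ("varRef", "n"), body)))
```

The host translation never mentions the for-loop, yet the extension still translates to C. Attributes the extension does define itself, such as errors on the slide, take priority over the forwarded tree.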
30 45
Building an attribute grammar evaluator from composed specifications
$AG^{H} \cup^{*} \{AG^{E_1}, \ldots, AG^{E_n}\}$
$\forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E_i})$
$\Rightarrow\ \mathit{complete}(AG^{H} \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$
- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in host language
    int SELECT ;
    rs = using c query { SELECT last_name
                         FROM person WHERE ... } ;
32 45
Challenges in scanning
Different extensions use the same keyword
    connection c jdbcderbyderbydbtestdb
      with table person [ person_id INTEGER,
                          first_name VARCHAR ];
    b = table ( c1 : T F,
                c2 : F * );
33 45
Challenges in scanning
Operators with different precedence specifications
    x = 3 + y * z ;
    str = /[a-z][a-z0-9]*\.java/
34 45
Challenges in scanning
Terminals that are prefixes of others
    List<List<Integer>> dlist ;
    x = y >> 4 ;
35 45
Need for context
- Traditionally, parser and scanner are disjoint.
    Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate.
    Scanner ⇄ Parser → Semantic Analysis
36 45
Context aware scanning
- Scanner recognizes only tokens valid for current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
    chan in, out;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
    terminal IN 'in' ;
    terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to {keyword} ;
    terminal FOR 'for' lexer class {keyword} ;
- example is part of AbleP [SPIN'11]
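The effect of the three terminal declarations can be sketched with a scanner that is handed, on each call, the set of terminals the parser can accept next. This is a hypothetical Python illustration, not Copper's implementation: the `scan` function, the `valid` sets passed in the examples and the ambiguity error are assumptions made for the sketch.

```python
import re
from typing import Optional, Set, Tuple

# Terminals from the slide. FOR belongs to lexer class "keyword";
# ID submits to that class. IN is not in the keyword class.
TERMINALS = {
    "IN": re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORD_CLASS = {"FOR"}


def scan(text: str, pos: int, valid: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (terminal, lexeme), considering only terminals valid in this context."""
    matches = {}
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m:
            matches[name] = m.group()
    if not matches:
        return None
    longest = max(len(lexeme) for lexeme in matches.values())
    winners = {n for n, lexeme in matches.items() if len(lexeme) == longest}
    if "ID" in winners and winners & KEYWORD_CLASS:
        winners.discard("ID")            # ID submits to keywords
    if len(winners) != 1:
        raise ValueError(f"lexical ambiguity between {sorted(winners)}")
    name = winners.pop()
    return name, matches[name]


if __name__ == "__main__":
    print(scan("in, out", 0, {"ID"}))        # after 'chan', only a name is valid
    print(scan("in a", 0, {"IN"}))           # after 'for i', only IN is valid
    print(scan("for i", 0, {"FOR", "ID"}))   # keyword wins over identifier
```

Because "in" is scanned as ID where only a name can appear and as IN where only the keyword can appear, the two declarations never compete, and Promela programs may keep using `in` as a channel name.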
37 45
Parsing C as an extension to Promela
    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */
    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
          Pexample->_pid, now.z, now.pt.x, now.pt.y); };
      assert(false)   /* trigger an error trail */
    }
38 45
Context aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context aware scanner is essentially an implicitly-moded scanner.
  - There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
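The interplay of maximal munch and context can be sketched in a few lines of Python: the longest match is still preferred, but only among terminals in the valid lookahead set. This is a hypothetical illustration, not Copper's generated scanner; the terminal names GT and RSHIFT and the `valid` sets passed in the examples are assumptions.

```python
from typing import Optional, Set, Tuple

# Two literal terminals where one is a prefix of the other (names are assumed).
TERMINALS = {"GT": ">", "RSHIFT": ">>"}


def scan(text: str, pos: int, valid: Set[str]) -> Optional[Tuple[str, str]]:
    """Longest match among the valid terminals only (maximal munch within context)."""
    candidates = [(len(literal), name)
                  for name, literal in TERMINALS.items()
                  if name in valid and text.startswith(literal, pos)]
    if not candidates:
        return None
    _, name = max(candidates)            # longest valid match wins
    return name, TERMINALS[name]


if __name__ == "__main__":
    decl = "List<List<Integer>> dlist"
    close = decl.index(">")
    # While closing type arguments, only GT is valid, so ">>" yields two GT tokens.
    print(scan(decl, close, {"GT"}), scan(decl, close + 1, {"GT"}))
    # In an expression both are valid and maximal munch picks the shift operator.
    print(scan("y >> 4", 2, {"GT", "RSHIFT"}))
```

In a generated scanner the valid set is not written by hand as it is here; it comes from the parser state, which is what makes the scanner implicitly moded.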
39 45
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.
40 45
Building a parser from composed specifications
$CFG^{H} \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\}$
$\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \wedge \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i})$
$\Rightarrow\ \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$
- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.
43 45
Future Work
- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions are compile-time
- language specific analysis
- new applications of AGs
44 45
Thanks for your attention.
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu
45 45
Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
Programming language featuresGeneral purpose featuresI assignment statements loops if-then-else statementsI functions (perhaps higher-order) and proceduresI IO facilitiesI modulesI data integer strings arrays records
Domain-specific featuresI matrix operations (MATLAB)I regular expression matching (Perl Python)I statistics functions (R)I computational geometry operations (LN)I parallel computing (SISAL X10 NESL etc)
Many similarities needless differencesWorking with multiple (domain-specific) languages is aheadache
8 45
Extensible languages
Allow programmers select the features to be used in theirprogramming languages
I new syntax notations
I new semantic analyses error-checking
Pick a general purpose host language (eg ANSI C)extend with domain-specific features
myProgramxc =rArr myProgramc
9 45
Regular expressions
include stdioh
include regexh
int main (int argc char argv [])
char text = readFileContents(Xdata)
eukaryotic messenger RNA sequences
regex foo = ^ATG[ATGC ]3 10A5 10$
if ( text =~ foo )
printf (Matches n)
else
printf (Doesnrsquot match n)
10 45
Mining Climate Data - Ocean Eddies
I Spinning pools of water
I Transport heat salt andnutrients
I Learning about theirbehavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc char argv)
Matrix float lt3gt data
= readMatrix(sshdata)
Matrix float lt3gt scores
= matrixMap(scoreTS data [2])
writeMatrix(temporalScoresdata
scores)
13 45
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible
languages
Two primary challenges1 composable syntax mdash enables building a scanner parser
I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper
2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and
higher-order attributesI set union of specification components
I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute
I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver
23 45
Generating parsers and scanners from grammars
and regular expressions
nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]
Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo
Stmt = Stmt Semi StmtStmt = Id Eq Expr
Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id
24 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type: the type of the expression
  - errors: list of error messages
  - env: mapping variable names to their types
- Stmt may be attributed with errors and env
26 45
Decorated parse tree for "x = y + 3 * y ; x = z".
Every Stmt and Expr node shown carries env = [x ↦ int, y ↦ int, z ↦ string].

Stmt                        errors = [ERROR]
  Stmt                      errors = [ ]
    Id(x) Eq Expr           type = int, errors = [ ]
      Expr: Id(y)           type = int, errors = [ ]
      Plus
      Expr                  type = int, errors = [ ]
        Expr: Num(3)
        Mult
        Expr: Id(y)
  Semi
  Stmt                      errors = [ERROR]
    Id(x) Eq Expr           type = string
      Id(z)
27 45
Attribute grammar specifications
Equations associated with productions define attribute values.

abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ...
  e.type = int ;
  l.env = e.env ;  r.env = e.env ;
}
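The equations above can be read as a recursive evaluator. The following C sketch is one way to picture that, under some simplifying assumptions: `errors` is reduced to a count rather than a list of messages, and the names `Env`, `Expr`, `ExprAttrs` and `decorate` are invented for this illustration. Silver generates its evaluators from the specification; this is not its output.

```c
#include <string.h>
#include <stddef.h>

typedef enum { TY_INT, TY_STRING, TY_UNKNOWN } Type;

/* env: maps variable names to their types (an inherited attribute). */
typedef struct { const char *name; Type type; } Binding;
typedef struct { const Binding *bindings; size_t count; } Env;

typedef enum { EXPR_ID, EXPR_NUM, EXPR_ADD } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;           /* used by EXPR_ID */
    const struct Expr *l, *r;   /* used by EXPR_ADD */
} Expr;

/* Synthesized attributes of an Expr node (errors simplified to a count). */
typedef struct { Type type; int errors; } ExprAttrs;

static Type lookup(const Env *env, const char *name) {
    for (size_t i = 0; i < env->count; i++)
        if (strcmp(env->bindings[i].name, name) == 0)
            return env->bindings[i].type;
    return TY_UNKNOWN;
}

ExprAttrs decorate(const Expr *e, const Env *env) {
    ExprAttrs a = { TY_UNKNOWN, 0 };
    switch (e->kind) {
    case EXPR_NUM:
        a.type = TY_INT;
        break;
    case EXPR_ID:
        a.type = lookup(env, e->name);
        if (a.type == TY_UNKNOWN) a.errors = 1;   /* undeclared variable */
        break;
    case EXPR_ADD: {
        /* l.env = e.env ; r.env = e.env */
        ExprAttrs la = decorate(e->l, env);
        ExprAttrs ra = decorate(e->r, env);
        /* e.errors = l.errors ++ r.errors ++ check that l and r are integers */
        a.errors = la.errors + ra.errors;
        if (la.type != TY_INT || ra.type != TY_INT) a.errors++;
        /* e.type = int */
        a.type = TY_INT;
        break;
    }
    }
    return a;
}
```

With env = [x ↦ int, y ↦ int, z ↦ string], decorating `y + 3` reports no errors, while `y + z` reports one because `z` is a string.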
28 45
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.
29 45
for-loop as an extension

abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++ ... check that i is an integer ...
  forwards to
    -- i = lower ; while (i <= upper) { body ; i = i + 1 ; }
    seq(assignment(varRef(i), lower),
        while(lte(varRef(i), upper),
              block(seq(body,
                        assignment(varRef(i),
                                   add(varRef(i), intLit("1")))))));
}
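To make the forwarding idea concrete, here is a small C sketch that builds the host-language tree a `for` node forwards to. The `Node` type, the `Kind` tags and the helper names are assumptions made for this illustration and do not come from Silver. Nodes are never freed, which is acceptable for a sketch but not for real code.

```c
#include <stdlib.h>

typedef enum { N_VARREF, N_INTLIT, N_ADD, N_LTE,
               N_ASSIGN, N_SEQ, N_WHILE, N_BLOCK, N_OPAQUE } Kind;

typedef struct Node {
    Kind kind;
    const char *text;      /* variable name, literal, or opaque source text */
    struct Node *a, *b;    /* children; their meaning depends on kind */
} Node;

static Node *mk(Kind kind, const char *text, Node *a, Node *b) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->kind = kind; n->text = text; n->a = a; n->b = b;
    return n;
}

static Node *var_ref(const char *name) { return mk(N_VARREF, name, NULL, NULL); }

/* Builds:  i = lower ; while (i <= upper) { body ; i = i + 1 ; }
 * using only host-language constructs, as in the `forwards to` clause. */
Node *forward_for(const char *i, Node *lower, Node *upper, Node *body) {
    Node *init = mk(N_ASSIGN, NULL, var_ref(i), lower);
    Node *cond = mk(N_LTE, NULL, var_ref(i), upper);
    Node *step = mk(N_ASSIGN, NULL, var_ref(i),
                    mk(N_ADD, NULL, var_ref(i), mk(N_INTLIT, "1", NULL, NULL)));
    Node *loop = mk(N_WHILE, NULL, cond,
                    mk(N_BLOCK, NULL, mk(N_SEQ, NULL, body, step), NULL));
    return mk(N_SEQ, NULL, init, loop);
}
```

Because the result contains only host constructs, any analysis or translation defined on the host language applies to it without the extension having to define that attribute itself.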
30 45
Building an attribute grammar evaluator from composed specifications

\[ AG^{H} \cup^{*} \{ AG^{E}_{1}, \ldots, AG^{E}_{n} \} \]

\[ \forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E}_{i}) \;\Rightarrow\; \mathit{complete}(AG^{H} \cup \{ AG^{E}_{1}, \ldots, AG^{E}_{n} \}) \]

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [SLE'12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in the host language:

int SELECT ;
...
rs = using c query { SELECT last_name
                     FROM person WHERE ... } ;
32 45
Challenges in scanning
Different extensions use the same keyword:

connection c "jdbc:derby:/derby/db/testdb"
  with table person [ person_id INTEGER,
                      first_name VARCHAR ] ;
...
b = table ( c1 : T F ,
            c2 : F * ) ;
33 45
Challenges in scanning
Operators with different precedence specifications:

x = 3 + y * z ;
...
str = /[a-z][a-z0-9]*\.java/
34 45
Challenges in scanning
Terminals that are prefixes of others:

List<List<Integer>> dlist ;
...
x = y >> 4 ;
35 45
Need for context
- Traditionally, parser and scanner are disjoint.
    Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate.
    Scanner ⇄ Parser → Semantic Analysis
36 45
Context aware scanning
- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
  - terminal IN 'in' ;
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword ;
  - terminal FOR 'for' lexer class keyword ;
- example is part of AbleP [SPIN'11]
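The effect of the valid-terminal set can be sketched in C. In this illustration the parser's current context is passed to the scanner as a bitmask of acceptable terminals. The `Terminal` enum and `scan_word` are hypothetical names for this sketch; Copper derives the valid sets from the LR parse table rather than taking them as an argument.

```c
#include <ctype.h>
#include <string.h>
#include <stddef.h>

/* Terminals as bit flags so a set of them fits in one unsigned value. */
typedef enum { T_NONE = 0, T_IN = 1 << 0, T_FOR = 1 << 1, T_ID = 1 << 2 } Terminal;

/* Matches one word at `s`. `valid` is the set of terminals the parser can
 * accept in its current state. Stores the lexeme length in *len. */
Terminal scan_word(const char *s, unsigned valid, size_t *len) {
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_')
        while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    *len = n;
    if (n == 0) return T_NONE;
    /* ID submits to keywords, but a keyword only counts if it is valid here. */
    if ((valid & T_IN)  && n == 2 && strncmp(s, "in", 2) == 0)  return T_IN;
    if ((valid & T_FOR) && n == 3 && strncmp(s, "for", 3) == 0) return T_FOR;
    if (valid & T_ID) return T_ID;
    return T_NONE;
}
```

After `chan` only an identifier is valid, so `in` is scanned as ID; after `for i` only the keyword is valid, so the same characters are scanned as IN.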
37 45
Parsing C as an extension to Promela

c_decl {
  typedef struct Coord {
    int x, y; } Coord;
}
c_state "Coord pt" "Global"    /* goes in state vector */
int z = 3;                     /* standard global decl */
active proctype example()
{
  c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
             Pexample->_pid, now.z, now.pt.x, now.pt.y); };
  assert(false)                /* trigger an error trail */
}
38 45
Context aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
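The ">" versus ">>" case can be sketched in C as maximal munch restricted to the valid terminals. The flag names and `scan_operator` are assumptions for this illustration only. As in the previous discussion, the valid set is handed in explicitly here, whereas a generated context-aware scanner obtains it from the parser state.

```c
#include <string.h>
#include <stddef.h>

enum { OP_GT = 1 << 0, OP_SHR = 1 << 1 };

/* Returns the length of the longest operator at `s` that is in `valid`
 * and stores its terminal in *matched; returns 0 if nothing valid matches. */
size_t scan_operator(const char *s, unsigned valid, unsigned *matched) {
    static const struct { const char *text; unsigned terminal; } ops[] = {
        { ">>", OP_SHR },   /* longest first: maximal munch among valid terminals */
        { ">",  OP_GT  },
    };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i].text);
        if ((valid & ops[i].terminal) && strncmp(s, ops[i].text, n) == 0) {
            *matched = ops[i].terminal;
            return n;
        }
    }
    *matched = 0;
    return 0;
}
```

When closing nested type arguments only ">" is valid, so the scanner returns the shorter match; in an expression, where both are valid, ">>" still wins by maximal munch.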
39 45
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.
40 45
Building a parser from composed specifications
\[ CFG^{H} \cup^{*} \{ CFG^{E}_{1}, \ldots, CFG^{E}_{n} \} \]

\[ \bigl( \forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i}) \bigr) \;\Rightarrow\; \mathit{conflictFree}(CFG^{H} \cup \{ CFG^{E}_{1}, \ldots, CFG^{E}_{n} \}) \]

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context-aware scanning.
But context-aware scanning makes it practical.
43 45
Future Work
- ableC: extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs
44 45
Thanks for your attention
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu
45 45
References

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
Extensible languages
Allow programmers to select the features to be used in their programming languages:
- new syntax / notations
- new semantic analyses / error-checking
Pick a general-purpose host language (e.g. ANSI C) and extend it with domain-specific features.
myProgram.xc ⇒ myProgram.c
9 45
Regular expressions
#include <stdio.h>
#include <regex.h>
int main (int argc, char **argv) {
  char *text = readFileContents("X.data");
  // eukaryotic messenger RNA sequences
  regex foo = /^ATG[ATGC]{3,10}A{5,10}$/ ;
  if ( text =~ foo )
    printf ("Matches \n");
  else
    printf ("Doesn't match \n");
}
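For comparison, the same match written in plain C against the POSIX `regex.h` API might look like the sketch below. The slides do not show what the regex extension actually translates to, so this is only an assumption about the general shape of the host-language code; the function name `matches_mrna` is invented here.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches the pattern from the slide, 0 if it does not,
 * and -1 if the pattern fails to compile. */
int matches_mrna(const char *text) {
    regex_t re;
    /* REG_EXTENDED enables the {m,n} interval syntax;
     * REG_NOSUB says we only need a yes/no answer, not match positions. */
    if (regcomp(&re, "^ATG[ATGC]{3,10}A{5,10}$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

The extension's `regex` type and `=~` operator hide the compile, execute and free steps that the library interface makes explicit.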
10 45
Mining Climate Data - Ocean Eddies
- Spinning pools of water
- Transport heat, salt and nutrients
- Learning about their behavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc, char **argv) {
  Matrix float <3> data
    = readMatrix("ssh.data");
  Matrix float <3> scores
    = matrixMap(scoreTS, data, [2]);
  writeMatrix("temporalScores.data",
              scores);
}
13 45
Matrix float <1> scoreTS (Matrix float <1> ts) {
  int i = 0, beginning, n = dimSize(ts, 0);
  Matrix float <1> scores
    = init(Matrix float <1>, dimSize(ts, 0));
  while(ts[i] < ts[i+1]) i = i+1;
  Matrix float [0] trough;
  while(i < n-1) {
    (trough, beginning, i)
      = getTrough(ts, i);
    scores[beginning : i]
      = computeArea(trough);
  }
  return scores;
}
14 45
Matrix float <1> computeArea
    (Matrix float <1> areaOfInterest) {
  float y1 = areaOfInterest[0];
  float y2 = areaOfInterest[end];
  int x1 = 0;
  int x2 = dimSize(areaOfInterest, 0) - 1;
  float m = (y1-y2) / ((float)(x1-x2));
  float b = y1 - m*x1;
  Matrix float <1> line = (x1 : x2)*m + b;
  float area
    = with( x1 <= i < x2)
        fold(+, 0.0, line - areaOfInterest);
  return
    with( 0 <= i < dimSize(line, 0) )
      genarray([dimSize(line, 0)], area);
}
15 45
(Matrix float <1>, int, int) getTrough
    (Matrix float <1> ts, int i) {
  int beginning = i;
  int n = dimSize(ts, 0);
  while(i+1 < n && ts[i] >= ts[i+1])
    i = i+1;
  while(i+1 < n && ts[i] < ts[i+1])
    i = i+1;
  return (ts[beginning : i], beginning, i);
}
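Without tuple returns and matrix slicing, the same search can be written in plain C by returning the two indices in a struct and leaving the slice implicit. This is a hand-written sketch for comparison, not the translator's output; `Trough` and `get_trough` are names assumed for this illustration.

```c
#include <stddef.h>

/* Index range [begin, end] of one trough in a time series. */
typedef struct { size_t begin; size_t end; } Trough;

/* Starting at index i, walks down to the bottom of a trough and back up
 * the other side. `ts` has `n` elements; the caller ensures i < n. */
Trough get_trough(const float *ts, size_t n, size_t i) {
    Trough t;
    t.begin = i;
    while (i + 1 < n && ts[i] >= ts[i + 1]) i++;   /* descend */
    while (i + 1 < n && ts[i] <  ts[i + 1]) i++;   /* ascend */
    t.end = i;
    return t;
}
```

The caller then works on `ts + t.begin` with length `t.end - t.begin + 1`, which is the bookkeeping that the range-indexing notation in the extension removes.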
16 45
Matrix extensions
- several features from MATLAB
- with, fold and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold and genarray
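As an idea of what "expected C code" could mean for the with/fold expression in computeArea, the sketch below writes the fold as an ordinary accumulation loop. The generated code is not shown in the slides, so the loop shape and the name `trough_area` are assumptions.

```c
#include <stddef.h>

/* Sums (line[i] - a[i]) for 0 <= i < n-1, where `line` is the straight
 * line through the first and last points of `a`. Requires n >= 2. */
float trough_area(const float *a, size_t n) {
    float y1 = a[0];
    float y2 = a[n - 1];
    float m = (y2 - y1) / (float)(n - 1);   /* slope; equals (y1-y2)/(x1-x2) with x1 = 0 */
    float area = 0.0f;                      /* fold(+, 0.0, ...) starts from 0.0 */
    for (size_t i = 0; i < n - 1; i++) {    /* with (x1 <= i < x2) */
        float line_i = m * (float)i + y1;   /* line[i] = i*m + b, with b = y1 */
        area += line_i - a[i];
    }
    return area;
}
```

Each iteration is independent apart from the sum, which is why a fold of this form is a natural candidate for a parallel reduction.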
17 45
Dimension analysis
pound-seconds ≠ newton-seconds
18 45
#include <stdio.h>
int main (int argc, char **argv) {
  int meter x = 34;
  int meter y = 56;
  int meter^2 area = x * y;
  printf ("%d\n", x + y);     // OK
  printf ("%d\n", x + z);     // Error
}
19 45
#include <stdio.h>
int main (int argc, char **argv) {
  int x = 34;
  int y = 56;
  int area = x * y;
  printf ("%d\n", x + y);     // OK
}
Extensions of this form find errors but otherwise are "erased" during translation.
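The check that happens before the units are erased can be pictured with a small C sketch that tracks each dimension as a vector of unit exponents. This models the analysis only. The `Dim` type and the function names are assumptions for this illustration, and only meters and seconds are tracked.

```c
#include <stdbool.h>

/* A dimension as exponents of base units; only meters and seconds here. */
typedef struct { int meter; int second; } Dim;

bool dim_equal(Dim a, Dim b) {
    return a.meter == b.meter && a.second == b.second;
}

/* Multiplication adds exponents: meter * meter = meter^2. */
Dim dim_mul(Dim a, Dim b) {
    Dim d = { a.meter + b.meter, a.second + b.second };
    return d;
}

/* Addition is well-dimensioned only if both operands agree.
 * On success the result dimension is stored in *out. */
bool dim_add(Dim a, Dim b, Dim *out) {
    if (!dim_equal(a, b)) return false;
    *out = a;
    return true;
}
```

Here `x * y` gets dimension meter^2 and `x + y` is accepted, while adding a meter quantity to a meter^2 quantity is rejected. None of this bookkeeping needs to survive into the generated C, which is what "erased" refers to.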
21 45
Mining Climate Data - Ocean Eddies
I Spinning pools of water
I Transport heat salt andnutrients
I Learning about theirbehavior is difficult
11 45
A time slice for a point in the ocean
12 45
main (int argc char argv)
Matrix float lt3gt data
= readMatrix(sshdata)
Matrix float lt3gt scores
= matrixMap(scoreTS data [2])
writeMatrix(temporalScoresdata
scores)
13 45
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
#include <stdio.h>
int main (int argc, char *argv[]) {
  int x = 34;
  int y = 56;
  int area = x * y;
  printf("%d\n", x + y);   // OK
}

Extensions of this form find errors but otherwise are "erased" during translation.

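One common way to implement this kind of check is to track each dimension as a vector of exponents over base units. The sketch below illustrates that idea in plain C; it is an assumption about how such an analysis could work, not a description of the extension's actual implementation, and the names Dim, dim_can_add, and dim_mul are hypothetical.

```c
#include <stdbool.h>

/* A dimension as exponents of base units; only meter and second are
 * modeled in this sketch. */
typedef struct {
    int meter;
    int second;
} Dim;

/* Addition is only well-formed when both operands have the same dimension. */
bool dim_can_add(Dim a, Dim b)
{
    return a.meter == b.meter && a.second == b.second;
}

/* Multiplication adds exponents: meter * meter = meter^2. */
Dim dim_mul(Dim a, Dim b)
{
    Dim r = { a.meter + b.meter, a.second + b.second };
    return r;
}
```

Since these checks run entirely at translation time, nothing of the Dim bookkeeping needs to appear in the generated C, which matches the "erased" behavior described above.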
Extension composition
- Programmers can select the extensions that they want
- May want to use multiple extensions in the same program
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them
- How can that be done?

Building translators from composable extensible languages
Two primary challenges:
1. composable syntax - enables building a scanner and parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics - analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver

Generating parsers and scanners from grammars and regular expressions

nonterminals: Stmt, Expr
terminals:    Id    /[a-zA-Z][a-zA-Z0-9]*/
              Num   /[0-9]+/
              Eq    '='
              Semi  ';'
              Plus  '+'
              Mult  '*'

Stmt ::= Stmt Semi Stmt
Stmt ::= Id Eq Expr
Expr ::= Expr Plus Expr
Expr ::= Expr Mult Expr
Expr ::= Id

Parse tree:

Stmt
  Stmt
    Id(x) Eq Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(z)
  Semi
  Stmt
    Id(a) Eq Expr
      Id(b)

Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
Source text: "x = y + 3 * z ; a = b"

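The token stream above can be produced by a conventional longest-match scanner. The following is a small hand-written C sketch of such a scanner for the six terminals in the grammar; a generated scanner would be table-driven instead, and the names Kind, Token, and scan are hypothetical.

```c
#include <ctype.h>

typedef enum { T_ID, T_NUM, T_EQ, T_SEMI, T_PLUS, T_MULT, T_ERROR } Kind;

typedef struct {
    Kind kind;
    const char *start;  /* first character of the lexeme */
    int len;            /* lexeme length */
} Token;

/* Scans src into out (at most max tokens) and returns the number of
 * tokens produced, or -1 on an unrecognized character or overflow. */
int scan(const char *src, Token *out, int max)
{
    int n = 0;
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (n == max) return -1;
        Token t = { T_ERROR, p, 1 };
        if (isalpha((unsigned char)*p)) {          /* Id: [a-zA-Z][a-zA-Z0-9]* */
            t.kind = T_ID;
            while (isalnum((unsigned char)p[t.len])) t.len++;
        } else if (isdigit((unsigned char)*p)) {   /* Num: [0-9]+ */
            t.kind = T_NUM;
            while (isdigit((unsigned char)p[t.len])) t.len++;
        } else if (*p == '=') { t.kind = T_EQ; }
        else if (*p == ';') { t.kind = T_SEMI; }
        else if (*p == '+') { t.kind = T_PLUS; }
        else if (*p == '*') { t.kind = T_MULT; }
        else return -1;
        out[n++] = t;
        p += t.len;
    }
    return n;
}
```

Calling scan on the source text of this slide yields the eleven tokens listed in the token stream.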
Attribute Grammars
- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env

[Figure: parse tree of "x = y + 3 * y ; x = z" decorated with attributes. Every node carries env = [x ↦ int, y ↦ int, z ↦ string]. The expressions y, 3, and y + 3 * y have type = int and errors = [ ], and the first statement has errors = [ ]. In the second statement the expression z has type = string, so that statement and the root both have errors = [ERROR].]

Attribute grammar specifications
Equations associated with productions define attribute values

abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ...;
  e.type = int;
  l.env = e.env;  r.env = e.env;
}

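The equations of the addition production can be read as a recursive tree walk: env flows down to the children, while type and errors flow back up. The C sketch below illustrates that reading under simplifying assumptions: errors are counted rather than collected as a list of messages, and all names (Expr, Env, Attrs, decorate) are hypothetical rather than anything generated by Silver.

```c
#include <string.h>

typedef enum { TY_INT, TY_STRING, TY_UNKNOWN } Type;
typedef enum { E_VAR, E_NUM, E_ADD } ExprKind;

typedef struct { const char *name; Type type; } Binding;
typedef struct { const Binding *bindings; int count; } Env;

typedef struct Expr {
    ExprKind kind;
    const char *name;           /* used by E_VAR */
    const struct Expr *l, *r;   /* used by E_ADD */
} Expr;

/* Synthesized attributes computed for one node. */
typedef struct { Type type; int errors; } Attrs;

static Type lookup(Env env, const char *name)
{
    for (int i = 0; i < env.count; i++) {
        if (strcmp(env.bindings[i].name, name) == 0) return env.bindings[i].type;
    }
    return TY_UNKNOWN;
}

/* env is the inherited attribute, passed down unchanged (l.env = e.env,
 * r.env = e.env); the result carries the synthesized attributes. */
Attrs decorate(const Expr *e, Env env)
{
    Attrs a = { TY_INT, 0 };
    switch (e->kind) {
    case E_NUM:
        break;
    case E_VAR:
        a.type = lookup(env, e->name);
        if (a.type == TY_UNKNOWN) a.errors = 1;
        break;
    case E_ADD: {
        Attrs l = decorate(e->l, env);
        Attrs r = decorate(e->r, env);
        /* e.errors = l.errors ++ r.errors ++ (check that l and r are integers) */
        a.errors = l.errors + r.errors;
        if (l.type != TY_INT || r.type != TY_INT) a.errors++;
        a.type = TY_INT;        /* e.type = int */
        break;
    }
    }
    return a;
}
```

An attribute grammar system derives this evaluation order from the equations, so the specification writer never has to code the traversal by hand.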
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension

abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++
             ... check that i is an integer ...;
  forwards to
    -- i = lower; while (i <= upper) { body; i = i + 1; }
    seq(assignment(varRef(i), lower),
        while(lte(varRef(i), upper),
              block(seq(body,
                        assignment(varRef(i), add(varRef(i), intLit("1")))))));
}

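Forwarding defines the for-loop by the host-language statement it is equivalent to. As a simplified, text-based stand-in for the tree construction above, the following C sketch renders that equivalent statement as a string; the real mechanism builds syntax trees, not text, and the function name desugar_for is hypothetical.

```c
#include <stdio.h>

/* Writes the statement that a for-loop forwards to into buf and returns
 * the number of characters the full text needs (as snprintf does). */
int desugar_for(char *buf, size_t size, const char *i,
                const char *lower, const char *upper, const char *body)
{
    return snprintf(buf, size,
                    "%s = %s; while (%s <= %s) { %s %s = %s + 1; }",
                    i, lower, i, upper, body, i, i);
}
```

For example, desugar_for with i = "i", lower = "0", upper = "n", and body = "a[i] = i;" produces `i = 0; while (i <= n) { a[i] = i; i = i + 1; }`.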
Building an attribute grammar evaluator from composed specifications

$AG^H \cup^* \{AG^{E_1}, \ldots, AG^{E_n}\}$

$\forall i \in [1, n].\ modComplete(AG^H, AG^{E_i})$
$\Longrightarrow complete(AG^H \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [SLE'12a]

Challenges in scanning
Keywords in embedded languages may be identifiers in the host language

  int SELECT ;
  ...
  rs = using c query { SELECT last_name
                       FROM person WHERE ... } ;

Challenges in scanning
Different extensions use the same keyword

  connection c "jdbc:derby:/derby/db/testdb"
    with table person [ person_id INTEGER,
                        first_name VARCHAR ] ;
  ...
  b = table ( c1 : T F ,
              c2 : F * ) ;

Challenges in scanning
Operators with different precedence specifications

  x = 3 + y * z ;
  ...
  str = /[a-z][a-z0-9]*\.java/

Challenges in scanning
Terminals that are prefixes of others

  List<List<Integer>> dlist ;
  ...
  x = y >> 4 ;

Need for context
- Traditionally, parser and scanner are disjoint:
    Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate:
    Scanner ⇄ Parser → Semantic Analysis

Context aware scanning
- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i ; }
  - Two terminal symbols that match "in":
    - terminal IN 'in' ;
    - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to { keyword } ;
    - terminal FOR 'for' lexer class { keyword } ;
- example is part of AbleP [SPIN'11]

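The "in" example can be sketched as a scanner routine that receives the set of terminals the parser will accept in its current state. This C fragment is only an illustration of the idea, not Copper's implementation; the bit-set encoding and the name classify_word are assumptions made for the sketch.

```c
#include <string.h>

typedef enum { TERM_ID = 1 << 0, TERM_IN = 1 << 1, TERM_FOR = 1 << 2 } Terminal;

/* Classifies one already-delimited word. valid is a bit set of the
 * terminals the parser can accept in its current state. Returns 0 when
 * no valid terminal matches. */
int classify_word(const char *word, unsigned valid)
{
    /* Keywords take precedence over ID when they are valid here,
     * mirroring "ID submits to keyword". */
    if ((valid & TERM_IN) && strcmp(word, "in") == 0) return TERM_IN;
    if ((valid & TERM_FOR) && strcmp(word, "for") == 0) return TERM_FOR;
    if (valid & TERM_ID) return TERM_ID;
    return 0;
}
```

In the declaration `chan in, out;` only ID is valid after `chan`, so "in" is scanned as an identifier; after `for i` only IN is valid, so the same text is scanned as the keyword.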
Parsing C as an extension to Promela

c_decl {
  typedef struct Coord {
    int x, y; } Coord;
}
c_state "Coord pt" "Global"   /* goes in state vector */
int z = 3;                    /* standard global decl */
active proctype example()
{
  c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
             Pexample->_pid, now.z, now.pt.x, now.pt.y); };
  assert(false)               /* trigger an error trail */
}

Context aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

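The rule "shorter valid match before longer invalid match" can be illustrated for the ">" and ">>" terminals. The C sketch below assumes the parser supplies the valid terminals as a bit set; the names Tok and scan_angle are hypothetical, and a generated scanner would apply the same filter inside its automaton rather than with explicit comparisons.

```c
typedef enum { TOK_NONE = 0, TOK_GT = 1 << 0, TOK_RSHIFT = 1 << 1 } Tok;

/* Returns the terminal scanned at the start of s, restricted to the
 * terminals in valid, and stores the lexeme length in *len. */
int scan_angle(const char *s, unsigned valid, int *len)
{
    /* Try the longest match first, but only if the context allows it. */
    if ((valid & TOK_RSHIFT) && s[0] == '>' && s[1] == '>') {
        *len = 2;
        return TOK_RSHIFT;
    }
    if ((valid & TOK_GT) && s[0] == '>') {
        *len = 1;
        return TOK_GT;
    }
    *len = 0;
    return TOK_NONE;
}
```

When closing a nested generic type only TOK_GT is valid, so ">>" is scanned as two ">" tokens; in an expression, where TOK_RSHIFT is also valid, maximal munch still selects ">>".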
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications

$CFG^H \cup^* \{CFG^{E_1}, \ldots, CFG^{E_n}\}$

$\forall i \in [1, n].\ isComposable(CFG^H, CFG^{E_i}) \land conflictFree(CFG^H \cup CFG^{E_i})$
$\Longrightarrow conflictFree(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.

Future Work
- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions happens at compile time
- language-specific analysis
- new applications of AGs

Thanks for your attention
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63-72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1-2):39-54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199-210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352-371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44-63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108-125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575-599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102-116. Springer-Verlag, March 2007.
Matrix float lt1gt scoreTS (Matrix float lt1gtts)
int i = 0 beginning n = dimSize(ts 0)
Matrix float lt1gt scores
= init(Matrix float lt1gt dimSize(ts 0))
while(ts[i] lt ts[i+1]) i = i+1
Matrix float [0] trough
while(i lt n-1)
(trough beginning i)
= getTrough(ts i)
scores[beginning i]
= computeArea(trough)
return scores
14 45
Matrix float lt1gt computeArea
(Matrix float lt1gt areaOfInterest)
float y1 = areaOfInterest [0]
float y2 = areaOfInterest[end]
int x1 = 0
int x2=dimSize(areaOfInterest 0) -1
float m = (y1-y2) ((float)(x1-x2))
float b = y1 - mx1
Matrix float lt1gt Line = (x1x2)m+b
float area
= with( x1 lt= i lt x2)
fold(+ 00 line - areaOfInterest)
return
with( 0 lt= i lt dimSize(Line 0) )
genarray ([ dimSize(Line 0)] area)
15 45
(Matrix float lt1gt int int) getTrough
(Matrix float lt1gt ts int i)
int beginning = i
int n = dimSize(ts 0)
while(i+1 lt n ampamp ts[i] gt= ts[i+1])
i = i+1
while(i+1 lt n ampamp ts[i] lt ts[i+1])
i = i+1
return (ts[beginning i] beginning i)
16 45
Matrix extensions
- several features from MATLAB
- with, fold, and genarray from Single Assignment C
- all translated down to expected C code
- straightforward parallel implementations of matrixMap, with, fold, and genarray
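For readers without the extended C toolchain, the trough-scoring functions can be mimicked in plain Python. This is only a sketch under assumptions that are not stated on the slides: index ranges are taken to be inclusive at both ends, with/fold is read as a sum over the index range, genarray as filling an array with a single value, and the snake_case names are illustrative.

```python
def get_trough(ts, i):
    """Walk down, then up, from index i; return the trough and its bounds."""
    beginning = i
    n = len(ts)
    while i + 1 < n and ts[i] >= ts[i + 1]:   # descending side
        i += 1
    while i + 1 < n and ts[i] < ts[i + 1]:    # ascending side
        i += 1
    # Assumption: the matrix range beginning..i includes both endpoints.
    return ts[beginning:i + 1], beginning, i


def compute_area(area_of_interest):
    """Area between the chord joining the endpoints and the series."""
    n = len(area_of_interest)                 # needs n >= 2
    y1, y2 = area_of_interest[0], area_of_interest[-1]
    x1, x2 = 0, n - 1
    m = (y1 - y2) / float(x1 - x2)
    b = y1 - m * x1
    line = [x * m + b for x in range(x1, x2 + 1)]
    # Reading of: with (x1 <= i < x2) fold(+, 0.0, line - areaOfInterest)
    area = sum((line[i] - area_of_interest[i] for i in range(x1, x2)), 0.0)
    # Reading of: genarray([n], area)
    return [area] * n
```

For the series [3, 1, 2, 0, 4] starting at index 0, get_trough walks down to index 1 and back up to index 2, so the trough covers indices 0 through 2.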
17 45
Dimension analysis
pound-seconds ≠ newton-seconds
18 45
    #include <stdio.h>
    int main (int argc, char *argv[]) {
      int meter x = 34;
      int meter y = 56;
      int meter^2 area = x * y;
      printf("%d\n", x + y);   // OK
      printf("%d\n", x + z);   // Error
    }
19 45
    #include <stdio.h>
    int main (int argc, char *argv[]) {
      int x = 34;
      int y = 56;
      int area = x * y;
      printf("%d\n", x + y);   // OK
    }
Extensions of this form find errors but are otherwise "erased" during translation.
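The checking rule behind the example can be sketched in Python. Note the swap: the extension checks units statically and then erases them, whereas this sketch checks them at run time. The Quantity class and the make, add, and mul helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: int
    units: tuple  # sorted (unit name, exponent) pairs, e.g. (("meter", 2),)


def make(value, **units):
    return Quantity(value, tuple(sorted(units.items())))


def add(a, b):
    # Addition is only defined for operands with identical units.
    if a.units != b.units:
        raise TypeError(f"unit mismatch: {a.units} vs {b.units}")
    return Quantity(a.value + b.value, a.units)


def mul(a, b):
    # Multiplication adds the exponents of matching units.
    exps = dict(a.units)
    for name, exponent in b.units:
        exps[name] = exps.get(name, 0) + exponent
    units = tuple(sorted((n, e) for n, e in exps.items() if e != 0))
    return Quantity(a.value * b.value, units)


x = make(34, meter=1)
y = make(56, meter=1)
area = mul(x, y)      # units: meter^2
total = add(x, y)     # OK: both operands are in meters
```

Dropping the units field from every Quantity leaves plain integer arithmetic, which corresponds to the erased program.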
21 45
Extension composition
- Programmers can select the extensions that they want
- May want to use multiple extensions in the same program
- Distinguish between
  1. extension user
     - has no knowledge of language design or implementations
  2. extension developer
     - must know about language design and implementation
- Tools build a custom .xc ⇒ .c translator for them
- How can that be done?
22 45
Building translators from composable extensible languages
Two primary challenges:
1. composable syntax: enables building a scanner and parser
   - context-aware scanning [GPCE'07]
   - modular determinism analysis [PLDI'09]
   - Copper
2. composable semantics: analysis and translations
   - attribute grammars with forwarding, collections, and higher-order attributes
   - set union of specification components
     - sets of productions, non-terminals, attributes
     - sets of attribute-defining equations on a production
     - sets of equations contributing values to a single attribute
   - modular well-definedness analysis [SLE'12a]
   - modular termination analysis [SLE'12b, Krishnan-PhD]
   - Silver
23 45
Generating parsers and scanners from grammars and regular expressions
    nonterminals: Stmt, Expr
    terminals: Id    /[a-zA-Z][a-zA-Z0-9]*/
               Num   /[0-9]+/
               Eq    '='
               Semi  ';'
               Plus  '+'
               Mult  '*'
    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr
    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id
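To make the specification concrete, here is a hand-written Python scanner and parser for the same terminals and productions. It is a sketch, not what a generator such as Copper produces: it uses recursive descent rather than LALR(1) tables, it resolves the ambiguity of the Expr productions by letting Mult bind tighter than Plus, and it assumes a Num may also appear as an Expr. The tuple-based tree encoding is hypothetical.

```python
import re

TOKEN_SPEC = [
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def scan(text):
    """Turn the input string into a list of (terminal, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parse a Semi-separated list of assignments into nested tuples."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "Num":
            return ("Num", expect("Num")[1])
        return ("Id", expect("Id")[1])

    def term():                      # Expr ::= Expr Mult Expr
        node = factor()
        while peek() == "Mult":
            expect("Mult")
            node = ("Mult", node, factor())
        return node

    def expr():                      # Expr ::= Expr Plus Expr
        node = term()
        while peek() == "Plus":
            expect("Plus")
            node = ("Plus", node, term())
        return node

    def stmt():                      # Stmt ::= Id Eq Expr
        name = expect("Id")[1]
        expect("Eq")
        return ("Assign", name, expr())

    stmts = [stmt()]                 # Stmt ::= Stmt Semi Stmt
    while peek() == "Semi":
        expect("Semi")
        stmts.append(stmt())
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()}")
    return stmts
```

Calling parse(scan("x = y + 3 * z; a = b")) yields two Assign nodes, with the Mult node nested under the Plus node of the first.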
24 45
[Figure: parse tree for the input below. The root Stmt splits at Semi into two assignment statements, and in the first statement the Mult expression is nested under the Plus expression.]
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
"x = y + 3 * z; a = b"
25 45
Attribute Grammars
- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type - the type of the expression
  - errors - list of error messages
  - env - mapping variable names to their types
- Stmt may be attributed with errors and env
26 45
[Figure: parse tree for "x = y + 3 * y; x = z" decorated with attribute values. Every node carries env = [x ↦ int, y ↦ int, z ↦ string]. In the first statement the expressions have type = int and errors = [ ]. In the second statement the right-hand side z has type string, and errors = [ERROR] appears on two nodes.]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values.
    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ...
      e.type = int;
      l.env = e.env;  r.env = e.env;
    }
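The equations can be read as a recursive evaluation over the tree: env is passed down (inherited) and type and errors are passed up (synthesized). The Python sketch below illustrates this reading for expressions and assignments. The tuple encoding of trees, the function names, and the error message texts are assumptions made for this example; Silver generates its own evaluator.

```python
def decorate_expr(node, env):
    """Return (type, errors) for an expression, given the inherited env."""
    kind = node[0]
    if kind == "Num":
        return "int", []
    if kind == "Id":
        name = node[1]
        if name not in env:
            return "error", [f"undeclared variable {name}"]
        return env[name], []
    if kind in ("Plus", "Mult"):
        l_type, l_errors = decorate_expr(node[1], env)   # l.env = e.env
        r_type, r_errors = decorate_expr(node[2], env)   # r.env = e.env
        errors = l_errors + r_errors                     # l.errors ++ r.errors
        if l_type != "int" or r_type != "int":
            errors = errors + [f"operands of {kind} must be integers"]
        return "int", errors                             # e.type = int
    raise ValueError(f"unknown expression kind {kind}")


def decorate_stmt(node, env):
    """Return the errors attribute of an ('Assign', name, expr) statement."""
    _, name, rhs = node
    rhs_type, errors = decorate_expr(rhs, env)
    if env.get(name) != rhs_type:
        errors = errors + [f"cannot assign {rhs_type} to {name}"]
    return errors


env = {"x": "int", "y": "int", "z": "string"}
ok = decorate_stmt(
    ("Assign", "x", ("Plus", ("Id", "y"), ("Mult", ("Num", "3"), ("Id", "y")))),
    env)
bad = decorate_stmt(("Assign", "x", ("Id", "z")), env)
```

Here ok is an empty list, while bad holds one message because z is a string and x is an int.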
28 45
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.
29 45
for-loop as an extension
    abstract production for
    s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
    {
      s.errors = lower.errors ++ upper.errors ++ body.errors ++ ... check that i is an integer ...
      forwards to
        -- i = lower; while (i <= upper) { body; i = i + 1; }
        seq(assignment(varRef(i), lower),
            while(lte(varRef(i), upper),
                  block(seq(body,
                            assignment(varRef(i), add(varRef(i), intLit("1")))))));
    }
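The effect of forwarding can be imitated in Python: the extension only builds a host-language tree, and anything that already works on host trees (here a toy interpreter) then handles the new construct for free. The constructor names mirror those in the production; the tuple encoding, the for_loop function, and the interpreter are assumptions made for illustration.

```python
def for_loop(i, lower, upper, body):
    """Build the host-language tree that the for-loop forwards to."""
    return ("seq",
            ("assignment", ("varRef", i), lower),
            ("while",
             ("lte", ("varRef", i), upper),
             ("block",
              ("seq", body,
               ("assignment", ("varRef", i),
                ("add", ("varRef", i), ("intLit", "1")))))))


def eval_expr(e, store):
    kind = e[0]
    if kind == "intLit":
        return int(e[1])
    if kind == "varRef":
        return store[e[1]]
    if kind == "add":
        return eval_expr(e[1], store) + eval_expr(e[2], store)
    if kind == "lte":
        return eval_expr(e[1], store) <= eval_expr(e[2], store)
    raise ValueError(f"unknown expression {kind}")


def run(s, store):
    """Interpreter for host statements only; it knows nothing about 'for'."""
    kind = s[0]
    if kind == "seq":
        run(s[1], store)
        run(s[2], store)
    elif kind == "assignment":
        store[s[1][1]] = eval_expr(s[2], store)
    elif kind == "while":
        while eval_expr(s[1], store):
            run(s[2], store)
    elif kind == "block":
        run(s[1], store)
    else:
        raise ValueError(f"unknown statement {kind}")


# total = total + i, for i from 1 to 4
body = ("assignment", ("varRef", "total"),
        ("add", ("varRef", "total"), ("varRef", "i")))
store = {"total": 0}
run(for_loop("i", ("intLit", "1"), ("intLit", "4"), body), store)
```

After the run, total is 10 and i is 5, matching the while-loop the production forwards to.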
30 45
Building an attribute grammar evaluator from composed specifications
$AG^H \cup^* \{AG^{E_1}, \ldots, AG^{E_n}\}$
$\forall i \in [1, n].\ modComplete(AG^H, AG^{E_i})$
$\Longrightarrow complete(AG^H \cup \{AG^{E_1}, \ldots, AG^{E_n}\})$
- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]
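A small Python sketch shows why a check on the fully composed grammar comes too late to help. Composition is modelled as set union, and the check is a deliberately simplified monolithic one (every production needs an equation for every attribute occurring on its left-hand-side nonterminal). It is not the modComplete analysis of [SLE'12a], and all names and the dictionary layout are hypothetical.

```python
def compose(host, *extensions):
    """Set union of specification components."""
    spec = {key: set(values) for key, values in host.items()}
    for ext in extensions:
        for key, values in ext.items():
            spec.setdefault(key, set()).update(values)
    return spec


def missing_equations(spec):
    """Monolithic check: (production, attribute) pairs with no equation."""
    missing = set()
    for prod, lhs in spec["productions"]:
        for attr, nonterminal in spec["occurs"]:
            if nonterminal == lhs and (prod, attr) not in spec["equations"]:
                missing.add((prod, attr))
    return missing


host = {
    "productions": {("add", "Expr"), ("num", "Expr")},
    "occurs": {("errors", "Expr")},
    "equations": {("add", "errors"), ("num", "errors")},
}
# Extension 1 adds a new production with equations for host attributes.
ext1 = {
    "productions": {("neg", "Expr")},
    "equations": {("neg", "errors")},
}
# Extension 2 adds a new attribute with equations on host productions.
ext2 = {
    "occurs": {("pp", "Expr")},
    "equations": {("add", "pp"), ("num", "pp")},
}
```

Each extension is complete when composed with the host alone, yet composing both leaves the production neg without an equation for pp. Only the extension user would observe this failure, which is why the modular analysis places stronger requirements on each extension so that the pairwise checks imply completeness of the composition.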
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in the host language
    int SELECT;
    rs = using c query { SELECT last_name
                         FROM person WHERE ... };
32 45
Challenges in scanning
Different extensions use same keyword
    connection c jdbcderbyderbydbtestdb
      with table person [ person_id INTEGER,
                          first_name VARCHAR ]
    b = table ( c1 T F
                c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
    x = 3 + y * z;
    str = /[a-z][a-z0-9]*\.java/
34 45
Challenges in scanning
Terminals that are prefixes of others
    List<List<Integer>> dlist;
    x = y >> 4;
35 45
Need for context
- Traditionally, parser and scanner are disjoint:
      Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate:
      Scanner ⇄ Parser → Semantic Analysis
36 45
Context-aware scanning
- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
      chan in, out;
      for i in a { a[i] = i*i; }
- Two terminal symbols that match "in":
  - terminal IN 'in';
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword;
  - terminal FOR 'for' lexer class keyword;
- example is part of AbleP [SPIN'11]
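The idea can be sketched in a few lines of Python: the scanner is handed the set of terminals that the parser can accept in its current state and only tries those. This illustrates the principle and is not Copper's implementation; the scan_token function, the terminal table, and the way "submits to" is modelled (a keyword wins a tie against ID) are assumptions.

```python
import re

TERMINALS = {
    "FOR": re.compile(r"for"),
    "IN": re.compile(r"in"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORDS = {"FOR", "IN"}   # ID submits to these terminals


def scan_token(text, pos, valid):
    """Scan one token at pos, considering only the terminals in `valid`."""
    candidates = []
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m:
            candidates.append((name, m.group()))
    if not candidates:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    longest = max(len(lexeme) for _, lexeme in candidates)
    tied = [c for c in candidates if len(c[1]) == longest]
    keywords = [c for c in tied if c[0] in KEYWORDS]
    return keywords[0] if keywords else tied[0]


# After 'chan' the parser expects a channel name, so only ID is valid.
as_name = scan_token("in, out;", 0, {"ID"})
# After 'for i' the parser expects the keyword, so only IN is valid.
as_keyword = scan_token("in a", 0, {"IN"})
```

The same two characters are scanned as an identifier in the declaration and as a keyword in the loop header, without either sub-language reserving the word globally.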
37 45
Parsing C as an extension to Promela
    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */
    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
                      Pexample->_pid, now.z, now.pt.x, now.pt.y); };
      assert(false)   /* trigger an error trail */
    }
38 45
Context-aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before the first ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
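The interaction between maximal munch and context can be shown with the two Java terminals from the generics example. In this Python sketch the valid lookahead set is passed in by hand; in a generated scanner it would come from the parser state. The scan function and terminal names are hypothetical.

```python
import re

# Hypothetical terminal table: Java's '>' and '>>' operators.
TERMINALS = {"GT": re.compile(r">"), "RSHIFT": re.compile(r">>")}


def scan(text, pos, valid_lookahead):
    """Longest match, but only among terminals the parser can accept here."""
    matches = []
    for name in sorted(valid_lookahead):
        m = TERMINALS[name].match(text, pos)
        if m:
            matches.append((len(m.group()), name, m.group()))
    if not matches:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    _, name, lexeme = max(matches)   # maximal munch among valid terminals
    return name, lexeme


# In an expression such as 'y >> 4' both operators are valid: longest wins.
in_expression = scan(">> 4", 0, {"GT", "RSHIFT"})
# When closing 'List<List<Integer' only '>' is valid, so the shorter match
# is returned even though '>>' also matches the input text.
in_type = scan(">> dlist;", 0, {"GT"})
```

Maximal munch still applies, but only within the set of terminals that the context allows.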
39 45
- With a smarter scanner, LALR(1) is not so brittle
- We can build syntactically composable language extensions
- Context-aware scanning makes composable syntax "more likely"
- But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
$CFG^H \cup^* \{CFG^{E_1}, \ldots, CFG^{E_n}\}$
$\forall i \in [1, n].\ isComposable(CFG^H, CFG^{E_i}) \land conflictFree(CFG^H \cup CFG^{E_i})$
$\Longrightarrow conflictFree(CFG^H \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})$
- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.
43 45
Future Work
- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions is done at compile time
- language-specific analysis
- new applications of AGs
44 45
Thanks for your attention
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu
45 45
Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
Matrix extensionsI several features from MATLAB
I with fold and genarray from Single Assignment C
I all translated down to expected C code
I straightforward parallel implementations of matrixMapwith fold and genarray
17 45
Dimension analysis
pound-seconds 6= newton-seconds18 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
19 45
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible
languages
Two primary challenges1 composable syntax mdash enables building a scanner parser
I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper
2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and
higher-order attributesI set union of specification components
I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute
I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver
23 45
Generating parsers and scanners from grammars and regular expressions

  nonterminals: Stmt, Expr
  terminals:    Id    [a-zA-Z][a-zA-Z0-9]*
                Num   [0-9]+
                Eq    '='
                Semi  ';'
                Plus  '+'
                Mult  '*'

  Stmt ::= Stmt Semi Stmt
  Stmt ::= Id Eq Expr
  Expr ::= Expr Plus Expr
  Expr ::= Expr Mult Expr
  Expr ::= Id
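To make the scanner half concrete, here is a minimal Python sketch of a conventional maximal-munch scanner built from the terminal regular expressions above. It is only an illustration and not how Copper generates scanners; the function name `scan` and the choice to skip whitespace are assumptions.

```python
import re

# Terminals from the slide; the Id regex assumes a trailing '*'.
TERMINALS = [
    ("Id",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num",  r"[0-9]+"),
    ("Eq",   r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]


def scan(text):
    """Return tokens as (terminal, lexeme) pairs using maximal munch."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        # Try every terminal at this position and keep the longest match.
        best = None
        for name, regex in TERMINALS:
            m = re.compile(regex).match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise SyntaxError(f"no terminal matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


for token in scan("x = y + 3 * z ; a = b"):
    print(token)
```

This scanner looks at every terminal at every position; it has no notion of which terminals the parser could accept next.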
24 45
Parse tree for "x = y + 3 * z ; a = b":

  Stmt
    Stmt
      Id(x) Eq Expr
        Expr
          Id(y)
        Plus
        Expr
          Expr
            Num(3)
          Mult
          Expr
            Id(z)
    Semi
    Stmt
      Id(a) Eq Expr
        Id(b)

Token stream: Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
25 45
Attribute Grammars
- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
  - that is, semantic values
- Expr may be attributed with
  - type: the type of the expression
  - errors: list of error messages
  - env: mapping of variable names to their types
- Stmt may be attributed with errors and env
26 45
[Figure: the parse tree of "x = y + 3 * y ; x = z" decorated with attribute values. Every node carries env = [x ↦ int, y ↦ int, z ↦ string]. In the first statement each expression has type = int and errors = [ ]. In the second statement Id(z) has type = string, and errors = [ERROR] appears on two nodes.]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values.

  abstract production addition
  e::Expr ::= l::Expr '+' r::Expr
  {
    e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ... ;
    e.type = int ;
    l.env = e.env ;  r.env = e.env ;
  }
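The equations for the addition production can be mimicked by a small recursive evaluator. The Python sketch below is an analogy rather than Silver semantics: `env` is passed down as an inherited attribute, while `type` and `errors` are returned as synthesized attributes. The classes `Var`, `Num`, `Add` and the function `decorate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


def decorate(node, env):
    """Return (type, errors) for node; env is passed down unchanged."""
    if isinstance(node, Num):
        return "int", []
    if isinstance(node, Var):
        if node.name not in env:
            return "error", [f"undeclared variable {node.name}"]
        return env[node.name], []
    # Add: l.env = e.env and r.env = e.env
    l_type, l_errors = decorate(node.left, env)
    r_type, r_errors = decorate(node.right, env)
    # e.errors = l.errors ++ r.errors ++ (check that l and r are integers)
    check = [] if l_type == r_type == "int" else ["operands of + must be int"]
    return "int", l_errors + r_errors + check  # e.type = int


env = {"x": "int", "y": "int", "z": "string"}
print(decorate(Add(Var("y"), Num(3)), env))   # ('int', [])
print(decorate(Add(Var("x"), Var("z")), env))
```

A real attribute grammar system evaluates attributes on demand from declarative equations; the fixed traversal order here is a simplification.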
28 45
Modern attribute grammars
- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.
29 45
for-loop as an extension

  abstract production for
  s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
  {
    s.errors = lower.errors ++ upper.errors ++ body.errors ++
               ... check that i is an integer ... ;
    forwards to
      -- i = lower ; while (i <= upper) { body ; i = i + 1 ; }
      seq ( assignment ( varRef (i), lower ),
            while ( lte ( varRef (i), upper ),
                    block ( seq ( body,
                                  assignment ( varRef (i),
                                               add ( varRef (i), intLit ("1") ) ) ) ) ) ) ;
  }
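The idea of forwarding can be sketched in Python: the host interpreter below knows only assignments and while-loops, and the for-loop supplies an equivalent host-language tree for everything it does not define itself. All names (`Assign`, `While`, `For`, `forwards_to`, `evaluate`, `run`) are hypothetical, and the tuple encoding of expressions is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    var: str
    expr: object


@dataclass
class While:
    cond: object
    body: list


@dataclass
class For:
    var: str
    lower: object
    upper: object
    body: list

    def forwards_to(self):
        """i = lower; while (i <= upper) { body; i = i + 1 }"""
        step = Assign(self.var, ("add", self.var, 1))
        return [Assign(self.var, self.lower),
                While(("lte", self.var, self.upper), self.body + [step])]


def evaluate(expr, store):
    """Expressions are ints, variable names, or (op, a, b) tuples."""
    if isinstance(expr, tuple):
        op, a, b = expr
        x, y = evaluate(a, store), evaluate(b, store)
        return x + y if op == "add" else x <= y
    return store[expr] if isinstance(expr, str) else expr


def run(stmts, store):
    """The host interpreter handles Assign and While; For is forwarded."""
    for s in stmts:
        if isinstance(s, Assign):
            store[s.var] = evaluate(s.expr, store)
        elif isinstance(s, While):
            while evaluate(s.cond, store):
                run(s.body, store)
        else:
            run(s.forwards_to(), store)
    return store


prog = [Assign("sum", 0),
        For("i", 1, 4, [Assign("sum", ("add", "sum", "i"))])]
print(run(prog, {}))  # {'sum': 10, 'i': 5}
```

The extension still defines its own error check, as in the production above, but translation comes for free from the tree it forwards to.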
30 45
Building an attribute grammar evaluator from composed specifications

  $AG^{H} \cup^{*} \{AG^{E}_{1}, \ldots, AG^{E}_{n}\}$

  $\forall i \in [1,n].\ \mathit{modComplete}(AG^{H}, AG^{E}_{i}) \;\Longrightarrow\; \mathit{complete}(AG^{H} \cup \{AG^{E}_{1}, \ldots, AG^{E}_{n}\})$

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [SLE'12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in the host language.

  int SELECT ;
  rs = using c query { SELECT last_name
                       FROM person WHERE
32 45
Challenges in scanning
Different extensions use the same keyword.

  connection c "jdbc:derby:/derby/db/testdb"
    with table person [ person_id INTEGER,
                        first_name VARCHAR ] ;

  b = table ( c1 : T F ,
              c2 : F * ) ;
33 45
Challenges in scanning
Operators with different precedence specifications

  x = 3 + y * z ;
  str = /[a-z][a-z0-9]*\.java/
34 45
Challenges in scanning
Terminals that are prefixes of others

  List<List<Integer>> dlist ;
  x = y >> 4 ;
35 45
Need for context
- Traditionally, parser and scanner are disjoint.
    Scanner → Parser → Semantic Analysis
- In context-aware scanning, they communicate.
    Scanner ⇄ Parser → Semantic Analysis
36 45
Context aware scanning
- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
    chan in, out ;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
  - terminal IN 'in' ;
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to {keyword} ;
  - terminal FOR 'for' lexer class {keyword} ;
- example is part of AbleP [SPIN'11]
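The effect of restricting the scanner to the terminals valid in the current parser state can be sketched in Python. This is a simplified model under stated assumptions: the `valid` set is passed in by hand instead of coming from an LR parse table, and `next_token` and `KEYWORDS` are hypothetical names standing in for the "submits to" rule.

```python
import re

TERMINALS = {
    "IN":  r"in",
    "FOR": r"for",
    "ID":  r"[a-zA-Z_][a-zA-Z_0-9]*",
}
KEYWORDS = {"IN", "FOR"}  # ID submits to these when both are valid


def next_token(text, pos, valid):
    """Scan one token at pos, considering only terminals in the valid set."""
    best = None
    for name in valid:
        m = re.compile(TERMINALS[name]).match(text, pos)
        if not m:
            continue
        lexeme = m.group()
        longer = best is None or len(lexeme) > len(best[1])
        keyword_wins = (best is not None and len(lexeme) == len(best[1])
                        and name in KEYWORDS)
        if longer or keyword_wins:
            best = (name, lexeme)
    return best


# After "chan" only an identifier is valid, so "in" is an ID.
print(next_token("in, out", 0, {"ID"}))        # ('ID', 'in')
# After "for i" the keyword is valid, so "in" is IN.
print(next_token("in a", 0, {"IN", "ID"}))     # ('IN', 'in')
```

When both terminals are valid and match the same text, the keyword takes priority; when only one is valid, no conflict arises at all.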
37 45
Parsing C as an extension to Promela

  c_decl {
    typedef struct Coord {
      int x, y; } Coord;
  }
  c_state "Coord pt" "Global"   /* goes in state vector */
  int z = 3;                    /* standard global decl */
  active proctype example()
  {
    c_code { now.pt.x = now.pt.y = 0; };
    do
    :: c_expr { now.pt.x == now.pt.y }
         -> c_code { now.pt.y++; }
    :: else -> break
    od;
    c_code { printf("values %d: %d, %d,%d\n",
             Pexample->_pid, now.z, now.pt.x, now.pt.y); };
    assert(false)               /* trigger an error trail */
  }
38 45
Context aware scanning
- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", the terminal ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
  - There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.
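The ">" versus ">>" case can be reproduced with a few lines of Python. In this sketch the valid lookahead set is supplied by hand, whereas a context-aware scanner would derive it from the parser state; `munch` and the terminal names are hypothetical.

```python
import re

TERMINALS = {"GT": r">", "RSHIFT": r">>", "NUM": r"[0-9]+"}


def munch(text, pos, candidates):
    """Longest match at pos among the candidate terminals, or None."""
    found = []
    for name in sorted(candidates):
        m = re.compile(TERMINALS[name]).match(text, pos)
        if m:
            found.append((name, m.group()))
    return max(found, key=lambda t: len(t[1]), default=None)


text = "List<List<Integer>> dlist"
pos = text.index(">")

# Maximal munch over every terminal picks the longer ">>".
print(munch(text, pos, TERMINALS.keys()))   # ('RSHIFT', '>>')

# With only GT in the valid lookahead, the shorter ">" is returned.
print(munch(text, pos, {"GT"}))             # ('GT', '>')
```

Maximal munch still applies, but only among the terminals that the context allows.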
39 45
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.
40 45
Building a parser from composed specifications

  $CFG^{H} \cup^{*} \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\}$

  $\forall i \in [1,n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \wedge \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i}) \;\Longrightarrow\; \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\})$

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
- other parser generators
- libraries
The modular compositionality analysis does not require context-aware scanning, but context-aware scanning makes it practical.
43 45
Future Work
- ableC: extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions at compile time
- language-specific analysis
- new applications of AGs
44 45
Thanks for your attention.
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu
45 45
Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63-72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1-2):39-54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199-210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352-371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44-63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108-125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575-599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102-116. Springer-Verlag, March 2007.
include stdioh
int main (int argc char argv [])
int meter x = 34
int meter y = 56
int meter^2 area = x y
printf (dn x + y) OK
printf (dn x + z) Error
20 45
include stdioh
int main (int argc char argv [])
int x = 34
int y = 56
int area = x y
printf (dn x + y) OK
Extensions of this form find errors but otherwise are ldquoerasedrdquoduring translation
21 45
Extension composition
I Programmers can select the extensions that they want
I May want to use multiple extensions in the same program
I Distinguish between1 extension user
I has no knowledge of language design or implementations
2 extension developerI must know about language design and implementation
I Tools build a custom xc =rArr c translator for them
I How can that be done
22 45
Building translators from composable extensible
languages
Two primary challenges1 composable syntax mdash enables building a scanner parser
I context-aware scanning [GPCErsquo07]I modular determinism analysis [PLDIrsquo09]I Copper
2 composable semantics mdash analysis and translationsI attribute grammars with forwarding collections and
higher-order attributesI set union of specification components
I sets of productions non-terminals attributesI sets of attribute defining equations on a productionI sets of equations contributing values to a single attribute
I modular well-definedness analysis [SLErsquo12a]I modular termination analysis [SLErsquo12b Krishnan-PhD]I Silver
23 45
Generating parsers and scanners from grammars
and regular expressions
nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]
Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo
Stmt = Stmt Semi StmtStmt = Id Eq Expr
Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id
24 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context-aware scanning

- Scanner recognizes only tokens valid for the current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
    chan in, out ;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in":
    terminal IN 'in' ;
    terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to keyword ;
    terminal FOR 'for' lexer class keyword ;
- example is part of AbleP [SPIN'11]

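The behavior described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TERMINALS` table, the `KEYWORDS` set and the `scan` function are hypothetical names, and the `valid` argument stands in for the set of terminals that the parser's current state can accept, which a real generator would derive from the parse table.

```python
import re

# Hypothetical terminal table; terminal names follow the slide.
TERMINALS = {
    "IN": re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID": re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORDS = {"IN", "FOR"}  # ID "submits to" these when both are valid

def scan(text, pos, valid):
    """Return (terminal, lexeme) for the longest match among `valid` terminals."""
    best = []
    best_len = 0
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m and len(m.group()) > best_len:
            best, best_len = [name], len(m.group())
        elif m and len(m.group()) == best_len and best_len > 0:
            best.append(name)
    if not best:
        raise SyntaxError(f"no valid terminal at position {pos}")
    # A tie between a keyword and ID is resolved in favor of the keyword.
    keywords = [n for n in best if n in KEYWORDS]
    name = keywords[0] if keywords else sorted(best)[0]
    return name, text[pos:pos + best_len]
```

In the `chan in, out` declaration the parser expects only an identifier, so `scan("chan in, out", 5, {"ID"})` yields an `ID`. Inside the `for` loop both terminals are valid and the keyword wins the tie.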
Parsing C as an extension to Promela

    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */
    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
               Pexample->_pid, now.z, now.pt.x, now.pt.y); };
      assert(false)               /* trigger an error trail */
    }

Context-aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in the valid lookahead but ">>" is not.
- A context-aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
    - It is generated from standard grammars and terminal regexes.

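The difference between plain maximal munch and disambiguation by context can be shown with the generics example. The sketch below is a hypothetical illustration: the terminal names `GT` and `RSHIFT` and the functions `scan` and `maximal_munch` are assumptions made for this example, and the `valid` set is supplied by hand where a generated scanner would obtain it from the parser state.

```python
import re

# Hypothetical terminals for the generics example; names are assumptions.
TERMINALS = {
    "GT": re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
}

def scan(text, pos, valid):
    """Longest match, but only among the terminals in `valid`."""
    matches = []
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m:
            matches.append((len(m.group()), name))
    if not matches:
        raise SyntaxError(f"no valid terminal at position {pos}")
    length, name = max(matches)
    return name, text[pos:pos + length]

def maximal_munch(text, pos):
    """Context-free baseline: every terminal is always a candidate."""
    return scan(text, pos, set(TERMINALS))
```

When closing a type argument list only `GT` is valid, so the shorter match is returned even though `>>` would be a longer match. In an expression context both terminals are valid and the usual longest match applies.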
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context-aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications

\[ CFG^{H} \cup^{*} \{ CFG^{E_1}, \ldots, CFG^{E_n} \} \]

\[ \forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \wedge \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i}) \;\Longrightarrow\; \mathit{conflictFree}(CFG^{H} \cup \{ CFG^{E_1}, \ldots, CFG^{E_n} \}) \]

- Monolithic analysis: not too hard, but not too useful
- Modular analysis: harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition

Compare to
- other parser generators
- libraries

The modular compositionality analysis does not require context-aware scanning. But context-aware scanning makes it practical.

Future Work

- ableC: extensible C11 specification
    - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions happens at compile time
- language-specific analysis
- new applications of AGs

Thanks for your attention.

Questions?

http://melt.cs.umn.edu
evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.

    #include <stdio.h>
    int main (int argc, char *argv[]) {
      int x = 34;
      int y = 56;
      int area = x * y;
      printf ("%d\n", x + y);   /* OK */
    }

Extensions of this form find errors but otherwise are "erased" during translation.

Extension composition

- Programmers can select the extensions that they want.
- May want to use multiple extensions in the same program.
- Distinguish between
    1. extension user
        - has no knowledge of language design or implementations
    2. extension developer
        - must know about language design and implementation
- Tools build a custom .xc ⟹ .c translator for them.
- How can that be done?

Building translators from composable extensible languages

Two primary challenges:
1. composable syntax: enables building a scanner, parser
    - context-aware scanning [GPCE'07]
    - modular determinism analysis [PLDI'09]
    - Copper
2. composable semantics: analysis and translations
    - attribute grammars with forwarding, collections and higher-order attributes
    - set union of specification components
        - sets of productions, non-terminals, attributes
        - sets of attribute defining equations, on a production
        - sets of equations contributing values to a single attribute
    - modular well-definedness analysis [SLE'12a]
    - modular termination analysis [SLE'12b, Krishnan-PhD]
    - Silver

Generating parsers and scanners from grammars and regular expressions

    nonterminals: Stmt, Expr
    terminals:    Id    /[a-zA-Z][a-zA-Z0-9]*/
                  Num   /[0-9]+/
                  Eq    '='
                  Semi  ';'
                  Plus  '+'
                  Mult  '*'

    Stmt ::= Stmt Semi Stmt
    Stmt ::= Id Eq Expr
    Expr ::= Expr Plus Expr
    Expr ::= Expr Mult Expr
    Expr ::= Id

[Figure: parse tree for the token sequence below]

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(z)
      Semi
      Stmt
        Id(a)
        Eq
        Expr
          Id(b)

    Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)

    "x = y + 3 * z ; a = b"

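The scanning half of this picture is small enough to sketch directly. The Python below is an illustrative assumption, not the scanner that Copper generates: it builds one combined regular expression from the terminal declarations on the grammar slide and turns the source text into the token sequence shown above. The names `TERMINALS`, `SCANNER` and `tokenize` are hypothetical.

```python
import re

# Terminal names and regular expressions as declared on the grammar slide.
TERMINALS = [
    ("Id", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Num", r"[0-9]+"),
    ("Eq", r"="),
    ("Semi", r";"),
    ("Plus", r"\+"),
    ("Mult", r"\*"),
]
# One alternation with a named group per terminal. These terminals never
# overlap, so first-match alternation is enough for this small example.
SCANNER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TERMINALS))

def tokenize(text):
    """Return the token sequence for `text`, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        name = m.lastgroup
        # Only Id and Num carry their lexeme, as in the slide's notation.
        tokens.append(f"{name}({m.group()})" if name in ("Id", "Num") else name)
        pos = m.end()
    return tokens
```

Joining the result of `tokenize("x = y + 3 * z ; a = b")` with spaces reproduces the token line on the slide.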
Attribute Grammars

- add semantics (meaning) to context-free grammars
- nodes (non-terminals) have attributes
    - that is, semantic values
- Expr may be attributed with
    - type: the type of the expression
    - errors: list of error messages
    - env: mapping variable names to their types
- Stmt may be attributed with errors and env

[Figure: parse tree of "x = y + 3 * y ; x = z" decorated with attribute values]

    Stmt
      Stmt
        Id(x)
        Eq
        Expr
          Expr
            Id(y)
          Plus
          Expr
            Expr
              Num(3)
            Mult
            Expr
              Id(y)
      Semi
      Stmt
        Id(x)
        Eq
        Expr
          Id(z)

Attribute values annotated on the tree:
- env = [x ↦ int, y ↦ int, z ↦ string], passed down to the Stmt and Expr nodes
- type = int, errors = [ ] on Expr nodes of the first statement
- errors = [ ] on the first Stmt
- type = string on the Expr for Id(z)
- errors = [ERROR] on the second Stmt and on the root Stmt

Attribute grammar specifications

Equations associated with productions define attribute values.

    abstract production addition
    e::Expr ::= l::Expr '+' r::Expr
    {
      e.errors = l.errors ++ r.errors ++ ... check that l and r are integers ...
      e.type = int ;
      l.env = e.env ;  r.env = e.env ;
    }

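The equations on this slide can be read as a recursive evaluation over the tree. The sketch below is a hypothetical Python rendering, not Silver's evaluation strategy: the inherited attribute `env` is passed down as an argument, and the synthesized attributes `type` and `errors` are returned as a pair. The class names `Id`, `Num` and `Addition`, the `attrs` method and the error message texts are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Id:
    name: str
    def attrs(self, env):
        # Look the variable up in the inherited environment.
        if self.name not in env:
            return None, [f"undeclared variable {self.name}"]
        return env[self.name], []

@dataclass
class Num:
    value: int
    def attrs(self, env):
        return "int", []

@dataclass
class Addition:
    l: object
    r: object
    def attrs(self, env):
        # l.env = e.env ; r.env = e.env  (inherited attribute passed down)
        l_type, l_errors = self.l.attrs(env)
        r_type, r_errors = self.r.attrs(env)
        # e.errors = l.errors ++ r.errors ++ check that l and r are integers
        check = []
        for side, t in (("left", l_type), ("right", r_type)):
            if t is not None and t != "int":
                check.append(f"{side} operand of + has type {t}, expected int")
        # e.type = int
        return "int", l_errors + r_errors + check
```

With `env = {"x": "int", "y": "int", "z": "string"}`, the expression `y + 3` evaluates without errors, while `y + z` reports one error for the right operand.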
Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

Generating parsers and scanners from grammars
and regular expressions
nonterminals Stmt Exprterminals Id [a-zA-Z][a-zA-Z0-9]
Num [0-9]+Eq rsquo=rsquoSemi rsquorsquoPlus rsquo+rsquoMult rsquorsquo
Stmt = Stmt Semi StmtStmt = Id Eq Expr
Expr = Expr Plus ExprExpr = Expr Mult ExprExpr = Id
24 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
[Figure: decorated parse tree]

Stmt
  Stmt
    Id(x)
    Eq
    Expr
      Expr
        Id(y)
      Plus
      Expr
        Expr
          Num(3)
        Mult
        Expr
          Id(y)
  Semi
  Stmt
    Id(x)
    Eq
    Expr
      Id(z)

Attribute values shown on the tree:
- env = [x ↦ int, y ↦ int, z ↦ string] on the Stmt and Expr nodes
- type = int, errors = [ ] on Expr nodes of the first assignment
- errors = [ ] on the first assignment
- type = string on the Expr for Id(z)
- errors = [ERROR] on the second assignment and on the root Stmt

Attribute grammar specifications

Equations associated with productions define attribute values.

abstract production addition
e::Expr ::= l::Expr '+' r::Expr
{
  e.errors = l.errors ++ r.errors ++
             ... check that l and r are integers ... ;
  e.type = int ;
  l.env = e.env ;  r.env = e.env ;
}

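The equations above can be mimicked with a small recursive evaluator. This is a sketch in Python rather than an attribute grammar system: the inherited attribute env is passed down as a parameter, and the synthesized attributes type and errors are returned. The class names, the error message texts, and the rule that an assignment requires matching types are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Id:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class BinOp:  # stands in for both the Plus and the Mult productions
    op: str
    l: object
    r: object


@dataclass
class Assign:
    name: str
    expr: object


@dataclass
class Seq:
    first: object
    second: object


def expr_attrs(e, env):
    """Return (type, errors) for e; env plays the role of the inherited attribute."""
    if isinstance(e, Num):
        return "int", []
    if isinstance(e, Id):
        if e.name not in env:
            return "unknown", [f"undeclared variable {e.name}"]
        return env[e.name], []
    # l.env = e.env; r.env = e.env
    l_type, l_errors = expr_attrs(e.l, env)
    r_type, r_errors = expr_attrs(e.r, env)
    check = [] if l_type == r_type == "int" else [f"operands of {e.op} must be integers"]
    # e.type = int; e.errors = l.errors ++ r.errors ++ check
    return "int", l_errors + r_errors + check


def stmt_errors(s, env):
    """Synthesized errors attribute of a statement."""
    if isinstance(s, Seq):
        return stmt_errors(s.first, env) + stmt_errors(s.second, env)
    e_type, e_errors = expr_attrs(s.expr, env)
    declared = env.get(s.name, "unknown")
    mismatch = [] if declared == e_type else [
        f"cannot assign {e_type} to {s.name} of type {declared}"]
    return e_errors + mismatch


env = {"x": "int", "y": "int", "z": "string"}
tree = Seq(Assign("x", BinOp("+", Id("y"), BinOp("*", Num(3), Id("y")))),
           Assign("x", Id("z")))
errors = stmt_errors(tree, env)
print(errors)
```

With the environment from the decorated tree, the first assignment yields no errors and the second yields one, which matches the errors values annotated on the slide.
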
Modern attribute grammars

- higher-order attributes
- reference attributes
- collection attributes
- forwarding
- module systems
- separate compilation
- etc.

for-loop as an extension

abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++
             ... check that i is an integer ... ;
  forwards to
    -- i = lower ; while (i <= upper) { body ; i = i + 1 ; }
    seq( assignment( varRef(i), lower ),
         while( lte( varRef(i), upper ),
                block( seq( body,
                            assignment( varRef(i), add( varRef(i), intLit("1") ) ) ) ) ) ) ;
}

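Forwarding can be imitated in Python to show the idea: the extension node defines only what it wants to and obtains every other attribute from the host-language tree it forwards to. This is a loose sketch, not how Silver implements forwarding. The `Node` base class, the use of `__getattr__`, and the `run`/`eval` attributes are assumptions introduced for the example, and the literal is an integer rather than the string used on the slide.

```python
from dataclasses import dataclass


class Node:
    def forward(self):
        return None  # host productions do not forward

    def __getattr__(self, name):
        # Reached only when normal lookup fails: ask the forwarded-to tree.
        fwd = self.forward()
        if fwd is None:
            raise AttributeError(name)
        return getattr(fwd, name)


@dataclass
class IntLit(Node):
    value: int
    def eval(self, store): return self.value


@dataclass
class VarRef(Node):
    name: str
    def eval(self, store): return store[self.name]


@dataclass
class Add(Node):
    l: Node
    r: Node
    def eval(self, store): return self.l.eval(store) + self.r.eval(store)


@dataclass
class Lte(Node):
    l: Node
    r: Node
    def eval(self, store): return self.l.eval(store) <= self.r.eval(store)


@dataclass
class Assignment(Node):
    target: VarRef
    value: Node
    def run(self, store): store[self.target.name] = self.value.eval(store)


@dataclass
class Seq(Node):
    first: Node
    second: Node
    def run(self, store):
        self.first.run(store)
        self.second.run(store)


@dataclass
class Block(Node):
    body: Node
    def run(self, store): self.body.run(store)


@dataclass
class While(Node):
    cond: Node
    body: Node
    def run(self, store):
        while self.cond.eval(store):
            self.body.run(store)


@dataclass
class For(Node):  # extension production: no `run` of its own
    i: str
    lower: Node
    upper: Node
    body: Node
    def forward(self):
        i = VarRef(self.i)
        return Seq(Assignment(i, self.lower),
                   While(Lte(i, self.upper),
                         Block(Seq(self.body,
                                   Assignment(i, Add(i, IntLit(1)))))))


store = {"total": 0}
loop = For("i", IntLit(1), IntLit(5),
           Assignment(VarRef("total"), Add(VarRef("total"), VarRef("i"))))
loop.run(store)  # `run` is found on the tree that For forwards to
print(store)
```

The point of the design is that the host language never needs to know about `For`: any attribute the host defines on `Seq`, `While`, and `Assignment` is automatically available on the new construct.
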
Building an attribute grammar evaluator from composed specifications

$$AG^{H} \cup^{*} \{AG^{E}_{1}, \ldots, AG^{E}_{n}\}$$

$$\forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E}_{i}) \implies \mathit{complete}(AG^{H} \cup \{AG^{E}_{1}, \ldots, AG^{E}_{n}\})$$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [SLE'12a]

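The shape of this guarantee can be illustrated with a deliberately tiny model. The sketch below is not the analysis of [SLE'12a]; it tracks only synthesized attributes, and the `AG` record, the rule that extension productions must forward, and all example names are assumptions chosen to make the implication visible: each extension is checked against the host alone, yet the full composition comes out complete.

```python
from dataclasses import dataclass, field


@dataclass
class AG:
    attrs: set = field(default_factory=set)    # synthesized attributes introduced
    prods: dict = field(default_factory=dict)  # production name -> True if it forwards
    eqs: set = field(default_factory=set)      # (production, attribute) pairs with an equation


def union(*grammars):
    out = AG()
    for g in grammars:
        out.attrs |= g.attrs
        out.prods.update(g.prods)
        out.eqs |= g.eqs
    return out


def missing(g):
    """Equations absent from g; a forwarding production may omit any of them."""
    return {(p, a)
            for p, forwards in g.prods.items() if not forwards
            for a in g.attrs if (p, a) not in g.eqs}


def complete(g):
    return not missing(g)


def mod_complete(host, ext):
    # Toy rule: every extension production forwards, and host + ext is complete.
    return all(ext.prods.values()) and complete(union(host, ext))


host = AG(attrs={"errors"},
          prods={"add": False, "assign": False},
          eqs={("add", "errors"), ("assign", "errors")})
for_ext = AG(prods={"for": True}, eqs={("for", "errors")})
pp_ext = AG(attrs={"pp"}, eqs={("add", "pp"), ("assign", "pp")})
bad_ext = AG(attrs={"cost"}, eqs={("add", "cost")})  # no equation on assign

print(mod_complete(host, for_ext), mod_complete(host, pp_ext))
print(complete(union(host, for_ext, pp_ext)))
print(mod_complete(host, bad_ext), missing(union(host, bad_ext)))
```

In this model the `for` production has no `pp` equation, but the composition is still complete because `for` forwards to host productions that do.
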
Challenges in scanning

Keywords in embedded languages may be identifiers in host language

  int SELECT ;

  rs = using c query { SELECT last_name
                       FROM person WHERE ... } ;

Challenges in scanning

Different extensions use same keyword

  connection c "jdbc:derby:/derby/db/testdb"
    with table person [ person_id INTEGER,
                        first_name VARCHAR ] ;

  b = table ( c1 : T F ,
              c2 : F * ) ;

Challenges in scanning

Operators with different precedence specifications

  x = 3 + y * z ;

  str = /[a-z][a-z0-9]*\.java/

Challenges in scanning

Terminals that are prefixes of others

  List<List<Integer>> dlist ;

  x = y >> 4 ;

Need for context

- Traditionally, parser and scanner are disjoint.
    Scanner → Parser → Semantic Analysis
- In context aware scanning, they communicate.
    Scanner ⇄ Parser → Semantic Analysis

Context aware scanning

- Scanner recognizes only tokens valid for current "context"
- keeps embedded sub-languages, in a sense, separate
- Consider:
  - chan in, out;
    for i in a { a[i] = i*i ; }
- Two terminal symbols that match "in"
  - terminal IN 'in' ;
  - terminal ID /[a-zA-Z_][a-zA-Z_0-9]*/ submits to {keyword} ;
  - terminal FOR 'for' lexer class {keyword} ;
- example is part of AbleP [SPIN'11]

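A minimal sketch of how the "in" example might be resolved, written in Python. This is not the scanner generator from the talk. The function `scan`, the hand-written valid sets, and the tie-breaking rule used to model "submits to" are assumptions; in a real context-aware scanner the valid set would come from the current LR parse state.

```python
import re

# Terminals from the slide.
TERMINALS = {
    "IN":  re.compile(r"in"),
    "FOR": re.compile(r"for"),
    "ID":  re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"),
}
KEYWORDS = {"IN", "FOR"}  # members of the keyword lexer class; ID submits to them


def scan(text, pos, valid):
    """Scan one token at pos, considering only the terminals in `valid`."""
    longest, winners = 0, []
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m is None:
            continue
        length = m.end() - pos
        if length > longest:
            longest, winners = length, [name]
        elif length == longest:
            winners.append(name)
    if len(winners) > 1:
        # Same-length tie: ID submits to keyword terminals.
        winners = [n for n in winners if n in KEYWORDS] or winners
    if len(winners) != 1:
        raise SyntaxError(f"no unique valid token at position {pos}")
    return winners[0], text[pos:pos + longest]


# After `chan`, only an identifier is valid, so "in" is a channel name.
print(scan("in, out;", 0, {"ID"}))
# After `for i`, only the keyword is valid.
print(scan("in a", 0, {"IN"}))
# If both were valid, lexical precedence would pick the keyword.
print(scan("in a", 0, {"IN", "ID"}))
```

The first two calls never see a conflict at all, which is the sense in which context keeps the host language and the extension separate.
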
Parsing C as an extension to Promela

c_decl {
  typedef struct Coord {
    int x, y; } Coord; }
c_state "Coord pt" "Global"   /* goes in state vector */
int z = 3;                    /* standard global decl */
active proctype example()
{ c_code { now.pt.x = now.pt.y = 0; };
  do :: c_expr { now.pt.x == now.pt.y }
          -> c_code { now.pt.y++; }
     :: else -> break
  od;
  c_code { printf("values %d: %d, %d,%d\n",
      Pexample->_pid, now.z, now.pt.x, now.pt.y); };
  assert(false)               /* trigger an error trail */
}

Context aware scanning

- This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
- It will return a shorter valid match before a longer invalid match.
- In List<List<Integer>>, before ">", ">" is in valid lookahead but ">>" is not.
- A context aware scanner is essentially an implicitly-moded scanner.
- There is no explicit specification of valid lookahead.
  - It is generated from standard grammars and terminal regexes.

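The "shorter valid match before a longer invalid match" rule can be sketched as maximal munch restricted to the valid set. The Python below is illustrative only. The terminal names GT and RSHIFT, the two hand-written valid sets, and the simplification that one valid set applies to the whole input are assumptions; a generated scanner would obtain a valid set per parse state.

```python
import re

TERMINALS = {
    "GT":     re.compile(r">"),
    "RSHIFT": re.compile(r">>"),
}


def scan(text, pos, valid):
    """Longest match at pos among the terminals the parser currently allows."""
    best = None
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    if best is None:
        raise SyntaxError(f"no valid token at position {pos}")
    return best


def scan_all(text, valid):
    pos, names = 0, []
    while pos < len(text):
        name, pos = scan(text, pos, valid)
        names.append(name)
    return names


TYPE_ARGS = {"GT"}             # closing type arguments: ">>" is not valid here
EXPRESSION = {"GT", "RSHIFT"}  # inside an expression: both are valid

print(scan_all(">>", TYPE_ARGS))   # two GT tokens, as needed for List<List<Integer>>
print(scan_all(">>", EXPRESSION))  # one RSHIFT token, as needed for x = y >> 4
```

Maximal munch still applies within the valid set, so the expression context keeps its usual behavior; only the type-argument context changes.
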
- With a smarter scanner, LALR(1) is not so brittle.
- We can build syntactically composable language extensions.
- Context aware scanning makes composable syntax "more likely".
- But it does not give a guarantee of composability.

Building a parser from composed specifications

$$CFG^{H} \cup^{*} \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\}$$

$$\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i}) \implies \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E}_{1}, \ldots, CFG^{E}_{n}\})$$

- Monolithic analysis - not too hard, but not too useful
- Modular analysis - harder, but required [PLDI'09]
- Non-commutative composition of restricted LALR(1) grammars

Expressiveness versus safe composition

Compare to
- other parser generators
- libraries

The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.

Future Work

- ableC - extensible C11 specification
  - builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
- incorporate existing language extensions
- composition of language extensions is compile-time
- language specific analysis
- new applications of AGs

Thanks for your attention

Questions?

http://melt.cs.umn.edu
evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.

Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.

Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.

Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.

Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010

Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.

Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.

Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(z)
Semi Stmt
Id(a) Eq Expr
Id(b)
Id(x) Eq Id(y) Plus Num(3) Mult Id(z) Semi Id(a) Eq Id(b)
ldquox = y + 3 z a = brdquo25 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Attribute Grammars
I add semantics mdash meaning mdash to context free grammars
I nodes (non-terminals) have attributesI that is semantic values
I Expr may be attributed withI type - the type of the expressionI errors - list of error messagesI env - mapping variable names to their types
I Stmt may be attributed with errors and env
26 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Stmt
Stmt
Id(x) Eq Expr
Expr
Id(y)
Plus Expr
Expr
Num(3)
Mult Expr
Id(y)
Semi Stmt
Id(x) Eq Expr
Id(z)
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]type = int errors = [ ]
type = int errors = [ ]
type = int errors = [ ]
errors = [ ]env = [x7rarrint y 7rarrint z 7rarrstring]
env = [x7rarrint y 7rarrint z 7rarrstring]t=string
errors=[ERROR]
errors=[ERROR]
27 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Attribute grammar specifications
Equations associated with productions define attribute values
a b s t r a c t p r o d u c t i o n a d d i t i o ne Expr = l Expr rsquo+ rsquo r Expr
e e r r o r s = l e r r o r ++ r e r r o r s ++ check t h a t l and r a r e i n t e g e r s
e t y p e = i n t
l env = e env r env = e env
28 45
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner, LALR(1) is not so brittle.
I We can build syntactically composable language extensions.
I Context aware scanning makes composable syntax “more likely”.
I But it does not give a guarantee of composability.
40 / 45
Building a parser from composed specifications
\[ CFG^{H} \cup^{*} \{ CFG^{E}_{1}, \ldots, CFG^{E}_{n} \} \]
\[ \forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E}_{i}) \land \mathit{conflictFree}(CFG^{H} \cup CFG^{E}_{i}) \implies \mathit{conflictFree}(CFG^{H} \cup \{ CFG^{E}_{1}, \ldots, CFG^{E}_{n} \}) \]
I Monolithic analysis - not too hard, but not too useful
I Modular analysis - harder, but required [PLDI’09]
I Non-commutative composition of restricted LALR(1) grammars
41 / 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not require context aware scanning.
But context aware scanning makes it practical.
43 / 45
Future Work
I ableC - extensible C11 specification
    I builds on lessons learned from extensible specifications of Java [ECOOP’07], Lustre [FASE’07], Modelica, Promela [SPIN’11]
I incorporate existing language extensions
I composition of language extensions happens at compile time
I language specific analysis
I new applications of AGs
44 / 45
Thanks for your attention
Questions
http://melt.cs.umn.edu
evw@cs.umn.edu
45 / 45
Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.
Modern attribute grammars
I higher-order attributes
I reference attributes
I collection attributes
I forwarding
I module systems
I separate compilation
I etc
29 / 45
for-loop as an extension
abstract production for
s::Stmt ::= i::Name lower::Expr upper::Expr body::Stmt
{
  s.errors = lower.errors ++ upper.errors ++ body.errors ++
             -- check that i is an integer
  forwards to
    -- i=lower ; while (i <= upper) { body ; i=i+1 ; }
    seq(assignment(varRef(i), lower),
        while(lte(varRef(i), upper),
              block(seq(body,
                        assignment(varRef(i), add(varRef(i), intLit("1")))))));
}
30 / 45
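Forwarding can be approximated in Python as follows. This is a rough sketch under stated assumptions, not Silver's semantics in full: the class names mirror the production names on the slide, but the `pp` (pretty-print) attribute, the `forwards_to` method and the use of `__getattr__` are choices made for this illustration. The point it shows is that the extension production defines no `pp` of its own and obtains it from the host-language tree it forwards to.

```python
from dataclasses import dataclass


@dataclass
class VarRef:
    name: str
    def pp(self):
        return self.name


@dataclass
class IntLit:
    text: str
    def pp(self):
        return self.text


@dataclass
class Add:
    left: object
    right: object
    def pp(self):
        return f"({self.left.pp()} + {self.right.pp()})"


@dataclass
class Lte:
    left: object
    right: object
    def pp(self):
        return f"{self.left.pp()} <= {self.right.pp()}"


@dataclass
class Assignment:
    target: VarRef
    value: object
    def pp(self):
        return f"{self.target.pp()} = {self.value.pp()};"


@dataclass
class Seq:
    first: object
    second: object
    def pp(self):
        return f"{self.first.pp()} {self.second.pp()}"


@dataclass
class Block:
    body: object
    def pp(self):
        return "{ " + self.body.pp() + " }"


@dataclass
class While:
    cond: object
    body: object
    def pp(self):
        return f"while ({self.cond.pp()}) {self.body.pp()}"


@dataclass
class ForLoop:
    """Extension production: it defines no pp attribute of its own."""
    i: str
    lower: object
    upper: object
    body: object

    def forwards_to(self):
        # Same host-language tree as on the slide.
        var = VarRef(self.i)
        return Seq(Assignment(var, self.lower),
                   While(Lte(var, self.upper),
                         Block(Seq(self.body,
                                   Assignment(var, Add(var, IntLit("1")))))))

    def __getattr__(self, attr):
        # Only reached when ForLoop itself does not define the attribute.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return getattr(self.forwards_to(), attr)


loop = ForLoop("i", IntLit("0"), VarRef("n"),
               Assignment(VarRef("s"), Add(VarRef("s"), VarRef("i"))))
print(loop.pp())  # i = 0; while (i <= n) { s = (s + i); i = (i + 1); }
```

An attribute that the extension does define explicitly, such as `errors` on the slide, would be an ordinary method on `ForLoop` and would take priority, because `__getattr__` is consulted only after normal lookup fails.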
Building an attribute grammar evaluator from composed specifications
\[ AG^{H} \cup^{*} \{ AG^{E}_{1}, \ldots, AG^{E}_{n} \} \]
\[ \forall i \in [1, n].\ \mathit{modComplete}(AG^{H}, AG^{E}_{i}) \implies \mathit{complete}(AG^{H} \cup \{ AG^{E}_{1}, \ldots, AG^{E}_{n} \}) \]
I Monolithic analysis - not too hard, but not too useful
I Modular analysis - harder, but required [SLE’12a]
31 / 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
for-loop as an extensiona b s t r a c t p r o d u c t i o n f o rs Stmt = i Name l o w e r Expr upper Expr
body Stmt
s e r r o r s = l o w e r e r r o r ++ upper e r r o r s ++body e r r o r s ++ check t h a t i i s an i n t e g e r
f o r w a r d s to i=l o w e r w h i l e ( i lt= upper ) body i=i +1seq ( a s s i g n m e n t ( v a r R e f ( i ) l o w e r )
w h i l e (l t e ( v a r R e f ( i ) upper ) b l o c k ( seq ( body
a s s i g n m e n t ( v a r R e f ( i ) add ( v a r R e f ( i ) i n t L i t ( rdquo1rdquo ) ) ) ) ) ) )
30 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Building an attribute grammar evaluator from composedspecifications
AGH cuplowast AGE1 AGEn
foralli isin [1 n]modComplete(AGH AGEi )
rArr rArr complete(AGH cup AGE1 AGE
n )
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [SLErsquo12a]
31 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Challenges in scanning
Keywords in embedded languages may be identifiers in hostlanguage
int SELECT
rs = using c query SELECT last name
FROM person WHERE
32 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Challenges in scanning
Different extensions use same keyword
connection c jdbcderbyderbydbtestdb
with table person [ person id INTEGER
first name VARCHAR ]
b = table ( c1 T F
c2 F )
33 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promelac_decl
typedef struct Coord
int x y Coord
c_state Coord pt Global goes in state vector
int z = 3 standard global decl
active proctype example()
c_code nowptx = nowpty = 0
do c_expr nowptx == nowpty
-gt c_code nowpty++
else -gt break
od
c_code printf(values d d ddn
Pexample-gt_pid nowz nowptx nowpty)
assert(false) trigger an error trail
38 45
Context aware scanning
I This scanning algorithm subordinates thedisambiguation principle of maximal munchto the principle ofdisambiguation by context
I It will return a shorter valid match before a longer invalidmatch
I In ListltListltIntegergtgt before ldquogtrdquoldquogtrdquo in valid lookahead but ldquogtgtrdquo is not
I A context aware scanner is essentially an implicitly-modedscanner
I There is no explicit specification of valid look aheadI It is generated from standard grammars and terminal
regexs
39 45
I With a smarter scanner LALR(1) is not so brittle
I We can build syntactically composable languageextensions
I Context aware scanning makes composable syntax ldquomorelikelyrdquo
I But it does not give a guarantee of composability
40 45
Building a parser from composed specifications
CFGH cuplowast CFGE1 CFGEn
foralli isin [1 n]isComposable(CFGH CFGEi )andconflictFree(CFGH cup CFGEi )
rArr rArr conflictFree(CFGH cup CFGE1 CFGEn)
I Monolithic analysis - not too hard but not too useful
I Modular analysis - harder but required [PLDIrsquo09]
I Non-commutative composition of restricted LALR(1)grammars
41 45
42 45
Expressiveness versus safe composition
Compare to
I other parser generators
I libraries
The modular compositionality analysis does not requirecontext aware scanningBut context aware scanning makes it practical
43 45
Future Work
I ableC - extensible C11 specificationI builds on lessons learned from extensible specifications of
Java [ECOOPrsquo07] Lustre [FASErsquo07] ModelicaPromela [SPINrsquo11]
I incorporate existing language extensions
I composition of language extensions are compile-time
I language specific analysis
I new applications of AGs
44 45
Thanks for your attention
Questions
httpmeltcsumnedu
evwcsumnedu
45 45
Eric Van Wyk and August SchwerdfegerContext-aware scanning for parsing extensible languagesIn Intl Conf on Generative Programming and ComponentEngineering (GPCE) pages 63ndash72 ACM 2007
Eric Van Wyk Derek Bodin Jimin Gao and LijeshKrishnanSilver an extensible attribute grammar systemScience of Computer Programming 75(1ndash2)39ndash54January 2010
August Schwerdfeger and Eric Van WykVerifiable composition of deterministic grammarsIn Proc of Conf on Programming Language Design andImplementation (PLDI) pages 199ndash210 ACM June 2009
45 45
Ted Kaminski and Eric Van WykModular well-definedness analysis for attribute grammarsIn Proc of Intl Conf on Software Language Engineering(SLE) volume 7745 of LNCS pages 352ndash371Springer-Verlag September 2012
Lijesh Krishnan and Eric Van WykTermination analysis for higher-order attribute grammarsIn Proceedings of the 5th International Conference onSoftware Language Engineering (SLE 2012) volume 7745of LNCS pages 44ndash63 Springer-Verlag September 2012
Lijesh KrishnanComposable Semantics Using Higher-Order AttributeGrammarsPhD thesis University of Minnesota Department ofComputer Science and Engineering 2012httppurlumnedu144010
45 45
Yogesh Mali and Eric Van WykBuilding extensible specifications and implementations ofPromela with AblePIn Proc of Intl SPIN Workshop on Model Checking ofSoftware volume 6823 of LNCS pages 108ndash125Springer-Verlag July 2011
Eric Van Wyk Lijesh Krishnan August Schwerdfeger andDerek BodinAttribute grammar-based language extensions for JavaIn Proc of European Conf on Object Oriented Prog(ECOOP) volume 4609 of LNCS pages 575ndash599Springer-Verlag 2007
45 45
Jimin Gao Mats Heimdahl and Eric Van WykFlexible and extensible notations for modeling languagesIn Fundamental Approaches to Software EngineeringFASE 2007 volume 4422 of LNCS pages 102ndash116Springer-Verlag March 2007
45 45
Challenges in scanning
Operators with different precedence specifications
x = 3 + y z
str = [a-z][a-z0-9]java
34 45
Challenges in scanning
Terminals that are prefixes of others
ListltListltIntegergtgt dlist
x = y gtgt 4
35 45
Need for context
I Traditionally parser and scanner are disjoint
Scanner rarr Parser rarr Semantic Analysis
I In context aware scanning they communicate
Scanner Parser rarr Semantic Analysis
36 45
Context aware scanning
I Scanner recognizes only tokens valid for current ldquocontextrdquo
I keeps embedded sub-languages in a sense separate
I ConsiderI chan in out
for i in a a[i] = ii
I Two terminal symbols that match ldquoinrdquoI terminal IN rsquoinrsquo I terminal ID [a-zA-Z ][a-zA-Z 0-9]
submits to keyword
I terminal FOR rsquoforrsquo lexer class keyword
I example is part of AbleP [SPINrsquo11]
37 45
Parsing C as an extension to Promela
    c_decl {
      typedef struct Coord {
        int x, y; } Coord;
    }
    c_state "Coord pt" "Global"   /* goes in state vector */
    int z = 3;                    /* standard global decl */
    active proctype example()
    {
      c_code { now.pt.x = now.pt.y = 0; };
      do :: c_expr { now.pt.x == now.pt.y }
              -> c_code { now.pt.y++; }
         :: else -> break
      od;
      c_code { printf("values %d: %d, %d,%d\n",
          Pexample->_pid, now.z, now.pt.x, now.pt.y); };
      assert(false)               /* trigger an error trail */
    }

Context-aware scanning
• This scanning algorithm subordinates the disambiguation principle of maximal munch to the principle of disambiguation by context.
• It will return a shorter valid match before a longer invalid match.
• In List<List<Integer>>, before the first closing “>”, “>” is in the valid lookahead but “>>” is not.
• A context-aware scanner is essentially an implicitly-moded scanner.
• There is no explicit specification of valid lookahead.
• It is generated from standard grammars and terminal regexes.

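The "shorter valid match before a longer invalid match" rule can be illustrated with a toy recursive-descent parser that passes its valid terminal set to the scanner. A real context-aware scanner derives these sets from LR parse states; here they are written by hand, and the grammar `Type ::= ID | ID '<' Type '>'` and all names are assumptions for illustration.

```python
import re

# Hypothetical terminals; names and regexes are assumptions for illustration.
TERMINALS = {
    "ID": re.compile(r"[A-Za-z_][A-Za-z_0-9]*"),
    "LT": re.compile(r"<"),
    "GT": re.compile(r">"),
    "SHIFT_R": re.compile(r">>"),
}


def scan(text, pos, valid):
    """Longest match among `valid` terminals: (terminal, lexeme, end) or None."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    best = None
    for name in sorted(valid):
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or m.end() > best[2]):
            best = (name, m.group(), m.end())
    return best


def parse_type(text, pos=0):
    """Parse Type ::= ID | ID '<' Type '>' and return (structure, end position)."""
    token = scan(text, pos, {"ID"})
    if token is None:
        raise SyntaxError(f"expected a type name at position {pos}")
    _, name, pos = token
    opening = scan(text, pos, {"LT"})
    if opening is None:
        return name, pos
    argument, pos = parse_type(text, opening[2])
    # Only GT is valid here, so ">>" is scanned as ">" and the rest is left for later.
    closing = scan(text, pos, {"GT"})
    if closing is None:
        raise SyntaxError(f"expected '>' at position {pos}")
    return (name, argument), closing[2]


nested_type, type_end = parse_type("List<List<Integer>> dlist")
# In an expression both terminals are valid, so the longer ">>" wins as usual.
shift = scan("x = y >> 4", 5, {"GT", "SHIFT_R"})
```

Maximal munch still applies, but only among the terminals valid in the current context. No explicit scanner modes are declared; the valid sets play that role.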
• With a smarter scanner, LALR(1) is not so brittle.
• We can build syntactically composable language extensions.
• Context-aware scanning makes composable syntax “more likely”.
• But it does not give a guarantee of composability.

Building a parser from composed specifications
\[
CFG^{H} \cup^{*} \{CFG^{E_1}, \ldots, CFG^{E_n}\}
\]
\[
\bigl(\forall i \in [1, n].\ \mathit{isComposable}(CFG^{H}, CFG^{E_i}) \wedge \mathit{conflictFree}(CFG^{H} \cup CFG^{E_i})\bigr)
\Rightarrow \mathit{conflictFree}(CFG^{H} \cup \{CFG^{E_1}, \ldots, CFG^{E_n}\})
\]
• Monolithic analysis - not too hard, but not too useful
• Modular analysis - harder, but required [PLDI'09]
• Non-commutative composition of restricted LALR(1) grammars

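The shape of the modular analysis can be sketched as a driver that checks each extension against the host alone, never against the other extensions. The two predicates are passed in as parameters. The toy versions below are deliberately weak stand-ins and are assumptions of this sketch; they are not the isComposable and conflictFree analyses of [PLDI'09].

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Production = Tuple[str, Tuple[str, ...]]   # (left-hand side, right-hand side)
Grammar = FrozenSet[Production]


def failing_extensions(
    host: Grammar,
    extensions: Dict[str, Grammar],
    is_composable: Callable[[Grammar, Grammar], bool],
    conflict_free: Callable[[Grammar], bool],
) -> List[str]:
    """Check every extension against the host alone; return the names that fail."""
    return [
        name
        for name, ext in sorted(extensions.items())
        if not (is_composable(host, ext) and conflict_free(host | ext))
    ]


def toy_is_composable(host: Grammar, ext: Grammar) -> bool:
    # Toy rule: an extension production on a host nonterminal must begin with
    # a symbol the host grammar never mentions.
    host_nts = {lhs for lhs, _ in host}
    host_symbols = host_nts | {s for _, rhs in host for s in rhs}
    return all(
        len(rhs) > 0 and rhs[0] not in host_symbols
        for lhs, rhs in ext
        if lhs in host_nts
    )


def toy_conflict_free(grammar: Grammar) -> bool:
    # Toy rule: no two productions of one nonterminal start with the same symbol.
    starts = [(lhs, rhs[:1]) for lhs, rhs in grammar]
    return len(starts) == len(set(starts))


host = frozenset({("Expr", ("Expr", "+", "Term")), ("Expr", ("Term",)), ("Term", ("num",))})
extensions = {
    "sql": frozenset({("Expr", ("using", "Query")), ("Query", ("select", "id"))}),
    "bad": frozenset({("Expr", ("num", "!"))}),
}
failures = failing_extensions(host, extensions, toy_is_composable, toy_conflict_free)
```

Each extension author can run such a check independently. The implication on the slide is what lets a user who picks several passing extensions trust the composed parser without a monolithic re-analysis.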
Expressiveness versus safe composition
Compare to
• other parser generators
• libraries
The modular compositionality analysis does not require context-aware scanning.
But context-aware scanning makes it practical.

Future Work
• ableC - extensible C11 specification
    • builds on lessons learned from extensible specifications of Java [ECOOP'07], Lustre [FASE'07], Modelica, Promela [SPIN'11]
• incorporate existing language extensions
• composition of language extensions at compile time
• language-specific analysis
• new applications of AGs

Thanks for your attention
Questions?
http://melt.cs.umn.edu
evw@cs.umn.edu

Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. In Intl. Conf. on Generative Programming and Component Engineering (GPCE), pages 63–72. ACM, 2007.
Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.
August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. In Proc. of Conf. on Programming Language Design and Implementation (PLDI), pages 199–210. ACM, June 2009.
Ted Kaminski and Eric Van Wyk. Modular well-definedness analysis for attribute grammars. In Proc. of Intl. Conf. on Software Language Engineering (SLE), volume 7745 of LNCS, pages 352–371. Springer-Verlag, September 2012.
Lijesh Krishnan and Eric Van Wyk. Termination analysis for higher-order attribute grammars. In Proceedings of the 5th International Conference on Software Language Engineering (SLE 2012), volume 7745 of LNCS, pages 44–63. Springer-Verlag, September 2012.
Lijesh Krishnan. Composable Semantics Using Higher-Order Attribute Grammars. PhD thesis, University of Minnesota, Department of Computer Science and Engineering, 2012. http://purl.umn.edu/144010
Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. In Proc. of Intl. SPIN Workshop on Model Checking of Software, volume 6823 of LNCS, pages 108–125. Springer-Verlag, July 2011.
Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. In Proc. of European Conf. on Object Oriented Prog. (ECOOP), volume 4609 of LNCS, pages 575–599. Springer-Verlag, 2007.
Jimin Gao, Mats Heimdahl, and Eric Van Wyk. Flexible and extensible notations for modeling languages. In Fundamental Approaches to Software Engineering, FASE 2007, volume 4422 of LNCS, pages 102–116. Springer-Verlag, March 2007.