© Copyright 2010 Hewlett-Packard Development Company, L.P.
David Lehavi, HP Labs Israel
A NEW PARSING LANGUAGE FOR GUI AND VISUALLY STRUCTURED DOCUMENTS
UNIVERSAL INTERFACE FOR ALL GRAPHICAL APPLICATIONS
STANDARD APPROACH: USE THE DOM
AND IF THERE IS NO DOM, OR A HYBRID ENVIRONMENT?
WHY BOTHER?
– New GUI for legacy apps (additional functionality, hiding sensitive data).
– Software testing (record and replay).
– Accessibility (speech activated apps).
• Mobile Devices
• Web 2.0 (Flash, fragmented toolkit environment)
• Hybrid environments
DOM inspector
• Object type
• Set/Get properties
What images do we need to understand?
VISUAL LANGUAGES
– A two dimensional pixel word: bit map
– A two dimensional picture word (constructed from graphical tokens)
– Formal presentation: A•→(B•↓C)
– We only parse objects which are “cut by lines”.
– Less restrictive than it seems at first: we may generalize and parse objects which are “cut by curves” (overcome the X)
[Figure: a picture word built from three graphical tokens labeled A, B, and C]
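The formal presentation A•→(B•↓C) can be read as a layout recipe. The sketch below is only an illustration under stated assumptions: it takes •→ to mean horizontal concatenation and •↓ to mean vertical concatenation, and the names `Picture`, `atom`, `hcat`, and `vcat` are hypothetical helpers, not part of the presented language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    """A picture word: overall size plus the box of every token in it."""
    w: int
    h: int
    parts: tuple  # entries are (label, x, y, w, h)


def atom(label, w, h):
    """A single graphical token placed at the origin."""
    return Picture(w, h, ((label, 0, 0, w, h),))


def _shift(parts, dx, dy):
    """Translate every token box by (dx, dy)."""
    return tuple((l, x + dx, y + dy, w, h) for l, x, y, w, h in parts)


def hcat(left, right):
    """Assumed meaning of the horizontal operator: right is placed beside left."""
    return Picture(left.w + right.w, max(left.h, right.h),
                   left.parts + _shift(right.parts, left.w, 0))


def vcat(top, bottom):
    """Assumed meaning of the vertical operator: bottom is placed under top."""
    return Picture(max(top.w, bottom.w), top.h + bottom.h,
                   top.parts + _shift(bottom.parts, 0, top.h))


# A horizontally followed by (B stacked on top of C)
word = hcat(atom("A", 10, 20), vcat(atom("B", 10, 10), atom("C", 10, 10)))
```

In the resulting layout the vertical line x = 10 separates A from the rest and the horizontal line y = 10 separates B from C, which is one way to picture what "cut by lines" means.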
INTERMEZZO: USING LANGUAGE CONSTRUCTS
Following Ken Thompson's work on regular expressions
[Diagram: Compiler, Universal machine, and Visual lexer, connected by arrows labeled language definition, bytecode, requests, tokens, and characters]
Finding useful language constructs
CHALLENGES IN GUI PARSING
• Expressibility
– Regular languages are too weak to describe recursive structures.
• Decidability & performance
– Context-free languages are too strong: basic questions about them, such as equivalence and inclusion, are undecidable.
• Ease of maintenance: there are many GUIs, and they change constantly.
• Robustness to “lexing noise”: input may originate from screenshot analysis.
RADIO-BUTTON-SET EXAMPLE
A naïve representation: (Radio•→Text)*↓
Problems: alignment, distances.
RTitled_E<Object X> = [ X C 0..50 Text ]
RTitled_M<Object X> = [ X C 0..50 Text
L L 0..50
X C 0..50 Text ]
RBS := V{RTitled_M<Radio>*RTitled_E<Radio>}
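To make the alignment and distance constraints concrete, here is a minimal Python sketch of the geometric checks that the rules above appear to encode. Several things are assumptions rather than facts from the slides: `C` is read as "vertically centered", `L L` as "left edge aligned with left edge", `0..50` as a gap in pixels, and the tolerance `tol` as well as the names `Atom`, `titled`, `stacked`, and `radio_button_set` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    kind: str
    x: float  # left edge
    y: float  # top edge
    w: float
    h: float


def titled(obj, text, gap=(0, 50), tol=5):
    """[ X C 0..50 Text ]: text lies to the right of obj, within the gap
    range, with vertical centers aligned up to tol (assumed reading)."""
    dist = text.x - (obj.x + obj.w)
    centered = abs((obj.y + obj.h / 2) - (text.y + text.h / 2)) <= tol
    return gap[0] <= dist <= gap[1] and centered


def stacked(upper, lower, gap=(0, 50), tol=5):
    """L L 0..50: lower lies below upper, within the gap range, with
    left edges aligned up to tol (assumed reading)."""
    dist = lower.y - (upper.y + upper.h)
    return gap[0] <= dist <= gap[1] and abs(upper.x - lower.x) <= tol


def radio_button_set(rows):
    """rows: (radio, text) pairs ordered top to bottom; one or more rows,
    mirroring RTitled_M* followed by RTitled_E."""
    if not rows:
        return False
    for radio, text in rows:
        if radio.kind != "Radio" or text.kind != "Text" or not titled(radio, text):
            return False
    return all(stacked(a[0], b[0]) for a, b in zip(rows, rows[1:]))


good = [(Atom("Radio", 10, 10, 12, 12), Atom("Text", 30, 8, 60, 16)),
        (Atom("Radio", 10, 40, 12, 12), Atom("Text", 30, 38, 60, 16))]
bad = [good[0],
       (Atom("Radio", 40, 40, 12, 12), Atom("Text", 60, 38, 60, 16))]
```

Here `good` satisfies every constraint, while `bad` fails only because its second radio button is not left-aligned with the first. This is exactly the kind of case the naïve representation cannot rule out.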
EBNF FOR REGULAR EXPRESSION
VPL = REGEX + FUNCTION CALLS AND DEFINITIONS
ADDING DISTANCES AND ALIGNMENTS
USING A VISIBLY PUSHDOWN META LANGUAGE
<RE>=<union>|<simple>
<union>=<RE>"|"<simple>
<simple>=<concat>|<basic>
<concat>=<simple><basic>
<basic>=<star>|<elementary>
<star>=<elementary>"*"
<elementary>= <group>|<token>
<group>="("<RE> ")"
<group>=[V>]"("<RE> ")"
<name>= standard
<call>=<name>"<"<values>">"
<values> = comma separated <value>
<value>= <call>|<token>|<name>
<rule>= <name>"<"<params>">=" (<group>|<col>|<call>)
<params>= comma separated <param>
<param>="Object" <name>
<elementary>=<group>|<call>|<col>|<token>
<range>= <int>".."<int>
<west>= [TBC]<range>(<call>|<token>|<name>)
<row>= (<call>|<token>|<name>)<west>?
<south>= [RLC][RLC]?<range><row>
<col>= "["<row><south>?"]"
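The regular-expression core of the grammar above (the rules from <RE> down to the first <group>) can be turned into a small recursive-descent parser. The following is a sketch under assumptions, not the compiler from the talk: tokens are taken to be identifiers separated by whitespace (the slides leave <token> unspecified), the left-recursive rules are rewritten as loops, and the tuple-based tree format is invented for illustration.

```python
import re

TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[|*()])")


def tokenize(src):
    """Split the source into identifiers and the symbols | * ( )."""
    src, pos, out = src.strip(), 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out


def parse(src):
    toks, pos = tokenize(src), 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    def parse_re():          # <RE> = <union> | <simple>
        node = parse_simple()
        while peek() == "|":
            advance()
            node = ("union", node, parse_simple())
        return node

    def parse_simple():      # <simple> = <concat> | <basic>
        node = parse_basic()
        while peek() not in (None, "|", ")"):
            node = ("concat", node, parse_basic())
        return node

    def parse_basic():       # <basic> = <star> | <elementary>
        node = parse_elementary()
        if peek() == "*":
            advance()
            node = ("star", node)
        return node

    def parse_elementary():  # <elementary> = <group> | <token>
        tok = peek()
        if tok == "(":
            advance()
            node = parse_re()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return node
        if tok is None or tok in "|*)":
            raise ValueError(f"unexpected {tok!r}")
        return ("token", advance())

    tree = parse_re()
    if peek() is not None:
        raise ValueError(f"trailing input: {peek()!r}")
    return tree


tree = parse("(Radio Text)* Radio Text")
```

Concatenation and union come out left-associative, matching the left-recursive <concat> and <union> rules. The function-call, column, and distance rules are deliberately left out of this sketch.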
LANGUAGE & COMPILATION - EXAMPLE
RTitled_M<Radio > = [ Radio C 0..50 Text
L L 0..50
Radio C 0..50 Text ]
RTitled_E<Radio > = [ Radio C 0..50 Text ]
RBS := V{RTitled_M<Radio>*RTitled_E<Radio>}
Each node is a function
[Figure: compiled expression graph with nodes Concat, Kleene-*, Col, Row, Radio, and Text]
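One way to picture "each node is a function" is to build the expression graph out of closures. The sketch below makes simplifying assumptions: the two-dimensional layout is flattened into a one-dimensional token list, the Col and Row nodes are omitted, and each node maps a set of start positions to the set of reachable end positions (in the spirit of Thompson's simultaneous-state simulation). The names `token`, `concat`, `star`, and `matches` are hypothetical.

```python
def token(name):
    """Leaf node: consume one token with the given name."""
    def run(tokens, starts):
        return {i + 1 for i in starts if i < len(tokens) and tokens[i] == name}
    return run


def concat(*nodes):
    """Run the child nodes one after another."""
    def run(tokens, starts):
        for node in nodes:
            starts = node(tokens, starts)
        return starts
    return run


def star(node):
    """Kleene star: apply the child zero or more times."""
    def run(tokens, starts):
        reached = set(starts)
        frontier = set(starts)
        while frontier:
            frontier = node(tokens, frontier) - reached
            reached |= frontier
        return reached
    return run


def matches(node, tokens):
    """True if the node can consume the whole token list from position 0."""
    return len(tokens) in node(tokens, {0})


# One-dimensional stand-in for RTitled_M<Radio>* RTitled_E<Radio>
rbs = concat(star(concat(token("Radio"), token("Text"))),
             token("Radio"), token("Text"))
```

For example, `matches(rbs, ["Radio", "Text"] * 3)` holds, while a list that ends in a dangling `"Radio"` is rejected.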
RUNNING THE VPL CODE - EXAMPLE
[Figure: the universal VPL machine running RBS; graph nodes: Line, Concat, Kleene-*, Col, Radio, Text]
GLOBAL ROBUSTNESS TO LOCAL AMBIGUITIES
• Visual lexer returns atoms.
• Lexer assigns likelihood to any pair (atom, bounding box).
• We use conditional likelihood to avoid consistent errors.
• A “compound object” has heuristic “likelihood”.
• VPL graph vertices are no longer functions, but coroutines (user-space threads). They:
– sit on a (heuristic-based) priority queue, and are paused when their priority is low.
– can be forked when they get multiple return values.
Example of an ambiguous atom: 50% LO, 50% scroller
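The scheduling idea in the bullets above can be sketched with Python generators standing in for coroutines and `heapq` as the priority queue. Everything specific here is an assumption made for illustration: the priority is simply the product of the lexer likelihoods seen so far, the pattern is a flat list of allowed label sets, "forking" is modeled by spawning one new task per alternative, and the names `task` and `best_parse` are hypothetical. The conditional-likelihood correction mentioned above is not modeled.

```python
import heapq
import itertools


def task(hyps, expected, pos, score, labels):
    """Coroutine that labels atoms one by one, pausing after every step."""
    while pos < len(expected):
        options = [(l, p) for l, p in hyps[pos] if l in expected[pos]]
        if not options:
            return                      # dead end: no admissible label
        if len(options) > 1:            # several return values: fork
            yield ("fork", [(pos + 1, score * p, labels + (l,))
                            for l, p in options])
            return
        (label, p), = options
        pos, score, labels = pos + 1, score * p, labels + (label,)
        yield ("pause", score)          # let the scheduler re-prioritize
    yield ("done", score, labels)


def best_parse(hyps, expected):
    """Run tasks in order of likelihood; return the first finished one."""
    tie = itertools.count()             # tie-breaker so tasks are never compared
    queue = []

    def spawn(pos, score, labels):
        heapq.heappush(queue, (-score, next(tie),
                               task(hyps, expected, pos, score, labels)))

    spawn(0, 1.0, ())
    while queue:
        _, _, current = heapq.heappop(queue)
        event = next(current, None)
        if event is None:
            continue                    # the task died
        if event[0] == "done":
            return event[2], event[1]
        if event[0] == "pause":
            heapq.heappush(queue, (-event[1], next(tie), current))
        else:                           # "fork"
            for child in event[1]:
                spawn(*child)
    return None


# Lexer output: for each atom, a list of (label, likelihood) hypotheses.
hyps = [[("Radio", 0.9), ("Checkbox", 0.1)], [("Text", 1.0)],
        [("Radio", 0.6), ("Checkbox", 0.4)], [("Text", 0.8)]]
expected = [{"Radio", "Checkbox"}, {"Text"}, {"Radio", "Checkbox"}, {"Text"}]
result = best_parse(hyps, expected)
```

Low-likelihood branches stay parked on the queue and are resumed only if the more promising ones die, which is the behavior the bullets describe for paused coroutines.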
QUESTIONS?