
Fusing Effectful Comprehensions ∗

Olli Saarikivi, Aalto University

[email protected]

Margus Veanes, Todd Mytkowicz, Madan Musuvathi

Microsoft Research, {margus,toddm,madanm}@microsoft.com

Abstract

List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way to describe list comprehensions that incorporate loop-carried state. This is motivated by operations such as compression/decompression and serialization/deserialization that are common in log/data processing pipelines and require loop-carried state when processing an input stream of data.

We build on the underlying theory of symbolic transducers to fuse pipelines of effectful comprehensions into a single representation, from which efficient code can be generated. Using background theory reasoning with an SMT solver, our fusion and subsequent reachability based branch elimination algorithms can significantly reduce the complexity of the fused pipelines. Our implementation shows significant speedups over reasonable hand-written code (3×, on average) and a LINQ implementation of the pipeline (5×, on average) for a variety of examples, including scenarios for extracting fields with regular expressions, processing XML with XPath, and running queries over encoded data.

Finally, we formalize the semantics of symbolic transducers and their compositions as a transduction monad, which provides a link between the automata-theoretic view and a monadic view of symbolic transducers.

Categories and Subject Descriptors F.1.1 [Computation by Abstract Devices]: Models of Computation - Automata; D.3.4 [Programming Languages]: Processors - Code generation; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs - Mechanical verification; D.3.2 [Programming Languages]: Language Classifications - Specialized application languages

General Terms Algorithms, Languages, Performance, Verification

Keywords transducers, comprehensions, fusion, deforestation, reachability analysis, applicative functors, monads

1. Introduction

List comprehensions provide a powerful mechanism for declaratively specifying a pipeline of computations on collections of data. Programmers specify the various stages of the pipeline concisely and modularly without using explicit iteration constructs, while the runtime ameliorates the cost of the abstraction by performing various optimizations such as fusion/deforestation [27, 35].

This paper extends this idea to effectful comprehensions, an elegant way to describe list comprehensions that incorporate loop-carried state. As a motivation, consider the problem of analyzing

∗Microsoft Research Technical Report no. MSR-TR-2016-55

[Figure 1 diagram: data from disk/network passes through the stages Decompress (bits -> bytes), UTF8Decode (bytes -> chars), Deserialize (chars -> objects), SelectPrice (objects -> floats), FindPriceDips (floats -> ints), Serialize (ints -> bytes), and Compress (bytes -> bits), and is then written back to disk/network.]

output = input.Decompress(. . .).Decode(. . .).Deserialize(. . .).SelectPrice(. . .).FindPriceDips(. . .).Serialize(. . .).Compress(. . .);

Figure 1. Motivating example of a log processing pipeline where an input stream of bits goes through various stages to an output stream of bits. This paper allows programmers to declaratively specify this pipeline as a composition of symbolic transducers, as shown at the bottom.

logs as shown in Figure 1. The log on the disk (or coming across the network from a file server) is compressed, and thus the user has to first decompress the input stream of bits into bytes, which is then deserialized into objects in a higher-level language, such as Java. In this example, the application selects stock prices from each object and looks for price dips (decreases followed by increases). The output is then serialized and compressed before being written back to disk. Such processing from an input stream of bits to an output stream of bits is not uncommon today. For instance, the processing in a single node, such as a mapper or a reducer, of data-processing systems [2, 7, 13, 38] is similar to the one shown in Figure 1.

Note that the stages in the pipeline include both “functional” computations that operate on each input independently, such as SelectPrice, and “effectful” computations that iterate over the input list while maintaining loop-carried state, such as Decompress, Deserialize, and FindPriceDips. The goal of this paper is to allow such pipelines to be declaratively and modularly specified as shown at the bottom of the figure, and then to fuse them into a single representation for which efficient code can be generated. We use a variation of symbolic transducers [33] as our program representation.

In order to provide some intuition we consider a concrete but simplified example scenario of such a pipeline, consisting of two symbolic transducers. The situation that we consider is a fairly typical one when the raw input data is unstructured text, for example when parsing CSV files. Raw text is most commonly assumed to

1 2016/10/3

be UTF8 encoded. Suppose that the task is to parse and extract a nonnegative integer from the text, assuming a decimal encoding with ASCII digits, i.e., matching the regex ^[0-9]+$. Suppose our sample pipeline is as follows: it first UTF8 decodes (Utf8Decode) and then parses an integer (ToInt). Utf8Decode takes as input a sequence of bytes and produces a sequence of integers that are the decoded Unicode character codes. For simplicity assume that only up to 2 byte encodings are allowed.¹ Utf8Decode can be illustrated graphically as follows:²

c∈[0xC2-0xDF]/[];r:=((c&0x3F)

hensions. Figure 4 presents a function implementing a pipeline of the Utf8Decode and ToInt comprehensions using C#’s LINQ [22] library⁴. Utf8Decode is represented as a SelectMany, which allows producing variable amounts of output. Since SelectMany does not encapsulate state usage, Utf8Decode uses ad-hoc state in the form of local variables, which complicates analyses by potentially allowing different stages in the pipeline to communicate through shared state. Because ToInt’s Update does not produce output it can be represented with Aggregate, which does encapsulate state. However, writing effectful comprehensions that do partial state updates with Aggregate is cumbersome, since returning the new state disallows specifying only the parts that change.

To address these concerns we present a C# interface (Section 5.1) for specifying effectful comprehensions that encapsulates state usage. The interface is similar to ones found in existing streaming libraries (Section 8). We translate programs that implement this interface into symbolic transducers. Additionally, we provide specialized frontends for parsing scenarios based on regex and XPath matching.

We evaluate the efficacy of our approach on a variety of data processing pipelines that decode, parse, compute, and then serialize back to disk. These pipelines exhibit common real-world scenarios of extracting data with regexes, querying XML files with XPath, and working with (Base64) encoded data. On average, our fused code is 3× faster than reasonable hand-written code and 5× faster than a LINQ implementation. We further demonstrate that our conservative reachability analysis and subsequent pruning based on background theory reasoning can significantly reduce the complexity of these fused pipelines.

Finally, we formalize the semantics of transducers and their composition using applicative functors and monads in a purely functional style. A transduction monad extends state monads with composition mechanisms allowing us to compose transducers. We found this connection to the functional programming world important because it explains the problem from a very different angle and lets us formalize composition unambiguously and succinctly. The functional view also provides a way to explain composition in a more declarative style, as opposed to automata based formulations that are mostly operational.
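The contrast drawn above between ad-hoc local state and an encapsulated fold can be illustrated in Python. This is only a rough analogue of the LINQ pipeline, not the code of Figure 4: a generator stands in for SelectMany, `functools.reduce` stands in for Aggregate, and the names `utf8_decode`, `to_int`, and `pipeline` are chosen here for illustration. The decoder assumes the simplification stated earlier (encodings of at most 2 bytes).

```python
from functools import reduce
from typing import Iterable, Iterator, Optional


def utf8_decode(data: Iterable[int]) -> Iterator[int]:
    """Decode 1- and 2-byte UTF-8 sequences into code points.

    Plays the role of the SelectMany stage: zero or one output per input
    byte, with loop-carried state kept in an ad-hoc local variable.
    """
    pending: Optional[int] = None  # high bits awaiting a continuation byte
    for b in data:
        if pending is None:
            if b <= 0x7F:
                yield b
            elif 0xC2 <= b <= 0xDF:
                pending = (b & 0x1F) << 6
            else:
                raise ValueError(f"unsupported lead byte {b:#x}")
        else:
            if not 0x80 <= b <= 0xBF:
                raise ValueError(f"bad continuation byte {b:#x}")
            yield pending | (b & 0x3F)
            pending = None
    if pending is not None:
        raise ValueError("input ended inside a 2-byte sequence")


def to_int(chars: Iterable[int]) -> int:
    """Fold ASCII digits into an integer; plays the role of Aggregate."""
    def step(acc: Optional[int], c: int) -> int:
        if not 0x30 <= c <= 0x39:
            raise ValueError(f"not an ASCII digit: {c:#x}")
        return (0 if acc is None else acc) * 10 + (c - 0x30)

    result = reduce(step, chars, None)
    if result is None:  # the regex ^[0-9]+$ requires at least one digit
        raise ValueError("empty input")
    return result


def pipeline(data: Iterable[int]) -> int:
    return to_int(utf8_decode(data))


print(pipeline(b"1234"))  # 1234
```

The fold in `to_int` must return the whole new state on every step, which is the inconvenience noted above for comprehensions whose state has several parts.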

The contributions of this paper are:

• A variation of symbolic transducers with branching rules, which simplify analysis and code generation.
• An algorithm for fusing symbolic transducers.
• A branch elimination algorithm based on reachability analysis, which complements the satisfiability based branch elimination built into the fusion algorithm.
• A frontend for specifying effectful comprehensions and a strategy for translating these into symbolic transducers. Additionally, we provide frontends for regex and XPath based parsing scenarios.
• A monadic formalization of transducers and their compositions.
• A comprehensive evaluation demonstrating the efficacy of our approach.

2. Symbolic Transducers

This section formally introduces symbolic transducers or STs, as a generalization of symbolic finite transducers or SFTs. The definition used here differs in the following key aspect from the original introduction of STs [33]: it is specialized for deterministic STs. This specialization is reflected in the way individual transitions are

⁴ The code for other list comprehension libraries, such as Java 8’s Streams API, is largely similar.

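[Figure residue: transducer diagram with control states q0 and q1, guards c∈[0-0x7F] and c∈[0xC2-0xDF], the action [c]; r:=0, and true/false case branches. The remaining labels are truncated in the source.]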

standard Cartesian product and function types, respectively.⁶ The type for Booleans is bool with truth values true and false. Let T(τ) denote a given predefined set of terms t that denote values [[t]] of type τ. In our implementation we use Z3 [12] expressions for T(τ) but the general definition is not restricted to any fixed representation. Further, our implementation constructs no terms of the form T((τ → σ) → ρ), for which there is no direct representation or decision procedures in Z3. In general our theory and algorithms work with any decidable background theory. A term in T(τ → bool) is a τ-predicate.

Let [τ] denote the type of finite-length lists of elements of type τ. A list of type [τ] is denoted by [t_1, . . . , t_n] or [t_i]_{i=1}^n where n ≥ 0 and each t_i is a term or value of type τ. We assume that if τ is a Cartesian product type τ1 × τ2 then there are projection functions first : τ → τ1 and second : τ → τ2 and a pairing function 〈, 〉 : τ1 → τ2 → τ1 × τ2 with the intended semantics that [[〈t1, t2〉]] = ([[t1]], [[t2]]), and [[first〈t1, t2〉]] = [[t1]], and [[second〈t1, t2〉]] = [[t2]]. Each type τ denotes a nonempty set and has a default element default_τ.

The formal definition of a rule is as follows. Given types τ, o, ρ and a finite set (or type) Q, let R(τ, o, Q, ρ) denote the smallest set X of rules satisfying the following conditions:

• Undef ∈ X;
• for n ≥ 0, if {f_i}_{i=1}^n ⊆ T(τ → o), g ∈ T(τ → ρ), and q ∈ Q, then Base([f_i]_{i=1}^n, q, g) ∈ X;
• if ϕ ∈ T(τ → bool) and t, f ∈ X, then Ite(ϕ, t, f) ∈ X.

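A rule r ∈ R(τ, o, Q, ρ) denotes a partial function [[r]] of type τ → [o] × Q × ρ:⁷

\[
\begin{aligned}
[\![\mathrm{Undef}]\!]\; v &= \bot \quad \text{for all values } v;\\
[\![\mathrm{Base}([f_i]_{i=1}^{n}, q, g)]\!]\; v &= \bigl([\,[\![f_i]\!]\; v\,]_{i=1}^{n},\; q,\; [\![g]\!]\; v\bigr);\\
[\![\mathrm{Ite}(\varphi, t, f)]\!]\; v &=
\begin{cases}
[\![t]\!]\; v, & \text{if } [\![\varphi]\!]\; v = \mathit{true};\\
[\![f]\!]\; v, & \text{otherwise.}
\end{cases}
\end{aligned}
\]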
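As a minimal sketch of the three rule forms and their semantics, the following Python code represents terms as plain Python functions instead of solver expressions, and represents ⊥ by `None`. The class and field names (`outputs`, `target`, `update`, `cond`, `then`, `other`) and the function `eval_rule` are illustrative choices, not names from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class Undef:
    """The undefined rule: denotes bottom on every input."""


@dataclass(frozen=True)
class Base:
    outputs: Sequence[Callable[[Any], Any]]  # the output terms f_1..f_n
    target: str                              # target control state q
    update: Callable[[Any], Any]             # register update term g


@dataclass(frozen=True)
class Ite:
    cond: Callable[[Any], bool]              # branch condition
    then: "Rule"                             # taken when cond holds
    other: "Rule"                            # taken otherwise


Rule = Union[Undef, Base, Ite]


def eval_rule(rule: Rule, v: Any) -> Optional[Tuple[list, str, Any]]:
    """Evaluate a rule on value v; None stands for the undefined result."""
    if isinstance(rule, Undef):
        return None
    if isinstance(rule, Base):
        return ([f(v) for f in rule.outputs], rule.target, rule.update(v))
    return eval_rule(rule.then if rule.cond(v) else rule.other, v)


# A rule that copies ASCII bytes and is undefined on everything else.
ascii_rule = Ite(lambda v: v <= 0x7F,
                 Base([lambda v: v], "q0", lambda v: 0),
                 Undef())
print(eval_rule(ascii_rule, 0x41))  # ([65], 'q0', 0)
print(eval_rule(ascii_rule, 0x80))  # None
```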

A symbolic transducer (ST) is a tuple (ι, o, ρ, Q, q⁰, r⁰, δ, $) where the components are:

• input type ι;
• output type o;
• register type ρ;
• finite control state set Q;
• initial state (q⁰, r⁰) of state type σ ≝ Q × ρ, and s⁰ ≝ (q⁰, r⁰);
• transition function δ : Q → R(ι × ρ, o, Q, ρ);
• finalizer $ : Q → R(ρ, o, Q, ρ).

We indicate the component of an ST by using the ST as a subscript, unless the ST is clear from the context.

The finalizer is used to produce a final output list upon reaching the end of the input list. It is a generalization of a final state. Intuitively one may think of the finalizer as being a special case of the transition function that is triggered by a unique end-of-input symbol. However, unlike in the classical setting, formally such a symbol cannot in general be treated as an element of type ι. Instead

⁶ As usual, → is right-associative. We assume that × is also right-associative and has higher precedence than →.
⁷ We can lift the rule type τ → [o] × Q × ρ to be the type of a total function τ → ([o] × Q × ρ)? by using option types, but here we work directly with partial functions f of type τ1 → τ2 as relations of type τ1 × τ2 with the understanding that if (a, b), (a, b′) ∈ [[f]] then b = b′. Moreover, [[f]](a) ≝ b if (a, b) ∈ [[f]] and [[f]](a) ≝ ⊥ if {b | (a, b) ∈ [[f]]} = ∅.

of lifting every input type ι to a sum type of ι and an end-of-input symbol, end-of-input is handled separately by the finalizer.

We adopt the following variable naming conventions of terms occurring in rules. In a term t occurring in a rule, variable x is of type ι and refers to the input element and variable r is of type ρ and refers to the register. To disambiguate between variables and functions that appear in formulas from those used in our definitions, proofs and algorithms, we use a mono-space font for the former. For example in (x = ϕ) the x is a literal part of the formula, while ϕ refers to another formula. Substitution of a variable y by a term u in t is denoted t{y ↦ u}.

In Utf8Decode, in Figure 5, the finalizer is depicted as q0 being accepting and q1 being non-accepting in the classical sense, meaning that the finalizer is the function:

$_Utf8Decode = {q0 ↦ Base([], q0, 0), q1 ↦ Undef}

The finalizer of ToInt, in Figure 6, which is shown as the dashed arrows, is the function:

$_ToInt = {p0 ↦ Undef, p1 ↦ Base([r], p1, 0)},

where the final value of the register r is output upon reaching the end of the input list in the control state p1, whereas the initial control state p0 is not valid as a final state and the input would be rejected if the input list terminates in this state.

An ST A denotes a transduction [[A]] that is a partial function of type [ι] → [o]. First, we define the following partial semantic functions δ̂ : ι → σ → [o] × σ and $̂ : σ → [o] × σ that enable us to provide a declarative definition of [[A]]:

\[
\hat{\delta}\; a\; (q, b) \overset{\mathrm{def}}{=} [\![\delta\; q]\!](a, b); \qquad
\hat{\$}\; (q, b) \overset{\mathrm{def}}{=} [\![\$\; q]\!]\; b.
\]

Let ā = [a_i]_{i=1}^k be a given input list. Let π1(a, b) ≝ a. Then

\[
[\![A]\!]\; \bar{a} \overset{\mathrm{def}}{=} \pi_1\bigl(((\hat{\delta}\, a_1) \oplus (\hat{\delta}\, a_2) \oplus \cdots \oplus (\hat{\delta}\, a_k) \oplus \hat{\$})\; s^0\bigr) \tag{1}
\]

where ⊕ is a left-associative operator of type

(σ → [o] × σ) × (σ → [o] × σ) → (σ → [o] × σ)

that composes single-input transduction steps into multi-input transduction steps, with the formal definition:

\[
F_1 \oplus F_2 \overset{\mathrm{def}}{=} \lambda s.\; \mathbf{let}\ (u_1, s_1) = (F_1\, s)\ \mathbf{in}\ \mathbf{let}\ (u_2, s_2) = (F_2\, s_1)\ \mathbf{in}\ (u_1 + u_2,\, s_2)
\]

where ‘+’ denotes list concatenation. For example, given ā = [0xC5, 0x93] and A = Utf8Decode, we have that s⁰ = (q0, 0) and

\[
\begin{aligned}
[\![A]\!]\; \bar{a} &= \pi_1\bigl(((\hat{\delta}\; \mathtt{0xC5}) \oplus (\hat{\delta}\; \mathtt{0x93}) \oplus \hat{\$})\; s^0\bigr)\\
&= \pi_1\bigl(((\lambda s.([\,] + [\text{‘œ’}],\, s^0)) \oplus \hat{\$})\; s^0\bigr)\\
&= \pi_1\bigl((\lambda s.([\,] + [\text{‘œ’}] + [\,],\, s^0))\; s^0\bigr)\\
&= [\text{‘œ’}]
\end{aligned}
\]

We refer to ⊕ as step composition and revisit it in Section 7. The main intuition about ⊕ is that it combines function composition with list comprehension in the following sense. If the arguments F1 and F2 do not depend on the state s then ⊕ corresponds to concatenation, as in a typical SelectMany list comprehension in LINQ. If, on the other hand, F1 and F2 produce no outputs and only transform the state, then ⊕ corresponds to function composition.
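Step composition and Equation (1) can be sketched directly in Python. Since the full definition of Utf8Decode is not reproduced in this section, the transition function below is an assumed reconstruction of a decoder for encodings of at most 2 bytes; the names `delta_hat`, `finalizer_hat`, `step_compose`, and `transduce` are illustrative, and ⊥ is represented by `None`.

```python
from functools import reduce


def delta_hat(c, state):
    """One step of an assumed 2-byte Utf8Decode: (outputs, new state) or None."""
    q, r = state
    if q == "q0" and c <= 0x7F:
        return [c], ("q0", 0)
    if q == "q0" and 0xC2 <= c <= 0xDF:
        return [], ("q1", (c & 0x3F) << 6)
    if q == "q1" and 0x80 <= c <= 0xBF:
        return [r | (c & 0x3F)], ("q0", 0)
    return None


def finalizer_hat(state):
    """Finalizer: q0 accepts with no output, q1 is undefined."""
    q, _ = state
    return ([], ("q0", 0)) if q == "q0" else None


def step_compose(f1, f2):
    """The operator ⊕: run f1, then f2 from f1's state, concatenating outputs."""
    def composed(s):
        first = f1(s)
        if first is None:
            return None
        u1, s1 = first
        second = f2(s1)
        if second is None:
            return None
        u2, s2 = second
        return u1 + u2, s2
    return composed


def transduce(inputs, s0=("q0", 0)):
    """Equation (1): fold ⊕ (left-associatively) over all steps and the finalizer."""
    steps = [lambda s, a=a: delta_hat(a, s) for a in inputs] + [finalizer_hat]
    result = reduce(step_compose, steps)(s0)
    return None if result is None else result[0]


print(transduce([0xC5, 0x93]))  # [339], i.e. U+0153 'œ'
print(transduce([0xC5]))        # None: input ends in the non-final state q1
```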

3. Fusion of STs

Consider two STs A and B such that o_A = ι_B. We want to fuse A and B into a single ST A ⊗ B such that [[A ⊗ B]] is equivalent to [[A]] ◦ [[B]], i.e., λx.[[B]]([[A]](x)). We first explain the main idea behind the construction. We then explain the incremental algorithm that makes the composition scale in practice. The control-state


complexity of the algorithm is |Q|². Typically |Q| is in the range of 100-1000. The worst-case complexity with respect to the size of the rules is also quadratic, even when the number of control states is small. It is therefore instrumental to prune unreachable states early and to develop incremental algorithms.

3.1 Main idea

At a high level, the fusion algorithm of A ⊗ B can be described as follows. A ⊗ B has the following components: ι = ι_A, o = o_B, ρ = ρ_A × ρ_B, Q ⊆ Q_A × Q_B, r⁰ = (r⁰_A, r⁰_B), q⁰ = (q⁰_A, q⁰_B). The goal of the fusion algorithm is to construct δ_{A⊗B} and $_{A⊗B}.

For each pair (p, q) of control states in Q_A × Q_B build a fused rule that, given the rule δ_A p, symbolically runs δ_B q treating all of the output lists [v_i]_{i=1}^n that occur in the Base-subrules of δ_A p as symbolic values. The symbolic values are substituted into the register update and output functions of (δ̂_B v_1) ⊕ · · · ⊕ (δ̂_B v_n), which is partially evaluated with respect to the control state q, and finally normalized into a rule in R(ι × ρ, o, Q, ρ). The finalizer is constructed similarly.

While such a brute-force approach will terminate in theory, because the output lists have a fixed length that is independent of the input element, it is highly impractical for several reasons. One problem is control state space size, because |Q| = |Q_A||Q_B|. Another problem is output-branch explosion. Just consider self-composition of an encoder (say, with a single control state) that may output n elements for some input element. Then the composition may potentially output n² elements for some input element, but most of those cases may be infeasible due to symbolic constraints imposed by the output functions and their guards in A when considered as inputs of B. For example, an HTML encoder H may output a character with code hex(x ÷ 32) in one of its branches, where
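The construction described above is symbolic, but its intended meaning can be sketched at the level of concrete values: the fused transducer runs over product states, feeding every output of a step of A through steps of B. The Python sketch below is that concrete counterpart only, not the fusion algorithm itself. The two example transducers are assumed reconstructions of Utf8Decode and ToInt, all function names are illustrative, and ⊥ is represented by `None`.

```python
def fuse(step_a, fin_a, step_b, fin_b):
    """Return (step, fin) of a transducer over product states (s_a, s_b)."""
    def feed_b(values, sb):
        out = []
        for v in values:  # step composition of B over A's output list
            res = step_b(v, sb)
            if res is None:
                return None
            ys, sb = res
            out += ys
        return out, sb

    def step(x, state):
        sa, sb = state
        res = step_a(x, sa)
        fed = None if res is None else feed_b(res[0], sb)
        return None if fed is None else (fed[0], (res[1], fed[1]))

    def fin(state):
        sa, sb = state
        res = fin_a(sa)  # A's final outputs still pass through B
        fed = None if res is None else feed_b(res[0], sb)
        last = None if fed is None else fin_b(fed[1])
        return None if last is None else (fed[0] + last[0], (res[1], last[1]))

    return step, fin


def run(step, fin, s0, inputs):
    out, s = [], s0
    for x in inputs:
        res = step(x, s)
        if res is None:
            return None
        ys, s = res
        out += ys
    res = fin(s)
    return None if res is None else out + res[0]


def utf8_step(c, s):  # assumed 2-byte decoder
    q, r = s
    if q == "q0" and c <= 0x7F:
        return [c], ("q0", 0)
    if q == "q0" and 0xC2 <= c <= 0xDF:
        return [], ("q1", (c & 0x3F) << 6)
    if q == "q1" and 0x80 <= c <= 0xBF:
        return [r | (c & 0x3F)], ("q0", 0)
    return None


def utf8_fin(s):
    return ([], ("q0", 0)) if s[0] == "q0" else None


def toint_step(c, s):  # assumed ToInt: accumulate ASCII digits
    if not 0x30 <= c <= 0x39:
        return None
    return [], ("p1", s[1] * 10 + (c - 0x30))


def toint_fin(s):
    return ([s[1]], ("p1", 0)) if s[0] == "p1" else None


fused_step, fused_fin = fuse(utf8_step, utf8_fin, toint_step, toint_fin)
s0 = (("q0", 0), ("p0", 0))
print(run(fused_step, fused_fin, s0, b"42"))  # [42]
```

Running the fused transducer gives the same result as running the two stages one after the other, which is the property the symbolic construction has to preserve.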

hex(y) = if (0 ≤ y ≤ 9) then (y + 48) else (y + 55)

if the guard γ(x) = 0x100 ≤ x ≤ 0xFFF holds for the input character x. However, in a double-HTML encoder H ⊗ H, the corresponding composed guard γ(hex(x ÷ 32)) ∧ γ(x) for that element is unsatisfiable, which requires nontrivial integer linear constraint reasoning in order to eliminate that branch. Such pruning requires incremental symbolic techniques outside the scope of the brute force approach.
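Because the guard γ(x) confines x to a finite range, the unsatisfiability of the composed guard can be confirmed by plain enumeration. The Python check below is such a brute-force stand-in for the solver query, under the assumption that ÷ denotes integer division; the names `hex_code` and `guard` are illustrative.

```python
def hex_code(y: int) -> int:
    """The function hex from the text: code of the character for value y."""
    return y + 48 if 0 <= y <= 9 else y + 55


def guard(x: int) -> bool:
    """The guard γ(x) = 0x100 ≤ x ≤ 0xFFF."""
    return 0x100 <= x <= 0xFFF


# Search for an input x satisfying γ(hex(x ÷ 32)) ∧ γ(x).
witnesses = [x for x in range(0x100, 0x1000) if guard(hex_code(x // 32))]
largest = max(hex_code(x // 32) for x in range(0x100, 0x1000))

print(witnesses)  # []: the composed guard has no model
print(largest)    # 182, below the lower bound 0x100 = 256 of the guard
```

A solver reaches the same conclusion without enumeration, which matters when the guards range over unbounded or very large domains.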

3.2 Incremental fusion

There are several key optimizations used in the construction of composed rules, powered by the use of the solver for deciding satisfiability and for model generation of predicates. One technique is to incrementally check for unsatisfiability and validity of guards of newly formed Ite-rules and to remove branches that are inaccessible and consequently also eliminate control states that become inaccessible. The distinction between control states and registers is instrumental because finiteness of control states guarantees termination and enables techniques not directly available over infinite state spaces.

We provide a top-down view of the fusion algorithm in Figure 7 with further helper procedures in Figure 8. Fusion is implemented using depth first search starting from (p, q) = (q⁰_A, q⁰_B). Only satisfiable parts of composite rules are ever explored. The procedure FUSE(γ, R, q) in Figure 7 uses an accumulating context condition γ for a branch of an Ite-rule of A with R as the unexplored subrule in that context, and q is a control state of B. If the condition SAT(γ ∧ R′1 ≠ R′2) is false then for all (x, r) ∈ [[γ]], [[R′1]](x, r) = [[R′2]](x, r), so the branching condition is redundant. The condition R′1 ≠ R′2 is itself, w.l.o.g., expressible as an (ι×ρ)-predicate. The newly discovered states in the depth first search are added to the Frontier in line 8 of the definition of PRODUCT in Figure 7. Elements

A ⊗ B
 1  let global Frontier = {(q⁰_A, q⁰_B)}
 2  let global Q = {(q⁰_A, q⁰_B)}
 3  let δ = $ = ∅
 4  while Frontier ≠ ∅
 5      remove (p, q) from Frontier
 6      δ(p, q) ↦ FUSE(true, (δ_A p), q)
 7      $(p, q) ↦ FUSE$(true, ($_A p), q)
 8  return (ι_A, o_B, ρ_A×ρ_B, Q, (q⁰_A, q⁰_B), 〈r⁰_A, r⁰_B〉, δ, $)

FUSE(γ, R, q) : (T(ι×ρ→bool) × R(ι×ρ_A, o_A, Q_A, ρ_A) × Q_B) → R(ι×ρ, o, Q, ρ)
 1  let θ = {r:ρ_A ↦ first(r:ρ)}
 2  match R
 3      case Undef: return Undef
 4      case Ite(ϕ, R1, R2):
 5          let R′1 = FUSE(γ ∧ (ϕθ), R1, q)
 6          let R′2 = FUSE(γ ∧ ¬(ϕθ), R2, q)
 7          if SAT(γ ∧ R′1 ≠ R′2)
 8              return Ite(ϕθ, R′1, R′2)
 9          else return R′1
10      case Base(v̄, p, g):
11          return PRODUCT(p, gθ, RUN(γ, v̄θ, q, second(r:ρ)))

PRODUCT(p, g, R)
 1  match R
 2      case Undef: return Undef
 3      case Ite(ϕ, R1, R2):
 4          return Ite(ϕ, PRODUCT(p, g, R1), PRODUCT(p, g, R2))
 5      case Base(v̄, q, h):
 6          if (p, q) ∉ Q
 7              add (p, q) to Q
 8              add (p, q) to Frontier
 9          return Base(v̄, (p, q), (g, h))

Figure 7. Fusion of STs A and B with o_A = ι_B. Definition of RUN is given in Figure 8.

of Q_A × Q_B that are never added to Frontier are unreachable and thus irrelevant.

To construct a rule, the mutually recursive RUN(γ, v̄, q, s) and STEP(γ, v, rest, R, s) procedures shown in Figure 8 symbolically execute the step composition operator ⊕ for B over the symbolic value list v̄ starting from the state (q, s) of B. The satisfiability checks in STEP on lines 6 and 10 maintain that the constructed rules only have branches that are feasible and non-redundant. A trivial case of redundancy is when both R′1 and R′2 are Undef, but more complicated conditional cases may arise when R′1 and R′2 are syntactically different but semantically equivalent in the given context γ.

Observe how the procedure FUSE uses γ on lines 5-7: γ is included as a conjunct in every solver call to SAT and every recursive call to FUSE. This pattern of use allows incremental SMT solving, where the solver is used in such a way that subsequent solver calls can reuse clauses learned during previous calls. For example, on line 5 in FUSE this would be implemented by pushing (ϕθ) into the solver context before the recursive call and popping the context afterwards. In fact, both procedures FUSE and STEP use the parameter γ in a way such that γ is included as a conjunct in (i) each call to SAT, and (ii) each γ argument formula in recursive


RUN(γ, v̄, q, s) : (T(ι×ρ→bool) × [T(ι×ρ→ι_B)] × Q_B × T(ι×ρ→ρ_B)) → R(ι×ρ, o, Q_B, ρ_B)
 1  match v̄
 2      case []: return Base([], q, s)
 3      case [v|rest]:
 4          return STEP(γ, v, rest, (δ_B q), s)

STEP(γ, v, rest, R, s)
 1  let θ = {r:ρ_B ↦ s, x:ι_B ↦ v}
 2  match R
 3      case Undef: return Undef
 4      case Ite(ϕ, R1, R2):
 5          let R′1 =
 6              if SAT(γ ∧ (ϕθ))
 7                  STEP(γ ∧ (ϕθ), v, rest, R1, s)
 8              else Undef
 9          let R′2 =
10              if SAT(γ ∧ ¬(ϕθ))
11                  STEP(γ ∧ ¬(ϕθ), v, rest, R2, s)
12              else Undef
13          if SAT(γ ∧ R′1 ≠ R′2)
14              return Ite(ϕθ, R′1, R′2)
15          else return R′1
16      case Base(ū, q, g):
17          return CONCAT(ūθ, RUN(γ, rest, q, gθ))

CONCAT(ū, R)
 1  match R
 2      case Undef: return Undef
 3      case Ite(ϕ, R1, R2):
 4          return Ite(ϕ, CONCAT(ū, R1), CONCAT(ū, R2))
 5      case Base(v̄, q, h):
 6          return Base(ū + v̄, q, h)

Figure 8. Step composition of B over a list of symbolic inputs v̄ in the context γ.

calls. Furthermore, when FUSE calls RUN (and through it STEP) on line 11, it passes its γ as an argument. Therefore, each call to FUSE can use a single solver context incrementally for all satisfiability checks. The structure of pushing and popping the contexts follows the structure of the Ite-rules. In our experience, using the solver incrementally may decrease the fusion time by an order of magnitude.

The fusion procedure for $_{A⊗B} is omitted from the presentation, but is similar to the construction of δ_{A⊗B}. Elements of Q that only lead to non-final control states (control states that only have the Undef finalizer) are also removed as “dead-ends” using the standard dead-end elimination algorithm for finite state automata [16].

Theorem 3.1. [[A ⊗ B]] = [[A]] ◦ [[B]]

The proof is omitted for brevity. The main intuition for the proof is that STEP implements a symbolic version of a single step of ⊕ and RUN is a symbolic version of a run (of multiple steps) of ⊕. Once this connection is proved formally it can be used as a lemma for proving that the transduction semantics given by Equation (1) (Section 2) is preserved by A ⊗ B.

3.3 Implementation remarks

The incremental satisfiability checks that are performed during ST fusion are critical for the overall feasibility of the algorithm. In almost all of our case studies, the algorithm would not terminate otherwise. Several further optimizations are possible to locally improve the succinctness of the generated ST. One such optimization is what we call symbolic constant propagation: applying the substitution θ to a sub-term t of ūθ in STEP(γ, v, rest, R, s) may result in t becoming “constant valued”. This can be decided by checking unsatisfiability of the formula γ ∧ γ′ ∧ t ≠ t′ where the variables in γ′ and t′ are fresh variants of the variables in γ and t. If the formula is unsatisfiable, then t has a unique value in the context γ, independent of the variables it contains, and can thus be replaced by that value. Such a value can be queried from the solver by considering a model for the formula γ ∧ t = y where y is a fresh variable (a model exists since γ is satisfiable), and extracting the value of y from that model. In code generation, we have witnessed that symbolic constant propagation may add significant performance improvements by avoiding unnecessary expression evaluation.
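The property tested by symbolic constant propagation can be illustrated without a solver when the variables range over a small finite domain. The Python sketch below enumerates all assignments satisfying the context and checks whether the term takes a single value. It is a brute-force stand-in for the unsatisfiability query described above, not the solver-based implementation, and the name `constant_value` and its parameters are illustrative.

```python
from itertools import product
from typing import Any, Callable, Iterable, Tuple


def constant_value(term: Callable[..., Any],
                   context: Callable[..., bool],
                   domain: Iterable[Any],
                   arity: int) -> Tuple[bool, Any]:
    """Check whether `term` has one value over all assignments satisfying `context`.

    Returns (True, value) when the term is constant in the context and
    (False, None) otherwise, including when the context is unsatisfiable.
    """
    values = {term(*args)
              for args in product(list(domain), repeat=arity)
              if context(*args)}
    if len(values) == 1:
        return True, next(iter(values))
    return False, None


def is_digit(x: int) -> bool:
    """Context: x is the code of an ASCII digit."""
    return 0x30 <= x <= 0x39


# In this context the high nibble is always 0x30, the low nibble is not fixed.
print(constant_value(lambda x: x & 0xF0, is_digit, range(256), 1))  # (True, 48)
print(constant_value(lambda x: x & 0x0F, is_digit, range(256), 1))  # (False, None)
```

When the check succeeds, generated code can use the literal value in place of the expression, which is the saving described above.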

4. Reachability Based Branch Elimination

Fusing already removes many unsatisfiable branches. Still, the resulting STs may have a large number of control states and/or rules with redundant conditions. In particular some branches may be unreachable due to state carried constraints, i.e., even though the branch itself is satisfiable, the conjunction of reachable register values in the source states together with the branch is unsatisfiable. In this section we present a reachability based branch elimination (RBBE) algorithm that proves such branches unreachable and removes them from the target ST. The algorithm is a combination of symbolic forward reachability and backward reachability algorithms adapted to STs.

The reachability algorithm reasons about transition rules as a flattened set of Base-rules with their associated combined branch constraints. Given a rule r ∈ R(τ, o, Q, ρ), let Paths(r) be defined as follows:

\[
\begin{aligned}
\mathit{Paths} &: \mathcal{R}(\tau, o, Q, \rho) \to \{(\mathcal{T}(\tau \to \mathit{bool}) \times \mathcal{T}(\tau \to \rho) \times Q)\}\\
\mathit{Paths}(\mathrm{Undef}) &\overset{\mathrm{def}}{=} \emptyset\\
\mathit{Paths}(\mathrm{Base}([f_i]_{i=1}^{n}, q, g)) &\overset{\mathrm{def}}{=} \{(\mathit{true}, g, q)\}\\
\mathit{Paths}(\mathrm{Ite}(\varphi, u, v)) &\overset{\mathrm{def}}{=} \bigcup_{(\psi,g,q)\in \mathit{Paths}(u)} \{(\varphi \wedge \psi, g, q)\} \;\cup \bigcup_{(\psi,g,q)\in \mathit{Paths}(v)} \{(\neg\varphi \wedge \psi, g, q)\}
\end{aligned}
\]
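The flattening performed by Paths can be sketched in Python with guards as ordinary predicates. The rule classes are re-declared here so that the example is self-contained; all names are illustrative, terms are plain functions over a value v = (x, r), and the example rule is an assumed reconstruction of the transition rule of Utf8Decode at q0.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Undef:
    pass


@dataclass(frozen=True)
class Base:
    outputs: Sequence[Callable[[Any], Any]]
    target: str
    update: Callable[[Any], Any]


@dataclass(frozen=True)
class Ite:
    cond: Callable[[Any], bool]
    then: "Rule"
    other: "Rule"


Rule = Union[Undef, Base, Ite]
Path = Tuple[Callable[[Any], bool], Callable[[Any], Any], str]


def paths(rule: Rule) -> List[Path]:
    """Flatten a rule into (guard, register update, target) triples.

    Outputs are dropped, and each guard is the conjunction of the branch
    conditions (or their negations) on the way to the Base rule.
    """
    if isinstance(rule, Undef):
        return []
    if isinstance(rule, Base):
        return [(lambda v: True, rule.update, rule.target)]
    cond = rule.cond
    pos = [((lambda v, psi=psi: cond(v) and psi(v)), g, q)
           for psi, g, q in paths(rule.then)]
    neg = [((lambda v, psi=psi: (not cond(v)) and psi(v)), g, q)
           for psi, g, q in paths(rule.other)]
    return pos + neg


# Assumed transition rule of a 2-byte Utf8Decode at q0; v is the pair (x, r).
q0_rule = Ite(lambda v: v[0] <= 0x7F,
              Base([lambda v: v[0]], "q0", lambda v: 0),
              Ite(lambda v: 0xC2 <= v[0] <= 0xDF,
                  Base([], "q1", lambda v: (v[0] & 0x3F) << 6),
                  Undef()))

flat = paths(q0_rule)
print([q for _, _, q in flat])  # ['q0', 'q1']
```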

Since outputs do not affect reachability they are dropped from the flattened representation. Given an ST A let there be the following:

\[
\begin{aligned}
\mathit{Moves}_{\delta}(A) &\overset{\mathrm{def}}{=} \bigcup_{p\in Q_A}\; \bigcup_{(\varphi,g,q)\in \mathit{Paths}(\delta_A(p))} \{(p, \varphi, g, q)\}\\
\mathit{Moves}_{\$}(A) &\overset{\mathrm{def}}{=} \bigcup_{p\in Q_A}\; \bigcup_{(\varphi,g,q)\in \mathit{Paths}(\$_A(p))} \{(p, \varphi)\}
\end{aligned}
\]

These give a flat representation of all transitions and finalizers (respectively) by source and target control state. We call elements of these sets moves and final moves respectively.

The ELIMINATE procedure in Figure 9 implements the top-level reachability algorithm. The variable w ∈ [ι_A] is used to represent a list of inputs. To check the reachability of a (final) move it calls ISREACHABLE with a ([ι_A] × ρ_A)-predicate such that the (final) move is reachable if and only if the source control state can be reached such that the predicate holds (lines 5 and 9). If ISREACHABLE returns false then the branch is eliminated by simplifying the corresponding Ite(ϕ, u, v), where u (or v) is the unreachable base rule, into v (or u). Note that if ISREACHABLE hits the bound k then it returns ⊥ and the branch cannot be safely removed.

To minimize calls to ISREACHABLE, ELIMINATE uses a more efficient COMPUTEUNDERAPPROXIMATION procedure. It performs a breadth-first forward-reachability analysis from the initial state and tags moves whose path conditions from the initial state are satisfiable as reachable. Breadth-first search increases coverage and ensures that there are potentially several states in a breadth-first frontier for


ELIMINATE(A)
 1  let U = COMPUTEUNDERAPPROXIMATION(A)
 2  let M = (Moves_δ(A) ∪ Moves_$(A)) \ U
 3  let k = |Q_A|
 4  foreach move (p, ϕ, g, q) in M
 5      let ϕ′ = (w ≠ []) ∧ ϕ{x ↦ Head(w)}
 6      if ISREACHABLE(A, p, ϕ′, k) = false
 7          eliminate the corresponding branch in δ_A
 8  foreach final move (p, ϕ) in M
 9      let ϕ′ = (w = []) ∧ ϕ
10      if ISREACHABLE(A, p, ϕ′, k) = false
11          eliminate the corresponding branch in $_A
12  remove control states with no path from q⁰_A

Figure 9. Reachability based branch elimination (RBBE).

ISREACHABLE(A, qtgt, ϕtgt, k) : (ST × Q_A × T([ι_A]×ρ_A → bool) × int) → bool
 1  let layer = {qtgt}
 2  let layer′ = ∅
 3  let Ψ′ = empty = {q ↦ false | q ∈ Q_A}
 4  let Σ = Ψ = empty ⊎ {qtgt ↦ ϕtgt}
 5  while layer ≠ ∅
 6      while layer ≠ ∅
 7          pop q from layer
 8          let ψ = Ψ[q]
 9          if q = q⁰_A ∧ SAT(ψ{r ↦ r⁰_A})
10              return true
11          foreach (p, ϕ, g, q) in Moves_δ(A)
12              if (ϕ depends on r⁸) or (g depends on x⁸)
13                  let update = g{x ↦ Head(w)}
14                  let γ = (w ≠ []) ∧ ϕ{x ↦ Head(w)} ∧ ψ{w ↦ Tail(w), r ↦ update}
15              else
16                  let γ = ψ{r ↦ g{x ↦ default_ιA}}
17              if SAT(γ ∧ ¬Σ[p])
18                  let Σ[p] = Σ[p] ∨ γ
19                  let Ψ′[p] = Ψ′[p] ∨ γ
20                  add p to layer′
21      if k = 0 ∧ layer′ ≠ ∅
22          return ⊥
23      let k = k − 1
24      let layer = layer′
25      let layer′ = ∅
26      let Ψ = Ψ′
27      let Ψ′ = empty
28  return false

Figure 10. Checking the reachability of a state predicate.

the same control state, hopefully capturing different ways of entering the control state. While more sophisticated under-approximations are possible, this basic version was adequate for our experiments.

The ISREACHABLE procedure in Figure 10 performs a backward breadth-first traversal on A, exploring the states one layer at a time. Each layer is associated with the map Ψ from control states to reachability conditions yet to be explored. Initially the control state qtgt is mapped to the predicate ϕtgt. Σ maps control states to the

⁸ These checks can be performed with an SMT solver call. For example ∃i,r,r′ (ϕ ≠ ϕ{r ↦ r′}) is satisfiable iff ϕ depends on the register.

predicates that summarize the arguments for which exploration has already been performed or is about to be performed.

Let Δ_A denote the following partial function that extends the transition function δ̂_A to input lists and omits the output part:

\[
\begin{aligned}
\Delta_A &: [\iota_A] \times \sigma_A \to \sigma_A\\
\Delta_A([\,], s) &\overset{\mathrm{def}}{=} s\\
\Delta_A([i|w], s) &\overset{\mathrm{def}}{=} \Delta_A(w, \pi_2(\hat{\delta}_A\; i\; s))
\end{aligned}
\]

A state s is k-reachable (in A) if there exists w ∈ ⋃_{n∈[0,k]} (ι_A)^n such that Δ_A(w, s⁰_A) = s. For example s⁰_A is 0-reachable. A state s is reachable if it is k-reachable for some k ≥ 0. Given q ∈ Q_A and a ρ_A-predicate ϕ, we say that (q, ϕ) is (k-)reachable if there exists a (k-)reachable state (q, r) such that r ∈ [[ϕ]].

Theorem 4.1. If ISREACHABLE(A, qtgt, ϕtgt, k) equals (a) true then (qtgt, ϕtgt) is reachable; (b) false then (qtgt, ϕtgt) is not reachable; (c) ⊥ then (qtgt, ϕtgt) is not k-reachable.

Proof. First, we prove the theorem with one optimization turnedoff — the branch condition in line 12 always returns true. Let ψtgtbe the σA-predicate (q = qtgt) ∧ ϕtgt. The algorithm maintainsthe following invariant that for all entries (q 7→ ϕ) ∈ Σ such thatϕ 6= false:

(i) SAT(ϕ) and(ii) for all (w, r) ∈ [[ϕ]]: ∆A(w, (q, r)) ∈ [[ψtgt]]

Property (i) follows from the observation that, other than the initialvalue false, only satisfiable predicates are added to Σ[q] and thatsatisfiability remains true under disjunctions. Property (ii) followsby induction over |w| using the definition of ∆A and that theconstruction of γ in line 14 is the weakest precondition with respectto ψ and the given move from p.

Now (a) follows from the fact that if the procedure terminated inline 10 then ∃w ((w, r0A)∈ [[Σ[q0A]]]). So, by (ii), ∃w (∆A(w, s0A)∈[[ψtgt]]). The proof of (c) is by induction over k, showing that allpossible behaviors for input lists of up to length k that from somestate lead to ψtgt are captured in Σ. This implies that the initialregister must be captured in some layerk predicate Ψk[q

0] for thereto be a path from the initial state to the target state. The satisfiabilitytest in line 17 ensures that [[ϕ]] 6⊆ [[Σ[p]]]. In other words, if the testfails then [[ϕ]] ⊆ [[Σ[p]]], so no behavior is lost by excluding ϕ in thatcase. Statement (b) follows from (c), because if false is returnedfor k, then false is returned for any bound greater than k.

The condition in line 12 filters out input noise: the else case in line 16 is taken if the input element does not affect the register update, which is when the guard does not depend on the register and the register update does not depend on the input element. In this case, the summaries in Σ may accept shorter words than what is required by $\Delta_A$, but the register part of the predicate is not affected by omitting the input element because it does not influence it. Here we need to assume that there is no $(p, \varphi, g, q) \in \mathit{Moves}_\delta(A)$ for which ϕ is unsatisfiable. Otherwise the definition of γ in line 16 is unsound when ϕ is unsatisfiable. If ϕ is satisfiable then $(\exists i\, \varphi\{r \mapsto \mathit{default}_{\rho_A}\}) \wedge \psi\{r \mapsto g\{x \mapsto \mathit{default}_{\iota_A}\}\}$ is equivalent to $\psi\{r \mapsto g\{x \mapsto \mathit{default}_{\iota_A}\}\}$.

Statement (ii) can no longer be used directly, but must be modified to account for the omitted input elements, which become much like input-epsilon moves. Intuitively, the ST is implicitly converted into an εST (an ST with input-epsilon moves), although the input-epsilon moves still count against the bound k.

In this algorithm, Σ enables a crucial subsumption check for predicates (line 17): if a reachability condition ϕ for a control state p is subsumed by Σ[p], then any search from ϕ is already covered, so adding ϕ to the next layer would be redundant. A subtlety is to avoid the possible quantifier alternation that would arise if we treat Σ[p] as the predicate ∃w (Σ[p]) (i.e., characterize the reachable set of registers independent of the inputs used to reach them). This could potentially introduce undecidability. However, the test in line 17 works because it is sufficient in the else case (when we omit ϕ). When the else case is taken, it means that

$\forall w, r\ (\varphi \Rightarrow \Sigma[p])$

holds, which implies that

$\forall r\ (\exists w\, \varphi \Rightarrow \exists w\, \Sigma[p]) \qquad (2)$

holds. Condition (2) is the necessary condition needed to preserve all register values.

7 2016/10/3

abstract class Transducer<I, O> {
    abstract IEnumerable<O> Update(I datum);
    virtual IEnumerable<O> Finish() { yield break; }
}

Figure 11. The C# abstract class users extend.
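The definition of k-reachability used in this section can be illustrated by brute-force enumeration on a toy transducer. The sketch below is written in Python rather than the paper's C#, and it is not the ISREACHABLE algorithm, which reasons symbolically with an SMT solver instead of enumerating words. The transducer `step`, its control states and its alphabet are invented for illustration only.

```python
from itertools import product

# Hypothetical toy transducer: a state is (control state q, register r).
# The register counts 'a' symbols modulo 3; 'b' is only enabled when r == 2.
ALPHABET = ["a", "b"]
INITIAL = ("q0", 0)

def step(symbol, state):
    """Return the successor state, or None if no move is enabled."""
    q, r = state
    if symbol == "a":
        return (q, (r + 1) % 3)
    if symbol == "b" and r == 2:          # guard depends on the register
        return ("q1" if q == "q0" else "q0", 0)
    return None

def run(word, state):
    """Analogue of Delta_A(w, s): fold step over the word, stop on rejection."""
    for symbol in word:
        state = step(symbol, state)
        if state is None:
            return None
    return state

def k_reachable(k):
    """All states reachable from INITIAL with input words of length 0..k."""
    states = set()
    for n in range(k + 1):
        for word in product(ALPHABET, repeat=n):
            s = run(word, INITIAL)
            if s is not None:
                states.add(s)
    return states

def is_reachable(q_tgt, pred, k):
    """(q_tgt, pred) is k-reachable if some k-reachable (q_tgt, r) satisfies pred."""
    return any(q == q_tgt and pred(r) for (q, r) in k_reachable(k))
```

In this toy example the pair (`"q1"`, r = 0) is not 2-reachable but is 3-reachable via the word `aab`, which mirrors the distinction between outcomes (c) and (a) of Theorem 4.1.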

5. Specifying Effectful Comprehensions

We have explored several frontends for specifying effectful comprehensions. In Section 5.1 we present a frontend that translates imperative C# code to STs. This pattern matches interfaces present in existing streaming frameworks, which we discuss in Section 8.

Some comprehensions can be more efficiently specified with a specialized frontend. In Section 5.2 we translate regexes with named captures into STs, while Section 5.3 presents a similar approach for XPath queries.

5.1 Effectful Comprehensions as C#

We have implemented a translation from a subset of C# to STs. Users extend the abstract class in Figure 11, where the Update and Finish methods respectively define δ and $. Users may opt not to override Finish, in which case a trivial no-op finalizer is used.
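The Update/Finish interface can be sketched outside of C# as well. The following Python analogue is only an illustration of the shape of the interface: the class `LineLengths` and the driver `run` are hypothetical and do not appear in the paper, and nothing here is translated to an ST.

```python
from abc import ABC, abstractmethod
from typing import Generic, Iterable, List, TypeVar

I = TypeVar("I")
O = TypeVar("O")

class Transducer(ABC, Generic[I, O]):
    """Python analogue of the abstract class: update plays the role of Update."""

    @abstractmethod
    def update(self, datum: I) -> Iterable[O]:
        ...

    def finish(self) -> Iterable[O]:
        return ()                      # trivial no-op finalizer by default

class LineLengths(Transducer[str, int]):
    """Hypothetical example: emits the length of each line of characters."""

    def __init__(self) -> None:
        self.count = 0                 # loop-carried state

    def update(self, datum: str) -> Iterable[int]:
        if datum == "\n":
            n, self.count = self.count, 0
            return [n]
        self.count += 1
        return []

    def finish(self) -> Iterable[int]:
        # Flush a last line that is not newline-terminated.
        return [self.count] if self.count else []

def run(t: Transducer, inputs: Iterable) -> List:
    """Feed every input to update, then append the finalizer's output."""
    out: List = []
    for x in inputs:
        out.extend(t.update(x))
    out.extend(t.finish())
    return out
```

For instance, `run(LineLengths(), "ab\ncde\nf")` yields one length per line, with the finalizer contributing the length of the unterminated last line.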

Example 5.1. The following code implements the ToInt transducer from Figure 3:

partial class ToInt : Transducer {
    bool IsDigit(char c) {
        return 0x30

integer in decimal notation and the substring in the fourth column is parsed as a Boolean:

(([^,]*,){2}(?<int>\d+),(?<bool>\w+),[^\n]*\n)*

Here S1 is "([^,]*,){2}" (skip to the third column), S2 is "," (skip to the next column), and S3 is ",[^\n]*\n" (skip remaining columns until EOL). The capture int is mapped to the transducer ToInt from Figure 3 and the capture bool is mapped to a transducer ToBool, which maps the strings "true" and "false" respectively to true and false. □
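The role of the named captures can be illustrated with an ordinary regex engine. The sketch below uses Python's `re` module, which spells named groups `(?P<name>...)` instead of .NET's `(?<name>...)`. It matches rows and converts the captures after the fact, so it is not the fused transducer construction; the helper `parse_rows` and the sample data are assumptions made for illustration.

```python
import re

# One CSV row: skip two columns (S1), capture an integer, skip a comma (S2),
# capture a Boolean, then skip the remaining columns up to end of line (S3).
ROW = re.compile(r"([^,]*,){2}(?P<int>\d+),(?P<bool>\w+),[^\n]*\n")

TO_BOOL = {"true": True, "false": False}   # stand-in for the ToBool transducer

def parse_rows(data: str):
    """Return (int, bool) pairs for every row matched by ROW."""
    return [
        (int(m.group("int")), TO_BOOL[m.group("bool")])
        for m in ROW.finditer(data)
    ]
```

With the hypothetical input `"a,b,12,true,x\nc,d,7,false,y\n"`, the third and fourth columns of each row are extracted and converted.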

5.3 Effectful XPath Comprehensions

For extracting information from XML formatted data we use transducers constructed from XPath⁹ query expressions. Consider an expression X of the form

st:trans(/tag1/tag2/tag3 ··· /tagn)

The tag names tag1, ..., tagn specify a path to match in an XML file. trans is a name that maps to a transducer A that maps the contents of any matching elements to output of type o. Given X and the transducer A, a fused transducer that parses matches of X into values of o is constructed. The matcher for the query uses counting with an integer register to ignore arbitrarily deep nestings of non-matching elements. Otherwise the algorithm is similar to the one for regular expressions in Section 5.2 (i.e., for steps 2 and 3).

Example 5.3. Consider the following XML:

PST893

88410

A transducer based on the following XPath expression will extract the populations in the dataset:

st:int(/cities/city/population)

int again maps to the ToInt transducer from Figure 3. □
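The counting idea behind the XPath matcher can be sketched with a streaming XML parser. The Python code below uses the standard `xml.etree.ElementTree.iterparse` API and an integer `skip` counter to ignore nested non-matching elements. It is an illustration under assumptions: the function `extract` and the sample document are hypothetical, and the paper's implementation generates a transducer over characters rather than consuming parser events.

```python
import io
import xml.etree.ElementTree as ET

def extract(xml_bytes: bytes, path):
    """Return int contents of elements matching the absolute path, e.g.
    ["cities", "city", "population"] for /cities/city/population."""
    matched = 0    # number of leading path components currently open
    skip = 0       # depth inside a non-matching subtree (the integer register)
    results = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            if skip > 0 or matched == len(path) or elem.tag != path[matched]:
                skip += 1              # entering (or deeper inside) a non-match
            else:
                matched += 1           # one more path component matched
        else:  # "end"
            if skip > 0:
                skip -= 1
            else:
                if matched == len(path):
                    results.append(int(elem.text))
                matched -= 1
    return results

# Hypothetical sample document; the nested <population> inside <info> is ignored.
SAMPLE = b"""<cities>
  <city><name>Springfield</name><population>30720</population></city>
  <city><info><population>1</population></info><population>88410</population></city>
</cities>"""
```

Calling `extract(SAMPLE, ["cities", "city", "population"])` collects only the populations that sit directly under a city element.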

6. Evaluation

We have implemented the techniques described above in a tool that translates C# (and our other frontends) into STs, fuses them, and finally generates efficient C# code. For each control state a labeled code block that implements the transition rule is generated. Given a rule, a tree of if-else statements is generated, where each leaf consists of an appropriate sequence of outputs, state updates, and finally a goto to the code block of the target control state.

We evaluate the viability of our approach with a set of benchmark pipelines. The experiments were run on an Intel Core i5-3570K CPU @ 3.4 GHz with 8 GB of RAM. All reported throughputs are means of a sufficient number of samples to obtain a confidence interval smaller than ±0.5 MB/s at a 95% confidence level. All pipelines were run through C#'s NGen tool, which produces native code for C# assemblies ahead of time.

9 See https://en.wikipedia.org/wiki/XPath.

Pipeline        LINQ    Hand-written    Fused
UTF8-lines      92.7    569.7           624.6
Base64-avg      16.5    80.1            44.6
CSV-max         67.6    37.4            176.5
Base64-delta    13      19.9            57.6

Figure 12. Throughputs (MB/s) for different pipeline versions

Figure 12 presents throughputs for three variations of each pipeline. For LINQ the pipelines communicate with IEnumerable and yield. The Hand-written pipelines are straightforward implementations using arrays as buffers between phases. The fused and optimized pipelines are labeled Fused. The individual pipeline stages in the LINQ and Fused pipelines use code generated from STs by our implementation, while the Hand-written pipelines use hand-written C# and .NET system libraries where available. For the Hand-written pipelines we did not perform any manual fusion, since the aim of this paper is to allow pipeline stages to be specified modularly with the fusion being handled by the compiler. Four pipelines were benchmarked:

Base64-avg calculates a running average (window of 10) for Base64¹⁰ encoded ints and re-encodes the results in Base64.

CSV-max decodes a UTF-8 encoded CSV file to UTF-16, extracts the third column with a regular expression and finds the maximum length of these strings. The output is a single UTF-8 encoded decimal formatted integer.

Base64-delta reads Base64 encoded ints and outputs deltas of successive inputs as UTF-8 encoded decimal integers on separate lines.

UTF8-lines decodes a UTF-8 encoded file to UTF-16 and counts the number of newline characters. The output is a single UTF-8 encoded decimal formatted integer.

For Figure 12 we sampled the pipelines with 100 MB of data. For the UTF8-lines pipeline we used Herman Melville's "Moby Dick" repeated a sufficient number of times, while for the others we used randomly generated data. For all pipelines except CSV-max the LINQ version has the lowest throughput. We believe this is due to the overhead associated with passing values through IEnumerable.

Figure 13 presents a more detailed comparison of CSV parsing scenarios. Pipelines for three different datasets are compared:

CHSI is a dataset on health indicators from the U.S. Department of Health & Human Services. The three pipelines produce the average lung cancer deaths, minimum births and maximum total deaths for counties in the dataset.

SBO is a dataset on business owners from the U.S. Census Bureau. The three pipelines find the maximum employees, minimum gross receipts and average payroll for businesses in the dataset.

CC is a dataset of consumer complaints received by the U.S. Consumer Financial Protection Bureau. The pipeline produces the maximum value for the ID column.

Each of the Fused pipelines in Figure 13 applies four effectful comprehensions: (i) decode UTF-8 to UTF-16, (ii) parse a column as an int using a regular expression based parser, (iii) run a query (maximum, minimum or average), and (iv) output the result as a sequence of bytes. The pipelines differ only in the regular expression and query used.

10 See https://en.wikipedia.org/wiki/Base64.


Pipeline         LINQ    Hand-written    Fused
CC-id            34.6    44.6            103.2
SBO-payroll      86.1    220             572.7
SBO-receipts     85.9    211.6           619.2
SBO-employees    86.8    234.1           594.1
CHSI-deaths      83.7    81.1            282.4
CHSI-births      84.2    80.8            275.1
CHSI-cancer      84.7    94.4            290.6

Figure 13. Throughputs (MB/s) for CSV parsing pipelines

Pipeline        XmlDocument     XPathReader    LINQ    Fused
TPC-DI-SQL      41.8            31.4           67.5    310.3
PIR-proteins    not reported    25.7           82.6    326
DBLP-oldest     41.6            28.2           84      459.7
MONDIAL-pop     61.6            58.7           83.8    305.6

Figure 14. Throughputs (MB/s) for XPath matching pipelines

Each version of the pipelines uses the same regular expression for parsing the CSV file. For example, the expression (([^,]*,){5}(?\d+),[^\n]*\n)* is used in the maximum employees pipeline for matching the sixth column on each line. In the Hand-written tests the .NET framework's RegexOptions.Compiled option was used, which generates a .NET assembly for doing the matching. This extra work is not counted against the reported throughputs. Another optimization we implemented for the Hand-written pipelines is that the regular expression is matched against the whole dataset and the captured values are then iterated. This proved to be significantly faster than splitting the dataset into lines and running the regular expression on each line separately.

The original SBO dataset is 744 MB, which caused the .NET regular expression library to run out of memory. To work around this we cut the dataset down to an 83 MB prefix. Our fused pipelines are free of such limitations due to their incremental nature.

The fused pipelines are significantly faster for all benchmarks, with the average speedup being over 2.9× over the Hand-written pipelines.
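The reported average can be checked with a few lines of arithmetic. The Python sketch below assumes the throughput values as read from the bar labels of Figure 13; the assignment of values to the Fused and Hand-written series is an assumption based on the order in which the labels appear, and the average is taken as the mean of per-pipeline ratios.

```python
# Throughputs in MB/s, as read from Figure 13 (series assignment is an assumption).
fused = {
    "CC-id": 103.2, "SBO-payroll": 572.7, "SBO-receipts": 619.2,
    "SBO-employees": 594.1, "CHSI-deaths": 282.4, "CHSI-births": 275.1,
    "CHSI-cancer": 290.6,
}
hand_written = {
    "CC-id": 44.6, "SBO-payroll": 220.0, "SBO-receipts": 211.6,
    "SBO-employees": 234.1, "CHSI-deaths": 81.1, "CHSI-births": 80.8,
    "CHSI-cancer": 94.4,
}

# Per-pipeline speedup of Fused over Hand-written, then the arithmetic mean.
speedups = {name: fused[name] / hand_written[name] for name in fused}
mean_speedup = sum(speedups.values()) / len(speedups)
```

Under these assumptions the mean of the seven ratios comes out slightly above 2.9, in line with the figure quoted in the text.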

Figure 14 presents throughputs for XML processing scenarios. Four pipelines are compared:

TPC-DI-SQL The dataset was generated by a tool from the TPC-DI benchmark [26]. The pipeline extracts ids of accounts from customer records and for each outputs an SQL insert statement.

Pipeline         Eliminated    Left
Base64-delta     0             77
CSV-max          6             65
Base64-avg       0             163
UTF8-lines       1             10
CC-id            1301          5274
CHSI-cancer      113           1134
CHSI-births      143           1434
CHSI-deaths      144           1444
SBO-employees    7             78
SBO-receipts     11            117
SBO-payroll      10            107
TPC-DI-SQL       238           936
PIR-proteins     198           758
DBLP-oldest      104           456
MONDIAL-pop      162           662

Figure 15. Branches eliminated by RBBE and branches left.

PIR-proteins The dataset is a protein dataset from the U.S. based National Biomedical Research Foundation. The pipeline extracts the lengths of all proteins in the dataset and outputs the average length.

DBLP-oldest The dataset is bibliographic information from the Digital Bibliography Library Project. The pipeline extracts the publication year of each article and outputs the earliest year.

MONDIAL-pop Mondial is a dataset extracted from various geographical Web data sources. The pipeline extracts the population of each city in the dataset and outputs the highest population.

All of the Fused pipelines in Figure 14 use an XPath based transducer for extracting the relevant data. The XmlDocument pipelines use the XPath matching implemented in C#'s standard libraries. The throughput for the XmlDocument version of the PIR-proteins pipeline is not reported due to the library running out of memory with the 700 MB dataset. The XPathReader pipelines use Microsoft's XPathReader library, which allows evaluating a subset of XPath in a streaming manner. Due to its streaming nature it is able to process the PIR-proteins dataset.

The Fused versions have the highest throughput on all of the XPath benchmarks, with an average speedup of 11× over the streaming XPathReader library. The fact that in the Fused pipelines the XPath matching code is specialized to the query is likely to give it a significant advantage over the XmlDocument and XPathReader versions, which do not perform any code generation. This also holds for the LINQ pipelines, which were second on all XML benchmarks. For queries over large XML datasets, using our approach instead of a general-purpose XPath library makes sense, as the speedup will make up for the compilation time.

Figure 15 presents the number of branches in rules removed by RBBE (Section 4) for each pipeline. The numbers are sums of removals after all fusions that contribute to the complete pipeline.

We can see that for most pipelines applying RBBE resulted in branches being removed. Thus RBBE is helpful for allowing bigger pipelines to be practically fused.

7. Symbolic Transducers and Monads

This section provides redefinitions of the ⊕ and ⊗ operators in terms of applicative functors and monads. In addition to being concise, these definitions provide a link between an automata-theoretic view and a functional view of symbolic transducers. This is to our knowledge the first time transductions have been successfully related to monads, which has been unsuccessfully attempted before. For more discussion and the exact connection to LINQ's list monad see Section 8.

Given types σ and τ we define TM_σ τ as the type σ → (τ × σ)?,¹¹ which as we will later show is a transduction monad. We use the higher-order applicative functor [21] operators pure and ⊛, defined as follows:

pure : τ → TM_σ τ
pure f  def=  λs. �(f, s)

⊛ : TM_σ (τ → τ′) → TM_σ τ → TM_σ τ′
F ⊛ X  def=  λs. let �(f, s1) = (F s) in
                 let �(x, s2) = (X s1) in
                 �((f x), s2)

11 ? is the option type. We write a wrapped value x as �x and no value as ⊥.

The intuition is that ⊛ captures side-effects of F in s1 and propagates them to X, which produces side-effects s2, while the output is (f x).

Now the step composition operator ⊕ can also be defined as

⊕ : TM_σ τ → TM_σ τ → TM_σ τ
f ⊕ g  def=  (pure +) ⊛ f ⊛ g

where + denotes list concatenation, although other operators, such as addition, maximum, and minimum, could be used. One reason why such operators are interesting is that they allow us to define aggregation operations without explicitly using state for accumulating the intermediate result. Regardless of the operator used as +, the purpose of ⊕ is to compose together output results while propagating the effects of the computations "from left to right" as loop-carried state.
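The applicative definitions can be sketched in executable form. The Python code below is an illustration under assumptions, not the paper's implementation: a value of the transduction type is modeled as a function from a state to either a `(value, state)` tuple or `None` (standing in for ⊥), the names `ap`, `oplus`, `concat` and `running_sum` are made up for the example, and the combining operator is passed in curried form.

```python
def pure(value):
    """Wrap a value without touching the state."""
    return lambda s: (value, s)

def ap(F, X):
    """Applicative application: run F, thread its state into X, apply f to x."""
    def run(s):
        r1 = F(s)
        if r1 is None:                 # rejection propagates
            return None
        f, s1 = r1
        r2 = X(s1)
        if r2 is None:
            return None
        x, s2 = r2
        return (f(x), s2)
    return run

def concat(a):
    """Curried list concatenation, playing the role of + in the definition."""
    return lambda b: a + b

def oplus(f, g, op=concat):
    """Step composition: (pure op) applied to f and then g, left to right."""
    return ap(ap(pure(op), f), g)

def running_sum(x):
    """Hypothetical step: add x to the register and output the new total."""
    return lambda acc: ([acc + x], acc + x)

def reject(s):
    """A step with no enabled move."""
    return None
```

Starting from state 0, `oplus(running_sum(1), running_sum(2))` outputs the concatenated list of running totals and leaves the final total in the state, while composing with `reject` yields no value at all.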

The type of ⊕ in this definition (assuming + is concatenation) is

(σ → ([o]× σ)?)→ (σ → ([o]× σ)?)→ (σ → ([o]× σ)?)

Note that the original definition in Section 2 did not wrap the transition functions inside the option type and instead defined them as partial functions. For the functional view we make the functions total by representing rejection with ⊥.

The bind operator for TM_σ τ is

>>= : TM_σ τ1 → (τ1 → TM_σ τ2) → TM_σ τ2
F >>= G  def=  λs. let �(a, s′) = (F s) in (G a s′)

We may now view TM_σ τ as a transduction monad with the given bind operator and whose unit operator is pure. It follows from the definitions that the monad laws hold. One can view this monad as a combination of the state monad and the option monad.
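The monad laws can be spot-checked on concrete values. The Python sketch below models a monadic value as a function from a state to a `(value, state)` tuple or `None` for rejection; this encoding and the sample computations `tick`, `G` and `H` are assumptions chosen for illustration. Checking a handful of states is a sanity test, not a proof.

```python
def pure(value):
    """Unit: return the value and leave the state unchanged."""
    return lambda s: (value, s)

def bind(F, G):
    """Run F, then feed its value and resulting state to G."""
    def run(s):
        r = F(s)
        if r is None:                  # rejection short-circuits
            return None
        a, s1 = r
        return G(a)(s1)
    return run

# Hypothetical sample computations over an integer state.
tick = lambda s: (s, s + 1)                             # output state, then increment
G = lambda a: (lambda s: (a * 2, s + a))                # double value, add to state
H = lambda b: (lambda s: None if b > 10 else (b + 1, s))  # rejects large values

def check_laws(states=range(8)):
    """Check left identity, right identity and associativity pointwise."""
    for s in states:
        if bind(pure(3), G)(s) != G(3)(s):
            return False
        if bind(tick, pure)(s) != tick(s):
            return False
        lhs = bind(bind(tick, G), H)(s)
        rhs = bind(tick, lambda a: bind(G(a), H))(s)
        if lhs != rhs:
            return False
    return True
```

The rejecting branch of `H` is exercised for larger initial states, so the check covers both the state-passing and the option aspects of the combined monad.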

The fusion composition operator ⊗ (Section 3) can be defined using the bind operator. First let there be:

fuse : ([ι] → TM_{σ_A} [τ]) → ([τ] → TM_{σ_B} [o]) → ([ι] → TM_{σ_A × σ_B} [o])
fuse A B  def=  λx̄. (A′ x̄) >>= B′   where
    A′ ā  def=  λ(s1, s2). let �(b̄, s1′) = (A ā s1) in �(b̄, (s1′, s2))
    B′ b̄  def=  λ(s1′, s2). let �(c̄, s2′) = (B b̄ s2) in �(c̄, (s1′, s2′))

Note how in fuse the ST A uses its own state that is disjoint from the state of B, and the function builds the disjoint sum of the states. Further, notice that the output b̄ of A may depend on the state s1, so the state s2′ may, through b̄, depend on s1, whereas s1′ does not depend on s2. The latter property is integral to the fusion algorithm in Section 3. Now ⊗ can be defined as:

(A, s^0_A) ⊗ (B, s^0_B)  def=  (fuse A B, (s^0_A, s^0_B))

Note that here we represent an ST A as a pair of a function of type ([ι_A] → TM_{σ_A} [o_A]) and the initial state of A. To run transducers represented like this the following can be used:

runST : ([ι] → TM_σ [o]) × σ → [ι] → [o]
runST (A, s)  def=  λx̄. let �(ȳ, s′) = (A x̄ s) in ȳ

Effectively, given an ST A, (runST (A, s^0_A)) is its denotation [[A]].

In functional languages the state monad is typically implemented using lazy evaluation and fuse could in principle be implemented similarly. In contrast to these languages, wherein infeasible paths are never explored by virtue of lazy evaluation, the fusion algorithm in Section 3 implements a statically optimized binding operator for the transduction monad which statically prunes infeasible paths. We believe similar static fusion techniques could also be applied to code written using the state monad.
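The functional reading of fuse and runST can be sketched directly. The Python code below is a dynamic illustration only: it performs none of the static pruning of the fusion algorithm in Section 3. A list transducer is modeled as a function from an input list to a state-passing function returning `(outputs, state)` or `None`; the helper `lift` and the steps `delta_step` and `sum_step` are hypothetical.

```python
def fuse(A, B):
    """Compose list transducers A and B over the paired state (s1, s2).
    This is the definition via bind, with the bind unfolded inline."""
    def fused(xs):
        def run(state):
            s1, s2 = state
            r1 = A(xs)(s1)             # A only sees its own state
            if r1 is None:
                return None
            bs, s1_new = r1
            r2 = B(bs)(s2)             # B consumes A's output with its own state
            if r2 is None:
                return None
            cs, s2_new = r2
            return (cs, (s1_new, s2_new))
        return run
    return fused

def run_st(A, s0):
    """Run a transducer from its initial state and keep only the output."""
    def go(xs):
        r = A(xs)(s0)
        return None if r is None else r[0]
    return go

def lift(step):
    """Turn a per-element step into a list transducer by threading the state."""
    def A(xs):
        def run(s):
            out = []
            for x in xs:
                r = step(x)(s)
                if r is None:
                    return None
                ys, s = r
                out.extend(ys)
            return (out, s)
        return run
    return A

# Hypothetical steps: successive differences, and running sums.
delta_step = lambda x: (lambda prev: ([x - prev], x))
sum_step = lambda x: (lambda acc: ([acc + x], acc + x))

deltas_then_sums = fuse(lift(delta_step), lift(sum_step))
```

Since running sums undo successive differences, `run_st(deltas_then_sums, (0, 0))` returns its input list unchanged, and the paired final state records the last input seen by the first stage and the total accumulated by the second.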

8. Related Work

Symbolic transducers: were originally defined in flat form in [33]. The main focus of the work in [33] is on symbolic finite transducers or SFTs, for analysis of string sanitizers. It is noted in [33] that STs are closed under composition, but, to the best of our knowledge, no algorithm for fusing STs has been studied prior to our work. Prior work on STs has focused on register exploration and input grouping, which are orthogonal problems [11, 34]. Register exploration attempts to project the register type ρ into a Cartesian product type ρ1 × ρ2 where ρ1 is a finite type; the primary goal is to reduce register dependency by migrating ρ1 into the set of control states. Input grouping tries to take advantage of grouping characters into larger tokens in order to avoid intermediate register usage, which has applications in decoder analysis [11] and parallelization [34]. Efficient fusion of STs has, to the best of our knowledge, not been studied prior to our work.

Streaming: There is a large body of work on stream processing [14, 20, 23, 24, 30]. There is also recent work on a domain specific language DReX [8] for expressing regular string transformations. Stream computations with internal state have been studied before. The work in [10] defines a Stream data-type with internal state that yields elements and allows operations such as map, fold, and zip. These operations are functional and operate on one element at a time with no operation-state carried across elements. The state in the Stream allows one to represent the current position, and bundling in the case of generalized stream fusion [19], in the stream. In contrast, our focus is on applying transformations that have operation-state carried across elements (as opposed to streams having state). This allows us to represent effectful functions such as UTF decoding/encoding.

Some libraries for streams provide APIs for expressing stateful operations. The Apache Flink [7] and Spark Streaming [5] distributed streaming engines both provide support for using state in stream operations and an associated framework for implementing fault tolerance in the presence of state. Highland.js [3] and Conduit [1] are traditional stream libraries, which both provide a way to express stateful operations. However, in these libraries the stateful operations are treated as black boxes, as opposed to our approach that fuses operations in compositions of STs. Implementing frontends similar to the C# one (Section 5.1) for these libraries would allow code written for them to use our backend.

StreamIt [31] is a programming language and compiler for signal processing applications. StreamIt composes pipelines of stateless filters with the aim of reducing communication overhead. In [6] composition is extended to filters with a linear state space representation, i.e., ones where the outputs and state updates are linear operations. The composition retains the linear state space representation with a linear increase in size.

In contrast to StreamIt, we can compose any stateful filters where the state update is over a decidable theory, and instead of linear algebra we use SMT solvers for our analysis. We view the work done by the StreamIt group as complementary to ours: the composition and optimization techniques for symbolic transducers could be used as an additional backend module in the StreamIt compiler for stateful filters which are not amenable to a linear state space representation.

Monads: have had a huge impact on programming paradigms and techniques in general after they were introduced into the functional programming world by Wadler [36]. One of the core contributions of monads is that they provide a type discipline by which one can enforce a separation of computational concerns in a clean functional style. A prime example is the state monad [37]. Another very useful monad is the maybe monad [36]. Our transduction monad type TM_σ τ is more or less the type for the maybe state monad parameterized with the state type σ and the output type τ, and extended with extra composition operations for step and fusion composition. The fusion composition operator ⊗ is based on the monad binding operator >>= but is itself not a binding operator because it uses different monad state types. The "maybe" part in the transduction monad reflects the fact that (deterministic) transducers are typically partial functions and their composition (which corresponds exactly to fusion composition here) is often treated as a special case of relational composition.

LINQ [22] uses the list monad (or list comprehension [36]) as its primary construct for query processing and (unlike SQL) also supports nested lists. The list comprehension construct is expressed in LINQ with the Select or, more generally, SelectMany extension method of the IEnumerable class. The exact relation to the transduction monad is that the list comprehension in LINQ corresponds to iterating the step composition operator ⊕ (Section 2) over the input list. Step composition handles loop-carried state. The LINQ query

"Man".SelectMany(A.Update)

corresponds to the following transduction or effectful comprehension, provided that we apply it to the initial state of A:

(δ̂A ‘M’)⊕ (δ̂A ‘a’)⊕ (δ̂A ‘n’)

The state of the computation (δ̂A ‘M’) is threaded through into the computation (δ̂A ‘a’), etc. For example, if we take A to be the Base64 encoder, and we start from the initial state (at the point when no characters have been read so far) then the output would be the string "TWFu". This is consistent with the existing semantics of LINQ.
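The "Man" example can be reproduced with a small stateful encoder. The Python sketch below is an assumption-laden illustration: the class `Base64Encoder` and the helper `select_many` are hypothetical stand-ins for the transducer A and for LINQ's SelectMany, and the encoder follows the standard Base64 alphabet with `=` padding.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

class Base64Encoder:
    """Stateful encoder: buffers bytes and emits four characters per three bytes."""

    def __init__(self):
        self.pending = []              # 0 to 2 buffered bytes (loop-carried state)

    def update(self, byte):
        self.pending.append(byte)
        if len(self.pending) < 3:
            return ""                  # not enough input yet, no output
        b0, b1, b2 = self.pending
        self.pending = []
        n = (b0 << 16) | (b1 << 8) | b2
        return "".join(ALPHABET[(n >> shift) & 63] for shift in (18, 12, 6, 0))

    def finish(self):
        if not self.pending:
            return ""
        count = len(self.pending)
        b0, b1, b2 = self.pending + [0] * (3 - count)
        n = (b0 << 16) | (b1 << 8) | b2
        chars = [ALPHABET[(n >> shift) & 63] for shift in (18, 12, 6, 0)]
        keep = count + 1               # 1 byte -> 2 chars, 2 bytes -> 3 chars
        return "".join(chars[:keep]) + "=" * (4 - keep)

def select_many(items, f):
    """Analogue of SelectMany: concatenate the outputs of f over all items."""
    for item in items:
        yield from f(item)

def encode(data: bytes) -> str:
    enc = Base64Encoder()
    body = "".join(select_many(data, enc.update))
    return body + enc.finish()
```

Here `encode(b"Man")` threads the encoder state through the three update calls and produces the four characters only on the third one, matching the output quoted above.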

In Figure 4 in Section 1 the finalizer for ToInt can be implemented as a separate piece of code after the state has been aggregated. However, for transducers whose Update function produces output the following pattern would be natural: SelectMany(i => Update()).Concat(Finalize()), where Finalize returns an IEnumerable. This pattern is semantically correct, but relies on the fact that Concat evaluates its parameter lazily. With eager evaluation Finalize would access state before Update had been called for all inputs. We feel this reliance on subtle semantics makes LINQ a poor match for writing effectful comprehensions. This is another concern we address with our C# frontend.
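The dependence on lazy evaluation can be demonstrated with generators. The Python sketch below is an analogy rather than LINQ itself: `itertools.chain` stands in for Concat, generator functions stand in for lazily evaluated IEnumerable results, and the class `CountingEcho` is a hypothetical transducer whose finalizer reads the accumulated state.

```python
from itertools import chain

class CountingEcho:
    """Hypothetical transducer: echoes items in upper case and counts them."""

    def __init__(self):
        self.count = 0

    def update(self, item):
        self.count += 1
        yield item.upper()

    def finish(self):
        yield self.count               # state is read only when this is iterated

def select_many(items, f):
    for item in items:
        yield from f(item)

def lazy_pipeline(items):
    t = CountingEcho()
    # t.finish() only creates a generator here; its body runs after all
    # update outputs have been consumed by chain.
    return list(chain(select_many(items, t.update), t.finish()))

def eager_pipeline(items):
    t = CountingEcho()
    tail = list(t.finish())            # evaluated eagerly, before any update ran
    return list(chain(select_many(items, t.update), tail))
```

For the input `"abc"` the lazy variant ends with the count 3, whereas the eager variant ends with 0 because the finalizer observed the state before any update had been applied.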

Fusion: For fusion of symbolic transducers there is related work on filter fusion [27] and deforestation [35]. Fusion of symbolic transducers can be viewed as an extended form of filter fusion that incorporates loop-carried state and advanced constraint satisfaction techniques into the classical framework.

The Steno library in [25] implements deforestation for LINQ queries and achieves speedups from removing the IEnumerable abstraction similar to what we report in Section 6. In contrast with our work, Steno treats filters as black boxes, although the deforestation can expose some optimization opportunities to the compiler. Additionally, some of Steno's optimizations assume that filters are stateless.

Filter fusion has also been extended to network fusion [15], which uses the product of labeled transition systems to merge a network of interconnecting components. Synchronous product of automata and fusion of symbolic transducers have different semantics and computational complexities.

The work in [29] is related to our work regarding motivation. The difference is in the execution: we use an automata-based definition of transducers with an explicit control flow graph and use an SMT solver as an oracle in our algorithms. This leads to a different set of algorithms and opens up a different set of optimization techniques. We build on some of the work in [33] by extending it with an incremental fusion algorithm and reachability analysis. The authors of [29] were not able to relate their work to monads but use the SML type system in general. In our case the definition of the step composition operator ⊕ uses applicative functors or idioms [18, 21]; it does not require full monad functionality.

Regex: Our construction of symbolic transducers from regexes is related to the work in [28]. On one hand our algorithm only handles a special class of regexes, but on the other hand it supports full Unicode by using the .NET regex parser and represents guards by predicates over 16-bit bit-vectors (i.e., the char type). Regexes are very handy for capturing custom patterns, for example for some specific CSV file or some specific alphabet (such as the emoticon alphabet¹²). This is reminiscent of handling hierarchical data, such as XML, but with more relaxed rules, e.g., a line in a custom CSV file may (or may not) end with a comma.

To handle XML data we use transducers generated from a subset of the XPath query language. For a full automata-theoretic treatment of XPath see [9], where an approach for evaluating and reasoning about XPath expressions (extended with regular expressions) based on two-way weak alternating tree automata is presented.

List comprehensions have also been extended with ORDER BY and GROUP BY constructs [17] that are also supported in LINQ. It is an ongoing research topic for us to investigate whether symbolic transducers can be extended similarly and, if so, to understand what the potential payoffs are.

9. Conclusion

Good abstractions let a programmer easily express their intent as a program and at the same time let a runtime system compile that program for efficient execution. This paper puts forth effectful comprehensions as an abstraction for expressing possibly-stateful data-processing pipelines. We present fusion and branch elimination algorithms for these effectful comprehensions, which allow us to compile large pipelines into efficient code.

We use symbolic transducers to represent individual and fused stages in a data-processing pipeline, which we additionally formalize with transduction monads. The monadic view provides very concise semantics for transductions and their compositions. On the other hand, our fusion and branch elimination algorithms use an automata-theoretic view, which allows them to exploit the separation of control state from other state.

We have built a compiler that ingests pipelines written in C# and produces fused code that runs, on average, 3× faster than a hand-written baseline and 5× faster than LINQ on a variety of data processing programs. In the future we will explore more extensive optimizations that rely on background theory reasoning to prove program properties. One such optimization we excluded from this paper due to space constraints exploits minimization of symbolic finite automata to simplify control flow.

In the future we intend to explore hierarchical compositions, i.e., parts of an effectful comprehension being specified in terms of another. In Sections 5.2 and 5.3 we use a specific pattern of hierarchical composition for which fusion is straightforward. We aim to expand this work to allow hierarchical compositions in our general C# frontend (Section 5.1).

12 See http://unicode.org/charts/PDF/U1F600.pdf


References

[1] Conduit (Haskell library). https://github.com/snoyberg/conduit.

[2] Apache Hadoop. http://hadoop.apache.org/.

[3] Highland.js. http://highlandjs.org/.

[4] The .NET compiler platform "Roslyn". https://github.com/dotnet/roslyn.

[5] Spark Streaming. http://spark.apache.org/streaming/.

[6] S. Agrawal, W. Thies, and S. Amarasinghe. Optimizing stream programs using linear state space analysis. In Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES'05), pages 126–136. ACM, 2005. doi:10.1145/1086297.1086315.

[7] A. Alexandrov, R. Bergmann, S. Ewen, J.-C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, F. Naumann, M. Peters, A. Rheinländer, M. J. Sax, S. Schelter, M. Höger, K. Tzoumas, and D. Warneke. The Stratosphere platform for big data analytics. The VLDB Journal, 23(6):939–964, Dec. 2014. doi:10.1007/s00778-014-0357-y.

[8] R. Alur, L. D'Antoni, and M. Raghothaman. DReX: A declarative language for efficiently evaluating regular string transformations. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'15), pages 125–137. ACM, 2015. doi:10.1145/2676726.2676981.

[9] D. Calvanese, G. Giacomo, M. Lenzerini, and M. Y. Vardi. An automata-theoretic approach to regular XPath. In Proceedings of the 12th International Symposium on Database Programming Languages (DBPL'09), volume 5708 of LNCS, pages 18–35. Springer, 2009. doi:10.1007/978-3-642-03793-1_2.

[10] D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: From lists to streams to nothing at all. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP'07), pages 315–326. ACM, 2007. doi:10.1145/1291151.1291199.

[11] L. D'antoni and M. Veanes. Extended symbolic finite automata and transducers. Formal Methods in System Design, 47(1):93–119, Aug. 2015. doi:10.1007/s10703-015-0233-4.

[12] L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'08), volume 4963 of LNCS, pages 337–340. Springer, 2008. doi:10.1007/978-3-540-78800-3_24.

[13] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, Jan. 2008. doi:10.1145/1327452.1327492.

[14] D. Debarbieux, O. Gauwin, J. Niehren, T. Sebastian, and M. Zergaoui. Early nested word automata for XPath query answering on XML streams. Theoretical Computer Science, 578:100–125, May 2015. doi:10.1016/j.tcs.2015.01.017.

[15] P. Fradet and S. H. T. Ha. Network fusion. In Proceedings of Programming Languages and Systems: Second Asian Symposium (APLAS'04), volume 3302 of LNCS, pages 21–40. Springer, 2004. doi:10.1007/978-3-540-30477-7_3.

[16] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979. ISBN 0321455363.

[17] S. P. Jones and P. Wadler. Comprehensive comprehensions. In Proceedings of the ACM SIGPLAN Workshop on Haskell (Haskell'07), pages 61–72. ACM, 2007. doi:10.1145/1291201.1291209.

[18] S. Lindley, P. Wadler, and J. Yallop. Idioms are oblivious, arrows are meticulous, monads are promiscuous. In Proceedings of the Second Workshop on Mathematically Structured Functional Programming (MSFP'08), volume 229 of ENTCS, pages 97–117. Elsevier, 2011. doi:10.1016/j.entcs.2011.02.018.

[19] G. Mainland, R. Leshchinskiy, and S. Peyton Jones. Exploiting vector instructions with generalized stream fusion. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming (ICFP'13), pages 37–48. ACM, 2013. doi:10.1145/2500365.2500601.

[20] A. Maletti, J. Graehl, M. Hopkins, and K. Knight. The power of extended top-down tree transducers. SIAM J. Comput., 39(2):410–430, June 2009. doi:10.1137/070699160.

[21] C. Mcbride and R. Paterson. Applicative programming with effects. Journal of Functional Programming, 18(1):1–13, Jan. 2008. doi:10.1017/S0956796807006326.

[22] E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling object, relations and XML in the .NET framework. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD'06), pages 706–706. ACM, 2006. doi:10.1145/1142473.1142552.

[23] T. Milo, D. Suciu, and V. Vianu. Typechecking for XML transformers. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'00), pages 11–22. ACM, 2000. doi:10.1145/335168.335171.

[24] B. Mozafari, K. Zeng, L. D'antoni, and C. Zaniolo. High-performance complex event processing over hierarchical data. ACM Trans. Database Syst., 38(4):21:1–21:39, Dec. 2013. doi:10.1145/2536779.

[25] D. G. Murray, M. Isard, and Y. Yu. Steno: Automatic optimization of declarative queries. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'11), pages 121–131. ACM, 2011. doi:10.1145/1993498.1993513.

[26] M. Poess, T. Rabl, H.-A. Jacobsen, and B. Caufield. TPC-DI: The first industry benchmark for data integration. Proceedings of the VLDB Endowment, 7(13):1367–1378, Aug. 2014. doi:10.14778/2733004.2733009.

[27] T. A. Proebsting and S. A. Watterson. Filter fusion. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'96), pages 119–130. ACM, 1996. doi:10.1145/237721.237760.

[28] Y. Sakuma, Y. Minamide, and A. Voronkov. Translating regular expression matching into transducers. Journal of Applied Logic, 10(1):32–51, Mar. 2012. doi:10.1016/j.jal.2011.11.003.

[29] O. Shivers and M. Might. Continuations and transducer composition. In Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'06), pages 295–307. ACM, 2006. doi:10.1145/1133981.1134016.

[30] J. H. Spring, J. Privat, R. Guerraoui, and J. Vitek. StreamFlex: High-throughput stream programming in Java. In Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications (OOPSLA'07), pages 211–228. ACM, 2007. doi:10.1145/1297027.1297043.

[31] W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: A language for streaming applications. In Proceedings of the 11th International Conference on Compiler Construction (CC'02), volume 2304 of LNCS, pages 179–196. Springer, 2002. doi:10.1007/3-540-45937-5_14.

[32] M. Veanes, P. d. Halleux, and N. Tillmann. Rex: Symbolic reg-ular expression explorer. In Proceedings of the 2010 Third In-ternational Conference on Software Testing, Verification and Vali-dation (ICST’10), pages 498–507. IEEE Computer Society, 2010.doi:10.1109/ICST.2010.15.

[33] M. Veanes, P. Hooimeijer, B. Livshits, D. Molnar, and N. Bjorner.Symbolic finite state transducers: Algorithms and applications. InProceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposiumon Principles of Programming Languages (POPL’12), pages 137–150.ACM, 2012. doi:10.1145/2103656.2103674.

[34] M. Veanes, T. Mytkowicz, D. Molnar, and B. Livshits. Data-parallel string-manipulating programs. In Proceedings of the 42ndAnnual ACM SIGPLAN-SIGACT Symposium on Principles of Pro-gramming Languages (POPL’15), pages 139–152. ACM, 2015.doi:10.1145/2676726.2677014.

13 2016/10/3

https://github.com/snoyberg/conduithttp://hadoop.apache.org/http://highlandjs.org/https://github.com/dotnet/roslynhttp://spark.apache.org/streaming/http://dx.doi.org/10.1145/1086297.1086315http://dx.doi.org/10.1007/s00778-014-0357-yhttp://dx.doi.org/10.1007/s00778-014-0357-yhttp://dx.doi.org/10.1145/2676726.2676981http://dx.doi.org/10.1007/978-3-642-03793-1_2http://dx.doi.org/10.1145/1291151.1291199http://dx.doi.org/10.1007/s10703-015-0233-4http://dx.doi.org/10.1007/978-3-540-78800-3_24http://dx.doi.org/10.1007/978-3-540-78800-3_24http://dx.doi.org/10.1145/1327452.1327492http://dx.doi.org/10.1016/j.tcs.2015.01.017http://dx.doi.org/10.1007/978-3-540-30477-7_3http://dx.doi.org/10.1145/1291201.1291209http://dx.doi.org/10.1016/j.entcs.2011.02.018http://dx.doi.org/10.1145/2500365.2500601http://dx.doi.org/10.1137/070699160http://dx.doi.org/10.1017/S0956796807006326http://dx.doi.org/10.1145/1142473.1142552http://dx.doi.org/10.1145/335168.335171http://dx.doi.org/10.1145/2536779http://dx.doi.org/10.1145/1993498.1993513http://dx.doi.org/10.14778/2733004.2733009http://dx.doi.org/10.1145/237721.237760http://dx.doi.org/10.1016/j.jal.2011.11.003http://dx.doi.org/10.1145/1133981.1134016http://dx.doi.org/10.1145/1297027.1297043http://dx.doi.org/10.1007/3-540-45937-5_14http://dx.doi.org/10.1109/ICST.2010.15http://dx.doi.org/10.1145/2103656.2103674http://dx.doi.org/10.1145/2676726.2677014

[35] P. Wadler. Deforestation: Transforming programs to eliminatetrees. Theoretical Computer Science, 73(2):231–248, Jan. 1988.doi:10.1016/0304-3975(90)90147-A.

[36] P. Wadler. Comprehending monads. In Proceedings of the 1990 ACMConference on LISP and Functional Programming (LFP’90), pages61–78. ACM, 1990. doi:10.1145/91556.91592.

[37] P. Wadler. Monads for functional programming. In Advanced Func-tional Programming: First International Spring School on AdvancedFunctional Programming Techniques, Tutorial Text, volume 925 ofLNCS, pages 24–52. Springer, 1995. doi:10.1007/3-540-59451-5 2.

[38] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica.Spark: Cluster computing with working sets. In Proceedings of the 2ndUSENIX Conference on Hot Topics in Cloud Computing (HotCloud’10),pages 10–10. USENIX Association, 2010.
