

IEICE TRANS. INF. & SYST., VOL.E104–D, NO.6 JUNE 2021 801

PAPER

On CSS Unsatisfiability Problem in the Presense of DTDs∗∗

Nobutaka SUZUKI†a), Member, Takuya OKADA††, and Yeondae KWON†††∗, Nonmembers

SUMMARY Cascading Style Sheets (CSS) is a popular language for describing the styles of XML documents as well as HTML documents. To resolve conflicts among CSS rules, CSS has a mechanism called specificity. For a DTD D and a CSS code R, due to specificity R may contain “unsatisfiable” rules under D, i.e., rules that are not applied to any element of any document valid for D. In this paper, we consider the problem of detecting unsatisfiable CSS rules under DTDs. We focus on CSS fragments in which descendant, child, adjacent sibling, and general sibling combinators are allowed. We show that the problem is coNP-hard in most cases, even if only one of the four combinators is allowed and under very restricted DTDs. We also show that the problem is in coNP or PSPACE depending on restrictions on DTDs and CSS. Finally, we present four conditions under which the problem can be solved in polynomial time.

key words: CSS, DTD, satisfiability

1. Introduction

Cascading Style Sheets (CSS) is a popular language for describing the styles of (X)HTML documents. CSS is also widely used as a stylesheet language for XML documents, e.g., DocBook [21] and MathML [2]. A CSS code is described by listing CSS rules that assign property values to elements. For example, the following CSS rule assigns the property value serif to the font-family property of li elements that are descendants of a ul element.

ul li {font-family:serif}

It is often the case that a CSS code contains conflicting CSS rules, i.e., CSS rules with the same property that match the same HTML/XML element. To resolve such conflicts, the W3C CSS specification∗∗∗ provides a mechanism called specificity. In short, a CSS rule with a more specific selector has higher priority; if multiple CSS rules match the same XML/HTML element, then only the rule with the largest specificity is applied to the element.

In this paper, we consider the CSS unsatisfiability problem, which is to detect unsatisfiable CSS rules under DTDs.

Manuscript received January 4, 2021.

Manuscript publicized March 10, 2021.

†The author is with University of Tsukuba, Tsukuba-shi, 305–8550 Japan.

††The author is with Nihon Unisys, Ltd, Tokyo, 135–8560 Japan.

†††The author is with the University of Tokyo, Tokyo, 105–0003 Japan.

∗Presently, with Research Center for Agricultural Information Technology, National Agriculture and Food Research Organization.

∗∗The previous version of this paper is [20].

a) E-mail: [email protected]

DOI: 10.1587/transinf.2021EDP7002

We say that a CSS rule r is unsatisfiable under a DTD D if r is not applied to any element of any document valid for D, even if the selector of r is “valid” for D. For example, consider the DTD and the CSS rules in Fig. 1. The second CSS rule “c {font-family:sans-serif}” is unsatisfiable. To see this, consider the first and second CSS rules. The former matches a c element that has an a element as an ancestor, while the latter matches any c element. Since the first CSS rule has more labels (“a” and “c”) than the second rule (“c”), the first CSS rule has a higher specificity. Moreover, under the DTD every c element has an a element as an ancestor. Hence the first CSS rule takes priority over the second on every c element, meaning that the second CSS rule is unsatisfiable. There is another unsatisfiable CSS rule: among the third, fourth, and fifth CSS rules, which have the same specificity and match an f element, the third CSS rule “a f {font-family:serif}” is unsatisfiable for the following reasons.

• In the fourth CSS rule, ‘>’ denotes the child selector, meaning that the rule matches an f element having a b element as its parent. Thus such an f element can be matched by the fourth CSS rule as well as the third CSS rule. But only the fourth CSS rule is applied to such an f element, since among rules of equal specificity the later rule takes priority according to the CSS specification.

• In the fifth CSS rule, ‘+’ denotes the adjacent sibling selector, meaning that the rule matches an f element having a c element as its immediate left sibling. For the same reason as above, the fifth CSS rule, not the third CSS rule, is applied to such an f element.

• Under the DTD, every f element has a b element as its parent or a c element as its immediate left sibling.

Fig. 1 Example of unsatisfiable CSS rules under DTD

∗∗∗https://www.w3.org/TR/CSS22/

Copyright © 2021 The Institute of Electronics, Information and Communication Engineers


Table 1 Summary of our results

| CSS combinators | Restriction | Disjunction-free, closure-free, duplicate-free, and non-recursive DTD | Disjunction-free and closure-free DTD | Disjunction-free and non-recursive DTD | Closure-free, duplicate-free, and non-recursive DTD (disjunction allowed) | Any DTD (disjunction allowed) |
|---|---|---|---|---|---|---|
| ‘ ’ | - | coNP-complete (Thm. 5, 9) | coNP-hard and in PSPACE (Thm. 5, 11) | coNP-complete (Thm. 5, 9) | coNP-complete (Thm. 1, 9) | coNP-hard and in PSPACE (Thm. 1, 11) |
| ‘>’ | - | coNP-complete (Thm. 5, 9) | coNP-complete (Thm. 5, 10) | coNP-complete (Thm. 5, 10) | coNP-complete (Thm. 2, 9) | coNP-complete (Thm. 2, 10) |
| ‘˜’ | - | PTIME (Thm. 6) | PTIME (Thm. 6) | coNP-hard and in PSPACE (Thm. 7, 11) | coNP-hard and in PSPACE (Thm. 3, 11) | coNP-hard and in PSPACE (Thm. 3, 11) |
| ‘+’ | - | PTIME (Thm. 6) | PTIME (Thm. 6) | coNP-hard and in PSPACE (Thm. 8, 11) | coNP-hard and in PSPACE (Thm. 4, 11) | coNP-hard and in PSPACE (Thm. 4, 11) |
| ‘ ’, ‘>’ | - | coNP-complete (Thm. 5, 9) | coNP-hard and in PSPACE (Thm. 5, 11) | coNP-complete (Thm. 5, 9) | coNP-complete (Thm. 5, 9) | coNP-hard and in PSPACE (Thm. 5, 11) |
| ‘ ’, ‘>’, ‘˜’, ‘+’ | - | coNP-hard and in PSPACE (Thm. 5, 11) | coNP-hard and in PSPACE (Thm. 5, 11) | coNP-hard and in PSPACE (Thm. 5, 7, 8, 11) | coNP-hard and in PSPACE (Thm. 1–5, 11) | coNP-hard and in PSPACE (Thm. 1–4, 11) |

| CSS combinators | Restriction | Complexity |
|---|---|---|
| ‘>’ | R1 | PTIME (Thm. 12) |
| ‘+’ | R2 | PTIME (Thm. 13) |
| ‘ ’, ‘>’, ‘˜’, ‘+’ | R3 | PTIME (Thm. 14) |
| ‘ ’, ‘>’, ‘˜’, ‘+’ | R4 | PTIME (Thm. 16) |

As stated above, the fourth CSS rule is applied to the former and the fifth CSS rule is applied to the latter, meaning that there is no f element to which the third CSS rule is applied.

Unsatisfiable CSS rules are clearly redundant, and thus reduce the readability of CSS codes, make the maintenance of CSS codes more difficult, and may slow the rendering speed of HTML/XML documents in browsers [14]. Therefore, unsatisfiable CSS rules should be detected and removed from CSS codes.

Although this problem seems to be easily solvable due to the simplicity of CSS rules, we show that the problem is intractable in most cases. We focus on simple CSS fragments in which selectors using whitespace ‘ ’ (descendant), ‘>’ (child), ‘˜’ (general sibling), and ‘+’ (adjacent sibling) as combinators are allowed (attribute selectors are omitted since attribute values are hard to predict from DTDs). The obtained results are summarized in Table 1. First, the problem is coNP-hard even if only one of the four combinators is allowed. Furthermore, the problem is still coNP-hard under very restricted DTDs, e.g., closure-free, duplicate-free, and non-recursive DTDs. Here, a DTD is duplicate-free [16] if each element name appears at most once in each content model. As an exceptional case, the problem is in PTIME if DTDs are restricted to be disjunction-free and closure-free, and only ‘+’ and ‘˜’ are allowed as CSS combinators. These results imply that restricting the combinators of CSS rules or content models of DTDs does not effectively cope with the intractability of the problem. In this paper, we explore tractable cases of the problem from a different perspective and present four tractable cases (R1 to R4 in Table 1). In short, conditions R1 and R2 mean that, although the problem is intractable even if only the ‘>’ or ‘+’ combinator is allowed, it becomes tractable by disabling universal selectors. R3 is a condition that restricts the number of conflicting CSS rules and imposes a slight restriction on their simple selectors. Finally, R4 is a condition that restricts the length of selectors, rather than restricting the number of conflicting rules.

Related Work

Several methods for static analysis of CSS rules have been proposed. Geneves et al. proposed a logic-based system for analyzing CSS codes [7]. This system checks CSS rules by converting them into logic representations. Their encoding from CSS rules to logical formulae is linear, and their solver runs in EXPTIME. However, no other complexity result is presented, and the unsatisfiability of CSS rules under DTDs is not discussed. Bosch et al. proposed a method for refactoring CSS rules [3], and Mazinanian et al. also proposed methods for detecting duplicated CSS rules [13], [14]. These methods check for redundant CSS rules without DTDs, and complexity is not considered either. To the best of our knowledge, there have been no studies on the complexity of detecting unsatisfiable CSS rules in the presence of DTDs.

In addition to the above static analysis methods, a number of methods for analyzing CSS with HTML instances have been proposed. FireBug [5] and Chrome Developer Tools [9] are popular debugging tools that can detect CSS properties that are not applied to any elements of an HTML document. Hague et al. proposed a method for detecting redundant CSS rules in HTML5 applications [10]. Mesbah et al. proposed a method for detecting CSS rules that are unused in given HTML documents [15]. Practically, instance-level checking is different from our problem in the following sense. In websites and collections of XML documents, a CSS code is often shared by multiple documents. Furthermore, HTML/XML documents may be updated by adding/deleting/relabeling elements. Therefore, even if an instance-level check detects a CSS rule r that is not applied to any of the elements in a given HTML/XML document, the redundancy of r is still not obvious. In such cases, if we find out that r is unsatisfiable, then we can safely remove r since r can never be used in any other documents sharing the CSS code or any updated versions of the document.

Another related problem is the XPath satisfiability problem. This problem is to decide, for an XPath expression p and a DTD D, whether there exists an XML document d valid for D such that the answer of p on d is not empty. Benedikt et al. investigated the complexity of the problem, and showed that the problem is intractable in a number of cases [1]. To cope with this intractability, several restrictions on DTDs and XPath expressions have been proposed. Montazerian et al. showed that satisfiability of XPath expressions with child axes and qualifiers is tractable under duplicate-free DTDs [16]. Several other tractable XPath classes under duplicate-free DTDs were presented in our previous work [19]. Ishihara et al. proposed MRW-DTD [12], a class of DTDs broader than duplicate-free DTDs, in which satisfiability of several practical XPath fragments is tractable. The major difference between the XPath satisfiability problem and our problem is as follows. In the former problem, complexity is determined by a single XPath expression, say p, in that XPath expressions have no specificity and thus whether p is satisfiable is not affected by any other XPath expressions. On the other hand, in our problem complexity is not determined solely by a “target” CSS rule. Instead, it is determined by the interrelationship between a target CSS rule r and the CSS rules conflicting with r.

Finally, a few studies on RDF query satisfiability have been made. Hartig proposed a model for Linked Data query processing and considered query satisfiability on the model [11]. Zhang et al. considered satisfiability of SPARQL pattern queries and presented decidable classes of the satisfiability problem [22]. Both studies considered query satisfiability without any schema.

The rest of this paper is organized as follows. Section 2 provides some definitions related to CSS and DTDs. We also formalize the unsatisfiability problem. Section 3 presents the intractable cases of the problem. Section 4 describes four conditions under which the problem can be solved in polynomial time. Section 5 summarizes our conclusions.

2. Preliminaries

In this section, we first present some definitions related to trees, CSS, and DTDs, and then formalize the target problem.

2.1 Tree

An XML document is modeled as a rooted labeled tree. Formally, a rooted labeled tree (tree for short) is defined as follows.

• A single node v is a tree, where v is the root.

• Let v be a node and t1, t2, . . . , tn be trees with roots v1, v2, . . . , vn, respectively. Then adding an edge from v to vi for each 1 ≤ i ≤ n yields a tree rooted at v.

Let Σ be a set of labels. Each node in a tree has a label in Σ, and two nodes may have the same label. For a node v in a tree t, l(v) ∈ Σ denotes the label of v. For example, Fig. 2 depicts an XML document and its tree representation, where l(v1) = book, l(v2) = title, and so on. As shown in the figure, each element of the XML instance corresponds to a node in the tree.

Fig. 2 An XML document and its tree representation

2.2 CSS

A simple selector is either a label in Σ or a universal selector denoted ‘∗’. Let t be a tree. A simple selector s matches a node v in t if s = ‘∗’ or s is a label such that s = l(v).

A selector is a chain of one or more simple selectors connected by combinators, where a combinator is either a whitespace ‘ ’ (descendant), ‘>’ (child), ‘˜’ (general sibling), or ‘+’ (adjacent sibling). The length of a selector sel, denoted len(sel), is the number of simple selectors in sel. For example, if sel = a ∗ > c, then len(sel) = 3.

Let s, s′ be simple selectors and (v, v′) be a pair of nodes in t.

• A descendant selector s s′ matches (v, v′) if s matches v, s′ matches v′, and v′ is a descendant of v.

• A child selector s > s′ matches (v, v′) if s matches v, s′ matches v′, and v′ is a child of v.

• A general sibling selector s ˜ s′ matches (v, v′) if s matches v, s′ matches v′, and v′ is a (possibly non-immediate) right sibling of v.

• An adjacent sibling selector s + s′ matches (v, v′) if s matches v, s′ matches v′, and v′ is the immediate right sibling of v.

Let sel = s1 c1 s2 c2 s3 · · · sn−1 cn−1 sn be a selector, where si is a simple selector and ci is a combinator. Then sel matches a pair (v, v′) of nodes in t if

• n = 1, v = v′, and simple selector s1 matches v, or

• n > 1 and there exists a sequence v = v1, v2, . . . , vn = v′ of distinct nodes such that si−1 ci−1 si matches (vi−1, vi) for every 2 ≤ i ≤ n.
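As an illustration only, the matching relation above can be sketched in Python. The Node class, the encoding of a selector as a token list that alternates simple selectors and combinators, and the function names are assumptions of this sketch and do not appear in the paper. The check proceeds from the last simple selector backwards, asking whether some node v′ exists such that sel matches (v′, v).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; avoids recursing through parent links
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        """Append child as the rightmost child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def simple_matches(s: str, v: Node) -> bool:
    """A simple selector matches v if it is '*' or equals the label of v."""
    return s == "*" or s == v.label


def candidates(comb: str, v: Node) -> List[Node]:
    """Nodes u for which 'u comb v' satisfies the structural condition."""
    if comb == ">":                      # parent
        return [v.parent] if v.parent else []
    if comb == " ":                      # all proper ancestors
        out, u = [], v.parent
        while u is not None:
            out.append(u)
            u = u.parent
        return out
    if v.parent is None:                 # the root has no siblings
        return []
    sibs = v.parent.children
    i = next(k for k, n in enumerate(sibs) if n is v)
    if comb == "+":                      # immediate left sibling
        return [sibs[i - 1]] if i > 0 else []
    if comb == "~":                      # any left sibling
        return sibs[:i]
    raise ValueError(f"unknown combinator: {comb!r}")


def matches_at(sel: List[str], v: Node) -> bool:
    """True if sel matches (v', v) for some node v'.

    sel is a token list such as ["a", " ", "*", ">", "c"].
    """
    if not simple_matches(sel[-1], v):
        return False
    if len(sel) == 1:
        return True
    return any(matches_at(sel[:-2], u) for u in candidates(sel[-2], v))
```

Each step moves to an ancestor or a left sibling, so the visited nodes are automatically distinct. The recursion is written for clarity rather than efficiency.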

A CSS rule is denoted sel p : v, where sel is a selector, p is a property, and v is a value.† For a CSS rule r, sel(r) denotes the selector of r and prop(r) denotes the property of r. For a selector sel, first(sel) denotes the first simple selector of sel and last(sel) denotes the last simple selector of sel. For example, if r = a b+c p : v, then sel(r) = a b+c, prop(r) = p, first(sel(r)) = a, and last(sel(r)) = c. In the following, we assume that for any CSS rule r, the last simple selector last(sel(r)) is a label.

†A real CSS rule has a set of “property:value” declarations rather than a single declaration. However, such a CSS rule can be treated as CSS rules with a single declaration, e.g., sel {p1 : v1, p2 : v2} can be treated as two CSS rules sel p1 : v1 and sel p2 : v2.

By spec(sel), we mean the number of labels occurring in selector sel, e.g., if sel = a ∗ + c, then spec(sel) = 2. Since attributes are omitted in this paper, spec(sel) represents the specificity of sel. A CSS code is defined as a list R of CSS rules. By indexR(r), we mean the index of a CSS rule r in R, e.g., if R = [r, r′, r′′], then indexR(r) = 1 and indexR(r′′) = 3. We write r ∈ R if r occurs in R. For a tree t and a node v in t, a CSS rule r ∈ R is applied to v if

• for some node v′ in t, sel(r) matches (v′, v), and

• for any CSS rule r′ ∈ R other than r such that sel(r′) matches (v′′, v) for some node v′′ and that prop(r) = prop(r′), (a) spec(sel(r)) > spec(sel(r′)) or (b) spec(sel(r)) = spec(sel(r′)) and indexR(r) > indexR(r′).
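The two conditions above amount to picking, among the rules that match v and set the same property, the one with the largest pair (spec, index). A minimal Python sketch follows. The Rule tuple, the token-list encoding of selectors, and the assumption that the set of matching rule indices is already known are all choices of this sketch; indices are 0-based here, whereas indexR in the text is 1-based. The property values marked as hypothetical are placeholders, since the text quotes only the second and third rules of Fig. 1 in full.

```python
from typing import List, NamedTuple, Optional, Sequence

COMBINATORS = {" ", ">", "~", "+"}


class Rule(NamedTuple):
    sel: Sequence[str]  # e.g. ["a", " ", "c"]: simple selectors and combinators
    prop: str
    value: str


def spec(sel: Sequence[str]) -> int:
    """Number of labels in sel (universal selectors and combinators are not counted)."""
    return sum(1 for tok in sel if tok not in COMBINATORS and tok != "*")


def applied_rule(rules: List[Rule], matching: Sequence[int], prop: str) -> Optional[int]:
    """Index of the rule applied to a node for property prop, or None.

    rules is the CSS code R; matching lists the indices of the rules whose
    selector matches the node (computed elsewhere).
    """
    best: Optional[int] = None
    for i in matching:
        if rules[i].prop != prop:
            continue
        # Higher specificity wins; ties are broken by the later position in R.
        if best is None or (spec(rules[i].sel), i) > (spec(rules[best].sel), best):
            best = i
    return best


# The five selectors discussed for Fig. 1; values marked hypothetical are placeholders.
R = [
    Rule(["a", " ", "c"], "font-family", "hypothetical-1"),
    Rule(["c"], "font-family", "sans-serif"),
    Rule(["a", " ", "f"], "font-family", "serif"),
    Rule(["b", ">", "f"], "font-family", "hypothetical-4"),
    Rule(["c", "+", "f"], "font-family", "hypothetical-5"),
]
```

For a c element below an a element, rules 0 and 1 both match and rule 0 wins on specificity. For an f element whose parent is b, rules 2 and 3 tie on specificity and rule 3 wins because it occurs later.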

Let CSS be the set of CSS rules. We denote a fragment of CSS by listing the combinators supported by the fragment. For example, CSS{ ,>} denotes the set of CSS rules using only ‘ ’ and ‘>’ as combinators.

2.3 Unsatisfiability of CSS Rule under DTD

A DTD is a pair D = (d, s), where d is a mapping from Σ to the set of regular expressions over Σ and s ∈ Σ is the start label of D. For a label a ∈ Σ, d(a) is the content model of a. For example, consider the following DTD.

<!DOCTYPE book[

<!ELEMENT book (title, author+)>

<!ELEMENT author (name)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT name (#PCDATA)>

]>

Then the above DTD can be denoted by a pair (d, book), where d(book) = title author+, d(author) = name, and d(title) = d(name) = ε. A tree t is valid for D if (i) the root of t is labeled by s and (ii) for each node v in t, l(v1)l(v2) · · · l(vm) ∈ L(d(l(v))), where v1, v2, . . . , vm are the children of v and L(d(l(v))) is the language of d(l(v)).
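The validity condition can be checked directly with a regular-expression library. The following Python sketch is an illustration under stated assumptions: trees are nested (label, children) tuples, and each content model is written by hand as a Python regular expression over tokens of the form "label " (a label followed by one space), so that the sequence of child labels becomes an ordinary string. This encoding is an assumption of the sketch and not the paper's notation.

```python
import re
from typing import Dict, List, Tuple

Tree = Tuple[str, List["Tree"]]  # (label, children)


def _children_ok(tree: Tree, d: Dict[str, str]) -> bool:
    """Check condition (ii) for the root of tree and, recursively, all nodes below it."""
    label, children = tree
    word = "".join(child[0] + " " for child in children)  # l(v1) l(v2) ... l(vm)
    if label not in d or re.fullmatch(d[label], word) is None:
        return False
    return all(_children_ok(child, d) for child in children)


def is_valid(tree: Tree, d: Dict[str, str], start: str) -> bool:
    """True if tree is valid for the DTD (d, start)."""
    return tree[0] == start and _children_ok(tree, d)


# The book DTD from the text: d(book) = title author+, d(author) = name,
# d(title) = d(name) = epsilon (the empty regular expression here).
d_book = {
    "book": r"title (author )+",
    "author": r"name ",
    "title": r"",
    "name": r"",
}

doc: Tree = ("book", [("title", []), ("author", [("name", [])]), ("author", [("name", [])])])
```

Calling is_valid(doc, d_book, "book") checks both the start label and every content model. A book with a title but no author fails because author+ requires at least one author child.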

Let R be a CSS code and r ∈ R. Then, r is unsatisfiable under D w.r.t. R if for any tree t valid for D and for any node v of t, r is not applied to v. Let C be a set of combinators. For a fragment CSSC, the CSS unsatisfiability problem, denoted UNSAT(CSSC), is stated as follows.

Input: A DTD D = (d, a), a CSS code R of CSS rules in CSSC, and a CSS rule r ∈ R

Problem: Determine if r is unsatisfiable under D w.r.t. R

3. Intractability

In this section, we discuss intractable cases of the CSS unsatisfiability problem. We first consider the lower bound of complexity for the problem. We then consider the upper bound. In the following discussion, we consider the following restricted DTDs as well as general (non-restricted) DTDs.

• For a regular expression e, we say that e is duplicate free if each label in Σ occurs at most once in e. A DTD D is duplicate free if each content model of D is duplicate free.

• For a regular expression e, we say that e is closure free if e contains no Kleene closure (‘∗’ and ‘+’). Then, a DTD D is closure free if each content model of D is closure free.

• For a DTD D = (d, s), b is reachable from a if (i) b occurs in d(a) or (ii) for some label c, c is reachable from a and b occurs in d(c). We say that D is non-recursive if for any label a, a is not reachable from a in D.

• A DTD D is disjunction-free if no content model of D contains a disjunction ‘|’.
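The four restrictions are purely syntactic and easy to test. The sketch below is a hedged illustration: it assumes that a DTD is given as a Python dict from labels to content models written as strings in the paper's notation (labels made of word characters, operators ‘|’, ‘∗’ written as "*", ‘+’, and parentheses, with ε written as the empty string). The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def labels(model: str) -> List[str]:
    """All label occurrences in a content model, in order."""
    return re.findall(r"\w+", model)


def is_duplicate_free(d: Dict[str, str]) -> bool:
    """Each label occurs at most once in each content model."""
    return all(max(Counter(labels(m)).values(), default=0) <= 1 for m in d.values())


def is_closure_free(d: Dict[str, str]) -> bool:
    """No content model uses a Kleene closure ('*' or '+')."""
    return not any(op in m for m in d.values() for op in "*+")


def is_disjunction_free(d: Dict[str, str]) -> bool:
    """No content model uses a disjunction '|'."""
    return not any("|" in m for m in d.values())


def is_non_recursive(d: Dict[str, str]) -> bool:
    """No label is reachable from itself (depth-first search for a cycle)."""
    state: Dict[str, int] = {}  # 1 = on the current DFS path, 2 = finished

    def visit(a: str) -> bool:
        if state.get(a) == 1:
            return False        # a is reachable from itself
        if state.get(a) == 2:
            return True
        state[a] = 1
        ok = all(visit(b) for b in labels(d.get(a, "")))
        state[a] = 2
        return ok

    return all(visit(a) for a in d)


# The book DTD of Sect. 2.3 in this encoding.
d_book = {"book": "title author+", "author": "name", "title": "", "name": ""}
```

For d_book, the checks report duplicate-free, disjunction-free, and non-recursive, but not closure-free, because of author+.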

3.1 Lower Bound

We first consider the lower bound of complexity for the problem. As shown below, the problem is intractable even for very restricted DTDs and CSS rules. We first consider CSS rules using child/descendant combinators, then consider CSS rules using sibling combinators.

Theorem 1: UNSAT(CSS{ }) is coNP-hard under closure-free, duplicate-free, and non-recursive DTDs.

Proof. We reduce the 3DNF-tautology problem, which is coNP-complete [6], to the CSS unsatisfiability problem. The 3DNF-tautology problem is defined as follows.

Input: A 3DNF-formula φ

Problem: Determine if φ is a tautology, i.e., φ is true for every truth assignment to φ

Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula, where lij is a literal. Let {x1, x2, . . . , xn} be the set of variables of φ. Without loss of generality, we assume that literals in a clause are sorted by the indexes of variables, e.g., (x1 ∧ ¬x3 ∧ x4) satisfies the assumption but (x1 ∧ x4 ∧ ¬x3) does not.

From the 3DNF-formula φ, we define a DTD D, a CSS code R in which each rule uses only ‘ ’ as a combinator, and a CSS rule r ∈ R. First, the DTD D = (d, s) is defined as follows:

d(s) = X1T |X1F ,

d(X1T ) = d(X1F) = X2T |X2F ,

d(X2T ) = d(X2F) = X3T |X3F ,

...

d(Xn−1T ) = d(Xn−1F) = XnT |XnF ,

d(XnT ) = d(XnF) = b,

d(b) = ε.

Here, XiT is a label that represents “xi is true” and XiF is a label that represents “xi is false”. Therefore, a tree valid for D represents a truth assignment to φ.

Next, we define the CSS code R. Let R = [r1, r2, . . . , rm, rB], where

ri = Li1 Li2 Li3 b p : vi, (1 ≤ i ≤ m)

rB = b p : v,

and Lij is a label defined as follows (1 ≤ i ≤ m, 1 ≤ j ≤ 3):

\[
L_{ij} =
\begin{cases}
X_{kT} & \text{if } l_{ij} = x_k, \\
X_{kF} & \text{if } l_{ij} = \neg x_k.
\end{cases}
\tag{1}
\]

We show that φ is a tautology if and only if rB is unsatisfiable under D w.r.t. R.

(⇒) Suppose that φ is a tautology. Then, for any truth assignment to φ, there exists a term (li1 ∧ li2 ∧ li3) that becomes true. Thus, for any tree t valid for D, there exists a CSS rule ri = Li1 Li2 Li3 b p : vi whose selector matches the leaf node b in t. Furthermore, spec(sel(ri)) > spec(sel(rB)). This implies that rB is never applied to the leaf node b, meaning that rB is unsatisfiable.

(⇐) Suppose that for any tree t valid for D, rB is not applied to the leaf node b in t. Since sel(rB) matches b, this implies that for any t valid for D, at least one of r1, r2, . . . , rm is applied to the leaf node in t. This implies that for any truth assignment to φ, φ becomes true. □
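The equivalence in this proof can be checked by brute force on small formulas. The Python sketch below is only an illustration and is exponential in n. It assumes that a term is a list of three (variable index, polarity) pairs over distinct variables, sorted by index, and that label names such as "X2T" stand for the labels XkT and XkF. A tree valid for D is a single path from s to b, so a descendant-only selector Li1 Li2 Li3 b matches the leaf exactly when its labels occur, in order, among the ancestors of b.

```python
from itertools import product
from typing import List, Sequence, Tuple

Term = List[Tuple[int, bool]]  # [(variable index k, True for x_k / False for not x_k), ...]


def label(k: int, value: bool) -> str:
    """Label X_kT or X_kF, as in Eq. (1)."""
    return f"X{k}{'T' if value else 'F'}"


def is_subsequence(needle: Sequence[str], hay: Sequence[str]) -> bool:
    """True if needle occurs in hay in order (not necessarily contiguously)."""
    it = iter(hay)
    return all(x in it for x in needle)


def rB_unsatisfiable(terms: List[Term], n: int) -> bool:
    """rB is unsatisfiable iff on every valid tree some ri matches the leaf b."""
    selectors = [[label(k, pos) for k, pos in term] for term in terms]
    for assignment in product([True, False], repeat=n):
        # The unique root-to-leaf path of the valid tree for this assignment.
        ancestors = ["s"] + [label(k + 1, v) for k, v in enumerate(assignment)]
        if not any(is_subsequence(sel, ancestors) for sel in selectors):
            return False  # only rB matches b on this tree, so rB is applied
    return True


def is_tautology(terms: List[Term], n: int) -> bool:
    """Truth-table check of the 3DNF formula."""
    return all(
        any(all(a[k - 1] == pos for k, pos in term) for term in terms)
        for a in product([True, False], repeat=n)
    )
```

On any formula that satisfies the stated assumptions, the two functions should agree, which is what the proof claims.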

As shown below, the problem is also coNP-hard for CSS using only ‘>’ as a combinator.

Theorem 2: UNSAT(CSS{>}) is coNP-hard under closure-free, duplicate-free, and non-recursive DTDs.

Proof. Similar to Theorem 1, this theorem can be proven through a reduction from the 3DNF-tautology problem. Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula. The DTD D = (d, s) is the same as the DTD in Theorem 1. Then, let R = [r1, r2, . . . , rm, rB] be a CSS code. Here, rB = b p : v and ri (1 ≤ i ≤ m) is defined so that ‘ ’ is simulated by consecutive ‘∗’ selectors connected by ‘>’ combinators. That is,

\[
r_i = L_{i1} >
\underbrace{* > \cdots > *}_{\mathrm{dist}(L_{i1}, L_{i2})\ *\text{'s connected by `>'}}
> L_{i2} >
\underbrace{* > \cdots > *}_{\mathrm{dist}(L_{i2}, L_{i3})\ *\text{'s connected by `>'}}
> L_{i3} >
\underbrace{* > \cdots > *}_{\mathrm{dist}(L_{i3}, b)\ *\text{'s connected by `>'}}
> b \quad p : v_i,
\]

where Lij is the same as in Eq. (1). Moreover, for Lij ∈ {XkT, XkF} and Lij+1 ∈ {Xk′T, Xk′F, b}, we define

\[
\mathrm{dist}(L_{ij}, L_{ij+1}) =
\begin{cases}
k' - k - 1 & \text{if } L_{ij+1} \in \{X_{k'T}, X_{k'F}\}, \\
n - k & \text{if } L_{ij+1} = \text{`}b\text{'},
\end{cases}
\tag{2}
\]

where n is the number of variables in φ. For example, suppose that the ith term of φ is (x2 ∧ ¬x5 ∧ x7) and that n = 9. Then we have

\[
r_i = X_{2T} > \underbrace{* > *}_{(a)} > X_{5F} > \underbrace{*}_{(b)} > X_{7T} > \underbrace{* > *}_{(c)} > b \quad p : v_i,
\]

where (a) dist(X2T, X5F) = 5 − 2 − 1 = 2, (b) dist(X5F, X7T) = 7 − 5 − 1 = 1, and (c) dist(X7T, b) = 9 − 7 = 2.

Similar to Theorem 1, we can show that φ is a tautology if and only if rB is unsatisfiable under D w.r.t. R. □
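The padding with universal selectors is mechanical, as the following Python sketch suggests. The encoding of a term as (variable index, polarity) pairs sorted by index, the use of None to stand for the final label b, and the function names are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Term = List[Tuple[int, bool]]  # [(variable index k, True for x_k / False for not x_k), ...]


def dist(k: int, k_next: Optional[int], n: int) -> int:
    """Eq. (2): number of '*' selectors between two consecutive labels.

    k_next is the variable index of the next label, or None if the next label is b.
    """
    return n - k if k_next is None else k_next - k - 1


def child_selector(term: Term, n: int) -> str:
    """Selector of ri using only '>' for a term sorted by variable index."""
    next_indexes: List[Optional[int]] = [k for k, _ in term[1:]] + [None]
    tokens: List[str] = []
    for (k, positive), k_next in zip(term, next_indexes):
        tokens.append(f"X{k}{'T' if positive else 'F'}")
        tokens.extend(["*"] * dist(k, k_next, n))  # pad the skipped levels
    tokens.append("b")
    return " > ".join(tokens)


# The example from the proof: the term (x2 and not x5 and x7) with n = 9.
example = child_selector([(2, True), (5, False), (7, True)], 9)
```

For the example term, the result is X2T > * > * > X5F > * > X7T > * > * > b, which agrees with the selector given in the proof.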

We next consider CSS rules using sibling combinators. First, the problem is coNP-hard for CSS rules using only ‘˜’ as a combinator.

Theorem 3: UNSAT(CSS{˜}) is coNP-hard under closure-free, duplicate-free, and non-recursive DTDs.

Proof. Similar to Theorem 1, this theorem can be proven through a reduction from the 3DNF-tautology problem. Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula. From φ, we define a DTD D, a CSS code R in which each rule uses only ‘˜’ as a combinator, and a CSS rule r ∈ R. First, the DTD D = (d, s) is defined as follows:

d(s) = (X1T | X1F)(X2T | X2F) · · · (XnT | XnF) b,

d(XiT) = d(XiF) = ε, (1 ≤ i ≤ n)

d(b) = ε.

Thus, for any tree t valid for D, the children of s represent a truth assignment to φ.

Second, let R = [r1, r2, . . . , rm, rB] be a CSS code, where

ri = Li1 ˜ Li2 ˜ Li3 ˜ b p : vi, (1 ≤ i ≤ m)

rB = b p : v,

and Lij is the same as in Eq. (1).

Similar to Theorem 1, we can show that φ is a tautology if and only if rB is unsatisfiable under D w.r.t. R. □

The problem is also coNP-hard for CSS rules using only ‘+’ as a combinator.

Theorem 4: UNSAT(CSS{+}) is coNP-hard under closure-free, duplicate-free, and non-recursive DTDs.

Proof. This theorem can be proven in a manner similar to Theorem 3. Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula. The DTD D is the same as that in Theorem 3. Let R = [r1, r2, . . . , rm, rB] be a CSS code, where rB = b p : v, and ri is a CSS rule of the following form:

\[
r_i = L_{i1} +
\underbrace{* + \cdots + *}_{\mathrm{dist}(L_{i1}, L_{i2})\ *\text{'s connected by `+'}}
+ L_{i2} +
\underbrace{* + \cdots + *}_{\mathrm{dist}(L_{i2}, L_{i3})\ *\text{'s connected by `+'}}
+ L_{i3} +
\underbrace{* + \cdots + *}_{\mathrm{dist}(L_{i3}, b)\ *\text{'s connected by `+'}}
+ b \quad p : v_i,
\]

where Lij is the same as in Eq. (1) and dist() is defined as in Eq. (2).

Now, it is easy to show that φ is a tautology if and only if rB is unsatisfiable under D w.r.t. R. □


Fig. 3 Valid tree of D′ with n = 3

3.1.1 Lower Bound under Disjunction-Free DTDs

For the XPath satisfiability problem, some tractable cases can be obtained by restricting DTDs to be disjunction-free [1]. We consider the lower bound of complexity for the CSS unsatisfiability problem under disjunction-free DTDs.

We first consider CSS using ‘ ’ or ‘>’ as a combinator. We have the following theorem.

Theorem 5: UNSAT(CSS{ }) and UNSAT(CSS{>}) are coNP-hard under disjunction-free, closure-free, duplicate-free, and non-recursive DTDs.

Proof. First consider CSS using ‘ ’. Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula. From φ, the CSS code R = [r1, r2, . . . , rm, rB] is defined in the same way as in the proof of Theorem 1. As for the DTD, let D = (d, s) be the DTD defined in the proof of Theorem 1. We use the DTD D′ = (d′, s), which is obtained from D by replacing each disjunction operator in d with a concatenation operator. Thus d′ is defined as follows.

d′(s) = X1T X1F ,

d′(X1T ) = d′(X1F) = X2T X2F ,

d′(X2T ) = d′(X2F) = X3T X3F ,

...

d′(Xn−1T ) = d′(Xn−1F) = XnT XnF ,

d′(XnT ) = d′(XnF) = b,

d′(b) = ε.

This reduction can be done in polynomial time. Since D′ has no disjunction, all valid trees of D′ have the same structure: the root is labeled by s, which has two children labeled by X1T and X1F, respectively, and so on. Figure 3 depicts a valid tree of D′ with n = 3. Therefore, each path in t from the root s to a leaf b corresponds to a valid tree of D, and vice versa. This implies that φ is a tautology if and only if rB is unsatisfiable. The case for CSS using ‘>’ can be shown similarly, by using D′ and the CSS code of the proof of Theorem 2. □

Second, we consider CSS using sibling combinators. Under disjunction-free and closure-free DTDs, each content model represents a single string rather than a set of strings. This implies the following theorem:

Theorem 6: UNSAT(CSS{˜,+}) is in PTIME under disjunction-free and closure-free DTDs.

Sketch of Proof. Let D = (d, s) be a disjunction-free and closure-free DTD. Then for any a ∈ Σ, d(a) is a string. Then for any CSS rule r ∈ R, we can identify the substring(s) of d(a) matched by sel(r). Thus for a given CSS rule r, by comparing (a) the substring(s) matched by sel(r) and (b) those matched by the selectors of the rules conflicting with r, we can determine whether r is satisfiable. □

However, the above tractability does not hold withoutclosure-freeness.

Theorem 7: UNSAT(CSS{˜}) is coNP-hard under disjunction-free and non-recursive DTDs.

Proof. Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula and let x1, x2, . . . , xn be the variables in φ. We define the DTD D = (d, s) as follows:

d(s) = (X1 F1∗ Y1)(X2 F2∗ Y2) · · · (Xn Fn∗ Yn) b,

d(Xi) = d(Fi) = d(Yi) = ε. (1 ≤ i ≤ n)

Here, Xi Fi∗ Yi indicates the value of xi. If the children of s are matched by XiYi without Fi, then we consider xi to be true. If the children of s contain one or more Fi, then we consider xi to be false. Let R = [r1, r2, . . . , rm, rB] be a CSS code, where

ri = Li1 ˜ Li2 ˜ Li3 ˜ b p :vi, (1 ≤ i ≤ m)

rB = b p :v.

Here, Lij is defined as follows (1 ≤ i ≤ m, 1 ≤ j ≤ 3):

\[
L_{ij} =
\begin{cases}
X_k Y_k & \text{if } l_{ij} = x_k, \\
X_k F_k \sim Y_k & \text{if } l_{ij} = \neg x_k.
\end{cases}
\]

Now it is easy to show that φ is true for every truth assignment if and only if rB is unsatisfiable under D w.r.t. R. □

Theorem 8: UNSAT(CSS{+}) is coNP-hard under disjunction-free and non-recursive DTDs.

Proof. Let φ = (l11 ∧ l12 ∧ l13) ∨ (l21 ∧ l22 ∧ l23) ∨ · · · ∨ (lm1 ∧ lm2 ∧ lm3) be a 3DNF formula. We define the DTD D = (d, s) as follows:

d(s) = (U1 T1∗ F1∗ X1)(U2 T2∗ F2∗ X2) · · · (Un Tn∗ Fn∗ Xn) b,

d(Ui) = d(Ti) = d(Fi) = d(Xi) = ε. (1 ≤ i ≤ n)

Here, Ui Ti∗ Fi∗ Xi in d(s) indicates the value of xi. If the children of s contain one Ti but no Fi, then we consider xi to be true. If the children of s contain one Fi but no Ti, then we consider xi to be false (all other cases are irrelevant). Let R = [r1, r2, . . . , rm, rB] be a CSS code, where


Fig. 4 An example of ext-path

ri = Li1 + Li2 + · · · + Lin + b p :vi, (1 ≤ i ≤ m)

rB = L′1 + L′2 + · · · + L′n + b p :v.

Let Ci denote the ith clause of φ. Then Lij and L′j are defined as follows (1 ≤ j ≤ n):

\[
L_{ij} =
\begin{cases}
U_j + T_j + X_j & \text{if } x_j \text{ appears in } C_i \text{ without negation,} \\
U_j + F_j + X_j & \text{if } x_j \text{ appears in } C_i \text{ with negation,} \\
U_j + * + X_j & \text{otherwise,}
\end{cases}
\qquad
L'_j = U_j + * + X_j.
\]

Now it is easy to show that φ is true for every truth assignment if and only if rB is unsatisfiable under D, i.e., for any valid instance of D such that sel(rB) matches the instance, there exists a rule ri ∈ R such that sel(ri) matches the instance. □

3.2 Upper Bound

We next consider the upper bound of complexity for the problem. We first consider the case of restricted DTDs, then consider the case of general DTDs.

Let t be a tree. We use an “extended” path that allows horizontal traversal in addition to parent-child traversal. Formally, an extended path (ext-path for short) of t is a sequence v1, v2, . . . , vn of nodes such that v1 is the root of t and, for every 1 ≤ i ≤ n − 1, vi+1 is either a child or the immediate right sibling of vi. In particular, we say that an ext-path path = v1, v2, . . . , vn is straight if vi+1 is a child of vi for every 1 ≤ i ≤ n − 1. A selector sel matches an ext-path v1, v2, . . . , vn if, for some vi such that vj+1 is a child of vj for every 1 ≤ j ≤ i − 1, sel matches (vi, vn). For example, consider the tree t depicted in Fig. 4. Then p = va, vc, vd, ve, vf, vg is an ext-path of t. The selector sel = d ˜ f > g matches p since vc is a child of va, vd is a child of vc, and sel matches (vd, vg).
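The definition of an ext-path can be checked mechanically. The following Python sketch assumes a tree encoded as a dictionary from each node to its ordered list of children. The example tree is hypothetical: it is only chosen to be consistent with the relationships stated in the text for Fig. 4, not copied from the figure.

```python
def is_ext_path(children, root, path):
    """True if path starts at the root and every step goes to a child
    or to the immediate right sibling of the current node."""
    if not path or path[0] != root:
        return False
    # Map each node to its immediate right sibling, if it has one.
    right = {}
    for kids in children.values():
        for left, r in zip(kids, kids[1:]):
            right[left] = r
    for u, v in zip(path, path[1:]):
        if v not in children.get(u, []) and right.get(u) != v:
            return False
    return True


def is_straight(children, root, path):
    """True if path is an ext-path that uses parent-child steps only."""
    return is_ext_path(children, root, path) and all(
        v in children.get(u, []) for u, v in zip(path, path[1:])
    )


# Hypothetical tree: va has children vb, vc; vc has children vd, ve, vf; vf has child vg.
tree = {"va": ["vb", "vc"], "vc": ["vd", "ve", "vf"], "vf": ["vg"]}
p = ["va", "vc", "vd", "ve", "vf", "vg"]
```

Here `p` is an ext-path that is not straight, because the steps from vd to ve and from ve to vf are sibling steps.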

3.2.1 The Case of Restricted DTDs

Theorem 9: UNSAT(CSS{ ,>}) is in coNP under non-recursive DTDs.

Sketch of Proof. We show that the following “satisfiability” problem is in NP, which implies that UNSAT(CSS{ ,>}) is in coNP.

Input: a non-recursive DTD D = (d, s), a CSS code R, and a CSS rule r ∈ R

Problem: determine if there exists a straight ext-path path = v1, v2, . . . , vn satisfying the following conditions:

(i) path is “valid” for D, that is, there exists a tree t valid for D containing path.

(ii) sel(r) matches path, but for any r′ ∈ R, sel(r′) does not match path whenever prop(r′) = prop(r) and (a) spec(sel(r′)) > spec(sel(r)) or (b) spec(sel(r′)) = spec(sel(r)) and indexR(r′) > indexR(r).

Here, suppose that an “answer” path satisfying the above conditions (i) and (ii) is guessed. Since D is non-recursive, the length of path is bounded by the size of D. Therefore, the conditions (i) and (ii) can be verified in polynomial time by using D and R, meaning the above problem is in NP. □
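The verification of condition (ii) hinges on deciding whether a selector over ‘ ’ and ‘>’ matches the last node of a guessed straight ext-path. A minimal Python sketch of that check follows. The encoding is an assumption made for illustration: a selector is a list of simple selectors plus a list of combinators, and a path is the list of node labels from the root downward.

```python
from functools import lru_cache


def matches(simples, combs, labels):
    """True if the selector matches the last node of the label path.

    simples: simple selectors, e.g. ["a", "c"]; "*" is the universal selector.
    combs:   combinators between them, " " (descendant) or ">" (child).
    labels:  labels of a straight path, root first.
    """
    def ok(s, lab):
        return s == "*" or s == lab

    @lru_cache(maxsize=None)
    def go(i, pos):
        # Can simple selector i sit at path position pos, with all
        # earlier simple selectors placed consistently above it?
        if not ok(simples[i], labels[pos]):
            return False
        if i == 0:
            return True
        if combs[i - 1] == ">":
            return pos >= 1 and go(i - 1, pos - 1)
        return any(go(i - 1, q) for q in range(pos))  # descendant combinator

    return bool(labels) and go(len(simples) - 1, len(labels) - 1)
```

Memoization keeps the check polynomial in the lengths of the selector and the path, which is in line with the polynomial-time verification claimed in the proof sketch.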

3.2.2 The Case of General DTDs

In the following, we consider DTDs without any restrictions. First, consider CSS rules using only ‘>’ as combinators. We have the following theorem.

Theorem 10: UNSAT(CSS{>}) is in coNP under general DTDs.

Proof. Let D = (d, s) be a DTD over Σ, R be a CSS code, and r ∈ R. Basically, this theorem can be shown similarly to Theorem 9. However, under general DTDs the length of an “answer” ext-path is not bounded. This can be addressed as follows. Let spl(a, D) be the length of the shortest straight valid ext-path from the root to a node labeled by a. Then spl(a, D) is no more than |Σ| since the shortest valid path is “non-recursive.” Let maxspl(D) = max_{a∈Σ} spl(a, D). Moreover, let maxsel(R) = max_{r∈R} len(sel(r)). Then, for a CSS rule r using only ‘>’ as combinators, there is a straight valid ext-path to which r is applied if and only if there is a straight valid ext-path path to which r is applied such that the length of path is no more than maxspl(D) + maxsel(R). To see this, let path = v1, v2, . . . , vn be a straight valid ext-path. To check if r is applied to vn on path, we need not check all the nodes of path; a suffix of path of length at most maxsel(R) suffices. Let vj, vj+1, . . . , vn be such a suffix of path, and consider the prefix of path from v1 to vj. The length of the prefix is not bounded, but the prefix can be replaced by a “shortest” straight valid ext-path v′1, v′2, . . . , v′k such that l(v′1) = s, l(v′k) = l(vj), and k ≤ maxspl(D). The resulting ext-path path′ is a valid ext-path of length no more than maxspl(D) + maxsel(R) such that r is applied to vn on path if and only if r is applied to vn on path′. □
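The quantity spl(a, D) can be computed by a breadth-first search over the parent-child relation of the DTD. The sketch below is an illustration under two assumptions that are not stated in the proof: the DTD is given as a dictionary from each label to the set of labels occurring in its content model, and every label can derive a finite tree, so that every such path is valid. Lengths are counted in nodes, with the root at length 1.

```python
from collections import deque


def spl(child_labels, start):
    """Shortest straight-path length (in nodes) from the root label to each reachable label."""
    dist = {start: 1}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in child_labels.get(a, ()):
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def maxspl(child_labels, start):
    """Largest value of spl over all reachable labels."""
    return max(spl(child_labels, start).values())


# The DTD used for Fig. 5: d(s) = ab*, d(a) = eps, d(b) = sc, d(c) = eps.
dtd = {"s": {"a", "b"}, "a": set(), "b": {"s", "c"}, "c": set()}
distances = spl(dtd, "s")
```

For this recursive DTD, maxspl(D) = 3, which respects the bound |Σ| = 4, so an answer path for a selector of length k need only be searched up to length 3 + k.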

We next consider the upper bound for UNSAT(CSS{ ,>,˜,+}). To show the upper bound, we define the DTD automaton, which represents the structure of a DTD. For example, Fig. 5 (a) illustrates the DTD automaton of D = (d, s), where d(s) = ab∗, d(a) = ε, d(b) = sc, and d(c) = ε. As shown in the figure, the horizontal transitions represent sibling relationships between elements, and the vertical transitions (with subscript ‘v’) represent parent-child relationships between elements.

Fig. 5 Example of DTD automaton

To define the DTD automaton, we use the Glushkov automaton (or position automaton) [8], [17]. Let r be a regular expression. Each label occurring in r is superscripted with a number to distinguish different occurrences of the same label in r. By r# we mean the superscripted regular expression of r obtained by superscripting each label occurring in r. By sym(r#) we mean the set of superscripted labels occurring in r#. For example, if r = (a|b)(ab)∗b, then r# = (a1|b1)(a2b2)∗b3 and sym(r#) = {a1, b1, a2, b2, b3}. Let ai be a superscripted label of a. By (ai)♮ we mean the label resulting from ai by dropping the superscript of ai, namely (ai)♮ = a. For a word w of superscripted labels, w♮ denotes the word obtained by dropping the superscript of each label in w. In the following, we use u, x, y, z to denote superscripted labels.

The Glushkov automaton of r is a five-tuple G = (Q, Σ, δ, q0, F), where Q = sym(r#) ∪ {q0} is a set of states, δ : Q × Σ → Q is a transition function, q0 ∉ sym(r#) is the initial state of G, and F is a set of final states. To define δ and F, we need First(r#), Last(r#), and Follow(r#, x), which are defined as follows:

First(r#) = {x ∈ sym(r#) | xw ∈ L(r#) for some word w},

Last(r#) = {x ∈ sym(r#) | wx ∈ L(r#) for some word w},

Follow(r#, x) = {y ∈ sym(r#) | wxyw′ ∈ L(r#) for some words w, w′}.

Then, δ and F are defined as follows.

\[
\delta(x, a) =
\begin{cases}
\{y \mid y \in \mathit{First}(r^{\#}),\ y^{\natural} = a\} & \text{if } x = q_0, \\
\{y \mid y \in \mathit{Follow}(r^{\#}, x),\ y^{\natural} = a\} & \text{otherwise,}
\end{cases}
\]

\[
F =
\begin{cases}
\mathit{Last}(r^{\#}) \cup \{q_0\} & \text{if } \varepsilon \in L(r), \\
\mathit{Last}(r^{\#}) & \text{otherwise.}
\end{cases}
\]

For any regular expression r, it holds that L(r) = {w♮ | w ∈ L(G)}. The Glushkov automaton G can be constructed in polynomial time [4].
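The sets First, Last, and Follow can be computed by one structural pass over the superscripted expression. The following Python sketch is a standard textbook-style computation rather than the construction of [4]; the tuple encoding of the expression ("sym", "cat", "alt", "star", "eps") is an assumption made for this illustration.

```python
def glushkov_sets(r):
    """Return (nullable, first, last, follow) for a position-marked regex AST.

    AST nodes: ("eps",), ("sym", pos), ("cat", e1, e2), ("alt", e1, e2), ("star", e).
    """
    follow = {}

    def walk(e):
        kind = e[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            pos = e[1]
            follow.setdefault(pos, set())
            return False, {pos}, {pos}
        if kind == "star":
            _, f, l = walk(e[1])
            for x in l:  # the body may repeat: last positions loop back to first ones
                follow[x] |= f
            return True, f, l
        n1, f1, l1 = walk(e[1])
        n2, f2, l2 = walk(e[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        # concatenation: last positions of e1 are followed by first positions of e2
        for x in l1:
            follow[x] |= f2
        return (n1 and n2,
                (f1 | f2) if n1 else f1,
                (l1 | l2) if n2 else l2)

    nullable, first, last = walk(r)
    return nullable, first, last, follow


# r# = (a1|b1)(a2 b2)* b3, the example from the text.
r_sharp = ("cat",
           ("cat",
            ("alt", ("sym", "a1"), ("sym", "b1")),
            ("star", ("cat", ("sym", "a2"), ("sym", "b2")))),
           ("sym", "b3"))
nullable, first, last, follow = glushkov_sets(r_sharp)
```

From these sets, δ(q0, a) collects the positions in `first` whose label is a, δ(x, a) collects the positions in `follow[x]` whose label is a, and F is `last`, plus q0 when `nullable` holds.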

We define the DTD automaton formally. Let D = (d, s) be a DTD over Σ. Let Σv = {av | a ∈ Σ} be the set of vertical labels. Let Ga = (Qa, Σ, δa, q0^a, Fa) be the Glushkov automaton of d(a). Without loss of generality, we assume that Qa ∩ Qb = ∅ whenever a ≠ b, where Qb is the set of states of the Glushkov automaton of d(b). Then the DTD automaton of D w.r.t. CSS rule r is defined as an NFA M = (Q, Σv ∪ Σ, δ, r0, F), where Q, δ, and F are defined as follows:

• First, Q is defined as follows:

Q = ⋃_{a∈Σ} Qa ∪ {r0, s0},

where Qa is the set of states of Ga, r0 is the initial state, and s0 is the state representing the start label s.

• δ is obtained by taking the union of (a) δa (the transition function of Ga) of each a and (b) δv, where δa represents “horizontal transitions” and δv represents “vertical transitions.” Here, δv is defined as follows:

\[
\delta_v(x, c_v) =
\begin{cases}
\{s_0\} & \text{if } x = r_0 \text{ and } c_v = s_v, \\
\{y \mid y \in Q_s,\ y^{\natural} = c\} & \text{if } x = s_0, \\
\{y \mid y \in Q_b,\ y^{\natural} = c\} & \text{if } x = b^i \in Q_a \text{ for some } i \text{ and some } a, b \in \Sigma, \\
\emptyset & \text{otherwise.}
\end{cases}
\]

Now, δ : Q × (Σv ∪ Σ) → Q is defined by using δa and δv as follows:

\[
\delta(x, c_*) =
\begin{cases}
\delta_a(x, c_*) & \text{if } c_* \in \Sigma \text{ and } x \in Q_a \text{ for some } a, \\
\delta_v(x, c_*) & \text{if } c_* \in \Sigma_v.
\end{cases}
\]

• F is determined by the last simple selector of r as follows:

F = {x | x ∈ Q, x♮ = last(sel(r))}.

For example, let us consider Fig. 5. If r = a c p : v, then F = {c1}.

One can see that the DTD automaton exactly covers every valid ext-path.

Lemma 1: Let D = (d, s) be a DTD and M = (Q, Σv ∪ Σ, δ, r0, F) be the DTD automaton of D. Then, there exists a tree t valid for D containing an ext-path p = v1, v2, . . . , vn if and only if there is a sequence s = x0, x1, . . . , xn of states of M corresponding to p. In other words, p and s satisfy

• x0 = r0 and x1 = s0, and
• for 2 ≤ i ≤ n,
  – xi ∈ δ(xi−1, l(vi)) if vi is the immediate right sibling of vi−1,
  – xi ∈ δ(xi−1, l(vi)v) if vi is a child of vi−1.

We now have the following theorem:

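Theorem 11: UNSAT(CSS{ ,>,+,˜}) is in PSPACE under general DTDs.

Sketch of Proof. Let sel = s1c1s2 · · · cn−1sn be a selector, where si is a simple selector and ci is a combinator. We first define the regular expression representation of sel, denoted re(sel), as follows: re(sel) = c′0s′1c′1s′2 · · · c′n−1s′n, where

\[
c'_i =
\begin{cases}
(\Sigma_v)^* & \text{if (a) } i = 0 \text{ or (b) } i \ge 1 \text{ and } c_i = \text{‘ ’}, \\
(\Sigma)^* & \text{if } c_i = \text{‘˜’}, \\
\varepsilon & \text{if } c_i \in \{>, +\},
\end{cases}
\]

for 0 ≤ i ≤ n − 1 and

\[
s'_i =
\begin{cases}
(s_i)_v & \text{if } s_i \in \Sigma, \text{ and } i = 1 \text{ or } c_{i-1} \in \{\text{‘ ’}, >\}, \\
|_{a_v \in \Sigma_v}\, a_v & \text{if } s_i = \text{‘∗’} \text{ and } c_{i-1} \in \{\text{‘ ’}, >\}, \\
s_i & \text{if } s_i \in \Sigma \text{ and } c_{i-1} \in \{+, \text{˜}\}, \\
|_{a \in \Sigma}\, a & \text{if } s_i = \text{‘∗’} \text{ and } c_{i-1} \in \{+, \text{˜}\},
\end{cases}
\]

for 1 ≤ i ≤ n. For example, if sel = a ˜ b > c, then re(sel) = (Σv)∗ av (Σ)∗ b cv.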
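The case definitions of c′i and s′i translate directly into a small conversion routine. The Python sketch below returns re(sel) as a readable token string rather than as a compiled automaton. The list-based selector encoding, the ASCII "~" for the general sibling combinator, and the treatment of a leading universal selector as vertical are assumptions made for this illustration.

```python
def re_of_selector(simples, combs):
    """Build re(sel) as a space-separated token string.

    simples: simple selectors (labels or "*"), e.g. ["a", "b", "c"].
    combs:   combinators between them: " ", ">", "+", or "~".
    """
    parts = ["(Σv)*"]  # c'_0 is always (Σv)*
    for i, s in enumerate(simples):
        prev = combs[i - 1] if i > 0 else None
        # Vertical reading: first simple selector, or reached via ' ' or '>'.
        # (Assumption: a leading "*" is also read vertically.)
        vertical = i == 0 or prev in (" ", ">")
        if s == "*":
            parts.append("(|_{av∈Σv} av)" if vertical else "(|_{a∈Σ} a)")
        else:
            parts.append(s + "v" if vertical else s)
        if i < len(combs):
            c = combs[i]
            if c == " ":
                parts.append("(Σv)*")
            elif c == "~":
                parts.append("(Σ)*")
            # '>' and '+' contribute ε, so nothing is appended
    return " ".join(parts)


example = re_of_selector(["a", "b", "c"], ["~", ">"])
```

For the selector a ˜ b > c this yields the same expression as the example in the text.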

By using the DTD automaton and the above regular expression representation of a selector, we can check if r ∈ R is unsatisfiable under a DTD D w.r.t. R as follows:

1. Construct the DTD automaton M of D.
2. Convert sel(r) into its regular expression representation re(sel(r)).
3. Let r1, r2, . . . , rk ∈ R be the CSS rules that may conflict with r. That is, for every 1 ≤ i ≤ k, ri satisfies last(sel(ri)) = last(sel(r)), prop(ri) = prop(r), and (a) spec(sel(ri)) > spec(sel(r)) or (b) spec(sel(ri)) = spec(sel(r)) and indexR(ri) > indexR(r). Convert sel(ri) into its regular expression representation re(sel(ri)) for every 1 ≤ i ≤ k.
4. Determine if

L(re(sel(r))) ∩ L(M) ⊆ ⋃_{1≤i≤k} L(re(sel(ri))) ∩ L(M).

By this and Lemma 1, one can see that the containment in step 4 holds if and only if r is unsatisfiable under D w.r.t. R.

Since the containment problem for regular expressions is PSPACE-complete [18], step 4 can be done in PSPACE. □

4. Tractability

In this section, we present four different conditions under which the CSS unsatisfiability problem can be solved in polynomial time. As shown in the previous section, the problem is intractable under very restricted DTDs and with limited combinators. Therefore, we explore tractable conditions from a different perspective.

4.1 Conditions R1 and R2: Disabling Universal Selectors

In Theorem 2, we showed that UNSAT(CSS{>}) is coNP-hard under very restricted DTDs. Recall that the proof of the coNP-hardness essentially depends on the universal selector ‘∗’. In the following, we show that the problem becomes tractable even under general DTDs if the universal selector is disabled.

Let R be a CSS code and r ∈ R. By CR(r) we mean the set of CSS rules in R that may conflict with r, that is,

CR(r) = {r′ ∈ R | last(sel(r)) = last(sel(r′)), prop(r) = prop(r′), (spec(sel(r′)) > spec(sel(r)) or (spec(sel(r′)) = spec(sel(r)) and indexR(r′) > indexR(r)))}.

If a selector sel uses only child combinators and labels, then sel is called a child-label selector. For example, a > b > c is a child-label selector but a > ∗ c is not. We show that if the following condition R1 holds, then whether a rule r ∈ R is satisfiable under D w.r.t. R can be determined in PTIME.

R1: sel(r) is a child-label selector and for all r′ ∈ CR(r), sel(r′) is a child-label selector.
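The conflict set CR(r) and condition R1 can be sketched in Python as follows. The `Rule` record and the function names are hypothetical. The specificity function is also an assumption: it counts the labels (non-universal simple selectors) of a selector, in line with the informal description in the introduction, and may differ from the formal spec used elsewhere in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    simples: tuple  # simple selectors, e.g. ("a", "b")
    combs: tuple    # combinators between them, e.g. (" ",)
    prop: str
    value: str


def spec(rule):
    # Assumption: specificity is the number of labels in the selector.
    return sum(1 for s in rule.simples if s != "*")


def conflict_set(rules, idx):
    """Indices of the rules in C_R(r) for r = rules[idx]."""
    r = rules[idx]
    return [
        j for j, q in enumerate(rules)
        if q.simples[-1] == r.simples[-1]
        and q.prop == r.prop
        and (spec(q) > spec(r) or (spec(q) == spec(r) and j > idx))
    ]


def is_child_label(rule):
    return all(c == ">" for c in rule.combs) and "*" not in rule.simples


def satisfies_r1(rules, idx):
    return is_child_label(rules[idx]) and all(
        is_child_label(rules[j]) for j in conflict_set(rules, idx)
    )


# R = [a b p:v1, b p:v2, c b p:v3, d b p:v4]
R = [
    Rule(("a", "b"), (" ",), "p", "v1"),
    Rule(("b",), (), "p", "v2"),
    Rule(("c", "b"), (" ",), "p", "v3"),
    Rule(("d", "b"), (" ",), "p", "v4"),
]
```

With this encoding, the conflict set of the first rule consists of the third and fourth rules, since they have the same specificity and appear later in R.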

Let D = (d, s) be a DTD and M = (Q, Σv ∪ Σ, δ, r0, F) be the DTD automaton of D. For a child-label selector sel = s1 > s2 > · · · > sn, by QI(sel) we mean the set of “start” states q0 of a vertical path q0 --(s1)v--> q1 --(s2)v--> · · · --(sn)v--> qn in M, that is,

QI(sel) = {q0 | qi ∈ δ(qi−1, (si)v) for 1 ≤ i ≤ n, qi ∈ Q for 0 ≤ i ≤ n}.

We now have the following theorem.

Theorem 12: If condition R1 holds for all r ∈ R, then UNSAT(CSS{>}) is in PTIME under general DTDs.

Sketch of Proof. Let sel, sel′ be child-label selectors, where sel = s1 > s2 > · · · > sn and sel′ = s′1 > s′2 > · · · > s′m. We say that sel is a suffix of sel′ if n < m and si = s′_{i+m−n} for every 1 ≤ i ≤ n. By condition R1, it suffices to consider only rules r′ ∈ CR(r) such that sel(r) is a suffix of sel(r′). Let

C^sfx_R(r) = {r′ ∈ CR(r) | sel(r) is a suffix of sel(r′)}.

Then it is easy to show that r is satisfiable under D w.r.t. R if and only if there is a state q ∈ Q such that

1. q is reachable from r0,
2. for some q′ ∈ QI(sel(r)), q′ is reachable from q over M, and
3. for any r′ ∈ C^sfx_R(r) and any q′ ∈ QI(sel(r′)), q′ is not reachable from q over M.

The second condition means that for some valid tree t, r can be applied to a node on t. The third condition means that no rule in C^sfx_R(r) can be applied to the node. Both conditions can be checked in polynomial time. □

A similar argument can also be applied to CSS{+}. By Theorem 4, UNSAT(CSS{+}) is coNP-hard under very restricted DTDs. However, similar to the above theorem, we can show that the problem can be solved in PTIME if the universal selector is disabled.

We say that sel is an is-label selector if sel uses only immediate sibling combinators and labels. We present the “horizontal version” of condition R1, as follows.

R2: sel(r) is an is-label selector and for all r′ ∈ CR(r), sel(r′) is an is-label selector.

Similar to Theorem 12, we can show the following theorem. The only difference from Theorem 12 is that this theorem uses horizontal transitions instead of vertical transitions.

Theorem 13: If condition R2 holds for all r ∈ R, then UNSAT(CSS{+}) is in PTIME under general DTDs. □

4.2 Condition R3: Restricting the Number of Conflicting Rules

The third condition is to restrict the number of conflicting rules and the form of their selectors.

To define the condition, we present a restricted form of selector. Consider dividing a selector sel by its descendant/general sibling combinators. Then sel can be denoted

sel = sel1sel2 · · · selm, (3)

where seli is a selector such that the first combinator is either a descendant or general sibling combinator and the others are child or immediate sibling combinators. We say that seli is specific if each simple selector in seli is a label and s ≠ s′ for any distinct simple selectors s, s′ in seli. For example, if seli = ˜a + b > c and selj = a + ∗ > a, then seli is specific but selj is not. We say that sel is specific if seli is specific for every 1 ≤ i ≤ m.

Let c > 0 be a constant number. We show that if the following condition holds, then the unsatisfiability of r can be determined in polynomial time.

R3: (1) |CR(r)| < c and (2) if |CR(r)| ≥ 1, then sel(r′) is specific for any r′ ∈ CR(r).

Theorem 14: If condition R3 holds for all r ∈ R, then UNSAT(CSS{ ,>,+,˜}) is in PTIME under general DTDs.

Sketch of Proof. Let r ∈ R be a CSS rule. If |CR(r)| = 0, then whether r is unsatisfiable can be determined in PTIME by taking the intersection of the DTD automaton of D and the NFA of re(sel(r)). Consider the case where |CR(r)| ≥ 1. Assume that |CR(r)| < c and that sel(r′) is specific for any r′ ∈ CR(r). Then sel(r′) can be represented by a regular expression re(sel(r′)), which is represented by an NFA. But in this case sel(r′) is very restricted, i.e., specific. This implies that we can construct a DFA M(sel(r′)) equivalent to re(sel(r′)) without exponential state explosion (a more detailed construction is presented in the appendix).

By using M(sel(r′)), whether r is unsatisfiable w.r.t. R under D can be determined as follows:

1. Construct M(sel(r)) from sel(r).
2. Let CR(r) = {r′1, r′2, . . . , r′m} (m < c). Construct a DFA M(sel(r′i)) for 1 ≤ i ≤ m.
3. Construct the following NFA:

M′I = M(sel(r)) ∩ ¬M(sel(r′1)) ∩ ¬M(sel(r′2)) ∩ · · · ∩ ¬M(sel(r′m)).

4. Construct the intersection MI = M′I ∩ MD, where MD is the DTD automaton of D.
5. Determine if L(MI) is empty.

Then it is easy to see that L(MI) is empty if and only if r is unsatisfiable w.r.t. R under D.

Since M(sel(r′i)) is deterministic, ¬M(sel(r′i)) can be obtained in linear time. Furthermore, M′I is the intersection of m DFAs and one NFA, which can be obtained in polynomial time since m < c. Therefore, M′I can be obtained in polynomial time. □
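The emptiness test of steps 3 to 5 can be illustrated with a product construction explored on the fly. The Python sketch below is a simplification and not the construction of the appendix: it assumes that every component is a complete DFA given as a transition dictionary, whereas the DTD automaton in the proof is an NFA and would need to be tracked as a set of states. A component flagged with `negate` is complemented by flipping its acceptance test.

```python
from collections import deque


def nonempty_intersection(dfas, alphabet):
    """True if some word is accepted by every component.

    dfas: list of (delta, start, accepting, negate), where delta is a total
          dict (state, symbol) -> state and negate=True means the complement.
    """
    start = tuple(d[1] for d in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        cur = queue.popleft()
        # A product state accepts if each component accepts (or rejects, when negated).
        if all((q in acc) != neg for q, (_, _, acc, neg) in zip(cur, dfas)):
            return True
        for a in alphabet:
            nxt = tuple(delta[(q, a)] for q, (delta, _, _, _) in zip(cur, dfas))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Toy components over {a, b}: A accepts words ending in b, B accepts words containing a.
A = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
B = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
```

The number of product states is the product of the component sizes, so the search stays polynomial only because the number of complemented components is bounded by the constant c.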

4.3 Condition R4: Restricting Length of CSS Rules

We now present the last condition under which the problem can be solved in PTIME. In short, the problem becomes tractable by restricting the length of selectors to no greater than two, even if there is no restriction on the number of conflicting rules and their selectors.

For a CSS rule r and a DTD D, to check if r is unsatisfiable, we use the DTD automaton M of D and find states that are matched by r but not matched by the rules conflicting with r. In general, such states cannot be found efficiently according to the results presented in the previous section. However, we show that such states can be found efficiently under the following condition:

R4: If |CR(r)| ≥ 1, then (1) len(sel(r)) ≤ 2 and len(sel(r′)) = 2 for every r′ ∈ CR(r), and (2) the simple selectors of r and any rule in CR(r) are labels.

To present an algorithm for solving the problem under condition R4, we give some definitions. Assuming condition R4 with |CR(r)| ≥ 1, by C^c_R(r) we mean the set of CSS rules in CR(r) whose combinator is c, that is,

C^c_R(r) = {r′ ∈ CR(r) | cmb(sel(r′)) = c},

where c ∈ {‘ ’, >, ˜, +} and cmb(sel(r′)) denotes the combinator of sel(r′) (since len(sel(r′)) = 2, sel(r′) has exactly one combinator). We also use the following abbreviation:

first(C^c_R(r)) = {first(sel(r′)) | r′ ∈ C^c_R(r)}.

For example, let R = [r1, r2, r3, r4], where

r1 = a b p :v1,

r2 = b p :v2,


Fig. 6 Straight ext-path p from v1 to vn and siblings of vn

r3 = c b p :v3,

r4 = d b p :v4.

Then CR(r1) = {r3, r4} and first(C^{‘ ’}_R(r1)) = {c, d}.

Under condition R4, we can focus on only ext-paths with a simple shape. Consider the case where r is a CSS rule with cmb(sel(r)) = ‘ ’. Then any ext-path we need to check is a straight ext-path p together with siblings of the “bottom” node vn = v′k (Fig. 6). To see this, suppose that sel(r) matches p. To check if r is applied to vn, we need to verify that none of the rules in CR(r) is applied to vn, where CR(r) = C^{‘ ’}_R(r) ∪ C^>_R(r) ∪ C^˜_R(r) ∪ C^+_R(r). For rules in C^{‘ ’}_R(r) ∪ C^>_R(r), this can be checked by traversing only p. For rules in C^˜_R(r) ∪ C^+_R(r), this can be done by checking only the labels of (left) siblings of vn = v′k. Similar arguments can also be applied to the other cases. Thus, to determine if r is satisfiable under a DTD D, our algorithm explores the DTD automaton of D and checks if D allows such a straight ext-path (with the siblings of the bottom node) to which r can be applied.

Let M = (Q, Σv ∪ Σ, δ, r0, F) be the DTD automaton of D. Our algorithm first explores M and finds states that are matched by r but not matched by the rules in C^{‘ ’}_R(r). To obtain such states, for a set S ⊆ Q and a set Skip of labels, we define the set of pairs of parent-child states (x, y) such that y is vertically reachable in “i-hops” from a state in S without using Skip, denoted RS^i_v(M, S, Skip), as follows:

\[
RS^{i}_{v}(M, S, \mathit{Skip}) =
\begin{cases}
\{(x, y) \mid x \in S,\ x^{\natural} \notin \mathit{Skip},\ y \in \delta(x, c_v),\ c_v \in \Sigma_v\} & \text{if } i = 1, \\
\{(x, y) \mid x \in RS^{i-1}_{v}(M, S, \mathit{Skip}),\ x^{\natural} \notin \mathit{Skip},\ y \in \delta(x, c_v),\ c_v \in \Sigma_v\} & \text{if } i \ge 2.
\end{cases}
\]

Then the “any-hops” version of RS^i_v(M, S, Skip), denoted RS^+_v(M, S, Skip), is defined as follows:

RS^+_v(M, S, Skip) = ⋃_{i≥1} RS^i_v(M, S, Skip).

By RS^+_v(M, S, Skip, a) we mean the set of pairs of parent-child states (x, y) in RS^+_v(M, S, Skip) such that y♮ = a, that is,

RS^+_v(M, S, Skip, a) = {(x, y) | (x, y) ∈ RS^+_v(M, S, Skip), y♮ = a}.
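The any-hops set can be computed as a fixpoint over the vertical transitions. The Python sketch below is an illustration under stated assumptions: the vertical transitions are given as a dictionary from a state to the set of its child states, the condition on x in the case i ≥ 2 is read as “x is the second component of a pair found in the previous round”, and the example transitions are derived by hand from the definitions for the DTD of Fig. 5 rather than read off the figure.

```python
def rs_plus(delta_v, label, S, skip, a=None):
    """Pairs (x, y) such that y is vertically reachable from S in one or more hops,
    never stepping down from a state whose label is in skip.
    If a is given, keep only the pairs whose child state y has label a."""
    pairs, expanded, frontier = set(), set(), set(S)
    while frontier:
        x = frontier.pop()
        if x in expanded:
            continue
        expanded.add(x)
        if label(x) in skip:
            continue  # states labeled by Skip may be reached but not expanded
        for y in delta_v.get(x, ()):
            pairs.add((x, y))
            frontier.add(y)
    if a is not None:
        pairs = {(x, y) for (x, y) in pairs if label(y) == a}
    return pairs


def drop_superscript(state):
    # "b1" -> "b"; the special states "r0" and "s0" become "r" and "s".
    return state.rstrip("0123456789")


# Hand-derived vertical transitions for d(s) = ab*, d(a) = eps, d(b) = sc, d(c) = eps.
delta_v = {
    "r0": {"s0"},
    "s0": {"a1", "b1"},
    "b1": {"s1", "c1"},
    "s1": {"a1", "b1"},
}
```

Because every state is expanded at most once, the computation terminates even when the DTD is recursive, and its cost is linear in the number of vertical transitions.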

Algorithm 1 CheckUnsatisfiability
Input: DTD D = (d, s), CSS code R and CSS rule r ∈ R satisfying condition R4
Output: “satisfiable” or “unsatisfiable”
1: Construct the DTD automaton M = (Q, Σ ∪ Σv, δ, r0, F) of D.
2: N1 ← RS^+_v(M, {r0}, first(C^{‘ ’}_R(r)), first(sel(r))).
3: if cmb(sel(r)) ∈ {‘ ’, >} then
4:   if cmb(sel(r)) = ‘ ’ then
5:     N2 ← RS^+_v(M, π2(N1), first(C^{‘ ’}_R(r)), last(sel(r))).
6:   else
7:     N2 ← RS^1_v(M, π2(N1), first(C^{‘ ’}_R(r)), last(sel(r))).
8:   if for some (y′, z) ∈ N2, y′♮ ∉ first(C^>_R(r)) and there is a sequence q0, q1, . . . , qm of states of G_{y′♮} satisfying the following:
       a) q0 is the initial state, qk = z for some 1 ≤ k ≤ m, qm is a final state, and qi ∈ δ(qi−1, (qi)♮) for every 1 ≤ i ≤ m,
       b) (qi)♮ ∉ first(C^˜_R(r)) for every 1 ≤ i ≤ k − 1, and
       c) (qk−1)♮ ∉ first(C^+_R(r)).
     then
9:     return “satisfiable”.
10:  return “unsatisfiable”.
11: else if cmb(sel(r)) ∈ {˜, +} then
12:  if for some (x, y) ∈ N1, x♮ ∉ first(C^>_R(r)) and there is a sequence q0, q1, . . . , qm of states of G_{x♮} satisfying the following:
       a) q0 is the initial state, qj = y for some 1 ≤ j ≤ m − 1, qm is a final state, and qi ∈ δ(qi−1, (qi)♮) for every 1 ≤ i ≤ m,
       b) if cmb(sel(r)) = ˜, then (qk)♮ ∈ last(sel(r)) for some k > j, otherwise (qk)♮ ∈ last(sel(r)) with k = j + 1,
       c) (qi)♮ ∉ first(C^˜_R(r)) for every 1 ≤ i ≤ k − 1, and
       d) (qk−1)♮ ∉ first(C^+_R(r)).
     then
13:    return “satisfiable”.
14:  return “unsatisfiable”.

We now present our algorithm (Algorithm 1). For simplicity, in the following we present an algorithm for the case of len(sel(r)) = 2 (the algorithm for the case of len(sel(r)) = 1 can be obtained similarly). Under condition R4, Algorithm 1 checks if a given CSS rule r ∈ R is unsatisfiable under D. The algorithm has two parts: the first part (lines 3 to 10) checks the satisfiability of r in the case of cmb(sel(r)) ∈ {‘ ’, >}, and the second part (lines 11 to 14) checks the satisfiability of r in the case of cmb(sel(r)) ∈ {˜, +}. The outline of the first part is shown in Fig. 7. This part finds a state z that is reachable from r0 via sel(r) and is not “blocked” by any rule in C^{‘ ’}_R(r) (lines 2 to 7), and then checks whether z is not “blocked” by any rule in C^>_R(r), C^˜_R(r), or C^+_R(r) (line 8). In line 2, N1 is the set of pairs of states (x, y) such that y♮ = first(sel(r)) and that y is vertically reachable from r0 without using any label in first(C^{‘ ’}_R(r)). In lines 5 and 7, by π2(N1) we mean the second column of N1, i.e., π2(N1) = {y | (x, y) ∈ N1}. Thus N2 is the set of parent-child pairs (y′, z) such that z♮ = last(sel(r)) and that z is vertically reachable in any-hops (line 5) or one-hop (line 7) from some state y ∈ π2(N1) without using any label in first(C^{‘ ’}_R(r)). Finally, line 8 checks if z is not blocked by any rule in C^>_R(r), C^˜_R(r), and C^+_R(r). In line 8, G_{y′♮} denotes the Glushkov automaton of y′♮ of M. The second part proceeds similarly and its outline is shown in Fig. 8.

Fig. 7 Outline of the first step of the algorithm

Fig. 8 Outline of the second step of the algorithm

In the following, we show the correctness of the algorithm. We first show the following lemma.

Lemma 2: Let D = (d, s) be a DTD and M = (Q, Σv ∪ Σ, δ, r0, F) be the DTD automaton of D. Then, there exists a tree t valid for D containing a straight ext-path p = v1, v2, . . . , vn with the siblings v′1, v′2, . . . , v′m of vn (i.e., vn = v′j for some j) if and only if there is a sequence x0, x1, . . . , xn of states of M satisfying the following:

• x0 = r0,
• xi ∈ δ(xi−1, l(vi)v) for every 1 ≤ i ≤ n, and
• there is a sequence q0, q1, . . . , qm of states corresponding to the siblings of vn, i.e.,
  – q0 is the initial state of G_{(xn−1)♮}, qk = xn for some k, and qm is an accepting state of G_{(xn−1)♮}, and
  – qi ∈ δ(qi−1, l(v′i)) for every 1 ≤ i ≤ m,

where G_{(xn−1)♮} is the Glushkov automaton of (xn−1)♮.

We have the following theorem:

Theorem 15: Let D = (d, s) be a DTD, R be a CSS code, and r ∈ R. If condition R4 holds, then r is unsatisfiable under D w.r.t. R if and only if CheckUnsatisfiability returns “unsatisfiable”.

Sketch of Proof. Assume that condition R4 holds. We show the case where cmb(sel(r)) = ‘ ’ (the other cases can be shown similarly). By condition R4, we have CR(r) = C^{‘ ’}_R(r) ∪ C^>_R(r) ∪ C^˜_R(r) ∪ C^+_R(r). Let t be a tree valid for D. Consider a straight ext-path p = v1, v2, . . . , vn in t with the siblings v′1, v′2, . . . , v′m of vn = v′k for some k (Fig. 6). Here, suppose that sel(r) matches p but r is not applied to vn. By condition R4, the reason must be one of the following.

1. Some r′ ∈ C^{‘ ’}_R(r) is applied to vn. In this case, it must hold that l(vi) = first(sel(r′)) for some 1 ≤ i ≤ n − 1 and that l(vn) = last(sel(r′)).

2. Some r′ ∈ C^>_R(r) is applied to vn. In this case, it must hold that l(vn−1) = first(sel(r′)) and that l(vn) = last(sel(r′)).

3. Some r′ ∈ C^˜_R(r) is applied to vn. In this case, l(vn) = last(sel(r′)) and there must be a left sibling v′i (1 ≤ i ≤ k − 1) of vn = v′k such that l(v′i) = first(sel(r′)).

4. Some r′ ∈ C^+_R(r) is applied to vn. In this case, l(vn) = last(sel(r′)) and the immediate left sibling v′k−1 of vn = v′k must satisfy l(v′k−1) = first(sel(r′)).

Therefore, r is satisfiable under D w.r.t. R if and only if there is a tree t valid for D containing a straight ext-path p = v1, v2, . . . , vn with the siblings v′1, v′2, . . . , v′m of vn = v′k such that sel(r) matches p and that none of the above conditions 1 to 4 holds.

In the following, we show that r is unsatisfiable under D w.r.t. R if and only if CheckUnsatisfiability returns “unsatisfiable”.

⇐) Suppose that CheckUnsatisfiability returns “unsatisfiable” on line 10. Then (1) N2 = ∅ or (2) for any (y′, z) ∈ N2, y′♮ ∈ first(C^>_R(r)) or there is no sequence q0, q1, . . . , qm of states satisfying the conditions (a) to (c) on line 8. In the case of (1), any state matched by sel(r) is also matched by sel(r′) for some r′ ∈ C^{‘ ’}_R(r). In the case of (2), qk = z is matched by sel(r) but also matched by sel(r′) for some r′ in C^>_R(r), C^˜_R(r), or C^+_R(r). In either case, by Lemma 2 there is no tree t valid for D containing a straight ext-path, say p = v1, v2, . . . , vn, such that sel(r) matches p but no rule in CR(r) is applied to vn. Therefore, r is unsatisfiable.

⇒) Suppose that CheckUnsatisfiability returns “satisfiable”. Then there is a sequence x0, x1, . . . , xn of states of M satisfying the following conditions:

1. From the construction of N1 and N2, we have x0 = r0, xi ∈ δ(xi−1, (xi)♮) for every 1 ≤ i ≤ n, (xi)♮ = first(sel(r)) for some 1 ≤ i < n, (xn)♮ = last(sel(r)), and (xi)♮ ∉ first(C^{‘ ’}_R(r)) for any 1 ≤ i < n.

2. From line 8, there is a sequence of states q0, q1, . . . , qm such that qk = xn for some k and that the conditions (a) to (c) in line 8 hold.

By Lemma 2, there exists a tree t valid for D containing a straight ext-path p = v1, v2, . . . , vn such that l(vi) = (xi)♮ for every 1 ≤ i ≤ n and that vn = v′k has siblings v′1, v′2, . . . , v′m such that l(v′i) = (qi)♮ for every 1 ≤ i ≤ m. By the conditions (a) to (c) in line 8, no rule in C^>_R(r), C^˜_R(r), or C^+_R(r) is applied to vn. Hence r is applied to vn and thus r is satisfiable. □

Consider the running time of the algorithm. RS^+_v(M, S, Skip, a) can be obtained in polynomial time. The conditions in lines 8 and 12 can be checked by solving reachability problems with a few additional conditions. Hence it is easy to show that CheckUnsatisfiability runs in polynomial time. Therefore, we have the following:

Theorem 16: If condition R4 holds for all r ∈ R, then UNSAT(CSS{ ,>,˜,+}) is in PTIME under general DTDs.

5. Conclusion

In this paper, we considered the CSS unsatisfiability problem. First, we showed that the problem is coNP-hard under closure-free, duplicate-free, and non-recursive DTDs, even if only one of the four combinators of CSS is allowed. Next, we showed that the problem is coNP-hard even if DTDs are restricted to be disjunction-free and only the child or the descendant combinator is allowed. We also showed that the problem is in coNP or PSPACE depending on restrictions on DTDs and CSS. Finally, we presented conditions R1 to R4 under which the problem can be solved in polynomial time.

However, we still have much work to do. First, as shown in Table 1, the upper and lower bounds of complexity are not tight in several cases. Therefore, we need to identify tight upper and lower bounds for these cases. In particular, we will investigate whether or not PSPACE-completeness of the problem holds when no restrictions are placed on DTDs or CSS. At present, we consider that the tight upper bound would be Σ^p_i or Π^p_i, and we would like to try to prove it.

Second, we presented several conditions under which the CSS unsatisfiability problem is tractable, but it is not clear to what extent these conditions are supported by real-world CSS codes. Therefore, we need to investigate real-world CSS codes to clarify this point. We expect that real-world CSS rules tend to have “short” selectors, meaning that condition R4 could be supported by many real-world CSS rules.

Third, we need to consider CSS selectors that were not considered in this paper, e.g., attributes (e.g., id and class), selectors using “first-child” and “last-child” pseudo classes, and so on.

Finally, we showed that the problem becomes tractable by restricting the length of selectors to no greater than two (condition R4). On the other hand, by Theorems 1 and 3, the problem becomes intractable in the case where the length of selectors is four. However, the (in)tractability of the case where the length of selectors is three is not identified, which is left as future work.

Acknowledgments

The authors are thankful to anonymous reviewers for their insightful comments and suggestions.

References

[1] M. Benedikt, W. Fan, and F. Geerts, “XPath satisfiability in the presence of DTDs,” J. ACM, vol.55, no.2, pp.8:1–8:79, May 2008.

[2] B. Bos, D. Carlisle, P.D.F. Ion, and B.R. Miller, eds., “A MathML for CSS profile,” https://www.w3.org/TR/mathml-for-css/.

[3] M. Bosch, P. Genevès, and N. Layaïda, “Automated refactoring for size reduction of CSS style sheets,” Proc. 2014 ACM Symposium on Document Engineering, DocEng ’14, pp.13–16, 2014.

[4] A. Brüggemann-Klein, “Regular expressions into finite automata,” Theoretical Computer Science, vol.120, no.2, pp.197–213, 1993.

[5] Firebug Working Group, “FireBug,” https://www.getfirebug.com/.

[6] M.R. Garey and D.S. Johnson, Computers and Intractability - A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979.

[7] P. Genevès, N. Layaïda, and V. Quint, “On the analysis of cascading style sheets,” Proc. 21st International Conference on World Wide Web, WWW ’12, pp.809–818, 2012.

[8] V.M. Glushkov, “The abstract theory of automata,” Russian Math. Surveys, vol.16, pp.1–53, 1961.

[9] Google Inc., “Chrome developer tools,” https://developers.google.com/web/tools/chrome-devtools/.

[10] M. Hague, A.W. Lin, and C.-H.L. Ong, “Detecting redundant CSS rules in HTML5 applications: A tree rewriting approach,” SIGPLAN Not., vol.50, no.10, pp.1–19, Oct. 2015.

[11] O. Hartig, Querying a Web of Linked Data, IOS Press, 2016.

[12] Y. Ishihara, N. Suzuki, K. Hashimoto, S. Shimizu, and T. Fujiwara, “XPath satisfiability with parent axes or qualifiers is tractable under many of real-world DTDs,” Proc. 14th International Symposium on Database Programming Languages (DBPL 2013), Aug. 30, 2013, Riva del Garda, Trento, Italy, 2013.

[13] D. Mazinanian and N. Tsantalis, “Migrating cascading style sheets to preprocessors by introducing mixins,” Proc. 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, pp.672–683, 2016.

[14] D. Mazinanian, N. Tsantalis, and A. Mesbah, “Discovering refactoring opportunities in cascading style sheets,” Proc. 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pp.496–506, 2014.

[15] A. Mesbah and S. Mirshokraie, “Automated analysis of CSS rules to support style maintenance,” Proc. 34th International Conference on Software Engineering, ICSE ’12, pp.408–418, 2012.

[16] M. Montazerian, P.T. Wood, and S.R. Mousavi, “XPath query satisfiability is in PTIME for real-world DTDs,” Proc. 5th International Conference on Database and XML Technologies, XSym ’07, pp.17–30, 2007.

[17] H. Yamada and R. McNaughton, “Regular expressions and state graphs for automata,” IRE Trans. Electron. Comput., vol.EC-9, no.1, pp.39–47, 1960.

[18] L.J. Stockmeyer and A.R. Meyer, “Word problems requiring exponential time (preliminary report),” Proc. Fifth Annual ACM Symposium on Theory of Computing, STOC ’73, pp.1–9, 1973.

[19] N. Suzuki, Y. Fukushima, and K. Ikeda, “Satisfiability of simple XPath fragments under duplicate-free DTDs,” IEICE Trans. Inf. & Syst., vol.E96-D, no.5, pp.1029–1042, 2013.

[20] N. Suzuki, T. Okada, and Y. Kwon, “Detecting unsatisfiable CSS rules in the presence of DTDs,” Proc. 17th ACM SIGPLAN International Symposium on Database Programming Languages, pp.18–29, 2019.

[21] N. Walsh, “DocBookCssStylesheets,” https://github.com/docbook/wiki/wiki/DocBookCssStylesheets/.

[22] X. Zhang and J. Van den Bussche, “On the satisfiability problem for SPARQL patterns,” Journal of Artificial Intelligence Research, vol.56, pp.403–428, 2016.

Appendix: DFA Construction for Specific Selectors

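Let $r = \mathit{sel}\ p : v$ be a CSS rule such that $\mathit{sel}$ is specific. We show that $\mathit{sel}$ can be converted into a DFA equivalent to $re(\mathit{sel})$ in polynomial time. We first represent $\mathit{sel}$ by an equivalent regular expression. As shown in (3), $\mathit{sel}$ can be divided as follows:

\[ \mathit{sel} = \mathit{sel}_1 \mathit{sel}_2 \cdots \mathit{sel}_m. \]

Here, $\mathit{sel}_i$ can be written as

\[ \mathit{sel}_i = c_1 s_1 c_2 s_2 \cdots c_{n_i} s_{n_i}, \]

where

• $c_1 = \varepsilon$ if $i = 1$ and $c_1 \in \{\text{‘ ’}, {\sim}\}$ otherwise, and $c_j \in \{+, >\}$ for $2 \le j \le n_i$, and

• $s_j \in \Sigma$ for $1 \le j \le n_i$ and $s_j \ne s_k$ whenever $j \ne k$.

Then $re(\mathit{sel}_i)$ can be written as follows.

\[ re(\mathit{sel}_i) = S_i a_1 a_2 \cdots a_{n_i}, \]

where

\[ S_i = \begin{cases} \Sigma_v^* & \text{if } c_1 = \text{‘ ’}, \\ \Sigma^* & \text{if } c_1 = {\sim}, \end{cases} \]

and

\[ a_j = \begin{cases} (s_j)_v & \text{if } c_j \in \{\text{‘ ’}, >\}, \\ s_j & \text{if } c_j \in \{{\sim}, +\} \end{cases} \qquad (1 \le j \le n_i). \]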
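The translation from a selector to the sequence $re(\mathit{sel}_1), \ldots, re(\mathit{sel}_m)$ can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `Step`, `split_segments`, and `re_of_segment` are hypothetical, a selector is assumed to be given as a pre-tokenized list of (combinator, label) pairs, a symbol of $\Sigma_v$ is written as the label followed by `_v`, and the empty combinator of the first step is treated like the descendant combinator, which is how the worked example at the end of this appendix reads.

```python
from typing import List, Tuple

# One selector step: (combinator, label). The combinator is "" for the very
# first step, " " for descendant, ">" for child, "~" for general sibling,
# and "+" for adjacent sibling. (Hypothetical encoding, not from the paper.)
Step = Tuple[str, str]


def split_segments(steps: List[Step]) -> List[List[Step]]:
    """Cut the selector in front of every descendant or general sibling step."""
    segments: List[List[Step]] = []
    for index, (comb, label) in enumerate(steps):
        if index == 0:
            if comb != "":
                raise ValueError("the first step must have an empty combinator")
            segments.append([(comb, label)])
        elif comb in (" ", "~"):
            # A new segment sel_i starts here.
            segments.append([(comb, label)])
        elif comb in ("+", ">"):
            # Child and adjacent sibling steps stay inside the current segment.
            segments[-1].append((comb, label))
        else:
            raise ValueError(f"unknown combinator: {comb!r}")
    return segments


def re_of_segment(segment: List[Step]) -> List[str]:
    """Return re(sel_i) as a list: the prefix S_i followed by a_1, ..., a_{n_i}."""
    c1 = segment[0][0]
    # Assumption: an empty first combinator is handled like the descendant one.
    prefix = "Σ*" if c1 == "~" else "Σv*"
    symbols = [
        label + "_v" if comb in ("", " ", ">") else label
        for comb, label in segment
    ]
    return [prefix] + symbols


if __name__ == "__main__":
    selector = [("", "a"), ("~", "b"), ("+", "c"), (">", "d")]  # a ~ b + c > d
    for seg in split_segments(selector):
        print(re_of_segment(seg))
```

Keeping each $re(\mathit{sel}_i)$ as a list of symbols rather than a single string makes it straightforward to inspect the individual $a_j$ in later steps.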

Fig. A·1 DFAs $M(\mathit{sel}_i)$ constructed from $re(\mathit{sel}_i)$: (a) the case where $re(\mathit{sel}_i)$ contains no pivot and (b) the case where $re(\mathit{sel}_i)$ contains a pivot.

To obtain a DFA equivalent to $re(\mathit{sel})$, we construct DFAs representing $re(\mathit{sel}_i)$ for each $i$ and merge them into a single DFA. For $re(\mathit{sel}_i) = S_i a_1 a_2 \cdots a_{n_i}$, if $a_{j-1} \in \Sigma$ and $a_j \in \Sigma_v$, then we say that $a_j$ is a pivot. Similarly, if $a_{j-1} \in \Sigma_v$ and $a_j \in \Sigma$, then we say that $a_j$ is a pivot. We have the following two cases:

• The case where $re(\mathit{sel}_i)$ contains no pivot: In this case, $re(\mathit{sel}_i)$ can be represented by the DFA shown in Fig. A·1 (a), where $\Sigma' = \Sigma$ if $S_i = \Sigma^*$, and $\Sigma' = \Sigma_v$ if $S_i = \Sigma_v^*$. The bottom-right edge from $q_{n_i}$ to $q'_l$ is “optional.” More precisely, this edge is required only if $S_i \ne S_{i+1}$ or $i = m$. It is easy to verify that the DFAs in the figure are equivalent to $re(\mathit{sel}_i)$.

• The case where $re(\mathit{sel}_i)$ contains a pivot: Let $a_j$ be the first pivot. In this case, $re(\mathit{sel}_i)$ can be represented by the DFA shown in Fig. A·1 (b). The only difference from the above case is that for any state $q_k$ with $k \ge j$, the transition from $q_k$ to $q_1$ and the transition from $q_k$ to $q'_l$ are dropped. The reason is as follows. The pivot $a_j$ is not contained in $S_i$. Therefore, from $a_j, a_{j+1}, \ldots, a_{n_i}$ we need not return to the Kleene closure represented by $S_i$.
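Which of the two cases applies can be decided with a single left-to-right scan over $a_1, \ldots, a_{n_i}$. The following Python sketch illustrates this; the function names are hypothetical, symbols of $\Sigma_v$ are assumed to be encoded as strings ending in `_v`, and labels themselves are assumed never to end in `_v`.

```python
from typing import List, Optional


def is_vertical(symbol: str) -> bool:
    """True if the symbol is taken from Σ_v (assumed to be marked by a `_v` suffix)."""
    return symbol.endswith("_v")


def first_pivot(symbols: List[str]) -> Optional[int]:
    """Return the 1-based index j of the first pivot a_j, or None if there is none.

    a_j is a pivot when a_{j-1} and a_j come from different alphabets,
    i.e. one of them is in Σ and the other one is in Σ_v.
    """
    for j in range(1, len(symbols)):
        if is_vertical(symbols[j]) != is_vertical(symbols[j - 1]):
            return j + 1  # convert the 0-based position to the 1-based index j
    return None


if __name__ == "__main__":
    print(first_pivot(["b", "c", "d_v"]))  # a_3 = d_v follows a_2 = c
    print(first_pivot(["a_v"]))            # a single symbol has no pivot
```

If the function returns `None`, the DFA of Fig. A·1 (a) applies; otherwise the returned index is the $j$ from which the transitions back to $q_1$ and to $q'_l$ are dropped.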

One can see that the above automaton is deterministic since $\mathit{sel}$ is specific. Let $M(\mathit{sel}_i)$ be the DFA obtained from $re(\mathit{sel}_i)$ as shown above. The first state $q_0$ of $M(\mathit{sel}_i)$ is called the start state and the last state $q_{n_i}$ is called the end state. From $M(\mathit{sel}_1), M(\mathit{sel}_2), \ldots, M(\mathit{sel}_m)$, the DFA $M(\mathit{sel})$ is obtained as follows.

• Merge the end state of $M(\mathit{sel}_{i-1})$ and the start state of $M(\mathit{sel}_i)$ into one state for $2 \le i \le m$.

• The initial state of $M(\mathit{sel})$ is the start state of $M(\mathit{sel}_1)$ and the accepting state of $M(\mathit{sel})$ is the end state of $M(\mathit{sel}_m)$.

Fig. A·2 Example of DFA $M(\mathit{sel})$.

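For example, consider $\mathit{sel} = a \sim b + c > d$. Then $\mathit{sel}_1 = a$ and $\mathit{sel}_2 = {\sim}\, b + c > d$, and thus $re(\mathit{sel}_1) = \Sigma_v^* a_v$ and $re(\mathit{sel}_2) = \Sigma^* b\, c\, d_v$. As shown in Fig. A·2, $M(\mathit{sel})$ is obtained by merging $M(\mathit{sel}_1)$ and $M(\mathit{sel}_2)$. In general, it is easy to see that $M(\mathit{sel})$ is equivalent to $re(\mathit{sel})$.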
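When implementing the construction, it can be helpful to have an independent reference against which a hand-built $M(\mathit{sel})$ can be tested. The sketch below does not build the DFA of Fig. A·2; instead it compiles $re(\mathit{sel})$ into a pattern for Python's backtracking `re` module and checks membership of a word directly. It assumes that $re(\mathit{sel})$ is the concatenation of $re(\mathit{sel}_1), \ldots, re(\mathit{sel}_m)$, that labels consist of lowercase letters only, and that a symbol of $\Sigma_v$ is written with a `_v` suffix; the helper names are hypothetical.

```python
import re
from typing import List

LABEL = r"[a-z]+"  # assumed label syntax: lowercase letters only


def token_pattern(part: str) -> str:
    """Translate one part of re(sel) into a regex over space-terminated tokens."""
    if part == "Σ*":
        return rf"(?:{LABEL} )*"      # any sequence of symbols from Σ
    if part == "Σv*":
        return rf"(?:{LABEL}_v )*"    # any sequence of symbols from Σ_v
    return re.escape(part) + " "      # a single fixed symbol a_j


def compile_re_sel(parts: List[str]) -> "re.Pattern[str]":
    """Concatenate the translated parts into one compiled pattern."""
    return re.compile("".join(token_pattern(p) for p in parts))


def accepts(pattern: "re.Pattern[str]", word: List[str]) -> bool:
    """Check whether the whole word (a list of symbols) belongs to re(sel)."""
    text = "".join(symbol + " " for symbol in word)
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # re(sel) for sel = a ~ b + c > d, taken from the example above.
    pattern = compile_re_sel(["Σv*", "a_v", "Σ*", "b", "c", "d_v"])
    print(accepts(pattern, ["x_v", "a_v", "e", "b", "c", "d_v"]))
    print(accepts(pattern, ["a_v", "b", "d_v"]))
```

Terminating every symbol with a space keeps the token boundaries unambiguous, so a label such as `ab` is never confused with the two symbols `a` and `b`.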

Nobutaka Suzuki received his B.E. degree in information and computer sciences from Osaka University in 1993, and his M.E. and Ph.D. degrees in information science from Nara Institute of Science and Technology in 1995 and 1998, respectively. He was with Okayama Prefectural University as a Research Associate in 1998–2004. In 2004, he joined University of Tsukuba as an Assistant Professor. Since 2020, he has been a Professor of Faculty of Library, Information and Media Science, University of Tsukuba. His current research interests include database theory and structured documents.

Takuya Okada received his bachelor’s degree in library and information science from University of Tsukuba in 2017, and his M.E. degree in information science from University of Tsukuba in 2019. He currently works at Nihon Unisys, Ltd. His research interests are CSS and Web data management.

Yeondae Kwon received her M.S. degree in biochemistry from Pusan National University, Busan, Korea, and Ph.D. degree in information science from Nara Institute of Science and Technology, Nara, Japan, in 2000. She was with the University of Tokyo as a Project Associate Professor in 2017–2019. Since 2019, she has been a researcher at the Research Center for Agricultural Information Technology, National Agriculture and Food Research Organization. Her current research interests include text mining and sentiment analysis with machine learning. She is a member of JSBi.