Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Decidability and Enumeration in Automatic
Sequences
Jeffrey ShallitSchool of Computer Science
University of WaterlooWaterloo, Ontario N2L 3G1
http://www.cs.uwaterloo.ca/~shallit
Joint work with Jean-Paul Allouche, Emilie Charlier, NaradRampersad, Dane Henshall, Luke Schaeffer, Eric Rowland, DanielGoc, and Hamoon Mousavi.
1 / 68
In Memory of My Grandparents
ZipporaLevintan
(1877–1964)MoixeXalit
(1871–1949)
2 / 68
What is an automatic sequence?
◮ An infinite sequence
a = a0a1a2 · · ·
over a finite alphabet of letters, generated by a finite-statemachine (automaton)
◮ The automaton, given n as input, computes an as follows:
◮ n is represented in some fixed integer base k ≥ 2◮ The automaton moves from state to state according to this
input◮ Each state has an output letter associated with it◮ The output on input n is the output associated with the last
state reached
3 / 68
The canonical example: the Thue-Morse automaton
0 01
1
0 1
This automaton generates the Thue-Morse sequence
t = (tn)n≥0 = 0110100110010110 · · · .
4 / 68
Why automatic sequences?
◮ A nontrivial class of self-similar sequences
◮ Many “naturally-occurring” sequences are automatic
◮ Halfway between periodic and chaotic
◮ Provide canonical examples for various kinds of avoidanceproblems
5 / 68
Historically interesting properties of t
1. t is not ultimately periodic.2. t contains no factor that is an overlap, that is, a word of the
form axaxa, where a is a single letter and x is an arbitraryfinite word. (Example in Russian: mamam.)
3. t contains infinitely many distinct square factors xx , but foreach such factor we have |x | = 2n or 3 · 2n, for n ≥ 0.Examples of squares in Russian include d�d� (“uncle”) andkuskus (“couscous”).
4. t has infinitely many distinct palindromic factors (Apalindrome is a word equal to its reverse, like dohod.)
5. The number p(n) of distinct palindromic factors of length n int is given by
p(n) =
0, if n odd and n ≥ 5;
1, if n = 0;
2, if 1 ≤ n ≤ 4, or n even and 3 · 4k + 2 ≤ n ≤ 4k+1;
4, if n even and 4k + 2 ≤ n ≤ 3 · 4k .6 / 68
Historically interesting properties of t
6. t is mirror-invariant: if x is a finite factor of t, then so is itsreverse xR .
7. t is recurrent, that is, every factor that occurs, occursinfinitely often.
8. t is uniformly recurrent, that is, for all factors x occurring int, there is a constant c(x) such that two consecutiveoccurrences of x are separated by at most c(x) symbols.
9. t is linearly recurrent, that is, it is uniformly recurrent andfurthermore there is a constant C such that c(x) ≤ C |x | forall factors x . In fact, the optimal bound is given by c(1) = 3,c(2) = 8, and c(n) = 9 · 2e for n ≥ 3, wheree = ⌊log2(n − 2)⌋.
7 / 68
Historically interesting properties of t
10. The lexicographically least sequence in the shift orbit closureof t is
t1 t2 t3 · · · = 0010110 · · · ,
which is also 2-automatic.
11. The subword complexity ρ(n) of t, which is the functioncounting the number of distinct factors of t, is given by
ρ(n) =
2n, if 0 ≤ n ≤ 2;
2n + 2t+2 − 2, if 3 · 2t ≤ n ≤ 2t+2 + 1;
4n − 2t − 4, if 2t + 1 ≤ n ≤ 3 · 2t−1;
.
12. t has an unbordered factor of length n if n 6≡ 1 (mod 6) (Hereby an unbordered word y we mean one with no expression inthe form y = uvu for words u, v with u nonempty.)
8 / 68
The claim
Claim. All of these properties can be verified (and in some case,even obtained) purely mechanically, by a machine computation.
To see how, we need to digress into...
9 / 68
Logic
◮ Let Th(N,+, 0, 1, <) denote the set of all true first-ordersentences in the logical theory of the natural numbers withaddition.
◮ Example: in this theory we can express the so-called “ChickenMcNuggets” theorem that 43 is the largest integer thatcannot be represented as a non-negative integer linearcombination of 6, 9, and 20, as follows:
(∀n > 43 ∃x , y , z ≥ 0 such that n = 6x + 9y + 20z) ∧
¬(∃x , y , z ≥ 0 such that 43 = 6x + 9y + 20z). (1)
Here, of course, “6x” is shorthand for the expression“x + x + x + x + x + x”, and similarly for 9y and 20z .
◮ Presburger (1929) proved that Th(N,+, 0, 1, <) is decidable:that is, there exists an algorithm that, given a sentence in thetheory, will decide its truth.
10 / 68
Decidability of Presburger arithmetic: proof sketch
◮ represent integers in a integer base k ≥ 2 using the alphabetΣk = {0, 1, . . . , k − 1}.
◮ represent n-tuples of integers as words over the alphabet Σnk ,
padding with leading zeroes, if necessary.
◮ For example, the pair (21, 7) can be represented in base 2 bythe word
[1, 0][0, 0][1, 1][0, 1][1, 1].
11 / 68
Decidability of Presburger arithmetic
◮ Then the relation x + y = z can be checked by a simple2-state automaton depicted below, where transitions notdepicted lead to a nonaccepting “dead state”.
{[a, b, c] : a+ b = c}
{[a, b, c] : a + b + 1 = c}
{[a, b, c] : a + b = c + k}
{[a, b, c] : a + b + 1 = c + k}
Figure: Checking addition in base k
12 / 68
Decidability of Presburger arithmetic: proof sketch
◮ Relations like x = y and x < y can be checked similarly.
◮ Given a formula with free variables x1, x2, . . . , xn, we constructan automaton accepting the base-k expansion of thosen-tuples (x1, . . . , xn) for which the proposition holds.
◮ If a formula is of the form ∃x1, x2, . . . xn p(x1, . . . , xn), thenwe use nondeterminism to “guess” the xi and check them.
◮ If the formula is of the form ∀p, we use the equivalence∀p ≡ ¬∃¬p; this may require using the subset construction toconvert an NFA to a DFA and then flipping the “finality” ofstates.
◮ Finally, the truth of a formula can be checked by using theusual depth-first search techniques to see if any final state isreachable from the start state.
13 / 68
Even more
◮ If we add the function Vk : N → N to our logical theory,where Vk(x) = kn, and kn is the largest power of k dividingx , it is still decidable by a similar automaton-based technique(cf. Bruyere, Villemaire, Hodgson, ...).
◮ By doing so, we gain the capability of deciding manyquestions about automatic sequences.
TheoremThere is an algorithm that, given a predicate phrased using onlythe universal and existential quantifiers, indexing into a givenautomatic sequence a, addition, subtraction, logical operations,and comparisons, will decide the truth of that proposition.
We call such a predicate an automatic predicate.
14 / 68
The bad news
◮ The worst-case running time of our algorithm is boundedabove by
22...2p(N)
,
where the number of 2’s in the exponent is equal to thenumber of quantifiers, p is a polynomial, and N is the numberof states needed to describe the underlying automaticsequence.
15 / 68
The good news
◮ That’s just the worst-case upper bound!
◮ An implementation often succeeds in verifying statements inthe theory
◮ It has been implemented by Dane Henshall and Daniel Goc,independently
16 / 68
Deciding periodicity
◮ An infinite word a is periodic if it is of the form xω = xxx · · ·for a finite nonempty word x (e.g., nananana · · · ).
◮ It is ultimately periodic if it is of the form yxω for a (possiblyempty) finite word y (e.g., banananana · · · ).
◮ Honkala (1986) proved that ultimate periodicity is decidablefor automatic sequences.
◮ Using our approach: it suffices to express ultimatelyperiodicity as an automatic predicate:
∃p ≥ 1,N ≥ 0 ∀i ≥ N a[i ] = a[i + p].
◮ When we run this on the Thue-Morse sequence, we discover(as expected) that t is not ultimately periodic.
◮ Can be implemented in polynomial time.
17 / 68
Repetitions
◮ Thue (1912) proved that t contains no overlaps; that is, t isoverlap-free.
◮ Using our technique, we can express the property of having anoverlap axaxa beginning at position N with |ax | = p, asfollows: a[N..N + p] = a[N + p..N + 2p].
◮ So the corresponding automatic predicate for t is
∃p ≥ 1,N ≥ 0 t[N..N + p] = t[N + p..N + 2p],
or, in other words,
∃p ≥ 1,N ≥ 0 ∀i , 0 ≤ i ≤ p t[N + i ] = t[N + p + i ].
◮ We programmed up our decision procedure and verified thatindeed t is overlap-free.
18 / 68
Critical exponent
◮ We can define more general repetitions as follows: a word x isan α-power for α ≥ 1 if we can write x = y ey ′ where e = ⌊α⌋and y ′ is a prefix of y and |x | = α|y |.
◮ For example, abracadabra is an 117 -power.
◮ The techniques above suffice to check if a k-automaticsequence has α-powers, using the following predicate:
∃N ≥ 0, p, q ≥ 1 a[N..N+p−q−1] = a[N+q..N+p−1] and p = αq.
◮ However, this observation alone does not suffice to computethe so-called critical exponent of a, which is the supremumover all rational α such that a has α-power factors.
◮ It turns out that the critical exponent is also computable forautomatic sequences....
19 / 68
Representing rational numbers
◮ Represent rational number α = p/q by pair of integers (p, q),represented in base k ; pad shorter with leading zeroes
◮ So representations of rationals are over the alphabet Σk × Σk
◮ For example, if w = [3, 0][5, 0][2, 4][6, 1] then[w ]10 = (3526, 41).
◮ Define quok(x) = [π1(x)]k/[π2(x)]k , where πi is theprojection onto the i ’th coordinate
◮ So quo10(w) = 3526/41 = 86.
◮ Canonical representations lack leading [0, 0]’s
◮ Every rational has infinitely many canonical representations,e.g., as (1, 2), (2, 4), (3, 6), . . ., etc.
20 / 68
Automatic sets over Q≥0
◮ quok(L) =⋃
x∈L{quok(x)}
◮ A ⊆ Q≥0 is a k-automatic set of rationals if A = quok(L) forsome regular language L ⊆ (Σk × Σk)
∗.
21 / 68
Examples
Example 1. Let k = 2, B = {[0, 0], [0, 1], [1, 0], [1, 1]}, andconsider
L1 := B∗{[0, 1], [1, 1]}B∗.
Then L1 consists of all pairs of integers where the secondcomponent has at least one nonzero digit — the point being toavoid division by 0. Then quok(L) = Q≥0, the set of allnon-negative rational numbers.
Example 2. Consider
L2 = {w ∈ (Σ2k)
∗ : π1(w) ∈ 0∗Ck and π2(w) ∈ 0∗1}.
Then quok(L2) = N.
22 / 68
Examples
Example 3. Let k = 3, and consider the language
L3 := [0, 1]{[0, 0], [2, 0]}∗.
Then quok(L3) is the 3-adic Cantor set, the set of all rationalnumbers in the “middle-thirds” Cantor set with denominators apower of 3.
Example 4. Let k = 2, and consider
L4 := [0, 1]{[0, 0], [0, 1]}∗{[1, 0], [1, 1]}.
Then the numerator encodes the integer 1, while the denominatorencodes all positive integers that start with 1. Hence
quok(L4) = {1
n: n ≥ 1}.
23 / 68
Examples
Example 5. Let k = 4, and consider
S := {0, 1, 3, 4, 5, 11, 12, 13, . . .}
of all non-negative integers that can be represented using only thedigits 0, 1,−1 in base 4. Consider the language
L5 = {(p, q)4 : p, q ∈ S}.
It is not hard to see that L5 is (Q, 4)-automatic.The main result in Loxton & van der Poorten [1987] can berephrased as follows: quo4(L5) contains every odd integer.In fact, an integer t is in quo4(L5) if and only if the exponent ofthe largest power of 2 dividing t is even.
24 / 68
Examples
Example 6. Consider
L6 = {w ∈ (Σ2k)
∗ : π2(w) ∈ 0∗1+0∗}.
An easy exercise using the Fermat-Euler theorem shows that thatquok(L6) = Q≥0.
25 / 68
Examples
Example 7. For a word x and letter a let |x |a denote the numberof occurrences of a in x . Consider the regular language
L7 = {w ∈ (Σ22) : |π1(w)|1 is even and |π2(w)|1 is odd}.
Then it follows from a result of Schmid [1984] that
quo2(L7) = Q≥0 − {2n : n ∈ Z}.
26 / 68
Basic decidability properties
Given a DFA M accepting a language L representing a set ofrationals S , can decide
◮ if S = ∅
◮ given α ∈ Q≥0, whether there exists x ∈ S with x = α (resp.,x < α, x ≤ α, x > α, x ≥ α, x 6= α, etc.)
◮ if |S | = ∞
◮ given a finite set F ⊆ Q≥0, if F ⊆ S or if S ⊆ F
◮ given α ∈ Q≥0, if α is an accumulation point of S
27 / 68
supA is rational or infinite
Given a DFA M accepting L ⊆ (Σk × Σk)∗ representing a set of
rationals A ⊆ Q≥0, what can we say about supA?
Theorem. supA is rational or infinite.
Proof ideas: quok(uviw) forms a monotonic sequence. Defining
γ(u, v) :=[π1(uv)]k − [π1(u)]k[π2(uv)]k − [π2(u)]k
one of the following three cases must hold:
(i) quok(uw) < quok(uvw) < quok(uv2w) < · · · < U ;
(ii) quok(uw) = quok(uvw) = quok(uv2w) = · · · = U ;
(iii) quok(uw) > quok(uvw) > quok(uv2w) > · · · > U .
Furthermore, limi→∞ quok(uviw) = U.
28 / 68
supA is rational or infinite
It follows that if supA is finite, and the DFA M has n states, thensupA = maxT , where
T = T1 ∪ T2
and
T1 = {quok(x) : |x | < n and x ∈ L};
T2 = {γ(u, v) : |uv | ≤ n, |v | ≥ 1, δ(q0, u) = δ(q0, uv),
and there exists w such that uvw ∈ L}.
29 / 68
supA is computable
We know that supA lies in the finite computable set T .
For each of t ∈ T , we can check to see if t ≥ supA by checking ifA ∩ (t,∞) is empty.
Then supA is the least such t.
30 / 68
Computing the critical exponent
- Previously known to be computable for fixed points of uniformmorphisms (Krieger)
Theorem. If w is a k-automatic sequence, then its criticalexponent is rational or infinite. Furthermore, it is computable fromthe DFAO M generating w .
Proof sketch. Given M, we can transform it into anotherautomaton M ′ accepting
{(m, n) : there exists i ≥ 0 such that w[i ..i+m−1] has period n}.
We then apply our algorithm for computing sup(quok(L)) toL(M ′).
31 / 68
The Leech word
Leech [1957] showed that the fixed point l of the morphism
0 → 0121021201210
1 → 1202102012021
2 → 2010210120102
is squarefree.
We used our method to compute the critical exponent of thisword. It is 15/8.
Furthermore, if x is a 15/8-power occurring in l, then |x | = 15 · 13i
for some i ≥ 0.
32 / 68
Applications 2: Diophantine exponent
The Diophantine exponent of an infinite word w is defined to bethe supremum of the real numbers β for which there existarbitrarily long prefixes of w that can be expressed as uv e for finitewords u, v and rationals e such that |uv e |/|uv | ≥ β.
(concept due to Adamczewski, Bugeaud)
Theorem. The Diophantine exponent of a k-automatic sequenceis either rational or infinite. Furthermore, it is computable.
33 / 68
Mirror invariance
We can express the property that a is mirror-invariant as follows:
∀N ≥ 0, ℓ ≥ 1 ∃N ′ ≥ 0 a[N..N + ℓ− 1] = a[N ′..N ′ + ℓ− 1]R ,
which is the same as
∀N ≥ 0, ℓ ≥ 1 ∃N ′ ≥ 0 ∀i , 0 ≤ i < ℓ a[N + i ] = a[N ′ + ℓ− i − 1],
which can be easily checked by our method.
34 / 68
Recurrence
◮ We can express the property that a is recurrent by saying thatfor each factor, and each integer M there exists a copy of thatfactor occurring at a position after M in a.
◮ This corresponds to the following predicate:
∀N,M ≥ 0, ℓ ≥ 1 ∃M ′ ≥ M a[N..N+ℓ−1] = a[M ′..M ′+ℓ−1].
◮ An easy argument shows that an infinite word a is recurrent ifand only if each finite factor occurs at least twice. This meansthat the following simpler predicate suffices:
∀N ≥ 0, ℓ ≥ 1 ∃M 6= N a[N..N + ℓ− 1] = a[M..M + ℓ− 1].
35 / 68
Uniform recurrence
◮ For uniform recurrence, we need to express the fact that twoconsecutive occurrences of each factor are separated by nomore than C positions.
◮ Since there are only finitely many factors of each length, wecan take C to be the maximum of the constantscorresponding to each factor of that length.
◮ Thus uniform recurrence corresponds to the followingpredicate:
∀ℓ ≥ 1 ∃C ≥ 1 ∀N ≥ 0 ∃M with N < M ≤ N + C
a[N..N + ℓ− 1] = a[M..M + ℓ− 1].
36 / 68
Sequences with grouped factors
Cassaigne (1997) said a sequence a(ai )i≥0 has grouped factors iffor all n ≥ 1, there exists some position m = m(n) such that
a[m..m + ρ(n) + n − 2]
contains all of the ρ(n) length-n factors of a, each factor occurringexactly once.
For example, the Fibonacci word has grouped factors.
37 / 68
Sequences with grouped factors
We can write a predicate for the property of having groupedfactors, as follows:
∀n ≥ 1 ∃m, s ≥ 0 ∀i ≥ 0
∃j , m ≤ j ≤ m + s a[i ..i + n − 1] = a[j ..j + n − 1] and
∀j ′, m ≤ j ′ ≤ m+s, j 6= j ′ we have a[i ..i+n−1] 6= a[j ′..j ′+n−1].
The first part of the predicate says that any length-n factorappears somewhere in the desired window, and the second saysthat it appears exactly once.
“...in what sense would a problem that required at least threealternating quantifiers to describe be natural?” – Homer andSelman, Computability and Complexity Theory, 2011.
38 / 68
Sequences with grouped factors
Open problem: is there an aperiodic automatic sequence withgrouped factors?
Weaker property: having grouped factors for infinitely many n.
We can use our method to check this. The results are:
◮ The Thue-Morse sequence has grouped factors exactly forn = 1 and n = 2j + 1, j ≥ 0
◮ The period-doubling sequence has grouped factors exactly forn = 2j , j ≥ 0, and when (n)2 starts with 11.
39 / 68
Orbit closure
◮ The shift orbit of a sequence a = a0a1a2 · · · is the set of allsequences under the shift, that is, the set
S = {aiai+1ai+2 · · · : i ≥ 0}.
◮ The orbit closure is the topological closure S under the usualtopology.
◮ In other words, a sequence b = b0b1b2 · · · is in S if and onlyif, for each j ≥ 0, the prefix b0 · · · bj is a factor of a.
◮ Most sequences in the orbit closure of a k-automatic sequenceare not automatic themselves.
◮ However, we can use our method to show that twodistinguished sequences, the lexicographically least andlexicographically greatest sequences in the orbit closure, areindeed k-automatic.
◮ We were able to verify, for example, a recent result of Curriethat the lexicographically least sequence in the orbit closure ofthe Rudin-Shapiro sequence r is 0r.
40 / 68
Unbordered factors
◮ A word is bordered if it can be expressed as uvu for words u, vwith u nonempty, and otherwise it is unbordered.
◮ Currie and Saari proved that t has an unbordered factor oflength n if n 6≡ 1 (mod 6).
◮ However, these are not the only lengths with an unborderedfactor; for example,
0011010010110100110010110100101
is an unbordered factor of length 31.◮ We can express the property that t has an unbordered factor
of length ℓ as follows:
∃N ≥ 0 ∀j , 1 ≤ j ≤ ℓ/2 t[N..N+ j−1] 6= t[N+ℓ− j ..N+ℓ−1].
◮ Using our technique, we were able to prove
TheoremThere is an unbordered factor of length ℓ in t if and only iff(ℓ)2 6∈ 1(01∗0)∗10∗1.
41 / 68
Enumeration
◮ What if we want to know the number of unbordered factorsof length n, not just whether they exist or not?
◮ In many cases we can count the number T (n) of length-nfactors of an automatic sequence having a particular propertyP .
◮ Here by “count” we mean, give an algorithm A to computeT (n) efficiently, that is, in time bounded by a polynomial inlog n.
◮ Although finding the algorithm A may not be particularlyefficient, once we have it, we can compute T (n) quickly.
42 / 68
Enumeration
The basic idea is to create an automaton A accepting words of theform (n, i)k , where each i corresponds to one of the distinctlength-n factors we are trying to count.
The easiest way to do this is to let i be the position of the firstoccurrence of this factor.
Now the number of different paths in A whose first componentspells out (n)k gives us the number of different i , and hence thenumber of distinct factors.
From A we can determine vectors v and x and matricesM0,M1, . . . ,Mk−1 such that if [ci−1ci−2 · · · c1c0]k = n, then
T (n) = v · · ·Mci−1Mci−2 · · ·Mc1Mc0xT .
43 / 68
Enumeration example: Subword complexity
◮ Subword complexity counts the number of distinct length-nfactors of a sequence.
◮ To count these factors in an automatic sequence, we create aDFA M accepting the language
{ (ℓ, i)k : a[i ..i + ℓ− 1] is the first occurrence
of the given factor}
= { (ℓ, i)k : ∀i ′ < i a[i ..i + ℓ− 1] 6= a[i ′..i ′ + ℓ− 1]}.
◮ the number of i corresponding to a given ℓ is just the numberof subwords of length ℓ.
44 / 68
Enumeration: subword complexity
From this, for example, we can recover the well-known result onthe subword complexity of the Thue-Morse sequence:
ρ(n) =
2n, if 0 ≤ n ≤ 2;
2n + 2t+2 − 2, if 3 · 2t ≤ n ≤ 2t+2 + 1;
4n − 2t − 4, if 2t + 1 ≤ n ≤ 3 · 2t−1;
.
45 / 68
Enumeration: unbordered factors
A little numerical experimention suggests the following:
Conjecture
Let f (n) denote the number of unbordered factors of length n in t,the Thue-Morse sequence. Then f is given by f (0) = 1, f (1) = 2,f (2) = 2, and the system of recurrences
f (4n) = 2f (2n), (n ≥ 2)
f (4n + 1) = f (2n + 1), (n ≥ 0)
f (8n + 2) = f (2n + 1) + f (4n + 3), (n ≥ 1)
f (8n + 3) = −f (2n + 1) + f (4n + 2) (n ≥ 2)
f (8n + 6) = −f (2n + 1) + f (4n + 2) + f (4n + 3) (n ≥ 2)
f (8n + 7) = 2f (2n + 1) + f (4n + 3) (n ≥ 3)
for n ≥ 0.
46 / 68
Proving the conjecture on number of unbordered factors
◮ Express “first occurrence of unbordered factor t[i ..i + n− 1] isat position i” as a predicate P(n, i)
◮ Translate the predicate to a DFA accepting the language{(n, i)2 : P(n, i)}
◮ Translate the DFA to vectors v , x and matrices M0,M1 suchthat if n = [cj−1 · · · c1c0]2 then
f (n) = vMcj−1 · · ·Mc1Mc0xT
◮ Assertions about f correspond to certain assertions about M,a matrix that could occur as a product of the M0 and M1
◮ For example, the identity f (8n + 2) = f (2n + 1) + f (4n + 3)
vMM0M1M0xT = vMM1x
T + vMM1M1xT , (2)
where M is the matrix product corresponding to the base-2expansion of n.
47 / 68
Proving the conjecture on number of unbordered factors
◮ We have now reduced the problem to a finite verification.
◮ To reduce the amount of work even further, we can work withvM instead of M.
◮ With our identities in hand, we can now easily prove resultssuch as
◮ f (n) ≤ n for n ≥ 4 and◮ f (n) = n infinitely often.
48 / 68
Enumeration
In a similar way, we can handle
◮ palindrome complexity (the number of distinct length-npalindromic factors)
◮ the number of words whose reversals are also factors;
◮ the number of squares of a given length;
◮ the number of unbordered factors
and so forth.
49 / 68
Are there some properties that are not expressible?
◮ A difficult candidate: abelian properties
◮ We say that a nonempty word x is an abelian square if it ofthe form ww ′ with |w | = |w ′| and w ′ a permutation of w .(An example in English is the word reappear.)
◮ Luke Schaeffer has recently shown that the predicate forabelian squarefreeness of the paperfolding sequence is indeedinexpressible in Th(N,+, 0, 1, <,Vk)
◮ Nevertheless, some abelian properties are decidable by othermeans (Currie & Rampersad)
50 / 68
Synchronized sequences
(Carpi) A sequence f : N → N is k-synchronized if its graph
{ (n, f (n))k : n ≥ 0}
is a regular language.
51 / 68
Synchronized sequences
Example. The function f (n) = n + 1 is k-synchronized. Forexample, for k = 2, the following automaton suffices:
[
10
]
[
01
]
[
00
]
,
[
11
]
[
01
]
[
11
]
q0
q2
q1
52 / 68
Why synchronized sequences?
◮ If f (n) is k-synchronized, thenwe can compute f (n) in O(log n) time
◮ If f (n) is k-synchronized, then f (n) = O(n)
53 / 68
Efficient computation of synchronized sequences
To compute f (n) in O(log n) time:
◮ On input n, construct the O(log n)-state machine M ′
accepting those words with first component of the form0∗(n)k and second component anything
◮ Intersect, using the familiar direct product construction, withthe DFA M accepting { (n, f (n))k : n ≥ 0}
◮ Resulting automaton accepts exactly one word◮ Find accepting path using depth-first search◮ Label of accepting path gives f (n) in base k
54 / 68
Growth bounds
Theorem. If f (n) is k-synchronized, then f (n) = O(n).
Proof.
◮ Suppose f is k-synchronized and accepted by a DFA with tstates.
◮ If f (n) 6= O(n), then there exists n such that f (n) > k tn.
◮ So the base-k representation of (n, f (n)) starts with at least tzeros in the first component, and a nonzero symbol in thesecond component.
◮ Apply the pumping lemma to z = (n, f (n))k◮ We see that “pumping” gives a new word in the language
with n unchanged, but f (n) increased.
◮ But f is a function, a contradiction.
55 / 68
Closure properties of synchronized sequences
The class of k-synchronized sequences is closed under
◮ sum
◮ N-linear combination
◮ f (n) → ⌊αf (n)⌋ for α rational
◮ term-wise maximum and minimum
◮ running maximum: g(n) = max0≤i<n f (i)
◮ discrete inverse: g(n) = min{i : f (i) ≥ n}
◮ composition
56 / 68
Many aspects of k-automatic sequences are k-synchronized
Example: the appearance function.
Ax(n) = length of shortest prefix of x containing all length-nfactors of x
= the smallest integer t such that every length-n factor of xoccurs at least once in x[0..t − 1].
= t such that every length-n factor of x occurs in x[0..t − 1]but the length-n factor ending at position t − 1 occurs exactlyonce in x[0..t − 1]
57 / 68
Appearance function predicate
L = {(n, t)k : ∀ i ≥ 0 ∃ j ≤ t − n
such that x[i ..i + n − 1] = x[j ..j + n − 1]
and ∀ j ′ < t − n
x[j ′..j ′ + n − 1] 6= x[t − n..t − 1]}.
Here t = Ax(n).
58 / 68
Other synchronized functions
◮ separator function: length of the shortest factor of xbeginning at position n that never appeared previously in x
(Carpi & Maggi, 2001)
◮ repetitivity index: the minimal distance between twoconsecutive occurrences of the same length-n factor in x
(Carpi & D’Alonzo, 2009)
◮ recurrence function: size of the smallest “window” alwaysguaranteed to contain all length-n factors in x (Charlier &Rampersad & S, 2011)
59 / 68
Subword complexity
ρx(n) = number of distinct length-n factors of x
◮ known to be k-regular
◮ known to be O(n) for k-automatic sequences
◮ suggests it could be k-synchronized
60 / 68
Novel occurrences
Call a length-n factor novel at position i if it occurs there but inno earlier location.
Here is a predicate for novel factors:
{(n, i)k : ∀j , 0 ≤ j < i x[i ..i + n − 1] 6= x[j ..j + n − 1]}
Theorem. In any sequence of linear complexity, the startingpositions of novel occurrences of factors are “clumped together” ina bounded number of contiguous blocks.
61 / 68
Novel factors for Thue-Morse
Consider the Thue-Morse sequence
t = t0t1t2 · · · = 0110100110010110 · · · ,
The gray squares in the rows below depict the evolution of novellength-n factors in the Thue-Morse sequence for 1 ≤ n ≤ 9.
1
242322212019181716
n = 1
011010 0
2
3
t[i ]
3
9
8
7
6
5
4
00110100
654210i 7
110010110
15141312111098
62 / 68
Bound on number of contiguous blocks
TheoremLet x be an infinite word. For n ≥ 1, the number of contiguousblocks of starting occurrences of novel factors in row n is at mostρx(n)− ρx(n − 1) + 1.
Proof.By induction on n. Base case easy.Assume true for n − 1. We prove for n.Every position marking the start of a novel occurrence is still novel.Further, in every block except the first, we get novel occurrencesat one position to the left of the beginning of the block.So if row n − 1 has t contiguous blocks, then we get t − 1 noveloccurrences at the beginning of each block, except the first.The remaining ρx(n)− ρx(n− 1)− (t − 1) novel occurrences couldbe, in the worst case, in their own individual contiguous blocks.Thus row n has at most t + ρx(n)− ρx(n − 1)− (t − 1)= ρx(n)− ρx(n − 1) + 1 contiguous blocks.
63 / 68
Bound for Thue-Morse
For Thue-Morse example, it is well-known thatρt(n)− ρt(n − 1) ≤ 4.
So the number of contiguous blocks of novel factors is at most 5.
This is achieved, for example, for n = 6.
64 / 68
Linear complexity
Corollary
If the sequence x has linear complexity (that is, ρx(n) = O(n)),then there is a constant C such that every row in the evolution ofnovel occurrences consists of at most C contiguous blocks.
Proof.By a deep result of Cassaigne (1996) we know that there exists aconstant C such that ρx(n)− ρx(n − 1) ≤ C − 1. Hence from ourresult, there are at most C contiguous blocks in any row.
65 / 68
Subword complexity of automatic sequences is
k-synchronized
TheoremLet x be a k-automatic sequence. Then its subword complexityfunction ρx(n) is k-synchronized.
Proof.Construct a DFA to accept {(n,m)k : n ≥ 0 and m = ρx(n)}.
There is a finite constant C ≥ 1 such that the number ofcontiguous blocks of novel factors is bounded by C .
Nondeterministically “guess” the endpoints of every block andthen verify that each factor of length n starting at the positionsinside blocks is a novel occurrence, while all other factors are not.
Finally, verify that m is the sum of the sizes of the blocks.
66 / 68
Unsynchronized sequences
Are other aspects of k-automatic sequences always k-synchronized?
No.
We say a word w is bordered if it has a nonempty prefix, otherthan w itself, that is also a suffix. Alternatively, w is bordered if itcan be written in the form w = tvt, where t is nonempty.Otherwise a word is unbordered.
TheoremThe characteristic sequence c = 0110100010 · · · is 2-automatic,but the function uc(n) counting the number of unbordered factorsis not 2-synchronized.
67 / 68
Open Problems
◮ Extend these ideas to morphic sequences (fixed points ofpossibly non-uniform morphisms, followed by a coding)
◮ Is sup{x/y : (x , y)k ∈ L} computable for context-freelanguages L?
◮ Given a regular language L ⊆ (Σk × Σk)∗ representing a a set
S ⊆ N× N of pairs of natural numbers, is it decidable if Scontains a pair (p, q) with p | q?
68 / 68