Decidability and Enumeration in Automatic Sequences

Decidability and Enumeration in Automatic

Sequences

Jeffrey ShallitSchool of Computer Science

University of WaterlooWaterloo, Ontario N2L 3G1

[email protected]

http://www.cs.uwaterloo.ca/~shallit

Joint work with Jean-Paul Allouche, Emilie Charlier, NaradRampersad, Dane Henshall, Luke Schaeffer, Eric Rowland, DanielGoc, and Hamoon Mousavi.

1 / 68

In Memory of My Grandparents

ZipporaLevintan

(1877–1964)MoixeXalit

(1871–1949)

2 / 68

What is an automatic sequence?

◮ An infinite sequence

a = a0a1a2 · · ·

over a finite alphabet of letters, generated by a finite-statemachine (automaton)

◮ The automaton, given n as input, computes an as follows:

◮ n is represented in some fixed integer base k ≥ 2◮ The automaton moves from state to state according to this

input◮ Each state has an output letter associated with it◮ The output on input n is the output associated with the last

state reached

3 / 68

The canonical example: the Thue-Morse automaton

0 01

1

0 1

This automaton generates the Thue-Morse sequence

t = (tn)n≥0 = 0110100110010110 · · · .

4 / 68

Why automatic sequences?

◮ A nontrivial class of self-similar sequences

◮ Many “naturally-occurring” sequences are automatic

◮ Halfway between periodic and chaotic

◮ Provide canonical examples for various kinds of avoidanceproblems

5 / 68

Historically interesting properties of t

1. t is not ultimately periodic.2. t contains no factor that is an overlap, that is, a word of the

form axaxa, where a is a single letter and x is an arbitraryfinite word. (Example in Russian: mamam.)

3. t contains infinitely many distinct square factors xx , but foreach such factor we have |x | = 2n or 3 · 2n, for n ≥ 0.Examples of squares in Russian include d�d� (“uncle”) andkuskus (“couscous”).

4. t has infinitely many distinct palindromic factors (Apalindrome is a word equal to its reverse, like dohod.)

5. The number p(n) of distinct palindromic factors of length n int is given by

p(n) =

0, if n odd and n ≥ 5;

1, if n = 0;

2, if 1 ≤ n ≤ 4, or n even and 3 · 4k + 2 ≤ n ≤ 4k+1;

4, if n even and 4k + 2 ≤ n ≤ 3 · 4k .6 / 68


6. t is mirror-invariant: if x is a finite factor of t, then so is itsreverse xR .

7. t is recurrent, that is, every factor that occurs, occursinfinitely often.

8. t is uniformly recurrent, that is, for all factors x occurring int, there is a constant c(x) such that two consecutiveoccurrences of x are separated by at most c(x) symbols.

9. t is linearly recurrent, that is, it is uniformly recurrent andfurthermore there is a constant C such that c(x) ≤ C |x | forall factors x . In fact, the optimal bound is given by c(1) = 3,c(2) = 8, and c(n) = 9 · 2e for n ≥ 3, wheree = ⌊log2(n − 2)⌋.

7 / 68


10. The lexicographically least sequence in the shift orbit closureof t is

t1 t2 t3 · · · = 0010110 · · · ,

which is also 2-automatic.

11. The subword complexity ρ(n) of t, which is the functioncounting the number of distinct factors of t, is given by

ρ(n) =

2n, if 0 ≤ n ≤ 2;

2n + 2t+2 − 2, if 3 · 2t ≤ n ≤ 2t+2 + 1;

4n − 2t − 4, if 2t + 1 ≤ n ≤ 3 · 2t−1;

.

12. t has an unbordered factor of length n if n 6≡ 1 (mod 6) (Hereby an unbordered word y we mean one with no expression inthe form y = uvu for words u, v with u nonempty.)

8 / 68

The claim

Claim. All of these properties can be verified (and in some case,even obtained) purely mechanically, by a machine computation.

To see how, we need to digress into...

9 / 68

Logic

◮ Let Th(N,+, 0, 1, <) denote the set of all true first-ordersentences in the logical theory of the natural numbers withaddition.

◮ Example: in this theory we can express the so-called “ChickenMcNuggets” theorem that 43 is the largest integer thatcannot be represented as a non-negative integer linearcombination of 6, 9, and 20, as follows:

(∀n > 43 ∃x , y , z ≥ 0 such that n = 6x + 9y + 20z) ∧

¬(∃x , y , z ≥ 0 such that 43 = 6x + 9y + 20z). (1)

Here, of course, “6x” is shorthand for the expression“x + x + x + x + x + x”, and similarly for 9y and 20z .

◮ Presburger (1929) proved that Th(N,+, 0, 1, <) is decidable:that is, there exists an algorithm that, given a sentence in thetheory, will decide its truth.

10 / 68

Decidability of Presburger arithmetic: proof sketch

◮ represent integers in a integer base k ≥ 2 using the alphabetΣk = {0, 1, . . . , k − 1}.

◮ represent n-tuples of integers as words over the alphabet Σnk ,

padding with leading zeroes, if necessary.

◮ For example, the pair (21, 7) can be represented in base 2 bythe word

[1, 0][0, 0][1, 1][0, 1][1, 1].

11 / 68

Decidability of Presburger arithmetic

◮ Then the relation x + y = z can be checked by a simple2-state automaton depicted below, where transitions notdepicted lead to a nonaccepting “dead state”.

{[a, b, c] : a+ b = c}

{[a, b, c] : a + b + 1 = c}

{[a, b, c] : a + b = c + k}

{[a, b, c] : a + b + 1 = c + k}

Figure: Checking addition in base k

12 / 68

Decidability of Presburger arithmetic: proof sketch

◮ Relations like x = y and x < y can be checked similarly.

◮ Given a formula with free variables x1, x2, . . . , xn, we constructan automaton accepting the base-k expansion of thosen-tuples (x1, . . . , xn) for which the proposition holds.

◮ If a formula is of the form ∃x1, x2, . . . xn p(x1, . . . , xn), thenwe use nondeterminism to “guess” the xi and check them.

◮ If the formula is of the form ∀p, we use the equivalence∀p ≡ ¬∃¬p; this may require using the subset construction toconvert an NFA to a DFA and then flipping the “finality” ofstates.

◮ Finally, the truth of a formula can be checked by using theusual depth-first search techniques to see if any final state isreachable from the start state.

13 / 68

Even more

◮ If we add the function Vk : N → N to our logical theory,where Vk(x) = kn, and kn is the largest power of k dividingx , it is still decidable by a similar automaton-based technique(cf. Bruyere, Villemaire, Hodgson, ...).

◮ By doing so, we gain the capability of deciding manyquestions about automatic sequences.

TheoremThere is an algorithm that, given a predicate phrased using onlythe universal and existential quantifiers, indexing into a givenautomatic sequence a, addition, subtraction, logical operations,and comparisons, will decide the truth of that proposition.

We call such a predicate an automatic predicate.

14 / 68

The bad news

◮ The worst-case running time of our algorithm is boundedabove by

22...2p(N)

,

where the number of 2’s in the exponent is equal to thenumber of quantifiers, p is a polynomial, and N is the numberof states needed to describe the underlying automaticsequence.

15 / 68

The good news

◮ That’s just the worst-case upper bound!

◮ An implementation often succeeds in verifying statements inthe theory

◮ It has been implemented by Dane Henshall and Daniel Goc,independently

16 / 68

Deciding periodicity

◮ An infinite word a is periodic if it is of the form xω = xxx · · ·for a finite nonempty word x (e.g., nananana · · · ).

◮ It is ultimately periodic if it is of the form yxω for a (possiblyempty) finite word y (e.g., banananana · · · ).

◮ Honkala (1986) proved that ultimate periodicity is decidablefor automatic sequences.

◮ Using our approach: it suffices to express ultimatelyperiodicity as an automatic predicate:

∃p ≥ 1,N ≥ 0 ∀i ≥ N a[i ] = a[i + p].

◮ When we run this on the Thue-Morse sequence, we discover(as expected) that t is not ultimately periodic.

◮ Can be implemented in polynomial time.

17 / 68

Repetitions

◮ Thue (1912) proved that t contains no overlaps; that is, t isoverlap-free.

◮ Using our technique, we can express the property of having anoverlap axaxa beginning at position N with |ax | = p, asfollows: a[N..N + p] = a[N + p..N + 2p].

◮ So the corresponding automatic predicate for t is

∃p ≥ 1,N ≥ 0 t[N..N + p] = t[N + p..N + 2p],

or, in other words,

∃p ≥ 1,N ≥ 0 ∀i , 0 ≤ i ≤ p t[N + i ] = t[N + p + i ].

◮ We programmed up our decision procedure and verified thatindeed t is overlap-free.

18 / 68

Critical exponent

◮ We can define more general repetitions as follows: a word x isan α-power for α ≥ 1 if we can write x = y ey ′ where e = ⌊α⌋and y ′ is a prefix of y and |x | = α|y |.

◮ For example, abracadabra is an 117 -power.

◮ The techniques above suffice to check if a k-automaticsequence has α-powers, using the following predicate:

∃N ≥ 0, p, q ≥ 1 a[N..N+p−q−1] = a[N+q..N+p−1] and p = αq.

◮ However, this observation alone does not suffice to computethe so-called critical exponent of a, which is the supremumover all rational α such that a has α-power factors.

◮ It turns out that the critical exponent is also computable forautomatic sequences....

19 / 68

Representing rational numbers

◮ Represent rational number α = p/q by pair of integers (p, q),represented in base k ; pad shorter with leading zeroes

◮ So representations of rationals are over the alphabet Σk × Σk

◮ For example, if w = [3, 0][5, 0][2, 4][6, 1] then[w ]10 = (3526, 41).

◮ Define quok(x) = [π1(x)]k/[π2(x)]k , where πi is theprojection onto the i ’th coordinate

◮ So quo10(w) = 3526/41 = 86.

◮ Canonical representations lack leading [0, 0]’s

◮ Every rational has infinitely many canonical representations,e.g., as (1, 2), (2, 4), (3, 6), . . ., etc.

20 / 68

Automatic sets over Q≥0

◮ quok(L) =⋃

x∈L{quok(x)}

◮ A ⊆ Q≥0 is a k-automatic set of rationals if A = quok(L) forsome regular language L ⊆ (Σk × Σk)

∗.

21 / 68

Examples

Example 1. Let k = 2, B = {[0, 0], [0, 1], [1, 0], [1, 1]}, andconsider

L1 := B∗{[0, 1], [1, 1]}B∗.

Then L1 consists of all pairs of integers where the secondcomponent has at least one nonzero digit — the point being toavoid division by 0. Then quok(L) = Q≥0, the set of allnon-negative rational numbers.

Example 2. Consider

L2 = {w ∈ (Σ2k)

∗ : π1(w) ∈ 0∗Ck and π2(w) ∈ 0∗1}.

Then quok(L2) = N.

22 / 68

Examples

Example 3. Let k = 3, and consider the language

L3 := [0, 1]{[0, 0], [2, 0]}∗.

Then quok(L3) is the 3-adic Cantor set, the set of all rationalnumbers in the “middle-thirds” Cantor set with denominators apower of 3.

Example 4. Let k = 2, and consider

L4 := [0, 1]{[0, 0], [0, 1]}∗{[1, 0], [1, 1]}.

Then the numerator encodes the integer 1, while the denominatorencodes all positive integers that start with 1. Hence

quok(L4) = {1

n: n ≥ 1}.

23 / 68

Examples

Example 5. Let k = 4, and consider

S := {0, 1, 3, 4, 5, 11, 12, 13, . . .}

of all non-negative integers that can be represented using only thedigits 0, 1,−1 in base 4. Consider the language

L5 = {(p, q)4 : p, q ∈ S}.

It is not hard to see that L5 is (Q, 4)-automatic.The main result in Loxton & van der Poorten [1987] can berephrased as follows: quo4(L5) contains every odd integer.In fact, an integer t is in quo4(L5) if and only if the exponent ofthe largest power of 2 dividing t is even.

24 / 68

Examples

Example 6. Consider

L6 = {w ∈ (Σ2k)

∗ : π2(w) ∈ 0∗1+0∗}.

An easy exercise using the Fermat-Euler theorem shows that thatquok(L6) = Q≥0.

25 / 68

Examples

Example 7. For a word x and letter a let |x |a denote the numberof occurrences of a in x . Consider the regular language

L7 = {w ∈ (Σ22) : |π1(w)|1 is even and |π2(w)|1 is odd}.

Then it follows from a result of Schmid [1984] that

quo2(L7) = Q≥0 − {2n : n ∈ Z}.

26 / 68

Basic decidability properties

Given a DFA M accepting a language L representing a set ofrationals S , can decide

◮ if S = ∅

◮ given α ∈ Q≥0, whether there exists x ∈ S with x = α (resp.,x < α, x ≤ α, x > α, x ≥ α, x 6= α, etc.)

◮ if |S | = ∞

◮ given a finite set F ⊆ Q≥0, if F ⊆ S or if S ⊆ F

◮ given α ∈ Q≥0, if α is an accumulation point of S

27 / 68

supA is rational or infinite

Given a DFA M accepting L ⊆ (Σk × Σk)∗ representing a set of

rationals A ⊆ Q≥0, what can we say about supA?

Theorem. supA is rational or infinite.

Proof ideas: quok(uviw) forms a monotonic sequence. Defining

γ(u, v) :=[π1(uv)]k − [π1(u)]k[π2(uv)]k − [π2(u)]k

one of the following three cases must hold:

(i) quok(uw) < quok(uvw) < quok(uv2w) < · · · < U ;

(ii) quok(uw) = quok(uvw) = quok(uv2w) = · · · = U ;

(iii) quok(uw) > quok(uvw) > quok(uv2w) > · · · > U .

Furthermore, limi→∞ quok(uviw) = U.

28 / 68

supA is rational or infinite

It follows that if supA is finite, and the DFA M has n states, thensupA = maxT , where

T = T1 ∪ T2

and

T1 = {quok(x) : |x | < n and x ∈ L};

T2 = {γ(u, v) : |uv | ≤ n, |v | ≥ 1, δ(q0, u) = δ(q0, uv),

and there exists w such that uvw ∈ L}.

29 / 68

supA is computable

We know that supA lies in the finite computable set T .

For each of t ∈ T , we can check to see if t ≥ supA by checking ifA ∩ (t,∞) is empty.

Then supA is the least such t.

30 / 68

Computing the critical exponent

- Previously known to be computable for fixed points of uniformmorphisms (Krieger)

Theorem. If w is a k-automatic sequence, then its criticalexponent is rational or infinite. Furthermore, it is computable fromthe DFAO M generating w .

Proof sketch. Given M, we can transform it into anotherautomaton M ′ accepting

{(m, n) : there exists i ≥ 0 such that w[i ..i+m−1] has period n}.

We then apply our algorithm for computing sup(quok(L)) toL(M ′).

31 / 68

The Leech word

Leech [1957] showed that the fixed point l of the morphism

0 → 0121021201210

1 → 1202102012021

2 → 2010210120102

is squarefree.

We used our method to compute the critical exponent of thisword. It is 15/8.

Furthermore, if x is a 15/8-power occurring in l, then |x | = 15 · 13i

for some i ≥ 0.

32 / 68

Applications 2: Diophantine exponent

The Diophantine exponent of an infinite word w is defined to bethe supremum of the real numbers β for which there existarbitrarily long prefixes of w that can be expressed as uv e for finitewords u, v and rationals e such that |uv e |/|uv | ≥ β.

(concept due to Adamczewski, Bugeaud)

Theorem. The Diophantine exponent of a k-automatic sequenceis either rational or infinite. Furthermore, it is computable.

33 / 68

Mirror invariance

We can express the property that a is mirror-invariant as follows:

∀N ≥ 0, ℓ ≥ 1 ∃N ′ ≥ 0 a[N..N + ℓ− 1] = a[N ′..N ′ + ℓ− 1]R ,

which is the same as

∀N ≥ 0, ℓ ≥ 1 ∃N ′ ≥ 0 ∀i , 0 ≤ i < ℓ a[N + i ] = a[N ′ + ℓ− i − 1],

which can be easily checked by our method.

34 / 68

Recurrence

◮ We can express the property that a is recurrent by saying thatfor each factor, and each integer M there exists a copy of thatfactor occurring at a position after M in a.

◮ This corresponds to the following predicate:

∀N,M ≥ 0, ℓ ≥ 1 ∃M ′ ≥ M a[N..N+ℓ−1] = a[M ′..M ′+ℓ−1].

◮ An easy argument shows that an infinite word a is recurrent ifand only if each finite factor occurs at least twice. This meansthat the following simpler predicate suffices:

∀N ≥ 0, ℓ ≥ 1 ∃M 6= N a[N..N + ℓ− 1] = a[M..M + ℓ− 1].

35 / 68

Uniform recurrence

◮ For uniform recurrence, we need to express the fact that twoconsecutive occurrences of each factor are separated by nomore than C positions.

◮ Since there are only finitely many factors of each length, wecan take C to be the maximum of the constantscorresponding to each factor of that length.

◮ Thus uniform recurrence corresponds to the followingpredicate:

∀ℓ ≥ 1 ∃C ≥ 1 ∀N ≥ 0 ∃M with N < M ≤ N + C

a[N..N + ℓ− 1] = a[M..M + ℓ− 1].

36 / 68

Sequences with grouped factors

Cassaigne (1997) said a sequence a(ai )i≥0 has grouped factors iffor all n ≥ 1, there exists some position m = m(n) such that

a[m..m + ρ(n) + n − 2]

contains all of the ρ(n) length-n factors of a, each factor occurringexactly once.

For example, the Fibonacci word has grouped factors.

37 / 68


We can write a predicate for the property of having groupedfactors, as follows:

∀n ≥ 1 ∃m, s ≥ 0 ∀i ≥ 0

∃j , m ≤ j ≤ m + s a[i ..i + n − 1] = a[j ..j + n − 1] and

∀j ′, m ≤ j ′ ≤ m+s, j 6= j ′ we have a[i ..i+n−1] 6= a[j ′..j ′+n−1].

The first part of the predicate says that any length-n factorappears somewhere in the desired window, and the second saysthat it appears exactly once.

“...in what sense would a problem that required at least threealternating quantifiers to describe be natural?” – Homer andSelman, Computability and Complexity Theory, 2011.

38 / 68


Open problem: is there an aperiodic automatic sequence withgrouped factors?

Weaker property: having grouped factors for infinitely many n.

We can use our method to check this. The results are:

◮ The Thue-Morse sequence has grouped factors exactly forn = 1 and n = 2j + 1, j ≥ 0

◮ The period-doubling sequence has grouped factors exactly forn = 2j , j ≥ 0, and when (n)2 starts with 11.

39 / 68

Orbit closure

◮ The shift orbit of a sequence a = a0a1a2 · · · is the set of allsequences under the shift, that is, the set

S = {aiai+1ai+2 · · · : i ≥ 0}.

◮ The orbit closure is the topological closure S under the usualtopology.

◮ In other words, a sequence b = b0b1b2 · · · is in S if and onlyif, for each j ≥ 0, the prefix b0 · · · bj is a factor of a.

◮ Most sequences in the orbit closure of a k-automatic sequenceare not automatic themselves.

◮ However, we can use our method to show that twodistinguished sequences, the lexicographically least andlexicographically greatest sequences in the orbit closure, areindeed k-automatic.

◮ We were able to verify, for example, a recent result of Curriethat the lexicographically least sequence in the orbit closure ofthe Rudin-Shapiro sequence r is 0r.

40 / 68

Unbordered factors

◮ A word is bordered if it can be expressed as uvu for words u, vwith u nonempty, and otherwise it is unbordered.

◮ Currie and Saari proved that t has an unbordered factor oflength n if n 6≡ 1 (mod 6).

◮ However, these are not the only lengths with an unborderedfactor; for example,

0011010010110100110010110100101

is an unbordered factor of length 31.◮ We can express the property that t has an unbordered factor

of length ℓ as follows:

∃N ≥ 0 ∀j , 1 ≤ j ≤ ℓ/2 t[N..N+ j−1] 6= t[N+ℓ− j ..N+ℓ−1].

◮ Using our technique, we were able to prove

TheoremThere is an unbordered factor of length ℓ in t if and only iff(ℓ)2 6∈ 1(01∗0)∗10∗1.

41 / 68

Enumeration

◮ What if we want to know the number of unbordered factorsof length n, not just whether they exist or not?

◮ In many cases we can count the number T (n) of length-nfactors of an automatic sequence having a particular propertyP .

◮ Here by “count” we mean, give an algorithm A to computeT (n) efficiently, that is, in time bounded by a polynomial inlog n.

◮ Although finding the algorithm A may not be particularlyefficient, once we have it, we can compute T (n) quickly.

42 / 68

Enumeration

The basic idea is to create an automaton A accepting words of theform (n, i)k , where each i corresponds to one of the distinctlength-n factors we are trying to count.

The easiest way to do this is to let i be the position of the firstoccurrence of this factor.

Now the number of different paths in A whose first componentspells out (n)k gives us the number of different i , and hence thenumber of distinct factors.

From A we can determine vectors v and x and matricesM0,M1, . . . ,Mk−1 such that if [ci−1ci−2 · · · c1c0]k = n, then

T (n) = v · · ·Mci−1Mci−2 · · ·Mc1Mc0xT .

43 / 68

Enumeration example: Subword complexity

◮ Subword complexity counts the number of distinct length-nfactors of a sequence.

◮ To count these factors in an automatic sequence, we create aDFA M accepting the language

{ (ℓ, i)k : a[i ..i + ℓ− 1] is the first occurrence

of the given factor}

= { (ℓ, i)k : ∀i ′ < i a[i ..i + ℓ− 1] 6= a[i ′..i ′ + ℓ− 1]}.

◮ the number of i corresponding to a given ℓ is just the numberof subwords of length ℓ.

44 / 68

Enumeration: subword complexity

From this, for example, we can recover the well-known result onthe subword complexity of the Thue-Morse sequence:

ρ(n) =

2n, if 0 ≤ n ≤ 2;

2n + 2t+2 − 2, if 3 · 2t ≤ n ≤ 2t+2 + 1;

4n − 2t − 4, if 2t + 1 ≤ n ≤ 3 · 2t−1;

.

45 / 68

Enumeration: unbordered factors

A little numerical experimention suggests the following:

Conjecture

Let f (n) denote the number of unbordered factors of length n in t,the Thue-Morse sequence. Then f is given by f (0) = 1, f (1) = 2,f (2) = 2, and the system of recurrences

f (4n) = 2f (2n), (n ≥ 2)

f (4n + 1) = f (2n + 1), (n ≥ 0)

f (8n + 2) = f (2n + 1) + f (4n + 3), (n ≥ 1)

f (8n + 3) = −f (2n + 1) + f (4n + 2) (n ≥ 2)

f (8n + 6) = −f (2n + 1) + f (4n + 2) + f (4n + 3) (n ≥ 2)

f (8n + 7) = 2f (2n + 1) + f (4n + 3) (n ≥ 3)

for n ≥ 0.

46 / 68

Proving the conjecture on number of unbordered factors

◮ Express “first occurrence of unbordered factor t[i ..i + n− 1] isat position i” as a predicate P(n, i)

◮ Translate the predicate to a DFA accepting the language{(n, i)2 : P(n, i)}

◮ Translate the DFA to vectors v , x and matrices M0,M1 suchthat if n = [cj−1 · · · c1c0]2 then

f (n) = vMcj−1 · · ·Mc1Mc0xT

◮ Assertions about f correspond to certain assertions about M,a matrix that could occur as a product of the M0 and M1

◮ For example, the identity f (8n + 2) = f (2n + 1) + f (4n + 3)

vMM0M1M0xT = vMM1x

T + vMM1M1xT , (2)

where M is the matrix product corresponding to the base-2expansion of n.

47 / 68

Proving the conjecture on number of unbordered factors

◮ We have now reduced the problem to a finite verification.

◮ To reduce the amount of work even further, we can work withvM instead of M.

◮ With our identities in hand, we can now easily prove resultssuch as

◮ f (n) ≤ n for n ≥ 4 and◮ f (n) = n infinitely often.

48 / 68

Enumeration

In a similar way, we can handle

◮ palindrome complexity (the number of distinct length-npalindromic factors)

◮ the number of words whose reversals are also factors;

◮ the number of squares of a given length;

◮ the number of unbordered factors

and so forth.

49 / 68

Are there some properties that are not expressible?

◮ A difficult candidate: abelian properties

◮ We say that a nonempty word x is an abelian square if it ofthe form ww ′ with |w | = |w ′| and w ′ a permutation of w .(An example in English is the word reappear.)

◮ Luke Schaeffer has recently shown that the predicate forabelian squarefreeness of the paperfolding sequence is indeedinexpressible in Th(N,+, 0, 1, <,Vk)

◮ Nevertheless, some abelian properties are decidable by othermeans (Currie & Rampersad)

50 / 68

Synchronized sequences

(Carpi) A sequence f : N → N is k-synchronized if its graph

{ (n, f (n))k : n ≥ 0}

is a regular language.

51 / 68

Synchronized sequences

Example. The function f (n) = n + 1 is k-synchronized. Forexample, for k = 2, the following automaton suffices:

[

10

]

[

01

]

[

00

]

,

[

11

]

[

01

]

[

11

]

q0

q2

q1

52 / 68

Why synchronized sequences?

◮ If f (n) is k-synchronized, thenwe can compute f (n) in O(log n) time

◮ If f (n) is k-synchronized, then f (n) = O(n)

53 / 68

Efficient computation of synchronized sequences

To compute f (n) in O(log n) time:

◮ On input n, construct the O(log n)-state machine M ′

accepting those words with first component of the form0∗(n)k and second component anything

◮ Intersect, using the familiar direct product construction, withthe DFA M accepting { (n, f (n))k : n ≥ 0}

◮ Resulting automaton accepts exactly one word◮ Find accepting path using depth-first search◮ Label of accepting path gives f (n) in base k

54 / 68

Growth bounds

Theorem. If f (n) is k-synchronized, then f (n) = O(n).

Proof.

◮ Suppose f is k-synchronized and accepted by a DFA with tstates.

◮ If f (n) 6= O(n), then there exists n such that f (n) > k tn.

◮ So the base-k representation of (n, f (n)) starts with at least tzeros in the first component, and a nonzero symbol in thesecond component.

◮ Apply the pumping lemma to z = (n, f (n))k◮ We see that “pumping” gives a new word in the language

with n unchanged, but f (n) increased.

◮ But f is a function, a contradiction.

55 / 68

Closure properties of synchronized sequences

The class of k-synchronized sequences is closed under

◮ sum

◮ N-linear combination

◮ f (n) → ⌊αf (n)⌋ for α rational

◮ term-wise maximum and minimum

◮ running maximum: g(n) = max0≤i<n f (i)

◮ discrete inverse: g(n) = min{i : f (i) ≥ n}

◮ composition

56 / 68

Many aspects of k-automatic sequences are k-synchronized

Example: the appearance function.

Ax(n) = length of shortest prefix of x containing all length-nfactors of x

= the smallest integer t such that every length-n factor of xoccurs at least once in x[0..t − 1].

= t such that every length-n factor of x occurs in x[0..t − 1]but the length-n factor ending at position t − 1 occurs exactlyonce in x[0..t − 1]

57 / 68

Appearance function predicate

L = {(n, t)k : ∀ i ≥ 0 ∃ j ≤ t − n

such that x[i ..i + n − 1] = x[j ..j + n − 1]

and ∀ j ′ < t − n

x[j ′..j ′ + n − 1] 6= x[t − n..t − 1]}.

Here t = Ax(n).

58 / 68

Other synchronized functions

◮ separator function: length of the shortest factor of xbeginning at position n that never appeared previously in x

(Carpi & Maggi, 2001)

◮ repetitivity index: the minimal distance between twoconsecutive occurrences of the same length-n factor in x

(Carpi & D’Alonzo, 2009)

◮ recurrence function: size of the smallest “window” alwaysguaranteed to contain all length-n factors in x (Charlier &Rampersad & S, 2011)

59 / 68

Subword complexity

ρx(n) = number of distinct length-n factors of x

◮ known to be k-regular

◮ known to be O(n) for k-automatic sequences

◮ suggests it could be k-synchronized

60 / 68

Novel occurrences

Call a length-n factor novel at position i if it occurs there but inno earlier location.

Here is a predicate for novel factors:

{(n, i)k : ∀j , 0 ≤ j < i x[i ..i + n − 1] 6= x[j ..j + n − 1]}

Theorem. In any sequence of linear complexity, the startingpositions of novel occurrences of factors are “clumped together” ina bounded number of contiguous blocks.

61 / 68

Novel factors for Thue-Morse

Consider the Thue-Morse sequence

t = t0t1t2 · · · = 0110100110010110 · · · ,

The gray squares in the rows below depict the evolution of novellength-n factors in the Thue-Morse sequence for 1 ≤ n ≤ 9.

1

242322212019181716

n = 1

011010 0

2

3

t[i ]

3

9

8

7

6

5

4

00110100

654210i 7

110010110

15141312111098

62 / 68

Bound on number of contiguous blocks

TheoremLet x be an infinite word. For n ≥ 1, the number of contiguousblocks of starting occurrences of novel factors in row n is at mostρx(n)− ρx(n − 1) + 1.

Proof.By induction on n. Base case easy.Assume true for n − 1. We prove for n.Every position marking the start of a novel occurrence is still novel.Further, in every block except the first, we get novel occurrencesat one position to the left of the beginning of the block.So if row n − 1 has t contiguous blocks, then we get t − 1 noveloccurrences at the beginning of each block, except the first.The remaining ρx(n)− ρx(n− 1)− (t − 1) novel occurrences couldbe, in the worst case, in their own individual contiguous blocks.Thus row n has at most t + ρx(n)− ρx(n − 1)− (t − 1)= ρx(n)− ρx(n − 1) + 1 contiguous blocks.

63 / 68

Bound for Thue-Morse

For Thue-Morse example, it is well-known thatρt(n)− ρt(n − 1) ≤ 4.

So the number of contiguous blocks of novel factors is at most 5.

This is achieved, for example, for n = 6.

64 / 68

Linear complexity

Corollary

If the sequence x has linear complexity (that is, ρx(n) = O(n)),then there is a constant C such that every row in the evolution ofnovel occurrences consists of at most C contiguous blocks.

Proof.By a deep result of Cassaigne (1996) we know that there exists aconstant C such that ρx(n)− ρx(n − 1) ≤ C − 1. Hence from ourresult, there are at most C contiguous blocks in any row.

65 / 68

Subword complexity of automatic sequences is

k-synchronized

TheoremLet x be a k-automatic sequence. Then its subword complexityfunction ρx(n) is k-synchronized.

Proof.Construct a DFA to accept {(n,m)k : n ≥ 0 and m = ρx(n)}.

There is a finite constant C ≥ 1 such that the number ofcontiguous blocks of novel factors is bounded by C .

Nondeterministically “guess” the endpoints of every block andthen verify that each factor of length n starting at the positionsinside blocks is a novel occurrence, while all other factors are not.

Finally, verify that m is the sum of the sizes of the blocks.

66 / 68

Unsynchronized sequences

Are other aspects of k-automatic sequences always k-synchronized?

No.

We say a word w is bordered if it has a nonempty prefix, otherthan w itself, that is also a suffix. Alternatively, w is bordered if itcan be written in the form w = tvt, where t is nonempty.Otherwise a word is unbordered.

TheoremThe characteristic sequence c = 0110100010 · · · is 2-automatic,but the function uc(n) counting the number of unbordered factorsis not 2-synchronized.

67 / 68

Open Problems

◮ Extend these ideas to morphic sequences (fixed points ofpossibly non-uniform morphisms, followed by a coding)

◮ Is sup{x/y : (x , y)k ∈ L} computable for context-freelanguages L?

◮ Given a regular language L ⊆ (Σk × Σk)∗ representing a a set

S ⊆ N× N of pairs of natural numbers, is it decidable if Scontains a pair (p, q) with p | q?

68 / 68

Documents

Decidability and Enumeration in Automatic Sequences