Frédéric Wang Subscribe   About   Mathematics   Computer Science   Miscellaneous   Archive

S[α] for strings of ordinals

Update 2020/03/10: Added complexity analysis for S.substr(α, N)

In a previous blog post, I defined for each ordinal α\alpha a string SαS_\alpha (made of the characters for the empty set, comma, opening brace and closing brace) that enumerates the element of α\alpha. I gave a simple formulas to calculate the length LαL_\alpha of this string.

My colleague Ioanna was a bit disappointed that I didn’t provide a script for calculating the infinite SαS_\alpha strings. Obviously, the complexity would be “Ω(ω)\Omega(\omega)” but it is still possible to evaluate the string at a given position: Given β<Lα\beta < L_\alpha, what is the character of SαS_\alpha at position β\beta?

Since the initial segments of the strings are compatible, another way to express this is by introducing the class S=αOrdSαS = \bigcup_{\alpha \in {\mathrm{Ord}}} S_\alpha corresponding to a giant string enumerating the class Ord\mathrm{Ord} of all ordinals. Given an ordinal α\alpha, what is S[α]S[\alpha]?

A small generalization of this S.charAt(α) operation is S.substr(α, N) calculating the substring of length NN starting at α\alpha.

Example

  • S2S_{2} is "∅,{∅}", so S[0]S[0] is the character ∅, S[1]S[1] is a comma, S[2]S[2] is an opening brace and S[4]S[4] is a closing brace.
  • Sω+1S_{\omega+1} is made of of an ω\omega-concatenation of finite strings (the character ∅, a comma, S1S_1 surrounded by braces, a comma, S2S_2 surrounded by braces, a comma, S4S_4 surrounded by braces, a comma, etc), followed by a comma, an opening brace, the same ω\omega-concatenation of finite strings and finally a closing brace. So S[ω]S[\omega] is a comma, S[ω+1]S[\omega+1] is an opening brace, S[ω+2]S[\omega+2] is the character "∅" and S[ω2]S[\omega2] is a closing brace.

In the previous example, we have basically analyzed the string Sα+1S_{\alpha+1} at a given successor ordinal, splitting it into two copies of SαS_\alpha, comma and braces. This suggests some easy values of SS:

Lemma

For any n<ωn < \omega and β1\beta \geq 1, S[α]S[\alpha] is:

  • A comma if α\alpha can be written 2n32^n - 3 or ωβ2n+n\omega^\beta 2^n + n
  • An opening brace if α\alpha can be written 2n22^n - 2 or ωβ2n+n+1\omega^\beta 2^n + n + 1
  • A closing brace if α\alpha can be written 2n+142^{n+1} - 4 or ωβ2n+1+n\omega^\beta 2^{n+1} + n

Proof: For any ordinal α\alpha, by viewing the string Sα+1S_{\alpha+1} as a concatenation of SαS_\alpha, a comma, an opening fence, SαS_\alpha and a closing fence, we deduce that:

  • S[Lα]{S\left[L_{\alpha}\right]} is a comma.
  • S[Lα+1]{S\left[L_{\alpha} + 1 \right]} is an opening brace.
  • Lα+1L_{\alpha+1} is a successor ordinal and S[Lα+11]{S\left[L_{\alpha+1} - 1 \right]} is a closing brace.

The lemma follows immediately from the calculation of string lengths performed in the previous blog post. □

Warning: The rest of the blog post gives the solution to this puzzle, so you might want to have fun solving it yourself first and then go back checking my proposed solution later 😉...

More generally, the proof of the lemma can be extended by saying that if we find nn such that 2n1α2n+15{2^n - 1} \leq \alpha \leq {2^{n+1} - 5} then δ=α(2n1)2n4\delta = {\alpha - {(2^n - 1)}} \leq {2^n - 4} is the index of S[α]{S[\alpha]} in the second substring SαS_\alpha of Sα+1S_{\alpha+1} and so S[α]=S[δ]{S[\alpha]} = {S[\delta]}.

Details will be provided in the theorem below but one can already write a simple JavaScript recursive program to evalute SS at finite ordinals:

Script for α<ω\alpha < \omega

The character of SS at position α=\alpha =

is

The following intermediary step will be helpful to evaluate SS at infinite ordinal α\alpha:

Proposition

Let α\alpha is infinite and β=logω(α)1\beta = \log_{\omega}(\alpha) \geq 1. Let 1q<ω1 \leq q < \omega and 0ρ<ωβ0 \leq \rho < \omega^\beta be the quotient and remainder of the euclidean division of α\alpha by ωβ\omega^\beta. Let’s define:

n={log2(q) if q is not a power of 2 or this value is ρlog2(q)1 otherwise.n = \begin{cases} \left\lfloor {\log_2(q)} \right\rfloor & \text{ if } q \text{ is not a power of 2 or this value is } \leq \rho \\ \left\lfloor {\log_2(q)} \right\rfloor - 1 & \text{ otherwise.} \end{cases}

Then S[α]S[\alpha] is:

  • A comma if α=ωβ2n+n\alpha = {\omega^{\beta} 2^n + n}
  • An opening brace if α=ωβ2n+n+1\alpha = {\omega^{\beta} 2^n + n + 1}
  • An closing brace if α=ωβ2n+1+n\alpha = {\omega^{\beta} 2^{n+1} + n}
  • S[δ]S[\delta] otherwise where δ\delta is the unique ordinal such that α=ωβ2n+n+2+δ\alpha = {\omega^\beta 2^n + n + 2 + \delta} and δ<ωβ2n+n\delta < {\omega^{\beta} 2^n + n}.

Proof: First by construction we have α=ωβq+ρ\alpha = { {\omega^\beta q } + \rho}.

If the first case of the definition of nn, 2nq<2n+12^n \leq q < 2^{n+1} so we always have α<ωβ(q+1)ωβ2n+1Lωβ+n+1\alpha < {\omega^\beta {(q+1)}} \leq \omega^\beta 2^{n+1} \leq L_{\omega{\beta} + n + 1}. If additionnaly qq is not a power then 2n<q2^n < q and so αωβqωβ(2n+1)Lωβ+n\alpha \geq {\omega^{\beta}q} \geq {\omega^{\beta}{(2^n+1)}} \geq L_{\omega{\beta} + n}. Otherwise q=2nq = 2^n and nρn \leq \rho and so again αLωβ+n\alpha \geq L_{\omega{\beta} + n}.

In the second case of the definition of nn, log2(q)ρ+1\left\lfloor {\log_2(q)} \right\rfloor \geq \rho + 1 so nρ0n \geq \rho \geq 0. qq is a power of 2 and more precisely q=2n+1>2nq = 2^{n+1} > 2^n so we deduce the same way as in the previous case that αLωβ+n\alpha \geq L_{\omega{\beta} + n}. Moreover, ρ<n+1\rho < n + 1 so again α=ωβ2n+1+ρ<Lωβ+n+1\alpha = {\omega^\beta 2^{n+1} + \rho} < L_{\omega{\beta} + n + 1}.

We can thus view Sωβ+n+1S_{\omega \beta + n + 1} as a concatenation of Sωβ+nS_{\omega \beta+n}, a comma, an opening brace, Sωβ+nS_{\omega \beta+n} and a closing brace. We assume that ωβ2n+n+2α<ωβ2n+1+n { {\omega^{\beta} {2^n}} + n + 2} \leq \alpha < {\omega^{\beta} 2^{n+1} + n} as the three other cases are handled by the lemma. Then δ\delta is well-defined and is actually the index of S[α]S[\alpha] in the second copy of Sωβ+nS_{\omega \beta + n} so S[α]=S[δ]{S[\alpha]} = S{[\delta]}. □

We are now ready to give a nice way to evaluate SS at any ordinal α\alpha:

Theorem

S[α]S[\alpha] can be calculated inductively as follows:

  • If α=0\alpha = 0, S[α]S[\alpha] is the character "∅".
  • If 0<αω0 < \alpha \leq \omega then 2log2(α+3)3α2log2(α+3)+14 \right\rfloor}} - 3} \leq \alpha \leq \right\rfloor + 1}} - 4} and S[α]S[\alpha] is equal to:
    • A comma if α=2log2(α+3)3\alpha = {2^{\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor} - 3}
    • An opening brace if α=2log2(α+3)2\alpha = {2^{\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor} - 2}
    • A closing brace if α=2log2(α+3)+14\alpha = {2^{\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor + 1}} - 4
    • S[δ]S[\delta] otherwise where δ=α(2log2(α+3)1)2log2(α+3)4α3\delta = {\alpha - \left( 2^{\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor} - 1 \right)} \leq {2^{\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor} - 4} \leq \alpha - 3.
      Moreover δ\delta compares against α\alpha as follows: δα12+12log2(α+3)+134\frac{\delta}{\alpha} \leq \frac{1}{2} + \frac{1}{2^{\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor + 1}} \leq \frac{3}{4} and log2(δ+3)<log2(α+3) {\left\lfloor {\log_2{(\delta+3)}} \right\rfloor} < {\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor}
  • If α\alpha is infinite, let 0<c1<ω0 < c_1 < \omega be the coefficient in its Cantor normal form corresponding to the smallest infinite term and c0<ωc_0 < \omega be finite term if there is one, or zero otherwise. Let kk be the exponent corresponding to the smallest nonzero term in the binary decomposition of c1c_1. Then S[α]S[\alpha] is equal to:
    • A closing brace if c0k1c_0 \leq k - 1.
    • A comma if c0=kc_0 = k.
    • An opening brace if c0=k+1c_0 = k + 1.
    • S[c0(k+2)]S[{c_0 - {(k+2)}}] if c0k+2c_0 \geq k + 2.

Proof:

The case α=0\alpha = 0 is clear. Suppose 0<α<ω0 < \alpha < \omega and let l=log2(α+3)l = {\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor}. By definition 2lα+3<2l+12^l \leq \alpha + 3 < 2^{l+1} so 2l3α2l+142^l - 3 \leq \alpha \leq 2^{l+1} - 4. Let’s consider the case 2l1α2l+152^l - 1 \leq \alpha \leq 2^{l+1} - 5 as the three other cases are already known from the Lemma. Sl+1S_{l+1} is made of the concatenation of SlS_l, a comma and SlS_l surrounded by braces. δ\delta is actually the index of S[α]S[\alpha] in the second copy of SlS_l so S[α]=S[δ]{S[\alpha]} = S{[\delta]}. The first equality is straighforward:

δ=α(2l1)2l+15(2l1)=2l4=2l13α3\delta = {\alpha - {(2^l -1)}} \leq {2^{l+1} - 5 - {(2^l -1)}} = {2^l - 4} = {2^l - 1 - 3}\leq \alpha - 3

Morever we have:

δα12l12l+1+512l12l+112+12l+1 \frac{\delta}{\alpha} \leq {1 - \frac{2^l - 1}{2^{l+1}+5}} \leq {1 - \frac{2^l - 1}{2^{l+1}}} \leq {\frac{1}{2} + \frac{1}{2^{l+1}}}

and so the second inequality follows from the fact that l1l \geq 1. Finally, the third inequality comes from:

2log2(δ+3)δ+32l4+3<2l 2^{\left\lfloor {\log_2{(\delta+3)}} \right\rfloor} \leq {\delta + 3} \leq {2^l - 4 + 3} < 2^l

Let’s consider the case of an infinite α\alpha. For some N1N \geq 1, βN>βN1>>β11\beta_N > \beta_{N-1} > \dots > \beta_1 \geq 1 and c0<ωc_0 < \omega and 0<c1,c2,cN<ω0 < c_1, c_2, c_N < \omega, we can write Cantor’s Normal form as:

α=ωβNcN+ωβN1cN1++ωβ1c1+c0\alpha = { \omega^{\beta_N} c_N} + { \omega^{\beta_{N-1}} c_{N-1} } + \dots + { \omega^{\beta_{1}} c_{1} } + c_0

With the notation of the proposition, we have β=βN\beta = \beta_N, q=cNq = c_N and

ρ=ωβN1cN1++ωβ1c1+c0 \rho = { \omega^{\beta_{N-1}} c_{N-1} } + \dots + { \omega^{\beta_{1}} c_{1} } + c_0

If N2N \geq 2, then ρ\rho is infinite and so n=log2(q)n = \left\lfloor {\log_2(q)} \right\rfloor and we are in the fouth bullet of the proposition. Moreover q2nq \geq 2^n and we can write:

α=ωβ2n+ωβ(q2n)+ρ=ωβ2n+n+2+δ\alpha = { \omega^{\beta} 2^n} + {\omega^{\beta} \left(q-2^n\right)} + \rho = { \omega^{\beta} 2^n} + n + 2 + \delta

where δ=ωβ(q2n)+ρ\delta = {\omega^{\beta} \left(q-2^n\right)} + \rho is infinite and so cancels out the n+2n + 2 term. It follows that S[α]=S[ωβ(q2n)+ρ]S{[\alpha]} = S\left[ {\omega^{\beta} \left(q-2^n\right)} + \rho \right]. Essentially, we have just removed from cNc_{N} its term of highest exponent in its binary decomposition!

By repeated application of the theorem, we can remove each binary digit of the cic_i for ii going from NN to 22. When then arrive at i=1i = 1:

S[α]=S[ωβ1c1+c0]{S[\alpha]} = S\left[ \omega^{\beta_1} c_1 + c_0\right]

With the notation of the proposition, we now have β=β1\beta = \beta_1, q=c1q = c_1 and ρ=c0\rho = c_0. If the binary decomposition of c1c_1 has more than one nonzero digit then so qq is not a power of 2. So although ρ\rho is now finite, we are still in the first case of the proposition and δ\delta remains infinite. So we can remove all but the last digit of c1c_1 by repeated application of the proposition:

S[α]=S[ωβ12k+c0]{S[\alpha]} = S\left[ \omega^{\beta_1} 2^k + c_0\right]

where kk is the exponent corresponding to the smallest nonzero term in the binary decomposition of c1c_1.

Using the lemma, S[α]S[\alpha] is a comma if k=c0k = c_0, an opening brace if k=c01k = c_0 - 1 and a closing brace if k=c0+1k = c_0 + 1.

If kc02k \leq c_0 - 2 then k=log2(2k)c0k = \left\lfloor {\log_2(2^k)} \right\rfloor \leq c_0 and writing

ωβ12k+c0=ωβ12k+k+2+(c0(k+2)){\omega^{\beta_1} 2^k + c_0} = \omega^{\beta_1} 2^k + k + 2 + \left(c_0 - {(k + 2)}\right)

we deduce from the proposition that S[α]=S[c0(k+2)]{S[\alpha]} = {S[{c_0 - {(k+2)}}]}.

Finally, if kc0+2k \geq c_0 + 2 then k=log2(2k)>c0k = \left\lfloor {\log_2(2^k)} \right\rfloor > c_0 and writing

ωβ12k+c0=ωβ12k1+k1+2+ωβ12k1+c0{\omega^{\beta_1} 2^k + c_0} = {\omega^{\beta_1} 2^{k-1}} + k - 1 + 2 + {\omega^{\beta_1} 2^{k-1}} + c_0

we deduce from the proposition that

S[α]=S[ωβ12k1+c0]{S[\alpha]} = S\left[ \omega^{\beta_1} 2^{k-1} + c_0\right]

We have essentially decremented kk and we can repeat this until we reach the case k=c0+1k = c_0 + 1 for which we already said that the character is a closing brace. □

As an application of this theorem, here is a few simple exercises:

Exercise 1

  1. S[272+2]S\left[2^72 + 2\right] is an empty set.
  2. S[ω72]S\left[\omega^72\right] is a comma.
  3. S[ω7272+72]S\left[\omega^72 72 + 72\right] is a closing brace.
  4. S[ω7272+ω4242+12]S\left[\omega^72 72 + \omega^42 42 + 12\right] is an opening brace.

Exercise 2

The evaluation of SS at

Corollary 1: Time complexity

Let α\alpha be an ordinal. Let c1<ωc_1 < \omega be the coefficient in its Cantor normal form corresponding to the smallest infinite term if there is one, or zero otherwise. Let c0<ωc_0 < \omega be its finite term if there is one, or zero otherwise. Then S[α]S[\alpha] can be evaluated in:

  • O(log2(c0+2)2+log2(c1+2))O\left( \log_2{(c_0+2)}^2 + \log_2{(c_1+2)} \right) elementary arithmetic operations and comparisons on integers.
  • O(log2(c0+2))O\left( \log_2{(c_0+2)}\right) elementary operations if integers are represented in binary and leading/trailing zero counting and bit shifts are elementary operations.

Proof: First note that the “+ 2” is just to workaround for the edge cases c0=0c_0 = 0 or c1=0c_1 = 0.

For the infinite case c1>0c_1 > 0 we need to calculate kk that is performing the find first set. A naive implementation can be done in O(log2(c1+2))O(\log_2{(c_1+2)}) steps by browsing the digits of c1c_1 to find the first nonzero for example by calculating the remainder modulo increasing power of 2. Each iteration requires only O(1)O(1) elementary integer operations %, * and comparisons. Then returning the result of moving to the finite case requires O(1)O(1) integer operations +, − and comparisons.

For the finite case c1=0c_1 = 0 and α=c0\alpha = c_0, first notice that we only require O(log2(c0+2))O\left( \log_2{(c_0+2)}\right) recursive calls given the inequality:

log2(δ+3)<log2(α+3) {\left\lfloor {\log_2{(\delta+3)}} \right\rfloor} < {\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor}

The case α=0\alpha = 0 only requires one comparison. For the case α>0\alpha > 0, we need to calculate l=log2(α+3)l = {\left\lfloor {\log_2{(\alpha+3)}} \right\rfloor} which is integer rounding of the binary logarithm or even just 2l2^l. As above, we can provide a naive implementation by calculating the quotient modulo increasing power of 2 in O(l)O(l) comparisons and elementary integer operations /, *. Then returning the result of moving to a δ<α\delta < \alpha requires only O(1)O(1) integer operations and comparisons. In total, complexity is O(log2(c0+2)2)O\left( \log_2{(c_0+2)}^2\right).

Finally, this can be simplified if one calculates kk and ll by a simple leading/trailing zero counting (or similar) and 2l2^l by a bit shift. □

Script based on Cantor normal form

If the coefficients of Cantor normal form of α\alpha corresponding the smallest infinite term and finite term are
c1c_1 =
c0c_0 =
Then the character of SS at position α\alpha is

Once we have an algorithm for S.charAt(α), it is easy to get an algorithm of NN times that complexity for S.substr(α, N) calculating the substring of length NN starting at position α\alpha by repeated calls to S.charAt(α). Let’s analyze a bit more carefully how we can make this recursive and more efficient:

Corollary 2: Algorithm for S.substr(α, N)

Let α\alpha be an ordinal and N<ωN < \omega. Then S.substr(α, N) can be calculated as follows:

  • If N=0N = 0 then it is the empty string.
  • If α=0\alpha = 0 and N=1N = 1 then it is the character "∅".
  • If 0<α<ω0 < \alpha < \omega, let l=log2(α+N+2)l = \left\lfloor {\log_2{(\alpha+N+2)}} \right\rfloor. We have 2l3α+N12l+142^l-3 \leq {\alpha + N - 1} \leq 2^{l+1} - 4 and the result is obtained by concatenating the following strings:
    1. If α2l4\alpha \leq 2^l - 4, the substring at offset α\alpha and length 2l3α2^l - 3 - \alpha.
    2. If α2l3\alpha \leq {2^l - 3}, a comma.
    3. If α2l2α+N1\alpha \leq {2^l - 2} \leq {\alpha + N - 1}, an opening brace.
    4. If α+N12l1{\alpha + N - 1} \geq {2^l - 1}, the substring at offset δ=max(0,α(2l1))\delta = \mathrm{max}{(0, \alpha - {(2^l - 1)})} and length 1+min(α+N1,2l+15)max(α,(2l1))1 + \mathrm{min}{({\alpha + N - 1}, {2^{l+1} - 5})} - \mathrm{max}{(\alpha, {(2^l - 1)})}.
    5. If α+N1=2l+14{\alpha + N - 1} = {2^{l+1} - 4}, a closing brace.
  • If α\alpha is infinite, let 0<c1<ω0 < c_1 < \omega be the coefficient in its Cantor normal form corresponding to the smallest infinite term and c0<ωc_0 < \omega be finite term if there is one, or zero otherwise. Let kk be the exponent corresponding to the smallest nonzero term in the binary decomposition of c1c_1. Then the result is obtained by a concatenating the following strings:
    1. If c0k1c_0 \leq k - 1, min(N,kc0){\mathrm{min}{(N, k - c_0)}} closing braces.
    2. If c0kc0+N1c_0 \leq k \leq c_0 + N - 1, a comma.
    3. If c0k+1c0+N1c_0 \leq k + 1 \leq c_0 + N - 1, an opening brace.
    4. If c0+N1k+2c_0 + N - 1 \geq k + 2, the substring at offset δ=max(0,c0(k+2))\delta = \mathrm{max}{({0, {c_0 - {(k + 2)}}})} and length c0+Nmax(c0,k+2)c_0 + N - {\mathrm{max}{(c_0, k + 2)}}.

Moreover, this only adds O(N)O(N) compared to the complexity of evaluating to a single offset.

Proof: The algorithm is just direct application of the Theorem. For the case where αω\alpha \geq \omega, the only change is that we add at most NN characters before moving to the finite case. The case α<ω\alpha < \omega is essentially a divide-and-conquer algorithm and we have a relation of the form:

T(L)=2T(L2)+f(L){T(L)} = 2{T\left(\frac{L}{2}\right)} + {f(L)}

where L=2l=Θ(α+N)L = 2^l = {\Theta{(\alpha + N)}} and f(L)f{(L)} is O(1)O(1) or O(log2L)O(\log_2{L}) depending on available operations, but in any case O(L)O{(\sqrt{L})}. So from the master theorem, T(L)=O(L)=O(α+N){T{(L)}} = {O(L)} = {O(\alpha+N)}. In general, this bound is not as good as repeating NN calls to S.charAt!

However, we note that if we assume that α>N4\alpha > N - 4 then α+2+N<2(α+3){\alpha + 2 + N} < {2{(\alpha+3)}} and so

log2(α+2+N)<1+log2(α+3)\log_2\left(\alpha + 2 + N\right) < {1 + \log_2{(\alpha+3)}}

We can easily discard the edge case where these start/end offsets point to a brace surrounding the right substring of the iterative step and so we get:

log2(α+N+2)=log2(α+3)\left\lfloor {\log_2{(\alpha+N+2)}} \right\rfloor = \left\lfloor {\log_2{(\alpha+3)}} \right\rfloor

which means that the left substring is just empty and so the complexity is not changed compared to S.charAt. Finally, when αN4\alpha \leq N - 4 the previous bound tells us that the steps are done in O(N)O(N). □

Script for S.substr(α, N)

If the coefficients of Cantor normal form of α\alpha corresponding the smallest infinite term and finite term are
c1c_1 =
c0c_0 =
Then the character of SS substring of length NN =
and offset α\alpha is