S[α] for strings of ordinals
Update 2020/03/10: Added complexity analysis for S.substr(α, N)
In a previous blog post, I defined for each ordinal a string (made of the characters for the empty set, comma, opening brace and closing brace) that enumerates the element of . I gave a simple formulas to calculate the length of this string.
My colleague Ioanna was a bit disappointed that I didn’t provide a script for calculating the infinite strings. Obviously, the complexity would be “” but it is still possible to evaluate the string at a given position: Given , what is the character of at position ?
Since the initial segments of the strings are compatible, another way to express this is by introducing the class corresponding to a giant string enumerating the class of all ordinals. Given an ordinal , what is ?
A small generalization of this S.charAt(α) operation is S.substr(α, N) calculating the substring of length starting at .
Example
- is "∅,{∅}", so is the character ∅, is a comma, is an opening brace and is a closing brace.
- is made of of an -concatenation of finite strings (the character ∅, a comma, surrounded by braces, a comma, surrounded by braces, a comma, surrounded by braces, a comma, etc), followed by a comma, an opening brace, the same -concatenation of finite strings and finally a closing brace. So is a comma, is an opening brace, is the character "∅" and is a closing brace.
In the previous example, we have basically analyzed the string at a given successor ordinal, splitting it into two copies of , comma and braces. This suggests some easy values of :
Lemma
For any and , is:
- A comma if can be written or
- An opening brace if can be written or
- A closing brace if can be written or
Proof: For any ordinal , by viewing the string as a concatenation of , a comma, an opening fence, and a closing fence, we deduce that:
- is a comma.
- is an opening brace.
- is a successor ordinal and is a closing brace.
The lemma follows immediately from the calculation of string lengths performed in the previous blog post. □
More generally, the proof of the lemma can be extended by saying that if we find such that then is the index of in the second substring of and so .
Details will be provided in the theorem below but one can already write a simple JavaScript recursive program to evalute at finite ordinals:
Script for
is
The following intermediary step will be helpful to evaluate at infinite ordinal :
Proposition
Let is infinite and . Let and be the quotient and remainder of the euclidean division of by . Let’s define:
Then is:
- A comma if
- An opening brace if
- An closing brace if
- otherwise where is the unique ordinal such that and .
Proof: First by construction we have .
If the first case of the definition of , so we always have . If additionnaly is not a power then and so . Otherwise and and so again .
In the second case of the definition of , so . is a power of 2 and more precisely so we deduce the same way as in the previous case that . Moreover, so again .
We can thus view as a concatenation of , a comma, an opening brace, and a closing brace. We assume that as the three other cases are handled by the lemma. Then is well-defined and is actually the index of in the second copy of so . □
We are now ready to give a nice way to evaluate at any ordinal :
Theorem
can be calculated inductively as follows:
- If , is the character "∅".
- If then and is equal to:
- A comma if
- An opening brace if
- A closing brace if
- otherwise where
.
Moreover compares against as follows: and
- If is infinite, let be the coefficient
in its Cantor normal form corresponding to the smallest infinite term
and be finite term if there is one, or zero otherwise.
Let be the exponent corresponding to the smallest nonzero
term in the binary decomposition of .
Then is equal to:
- A closing brace if .
- A comma if .
- An opening brace if .
- if .
Proof:
The case is clear. Suppose and let . By definition so . Let’s consider the case as the three other cases are already known from the Lemma. is made of the concatenation of , a comma and surrounded by braces. is actually the index of in the second copy of so . The first equality is straighforward:
Morever we have:
and so the second inequality follows from the fact that . Finally, the third inequality comes from:
Let’s consider the case of an infinite . For some , and and , we can write Cantor’s Normal form as:
With the notation of the proposition, we have , and
If , then is infinite and so and we are in the fouth bullet of the proposition. Moreover and we can write:
where is infinite and so cancels out the term. It follows that . Essentially, we have just removed from its term of highest exponent in its binary decomposition!
By repeated application of the theorem, we can remove each binary digit of the for going from to . When then arrive at :
With the notation of the proposition, we now have , and . If the binary decomposition of has more than one nonzero digit then so is not a power of 2. So although is now finite, we are still in the first case of the proposition and remains infinite. So we can remove all but the last digit of by repeated application of the proposition:
where is the exponent corresponding to the smallest nonzero term in the binary decomposition of .
Using the lemma, is a comma if , an opening brace if and a closing brace if .
If then and writing
we deduce from the proposition that .
Finally, if then and writing
we deduce from the proposition that
We have essentially decremented and we can repeat this until we reach the case for which we already said that the character is a closing brace. □
As an application of this theorem, here is a few simple exercises:
Exercise 1
- is an empty set.
- is a comma.
- is a closing brace.
- is an opening brace.
Exercise 2
The evaluation of at
- Limit ordinals is either a comma or a closing brace.
- Epsilon numbers is a comma.
- Indecomposable ordinals is a comma.
Corollary 1: Time complexity
Let be an ordinal. Let be the coefficient in its Cantor normal form corresponding to the smallest infinite term if there is one, or zero otherwise. Let be its finite term if there is one, or zero otherwise. Then can be evaluated in:
- elementary arithmetic operations and comparisons on integers.
- elementary operations if integers are represented in binary and leading/trailing zero counting and bit shifts are elementary operations.
Proof: First note that the “+ 2” is just to workaround for the edge cases or .
For the infinite case we need to calculate that is performing the find first set. A naive implementation can be done in steps by browsing the digits of to find the first nonzero for example by calculating the remainder modulo increasing power of 2. Each iteration requires only elementary integer operations %, * and comparisons. Then returning the result of moving to the finite case requires integer operations +, − and comparisons.
For the finite case and , first notice that we only require recursive calls given the inequality:
The case only requires one comparison. For the case , we need to calculate which is integer rounding of the binary logarithm or even just . As above, we can provide a naive implementation by calculating the quotient modulo increasing power of 2 in comparisons and elementary integer operations /, *. Then returning the result of moving to a requires only integer operations and comparisons. In total, complexity is .
Finally, this can be simplified if one calculates and by a simple leading/trailing zero counting (or similar) and by a bit shift. □
Script based on Cantor normal form
=
=
Then the character of at position is
Once we have an algorithm for S.charAt(α), it is easy to get an algorithm of times that complexity for S.substr(α, N) calculating the substring of length starting at position by repeated calls to S.charAt(α). Let’s analyze a bit more carefully how we can make this recursive and more efficient:
Corollary 2: Algorithm for S.substr(α, N)
Let be an ordinal and . Then S.substr(α, N) can be calculated as follows:
- If then it is the empty string.
- If and then it is the character "∅".
- If , let . We have and the result is obtained by concatenating the following strings:
- If , the substring at offset and length .
- If , a comma.
- If , an opening brace.
- If , the substring at offset and length .
- If , a closing brace.
- If is infinite, let be the coefficient
in its Cantor normal form corresponding to the smallest infinite term
and be finite term if there is one, or zero otherwise.
Let be the exponent corresponding to the smallest nonzero
term in the binary decomposition of . Then the result is obtained by a concatenating the following strings:
- If , closing braces.
- If , a comma.
- If , an opening brace.
- If , the substring at offset and length .
Moreover, this only adds compared to the complexity of evaluating to a single offset.
Proof: The algorithm is just direct application of the Theorem. For the case where , the only change is that we add at most characters before moving to the finite case. The case is essentially a divide-and-conquer algorithm and we have a relation of the form:
where and is or depending on available operations, but in any case . So from the master theorem, . In general, this bound is not as good as repeating calls to S.charAt!
However, we note that if we assume that then and so
We can easily discard the edge case where these start/end offsets point to a brace surrounding the right substring of the iterative step and so we get:
which means that the left substring is just empty and so the complexity is not changed compared to S.charAt. Finally, when the previous bound tells us that the steps are done in . □
Script for S.substr(α, N)
=
=
Then the character of substring of length =
and offset is