Frédéric Wang    About    Mathematics    Computer Science    Miscellaneous    Archive

MathML Improvements in WebKit

In a previous blog post, I explained the work made by Igalia’s web platform team to refactor WebKit’s MathML layout classes. I stated that although some rendering improvements were a nice side effect, the main goal of the first phase was really to clean the code up so that it is easier for developers to work on MathML in the future. Indeed this really made things easier to review: Quite unexpectedly to me, the second phase only took 4 days to be upstreamed… Kudos to Brent Fulgham for having reviewed so many patches in such a short period of time!

In this blog post, I am going to give an overview of the improvements made during these two first phases taking changeset r203109 as a reference. The changes will be available in WebKitGTK+ 2.14 in September and are likely to be included this month in the next Safari Technology Preview. It definitely remains more work to do such as the third phase or other rendering improvements, but I believe we have already made a big step forward!

Mathematical Fonts

Two years ago, basic support for operator stretching via the OpenType MATH table was added to WebKit. During the refactoring, we improved that support and also made use of more parameters to improve the math layout (see section about OpenType MATH parameters below). While Latin Modern Math will be used in most screenshots, the following one shows that you can use any math fonts. By default WebKit will try and use one of these fonts but if none are available or if you force a non-math font-family then the rendering quality may not be good.

The following screenshot gives the rendering for various fonts. For the last one we used the value sans-serif to illustrate issues with non-math fonts (displaystyle integral too small, mathvariant italic glyphs taken from another font, missing italic correction etc).

Screenshots of a formula with various fonts Integral obtained by contour integration
WebKitGTK+ r203109 with various font-family values.
From top to bottom: TeX Gyre Schola, Latin Modern, DejaVu Math, STIX, and sans-serif.

The href Attribute

This new feature is obvious: You can now create a hyperlink for any part of a mathematical formula! Here is a screenshot of the MathML Torture Test 21 with additional links, as displayed in WebKit r203109. Click the image to load the test case in your browser and test it.

Screenshot of MathML Torture test 21 with hyperlinks

The mathvariant Attribute

Unicode contains Mathematical Alphanumeric Symbols to convey special meaning such as double-struck or specific Arabic styles. Characters for these symbols are generally provided by math fonts. In MathML, mathematical variables are automatically rendered using the italic characters from this Unicode block. One can also access these characters via the mathvariant attribute and that attribute is actually used by many LaTeX-to-MathML converters.

In the following screenshot, you can see that the letters f, x and y are now drawn with this special mathematical italic glyphs and that WebKit uses the conventional fraktur style for the Lie algebra g. Note that the prime is still too small because WebKit does not make use of the ssty feature yet.

Screenshot of Wikipedia article on Lie Algebra Homomorphism of Lie algebra
Top: Safari 9.1.1. Bottom: Safari r203109.

Operators and Spacing

As said in my previous blog post, the rendering of large and stretchy operators have been rewritten a lot and as a consequence the rendering has improved. Also, I mentioned that the width of operators may depend on their height. This may cause accumulated approximations during the computation of preferred widths. The old flexbox-based implementation incorrectly forced layout during preferred computation to avoid that but a quick workaround for that security concern caused the approximate preferred widths to be used for the logical widths. With our new implementation, the logical width is now correctly calculated. Finally, we added partial support for the mpadded element which is often used to tweak spacing in mathematical formulas.

The screenshot below illustrates the fix for a serious regression with large operator (summation symbol) as well as improvements in the rendering of stretchy operators (horizontal braces). Note that the formula has a hack with a zero-width mpadded element which used to cause improper spacing (large gap between the group of a’s and the group of b’s).

Screenshot from Mozilla's MathML torture test using Safari 9.1.1 or the the current refactoring Tests 21 and 22 from the MathML torture test
Left: Safari 9.1.1. Right: Safari r203109.

The following screenshot shows how incorrect width computations used to cause excessive spacing after the stretchy radical and slash symbols:

Screenshot of Wikipedia article on Trigonometric Functions Sine of 1 degree
Top: Safari 9.1.1. Bottom: Safari r203109.

The displaystyle Property

Mathematical formulas can be integrated inside a paragraph of text (inline math in TeX terminology) or displayed in its own horizontally centered paragraph (display math in TeX terminology). In the latter case, the formula is in displaystyle and does not have any restrictions on vertical spacing. In the former case, the layout of the mathematical formula is modified a bit to optimize this vertical spacing and to better integrate within the surrounding text. The displaystyle property can also be set using the corresponding attribute or can change automatically in subformulas (e.g. in fractions).

In the following screenshot the fix for the large operator regression is obvious but you can also notice that the summation is now slightly different for the definition of a Bézier curve (top) and for the one of a rational Bézier curve (bottom). For example, to save some vertical space in the fractions, the sigma symbol is drawn smaller and the scripts attached to it are moved on its right. However, the script size could still be improved when we implement the scriptlevel property.

Screenshot of Wikipedia article on Bézier Curves Definitions of Bézier Curve
Left: Safari 9.1.1. Right: Safari r203109.

OpenType MATH Parameters

Many new OpenType MATH features have been implemented following the MathML in HTML5 Implementation Note:

  • Use of the AxisHeight parameter to set vertical position of fractions, tables and symmetric operators.
  • Use of layout constants for radicals (some of them were already used), scripts and fractions. This improves math spacing and positioning and allows to adjust them according to value of the displaystyle property discussed in the previous section.
  • Use of the italic correction of large operator glyph to set the position of subscripts.

The screenshots below illustrate some of these improvements. For the first one, the use of AxisHeight allows to better align the fraction bar with the plus sign. For the second one, the use of layout constants for scripts as well as the italic correction of the integral improves the placement of limits.

Screenshot of Wikipedia article on the Continued Fractions Generalized Continued Fraction
Top: Safari 9.1.1. Bottom: Safari r203109.
Screenshot of Wikipedia article on the Gamma Function Definition of the Gamma Function
Top: Safari 9.1.1. Bottom: Safari r203109.

Right-to-left radicals

One of the advantage of the old flexbox-based implementation is that right-to-left layout was available for free. This support has of course been preserved in the new implementation. We also added a simple workaround to mirror radicals using a scale transform as shown in the screenshot below. However, general glyph-level mirroring is still missing.

Screenshot of Wikipedia article on Lie Algebra Test 13 from the MathML torture test (Machrek Style)
Top: Safari 9.1.1. Bottom: Safari r203109.

Conclusion

Igalia’s web platform team has been able to follow the MathML in HTML5 Implementation Note in order to significantly improve the rendering of mathematical expressions in WebKit. More work remains to do but we will definitely appreciate any feedback that can help improving native MathML support in web engines. We are also excited to continue work and collaboration at the next Web Engines Hackfest!

MathML Refactoring in WebKit

Introduction

If you follow WebKit developments, you are certainly aware that Igalia has been working on WebKit’s MathML implementation for some time. More recently, effort has been made to write a clean implementation addressing issues reported by WebKit reviewers in the past. After joining Igalia in March, I have been in charge of getting this work reviewed and merged into WebKit’s development branch. In the past four months, we have been successful in upstreaming the first phase of the refactoring and the work accomplished is described in this blog post.

Screenshot from Mozilla's MathML torture test using Safari 9.1.1 or the the current refactoring
MathML torture tests with the Latin Modern Math font
Left: Safari 9.1.1. Right: Current MathML branch.

Note that the focus was on code refactoring so the improvement may not be obvious for non-developers. Nevertheless many issues have already been fixed as a consequence of that work: math italic, position of scripts, stretchy and large operators, rendering update and more. More importantly, this preliminary step opens the way for beautiful math rendering based on TeX rules and the OpenType MATH table. Rendering improvements and implementation of new features have already started in the next phase of the refactoring, so stay tuned…

Design Issues

As explained in a previous report, the main design issues in the flexbox-based implementation released in 2013 can essentially be summarized in three points:

  1. WebKit’s code to stretch operator was not efficient at all and was limited to some basic fences buildable via Unicode characters.
  2. WebKit’s MathML code violated many layout invariants, making the code unreliable.
  3. WebKit’s MathML code relied heavily on the C++ renderer classes for flexboxes and had to manage too many anonymous renderers.

For the first point, the performance issue had been fixed by Igalia developers right after the initial feedback from WebKit developers and we improved that again during our refactoring. Partial support for the OpenType MATH table was added during my crowdfunding project and allowed to stretch more operators with the help of math fonts. For the second point, the main issue was also fixed right after the initial feedback. However one could still have some doubts regarding the layout steps, given the complexity implied by the third point. That last issue was still problematic so far and addressing it was the main achievement of our refactoring.

Technically, the dependence on flexbox is unnecessary and the implementation actually only used a limited set of flexbox features. Thus executing the whole flexbox code was overkill. It can also be a burden for people working on other places of the layout code. For example Javi Fernández has worked on improving the box alignments in the past and he had hard time fixing the MathML code impacted by his changes. This is probably the cause of the bad position of the summation symbol that can be seen in the screenshot above.

From the layout perspective, most of the rendering logic was implemented in the flexbox classes and the MathML “renderer” classes were really just managing the creation and update of anonymous nodes and style. Although this sounds good code reuse it actually made impossible to understand how and when layout steps happen or to add more advanced features. The new implementation replaced this manipulation of the render tree with simple arithmetic calculations on box metrics which is safer and more reliable. Also, complex renderers such as RenderMathMLScripts or RenderMathMLRoot actually achieve better rendering quality with less code!

As an example of the complexity, RenderMathMLUnderOver can behave as a RenderMathMLScripts in some situation so we really want the former class to reuse the latter class. As we will see below the old implementation of the two renderers were quite different: RenderMathMLUnderOver only relied on setting column direction in the user agent stylesheet while RenderMathMLScripts created a complex render tree structures with anonymous style. Hence it seemed difficult to share the two cases or to handle DOM changes causing to move from one case to the other one. With our new implementation, this is simply reduced to simple C++ inheritance.

When I started to work on WebKit some years ago, I made the mistake of continuing with the existing approach. The implementation of multiscripts or automatic italic mathvariant added more anonymous objects and made the situation even worse. After the end of my crowdfunding project, Alex Castro did more cleanup and tried to implement important features such as displaystyle but he also soon realized that it was too hard to work with the current code base…

Layout Refactoring

In order to solve the issues discussed in the previous section, Javi and Alex worked on a new MathML branch where the first step was to remove the inheritance on the flexbox layout classes. During the Web Engines Hackfest 2015, I collaborated with the Igalia’s web platform team team to continue the work on this branch. In a second step, we rewrote many MathML renderer functions so that they stop creating anonymous nodes or style. We obtained very encouraging results: The implementation looked much simpler and much more understandable!

Alex announced the initial plan on the webkit-dev mailing list. He started opening bugs and attaching patches to merge the first step. However, that step still required many of the flexbox logic and so made code hard to understand for reviewers. Hence when I joined Igalia four months ago Alex asked me to try and see how to reorganize patches so that the two initial steps can be submitted in one go. This corresponds to the first phase mentioned in the introduction. As indicated on the wiki page, the layout refactoring consisted in rewriting the following member functions of each renderer class:

  • computePreferredLogicalWidths: calculate preferred widths, based on the preferred widths of child renderers.
  • layoutBlock: set final position and size of child renderers.
  • firstLineBaseLine: calculate the ascent of the renderer.
  • paint (optional): perform special painting such as fraction bars.

Refactored renderers no longer rely on any flexbox code nor anonymous renderers and the functions mentioned above essentially perform arithmetic computations. By reading the code, one can be sure that we follow standard layout rules and that we do not perform unnecessary reflow. Moreover, the rules specific to math rendering are only located in the MathML renderers and can be better understood. Details for each class are provided in the next subsections. After all the layout functions were rewritten and the code managing the render tree structure removed, we were able to make the RenderMathMLBlock class inherit from RenderBlock instead of RenderFlexibleBox. Many of the bugs could then be immediately closed or otherwise fixed with small follow-up patches.

Spacing

RenderMathMLSpace is a simple class inserting blank boxes for adjusting spacing of math formulas. Obviously, we do not need any of the complexity of flexbox so it was straightforward to write the layout functions.

3x Large space between 3 and x.

Grouping

RenderMathMLRow performs rendering of a row of math items. Since WebKit does not support linebreaking in MathML at the moment, this is just putting child boxes on a same baseline. One specificity is that some operators can be stretched vertically and so their width may depend on their height.

{ 2x x 3 Row containing a stretched brace, a fraction and a scripted element.

Again, flexbox features are useless here. With the old code, it was not clear whether we were violating the CSS invariant with preferred and logical widths and which kind of relayout or render tree changes would happen when doing the stretch call. By properly implementing the layout functions previously mentioned all of this became much more trustable.

Fractions

RenderMathMLFraction draws a fraction with numerator and denominator.

x+1y+2 Simple fraction.

This used to be implemented using a column direction for the fraction element. Numerator and denominator were wrapped into anonymous nodes with additional style to leave space for the fraction bar and to adjust the horizontal alignments.

RenderMathMLFraction (flexbox with column direction)
  RenderMathMLBlock (anonymous flexbox)
    RenderMathMLRow (numerator)
      ...
  RenderMathMLBlock (anonymous flexbox)
    RenderMathMLRow (denominator)
      ...

It was relatively easy to implement this without any anonymous nodes and again the use of flexbox did not sound justified. For example, to calculate the preferred width we just take the maximum of the preferred widths of the numerator and denominator. For the layout, the calculation of the logical width is similar and we calculate the horizontal coordinates of numerator and denominator so that their centers are aligned. Vertical metrics are similarly calculated from the vertical metrics of the numerator and denominator. During that step, we also fixed some bugs with the linethickness attribute and added support for some OpenType MATH table constants.

Scripts above and below

RenderMathMLUnderOver is used to attach some scripts above and below a base. Each child can itself be a horizontal stretchy operator.

base Base with stretchy arrow over it.

This was implemented in the user agent stylesheet by using flexboxes with column direction for the corresponding MathML elements and the C++ class had additional rules to fire the stretching. So the problems and solutions for this class were essentially a mixed of the cases of RenderMathMLFraction and RenderMathMLRow we just discussed.

Subscripts and Superscripts

RenderMathMLScripts is used for a base with some arbitrary number of scripts. All the scripts can have different positions (pre, post, sub, super) and metrics (width, ascent and descent). We must avoid collisions and take care of horizontal and vertical alignements.

base a b c d e f Base with pre and post scripts.

The old code used a complex render tree with additional style to achieve the best possible result. However, the quality was still bad as you can see for the script attached to the integral in the screenshot above. Managing the render tree was a nightmare: Just to give the idea, additional anonymous node and style were used to allow horizontal and vertical adjustments (similar to RenderMathMLFraction above) and prescripts had negative order property so that they were positioned before the base.

RenderMathMLScripts
    Base Wrapper (anonymous flexbox)
      RenderMathMLRow (base)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction)
      RenderMathMLRow (post-subscript)
        ...
      RenderMathMLRow (subperscript)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction)
      RenderMathMLRow (post-subscript)
        ...
      RenderMathMLRow (post-superscript)
        ...
    ... (more postscripts)
    RenderMathMLBlock (prescripts separator)
    SubSupPair Wrapper (anonymous flexbox with column direction and order -1)
      RenderMathMLRow (pre-subscript)
        ...
      RenderMathMLRow (pre-subperscript)
        ...
    SubSupPair Wrapper (anonymous flexbox with column direction and order -1)
      RenderMathMLRow (pre-subscript)
        ...
      RenderMathMLRow (pre-superscript)
        ...
    ... (more prescripts)

Rules from TeX and the OpenType MATH table are relatively complex and we decided to implement them directly in the new refactoring as otherwise it was impossible to get decent quality. The code is still complex but we now have clear rules, we only perform simple calculations and the render tree structure matches the DOM tree.

“Enclosing” Notations

RenderMathMLMenclose is a row of math items with some additional notations. Gurpreet Kaur implemented this element two years ago but she followed the same approch, combining anonymous nodes and style for some simple notations and special painting for others.

x+1 circle and strike notations

During the refactoring, the code has been completely rewritten so that RenderMathMLMenclose is now essentially a derived class of RenderMathMLRow with the measuring and painting functions adjusted to take into account the additional notations. During that refactoring, we also removed support for unused radical notation, which was implemented using an anonymous RenderMathMLSquareRoot (see Radicals section below).

Helper Classes for Operators

The RenderMathMLOperator class is used for math operators. It was quite complex class and we decided to extract from it two features that are unrelated to layout:

The remaining code was indeed the real layout part but the mess with anonymous node and style was only removed later (see Text Classes below). Although it seems we just needed to move the code out of RenderMathMLOperator into those new classes, the case of MathOperator was particularly difficult. We had to split the effort into several small steps to make review possible and also fixed many issues due to the entanglement and confusion of these three different features of the RenderMathMLOperator class… The work done for MathOperator actually improved the rendering of stretchy operators as you can see for the horizontal braces in the screenshot above.

Radicals

RenderMathMLRoot is used for square root or arbitrary N-th root. Many of the TeX and OpenType MATH table rules were already used by the old implementation with anonymous nodes and style. However, there were bugs difficult to fix related to zooming, child removal or style change due to the management of the anonymous RenderMathMLOperator to draw the radical sign.

x+1 + x+1 3 square and cube roots

The old implementation actually had two classes for the square and general cases (RenderMathMLSquareRoot and RenderMathMLRoot). The usual technique with various anonymous wrappers and style was used. After the refactoring, we were able to merge everything in a single RenderMathMLRoot class. Because the square root behaves as an mrow, we also made that class derive from RenderMathMLRow to reuse as much code as possible. Here is are how the render trees used to look like:

RenderMathMLSquareRoot
  RenderMathMLBlock (anonymous used for metric adjustements)
    RenderMathMLRadicalOperator (anonymous used for the radical symbol)
      ...
  RenderMathMLRootWrapper (anonymous used for the children)
    RenderMathMLRow (child 1)
      ...
    RenderMathMLRow (child 2)
      ...
    ...
    RenderMathMLRow (child N)
      ...

RenderMathMLRoot
  RenderMathMLRootWrapper (anonymous for the index)
    ...
  RenderMathMLBlock (anonymous used for metric adjustements)
     RenderMathMLRadicalOperator (anonymous used for the radical symbol)
       ...
  RenderMathMLRootWrapper (anonymous for the base)
    ...

Again, we rewrote the implementation using only simple box positioning. The difficult part was to get rid of the anonymous RenderMathMLRadicalOperator to draw the radical symbol. This class was derived from RenderMathMLOperator and extended it with some fallback drawing when math fonts were not available. After having extracted stretchy operator shaping from RenderMathMLOperator it became possible to use the MathOperator helper class to draw the radical symbol. We implemented the fallback for missing math fonts the same as Gecko: Use a scale transform to stretch the base glyph for U+221A SQUARE ROOT. As a bonus, we used such transform to implement glyph mirroring, as required to draw right-to-left radicals in some Arabic mathematical notations.

Text Classes

These classes are containers for math text such as variables or operators. There is a generic RenderMathMLToken class and a derived class RenderMathMLOperator adding features specific to operators such as spacing, dictionary property, stretching… Anonymous wrappers and style were used to implement automatic italic mathvariant or operator spacing. The RenderText child of RenderMathMLOperator was (re)built as an anonymous text node so that is was possible to convert U+002D HYPHEN-MINUS into U+2212 MINUS SIGN or to provide some text for anonymous operators created by RenderMathMLFenced (see Unchanged Classes section).

RenderMathMLToken (e.g. mi element)
  RenderMathMLBlock (anonymous flexbox used to apply CSS italic)
    RenderBlock (anonymous created elsewhere to honor CSS rules)
      RenderText
        text run "x"

RenderMathMLOperator (mo element)
   RenderMathMLBlock (anonymous flexbox used for spacing)
    RenderBlock (anonymous created elsewhere to honor CSS rules)
      RenderText (anonymous destroyed and built again)
         text run "−"

We did a big refactoring to remove all the anonymous nodes created by the MathML renderer classes. Just like for MathOperator, we had to be careful and submit various small pieces as the text rendering was quite sensible to code change.

The simplified operator spacing that was supported by WebKit was easy to implement with the new approach. To do automatic italic mathvariant, we modified the paint function to use Mathematical Alphanumeric Symbols instead of CSS italic as you can notice for the variables displayed in the screenshot above. Hence we could remove the RenderMathMLBlock anonymous wrapper.

The use of an anonymous node for the text prevented it to appear in the dumped render tree of layout tests and also required some hacks in the accessibility code to expose that text. In order to address the cases of the minus sign and of mfenced operators, we decided to use our new MathOperator class again. Indeed MathOperator is actually also able to draw unstretched operators made of a single character and this works for the minus sign and for mfenced operators used in practice.

Unchanged Classes

Two classes have not been modified but such modifications were not needed to remove the dependency on RenderFlexibleBox:

  • RenderMathMLFenced is used for an mrow-like element that is defined in the MathML specification as strictly equivalent to constructions with rows and operators. It is implemented as a derived class of RenderMathMLRow and creates anonymous RenderMathMLOperators. This is the only remaining class that modifies the render tree structure. Note that prominent MathML websites and generators do not use the mfenced element, so it is not a big concern.

  • RenderMathMLTable is used for table layout. It is just derived from RenderTable, not RenderFlexibleBox. We did not change anything for now but we considered creating our own implementation in order to make our code independent from HTML table, to support MathML-specific table features and to make it better integrated with the rest of the MathML code.

Accessibility

Even if our main focus was on rendering, the changes we made also had impact on the MathML accessibility code. Indeed, the accessibility tree is generated from the MathML renderer classes: Since we changed the latter during the refactoring, we also had to adjust the accessibility code. Fortunately, we are lucky to have Joanmarie Diggs in our team and she was able to provide some help here.

First, the accessibility code exposes the linethickness of fractions to implement Apple’s AXMathLineThickness attribute. In practice, this is really necessary to know whether the linethickness is null or not (e.g. binomial coefficient VS the Legendre symbol). Apple’s unit test seemed to expose the ratio between the actual thickness and the default thickness but the accessibility code really just reads the actual thickness calculated by RenderMathMLFraction. Our fix and improvement for linethickness made the Apple’s unit test fail so we had to adjust RenderMathMLFraction to expose the value expected by that test.

In general, the accessibility code does not care about anonymous nodes created for layout purpose and there was some code to avoid exposing them in the accessibility tree. So removing all the anonymous during the layout refactoring was actually a good and safe thing to do. There were some helper functions to implement Apple’s AXMathRootRadicand and AXMathRootIndex attributes that had to be adjusted, though. These functions used to do some work to skip the anonymous wrappers and we were actually able to simplify them.

There was also some specific code for the RenderMathMLOperators and their anonymous RenderText that were necessary to expose the text content. Actually, there was an old bug in the accessibility code and the anonymous RenderMathMLOperators created by mfenced were not correctly exposed. The unit test we had for mfenced operators was only checking the text content but it was still passing and so the regression had never been detected before. After the layout refactoring we removed the anonymous RenderText of mfenced operators and so broke that test… We thus spent some time to fix the RenderMathMLOperator code. Essentially, we removed all the old hacks and only left a specific handling for mfenced operators. We also used this opportunity to improve and extend our MathML accessibility tests.

Finally, the MathML accessibility code was directly implemented into a generic AccessibilityRenderObject class. There was some functions to access math nodes and properties but also specific cases scattered all over the code (anonymous boxes, mfenced operators, math roles etc). In order to facilitate future work and maintenance we decided to move all the MathML code into a new AccessibilityMathMLElement class. Hence the work implied by the layout refactoring actually encouraged us to improve the organization and testing of our accessibility code!

Conclusion

In the past four months, Igalia’s web platform team has successfully upstreamed the refactoring of WebKit’s MathML renderer classes and we are now very confident about the quality of the layout code. In addition to the people mentioned above I would personally like to thank everybody who helped with this work. More specifically, I am very grateful to other people from Igalia (Martin Robinson, Sergio Villar and Manuel Rego) or Apple (Brent Fulgham and Darin Adler) who have spent some time to review patches. As a nice side effect of this work, mathematical formulas look better and the accessibility code has been improved. More is happenning in the next two phases. We are looking forward to continuing implementation of Web standards and collaboration with browser vendors at the next Web Engines Hackfest!

OpenType MATH in HarfBuzz

TL;DR:

  • Work is in progress to add OpenType MATH support in HarfBuzz and will be instrumental for many math rendering engines relying on that library, including browsers.

  • For stretchy operators, an efficient way to determine the required number of glyphs and their overlaps has been implemented and is described here.

In the context of Igalia browser team effort to implement MathML support using TeX rules and OpenType features, I have started implementation of OpenType MATH support in HarfBuzz. This table from the OpenType standard is made of three subtables:

  • The MathConstants table, which contains layout constants. For example, the thickness of the fraction bar of ab\frac{a}{b}.

  • The MathGlyphInfo table, which contains glyph properties. For instance, the italic correction indicating how slanted an integral is e.g. to properly place the subscript in D\displaystyle\displaystyle\int_{D}.

  • The MathVariants table, which provides larger size variants for a base glyph or data to build a glyph assembly. For example, either a larger parenthesis or a assembly of U+239B, U+239C, U+239D to write something like:

    (abcdefgh\left(\frac{\frac{\frac{a}{b}}{\frac{c}{d}}}{\frac{\frac{e}{f}}{\frac{g}{h}}}\right.

Code to parse this table was added to Gecko and WebKit two years ago. The existing code to build glyph assembly in these Web engines was adapted to use the MathVariants data instead of only private tables. However, as we will see below the MathVariants data to build glyph assembly is more general, with arbitrary number of glyphs or with additional constraints on glyph overlaps. Also there are various fallback mechanisms for old fonts and other bugs that I think we could get rid of when we move to OpenType MATH fonts only.

In order to add MathML support in Blink, it is very easy to import the OpenType MATH parsing code from WebKit. However, after discussions with some Google developers, it seems that the best option is to directly add support for this table in HarfBuzz. Since this library is used by Gecko, by WebKit (at least the GTK port) and by many other applications such as Servo, XeTeX or LibreOffice it make senses to share the implementation to improve math rendering everywhere.

The idea for HarfBuzz is to add an API to

  1. 1.

    Expose data from the MathConstants and MathGlyphInfo.

  2. 2.

    Shape stretchy operators to some target size with the help of the MathVariants.

It is then up to a higher-level math rendering engine (e.g. TeX or MathML rendering engines) to beautifully display mathematical formulas using this API. The design choice for exposing MathConstants and MathGlyphInfo is almost obvious from the reading of the MATH table specification. The choice for the shaping API is a bit more complex and discussions is still in progress. For example because we want to accept stretching after glyph-level mirroring (e.g. to draw RTL clockwise integrals) we should accept any glyph and not just an input Unicode strings as it is the case for other HarfBuzz shaping functions. This shaping also depends on a stretching direction (horizontal/vertical) or on a target size (and Gecko even currently has various ways to approximate that target size). Finally, we should also have a way to expose italic correction for a glyph assembly or to approximate preferred width for Web rendering engines.

As I mentioned at the beginning, the data and algorithm to build glyph assembly is the most complex part of the OpenType MATH and deserves a special interest. The idea is that you have a list of n1n\geq 1 glyphs available to build the assembly. For each 0in-10\leq i\leq n-1, the glyph gig_{i} has advance aia_{i} in the stretch direction. Each gig_{i} has straight connector part at its start (of length sis_{i}) and at its end (of length eie_{i}) so that we can align the glyphs on the stretch axis and glue them together. Also, some of the glyphs are “extenders” which means that they can be repeated 0, 1 or more times to make the assembly as large as possible. Finally, the end/start connectors of consecutive glyphs must overlap by at least a fixed value omino_{\mathrm{min}} to avoid gaps at some resolutions but of course without exceeding the length of the corresponding connectors. This gives some flexibility to adjust the size of the assembly and get closer to the target size tt.

gig_{i}

sis_{i}

eie_{i}

aia_{i}

gi+1g_{i+1}

si+1s_{i+1}

ei+1e_{i+1}

ai+1a_{i+1}

oi,i+1o_{i,i+1}

Figure 1: Two adjacent glyphs in an assembly

To ensure that the width/height is distributed equally and the symmetry of the shape is preserved, the MATH table specification suggests the following iterative algorithm to determine the number of extenders and the connector overlaps to reach a minimal target size tt:

  1. 1.

    Assemble all parts by overlapping connectors by maximum amount, and removing all extenders. This gives the smallest possible result.

  2. 2.

    Determine how much extra width/height can be distributed into all connections between neighboring parts. If that is enough to achieve the size goal, extend each connection equally by changing overlaps of connectors to finish the job.

  3. 3.

    If all connections have been extended to minimum overlap and further growth is needed, add one of each extender, and repeat the process from the first step.

We note that at each step, each extender is repeated the same number of times r0r\geq 0. So if IExtI_{\mathrm{Ext}} (respectively INonExtI_{\mathrm{NonExt}}) is the set of indices 0in-10\leq i\leq n-1 such that gig_{i} is an extender (respectively is not an extender) we have ri=rr_{i}=r (respectively ri=1r_{i}=1). The size we can reach at step rr is at most the one obtained with the minimal connector overlap omino_{\mathrm{min}} that is

i=0N-1(j=1riai-omin)+omin=(iINonExtai-omin)+(iIExtr(ai-omin))+omin\sum_{i=0}^{N-1}\left(\sum_{j=1}^{r_{i}}{a_{i}-o_{\mathrm{min}}}\right)+o_{ \mathrm{min}}=\left(\sum_{i\in I_{\mathrm{NonExt}}}{a_{i}-o_{\mathrm{min}}} \right)+\left(\sum_{i\in I_{\mathrm{Ext}}}r{(a_{i}-o_{\mathrm{min}})}\right)+o% _{\mathrm{min}}

We let NExt=|IExt|N_{\mathrm{Ext}}={|I_{\mathrm{Ext}}|} and NNonExt=|INonExt|N_{\mathrm{NonExt}}={|I_{\mathrm{NonExt}}|} be the number of extenders and non-extenders. We also let SExt=iIExtaiS_{\mathrm{Ext}}=\sum_{i\in I_{\mathrm{Ext}}}a_{i} and SNonExt=iINonExtaiS_{\mathrm{NonExt}}=\sum_{i\in I_{\mathrm{NonExt}}}a_{i} be the sum of advances for extenders and non-extenders. If we want the advance of the glyph assembly to reach the minimal size tt then

SNonExt-omin(NNonExt-1)+r(SExt-ominNExt)t{S_{\mathrm{NonExt}}-o_{\mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}+{r% \left(S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}\right)}\geq t

We can assume SExt-ominNExt>0S_{\mathrm{Ext}}-o_{\mathrm{min}}N_{\mathrm{Ext}}>0 or otherwise we would have the extreme case where the overlap takes at least the full advance of each extender. Then we obtain

rrmin=max(0,t-SNonExt+omin(NNonExt-1)SExt-ominNExt)r\geq r_{\mathrm{min}}=\max\left(0,\left\lceil\frac{t-{S_{\mathrm{NonExt}}+o_{ \mathrm{min}}\left(N_{\mathrm{NonExt}}-1\right)}}{S_{\mathrm{Ext}}-o_{\mathrm{ min}}N_{\mathrm{Ext}}}\right\rceil\right)

This provides a first simplification of the algorithm sketched in the MATH table specification: Directly start iteration at step rminr_{\mathrm{min}}. Note that at each step we start at possibly different maximum overlaps and decrease all of them by a same value. It is not clear what to do when one of the overlap reaches omino_{\mathrm{min}} while others can still be decreased. However, the sketched algorithm says all the connectors should reach minimum overlap before the next increment of rr, which means the target size will indeed be reached at step rminr_{\mathrm{min}}.

One possible interpretation is to stop overlap decreasing for the adjacent connectors that reached minimum overlap and to continue uniform decreasing for the others until all the connectors reach minimum overlap. In that case we may lose equal distribution or symmetry. In practice, this should probably not matter much. So we propose instead the dual option which should behave more or less the same in most cases: Start with all overlaps set to omino_{\mathrm{min}} and increase them evenly to reach a same value oo. By the same reasoning as above we want the inequality

SNonExt-o(NNonExt-1)+rmin(SExt-oNExt)t{S_{\mathrm{NonExt}}-o\left(N_{\mathrm{NonExt}}-1\right)}+{r_{\mathrm{min}} \left(S_{\mathrm{Ext}}-oN_{\mathrm{Ext}}\right)}\geq t

which can be rewritten

SNonExt+rminSExt-o(NNonExt+rminNExt-1)tS_{\mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-{o\left(N_{\mathrm{NonExt% }}+{r_{\mathrm{min}}N_{\mathrm{Ext}}}-1\right)}\geq t

We note that N=NNonExt+rminNExtN=N_{\mathrm{NonExt}}+{r_{\mathrm{min}}N_{\mathrm{Ext}}} is just the exact number of glyphs used in the assembly. If there is only a single glyph, then the overlap value is irrelevant so we can assume NNonExt+rNExt-1=N-11N_{\mathrm{NonExt}}+{rN_{\mathrm{Ext}}}-1=N-1\geq 1. This provides the greatest theorical value for the overlap oo:

ominoomaxtheorical=SNonExt+rminSExt-tNNonExt+rminNExt-1o_{\mathrm{min}}\leq o\leq o_{\mathrm{max}}^{\mathrm{theorical}}=\frac{S_{ \mathrm{NonExt}}+r_{\mathrm{min}}S_{\mathrm{Ext}}-t}{N_{\mathrm{NonExt}}+{r_{ \mathrm{min}}N_{\mathrm{Ext}}}-1}

Of course, we also have to take into account the limit imposed by the start and end connector lengths. So omaxo_{\mathrm{max}} must also be at most min(ei,si+1)\min{(e_{i},s_{i+1})} for 0in-20\leq i\leq n-2. But if rmin2r_{\mathrm{min}}\geq 2 then extender copies are connected and so omaxo_{\mathrm{max}} must also be at most min(ei,si)\min{(e_{i},s_{i})} for iIExti\in I_{\mathrm{Ext}}. To summarize, omaxo_{\mathrm{max}} is the minimum of omaxtheoricalo_{\mathrm{max}}^{\mathrm{theorical}}, of eie_{i} for 0in-20\leq i\leq n-2, of sis_{i} 1in-11\leq i\leq n-1 and possibly of e0e_{0} (if 0IExt0\in I_{\mathrm{Ext}}) and of of sn-1s_{n-1} (if n-1IExt{n-1}\in I_{\mathrm{Ext}}).

With the algorithm described above NExtN_{\mathrm{Ext}}, NNonExtN_{\mathrm{NonExt}}, SExtS_{\mathrm{Ext}}, SNonExtS_{\mathrm{NonExt}} and rminr_{\mathrm{min}} and omaxo_{\mathrm{max}} can all be obtained using simple loops on the glyphs gig_{i} and so the complexity is O(n)O(n). In practice nn is small: For existing fonts, assemblies are made of at most three non-extenders and two extenders that is n5n\leq 5 (incidentally, Gecko and WebKit do not currently support larger values of nn). This means that all the operations described above can be considered to have constant complexity. This is much better than a naive implementation of the iterative algorithm sketched in the OpenType MATH table specification which seems to require at worst

r=0rmin-1NNonExt+rNExt=NNonExtrmin+rmin(rmin-1)2NExt=O(n×rmin2)\sum_{r=0}^{r_{\mathrm{min}}-1}{N_{\mathrm{NonExt}}+rN_{\mathrm{Ext}}}=N_{ \mathrm{NonExt}}r_{\mathrm{min}}+\frac{r_{\mathrm{min}}\left(r_{\mathrm{min}}-% 1\right)}{2}N_{\mathrm{Ext}}={O(n\times r_{\mathrm{min}}^{2})}

and at least Ω(rmin)\Omega(r_{\mathrm{min}}).

One of issue is that the number of extender repetitions rminr_{\mathrm{min}} and the number of glyphs in the assembly NN can become arbitrary large since the target size tt can take large values e.g. if one writes \underbrace{\hspace{65535em}} in LaTeX. The improvement proposed here does not solve that issue since setting the coordinates of each glyph in the assembly and painting them require Θ(N)\Theta(N) operations as well as (in the case of HarfBuzz) a glyph buffer of size NN. However, such large stretchy operators do not happen in real-life mathematical formulas. Hence to avoid possible hangs in Web engines a solution is to impose a maximum limit NmaxN_{\mathrm{max}} for the number of glyph in the assembly so that the complexity is limited by the size of the DOM tree. Currently, the proposal for HarfBuzz is Nmax=128N_{\mathrm{max}}=128. This means that if each assembly glyph is 1em large you won’t be able to draw stretchy operators of size more than 128em, which sounds a quite reasonable bound. With the above proposal, rminr_{\mathrm{min}} and so NN can be determined very quickly and the cases NNmaxN\geq N_{\mathrm{max}} rejected, so that we avoid losing time with such edge cases…

Finally, because in our proposal we use the same overlap oo everywhere an alternative for HarfBuzz would be to set the output buffer size to nn (i.e. ignore r-1r-1 copies of each extender and only keep the first one). This will leave gaps that the client can fix by repeating extenders as long as oo is also provided. Then HarfBuzz math shaping can be done with a complexity in time and space of just O(n)O(n) and it will be up to the client to optimize or limit the painting of extenders for large values of NN

MathML at the Web Engines Hackfest 2015

Hackfest

Two weeks ago, I travelled to Spain to participate to the second Web Engines Hackfest which was sponsored by Igalia and Collabora. Such an event has been organized by Igalia since 2009 and used to be focused on WebkitGTK+. It is great to see that it has now been extended to any Web engines & platforms and that a large percentage of non-igalian developers has been involved this year. If you did not get the opportunity to attend this event or if you are curious about what happened there, take a look at the wiki page or flickr album.

Last day of the hackfest
Photo from @webengineshackfest licensed under Creative Commons Attribution-ShareAlike

I really like this kind of hacking-oriented and participant-driven event where developers can meet face to face, organize themselves in small collaboration groups to efficiently make progress on a task or give passionate talk about what they have recently been working on. The only small bemol I have is that it is still mainly focused on WebKit/Blink developments. Probably, the lack of Mozilla/Microsoft participants is probably due to Mozilla Coincidental Work Weeks happening at the same period and to the proprietary nature of EdgeHTML (although this is changing?). However, I am confident that Igalia will try and address this issue and I am looking forward to coming back next year!

MathML developments

This year, Igalia developer Alejandro G. Castro wanted to work with me on WebKit’s MathML layout code and more specifically on his MathML refactoring branch. Indeed, as many people (including Google developers) who have tried to work on WebKit’s code in the past, he arrived to the conclusion that the WebKit’s MathML layout code has many design issues that make it a burden for the rest of the layout team and too complex to allow future improvements. I was quite excited about the work he has done with Javier Fernández to try and move to a simple box model closer to what exists in Gecko and thus I actually extended my stay to work one whole week with them. We already submitted our proposal to the webkit-dev mailing list and received positive feedback, so we will now start merging what is ready. At the end of the week, we were quite satisfied about the new approach and confident it will facilitate future maintenance and developements :-)

Main room
Photo from @webengineshackfest licensed under Creative Commons Attribution-ShareAlike

While reading a recent thread on the Math WG mailing list, I realized that many MathML people have only vague understanding of why Google (or to be more accurate, the 3 or 4 engineers who really spent some time reviewing and testing the WebKit code) considered the implementation to be unsafe and not ready for release. Even worse, Michael Kholhase pointed out that for someone totally ignorant of the technical implementation details, the communication made some years ago around the “flexbox-based approach” gave the impression that it was “the right way” (indeed, it somewhat did improve the initial implementation) and the rationale to change that approach was not obvious. So let’s just try and give a quick overview of the main problems, even if I doubt someone can have good understanding of the situation without diving into the C++ code:

  1. WebKit’s code to stretch operator was not efficient at all and was limited to some basic fences buildable via Unicode characters.
  2. WebKit’s MathML code violated many layout invariants, making the code unreliable.
  3. WebKit’s MathML code relied heavily on the C++ renderer classes for flexboxes and has to manage too many anonymous renderers.

The main security concerns were addressed a long time ago by Martin Robinson and me. Glyph assembly for stretchy operators are now drawn using low-level font painting primitive instead of creating one renderer object for each piece and the preferred width for them no longer depends on vertical metrics (although we still need some work to obtain Gecko’s good operator spacing). Also, during my crowdfunding project, I implemented partial support for the OpenType MATH table in WebKit and more specifically the MathVariant subtable, which allows to directly use construction of stretchy operators specified by the font designer and not only the few Unicode constructions.

However, the MathML layout code still modifies the renderer tree to force the presence of anonymous renderers and still applies specific CSS rules to them. It is also spending too much time trying to adapt the parent flexbox renderer class which has at the same time too much features for what is needed for MathML (essentially automatic box alignment) and not enough to get exact placement and measuring needed for high-quality rendering (the TeXBook rules are more complex, taking into account various parameters for box shifts, drops, gaps etc).

During the hackfest, we started to rewrite a clean implementation of some MathML renderer classes similar to Gecko’s one and based on the MathML in HTML5 implementation note. The code now becomes very simple and understandable. It can be summarized into four main functions. For instance, to draw a fraction we need:

  • computePreferredLogicalWidths which sets the preferred width of the fraction during the first layout pass, by considering the widest between numerator and denominator.
  • layoutBlock and firstLineBaseline which calculate the final width/height/ascent of the fraction element and position the numerator and denominator.
  • paint which draws the fraction bar.

Perhaps, the best example to illustrate how the complexity has been reduced is the case of the renderer of mmultiscripts/msub/msup/msubsup elements (attaching an arbitrary number of subscripts and superscripts before or after a base). In the current WebKit implementation, we have to create three anonymous wrappers (a first one for the base, a second one for prescripts and a third one for postscripts) and an anonymous wrapper for each subscript/superscript pair, add alignment styles for these wrappers and spend a lot of time maintaining the integrity of the renderer tree when dynamic changes happen. With the new code, we just need to do arithmetic calculations to position the base and script boxes. This is somewhat more complex than the fraction example above but still, it remains arithmetic calculations and we can not reduce any further if we wish quality comparable to TeXBook / MATH rules. We actually take into account many parameters from the OpenType MATH table to get much better placement of scripts. We were able to fix bug 130325 in about twenty minutes instead of fighting with a CSS “negative margin” hack on anonymous renderers.

MathML dicussions

The first day of the hackfest we also had an interesting “breakout session” to define the tasks to work on during the hackfest. Alejandro briefly presented the status of his refactoring branch and his plan for the hackfest. As said in the previous section, we have been quite successful in following this plan: Although it is not fully complete yet, we expect to merge the current changes soon. Dominik Röttsches who works on Blink’s font and layout modules was present at the MathML session and it was a good opportunity to discuss the future of MathML in Chrome. I gave him some references regarding the OpenType MATH table, math fonts and the MathML in HTML5 implementation note. Dominik said he will forward the references to his colleagues working on layout so that we can have deeper technical dicussion about MathML in Blink in the long term. Also, I mentioned noto-fonts issue 330, which will be important for math rendering in Android and actually does not depend on the MathML issue, so that’s certainly something we could focus on in the short term.

Álex and Fred during the MathML breakout session
Photo from @webengineshackfest licensed under Creative Commons Attribution-ShareAlike

Alejandro also proposed to me to prepare a talk about MathML in Web Engines and exciting stuff happening with the MathML Association. I thus gave a brief overview of MathML and presented some demos of native support in Gecko. I also explained how we are trying to start a constructive approach to solve miscommunication between users, spec authors and implementers ; and gather technical and financial resources to obtain a proper solution. In a slightly more technical part, I presented Microsoft’s OpenType MATH table and how it can be used for math rendering (and MathML in particular). Finally, I proposed my personal roadmap for MathML in Web engines. Although I do not believe I am a really great speaker, I received positive feedback from attendees. One of the thing I realized is that I do not know anything about the status and plan in EdgeHTML and so overlooked to mention it in my presentation. Its proprietary nature makes hard for external people to contribute to a MathML implementation and the fact that Microsoft is moving away from ActiveX de facto excludes third-party plugin for good and fast math rendering in the future. After I went back to Paris, I thus wrote to Microsoft employee Christian Heilmann (previously working for Mozilla), mentioning at least the MathML in HTML5 Implementation Note and its test suite as a starting point. MathML is currently on the first page of the most voted feature requested for Microsoft Edge and given the new direction taken with Microsoft Edge, I hope we could start a discussion on MathML in EdgeHTML…

Conclusion

This was a great hackfest and I’d like to thank again all the participants and sponsors for making it possible! As Alejandro wrote to me, “I think we have started a very interesting work regarding MathML and web engines in the future.”. The short term plan is now to land the WebKit MathML refactoring started during the hackfest and to finish the work. I hope people now understand the importance of fonts with an OpenType MATH table for good mathematical rendering and we will continue to encourage browser vendors and font designers to make such fonts wide spread.

The new approach for WebKit MathML support gives good reason to be optmimistic in the long term and we hope we will be able to get high-quality rendering. The fact that the new approach addresses all the issues formulated by Google and that we started a dialogue on math rendering, gives several options for MathML in Blink. It remains to get Microsoft involved in implementing its own OpenType table in EdgeHTML. Overall, I believe that we can now have a very constructive discussion with the individuals/companies who really want native math support, with the Math WG members willing to have their specification implemented in browsers and with the browser vendors who want a math implementation which is good, clean and compatible with the rest of their code base. Hopefully, the MathML Association will be instrumental in achieving that. If everybody get involved, 2016 will definitely be an exciting year for native MathML in Web engines!

The canonical well-ordering of α×α (part 2)

In a my previous blog post, I discussed the canonical well-ordering on α×α\alpha\times\alpha and stated theorem 0.5 below to calculate its order-type γ(α)\gamma(\alpha). Subsequent corollaries provided a bound for γ(α)\gamma(\alpha), its fixed points and a proof that infinite cardinals were among these fixed points (and so that cardinal addition and multiplication is trivial). In this second part, I’m going to provide the proof of this theorem.

First, we note that for all n<ωn<\omega, there is only one order-type for n×nn\times n, since the notion of cardinal and ordinal is the same for finite sets. So indeed

n<ω,γ(n)=|n×n|=n2\forall n<\omega,\gamma(n)={|n\times n|}=n^{2}

and taking the limit, we get our first infinite fixed point:

γ(ω)=supn<ωγ(n)=ω\gamma(\omega)=\sup_{n<\omega}{\gamma(n)}=\omega

For all α\alpha the ordering on (α+1)×(α+1){(\alpha+1)}\times{(\alpha+1)} is as follows: first the elements of α×α\alpha\times\alpha ordered as γ(α)\gamma(\alpha), followed by the elements (ξ,α)(\xi,\alpha) for 0ξ<α0\leq\xi<\alpha, followed by the elements (α,ξ)(\alpha,\xi) for 0ξα0\leq\xi\leq\alpha. Hence

α,γ(α+1)=γ(α)+α2+1\forall\alpha,{\gamma(\alpha+1)}={\gamma(\alpha)}+\alpha\cdot 2+1

From this, we can try to calculate the next values of γ\gamma after ω\omega:

γ(ω+1)=ω+ω2+1=ω3+1{\gamma(\omega+1)}=\omega+{\omega\cdot 2}+1={\omega\cdot 3}+1
γ(ω+2)=ω3+1+ω+1+ω+1+1=ω5+2{\gamma(\omega+2)}={\omega\cdot 3+1}+\omega+1+\omega+1+1={\omega\cdot 5}+2

The important point is 1+ω=ω1+\omega=\omega ; in general we will use several times the property α+ωβ=ωβ\alpha+\omega^{\beta}=\omega^{\beta} if α<ωβ\alpha<\omega^{\beta}. By a simple recurrence (see proposition 0.1), we can generalize the expression to arbitrary nn:

γ(ω+n)=ω(2n+1)+n{\gamma(\omega+n)}=\omega\left(2n+1\right)+n

and so taking the limit

γ(ω2)=ω2{\gamma(\omega\cdot 2)}=\omega^{2}

which is a limit ordinal not fixed by γ\gamma. We note another point that will be used later: taking the limit “eliminates” the smallest terms. More generally, we can perform the same calculation, starting from any arbitrary αω\alpha\geq\omega:

Proposition 0.1.

For any limit ordinal α\alpha and 1n<ω1\leq n<\omega we have

γ(α+n)=γ(α)+α2n+n\gamma(\alpha+n)=\gamma(\alpha)+{\alpha\cdot{2n}}+n
Proof.

For n=1n=1 this is the relation for γ(α+1)\gamma(\alpha+1) explained above. If the relation is true for nn then

γ(α+n+1)=γ(α+n)+(α+n)2+1\gamma(\alpha+n+1)=\gamma(\alpha+n)+{\left(\alpha+n\right)\cdot 2}+1
γ(α+n+1)=γ(α)+α2n+n+α+n+α+n+1\gamma(\alpha+n+1)=\gamma(\alpha)+{\alpha\cdot{2n}}+n+\alpha+n+\alpha+n+1

and since αω\alpha\geq\omega we have n+α=αn+\alpha=\alpha and so

γ(α+n+1)=γ(α)+α(2n+1)+n+1\gamma(\alpha+n+1)=\gamma(\alpha)+{\alpha\cdot{(2n+1)}}+n+1

Which shows the result at step n+1n+1. ∎

As above, we can take the limit and say γ(α+ω)=supn<ωγ(α+n)=γ(α)+αω{\gamma(\alpha+\omega)}=\sup_{n<\omega}{\gamma(\alpha+n)}=\gamma(\alpha)+% \alpha\cdot\omega which is consistent with γ(ω2)=ω+ω2=ω2{\gamma(\omega\cdot 2)}=\omega+\omega^{2}=\omega^{2}. However, if we consider the Cantor Normal Form α=ωβ1n1++ωβknk\alpha=\omega^{\beta_{1}}n_{1}+...+\omega^{\beta_{k}}n_{k}, then for all n<ωn<\omega we have can use the fact that “ωβ1\omega^{\beta_{1}} will eliminate smaller terms on its left” that is αn=ωβ1(n1n)+ωβ2n2++ωβknk\alpha\cdot n=\omega^{\beta_{1}}{(n_{1}n)}+\omega^{\beta_{2}}n_{2}+...+\omega^% {\beta_{k}}n_{k}. Then using the fact that “taking the limit eliminates the smallest terms” we get αω=supn<ωαn=ωβ1+1\alpha\cdot\omega=\sup_{n<\omega}\alpha\cdot n=\omega^{\beta_{1}+1}. So actually, we have a nicer formula where αω\alpha\cdot\omega is put in Cantor Normal Form:

αω,γ(α+ω)=γ(α)+ωlogω(α)+1\forall\alpha\geq\omega,{\gamma(\alpha+\omega)}=\gamma(\alpha)+\omega^{\log_{% \omega}(\alpha)+1}

This can be generalized by the following proposition:

Proposition 0.2.

For any αω\alpha\geq\omega and β1\beta\geq 1 such that logω(α)+1β\log_{\omega}(\alpha)+1\geq\beta we have

γ(α+ωβ)=γ(α)+ωlogω(α)+β{\gamma(\alpha+\omega^{\beta})}=\gamma(\alpha)+\omega^{\log_{\omega}(\alpha)+\beta}
Proof.

We prove by induction on β\beta that for all such α\alpha the expression is true. We just verified β=1\beta=1 and the limit case is obvious by continuity of γ\gamma and of the sum/exponentiation in the second variable. For the successor step, if logω(α)+1β+1\log_{\omega}(\alpha)+1\geq\beta+1 then a fortiori 1n<ω,logω(α+ωβn)+1β+1β\forall 1\leq n<\omega,\log_{\omega}(\alpha+\omega^{\beta}\cdot n)+1\geq\beta+% 1\geq\beta. We can then use the induction hypothesis to prove by induction on 1n<ω1\leq n<\omega that γ(α+ωβn)=γ(α)+ωlogω(α)+βn{\gamma(\alpha+{\omega^{\beta}\cdot n})}=\gamma(\alpha)+\omega^{\log_{\omega}(% \alpha)+\beta}\cdot n. For n=1n=1, this is just the induction hypothesis of β\beta (for the same α\alpha!). For the successor step, we need to use the induction hypothesis of β\beta (for α+ωβn\alpha+\omega^{\beta}\cdot n) which is γ(α+ωβ(n+1))=γ(α+ωβn)+ωlogω(α)+β{\gamma(\alpha+{\omega^{\beta}\cdot{(n+1)}})}={\gamma(\alpha+{\omega^{\beta}% \cdot n})}+\omega^{\log_{\omega}(\alpha)+\beta}. Finally, γ(α+ωβ+1)=sup1n<ωγ(α+ωβn)=γ(α)+ωlogω(α)+β+1{\gamma(\alpha+{\omega^{\beta+1}})}={\sup_{1\leq n<\omega}{\gamma(\alpha+{% \omega^{\beta}\cdot n})}}=\gamma(\alpha)+\omega^{\log_{\omega}(\alpha)+\beta+1}, as wanted. ∎

For all α1\alpha\geq 1, logω(ωα)+1=α+1\log_{\omega}(\omega^{\alpha})+1=\alpha+1 so the previous paragraph also gives γ(ωα+1)=γ(ωα+ωα+1)=γ(ωα)+ωα2+1{\gamma(\omega^{\alpha+1})}=\gamma\left(\omega^{\alpha}+\omega^{\alpha+1}% \right)={\gamma(\omega^{\alpha})}+\omega^{\alpha\cdot 2+1}. Then, we find

γ(ω2)=ω+ω3=ω3\gamma(\omega^{2})=\omega+\omega^{3}=\omega^{3}
γ(ω3)=ω3+ω5=ω5\gamma(\omega^{3})=\omega^{3}+\omega^{5}=\omega^{5}

And more generally by induction on n<ωn<\omega, we can show that

γ(ωn+1)=ω2n+1\gamma(\omega^{n+1})=\omega^{2n+1}

Then we deduce another fixed point

γ(ωω)=sup1n<ωγ(ωn)=ωω\gamma(\omega^{\omega})={\sup_{1\leq n<\omega}\gamma\left({\omega^{n}}\right)}% =\omega^{\omega}

The following proposition tries to generalize the expression of γ(ωα+1){\gamma(\omega^{\alpha+1})}.

Proposition 0.3.

For any α,β1\alpha,\beta\geq 1 such that logω(α)>logω(β)\log_{\omega}(\alpha)>\log_{\omega}(\beta) we have

γ(ωα+β)=γ(ωα)+ωα2+β{\gamma(\omega^{\alpha+\beta})}={\gamma(\omega^{\alpha})}+\omega^{\alpha\cdot 2% +\beta}
Proof.

This is done by induction on β<α\beta<\alpha for a fixed α\alpha. We already verified the case β=1\beta=1 in the previous paragraph and the limit case is obvious by continuity of γ\gamma and of the sum/exponentiation in the second variable. For the successor step, we have γ(ωα+β+1)=γ(ωα+β)+ω(α+β)2+1{\gamma(\omega^{\alpha+\beta+1})}={\gamma(\omega^{\alpha+\beta})}+\omega^{{(% \alpha+\beta)}\cdot 2+1} and by induction hypothesis, γ(ωα+β)=γ(ωα)+ωα2+β{\gamma(\omega^{\alpha+\beta})}={\gamma(\omega^{\alpha})}+\omega^{\alpha\cdot 2% +\beta}. Since logω(α)>logω(β+1)=logω(β)\log_{\omega}(\alpha)>\log_{\omega}(\beta+1)=\log_{\omega}(\beta) we have (α+β)2+1=α+β+α+β+1=α2+β+1>α2+β{(\alpha+\beta)}\cdot 2+1=\alpha+\beta+\alpha+\beta+1=\alpha\cdot 2+\beta+1>% \alpha\cdot 2+\beta and so ωα2+β+ωα2+β+1=ωα2+β+1\omega^{\alpha\cdot 2+\beta}+\omega^{\alpha\cdot 2+\beta+1}=\omega^{\alpha% \cdot 2+\beta+1}. Finally, γ(ωα+β+1)=γ(ωα+β)+ωα2+β+1{\gamma(\omega^{\alpha+\beta+1})}={\gamma(\omega^{\alpha+\beta})}+\omega^{% \alpha\cdot 2+\beta+1} as wanted. ∎

For any 1n<ω1\leq n<\omega and α1\alpha\geq 1, if 1β<ωα1\leq\beta<\omega^{\alpha} then logω(β)<α=logω(ωαn)\log_{\omega}(\beta)<\alpha=\log_{\omega}(\omega^{\alpha}\cdot n). Hence proposition 0.3 gives γ(ωωαn+β)=γ(ωωαn)+ωωα2n+β{\gamma(\omega^{{\omega^{\alpha}\cdot n}+\beta})}={\gamma(\omega^{{\omega^{% \alpha}\cdot n}})}+\omega^{{\omega^{\alpha}\cdot{2n}}+\beta} Then by continuity of γ\gamma and of the sum/exponentiation in the second variable, we can consider the limit βωα\beta\rightarrow\omega^{\alpha} to get γ(ωωα(n+1))=γ(ωωαn)+ωωα(2n+1){\gamma(\omega^{{\omega^{\alpha}\cdot(n+1)}})}={\gamma(\omega^{{\omega^{\alpha% }\cdot n}})}+\omega^{{\omega^{\alpha}\cdot{(2n+1)}}}. So continuing our calculation we have

γ(ωω2)=ωω+ωω3=ωω3{\gamma(\omega^{\omega\cdot 2})}=\omega^{\omega}+\omega^{\omega\cdot 3}=\omega% ^{\omega\cdot 3}
γ(ωω3)=ωω3+ωω5=ωω5{\gamma(\omega^{\omega\cdot 3})}=\omega^{\omega\cdot 3}+\omega^{\omega\cdot 5}% =\omega^{\omega\cdot 5}

and taking the limit we find another fixed point

γ(ωω2)=ωω2{\gamma(\omega^{\omega^{2}})}=\omega^{\omega^{2}}

then again

γ(ωω22)=ωω2+ωω23=ωω23{\gamma(\omega^{\omega^{2}\cdot 2})}=\omega^{\omega^{2}}+\omega^{\omega^{2}% \cdot 3}=\omega^{\omega^{2}\cdot 3}
γ(ωω23)=ωω23+ωω25=ωω25{\gamma(\omega^{\omega^{2}\cdot 3})}=\omega^{\omega^{2}\cdot 3}+\omega^{\omega% ^{2}\cdot 5}=\omega^{\omega^{2}\cdot 5}

and taking the limit we find another fixed point

γ(ωω3)=ωω3{\gamma(\omega^{\omega^{3}})}=\omega^{\omega^{3}}

More generally, we have the following proposition:

Proposition 0.4.

For any ordinal α\alpha and 1n<ω1\leq n<\omega we have

γ(ωωαn)=ωωα(2n-1){\gamma(\omega^{{\omega^{\alpha}\cdot n}})}=\omega^{{\omega^{\alpha}\cdot{(2n-% 1)}}}
Proof.

From the relation γ(ωωα(n+1))=γ(ωωαn)+ωωα(2n+1){\gamma(\omega^{{\omega^{\alpha}\cdot(n+1)}})}={\gamma(\omega^{{\omega^{\alpha% }\cdot n}})}+\omega^{{\omega^{\alpha}\cdot{(2n+1)}}}, we deduce by induction on nn that

n2,γ(ωωαn)=γ(ωωα)+ωωα(2n-1)\forall n\geq 2,{\gamma(\omega^{{\omega^{\alpha}\cdot n}})}={\gamma(\omega^{{% \omega^{\alpha}}})}+\omega^{{\omega^{\alpha}\cdot{(2n-1)}}}

Taking the limit nωn\rightarrow\omega,we obtain

γ(ωωα+1)=γ(ωωα)+ωωα+1{\gamma(\omega^{{\omega^{\alpha+1}}})}={\gamma(\omega^{{\omega^{\alpha}}})}+% \omega^{{\omega^{\alpha+1}}}

We can then show by induction that all the ωωα\omega^{\omega^{\alpha}} are actually fixed points, using the previous relation at successor step, the continuity of γ\gamma at limit step and the fact that γ(ω)=ω\gamma(\omega)=\omega. This means

γ(ωωα)=ωωα=ωωα(2×1-1){\gamma(\omega^{{\omega^{\alpha}}})}=\omega^{{\omega^{\alpha}}}=\omega^{{% \omega^{\alpha}\cdot{(2\times 1-1)}}}

Then for n2n\geq 2, we get

γ(ωωαn)=γ(ωωα)+ωωα(2n-1)=ωωα+ωωα(2n-1)=ωωα(2n-1){\gamma(\omega^{{\omega^{\alpha}\cdot n}})}={\gamma(\omega^{{\omega^{\alpha}}}% )}+\omega^{{\omega^{\alpha}\cdot{(2n-1)}}}={\omega^{{\omega^{\alpha}}}}+\omega% ^{{\omega^{\alpha}\cdot{(2n-1)}}}=\omega^{{\omega^{\alpha}\cdot{(2n-1)}}}

Equipped with these four propositions, we have a way to recursively calculate γ\gamma. We are ready to prove the main theorem:

Theorem 0.5.

For all ordinal α\alpha, we denote γ(α)\gamma(\alpha) the order-type of the canonical ordering of α×α\alpha\times\alpha. Then γ\gamma can be calculated as follows:

  1. 1.

    Finite Ordinals: For any n<ωn<\omega we have

    γ(n)=n2\gamma(n)=n^{2}
  2. 2.

    Limit Ordinals: For any limit ordinal α\alpha,

    1. (a)

      If ωlogω(logω(α))\omega^{\log_{\omega}\left(\log_{\omega}\left(\alpha\right)\right)} does not divide logω(α)\log_{\omega}(\alpha) then

      γ(α)=ωlogω(α)α\gamma(\alpha)=\omega^{\log_{\omega}(\alpha)}\cdot\alpha
    2. (b)

      Otherwise, we write α=ωlogω(α)n+ρ\alpha={\omega^{\log_{\omega}(\alpha)}n}+\rho for some n1n\geq 1. If n2n\geq 2 then

      γ(α)=ωlogω(α)(ωlogω(α)(n-1)+ρ)\gamma(\alpha)=\omega^{\log_{\omega}(\alpha)}\cdot\left({\omega^{\log_{\omega}% (\alpha)}\cdot{(n-1)}}+\rho\right)

      (like the first case but we “decrement nn in the second factor”)

    3. (c)

      Otherwise, α=ωlogω(α)+ρ\alpha={\omega^{\log_{\omega}(\alpha)}}+\rho and we write logω(α)=ωlogω(logω(α))m\log_{\omega}(\alpha)=\omega^{\log_{\omega}\left(\log_{\omega}\left(\alpha% \right)\right)}m for some m1m\geq 1. We have

      γ(α)=ωlogω(α)(ωωlogω(logω(α))(m-1)+ρ)\gamma(\alpha)=\omega^{\log_{\omega}(\alpha)}\cdot\left(\omega^{\omega^{\log_{% \omega}\left(\log_{\omega}\left(\alpha\right)\right)}\cdot\left(m-1\right)}+% \rho\right)

      (like the first case but we “decrement mm in the second factor”)

  3. 3.

    Infinite Successor Ordinals: For any limit ordinal α\alpha and 1n<ω1\leq n<\omega we have

    γ(α+n)=γ(α)+α2n+n\gamma(\alpha+n)=\gamma(\alpha)+{\alpha\cdot{2n}}+n

    where γ(α)\gamma(\alpha) is determined as in the previous point.

Proof.

The “Finite Ordinals” has been discussed at the beginning and the “Infinite Successor Ordinals” is proposition 0.1. Now let’s consider the Cantor Normal Form ωβ1n++ωβknk\omega^{\beta_{1}}n+...+\omega^{\beta_{k}}n_{k} of a limit ordinal αω\alpha\geq\omega (so βk1\beta_{k}\geq 1 and β1=logω(α)\beta_{1}=\log_{\omega}(\alpha)). First, from proposition 0.2 we can make successively extract the nkn_{k} terms ωβk\omega^{\beta_{k}} (by left-multiplying them by ωβ1\omega^{\beta_{1}}), then the nk-1n_{k-1} terms ωβk-1\omega^{\beta_{k-1}}, … then the n2n_{2} terms ωβ2\omega^{\beta_{2}} and finally n-1n-1 terms ωβ1\omega^{\beta_{1}}. We obtain:

γ(α)=γ(ωβ1)+ωβ1(ωβ1(n-1)+ωβ2n2++ωβknk){\gamma(\alpha)}={\gamma(\omega^{\beta_{1}})}+{\omega^{\beta_{1}}\left(\omega^% {\beta_{1}}{(n-1)}+\omega^{\beta_{2}}n_{2}+\dots+\omega^{\beta_{k}}n_{k}\right)}

We now write β1=ωδm+σ\beta_{1}=\omega^{\delta}m+\sigma where δ=logωβ1\delta=\log_{\omega}{\beta_{1}} and m,σm,\sigma are the quotient and remainder of the Euclidean division of β1\beta_{1} by ωδ\omega^{\delta}. We can then use proposition 0.3 to extract σ\sigma:

σ=0γ(ωβ1)=γ(ωωδm)\sigma=0\implies{\gamma(\omega^{\beta_{1}})}={\gamma(\omega^{\omega^{\delta}m})}
σ0γ(ωβ1)=γ(ωωδm)+ωωδ(2m)+σ\sigma\neq 0\implies{\gamma(\omega^{\beta_{1}})}={\gamma(\omega^{\omega^{% \delta}m})}+\omega^{\omega^{\delta}\cdot{(2m)}+\sigma}

Finally, using proposition 0.4 we obtain

γ(ωωδm)=ωωδ(2m-1){\gamma(\omega^{\omega^{\delta}m})}=\omega^{{\omega^{\delta}\cdot{(2m-1)}}}

σ0\sigma\neq 0 means that ωδ=ωlogω(logω(α))\omega^{\delta}=\omega^{\log_{\omega}\left(\log_{\omega}\left(\alpha\right)% \right)} does not divide β1=logω(α)\beta_{1}={\log_{\omega}(\alpha)}. In that case, ωωδ(2m)+σ>ωωδ(2m-1)\omega^{\omega^{\delta}\cdot{(2m)}+\sigma}>\omega^{{\omega^{\delta}\cdot{(2m-1% )}}} and so γ(ωβ1)=ωωδ(2m)+σ{\gamma(\omega^{\beta_{1}})}=\omega^{\omega^{\delta}\cdot{(2m)}+\sigma}. We note that β12=ωδm+σ+ωδm+σ=ωδ(2m)+σ\beta_{1}\cdot 2=\omega^{\delta}m+\sigma+\omega^{\delta}m+\sigma=\omega^{% \delta}{(2m)}+\sigma since the remainder σ\sigma is less than ωδ\omega^{\delta}. So actually γ(ωβ1)=ωβ1ωβ1{\gamma(\omega^{\beta_{1}})}=\omega^{\beta_{1}}\omega^{\beta_{1}}. Coming back to the expression of γ(α)\gamma(\alpha), this term can be grouped with ωβ1(n-1)\omega^{\beta_{1}}{(n-1)} to recover the Cantor Normal Form of α\alpha and we finally get γ(α)=ωβ1α\gamma(\alpha)=\omega^{\beta_{1}}\alpha.

Otherwise, σ=0\sigma=0, β1=ωδm\beta_{1}=\omega^{\delta}m and

γ(ωβ1)=γ(ωωδm)=ωωδ(2m-1)=ωβ1ωωδ(m-1){\gamma(\omega^{\beta_{1}})}={\gamma(\omega^{\omega^{\delta}m})}=\omega^{{% \omega^{\delta}\cdot{(2m-1)}}}={\omega^{\beta_{1}}{\omega^{\omega^{\delta}% \cdot{(m-1)}}}}

But then β1=ωδm>ωδ(m-1)\beta_{1}=\omega^{\delta}m>{\omega^{\delta}{(m-1)}} and so ωβ1ωβ1>ωβ1ωωδ(m-1)\omega^{\beta_{1}}\omega^{\beta_{1}}>\omega^{\beta_{1}}\omega^{{\omega^{\delta% }\cdot{(m-1)}}}. Hence if n2n\geq 2, the term γ(ωβ1){\gamma(\omega^{\beta_{1}})} is eliminated in the expression of γ(α)\gamma(\alpha) and it remains

γ(α)=ωβ1(ωβ1(n-1)+ρ){\gamma(\alpha)}={\omega^{\beta_{1}}\left(\omega^{\beta_{1}}{(n-1)}+\rho\right)}

where ρ=ωβ2n2++ωβknk\rho=\omega^{\beta_{2}}n_{2}+\dots+\omega^{\beta_{k}}n_{k}. If instead n=1n=1, then the term ωβ1(n-1)\omega^{\beta_{1}}{(n-1)} is zero and it remains

γ(α)=ωβ1(ωωδ(m-1)+ωβ2n2++ωβknk){\gamma(\alpha)}={\omega^{\beta_{1}}\left(\omega^{{\omega^{\delta}\cdot{(m-1)}% }}+\omega^{\beta_{2}}n_{2}+\dots+\omega^{\beta_{k}}n_{k}\right)}