Unicode Combining Characters and Font Fallback

Posted by: Alberto.Denzler on 19 November 2018, 12:14 am EST

  • Posted 19 November 2018, 12:14 am EST

    Hello there,

    We are evaluating Documents for PDF and have found some minor issues when using combining characters and/or clusters:
    - Combining Diacritical Marks are not rendered over the previous character but as a separate character.
    - Thai Combining Characters are overwritten

    For example, for the decomposed ö character, the standard font Helvetica falls back to Yu Gothic, which prints a big O and doubles the line height.

    Adding "Arial" and "Leelawadee UI" as fallbacks improves the output, but Helvetica still uses a big O and Courier New prints the combining characters separately.

    The Courier New issue can be fixed by manually converting the decomposed characters into their equivalent composed form (where one exists). I understand that not doing this automatically within Documents for PDF is a reasonable API design decision.
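    A minimal sketch of that manual conversion, assuming .NET's built-in Unicode normalization (`string.Normalize` with `NormalizationForm.FormC`, i.e. NFC) is acceptable for the job; the class and method names are illustrative only. NFC composes a base letter and a combining mark only when Unicode defines a precomposed character, so o + U+0308 becomes ö and x + U+0308 becomes ẍ (U+1E8D), while p + U+0308 stays as two code points and still depends on the font's mark-positioning support.

```csharp
using System.Text;

// Illustrative helper (hypothetical name): replace decomposed sequences with
// precomposed characters wherever Unicode defines one.
public static class Composer
{
    public static string ComposeWherePossible(string text)
    {
        // FormC = canonical decomposition followed by canonical composition.
        // Sequences without a precomposed equivalent (e.g. "p\u0308") are
        // left as base letter + combining mark.
        return text.Normalize(NormalizationForm.FormC);
    }
}
```

    Because NFC never changes how text is read (only how it is encoded), it is safe to apply before layout; it just cannot help with clusters like p̈ or the Thai cluster from the sample.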

    For comparison, here is the same output from Microsoft Word 2016, and the source code:

    Currently we use a library in which we have to handle font fallback manually. This requires a lot of work (character clusters are font-dependent), especially for a cross-platform application. A library that handles this well would be welcome.

    Thank you


    // Create a new PDF document:
    var doc = new GrapeCity.Documents.Pdf.GcPdfDocument();
    // Add a page, get its graphics:
    var g = doc.NewPage().Graphics;
    // Get some fonts
    var fc = new GrapeCity.Documents.Text.FontCollection();
    fc.AppendFallbackFonts(fc.FindFamilyName("Leelawadee UI"));
    g.FontCollection = fc;
    // One text format per font being compared:
    GrapeCity.Documents.Text.TextFormat[] textFormats =
    {
        new GrapeCity.Documents.Text.TextFormat() { Font = GrapeCity.Documents.Pdf.StandardFonts.Times },
        new GrapeCity.Documents.Text.TextFormat() { Font = GrapeCity.Documents.Pdf.StandardFonts.Courier },
        new GrapeCity.Documents.Text.TextFormat() { Font = GrapeCity.Documents.Pdf.StandardFonts.Helvetica },
        new GrapeCity.Documents.Text.TextFormat() { Font = fc.FindFamilyName("Arial") },
        new GrapeCity.Documents.Text.TextFormat() { Font = fc.FindFamilyName("Courier New") },
        new GrapeCity.Documents.Text.TextFormat() { Font = fc.FindFamilyName("Segoe UI Symbol") },
        new GrapeCity.Documents.Text.TextFormat() { Font = fc.FindFamilyName("Leelawadee UI") }
    };
    // Build Text Layout
    var tl = g.CreateTextLayout();
    tl.DefaultFormat.FontSize = 12;

    // Print some text using various fonts
    var text = ": Composed:ö\u1E8D, Decomposed:o\u0308x\u0308, Cluster:p\u0308 \u0E17\u0E37\u0E49\u0E48\n";
    foreach (var tf in textFormats)
    tl.Append(tf.Font.FontFamilyName + text, tf);
    doc.Pages.Last.Graphics.DrawTextLayout(tl, new System.Drawing.PointF(72, 72));

    // Use GcPdfDocument object to set the FontEmbedMode
    doc.FontEmbedMode = GrapeCity.Documents.Pdf.FontEmbedMode.EmbedFullFont;

    // Save the PDF:
    var filename = "DecomposedText.pdf";
    doc.Save(filename);
  • Marked as Answer

    Replied 19 November 2018, 10:24 pm EST

    Hi Alberto,

    > - Combining Diacritical Marks are not rendered over the previous character but as a separate character.

    That depends on the specific font. Some fonts have complex GPOS tables that help cope with these issues; for other fonts we just output the characters as they are.
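    For context on what the GPOS table contributes here: in OpenType mark-to-base attachment (GPOS lookup type 4), the font stores an anchor point on the base glyph and a matching anchor on the mark glyph, and the layout engine shifts the mark so the two anchors coincide. The arithmetic is sketched below; the names are hypothetical and this is not GcPdf's internal code. It assumes horizontal text, the pen already advanced past the base glyph when the mark is drawn, and all values in font design units. When a font has no such anchor data, the mark is drawn at the pen position after the base, which is one way it ends up looking like a separate character.

```csharp
// Hypothetical helper illustrating OpenType MarkToBase positioning.
// All coordinates are in font design units (e.g. 1000 or 2048 units per em).
public static class MarkToBase
{
    // Returns the offset to apply to the mark glyph, relative to the pen
    // position after the base glyph's advance has been applied.
    public static (int X, int Y) MarkOffset(
        int baseAdvance,
        (int X, int Y) baseAnchor,   // anchor on the base glyph (e.g. "top")
        (int X, int Y) markAnchor)   // matching anchor on the mark glyph
    {
        // The mark would be drawn at (base origin + baseAdvance), so subtract
        // the advance to move it back over the base, then align the anchors.
        int dx = baseAnchor.X - markAnchor.X - baseAdvance;
        int dy = baseAnchor.Y - markAnchor.Y;
        return (dx, dy);
    }
}
```

    For instance, with a 600-unit base advance (as in a monospaced font), a base anchor at (300, 520) and a mark anchor at (250, 460), the mark is shifted by (-550, 60). Real shaping engines also handle mark-to-mark and mark-to-ligature attachment and usually zero the mark's own advance; those cases are omitted.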

    The hardest question is how to choose the best fallback fonts. As you mentioned, the default set of fallback fonts and the way they are applied is not ideal. To disable the default fallbacks completely, add the following line at the beginning of your app:

    SystemFontCollection.NoSystemFonts = true;

    then add fonts manually with AppendFallbackFonts, as in your sample.
    IMHO, in this specific situation "Calibri" or "Segoe UI" fonts work better than "Arial". The result seems to be acceptable with:

    fc.AppendFallbackFonts(fc.FindFamilyName("Leelawadee UI"));

    As for the issue with Courier New, it looks strange that MS Word replaced the characters for "o" and "x" but not for "p". The font does not have enough information in its GPOS and GSUB tables.

    > - Thai Combining Characters are overwritten

    Thai characters might require special adjustments. This item is currently on our to-do list.
    Could you please point out the specific issue with Thai Combining Characters? What was the expected result?

  • Replied 3 December 2018, 2:17 am EST

    Hi Andrey,

    We would like to avoid handling the list of fallback fonts manually, at least for a typical Windows installation. The mentioned öäü characters are widely used in various European countries, and the listed fonts are included in every Windows installation (https://en.wikipedia.org/wiki/List_of_typefaces_included_with_Microsoft_Windows).

    Could you provide a font fallback solution that works out of the box with the fonts of a default Windows 10 installation and the most common characters? For example, for Latin-1, something like:

    fc.AppendFallbackFonts(EnumFallbackFonts.Windows10.Latin1)

    As for the Thai character, it is shown in the code above: a single cluster made of 4 code points. The 3 marks should be stacked one above the other, as they are when Arial and Leelawadee UI are used as fallbacks (or in Word 2016). Without the fallback fonts, all the default GrapeCity fonts (Nimbus), and even Arial, draw some of the marks over each other.

    Thank You,
  • Replied 3 December 2018, 2:35 am EST

    P.S. Regarding Word 2016 and Courier New: a precomposed ö character exists, but a precomposed p¨ does not, which is why Word could not replace it. Still, Word could have placed the ¨ above the p as in the precomposed characters, using a fallback font or perhaps the ¨ defined inside Courier New.
  • Replied 3 December 2018, 9:01 pm EST

    Hi Alberto,

    > We would like to avoid handling the list of fallback fonts manually, at least for a typical Windows installation.

    Yes, that's our aim as well. The problem is how to achieve it.

    > The mentioned öäü characters are widely used in various countries in Europe.

    Yes, but I see no problem with these characters when they are used in composed form (a single code point). There should also be no problem with replacing several code points with a single glyph, or with repositioning mark glyphs relative to the base glyph, if the specific font provides the corresponding features.

    However, the issue may occur if the font lacks the features required to display several code points as a single cluster. Font fallbacks can't help here, because the font formally supports all the necessary code points. How can we tell that we should switch to a fallback font? I don't currently see an easy solution.
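    One partial, text-side heuristic for that question, sketched in C# under the assumption that only .NET's normalization and text-element APIs are used: normalize to NFC, split the text into text elements with `StringInfo.GetTextElementEnumerator`, and flag every element that still has more than one code point. Those are the clusters whose display relies on the font's GSUB/GPOS features rather than on a single precomposed glyph. Checking whether a particular font actually provides those features (by reading its tables) is not shown, and the class name is hypothetical. Note that .NET 5 and later segment by Unicode extended grapheme clusters, while older runtimes group a base character with its combining marks; both agree on the sample text from this thread.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: find clusters that still consist of several code points
// after NFC composition, i.e. clusters that need mark positioning or ligature
// support from the font.
public static class ClusterCheck
{
    public static List<string> MultiCodePointClusters(string text)
    {
        var result = new List<string>();
        string composed = text.Normalize(NormalizationForm.FormC);
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(composed);
        while (e.MoveNext())
        {
            string cluster = e.GetTextElement();
            // Count code points: a surrogate pair is one code point in two chars.
            int codePoints = 0;
            foreach (char c in cluster)
            {
                if (!char.IsLowSurrogate(c)) codePoints++;
            }
            if (codePoints > 1) result.Add(cluster);
        }
        return result;
    }
}
```

    For the test string in the original post, only "p\u0308" and the Thai cluster "\u0E17\u0E37\u0E49\u0E48" are reported; the ö and ẍ sequences collapse to single code points under NFC.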

    > For example with Latin-1 something like:
    > fc.AppendFallbackFonts(EnumFallbackFonts.Windows10.Latin1)

    We try to match the font style and don't take the target script into account. It is enough for the chosen fallback font to support the missing code points. For example, it is possible to take some Thai characters from an Arabic font if it includes necessary code points and is closer to the original font (with missing code points) by its style.

    We should probably take the script into account. That's a good idea; I'll try to work it out.
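    A sketch of what taking the script into account might look like, assuming a simplified script lookup by Unicode block ranges (a real implementation would use the Unicode Script property) and an ordered per-script preference list. The family names are the ones mentioned in this thread; the table itself, the class name, and the `familySupports` callback (standing in for whatever glyph-coverage check the library performs) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical script-aware fallback selection.
public static class ScriptFallback
{
    // Simplified script detection by Unicode block ranges (illustration only).
    // Basic Latin..Latin Extended-B also covers digits and punctuation, which
    // the Unicode Script property would classify as "Common".
    public static string ScriptOf(int codePoint)
    {
        if (codePoint >= 0x0E00 && codePoint <= 0x0E7F) return "Thai";
        if (codePoint >= 0x0600 && codePoint <= 0x06FF) return "Arabic";
        if (codePoint <= 0x024F || (codePoint >= 0x1E00 && codePoint <= 0x1EFF)) return "Latin";
        return "Other";
    }

    // Preferred fallback families per script, in priority order (example values).
    static readonly Dictionary<string, string[]> Preferred = new Dictionary<string, string[]>
    {
        ["Latin"] = new[] { "Segoe UI", "Calibri", "Arial" },
        ["Thai"] = new[] { "Leelawadee UI" },
    };

    // Returns the first preferred family for the cluster's script that covers
    // the whole (non-empty) cluster, or null if none qualifies.
    public static string ChooseFallback(string cluster, Func<string, string, bool> familySupports)
    {
        // The script comes from the base (first) code point, because combining
        // marks such as U+0308 are shared across scripts.
        string script = ScriptOf(char.ConvertToUtf32(cluster, 0));
        if (!Preferred.TryGetValue(script, out string[] families)) return null;
        foreach (string family in families)
        {
            if (familySupports(family, cluster)) return family;
        }
        return null;
    }
}
```

    With this kind of table, a Thai cluster is never taken from an Arabic font merely because that font happens to contain the code points; the style-matching described above could still be used to order the candidates within each script.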
