Type Good Pictures


A Case of 畫

How does the Cantonese Font work? Here’s how the character 畫 gets drawn and becomes context-sensitive.

Although most of the Early Access users are elsewhere, I did show the Cantonese Font to half a dozen people in Real Life™. Each showing passed through the same stages:

  1. Politely, “oh, that’s cute”
  2. (Pause. Thinks.)
  3. “But there are many ways to read a character”
    • “What do you mean The Font Knows?”
  4. (Types/speaks the most convoluted sentence they have in mind. 一名出名嘅名字學家; 行人 行入 行山銀行 行房)
  5. (Long pause)
  6. “How does it work???” 🤯 🤯 🤯

The documentation offers an algorithmic explanation, but an example would speak to more people. Let’s use 畫 (“draw”) as a relatively straightforward example. (I need a few stiff drinks before I can talk about some other characters. 行 and 彈, I’m looking at you.) We will talk about the process as if one step happens before another, and as if we only work on one character at a time. In reality, the process is iterative, with frequent backtracking (often weeks apart), and dozens of characters are being worked on in parallel.


Bird’s-Eye View

From a distance, I do three things for each character:

  1. draw pictures for all possible readings,
  2. identify the default reading,
  3. prepare adaptations for all alternate “words”
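One way to picture the three steps is as a single record per character that gets filled in as the work progresses. The sketch below is hypothetical: `CharacterEntry`, its fields, and `validate` are names made up for illustration, not the Canto Font’s actual data format. The example values for 畫 come from later in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class CharacterEntry:
    """Hypothetical per-character record mirroring the three steps above."""

    char: str
    readings: Set[str]  # step 1: every possible jyutping reading
    default: str  # step 2: what a standalone character shows
    # step 3: word -> one jyutping per character in that word
    adaptations: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the record contradicts itself."""
        if self.default not in self.readings:
            raise ValueError(f"default {self.default!r} is not a reading of {self.char}")
        for word, jyutping in self.adaptations.items():
            if len(word) != len(jyutping):
                raise ValueError(f"{word!r}: expected one reading per character")
            for ch, reading in zip(word, jyutping):
                if ch == self.char and reading not in self.readings:
                    raise ValueError(f"{word!r}: {reading!r} is not a reading of {ch}")


hua = CharacterEntry(
    char="畫",
    readings={"waak6", "waa2", "waa6"},
    default="waak6",
    adaptations={"畫畫": ("waak6", "waa2")},
)
hua.validate()  # passes: the default and every adapted reading are known readings
```

Because step 1 fixes `readings`, any later adaptation that asks for a reading outside that set is caught early rather than surfacing as a missing picture.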

Let’s look at this for 畫.

1. Establish Readings

畫 roughly means “to draw (verb)” as well as “picture (noun)” and “picture-related (adj)”. When the character is used as a verb, it is pronounced waak6. Otherwise, it is either waa2 or waa6.

We can verify that this is exhaustive by querying:

  1. K T Shek’s 粵音資料集叢 (the pseudonymous Shek is one of my heroes: he single-handedly digitized dozens of historical Cantonese dictionaries and made them easily searchable), and
  2. the Cantonese character pronunciations compiled by Prof Qin Lu at the Hong Kong Polytechnic University.

Our selection of waak6, waa2, and waa6 is indeed exhaustive. Establishing this definitively is extremely important, as it bounds what the font could ever need to express: no context can call for a reading outside this set.
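The exhaustiveness check amounts to a set comparison. A sketch, assuming each reference has already been queried by hand and its readings typed in as a set (the post doesn’t describe an automated lookup, and `compare_readings` is a hypothetical helper). The source sets in the example are placeholders, not real query results.

```python
from typing import Set, Tuple


def compare_readings(selected: Set[str], *sources: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the readings chosen for drawing against the readings attested in sources.

    Returns (missing, unattested):
      missing    -- attested by at least one source but not selected (must be drawn)
      unattested -- selected but backed by no source (worth double-checking)
    """
    attested: Set[str] = set().union(*sources)
    return attested - selected, selected - attested


selected = {"waak6", "waa2", "waa6"}
# Placeholder inputs for illustration only; not actual dictionary data.
source_a = {"waak6", "waa6"}
source_b = {"waak6", "waa2", "waa6"}
missing, unattested = compare_readings(selected, source_a, source_b)
```

Both returned sets are empty only when the selection and the union of the sources agree exactly, which is the condition the paragraph above describes.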

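At this point, I (prepare a data file that lets us (use Nathan Hammond’s opentype.js code to)) draw some pictures. I then (run some Elixir code to)[1] do some transforms, so the final SVGs look like this:

[Figure: the three SVGs for 畫 (uni756B, uni756B.waa2, uni756B.waa6), with their bounding boxes drawn as dotted lines]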

The dotted lines are the SVGs’ bounding boxes. “Jon, they are off! The bottoms are cut off!” Yes, they are. The fonts only work if the boxes are cut off by exactly this amount.[2]

2. Decide on a Default Reading

At this point I need to make a critical decision: what ought to be the default reading? That is, if the user types 畫 as a standalone character, what should we show?

There are no hard-and-fast rules here; it’s case by case. Characters have multiple readings for different reasons (I counted nine categories of reasons). There are some rules of thumb, though. If a character can be used as either a noun or a verb, it is generally a good idea to pick the verb reading. This is because the noun is usually found[3] in specific combinations (you’ll see this for 畫 in a moment), whereas verb usage is messy, and neighbouring characters are hard to use as context.

Since 畫 is both a verb and a noun, I picked waak6 as the default reading.
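The verb-first rule of thumb can be written down as a tiny function, with the caveat that it only covers the easy cases: the other reasons a character has multiple readings (nine categories, per the count above) are not modelled. `pick_default` and its role labels are hypothetical.

```python
from typing import Dict, List, Optional


def pick_default(readings_by_role: Dict[str, List[str]]) -> Optional[str]:
    """Apply the rule of thumb: prefer the verb reading.

    readings_by_role maps a grammatical role ("verb", "noun", "adj", ...) to
    the readings the character takes in that role. Returns None when the rule
    of thumb doesn't settle it and the choice has to be made by hand.
    """
    all_readings = {r for rs in readings_by_role.values() for r in rs}
    if len(all_readings) == 1:
        return all_readings.pop()  # only one reading: nothing to decide
    verb_readings = set(readings_by_role.get("verb", []))
    if len(verb_readings) == 1:
        return verb_readings.pop()
    return None  # no verb reading, or several of them: case by case


hua = {"verb": ["waak6"], "noun": ["waa2", "waa6"], "adj": ["waa2", "waa6"]}
print(pick_default(hua))  # waak6
```

Returning `None` instead of guessing keeps the "case by case" decisions visible rather than silently defaulted.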

3. Adapt for Alternative Readings

Now comes the tricky part. When does 畫 become waa6 or waa2?

In “official” written Chinese (書面語), 畫 is waa6; colloquially (白話), it is waa2. What this means in practice, again, has to be decided case by case. We can think of three types of examples:

  1. 一幅畫 (“a picture”): most likely colloquial
  2. 圖畫, 繪畫 (“picture”, “painting”): could be either[4]
  3. 畫舫 (“painted pleasure boat”): most likely classical

In general, with an eye to Cantonese learning and preservation, the “either”s get the colloquial reading. But that is fine-tuning for later, because the decade-long work of the 粵典 (Words.hk) group resolves a large number of these.
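For the noun readings of 畫, the policy above fits in a small lookup. A sketch: `Register`, `HUA_READING`, and `hua_reading` are hypothetical names, and deciding which register a given word belongs to is still the manual, case-by-case part.

```python
from enum import Enum


class Register(Enum):
    COLLOQUIAL = "colloquial"  # 白話
    CLASSICAL = "classical"  # 書面語, the "official" reading
    EITHER = "either"


# Noun/adjective readings of 畫 per register, as described above.
HUA_READING = {Register.COLLOQUIAL: "waa2", Register.CLASSICAL: "waa6"}


def hua_reading(register: Register) -> str:
    """Reading of 畫 (as a noun) for a word of the given register.

    Policy: words that could go either way get the colloquial reading.
    """
    if register is Register.EITHER:
        register = Register.COLLOQUIAL
    return HUA_READING[register]


examples = {"一幅畫": Register.COLLOQUIAL, "圖畫": Register.EITHER, "畫舫": Register.CLASSICAL}
for word, register in examples.items():
    print(word, hua_reading(register))  # 一幅畫 waa2, 圖畫 waa2, 畫舫 waa6
```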

3.1 Words.hk sets

粵典 (words.hk; “Cantonese Dictionary”) is a monumental undertaking by SF and Chaak, first opened to the public on Buddha’s Birthday in 2014. They seek to “explain Cantonese with Cantonese” and place their work in the public domain. A decade later,[5] it is the most comprehensive source of Cantonese usage.

For each character, I compile a list of usages from Words.hk, together with how the character is pronounced in each. For 畫, this means 作畫, 壁畫, 動畫, 繪畫*, 一筆畫, 漫畫, 四格漫畫*, 圖畫, 版畫, 落畫, 水墨畫, 勾畫, 插畫, 書畫, 上畫, 年畫, 繪畫, 解畫, 如詩如畫, 山水畫, 抽象畫, 字畫, 掛畫, 國畫, 換畫, 刻畫, 油畫, 畫畫, 畫公仔唔駛畫出腸, 畫餅充飢 / 畫餅充饑, 畫風, 畫地為牢*, 畫壞鍾馗, 畫功, 畫師, 畫龍點睛, 畫鬼腳, 畫龜, 畫作, 畫廊, 畫面, 鬼畫符, 畫眉*, 畫花, 畫家帽, 畫家, 依樣畫葫蘆, 畫像, 畫筆, 詩情畫意, 畫眉*, 畫板, 畫質, 畫蛇添足, 嶺南畫派, 畫押, and 畫屏. (Phew.)

Of these, some, like 畫龜 (“draw a turtle”; slang for signing one’s name), use the default reading and need no adjustment. These are discarded, leaving the rest.

Then there are ones that are ambiguous: 畫眉, for example, can be waak6 mei4 (painting eyebrows) or waa6 mei2 (a kind of bird). I decide case-by-case whether it ought to be one or the other, taking into account relative usage, and how it would behave in the font (e.g., how it would likely interact with neighboring characters, including (non-) word separators such as | and \).
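The triage in the last two paragraphs can be sketched as a filter over the dictionary entries. This assumes the entries have already been pulled into a dict of word → attested pronunciations; that structure, and the names `triage` and `readings_of`, are hypothetical, not Words.hk’s actual export format. The three sample entries use readings stated in this post (plus gwai1 for 龜).

```python
from typing import Dict, List, Tuple

Jyutping = Tuple[str, ...]


def readings_of(char: str, word: str, jyutping: Jyutping) -> List[str]:
    """Readings that `char` takes inside `word`, one per occurrence."""
    return [r for ch, r in zip(word, jyutping) if ch == char]


def triage(
    entries: Dict[str, List[Jyutping]], char: str, default: str
) -> Tuple[Dict[str, Jyutping], Dict[str, List[Jyutping]]]:
    """Split dictionary entries into (needs_rule, ambiguous).

    More than one attested pronunciation (like 畫眉) means a human has to
    choose. Words where every occurrence of `char` already uses the default
    reading are dropped: the font gets them right without a rule.
    """
    needs_rule: Dict[str, Jyutping] = {}
    ambiguous: Dict[str, List[Jyutping]] = {}
    for word, pronunciations in entries.items():
        if len(pronunciations) > 1:
            ambiguous[word] = pronunciations
        elif any(r != default for r in readings_of(char, word, pronunciations[0])):
            needs_rule[word] = pronunciations[0]
    return needs_rule, ambiguous


entries = {
    "畫龜": [("waak6", "gwai1")],
    "畫畫": [("waak6", "waa2")],
    "畫眉": [("waak6", "mei4"), ("waa6", "mei2")],
}
needs_rule, ambiguous = triage(entries, "畫", "waak6")
```

Here 畫龜 is discarded, 畫畫 needs a rule, and 畫眉 is set aside for the manual decision described above.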

3.2 …and the rest

In parallel with the Words.hk list, I draw up my own list of “how Jon thinks the character can be used”. For 畫, this looked like: x幅_, x嘅_, x張_, _紙, 畫工, 寫實畫, 抽像畫, 漫畫, 插畫, 靜物畫, 人體畫, 肖像畫, 風景畫, x色畫, 西洋畫, 版畫, 動畫, 壁畫, 繪畫, x筆畫, 作畫, 畫畫, 圖畫, 油畫, x彩畫, x墨畫, x國畫, 畫廊, 畫室, 畫舫, 畫師, 畫具, 畫家, 畫聖. This list is de-duplicated against the Words.hk list. (What’s _? It stands for the character itself. I scribble lots and lots and lots of these on paper, and _ saves me some time.)

You’ll notice that my list has lots of “x”s. These mark bi-grams that can be generalized to a number of other cases. The easiest to see is “x幅_” (x, classifier, picture): does it matter how many pictures x there are? Some may be surprising: for x彩畫, most native speakers would reach for 水彩畫 (“watercolor picture”), but 亞加力彩畫 (“acrylic picture”) is also a possibility.
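To search real text for these candidates, the shorthand can be expanded mechanically. A sketch, assuming a hand-maintained list of fillers for each x and at most one x per pattern; `expand` is hypothetical, and how the font itself encodes an x slot isn’t described in this post.

```python
from typing import List


def expand(pattern: str, char: str, fillers: List[str]) -> List[str]:
    """Expand shorthand like "x彩_": "_" is the character itself, "x" is an open slot.

    Each filler is substituted into the (single) "x" slot; patterns without
    an "x" expand to exactly one string.
    """
    base = pattern.replace("_", char)
    if "x" not in base:
        return [base]
    return [base.replace("x", filler, 1) for filler in fillers]


print(expand("x彩_", "畫", ["水", "亞加力"]))  # ['水彩畫', '亞加力彩畫']
print(expand("_紙", "畫", []))  # ['畫紙']
```

The filler list is where the surprises live: 亞加力 only gets tested if someone thinks to add it.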

This process relies on language familiarity, lots of searches, and asking people. It is (drum roll 🥁) case by case.[6]

3.3 Write some pairs of rules

Once I know what adaptations are needed, I can start working on the how. (This is going to be a little technical.)

The OpenType specification includes what’s known as GSUB, glyph substitution. The classic use case is when “f” and “l” are neighbors: the combination f+l is replaced by a new glyph, fl (look closely: this is one glyph in which the two letters are joined together). How does the font know to do this? The font “knows” because the font maker specifies rules of the following form:

sub f l by fl;

This tells the font “when you see f followed by l, substitute it with the glyph named fl”.

Naively, you might think I could write a rule like:

sub 畫 畫 by 畫畫;

…but there isn’t a glyph called “畫畫”. In Canto Font 2, each glyph is the picture of a single character in a single reading.[7]

Essential Detour

What I haven’t told you is that each SVG (picture) must be named in a specific way: by its Unicode code point if it is the default reading, and by code point plus jyutping (unicode.jyutping) if it is not. That is, for 畫, the three in the picture above were uni756B.svg, uni756B.waa2.svg, and uni756B.waa6.svg, respectively. Inside the font files, the glyphs are named the same way, and it is these names that we use in the font rules.
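The naming scheme is mechanical enough to write down. A minimal sketch; `glyph_name` is hypothetical. The post only shows a character in the Basic Multilingual Plane, so the handling of code points above U+FFFF is an assumption: it follows the Adobe Glyph List Specification’s “u” + hex form.

```python
from typing import Optional


def glyph_name(char: str, reading: Optional[str] = None, default: Optional[str] = None) -> str:
    """Glyph name for `char` in `reading`, following the naming scheme above.

    The default reading (or no reading given) gets the bare code point name;
    any other reading gets ".jyutping" appended.
    """
    cp = ord(char)
    # "uniXXXX" covers the BMP; above U+FFFF this assumes the "uXXXXX" form
    # from the Adobe Glyph List Specification.
    base = f"uni{cp:04X}" if cp <= 0xFFFF else f"u{cp:X}"
    if reading is None or reading == default:
        return base
    return f"{base}.{reading}"


for r in ("waak6", "waa2", "waa6"):
    print(glyph_name("畫", r, default="waak6") + ".svg")
# prints uni756B.svg, then uni756B.waa2.svg, then uni756B.waa6.svg
```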

With that detour out of the way, you might think the rule looks like:

sub uni756B uni756B by uni756B uni756B.waa2;

But no: OpenType does one-to-one substitutions, many-to-one (ligatures, like above), and one-to-many, but not many-to-many. 😢

Instead, I need to prepare a pair of rules: first many-to-one, then one-to-many:

sub uni756B uni756B by some_phantom_glyph;
sub some_phantom_glyph by uni756B uni756B.waa2;

At the same time, some_phantom_glyph (with its placeholder content) needs to be created in the font. Once we do this, 畫畫 is correctly displayed as waak6 waa2.
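Repeated across thousands of contexts, the pair of rules is best generated rather than typed. A sketch, assuming AFDKO feature-file syntax: each rule goes in its own lookup, because a single lookup can hold only one kind of substitution, and a feature applies its lookups in the order it lists them. The lookup names, the `phantom_` prefix, and the choice of the `liga` feature are assumptions; the post doesn’t say how the Canto Font organises its rules. The phantom glyph still has to be added to the font separately.

```python
from typing import List


def pair_rules(name: str, inputs: List[str], outputs: List[str]) -> str:
    """Emit the many-to-one / one-to-many rule pair as two FEA lookups.

    `name` is a hypothetical identifier used for both lookups and the
    phantom glyph.
    """
    if len(inputs) < 2 or len(inputs) != len(outputs):
        raise ValueError("need at least two glyphs in, and exactly as many out")
    phantom = f"phantom_{name}"
    return (
        f"lookup {name}_join {{\n"
        f"    sub {' '.join(inputs)} by {phantom};\n"
        f"}} {name}_join;\n"
        f"lookup {name}_split {{\n"
        f"    sub {phantom} by {' '.join(outputs)};\n"
        f"}} {name}_split;\n"
    )


def feature_block(tag: str, lookup_names: List[str]) -> str:
    """Reference lookups from a feature, in order: each *_join before its *_split."""
    refs = "".join(f"    lookup {n};\n" for n in lookup_names)
    return f"feature {tag} {{\n{refs}}} {tag};\n"


fea = pair_rules("hua_hua", ["uni756B", "uni756B"], ["uni756B", "uni756B.waa2"])
fea += feature_block("liga", ["hua_hua_join", "hua_hua_split"])  # "liga" is an assumption
print(fea)
```

The order of the two `lookup` references matters: the split lookup can only expand a phantom glyph that the join lookup has already produced.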

Why Canto Font doesn’t work in Adobe Illustrator

The phantom glyph is where Illustrator on the Mac gets tripped up. For some reason, Illustrator applies the first rule and stops; it never processes the second rule (so you see a 一, because that is the placeholder content of the phantom glyphs). Someone at Adobe knows this should not be the case, since InDesign on the Mac doesn’t have this bug, and neither does Illustrator on Windows. Those renderers, however, fail in their own special ways. Adobe, the type company that brought OT-SVG to the table, fails in Myriad ways in its font rendering.

(Can you tell I am unhappy with Adobe, the company of Dark Patterns?)
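A toy model makes the failure mode concrete. It is not how any real shaping engine works, and it makes no claim about what actually goes wrong inside Illustrator. It only shows that if the second lookup never runs, the phantom glyph (and therefore its 一 placeholder) is what reaches the screen. All names are hypothetical.

```python
from typing import Callable, List, Optional

Glyphs = List[str]
Lookup = Callable[[Glyphs], Glyphs]


def ligature(components: Glyphs, result: str) -> Lookup:
    """Many-to-one: replace each left-to-right run matching `components`."""
    def run(glyphs: Glyphs) -> Glyphs:
        out: Glyphs = []
        i, n = 0, len(components)
        while i < len(glyphs):
            if glyphs[i:i + n] == components:
                out.append(result)
                i += n
            else:
                out.append(glyphs[i])
                i += 1
        return out
    return run


def multiple(source: str, results: Glyphs) -> Lookup:
    """One-to-many: expand every `source` glyph into `results`."""
    def run(glyphs: Glyphs) -> Glyphs:
        out: Glyphs = []
        for g in glyphs:
            out.extend(results if g == source else [g])
        return out
    return run


def shape(glyphs: Glyphs, lookups: List[Lookup], stop_after: Optional[int] = None) -> Glyphs:
    """Apply lookups in order; stop_after=1 mimics a renderer that quits early."""
    for lookup in lookups[:stop_after]:
        glyphs = lookup(glyphs)
    return glyphs


rules = [
    ligature(["uni756B", "uni756B"], "phantom_hua_hua"),
    multiple("phantom_hua_hua", ["uni756B", "uni756B.waa2"]),
]
print(shape(["uni756B", "uni756B"], rules))                # ['uni756B', 'uni756B.waa2']
print(shape(["uni756B", "uni756B"], rules, stop_after=1))  # ['phantom_hua_hua']
```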

畫畫 is done; now I repeat for all the other contexts of 畫.

4. Testing

When that is done, we can do “integration tests”: browsing a bunch of web pages with the Canto Font. For example, on Yue Wikipedia we have the following:

This suggests that “(x)種畫” needs to be added. I would not do this, however. Unlike with 幅畫, I judge that we are about as likely to encounter 呢種畫 (“this kind of picture”) as 呢種畫法 (“this way of drawing”); it’s not worth trading one error for another.

When 畫 is deemed OK, I do all of the above again, and again, for 29,103 other characters.

Outro

  1. “…you did all that for one character? And you do it over and over?”
  2. (pause)
  3. “What is your problem?”

I get questioned occasionally (OK, all the time) about my obsessiveness. Part of it can’t be helped; it’s just who I am.[8] Part of it is… escapism. For several years now, life has been all problems. Difficult work has given me refuge from the uncertainties, injustices, and difficulties of everyday life.

The most significant factor, perhaps, is realizing that this just won’t ever get done again. Whatever state the Font is in will be the foundation that other people build on for years and years, and I owe it to them (you) to get this as right as I can.


  1. The font is 90% assorted bespoke code and 10% grunt work. The time spent is the opposite: 90% grunt work and 10% code. The need for grunt work comes from (1) exceptions in the language (of which there are many) and (2) errors in the sources. The volume is immense: sources with 99.9% accuracy still give me tens to hundreds of errors to fix (and sometimes the errors are silent). One great thing about working with fonts is how fastidious they are: fonts simply refuse to compile unless they are perfect.
  2. Curious font makers crack open the fonts and tell me there are errors. There are many errors: this color + CJK + 150,000-rule font family compiles and works cross-platform because of a delicate combination of errors.
  3. As a bi-gram or tri-gram.
  4. , in fact, could be all three.
  5. It is no coincidence that Canto Font 2 became available on 2024-05-15: that is Buddha’s Birthday, one decade later!
  6. I am sometimes asked whether I can make “Mandarin Fonts” or even “Thai Fonts” (?), and the answer is no. I just don’t know Mandarin well enough. In fact, the Cantonese Font is really only for Traditional Chinese. I think 画 is the (Simplified) equivalent of 畫, but there is not necessarily a 1:1 correspondence, and I do not make that conjecture.
  7. To be precise, each {character, jyutping} is a pair of pictures: one in color, and one in black and white for applications that can display monochrome glyphs but not color ones.
  8. Gattaca is one of my favorite movies. In one scene, the protagonist Vincent made “no errors in a million keystrokes”. I totally identify with the dude.

Jon Chui