After a break of several days, I still plan to learn more about digital typography by writing a typesetting system inspired by TeX. Previously I wrote about why I believe that this system should not be compatible with TeX and how using a general-purpose programming language for typesetting will help. This post is about a different issue: how fonts would be represented in this system. For simplicity, I will not write about problems specific to math typesetting; text has enough problems for one post.
As stated by Vulis, the plain TeX model of fonts is inadequate for e.g. academic publications. There, each font is a set of up to 256 characters, which for TeX are just boxes of specific sizes, combined with other characters by ligatures and kerning. Although a specific font may be scaled, this model does not provide any support for using different styles and sizes of fonts. Therefore the macros used for books written by Donald Knuth (e.g. in Appendix E of The TeXbook), GNU Texinfo (texinfo.tex) and LaTeX 2.09 use static tables of font definitions in several styles for some sizes. This approach makes using different font families difficult.
LaTeX2e uses a different model instead, called the New Font Selection Scheme (NFSS). In it, a font has the following attributes (from LaTeX2e font selection, the file fntguide.pdf in a TeX distribution):
- encoding: the mapping of character commands to 8-bit character numbers in TeX fonts; font encodings also define the ligatures used
- family: this is commonly known as a typeface or font
- series: e.g. medium or bold
- shape: e.g. italic, roman, slanted, caps and small caps
- size: the size of one em
This is clearly appropriate for the original Computer Modern fonts (the default fonts in LaTeX, the only ones known to be available in every TeX distribution since the 1980s), but now it has at least the following problems:
- font encodings are a waste of time and a hindrance to multilingual typesetting; I believe that Unicode will be enough for everybody (imagine that a list of all its characters would not fit in a typical book)
- slanted (or italic) small capitals cannot be easily represented in this scheme; the package slantsc allows their use as a different shape, exactly what the scheme was designed to avoid
- font sizes are usually artificially limited to avoid scaling the fonts (this was a problem before scalable fonts or automatic generation of bitmap fonts by dvi drivers)
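The small capitals problem can be made concrete with a minimal Python sketch of the five attributes. The class NFSSFont and the function select_shape are hypothetical names used only for illustration, and the model is simplified to the scheme as described above: the shape is a single value, so selecting a new shape replaces the old one instead of combining with it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NFSSFont:
    """Hypothetical record of the five attributes listed above."""
    encoding: str  # e.g. "T1"
    family: str    # e.g. "cmr" (Computer Modern Roman)
    series: str    # e.g. "m" (medium) or "bx" (bold extended)
    shape: str     # e.g. "n", "it", "sl" or "sc"; exactly one value
    size: float    # size of one em, in points


def select_shape(font: NFSSFont, shape: str) -> NFSSFont:
    """Selecting a shape overwrites the previous one; nothing is combined."""
    return replace(font, shape=shape)


base = NFSSFont("T1", "cmr", "m", "n", 10.0)
small_caps = select_shape(base, "sc")
slanted = select_shape(small_caps, "sl")  # the small caps are lost here
```

In this simplified model, slanted small capitals can only be obtained by inventing one more value for the shape field, which is what the second problem above describes.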
This is different with OpenType as used with e.g. XeTeX. There, a font family has equivalents of series and only roman and italic shapes (making both italic and slanted roman fonts is too difficult without METAFONT). Things like small capitals or strange ligatures are enabled by features within the same font file. Clearly, this model does not have the problems listed above.
The CSS3 Fonts module working draft describes another set of font attributes. It has a ‘correct’ style attribute, one for width (one font family for Antykwa Toruńska and Antykwa Toruńska Condensed would be nice), a separate attribute for small caps, and much nicer support for relative font sizing than LaTeX.
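A font description along these lines could be sketched in Python as below. This is only an assumption about how such attributes might look in my system, not a copy of the CSS draft: the names FontSpec and derive are hypothetical, and the attribute values only loosely follow CSS.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FontSpec:
    """Hypothetical font description with independent attributes."""
    family: str
    style: str = "normal"     # "normal", "italic" or "oblique"
    weight: int = 400         # 400 is regular, 700 is bold
    stretch: str = "normal"   # width, e.g. "condensed"
    small_caps: bool = False  # separate from style, so the two combine freely
    size: float = 10.0        # in points


def derive(parent: FontSpec, *, scale: float = 1.0, **changes) -> FontSpec:
    """Return a new spec; the size is given relative to the parent's size."""
    return replace(parent, size=parent.size * scale, **changes)


body = FontSpec("Antykwa Toruńska")
heading = derive(body, scale=1.5, weight=700, stretch="condensed")
italic_caps = derive(body, style="italic", small_caps=True)
```

Because small caps and width are separate fields, italic small capitals and a condensed variant of the same family need no new shape names.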
But this is not everything that can be done with a font. For TeX only the metrics are important, but other things still cannot be easily expressed there. For example, coloured or underlined hyphenated text is very difficult to obtain in TeX. Colour clearly does not affect boxes (I’m not sure how underlining affects the depth of a box), so it could be determined after breaking the paragraph into lines. Currently systems like XeTeX have specific support for such things, but in my opinion a generic method for all changes to the fonts after a page is produced is possible. So in my system I would add one new font attribute: a Python function processing the text when a page is shipped to the output file. It would add things like colour, outlines or underlining to the text (letterspacing, although solved similarly to underlining in the soul package, would need a completely different solution, but it will be trivial in a system with complete access to hyphenation and boxes). This would be similar to whatsits in TeX boxes, used for writing to files when a box is shipped and for putting special instructions for dvi drivers (e.g. for coloured text or for boxes).
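A minimal sketch of such an attribute follows. Everything in it is an assumption made for illustration: the names Glyph, Font, on_shipout and ship_out, the tuple format of the drawing operations, and the fixed underline offset and thickness. The point is only that the hook runs on glyphs that are already positioned, so line breaking never sees it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Glyph:
    char: str
    x: float      # position on the page, set after line breaking
    y: float
    width: float


@dataclass
class Font:
    name: str
    size: float
    # Hypothetical attribute: called for each run of glyphs at ship-out time.
    on_shipout: Optional[Callable[[List[Glyph]], List[tuple]]] = None


def underline(glyphs: List[Glyph]) -> List[tuple]:
    """Draw one rule under a run of glyphs lying on a single line."""
    if not glyphs:
        return []
    first, last = glyphs[0], glyphs[-1]
    length = last.x + last.width - first.x
    return [("rule", first.x, first.y - 1.0, length, 0.4)]


def ship_out(runs: List[Tuple[Font, List[Glyph]]]) -> List[tuple]:
    """Emit drawing operations for a finished page, one run per line."""
    ops: List[tuple] = []
    for font, glyphs in runs:
        ops.extend(("glyph", font.name, g.char, g.x, g.y) for g in glyphs)
        if font.on_shipout is not None:
            ops.extend(font.on_shipout(glyphs))
    return ops


# A word hyphenated across two lines simply becomes two runs in the same font.
font = Font("body", 10.0, on_shipout=underline)
line1 = [Glyph("i", 90.0, 700.0, 3.0), Glyph("n", 93.0, 700.0, 5.0),
         Glyph("-", 98.0, 700.0, 3.0)]
line2 = [Glyph("t", 10.0, 688.0, 4.0)]
ops = ship_out([(font, line1), (font, line2)])
rules = [op for op in ops if op[0] == "rule"]
```

Each line gets its own rule, including the part under the hyphen, without the paragraph builder knowing anything about underlining.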
This also leads to another interesting problem: how should ‘interdisciplinary’ be hyphenated? And what should be done when the font change has no obvious correlation with the parts of a word? In my opinion the font should be treated as a property of a character that is ignored for hyphenation (like ligatures and kerning).
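One way to read this is sketched below, assuming for illustration that the prefix of the word is set in italic and the rest in roman. The names Char, break_points and split_at are hypothetical, and toy_hyphenate is a stand-in with a hard-coded table, not a real pattern-based hyphenator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Char:
    char: str
    font: str  # the font is just a property of the character


def break_points(word: List[Char],
                 hyphenate: Callable[[str], List[int]]) -> List[int]:
    """Find break positions from the plain text only, ignoring fonts."""
    plain = "".join(c.char for c in word)
    return hyphenate(plain)


def split_at(word: List[Char], positions: List[int]) -> List[List[Char]]:
    """Cut the word at the given positions; each piece keeps its fonts."""
    pieces, start = [], 0
    for pos in positions:
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return pieces


def toy_hyphenate(text: str) -> List[int]:
    """Stand-in hyphenator with one illustrative, hard-coded entry."""
    table = {"interdisciplinary": [2, 5, 8, 10, 13]}
    return table.get(text, [])


# Assumed example: "inter" in italic, "disciplinary" in roman.
word = ([Char(c, "italic") for c in "inter"]
        + [Char(c, "roman") for c in "disciplinary"])
positions = break_points(word, toy_hyphenate)
pieces = ["".join(c.char for c in piece) for piece in split_at(word, positions)]
```

The break positions are the same as for the word set in a single font, and the pieces still carry their fonts, so a font change in the middle of a syllable causes no special case.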
