One of the most important typesetting ideas on which TeX is based is the box/glue/penalty model. It is used both to break paragraphs into lines and to break the resulting vertical list of lines into pages. Since these two processes are similar, lines and pages have similar representations. The aim of this post is to describe how the material of a paragraph is represented.
The TeXbook by Donald E. Knuth lists the elements of a horizontal list (the material which is broken into lines and put in horizontal boxes) in Chapter 14, page 94:
- boxes
- discretionary breaks
- whatsits
- vertical material
- glue
- kerns
- penalties
- math-on and math-off
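To make the list above concrete, here is a minimal Python sketch of the item kinds as plain data classes. The class names, their fields and the `natural_width` helper are hypothetical and chosen only for illustration. TeX itself stores these items as linked nodes with scaled-integer dimensions, and the glue values in the example are merely illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A hypothetical model of horizontal-list items; dimensions are floats in points.

@dataclass
class Box:            # a character, ligature, rule, hbox or vbox
    width: float
    height: float = 0.0
    depth: float = 0.0

@dataclass
class Discretionary:  # pre-break, post-break and no-break material
    pre_break: list = field(default_factory=list)
    post_break: list = field(default_factory=list)
    no_break: list = field(default_factory=list)

@dataclass
class Whatsit:        # e.g. \write or \special
    payload: str

@dataclass
class VerticalMaterial:  # from \mark, \vadjust or \insert
    payload: str

@dataclass
class Glue:           # natural width plus stretchability and shrinkability
    width: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class Kern:           # fixed space; explicit (\kern) or implicit (font kerning)
    width: float
    explicit: bool = True

@dataclass
class Penalty:        # cost of breaking here
    value: int

@dataclass
class Math:           # math-on or math-off, with width \mathsurround
    on: bool
    width: float = 0.0

Item = Union[Box, Discretionary, Whatsit, VerticalMaterial, Glue, Kern, Penalty, Math]

def natural_width(items: List[Item]) -> float:
    """Sum the widths of items that take horizontal space. A discretionary is
    measured by its no-break text; whatsits, vertical material and penalties
    contribute nothing."""
    total = 0.0
    for item in items:
        if isinstance(item, (Box, Glue, Kern, Math)):
            total += item.width
        elif isinstance(item, Discretionary):
            total += natural_width(item.no_break)
    return total

# Two 5pt boxes separated by interword glue (illustrative values)
hlist = [Box(5.0), Glue(3.33, stretch=1.67, shrink=1.11), Box(5.0)]
print(natural_width(hlist))
```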
Boxes need no explanation: they are the visible elements of a text, usually glyphs, rules or combinations of them (e.g. a table is usually a box built from simpler boxes). Glue and kerns make the whitespace between them. Discretionary breaks allow a line to be broken in more complicated ways than just at whitespace, for example at a hyphenation point inside a word. Penalties control how bad a break at a given place is. These elements have a clear role in the line-breaking algorithm, and they are the only elements of a horizontal list that I have met directly in LaTeX.
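How penalties enter the cost of a break can be sketched with the basic demerits formula from Chapter 14 of The TeXbook. For a line with badness b that ends at a break with penalty p, the demerits are (l + b)^2 + p^2 when 0 <= p < 10000, (l + b)^2 - p^2 when -10000 < p < 0, and (l + b)^2 when p <= -10000 (a forced break), where l is \linepenalty. The function below is a hypothetical helper that ignores the extra terms TeX adds (\adjdemerits, \doublehyphendemerits, \finalhyphendemerits) and defaults to plain TeX's \linepenalty=10.

```python
from typing import Optional

INFINITE_PENALTY = 10000   # a penalty this large forbids the break
EJECT_PENALTY = -10000     # a penalty this small forces the break

def line_demerits(badness: int, penalty: int, line_penalty: int = 10) -> Optional[int]:
    """Demerits for ending a line of the given badness at a break with the
    given penalty; returns None when the penalty forbids breaking."""
    if penalty >= INFINITE_PENALTY:
        return None                       # not a legal breakpoint
    base = (line_penalty + badness) ** 2
    if penalty >= 0:
        return base + penalty ** 2        # positive penalties make the break costlier
    if penalty > EJECT_PENALTY:
        return base - penalty ** 2        # negative penalties encourage the break
    return base                           # forced break: the penalty is not counted

print(line_demerits(20, 50))   # (10 + 20)^2 + 50^2 = 3400
```

The same line ending at a penalty of -50 would get 900 - 2500 = -1600 demerits instead. TeX chooses the set of breaks that minimizes the total demerits of the whole paragraph.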
Math-on and math-off mark the beginning and the end of a formula and carry the additional whitespace given by \mathsurround. They differ from kerns in that glue and kerns between them, i.e. inside a math formula, are not legal breakpoints. So in a new typesetting system they could probably be replaced by a kern and infinite penalties at appropriate places inside the formula.
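The effect of math-on and math-off shows up in the rules for legal breakpoints given in Chapter 14 of The TeXbook. A line may break (a) at glue immediately preceded by a non-discardable item, (b) at a kern immediately followed by glue, (c) at a math-off immediately followed by glue, (d) at a penalty, or (e) at a discretionary break, where (a) and (b) do not apply inside a formula. The discardable items are glue, kerns, penalties and math-on/math-off. The Python sketch below encodes these rules over a hypothetical `Item` record and treats a penalty of 10000 or more as forbidding the break.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    kind: str       # "box", "glue", "kern", "penalty", "math_on", "math_off", "disc", ...
    value: int = 0  # the penalty value; unused for other kinds

DISCARDABLE = {"glue", "kern", "penalty", "math_on", "math_off"}

def legal_breakpoints(hlist: List[Item]) -> List[int]:
    """Indices of items at which a line may be broken."""
    breaks = []
    in_math = False
    for i, item in enumerate(hlist):
        prev = hlist[i - 1] if i > 0 else None
        nxt = hlist[i + 1] if i + 1 < len(hlist) else None
        if item.kind == "math_on":
            in_math = True
        elif item.kind == "math_off":
            in_math = False
            if nxt is not None and nxt.kind == "glue":
                breaks.append(i)    # (c) math-off followed by glue
        elif item.kind == "glue":
            if not in_math and prev is not None and prev.kind not in DISCARDABLE:
                breaks.append(i)    # (a) glue after a non-discardable item
        elif item.kind == "kern":
            if not in_math and nxt is not None and nxt.kind == "glue":
                breaks.append(i)    # (b) kern followed by glue
        elif item.kind == "penalty":
            if item.value < 10000:
                breaks.append(i)    # (d) penalty, unless it forbids breaking
        elif item.kind == "disc":
            breaks.append(i)        # (e) discretionary break
    return breaks

# "x $a + b$ y": the glue inside the formula is not a legal breakpoint
text = [Item("box"), Item("glue"), Item("math_on"), Item("box"), Item("glue"),
        Item("box"), Item("glue"), Item("box"), Item("math_off"), Item("glue"), Item("box")]
print(legal_breakpoints(text))
```

The glue right after math-off is not a breakpoint either, because it follows a discardable item; the break happens at the math-off itself.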
Glue and kerns look similar (on paper they are the same: white areas between glyphs), but they differ in two main ways. Glue can stretch and shrink, and it separates words (so it delimits the words considered for automatic hyphenation), while kerns keep a fixed size and can make words unhyphenable. There are two types of kerns: explicit kerns, inserted directly by the \kern primitive, and implicit kerns, which TeX inserts automatically from the font's kerning information and which do not affect hyphenation.
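The difference between stretchable glue and fixed kerns shows up when a line is justified. The Python sketch below distributes the excess (or missing) width over the glue in proportion to its stretch (or shrink), leaves boxes and kerns untouched, and reports a badness of roughly 100r^3, capped at 10000, for the glue ratio r. It is a simplification: TeX computes badness with an integer approximation and also supports infinite (fil) stretch, both omitted here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

@dataclass
class Box:
    width: float

@dataclass
class Kern:
    width: float              # fixed: never stretches or shrinks

@dataclass
class Glue:
    width: float              # natural width
    stretch: float = 0.0      # finite stretchability only (no fil/fill here)
    shrink: float = 0.0       # the most this glue may shrink

Item = Union[Box, Kern, Glue]

def set_glue(items: List[Item], target: float) -> Tuple[List[float], Optional[int]]:
    """Final width of every item when the list is set to `target`, plus an
    approximate badness (None means the line is overfull)."""
    natural = sum(item.width for item in items)
    stretch = sum(item.stretch for item in items if isinstance(item, Glue))
    shrink = sum(item.shrink for item in items if isinstance(item, Glue))
    excess = target - natural
    if excess == 0:
        return [item.width for item in items], 0
    if excess > 0:
        if stretch == 0:
            # Nothing can stretch: the line stays at its natural width (underfull).
            return [item.width for item in items], 10000
        ratio = excess / stretch
        widths = [item.width + ratio * item.stretch if isinstance(item, Glue) else item.width
                  for item in items]
    else:
        if -excess > shrink:
            # Glue never shrinks beyond its limit, so the line is overfull.
            widths = [item.width - item.shrink if isinstance(item, Glue) else item.width
                      for item in items]
            return widths, None
        ratio = -excess / shrink
        widths = [item.width - ratio * item.shrink if isinstance(item, Glue) else item.width
                  for item in items]
    # Badness is roughly 100 * r^3 for the glue ratio r, capped at 10000.
    return widths, min(10000, round(100 * ratio ** 3))

line = [Box(10.0), Glue(4.0, stretch=2.0, shrink=1.0), Box(10.0), Kern(1.0), Box(5.0)]
print(set_glue(line, 32.0))   # → ([10.0, 6.0, 10.0, 1.0, 5.0], 100)
```

Setting the same line to 29pt shrinks the glue to 3pt, while the kern keeps its 1pt in every case.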
In all of the above respects, explicit kerns behave much like empty boxes. But there are two important differences: boxes also have vertical dimensions (useful for getting proper vertical spacing in tables), and they are not discardable, so a box at a line break stays, while a kern there is discarded (imagine a justified paragraph with empty boxes at the beginnings or ends of lines: it would look ragged). This is a nice example of how different the elements of a horizontal list are: every one of them is useful, and none can be completely replaced by another.
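Discarding at a line break can be sketched as follows: the item at which the line breaks is dropped, and so are the discardable items (glue, kerns, penalties, math-on/math-off) that immediately follow it, up to the first non-discardable item such as a box. The hypothetical `break_lines` helper below works on (kind, label) pairs, assumes every break is at a discardable item (not a discretionary), and ignores TeX's additional rule that discarding also stops at the next chosen breakpoint. In the example, a kern vanishes at the start of the second line, while an empty box at the start of the third line survives.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (kind, label): a hypothetical, simplified item

DISCARDABLE = {"glue", "kern", "penalty", "math"}

def break_lines(hlist: List[Item], breaks: List[int]) -> List[List[Item]]:
    """Split hlist at the given break indices, dropping the break item itself
    and any discardable items that immediately follow it."""
    lines: List[List[Item]] = []
    start = 0
    for b in breaks:
        lines.append(hlist[start:b])    # material before the break
        start = b + 1                    # the break item is dropped
        while start < len(hlist) and hlist[start][0] in DISCARDABLE:
            start += 1                   # discard leading glue, kerns, penalties, math
    lines.append(hlist[start:])
    return lines

hlist = [("box", "a"), ("glue", " "), ("kern", "1pt"), ("box", "b"),
         ("glue", " "), ("box", "empty"), ("box", "c")]
for line in break_lines(hlist, [1, 4]):
    print(line)
```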
Vertical material (from \mark, \vadjust or \insert) is put in a horizontal list so that it can be placed between the lines produced from that list. This can be used, e.g., to put a page break after the current line when it is not known where the line will end. It is also used for marks, which are token lists put on the page; the output routine (more on this later) can access some of them. Similarly, whatsits are used when a page is produced, but only when the page is finally shipped out (by \shipout, normally called from the output routine). They are used to write page numbers to files (necessary for making an index), to typeset right-to-left text in e-TeX, and to give DVI drivers special commands, e.g. to change the colour of text or to make a hyperlink.
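How vertical material ends up between lines can be sketched like this: once a paragraph has been broken, each line is packed into an hbox, and any vertical material it contains migrates out of the line and is appended to the enclosing vertical list right after that line's box, while whatsits stay inside the box until shipout. The `build_vlist` function and its (kind, payload) items are hypothetical, and interline glue and penalties are omitted.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (kind, payload): a hypothetical, simplified item

def build_vlist(lines: List[List[Item]]) -> List[tuple]:
    """Turn the lines of a broken paragraph into a vertical list. Vertical
    material migrates out of each line and follows the line's box; everything
    else, including whatsits, stays inside the box."""
    vlist: List[tuple] = []
    for line in lines:
        inside = [item for item in line if item[0] != "vertical"]
        migrated = [item for item in line if item[0] == "vertical"]
        vlist.append(("hbox", inside))   # the packaged line
        vlist.extend(migrated)            # e.g. a mark or \vadjust material
    return vlist

lines = [
    [("box", "Hello"), ("vertical", "mark: Section 2"), ("glue", " "), ("box", "world")],
    [("box", "Next"), ("whatsit", "write: page number")],
]
for entry in build_vlist(lines):
    print(entry)
```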
