Difficulties of typesetting quote marks in LaTeX


Quote marks are probably the hardest punctuation marks in English to typeset. Although they should be used only for short, simple quotations and other simple fragments of text, they are designed for more arcane uses. This, combined with the influence of typewriters, makes typesetting them difficult.

Quote marks are used much like parentheses: they delimit a fragment of a sentence. But unlike other such delimiters, inner quotation marks are different symbols from the outer ones (unless the larger outer delimiters in mathematical formulas count as different symbols; formulas are the most ‘mainstream’ case of parentheses within parentheses). Another difference is that ‘((’ is easily read correctly, while ‘“ needs additional spacing (‘ “).

Another problem is that each language has different quote marks. American English uses double outer quotes and single inner quotes. British English reverses this, with single outer and double inner quotes. Polish uses a low double opening quote and the English double closing quote, with the French quotes as the inner ones, although they are rarely used correctly. American English also moves a following comma or period inside the closing quote, which obviously causes problems in programming-related texts, where the quoted string must be exact.

LaTeX does not solve these problems, but it allows the appropriate symbols to be specified directly. The English quote marks are typed as ``, '', ` and ', since these are the nearest equivalents on a typical keyboard. The " character is not used, and the space between adjacent quotes must be specified explicitly as \,. (This spacing could be specified in the font, but that would require separate sets of fonts for American and British texts, and I’m not sure it could support third-level nested quotes in either dialect.)
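The input convention above can be sketched as a small Python helper that produces LaTeX source for a quotation at a given nesting level. This is only an illustration: the function name quote, its dialect and level parameters, and the rule that quote styles keep alternating beyond the second level are all assumptions, not part of LaTeX or of any package.

```python
def quote(text: str, dialect: str = "american", level: int = 0) -> str:
    """Wrap text in LaTeX quote-mark input for the given dialect and nesting level.

    Hypothetical helper: level 0 is the outermost quotation. Alternating the
    styles at deeper levels is an assumption made for this sketch.
    """
    if dialect not in ("american", "british"):
        raise ValueError(f"unsupported dialect: {dialect}")
    # American English starts with double quotes, British English with single.
    double = (level % 2 == 0) == (dialect == "american")
    open_mark, close_mark = ("``", "''") if double else ("`", "'")
    # Adjacent quote marks would run together, so separate them with \, (thin space).
    # Note: this also adds a thin space after a trailing apostrophe.
    if text.startswith("`"):
        text = "\\," + text
    if text.endswith("'"):
        text = text + "\\,"
    return open_mark + text + close_mark
```

For example, quote(quote("No", level=1) + " was all she said") returns ``\,`No' was all she said'' for American English, and passing dialect="british" at both levels swaps the single and double marks.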

It would be interesting to use just the " character and let the software decide which quotes are opening and which are closing. But even without support for nested quotations, this would be difficult, if possible at all, to do correctly in every case. A naïve algorithm would begin with an opening quote and then alternate between closing and opening ones. But this would not correctly interpret quotes in multi-paragraph dialogue, where each paragraph begins with an opening quote and only the last one ends with a closing quote. (In Polish, dashes are used instead of quotes for dialogue, and humans cannot interpret multi-paragraph dialogue correctly without backtracking.) A common mistake in delimiting block quotations with quote marks may also produce a paragraph containing only a closing quote mark, so this algorithm cannot be fixed simply by resetting to an opening quote at each new paragraph.
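A minimal sketch of the naïve alternating algorithm, written in Python for illustration (the function name naive_quotes is made up here), shows where it goes wrong:

```python
def naive_quotes(text: str) -> str:
    """Replace each straight " with `` or '' by strict alternation."""
    out = []
    opening = True  # the first quote mark is assumed to be an opening one
    for ch in text:
        if ch == '"':
            out.append("``" if opening else "''")
            opening = not opening  # the next mark gets the opposite role
        else:
            out.append(ch)
    return "".join(out)
```

On a simple sentence such as He said "hi". this gives He said ``hi''. as expected. On two paragraphs of dialogue, where only the second paragraph has a closing quote, the quote that opens the second paragraph is turned into '' and the final closing quote into ``.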

Emacs uses a different algorithm in the TeX-insert-quote function: it inserts an opening quote after whitespace or an opening parenthesis, and a closing quote otherwise. This method could not be implemented in LaTeX itself, but it can be done in language-specific fonts. However, the algorithm fails when the quoted text is itself a space or a parenthesis, like ‘(’, which is common in programming-related texts.
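The context rule can be sketched in Python as follows. This is loosely modelled on the behaviour described above, not on the actual Emacs code; the function name context_quotes, the treatment of the start of the text as opening context, and the inclusion of [ and { alongside ( are assumptions of the sketch.

```python
def context_quotes(text: str) -> str:
    """Choose `` or '' for each straight " by looking at the preceding character."""
    out = []
    prev = None  # no previous character at the start of the text
    for ch in text:
        if ch == '"':
            # Opening quote at the start, after whitespace, or after an opening bracket.
            opening = prev is None or prev.isspace() or prev in "([{"
            out.append("``" if opening else "''")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)
```

Unlike strict alternation, this handles multi-paragraph dialogue, because a quote at the start of a paragraph always follows whitespace. It still fails on the case mentioned above: in the character "(" the second quote follows an opening parenthesis, so both marks come out as ``.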

The only problem that can be solved easily is choosing which quotes to use. I have written a LaTeX package for this, named quoted (available in my Mercurial repository), but it does not support spacing between quote marks of different levels or moving punctuation into the quotation. There are probably many better packages for this, but none of them will make a useful document ‘portable’ between, e.g., the British and American dialects of English, so such packages aren’t very useful.

Since parentheses are similar to quotes, but simpler, maybe a single character in source files could be used for them too. In the days of typewriters, a slash was sometimes used instead of parentheses, since it looks similar. Is it possible to implement a LaTeX macro or virtual font that replaces / with a slash or the appropriate parenthesis depending on context?
