Automatic cross references in LaTeX and their problems

Most non-fiction texts have numbered sections and many references to them, sometimes also stating the page referred to. In LaTeX, the numbering of pages and sections and all such references is completely automatic, so these numbers are nearly always correct. However, the way this is implemented is not optimal.

To use it, write the \label{id} command in the section to be referred to, where id identifies the section; it should be easy to remember and rarely changed. The \ref{id} and \pageref{id} commands then typeset the number of the section and the number of its page, respectively. (References may also point to the page of arbitrary text or to the number of an equation, list item, theorem, etc.; I refer to all of them as ‘sections’ in this post.)
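A minimal sketch of these three commands in use; the article class and the identifier sec:intro are chosen only for illustration:

```latex
\documentclass{article}
\begin{document}

\section{Introduction}\label{sec:intro}
Some introductory text.

\section{Details}
% \ref prints the section number, \pageref the page it starts on.
Section~\ref{sec:intro} on page~\pageref{sec:intro} introduces the topic.

\end{document}
```

After a single run both references print as ‘??’; a second run fills in the numbers.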

During the first run of LaTeX the whole text is typeset with ‘??’ in place of each referenced number. Every \label writes its identifier, section number and page number to the .aux file. At the beginning of the second run this file is read, every reference is typeset with the value recorded during the previous run, and new values are written to the file.
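For illustration, this is roughly what a \label{sec:intro} placed in section 1 on page 1 leaves in the .aux file. The identifier is made up, and the line shows the classic two-field form; newer LaTeX kernels and packages such as hyperref may record additional fields:

```latex
% Hypothetical excerpt from an .aux file (classic format).
% First inner group: the text printed by \ref{sec:intro}.
% Second inner group: the text printed by \pageref{sec:intro}.
\newlabel{sec:intro}{{1}{1}}
```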

Here an important feature of TeX shows: typesetting words, breaking paragraphs into lines, joining lines into pages, and outputting pages happen asynchronously. To learn the page number of a given \label, LaTeX uses the TeX primitive \write, which expands its argument only when the page containing it is output. This makes it impossible to change the text depending on the current page number, so the number from the previous run must be used instead.
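The difference between a deferred and an immediate \write can be observed with a small experiment. This is only a sketch: the stream name \pagelog, the counter \fillcount and the .pages extension are invented for it, and the amount of filler text is an arbitrary choice meant to spill over at least one page break.

```latex
\documentclass{article}

% \pagelog and the .pages extension are names made up for this sketch.
\newwrite\pagelog
\immediate\openout\pagelog=\jobname.pages

% Hypothetical counter used only to generate filler text.
\newcount\fillcount

\begin{document}

% One long paragraph: TeX reads all of it before any page is built.
\loop
  Filler text that makes this paragraph long enough to cross a page break.
  \advance\fillcount by 1
\ifnum\fillcount<200 \repeat
% Expanded right here, while the page counter still has its old value.
\immediate\write\pagelog{immediate: page \thepage}%
% Stored in the paragraph and expanded only when its page is shipped out.
\write\pagelog{deferred: page \thepage}

\end{document}
```

The immediate line should report page 1, because the whole paragraph is read before any page is completed, while the deferred line reports the later page on which the end of the paragraph actually lands. Only the deferred form gives the right answer, and it gives it too late to influence the text being typeset.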

The same method is used for other things, such as tables of contents, bibliographic references, indices, and the correct placement of margin notes in two-sided documents by the mparhack package.

However, using multiple passes for cross references has several disadvantages. The most visible one is that producing correct output takes several times longer, even though only one output file is needed and error messages are useful for only one pass (this can be improved by using \batchmode for all runs but the first and pdfTeX’s -draftmode option for all runs but the last). Even so, most of the work done in all passes but the last is clearly unnecessary (especially when referring only to section numbers, not page numbers).

Because each pass both reads and rewrites the same files, it is hard to use make or another generic build system optimally with LaTeX. The result is either more processing than necessary or uncertainty about whether the document still has outdated references.

It is even possible to write a document whose references are always incorrect. The following document shows the problem:

\documentclass{minimal}

\pagestyle{empty}
\pagenumbering{roman}

\setlength{\textwidth}{8pt}
\setlength{\textheight}{10pt}
\setlength{\parindent}{0pt}

\makeatletter
\@ifundefined{pdfpagewidth}{}{%
  \pdfpagewidth=2in
  \pdfpageheight=2in
}
\makeatother

\begin{document}
\setcounter{page}{9}
\pageref{x}\hspace{0pt}i\label{x}
\end{document}

(The part with \pdfpagewidth makes it easier to see both pages at once in a PDF viewer; I described it in a previous post.)

From the second pass on, this document oscillates between having one page and two. When the reference reads ‘x’, the ‘i’ stays on page ix; when the reference reads ‘ix’, the ‘i’ moves to page x. (Leslie Lamport states in LaTeX: A Document Preparation System that using Roman numerals may lead to this problem; I did not know any specific example of such a document before writing the one above.)

So despite being very useful, automatic cross references in LaTeX have some disadvantages. Usually a good enough solution is to run LaTeX on a document a few more times than may be strictly necessary, and to change the text where needed to avoid infinite loops in this process. Could it be improved? I’ll write about some other ways to avoid these problems in a separate post.
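As one hedged illustration of changing the text to avoid such loops: if a page reference always occupies the same width, the line and page breaks around it no longer depend on which numeral it prints. The macro name \fixedpageref and the 4em width are assumptions made for this sketch; the box has to be at least as wide as the widest numeral expected.

```latex
\documentclass{article}
\pagenumbering{roman}

% Hypothetical helper: a page reference set in a box of constant width,
% so the surrounding line breaks are the same for ‘x’ and for ‘ix’.
\newcommand{\fixedpageref}[1]{\makebox[4em][l]{\pageref{#1}}}

\begin{document}
See page~\fixedpageref{x} for details.\label{x}
\end{document}
```

The price is uneven spacing after short numerals, so this is worth considering only for the few references that actually cause trouble.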

2 Comments

Several tools automate this (resolving references through repeated compilations). One can use latexmk, for example, which is part of every TeX Live distribution.

These programs just rerun LaTeX as long as needed, so processing is still slower than for input without cross references. The other problems could easily be solved by such tools.
