Many LaTeX users (myself included) probably first learned from texts about LaTeX that dashes and hyphens look different. Many people, encouraged by keyboards limited to ASCII plus a few national and unused characters, type only hyphens, with various spacing around them, instead of dashes. Could a LaTeX user simply include such text in a document and get correctly distinguished hyphens and dashes in the output? This post describes an attempt in this direction.
LaTeX already uses the ASCII hyphen character for hyphens, minus signs and dashes alike. In math mode it is a minus sign. Otherwise, - becomes a hyphen, -- an endash and --- an emdash. Endashes and emdashes differ only in appearance; different languages require different ones, with different spacing. This is why I wrote the onedash package, which provides a single command, \dash, for typesetting the correct dash for the language and style of the document.
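For reference, the standard input conventions look like this in a plain LaTeX document (no extra packages; the dashes come from font ligatures):

```latex
\documentclass{article}
\begin{document}
well-known        % one hyphen: a hyphen in a compound word
pages 69--105     % two hyphens: an endash
wait---what?      % three hyphens: an emdash
$a - b$           % in math mode the same character is a minus sign
\end{document}
```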
My new package, hyphdash, turns a single hyphen into a dash or a hyphen with correct spacing. It, onedash and quoted (an equivalent of onedash for quotes) are all available in my Mercurial repository. They are licensed under the GNU General Public License, version 3 or later.
Hyphens have two uses – they appear in compound words and in words divided across lines. Fortunately, LaTeX handles the second use automatically, so it does not affect the package. Compound words never have a space before the hyphen, but in lists like ‘mono- and polycrystals’ the hyphen may be followed by a space.
Dashes are sometimes set with equal spaces on both sides – as in the British style used on this blog – or with no spaces—as in the American style used in this sentence—or with unequal spaces. Usually the left space is unbreakable. This package assumes that dashes in the input are surrounded by normal spaces, i.e. input characters that TeX interprets as spaces. (Unbreakable spaces appear more arcane than dashes or even inner quote marks, so the package does not support them in input, but the output will have them.) The TeXnical reason for this will be stated later.
My package does nearly all of its work in the macro that the hyphen, made an active character, is defined as (see the package’s README file for information about using the package). In math mode it expands to \relax followed by a normal hyphen (the \relax is probably useful in tables). The same result is obtained in horizontal mode if the current font has a nonpositive space stretch parameter, which probably occurs only for typewriter fonts.
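The dispatch described above can be sketched roughly as follows. This is not the package's actual code: all \sketch@... names are hypothetical, the vertical-mode branch simply prints an emdash as a placeholder, and the decision made in proportional fonts is left as a stub that always yields a hyphen.

```latex
\documentclass{article}
\makeatletter
% All \sketch@... names are hypothetical, not the package's macros.
\newcommand*\sketch@hyphen{-}% ordinary hyphen, stored before "-" becomes active
\newcommand*\sketch@decide{\sketch@hyphen}% stub: hyphen-or-dash decision goes here
\newcommand*\sketch@active{%
  \ifmmode
    \relax\sketch@hyphen          % math mode: a minus sign
  \else\ifvmode
    \leavevmode\textemdash        % start of a paragraph: dialogue dash (placeholder)
  \else\ifdim\fontdimen3\font>\z@ % \fontdimen3 is the interword stretch
    \sketch@decide                % proportional font: hyphen or dash
  \else
    \sketch@hyphen                % no stretch, e.g. typewriter: plain hyphen
  \fi\fi\fi}
\begingroup
  \catcode`\-=\active
  \gdef-{\sketch@active}% meaning of the hyphen whenever it is active
\endgroup
\makeatother
\begin{document}
\begingroup
\catcode`\-=\active
- Dialogue line.

well-known, $a-b$, \texttt{ls -l}
\par
\endgroup
\end{document}
```

The hyphen is made active only inside a group in the body, so the preamble and the end-of-document processing still see an ordinary character. How the real package switches the behaviour on is described in its README.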
In vertical mode a special dash is used, which is useful for representing dialogue in Polish texts (English uses quotation marks for this, which makes it easier to determine where a multi-paragraph speech ends and makes inner quotes common). Probably no word begins with a hyphen, so a hyphen at the start of a paragraph can safely be treated this way. Unix and GNU programs have command-line options beginning with a hyphen, but these are typeset in typewriter type, which is handled as a special case.
The complex part lies in horizontal mode. Hyphens do not have leading spaces, so a hyphen is made if \lastskip does not contain a positive value. Thus the common incorrect form of a dash, alpha- beta, will be kept as a hyphen. In all other cases a dash is made. This ignores the possibility of numeric ranges with endashes, like ‘69–105’. Instead, 69-105 will use a hyphen and 69 - 105 will use a much more incorrect dash (though perhaps one more likely to appear in the input?). Detecting digits before a hyphen is impossible without making all characters active (this could work only for verbatim typesetting of files, which does not need dashes), and detecting them after the hyphen would not distinguish such cases as ‘69–105’ and ‘2-chloro-3-methylpentane’.
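A minimal sketch of the \lastskip test, using an ordinary command instead of an active character so that it can be tried in isolation. The name \sketchdash is hypothetical, and the dash is hard-wired to a spaced endash with an unbreakable left space; the real package presumably chooses the dash and spacing according to the document's language and style.

```latex
\documentclass{article}
% \sketchdash is a hypothetical stand-in for the active hyphen.
\newcommand*\sketchdash{%
  \ifdim\lastskip>0pt
    % A space precedes us: replace it with an unbreakable one and set a dash.
    \unskip\nobreak\ \textendash\space
  \else
    % No preceding space: keep an ordinary hyphen.
    -%
  \fi}
\begin{document}
well\sketchdash known        % no space before: hyphen
alpha\sketchdash{} beta      % the `alpha- beta' form: still a hyphen
alpha \sketchdash beta       % space before: dash
\end{document}
```

Because TeX drops the space after a control word, this sketch has to add the trailing \space itself. An active character does not swallow the following space, so a real implementation would have to treat it differently.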
There is another problem – dashes represented by multiple hyphens. The first hyphen will be made a dash, but the following ones (since no space precedes them) will become hyphens. The standard ligatures for dashes will not be used. Maybe the macro could detect following hyphens, but ‘simple’ solutions like \@ifnextchar, used for optional parameters, will not work, since the hyphen is a complicated macro instead of a character. Changing catcodes (e.g. redefining the hyphen as an ordinary character) will not work with text that itself changes catcodes. The first hyphen could set a conditional to ignore the following ones, but it would be difficult to reset it so that the hyphens of the next dash are not ignored. Other approaches are probably more difficult than these. Therefore only a single hyphen may be used as a dash with this package. The macros \textendash and \textemdash may be used instead of multiple hyphens to make a dash character.
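For example, a numeric range or an unspaced emdash can be written with the standard LaTeX macros instead of repeated hyphens (shown here in a plain document, without the package loaded):

```latex
\documentclass{article}
\begin{document}
pages 69\textendash 105      % endash without typing two hyphens
wait\textemdash what?        % emdash without typing three hyphens
\end{document}
```

The space after each macro name only ends the control word and does not appear in the output.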
Another problem is hyphens used as minus signs in numbers interpreted by TeX. For example, \hspace{-1em} will produce strange results and two error messages. In my opinion the only solutions are either to avoid hyphens in the arguments of such commands (e.g. by using the \hyphen macro or by putting all such things into the preamble), or to redefine every primitive TeX command to change the macro that makes dashes and hyphens (and it is probably impossible to detect where such changes should be made). It is obvious why the package uses the first solution.
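One way to follow the ‘put it into the preamble’ advice is to wrap the negative length in a macro defined there. This sketch assumes that the hyphen is still an ordinary character while the preamble is read; the name \sketchbackstep is hypothetical.

```latex
\documentclass{article}
% Hypothetical helper: defined in the preamble, so its "-" is tokenized
% as an ordinary character and stays that way inside the macro.
\newcommand*\sketchbackstep{\hspace{-1em}}
\begin{document}
A\sketchbackstep B   % no hyphen character is typed in the body
\end{document}
```

This works because TeX fixes a character's category code when the character is read, so a later change to the hyphen does not affect tokens already stored in a macro.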
This package also uses an example document as an automatic test to find regressions in future versions (I’ve previously written about this approach in connection with my other packages for dashes and quotes). From the example I learned about most of the limitations of the package described here.
