Ideas for an implementation of a typesetting system


I’ve written several posts about TeX as a program and about things that could be simpler in a typesetting system based on a different design. The basic ideas are that it will be incompatible with TeX and that it will be an easily extensible package written in the Python programming language, with the full power of the language available for typesetting. Then I wrote about font selection, a problem that at first glance is unrelated to the previous ones.

I haven’t written any useful code for a typesetting system, but if one is written, it will be based on these ideas. Some of them might be useful even without this program, so I’ve written them down here.

The program will be called Tim, since it is a short, easy-to-pronounce name with positive associations, and it could stand for ‘TeX Inspired Modules’. The name clearly shows that art is not important in this project; quality and efficiency may suffer without any useful reason. It is also easier to write about a named program than about an unnamed one.

The whole code of Tim will be written in Python and published under the GNU General Public License, version 3 or later. Therefore every program that uses this package will have to be distributed under the same license. This encourages sharing, and it encourages writing documents that are not derivative works of Tim, i.e. writing them in a format that can be treated as data rather than as a program.

The need to separate code and data leads to another idea. Everything except the core model of horizontal and vertical lists (based on the one used in TeX) should be trivial to replace. Many tasks need more than one algorithm: hyphenation works differently for different languages, fonts come in different formats, and so on. So every useful algorithm will be implemented as a plugin, i.e. a function or object used by code that does not depend on its implementation. A special module will determine which plugins are used for which tasks; it will probably store this data in files (one per system, one per user and one per document).
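A minimal sketch of how such a plugin module could look. Nothing here is existing Tim code: the names `register` and `resolve`, the task name "hyphenation" and the two toy plugins are all assumptions made for illustration, and the three configuration files are replaced by plain dictionaries.

```python
from typing import Callable, Dict

# Hypothetical registry: task name -> plugin name -> implementation.
_PLUGINS: Dict[str, Dict[str, Callable]] = {}


def register(task: str, name: str):
    """Decorator that records a plugin under a task and a plugin name."""
    def decorator(func: Callable) -> Callable:
        _PLUGINS.setdefault(task, {})[name] = func
        return func
    return decorator


def resolve(task: str, *configs: Dict[str, str]) -> Callable:
    """Pick the plugin for a task; later configs override earlier ones."""
    chosen = None
    for config in configs:
        chosen = config.get(task, chosen)
    if chosen is None:
        raise LookupError(f"no plugin configured for task {task!r}")
    return _PLUGINS[task][chosen]


@register("hyphenation", "none")
def no_hyphenation(word: str):
    # Never breaks a word.
    return [word]


@register("hyphenation", "explicit")
def hyphenate_at_marks(word: str):
    # Toy rule: break only where the author typed a hyphen.
    return word.split("-")


# Stand-ins for the per-system, per-user and per-document files.
system_config = {"hyphenation": "none"}
user_config: Dict[str, str] = {}
document_config = {"hyphenation": "explicit"}

hyphenate = resolve("hyphenation", system_config, user_config, document_config)
print(hyphenate("type-set-ting"))
```

The code calling `hyphenate` never learns which implementation it got, which is the point of the design: the document's setting wins over the user's, and the user's over the system's.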

There will be different plugins for selecting fonts (determining which font is nearest to the requested one), for getting metric data from different font formats, for interpreting input text (e.g. replacing spaces with glue and other characters with glyphs, possibly with some transliteration schemes), for breaking paragraphs into lines, for hyphenation, for making lines, for making pages, etc.
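As an illustration of the first kind of plugin, here is a rough sketch of picking the nearest available font. The `FontSpec` fields and the penalty values are assumptions chosen for the example, not a proposal for the real matching rules.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FontSpec:
    """Hypothetical font description used only in this sketch."""
    family: str
    weight: int   # e.g. 400 for regular, 700 for bold
    italic: bool


def distance(requested: FontSpec, candidate: FontSpec) -> int:
    # Assumed ordering: a wrong family is worse than a wrong slant,
    # which is worse than any difference in weight.
    family_penalty = 0 if requested.family == candidate.family else 10_000
    slant_penalty = 0 if requested.italic == candidate.italic else 1_000
    return family_penalty + slant_penalty + abs(requested.weight - candidate.weight)


def select_font(requested: FontSpec, available: Iterable[FontSpec]) -> FontSpec:
    """Return the available font with the smallest distance to the request."""
    return min(available, key=lambda candidate: distance(requested, candidate))


available = [
    FontSpec("Serif", 400, False),
    FontSpec("Serif", 700, False),
    FontSpec("Sans", 600, True),
]
chosen = select_font(FontSpec("Serif", 600, True), available)
print(chosen)
```

A different plugin could use a completely different notion of distance; the caller only needs `select_font`.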

The similarity of pages and lines is interesting. In TeX they are different: a line has some glue on both sides, and pdfTeX also supports kerning with the margins, while a page is produced by an entirely customizable output routine. Also, total-fit is used to make optimal line breaks, while first-fit makes page breaks. This results from the large amount of memory used for pages and the large number of lines being made. But is it possible to, e.g., typeset each line of a paragraph in a different font? This would be simple if the code making a line box from a part of a horizontal list could easily be replaced by code specific to this job (marginal kerning could also be implemented this way). This would be like an output routine, but for a line.
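A small sketch of this "output routine for a line". The classes `Glyph` and `HBox` and the function names are invented for the example. It also assumes the horizontal list has already been cut into chunks, which hides a real difficulty: changing the font changes the widths, so the line breaker would have to know about the line builder.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Glyph:
    char: str
    font: str


@dataclass
class HBox:
    items: List[Glyph]


# A line builder gets one chunk of the horizontal list and the line number.
LineBuilder = Callable[[List[Glyph], int], HBox]


def default_line_builder(items: List[Glyph], line_number: int) -> HBox:
    # Packs the chunk unchanged.
    return HBox(list(items))


def font_per_line_builder(fonts: List[str]) -> LineBuilder:
    """Make a builder that sets line n in fonts[n % len(fonts)]."""
    def build(items: List[Glyph], line_number: int) -> HBox:
        font = fonts[line_number % len(fonts)]
        return HBox([Glyph(glyph.char, font) for glyph in items])
    return build


def make_lines(chunks: List[List[Glyph]],
               build_line: LineBuilder = default_line_builder) -> List[HBox]:
    # The builder is the replaceable part, like an output routine per line.
    return [build_line(chunk, number) for number, chunk in enumerate(chunks)]


chunks = [
    [Glyph("a", "roman"), Glyph("b", "roman")],
    [Glyph("c", "roman")],
]
lines = make_lines(chunks, font_per_line_builder(["roman", "italic"]))
print([glyph.font for line in lines for glyph in line.items])
```

Marginal kerning could be another builder that adds a kern at either end of the chunk before packing it.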

In this model the line breaking algorithm (I don’t call it justification, since that name looks more appropriate for making a box from a line) just makes a linear list of line hboxes from a horizontal list. The page breaking algorithm makes a linear list of page vboxes from a vertical list. Elements like glue and penalties are used in the same way in both. So it would be simpler to use exactly the same code for both. It would also improve page breaks, unless a first-fit plugin is implemented for this.
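To show what sharing the code could mean, here is a sketch of one breaking function used for both lists. It is heavily simplified and is not TeX's algorithm: items have only a fixed size (no stretchable glue, no penalties), and the cost of a break is assumed to be the squared leftover space of every chunk except the last. It does look at all possible breaks at once, as total-fit does, rather than filling each chunk greedily.

```python
from typing import List, Sequence


def break_list(sizes: Sequence[float], limit: float) -> List[List[int]]:
    """Split item indices into consecutive chunks that fit within limit,
    minimizing the sum of squared leftover space (last chunk is free)."""
    if any(size > limit for size in sizes):
        raise ValueError("an item is larger than the limit")
    n = len(sizes)
    best = [float("inf")] * (n + 1)   # best[i]: minimal cost of breaking sizes[i:]
    next_break = [n] * (n + 1)        # next_break[i]: end of the chunk starting at i
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        total = 0.0
        for j in range(i + 1, n + 1):
            total += sizes[j - 1]
            if total > limit:
                break
            slack = 0.0 if j == n else (limit - total) ** 2
            cost = slack + best[j]
            if cost < best[i]:
                best[i] = cost
                next_break[i] = j
    chunks = []
    i = 0
    while i < n:
        chunks.append(list(range(i, next_break[i])))
        i = next_break[i]
    return chunks


# The same function breaks a horizontal list into lines ...
word_widths = [3, 2, 2, 5]
line_chunks = break_list(word_widths, limit=6)
# ... and a vertical list into pages.
line_heights = [1, 1, 1]
page_chunks = break_list(line_heights, limit=10)
print(line_chunks, page_chunks)
```

For the widths above, a first-fit breaker would put the first two items on one line and leave the third alone on a nearly empty line; this version moves the second item down instead. A first-fit plugin would simply be another function with the same signature.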

There are also more complicated things in a typesetting system. Input languages, such as one very similar to TeX’s and possibly XSL, could be nearly separate from the typesetting code. Similarly, code for typesetting mathematical expressions with TeX-like quality (i.e. code for converting math lists to horizontal lists) could be completely separate from the text typesetting code; it would just be used by the interpreter of the input language.

Maybe this will become a useful project, or provide some new ideas for future TeX extensions.
