June 2009 Archives

Should all programming languages be used?

This week there was an interesting discussion about including Mono in Debian. The problem is nicely explained by Richard Stallman in ‘Why free software shouldn’t depend on Mono or C#’. It contains the statement that ‘[i]deally we want to provide free implementations for all languages that programmers have used’. It is clear that providing differs from using, so should we use all programming languages and tools that have free software implementations free of patent problems?

Initially I thought about languages like Fortran and TECO (the editor on which the original Emacs was based), which were important in earlier days but now have very limited use (I have used only one program written partially in Fortran, the R system for statistical computing; scientists still use Fortran for processing large amounts of numerical data). Clearly, older programming languages are replaced by newer, improved ones. K&R C has largely been replaced by ANSI C, and Python 2.5 runs, with one significant exception, software initially designed for older versions. Some programs are known to have been rewritten in ‘new’ programming languages, like Maxima, now in Common Lisp, which replaced Maclisp (I haven’t seen any other program with copyright notices from 1969; this program is more than twice my age). Adding features made vi much more useful than ed except in extreme situations.

But usually the evolution of programming languages is not linear, with only one useful branch. COBOL, the third oldest high-level programming language, is still used. There are many languages which have better alternatives: for example, AWK is usually replaced by Perl (as Eric S. Raymond wrote in ‘The Art of Unix Programming’), and TROFF has been replaced by TeX for most uses except man pages (although TeX-based Texinfo is used for documentation of the GNU system). As stated by ESR, Perl was designed to replace AWK. Similarly, TeX was designed partially to replace TROFF, and not being compatible with it allowed a better design (as explained in an interesting interview with Donald E. Knuth); the same argument made TeX82 better than TeX78.

There are also cases when a programming language is not clearly better than another one. C# differs from Java in many respects, but they both have a generally similar design. I’m not sure if there are important arguments in that case other than the freeness of implementations and the availability of software written in these languages.

These are not the worst cases. COBOL and BASIC are considered harmful to the minds of their users (and to the companies which paid for the Y2K bug). As stated in the Jargon File, a part of the problem with BASIC is that writing short programs in it is easy. Now PHP is considered easy, and it makes most security problems of the Web incredibly easy to introduce (an article on the Plone website explains how Python, Zope and Plone avoid these problems and gives some supporting statistics); at least one of the most important ones is presented in XKCD. I’m not sure how MySQL, the database server commonly used with PHP, differs in security from e.g. PostgreSQL, but my only experience with MySQL (except using it and resetting passwords) was losing data and problems with non-ASCII characters. PostgreSQL is marketed as more reliable, but much software written in PHP does not support it. I do not have these problems because I use only Python-based web applications on my server (they usually use ORMs which support different SQL database systems equally, or the object database of Zope), but MySQL is still required by KDE 4.2 in Debian (in Gentoo, of the KDE programs which I used, only Amarok requires it).

There are cases when supporting better (maybe not yet written) languages is easy, for example using ORMs or portable SQL instead of SQL working only with MySQL, or using dash as /bin/sh instead of Bash when writing shell scripts (it makes them more portable to systems like FreeBSD which use faster POSIX shells instead of Bash). But usually there are no solutions other than a complete rewrite of a program, which is clearly easier with simple programs doing one thing correctly.

Why I use Gentoo

Today I found a website encouraging using GNU/Linux instead of non-free operating systems (not exactly true, but not more incorrect than the popular use of ‘Linux’). It presents Ubuntu, Fedora and gNewSense as appropriate distributions for new users. I used the first two of these distributions some years ago (before the gNewSense project began), so I decided to write how my attitude towards the user-friendliness of them changed.

Initially I used a GNU/Linux distribution made by RedHat (I don’t remember if I called it that, but I probably knew the reasons for this). Then I used one made by Mandriva; it had better support for some hardware. Some time later I used Fedora, since the previous distribution was difficult to update.

Then I discovered the reason for which I now don’t use Fedora or similar distributions: a new release appeared twice per year, and between releases only small changes were made. I did not like frequently reinstalling operating systems.

Later a failing hard disk encouraged me to try a different operating system. I installed FreeBSD (then version 5.4; it was the only operating system for which I paid). It did not require reinstalling from a CD, but the core system still had to be completely rebuilt for any change. FreeBSD also has a source-based package manager, which makes software installation slower but leads to software better optimized for the machine on which it is used. I previously wrote about an advantage of source-based systems, but at that time I didn’t use these features (I did not have a DVD player and I knew much less about fonts).

Later I used Kubuntu. It had better support for hardware which I used. It also had many patches to popular software which probably were not used in FreeBSD. Maybe it was easy to use for a user of Microsoft Windows, but for me it was difficult. I wrote about some problems which apply to it just like to Debian Sid which I now use on my laptop.

I have used Gentoo since 2007, although in that year I also used Ubuntu, and now I use Debian on one of my three computers. Gentoo has several advantages for which I still use it:

  • it is a continuous distribution – a release is only for installing (and for news sites), then all of them may be updated to the newest system
  • USE flags allow simple selection of needed features of installed software
  • it is so fast that Firefox is usable
  • newer than officially released software may be used easily, necessary for kernel and graphics drivers on some hardware
  • it is simple to improve, writing ebuilds is easy after reading some
  • it had no working GUI installer when I installed it for the first time, so no data was lost and I knew exactly what was done
  • it may be easily administered remotely (clear structure of /etc, no GUI configuration tools are necessary)
  • only necessary software is installed by default, nearly all services enabled are necessary

Gentoo is useful for learning how GNU/Linux works. Of course, this is the main reason against using it, and the time needed to compile KDE or GCC is the second one.

Ideas for an implementation of a typesetting system

I’ve written several posts about TeX as a program and things which could be simpler with a typesetting system based on a different design. The basic ideas are that it will be incompatible with TeX and that it will be an easily extendable package written in the Python programming language, with the full power of the language available for typesetting. Then I wrote about font selection, a problem at first glance unrelated to the previous ones.

I haven’t written any useful code for a typesetting system, but if it is written it will be based on these ideas. Maybe some of them might be useful without this program, so I’ve written them here.

The program will be called Tim, since it is an easy-to-pronounce, short name which can be positively associated with the name and could mean ‘TeX Inspired Modules’. The name clearly shows that art is not important in this project; quality and efficiency may suffer without any useful reason. It is also easier to write about a named program than an unnamed one.

The whole code of Tim will be written in Python and published under the GNU General Public License version 3 or later. Therefore every program using this package will use the same license. This encourages sharing and making documents which are not derivative works of Tim, i.e. writing them in a format which can be treated as data, not a program.

The need to separate code and data leads to another idea. Everything except the core model of horizontal and vertical lists (based on the one used in TeX) should be trivial to replace. Several algorithms are needed for some tasks: for example, different hyphenation algorithms are used for different languages, different font formats are used, etc. So every useful algorithm will be implemented in a plugin, i.e. a function or object used by code independent of this implementation. A special module will determine which plugins are used for which tasks; it will probably store this data in files (one per system, one per user and one per document).

There will be different plugins for selection of fonts (determining which font is nearest to the requested one), for getting metric data from different font formats, for interpreting input text (e.g. to replace spaces by glue and other characters by glyphs, possibly with some transliteration schemes), breaking paragraphs into lines, hyphenation, making lines, making pages, etc.
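The plugin idea could be sketched roughly as below. This is only an illustration under assumptions: the names `register`, `select`, the task name "hyphenation" and both toy plugins are hypothetical, and plain dictionaries stand in for the per-system, per-user and per-document configuration files.

```python
from typing import Callable, Dict, List

# Hypothetical registry: task name -> {plugin name -> callable}.
_PLUGINS: Dict[str, Dict[str, Callable]] = {}


def register(task: str, name: str):
    """Decorator that records a function as a plugin for a task."""
    def decorator(func: Callable) -> Callable:
        _PLUGINS.setdefault(task, {})[name] = func
        return func
    return decorator


def select(task: str, *configs: Dict[str, str]) -> Callable:
    """Return the plugin chosen for a task.

    Configurations are given from least to most specific (system, user,
    document); a later one overrides an earlier one.
    """
    chosen = None
    for config in configs:
        chosen = config.get(task, chosen)
    if chosen is None:
        raise LookupError(f"no plugin configured for task {task!r}")
    return _PLUGINS[task][chosen]


@register("hyphenation", "none")
def no_hyphenation(word: str) -> List[str]:
    # Never breaks a word.
    return [word]


@register("hyphenation", "every-three")
def naive_hyphenation(word: str) -> List[str]:
    # Toy rule for illustration only: split after every third letter.
    return [word[i:i + 3] for i in range(0, len(word), 3)]


system_config = {"hyphenation": "none"}
user_config: Dict[str, str] = {}
document_config = {"hyphenation": "every-three"}

hyphenate = select("hyphenation", system_config, user_config, document_config)
print(hyphenate("typesetting"))
```

The calling code never names a particular algorithm; it only asks for the plugin configured for a task, which is what makes each algorithm trivial to replace.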

The similarity of pages and lines is an interesting one. In TeX they are different – a line has some glue on both sides, and pdfTeX also supports kerning with the margins, while a page is produced by an entirely customizable output routine. Also, total-fit is used to make optimal line breaks while first-fit makes page breaks. This results from the large amount of memory used for pages and the large number of lines being made. But is it possible to e.g. typeset each line of a paragraph in a different font? This would be simple if the code making a line box from a part of a horizontal list could be easily replaced by code specific to this job (in this way marginal kerning could also be implemented). This would be like an output routine, but for a line.

In this model the line breaking algorithm (I don’t call it justification, since this name looks more appropriate for making a box from a line) just makes a linear list of line hboxes from a horizontal list. The page breaking algorithm makes a linear list of page vboxes from a vertical list. Elements like glue and penalties are used in both in the same way. So it would be simpler to use exactly the same code for both things. It would also improve page breaks, unless a first-fit plugin is implemented for this.
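A minimal sketch of sharing one breaking routine between lines and pages, under strong simplifying assumptions: items are reduced to bare sizes (no glue stretching, no penalties), and the simpler first-fit strategy is used rather than total-fit. The function name `first_fit_break` is hypothetical.

```python
from typing import List, Sequence


def first_fit_break(sizes: Sequence[float], limit: float) -> List[List[float]]:
    """Greedily pack item sizes into boxes no larger than `limit`.

    The same function serves for lines (sizes are widths, limit is the
    line width) and for pages (sizes are heights, limit is the page height).
    An item larger than the limit gets a box of its own.
    """
    boxes: List[List[float]] = []
    current: List[float] = []
    used = 0.0
    for size in sizes:
        # Start a new box when the next item would overflow the current one.
        if current and used + size > limit:
            boxes.append(current)
            current, used = [], 0.0
        current.append(size)
        used += size
    if current:
        boxes.append(current)
    return boxes


# Word widths broken into lines of width 10 ...
lines = first_fit_break([4, 5, 3, 6, 2], limit=10)
# ... and the resulting lines, each 12 units high, broken into pages of height 25.
pages = first_fit_break([12] * len(lines), limit=25)
print(lines)
print(pages)
```

A total-fit plugin would have the same interface, so replacing the strategy for lines, pages or both would not affect the rest of the system.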

There are also more complicated things in a typesetting system. Input languages, such as one very similar to that of TeX and possibly XSL, could be nearly separate from the typesetting code. Similarly, code for typesetting mathematical expressions with TeX-like quality (i.e. code for conversion of math lists to horizontal lists) could be completely separate from text typesetting code; it would just be used by the interpreter of the input language.

Maybe this will be a useful project or some new ideas for future TeX extensions.

After a break of several days I still plan to learn more about digital typography by writing a typesetting system inspired by TeX. Previously I wrote why I believe that this system should not be compatible with TeX and how using a general purpose programming language for typesetting will help. This post is about a different issue – how fonts would be represented in this system. For simplicity, I will not write about problems specific to math typesetting; text has enough problems for a post.

As stated by Vulis the plain TeX model of fonts is inadequate for e.g. academic publications. There each font is a set of 256 characters which for TeX are just boxes of specific size, combined with different characters by ligatures and kerning. Although a specific font may be scaled, this model does not provide any support for using different styles and sizes of fonts. Therefore macros used for books written by Donald Knuth (e.g. in Appendix E of The TeXbook), GNU Texinfo (texinfo.tex) and LaTeX 2.09 use static tables of different font definitions in several styles for some sizes. This approach makes using different font families clearly difficult.

Therefore LaTeX2e uses a different model, called the New Font Selection Scheme. There a font has the following attributes (from LaTeX2e font selection, the file fntguide.pdf in a TeX distribution):

  • encoding: the mapping of character commands to 8-bit character numbers in TeX fonts; font encodings define also ligatures used
  • family: this is commonly known as a typeface or font
  • series: e.g. medium or bold
  • shape: e.g. italic, roman, slanted, caps and small caps
  • size: the size of one em

This is clearly appropriate for the original Computer Modern fonts (the default fonts in LaTeX, the only ones known to be available in every TeX distribution since the 1980s), but now it has at least the following problems:

  • font encodings are a useless waste of time and hindrance for multilingual typesetting; I believe that Unicode will be enough for everybody (imagine that a list of all its characters would not fit in a typical book)
  • slanted (or italic) small capitals cannot be easily represented in this scheme; the package slantsc allows their use as a different shape, exactly what the scheme was designed to avoid
  • usually font size is artificially limited to avoid scaling them (this was a problem before scalable fonts or automatic generation of bitmap fonts by dvi drivers)

This is different with OpenType as used with e.g. XeTeX. There a font family has equivalents of series and only roman and italic shapes (making both italic and slanted roman fonts is too difficult without METAFONT). Things like small capitals or strange ligatures are enabled by features with the same font file. Clearly, this model does not have the problems listed above.

The CSS3 fonts module working draft describes another set of font attributes. It has a ‘correct’ style attribute, one for width (one font family for Antykwa Toruńska and Antykwa Toruńska Condensed would be nice), a separate attribute for small caps, and much nicer support for relative font sizing than LaTeX.

But this is not everything that can be done with a font. For TeX only the metrics are important, but still other things cannot be easily expressed there. For example, coloured or underlined hyphenated text is very difficult to obtain in TeX. Colour clearly does not affect boxes (I’m not sure how underlining affects the depth of a box), so it could be determined after breaking the paragraph into lines. Currently systems like XeTeX have specific support for such things, but in my opinion a generic method for all changes to the fonts after a page is produced is possible. So in my system I would add one new font attribute – a Python function processing the text when a page is shipped to the output file. It would add things like colour, outlines or underlining to the text (letterspacing, although solved similarly to underlining in the soul package, would need a completely different solution, but it will be trivial in a system with complete access to hyphenation and boxes). This would be similar to whatsits in TeX boxes, used for writing to files when a box is shipped and for putting special instructions for dvi drivers (e.g. for coloured text or for boxes).
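The extra font attribute might look roughly like this. It is a sketch under assumptions: the classes `Font` and `Glyph`, the attribute name `on_shipout` and the string form of the drawing instructions are all hypothetical, and the encoding attribute is left out because Unicode is assumed throughout.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass
class Glyph:
    """Hypothetical shipped glyph: a character plus drawing instructions."""
    char: str
    instructions: List[str]


def no_effect(glyphs: List[Glyph]) -> List[Glyph]:
    # Default hook: ship the glyphs unchanged.
    return glyphs


@dataclass(frozen=True)
class Font:
    family: str
    series: str = "medium"
    shape: str = "roman"
    size: float = 10.0
    # Called when a page is shipped out, i.e. after all line and page
    # breaking, so it cannot influence the metrics used for breaking.
    on_shipout: Callable[[List[Glyph]], List[Glyph]] = no_effect


def coloured(colour: str) -> Callable[[List[Glyph]], List[Glyph]]:
    """Build a ship-out hook which adds a colour instruction to each glyph."""
    def hook(glyphs: List[Glyph]) -> List[Glyph]:
        return [Glyph(g.char, g.instructions + [f"colour={colour}"])
                for g in glyphs]
    return hook


body = Font("Pagella")
red = replace(body, on_shipout=coloured("red"))

# The same glyphs, whether or not a hyphenation point fell between them,
# get their colour only at ship-out time.
shipped = red.on_shipout([Glyph(c, []) for c in "hyphen"])
print([g.instructions for g in shipped])
```

Because the two fonts differ only in the hook, hyphenation and line breaking see identical metrics for `body` and `red`.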

This also leads to another interesting problem – how should ‘interdisciplinary’ be hyphenated? And what should be done when the font change has no obvious correlation with parts of words? In my opinion a font should be treated as a property of a character which is ignored for hyphenation (like ligatures and kerning).

Fast scrolling in Firefox on X11 with Radeon

For some days I have used many more tabs in IceWeasel than before, so it uses about a gigabyte of RAM and is so slow that I’m not sure if its tab list has data here or on distant servers. When scrolling a page, keeping an arrow key pressed led to very slow scrolling of much larger areas than expected. Fortunately, it was not due to the large amount of information needed, but due to a misconfigured graphics driver for the X server.

There are two useful, popular and free drivers for AMD RS690 chipset which my laptop has. Debian by default uses xf86-video-ati which stopped working a month ago (maybe now it works, but I have no reasons to change the driver again), so I replaced it by xf86-video-radeonhd which does exactly the same but with different code and configuration (if something doesn’t work, it is independent of these drivers). Then the file /etc/X11/xorg.conf consisted of this:

Section "Device"
        Identifier      "Configured Video Device"
        Driver          "radeonhd"
EndSection

(Useless comments skipped. Everything else is automatically configured by modern versions of the X.Org X11 server.) This worked with nice 3D acceleration (automatically enabled in newer releases of the driver), but I hadn’t noticed that 2D rendering was slow.

There are many 2D acceleration architectures for the X server – XAA and EXA are the important ones. XAA is slow for modern hardware and software, while EXA is newer and less tested (ten months ago I couldn’t use both it and 3D acceleration without having to reboot the machine just after starting any OpenGL-based program). Unlike Radeon (xf86-video-ati), the RadeonHD driver supports both XAA and EXA for this hardware and uses XAA by default. Therefore (after reading about it in the man page of this driver) I had to add this additional line to the X server configuration file (before EndSection):

        Option          "AccelMethod" "exa"

After restarting the X server even IceWeasel is interactive and can be used to read long Web pages.

(3D acceleration with Radeon hardware in Debian requires also the firmware-linux package from the non-free repository; although the firmware has MIT/X11 license it is not allowed in main, probably since there only fonts do not have to include source.)

In the paper ‘Should TeX be extended’ Michael Vulis described how extensions to TeX would make four problems simpler – text rotation, graphics inclusion, index preparation and font selection. Eighteen years later, all of these problems are solved in LaTeX, but TeX extensions would make the solutions much simpler. So how would these problems affect writing documents in the general purpose programming language-based system which I proposed yesterday?

The first problem, typesetting rotated text, is solved by drivers allowing rotation of characters and ‘rotation’ of boxes in TeX. This requires computing the sine and cosine of the rotation angle. Nearly all modern programming languages support operations on floating-point numbers and have builtin trigonometric functions – simpler to use, faster and more precise than those computed by the LaTeX trig package. So this problem is solved just by writing texts in Python (or Lisp, or any other useful general purpose programming language; I will use Python as an example) instead of TeX.
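As a small illustration, the size of the box enclosing a rotated rectangle needs only the standard math module. This sketch assumes a plain rectangle rotated about its centre and ignores TeX’s distinction between height and depth; the function name `rotated_box` is hypothetical.

```python
import math
from typing import Tuple


def rotated_box(width: float, height: float, angle_degrees: float) -> Tuple[float, float]:
    """Return (width, height) of the bounding box of a rotated rectangle."""
    angle = math.radians(angle_degrees)
    c, s = abs(math.cos(angle)), abs(math.sin(angle))
    return (width * c + height * s, width * s + height * c)


w, h = rotated_box(30.0, 10.0, 90)
print(round(w, 6), round(h, 6))
```

Rotating by 90 degrees swaps the dimensions, up to floating-point rounding.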

The next problem is graphics inclusion. Here a picture is typeset as a box of appropriate dimensions with a special command for the driver to put the graphics there. Determining dimensions of such boxen requires parsing the graphics files in TeX; this is easy only for formats like Encapsulated PostScript or with additional header files containing a line specifying the size of the picture. Clearly, this would be trivial in a language with libraries like PIL.

The third problem is index processing. In LaTeX, a document contains commands to put certain index terms with the page number of the place where this command occurs. Writing a file with terms and page numbers is easy in TeX, but processing it to merge page numbers of the same entry and to sort the entries alphabetically is impossible without using external programs. This requires just some string processing, an associative array and a sorting function. I’m too young to know a programming language other than TeX or METAFONT that does not have all of these elements builtin or in its standard library. This would allow making an index in parallel to typesetting the publication if it refers only to the previous parts of the work (otherwise index entries from the previous attempt would be used, like now).
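The merging and sorting step can be sketched with an associative array and a sorting function from the standard library. The function name `build_index` and the sample entries are made up for illustration, and real index processors handle much more (subentries, page ranges, language-specific collation).

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def build_index(entries: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Merge page numbers of equal terms and sort the terms alphabetically."""
    pages_by_term = defaultdict(set)
    for term, page in entries:
        pages_by_term[term].add(page)  # a set drops repeated page numbers
    return [(term, sorted(pages_by_term[term]))
            for term in sorted(pages_by_term, key=str.casefold)]


# Entries in the order in which they were met during typesetting.
raw = [("glue", 12), ("box", 3), ("glue", 7), ("penalty", 40), ("glue", 12)]
for term, pages in build_index(raw):
    print(term, ", ".join(map(str, pages)))
```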

The last problem is font selection, i.e. changing each font parameter like size or boldness separately. This clearly requires a different font model than that of TeX, where font families did not exist except in typesetting super- and subscripts in mathematics. Here a general purpose programming language would have only one advantage – less time would be spent on the above problems and more on font representation and similar less trivial problems.

On improvements in TeX

The limitations of TeX for typesetting text are well-known. Typesetting coloured text, letterspacing, hypertext, and hanging punctuation is much easier in pdfTeX. Omega, XeTeX and LuaTeX solve the 256 glyphs restriction. Every one of these extensions supports right-to-left typesetting. OpenType fonts can be used natively in XeTeX and probably LuaTeX.

But is there any visible development related to typesetting math? Of course, support for right-to-left typesetting in TeX extensions allows typesetting RTL texts with LTR mathematics, but it does not affect the inside of the mathematical formula.

Except for e-TeX there was no well-known (to a user of LaTeX) development of the programming-related features of TeX. Now other languages, including Lua, are preferred to the TeX macro-based language. In my opinion, this results from lack of competition (can any non-TeX-based program typeset \primes30 as a sequence of the first thirty prime numbers?) and low visibility of the need to program texts. Haskell or Python with a typesetting library could be better than TeX, if it is still possible to design pragmatically error-free programs.
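For comparison, the computing part of such a macro is short in a general purpose language. The sketch below only produces the text; how it would be passed to a typesetting library is left open, and the function name `first_primes` is hypothetical.

```python
from typing import List


def first_primes(count: int) -> List[int]:
    """Return the first `count` prime numbers using trial division."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < count:
        # Only primes up to the square root of the candidate need testing.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(", ".join(map(str, first_primes(30))))
```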

This would clearly be incompatible with TeX. But addition of a new primitive command also makes a program incompatible with TeX. Try compiling this input with both TeX and e-TeX:

\csname TeXXeTstate\endcsname1

Hello \csname beginR\endcsname world\csname endR\endcsname.
\bye

The results will be different, although it is not so strange to use \csname with undefined commands. (I used slightly similar code in macros for generating parametric text when the parameters could be unspecified due to a bug.)

Therefore I believe that writing a program which could typeset mathematical books with the quality of TeX, but with different design (a library in Python with separate modules for different things) would be useful. Obviously, all existing documents could be typeset with existing tools, but the new ones could be written more simply for this program.

Math typesetting contains clearly redundant commands, although they cannot be written as macros. The command \abovewithdelims can represent any generalized fraction, but some specific commands like \over are builtins. They also lead to a larger problem – generalized fractions make the style of earlier material dependent on later material. Therefore TeX has a separate glue type for math (mu depends on the current style) and the command \mathchoice does four times more work than it could without it. Most mathematics, however, is written with the LaTeX \frac command, which does not have these problems. So primitive generalized fraction commands do not have to be used to make TeX’s work more complicated.

Texts written with plain TeX use a very small number of macros. So rewriting them in Python would give simpler code which is easier to improve. Maybe it would be a nice exercise in learning TeX to write a typesetter based on the above ideas, just like kernels are written for their authors to learn Unix.

Testing a LaTeX package for logical quote formatting

Several months ago I wrote a LaTeX package for logical formatting of quotations, i.e. putting quote marks in appropriate style when the user decides only about the semantics of the document. The package is free software available in my Mercurial repository (it is licensed under the GNU General Public License version 3 or later, with an exception for documents using it).

Then I maintained several large documents using the package (and some others), so I could modify it and detect most errors in the next compilation of these documents. Now I develop some packages which will be rarely used and will have much larger changes, so some automatic tests are necessary.

Now I have used a trivial method to check the correctness of this package. I wrote a one-page example (i.e. useless) document using the package in all documented ways, although only one set of package options was used. Then I compiled it using a Bourne shell script which just calls LaTeX for all such examples, and corrected all visible errors in the output (I also noticed that support for multi-paragraph multi-level quotations is very unintuitive to use).

Now the correct output lies in the Mercurial repository. I thought that Mercurial could detect changes in these files, but it detects changes in the timestamps recorded there. So I added to the shell script (runtests.sh in the main directory of the package) TeX code to set the time to a constant value (January 1, 1970, 00:00). Now it works.

Now the package needs only tests for use with other options before large changes will be easier in it. The test output still needs changes when e.g. a new version of TeX is used or file names listed in the log change, but this will not require much work.

Will we read essays written by computers?

After using the ‘random’ comic link several times on XKCD, I found one about the Turing test. When I was an IB DP student some people thought that some of my essays were written by computer programs (I have heard similar opinions on nearly every text which I have translated from English to Polish). So if an essay written by a human may not pass the Turing test, may a text written by a computer be considered useful for us?

This is obviously true for most texts, if all textual program output is considered a text. So a stricter definition of text is needed to make this question useful. A standard essay for an English writing exam might be appropriate, since they clearly express several useful criteria, like having interestingly complicated grammar use and discombobulating message with clearly visible personal involvement.

It is clearly difficult to describe an essay in an algorithm. Although clear description of ideas is one of the largest problems in essay-writing, a program converting a trivial description of reasoning into an essay would be useful. Essays involve many examples which should not be the same in every student’s work, so a large database of facts could be used to add examples for some theses.

So with a given message, the essay would be written with many encyclopedic examples and as complicated a grammatical structure as foreseen by the authors of the program. From a grammatical point of view, it is nearly impossible to map an English sentence to an abstract thought representation, but the reverse process, which would be used in the program, would be simple. A problem would occur when the generated sentence has other meanings unknown to the computer, but this is also a problem for human students.

It could be interesting how a program would represent all facts which could be used in an essay. Humans use large collections of useful facts written in English or an equivalent language (formally, languages are not isomorphic due to the Sapir-Whorf hypothesis, but all popular languages have the same drawback for this use). Therefore, to write it is necessary to read, which is too difficult for computers.

Maybe with a formal notation for facts useful in essays and a formal description of an essay, a computer would be able to write a highly marked essay. But I do not believe that for a human it would be simpler to write such a program and its data than to write a good essay. (I hope that a computer will quote a part of this blog entry in an essay and explain an opposite point of view.)

Installing TeX Gyre fonts in Debian

Today I tried to compile another flyer typeset with LaTeX, but it resulted in the following error:

! pdfTeX error (font expansion): auto expansion is only possible with scalable
fonts.
<to be read again>
                   \endgroup \set@typeset@protect

The reason for this was lack of scalable TeX Gyre Pagella fonts in my TeX installation (TeXLive 2007 from Debian Sid). When I disabled font expansion (by removing \usepackage[expansion]{microtype} from the document preamble), it tried to make bitmap fonts and failed. So I recalled that the TeX Gyre fonts (much better derivatives of the URW’s 35 standard PostScript fonts, with correct diacritics for many languages) were not available in TeXLive 2007. (With the previous flyer I hadn’t had such problems, since the Concrete Roman fonts are much older and CM-Super scalable fonts are included in Debian.)

Therefore I downloaded them from the GUST site and unpacked the TDS archive into the $HOME/texmf directory. Then I ran the mktexlsr $HOME/texmf command so that programs in TeXLive know that new files are available there.

This would be enough for LaTeX packages or METAFONT fonts, but for scalable ones more work is needed. This is because there may be many different scalable fonts for one metric file, and sometimes bitmap ones are also available and preferred. The files also have many different names, specified in special map files.

Since I do not have any experience with map files in Debian, I used a search engine to find a manual with complete instructions for installing fonts in Debian’s TeXLive. Using this and the next section of this manual I ran the following commands to enable TeX Gyre Pagella scalable fonts:

updmap --enable Map qpl.map
update-updmap
mktexlsr
updmap

Then I successfully compiled the flyer with these Palatino-like fonts with correct support for my native language.

Integer number types in C

In mathematics there are no numbers, but only some specific sets with the word ‘numbers’ in their names. Similarly, in the C programming language and many others there are different numbers for use in different situations. The aim of this post is to compare them and show when they might be useful.

We assume that integers are a countable set of all natural numbers (zero and every successor of a natural number) with positive or negative sign and unsigned zero. This clearly cannot be represented exactly in a digital way in a finite amount of silicon, so ordinary computers don’t use these numbers.

The simplest value in a classical computer is a bit representing one of two values. For the most typical use of a single bit the bool type from the C99 standard is used (the keyword is _Bool; the header stdbool.h provides the names bool, true and false), and C++ also has this type. Although it stores a single bit of information, it is usually aligned to at least a byte, so sometimes bit fields are used instead, leading to more complicated code using less memory.

Therefore integers modulo a large number are used instead. For large values they are very unintuitive, as shown in an XKCD comic.

A non-negative integer is simply represented in binary as several bits. For this the types unsigned char (typically one octet), unsigned short (usually 16 bits), unsigned int (usually 32 bits), and unsigned long (64 bits on most 64-bit Unix-like systems, 32 bits on 32-bit architectures and on 64-bit Windows) are used.

The methods for representing negative numbers are interesting. Some lead to a negative zero, others make more negative numbers than positive numbers (usually I would write ‘ones’ instead of the second ‘numbers’ here, but it could mean that in these systems there are at least two negative numbers). C represents these numbers as the above without unsigned or with signed instead. Many other programming languages do not have separate types for non-negative numbers.
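Python integers are unbounded, so they can be used to model the fixed-width types described above. The sketch below assumes two’s complement, the representation used by practically all current hardware; the helper names `as_unsigned` and `as_signed` are hypothetical. It models only the bit patterns: in C itself, overflow of signed types is undefined behaviour, while unsigned arithmetic is defined to wrap.

```python
def as_unsigned(value: int, bits: int) -> int:
    """Reduce an integer modulo 2**bits, like a C unsigned type of that width."""
    return value % (1 << bits)


def as_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of an integer as a two's complement number."""
    value %= 1 << bits
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


print(as_unsigned(255 + 1, 8))   # one past the largest 8-bit value wraps to zero
print(as_signed(127 + 1, 8))     # one past the largest signed value is the smallest
print(as_signed(-(-128), 8))     # the smallest value has no positive counterpart
```

The last line shows the asymmetry mentioned above: with 8 bits there are 128 negative numbers but only 127 positive ones, so negating -128 gives -128 again.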

An interesting algorithm showing the difference between integer representations is presented by William Gosper in HAKMEM, item 154 (page 74).

Text editors which I use

Although for many people a text editor is a scary beast, text editors are among the most useful software for a hacker. It is interesting how different they can be in order to fulfil the needs of their users (which is why five text editors were used as a case study in Eric Raymond’s The Art of Unix Programming). In the last two years I have used the following text editors, ordered roughly by the time spent with them:

  1. GNU Emacs – described in its manual as ‘the extensible, customizable, self-documenting real-time display editor’. Emacsen are the only editors programmed in their own Lisp dialect, which allows very helpful support for users writing in many different programming languages, as well as a mail client, a Web browser, several games and a psychoanalyst. A nice feature of Emacs is support for doing everything with the keyboard only, using many modifier keys (that’s why its name is sometimes expanded as ‘escape, meta, alt, control, shift’). I also haven’t heard of any other text editor having its own religion. Emacs is very user-friendly, but its friends must spend much time learning it.
  2. GNU Nano – a very simple-to-use text editor. It lists its most useful commands at the bottom of the screen, making learning it as simple as typing nano foo to edit the file foo. I use it for every administrative task on my other computers and for commit messages in version control systems (although Emacs has specific support for using a VCS).
  3. vi (mostly Vim and Nvi) – the only well-known text editor with modes. Lately I have used it only when the above programs were unavailable. Before Emacs 22 I used Vim more, due to its Unicode support.
  4. ed – the line editor with which Unix was written. I used it only once, to remove a broken /usr filesystem from /etc/fstab on an older computer running FreeBSD. No other text editor was available, since they resided in the filesystem eaten by bad blocks on the disk. From this experience I know how helpful man pages may be.

Before these four I used GUI-based text editors (although both Emacs and Vim have nice GTK+ interfaces). They weren’t so interesting and required much more typing work. Back then I used a mouse, the pain-bringing device required by such editors.

Making a URL easier

We use URLs every day. Although most are not usually seen by humans, some are remembered and typed into a browser’s address bar. So I believe that webmasters should make them easy to remember (if they want people to visit their sites). But the situation is not so simple, partly due to badly configured software and partly due to humans accustomed to unfriendly URLs.

Formally, a URL consists of many parts. But in most cases a URL is not more complicated than http://example.com/foo/bar?baz=qux which has a scheme (http), host name (example.com), path (/foo/bar) and query string (baz=qux). So what’s usually wrong with these?

Web browsers usually use http if the user does not specify a scheme. So many people will not notice that for secure connections the scheme https is used. Since they type the URL for HTTP, secure websites redirect from it to HTTPS. Clearly, since HTTP is insecure (another server may send other data), redirecting from it to HTTPS is also insecure. With newer browsers hiding the protocol used, this makes the Web less secure.

The host name may also lead to security problems when users mistype domain names, but these problems are less interesting. On the WWW many host names are prefixed with www. This prefix clearly does not contribute any information – compare http://www.plone.org/ with http://plone.org/. The second one is shorter and contains less technical information, so it is easier to type and friendlier to people. This may be the reason why the longer URL is a redirect to the website of the Plone content management system.

The path may be easy-to-read text in a hierarchy. But dynamic websites made with e.g. CGI or PHP by default include much technical data in the path and use the query string to specify which page is shown. Fortunately, search engines rank websites with readable URLs higher, so this may become a lesser problem in the future.

My advice here is simple – write whole URLs, avoid a leading www and use words instead of implementation-specific details or query strings in the rest.

Getting more from less

One of the most commonly used programs in a Unix system is the pager less. It is inspired by more and has much greater functionality. Since it is so simple to use, reading its documentation is not necessary, and so many of its advanced features go unused. The aim of this post is to describe what I found reading its man page.

I use less when I do not need to edit a file or when I need to view the output of another program. It does not load the whole file, so it can be useful even with a dump of a whole hard disk (I used it in times of 20 GB disks). Typical text editors have problems with files of several hundred megabytes.

The best-known feature of less is backward movement in a file. It is easy to see that less supports arrow keys for movement in a file, but it also supports many vi-like key sequences. Of them I usually use only ng to move to the nth line (Xdvi uses the same key sequence to move to the nth page), / for searching with regular expressions and q to quit less.

I rarely use less to view more than one file. The command :e opens a new file (giving many files on the command line is also useful, especially with shell wildcards). The files may be switched with :n and :p, meaning next and previous, respectively; and :x shows the first file. These commands also accept a number before the : to go to the nth file instead of the first one in the given direction. To close a file, use :d.

Another useful feature of less is support for input preprocessors. For example, in Gentoo GNU/Linux less by default uncompresses files in many formats, converts PDFs to text, views files in archives, etc. In Debian this only needs to be enabled in the .bashrc file (typically by evaluating the output of the lesspipe command there).

School websites, useful content as search engine optimization

Yesterday evening I was reading the Official Google Webmaster Central Blog and some other blogs which recommended designing websites for users, with a better position in search engines as a consequence. Then I had a dream that I should say something about it, so I decided to write about a particular example of this – websites of schools.

Websites are for users. This statement is obviously true, but its consequences are usually ignored. So we should know who could use the website of a school. In my opinion these would be teachers, students, their parents and people who are considering becoming part of one of these groups.

As a student of a Polish secondary school with IB DP and its website administrator, I was obviously interested in websites of such schools. Therefore the following arguments will apply mostly to such cases. For younger students some decisions might be different than for this type of school.

So what would each of these groups want from the website? Some ideas:

teachers – some space to share learning materials with the students
students – materials which would replace the use of books (some students would help the teachers to write them)
parents – photos of their children?
future students and parents – what will be taught, what it will give them, how much it will cost, what the social part of it will be, etc.

I have no idea what a future teacher might look for.

Many months ago when I compared websites of all Polish schools with IB DP, none of them provided all of these features. Only the one which I administered had learning materials. Information for future students is limited everywhere.

In Poland there is an additional problem – we use the Polish language, while for IB DP English is used (French and Spanish are not used for it in Poland). Some websites were in English, some in Polish, some had two language versions. Usually parents know only Polish; students and teachers know both languages, but using Polish for IB DP-related tasks may be difficult for them. Let’s assume that everything is in one language; this problem affects only real life and not this post.

What’s more interesting, we cannot measure how successful the websites are. It is impossible to determine how changes in the website would affect the number of new IB DP students in a given school. Using the above list of materials useful for particular groups of people, most of these websites may be considered worthless. So let’s assume that brochureware websites work if they can be found. Then their position in search engine results determines their success.

Search engines index content. They prefer content to which useful sites link. So publicly available learning materials are good for this, since others might find them useful and link to them.

It has another advantage – people completely unrelated to IB DP might use it. The most popular search queries for my school’s website were for short stories usually read in gimnazja, the schools immediately preceding secondary schools. We had these short stories also at IB DP Language A1. So might a student want to attend a school which published material they used before? Maybe this effect is known for MIT, but not for IB DP schools in Poland.

Raster graphics files for the Web

On the Web mainly two raster graphics formats are used – PNG and JPEG. They both have specific uses and are nearly useless in others.

PNG is used for computer generated graphics, i.e. everything that is not a photo. Most of these are generated from vector graphics formats, so SVG should be better for them, but most web browsers have better support for PNG.

Many years ago the GIF format was used instead of PNG. At that time there were patents on an algorithm used for GIF compression, which led to the development of PNG. Now non-animated GIF is clearly obsolete, since PNG supports higher quality graphics (e.g. by having 24-bit colour or gamma correction) and produces smaller files due to better compression.

JPEG files are used only for photos. This is because they use a compression algorithm which decreases quality in a way visible in most graphics except photos. It also does not provide such features as alpha transparency which are useless for photos.

The compression algorithms for both PNG and JPEG can produce many different compressed files which decompress to the same graphics (most compressed file formats have this property, which has very uncommon uses). In the case of PNG, there are several parameters selected by the compressing software. Choosing optimal values of these parameters is difficult without many slow trials, so there are programs like OptiPNG which optimize PNGs to decrease their size without any change in the decompressed image. For JPEGs there are similar programs, e.g. jpegoptim. Both of these programs greatly decreased the sizes of some files which I recently used, generated by Inkscape or the GIMP.

Some improvements in an old flyer typeset in LaTeX

In September and October 2007 I designed and typeset a flyer for my school using LaTeX. Today I corrected some small typographic problems of this flyer and noticed how much better it could be with the improvements in TeX distributions and in my knowledge of LaTeX.

The flyer was mostly designed interactively at school, with every change consulted with the coordinators of the international programmes described in the flyer. That is, I used an SSH client (most probably OpenSSH in Cygwin with X11; the school uses a difficult operating system) to connect to my FreeBSD server at home with Emacs 21 and LaTeX from teTeX 3. After each change to the source the resulting PDF file was downloaded using HTTP to a local PDF viewer. All of this was done using a 512 kbps connection; we waited several minutes for each transfer of the PDF file of about 12 megabytes.

Things have changed in the last two years. Now I use a laptop with newer (i.e. less tested) software and the viewed PDF may be updated in several seconds. Emacs 23 pre-releases have support for nicely antialiased fonts and Unicode, making it much more comfortable to use than xterm. But the most important for the flyer are the changes in TeX Live and in how I use it.

The flyer has three typographic improvements – correct hyphenation of ‘diploma’ (by adding \hyphenation{dip-lo-ma} to the preamble), much better line breaking thanks to font expansion (scalable Concrete Roman fonts in T1 encoding are now available; without them it would be very difficult) and no indentation of the first paragraph of each section (each such text begins with a macro to whose definition I added \noindent).

The contact data also changed. Maybe such a change is a good occasion to improve typeset texts using newly gained experience?

Why I use Computer Modern and its derivatives

Many free Latin fonts are available in TeX distributions, but for most of my texts I use Latin Modern, a Computer Modern derivative with better support for most European languages. The aim of this post is to describe my main reasons for this choice.

One of the advantages of Computer Modern results clearly from its origin. TeX, Metafont and Computer Modern were written for one task – typesetting The Art of Computer Programming by Donald E. Knuth, although they are useful for many other texts. Originally TAOCP was typeset with Monotype Modern metal fonts. Back then modern typefaces were used for nearly everything, but now they are used mostly for academic texts. So using a typeface in the modern style makes a text look more academic.

Another advantage is support for high-quality typesetting of most mathematical notations. This is a clear consequence of the original task: analysis of algorithms requires complex (in the non-mathematical meaning of this word) mathematics, while non-TeX software had problems with mathematical typography (thirty years later it still does).

Computer Modern is the best-known meta-font. It has a parametric description which is used with particular values of 62 parameters to define nearly a hundred fonts. There are no other fonts for which adding a completely different style (e.g. the slab serif Computer Concrete Roman) is easy. Usually the amount of work is directly proportional to the number of fonts. With meta-fonts it is simple and allows things which were impossible before (nicely illustrated in a paper written by Donald Knuth, ‘The Concept of a Meta-font’, available in his book Digital Typography). Except for Lucida, Palatino and Bitstream Vera, I haven’t seen a font family containing nicely matched serif and sans-serif fonts which is not a meta-font.

A nice example of the advantages of meta-fonts is the support for optical scaling in Computer Modern. There are different designs for different font sizes (in Computer Modern Roman: 5, 6, 7, 8, 9, 10, 12 and 17 points). This would require about eight times more work for a non-parametric font, so it was done only when necessary, i.e. when fonts were made from lead. Therefore other fonts use affine scaling to support different type sizes, which makes small text too light and too narrow to be read comfortably.

Since Computer Modern is the default typeface in TeX, several derivative meta-fonts have been made for languages other than English. They use the same (or very similar) files with parameter values, so e.g. adding Concrete Roman support for them is trivial.

Later, when the PostScript Type 1 font format became popular, so did outline fonts. It is mathematically difficult to make an outline meta-font, so the most popular way of converting fonts like Computer Modern to this format was to determine the outlines of bitmaps generated from the fonts. In this way outline variants of all canonical Computer Modern fonts were made, and the Latin Modern fonts are based on them. They still have all the advantages of the specific designs available, but making new ones is not as simple as for meta-fonts.

After an attempt to watch a DVD movie

I bought a film on a DVD (legally, at least where I live). The film, South Park: Bigger, Longer & Uncut, is about freedom of speech and politics limiting it. I won’t write about it now, since the technology controlling such things is interesting enough for a blog post (or more).

When I used a Gentoo GNU/Linux workstation with almost exclusively free software (a non-free graphics card driver did not affect this situation in any visible way), there was no problem with this technology. Although in the US it would possibly be illegal to watch an encrypted DVD with free software, it is legal to use such software in Poland. Gentoo is an American GNU/Linux distribution, but being a source-based distribution, it does not provide software violating US patents or the DMCA (known for prohibiting the circumvention of DRM such as the one based on DVD encryption). The user may compile it themselves.

Several days later I wanted to watch the movie again. For this I used my laptop, which has Debian GNU/Linux installed. Since Debian is a binary distribution, it distributes all of its code in compiled form. Therefore US law (which also allows software patents) limits the software available in such distributions everywhere.

Clearly this did not work in the intended way. After installing the necessary software from a repository outside the US, it still did not work. I do not remember whether DVD movies could be watched on this laptop with Gentoo (Update: now I know that it does not work with Gentoo either; next time I will buy DVD hardware not made by Matshita). The messages in the kernel log suggest that the drive’s firmware prohibits using a DVD from this region (a DVD from the EU used in the EU in a laptop bought in the EU).

There are two solutions: use a different computer with software which cannot be used in the US, or avoid DVDs. Or (unless the Spanish Inquisition does it sooner) tell others how DVDs are designed to limit the freedom of their users.
