Recently in writing Category

Common internationalization problems

Some time ago I wrote about localization of software. This post describes some problems of using a program in a language other than American English, leaving aside the two trivial ones: there is no single language used by everyone, and some programs have no localization at all. It is based on my experience of using free software localized to Polish, but it should apply to other inflected European languages. Some ‘localization’ mistakes can be easily observed even in English.

In these situations translations are often incorrect:

sentence/title construction
‘Remove icon’ is clearly correct, and in English ‘Remove Icon’ might also be accepted. But in Polish ‘Usuń Ikona’ is incorrect. There are two problems here: lack of inflection and incorrect capitalization. The problem is caused by combining the plain name of the object with a generic removal text. It would be solved by giving each object its own ‘Remove X’ text, e.g. ‘Remove icon’ translated as ‘Usuń ikonę’ (although this will not stop translators from using incorrect capitalization in their texts). The GNU Coding Standards show a different example of this.
using a single text for counted objects
‘N comments’ is a good example of this. Even in English I have found programs using the form ‘1 comments’ or ‘N comment(s)’. Polish is more difficult, with three forms to choose from depending on the number, as stated by the GNU Coding Standards. Fortunately, for positive numbers the problem is completely solved by e.g. GNU gettext, although having a separate form for zero objects would be better still (e.g. ‘no comments’).
ignoring the grammatical gender
This may occur when constructing text about objects such as icons or floppy disks, but it is most commonly found on the Web in texts about users. In English ‘he’ or ‘she’ is rarely used in messages about the user, but in many Indo-European languages nearly everything depends on gender. Fortunately, some software, like MediaWiki, has begun to let users specify their grammatical gender. (It is interesting that many roguelikes require the user to specify their gender, although they support only English.)
non-ASCII punctuation
Again, this problem can be easily shown in English. A common web browser separates its name from the page title with a hyphen where a dash should be used. Our language also has different apostrophes and quotation marks than the typewriters of our ancestors had. For Polish it is more difficult, since even in print inner quotation marks are usually put in the wrong order.
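
The counted-objects case above can be sketched in a few lines. This is only an illustration, assuming the plural rule that the GNU gettext manual lists for Polish; the function names and the Polish word forms are my own choices for the example, and a real program would take its strings from a translation catalogue instead of hard-coding them.

```python
# Hypothetical illustration: choose the Polish form of "comment" for a count.
# The rule mirrors the plural expression listed for Polish in the GNU gettext
# manual: index 0 for exactly one, index 1 for counts ending in 2-4 (except
# 12-14), index 2 for everything else.

def polish_plural_index(n: int) -> int:
    """Return the index of the Polish form for a non-negative count n."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# Forms in index order; these strings are assumptions made for the example.
COMMENT_FORMS = ("komentarz", "komentarze", "komentarzy")


def comment_count(n: int) -> str:
    """Format a comment count, with a separate message for zero."""
    if n == 0:
        return "brak komentarzy"  # 'no comments' instead of '0 comments'
    return f"{n} {COMMENT_FORMS[polish_plural_index(n)]}"


if __name__ == "__main__":
    for n in (0, 1, 2, 5, 12, 22):
        print(comment_count(n))
```

The separate branch for zero is the part that a plain plural rule does not give for free, which is why it is handled before the rule is applied.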

There is one simple solution – write a program which uses completely correct English and let translators correct it until it is correct in other languages.
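
One way to read this advice is that every message should be a complete sentence that translators can replace as a whole. The sketch below is hypothetical: the catalogue dictionaries, the function names and the gender labels are assumptions made for the example and do not come from any real localization library.

```python
# Hypothetical catalogue: keys are complete English messages, values are
# complete Polish translations. Nothing is assembled from fragments.
CATALOGUE_PL = {
    "Remove icon": "Usuń ikonę",
    "Remove file": "Usuń plik",
}

# Messages about a user, keyed by (message, grammatical gender).
GENDERED_PL = {
    ("{name} added a comment", "feminine"): "{name} dodała komentarz",
    ("{name} added a comment", "masculine"): "{name} dodał komentarz",
}


def naive_remove_label(object_name: str) -> str:
    """The problematic approach: glue a generic verb to an object name."""
    return "Usuń " + object_name


def translate(message: str) -> str:
    """Look up a whole message; fall back to the English original."""
    return CATALOGUE_PL.get(message, message)


def translate_about_user(message: str, name: str, gender: str) -> str:
    """Pick the variant matching the user's grammatical gender."""
    template = GENDERED_PL.get((message, gender), message)
    return template.format(name=name)


if __name__ == "__main__":
    print(naive_remove_label("Ikona"))   # uninflected and wrongly capitalized
    print(translate("Remove icon"))      # translated as a whole sentence
    print(translate_about_user("{name} added a comment", "Anna", "feminine"))
```

Falling back to the English text when a translation or gender variant is missing keeps the output readable, at the cost of mixing languages.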

Will we read essays written by computers?

After using the ‘random’ comic link several times on XKCD, I found one about the Turing test. When I was an IB DP student, some people thought that some of my essays were written by computer programs (I have heard similar opinions about nearly every text which I have translated from English to Polish). So if an essay written by a human may not pass the Turing test, can a text written by a computer be considered useful to us?

This is obviously true for most texts, if all textual program output is considered a text. So a stricter definition of text is needed to make this question useful. A standard essay for an English writing exam might be appropriate, since such exams clearly state several useful criteria, like interestingly complicated grammar and a discombobulating message with clearly visible personal involvement.

It is clearly difficult to describe an essay as an algorithm. Although describing ideas clearly is one of the largest problems in essay writing, a program converting a trivial description of reasoning into an essay would be useful. Essays involve many examples, which should not be the same in every student’s work, so a large database of facts could be used to add examples for some theses.

So, given a message, the essay would be written with many encyclopedic examples and a grammatical structure as complicated as the authors of the program foresaw. From a grammatical point of view, it is nearly impossible to map an English sentence to an abstract thought representation, but the reverse process, which the program would use, would be simple. A problem would occur when the generated sentence has other meanings unknown to the computer, but this is also a problem for human students.

It would be interesting to see how a program would represent all the facts which could be used in an essay. Humans use large collections of useful facts written in English or an equivalent language (formally, languages are not isomorphic according to the Sapir-Whorf hypothesis, but all popular languages have the same drawback for this use). Therefore, to write it is necessary to read, which is too difficult for computers.

Maybe with a formal notation for facts useful in essays and a formal description of an essay, a computer would be able to write a highly marked essay. But I do not believe that it would be simpler for a human to write such a program and its data than to write a good essay. (I hope that a computer will quote a part of this blog entry in an essay and explain an opposite point of view.)

Some semi-technical notes on writing

Notes written on September 28, 2008 and slightly modified.

Presentations using slides

Issues discussed by Don McMillan in ‘Life after death by PowerPoint’:

  • do not put every word you are going to say on a slide – otherwise the audience would not need you
  • use spell-checking software – many people will have plenty of time to look at your text [originally this text was not spell-checked]
  • bullet only key points – bullets are for shooting people or ideas; when too many are used, the key ideas will not stand out
  • avoid bad colour combinations
  • do not use too many slides
  • do not put too much data on a graph – it should be easy to read and convey only the important information. Avoid three-dimensional effects and useless labels.
  • animations should not distract
  • the font used says something about you.

My own advice not discussed above:

  • do not use slow and strange visual effects – it is not Monty Python’s Flying Circus but a presentation meant to convey some information
  • avoid useless images, etc.
  • use TeX if you use any bit of mathematical notation
  • check how much time the presentation will take

Graphs

Issues discussed by Don McMillan:

  • do not put too much data on a graph – it should be easy to read and convey only the important information. Avoid three-dimensional effects and useless labels.

My own observations:

  • remember to check that your software interprets the x-values as numbers, not as labels for a sequence of y-values
  • use a sane number format
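
The two observations above can be illustrated without any plotting library. This is a minimal sketch with made-up data; `format_value` and its two-decimal format are only one assumed notion of a ‘sane’ format, not a rule.

```python
# Made-up x-values as they might arrive from a text file or spreadsheet.
raw_x = ["1", "2", "10"]

# Treated as labels: points are spaced evenly by position, and sorting is
# lexicographic, so "10" comes before "2".
label_positions = list(range(len(raw_x)))
label_order = sorted(raw_x)

# Treated as numbers: spacing and order follow the actual values.
numeric_x = [float(x) for x in raw_x]
numeric_order = sorted(numeric_x)


def format_value(value: float) -> str:
    """Fixed two decimals with thousands separators (an assumed default)."""
    return f"{value:,.2f}"


if __name__ == "__main__":
    print(label_positions, label_order)
    print(numeric_x, numeric_order)
    print(str(0.1 + 0.2), "versus", format_value(0.1 + 0.2))
```

With labels, the gap between 2 and 10 looks the same as the gap between 1 and 2, which distorts any trend the graph is supposed to show.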

Fonts

When using the Polish ogonek (the diacritical mark used in Polish ‘ą’, ‘ę’ and their uppercase variants):

  • do not use Microsoft Core Fonts for the Web – they get it wrong
  • use fonts made by GUST, e.g. Latin Modern or TeX Gyre
  • do not replace it with a cedilla (used correctly in ‘ç’) or any quotation mark
  • before using a different font with it, find out whether it is designed correctly.

For Polish ‘ł’ and ‘Ł’:

  • do not use Computer Modern or any OT1-encoded font – the bar on ‘Ł’ is larger than on ‘ł’; all CM-descendant fonts containing these characters are correct
  • do not replace them with other characters – there is no reason to do it
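
A cedilla used in place of an ogonek can also be found mechanically in text. The sketch below uses Python’s standard `unicodedata` module; the function name and the decision to check only ‘a’ and ‘e’ are assumptions made for this example, and it does not look at how a font draws the glyphs.

```python
import unicodedata

COMBINING_CEDILLA = "\u0327"


def find_cedilla_substitutes(text: str) -> list[str]:
    """Return each a/e that carries a cedilla where Polish expects an ogonek.

    The text is decomposed (NFD) so that precomposed letters and combining
    sequences are handled the same way.
    """
    decomposed = unicodedata.normalize("NFD", text)
    hits = []
    for i in range(1, len(decomposed)):
        if decomposed[i] == COMBINING_CEDILLA and decomposed[i - 1] in "aeAE":
            hits.append(decomposed[i - 1:i + 1])
    return hits


if __name__ == "__main__":
    print(find_cedilla_substitutes("pi\u0229\u0107"))  # e with cedilla: flagged
    print(find_cedilla_substitutes("pięć"))            # correct ogonek: no hits
    print(find_cedilla_substitutes("garçon"))          # correct cedilla: no hits
```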