October 2009 Archives

Six tips for optimization of homework C programs

| No Comments | No TrackBacks

A nice aspect of formal computer science education is that it requires writing useless programs in a low-level programming language like C which pass automated tests with harsh time and memory limits. The following list shows several methods which I used to optimize such programs and examines their usefulness for ‘real’ programs.

1. Don’t optimize uncompleted programs

Eric S. Raymond explains this by the fact that incomplete programs are not understood completely, so a thing considered not an optimization may make the program slower when it will be completed. In case of small programs for batch processing of data there is another reason – slow but easy to understand program may generate output which will be compared to output of optimized versions.

For real programs there is another advantage – a simple prototype might be used to describe the design and allow much faster development than a fast program written in C. Once I wrote a trivial Python script to explain a limitation of standard Unix program uniq. It was so simple that in next few days I decided to implement a complete uniq-like program in C.

2. Don’t optimize fast enough programs

In case of homeworks the aim is usually to make a program passing some automated tests. If it is possible to test the program many times, it would be not useful to optimize it after all tests are passed in required time.

In real life programs are used with more than several different input files, so this argument does not apply to non-homework programs. But there the Moore’s Law makes most optimization pointless.

3. Measure before any change

It is very difficult for humans to predict performance effect of a change in a program. For example, C programs with a loop consisting of billions of pre- or postincrements of a variable saved to another variable produce on my machine assembly code differing only by the order of instructions in the loop. It looks obvious that these programs would work for the same amount of time, but the one with preincrements was faster by about 25% when using an AMD Phenom processor. In this case the correct solution was to enable compiler optimizations which removed the whole loop, but in many situations it is better to make small changes and compare their effect on the program performance.

In real Unix programs profilers are used for this, but for programs containing only one function it is appropriate to just measure the time spent running the whole program on a large input.

If a typical input takes one microsecond to process, then it might be necessary to write a script making much larger inputs. Since they would be too large to have output correctness verified by human, an unoptimized ‘correct’ implementation of the program being improved is useful to generate output for regression testing.

4. Don’t use dynamic memory allocation

As explained by Joel Spolsky, malloc and similar functions use a slow algorithm to decide which memory is free. This probably could have been improved since 2001, but still is slow. Fortunately many homework problems specify limits on input size which may allow using statically allocated arrays for everything.

In real programs malloc is often avoided by ignoring characters after the 2047 column of a line or by crashing the program when they occur. An advantage of many GNU programs is that they do not limit input length and dynamically allocate input buffers. Therefore this tip is mostly useless for non-homework problems.

5. Don’t copy memory

CPUs are thousands of times faster than memory. Therefore operations on large amounts of memory become the performance bottleneck of programs without significant I/O.

Performance of a program can be improved by using a single buffer for e.g. an input line, instead of copying it many times. In some cases code reading the input might do some calculations on it which will lead to storing much less data in memory. This might be a useful improvement in case of programs performing O(n) algorithms on these data.

6. Write small and simple programs

Since memory is slow, modern CPUs store some of it in faster and more expensive caches. Therefore smaller code might be stored completely in cache and avoid slow memory access, as explained by Raymond.

This is done mainly by removing optimizations designed for older processors. A nice example of this is the removal of many uses of Duff’s device from the XFree86 X11 server. As stated by Theodore Ts’o, ‘by eliminating all instances of Duff's Device from the XFree86 4.0 server, the server shrunk in size by _half_ _a_ _megabyte_ (!!!), and was faster to boot’.

Using Gentoo on a server without C++ compiler

| 1 Comment | No TrackBacks

After reading a post by Diego E. Pettenò about replacing groff with heirloom-doctools leading to having nearly no C++ programs in a Gentoo system, I tried to rebuild all packages on my Gentoo-using server without C++. The main difference is that my computer compiles all packages instead of using another system for this, so I need also all build dependencies of useful packages compiling and working without C++.

I started by adding the nocxx -cxx USE flags to everything except GCC. Rebuilding affected packages showed that app-arch/lzma-utils has programs written in C++ for everything except decompression. So I checked on my unstable Gentoo workstation that its app-arch/xz-utils does not have programs linking to libstdc++, which is nearly equivalent to using C++. Therefore I added app-arch/xz-utils to /etc/portage/package.keywords and installed it. Now both LZMA and XZ compression formats may be both compressed and decompressed, without any C++ code.

Since C++ support is assumed to be in every system, the package manager cannot determine which packages use it. Therefore I decided to check which programs link to the C++ standard library. The following shell script lists all files on $PATH which link to libraries containing ++ in their names:

#!/bin/sh

for directory in `echo $PATH | sed 's/:/ /g'`;
do
  for f in $directory/*
  do
    if file $f | grep ELF > /dev/null
    then
      list=`readelf -dw $f | grep Shared \
        | sed -r 's/^.*\[(.*)\].*$/\1/' | fgrep ++`
      if [ $? = 0 ]
      then
        echo $f
      fi
    fi
  done
done

Passing the output of the script to xargs qfile -q | sort -u listed packages having these files.

The only necessary packages on my system listed by the above pipe were app-arch/lzma-utils and sys-apps/groff, but they can be easily replaced by app-arch/xz-utils and app-doc/heirloom-doctools.

Then I compiled sys-devel/gcc with the nocxx USE flag and ran emerge -ev --keep-going world.

Next day I saw that build failed for 27 packages. Some of them, like sys-apps/sed had just failing tests for completely unrelated reasons. Many other packages, like sys-devel/libtool have tests using C++. So I recompiled all such packages with FEATURES=-test. Now only 12 packages failed.

The rest failed mostly when running the configure script, with messages like this:

checking how to run the C++ preprocessor... /lib/cpp
configure: error: C++ preprocessor "/lib/cpp" fails sanity check

These were caused by useless checks for C++ compiler made by older versions of libtool. In case of dev-libs/popt and sys-apps/shadow updating to the testing version solved this, but for some other packages adding eautoreconf to their ebuilds was necessary.

After these changes the only packages with build errors were dev-libs/apr, net-libs/courier-authlib and net-mail/courier-imap. The first one also fails at the configure check, while the rest uses some C++ code which is not installed.

The modified ebuilds are available in my overlay. I haven’t reported any of the problems observed to the Gentoo Bugzilla, since I cannot access it today and the changes would just make compilation slower in all normal uses.

liability-deltoid