Comparing performance of common Unix shells


Most programs written by users of GNU/Linux systems are probably scripts interpreted by shells derived from the Bourne shell. Most of the work these scripts do is either interactive (and so probably faster than a human can notice) or delegated to efficient C programs. Still, it is interesting to compare how the choice of shell affects the time needed to run a script.

Most GNU/Linux systems use Bash as their only shell. BSD derivatives differ: FreeBSD, for example, uses ash for scripting and tcsh (a C shell derivative whose syntax differs considerably from that of other shells) for interactive use.

Most shell scripts do not use features specific to any one shell and need only a mostly POSIX-compatible shell such as dash (an ash derivative) or Bash. They therefore specify /bin/sh as their interpreter, which is always such a shell. Most GNU/Linux distributions use Bash as /bin/sh, the BSDs use ash, and Ubuntu and Debian Squeeze use dash. As a result, many scripts that use Bash-specific features incorrectly declare /bin/sh as their interpreter and fail on Ubuntu or FreeBSD.
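As an illustration of the kind of Bash-specific construct that breaks under dash, here is a minimal sketch. The function name has_txt_suffix is hypothetical and chosen only for this example. The Bash-only form appears in a comment, and the function shows a portable equivalent using case.

```sh
#!/bin/sh
# Bash-only test; dash has no [[ keyword, so the script would fail there:
#     if [[ $name == *.txt ]]; then ...
#
# Portable POSIX equivalent using pattern matching in 'case'.
# has_txt_suffix is a hypothetical helper for this illustration.
has_txt_suffix() {
    case $1 in
        *.txt) return 0 ;;   # name ends in .txt
        *)     return 1 ;;   # anything else
    esac
}
```

A script written this way can keep /bin/sh as its interpreter line and behave the same whether /bin/sh is Bash, dash or ash.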

Testing scripts with a shell that has only the features required by POSIX avoids the problem of undeclared Bash dependencies, but it is not the only reason to use a non-Bash shell for scripting. The dash shell is faster than Bash, which is why dash was proposed as the default shell for scripts in the Debian Lenny release.

To check how run time differs between shells, I wrote several trivial scripts which can be interpreted by the popular POSIX-like shells. Two of the scripts calculate factorials using different recursive algorithms (one is the ‘standard’ definition used in mathematical textbooks, the other is the tail-recursive one used in functional programming textbooks), a third calculates elements of the Fibonacci sequence using the recursive definition, and the fourth just calls the shell about one hundred times to check how slow its initialization is. I haven’t seen a real shell script doing such things, but the ones which I normally use either depend mostly on the performance of other programs or use Bash-specific features. Another script calculates the average time spent by each combination of script and shell over ten runs (one additional run of each is done before counting, since the first run has to load the shell from disk) and outputs the result in a simple-to-parse format.
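The three arithmetic benchmarks could look roughly like the following. This is a sketch written for illustration, not the scripts from the repository: the function names fact, fact_tail and fib are assumptions, and the recursion goes through command substitution because POSIX shell functions cannot return arbitrary numbers.

```sh
#!/bin/sh
# Hypothetical reconstructions of the benchmark algorithms,
# not the original scripts.

# 'Standard' textbook definition: n! = n * (n-1)!, with 0! = 1! = 1.
fact() {
    if [ "$1" -le 1 ]; then
        echo 1
    else
        # The recursive call runs in a subshell via command substitution.
        prev=$(fact $(( $1 - 1 )))
        echo $(( $1 * $prev ))
    fi
}

# Tail-recursive variant: the running product is passed along
# as an accumulator in $2 (defaults to 1).
fact_tail() {
    if [ "$1" -le 1 ]; then
        echo "${2:-1}"
    else
        fact_tail $(( $1 - 1 )) $(( $1 * ${2:-1} ))
    fi
}

# Fibonacci numbers from the recursive definition:
# fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2).
fib() {
    if [ "$1" -lt 2 ]; then
        echo "$1"
    else
        a=$(fib $(( $1 - 1 )))
        b=$(fib $(( $1 - 2 )))
        echo $(( $a + $b ))
    fi
}
```

For example, fact 5 and fact_tail 5 both print 120, and fib 10 prints 55. Every recursive step in fact and fib forks a subshell, so such scripts mostly exercise the shell's own overhead rather than its arithmetic.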

I compared six shells available in Gentoo GNU/Linux, from the ebuilds sys-apps/busybox-1.15.2, app-shells/bash-4.0_p35, app-shells/dash-0.5.5.1.2, app-shells/mksh-39, app-shells/pdksh-5.2.14-r4 and app-shells/zsh-4.3.10. The average times in seconds, as calculated by the script on the machine I am using, are:

Script                       BusyBox  dash   bash   zsh    mksh   pdksh
tail-recursive factorial     0.12     0.083  0.254  0.23   0.122  0.117
standard factorial           0.106    0.084  0.229  0.242  0.12   0.121
Fibonacci sequence           1.061    0.801  2.177  2.063  1.044  1.301
recursive shell invocation   0.341    0.269  0.515  1.91   0.378  0.349

In all of the above tests dash is the fastest, BusyBox and the Korn shell variants perform similarly, and either Bash or zsh is the slowest. Bash was roughly two to three times slower than dash in these tests.
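The averaging procedure described earlier (one uncounted warm-up run, then ten timed runs per shell and script) can be sketched as follows. This is not the GPL-licensed main script from the repository. The function names average, time_once and bench are hypothetical, and the timing assumes GNU date, whose %N format gives nanoseconds. That is an extension outside POSIX.

```sh
#!/bin/sh
# A minimal timing sketch under stated assumptions; not the original harness.

# Print the mean of the numbers read from stdin, to three decimals.
average() {
    awk '{ sum += $1 } END { if (NR > 0) printf "%.3f\n", sum / NR }'
}

# Run the given command once and print its wall-clock time in seconds.
# Assumes GNU date: %N (nanoseconds) is not available in every date.
time_once() {
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# bench SHELL SCRIPT [RUNS]: average time of RUNS runs (default 10).
bench() {
    # Warm-up run, not counted: it loads the shell from disk.
    "$1" "$2" >/dev/null 2>&1
    i=0
    while [ "$i" -lt "${3:-10}" ]; do
        time_once "$1" "$2"
        i=$(( i + 1 ))
    done | average
}
```

A call such as bench dash fib.sh, where fib.sh is a placeholder file name, would print a single average in seconds. Printing the shell and script names next to it gives one easily parsed line per combination.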

Of course, real scripts are a different matter. Anyone who wants to write functional programs probably knows more appropriate languages than POSIX shell. The extensions offered by many shells may also make them faster for scripts that use those extensions. The main reason for shell scripting is the ease of writing trivial scripts that resemble the commands typed in daily interactive use. It is therefore more useful to write a simple script first and rewrite it in a better language when needed.

The scripts used for the above calculations are available in my Mercurial repository. The main script is licensed under the GNU General Public License, version 3 or later, while the tested scripts are public domain, since I hope that these are too unoriginal to be copyrightable.
