In mathematics there are no numbers as such, only specific sets with the word ‘numbers’ in their names. Similarly, the C programming language and many others provide different kinds of numbers for different situations. The aim of this post is to compare them and show when each might be useful.
We take the integers to be the countable set consisting of every natural number (zero and every successor of a natural number) with a positive or negative sign, where zero is unsigned. An infinite set like this clearly cannot be represented exactly in a finite amount of silicon, so ordinary computers don’t use these numbers.
The simplest value in a classical computer is a bit, which represents one of two values. The most typical use of a single bit is covered by the bool type from the C99 standard (available through <stdbool.h>), with the values named true and false (C++ also has this type). Although it stores a single bit of information, it usually occupies at least a byte, so bit fields are sometimes used instead, leading to more complicated code that uses less memory.
Since true integers cannot be stored, integers modulo a large number are used instead. For large values they are very unintuitive, as shown in an XKCD comic.
A non-negative integer is simply represented in binary as several bits. For this the types unsigned char (typically one octet), unsigned short (usually 16 bits), unsigned int (usually 32 bits), and unsigned long (64 bits on most 64-bit Unix-like systems, 32 bits on 32-bit systems and on 64-bit Windows) are used. The standard guarantees only minimum widths: 8, 16, 16, and 32 bits respectively.
The methods for representing negative numbers are interesting. Some (sign-magnitude and ones’ complement) lead to a negative zero, while another (two’s complement) has more negative numbers than positive numbers (normally I would write ‘ones’ instead of the second ‘numbers’ here, but that could be read as saying these systems have at least two negative ones). C names the signed types like the ones above, either without unsigned or with signed in its place; plain char is the exception, since whether it is signed depends on the implementation. Many other programming languages do not have separate types for non-negative numbers.
An interesting algorithm that exposes the difference between integer representations is presented by William Gosper in HAKMEM, item 154 (page 74).
