This is the second article in a series of four on GNU/Linux benchmarking, to be published by the Linux Gazette. The first article presented some basic benchmarking concepts and analyzed the Whetstone benchmark in more detail. The present article deals with practical issues in GNU/Linux benchmarking: what benchmarks already exist, where to find them, what they actually measure and how to run them. If you are not happy with the available benchmarks, it also offers some guidelines for writing your own. Finally, an application benchmark (Linux kernel 2.0.0 compilation) is analyzed in detail.
GNU/Linux is a great OS in terms of performance, and we can hope it will only get better over time. But that is a very vague statement: we need figures to prove it. What information can benchmarks effectively provide us with? What aspects of microcomputer performance can we measure under GNU/Linux?
Kurt Fitzner reminded me of an old saying: "When performance is measured, performance increases."
Let's list some general benchmarking rules (not necessarily in order of decreasing priority) that should be followed to obtain accurate and meaningful benchmarking data, resulting in real GNU/Linux performance gains:
These are some benchmarks I have collected over the Net. A few are Linux-specific, others are portable across a wide range of Unix-compatible systems, and some are even more generic.
doom -timedemo demo3. Anton Ertl has set up a Web page listing results for various architectures and operating systems.
All the benchmarks listed above are available by ftp or http from the Linux Benchmarking Project server in the download directory: www.tux.org/pub/bench or from the Links page.
We saw last month that (nearly) all benchmarks are based on either of two simple algorithms, or combinations/variations of these:
We also saw that the Whetstone benchmark uses a combination of these two procedures to "calibrate" itself for optimum resolution, effectively providing a workaround for the low-resolution timer available on PC-type machines.
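To make the idea of self-calibration concrete, here is a minimal sketch in Python (chosen for brevity; it is not Whetstone's actual code). It assumes one plausible strategy: keep doubling the iteration count until a timed run lasts long enough for the timer to resolve it. The names work(), calibrate() and min_duration are hypothetical and exist only for this illustration.

```python
import time

def work():
    """Hypothetical workload standing in for the benchmark kernel."""
    return sum(i * i for i in range(200))

def calibrate(min_duration=0.05, timer=time.perf_counter):
    """Double the loop count until one timed run lasts at least min_duration seconds.

    Returns the loop count that was reached and the duration of that run.
    """
    loop_count = 1
    while True:
        start = timer()
        for _ in range(loop_count):
            work()
        duration = timer() - start
        if duration >= min_duration:
            return loop_count, duration
        loop_count *= 2

if __name__ == "__main__":
    count, seconds = calibrate()
    print(f"calibrated to {count} iterations ({seconds:.3f} s per run)")
```

The loop count found this way can then be used for the real measurement, so that the run is long compared to the timer's resolution whatever the speed of the machine.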
Note that some newer benchmarks use new, exotic algorithms to estimate system performance, e.g. the Hint benchmark. I'll get back to Hint in a future article.
Right now, let's see what algorithm 2 would look like:
start_time = time()
repeat
    benchmark_kernel()
    decrement loop_count
until loop_count = 0
duration = time() - start_time
Here, time() is a system library call which returns, for example, the elapsed wall-clock time since the last system boot. The benchmark_kernel() routine exercises the system feature or characteristic we are trying to measure.
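For readers who prefer something they can run, here is a rough Python rendering of algorithm 2. It is only a sketch: the body of benchmark_kernel() is a hypothetical placeholder workload, and time.perf_counter() stands in for the generic time() call described above.

```python
import time

def benchmark_kernel():
    """Hypothetical placeholder for the task being measured."""
    total = 0
    for i in range(1000):
        total += i * i
    return total

def run_benchmark(loop_count):
    """Time a fixed number of iterations of benchmark_kernel().

    Returns the elapsed wall-clock time in seconds.
    """
    start_time = time.perf_counter()
    # A while loop is used so that a loop count of zero simply returns.
    while loop_count > 0:
        benchmark_kernel()
        loop_count -= 1
    return time.perf_counter() - start_time

if __name__ == "__main__":
    duration = run_benchmark(1000)
    print(f"1000 iterations took {duration:.4f} s")
```

Dividing the iteration count by the measured duration gives a rate (iterations per second), which is usually easier to compare between machines than a raw time.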
Even this trivial benchmarking algorithm makes some basic assumptions about the system being tested and will report totally erroneous results if some precautions are not taken:
You can replace the benchmark "kernel" with whatever computing task interests you most or comes closest to your specific benchmarking needs.
Examples of such kernels would be:
For good examples of actual C source code, see the UnixBench and Whetstone benchmark sources.
The more one gets to use and know GNU/Linux, the more often one compiles the Linux kernel. Very quickly it becomes a habit: as soon as a new kernel version comes out, we download the tar.gz source file and recompile it a few times, fine-tuning the new features.
This is the main reason for proposing kernel compilation as an application benchmark: it is a very common task for all GNU/Linux users. Note that the application being directly tested is not the Linux kernel itself but gcc. I guess most GNU/Linux users use gcc every day.
The Linux kernel is being used here as a (large) standard data set. Since this is a large program (gcc) with a wide variety of instructions, processing a large data set (the Linux kernel) with a wide variety of data structures, we assume it will exercise a good subset of OS functions like file I/O, swapping, etc., and a good subset of the hardware too: CPU, memory, caches, hard disk, hard disk controller/driver combination, PCI or ISA I/O bus. Obviously this is not a test for X server performance, even if you launch the compilation from an xterm window! And the FPU is not exercised either (but we already tested our FPU with Whetstone, didn't we?). However, I have noticed that test results are almost independent of hard disk performance, at least on the various systems I had available. The real bottleneck for this test is CPU/cache performance.
Why specify the Linux kernel version 2.0.0 as our standard data set? Because it is widely available, as most GNU/Linux users have an old CD-ROM distribution with the Linux kernel 2.0.0 source, and also because it is quite close in size and structure to present-day kernels. So it's not exactly an out-of-anybody's-hat data set: it's a typical real-world data set.
Why not let users compile any Linux 2.x kernel and report results? Because then we wouldn't be able to compare results anymore. Aha, you say, but what about the different gcc and libc versions in the various systems being tested? Answer: they are part of your GNU/Linux system and so also get their performance measured by this benchmark, and this is exactly the behaviour we want from an application benchmark. Of course, gcc and libc versions must be reported, just like CPU type, hard disk, total RAM, etc. (see the Linux Benchmarking Toolkit Report Form).
Basically what goes on during a gcc kernel compilation (make zImage) is that:
Step 2 is where most of the time is spent.
This test is quite stable between different runs. It is also relatively insensitive to small loads (e.g. it can be run in an xterm window) and completes in less than 15 minutes on most recent machines.
Do I really have to tell you where to get the kernel 2.0.0 source? OK, then: ftp://sunsite.unc.edu/pub/Linux/kernel/source/2.0.x or any of its mirrors, or any recent GNU/Linux CD-ROM set with a copy of sunsite.unc.edu. Download the 2.0.0 kernel, then gunzip and untar it under a test directory (tar zxvf linux-2.0.tar.gz will do the trick).
Cd to the linux directory you just created and type make config. Press <Enter> to answer all questions with their default value. Now type make dep ; make clean ; sync ; time make zImage. Depending on your machine, you can go and have lunch or just an espresso. You can't (yet) blink and be done with it, even on a 600 MHz Alpha. By the way, if you are going to run this test on an Alpha, you will have to cross-compile the kernel targeting the i386 architecture so that your results are comparable to the more ubiquitous x86 machines.
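If you would rather collect the elapsed time from a script than read it off the terminal, something along these lines should work. This is only a sketch in Python: time_command() is a hypothetical helper written for this article, and it measures wall-clock time only, whereas the time command also reports user and system CPU time.

```python
import subprocess
import time

def time_command(argv, cwd=None):
    """Run argv to completion; return (exit status, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    status = subprocess.run(argv, cwd=cwd).returncode
    return status, time.perf_counter() - start

# Example call, assuming the configured 2.0.0 tree sits in ./linux
# (not executed here because it needs the kernel source):
#   status, elapsed = time_command(["make", "zImage"], cwd="linux")
```

As with the manual procedure, make dep, make clean and sync should be run first so that every measurement starts from the same state.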
This is what I get on my test GNU/Linux box:
186.90user 19.30system 3:40.75elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (147838major+170260minor)pagefaults 0swaps
The most important figure here is the total elapsed time: 3 min 41 s (there is no need to report fractions of seconds).
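The figures in the time output are related in a simple way, and checking them is a cheap sanity test. The Python sketch below parses the first line of the output shown above, rounds the elapsed time to whole seconds, and recomputes the CPU percentage as (user + system) / elapsed. The function name parse_time_line() and the regular expression are my own assumptions about this particular output layout, not part of any tool.

```python
import re

def parse_time_line(line):
    """Extract user, system and elapsed seconds from a time summary line."""
    m = re.search(
        r"([\d.]+)user\s+([\d.]+)system\s+(?:(\d+):)?(\d+):([\d.]+)elapsed",
        line,
    )
    if m is None:
        raise ValueError("unrecognized time output")
    user, system = float(m.group(1)), float(m.group(2))
    hours = int(m.group(3) or 0)  # the hours field is optional
    elapsed = hours * 3600 + int(m.group(4)) * 60 + float(m.group(5))
    return user, system, elapsed

line = "186.90user 19.30system 3:40.75elapsed 93%CPU"
user, system, elapsed = parse_time_line(line)

minutes, seconds = divmod(round(elapsed), 60)
cpu_percent = int(100 * (user + system) / elapsed)
print(f"{minutes} min {seconds} s elapsed, CPU busy about {cpu_percent}% of the time")
```

For the run above this gives 220.75 s of elapsed time, which rounds to 3 min 41 s, and 206.2 s of CPU time, which is about 93% of the elapsed time and agrees with the percentage reported by time.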
If you were to complain that the above benchmark is useless without a description of the machine being tested, you'd be 100% correct! So, here is the LBT Report Form for this machine:
LINUX BENCHMARKING TOOLKIT REPORT FORM
Core clock: 208 MHz (2.5 x 83 MHz)
Motherboard vendor: ASUS
Mbd. model: P55T2P4
Mbd. chipset: Intel HX
Bus type: PCI
Bus clock: 41.5 MHz
Cache total: 512 KB
Cache type/speed: Pipeline burst 6 ns
SMP (number of processors): 1
RAM total: 32 MB
RAM type: EDO SIMMs
RAM speed: 60 ns
Hard disk size: 4.3 GB
Hard disk driver/settings: Bus Master DMA mode 2
Video board vendor: Generic S3
Video RAM type: 60 ns EDO DRAM
Video RAM total: 2 MB
X server vendor: XFree86
X server version: 3.3
X server chipset choice: S3 accelerated
Resolution/vert. refresh rate: 1152x864 @ 70 Hz
Color depth: 16 bits
Swap size: 64 MB
libc version: 5.4.23
Very light system load.
Linux kernel 2.0.0 Compilation Time: 3 m 41 s
Whetstone Double Precision (FPU) INDEX: N/A
UnixBench 4.10 system INDEX: N/A
BYTEmark integer INDEX: N/A
BYTEmark memory INDEX: N/A
Just tested kernel 2.0.0 compilation.
Again, you will want to compare your results to those obtained on different machines/configurations. You will find some results on my Web site about 6x86s/Linux, in the November News page.
This, of course, is pure GNU/Linux benchmarking, unless you want to go ahead and try to cross-compile the Linux kernel on a Windows95 box!? ;-)
I expect that by next month you will have downloaded and tested a few benchmarks, or even started writing your own. So, in the next article: Collecting and Interpreting Linux Benchmarking Data