Initial Performance Testing

With further progress on boulder, we can now build native stone packages with some easy tweaks such as profile guided optimizations (PGO) and link time optimizations (LTO). That means we can take a first look at what the performance of the first cut of Serpent OS shows for the future. The tests have been conducted using benchmarking-tools with Serpent OS measured in a chroot on the same host with the same kernel and config.

One of the key focuses for early in the project is on reducing build time. Every feature can either add or subtract from the time it takes to produce a package. With a source/binary hybrid model, users will greatly benefit from the faster builds as well. In terms of what I’ve targeted in these tests is the performance of clang and testing some compiler flag options on cmake.

Clang Shows its Promise

clang has always been a compiler with a big future. The performance credentials have also been improving each release and are now starting to see it perform strongly against its GNU counterpart. It is common to hear that clang is slow and produces less optimized code. I will admit that most distros provide a slow build of clang, but that will not be the case in Serpent OS.

It is important to note that in this comparison the Host distro has pulled in some patches from LLVM-13 that greatly improve the performance of clang. Prior to this, their tests actually took 50% longer for cmake and configure but only 10% longer for compiling. boulder does not yet support patching in builds so the packages are completely vanilla.

Test using clangSerpentHostDifference
cmake LLVM5.89s10.58s79.7%
Compile -j4 llvm-ar126.16s155.32s23.1%
configure gettext36.64s63.55s73.4%

Based on the results during testing, the performance of clang in Serpent OS still has room to improve and was just a quick tuning pass. At stages where I would have expected to be ahead already, the compile performance was only equal (but cmake and configure were still well ahead).

GCC Matters Too!

While clang is the default compiler in Serpent OS, there may be instances where the performance is not quite where it could be. It is common to see software have more optimized code paths where they are not tested with clang upstream. As an example, here’s a couple of patches in flac (1, 2) that demonstrate this being improved. Using benchmarking-tools, it is easy to see where gcc and clang builds are running different functions via perf results.

In circumstances where the slowdown is due to hitting poor optimization paths in clang, we always have the option to build packages using gcc, where the GNU toolchain is essential for building glibc. Therefore having a solid GNU toolchain is important but small compile time improvements won’t be noticed by users or developers as much.

Test using gccSerpentHostDifference
cmake LLVM7.00s7.95s13.6%
Compile llvm-ar168.11s199.07s18.4%
configure gettext45.45s51.93s14.3%

An OS is More Than Just a Compiler

While the current bootstrap exists only as a starting point for building the rest of Serpent OS, there are some other packages we can easily test and compare. Here’s a summary of those results.

TestSerpentHostDifference
Pybench1199.67ms1024.33ms-14.6%
xz Compress Kernel (-3 -T1)42.67s46.57s9.1%
xz Compress Kernel (-9 -T4)71.25s76.12s6.8%
xz Decompress Kernel8.03s8.18s1.9%
zlib Compress Kernel12.60s13.17s4.5%
zlib Decompress Kernel5.14s5.21s1.4%
zstd Compress Kernel (-8 -T1)5.77s7.06s22.3%
zstd Compress Kernel (-19 -T4)51.87s66.52s28.3%
zstd Decompress Kernel2.90s3.08s6.3%

State of the Bootstrap

From my experiences with testing the bootstrap, it is clear there’s some cobwebs in there that require some more iterations of the toolchain. There also seems to be some slowdowns in not including all the dependencies of some packages. Once more packages are included, naturally all the testing will be redone and help influence the default compiler flags of the project.

It’s not yet clear the experience of using libc++ vs libstdc++ with the clang compiler. Once the cobwebs are out and Serpent OS further developed, the impact (if any) should become more obvious. There are also some parts not yet included in boulder such as stripping files, LTO and other flags by default that will speed up loading libraries. At this stage this is deliberate until integrating outputs from builds (such as symbol information).

But this provides an excellent platform to build out the rest of the OS. The raw speed of the clang compiler will make iterating and expanding the package set a real joy!

Hang On, What’s Going on With Python?

Very astute of you to notice! python in its current state is an absolute minimal build of python in order to run meson. However, I did an analyze run in benchmarking-tools where it became obvious that they were doing completely different things.

Apples and oranges comparison

For now I’ll simply be assuming this will sort itself out when python is built complete with all its functionality. And before anyone wants to point the finger at clang, you get the same result with gcc.