Initial Performance Testing
Monday, August 2, 2021 - Peter O'Connor (Performance Guy)
With further progress on boulder, we can now build native stone packages with some easy tweaks such as profile
guided optimizations (PGO) and link time optimizations (LTO). That means we can take a first look at what the
performance of the first cut of Serpent OS suggests for the future. The tests have been conducted using
benchmarking-tools, with Serpent OS measured on the same host with the same kernel and config.
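To make the PGO and LTO tweaks mentioned above a bit more concrete, here is a rough sketch of a generic clang PGO plus LTO cycle, expressed as command lines. It is not how boulder implements it: the helper names, the `pgo-data` directory, the `-O2` level and the file names are all made up for illustration, and only the standard clang and llvm-profdata flags are assumed.

```python
def instrumented_build_cmd(source, output, profile_dir="pgo-data"):
    """Stage 1: build an instrumented binary that writes raw profiles when run."""
    return ["clang", "-O2", f"-fprofile-generate={profile_dir}",
            source, "-o", f"{output}.instrumented"]


def merge_profiles_cmd(raw_profiles, profdata):
    """Stage 2: merge the .profraw files produced by running a workload."""
    return ["llvm-profdata", "merge", f"-output={profdata}", *raw_profiles]


def optimized_build_cmd(source, output, profdata):
    """Stage 3: rebuild using the merged profile, with LTO enabled."""
    return ["clang", "-O2", "-flto", f"-fprofile-use={profdata}",
            source, "-o", output]


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed rather than executed.
    for cmd in (
        instrumented_build_cmd("main.c", "app"),
        merge_profiles_cmd(["pgo-data/default.profraw"], "pgo-data/merged.profdata"),
        optimized_build_cmd("main.c", "app", "pgo-data/merged.profdata"),
    ):
        print(" ".join(cmd))
```

The instrumented binary has to be run on a representative workload between stage 1 and stage 2, which is where most of the extra build time in a PGO cycle comes from.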
One of the key focuses early in the project is reducing build time. Every feature can either add to or subtract
from the time it takes to produce a package. With a source/binary hybrid model, users will greatly benefit from
faster builds as well. What I’ve targeted in these tests is the performance of
clang, along with testing some
compiler flag options.
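Since build time is the metric of interest, one simple way to measure it is to time a command over several runs and look at the median. The sketch below is not benchmarking-tools itself: `time_command` and the stand-in workload are hypothetical and only illustrate the idea.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failed build raise instead of being timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload; swap in a real configure or build command.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"median: {statistics.median(samples):.3f}s over {len(samples)} runs")
```

Using the median rather than the mean keeps a single slow run (a cold cache, for instance) from dominating the result.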
Clang Shows its Promise
clang has always been a compiler with a big future. Its performance credentials have been improving with each release,
and we are now starting to see it perform strongly against its
GNU counterpart. It is common to hear that
clang is slow
and produces less optimized code. I will admit that most distros provide a slow build of
clang, but that will not be
the case in Serpent OS.
It is important to note that in this comparison the Host distro has pulled in some patches from
LLVM-13 that greatly
improved the performance of
clang. Prior to this, their tests actually took
50% longer for
10% longer for compiling.
boulder does not yet support patching in builds so the packages are completely
| Test using clang | Serpent | Host | Difference |
Based on the results during testing, the performance of
clang in Serpent OS still has room to improve, as this was just a
quick tuning pass. At stages where I would have expected to be ahead already, the compile performance was only equal
configure were still well ahead).
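A Difference figure, like the "50% longer" and "10% longer" quoted earlier, can be computed as a relative change against the Host time. This is only a sketch: the function name and the sign convention (negative means Serpent OS was faster) are assumptions, and the numbers in the note after the code are placeholders, not measurements.

```python
def percent_difference(serpent_seconds, host_seconds):
    """Relative difference of Serpent vs Host in percent.

    Negative means Serpent was faster, positive means it took longer.
    """
    if host_seconds <= 0:
        raise ValueError("host time must be positive")
    return (serpent_seconds - host_seconds) * 100.0 / host_seconds
```

With placeholder inputs, `percent_difference(150.0, 100.0)` gives `50.0` (50% longer than the Host), while `percent_difference(90.0, 100.0)` gives `-10.0` (10% faster).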
GCC Matters Too!
While clang is the default compiler in Serpent OS, there may be instances where the performance is not quite where it
could be. It is common for software that is not tested with
clang upstream to have more optimized code paths that clang builds do not use. As
an example, here are a couple of patches in flac (1,
2) that demonstrate this being improved.
With benchmarking-tools, it is easy to see where
clang builds are running different functions via
In circumstances where the slowdown is due to hitting poor optimization paths in
clang, we always have the option to
build packages using
gcc, and the
GNU toolchain is essential for building
glibc. Therefore having a solid
toolchain is important, but small compile time improvements won’t be noticed by users or developers as much.
| Test using gcc | Serpent | Host | Difference |
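The idea of spotting where clang builds run different functions can be sketched as a comparison of two per-function profiles. This is not how benchmarking-tools does it: the function name, the profile format (function name mapped to percent of samples) and the example function names are all hypothetical.

```python
def functions_only_in(profile_a, profile_b, min_share=1.0):
    """Return functions taking at least min_share percent of samples in
    profile_a that do not appear at all in profile_b, sorted by name."""
    return sorted(
        name for name, share in profile_a.items()
        if share >= min_share and name not in profile_b
    )


if __name__ == "__main__":
    # Made-up profiles, for illustration only.
    clang_profile = {"decode_generic": 40.0, "read_frame": 10.0}
    gcc_profile = {"decode_simd": 35.0, "read_frame": 12.0}
    print(functions_only_in(clang_profile, gcc_profile))
    print(functions_only_in(gcc_profile, clang_profile))
```

A function that is hot in one build and missing from the other is a hint that the two compilers ended up on different code paths, which is the kind of case the flac patches address.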
An OS is More Than Just a Compiler
While the current bootstrap exists only as a starting point for building the rest of Serpent OS, there are some other packages we can easily test and compare. Here’s a summary of those results.
State of the Bootstrap
From my experience with testing the bootstrap, it is clear there are some cobwebs in there that require some more iterations of the toolchain. There also seem to be some slowdowns from not including all the dependencies of some packages. Once more packages are included, naturally all the testing will be redone and will help influence the default compiler flags of the project.
It’s not yet clear what the experience of using
libstdc++ with the
clang compiler will be. Once the cobwebs are out and
Serpent OS is further developed, the impact (if any) should become more obvious. There are also some parts not yet included in
boulder, such as stripping files, LTO and other flags by default that will speed up loading libraries. At this stage this is
deliberate, until outputs from builds (such as symbol information) are integrated.
But this provides an excellent platform to build out the rest of the OS. The raw speed of the
clang compiler will make
iterating and expanding the package set a real joy!
Hang On, What’s Going on With Python?
Very astute of you to notice!
python in its current state is an absolute minimal build of
python in order to run
However, I did an
analyze run in
benchmarking-tools where it became obvious that they were doing completely different
For now I’ll simply be assuming this will sort itself out when
python is built complete with all its functionality. And
before anyone wants to point the finger at
clang, you get the same result with