Who Knew Building Packages Could Be So Easy?

The goal of boulder is to make creating packages so easy that anyone can do it! For veterans, it means time can be spent on the more important parts of the distro. As boulder and moss share the same codebase, the advantages built into moss also benefit boulder! There's very little to learn before getting started!

Reducing the time spent packaging saves everyone time, over and over again. This is why the process is highly tuned to minimize the time spent at each step. Packagers build a lot of packages, and users can take advantage of source builds without long wait times. Every second counts, and we save a lot of them!

Really Fast to Set Up Builds

A real time sink for shorter builds is the time it takes to create the build environment and extract the needed dependencies. We've seen many variations of build environments, from setting up a root image in a minimal chroot or container through to full VM builds. For a package like nano the build stage takes only a couple of seconds, so spending up to a minute setting up the build would be unacceptable!

One of the fundamental differences of boulder and moss is the sharing of installed packages: a new package only needs to be extracted and cached once, and can then be reused across multiple system roots and as many builds as needed. If you've used the package before, you won't have to extract it again. Waiting for packages to extract at the start of a build is now a thing of the past!

A New Parallel Model

There have been many good improvements to speed up build setups, such as parallel downloads and using zstd for faster decompression of packages. But underneath, the process is still sequential: the files are downloaded in parallel, but you have to wait until all of them have finished downloading before installation begins. With boulder (and moss), packages are queued for extraction (caching) as soon as they are downloaded! So by the time the last file finishes downloading, the other files have already been installed and the build can get started.

Post-Build Analysis Also in Parallel

Using the same parallel model, we are able to make huge reductions in the time taken by the post-build stages as well. At the end of the build, files are scanned so they can be processed according to their type, and hashes are created for each file. As an example, for an ELF file we need to separate out the debug information and strip the final binary to minimize its size. There's a lot of overhead involved (especially when calling out to external programs like strip), so processing files sequentially takes a long time. The good news is that boulder handles these files in parallel too, which really cuts down on the wait after the build is complete.

Optimized Toolchain to Make Builds Faster

While this isn't technically boulder, a turbo-charged compiler compounds the gains from the parallel pre- and post-build stages. We go to great lengths to build clang with PGO+LTO (and soon with llvm-bolt). By tackling each part of the process, we make packaging easy and less time-consuming.

Simple Yet Flexible Build Format

The stone.yml packaging format is quite simple, yet flexible when required. Using easy-to-understand YAML formatting, a stone.yml file is compact for the majority of cases. Here's a simple example of the nano build file:

name        : nano
version     : 5.8
release     : 1
summary     : GNU Text Editor
license     : GPL-3.0-or-later
homepage    : https://www.nano-editor.org/
description : |
        The GNU Text Editor
upstreams   :
    - https://www.nano-editor.org/dist/v5/nano-5.8.tar.xz : e43b63db2f78336e2aa123e8d015dbabc1720a15361714bfd4b1bb4e5e87768c
setup       : |
        %configure
build       : |
        %make
install     : |
        %make_install

The rules for how packages are split into separate -devel packages are taken care of for you. This helps new packagers a lot, as there's no need to learn all the packaging rules, and it reduces the number of mistakes they make. In rare circumstances you may need to override one of these rules, and boulder makes that easy too!
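
As a rough sketch of what such an override can look like (hedged: the packages stanza below follows the general shape of the stone.yml format, but the exact keys and the example path are assumptions for illustration), you simply list the paths you want routed into a given subpackage:

packages    :
    - nano-devel:
        paths:
            - /usr/bin/nano-config

Anything not matched by an override continues to follow the automatic splitting rules.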

Details of the build are output as manifest files, which track the files contained in each package, the dependencies, and the ABI. As the manifests are tracked in git, it's easy to see the result of any change to the build.

A Keen Eye On Performance

With the importance of performance in Serpent OS, it is essential that boulder can output fast and small packages. One compiler does not always provide the fastest result, so boulder provides an option to switch between the LLVM and GNU toolchains (LLVM being the default).

Compatibility and performance are not mutually exclusive. boulder is designed from the outset with a multi-architecture approach, meaning we can build a package multiple times for older and newer CPU generations. A baseline of x86-64-v2 provides compatibility for processors released all the way back in 2010, while the more optimized x86-64-v3+ builds provide greater performance for more recent CPUs that include the newer capabilities.

One part of the build process that is easy to automate is building with profile-guided optimizations (PGO). This involves building with instrumentation, running a representative workload to learn how the program runs and which paths it typically takes, and then rebuilding so the compiler can use that profile to produce a more efficient program and reduce cache pressure. It does increase the build time quite a lot, but is well worth it in many circumstances. boulder also includes support for two-stage context-sensitive PGO with clang.
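
In recipe terms, the instrumented run is just another build step. A minimal sketch, assuming a workload key alongside the steps shown earlier (the key name and the command here are illustrative assumptions, not confirmed boulder syntax):

build       : |
    %make
workload    : |
    ./src/nano --help

boulder can then perform the rebuild with the collected profile without any further input from the packager.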

Compiler flags are another way to squeeze the last bit of performance out of package builds. The performance gain in python from adding -fno-semantic-interposition was well into double figures, which highlights the benefit of adding performance flags where appropriate. boulder exposes a range of tunables to add to (or remove from) the defaults on a per-build basis to help facilitate performance testing. But how do you know which options improve performance?
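
As a hedged example (the tuning stanza below follows the spirit of the stone.yml format, but the option names are assumptions for illustration), enabling extra tunables on a per-build basis might look like:

tuning      :
    - no-semantic-interposition
    - lto

The idea is that each entry maps to a vetted set of compiler flags, so recipes stay readable instead of accumulating raw flag soup.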

Getting the Most Out of Performance Options

Benchmarking can be a time-consuming process, but it doesn't have to be. Integrating benchmarks with the packaging files makes them easily accessible and allows all users to test the performance of packages. It also makes it extremely easy to automate tests within boulder, comparing multiple configurations and checking whether any tuning flags have a positive performance impact. Benchmarking is more than a one-off process; regular testing allows us to identify regressions early and make data-driven choices.
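
As a purely hypothetical sketch (the benchmark key and helper script here are invented for illustration and may not match boulder's eventual syntax), an integrated benchmark could be expressed just like any other build step:

benchmark   : |
    ./scripts/run-benchmark.sh

Keeping the benchmark next to the recipe means any configuration change can be measured immediately with the same command.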

Automate As Much As Possible

With differences between distros, manually adding package names can be a real hassle. boulder is designed to detect as many build dependencies as possible so that you don't have to! It also reads the build file to determine the build tooling required to complete the build. So if you use the %cmake macro, we already know you'll want the cmake package installed; you don't need to muck around ensuring every little dependency is included, or restart the build because you forgot one. This is also true for the dependencies of created packages, where only required dependencies are added. So if a package drops a dependency (which you may not even be aware of), you don't have to remember to remove it.
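
For instance, a CMake-based recipe only needs to use the relevant macros, and boulder can infer the tooling from them. A minimal sketch, assuming %cmake, %cmake_build and %cmake_install macros in the same spirit as the %configure and %make macros used above:

setup       : |
    %cmake
build       : |
    %cmake_build
install     : |
    %cmake_install

Because %cmake appears in the recipe, cmake (and its own dependencies) can be pulled into the build environment automatically.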
