Performance Corner: Faster Builds, Smaller Packages

Friday, November 5, 2021 - Peter O'Connor (Performance Guy)

Performance Corner is a new series where we highlight to you some changes in Serpent OS that may not be obvious, but show a real improvement. Performance is a broad term that also covers efficiency, so things like making files smaller, making things faster or reducing power consumption. In general things that are unquestionably improvements with little or no downside. While the technical details may be of interest to some, the main purpose is to highlight the real benefit to users and/or developers that will make using Serpent OS a more enjoyable experience. Show me the numbers!

Here we focus on a few performance changes Ikey has been working on to the build process that are showing some pretty awesome results! If you end up doing any source builds, you’ll be thankful for these improvements. Special thanks to ermo for the research into hash algorithms and enabling our ELF processing.

Start From the Beginning

When measuring changes, it’s always important to know where you’re starting from. Here are some results from a recent glibc build, but before these latest changes were incorporated.

Payload: Layout [Records: 5441 Compression: Zstd, Savings: 83.13%, Size: 673.46 KB]
Payload: Index [Records: 2550 Compression: Zstd, Savings: 55.08%, Size: 247.35 KB]
Payload: Content [Records: 2550 Compression: Zstd, Savings: 81.46%, Size: 236.72 MB]
 ==> 'BuildState.Build' finished [4 minutes, 6 secs, 136 ms, 464 μs, and 7 hnsecs]
 ==> 'BuildState.Analyse' finished [21 secs, 235 ms, 300 μs, and 2 hnsecs]
 ==> 'BuildState.ProducePackages' finished [25 secs, 624 ms, 996 μs, and 8 hnsecs]

The build time is a little high, but a lot of that is due to a slow compiler on the host machine. But analysing and producing packages was also taking a lot longer than it needed to.

The Death of moss-jobs in boulder

In testing an equivalent build outside of boulder, the build stages were about 5% faster. Testing under perf, the jobs system was a bit excessive for the needs of boulder, polling for work when we already know the times when parallel jobs would be useful. Removing moss-jobs allowed for simpler codepaths using multiprocessing techniques from the core language. This work is integrated in moss-deps and the excess overhead of the build has now been eliminated.

Before:
 ==> 'BuildState.Build' finished [4 minutes, 6 secs, 136 ms, 464 μs, and 7 hnsecs]
 ==> 'BuildState.Analyse' finished [21 secs, 235 ms, 300 μs, and 2 hnsecs]
After:
[Build] Finished: 3 minutes, 53 secs, 386 ms, 306 μs, and 4 hnsecs
[Analyse] Finished: 8 secs, 136 ms, 22 μs, and 8 hnsecs

The new results reflect a 26s reduction in the overall build time. But only 13s of this relates to the moss-jobs removal. The other major change is making the analyse stage parallel in moss-deps (a key part of why we wanted parallelism to begin with). Decreasing the time from 21.2s to 8.1s is a great achievement despite it doing more work as we’ve also added ELF scanning for dependency information in-between these results.

New Hashing Algorithm

One of the unique features in moss is using hashes for file names which allows full deduplication within packages, the running system, previous system roots and for source builds with boulder. Initially this was hooked up using sha256, but it was proving to be a bit of a slowdown.

Enter xxhash, the hash algorithm by Yann Collet for use in fast decompression software such as lz4 and zstd (and now in many places!). This is seriously fast, with the potential to produce hashes faster than RAM can feed the CPU. The hash is merely used as a unique identifier in the context of deduplication, not a cryptographic verification of origin. XXH3_128bit has been chosen due to it having an almost zero probability of a collision across 10s of millions of files.

The benefit is actually two-fold. First of all, the hash length is halved from sha256, so there’s savings in the package metadata. This shouldn’t be understated as hash data is generally not as compressible as typical text and there are packages with a lot of files! Here the metadata for the Layout and Index payloads has reduced by 232KB! That’s about a 25% reduction with no other changes.

Before:
Payload: Layout [Records: 5441 Compression: Zstd, Savings: 83.13%, Size: 673.46 KB]
Payload: Index [Records: 2550 Compression: Zstd, Savings: 55.08%, Size: 247.35 KB]
After:
Payload: Layout [Records: 5441 Compression: Zstd, Savings: 86.66%, Size: 522.97 KB]
Payload: Index [Records: 2550 Compression: Zstd, Savings: 60.02%, Size: 165.75 KB]

Compressed this turns out to be about a 89KB reduction in the package size. For larger packages, this probably doesn’t mean much but could help a lot more with delta packages. For deltas, we will be including the full metadata of the Layout and Index payloads, so the difference will be more significant there.

The other benefit of course is the speed and the numbers speak for themselves! A further 6.4s reduction in build time removing most of the delay at the end of the build for the final package. This will also improve speeds for caching or validating a package.

Before:
 ==> 'BuildState.Analyse' finished [21 secs, 235 ms, 300 μs, and 2 hnsecs]
After:
[Analyse] Finished: 1 sec, 688 ms, 681 μs, and 8 hnsecs

With these changes combined, building packages can take 12x less time in the analyse stage, while reducing the size of the metadata and the overall package. We do expect the analyse time to increase in future as we add more dependency types, debug handling and stripping, but with the integrated parallel model, we can minimize the increase in time.

We’re Not Done Yet

The first installment of Performance Corner shows some great wins to the Serpent OS tools and architecture. This is just the beginning and there will likely be a follow up soon (you may have also noticed that it takes too long to make the packages), and there’s a couple more tweaks to further decrease the size of the metadata. Kudos to Ikey for getting these implemented!