
Unpacking the Build Process: Part 2

Part 2 looks at the core of the build process: turning source into compiled code. In Serpent OS this is handled by our build tool, boulder. It is usually the longest part of the build, so it is where speed-ups have the most impact. How long it takes comes down largely to the performance of your compiler and the compile flags you are building with.

This post follows on from Part 1.

Turning Source into Compiled Code

The steps for compiling code are generally quite straight-forward:

  • Setting up the build (cmake, configure, meson)
  • Compiling the source (in parallel threads)
  • Installing the build into a package directory

This builds software compiled against packages installed on your system. Packaging is a bit more complicated, as we first set up a clean environment to compile in (Part 1). Even then there are many choices to make, and each can have an impact on how long it takes to compile the code. Do you build with Link Time Optimizations (LTO) or Profile Guided Optimizations (PGO)? Do you build the package for performance or for the smallest size? Then there are packages that benefit considerably from individual tuning flags (like -fno-semantic-interposition with python). With so many possibilities, boulder helps us utilize them through convenient configuration options.

What Makes boulder so Special?

As I do a lot of packaging and performance tuning, boulder is where I spend most of my time. Here are some key features that boulder brings to make my life easier.

  • Ultimate control over build C/CXX/LDFLAGS via the tuning key
  • Integrated two-stage, context-sensitive PGO builds with a single-line workload
  • Able to switch between the GNU and LLVM toolchains easily
  • Rules-based package creation
  • Control the extraction locations for multiple upstream tarballs

boulder will also be used to generate and amend our stone.yml files, taking care of as much as possible automatically. This is only the beginning for boulder: it will keep learning new tricks to make packaging more automated, surface more information to help packagers see where they can improve their stone.yml, and alert them when something might be missing.

Serpent OS is focused on the performance of the packages it produces, even if that means builds take longer to complete. This is why we have put significant effort into speeding up the compiler and the setup tools, to offset and minimize the extra time needed to enable greater performance.

Why do You Care so Much About Setup Time?

My initial testing focused on the performance of clang as well as the time taken to run cmake and configure. This lays the foundation for all future work in expanding the Serpent OS package archives at a much faster pace. On the surface, running cmake can be a small part of the overall build. However, it matters because it runs on a single thread, so unlike compilation it is not sped up by adding more CPU cores. With a compile-heavy build, our highly tuned compiler can build the source in around 75s. Trimming the setup step from 10s to 5s therefore takes the overall build from 85s to 80s, an additional 6% reduction in build time!

There are many smaller packages where the setup time is an even higher proportion of the overall build, and it becomes more relevant as you increase the number of threads on the builder. For example, when building nano on the host, the configure step takes 13.5s while the build itself takes only 2.3s, so there are significant gains to be had from speeding up the setup stage of a build (which we will absolutely be taking advantage of!).

A Closer Look at the clang Compiler's Performance

A first cut of the compiler results was shared earlier in Initial Performance Testing, and given its importance to overall build time, I've been taking a closer look. In that post I said that "At stages where I would have expected to be ahead already, the compile performance was only equal", and now I have identified the discrepancy.

I've tested multiple configurations for the clang compiler and noticed that changing the default standard C++ library makes a difference to the time of this particular build. The difference between the two runs is compiling llvm-ar against the LLVM libraries of compiler-rt/libc++/libunwind versus the GNU libraries of libgcc/libstdc++. To be clear, this is about the increased time to compile llvm-ar with libc++ vs libstdc++, not the runtime performance of either library. The clang compiler itself is built with libc++ in both cases, as that produces a faster compiler.


Test using clang    | Serpent LLVM libs | Serpent GNU libs | Host
cmake LLVM          | 5.89s             | 5.67s            | 10.58s
Compile -j4 llvm-ar | 126.16s           | 112.51s          | 155.32s
configure gettext   | 36.64s            | 36.98s           | 63.55s

The host now takes 38% longer than the Serpent OS clang when building with the same GNU libraries and is much more in line with my expectations. Next steps will be getting bolt and perf integrated into Serpent OS to see if we can shave even more time off the build.

What remains unclear is whether this difference is due to something specifically in the LLVM build or whether it would translate to other C++ packages. I haven't noticed a 10% increase in build time when performing the full compiler build with libc++ vs libstdc++.

Unpacking the Build Process: Part 1

While the build process (or packaging as it's commonly referred to) is largely hidden from most users, it forms a fundamental and important aspect of the efficiency of development. In Serpent OS this efficiency also extends to users via source based builds for packages you may want to try or use that aren't available as binaries upstream.

The build process can be thought of as three distinct parts: setting up the build environment, compiling the source, and post-build analysis plus package creation. Please note that this process hasn't been finalized in Serpent OS, so we will be making further changes to it where possible.

Setting up the Build Environment

Some key parts to setting up the build environment:

  • Downloading packages needed as dependencies for the build
  • Downloading upstream source files used in the build
  • Fetching and analyzing the latest repository index
  • Creating a reproducible environment for the build (chroot, container or VM for example)
  • Extracting and installing packages into the environment
  • Extracting tarballs for the build (this is frequently incorporated as part of the build process instead)

While the focus of early optimization work has been on build time performance, there's more overhead to creating packages than simply compiling code. Now that the compiler is in a good place, we can explore the rest of the build process.

Packaging is More than just Compile Time

There's been plenty of progress in speeding up the creation of the build environment such as parallel downloads to reduce connection overhead and using zstd for the fast decompression of packages. But there's more that we can do to provide an optimal experience to our packagers.
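
Purely as an illustration of the parallel download idea (and not the actual moss/boulder code), a concurrent fetch loop in D could look something like the sketch below, with hypothetical URLs and cache paths:

    /* Illustrative sketch only: fetch a set of upstreams concurrently.
     * The real moss/boulder fetcher is more involved than this. */
    import std.net.curl : download;
    import std.parallelism : parallel;
    import std.path : baseName, buildPath;

    void fetchAll(string[] urls, string cacheDir)
    {
        /* Each iteration runs on the shared task pool, so several downloads
         * are in flight at once, hiding per-connection overhead. */
        foreach (url; parallel(urls))
        {
            download(url, buildPath(cacheDir, url.baseName));
        }
    }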

Some parts of the process are challenging to optimize: while you can download multiple files at once to ensure maximum throughput, you are still ultimately limited by your internet speed. When packaging regularly (or building a single package multiple times), downloaded files are cached, so they become a one-off cost. One part we have taken particular interest in speeding up is extracting and installing packages into the environment.


The Dynamic Duo: boulder and moss

Installing packages into a clean environment can be the most time consuming part of setting up the build (excluding fetching files, which is highly variable). Serpent OS has a massive advantage with the design of moss, where packages are cached (extracted on disk) and ready to be used by multiple roots, including the creation of clean build environments for boulder. Having seen a few build systems in action, setting up the root can take quite some time with a large number of dependencies (even over a minute). moss avoids the cost of extracting packages on every build entirely by utilizing its cache!

There are also secondary benefits to how moss handles packages via its cache: disk writes are reduced because packages only need to be extracted a single time. But hang on, won't you be using tmpfs for builds? Of course RAM builds will be an option, and there are benefits there too! Extracting packages to a RAM disk consumes memory, which can add up to more than a GB before the build even begins. moss allows us to start with an empty tmpfs, so we can perform larger builds before exhausting the memory available on the system.

Another great benefit is due to the atomic nature of moss. This means that we can add packages to be cached as soon as they're fetched while waiting for the remaining files to finish downloading (both for boulder and system updates). Scheduling jobs becomes much more efficient and we can have the build environment available in moments after the last file is downloaded!

moss allows us to eliminate one of the bigger time sinks in setting up builds, enabling developers and contributors alike to be more efficient in getting work done for Serpent OS. With greater efficiency it may become possible to provide a second architecture for older machines (if the demand arises).

Part 1?

Yes, there's plenty more to discuss, so there will be more follow-up posts showing the cool things Serpent OS is doing both to reduce the time taken to build packages and to make packages easier to create. Stay tuned!

A Rolling Boulder Gathers No Moss

We actually did it. Super pleased to announce that moss is now capable of installing and removing packages. Granted, super rough, but gotta start somewhere right?

Transactional system roots + installation in moss

OK let's recap. A moss archive is super weird, and consists of multiple containers, or payloads. We use a strongly typed binary format, per-payload compression (Currently zstd), and don't store files in a typical archive fashion.

Instead a .stone file (moss archive) has a Content Payload, which is a compressed "megablob" of all the unique files in a given package. The various files contained within that "megablob" are described in an IndexPayload, which simply contains some IDs and offsets, acting much like a lookup table.

That data alone doesn't actually tell us where files go on the filesystem when installed. For that, we have a specialist Layout Payload, encoding the final layout of the package on disk.

As can be imagined, the weirdness made it quite difficult to install in a trivial fashion.

Databases

Well, persistence really. Thanks to RocksDB and our new moss-db project, we can trivially store information we need from each package we "precache". Primarily, we store full system states within our new StateDB, which at present is simply a series of package ID selections grouped by a unique 64-bit integer.

Additionally we remember the layouts within the LayoutDB so that we can eventually recreate said layout on disk.
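
As a purely hypothetical illustration of the kind of records involved (the real moss types and field names will differ), the two databases hold roughly this shape of data:

    /* Hypothetical illustration only; not moss's actual schema. */
    struct State
    {
        ulong id;            /* unique 64-bit state identifier */
        string[] selections; /* package IDs selected in this state */
    }

    struct LayoutEntry
    {
        string pkgID;  /* package the entry belongs to */
        string target; /* final path on disk, e.g. /usr/bin/nano */
        string hash;   /* ID of the unique file in the content pool */
        uint mode;     /* permissions to apply when recreating the layout */
    }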

Precaching

Before we actually commit to an install, we try to precache all of the stone files in our pool. We unpack the content payload ("megablob") and split it into its unique files in the pool, ready for use. At this point we also record the Layouts, but do not "install" the package to a system root.

Blitting

This is our favourite step. When our cache is populated, we gather all relevant layouts for the current selections, and then begin applying them in a new system root. All directories and symlinks are created as normal, whereas any regular file is hardlinked from the pool. This process takes a fraction of a second and gives us completely clean, deduplicated system roots.
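
A simplified sketch of what a blit amounts to is shown below; the names and types are illustrative rather than moss's actual API. Directories and symlinks are created normally, while regular files are hardlinked straight out of the pool:

    /* Sketch of the blit step; illustrative only. */
    import core.sys.posix.unistd : link;
    import std.file : mkdirRecurse, symlink;
    import std.path : buildPath, dirName;
    import std.string : toStringz;

    enum EntryKind { directory, symlink, regular }

    struct Layout
    {
        EntryKind kind;
        string target; /* path relative to the new root */
        string source; /* symlink target, or hash of the pooled file */
    }

    void blit(Layout[] layouts, string poolDir, string rootDir)
    {
        foreach (l; layouts)
        {
            auto dest = buildPath(rootDir, l.target);
            final switch (l.kind)
            {
            case EntryKind.directory:
                mkdirRecurse(dest);
                break;
            case EntryKind.symlink:
                symlink(l.source, dest);
                break;
            case EntryKind.regular:
                mkdirRecurse(dest.dirName);
                /* Hardlink from the pool: no data is copied, which is why
                 * a complete root appears in a fraction of a second. */
                link(buildPath(poolDir, l.source).toStringz, dest.toStringz);
                break;
            }
        }
    }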

Currently these live in /.moss/store/root/$ID/usr. To complete the transaction, we atomically update /usr to point to the new usr tree, assuming that a reboot isn't needed. In future, boot switch logic will update the tree for us.
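
A common way to make that switch atomic (a sketch, not necessarily the exact mechanism moss will use) is to create the new symlink under a temporary name and then rename it over the old one, since rename(2) is atomic on the same filesystem:

    /* Illustrative sketch of an atomic /usr switch. */
    import std.conv : to;
    import std.file : rename, symlink;

    void switchUsr(ulong stateID)
    {
        auto target = "/.moss/store/root/" ~ stateID.to!string ~ "/usr";
        symlink(target, "/usr.next");
        rename("/usr.next", "/usr"); /* atomically replaces the old symlink */
    }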

Removal

Removal is much the same as installation. We simply remove the package IDs from the new state selections (copied from the last state) and blit a new system root, finally updating the atomic /usr pointer.


Tying it all together

We retain classic package management traits such as having granular selections, multiple repositories, etc, whilst sporting advanced features like full system deduplication and transactions/rollbacks.

When we're far enough along, it'll be possible to boot back to the last working transaction without requiring an internet connection. Due to the use of pooling and hardlinks, each transaction tree is only a few KiB, with files shared between each transaction/install.

On the list...

We need some major cleanups, better error handling, logging, timed functions, and an eventloop driven process to allow parallel fetching/precaching prior to final system rootfs construction.

It's taken us a very long time to get to this point, and there is still more work to be done. However this is a major milestone and we can now start adding features and polish.

Once the required features are in place, we'll work on the much needed pre alpha ISO :) If you fancy helping us get to that stage quicker, do check out our OpenCollective! (We won't limit prealpha availability, don't worry :))

Moss DB Progress

I'll try to make this update as brief as I can but it's certainly an important one, so let's dive right into it. The last few weeks have been rough but work on our package manager has still been happening. Today we're happy to reveal another element of the equation: moss-db.

{{}}

moss-db is an abstract API providing access to simplistic "Key Value" stores. We had initially used some payload-based files as databases, but that introduced various hurdles, so we decided to take a more abstract approach so as not to tie ourselves to any specific database implementation.

Our main goal with moss-db is to encapsulate the RocksDB library, providing sane, idiomatic access to a key value store.

High level requirements

At the highest level, we needed something that could store arbitrary keys and values, grouped by some kind of common key (commonly known as "buckets"). We've succeeded in that abstraction, which also required forking a rocksdb-binding to add the Transform APIs we needed.

Additionally we required idiomatic range behaviours for iteration, as well as generic access patterns. To that effect we can now foreach a bucket, pipe it through the awesomely powerful std.algorithm APIs, and automatically encode/decode keys and values through our generic APIs by implementing the mossdbEncode() and mossdbDecode() functions for a specific type.

In a nutshell, this was the old, ugly, hard way:

    /* old, hard way */
    import core.stdc.string : strlen;
    import std.bitmanip : nativeToBigEndian;
    import std.string : toStringz;

    auto nameZ = name.toStringz();
    int age = 100;
    ubyte[int.sizeof] ageEncoded = nativeToBigEndian(age);
    db.setDatum(cast(ubyte[]) (nameZ[0 .. strlen(nameZ)]), ageEncoded);

And this is the new, shmexy way:

    db.set("john", 100);
    db.set("user 100", "bobby is my name");

    auto result = db.get!int("john");
    if (result.found)
    {
        writeln(result.value);
    }

    auto result2 = db.get!string("user 100");
    if (result2.found)
    {
        writeln(result2.value);
    }

It's quite easy to see that the new API lends itself robustly to our needs, allowing us to implement stateful, strongly typed databases for moss.

Next Steps

Even though some APIs in moss-db may still be lacking (remove, for example), we're happy that it can provide the foundation for our next steps. We now need to roll out the new StateDB, MetaDB and LayoutDB to record system states, package metadata, and filesystem layout information, respectively.

With those three pieces in place, we can combine the respective work into installation routines. Which, clearly, warrants another blog post ... :)

For now you can see the relevant repositories in our GitLab project.

Initial Performance Testing

With further progress on boulder, we can now build native stone packages with some easy tweaks such as profile guided optimizations (PGO) and link time optimizations (LTO). That means we can take a first look at what the performance of the first cut of Serpent OS shows for the future. The tests have been conducted using benchmarking-tools with Serpent OS measured in a chroot on the same host with the same kernel and config.

One of the key focuses early in the project is reducing build time. Every feature can add to or subtract from the time it takes to produce a package, and with a source/binary hybrid model, users will greatly benefit from faster builds as well. In these tests I've targeted the performance of clang and tried out some compiler flag options with cmake.

Clang Shows its Promise

clang has always been a compiler with a big future. Its performance credentials have been improving with each release, and it is now starting to perform strongly against its GNU counterpart. It is common to hear that clang is slow and produces less optimized code. I will admit that most distros ship a slow build of clang, but that will not be the case in Serpent OS.

It is important to note that in this comparison the Host distro has pulled in some patches from LLVM-13 that greatly improve the performance of clang. Prior to this, their tests actually took 50% longer for cmake and configure but only 10% longer for compiling. boulder does not yet support patching in builds so the packages are completely vanilla.

Test using clang    | Serpent | Host    | Difference
cmake LLVM          | 5.89s   | 10.58s  | 79.7%
Compile -j4 llvm-ar | 126.16s | 155.32s | 23.1%
configure gettext   | 36.64s  | 63.55s  | 73.4%

Based on the results during testing, the performance of clang in Serpent OS still has room to improve; this was just a quick tuning pass. At stages where I would have expected to be ahead already, the compile performance was only equal (though cmake and configure were still well ahead).

GCC Matters Too!

While clang is the default compiler in Serpent OS, there may be instances where performance is not quite where it could be. It is common to see software whose more optimized code paths aren't enabled with clang simply because upstream doesn't test with it. As an example, here's a couple of patches in flac (1, 2) that demonstrate this being improved. Using benchmarking-tools, it is easy to see from perf results where gcc and clang builds are running different functions.

In circumstances where the slowdown is due to hitting poor optimization paths in clang, we always have the option of building packages with gcc, and the GNU toolchain remains essential for building glibc. So having a solid GNU toolchain is important, but small compile time improvements there won't be noticed by users or developers as much.

Test using gcc    | Serpent | Host    | Difference
cmake LLVM        | 7.00s   | 7.95s   | 13.6%
Compile llvm-ar   | 168.11s | 199.07s | 18.4%
configure gettext | 45.45s  | 51.93s  | 14.3%

An OS is More Than Just a Compiler

While the current bootstrap exists only as a starting point for building the rest of Serpent OS, there are some other packages we can easily test and compare. Here's a summary of those results.

Test                           | Serpent   | Host      | Difference
Pybench                        | 1199.67ms | 1024.33ms | -14.6%
xz Compress Kernel (-3 -T1)    | 42.67s    | 46.57s    | 9.1%
xz Compress Kernel (-9 -T4)    | 71.25s    | 76.12s    | 6.8%
xz Decompress Kernel           | 8.03s     | 8.18s     | 1.9%
zlib Compress Kernel           | 12.60s    | 13.17s    | 4.5%
zlib Decompress Kernel         | 5.14s     | 5.21s     | 1.4%
zstd Compress Kernel (-8 -T1)  | 5.77s     | 7.06s     | 22.3%
zstd Compress Kernel (-19 -T4) | 51.87s    | 66.52s    | 28.3%
zstd Decompress Kernel         | 2.90s     | 3.08s     | 6.3%

State of the Bootstrap

From my experience testing the bootstrap, it is clear there are some cobwebs in there that will require a few more iterations of the toolchain. There also appear to be some slowdowns caused by not yet including all the dependencies of some packages. Once more packages are included, all the testing will naturally be redone and will help influence the default compiler flags of the project.

It's not yet clear what the experience of using libc++ vs libstdc++ with the clang compiler will be. Once the cobwebs are out and Serpent OS is further developed, the impact (if any) should become more obvious. There are also some things not yet enabled in boulder by default, such as stripping files, LTO and other flags that will speed up loading libraries. At this stage this is deliberate, until outputs from builds (such as symbol information) are integrated.

But this provides an excellent platform to build out the rest of the OS. The raw speed of the clang compiler will make iterating and expanding the package set a real joy!

Hang On, What's Going on With Python?

Very astute of you to notice! python in its current state is an absolute minimal build, just enough to run meson. However, an analyze run in benchmarking-tools made it obvious that the two builds were doing completely different things.

Apples and oranges comparison

For now I'll simply be assuming this will sort itself out when python is built complete with all its functionality. And before anyone wants to point the finger at clang, you get the same result with gcc.

Boulder Keeps On Rolling

Squirrelled away in the background have been some great changes that bring boulder closer to its full potential. Here's a quick recap of some of the more important ones.

Boulder hard at work

Key Changes to Boulder

  • Fixed a path issue that prevented manifests from being written for 32-bit builds
  • Added keys to control where the tarballs are extracted to
      • This results in a greatly simplified setup stage when using multiple upstreams
  • More customizations to control the final c{,xx}flags exported to the build
  • Added a key to run at the start of every stage so definitions can be exported easily in the stone.yml file
  • Fixed an issue where duplicate hash files were being included in the Content Payload
      • This reduced the Content Payload size by 750MB for a glibc build with duplicate locale files
  • Finishing touches on profile guided optimization (PGO) builds, including clang's context-sensitive PGO
      • Fixed a few typos in the macros to make it all work correctly
      • Profile flags are now added to the build stages
      • Added the llvm profile merge steps after running the workload
      • Recreate a clean working directory at the start of each PGO phase

With all this now in place, the build stages of boulder are close to completion. But don't worry, there's plenty more great features to come to make building packages for Serpent OS simple, flexible and performant. Next steps will be testing out these new features to see how much they can add to the overall stage4 performance.

Let There Be Databases

We haven't been too great on sharing progress lately, so welcome to an overdue update on timelines, progress, and database related shmexiness.

Emerging DB design

OK, so you may remember moss-format, our module for reading and writing moss binary archives. It naturally contains much in the way of binary serialisation support, so we've extended the format to support "database" files. In reality, they are more like tables binary encoded into a single file, identified by a filepath.

The DB archives are currently stored without compression to ensure 0-copy mmap() access when loading from disk, as a premature optimisation. This may change in future if we find the DB files taking up too much disk space.
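
For a feel of what 0-copy access looks like in D (illustrative only, not the actual moss-format reader), Phobos' std.mmfile maps the file so we can slice it directly without reading it into memory first:

    /* Sketch: map an uncompressed DB file and read it without copying. */
    import std.mmfile : MmFile;

    void inspect(string dbPath)
    {
        scope mm = new MmFile(dbPath);   /* read-only mapping */
        auto bytes = cast(ubyte[]) mm[]; /* view over the whole file */
        /* ... decode table data directly from `bytes` ... */
    }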

So far we've implemented a "StateMetaDB", which stores metadata on every recorded State on the system, and right now I'm in the process of implementing the "StateEntriesDB", which is something akin to a binary encoded dpkg selections file with candidate specification reasons.

Next on the list is the LayoutsDB (file manifests) and the CacheDB, for recording refcounts of every cached file in the OS pool.

Integration with Serpent ECS

An interesting trial we're currently implementing is to hook the DB implementation up to our Entity Component system from the Serpent Engine, in order to provide fast, cache coherent, in memory storage for the DB. It's implemented using many nice DLang idioms, allowing the full use of std.algorithm APIs:

    auto states()
    {
        import std.algorithm : map;

        auto view = View!ReadOnly(entityManager);
        return view.withComponents!StateMetaArchetype
            .map!((t) => StateDescriptor(t[1].id, t[3].name, t[4].description,
                    t[1].type, t[2].timestamp));
    }
    ...

    /* Write the DB back in ascending numerical order */
    db.states
        .array
        .sort!((a, b) => a.id < b.id)
        .each!((s) => writeOne(s));

Tying it all together

Ok, so you can see we need basic DB types for storing the files for each moss archive, plus each cache and state entry. If you look at the ECS code above, it becomes quite easy to imagine how this will impact installation of archives. Our new install code will simply modify the existing state, cache the incoming package, and apply the layout from the DB to disk, before committing the new DB state.

In essence, our DB work is the current complex target, and installation is a <50 line trick tying it all together.

    /* Pseudocode */

    State newState...

    foreach (pkgID; currentState.filter!((s) => s.reason == SelectionReason.Explicit))
    {
        auto fileSet = layoutDB.get(pkgID);
        fileSet.array.sort!((a, b) => a.path < b.path).each!((f) => applyLayout(f));
        /* Record into new state */
        ...
    }

Til next time -

Ikey

Moss Unlocked

Well, it's not all doom and gloom these days. We've actually made some significant progress in the last few days, so it seems a good time to share a progress update.

Extracting content from moss archives

moss can now extract

Oh yeah, that totally happened. So, we can now successfully build moss packages from boulder and then extract them to disk once again with moss. This might sound totally uninteresting, but it demonstrates that our format is actually working as intended.

Admittedly the code is super rough within moss and somewhat proof of concept, however we're able to extract the contents of the moss archive and rebuild the layout on disk.

What makes extraction difficult?

Well, quirky new format for one. A moss archive currently consists of 4 "payloads", or major sections:

  • MetaPayload

    Contains all package information, with strongly typed keys.

  • IndexPayload

    Contains the IDs of all unique files (hash) and their offsets within the ContentPayload

  • LayoutPayload

    A sequence of specialised structs describing the final "layout" of the package on disk, with attributes, paths, and for regular files, the ID of the file in the ContentPayload to build this file from.

  • ContentPayload

    A binary blob containing every unique file from the package, in an order described by the IndexPayload. The files are stored sequentially with no gaps.

Additionally, each payload is independently compressed using zstd. In order to extract files to disk, we must first decompress ContentPayload to a temporary file. Next, we blit each file from the "megablob" to the cache store, using the IndexPayload to understand the offsets. Finally, we apply the instructions in LayoutPayload to construct the final layout on disk, hardlinking the cache assets into their final locations, setting attributes, etc.

Net result? Deduplication on a per package basis, and a system-wide deduplication policy allowing sharing of identical assets on disk between multiple packages. This will also power our core update mechanism, whereby each update is atomic, and is simply the difference on disk from the previous version, permitting a powerful rollback mechanism.

Room for improvement


There are areas where we're doing things inefficiently, and we'll certainly improve that in future revisions of the format. For example, IndexPayload actually wastes some bytes by storing redundant information that can be calculated at runtime. Additionally, we want to use the zstd C APIs directly to gain the level of control we actually need. We're also going to wrap the copy_file_range syscall to make extraction of the content payload more efficient and not rely on userspace copies.
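
For the copy_file_range part, the idea is to hand the copy to the kernel so the bytes never pass through userspace buffers. A rough sketch of such a wrapper is shown below (error handling omitted; assumes a glibc recent enough to expose the call):

    import std.stdio : File;

    /* libc wrapper for the copy_file_range syscall */
    extern (C) long copy_file_range(int fdIn, long* offIn, int fdOut,
                                    long* offOut, size_t len, uint flags);

    /* Copy `length` bytes starting at `offset` in `source` to the current
     * position of `dest`, entirely inside the kernel. */
    void copyRegion(File source, long offset, size_t length, File dest)
    {
        long off = offset;
        copy_file_range(source.fileno, &off, dest.fileno, null, length, 0);
    }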

However, we're working towards a prealpha, and some inefficiencies are OK. Our first port of call will be a prealpha image constructed from .stone files, produced by boulder, installed by moss. This will validate our toolchain and tooling and serve as a jumping off point for the project.

Stay tuned, there is a whole bunch of awesome coming now that moss is officially unlocked and progressing.

And finally

I want to thank everyone who is currently supporting the project. I also want to personally thank you for your understanding of the setbacks of real life, given the difficult times myself and my family have been going through. I hope it is clear that I remain committed to the project and its future, which is why we're transparently run and funded via OpenCollective.

Despite the rough times, work continues, and awesome people join our ranks on a regular basis. Stability is on the immediate horizon and work with Serpent OS grows exponentially. You can be part of our journey, and help us build an amazing community and project that outlives us all.

Optimising Package Distribution

Getting updates as fast as possible to users has made deltas a popular and sought after feature for distributing packages. Over the last couple of days, I've been investigating various techniques we can look at to support deltas in moss.

Trade-offs Between Packages and Deltas

Minimising the size of updates is particularly valuable where files are downloaded many times and even better if they're updated infrequently. With a rolling release, packages will be updated frequently, so creating deltas can become resource intensive, especially if supporting updates over a longer period of time. Therefore it's important to get the right balance between compression speed, decompression memory and minimising file size.

           | Package priorities         | How best to meet these needs
Developers | Creation speed             | Quickly created packages
Users      | File size and update speed | Size minimised deltas

From the user's point of view, minimising file size and upgrade time are the important priorities, but for a developer, the speed at which packages are created and indexed is vital to progression. Deltas differ from packages in that they aren't required immediately, so there's minimal impact in taking longer to minimise their size. By getting deltas right, we can trade off the size of packages to speed up development, while users are unaffected and fetch only size-optimised deltas.

Test Case - QtWebEngine

QtWebEngine provides a reasonable test case: the package is a mix of binaries, resources and translations, but above average in size (157.3MB uncompressed). The first trade-off of size for speed has already been made by adopting zstd in moss over xz, where even with maximum compression zstd is already 5.6% larger than xz. This is of course paid back by zstd's amazing decompression speed, which is magnitudes faster.

Compression levels with zstd

With maximum compression, large packages can take over a minute to compress. With a moderate increase in size, one can reduce compression time by 2-10x. While making me happier as a developer, it does create extra network load during updates.

Full Package    | zstd -16 -T0 | zstd -16 | zstd -19 -T0 | zstd -19 | zstd -22 | xz -9
Time to create  | 5.4s         | 26.8s    | 27.8s        | 56.0s    | 70.6s    | 66.5s
Size of package | 52.6MB       | 52.6MB   | 49.2MB       | 49.2MB   | 48.4MB   | 45.9MB

Deltas to the Rescue!

There are two basic methods for deltas. One simple method is to include only the files that have changed since the earlier release. With reproducible builds, it is typical to create the same file from the same inputs. However, with a rolling release model, files will frequently have small changes from dependency changes and new versions of the package itself, and in these circumstances the delta size starts to approach the full package anyway. As a comparison to other delta techniques, this approach resulted in a 38.2MB delta: it was a rebuild of the same version at a different time, so the resources and translations were unchanged (and therefore omitted from the delta).
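
For intuition, the "only changed files" approach boils down to comparing file hashes between the two releases; a toy sketch (not moss code) might look like this:

    /* Given path -> hash manifests for two releases, keep only the paths
     * whose content changed or that are new in the later release. */
    string[] changedFiles(string[string] oldManifest, string[string] newManifest)
    {
        string[] changed;
        foreach (path, hash; newManifest)
        {
            auto previous = path in oldManifest;
            if (previous is null || *previous != hash)
                changed ~= path;
        }
        return changed;
    }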

An alternative is a binary diff, which is a significant improvement when files have small changes between releases. bsdiff has long been used for this purpose and trying it out (without knowing much about it) I managed to create a delta of 33.2MB, not a bad start at all.

To highlight the weakness of the simple method, when you compare the delta across a version change, the simple delta was only a 7% reduction of the full package (as most files have changed), while using an optimal binary diff, it still managed to achieve a respectable 31% size reduction.

A New Contender

While looking into alternatives, I happened to stumble across a new feature in zstd which can be used to create deltas. As we already use zstd heavily, it should make integration easier. --patch-from allows zstd to use the old uncompressed package as a dictionary to create the newer package. In this way, common parts between the releases are reused to reduce the file size. Playing around, I quickly achieved the same size as bsdiff, and with a few tweaks was able to reduce the delta by a further 23.5%! The best part is that it has the same speedy decompression as zstd, so it will recreate most packages from deltas in the blink of an eye!

Delta          | Only changed files | bsdiff | zstd -19 | zstd -22 | zstd -22 --zstd=chainLog=30
Time to create | 60.8s              | 153.0s | 85.5s    | 111.6s   | 131.8s
Size of delta  | 38.2MB             | 33.2MB | 33.3MB   | 28.5MB   | 25.4MB
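
For reference, driving this from D is straightforward with std.process. The sketch below shows roughly how a delta could be created and applied with --patch-from; the paths, compression level and --long window size are illustrative, and large packages need a window at least as large as the reference file:

    import std.process : execute;

    /* Create a delta: compress the new package using the old uncompressed
     * package as a dictionary. */
    void createDelta(string oldPkg, string newPkg, string deltaPath)
    {
        execute(["zstd", "-19", "--long=31",
                 "--patch-from=" ~ oldPkg, newPkg, "-o", deltaPath]);
    }

    /* Apply a delta: decompression needs the same reference file and window. */
    void applyDelta(string oldPkg, string deltaPath, string outPath)
    {
        execute(["zstd", "-d", "--long=31",
                 "--patch-from=" ~ oldPkg, deltaPath, "-o", outPath]);
    }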

Next Steps

There's certainly a lot of information to digest, but the next step is to integrate a robust delta solution into the moss format. I really like the zstd approach, where you can tune for speed with an increase in size if desired. By minimising delta size, users benefit from smaller updates while developers benefit from faster package creation times.

Some final thoughts for future consideration:

  • zstd has seen many improvements over the years, so I believe that ratios and performance will see incremental improvements over time. Release 1.4.7 already brought significant improvements to deltas (which are reflected in this testing).
  • The highest compression levels (--ultra) are single threaded in zstd, so delta creation can be done in parallel to maximise utilisation.
  • Over optimising the tunables can have a negative impact on both speed and size. As an example, --zstd=targetLength=4096 did result in a 2KB reduction in size at the same speed, but when applied to different inputs (kernel source tree), it not only made it larger by a few hundred bytes, but added 4 minutes to delta creation!
  • Memory usage of applying deltas can be high for large packages (1.74GB for the kernel source tree) as it has to ingest the full size of the original uncompressed package. It is certainly possible to split up payloads with some delta users even creating patches on a per file basis. It is a bit more complicated when library names change version numbers each release with only the SONAME symlink remaining unchanged.
  • There's always the option to repack packages at higher compression levels later (when deltas are created). This solves getting the package 'live' ASAP and minimises the size (eventually), but adds some complication.

Moss Format: Read Write Support

It's been 8 days since our last blogpost and a lot of development work has happened in that time. Specifically, we've completely reworked the internals of the moss-format module to support read/write operations, which means installation is coming soon (TM).

Development work on moss-format

So, many commits have been made to the core repositories, however the most important project to focus on right now is moss-format, which we use to define and implement our binary and source formats. This module is shared between boulder, our build tool, and moss, our package manager.

We've removed the old enumeration approach from the Reader class, instead requiring that it processes data payloads in-place, deferring reading the content payload streams. We've also enforced strong typing to allow safe and powerful APIs:

        import moss.format.binary.payload.meta : MetaPayload;

        auto reader = new Reader(File(argv[0], "rb"));
        auto metadata = reader.payload!MetaPayload();

Right now we can read and write our MetaPayload from and to the stream, allowing us to encode & decode the metadata associated with the package, with strong type information (i.e. Uint64, String, etc.)

Next Steps

We need to restore the IndexPayload, LayoutPayload and ContentPayload definitions. The first two are simply data payloads and will largely follow the design of the newly reimplemented MetaPayload. Then we restore ContentPayload support, and this will allow the next steps: unpack, install.

Many of the baby steps required to power our binary format are now done. The design of the API allows powerful manipulation via the std.algorithm and std.range APIs, enabling extremely simple and reliable installation routines.

It might seem odd that we've spent so much time on the core format, however I should point out that the design of the format is central to the OS design. Our installation routine is not simply unpacking an archive.

With our binary format, the stream contains multiple PayloadHeaders, with fixed lengths, type, tag, version, and compression information. It is up to each Payload implementation to then parse the binary data contained within. Currently our Payloads are compressed using ZSTD, though other compression algorithms are supported.
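
As an illustration of the idea (field order and widths here are guesses for readability, not the real on-disk layout defined by moss-format), a payload header carries roughly this information:

    /* Illustrative only; see moss-format for the authoritative definition. */
    align(1) struct PayloadHeader
    {
    align(1):
        ulong length;      /* fixed length of the payload data that follows */
        ushort version_;   /* payload format version */
        ubyte type;        /* Meta, Index, Layout or Content */
        ubyte tag;         /* payload tag */
        ubyte compression; /* e.g. none or zstd */
    }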

So, we have the MetaPayload for metadata. Additionally, we encode all unique files in the package in a single compressed payload, the ContentPayload. The offset to each unique file (by hash) is stored within the IndexPayload, and the filesystem layout is encoded as data within the LayoutPayload.

In order to unpack a moss package, the ContentPayload blob will be decompressed to a temporary location. Then, each struct within the IndexPayload can be used to copy portions (copy_file_range) of the blob to the cache store for permanence. We skip each Index if it's already present within the cache. Finally, the target filesystem is populated with a view of all installed packages using the LayoutPayload structs, creating a shallow installation using hardlinks and directories.

The net result is deep deduplication, atomic updates, and flexibility for the user. Once we add transactions it'll be possible to boot an older version of the OS using the deduplication capabilities for offline recovery. Additionally there is no requirement for file deletion, rename or modification for an update to succeed.

Nutshell

Huge progress. Major excitement. Such wow. Soon alphas.