Moss Unlocked

Saturday, March 20, 2021 - Ikey Doherty (BDFL)
  

Well, it’s not all doom and gloom these days. We’ve actually made some significant progress in the last few days, so it seems a good time to share a progress update.

Extracting content from moss archives

moss can now extract

Oh yeah, that totally happened. So, we can now successfuly build moss packages from boulder and then extract them to disk once again with moss. This might sound totally uninteresting, but it demonstrates that our format is actually working as intended.

Admittedly the code is super rough within moss and somewhat proof of concept, however we’re able to extract the contents of the moss archive and rebuild the layout on disk.

What makes extraction difficult?

Well, quirky new format for one. A moss archive currently consists of 4 “payloads”, or major sections:

  • MetaPayload

    Contains all package information, with strongly typed keys.

  • IndexPayload

    Contains the IDs of all unique files (hash) and their offsets within the ContentPayload

  • LayoutPayload

    A sequence of specialised structs describing the final “layout” of the package on disk, with attributes, paths, and for regular files, the ID of the file in the ContentPayload to build this file from.

  • ContentPayload

    A binary blob containing every unique file from the package, in an order described by the IndexPayload. The files are stored sequentially with no gaps.

Additionally, each payload is independently compressed using zstd. In order to extract files to disk, we must first decompress ContentPayload to a temporary file. Next, we blit each file from the “megablob” to the cache store, using the IndexPayload to understand the offsets. Finally, we apply the instructions in LayoutPayload to construct the final layout on disk, hardlinking the cache assets into their final locations, setting attributes, etc.

Net result? Deduplication on a per package basis, and a system-wide deduplication policy allowing sharing of identical assets on disk between multiple packages. This will also power our core update mechanism, whereby each update is atomic, and is simply the difference on disk from the previous version, permitting a powerful rollback mechanism.

Room for improvement

Extracting multiple moss archives

There are areas where we’re doing things inefficiently, and we’ll certainly improve that in future revisions of the important. For example, IndexPayload actually wastes some bytes by storing redundant information that can be calculated at runtime. Additionally, we want to use the zstd C APIs directly to gain the level of control we actually need. We’re also going to wrap the copy_file_range syscall to make extraction of the content payload more efficient and not rely on userspace copies.

However, we’re working towards a prealpha, and some inefficiencies are OK. Our first port of call will be a prealpha image constructed from .stone files, produced by boulder, installed by moss. This will validate our toolchain and tooling and serve as a jumping off point for the project.

Stay tuned, there is a whole bunch of awesome coming now that moss is officially unlocked and progressing.

And finally

I want to thank everyone who is currently supporting the project. I also want to personally thank you for your understanding of the setbacks of real life, given the difficult times myself and my family have been going through. I hope it is clear that I remain commited to the project and it’s future, which is why we’re transparently run and funded via OpenCollective.

Despite the rough times, work continues, and awesome people join our ranks on a regular basis. Stability is on the immediate horizon and work with Serpent OS grows exponentially. You can be part of our journey, and help us build an amazing community and project that outlives us all.