Moss Format Defined

The core team have been hard at work lately implementing the Moss package manager. We now have an initial version of the binary format that we’re happy with, so we thought we’d share a progress update with you.

Development work on moss

Explaining the format

Briefly, the binary container format consists of 4 payloads:

  • Meta (Information on the package)
  • Content (a binary blob containing all files)
  • Index (indices to files within the binary blob)
  • Layout (How to apply the files to disk)

Each payload is verified internally using a CRC64-ISO, and contains basic information such as the length of the payload both compressed and uncompressed, the compression algorithm used (zstd and zlib supported) as well as the type and version of the payload. All multiple-byte values are stored in Big Endian order (i.e. Network Byte Order).

Payloads

Internally the representation of a Payload is defined as a 32-byte struct:

    @autoEndian uint64_t length = 0; /* 8 bytes */
    @autoEndian uint64_t size = 0; /* 8 bytes */
    ubyte[8] crc64 = 0; /* CRC64-ISO */
    @autoEndian uint32_t numRecords = 0; /* 4 bytes */
    @autoEndian uint16_t payloadVersion = 0; /* 2 bytes  */
    PayloadType type = PayloadType.Unknown; /* 1 byte  */
    PayloadCompression compression = PayloadCompression.Unknown; /* 1 byte */

We merge all unique files in a package rootfs into the Content payload, and compress that using zstd. The offsets to each unique file (i.e. the sha256sum) are stored within the Index payload, allowing us to extract relevant portions from the “megablob” using copy_file_range().

These files will become part of the system hash store, allowing another level of deduplication between all system packages. Finally, we use the Layout payload to apply the layout of the package into a transactional rootfs. This will define paths, such as /usr/bin/nano, along with permissions, types, etc. All regular files will actually be created as hard links from the hash store, allowing deduplication and snapshots.

The Meta payload consists of a number of records, each with strongly defined types (such as String or Int64) along with the tag, i.e. Name or Summary. The entire format is binary to ensure greater resilience and a more compact representation. For example, each metadata key is only 8 bytes.

    @autoEndian uint32_t length; /** 4 bytes per record length*/
    @autoEndian RecordTag tag; /** 2 bytes for the tag */
    RecordType type; /** 1 byte for the type */
    ubyte[1] padding = 0;

TLDR that for me…

Binary format that is self deduplicating at several layers, permitting fast transactional operations.

Up Next

Before we work any more on the binary format, we now need to pivot to the source format. Our immediate goal is to now have moss actually build packages from source, with resulting .stone packages. Once this step is complete we can work on installation, upgrades, repositories, etc, and race to becoming a self hosting distribution.

Note, the format may still change before it goes into production, as we encounter more cases for optimisation or improvement.