SPIR-V Compression: SMOL vs MARK

Two years ago I made a small utility to help with Vulkan (SPIR-V) shader compression: SMOL-V (see blog post or github repo).

It is used by Unity, and it looks like some non-Unity projects use it as well (if you use it, let me know! It’s always interesting to see where it ends up).

Then I remembered the github issue where SPIR-V compression was discussed. It mentioned that SPIRV-Tools was getting some sort of “compression codec” (see the comments) and got closed as “done”, so I decided to check it out.

SPIRV-Tools compression: MARK-V

The SPIRV-Tools repository, which is a collection of libraries and tools for processing SPIR-V shaders (validation, stripping, optimization, etc.), has a compressor/decompressor in there too, but it’s not advertised much. It’s not built by default, and requires passing a SPIRV_BUILD_COMPRESSION=ON option to the CMake build.

The sources related to it are under the source/comp and tools/comp folders, and compression is not part of the main interfaces under the include/spirv-tools headers; you have to manually include source/comp/markv.h. The build also produces a command line executable, spirv-markv, that can do encoding and decoding.

The code is well commented in terms of “here’s what this small function does”, but I didn’t find any high level description of “the algorithm” or the properties of the compression. I can see that it does something with shader instructions; there are some Huffman-related things in there, and large tables that are seemingly auto-generated somehow.

Let’s give it a go!

Getting MARK-V to compile

In the SMOL-V repository I have a little test application (see testmain.cpp) that takes a bunch of shaders, runs either SMOL-V or spirv-remap on them, additionally compresses the result with zlib/lz4/zstd and so on. “Let’s add MARK-V in there too” sounded like a natural thing to do. And since I refuse to deal with CMake in my hobby projects :), I thought I’d just add the relevant MARK-V source files…

First “uh oh” sign: while the number of files under the compression related folders (source/comp, tools/comp) is not high, they add up to 500 kilobytes of source code. Half a meg of source, Carl!

And then of course it needs a whole bunch of surrounding code from SPIRV-Tools to compile. So I copied everything that it needed to work. In total, 1.8MB of source code across 146 files.

After finding all the source files and setting up include paths for them, it compiled easily on both Windows (VS2017) and Mac (Xcode 9.4).

Pet peeve: I never understood why people don’t use file-relative include paths (like #include "../foo/bar/baz.h"), instead of requiring the users of their library to set up additional include path compiler flags. As far as I can tell, relative include paths have no downsides, and require way less fiddling both to compile your library and to use it.
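
To illustrate what I mean (foo/bar/baz.h and foo/qux/quux.cpp are made-up names, purely for the sketch):

    // foo/qux/quux.cpp -- hypothetical library layout, for illustration only.

    // File-relative include: compiles no matter where the library is
    // checked out, with zero extra compiler flags:
    #include "../bar/baz.h"

    // "Include root" style: every user of the library now has to add
    // a -I<path-to-foo> flag to their build just to compile this:
    #include "bar/baz.h"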

Side issue: STL vector for input data

The main entry point for MARK-V decoding (this is what would happen on the device when loading shaders – so this is the performance critical part) is:

spv_result_t MarkvToSpirv(
    spv_const_context context, const std::vector<uint8_t>& markv,
    const MarkvCodecOptions& options, const MarkvModel& markv_model,
    MessageConsumer message_consumer, MarkvLogConsumer log_consumer,
    MarkvDebugConsumer debug_consumer, std::vector<uint32_t>* spirv);

Ok, I kind of get the need (or at least the convenience) of using std::vector for the output data; after all, you are decompressing and writing out an expanding array. Not ideal, but at least there is some explanation.

But for the input data – why?! Either const uint8_t* markv, size_t markv_size or const uint8_t* markv_begin, const uint8_t* markv_end would be just as convenient, and would allow way more flexibility in where the data comes from. I might have loaded my data as a memory-mapped file, which then literally is just a pointer to memory. Why should I have to copy that data into an additional STL vector just to use the library?
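
Here’s the kind of dance the interface forces (a sketch; MakeMarkvInput is my name, not part of the library):

    // Sketch: the shader data arrives as a plain pointer (e.g. from a
    // memory-mapped file), but MarkvToSpirv wants a std::vector, so an
    // extra allocation + copy is needed just to call it.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> MakeMarkvInput(const uint8_t* mapped, size_t size)
    {
        // With a (const uint8_t*, size_t) interface this copy would not
        // exist; the mapped memory could be passed in directly.
        return std::vector<uint8_t>(mapped, mapped + size);
    }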

Side issue: found bugs in “Max” compression

MARK-V has three compression models – “Lite”, “Mid” and “Max”. On some of my test shaders the “Max” model could not successfully decompress what it had just compressed, so I guess “some bugs are there somewhere”. I filed a bug report and excluded the “Max” model from further comparison :(

MARK-V vs SMOL-V

Size evaluation

Compression      No filter        SMOL-V           MARK-V Lite      MARK-V Mid
                 Size KB  Ratio   Size KB  Ratio   Size KB  Ratio   Size KB  Ratio
Uncompressed     4870    100.0%   1630     33.5%   1369     28.1%   1085     22.3%
zlib default     1213     24.9%    602     12.4%    411      8.5%    336      6.9%
LZ4HC default    1343     27.6%    606     12.5%    410      8.4%    334      6.9%
Zstd default      899     18.5%    446      9.1%    394      8.1%    329      6.8%
Zstd level 20     590     12.1%    348      7.1%    293      6.0%    257      5.3%

Two learnings from this:

  • MARK-V without additional compression on top (the “Uncompressed” row) is not really competitive (~25% of the original size); just compressing the shader data with Zstandard produces a smaller result, and so does running it through SMOL-V coupled with any other compression.
  • This suggests that MARK-V acts more like a “filter” (similar to SMOL-V or spirv-remap): it makes the data smaller, but more importantly it makes the data more compressible. Coupled with additional compression, MARK-V produces pretty good results, e.g. the “Mid” model ends up compressing data to ~7% of the original size (the “filter, then compress” pipeline is sketched right below). Nice!
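
Here’s a minimal sketch of that “filter, then compress” pipeline, using SMOL-V plus Zstandard (assuming the smolv.h and zstd.h APIs as they are in their respective repositories; error handling kept to a minimum):

    // Sketch: run SPIR-V through a filter (SMOL-V here; MARK-V would play
    // the same role), then hand the result to a general-purpose compressor.
    #include <cstddef>
    #include <cstdint>
    #include <vector>
    #include "smolv.h"
    #include "zstd.h"

    std::vector<uint8_t> CompressShader(const void* spirv, size_t spirvSize)
    {
        // 1. Filter: makes the data smaller *and* more compressible.
        smolv::ByteArray filtered;
        if (!smolv::Encode(spirv, spirvSize, filtered, 0))
            return {};

        // 2. General-purpose compression on top of the filtered data.
        std::vector<uint8_t> out(ZSTD_compressBound(filtered.size()));
        size_t outSize = ZSTD_compress(out.data(), out.size(),
                                       filtered.data(), filtered.size(), 20);
        if (ZSTD_isError(outSize))
            return {};
        out.resize(outSize);
        return out;
    }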

Decompression performance

I checked how much time it takes to decode/decompress shaders (4870KB uncompressed size):

              Windows, AMD TR 1950X 3.4GHz    Mac, i9-8950HK 2.9GHz
MARK-V Lite        536.7ms      9.1MB/s         492.7ms      9.9MB/s
MARK-V Mid         759.1ms      6.4MB/s         691.1ms      7.0MB/s
SMOL-V               8.8ms    553.4MB/s          11.1ms    438.7MB/s

Now, I haven’t seriously looked at my SMOL-V decompression performance (for comparison, Zstandard’s general-purpose decompression runs at ~1GB/s), but at ~500MB/s it’s perhaps “not terrible”.

I can’t quite say the same about MARK-V though; it gets under 10MB/s of decompression performance. That, I think, is “pretty bad”. I don’t know what it does in there, but decompression that slow lands in “maybe I wouldn’t want to use this” territory.
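
For reference, a measurement like the table above needs only a trivial harness, something along these lines (a sketch assuming SMOL-V’s GetDecodedBufferSize/Decode API from smolv.h; a serious benchmark would average multiple runs):

    // Sketch: timing one decode pass, measured in decompressed MB/s
    // (the same metric as in the table above).
    #include <chrono>
    #include <cstdint>
    #include <vector>
    #include "smolv.h"

    double DecodeMBps(const std::vector<uint8_t>& smolvData)
    {
        size_t spirvSize = smolv::GetDecodedBufferSize(smolvData.data(), smolvData.size());
        std::vector<uint8_t> spirv(spirvSize);

        auto t0 = std::chrono::steady_clock::now();
        smolv::Decode(smolvData.data(), smolvData.size(), spirv.data(), spirv.size());
        auto t1 = std::chrono::steady_clock::now();

        double seconds = std::chrono::duration<double>(t1 - t0).count();
        return (spirvSize / (1024.0 * 1024.0)) / seconds;
    }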

Decompressor size

There is only one case where the decompressor code size does not matter: when it comes pre-installed on the end hardware (as part of the OS, runtimes, drivers, etc.). In all other cases, you have to ship the decompressor inside your own application, i.e. statically or dynamically link to that code – so that, well, you can decompress the data you have compressed.

I evaluated decompressor code size by making a dynamic/shared library on a Mac (.dylib) with a single exported function that does the “decode these bytes please” work. I used -O2 -fvisibility=hidden -std=c++11 -fno-exceptions -fno-rtti compiler flags, and -shared -fPIC -lstdc++ -dead_strip -fvisibility=hidden linker flags.
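
The exported function is just a thin wrapper, something along these lines (a sketch; DecodeShaderBlob and its signature are mine, not part of either library):

    // Sketch of the .dylib wrapper: one visible entry point, everything
    // else hidden by -fvisibility=hidden and stripped by -dead_strip.
    #include <cstddef>
    #include <cstdint>

    extern "C" __attribute__((visibility("default")))
    size_t DecodeShaderBlob(const uint8_t* data, size_t size,
                            uint8_t* output, size_t outputCapacity)
    {
        // ... call smolv::Decode (or MarkvToSpirv) here and return the
        // number of bytes written, 0 on failure ...
        return 0;
    }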

  • SMOL-V decompressor .dylib size: 8.2 kilobytes.
  • MARK-V decompressor .dylib size (only with “Mid” model): 1853.2 kilobytes.

That’s right. 1.8 megabytes! At first I thought I did something wrong!

I looked at the size report via Bloaty, and yeah, in MARK-V decompressor it’s like: 570KB GetIdDescriptorHuffmanCodecs, 137KB GetOpcodeAndNumOperandsMarkovHuffmanCodec, 64KB GetNonIdWordHuffmanCodecs, 44KB kOpcodeTableEntries and then piles and piles of template instantiations that are smaller, but there’s lots of them.

In SMOL-V by comparison, it’s 2KB smolv::Decode, 1.3KB kSpirvOpData and the rest is misc stuff and/or dylib overhead.

Library compilation time

While this is not that important an aspect, it’s relevant to my current work role as a build engineer :)

Compiling the MARK-V libraries with optimizations on (-O2) takes 102 seconds on my Mac (single threaded; obviously a multi-threaded build would be faster). It is close to two megabytes of source code, after all; and there is one file (tools/comp/markv_model_shader.cpp) that takes 16 seconds to compile on its own. I think that got CI agents into timeouts in the SPIRV-Tools project, and that was the reason why MARK-V is not enabled by default in the builds :)

Compiling SMOL-V library takes 0.4 seconds in comparison.

Conclusion

While MARK-V coupled with additional lossless compression looks good if you look at compression ratio in isolation, I don’t think I would recommend it, due to the other issues.

The decompressor code size alone (almost 2MB!) means that for MARK-V to start to “make sense” compared to, say, SMOL-V, your total shader data size needs to be over 100 megabytes; only then does the additional compression from MARK-V offset the massive decompressor size. (Quick math on the Zstd level 20 row above: MARK-V Mid saves about 1.8 percentage points over SMOL-V – 5.3% vs 7.1% – so the ~1.8MB of extra decompressor code pays for itself only once 1.8% of your shader data exceeds 1.8MB, i.e. at around 100MB.)

Sure, there are games with shaders that large, but then MARK-V is also quite slow at decompression – it would take over 10 seconds to decompress 100MB worth of shader data :(

All my evaluation code is in the mark-v branch of the SMOL-V repository. At this point I’m not sure whether I’ll merge it into the main branch.

This is all.