Gradients in linear space aren't better

People smarter than me have already said it (Bart Wronski on twitter), but here’s my take in a blog post form too. (blog posts? is this 2005, grandpa?!)

When you want “a gradient”, interpolating colors directly in sRGB space does “look wrong” in a lot of situations. However, interpolating them in “linear sRGB” is not necessarily better!

Background

In late 2020 Björn Ottosson designed the “Oklab” color space for gradients and other perceptual image operations. I read about it, mentally filed it under an “interesting, I should play around with it later” section, and kinda forgot about it.

Come October 2021, and Photoshop version 2022 was announced, including an “Improved Gradient tool”. One of the new modes, called “Perceptual”, is actually using Oklab math underneath.

Looks like CSS (“Color 4”) will be getting Oklab color space soon.

I was like, hmm, maybe I should look at this again.

sRGB vs Linear

Now, color spaces, encoding, display and transformations are a huge subject. Most people who are not into all that jazz have a very casual understanding of it, myself included. My understanding boils down to two points:

  • The majority of images are in the sRGB color space, and stored using sRGB encoding. The encoding is primarily there for precision / compression purposes – 8 bits/channel is “quite enough” for regular colors, and precision across the visible colors is okay-ish.
  • Lighting math should be done with “linear” color values, since we’re basically counting photons, and they add up linearly.

Around the year 2010 or so, there was a big push in the real-time rendering industry to move all lighting calculations into a “proper” linear space. This kind of coincided with the overall push towards “physically based rendering”, which tried to undo various hacks accumulated over the decades prior, and to take a “more correct” approach to rendering. All good.

However, I think that in many bystanders’ minds, this has led to an “sRGB bad, Linear good” mental picture.

That is the correct model when you’re calculating illumination, or in other areas where physical quantities of countable things are added up. “I want to go from color A to color B in a way that looks aesthetically pleasing” is not one of them, though!
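To make “blend in sRGB” vs “blend in Linear” concrete, here is a minimal per-channel sketch (my own illustration, not code from any of the tools mentioned below); channel values are in the [0,1] range:

#include <cmath>

// Standard sRGB transfer functions, per channel.
float srgbToLinear (float c)
{
    return c <= 0.04045f ? c / 12.92f : std::pow ((c + 0.055f) / 1.055f, 2.4f);
}

float linearToSrgb (float c)
{
    return c <= 0.0031308f ? c * 12.92f : 1.055f * std::pow (c, 1.0f / 2.4f) - 0.055f;
}

float lerp (float a, float b, float t) { return a + (b - a) * t; }

// "Blend in sRGB": interpolate the encoded values directly.
float blendSrgb (float a, float b, float t)
{
    return lerp (a, b, t);
}

// "Blend in Linear": decode to linear, interpolate, encode back for display.
float blendLinear (float a, float b, float t)
{
    return linearToSrgb (lerp (srgbToLinear (a), srgbToLinear (b), t));
}

An Oklab blend is the same idea with a different round trip: convert both endpoint colors to Oklab, interpolate there, and convert back.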

Gradients in Unity

While playing around with Oklab, I found things about gradients in Unity that I had no idea about!

Turns out, today in Unity you can have gradients either in sRGB or in Linear space, and this is independent of the “color space” project setting. The math behind them is “just a lerp” in both cases of course, but it’s up to the system that uses the gradients to decide how the colors are interpreted.

Long story short, the particle systems (a.k.a. “shuriken”) assume gradient colors are specified in sRGB, and blended as sRGB; whereas the visual effect graph specifies colors as linear values, and blends them as such.

As I’ll show below, neither choice is strictly “better” than the other one!

Random examples of sRGB, Linear and Oklab gradients

All the images below have four rows of colors:

  1. Blend in sRGB, as used by a particle system in Unity.
  2. Blend in Oklab, used on the same particle system.
  3. Blend in Linear, as used by a visual effect graph in Unity.
  4. Blend in Oklab, used on the same visual effect graph.

Each color row is made up of a lot of opaque quads (i.e. separate particles); that’s why they are not all neatly regular:

Black-to-white is “too bright” in Linear.

Blue-to-white adds a magenta-ish tint in the middle, and also “too bright” in Linear.

Red-to-green is “too dark & muddy” in sRGB. Looks much better in Linear, but if you compare it with Oklab, you can see that in Linear, it feels like the “red” part is much smaller than the “green” part.

Blue-to-yellow is too dark in sRGB, too bright in Linear, and in both cases adds a magenta-ish tint. The blue part feels too narrow in Linear too.

Rainbow gradient using standard “VIBGYOR” color values is missing the cyan section in sRGB.

Black-red-yellow-blue-white adds magenta tint around blue in Linear, and the black part goes too bright too soon.

Random set of “muddy” colors - in Linear, yellow section is too wide & bright, and brown section is too narrow.

Red-blue-green goes through too dark magenta/cyan in sRGB, and too bright magenta/cyan in Linear.

Further reading

I don’t actually know anything about color science. If the examples above piqued your interest, reading material from people in the know might be useful. For example:

That’s it!


EXR: Filtering and ZFP

In the previous blog post I looked at using libdeflate for OpenEXR Zip compression. Let’s look at a few other things now!

Prediction / filtering

As noticed in the zstd post, OpenEXR does some filtering of the input pixel data before passing it to a zip compressor. The filtering scheme is fairly simple: assume the input data is in 16-bit units, split it up into two streams (all the lower bytes, then all the higher bytes), and delta-encode the result. Then do regular zip/deflate compression.

Another way to look at filtering is in terms of prediction: instead of storing the actual pixel values of an image, we try to predict what the next pixel value will be, and store the difference between the actual and predicted value. The idea is that if our predictor is any good, the differences will often be very small, and those compress really well. If we had a 100% perfect predictor, all we’d need to store is “first pixel value… and a million zeroes here!”, which takes up next to nothing after compression.

When viewed this way, delta encoding is then simply a “next pixel will be the same as the previous one” predictor.

But we could certainly build fancier predictors! PNG filters come in several types (delta encoding is the “Sub” type there). In audio land, DPCM encoding uses predictors too, and was invented 70 years ago.

I tried what is called the “ClampedGrad” predictor (from a Charles Bloom blog post), which turns out to be the same as the LOCO-I predictor in JPEG-LS. In code, it looks like this:

// +--+--+
// |NW|N |
// +--+--+
// |W |* |
// +--+--+
//
// W - pixel value to the left
// N - pixel value up (previous row)
// NW - pixel value up and to the left
// * - pixel we are predicting
int predictClampedGrad (int N, int W, int NW) // needs <algorithm> for std::min/max/clamp
{
    int grad = N + W - NW;
    int lo   = std::min (N, W);
    int hi   = std::max (N, W);
    return std::clamp (grad, lo, hi);
}

(whereas the current predictor used by OpenEXR would simply be return W)
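For context, here’s how such a predictor would get applied, in a rough sketch (not the actual experiment code, and assuming 16-bit pixel values with simplistic edge handling): store the difference between each value and its prediction, wrapping modulo 2^16 so a decoder can add it back. Swapping predictClampedGrad for a plain “return W” gives the simple delta encoding described above.

#include <cstdint>
#include <vector>

std::vector<uint16_t> computeResiduals (const uint16_t* pix, int width, int height)
{
    std::vector<uint16_t> res (size_t (width) * height);
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            int W  = x > 0          ? pix[y * width + x - 1]       : 0;
            int N  = y > 0          ? pix[(y - 1) * width + x]     : 0;
            int NW = x > 0 && y > 0 ? pix[(y - 1) * width + x - 1] : 0;
            int pred = predictClampedGrad (N, W, NW);
            // store the prediction error; wraps modulo 2^16
            res[y * width + x] = uint16_t (pix[y * width + x] - pred);
        }
    }
    return res;
}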

Does it improve the compression ratio? Hmm, at least on my test image set, only barely. Zstd compression at level 1:

  • Current predictor: 2.463x compression ratio,
  • ClampedGrad predictor: 2.472x ratio.

So either I did something wrong :), or my test image set is not great, or this fancier predictor just isn’t worth it – the compression ratio gains are tiny.

Lossless ZFP compression

A topic jump! Let’s try ZFP (github) compression. ZFP seems to be primarily targeted at lossy compression, but it also has a lossless (“reversible”) mode which is what we’re going to use here.

It’s more similar to GPU texture compression schemes – 2D data is divided into 4x4 blocks, and each block is encoded completely independently of the others. Inside the block, various magic stuff happens and then, ehh, some bits come out in the end :) The actual algorithm is well explained here.

I used a ZFP development version (d83d343 from 2021 Aug 18). At the time of writing, it only supported float and double floating point data types, but in OpenEXR the majority of data is half-precision (FP16) floats. I tested ZFP as-is, by converting half float data into floats back and forth as needed, but also tried hacking in native FP16 support (commit).
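For reference, here’s roughly what driving ZFP’s reversible mode looks like for one 2D float buffer – a sketch based on my reading of the zfp C API; the function name and error handling are mine, and this isn’t the exact code used for the measurements:

#include <zfp.h>

// Returns the number of compressed bytes written to outBuf, or 0 on failure.
size_t zfpCompressReversible (float* pixels, size_t nx, size_t ny,
                              void* outBuf, size_t outCap)
{
    zfp_field*  field = zfp_field_2d (pixels, zfp_type_float, nx, ny);
    zfp_stream* zfp   = zfp_stream_open (NULL);
    zfp_stream_set_reversible (zfp); // lossless ("reversible") mode

    size_t written = 0;
    if (outCap >= zfp_stream_maximum_size (zfp, field))
    {
        bitstream* stream = stream_open (outBuf, outCap);
        zfp_stream_set_bit_stream (zfp, stream);
        zfp_stream_rewind (zfp);
        written = zfp_compress (zfp, field); // 0 means failure
        stream_close (stream);
    }
    zfp_field_free (field);
    zfp_stream_close (zfp);
    return written;
}

Decompression is symmetric (zfp_decompress with the same reversible setting), and for the FP16 tests the half data gets converted to floats before this call and back after decompression.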

Here’s what I got (click for an interactive chart):

  • ▴ - ZFP as-is. Convert EXR FP16 data into regular floats, compress that.
  • ■ - as above, but also compress the result with Zstd level 1.
  • ● - ZFP, with added support for half-precision (FP16) data type.
  • ◆ - as above, but also compress the result with Zstd level 1.

Ok, so basically ZFP in lossless mode for OpenEXR data is “meh”. The compression ratio is not great (1.8x - 2.0x), and compression and decompression performance is pretty bad too. Oh well! If I look at lossy EXR compression at some point, maybe it will be worth revisiting ZFP then.

Next up?

The two attempts above were both underwhelming. Maybe I should look into lossy compression next, but of course lossy compression is always hard. In addition to “how fast?” and “how small?”, there’s a whole additional “how good does it look?” axis to compare with, and it’s a much more complex comparison too. Maybe someday!


EXR: libdeflate is great

The previous blog post was about adding Zstandard compression to OpenEXR. I planned to look into something else next, but a github comment from Miloš Komarčević and a blog post from Matt Pharr reminded me to look into libdeflate, which I was not consciously aware of before.

TL;DR: libdeflate is most excellent. If you need to use zlib/deflate compression, look into it!

Here’s what happens by replacing zlib usage for Zip compression in OpenEXR with libdeflate v1.8 (click for a larger chart):

zlib is dark green (both the current default compression level 6 and my proposed level 4 are indicated). libdeflate is light green, star shape.

  • Compression ratio is almost the same. Level 4: 2.421x for zlib, 2.427x for libdeflate; level 6: 2.452x for zlib, 2.447x for libdeflate.
  • Writing: level 4 goes 456 -> 640 MB/s (1.4x faster), and level 6 goes 213 -> 549 MB/s (2.6x faster). Both are faster than writing uncompressed.
  • Reading: with libdeflate, reading reaches 2GB/s and becomes the same speed as Zstandard. I suspect this might be disk bandwidth bound at that point, since the numbers all look curiously similar.

So, changing zlib to libdeflate should be a no-brainer. Way faster, and a huge advantage is that the file format stays exactly the same; everything that could read or write EXR files in the past can still read/write them if libdeflate is used.

In compression performance, Zip+libdeflate does not quite reach Zstandard speeds though.

Another possible thing to watch out for is security/bugs. zlib, being an extremely popular library, has been quite thoroughly battle-tested against bugs, crashes, handling of malformed or malicious data, etc. I don’t know whether libdeflate has received a similar treatment.

In terms of code, my quick hack is not even very optimal – I create a whole new libdeflate compressor/decompressor object for each compression request. This could be optimized if one were to switch to libdeflate for real, and maybe the numbers would be a tiny bit better. My whole change was just this in src/lib/OpenEXR/ImfZip.cpp:

// in Zip::compress:
//
// if (Z_OK != ::compress2 ((Bytef *)compressed, &outSize,
//                  (const Bytef *) _tmpBuffer, rawSize, level))
// {
//     throw IEX_NAMESPACE::BaseExc ("Data compression (zlib) failed.");
// }
libdeflate_compressor* cmp = libdeflate_alloc_compressor(level);
size_t cmpBytes = libdeflate_zlib_compress(cmp, _tmpBuffer, rawSize, compressed, outSize);
libdeflate_free_compressor(cmp);
if (cmpBytes == 0)
{
    throw IEX_NAMESPACE::BaseExc ("Data compression (libdeflate) failed.");
}
outSize = cmpBytes;

// in Zip::uncompress:
// if (Z_OK != ::uncompress ((Bytef *)_tmpBuffer, &outSize,
//                  (const Bytef *) compressed, compressedSize))
// {
//     throw IEX_NAMESPACE::InputExc ("Data decompression (zlib) failed.");
// } 
libdeflate_decompressor* cmp = libdeflate_alloc_decompressor();
size_t cmpBytes = 0;
libdeflate_result cmpRes = libdeflate_zlib_decompress(cmp, compressed, compressedSize, _tmpBuffer, _maxRawSize, &cmpBytes);
libdeflate_free_decompressor(cmp);
if (cmpRes != LIBDEFLATE_SUCCESS)
{
    throw IEX_NAMESPACE::InputExc ("Data decompression (libdeflate) failed.");
}
outSize = cmpBytes;
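If one were to adopt libdeflate for real, an obvious tweak would be to stop allocating a compressor per request. One possible (untested) sketch is to keep one per thread; note the caveat that a compressor is tied to the level it was created with:

#include <libdeflate.h>

// Reuse one compressor/decompressor per thread instead of allocating
// them on every Zip::compress / Zip::uncompress call.
static thread_local libdeflate_compressor*   s_cmp = nullptr;
static thread_local libdeflate_decompressor* s_dec = nullptr;

static libdeflate_compressor* getCompressor (int level)
{
    // Caveat: if the requested level ever changes, the compressor
    // would need to be re-created; ignored here for brevity.
    if (s_cmp == nullptr)
        s_cmp = libdeflate_alloc_compressor (level);
    return s_cmp;
}

static libdeflate_decompressor* getDecompressor ()
{
    if (s_dec == nullptr)
        s_dec = libdeflate_alloc_decompressor ();
    return s_dec;
}

(These would leak at thread exit unless freed somewhere; a real change would handle that, and maybe the numbers would improve a tiny bit.)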

Next up?

I want to look into more specialized compression schemes, besides just “let’s throw a general purpose compressor at it”. For example, ZFP.


EXR: Zstandard compression

In the previous blog post I looked at OpenEXR Zip compression level settings.

Now, the Zip compression algorithm (DEFLATE) has one good thing going for it: it’s everywhere. However, it is also from the year 1993, and both the compression algorithm world and the hardware have moved on quite a bit since then :) These days, if one were to look for a good, general purpose, freely available lossless compression algorithm, the answer seems to be either Zstandard or LZ4, both by Yann Collet.

Let’s look into Zstandard then!

Initial (bad) attempt

After some quick, hacky plumbing of Zstd (version 1.5.0) into OpenEXR, here’s what we get:

Zip/Zips has been bumped from the previous compression level 6 to level 4 (see the previous post); the new Zstandard is the large blue data point. Ok, that’s not terrible, but also quite curious:

  • Both compression and decompression performance is better than Zip, which is expected.
  • However, that compression ratio? Not great at all. Zip and PIZ are both at ~2.4x compression, whereas Zstd only reaches 1.8x. Hmpft!

Turns out, OpenEXR does not simply just “zip the pixel data”. Quite similar to how e.g. PNG does it, it first filters the data, and then compresses it. When decompressing, it first decompresses and then does the reverse filtering process.

In OpenEXR, here’s what looks to be happening:

  • First the incoming data is split into two parts: first all the odd-indexed bytes, then all the even-indexed bytes. My guess is that this is based on the assumption that 16-bit floats are going to be the dominant input data type, and splitting them into “first all the lower bytes, then all the higher bytes” improves compression when a general purpose compressor is used.
    • That got me thinking: EXR also supports 32-bit float and 32-bit integer pixel data types. However, for compression they are still split into just two parts, as if the data were 16-bit. This does not cause any correctness issues, but I wonder whether it might be slightly suboptimal for compression ratio.
  • Then the resulting byte stream is delta encoded; e.g. this turns a byte sequence like { 1, 2, 3, 4, 5, 6, 4, 2, 0 } (not very compressible) into { 1, 129, 129, 129, 129, 129, 126, 126, 126 }, which is much tastier for a compressor. A rough sketch of the whole filter is right below.
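To make the filter concrete, here’s a rough sketch of it (my reconstruction of the idea, not the actual OpenEXR source):

#include <cstdint>
#include <vector>

// Filter the raw pixel bytes before compression: de-interleave every other
// byte into two halves (so for FP16 data the low bytes end up together and
// the high bytes together), then delta-encode with a +128 bias so that
// small differences cluster around 128.
std::vector<uint8_t> filterForCompression (const uint8_t* src, size_t n)
{
    std::vector<uint8_t> dst (n);
    if (n == 0)
        return dst;
    // 1) split into two byte streams
    size_t half = (n + 1) / 2;
    for (size_t i = 0; i < n; ++i)
        dst[(i & 1) ? half + i / 2 : i / 2] = src[i];
    // 2) delta-encode: { 1, 2, 3, ... } becomes { 1, 129, 129, ... }
    uint8_t prev = dst[0];
    for (size_t i = 1; i < n; ++i)
    {
        uint8_t cur = dst[i];
        dst[i] = uint8_t (cur - prev + 128);
        prev = cur;
    }
    return dst;
}

The filtered buffer is then what gets handed to the compressor (zlib normally, or ZSTD_compress in this experiment) instead of the raw pixel bytes; decompression does the reverse – decompress, undo the delta, then re-interleave the two halves.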

Let’s try doing exactly the same data filtering for Zstandard too:

Zstd with filtering

Look at that! Zstd sweeps all others away!

  • Ratio: 2.446x for Zstd, 2.442x for PIZ, 2.421x for Zip. These are actually very close to each other.
  • Writing: At 735MB/s, Zstd is the fastest of all, by far: 1.7x faster than uncompressed or Zip, and handily winning against PIZ, the previous “fast to write, good ratio” option. And it would be 3.6x faster than the previous Zip at compression level 6.
  • Reading: At 2005MB/s, Zstd almost reaches RLE reading performance, is a bit faster to read than uncompressed (1744MB/s) or Zip (1697MB/s), and quite a bit faster than PIZ (1264MB/s).

Zstd also has various compression levels; the above chart is using the default (3) level. Let’s look at those.

Zstd compression levels

We have many more compression levels to choose from than with Zip – there are “regular” levels between 1 and 22, but also negative levels that give up quite a bit of compression ratio in the hope of increasing performance (this takes Zstd almost into LZ4 territory). Here’s a chart (click for an interactive page) where I tried most of them:

  • Negative levels (-1 and -3 in the chart) don’t seem to be worth it: compression ratio drops significantly (from 2.4-2.5x down to 2.1x) and they don’t buy any additional performance. I guess the compression itself might be faster, but the increased file size makes it slower to write, so they cancel each other out.
  • There isn’t much compression ratio change between the levels – it varies from 2.446x (level 3) up to 2.544x (level 16). Slightly more variation than Zip, but not much. Levels beyond 10 get into “really slow” territory without buying much more ratio.
  • Level 1 looks better than the default level 3 in all aspects: quite a bit faster to write (745 -> 837 MB/s), and curiously enough a slightly better compression ratio too (2.446x -> 2.463x)! Zstd at level 1 looks quite excellent (marked with a star shape point in the graph):
    • Writing: 2.0x faster than uncompressed, 1.9x faster than Zip, 1.4x faster than PIZ.
    • Reading: 1.16x faster than uncompressed, 1.06x faster than Zip, 1.7x faster than PIZ.
    • Ratio: a tiny bit better than either Zip or PIZ, but all of them about 2.4x really.

Next up?

I’ll report these findings to the “Investigate additional compression” OpenEXR github issue, and see if someone says that Zstd makes sense to add (maybe? TIFF added it in v4.0.10 back in year 2017…). If it does, then most of the work will be “ok, how to properly do that with their CMake/Bazel/whatever build system”; C++ projects are always “fun” in that regard, aren’t they.

Maybe it would also be worth looking at a different filter than the one used by Zip (particularly for 32-bit float/integer images)?

I also want to look into more specialized compression schemes, besides just “let’s throw something better than zlib at the thing” :)

Update: next blog post turned out to be about libdeflate.


EXR: Zip compression levels

Update 2021 October: the default zip compression level was switched from 6 to 4 for OpenEXR 3.2 (see PR). Yay, faster zipped EXR writing, soon!

In the previous blog post I looked at lossless compression options that are available in OpenEXR.

The Zip compression in OpenEXR is just the standard DEFLATE algorithm as used by Zip, gzip, PNG and others. That got me thinking: the compressor has different “compression levels” that trade off ratio vs. performance. Which one is OpenEXR using, and would changing it affect anything?

OpenEXR seems to be mostly using the default zlib compression level (6). It uses level 9 in several places (within the lossy DWAA/DWAB compression); we’ll ignore those for now.

Let’s try all the zlib compression levels, 1 to 9 (click for an interactive chart):

  • The Zip compression level used in current OpenEXR is level 6, marked with the triangle shape point on the graph.
  • The compression ratio is not affected much by the level setting - the fastest level (1) compresses the data 2.344x; the slowest (9) compresses at 2.473x.
  • Levels don’t affect decompression performance much.
  • Maybe level 4 should be the default (marked with a star shape point on the graph)? It’s a tiny compression ratio drop (2.452x -> 2.421x), but compression is over 2x faster (206 -> 437 MB/s)! At level 4, writing a Zip-compressed EXR file becomes faster than writing an uncompressed one.
    • Just a tiny 4 line change in OpenEXR library source code would be enough for this (a sketch of the kind of change is below, after this list).
    • A huge advantage is that this does not change the compression format at all. All the existing EXR decoding software can still decode the files just fine; it’s still exactly the same compression algorithm.
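For illustration, the change boils down to “pass an explicit level to zlib”: with zlib’s one-shot API that means calling compress2() with a chosen level instead of relying on the default. A sketch of the shape of it (not the actual OpenEXR patch):

#include <zlib.h>

// compress() always uses Z_DEFAULT_COMPRESSION (which maps to level 6);
// compress2() takes the level explicitly, e.g. 4.
bool zipCompress (const Bytef* raw, uLong rawSize,
                  Bytef* compressed, uLongf* compressedSize, int level)
{
    return Z_OK == ::compress2 (compressed, compressedSize, raw, rawSize, level);
}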

With a few more changes, it should be possible to make the Zip compression level configurable, like so:

Header header(width, height);
header.compression() = ZIP_COMPRESSION;
addZipCompressionLevel(header, level); // <-- new!
RgbaOutputFile output(filePath, header);

So that’s it. I think switching OpenEXR from Zip compression level 6 to level 4 by default should be a no-brainer. Let’s make a PR and see what happens!

Next up

In the next post I’ll try adding a new lossless compression algorithm to OpenEXR and see what happens.