Texture Compression in 2020

I’ve spent some time looking at various texture compression formats and libraries (the ones meant for GPUs, i.e. “ASTC, BC7” kind, not the “PNG, JPG” kind). Here’s a fully incomprehensible chart to set the stage (click for a semi-interactive page):

If your reaction is “whoa what is this mess?!”, that’s good. It is a mess! But we’ll get through that :)

Backstory on GPU compression formats

The majority of textures on the GPU end up using “block compression” formats, to save sampling bandwidth and memory usage, and for faster texture loads. A common theme is that all of them are lossy, have a fixed compression ratio (e.g. 4:1 compared to uncompressed), and are based around the idea of encoding a small pixel block (e.g. 4x4 pixels) using fewer bits than would normally be needed.

This is very different from image storage compression formats like PNG (lossless) or JPG (lossy), which are based on somehow transforming an image, (in JPG’s case) throwing away some detail, and compressing the data using traditional compression algorithms.

Why don’t GPUs do “usual” compression like PNG or JPG? Because a texture needs random access: if something rendered on screen needs to sample a texture pixel at coordinate (x=17,y=287), it would be very inconvenient if, in order to get that pixel’s color, the GPU had to decompress all the previous pixels too. With block-based compression formats, the GPU only needs to read the small, fixed-size chunk of bytes for the compressed block that the pixel belongs to, and decompress that. Since blocks are always the same size in pixels (e.g. 4x4), and always the same size in bits (e.g. 128 bits), that works out nicely.
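To make the random access idea concrete, here’s a rough sketch of how a pixel coordinate maps to a compressed block for a 4x4-block, 64-bits-per-block format like BC1 (just an illustration of the general idea; real GPUs typically also tile/swizzle the block order in memory):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: byte offset of the compressed block that contains pixel (x, y),
// for a format with 4x4 pixel blocks and 8 bytes (64 bits) per block,
// e.g. BC1/DXT1, assuming a simple row-major block layout.
size_t BlockOffsetForPixel(uint32_t x, uint32_t y, uint32_t imageWidth)
{
    const uint32_t kBlockDim = 4;   // 4x4 pixels per block
    const size_t kBlockBytes = 8;   // 64 bits per BC1 block
    uint32_t blocksPerRow = (imageWidth + kBlockDim - 1) / kBlockDim;
    uint32_t blockX = x / kBlockDim;
    uint32_t blockY = y / kBlockDim;
    return (size_t(blockY) * blocksPerRow + blockX) * kBlockBytes;
}
```

E.g. a pixel at (17, 287) in a 1024-pixel-wide BC1 texture lands in block (4, 71) – one 8-byte read, independent of everything around it.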

I’m leaving out the lossless texture or framebuffer compression that some GPUs do. As far as I know, these are also somewhat block-based, but the exact details are extremely GPU-specific and mostly not public information. It’s also not something you control; the lossless compression in various parts of the GPU pipeline is largely applied without you doing anything.

Different block-based GPU compression schemes have been developed over the years. Hans-Kristian Arntzen has a nice overview in the “Compressed GPU texture formats – a review and compute shader decoders – part 1” blog post. Wikipedia has some detail too (S3TC, ETC, ASTC), but these pages read more like spec-sheet bullet point lists.

A simplified (and only slightly incorrect) summary of the texture compression format situation could be:

  • On PC, you’d use BC7 (modern) or DXTC (old) formats.
  • On mobile, you’d use ASTC (modern) or ETC (old) formats.

Backstory on compression libraries

How do you produce this “block-compressed” image? You use some piece of code that knows how to turn raw uncompressed pixel data into compressed bits in one of the block-compressed formats. There are multiple such libraries available out there, and in the complicated-looking chart above I’ve looked at some of them, for some of the compressed formats.

Some formats are better than others.

Some libraries are better than others.

And that was my evaluation :) Read on for details!

Testbed setup

I made a small application that contains multiple texture compressors (all used in “library” form instead of via standalone compressor executables, so that I measure only compression performance, without file I/O), and gathered a bunch of textures from various Unity projects (a mix of everything: albedo maps, normal maps, sprites, UI textures, skyboxes, VFX textures, roughness maps, ambient occlusion maps, lightmaps etc.).

The application loads an uncompressed image, compresses it into different formats using different compression libraries and their settings, and evaluates both performance (in terms of Mpix/s) and quality (in terms of Luma-PSNR). Now, “how to evaluate quality” is a very wide and complicated topic (PSNR, SSIM, Luma-PSNR, other approaches, etc.). An automated “compute one number” approach is often not enough to capture some typical block compression artifacts, but for simplicity’s sake I’ve settled on Luma-PSNR here. For the compressors, I’ve asked them to use their “perceptual” error metric mode, when that is available.
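For reference, the Luma-PSNR metric itself boils down to something like the sketch below (my own simplified illustration, using Rec. 709 luma weights; the actual testbed may differ in details like weights or gamma handling):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Simplified sketch of Luma-PSNR between an original image and its
// compressed+decompressed version (both as 8-bit RGBA). Higher is better.
double LumaPSNR(const std::vector<uint8_t>& rgbaA,
                const std::vector<uint8_t>& rgbaB,
                int width, int height)
{
    auto luma = [](const uint8_t* p) {
        return 0.2126 * p[0] + 0.7152 * p[1] + 0.0722 * p[2]; // Rec. 709 weights
    };
    double sumSqErr = 0.0;
    for (int i = 0; i < width * height; ++i)
    {
        double d = luma(&rgbaA[i * 4]) - luma(&rgbaB[i * 4]);
        sumSqErr += d * d;
    }
    double mse = sumSqErr / (width * height);
    if (mse <= 0.0)
        return 99.0; // identical images; clamp to a "very high" value
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```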

For normal maps I actually “compute some lighting” on them (both uncompressed and compressed+decompressed), and evaluate Luma-PSNR on the result. In my experience this better captures the “effective quality” of lossy normal map compression. Additionally, all the normal maps are passed with a (Y,Y,Y,X) (aka “DXT5nm”) swizzle into the compressors, with the assumption that, since the X & Y channels are mostly uncorrelated, the compressor will be able to capture them better by having one of them in RGB and the other in Alpha. For compressors that have a special “normal map” compression mode, that is used too.
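The swizzle itself is trivial – roughly like this (a sketch of the idea, not the actual testbed code):

```cpp
#include <cstdint>
#include <vector>

// Sketch: repack a normal map from (X,Y,Z,unused) into the "DXT5nm" style
// (Y,Y,Y,X) layout before handing it to the compressor. Y goes into RGB and
// X into Alpha, so the two mostly uncorrelated channels end up in
// separately-encoded parts of the block.
void SwizzleNormalMapToYYYX(std::vector<uint8_t>& rgba)
{
    for (size_t i = 0; i < rgba.size(); i += 4)
    {
        uint8_t x = rgba[i + 0];
        uint8_t y = rgba[i + 1];
        rgba[i + 0] = y;
        rgba[i + 1] = y;
        rgba[i + 2] = y;
        rgba[i + 3] = x;
    }
}
```

At runtime the shader then reconstructs Z from the decoded X & Y.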


1) Raw normal map; 2) “computed lighting” on the normal map used to evaluate compression quality; 3) compressed to ASTC 4x4 with ARM astcenc “medium” setting; 4) compressed to ASTC 6x6 with ARM astcenc “very fast” setting; 5) compressed to ETC2 with ETCPACK “slow” setting.

I have not tested all the block compression formats out there (e.g. BC6H or ASTC HDR for HDR content, or ETC, or some ASTC variants), and definitely have not tested all the compression libraries (some are not publicly available, some I haven’t been able to build easily, some I’m probably not even aware of, etc.).

For the ones I have tested, I compiled them with up to AVX2 (including BMI, FMA and POPCNT) instruction set options where applicable.

Everything was compiled with Xcode 11.7 (Apple clang version 11.0.3), running on a 2018 MacBook Pro (Core i9 2.9GHz, 6 cores / 12 threads). Multi-threading was used in a simple way: split the image into separate chunks and compress each on its own thread. Timings on a Windows box with different hardware (AMD ThreadRipper 1950X) and a different compiler (Visual Studio 2019) were more or less similar.
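The “simple way” of multi-threading really is just chopping the image into strips of block rows and handing each strip to its own thread. A minimal sketch (my illustration; CompressStrip here is a hypothetical stand-in for whichever compression library is being called):

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in: compress rows [rowBegin, rowEnd) using whichever
// library is under test; each strip writes its own disjoint part of 'output'.
static void CompressStrip(const uint8_t* pixels, int width, int rowBegin, int rowEnd, uint8_t* output)
{
    (void)pixels; (void)width; (void)rowBegin; (void)rowEnd; (void)output;
}

// Split the image into horizontal strips aligned to 4-pixel block rows and
// compress each strip on its own thread.
void CompressMultithreaded(const uint8_t* pixels, int width, int height, uint8_t* output)
{
    int threadCount = (int)std::max(1u, std::thread::hardware_concurrency());
    int rowsPerThread = std::max(4, ((height / 4 + threadCount - 1) / threadCount) * 4);
    std::vector<std::thread> workers;
    for (int row = 0; row < height; row += rowsPerThread)
    {
        int rowEnd = std::min(row + rowsPerThread, height);
        workers.emplace_back(CompressStrip, pixels, width, row, rowEnd, output);
    }
    for (std::thread& t : workers)
        t.join();
}
```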

Formats overview

Here’s a cleaner version of the first chart (click for an interactive page):

Points on the scatter plot show compression performance averaged across the whole test texture set (horizontal axis, Mpix/s, note the logarithmic scale!) against resulting image quality (Luma-PSNR, higher values are better). The best possible results would be towards the upper right corner (great quality, fast compression performance).

Different compression formats use different point shapes (e.g. DXTC aka BC1/BC3 uses circles; BC7 uses triangles etc.).

Different compression libraries use their own hue (e.g. Intel ISPCTextureCompressor is green; ARM astcenc is blue). Libraries tested are:

  • ISPC: Intel ISPCTextureCompressor. DXTC, BC7, ASTC formats. Built with code from 2020 August, with determinism patch applied and using ISPC compiler version 1.14 (changes).
  • bc7e: Binomial bc7e. BC7 format. Built with code from 2020 October, using ISPC compiler version 1.14.
  • bc7enc: Rich Geldreich bc7enc. BC7. Built with code from 2020 April.
  • rgbcx: Rich Geldreich rgbcx. DXTC (BC1,BC3). Built with code from 2020 April.
  • stb_dxt: Fabian Giesen & Sean Barrett stb_dxt. DXTC (BC1,BC3). Built with code from 2020 July.
  • icbc: Ignacio Castano ICBC. DXTC (BC1 only). Built with code from 2020 August.
  • ARM: ARM astcenc. ASTC. Built with code from 2020 November.
  • ETCPACK: Ericsson ETCPACK. ETC2. Built on 2020 October.
  • Etc2Comp: Google Etc2Comp. ETC2. Built on 2020 October.
  • etcpak: Bartosz Taudul etcpak. ETC2. Built on 2020 October.
  • Apple: Apple <AppleTextureEncoder.h>. I can’t find any official docs online; here’s a random stack overflow answer that mentions it. This one is only available on Apple platforms, and only supports ASTC 4x4 and 8x8 formats. It also sometimes produces non-deterministic results (i.e. slightly different compressed bits on the same input data), so I only included it here as a curiosity.

From the overall chart we can see several things:

  • There are high quality texture formats (BC7, ASTC 4x4), where PSNR is > 42 dB. Both of these have an 8 bits/pixel compression ratio. There’s a range of compression performance vs. resulting quality options available, with BC7 achieving very similar quality to ASTC 4x4 while being faster to compress.
  • Medium quality (35-40 dB) formats include DXTC (BC1 at 4 bits/pixel, BC3 at 8 bits/pixel), ASTC 6x6 (3.6 bits/pixel), and ETC2 (4 or 8 bits/pixel) – see the quick bit-rate math right after this list. There’s very little quality variance between DXTC compressors, and most of them are decently fast, with the ISPC one approaching 1000 Mpix/s here. ETC2 achieves comparable quality at the same compression ratio, but is two to three orders of magnitude slower to compress. ASTC 6x6 has a lower bit rate for the same quality, and similar compression speed as ETC2.
  • “Meh” quality (<35 dB) formats or compressors. ASTC 8x8 is just two bits per pixel, but the artifacts really start to be visible there. Likewise, the ETC2 “etcpak” compressor is impressively fast (not quite at DXTC compression speed though), but the quality is also not great.
  • There is a 50000x compression speed difference between the slowest compressor on the chart (ETC2 format, ETCPACK compressor at “slow” setting: average 0.013 Mpix/s) and the fastest one (DXTC format, ISPC compressor: average 654 Mpix/s). And the slowest one produces lower quality images too!
    • Of course that’s an outlier; realistically usable ETC2 compressors are the Etc2Comp “normal” (0.8 Mpix/s) and “fast” (3.7 Mpix/s) options. But then, for DXTC you have many options that go over 100 Mpix/s while achieving comparable quality – still a 100x performance difference.
    • In the high quality group, there’s a similar performance difference: BC7 compressors produce >45 dB quality results at 10-20 Mpix/s, whereas ASTC 4x4 achieves the same quality several times slower, at 2-8 Mpix/s.
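As an aside, the bits/pixel figures above aren’t arbitrary – they fall straight out of the fixed block sizes (64 or 128 bits per block, divided by the number of pixels in the block):

```cpp
// BC1, ETC2 RGB                  :  64 bits / (4*4) = 4.0 bits/pixel
// BC3, BC7, ETC2 RGBA, ASTC 4x4  : 128 bits / (4*4) = 8.0 bits/pixel
// ASTC 6x6                       : 128 bits / (6*6) ≈ 3.56 bits/pixel
// ASTC 8x8                       : 128 bits / (8*8) = 2.0 bits/pixel
constexpr double BitsPerPixel(int blockBits, int blockW, int blockH)
{
    return double(blockBits) / (blockW * blockH);
}
static_assert(BitsPerPixel(128, 8, 8) == 2.0, "ASTC 8x8 is 2 bits/pixel");
```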

Individual Formats

BC7 (PC): 8 bits/pixel, using 4x4 pixel blocks. Has been in all DX11-class PC GPUs (NVIDIA since 2010, AMD since 2009, Intel since 2012).

  • bc7e (red) looks like a clear winner. Its various presets have either better quality, or faster compression, or both, compared to ISPC (green).
  • bc7enc is behind the curve in all aspects, which I think is by design – it seems to be more of an experiment in “how to do a minimal BC7 compressor”; just use bc7e instead.
  • A bunch of image results that are low quality (<30 dB in the chart) are all normal maps. I haven’t deeply investigated why, but realistically you’d probably use BC5 format for normal maps anyway (which I’m not analyzing in this post at all).

ASTC 4x4 (mobile): 8 bits/pixel, using 4x4 pixel blocks. Has been in most modern mobile GPUs: ARM since Mali T624 (2012), Apple since A8 (2014), Qualcomm since Adreno 4xx (2015), PowerVR since GX6250 (2014), NVIDIA since Tegra K1 (2014).

  • ARM astcenc (blue) is a good choice there. It used to be several times slower in the 1.x versions; back then ISPC (green) might have been the better pick. Apple (purple) is of limited use, since it only works on Apple platforms and sometimes produces non-deterministic results. It’s impressively faster than the others though, although at the expense of some quality loss.
  • Just like with BC7, all the low-quality results are from normal maps. This one’s curious, since I am passing a “normal map please” flag to the compressor here, and the ASTC format is supposed to handle uncorrelated components in a nice way. Weird, needs more investigation! Anyway, at 8 bits/pixel for normal maps one can use the EAC RG compression format, which I’m also not looking at in this post :)

DXTC (PC): 4 bits/pixel for RGB (BC1 aka DXT1), 8 bits/pixel for RGBA (BC3 aka DXT5). This has been on PCs since forever, and BC1 is still quite used if you need to get below 8 bits/pixel (it’s literally the only option available, since BC7 is always 8 bits/pixel).

It’s a fairly simple format, and that combined with the “has been there for 20 years” aspect means that effective compression techniques for it are well researched at this point.

  • ISPC (green) has been the go-to compressor for some years in many places, offering impressive compression performance.
  • stb_dxt (gray; hiding between icbc 1 and rgbcx 0 in the chart) has been available for a long time too, for a small and decently fast compressor.
  • rgbcx (yellow) and icbc (cyan) are fairly new (both appeared in Spring 2020), and both are able to squeeze a (little) bit more quality out of this old format. icbc is BC1 only though; here in my testbed it falls back to stb_dxt for RGBA/BC3 images.

ASTC 6x6 (mobile): 3.6 bits/pixel, using 6x6 pixel blocks. Same platform availability as ASTC 4x4.

  • Same conclusions as for ASTC 4x4: ARM compressor is a good choice here. Well, there isn’t much to choose from, now is there :)

ETC2 (mobile): 4 bits/pixel for RGB, 8 bits/pixel for RGBA. Has been on mobile slightly longer than ASTC has (ETC2 is supported on all OpenGL ES 3.0, Vulkan and Metal mobile GPUs – it’s only OpenGL ES 2.0 GPUs that might not have it).

  • Use Etc2Comp (cyan) - it’s not very fast (only 0.1-5 Mpix/s speed) but produces good quality for the format.
  • If compression speed is important, etcpak (gray) is fast.
  • ETCPACK (orange) does not look like it’s useful for anything :)

ASTC 8x8 (mobile): 2 bits/pixel, using 8x8 pixel blocks. Same platform availability as other ASTC variants.

  • Similar conclusions as for ASTC 4x4: ARM compressor is a good choice here; ISPC is not bad either.
  • Same comments w.r.t. Apple one as in 4x4 case.

Further Work

For the “maybe someday” part…:

  • Investigate other texture formats, like:
    • BC4 (PC) / EAC R (mobile) for grayscale images,
    • BC5 (PC) / EAC RG (mobile) for normal maps,
    • BC6H (PC) / ASTC HDR (mobile) for high dynamic range images,
    • PVRTC (mobile)? Not sure if it’s very practical these days though.
    • ETC (mobile)? Likewise, not sure if needed much, with ETC2 & ASTC superseding it.
  • Look at some GPU compute shader based texture compressors, e.g. Matias Goldberg betsy or AMD Compressonator.
    • Also would like to look into the determinism of those. Even on CPUs it’s easy to end up with a compressor that produces different results when run on different machines. In GPU land it’s probably way more common!
  • Look into other compression libraries.
    • I’ve skipped some for this time since I was too lazy to build them (e.g. Compressonator does not compile on macOS).
    • etcpak “QuickETC2” (commit) branch looks interesting, but I haven’t tried it yet.
    • There’s a whole set of compressors that are focused on “Rate Distortion Optimization” (RDO) compression aspect, where they can trade off texture quality for better further compressibility of texture data for storage (i.e. if game data files are using zlib-compression, the compressor can make compressed bits “zlib friendlier”). Oodle Texture and Binomial Basis are compressors like that, however both are not publicly available so it’s not trivial to casually try them out :)
    • Maybe there are other compressors out there, that are worth looking at? Who knows!

That’s it for now! Go and compress some textures!


Black Hole Demo

I made a demo with my kid, and it somehow managed to get 1st place at a (small) demoparty!

The story of this demo started around the year 2006 (!). I had just started working at Unity in Copenhagen, and was renting a tiny room (it was like 12 square meters; back then that’s all I could afford). The guy I was renting from was into writing and music. One of his stories was narrated into a music track, and that’s what I wanted to make a demo from… so we agreed that I would make a demo using his story & music for it.

His band was called “Aheadahead”, back then they (of course) had a MySpace page. Now that is gone, but turns out they do have an album on Spotify! It’s called “High Status and Continued Success in the Ongoing Battle for Power and Prestige”, go check it out.

And… I never made the demo with that soundtrack.

Until sometime in 2018, during one week of vacation, I sat down with my kid and we decided to try to make it. A bunch of sketching, planning and experimenting with fractals followed.

And then the vacation week was over, and Regular Life happened again. I only came back to fiddle a bit with the demo sometime in 2019.

And then… (hmm is there a pattern here?) that got parked again. Until now, mid-2020, I had another vacation week with nothing much to do. So here it is! Wrapped it up, submitted to Field-FX demoparty, and somehow managed to get 1st place \o/

Credits

Tech Details

It’s a Unity 2018.2 project with nothing fancy really. A timeline, and mostly raymarched fractals that render into the g-buffer. Some of the utilities I’ve used are from Keijiro’s most excellent github, and the fractals are based on “Kali Sets”, particularly the “kalizyl and kalizul” shadertoy by Stefan Berke.
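If you haven’t seen “Kali set” fractals before, the core is a surprisingly tiny fold-and-invert iteration, roughly like the sketch below (written from memory of the common formulation; the demo’s actual shaders are more involved):

```cpp
#include <cmath>

struct float3 { float x, y, z; };
static float3 fabs3(float3 v) { return { std::fabs(v.x), std::fabs(v.y), std::fabs(v.z) }; }
static float  dot3(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// "Kali set" style iteration as commonly seen in shadertoys: repeatedly fold
// the point with abs(), invert it around the origin and offset by a per-scene
// parameter. The accumulated magnitudes are then mapped to density/color
// however the artist likes.
static float KalisetOrbit(float3 p, float3 param, int iterations)
{
    float accum = 0.0f;
    for (int i = 0; i < iterations; ++i)
    {
        float3 a = fabs3(p);
        float d = dot3(a, a);
        p = { a.x / d - param.x, a.y / d - param.y, a.z / d - param.z };
        accum += std::sqrt(dot3(p, p));
    }
    return accum / iterations;
}
```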


Various details about Handles

I wanted to fix one thing about Unity editor Handles, and accidentally ended up fixing some more. So here’s more random things about Handles than you can handle!

What I wanted to fix

For several years now, I’ve wanted the built-in tool (Move/Scale/Rotate) handle lines to be thicker than one pixel. One pixel might have been fine when displays were 96 DPI, but these days a single pixel is a tiny little thing.

Often with some real colored content behind the tool handles, my old eyes can’t quite see them. Look (click for full size):

That’s way too thin for my taste. So I looked at how other similar programs (Maya, Blender, 3dsmax, Modo, Unreal) do it, and all of them except 3dsmax have thicker tool handles. Maya in particular has extremely customizable handles where you can configure everything to your taste – perhaps too much even :)

Recently at work I got an “ok aras do it!” approval for making the handles thicker, and so I went:

That by itself is not remarkable at all (well, except that I spent way too much time figuring out how to make something that is placed in world space be a constant size in pixel space, LOL). But while playing around with handle thickness, I noticed a handful of other little things that were bugging me; I had just never clearly thought about them before.
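For the curious, the “constant size in pixel space” part boils down to figuring out how many world units one screen pixel covers at the handle’s distance, and scaling the handle by that. A sketch of the math, assuming a symmetric perspective projection (my illustration, not the actual editor code):

```cpp
#include <cmath>

// How many world-space units does one screen pixel cover at a given distance
// in front of a perspective camera? Scale a world-space handle by
// (desiredPixelSize * WorldUnitsPerPixel(...)) to keep it a constant on-screen
// size. An orthographic camera would just use orthoHeight / viewportPixelHeight.
float WorldUnitsPerPixel(float distanceToCamera, float verticalFovRadians, float viewportPixelHeight)
{
    float frustumHeightAtDistance = 2.0f * distanceToCamera * std::tan(verticalFovRadians * 0.5f);
    return frustumHeightAtDistance / viewportPixelHeight;
}
```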

Here they are, in no particular order, for no other reason than as an excuse to post some pictures & gifs!

Hover indicator with yellow color is not great

When the mouse is hovering over (or near enough to) a handle axis, it changes color to yellow-ish, to indicate that when you click, the handle will get picked up. The problem is, “slight yellow” is too similar to, say, the green (Y axis) color, or to the gray “rotate facing screen” outline. In the two images below, one of them has the outer rotation circle highlighted since I’m hovering over it. It’s not easy to tell that the highlight is actually happening.

What I tried doing instead is: upon hovering, the existing handle color turns a bit brighter and more opaque, and the handle gets a bit thicker.

Move/Scale tool cap hover detection is weird

In some cases, the mouse being directly over one cap still picks another (close, but further from the mouse) axis. Here, the mouse is directly over the red axis cap, yet the blue one is “picked”. That seems to be fine with axes, but the “cap” part feels wonky.

I dug into the code, and the cause turned out to be that cone & cube caps use very approximate “distance to sphere” mouse hover calculation. E.g. for the Move tool, these spheres are what the arrow caps “look like” for the mouse picking. Which does not quite match the actual cone shape :) For a Scale tool, the virtual spheres are even way larger than cubes. A similar issue was with the scene view axis widget, where axis mouse hit detection was using spheres of this size for picking :facepalm:

Now, I get that checking distance to sphere is much easier, particularly when it has to be done in screen space, but come on. A sphere is not a great approximation for a cone :) Fixed this by writing “distance to cone” and “distance to cube” (in screen space) functions. Underneath both are “distance to a 2D convex hull of these points projected into screen space”. Yay my first convex hull code, I don’t remember ever writing that!
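The “distance to a 2D convex hull” bit is conceptually simple: project the cap’s corner points to screen space, build their convex hull, and the distance is zero if the mouse is inside, otherwise the distance to the nearest hull edge. Something along these lines (my own sketch, not the actual editor code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// z component of (b-a) x (c-a); positive when c is to the left of edge a->b.
static float Cross(Vec2 a, Vec2 b, Vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

static float DistanceToSegment(Vec2 p, Vec2 a, Vec2 b)
{
    Vec2 ab = { b.x - a.x, b.y - a.y };
    Vec2 ap = { p.x - a.x, p.y - a.y };
    float t = (ap.x * ab.x + ap.y * ab.y) / (ab.x * ab.x + ab.y * ab.y + 1e-8f);
    t = std::clamp(t, 0.0f, 1.0f);
    Vec2 closest = { a.x + ab.x * t, a.y + ab.y * t };
    return std::hypot(p.x - closest.x, p.y - closest.y);
}

// Distance from point p to a convex polygon given in counter-clockwise order:
// zero when the point is inside, otherwise the distance to the nearest edge.
float DistanceToConvexPolygon(Vec2 p, const std::vector<Vec2>& hull)
{
    bool inside = true;
    float minDist = 1e30f;
    for (size_t i = 0; i < hull.size(); ++i)
    {
        Vec2 a = hull[i];
        Vec2 b = hull[(i + 1) % hull.size()];
        if (Cross(a, b, p) < 0.0f)   // point is on the outer side of this edge
            inside = false;
        minDist = std::min(minDist, DistanceToSegment(p, a, b));
    }
    return inside ? 0.0f : minDist;
}
```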

At almost parallel viewing directions, axis is wrongly picked

Here, move the mouse over the plane widget and try to move – poof, the Z axis gets picked instead (see this tweet too).

What I did: 1) when the axis is almost faded out, never pick it. The code was trying to do that, but only when the axis is almost entirely invisible. 2) for a partially faded out axis, make the hover indicator not be faded out, so you can clearly see it being highlighted.

Cap draw ordering issues (Z always on top of X etc.)

Handle axes were always rendered in X, Y, Z order. Which makes the Z always look “in front” of X even when it’s actually behind:

Fixing this one is easy, just sort the axis handles before processing them.
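In sketch form, that’s just ordering the handles by their view-space depth before drawing (my illustration of the idea, not the actual editor code):

```cpp
#include <algorithm>
#include <vector>

struct AxisHandle
{
    int   axisIndex;   // 0=X, 1=Y, 2=Z
    float viewDepth;   // distance from the camera along the view direction
};

// Draw handle caps back-to-front so whichever cap is actually nearer to the
// camera ends up on top, instead of always drawing in X, Y, Z order.
void SortHandlesBackToFront(std::vector<AxisHandle>& handles)
{
    std::sort(handles.begin(), handles.end(),
              [](const AxisHandle& a, const AxisHandle& b) {
                  return a.viewDepth > b.viewDepth;   // farther first
              });
}
```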

Free rotate circle is barely visible

The inner circle of the rotation gizmo, which is for “free” (arcball) rotation, is barely visible:

Make it more visible. A tiny bit more visible when not hovering (left), thicker + more visible when hovering (right):

Move tool plane handles are on the opposite side in Ortho projection

With an orthographic scene view camera, the “move on plane” handles are on the opposite side of the gizmo:

Turns out that was just a sign bug.

And that’s it for this round of Handles tweaks! There’s a ton more that could be done (see the replies to Will’s tweet), but that’s for some other day. Nothing from the above has shipped or landed in the mainline code branch yet, by the way. So no promises :) Update: should ship in Unity 2020.2 alpha 9 soon!


Fourteen years at Unity o_O

Looks like I’ve been working at Unity for 14 years. What?!?! So here’s another blog post that looks at the past without presenting any useful information, similar to the ones from two, four, ten and eleven years ago.

A year ago I wrote how I started mentoring several juniors at work, and then how I’ve spent two years on the build system team.

What happened next is that I somehow managed to convince others (or someone else convinced me – it’s a blur) that I should stop being on the build system team, “steal” the juniors I was mentoring and create a whole new team. And so one thing led to another, and I ended up leading/managing a whole new 8-person team, with most of us being in the Unity Kaunas office. Due to lack of imagination, this was simply called the “Core Kaunas” team.

We spent most of 2019 focusing on improving version control (mostly Perforce) integration in Unity – fixing bugs, improving integration UI/UX, fixing lots of cases of versioned files being written to disk for no good reason (which under Perforce causes a checkout), and so on. See release notes items starting with “Version Control” in 2019.3 release notes for an example. Most of that work ships in 2019.3, some in 2020.1, some was backported all the way back to 2018.4 LTS. Most of what we did was either reported bugs / feature requests by users, or things coming from our own internal production(s), that for the first time ever used Perforce (on purpose! so that we could see the issues with our own eyes).

But also we managed to do some “random other” work, here’s a summary of what we casually did on the side in 2019 Q3 and Q4 respectively:

For a team where 5 out of 8 people have only about a year of “professional programming/QA” experience, and where this “side work” is not even our main focus area, I think that’s pretty decent! Happy there.

Starting this year, my team will be transitioning towards “various quality-of-life improvements” work, mostly in the editor, based on artist/production feedback. Not “large features”, but various “low hanging fruit” that is relatively easy to do, but that for whatever reason no one has done yet. Some of that is because teams are busy doing more important stuff, some because the work lands in-between teams with unclear ownership, and so on. “Editor Quality of Life” in the Q4 random work image above is basically what we’re after. Version Control integration and improvements we’ll hand over to another team. Let’s see how that goes.

On a more personal side of work, I keep on doing short summaries of every week, and then at end of year I write up a “wot is it that aras did” doc. Partially because every one of my bosses is in some other office and I rarely get to see them, and partially so that I can argue I’m worth my salary :), or whatever.

Happy to report that I managed to delete 747 thousand lines of code last year! That is cheating a bit though, since half a million of that was versioned Quicktime headers, and it turns out they are huge. Most of the other deletions were things like “remove the old C#<->C++ bindings system”, which is no longer used. Anyway, I like deleting code, and this year was good.

Looking forward to what “my” team will be able to pull off in 2020, and also how juniors on the team will grow. Let’s make and ship more of these improvements, optimizations and “someone should totally have done this by now” type of things. \o/


It's Raining Cubes

So a dozen years ago I wrote “hey, 4 kilobyte intros are starting to get interesting”. Fast forward to 2019, and we made an attempt to make a 4KB demo with my team at work. None of us have any previous size-limited demo experience? ✅ We have no idea what the demo would be about? ✅ Does it have a high chance of being totally “not good”? ✅ So we did the only thing that made sense in this situation – try to do it!

We did not follow the modern trend of making 4KB demos that are purely “one giant shader that does raymarching”, and instead did most of the code on the CPU in C++. Physics simulation? Sure why not. Deferred rendering? Of course. Just write it in regular programming style, without paying that much attention to size coding tricks (see in4k or sizecoding)? Naturally.

Maybe that’s why this did not fit into 4 kilobytes :) and ended up being 6.6KB in size.

Executable: ItsRainingCubes.exe (6.6KB)
Source: Zip with VS2019 projects
Pouët: Link

Tech details:

  • Verlet style physics simulation. Simulates points and springs between them; also approximates each cube with a sphere :) and pushes points outside of them. See the sketch after this list.
  • Deferred rendering (world’s most pointless deferred usage?) with colors, normals and the Z-buffer. There’s one shadowmap for the light source. The whole G-buffer is blurred (including depth and normals too!) with a Masaki Kawase style iterative filter, and then the lighting is computed. That’s what produces the bloom-like outlines, soft edges on cubes and weird shadow shapes. It should not have worked at all.
  • Music is made in Renoise, using 4klang for playback.
  • Executable compressed using Crinkler, and shaders minified using Shader Minifier.
  • Visual Studio 2019, C++, and OpenGL (compatibility profile with some GL3/GL4 extensions) were used.
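For reference, the “Verlet style” part mentioned in the list is the classic position-based setup: keep the current and previous position per point, and treat springs as simple distance constraints. A minimal sketch of the general technique (my illustration, not the demo’s actual code):

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

struct Particle
{
    Vec3 pos;       // current position
    Vec3 prevPos;   // position from the previous step (implicitly encodes velocity)
};

struct Spring
{
    int   a, b;      // particle indices
    float restLength;
};

// Classic Verlet integration: the new position is extrapolated from the
// current and previous ones, plus acceleration (here just gravity).
void IntegrateVerlet(std::vector<Particle>& particles, float dt)
{
    const Vec3 gravity = { 0.0f, -9.81f, 0.0f };
    for (Particle& p : particles)
    {
        Vec3 cur = p.pos;
        p.pos.x += (cur.x - p.prevPos.x) + gravity.x * dt * dt;
        p.pos.y += (cur.y - p.prevPos.y) + gravity.y * dt * dt;
        p.pos.z += (cur.z - p.prevPos.z) + gravity.z * dt * dt;
        p.prevPos = cur;
    }
}

// Springs as distance constraints: nudge both endpoints towards the rest length.
void SatisfySprings(std::vector<Particle>& particles, const std::vector<Spring>& springs)
{
    for (const Spring& s : springs)
    {
        Vec3& a = particles[s.a].pos;
        Vec3& b = particles[s.b].pos;
        Vec3 d = { b.x - a.x, b.y - a.y, b.z - a.z };
        float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z) + 1e-6f;
        float correction = 0.5f * (len - s.restLength) / len;
        a.x += d.x * correction;  a.y += d.y * correction;  a.z += d.z * correction;
        b.x -= d.x * correction;  b.y -= d.y * correction;  b.z -= d.z * correction;
    }
}
```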

Credits: Ascentress, shana, TomasK, NeARAZ.

Youtube video capture:

While it’s not impressive by any standards, I kinda expected us to achieve even less. Again, no previous experience in this area whatsoever! Four (well ok, almost seven…) kilobytes is not much, but with tools like Crinkler (great executable size reporting there, by the way – here’s an example) it’s manageable. There’s some wrestling with MSVC if you want to ignore all the default libraries: you have to provide your own implementations of _fltused, _ftol2(), _ftol2_sse(), memset(); load functions like cos() manually from the old msvcrt.dll, and so on. Funtimes. But once the basic setup is done, then it’s “just programming” really.
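For a flavor of what that wrestling looks like, roughly this kind of thing has to live somewhere in the project when building without the default libraries (a sketch of the usual tricks from memory; exact details vary by MSVC version and project setup):

```cpp
#include <windows.h>

// The compiler references this symbol whenever floating point is used; with
// the CRT gone we have to define it ourselves.
extern "C" int _fltused = 0;

// memset can be emitted by the compiler even if never called explicitly, so
// provide a tiny one and tell MSVC not to treat it as an intrinsic.
#pragma function(memset)
extern "C" void* memset(void* dst, int value, size_t count)
{
    unsigned char* p = (unsigned char*)dst;
    while (count--)
        *p++ = (unsigned char)value;
    return dst;
}

// Instead of linking a math library, grab functions like cos() at runtime
// from the old msvcrt.dll that ships with Windows anyway.
typedef double (*MathFunc)(double);
static MathFunc g_cos;

static void LoadMathFunctions()
{
    HMODULE msvcrt = LoadLibraryA("msvcrt.dll");
    g_cos = (MathFunc)GetProcAddress(msvcrt, "cos");
}
```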

That’s it! Go make some demos.