EXR: Lossless Compression

One thing led to another, and I happened to be looking at various lossless compression options available in the OpenEXR image file format.

EXR has several lossless compression options, and most of the available material (e.g. “Technical Introduction to OpenEXR” and others) basically ends up saying: Zip compression is slow to write, but fast to read; whereas PIZ compression is faster to write, but slower to read than Zip. PIZ is the default one used by the library/API.

How “slow” is Zip to write, and how much “faster” is PIZ? I decided to figure that out :)

Test setup

Hardware: MacBookPro 16" (2019, Core i9 9980HK, 8 cores / 16 threads). I used the latest OpenEXR version (3.1.1), compiled with Apple Clang 12.0 in RelWithDebInfo configuration.

Everything was tested on a bunch of EXR images of various types: rendered frames, HDRI skyboxes, lightmaps, reflection probes, etc. All of them tend to be “not too small” – 18 files totaling 1057 MB of raw uncompressed (RGBA, 16-bit float) data.

  • 2048x2048, render of Blender 'Monster Under the Bed' sample scene
  • 1024x1024, lightmap from a Unity project
  • 1024x1024, lightmap from a Unity project
  • 1180x1244, a depth buffer pyramid atlas
  • 256x256, file used for VFX in a Unity project
  • 1024x1024, prerendered explosion flipbook
  • 4096x4096, 'rocks_ground_02' normal map from Polyhaven
  • 2048x1556, ACES reference 'DigitalLAD' image
  • 1920x1080, ACES reference 'SonyF35.StillLife' image
  • 3840x2160, render of Blender 'Lone Monk' sample scene
  • 4096x2560, render from Houdini
  • 4096x2560, render from Houdini
  • 8192x4096, 'Gareoult' from Unity HDRI Pack
  • 8192x4096, 'Kirby Cove' from Unity HDRI Pack
  • 4096x2048, 'lilienstein' HDRI from Polyhaven
  • 2048x1024, 'Treasure Island' from Unity HDRI Pack
  • 1536x256, reflection probe from a Unity project
  • 1536x256, reflection probe from a Unity project
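As a quick sanity check of the 1057 MB figure, here is a small Python sketch that sums the raw size of the 18 images listed above. It assumes 8 bytes per pixel (RGBA, one 16-bit half float per channel) and that "MB" means 2^20 bytes; both are my assumptions, though they do reproduce the quoted total.

```python
# Resolutions (width, height) of the 18 test images listed above.
RESOLUTIONS = [
    (2048, 2048), (1024, 1024), (1024, 1024), (1180, 1244),
    (256, 256), (1024, 1024), (4096, 4096), (2048, 1556),
    (1920, 1080), (3840, 2160), (4096, 2560), (4096, 2560),
    (8192, 4096), (8192, 4096), (4096, 2048), (2048, 1024),
    (1536, 256), (1536, 256),
]

BYTES_PER_PIXEL = 4 * 2  # RGBA, 16-bit half float per channel

def raw_size_mib(resolutions):
    """Total uncompressed size in MiB (2**20 bytes)."""
    total_bytes = sum(w * h * BYTES_PER_PIXEL for w, h in resolutions)
    return total_bytes / 2**20

print(f"{len(RESOLUTIONS)} files, {raw_size_mib(RESOLUTIONS):.0f} MB")  # 18 files, 1057 MB
```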

What are we looking for?

As with any lossless compression, there are at least three factors to consider:

  • Compression ratio. The larger, the better (e.g. “4.0” ratio means it produces 4x smaller data).
  • Compression performance. How fast does it compress the data?
  • Decompression performance. How fast can the data be decompressed?

Which ones are more important than others depends, as always, on a lot of factors. For example:

  • If you’re going to write an EXR image once, and use it a lot of times (typical case: HDRI textures), then compression performance does not matter that much. On the other hand, if for each written EXR image it will get read just once or several times (typical case: capturing rendered frames for later encoding into a movie file), then you would appreciate faster compression.
  • The slower your storage or transmission medium is, the more you care about compression ratio. Or to phrase it differently: the slower I/O is, the more CPU time you are willing to spend to reduce I/O data size.
  • Compression ratio can also matter when data size is costly. For example, modern SSDs might be fast, but their capacity can still be a limiting factor. Or a network transmission of files might be fast, but you’re paying for the bandwidth used.

There are other things to keep in mind about compression: memory usage, technical complexity of compressor/decompressor, ability to randomly access parts of image without decompressing everything else, etc. etc., but let’s not concern ourselves with those right now :)

Initial (bad) result

What do we have here?

These are two plots: compression ratio vs. compression performance, and compression ratio vs. decompression performance. In both cases, the best place on the chart is top right – the largest compression ratio, and the best performance.

For performance, I’m measuring it in MB/s, in terms of uncompressed data size. That is, if we have 1GB worth of raw image pixel data and processing it took half a second, that’s 2GB/s throughput (even if compressed data size might be different).
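Here is a minimal sketch of the two metrics as defined in this post: compression ratio, and throughput measured against the *uncompressed* size. The function names are hypothetical, and I'm assuming 1 MB = 2^20 bytes.

```python
MB = 2**20  # assumption: 1 MB = 2**20 bytes

def compression_ratio(raw_bytes, compressed_bytes):
    """E.g. 4.0 means the compressed data is 4x smaller than the raw data."""
    return raw_bytes / compressed_bytes

def throughput_mb_s(raw_bytes, seconds):
    """Throughput measured against the *uncompressed* size, no matter
    how large the compressed file ended up being."""
    return raw_bytes / MB / seconds

# 1 GB of raw pixel data processed in half a second -> 2048 MB/s (2 GB/s).
print(throughput_mb_s(2**30, 0.5))  # 2048.0
```

Measuring against the raw size means two codecs with different ratios are compared on how much image data they get through per second, which is what matters to the application.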

The time it takes to write or read the file itself is included in the measurement. This means that the results depend not only on the CPU, but also on storage (disk speed, filesystem speed). My test is on a 2019 MacBookPro, which has a “quite fast” SSD by today’s standards, and an average (not too fast, not too slow) filesystem. I’m flushing the OS file cache between writing and reading the file (via system("purge")) so that EXR file reading is closer to a “read a new file” scenario.

What we can see from the above is that:

  • Writing an uncompressed EXR goes at about 400 MB/s, reading at 1400 MB/s,
  • Zip and PIZ compression ratio is roughly the same (2.4x),
  • Compression and decompression performance is quite terrible. Why?

Turns out, the OpenEXR library is single-threaded by default. The file format itself is much better than the image formats of yore (e.g. PNG, which is completely single threaded, fully, always) – EXR format in most cases splits up the whole image into smaller chunks that can be compressed and decompressed independently. For example, Zip compression does it on 16 pixel row chunks – this loses some of the compression ratio, but each 16-row image slice can be compressed & decompressed in parallel.

If you tell the library to use multiple threads, that is. By default it does not. So, one call to Imf::setGlobalThreadCount() later…
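To show why independent chunks make threading easy, here is a Python sketch (not OpenEXR code) that splits an image into 16-scanline chunks and deflates each chunk on a thread pool with zlib. The real EXR Zip codec also reorders and delta-encodes the bytes before deflating; this sketch skips that and just illustrates the chunking.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

ROWS_PER_CHUNK = 16  # EXR Zip compression works on 16-scanline chunks

def split_into_chunks(pixels, width, height, bytes_per_pixel, rows_per_chunk=ROWS_PER_CHUNK):
    """Split a tightly packed image into independent groups of scanlines."""
    row_bytes = width * bytes_per_pixel
    if len(pixels) != row_bytes * height:
        raise ValueError("pixel buffer does not match image dimensions")
    return [
        pixels[y * row_bytes:min(y + rows_per_chunk, height) * row_bytes]
        for y in range(0, height, rows_per_chunk)
    ]

def compress_chunks(chunks, threads):
    # Chunks share no state, so each one can be deflated on its own worker.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(zlib.compress, chunks))

def decompress_chunks(compressed, threads):
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(zlib.decompress, compressed))

# Hypothetical 64x40 image at 8 bytes/pixel (RGBA half float), filled with a gradient.
width, height, bpp = 64, 40, 8
pixels = bytes((x + y) % 256 for y in range(height) for x in range(width * bpp))
chunks = split_into_chunks(pixels, width, height, bpp)
packed = compress_chunks(chunks, threads=4)
restored = b"".join(decompress_chunks(packed, threads=4))
print(len(chunks), restored == pixels)  # 3 True
```

Smaller chunks mean more parallelism and cheaper random access, at the cost of the compressor seeing less context per chunk, which is the compression ratio loss mentioned above.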

Threaded result

There, much better! (16 threads on this machine)

  • Compression ratio: Zip and PIZ EXR compression types both have very similar compression ratio, making the data 2.4x smaller.
  • Writing: If you want to write EXR files fast, you want PIZ. It’s faster than writing them uncompressed (400 -> 600 MB/s), and about 3x faster to write than Zip (200 -> 600 MB/s). Zip is about 2x slower to write than uncompressed.
  • Reading: However, if you mostly care about reading files, you want Zip instead – it’s about the same performance as uncompressed (~1600 MB/s), whereas PIZ reads at a lower 1200 MB/s.
  • RLE compression is fast both at writing and reading, but compression ratio is much lower at 1.7x.
  • Zips compression is very similar to Zip; it’s slightly faster but has a lower compression ratio. Internally, instead of compressing 16-pixel-row image chunks, it compresses each pixel row independently.

Next up?

So that was with OpenEXR library and format as-is. In the next post I’ll look at what could be done if, hypothetically, one were free to extend or modify the format just a tiny bit. Until then!


Texture Compression on Apple M1

In the previous post I did a survey of various GPU format compression libraries. I just got an Apple M1 MacMini to help port some of these compression libraries to it, and of course decided to see some performance numbers. As everyone has already noticed, the M1 CPU is quite impressive. I’m comparing three setups here:

  1. MacBookPro (2019 16", 8 cores / 16 threads). This is basically the “top” MacBook Pro you can get in 2020, with 9th generation Coffee Lake Core i9 9980HK CPU. It starts at $3000 for this CPU.
  2. MacMini (M1, 4 perf + 4 efficiency cores). It starts at $700 for this CPU (but realistically you’d want maybe a $1300 model for more decent RAM/SSD sizes).
  3. The same MacMini, but testing Intel/x64 builds of the compressors under Rosetta 2 translator.

Multi-threaded compression

Here we’re compressing a bunch of textures into various GPU formats, using various compression libraries, and various quality settings of those. See previous post for details. The tests are done by using all the CPU cores, and results are in millions of pixels per second (higher = better).

Desktop BC7 format, using ISPCTextureCompressor and bc7e libraries:

Desktop BC1/BC3 (aka DXT1/DXT5) format, using ISPCTextureCompressor and stb_dxt libraries:

Mobile ASTC 4x4 format, using ISPCTextureCompressor and astcenc (2.3-ish) libraries:

Mobile ETC2 format, using Etc2Comp and etcpak libraries:

Overall the 2019 MacBookPro is from “a bit faster” to “about twice as fast” as the M1, when compression is fully multi-threaded. This makes sense due to two things:

  • 2019 MBP uses 16 threads, whereas M1 uses 8 threads. In both cases these are not “100% the same” threads, since the former only has 8 “real” cores, with two SMT threads per core; and the latter has 4 “high performance” cores and 4 “low power” cores. But with some squinting we should probably expect MBP to be almost 2x faster overall, just due to higher CPU thread count.
  • Some of the texture compressors (ISPCTexComp, bc7e) use AVX2 code paths for “almost ideal” speedup: the whole compression algorithm is SIMD, using AVX2 8-wide execution when available. These compressors are written in the ISPC language. M1, on the other hand, only has 4-wide SIMD execution (via NEON). If a program can take really good advantage of wider SIMD, then the Intel CPU has an advantage there.

Summary: on all-cores texture compression, the 2019 MBP is about 2x faster than M1 for compressors written with ISPC (ISPCTexComp, bc7e) that take really good advantage of AVX2. With other compressors, the 2019 MBP is “a bit” faster. With the etcpak ETC2 compressor, M1 is faster than the 2019 MBP.

Rosetta 2 translator for x64/SSE works impressively well, reaching ~70-90% performance of natively compiled Arm+NEON code.

Single-threaded compression

Ok, what if we limited compression to a single CPU thread? For texture compression itself that does not make a whole lot of sense, but it’s interesting to see how 2019 MBP and M1 compare without the “MBP has more threads” advantage. You could maybe extrapolate how M1 CPU would behave if it had more cores.

Same formats and compressors as above, just single threaded everywhere:

Here it’s basically: if a compressor is fully SIMD with AVX2 (ISPCTexComp, bc7e), then 2019 MBP is 1.5x faster than M1. Otherwise M1 is a bit faster.

Multi-thread speedup

Once we have multi-threaded and single-threaded numbers, we can see what the effective speedup from using all the CPU cores is. Ideally the 2019 MBP would be 16x faster, and M1 would be 8x faster, since that’s the number of threads we’re distributing the work to. In practice, as mentioned above, not all of these threads are fully independent or equal. And the computation could hit some other limits, e.g. RAM bandwidth and so on. Anyway, what’s the effective speedup for texture compression, when using all the CPU cores?

  • 2019 MacBook Pro is ~6x faster from using all cores. This one’s curious, since it’s even below the “full 8 cores” scaling. Maybe loading all the SMT threads ends up doing more harm than good here, or we’re hitting some other bottleneck that prevents further scaling.
  • M1 is ~4.5x faster from using all cores. This either means there’s a fairly large performance difference between “performance” and “efficiency” cores, or we’re hitting some other bottleneck.
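The speedups above can be expressed as parallel efficiency, i.e. the fraction of ideal linear scaling actually achieved. This small sketch just redoes that arithmetic with the approximate numbers quoted in the list.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling achieved (1.0 = perfect scaling)."""
    return speedup / threads

# Approximate multi-thread speedups quoted above.
mbp_vs_threads = parallel_efficiency(6.0, 16)  # against all 16 SMT threads
mbp_vs_cores = parallel_efficiency(6.0, 8)     # against the 8 physical cores
m1 = parallel_efficiency(4.5, 8)               # 4 performance + 4 efficiency cores
print(mbp_vs_threads, mbp_vs_cores, m1)
```

Against its 8 physical cores, the 2019 MBP reaches about 75% efficiency, while M1 reaches about 56% of its 8 cores, consistent with the efficiency cores contributing noticeably less than the performance ones.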

Anyway, that’s it! Now I’m curious to see what the next iteration of Apple CPUs will look like. M1 is impressive!


Texture Compression in 2020

I’ve spent some time looking at various texture compression formats and libraries (the ones meant for GPUs, i.e. “ASTC, BC7” kind, not the “PNG, JPG” kind). Here’s a fully incomprehensible chart to set the stage:

If your reaction is “whoa what is this mess?!”, that’s good. It is a mess! But we’ll get through that :)

Backstory on GPU compression formats

The majority of textures on the GPU end up using “block compression” formats, to save sampling bandwidth and memory, and to make texture loads faster. A common theme between them is that all of them are lossy, have a fixed compression ratio (e.g. 4:1 compared to uncompressed), and are based around the idea of encoding some small pixel block (e.g. 4x4 pixels) using fewer bits than would normally be needed.
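Since every block has a fixed size in both pixels and bits, the bit rate of each format falls out of simple arithmetic. This sketch uses the standard block sizes (64 bits for BC1, 128 bits for BC3, BC7 and all ASTC block footprints) and compares against uncompressed 32-bit RGBA.

```python
def bits_per_pixel(block_w, block_h, block_bits):
    return block_bits / (block_w * block_h)

def ratio_vs_rgba8(bpp):
    """Fixed compression ratio compared to uncompressed 32-bit RGBA."""
    return 32 / bpp

FORMATS = {
    "BC1 (DXT1)": (4, 4, 64),
    "BC3 (DXT5)": (4, 4, 128),
    "BC7": (4, 4, 128),
    "ASTC 4x4": (4, 4, 128),
    "ASTC 6x6": (6, 6, 128),
    "ASTC 8x8": (8, 8, 128),
}

for name, (w, h, bits) in FORMATS.items():
    bpp = bits_per_pixel(w, h, bits)
    print(f"{name:11} {bpp:.2f} bits/pixel, {ratio_vs_rgba8(bpp):.0f}:1 vs RGBA8")
```

ASTC keeps the block at 128 bits and varies the pixel footprint, which is how it offers many bit rates (8, ~3.56, 2 bits/pixel and more) from one format.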

This is very different from image storage compression formats like PNG (lossless) or JPG (lossy) that are based on somehow transforming an image, (in JPG case) throwing away some detail, and compressing the data using traditional compression algorithms.

Why don’t GPUs do “usual” compression like PNG or JPG? Because a texture needs random access: if something rendered on screen needs to sample a texture pixel at coordinate (x=17,y=287), it would be very inconvenient if, in order to get that pixel color, the GPU had to decompress all the previous pixels too. With block-based compression formats, the GPU only needs to read the small, fixed-size chunk of bytes of the compressed block that the pixel belongs to, and decompress that. Since blocks are always the same size in pixels (e.g. 4x4), and always the same size in bits (e.g. 128 bits), that works out nicely.
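Here is a sketch of that random-access property: finding the block that holds texel (17, 287) is pure arithmetic. It assumes a simple linear, row-major layout of blocks (as in typical texture files); actual GPU memory layouts are usually tiled or swizzled in vendor-specific ways, so real addresses differ.

```python
def block_location(x, y, width, block_w=4, block_h=4, block_bytes=16):
    """Byte offset of the block containing texel (x, y) in a linearly laid
    out block-compressed image, plus the texel's position inside that block."""
    blocks_per_row = (width + block_w - 1) // block_w  # partial blocks are padded
    bx, by = x // block_w, y // block_h
    offset = (by * blocks_per_row + bx) * block_bytes
    return offset, (x % block_w, y % block_h)

# Texel (17, 287) in a 1024-pixel-wide texture with 4x4 blocks of 128 bits:
print(block_location(17, 287, 1024))  # (290880, (1, 3))
```

Only those 16 bytes need to be fetched and decoded; nothing before them in the image matters.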

I’m leaving out lossless texture or framebuffer compression that some GPUs do. As far as I know, these are also somewhat block-based, but exact details are extremely GPU specific and mostly not public information. It’s also not something you control; the lossless compression in various parts of GPU pipeline is largely applied without you doing anything.

Different block-based GPU compression schemes have been developed over the years. Hans-Kristian Arntzen has a nice overview in the “Compressed GPU texture formats – a review and compute shader decoders – part 1” blog post. Wikipedia has some detail too (S3TC, ETC, ASTC), but these pages read more like spec-sheet bullet point lists.

A simplified (and only slightly incorrect) summary of texture compression format situation could be:

  • On PC, you’d use BC7 (modern) or DXTC (old) formats.
  • On mobile, you’d use ASTC (modern) or ETC (old) formats.

Backstory on compression libraries

How do you produce this “block-compressed” image? You use some piece of code that knows how to turn raw uncompressed pixel data into compressed bits in one of the block-compressed formats. There are multiple such libraries out there, and in the complexicated-looking chart above I’ve looked at some of them, for some of the compressed formats.

Some formats are better than others.

Some libraries are better than others.

And that was my evaluation :) Read on for details!

Testbed setup

I made a small application that contains multiple texture compressors (all used in “library” form instead of standalone compressor executables, so that the measured compression performance does not include file I/O), and gathered a bunch of textures from various Unity projects (a mix of everything: albedo maps, normal maps, sprites, UI textures, skyboxes, VFX textures, roughness maps, ambient occlusion maps, lightmaps etc.).

The application loads an uncompressed image, compresses it into different formats using different compression libraries and their settings, and evaluates both performance (in terms of Mpix/s) and quality (in terms of Luma-PSNR). Now, “how to evaluate quality” is a very wide and complicated topic (PSNR, SSIM, Luma-PSNR, other approaches, etc.). Just this automated “compute one number” is often not enough to capture some of the typical block compression artifacts, but for simplicity’s sake I’ve settled on Luma-PSNR here. For the compressors, I’ve asked them to use a “perceptual” error metric mode, when that is available.
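For reference, here is one way to compute a Luma-PSNR number. This is a minimal sketch under assumptions: the post doesn't say which luma weights were used, so Rec. 709 weights are assumed, and pixels are plain 8-bit (R, G, B) tuples.

```python
import math

# Assumption: Rec. 709 luma weights; the post does not specify the weights.
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)

def luma(rgb):
    return sum(w * c for w, c in zip(LUMA_WEIGHTS, rgb))

def luma_psnr(original, decoded, peak=255.0):
    """PSNR in dB between two equally sized lists of (R, G, B) pixels,
    computed on luma only. Returns infinity for identical images."""
    if len(original) != len(decoded) or not original:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((luma(a) - luma(b)) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)

# Every pixel off by 10 in luma -> MSE of 100 -> about 28.1 dB.
print(luma_psnr([(100, 100, 100)] * 4, [(110, 110, 110)] * 4))
```

Because PSNR is logarithmic, each additional 10 dB corresponds to a 10x lower mean squared error.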

For normal maps I actually do “compute some lighting” on them (both uncompressed and compressed+decompressed), and evaluate Luma-PSNR on the result. In my experience this better captures the “effective quality” of lossy normal map compression. Additionally, all the normal maps are passed into the compressors with a (Y,Y,Y,X) (aka “DXT5nm”) swizzle, on the assumption that since the X & Y channels are mostly uncorrelated, the compressor will be able to capture them better by having one of them in RGB, and the other one in Alpha. For compressors that have a special “normal map” compression mode, that is used too.
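Here is a sketch of the swizzle plus a lighting step. The post doesn't say exactly what lighting is computed, so a plain Lambert N·L term with a fixed light direction is used as a hypothetical stand-in. Z is reconstructed from X and Y, assuming unit-length tangent-space normals.

```python
import math

def swizzle_dxt5nm(texel):
    """(R, G, B, A) normal-map texel -> (Y, Y, Y, X): Y in RGB, X in alpha."""
    r, g, _b, _a = texel
    return (g, g, g, r)

def unpack_dxt5nm(texel):
    """Rebuild a unit tangent-space normal from a (Y, Y, Y, X) texel in 0..255."""
    _r, g, _b, a = texel
    x = a / 255.0 * 2.0 - 1.0
    y = g / 255.0 * 2.0 - 1.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))  # Z is implied by unit length
    return (x, y, z)

def lambert(normal, light_dir):
    """Simple N.L diffuse term (hypothetical stand-in for the post's lighting)."""
    length = math.sqrt(sum(c * c for c in light_dir))
    return max(0.0, sum(n * l / length for n, l in zip(normal, light_dir)))

# A "flat" normal (pointing straight out of the surface) lit head-on gives ~1.0.
flat = swizzle_dxt5nm((128, 128, 255, 255))
print(round(lambert(unpack_dxt5nm(flat), (0.0, 0.0, 1.0)), 3))  # 1.0
```

Running this lighting on both the original and the compressed+decompressed normal map, then comparing the two results with Luma-PSNR, measures the error as it would show up in shading, rather than in raw channel values.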


1) Raw normal map; 2) “computed lighting” on the normal map used to evaluate compression quality; 3) compressed to ASTC 4x4 with ARM astcenc “medium” setting; 4) compressed to ASTC 6x6 with ARM astcenc “very fast” setting; 5) compressed to ETC2 with ETCPACK “slow” setting.

I have not tested all the block compression formats out there (e.g. BC6H or ASTC HDR for HDR content, or ETC, or some ASTC variants), and definitely have not tested all the compression libraries (some are not publicly available, some I haven’t been able to build easily, some I’m probably not even aware of, etc.).

For the ones I have tested, I compiled them with up to AVX2 (including BMI, FMA and POPCNT) instruction set options where applicable.

Everything was compiled with Xcode 11.7 (Apple clang version 11.0.3), running on a 2018 MacBook Pro (Core i9 2.9GHz, 6 cores / 12 threads). Multi-threading was used in a simple way: split up the image into separate chunks and compress each one on its own thread. Timings on a Windows box with different hardware (AMD ThreadRipper 1950X) and a different compiler (Visual Studio 2019) were more or less similar.
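The post doesn't describe exactly how the image was split, so here is a hypothetical version of such a split. It divides the rows into contiguous per-thread ranges whose boundaries fall on block-row boundaries, so that no compressed block straddles two workers.

```python
def split_rows_block_aligned(height, block_h, threads):
    """Split image rows into at most `threads` contiguous (start, end) ranges
    whose boundaries fall on block-row boundaries."""
    block_rows = (height + block_h - 1) // block_h  # last block row may be partial
    per_worker = (block_rows + threads - 1) // threads
    ranges = []
    for start in range(0, block_rows, per_worker):
        end = min(start + per_worker, block_rows)
        ranges.append((start * block_h, min(end * block_h, height)))
    return ranges

print(split_rows_block_aligned(100, 4, 3))  # [(0, 36), (36, 72), (72, 100)]
```

Each range can then be handed to its own thread; since blocks are encoded independently, the compressed outputs can simply be concatenated in order.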

Formats overview

Here’s a cleaner version of the first chart:

Each point on the scatter plot shows compression performance averaged across the whole test texture set (horizontal axis, Mpix/s, note the logarithmic scale!) against the resulting image quality (Luma-PSNR, higher values are better). The best possible results would be towards the upper right corner (great quality, fast compression performance).

Different compression formats use different point shapes (e.g. DXTC aka BC1/BC3 uses circles; BC7 uses triangles etc.).

Different compression libraries use their own hue (e.g. Intel ISPCTextureCompressor is green; ARM astcenc is blue). Libraries tested are:

  • ISPC: Intel ISPCTextureCompressor. DXTC, BC7, ASTC formats. Built with code from 2020 August, with determinism patch applied and using ISPC compiler version 1.14 (changes).
  • bc7e: Binomial bc7e. BC7 format. Built with code from 2020 October, using ISPC compiler version 1.14.
  • bc7enc: Rich Geldreich bc7enc. BC7. Built with code from 2020 April.
  • rgbcx: Rich Geldreich rgbcx. DXTC (BC1,BC3). Built with code from 2020 April.
  • stb_dxt: Fabian Giesen & Sean Barrett stb_dxt. DXTC (BC1,BC3). Built with code from 2020 July.
  • icbc: Ignacio Castano ICBC. DXTC (BC1 only). Built with code from 2020 August.
  • ARM: ARM astcenc. ASTC. Built with code from 2020 November.
  • ETCPACK: Ericsson ETCPACK. ETC2. Built on 2020 October.
  • Etc2Comp: Google Etc2Comp. ETC2. Built on 2020 October.
  • etcpak: Bartosz Taudul etcpak. ETC2. Built on 2020 October.
  • Apple: Apple <AppleTextureEncoder.h>. I can’t find any official docs online; here’s a random stack overflow answer that mentions it. This one is only available on Apple platforms, and only supports ASTC 4x4 and 8x8 formats. Also tends to sometimes produce non-deterministic results (i.e. slightly different compression bits on the same input data), so I only included it here as a curiosity.

From the overall chart we can see several things:

  • There are high quality texture formats (BC7, ASTC 4x4), where PSNR is > 42 dB. Both of these are 8 bits/pixel. There’s a range of compression performance vs. resulting quality options available, with BC7 achieving very similar quality to ASTC 4x4, while being faster to compress.
  • Medium quality (35-40 dB) formats include DXTC (BC1 at 4bits/pixel, BC3 at 8bits/pixel), ASTC 6x6 (3.6bits/pixel), ETC2 (4 or 8 bits/pixel). There’s very little quality variance in DXTC compressors, and most of them are decently fast, with the ISPC one approaching 1000 Mpix/s here. ETC2 achieves comparable quality with the same compression ratio, but is two to three orders of magnitude slower to compress. ASTC 6x6 has a lower bit rate for the same quality, and similar compression speed as ETC2.
  • “Meh” quality (<35 dB) formats or compressors. ASTC 8x8 is just two bits per pixel, but the artifacts really start to be visible there. Likewise, ETC2 “etcpak” compressor is impressively fast (not quite at DXTC compression speed though), but quality is also not great.
  • There is a 50000x compression speed difference between slowest compressor on the chart (ETC2 format, ETCPACK compressor at “slow” setting: average 0.013Mpix/s), and fastest compressor on the chart (DXTC format, ISPC compressor: average 654Mpix/s). And the slowest one produces lower quality images too!
    • Of course that’s an outlier; realistically usable ETC2 compressors are Etc2Comp “normal” (0.8Mpix/s) and “fast” (3.7Mpix/s) options. But then, for DXTC you have many options that go over 100Mpix/s while achieving comparable quality – still a 100x performance difference.
    • In the high quality group, there’s a performance gap too: BC7 compressors produce >45dB quality results at 10-20Mpix/s, whereas ASTC 4x4 achieves the same quality several times slower, at 2-8Mpix/s.
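To put the average speeds above into wall-clock terms, here is a small sketch that converts Mpix/s into seconds for a single texture. It assumes 1 Mpix = 10^6 pixels.

```python
def seconds_to_compress(width, height, mpix_per_s):
    """Time to compress one texture at a given average speed (1 Mpix = 10**6 pixels)."""
    return width * height / (mpix_per_s * 1e6)

slowest, fastest = 0.013, 654.0  # average Mpix/s quoted above
print(f"speed ratio: {fastest / slowest:,.0f}x")
print(f"2048x2048 at {slowest} Mpix/s: {seconds_to_compress(2048, 2048, slowest):.0f} s")
print(f"2048x2048 at {fastest} Mpix/s: {seconds_to_compress(2048, 2048, fastest) * 1000:.1f} ms")
```

For a single 2048x2048 texture, that is over five minutes versus a few milliseconds.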

Individual Formats

BC7 (PC): 8 bits/pixel, using 4x4 pixel blocks. Has been in all DX11-class PC GPUs (NVIDIA since 2010, AMD since 2009, Intel since 2012).

  • bc7e (red) looks like a clear winner. The various presets it has either have better quality, or faster compression (or both) compared to ISPC (green).
  • bc7enc is behind the curve in all aspects, which I think is by design – it seems more like an experiment “how to do a minimal BC7 compressor”; just use bc7e instead.
  • A bunch of image results that are low quality (<30 dB in the chart) are all normal maps. I haven’t deeply investigated why, but realistically you’d probably use BC5 format for normal maps anyway (which I’m not analyzing in this post at all).

ASTC 4x4 (mobile): 8 bits/pixel, using 4x4 pixel blocks. Has been in most modern mobile GPUs: ARM since Mali T624 (2012), Apple since A8 (2014), Qualcomm since Adreno 4xx (2015), PowerVR since GX6250 (2014), NVIDIA since Tegra K1 (2014).

  • ARM astcenc (blue) is a good choice there. It used to be several times slower in 1.x versions; back then ISPC (green) might have been better. Apple (purple) is of limited use, since it only works on Apple platforms and sometimes produces non-deterministic results. It’s impressively faster than the others though, at the expense of some quality loss.
  • Just like with BC7, all the low-quality results are from normal maps. This one’s curious, since I am passing a “normal map please” flag to the compressor here, and ASTC format is supposed to handle uncorrelated components in a nice way. Weird, needs more investigation! Anyway, at 8 bits/pixel for normal maps one can use EAC RG compression format, which I’m also not looking at in this post :)

DXTC (PC): 4 bits/pixel for RGB (BC1 aka DXT1), 8 bits/pixel for RGBA (BC3 aka DXT5). This has been on PCs since forever, and BC1 is still widely used if you need to get below 8 bits/pixel (it’s literally the only option available, since BC7 is always 8 bits/pixel).

It’s a fairly simple format, and that combined with the “has been there for 20 years” aspect means that effective compression techniques for it are well researched at this point.
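To show just how simple the format is, here is a minimal BC1 block decoder sketch: two RGB565 endpoint colors, then sixteen 2-bit palette indices, all in 8 bytes. Hardware decoders differ slightly in how they round the interpolated colors; this sketch uses plain integer division.

```python
import struct

def rgb565_to_rgb888(v):
    """Expand a 5:6:5 packed color to 8 bits per channel by bit replication."""
    r = (v >> 11) & 0x1F
    g = (v >> 5) & 0x3F
    b = v & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into a 4x4 grid (rows) of (R, G, B, A) texels."""
    if len(block) != 8:
        raise ValueError("a BC1 block is exactly 8 bytes")
    c0, c1, indices = struct.unpack("<HHI", block)  # two endpoints + 16 2-bit indices
    e0 = rgb565_to_rgb888(c0)
    e1 = rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: two extra colors at 1/3 and 2/3 between the endpoints.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(e0, e1)) + (255,)
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(e0, e1)) + (255,)
    else:
        # Three-color mode: the midpoint color plus transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(e0, e1)) + (255,)
        p3 = (0, 0, 0, 0)
    palette = (e0 + (255,), e1 + (255,), p2, p3)
    # Texel (x, y) uses bits 2*(4*y + x) and 2*(4*y + x) + 1 of the index word.
    return [[palette[(indices >> (2 * (4 * y + x))) & 3] for x in range(4)]
            for y in range(4)]
```

An encoder's whole job is picking the two endpoints and 16 indices that minimize error, which is why so much research has gone into doing that well for such a small search space.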

  • ISPC (green) has been the go-to compressor for some years in many places, offering impressive compression performance.
  • stb_dxt (gray; hiding between icbc 1 and rgbcx 0 in the chart) has been available for a long time too, for a small and decently fast compressor.
  • rgbcx (yellow) and icbc (cyan) are fairly new (both appeared in Spring 2020), and both are able to squeeze a (little) bit more quality out of this old format. icbc is BC1 only though; here in my testbed it falls back to stb_dxt for RGBA/BC3 images.

ASTC 6x6 (mobile): 3.6 bits/pixel, using 6x6 pixel blocks. Same platform availability as ASTC 4x4.

  • Same conclusions as for ASTC 4x4: ARM compressor is a good choice here. Well, there isn’t much to choose from, now is there :)

ETC2 (mobile): 4 bits/pixel for RGB, 8 bits/pixel for RGBA. Has been on mobile slightly longer than ASTC has (ETC2 is supported on all OpenGL ES 3.0, Vulkan and Metal mobile GPUs – it’s only OpenGL ES 2.0 GPUs that might not have it).

  • Use Etc2Comp (cyan) - it’s not very fast (only 0.1-5 Mpix/s speed) but produces good quality for the format.
  • If compression speed is important, etcpak (gray) is fast.
  • ETCPACK (orange) does not look like it’s useful for anything :)

ASTC 8x8 (mobile): 2 bits/pixel, using 8x8 pixel blocks. Same platform availability as other ASTC variants.

  • Similar conclusions as for ASTC 4x4: ARM compressor is a good choice here; ISPC is not bad either.
  • Same comments w.r.t. Apple one as in 4x4 case.

Further Work

For the “maybe someday” part…:

  • Investigate other texture formats, like:
    • BC4 (PC) / EAC R (mobile) for grayscale images,
    • BC5 (PC) / EAC RG (mobile) for normal maps,
    • BC6H (PC) / ASTC HDR (mobile) for high dynamic range images,
    • PVRTC (mobile)? Not sure if it’s very practical these days though.
    • ETC (mobile)? Likewise, not sure if needed much, with ETC2 & ASTC superseding it.
  • Look at some GPU compute shader based texture compressors, e.g. Matias Goldberg betsy or AMD Compressonator.
    • Also would like to look into the determinism of those. Even on CPUs it’s easy to have a compressor that produces different results when run on different machines. In GPU land it’s probably way more common!
  • Look into other compression libraries.
    • I’ve skipped some for this time since I was too lazy to build them (e.g. Compressonator does not compile on macOS).
    • etcpak “QuickETC2” (commit) branch looks interesting, but I haven’t tried it yet.
    • There’s a whole set of compressors that are focused on the “Rate Distortion Optimization” (RDO) compression aspect, where they can trade off texture quality for better further compressibility of texture data for storage (i.e. if game data files are using zlib compression, the compressor can make compressed bits “zlib friendlier”). Oodle Texture and Binomial Basis are compressors like that; however, neither is publicly available, so it’s not trivial to casually try them out :)
    • Maybe there are other compressors out there, that are worth looking at? Who knows!

That’s it for now! Go and compress some textures!


Black Hole Demo

I made a demo with my kid, and it somehow managed to get 1st place at a (small) demoparty!

The story of this demo started around year 2006 (!). I had just started working at Unity in Copenhagen, and was renting a tiny room (it was like 12 square meters; back then that’s all I could afford). The guy I was renting from was into writing and music. One of his stories was narrated into a music track, and that’s what I wanted to make a demo from… so we agreed that I would make a demo, using his story & music for it.

His band was called “Aheadahead”, back then they (of course) had a MySpace page. Now that is gone, but turns out they do have an album on Spotify! It’s called “High Status and Continued Success in the Ongoing Battle for Power and Prestige”, go check it out.

And… I never made the demo with that soundtrack.

Until sometime in 2018, during one week of vacation, I sat down with my kid and we decided to try to make it. A bunch of sketching, planning and experimenting with fractals followed.

And then the vacation week was over, and Regular Life happened again. I only came back to fiddle a bit with the demo sometime in 2019.

And then… (hmm is there a pattern here?) that got parked again. Until now, mid-2020, I had another vacation week with nothing much to do. So here it is! Wrapped it up, submitted to Field-FX demoparty, and somehow managed to get 1st place \o/

Credits

Tech Details

It’s a Unity 2018.2 project with nothing fancy really. A timeline, and mostly raymarched fractals that render into the g-buffer. Some utilities I’ve used from the most excellent Keijiro’s github, and the fractals are based on “Kali Sets”, particularly “kalizyl and kalizul” shadertoy by Stefan Berke.
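For the curious, the core of a Kali set is a tiny iteration: fold a point with abs(), invert it by its squared length, and subtract a constant. Below is a CPU-side Python sketch of that classic iteration. It is not the demo's shader; a raymarcher would use values derived from this orbit (e.g. for distance estimation or coloring) on the GPU.

```python
def kali_iterate(p, c, iterations):
    """Classic 'Kali set' iteration: p = abs(p) / dot(p, p) - c, repeated.
    Returns the orbit of 3D points visited."""
    x, y, z = p
    orbit = []
    for _ in range(iterations):
        d = x * x + y * y + z * z
        if d == 0.0:
            break  # avoid dividing by zero at the origin
        x, y, z = abs(x) / d - c[0], abs(y) / d - c[1], abs(z) / d - c[2]
        orbit.append((x, y, z))
    return orbit

print(kali_iterate((1.0, 0.0, 0.0), (0.5, 0.5, 0.5), 2))
```

Small changes to the constant c produce wildly different structures, which makes the formula fun to explore interactively.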


Various details about Handles

I wanted to fix one thing about Unity editor Handles, and accidentally ended up fixing some more. So here’s more random things about Handles than you can handle!

What I wanted to fix

For several years now, I wanted built-in tool (Move/Scale/Rotate) handle lines to be thicker than one pixel. One pixel might have been fine when displays were 96 DPI, but these days a pixel is a tiny little non-square.

Often, with some real colored content behind the tool handles, my old eyes can’t quite see them. Look:

That’s way too thin for my taste. So I looked at how other similar programs (Maya, Blender, 3dsmax, Modo, Unreal) do it, and all of them except 3dsmax have thicker tool handles. Maya in particular has extremely customizable handles where you can configure everything to your taste – perhaps too much even :)

Recently at work I got an “ok aras do it!” approval for making the handles thicker, and so I went:

That by itself is not remarkable at all (well, except that I spent way too much time figuring out how to make something that is placed in world space be a constant size in pixel space, LOL). But while playing around with handle thickness, I noticed a handful of other little things that had been bugging me; I had just never clearly thought about them.
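The underlying math for "constant size in pixels" is the inverse of the projection: work out how much world-space height the viewport covers at the object's depth, then scale. Unity exposes `HandleUtility.GetHandleSize` for roughly this purpose; the sketch below is only the projection math, not its implementation, and the function names are hypothetical.

```python
import math

def world_size_for_pixels(pixels, view_depth, fov_y_degrees, viewport_height_px):
    """World-space length covering `pixels` screen pixels for an object at
    `view_depth` (distance along the camera's forward axis), perspective camera."""
    visible_height = 2.0 * view_depth * math.tan(math.radians(fov_y_degrees) / 2.0)
    return pixels * visible_height / viewport_height_px

def world_size_for_pixels_ortho(pixels, ortho_half_height, viewport_height_px):
    """Orthographic version: the visible height does not depend on depth."""
    return pixels * 2.0 * ortho_half_height / viewport_height_px

# A 3-pixel-thick line 10 units away, 60 degree vertical FOV, 1080 px tall viewport:
print(world_size_for_pixels(3, 10.0, 60.0, 1080))
```

With a perspective camera the world-space size grows linearly with depth, which is exactly what cancels out the perspective shrinking.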

Here they are in no particular order, for no other reason than as an excuse to post some pictures & gifs!

Hover indicator with yellow color is not great

When the mouse is hovering over (or near enough to) the handle axis, it changes color to yellow-ish, to indicate that when you click, the handle will get picked up. The problem is, “slight yellow” is too similar to, say, the green (Y axis) color, or to the gray “rotate facing screen” outline. In the two images below, one of them has the outer rotation circle highlighted since I’m hovering over it. It’s not easy to tell that the highlight is actually happening.

What I tried doing instead is: upon hovering, the existing handle color turns a bit brighter, more opaque and the handle gets a bit thicker.

Move/Scale tool cap hover detection is weird

In some cases, the mouse being directly over one cap still picks another (close, but further from the mouse) axis. Here, the mouse is directly over the red axis cap, yet the blue one is “picked”. That seems to be fine with axes, but the “cap” part feels wonky.

I dug into the code, and the cause turned out to be that cone & cube caps use very approximate “distance to sphere” mouse hover calculation. E.g. for the Move tool, these spheres are what the arrow caps “look like” for the mouse picking. Which does not quite match the actual cone shape :) For a Scale tool, the virtual spheres are even way larger than cubes. A similar issue existed with the scene view axis widget, where axis mouse hit detection was using spheres of this size for picking :facepalm:

Now, I get that checking distance to sphere is much easier, particularly when it has to be done in screen space, but come on. A sphere is not a great approximation for a cone :) Fixed this by writing “distance to cone” and “distance to cube” (in screen space) functions. Underneath both are “distance to a 2D convex hull of these points projected into screen space”. Yay my first convex hull code, I don’t remember ever writing that!
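Here is a minimal sketch of "distance to a 2D convex hull" in Python. The post doesn't say which hull algorithm was used; this one uses Andrew's monotone chain (my choice). In the handle case, the input points would be e.g. a cone's apex and samples around its base circle, already projected into screen space.

```python
import math

def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def distance_to_segment(p, a, b):
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def distance_to_convex_hull(p, points):
    """0 if p is inside the hull of `points`, else distance to the nearest hull edge."""
    hull = convex_hull(points)
    if not hull:
        raise ValueError("need at least one point")
    if len(hull) == 1:
        return math.hypot(p[0] - hull[0][0], p[1] - hull[0][1])
    n = len(hull)
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]
    if n >= 3 and all(cross(a, b, p) >= 0 for a, b in edges):
        return 0.0  # left of every counter-clockwise edge -> inside
    return min(distance_to_segment(p, a, b) for a, b in edges)
```

The mouse then picks whichever cap has the smallest distance, instead of comparing against oversized virtual spheres.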

At almost parallel viewing directions, axis is wrongly picked

Here, I move the mouse over the plane widget and try to move it, and poof, the Z axis gets picked instead (see this tweet too).

What I did: 1) when the axis is almost faded out, never pick it. The code was trying to do that, but only when the axis is almost entirely invisible. 2) for a partially faded out axis, make the hover indicator not be faded out, so you can clearly see it being highlighted.
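Here is a sketch of both fixes. Everything in it is hypothetical: the thresholds and function names are made up for illustration and are not Unity's actual values or code. An axis fades out as it points more directly at the camera, it is never pickable once it is mostly faded, and a hovered axis is drawn unfaded.

```python
import math

# Hypothetical thresholds for illustration; not Unity's actual values.
FADE_START = 0.9          # |cos(angle)| at which the axis starts fading out
FADE_END = 0.99           # |cos(angle)| at which the axis is fully faded
MIN_PICKABLE_ALPHA = 0.5  # axes faded below this are never picked

def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def axis_alpha(axis, view_dir):
    """1.0 = fully visible, 0.0 = fully faded (axis points straight at the camera)."""
    cos_angle = abs(sum(a * b for a, b in zip(_normalize(axis), _normalize(view_dir))))
    if cos_angle <= FADE_START:
        return 1.0
    if cos_angle >= FADE_END:
        return 0.0
    return 1.0 - (cos_angle - FADE_START) / (FADE_END - FADE_START)

def axis_pickable(axis, view_dir):
    # Fix 1: a mostly faded axis is never picked.
    return axis_alpha(axis, view_dir) >= MIN_PICKABLE_ALPHA

def display_alpha(axis, view_dir, hovered):
    # Fix 2: a hovered axis is drawn without fading, so the highlight is visible.
    return 1.0 if hovered else axis_alpha(axis, view_dir)
```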

Cap draw ordering issues (Z always on top of X etc.)

Handle axes were always rendered in X, Y, Z order. Which makes the Z always look “in front” of X even when it’s actually behind:

Fixing this one is easy: just sort the axis handles by their distance from the camera before processing them.
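Here is a sketch of that sorting step, under the assumption that each cap is represented by a name and a world-space position (a hypothetical representation). Caps are drawn farthest-first, so nearer caps are drawn later and end up on top.

```python
def sort_back_to_front(caps, camera_pos):
    """caps: list of (name, world_position). Returns them farthest-first, so that
    nearer handles are drawn later and end up on top."""
    def distance_sq(p):
        return sum((a - b) ** 2 for a, b in zip(p, camera_pos))
    return sorted(caps, key=lambda cap: distance_sq(cap[1]), reverse=True)

caps = [("X", (1.0, 0.0, 0.0)), ("Y", (0.0, 1.0, 0.0)), ("Z", (0.0, 0.0, 1.0))]
print([name for name, _ in sort_back_to_front(caps, (0.0, 0.0, -10.0))])  # ['Z', 'X', 'Y']
```

Python's sort is stable even with `reverse=True`, so caps at equal distances keep their original X, Y, Z order.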

Free rotate circle is barely visible

The inner circle of the rotation gizmo that is for “free” (arcball) rotation is barely visible:

Make it more visible. A tiny bit more visible when not hovering (left), thicker + more visible when hovering (right):

Move tool plane handles are on the opposite side in Ortho projection

With an orthographic scene view camera, the “move on plane” handles are on the opposite side of the gizmo:

Turns out that was just a sign bug.

And that’s it for this round of Handles tweaks! There’s a ton more that could be done (see replies to Will’s tweet), but that’s for some other day. Nothing from above has shipped or landed in the mainline code branch yet, by the way. So no promises :) Update: should ship in Unity 2020.2 alpha 9 soon!