Crank the World: Playdate demo

You know Playdate, the cute yellow console with a crank? I think I saw it in person last year via Inês, and early this year they started to have Lithuania as a shipping destination, so I got one. And then what would I do with it? Of course, try to make a demoscene demo :)

First impressions

The device is cute. The SDK is simple and no-nonsense (see docs). I only used the C part of the SDK (there’s also Lua). You install it, plus the official GCC-based toolchain from ARM, and there you go. The SDK provides simple things like “here’s the bitmap for the screen” or “here’s which buttons are pressed” and so on.

The hardware underneath feels similar to the “about the first Pentiums” era: a single-core CPU at 180MHz (ARM Cortex-M7F) with hardware floating point (but no SIMD/NEON), 16MB RAM, and no GPU. The screen is 400x240 pixels, 1 bit/pixel – kinda like Kindle e-ink, except it refreshes way faster (can go up to 50 FPS). The underlying operating system seems to be FreeRTOS, but nothing about it is exposed directly; you just get what the SDK provides.

At first I tried checking out how many polygons the device can rasterize while keeping 30 FPS:

But in the end, going along with the wise words of kewlers and mfx, the demo chose to use zero polys (and… zero shaders).

Packaging up the “final executable” of the demo felt like a breath of fresh air. You just… zip up the folder. That’s it. And then anyone with a device can sideload it from anywhere. At first I could not believe that it would actually work, without some sort of dark magic ritual that keeps on changing every two months. Very nice.

By the way, check out the “The Playdate Story: What Was it Like to Make Handheld Video Game System Hardware?” talk by Cabel Sasser from GDC 2024. It is most excellent.

The demo

I wanted to code some oldskool demo effects that I never did back when everyone else was doing them 30 years ago. You know: plasma, kefren bars, blobs, starfield, etc.

Also, I wanted to check out how much shadertoy-like raymarching a Playdate could do. Turns out, “not a lot”, lol.

And so the demo is just that: a collection of random scenes with some “effects”. Music is an old GarageBand experiment that my kid did some years ago.

Tech bits

Playdate uses 1 bit/pixel screen, so to represent “shades of gray” for 3D effects I went the simple way and just used a static screen-size blue noise texture (from here). So “the code” produces a screen-sized “grayscale” image with one byte per pixel, and then it is dithered through the blue noise texture into the device screen bitmap. It works way better than I thought it would!
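
To illustrate the idea, here is a minimal sketch of that dithering step. My assumptions: an 8-bit grayscale buffer, a screen-sized blue-noise threshold texture, and MSB-first 1 bit/pixel rows; the names and exact bit layout are made up for illustration, this is not the actual demo code.

#include <stdint.h>

enum { kScreenWidth = 400, kScreenHeight = 240 };

// Hypothetical sketch: threshold an 8-bit grayscale buffer against a
// screen-sized blue-noise texture to produce the 1 bit/pixel screen bitmap.
static void DitherToScreen(const uint8_t* gray,      // kScreenWidth*kScreenHeight, 0..255
                           const uint8_t* blueNoise, // same size, threshold values
                           uint8_t* screen,          // 1 bit/pixel, rowStride bytes per row
                           int rowStride)
{
    for (int y = 0; y < kScreenHeight; ++y)
    {
        uint8_t* row = screen + y * rowStride;
        for (int x = 0; x < kScreenWidth; ++x)
        {
            int idx = y * kScreenWidth + x;
            int bit = 7 - (x & 7); // assume leftmost pixel is the most significant bit
            if (gray[idx] > blueNoise[idx])
                row[x >> 3] |= (uint8_t)(1 << bit);   // white
            else
                row[x >> 3] &= (uint8_t)~(1 << bit);  // black
        }
    }
}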

All the raymarched/raytraced scenes are way too slow to calculate each pixel every frame (too slow with my code, that is). So instead, only every Nth pixel is updated each frame, with an update pattern similar to ordered-dithering tables (a small sketch of the idea follows the list below).

  • Raytraced spheres: update 1 out of 6 pixels every frame (in 3x2 pattern),
  • Raymarched tunnel/sponge/field: update 1 out of 4 pixels every frame (in 2x2 pattern), and run everything at 2x2 lower resolution too, “upscaling” the rendered grayscale image before dithering. So effectively, raymarching 1/16th the amount of screen pixels each frame.
  • Other simpler scenes: update 1 out of 4 pixels every frame.
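
And here is that sketch for the 2x2 case. This is hypothetical code, not the demo source; the pixel-evaluation callback and the Bayer-like visit order are illustrative choices.

#include <stdint.h>

// Update 1 out of 4 pixels per frame in a Bayer-like 2x2 order. Pixels not
// touched this frame keep their previous grayscale value, which is what gives
// the "free motion blur" look. evaluatePixel stands in for the expensive
// per-pixel raymarch/raytrace evaluation.
void UpdateQuarterOfPixels(uint8_t* gray, int width, int height, int frameIndex,
                           uint8_t (*evaluatePixel)(int x, int y))
{
    // Which pixel of each 2x2 block gets refreshed on frames 0,1,2,3 (then repeats).
    static const int offsetX[4] = { 0, 1, 1, 0 };
    static const int offsetY[4] = { 0, 1, 0, 1 };
    int phase = frameIndex & 3;
    for (int y = offsetY[phase]; y < height; y += 2)
        for (int x = offsetX[phase]; x < width; x += 2)
            gray[y * width + x] = evaluatePixel(x, y);
}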

You say “cheating”, I say “free motion blur” or “look, this is a spatial and temporal filter just like DLSS, right?” :)

For the raymarched parts, I tried to make them “look like something” while keeping the number of march iterations very limited, and doing other cheats like using a too-large ray step size, which leads to artifacts, but hey, no one knows what it is supposed to look like anyway.

All in all, most of the demo runs at 30 FPS, with some parts dropping to about 24 FPS.

Size breakdown: the demo is 3.1MB, out of which 3MB is the music track :) And that is because it is just an ADPCM WAV file. The CPU cost of doing something like MP3 playback was too high, and I did not go the MIDI/MOD/XM route since the music track comes from GarageBand.

Some of the scenes/effects are ripped off inspired by other shadertoys or demos:

When the music finishes, the demo switches to “interactive mode” where you can switch between the effects/scenes with Playdate A/B buttons. You can also use the crank to orbit/rotate the camera or change some other scene parameter. Actually, you can use the crank to control the camera during the regular demo playback as well.

All in all, this was quite a lot of fun! Maybe I should make another demo sometime.


I accidentally Blender VSE

Two months ago I started to contribute a bit of code to Blender’s Video Sequence Editor (VSE). Did you know that Blender has a suite of video editing tools? Yeah, me neither :) Even the feature page for it on the website looks… a bit empty lol.

Do I know anything at all about video editing, timelines, sequencers, color grading, ffmpeg, audio mixing and so on? Of course not! So naturally, that means I should start to tinker with it.

Wait what?

How does one accidentally start working on VSE?

You do that because you decide to check out Unity’s Unite 2023 conference in Amsterdam, and to visit some friends. For a spare half-a-day after the conference, you decide to check out Blender HQ. There, Francesco and Sergey, for some reason, ask you whether you’d like to contribute to VSE, and against your better judgement, you say “maybe?”.

So that’s how. And then it feels pretty much like this:

I started to tinker with it, mostly trying to do random “easy” things. By easy, I mean performance optimizations. Since, unless they complicate the code a lot, they are hard to argue against. “Here, this thing is 2x faster now”: in most places everyone will react with “oh nice!”. Hopefully.

So, after two months of kinda-part-time tinkering in this area that I did not even know existed before, Blender VSE got a small set of improvements for the upcoming Blender 4.1 (which just became beta and can be downloaded from the usual daily builds). Here they are:

Snappier Timeline drawing

The VSE timeline is the bottom part of the image above. Here it is zoomed out to show the complete Sprite Fright edit project, with about 3000 “strips” visible at once. Just scrolling and panning around in that timeline was updating the user interface at ~15 frames per second.

Now that’s 60+ frames per second (#115311). Turns out, submitting graphics API draw calls two triangles at a time is not the fastest approach, heh. Here’s that process visualized inside the most excellent Superluminal profiler – pretty much all the time is spent inside “begin drawing one quad” and “finish drawing one quad” functions 🤦
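
The general shape of the fix for that kind of pattern is batching: accumulate all the quads into one vertex buffer and submit a single draw call, instead of a begin/end round trip per quad. A generic sketch of the idea follows; the function names are placeholders, not Blender’s actual GPU module API.

#include <cstddef>
#include <vector>

struct Vertex { float x, y; unsigned int color; };

// Placeholder for "upload this vertex buffer and issue one draw call".
void UploadAndDrawTriangles(const Vertex* verts, size_t count);

struct QuadBatch
{
    std::vector<Vertex> verts;

    void AddQuad(float x0, float y0, float x1, float y1, unsigned int color)
    {
        // Two triangles, six vertices, appended to a CPU-side array; no API calls here.
        const Vertex quad[6] = {
            {x0, y0, color}, {x1, y0, color}, {x1, y1, color},
            {x0, y0, color}, {x1, y1, color}, {x0, y1, color},
        };
        verts.insert(verts.end(), quad, quad + 6);
    }

    void Flush()
    {
        if (verts.empty())
            return;
        // One buffer upload and one draw call for thousands of quads.
        UploadAndDrawTriangles(verts.data(), verts.size());
        verts.clear();
    }
};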

As part of that, the audio waveform display also got some of its weirdness fixed and some UI polish tweaks, and is now on by default (#115274).

Scopes

VSE has options to display typical “scopes” you might expect: image histogram, waveform, vectorscope. Here’s their look, “before” on the left side, “now” on the right.

The histogram was drawn as a pixelated image, with very saturated colors. Now it is drawn as nicer polygons, with a grid and less saturation (#116798):

The waveform (here shown in the “parade” option) was saturating very quickly. Oh, and it is now also 15x faster thanks to some multi-threading (#115579).

The vectorscope’s outer color hexagon looked very 90s with all the pixelation. It now copies the updated image editor vectorscope design, and voilà (#117738):

While at it, the “show overexposed regions” (“zebra stripes”) option was also sped up 2x-3x (#115622).

All these scopes (and image editor scopes) should at some point be done on the GPU with compute shaders, of course. Someday.

ffmpeg bits

Blender primarily uses the ffmpeg libraries for audio/video reading and writing. That suite is famous for its extremely flexible and somewhat intimidating command line tooling, but within Blender the actual code libraries like libavcodec are used. Among other things, libswscale is used to do movie frame RGB↔YUV conversion. Turns out, libswscale has been able to do that multi-threaded for a while now; it’s just not exactly intuitive how to achieve it.

Previous code was like:

// init
SwsContext *ctx = sws_getContext(...);
// convert RGB->YUV
sws_scale(ctx, ...);

but that ends up doing the conversion completely single-threaded. There is a "threads" parameter that you can set on the context to make it multi-thread the conversion. But that parameter has to be set at initialization time, which means you can no longer use sws_getContext(); instead you have to initialize the context the hard way:

SwsContext *get_threaded_sws_context(int width,
                                     int height,
                                     AVPixelFormat src_format,
                                     AVPixelFormat dst_format)
{
  /* sws_getContext does not allow passing flags that ask for multi-threaded
   * scaling context, so do it the hard way. */
  SwsContext *c = sws_alloc_context();
  if (c == nullptr) {
    return nullptr;
  }
  av_opt_set_int(c, "srcw", width, 0);
  av_opt_set_int(c, "srch", height, 0);
  av_opt_set_int(c, "src_format", src_format, 0);
  av_opt_set_int(c, "dstw", width, 0);
  av_opt_set_int(c, "dsth", height, 0);
  av_opt_set_int(c, "dst_format", dst_format, 0);
  av_opt_set_int(c, "sws_flags", SWS_BICUBIC, 0);
  av_opt_set_int(c, "threads", BLI_system_thread_count(), 0);

  if (sws_init_context(c, nullptr, nullptr) < 0) {
    sws_freeContext(c);
    return nullptr;
  }

  return c;
}

And you’d think that’s enough? Haha, of course not. sws_scale() never does multi-threading internally. For that, you need to use sws_scale_frame() instead. And once you do, it crashes, since it turns out you had created your AVFrame objects just a tiny bit wrong: in a way that was completely fine for sws_scale(), but very much not fine for sws_scale_frame(), since the latter tries to do various sorts of reference counting and whatnot.

Kids, do not design APIs like this!

So all that took a while to figure out, but phew, done (#116008), and the RGB→YUV conversion step while writing a movie file is quite a bit faster now. And then the same was done in the other direction, i.e. when reading a movie file: use multi-threaded YUV→RGB conversion, and fold the vertical flip into the same operation as well (#116309).
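
For reference, here is a simplified sketch of what the sws_scale_frame() path roughly looks like, with AVFrame objects that have properly allocated (reference-counted) buffers. This is illustrative, not the exact Blender code.

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>
#include <libswscale/swscale.h>
}
#include <stdint.h>
#include <string.h>

/* Convert one RGBA image into a YUV420P frame using a threaded SwsContext
 * (created e.g. with get_threaded_sws_context above). Error handling mostly
 * omitted to keep the sketch short. */
static bool convert_rgba_to_yuv420(SwsContext *ctx, const uint8_t *rgba, int width, int height)
{
  AVFrame *src = av_frame_alloc();
  AVFrame *dst = av_frame_alloc();
  src->format = AV_PIX_FMT_RGBA;
  src->width = width;
  src->height = height;
  dst->format = AV_PIX_FMT_YUV420P;
  dst->width = width;
  dst->height = height;
  /* Allocate reference-counted buffers; sws_scale_frame needs frames set up
   * this way, unlike sws_scale which was fine with manually wrapped pointers. */
  av_frame_get_buffer(src, 0);
  av_frame_get_buffer(dst, 0);

  /* Copy input pixels row by row, respecting the frame's line stride. */
  for (int y = 0; y < height; y++) {
    memcpy(src->data[0] + y * src->linesize[0], rgba + (size_t)y * width * 4, (size_t)width * 4);
  }

  /* This call runs multi-threaded, provided "threads" was set on the context. */
  int result = sws_scale_frame(ctx, dst, src);
  /* ... hand dst over to the encoder here ... */

  av_frame_free(&src);
  av_frame_free(&dst);
  return result >= 0;
}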

Audio resampling

While looking at where time is spent when rendering a movie out of VSE, I noticed a “this feels excessive” moment: almost half of the time it takes to “produce a video or audio frame” was spent inside the audio library used by Blender (Audaspace). Not in encoding audio, just in mixing it before encoding! Turns out, most of that time was spent resampling audio clip data; for example, the movie is set to 48kHz audio, but some of the audio strips are 44.1kHz or similar. I started to dig in.

Audaspace, the audio engine, had two ways it could do sound resampling: for inside-Blender playback it was using a Linear resampler, which just linearly interpolates between samples; for rendering a movie it was using Julius O. Smith’s resampling algorithm with, what feels like, “uhh, somewhat overkill” parameter sizes.
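
To show what the Linear option does, here is a minimal sketch of linear resampling of a mono float buffer. This is illustrative only; it is not Audaspace’s code.

#include <algorithm>
#include <cstddef>
#include <vector>

// For each output sample, linearly interpolate between the two nearest input
// samples. Cheap, but it introduces the extra frequencies visible in the
// spectrogram comparisons discussed below.
std::vector<float> resample_linear(const std::vector<float>& in, double srcRate, double dstRate)
{
    std::vector<float> out((size_t)(in.size() * dstRate / srcRate));
    const double step = srcRate / dstRate; // input samples advanced per output sample
    for (size_t i = 0; i < out.size(); ++i)
    {
        double pos = i * step;
        size_t i0 = (size_t)pos;
        size_t i1 = std::min(i0 + 1, in.size() - 1);
        float t = (float)(pos - (double)i0);
        out[i] = (1.0f - t) * in[i0] + t * in[i1];
    }
    return out;
}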

One way to look at resampler quality is to take a synthetic sound, e.g. one with a single increasing frequency, resample it, and look at its spectrogram. Here’s a “sweeping frequencies” sound, resampled inside Audacity with the “best”, “medium” and “low” resampling settings. What you want is a result that looks like the “best” one, i.e. with as few additional frequencies introduced as possible.

Inside Blender, Audaspace was providing two options: rendering vs. preview playback. The rendering one indeed has a good spectrogram, whereas the preview one, while fast to compute, does introduce a lot of extra frequencies.

What I did was add a new “medium” resampling quality setting to Audaspace that, as far as I can tell, produces pretty much the same result while being about 3x faster to calculate, and made Blender use that when rendering:

With that, rendering a portion (2000 frames) of Sprite Fright on a Windows Ryzen 5950X PC went from 92 sec to 73 sec (#116059). And I’ve learned a thing or two about audio resampling. Not bad!

Image transformations and filtering

Strips that produce a visual (images, movies, text, scenes, …) in Blender VSE can be transformed: positioned, rotated, scaled, and additional cropping can be applied. Whenever that happens, the image that is normally produced by the strip is transformed into a new one. All of that is done on the CPU, and was multi-threaded already.

Yet it had some issues/bugs, and parts of the code could be optimized a bit. Plus some other things could be done.

“Why is all of that done on the CPU?!” you might ask. Good question! Part of the reason is that no one has made it run on the GPU. Another part is that the CPU fallback still needs to exist (at least right now) for the use case where a user wants to render on a render farm that has no GPU.

“Off by half a pixel” errors

The code had various “off by half a pixel” errors that in many cases cancel themselves out or are invisible. Until they are not. This is not too dissimilar to “half texel offset” things that everyone had to go through in DirectX 9 times when doing any sort of image postprocessing. Felt like youth again :)

E.g. scaling a tiny image up 16x using Nearest and Bilinear filtering, respectively:

The Bilinear filter shifts the image by half a source pixel! (there’s also magenta – which is the background color here – sneaking in; more about that later)

In the other direction, scaling this image down exactly by 2x using Bilinear filtering does no filtering at all!
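
These artifacts typically come down to the coordinate mapping convention: destination pixel centers have to map to source pixel centers before sampling. A minimal sketch of the usual convention (illustrative, not the exact Blender fix):

// Map a destination pixel to a source-space sampling coordinate.
// Pixel (x, y) covers [x, x+1), with its center at x + 0.5; keeping the
// +0.5 / -0.5 pair is what avoids the "image shifted by half a pixel" and
// "2x downscale does no filtering at all" problems shown above.
struct Float2 { float x, y; };

Float2 dst_pixel_to_src(int dst_x, int dst_y, float scale_x, float scale_y)
{
    Float2 src;
    src.x = (dst_x + 0.5f) / scale_x - 0.5f;
    src.y = (dst_y + 0.5f) / scale_y - 0.5f;
    return src; // feed this into the bilinear/bicubic sample of the source image
}

With scale 0.5 (a 2x downscale), destination pixel 0 now maps to source coordinate 0.5, i.e. a bilinear sample that averages source pixels 0 and 1, which is exactly the filtering that was missing before.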

So things like that (as well as other “off by something” errors in other filters) got fixed (#116628). And the images above look like this with Bilinear 16x upscaling and 2x downscaling:

Transparency border around Bilinear filtering

VSE had three filtering options in Blender 4.0 and earlier: Nearest, Bilinear and Subsampled3x3. Of those, only the Bilinear one was adding half a source texel worth of transparency around the resulting image, which is somewhat visible if you are scaling your media up. Why this discrepancy existed, no one remembers at this point; it seems to have been there “forever”.

There’s a similar issue in Blender (CPU) Compositor, where Bilinear sampling of something blends in “transparency” when right on the edge of an image, whereas Bicubic sampling does not. Again, no one remembers why, and that should be addressed by someone. Someday.

I removed that “blend into transparency” behavior from the bilinear filtering code used by VSE. However! A side effect of that transparency thing is that if you do not scale your image but only rotate it, the edges do get some sort of anti-aliasing, which would now be lost if the behavior were simply removed from bilinear.

So instead of blending in transparency when filtering the source image, I now apply some sort of “transparency anti-aliasing” to the edge pixels of the destination image (#117717).

Filtering additions and changes

Regular VSE strip transforms did not have a cubic filtering option (it only existed in the special Transform Effect strip), which sounded like a curious omission. And that led into a rabbit hole of trying to figure out what exactly Blender means when it says “bicubic”, as well as what other software means by “bicubic”. It’s quite a mess lol! See an interactive comparison I made here:
aras-p.info/img/misc/upsample_filter_comp_2024

Anyway, “Bicubic” everywhere within Blender actually means “Cubic B-Spline” filtering, i.e. the Mitchell-Netravali filter with B=1, C=0 coefficients, also known as “no ringing, but lots of blur”. Whether that’s a good choice depends on the use case and on what the images represent. For VSE specifically, it sounded like the usual “Mitchell” filter (B=C=1/3) might have been better. Here are both of them, for example:

Both kinds of cubic filtering are an option in VSE now (#117100, #117517).
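
Both variants are members of the same Mitchell-Netravali family; only the B and C coefficients differ. For reference, here is the standard weight function (textbook formulation, not copied from Blender’s source):

#include <cmath>

// Mitchell-Netravali cubic filter weight for sample distance x.
// B=1, C=0   -> cubic B-spline (what Blender historically called "Bicubic"): no ringing, but blurry.
// B=C=1/3    -> the usual "Mitchell" filter: a common sharpness vs. ringing compromise.
float mitchell_netravali(float x, float B, float C)
{
    x = std::fabs(x);
    if (x < 1.0f)
        return ((12.0f - 9.0f * B - 6.0f * C) * x * x * x +
                (-18.0f + 12.0f * B + 6.0f * C) * x * x +
                (6.0f - 2.0f * B)) / 6.0f;
    if (x < 2.0f)
        return ((-B - 6.0f * C) * x * x * x +
                (6.0f * B + 30.0f * C) * x * x +
                (-12.0f * B - 48.0f * C) * x +
                (8.0f * B + 24.0f * C)) / 6.0f;
    return 0.0f;
}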

For downscaling the image, Blender 3.5 added a “Subsampled 3x3” filter. What it actually is, is a box filter that is hardcoded to 3x3 size. Whether a box filter is a good filter is a question for another day. But for now at least, I made it no longer hardcoded to a fixed 3x3 size (#117584), since if you scale the image down by something other than 3x3, it kinda starts to break down. Here, downscaling this perspective grid by 4x on each axis: the original image, downscaled with the current Subsampled 3x3 filter, and downscaled with the adjusted Box filter. Slightly better:
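
Conceptually the change is small: derive the box size from the actual scale factor instead of always averaging a 3x3 neighborhood. A hedged sketch of that idea for a single-channel image (not the actual Blender implementation):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Average all source pixels that one destination pixel covers; the box size
// follows from the scale factor (scale = destination size / source size,
// e.g. 0.25 for a 4x downscale) instead of being hardcoded to 3x3.
float box_downsample_pixel(const std::vector<float>& src, int srcW, int srcH,
                           int dstX, int dstY, float scaleX, float scaleY)
{
    int x0 = (int)std::floor(dstX / scaleX);
    int x1 = std::min((int)std::ceil((dstX + 1) / scaleX), srcW);
    int y0 = (int)std::floor(dstY / scaleY);
    int y1 = std::min((int)std::ceil((dstY + 1) / scaleY), srcH);

    float sum = 0.0f;
    int count = 0;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
        {
            sum += src[(size_t)y * srcW + x];
            ++count;
        }
    return count > 0 ? sum / (float)count : 0.0f;
}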

All of that is a lot of choices for the user, TBH! So I added an “Auto” filter option (#117853), which is now the default for VSE strips. It automatically picks the “most appropriate” filter based on the transform data (a sketch of the selection logic follows the list below):

  • When there is no scaling or rotation: Nearest,
  • When scaling up by more than 2x: Cubic Mitchell,
  • When scaling down by more than 2x: Box,
  • Otherwise: Bilinear.
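
In code form the selection boils down to something like this. This is my sketch of the rules above; the exact thresholds and edge cases in Blender may differ.

// Sketch of the "Auto" filter choice, following the rules listed above.
enum class Filter { Nearest, Bilinear, CubicMitchell, Box };

Filter auto_filter(float scale_x, float scale_y, float rotation_radians)
{
    bool no_rotation = rotation_radians == 0.0f;
    bool no_scale = scale_x == 1.0f && scale_y == 1.0f;
    float max_scale = scale_x > scale_y ? scale_x : scale_y;
    float min_scale = scale_x < scale_y ? scale_x : scale_y;

    if (no_rotation && no_scale)
        return Filter::Nearest;        // pixel-exact, nothing to filter
    if (min_scale > 2.0f)
        return Filter::CubicMitchell;  // scaling up by more than 2x
    if (max_scale < 0.5f)
        return Filter::Box;            // scaling down by more than 2x
    return Filter::Bilinear;           // everything else
}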

Besides all that, the image filtering process got a bit faster:

  • Get rid of virtual functions from the inner loop, and some SIMD for bilinear filtering (#115653),
  • Simplify cubic filtering, and add some SIMD (#117100),
  • Simplify math used by Box (née Subsampled3x3) filter (#117125),
  • Fix the “a solid image covers the whole screen, so we can skip everything under it” optimization not working when said image has a scale applied (#117786).

As a practical example: on my PC, with a single 1920x1080 image in a 3840x2160 project (scaled up 2x) using Bilinear filtering, drawing the whole sequencer preview area went from 36.8ms down to 15.9ms. I have some ideas on how to speed it up further.

Optimizing VSE Effects

While the actual movie data sets I have from Blender Studio use few (if any) effects, I optimized some of the effects anyway, just from noticing things while reading the code. Most of that is just multi-threading.

  • Glow effect: multi-threaded now, 6x-10x faster (#115818).
  • Wipe effect: multi-threaded now, and simplify excessive trigonometry in Clock wipe; 6x-20x faster (#115837).
  • Gamma Cross effect: was doing really complex table + interpolation based things just to avoid a single square root call. Felt like the code was written before hardware floating point was invented :) 4x faster now (#115801).
  • Gaussian Blur effect: 1.5x faster by avoiding some redundant calculations (#116089).

What does all of that mean for render times?

On the three data sets I have from Blender Studio, the final render of a VSE movie is about 2x faster on my PC. For example, rendering the same Sprite Fright edit went from almost 13 minutes down to 7 minutes.

I hope things can be further sped up. We “only” need to do 2x speedup another three times, and then it’s quite good, right? :P

Thoughts on actual work process

Is all of the above a “good amount of work” done, for two months part-time effort?

I don’t know. I think it’s quite okay, especially considering that the developer (me) knew nothing about the area or the codebase. Besides the user-visible changes outlined above, I did a handful of pull requests that were adding tests, refactoring code, cleaning something up, etc. In total 37 pull requests got done, reviewed and merged.

And here’s the interesting bit: I’m pretty sure I could have not done this at an “actual job”. I don’t have many jobs to compare, but e.g. at Unity between around 2015 and 2022, I think I would have been able to do like 30% of the above in the same time. Maybe less. I probably could have done the above at “ancient” Unity, i.e. around year 2010 or so.

The reasons are numerous and complex, and have to do with the amount of people within the company, processes, expectations, communication, politics and whatnot. But it is weirdly funny that if I’m able to do “X amount of work in Y amount of time” for free, then at a company that would pay me relatively lotsa money for the work, various forces would try to make me do the same work slower. Or not finish the work at all, since (again, due to complex reasons) the effort might get cancelled midway!

I hope Blender does not venture into that size/complexity/workflow where it feels like The Process is not helping, but rather is there to demotivate and slow down everyone (not on purpose! it just slowly becomes that way).

What’s next?

Who knows! Blender 4.1 just became beta, which means feature-wise, it is “done” and the VSE related bits in 4.1 are going to be as they are right now.

However, work on Blender 4.2 starts now, and then 4.3, … For the near future, I want to keep tinkering with it, but without a clear plan :) Once I have something done, maybe I’ll write about it. Meanwhile, things can be observed in the Weekly Updates forum section.

Until next time! Happy video sequence editing!


Two years ago: left Unity

I left Unity at the start of 2022. So for this lazy Tuesday afternoon, I figured I could share the rambles I wrote in my goodbye email. No big insights there, just an old man reminiscing. And hey, the text is already written, so that makes it easy to copy-pasta it into the blog:


It’s now exactly 16 years of me working at Unity. And as everyone knows, if you don’t leave after 16 years, you have to wait until 32 years for your next chance (look, I don’t make the rules). It’s time for me to log off and watch this thing from the outside.

So, yeah, bye! Thanks everyone for this amazing journey! ❤️

However! @alexmclean suggested I should write up some random bits of trivia from the old times. You know, like your grandpa is always telling stories from his youth, and you have to listen no matter whether you like it or not. So here goes!

Back in 2005, I was looking for a new job. Applied to several companies (NVIDIA, Lionhead, DICE, Rockstar) mostly to no response. But! Some Tim Sweeney from Epic Games reached out about a job there. That went all the way through to an on-site interview (which was my first trip to the US, score!), where I did not do well on the technical interview, and that was a “nope” from Epic. That was sad!

However, several months later this company I’ve never heard about (“Over the Edge Entertainment”), making a game engine I’ve never heard about (“Unity”), wrote to me with a subject “Working on the future of middleware”, and text like:

We want to talk to you about joining our team and engineer this revolution with us. You’ll lead the PC version, and get to define your own projects, doing the coolest tech this side of the Sun. Make a mass market tool that can change how games are made…. write code to change the world.

To which, as any sane person would do, I replied “cool, but no thanks, good luck”. The world is filled with random 20-somethings imagining they can pull off the next big thing. But then they invited me over to a game jam they were organizing. We made this game called “Pakimono”:

And so I started working at Unity on 2006 January 4th. I absolutely thought that after a year or two this whole thing would go down as another cute, but failed, attempt at making yet another game engine. Because these were the odds! But hey, at least it would be fun while it lasted.

Back then “heck yeah!” (alternative phrasing for “Win Wire” of today) moments were… sometimes strange. Like here’s @joe being extremely excited about getting a car model for the website. Why would our website have a car? No one knows.

Typical office experience was slightly different from the offices we have these days (or, well, used to have before 2020). Here’s @david sleeping in the Copenhagen office mid-2006:

End of 2006 was an “oh shit” moment when Microsoft shipped “XNA Game Studio Express”, a free game development environment based on C#, aimed at indies and small developers. That totally could have killed us; a handful of people against the whole Microsoft behemoth. Luckily, they never really focused on it, and by 2010 XNA was dead. There’s probably a lesson there of some sorts.

In 2007 I convinced one of my buddies (@valdemar) to join Unity, to work on porting it to our first console platform! That was the Nintendo Wii, of course (in retrospect, not the greatest platform for anyone except Nintendo itself). But this started a long journey of non-PC platform support in Unity.

In 2008 I made this (crappy, first-time) felt work and, in retrospect, correctly predicted what our stock symbol would be a dozen years later. If that does not make me a professional analyst, I don’t know what would.

On a more serious note, in 2008 we started to port Unity editor to Windows (it was Mac-only before then). And given that I was pretty much the only person using Windows to begin with, that meant I was doing the port, yay (we’re still suffering the consequences today, I’m sorry). Anyway, in February we had this, which isn’t much but it’s a start:

And by 2008 September we had the first internal build for testing, in all 15.4 megabytes download size glory:

Overall 2008 was quite an eventful year. For example, we opened the brand new Kaunas (Lithuania) office, which (I think) was the first development office outside of Copenhagen. It looked as “fancy” as this, since that’s what we could afford at the time:

Fun fact: ten years later some scenes of HBO Chernobyl were filmed in the same building.

Also in 2008, I remember us watching Apple’s keynote where they announced the App Store and the whole SDK for creating third-party applications. We were “ok so this is pretty much the same as a Mac, it’s gonna be an easy port, let’s do it!!!1”. Famous last words, eh. Anyhoo, I somehow convinced @rej and @mantas to join in making the Unity iPhone port. The very first version, in a very rough state, shipped towards the end of 2008. And it probably affected the whole industry for decades to come.

Spring of 2009 saw us ship Unity 2.5 with the Windows Editor, which increased our game developer market by 10x or so. @amir had a suggestion to use this on the main website, to really drive the message home:

By 2010 the company had grown to an enormous size; for example, here’s a whole-company meeting:

I found this in my 2010 “yearly performance review / feedback”. Cute! Also, some things don’t really change.

In 2011, we had another massive “oh shit this could be bad” moment, when Adobe decided to add 3D capabilities to Flash (“Stage3D” as it would be called). Our web browser plugin was still going strong, and Flash was still huge. This could have been really, really bad! So we did the only thing that seemed to make sense at the time, which was “let’s add Flash as a platform for Unity”. Which meant converting C# code into ActionScript, and adding support for Flash’s custom 3D graphics API and their own strange shader language. Anyway, it (kinda, mostly) worked: “That’s a completely standard shader, written in Cg using our ‘surface shaders’ thing, compiled into Direct3D9 assembly by Cg, parsed into bytecode by mojoshader & converted to Flash’s shader bytecode (AGAL) by, well, me.”

Anyway, by 2014 Flash, including all Stage3D stuff, was pretty much dead. There’s probably another lesson in there somewhere, maybe like “don’t be too afraid of the last attempt of a dying platform to make itself relevant”, or something.

Anyway, all that C# -> ActionScript conversion experience turned out to be quite useful when in 2014 Apple decided that all iOS apps must be 64-bit from now on. Scripting and mobile folks scrambled to get that working by creating a C# -> C++ converter a.k.a. IL2CPP, since our existing scripting technology at the time (Mono) simply did not work on 64-bit iOS devices.

I was not involved in any of that, but in spring of 2014 we got an invitation for a secret project on-site at Apple. Two engineers were to be sent in there for a month, without being aware of what they would be doing. So me and @alexey went, and it turns out the secret project was Metal graphics API. Against all odds and much to everyone’s surprise, Apple was the first one to ship a “new/modern” graphics API (before Microsoft DX12 and Khronos Vulkan). A month at Apple was an… interesting experience. We did get Unity working on Metal in that month, but for the WWDC conference Apple decided to go with a keynote demo from Epic/Unreal instead. They were still friends back then, eh :)

Oh, here’s a funny thing I found from 2015, in an email from Joachim about a core team’s hack week:

The goal. How do we architect Unity so it can open a 100GB project and you start working in it, in less than 10 seconds. <…> To do this right I think we’ll have to soon form a 2-3 person asset pipeline team that can own this and get it to completion.

Yeah, maybe we should get onto that :)

One recent thing I’m fairly proud of, is that in 2019 we managed to create what used to be called the “Core Kaunas” team (now the “Quality of Life” team – yeah, we’re not great at names). A very small team of mostly junior people, doing “random improvements all over the place”. Day to day it does not sound like much, but now when I read our 2019, 2020 and 2021 summaries (ed: these were all links to internal Unity docs), eh, it’s not too bad! Besides all the things we actually did, maybe we have influenced some other teams to also work on “quality of life” improvements. Rock on.

All in all, github (on our “main” code repository) says that my overall contribution over the years has been around minus one million lines of code, so 🎉. Why it thinks I’m #1 in the amount of code commits, I’ve no idea.

That’s it! Do good work, and take care of each other.


And that was it! Maybe next time I should write about what the heck I’ve been doing since I left. Or maybe something else.


Gaussian explosion

Over the past month it seems like Gaussian Splatting (see my first post) is experiencing a Cambrian Gaussian explosion of new research. The seminal paper came out in July 2023, and starting about mid-November, it feels like every day there’s a new paper or two coming out, related to Gaussian Splatting in some way. @MrNeRF and @henrypearce4D maintain an excellent list of all things related to 3DGS, check out their Awesome 3D Gaussian Splatting Resources.

By no means an exhaustive list, just a random selection of interesting bits:

Ecosystem and tooling

Research

Unity Gaussian Splatting

The Unity Gaussian Splatting project that I created with the intent of “eh, lemme try to make a quick toy 3DGS renderer in Unity, and maybe play around with data size reductions” has somewhat surprisingly reached 1300+ GitHub stars. Since the previous blog post it got a bunch of random things:

  • Support for HDRP and URP rendering pipelines, in addition to the built-in one.
  • Fine-grained splat editing tools in the form of selection and deletion (short video).
  • High-level splat editing tools in the form of ellipsoid- and box-shaped “cutouts”. @hybridherbst did the initial implementation, and shortly afterwards all the other 3DGS editing tools got pretty much the same workflow. Nice!
  • Ability to export modified/edited splats back into a .PLY file.
  • Faster rendering via tighter oriented screen-space quads, instead of axis-aligned quads.
  • I made the gaussian splat rendering+editing piece an actual package (OpenUPM page), and clarified the license to be MIT.
  • (not part of the github release, but in the latest main branch) More fine-grained editing tools (move individual splats), the ability to bake the splat transform when exporting a .PLY, and multiple splats can now be merged together.

The project contains some bits that are not gaussian splat related, but might be useful elsewhere:

Aaaand with that, I’m thinking that my toying around will end here. I’ve made a toy renderer and an integration into Unity, learned a bunch of random things in the process, and it’s time to call it a day and move on to something else. I suspect there will be another kphjillion gaussian splatting related papers coming out over the next year. It will be interesting to see where all of this ends up!


Making Gaussian Splats more smaller

The previous post was about making Gaussian Splatting data sizes smaller (both in memory and on disk). This one is still about the same topic! Now we look into clustering / VQ.

Teaser: this scene (garden tools from my own shed) is just 7.5 megabytes of data now. And it represents the metal shading (anisotropy / brushed metal parts) quite well!

Spherical Harmonics take up a lot of space!

In raw uncompressed Gaussian Splat data, the majority of the data is Spherical Harmonics coefficients. If we ignore the very first SH coefficient (which we treat as “a color”), the rest is 45 floating point numbers for each splat (15 numbers for each of the R,G,B channels). For something like the “bike” scene with 6.1 million splats, this is 1.1GB of data just for the SH coefficients alone. And while they can be converted into half-precision (FP16) floats with pretty much no quality loss at all, or into smaller quantized formats (Norm11 and Norm565 from the previous post), that still leaves them at 350MB and 187MB worth of data, respectively. Even the idea that should not actually work – lay them out in Morton order inside a texture and compress as GPU BC1 format – does not look entirely terrible, but is still about 46MB of data.

Are Spherical Harmonics even worth having? That’s a good question. Without them, the scenes still look quite good, but the surfaces lose quite a lot of “shininess”, especially the view-dependent reflectance as you move the viewpoint. Below are the “bike” and “garden” scenes, rendered with full SH data (left side) vs. just color (right side):

How does “reflection” of the vase on the metal part of the table work, you might ask? The gaussians have “learned” the ages-old trick of duplicating and mirroring the geometry for reflection! Cute!

Anyway, for now let’s assume that we do want this “surface reflectivity looks nicer” effect that is provided by SH data.

Remember palettized images?

Remember how ages ago image files used to have a “color palette” of say 256 or 16 distinct colors, and each pixel would just say “yeah, that one”, pointing at the index of the color inside the palette. Heck, even whole computer displays were using palettes because “true color” was too costly at the time.

We can try doing the same thing for our SH data – given several million SH items inside a gaussian splat scene, can we actually pick “just some amount” of distinct SH values, and have each splat just point to the needed SH item?

Why, yes, we can. I’ve spent a bit of time learning about “vector quantization”, “clustering” and “k-means” and related jazz, and have played around with clustering SHs into various amounts (from 1024 up to 65536).

Note that SH data, at 45 numbers per splat, is quite “high dimensional”, and that has various challenges (see curse of dimensionality). One of them is that clustering millions of splats into thousands of items, in 45 dimensions, is not exactly fast. Another is that clustering might not produce good results. ⚠️ I don’t know anything about any of that; it could very well be that I should have done clustering entirely differently! But hey, whatever :)

Also, I’m very impatient, like if anything takes longer than 10 minutes I go “this is not supposed to take that long”. I first tried scikit-learn but that was taking ages to cluster SHs into even one thousand items. Faiss was way faster, taking about 5 minutes to cluster “bike” scene SHs into 16k items. However, I did not want to add that as a dependency, so I whipped up my own variant of mini-batch k-means using Burst’ed C# directly inside Unity. I probably did it all wrong and incorrectly, but it is about 3x faster than even Faiss and seems to provide better quality, at least for this task, so 🤷

So the process is:

  • Take all the SH data from the gaussian splat scene,
  • Cluster that into a “palette” of 4k - 16k distinct SH items. Store that; I’m storing it as FP16 numbers, so that’s 360KB - 1.44MB of data for the palette itself.
  • For each original SH data point, find which item of the palette it is closest to. Store that index per splat; I’m storing it as 16 bits (even if some of the bits are not used), so for the “bike” scene (6.1M splats) this is about 12MB of indices (see the sketch right after this list).
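
Here is a sketch of that second step, i.e. assigning each splat to its nearest palette entry. This is illustrative C++; the actual clustering and assignment were done with a Burst-compiled C# mini-batch k-means inside Unity.

#include <cstdint>
#include <limits>
#include <vector>

constexpr int kShDim = 45; // 15 coefficients x RGB

// Returns one palette index per splat. palette holds paletteCount * kShDim floats,
// shData holds splatCount * kShDim floats.
std::vector<uint16_t> assign_sh_palette(const std::vector<float>& shData,
                                        const std::vector<float>& palette,
                                        size_t splatCount, size_t paletteCount)
{
    std::vector<uint16_t> indices(splatCount);
    for (size_t s = 0; s < splatCount; ++s)
    {
        const float* sh = &shData[s * kShDim];
        float bestDist = std::numeric_limits<float>::max();
        size_t bestIdx = 0;
        for (size_t p = 0; p < paletteCount; ++p)
        {
            const float* c = &palette[p * kShDim];
            float d = 0.0f;
            for (int k = 0; k < kShDim; ++k)
            {
                float diff = sh[k] - c[k];
                d += diff * diff; // squared L2 distance in 45 dimensions
            }
            if (d < bestDist)
            {
                bestDist = d;
                bestIdx = p;
            }
        }
        indices[s] = (uint16_t)bestIdx;
    }
    return indices;
}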

Here’s full SH (left side) vs. SHs clustered into 16k items (right side):

This does retain the “shininess” effect, at the expense of ~13MB of data for either scene above. And while it does have some lighting artifacts, they are not terribly bad. So… probably okay?

Aside: the excellent gsplat.tech by Jakub Červený (@jakub_c5y) seems to also be using some sort of VQ/clustering for the data. Seriously, check it out; it’s probably the nicest gaussian splatting thing right now w.r.t. usability – very intuitive camera controls, nicely presented file sizes, and it works on WebGL2. Craftsmanship!

New quality levels

In my toy “gaussian splatting for Unity” implementation, currently I only do SH clustering at “Low” and “Very Low” quality levels.

Previously, the “Low” preset had data sizes of 119MB, 49MB, 113MB (PSNR respectively 34.72, 31.81, 33.05):

Now, the “Low” preset clusters SH into 16k items. Data sizes 98MB, 41MB, 93MB; PSNR respectively 35.17, 35.32, 35.00:

The “Very Low” preset previously was pretty much unusable (data sizes of 74MB, 32MB, 74MB; PSNR 24.02, 22.28, 23.10):

However, now the Very Low preset is in “somewhat usable” territory! File sizes are similar; the savings from clustered SH were spent on other components that were suffering before. SH is clustered into 4k items. Data sizes 79MB, 33MB, 75MB; PSNR 32.27, 30.19, 31.10:

Quality     Pos       Rot       Scl       Col      SH          Compr   PSNR
Very High   Norm16x3  Norm10_2  Norm16x3  F16x4    F16x3        2.1x   –
High        Norm16x3  Norm10_2  Norm16x3  F16x4    Norm11       2.9x   57.77
Medium      Norm11    Norm10_2  Norm11    Norm8x4  Norm565      5.1x   47.46
Low         Norm11    Norm10_2  Norm565   Norm8x4  Cluster16k  14.9x   35.17
Very Low    Norm11    Norm10_2  Norm565   BC7      Cluster4k   18.4x   32.27

Conclusions and future work

At this point, we can have the “bike” and “garden” scenes in under 100MB of data (instead of the original 1.4GB PLY file) at fairly acceptable quality. Not bad!

Of course gaussian splatting at this point is useful for “rotate around a scanned object” use case; it is not useful for “in games” or many other cases. We don’t know how to re-light them, or how to animate them well, etc. etc. Yet.

I haven’t done any of the “small things I could try” from the end of the previous post yet. So maybe that’s next? Or maybe look into how to further reduce the splat data on-disk, as opposed to just reducing the memory representation.

All the code for above is in this UnityGaussianSplatting#9 PR on github.