Blogs · Aras' website

A case of slow Visual Studio project open times

Posted on Mar 22, 2017

I was working on some new code to generate Visual Studio solution/project files, and that means regenerating the files and checking them in VS a lot of times. And each time, it felt like VS takes ages to reopen them! This is with Visual Studio 2017, which presumably is super-turbo optimized in project open performance, compared to previous versions.

The VS solution I’m working on has three projects, with about 5000 source files in each, grouped into 500 folders (“filters” as VS calls them) to reflect hierarchy on disk. Typical stuff.

On my Core i7-5820K 3.3GHz machine VS2017 takes 45 seconds to open that solution first time after rebuilding it, and about 20 seconds each time I open it later.

That’s not fast at all! What is going on?!

Time for some profiling

Very much like in the “slow texture importing” blog post, I fired up windows performance recorder, and recorded everything interesting that’s going on while VS was busy opening up the solution file.

Predictably enough, VS (devenv.exe) is busy during most of the time of solution load:

Let’s dig into the heaviest call stacks during this busy period. Where will this puppy with 16k samples lead to?

First it leads us through the layers of calls, which is fairly common in software. It’s like an onion; with a lot of layers. And you cry as you peel them off :)

So that is Visual Studio, seemingly processing windows messages, into some thread dispatcher, getting into some UI background task scheduler, and into some async notifications helper. We are 20 stack frames in, and did not get anywhere so far, besides losing some profiling samples along the way. Where to next?

A-ha! Further down, it is something from JetBrains. I do have ReSharper (2016.3.2) installed in my VS… Could it be that R# being present causes the slow project load times (at least in VS2017 with R# 2016.3)? Let’s keep on digging for a bit!

One branch of heavy things under that stack frame leads into something called CVCArchy::GetCfgNames, which, I guess, is getting the build configurations available in the project or solution. Internally it’s another onion, getting into marshaling and into some immutable dictionaries and concurrent stacks.

And another call branch goes into CVCArchy::GetPlatformNames, which seemingly goes into exactly the same implementation again. ¯\_(ツ)_/¯

So it would seem that two things are going on: 1) possibly R# is querying project configurations/platforms a lot of times (once for each file?), and 2) querying that from VS is actually a fairly costly operation.

VS seemingly tries to fulfill these “what configurations do you have, matey?” queries in an asynchronous fashion, since that also causes quite some activity on other VS threads. Hey at least it’s trying to help :)

Some of which causes not much actual work being done, e.g. this thread spends 1.5k samples doing spin waits. Likely an artifact of some generic thread work system not being quite used as it was intended to, or something.

There’s another background thread activity that kicks in towards end of the “I was busy opening the project” period. That one is probably some older code, since the call stack is not deep at all, and it fairly quickly gets to actual work that it tries to do :)

Let’s try with R# disabled

Disabling R# in VS 2017 makes it open the same solution in 8 seconds (first time) and 4 seconds (subsequent opens). So that is pretty much five times faster.

Does this sound like something that should be fixed in R#, somehow? That’s my guess too, so here’s a bug report I filed. Fingers crossed it will be fixed soon! They already responded on the bug report, so things are looking good.

(Edit: looks like this will be fixed in R# 2017.1, nice!)

Visual Studio 2015 does not seem to be affected; opening the same solution with R# enabled is about 8 seconds as well. So this could be Microsoft’s bug too, or an unintended consequence of some implementation change (e.g. “we made config queries async now”).

Complex software is complex, yo.

While at it: dotTrace profiler

Upon suggestion from JetBrains folks, I did a dotTrace capture of VS activity while it’s opening the solution. Turns out, it’s a pretty sweet C# profiler! It also pointed out to basically the same things, but has C# symbols in the callstacks, and a nice thread view and so on. Sweet!

So there. Profiling stuff is useful, and can answer questions like “why is this slow?”. Other news at eleven!

Developer Tooling, a week in

Posted on Mar 16, 2017

So I switched job role from graphics to developer tooling / build engineering about 10 days ago. You won’t believe what happened next! Click to find out!

Quitting Graphics

I wrote about the change right before GDC on purpose - wanted to see reactions from people I know. Most of them were along what I expected, going around “build systems? why?!” theme (my answer: between “why not” and ¯\_(ツ)_/¯). I went to the gathering of rendering people one evening, and the “what are you doing here, you’re no longer graphics” joke that everyone was doing was funny at first, but I gotta say it to you guys: hearing it 40 times over is not that exciting.

At work, I left all the graphics related Slack channels (a lot of them), and wow the sense of freedom feels good. I think the number of Slack messages I do per week should go down from a thousand to a hundred or so; big improvement (for me, anyway).

A pleasant surprise: me doing that and stopping to answer the questions, doing graphics related code reviews and writing graphics code did not set the world on fire! Which means that my “importance” in that area was totally imaginary, both in my & in some other people’s heads. Awesome! Maybe some years ago that would have bothered me, but I think I’m past the need of wanting to “feel important”.

Not being important is liberating. Highly recommended!

Though I have stopped doing graphics related work at work, I am still kinda following various graphics research & advances happening in the world overall.

Developer Tooling

The “developer tools” team that I joined is six people today, and the “mission” is various internal tools that the rest of R&D uses. Mostly the code build system, but also parts of version control, systems for handing 3rd party software packages, various helper tools (e.g. Slack bots), and random other things (e.g. “upgrade from VS2010 to VS2015/2017” that is happening as we speak).

So far I’ve been in the build system land. Some of the things I noticed:

Wow it’s super easy to save hundreds of milliseconds from build time. This is not a big deal for a clean build (if it takes 20 minutes, for example), but for an incremental build add 100s of milliseconds enough times and we’re talking some real “developer flow” improvements. Nice!
Turns out, a lot of things are outdated or obsolete in the build scripts or the dependency graph. Here, we are generating some config headers for the web player deployment (but we have dropped web player a long time ago). There, we are always building this small little tool, that turns out is not used by anything whatsoever. Over here, tons of build graph setup done for platforms we no longer support. Or this often changing auto-generated header file, is included into way too many source files. And so on and so forth.
There’s plenty of little annoyances that everyone has about the build process or IDE integrations. None of them are blocking anyone, and very often do not get fixed. However I think they add up, and that leads to developers being much less happy than they could be.
Having an actual, statically typed, language for the build scripts is really nice. Which brings me to the next point…

C#

Our build scripts today are written in C#. At this very moment, it’s this strange beast we call “JamSharp” (primarily work of @lucasmeijer). It is JamPlus, but with an embedded .NET runtime (Mono), and so the build scripts and rules are written in C# instead of the not-very-pleasant Jam language.

Once the dependency graph is constructed, today it is still executed by Jam itself, but we are in the process of replacing it with our own, C# based build graph execution engine.

Anyway. C# is really nice!

I was supposed to kinda know this already, but I only occasionally dabbled in C# before, with most of my work being in C++.

In a week I’ve learned these things:

JetBrains Rider is a really nice IDE, especially on a Mac where the likes of VisualStudio + Resharper do not exist.
Most of C# 6 additions are not rocket surgery, but make things so much nicer. Auto-properties, expression bodies on properties, “using static”, string interpolation are all under “syntax sugar” category, but each of them makes things just a little bit nicer. Small “quality of life” improvements is what I like a lot.
Perhaps this is a sign of me getting old, but e.g. if I look at new features added to C++ versions, my reaction to most of them is “okay this probably makes sense, but also makes my head spin. such. complexity.”. Whereas with C# 6 (and 7 too), almost all of them are “oh, sweet!”.

So how are things?

One week in, pretty good! Got a very vague grasp of the area & the problem. Learned a few neat things about C#. Did already land two pull requests to mainline (small improvements and warning fixes), with another improvements batch waiting for code reviews. Spent two days in Copenhagen discussing/planning next few months of work and talking to people.

Is very nice!

Stopping graphics, going to build engineering

Posted on Feb 26, 2017

I’m doing a sideways career move. Which is: stopping whatever graphics related programming I was doing, and start working on internal build engineering. Been somewhat removing myself from many graphics related areas (ownership, code reviews, future tasks, decisions & discussions) for a while now, and right now GDC provides a conceptual break between graphics and non-graphics work areas.

Also, I can go into every graphics related GDC talk, sit there at the back and shout “booo, graphics sucks!” or something.

“But why?” - several reasons, with major one being “why not?”. In random order:

I wanted to “change something” for a while, and this does qualify as that. I was mostly doing graphics relate things for what, 11 years by now at the same company? That’s a long time!
I wanted to try myself in an area where I’m a complete newbie, and have to learn everything from scratch. In graphics, while I’m nowhere near being “leading edge” or having actual knowledge, at least I have a pretty good mental picture of current problems, solutions, approaches and what is generally going on out there. And I know the buzzwords! In build systems, I’m Jon Snow. I want to find out how that is and how to deal with it.
This one’s a bit counter-intuitive… but I wanted to work in an area where there are three hundred customers instead of five million (or whatever is the latest number). Having an extremely widely used product is often inspiring, but also can be tiring at times.
Improving ease of use, robustness, reliability and performance of our own internal build system(s) does sound like a useful job! It’s something all the developers here do many times per day, and there’s no shortage of improvements to do.
Graphics teams at Unity right now are in better state than ever before, with good structure, teamwork, plans and efficiency in place. So me leaving them is not a big deal at all.
The build systems / internal developer tooling team did happen to be looking for some helping hands at the time. Now, they probably don’t know what they signed up for by accepting me… but we’ll see :)

I’m at GDC right now, and was looking for relevant talks about build/content/data pipelines. There are a couple, but actually not as much as I hoped for… That’s a shame! For example 2015 Rémi Quenin’s talk on Far Cry 4 pipeline was amazing.

What will my daily work be about, I still have no good idea. I suspect it will be things like:

Working on our own build system (we were on JamPlus for a long time, and replacing pieces of it).
Improving reliability of build scripts / rules.
Optimizing build times for local developer machines, both for full builds as well as incremental builds.
Optimizing build times for the build farm.
Fixing annoyances in current builds (there’s plenty of random ones, e.g. if you build a 32 bit version of something, it’s not easy to build 64 bit version without wiping out some artifacts in between).
Improving build related IDE experiences (project generation, etc.).

Anyhoo, so that’s it. I expect future blog posts here might be build systems related.

Now, build all the things! Picture unrelated.

Font Rendering is Getting Interesting

Posted on Feb 15, 2017

Caveat: I know nothing about font rendering! But looking at the internets, it feels like things are getting interesting. I had exactly the same outsider impression watching some discussions unfold between Yann Collet, Fabian Giesen and Charles Bloom a few years ago – and out of that came rANS/tANS/FSE, and Oodle and Zstandard. Things were super exciting in compression world! My guess is that about “right now” things are getting exciting in font rendering world too.

Ye Olde CPU Font Rasterization

A true and tried method of rendering fonts is doing rasterization on the CPU, caching the result (of glyphs, glyph sequences, full words or at some other granularity) into bitmaps or textures, and then rendering them somewhere on the screen.

FreeType library for font parsing and rasterization has existed since “forever”, as well as operating system specific ways of rasterizing glyphs into bitmaps. Some parts of the hinting process have been patented, leading to “fonts on Linux look bad” impressions in the old days (my understanding is that all these expired around year 2010, so it’s all good now). And subpixel optimized rendering happened at some point too, which slightly complicates the whole thing. There’s a good overview of the whole thing in 2007 Texts Rasterization Exposures article by Maxim Shemanarev.

In addition to FreeType, these font libraries are worth looking into:

stb_truetype.h – single file C library by Sean Barrett. Super easy to integrate! Article on how the innards of the rasterizer work is here.
font-rs – fast font renderer by Raph Levien, written in Rust \o/, and an article describing some aspects of it. Not sure how “production ready” it is though.

But at the core the whole idea is still rasterizing glyphs into bitmaps at a specific point size and caching the result somehow.

Caching rasterized glyphs into bitmaps works well enough. If you don’t do a lot of different font sizes. Or very large font sizes. Or large amounts of glyphs (as happens in many non-Latin-like languages) coupled with different/large font sizes.

One bitmap for varying sizes? Signed distance fields!

A 2007 paper from Chris Green, Improved Alpha-Tested Magnification for Vector Textures and Special Effects, introduced game development world to the concept of “signed distance field textures for vector-like stuffs”.

The paper was mostly about solving “signs and markings are hard in games” problem, and the idea is pretty clever. Instead of storing rasterized shape in a texture, store a special texture where each pixel represents distance to the closest shape edge. When rendering with that texture, a pixel shader can do simple alpha discard, or more complex treatments on the distance value to get anti-aliasing, outlines, etc. The SDF texture can end up really small, and still be able to decently represent high resolution line art. Nice!

Then of course people realized that hey, the same approach could work for font rendering too! Suddenly, rendering smooth glyphs at super large font sizes does not mean “I just used up all my (V)RAM for the cached textures”; the cached SDFs of the glyphs can remain fairly small, while providing nice edges at large sizes.

Of course the SDF approach is not without some downsides:

Computing the SDF is not trivially cheap. While for most western languages you could pre-cache all possible glyphs off-line into a SDF texture atlas, for other languages that’s not practical due to sheer amount of glyphs possible.
Simple SDF has artifacts near more complex intersections or corners, since it only stores a single distance to closest edge. Look at the letter A here, with a 32x32 SDF texture - outer corners are not sharp, and inner corners have artifacts.
SDF does not quite work at very small font sizes, for a similar reason. There it’s probably better to just rasterize the glyph into a regular bitmap.

Anyway, SDFs are a nice idea. For some examples or implementations, could look at libgdx or TextMeshPro.

The original paper hinted at the idea of storing multiple distances to solve the SDF sharp corners problem, and a recent implementation of that idea is “multi-channel distance field” by Viktor Chlumský which seems to be pretty nice: msdfgen. See associated thesis too. Here’s letter A as a MSDF, at even smaller size than before – the corners are sharp now!

That is pretty good. I guess the “tiny font sizes” and “cost of computing the (M)SDF” can still be problems though.

Fonts directly on the GPU?

One obvious question is, “why do this caching into bitmaps at all? can’t the GPU just render the glyphs directly?” The question is good. The answer is not necessarily simple though ;)

GPUs are not ideally suited for doing vector rendering. They are mostly rasterizers, mostly deal with triangles, etc etc. Even something simple like “draw thick lines” is pretty hard (great post on that – Drawing Lines is Hard). For more involved “vector / curve rendering”, take a look at a random sampling of resources:

Rendering Vector Art on the GPU by Charles Loop, Jim Blinn (of course!) (2007),
Precise vector textures for real-time 3D rendering by Zhipei Qin, Michael Mccool, Craig Kaplan (2008),
Random-access rendering of general vector graphics by Diego Nehab, Hugues Hoppe (2008),
NVIDIA Path Rendering (2012),
Efficient GPU Path Rendering Using Scanline Rasterization by Rui Li, Qiming Hou and Kun Zhou (2016).

That stuff is not easy! But of course that did not stop people from trying. Good!

Vector Textures

Here’s one approach, GPU text rendering with vector textures by Will Dobbie - divides glyph area into rectangles, stores which curves intersect it, and evaluates coverage from said curves in a pixel shader.

Pretty neat! However, seems that it does not solve “very small font sizes” problem (aliasing), has limit on glyph complexity (number of curve segments per cell) and has some robustness issues.

Glyphy

Another one is Glyphy, by Behdad Esfahbod (بهداد اسفهبد). There’s video and slides of the talk about it. Seems that it approximates Bézier curves with circular arcs, puts them into textures, stores indices of some closest arcs in a grid, and evaluates distance to them in a pixel shader. Kind of a blend between SDF approach and vector textures approach. Seems that it also suffers from robustness issues in some cases though.

Pathfinder

A new one is Pathfinder, a Rust (again!) library by Patrick Walton. Nice overview of it in this blog post.

This looks promising!

Downsides, from a quick look, is dependence on GPU features that some platforms (mobile…) might not have – tessellation / geometry shaders / compute shaders (not a problem on PC). Memory for the coverage buffer, and geometry complexity depending on the font curve complexity.

Hints at future on twitterverse

From game developers/middleware space, looks like Sean Barrett and Eric Lengyel are independently working on some sort of GPU-powered font/glyph rasterization approaches, as seen by their tweets (Sean’s and Eric’s).

Can’t wait to see what they are cooking!

Did I say this is all very exciting? It totally is. Here’s to clever new approaches to font rendering happening in 2017!

Some figures in this post are taken from papers or pages I linked to above:

SDF figure from Chris Green’s paper.
A letter SDF and MSDF images from msdfgen github page.
Vector textures illustration from vector textures demo.
Pathfinder performance charts from pathfinder post.

Every Possible Scalability Limit Will Be Reached

Posted on Feb 5, 2017

I wrote this the other day, and @McCloudStrife suggested I should call it “Aras’s law”. Ok! here it is:

Every possible scalability limit will be reached eventually.

Here’s a concrete example that I happened to work on a bit during the years: shader “combinatorial variant explosion” and dealing with it.

In retrospect, I should have skipped a few of these steps and recognized that each “oh we can do 10x more now” improvement won’t be enough when people will start doing 100x more. Oh well, live and learn. So here’s the boring story.

Background: shader variants

GPU programming models up to this day still have not solved “how to compose pieces together” problem. In CPU land, you have function calls, and function pointers, and goto, and virtual functions, and more elaborate ways of “do this or that, based on this or that”. In shaders, most of that either does not exist at all, or is cumbersome to use, or is not terribly performant.

So many times, people resort to writing many slightly different “variants” of some shader, and pick one or another to use depending on what is being processed. This is called “ubershaders” or “megashaders”, and often is done by stiching pieces of source code together, or by using a C-like preprocessor.

Things are slowly improving to move away from this madness (e.g. specialization constants in Vulkan, function constants in Metal), but it will take some time to get there.

So while we have the “shaders variants” as a thing, it can end up being a problem, especially if number of variants is large. Turns out, it can get large really easily!

Unity 1.x: almost no shader variants

Many years ago, shaders in Unity did not have many variants. They were only dealing with simple forward shading; you would write // autolight 7 in your shader, and that would compile into 5 internal variants. And that was it.

Why a compile directive behind a C++ style comment? Why 7? Why 5 variants? I don’t know, it was like that. 7 was probably the bitmask of which light types (three bits: directional, spot, point) to support, but I’m not sure if any other values besides “7” worked. Five variants, because some lights needed more to support light cookies vs. no light cookies.

Back then Unity supported just one graphics API (OpenGL), and five variants of a shader was not a problem. You can count them on your hand! They were compiled at shader import time, and all five were always included into the game data build.

Unity 2.x: add some more variants

Unity 2.0 changed the syntax into a #pragma multi_compile, so that it looks less of a commnt and more like a proper compile directive. And at some point it got the ability for users to add their own variants, which we called “shader keywords”. I forget which version exactly that happened in, but I think it was 2.x series.

Now people could make shaders do one or another thing of their choice (e.g. use a normalmap vs do not use a normal map), and control the behavior based on which “shader keywords” were set.

This was not a big problem, since:

There was no way to have custom inspector UIs for materials, so doing complex data-dependent shader variants was not practical,
We did not use the feature much in the Unity’s built-in shaders, so many thought of it as “something advanced, maybe not for me”,
All shader variants for all graphics APIs (at this time: OpenGL & Direct3D 9) were always compiled at shader import time, and always included into game build data.
I think there was a limit of maximum 32 shader keywords being used.

In any case, “crazy amount of shader variants” was not happening just yet.

Unity 3.x: add some more variants

Unity 3 added built-in lightmapping support (which meant more shader variants: with & without lightmaps), and added deferred lighting too (again more shader variants). The game build pipeline got ability to not include some of the “well this surely won’t be needed” shader variants into the game data. But compilation of all variants present in the shader was still happening at shader import time, making it impractical to go above couple dozen variants. Maybe up to a hundred, if they are simple enough each.

Unity 4.x: things are getting out of hand! New import pipeline

Little by little, people started adding more and more shader variants. I think it was Marmoset Skyshop that gave us the “wow something needs to be done” realization, either in 2012 or 2013.

The thing with many shader-variant based systems is: number of possible shader variants is always much, much higher than the number of actually used shader variants. Imagine a simple shader that has these things in a multi-variant fashion:

Normal map: off / on
Specular map: off / on
Emission map: off / on
Detail map: off / on
Alpha cutout: off / on

Now, each of the features above essentially is a bit with two states; there are 5 features so in total there are 32 possible shader variants (2^5). How many will actually be used? Likely a lot less; a particular production usually settles for some standard way of authoring their materials. For example, most materials will end up using a normal map and specular map; with occasional one also putting in either an emission map or alpha cutout feature. That’s a handful of shader variants that are needed, instead of full set of 32.

But up to this point, we were compiling each and every possible shader variant at shader import time! It’s not terribad if there’s 32 of them, but some people wanted to have ten or more of these “toggleable features”. 2^N gets to a really large number, really fast.

It also did not help that by then, we did not only have OpenGL & Direct3D 9 graphics APIs; there also was Direct3D 11, OpenGL ES, PS3, Xbox360, Flash Stage3D etc. We were compiling all variants of a shader into all these backends at import time! Even if you never ever needed the result of that :(

So @robertcupisz and myself started rewriting the shader compilation pipeline. I have written about it before: Shader compilation in Unity 4.5. It basically changed several things:

Shader variants would be compiled on-demand (whenever needed by editor itself; or for game data build) and cached,
Shader compiler was split into a separate process per CPU core, to work around global mutexes in some shader compiler backends; these were preventing effictive multithreading.

This was a big improvement compared to previous state. But are we done yet? Far from it.

Unity 5.0: need to make some variants optional

The new compilation pipeline meant that editing a shader with a thousand potential variants was no longer a coffee break. However, all 1000 variants were still always included into the build. For Unity 5 launch we were developing a new built-in Standard shader with 20000 or so possible variants, and always including all of them was a problem.

So we added a way to indicate that “you know, only include these variants if some materials use them” when authoring a shader. Variants that were never used by anything were:

never even compiled and
not included into the game build either.

That was done in 2013 December, with plenty of time to ship in Unity 5.0.

During that time other people started “going crazy” with shader variant counts too – e.g. Alloy shaders were about 2 million variants. So we needed some more optimizations that I wrote about before, that managed to get just in time for Unity 5 launch.

So we went from “five variants” to “two million possible variants” by now… Is that the limit? Are we there yet? Not quite!

Unity 5.4: people need more shader keywords

Sometime along the way amount of shader keywords (the “toggleable shader features that control which variant is picked”) that we support went from 32 up to 64, then up to 128. That was still not enough, as you can see from this long forum thread.

So I looked at increasing the keyword count to 256. Turns out, it was doable, especially after fiddling around with some hash functions. Side effect of investigating various hash functions: replaced almost all hash functions used across the whole codebase \o/.

Ok this by itself does neither improve scalability of shader variant parts, nor makes it worse… Except that with more shader keywords, people started adding even more potential shader variants! Give someone a thing, and they will start using it in both expected and unexpected ways.

Are we there yet? Nope, still a few possible scalability cliffs in near future.

Unity 5.5: we are up to 70 million variants now, or “don’t expand that data”

Our own team working on the new “HD Render Pipeline” had shaders with about 70 million of possible variants by now.

Turns out, there was a step where in the editor, we were still expanding some data for all possible variants into some in-memory structures. At 70 million potential variants, that was taking gobs of memory, and a lot of time was spent searching through that.

Time to fix that! Stop expanding that data, instead search directly from a fairly compact “unexpanded” data. That unblocked the team; import time from doing a minor shader edit went from “a minute” to “a couple seconds”.

Yay! For a while.

Unity 5.6: half a billion variants anyone? or “don’t search that data”

Of course they went up to half a billion possible variants, and another problem surfaced: in some paths in the editor, when it was looking for “okay so which shader variant I should use right now?”, the code was enumerating all possible variants and checking which one is “closest to what we want”. In Unity, shader keywords do not have to exactly match some variant present in the shader, for better or worse… Previous step made it so that the table of “all possible variants” is not expanded in memory. But purely enumerating half a billion variants and doing fairly simple checks for each was still taking a long time! “Half a billion” turns out to be a big number.

Now of course, doing a search like that is fairly stupid. If we know we are searching for a shader variant with a keyword “NORMALMAP_ON” in it, there’s very little use in enumerating all the ones that do not have it. Each keyword cuts the search space in half! So that optimization was done, and nicely got some timings from “dozens of seconds” to “feels instant”. For that case when you have half a billion shader variants, that is :)

We are done now, right?

Now: can I have hundred billion variants? or “dont’ search that other data”

Somehow the team ended up with a shader that has almost a hundred billion of possible variants. How? Don’t ask; my guess is by adding everything and the kitchen sink to it. From a quick look, it is “layered lit + tessellation” shader, and it seems to have:

Usual optional textures: normal map, specular map, emissive map, detail map, detail mask map.
One, two, three or four “layers” of the maps, mixed together.
Mixing based on vertex colors, or height, or something else.
Tessellation: displacement, Phong + displacement, parallax occlusion mapping.
Several double sided lighting modes.
Transparency and alpha cutout options.
Lightmapping options.
A few other oddball things.

The thing is, you “only” need about 36 of individually toggleable features to get to a hundred billion variant range (2^36=69B). 36 features is a lot of features, but imaginable.

The problem they ran into, was that at game data build time, the code was, similar to the previous case, looping over all possible shader variants and deciding whether each one should be included into the data file or not. Looping over a hundred billion simple things is a long process! So they were like “oh we wanted to do a build to check performance on a console, but gave up waiting”. Not good!

And of course it’s a stupid thing to do. The loop should be inverted, since we already have the info about which materials are included into the game build, and from there know which shader variants are needed. We just need to augment that set with variants that “always have to be in the build”, and that’s it. That got the build time from “forever” down to ten seconds.

Are we done yet? I don’t know. We’ll see what the future will bring :)

Moral of the story is: code that used to do something with five things years ago might turn out to be problematic when it has to deal with a hundred. And then a thousand. And a million. And a hundred billion. Kinda obvious, isn’t it?