Every Possible Scalability Limit Will Be Reached

I wrote this the other day, and @McCloudStrife suggested I should call it “Aras’s law”. Ok! Here it is:

Every possible scalability limit will be reached eventually.

Here’s a concrete example that I happened to work on a bit over the years: shader “combinatorial variant explosion” and how to deal with it.

In retrospect, I should have skipped a few of these steps and recognized that each “oh, we can do 10x more now” improvement won’t be enough once people start doing 100x more. Oh well, live and learn. So here’s the boring story.

Background: shader variants

GPU programming models to this day still have not solved the “how to compose pieces together” problem. In CPU land, you have function calls, and function pointers, and goto, and virtual functions, and more elaborate ways of “do this or that, based on this or that”. In shaders, most of that either does not exist at all, or is cumbersome to use, or is not terribly performant.

So many times, people resort to writing many slightly different “variants” of some shader, and pick one or another to use depending on what is being processed. This is called “ubershaders” or “megashaders”, and it is often done by stitching pieces of source code together, or by using a C-like preprocessor.
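
To make the pattern concrete, here is a tiny illustration, with made-up feature names and written as plain C++ rather than actual shader code: the preprocessor decides what ends up in each compiled program, so every combination of defines is a separate variant of the same source.

#include <cstdio>

// Hypothetical "shader keywords"; each combination that gets compiled
// produces a different variant of the same source code.
//#define NORMALMAP_ON
//#define EMISSION_ON

float ShadePixel(float baseColor)
{
    float color = baseColor;
#ifdef NORMALMAP_ON
    color *= 0.8f;      // pretend this is the "sample the normal map" path
#endif
#ifdef EMISSION_ON
    color += 0.5f;      // pretend this is the "add the emission map" path
#endif
    return color;
}

int main()
{
    std::printf("shaded: %f\n", ShadePixel(1.0f));
    return 0;
}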

Things are slowly improving to move away from this madness (e.g. specialization constants in Vulkan, function constants in Metal), but it will take some time to get there.

So while we have “shader variants” as a thing, they can end up being a problem, especially if the number of variants is large. Turns out, it can get large really easily!

Unity 1.x: almost no shader variants

Many years ago, shaders in Unity did not have many variants. They were only dealing with simple forward shading; you would write // autolight 7 in your shader, and that would compile into 5 internal variants. And that was it.

Why a compile directive behind a C++ style comment? Why 7? Why 5 variants? I don’t know, it was like that. 7 was probably the bitmask of which light types (three bits: directional, spot, point) to support, but I’m not sure if any other values besides “7” worked. Five variants, because some light types needed separate variants for light cookies vs. no light cookies.

Back then Unity supported just one graphics API (OpenGL), and five variants of a shader was not a problem. You can count them on your hand! They were compiled at shader import time, and all five were always included into the game data build.

Unity 2.x: add some more variants

Unity 2.0 changed the syntax into a #pragma multi_compile, so that it looks less like a comment and more like a proper compile directive. And at some point users got the ability to add their own variants, controlled by what we called “shader keywords”. I forget exactly which version that happened in, but I think it was the 2.x series.

Now people could make shaders do one or another thing of their choice (e.g. use a normal map vs. not use one), and control the behavior based on which “shader keywords” were set.

This was not a big problem, since:

  • There was no way to have custom inspector UIs for materials, so doing complex data-dependent shader variants was not practical,
  • We did not use the feature much in Unity’s built-in shaders, so many thought of it as “something advanced, maybe not for me”,
  • All shader variants for all graphics APIs (at this time: OpenGL & Direct3D 9) were always compiled at shader import time, and always included into the game build data,
  • I think there was a limit of at most 32 shader keywords being used.

In any case, “crazy amount of shader variants” was not happening just yet.

Unity 3.x: add some more variants

Unity 3 added built-in lightmapping support (which meant more shader variants: with & without lightmaps), and added deferred lighting too (again more shader variants). The game build pipeline got the ability to not include some of the “well, this surely won’t be needed” shader variants into the game data. But compilation of all variants present in the shader was still happening at shader import time, making it impractical to go above a couple dozen variants. Maybe up to a hundred, if each of them was simple enough.

Unity 4.x: things are getting out of hand! New import pipeline

Little by little, people started adding more and more shader variants. I think it was Marmoset Skyshop that gave us the “wow something needs to be done” realization, either in 2012 or 2013.

The thing with many shader-variant based systems is: the number of possible shader variants is always much, much higher than the number of actually used shader variants. Imagine a simple shader that has these things in a multi-variant fashion:

  • Normal map: off / on
  • Specular map: off / on
  • Emission map: off / on
  • Detail map: off / on
  • Alpha cutout: off / on

Now, each of the features above is essentially a bit with two states; there are 5 features, so in total there are 32 possible shader variants (2^5). How many will actually be used? Likely a lot fewer; a particular production usually settles on some standard way of authoring its materials. For example, most materials will end up using a normal map and a specular map, with an occasional one also adding either an emission map or the alpha cutout feature. That’s a handful of shader variants that are needed, instead of the full set of 32.

But up to this point, we were compiling each and every possible shader variant at shader import time! It’s not terribad if there’s 32 of them, but some people wanted to have ten or more of these “toggleable features”. 2^N gets to a really large number, really fast.

It also did not help that by then we no longer had just the OpenGL & Direct3D 9 graphics APIs; there were also Direct3D 11, OpenGL ES, PS3, Xbox 360, Flash Stage3D etc. We were compiling all variants of a shader for all these backends at import time! Even if you never ever needed the result of that :(

So @robertcupisz and I started rewriting the shader compilation pipeline. I have written about it before: Shader compilation in Unity 4.5. It basically changed several things:

  • Shader variants would be compiled on-demand (whenever needed by the editor itself, or for a game data build) and cached; a rough sketch of the idea follows below,
  • The shader compiler was split into a separate process per CPU core, to work around global mutexes in some shader compiler backends; these were preventing effective multithreading.
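
Here is roughly what the on-demand caching idea looks like, as a sketch with made-up types and names rather than the actual Unity pipeline code: the expensive compile happens only the first time a particular (shader, keyword set, graphics API) combination is actually requested.

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Sketch only, not the actual pipeline: a variant is identified by shader name,
// enabled keyword set and target graphics API, and is compiled lazily on first use.
struct VariantKey
{
    std::string shaderName;
    uint64_t    keywordMask;   // which shader keywords are enabled
    int         graphicsApi;   // which backend (D3D11, Metal, ...)

    bool operator==(const VariantKey& o) const
    {
        return shaderName == o.shaderName && keywordMask == o.keywordMask && graphicsApi == o.graphicsApi;
    }
};

struct VariantKeyHash
{
    size_t operator()(const VariantKey& k) const
    {
        return (size_t)(std::hash<std::string>()(k.shaderName) ^ (k.keywordMask * 31u) ^ (uint64_t)k.graphicsApi);
    }
};

std::string CompileVariant(const VariantKey& key)
{
    // Placeholder for the expensive part; in reality this would be handed off
    // to one of the per-core shader compiler processes.
    return "compiled: " + key.shaderName;
}

class VariantCache
{
public:
    const std::string& GetOrCompile(const VariantKey& key)
    {
        auto it = m_Cache.find(key);
        if (it == m_Cache.end())
            it = m_Cache.emplace(key, CompileVariant(key)).first;  // compile only on first request
        return it->second;
    }
private:
    std::unordered_map<VariantKey, std::string, VariantKeyHash> m_Cache;
};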

This was a big improvement compared to previous state. But are we done yet? Far from it.

Unity 5.0: need to make some variants optional

The new compilation pipeline meant that editing a shader with a thousand potential variants was no longer a coffee break. However, all 1000 variants were still always included into the build. For the Unity 5 launch we were developing the new built-in Standard shader with 20,000 or so possible variants, and always including all of them was a problem.

So we added a way to indicate that “you know, only include these variants if some materials use them” when authoring a shader. Variants that were never used by anything were:

  1. never even compiled and
  2. not included into the game build either.

That was done in December 2013, with plenty of time to ship in Unity 5.0.

During that time other people started “going crazy” with shader variant counts too – e.g. the Alloy shaders had about 2 million variants. So we needed some more optimizations, which I wrote about before; those managed to land just in time for the Unity 5 launch.

So we went from “five variants” to “two million possible variants” by now… Is that the limit? Are we there yet? Not quite!

Unity 5.4: people need more shader keywords

Sometime along the way the number of shader keywords (the “toggleable shader features that control which variant is picked”) that we support went from 32 up to 64, then up to 128. That was still not enough, as you can see from this long forum thread.

So I looked at increasing the keyword count to 256. Turns out, it was doable, especially after fiddling around with some hash functions. A side effect of investigating various hash functions: I replaced almost all hash functions used across the whole codebase \o/.
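
To give a rough idea of what that means data-wise, here is a sketch (not the actual Unity implementation): the set of enabled keywords becomes a 256-bit mask, and a simple hash over that mask is used as a key when looking up shader variants.

#include <cstdint>

// Sketch only, not Unity's actual code: up to 256 shader keywords stored as a
// 256-bit mask, plus a simple FNV-1a style hash over the mask so it can be used
// as a lookup key.
struct KeywordMask
{
    uint64_t bits[4] = {};                              // 4 x 64 = 256 possible keywords

    void Enable(int keywordIndex)                       // keywordIndex in [0, 255]
    {
        bits[keywordIndex >> 6] |= 1ull << (keywordIndex & 63);
    }
    bool IsEnabled(int keywordIndex) const
    {
        return (bits[keywordIndex >> 6] >> (keywordIndex & 63)) & 1;
    }
};

inline uint64_t HashKeywordMask(const KeywordMask& m)
{
    uint64_t h = 0xcbf29ce484222325ull;                 // FNV-1a offset basis
    for (uint64_t word : m.bits)
    {
        h ^= word;
        h *= 0x100000001b3ull;                          // FNV-1a prime
    }
    return h;
}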

Ok, this by itself neither improves scalability of the shader variant parts, nor makes it worse… Except that with more shader keywords, people started adding even more potential shader variants! Give someone a thing, and they will start using it in both expected and unexpected ways.

Are we there yet? Nope, there are still a few possible scalability cliffs in the near future.

Unity 5.5: we are up to 70 million variants now, or “don’t expand that data”

Our own team working on the new “HD Render Pipeline” had shaders with about 70 million possible variants by now.

Turns out, there was a step in the editor where we were still expanding some data for all possible variants into in-memory structures. At 70 million potential variants, that was taking gobs of memory, and a lot of time was spent searching through it.

Time to fix that! Stop expanding that data; instead, search directly in the fairly compact “unexpanded” form. That unblocked the team; import time after a minor shader edit went from “a minute” to “a couple of seconds”.

Yay! For a while.

Unity 5.6: half a billion variants anyone? or “don’t search that data”

Of course they went up to half a billion possible variants, and another problem surfaced: in some paths in the editor, when looking for “okay, so which shader variant should I use right now?”, the code was enumerating all possible variants and checking which one was “closest to what we want”. In Unity, shader keywords do not have to exactly match some variant present in the shader, for better or worse… The previous step made it so that the table of “all possible variants” is no longer expanded in memory. But purely enumerating half a billion variants and doing fairly simple checks for each was still taking a long time! “Half a billion” turns out to be a big number.

Now of course, doing a search like that is fairly stupid. If we know we are searching for a shader variant with the keyword “NORMALMAP_ON” in it, there’s very little use in enumerating all the ones that do not have it. Each keyword cuts the search space in half! So that optimization was done, and it nicely got some timings from “dozens of seconds” to “feels instant”. For the case when you have half a billion shader variants, that is :)
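
Here is a sketch of the idea, with a made-up data layout rather than the actual Unity code: treat the possible variant space as a Cartesian product of per-feature keyword groups, and pin every group for which a keyword was requested, so that group stops multiplying the search space.

#include <string>
#include <unordered_set>
#include <vector>

// Sketch only. The space of possible variants is the Cartesian product of
// per-feature keyword groups; instead of enumerating the full product and
// scoring every combination, restrict each group to the requested keyword
// whenever one of its keywords was asked for.
using KeywordGroup = std::vector<std::string>;

std::vector<KeywordGroup> PruneSearchSpace(
    const std::vector<KeywordGroup>& groups,
    const std::unordered_set<std::string>& wanted)
{
    std::vector<KeywordGroup> pruned;
    for (const KeywordGroup& group : groups)
    {
        KeywordGroup g;
        for (const std::string& kw : group)
            if (wanted.count(kw))
                g.push_back(kw);          // keep only what was asked for
        if (g.empty())
            g = group;                    // nothing requested from this group: keep all options
        pruned.push_back(g);
    }
    return pruned;                        // the product of group sizes is now vastly smaller
}

With, say, 30 two-option feature groups and 25 of them pinned by the requested keywords, the space to enumerate drops from about a billion combinations down to 2^5 = 32.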

We are done now, right?

Now: can I have a hundred billion variants? or “don’t search that other data”

Somehow the team ended up with a shader that has almost a hundred billion possible variants. How? Don’t ask; my guess is by adding everything and the kitchen sink to it. From a quick look, it is a “layered lit + tessellation” shader, and it seems to have:

  • Usual optional textures: normal map, specular map, emissive map, detail map, detail mask map.
  • One, two, three or four “layers” of the maps, mixed together.
  • Mixing based on vertex colors, or height, or something else.
  • Tessellation: displacement, Phong + displacement, parallax occlusion mapping.
  • Several double sided lighting modes.
  • Transparency and alpha cutout options.
  • Lightmapping options.
  • A few other oddball things.

The thing is, you “only” need about 36 individually toggleable features to get into the hundred-billion-variant range (2^36 ≈ 69 billion). 36 features is a lot of features, but imaginable.

The problem they ran into was that at game data build time the code was, similarly to the previous case, looping over all possible shader variants and deciding whether each one should be included into the data file or not. Looping over a hundred billion simple things is a long process! So they were like “oh, we wanted to do a build to check performance on a console, but gave up waiting”. Not good!

And of course it’s a stupid thing to do. The loop should be inverted: we already have the info about which materials are included into the game build, and from there we know which shader variants are needed. We just need to augment that set with the variants that “always have to be in the build”, and that’s it. That got the build time from “forever” down to ten seconds.
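
A sketch of the inverted loop, with made-up types rather than the actual build pipeline code: the cost becomes proportional to the number of materials in the build plus the small “always include” list, instead of the number of possible variants.

#include <cstdint>
#include <set>
#include <vector>

// Sketch only. Instead of iterating every *possible* variant and asking
// "is this one used?", walk the materials that are actually in the build,
// collect the variants they reference, then add the "always include" ones.
struct Material { uint64_t variantKeywords; };

std::set<uint64_t> CollectVariantsToBuild(
    const std::vector<Material>& materialsInBuild,
    const std::vector<uint64_t>& alwaysIncludedVariants)
{
    std::set<uint64_t> needed;
    for (const Material& m : materialsInBuild)
        needed.insert(m.variantKeywords);          // proportional to material count...
    for (uint64_t v : alwaysIncludedVariants)
        needed.insert(v);                          // ...plus a small fixed set
    return needed;                                  // never touches the 10^11 possible variants
}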

Are we done yet? I don’t know. We’ll see what the future will bring :)

The moral of the story is: code that used to do something with five things years ago might turn out to be problematic when it has to deal with a hundred. And then a thousand. And a million. And a hundred billion. Kinda obvious, isn’t it?


UI is hard, and other Typical Work Stories

Recently I’ve seen a mention that game engine programming is considered a mysterious, elite, and highly demanding type of work. So let me write up what often actually happens in day-to-day work. Also about how, for many types of tasks, doing the UI is often the hardest part :)

Request: separate texture UV wrapping modes

I saw “could we have separate wrap modes for texture U and V axes?” being requested on the internets. This is not new; we had actually discussed this same thing internally just a few weeks before.

Up until now, in Unity you could only specify one texture coordinate wrapping mode that would apply to both (or in volume texture case, all three) axes.

All the graphics APIs and GPUs I’ve seen do support separate UV(W) wrap modes, and while it’s not a common use case, there are valid cases where it is useful to have. For example, when using Lat-Long environment maps for reflection probes, it is useful to have Clamp on the vertical texture coordinate, but Repeat on the horizontal one (why use lat-long environment maps? because some platforms don’t support cubemap arrays, and yeah, I’m looking at you, mobile platforms).

So I thought I’d do it as part of the Mondays are for Mandatory Fun™ thing we have:

How hard could this possibly be?

The change itself is trivial. Instead of having one wrap mode in a sampler descriptor, we need to have three, and set them up in the graphics API accordingly. The actual platform specific change looks something like this (Metal here, but very similar for any other API). Somewhere where the sampler state is created or set up:
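
Roughly, as a sketch with made-up names (not the literal Metal code, nor Unity’s actual abstraction): one wrap mode in the sampler descriptor becomes three, and each backend translates them into its own enum; in Metal that means setting the s/t/r address modes separately on the sampler descriptor.

// Sketch with made-up names; not the actual engine abstraction.
enum class TextureWrapMode { Repeat, Clamp };

struct SamplerDesc
{
    TextureWrapMode wrapU = TextureWrapMode::Repeat;
    TextureWrapMode wrapV = TextureWrapMode::Repeat;
    TextureWrapMode wrapW = TextureWrapMode::Repeat;   // only meaningful for 3D textures
    // ...filter mode, anisotropy, etc.
};

// Hypothetical per-platform translation; one of these exists in each backend.
int TranslateWrapMode(TextureWrapMode mode)
{
    return mode == TextureWrapMode::Repeat ? /*platform repeat*/ 0 : /*platform clamp*/ 1;
}

void CreateSampler(const SamplerDesc& desc)
{
    int u = TranslateWrapMode(desc.wrapU);
    int v = TranslateWrapMode(desc.wrapV);
    int w = TranslateWrapMode(desc.wrapW);
    // ...hand u, v, w to the platform's sampler creation call,
    // instead of a single wrap mode applied to all axes.
    (void)u; (void)v; (void)w;
}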

“Oh, but Unity supports, I don’t know, three million graphics APIs? That’s gonna be hard to do?” – turns out, not really. At the time of writing, I had to add code to 11 “platform API abstraction” implementations. Eleven is more than one or two, but doing all that was trivial enough. Even without having compilers/SDKs for at least half of them :)

The real amount of work starts to appear once you try to write down “all the things” that need to be done. The “graphics API” bit is just one entry there!

Before doing a task like this, I look at whether that particular area is in need of some small cleanup or refactor. In this case it was: we were passing all sampler state as separate arguments into platform abstraction functions, instead of something like a “sampler descriptor struct”. It was already cumbersome, and adding separate wrapping modes would not make it better. So the first item on the list becomes “refactor that”.

And then while doing the actual changes, I’d keep on noticing “well, this should be cleaned up” type of code too, and write that down at the end of the list. None of that is critical for the task at hand, but codebase cleanup does not happen by itself otherwise.

Most of the items on the list are easy enough though. Except… yeah, UI.

User Interface

Ok, so how do you show UI in texture settings for separate wrapping modes?

I looked at what others do; for example UE4 just shows two dropdowns. This is trivial to implement, but did not feel “quite right”. After all, the expected 95% use case is that you’d want the same wrapping mode on all axes. Doing this would get you the feature/flexibility (yay!), but a fairly rarely used one that costs an extra row of settings controls, whether you need it or not.

It should be possible to do better.

Try 1: extend wrap mode popup with more options

Today we only support Repeat and Clamp wrapping modes, and the absolute majority of textures are non-volume textures. Which means extending to separate UV wrapping modes only requires adding two more entries to a single popup:

That is not too bad. For volume textures, there are three axes to worry about, so the popup becomes a choice of 8 possible options. This is more confusing, but maybe we can sweep it under a “hey this is a super rare case” rug.

A slightly bigger worry is that people are also asking for other coordinate wrapping modes that we have not exposed before (“border” and “mirror”). If/when we add them, the single popup would not be a good solution. The number of entries in it would become too large to be useful.

Try 2: per-axis popups, linked by default

You know that “these fields are linked” widget from image editors?

I thought maybe let’s do that: show one popup per axis, but by default have them linked together. Here’s how it looks (using a “lock” icon to mean “linked”, because no one has painted a proper icon yet):

And then it can be unlinked to select different wrapping modes. For volume textures, it would display three popups, but otherwise function the same.

This almost works fine. The downsides are:

  • Still additional visual noise in the settings even if you don’t use the feature, and
  • In the image editors, “linked” is mostly used for numeric input fields; linking dropdown controls together is not a familiar UI pattern.

Try 3: one popup with “Per-Axis” choice

Here’s another idea: keep one popup by default, but instead of it having just [Repeat, Clamp] options, make it [Repeat, Clamp, Per-axis]. When Per-axis is selected, that rolls out two more popups underneath (or three, for volume textures):

This one actually feels nice. Yay! And only took three attempts to get right.

Oh, and then some more fumbling around to nicely handle the various cases of multiple textures being selected, all of them possibly having different settings.

Doing all that UI related work took me about twice as long as doing everything else combined (and that includes “changes to eleven graphics APIs”). Now of course, I’m not a UI programmer, but still. UI is hard.

That’s it!

So yeah. A super small feature that ended up being probably two full days of work. The majority of that: trying to decide how exactly to present two popup controls. Who would have thunk, eh.

Otherwise, pretty much trivial steps to get there. However this does end up with about a hundred files being changed.

…and that is what “mysterious engine programming” looks like :) Now of course there is plenty of really challenging, “cutting edge knowledge required” type of work, where juggling chainsaws would probably look easy in comparison. But there is plenty of “nothing special, just work” type of items too.

Separate texture UV wrapping modes might be coming to a nearby Unity version soon-ish. Thanks to Alexey, Lukas, Shawn, Vlad for UI discussions & suggestions.


Chrome Tracing as Profiler Frontend

Did you know that Google Chrome has a built-in profiler? Some people assume it’s only for profiling “web stuff” like JavaScript execution. But it can also be used as a really nice frontend for your own profiling data.

Colt McAnlis actually wrote about using it for game profiling, in 2012: Using Chrome://tracing to view your inline profiling data. Everything written there still stands, and I’m mostly going to repeat the same things. Just to make this Chrome Tracing thing more widely known :)

Backstory

For a part of our code build system, we have a super simple profiler (called TinyProfiler) that captures scoped sections of the code and measures time spent in them. It was producing a simple HTML+SVG “report file” with a flame graph style visualization:

This gets the job done, but the resulting HTML is not very interactive. You can hover over things and have their time show up in the text box. But no real zooming, panning, search filtering or other niceties that one could expect from a decent profiler frontend UI.

All these things could be implemented… but also, why do that when someone else (Chrome) already wrote a really nice profiler UI?

Using Chrome Tracing

All that is needed to use the Chrome Tracing view is:

  1. Produce a JSON file with the format expected by Chrome,
  2. Go to chrome://tracing in Chrome,
  3. Click “Load” and open your file, or alternatively drag the file into Chrome,
  4. Profit!

The final result looks pretty similar to our custom HTML+SVG thing, as Chrome is also visualizing the profiling data in the flame graph style:

Advantages of doing this:

  1. Much better profiling UI: zooming, panning, filtering, statistics, and so on.
  2. We no longer need to write profiler UI frontend (no matter how simple) ourselves.
  3. 500 fewer lines of code than what we had before (writing out the JSON file is much simpler than the SVG file).

And some potential disadvantages:

  1. No easy way to “double click report file to open it” as far as I can see. Chrome does not have any command line or automated interface to open itself, go to tracing view, and load a file all in one go. So you have to manually do that, which is more clicks.
    • This could be improved by producing HTML file with everything needed, by using trace2html tool from the Chrome Tracing repository. However, that whole repository is ~1GB in size (it has way more stuff in it than just tracing UI), and I’m not too keen on adding a gigabyte of external dependency just for this. Maybe it would be possible to produce a “slimmed down, just the trace2html bits” version of that repository.
  2. Added a dependency on a 3rd party profiling frontend.

My take is that the advantages outweigh the disadvantages.

JSON file format

Trace Event Format is really nicely documented, so see there for details on more advanced usage. Basic structure of the JSON file is as follows:

{
  "traceEvents": [
    { "pid":1, "tid":1, "ts":87705, "dur":956189, "ph":"X", "name":"Jambase", "args":{ "ms":956.2 } },
    { "pid":1, "tid":1, "ts":128154, "dur":75867, "ph":"X", "name":"SyncTargets", "args":{ "ms":75.9 } },
    { "pid":1, "tid":1, "ts":546867, "dur":121564, "ph":"X", "name":"DoThings", "args":{ "ms":121.6 } }
  ],
  "meta_user": "aras",
  "meta_cpu_count": "8"
}

traceEvents are the events that show up in the UI. Anything that is not recognized by the event format is treated as “metadata” that is shown by the UI metadata dialog (in this case, meta_user and meta_cpu_count are metadata). The JSON above looks like this in the Chrome Tracing UI:

Events described in it are the simplest ones, called “Complete Events” (indicated by ph:X). They need a timestamp and duration given in microseconds (ts and dur). The other fields are process ID and thread ID (pid and tid), and a name to display. There’s no need to indicate parent-child relationships between the events; the UI automatically figures that out based on event timings.

Events can also have custom data attached to them (args), which is displayed in the lower pane when an event is selected. One gotcha is that there has to be some custom data in order for the event to be selectable at all. So at least put some dummy data in there.
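
To show how little code the producing side needs, here is a minimal sketch of a scoped timer that records complete events and writes them out in this format; it is not Unity’s actual TinyProfiler, just an illustration, and it is single-threaded for brevity.

#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Sketch only: scoped timers recording "complete events" (ph:"X"),
// dumped as chrome://tracing compatible JSON.
struct TraceEvent { std::string name; long long ts, dur; };
static std::vector<TraceEvent> g_Events;

struct ScopedTimer
{
    const char* name;
    std::chrono::steady_clock::time_point start;
    explicit ScopedTimer(const char* n) : name(n), start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer()
    {
        using namespace std::chrono;
        auto now = steady_clock::now();
        long long ts  = duration_cast<microseconds>(start.time_since_epoch()).count();
        long long dur = duration_cast<microseconds>(now - start).count();
        g_Events.push_back({ name, ts, dur });
    }
};

void WriteTraceJson(const char* path)
{
    FILE* f = std::fopen(path, "w");
    if (!f)
        return;
    std::fprintf(f, "{ \"traceEvents\": [\n");
    for (size_t i = 0; i < g_Events.size(); ++i)
    {
        const TraceEvent& e = g_Events[i];
        std::fprintf(f,
            "  { \"pid\":1, \"tid\":1, \"ts\":%lld, \"dur\":%lld, \"ph\":\"X\", \"name\":\"%s\", \"args\":{ \"dummy\":0 } }%s\n",
            e.ts, e.dur, e.name.c_str(), i + 1 < g_Events.size() ? "," : "");
    }
    std::fprintf(f, "] }\n");
    std::fclose(f);
}

Wrapping a section of code then becomes declaring a ScopedTimer at the top of the scope, and calling WriteTraceJson once at the end of the run.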

And basically that’s it for a super simple usage. Check out Trace Event Format for more advanced event types and usage. Happy profiling!


A look at 2016, and onto 2017

I don’t have any technical insights to share right now, so you’ll have to bear with me blabbering about random stuff!

Unity Work

By now it’s been 11 years at Unity (see 10 Years at Unity a year ago), and the last year has been interesting. It started out as written in that blog post – me mostly doing plumbing & fixing and stuff.

But then at some point we had a mini-hackweek of “what should the future rendering pipeline of Unity be?”, and it went quite well. So a team with the tasks of “build building blocks for future rendering pipelines” and “modernize low level graphics code” was formed; see more in the Scriptable Render Loops google doc and the corresponding github project. I’ve been doing mostly, well, plumbing there, so nothing new on that front :) – but overall this is very exciting, and if done well it could be a big improvement in how Unity’s graphics engine works & what people can do with it.

I’ve also been a “lead” of this low level graphics team (“graphics foundation team” as we call it), and (again) realized that I’m the worst lead the world has ever seen. To my fellow colleagues: sorry about that (シ. .)シ So I stopped being a lead, and that is now in the capable hands of @stramit.

Each year I write a “summary of stuff I’ve done”, mostly for myself, to see whether my expectations/plans matched reality. This year a big part of the mismatch was that at the start of the year I did not know we’d go big into “let’s do scriptable render loops!”; otherwise it went as expected.

Day to day it feels a bit like “yeah, another day of plumbing”, but the summary does seem to indicate that I managed to get a decent amount of feature/improvement work done too. Nothing that would set the world on fire, but not bad either! Though thinking about it, after 2016, maybe the world could do with less things that set it on fire…

My current plan for next year is to continue working on scriptable render loops and whatever low level things they end up needing. But also perhaps try some new area for a bit, something where I’m a newbie and would have to learn some new things. We’ll see!

Oh, and extrapolating from the trend, I should effectively remove close to half a million lines of code this year. My fellow colleagues: brace yourselves :)

Open Source

My github contributions are not terribly impressive, but I managed to do better than in 2015.

  • Primarily SMOL-V, a SPIR-V shader compressor for Vulkan. It’s used in Unity now, and as far as I know, also used in two non-Unity game productions. If you use it (or tried it, but it did not work or whatever), I’d be happy to know!
  • Small amount of things in GLSL Optimizer and HLSL2GLSL, but with Unity itself mostly moving to a different shader compiler toolchain (HLSLcc at the moment), work there has been winding down. I did not have time or will to go through the PRs & issues there, sorry about that.
  • A bunch of random tiny fixes or tweaks submitted to other github projects, nothing of note really.

Writing

I keep on hearing that I should write more, but most of the time don’t feel like I have anything interesting to say. So I keep on posting random lame jokes and stuff on twitter instead.

Nevertheless, the amount of blog posts has been very slowly rising. The drop starting around year 2009 very much coincides with me getting onto twitter. Will try to write more in 2017, but we’ll see how that goes. If you have ideas on what I should write about, let me know!

The number of readers on the website has been basically constant for all the time I’ve had analytics on it (starting 2013 or so). Which could mean it’s only web crawling bots that read it, for all I know :) There’s an occasional spike, pretty much only because someone posted a link on either reddit or hackernews, and the hive mind there decided to pay attention to it for five minutes or whatever.

Giving Back

Financially, I live a very comfortable life now, and that is by no means just an “I made it all by myself” thing. Starting conditions, circumstances, family support, a ton of luck all helped massively.

I try to pay some of that back by sharing knowledge (writing the blog, speaking, answering questions on twitter and ask.fm). This is something, but I could do more!

So this year I started doing more of a “throw money towards good causes” thing too, compared to what I did before. Helped some local schools, and several local charities / help funds / patreons / “this is a good project” type of efforts. I’m super lucky that I can afford to do that, and I'm looking forward to doing more of it in 2017.

Started work towards installing solar panels on the roof, which should generate about as much electricity as we use up. That’s not exactly under “giving back” section, but feels good to have finally decided to do it. Will see how that goes in our (not that sunny) land.

Other Things

I’m not a twenty-something anymore, and turns out I have to do some sort of extra physical activity besides working with a computer all day long. But OMG it’s so boring, so it takes a massive effort to force myself to do it. I actually stopped going to the gym for half a year, but then my back pains said “ohai! long time no see!”. Ugggh. So towards end of 2016 started doing the gym thing again. I can only force myself to go there twice a week, but that’s enough to keep me from collapsing physically :) So far so good, but still boring.

Rocksmith keeps on being the only “hobby” that I have, and still having loads of fun with it. I probably have played about 200 hours this year.

It feels like I am still improving my guitar playing skills, but I’m not doing enough effort to actually improve. Most of the time I just play through songs in a messy way, without taking time to go through difficult riffs or solos and really nail the things down. So my playing is like: eh, ¯\_(ツ)_/¯. Maybe in 2017 I should try playing with some other people (something I have never done, besides a few very lame attempts in high school), or try taking “real” guitar lessons.

INSIDE and Firewatch are my games of the year. They are both made with Unity too, which does not feel bad at all.

Did family vacations, lazy one in Maldives and a road trip in Spain. Both were nice in their own way, but we all (again) realized that lazy-type vacations are not our style, even if they are nice to have once in a while.

In 2017 I want to spend less time on social media, which lately is mostly a source of really depressing news, and just do more useful or helpful things instead. Wish me luck.


Amazing Optimizers, or Compile Time Tests

I wrote some tests to verify sorting/batching behavior in rendering code, and they were producing different results on Windows (MSVC) vs Mac (clang). The tests were creating a “random fake scene” with a random number generator, and at first it sounded like our “get random normalized float” function was returning slightly different results between platforms (which would be super weird, as in how come no one noticed this before?!).

So I started digging into the random number generator, and the unit tests it has. This is what amazed me.

Here’s one of the unit tests (we use a custom native test framework that started years ago on an old version of UnitTest++):

TEST (Random01_WithSeed_RestoredStateGenerateSameNumbers)
{
	Rand r(1234);
	Random01(r);
	RandState oldState = r.GetState();
	float prev = Random01(r);
	r.SetState(oldState);
	float curr = Random01(r);
	CHECK_EQUAL (curr, prev);
}

Makes sense, right?

Here’s what MSVC 2010 compiles this down into:

push        rbx  
sub         rsp,50h  
mov         qword ptr [rsp+20h],0FFFFFFFFFFFFFFFEh  
	Rand r(1234);
	Random01(r);
	RandState oldState = r.GetState();
	float prev = Random01(r);
movss       xmm0,dword ptr [__real@3f47ac02 (01436078A8h)]  
movss       dword ptr [prev],xmm0  
	r.SetState(oldState);
	float curr = Random01(r);
mov         eax,0BC5448DBh  
shl         eax,0Bh  
xor         eax,0BC5448DBh  
mov         ecx,0CE6F4D86h  
shr         ecx,0Bh  
xor         ecx,eax  
shr         ecx,8  
xor         eax,ecx  
xor         eax,0CE6F4D86h  
and         eax,7FFFFFh  
pxor        xmm0,xmm0  
cvtsi2ss    xmm0,rax  
mulss       xmm0,dword ptr [__real@34000001 (01434CA89Ch)]  
movss       dword ptr [curr],xmm0  
	CHECK_EQUAL (curr, prev);
call        UnitTest::CurrentTest::Details (01420722A0h)
; ...

There are some bit operations going on (the RNG is Xorshift 128); looks fine at first glance.

But wait a minute; this seems like it only has code to generate a random number once, whereas the test is supposed to call Random01 three times?!

Turns out the compiler is smart enough to see through some of the calls, folds down all these computations and goes, “yep, the 2nd call to Random01 will produce 0.779968381 (0x3f47ac02)”. And then it kinda partially does the actual computation of the 3rd Random01 call, and eventually checks that the result is the same.

Oh-kay!

Now, what does clang (whatever version Xcode 8.1 on Mac has) do on this same test?

pushq  %rbp
movq   %rsp, %rbp
pushq  %rbx
subq   $0x58, %rsp
movl   $0x3f47ac02, -0xc(%rbp)   ; imm = 0x3F47AC02 
movl   $0x3f47ac02, -0x10(%rbp)  ; imm = 0x3F47AC02 
callq  0x1002b2950               ; UnitTest::CurrentTest::Results at CurrentTest.cpp:7
; ...

Whoa. There’s no code left at all! Everything just became an “is 0x3F47AC02 == 0x3F47AC02” test. It became a compile-time test.

w(☆o◎)w

By the way, the original problem I was looking into? Turns out RNG is fine (phew!). What got me was code in my own test that I should have known better about; it was roughly like this:

transform.SetPosition(Vector3f(Random01(), Random01(), Random01()));

See what’s wrong?

.

.

.

The function argument evaluation order in C/C++ is unspecified.

(╯°□°)╯︵ ┻━┻

Newer languages like C# or Java have guarantees that arguments are evaluated from left to right. Yay sanity.
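
For the record, a sketch of the fix, reusing the names from the snippet above: evaluate the calls into locals first, so the order is explicit regardless of compiler.

// Evaluation order is now explicit and identical on every compiler, unlike when
// passing Random01() calls directly as function arguments.
float x = Random01();
float y = Random01();
float z = Random01();
transform.SetPosition(Vector3f(x, y, z));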