Fast Mobile Shaders or, I did a talk at SIGGRAPH!

Finally after many years of dreaming I made it to SIGGRAPH! And not only that, I also did a talk/course with ReJ for 1.5 hours. This was the first time Unity had real presence at SIGGRAPH and I hope we’ll be more active & visible next time around.

Here it is, 100+ slides with notes: Fast Mobile Shaders (17MB pdf). This isn’t strictly about shaders; there’s info about mobile GPU architectures, general performance, hidden surface removal and so on. Also, graphs with logarithmic scales; can’t go wrong with that!


Testing Graphics Code, 4 years later

Almost four years ago I wrote about how we test rendering code at Unity. Did it stand the test of time and, more importantly, the growth of the company from fewer than 10 people to more than 100?

I’m happy to say it did! That’s it, move on to read the rest of the internets.

The earlier post was more focused on the hardware compatibility area (differences between platforms, GPUs, driver versions, driver bugs and their workarounds etc.). In addition to that, we do regression tests on a bunch of actual Unity-made games. All that is good and works; instead, let’s talk about the tests the rendering team at Unity uses in its daily work.

Graphics Feature & Regression Testing

In the daily life of a graphics programmer, you care about two things related to testing:

1. Whether a new feature you are adding, more or less, works.
2. Whether something new you added or something you refactored broke or changed any existing features.

Now, “works” is a vague term. Definitions can range from equally vague

Works For Me!

to something like

It has been battle tested on thousands of use cases, hundreds of shipped games, dozens of platforms, thousands of platform configurations and within each and every one of them there’s not a single wrong pixel, not a single wasted memory byte and not a single wasted nanosecond! No kittehs were harmed either!

In an ideal world we’d only consider the latter as “works”; however, that’s quite hard to achieve.

So instead we settle for small “functional tests”, where each feature has a small scene setup that exercises said feature (very much like what was talked about in the previous post). It’s the graphics programmer’s responsibility to add tests like that for their own features.

For example, Fog handling might be tested by a couple scenes like this:

Another example, tests for various corner cases of Deferred Lighting:

So that’s basic testing for “it works” that the graphics programmers themselves do. Beyond that, features are tested by QA and a large beta testing group, tried, profiled and optimized on real actual game projects and so on.

The good thing is, doing these basic tests also provides you with point 2 (did I break or change something?) automatically. If after your changes, all the graphics tests still pass, there’s a pretty good chance you did not break anything. Of course this testing is not exhaustive, but any time a regression is spotted by QA, beta testers or reported by users, you can add a new graphics test to check for that situation.

How do we actually do it?

We use TeamCity for the build/test farm. It has several build machines set up as graphics test agents (unlike most other build machines, they need an actual GPU, or an iOS device connected to them, or a console devkit etc.) that run graphics test configurations for all branches automatically. Each branch has its graphics tests run daily, and branches with “high graphics code activity” (i.e. branches that the rendering team is actually working on) have them run more often. You can always initiate the tests manually by clicking a button, of course. What you want to see at any time is this:

The basic approach is the same as 4 years ago: a “game level” (“scene” in Unity speak) for each test; it runs for a defined number of frames at a fixed timestep, and a screenshot is taken at the end of each frame. Each screenshot is compared with the “known good” image for that platform; any difference equals “FAIL”. On many platforms you have to allow a couple of wrong pixels, because many consumer GPUs are, it seems, not fully deterministic.
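
For illustration, the comparison step could be sketched roughly like this (plain C++; the Image struct, per-channel tolerance and allowed bad-pixel count are made-up placeholders for this sketch, not the actual types or values we use):

#include <cstdint>
#include <cstdlib>
#include <vector>

// A screenshot as a raw RGBA8 buffer; purely illustrative layout.
struct Image {
    int width;
    int height;
    std::vector<uint8_t> rgba; // width * height * 4 bytes
};

// Compare a rendered screenshot against the "known good" template image.
// Because some consumer GPUs are not fully deterministic, a handful of
// slightly-off pixels is tolerated before declaring a FAIL.
bool ScreenshotsMatch (const Image& result, const Image& expected,
                       int perChannelTolerance = 2, int maxBadPixels = 4)
{
    if (result.width != expected.width || result.height != expected.height)
        return false;
    int badPixels = 0;
    const size_t pixelCount = size_t(result.width) * result.height;
    for (size_t i = 0; i < pixelCount; ++i)
    {
        bool bad = false;
        for (int c = 0; c < 4; ++c)
        {
            int diff = std::abs(int(result.rgba[i*4+c]) - int(expected.rgba[i*4+c]));
            if (diff > perChannelTolerance)
                bad = true;
        }
        if (bad && ++badPixels > maxBadPixels)
            return false; // too many differing pixels: FAIL
    }
    return true;
}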

So you have this bunch of “this is the golden truth” images for all the tests:

And each platform automatically tested on TeamCity has its own set:

Since the “test controller” can run on a different device than the actual tests (the case for iOS, Xbox 360 etc.), the test executable opens a socket connection to transfer the screenshots. The test controller is a relatively simple C# application that listens on a socket, fetches the screenshots and compares them with the template ones. The result is output that TeamCity can understand, along with “build artifacts” for the failed tests (for each failed test: the expected image, the failed image, and a difference image with increased contrast).
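
The “difference image with increased contrast” is conceptually just the absolute per-channel difference, scaled up so that tiny mismatches become clearly visible; something along these lines (the 16x boost factor is an arbitrary illustrative choice):

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Build a contrast-boosted difference image for the failed-test artifacts,
// given two same-sized RGBA8 buffers.
std::vector<uint8_t> DiffImageBoosted (const std::vector<uint8_t>& a,
                                       const std::vector<uint8_t>& b,
                                       int boost = 16)
{
    std::vector<uint8_t> diff(a.size());
    for (size_t i = 0; i < a.size(); ++i)
    {
        int d = std::abs(int(a[i]) - int(b[i])) * boost;
        diff[i] = uint8_t(std::min(d, 255));
    }
    return diff;
}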

That’s pretty much it! And of course, automated tests are nice and all, but they should not get too much in the way of the actual programming manifesto.


Notes on Native Client & Pepper Plugin API

Google’s Native Client (NaCl) is a brilliant idea. TL;DR: it allows native code to be run securely in the browser.

But is it secure?

“Bububut, waitaminnit! Native code is not secure by definition,” you say. Turns out, that isn’t necessarily true. With a specially massaged compiler, some runtime support and careful native code validation, it is possible to ensure that native code, when run in the browser, can’t cause harm to the user’s machine. I suggest taking a look at the original NaCl for x86 paper and, more recently, at how similar techniques would apply to ARM CPUs.

But what can you do with it?

So that’s great. It means it is possible to take C/C++ code, compile it with the NaCl SDK (a gcc-derived toolchain) and have it run in the browser. We can make a loop in C that multiplies a ton of floating point numbers, and it will run at native speed. That’s wonderful, except you can’t really do much interesting stuff with your own C code in isolation…

You need access to the hardware and/or OS. As game developers, we need pixels to appear on the screen. Preferably lots of them, with the help of something like a GPU. Audio waves to come out of the speakers. Mouse moves and keyboard presses to translate to some fancy actions. Post a high score to the internets. And so on.

NaCl surely can’t just allow my C code to call Direct3DCreate9 and run with it, while keeping the promise of “it’s secure”? Or a more extreme case, FILE* f = fopen("/etc/passwd", "rt");?!

And that’s true; NaCl does not allow you to use completely arbitrary APIs. It has its own set of APIs to interface with “the system”.

Ok, how do I interface with the system?

…and that’s where the current state of NaCl gets a bit confusing.

Initially Google developed an improved “browser plugin model” and called it Pepper. This Pepper thing would then take care of actually putting your code into the browser: starting it up, tearing it down, controlling repaints, processing events and so on. But then apparently they realized that building on top of a decade-old Netscape plugin API (NPAPI) isn’t really going to work, so they developed Pepper2, or PPAPI (Pepper Plugin API), which ditches NPAPI completely. To write a native client plugin, you only interface with PPAPI.

So some of the pages on the internets reference the “old API” (which is gone as far as I can see), and some others reference the new one. It does not help that Native Client’s own documentation is scattered around the Chromium, NaCl, NaCl SDK and PPAPI sites. Seriously, it’s a mess, with seemingly no high-level, up-to-date “introduction” page that tells what exactly PPAPI can and can’t do. Edit: I’m told that the definitive entry point to NaCl right now is this page: http://code.google.com/chrome/nativeclient/ which clears up some of the mess.

Here’s what I think it can do

Note: At work we have an in-progress Unity NaCl port using this PPAPI. However, I am not working on it, so my knowledge may or may not be true. Take everything with a grain of NaCl ;)

Most of the things below were found by poking around the PPAPI source tree and by looking at Unity’s NaCl platform dependent bits.

Graphics

PPAPI provides an OpenGL ES 2.0 implementation for your 3D needs. You need to set up the context and initial surfaces via PPAPI (ppapi/cpp/dev/context_3d_dev.h, ppapi/cpp/dev/surface_3d_dev.h) - similar to what you’d use EGL for on other platforms - and beyond that you just include GLES2/gl2.h, GLES2/gl2ext.h and call ye olde GLES2.0 functions.
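
I won’t try to reproduce the exact context/surface creation calls from memory (the dev interfaces were still changing at the time of writing), but once a context is bound, the per-frame code is just regular GLES2.0. A rough sketch:

#include <GLES2/gl2.h>

// Assumes a 3D context and surface have already been created and bound via
// ppapi/cpp/dev/context_3d_dev.h and ppapi/cpp/dev/surface_3d_dev.h (the
// EGL-like part); the exact class and method names for that step depend on
// the SDK revision, so they are deliberately left out here.
void RenderFrame (int width, int height)
{
    glViewport (0, 0, width, height);
    glClearColor (0.1f, 0.2f, 0.3f, 1.0f);
    glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // ... bind shaders & buffers, issue glDrawArrays/glDrawElements calls ...
    // then ask PPAPI to swap the surface and schedule the next frame.
}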

Behind the scenes, all your GLES2.0 calls are put into a command buffer and transferred to the actual “3D server” process that consumes them. Chrome splits itself into various processes like that for security reasons – so that each process has the minimum set of privileges, and a crash or a security exploit in one of them can’t easily transfer over to other parts of the browser.

Audio

For audio needs, PPAPI provides a simple buffer-based API in ppapi/cpp/audio_config.h and ppapi/cpp/audio.h. Your own callback will be called whenever the audio buffer needs to be filled with new samples. That means you do all the sound mixing yourself and just fill in the final buffer.
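
The callback boils down to “fill this buffer with interleaved samples”. A rough sketch of such a callback, assuming 44.1kHz 16-bit stereo output and skipping the pp::AudioConfig/pp::Audio setup (whose exact constructor arguments I won’t guess at here):

#include <cmath>
#include <cstdint>

// PPAPI hands you a raw sample buffer to fill. A real implementation would
// mix all currently playing sounds into it; this one just writes a 440 Hz
// sine wave so something audible comes out.
void FillAudioBuffer (void* sampleBuffer, uint32_t bufferSizeInBytes, void* userData)
{
    int16_t* samples = static_cast<int16_t*>(sampleBuffer);
    uint32_t frameCount = bufferSizeInBytes / (2 * sizeof(int16_t)); // stereo
    static double phase = 0.0;
    const double phaseStep = 2.0 * 3.14159265358979 * 440.0 / 44100.0;
    for (uint32_t i = 0; i < frameCount; ++i)
    {
        int16_t value = int16_t(std::sin(phase) * 32767.0 * 0.2);
        samples[i*2+0] = value; // left
        samples[i*2+1] = value; // right
        phase += phaseStep;
    }
}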

Input

Your plugin instance (subclass of pp::Instance) will get input events via a HandleInputEvent virtual function override. Each event is a simple PP_InputEvent struct and can represent keyboard & mouse events. No support for gamepads or touch input so far, it seems.
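
Handling them could look roughly like this; note that the exact HandleInputEvent signature and the PP_INPUTEVENT_TYPE_* names below are from memory and may differ between SDK revisions:

#include "ppapi/c/pp_input_event.h"

// Member of your pp::Instance subclass (a virtual override); return true
// if the event was consumed. The field layouts for mouse/keyboard data
// live in ppapi/c/pp_input_event.h.
bool HandleInputEvent (const PP_InputEvent& event)
{
    switch (event.type)
    {
    case PP_INPUTEVENT_TYPE_MOUSEDOWN:
    case PP_INPUTEVENT_TYPE_MOUSEUP:
    case PP_INPUTEVENT_TYPE_MOUSEMOVE:
        // read the mouse position/button from the event and feed it into
        // your own input system
        return true;
    case PP_INPUTEVENT_TYPE_KEYDOWN:
    case PP_INPUTEVENT_TYPE_KEYUP:
        // same for key codes
        return true;
    default:
        return false; // let the browser handle everything else
    }
}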

Other stuff

Doing WWW requests is possible via ppapi/cpp/url_loader.h and friends.

Timer & time queries via ppapi/cpp/core.h (e.g. pp::Module::Get()->core()->CallOnMainThread(...)).
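
For example, a crude main-loop tick scheduled via CallOnMainThread might look like this (the signature is how I remember it, so double-check against the headers):

#include "ppapi/cpp/completion_callback.h"
#include "ppapi/cpp/core.h"
#include "ppapi/cpp/module.h"

// Called back on the main plugin thread.
static void OnTick (void* userData, int32_t result)
{
    // ... advance game time, render a frame, etc. ...
    // Re-schedule ourselves for a rough ~60Hz tick; this is one workable
    // way to drive a main loop, not necessarily the best one.
    pp::Module::Get()->core()->CallOnMainThread (
        16, pp::CompletionCallback (&OnTick, userData));
}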

And, well, a bunch of other stuff is there, like ability to rasterize blocks of text into bitmaps, pop up file selection dialogs, use the browser to decode video streams and so on. Everything - or almost everything - is there to make it possible to do games on it.

Summary

Like Chad says, it would be good to end “thou shalt only use Javascript” on the web. Javascript is a very nice language - especially considering how it came into existence - but forcing it on everyone is quite silly. And no matter how hard the V8/JägerMonkey/Nitro folks are trying, it is very, very hard to beat the performance of a simple, static, compiled language (like C) that has direct access to memory, and where the programmer is in almost full control of both the code flow and the memory layout. Steve rightly points out that even if, for some tasks, a super-optimized Javascript engine approaches the speed of C, it will burn much more energy to do so – a very important aspect in the increasingly mobile world.

Native Client does give some hope that there will be a way to run native code, at native speeds, in the browser, without compromising on security. Let it happen.


A way to visualize mip levels

Recently a discussion on Twitter about folks using 2048 textures on a pair of dice spawned this post. How do artists know whether their textures are too high or too low resolution? Here’s what we do in Unity, which may or may not work elsewhere.

When you have a game scene that, for example, looks like this:

We provide a “mipmaps” visualization mode that renders it like this:

Original texture colors mean it’s a perfect match (1:1 texels to pixels ratio); more red = too much texture detail; more blue = too little texture detail.

That’s it, end of story, move along!

Now of course it’s not that simple. You can’t just go and resize all the textures that were used on the red stuff. The player might walk over to those red objects, and then they would need more detail!

Also, the amount of texture detail needed very much depends on the screen resolution the game will be running at:

Still, even with varying resolutions and the fact that the same objects in 3D can be both near and far from the viewer, this view can answer the question “does something have too high or too low texture detail?”, mostly by looking at the colorization mismatch between nearby objects.

In the picture above, the railings have too little texture detail (blue), while the lamp posts have too much (red). The little extruded things on the floating pads have too much detail as well.

The image below reveals that the floor and ceiling have mismatching texture densities: the floor has too little, while the ceiling has too much. It should probably be the other way around; in a platformer you’d more often be looking at the floor.

How to do this?

In the mipmap view shader, we display the original texture mixed with a special “colored mip levels” texture. The regular texture is sampled with the original UVs, while the color-coded texture is sampled with denser ones, to allow visualization of “too little texture detail”. In shader code (HLSL, shader model 2.0 compatible):

// Shader inputs: the MVP matrix, the regular texture, the special
// mip-colors texture and the main texture's size in pixels.
float4x4 matrix_mvp;
sampler2D mainTexture;
sampler2D mipColorsTexture;
float2 mainTextureSize;

struct v2f {
    float4 pos : SV_POSITION;
    float2 uv : TEXCOORD0;    // original UVs, for the regular texture
    float2 mipuv : TEXCOORD1; // denser UVs, for the mip-colors texture
};
v2f vert (float4 vertex : POSITION, float2 uv : TEXCOORD0)
{
    v2f o;
    o.pos = mul (matrix_mvp, vertex);
    o.uv = uv;
    o.mipuv = uv * mainTextureSize / 8.0;
    return o;
}
half4 frag (v2f i) : COLOR0
{
    half4 col = tex2D (mainTexture, i.uv);
    half4 mip = tex2D (mipColorsTexture, i.mipuv);
    // the mip color's alpha says how strongly to tint the original color
    half4 res;
    res.rgb = lerp (col.rgb, mip.rgb, mip.a);
    res.a = col.a;
    return res;
}

The mainTextureSize above is the pixel size of the main texture, for example (256,256). Division by eight might seem a bit weird, but it really isn’t!

To show the colored mip levels, we need to create mipColorsTexture that has different colors in each mip level.

Let’s say we create a 32x32 texture for this, and the largest mip level would be used to display “ideal texel-to-pixel density”. If the original texture was 256 pixels in size and we want to sample a 32 pixel texture at exactly the same texel density as the original one, we have to use denser UVs: newUV = uv * 256 / 32, or more generically, newUV = uv * textureSize / mipTextureSize.

Why is there 8.0 in the shader then, if we create the mip texture at 32x32? That’s because we don’t want the largest mip level to indicate “ideal texel to pixel” density. We also want a way to visualize “not enough texel density”. So we push the ideal mip level two levels down, which means a four times difference in UVs. That’s how 32 becomes 8 in the shader.

The actual colors we use for this 32x32 mipmaps visualization texture are, in RGBA: (0.0,0.0,1.0,0.8); (0.0,0.5,1.0,0.4); (1.0,1.0,1.0,0.0); (1.0,0.7,0.0,0.2); (1.0,0.3,0.0,0.6); (1.0,0.0,0.0,0.8). The alpha channel controls how much to interpolate between the original color and the tinted color. The 3rd mip level has zero alpha, so it displays the unmodified color.
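
Building that texture is just filling each mip level of a 32x32 texture with one solid color from the table above. A sketch in plain GLES2.0 terms (inside Unity this naturally goes through its own texture code, but the idea is the same):

#include <GLES2/gl2.h>
#include <vector>

// Fill each mip level of a 32x32 texture with one solid RGBA color
// (the colors listed above, converted to bytes). The 3rd level (mip 2)
// has zero alpha, so the shader shows the unmodified texture there.
GLuint CreateMipColorsTexture ()
{
    const unsigned char colors[6][4] = {
        {   0,   0, 255, 204 }, // mip 0: way too little texture detail
        {   0, 128, 255, 102 }, // mip 1
        { 255, 255, 255,   0 }, // mip 2: ideal texel-to-pixel density
        { 255, 178,   0,  51 }, // mip 3
        { 255,  76,   0, 153 }, // mip 4
        { 255,   0,   0, 204 }, // mip 5: way too much texture detail
    };
    GLuint tex = 0;
    glGenTextures (1, &tex);
    glBindTexture (GL_TEXTURE_2D, tex);
    int size = 32;
    for (int level = 0; level < 6; ++level)
    {
        std::vector<unsigned char> pixels (size * size * 4);
        for (int i = 0; i < size * size; ++i)
            for (int c = 0; c < 4; ++c)
                pixels[i*4+c] = colors[level][c];
        glTexImage2D (GL_TEXTURE_2D, level, GL_RGBA, size, size, 0,
                      GL_RGBA, GL_UNSIGNED_BYTE, &pixels[0]);
        size /= 2;
    }
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    return tex;
}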

Now, step 2 is somehow forcing artists to actually use this ;)


Mercurial/Kiln experience so far

At work we switched to Mercurial almost two months ago. Like Richard says, it was time to stop using Subversion. Here are my impressions so far.

_Preemptive warning: I’ve only ever used CVS, SourceSafe, Subversion, git and Mercurial as source control systems (never used Perforce). I never really used a code review tool before Kiln. Everything below might be a non-issue in other tools/systems, or not suitable for different setups/workflows!_

The Story

At Unity we used Subversion for source code versioning for as long as I can remember. svn revision 1 – an import from CVS – happened in 2005. We don’t talk about CVS. Nor about SourceSafe. Subversion was fine while the number of developers was small; we had a saying that CVS scales up to 5 people, and experimentally found out that svn scales up to about 50.

Since merging branches in Subversion does not really work well, everyone was mostly working on one trunk, carefully. We would do an occasional branch for “this will surely break everything” features, and would branch off trunk sometime before each Unity release, but that’s about it. Having something like 50 people and 10 platforms on a single branch in version control does get a bit uncomfortable.

So we looked at various options, like git, Mercurial, Perforce and so on. I don’t know why exactly we ended up with Mercurial (someone made a decision I guess…). It felt like distributed versioning systems are teh future and unlike most game developers we don’t need to version hundreds of gigabytes of binary assets (hence no big need for Perforce).

So while some people were at GDC, we did a big switch to several things at once: 1) replace Subversion with Mercurial, 2) replace “everyone works on the same trunk” workflow with “teams work on their own topic branches”, 3) introduce a bit more formal code reviews via Kiln.

In hindsight, maybe switching three things at once wasn’t the brightest idea; there’s only so much change a person can absorb per unit of time. On the other hand, everyone experienced one large initial shock, and now that the debris is settling down they can just continue working, with no big shocks predicted for the near future.

Our Setup

We use Fogcreek’s Kiln and host it on our own servers. This is mostly for legal reasons I think (our source code contains 3rd party bits which are under strict NDAs). The advantage of hosting it ourselves is that we’re in complete control. The disadvantage is that we have to do some work, and we only get Kiln updates every couple of months (so, for example, everyone who lets Fogcreek host Kiln is on Kiln 2.4.x right now, while we’re still on 2.3.x).

Our source tree is about 12000 files amounting to about 600MB. Mercurial’s history (60000 revisions imported from svn) adds another 200MB. Additionally, we pull almost 1GB of binary files (see below for binary file versioning) into the source tree.

Each “team” (core, editor, graphics, ios, android, …) has its own “branch” (actually, a separate repository clone) of the codebase, and merges back and forth with the “trunk” repository. The trunk is supposed to be stable and shippable at almost any time (in theory… :)); unfinished or unreviewed code, or code with any failing tests, can’t be pushed into trunk. Additionally, long-lasting features get their own “feature branches” (again, actually full clones of the repository). So right now we have more than 40 of those team+feature branches.

We have almost 50 developers committing to the source tree. Additionally, there is a build farm of 30 machines building most of those branches and running automated test suites. All this does put some pressure on the Kiln server ;) Everything below describes usage of Kiln 2.3.x with Mercurial 1.7.x; with more recent versions things might have changed.

Mercurial, or: I Have Two Heads!

Probably the hardest thing to grok is the whole centralized-to-distributed versioning transition. Not everyone has github as their start page yet, and a DVCS is actually more complex than the simple centralized model that Subversion has.

Things like this:

OMG it says I have two heads now, what do I do?!

just do not happen in centralized systems. It’s not easy for a developer to accept that he has two heads now, either. Or to figure out where this extra head came from…

And the benefits of a distributed source control system are not immediately obvious to someone who’s never used one. The initial reaction is that suddenly everything got more complex for no good reason. Compare the operations you would use daily:

  • Subversion: update, commit.

    • Since merges don’t really work: branch, switch & merge are rarely used by mere mortals.
  • Mercurial: pull, update or merge, commit, push.

    • And you might find you have two heads now!

    • You should also see their faces when you go “well, let me tell you about rebase…”. You might just as well explain everything with easy to understand spatial analogies ;)

Thankfully, there’s this thing called the intertubes, which often has helpful tutorials.

Myself, I think maybe switching to git would have been a smaller overall shock. Mercurial is easier to get into, but it kind of pretends to work like ye olde versioning system, while underneath it is very different. Git, on the other hand, does not even try to look similar; it says “I’ll fuck with your brain” immediately after initial “hi how are you”. So it’s a larger initial shock, but maybe that forces people to get into this different mindset faster.

Versioning large binary files

Even if we mostly version only the code, there are occasional binaries. In our case it’s mostly 3rd party SDKs that are linked into Unity. For example, PhysX, Mono, FMOD, D3DX, Cg etc. We do have the source code for most of them, but we don’t need each developer to have, say, the 30000 files of Mono’s source code. So we build them separately and version the prebuilt headers/libraries/DLLs in the regular source tree. Some of those prebuilt things can get quite large though (think a couple hundred megabytes).

Most distributed version control systems (including git and Mercurial) have trouble with this. Every version of every file is stored in your own local clone. Try having 50 versions of a whole Mono build in there and you’ll wonder where the precious SSD space on your laptop went!

Luckily, Kiln has a solution for this: the kbfiles extension. For each file marked as a “large binary file”, only its “stand-in” SHA1 hash is versioned, and the file itself is fetched from a central server onto your local machine on demand. Think of it as a centralized versioning model for those special binary files. kbfiles itself is based on the bfiles extension, with tighter integration into Mercurial.

So the good news: with Kiln, large binary files are handled easily and with no pain. You can globally set a “large size” threshold, filename patterns etc. that turn files into “big files” automatically, or manually mark a file as a “big file” when adding it. And then continue using Mercurial as usual.

The bad news, however, is that kbfiles still has occasional bugs. Of course they will be fixed eventually, but right now, for example, rebasing with an incoming bigfiles commit will result in the wrong bigfile version in the end. Also, the mere presence of the kbfiles extension makes various Mercurial operations (like hg status) much slower than usual.

Kiln as Web Interface

Kiln itself is the server hosting the Mercurial repositories, a web interface to view/administer them, and a code review tool. It’s fairly nice and does all the standard stuff, like showing an overview of all the activity happening in a group of repositories:

And showing the overview of any particular repository:

And of course a diff view of any particular commit:

My largest complaints about Kiln’s web interface are: 1) speed and 2) merge spiderwebs.

Speed: like oh so many modern fancy-web systems, Kiln sometimes feels sluggish. Sometimes, in the time it takes Kiln to display a diff, Crysis 2 would have rendered New York fifty times. We did various things to boost our server’s oomph, but it still does not feel fast enough. Maybe we don’t know how to set up our servers right; or maybe Kiln is actually quite slow; or maybe our repository size + branch count + number of people hitting it exceed whatever limits Kiln was designed for. That said, this is not unique to Kiln; lots of web systems are slow for sometimes no good reason. If you are a web developer, however, keep this in mind: the latency of any user operation is super important.

Merge spiderwebs: distributed version control makes merges reliable and easy. However, merges happen all the time and can make it hard to see what was actually going on in the code. You can’t see the actual changes through the merge spiderwebs.

The change history is littered with “merge”, “merge remote repo”, “merge again” commits. The branch graph goes crazy and starts taking half of the page width. Not good! Now of course, this is where rebasing would help, however right now we’re not very keen on using it because of Kiln’s bigfiles bug mentioned above.

Kiln as Code Review Tool

Reviewing code is fairly easy: there’s a Review button that shows up when hovering over any commit. Each commit also shows how many reviews it has pending or accepted. So you just click on something, and voilà, you can request a code review:

Within each review you see the diffs, send comments back and forth between people, and highlight code snippets to be attached with each comment:

In Kiln 2.3.x (which is what we use at the moment) the reviews still have a sort of “unfinished” feeling. For example, if you want multiple people to review a change, Kiln actually creates multiple reviews that are only very loosely coupled. The good news is that they improved this in Kiln 2.4, and I’m quite sure more improvements will come in the future.

Another option that I’m missing right now: in the repository views, a way to filter out all approved commits. As an occasional “merge master”, I need to see whether my big merge contains any unreviewed or pending-review commits – something that’s quite hard to see with a merge-heavy history.

Summary

I’m quite happy with how the switch to Mercurial + Kiln has turned out so far. With each team working in its own repository, it feels like we’re stepping on each other’s toes much less. That said, we haven’t shipped any Unity release from Mercurial yet; doing that will be a future exercise.

Kiln is promising. It has some very good ideas (integrated code reviews & versioning of big files in Mercurial), but it still has quite a lot of rough edges. I’m not totally happy with its web-side performance either. That said, Fogcreek’s support for us has been fantastic; we got some bugfixes in a matter of days and they’ve been really helpful with setup/workflow/optimization issues. So it seems like it has a good future. Fogcreek guys, if you’re reading this: keep up the good work!