Visuals in Some Great Games

I was thinking about the visuals of the best games I’ve recently played. Now, I’m not a PC/console gamer, and I’m somewhat biased towards playing Unity-made games, so almost all of these examples will be iPad & Unity games. However, even taking my bias into account, I think they are amazing games.

So here’s a list (Unity games first):

Monument Valley by ustwo.

DEVICE 6 by Simogo.

Year Walk by Simogo (also for PC).

Gone Home by The Fullbright Company.

Kentucky Route Zero by Cardboard Computer.

The Room by Fireproof Games.

And just to make it slightly less biased, some non-Unity games:

Papers, Please by Lucas Pope.

The Stanley Parable by Galactic Cafe.

Now for the strange part. At work I’m working on physically based shading and related things, but take a look at the games above. Five out of eight are not “realistic looking” games at all! Lights, shadows, BRDFs, energy conservation and linear color spaces don’t apply at all to a game like DEVICE 6 or Papers, Please.

But that’s okay. I’m happy that Unity is flexible enough to allow these games, and we’ll certainly keep it that way. I was looking at our game reel from GDC 2014 recently, and my reaction was “whoa, they all look different!”. Which is really, really good.

Cross Platform Shaders in 2014

A while ago I wrote a Cross platform shaders in 2012 post. What has changed since then?

Short refresher on the problem: people need to do 3D things on multiple platforms, and different platforms use different shading languages (big ones are HLSL and GLSL). However, no one wants to write their shaders twice. It would be kind of stupid if one had to write different C++ for, say, Windows & Mac. But right now we have to do it for shader code.
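To make the duplication concrete, here’s a trivial textured-tint fragment shader written twice - sketched from memory, so exact syntax details (semantics, GLSL version) will vary:

```
// HLSL (Direct3D 10+ style)
Texture2D _MainTex;
SamplerState sampler_MainTex;
float4 _Tint;

float4 PSMain (float2 uv : TEXCOORD0) : SV_Target
{
    return _MainTex.Sample (sampler_MainTex, uv) * _Tint;
}

// GLSL (older desktop / ES 2.0 style) - the same shader,
// with different names for everything
uniform sampler2D _MainTex;
uniform vec4 _Tint;
varying vec2 uv;

void main ()
{
    gl_FragColor = texture2D (_MainTex, uv) * _Tint;
}
```

Same logic, but types, texture sampling, entry points and input/output conventions all differ - and that’s before getting into the interesting features.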

Most of the points from my previous post still stand; I’ll just link to some new tools that have appeared since then:

#1. Manual / Macro approach

Write some helper macros to abstract away HLSL & GLSL differences, and make everyone aware of all the differences. Examples: Valve’s Source 2 (DevDays talk), bkaradzic’s bgfx library (shader helper macros), FXAA 3.11 source etc.

Pros: Simple to do.

Cons: Everyone needs to be aware of that macro library and other syntax differences.
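A sketch of what such a macro header might look like - macro names here are invented for illustration; real libraries (e.g. bgfx’s shader helpers) have their own conventions:

```
// One shared macro header, included by every shader.
#ifdef COMPILE_HLSL
    #define VEC2 float2
    #define VEC4 float4
    #define LERP(a, b, t) lerp(a, b, t)
    #define FRACT(x) frac(x)
#else // GLSL
    #define VEC2 vec2
    #define VEC4 vec4
    #define LERP(a, b, t) mix(a, b, t)
    #define FRACT(x) fract(x)
#endif

// Shader code then only ever uses the macros:
// VEC4 c = LERP(colorA, colorB, FRACT(t));
```

This covers renamed types and intrinsics well; things like texture declarations and varying/attribute plumbing need heavier macros or discipline.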

#2. Invent your own language with HLSL/GLSL backends

Or generate HLSL/GLSL from some graphical shader editor, and so on.

#3. Translate compiled shader bytecode into GLSL

Pros: Simpler to do than full language translation. Microsoft’s HLSL compiler does some decent optimizations, so resulting GLSL would be fairly optimized.

Cons: Closed compiler toolchain (HLSL) that only runs on Windows. The HLSL compiler in some cases does too many optimizations, which don’t make much sense these days.


  • HLSLCrossCompiler by James Jones. Supports DX10-11 bytecode and produces various GLSL versions as output. Under active development.
  • MojoShader by Ryan Gordon. Supports DX9 (shader model 1.1-3.0).
  • TOGL from Valve. Again DX9 only, and only partial one at that (some shader model 3.0 features aren’t there).

#4. Translate HLSL into GLSL at source level, or vice versa

  • hlsl2glslfork from Unity. DX9-level HLSL in, GLSL 1.xx / OpenGL ES (including ES3) out. It does work (used in production at Unity and some other places), however it’s quite a bad codebase, and we haven’t shoehorned DX10/11-style HLSL support into it yet.
  • ANGLE from Google. OpenGL ES 2.0 (and possibly 3.0?) shaders in, DX9/DX10 HLSL out. This is a whole OpenGL-ES-emulation-on-top-of-Direct3D layer that also happens to have a shader cross-compiler.
  • OpenGL Reference Compiler from Khronos. While it’s only a GLSL validator itself, it has a full GLSL parser (including partial support for GL4.x at this point). It should be possible to make it emit HLSL with some work. It’s a bit weird that the source is on some Subversion server though - not an ideal platform for contributing changes or filing bugs.
  • HLSL Cross Compiler from Epic. This is in Unreal Engine 4, and built upon Mesa’s GLSL stack (or maybe glsl optimizer), with HLSL parser in front. Note that this isn’t open source, but hey one can dream!
  • hlslparser from Unknown Worlds. Converts DX9-style HLSL (with constant buffers) into GLSL 3.1.
  • MojoShader by Ryan Gordon. Seems to have some code for parsing DX9-style HLSL; not quite sure how production-ready it is.

I’ve thought about doing a similar thing to what the Epic folks did for UE4: take glsl optimizer and add an HLSL parser in front. These days Mesa’s GLSL stack already has support for compute & geometry shaders, and I think tessellation shaders will be coming soon. It would be a much better codebase than hlsl2glslfork. However, I’ve never had time to actually do it, besides thinking about it for a few hours :(

Call to action?

Looks like we’ll stay with two shading languages for a while now (Windows and all relevant consoles use HLSL; Mac/Linux/iOS/Android use GLSL). So each and every graphics developer who does cross platform stuff is facing this problem.

I don’t think IHVs will solve this problem. NVIDIA did try once with Cg (perhaps too early), but Cg is pretty much dead now.

DX9-level shader translation is probably a solved problem (hlsl2glslfork, mojoshader, ANGLE). However, we need a DX10/11-level translation - with compute shaders, tessellation and all that goodness.

We have really good collaboration tools in forms of github & bitbucket. Let’s do this. Somehow.

Speaking at i3D and GDC 2014

I’ll be speaking at i3D Symposium and GDC in San Francisco in a couple of days.

At i3D: the Industry Panel (Sunday at 11:00AM). Jason Mitchell (Valve) will moderate a panel on the scalability challenges inherent in shipping games on a diverse range of platforms. Panelists are Michael Bukowski (Vicarious Visions), Jeremy Shopf (Firaxis), Emil Persson (Avalanche) and yours truly.

My first i3D, can’t wait to see what it is about!

At GDC, I’m involved in a bunch of talks on Tuesday.

The Unity talks will probably get a shorter repeat at the Unity expo booth on Wednesday.

See you there!

Rough Sorting by Depth

TL;DR: use some highest bits from a float in your integer sorting key.

In graphics, you often want to sort objects back-to-front (for transparency) or front-to-back (for occlusion efficiency). You also want to sort by a bunch of other data (global layer, material, etc.). Christer Ericson has a good post on exactly that.

There’s a question in the comments:

I have all the depth values in floats, and I want to use those values in the key. What is the best way to encode floats into ‘bits’ (or integer) so that I can use it as part of the key ?

While “the best way” is hard to answer universally, just taking some of the highest bits of the float is a simple and decent approach.

Floating point numbers have a nice property: if you interpret their bits as integers, then larger numbers result in larger integers - i.e. you can treat a float as an integer and compare them just fine (within the same sign). See details in Bruce Dawson’s blog post.

And due to the way floats are laid out, you can chop off the lowest bits of the mantissa and only lose some precision. For something like front-to-back sorting, we only need a very rough sort anyway. In fact a quantized sort is good, since you also want to render objects with the same material together, etc.

Anyhow, here’s an example that takes the highest 10 bits. Assuming all numbers are positive (quite common if you’re sorting by “distance from camera”), we can ignore the sign bit, which will always be zero - so effectively only 9 bits are used for the depth sorting.

// Taking highest 10 bits for a rough sort of positive floats.
// Sign is always zero, so only 9 bits in the result are used.
// 0.01 maps to 240; 0.1 to 247; 1.0 to 254; 10.0 to 260;
// 100.0 to 267; 1000.0 to 273 etc.
unsigned DepthToBits (float depth)
{
  union { float f; unsigned i; } f2i;
  f2i.f = depth;
  unsigned b = f2i.i >> 22; // take highest 10 bits
  return b;
}

And that’s about it. Put these bits into your sorting key and go sort some stuff!

Q: But what about negative floats?

If you pass negative numbers into the above DepthToBits function, you will get the wrong order. Reinterpreted as integers, negative numbers will be larger than positive ones, and will come sorted the wrong way:

-10.0 -> 772
-1.0 -> 766
-0.1 -> 759
0.1 -> 247
1.0 -> 254
10.0 -> 260

With some bit massaging you can turn floats into still-perfectly-sortable integers, even with both positive and negative numbers. Michael Herf has an article on that. Here’s the code with his trick, handling both positive and negative numbers (it now uses all 10 bits though):

unsigned FloatFlip (unsigned f)
{
  unsigned mask = -int(f >> 31) | 0x80000000;
  return f ^ mask;
}

// Taking highest 10 bits for a rough sort of floats.
// 0.01 maps to 752; 0.1 to 759; 1.0 to 766; 10.0 to 772;
// 100.0 to 779 etc. Negative numbers go similarly in the 0..511 range.
unsigned DepthToBits (float depth)
{
  union { float f; unsigned i; } f2i;
  f2i.f = depth;
  f2i.i = FloatFlip(f2i.i); // flip bits to be sortable
  unsigned b = f2i.i >> 22; // take highest 10 bits
  return b;
}

Q: Why do you need bits at all? Why not just sort floats?

Often you don’t want to sort only by distance. You also want to sort by material, or mesh, or various other things (much more details in Christer’s post).

Sorting front-to-back on a very limited number of depth bits has a nice effect: you essentially “bucket” objects into depth ranges, and within each range you can sort them to reduce state changes.

Packing sorting data tightly into a small integer value allows either writing a very simple comparison operator (just compare two numbers), or using radix sort.

Some Unity Codebase Stats

I was doing a fresh codebase checkout & build on a new machine, so I got some stats along the way. No big insights; move on!

Codebase size

We use Mercurial for source control right now, with the “largefiles” extension for some big binary files (mostly precompiled 3rd party libraries).

Getting only the “trunk” branch (without any other branches that aren’t in trunk yet), which is 97529 commits:

  • Size of whole Mercurial history (.hg folder): 2.5GB, 123k files.
  • Size of large binary files: 2.3GB (almost 200 files).
  • Regular files checked out: 811MB, 36k files.

Now, the build process has a “prepare” step where said large files are extracted for use (they are mostly zip or 7z archives). After extraction, everything you have cloned, updated and prepared so far takes 11.7GB of disk space.

Languages and line counts

Runtime (“the engine”) and platform-specific bits, about 5000 files:

  • C++: 360 kLOC code, 29 kLOC comments, 1297 files.
  • C/C++ header: 146 kLOC code, 18 kLOC comments, 1480 files.
  • C#: 20 kLOC code, 6 kLOC comments, 154 files.
  • Others are peanuts: some assembly, Java, Objective C etc.
  • Total about half a million lines of code.

Editor (“the tools”), about 6000 files:

  • C++: 257 kLOC code, 23 kLOC comments, 588 files.
  • C#: 210 kLOC code, 16 kLOC comments, 1168 files.
  • C/C++ header: 51 kLOC code, 6 kLOC comments, 497 files.
  • Others are peanuts: Perl, JavaScript etc.
  • Total, also about half a million lines of code!

Tests, about 7000 files. This excludes C++ unit tests, which are directly in the code. It includes our own internal test frameworks as well as the tests themselves.

  • C#: 170 kLOC code, 11 kLOC comments, 2248 files.
  • A whole bunch of other stuff: C++, XML, JavaScript, Perl, Python, Java, shell scripts.
  • Everything sums up to about quarter million lines of code.

Now, all of the above does not include the 3rd party libraries we use (Mono, PhysX, FMOD, Substance etc.). It also does not include some of our own code that is more or less “external” (see github).

Build times

Building the Windows Editor: 2700 files to compile; 4 minutes for a Debug build, 5:13 for a Release build. This effectively builds “the engine” and “the tools” (the main editor and auxiliary tools used by it).

Building the Windows Standalone Player: 1400 files to compile; 1:19 for a Debug build, 1:48 for a Release build. This effectively builds only “the engine” part.

All of this is a complete build. As timed on a MacBook Pro (2013, 15” 2.3GHz Haswell, 16GB RAM, 512GB SSD model) with Visual Studio 2010 and Windows 8.1, on battery, while watching Jon Blow’s talk on YouTube. We use the JamPlus build system (“everything about it sucks, but it gets the job done”) with precompiled headers.

Sidenote on developer hardware: this top-spec-2013 MacBook Pro is about 3x faster at building code than my previous top-spec-2010 MacBook Pro (which had really heavy use, and its SSD isn’t as fast as it used to be). And yes, I also have a development desktop PC; most if not all developers at Unity get a desktop & a laptop.

However, the difference between a 3 minute build and a 10 minute build is huge, and costs a lot more than those extra 7 minutes. Longer iterations mean more distractions, less will to do big changes (“oh no, I’ll have to compile again”), less will to code in general, etc.

Do get the best machines for your developers!

Well, that is all!