Speaking at i3D and GDC 2014

I’ll be speaking at i3D Symposium and GDC in San Francisco in a couple of days.

At i3D: the Industry Panel (Sunday at 11:00AM). Jason Mitchell (Valve) will moderate a panel on the scalability challenges inherent in shipping games on a diverse range of platforms. Panelists are Michael Bukowski (Vicarious Visions), Jeremy Shopf (Firaxis), Emil Persson (Avalanche) and yours truly.

My first i3D, can’t wait to see what it is about!

At GDC, a bunch of talks on Tuesday:

The Unity tracks will probably get a shorter repeat at the Unity expo booth on Wednesday.

See you there!


Rough sorting by depth

TL;DR: use some highest bits from a float in your integer sorting key.

In graphics, you often want to sort objects back-to-front (for transparency) or front-to-back (for occlusion efficiency). You also want to sort by a bunch of other data (global layer, material, etc.). Christer Ericson has a good post on exactly that.

There’s a question in the comments:

I have all the depth values in floats, and I want to use those values in the key. What is the best way to encode floats into ‘bits’ (or integer) so that I can use it as part of the key ?

While “the best way” is hard to answer universally, just taking some highest bits off a float is a simple and decent approach.

Floating point numbers have a nice property: if you interpret their bits as integers, then larger numbers result in larger integers, i.e. you can treat floats as integers and compare them just fine (within the same sign). See the details at Bruce Dawson’s blog post.

And due to the way floats are laid out, you can chop off the lowest bits of the mantissa and only lose some precision. For something like front-to-back sorting, we only need a very rough sort. In fact a quantized sort is good, since you also want to render objects with the same material together, etc.

Anyhow, here’s an example taking the highest 10 bits. Assuming all numbers are positive (quite common if you’re sorting by “distance from camera”), we can ignore the sign bit, which will always be zero. So you end up using only 9 bits for the depth sorting.

// Taking highest 10 bits for rough sort of positive floats.
// Sign is always zero, so only 9 bits in the result are used.
// 0.01 maps to 240; 0.1 to 247; 1.0 to 254; 10.0 to 260;
// 100.0 to 267; 1000.0 to 273 etc.
unsigned DepthToBits (float depth)
{
	union { float f; unsigned i; } f2i;
	f2i.f = depth;
	unsigned b = f2i.i >> 22; // take highest 10 bits
	return b;
}

And that’s about it. Put these bits into your sorting key and go sort some stuff!

Q: But what about negative floats?

If you pass negative numbers into the above DepthToBits function, you will get the wrong order. Interpreted as integers, negative numbers will be larger than positive ones, and will come out sorted the wrong way:

-10.0 -> 772
-1.0 -> 766
-0.1 -> 759
0.1 -> 247
1.0 -> 254
10.0 -> 260

With some bit massaging you can turn floats into still-perfectly-sortable integers, even with both positive and negative numbers. Michael Herf has an article on that. Here’s the code using his trick, which handles both positive and negative numbers (it now uses all 10 bits though):

unsigned FloatFlip(unsigned f)
{
	unsigned mask = -int(f >> 31) | 0x80000000;
	return f ^ mask;
}

// Taking highest 10 bits for rough sort of floats.
// 0.01 maps to 752; 0.1 to 759; 1.0 to 766; 10.0 to 772;
// 100.0 to 779 etc. Negative numbers go similarly in 0..511 range.
unsigned DepthToBits (float depth)
{
	union { float f; unsigned i; } f2i;
	f2i.f = depth;
	f2i.i = FloatFlip(f2i.i); // flip bits to be sortable
	unsigned b = f2i.i >> 22; // take highest 10 bits
	return b;
}

Q: Why do you need bits at all? Why not just sort floats?

Often you don’t want to sort only by distance. You also want to sort by material, or mesh, or various other things (much more details in Christer’s post).

Sorting front-to-back on very limited bits of depth has the nice effect that you essentially “bucket” objects into ranges, and within each range you can sort them to reduce state changes.

Packing sorting data tightly into a small integer value allows either writing a very simple comparison operator (just compare two numbers), or using radix sort.
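
For illustration, here’s a minimal sketch of what such a packed key could look like; the field widths, names and layout below are made up for this example, not any particular engine’s format:

// Hypothetical 64-bit sort key: global layer in the top bits, then the
// coarse depth from DepthToBits above, then a material ID in the low bits.
// The field widths are arbitrary; the unused middle bits stay zero.
typedef unsigned long long uint64;

uint64 MakeSortKey (unsigned layer, float depth, unsigned materialID)
{
	uint64 key = 0;
	key |= (uint64)(layer & 0xF) << 60;      // 4 bits of global layer
	key |= (uint64)DepthToBits(depth) << 50; // 10 bits of coarse depth
	key |= (uint64)(materialID & 0xFFFFF);   // 20 bits of material ID
	return key;
}

Sorting by this one number gives you layer order first, a rough front-to-back order within each layer, and material grouping within each depth bucket; and since it is just an integer, a radix sort works on it directly.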


Some Unity codebase stats

I was doing a fresh codebase checkout & build on a new machine, so I got some stats along the way. No big insights, move on!

Codebase size

We use Mercurial for source control right now, with the “largefiles” extension for some big binary files (mostly precompiled 3rd party libraries).

Getting only the “trunk” branch (without any other branches that aren’t in trunk yet), which is 97529 commits:

  • Size of whole Mercurial history (.hg folder): 2.5GB, 123k files.
  • Size of large binary files: 2.3GB (almost 200 files).
  • Regular files checked out: 811MB, 36k files.

Now, the build process has a “prepare” step where said large files are extracted for use (they are mostly zip or 7z archives). After extraction, everything you have cloned, updated and prepared so far takes 11.7GB of disk space.

Languages and line counts

Runtime (“the engine”) and platform-specific bits, about 5000 files:

  • C++: 360 kLOC code, 29 kLOC comments, 1297 files.
  • C/C++ header: 146 kLOC code, 18 kLOC comments, 1480 files.
  • C#: 20 kLOC code, 6 kLOC comments, 154 files.
  • Others are peanuts: some assembly, Java, Objective C etc.
  • Total about half a million lines of code.

Editor (“the tools”), about 6000 files:

  • C++: 257 kLOC code, 23 kLOC comments, 588 files.
  • C#: 210 kLOC code, 16 kLOC comments, 1168 files.
  • C/C++ header: 51 kLOC code, 6 kLOC comments, 497 files.
  • Others are peanuts: Perl, JavaScript etc.
  • Total, also about half a million lines of code!

Tests, about 7000 files. This excludes C++ unit tests, which live directly in the code, and includes our own internal test frameworks as well as the tests themselves.

  • C#: 170 kLOC code, 11 kLOC comments, 2248 files.
  • A whole bunch of other stuff: C++, XML, JavaScript, Perl, Python, Java, shell scripts.
  • Everything sums up to about quarter million lines of code.

Now, all of the above does not include the 3rd party libraries we use (Mono, PhysX, FMOD, Substance etc.), nor some of our own code that is more or less “external” (see github).

Build times

Building Windows Editor: 2700 files to compile; 4 minutes for a Debug build, 5:13 for a Release build. This effectively builds “the engine” and “the tools” (the main editor and auxiliary tools used by it).

Building Windows Standalone Player: 1400 files to compile; 1:19 for a Debug build, 1:48 for a Release build. This effectively builds only “the engine” part.

All of this is doing a complete build, as timed on a MacBook Pro (2013, 15" 2.3GHz Haswell, 16GB RAM, 512GB SSD model) with Visual Studio 2010, Windows 8.1, on battery, while watching Jon Blow’s talk on YouTube. We use the JamPlus build system (“everything about it sucks, but it gets the job done”) with precompiled headers.

Sidenote on developer hardware: this top-spec-2013 MacBook Pro is about 3x faster at building code than my previous top-spec-2010 MacBook Pro (which had really heavy use, and its SSD isn’t as fast as it used to be). And yes, I also have a development desktop PC; most if not all developers at Unity get a desktop & a laptop.

However, the difference between a 3-minute build and a 10-minute build is huge, and costs a lot more than those extra 7 minutes. Longer iteration times mean more distractions, less will to do big changes (“oh no, I will have to compile again”), less will to code in general, etc.

Do get the best machines for your developers!

Well this is all!


On having an ambitious vision

We just announced upcoming 2D tools for Unity 4.3, and one of the responses I’ve seen is “I am rapidly running out of reasons not to use Unity”. Which reminds me of some stories from a few years back.

Perhaps sometime in 2006, I was eating shawarmas with Joachim and discussing possible futures of Unity. His answer to my question, “so what’s your ultimate goal with Unity?”, was along the lines of:

I want to make it so that whenever anyone starts making a game, Unity will be their first choice of tech.

Of course that was crazy talk, so my reaction was somewhere between “you know that’s going to be hard” and “good luck with that”. Fast forward to 2013 and the thought is not so crazy anymore. Of course not everyone has to use Unity, but quite a lot of people do consider and use it. My slightly pessimistic, pragmatic, probability-weighted thinking proved wrong.

Some time before that, in late 2005, I got an email from some David, asking if I’d want to join a company I’d never heard about. The company made this engine, “Unity”, that I’d never heard about either; it was Mac-only, and I had only ever seen a Mac before.

The email said:

<…>

We are building a game development suite called Unity. Unity is changing how small-to-medium developers create games. It is a power-tool combining the flexibility of Flash with all of the power of high-end game engines.

It’s on the market right now, and making a dent. Unity thrills people wonderfully, people find they are able to create stuff they only dreamt of before.

Our users are excited by extremely advanced technology combined with an intuitive editor. A flexible shader system, a unique completely automatic asset pipeline, Ageia physX (née Novodex), and publishing standalone Windows and OS X, and OS X web player with one click (and it actually just works).

<…>

Now, that was 2005. Unity was at version 1.1. Probably no one besides Jon Czeck was using it; more or less.

Crazy fantasies from someone who’s somewhere between naïve and delusional? Yeah, sounds like it. So of course after a couple of exchanges I said “no” (but then they invited me to a gamejam, and I thought that while most likely nothing big would come out of it, at least it would be fun…).

And now it’s 2013. And no matter whether you like or dislike Unity, there’s no denying it is quite a thing, and has perhaps changed the industry for the better. A tiny bit at least. These crazy, ambitious ideas did come through.

Reminder for myself: probabilities and pragmatism do not always win. Gotta have goals that are beyond practical possibilities.


Inter-process Communication: How?

A post of mostly questions, and no answers!

So I needed to do some IPC (Inter-process Communication) lately for shader compilers. There are several reasons why you’d want to move some piece of code into another process; in my case they were:

  • Bit-ness of the process: I want a 64-bit main executable, but some of our platforms only have 32-bit shader compiler libraries.
  • Parallelism. For example, you can call NVIDIA’s Cg from multiple threads, but it will just lock some internal mutex for most of the shader compilation time. The result is that you’re trying to compile shaders on 16 cores, but they end up just waiting on each other. By running 16 processes instead, they are completely independent, and the shader compiler does not even have to be thread-safe ;)
  • Memory usage and fragmentation. This is less of an issue in 64 bit land, but in 32 bit it helps to put some stuff into separate process with its own address space.
  • Crash resistance. A crash or memory trasher in a shader compiler should not bring down the whole editor.

Now of course, all that comes with a downside: IPC is much more cumbersome than just calling a function in some library directly. So I’m wondering: how do people do that in C/C++ codebases?

(I’m getting flashbacks of CORBA from my early enterprisey days… but hey, that was last millennium, and it seemed like a good idea at the time…)

Transport layer?

So there’s the question: over what medium will the processes communicate?

  • Files, named pipes, sockets, shared memory?
  • Roll your own code for one of the above?
  • Use some larger libraries like libuv, 0MQ, nanomsg or (shudder) boost::asio?

What I do right now is just some code for named pipes (on Windows) and stdin/stdout (on Unixes). We already had some code for that lying around anyway.
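
Just to illustrate how little the transport itself needs, here’s a minimal sketch of the child-process side of the stdin/stdout variant; the “exit” command and the reply format are made up for this example:

// Hypothetical child-process side of a stdin/stdout transport: read one
// newline-terminated request, write one reply line back, repeat.
#include <iostream>
#include <string>

int main ()
{
	std::string line;
	while (std::getline(std::cin, line))
	{
		if (line == "exit")
			break;
		// ... parse the request and do the actual work here ...
		std::cout << "ok " << line.size() << '\n';
		std::cout.flush(); // the parent blocks on reads, so always flush
	}
	return 0;
}

The parent side spawns the process with its stdin/stdout redirected to pipes (pipe/fork/exec style on Unixes) and runs the mirror image of this loop; on Windows the same loop sits on top of a named pipe instead.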

Message protocol?

And then there’s the question of how you define the “communication protocol” between the processes. Ease of development, the need (or lack of need) for backward/forward compatibility, robustness in the presence of errors, etc. all come into play.

  • Manually written, some binary message format?
  • Manually written, some text/line based protocol?
  • JSON, XML, YAML etc.?
  • Helper tools like protobuf or Cap’n Proto?

Right now I have a manually-written, line-based message format. But it’s quite a laborious process to write all that communication code, especially when you also want to do some error handling. It’s not hard, but it is stupid, boring work, with a high chance of accidental bugs due to a bored programmer (me) copy-pasting nonsense.
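
For a taste of what that boring code looks like, here’s a minimal sketch of reading one such message; the “key value” lines and the “end” terminator are just an example format, not the actual one:

// Hypothetical line-based message: "key value" pairs terminated by a line
// containing just "end". Returns false on malformed or truncated input,
// which the caller still has to handle somehow.
#include <iostream>
#include <map>
#include <string>

bool ReadMessage (std::istream& input, std::map<std::string, std::string>& message)
{
	std::string line;
	while (std::getline(input, line))
	{
		if (line == "end")
			return true;
		size_t space = line.find(' ');
		if (space == std::string::npos)
			return false; // malformed line
		message[line.substr(0, space)] = line.substr(space + 1);
	}
	return false; // stream ended before the "end" marker
}

Multiply that by every message type, plus the matching writer, plus the error paths, and the tedium adds up quickly.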

Maybe I should use protobuf? (I looked at Cap’n Proto, but can’t afford to use C++11 compilers yet.)

Am I missing some easy, turnkey solution for IPC in C/C++?