Mercurial/Kiln experience so far

At work we switched to Mercurial almost two months ago. Like Richard says, it was time to stop using Subversion. Here are my impressions so far.

_Preemptive warning: I’ve only ever used CVS, SourceSafe, Subversion, git and Mercurial as source control systems (never used Perforce). I never really used a code review tool before Kiln. Everything below might be non-issues in other tools/systems, or not suitable for different setups/workflows!_

The Story

At Unity we used Subversion for source code versioning for as long as I can remember. svn revision 1 – an import from CVS – happened in 2005. We don’t talk about CVS. Nor about SourceSafe. Subversion was fine while the number of developers was small; we had a saying that CVS scales up to 5 people, and experimentally found out that svn scales up to about 50.

Since merging branches in Subversion does not really work well, everyone was mostly working on one trunk, carefully. We would do an occasional branch for “this will surely break everything” features, and would branch off trunk sometime before each Unity release, but that’s about it. Having something like 50 people and 10 platforms on a single branch in version control does get a bit uncomfortable.

So we looked at various options, like git, Mercurial, Perforce and so on. I don’t know why exactly we ended up with Mercurial (someone made a decision I guess…). It felt like distributed versioning systems are teh future and unlike most game developers we don’t need to version hundreds of gigabytes of binary assets (hence no big need for Perforce).

So while some people were at GDC, we did a big switch to several things at once: 1) replace Subversion with Mercurial, 2) replace “everyone works on the same trunk” workflow with “teams work on their own topic branches”, 3) introduce a bit more formal code reviews via Kiln.

In hindsight, maybe switching three things at once wasn’t the brightest idea; there’s only so much change a person can absorb per unit of time. On the other hand, everyone experienced one large initial shock, but now that the debris is settling down they just continue working, with no big shocks predicted in the near future.

Our Setup

We use Fogcreek’s Kiln and host it on our own servers. This is mostly for legal reasons I think (our source code contains 3rd party bits which are under strict NDAs). The advantage of hosting it ourselves is that we’re in complete control. The disadvantage is that we have to do some work, and we only get Kiln updates every couple of months (so for example everyone who lets Fogcreek host Kiln is on Kiln 2.4.x right now, while we’re still on 2.3.x).

Our source tree is about 12000 files amounting to about 600MB. Mercurial’s history (60000 revisions imported from svn) adds another 200MB. Additionally, we pull almost 1GB of binary files (see below for binary file versioning) into the source tree.

Each “team” (core, editor, graphics, ios, android, …) has its own “branch” (actually, a separate repository clone) of the codebase, and merges back and forth with the “trunk” repository. The trunk is supposed to be stable and shippable at almost any time (in theory… :)); unfinished, unreviewed code or code that has any failing tests can’t be pushed into trunk. Additionally, long-lasting features get their own “feature branches” (again, actually full clones of the repository). So right now we have more than 40 of those team+feature branches.

We have almost 50 developers committing to the source tree. Additionally, there is a build farm of 30 machines building most of those branches and running automated test suites. All this does put some pressure on the Kiln server ;) Everything below describes usage of Kiln 2.3.x with Mercurial 1.7.x; with more recent versions anything might have changed.

Mercurial, or: I Have Two Heads!

Probably the hardest thing to grok is the whole centralized-to-distributed versioning transition. Not everyone has github as their start page yet, and a DVCS is actually more complex than the simple centralized model of Subversion.

Things like this:

OMG it says I have two heads now, what do I do?!

just do not happen in centralized systems. It’s not easy for a developer to accept that he has two heads now, either. Or to figure out where this extra head came from…

And the benefits of distributed source control system are not immediately obvious to someone who’s never used one. The initial reaction is that suddenly everything got more complex for no good reason. Compare operations that you would use daily:

  • Subversion: update, commit.

    • Since merges don’t really work: branch, switch & merge are rarely used by mere mortals.
  • Mercurial: pull, update or merge, commit, push.

    • And you might find you have two heads now!

    • You should also see their faces when you go “well, let me tell you about rebase…”. You might just as well explain everything with easy to understand spatial analogies ;)
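The “two heads” situation is less mysterious once you think of the history as a graph: a head is simply a commit that nothing else builds on, and a pull that brings in a colleague’s work on top of the same parent leaves you with two of them. A toy sketch of the idea (made-up data; nothing to do with Mercurial’s actual internals):

```python
# Toy model of a Mercurial-style history DAG.
# Each commit maps to the list of its parent commits.
history = {
    "a0": [],          # initial commit
    "a1": ["a0"],      # your local commit on top of a0
    "b1": ["a0"],      # a colleague's commit, also on a0, arriving via pull
}

def heads(dag):
    """A head is any commit that no other commit lists as a parent."""
    all_parents = {p for parents in dag.values() for p in parents}
    return sorted(c for c in dag if c not in all_parents)

print(heads(history))  # -> ['a1', 'b1']: two heads, time to merge (or rebase)

# Merging creates a new commit with both heads as parents...
history["m1"] = ["a1", "b1"]
print(heads(history))  # -> ['m1']: back to one head
```

That last step is exactly what all those “merge” commits in the history are.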

Thankfully, there’s this thing called the intertubes, which often has helpful tutorials.

Myself, I think maybe switching to git would have been a smaller overall shock. Mercurial is easier to get into, but it kind of pretends to work like ye olde versioning system, while underneath it is very different. Git, on the other hand, does not even try to look similar; it says “I’ll fuck with your brain” immediately after initial “hi how are you”. So it’s a larger initial shock, but maybe that forces people to get into this different mindset faster.

Versioning large binary files

Even though we mostly version only code, there are occasional binaries. In our case it’s mostly 3rd party SDKs that are linked into Unity: for example, PhysX, Mono, FMOD, D3DX, Cg etc. We do have the source code for most of them, but we don’t need each developer to have the 30000 files of Mono’s source code, for example. So we build them separately, and version the prebuilt headers/libraries/DLLs in the regular source tree. Some of those prebuilt things can get quite large though (think a couple hundred megabytes).

Most distributed version control systems (including git and Mercurial) have trouble with this. Every version of every file is stored in your own local clone. Try having 50 versions of a whole Mono build in there and you’ll wonder where the precious SSD space on your laptop went!

Luckily, Kiln has a solution for this: the kbfiles extension. For each file marked as a “large binary file”, only its “stand-in” SHA1 hash is versioned, and the file itself is fetched from a central server onto your local machine on demand. Think of it as a centralized versioning model for those special binary files. kbfiles itself is based on the bfiles extension, with tighter integration into Mercurial.
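The stand-in idea itself is simple: version a tiny piece of text holding the content hash, and fetch the real bytes from a central store by hash when needed. A heavily simplified sketch of the concept (all names made up; this is not the actual kbfiles implementation):

```python
import hashlib

# central_store stands in for the Kiln server; in reality this is remote.
central_store = {}

def add_big_file(data: bytes) -> str:
    """Upload the real bytes; return the tiny stand-in that gets versioned."""
    digest = hashlib.sha1(data).hexdigest()
    central_store[digest] = data
    return digest  # this 40-character string is all that Mercurial tracks

def fetch_big_file(standin: str) -> bytes:
    """On update/checkout, pull the real bytes back down by hash."""
    return central_store[standin]

standin = add_big_file(b"pretend this is a 200MB Mono build")
assert len(standin) == 40
assert fetch_big_file(standin) == b"pretend this is a 200MB Mono build"
```

Since identical content hashes to the same digest, the central store also deduplicates for free.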

So the good news: with Kiln, large binary files are handled easily and painlessly. You can globally set a “large size” threshold, filename patterns etc. that turn files into “big files” automatically; or manually mark a file as “big” when adding it. And then continue using Mercurial as usual.

The bad news, however, is that kbfiles still has occasional bugs. Of course they will be fixed eventually, but right now, for example, rebasing with an incoming bigfiles commit will result in the wrong bigfile version in the end. Also, the mere presence of the kbfiles extension makes various Mercurial operations (like hg status) much slower than usual.

Kiln as Web Interface

Kiln itself is the server hosting Mercurial repositories, a web interface to view/admin them, and a code review tool. It’s fairly nice and does all the standard stuff, like showing an overview of all activity happening in a group of repositories:

And shows the overview of any particular repository:

And of course diff view of any particular commit:

My largest complaints about Kiln’s web interface are: 1) speed and 2) merge spiderwebs.

Speed: like oh so many modern fancy-web systems, Kiln sometimes feels sluggish. Sometimes, in the time it takes Kiln to display a diff, Crysis 2 would have rendered New York fifty times. We did various things to boost our server’s oomph, but it still does not feel fast enough. Maybe we don’t know how to set up our servers right; or maybe Kiln is actually quite slow; or maybe our repository size + branch count + number of people hitting it exceed whatever limits Kiln was designed for. That said, this is not unique to Kiln; lots of web systems are slow for sometimes no good reason. If you are a web developer, however, keep this in mind: latency of any user operation is super important.

Merge spiderwebs: distributed version control makes merges reliable and easy. However, merges happen all the time and can make it hard to see what was actually going on in the code. You can’t see the actual changes through the merge spiderwebs.

The change history is littered with “merge”, “merge remote repo”, “merge again” commits. The branch graph goes crazy and starts taking up half the page width. Not good! Now of course, this is where rebasing would help; however, right now we’re not very keen on using it because of Kiln’s bigfiles bug mentioned above.

Kiln as Code Review Tool

Reviewing code is fairly easy: there’s a Review button that shows up when hovering over any commit. Each commit also shows how many reviews it has pending or accepted. So you just click on something, and voilà, you can request a code review:

Within each review you see the diffs, send comments back and forth between people, and highlight code snippets to be attached with each comment:

In Kiln 2.3.x (which is what we use at the moment) the reviews still have a sort of “unfinished” feeling. For example, if you want multiple people to review a change, Kiln actually creates multiple reviews that are only very loosely coupled. The good news is that this has been improved in Kiln 2.4, and I’m quite sure more improvements will come in the future.

Another option that I’m missing right now: being able to filter out all approved commits in the repository views. As an occasional “merge master”, I need to see whether my big merge had any unreviewed or pending-review commits – something that’s quite hard to see with a merge-heavy history.

Summary

I’m quite happy with how the switch to Mercurial + Kiln has turned out so far. With each team working on their own repository, it does feel like we’re stepping on each other’s toes much less. That said, we haven’t shipped any Unity release from Mercurial yet; doing that will be a future exercise.

Kiln is promising. It has some very good ideas (integrated code reviews & versioning of big files in Mercurial), but it still has quite a lot of rough edges. I’m not totally happy with its web performance either. That said, Fogcreek’s support for us has been fantastic; we got some bugfixes in a matter of days and they’ve been really helpful with setup/workflow/optimization issues. So it seems like it has a good future. Fogcreek guys, if you’re reading this: keep up the good work!


Stories of Universities

I was doing a talk and a Q&A session at a local university. Unaware of the consequences, one guy asked how useful the programming courses they take are for real work…

Oh boy. Do you really want to go there?

Now before I go ranting full steam, let me tell that there were really good courses and really bright teachers at my (otherwise unspectacular) university. Most of the math, physics and related fundamental sciences courses were good & taught by people who know their stuff. Even some of the computer science / programming courses were good!

With that aside, let’s get back to ranting.

What is OOP?

Somehow the conversation drifted to the topics of code design, architecture and whatnot. I asked the audience, for example, what they think the benefits of object oriented programming (OOP) are. The answers were the following:

  • Mumble mumble… weeelll… something something mumble. This was the majority’s opinion.

  • OOP makes it very easy for a new guy to start at work, because everything is nicely separated and he can just work on this one file without knowing anything else.

  • Without OOP there’s no way to separate things out; everything becomes a mess.

  • OOP uses classes, and they are nicer than not using classes. Because a class lets you… uhm… well I don’t know, but classes are nicer than no classes. I think it had something to do with something being in separate files. Or maybe in one file. I don’t actually know…

  • I forget if there was anything else really.

Let me tell you how easy it is for a guy to start at work. You come to a new place all inspired and excited. You get put into some unholy codebase that grew in a chaotic way over the last N years and get assigned to do some random feature or fix some bugs. When you encounter anything smelly in the codebase (this happens fairly often), the answer to “WTF is this?” is most often “it came from the past, yeah, we don’t like it either” or “I dunno, this guy who left last year wrote it” or “yeah, I wrote it but it was ages ago, I don’t remember anything about it… wow! this is idiotic code indeed! just be careful, touching it might break everything”. All this is totally independent of whether the codebase uses OOP or not.

I am exaggerating of course; the codebase doesn’t have to be that bad. But still: whether it’s good or not, or whether it’s easy for a new guy to start there, is really not related to it being OOP.

Interesting!

Clearly they have no frigging clue what OOP is, besides whatever they’ve been told by the teacher. And the teacher in turn knows about OOP based on what he read in one or two books. And the authors of the books… well, we don’t know; depends on the book I guess. But this is at least a second-order disconnect from reality, if not more!

Why is that?

I guess part of the problem is teachers having no actual work experience beyond what they’ve read in books. This can work for math. For a lot of programming courses… not so much. Another part is students learning in a vacuum, trying to kind of get what the lectures are about and pass the tests.

In both cases it’s totally separated from doing some real actual work and trying to apply what you’re trying to learn. Which leads to some funny things like…

How are floating point numbers stored?

I saw this about 11 years ago in one lecture of a C++ course. The teacher was quickly explaining how various types are stored in memory. He got through the integer types without trouble and started explaining floats.

So there’s one bit for the sign. Then come the digits before the decimal point. Since there are 10 possible choices for each digit, you need four bits of memory for each digit. Then comes one bit for the decimal point. After the decimal point, again you have four bits per digit. Done!

ORLY? This was awesome, especially trying to imagine how to store the decimal point.

See that decimal point bit, haha! You see, it’s one bit and you can’t… what do you mean you don’t get it? And not only that, this needs variable length and… really? You’re going to a party instead? I wasn’t very popular.

Funny or not, this is not exactly telling the correct story of how floats are stored in memory on 101% of the architectures you’d ever care about.
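For the record, the way floats are actually stored (IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits, and no “decimal point bit” anywhere) is easy to inspect:

```python
import struct

def float_bits(x: float):
    """Decompose a 32-bit float into (sign, biased exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased by 127
    mantissa = bits & 0x7FFFFF       # the implicit leading 1 is not stored
    return sign, exponent, mantissa

print(float_bits(1.0))   # -> (0, 127, 0): +1.0 * 2^(127-127)
print(float_bits(-2.5))  # sign=1, exponent=128, mantissa=0x200000: -1.01 (binary) * 2^1
```

Not a four-bits-per-decimal-digit scheme in sight.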

I could tell a ton of other examples of little disconnects with reality, which I think are caused by not ever having to put your knowledge into practice.

Where do we go from here?

Now of course, the university I went to is not something that would be considered “good” by world standards. I went to several lectures by Henrik Wann Jensen at DTU and that was like night and day! But how many of these not-too-good-only-passable universities are around the world? I’d imagine certainly more than one, and certainly more than the number of MITs, Stanfords et al combined.

As a student, I somehow figured out that I should take a lot of things with a grain of salt. And in a lot of cases, trying to do something for real trumps lab work / tests / exams in how much you’ll be able to learn. Go make a tech demo, a small game, play around with some techniques, try to implement that clever sounding paper from SIGGRAPH and watch it burst into flames, team up with friends while doing any of the above. Do it!


Mobile graphics API wishlist: some features

In my previous post I talked about things I’d want from OpenGL ES 2.0 in the performance area. Now it’s time to look at what extra features it might expose with an extension here or there.

Note that I’m focusing on what are, in my limited understanding, low-hanging fruit: features that already exist in current GPUs or platforms, or could easily be made available. Of course more radical new architectures would bring more & fancier features, but that’s a topic for another story.

Programmable blending

At least two out of three big current mobile GPU families (PVR SGX, Adreno, Tegra 2) support programmable blending in hardware. Maybe all of them do and I just don’t have enough data. By “support it in the hardware” I mean either: 1) the GPU has no blending hardware, and the drivers add “read current pixel & blend” instructions to the shaders, or 2) the GPU has blending hardware for commonly used modes, but fancier modes use shader patching with no severe performance penalty.

Programmable blending is useful for various things; from deferred-style decals (blending normals is hard in fixed function!) to fancier Photoshop-like blend modes to potentially faster single-pixel image postprocessing effects (like color correction).

Currently only NVIDIA exposes this capability via NV_shader_framebuffer_fetch extension.

Suggestion: expose it on other hardware that can do this! It’s fine to not handle hard edge cases (for example, what happens when multisampling is used?), we can live with the limitations.

Direct, fast access to frame buffer on the CPU

Most (all?) mobile platforms use a unified memory approach, where there’s no physical distinction between “system memory” and “video memory”. Some of those platforms are slightly unbalanced, e.g. a strong GPU coupled with a weak CPU or vice versa. More and more of these systems will have multicore CPUs. It might make sense to take an approach similar to what PS3 guys are doing these days – offload some of the GPU work to the CPU(s).

Image processing, deferred lighting and similar things could be done more efficiently on a general purpose CPU, where you aren’t limited to “one pixel at a time” model of current mobile GPUs.

Suggestion: can haz a pointer to framebuffer memory, perhaps? Of course this is grossly oversimplifying all the synchronization & security issues, but it should be possible to do something to exploit the unified memory model. Right now it just sits there largely unused, with GLES2.0 still pretending the CPU is talking to the GPU over a ten meter high concrete wall.

Expose Tile Based GPU capabilities

PowerVR GPUs, found in all iOS and some Android devices, are so-called “tile based” architectures. So, to some extent, is the Qualcomm Adreno family.

Currently this capability mostly sits inside a black box. On PowerVR GPUs the programmer does know that “overdraw of opaque objects does not matter”, or that “alpha testing is really slow”, but that’s about it. There’s no control over the whole rendering process, even though some things could benefit from having more control over the whole tiling machinery.

Take, for example, deferred lighting/shading. The cool folks are doing it tile-based already on DirectX 11 or PS3.

On a tile-based GPU, all rendering already happens in tiles, so what if we could say “now, you work on this tile, render this, render that; now we go to this tile”? Maybe that way we could achieve two things at once: 1) better light culling because it’s at tile level, and 2) most of the data could stay in the super-fast on-chip memory, without having to be written into system memory & later read back again. Memory bandwidth is very often a limiting factor in mobile graphics performance, and the ability to keep deferred lighting buffers on-chip through the whole process could cut bandwidth requirements a lot.
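To get a feel for the numbers (all of them illustrative assumptions: a 960×640 screen, two 32-bit intermediate buffers such as normals+depth and accumulated lighting, 30 frames per second), keeping those buffers on-chip avoids roughly this much off-chip traffic:

```python
width, height = 960, 640        # iPhone 4-class resolution (assumption)
bytes_per_pixel = 4             # one 32-bit intermediate buffer
buffers = 2                     # e.g. normals+depth, accumulated lighting
fps = 30

# Each off-chip round trip is one write plus one read of every pixel.
per_frame = width * height * bytes_per_pixel * buffers * 2
per_second = per_frame * fps

print(per_frame // (1024 * 1024), "MB per frame")     # ~9 MB
print(per_second // (1024 * 1024), "MB per second")   # ~281 MB/s of traffic avoided
```

Against the modest memory bandwidth of a mobile SoC, hundreds of megabytes per second is not small change.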

Suggestion: somehow (I’m feeling very hand-wavy today) expose more control over tiled rendering. For example, explicitly say that rendering will only happen to the given tiles; and these textures are very likely to be read just after they are rendered into - so don’t resolve them to memory if they fit into on-chip one.

There’s already a Qualcomm extension going in that direction – QCOM_tiled_rendering – though it seems to be more concerned with where rendering happens. More control is needed over how to mark FBO textures as “keep in on-chip memory for sampling as a texture plz”.

OpenCL

Current mobile GPUs already are, or very soon will be, OpenCL capable. OpenCL can also be implemented on the CPU, nicely SIMDified via NEON, and make use of multiple cores. DO WANT! (and while you’re at it, do everything possible to make interop between CL & GL faster)

This can be used for a ton of things; skinning, culling, particles, procedural animations, image postprocessing and so on. And with a much less restrictive programming model, it’s easier to reuse computation results across draw calls or frames.

Couple this with “direct access to memory on the CPU” and OpenCL could be used for more things than graphics (again I’m grossly oversimplifying here and ignoring the whole synchronization/latency/security elephant…).

MOAR?

Now of course there are more things I’d want to see, but for today I’ll take just those above, thank you. Have a nice day!


Mobile graphics API wishlist: performance

Most mobile platforms are currently based on OpenGL ES 2.0. While it is much better than traditional OpenGL, there are ways in which it limits performance or fails to expose some interesting hardware features. So here’s an unorganized wishlist for the GLES2.0 performance side!

Note that I’m focusing on what are, in my limited understanding, short-term low-hanging fruit: ways to extend/patch the existing GLES2.0 API. A pipe dream would be starting from scratch, getting rid of all the OpenGL baggage and hopefully coming up with a much cleaner, leaner & better API, especially if it’s designed to only support some particular platform. But I digress; back to GLES2.0 for now.

No guarantees when something expensive might happen.

Due to some of the flexibility in GLES2.0, expensive things might happen at almost any point in your frame. For example, binding a texture with a different format might cause the driver to recompile a shader at draw call time. I’ve seen 60 milliseconds spent on an iPhone 3GS at the first draw call with a relatively simple shader, all inside the shader compiler backend. 60 milliseconds! Various things can cause performance hiccups like this: texture formats, blending modes, vertex layout, non power of two textures and so on.

Suggestion: work with GPU vendors and agree on an API that could make guarantees on when the expensive resource creation / patching work can happen, and when it can’t. For example, somehow guarantee that a draw call or a state set will not cause any object recreation / shader patching in the driver. I don’t have much experience with D3D10/11, but my impression is that this was one of the things it got right, no?

Offline shader compilation.

GLES2.0 has the functionality to load binary shaders, but it’s not mandatory. Some of the big platforms (iOS, I’m looking at you) just don’t support it.

Now of course, a single platform (like iOS or Android) can have multiple different GPUs, so you can’t fully compile a shader offline into final optimized GPU microcode. But some of the full compilation cost could very well be done offline, without being specific to any particular GPU.

Suggestion: come up with a platform independent binary shader format. Something like D3D9 shader assembly is probably too low level (it assumes a vector4-based GPU, limited number of registers and so on), but something higher level should be possible. All of the shader lexing, parsing and common optimizations (constant folding, arithmetic simplifications, dead code removal etc.) can be done offline. It won’t speed up shader loading by an order of magnitude, but even if it’s possible to cut it by 20%, it’s worth it. And it would remove a very big bug surface area too!
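Constant folding is a good example of work that needs zero GPU-specific knowledge and could happen entirely offline. A toy sketch of the idea on a tiny expression tree (nothing to do with any real shader compiler):

```python
# An expression is either a number, a variable name, or (op, left, right).
def fold(expr):
    """Recursively evaluate any subtree whose operands are all constants."""
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr
    a, b = fold(a), fold(b)
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return {"+": a + b, "*": a * b}[op]
    return (op, a, b)

# color * (2.0 * 0.5) folds down to color * 1.0 before any GPU sees it:
print(fold(("*", "color", ("*", 2.0, 0.5))))  # -> ('*', 'color', 1.0)
```

A real frontend would fold much more (and e.g. drop the remaining multiply-by-one), but the point stands: this pass never needs to know which GPU will run the shader.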

Texture loading.

A lot (all?) of mobile platforms have unified CPU & GPU memory; however, to actually load a texture we have to read or memory map it from disk and then copy it into OpenGL via glTexImage2D and similar functions. Then, depending on the format, the driver internally does swizzling and alignment of the texture data.

Suggestion: can’t most of this cost be removed? If for some formats it’s perfectly, statically known what layout and swizzling the GPU expects… can’t we just point the API at the data we already loaded or memory mapped? We would still need the glTexImage2D path for when (if ever) a totally new strange GPU comes along that needs the data in a different order, but why not provide a faster path for current GPUs?

Vertex declarations.

In unextended GLES2.0 you have to make a ton of calls just to set up vertex data. OES_vertex_array_object is a step in the right direction, providing the ability to create sets of vertex data bindings (“vertex declarations” in D3D speak). However, it builds upon the existing API, resulting in something that feels quite messy. It feels like starting from scratch could produce something much cleaner. Like… the vertex declarations that have existed in D3D since forever, maybe?

Suggestion: clean up that shit! It would probably need to be tied to a vertex shader input signature (just like in D3D10/11) to guarantee there would be no shader patching, but we’d be fine with that.

Shader uniforms are per shader program.

Just what it says: shader uniforms (“constants” in D3D speak) are not global; they are tied to a specific shader program. I don’t quite understand why, and I don’t think any GPU works that way. This causes complexity and/or performance loss in the driver (it either has to save & restore all uniform values on each shader change, or do dirty tracking of which uniforms have changed, etc.). It also causes unneeded uniform sets on the client side - instead of setting, for example, the view*projection matrix just once per frame, it has to be set for each shader program that we use.

Suggestion: just get rid of that? If you need to not break the existing spec, how about adding an extension to make all uniforms global? I propose glCanHaz(GL_OES_GLOBAL_UNIFORMS_PLZ)
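The client-side cost is easy to see just by counting uniform updates. A toy model with made-up numbers (200 programs per frame, four shared values like the view*projection matrix):

```python
programs = 200                  # shader programs used in a frame (assumption)
shared = ["viewProj", "cameraPos", "time", "fogParams"]  # values common to all

# GLES2.0 model: uniforms belong to each program, so every shared value
# must be re-uploaded for every program that uses it.
per_program_uploads = programs * len(shared)

# Hypothetical global-uniform model: each shared value is set once per frame.
global_uploads = len(shared)

print(per_program_uploads, "uploads vs", global_uploads)  # 800 vs 4 per frame
```

Two orders of magnitude fewer API calls per frame, before even counting whatever bookkeeping the driver does behind each one.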

Next up:

Next time, I’ll take a look at my unorganized wishlist for mobile graphics features!


A Non-Uniform Work Distribution

Warning: a post with stupid questions and no answers whatsoever!

You need to do ten thousand things for the gold master / release / ShipIt(tm) moment. And you have 40 people who do the actual work… this means each of them only has to do 10000/40=250 things, which is not that bad. Right?

Meanwhile in the real world… it does not actually work like that. And that’s something that has been on my mind for a long time. I don’t know how much of this is truth vs. perception, or what to do about it. But here’s my feeling, simplified:

20 percent of the people are responsible for getting 80 percent of the work done

I am somewhat exaggerating just to keep it consistent with the Pareto principle. But my feeling is that the “work done” distribution is highly non-uniform everywhere I’ve worked where the team was more than a handful of people.

Here are some stupid statistics to illustrate my point (with graphs, and everyone loves graphs!):

Graph of bugs fixed per developer, over one week during the bug fixing phase. Red/yellow/green corresponds to priority 1,2,3 issues:

The distribution of bug fixes is, shall we say, somewhat non-uniform.

Is it a valid measure of “productivity”? Absolutely not. Some people probably weren’t fixing bugs at all that week. Some bugs are way harder to fix than others. Some people could have done the major part of a fix, while the finishing touches & the act of actually resolving the bug were done by someone else. So yes, these statistics are absolutely flawed, but do we have anything else?

We could be checking version control commits.

Or putting the same into “commits by developer”:

Of course this is even easier to game than resolving bugs. “Moving buttons to the left”, “Whoops, that was wrong, moving them to the right again”, anyone? And people will troll the statistics just because they can.
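Flawed or not, counts like these make the skew easy to quantify. With a hypothetical commits-per-developer distribution for a ten-person team (numbers entirely made up):

```python
# Hypothetical commits per developer over some period, for a 10-person team.
commits = [120, 95, 60, 30, 22, 18, 15, 12, 10, 8]

commits.sort(reverse=True)
top = commits[: len(commits) // 5]   # the top 20% of developers (2 people)
share = sum(top) / sum(commits)

print(f"top 20% of developers made {share:.0%} of commits")  # -> 55% here
```

Not quite 80/20 in this made-up data, but far from uniform either; the same three lines applied to real bug-tracker or version-control exports would tell you how skewed your own team looks.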

However, there is still this highly subjective “feeling” that some folks are way, way faster than others. And not in just “can do some mess real fast” way, but in the “gets actual work done, and done well” way.

Or is it just my experience? How is it in your company? What can be done about it? Should something be done about it? I don’t know the answers…