Improving C#/Mono for Games

A tweet by Michael Hutchinson on C#/Mono usage in games prompted a couple of short replies from me (one, two). But then I started thinking a bit more, so here’s a longer post on what is needed for C# (and more specifically Mono) to be used more widely in games.

In Unity we use Mono for game code (well, Unity users write that code, not us). Overall it’s great: it has tons of advantages, loads of awesome and a flying ninja here and there. But no technology is perfect, right?

Edit: Miguel rightly points out in the comments that the Mono team is solving, or has already solved, some of these issues. In some areas they are moving so fast that we at Unity can’t keep up!


#1: Garbage Collector

Most game developers do not like Garbage Collection (GC) very much. Typically, the more limited/hardcore their target platform is, the more they dislike GC. The reason? Most GC implementations cause rather unpredictable spikes.

Here’s a run of something recorded in the (awesome) Unity 2.6 profiler. The horizontal axis is time; the vertical axis is CPU time spent in each frame:
[Figure: Unity 2.6 profiler graph showing garbage collection spikes]

At the bottom you see dark red thingies appearing once in a while. This is the garbage collector kicking in, because some script code allocates memory at runtime.

Now of course, it is possible to write your script code so that it does no allocations (or almost none): preallocate your objects into pools, manually invoke the GC at moments when a small hiccup won’t affect gameplay, and so on. In fact, a lot of iPhone games made with Unity do exactly that.
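A minimal C# sketch of the pooling approach, assuming a hypothetical `Bullet` type that the game recycles instead of allocating. `ObjectPool<T>` and `GcScheduling.CollectAtSafePoint` are illustrative names, not Unity or Mono APIs; only `GC.Collect` and `GC.WaitForPendingFinalizers` are standard .NET calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical game object that gets recycled rather than re-allocated.
public sealed class Bullet
{
    public float X, Y;
    public bool Active;
}

// Minimal object pool: instances are created up front and recycled, so the
// steady-state game loop performs no heap allocations for this type.
public sealed class ObjectPool<T> where T : class, new()
{
    private readonly Stack<T> _free;

    public ObjectPool(int preallocate)
    {
        _free = new Stack<T>(preallocate);
        for (int i = 0; i < preallocate; i++)
            _free.Push(new T());
    }

    public int FreeCount { get { return _free.Count; } }

    // Hands out a pooled instance; only allocates if the pool has run dry.
    public T Get()
    {
        return _free.Count > 0 ? _free.Pop() : new T();
    }

    // Puts an instance back. The caller is responsible for resetting its state.
    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        _free.Push(item);
    }
}

// Hypothetical hook: collect only when a hiccup is acceptable,
// e.g. during a level transition or while a pause menu is open.
public static class GcScheduling
{
    public static void CollectAtSafePoint()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
```

In a game loop, `Get` and `Return` take the place of `new` for bullets, and `CollectAtSafePoint` would be called from something like a loading screen. The pool only stays allocation-free if it is sized for the peak number of live objects.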

But that kind of sidesteps the whole advantage of “a garbage collector almost frees you from doing memory management”. If you’re not allocating anything anyway, the GC might as well not be there!

A little side story. Unity’s iPhone tech lead ReJ and I tried to explain what GC is to a non-programmer. Here’s what we came up with:

Garbage Collection is this cleaning service for lazy people. They can just leave any garbage on the floor in their house, and once in a while a garbage guy comes, collects all the garbage and takes it outside. Now, there are some intricacies in the service.

First, you never know when the garbage guy will come. You might be taking a shower, meditating or having some “sexy time” – and it’s in the service agreement that when the garbage guy comes, you have to let him in to do his work.

Second, the garbage guy is usually some homeless drunkard. He smells so bad that when he comes, you have to stop whatever you were doing, go outside and wait until he’s done collecting the garbage. Even your neighbors, who might be doing something else entirely in parallel, have to stop and idle while garbage is being collected in your house!

There are variations of this GC service. One variation is called “moving GC”, where the garbage guy also rearranges your furniture while collecting the garbage – he moves it all to one side of your house. This is so that you can buy a bigger piece of furniture, or drop a huge piece of garbage on the floor – and there will be enough unused space for you to do that! Of course, this way the GC process takes somewhat longer, but hey, you get all your stuff nicely packed into one corner.

Can’t you see that this service is the greatest idea of all time?

This is quite a harsh attitude towards GC, and of course it’s exaggerated. But there is some truth to it. So how could GC be fixed?

GC fix #1: more control

More explicit control over when and for how long the GC runs. I want to tell the garbage guy, “come every day at 4PM and do your work for 20 minutes”. In the game, I’d want to call the GC with an upper time limit, say 1 millisecond per call, and I would be calling it 30 times per second.
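The sketch below does not implement a time-limited collector; it only shows the time-slicing idea at the application level. Deferred cleanup work is queued and drained under a per-call budget measured with `System.Diagnostics.Stopwatch`. `BudgetedWorkQueue` and `Step` are hypothetical names, and a real incremental GC works very differently inside the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical application-level analogue of "run for at most N ms per call":
// cleanup work is queued and drained a little at a time, once per frame.
public sealed class BudgetedWorkQueue
{
    private readonly Queue<Action> _pending = new Queue<Action>();

    public int PendingCount { get { return _pending.Count; } }

    public void Enqueue(Action work)
    {
        if (work == null) throw new ArgumentNullException("work");
        _pending.Enqueue(work);
    }

    // Runs queued items until the queue is empty or the budget is used up.
    // The budget is checked between items, so one long item can overrun it.
    // Returns the number of items executed in this call.
    public int Step(TimeSpan budget)
    {
        var timer = Stopwatch.StartNew();
        int done = 0;
        while (_pending.Count > 0 && timer.Elapsed < budget)
        {
            Action work = _pending.Dequeue();
            work();
            done++;
        }
        return done;
    }
}
```

Called once per frame with `TimeSpan.FromMilliseconds(1)` at 30 frames per second, this spends roughly 30 ms per second (about 3% of the CPU) on cleanup, plus any overrun from the last item of each step.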

GC fix #2: sometimes I want to clean garbage myself

The inefficiency and unpredictability of GC cause people to do even more work than they would with normal, old-school memory allocation. Why not provide an option to deal with deallocations manually? For example, a keyword reallynew could allocate an object that is not part of the garbage-collected world. It would function as a regular .NET object, except that it would be the user’s responsibility to reallydelete it.

Mono is already extending .NET (see SIMD and continuations). Maybe it makes sense to add some way to bypass the garbage collector?
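Until something like reallynew exists, the closest available tool is memory the GC never sees: `Marshal.AllocHGlobal` and `Marshal.FreeHGlobal`, which Miguel also suggests in the comments for large buffers. The sketch wraps them in an `IDisposable`; the class name `UnmanagedBuffer` is hypothetical. Note the limits: the payload is raw bytes rather than a regular .NET object, and the small wrapper object itself is still garbage collected.

```csharp
using System;
using System.Runtime.InteropServices;

// Raw memory outside the garbage-collected heap, allocated from the native
// heap and released explicitly: Dispose plays the role of "reallydelete".
public sealed class UnmanagedBuffer : IDisposable
{
    private IntPtr _ptr;
    private readonly int _size;

    public UnmanagedBuffer(int sizeInBytes)
    {
        if (sizeInBytes <= 0) throw new ArgumentOutOfRangeException("sizeInBytes");
        _size = sizeInBytes;
        _ptr = Marshal.AllocHGlobal(sizeInBytes);
    }

    public IntPtr Pointer { get { return _ptr; } }
    public int Size { get { return _size; } }

    public void WriteInt32(int offset, int value)
    {
        CheckRange(offset, 4);
        Marshal.WriteInt32(_ptr, offset, value);
    }

    public int ReadInt32(int offset)
    {
        CheckRange(offset, 4);
        return Marshal.ReadInt32(_ptr, offset);
    }

    // Frees the memory immediately and deterministically; safe to call twice.
    public void Dispose()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);
            _ptr = IntPtr.Zero;
        }
    }

    private void CheckRange(int offset, int count)
    {
        if (_ptr == IntPtr.Zero) throw new ObjectDisposedException("UnmanagedBuffer");
        if (offset < 0 || offset + count > _size) throw new ArgumentOutOfRangeException("offset");
    }
}
```

Because the class deliberately has no finalizer, forgetting to call `Dispose` leaks the memory, exactly as a forgotten `reallydelete` would.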

#2: Distribution Size

Using C#/.NET in a game requires having a .NET runtime. None of the interesting platforms are guaranteed to have one, and even on Windows you can’t count on it being present. Mono is great here in the sense that it runs on many more platforms than Microsoft’s own .NET. It’s also great on distribution size, but only if you compare it to Microsoft’s .NET.

In Unity Web Player, we package the Mono DLL + mscorlib assembly into something like 1.5 megabytes (after LZMA compression). That is great compared to the 20+ megabytes of the .NET runtime, but not that great if you compare it to, say, the Lua runtime (which is less than 100 kilobytes).

On some platforms (iPhone, Xbox 360, PS3, …) it’s not possible to generate code at runtime, so Mono’s JIT does not work. All code written in C# has to be precompiled to machine code ahead of time (AOT compilation). This is not a problem per se, but because the .NET Framework was never designed with small size and few dependencies in mind, doing anything ultimately pulls in a lot of code.

We joke that doing anything in C# will result in an XML parser being included somewhere. This is not that far from the truth; e.g. calling float.ToString() will pull in the whole internationalization system, which probably somewhere needs to read some global XML configuration file to figure out whether daylight saving time is active when the Eastern European Brazilian Chinese calendar is used.

Size fix #1: custom core .NET libraries?

For game uses, most of the “fat” stuff in the .NET runtime is not really needed. float.ToString() could just always use a period as the decimal separator. Core libraries could consist of just the essential collections (list, array, hash table) and maybe a String class with only the essential methods. Maybe it’s worth sacrificing some of the generality of .NET if that could shave a couple of megabytes off your iPhone game size?
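A sketch of what such a trimmed-down float formatter might look like, assuming a fixed number of decimals is acceptable for game use. In the standard library, `value.ToString(CultureInfo.InvariantCulture)` also always yields a period, but it still goes through the globalization machinery; the hypothetical `GameFloatFormat.Format` below emits digits by hand and never touches culture data. It rounds half away from zero and rejects values too large for a `long` after scaling.

```csharp
using System;
using System.Text;

// Hypothetical helper for a trimmed-down "game corlib": fixed decimals,
// always '.' as the separator, digits emitted by hand (no culture lookups).
public static class GameFloatFormat
{
    public static string Format(float value, int decimals)
    {
        if (decimals < 0 || decimals > 9) throw new ArgumentOutOfRangeException("decimals");
        if (float.IsNaN(value)) return "NaN";
        if (float.IsInfinity(value)) return value > 0 ? "Infinity" : "-Infinity";

        long scale = 1;
        for (int i = 0; i < decimals; i++) scale *= 10;

        // Round half away from zero on the magnitude.
        double magnitude = Math.Abs((double)value) * scale;
        if (magnitude >= 9.0e18) throw new ArgumentOutOfRangeException("value");
        long scaled = (long)Math.Floor(magnitude + 0.5);

        var sb = new StringBuilder(24);
        if (value < 0 && scaled != 0) sb.Append('-');
        AppendDigits(sb, scaled / scale, 1);
        if (decimals > 0)
        {
            sb.Append('.');
            AppendDigits(sb, scaled % scale, decimals);
        }
        return sb.ToString();
    }

    // Appends the decimal digits of a non-negative number,
    // left-padded with zeros to at least minDigits characters.
    private static void AppendDigits(StringBuilder sb, long n, int minDigits)
    {
        var buf = new char[20];
        int len = 0;
        do
        {
            buf[len++] = (char)('0' + (int)(n % 10));
            n /= 10;
        } while (n > 0);
        while (len < minDigits) buf[len++] = '0';
        for (int i = len - 1; i >= 0; i--) sb.Append(buf[i]);
    }
}
```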

Of course this is very much doable; “all that is needed” ™ is writing a custom mscorlib+friends and telling the C# compiler (via its nostdlib option) never to reference any of the “real” libraries.

Size fix #2: make Mono runtime smaller

The uncompressed Mono DLL in our Windows build is 1.5 megabytes. We have turned off all the easy stuff (profiler, debugger, logging, COM, AOT etc.), but probably more could be stripped away. Do our games really need multiple AppDomains? Some fancy marshalling? I don’t know, it just feels that 1.5MB is a lot.

#3: Porting to New Platforms

You know this classic: “There’s no portable code. There’s only code that’s been ported.”

Most existing gaming platforms are quite weird. Most upcoming smartphone platforms are also quite weird, each in its own interesting way. Porting a large project like Mono is not easy, especially since parts of it (the JIT or AOT engine) depend heavily on the platform.

For Unity iPhone, the unexpected discovery that it’s not possible to JIT on the iPhone delayed the initial release by something like 4 months. It did not help that JIT actually worked in early iPhone SDK builds, and Apple only disabled runtime-generated code later. Making Mono actually work there required significant work both from the Mono team and from Unity. We still have one guy working almost exclusively on Mono+iPhone issues!

Of course, maybe all the Mono iPhone work made porting to new platforms easier as a byproduct. But so far we haven’t ported Mono to any other platform at production quality. So, judging from experience, we now always assume a Mono port will be a pain, just because “some nasty surprises will come up” (and they always do).

#4: Small Stuff

There are tons of small bits where extending .NET would benefit gaming scenarios. For example:

Suppose there is some array on the native engine side, for example vertex positions in a mesh (3 floats per vertex). Is it possible to have that piece of memory represented as a struct array on the .NET side, so that no extra memory copies are involved and N vertices somewhere in memory look just like Vector3[N] to C#?
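Inverting the ownership gets most of the way there with existing APIs, along the lines Pat suggests in the comments: C# allocates the `Vector3[]`, pins it with `GCHandle`, and hands its address to the engine, so both sides read and write the same memory with no copies. The `Vector3` struct here is a stand-in for the engine’s own type, and `PinnedVertexArray` is a hypothetical name. Viewing memory that the native side owns would instead require unsafe code with `Vector3*` pointers, which gives pointer access rather than a real `Vector3[]`.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for a game-math vector: three floats, 12 bytes, blittable.
[StructLayout(LayoutKind.Sequential)]
public struct Vector3
{
    public float x, y, z;
}

// Owns a managed Vector3[] and keeps it pinned, so native engine code can
// read and write the same memory that C# sees as a plain array.
public sealed class PinnedVertexArray : IDisposable
{
    private readonly Vector3[] _vertices;
    private GCHandle _handle;

    public PinnedVertexArray(int count)
    {
        _vertices = new Vector3[count];
        _handle = GCHandle.Alloc(_vertices, GCHandleType.Pinned);
    }

    public Vector3[] Vertices { get { return _vertices; } }

    // Address of the first element, suitable for passing to native code.
    public IntPtr Address { get { return _handle.AddrOfPinnedObject(); } }

    // Unpins the array; the native side must not use Address afterwards.
    public void Dispose()
    {
        if (_handle.IsAllocated) _handle.Free();
    }
}
```

Long-lived pinned arrays prevent a moving collector from compacting around them, so this pattern suits a few large buffers better than many small ones.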

On a similar note, “strided arrays” would be useful. Mesh data is often interleaved, so for each vertex there is a position, a normal, UVs and so on. It would be cool if in C# the position array still looked like Vector3[N], while internally the distance between elements was larger than the 12 bytes required for a Vector3.
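Without runtime support, a strided array can be approximated by a view type whose indexer computes `offset + i * stride` into the interleaved buffer. This hypothetical `StridedVector3View` works over a `float[]` with offset and stride measured in floats; the `Vector3` struct is again a stand-in. Unlike a real `Vector3[N]`, the indexer returns copies, so `view[i].x = 1` does not compile and each element has to be assigned as a whole.

```csharp
using System;

// Stand-in for a game-math vector.
public struct Vector3
{
    public float x, y, z;
}

// Hypothetical view over an interleaved vertex buffer: element i starts at
// float index offset + i * stride, so positions (or normals) look like an
// array even though other attributes sit between them.
public struct StridedVector3View
{
    private readonly float[] _data;
    private readonly int _offset; // in floats, to the first element
    private readonly int _stride; // in floats, between consecutive elements
    private readonly int _count;

    public StridedVector3View(float[] data, int offset, int stride, int count)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (stride < 3) throw new ArgumentOutOfRangeException("stride");
        if (offset < 0 || count < 0) throw new ArgumentOutOfRangeException("offset");
        if (count > 0 && offset + (count - 1) * stride + 3 > data.Length)
            throw new ArgumentException("View extends past the end of the buffer.");
        _data = data;
        _offset = offset;
        _stride = stride;
        _count = count;
    }

    public int Length { get { return _count; } }

    public Vector3 this[int i]
    {
        get
        {
            int b = Index(i);
            return new Vector3 { x = _data[b], y = _data[b + 1], z = _data[b + 2] };
        }
        set
        {
            int b = Index(i);
            _data[b] = value.x;
            _data[b + 1] = value.y;
            _data[b + 2] = value.z;
        }
    }

    private int Index(int i)
    {
        if ((uint)i >= (uint)_count) throw new IndexOutOfRangeException();
        return _offset + i * _stride;
    }
}
```

For a layout of position (3 floats), normal (3 floats) and UV (2 floats), the stride is 8 floats (32 bytes); positions use offset 0 and normals offset 3 over the same buffer.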

Where do we go from here?

The above are just random ideas, and I’m not complaining about Mono. It is great! It’s just not perfect. Mono being open source is a very good thing: pretty much any interested party can improve it as needed. So rock on.

14 Responses to 'Improving C#/Mono for Games'

  1. Seon R

    Interesting read Aras. Explains a bit about what you guys have ahead of you in terms of challenges/frustration in moving Unity to new platforms/consoles.

    Keep up the great work!

  2. Miguel de Icaza

    Aras, you guys keep using the general-purpose Mono, which is why you end up with XML and configuration libraries; I explained to Joe at GDC earlier this year that you should use the Silverlight profile, which has none of these dependencies. You only have yourself to blame.

    As for comparing Mono vs Lua: they can’t be compared; one is a compiled system, the other is an interpreter. If you want interpreter speeds, by all means use an interpreter, and it will be a lot easier to port.

    As for the structure representation, yes, there are ways of designing your structures and your marshalling to avoid any conversions. Send me email or post to the mono-devel list and I will be happy to tell you how to do it based on your own structures.
    Finally, as I have suggested to Lucas and you guys, the Silverlight profile has fewer cross dependencies and it works better with the Mono linker. Our linked applications using the generics corlib are smaller than your hand-chopped mscorlib, and there is still a lot of room for improvement. The linker is the smarter way of working, as opposed to the tedious and error-prone approach that you have used so far. I realize that “we need it fast and we need it today” gets in the way of using the smarter solution, but it has been entirely your choice.

    On manual memory management: there is a partial solution; you need your objects to implement IDisposable and then call Dispose on them. It won’t take care of everything, but it will help. You can also allocate memory manually using the Marshal class and release it yourself; this is useful in particular for large buffers.

    Another trick that you can use to avoid the jittery GC is to enable incremental GC at runtime. I would like you to try that and follow up with the results, as it seems promising.

  3. Simon

    Good summary. I’m curious: was Lua ever considered for Unity? I always thought it would have been a great fit.

  4. Jedd Haberstro

    There is one thing I think you’re missing: an easy-to-use tool to expose C++ to Mono. I know you guys have written your own internal tool to do this so it’s not a problem for you, but SWIG generates crap and P/Invoke is just too tedious.

  5. Pat

    While I agree in principle about the garbage collector problems, I disagree that the tools do not exist already to work around some of the GC issues. The GC is only something of a black-box, but it is not entirely opaque. Unless Unity performs much differently, there should be a pattern to the GC operations, and what is triggering them. In my opinion, solving the problem at that level is a much better approach than trying to bypass the GC via a Mono extension.

    You should be able to do arrays without copying by either allocating a chunk of memory from the GC, or by allocating an array, pinning it, and using ‘UnsafeAddrOfPinnedArrayElement’. I have been using the second method but I haven’t done performance measurements. I could be totally wrong.

    I spent a significant chunk of time putting the .NET/C interop stuff through its paces a few months ago, and my findings were that there are lots of ways to do it, and many of them are bad. Being able to examine the IL code really helps with a lot of things, I have found. I have also found that if the API is causing excessive marshaling time, it may be necessary to change the API.

  6. ReJ

    Regarding GC fix #2: right now we have 2 lifetime “types” of objects – on-stack structs (lifetime equal to the scope of the function) or GC-managed objects with arbitrary lifetime. That is OK for generic applications; however, games usually have a lot of objects whose lifetime is longer than the scope of a single function, but shorter than a single frame – events, temporary buffers which gather and retain data until it is processed (for example rendered) during the course of a frame, callback arguments, etc.

    I think it would be a rather nice extension – the possibility to allocate objects with a “single frame” lifetime; collection would then be a simple pointer move at the end of the frame.

    However, protecting the user from accessing a destroyed object during the next frame could be tricky (at least with BoehmGC) – most probably that would require proxy pointers, or objects would have to store “timestamps” for sanity checks.

  7. Aras Pranckevičius

    @Miguel: all very valid points. I am aware that you have moved a lot to reduce distribution size for Moonlight etc. I’ve updated my post to clarify that some of my observations are based on old stuff.

  8. steve

    I don’t understand why more people don’t use Python. It’s small, cross-platform, has built-in JIT (rather than an add-on like in Lua), garbage collector is pretty controllable (can be turned on and off and invoked directly).

    I must admit that I’m still not hugely fond of Python’s fixed source tabulation, but I can live with that given how much more mature and packed with features it is than so many alternatives.

  9. Sam Martin

    @aras. Some nice points. Spiky garbage collection is a real problem in all GC’d languages I can think of. I’m not even sure there is a GC’d language that doesn’t suffer from it?

    @Steve I thought quite a few people are using Python now? Probably not as many as Lua, but then again Lua is utterly trivial to integrate and get up and running. I haven’t used much Python myself, but the usual complaints I hear are:
    - tricky/time consuming to get properly integrated
    - has some nasty performance cases
    Are these fair? Is mono easier or better in these regards?

  10. Aras Pranckevičius

    @steve: before Unity 1.0, they used Python. This was way before I joined, but some of the reasons they switched to Mono were: 1) speed, 2) much better embedding, 3) size. Of course all that was around 2003-2004, so everything might have changed since then. Do you know what Python’s embedding size is?

  11. Miguel de Icaza

    Python’s embeddable runtime is half the size of Mono’s embeddable runtime (1.2 megs vs 2.4 megs). In Python’s case there is no JIT engine, although there are separate independent projects that provide JIT engines (Jython, IronPython, PyPy).

    The other difference is that Python remains a dynamic language at its core, so even with JIT engines they do not match the performance of statically compiled languages like C# or UnityScript.

  12. Pat

    I still disagree on the GC extensions. I think the GC is what it is, and it works fine for many cases. Aras mentioned object pooling in the original post, and that is a great solution, especially for composition-based stuff. Instead of suggesting to the .NET runtime how we may like garbage collection to occur, I think the GC is just an uncontrolled element that is part of the environment. If we want to assert more control over it, I think the problem should be implemented at the application level, not the runtime. In my opinion, wanting hacks into the runtime for GC control is like wanting Microsoft to provide malloc hooks in their C runtime. We could either ask for the malloc hooks, or just code accordingly, and have our own hooks. Or it’s like using a micro-allocator for small, temporary allocations. The GC seems no different to me. This is non-desired, default, memory functionality…so we write our own code to perform the specific task in a more optimal way.

    Then again, I am a C programmer who writes C# like C++, and when all you have is a hammer, everything looks like a nail.

  13. kenpex

    I have to say, of course we’d all like a better GC.

    But it is not as big a deal as it might seem, or at least it’s not much worse than manual memory management.

    I can already see so many programmers looking at that graph and thinking that they’re indeed right when they think that GC sucks and C/C++ memory management is the key…

    I don’t think you’re right when you say that not allocating at runtime makes GC useless. GC is NOT a tool to avoid memory leaks, and it is NOT a tool for lazy programmers who do not want to write a free here and there. In fact, even in C those problems are not really problems: with a good debugging malloc, leaks are an easy bug to solve (fragmentation is the real bitch).

    GC is what makes it possible for objects to live across modules. In C++, to do that you write an RC (reference counting) system, and 99% of games do that. GC is better than RC at solving that problem; that’s its strength, not the leaks thing…

    As for the runtime cost… It can be high, but it’s important to note that even manual memory allocation/deallocation cost is high, and games usually avoid dynamic allocations at runtime, even without GC! That said, the situation is even worse if you factor in the RC time, which, as I said, most games use…

    So in summary… GC has a runtime overhead? True, but who cares, memory management overhead is too high and unpredictable anyway, even without GC. What’s the advantage of GC then? It’s a better RC system! Why are moving GCs cool? Mostly because they avoid fragmentation, and that’s the real problem!

  14. bobDole

    This is not a put-down on Aras, but he seems to deal with a lot of low-level hacky stuff, which probably has a big impact on the iPhone but much less on a computer. I generally agree with kenpex: C# uses a different programming model.

    That said, in theory being able to limit the duration of a GC call sounds like a good idea, while removing GC seems less optimal.
