Testing graphics code

Everyone is saying “unit tests for the win!” all over the place. That’s good, but how would you actually test graphics-related code? Especially considering all the different hardware and drivers out there, where the result might differ just because the hardware is different, or because the hardware/driver interprets your code in a funky way…

Here is how we do it at work. This took quite some time to set up, but I think it’s well worth it.

First you need hardware to test things on. For a start, just a couple of graphics cards that you can swap in and out might do the trick. Integrated graphics are a bigger problem: you can’t really swap them in and out, so we bit the bullet and bought a machine for each integrated chip that we care about. The same machines are then used to test discrete cards (we have several shelves of those by now, going all the way back to… do ATI Rage, Matrox G45 or S3 ProSavage mean anything to you?).

Then you make the unit tests (or perhaps these should be called functional tests). Build a small scene for every possible thing that you can imagine. Some examples:

  • Do all blend modes work?
  • Do light cookies work?
  • Does automatic texture coordinate generation and texture transforms work?
  • Does rendering of particles work?
  • Does glow image postprocessing effect work?
  • Does mesh skinning work?
  • Do shadows from point lights work?

This will result in a lot of tests, each one hopefully testing a small, isolated feature. Build a setup that can load all defined tests in succession and take screenshots of the results. Make sure time always progresses at a fixed rate (for tests that do not produce a static image, like particle or animation tests), and take the screenshot at, for example, frame 5 of each test, so that tests which need some history to warm up (a motion blur test, for example) have it.
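A minimal sketch of such a harness, written in Python purely for illustration. The `SceneCase` type and the `render_frame` and `capture` callbacks are hypothetical stand-ins for whatever your engine provides; the idea it shows is that simulated time is derived from the frame index, never from the wall clock, so frame 5 shows the same simulated moment on a fast machine and a slow one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

FIXED_DT = 1.0 / 30.0   # assumed simulation step per frame, in seconds
CAPTURE_FRAME = 5       # frame to screenshot; earlier frames let effects warm up


@dataclass
class SceneCase:
    """Hypothetical test scene: a name plus an update function of simulated time."""
    name: str
    update: Callable[[float], None]


def run_tests(scenes: List[SceneCase],
              render_frame: Callable[[SceneCase], None],
              capture: Callable[[SceneCase], bytes]) -> Dict[str, bytes]:
    """Load each scene in turn, step it with a fixed dt, and capture one frame."""
    shots: Dict[str, bytes] = {}
    for scene in scenes:
        for frame in range(CAPTURE_FRAME + 1):
            sim_time = frame * FIXED_DT   # deterministic: depends only on frame index
            scene.update(sim_time)
            render_frame(scene)
            if frame == CAPTURE_FRAME:
                shots[scene.name] = capture(scene)
    return shots
```

Computing the time as `frame * FIXED_DT` rather than summing steps also keeps floating-point drift identical across runs.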

By this point you have something that you can run and that spits out lots of screenshots. This is already very useful. Got a new graphics card, upgraded to a new OS or installed a shiny new driver? Run the tests, and obvious errors (if any) can be found just by quickly flipping through the shots. The same goes for changes made to rendering-related code: run the tests, see if anything broke.

The testing process can be further automated. Here we have a small set of Perl scripts that can either produce a suite of test images for the current hardware, or run all the tests and compare the results with a “known to be correct” suite of images. As graphics cards differ from each other, the “correct” results will be somewhat different (because of different capabilities, internal precision, etc.). So we keep a set of test results for each graphics card.
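The comparison step could look roughly like the following Python sketch (not the Perl scripts themselves). The directory layout, `reference/<card_id>/<test>.png` next to a `results/` folder, is an assumption for illustration, and the byte-for-byte check only makes sense if screenshots are saved in a deterministic, lossless format.

```python
import filecmp
from pathlib import Path
from typing import Dict


def compare_suite(results_dir: Path, reference_root: Path, card_id: str) -> Dict[str, str]:
    """Compare every screenshot in results_dir with the reference set for card_id.

    Returns test name -> "pass", "fail" or "missing reference".
    """
    reference_dir = reference_root / card_id   # one "known correct" set per card
    statuses: Dict[str, str] = {}
    for shot in sorted(results_dir.glob("*.png")):
        ref = reference_dir / shot.name
        if not ref.exists():
            statuses[shot.stem] = "missing reference"
        elif filecmp.cmp(shot, ref, shallow=False):   # compare file contents, not stat()
            statuses[shot.stem] = "pass"
        else:
            statuses[shot.stem] = "fail"
    return statuses
```

Reporting a missing reference separately from a failure makes it obvious when a new test was added but no correct image has been approved for that card yet.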

Then these scripts can be run for various driver versions on every graphics card. They compare results for each test case and, for failed tests, copy out the resulting screenshot and the correct screenshot, log the failures in a wiki-compatible format (to be posted on an internal wiki), etc.
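As a sketch of the reporting step: the snippet below copies each failing result/expected pair into a report folder and emits a table. The exact wiki format isn't specified here, so MediaWiki table syntax is an assumption, as are the file naming scheme and the `statuses` mapping (test name to "pass"/"fail").

```python
import shutil
from pathlib import Path
from typing import Dict


def report_failures(statuses: Dict[str, str], results_dir: Path, reference_dir: Path,
                    out_dir: Path, card: str, driver: str) -> str:
    """Copy result/reference pairs of failed tests into out_dir and return
    a MediaWiki-style table listing them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"== {card}, driver {driver} ==",
             '{| class="wikitable"',
             "! Test !! Result !! Expected"]
    for name, status in sorted(statuses.items()):
        if status != "fail":
            continue                       # only failures go into the report
        got, want = f"{name}-result.png", f"{name}-expected.png"
        shutil.copy2(results_dir / f"{name}.png", out_dir / got)
        shutil.copy2(reference_dir / f"{name}.png", out_dir / want)
        lines.append("|-")
        lines.append(f"| {name} || [[File:{got}]] || [[File:{want}]]")
    lines.append("|}")
    return "\n".join(lines)
```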

I’ve heard that some folks go even a step further and fully automate testing of all driver versions: install one driver in silent mode, reboot the machine; after the reboot another script launches the tests and proceeds with the next driver version. I don’t know if that is just an urban legend or if someone actually does this*, but it would be an interesting thing to try. The testing per card would then be: 1) install a card, 2) run the test script, 3) coffee break, happiness and profit!
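The tricky part of such a setup is surviving the reboot. One way to sketch it, assuming the script is started at every boot and keeps its progress in a JSON state file: the `install_silently`, `run_test_suite` and `reboot` callbacks are hypothetical placeholders for the OS- and vendor-specific commands.

```python
import json
from pathlib import Path
from typing import Callable


def run_next_driver(state_file: Path,
                    install_silently: Callable[[str], None],
                    run_test_suite: Callable[[str], None],
                    reboot: Callable[[], None]) -> bool:
    """One step of a reboot-surviving driver sweep; meant to run at every boot.

    The state file holds the remaining driver queue and the driver (if any)
    installed before the last reboot. Returns False once the queue is empty.
    """
    state = json.loads(state_file.read_text())
    installed = state.get("installed")
    if installed is not None:
        run_test_suite(installed)          # test the driver we just rebooted into
        state["installed"] = None
    if not state["queue"]:
        state_file.write_text(json.dumps(state))
        return False
    driver = state["queue"].pop(0)
    install_silently(driver)
    state["installed"] = driver
    state_file.write_text(json.dumps(state))   # persist progress before rebooting
    reboot()
    return True
```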

* My impression is that, at least with the big games, it works the other way around: you don’t test with the hardware; instead the hardware guys test with your game. That’s how it looks to a clueless observer like me, at least.

So far this unit test suite has been really helpful in a couple of ways: in making the just-announced Direct3D renderer, and in discovering new & exciting graphics card/driver workarounds that we have to do. Making the suite did take a lot of time, but I’m happy with it!

13 Responses to 'Testing graphics code'

  1. Roy

    Don’t some people also do ‘fuzzy compares’ with reference images? As in, you basically pick one card as the reference card, one that you’re certain renders properly. All other cards should then reasonably match the render of the reference card.

    That would save quite a bit of screenshot flipping.
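    A fuzzy compare along those lines might look like this Python sketch. Images are modeled as flat lists of RGB tuples to keep it dependency-free, and both thresholds are illustrative assumptions, not tuned values.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def fuzzy_match(image: Sequence[Pixel], reference: Sequence[Pixel],
                channel_tolerance: int = 8, max_bad_fraction: float = 0.001) -> bool:
    """Return True if image is 'close enough' to reference.

    A pixel is bad if any RGB channel differs by more than channel_tolerance;
    the images match if at most max_bad_fraction of all pixels are bad.
    """
    if len(image) != len(reference):
        return False                       # different resolution: no match
    if not image:
        return True
    bad = sum(
        1 for p, q in zip(image, reference)
        if any(abs(a - b) > channel_tolerance for a, b in zip(p, q))
    )
    return bad <= max_bad_fraction * len(image)
```

    The per-channel tolerance absorbs precision differences, while the bad-pixel budget lets a few edge pixels differ without hiding a genuinely broken effect.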

  2. Ryan

    Great work!

    I work at a studio that used to contract for EA and was recently bought up, and I’ve wanted to do this for every game I’ve ever worked on. It’s much more cost-effective to do it at an engine level as you’re doing, and I’m so glad to hear that it actually works and is useful. I think I’ll keep around a bookmark to this entry to show my coworkers.

    And FYI, EA has a compatibility testing lab for PC games that is quite rigorous. There’s always a bit of dread when submitting a build for compatibility testing, because we invariably get strange bugs on the most obscure video cards.

  3. Aras Pranckevičius

    Roy: the thing is – there’s no single reference card. A card might not support shaders = result is different (some effects will fall back). Or shadows = result is different (no shadows). And so on. What could possibly be done is storing a reference image set per “card generation” (“this is how it should look on ps2.0 cards”), and then doing a bit of fuzzy comparison.

    Ryan: thanks! It does not actually happen at the engine level, all that is required from the engine is the ability to do screenshots and load “game levels” (in this case it’s test scenes) in succession.
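    The per-generation idea boils down to mapping a card’s capabilities to a shared reference set instead of a per-card one. A small Python sketch, where the `Caps` fields and the set names are hypothetical examples of capabilities that change what a test renders:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
    """Hypothetical subset of card capabilities that affect test output."""
    pixel_shader_version: float   # 0.0 means no pixel shaders
    supports_shadows: bool


def reference_set_for(caps: Caps) -> str:
    """Pick the name of the reference image set shared by a card generation."""
    if caps.pixel_shader_version >= 2.0:
        generation = "ps2.0"
    elif caps.pixel_shader_version >= 1.1:
        generation = "ps1.x"
    else:
        generation = "fixed-function"   # effects take their fallback paths
    shadows = "shadows" if caps.supports_shadows else "noshadows"
    return f"{generation}-{shadows}"
```

    Cards in the same set would then be compared with a fuzzy match rather than exactly, since precision still differs within a generation.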

  4. blackpawn

    whoa hardcore. i’m still doing things pretty ad hoc and relying a lot on alpha testers to report problems.

    i can see it being pretty hard to think of all the things that should be screenshot because some really random things can break. for example objects fading to the haze color when the camera looks a certain direction, corrupt geometry on character with > X bones, bad texture transform only on stage 2 or 3. i guess even these could be picked up by luck in a particular shot.

  5. Horn

    I just came across your blog and it seems that you are the right person to ask about Test Driven Development and Unit Testing in 3D graphics. I am doing a thesis in applying Test Driven Development in 3D training simulations, but I have trouble finding relevant material about this.

    Here is a more precise formulation of what I work on:

    I work on creating a tool to test if 3D graphics scenes are correctly displayed. The idea is to store a setup that I know is correctly displayed and rendered as a reference scene. How this oracle scene is chosen to be the reference scene is not so important. Maybe it is verified manually.

    The tool should be used on a military training simulator used to train Forward Air Controllers (FACTS). The simulation is written in C++ with Delta3D as engine. The tool might be integrated with an existing unit test framework.

    One strategy that I am currently working on is to capture frames and compare them against the oracle using image processing algorithms. However, this approach seems hard: how should the timing interval be selected so that the tool works independently of the PC running the simulation? I mean that performance issues of the underlying hardware should not affect which frames are captured.

    Another approach could be to compare selected 3D models in the scene by traversing their scene graphs, and that way avoiding performance issues related to different CPUs and GPUs.

    Hope to hear your ideas and thoughts towards this and if you have any references about something related I would appreciate it.

    Best regards
    Horn

  6. realtimecollisiondetection.net - the blog » A brief graphics blog summary

    [...] not have noticed gamma being broken for several weeks (if indeed ever).  Another good post on testing graphics code can be found at Aras Pranckevičius’ [...]

  7. Game Rendering » Regression Testing a Renderer

    [...] Information about how Unity does testing of their graphics code http://aras-p.info/blog/2007/07/31/testing-graphics-code/ [...]

  8. Jonathan Hartley

    @blackpawn

    You’re right that failures in totally unexpected ways are a problem.

    I think the general testing philosophy to address this is that you create tests to cover as much as you reasonably can. Then every time you discover a new failure that the tests were not catching, you add a test for it. Fix it. Then run the tests again to watch them pass. In future you will be alerted if ever that particular failure regresses.

    This isn’t perfect, but it’s still better than running with no tests at all.

    Best regards.

  9. Jonathan Hartley

    @myself

    I forgot a very important step:

    Add a test for it. *Run the test to watch it fail*. Fix the problem. Then run the tests again to watch them pass.

    All tests should be written this way – if the tests are written after the working code, you will be amazed how many times you write a test which has a bug in it and always passes. Running the test to see it fail helps ensure your tests don’t have bugs of this type.

  10. Jonathan Hartley

    @Horn

    One idea might be to have your simulation use a virtual clock. This clock will usually just output values that it gets directly from the real clock, and the simulation will run as normal. But in some circumstances (eg. while testing) you will ask the virtual clock to instead provide ‘fake’ clock data. This could be used during testing to make sure your simulation progresses by a known amount between frames.

    This behaviour of the virtual clock can be exhaustively unit-tested, separately.

    Obviously this has the problem that your program under test will not be running precisely the same code paths in testing as in production, but this may still be the simplest and most reliable way to address the issue.
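    A virtual clock like the one described could be as small as this Python sketch. The class and method names are illustrative, not from Delta3D or any other engine; in fixed-step mode the time is derived from a frame counter, so the captured frames don’t depend on machine speed.

```python
import time
from typing import Optional


class VirtualClock:
    """Clock the simulation reads instead of the real one.

    With fixed_step=None it forwards time.monotonic(); otherwise it reports
    frames * fixed_step, advancing only when tick() is called once per frame.
    """

    def __init__(self, fixed_step: Optional[float] = None) -> None:
        self.fixed_step = fixed_step
        self._frames = 0

    def tick(self) -> None:
        """Advance one frame (only matters in fixed-step mode)."""
        self._frames += 1

    def now(self) -> float:
        if self.fixed_step is None:
            return time.monotonic()       # production: real elapsed time
        return self._frames * self.fixed_step
```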

    Unless you can directly compare frames using a very simple image comparison (e.g. is every pixel identical?), I think image processing to compare sorta-similar frames is going to be hard and unreliable. I’d be tempted, for an application of this type, to do only minimal functional-level testing by comparing images: just enough to verify that the program is running, that framerates are sane, and that the rendering (of sky, ground, planes, shadows, etc.) works.

    For the rest of the program behaviour that isn’t rendering-related, I’d then add a comprehensive layer of ‘high-level unit tests’, that exhaustively test, for example, plane movements by examining the changes to data structures in your program, or test the camera by examining the modelview matrix that is sent to OpenGL (or whatever), rather than examining the output from OpenGL.
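    Testing the camera through the matrix it hands to the graphics API could be sketched as below. `RecordingBackend`, `Camera` and `set_modelview` are hypothetical names; the camera only translates (a real one also rotates), and the matrix is written in row-major math notation rather than OpenGL’s column-major memory layout.

```python
from typing import List, Protocol, Tuple

Matrix = List[List[float]]


class Backend(Protocol):
    def set_modelview(self, m: Matrix) -> None: ...


class RecordingBackend:
    """Test double for the graphics API: records the matrix instead of drawing."""
    def __init__(self) -> None:
        self.modelview: Matrix = []

    def set_modelview(self, m: Matrix) -> None:
        self.modelview = [row[:] for row in m]


class Camera:
    """Hypothetical translation-only camera."""
    def __init__(self, x: float, y: float, z: float) -> None:
        self.position = (x, y, z)

    def apply(self, backend: Backend) -> None:
        x, y, z = self.position
        # The view matrix is the inverse of the camera transform: translate by -position.
        backend.set_modelview([[1.0, 0.0, 0.0, -x],
                               [0.0, 1.0, 0.0, -y],
                               [0.0, 0.0, 1.0, -z],
                               [0.0, 0.0, 0.0, 1.0]])


def apply_to_point(m: Matrix, p: Tuple[float, float, float]) -> Tuple[float, ...]:
    """Transform point p (w = 1) by m and return the xyz part."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))
```

    A test then asserts on the recorded matrix, for example that the camera position lands at the view-space origin, without touching real graphics output.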

    I hope these ideas are applicable.

    Best regards

  11. Aras Pranckevičius

    @Jonathan: the virtual clock thing, sure. In our graphics testing setup, we do run everything at exactly the same path (same clock values etc.).

    For non-graphics related things, of course we have different test setups. I was talking about tests that specifically test graphics (and the code that interfaces with the graphics API).

  12. Catching Common Image Processing Programming Errors with Generic Unit Tests · code-spot

    [...] Testing Graphics Code [...]

  13. Lost in the Triangles » Blog Archive » Testing Graphics Code, 4 years later

    [...] four years ago I wrote how we test rendering code at Unity. Did it stand the test of time and more importantly, growing the company from less than 10 [...]
