<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lost in the Triangles &#187; code</title>
	<atom:link href="http://aras-p.info/blog/tags/code/feed/" rel="self" type="application/rss+xml" />
	<link>http://aras-p.info/blog</link>
	<description>Random thoughts of a triangle pusher</description>
	<lastBuildDate>Fri, 09 Sep 2011 17:03:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Prophets and duct-tapers or: useful programmer traits</title>
		<link>http://aras-p.info/blog/2011/09/09/prophets-and-duct-tapers-or-useful-programmer-traits/</link>
		<comments>http://aras-p.info/blog/2011/09/09/prophets-and-duct-tapers-or-useful-programmer-traits/#comments</comments>
		<pubDate>Fri, 09 Sep 2011 16:52:38 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=795</guid>
		<description><![CDATA[I liked Pierre&#8217;s The Prophet Programmer post. Go read it now. Now of course that post is a rant. It exaggerates. It puts everything into one bit grayscale colors. There&#8217;s never one person completely like this &#8220;prophet programmer&#8221; and another like the idolized &#8220;best programmer&#8230; not afraid of anything!!1&#8243;. But it does highlight at least [...]]]></description>
			<content:encoded><![CDATA[<p>I liked Pierre&#8217;s <a href="http://www.codercorner.com/blog/?p=502">The Prophet Programmer</a> post. Go read it now.</p>
<p>Now <em>of course</em> that post is a rant. It exaggerates. It puts everything into one bit grayscale colors. There&#8217;s never one person completely like this &#8220;prophet programmer&#8221; and another like the idolized &#8220;best programmer&#8230; not afraid of anything!!1&#8243;.</p>
<p>But it does highlight at least this thing: some aspects of programmer&#8217;s behavior are either useful or not.</p>
<p>Obsessing over the latest hype and &#8220;the proper ways&#8221;, or following books to the letter, just by itself <em>is not useful</em>. Sure, sometimes a dash of &#8220;proper ways&#8221; or recommendations is good, but the benefits of doing that are really, really tiny. Hence it&#8217;s not worth thinking/arguing much about.</p>
<p><strong>Here are some actually useful programmer traits</strong> instead. I&#8217;m thinking about real actual people I&#8217;m working with here, even if I&#8217;m not naming names.</p>
<p>He <em>feels what needs to be done</em> to get the solution, in the big picture. Sometimes these are unusual ideas that probably no one is doing &#8211; because everyone has always been seeing the problem in the standard way. The solutions seem obvious once you see them, but require some sort of step function in thinking to get there. A zero-iteration way of hooking up touchscreen device input to test the game is to play the game on PC, stream images to the device and stream inputs back. The most hassle-free asset pipeline is one where there is no &#8220;export/import asset&#8221; step. Or a more famous outside example, tablets <a href="http://aras-p.info/blog/wp-content/uploads/2011/09/tablets-before-and-after-ipad.jpeg">before and after</a> the iPad. You rarely, if ever, can do things like that by doing user surveys or improving on existing solutions; you need someone who can see through and find what&#8217;s the <em>actual</em> problem you want to solve. This guy is worth gold.</p>
<p>She can <em>cut things</em>. &#8220;Perfection is achieved, not when there is nothing more to add, but when there is nothing left to cut away&#8221;, quoth Saint-Exupéry. To be good at doing anything you (both you and your team) need to focus, which means cutting things. Let go of bad ideas and blind alleys. If your justification for doing it is &#8220;but we already spent so much time on it&#8221;, just don&#8217;t &#8211; it will only get worse. Cut features that aren&#8217;t quite ready by the deadlines. Remove old things that aren&#8217;t useful anymore. Doing that can and will make some people upset; it&#8217;s really, <em>really</em> hard to postpone or even completely abandon a thing that someone put a lot of effort into. But it needs to be done; and you need her on the team to make these hard decisions.</p>
<p>That other guy is <em>freaking fast</em>. And not in a sense of &#8220;types tons of code real fast and then sometimes it works, and two weeks after someone else has to clean it up&#8221;. No &#8211; he&#8217;s cranking out good, solid, tested, working code at incredible speeds. Got ten bugs; they are fixed by the next day. Got a new feature to do; commits with everything implemented (and working!) are pushed in a few days. When he goes on vacation your burndown chart changes slope. How does he do it? I don&#8217;t know. But by all means, hold onto him!</p>
<p>The other girl can figure out any <em>complex problem real fast</em>. Be it a tricky bug, unexpected behavior, really weird interaction with other systems &#8211; others could be spending hours, if not days, trying to figure out what&#8217;s going on. She, on the other hand, checks just a handful of things and goes &#8220;ha! the problem&#8217;s right there&#8221;. As if applying binary search to the whole problem space, except to everyone else the space seems unsorted and they don&#8217;t even know what they&#8217;re looking for!</p>
<p>This dude can keep <em>a ton of context in his head</em> while doing anything. How will this feature interact with dozens or even hundreds of other features? He&#8217;s able to think about all of them and the majority of corner cases and get everything right in one go. It would take dozens of roundtrips between coding &#038; QA for someone else to get right. When estimating effort for new things, he can immediately list all the tricky work that will need to be done; whereas others would go &#8220;sounds easy&#8221; only to find out it&#8217;s a month of work.</p>
<p>She&#8217;s <em>not satisfied with the status quo</em>. No, this isn&#8217;t good enough, she says; and let me show you where &#038; how spectacularly it breaks. And it does not matter if everyone else is doing it this way; here&#8217;s why putting that stuff into a uniform grid isn&#8217;t good. A lot of times you need this extra bump to snap out of your own &#8220;this is good enough, no one will care&#8221; thoughts.</p>
<p>He&#8217;s doing a lot of <em>boring work to get others more productive</em>. There&#8217;s <em>a ton</em> of boring work on even the most exciting projects, and someone has to do it. He&#8217;s often the unsung hero, quietly working on infrastructure, build times, fixing annoyances in the tools, processes and workflows; all just so that others can be better at doing <em>exciting</em> things. You could call him a janitor or a plumber if you wish, but any place gets rotten and broken real fast without those people.</p>
<p>&#8230;and the list could go on. Unlike obsessing over irrelevant details, <strong>these make a difference</strong>. They make your team run circles around others. They help you solve <em>hard</em> problems, invent things, and move forward at enormous velocity.</p>
<p>You need people with those traits and attitudes.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/09/09/prophets-and-duct-tapers-or-useful-programmer-traits/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Testing Graphics Code, 4 years later</title>
		<link>http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/</link>
		<comments>http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/#comments</comments>
		<pubDate>Fri, 17 Jun 2011 04:44:46 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=762</guid>
		<description><![CDATA[Almost four years ago I wrote how we test rendering code at Unity. Did it stand the test of time and more importantly, growing the company from less than 10 people to more than 100 people? I&#8217;m happy to say it did! That&#8217;s it, move on to read the rest of the internets. The earlier [...]]]></description>
			<content:encoded><![CDATA[<p>Almost four years ago <a href="http://aras-p.info/blog/2007/07/31/testing-graphics-code/">I wrote how we test rendering code</a> at Unity. Did it stand the test of time and more importantly, growing the company from less than 10 people to more than 100 people?</p>
<p><em>I&#8217;m happy to say it did! That&#8217;s it, move on to read the rest of the internets.<br />
</em></p>
<p>The earlier post focused more on the hardware compatibility area (differences between platforms, GPUs, driver versions, driver bugs and their workarounds etc.). In addition to that, we do regression tests on a bunch of <a href="http://blogs.unity3d.com/2010/01/12/on-web-player-regression-testing/">actual Unity made games</a>. All that is good and works; let&#8217;s talk instead about the tests the rendering team at Unity uses in daily life.</p>
<p><strong>Graphics Feature &#038; Regression Testing</strong></p>
<p>In the daily life of a graphics programmer, you care about two things related to testing:</p>
<p><span id="more-762"></span><strong>1.</strong> Whether a new feature you are adding, more or less, works.<br />
<strong>2.</strong> Whether something new you added or something you refactored broke or changed any existing features.</p>
<p>Now, &#8220;works&#8221; is a vague term. Definitions can range from equally vague</p>
<blockquote><p>Works For Me!</p></blockquote>
<p>to something like </p>
<blockquote><p>It has been battle tested on thousands of use cases, hundreds of shipped games, dozens of platforms, thousands of platform configurations and within each and every one of them there&#8217;s not a single wrong pixel, not a single wasted memory byte and not a single wasted nanosecond! <em>No kittehs were harmed either!</em></p></blockquote>
<p>In an ideal world we&#8217;d only consider the latter as &#8220;works&#8221;; however, that&#8217;s quite hard to achieve.</p>
<p>So instead we settle for small &#8220;functional tests&#8221;, where each feature has a small scene setup that exercises said feature (very much as described in the <a href="http://aras-p.info/blog/2007/07/31/testing-graphics-code/">previous post</a>). It&#8217;s the graphics programmer&#8217;s responsibility to add tests like that for his stuff.</p>
<p>For example, Fog handling might be tested by a couple scenes like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/092-FogModes.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/092-FogModes.png" alt="" title="Fog Modes" width="400" height="300" class="alignnone size-full wp-image-770" /></a><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/017-Fog.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/017-Fog.png" alt="" title="Fog vs. different shaders; Forward rendering above, Deferred Lighting below" width="400" height="300" class="alignnone size-full wp-image-771" /></a></p>
<p>Another example, tests for various corner cases of Deferred Lighting:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/118-DeferredLMCases.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/118-DeferredLMCases.png" alt="" title="Lightmapped/NonLightmapped objects vs. Baked/NonBaked lights" width="400" height="300" class="alignnone size-full wp-image-774" /></a><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/134-DefLightShapes.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/134-DefLightShapes.png" alt="" title="Light volumes crossing near/far planes" width="400" height="300" class="alignnone size-full wp-image-775" /></a><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/143-DefLargeCoords.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/143-DefLargeCoords.png" alt="" title="Ability to handle small near plane &amp; large world coordinates" width="400" height="300" class="alignnone size-full wp-image-776" /></a></p>
<p>So that&#8217;s basic testing for &#8220;it works&#8221; that the graphics programmers themselves do. Beyond that, features are tested by QA and a large beta testing group, tried, profiled and optimized on real actual game projects and so on.</p>
<p>The good thing is, doing these basic tests also provides you with point 2 (did I break or change something?) automatically. If after your changes, all the graphics tests still pass, there&#8217;s a pretty good chance you did not break anything. Of course this testing is not exhaustive, but any time a regression is spotted by QA, beta testers or reported by users, you can add a new graphics test to check for that situation.</p>
<p><strong>How do we actually do it?</strong></p>
<p>We use <a href="http://www.jetbrains.com/teamcity/">TeamCity</a> for the build/test farm. It has several build machines set up as graphics test agents (unlike most other build machines, they need an actual GPU, or an iOS device connected to them, or a console devkit etc.) that run graphics test configurations for all branches automatically. Each branch has its graphics tests run daily, and branches with &#8220;high graphics code activity&#8221; (i.e. branches that the rendering team is actually working on) have them run more often. You can always initiate the tests manually by clicking a button of course. What you want to see at any time is this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/teamcity-gfx-tests.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/teamcity-gfx-tests.png" alt="" title="The graphics tests are passing one by one!" width="445" height="362" class="alignnone size-full wp-image-778" /></a></p>
<p>The basic approach is the same as <a href="http://aras-p.info/blog/2007/07/31/testing-graphics-code/">4 years ago</a>: have a &#8220;game level&#8221; (&#8220;scene&#8221; in Unity speak) for each test, run it for a defined number of frames at a fixed timestep, and take a screenshot at the end of each frame. Compare each screenshot with the &#8220;known good&#8221; image for that platform; any difference equals &#8220;FAIL&#8221;. On many platforms you have to allow a couple of wrong pixels because many consumer GPUs are not <i>fully</i> deterministic, it seems.</p>
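<p>The comparison step can be sketched roughly like this. It is only an illustration, not the actual Unity test code: images are modelled as rows of (r, g, b) tuples, and the function names and the default of two allowed wrong pixels are made up for the example.</p>

```python
# Hypothetical sketch: images are lists of rows, each row a list of (r, g, b) tuples.

def count_wrong_pixels(actual, expected, channel_tolerance=0):
    """Count pixels where any channel differs by more than channel_tolerance."""
    if len(actual) != len(expected):
        raise ValueError("image heights differ")
    wrong = 0
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            raise ValueError("image widths differ")
        for pixel_a, pixel_e in zip(row_a, row_e):
            worst = max(abs(a - e) for a, e in zip(pixel_a, pixel_e))
            if worst > channel_tolerance:
                wrong += 1
    return wrong


def screenshot_passes(actual, expected, max_wrong_pixels=2):
    """PASS when no more than max_wrong_pixels pixels differ from the known good image."""
    return max_wrong_pixels >= count_wrong_pixels(actual, expected)


# A 4x3 "known good" image and a screenshot with a single pixel off by one.
golden = [[(10, 20, 30)] * 4 for _ in range(3)]
shot = [list(row) for row in golden]
shot[1][2] = (10, 20, 31)
print(screenshot_passes(shot, golden))
```

<p>Counting wrong pixels (instead of demanding a bit-exact match) is what leaves room for the &#8220;couple of wrong pixels&#8221; allowance; setting the limit to zero gives the strict comparison.</p>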
<p>So you have this bunch of &#8220;this is the golden truth&#8221; images for all the tests:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/some-gfx-tests.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/some-gfx-tests-500x247.png" alt="" title="Images for some of the graphics tests" width="500" height="247" class="alignnone size-medium wp-image-781" /></a></p>
<p>And each platform automatically tested on TeamCity has its own set:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/gfx-test-platforms.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/gfx-test-platforms.png" alt="" title="Platforms of graphics tests" width="187" height="181" class="alignnone size-full wp-image-782" /></a></p>
<p>Since the &#8220;test controller&#8221; can run on a different device than the actual tests (the case for iOS, Xbox 360 etc.), the test executable opens a socket connection to transfer the screenshots. The test controller is a relatively simple C# application that listens on a socket, fetches the screenshots and compares them with the template ones. Its result is output that TeamCity can understand, along with &#8220;build artifacts&#8221; that consist of failed tests (for each failed test: expected image, failed image, difference image with increased contrast).</p>
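<p>One way to produce &#8220;output that TeamCity can understand&#8221; is TeamCity&#8217;s service messages printed to standard output. Whether our controller does exactly this is not stated here, so treat the following as an assumption-laden sketch: the function names and the failure message text are invented, and it is in Python rather than C# purely for illustration.</p>

```python
# Hypothetical sketch of reporting one graphics test via TeamCity service messages.

def tc_escape(text):
    """Escape a value for use inside a TeamCity service message attribute."""
    # The vertical bar must be handled first, since it is the escape character.
    replacements = [("|", "||"), ("'", "|'"), ("\n", "|n"),
                    ("\r", "|r"), ("[", "|["), ("]", "|]")]
    for raw, escaped in replacements:
        text = text.replace(raw, escaped)
    return text


def report_test(name, wrong_pixels, max_wrong_pixels=2):
    """Return the service message lines for one screenshot comparison."""
    safe_name = tc_escape(name)
    lines = ["##teamcity[testStarted name='%s']" % safe_name]
    if wrong_pixels > max_wrong_pixels:
        message = tc_escape("%d pixels differ from the known good image" % wrong_pixels)
        lines.append("##teamcity[testFailed name='%s' message='%s']" % (safe_name, message))
    lines.append("##teamcity[testFinished name='%s']" % safe_name)
    return lines


for line in report_test("092-FogModes", wrong_pixels=57):
    print(line)
```

<p>The expected, failed and difference images would then be written to whatever directory the build configuration publishes as artifacts.</p>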
<p>That&#8217;s pretty much it! And of course, automated tests are nice and all, but that should not get too much in the way of actual <a href="http://programming-motherfucker.com/">programming manifesto</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Mercurial/Kiln experience so far</title>
		<link>http://aras-p.info/blog/2011/04/18/mercurialkiln-experience-so-far/</link>
		<comments>http://aras-p.info/blog/2011/04/18/mercurialkiln-experience-so-far/#comments</comments>
		<pubDate>Mon, 18 Apr 2011 07:14:33 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=668</guid>
		<description><![CDATA[At work we switched to Mercurial almost two months ago. Like Richard says, it was time to stop using Subversion. Here are my impressions so far. Preemptive warning: I&#8217;ve only ever used CVS, SourceSafe, Subversion, git and Mercurial as source control systems (never used Perforce). I never really used a code review tool before Kiln. [...]]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://unity3d.com/">work</a> we switched to <a href="http://mercurial.selenic.com/">Mercurial</a> almost two months ago. Like <a href="http://altdevblogaday.org/2011/03/09/its-time-to-stop-using-subversion/">Richard says</a>, it was time to stop using Subversion. Here are my impressions so far.</p>
<p><span id="more-668"></span><em>Preemptive warning: I&#8217;ve only ever used CVS, SourceSafe, Subversion, git and Mercurial as source control systems (never used Perforce). I never really used a code review tool before Kiln. Everything below might be non-issues in other tools/systems, or not suitable for different setups/workflows!<br />
</em></p>
<p><strong>The Story</strong></p>
<p>At Unity we used <a href="http://subversion.apache.org/">Subversion</a> for source code versioning for as long as I can remember. svn revision 1 &#8212; an import from CVS &#8212; happened in 2005. We don&#8217;t talk about CVS. Nor about SourceSafe. Subversion was fine while the number of developers was small; we had a saying that CVS scales up to 5 people, and experimentally found out that svn scales up to about 50.</p>
<p>Since merging branches in subversion does not <em>really</em> work well, everyone was mostly working on one trunk, <em>carefully</em>. We would do an occasional branch for &#8220;this will surely break everything&#8221; features; and would branch off trunk sometime before each Unity release, but that&#8217;s about it. Having something like 50 people and 10 platforms on a single branch in version control does get a bit uneasy.</p>
<p>So we looked at various options, like <a href="http://git-scm.com/">git</a>, <a href="http://mercurial.selenic.com/">Mercurial</a>, <a href="http://www.perforce.com/">Perforce</a> and so on. I don&#8217;t know why exactly we ended up with Mercurial (someone made a decision I guess&#8230;). It <em>felt</em> like distributed versioning systems are <em>teh future</em> and unlike most game developers we don&#8217;t need to version hundreds of gigabytes of binary assets (hence no big need for Perforce).</p>
<p>So while some people were at GDC, we did a big switch to several things at once: 1) replace Subversion with Mercurial, 2) replace &#8220;everyone works on the same trunk&#8221; workflow with &#8220;teams work on their own topic branches&#8221;, 3) introduce a bit more formal code reviews via <a href="http://www.fogcreek.com/kiln/">Kiln</a>.</p>
<p>In hindsight, maybe switching three things at once wasn&#8217;t the brightest idea; there&#8217;s only so much change a person can absorb per unit of time. On the other hand, everyone experienced a large initial shock but now that the debris is settling they just continue working with no big shocks predicted in the near future.</p>
<p><strong>Our Setup</strong></p>
<p>We use Fogcreek&#8217;s Kiln and host it on <a href="http://www.fogcreek.com/kiln/for-your-server.html">our own servers</a>. This is mostly for legal reasons I think (in our source code we have 3rd party bits which are under strict NDAs). The advantage of hosting ourselves is that we&#8217;re in complete control. The disadvantage is that we have to do some work, and we only get Kiln updates every couple of months (so for example everyone who lets Fogcreek host Kiln is on Kiln 2.4.x right now, while we&#8217;re still on 2.3.x).</p>
<p>Our source tree is about 12000 files amounting to about 600MB. Mercurial&#8217;s history (60000 revisions imported from svn) adds another 200MB. Additionally, we pull almost 1GB of binary files (see below for binary file versioning) into the source tree.</p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2011/04/hg-branches.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/hg-branches-150x150.png" alt="" title="Team+feature branches in Mercurial" width="150" height="150" class="alignright size-thumbnail wp-image-685" /></a>Each &#8220;team&#8221; (core, editor, graphics, ios, android, &#8230;) has its own &#8220;branch&#8221; (actually, a separate repository clone) of the codebase, and merges back and forth with the &#8220;trunk&#8221; repository. The trunk is supposed to be stable and shippable at almost any time (in theory&#8230; :)); unfinished, unreviewed code or code that has any failing tests can&#8217;t be pushed into trunk. Additionally, long-lasting features get their own &#8220;feature branches&#8221; (again, actually full clones of the repository). So right now we have more than 40 of those team+feature branches.</p>
<p>We have almost 50 developers committing to the source tree. Additionally, there is a build farm of 30 machines building most of those branches and running automated test suites. All this <em>does</em> put some pressure on the Kiln server ;) Everything below describes usage of Kiln 2.3.x with Mercurial 1.7.x; with more recent versions anything might have changed.</p>
<p><strong>Mercurial, or: I Have Two Heads!</strong></p>
<p>Probably the hardest thing to grok is the whole centralized-to-distributed versioning transition. Not everyone has github as their start page yet, and DVCS is actually more complex than the simple centralized model that Subversion has.</p>
<p>Things like this:</p>
<blockquote><p>OMG it says I have two heads now, what do I do?!</p></blockquote>
<p>just do not happen in centralized systems. <em>It&#8217;s not easy for a developer to accept he has two heads now, either. Or where this extra head came from&#8230;</em></p>
<p>And the benefits of a distributed source control system are not immediately obvious to someone who&#8217;s never used one. The initial reaction is that suddenly everything got more complex for no good reason. Compare operations that you would use daily:</p>
<ul>
<li>Subversion: update, commit.
<ul>
<li>Since merges don&#8217;t really work: branch, switch &#038; merge are rarely used by mere mortals.</li>
</ul>
</li>
<li>Mercurial: pull, update or merge, commit, push.
<ul>
<li>And you might find you have two heads now!</li>
</ul>
<ul>
<li>You should also see their faces when you go &#8220;well, let me tell you about rebase&#8230;&#8221;. You might just as well explain everything with <a href="http://tartley.com/?p=1267">easy to understand spatial analogies</a> ;)</li>
</ul>
</li>
</ul>
<p>Thankfully, there&#8217;s this thing called the intertubes, which often has <a href="http://hginit.com/">helpful tutorials</a>.</p>
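<p>For the &#8220;two heads&#8221; situation mentioned above, a toy model may help. A head is just a changeset that has no children yet; pulling someone else&#8217;s work next to your own unpushed commit gives you two of them until you merge. This is only an illustrative sketch (the changeset names and the <tt>find_heads</tt> function are made up; it is not how Mercurial is implemented):</p>

```python
# Hypothetical model: history maps each changeset id to the list of its parent ids.

def find_heads(history):
    """Return the changesets that no other changeset lists as a parent."""
    has_child = set()
    for parent_ids in history.values():
        has_child.update(parent_ids)
    return sorted(set(history) - has_child)


history = {"base": []}
history["mine"] = ["base"]      # my local commit
history["theirs"] = ["base"]    # a commit that arrived via pull
print(find_heads(history))      # two heads now

history["merge"] = ["mine", "theirs"]   # merging joins them again
print(find_heads(history))
```

<p>In Subversion this cannot happen, because a commit is refused until your working copy is up to date with the one and only line of history.</p>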
<p>Myself, I think <em>maybe</em> switching to git would have been a smaller overall shock. Mercurial is easier to get into, but it kind of pretends to work like ye olde versioning system, while underneath it is very different. Git, on the other hand, does not even try to look similar; it says &#8220;I&#8217;ll fuck with your brain&#8221; immediately after initial &#8220;hi how are you&#8221;. So it&#8217;s a larger initial shock, but maybe that <em>forces</em> people to get into this different mindset faster.</p>
<p><strong>Versioning large binary files</strong></p>
<p>Even if we <em>mostly</em> version only the code, there are occasional binaries. In our case it&#8217;s mostly 3rd party SDKs that are linked into Unity. For example, PhysX, Mono, FMOD, D3DX, Cg etc. We do have the source code for most of them, but we don&#8217;t need each developer to have 30000 files of Mono&#8217;s source code for example. So we build them separately, and version the prebuilt headers/libraries/DLLs in the regular source tree. Some of those prebuilt things can get quite large though (think couple hundred megabytes).</p>
<p>Most distributed version control systems (including git and mercurial) have trouble with this. <em>Every</em> version of <em>every</em> file is stored in your own local <del datetime="whoops, wrong terminology!">checkout</del>clone. Try having 50 versions of the whole Mono build in there and you&#8217;ll wonder where the precious SSD space on your laptop went!</p>
<p>Luckily, Kiln has a solution for this: the <a href="http://kiln.stackexchange.com/questions/1873">kbfiles</a> extension. For each file marked as a &#8220;large binary file&#8221;, only its &#8220;stand in&#8221; SHA1 hash is versioned, and the file itself is fetched from a central server onto your local machine on demand. Think of it as a centralized versioning model for those special binary files. kbfiles itself is based on the <a href="http://mercurial.selenic.com/wiki/BfilesExtension">bfiles extension</a>, with a tighter integration into Mercurial.</p>
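<p>The stand-in idea can be sketched in a few lines. This is a toy model under my own assumptions, not kbfiles code: the <tt>BigFileStore</tt> class plays the role of the central server, and the stand-in is simply the SHA1 hex digest followed by a newline.</p>

```python
import hashlib


class BigFileStore:
    """Toy central server: maps SHA1 hex digests to file contents."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        digest = hashlib.sha1(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest):
        data = self._blobs[digest]
        # Verify the content still matches the hash it was stored under.
        if hashlib.sha1(data).hexdigest() != digest:
            raise ValueError("stored blob does not match its hash")
        return data


def add_big_file(store, data):
    """Upload data and return the small stand-in text that gets versioned instead."""
    return store.put(data) + "\n"


def fetch_big_file(store, standin):
    """Resolve a versioned stand-in back into the real file contents on demand."""
    return store.get(standin.strip())


store = BigFileStore()
library = b"\x00" * 1000000            # pretend this is a prebuilt library
standin = add_big_file(store, library)
print(len(standin), len(fetch_big_file(store, standin)))
```

<p>The repository history only ever grows by the 41-byte stand-in per version, while the megabytes live on the server and are downloaded only for the revision you actually check out.</p>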
<p>So the good news: with Kiln, large binary files are handled easily and with no pain. You can globally set the &#8220;large size&#8221; threshold, filename patterns etc. that are turned into &#8220;big files&#8221; automatically; or manually indicate &#8220;big file&#8221; when adding new files. And then continue using Mercurial as usual.</p>
<p>The bad news, however, is that kbfiles still has occasional bugs. Of course they will be fixed eventually, but for example right now <a href="http://blog.bitquabit.com/2008/11/25/rebasing-mercurial/">rebasing</a> with an incoming bigfiles commit will result in the wrong bigfile version in the end. Or, the presence of the kbfiles extension makes various Mercurial operations (like <tt>hg status</tt>) <em>much</em> <a href="http://kiln.stackexchange.com/questions/3319">slower than usual</a>.</p>
<p><strong>Kiln as Web Interface</strong></p>
<p>Kiln itself is the server hosting Mercurial repositories, a web interface to view/admin them, and a code review tool. It&#8217;s fairly nice and does all the standard stuff, like showing an overview of all activity happening in a group of repositories:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-overview.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-overview-500x288.png" alt="" title="Overview of all activity in Kiln" width="500" height="288" class="alignnone size-medium wp-image-688" /></a></p>
<p>And shows the overview of any particular repository:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-repo.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-repo-500x279.png" alt="" title="One repository in Kiln" width="500" height="279" class="alignnone size-medium wp-image-689" /></a></p>
<p>And of course diff view of any particular commit:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-diff.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-diff-500x173.png" alt="" title="Diff view in Kiln" width="500" height="173" class="alignnone size-medium wp-image-686" /></a></p>
<p>My largest complaints about Kiln&#8217;s web interface are: 1) speed and 2) merge spiderwebs.</p>
<p><b><em>Speed</em></b>: like oh so many modern fancy-web systems, Kiln sometimes feels sluggish. Sometimes, in the time it takes Kiln to display a diff, Crysis 2 <em>would have rendered New York fifty times</em>. We did various things to boost our server&#8217;s <em>oomph</em>, but it still does not feel fast enough. Maybe we don&#8217;t know how to set up our servers right; or maybe Kiln is actually quite slow; or maybe our repository size + branch count + number of people hitting it are exceeding whatever limits Kiln was designed for. That said, this is not unique to Kiln; <em>lots</em> of web systems are slow, sometimes for no good reason. If you are a web developer, however, keep this in mind: latency of any user operation is super important.</p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-merge-spiderweb.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-merge-spiderweb-150x150.png" alt="" title="It&#039;s a merge forest!" width="150" height="150" class="alignright size-thumbnail wp-image-687" /></a><b><em>Merge spiderwebs</em></b>: distributed version control makes merges reliable and easy. However, merges happen all the time and can make it hard to see what was <em>actually</em> going on in the code. You can&#8217;t see the actual changes through the merge spiderwebs.</p>
<p>The change history is littered with &#8220;merge&#8221;, &#8220;merge remote repo&#8221;, &#8220;merge again&#8221; commits. The branch graph goes crazy and starts taking half of the page width. Not good! Now of course, this is where <a href="http://blog.bitquabit.com/2008/11/25/rebasing-mercurial/">rebasing</a> would help, however right now we&#8217;re not very keen on using it because of Kiln&#8217;s bigfiles bug mentioned above.</p>
<p><strong>Kiln as Code Review Tool</strong></p>
<p>Reviewing code is fairly easy: there&#8217;s a Review button that shows up when hovering over any commit. Each commit also shows how many reviews it has pending or accepted. So you just click on something, and voilà, you can request a code review:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-reviewrequest.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-reviewrequest-500x230.png" alt="" title="Requesting a review in Kiln" width="500" height="230" class="alignnone size-medium wp-image-691" /></a></p>
<p>Within each review you see the diffs, send comments back and forth between people, and highlight code snippets to be attached with each comment:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-review.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/04/kiln-review-500x332.png" alt="" title="Code review in Kiln" width="500" height="332" class="alignnone size-medium wp-image-690" /></a></p>
<p>In Kiln 2.3.x (which is what we use at the moment) the reviews still have a sort of &#8220;unfinished&#8221; feeling. For example, if you want multiple people to review a change, Kiln actually creates multiple reviews that are only very loosely coupled. The good news is that in Kiln 2.4 they have <a href="http://blog.fogcreek.com/rethinking-reviews/">improved this</a>, and I&#8217;m quite sure more improvements will come in the future.</p>
<p>Another option that I&#8217;m missing right now: in the repository views, filter out all approved commits. As an occasional &#8220;merge master&#8221;, I need to see if my big merge had any unreviewed or pending-review commits &#8212; something that&#8217;s quite hard to see with a merge-heavy history.</p>
<p><strong>Summary</strong></p>
<p>I&#8217;m quite happy with how the switch to Mercurial + Kiln has turned out so far. With each team working on their own repository, it does feel like we&#8217;re stepping on each other&#8217;s toes much less. That said, we haven&#8217;t shipped any Unity release from Mercurial yet; doing that will be a future exercise.</p>
<p><a href="http://www.fogcreek.com/kiln/">Kiln</a> is promising. It has some very good ideas (integrated code reviews &#038; versioning of big files in Mercurial), but it still has quite a lot of rough edges. I&#8217;m not totally happy with its web side performance either. That said, Fogcreek&#8217;s support for us has been fantastic; we got some bugfixes in a matter of days and they&#8217;ve been really helpful with setup/workflow/optimization issues. So it seems like it has a good future. Fogcreek guys, if you&#8217;re reading this: <a href="http://farm1.static.flickr.com/225/524768428_e20c722cc0.jpg">keep up wrk</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/04/18/mercurialkiln-experience-so-far/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Virtual and No-Virtual</title>
		<link>http://aras-p.info/blog/2011/02/01/the-virtual-and-no-virtual/</link>
		<comments>http://aras-p.info/blog/2011/02/01/the-virtual-and-no-virtual/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 10:28:03 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=606</guid>
		<description><![CDATA[You are writing some system where different implementations have to be used for different platforms. To keep things real, let&#8217;s say it&#8217;s a rendering system which we&#8217;ll call &#8220;GfxDevice&#8221; (based on a true story!). For example, on Windows there could be a Direct3D 9, Direct3D 11 or OpenGL implementations; on iOS/Android there could be OpenGL [...]]]></description>
			<content:encoded><![CDATA[<p>You are writing some system where different implementations have to be used for different platforms. To keep things real, let&#8217;s say it&#8217;s a rendering system which we&#8217;ll call &#8220;GfxDevice&#8221; <em>(based on a true story!)</em>. For example, on Windows there could be a Direct3D 9, Direct3D 11 or OpenGL implementations; on iOS/Android there could be OpenGL ES 1.1 &#038; 2.0 ones and so on.</p>
<p>For sake of simplicity, let&#8217;s say our GfxDevice interface needs to do this <em>(in real world it would need to do much more)</em>:</p>
<blockquote><pre>
void SetShader (ShaderType type, ShaderID shader);
void SetTexture (int unit, TextureID texture);
void SetGeometry (VertexBufferID vb, IndexBufferID ib);
void Draw (PrimitiveType prim, int primCount);
</pre>
</blockquote>
<p>How can this be done?</p>
<p><span id="more-606"></span><br />
<strong>Approach #1: virtual interface!</strong></p>
<p>Many a programmer would think like this: why of course, GfxDevice is an interface with virtual functions, and then we have multiple implementations of it. Sounds good, and that&#8217;s what you would have been taught at the university in various software design courses. Here we go:</p>
<blockquote><pre>
class GfxDevice {
public:
    virtual ~GfxDevice();
    virtual void SetShader (ShaderType type, ShaderID shader) = 0;
    virtual void SetTexture (int unit, TextureID texture) = 0;
    virtual void SetGeometry (VertexBufferID vb, IndexBufferID ib) = 0;
    virtual void Draw (PrimitiveType prim, int primCount) = 0;
};
// and then we have:
class GfxDeviceD3D9 : public GfxDevice {
    // ...
};
class GfxDeviceGLES20 : public GfxDevice {
    // ...
};
class GfxDeviceGCM : public GfxDevice {
    // ...
};
// and so on
</pre>
</blockquote>
<p>And then based on platform (or something else) you create the right GfxDevice implementation, and the rest of the code uses that. This is all good and it works.</p>
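<p>A minimal sketch of that &#8220;create the right implementation&#8221; step could look like the code below. The <code>GfxBackend</code> enum, the <code>CreateGfxDevice</code> factory and the <code>Name</code> method are made up for illustration, and the interface is cut down to a single function so the dispatch can be observed without any real graphics API.</p>

```cpp
#include <cassert>
#include <memory>

// Hypothetical, trimmed-down interface: Name() stands in for the real
// SetShader/SetTexture/SetGeometry/Draw functions.
class GfxDevice {
public:
    virtual ~GfxDevice () {}
    virtual const char* Name () const = 0;
};

class GfxDeviceD3D9 : public GfxDevice {
public:
    const char* Name () const override { return "D3D9"; }
};

class GfxDeviceGL : public GfxDevice {
public:
    const char* Name () const override { return "GL"; }
};

// Hypothetical selector; a real engine would decide based on the OS,
// GPU capabilities or user preference.
enum GfxBackend { kBackendD3D9, kBackendGL };

std::unique_ptr<GfxDevice> CreateGfxDevice (GfxBackend backend) {
    switch (backend) {
        case kBackendD3D9: return std::unique_ptr<GfxDevice> (new GfxDeviceD3D9 ());
        case kBackendGL:   return std::unique_ptr<GfxDevice> (new GfxDeviceGL ());
    }
    assert (false && "unknown backend");
    return nullptr;
}
```

<p>The rest of the code only ever sees a <code>GfxDevice</code> pointer, so every call through it goes via the virtual function table.</p>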
<p>But then&#8230; hey! Some platforms <em>can only ever have one</em> GfxDevice implementation. On PS3 you will <em>always</em> end up using GfxDeviceGCM. Does it really make sense to have virtual functions on that platform?</p>
<blockquote><p>
Side note: <em>of course</em> the cost of a virtual function call is not something that stands out immediately. It&#8217;s much less than, for example, doing a network request to get the leaderboards or parsing that XML file that ended up in your game for reasons no one can remember. Virtual function calls will not show up in the profiler as &#8220;a heavy bottleneck&#8221;. However, they are not free, and their cost is scattered around in a million places, which makes it very hard to eradicate. You can end up with death by a thousand paper cuts.
</p></blockquote>
<p>If we want to get rid of virtual functions on platforms where they are useless, what can we do?</p>
<p><strong>Approach #2: preprocessor to the rescue</strong></p>
<p>We just have to take out the &#8220;virtual&#8221; bit from the interface, and the &#8220;= 0&#8221; abstract function bit. With a bit of preprocessor work we can:</p>
<blockquote><pre>
#define GFX_DEVICE_VIRTUAL (PLATFORM_WINDOWS || PLATFORM_MOBILE_UNIVERSAL || SOMETHING_ELSE)
#if GFX_DEVICE_VIRTUAL
    #define GFX_API virtual
    #define GFX_PURE = 0
#else
    #define GFX_API
    #define GFX_PURE
#endif
class GfxDevice {
public:
    GFX_API ~GfxDevice();
    GFX_API void SetShader (ShaderType type, ShaderID shader) GFX_PURE;
    GFX_API void SetTexture (int unit, TextureID texture) GFX_PURE;
    GFX_API void SetGeometry (VertexBufferID vb, IndexBufferID ib) GFX_PURE;
    GFX_API void Draw (PrimitiveType prim, int primCount) GFX_PURE;
};
</pre>
</blockquote>
<p>And then there&#8217;s no separate class called GfxDeviceGCM for PS3; it&#8217;s just the GfxDevice class implementing non-virtual methods. Of course, you have to make sure you don&#8217;t try to compile multiple GfxDevice class implementations on PS3.</p>
<p>Ta-da! Virtual functions are gone on some platforms and life is good.</p>
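<p>To check that the macros really do strip the vtable, here is a small self-contained sketch for the single-implementation case. The interface is reduced to <code>Draw</code>, the <code>trianglesDrawn</code> counter is a placeholder for real device state, and <code>GFX_DEVICE_VIRTUAL</code> is hard-coded to 0 as an assumption about the target platform.</p>

```cpp
#include <cassert>
#include <type_traits>

// Assumption: this file is built for a platform with exactly one backend.
#define GFX_DEVICE_VIRTUAL 0

#if GFX_DEVICE_VIRTUAL
    #define GFX_API virtual
    #define GFX_PURE = 0
#else
    #define GFX_API
    #define GFX_PURE
#endif

class GfxDevice {
public:
    GfxDevice () : trianglesDrawn (0) {}
    GFX_API ~GfxDevice () {}
    GFX_API void Draw (int primCount) GFX_PURE;
    int trianglesDrawn;   // placeholder state, not part of the original interface
};

#if !GFX_DEVICE_VIRTUAL
// The one and only implementation lives directly in GfxDevice.
void GfxDevice::Draw (int primCount) {
    assert (primCount >= 0);
    trianglesDrawn += primCount;
}

// Compile-time check that no virtual functions (and no vtable pointer) remain.
static_assert (!std::is_polymorphic<GfxDevice>::value,
               "GfxDevice should not be polymorphic on this platform");
#endif
```

<p>Calls to <code>Draw</code> are now ordinary direct calls that the compiler is free to inline.</p>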
<p>But we still have the other platforms, where there can be more than one GfxDevice implementation, and the decision for which one to use is made at runtime. Like our good old friend the PC: you could use Direct3D 9 or Direct3D 11 or OpenGL, based on the OS, GPU capabilities or user&#8217;s preference. Or a mobile platform where you don&#8217;t know whether OpenGL ES 2.0 will be available and you&#8217;d have to fall back to OpenGL ES 1.1.</p>
<p><strong>Let&#8217;s think about what virtual functions actually are</strong></p>
<p>How do virtual functions work? Usually they work like this: each object gets a &#8220;pointer to a virtual function table&#8221; as its first hidden member. The virtual function table (vtable) is then just an array of pointers to where the functions are in the code. Something like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/vtable1.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/vtable1.png" alt="" title="How virtual functions work" width="535" height="371" class="alignnone size-full wp-image-615" /></a><br />
The key points are: 1) each object&#8217;s data starts with a vtable pointer, and 2) vtable layout for classes implementing the same interface is the same.</p>
<p>When the compiler generates code for something like this:</p>
<blockquote><pre>
device->Draw (kPrimTriangles, 1337);
</pre>
</blockquote>
<p>it will generate something like the following pseudo-assembly:</p>
<blockquote><pre>
vtable = load pointer from [device] address
drawlocation = vtable + 3*PointerSize<em> ; since Draw is at index [3] in vtable</em>
drawfunction = load pointer from [drawlocation] address
pass device pointer, kPrimTriangles and 1337 as arguments
call into code at [drawfunction] address
</pre>
</blockquote>
<p>This code will work no matter if device is of GfxDeviceGLES20 or GfxDeviceGLES11 kind. For both cases, the first pointer in the object will point to the appropriate vtable, and the fourth pointer in the vtable will point to the appropriate Draw function.</p>
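<p>The same mechanism can be spelled out by hand in C++. This is only an emulation: real vtable layout is up to the compiler and ABI, and the <code>Device</code>, <code>VTable</code> and <code>CallDraw</code> names are invented for this sketch. It does show the two dependent loads followed by the indirect call.</p>

```cpp
#include <cassert>

struct Device;

// Hand-written stand-in for a compiler-generated vtable.
typedef int (*DrawFunc) (Device* self, int primCount);
struct VTable {
    void* slots[3];   // placeholders for SetShader, SetTexture, SetGeometry
    DrawFunc Draw;    // index [3], as in the pseudo-assembly
};

struct Device {
    const VTable* vtable;   // the "hidden" first member
    int drawCalls;
};

// Two implementations; the return value just identifies which one ran.
static int DrawGLES20 (Device* self, int /*primCount*/) { ++self->drawCalls; return 20; }
static int DrawGLES11 (Device* self, int /*primCount*/) { ++self->drawCalls; return 11; }

static const VTable kVTableGLES20 = { { 0, 0, 0 }, DrawGLES20 };
static const VTable kVTableGLES11 = { { 0, 0, 0 }, DrawGLES11 };

// What device->Draw (...) boils down to.
int CallDraw (Device* device, int primCount) {
    assert (device && device->vtable);
    const VTable* vtable = device->vtable;   // load #1: vtable pointer
    DrawFunc draw = vtable->Draw;            // load #2: function pointer
    return draw (device, primCount);         // indirect call
}
```

<p>Swapping the <code>vtable</code> pointer is all it takes to change which <code>Draw</code> runs; the calling code stays identical.</p>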
<p>By the way, the above illustrates the overhead of a virtual function call. If we assume a platform where we have an in-order CPU and reading from memory takes 500 CPU cycles (which is not far from the truth for current consoles), then if nothing we need is in the CPU cache yet, this is what actually happens:</p>
<blockquote><pre>
vtable = load pointer from [device] address
<em>; <strong>wait 500 cycles</strong> until the pointer arrives</em>
drawlocation = vtable + 3*PointerSize
drawfunction = load pointer from [drawlocation] address
<em>; <strong>wait 500 cycles</strong> until the pointer arrives</em>
pass device pointer, kPrimTriangles and 1337 as arguments
call into code at [drawfunction] address
<em>; <strong>wait 500 cycles</strong> until code at that address is loaded</em>
</pre>
</blockquote>
<p><strong>Can we do better?</strong></p>
<p>Look at the picture in the previous paragraph and remember the &#8220;wait 500 cycles&#8221; for each pointer we are chasing. Can we reduce the number of pointer chases? Of course we can: why not ditch the vtable altogether, and just put function pointers directly into the GfxDevice object?</p>
<blockquote><p>Virtual tables are implemented in this way mostly to save space. If we had 10000 objects of some class that has 20 virtual methods, we only pay one pointer overhead per object (40000 bytes on 32 bit architecture) and we store the vtable (20*4=80 bytes on 32 bit arch) just once, in total 39.14 kilobytes.<br />
If we moved all function pointers into the objects themselves, we&#8217;d need to store 20 function pointers in each object, which would be 781.25 kilobytes! Clearly this approach does not scale with increasing object instance counts.
</p></blockquote>
<p>However, how many GfxDevice object instances do we <em>really</em> have? Most often&#8230; <em>exactly one</em>.</p>
<p><strong>Approach #3: function pointers</strong></p>
<p>If we move function pointers to the object itself, we&#8217;d have something like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/novtable2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/novtable2.png" alt="" title="No vtable!" width="337" height="356" class="alignnone size-full wp-image-621" /></a></p>
<p>There&#8217;s no built-in language support for implementing this in C++ however, so that would have to be done manually. Something like:</p>
<blockquote><pre>
struct GfxDeviceFunctions {
    SetShaderFunc SetShader;
    SetTextureFunc SetTexture;
    SetGeometryFunc SetGeometry;
    DrawFunc Draw;
};
class GfxDeviceGLES20 : public GfxDeviceFunctions {
    // ...
};
</pre>
</blockquote>
<p>And then when creating a particular GfxDevice, you have to fill in the function pointers yourself. Also, the functions so far were member functions that magically take a &#8220;this&#8221; parameter; it&#8217;s hard to use them as plain function pointers without resorting to the clumsy C++ member function pointer syntax and its related issues.</p>
<p>We can be more explicit, C style, and instead just have the functions be static, taking &#8220;this&#8221; parameter directly:</p>
<blockquote><pre>
class GfxDeviceGLES20 : public GfxDeviceFunctions {
    // ...
    static void DrawImpl (GfxDevice* self, PrimitiveType prim, int primCount);
    // ...
};
</pre>
</blockquote>
<p>Code that uses it would look like this then:</p>
<blockquote><pre>
device->Draw (device, kPrimTriangles, 1337);
</pre>
</blockquote>
<p>and it would generate the following pseudo-assembly:</p>
<blockquote><pre>
drawlocation = device + 3*PointerSize
drawfunction = load pointer from [drawlocation] address
<em>; <strong>wait 500 cycles</strong> until the pointer arrives</em>
pass device pointer, kPrimTriangles and 1337 as arguments
call into code at [drawfunction] address
<em>; <strong>wait 500 cycles</strong> until code at that address is loaded</em>
</pre>
</blockquote>
<p>Look at that, one of the &#8220;wait 500 cycles&#8221; stalls is gone!</p>
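<p>Put together, the pieces above could be sketched like this. Only <code>Draw</code> is shown, the <code>self</code> parameter is typed as <code>GfxDeviceFunctions*</code> to keep the sketch self-contained, and <code>trianglesDrawn</code> and <code>DrawThroughPointer</code> are invented so that the call has a visible effect.</p>

```cpp
#include <cassert>

struct GfxDeviceFunctions;
typedef int (*DrawFunc) (GfxDeviceFunctions* self, int primCount);

// Function pointers stored directly in the object; the other three
// (SetShader, SetTexture, SetGeometry) are omitted for brevity.
struct GfxDeviceFunctions {
    DrawFunc Draw;
};

class GfxDeviceGLES20 : public GfxDeviceFunctions {
public:
    GfxDeviceGLES20 () : trianglesDrawn (0) {
        Draw = &DrawImpl;   // fill in the "table" by hand
    }
    int trianglesDrawn;     // placeholder state for illustration
private:
    // Static function taking "this" explicitly, C style.
    static int DrawImpl (GfxDeviceFunctions* self, int primCount) {
        GfxDeviceGLES20* dev = static_cast<GfxDeviceGLES20*> (self);
        dev->trianglesDrawn += primCount;
        return dev->trianglesDrawn;
    }
};

// Calling code: one load of the function pointer, then the indirect call.
int DrawThroughPointer (GfxDeviceFunctions* device, int primCount) {
    assert (device && device->Draw);
    return device->Draw (device, primCount);
}
```

<p>The price is that the constructor has to remember to fill in every pointer; nothing will complain at compile time if one is left unset.</p>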
<p><strong>More C style</strong></p>
<p>We could move function pointers outside of GfxDevice if we want to, and just make them global:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/globalfuncs.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/globalfuncs.png" alt="" title="Global function pointers" width="529" height="358" class="alignnone size-full wp-image-624" /></a></p>
<p>In GLES1.1 case, that global GfxDevice funcs block would point to different pieces of code. And the pseudocode for this:</p>
<blockquote><pre>
// global variables!
SetShaderFunc GfxSetShader;
SetTextureFunc GfxSetTexture;
SetGeometryFunc GfxSetGeometry;
DrawFunc GfxDraw;
// GLES2.0 implementation:
void GfxDrawGLES20 (GfxDevice* self, PrimitiveType prim, int primCount) { /* ... */ }
</pre>
</blockquote>
<p>Code that uses it:</p>
<blockquote><pre>
GfxDraw (device, kPrimTriangles, 1337);
</pre>
</blockquote>
<p>and the pseudo-assembly:</p>
<blockquote><pre>
drawfunction = load pointer from [GfxDraw variable] address
<em>; wait 500 cycles until the pointer arrives</em>
pass device pointer, kPrimTriangles and 1337 as arguments
call into code at [drawfunction] address
<em>; wait 500 cycles until code at that address is loaded</em>
</pre>
</blockquote>
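<p>Something still has to point the globals at the right implementation once at startup. A possible sketch, where <code>InitGfxFunctions</code>, the <code>hasGLES20</code> flag and the fields of <code>GfxDevice</code> are assumptions made for illustration:</p>

```cpp
#include <cassert>

// Hypothetical minimal device state, just enough to observe the calls.
struct GfxDevice {
    int trianglesDrawn;
    int lastBackend;   // 20 or 11, set by whichever implementation ran
};

typedef void (*DrawFunc) (GfxDevice* self, int primCount);

// Global function pointer, as in the listing above.
DrawFunc GfxDraw = 0;

static void GfxDrawGLES20 (GfxDevice* self, int primCount) {
    self->trianglesDrawn += primCount;
    self->lastBackend = 20;
}

static void GfxDrawGLES11 (GfxDevice* self, int primCount) {
    self->trianglesDrawn += primCount;
    self->lastBackend = 11;
}

// Called once at startup, after detecting what the device supports.
void InitGfxFunctions (bool hasGLES20) {
    GfxDraw = hasGLES20 ? GfxDrawGLES20 : GfxDrawGLES11;
    assert (GfxDraw);
}
```

<p>After that, <code>GfxDraw (device, kPrimTriangles, 1337)</code> style calls work the same no matter which backend was picked.</p>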
<p><strong>Is it worth it?</strong></p>
<p>I can hear some saying, &#8220;what? throwing away C++ OOP and implementing the same in almost raw C?! you&#8217;re crazy!&#8221;</p>
<p>Whether going the above route is better or worse is mostly a matter of programming style and preferences. It does get rid of one &#8220;wait 500 cycles&#8221; in the worst case for sure. And yes, to get that you do lose some of the automagic syntax sugar in C++.</p>
<p>Is it worth it? Like always, depends on a lot of things. But if you do find yourself pondering the virtual function overhead for singleton-like objects, or especially if you do see that your profiler reports cache misses when calling into them, at least you&#8217;ll know one of the many possible alternatives, right?</p>
<p>And yeah, another alternative that&#8217;s easy to do on some platforms? Just put different GfxDevice implementations into dynamically loaded libraries, exposing the same set of functions. Which would end up being <em>very</em> similar to the last approach of &#8220;store function pointer table globally&#8221;, except you&#8217;d get some compiler syntax sugar to make it easier; and you wouldn&#8217;t even need to load the code that is not going to be used.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/02/01/the-virtual-and-no-virtual/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>iOS shader tricks, or it&#8217;s 2001 all over again</title>
		<link>http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/</link>
		<comments>http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 07:43:57 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=592</guid>
		<description><![CDATA[I was recently optimizing some OpenGL ES 2.0 shaders for iOS/Android, and it was funny to see how performance tricks that were cool in 2001 are having their revenge again. Here&#8217;s a small example of starting with a normalmapped Blinn-Phong shader and optimizing it to run several times faster. Most of the clever stuff below [...]]]></description>
			<content:encoded><![CDATA[<p>I was recently optimizing some OpenGL ES 2.0 shaders for iOS/Android, and it was funny to see how performance tricks that were cool in 2001 are having their revenge again. Here&#8217;s a small example of starting with a normalmapped Blinn-Phong shader and optimizing it to run several times faster. Most of the clever stuff below was actually done by <a href="http://twitter.com/#!/__ReJ__">ReJ</a>, props to him!</p>
<p>Here&#8217;s a small test I&#8217;ll be working on: just a single plane with albedo and normal map textures:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump1.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump1-150x150.jpg" alt="" title="iOS Bumped Specular" width="150" height="150" class="alignnone size-thumbnail wp-image-593" /></a></p>
<p><span id="more-592"></span>I&#8217;ll be testing on iPhone 3Gs with iOS 4.2.1. Timer is started before glClear() and stopped after glFinish() that I added just after drawing the mesh.</p>
<p>Let&#8217;s start with an initial na&iuml;ve shader version:<br />
<script src="https://gist.github.com/783784.js"> </script></p>
<p>Should be pretty self-explanatory to anyone who&#8217;s familiar with tangent space normal mapping and Blinn-Phong BRDF. Running time: <strong>24.5 milliseconds</strong>. On iPhone 4&#8217;s Retina resolution, this would be about 4x slower!</p>
<p>What can we do next? On mobile platforms using appropriate precision of variables is often very important, especially in a fragment shader. So let&#8217;s go and add highp/mediump/lowp qualifiers to the fragment shader: <a href="https://gist.github.com/783703/05e78340b12739e853ce031bd0388430ea95f2a6">shader source</a></p>
<p>Still the same running time! Alas, iOS does not have low level shader analysis tools, so we can&#8217;t really tell why that is happening. We could be limited by something else (e.g. normalizing vectors and computing pow() being the bottlenecks that run in parallel with all low precision stuff), or the driver might be promoting most of our computations to higher precision because it feels like it. It&#8217;s a magic box!</p>
<p>Let&#8217;s start approximating instead. How about computing normalized view direction per vertex, and interpolating that for the fragment shader? It won&#8217;t be entirely &#8220;correct&#8221;, but hey, it&#8217;s a phone we&#8217;re talking about. <a href="https://gist.github.com/783703/1e4fd0daa384d308d125a748985e8e203e49625a">shader source</a></p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump3.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump3-150x150.jpg" alt="" title="iOS Bumped Specular, wrong precision!" width="150" height="150" class="alignright size-thumbnail wp-image-594" /></a><br />
15 milliseconds! But&#8230; the rendering is wrong; everything turned white near the bottom of the screen. Turns out PowerVR SGX (the GPU in all current iOS devices) really does mean &#8220;low precision&#8221; when we add two lowp vectors and normalize the result. Let&#8217;s try promoting one of them to medium precision with a &#8220;varying mediump vec3 v_viewdir&#8221;: <a href="https://gist.github.com/783703/591eb83dacaae3840cc4e4d3d8b95a4fc3abdd65">shader source</a></p>
<p>That fixed rendering, but we&#8217;re back to 24.5 milliseconds. <em>Sad shader writers are sad&#8230; oh shader performance analysis tools, where art thou?</em></p>
<p>Let&#8217;s try approximating some more: compute half-vector in the vertex shader, and interpolate normalized value. This would get rid of all normalizations in the fragment shader. <a href="https://gist.github.com/783703/6360c2912b860aa30415e5120ef147169274cd71">shader source</a></p>
<p><strong>16.3</strong> milliseconds, not too bad! We still have pow() computed in the fragment shader, and that one is probably not the fastest operation there&#8230;</p>
<p>Almost a decade ago, a very common trick was to use a lookup texture to do the lighting. For example, a 2D texture indexed by (N.L, N.H). Since all lighting data would be &#8220;baked&#8221; into the texture, it does not necessarily have to be Blinn-Phong even; we can prepare faux-anisotropic, metallic, toon-shading or other fancy BRDFs there, as long as they can be expressed in terms of N.L and N.H. So let&#8217;s try creating 128&#215;128 RGBA lookup texture and use that: <a href="https://gist.github.com/783703/87f1cf5529d644cab16123550e809e9f7598f4f3">shader source</a></p>
<p>Quick &amp; not super efficient code to create the lighting lookup texture for Blinn-Phong:<br />
<script src="https://gist.github.com/783759.js"> </script></p>
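<p>For readers without access to the gist, here is a rough C++ sketch of what baking such a table can look like. It is not the code from the gist: the channel layout (diffuse in RGB, specular in alpha), the function name and the specular power parameter are all assumptions.</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Bakes a width x height RGBA8 table. The x axis is N.L and the y axis is N.H,
// both mapped from 0..1. RGB holds the diffuse term, A holds the specular term.
std::vector<unsigned char> BakeBlinnPhongLUT (int width, int height, float specPower) {
    assert (width > 1 && height > 1);
    std::vector<unsigned char> pixels (width * height * 4);
    for (int y = 0; y < height; ++y) {
        float nh = float (y) / float (height - 1);
        // pow() is evaluated once per row here instead of once per fragment.
        unsigned char spec = (unsigned char) (std::pow (nh, specPower) * 255.0f + 0.5f);
        for (int x = 0; x < width; ++x) {
            float nl = float (x) / float (width - 1);
            unsigned char diff = (unsigned char) (nl * 255.0f + 0.5f);
            unsigned char* p = &pixels[(y * width + x) * 4];
            p[0] = p[1] = p[2] = diff;
            p[3] = spec;
        }
    }
    return pixels;
}
```

<p>Storing the result in 8 bits per channel is one plausible source of the precision loss in the specular term.</p>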
<p><strong>9.1</strong> milliseconds! We lost some precision in the specular though (it&#8217;s dimmer):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump6.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump6-150x150.jpg" alt="" title="iOS Bumped Specular via texture LUT" width="150" height="150" class="alignnone size-thumbnail wp-image-595" /></a></p>
<p>What else can be done? Notice that we clamp N.L and N.H values in the fragment shader, but this could be done just as well by the texture sampler, if we set texture&#8217;s addressing mode to CLAMP_TO_EDGE. Let&#8217;s get rid of the clamps: <a href="https://gist.github.com/783703/e24a2475fded83d2196372c8092a0d8de80a98eb">shader source</a></p>
<p>This is 8.3 milliseconds, or <strong>7.6</strong> milliseconds if we reduce our lighting texture resolution to 32&#215;128.</p>
<p>Should we stop there? Not necessarily. For example, the shader is still multiplying albedo with a per-material color. Maybe that&#8217;s not very useful and can be let go. Maybe we can also make specular be always white?<br />
<script src="https://gist.github.com/783703.js"> </script></p>
<p>How fast is this? <strong>5.9 milliseconds</strong>,&nbsp;or over <strong>4 times</strong> faster than our original shader.</p>
<p>Could it be made faster? Maybe; that&#8217;s an exercise for the reader :) I tried computing just the RGB color channels and setting alpha to zero, but that got slightly slower. Without real shader analysis tools it&#8217;s hard to see where or if additional cycles could be squeezed out.</p>
<p>I&#8217;m adding <a href='http://aras-p.info/blog/wp-content/uploads/2011/02/iOSShaderPerf.zip'>Xcode project with sources, textures and shaders of this experiment</a>. Notes about it: only tested on iPhone 3Gs (probably will crash on iPhone 3G, and iPad will have wrong aspect ratio). Might not work at all! Shader is read from Resources/Shaders/shader.txt, next to it are shader versions of the steps of this experiment. Enjoy!</p>
<p><em>This is a cross post from altdevblogaday: <a href="http://altdevblogaday.com/ios-shader-tricks-or-its-2001-all-over-again">http://altdevblogaday.com/ios-shader-tricks-or-its-2001-all-over-again</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>GLSL Optimizer</title>
		<link>http://aras-p.info/blog/2010/09/29/glsl-optimizer/</link>
		<comments>http://aras-p.info/blog/2010/09/29/glsl-optimizer/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 10:39:21 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=561</guid>
		<description><![CDATA[During development of Unity 3.0, I was not-so-pleasantly surprised to see that our cross-compiled shaders run slow on iPhone 3Gs. And by &#8220;slow&#8221;, I mean SLOW; at the speeds of &#8220;stop the presses, we can not ship brand new OpenGL ES 2.0 support with THAT performance&#8221;. Back story Take this HLSL pixel shader for particles, [...]]]></description>
			<content:encoded><![CDATA[<p>During development of <a href="http://unity3d.com/unity/whats-new/unity-3">Unity 3.0</a>, I was not-so-pleasantly surprised to see that our <a href="http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/">cross-compiled</a> shaders run <i>slow</i> on iPhone 3Gs. And by &#8220;slow&#8221;, I mean <strong>SLOW</strong>; at the speeds of &#8220;stop the presses, we can not ship brand new OpenGL ES 2.0 support with THAT performance&#8221;.</p>
<p><span id="more-561"></span><br />
<b>Back story</b></p>
<p>Take this HLSL pixel shader for particles, which does nothing but multiply the texture by the per-vertex color:</p>
<blockquote><p><code>
<pre>
half4 frag (v2f i) : COLOR { return i.color * tex2D (_MainTex, i.texcoord); }
</pre>
<p></code></p></blockquote>
<p>This is about as simple as it can get; should be one texture fetch and one multiply for the GPU.</p>
<p>Now <i>of course</i>, when HLSL gets cross-compiled into GLSL, it is augmented by some dummy functions/moves to match GLSL&#8217;s semantics of &#8220;a function called main that takes no arguments and returns no value&#8221;. So you get something like this in GLSL:</p>
<blockquote><p><code>
<pre>
vec4 frag (in v2f i) { return i.color * texture2D (_MainTex, i.texcoord); }
void main() {
    vec4 xl_retval;
    v2f xlt_i;
    xlt_i.color = gl_Color;
    xlt_i.texcoord = gl_TexCoord[0];
    xl_retval = frag (xlt_i);
    gl_FragData[0] = xl_retval;
}
</pre>
<p></code></p></blockquote>
<p>Makes sense. The original function was translated, and main() got added that fills in the input structure, calls the function and writes result to gl_FragData[0] (aka gl_FragColor).</p>
<p>Lo and behold, the above (with some OpenGL ES 2.0 specific stuff added, like precision qualifiers, definitions of varyings etc.) runs like sh*t on a mobile platform.</p>
<p>Which probably means <b>mobile platform drivers are quite bad at optimizing GLSL</b>. I mostly tested iOS, but some tests on Android indicate that the situation is the same (maybe even worse, depending on the exact kind of Android you have). Which is sad, since said platforms also do not have any way to precompile shaders offline, where they could afford good but slow compilers.</p>
<p>Now of course, if you&#8217;re writing GLSL shaders by hand, you&#8217;re probably writing close to optimal code, with no redundant data moves or wrapper functions. But if you&#8217;re cross-compiling them from Cg/HLSL, or generating from some shader fragments, or from visual shader editors, you probably depend on shader compiler being decent at optimizing redundant bits.</p>
<p><b>GLSL Optimizer</b></p>
<p>Around the same time I accidentally discovered that the <a href="http://mesa3d.org/">Mesa 3D</a> guys are working on a new GLSL compiler, dubbed <a href="http://cgit.freedesktop.org/mesa/mesa/log/?h=glsl2">GLSL2</a>. I looked at the code and I liked it a lot; very hackable and &#8220;no bullshit&#8221; approach. So I took Mesa&#8217;s GLSL compiler and made it output GLSL back after it has done all the optimizations.</p>
<p>Here it is: <a href="http://github.com/aras-p/glsl-optimizer"><b>http://github.com/aras-p/glsl-optimizer</b></a></p>
<p>It reads GLSL, does some architecture independent optimizations (dead code removal, algebraic simplifications, constant propagation, constant folding, inlining, &#8230;) and spits out &#8220;optimized&#8221; GLSL back.</p>
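<p>To give a feel for what two of those passes do, here is a toy C++ sketch of constant folding and one algebraic simplification on a tiny expression tree. It is purely illustrative and has nothing to do with Mesa&#8217;s actual IR or pass structure; every name in it is invented.</p>

```cpp
#include <cassert>
#include <memory>

// Toy expression tree: a constant, an opaque variable, or a multiply.
struct Expr {
    enum Kind { kConst, kVar, kMul } kind;
    float value;                      // used by kConst only
    std::shared_ptr<Expr> lhs, rhs;   // used by kMul only
};
typedef std::shared_ptr<Expr> ExprPtr;

ExprPtr Const (float v) {
    ExprPtr e (new Expr ());
    e->kind = Expr::kConst;
    e->value = v;
    return e;
}

ExprPtr Var () {
    ExprPtr e (new Expr ());
    e->kind = Expr::kVar;
    return e;
}

ExprPtr Mul (const ExprPtr& a, const ExprPtr& b) {
    ExprPtr e (new Expr ());
    e->kind = Expr::kMul;
    e->lhs = a;
    e->rhs = b;
    return e;
}

// Bottom-up: fold const*const, then drop multiplications by 1.
ExprPtr Optimize (const ExprPtr& e) {
    assert (e);
    if (e->kind != Expr::kMul)
        return e;
    ExprPtr a = Optimize (e->lhs);
    ExprPtr b = Optimize (e->rhs);
    if (a->kind == Expr::kConst && b->kind == Expr::kConst)
        return Const (a->value * b->value);   // constant folding
    if (b->kind == Expr::kConst && b->value == 1.0f)
        return a;                             // x * 1 -> x
    if (a->kind == Expr::kConst && a->value == 1.0f)
        return b;                             // 1 * x -> x
    return Mul (a, b);
}
```

<p>So <code>x * (0.5 * 2.0)</code> first folds to <code>x * 1.0</code> and then collapses to plain <code>x</code>. A real compiler runs many such passes repeatedly until nothing changes any more.</p>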
<p><b>Results</b></p>
<p>Take the simple particle shader example above. GLSL Optimizer turns it into:</p>
<blockquote><p><code>
<pre>
void main() {
    gl_FragData[0] =
        (gl_Color.xyzw * texture2D (_MainTex, gl_TexCoord[0].xy)).xyzw;
}
</pre>
<p></code></p></blockquote>
<p>Save for redundant swizzle outputs (on my todo list), this is pretty much what you&#8217;d be writing by hand. No redundant moves, function call inlined, no extra temporaries, sweet!</p>
<p>How much difference does this make?<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptParticlesNo.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptParticlesNo.jpg" alt="" title="Particles, GLSL not optimized" width="160" height="240" /></a><a href="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptParticlesYes.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptParticlesYes.jpg" alt="" title="Particles, optimized GLSL" width="160" height="240" /></a><br />
Lots of particles, non-optimized GLSL on the left; optimized GLSL on the right (click for larger image). <b>Yep, it&#8217;s 236 vs. 36 milliseconds/frame</b> (4 vs. 27 FPS).</p>
<p>This result is for iPhone 3Gs running iOS 4.1. Some Android results: Motorola Droid (some PowerVR GPU): 537 vs. 223 ms; Nexus One (Snapdragon 8250 w/ Adreno GPU): 155 vs. 155 ms (yay! good drivers!); Samsung Galaxy S (some PowerVR GPU): 200 vs. 60 ms. All tests were run at native device resolutions, so do not take this as a performance comparison between devices.</p>
<p>What about a more complex shader example? Let&#8217;s try per-pixel lit Diffuse shader (which is quite simple, but will do ok as &#8220;complex shader&#8221; example for a mobile platform). You can see that the GLSL code below is <a href="http://aras-p.info/blog/2010/07/16/surface-shaders-one-year-later/">mostly auto-generated</a>; writing it by hand wouldn&#8217;t produce that many data moves, unused struct members etc. Cg compiles original shader code into 10 ALU and 1 TEX instructions for D3D9 pixel shader 2.0, and is able to optimize away all the redundant stuff.</p>
<blockquote><p><code>
<pre>
struct SurfaceOutput {
    vec3 Albedo;
    vec3 Normal;
    vec3 Emission;
    float Specular;
    float Gloss;
    float Alpha;
};
struct Input {
    vec2 uv_MainTex;
};
struct v2f_surf {
    vec4 pos;
    vec2 hip_pack0;
    vec3 normal;
    vec3 vlight;
};
uniform vec4 _Color;
uniform vec4 _LightColor0;
uniform sampler2D _MainTex;
uniform vec4 _WorldSpaceLightPos0;
void surf (in Input IN, inout SurfaceOutput o) {
    vec4 c;
    c = texture2D (_MainTex, IN.uv_MainTex) * _Color;
    o.Albedo = c.xyz;
    o.Alpha = c.w;
}
vec4 LightingLambert (in SurfaceOutput s, in vec3 lightDir, in float atten) {
    float diff;
    vec4 c;
    diff = max (0.0, dot (s.Normal, lightDir));
    c.xyz  = (s.Albedo * _LightColor0.xyz) * (diff * atten * 2.0);
    c.w  = s.Alpha;
    return c;
}
vec4 frag_surf (in v2f_surf IN) {
    Input surfIN;
    SurfaceOutput o;
    float atten = 1.0;
    vec4 c;
    surfIN.uv_MainTex = IN.hip_pack0.xy;
    o.Albedo = vec3 (0.0);
    o.Emission = vec3 (0.0);
    o.Specular = 0.0;
    o.Alpha = 0.0;
    o.Gloss = 0.0;
    o.Normal = IN.normal;
    surf (surfIN, o);
    c = LightingLambert (o, _WorldSpaceLightPos0.xyz, atten);
    c.xyz += (o.Albedo * IN.vlight);
    c.w = o.Alpha;
    return c;
}
void main() {
    vec4 xl_retval;
    v2f_surf xlt_IN;
    xlt_IN.hip_pack0 = vec2 (gl_TexCoord[0]);
    xlt_IN.normal = vec3 (gl_TexCoord[1]);
    xlt_IN.vlight = vec3 (gl_TexCoord[2]);
    xl_retval = frag_surf (xlt_IN);
    gl_FragData[0] = xl_retval;
}
</pre>
<p></code></p></blockquote>
<p>Running the above through GLSL optimizer produces this:</p>
<blockquote><p><code>
<pre>
uniform vec4 _Color;
uniform vec4 _LightColor0;
uniform sampler2D _MainTex;
uniform vec4 _WorldSpaceLightPos0;
void main ()
{
    vec4 c;
    vec4 tmpvar_32;
    tmpvar_32 = texture2D (_MainTex, gl_TexCoord[0].xy) * _Color;
    vec3 tmpvar_33;
    tmpvar_33 = tmpvar_32.xyz;
    float tmpvar_34;
    tmpvar_34 = tmpvar_32.w;
    vec4 c_i0_i1;
    c_i0_i1.xyz = ((tmpvar_33 * _LightColor0.xyz) *
    	(max (0.0, dot (gl_TexCoord[1].xyz, _WorldSpaceLightPos0.xyz)) * 2.0)).xyz;
    c_i0_i1.w = (vec4(tmpvar_34)).w;
    c = c_i0_i1;
    c.xyz = (c_i0_i1.xyz + (tmpvar_33 * gl_TexCoord[2].xyz)).xyz;
    c.w = (vec4(tmpvar_34)).w;
    gl_FragData[0] = c.xyzw;
}
</pre>
<p></code></p></blockquote>
<p>All functions got inlined, all unused variable assignments got eliminated, and most of the redundant moves are gone. There are some redundant moves left though (again, on my todo list), and the variables are assigned cryptic names after inlining. But otherwise, writing the equivalent shader by hand would be pretty close.</p>
<p>Difference between non-optimized and optimized GLSL in this case:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptDiffuseNo.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptDiffuseNo.jpg" alt="" title="Per-pixel Diffuse, GLSL not optimized" width="160" height="240" /></a><a href="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptDiffuseYes.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/09/glslOptDiffuseYes.jpg" alt="" title="Per-pixel Diffuse, optimized GLSL" width="160" height="240" /></a><br />
Non-optimized vs. optimized: <b>350 vs. 267 ms/frame</b> (2.9 vs. 3.7 FPS). Not bad either!</p>
<p><b>Closing thoughts</b></p>
<p>Pulling off this GLSL optimizer quite late in the <a href="http://unity3d.com/unity/whats-new/unity-3">Unity 3.0</a> release cycle was a risky move, but it did work.</p>
<p>Hats off to Mesa folks (Eric Anholt, Ian Romanick, Kenneth Graunke et al) for making an awesome codebase of the GLSL compiler! I haven&#8217;t merged in the latest GLSL compiler developments from the Mesa tree; they&#8217;ve implemented quite a few new compiler optimizations but I was too busy shipping Unity 3 already. Will try to merge them in soon-ish.</p>
<p>I&#8217;ve tested non-optimized vs. optimized GLSL a bit on a desktop platform (MacBook Pro, GeForce 8600M, OS X 10.6.4) and there is no observable speed difference. Which makes sense, and I <i>would have expected</i> mobile drivers to be good at optimization as well, but apparently that&#8217;s not the case.</p>
<p>Now of course, mobile drivers will improve over time, and I hope the offline &#8220;GLSL optimization&#8221; step will become obsolete in the future. I still think it makes perfect sense to fully compile shaders offline, so at runtime there&#8217;s no trace of GLSL at all (just load a binary blob of GPU microcode into the driver), but that&#8217;s a story for another day.</p>
<p>In the meantime, you&#8217;re welcome to try <a href="http://github.com/aras-p/glsl-optimizer">GLSL Optimizer</a> out!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/09/29/glsl-optimizer/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Compiling HLSL into GLSL in 2010</title>
		<link>http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/</link>
		<comments>http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/#comments</comments>
		<pubDate>Fri, 21 May 2010 19:59:38 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=523</guid>
		<description><![CDATA[Realtime shader languages these days have settled down into two camps: HLSL (or Cg, which for all practical reasons is the same) and GLSL (or GLSL ES, which is sufficiently similar). HLSL/Cg is used by Direct3D and the big consoles (Xbox 360, PS3). GLSL/ES is used by OpenGL and pretty much all modern mobile platforms [...]]]></description>
			<content:encoded><![CDATA[<p>Realtime shader languages these days have settled down into two camps: HLSL (or Cg, which for all practical reasons is the same) and GLSL (or GLSL ES, which is sufficiently similar). HLSL/Cg is used by Direct3D and the big consoles (Xbox 360, PS3). GLSL/ES is used by OpenGL and pretty much all modern mobile platforms (iPhone, Android, &#8230;).</p>
<p>Since shaders are more or less &#8220;assets&#8221;, having two different languages to deal with is not very nice. What, I&#8217;m supposed to write my shader twice just to support both (for example) D3D and iPad? You would think in 2010, almost a decade after high level realtime shader languages appeared, this problem would be solved&#8230; but it isn&#8217;t!</p>
<p><span id="more-523"></span>In <a href="http://unity3d.com/unity/coming-soon/unity-3">upcoming Unity 3.0</a>, we&#8217;re going to have OpenGL ES 2.0 for mobile platforms, where GLSL ES is the only option to write shaders in. However, almost all other platforms (Windows, 360, PS3) need HLSL/Cg.</p>
<p>I spent a bit of time trying to make <a href="http://developer.nvidia.com/object/cg_toolkit.html">Cg</a> spit out GLSL code. In theory it can, and I read somewhere that <a href="http://en.wikipedia.org/wiki/Id_Software">id</a> uses it for the OpenGL backend of <a href="http://en.wikipedia.org/wiki/Rage_(video_game)">Rage</a>&#8230; But I just couldn&#8217;t make it work. What&#8217;s possible for <a href="http://en.wikipedia.org/wiki/John_Carmack">John</a> apparently is not possible for mere mortals.</p>
<p>Then I looked at ATI&#8217;s <a href="https://github.com/aras-p/hlsl2glslfork">HLSL2GLSL</a>. That did produce GLSL shaders that were not absolutely horrible. So I started using it, and <em>(surprise!)</em> quickly ran into small issues here and there. Too bad development of the library stopped around 2006&#8230; on the plus side, it&#8217;s open source!</p>
<p>So I just forked it. Here it is: <a href="http://code.google.com/p/hlsl2glslfork/"><strong>http://code.google.com/p/hlsl2glslfork/</strong></a> (<a href="https://github.com/aras-p/hlsl2glslfork/commits/master">commit log here</a>). There are no prebuilt binaries or source drops right now, just a Mercurial repository. BSD license. Patches welcome.</p>
<p><em>Note on the codebase</em>: I don&#8217;t particularly like the codebase. It seems to be somewhat over-engineered code that was probably taken from the reference GLSL parser that 3DLabs once did, and adapted to parse HLSL and spit out GLSL. There are pieces of code that are unused, unfinished or duplicated. Judging from comments, some pieces of code have been in the hands of 3DLabs, ATI and NVIDIA (what good can come out of <em>that</em>?!). However, it <em>works</em>, and that&#8217;s the most important trait any code can have.</p>
<p><em>Note on the preprocessor</em>: I bumped into some preprocessor issues that couldn&#8217;t be easily fixed without first understanding someone else&#8217;s ancient code and then changing it significantly. Fortunately, Ryan Gordon&#8217;s project, <a href="http://icculus.org/mojoshader/">MojoShader</a>, happens to have a preprocessor that very closely emulates HLSL&#8217;s (including various quirks). So I&#8217;m using that to preprocess any source before passing it down to HLSL2GLSL. Kudos to Ryan!</p>
<p><em>Side note on MojoShader</em>: Ryan is also working on an HLSL->GLSL cross compiler in MojoShader. I like that codebase much more; will certainly try it out once it&#8217;s somewhat ready.</p>
<p><em>You can never have enough notes</em>: Google&#8217;s <a href="http://code.google.com/p/angleproject/">ANGLE project</a> (running OpenGL ES 2.0 on top of the Direct3D runtime+drivers) seems to be working on the opposite tool. For obvious reasons, they need to take GLSL ES shaders and produce D3D compatible shaders (HLSL or shader assembly/bytecode). The project seems to be moving fast; and if one day we decide to default to GLSL as the shader language in Unity, I&#8217;ll know where to look for a translator into HLSL :)</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Screenspace vs. mip-mapping</title>
		<link>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/</link>
		<comments>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 14:27:55 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=485</guid>
		<description><![CDATA[Just spent half a day debugging this, so here it is for the future reference of the internets. In a deferred rendering setup (see Game Angst for a good discussion of deferred shading &#038; lighting), lights are applied using data from screen-space buffers. Position, normal and other things are reconstructed from buffers and lighting is [...]]]></description>
			<content:encoded><![CDATA[<p><em>Just spent half a day debugging this, so here it is for the future reference of the internets.</em></p>
<p>In a deferred rendering setup (see <a href="http://gameangst.com/?p=141">Game Angst</a> for a good discussion of deferred shading &#038; lighting), lights are applied using data from screen-space buffers. Position, normal and other things are reconstructed from buffers and lighting is computed &#8220;in screen space&#8221;.</p>
<p>Because each light is applied to a portion of the screen, the pixels it computes can belong to different objects. If you use textures with <a href="http://en.wikipedia.org/wiki/Mipmap">mipmaps</a> anywhere in the lighting computation, <em>be careful</em>. The most common use for mipmapped light textures is light &#8220;cookies&#8221; (aka <a href="http://en.wikipedia.org/wiki/Gobo_(lighting)">Gobo</a>).</p>
<p>Let&#8217;s say we have a very simple scene with a spot light: <span id="more-485"></span><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieGood.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieGood.png" alt="" title="Deferred Cookie (Good)" width="610" height="458" class="alignnone size-full wp-image-486" /></a></p>
<p>Light&#8217;s angular attenuation comes from a texture like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png" alt="" title="cookie128" width="128" height="128" class="alignnone size-full wp-image-489" /></a></p>
<p>If the texture has mipmaps and you sample it using the &#8220;obvious&#8221; way (e.g. tex2Dproj), you can get something like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieBad.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieBad.png" alt="" title="Deferred Cookie (Bad!)" width="610" height="458" class="alignnone size-full wp-image-491" /></a></p>
<p><em>Black stuff around the sphere is no good!</em> It&#8217;s not the infamous half-texel offset in D3D9, not a driver bug, not a shader compiler bug and not nature trying to prevent you from writing a deferred renderer.</p>
<p>It&#8217;s the mipmapping.</p>
<p>Mipmaps of your cookie texture look like this (128&#215;128, 16&#215;16, 8&#215;8, 4&#215;4 shown):<br />
<img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png" alt="" title="128x128" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie16.png" alt="" title="16x16" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie8.png" alt="" title="8x8" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie4.png" alt="" title="4x4" width="128" height="128" /></p>
<p>Now, take two adjacent pixels, where one belongs to the edge of the sphere, and the other belongs to the background object (technically you take a 2&#215;2 block of pixels, but just two are enough to illustrate the point). When the light is applied, cookie texture coordinates for those pixels are computed. It can happen that the coordinates are <em>very</em> different, especially when pixels &#8220;belong&#8221; to entirely different surfaces that are quite far away from each other.</p>
<p>What does the GPU do when texture coordinates of adjacent pixels are very different? It chooses a smaller (lower resolution) mipmap level so that texel to pixel density roughly matches 1:1. On the edges in this &#8220;wrong&#8221; screenshot, it happens that a very small mipmap level is sampled, which is either black or white (see the 4&#215;4 mip level).</p>
<p>What to do here? You could disable mip-mapping (which is not good for performance and not good for image quality). You could drop some of the smallest mip levels, which might be enough and not that bad for performance. Another option is to manually supply the LOD level or derivatives to sampling instructions, using <em>something else</em> than cookie texture coordinates. For example, derivatives of view space position, or something like that. This might not be possible on lower shader models though.</p>
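<p>To make the mip selection above a bit more concrete, here is a rough C++ sketch of how a mip level can be estimated from the texture coordinate differences between adjacent pixels. <code>Float2</code> and <code>ApproxMipLevel</code> are names made up for this illustration, and real hardware differs in the details (anisotropic filtering, LOD bias, rounding), so treat it as an approximation and not as what any particular GPU does.</p>

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 2D vector type for this sketch.
struct Float2 { float x, y; };

// Rough estimate of the mip level picked for a pixel.
// duvdx / duvdy: difference in texture coordinates to the neighbouring pixel
// horizontally / vertically (what ddx/ddy would give in a shader).
// textureSize: size of mip level 0 in texels (square texture assumed).
inline float ApproxMipLevel(Float2 duvdx, Float2 duvdy, float textureSize)
{
    // Footprint of one screen pixel, measured in texels of mip level 0.
    float lenX = std::sqrt(duvdx.x * duvdx.x + duvdx.y * duvdx.y) * textureSize;
    float lenY = std::sqrt(duvdy.x * duvdy.x + duvdy.y * duvdy.y) * textureSize;
    float footprint = std::max(lenX, lenY);
    // One texel per pixel maps to level 0; each doubling of the footprint
    // moves one level further down the mip chain.
    return std::max(0.0f, std::log2(footprint));
}
```

<p>With a 128&#215;128 cookie, neighbouring pixels on the same surface that are one texel apart land on level 0. If the coordinates jump by 0.25 across the sphere edge, the footprint is 32 texels and the estimate is about level 5, which is the 4&#215;4 mip.</p>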
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Implementing fixed function T&amp;L in vertex shaders</title>
		<link>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/</link>
		<comments>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 06:08:50 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=364</guid>
		<description><![CDATA[Almost half a year ago I was wondering how to implement T&#038;L in vertex shaders. Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a technical report here. In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar [...]]]></description>
			<content:encoded><![CDATA[<p>Almost half a year ago I was wondering <a href="http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/">how to implement T&#038;L in vertex shaders</a>.</p>
<p>Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a <a href="http://aras-p.info/texts/VertexShaderTnL.html"><strong>technical report here</strong></a>.</p>
<p>In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using fixed function (I know it&#8217;s implemented as vertex shaders internally by the runtime/driver) on several different cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950).</p>
<p>What was unexpected: the most complex piece is not the vertex lighting! Most complexity is in how to route/generate texture coordinates and transform them. Huge combination explosion there.</p>
<p>Otherwise &#8211; I like! Here&#8217;s a link to the <a href="http://aras-p.info/texts/VertexShaderTnL.html">article again</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>How view on C++ changes over time</title>
		<link>http://aras-p.info/blog/2009/03/01/how-view-on-c-changes-over-time/</link>
		<comments>http://aras-p.info/blog/2009/03/01/how-view-on-c-changes-over-time/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 17:23:40 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=287</guid>
		<description><![CDATA[It&#8217;s funny how one&#8217;s view on things changes over time. Back in 2002, I wrote something that would be roughly translated like &#8220;C++ amazes me more and more&#8221;. In a positive sense! And I was talking about what is Boost.Spirit now. A reply on local game development forums I wrote today (again, rough translation): &#8220;C++ [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s funny how one&#8217;s view on things changes over time.</p>
<p>Back in 2002, I <a href="http://aras-p.info/relyzai00.html">wrote</a> something that would be roughly translated like &#8220;C++ amazes me more and more&#8221;. In a positive sense! And I was talking about what is <a href="http://spirit.sourceforge.net/">Boost.Spirit</a> now.</p>
<p>A <a href="http://www.gamedev.lt/viewtopic.php?p=19644#p19644">reply</a> on local game development forums I wrote today (again, rough translation): &#8220;C++ is very hard and quite a horrible language, maybe you should not use it unless there are no alternatives&#8221;.</p>
<p>That&#8217;s quite a change in attitude we have here!</p>
<p>I feel like many of the C++ horrors are a consequence of &#8220;it just somehow happened&#8221; (the whole template metaprogramming thing) or of the requirement for backwards compatibility with C. Or maybe not, but I do agree with what <a href="https://mollyrocket.com/forums/viewtopic.php?p=1955#1955">ryg says here</a>. Let&#8217;s play the internet memes:<br />
<img src="http://aras-p.info/blog/wp-content/uploads/2009/03/cppaccident.jpg" alt="C++ Accident" title="cppaccident" width="513" height="437" class="alignnone size-full wp-image-291" /></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/03/01/how-view-on-c-changes-over-time/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fixed function lighting in vertex shader &#8211; how?</title>
		<link>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/</link>
		<comments>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 20:32:49 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=261</guid>
		<description><![CDATA[Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to various artifacts that annoy people to hell. I don&#8217;t really know why that happens, [...]]]></description>
			<content:encoded><![CDATA[<p>Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to <a href="http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/">various artifacts</a> that annoy people to hell.</p>
<p>I don&#8217;t really know <em>why</em> that happens, because it seems that most modern cards don&#8217;t have fixed function units, so internally they are running shaders anyway. The DX9 runtime on Vista&#8217;s WDDM also seems to be only handing shaders to the driver internally. Still, for some reason somewhere the precision does not match&#8230;</p>
<p>How should such a task be approached?</p>
<p>My requirements are:</p>
<ul>
<li>Should handle any possible state combination in D3D fixed function T&#038;L.</li>
<li>D3D 9.0c, using vertex shader 2.0 is ok. For now I don&#8217;t care about OpenGL.</li>
<li>No HLSL at runtime. I don&#8217;t want to add a megabyte or more to Unity web player just for HLSL. DX9 shader assembly is ok, because we already have the assembler code.</li>
<li>Should work as fast as (or close to) the regular fixed function pipeline.</li>
</ul>
<p>I looked at ATI&#8217;s <a href="http://developer.amd.com/samples/FixedFuncShader/Pages/default.aspx">FixedFuncShader sample</a>. It&#8217;s an <strong>ubershader approach</strong>: one large (230 instructions or so) shader with static VS2.0 branching. It had some obvious places to optimize: I could get it down to 190 or so instructions, kill some <a href="http://msdn.microsoft.com/en-us/library/bb147316(VS.85).aspx">rcp</a>&#8217;s and reduce the amount of constant storage by 2x.</p>
<p>Still, it did not handle some things in the D3D T&#038;L or had some issues:</p>
<ul>
<li>It assumes one input UV, one output UV and no texture matrices. This place in T&#038;L gets quite convoluted &#8211; any input UVs or a texgen mode can be transformed by matrices of various sizes, and routed into any output UVs.</li>
<li>It was not using full T&#038;L lighting model. No biggie here.</li>
<li>I haven&#8217;t checked with NVShaderPerf or AMD ShaderAnalyzer yet, but last time I checked the static branch instruction was taking two clocks on some NV architecture. So ubershader approach does not come for free.</li>
</ul>
<p>Another thing I&#8217;m considering is to combine final shader(s) from <strong>assembly fragments</strong>, with some simple register allocation.</p>
<p>In T&#038;L shader code, there&#8217;s only a limited set of could-be-redundant computations, mostly computing world space position, camera space normal, view vector and so on (those could be used by lighting, texgen or fog). Those computations can be explicitly put into separate fragments, and later fragments could just use their result.</p>
<p>What is left then is some register allocation. A shader assembly fragment could want some temporary registers for internal use (this is simple, just give it a bunch of unused registers), also want some registers as input (from previous fragments), and save some output in registers.</p>
<p>Again, I haven&#8217;t checked with shader performance tools, but I <em>think, guess and hope</em> that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into hardware format. This would mean that <em>I</em> can be quite sloppy with it, i.e. don&#8217;t have to implement some super smart allocation scheme.</p>
<p>I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).</p>
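<p>To illustrate the register allocation idea (this is <em>not</em> the experimental code mentioned above, just a minimal sketch), here is one possible shape for a deliberately sloppy allocator. <code>Fragment</code>, <code>RegisterAllocator</code> and the value names are all hypothetical. It assumes the 12 temporary registers of vertex shader 2.0, never frees the registers of shared values, and lets scratch registers of every fragment reuse whatever space is left.</p>

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one assembly fragment: the named values it
// reads, the named values it produces, and how many scratch registers it
// needs while it runs.
struct Fragment
{
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
    int tempCount;
};

// Hands out temporary registers to named values across a chain of fragments.
class RegisterAllocator
{
public:
    // 12 is the number of temporary registers in vs_2_0.
    explicit RegisterAllocator(int registerCount = 12)
        : m_RegisterCount(registerCount), m_NextFree(0) {}

    // Returns false if the fragment reads a value that no earlier fragment
    // produced, or if the register budget runs out.
    bool Add(const Fragment& fragment)
    {
        for (const std::string& name : fragment.inputs)
            if (m_Registers.find(name) == m_Registers.end())
                return false;
        for (const std::string& name : fragment.outputs)
        {
            if (m_Registers.find(name) != m_Registers.end())
                continue; // already computed by an earlier fragment, reuse it
            if (m_NextFree >= m_RegisterCount)
                return false;
            m_Registers[name] = m_NextFree++; // stays allocated for later fragments
        }
        // Scratch registers are only live inside one fragment, so every
        // fragment can use the registers above the persistent values.
        return m_NextFree + fragment.tempCount <= m_RegisterCount;
    }

    // Register index assigned to a named value, or -1 if it is unknown.
    int RegisterOf(const std::string& name) const
    {
        std::map<std::string, int>::const_iterator it = m_Registers.find(name);
        return it == m_Registers.end() ? -1 : it->second;
    }

private:
    std::map<std::string, int> m_Registers;
    int m_RegisterCount;
    int m_NextFree;
};
```

<p>A fragment that computes world space position would list it as an output; a later fog or lighting fragment lists it as an input and asks <code>RegisterOf</code> which register to read. A smarter scheme would free registers after their last use, but if the driver redoes allocation anyway that might not matter.</p>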
<p>Does that make sense? Or did everyone solve those problems eons ago already?</p>
<p><strong>Edit</strong>: half a year later, I wrote a technical report on how I implemented all this: <a href="http://aras-p.info/texts/VertexShaderTnL.html">http://aras-p.info/texts/VertexShaderTnL.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Achievement of the week: MakeVistaDWMHappyDance</title>
		<link>http://aras-p.info/blog/2008/12/11/achievement-of-the-week-makevistadwmhappydance/</link>
		<comments>http://aras-p.info/blog/2008/12/11/achievement-of-the-week-makevistadwmhappydance/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 16:16:05 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=247</guid>
		<description><![CDATA[This was the function that I added: void GUIView::MakeVistaDWMHappyDance() { // Looks like Vista has some bug in DWM. Whenever we maximize or dock // a view, we must do something magic, otherwise // white stuff appears in place of the view. // See http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=4208117&#038;SiteID=1 bool earlierThanVista = systeminfo::GetOperatingSystemNumeric() &#60; 600; if( earlierThanVista ) return; [...]]]></description>
			<content:encoded><![CDATA[<p>This was the function that I added:</p>
<blockquote><pre>void GUIView::<strong>MakeVistaDWMHappyDance</strong>()
{
    // Looks like Vista has some bug in DWM. Whenever we maximize or dock
    // a view, we must do something magic, otherwise
    // white stuff appears in place of the view.
    // See http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=4208117&#038;SiteID=1

    bool earlierThanVista = systeminfo::GetOperatingSystemNumeric() &lt; 600;
    if( earlierThanVista )
        return;

    // What seems to work is drawing one pixel via GDI.
    // We draw it at (1,1) with usual background color.
    int grayColor = 0.61f * 255.0f;
    PAINTSTRUCT ps;
    BeginPaint(m_View, &#038;ps);
    SetPixel(ps.hdc, 1, 1, RGB(grayColor,grayColor,grayColor));
    EndPaint(m_View, &#038;ps);
}</pre>
</blockquote>
<p>I know. Reading from screen when Aero is on is slow, bad and wrong. But then, what do you do? It&#8217;s better than users staring at an all-white window just because Vista decided to draw it white, no matter what you think you&#8217;re drawing into it.</p>
<p>&#8230;still, <code>MakeVistaDWMHappyDance</code> is not nearly as cool as </p>
<blockquote><p>internal interface ICanHazCustomMenu { &#8230; }</p></blockquote>
<p> that Nicholas added a while ago.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/12/11/achievement-of-the-week-makevistadwmhappydance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t try to outsmart the compiler</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/</link>
		<comments>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/#comments</comments>
		<pubDate>Sat, 06 Dec 2008 21:58:04 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=245</guid>
		<description><![CDATA[The other day at work there was a need to flip an image vertically, in a way that did not bring in large portions of other code that deals with images. Flipping vertically is easy: for( int y = 0; y < height/2; ++y ) { memswap( img+y*width, img+(height-y-1)*width, width*sizeof(img[0]) ); } memswap function was done [...]]]></description>
			<content:encoded><![CDATA[<p>The other day at work there was a need to flip an image vertically, in a way that did not bring in large portions of other code that deals with images. Flipping vertically is easy:</p>
<blockquote><pre>for( int y = 0; y < height/2; ++y ) {
    memswap( img+y*width, img+(height-y-1)*width, width*sizeof(img[0]) );
}</pre>
</blockquote>
<p>memswap function was done this way:</p>
<blockquote><pre>// why isnt this in the std lib?
// using XOR to avoid tmp var
void memswap( void* m1, void* m2, size_t n )
{
    char *p = (char*)m1; char *q = (char*)m2;
    while ( n-- ) {
        *p ^= *q; *q ^= *p; *p ^= *q;
        p++; q++;
    }
}</pre>
</blockquote>
<p>The comment above the function was what triggered my interest. I just added:</p>
<blockquote><p>
// because it can be slower (local variable is likely in register;<br />
// whereas using XOR involves reads/writes to memory)
</p></blockquote>
<p>But then I got interested in this, I just <em>had to</em> check what happens in one or another case.</p>
<p>Using Apple's gcc 4.0.1 on Core 2 Duo, the above memory swapping code takes about 12.5 clock cycles per swapped image pixel (pixel = 4 bytes). The inner loop is this:</p>
<blockquote><pre>movzx  eax,BYTE PTR [edx-0x1]
xor    al,BYTE PTR [ecx-0x1]
mov    BYTE PTR [edx-0x1],al
xor    al,BYTE PTR [ecx-0x1]
mov    BYTE PTR [ecx-0x1],al
xor    BYTE PTR [edx-0x1],al
dec    ebx
inc    edx
inc    ecx
cmp    ebx,0xffffffff
jne    loopstart</pre>
</blockquote>
<p>So the loop is three memory reads, three writes and some increments of the pointers / loop counter. Visual C++ 2008 compiles it very similarly, just uses more complex addressing mode to save one loop counter:</p>
<blockquote><pre>movzx       edx,byte ptr [ecx+eax]
xor         byte ptr [eax],dl
mov         dl,byte ptr [eax]
xor         byte ptr [ecx+eax],dl
mov         dl,byte ptr [ecx+eax]
xor         byte ptr [eax],dl
dec         esi
inc         eax
test        esi,esi
jne         loopstart</pre>
</blockquote>
<p>What if we don't do this "XOR trick", and just swap the contents using a temporary variable?</p>
<blockquote><pre>
// ...
char t = *p; *p = *q; *q = t;
// ...
</pre>
</blockquote>
<p>Lo and behold, now it runs at 7 cycles / pixel (almost twice as fast), and the inner loop is two memory reads and two writes:</p>
<blockquote><pre>
movzx  edx,BYTE PTR [ebx-0x1]
movzx  eax,BYTE PTR [ecx-0x1]
mov    BYTE PTR [ebx-0x1],al
mov    BYTE PTR [ecx-0x1],dl
// ... incrementing pointers / counter here, like in previous case
</pre>
</blockquote>
<p>So yeah. The XOR trick is pretty much useless here - it's twice as slow. Hey, it can even be slower as images get larger - if tested on a 2048x2048 image, regular swap still takes 7 cycles/pixel, but XOR trick takes 55 cycles/pixel!</p>
<p>I guess XOR trick is useful only in quite rare situations, for example when you're inside of some inner loop and want to swap register values without spilling them to memory or using an additional register. Heh, <a href="http://en.wikipedia.org/wiki/XOR_swap_algorithm">Wikipedia has info on this</a>, so I'm not saying anything new :)</p>
<p>Now of course, if we happen to know that our pixels are 32 bits in size, there's no good reason to keep the loop in bytes. We can operate on integers instead:</p>
<blockquote><pre>
void memswapI( void* m1, void* m2, size_t n )
{
    size_t nn = n/sizeof(int);
    int *p = (int*)m1; int *q = (int*)m2;
    while ( nn-- ) {
        int t = *p; *p = *q; *q = t;
        p++; q++;
    }
}</pre>
</blockquote>
<p>This runs at 1.5 cycles/pixel (XOR variant at 2.5 cycles/pixel). The assembly is pretty much the same, just with 32 bit registers.</p>
<p>Another option? If you use STL, just use:</p>
<blockquote><pre>std::swap_ranges(p, p+n, q);</pre>
</blockquote>
<p>on the pixel datatype. On 32 bit pixels, this also runs at 1.5 cycles/pixel.</p>
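<p>Put together, the whole vertical flip could look something like the sketch below. The function name and the tightly packed 32 bit pixel layout are assumptions for the example, not the code we actually had:</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flips a tightly packed image of 32 bit pixels upside down, in place.
// Hypothetical helper: assumes pixels.size() == width * height.
inline void FlipImageVertically(std::vector<std::uint32_t>& pixels,
                                std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height / 2; ++y)
    {
        std::uint32_t* top = pixels.data() + y * width;
        std::uint32_t* bottom = pixels.data() + (height - y - 1) * width;
        // Swap one row of the top half with its mirror row in the bottom half.
        std::swap_ranges(top, top + width, bottom);
    }
}
```

<p>For an odd height the middle row is simply left alone, same as in the original loop.</p>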
<p>So yeah. Don't try to outsmart the compiler without measuring it.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Implicit to-pointer operators must die!</title>
		<link>http://aras-p.info/blog/2008/10/09/implicit-to-pointer-operators-must-die/</link>
		<comments>http://aras-p.info/blog/2008/10/09/implicit-to-pointer-operators-must-die/#comments</comments>
		<pubDate>Thu, 09 Oct 2008 13:15:26 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=223</guid>
		<description><![CDATA[For the sake of the nation, this operator must die! Seriously. Suppose there is some class, let&#8217;s say ColorRGBAf. That has four floats inside. Now, someone at some point decided to add this operator to it: operator float* () { /**/ } operator const float* () const { /**/ } Probably because it&#8217;s easier to [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>For the sake of the nation,<br />
this operator must die!</p></blockquote>
<p>Seriously. Suppose there is some class, let&#8217;s say <code>ColorRGBAf</code>. That has four floats inside. Now, someone at some point decided to add this operator to it:</p>
<blockquote><p>operator float* () { /**/ }<br />
operator const float* () const { /**/ }</p></blockquote>
<p>Probably because it&#8217;s easier to pass color to OpenGL this way, or something like that.</p>
<p>This is evil. Like, really <strong>evil</strong>. Especially if that class did not have comparison operators defined, and some totally unrelated code four years later does:</p>
<blockquote><p>if (color != oldColor) { /* &#8230; */ }</p></blockquote>
<p>Ouch! Sounds like someone will spend four hours debugging something that looks like an event routing issue that <em>only</em> happens on Windows and <em>only</em> with optimizations on <em>(yes, I just did that&#8230;)</em>.</p>
<p>What happens here? The compiler takes pointers to two colors and compares <em>the pointers</em>. If for some reason both colors are temporary objects, then it can even happen that <em>both</em> get folded into the same variable/register/whatnot. The pointers are the same. Ouch!</p>
<p>Implicit &#8220;nice&#8221; operators are just disguised evil. Remove that operator, add something like <code>GetPointer()</code> to the class if someone really wants to use that, and even better, make the comparison operators private and without implementations. Yes. Much better.</p>
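<p>A minimal sketch of what that could look like. The class body here is made up for illustration (only the name <code>ColorRGBAf</code> and <code>GetPointer()</code> come from the text above), and it takes a slightly different route for comparisons: instead of hiding them, it defines them by value, so <code>color != oldColor</code> can only ever mean one thing.</p>

```cpp
// Hypothetical stand-in for the ColorRGBAf class; the real one surely has more in it.
struct ColorRGBAf
{
    float rgba[4];

    ColorRGBAf(float r, float g, float b, float a)
    {
        rgba[0] = r; rgba[1] = g; rgba[2] = b; rgba[3] = a;
    }

    // Explicit accessor instead of operator float*(): callers have to ask for the pointer.
    float* GetPointer() { return rgba; }
    const float* GetPointer() const { return rgba; }
};

// Compares the four components. With no implicit conversion to float*,
// the compiler has no way to silently compare addresses instead.
inline bool operator==(const ColorRGBAf& x, const ColorRGBAf& y)
{
    for (int i = 0; i < 4; ++i)
        if (x.rgba[i] != y.rgba[i])
            return false;
    return true;
}

inline bool operator!=(const ColorRGBAf& x, const ColorRGBAf& y)
{
    return !(x == y);
}
```

<p>Passing the color to an API that wants a raw pointer then becomes <code>color.GetPointer()</code>, which is a few more characters and zero surprises.</p>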
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/10/09/implicit-to-pointer-operators-must-die/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How watchdog threads should NOT be done&#8230;</title>
		<link>http://aras-p.info/blog/2008/09/05/how-watchdog-threads-should-not-be-done/</link>
		<comments>http://aras-p.info/blog/2008/09/05/how-watchdog-threads-should-not-be-done/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 09:48:22 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=207</guid>
		<description><![CDATA[Here, a thread function that checks whether some tool got stuck: static void WatchdogFunc() { while( true ) { time_t now = time(NULL); Mutex::AutoLock lock(g_WatchdogMutex); if( now - g_StartTime > kWatchdogTimeout ) ComplainLoudlyAndDoSomething(); Thread::Sleep( 0.1f ); } } Mutex is taken because g_StartTime can be occasionally updated by the same tool. Yes, possibly a mutex [...]]]></description>
			<content:encoded><![CDATA[<p>Here, a thread function that checks whether some tool got stuck:</p>
<blockquote><pre>static void WatchdogFunc()
{
    while( true )
    {
        time_t now = time(NULL);
        Mutex::AutoLock lock(g_WatchdogMutex);
        if( now - g_StartTime > kWatchdogTimeout )
            ComplainLoudlyAndDoSomething();
        Thread::Sleep( 0.1f );
    }
}</pre></blockquote>
<p>The mutex is taken because g_StartTime can be occasionally updated by the same tool. Yes, possibly a mutex is overkill here, and an aligned variable + some memory fences should be enough (or just nothing), but hey, this is some random offline tool code.</p>
<p>What is horribly wrong with it?</p>
<p>Mutex is held locked for the whole duration of Sleep! That is, almost all the time; and other thread(s) barely have a chance to ever update g_StartTime.</p>
<p>And this is the code I&#8217;ve written. Oh stupid me.</p>
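<p>For the record, a sketch of one way to fix it: keep the lock inside a small helper so it is released before the sleep. This uses the C++11 standard library (<code>std::mutex</code>, <code>std::this_thread::sleep_for</code>) in place of the in-house <code>Mutex</code> and <code>Thread</code> classes above, and both the timeout value and the <code>WatchdogExpired</code> helper are made up for illustration.</p>

```cpp
#include <chrono>
#include <ctime>
#include <mutex>
#include <thread>

static std::mutex g_WatchdogMutex;
static std::time_t g_StartTime = 0;              // updated elsewhere in the tool, under the mutex
static const std::time_t kWatchdogTimeout = 60;  // seconds; placeholder value

void ComplainLoudlyAndDoSomething()
{
    // placeholder: log, kill the stuck tool, and so on
}

// The lock lives only for the duration of this function,
// so it is already released by the time the caller goes to sleep.
bool WatchdogExpired(std::time_t now)
{
    std::lock_guard<std::mutex> lock(g_WatchdogMutex);
    return now - g_StartTime > kWatchdogTimeout;
}

void WatchdogFunc()
{
    while (true)
    {
        if (WatchdogExpired(std::time(NULL)))
            ComplainLoudlyAndDoSomething();
        // No lock is held here, so other threads get a chance to update g_StartTime.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
```

<p>An extra pair of braces around the lock and the check inside the loop would do the same job; the helper just makes the lock scope harder to get wrong again.</p>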
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/09/05/how-watchdog-threads-should-not-be-done/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>It must be a bug in OS/compiler/&#8230;</title>
		<link>http://aras-p.info/blog/2008/07/16/it-must-be-a-bug-in-oscompiler/</link>
		<comments>http://aras-p.info/blog/2008/07/16/it-must-be-a-bug-in-oscompiler/#comments</comments>
		<pubDate>Wed, 16 Jul 2008 20:02:27 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[random]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=187</guid>
		<description><![CDATA[Ever looked at the code which is absolutely correct, yet runs incorrectly? Sometimes it looks like a genuine compiler bug. &#8220;I swear, mister! The compiler corrupts my code!&#8221; Look again. And again. Eventually you&#8217;ll find where your code is broken. (Of course, in some cases quite often the compiler is broken&#8230; GLSL, anyone?) Pimp my [...]]]></description>
			<content:encoded><![CDATA[<p>Ever looked at code that is <em>absolutely correct</em>, yet runs incorrectly? Sometimes it looks like a genuine compiler bug. <em>&#8220;I swear, mister! The compiler corrupts my code!&#8221;</em></p>
<p>Look again. And again. Eventually you&#8217;ll find where your code is broken.</p>
<p><em>(Of course, in some cases quite often the compiler is broken&#8230; GLSL, anyone?)</em></p>
<p><a href="http://wilshipley.com/blog/2008/07/pimp-my-code-part-15-greatest-bug-of.html">Pimp my code, part 15: The Greatest Bug of All</a> says the above in a much nicer way:</p>
<blockquote><p>Maybe the problem was there was some huge bug in Apple&#8217;s Mach, where if you open too many files in a short period of time, the filesystem tried to, like, cache the results, and the cache blew up, and as a result the filesystem incorrectly just would fail to open any more files, instead of flushing the cache.</p>
<p>&#8230;</p>
<p>I&#8217;ve also been around long enough to <em>know</em> that whenever I know the operating system must be bugged, since <em>my</em> code is correct, I should take a damn close look at my code. The old adage (not mine) is that 99% of the time operating system bugs are actually bugs in your program, and the other 1% of the time they are still bugs in your program, so look harder, dammit.</p></blockquote>
<p>A post well worth reading&#8230; about the process of investigating tricky bugs. And sincere as well. It&#8217;s so good that I&#8217;ll just quote it again:</p>
<blockquote><p>It&#8217;s a bug we should have caught. We should have spent the time to get the images in the 10,000 item file. I messed up.</p>
<p>Software is written by humans. Humans get tired. Humans become discouraged. They aren&#8217;t perfect beings. As developers, we want to pretend this isn&#8217;t so, that our software springs from our head whole and immaculate like the goddess Athena. Customers don&#8217;t want to hear us admit that we fail.</p>
<p>The measure of a man cannot be whether he ever makes mistakes, because he <em>will</em> make mistakes. It&#8217;s what he does in response to his mistakes. The same is true of companies.</p>
<p>We have to apologize, we have to fix the problem, and we have to learn from our mistakes.</p></blockquote>
<p>So very true.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/07/16/it-must-be-a-bug-in-oscompiler/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Depth bias and the power of deceiving yourself</title>
		<link>http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/</link>
		<comments>http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 06:52:19 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=176</guid>
		<description><![CDATA[In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, some amount of brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords [...]]]></description>
			<content:encoded><![CDATA[<p>In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, some number of the brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords for light cookies, or do something with tangent space, or calculate some texcoords for shadow mapping, and so on. The vertex lighting pass uses fixed function, because it&#8217;s the easiest way. It is possible to implement a fixed function lighting equivalent in vertex shaders, but we haven&#8217;t done that yet because of the complexities of Direct3D <em>and</em> OpenGL, the need to support shader model 1.1 and various other issues. Call me lazy.</p>
<p>And herein lies the problem: most often the precision of vertex transformations is not the same in fixed function versus programmable vertex pipelines. If you just draw some objects in multiple passes, mixing fixed function and programmable paths, this is roughly what you get (excuse my programmer&#8217;s art):<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenenobias.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenenobias-300x225.png" alt="Mixing fixed function and vertex shaders" title="scenenobias" width="300" height="225" class="alignnone size-medium wp-image-177" /></a></p>
<p><em>Not pretty at all!</em> This should have looked like this:<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenegoodbias.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenegoodbias-300x225.png" alt="All good here" title="scenegoodbias" width="300" height="225" class="alignnone size-medium wp-image-178" /></a></p>
<p>So what do we do to make it look like this? We &#8220;pull&#8221; (bias) some rendering passes slightly towards the camera, so there is no depth fighting.</p>
<p>Now, at the moment the Unity editor runs only on Macs, which use OpenGL. In there, most hardware configurations do not need this depth bias at all &#8211; they are able to generate the same results in fixed function and programmable pipelines. Only Intel cards do need the depth bias on Mac OS X (on Windows, AMD and Intel cards need depth bias). So people author their games using OpenGL, where it does not need depth bias in most cases.</p>
<p>How do you apply depth bias in OpenGL? Enable GL_POLYGON_OFFSET_FILL and set <a href="http://www.opengl.org/documentation/specs/man_pages/hardcopy/GL/html/gl/polygonoffset.html">glPolygonOffset</a> to something like -1, -1. This works.</p>
<p>How do you apply depth bias in Direct3D 9? <em>Conceptually</em>, you do the same. There are <a href="http://msdn.microsoft.com/en-us/library/bb205599(VS.85).aspx">DEPTHBIAS and SLOPESCALEDEPTHBIAS</a> render states that do just that. And so we did use them.</p>
<p><a href="http://forum.unity3d.com/viewtopic.php?t=8443">And people complained</a> about funky results on Windows.</p>
<p>And I&#8217;d look at their projects, see that they are using something like 0.01 for camera&#8217;s near plane and 1000.0 for the far plane, and tell them something along the lines of <em>&#8220;increase your near plane, stupid!&#8221;</em> (well ok, without the &#8220;stupid&#8221; part). And I&#8217;d explain all the above about mixing fixed function and vertex shaders, and how we do depth bias in that case, and how on OpenGL it&#8217;s often not needed but on Direct3D it&#8217;s pretty much always needed. And yes, how sometimes that can produce &#8220;double lighting&#8221; artifacts on close or intersecting geometry, and how the only solution is to increase the near plane and/or avoid close or intersecting geometry.</p>
<p>Sometimes this helped! I was <em>so convinced</em> that their too-low-near-plane was always the culprit.</p>
<p>And then one day I decided to check. This is what I&#8217;ve got on Direct3D:<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbias.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbias-300x225.png" alt="Depth bias artefacts" title="scenebadbias" width="300" height="225" class="alignnone size-medium wp-image-179" /></a></p>
<p>Ok, this scene is intentionally using a low near plane, but let me stress this again. This is what I&#8217;ve got:<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbiasfail.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbiasfail-300x225.png" alt="Epic fail!" title="scenebadbiasfail" width="300" height="225" class="alignnone size-medium wp-image-180" /></a></p>
<p><em>Not good at all.</em></p>
<p>What happened? It happened in roughly this way:</p>
<ol>
<li>First, the depth bias <a href="http://msdn.microsoft.com/en-us/library/bb205599(VS.85).aspx">documentation</a> on Direct3D is wrong. Depth bias is <em>not</em> in 0..16 range, it is in 0..1 range, which corresponds to the entire range of the depth buffer.</li>
<li>Back then, our code was always using 16 bit depth buffers, so the equivalent of -1,-1 depth bias in OpenGL was multiplied with something like 1.0/65535.0, and that was fed into Direct3D. <em>Hey, it seemed to work!</em></li>
<li>Later on, the device setup code was modified to do proper format selection, so most often it ended up using 24 bit depth buffer. <em>Of course</em> <del datetime="2008-06-12T06:33:50+00:00">no one</del><ins datetime="2008-06-12T06:50:43+00:00"> I</ins> never modified the depth bias code to account for this change&#8230;</li>
<li>And it stayed there. And I kept deceiving myself that the content of the users is to blame, and not some stupid code of mine.</li>
</ol>
<p><strong>It&#8217;s good to check your assumptions once in a while.</strong></p>
<p>So yeah, the proper multiplier for depth bias on Direct3D with 24 bit depth buffer should be not 1.0/65535.0, but something like 1.0/(2^24-1). Except that this value is <em>really small</em>, so something like 4.8e-7 should be used instead (see <a href="http://terathon.com/gdc07_lengyel.ppt">Lengyel&#8217;s GDC2007 talk</a>). Oh, but for some reason it&#8217;s not really enough in practice, so something like 2.0*4.8e-7 should be used instead (tested so far on GeForce 8600, Radeon HD 3850, Radeon 9600, Intel 945, reference rasterizer). Oh, and the same value should be used even when a 16 bit depth buffer is used; using the 1.0/65535.0 multiplier with a 16 bit depth buffer produces way too large a bias.</p>
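<p>To put the numbers side by side, here is a small sketch. The constant and function names are invented for this example and are not Unity code; the one Direct3D 9 detail it leans on is that the <code>DEPTHBIAS</code> render state takes a float whose bits are passed as a DWORD.</p>

```cpp
#include <cstdint>
#include <cstring>

// Multipliers for turning an OpenGL-style polygon offset "units" value (e.g. -1)
// into a Direct3D 9 depth bias in the 0..1 range.
const float kOldBiasMultiplier = 1.0f / 65535.0f;     // one 16 bit depth step: the old code
const float kOneStep24Bit      = 1.0f / 16777215.0f;  // 1.0/(2^24-1): too small in practice
const float kBiasMultiplier    = 2.0f * 4.8e-7f;      // the value that held up in testing

// Hypothetical helper: same multiplier regardless of depth buffer bit count.
float DepthBiasFromUnits(float units)
{
    return units * kBiasMultiplier;
}

// Reinterprets the float bits as a 32 bit integer, the form in which
// SetRenderState expects float-valued render states.
uint32_t FloatToBits(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

<p>The old multiplier comes out roughly 16 times larger than the new one, which goes a long way towards explaining the &#8220;double lighting&#8221; on close geometry.</p>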
<p>With proper bias values the image is good on Direct3D again. Yay for that (fix is coming in Unity 2.1 soon).</p>
<p><em>&#8230;and yes, I know that real men fudge projection matrix instead of using depth bias&#8230; someday maybe.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Argh MFC!</title>
		<link>http://aras-p.info/blog/2008/05/20/argh-mfc/</link>
		<comments>http://aras-p.info/blog/2008/05/20/argh-mfc/#comments</comments>
		<pubDate>Tue, 20 May 2008 07:02:32 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=172</guid>
		<description><![CDATA[When introductory documentation for something has this, you know it won&#8217;t be pretty: CAsyncMonikerFile is derived from CMonikerFile, which in turn is derived from COleStreamFile. A COleStreamFile object represents a stream of data; a CMonikerFile object uses an IMoniker to obtain the data, and a CAsyncMonikerFile object does so asynchronously. So yeah, I am dealing [...]]]></description>
			<content:encoded><![CDATA[<p>When introductory documentation for something <a href="http://msdn.microsoft.com/en-us/library/35a0c067.aspx">has this</a>, you know it won&#8217;t be pretty:</p>
<blockquote><p>CAsyncMonikerFile is derived from CMonikerFile, which in turn is derived from COleStreamFile. A COleStreamFile object represents a stream of data; a CMonikerFile object uses an IMoniker to obtain the data, and a CAsyncMonikerFile object does so asynchronously.</p></blockquote>
<p>So yeah, I am dealing with downloading something from the internet inside an ActiveX control that is written in MFC. A seemingly simple task &#8211; I give you a URL, you give me back the bytes. But no! That would not be a proper architecture, so instead it has asynchronous monikers which are based on monikers which are based on stream files which use some interfaces and whatnot. And for ActiveX controls the docs suggest using CDataPathProperty or CCachedDataPathProperty, which are abstractions built on top of the above crap. And I don&#8217;t even know <em>what</em> &#8220;a moniker&#8221; is!</p>
<p>Of course all this complexity fails spectacularly in some quite common situations. For example, try downloading something when the web server serves gzip compressed html output. Good luck trying to figure out why everything seemingly works, you are notified of downloading progress, but never get the actual downloaded bytes.</p>
<p>Turns out the solution is to change downloading behaviour of the above pile of abstractions to <a href="http://groups.google.be/group/microsoft.public.inetsdk.programming.urlmonikers/browse_thread/thread/45315a0d0860d61a/cfa2bbabad8ff438?hl=en">use &#8220;pull data&#8221; model</a>, instead of default &#8220;push data&#8221; model. The default behaviour just seems to be broken (though it is not broken in that pile of abstractions, instead it is broken somewhere deeper in Windows code). Is this mentioned <em>anywhere</em> in the docs? Of course not!</p>
<p>This is pretty much what a code comment looks like for this:</p>
<blockquote><p>We don&#8217;t use CCachedDataPathProperty because it&#8217;s awfully slow, doing data reallocations for each 1KB received. For 8MB file it&#8217;s 8000 reallocations and 32 GB (!) of data copied for no good reason!</p>
<p>While we&#8217;re at it, we don&#8217;t use CDataPathProperty either, because it&#8217;s a useless wrapper over CAsyncMonikerFile.</p>
<p>Oh, and we don&#8217;t use CAsyncMonikerFile either, because it has bugs in VS2003&#8217;s MFC where it never notifies the container that it is done with download, making IE still display &#8220;X items remaining&#8221; indefinitely. Some smart coder was converting information message and returning &#8220;out of memory&#8221; error if result was NULL, even if input message was NULL (which it often was). So we use our own &#8220;fixed&#8221; version of CAsyncMonikerFile instead.
</p></blockquote>
<p>Oh MFC, how we love thee.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/05/20/argh-mfc/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>On job titles and design patterns</title>
		<link>http://aras-p.info/blog/2008/05/09/on-job-titles-and-design-patterns/</link>
		<comments>http://aras-p.info/blog/2008/05/09/on-job-titles-and-design-patterns/#comments</comments>
		<pubDate>Fri, 09 May 2008 11:25:20 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=168</guid>
		<description><![CDATA[I just changed my job title to say &#8220;Code Chef&#8220;. I like it, and it represents my current understanding of programming pretty well. I cook code. That&#8217;s my job. Some N years ago I would have liked a title with &#8220;Architect&#8221; or &#8220;Analyst&#8221; or something like that. I would have called myself &#8220;developer&#8221; instead of [...]]]></description>
			<content:encoded><![CDATA[<p>I just changed my job title to say &#8220;<a href="http://www.linkedin.com/in/nearaz">Code Chef</a>&#8221;. I like it, and it represents my current understanding of programming pretty well. I cook code. That&#8217;s my job.</p>
<p>Some N years ago I would have liked a title with &#8220;Architect&#8221; or &#8220;Analyst&#8221; or something like that. I would have called myself &#8220;developer&#8221; instead of &#8220;programmer&#8221; because hey, a developer thinks up things, whereas a programmer is a mere &#8220;code monkey&#8221;. More on code monkeys below.</p>
<p>But wait! Back then I also believed that knowing and using <a href="http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29">Design Patterns</a> is essential for a programmer! In one place when I was interviewing new hires, design pattern knowledge was something I would look for&#8230; <em>how stupid!</em> Nowadays my view of patterns is more along the lines of &#8220;yeah, whatever&#8221;. I don&#8217;t exactly think of them as <a href="http://realtimecollisiondetection.net/blog?p=44">things from hell</a>, but they could have caused more harm than good already.</p>
<p>Back to job titles. The code monkey is actually the key employee. A software product is largely defined by the code, heck, it <em>is</em> code. Sure, it also has the user interface, the fancy icons, the documentation, the website, the support, the roadmap and whatnot, but the code <em>is</em> the product, whereas everything else is more or less addons (possibly excluding UI&#8230; UI also defines the product).</p>
<p>Code design? Design patterns? Who cares about that.</p>
<p>It&#8217;s the final result that matters. <a href="http://meshula.net/wordpress?p=168">Futurist programming</a> for the win.</p>
<p><em>On the other hand, <a href="http://realtimecollisiondetection.net/blog/?p=44#comment-662">Memento Observer</a> is probably very cool.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/05/09/on-job-titles-and-design-patterns/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Tricky bugs: peculiarities of dynamic linking, and magic divisions</title>
		<link>http://aras-p.info/blog/2008/04/19/tricky-bugs/</link>
		<comments>http://aras-p.info/blog/2008/04/19/tricky-bugs/#comments</comments>
		<pubDate>Sat, 19 Apr 2008 19:00:41 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=166</guid>
		<description><![CDATA[After wasting nearly two days on some really funky animation import crash, I checked in a code change with this log message: Fix FBX animation import crash once more. When exported symbols are not listed for a dylib, it seems to link back to calling executable (?!), making them share function impls with the same [...]]]></description>
			<content:encoded><![CDATA[<p>After wasting nearly two days on some really funky animation import crash, I checked in a code change with this log message:</p>
<blockquote><p>Fix FBX animation import crash once more. When exported symbols are not listed for a dylib, it seems to link <em>back</em> to calling executable (?!), making them share function impls with the same name. And because Keyframe is actually different in editor vs ImportFBX, this is wrong. Apparently this is OS X Leopard only, or something. Argh.</p></blockquote>
<p>The code change in question was just telling the compiler &#8220;here&#8217;s the list of the functions that are exported from this dynamic library&#8221;. The list was already there; the compiler was just never told that it existed.</p>
<p>The bug manifested itself as a crash when importing animations. But it would not happen when the importer was run from a small unit test application. There were no memory corruptions happening, it was not running out of memory, yet the code was crashing with an access violation, usually because STL&#8217;s vector was returning the wrong size (but the actual data of the vector was correct; it was just returning a bogus size). And it was doing that only on OS X Leopard, and not on OS X Tiger. <em>Huh?</em></p>
<p>Turns out what did happen &#8211; and I&#8217;m not sure if that&#8217;s a bug in OS X or a feature &#8211; is that the calling application did contain a class called Keyframe. And the shared library (where the crash was happening) also contained a class called Keyframe. But those classes were slightly different; the first was 20 bytes in size, and the second one was 16 bytes.</p>
<p>Now, <em>somehow</em> when the shared library was calling vector&lt;Keyframe&gt;::size(), the <em>function from the calling application</em> was used. I have no idea at all <em>how or why</em> this was happening, but it sure was! I could see from tracing the assembly code, that it was doing difference of two pointers, and then doing <em>something that for sure was not</em> division by 16.</p>
<p>What was the code doing? Turns out it was calculating division by 20 in a cunning way:</p>
<blockquote><pre>
mov  edx,esi   # edx = end()
sub  edx,eax   # edx -= begin()
mov  eax,edx   # eax = edx
sar  eax,0x2   # eax >>= 2
imul eax,eax,0xcccccccd # eax *= 0xcccccccd
</pre>
</blockquote>
<p>In other words, the compiler was replacing division by a constant (as used in vector&#8217;s size()) by a shift and multiplication with a magic number. You can read more about the technique <a href="http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx">here</a> or <a href="http://www.nynaeve.net/?p=115">here</a>.</p>
<p>But of course the code above <em>only works</em> if the number was actually divisible by 20; otherwise it returns a <em>totally wrong</em> result. This is perfectly fine for computing the difference of two pointers to structures of known size&#8230; Except that inside the shared library the Keyframe structures are 16 bytes, and not 20!</p>
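<p>The same trick written out in C++, as a sketch that mirrors the instruction sequence above (the function name is made up). It works because 0xcccccccd is the multiplicative inverse of 5 modulo 2^32: after the shift has divided by 4, the multiply &#8220;divides&#8221; by 5, but only when the value really is a multiple of 5.</p>

```cpp
#include <cstdint>

// Shift right by 2 (divide by 4), then multiply by 0xCCCCCCCD with 32 bit wraparound.
// The result equals byteDiff / 20 only when byteDiff is an exact multiple of 20.
// The assembly uses an arithmetic shift; for non-negative differences it behaves the same.
uint32_t SizeViaMagicDivideBy20(uint32_t byteDiff)
{
    uint32_t quarter = byteDiff >> 2;
    return quarter * 0xCCCCCCCDu;  // unsigned multiply wraps modulo 2^32
}
```

<p>Feed it 60 (three 20 byte keyframes) and it returns 3. Feed it 48 (three 16 byte keyframes) and it returns a huge bogus number; feed it 80 (five 16 byte keyframes) and it quietly returns 4, which is arguably worse.</p>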
<p>So yeah. Watch out for peculiarities of dynamic linking on your platform.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/04/19/tricky-bugs/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Dogfooding: PeaNinjas part 1</title>
		<link>http://aras-p.info/blog/2008/02/20/dogfooding-peaninjas-part-1/</link>
		<comments>http://aras-p.info/blog/2008/02/20/dogfooding-peaninjas-part-1/#comments</comments>
		<pubDate>Wed, 20 Feb 2008 19:42:49 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[games]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2008/02/20/dogfooding-peaninjas-part-1/</guid>
		<description><![CDATA[I decided to make a very small game with Unity. Coincidentally, Danc of Lost Garden fame just announced a small game design challenge called &#8220;Play With Your Peas&#8220;. It comes with a set of cute graphics and a ready-to-be-implemented game design. What more could I want? So it&#8217;s a small very small 2D game without [...]]]></description>
			<content:encoded><![CDATA[<p>I decided to make a very small game with Unity. Coincidentally, Danc of <a href="http://www.lostgarden.com/">Lost Garden</a> fame just announced a small game design challenge called &#8220;<a href="http://lostgarden.com/2008/02/play-with-your-peas-game-prototyping.html">Play With Your Peas</a>&#8221;. It comes with a set of cute graphics and a ready-to-be-implemented game design. What more could I want?</p>
<p>So it&#8217;s a <del datetime="2008-02-20T19:15:28+00:00">small</del> very small 2D game without <em>any</em> next-gen bells and whistles. It can probably be done casually on the side, by allocating an hour here and there. We&#8217;ll see how it goes. Hey, I&#8217;ve never <em>actually</em> made any game in Unity, I only make or break some underlying parts&#8230;</p>
<p><a href='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080211a.png' title='Look! No game there!'><img class='alignleft' src='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080211a.thumbnail.png' alt='Look! No game there!' /></a>Of course, first I start with no game, just imported graphics. Hey look, I can do sprites!</p>
<p><a href='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080216a.png' title='Level editing'><img class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080216a.thumbnail.png' alt='Level editing' /></a>Then cook up some base things: define the game grid, throw in some basic user interface on the right hand side, and make it actually do something. This wasn&#8217;t so hard; that already gets me almost working level building functionality. It does not have fancy block building delay or block deletion yet; that will come later.</p>
<p>Next comes basic physics. Danc&#8217;s design calls for simple arcade-like physics (things moving at constant speeds, bouncing off at equal angles, and so on), but in Unity I have a fully fledged <a href="http://unity3d.com/unity/features/physics">physics engine</a> just waiting to be used. Let&#8217;s use that.</p>
<p>The design has sloped ramp pieces, which are hard to approximate using any primitive colliders, so instead I&#8217;ll use convex mesh colliders for them. Now, on this machine I only have Blender, which I totally don&#8217;t know how to use; and I was too lazy to go to the PC and use 3ds Max there. What does a coder do? Of course, just type in the mesh file in ASCII FBX format. Excerpt:</p>
<blockquote><p>; scaled 2x in Z, by 0.85 in Y<br />
Vertices: -0.5,-0.425,-1.0, 0.5,-0.425,-1.0, -0.5,-0.425,1.0, 0.5,-0.425,1.0,  -0.5,0.425,-1.0, -0.5,0.425,1.0<br />
PolygonVertexIndex: 0,1,-3,2,1,-4,1,0,-5,2,3,-6,0,2,-5,2,5,-5,3,1,-5,5,3,-5
</p></blockquote>
<p>It&#8217;s a left ramp mesh! So much for fancy <a href="http://unity3d.com/unity/features/asset-importing">asset auto-importing</a> functionality, when you don&#8217;t know how to use those 3D apps :)</p>
<p><a href='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080216b.png' title='Physics!'><img class='alignleft' src='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080216b.thumbnail.png' alt='Physics!' /></a><a href='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080216c.png' title='Pea stack!'><img class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2008/02/peas080216c.thumbnail.png' alt='Pea stack!' /></a>After a while I&#8217;ve got peas being controlled by physics, colliding with level and so on. Physics is very bad for productivity, as I ended up just playing around with pea-stacks!</p>
<p>So far there&#8217;s no <em>game</em> yet&#8230; Next up: implement some AI for the peas, so they can wander around, climb the walls, fall down and bounce around. I guess that will be more work and less playing around&#8230; We&#8217;ll see.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/02/20/dogfooding-peaninjas-part-1/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What&#8217;s taking up space in your programs?</title>
		<link>http://aras-p.info/blog/2008/01/17/whats-taking-up-space-in-your-programs/</link>
		<comments>http://aras-p.info/blog/2008/01/17/whats-taking-up-space-in-your-programs/#comments</comments>
		<pubDate>Thu, 17 Jan 2008 10:24:54 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2008/01/17/whats-taking-up-space-in-your-programs/</guid>
		<description><![CDATA[Ever wondered what takes up space in the programs you write? I certainly did on a number of occasions. For some reason though, I could not find a decent tool that would look at a Visual Studio compiled executable or a DLL, and report an overview of how large are the functions, classes, object files [...]]]></description>
			<content:encoded><![CDATA[<p>Ever wondered what takes up space in the programs you write? I certainly did on a number of occasions.</p>
<p>For some reason though, I could not find a decent tool that would look at a Visual Studio compiled executable or a DLL, and report an overview of how large the functions, classes, object files and whatnot are. The <a href="http://farbrausch.com/~fg/kkrunchy/">.kkrunchy</a> executable packer does have a very nice size report, but it&#8217;s not exactly suitable for large executables&#8230;</p>
<p>Anyway, <a href="http://farbrausch.com/~fg/">ryg</a> of farbrausch fame was kind enough to donate the size reporting code, I did some modifications, and here it is: <a href="http://aras-p.info/projSizer.html"><strong>Sizer</strong> &#8211; executable symbol size reporting utility</a>.</p>
<p>Enjoy. Oh, and the source code looks messy mostly because ryg and I use different indentation, and I never cared to format everything with a single style. No one cares about the source code anyway, as long as it works. I&#8217;m not claiming that <em>this</em> code works, of course!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/01/17/whats-taking-up-space-in-your-programs/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Oblique near plane with orthographic camera</title>
		<link>http://aras-p.info/blog/2007/11/12/oblique-near-plane-with-orthographic-camera/</link>
		<comments>http://aras-p.info/blog/2007/11/12/oblique-near-plane-with-orthographic-camera/#comments</comments>
		<pubDate>Mon, 12 Nov 2007 07:58:38 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/11/12/oblique-near-plane-with-orthographic-camera/</guid>
		<description><![CDATA[Could not find any info on how to do an oblique near clipping plane for orthographic projections, so I had to figure it out myself. It wasn&#8217;t even hard! Here it is.]]></description>
			<content:encoded><![CDATA[<p>Could not find any info on how to do an oblique near clipping plane for orthographic projections, so I had to figure it out myself. It wasn&#8217;t even hard!</p>
<p><a href="http://aras-p.info/texts/obliqueortho.html">Here it is</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/11/12/oblique-near-plane-with-orthographic-camera/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Testing graphics code</title>
		<link>http://aras-p.info/blog/2007/07/31/testing-graphics-code/</link>
		<comments>http://aras-p.info/blog/2007/07/31/testing-graphics-code/#comments</comments>
		<pubDate>Tue, 31 Jul 2007 21:49:45 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/07/31/testing-graphics-code/</guid>
		<description><![CDATA[Everyone is saying &#8220;unit tests for the win!&#8221; all over the place. That&#8217;s good, but how would you actually test graphics related code? Especially considering all the different hardware and drivers out there, where the result might be different just because the hardware is different, or because the hardware/driver understands your code in a funky [...]]]></description>
			<content:encoded><![CDATA[<p>Everyone is saying &#8220;unit tests for the win!&#8221; all over the place. That&#8217;s good, but how would you actually test graphics related code? Especially considering all the different hardware and drivers out there, where the result might be different just because the hardware is different, or because the hardware/driver understands your code in a <em>funky</em> way&#8230;</p>
<p>Here is how we do it at <a href="http://unity3d.com">work</a>. This took quite some time to set up, but I think it&#8217;s very worth it.</p>
<p><a href='http://aras-p.info/blog/wp-content/uploads/2007/07/test-lab.jpg' title='Testing Lab in action'><img class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2007/07/test-lab.thumbnail.jpg' alt='Testing Lab in action' /></a>First you need <strong>hardware</strong> to test things on. For a start just a couple of graphics cards that you can swap in and out might do the trick. A larger problem is integrated graphics cards &#8211; it&#8217;s quite hard to swap them in and out, so we bit the bullet and bought a machine for each integrated card that we care about. The same machines are then used to test discrete cards (we have several shelves of those by now, going all the way back to&#8230; <em>does ATI Rage, Matrox G45 or S3 ProSavage say anything to you?</em>).</p>
<p><a href='http://aras-p.info/blog/wp-content/uploads/2007/07/test-shots.png' title='It looks pretty random, huh?'><img  class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2007/07/test-shots.thumbnail.png' alt='It looks pretty random, huh?' /></a>Then you make the <strong>unit tests</strong> (or perhaps these should be called the functional tests). Build a small scene for every possible thing that you can imagine. Some examples:</p>
<ul>
<li>Do all blend modes work?</li>
<li>Do light cookies work?</li>
<li>Do automatic texture coordinate generation and texture transforms work?</li>
<li>Does rendering of particles work?</li>
<li>Does glow image postprocessing effect work?</li>
<li>Does mesh skinning work?</li>
<li>Do shadows from point lights work?</li>
</ul>
<p>This will result in a lot of tests, with each test hopefully testing a small, isolated feature. Make some setup that can load all defined tests in succession and take screenshots of the results. Make sure time always progresses at fixed rate (for the case where a test does not produce a constant image&#8230; like particle or animation tests), and take a screenshot of, for example, frame 5 for each test (so that some tests have some data to warm up&#8230; for example motion blur test).</p>
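<p>A minimal sketch of such a fixed-rate capture loop, in C++. This is not our actual harness: <code>GraphicsTest</code>, <code>RunAllTests</code> and the capture callback are made-up names for illustration, and the callback stands in for real screenshot code.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical test description: a name plus a callback that advances and
// renders the scene by a given time delta (in seconds).
struct GraphicsTest {
    std::string name;
    std::function<void(float)> updateAndRender;
};

// Runs every test with a fixed time step and captures one chosen frame.
// Returns the number of screenshots requested.
inline int RunAllTests(const std::vector<GraphicsTest>& tests,
                       const std::function<void(const std::string&)>& capture,
                       int captureFrame = 5, float fixedDelta = 1.0f / 60.0f)
{
    assert(captureFrame >= 1);
    int captured = 0;
    for (const GraphicsTest& test : tests) {
        // Time never comes from the wall clock, so particles and animations
        // reach the same state on every run and on every machine.
        for (int frame = 1; frame <= captureFrame; ++frame)
            test.updateAndRender(fixedDelta);
        capture(test.name + ".png");
        ++captured;
    }
    return captured;
}
```

<p>The important design choice is that the time delta is a constant passed in by the harness; anything that reads the real clock would make the screenshots differ from run to run.</p>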
<p>By this time you have something that you can run and it spits out lots of screenshots. This is already <strong>very useful</strong>. Get a new graphics card, upgrade to new OS or install a new shiny driver? Run the tests, and obvious errors (if any) can be found just by quickly flipping through the shots. Same with the changes that are made in rendering related code &#8211; run the tests, see if anything became broken.</p>
<p><a href='http://aras-p.info/blog/wp-content/uploads/2007/07/test-perl.png' title='My crappy Perl code…'><img class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2007/07/test-perl.thumbnail.png' alt='My crappy Perl code…' /></a>The testing process can be further <strong>automated</strong>. Here we have a small set of Perl scripts that can either produce a suite of test images for the current hardware, or run all the tests and compare the results with &#8220;known to be correct&#8221; suite of images. As graphics cards are different from each other, the &#8220;correct&#8221; results will be somewhat different (because of different capabilities, internal precision etc.). So we keep a set of test results for each graphics card.</p>
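<p>The comparison step could look roughly like the sketch below. It is C++ rather than Perl, and it is an assumption about how a &#8220;close enough&#8221; check might work, not a copy of our scripts: <code>CompareImages</code>, <code>ImagesMostlyMatch</code>, the tolerance and the allowed fraction are all illustrative. Both images are assumed to be already decoded into raw bytes of the same size.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Result of comparing two decoded images of identical dimensions.
struct CompareResult {
    std::size_t differingBytes; // channel values that differ by more than the tolerance
    int maxDifference;          // largest absolute per-channel difference seen
};

inline CompareResult CompareImages(const std::vector<unsigned char>& result,
                                   const std::vector<unsigned char>& reference,
                                   int tolerance)
{
    assert(result.size() == reference.size());
    CompareResult out = {0, 0};
    for (std::size_t i = 0; i < result.size(); ++i) {
        int diff = std::abs(int(result[i]) - int(reference[i]));
        if (diff > out.maxDifference) out.maxDifference = diff;
        if (diff > tolerance) ++out.differingBytes;
    }
    return out;
}

// "Mostly match": at most maxBadFraction of the channel values may be off.
inline bool ImagesMostlyMatch(const CompareResult& r, std::size_t totalBytes,
                              double maxBadFraction)
{
    return totalBytes > 0 &&
           double(r.differingBytes) <= maxBadFraction * double(totalBytes);
}
```

<p>A small per-channel tolerance absorbs precision differences within one card and driver; larger differences between cards are what the per-card reference sets are for.</p>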
<p><a href='http://aras-p.info/blog/wp-content/uploads/2007/07/test-drivers.png' title='That’s an awful lot of drivers!'><img class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2007/07/test-drivers.thumbnail.png' alt='That’s an awful lot of drivers!' /></a>Then these scripts can be run for <strong>various driver versions</strong> on every graphics card. They compare results for each test case, and for failed tests copy out the resulting screenshot, the correct screenshot, log the failures into a wiki-compatible format (to be posted on some internal wiki), etc.</p>
<p>I&#8217;ve heard that some folks even go a step further &#8211; fully automating the testing of all driver versions. Install one driver in silent mode and reboot the machine; after the reboot, another script launches the tests and then proceeds with the next driver version. I don&#8217;t know if that is only an urban legend or if someone actually does this<sup>*</sup>, but that would be an interesting thing to try. The testing per card then would be: 1) install a card, 2) run the test script, 3) coffee break, happiness and profit!</p>
<p><sup>* My impression is that at least with the big games it works the other way around &#8211; you don&#8217;t test with the hardware; instead the hardware guys test with your game. That&#8217;s how it looks for a clueless observer like me at least.</sup></p>
<p>So far this unit test suite was really helpful in a couple of ways: making of the <a href="http://unity3d.com/unity/whats-new/unity-2.0">just-announced</a> Direct3D renderer and discovering new &#038; exciting graphics card/driver workarounds that we have to do. Making of the suite did take a lot of time, but I&#8217;m happy with it!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/07/31/testing-graphics-code/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Electronic Arts STL</title>
		<link>http://aras-p.info/blog/2007/07/16/electronic-arts-stl/</link>
		<comments>http://aras-p.info/blog/2007/07/16/electronic-arts-stl/#comments</comments>
		<pubDate>Mon, 16 Jul 2007 12:57:48 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/07/16/electronic-arts-stl/</guid>
		<description><![CDATA[A paper on Electronic Arts&#8217; implementation of the Standard Template Library. Is it insane or the only sane thing to do? It&#8217;s an insane amount of work, but it looks like they know what they&#8217;re doing. STL is broken in many ways, especially on memory limited systems&#8230; Now they could release it as open source with a [...]]]></description>
			<content:encoded><![CDATA[<p>A paper on <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html">Electronic Arts&#8217; implementation</a> of the Standard Template Library.</p>
<p>Is it insane or the only sane thing to do? It&#8217;s an insane amount of work, but it looks like they know what they&#8217;re doing. STL is broken in many ways, especially on memory limited systems&#8230; Now they could release it as open source with a decent license!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/07/16/electronic-arts-stl/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Debugging story: video memory leaks</title>
		<link>http://aras-p.info/blog/2007/07/14/debugging-story-video-memory-leaks/</link>
		<comments>http://aras-p.info/blog/2007/07/14/debugging-story-video-memory-leaks/#comments</comments>
		<pubDate>Sat, 14 Jul 2007 19:31:19 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/07/14/debugging-story-video-memory-leaks/</guid>
		<description><![CDATA[I ranted about OpenGL p-buffers a while ago. Time for the whole story! From time to time I hit some nasty debugging situation, and it always takes ages to figure out, and the path to the solution is always different. This is an example of such a debugging story. While developing shadow mapping I implemented [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://aras-p.info/blog/2007/06/04/opengl-pbuffers-suck">ranted</a> about OpenGL p-buffers a while ago. Time for the whole story!</p>
<p>From time to time I hit some nasty debugging situation, and it always takes <em>ages</em> to figure out, and the path to the solution is always different. This is an example of such a debugging story.</p>
<p>While developing shadow mapping I implemented a &#8220;screen space shadows&#8221; thing (where cascaded shadow maps are gathered into a screen-space texture and shadow receiver rendering later uses only that texture). Then, after maximizing/restoring the editor window a few times, everything locks up for 3 to 5 seconds and then resumes normally.</p>
<p>So there&#8217;s a problem: a complete freeze after the editor window has been resized a couple of times (not immediately!), while otherwise everything just works. Where is the bug? What causes it?</p>
<p>Since shadows were working fine before, and I never noticed such lock-ups &#8211; it must be the screen-space shadow gathering thing that I just implemented, right? <em>(Fast-forward answer: no)</em> So I try to figure out <em>where</em> the lock-up is happening. Profiling does not give any insights &#8211; the lock-up is not even in my process, instead &#8220;somewhere&#8221;. Hm&#8230; I insert lots of manual timing code around various code blocks (that deal with shadows). They say the lock-up <em>most often</em> happens when activating a new render texture (an OpenGL p-buffer), specifically, calling a glFlush(). But not always, sometimes it&#8217;s still somewhere else.</p>
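<p>The manual timing code can be as simple as a scoped timer. The sketch below is a generic C++ version, not the code I actually used: <code>ScopedTimer</code> and its threshold parameter are names invented for this example.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical helper: measures how long the enclosing scope takes and
// reports it only when it exceeds a threshold, so normal frames stay quiet.
class ScopedTimer {
public:
    ScopedTimer(const char* label, double thresholdSeconds)
        : m_Label(label), m_Threshold(thresholdSeconds),
          m_Start(std::chrono::steady_clock::now())
    {
        assert(label != nullptr);
    }

    ~ScopedTimer()
    {
        double seconds = Elapsed();
        if (seconds >= m_Threshold)
            std::fprintf(stderr, "SLOW: %s took %.3f s\n", m_Label, seconds);
    }

    // Seconds since construction, measured with a monotonic clock.
    double Elapsed() const
    {
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - m_Start;
        return d.count();
    }

private:
    const char* m_Label;
    double m_Threshold;
    std::chrono::steady_clock::time_point m_Start;
};
```

<p>Wrap a suspicious block in <code>{ ScopedTimer t("activate render texture", 0.5); ... }</code> and only the multi-second stalls show up in the log. Keep in mind that with an asynchronous API like OpenGL the stall is reported at whichever call happens to wait for the driver, which is likely why the location kept moving around.</p>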
<p>After some head-scratching, a session with OpenGL Driver Profiler reveals what is actually happening &#8211; video memory is leaked! Apparently Mac OS X &#8220;virtualizes&#8221; VRAM, and when it runs out, the OS will still happily create p-buffers and so on, it will just start swapping VRAM contents to AGP/PCIe area. This swapping causes the lock-up. Ok, so now I know <em>what</em> is happening, I just need to find out <em>why</em>.</p>
<p>I look at all the code that deals with render textures &#8211; it looks ok. And it would be pretty strange if a VRAM leak had gone unnoticed for the two years that Unity has been out in the wild&#8230; So it must be the depth render textures that are causing a leak (since they are a new type for the shadows), right? <em>(Answer: no)</em></p>
<p>I build a test case that allocates and deallocates a bunch of depth render textures each frame. No leaks&#8230; Huh.</p>
<p>I change my original code so that it gathers screen-space shadows onto the screen directly, instead of the screen-sized texture. No leaks&#8230; Hm&#8230; So it must be the depth render texture followed by screen-size render texture, that is causing the leaks, right? <em>(Answer: no)</em> Because when I have just the depth render texture, I have no leaks; and when I have no depth render texture, instead I gather shadows &#8220;from nothing&#8221; into a screen-size texture, I also have no leaks. So it must be the combination!</p>
<p>So far, the theory is that rendering into a depth texture followed by creation of screen-size texture will cause a video memory leak <em>(Answer: no)</em>. It looks like it leaks the amount that should be taken by depth texture (I say &#8220;it looks&#8221; because in OpenGL you never know&#8230; it&#8217;s all abstracted to make my life easier, hurray!). Looks like a fine bug report, time to build a small repro application that is completely separate from Unity.</p>
<p>So I grab some p-buffer sample code from Apple&#8217;s developer site, change it to also use depth textures and rectangle textures, remove all unused cruft, code the expected bug pattern (render into depth texture followed by rectangle p-buffer creation) and&#8230; it does not leak. D&#8217;oh.</p>
<p>Ok, another attempt: I take the p-buffer related code out of Unity, build a small application with just that code, code the expected bug pattern and&#8230; it does not leak! Huh?</p>
<p><em>Now what?</em></p>
<p>I compare the OpenGL call traces of Unity-in-test-case (leaks) and Unity-code-in-a-separate-app (does not leak). Of course, the Unity case does a lot more; setting up various state, shaders, textures, rendering actual objects with actual shaders, filtering out redundant state changes and whatnot. So I try to bring in bits of stuff that Unity does into my test application.</p>
<p>After a while I made my test app leak video memory (now that&#8217;s an achievement)! Turns out the leak happens when doing this:</p>
<ol>
<li>Create depth p-buffer</li>
<li>Draw to depth p-buffer</li>
<li>Copy its contents into a depth texture</li>
<li>Create a screen-sized p-buffer</li>
<li>Draw something into it <em>using</em> the depth texture</li>
<li>Release the depth texture and p-buffer</li>
<li>Release the screen-sized p-buffer</li>
</ol>
<p>My initial test app was not doing step 5&#8230; Now, <em>why</em> does the leak happen? Is it a bug or something I am doing wrong? And more importantly: how do I get rid of it?</p>
<p>My suspicion was that OpenGL context sharing was somehow to blame here <em>(finally, a correct suspicion)</em>. We share OpenGL contexts, because, well, it&#8217;s the only sane thing to do &#8211; if you have a texture, mesh or shader somewhere, you really want to have it available both to the screen and when rendering into something else. The documentation on sharing of OpenGL contexts is extremely spartan, however. Like: &#8220;yeah, when they are shared, then the resources are shared&#8221; &#8211; great. Well, the actual text is like this (Apple&#8217;s <a href="http://developer.apple.com/qa/qa2001/qa1248.html">QA1248</a>):</p>
<blockquote><p>All sharing is peer to peer and developers can assume that shared resources are reference counted and thus will be<br />
maintained until explicitly released or when the last context sharing resources is itself released. It is helpful to think of this in the simplest terms possible and not to assume excess complication.</p></blockquote>
<p>Ok, <em>I am</em> thinking of this in the simplest terms possible&#8230; and it leaks video memory! The docs do not have a single word on <em>how</em> the resources are reference counted and what happens when a context is deleted.</p>
<p>Anyway, armed with my suspicion of context sharing being The Bad Guy here, I tried random things in my small test app. Turns out that unbinding any active textures from a context before switching to new one got rid of the leak. It looks like objects are refcounted by contexts, and they are not actually deleted while they are bound in some context (that is what I expect to happen). However, when a context itself is deleted, it seems as if it does not decrease refcounts of these objects (that is definitely what I don&#8217;t expect to happen). I am not sure if that&#8217;s a bug, or just undocumented &#8220;feature&#8221;&#8230;</p>
<p>All happy, I bring in my changes to the full codebase (&#8220;unbind any active textures before switching to a new context!&#8221;)&#8230; and the leak is still there. Huh?</p>
<p>After some head-scratching and randomly experimenting with <em>whatever</em>, turns out that you have to unbind any active &#8220;things&#8221; before switching to a new context. Even leaving a vertex buffer object bound can make a depth texture memory be leaked when another context is destroyed. Funky, eh?</p>
<p>So that was some 4 days wasted on chasing the bug that started out as &#8220;mysterious 5 second lock-ups&#8221;, went through &#8220;screen-space shadows leak video memory&#8221;, then through &#8220;depth textures followed by screen-size textures leak video memory&#8221; and through &#8220;unbind textures before switching contexts&#8221; to &#8220;unbind everything before switching contexts&#8221;. Would I have guessed it would end up like this? Not at all. I am still not sure if that&#8217;s the intended behavior or a bug; it looks more like a bug to me.</p>
<p>The take-away for OpenGL developers: <strong>when using shared contexts, unbind active textures, VBOs, shader programs etc. before switching OpenGL contexts</strong>. Otherwise at least on Mac OS X you will hit video memory leaks.</p>
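<p>To make the suspected mechanism concrete, here is a toy model in C++. There is no real OpenGL in it, and it encodes only my guess about the behaviour: shared objects are reference counted, a binding holds a reference, and destroying a context does not drop the references held by its bindings. <code>ToyShareGroup</code> and <code>ToyContext</code> are invented for this illustration.</p>

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model only, not OpenGL. Objects live as long as their refcount is > 0.
struct ToyShareGroup {
    std::map<int, int> refCounts; // object id -> reference count
    int nextId = 1;

    // Creating an object gives the application one reference to it.
    int create() { int id = nextId++; refCounts[id] = 1; return id; }

    // Drops one reference; the object is freed when the count reaches zero.
    void release(int object) {
        std::map<int, int>::iterator it = refCounts.find(object);
        assert(it != refCounts.end());
        if (--it->second == 0) refCounts.erase(it);
    }

    int liveObjects() const { return int(refCounts.size()); }
};

struct ToyContext {
    ToyShareGroup& group;
    std::set<int> bound; // objects currently bound in this context

    explicit ToyContext(ToyShareGroup& g) : group(g) {}

    void bind(int object)   { if (bound.insert(object).second) ++group.refCounts[object]; }
    void unbind(int object) { if (bound.erase(object)) group.release(object); }
    void unbindAll()        { while (!bound.empty()) unbind(*bound.begin()); }

    // Intentionally no cleanup in the destructor: that is the suspected leak.
};
```

<p>In this model, deleting a texture while it is still bound in a context that is then destroyed leaves one live object behind forever; calling <code>unbindAll()</code> before the switch brings the count back to zero. Whether the real driver works like this I can&#8217;t say, but it matches what I observed.</p>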
<p>It&#8217;s somewhat sad that I find myself fighting issues like that most of my development time &#8211; not actually implementing some cool new stuff, but <em>making stuff actually work</em>. Oh well, I guess that is the difference between making (tech)demos and an actual software product.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/07/14/debugging-story-video-memory-leaks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Now that&#8217;s what I call a good API (stb_image)</title>
		<link>http://aras-p.info/blog/2007/05/28/now-thats-what-i-call-a-good-api-stb_image/</link>
		<comments>http://aras-p.info/blog/2007/05/28/now-thats-what-i-call-a-good-api-stb_image/#comments</comments>
		<pubDate>Mon, 28 May 2007 12:11:07 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/05/28/now-thats-what-i-call-a-good-api-stb_image/</guid>
		<description><![CDATA[The other day at work I needed a command line tool to compare some images (whether they mostly match, used in unit/functional tests). For unknown reason I could not get ImageMagick&#8217;s compare to work like I wanted, so I just wrote my own. I used stb_image library from Sean Barrett &#8211; and it just rocks! [...]]]></description>
			<content:encoded><![CDATA[<p>The other day at work I needed a command line tool to compare some images (whether they mostly match, used in unit/functional tests). For some reason I could not get ImageMagick&#8217;s <a href="http://www.imagemagick.org/script/compare.php">compare</a> to work like I wanted, so I just wrote my own.</p>
<p>I used <a href="http://nothings.org/stb_image.c">stb_image</a> library from <a href="http://nothings.org">Sean Barrett</a> &#8211; and it just rocks! Here&#8217;s the code to load a PNG image from file:</p>
<blockquote>
<pre>int width, height, bpp;
unsigned char* rgb = stbi_load( "myimage.png", &amp;width, &amp;height, &amp;bpp, 3 );
// rgb is now three bytes per pixel, width*height size. Or NULL if load failed.
// Do something with it...
stbi_image_free( rgb );</pre>
</blockquote>
<p>That&#8217;s it! Basically a single line to load the image (and of course the library has similar functions to load from a block of memory, etc.). And the whole &#8220;library&#8221; is a single file &#8211; just add it to your project and there it is. In comparison, loading a PNG file using the de facto standard <a href="http://www.libpng.org/pub/png/libpng.html">libpng</a> takes more than 100 lines of code (and some time to read the docs).</p>
<p>Small is beautiful.</p>
<p>&#8230;and the way we do graphics related unit/functional/compatibility testing deserves a separate article. Sometime in the future!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/05/28/now-thats-what-i-call-a-good-api-stb_image/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s wrong with this code?</title>
		<link>http://aras-p.info/blog/2007/02/01/whats-wrong-with-this-code/</link>
		<comments>http://aras-p.info/blog/2007/02/01/whats-wrong-with-this-code/#comments</comments>
		<pubDate>Thu, 01 Feb 2007 09:49:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=100</guid>
		<description><![CDATA[Here&#8217;s a short function: inline int SecondsToEnergy( float time ) { return FastFloorfToInt( time * (float)(1 &#60;&#60; kEnergyFixedPoint) ); } It&#8217;s used in the particle system, and converts particle lifetime to an internal fixed point representation (10 bits for fractional part, i.e. kEnergyFixedPoint=10). Some of the emitted particles are okay on a Mac, but completely [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;"><p>Here&#8217;s a short function:</p>
<blockquote><pre>inline int SecondsToEnergy( float time )
{
  return FastFloorfToInt( time * (float)(1 &lt;&lt; kEnergyFixedPoint) );
}</pre>
</blockquote>
<p>It&#8217;s used in the particle system, and converts particle lifetime to an internal fixed point representation (10 bits for fractional part, i.e. kEnergyFixedPoint=10).</p>
<p>Some of the emitted particles are okay on a Mac, but completely not visible on Windows. This function is to blame.</p>
<p>Of course, what&#8217;s wrong is the possible overflow in the float-to-int conversion. Whenever someone tries to use a lifetime longer than about 2097151 seconds (that is, 2<sup>31</sup> / 2<sup>10</sup>), the scaled value no longer fits into a signed 32 bit integer and the result of the conversion is undefined. It seems to clamp the result in gcc and produce something like -1 in msvc.</p>
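<p>One possible fix, sketched under a few assumptions: <code>std::floor</code> stands in for <code>FastFloorfToInt</code> (which is not shown here), negative or NaN lifetimes are treated as zero, and <code>SecondsToEnergyClamped</code> is a new name made up for this example. The idea is to clamp while the value is still a float, so the conversion never sees anything out of range.</p>

```cpp
#include <cassert>
#include <climits>
#include <cmath>

const int kEnergyFixedPoint = 10;

// Clamp in float space first; the float-to-int conversion is then always
// performed on a value that fits into a signed 32 bit integer.
inline int SecondsToEnergyClamped( float time )
{
    const float scale = (float)(1 << kEnergyFixedPoint);
    // Largest lifetime (with one second of headroom) whose scaled value
    // still fits; 2097150 for a 32 bit int and 10 fractional bits.
    const float maxSeconds = (float)(INT_MAX >> kEnergyFixedPoint) - 1.0f;
    if ( !(time > 0.0f) )   // assumption: negative and NaN lifetimes become 0
        return 0;
    if ( time > maxSeconds )
        time = maxSeconds;
    return (int)std::floor( time * scale );
}
```

<p>With this, an absurdly long lifetime simply saturates at the maximum on every compiler instead of depending on what the platform does with an out-of-range conversion.</p>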
<p>Using multiple compilers can be hard, but it can also help in finding obscure bugs. Ha!</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/02/01/whats-wrong-with-this-code/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>On work and clean code</title>
		<link>http://aras-p.info/blog/2006/07/01/on-work-and-clean-code/</link>
		<comments>http://aras-p.info/blog/2006/07/01/on-work-and-clean-code/#comments</comments>
		<pubDate>Sat, 01 Jul 2006 12:28:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=94</guid>
		<description><![CDATA[It&#8217;s been like 6 months of me working on Unity. So far so good. We&#8217;ve done a big new release recently, so after some pre-release insanity we&#8217;re a bit more relaxed. I guess not for very long though, we have more stuff planned than we can handle :) It sure feels nice to work on [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been like 6 months of me working on <a href="http://unity3d.com">Unity</a>. So far so good. We&#8217;ve done a <a href="http://unity3d.com/unity/whats-new/unity-1.5">big new release</a> recently, so after some pre-release insanity we&#8217;re a bit more relaxed. I guess not for very long though, we have more stuff planned than we can handle :)</p>
<p>It sure feels nice to work on an actual software <span style="font-style: italic;">product</span>. I think it&#8217;s probably the first time in my career that I <span style="font-style: italic;">know</span> people are using my work and I do care about that. Having worked on <span style="font-style: italic;">projects</span> before, it&#8217;s very different &#8211; a project just comes and goes, and once it&#8217;s finished you never think about it again. And most of the time you don&#8217;t care about &#8220;the clients&#8221; that much either. Working on a product is much more rewarding (especially if the users seem to like it).</p>
<p>Another interesting thing here is that we are a <span style="font-style: italic;">very small</span> software shop. So everyone has to be a one-man-army (the others certainly are, not sure about myself). Design, program, fix bugs, decide on features, do support, write docs and even do html tweaks for the website. Of course, it could be <span style="font-style: italic;">Jack of all trades, master of none (*)</span>, but somehow I feel that we are managing pretty well. And I like to be involved in various aspects of making a product.</p>
<p><span style="font-size:78%;">(*) though wikipedia says that the full saying is <span style="font-style: italic;">Jack of all trades, master of none, though ofttimes better than master of one</span> &#8211; which looks like a positive thing to me.<br />
</span><br />
A completely different theme: when programming, it&#8217;s always good to massage the code you&#8217;re working with a bit. Remove unnecessary #includes. Write a comment on a tricky code block. Fix warnings. Do small refactorings. Remove unused code paths. It does not take much time and helps to keep the codebase clean. Removing unused code is especially good &#8211; for some reason I <span style="font-style: italic;">love</span> removing code. Could do that all day long; probably I&#8217;m some kind of anti-programmer :)</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2006/07/01/on-work-and-clean-code/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Back to some shader programming</title>
		<link>http://aras-p.info/blog/2006/05/24/back-to-some-shader-programming/</link>
		<comments>http://aras-p.info/blog/2006/05/24/back-to-some-shader-programming/#comments</comments>
		<pubDate>Wed, 24 May 2006 17:14:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=93</guid>
		<description><![CDATA[There is something magic in programming shaders. Like, when you edit one of our standard shaders and save, say, nine instructions in it &#8211; the feeling is really good. Maybe because, well, it&#8217;s a standard shader &#8211; so that means everyone&#8217;s graphics will actually render faster. Nice! Maybe it&#8217;s because shaders are such a short [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;"><p>There is something magic in programming shaders. Like, when you edit one of our <a href="http://unity3d.com/support/documentation/Components/Built-in%20Shader%20Guide.html">standard shaders</a> and save, say, nine instructions in it &#8211; the feeling is really good. Maybe because, well, it&#8217;s a standard shader &#8211; so that means everyone&#8217;s graphics will actually render faster. Nice!</p>
<p>Maybe it&#8217;s because shaders are such short pieces of code, without overly complex dependencies&#8230; <span style="font-style: italic;">I&#8217;m sure anyone who knows graphics hardware will correct me here, but let&#8217;s oversimplify and pretend that shaders actually execute in a simple way&#8230;</span> So when you make a shader shorter, you pretty much know it&#8217;s going to be faster. When you make it &#8220;look better&#8221;, it almost certainly will look better. Try doing that in your regular big codebase &#8211; by optimizing something you may break something else; and in general you have no clue what to optimize unless you do your profiling homework. So, my take is that shaders are much simpler, so the joys of looking at assembly output actually make sense.</p>
<p>So, yeah, I&#8217;m back to some shader programming.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2006/05/24/back-to-some-shader-programming/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

