<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lost in the Triangles &#187; d3d</title>
	<atom:link href="http://aras-p.info/blog/tags/d3d/feed/" rel="self" type="application/rss+xml" />
	<link>http://aras-p.info/blog</link>
	<description>Random thoughts of a triangle pusher</description>
	<lastBuildDate>Fri, 16 Jul 2010 07:04:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Compiling HLSL into GLSL in 2010</title>
		<link>http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/</link>
		<comments>http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/#comments</comments>
		<pubDate>Fri, 21 May 2010 19:59:38 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=523</guid>
		<description><![CDATA[Realtime shader languages these days have settled down into two camps: HLSL (or Cg, which for all practical purposes is the same) and GLSL (or GLSL ES, which is sufficiently similar). HLSL/Cg is used by Direct3D and the big consoles (Xbox 360, PS3). GLSL/ES is used by OpenGL and pretty much all modern mobile platforms [...]]]></description>
			<content:encoded><![CDATA[<p>Realtime shader languages these days have settled down into two camps: HLSL (or Cg, which for all practical purposes is the same) and GLSL (or GLSL ES, which is sufficiently similar). HLSL/Cg is used by Direct3D and the big consoles (Xbox 360, PS3). GLSL/ES is used by OpenGL and pretty much all modern mobile platforms (iPhone, Android, &#8230;).</p>
<p>Since shaders are more or less &#8220;assets&#8221;, having two different languages to deal with is not very nice. What, I&#8217;m supposed to write my shader twice just to support both (for example) D3D and iPad? You would think that in 2010, almost a decade after high-level realtime shader languages first appeared, this problem would be solved&#8230; but it isn&#8217;t!</p>
<p><span id="more-523"></span>In <a href="http://unity3d.com/unity/coming-soon/unity-3">upcoming Unity 3.0</a>, we&#8217;re going to have OpenGL ES 2.0 for mobile platforms, where GLSL ES is the only option to write shaders in. However, almost all other platforms (Windows, 360, PS3) need HLSL/Cg.</p>
<p>I spent some time trying to make <a href="http://developer.nvidia.com/object/cg_toolkit.html">Cg</a> spit out GLSL code. In theory it can, and I read somewhere that <a href="http://en.wikipedia.org/wiki/Id_Software">id</a> uses it for the OpenGL backend of <a href="http://en.wikipedia.org/wiki/Rage_(video_game)">Rage</a>&#8230; But I just couldn&#8217;t make it work. What&#8217;s possible for <a href="http://en.wikipedia.org/wiki/John_Carmack">John</a> is apparently not possible for mere mortals.</p>
<p>Then I looked at ATI&#8217;s <a href="http://sourceforge.net/projects/hlsl2glsl/">HLSL2GLSL</a>. That did produce GLSL shaders that were not absolutely horrible. So I started using it, and <em>(surprise!)</em> quickly ran into small issues here and there. Too bad development of the library stopped around 2006&#8230; on the plus side, it&#8217;s open source!</p>
<p>So I just forked it. Here it is: <a href="http://code.google.com/p/hlsl2glslfork/"><strong>http://code.google.com/p/hlsl2glslfork/</strong></a> (<a href="http://code.google.com/p/hlsl2glslfork/source/list">commit log here</a>). There are no prebuilt binaries or source drops right now, just a Mercurial repository. BSD license. Patches welcome.</p>
<p><em>Note on the codebase</em>: I don&#8217;t particularly like the codebase. It looks like somewhat over-engineered code that was probably taken from the reference GLSL parser 3DLabs once wrote, and then adapted to parse HLSL and spit out GLSL. There are pieces of code that are unused, unfinished or duplicated. Judging from comments, some pieces of code have been in the hands of 3DLabs, ATI and NVIDIA (what good can come out of <em>that</em>?!). However, it <em>works</em>, and that&#8217;s the most important trait any code can have.</p>
<p><em>Note on the preprocessor</em>: I bumped into some preprocessor issues that couldn&#8217;t be easily fixed without first understanding someone else&#8217;s ancient code and then changing it significantly. Fortunately, Ryan Gordon&#8217;s project, <a href="http://icculus.org/mojoshader/">MojoShader</a>, happens to have a preprocessor that very closely emulates HLSL&#8217;s (including various quirks). So I&#8217;m using that to preprocess any source before passing it down to HLSL2GLSL. Kudos to Ryan!</p>
<p><em>Side note on MojoShader</em>: Ryan is also working on an HLSL->GLSL cross compiler in MojoShader. I like that codebase much more; will certainly try it out once it&#8217;s somewhat ready.</p>
<p><em>You can never have enough notes</em>: Google&#8217;s <a href="http://code.google.com/p/angleproject/">ANGLE project</a> (running OpenGL ES 2.0 on top of Direct3D runtime+drivers) seems to be working on the opposite tool. For obvious reasons, they need to take GLSL ES shaders and produce D3D compatible shaders (HLSL or shader assembly/bytecode). The project seems to be moving fast; and if one day we decide to default to GLSL as the shader language in Unity, I&#8217;ll know where to look for a translator into HLSL :)</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/05/21/compiling-hlsl-into-glsl-in-2010/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Direct3D GPU Hacks</title>
		<link>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/</link>
		<comments>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 12:26:48 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=462</guid>
		<description><![CDATA[I&#8217;m catching up on various GPU hacks that exist for Direct3D 9 (things like native shadow mapping, render to vertex buffer, etc.). Turns out there&#8217;s a lot of them, but all the information is scattered around the intertubes. So here are the D3D9 hacks known to me in one place. Let me know if I [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m catching up on various GPU hacks that exist for Direct3D 9 (things like native shadow mapping, render to vertex buffer, etc.). Turns out there&#8217;s a lot of them, but all the information is scattered around the intertubes.</p>
<p>So here are the <a href="http://aras-p.info/texts/D3D9GPUHacks.html"><strong>D3D9 hacks known to me in one place</strong></a>.</p>
<p>Let me know if I missed something or got something wrong. I also want to figure out if Intel GPUs/drivers implement any of them.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Compact Normal Storage for small g-buffers</title>
		<link>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/</link>
		<comments>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 09:39:51 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=377</guid>
		<description><![CDATA[I&#8217;ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture. Here are my findings &#8211; with error visualization and shader performance numbers for some GPUs. If you know any other method to encode/store normals in a compact way, please [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture.</p>
<p><a href="http://aras-p.info/texts/CompactNormalStorage.html"><strong>Here are my findings</strong></a> &#8211; with error visualization and shader performance numbers for some GPUs.</p>
<p>If you know any other method to encode/store normals in a compact way, please let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Implementing fixed function T&amp;L in vertex shaders</title>
		<link>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/</link>
		<comments>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 06:08:50 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=364</guid>
		<description><![CDATA[Almost half a year ago I was wondering how to implement T&#038;L in vertex shaders. Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a technical report here. In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar [...]]]></description>
			<content:encoded><![CDATA[<p>Almost half a year ago I was wondering <a href="http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/">how to implement T&#038;L in vertex shaders</a>.</p>
<p>Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a <a href="http://aras-p.info/texts/VertexShaderTnL.html"><strong>technical report here</strong></a>.</p>
<p>In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using fixed function (I know it&#8217;s implemented as vertex shaders internally by the runtime/driver) on several different cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950).</p>
<p>What was unexpected: the most complex piece is not the vertex lighting! Most of the complexity is in how to route/generate texture coordinates and transform them. Huge combinatorial explosion there.</p>
<p>Otherwise &#8211; I like! Here&#8217;s a link to the <a href="http://aras-p.info/texts/VertexShaderTnL.html">article again</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fixed function lighting in vertex shader &#8211; how?</title>
		<link>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/</link>
		<comments>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 20:32:49 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=261</guid>
		<description><![CDATA[Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to various artifacts that annoy people to hell. I don&#8217;t really know why that happens, [...]]]></description>
			<content:encoded><![CDATA[<p>Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to <a href="http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/">various artifacts</a> that annoy people to hell.</p>
<p>I don&#8217;t really know <em>why</em> that happens, because it seems that most modern cards don&#8217;t have fixed function units, so internally they are running shaders anyway. The DX9 runtime on Vista&#8217;s WDDM also seems to hand only shaders to the driver internally. Still, for some reason somewhere the precision does not match&#8230;</p>
<p>How should such a task be approached?</p>
<p>My requirements are:</p>
<ul>
<li>Should handle any possible state combination in D3D fixed function T&#038;L.</li>
<li>D3D 9.0c, using vertex shader 2.0 is ok. For now I don&#8217;t care about OpenGL.</li>
<li>No HLSL at runtime. I don&#8217;t want to add a megabyte or more to Unity web player just for HLSL. DX9 shader assembly is ok, because we already have the assembler code.</li>
<li>Should work as fast as (or close to) the regular fixed function pipeline.</li>
</ul>
<p>I looked at ATI&#8217;s <a href="http://developer.amd.com/samples/FixedFuncShader/Pages/default.aspx">FixedFuncShader sample</a>. It&#8217;s an <strong>ubershader approach</strong>: one large (230 instructions or so) shader with static VS2.0 branching. It had some obvious places to optimize; I could get it down to 190 or so instructions, kill some <a href="http://msdn.microsoft.com/en-us/library/bb147316(VS.85).aspx">rcp</a>&#8217;s and reduce the amount of constant storage by 2x.</p>
<p>Still, it did not handle some things in the D3D T&#038;L or had some issues:</p>
<ul>
<li>It assumes one input UV, one output UV and no texture matrices. This place in T&#038;L gets quite convoluted &#8211; any input UVs or a texgen mode can be transformed by matrices of various sizes, and routed into any output UVs.</li>
<li>It was not using full T&#038;L lighting model. No biggie here.</li>
<li>I haven&#8217;t checked with NVShaderPerf or AMD ShaderAnalyzer yet, but last time I checked the static branch instruction was taking two clocks on some NV architecture. So the ubershader approach does not come for free.</li>
</ul>
<p>Another thing I&#8217;m considering is to combine the final shader(s) from <strong>assembly fragments</strong>, with some simple register allocation.</p>
<p>In T&#038;L shader code, there&#8217;s only a limited set of could-be-redundant computations, mostly computing world space position, camera space normal, view vector and so on (those could be used by lighting, texgen or fog). Those computations can be explicitly put into separate fragments, and later fragments could just use their results.</p>
<p>What is left then is some register allocation. A shader assembly fragment may need some temporary registers for internal use (this is simple, just give it a bunch of unused registers), some registers as input (from previous fragments), and some registers to store its output.</p>
<p>Again, I haven&#8217;t checked with shader performance tools, but I <em>think, guess and hope</em> that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into hardware format. This would mean that <em>I</em> can be quite sloppy with it, i.e. don&#8217;t have to implement some super smart allocation scheme.</p>
<p>I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).</p>
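<p>To make the fragment idea concrete, here is a minimal sketch in Python (not the actual combiner code). Everything in it is hypothetical: the <code>Fragment</code> structure, the value names such as <code>worldPos</code> and <code>viewDir</code>, and the <code>{name}</code> placeholder syntax are made up for illustration, and the assembly strings are only loosely modeled on DX9 vertex shader assembly and have not been checked to assemble. The allocator is deliberately sloppy: outputs keep their register until the end of the shader, and scratch registers are handed back after each fragment.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """One chunk of shader assembly using symbolic register names (hypothetical format)."""
    name: str
    code: str                                     # assembly text with {placeholders}
    inputs: list = field(default_factory=list)    # values produced by earlier fragments
    outputs: list = field(default_factory=list)   # values this fragment produces
    temps: int = 0                                # scratch registers for internal use


def combine(fragments, max_regs=12):
    """Concatenate fragments, mapping symbolic names to r# temporary registers."""
    assigned = {}                  # value name -> register index
    free = list(range(max_regs))   # unused register indices, lowest first
    lines = []
    for frag in fragments:
        names = {}
        # Inputs must already live in a register written by an earlier fragment.
        for value in frag.inputs:
            if value not in assigned:
                raise ValueError(f"{frag.name}: input '{value}' was never computed")
            names[value] = f"r{assigned[value]}"
        if len(frag.outputs) + frag.temps > len(free):
            raise RuntimeError(f"{frag.name}: out of temporary registers")
        # Outputs get a register that stays reserved for later fragments.
        for value in frag.outputs:
            assigned[value] = free.pop(0)
            names[value] = f"r{assigned[value]}"
        # Scratch registers are only needed while this fragment runs.
        scratch = [free.pop(0) for _ in range(frag.temps)]
        for i, reg in enumerate(scratch):
            names[f"tmp{i}"] = f"r{reg}"
        lines.append(frag.code.format(**names))
        free = sorted(free + scratch)
    return "\n".join(lines)


# Illustrative fragments: world position feeds the view vector, which feeds fog.
fragments = [
    Fragment("worldPos", "m4x3 {worldPos}.xyz, v0, c4", outputs=["worldPos"]),
    Fragment("viewVec",
             "add {tmp0}.xyz, c8, -{worldPos}\nnrm {viewDir}.xyz, {tmp0}",
             inputs=["worldPos"], outputs=["viewDir"], temps=1),
    Fragment("fog",
             "dp3 {tmp0}.x, {viewDir}, {viewDir}\nmov oFog, {tmp0}.x",
             inputs=["viewDir"], temps=1),
]
shader = combine(fragments)
print(shader)
```

<p>Note how the scratch register used by the second fragment is reused by the third one. If the driver really does its own register allocation, this level of sloppiness should be good enough.</p>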
<p>Does that make sense? Or did everyone solve those problems eons ago already?</p>
<p><strong>Edit</strong>: half a year later, I wrote a technical report on how I implemented all this: <a href="http://aras-p.info/texts/VertexShaderTnL.html">http://aras-p.info/texts/VertexShaderTnL.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Depth bias and the power of deceiving yourself</title>
		<link>http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/</link>
		<comments>http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 06:52:19 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=176</guid>
		<description><![CDATA[In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, some number of the brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords [...]]]></description>
			<content:encoded><![CDATA[<p>In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, some number of the brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords for light cookies, or do something with tangent space, or calculate some texcoords for shadow mapping, and so on. The vertex lighting pass uses fixed function, because it&#8217;s the easiest way. It is possible to implement a fixed function lighting equivalent in vertex shaders, but we haven&#8217;t done that yet because of the complexities of Direct3D <em>and</em> OpenGL, the need to support shader model 1.1 and various other issues. Call me lazy.</p>
<p>And herein lies the problem: most often precision of vertex transformations is not the same in fixed function versus programmable vertex pipelines. If you&#8217;d just draw some objects in multiple passes, mixing fixed function and programmable paths, this is roughly what you will get (excuse my programmer&#8217;s art):<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenenobias.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenenobias-300x225.png" alt="Mixing fixed function and vertex shaders" title="scenenobias" width="300" height="225" class="alignnone size-medium wp-image-177" /></a></p>
<p><em>Not pretty at all!</em> This should have looked like this:<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenegoodbias.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenegoodbias-300x225.png" alt="All good here" title="scenegoodbias" width="300" height="225" class="alignnone size-medium wp-image-178" /></a></p>
<p>So what do we do to make it look like this? We &#8220;pull&#8221; (bias) some rendering passes slightly towards the camera, so there is no depth fighting.</p>
<p>Now, at the moment the Unity editor runs only on Macs, which use OpenGL. There, most hardware configurations do not need this depth bias at all &#8211; they are able to generate the same results in fixed function and programmable pipelines. Only Intel cards need the depth bias on Mac OS X (on Windows, AMD and Intel cards need it). So people author their games using OpenGL, where depth bias is not needed in most cases.</p>
<p>How do you apply depth bias in OpenGL? Enable GL_POLYGON_OFFSET_FILL and set <a href="http://www.opengl.org/documentation/specs/man_pages/hardcopy/GL/html/gl/polygonoffset.html">glPolygonOffset</a> to something like -1, -1. This works.</p>
<p>How do you apply depth bias in Direct3D 9? <em>Conceptually</em>, you do the same. There are <a href="http://msdn.microsoft.com/en-us/library/bb205599(VS.85).aspx">DEPTHBIAS and SLOPESCALEDEPTHBIAS</a> render states that do just that. And so we did use them.</p>
<p><a href="http://forum.unity3d.com/viewtopic.php?t=8443">And people complained</a> about funky results on Windows.</p>
<p>And I&#8217;d look at their projects, see that they are using something like 0.01 for camera&#8217;s near plane and 1000.0 for the far plane, and tell them something along the lines of <em>&#8220;increase your near plane, stupid!&#8221;</em> (well ok, without the &#8220;stupid&#8221; part). And I&#8217;d explain all the above about mixing fixed function and vertex shaders, and how we do depth bias in that case, and how on OpenGL it&#8217;s often not needed but on Direct3D it&#8217;s pretty much always needed. And yes, how sometimes that can produce &#8220;double lighting&#8221; artifacts on close or intersecting geometry, and how the only solution is to increase the near plane and/or avoid close or intersecting geometry.</p>
<p>Sometimes this helped! I was <em>so convinced</em> that their too-low-near-plane was always the culprit.</p>
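<p>For context, the near plane really does matter a lot for depth precision, which is why it was such a convincing suspect. A rough sketch, assuming a standard D3D-style perspective projection that maps depth to 0..1 and an integer depth buffer; the function names are made up for this example:</p>

```python
def d3d_depth(z, near, far):
    """Depth value in 0..1 for view-space distance z under a standard
    D3D-style perspective projection (0 at the near plane, 1 at the far plane)."""
    return far / (far - near) * (1.0 - near / z)


def steps_per_unit(z, near, far, bits=24):
    """Approximate number of distinct depth buffer values spanning the
    one-unit interval centred on distance z, for an integer depth buffer."""
    levels = 2 ** bits - 1
    span = d3d_depth(z + 0.5, near, far) - d3d_depth(z - 0.5, near, far)
    return span * levels


far = 1000.0
for near in (0.01, 1.0):
    used = d3d_depth(10.0, near, far)
    steps = steps_per_unit(100.0, near, far)
    print(f"near={near}: depth at z=10 is {used:.5f}, "
          f"about {steps:.0f} steps per unit at z=100")
```

<p>With the far plane at 1000, a near plane of 0.01 spends over 99.9% of the depth range on the first 10 units, and moving the near plane to 1.0 gives roughly 100 times more depth steps per unit at distance 100.</p>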
<p>And then one day I decided to check. This is what I&#8217;ve got on Direct3D:<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbias.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbias-300x225.png" alt="Depth bias artefacts" title="scenebadbias" width="300" height="225" class="alignnone size-medium wp-image-179" /></a></p>
<p>Ok, this scene is intentionally using a low near plane, but let me stress this again. This is what I&#8217;ve got:<br />
<a href='http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbiasfail.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/scenebadbiasfail-300x225.png" alt="Epic fail!" title="scenebadbiasfail" width="300" height="225" class="alignnone size-medium wp-image-180" /></a></p>
<p><em>Not good at all.</em></p>
<p>What happened? It happened in roughly this way:</p>
<ol>
<li>First, depth bias <a href="http://msdn.microsoft.com/en-us/library/bb205599(VS.85).aspx">documentation</a> on Direct3D is wrong. Depth bias is <em>not</em> in 0..16 range, it is in 0..1 range, which corresponds to the entire range of the depth buffer.</li>
<li>Back then, our code was always using 16 bit depth buffers, so the equivalent of -1,-1 depth bias in OpenGL was multiplied with something like 1.0/65535.0, and that was fed into Direct3D. <em>Hey, it seemed to work!</em></li>
<li>Later on, the device setup code was modified to do proper format selection, so most often it ended up using 24 bit depth buffer. <em>Of course</em> <del datetime="2008-06-12T06:33:50+00:00">no one</del><ins datetime="2008-06-12T06:50:43+00:00"> I</ins> never modified the depth bias code to account for this change&#8230;</li>
<li>And it stayed there. And I kept deceiving myself that the content of the users is to blame, and not some stupid code of mine.</li>
</ol>
<p><strong>It&#8217;s good to check your assumptions once in a while.</strong></p>
<p>So yeah, the proper multiplier for depth bias on Direct3D with a 24 bit depth buffer should not be 1.0/65535.0, but something like 1.0/(2^24-1). Except that this value is <em>really small</em>, so something like 4.8e-7 should be used instead (see <a href="http://terathon.com/gdc07_lengyel.ppt">Lengyel&#8217;s GDC2007 talk</a>). Oh, but for some reason it&#8217;s not really enough in practice, so something like 2.0*4.8e-7 should be used instead (tested so far on GeForce 8600, Radeon HD 3850, Radeon 9600, Intel 945, reference rasterizer). Oh, and the same value should be used even when a 16 bit depth buffer is used; using the 1.0/65535.0 multiplier with a 16 bit depth buffer produces way too large a bias.</p>
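<p>The multipliers above, put side by side in a small Python sketch. The helper names (<code>d3d_depth_bias</code>, <code>as_dword</code>) are hypothetical and this is not the engine code; it only redoes the arithmetic from this post. The one D3D-specific detail it shows is that float render state values are passed as the raw 32 bit pattern of the float.</p>

```python
import struct

STEP_16BIT = 1.0 / 65535.0              # one step of a 16 bit depth buffer
STEP_24BIT = 1.0 / (2 ** 24 - 1)        # one step of a 24 bit depth buffer
LENGYEL_EPSILON = 4.8e-7                # value the post takes from Lengyel's talk
PRACTICAL_STEP = 2.0 * LENGYEL_EPSILON  # multiplier the post settles on


def d3d_depth_bias(gl_units, step=PRACTICAL_STEP):
    """Hypothetical helper: scale an OpenGL-style polygon offset 'units'
    value into Direct3D's 0..1 depth bias range."""
    return gl_units * step


def as_dword(value):
    """Bit pattern of a 32 bit float as an unsigned integer, the form in
    which D3D9 render states accept float values."""
    return struct.unpack("=I", struct.pack("=f", value))[0]


bias = d3d_depth_bias(-1.0)   # equivalent of "units = -1" in glPolygonOffset
print(f"bias = {bias:.3e}, render state DWORD = 0x{as_dword(bias):08X}")
print(f"old 16 bit multiplier is {STEP_16BIT / PRACTICAL_STEP:.1f}x the practical one")
```

<p>In other words, the old 1.0/65535.0 multiplier was roughly 16 times larger than the value that works in practice, which matches the &#8220;way too large&#8221; bias seen in the screenshots.</p>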
<p>With proper bias values the image is good on Direct3D again. Yay for that (fix is coming in Unity 2.1 soon).</p>
<p><em>&#8230;and yes, I know that real men fudge projection matrix instead of using depth bias&#8230; someday maybe.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Holy FPU precision, Batman!</title>
		<link>http://aras-p.info/blog/2008/01/22/holy-fpu-precision-batman/</link>
		<comments>http://aras-p.info/blog/2008/01/22/holy-fpu-precision-batman/#comments</comments>
		<pubDate>Tue, 22 Jan 2008 09:46:38 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2008/01/22/holy-fpu-precision-batman/</guid>
		<description><![CDATA[(cross-posted from blogs.unity3d.com) One of our customers found an interesting bug the other day: embedding Unity Web Player into a web page makes some javascript animation libraries not work correctly. For example, script.aculo.us or Dojo Toolkit would stop doing some of their tasks. But only on Windows, and only on some browsers (Firefox and Safari). [...]]]></description>
			<content:encoded><![CDATA[<p><em>(cross-posted from <a href="http://blogs.unity3d.com/2008/01/22/holy-fpu-precision-batman/">blogs.unity3d.com</a>)</em></p>
<p>One of our customers found an interesting bug the other day: embedding Unity Web Player into a web page makes some javascript animation libraries not work correctly. For example, <a href="http://script.aculo.us/">script.aculo.us</a> or <a href="http://dojotoolkit.org/">Dojo Toolkit</a> would stop doing some of their tasks. But only on Windows, and only on some browsers (Firefox and Safari).</p>
<p>Wait a moment&#8230; Unity plugin makes nice wobbling web page elements not wobble anymore!? Sounds like an <em>interesting</em> issue&#8230;</p>
<p>So I prepared for a debug session and tried the usual &#8220;divide by two until you locate the problem&#8221; approach.</p>
<ul>
<li>Unity Web Player is composed of two parts: a small browser plugin, and the actual &#8220;engine&#8221; (let&#8217;s call it &#8220;runtime&#8221;). First I change the plugin so that it only loads the data, but never loads or starts the runtime. Everything works. So the problem is not in the plugin. <em>Good</em>.</li>
<li>Load the runtime and do basic initialization (create child window, load Mono, &#8230;), but never actually start playing the content &#8211; everything works.</li>
<li>Load the runtime and <em>fully</em> initialize everything, but never actually start playing the content &#8211; the bug appears! By now I know that the problem is <em>somewhere</em> in the initialization.</li>
</ul>
<p>Initialization reads some settings from the data file, creates some &#8220;manager objects&#8221; for the runtime, initializes the graphics device, loads the first game &#8220;level&#8221; and then the game can play.</p>
<p>What of the above could cause <em>something</em> inside the browser&#8217;s JavaScript engine to stop working? And do that only on Windows, and only on some browsers? My first guess was the most platform-specific part: initialization of the graphics device, which on Windows usually happens to be Direct3D.</p>
<p>So I continued:</p>
<ul>
<li>Try using OpenGL instead of Direct3D &#8211; everything works. By now it&#8217;s confirmed that initializing Direct3D causes something else in the browser to stop working.</li>
<li>&#8220;A-ha!&#8221; moment: tell Direct3D to not change floating point precision (via a <a href="http://msdn2.microsoft.com/en-us/library/bb172527(VS.85).aspx">create flag</a>). Voilà, everything works!</li>
</ul>
<p>I don&#8217;t know how I <em>actually</em> came up with the idea of testing the floating point precision flag. Maybe I remembered some related problems we had a while ago, where Direct3D would cause timing calculations to be &#8220;off&#8221; if the user&#8217;s machine had not been rebooted for a couple of weeks or more. That time around we properly changed our timing code to use 64 bit integers, but left the Direct3D precision setting intact.</p>
<blockquote><p>
Side note: Intel x86 floating point unit (FPU) can operate in various <a href="http://www.stereopsis.com/FPU.html">precision modes</a>, usually 32, 64 or 80 bit. By default Direct3D 9 sets FPU precision to 32 bit (i.e. single precision). Telling D3D to not change FPU settings <em>could</em> lower performance somewhat, but in my tests it did not have any noticeable impact.
</p></blockquote>
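<p>To get a feel for how little 32 bit precision leaves for timing, here is a small Python sketch. It does not touch the FPU control word; it only emulates single precision by rounding values through a 32 bit float with <code>struct</code>, so treat it as an approximation of the effect and not as a reproduction of the bug. The two-week uptime and the 16 ms frame are illustrative numbers.</p>

```python
import struct


def to_single(x):
    """Round a Python float (64 bit) to the nearest 32 bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


MS_PER_DAY = 24 * 60 * 60 * 1000
uptime_ms = 14 * MS_PER_DAY   # two weeks of uptime, in milliseconds
frame_ms = 16                 # roughly one frame at 60 FPS

# In double precision the next frame's timestamp is distinct from this one...
double_delta = float(uptime_ms + frame_ms) - float(uptime_ms)

# ...but rounded to single precision both timestamps collapse to the same value.
single_delta = to_single(uptime_ms + frame_ms) - to_single(uptime_ms)

print(f"double precision delta: {double_delta} ms")
print(f"single precision delta: {single_delta} ms")
```

<p>At that magnitude adjacent single precision values are 128 ms apart, so a 16 ms step simply disappears. Any code that expects double precision arithmetic and computes time differences this way would see time stand still.</p>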
<p>So there it was. A debugging session, one line of change in the code, and fancy javascript webpage animations work on Windows in Firefox and Safari. This is coming out in Unity 2.0.2 update soon.</p>
<p>The moral? Something in one place can affect seemingly <em>completely</em> unrelated things in another place!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/01/22/holy-fpu-precision-batman/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is OpenGL really faster than D3D9?</title>
		<link>http://aras-p.info/blog/2007/09/23/is-opengl-really-faster-than-d3d9/</link>
		<comments>http://aras-p.info/blog/2007/09/23/is-opengl-really-faster-than-d3d9/#comments</comments>
		<pubDate>Sat, 22 Sep 2007 23:50:08 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[opengl]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/09/23/is-opengl-really-faster-than-d3d9/</guid>
		<description><![CDATA[The common knowledge is that drawing stuff in OpenGL is much faster than in D3D9. I wonder &#8211; is this actually true, or just an urban legend? I could very well imagine that setting everything up to draw a single model and then issuing 1000 draw calls for it is faster in OpenGL&#8230; but [...]]]></description>
			<content:encoded><![CDATA[<p>The common knowledge is that drawing stuff in OpenGL is much faster than in D3D9. I wonder &#8211; is this actually true, or just an urban legend? I could very well imagine that setting everything up to draw a single model and then issuing 1000 draw calls for it is faster in OpenGL&#8230; but come on, that&#8217;s not a very life-like scenario!</p>
<p>At <a href="http://unity3d.com">work</a> we now have both D3D9 and OpenGL renderers on Windows. The original codebase was very much designed for OpenGL, so I had to jump through a lot of hoops to get it fully working on D3D&#8230; small differences that add up, like: there&#8217;s no object space texgen on D3D, shaders don&#8217;t track built-in state (world, modelview matrices, light positions, &#8230;), textures in GL vs. textures + sampler state in D3D, and so on. Anyway, the codebase was definitely not designed to exploit D3D strengths and OpenGL weaknesses, more likely the other way around.</p>
<p>But wait! I look at our benchmark tests, and D3D9 is consistently faster than OpenGL. Some examples:</p>
<ul>
<li>Real world scene with lots of shadow casting lights (different objects, different shaders, different lights, different shadow types in one scene):
<ul>
<li>Core Duo with Radeon X1600: 23 FPS D3D9, 13 FPS GL.</li>
<li>P4 with GeForce 6800GT: 16 FPS D3D9, 9 FPS GL.</li>
<li>Core2 Duo with Radeon HD 2600: 41 FPS D3D9, 35 FPS GL.</li>
</ul>
</li>
<li>High object count test (1000 objects, multiple lights, 5 passes per object total):
<ul>
<li>Core Duo with Radeon X1600: 18.3 FPS D3D9, 12.5 FPS GL.</li>
<li>P4 with GeForce 6800GT: 13.2 FPS D3D9, 9.4 FPS GL.</li>
<li>Core2 Duo with Radeon HD 2600: 34.8 FPS D3D9, 29.3 FPS GL.</li>
</ul>
</li>
<li>Dynamic geometry (lots of particle systems) test (this is limited by vertex buffer writing speed and the CPU calculating the particles, not by draw calls):
<ul>
<li>Core Duo with Radeon X1600: 170 FPS D3D9, 102 FPS GL.</li>
<li>P4 with GeForce 6800GT: 108 FPS D3D9, 74 FPS GL.</li>
<li>Core2 Duo with Radeon HD 2600: 325 FPS D3D9, 242 FPS GL.</li>
</ul>
</li>
<li>&#8230;and so on.</li>
</ul>
<p>To be fair, there are a couple of tests where on some hardware OpenGL has a slight edge. But in 95% of the cases, D3D9 is faster. Not to mention that we have about 10x fewer workarounds for broken hardware/drivers for D3D9 than we have for OpenGL&#8230;</p>
<p>What gives? Either our OpenGL code is horribly suboptimal, or <em>&#8220;OpenGL is faster!!!!11oneoneeleven&#8221;</em> is a myth. I have trouble figuring out where our code would be horribly suboptimal; I think we follow all the advice given by hardware vendors on how to make OpenGL efficient (not that there is much advice out there, though&#8230;).</p>
<p>There isn&#8217;t much software that can run the same content on both D3D and OpenGL and is suitable for benchmarking. I tried <a href="http://ogre3d.org">Ogre 3D</a> demos on one machine (GeForce 6800GT card) and guess what? D3D9 is faster in tests that specifically stress draw count (like the instancing demo&#8230; D3D9 is faster both in instanced and non-instanced modes).</p>
<p>Am I crazy?</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/09/23/is-opengl-really-faster-than-d3d9/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Back from Seattle</title>
		<link>http://aras-p.info/blog/2007/03/17/back-from-seattle/</link>
		<comments>http://aras-p.info/blog/2007/03/17/back-from-seattle/#comments</comments>
		<pubDate>Sat, 17 Mar 2007 21:13:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[random]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=104</guid>
		<description><![CDATA[Just got back from MVP Global Summit 2007 in Seattle. Among usual things, like watching Bill&#8217;s keynote, meeting other MVPs, DirectX/XNA guys, getting a grip of some NDA information and such, here are some of the other highlights: Amsterdam airport: Officer: You speak English sir? Me: Yeah. O (takes a look at my passport): Ah, [...]]]></description>
			<content:encoded><![CDATA[<p>Just got back from <a href="http://en.wikipedia.org/wiki/Microsoft_Most_Valuable_Professional">MVP</a> Global Summit 2007 in Seattle. Among usual things, like watching <a href="http://en.wikipedia.org/wiki/Bill_Gates">Bill</a>&#8217;s <a href="http://www.microsoft.com/presspass/exec/billg/speeches/2007/03-13MVPSummit.mspx">keynote</a>, meeting other MVPs, DirectX/XNA guys, getting a grip of some NDA information and such, here are some of the other highlights:</p>
<p><span style="font-weight: bold;">Amsterdam</span> airport:</p>
<blockquote><p>Officer: You speak English sir?<br />
Me: Yeah.<br />
O <span style="font-style: italic;">(takes a look at my passport)</span>: Ah, you speak Russian of course!<br />
M: No, not really.<br />
O: But your language is very similar to Russian, right?<br />
M: Hm&#8230;
</p></blockquote>
<p>Well, here we know who gets the Linguist of the Year award.</p>
<p><span style="font-weight: bold;">Seattle-Tacoma</span> airport, lady at checkin: &#8220;<span style="font-style: italic;">what kind of passport is that?</span>&#8221;. It also takes her five attempts to enter my last name properly from the printed letters in the passport, each time trying to persuade me that of course I did change the ticket date!</p>
<p><span style="font-weight: bold;">Seattle-Tacoma</span> airport, security: &#8220;<span style="font-style: italic;">sir, you have been selected for additional screening</span>&#8221;. Do they randomly select people for that quite involved process? Why does this &#8220;selection&#8221; happen immediately <span style="font-style: italic;">after</span> they take a look at my passport?</p>
<p><span style="font-weight: bold;">Random quotes</span>:</p>
<blockquote><p>Ten minutes walk is a <span style="font-style: italic;">long</span> distance! Ten minutes of walking distance in the States is a very good reason to buy a car. At least SUV; preferably a Hummer.
</p></blockquote>
<blockquote><p>DirectX SDK is the source of all sorts of high frequency goodness.
</p></blockquote>
<blockquote><p>Sony is always good at announcements.
</p></blockquote>
<blockquote><p>No? Rumours on the internet? Shock! Horror!
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/03/17/back-from-seattle/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>It&#8217;s always unexpected</title>
		<link>http://aras-p.info/blog/2006/04/20/its-always-unexpected/</link>
		<comments>http://aras-p.info/blog/2006/04/20/its-always-unexpected/#comments</comments>
		<pubDate>Thu, 20 Apr 2006 16:22:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=91</guid>
		<description><![CDATA[I have a MacBook Pro now and slowly am getting used to it. It&#8217;s quite hard, considering that I&#8217;ve never had a laptop before; and actually used any Mac for the first time just a couple of months ago. My daughter thinks the best part about it is the weird image effects in PhotoBooth. I [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify"><img class="alignright" src="http://aras-p.info/img/blog/060420.jpg" />I have a <a href="http://www.apple.com/macbookpro/">MacBook Pro</a> now and slowly am getting used to it. It&#8217;s quite hard, considering that I&#8217;ve never had a laptop before; and actually used any Mac for the first time just a couple of months ago. My daughter thinks the best part about it is the weird image effects in PhotoBooth. I just can&#8217;t disagree.</p>
<p>On an unrelated note, I am now a Microsoft DirectX MVP. Just about the time when I almost stopped using DirectX! I&#8217;d love to keep using it, but we&#8217;re making <a href="http://unity3d.com">a product</a> that primarily runs on Macs&#8230; quite hard to use D3D there. But almost every day I wish I could, and every second day I&#8217;m annoying my coworkers by saying that D3D is <span style="font-style: italic">lightyears</span> ahead of TheOtherAPI!</p>
<p>The MVP award just came out of nowhere. It&#8217;s one of the things that you never expect &#8211; but hey, it feels good anyway. And now I have an MVP laptop case for my MacBook :)</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2006/04/20/its-always-unexpected/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reading DX10 docs&#8230;</title>
		<link>http://aras-p.info/blog/2005/12/16/reading-dx10-docs/</link>
		<comments>http://aras-p.info/blog/2005/12/16/reading-dx10-docs/#comments</comments>
		<pubDate>Fri, 16 Dec 2005 17:58:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=81</guid>
		<description><![CDATA[Reading DirectX10 preview documentation right now (you know, it&#8217;s released with Dec2005 SDK). It is pretty impressive, I must say! Seems like a huge leap forward. Back to reading!]]></description>
			<content:encoded><![CDATA[<p>Reading DirectX10 preview documentation right now (you know, it&#8217;s released with <a href="http://msdn.microsoft.com/directx/sdk/">Dec2005 SDK</a>). It is pretty impressive, I must say! Seems like a huge leap forward. <span style="font-style: italic;">Back to reading!</span></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/12/16/reading-dx10-docs/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>An article on efficient D3DX Effects state management</title>
		<link>http://aras-p.info/blog/2005/10/03/an-article-on-efficient-d3dx-effects-state-management/</link>
		<comments>http://aras-p.info/blog/2005/10/03/an-article-on-efficient-d3dx-effects-state-management/#comments</comments>
		<pubDate>Mon, 03 Oct 2005 09:38:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=71</guid>
		<description><![CDATA[I wrote an article on the subject I was talking about recently &#8211; an auto-magical system that manages device states in the effects. The article and links to implementation are on my homepage here: aras-p.info/texts/d3dx_fx_states.html]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">I wrote an article on the subject I was <a href="http://aras-p.info/blog/2005/09/24/state-management-in-d3dx-effects">talking</a> <a href="http://aras-p.info/blog/2005/09/27/state-management-in-d3dx-effects-2">about</a> recently &#8211; an auto-magical system that manages device states in the effects. The article and links to implementation are on my homepage here: <a href="http://aras-p.info/texts/d3dx_fx_states.html">aras-p.info/texts/d3dx_fx_states.html</a>
</div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/10/03/an-article-on-efficient-d3dx-effects-state-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>State management in D3DX Effects #2</title>
		<link>http://aras-p.info/blog/2005/09/27/state-management-in-d3dx-effects-2/</link>
		<comments>http://aras-p.info/blog/2005/09/27/state-management-in-d3dx-effects-2/#comments</comments>
		<pubDate>Tue, 27 Sep 2005 15:46:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=70</guid>
		<description><![CDATA[I&#8217;ve written down the basic idea here. I&#8217;ve done some tests and it really seems to work! That required a tiny 700 lines of hacky C++ code in the engine, but in exchange there&#8217;s no longer a need to write state restoring passes by hand. Maybe such an effect usage scheme would even be usable in RealWorld! Too [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;"><p>I&#8217;ve written down the <a href="http://aras-p.info/blog/2005/09/24/state-management-in-d3dx-effects">basic idea here</a>. I&#8217;ve done some tests and it <span style="font-style: italic;">really seems to work</span>!</p>
<p>That required a tiny 700 lines of hacky C++ code in the engine, but in exchange there&#8217;s no longer a need to write state restoring passes by hand. Maybe such an effect usage scheme would even be usable in RealWorld!</p>
<p>Too bad I didn&#8217;t think it up a couple of months ago. My ShaderX4 article about this subject would have been much better&#8230;</p>
<p><span style="font-style: italic;">OK, I still have to test this stuff on real world data (i.e. try it on our demos).</span></p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/09/27/state-management-in-d3dx-effects-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>State management in D3DX Effects</title>
		<link>http://aras-p.info/blog/2005/09/24/state-management-in-d3dx-effects/</link>
		<comments>http://aras-p.info/blog/2005/09/24/state-management-in-d3dx-effects/#comments</comments>
		<pubDate>Sat, 24 Sep 2005 15:45:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=68</guid>
		<description><![CDATA[In my projects I&#8217;ve been using D3DX Effects with no device state saving/restoring. Instead, each effect contained a dummy &#8220;last pass&#8221; that restores &#8220;needed&#8221; state (see here; more lengthy article coming in ShaderX4). I always wrote this &#8220;state restore&#8221; by hand. This is obviously very error-prone; it&#8217;s ok if I&#8217;m the only one writing effects [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;"><p>In my projects I&#8217;ve been using D3DX Effects with no device state saving/restoring. Instead, each effect contains a dummy &#8220;last pass&#8221; that restores the &#8220;needed&#8221; state (see <a href="http://dingus.berlios.de/index.php?n=Main.D3DXEffects">here</a>; a lengthier article is coming in <a href="http://www.shaderx4.com/TOC.html">ShaderX4</a>).</p>
<p>I always wrote this &#8220;state restore&#8221; pass by hand. This is obviously <span style="font-style: italic;">very </span>error-prone; it&#8217;s OK if I&#8217;m the only one writing effects, but it would be unusable in any real world scenario.</p>
<p>I think I could automatically generate the &#8220;state restore&#8221; pass. The engine somehow knows which states need to be restored, which must be set in every effect, and so on (this could be read from some file). It first loads each effect file and examines which states it touches. This can be done by supplying a custom ID3DXEffectStateManager and &#8220;executing&#8221; the effect &#8211; the state manager would then remember all states (the left-hand sides of state assignments) touched by the effect.</p>
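<p>As a rough illustration of what &#8220;touched&#8221; means here, consider a minimal fixed-function effect. This is a hypothetical example chosen for brevity, not one of the real effects:</p>
<blockquote><pre>// Hypothetical effect, for illustration only
technique Foo {
 pass P1 {
  AlphaBlendEnable = True;   // touches AlphaBlendEnable
  SrcBlend = SrcAlpha;       // touches SrcBlend
  DestBlend = InvSrcAlpha;   // touches DestBlend
  ZWriteEnable = False;      // touches ZWriteEnable
 }
}</pre>
</blockquote>
<p>Executing this effect with a recording state manager would presumably yield the set AlphaBlendEnable, SrcBlend, DestBlend and ZWriteEnable. Only the state names matter at this point, not the values assigned to them.</p>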
<p>Then the engine generates the &#8220;state restore&#8221; pass and loads the effect again. I&#8217;d imagine it would work like this: each effect has to contain a macro <span style="font-style: italic;">RESTORE_PASS</span>:</p>
<blockquote><pre>technique Foo {
 pass P1 { ... }
 pass P2 { ... }
 RESTORE_PASS
}</pre>
</blockquote>
<p>The macro would be empty during the first load and would expand to the generated restore pass on the second load (you can supply generated macro definitions when loading the effect). The engine can check whether the generated pass exists after the second load (if it doesn&#8217;t, then RESTORE_PASS is missing from the effect &#8211; an error).</p>
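<p>To sketch the result: assuming the passes of Foo touched only AlphaBlendEnable and ZWriteEnable, and assuming the engine&#8217;s defaults for those states are False and True, the technique would effectively read something like this after the second load. The pass name PRestore and the default values are made up for illustration; the real ones would come from whatever the engine is configured to restore:</p>
<blockquote><pre>technique Foo {
 pass P1 { ... }
 pass P2 { ... }
 // Expansion of RESTORE_PASS on the second load (hypothetical)
 pass PRestore {
  AlphaBlendEnable = False;
  ZWriteEnable = True;
 }
}</pre>
</blockquote>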
<p>The downside of this scheme is that each effect file has to be loaded twice &#8211; the first time to examine its state assignments and the second time to actually load it with the generated restore pass. It&#8217;s not a problem for me, I guess, because effect loading doesn&#8217;t take much time anyway&#8230; And if it became really slow, all this stuff could be done as a preprocess (e.g. during a build).</p>
<p>There are many upsides to this scheme, I think: the whole system is robust and error-proof again (it no longer depends on the effect writer remembering all the details about states). And as far as I can see, no performance would be lost at all (performance was the main reason I&#8217;m using this &#8220;restore pass&#8221;).</p>
<p>Gotta go and implement all this!
</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/09/24/state-management-in-d3dx-effects/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
