<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lost in the Triangles &#187; rendering</title>
	<atom:link href="http://aras-p.info/blog/tags/rendering/feed/" rel="self" type="application/rss+xml" />
	<link>http://aras-p.info/blog</link>
	<description>Random thoughts of a triangle pusher</description>
	<lastBuildDate>Fri, 09 Sep 2011 17:03:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Testing Graphics Code, 4 years later</title>
		<link>http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/</link>
		<comments>http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/#comments</comments>
		<pubDate>Fri, 17 Jun 2011 04:44:46 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=762</guid>
		<description><![CDATA[Almost four years ago I wrote how we test rendering code at Unity. Did it stand the test of time and more importantly, growing the company from less than 10 people to more than 100 people? I&#8217;m happy to say it did! That&#8217;s it, move on to read the rest of the internets. The earlier [...]]]></description>
			<content:encoded><![CDATA[<p>Almost four years ago <a href="http://aras-p.info/blog/2007/07/31/testing-graphics-code/">I wrote how we test rendering code</a> at Unity. Did it stand the test of time and more importantly, growing the company from less than 10 people to more than 100 people?</p>
<p><em>I&#8217;m happy to say it did! That&#8217;s it, move on to read the rest of the internets.<br />
</em></p>
<p>The earlier post focused mostly on hardware compatibility (differences between platforms, GPUs, driver versions, driver bugs and their workarounds etc.). In addition to that, we do regression tests on a bunch of <a href="http://blogs.unity3d.com/2010/01/12/on-web-player-regression-testing/">actual Unity made games</a>. All that is good and works; let&#8217;s talk instead about the tests the rendering team at Unity uses in its daily work.</p>
<p><strong>Graphics Feature &#038; Regression Testing</strong></p>
<p>In the daily life of a graphics programmer, you care about two things related to testing:</p>
<p><span id="more-762"></span><strong>1.</strong> Whether a new feature you are adding, more or less, works.<br />
<strong>2.</strong> Whether something new you added or something you refactored broke or changed any existing features.</p>
<p>Now, &#8220;works&#8221; is a vague term. Definitions can range from equally vague</p>
<blockquote><p>Works For Me!</p></blockquote>
<p>to something like </p>
<blockquote><p>It has been battle tested on thousands of use cases, hundreds of shipped games, dozens of platforms, thousands of platform configurations and within each and every one of them there&#8217;s not a single wrong pixel, not a single wasted memory byte and not a single wasted nanosecond! <em>No kittehs were harmed either!</em></p></blockquote>
<p>In an ideal world we&#8217;d only consider the latter as &#8220;works&#8221;; however, that&#8217;s quite hard to achieve.</p>
<p>So instead we settle for small &#8220;functional tests&#8221;, where each feature has a small scene setup that exercises said feature (much as described in the <a href="http://aras-p.info/blog/2007/07/31/testing-graphics-code/">previous post</a>). It&#8217;s the graphics programmer&#8217;s responsibility to add tests like that for the features they write.</p>
<p>For example, Fog handling might be tested by a couple scenes like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/092-FogModes.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/092-FogModes.png" alt="" title="Fog Modes" width="400" height="300" class="alignnone size-full wp-image-770" /></a><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/017-Fog.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/017-Fog.png" alt="" title="Fog vs. different shaders; Forward rendering above, Deferred Lighting below" width="400" height="300" class="alignnone size-full wp-image-771" /></a></p>
<p>Another example, tests for various corner cases of Deferred Lighting:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/118-DeferredLMCases.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/118-DeferredLMCases.png" alt="" title="Lightmapped/NonLightmapped objects vs. Baked/NonBaked lights" width="400" height="300" class="alignnone size-full wp-image-774" /></a><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/134-DefLightShapes.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/134-DefLightShapes.png" alt="" title="Light volumes crossing near/far planes" width="400" height="300" class="alignnone size-full wp-image-775" /></a><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/143-DefLargeCoords.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/143-DefLargeCoords.png" alt="" title="Ability to handle small near plane &amp; large world coordinates" width="400" height="300" class="alignnone size-full wp-image-776" /></a></p>
<p>So that&#8217;s basic testing for &#8220;it works&#8221; that the graphics programmers themselves do. Beyond that, features are tested by QA and a large beta testing group, tried, profiled and optimized on real actual game projects and so on.</p>
<p>The good thing is, doing these basic tests also provides you with point 2 (did I break or change something?) automatically. If after your changes, all the graphics tests still pass, there&#8217;s a pretty good chance you did not break anything. Of course this testing is not exhaustive, but any time a regression is spotted by QA, beta testers or reported by users, you can add a new graphics test to check for that situation.</p>
<p><strong>How do we actually do it?</strong></p>
<p>We use <a href="http://www.jetbrains.com/teamcity/">TeamCity</a> for the build/test farm. It has several build machines set up as graphics test agents (unlike most other build machines, they need an actual GPU, or an iOS device connected to them, or a console devkit etc.) that run graphics test configurations for all branches automatically. Each branch has its graphics tests run daily, and branches with &#8220;high graphics code activity&#8221; (i.e. branches that the rendering team is actually working on) have them run more often. You can always initiate the tests manually by clicking a button of course. What you want to see at any time is this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/teamcity-gfx-tests.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/teamcity-gfx-tests.png" alt="" title="The graphics tests are passing one by one!" width="445" height="362" class="alignnone size-full wp-image-778" /></a></p>
<p>The basic approach is the same as <a href="http://aras-p.info/blog/2007/07/31/testing-graphics-code/">4 years ago</a>: a &#8220;game level&#8221; (&#8220;scene&#8221; in Unity speak) for each test, run for a defined number of frames at a fixed timestep, with a screenshot taken at the end of each frame. Compare each screenshot with the &#8220;known good&#8221; image for that platform; any difference equals &#8220;FAIL&#8221;. On many platforms you have to allow a couple of wrong pixels because many consumer GPUs are not <i>fully</i> deterministic, it seems.</p>
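<p>To make that &#8220;couple of wrong pixels&#8221; allowance concrete, here is a minimal sketch in Python (the real test controller is a C# application, so this is only an illustration). It assumes a screenshot is a flat list of (r,g,b) tuples; the names <tt>count_wrong_pixels</tt> and <tt>screenshot_passes</tt>, and the default limit of two pixels, are made up for this example.</p>
<blockquote><pre>
```python
def count_wrong_pixels(expected, actual, channel_tolerance=0):
    """Count pixels where any channel differs by more than channel_tolerance."""
    if len(expected) != len(actual):
        raise ValueError("screenshots must have the same number of pixels")
    wrong = 0
    for exp_px, act_px in zip(expected, actual):
        # Largest per-channel difference for this pixel.
        worst = max(abs(e - a) for e, a in zip(exp_px, act_px))
        if worst > channel_tolerance:
            wrong += 1
    return wrong


def screenshot_passes(expected, actual, max_wrong_pixels=2):
    """Pass if the number of differing pixels stays within the allowance."""
    return max_wrong_pixels >= count_wrong_pixels(expected, actual)
```
</pre>
</blockquote>
<p>With <tt>max_wrong_pixels=0</tt> this degenerates into the strict &#8220;any difference equals FAIL&#8221; rule; how many pixels to allow per platform is a judgment call.</p>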
<p>So you have this bunch of &#8220;this is the golden truth&#8221; images for all the tests:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/some-gfx-tests.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/some-gfx-tests-500x247.png" alt="" title="Images for some of the graphics tests" width="500" height="247" class="alignnone size-medium wp-image-781" /></a></p>
<p>And each platform automatically tested on TeamCity has its own set:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/06/gfx-test-platforms.png"><img src="http://aras-p.info/blog/wp-content/uploads/2011/06/gfx-test-platforms.png" alt="" title="Platforms of graphics tests" width="187" height="181" class="alignnone size-full wp-image-782" /></a></p>
<p>Since the &#8220;test controller&#8221; can run on a different device than actual tests (the case for iOS, Xbox 360 etc.), the test executable opens a socket connection to transfer the screenshots. The test controller is a relatively simple C# application that listens on a socket, fetches the screenshots and compares them with the template ones. The result of it is output that TeamCity can understand; along with &#8220;build artifacts&#8221; that consist of failed tests (for each failed test: expected image, failed image, difference image with increased contrast).</p>
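<p>The &#8220;difference image with increased contrast&#8221; artifact can be sketched roughly like this, again in Python rather than the C# of the real controller. The function name <tt>difference_image</tt> and the contrast factor of 8 are assumptions for illustration, and pixels are assumed to be (r,g,b) tuples with 8-bit channels.</p>
<blockquote><pre>
```python
def difference_image(expected, actual, contrast=8):
    """Per-channel absolute difference, scaled by contrast and clamped to 255."""
    return [
        tuple(min(255, abs(e - a) * contrast) for e, a in zip(exp_px, act_px))
        for exp_px, act_px in zip(expected, actual)
    ]
```
</pre>
</blockquote>
<p>Scaling the difference makes off-by-one errors visible to a human looking at the failed test, which a raw difference image would render as almost pure black.</p>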
<p>That&#8217;s pretty much it! And of course, automated tests are nice and all, but that should not get too much into the way of actual <a href="http://programming-motherfucker.com/">programming manifesto</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>A way to visualize mip levels</title>
		<link>http://aras-p.info/blog/2011/05/03/a-way-to-visualize-mip-levels/</link>
		<comments>http://aras-p.info/blog/2011/05/03/a-way-to-visualize-mip-levels/#comments</comments>
		<pubDate>Tue, 03 May 2011 16:41:59 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=710</guid>
		<description><![CDATA[Recently a discussion on Twitter about folks using 2048 textures on a pair of dice spawned this post. How do artists know if the textures are too high or too low resolution? Here&#8217;s what we do in Unity, which may or may not work elsewhere. When you have a game scene that, for example, looks [...]]]></description>
			<content:encoded><![CDATA[<p>Recently a <a href="http://twitter.com/#!/aras_p/status/63538509952200705">discussion</a> on Twitter about folks using 2048 textures on a pair of dice spawned this post. How do artists know if the textures are too high or too low resolution? Here&#8217;s what we do in Unity, which may or may not work elsewhere.</p>
<p>When you have a game scene that, for example, looks like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/05/BootcampNormal.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/05/BootcampNormal-500x283.jpg" alt="" title="Normal game scene view" width="500" height="283" class="alignnone size-medium wp-image-714" /></a><br />
We provide a &#8220;mipmaps&#8221; visualization mode that renders it like this:</p>
<p><span id="more-710"></span><a href="http://aras-p.info/blog/wp-content/uploads/2011/05/BootcampMips.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/05/BootcampMips-500x283.jpg" alt="" title="Mipmap view of the game scene" width="500" height="283" class="alignnone size-medium wp-image-713" /></a></p>
<p>Original texture colors mean it&#8217;s a perfect match (1:1 texels to pixels ratio); more red = too much texture detail; more blue = too little texture detail.</p>
<p><em>That&#8217;s it, end of story, move along!</em></p>
<p>Now of course it&#8217;s not that simple. You can&#8217;t just go and resize all textures that were used on the red stuff. The player might walk over to those red objects, and <em>then</em> they would need more detail!</p>
<p>Also, the amount of texture detail needed very much depends on the screen resolution the game will be running at:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/05/PlatformerSizes.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/05/PlatformerSizes-500x190.jpg" alt="" title="Different resolutions need different detail" width="500" height="190" class="alignnone size-medium wp-image-722" /></a></p>
<p>Still, even with varying resolution sizes and the fact that the same objects in 3D can be near &#038; far from the viewer, this view can answer the question of &#8220;does something have a too high/too low texture detail?&#8221;, mostly by looking at colorization mismatch between nearby objects.</p>
<p>In the picture above, the railings have too little texture detail (blue), while the lamp posts have too much (red). The little extruded things on the floating pads have too much detail as well.</p>
<p>The image below reveals that floor and ceiling have mismatching texture densities: floor has too little, while ceiling has too much. Probably should be the other way around; in a platformer you&#8217;d more often be looking at the floor.<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/05/FloorCeiling1.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/05/FloorCeiling1-500x318.jpg" alt="" title="Floor vs Ceiling" width="500" height="318" class="alignnone size-medium wp-image-726" /></a></p>
<p><strong>How to do this?</strong></p>
<p>In the mipmap view shader, we display the original texture mixed with a special &#8220;colored mip levels&#8221; texture. The regular texture is sampled with original UVs, while the color coded texture is sampled with more dense ones, to allow visualization of &#8220;too little texture detail&#8221;. In shader code <em>(HLSL, shader model 2.0 compatible)</em>:</p>
<blockquote><pre>struct v2f {
    float4 pos : SV_POSITION;
    float2 uv : TEXCOORD0;
    float2 mipuv : TEXCOORD1;
};
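<b>float2 mainTextureSize</b>; // size of the main texture in pixels, e.g. (256,256)
float4x4 matrix_mvp; // model*view*projection matrix
sampler2D mainTexture;
sampler2D mipColorsTexture; // each mip level is filled with a different color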
v2f vert (float4 vertex : POSITION, float2 uv : TEXCOORD0)
{
    v2f o;
    o.pos = mul (matrix_mvp, vertex);
    o.uv = uv;
    o.mipuv = <b>uv * mainTextureSize / 8.0</b>;
    return o;
}
half4 frag (v2f i) : COLOR0
{
    half4 col = tex2D (mainTexture, i.uv);
    half4 mip = tex2D (mipColorsTexture, i.mipuv);
    half4 res;
    res.rgb = lerp (col.rgb, mip.rgb, mip.a);
    res.a = col.a;
    return res;
}
</pre>
</blockquote>
<p>The <tt>mainTextureSize</tt> above is the pixel size of the main texture, for example (256,256). Division by eight might seem a bit weird, but it really isn&#8217;t!</p>
<p>To show the colored mip levels, we need to create <tt>mipColorsTexture</tt> that has different colors in each mip level.</p>
<p>Let&#8217;s say we would create a 32&#215;32 size texture for this, and the largest mip level would be used to display &#8220;ideal texel to pixel density&#8221;. If the original texture was 256 pixels in size and we want to sample a 32 pixels texture at exactly the same texel density as the original one, we have to use more dense UVs: <tt>newUV = uv * 256 / 32</tt> or in a more generic way, <tt>newUV = uv * textureSize / mipTextureSize</tt>.</p>
<p>Why is there <tt>8.0</tt> in the shader then, if we create the mip texture at 32&#215;32 size? That&#8217;s because we don&#8217;t want the largest mip level to indicate &#8220;ideal texel to pixel&#8221; density. We also want a way to visualize &#8220;not enough texel density&#8221;. So we push the ideal mip level two levels down, to the 8&#215;8 mip, which means a four times UV difference. That&#8217;s how 32 becomes 8 in the shader.</p>
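<p>The arithmetic behind that divisor can be checked with a few lines of Python. This is just a sketch of the reasoning above; the function <tt>mip_uv_scale</tt> and its parameter names are invented for the example.</p>
<blockquote><pre>
```python
def mip_uv_scale(texture_size, mip_texture_size=32, ideal_mip_level=2):
    """Factor to multiply the original UVs by when sampling the mip colors texture."""
    # Size of the mip level that should mean "ideal texel to pixel density":
    # each level down halves the size, so level 2 of a 32 texture is 8.
    ideal_level_size = mip_texture_size / (2 ** ideal_mip_level)
    return texture_size / ideal_level_size
```
</pre>
</blockquote>
<p>For a 256 pixel texture, <tt>mip_uv_scale(256)</tt> is 256 / 8, while passing <tt>ideal_mip_level=0</tt> gives the plain 256 / 32 from the previous paragraph.</p>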
<p>The actual colors we use for this 32&#215;32 mipmaps visualization texture are, in RGBA: (0.0,0.0,1.0,0.8); (0.0,0.5,1.0,0.4); (1.0,1.0,1.0,0.0); (1.0,0.7,0.0,0.2); (1.0,0.3,0.0,0.6); (1.0,0.0,0.0,0.8). Alpha channel controls how much to interpolate between the original color and the tinted color. Our 3rd mip level has zero alpha so it displays unmodified color.</p>
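<p>As a rough CPU-side illustration of what the fragment shader does with those colors, here is a small Python sketch. The table is copied from the paragraph above; the helper names <tt>lerp</tt> and <tt>tint</tt> are made up, and picking the mip level by index stands in for what the GPU&#8217;s texture sampler does automatically.</p>
<blockquote><pre>
```python
# RGBA color per mip level of the 32x32 visualization texture,
# from the largest level (index 0, 32x32) to the smallest (index 5, 1x1).
MIP_COLORS = [
    (0.0, 0.0, 1.0, 0.8),  # blue: far too little texture detail
    (0.0, 0.5, 1.0, 0.4),
    (1.0, 1.0, 1.0, 0.0),  # zero alpha: original color shown unmodified
    (1.0, 0.7, 0.0, 0.2),
    (1.0, 0.3, 0.0, 0.6),
    (1.0, 0.0, 0.0, 0.8),  # red: far too much texture detail
]


def lerp(a, b, t):
    """Linear interpolation, same as the HLSL lerp intrinsic."""
    return a + (b - a) * t


def tint(color_rgb, mip_level):
    """Blend the original color towards the mip color by the mip alpha."""
    r, g, b, alpha = MIP_COLORS[mip_level]
    return tuple(lerp(c, m, alpha) for c, m in zip(color_rgb, (r, g, b)))
```
</pre>
</blockquote>
<p>Calling <tt>tint</tt> with mip level 2 returns the input color untouched, which is the &#8220;perfect match&#8221; case; levels 0 and 5 pull the color strongly towards blue and red respectively.</p>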
<p><em>Now, step 2 is somehow forcing artists to actually use this ;)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/05/03/a-way-to-visualize-mip-levels/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Mobile graphics API wishlist: some features</title>
		<link>http://aras-p.info/blog/2011/03/19/mobile-graphics-api-wishlist-some-features/</link>
		<comments>http://aras-p.info/blog/2011/03/19/mobile-graphics-api-wishlist-some-features/#comments</comments>
		<pubDate>Sat, 19 Mar 2011 13:50:15 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[mobile]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=653</guid>
		<description><![CDATA[In my previous post I talked about things I&#8217;d want from OpenGL ES 2.0 in the performance area. Now it&#8217;s time to look at what extra features it might expose with an extension here or there. Note that I’m focusing on, in my limited understanding, low-hanging fruits. The features I want already exist in the [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://aras-p.info/blog/2011/03/04/mobile-graphics-api-wishlist-performance/">previous post</a> I talked about things I&#8217;d want from OpenGL ES 2.0 in the performance area. Now it&#8217;s time to look at what extra features it might expose with an extension here or there.</p>
<p><span id="more-653"></span><em>Note that I’m focusing on, in my limited understanding, low-hanging fruits. The features I want already exist in the current GPUs or platforms; or could be easily made available. Of course more radical new architectures would bring more &#038; fancier features, but that&#8217;s a topic for another story.</em></p>
<p><strong>Programmable blending</strong></p>
<p>At least two out of three big current mobile GPU families (PVR SGX, Adreno, Tegra 2) support programmable blending in the hardware. Maybe all of them do this and I just don&#8217;t have enough data. By &#8220;support it in the hardware&#8221; I mean either: 1) the GPU has no blending hardware, the drivers add &#8220;read current pixel &#038; blend&#8221; instructions to the shaders or 2) has blending hardware for commonly used modes, but fancier modes use shader patching with no severe performance penalties.</p>
<p>Programmable blending is useful for various things; from deferred-style decals (blending normals is hard in fixed function!) to fancier Photoshop-like blend modes to potentially faster single-pixel image postprocessing effects (like color correction).</p>
<p>Currently only NVIDIA exposes this capability via <a href="http://developer.download.nvidia.com/tegra/docs/tegra_gles2_development.pdf">NV_shader_framebuffer_fetch</a> extension.</p>
<p><em>Suggestion</em>: expose it on other hardware that can do this! It&#8217;s fine to not handle hard edge cases (for example, what happens when multisampling is used?), we can live with the limitations.</p>
<p><strong>Direct, fast access to frame buffer on the CPU</strong></p>
<p>Most (all?) mobile platforms use a unified memory approach, where there&#8217;s no physical distinction between &#8220;system memory&#8221; and &#8220;video memory&#8221;. Some of those platforms are slightly unbalanced, e.g. a strong GPU coupled with a weak CPU or vice versa. More and more of those systems will have multicore CPUs. It might make sense to take an approach similar to what the PS3 guys are doing these days &#8211; offload some of the GPU work to the CPU(s).</p>
<p>Image processing, deferred lighting and similar things could be done more efficiently on a general purpose CPU, where you aren&#8217;t limited to &#8220;one pixel at a time&#8221; model of current mobile GPUs.</p>
<p><em>Suggestion</em>: can haz get a pointer to framebuffer memory perhaps? Of course this is grossly oversimplifying all the synchronization &#038; security issues, but <em>something</em> should be possible to do in order to exploit the unified memory model. Right now it just sits there largely unused, with GLES2.0 still pretending CPU is talking to a GPU over a ten meter high concrete wall.</p>
<p><strong>Expose Tile Based GPU capabilities</strong></p>
<p>PowerVR GPUs found in all iOS and some Android devices are so-called &#8220;tile based&#8221; architectures. So is, to some extent, the Qualcomm Adreno family.</p>
<p>Currently this capability is mostly sitting behind a black box. On PowerVR GPUs the programmer does know that &#8220;overdraw of opaque objects does not matter&#8221;, or that &#8220;alpha testing is really slow&#8221; but that&#8217;s about it. There&#8217;s no control over the whole rendering process, even if some of the things could benefit from having more control over the whole tiling thing.</p>
<p>Take, for example, deferred lighting/shading. The cool folks are doing it tile-based already on <a href="http://www.slideshare.net/DICEStudio/directx-11-rendering-in-battlefield-3?from=ss_embed">DirectX 11</a> or <a href="http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3?from=ss_embed">PS3</a>.</p>
<p>On a tile-based GPU, all rendering is <em>already</em> happening in tiles, so what if we could say &#8220;now, you work on this tile, render this, render that; now we go to this tile&#8221;? Maybe that way we could achieve two things at once: 1) better light culling because it&#8217;s at tile level, and 2) most of the data could stay in the super-fast on-chip memory, without having to be written into system memory &#038; later read again. Memory bandwidth is very often a limiting factor in mobile graphics performance, and the ability to keep deferred lighting buffers on-chip through the whole process could cut down bandwidth requirements a lot.</p>
<p><em>Suggestion</em>: somehow <em>(I&#8217;m feeling very hand-wavy today)</em> expose more control over tiled rendering. For example, explicitly say that rendering will only happen to the given tiles; and these textures are very likely to be read just after they are rendered into &#8211; so don&#8217;t resolve them to memory if they fit into on-chip one.</p>
<p>There&#8217;s already a Qualcomm extension heading in that direction &#8211; <a href="http://www.khronos.org/registry/gles/extensions/QCOM/QCOM_tiled_rendering.txt">QCOM_tiled_rendering</a> &#8211; though it seems to be more concerned with where rendering happens. More control is needed on how to mark FBO textures as &#8220;keep in on-chip memory for sampling as a texture plz&#8221;.</p>
<p><strong>OpenCL</strong></p>
<p>Current mobile GPUs already are, or very soon will be, OpenCL capable. Also OpenCL can be implemented on the CPU, nicely SIMDified via NEON, and use multicore. <em>DO WANT!</em> (and while you&#8217;re at it, everything that&#8217;s doable to make interop between CL &#038; GL faster)</p>
<p>This can be used for a ton of things; skinning, culling, particles, procedural animations, image postprocessing and so on. And with a much less restrictive programming model, it&#8217;s easier to reuse computation results across draw calls or frames.</p>
<p>Couple this with &#8220;direct access to memory on the CPU&#8221; and OpenCL could be used for more things than graphics (again I&#8217;m grossly oversimplifying here and ignoring the whole synchronization/latency/security elephant&#8230;).</p>
<p><strong>MOAR?</strong></p>
<p>Now of course there are more things I&#8217;d want to see, but for today I&#8217;ll take just those above, thank you. Have a nice day!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/03/19/mobile-graphics-api-wishlist-some-features/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Mobile graphics API wishlist: performance</title>
		<link>http://aras-p.info/blog/2011/03/04/mobile-graphics-api-wishlist-performance/</link>
		<comments>http://aras-p.info/blog/2011/03/04/mobile-graphics-api-wishlist-performance/#comments</comments>
		<pubDate>Fri, 04 Mar 2011 06:24:49 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[mobile]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=645</guid>
		<description><![CDATA[Most mobile platforms currently are based on OpenGL ES 2.0. While it is much better than traditional OpenGL, there are ways where it limits performance or does not expose some interesting hardware features. So here&#8217;s an unorganized wishlist for GLES2.0 performance part! Note that I&#8217;m focusing on, in my limited understanding, short term low-hanging fruits [...]]]></description>
			<content:encoded><![CDATA[<p>Most mobile platforms currently are based on OpenGL ES 2.0. While it is <em>much</em> better than traditional OpenGL, there are ways where it limits performance or does not expose some interesting hardware features. So here&#8217;s an unorganized wishlist for GLES2.0 performance part!</p>
<p><span id="more-645"></span><em>Note that I&#8217;m focusing on, in my limited understanding, short-term low-hanging fruit: ways to extend/patch the existing GLES2.0 API. A pipe dream would be starting from scratch, getting rid of all the OpenGL baggage and hopefully coming up with a much cleaner, leaner &#038; better API, especially if it&#8217;s designed to only support some particular platform. But I digress, back to GLES2.0 for now.</em></p>
<p><strong>No guarantees when something expensive might happen.</strong></p>
<p>Due to some flexibility in GLES2.0, there might be expensive things happening at almost any point in your frame. For example, binding a texture with a different format might cause a driver to recompile a shader at the draw call time. I&#8217;ve seen <a href="http://twitter.com/#!/aras_p/status/34628257294852096">60 milliseconds</a> on iPhone 3Gs at first draw call with a relatively simple shader, all spent inside shader compiler backend. <em>60 milliseconds!</em> There are various things that can cause performance hiccups like this: texture formats, blending modes, vertex layout, non power of two textures and so on.</p>
<p><em>Suggestion</em>: work with GPU vendors and agree on an API that could make guarantees on when the expensive resource creation / patching work can happen, and when it can&#8217;t. For example, <em>somehow</em> guarantee that a draw call or a state set will not cause any object recreation / shader patching in the driver. I don&#8217;t have much experience with D3D10/11, but my impression is that this was one of the things it got right, no?</p>
<p><strong>Offline shader compilation.</strong></p>
<p>GLES2.0 has the functionality to load binary shaders, but it&#8217;s not mandatory. Some of the big platforms (iOS, I&#8217;m looking at you) just don&#8217;t support it.</p>
<p>Now of course, a single platform (like iOS or Android) can have multiple different GPUs, so you can&#8217;t fully compile a shader offline into final optimized GPU microcode. But <em>some</em> of the full compilation cost could very well be done offline, without being specific to any particular GPU.</p>
<p><em>Suggestion</em>: come up with a platform independent binary shader format. Something like D3D9 shader assembly is probably too low level (it assumes a vector4-based GPU, limited number of registers and so on), but something higher level should be possible. All of the shader lexing, parsing and common optimizations (constant folding, arithmetic simplifications, dead code removal etc.) can be done offline. It won&#8217;t speed up shader loading by an order of magnitude, but even if it&#8217;s possible to cut it by 20%, it&#8217;s worth it. And it would remove a very big bug surface area too!</p>
<p><strong>Texture loading.</strong></p>
<p>A lot (all?) of mobile platforms have unified CPU &#038; GPU memories, however to actually load the texture we have to read or memory map it from disk and then copy it into OpenGL via glTexImage2D and similar functions. Then, depending on the format, the driver would internally do swizzling and alignment of texture data.</p>
<p><em>Suggestion</em>: can&#8217;t most of this cost be removed? If for some formats it&#8217;s perfectly, statically known what layout and swizzling the GPU expects&#8230; can&#8217;t we just point the API to the data we already loaded or memory mapped? We would still need to implement the glTexImage2D case for when (if ever) a totally new strange GPU comes that needs the data in a different order, but why not provide a faster path for the current GPUs?</p>
<p><strong>Vertex declarations.</strong></p>
<p>In unextended GLES2.0 you have to do <em>a ton</em> of calls just to set up vertex data. <a href="http://www.khronos.org/registry/gles/extensions/OES/OES_vertex_array_object.txt">OES_vertex_array_object</a> is a step in the right direction, providing the ability to create sets of vertex data bindings (&#8220;vertex declarations&#8221; in D3D speak). However, it builds upon an existing API, resulting in something that feels quite messy. It feels like starting from scratch could result in something much cleaner. Like&#8230; vertex declarations that existed in D3D since forever maybe?</p>
<p><em>Suggestion</em>: clean up that shit! It would probably need to be tied to a vertex shader input signature (just like in D3D10/11) to guarantee there would be no shader patching, but we&#8217;d be fine with that.</p>
<p><strong>Shader uniforms are per shader program.</strong></p>
<p>What it says &#8211; shader uniforms (&#8220;constants&#8221; in D3D speak) are not global; they are tied to a specific shader program. I don&#8217;t quite understand why, and I don&#8217;t think any GPU works that way. This is causing complexities and/or performance loss in the driver (it either has to save &#038; restore all uniform values on each shader change, or have dirty tracking on which uniforms have changed etc.). It also causes unneeded uniform sets on the client side &#8211; instead of having, for example, view*projection matrix set just once per frame it has to be set for each shader program that we use.</p>
<p><em>Suggestion</em>: just get rid of that? If you need to not break the existing spec, how about adding an extension to make all uniforms global? I propose <code>glCanHaz(GL_OES_GLOBAL_UNIFORMS_PLZ)</code></p>
<p><strong>Next up:</strong></p>
<p>Next time, I&#8217;ll take a look at my unorganized wishlist for mobile graphics features!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/03/04/mobile-graphics-api-wishlist-performance/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>iOS shader tricks, or it&#8217;s 2001 all over again</title>
		<link>http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/</link>
		<comments>http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 07:43:57 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=592</guid>
		<description><![CDATA[I was recently optimizing some OpenGL ES 2.0 shaders for iOS/Android, and it was funny to see how performance tricks that were cool in 2001 are having their revenge again. Here&#8217;s a small example of starting with a normalmapped Blinn-Phong shader and optimizing it to run several times faster. Most of the clever stuff below [...]]]></description>
			<content:encoded><![CDATA[<p>I was recently optimizing some OpenGL ES 2.0 shaders for iOS/Android, and it was funny to see how performance tricks that were cool in 2001 are having their revenge again. Here&#8217;s a small example of starting with a normalmapped Blinn-Phong shader and optimizing it to run several times faster. Most of the clever stuff below was actually done by <a href="http://twitter.com/#!/__ReJ__">ReJ</a>, props to him!</p>
<p>Here&#8217;s a small test I&#8217;ll be working on: just a single plane with albedo and normal map textures:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump1.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump1-150x150.jpg" alt="" title="iOS Bumped Specular" width="150" height="150" class="alignnone size-thumbnail wp-image-593" /></a></p>
<p><span id="more-592"></span>I&#8217;ll be testing on iPhone 3Gs with iOS 4.2.1. Timer is started before glClear() and stopped after glFinish() that I added just after drawing the mesh.</p>
<p>Let&#8217;s start with an initial na&iuml;ve shader version:<br />
<script src="https://gist.github.com/783784.js"> </script></p>
<p>Should be pretty self-explanatory to anyone who&#8217;s familiar with tangent space normal mapping and Blinn-Phong BRDF. Running time: <strong>24.5 milliseconds</strong>. On iPhone 4&#8217;s Retina resolution, this would be about 4x slower!</p>
<p>What can we do next? On mobile platforms using appropriate precision of variables is often very important, especially in a fragment shader. So let&#8217;s go and add highp/mediump/lowp qualifiers to the fragment shader: <a href="https://gist.github.com/783703/05e78340b12739e853ce031bd0388430ea95f2a6">shader source</a></p>
<p>Still the same running time! Alas, iOS does not have low level shader analysis tools, so we can&#8217;t really tell why that is happening. We could be limited by something else (e.g. normalizing vectors and computing pow() being the bottlenecks that run in parallel with all low precision stuff), or the driver might be promoting most of our computations to higher precision because it feels like it. It&#8217;s a magic box!</p>
<p>Let&#8217;s start approximating instead. How about computing normalized view direction per vertex, and interpolating that for the fragment shader? It won&#8217;t be entirely &#8220;correct&#8221;, but hey, it&#8217;s a phone we&#8217;re talking about. <a href="https://gist.github.com/783703/1e4fd0daa384d308d125a748985e8e203e49625a">shader source</a></p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump3.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump3-150x150.jpg" alt="" title="iOS Bumped Specular, wrong precision!" width="150" height="150" class="alignright size-thumbnail wp-image-594" /></a><br />
15 milliseconds! But&#8230; the rendering is wrong; everything turned white near the bottom of the screen. Turns out PowerVR SGX (the GPU in all current iOS devices) really does mean &#8220;low precision&#8221; when we add two lowp vectors and normalize the result. Let&#8217;s try promoting one of them to medium precision with a &#8220;varying mediump vec3 v_viewdir&#8221;: <a href="https://gist.github.com/783703/591eb83dacaae3840cc4e4d3d8b95a4fc3abdd65">shader source</a></p>
<p>That fixed rendering, but we&#8217;re back to 24.5 milliseconds. <em>Sad shader writers are sad&#8230; oh shader performance analysis tools, where art thou?</em></p>
<p>Let&#8217;s try approximating some more: compute half-vector in the vertex shader, and interpolate normalized value. This would get rid of all normalizations in the fragment shader. <a href="https://gist.github.com/783703/6360c2912b860aa30415e5120ef147169274cd71">shader source</a></p>
<p><strong>16.3</strong> milliseconds, not too bad! We still have pow() computed in the fragment shader, and that one is probably not the fastest operation there&#8230;</p>
<p>Almost a decade ago, a very common trick was to use a lookup texture to do the lighting. For example, a 2D texture indexed by (N.L, N.H). Since all lighting data would be &#8220;baked&#8221; into the texture, it does not necessarily have to be Blinn-Phong even; we can prepare faux-anisotropic, metallic, toon-shading or other fancy BRDFs there, as long as they can be expressed in terms of N.L and N.H. So let&#8217;s try creating 128&#215;128 RGBA lookup texture and use that: <a href="https://gist.github.com/783703/87f1cf5529d644cab16123550e809e9f7598f4f3">shader source</a></p>
<p>Quick &amp; not super efficient code to create the lighting lookup texture for Blinn-Phong:<br />
<script src="https://gist.github.com/783759.js"> </script></p>
<p><strong>9.1</strong> milliseconds! We lost some precision in the specular though (it&#8217;s dimmer):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump6.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2011/02/iosbump6-150x150.jpg" alt="" title="iOS Bumped Specular via texture LUT" width="150" height="150" class="alignnone size-thumbnail wp-image-595" /></a></p>
<p>What else can be done? Notice that we clamp N.L and N.H values in the fragment shader, but this could be done just as well by the texture sampler, if we set texture&#8217;s addressing mode to CLAMP_TO_EDGE. Let&#8217;s get rid of the clamps: <a href="https://gist.github.com/783703/e24a2475fded83d2196372c8092a0d8de80a98eb">shader source</a></p>
<p>This is 8.3 milliseconds, or <strong>7.6</strong> milliseconds if we reduce our lighting texture resolution to 32&#215;128.</p>
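<p>For reference, the lookup itself boils down to a single texture fetch. Below is a minimal sketch in Cg/HLSL-style syntax rather than the GLSL ES of the linked shader sources (swap <code>tex2D</code> for <code>texture2D</code> and the <code>half</code> types for <code>lowp</code>/<code>mediump</code> vectors to translate). <code>_LightingLUT</code> and <code>LightingFromLUT</code> are made-up names, and the texture layout (diffuse in RGB, specular in alpha) is an assumption, not something the linked shaders are guaranteed to match:</p>
<pre>
sampler2D _LightingLUT;   // hypothetical: 2D lookup indexed by (N.L, N.H)

// n, l, h: normalized tangent space normal, light and half vectors
half3 LightingFromLUT (half3 albedo, half3 specColor, half3 n, half3 l, half3 h)
{
    // No clamping here: with CLAMP_TO_EDGE addressing the sampler
    // clamps out-of-range coordinates for us.
    half2 lutUV = half2 (dot (n, l), dot (n, h));
    half4 lut = tex2D (_LightingLUT, lutUV);
    // assumed layout: rgb = diffuse term, a = specular term
    return albedo * lut.rgb + specColor * lut.a;
}
</pre>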
<p>Should we stop there? Not necessarily. For example, the shader is still multiplying albedo with a per-material color. Maybe that&#8217;s not very useful and can be let go. Maybe we can also make specular be always white?<br />
<script src="https://gist.github.com/783703.js"> </script></p>
<p>How fast is this? <strong>5.9 milliseconds</strong>,&nbsp;or over <strong>4 times</strong> faster than our original shader.</p>
<p>Could it be made faster? Maybe; that&#8217;s an exercise for the reader :) I tried computing just the RGB color channels and setting alpha to zero, but that got slightly slower. Without real shader analysis tools it&#8217;s hard to see where or if additional cycles could be squeezed out.</p>
<p>I&#8217;m adding <a href='http://aras-p.info/blog/wp-content/uploads/2011/02/iOSShaderPerf.zip'>Xcode project with sources, textures and shaders of this experiment</a>. Notes about it: only tested on iPhone 3Gs (probably will crash on iPhone 3G, and iPad will have wrong aspect ratio). Might not work at all! Shader is read from Resources/Shaders/shader.txt, next to it are shader versions of the steps of this experiment. Enjoy!</p>
<p><em>This is a cross post from altdevblogaday: <a href="http://altdevblogaday.com/ios-shader-tricks-or-its-2001-all-over-again">http://altdevblogaday.com/ios-shader-tricks-or-its-2001-all-over-again</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Surface Shaders, one year later</title>
		<link>http://aras-p.info/blog/2010/07/16/surface-shaders-one-year-later/</link>
		<comments>http://aras-p.info/blog/2010/07/16/surface-shaders-one-year-later/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 06:38:43 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=530</guid>
		<description><![CDATA[Over a year ago I had a thought that &#8220;Shaders must die&#8221; (part 1, part 2, part 3). And what do you know &#8211; turns out we&#8217;re trying to pull this off in upcoming Unity 3. We call this Surface Shaders cause I&#8217;ve a suspicion &#8220;shaders must die&#8221; as a feature name wouldn&#8217;t have flown [...]]]></description>
			<content:encoded><![CDATA[<p>Over a year ago I had a thought that &#8220;Shaders must die&#8221; (<a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">part 1</a>, <a href="http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/">part 2</a>, <a href="http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/">part 3</a>).</p>
<p>And what do you know &#8211; turns out we&#8217;re trying to pull this off in upcoming <a href="http://unity3d.com/unity/coming-soon/unity-3">Unity 3</a>. We call this <strong>Surface Shaders</strong> cause I&#8217;ve a suspicion &#8220;shaders must die&#8221; as a feature name wouldn&#8217;t have flown very far.</p>
<p><span id="more-530"></span></p>
<p><strong>Idea</strong></p>
<p>The main idea is that 90% of the time I just want to declare surface properties. This is what I want to say:</p>
<blockquote><p>Hey, albedo comes from this texture mixed with this texture, and normal comes from this normal map. Use Blinn-Phong lighting model please, and don&#8217;t bother me again!</p></blockquote>
<p>With the above, I don&#8217;t have to care whether this will be used in forward or deferred rendering, or how various light types will be handled, or how many lights per pass will be done in a forward renderer, or how some indirect illumination SH probes will come in, etc. I&#8217;m not interested in all that! These dirty bits are the job of rendering programmers, <em>just make it work dammit</em>!</p>
<p>This is not a new idea. Most graphical shader editors <em>that make sense</em> do not have &#8220;pixel color&#8221; as the final output node; instead they have some node that basically describes surface parameters (diffuse, specularity, normal, &#8230;), and all the lighting code is usually not expressed in the shader graph itself. <a href="http://code.google.com/p/openshadinglanguage/">OpenShadingLanguage</a> is a similar idea as well (but because it&#8217;s targeted at offline rendering for movies, it&#8217;s much richer &#038; more complex).</p>
<p><strong>Example</strong></p>
<p>Here&#8217;s a simple &#8211; but full &#038; complete &#8211; Unity 3.0 shader that does diffuse lighting with a texture &#038; a normal map.<br />
<code>
<pre>
  <span style="color:gray">Shader "Example/Diffuse Bump" {
    Properties {
      _MainTex ("Texture", 2D) = "white" {}
      _BumpMap ("Bumpmap", 2D) = "bump" {}
    }
    SubShader {
      Tags { "RenderType" = "Opaque" }
      CGPROGRAM</span>
      #pragma surface surf Lambert
      struct Input {
        float2 uv_MainTex;
        float2 uv_BumpMap;
      };
      sampler2D _MainTex;
      sampler2D _BumpMap;
      void surf (Input IN, inout SurfaceOutput o) {
        o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb;
        o.Normal = UnpackNormal (tex2D (_BumpMap, IN.uv_BumpMap));
      }
      <span style="color:gray">ENDCG
    }
    Fallback "Diffuse"
  }</span></pre>
<p></code><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfaceShaderDiffuseBump.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfaceShaderDiffuseBump-150x150.png" alt="" title="SurfaceShaderDiffuseBump" width="150" height="150" class="alignright size-thumbnail wp-image-543" /></a>Given pretty model &#038; textures, it can produce pretty pictures! How cool is that?</p>
<p>I grayed out bits that are not really interesting (declaration of serialized shader properties &#038; their UI names, shader fallback for older machines etc.). What&#8217;s left is Cg/HLSL code, which is then augmented by tons of auto-generated code that deals with lighting &#038; whatnot.</p>
<p>This surface shader dissected into pieces:</p>
<ul>
<li><code>#pragma surface surf Lambert</code>: this is a surface shader with main function &#8220;surf&#8221;, and a Lambert lighting model. Lambert is one of predefined lighting models, but you can write your own.</li>
<li><code>struct Input</code>: input data for the surface shader. This can have various predefined inputs that will be computed per-vertex &#038; passed into your surface function per-pixel. In this case, it&#8217;s two texture coordinates.</li>
<li><code>surf</code> function: actual surface shader code. It takes Input, and writes into <code>SurfaceOutput</code> (a predefined structure). It is possible to write into custom structures, provided you use lighting models that operate on those structures. The actual code just writes Albedo and Normal to the output.</li>
</ul>
<p><strong>What is generated</strong></p>
<p>Unity&#8217;s &#8220;surface shader code generator&#8221; would take this, generate <em>actual</em> vertex &#038; pixel shaders, and compile them to various target platforms. With default settings in Unity 3.0, it would make this shader support:</p>
<ul>
<li>Forward renderer and Deferred Lighting (Light Pre-Pass) renderer.</li>
<li>Objects with precomputed lightmaps and without.</li>
<li>Directional, Point and Spot lights; with projected light cookies or without; with shadowmaps or without. Well ok, this is only for forward renderer because in Light Pre-Pass lighting happens elsewhere.</li>
<li>For Forward renderer, it would compile in support for lights computed per-vertex and spherical harmonics lights computed per-object. It would also generate extra additive blended pass if needed for the case when additional per-pixel lights have to be rendered in separate passes.</li>
<li>For Light Pre-Pass renderer, it would generate base pass that outputs normals &#038; specular power; and a final pass that combines albedo with lighting, adds in any lightmaps or emissive lighting etc.</li>
<li>It can optionally generate a shadow caster rendering pass (needed if custom vertex position modifiers are used for vertex shader based animation; or some complex alpha-test effects are done).</li>
</ul>
<p>For example, here&#8217;s code that would be compiled for a forward-rendered base pass with one directional light, 4 per-vertex point lights, 3rd order SH lights; optional lightmaps <em>(I suggest just scrolling down)</em>: </p>
<pre style="font-size: 75%;">
#pragma vertex vert_surf
#pragma fragment frag_surf
#pragma fragmentoption ARB_fog_exp2
#pragma fragmentoption ARB_precision_hint_fastest
#pragma multi_compile_fwdbase
#include "HLSLSupport.cginc"
#include "UnityCG.cginc"
#include "Lighting.cginc"
#include "AutoLight.cginc"
struct Input {
	float2 uv_MainTex : TEXCOORD0;
};
sampler2D _MainTex;
sampler2D _BumpMap;
void surf (Input IN, inout SurfaceOutput o)
{
	o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb;
	o.Normal = UnpackNormal (tex2D (_BumpMap, IN.uv_MainTex));
}
struct v2f_surf {
  V2F_POS_FOG;
  float2 hip_pack0 : TEXCOORD0;
  #ifndef LIGHTMAP_OFF
  float2 hip_lmap : TEXCOORD1;
  #else
  float3 lightDir : TEXCOORD1;
  float3 vlight : TEXCOORD2;
  #endif
  LIGHTING_COORDS(3,4)
};
#ifndef LIGHTMAP_OFF
float4 unity_LightmapST;
#endif
float4 _MainTex_ST;
v2f_surf vert_surf (appdata_full v) {
  v2f_surf o;
  PositionFog( v.vertex, o.pos, o.fog );
  o.hip_pack0.xy = TRANSFORM_TEX(v.texcoord, _MainTex);
  #ifndef LIGHTMAP_OFF
  o.hip_lmap.xy = v.texcoord1.xy * unity_LightmapST.xy + unity_LightmapST.zw;
  #endif
  float3 worldN = mul((float3x3)_Object2World, SCALED_NORMAL);
  TANGENT_SPACE_ROTATION;
  #ifdef LIGHTMAP_OFF
  o.lightDir = mul (rotation, ObjSpaceLightDir(v.vertex));
  #endif
  #ifdef LIGHTMAP_OFF
  float3 shlight = ShadeSH9 (float4(worldN,1.0));
  o.vlight = shlight;
  #ifdef VERTEXLIGHT_ON
  float3 worldPos = mul(_Object2World, v.vertex).xyz;
  o.vlight += Shade4PointLights (
    unity_4LightPosX0, unity_4LightPosY0, unity_4LightPosZ0,
    unity_LightColor0, unity_LightColor1, unity_LightColor2, unity_LightColor3,
    unity_4LightAtten0, worldPos, worldN );
  #endif // VERTEXLIGHT_ON
  #endif // LIGHTMAP_OFF
  TRANSFER_VERTEX_TO_FRAGMENT(o);
  return o;
}
#ifndef LIGHTMAP_OFF
sampler2D unity_Lightmap;
#endif
half4 frag_surf (v2f_surf IN) : COLOR {
  Input surfIN;
  surfIN.uv_MainTex = IN.hip_pack0.xy;
  SurfaceOutput o;
  o.Albedo = 0.0;
  o.Emission = 0.0;
  o.Specular = 0.0;
  o.Alpha = 0.0;
  o.Gloss = 0.0;
  surf (surfIN, o);
  half atten = LIGHT_ATTENUATION(IN);
  half4 c;
  #ifdef LIGHTMAP_OFF
  c = LightingLambert (o, IN.lightDir, atten);
  c.rgb += o.Albedo * IN.vlight;
  #else // LIGHTMAP_OFF
  half3 lmFull = DecodeLightmap (tex2D(unity_Lightmap, IN.hip_lmap.xy));
  #ifdef SHADOWS_SCREEN
  c.rgb = o.Albedo * min(lmFull, atten*2);
  #else
  c.rgb = o.Albedo * lmFull;
  #endif
  c.a = o.Alpha;
  #endif // LIGHTMAP_OFF
  return c;
}
</pre>
<p>Of those 90 lines of code, 10 are your original surface shader code; the remaining 80 would have to be pretty much written by hand in Unity 2.x days (well ok, less code would have to be written because 2.x had fewer rendering features). <em>But wait</em>, that was only base pass of the forward renderer! It also generates code for additive pass, for deferred base pass, deferred final pass, optionally for shadow caster pass and so on.</p>
<p>So this should be an easier way to write lit shaders (it is for me at least). I hope this will also increase the number of Unity users who can write shaders at least 3 times <em>(i.e. to 30 up from 10!)</em>. It <em>should</em> be more future proof to accommodate changes to the lighting pipeline we&#8217;ll do in Unity next.</p>
<p><strong>Predefined Input values</strong></p>
<p>The Input structure can contain texture coordinates and some predefined values, for example view direction, world space position, world space reflection vector and so on. Code to compute them is only generated if they are <em>actually</em> used. For example, if you use world space reflection to do some cubemap reflections (as emissive term) in your surface shader, then in Light Pre-Pass base pass the reflection vector will <em>not be computed</em> (since it does not output emission, so by extension does not need reflection vector).</p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfaceShaderRim.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfaceShaderRim-150x150.png" alt="" title="SurfaceShaderRim" width="150" height="150" class="alignright size-thumbnail wp-image-545" /></a>As a small example, the shader above extended to do simple rim lighting:<br />
<code>
<pre>
  <span style="color:gray">#pragma surface surf Lambert
  struct Input {
      float2 uv_MainTex;
      float2 uv_BumpMap;</span>
      float3 viewDir;
  <span style="color:gray">};
  sampler2D _MainTex;
  sampler2D _BumpMap;</span>
  float4 _RimColor;
  float _RimPower;
  <span style="color:gray">void surf (Input IN, inout SurfaceOutput o) {
      o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb;
      o.Normal = UnpackNormal (tex2D (_BumpMap, IN.uv_BumpMap));</span>
      half rim =
          1.0 - saturate(dot (normalize(IN.viewDir), o.Normal));
      o.Emission = _RimColor.rgb * pow (rim, _RimPower);
  <span style="color:gray">}</span>
</pre>
<p></code></p>
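<p>The cubemap reflection case mentioned above could look roughly like the sketch below. It assumes the predefined input is called <code>worldRefl</code> (the name used in the released Unity 3 documentation); <code>_Cube</code> is just a hypothetical property name:</p>
<pre>
  #pragma surface surf Lambert
  struct Input {
      float2 uv_MainTex;
      float3 worldRefl;   // predefined: world space reflection vector
  };
  sampler2D _MainTex;
  samplerCUBE _Cube;      // hypothetical cubemap property
  void surf (Input IN, inout SurfaceOutput o) {
      o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb * 0.5;
      // The reflection only feeds the emissive term, so passes that do
      // not output emission never need worldRefl computed.
      o.Emission = texCUBE (_Cube, IN.worldRefl).rgb;
  }
</pre>
<p>This version does not write to <code>o.Normal</code>; combining a normal map with a per-pixel reflection vector would need more plumbing and is left out here.</p>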
<p><strong>Vertex shader modifiers</strong></p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfaceShaderNormalExtrusion.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfaceShaderNormalExtrusion-150x150.png" alt="" title="SurfaceShaderNormalExtrusion" width="150" height="150" class="alignright size-thumbnail wp-image-551" /></a>It is possible to specify a custom &#8220;vertex modifier&#8221; function that will be called at the start of the generated vertex shader, to modify (or generate) per-vertex data. You know, vertex shader based tree wind animation, grass billboard extrusion and so on. It can also fill in any non-predefined values in the Input structure.</p>
<p>My favorite vertex modifier? Moving vertices along their normals.</p>
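<p>A minimal sketch of that normal extrusion, assuming the modifier is hooked up with a <code>vertex:vert</code> option on the <code>#pragma surface</code> line (as in the released Unity 3 documentation); <code>_Amount</code> is a hypothetical material property:</p>
<pre>
  #pragma surface surf Lambert vertex:vert
  struct Input {
      float2 uv_MainTex;
  };
  float _Amount;   // hypothetical: extrusion distance in object space units
  // Called at the start of the generated vertex shader.
  void vert (inout appdata_full v) {
      v.vertex.xyz += v.normal * _Amount;
  }
  sampler2D _MainTex;
  void surf (Input IN, inout SurfaceOutput o) {
      o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb;
  }
</pre>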
<p><strong>Custom Lighting Models</strong></p>
<p>There are a couple of simple lighting models built in, but it&#8217;s possible to specify your own. A lighting model is nothing more than a function that will be called with the filled SurfaceOutput structure and per-light parameters (direction, attenuation and so on). Different functions would have to be called in forward &#038; light pre-pass rendering cases; and naturally the light pre-pass one has much less flexibility. So for any fancy effects, it is possible to say &#8220;do not compile this shader for light pre-pass&#8221;, in which case it will be rendered via forward rendering.</p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfWrapLambert.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/07/SurfWrapLambert-150x150.png" alt="" title="SurfWrapLambert" width="150" height="150" class="alignright size-thumbnail wp-image-549" /></a>Example of wrapped-Lambert lighting model:<br />
<code>
<pre>
  #pragma surface surf WrapLambert
  half4 LightingWrapLambert (SurfaceOutput s, half3 dir, half atten) {
      dir = normalize(dir);
      half NdotL = dot (s.Normal, dir);
      half diff = NdotL * 0.5 + 0.5;
      half4 c;
      c.rgb = s.Albedo * _LightColor0.rgb * (diff * atten * 2);
      c.a = s.Alpha;
      return c;
  }
  <span style="color:gray">struct Input {
      float2 uv_MainTex;
  };
  sampler2D _MainTex;
  void surf (Input IN, inout SurfaceOutput o) {
      o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb;
  }</span></pre>
<p></code></p>
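<p>Since a lighting model is just a function, it can also read its response from a texture. Here is a sketch of toon-style ramp lighting that opts out of light pre-pass. Assumptions: <code>_Ramp</code> is a hypothetical ramp texture, and <code>exclude_path:prepass</code> is taken to be the option that says &#8220;do not compile this shader for light pre-pass&#8221; (that is its name in the released Unity 3 documentation):</p>
<pre>
  #pragma surface surf Ramp exclude_path:prepass
  sampler2D _Ramp;   // hypothetical: dark on the left, fully lit on the right
  half4 LightingRamp (SurfaceOutput s, half3 dir, half atten) {
      dir = normalize(dir);
      half NdotL = dot (s.Normal, dir);
      half diff = NdotL * 0.5 + 0.5;   // remap -1..1 to 0..1
      half3 ramp = tex2D (_Ramp, float2 (diff, diff)).rgb;
      half4 c;
      c.rgb = s.Albedo * _LightColor0.rgb * ramp * (atten * 2);
      c.a = s.Alpha;
      return c;
  }
  struct Input {
      float2 uv_MainTex;
  };
  sampler2D _MainTex;
  void surf (Input IN, inout SurfaceOutput o) {
      o.Albedo = tex2D (_MainTex, IN.uv_MainTex).rgb;
  }
</pre>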
<p><strong>Behind the scenes</strong></p>
<p>I&#8217;m using HLSL parser from Ryan Gordon&#8217;s <a href="http://hg.icculus.org/icculus/mojoshader/">mojoshader</a> to parse the original surface shader code and infer some things from the AST mojoshader produces. This way I can figure out what members are in what structures, go over function prototypes and so on. At this stage some error checking is done to tell the user his surface function is of wrong prototype, or his structures are missing required members &#8211; which is much better than failing with dozens of compile errors in the generated code later.</p>
<p>To figure out which surface shader inputs are <em>actually</em> used in the various lighting passes, I&#8217;m generating small dummy pixel shaders, compiling them with Cg and using Cg&#8217;s API to query used inputs &#038; outputs. This way I can figure out, for example, that neither a normal map nor its texture coordinate is actually used in Light Pre-Pass&#8217; final pass, and save some vertex shader instructions &#038; a texcoord interpolator.</p>
<p>The code that is ultimately generated is compiled with various shader compilers depending on the target platform (Cg for PC/Mac, XDK HLSL for Xbox 360, PS3 Cg for PS3, and my own <a href="https://github.com/aras-p/hlsl2glslfork">fork of HLSL2GLSL</a> for iPhone, Android and upcoming <a href="http://blogs.unity3d.com/2010/05/19/google-android-and-the-future-of-games-on-the-web/">NativeClient port of Unity</a>).</p>
<p>So yeah, that&#8217;s it. We&#8217;ll see where this goes next, or what happens when Unity 3 is released.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/07/16/surface-shaders-one-year-later/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Screenspace vs. mip-mapping</title>
		<link>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/</link>
		<comments>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 14:27:55 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=485</guid>
		<description><![CDATA[Just spent half a day debugging this, so here it is for the future reference of the internets. In a deferred rendering setup (see Game Angst for a good discussion of deferred shading &#038; lighting), lights are applied using data from screen-space buffers. Position, normal and other things are reconstructed from buffers and lighting is [...]]]></description>
			<content:encoded><![CDATA[<p><em>Just spent half a day debugging this, so here it is for the future reference of the internets.</em></p>
<p>In a deferred rendering setup (see <a href="http://gameangst.com/?p=141">Game Angst</a> for a good discussion of deferred shading &#038; lighting), lights are applied using data from screen-space buffers. Position, normal and other things are reconstructed from buffers and lighting is computed &#8220;in screen space&#8221;.</p>
<p>Because each light is applied to a portion of the screen, the pixels it computes can belong to different objects. If in any place of lighting computation you use textures with <a href="http://en.wikipedia.org/wiki/Mipmap">mipmaps</a>, <em>be careful</em>. Most common use for mipmapped light textures is light &#8220;cookies&#8221; (aka <a href="http://en.wikipedia.org/wiki/Gobo_(lighting)">Gobo</a>).</p>
<p>Let&#8217;s say we have a very simple scene with a spot light: <span id="more-485"></span><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieGood.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieGood.png" alt="" title="Deferred Cookie (Good)" width="610" height="458" class="alignnone size-full wp-image-486" /></a></p>
<p>Light&#8217;s angular attenuation comes from a texture like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png" alt="" title="cookie128" width="128" height="128" class="alignnone size-full wp-image-489" /></a></p>
<p>If the texture has mipmaps and you sample it using the &#8220;obvious&#8221; way (e.g. tex2Dproj), you can get something like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieBad.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieBad.png" alt="" title="Deferred Cookie (Bad!)" width="610" height="458" class="alignnone size-full wp-image-491" /></a></p>
<p><em>Black stuff around the sphere is no good!</em> It&#8217;s not the infamous half-texel offset in D3D9, not a driver bug, not a shader compiler bug and not nature trying to prevent you from writing a deferred renderer.</p>
<p>It&#8217;s the mipmapping.</p>
<p>Mipmaps of your cookie texture look like this (128&#215;128, 16&#215;16, 8&#215;8, 4&#215;4 shown):<br />
<img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png" alt="" title="128x128" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie16.png" alt="" title="16x16" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie8.png" alt="" title="8x8" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie4.png" alt="" title="4x4" width="128" height="128" /></p>
<p>Now, take two adjacent pixels, where one belongs to the edge of the sphere, and the other belongs to the background object (technically you take a 2&#215;2 block of pixels, but just two are enough to illustrate the point). When the light is applied, cookie texture coordinates for those pixels are computed. It can happen that the coordinates are <em>very</em> different, especially when pixels &#8220;belong&#8221; to entirely different surfaces that are quite far away from each other.</p>
<p>What does the GPU do when texture coordinates of adjacent pixels are very different? It chooses a smaller mipmap level so that the texel to pixel density roughly matches 1:1. On the edges in this &#8220;wrong&#8221; screenshot, a very small mipmap level ends up being sampled, which is either black or white (see the 4&#215;4 mip level).</p>
<p>What to do here? You could disable mip-mapping (which is not good for performance and not good for image quality). You could drop some of the smallest mip levels, which might be enough and not that bad for performance. Another option is to manually supply the LOD level or derivatives to sampling instructions, using <em>something other</em> than cookie texture coordinates. For example, derivatives of view space position, or something like that. This might not be possible on lower shader models though.</p>
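<p>A rough Cg/HLSL sketch of supplying derivatives manually. Note that it swaps in a simpler variation than the one described above: it still starts from the cookie coordinate derivatives, but clamps them so that a depth discontinuity cannot select a tiny mip level. <code>_LightTexture</code>, <code>_CookieMaxGrad</code> and <code>SampleCookie</code> are hypothetical names, reading the cookie from the alpha channel is an assumption, and the gradient form of <code>tex2D</code> needs shader model 3.0 class hardware:</p>
<pre>
sampler2D _LightTexture;   // hypothetical: mipmapped light cookie
float _CookieMaxGrad;      // hypothetical: largest allowed UV change per pixel

// uvw: projective cookie coordinates reconstructed for this pixel
half SampleCookie (float4 uvw)
{
    float2 uv = uvw.xy / uvw.w;   // manual projective divide
    float2 dx = ddx (uv);
    float2 dy = ddy (uv);
    // Across an object edge the derivatives become huge; clamping them
    // limits how small a mip level can ever be chosen.
    dx = clamp (dx, -_CookieMaxGrad, _CookieMaxGrad);
    dy = clamp (dy, -_CookieMaxGrad, _CookieMaxGrad);
    return tex2D (_LightTexture, uv, dx, dy).a;   // assumes cookie in alpha
}
</pre>
<p>For a 128&#215;128 cookie, a limit of 4/128 would mean nothing smaller than the 32&#215;32 mip level gets sampled.</p>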
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Direct3D GPU Hacks</title>
		<link>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/</link>
		<comments>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 12:26:48 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=462</guid>
		<description><![CDATA[I&#8217;m catching up on various GPU hacks that exist for Direct3D 9 (things like native shadow mapping, render to vertex buffer, etc.). Turns out there&#8217;s a lot of them, but all the information is scattered around the intertubes. So here are the D3D9 hacks known to me in one place. Let me know if I [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m catching up on various GPU hacks that exist for Direct3D 9 (things like native shadow mapping, render to vertex buffer, etc.). Turns out there&#8217;s a lot of them, but all the information is scattered around the intertubes.</p>
<p>So here are the <a href="http://aras-p.info/texts/D3D9GPUHacks.html"><strong>D3D9 hacks known to me in one place</strong></a>.</p>
<p>Let me know if I missed something or got something wrong. I also want to figure out if Intel GPUs/drivers implement any of them.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Deferred Cascaded Shadow Maps</title>
		<link>http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/</link>
		<comments>http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/#comments</comments>
		<pubDate>Wed, 04 Nov 2009 14:42:08 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=434</guid>
		<description><![CDATA[Reading &#8220;Rendering Technology at Black Rock Studios&#8221; made me realize that cascaded shadow maps I did 2+ years ago in Unity 2.0 are probably called &#8220;deferred shadowing&#8221;. Since I never wrote how they are done&#8230; here: The process is roughly this (all of this is DX9 level tech on PCs; later tech or consoles could [...]]]></description>
			<content:encoded><![CDATA[<p>Reading &#8220;<a href="http://www.bungie.net/News/content.aspx?type=topnews&#038;link=Siggraph_09">Rendering Technology at Black Rock Studios</a>&#8221; made me realize that cascaded shadow maps I did 2+ years ago in Unity 2.0 are <em>probably</em> called &#8220;deferred shadowing&#8221;. Since I never wrote how they are done&#8230; here:</p>
<p>The process is roughly this (all of this is DX9 level tech on PCs; later tech or consoles could and should use more optimizations):</p>
<ol>
<li>Render shadow map cascades. All of them packed into one shadow map via viewports.</li>
<li>Collect shadows into screen sized render target. This is the shadow term.</li>
<li>Blur the shadow term.</li>
<li>In regular forward rendering, use shadow term in screen space.</li>
</ol>
<p>More detail:</p>
<p><strong>Render Shadow Cascades</strong></p>
<p>Nothing fancy here. All cascades packed into a single shadow map. For example two 512&#215;512 cascades would be packed into 1024&#215;512 shadow map side by side.</p>
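<p>To make the layout concrete, here is a tiny Cg/HLSL sketch that maps a cascade-local shadow map UV into a side-by-side atlas. The function and parameter names are hypothetical; in practice the scale and offset would more likely be baked into the shadow matrix of each cascade:</p>
<pre>
// uv: 0..1 coordinate inside one cascade
// cascadeIndex: 0-based cascade number; cascadeCount: cascades in the atlas
float2 CascadeToAtlasUV (float2 uv, float cascadeIndex, float cascadeCount)
{
    // Cascades sit next to each other horizontally, so only x changes.
    return float2 ((uv.x + cascadeIndex) / cascadeCount, uv.y);
}
</pre>
<p>With two cascades, the left edge of the second one (uv.x = 0, index 1) lands at x = 0.5 of the atlas, i.e. at pixel 512 of a 1024&#215;512 shadow map.</p>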
<p><strong>Screen-space Shadow Term</strong></p>
<p>Render all shadow receivers with a shader that &#8220;collects&#8221; shadow map term. In effect, shadows from all cascades are collected into a screen-sized texture. After this step, original cascaded shadowmaps are not needed anymore.</p>
<p>Unity supports up to 4 shadow map cascades, which neatly fit into a float4 register in the pixel shader. Correct cascade is sampled just once, <em>without</em> using static or dynamic branching. Pixel shader pseudocode:</p>
<blockquote><pre>
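// z is the depth of the pixel; each float4 component tests one cascade.
float4 near = float4 (z &gt;= _LightSplitsNear);
float4 far = float4 (z &lt; _LightSplitsFar);
// A component is 1 only for the cascade whose [near, far) range contains z.
float4 weights = near * far;
// The weighted sum picks that cascade's coordinate without branching.
float2 coord =
    i._ShadowCoord[0] * weights.x +
    i._ShadowCoord[1] * weights.y +
    i._ShadowCoord[2] * weights.z +
    i._ShadowCoord[3] * weights.w;
// One shadow map sample; no filtering at this stage.
float sm = tex2D (_ShadowMapTexture, coord.xy).r;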
</pre>
</blockquote>
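<p>The same branchless selection can be sketched on the CPU. This is a Python illustration of the idea, not the actual shader; the function name and the example split distances are made up for the sketch.</p>

```python
def select_shadow_coord(z, splits_near, splits_far, shadow_coords):
    """Pick the shadow map coordinate of the cascade that contains depth z.

    splits_near[i] and splits_far[i] bound cascade i. At most one weight is
    1.0 when the ranges do not overlap; all are 0.0 if z is outside them.
    """
    # Comparisons become 0.0 or 1.0, like float4(z >= near) in the shader.
    near = [float(z >= n) for n in splits_near]
    far = [float(f > z) for f in splits_far]
    weights = [n * f for n, f in zip(near, far)]
    # Weighted sum instead of a branch: only the matching cascade contributes.
    u = sum(c[0] * w for c, w in zip(shadow_coords, weights))
    v = sum(c[1] * w for c, w in zip(shadow_coords, weights))
    return (u, v), weights
```

<p>Every cascade coordinate is multiplied and added, but only one product is non-zero, which is why the shadow map ends up being sampled once.</p>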
<p>Additionally, shadow fadeout is applied here (shadows in Unity can be cast up to specified distance from the camera, and they fade out when approaching that distance).</p>
<p>After this I end up having shadow term in screen space. Note that here I do not do any shadow map filtering; that is done in screen space later.</p>
<p>On PCs in DX9 there is (or there was?) no easy/sane way to read depth buffer in the pixel shader, so while collecting shadows the shader also outputs depth packed into two channels of the render target.</p>
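<p>The post does not say how the depth is packed. One common option, assumed here purely for illustration, is 16-bit fixed point split into a high and a low byte, one per 8-bit channel. A Python sketch with hypothetical function names:</p>

```python
def encode_depth_rg(depth):
    """Split a depth value in [0, 1) into two 8-bit channel values (0..255)."""
    scaled = int(depth * 65536.0)          # quantize to 16 bits
    scaled = max(0, min(65535, scaled))    # clamp to the representable range
    return scaled // 256, scaled % 256     # high byte, low byte


def decode_depth_rg(high, low):
    """Rebuild the 16-bit quantized depth from the two channel values."""
    return (high * 256 + low) / 65536.0
```

<p>The round trip loses at most one 16-bit quantization step, which is plenty for deciding whether two neighbouring pixels belong to the same surface.</p>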
<p><strong>Screen-space Shadow Blur</strong></p>
<p>Previous step results in screen space shadow term and depth. Shadow term is blurred into another render target, using a spatially varying Poisson disc-like filter.</p>
<p>Filter size depends on depth (shadow boundaries closer to the camera are blurred more). Filter also discards samples if difference in depth is larger <em>than something</em>, to avoid blurring over object boundaries. It's not totally robust, but seems to work quite well.</p>
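<p>A rough sketch of the sample rejection, in Python and in one dimension only. It uses equal weights and fixed offsets rather than a Poisson disc, and it does not model the depth-dependent filter size; the function name and the threshold parameter are assumptions.</p>

```python
def depth_aware_blur(shadow, depth, center, offsets, max_depth_diff):
    """Average shadow samples around `center`, skipping depth outliers.

    shadow and depth are equal-length lists (one row of pixels); offsets are
    integer pixel offsets; samples whose depth differs from the center pixel
    by more than max_depth_diff are discarded.
    """
    total = 0.0
    count = 0
    for off in offsets:
        idx = min(max(center + off, 0), len(shadow) - 1)  # clamp to the row
        if abs(depth[idx] - depth[center]) > max_depth_diff:
            continue  # likely a different object: do not blur across it
        total += shadow[idx]
        count += 1
    # Fall back to the unblurred value if every sample was rejected.
    return total / count if count else shadow[center]
```

<p>With a tight threshold the average only includes neighbours at a similar depth, so a shadow edge on a near object does not bleed onto the background behind it.</p>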
<p><strong>Using shadow term in forward rendering</strong></p>
<p>In forward rendering, this blurred shadow term texture is used. Here shadow term already has filtering &#038; fadeout applied, and the shaders do not need to know anything about shadow cascades. Just read pixel from the texture and use it in lighting computation. Done!</p>
<p><strong>Fin</strong></p>
<p>Back then I didn't know this would be called "deferred" <em>(that would probably have scared me away!)</em>. I don't know if this approach is any good, but so far it works quite well for Unity's needs. It also reduces shader permutation count a lot, which I like.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Strided blur and other tips for SSAO</title>
		<link>http://aras-p.info/blog/2009/09/17/strided-blur-and-other-tips-for-ssao/</link>
		<comments>http://aras-p.info/blog/2009/09/17/strided-blur-and-other-tips-for-ssao/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 07:59:01 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=409</guid>
		<description><![CDATA[If you&#8217;re new to SSAO, here are good overview blog posts: meshula.net and levelofdetail. Some tips and an idea on strided blur below. Bits and pieces I found useful SSAO can be generated at a smaller resolution than screen, with depth+normals aware upsample/blur step. If random offset vector points away from surface normal, flip it. [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re new to SSAO, here are good overview blog posts: <a href="http://meshula.net/wordpress/?p=145">meshula.net</a> and <a href="http://levelofdetail.wordpress.com/2008/02/10/2007-the-year-ssao-broke/">levelofdetail</a>. Some tips and an idea on strided blur below.</p>
<p><span id="more-409"></span><strong>Bits and pieces I found useful</strong></p>
<ul>
<li>SSAO can be generated at a smaller resolution than screen, with depth+normals aware upsample/blur step.</li>
<li>If random offset vector points away from surface normal, flip it. This makes random vectors be in the upper hemisphere, which reduces false occlusion on flat surfaces. Of course this requires having surface normals.</li>
<li>When generating random vectors for your AO kernel:
<ul>
<li>Generate vectors <i>inside</i> unit sphere (not <i>on</i> unit sphere).</li>
<li>Use energy minimization to distribute your samples better, especially at low sample counts. See <a href="http://www.malmer.nu/index.php/2008-04-11_energy-minimization-is-your-friend">malmer.nu</a> blog post.</li>
</ul>
</li>
<li>In your AO blurring/upsampling step: no need to sample each pixel for blur. Just skip some of them, i.e. make kernel offsets larger. See below.</li>
</ul>
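<p>Two of the tips above, sketched in Python for illustration: generating vectors inside the unit sphere by rejection sampling, and flipping an offset into the hemisphere around the surface normal. The function names are hypothetical and energy minimization is not shown.</p>

```python
import random


def random_vector_in_unit_sphere(rng):
    """Rejection-sample a point inside (not just on) the unit sphere."""
    while True:
        v = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if 1.0 >= v[0] * v[0] + v[1] * v[1] + v[2] * v[2]:
            return v


def flip_to_hemisphere(offset, normal):
    """Flip the offset if it points away from the surface normal."""
    d = offset[0] * normal[0] + offset[1] * normal[1] + offset[2] * normal[2]
    if 0.0 > d:
        return (-offset[0], -offset[1], -offset[2])
    return offset
```

<p>After the flip every offset has a non-negative dot product with the normal, so samples never start below a flat surface.</p>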
<p><strong>Strided blur for AO</strong></p>
<p>Normally you&#8217;d blur the AO term using some sort of standard blur, for example a separable Gaussian: horizontal blur, followed by vertical blur. Here is one way to picture the horizontal blur kernel:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/blur1.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/blur1.png" alt="Horizontal Blur Kernel" title="Horizontal Blur Kernel" width="291" height="51" class="alignnone size-full wp-image-420" /></a></p>
<p>Here&#8217;s how <a href="http://runevision.com/">Rune</a> taught me how to blur better:</p>
<blockquote>
<dl>
<dt>Rune:</dt>
<dd>The other thing is the blur. I tried to make the blur 4 times stronger, and it looks much better IMO without any artifacts I could see. I could even use 4x downsampling with that blur amount and still get acceptable results.</dd>
<dt>Aras:</dt>
<dd>how did you make it 4x stronger? <i>(I was going to say that blur step is already quite expensive, and I don&#8217;t want to add more samples to make it even more expensive, yadda yadda)</i></dd>
<dt>Rune:</dt>
<dd>m_SSAOMaterial.SetVector (&#8220;_TexelOffsetScale&#8221;, m_IsOpenGL ?<br />
	&nbsp;&nbsp;new Vector4 (<b>4</b>,0,1.0f/m_Downsampling,0) :<br />
	&nbsp;&nbsp;new Vector4 (<b>4.0f</b>/source.width,0,0,0));<br />
	And similar for vertical.</dd>
<dt>Aras:</dt>
<dd>hmm. that&#8217;s strange :)</dd>
<dt>Rune:</dt>
<dd>I have no idea what I&#8217;m doing of course but it looks good.</dd>
<dt>Aras:</dt>
<dd>so this way it does not do Gaussian on 9&#215;9 pixels, but instead only takes each 4th pixel. Wider area, but&#8230; it should not work! :)</dd>
<dt>Rune:</dt>
<dd>It creates a very fine pattern at pixel level but it&#8217;s way more subtle than the noise you get otherwise.</dd>
<dt>Aras:</dt>
<dd>ok <i>(hides in the corner and weeps)</i></dd>
</dl>
</blockquote>
<p>So yeah. The blur kernel can be &#8220;spread&#8221; to skip some pixels, effectively resulting in a larger blur radius for the same sample count:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/blur2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/blur2.png" alt="Blur with 2 pixel stride" title="Blur with 2 pixel stride" width="291" height="51" class="alignnone size-full wp-image-421" /></a></p>
<p>Or even this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/blur3.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/blur3.png" alt="Blur with 3 pixel stride" title="Blur with 3 pixel stride" width="291" height="51" class="alignnone size-full wp-image-422" /></a></p>
<p>Yes, it&#8217;s not correct blur. <strong>But that&#8217;s okay</strong>, we&#8217;re not building nuclear reactors that depend on SSAO blur being accurate. <em>If you are, SSAO is probably a wrong approach anyway, I&#8217;ve heard it&#8217;s not that useful for nuclear stuff</em>.</p>
<p>I&#8217;m not sure what this blur should be called. Strided blur? Interleaved blur? Interlaced blur? Or maybe everyone is doing that already and it has a well established name? Let me know.</p>
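<p>A small Python sketch of the idea, assuming an odd tap count and a plain box filter instead of Gaussian weights (both are simplifications for illustration; the names are hypothetical). A 9-tap kernel with stride s spans (9 - 1) * s + 1 pixels, so 17 pixels at stride 2 and 25 at stride 3.</p>

```python
def strided_offsets(tap_count, stride):
    """Pixel offsets for a centered 1D kernel with `tap_count` taps (odd)."""
    half = tap_count // 2
    return [i * stride for i in range(-half, half + 1)]


def effective_width(tap_count, stride):
    """Number of pixels the kernel spans, from first tap to last."""
    return (tap_count - 1) * stride + 1


def strided_blur_row(row, tap_count, stride):
    """Box-blur one row, sampling every `stride`-th pixel (edges clamped)."""
    offsets = strided_offsets(tap_count, stride)
    last = len(row) - 1
    return [sum(row[min(max(x + o, 0), last)] for o in offsets) / len(offsets)
            for x in range(len(row))]
```

<p>The sample count stays the same for any stride; only the offsets grow, which is where the wider blur and the fine interleaved pattern both come from.</p>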
<p>Some images of blur in action. Raw AO term (very low &#8211; 8 &#8211; sample count and increased contrast on purpose):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO1raw.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO1raw-500x270.png" alt="Raw AO at low sample count" title="Raw AO at low sample count" width="500" height="270" class="alignnone size-medium wp-image-412" /></a></p>
<p>Regular 9&#215;9 blur (does not blur over depth+normals discontinuities):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO2blur.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO2blur-500x270.png" alt="Blurred AO" title="Blurred AO" width="500" height="270" class="alignnone size-medium wp-image-413" /></a></p>
<p>Blur that goes in 2 pixel stride (effectively 17&#215;17):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2-500x271.png" alt="Blurred AO with stride 2" title="Blurred AO with stride 2" width="500" height="271" class="alignnone size-medium wp-image-414" /></a><br />
It does create a fine interleaved pattern because it skips pixels. But you get wider blur!<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2mag.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2mag.png" alt="Blurred AO with stride 2, magnified" title="Blurred AO with stride 2, magnified" width="256" height="244" class="alignnone size-full wp-image-415" /></a></p>
<p>Blur that goes in 3 pixel stride (effectively 25&#215;25):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3-500x269.png" alt="Blurred AO with stride 3" title="Blurred AO with stride 3" width="500" height="269" class="alignnone size-medium wp-image-416" /></a><br />
At 3 pixel stride the artifacts are becoming apparent. But hey, this is very low AO sample count, increased contrast and no textures in the scene.<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3mag.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3mag.png" alt="Blurred AO with stride 3, magnified" title="Blurred AO with stride 3, magnified" width="256" height="244" class="alignnone size-full wp-image-417" /></a></p>
<p>For sake of completeness, the same raw AO term, but computed at 2&#215;2 smaller resolution (still using low sample count etc.):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO5down2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO5down2-500x270.png" alt="AO computed at lower resolution" title="AO computed at lower resolution" width="500" height="270" class="alignnone size-medium wp-image-418" /></a></p>
<p>Now, 2&#215;2 smaller AO, blurred with 3 pixels stride:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO6down2blur3.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO6down2blur3-499x272.png" alt="AO at lower resolution, blurred with 3 pixel stride" title="AO at lower resolution, blurred with 3 pixel stride" width="499" height="272" class="alignnone size-medium wp-image-419" /></a></p>
<p>Happy blurring!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/09/17/strided-blur-and-other-tips-for-ssao/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Compact Normal Storage for small g-buffers</title>
		<link>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/</link>
		<comments>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 09:39:51 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=377</guid>
		<description><![CDATA[I&#8217;ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture. Here are my findings &#8211; with error visualization and shader performance numbers for some GPUs. If you know any other method to encode/store normals in a compact way, please [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture.</p>
<p><a href="http://aras-p.info/texts/CompactNormalStorage.html"><strong>Here are my findings</strong></a> &#8211; with error visualization and shader performance numbers for some GPUs.</p>
<p>If you know any other method to encode/store normals in a compact way, please let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Implementing fixed function T&amp;L in vertex shaders</title>
		<link>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/</link>
		<comments>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 06:08:50 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=364</guid>
		<description><![CDATA[Almost half a year ago I was wondering how to implement T&#038;L in vertex shaders. Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a technical report here. In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar [...]]]></description>
			<content:encoded><![CDATA[<p>Almost half a year ago I was wondering <a href="http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/">how to implement T&#038;L in vertex shaders</a>.</p>
<p>Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a <a href="http://aras-p.info/texts/VertexShaderTnL.html"><strong>technical report here</strong></a>.</p>
<p>In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using fixed function (I know it&#8217;s implemented as vertex shaders internally by the runtime/driver) on several different cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950).</p>
<p>What was unexpected: the most complex piece is not the vertex lighting! Most complexity is in how to route/generate texture coordinates and transform them. Huge combination explosion there.</p>
<p>Otherwise &#8211; I like! Here&#8217;s a link to the <a href="http://aras-p.info/texts/VertexShaderTnL.html">article again</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Shaders must die, part 3</title>
		<link>http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/</link>
		<comments>http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/#comments</comments>
		<pubDate>Sun, 10 May 2009 15:24:17 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=350</guid>
		<description><![CDATA[Continuing the series (see Part 1, Part 2)&#8230; Got different lighting models (BRDFs) working. Without further ado, code snippets that produce real actual working shaders that work with lights &#038; shadows and whatnot: Simple Lambert (single color): Properties Color _Color EndProperties Surface o.Albedo = _Color; EndSurface Lighting Lambert Let&#8217;s add a texture: Properties 2D _MainTex [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing the series (see <a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">Part 1</a>, <a href="http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/">Part 2</a>)&#8230;</p>
<p>Got different lighting models (BRDFs) working. Without further ado, code snippets that produce real actual working shaders that work with lights &#038; shadows and whatnot:</p>
<p><span id="more-350"></span>Simple Lambert (single color):</p>
<blockquote><pre>Properties
    Color _Color
EndProperties
Surface
    o.Albedo = _Color;
EndSurface
Lighting Lambert
</pre>
</blockquote>
<p>Let&#8217;s add a texture:</p>
<blockquote><pre>Properties
    2D _MainTex
    Color _Color
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex) * _Color;
EndSurface
Lighting Lambert</pre>
</blockquote>
<p>Change light model to Half-Lambert (a.k.a. wrapped diffuse):</p>
<blockquote><pre>// ...everything the same
Lighting HalfLambert</pre>
</blockquote>
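<p>For reference, the difference between the two diffuse terms, sketched in Python. The half-Lambert formula here is the commonly cited one (scale by one half, add one half, then square); whether the experimental <code>HalfLambert</code> model above uses exactly this form is an assumption.</p>

```python
def lambert(n_dot_l):
    """Standard Lambertian diffuse term: clamped cosine."""
    return max(0.0, n_dot_l)


def half_lambert(n_dot_l):
    """Wrapped diffuse: remap the cosine from [-1, 1] to [0, 1], then square."""
    wrapped = n_dot_l * 0.5 + 0.5
    return wrapped * wrapped
```

<p>Surfaces facing away from the light go fully black with plain Lambert, while the wrapped version falls off gradually and only reaches zero when the normal points directly away.</p>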
<p>Blinn-Phong, with constant exponent &#038; constant specular color, modulated by gloss map in main texture&#8217;s alpha:</p>
<blockquote><pre>Properties
    2D _MainTex
    Color _Color
    Color _SpecColor
    Float _Exponent
EndProperties
Surface
    half4 col = SAMPLE(_MainTex);
    o.Albedo = col * _Color;
    o.Specular = _SpecColor.rgb * col.a;
    o.Exponent = _Exponent;
EndSurface
Lighting BlinnPhong</pre>
</blockquote>
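<p>What a <code>BlinnPhong</code> lighting model would typically compute for the specular part, as a Python sketch. This is the textbook half-vector formulation, not code taken from the experimental pipeline, and the helper names are made up.</p>

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def blinn_phong_specular(normal, light_dir, view_dir, exponent):
    """Specular intensity from the half vector between light and view."""
    half_vec = normalize((light_dir[0] + view_dir[0],
                          light_dir[1] + view_dir[1],
                          light_dir[2] + view_dir[2]))
    return max(0.0, dot(normal, half_vec)) ** exponent
```

<p>In the surface shader above, the result would then be scaled by <code>o.Specular</code>, with <code>o.Exponent</code> supplying the exponent.</p>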
<p>The same Blinn-Phong, with added normal map:</p>
<blockquote><pre>Properties
    2D _MainTex
    2D _BumpMap
    Color _Color
    Color _SpecColor
    Float _Exponent
EndProperties
Surface
    half4 col = SAMPLE(_MainTex);
    o.Albedo = col * _Color;
    o.Specular = _SpecColor.rgb * col.a;
    o.Exponent = _Exponent;
    o.Normal = SAMPLE_NORMAL(_BumpMap);
EndSurface
Lighting BlinnPhong</pre>
</blockquote>
<p>I also made an illustrative-style BRDF (see <a href="http://www.valvesoftware.com/publications.html">Illustrative Rendering in Team Fortress 2</a>), but that only requires the above sample to have &#8220;Lighting TF2&#8221; at the end.</p>
<p>Another thing I tried is surface that has Albedo dependent on a viewing angle, similar to <a href="http://developer.amd.com/media/gpu_assets/ShaderX2_LayeredCarPaintShader.pdf">Layered Car Paint Shader</a>. It works:</p>
<blockquote><pre>Properties
    2D _MainTex
    2D _BumpMap
    2D _SparkleTex
    Float _Sparkle
    Color _PrimaryColor
    Color _HighlightColor
EndProperties
Surface
    half4 main = SAMPLE(_MainTex);
    half3 normal  = SAMPLE_NORMAL(_BumpMap);
    half3 normalN = normalize(SAMPLE_NORMAL(_SparkleTex));
    half3 ns = normalize (normal + normalN * _Sparkle);
    half3 nss = normalize (normal + normalN);
    i.viewDir = normalize(i.viewDir);
    half nsv = max(0,dot(ns, i.viewDir));
    half3 c0 = _PrimaryColor.rgb;
    half3 c2 = _HighlightColor.rgb;
    half3 c1 = c2 * 0.5;
    half3 cs = c2 * 0.4;
    half3 tone =
        c0 * nsv +
        c1 * (nsv*nsv) +
        c2 * (nsv*nsv*nsv*nsv) +
        cs * pow(saturate(dot(nss,i.viewDir)), 32);
    main.rgb *= tone;
    o.Albedo = main;
    o.Normal = normal;
EndSurface
Lighting Lambert</pre>
</blockquote>
<p>Up next:</p>
<ul>
<li>How and where emissive terms should be placed. I cautiously omitted all emissive terms from the above examples (so my layered car shader is without reflections right now).</li>
<li>Where should things like rim lighting go? I&#8217;m not sure if it&#8217;s a surface property (increasing albedo/emission with angle) or a lighting property (a back light).</li>
</ul>
<p>My impressions so far:</p>
<ul>
<li>I like that I don&#8217;t have to write down vertex-to-fragment structures or the vertex shader. In most cases all the vertex shader does is transform stuff and pass it down to later stages, plus occasional computations that are linear over the triangle. No good reason to write it by hand.</li>
<li>I like that the above shaders do <i>not</i> deal with <i>how</i> the rendering is actually done. For Unity&#8217;s case, I&#8217;m compiling them into single pass per light forward renderer, but they <i>should</i> just work with multiple lights per pass, deferred etc. <em>Of course, that still has to be proven!</em></li>
</ul>
<p>So far so good.</p>
<p>Series index: Shaders must die, <a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">Part 1</a>, <a href="http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/">Part 2</a>, <a href="http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/"><strong>Part 3</strong></a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Shaders must die, part 2</title>
		<link>http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/</link>
		<comments>http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/#comments</comments>
		<pubDate>Thu, 07 May 2009 21:35:28 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=339</guid>
		<description><![CDATA[I started playing around with the idea of &#8220;shaders must die&#8220;. I&#8217;m experimenting with extracting &#8220;surface shaders&#8221; for now. Right now my experimental pipeline is: Write a surface shader file Perl script transforms it into Unity 2.x shader file Which in turn is compiled by Unity into all lighting/shadows permutations, for D3D9 and OpenGL backends. [...]]]></description>
			<content:encoded><![CDATA[<p>I started playing around with the idea of &#8220;<a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">shaders must die</a>&#8220;. I&#8217;m experimenting with extracting &#8220;surface shaders&#8221; for now.</p>
<p>Right now my experimental pipeline is:</p>
<ol>
<li>Write a surface shader file</li>
<li>Perl script transforms it into Unity 2.x shader file</li>
<li>Which in turn is compiled by Unity into all lighting/shadows permutations, for D3D9 and OpenGL backends. Cg is used for actual shader compilation.</li>
</ol>
<p>I have <em>very</em> simple cases working. For example: <span id="more-339"></span></p>
<blockquote><pre>Properties
    2D _MainTex
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex);
EndSurface</pre>
</blockquote>
<p>This is a &#8220;no bullshit&#8221; source code for a simple Diffuse (Lambertian) shader, 87 bytes of text.</p>
<p>The Perl script produces a Unity 2.x shader. This will be long, but bear with me &#8211; I&#8217;m trying to show how much stuff has to be written right now, when we&#8217;re operating on vertex/pixel shader level. See <a href="http://unity3d.com/support/documentation/Components/SL-Attenuation.html">Attenuation and Shadows for Pixel Lights</a> in Unity docs for how this system works.</p>
<blockquote><pre>Shader "ShaderNinja/Diffuse" {
Properties {
  _MainTex ("_MainTex", 2D) = "" {}
}
SubShader {
  Tags { "RenderType"="Opaque" }
  LOD 200
  Blend AppSrcAdd AppDstAdd
  Fog { Color [_AddFog] }
  Pass {
    Tags { "LightMode"="PixelOrNone" }
CGPROGRAM
#pragma fragment frag
#pragma fragmentoption ARB_fog_exp2
#pragma fragmentoption ARB_precision_hint_fastest
#include "UnityCG.cginc"
uniform sampler2D _MainTex;
struct v2f {
    float2 uv_MainTex : TEXCOORD0;
};
struct f2l {
    half4 Albedo;
};
half4 frag (v2f i) : COLOR0 {
    f2l o;
    o.Albedo = tex2D(_MainTex,i.uv_MainTex);
    return o.Albedo * _PPLAmbient * 2.0;
}
ENDCG
  }
  Pass {
    Tags { "LightMode"="Pixel" }
CGPROGRAM
#pragma vertex vert
#pragma fragment frag
#pragma multi_compile_builtin
#pragma fragmentoption ARB_fog_exp2
#pragma fragmentoption ARB_precision_hint_fastest
#include "UnityCG.cginc"
#include "AutoLight.cginc"
struct v2f {
    V2F_POS_FOG;
    LIGHTING_COORDS
    float2 uv_MainTex;
    float3 normal;
    float3 lightDir;
};
uniform float4 _MainTex_ST;
v2f vert (appdata_tan v) {
    v2f o;
    PositionFog( v.vertex, o.pos, o.fog );
    o.uv_MainTex = TRANSFORM_TEX(v.texcoord, _MainTex);
    o.normal = v.normal;
    o.lightDir = ObjSpaceLightDir(v.vertex);
    TRANSFER_VERTEX_TO_FRAGMENT(o);
    return o;
}
uniform sampler2D _MainTex;
struct f2l {
    half4 Albedo;
    half3 Normal;
};
half4 frag (v2f i) : COLOR0 {
    f2l o;
    o.Normal = i.normal;
    o.Albedo = tex2D(_MainTex,i.uv_MainTex);
    return DiffuseLight (i.lightDir, o.Normal, o.Albedo, LIGHT_ATTENUATION(i));
}
ENDCG
  }
}
Fallback "VertexLit"
}</pre>
</blockquote>
<p>Phew, that is quite some typing to get simple diffuse shader (1607 bytes)! Well, at least all the lighting/shadow combinations are handled by Unity macros here. When Unity takes this shader and compiles into all permutations, it results in 58 kilobytes of shader assembly (D3D9 + OpenGL, 17 light/shadow combinations).</p>
<p>Let&#8217;s try something slightly different: bumpmapped, with a detail texture:</p>
<blockquote><pre>Properties
    2D _MainTex
    2D _Detail
    2D _BumpMap
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex) * SAMPLE(_Detail) * 2.0;
    o.Normal = SAMPLE_NORMAL(_BumpMap);
EndSurface
</pre>
</blockquote>
<p>This is 173 bytes of text. Generated Unity shader is 2098 bytes, which compiles into 74 kilobytes of shader assembly.</p>
<p>In this case, the processing script detects that surface shader modifies normal per pixel, and does the necessary tangent space light transformations. It all just works!</p>
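<p>The actual processing script is written in Perl and is not shown here. As a rough illustration of the detection step, a Python sketch that splits a surface shader into its two sections and checks whether the surface code assigns <code>o.Normal</code>. All names are hypothetical and the parsing is deliberately naive.</p>

```python
def parse_surface_shader(text):
    """Split surface shader text into property pairs and surface code lines."""
    properties, surface, section = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("Properties", "Surface"):
            section = line
        elif line in ("EndProperties", "EndSurface"):
            section = None
        elif section == "Properties" and line:
            kind, name = line.split()       # e.g. "2D _MainTex"
            properties.append((kind, name))
        elif section == "Surface" and line:
            surface.append(line)
    return properties, surface


def writes_output(surface_lines, output_name):
    """True if the surface code assigns to o.NAME (for example o.Normal)."""
    target = "o." + output_name
    return any(line.split("=")[0].strip() == target
               for line in surface_lines if "=" in line)
```

<p>A generator could use <code>writes_output(surface, "Normal")</code> to decide whether tangent space light vectors have to be emitted in the vertex shader.</p>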
<p>So this is where I am now. Next up: detect which lighting model to use based on surface parameters (right now it always uses Lambertian). Fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Shaders must die</title>
		<link>http://aras-p.info/blog/2009/05/05/shaders-must-die/</link>
		<comments>http://aras-p.info/blog/2009/05/05/shaders-must-die/#comments</comments>
		<pubDate>Tue, 05 May 2009 12:59:48 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=324</guid>
		<description><![CDATA[It came in as a simple thought, and now I can&#8217;t shake it off. So I say: Ok, now that the controversial bits are done, let&#8217;s continue. Most of this can be (and probably is) wrong, and I haven&#8217;t given it enough thought yet. But here&#8217;s my thinking about shaders of &#8220;regular scene objects&#8221;. All [...]]]></description>
			<content:encoded><![CDATA[<p>It came in as a simple <a href="http://twitter.com/aras_p/status/1651784380">thought</a>, and now I can&#8217;t shake it off. So I say:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/05/shadersmustdie.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2009/05/shadersmustdie.jpg" alt="Shaders Must Die" title="Shaders Must Die" width="550" height="550" class="alignnone size-full wp-image-325" /></a></p>
<p>Ok, now that the controversial bits are done, let&#8217;s continue.</p>
<p><span id="more-324"></span><br />
Most of this can be (and probably is) wrong, and I haven&#8217;t given it enough thought yet. But here&#8217;s my thinking about shaders of &#8220;regular scene objects&#8221;. All of the below is about things that need to interact with lighting; I&#8217;m not talking about shaders for postprocessing, one-off uses, special effects, GPGPU or kitchen sinks.</p>
<p><strong>Operating on vertex/pixel shader level is a wrong abstraction level</strong></p>
<p>Instead, it should be separated out into &#8220;<em>surface shader</em>&#8221; (albedo, normal, specularity, &#8230;), &#8220;<em>lighting model</em>&#8221; (Lambertian, Blinn Phong, &#8230;) and &#8220;<em>light shader</em>&#8221; (attenuation, cookies, shadows).</p>
<ul>
<li>Probably 90% of the cases would only touch the surface shader (mostly mix textures/colors in various ways), and choose from some precooked lighting models.</li>
<li>9% of the cases would tweak the lighting model. Most of the things would settle for &#8220;standard&#8221; (Blinn-Phong or similar), with some stuff using skin or anisotropic or &#8230;</li>
<li>The &#8220;light shader&#8221; only needs to be touched once in a blue moon by ninjas. Once the shadowing and attenuation systems are implemented, there&#8217;s almost no reason for shader authors to see all the dirty bits.</li>
</ul>
<p>Yes, current hardware operates on vertex/geometry/pixel shaders, which is a logical thing to do for hardware. After all, these are the primitives it works on when rendering. But those primitives are <em>not</em> the things you work on when authoring how a surface should look or how it should react to a light.</p>
<p><strong>Simple code; no redundant info; sensible defaults</strong></p>
<p>In the ideal world, here&#8217;s a simple surface shader (the syntax is deliberately stupid):</p>
<blockquote><p>
Haz Texture;<br />
Albedo = sample Texture;
</p></blockquote>
<p>Or with bump mapping added:</p>
<blockquote><p>
Haz Texture;<br />
Haz NormalMap;<br />
Albedo = sample Texture;<br />
Normal = sample_normal NormalMap;
</p></blockquote>
<p>And this should be <em>all</em> the info you have to provide. This would choose the lighting model based on used things (in this case, Lambertian). It would <em>somehow</em> just work with all kinds of lights, shadows, ambient occlusion and whatnot.</p>
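<p>One way the &#8220;choose the lighting model based on used things&#8221; step could look, as a Python sketch. The rule itself (specular outputs imply Blinn-Phong, otherwise Lambertian) and the output names are assumptions made for illustration.</p>

```python
def choose_lighting_model(assigned_outputs):
    """Guess a lighting model from the surface outputs a shader assigns."""
    # Any specular-related output implies a specular lighting model.
    if "Specular" in assigned_outputs or "Exponent" in assigned_outputs:
        return "BlinnPhong"
    # Albedo and Normal alone only need diffuse lighting.
    return "Lambert"
```

<p>Both examples above assign only albedo and normal, so they would fall through to the Lambertian default.</p>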
<p>Compare to how much has to be written to implement a simple surface in your current shader technology, so that it would work &#8220;with everything&#8221;.</p>
<p>From the above shader, proper hardware shaders can be generated for DX9, DX11, DX1337, OpenGL, next-gen and next-next-gen consoles, mobile platforms with capable hardware, etc.</p>
<p>It can be used in accumulative forward rendering, forward rendering with multiple lights per pass, hybrid (light pre-pass / prelight) rendering, deferred rendering etc. Heck, even for a raytracer if you have one at hand.</p>
<p>I want!</p>
<p>Now of course, it won&#8217;t be as nice once more complex materials have to be expressed. Some might not even be possible. But shader text complexity should grow with material complexity; and all information that is redundant, implied, inferred or useless should be eliminated. <em>There&#8217;s no good reason to stick to conventions and limits of current hardware just because it operates like that</em>.</p>
<p>Shaders must die!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/05/shaders-must-die/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Google O3D &#8211; it&#8217;s going to be interesting</title>
		<link>http://aras-p.info/blog/2009/05/05/google-o3d-its-going-to-be-interesting/</link>
		<comments>http://aras-p.info/blog/2009/05/05/google-o3d-its-going-to-be-interesting/#comments</comments>
		<pubDate>Tue, 05 May 2009 12:01:24 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=317</guid>
		<description><![CDATA[A couple of weeks ago Google announced O3D: an open source web browser plugin for low level accelerated 3D graphics. The website for O3D project is here. Of course this created some buzz (hey, it&#8217;s Google after all). And it is in some way a competing technology with Unity. I think it&#8217;s going to be [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago Google <a href="http://google-code-updates.blogspot.com/2009/04/toward-open-web-standard-for-3d.html">announced O3D</a>: an open source web browser plugin for low level accelerated 3D graphics. The website for the O3D project <a href="http://code.google.com/apis/o3d/">is here</a>.</p>
<p>Of course this created some buzz (hey, it&#8217;s Google after all). And it is in some way a competing technology with <a href="http://unity3d.com/">Unity</a>. I think it&#8217;s going to be interesting, so I say &#8220;welcome competition!&#8221;</p>
<p><em>Preemptive blah blah: this website is my personal opinion and does not represent the views of my employer, former employers or anyone else other than myself.</em></p>
<p>Unity is one of the players in the &#8220;3D on the web&#8221; space. 3D graphics in the browser are in fact nothing new. <a href="http://unity3d.com/unity-web-player-2.x">Unity&#8217;s browser plugin</a> has existed since 2005 and its install count is now in the eight digits. There are <a href="http://en.wikipedia.org/wiki/VRML">VRML</a>, <a href="http://en.wikipedia.org/wiki/X3D">X3D</a>, <a href="http://en.wikipedia.org/wiki/Adobe_Shockwave">Adobe Shockwave</a>, <a href="http://en.wikipedia.org/wiki/Virtools">3DVIA/Virtools</a>, software rendering approaches on top of <a href="http://en.wikipedia.org/wiki/3D_Flash">Flash</a> and so on.</p>
<p>In my view, major advantages that Unity has compared to O3D:</p>
<ul>
<li>It&#8217;s not only about the graphics. Unity has physics, audio, input, scripting, streaming, networking, asset pipeline and whatnot. O3D is only about the graphics, and at a lower level.</li>
<li>Unity runs on a wider range of hardware. O3D requires Shader Model 2.0 or later hardware, so about 30% of the &#8220;machines on the internet&#8221; can&#8217;t run O3D (based on our <a href="http://unity3d.com/webplayer/hwstats/pages/web-2009Q1-shadergen.html">2009Q1 data</a>). Couple that with lots of compatibility workarounds that we have and it&#8217;s probably safe to say that Unity is more <em>stable and mature</em> at this point.</li>
<li>Unity is not only about the web. There&#8217;s support for iPhone, Nintendo Wii, standalone games, and with time more console and mobile platforms will come.</li>
<li>Creating and improving Unity is our primary and only focus as a company. In Google&#8217;s case, O3D is just another technology in their vast portfolio.</li>
</ul>
<p><em>Of course</em>, O3D also has advantages:</p>
<ul>
<li>It&#8217;s done by Google! When Google does <del datetime="2009-04-24T12:06:53+00:00">something</del> anything, people notice immediately :)</li>
<li>O3D is free and open source. Hard to beat the free price, and open source does have its benefits. O3D is not a &#8220;standard&#8221; of any sort right now, but it looks like Google would want it to become one.</li>
<li>Only focusing on low level graphics has its benefits: it&#8217;s lightweight, it appeals to hackers and graphics programmers who want to be in control. Unity&#8217;s higher level is much easier and faster to use, but low level hacking can be fun.</li>
</ul>
<p>Of course there are tons of other differences (I might have missed something important as well).</p>
<p>For me as a rendering guy, it&#8217;s interesting to see O3D taking similar decisions here and there (e.g. they don&#8217;t use GLSL on OpenGL either because it does not really work in the real world).</p>
<p>So&#8230; we&#8217;ll see where things will go. It&#8217;s going to be interesting!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/05/google-o3d-its-going-to-be-interesting/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Fixed function lighting in vertex shader &#8211; how?</title>
		<link>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/</link>
		<comments>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 20:32:49 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=261</guid>
		<description><![CDATA[Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to various artifacts that annoy people to hell. I don&#8217;t really know why that happens, [...]]]></description>
			<content:encoded><![CDATA[<p>Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to <a href="http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/">various artifacts</a> that annoy people to hell.</p>
<p>I don&#8217;t really know <em>why</em> that happens, because it seems that most modern cards don&#8217;t have fixed function units, so internally they are running shaders anyway. DX9 runtime on Vista&#8217;s WDDM also seems to be only handing shaders to the driver internally. Still, for some reason somewhere the precision does not match&#8230;</p>
<p>How should such a task be approached?</p>
<p>My requirements are:</p>
<ul>
<li>Should handle any possible state combination in D3D fixed function T&#038;L.</li>
<li>D3D 9.0c, using vertex shader 2.0 is ok. For now I don&#8217;t care about OpenGL.</li>
<li>No HLSL at runtime. I don&#8217;t want to add a megabyte or more to Unity web player just for HLSL. DX9 shader assembly is ok, because we already have the assembler code.</li>
<li>Should work as fast (or close to) as the regular fixed function pipeline.</li>
</ul>
<p>I looked at ATI&#8217;s <a href="http://developer.amd.com/samples/FixedFuncShader/Pages/default.aspx">FixedFuncShader sample</a>. It&#8217;s an <strong>ubershader approach</strong>; one large (230 instructions or so) shader with static VS2.0 branching. It had some obvious places to optimize; I could get it down to 190 or so instructions, kill some <a href="http://msdn.microsoft.com/en-us/library/bb147316(VS.85).aspx">rcp</a>&#8217;s and reduce the amount of constant storage by 2x.</p>
<p>Still, it did not handle some things in the D3D T&#038;L or had some issues:</p>
<ul>
<li>It assumes one input UV, one output UV and no texture matrices. This place in T&#038;L gets quite convoluted &#8211; any input UVs or a texgen mode can be transformed by matrices of various sizes, and routed into any output UVs.</li>
<li>It was not using full T&#038;L lighting model. No biggie here.</li>
<li>I haven&#8217;t checked with NVShaderPerf or AMD ShaderAnalyzer yet, but last time I checked the static branch instruction was taking two clocks on some NV architecture. So the ubershader approach does not come for free.</li>
</ul>
<p>Another thing I&#8217;m considering is to combine final shader(s) from <strong>assembly fragments</strong>, with some simple register allocation.</p>
<p>In T&#038;L shader code, there&#8217;s only a limited set of could-be-redundant computations, mostly computing world space position, camera space normal, view vector and so on (those could be used by lighting, texgen or fog). Those computations can be explicitly put into separate fragments, and later fragments could just use their result.</p>
<p>What is left then is some register allocation. A shader assembly fragment could want some temporary registers for internal use (this is simple, just give it a bunch of unused registers), also want some registers as input (from previous fragments), and save some output in registers.</p>
<p>Again, I haven&#8217;t checked with shader performance tools, but I <em>think, guess and hope</em> that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into hardware format. This would mean that <em>I</em> can be quite sloppy with it, i.e. don&#8217;t have to implement some super smart allocation scheme.</p>
<p>I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).</p>
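<p>To make the fragment idea concrete, here is a minimal sketch in Python of such a combiner. It is not the experimental code mentioned above: the fragment names, the constant register layout, the 12-register limit and the instruction text are all assumptions made up for illustration, and the assembly has not been run through an assembler.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    inputs: tuple    # named values read from earlier fragments
    outputs: tuple   # named values kept in registers for later fragments
    temps: int       # scratch registers needed only inside this fragment
    code: str        # template: {value} for named values, {t0}, {t1}, ... for scratch


# Hypothetical fragments; v0 = position, c0.. = matrices, c8 = camera position,
# c9 = some light direction. None of this mirrors a real engine layout.
FRAGMENTS = {f.name: f for f in [
    Fragment("world_pos", (), ("wpos",), 0, "m4x3 {wpos}.xyz, v0, c0"),
    Fragment("view_vector", ("wpos",), ("view",), 1,
             "add {t0}.xyz, c8, -{wpos}\nnrm {view}.xyz, {t0}"),
    Fragment("specular", ("view",), (), 1,
             "dp3 {t0}.x, {view}, c9\nmov oD1, {t0}.x"),
    Fragment("fog", ("wpos",), (), 0, "mov oFog, {wpos}.z"),
]}

# Which fragment produces each named value.
PROVIDERS = {v: f.name for f in FRAGMENTS.values() for v in f.outputs}


def combine(requested, max_regs=12):
    """Return shader text for the requested fragments plus their dependencies."""
    order = []

    def need(name):
        frag = FRAGMENTS[name]
        if frag in order:
            return  # already scheduled; later users share its result
        for value in frag.inputs:
            need(PROVIDERS[value])
        order.append(frag)

    for name in requested:
        need(name)

    regs = {}      # named value -> register that holds it
    next_reg = 0   # first register not holding a persistent value
    lines = ["vs_2_0"]
    for frag in order:
        names = {v: regs[v] for v in frag.inputs}
        for v in frag.outputs:
            regs[v] = names[v] = f"r{next_reg}"
            next_reg += 1
        if next_reg + frag.temps > max_regs:
            raise RuntimeError(f"out of registers in fragment {frag.name}")
        # Scratch registers are dead once the fragment ends, so the same
        # slots get reused by whatever comes next.
        for i in range(frag.temps):
            names[f"t{i}"] = f"r{next_reg + i}"
        lines.append(frag.code.format(**names))
    return "\n".join(lines)


if __name__ == "__main__":
    print(combine(["specular", "fog"]))
```

<p>Asking for specular and fog pulls in the world position fragment only once, and both later fragments read the same register. Values are never freed and scratch slots are simply reused, which is about as sloppy as register allocation can get; it relies on the driver cleaning up afterwards.</p>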
<p>Does that make sense? Or did everyone solve those problems eons ago already?</p>
<p><strong>Edit</strong>: half a year later, I wrote a technical report on how I implemented all this: <a href="http://aras-p.info/texts/VertexShaderTnL.html">http://aras-p.info/texts/VertexShaderTnL.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>SwiftShader 2.0 experience</title>
		<link>http://aras-p.info/blog/2008/04/07/swiftshader-20-experience/</link>
		<comments>http://aras-p.info/blog/2008/04/07/swiftshader-20-experience/#comments</comments>
		<pubDate>Mon, 07 Apr 2008 12:05:09 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=165</guid>
		<description><![CDATA[SwiftShader 2.0, a pure software renderer with a Direct3D 9 interface, just got released. I tried it on rendering unit tests and some benchmark tests we have for Unity. In short, I&#8217;m impressed. It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything [...]]]></description>
			<content:encoded><![CDATA[<p>SwiftShader 2.0, a pure software renderer with a Direct3D 9 interface, <a href="http://www.transgaming.com/products/swiftshader/">just got released</a>. I tried it on rendering unit tests and some benchmark tests we have for Unity.</p>
<p>In short, I&#8217;m impressed.</p>
<p>It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything else, including shaders, shadows, render to texture works without any problems.</p>
<p>Performance-wise, of course it&#8217;s dozens to hundreds of times slower than a <em>real</em> graphics card, but hey. I also tested with Intel 965 (aka GMA X3000) integrated graphics for comparison. All this on Intel Core2 Quad (Q6600), 3 GB RAM, Windows XP SP2.</p>
<ul>
<li><a href="http://unity3d.com/gallery/live-demos/avert-fate">Avert Fate demo</a>: Radeon HD 3850 about 300 FPS, SwiftShader about 5 FPS (about 15 FPS if per-pixel lighting is turned off), Intel 965 about 22 FPS (about 50 FPS if per-pixel lighting is turned off).</li>
<li>Scene with lots of objects and lots of shadow-casting lights: Radeon HD 3850 about 76 FPS, SwiftShader 2.5 FPS, Intel &#8211; <em>shadows not supported, duh</em>.</li>
<li>High detail terrain with lots of vegetation and four cameras rendering it simultaneously: Radeon HD 3850 about 68 FPS, SwiftShader about 3 FPS, Intel 965 about 12 FPS.</li>
</ul>
<p>Ok, so SwiftShader loses on performance to Intel 965, but the difference is only &#8220;a couple of times&#8221;, and not an order of magnitude or so. Pretty good I&#8217;d say.</p>
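<p>The ratios behind that statement can be worked out from the approximate FPS figures in the list above. A small Python sketch; the scene labels are shortened for illustration, and since all inputs are rounded the results are rough:</p>

```python
# Approximate FPS figures quoted in the list above (per-pixel lighting on).
results = {
    "Avert Fate demo": {"Radeon HD 3850": 300, "SwiftShader": 5, "Intel 965": 22},
    "Many shadow-casting lights": {"Radeon HD 3850": 76, "SwiftShader": 2.5},
    "Terrain, four cameras": {"Radeon HD 3850": 68, "SwiftShader": 3, "Intel 965": 12},
}


def slowdown(scene, baseline, other="SwiftShader"):
    """How many times slower `other` is than `baseline`, or None if not measured."""
    fps = results[scene]
    if baseline not in fps or other not in fps:
        return None
    return fps[baseline] / fps[other]


if __name__ == "__main__":
    for scene in results:
        for baseline in ("Radeon HD 3850", "Intel 965"):
            ratio = slowdown(scene, baseline)
            if ratio is not None:
                print(f"{scene}: SwiftShader is {ratio:.1f}x slower than {baseline}")
```

<p>With these numbers SwiftShader comes out roughly 4 times slower than the Intel 965, and somewhere between 20 and 60 times slower than the Radeon HD 3850.</p>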
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/04/07/swiftshader-20-experience/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The holy grail of shadows</title>
		<link>http://aras-p.info/blog/2006/02/18/the-holy-grail-of-shadows/</link>
		<comments>http://aras-p.info/blog/2006/02/18/the-holy-grail-of-shadows/#comments</comments>
		<pubDate>Sat, 18 Feb 2006 20:45:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=86</guid>
		<description><![CDATA[It just occurred to me: it seems that no one has ever made a shadowing system that does shadows from anything onto anything, with zero artifacts, with no corner cases, always looking good, running fast and on any sensible hardware. Hm&#8230; sounds like a challenge! ;) Back to reading.]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;"><p>It just occurred to me: it seems that no one has ever made a shadowing system that does shadows from anything onto anything, with zero artifacts, with no corner cases, always looking good, running fast and on any sensible hardware.</p>
<p>Hm&#8230; sounds like a challenge! ;)</p>
<p>Back to reading.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2006/02/18/the-holy-grail-of-shadows/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

