<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lost in the Triangles &#187; gpu</title>
	<atom:link href="http://aras-p.info/blog/tags/gpu/feed/" rel="self" type="application/rss+xml" />
	<link>http://aras-p.info/blog</link>
	<description>Random thoughts of a triangle pusher</description>
	<lastBuildDate>Fri, 08 Jan 2010 07:02:11 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Screenspace vs. mip-mapping</title>
		<link>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/</link>
		<comments>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 14:27:55 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=485</guid>
		<description><![CDATA[Just spent half a day debugging this, so here it is for the future reference of the internets.
In a deferred rendering setup (see Game Angst for a good discussion of deferred shading &#038; lighting), lights are applied using data from screen-space buffers. Position, normal and other things are reconstructed from buffers and lighting is computed [...]]]></description>
			<content:encoded><![CDATA[<p><em>Just spent half a day debugging this, so here it is for the future reference of the internets.</em></p>
<p>In a deferred rendering setup (see <a href="http://gameangst.com/?p=141">Game Angst</a> for a good discussion of deferred shading &#038; lighting), lights are applied using data from screen-space buffers. Position, normal and other things are reconstructed from buffers and lighting is computed &#8220;in screen space&#8221;.</p>
<p>Because each light is applied to a portion of the screen, the pixels it computes can belong to different objects. If you use textures with <a href="http://en.wikipedia.org/wiki/Mipmap">mipmaps</a> anywhere in the lighting computation, <em>be careful</em>. The most common use for mipmapped light textures is light &#8220;cookies&#8221; (aka <a href="http://en.wikipedia.org/wiki/Gobo_(lighting)">Gobo</a>).</p>
<p>Let&#8217;s say we have a very simple scene with a spot light: <span id="more-485"></span><br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieGood.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieGood.png" alt="" title="Deferred Cookie (Good)" width="610" height="458" class="alignnone size-full wp-image-486" /></a></p>
<p>Light&#8217;s angular attenuation comes from a texture like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png" alt="" title="cookie128" width="128" height="128" class="alignnone size-full wp-image-489" /></a></p>
<p>If the texture has mipmaps and you sample it using the &#8220;obvious&#8221; way (e.g. tex2Dproj), you can get something like this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieBad.png"><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/DeferredCookieBad.png" alt="" title="Deferred Cookie (Bad!)" width="610" height="458" class="alignnone size-full wp-image-491" /></a></p>
<p><em>Black stuff around the sphere is no good!</em> It&#8217;s not the infamous half-texel offset in D3D9, not a driver bug, not a shader compiler bug and not nature trying to prevent you from writing a deferred renderer.</p>
<p>It&#8217;s the mipmapping.</p>
<p>Mipmaps of your cookie texture look like this (128&#215;128, 16&#215;16, 8&#215;8, 4&#215;4 shown):<br />
<img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie128.png" alt="" title="128x128" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie16.png" alt="" title="16x16" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie8.png" alt="" title="8x8" width="128" height="128" /><img src="http://aras-p.info/blog/wp-content/uploads/2010/01/cookie4.png" alt="" title="4x4" width="128" height="128" /></p>
<p>Now, take two adjacent pixels, where one belongs to the edge of the sphere, and the other belongs to the background object (technically you take a 2&#215;2 block of pixels, but just two are enough to illustrate the point). When the light is applied, cookie texture coordinates for those pixels are computed. It can happen that the coordinates are <em>very</em> different, especially when pixels &#8220;belong&#8221; to entirely different surfaces that are quite far away from each other.</p>
<p>What does the GPU do when texture coordinates of adjacent pixels are very different? It chooses a smaller mipmap level so that texel to pixel density roughly matches 1:1. On the edges in this &#8220;wrong&#8221; screenshot, a very small mipmap level ends up being sampled, and that level is either black or white (see the 4&#215;4 mip level).</p>
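<p>To make the mip selection concrete, here is a rough CPU-side sketch in Python. It is not what any particular GPU does exactly (real hardware works on 2&#215;2 pixel quads and the precise formula varies); the function name mip_level and the sample coordinates are made up for illustration.</p>

```python
import math

def mip_level(uv_a, uv_b, texture_size):
    """Approximate mip level chosen for two adjacent pixels (hypothetical helper)."""
    # Texture-space distance between the two pixels, measured in texels.
    du = (uv_b[0] - uv_a[0]) * texture_size
    dv = (uv_b[1] - uv_a[1]) * texture_size
    texels_per_pixel = math.hypot(du, dv)
    # Below one texel per pixel the base level is used (magnification).
    lod = math.log2(max(texels_per_pixel, 1.0))
    # The mip chain ends at the 1x1 level.
    return min(lod, math.log2(texture_size))

# Two neighbours on the same surface: cookie coordinates barely change.
same_surface = mip_level((0.500, 0.5), (0.505, 0.5), 128)
# Sphere edge next to a far background: coordinates jump by a quarter of the cookie.
across_edge = mip_level((0.50, 0.5), (0.75, 0.5), 128)
print(same_surface, across_edge)  # 0.0 5.0
```

<p>Level 5 of a 128&#215;128 texture is the 4&#215;4 mip, which matches the kind of level that shows up as the black outline.</p>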
<p>What to do here? You could disable mip-mapping (which is not good for performance and not good for image quality). You could drop some of the smallest mip levels, which might be enough and is not that bad for performance. Another option is to manually supply the LOD level or derivatives to the sampling instructions, using <em>something other</em> than the cookie texture coordinates. For example, derivatives of the view space position, or something like that. This might not be possible on lower shader models though.</p>
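<p>The &#8220;drop the smallest mip levels&#8221; option can be sketched on the CPU like this. It is a Python illustration under assumptions: clamped_mip_level, mip_size and the dropped_levels parameter are hypothetical names, and how you actually limit the chain depends on your API and texture pipeline.</p>

```python
import math

def clamped_mip_level(texels_per_pixel, texture_size, dropped_levels):
    """Mip level when the `dropped_levels` smallest mips do not exist."""
    smallest_level = int(math.log2(texture_size))  # index of the 1x1 level
    highest_allowed = max(smallest_level - dropped_levels, 0)
    wanted = math.log2(max(texels_per_pixel, 1.0))
    return min(wanted, float(highest_allowed))

def mip_size(texture_size, level):
    """Width/height in texels of the given mip level."""
    return max(texture_size // (2 ** int(level)), 1)

# A 32 texel jump between adjacent pixels wants level 5 (4x4) of a 128x128 cookie.
# With the four smallest levels (8x8 down to 1x1) dropped, sampling stops at 16x16.
level = clamped_mip_level(32.0, 128, dropped_levels=4)
print(level, mip_size(128, level))  # 3.0 16
```

<p>The edge pixels still get a blurrier cookie than their neighbours, but a 16&#215;16 level keeps the cookie shape instead of collapsing to a single black or white value.</p>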
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Direct3D GPU Hacks</title>
		<link>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/</link>
		<comments>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 12:26:48 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=462</guid>
		<description><![CDATA[I&#8217;m catching up on various GPU hacks that exist for Direct3D 9 (things like native shadow mapping, render to vertex buffer, etc.). Turns out there&#8217;s a lot of them, but all the information is scattered around the intertubes.
So here are the D3D9 hacks known to me in one place.
Let me know if I missed something [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m catching up on various GPU hacks that exist for Direct3D 9 (things like native shadow mapping, render to vertex buffer, etc.). Turns out there&#8217;s a lot of them, but all the information is scattered around the intertubes.</p>
<p>So here are the <a href="http://aras-p.info/texts/D3D9GPUHacks.html"><strong>D3D9 hacks known to me in one place</strong></a>.</p>
<p>Let me know if I missed something or got something wrong. I also want to figure out if Intel GPUs/drivers implement any of them.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/11/20/direct3d-gpu-hacks/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Strided blur and other tips for SSAO</title>
		<link>http://aras-p.info/blog/2009/09/17/strided-blur-and-other-tips-for-ssao/</link>
		<comments>http://aras-p.info/blog/2009/09/17/strided-blur-and-other-tips-for-ssao/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 07:59:01 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=409</guid>
		<description><![CDATA[If you&#8217;re new to SSAO, here are good overview blog posts: meshula.net and levelofdetail. Some tips and an idea on strided blur below.
Bits and pieces I found useful

SSAO can be generated at a smaller resolution than screen, with depth+normals aware upsample/blur step.
If random offset vector points away from surface normal, flip it. This makes random [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re new to SSAO, here are good overview blog posts: <a href="http://meshula.net/wordpress/?p=145">meshula.net</a> and <a href="http://levelofdetail.wordpress.com/2008/02/10/2007-the-year-ssao-broke/">levelofdetail</a>. Some tips and an idea on strided blur below.</p>
<p><span id="more-409"></span><strong>Bits and pieces I found useful</strong></p>
<ul>
<li>SSAO can be generated at a smaller resolution than screen, with depth+normals aware upsample/blur step.</li>
<li>If random offset vector points away from surface normal, flip it. This makes random vectors be in the upper hemisphere, which reduces false occlusion on flat surfaces. Of course this requires having surface normals.</li>
<li>When generating random vectors for your AO kernel:
<ul>
<li>Generate vectors <i>inside</i> unit sphere (not <i>on</i> unit sphere).</li>
<li>Use energy minimization to distribute your samples better, especially at low sample counts. See <a href="http://www.malmer.nu/index.php/2008-04-11_energy-minimization-is-your-friend">malmer.nu</a> blog post.</li>
</ul>
</li>
<li>In your AO blurring/upsampling step: no need to sample each pixel for blur. Just skip some of them, i.e. make kernel offsets larger. See below.</li>
</ul>
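<p>Two of the tips above (vectors inside the unit sphere, and flipping vectors that point away from the normal) can be sketched in Python as follows. This assumes rejection sampling for the kernel and plain negation for the flip; both are just one reasonable reading, and the function names are made up for illustration. Energy minimization is not shown.</p>

```python
import random

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def random_point_in_unit_sphere(rng):
    """Rejection-sample a point inside (not on) the unit sphere."""
    while True:
        p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if dot(p, p) >= 1.0:
            continue  # outside the sphere (or on it): try again
        return p

def flip_to_hemisphere(offset, normal):
    """Negate the offset if it points away from the surface normal."""
    if dot(offset, normal) >= 0.0:
        return offset
    return (-offset[0], -offset[1], -offset[2])

rng = random.Random(1)
normal = (0.0, 0.0, 1.0)
kernel = [flip_to_hemisphere(random_point_in_unit_sphere(rng), normal) for _ in range(8)]
```

<p>In a real shader the kernel would typically be precomputed once, and only the flip would happen per pixel, using the normal read from the buffers.</p>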
<p><strong>Strided blur for AO</strong></p>
<p>Normally you&#8217;d blur the AO term using some sort of standard blur, for example a separable Gaussian: a horizontal blur followed by a vertical blur. Here is one way to picture the horizontal blur kernel:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/blur1.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/blur1.png" alt="Horizontal Blur Kernel" title="Horizontal Blur Kernel" width="291" height="51" class="alignnone size-full wp-image-420" /></a></p>
<p>Here&#8217;s how <a href="http://runevision.com/">Rune</a> taught me how to blur better:</p>
<blockquote>
<dl>
<dt>Rune:</dt>
<dd>The other thing is the blur. I tried to make the blur 4 times stronger, and it looks much better IMO without any artifacts I could see. I could even use 4x downsampling with that blur amount and still get acceptable results.</dd>
<dt>Aras:</dt>
<dd>how did you make it 4x stronger? <i>(I was going to say that blur step is already quite expensive, and I don&#8217;t want to add more samples to make it even more expensive, yadda yadda)</i></dd>
<dt>Rune:</dt>
<dd>m_SSAOMaterial.SetVector (&#8220;_TexelOffsetScale&#8221;, m_IsOpenGL ?<br />
	&nbsp;&nbsp;new Vector4 (<b>4</b>,0,1.0f/m_Downsampling,0) :<br />
	&nbsp;&nbsp;new Vector4 (<b>4.0f</b>/source.width,0,0,0));<br />
	And similar for vertical.</dd>
<dt>Aras:</dt>
<dd>hmm. that&#8217;s strange :)</dd>
<dt>Rune:</dt>
<dd>I have no idea what I&#8217;m doing of course but it looks good.</dd>
<dt>Aras:</dt>
<dd>so this way it does not do Gaussian on 9&#215;9 pixels, but instead only takes each 4th pixel. Wider area, but&#8230; it should not work! :)</dd>
<dt>Rune:</dt>
<dd>It creates a very fine pattern at pixel level but it&#8217;s way more subtle than the noise you get otherwise.</dd>
<dt>Aras:</dt>
<dd>ok <i>(hides in the corner and weeps)</i></dd>
</dl>
</blockquote>
<p>So yeah. The blur kernel can be &#8220;spread&#8221; to skip some pixels, effectively resulting in a larger blur radius for the same sample count:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/blur2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/blur2.png" alt="Blur with 2 pixel stride" title="Blur with 2 pixel stride" width="291" height="51" class="alignnone size-full wp-image-421" /></a></p>
<p>Or even this:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/blur3.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/blur3.png" alt="Blur with 3 pixel stride" title="Blur with 3 pixel stride" width="291" height="51" class="alignnone size-full wp-image-422" /></a></p>
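<p>As a minimal sketch of the idea, here is a one-dimensional strided blur on the CPU in Python. It assumes a 9-tap kernel with binomial weights and clamping at the row edges, and it leaves out the depth+normals checks; strided_blur_1d and footprint are illustrative names, not code from the actual effect.</p>

```python
def strided_blur_1d(values, weights, stride):
    """Blur a row of values, taking taps `stride` pixels apart (stride 1 = regular blur)."""
    radius = len(weights) // 2
    last = len(values) - 1
    out = []
    for x in range(len(values)):
        total = 0.0
        for tap, weight in enumerate(weights):
            # Clamp to the row edges, similar to a clamped texture address mode.
            src = min(max(x + (tap - radius) * stride, 0), last)
            total += values[src] * weight
        out.append(total)
    return out

def footprint(tap_count, stride):
    """Width in pixels covered by the kernel."""
    return (tap_count - 1) * stride + 1

# 9 binomial weights that sum to 1.
weights = [w / 256.0 for w in (1, 8, 28, 56, 70, 56, 28, 8, 1)]
row = [0.0] * 16 + [1.0] * 16
regular = strided_blur_1d(row, weights, stride=1)
wider = strided_blur_1d(row, weights, stride=2)
print(footprint(9, 1), footprint(9, 2), footprint(9, 3))  # 9 17 25
```

<p>The sample count per pixel stays at 9 in all cases; only the distance between taps changes, which is where both the wider blur and the fine interleaved pattern come from.</p>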
<p>Yes, it&#8217;s not correct blur. <strong>But that&#8217;s okay</strong>, we&#8217;re not building nuclear reactors that depend on SSAO blur being accurate. <em>If you are, SSAO is probably a wrong approach anyway, I&#8217;ve heard it&#8217;s not that useful for nuclear stuff</em>.</p>
<p>I&#8217;m not sure what this blur should be called. Strided blur? Interleaved blur? Interlaced blur? Or maybe everyone is doing that already and it has a well established name? Let me know.</p>
<p>Some images of blur in action. Raw AO term (very low &#8211; 8 &#8211; sample count and increased contrast on purpose):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO1raw.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO1raw-500x270.png" alt="Raw AO at low sample count" title="Raw AO at low sample count" width="500" height="270" class="alignnone size-medium wp-image-412" /></a></p>
<p>Regular 9&#215;9 blur (does not blur over depth+normals discontinuities):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO2blur.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO2blur-500x270.png" alt="Blurred AO" title="Blurred AO" width="500" height="270" class="alignnone size-medium wp-image-413" /></a></p>
<p>Blur that goes in 2 pixel stride (effectively 17&#215;17):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2-500x271.png" alt="Blurred AO with stride 2" title="Blurred AO with stride 2" width="500" height="271" class="alignnone size-medium wp-image-414" /></a><br />
It does create a fine interleaved pattern because it skips pixels. But you get wider blur!<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2mag.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO3blur2mag.png" alt="Blurred AO with stride 2, magnified" title="Blurred AO with stride 2, magnified" width="256" height="244" class="alignnone size-full wp-image-415" /></a></p>
<p>Blur that goes in 3 pixel stride (effectively 25&#215;25):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3-500x269.png" alt="Blurred AO with stride 3" title="Blurred AO with stride 3" width="500" height="269" class="alignnone size-medium wp-image-416" /></a><br />
At 3 pixel stride the artifacts are becoming apparent. But hey, this is very<br />
low AO sample count, increased contrast and no textures in the scene.<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3mag.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO4blur3mag.png" alt="Blurred AO with stride 3, magnified" title="Blurred AO with stride 3, magnified" width="256" height="244" class="alignnone size-full wp-image-417" /></a></p>
<p>For sake of completeness, the same raw AO term, but computed at 2&#215;2 smaller resolution (still using low sample count etc.):<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO5down2.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO5down2-500x270.png" alt="AO computed at lower resolution" title="AO computed at lower resolution" width="500" height="270" class="alignnone size-medium wp-image-418" /></a></p>
<p>Now, 2&#215;2 smaller AO, blurred with 3 pixels stride:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/09/AO6down2blur3.png"><img src="http://aras-p.info/blog/wp-content/uploads/2009/09/AO6down2blur3-499x272.png" alt="AO at lower resolution, blurred with 3 pixel stride" title="AO at lower resolution, blurred with 3 pixel stride" width="499" height="272" class="alignnone size-medium wp-image-419" /></a></p>
<p>Happy blurring!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/09/17/strided-blur-and-other-tips-for-ssao/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Compact Normal Storage for small g-buffers</title>
		<link>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/</link>
		<comments>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 09:39:51 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=377</guid>
		<description><![CDATA[I&#8217;ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture.
Here are my findings &#8211; with error visualization and shader performance numbers for some GPUs.
If you know any other method to encode/store normals in a compact way, please let me [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture.</p>
<p><a href="http://aras-p.info/texts/CompactNormalStorage.html"><strong>Here are my findings</strong></a> &#8211; with error visualization and shader performance numbers for some GPUs.</p>
<p>If you know any other method to encode/store normals in a compact way, please let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/08/04/compact-normal-storage-for-small-g-buffers/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Encoding floats to RGBA &#8211; the final?</title>
		<link>http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/</link>
		<comments>http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 12:58:08 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=369</guid>
		<description><![CDATA[The saga continues! In short, I need to pack a floating point number in [0..1) range into several channels of 8 bit/channel render texture. My previous approach is not ideal.
Turns out some folks have figured out an approach that finally seems to work.
Here it is for my own reference:

gamedev.net forum post by gjaegy
Suggestion right there [...]]]></description>
			<content:encoded><![CDATA[<p>The saga continues! In short, I need to pack a floating point number in [0..1) range into several channels of 8 bit/channel render texture. My <a href="http://aras-p.info/blog/2008/06/20/encoding-floats-to-rgba-again/">previous approach</a> is not ideal.</p>
<p>Turns out some folks have figured out an approach that finally <em>seems</em> to work.</p>
<p>Here it is for my own reference:</p>
<ul>
<li><a href="http://www.gamedev.net/community/forums/topic.asp?topic_id=442138&#038;whichpage=1&#2936108">gamedev.net forum post by gjaegy</a></li>
<li>Suggestion <a href="http://aras-p.info/blog/2008/06/20/encoding-floats-to-rgba-again/#comment-16380">right there</a> on my previous blog post comments</li>
<li>Repost <a href="http://www.gamerendering.com/2008/09/25/packing-a-float-value-in-rgba/">gamerendering blog</a></li>
<li>Repost on <a href="http://www.gamedev.net/community/forums/topic.asp?topic_id=463075&#038;whichpage=1&#3054958">gamedev.net forums</a> again.</li>
</ul>
<p>So here&#8217;s the proper way:</p>
<blockquote>
<pre>inline float4 EncodeFloatRGBA( float v ) {
  // Scale factors are powers of 255: 255^0, 255^1, 255^2, 255^3 (= 16581375).
  float4 enc = float4(1.0, 255.0, 65025.0, 16581375.0) * v;
  enc = frac(enc);
  // Remove from each channel the part that the next channel already stores.
  enc -= enc.yzww * float4(1.0/255.0,1.0/255.0,1.0/255.0,0.0);
  return enc;
}
inline float DecodeFloatRGBA( float4 rgba ) {
  return dot( rgba, float4(1.0, 1/255.0, 1/65025.0, 1/16581375.0) );
}</pre>
</blockquote>
<p>That is, the difference from the <a href="http://aras-p.info/blog/2008/06/20/encoding-floats-to-rgba-again/">previous approach</a> is that the &#8220;magic&#8221; (read: hardware dependent) bias is replaced with subtracting next component&#8217;s encoded value from the previous component&#8217;s encoded value.</p>
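<p>A quick way to convince yourself that the scheme survives 8 bit storage is to replay it on the CPU. The Python sketch below mirrors the two shader functions with powers of 255 as scale factors (the last one being 255 cubed, 16581375) and adds a quantize_8bit step that stands in for writing to the render texture. The rounding behaviour of quantize_8bit is an assumption; real hardware may round differently.</p>

```python
import math

SCALES = (1.0, 255.0, 65025.0, 16581375.0)  # 255**0 .. 255**3

def frac(x):
    return x - math.floor(x)

def encode_float_rgba(v):
    x, y, z, w = (frac(s * v) for s in SCALES)
    # Subtract the next channel's share so x, y and z become multiples of 1/255.
    return (x - y / 255.0, y - z / 255.0, z - w / 255.0, w)

def quantize_8bit(rgba):
    """Simulate storing each channel in 8 bits (round to nearest step)."""
    return tuple(round(c * 255.0) / 255.0 for c in rgba)

def decode_float_rgba(rgba):
    return sum(c / s for c, s in zip(rgba, SCALES))

v = 0.123456789
stored = quantize_8bit(encode_float_rgba(v))
error = abs(decode_float_rgba(stored) - v)
```

<p>With this setup only the last channel loses anything to quantization, so the round trip error stays around 1e-10 for values in [0..1).</p>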
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Implementing fixed function T&amp;L in vertex shaders</title>
		<link>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/</link>
		<comments>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 06:08:50 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=364</guid>
		<description><![CDATA[Almost half a year ago I was wondering how to implement T&#038;L in vertex shaders.
Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a technical report here.
In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using [...]]]></description>
			<content:encoded><![CDATA[<p>Almost half a year ago I was wondering <a href="http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/">how to implement T&#038;L in vertex shaders</a>.</p>
<p>Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a <a href="http://aras-p.info/texts/VertexShaderTnL.html"><strong>technical report here</strong></a>.</p>
<p>In short, I&#8217;m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using fixed function (I know it&#8217;s implemented as vertex shaders internally by the runtime/driver) on several different cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950).</p>
<p>What was unexpected: the most complex piece is not the vertex lighting! Most complexity is in how to route/generate texture coordinates and transform them. Huge combination explosion there.</p>
<p>Otherwise &#8211; I like! Here&#8217;s a link to the <a href="http://aras-p.info/texts/VertexShaderTnL.html">article again</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/06/09/implementing-fixed-function-tl-in-vertex-shaders/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Shaders must die, part 3</title>
		<link>http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/</link>
		<comments>http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/#comments</comments>
		<pubDate>Sun, 10 May 2009 15:24:17 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=350</guid>
		<description><![CDATA[Continuing the series (see Part 1, Part 2)&#8230;
Got different lighting models (BRDFs) working. Without further ado, code snippets that produce real actual working shaders that work with lights &#038; shadows and whatnot:
Simple Lambert (single color):
Properties
    Color _Color
EndProperties
Surface
    o.Albedo = _Color;
EndSurface
Lighting Lambert


Let&#8217;s add a texture:
Properties
    2D _MainTex
 [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing the series (see <a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">Part 1</a>, <a href="http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/">Part 2</a>)&#8230;</p>
<p>Got different lighting models (BRDFs) working. Without further ado, code snippets that produce real actual working shaders that work with lights &#038; shadows and whatnot:</p>
<p><span id="more-350"></span>Simple Lambert (single color):</p>
<blockquote><pre>Properties
    Color _Color
EndProperties
Surface
    o.Albedo = _Color;
EndSurface
Lighting Lambert
</pre>
</blockquote>
<p>Let&#8217;s add a texture:</p>
<blockquote><pre>Properties
    2D _MainTex
    Color _Color
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex) * _Color;
EndSurface
Lighting Lambert</pre>
</blockquote>
<p>Change light model to Half-Lambert (a.k.a. wrapped diffuse):</p>
<blockquote><pre>// ...everything the same
Lighting HalfLambert</pre>
</blockquote>
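<p>For reference, the difference between the two diffuse models is tiny in code. The Python sketch below uses the commonly described Half-Lambert formulation (scale the dot product by 0.5, add 0.5, then square); that exact formula is an assumption here, since the lighting functions behind these keywords are not shown in this post.</p>

```python
def lambert(n_dot_l):
    """Plain Lambert: clamp the cosine term at zero."""
    return max(n_dot_l, 0.0)

def half_lambert(n_dot_l):
    """Half-Lambert: remap the cosine from [-1, 1] to [0, 1], then square."""
    wrapped = n_dot_l * 0.5 + 0.5
    return wrapped * wrapped

# Facing the light, at a right angle to it, and facing away from it.
for n_dot_l in (1.0, 0.0, -1.0):
    print(n_dot_l, lambert(n_dot_l), half_lambert(n_dot_l))
```

<p>The interesting row is the middle one: at a right angle to the light, Lambert is already black while Half-Lambert still returns 0.25.</p>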
<p>Blinn-Phong, with constant exponent &#038; constant specular color, modulated by gloss map in main texture&#8217;s alpha:</p>
<blockquote><pre>Properties
    2D _MainTex
    Color _Color
    Color _SpecColor
    Float _Exponent
EndProperties
Surface
    half4 col = SAMPLE(_MainTex);
    o.Albedo = col * _Color;
    o.Specular = _SpecColor.rgb * col.a;
    o.Exponent = _Exponent;
EndSurface
Lighting BlinnPhong</pre>
</blockquote>
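<p>The specular term that a &#8220;Lighting BlinnPhong&#8221; model would evaluate can be sketched in Python like this. It is the textbook Blinn-Phong half-vector formula, not necessarily what the generated shader does; the helper names are made up, and the gloss value plays the role of the main texture&#8217;s alpha.</p>

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def blinn_phong_specular(normal, light_dir, view_dir, exponent, gloss=1.0):
    """Specular intensity from the half vector between light and view directions."""
    half_vec = normalize((light_dir[0] + view_dir[0],
                          light_dir[1] + view_dir[1],
                          light_dir[2] + view_dir[2]))
    return gloss * max(dot(normal, half_vec), 0.0) ** exponent

normal = (0.0, 0.0, 1.0)
head_on = blinn_phong_specular(normal, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 32.0)
grazing = blinn_phong_specular(normal, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 32.0)
```

<p>All direction vectors are assumed to be unit length and to point away from the surface. The result would then be multiplied by the specular color and the light color.</p>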
<p>The same Blinn-Phong, with added normal map:</p>
<blockquote><pre>Properties
    2D _MainTex
    2D _BumpMap
    Color _Color
    Color _SpecColor
    Float _Exponent
EndProperties
Surface
    half4 col = SAMPLE(_MainTex);
    o.Albedo = col * _Color;
    o.Specular = _SpecColor.rgb * col.a;
    o.Exponent = _Exponent;
    o.Normal = SAMPLE_NORMAL(_BumpMap);
EndSurface
Lighting BlinnPhong</pre>
</blockquote>
<p>I also made an illustrative-style BRDF (see <a href="http://www.valvesoftware.com/publications.html">Illustrative Rendering in Team Fortress 2</a>), but that only requires the above sample to have &#8220;Lighting TF2&#8221; at the end.</p>
<p>Another thing I tried is surface that has Albedo dependent on a viewing angle, similar to <a href="http://developer.amd.com/media/gpu_assets/ShaderX2_LayeredCarPaintShader.pdf">Layered Car Paint Shader</a>. It works:</p>
<blockquote><pre>Properties
    2D _MainTex
    2D _BumpMap
    2D _SparkleTex
    Float _Sparkle
    Color _PrimaryColor
    Color _HighlightColor
EndProperties
Surface
    half4 main = SAMPLE(_MainTex);
    half3 normal  = SAMPLE_NORMAL(_BumpMap);
    half3 normalN = normalize(SAMPLE_NORMAL(_SparkleTex));
    half3 ns = normalize (normal + normalN * _Sparkle);
    half3 nss = normalize (normal + normalN);
    i.viewDir = normalize(i.viewDir);
    half nsv = max(0,dot(ns, i.viewDir));
    half3 c0 = _PrimaryColor.rgb;
    half3 c2 = _HighlightColor.rgb;
    half3 c1 = c2 * 0.5;
    half3 cs = c2 * 0.4;
    half3 tone =
        c0 * nsv +
        c1 * (nsv*nsv) +
        c2 * (nsv*nsv*nsv*nsv) +
        cs * pow(saturate(dot(nss,i.viewDir)), 32);
    main.rgb *= tone;
    o.Albedo = main;
    o.Normal = normal;
EndSurface
Lighting Lambert</pre>
</blockquote>
<p>Up next:</p>
<ul>
<li>How and where emissive terms should be placed. I cautiously omitted all emissive terms from the above examples (so my layered car shader is without reflections right now).</li>
<li>Where should things like rim lighting go? I&#8217;m not sure if it&#8217;s a surface property (increasing albedo/emission with angle) or a lighting property (a back light).</li>
</ul>
<p>My impressions so far:</p>
<ul>
<li>I like that I don&#8217;t have to write down vertex-to-fragment structures or the vertex shader. In most cases all the vertex shader does is transform stuff and pass it down to later stages, plus occasional computations that are linear over the triangle. No good reason to write it by hand.</li>
<li>I like that the above shaders do <i>not</i> deal with <i>how</i> the rendering is actually done. For Unity&#8217;s case, I&#8217;m compiling them into single pass per light forward renderer, but they <i>should</i> just work with multiple lights per pass, deferred etc. <em>Of course, that still has to be proven!</em></li>
</ul>
<p>So far so good.</p>
<p>Series index: Shaders must die, <a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">Part 1</a>, <a href="http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/">Part 2</a>, <a href="http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/"><strong>Part 3</strong></a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/10/shaders-must-die-part-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Shaders must die, part 2</title>
		<link>http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/</link>
		<comments>http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/#comments</comments>
		<pubDate>Thu, 07 May 2009 21:35:28 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=339</guid>
		<description><![CDATA[I started playing around with the idea of &#8220;shaders must die&#8221;. I&#8217;m experimenting with extracting &#8220;surface shaders&#8221; for now.
Right now my experimental pipeline is:

Write a surface shader file
Perl script transforms it into Unity 2.x shader file
Which in turn is compiled by Unity into all lighting/shadows permutations, for D3D9 and OpenGL backends. Cg is used for [...]]]></description>
			<content:encoded><![CDATA[<p>I started playing around with the idea of &#8220;<a href="http://aras-p.info/blog/2009/05/05/shaders-must-die/">shaders must die</a>&#8221;. I&#8217;m experimenting with extracting &#8220;surface shaders&#8221; for now.</p>
<p>Right now my experimental pipeline is:</p>
<ol>
<li>Write a surface shader file</li>
<li>Perl script transforms it into Unity 2.x shader file</li>
<li>Which in turn is compiled by Unity into all lighting/shadows permutations, for D3D9 and OpenGL backends. Cg is used for actual shader compilation.</li>
</ol>
<p>I have <em>very</em> simple cases working. For example: <span id="more-339"></span></p>
<blockquote><pre>Properties
    2D _MainTex
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex);
EndSurface</pre>
</blockquote>
<p>This is a &#8220;no bullshit&#8221; source code for a simple Diffuse (Lambertian) shader, 87 bytes of text.</p>
<p>The Perl script produces a Unity 2.x shader. This will be long, but bear with me &#8211; I&#8217;m trying to show how much stuff has to be written right now, when we&#8217;re operating on vertex/pixel shader level. See <a href="http://unity3d.com/support/documentation/Components/SL-Attenuation.html">Attenuation and Shadows for Pixel Lights</a> in Unity docs for how this system works.</p>
<blockquote><pre>Shader "ShaderNinja/Diffuse" {
Properties {
  _MainTex ("_MainTex", 2D) = "" {}
}
SubShader {
  Tags { "RenderType"="Opaque" }
  LOD 200
  Blend AppSrcAdd AppDstAdd
  Fog { Color [_AddFog] }
  Pass {
    Tags { "LightMode"="PixelOrNone" }
CGPROGRAM
#pragma fragment frag
#pragma fragmentoption ARB_fog_exp2
#pragma fragmentoption ARB_precision_hint_fastest
#include "UnityCG.cginc"
uniform sampler2D _MainTex;
struct v2f {
    float2 uv_MainTex : TEXCOORD0;
};
struct f2l {
    half4 Albedo;
};
half4 frag (v2f i) : COLOR0 {
    f2l o;
    o.Albedo = tex2D(_MainTex,i.uv_MainTex);
    return o.Albedo * _PPLAmbient * 2.0;
}
ENDCG
  }
  Pass {
    Tags { "LightMode"="Pixel" }
CGPROGRAM
#pragma vertex vert
#pragma fragment frag
#pragma multi_compile_builtin
#pragma fragmentoption ARB_fog_exp2
#pragma fragmentoption ARB_precision_hint_fastest
#include "UnityCG.cginc"
#include "AutoLight.cginc"
struct v2f {
    V2F_POS_FOG;
    LIGHTING_COORDS
    float2 uv_MainTex;
    float3 normal;
    float3 lightDir;
};
uniform float4 _MainTex_ST;
v2f vert (appdata_tan v) {
    v2f o;
    PositionFog( v.vertex, o.pos, o.fog );
    o.uv_MainTex = TRANSFORM_TEX(v.texcoord, _MainTex);
    o.normal = v.normal;
    o.lightDir = ObjSpaceLightDir(v.vertex);
    TRANSFER_VERTEX_TO_FRAGMENT(o);
    return o;
}
uniform sampler2D _MainTex;
struct f2l {
    half4 Albedo;
    half3 Normal;
};
half4 frag (v2f i) : COLOR0 {
    f2l o;
    o.Normal = i.normal;
    o.Albedo = tex2D(_MainTex,i.uv_MainTex);
    return DiffuseLight (i.lightDir, o.Normal, o.Albedo, LIGHT_ATTENUATION(i));
}
ENDCG
  }
}
Fallback "VertexLit"
}</pre>
</blockquote>
<p>Phew, that is quite some typing to get a simple diffuse shader (1607 bytes)! Well, at least all the lighting/shadow combinations are handled by Unity macros here. When Unity takes this shader and compiles it into all permutations, the result is 58 kilobytes of shader assembly (D3D9 + OpenGL, 17 light/shadow combinations).</p>
<p>Let&#8217;s try something slightly different: bumpmapped, with a detail texture:</p>
<blockquote><pre>Properties
    2D _MainTex
    2D _Detail
    2D _BumpMap
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex) * SAMPLE(_Detail) * 2.0;
    o.Normal = SAMPLE_NORMAL(_BumpMap);
EndSurface
</pre>
</blockquote>
<p>This is 173 bytes of text. Generated Unity shader is 2098 bytes, which compiles into 74 kilobytes of shader assembly.</p>
<p>In this case, the processing script detects that the surface shader modifies the normal per pixel, and does the necessary tangent space light transformations. It all just works!</p>
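<p>The Perl script itself is not shown here, so the following is only a rough sketch of how that kind of detection could work, written in Python for brevity. The function names (parse_surface_shader, expand_samples) and the layout of the returned dictionary are made up for illustration. The SAMPLE expansion mirrors the tex2D(_MainTex,i.uv_MainTex) line in the generated shader above; the expansion of SAMPLE_NORMAL is left out because the post does not show it.</p>

```python
import re

# The bumpmapped surface shader from above, used as sample input.
SOURCE = """Properties
    2D _MainTex
    2D _Detail
    2D _BumpMap
EndProperties
Surface
    o.Albedo = SAMPLE(_MainTex) * SAMPLE(_Detail) * 2.0;
    o.Normal = SAMPLE_NORMAL(_BumpMap);
EndSurface
"""


def parse_surface_shader(text):
    """Split a surface shader into properties, statements and written outputs."""
    props = re.search(r"^Properties\s*\n(.*?)^EndProperties", text, re.S | re.M).group(1)
    body = re.search(r"^Surface\s*\n(.*?)^EndSurface", text, re.S | re.M).group(1)
    properties = [tuple(line.split()) for line in props.splitlines() if line.strip()]
    statements = [s.strip() + ";" for s in body.split(";") if s.strip()]
    # Every "o.Something = ..." assignment tells us which surface outputs are used.
    writes = set(re.findall(r"\bo\.(\w+)\s*=", body))
    return {
        "properties": properties,
        "statements": statements,
        "writes": writes,
        # Assumption: writing o.Normal is what triggers the tangent space path.
        "needs_tangent_space": "Normal" in writes,
    }


def expand_samples(statement):
    """Rewrite SAMPLE(_Tex) into a tex2D call; SAMPLE_NORMAL is left untouched."""
    return re.sub(r"\bSAMPLE\((\w+)\)", r"tex2D(\1, i.uv\1)", statement)


if __name__ == "__main__":
    info = parse_surface_shader(SOURCE)
    print(info["writes"], info["needs_tangent_space"])
    for statement in info["statements"]:
        print(expand_samples(statement))
```

<p>The point of the sketch is only that the surface shader text carries enough information to decide which vertex-to-fragment data is needed; the real script may well do this quite differently.</p>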
<p>So this is where I am now. Next up: detect which lighting model to use based on surface parameters (right now it always uses Lambertian). Fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/07/shaders-must-die-part-2/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Shaders must die</title>
		<link>http://aras-p.info/blog/2009/05/05/shaders-must-die/</link>
		<comments>http://aras-p.info/blog/2009/05/05/shaders-must-die/#comments</comments>
		<pubDate>Tue, 05 May 2009 12:59:48 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=324</guid>
		<description><![CDATA[It came in as a simple thought, and now I can&#8217;t shake it off. So I say:

Ok, now that the controversial bits are done, let&#8217;s continue.

Most of this can be (and probably is) wrong, and I haven&#8217;t given it enough thought yet. But here&#8217;s my thinking about shaders of &#8220;regular scene objects&#8221;. All of below [...]]]></description>
			<content:encoded><![CDATA[<p>It came in as a simple <a href="http://twitter.com/aras_p/status/1651784380">thought</a>, and now I can&#8217;t shake it off. So I say:<br />
<a href="http://aras-p.info/blog/wp-content/uploads/2009/05/shadersmustdie.jpg"><img src="http://aras-p.info/blog/wp-content/uploads/2009/05/shadersmustdie.jpg" alt="Shaders Must Die" title="Shaders Must Die" width="550" height="550" class="alignnone size-full wp-image-325" /></a></p>
<p>Ok, now that the controversial bits are done, let&#8217;s continue.</p>
<p><span id="more-324"></span><br />
Most of this can be (and probably is) wrong, and I haven&#8217;t given it enough thought yet. But here&#8217;s my thinking about shaders of &#8220;regular scene objects&#8221;. All of below is about things that need to interact with lighting; I&#8217;m not talking about shaders for postprocessing, one-off uses, special effects, GPGPU or kitchen sinks.</p>
<p><strong>Operating on vertex/pixel shader level is a wrong abstraction level</strong></p>
<p>Instead, it should be separated out into &#8220;<em>surface shader</em>&#8221; (albedo, normal, specularity, &#8230;), &#8220;<em>lighting model</em>&#8221; (Lambertian, Blinn Phong, &#8230;) and &#8220;<em>light shader</em>&#8221; (attenuation, cookies, shadows).</p>
<ul>
<li>Probably 90% of the cases would only touch the surface shader (mostly mix textures/colors in various ways), and choose from some precooked lighting models.</li>
<li>9% of the cases would tweak the lighting model. Most of the things would settle for &#8220;standard&#8221; (Blinn-Phong or similar), with some stuff using skin or anisotropic or &#8230;</li>
<li>The &#8220;light shader&#8221; only needs to be touched once in a blue moon by ninjas. Once the shadowing and attenuation systems are implemented, there&#8217;s almost no reason for shader authors to see all the dirty bits.</li>
</ul>
<p>Yes, current hardware operates on vertex/geometry/pixel shaders, which is a logical thing to do for hardware. After all, these are the primitives it works on when rendering. But those primitives are <em>not</em> the things you work on when authoring how a surface should look or how it should react to a light.</p>
<p><strong>Simple code; no redundant info; sensible defaults</strong></p>
<p>In the ideal world, here&#8217;s a simple surface shader (the syntax is deliberately stupid):</p>
<blockquote><p>
Haz Texture;<br />
Albedo = sample Texture;
</p></blockquote>
<p>Or with bump mapping added:</p>
<blockquote><p>
Haz Texture;<br />
Haz NormalMap;<br />
Albedo = sample Texture;<br />
Normal = sample_normal NormalMap;
</p></blockquote>
<p>And this should be <em>all</em> the info you have to provide. This would choose the lighting model based on used things (in this case, Lambertian). It would <em>somehow</em> just work with all kinds of lights, shadows, ambient occlusion and whatnot.</p>
<p>Compare to how much has to be written to implement a simple surface in your current shader technology, so that it would work &#8220;with everything&#8221;.</p>
<p>From the above shader, proper hardware shaders can be generated for DX9, DX11, DX1337, OpenGL, next-gen and next-next-gen consoles, mobile platforms with capable hardware, etc.</p>
<p>It can be used in accumulative forward rendering, forward rendering with multiple lights per pass, hybrid (light pre-pass / prelight) rendering, deferred rendering etc. Heck, even for a raytracer if you have one at hand.</p>
<p>I want!</p>
<p>Now of course, it won&#8217;t be as nice as more complex materials have to be expressed. Some might not even be possible. But shader text complexity should grow with material complexity; and all information that is redundant, implied, inferred or useless should be eliminated. <em>There&#8217;s no good reason to stick to conventions and limits of current hardware just because it operates like that</em>.</p>
<p>Shaders must die!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/05/05/shaders-must-die/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Fixed function lighting in vertex shader &#8211; how?</title>
		<link>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/</link>
		<comments>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 20:32:49 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[d3d]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=261</guid>
		<description><![CDATA[Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to various artifacts that annoy people to hell.
I don&#8217;t really know why that happens, because [...]]]></description>
			<content:encoded><![CDATA[<p>Sometime soon I&#8217;ll have to implement fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to <a href="http://aras-p.info/blog/2008/06/12/depth-bias-and-the-power-of-deceiving-yourself/">various artifacts</a> that annoy people to hell.</p>
<p>I don&#8217;t really know <em>why</em> that happens, because it seems that most modern cards don&#8217;t have fixed function units, so internally they are running shaders anyway. The DX9 runtime on Vista&#8217;s WDDM also seems to hand only shaders to the driver internally. Still, for some reason somewhere the precision does not match&#8230;</p>
<p>How should such a task be approached?</p>
<p>My requirements are:</p>
<ul>
<li>Should handle any possible state combination in D3D fixed function T&#038;L.</li>
<li>D3D 9.0c, using vertex shader 2.0 is ok. For now I don&#8217;t care about OpenGL.</li>
<li>No HLSL at runtime. I don&#8217;t want to add a megabyte or more to Unity web player just for HLSL. DX9 shader assembly is ok, because we already have the assembler code.</li>
<li>Should work as fast (or close to) as the regular fixed function pipeline.</li>
</ul>
<p>I looked at ATI&#8217;s <a href="http://developer.amd.com/samples/FixedFuncShader/Pages/default.aspx">FixedFuncShader sample</a>. It&#8217;s an <strong>ubershader approach</strong>; one large (230 instructions or so) shader with static VS2.0 branching. It had some obvious places to optimize, I could get it down to 190 or so instructions, kill some <a href="http://msdn.microsoft.com/en-us/library/bb147316(VS.85).aspx">rcp</a>&#8217;s and reduce the amount of constant storage by 2x.</p>
<p>Still, it did not handle some things in the D3D T&#038;L or had some issues:</p>
<ul>
<li>It assumes one input UV, one output UV and no texture matrices. This place in T&#038;L gets quite convoluted &#8211; any input UVs or a texgen mode can be transformed by matrices of various sizes, and routed into any output UVs.</li>
<li>It was not using full T&#038;L lighting model. No biggie here.</li>
<li>I haven&#8217;t checked with NVShaderPerf or AMD ShaderAnalyzer yet, but last time I checked the static branch instruction was taking two clocks on some NV architecture. So ubershader approach does not come for free.</li>
</ul>
<p>Another thing I&#8217;m considering, is to combine final shader(s) from <strong>assembly fragments</strong>, with some simple register allocation.</p>
<p>In T&#038;L shader code, there&#8217;s only a limited set of could-be-redundant computations, mostly computing world space position, camera space normal, view vector and so on (those could be used by lighting, texgen or fog). Those computations can be explicitly put into separate fragments, and later fragments could just use their result.</p>
<p>What is left then is some register allocation. A shader assembly fragment could want some temporary registers for internal use (this is simple, just give it a bunch of unused registers), also want some registers as input (from previous fragments), and save some output in registers.</p>
<p>Again, I haven&#8217;t checked with shader performance tools, but I <em>think, guess and hope</em> that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into hardware format. This would mean that <em>I</em> can be quite sloppy with it, i.e. don&#8217;t have to implement some super smart allocation scheme.</p>
<p>I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).</p>
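<p>To make the fragment idea a bit more concrete, here is a minimal sketch in Python. It is not the experimental code mentioned above. The Fragment class, the LIBRARY contents and the combine function are all hypothetical, and the instruction strings are placeholders in the style of D3D9 vertex shader assembly that have not been run through an assembler. Shared computations become their own fragments and are emitted once; outputs get a register for the rest of the shader, while temporaries are handed back as soon as the fragment ends.</p>

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    code: str            # assembly template with {placeholders} for registers
    inputs: tuple = ()   # named values produced by earlier fragments
    outputs: tuple = ()  # named values this fragment leaves in registers
    temps: int = 0       # scratch registers, free again once the fragment ends


# Hypothetical fragment library; instruction text is illustrative only.
LIBRARY = {
    "world_pos": Fragment(
        name="world_pos",
        code="m4x3 {world_pos}.xyz, v0, c0",
        outputs=("world_pos",),
    ),
    "view_vec": Fragment(
        name="view_vec",
        code="sub {t0}.xyz, c8, {world_pos}\nnrm {view_vec}.xyz, {t0}",
        inputs=("world_pos",),
        outputs=("view_vec",),
        temps=1,
    ),
    "fog": Fragment(
        name="fog",
        code="dp3 {t0}.x, {world_pos}, {world_pos}\nmov oFog, {t0}.x",
        inputs=("world_pos",),
        temps=1,
    ),
}


def combine(wanted, library):
    """Emit fragments in dependency order, each at most once, with naive register allocation."""
    producers = {out: frag for frag in library.values() for out in frag.outputs}
    regs = {}        # value name -> register, kept for the whole shader
    emitted = set()
    lines = []
    next_reg = 0

    def emit(frag):
        nonlocal next_reg
        if frag.name in emitted:
            return
        for value in frag.inputs:
            emit(producers[value])  # pull in whatever computes our inputs
        emitted.add(frag.name)
        for value in frag.outputs:
            regs[value] = f"r{next_reg}"
            next_reg += 1
        # Temporaries sit just above the live registers; next_reg is not
        # advanced for them, so the next fragment is free to reuse them.
        temps = {f"t{i}": f"r{next_reg + i}" for i in range(frag.temps)}
        lines.append(frag.code.format(**regs, **temps))

    for name in wanted:
        emit(library[name])
    return "\n".join(lines)


if __name__ == "__main__":
    print(combine(["view_vec", "fog"], LIBRARY))
```

<p>Asking for the view vector and fog pulls in the world position fragment only once, and both later fragments use the same scratch register. This is about as sloppy as allocation can get, which is fine as long as the driver really does its own register allocation afterwards.</p>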
<p>Does that make sense? Or did everyone solve those problems eons ago already?</p>
<p><strong>Edit</strong>: half a year later, I wrote a technical report on how I implemented all this: <a href="http://aras-p.info/texts/VertexShaderTnL.html">http://aras-p.info/texts/VertexShaderTnL.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2009/01/22/fixed-function-lighting-in-vertex-shader-how/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Hardware of the casual gamer</title>
		<link>http://aras-p.info/blog/2008/08/28/hardware-of-the-casual-gamer/</link>
		<comments>http://aras-p.info/blog/2008/08/28/hardware-of-the-casual-gamer/#comments</comments>
		<pubDate>Thu, 28 Aug 2008 18:32:57 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[games]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[unity]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=200</guid>
		<description><![CDATA[(if this sounds like a rehash of a blog post on blogs.unity3d.com, well, it is&#8230;)
Everyone knows the Valve&#8217;s hardware survey. But what if your target game players are not the traditional &#8220;big budget AAA game&#8221; type? For example, at the moment most Unity Web Player games are oriented to much more casual market, so hardware [...]]]></description>
			<content:encoded><![CDATA[<p><em>(if this sounds like a rehash of a blog post on <a href="http://blogs.unity3d.com/2008/08/01/hardware-of-the-casual-gamer/">blogs.unity3d.com</a>, well, it is&#8230;)</em></p>
<p>Everyone knows <a href="http://www.steampowered.com/status/survey.html">Valve&#8217;s hardware survey</a>. But what if your target game players are not the traditional &#8220;big budget AAA game&#8221; type? For example, at the moment most Unity Web Player games are oriented to a much more casual market, so hardware there might be <em>very</em> different. And indeed, it turns out it is quite different.</p>
<p>Without further ado, here&#8217;s the data we have: <a href="http://unity3d.com/webplayer/hwstats/"><strong>Unity Web Player hardware statistics</strong></a>.</p>
<p>It&#8217;s about two million data points since we started gathering it earlier this year.</p>
<p>Some subjective points of interest (I&#8217;ll be using current data for 2008 Q3 here):</p>
<ul>
<li><a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-os.html">Operating systems</a>: Mac OS X is 2.5%, the rest is Windows. 64 bit Windows hasn&#8217;t really picked up yet (0.7%). Windows 2000 is dying fast (0.7%). OS X Leopard already took over OS X Tiger.</li>
<li><a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-cpuvendor.html">CPUs</a>: poor Transmeta :) <a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-cores.html">Dual core</a> CPUs are becoming the norm (46%).</li>
<li><a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-gfxcard.html">Graphics cards</a>: quite sad, in fact&#8230; top 15 cards are slow or <em>horribly slow</em>. Capability wise, they are quite good, with <a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-shader.html">about 70%</a> having shader model 2.0 or higher. Shader model 1.x cards are dead. &#8220;<a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-shadergen.html">Can has DX10</a>&#8221; is 2.7%.</li>
<li>Casual machines don&#8217;t have lots of <a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-ram.html">RAM</a>. Nor lots of <a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-vram.html">VRAM</a>.</li>
<li>Most popular nvidia driver? <a href="http://unity3d.com/webplayer/hwstats/pages/web-2008Q3-gfxdriver.html">56.73</a>. Looks like this is the driver that comes integrated in XP SP2&#8230; Now, who says regular people <em>ever</em> update their drivers? Likewise, vga.dll (i.e. standard VGA) is 1.6% of machines; additional 1.5% don&#8217;t report any driver (not sure how that happens&#8230;).</li>
</ul>
<p>So yeah. Casual machines: capabilities quite okay, performance low, low, low. That&#8217;s life.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/08/28/hardware-of-the-casual-gamer/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>I can has vertex?</title>
		<link>http://aras-p.info/blog/2008/06/26/i-can-has-vertex/</link>
		<comments>http://aras-p.info/blog/2008/06/26/i-can-has-vertex/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 05:54:14 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[random]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=182</guid>
		<description><![CDATA[You know something became a cultural phenomenon when hardware review sites start putting up images like this&#8230;
From AnandTech&#8217;s Radeon HD 4850 &#038; 4870 review: I can has vertex data?
Edit: gee, nowadays the reviews have funny performance measures. Like, FPS per square centimeter (of GPU die size)! It does actually make (some) sense, but it&#8217;s still [...]]]></description>
			<content:encoded><![CDATA[<p><a href='http://aras-p.info/blog/wp-content/uploads/2008/06/gt200.png'><img src="http://aras-p.info/blog/wp-content/uploads/2008/06/gt200.png" alt="I can has vertex data?" title="gt200" width="290" height="177" class="alignright size-full wp-image-183" /></a>You know <a href="http://en.wikipedia.org/wiki/Lolcat">something</a> became a cultural phenomenon when hardware review sites start putting up images like this&#8230;</p>
<p>From AnandTech&#8217;s <a href="http://www.anandtech.com/video/showdoc.aspx?i=3341&#038;p=3">Radeon HD 4850 &#038; 4870 review</a>: <em>I can has vertex data?</em></p>
<p><em>Edit</em>: gee, nowadays the reviews have funny performance measures. Like, <a href="http://www.anandtech.com/video/showdoc.aspx?i=3341&#038;p=7">FPS per square centimeter</a> (of GPU die size)! It does actually make (some) sense, but it&#8217;s still funny. Frames per second per square centimeter&#8230; mmm&#8230; delicious.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/06/26/i-can-has-vertex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Encoding floats to RGBA, again</title>
		<link>http://aras-p.info/blog/2008/06/20/encoding-floats-to-rgba-again/</link>
		<comments>http://aras-p.info/blog/2008/06/20/encoding-floats-to-rgba-again/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 15:55:56 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=181</guid>
		<description><![CDATA[Hey, it looks like the quest for encoding floats to RGBA textures (part 1, part 2) did not end yet.
Here&#8217;s the &#8220;best available&#8221; code that I have now:

inline float4 EncodeFloatRGBA( float v ) {
  return frac( float4(1.0, 255.0, 65025.0, 160581375.0) * v ) + bias;
}
inline float DecodeFloatRGBA( float4 rgba ) {
  return dot( [...]]]></description>
			<content:encoded><![CDATA[<p>Hey, it looks like the quest for encoding floats to RGBA textures (<a href="http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba/">part 1</a>, <a href="http://aras-p.info/blog/2007/06/29/encoding-floats-to-rgba-redux/">part 2</a>) did not end yet.</p>
<p>Here&#8217;s the &#8220;best available&#8221; code that I have now:</p>
<blockquote>
<pre>inline float4 EncodeFloatRGBA( float v ) {
  return frac( float4(1.0, 255.0, 65025.0, 160581375.0) * v ) + <b>bias</b>;
}
inline float DecodeFloatRGBA( float4 rgba ) {
  return dot( rgba, float4(1.0, 1/255.0, 1/65025.0, 1/160581375.0) );
}</pre>
</blockquote>
<p><a href="http://aras-p.info/blog/2007/06/29/encoding-floats-to-rgba-redux/">Before</a> I thought that <strong>bias</strong> should be +0.5/255.0 normally, except it had to be around -0.55/255.0 on Radeon cards (older than Radeon HD series). Well, turns out I was wrong, the bias <em>mostly</em> has to be around -0.5/255.0.</p>
<p>Here&#8217;s the list (same bias on Windows/D3D9 and OS X/OpenGL, so it seems to be hardware dependent, and not something in API/drivers):</p>
<ul>
<li>Radeon 9500 to X850: -0.61/255</li>
<li>Radeon X1300 to X1900: -0.66/255</li>
<li>Radeon HD 2xxx/3xxx: -0.49/255</li>
<li>GeForce FX, 6, 7, 8: -0.48/255</li>
<li>Intel 915, 945, 965: -0.5/255</li>
</ul>
<p>Those are the best bias values I could find. Still, every once in a while (rarely) encoding the value to RGBA texture and reading it back would produce something where one channel is half a bit off. Not a problem if the numbers you were encoding were originally in the 0..1 range, but if you were encoding something that spans the whole range of the camera, for example, then the 0..1 range gets expanded into 0..FarPlane&#8230;</p>
<p>And all of a sudden there are <strong>huge</strong> precision errors, up to the point of being unusable. I just tried doing a quick&#8217;n&#8217;dirty depth of field and soft particles implementation using depth encoded this way&#8230; not good.</p>
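<p>To get a feeling for the size of such errors, here is a small CPU-side sketch in Python that mirrors EncodeFloatRGBA and DecodeFloatRGBA from above. The quantize8 function is an assumed model of what an 8 bit render target does (clamp, then round to nearest); real hardware differs, which is the whole problem described here. FAR_PLANE is a made-up value.</p>

```python
import math

FACTORS = (1.0, 255.0, 65025.0, 160581375.0)


def frac(x):
    return x - math.floor(x)


def encode_float_rgba(v, bias):
    """CPU mirror of EncodeFloatRGBA: frac(factors * v) + bias, per channel."""
    return [frac(f * v) + bias for f in FACTORS]


def quantize8(c):
    """Assumed 8 bit storage model: clamp to 0..1, then round to the nearest of 256 levels."""
    c = min(max(c, 0.0), 1.0)
    return math.floor(c * 255.0 + 0.5) / 255.0


def decode_float_rgba(rgba):
    """CPU mirror of DecodeFloatRGBA: dot(rgba, 1 / factors)."""
    return sum(c / f for c, f in zip(rgba, FACTORS))


def roundtrip(v, bias):
    """Encode, store in 8 bits per channel, read back and decode."""
    return decode_float_rgba([quantize8(c) for c in encode_float_rgba(v, bias)])


if __name__ == "__main__":
    FAR_PLANE = 1000.0  # hypothetical far plane distance
    v = 0.25
    for bias in (0.0, -0.5 / 255.0):
        err = abs(roundtrip(v, bias) - v)
        # Error in the 0..1 value, and the same error once expanded to 0..FarPlane.
        print(bias, err, err * FAR_PLANE)
```

<p>Under this rounding model, 0.25 comes back almost exactly with a bias of -0.5/255, while with no bias the first channel lands one level too high and the decoded value is off by roughly 1/255. On a 0..1 value that is small; stretched over a far plane of 1000 it is close to 4 units, which gives an idea of why depth encoded this way can become unusable when a channel ends up off.</p>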
<p>Oh well. Has anyone successfully used encoding of high precision number into RGBA channels before?</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/06/20/encoding-floats-to-rgba-again/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>OpenCL?</title>
		<link>http://aras-p.info/blog/2008/06/10/opencl/</link>
		<comments>http://aras-p.info/blog/2008/06/10/opencl/#comments</comments>
		<pubDate>Tue, 10 Jun 2008 19:27:30 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=175</guid>
		<description><![CDATA[Okay, so Apple just announced OpenCL (Open Computing Language) technology in upcoming OS X 10.6. This is starting to get interesting.
My prediction? OpenCL should be something along lines of CUDA or BrookGPU. Will work on various DX10-level graphics cards, and on the CPU. I think trying to target older graphics cards does not make sense [...]]]></description>
			<content:encoded><![CDATA[<p>Okay, so Apple just announced <a href="http://en.wikipedia.org/wiki/OpenCL">OpenCL</a> (Open Computing Language) technology in upcoming OS X 10.6. This is starting to get interesting.</p>
<p>My prediction? OpenCL should be something along the lines of <a href="http://en.wikipedia.org/wiki/CUDA">CUDA</a> or <a href="http://en.wikipedia.org/wiki/BrookGPU">BrookGPU</a>. Will work on various DX10-level graphics cards, <em>and</em> on the CPU. I think trying to target older graphics cards does not make sense &#8211; using real actual integer types is useful in general purpose computing (DX10 tech), and Apple will probably only be shipping DX10 level graphics cards in a year (at the moment only Intel cards in Macs are DX9 level; the rest is GeForce 8s and Radeon HDs). With a multithreaded CPU fallback any older machines will be taken care of anyway (and that leaves the future open for Larrabees). So yeah, quite similar to BrookGPU actually.</p>
<p>It has &#8220;open&#8221; in the title, so maybe they will make it for other platforms as well. I doubt that they will ship an implementation though; perhaps just make it royalty/patent/whatever free and publish the spec. Which is about the same level of &#8220;openness&#8221; as other technologies with &#8220;open&#8221; in their name (OpenGL, OpenAL, OpenMP, OpenCV, &#8230;) &#8211; not exactly open, but not the worst kind either.</p>
<p>Oh, and suddenly there are new uses for other technologies recently developed at Apple, like <a href="http://llvm.org/">LLVM</a> or <a href="http://clang.llvm.org/">clang</a>.</p>
<p>We&#8217;ll see how it goes.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/06/10/opencl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SwiftShader 2.0 experience</title>
		<link>http://aras-p.info/blog/2008/04/07/swiftshader-20-experience/</link>
		<comments>http://aras-p.info/blog/2008/04/07/swiftshader-20-experience/#comments</comments>
		<pubDate>Mon, 07 Apr 2008 12:05:09 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>
		<category><![CDATA[rendering]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=165</guid>
		<description><![CDATA[SwiftShader 2.0, a pure software renderer with a Direct3D 9 interface, just got released. I tried it on rendering unit tests and some benchmark tests we have for Unity.
In short, I&#8217;m impressed.
It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything else, including [...]]]></description>
			<content:encoded><![CDATA[<p>SwiftShader 2.0, a pure software renderer with a Direct3D 9 interface, <a href="http://www.transgaming.com/products/swiftshader/">just got released</a>. I tried it on rendering unit tests and some benchmark tests we have for Unity.</p>
<p>In short, I&#8217;m impressed.</p>
<p>It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything else, including shaders, shadows, render to texture works without any problems.</p>
<p>Performance wise, of course it&#8217;s dozens to hundreds of times slower than a <em>real</em> graphics card, but hey. I also tested with Intel 965 (aka GMA X3000) integrated graphics for comparison. All this on Intel Core2 Quad (Q6600), 3 GB RAM, Windows XP SP2.</p>
<ul>
<li><a href="http://unity3d.com/gallery/live-demos/avert-fate">Avert Fate demo</a>: Radeon HD 3850 about 300 FPS, SwiftShader about 5 FPS (about 15 FPS if per-pixel lighting is turned off), Intel 965 about 22 FPS (about 50 FPS if per-pixel lighting is turned off).</li>
<li>Scene with lots of objects and lots of shadow-casting lights: Radeon HD 3850 about 76 FPS, SwiftShader 2.5 FPS, Intel &#8211; <em>shadows not supported, duh</em>.</li>
<li>High detail terrain with lots of vegetation and four cameras rendering it simultaneously: Radeon HD 3850 about 68 FPS, SwiftShader about 3 FPS, Intel 965 about 12 FPS.</li>
</ul>
<p>Ok, so SwiftShader loses on performance to Intel 965, but the difference is only &#8220;a couple of times&#8221;, and not an order of magnitude or so. Pretty good I&#8217;d say.</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/04/07/swiftshader-20-experience/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What is Intel up to?</title>
		<link>http://aras-p.info/blog/2008/02/21/what-is-intel-up-to/</link>
		<comments>http://aras-p.info/blog/2008/02/21/what-is-intel-up-to/#comments</comments>
		<pubDate>Thu, 21 Feb 2008 07:59:16 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2008/02/21/what-is-intel-up-to/</guid>
		<description><![CDATA[Seriously, what are they up to? Intel acquires Offset Software, a game development studio that is doing a game and an engine. Wait, I was thinking the game and tech are for PC and Xbox360? What would Intel do with that?
Not so long ago, some well known graphics guys went to work for Intel. A [...]]]></description>
			<content:encoded><![CDATA[<p>Seriously, what are they up to? Intel <a href="http://www.projectoffset.com/news.php">acquires Offset Software</a>, a game development studio that is doing a game and an engine. Wait, I was thinking the game and tech are for PC and Xbox360? What would Intel do with that?</p>
<p>Not so long ago, some well known graphics guys <a href="http://www.beyond3d.com/content/news/557">went to work for Intel</a>. A while ago Intel <a href="http://www.beyond3d.com/content/news/534">acquired Neoptica</a>&#8230;</p>
<p>Signs of <a href="http://en.wikipedia.org/wiki/Larrabee_(GPU)">Larrabee</a> coming? Intel starting to take GPUs seriously? Something else?</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2008/02/21/what-is-intel-up-to/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Kindernoiser!</title>
		<link>http://aras-p.info/blog/2007/11/21/kindernoiser/</link>
		<comments>http://aras-p.info/blog/2007/11/21/kindernoiser/#comments</comments>
		<pubDate>Wed, 21 Nov 2007 12:39:59 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[demos]]></category>
		<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/11/21/kindernoiser/</guid>
		<description><![CDATA[I said so &#8211; 4 kilobyte intros are really getting interesting.
Meet kindernoiser &#8211; 4 kilobytes, quaternion Julia fractal on the GPU, screen space ambient occlusion and so on. iq has a nice article on the tech behind SSAO.
Keep &#8216;em coming!
]]></description>
			<content:encoded><![CDATA[<p><a href='http://aras-p.info/blog/wp-content/uploads/2007/11/kindernoiser.png' title='kindernoiser'><img class='alignright' src='http://aras-p.info/blog/wp-content/uploads/2007/11/kindernoiser.thumbnail.png' alt='kindernoiser' /></a><a href="http://aras-p.info/blog/2007/05/19/hey-4-kilobyte-intros-are-starting-to-get-interesting/">I said so</a> &#8211; 4 kilobyte intros are really getting interesting.</p>
<p>Meet <a href="http://www.pouet.net/prod.php?which=32549">kindernoiser</a> &#8211; 4 kilobytes, quaternion Julia fractal on the GPU, screen space ambient occlusion and so on. iq has a nice article on the <a href="http://rgba.scenesp.org/iq/computer/articles/ssao/ssao.htm">tech behind SSAO</a>.</p>
<p>Keep &#8216;em coming!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/11/21/kindernoiser/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Encoding floats to RGBA, redux</title>
		<link>http://aras-p.info/blog/2007/06/29/encoding-floats-to-rgba-redux/</link>
		<comments>http://aras-p.info/blog/2007/06/29/encoding-floats-to-rgba-redux/#comments</comments>
		<pubDate>Fri, 29 Jun 2007 07:58:24 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/2007/06/29/encoding-floats-to-rgba-redux/</guid>
		<description><![CDATA[Gleserg has interesting comments in my earlier post. So I thought I&#8217;d share what I am using right now, and try to throw some more complexities in :)
Here is what I am doing right now:

inline float4 EncodeFloatRGBA( float v ) {
  return frac( float4(1.0, 255.0, 65025.0, 160581375.0) * v ) + 0.5/255.0;
}
inline float DecodeFloatRGBA( [...]]]></description>
			<content:encoded><![CDATA[<p>Gleserg has interesting comments in <a href="http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba">my earlier post</a>. So I thought I&#8217;d share what I am using right now, and try to throw some more complexities in :)</p>
<p>Here is what I am doing right now:</p>
<blockquote>
<pre>inline float4 EncodeFloatRGBA( float v ) {
  // Scale by successive powers of 255 (1, 255, 255^2, 255^3), keep the fractional
  // part, and bias by half an 8-bit step before the value is stored.
  return frac( float4(1.0, 255.0, 65025.0, 16581375.0) * v ) + 0.5/255.0;
}
inline float DecodeFloatRGBA( float4 rgba ) {
  // Weight each channel by the reciprocal of the scale used when encoding.
  return dot( rgba, float4(1.0, 1.0/255.0, 1.0/65025.0, 1.0/16581375.0) );
}</pre>
</blockquote>
<p>This seems to work fine almost everywhere (see below). Why this particular form? Good question; I don&#8217;t have a solid theory about which bits end up where. I think I saw someone on the gamedev.net forums say that in hardware 0 == 0.0 and 255 == 1.0, and that values are truncated (not rounded) when written. That would mean you multiply by 255 and add half of an 8-bit step.</p>
<p>Now, the catch: the above does not quite work on Radeons (at least the X1600 that I&#8217;m mostly developing on while I&#8217;m on a Mac). Instead of adding 0.5/255.0, you have to subtract 0.55/255.0 &#8211; and that value is still not perfect, but it&#8217;s the best I could find by plowing through various combinations. I have no idea why this is needed (24-bit internal precision? does it round <em>up</em>? something else?). On GeForces and even Intel&#8217;s shader-capable hardware, the expected +0.5/255.0 value works.</p>
<p>&#8230;anyone up for working out a mathematical proof of why encoding/decoding this way actually works? :) And yes, the last component (the one scaled by 255<sup>3</sup> = 16581375) is pretty much meaningless: a 32-bit float has hardly any mantissa bits left to contribute at that scale.</p>
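<p>One way to poke at the &#8220;why does this work&#8221; question without a GPU is to model the 8-bit write on the CPU. The sketch below is such a model in Python, not the shader itself: <code>store_8bit</code> and its truncate-versus-round switch are assumptions about what the hardware might do, the last scale is taken to be 255<sup>3</sup> = 16581375, and everything runs in double precision rather than in the GPU&#8217;s float.</p>

```python
import math

# Powers of 255; the last one is assumed to be 255**3.
SCALES = (1.0, 255.0, 65025.0, 16581375.0)


def frac(x):
    """Fractional part, like HLSL frac()."""
    return x - math.floor(x)


def encode_float_rgba(v, bias):
    """CPU model of EncodeFloatRGBA with a configurable bias."""
    return tuple(frac(s * v) + bias for s in SCALES)


def store_8bit(channel, rounding):
    """Hypothetical model of writing one channel to an 8-bit target.

    The value is clamped to [0, 1], scaled by 255 and then either
    truncated or rounded to nearest. Which of the two real hardware
    does is exactly the open question in the post.
    """
    x = min(max(channel, 0.0), 1.0) * 255.0
    q = math.floor(x + 0.5) if rounding else math.floor(x)
    return min(q, 255) / 255.0


def decode_float_rgba(rgba):
    """CPU model of DecodeFloatRGBA."""
    return sum(c / s for c, s in zip(rgba, SCALES))


def max_roundtrip_error(bias, rounding, samples=10000):
    """Largest |decode(store(encode(v))) - v| over evenly spaced v in (0, 1)."""
    worst = 0.0
    for i in range(samples):
        v = (i + 0.5) / samples
        stored = tuple(store_8bit(c, rounding) for c in encode_float_rgba(v, bias))
        worst = max(worst, abs(decode_float_rgba(stored) - v))
    return worst


if __name__ == "__main__":
    for bias in (0.0, 0.5 / 255.0, -0.5 / 255.0):
        for rounding in (False, True):
            err = max_roundtrip_error(bias, rounding)
            print(f"bias={bias * 255.0:+.2f}/255 rounding={rounding}: max error {err:.2e}")
```

<p>In this model a store that truncates needs no bias at all: each channel keeps floor(255 * frac(255^i * v)), which is one base-255 digit of v, and the decode sum simply adds those digits back up. A store that rounds to nearest behaves the same once 0.5/255.0 is subtracted, which at least points in the same direction as the Radeon observation. Whether any real GPU matches either model is not something this sketch can tell.</p>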
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/06/29/encoding-floats-to-rgba-redux/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A day well spent (encoding floats to RGBA)</title>
		<link>http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba/</link>
		<comments>http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba/#comments</comments>
		<pubDate>Sat, 03 Mar 2007 16:33:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=103</guid>
		<description><![CDATA[Breaking news: sometimes seemingly trivial tasks take insane amounts of time! I am sure no one knew this before! So it was yesterday &#8211; almost a whole day spent fighting rounding/precision errors when encoding floating point numbers into regular 8 bit RGBA textures. You know, the trivial stuff where you start with

inline float4 EncodeFloatRGBA( float v [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify"><a href="http://aras-p.info/blog/wp-content/uploads/2007/03/rgba01.png" title="RGBA encoding 01"><img src="http://aras-p.info/blog/wp-content/uploads/2007/03/rgba01.thumbnail.png" class="alignright" alt="RGBA encoding 01" /></a>Breaking news: sometimes seemingly trivial tasks take insane amounts of time! I am <span style="font-style: italic">sure</span> no one knew this before! So it was yesterday &#8211; almost a whole day spent fighting rounding/precision errors when encoding floating point numbers into regular 8 bit RGBA textures. You know, the trivial stuff where you start with</p>
<blockquote>
<pre>inline float4 EncodeFloatRGBA( float v ) {
  return frac( float4(1.0, 256.0, 65536.0, 16777216.0) * v );
}
inline float DecodeFloatRGBA( float4 rgba ) {
  return dot( rgba, float4(1.0, 1.0/256.0, 1.0/65536.0, 1.0/16777216.0) );
}</pre>
</blockquote>
<p>and everything is fine until sometimes, somewhere, there&#8217;s &#8220;something wrong&#8221;. It must be rounding or quantization errors; or maybe I should use 255 instead of 256; plus optionally add or subtract 0.5/256.0 (or would that be 0.5/255.0?). Or maybe the error is somewhere else entirely, and I&#8217;m just chasing ghosts here!</p>
<p><a href="http://aras-p.info/blog/wp-content/uploads/2007/03/rgba02.png" title="RGBA encoding 02"><img class="alignright" src="http://aras-p.info/blog/wp-content/uploads/2007/03/rgba02.thumbnail.png" alt="RGBA encoding 02" /></a>What would you do then? Why, of course, build an Encoding Floats Into Textures Studio 2007! <span style="font-style: italic">(don&#8217;t tell me it&#8217;s not a great idea for a commercial software package! game studios would pay insane amounts of money for a tool like this!)</span> The images here are exactly that &#8211; render into a texture, encoding UV coordinate as RGBA, then read from that texture, displaying RGBA and error from the expected value in some weird way. Turns out image postprocessing filters in Unity are a pretty good tool to do all this. Yay!</p>
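<p>The experiment described above can be roughly approximated on the CPU. Below is a Python sketch of the idea, not the Unity filter itself: it encodes the U coordinate of every pixel centre on one scanline, pushes each channel through a hypothetical truncating 8-bit write (<code>store_8bit</code>, an assumption about the hardware), decodes, and reports the error against the expected value. The <code>base</code> parameter switches between the 256-based constants above and a 255-based variant.</p>

```python
import math


def frac(x):
    """Fractional part, like HLSL frac()."""
    return x - math.floor(x)


def store_8bit(channel):
    """Assumed 8-bit write: clamp to [0, 1], scale by 255, truncate."""
    x = min(max(channel, 0.0), 1.0) * 255.0
    return math.floor(x) / 255.0


def scanline_errors(width, base):
    """Decode error for the U coordinate of each pixel centre on one row."""
    scales = [float(base) ** i for i in range(4)]
    errors = []
    for x in range(width):
        u = (x + 0.5) / width
        encoded = [frac(s * u) for s in scales]               # EncodeFloatRGBA
        stored = [store_8bit(c) for c in encoded]             # write to RGBA8
        decoded = sum(c / s for c, s in zip(stored, scales))  # DecodeFloatRGBA
        errors.append(decoded - u)
    return errors


if __name__ == "__main__":
    for base in (256, 255):
        worst = max(abs(e) for e in scanline_errors(256, base))
        print(f"base {base}: max |error| = {worst:.2e}")
```

<p>Under these assumptions the 255-based variant round-trips almost exactly, while the 256-based one is off by up to about half an 8-bit step on a 256-pixel row: a truncating write keeps floor(255 * c), which is a base-255 digit rather than a base-256 one. Real hardware may well behave differently, which is the whole reason for testing on the GPU.</p>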
<p><a href="http://aras-p.info/blog/wp-content/uploads/2007/03/rgba03.png" title="RGBA encoding 03"><img class="alignright" src="http://aras-p.info/blog/wp-content/uploads/2007/03/rgba03.thumbnail.png" alt="RGBA encoding 03" /></a>Sometimes in situations like this I figure out that graphics hardware still leaves a lot to be desired. This last image shows some calculations that depend <span style="font-style: italic">only</span> on the horizontal UV coordinate, so they should produce some purely vertical pattern (sans the part at the bottom, that is expected to be different). Heh, you wish!</p>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Speculation: pipelining geometry shaders</title>
		<link>http://aras-p.info/blog/2005/12/22/speculation-pipelining-geometry-shaders/</link>
		<comments>http://aras-p.info/blog/2005/12/22/speculation-pipelining-geometry-shaders/#comments</comments>
		<pubDate>Thu, 22 Dec 2005 12:06:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=82</guid>
		<description><![CDATA[A followup to the older &#8220;discussion&#8221; about how/why geometry shaders would be okay/slow:
The graphics hardware has been quite successful so far at hiding memory latencies (e.g. when sampling textures). It does so (according to my understanding) by having a looong pixel pipeline, where hundreds (or thousands) of pixels might be at one or another processing stage. [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">A followup to the older &#8220;<a href="http://aras-p.info/blog/2005/12/16/reading-dx10-docs/">discussion</a>&#8221; about how/why geometry shaders would be okay/slow:</p>
<p>The graphics hardware has been quite successful so far at hiding memory latencies (e.g. when sampling textures). It does so (according to my understanding) by having a looong pixel pipeline, where hundreds (or thousands) of pixels might be at one or another processing stage. ATI talks about this in big letters (<a href="http://www.beyond3d.com/reviews/ati/r520/">R520 dispatch processor</a>) and speculation suggests that the GeForce FX had something like that (<a href="http://www.extremetech.com/article2/0,3973,710337,00.asp">article</a>). I have no idea about the older cards, but presumably they did something similar as well.</p>
<p>I am not sure how the vertex texture fetches are pipelined &#8211; the pretty slow performance on GeForce 6/7 suggests that they aren&#8217;t :) Probably vertex shaders in current cards operate in a simpler way &#8211; just fetch the vertices and run the whole shader on each of them (in contrast to pixel shaders, which seem to run just several instructions, then go to other pixels, come back, etc.).</p>
<p>With DX10, we have arbitrary memory fetches in any stage of the pipeline. Even the boundary between different fetch types is somewhat blurry (constant buffers vs. arbitrary buffers vs. textures) &#8211; perhaps they will differ only in bandwidth/latency (e.g. constant buffers live near the GPU while textures live in video memory).</p>
<p>So, with arbitrary memory fetches anywhere (and some of them being high latency), everything needs to have long pipelines (again, just my guess). This is all great, but the longer the pipeline, the worse it performs in unfriendly scenarios: a pipeline flush is more expensive, drawing just a couple of &#8220;things&#8221; (primitives, vertices, pixels) is inefficient, etc.</p>
<p>I guess we&#8217;ll just learn a new set of performance rules for tomorrow&#8217;s hardware!</p>
<p>Back to GS pipelining: I imagine the &#8220;slow&#8221; scenarios would look like this: vertices have shaders with dynamic branches or memory fetches that differ vastly in execution length &#8211; so the GS has to wait for all vertex shaders of the current primitive (optionally plus topology) to finish; and then each GS has dynamic branches or memory fetches of its own, and outputs a different number of primitives to the rasterizer. If I were hardware, I&#8217;d be scared :)
</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/12/22/speculation-pipelining-geometry-shaders/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More HDR woes</title>
		<link>http://aras-p.info/blog/2005/11/02/more-hdr-woes/</link>
		<comments>http://aras-p.info/blog/2005/11/02/more-hdr-woes/#comments</comments>
		<pubDate>Wed, 02 Nov 2005 09:09:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=76</guid>
		<description><![CDATA[I&#8217;m still spending an occasional minute on my HDR demo. Now, everything is fine so far, except one thing: I can&#8217;t get MSAA working on some Radeons (and I don&#8217;t have a Radeon right now, which makes debugging a lot harder). The main point of my demo is to have MSAA on ordinary hw, so [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">I&#8217;m still spending an occasional minute on my <a href="http://aras-p.info/blog/2005/10/23/jumped-onto-hdr-bandwagon">HDR demo</a>. Now, everything is fine so far, except one thing: I can&#8217;t get MSAA working on some Radeons (and I don&#8217;t have a Radeon right now, which makes debugging a lot harder). The main point of my demo is to have MSAA on ordinary hw, so this is bad.</p>
<p>The reason seems to be that on older Radeons <a href="http://www.beyond3d.com/forum/showthread.php?p=611933#post611933">MSAA does not resolve the alpha channel</a>, which obviously messes things up in my case. I&#8217;m using RGBE8 encoding for the main rendertarget, so if RGB gets MSAA&#8217;d and the exponent does not, then oh well: no good anti-aliasing most of the time.</p>
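<p>To see why an unresolved exponent hurts, here is a small CPU-side illustration in Python. The RGBE layout is an assumption (a Ward-style shared exponent stored with a bias of 128; the demo&#8217;s actual encoding may differ), and <code>resolve_rgb_only</code> is a hypothetical model of a resolve that averages RGB but simply keeps the first sample&#8217;s alpha.</p>

```python
import math


def encode_rgbe8(r, g, b):
    """Pack a linear HDR colour into four bytes (assumed Ward-style RGBE)."""
    m = max(r, g, b)
    if m < 1e-32:
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(m)  # m == mantissa * 2**exponent
    scale = mantissa * 256.0 / m
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)


def decode_rgbe8(rgbe):
    """Unpack four bytes back into a linear HDR colour."""
    r, g, b, e = rgbe
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - (128 + 8))
    return (r * f, g * f, b * f)


def resolve_rgb_only(samples):
    """Hypothetical resolve: average RGB bytes, keep the first sample's exponent."""
    n = len(samples)
    r = sum(s[0] for s in samples) // n
    g = sum(s[1] for s in samples) // n
    b = sum(s[2] for s in samples) // n
    return (r, g, b, samples[0][3])


def resolve_reference(colours):
    """What the edge pixel should be: the average of the linear HDR colours."""
    n = len(colours)
    return tuple(sum(c[i] for c in colours) / n for i in range(3))


if __name__ == "__main__":
    bright, dark = (6.0, 6.0, 6.0), (0.25, 0.25, 0.25)  # two samples on an edge
    encoded = [encode_rgbe8(*bright), encode_rgbe8(*dark)]
    print("reference:", resolve_reference([bright, dark]))
    print("rgb-only :", decode_rgbe8(resolve_rgb_only(encoded)))
```

<p>With these two samples the reference edge pixel is 3.125, while the RGB-only resolve decodes to 5.0 (or to about 0.31 if the dark sample&#8217;s exponent is the one that survives), so the edge snaps towards one side instead of blending.</p>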
<p>Of course I could always manually supersample everything, but this would defeat the whole point of the demo. Or I could render everything in two passes, one for RGB and one for exponent &#8211; but this also is not very nice&#8230;</p>
<p>Probably I&#8217;ll just release the demo as it is now and wait for possible feedback. Or dig up an old Radeon somewhere and debug more &#8211; but replacing the video card in my Shuttle XPC is not an easy task :)
</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/11/02/more-hdr-woes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Jumped onto HDR bandwagon</title>
		<link>http://aras-p.info/blog/2005/10/23/jumped-onto-hdr-bandwagon/</link>
		<comments>http://aras-p.info/blog/2005/10/23/jumped-onto-hdr-bandwagon/#comments</comments>
		<pubDate>Sun, 23 Oct 2005 16:54:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[demos]]></category>
		<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=74</guid>
		<description><![CDATA[I&#8217;m doing a small HDR demo for fun. Nothing fancy &#8211; linear gamma, Reinhard&#8217;s tone mapping and whatnot &#8211; everyone does that. But the thing I made so far does not even look good! :)
I&#8217;m trying to support both HDR and FSAA at the same time on ordinary DX9 hardware (no Radeons 1k) by using [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">I&#8217;m doing a small HDR demo for fun. Nothing fancy &#8211; linear gamma, Reinhard&#8217;s tone mapping and whatnot &#8211; everyone does that. But the thing I made so far does not even look good! :)</p>
<p>I&#8217;m trying to support both HDR and FSAA at the same time on ordinary DX9 hardware (so not the Radeon X1k series) by using an RGBE8 rendertarget for the main scene. It&#8217;s all okay so far.</p>
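<p>For reference, the tone mapping step mentioned above can be as small as this. It is a minimal Python sketch, assuming the simplest global form of Reinhard&#8217;s operator (L / (1 + L)) applied to luminance, Rec. 709 luminance weights, and a plain 2.2 gamma for display; the demo itself may use a different variant, and all function names are made up for illustration.</p>

```python
def luminance(rgb):
    """Relative luminance of a linear RGB colour (Rec. 709 weights)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def reinhard(lum):
    """Simple global Reinhard curve: maps [0, inf) into [0, 1)."""
    return lum / (1.0 + lum)


def tonemap(rgb, exposure=1.0, gamma=2.2):
    """Scale a non-negative linear colour so its luminance follows the
    Reinhard curve, clamp to 1, then gamma-encode for display."""
    lum = luminance(rgb) * exposure
    if lum <= 0.0:
        return (0.0, 0.0, 0.0)
    k = exposure * reinhard(lum) / lum  # per-pixel scale factor
    return tuple(min(c * k, 1.0) ** (1.0 / gamma) for c in rgb)


if __name__ == "__main__":
    for colour in [(0.05, 0.05, 0.05), (1.0, 1.0, 1.0), (20.0, 10.0, 5.0)]:
        print(colour, "->", tonemap(colour))
```

<p>The curve compresses bright values hard while leaving dark ones nearly untouched, which is part of why getting an HDR scene to actually look good takes more tuning than the formula suggests.</p>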
<p>The most difficult task right now is making it look good. Once I have that I&#8217;ll post the results.
</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/10/23/jumped-onto-hdr-bandwagon/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The video cards are damn fast</title>
		<link>http://aras-p.info/blog/2005/02/05/the-video-cards-are-damn-fast/</link>
		<comments>http://aras-p.info/blog/2005/02/05/the-video-cards-are-damn-fast/#comments</comments>
		<pubDate>Sat, 05 Feb 2005 11:55:00 +0000</pubDate>
		<dc:creator>Aras Pranckevičius</dc:creator>
				<category><![CDATA[gpu]]></category>

		<guid isPermaLink="false">http://aras-p.info/blog/?p=9</guid>
		<description><![CDATA[I was working on our next demo the other day. Boy, the video cards are damn fast nowadays!
We have a high-poly model for the main character (~200k tris), for the demo we use low-poly (~6500 tris) and a normalmap. Now, I&#8217;ve put 128 lights scattered on the hemisphere above him, each using shadow buffer. I [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">I was working on our next demo the other day. Boy, the video cards are damn fast nowadays!</p>
<p>We have a high-poly model for the main character (~200k tris); for the demo we use a low-poly version (~6500 tris) and a normalmap. Now, I&#8217;ve put 128 lights scattered on the hemisphere above him, each using a shadow buffer. I have 4 shadow buffers, so I render into these from four lights, then render the character, fetching shadows from the four shadowmaps at once. The result is that it&#8217;s almost realtime ambient occlusion for the animated character, and it runs at ~40FPS on my GeForce 6800 GT!</p>
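<p>The frame structure described above can be sketched as a simple schedule. This is a Python illustration only: the pass names are made up, and it assumes that every shadow pass draws the low-poly character once and that each character pass adds the contribution of its four lights to the running result.</p>

```python
def plan_frame(num_lights=128, num_buffers=4):
    """List the render passes for one frame, four lights at a time."""
    passes = []
    for first in range(0, num_lights, num_buffers):
        batch = list(range(first, min(first + num_buffers, num_lights)))
        # Fill each shadow buffer from one light of the current batch.
        for slot, light in enumerate(batch):
            passes.append(("shadow", light, slot))
        # Draw the character once, sampling all shadow buffers of the batch.
        passes.append(("character", tuple(batch)))
    return passes


def triangles_per_second(passes, tris_per_draw=6500, fps=40):
    """Rough triangle throughput if every pass draws the low-poly mesh once."""
    return len(passes) * tris_per_draw * fps


if __name__ == "__main__":
    passes = plan_frame()
    shadow = sum(1 for p in passes if p[0] == "shadow")
    lit = sum(1 for p in passes if p[0] == "character")
    print(shadow, "shadow passes,", lit, "character passes")
    print(triangles_per_second(passes), "triangles per second")
```

<p>That works out to 128 shadow passes plus 32 lit passes per frame, or roughly 41.6 million triangles per second at ~40FPS under these assumptions.</p>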
<p>This is of course pretty useless, we don&#8217;t need realtime AO in the demo. But it has been nice :)</p></div>
]]></content:encoded>
			<wfw:commentRss>http://aras-p.info/blog/2005/02/05/the-video-cards-are-damn-fast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
