<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Don&#8217;t try to outsmart the compiler</title>
	<atom:link href="http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/feed/" rel="self" type="application/rss+xml" />
	<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/</link>
	<description>Random thoughts of a triangle pusher</description>
	<lastBuildDate>Thu, 09 Feb 2012 07:56:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Aras Pranckevičius</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-23192</link>
		<dc:creator>Aras Pranckevičius</dc:creator>
		<pubDate>Thu, 26 Nov 2009 06:26:56 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-23192</guid>
		<description>@Tom: first of all, it&#039;s an outer loop, and it is very likely that the inner loop is much heavier than any &quot;loop overhead&quot; of the outer one.

And of course the loop is over integers, so doing &quot;int hh = height*0.5&quot; would definitely be slower (it involves converting the int to floating point, a multiplication, and converting back to int). And since these are integers, any decent compiler will compile division by two as a right shift by one bit (plus a small sign correction when the value is a signed int).

Does it make sense to move &quot;height/2&quot; out of the loop? Maybe. &lt;i&gt;In theory&lt;/i&gt; the compiler could detect that &quot;height/2&quot; never changes for the whole loop, so it &lt;i&gt;could&lt;/i&gt; hoist it automatically. But I haven&#039;t verified whether MSVC/GCC actually do that.</description>
		<content:encoded><![CDATA[<p>@Tom: first of all, it&#8217;s an outer loop, and it is very likely that the inner loop is much heavier than any &#8220;loop overhead&#8221; of the outer one.</p>
<p>And of course the loop is over integers, so doing &#8220;int hh = height*0.5&#8221; would definitely be slower (it involves converting the int to floating point, a multiplication, and converting back to int). And since these are integers, any decent compiler will compile division by two as a right shift by one bit (plus a small sign correction when the value is a signed int).</p>
<p>Does it make sense to move &#8220;height/2&#8221; out of the loop? Maybe. <i>In theory</i> the compiler could detect that &#8220;height/2&#8221; never changes for the whole loop, so it <i>could</i> hoist it automatically. But I haven&#8217;t verified whether MSVC/GCC actually do that.</p>
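<p>A minimal sketch of such a loop with the bound hoisted by hand, assuming the loop flips an image vertically by swapping row <code>y</code> with row <code>height-1-y</code> (an assumption; the post&#8217;s exact code is not reproduced here). The name <code>flipVertically</code> and the tightly packed <code>rowBytes</code> layout are hypothetical. The bound stays in integer arithmetic and is computed once:</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: flips a tightly packed image upside down in place.
// 'pixels' holds 'height' rows of 'rowBytes' bytes each (assumed layout).
void flipVertically(unsigned char* pixels, int height, std::size_t rowBytes)
{
    assert(height >= 0);
    // Integer division, computed once before the loop.
    const int halfHeight = height / 2;
    for (int y = 0; y < halfHeight; ++y) {
        unsigned char* top = pixels + static_cast<std::size_t>(y) * rowBytes;
        unsigned char* bottom =
            pixels + static_cast<std::size_t>(height - 1 - y) * rowBytes;
        // Inner loop: swap the two rows with plain element-wise swaps.
        std::swap_ranges(top, top + rowBytes, bottom);
    }
}
```

<p>With an odd height the middle row is left untouched, which is what a vertical flip needs.</p>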
]]></content:encoded>
	</item>
	<item>
		<title>By: TomLong74</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-23190</link>
		<dc:creator>TomLong74</dc:creator>
		<pubDate>Thu, 26 Nov 2009 05:45:51 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-23190</guid>
		<description>Please forgive me if my comment is so obvious OR wrong that it shouldn&#039;t be mentioned.
I&#039;m not a hard core coder by any means and maybe I&#039;m missing 90% of what you&#039;re saying,
but I have to ask why, in a post about speed, is your outer loop dividing by 2 when you could
multiply by 0.5 and speed up the outer loop as well? Sure, the thought exercise was the inner
loop, but... 

this...
for( int y = 0; y &lt; height/2; ++y ) {...}

should be...
for( int y = 0; y &lt; height*.5; ++y ) {...}

or better yet maybe...
int hh = height*.5;
for( int y = 0; y &lt; hh; ++y ) {...}</description>
		<content:encoded><![CDATA[<p>Please forgive me if my comment is so obvious OR wrong that it shouldn&#8217;t be mentioned.<br />
I&#8217;m not a hard core coder by any means and maybe I&#8217;m missing 90% of what you&#8217;re saying,<br />
but I have to ask why, in a post about speed, is your outer loop dividing by 2 when you could<br />
multiply by 0.5 and speed up the outer loop as well? Sure, the thought exercise was the inner<br />
loop, but&#8230; </p>
<p>this&#8230;<br />
for( int y = 0; y &lt; height/2; ++y ) {&#8230;}</p>
<p>should be&#8230;<br />
for( int y = 0; y &lt; height*.5; ++y ) {&#8230;}</p>
<p>or better yet maybe&#8230;<br />
int hh = height*.5;<br />
for( int y = 0; y &lt; hh; ++y ) {&#8230;}</p>
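<p>A compilable sketch of the three loop bounds suggested above, each wrapped in a hypothetical counting function. They agree for even heights, but <code>height*.5</code> is a <code>double</code>, so <code>y &lt; height*.5</code> compares against 2.5 when <code>height</code> is 5 and runs one extra iteration; truncating into <code>int hh</code> restores the integer result:</p>

```cpp
#include <cassert>

// Hypothetical helpers that count how many times each suggested loop runs.
int countDivide(int height)    // for (int y = 0; y < height/2; ++y)
{
    int n = 0;
    for (int y = 0; y < height / 2; ++y) ++n;
    return n;
}

int countMultiply(int height)  // for (int y = 0; y < height*.5; ++y)
{
    int n = 0;
    // y is converted to double for every comparison against height * .5.
    for (int y = 0; y < height * .5; ++y) ++n;
    return n;
}

int countHoisted(int height)   // int hh = height*.5; for (...; y < hh; ...)
{
    const int hh = static_cast<int>(height * .5); // double -> int truncates
    int n = 0;
    for (int y = 0; y < hh; ++y) ++n;
    return n;
}
```

<p>For example, with <code>height</code> = 5 the first and third versions run 2 times and the second runs 3 times.</p>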
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert 'Groby' Blum</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-16007</link>
		<dc:creator>Robert 'Groby' Blum</dc:creator>
		<pubDate>Fri, 02 Jan 2009 20:26:10 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-16007</guid>
		<description>At the same time, I&#039;d advise not to trust the STL blindly without measuring either. Some STL containers (multimap being my favorite) either generate atrocious code or, if you crank optimizations up high enough, generate decent code but take forever to compile and bloat the symbols the linker has to process...

My money&#039;s on memswapI in the sense that it is more predictable performance-wise.</description>
		<content:encoded><![CDATA[<p>At the same time, I&#8217;d advise not to trust the STL blindly without measuring either. Some STL containers (multimap being my favorite) either generate atrocious code or, if you crank optimizations up high enough, generate decent code but take forever to compile and bloat the symbols the linker has to process&#8230;</p>
<p>My money&#8217;s on memswapI in the sense that it is more predictable performance-wise.</p>
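<p>In the spirit of measuring rather than trusting, a minimal timing sketch using <code>std::chrono::steady_clock</code>. The helper <code>timeMicroseconds</code> and the two candidate functions are hypothetical; actual numbers depend on compiler, flags and hardware, so only the measurement approach is shown:</p>

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: runs 'fn' once and returns the elapsed time in microseconds.
template <typename Fn>
long long timeMicroseconds(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Candidate 1: hand-written swap of two equal-length byte buffers.
void swapWithTemp(std::vector<unsigned char>& a, std::vector<unsigned char>& b)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const unsigned char t = a[i];
        a[i] = b[i];
        b[i] = t;
    }
}

// Candidate 2: the same operation via the standard library.
void swapWithStl(std::vector<unsigned char>& a, std::vector<unsigned char>& b)
{
    std::swap_ranges(a.begin(), a.end(), b.begin());
}
```

<p>A single run is noisy; in practice each candidate would be timed many times on realistic data and the fastest or median result compared.</p>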
]]></content:encoded>
	</item>
	<item>
		<title>By: Aras Pranckevičius</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15597</link>
		<dc:creator>Aras Pranckevičius</dc:creator>
		<pubDate>Fri, 12 Dec 2008 07:44:49 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15597</guid>
		<description>@christer: yeah. I can think of perhaps one situation where it could be used: you&#039;re in some inner loop, &lt;em&gt;and&lt;/em&gt; for some reason you need to swap the contents of two registers, &lt;em&gt;and&lt;/em&gt; you don&#039;t have any spare register left, &lt;em&gt;and&lt;/em&gt; the CPU does not have a fast exchange instruction. You&#039;d probably have to wait centuries for that situation to appear... in other words, no, not useful in practice.</description>
		<content:encoded><![CDATA[<p>@christer: yeah. I can think of perhaps one situation where it could be used: you&#8217;re in some inner loop, <em>and</em> for some reason you need to swap the contents of two registers, <em>and</em> you don&#8217;t have any spare register left, <em>and</em> the CPU does not have a fast exchange instruction. You&#8217;d probably have to wait centuries for that situation to appear&#8230; in other words, no, not useful in practice.</p>
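<p>For reference, a sketch of the XOR swap itself on two integer variables (the name <code>xorSwap</code> is hypothetical). In C++ source the compiler, not the programmer, decides which values live in registers, so this only illustrates the arithmetic. The self-swap guard matters because <code>x ^ x</code> is 0:</p>

```cpp
#include <cassert>

// XOR swap: exchanges two values without a temporary variable.
void xorSwap(unsigned& a, unsigned& b)
{
    if (&a == &b)
        return;   // a ^= a would zero the value, so skip a self-swap
    a ^= b;       // a = a0 ^ b0
    b ^= a;       // b = b0 ^ (a0 ^ b0) = a0
    a ^= b;       // a = (a0 ^ b0) ^ a0 = b0
}
```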
]]></content:encoded>
	</item>
	<item>
		<title>By: christer ericson</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15596</link>
		<dc:creator>christer ericson</dc:creator>
		<pubDate>Fri, 12 Dec 2008 07:38:53 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15596</guid>
		<description>The xor trick is nifty, in an abstract sort of way, but it has basically never been useful. Not even in the 1970s. The use of it today is 99% an indication of cluelessness more than anything else!</description>
		<content:encoded><![CDATA[<p>The xor trick is nifty, in an abstract sort of way, but it has basically never been useful. Not even in the 1970s. The use of it today is 99% an indication of cluelessness more than anything else!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aras Pranckevičius</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15548</link>
		<dc:creator>Aras Pranckevičius</dc:creator>
		<pubDate>Mon, 08 Dec 2008 18:24:05 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15548</guid>
		<description>@hcpizzi: well, in that case the compiler is not that smart. I tried adding __restrict in VS2008, and the XOR test still does three memory reads and three XORs into memory. The linker even collapsed both versions (with __restrict and without) into a single function.</description>
		<content:encoded><![CDATA[<p>@hcpizzi: well, in that case the compiler is not that smart. I tried adding __restrict in VS2008, and the XOR test still does three memory reads and three XORs into memory. The linker even collapsed both versions (with __restrict and without) into a single function.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hcpizzi</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15547</link>
		<dc:creator>hcpizzi</dc:creator>
		<pubDate>Mon, 08 Dec 2008 18:02:00 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15547</guid>
		<description>I was talking about the XOR version. With restrict it should also end up being just two reads and two writes, plus the XORs; that&#039;s why I said it could never be faster. Just curiosity, as I said before.</description>
		<content:encoded><![CDATA[<p>I was talking about the XOR version. With restrict it should also end up being just two reads and two writes, plus the XORs; that&#8217;s why I said it could never be faster. Just curiosity, as I said before.</p>
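<p>The &#8220;two reads and two writes, plus the XORs&#8221; shape can be written out by hand, as in this sketch (the name <code>xorSwapBytes</code> is hypothetical). A compiler cannot make this transformation of <code>*p ^= *q; *q ^= *p; *p ^= *q;</code> on its own unless it knows <code>p</code> and <code>q</code> are distinct, because the in-memory form zeroes the byte when <code>p == q</code>. Even here, each XOR still waits on the previous one:</p>

```cpp
#include <cassert>

// Hypothetical helper: XOR-swaps the bytes at p and q through locals.
// Two loads, three dependent XORs, two stores.
void xorSwapBytes(unsigned char* p, unsigned char* q)
{
    unsigned char a = *p;   // read 1
    unsigned char b = *q;   // read 2
    a ^= b;                 // each XOR needs the previous result,
    b ^= a;                 // so these three form a serial
    a ^= b;                 // dependency chain
    *p = a;                 // write 1
    *q = b;                 // write 2
}
```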
]]></content:encoded>
	</item>
	<item>
		<title>By: Aras Pranckevičius</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15544</link>
		<dc:creator>Aras Pranckevičius</dc:creator>
		<pubDate>Mon, 08 Dec 2008 12:55:36 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15544</guid>
		<description>@hcpizzi: well, I don&#039;t see how __restrict would improve the loop in any significant way. To swap two bytes in memory, one still needs to read them and write them back (two reads and two writes).

@ReJ: yeah, I was thinking the same. XOR swap is slow on &quot;regular&quot; desktop platforms, and on some others it could be even worse.</description>
		<content:encoded><![CDATA[<p>@hcpizzi: well, I don&#8217;t see how __restrict would improve the loop in any significant way. To swap two bytes in memory, one still needs to read them and write them back (two reads and two writes).</p>
<p>@ReJ: yeah, I was thinking the same. XOR swap is slow on &#8220;regular&#8221; desktop platforms, and on some others it could be even worse.</p>
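<p>For comparison, a sketch of the ordinary swap of two bytes in memory (the names <code>swapBytes</code> and <code>swapBytesStd</code> are hypothetical). It performs exactly the two reads and two writes mentioned above, the two reads do not depend on each other, and there is no arithmetic at all; <code>std::swap</code> expresses the same operation:</p>

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: swaps the bytes at p and q with temporaries.
void swapBytes(unsigned char* p, unsigned char* q)
{
    const unsigned char a = *p;   // read 1
    const unsigned char b = *q;   // read 2 (independent of read 1)
    *p = b;                       // write 1
    *q = a;                       // write 2
}

// The same thing, written with the standard library.
void swapBytesStd(unsigned char* p, unsigned char* q)
{
    std::swap(*p, *q);
}
```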
]]></content:encoded>
	</item>
	<item>
		<title>By: ReJ</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15543</link>
		<dc:creator>ReJ</dc:creator>
		<pubDate>Mon, 08 Dec 2008 12:45:17 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15543</guid>
		<description>The XOR version is really evil. In &quot;*p ^= *q; *q ^= *p; *p ^= *q;&quot; each statement depends on the result of the previous one. That should be a nightmare for in-order CPUs.</description>
		<content:encoded><![CDATA[<p>The XOR version is really evil. In &#8220;*p ^= *q; *q ^= *p; *p ^= *q;&#8221; each statement depends on the result of the previous one. That should be a nightmare for in-order CPUs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hcpizzi</title>
		<link>http://aras-p.info/blog/2008/12/06/dont-try-to-outsmart-the-compiler/comment-page-1/#comment-15541</link>
		<dc:creator>hcpizzi</dc:creator>
		<pubDate>Mon, 08 Dec 2008 10:45:23 +0000</pubDate>
		<guid isPermaLink="false">http://aras-p.info/blog/?p=245#comment-15541</guid>
		<description>This looks like the typical scenario for restrict. Did you try that? Just out of curiosity, because it&#039;s never going to be faster.</description>
		<content:encoded><![CDATA[<p>This looks like the typical scenario for restrict. Did you try that? Just out of curiosity, because it&#8217;s never going to be faster.</p>
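<p>A sketch of the suggestion applied to swapping two rows with the XOR trick (the name <code>xorSwapRows</code> is hypothetical). <code>__restrict</code> is a compiler extension, not standard C++, accepted by MSVC, GCC and Clang; it promises that the two ranges never overlap, which permits the compiler to keep <code>a[i]</code> and <code>b[i]</code> in registers instead of re-reading memory after each XOR. Whether a given compiler actually does so has to be checked in its output:</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: XOR-swaps two non-overlapping byte ranges of length n.
// Calling it with overlapping ranges breaks the __restrict promise.
void xorSwapRows(unsigned char* __restrict a, unsigned char* __restrict b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        a[i] ^= b[i];
        b[i] ^= a[i];
        a[i] ^= b[i];
    }
}
```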
]]></content:encoded>
	</item>
</channel>
</rss>

