Archive for 'gpu'

Speculation: pipelining geometry shaders

A followup to the older “discussion” about how/why geometry shaders would be okay/slow:

Graphics hardware has been quite successful so far at hiding memory latencies (e.g. when sampling textures). It does so (according to my understanding) by having a looong pixel pipeline, where hundreds (or thousands) of pixels might be at one processing stage or another. ATI talks about this in big letters (R520 dispatch processor), and speculation suggests that GeForceFX had something like that (article). I have no idea about the older cards, but presumably they did something similar as well.
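To put a rough number on "hundreds (or thousands)", here is a back-of-the-envelope sketch. It only assumes that the pipeline must keep enough pixels in flight to cover one texture fetch; the latency and throughput figures are made up for illustration and are not taken from any real card.

```python
def pixels_in_flight(fetch_latency_cycles, pixels_per_cycle):
    """Pixels that must be in the pipeline to hide one memory fetch.

    While one pixel waits `fetch_latency_cycles` for its texel, the
    hardware keeps issuing `pixels_per_cycle` other pixels every cycle.
    """
    return fetch_latency_cycles * pixels_per_cycle


# Illustrative numbers only (assumptions, not measurements):
# a 200-cycle texture fetch on a part that shades 16 pixels per cycle.
print(pixels_in_flight(200, 16))
```

Even with modest guesses the count lands in the thousands, which fits the picture of a very deep pixel pipeline.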

I am not sure how vertex texture fetches are pipelined – the pretty slow performance on GeForce6/7 suggests that they aren’t :) Probably vertex shaders in current cards operate in a simpler way – just fetch the vertices and run whole shaders on them (in contrast to pixel shaders, which seem to run just several instructions, then move on to other pixels, come back, etc.).

With DX10, we have arbitrary memory fetches in any stage of the pipeline. Even the boundary between different fetch types is somewhat blurry (constant buffers vs. arbitrary buffers vs. textures) – perhaps they will differ only in bandwidth/latency (e.g. constant buffers live near the GPU while textures live in video memory).

So, with arbitrary memory fetches anywhere (and some of them being high latency), everything needs to have long pipelines (again, just my guess). This is all great, but the longer the pipeline, the worse it performs in unfriendly scenarios: a pipeline flush is more expensive, drawing just a couple of “things” (primitives, vertices, pixels) is inefficient, etc.
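Why small draws hurt on a long pipeline can be seen from a toy model. This sketch assumes a pipeline that needs a fixed number of cycles to fill, then retires a fixed number of items per cycle; the function name and all numbers are illustrative assumptions, not a description of real hardware.

```python
import math


def pipeline_utilization(items, pipeline_depth_cycles, items_per_cycle):
    """Fraction of total time spent doing useful work in a toy pipeline.

    Total time is modelled as the fill latency (pipeline depth) plus the
    cycles needed to push all items through at full rate.
    """
    busy_cycles = math.ceil(items / items_per_cycle)
    total_cycles = pipeline_depth_cycles + busy_cycles
    return busy_cycles / total_cycles


# Assumed 200-cycle-deep pipeline that retires 16 items per cycle.
for count in (16, 1_000, 1_000_000):
    print(count, round(pipeline_utilization(count, 200, 16), 4))
```

In this model a draw of a handful of items spends nearly all of its time filling the pipeline, while a large draw amortizes the fill cost almost completely.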

I guess we’ll just learn a new set of performance rules for tomorrow’s hardware!

Back to GS pipelining: I imagine that the “slow” scenarios would be like this: vertices have shaders with dynamic branches or memory fetches that differ vastly in execution length – so the GS has to wait for all vertex shaders of the current primitive (optional: plus topology) to finish; and then each GS has dynamic branches or memory fetches, and outputs a different number of primitives to the rasterizer. If I were hardware, I’d be scared :)

More HDR woes

I’m still spending an occasional minute on my HDR demo. Now, everything is fine so far, except one thing: I can’t get MSAA working on some Radeons (and I don’t have a Radeon right now, which makes debugging a lot harder). The main point of my demo is to have MSAA on ordinary hw, so this is bad.

The reason seems to be that on older Radeons MSAA does not resolve the alpha channel, which obviously messes things up in my case. I’m using RGBE8 encoding for the main rendertarget, and if RGB gets MSAA’d and the exponent does not – then oh well, no good antialiasing most of the time.
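A small sketch of why this breaks. It assumes a Radiance-style shared-exponent RGBE8 layout (the exact encoding used in the demo is not stated here) and assumes that an unresolved alpha channel simply keeps the first sample's exponent; both are illustrative guesses.

```python
import math


def encode_rgbe8(r, g, b):
    """Pack a non-negative HDR color into four 8-bit values (R, G, B, E)."""
    largest = max(r, g, b)
    if largest < 1e-32:
        return (0, 0, 0, 0)
    # largest == mantissa * 2**exponent, with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(largest)
    scale = mantissa * 256.0 / largest
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)


def decode_rgbe8(rgbe):
    """Unpack (R, G, B, E) bytes back into floating-point RGB."""
    r, g, b, e = rgbe
    if e == 0:
        return (0.0, 0.0, 0.0)
    factor = math.ldexp(1.0, e - 128 - 8)
    return (r * factor, g * factor, b * factor)


def resolve_without_exponent(samples):
    """Average the RGB bytes but keep the first sample's exponent (assumed behavior)."""
    n = len(samples)
    r = sum(s[0] for s in samples) // n
    g = sum(s[1] for s in samples) // n
    b = sum(s[2] for s in samples) // n
    return (r, g, b, samples[0][3])


def resolve_reference(samples):
    """What we actually want: decode every sample, then average in linear space."""
    decoded = [decode_rgbe8(s) for s in samples]
    n = len(decoded)
    return tuple(sum(c[i] for c in decoded) / n for i in range(3))


# An edge pixel covered half by a bright surface and half by a dark one.
bright = encode_rgbe8(8.0, 4.0, 2.0)
dark = encode_rgbe8(0.5, 0.25, 0.125)
print(decode_rgbe8(resolve_without_exponent([bright, dark])))
print(resolve_reference([bright, dark]))
```

Both samples have identical mantissa bytes and differ only in exponent, so averaging RGB alone changes nothing and the edge pixel stays as bright as the first sample: no antialiasing at exactly the high-contrast edges where it matters most.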

Of course I could always manually supersample everything, but this would defeat the whole point of the demo. Or I could render everything in two passes, one for RGB and one for exponent – but this also is not very nice…

Probably I’ll just release the demo as it is now and wait for possible feedback. Or dig up an old Radeon somewhere and debug more – but replacing the video card in my Shuttle XPC is not an easy task :)

Jumped onto HDR bandwagon

I’m doing a small HDR demo for fun. Nothing fancy – linear gamma, Reinhard’s tone mapping and whatnot – everyone does that. But the thing I made so far does not even look good! :)
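For reference, a minimal sketch of the tone mapping step. It uses the simplest form of Reinhard's operator, x / (1 + x), applied per channel to linear values, followed by a gamma encode for display; the `exposure` parameter and the 2.2 display gamma are assumptions for illustration, not necessarily what the demo does.

```python
def reinhard(value):
    """Simplest Reinhard curve: maps [0, inf) into [0, 1)."""
    return value / (1.0 + value)


def tone_map(rgb, exposure=1.0, display_gamma=2.2):
    """Tone map a linear HDR color and gamma-encode it for display."""
    mapped = []
    for channel in rgb:
        compressed = reinhard(max(channel, 0.0) * exposure)
        mapped.append(compressed ** (1.0 / display_gamma))
    return tuple(mapped)


# A very bright, a moderately bright and a mid-level channel.
print(tone_map((1000.0, 10.0, 1.0)))
```

All lighting math stays in linear space; the gamma encode happens only once, at the very end.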

I’m trying to support both HDR and FSAA at the same time on ordinary DX9 hardware (no Radeons 1k) by using RGBE8 rendertarget for the main scene. It’s all okay so far.

The most difficult task right now is making it look good. Once I have that I’ll post the results.

The video cards are damn fast

I was working on our next demo the other day. Boy, the video cards are damn fast nowadays!

We have a high-poly model for the main character (~200k tris); for the demo we use a low-poly version (~6500 tris) and a normalmap. Now, I’ve put 128 lights scattered on the hemisphere above him, each using a shadow buffer. I have 4 shadow buffers, render to these from four lights, then render the character, fetching shadows from four shadowmaps at once. The result is that it’s almost realtime ambient occlusion for the animating character, and it runs at ~40FPS on my GeForce 6800 GT!
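The batching scheme can be sketched on the CPU as follows. This is only an outline of the loop structure: the golden-angle light placement and the `is_lit` callback (standing in for a shadow map lookup) are hypothetical helpers, not the demo's actual code.

```python
import math


def hemisphere_lights(count):
    """Spread `count` unit directions over the upper hemisphere (golden-angle spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    lights = []
    for i in range(count):
        z = (i + 0.5) / count  # height in (0, 1): upper hemisphere only
        radius = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        lights.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return lights


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ambient_occlusion(point, lights, is_lit, buffers=4):
    """Return (fraction of lights that reach `point`, number of character passes)."""
    lit = 0
    passes = 0
    for batch in batches(lights, buffers):
        # On the GPU: render one shadow map per light in the batch, then
        # draw the character once, sampling all of the batch's shadow maps.
        lit += sum(1 for light in batch if is_lit(light, point))
        passes += 1
    return lit / len(lights), passes


lights = hemisphere_lights(128)
# Hypothetical occluder: everything below 30 degrees of elevation is blocked.
visibility, passes = ambient_occlusion(
    (0.0, 0.0, 0.0), lights, lambda light, point: light[2] > 0.5
)
print(visibility, passes)
```

With 128 lights and 4 shadow buffers that is 128 shadow map renders plus 32 character passes per frame, which makes the ~40FPS figure all the more impressive.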

This is of course pretty useless, we don’t need realtime AO in the demo. But it has been nice :)