Archive for 'gpu'
You know something became a cultural phenomenon when hardware review sites start putting up images like this…
From AnandTech’s Radeon HD 4850 & 4870 review: I can has vertex data?
Edit: gee, nowadays the reviews have funny performance measures. Like, FPS per square centimeter (of GPU die size)! It does actually make (some) sense, but it’s still funny. Frames per second per square centimeter… mmm… delicious.
Posted on 2008-06-26 7:54 in gpu, random, uncategorized | No Comments »
Hey, it looks like the quest for encoding floats to RGBA textures (part 1, part 2) has not ended yet.
Here’s the “best available” code that I have now:
inline float4 EncodeFloatRGBA( float v ) {
// bias is a hardware-dependent constant; see the list of values below
return frac( float4(1.0, 255.0, 65025.0, 160581375.0) * v ) + bias;
}
inline float DecodeFloatRGBA( float4 rgba ) {
return dot( rgba, float4(1.0, 1/255.0, 1/65025.0, 1/160581375.0) );
}
Previously I thought that the bias should normally be +0.5/255.0, except that it had to be around -0.55/255.0 on Radeon cards (older than the Radeon HD series). Well, it turns out I was wrong: the bias mostly has to be around -0.5/255.0.
Here’s the list (same bias on Windows/D3D9 and OS X/OpenGL, so it seems to be hardware dependent, and not something in API/drivers):
- Radeon 9500 to X850: -0.61/255
- Radeon X1300 to X1900: -0.66/255
- Radeon HD 2xxx/3xxx: -0.49/255
- GeForce FX, 6, 7, 8: -0.48/255
- Intel 915, 945, 965: -0.5/255
Those are the best bias values I could find. Still, every once in a while (rarely) encoding the value to an RGBA texture and reading it back would produce something where one channel is half a bit off. Not a problem if the numbers you were encoding were originally in the 0..1 range, but if you were encoding something that spans the whole range of the camera, for example, then the 0..1 range gets expanded into 0..FarPlane…
And all of a sudden there are huge precision errors, up to the point of being unusable. I just tried doing a quick’n'dirty depth of field and soft particles implementation using depth encoded this way… not good.
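To put rough numbers on this, here is a back-of-the-envelope sketch in Python (CPU-side arithmetic, not shader code). It computes how far the decoded value moves if a single channel comes back half a bit off, first in the 0..1 range and then scaled to a far plane. The far plane of 1000 units is an assumed example value, not something from the post.

```python
# Worst-case decode error when one channel ends up half a bit off.
# Channel weights match DecodeFloatRGBA: 1, 1/255, 1/255^2, 1/255^3.
WEIGHTS = [1.0, 1.0 / 255.0, 1.0 / 65025.0, 1.0 / 160581375.0]
HALF_BIT = 0.5 / 255.0   # half of one 8-bit step

FAR_PLANE = 1000.0       # assumed example value, not from the post


def error_for_channel(channel, value_range=1.0):
    """Decoded error if `channel` is off by half a bit, scaled to value_range."""
    return HALF_BIT * WEIGHTS[channel] * value_range


for ch, name in enumerate("RGBA"):
    print(name, error_for_channel(ch), error_for_channel(ch, FAR_PLANE))
```

If the affected channel happens to be the first one, the error is about 0.002 in the 0..1 range, which becomes roughly 2 world units with the assumed far plane. The same half bit in the lower channels matters far less.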
Oh well. Has anyone successfully used encoding of high precision number into RGBA channels before?
Posted on 2008-06-20 17:55 in gpu, uncategorized | 4 Comments »
Okay, so Apple just announced OpenCL (Open Computing Language) technology in upcoming OS X 10.6. This is starting to get interesting.
My prediction? OpenCL should be something along the lines of CUDA or BrookGPU. It will work on various DX10-level graphics cards, and on the CPU. I think trying to target older graphics cards does not make sense - using real actual integer types is useful in general purpose computing (DX10 tech), and Apple will probably only be shipping DX10-level graphics cards in a year (at the moment only the Intel cards in Macs are DX9 level; the rest are GeForce 8s and Radeon HDs). With a multithreaded CPU fallback any older machines will be taken care of anyway (and that leaves the future open for Larrabees). So yeah, quite similar to BrookGPU actually.
It has “open” in the title, so maybe they will make it for other platforms as well. I doubt that they will ship an implementation though; perhaps they will just make it royalty/patent/whatever free and publish the spec. Which is about the same level of “openness” as other technologies with “open” in their name (OpenGL, OpenAL, OpenMP, OpenCV, …) - not exactly open, but not the worst kind either.
Oh, and suddenly there are new uses for other technologies recently developed at Apple, like LLVM or clang.
We’ll see how it goes.
Posted on 2008-06-10 21:27 in gpu | 2 Comments »
SwiftShader 2.0, a pure software renderer with a Direct3D 9 interface, just got released. I tried it on the rendering unit tests and some benchmark tests we have for Unity.
In short, I’m impressed.
It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything else, including shaders, shadows, render to texture works without any problems.
Performance-wise, of course it’s dozens to hundreds of times slower than a real graphics card, but hey. I also tested with Intel 965 (aka GMA X3000) integrated graphics for comparison. All this on an Intel Core2 Quad (Q6600), 3 GB RAM, Windows XP SP2.
- Avert Fate demo: Radeon HD 3850 about 300 FPS, SwiftShader about 5 FPS (about 15 FPS if per-pixel lighting is turned off), Intel 965 about 22 FPS (about 50 FPS if per-pixel lighting is turned off).
- Scene with lots of objects and lots of shadow-casting lights: Radeon HD 3850 about 76 FPS, SwiftShader 2.5 FPS, Intel - shadows not supported, duh.
- High detail terrain with lots of vegetation and four cameras rendering it simultaneously: Radeon HD 3850 about 68 FPS, SwiftShader about 3 FPS, Intel 965 about 12 FPS.
Ok, so SwiftShader loses on performance to Intel 965, but the difference is only “a couple of times”, not an order of magnitude or so. Pretty good, I’d say.
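For the record, the ratios behind that remark can be worked out from the FPS figures listed above. This is just arithmetic on the quoted numbers, written as a small Python sketch; the scene names are shortened labels of my own.

```python
# FPS figures quoted above: (Radeon HD 3850, SwiftShader, Intel 965 or None).
results = {
    "Avert Fate demo": (300.0, 5.0, 22.0),
    "Many shadow-casting lights": (76.0, 2.5, None),  # Intel: shadows not supported
    "Terrain, four cameras": (68.0, 3.0, 12.0),
}


def slowdown(fast_fps, slow_fps):
    """How many times faster the first renderer is than the second."""
    return fast_fps / slow_fps


for name, (radeon, swift, intel) in results.items():
    line = f"{name}: Radeon is {slowdown(radeon, swift):.1f}x SwiftShader"
    if intel is not None:
        line += f", Intel 965 is {slowdown(intel, swift):.1f}x SwiftShader"
    print(line)
```

So with per-pixel lighting on, the Radeon is somewhere between 20 and 60 times faster than SwiftShader in these three tests, while the Intel 965 is only about 4 times faster.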
Posted on 2008-04-07 14:05 in gpu, rendering | 3 Comments »
Seriously, what are they up to? Intel acquires Offset Software, a game development studio that is doing a game and an engine. Wait, I was thinking the game and tech are for PC and Xbox360? What would Intel do with that?
Not so long ago, some well known graphics guys went to work for Intel. A while ago Intel acquired Neoptica…
Signs of Larrabee coming? Intel starting to take GPUs seriously? Something else?
Posted on 2008-02-21 9:59 in gpu | 4 Comments »
I said so - 4 kilobyte intros are really getting interesting.
Meet kindernoiser - 4 kilobytes, quaternion Julia fractal on the GPU, screen space ambient occlusion and so on. iq has a nice article on the tech behind SSAO.
Keep ‘em coming!
Posted on 2007-11-21 14:39 in demos, gpu | No Comments »
Gleserg has interesting comments in my earlier post. So I thought I’d share what I am using right now, and try to throw some more complexities in :)
Here is what I am doing right now:
inline float4 EncodeFloatRGBA( float v ) {
return frac( float4(1.0, 255.0, 65025.0, 160581375.0) * v ) + 0.5/255.0;
}
inline float DecodeFloatRGBA( float4 rgba ) {
return dot( rgba, float4(1.0, 1/255.0, 1/65025.0, 1/160581375.0) );
}
And this seems to work fine almost everywhere (see below). Why am I doing this - good question, I don’t have a hard theory on which bits go where and so on. I think I saw someone on gamedev.net forums saying that in hardware 0 == 0.0 and 255 == 1.0, and that truncation is actually done on the values (not rounding). So that would mean you multiply by 255 and add a half of a bit.
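The truncation theory is easy to poke at on the CPU. The sketch below is Python, not shader code, and `store_truncating` is a hypothetical model of the hardware (0.0 maps to 0, 1.0 maps to 255, the fraction is dropped); real GPUs may well behave differently. It checks whether every exact 8-bit level k/255 survives being stored, with and without the half-bit bias.

```python
import math


def store_truncating(x):
    """Assumed hardware model: 0.0 -> 0, 1.0 -> 255, fraction truncated."""
    return min(255, max(0, math.floor(x * 255.0)))


HALF_BIT = 0.5 / 255.0

# Every exactly representable 8-bit level k/255 should come back as k.
plain = [store_truncating(k / 255.0) for k in range(256)]
biased = [store_truncating(k / 255.0 + HALF_BIT) for k in range(256)]

print("levels lost without bias:", sum(p != k for k, p in enumerate(plain)))
print("levels lost with bias:   ", sum(b != k for k, b in enumerate(biased)))
```

With the bias, truncation turns into round-to-nearest, so a value sitting a hair below k/255 because of floating point error still lands on k. Without it, any such value drops to k-1.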
Now, the trick: the above does not quite work on Radeons (at least the X1600 that I’m mostly developing on while I’m on a Mac). Instead of adding 0.5/255.0, you have to subtract 0.55/255.0 - and that value is still not perfect, but that’s the best I could come up with by plowing through various combinations. I have no idea why this must be performed (24 bit internal precision? or does it round up? something else?). On GeForces and even Intel’s shader-capable hardware, the expected +0.5/255.0 value works.
…anyone up to figuring out the mathematical proof on why encoding/decoding this way actually works? :) And yes, the last component (the one that uses 160581375) is pretty much meaningless.
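One plausible reading of why the last component is meaningless, assuming the shader works with 32-bit floats (24-bit significand): the most that channel can contribute to the decoded value is smaller than the spacing between adjacent floats near 1.0. A quick Python check of that arithmetic:

```python
# Largest possible contribution of the alpha channel to the decoded value.
alpha_weight = 1.0 / 160581375.0   # 1 / 255^3

# Spacing between adjacent 32-bit floats just below 1.0 (24-bit significand).
float32_step = 2.0 ** -24

print(alpha_weight, float32_step, float32_step / alpha_weight)
```

The alpha weight comes out roughly ten times smaller than the float step, so for inputs that are not close to zero the fourth channel encodes precision that the input never had.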
Posted on 2007-06-29 9:58 in gpu | 5 Comments »
Breaking news: sometimes seemingly trivial tasks take insane amounts of time! I am sure no one knew this before! So it was yesterday - almost a whole day spent fighting rounding/precision errors when encoding floating point numbers into regular 8 bit RGBA textures. You know, the trivial stuff where you start with
inline float4 EncodeFloatRGBA( float v ) {
return frac( float4(1.0, 256.0, 65536.0, 16777216.0) * v );
}
inline float DecodeFloatRGBA( float4 rgba ) {
return dot( rgba, float4(1.0, 1.0/256.0, 1.0/65536.0, 1.0/16777216.0) );
}
and everything is fine until sometimes, somewhere there’s “something wrong”. Must be rounding or quantization errors; or maybe I should use 255 instead of 256; plus optionally add or subtract 0.5/256.0 (or would that be 0.5/255.0?). Or maybe the error is entirely somewhere else, and I’m just chasing ghosts here!
What would you do then? Why, of course, build an Encoding Floats Into Textures Studio 2007! (don’t tell me it’s not a great idea for a commercial software package! game studios would pay insane amounts of money for a tool like this!) The images here are exactly that - render into a texture, encoding UV coordinate as RGBA, then read from that texture, displaying RGBA and error from the expected value in some weird way. Turns out image postprocessing filters in Unity are a pretty good tool to do all this. Yay!
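A poor man's version of the same experiment can be run on the CPU. The Python sketch below mirrors the encode/decode functions, pushes each channel through a hypothetical 8-bit store, and reports the worst round-trip error over a sweep of input values. The `store_8bit` model (clamp, then round to nearest) is an assumption; the whole point of the real tool is that actual hardware does not necessarily behave this nicely.

```python
import math


def frac(x):
    return x - math.floor(x)


def encode(v, base, bias=0.0):
    """CPU mirror of EncodeFloatRGBA for a given base (255 or 256)."""
    return [frac(v * base ** i) + bias for i in range(4)]


def store_8bit(c):
    """Assumed render target: clamp to 0..1, round to the nearest of 256 levels."""
    c = min(1.0, max(0.0, c))
    return math.floor(c * 255.0 + 0.5) / 255.0


def decode(rgba, base):
    """CPU mirror of DecodeFloatRGBA."""
    return sum(c / base ** i for i, c in enumerate(rgba))


def max_roundtrip_error(base, bias=0.0, samples=4096):
    worst = 0.0
    for n in range(samples):
        v = n / samples  # plays the role of the UV coordinate
        stored = [store_8bit(c) for c in encode(v, base, bias)]
        worst = max(worst, abs(decode(stored, base) - v))
    return worst


print("base 256, no bias:      ", max_roundtrip_error(256.0))
print("base 255, no bias:      ", max_roundtrip_error(255.0))
print("base 255, bias -0.5/255:", max_roundtrip_error(255.0, -0.5 / 255.0))
```

Under this idealized model, a bias of -0.5/255 turns the rounding store into truncation, so each channel ends up holding exactly one base-255 digit of the value and the round trip is accurate to about 1/255^4. Without a bias the first channel gets rounded while the lower channels still carry its remainder, and the decoded value can be off by several thousandths.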
Sometimes in situations like this I figure out that graphics hardware still leaves a lot to be desired. This last image shows some calculations that depend only on the horizontal UV coordinate, so they should produce some purely vertical pattern (sans the part at the bottom, that is expected to be different). Heh, you wish!
Posted on 2007-03-03 18:33 in gpu | 13 Comments »
A followup to the older “discussion” about how/why geometry shaders would be okay/slow:
The graphics hardware has been quite successful so far at hiding memory latencies (i.e. when sampling textures). It does so (according to my understanding) by having a looong pixel pipeline, where hundreds (or thousands) of pixels might be at one or another processing stage. ATI talks about this in big letters (R520 dispatch processor) and speculations suggest that GeForceFX had something like that (article). I have no idea about the older cards, but presumably they did something similar as well.
I am not sure how the vertex texture fetches are pipelined - pretty slow performance on GeForce6/7 suggests that they aren’t :) Probably vertex shaders in current cards operate in a simpler way - just fetch the vertices and run whole shaders on them (in contrast to pixel shaders, which seem to run just several instructions, then go to other pixels, return back, etc.).
With DX10, we have arbitrary memory fetches in any stage of the pipeline. Even the boundary between different fetch types is somewhat blurry (constant buffers vs. arbitrary buffers vs. textures) - perhaps they will differ only in bandwidth/latency (e.g. constant buffers live near the GPU while textures live in video memory).
So, with arbitrary memory fetches anywhere (and some of them being high latency), everything needs to have long pipelines (again, just my guess). This is all great, but the longer the pipeline, the worse it performs in non-friendly scenarios: pipeline flush is more expensive, drawing just a couple of “things” (primitives, vertices, pixels) is inefficient, etc.
I guess we’ll just learn a new set of performance rules for tomorrow’s hardware!
Back to GS pipelining: I imagine that the “slow” scenarios would be like this: vertices have shaders with dynamic branches or memory fetches differing vastly in execution lengths - so GS has to wait for all vertex shaders of the current primitive (optional: plus topology) to finish; and then each GS has dynamic branches or memory fetches, and outputs a different number of primitives to the rasterizer. If I were hardware, I’d be scared :)
Posted on 2005-12-22 14:06 in gpu | 2 Comments »
I’m still spending an occasional minute on my HDR demo. Now, everything is fine so far, except one thing: I can’t get MSAA working on some Radeons (and I don’t have a Radeon right now, which makes debugging a lot harder). The main point of my demo is to have MSAA on ordinary hw, so this is bad.
The reason seems to be that on older Radeons MSAA does not resolve the alpha channel, which obviously messes things up in my case. I’m using RGBE8 encoding for the main rendertarget, and if RGB gets MSAA’d and the exponent does not - then oh well, no good anti aliasing most of the time.
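A tiny CPU-side illustration of why that hurts, written in Python. The shared-exponent encoding here is an illustrative variant (the demo's exact RGBE8 layout is not described in this post, and 8-bit quantization is left out), and the failure mode of "RGB averaged, exponent taken from one sample" is an assumption about what an unresolved alpha channel amounts to.

```python
import math


def rgbe_encode(rgb):
    """Shared-exponent encoding: mantissas in 0..1 plus an integer exponent.
    Illustrative variant, not necessarily the demo's exact RGBE8 layout."""
    peak = max(rgb)
    if peak <= 0.0:
        return (0.0, 0.0, 0.0, 0)
    _, exponent = math.frexp(peak)  # peak = m * 2**exponent, 0.5 <= m < 1
    scale = 2.0 ** exponent
    return (rgb[0] / scale, rgb[1] / scale, rgb[2] / scale, exponent)


def rgbe_decode(rgbe):
    scale = 2.0 ** rgbe[3]
    return tuple(c * scale for c in rgbe[:3])


# Two MSAA samples of one edge pixel: a bright surface and a dark one.
bright = rgbe_encode((8.0, 8.0, 8.0))
dark = rgbe_encode((0.25, 0.25, 0.25))

# What we want: the average of the decoded colours.
correct = tuple((a + b) / 2 for a, b in zip(rgbe_decode(bright), rgbe_decode(dark)))

# Assumed failure mode: RGB is averaged, exponent is taken from one sample.
avg_rgb = tuple((a + b) / 2 for a, b in zip(bright[:3], dark[:3]))
broken = rgbe_decode(avg_rgb + (bright[3],))

print("correct:", correct)
print("broken: ", broken)
```

Both samples happen to have the same mantissa here, so averaging RGB changes nothing and the edge pixel simply takes the bright sample's value (8.0) instead of the expected 4.125. In other words, the edge stays as hard as if there were no anti-aliasing at all.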
Of course I could always manually supersample everything, but this would defeat the whole point of the demo. Or I could render everything in two passes, one for RGB and one for exponent - but this also is not very nice…
Probably I’ll just release the demo as it is now and wait for possible feedback. Or dig up an old Radeon somewhere and debug more - but replacing the video card in my Shuttle XPC is not an easy task :)
Posted on 2005-11-02 11:09 in gpu | 1 Comment »