Archive for 'rendering'
Almost half a year ago I was wondering how to implement T&L in vertex shaders.
Well, I finally implemented it for the upcoming Unity 2.6. I wrote a technical report of sorts here.
In short, I’m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using fixed function (I know it’s implemented as vertex shaders internally by the runtime/driver) on several different cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950).
What was unexpected: the most complex piece is not the vertex lighting! Most complexity is in how to route/generate texture coordinates and transform them. Huge combination explosion there.
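To get a feel for that explosion, here is a back-of-the-envelope count in Python. The state names are the D3D9 fixed function ones (`D3DTSS_TEXCOORDINDEX` texgen modes and `D3DTSS_TEXTURETRANSFORMFLAGS`), but the counting itself is only a rough sketch under loud assumptions: it ignores `D3DTTFF_PROJECTED` and the dimensionality of the input UVs, and treats all eight texture stages as independent.

```python
from itertools import product

# Where an output UV can come from: one of 8 input UV sets, or a texgen mode.
# (Simplifying assumption: nothing else about the source matters.)
UV_SOURCES = [f"input UV{i}" for i in range(8)] + [
    "D3DTSS_TCI_CAMERASPACENORMAL",
    "D3DTSS_TCI_CAMERASPACEPOSITION",
    "D3DTSS_TCI_CAMERASPACEREFLECTIONVECTOR",
    "D3DTSS_TCI_SPHEREMAP",
]

# How the coordinates can be transformed: no matrix, or 1 to 4 output components.
TRANSFORMS = ["D3DTTFF_DISABLE"] + [f"D3DTTFF_COUNT{n}" for n in range(1, 5)]

STAGES = 8  # fixed function texture stages, assumed independent here

per_output = list(product(UV_SOURCES, TRANSFORMS))
total = len(per_output) ** STAGES

print(f"{len(per_output)} combinations per output UV")
print(f"{total:,} combinations across {STAGES} stages")
```

Even with those simplifications the count per output UV is already 60, which is why enumerating prebuilt shaders for every case is not practical.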
Otherwise – I like! Here’s a link to the article again.
Posted on 2009-06-09 8:08 in code, d3d, gpu, rendering, unity, work | 4 Comments »
Continuing the series (see Part 1, Part 2)…
Got different lighting models (BRDFs) working. Without further ado, code snippets that produce real actual working shaders that work with lights & shadows and whatnot:
(more…)
Posted on 2009-05-10 17:24 in gpu, rendering, unity | 1 Comment »
I started playing around with the idea of “shaders must die”. I’m experimenting with extracting “surface shaders” for now.
Right now my experimental pipeline is:
- Write a surface shader file
- A Perl script transforms it into a Unity 2.x shader file
- That file in turn is compiled by Unity into all lighting/shadows permutations, for the D3D9 and OpenGL backends. Cg is used for the actual shader compilation.
I have very simple cases working. For example: (more…)
Posted on 2009-05-07 23:35 in gpu, rendering, unity | 9 Comments »
It came in as a simple thought, and now I can’t shake it off. So I say:

Ok, now that the controversial bits are done, let’s continue.
(more…)
Posted on 2009-05-05 14:59 in gpu, rant, rendering | 9 Comments »
A couple of weeks ago Google announced O3D: an open source web browser plugin for low level accelerated 3D graphics. The website for O3D project is here.
Of course this created some buzz (hey, it’s Google after all). And it is in some way a competing technology with Unity. I think it’s going to be interesting, so I say “welcome competition!”
Preemptive blah blah: this website is my personal opinion and does not represent the views of my employer, former employers or anyone else other than myself.
Unity is one of the players in the “3D on the web” space. 3D graphics in the browser are in fact nothing new. Unity’s browser plugin has existed since 2005 and its install count is now in the eight digits. There are also VRML, X3D, Adobe Shockwave, 3DVIA/Virtools, software rendering approaches on top of Flash and so on.
In my view, major advantages that Unity has compared to O3D:
- It’s not only about the graphics. Unity has physics, audio, input, scripting, streaming, networking, asset pipeline and whatnot. O3D is only about the graphics, and at a lower level.
- Unity runs on a wider range of hardware. O3D requires Shader Model 2.0 or later hardware, so about 30% of the “machines on the internet” can’t run O3D (based on our 2009Q1 data). Couple that with the many compatibility workarounds that we have, and it’s probably safe to say that Unity is more stable and mature at this point.
- Unity is not only about the web. There’s support for iPhone, Nintendo Wii, standalone games, and with time more console and mobile platforms will come.
- Creating and improving Unity is our primary and only focus as a company. In Google’s case, O3D is just another technology in their vast portfolio.
Of course, O3D also has advantages:
- It’s done by Google! When Google does anything, people notice immediately :)
- O3D is free and open source. Hard to beat the free price, and open source does have its benefits. O3D is not a “standard” of any sort right now, but it looks like Google would want it to become one.
- Only focusing on low level graphics has its benefits: it’s lightweight, it appeals to hackers and graphics programmers who want to be in control. Unity’s higher level is much easier and faster to use, but low level hacking can be fun.
Of course there are tons of other differences (I might have missed something important as well).
For me as a rendering guy, it’s interesting to see O3D making similar decisions here and there (e.g. they don’t use GLSL on OpenGL either, because it does not really work in the real world).
So… we’ll see where things will go. It’s going to be interesting!
Posted on 2009-05-05 14:01 in rendering, unity | 5 Comments »
Sometime soon I’ll have to implement the fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to various artifacts that annoy people to hell.
I don’t really know why that happens, because it seems that most modern cards don’t have fixed function units, so internally they are running shaders anyway. The DX9 runtime on Vista’s WDDM also seems to hand only shaders to the driver internally. Still, for some reason somewhere the precision does not match…
How should such a task be approached?
My requirements are:
- Should handle any possible state combination in D3D fixed function T&L.
- D3D 9.0c, using vertex shader 2.0 is ok. For now I don’t care about OpenGL.
- No HLSL at runtime. I don’t want to add a megabyte or more to Unity web player just for HLSL. DX9 shader assembly is ok, because we already have the assembler code.
- Should run as fast as (or close to) the regular fixed function pipeline.
I looked at ATI’s FixedFuncShader sample. It’s an ubershader approach; one large (230 instructions or so) shader with static VS2.0 branching. It had some obvious places to optimize, I could get it down to 190 or so instructions, kill some rcp’s and reduce the amount of constant storage by 2x.
Still, it did not handle some things in the D3D T&L or had some issues:
- It assumes one input UV, one output UV and no texture matrices. This place in T&L gets quite convoluted – any input UVs or a texgen mode can be transformed by matrices of various sizes, and routed into any output UVs.
- It was not using the full T&L lighting model. No biggie here.
- I haven’t checked with NVShaderPerf or AMD ShaderAnalyzer yet, but last time I checked the static branch instruction was taking two clocks on some NV architecture. So ubershader approach does not come for free.
Another thing I’m considering is to combine the final shader(s) from assembly fragments, with some simple register allocation.
In T&L shader code, there’s only a limited set of could-be-redundant computations, mostly computing world space position, camera space normal, view vector and so on (those could be used by lighting, texgen or fog). Those computations can be explicitly put into separate fragments, and later fragments could just use their result.
What is left then is some register allocation. A shader assembly fragment could want some temporary registers for internal use (this is simple, just give it a bunch of unused registers), also want some registers as input (from previous fragments), and save some output in registers.
Again, I haven’t checked with shader performance tools, but I think, guess and hope that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into hardware format. This would mean that I can be quite sloppy with it, i.e. don’t have to implement some super smart allocation scheme.
I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).
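The fragment-combining idea can be sketched in a few lines of Python. This is not the actual experimental code (the post does not show it); `Fragment`, `combine`, the `{placeholder}` syntax, the value names and the constant register numbers are all hypothetical, and the allocation scheme is deliberately the sloppy kind described above: named results get a register for the rest of the shader, scratch temporaries are reused by the next fragment.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of shader assembly (hypothetical structure, for illustration)."""
    name: str
    code: str             # assembly template with {placeholders} for registers
    inputs: tuple = ()    # named values computed by earlier fragments
    outputs: tuple = ()   # named values this fragment leaves in registers
    temps: int = 0        # scratch registers used only inside the fragment


def combine(fragments, max_temps=12):
    """Concatenate fragments, mapping named values and scratch temps to r# registers.

    max_temps=12 matches the temporary register count of vs_2_0.
    """
    value_reg = {}   # named value -> register index, kept for the whole shader
    next_free = 0
    lines = []
    for frag in fragments:
        names = {}
        for value in frag.inputs:
            if value not in value_reg:
                raise ValueError(f"{frag.name} needs '{value}', which nothing computed")
            names[value] = f"r{value_reg[value]}"
        for value in frag.outputs:
            if value not in value_reg:
                value_reg[value] = next_free
                next_free += 1
            names[value] = f"r{value_reg[value]}"
        if next_free + frag.temps > max_temps:
            raise ValueError("out of temporary registers")
        # Scratch registers sit above the persistent ones; later fragments may reuse them.
        for i in range(frag.temps):
            names[f"tmp{i}"] = f"r{next_free + i}"
        lines.append(frag.code.format(**names))
    return "\n".join(lines)


# Two illustrative fragments; c4 as the start of a model-view matrix is an assumption.
CAMERA_POS = Fragment(
    name="camera_pos",
    code="m4x3 {cam_pos}.xyz, v0, c4",
    outputs=("cam_pos",),
)
VIEW_VECTOR = Fragment(
    name="view_vector",
    code=(
        "dp3 {tmp0}.x, {cam_pos}, {cam_pos}\n"
        "rsq {tmp0}.x, {tmp0}.x\n"
        "mul {view}.xyz, -{cam_pos}, {tmp0}.x"
    ),
    inputs=("cam_pos",),
    outputs=("view",),
    temps=1,
)

print(combine([CAMERA_POS, VIEW_VECTOR]))
```

Lighting, texgen and fog fragments would all list `cam_pos` or `view` as inputs, so the shared computation is emitted once. The sketch never frees a named value, which only works if the driver does its own liveness analysis as hoped above.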
Does that make sense? Or did everyone solve those problems eons ago already?
Edit: half a year later, I wrote a technical report on how I implemented all this: http://aras-p.info/texts/VertexShaderTnL.html
Posted on 2009-01-22 22:32 in code, d3d, gpu, rendering, work | 9 Comments »
SwiftShader 2.0, a pure software renderer with a Direct3D 9 interface, just got released. I tried it on the rendering unit tests and some benchmark tests we have for Unity.
In short, I’m impressed.
It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything else, including shaders, shadows, render to texture works without any problems.
Performance-wise, of course it’s dozens to hundreds of times slower than a real graphics card, but hey. I also tested with Intel 965 (aka GMA X3000) integrated graphics for comparison. All this on Intel Core2 Quad (Q6600), 3 GB RAM, Windows XP SP2.
- Avert Fate demo: Radeon HD 3850 about 300 FPS, SwiftShader about 5 FPS (about 15 FPS if per-pixel lighting is turned off), Intel 965 about 22 FPS (about 50 FPS if per-pixel lighting is turned off).
- Scene with lots of objects and lots of shadow-casting lights: Radeon HD 3850 about 76 FPS, SwiftShader 2.5 FPS, Intel – shadows not supported, duh.
- High detail terrain with lots of vegetation and four cameras rendering it simultaneously: Radeon HD 3850 about 68 FPS, SwiftShader about 3 FPS, Intel 965 about 12 FPS.
Ok, so SwiftShader loses on performance to Intel 965, but the difference is only “a couple of times”, not an order of magnitude or so. Pretty good I’d say.
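The ratios behind “a couple of times” can be worked out directly from the numbers above. A small Python sketch; the FPS values are the ones quoted in the list, while the dictionary layout and the `slowdown` helper are just for illustration.

```python
# FPS figures quoted above; a missing key means the test was not run or not supported.
fps = {
    "Avert Fate": {"radeon": 300, "swiftshader": 5, "intel965": 22},
    "Avert Fate, no per-pixel lighting": {"swiftshader": 15, "intel965": 50},
    "Many shadow-casting lights": {"radeon": 76, "swiftshader": 2.5},
    "Terrain, four cameras": {"radeon": 68, "swiftshader": 3, "intel965": 12},
}


def slowdown(test, faster, slower="swiftshader"):
    """How many times slower `slower` is than `faster`, or None if a figure is missing."""
    row = fps[test]
    if faster not in row or slower not in row:
        return None
    return row[faster] / row[slower]


for test in fps:
    for other in ("intel965", "radeon"):
        ratio = slowdown(test, other)
        if ratio is not None:
            print(f"{test}: SwiftShader is {ratio:.1f}x slower than {other}")
```

On these tests SwiftShader comes out roughly 3.3x to 4.4x slower than the Intel 965, and roughly 23x to 60x slower than the Radeon HD 3850.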
Posted on 2008-04-07 14:05 in gpu, rendering | 3 Comments »
It just occurred to me: it seems that no one has ever made a shadowing system that does shadows from anything onto anything, with zero artifacts, with no corner cases, always looking good, running fast and on any sensible hardware.
Hm… sounds like a challenge! ;)
Back to reading.
Posted on 2006-02-18 22:45 in rendering, work | 6 Comments »