Compact Normal Storage for small g-buffers
I’ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture.
Here are my findings – with error visualization and shader performance numbers for some GPUs.
If you know any other method to encode/store normals in a compact way, please let me know!
You could also try Crytek’s method (“A bit more deferred – CryEngine3”, http://www.crytek.com/technology/presentations/) as a variation on the sphere map. From the slides:
Encode:
G=normalize(N.xy)*sqrt(N.z*0.5 + 0.5)
Decode:
N.z=length2(G.xy)*2 - 1
N.xy=normalize(G.xy)*sqrt(1-N.z*N.z)
Oh, it should be noted that length2 is presumably the squared length (i.e. dot(G.xy, G.xy)).
Assuming that’s the case, I think the decode can be simplified to:
L = dot(G.xy, G.xy);
N.z=L*2 - 1;
N.xy=G.xy*sqrt(4 - 4 * L);
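A quick way to sanity-check the simplified decode is to round-trip a few normals on the CPU. The following is only a sketch in Python rather than shader code; the function names are made up for illustration, and the handling of N = (0, 0, ±1), where normalize(N.xy) is undefined, is an assumption and not something taken from the slides.

```python
import math

def cry_encode(n):
    """G = normalize(N.xy) * sqrt(N.z*0.5 + 0.5), as quoted from the slides."""
    x, y, z = n
    length = math.hypot(x, y)
    if length == 0.0:
        # Assumption: for N = (0, 0, +1) any G of length 1 decodes correctly,
        # and for N = (0, 0, -1) the encoding is G = (0, 0).
        return (1.0, 0.0) if z > 0.0 else (0.0, 0.0)
    f = math.sqrt(z * 0.5 + 0.5) / length
    return (x * f, y * f)

def cry_decode(g):
    """Simplified decode: L = dot(G, G); N.z = 2L - 1; N.xy = G * sqrt(4 - 4L)."""
    gx, gy = g
    l = gx * gx + gy * gy
    s = math.sqrt(max(0.0, 4.0 - 4.0 * l))  # clamp guards against rounding
    return (gx * s, gy * s, l * 2.0 - 1.0)
```

The simplification works because 1 - N.z*N.z = 4L(1 - L), and normalize(G.xy) is G.xy / sqrt(L), so the sqrt(L) factors cancel.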
Since encoding normals means representing the sphere in 2D, it’s probably worth researching the various cartographic projections developed for mapping the Earth. http://en.wikipedia.org/wiki/Category:Cartographic_projections
I seem to remember using some variant of the stereographic projection ( http://en.wikipedia.org/wiki/Stereographic_projection ) as a good texture mapping for a hemisphere. The actual stereographic projection is not very good at preserving area, so I don’t remember exactly what I did. Possibly just move the plane away from the sphere a little, which probably has a name, but I can’t find it now.
Presumably you want either a mapping that preserves equal-area or one that preserves equal-distance (we don’t care about distortion metrics, just local distance minimization, basically). For example, consider the decode math for this one: http://en.wikipedia.org/wiki/Lambert_azimuthal_equal-area_projection
(look at the formula where (x,y,z) is defined in terms of (X,Y).)
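For reference, here is a minimal Python sketch of that Lambert azimuthal equal-area pair, using the convention where the projection is centered at (0, 0, -1). The function names are hypothetical, and which sign of z faces the viewer is an assumption you would adapt to your own view space.

```python
import math

def lambert_encode(n):
    """(X, Y) = sqrt(2 / (1 - z)) * (x, y); singular at z = 1."""
    x, y, z = n
    f = math.sqrt(2.0 / (1.0 - z))
    return (x * f, y * f)

def lambert_decode(p):
    """Inverse: (x, y, z) defined in terms of (X, Y), with X*X + Y*Y <= 4."""
    X, Y = p
    r2 = X * X + Y * Y
    f = math.sqrt(max(0.0, 1.0 - r2 / 4.0))  # clamp guards against rounding
    return (X * f, Y * f, -1.0 + r2 / 2.0)
```

The encoded point lies in a disk of radius 2, so it still needs a scale and bias before it fits into two [0, 1] channels. The decode costs one sqrt and a handful of multiply-adds.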
Lambert Azimuthal looks like it would work fantastically for storing view space normals. Cheap encode/decode as well.
@sean, @Pat, @Steve: yeah, I’m experimenting with various cartographic projections as well as CryEngine3 method right now. So far Lambert Azimuthal does not seem to be the best (has quite high error), but there are tons of other ones.
Stay tuned, I’ll update the article once I have the shader code, pictures and GPU performance numbers! (that said, probably won’t happen in the next few days, I’m at Assembly demoparty right now)
You might want to be careful about showing the error-per-channel colorized like that, since the color channels aren’t perceptually equivalent, so an encoding with worst case error in B might look less bad than an encoding with less overall worst case error but that shows primarily in G.
I looked up what I did for texturing the sphere, and it turns out that the general version of the stereographic projection is great for encoding but not so great for decoding (for texturing I only needed encoding). The idea is that you can achieve the gnomonic projection, the stereographic projection, and the orthographic projection of the far hemisphere by just smoothly moving the projection point out from the center (gnomonic) to the surface (stereographic) to infinity (orthographic). Since the problem with gnomonic/stereographic is that the edges are stretched, and the problem with orthographic is that the edges are squished, some point somewhere in between ought to be a good compromise. One page I found mentioned people making maps with the focal point at 1.5 or 1+sqrt(2) away.
Anyway, the upshot of this is that the math looks like this, for some k representing the focal point distance (k=0 is gnomonic, k=1 is stereographic, k=infinity is orthographic, should choose k to minimize error):
encoding: s = x/(z+k), t = y/(z+k)
decoding (I think, have to solve a quadratic, sadly):
A = 1+s*s+t*t
q = (k + sqrt(k^2 + (1-k^2)*A))/A
x=q*s, y=q*t, z=q-k
Note that the quadratic to be solved has special cases for k=0 and k=infinity (k=infinity is your ‘store x&y’). At k=1 the above simplifies the sqrt() away (q becomes 2/A), but sadly k=1 is a lousy projection for this purpose. I assume it will be too much math for k=1.5 or k=2 or k=1+sqrt(2) or whatever turns out to be best.
Also note that s&t should be rescaled to preserve as much precision as possible, the amount to rescale by depends on k. (At a minimum, you can scale up by k, but actually you can do a tad more, but you have to do some solving to find the limit. Scaling up by k will actually slightly simplify the decode math.)
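The encode and decode formulas for a general k can be checked with a small round trip. This is a Python sketch with made-up function names, without the rescaling of s and t discussed above. It takes the larger root of the quadratic A*q^2 - 2*k*q + (k^2 - 1) = 0, which (for k > 1) recovers only normals on the far side of the tangent circle, i.e. with z > -1/k; that restriction is my reading of the geometry, not something stated in the formulas.

```python
import math

def project(n, k):
    """Encoding: s = x/(z+k), t = y/(z+k)."""
    x, y, z = n
    return (x / (z + k), y / (z + k))

def unproject(st, k):
    """Decoding: solve for q = z + k, then x = q*s, y = q*t, z = q - k."""
    s, t = st
    a = 1.0 + s * s + t * t
    disc = k * k + (1.0 - k * k) * a
    q = (k + math.sqrt(max(0.0, disc))) / a  # larger root = far side of the sphere
    return (q * s, q * t, q - k)
```

For k = 1 the discriminant is exactly 1, so q = 2/A and no sqrt is needed, which matches the remark about the stereographic case.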
@sean: yeah, just colored error images are not a scientific metric by any means. I’ll add MSE and PSNR at least, just so “there are some numbers”. Still, it’s hard to tell which encoding is better by just looking at PSNR or other metric. Maybe one really does not care about view space normals that point away from the viewer, in which case just storing x&y is okay.
Yeah, I meant more just that if you want to use an image, just show the euclidean error (or dot product) as luminance or something.
On the number front, maximum angular error might be interesting.
On the “what’s visible” front, you might try just computing diffuse lighting for a few different camera positions and do error metrics on that, but I think it amounts to basically the same thing (it’s not far from just comparing the dot product of the desired and decoded normals). Specular will obviously show larger errors, but I don’t see any good way to illustrate that effect in an unbiased way.
The other thing is that I’m not sure errors per se are objectionable; probably what’s objectionable are banding artifacts etc. So you may care less about the deviation from the correct normal than about the worst case error between adjacent representable values.
I.e., for each representation, you have 64K unique normals. The worst discontinuity occurs when two source normals close to each other encode to two maximally-deviating normals. So you need to do something like: compute the Voronoi space of the 64K normals, and then report the maximum distance from a Voronoi center to a Voronoi edge. Equivalently, from the space of all possible normals, find the normal for which the nearest encoded and second-nearest encoded normal are equidistant and maximally far; that distance is a measure of the non-smoothness of the encoding.
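As a rough illustration of the "maximum angular error" number (not the Voronoi measure, which needs more machinery), here is a Python sketch that quantizes an encoding to two 8-bit channels and reports the worst angle over random viewer-facing normals. Everything in it is an assumption for illustration: the helper names, the sample count, and the use of the plain "store x&y, reconstruct z" method as the encoding under test.

```python
import math
import random

def quantize8(v):
    """Round a value in [0, 1] to the nearest of 256 levels, like an 8-bit channel."""
    return round(min(1.0, max(0.0, v)) * 255.0) / 255.0

def encode_xy(n):
    """Store x and y remapped from [-1, 1] to [0, 1]."""
    return (n[0] * 0.5 + 0.5, n[1] * 0.5 + 0.5)

def decode_xy(enc):
    """Reconstruct z >= 0 from x and y."""
    x = enc[0] * 2.0 - 1.0
    y = enc[1] * 2.0 - 1.0
    return (x, y, math.sqrt(max(0.0, 1.0 - x * x - y * y)))

def max_angular_error_deg(encode, decode, samples=20000, seed=1):
    """Worst angle (degrees) between source and decoded normals over random samples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        length = math.sqrt(sum(c * c for c in v))
        if length < 1e-9:
            continue
        n = (v[0] / length, v[1] / length, abs(v[2]) / length)  # faces the viewer
        d = decode(tuple(quantize8(c) for c in encode(n)))
        dl = math.sqrt(sum(c * c for c in d))
        cos_angle = sum(a * b for a, b in zip(n, d)) / dl
        worst = max(worst, math.degrees(math.acos(min(1.0, max(-1.0, cos_angle)))))
    return worst
```

Calling max_angular_error_deg(encode_xy, decode_xy) gives one number per encoding; swapping in another encode/decode pair lets you compare methods on the same samples. Random sampling only gives a lower bound on the true worst case.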
@Pat: whoops, turns out Lambert Azimuthal is actually very good. In my earlier test I had the Z coordinate negated, so it had the errors accumulated towards the viewer, as opposed to away from him. I updated the article with the CryEngine 3 and Lambert Azimuthal methods.
@sean: stereographic-like projection with k=1.4 seems to produce quite low error (44.276 dB in my setup). Decoding math is:
fenc = enc*2-1;
float k = 1.4;
float q = dot(fenc,fenc);
n.z = (-q*k + sqrt(q+1-q*k*k)) / (q+1);
n.xy = fenc * (n.z + k);
However, that is quite expensive to decode (one rsq, two rcp, and a bunch of other ALU instructions). Spheremap and Lambert have similar error at a noticeably lower cost.
For any transform that maps the sphere to the circle, I would consider using a polar or concentric map in order to map the circle to the quad and use the entire range of your XY coordinates. That might add too much cost for little benefit, though.
I think it’s also important to realize that not all the normals in the sphere are possible. Depending on the FOV, there’s a spherical cone of normals that will never be used. For orthographic view projections, you can use a hemispherical map to encode the normals. For a 90 degree frustum, you have a ~60 degree cone.
It should be easy to take that into account in some of the projections, for example, in the stereographic projection it simply changes the scale factor that you use to fit the range of the projection in the unit square.
Assuming your cone has 60 degrees, then for k=1 the scale factor would be:
scale = cos(-60°) / (1 + sin(-60°)) = ~3.73
And the reconstruction would then be:
n.xy *= scale;
float denom = 2 / (1 + n.x*n.x + n.y*n.y);
n.xy = n.xy * denom;
n.z = denom - 1;
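To make the scale factor concrete, here is a Python sketch of the k=1 (stereographic) case with the cone limit folded in. The function names and the matching encode step are assumptions; only the scale formula and the reconstruction come from the snippet above.

```python
import math

def cone_scale(limit_deg=-60.0):
    """scale = cos(a) / (1 + sin(a)), with the angle a given in degrees."""
    a = math.radians(limit_deg)
    return math.cos(a) / (1.0 + math.sin(a))

def stereo_encode(n, scale):
    """Assumed inverse of the reconstruction: n.xy / (1 + n.z), divided by scale."""
    x, y, z = n
    return (x / (1.0 + z) / scale, y / (1.0 + z) / scale)

def stereo_decode(enc, scale):
    """Reconstruction as given: rescale, then invert the stereographic projection."""
    x = enc[0] * scale
    y = enc[1] * scale
    denom = 2.0 / (1.0 + x * x + y * y)
    return (x * denom, y * denom, denom - 1.0)
```

With this scale, a normal sitting exactly on the limit, (cos(-60°), 0, sin(-60°)), encodes to length 1, so every normal with a larger z fits inside the unit circle.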
Aras, the proper term for my “distorted spheremap (push sphere towards square)” suggestion is a Peirce quincuncial projection, http://en.wikipedia.org/wiki/Quincuncial_map
That was an interesting read. I was definitely surprised to see just how expensive spherical normals actually are. In my own project, I had been considering using them to make a compressed vertex format, similar to ATI’s Froblins demo. This notion was mostly from being told I needed to keep the data 32-byte aligned for best performance. Eventually, it became clear that this was more trouble than it was worth. :p
@Josh: the “expensive” bit is of course dependent on the context. Maybe you have some spare computation cycles to burn, and are limited by memory bandwidth. In that case it is worth compressing the data, even if it does feel “expensive”.
http://developer.amd.com/gpu_assets/01GDC09AD3DDStalkerClearSky210309.ppt
by S.T.A.L.K.E.R
Method #2: store X&Y&sign of Z, reconstruct Z
float2 pack_normal( float3 norm )
{
    float2 res;
    res = 0.5*( norm.xy + float2( 1, 1 ) );
    // the sign of Z is carried by the sign of the stored X
    res.x *= ( norm.z<0 ? -1.0 : 1.0 );
    return res;
}
float3 unpack_normal(float2 norm)
{
    float3 res;
    res.xy = ( 2.0 * abs(norm) ) - float2(1,1);
    // rebuild |Z| from X and Y, then restore its sign
    res.z = (norm.x < 0 ? -1.0 : 1.0) *
        sqrt(abs(1 - res.x*res.x - res.y*res.y));
    return res;
}
Wolfgang Engel’s blog describes a method for storing normals using partial derivatives. It is the storage method used by Insomniac Games (Ratchet & Clank, Resistance).
To encode:
dx = (-nx/nz);
dy = (-ny/nz);
To decode:
nx = -dx;
ny = -dy;
nz = 1;
normalize(n);
http://www.insomniacgames.com/tech/articles/1108/files/Ratchet_and_Clank_WWS_Debrief_Feb_08.pdf
Blog entry: http://diaryofagraphicsprogrammer.blogspot.com/2009/01/partial-derivative-normal-maps.html
There are interesting comments on the matter:
1. How do you get for example (1, 0, 0)? You can’t.
2. Partial derivatives can be used for quick normal calculation for height fields.
3. (Most interesting) Combining normal maps (for example a base normal map and a detail normal map) is simplified to a sum of the normals’ partial derivatives, followed by a normalization.
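The encode, decode and "sum then normalize" combination can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes nz > 0 throughout, which is exactly the limitation raised in comment 1: a normal with nz = 0 has no finite encoding.

```python
import math

def pd_encode(n):
    """dx = -nx/nz, dy = -ny/nz (requires nz != 0)."""
    nx, ny, nz = n
    return (-nx / nz, -ny / nz)

def pd_decode(d):
    """n = normalize(-dx, -dy, 1)."""
    dx, dy = d
    inv = 1.0 / math.sqrt(dx * dx + dy * dy + 1.0)
    return (-dx * inv, -dy * inv, inv)

def pd_combine(base, detail):
    """Combine two encoded normals by summing their partial derivatives."""
    return (base[0] + detail[0], base[1] + detail[1])
```

A flat detail normal (0, 0, 1) encodes to zero derivatives, so combining it with any base leaves the base unchanged; the normalization happens once, in pd_decode, after the sum.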
Quite a good compilation/research going on here!! Thanks a lot Mr. Aras.
@Alejandro: partial derivative normal maps are for encoding tangent space normal map textures. For general encoding of G-buffer normals, they should be about the same as Method 1 in my article (store X&Y, reconstruct Z).
Oh, my bad! You’re right. Thanks for clarifying.
Why are there negations in partial derivative normal maps on both the encode and decode side? Don’t they cancel out?
Hi Aras,
I implemented the three methods highlighted in green, and I ran into a little issue that I hope you would be able to clear up.
I assumed the “n” is the transformed normal (final normal used for deferred lighting).
In all three implementations, only ONE inside wall of a room is weird. Everything else is perfect. Lights still work, textures look great on floor, top, and other 3 walls.
The weirdness is black artifacts. Specifically for Stereographic projection, when I set scale to 5 or higher, the black stuff disappears.
Thank you for your time,
Phi