At the 2004 Game Developers Conference, Matt Pritchard (my coworker at the time at the now closed Ensemble Studios), John Brooks (CTO of Blue Shift Inc. - PS2/PS3 graphics guru) and I came out of the cold and gave a fairly rushed but groundbreaking 1 hour presentation on real-time deferred rendering techniques to a surprisingly large audience of approximately 300-400 people. At the time, deferred shading was a pretty much unknown real-time rendering technique, and whenever I brought the topic up with developers or video card vendors I was typically treated like I had three heads. (I'm a bit hardheaded, and a very independent sort of person, so I didn't really care much. I knew the idea worked because I had already shipped a fully deferred Xbox game by that time.)

I had no idea what the response would be to this presentation, or how much deferred rendering would eventually catch on in the industry. After the talk I went back to Ensemble Studios and continued plugging away on the graphics engine and tools for one of Ensemble's early prototype games named "Wrench", so I didn't put this presentation on the web or do anything special to make it easy to find. I didn't really realize that the work we were doing was pioneering new ground. The main alternative (the pass per object per light approach used in Doom 3) seemed laughably inefficient by comparison.

By this time, Atman Binstock (who moved on to RAD Game Tools) and I had already shipped a fully deferred shaded 3rd person Xbox 1 game named Shrek way back in 2001, while working at Sandbox Studios/Digital Illusions CE AB in London, ON. It was an Xbox launch title with very pretty graphics but little else. We didn't realize it, but Shrek Xbox was the first deferred shaded game. A while after shipping Shrek, and before this talk, I had already spent almost a year contracting for Microsoft's ATG (Advanced Technology Group - the team behind Xbox) researching deferred rendering on ATI's early prototype shader model 2.0 hardware, so I was pretty confident the approach had legs.

Rewind to A Bit of Deferred Shading History, Way Back in 2001

If it wasn't for Atman I wouldn't have started down this path in the first place. Atman originally suggested that we try deferred shading, and without his very deep math skills there's no way I could have gotten omnidirectional lights to work efficiently on the Xbox 1's powerful (for the time) but constraining GPU. At the time Atman suggested it, I already had a bunch of real-time rendering experience after writing the software and Direct3D 7 renderers and 3DS Max exporters for an annoying and quirky little 3D PC kids game named Matchbox Emergency Patrol. We figured the Xbox had these new amazing programmable shader units, so how hard could it be? I had read every NVidia white paper/presentation/etc. I could get my hands on about per-pixel lighting and normal mapping, especially every one written by Mark Kilgard at NVidia.

Rendering the various attribute buffers (what we called the "C", "N", etc. buffers) and computing a deferred directional light was easy, but computing proper and efficient deferred omni lights on Xbox 1 was extremely challenging. I had a deferred omni light solution which used the NV2A's vertex pipeline to process pixels, but at ~20 megaverts/sec. (at best) it was just too slow.
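For readers who weren't working on this stuff back then, the core split is easy to show in code: one pass writes per-pixel attributes (our "C" and "N" buffers), and a completely separate screenspace pass reads them back to do the lighting. Below is a tiny CPU-side C++ sketch of that split; it's purely illustrative (all names and values are made up for this post), not the actual Shrek or Gladiator code.

```cpp
// Minimal software illustration of deferred shading: a geometry pass fills
// attribute buffers, a separate screenspace pass consumes them for lighting.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

int main()
{
    const int kWidth = 4, kHeight = 4;

    // The "G-buffer": albedo (the "C" buffer) and a viewspace normal (the "N" buffer).
    std::vector<Vec3> albedo(kWidth * kHeight, Vec3{ 0.8f, 0.2f, 0.2f });
    std::vector<Vec3> normal(kWidth * kHeight, Vec3{ 0.0f, 0.0f, 1.0f });

    // Pass 1 (geometry pass): a real renderer would rasterize the scene once,
    // writing these attributes per pixel. Here they're just filled with constants.

    // Pass 2 (lighting pass): a single screenspace pass over the attribute
    // buffers, independent of how many objects were drawn in pass 1.
    const Vec3 lightDir{ 0.0f, 0.0f, 1.0f };  // directional light, towards the viewer
    for (int i = 0; i < kWidth * kHeight; ++i)
    {
        float nDotL = std::max(0.0f, Dot(normal[i], lightDir));
        Vec3 lit{ albedo[i].x * nDotL, albedo[i].y * nDotL, albedo[i].z * nDotL };
        if (i == 0)
            std::printf("first lit pixel: %.2f %.2f %.2f\n", lit.x, lit.y, lit.z);
    }
    return 0;
}
```

The important property is that the lighting cost scales with the number of lit pixels and lights, not with scene complexity times light count, which is why the pass-per-object-per-light alternative looked so inefficient to us.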
Early on I figured out how to alias the NV2A's depth/stencil surface to a 32bpp texture, so that the 24 bits of depth and 8 bits of stencil data could be read as regular RGBA color channels and processed in the texture or combiner units. The NV2A could fetch from a texture, perform three 3D or 4D dot products (at floating point precision as far as we could tell - not fixed point!) against the fetched/filtered results using interpolated texture coordinates, then use the transformed results to look up into another cubemap or 3D texture. (This corresponded to the texm3x3tex, etc. "instructions" in the Xbox's shader assembler.) The results could then be further processed in the powerful fixed point combiner units, where you could rotate the vector again, approximately normalize it, compute stuff like N.L or N.H, etc.

Atman figured out what vectors we needed to interpolate in order to properly unproject the depth values from the Z buffer into viewspace coordinates, then from there effectively transform the viewspace coords into normalized light space. Once we were in normalized light space we could then use the vector as texcoords to fetch from a precomputed 3D texture. This 3D texture lookup returned a (mostly) normalized vector in RGB and the omni light's attenuation value in A.

Once Atman figured out the equations I stayed up over 24 hours straight trying to coax the NV2A into the right texture+combiner setup to implement them. We both doubted it was actually possible (due to worries about precision and whether we could actually get a "1.0" into the dot products), but I kept at it. There was nothing like PIX available at the time, so I just had to keep running experiments and studying the output bits. At the time the only references I had were some NVidia papers and an Xbox Direct3D header with cryptic, undocumented combiner structs and enums. We couldn't use MS's early pixel "shader" assembler because it didn't support the obscure but crucial texture mode that we needed ("HILO_1") in order to get a "1.0" into the dot products. (Without the 1 we couldn't add in the translation column of the interpolated matrices.) I had to roll the whole combiner setup by hand. I first got the method working on a single axis, then visually verified the results in a single test room with a huge omni. Once I got one axis of the omni working and stable, the other two were trivial.

Shrek's omni lights required two separate screenspace passes because I couldn't fit the entire diffuse/spec lighting equation into a single combiner setup. I spent a lot of time working on minimizing the number of pixels that needed to be processed for each omni in the scene. Like most devs who start down the deferred shading path, I started with full-screen quads to get things working, but of course that was brutally slow. The final shipping game rendered 2D n-gons at constant Z depths that were guaranteed to conservatively surround each omni light's screenspace projection. (I didn't render 3D tessellated spheres or whatever; these 2D polygons were computed completely on the CPU.) We enabled Z-testing to discard pixels that couldn't possibly have a non-zero attenuation value. (Or for omni pixels that were behind walls and obviously wouldn't impact the scene - I don't remember anymore.) From memory, the effective omni fillrate was around 65 megapixels/sec. on Xbox 1, which felt kinda slow and limiting at 640x480, even at 30 Hz.
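To make that precomputed 3D texture a bit more concrete, here's a rough C++ reconstruction of how such a lookup volume could be generated offline. This is my own after-the-fact sketch, not the shipped Shrek code; the exact direction convention, bias, and falloff curve the game used may well have differed.

```cpp
// Builds an RGBA8 volume texture indexed by normalized light-space position:
// RGB = (roughly) normalized direction from the pixel towards the light center,
// A   = the light's attenuation at that distance.
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<uint8_t> BuildOmniLookupTexture(int size)
{
    std::vector<uint8_t> texels((size_t)size * size * size * 4);
    size_t ofs = 0;
    for (int z = 0; z < size; ++z)
    for (int y = 0; y < size; ++y)
    for (int x = 0; x < size; ++x)
    {
        // Texel center in normalized light space, [-1, 1]; 1.0 == the light's radius.
        float px = (x + 0.5f) / size * 2.0f - 1.0f;
        float py = (y + 0.5f) / size * 2.0f - 1.0f;
        float pz = (z + 0.5f) / size * 2.0f - 1.0f;

        float dist = std::sqrt(px * px + py * py + pz * pz);
        float inv = (dist > 0.0f) ? (1.0f / dist) : 0.0f;

        // Direction biased from [-1, 1] into [0, 1] so it fits in unsigned 8-bit channels.
        texels[ofs + 0] = (uint8_t)((-px * inv * 0.5f + 0.5f) * 255.0f + 0.5f);
        texels[ofs + 1] = (uint8_t)((-py * inv * 0.5f + 0.5f) * 255.0f + 0.5f);
        texels[ofs + 2] = (uint8_t)((-pz * inv * 0.5f + 0.5f) * 255.0f + 0.5f);

        // Simple linear falloff that hits zero at the light's radius; any
        // other falloff curve could be baked in here instead.
        float atten = (dist < 1.0f) ? (1.0f - dist) : 0.0f;
        texels[ofs + 3] = (uint8_t)(atten * 255.0f + 0.5f);

        ofs += 4;
    }
    return texels;
}

int main()
{
    std::vector<uint8_t> tex = BuildOmniLookupTexture(32);  // 32^3 RGBA8 volume
    std::printf("built %zu bytes of lookup data\n", tex.size());
    return 0;
}
```

At runtime the per-pixel normalized light-space position selects a texel, and the combiners can then use the fetched direction for N.L and the alpha channel as the attenuation term.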
To get all this technology into a shippable state, I wrote a custom real-time timeline profiler using the Xbox 1's asynchronous DPC (Deferred Procedure Call) GPU callback mechanism to measure when, and for how long, the CPU and GPU processed each major operation. I would take an RDTSC snapshot when the events started/stopped on the CPU, and do RDTSCs in the DPC callbacks. The latency of these callbacks was surprisingly low. Instead of the usual PIX-like horizontal timeline view, my CPU/GPU timeline profiler displayed each event in two vertical columns, with lines used to connect the CPU and GPU events so I could easily visualize the amount of parallelism between the two processors. This profiler lived completely within the game itself, so we had to use the Xbox 1's controller to zoom, pan, etc. I used this profiler to tune the shaders and rearrange the frame's rendering order to ensure the GPU was busy as much as possible.

Fast Forward Back to the GDC 2004 Talk

Anyhow, a few years later I wrote and gave a GDC talk based off this and my later Xbox 1 deferred shading work. In this talk I covered such topics as: the differences between deferred shading and deferred lighting (apparently rediscovered and called "Light Pre-Pass Rendering"), attribute compression, "G-Buffer" channels (called the "C Buffer" and "N Buffer" in the talk), depth buffer aliasing, normal compression/packing, and other related topics. (A tiny generic sketch of the packing idea appears at the end of this post.) The GDC Vault seems to be in flux and the old URL to the presentation doesn't work anymore, so I've placed a copy here:

I showed a neat Xbox 1 deferred lighting demo named "Gladiator" at the talk, which is referred to several times in the presentation. This demo showed what was probably the first and only single-scene-pass, attribute-compressing deferred shading renderer created on Xbox 1. (To put this in perspective, Xbox 1 did not support multiple render targets, and only had a series of complex fixed point combiners, not true pixel shaders as commonly understood today. Shrek had to use two full scene passes to output the same G-buffer data that the Gladiator demo could do in a single scene pass.) Here's the Xbox 1 attribute packing pixel shader used by Gladiator that was missing from the slides.

John Brooks at Blue Shift inspired me to try attribute compression, because he thought it was very inefficient and limiting that Shrek had to render the scene twice per frame. At this talk he demonstrated a PS2 deferred shading demo that used the PS2's VUs (Vector Units) to compute per-pixel lighting. I've placed a 5 mbps WMV encoded video of the demo here (warning: this file is ~100MB):

Here's the video on YouTube:

The interactive demo was actually shown running in real-time on an Xbox 1 devkit on the GDC floor in early 2003 at Blue Shift's booth. It was then recycled for my GDC 2004 talk.

Shrek, this presentation and the Gladiator deferred lighting demo were later cited by several articles and papers on deferred rendering:

- GPU Gems 2, Chapter 9: "Deferred Shading in S.T.A.L.K.E.R."
- "A bit more deferred – CryEngine3"
- "Interactive Massive Lighting for Virtual 3D City Models"
- Real-Time Rendering, "Deferred lighting approaches"
- "Hybrid Deferred Rendering", Marries van de Hoef
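Since attribute compression and normal packing come up repeatedly above, here's the tiny generic C++ sketch promised earlier: squeezing a unit normal plus an 8-bit material index into one 32-bit value, the sort of thing a G-buffer channel ends up doing when render target space is tight. To be clear, this is purely illustrative; it is not the Gladiator packing shader or the layout Shrek used.

```cpp
// Generic illustration of G-buffer attribute packing (not the actual
// Shrek/Gladiator layout): a unit normal goes into RGB with a 0.5 bias,
// and an 8-bit material index goes into A.
#include <cmath>
#include <cstdint>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Map [-1, 1] to an 8-bit unsigned value and back.
static uint8_t PackSNorm8(float v) { return (uint8_t)((v * 0.5f + 0.5f) * 255.0f + 0.5f); }
static float   UnpackSNorm8(uint8_t v) { return (v / 255.0f) * 2.0f - 1.0f; }

static uint32_t PackNormalMaterial(Vec3 n, uint8_t materialId)
{
    return (uint32_t)PackSNorm8(n.x) |
           ((uint32_t)PackSNorm8(n.y) << 8) |
           ((uint32_t)PackSNorm8(n.z) << 16) |
           ((uint32_t)materialId << 24);
}

static void UnpackNormalMaterial(uint32_t packed, Vec3* n, uint8_t* materialId)
{
    n->x = UnpackSNorm8((uint8_t)(packed & 0xFF));
    n->y = UnpackSNorm8((uint8_t)((packed >> 8) & 0xFF));
    n->z = UnpackSNorm8((uint8_t)((packed >> 16) & 0xFF));

    // Renormalize to undo the 8-bit quantization error.
    float len = std::sqrt(n->x * n->x + n->y * n->y + n->z * n->z);
    if (len > 0.0f) { n->x /= len; n->y /= len; n->z /= len; }

    *materialId = (uint8_t)(packed >> 24);
}

int main()
{
    Vec3 n{ 0.267f, 0.535f, 0.802f };  // approximately unit length
    uint32_t packed = PackNormalMaterial(n, 3);

    Vec3 out; uint8_t mat;
    UnpackNormalMaterial(packed, &out, &mat);
    std::printf("normal %.3f %.3f %.3f, material %u\n", out.x, out.y, out.z, mat);
    return 0;
}
```

A shipping renderer would of course pick the packing to match whatever the hardware's render target formats and shader units could cheaply decode, which on Xbox 1 meant working within the combiners' fixed point limits.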