Dynamic volumetric lighting and shadowing with temporal upsampling
Rendering realistic surfaces in real time is very challenging. During the last year there has been some awesome progress, and the results are already visible in the latest game productions. But this alone is not good enough: you end up with a look that is too perfect and clean, which in the end makes the final result appear synthetic.
That is one of the many reasons to also render volumetric lighting and shadowing effects, i.e. the result of light scattering as it travels through the air because of particles suspended in it, forming what is called a participating medium. I won't go into the details of how and why this phenomenon happens, as this is not the purpose of this article. If you are interested, please refer to the PBRT book or this good online dissertation chapter. Or maybe I'll write an article with details and code if there is great demand :)
So let's focus on this demo: rendering the lighting, scattering and shadowing effects of dynamic lights in real time. Basically, the work of  but with more details on temporal upsampling. I will present the results for a single point light, but any number of them could be accumulated. Also please note that, in this demo, I do a fullscreen pass per light for convenience when integrating the scattered light. The light geometry could be rasterised instead, or you could even use a tiled approach.
NB: I won't give detailed performance numbers, as all this has been done on a Mac with an HD3000, which does not have a GPU performance measurement tool. I am against using FPS for performance measurement, but I will use that here until I port the demo to a more interesting environment.
In the context of real-time games, volumetric effects have often been approximated using simple directional billboards or particles. A few games have featured very good volumetric effects from dynamic lights, such as Alan Wake using shadow maps or LBP2 using a world-space volume.
More recently, this effect has become more advanced. Many lights can now be rasterised, the volumetric look can be made more complex through particle insertion, and particles can even have scattered light applied to them to make sure they fit into the environment. Last but not least, high quality light shafts from the sun can also be achieved without relying on screen-space approximations.
There are three possibilities for developing a dynamic light scattering simulation: splatting, volume filling, or a mix of both. This demo focuses on splatting.
Rendering Volumetric Lighting and Shadowing
Here is an example of what we want to achieve:
This scene features a dynamic point light lighting the environment and generating light shafts due to the presence of a participating medium and occluders. This high quality, smooth scattering was achieved using 200 samples per pixel to be sure no under-sampling artefacts were visible. However, this is very expensive, especially at full resolution.
There are two techniques we can use to make that faster:
Taking fewer samples per pixel
Rendering at half resolution for a 4x theoretical boost
Using Noise and Bilateral Blur
This is what happens when only 10 samples are taken per pixel:
In this case, the binary shadow boundaries reveal how low our sampling frequency is: this is called signal under-sampling. It is all the more visible because shadows are an on/off effect rather than a smooth transition. Even with a smooth transition, too few samples will never be able to reconstruct a signal perfectly (depending on its frequency).
What can be done to avoid that? We can exploit the relative screen-space coherency of the sampled signal. The goal is to offset the rays randomly per pixel in order to produce noise. Nearby rays in screen space will then have sampled different parts of the signal we want to reconstruct. A final gather pass is then used to merge the nearby results in screen space and reconstruct a higher quality signal. Here is the result:
Now we have this picture where nearby rays hold information we want to merge. How do we merge this information? Typically, a blur would do it, since it merges nearby samples of the signal. Here is the result:
You can see that the signal is not reconstructed as you would expect around object silhouettes. One of the reasons is that the rays on either side of a silhouette have not sampled the same range of distances, so their data are not coherent and should not be merged (different distance range ==> different occlusion range ==> incoherent rays that should not be merged). To solve that problem, we must use a bilateral up-sampling kernel relying on the image depth information. Great, we are going to need one anyway because we want to render at half resolution and will need to up-sample: this win-win situation is explained in the next section.
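To make the under-sampling concrete, here is a minimal Python sketch (not the demo's shader code). It assumes a hypothetical view ray parameterised over [0, 1) with a single shadowed interval, ignores attenuation and phase function, and simply measures the lit fraction of the ray. The name `lit_fraction` and the interval values are made up for illustration.

```python
def lit_fraction(shadow_start, shadow_end, num_steps):
    """Estimate the lit fraction of a view ray with evenly spaced samples.

    The ray is parameterised over [0, 1) and is in shadow for
    shadow_start <= t < shadow_end (a binary, on/off visibility signal).
    """
    lit = 0
    for i in range(num_steps):
        t = (i + 0.5) / num_steps  # sample at the middle of each step
        if not (shadow_start <= t < shadow_end):
            lit += 1
    return lit / num_steps


# Neighbouring pixels see the shadowed interval slide smoothly along the ray.
# Its length stays 0.25, so the true lit fraction is always 0.75.
for edge in (0.31, 0.33, 0.36, 0.38, 0.41):
    coarse = lit_fraction(edge, edge + 0.25, 10)
    fine = lit_fraction(edge, edge + 0.25, 200)
    print(f"edge={edge:.2f}  10 steps: {coarse:.3f}  200 steps: {fine:.3f}")
```

With 10 steps the estimate can only change in increments of 0.1, so it jumps between neighbouring pixels even though the underlying signal barely changes. That jump is what shows up as banding on screen.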
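The jitter-then-gather idea from this section can be sketched in a few lines of Python (again, not the demo's shader code). The assumptions are loud ones: the ray is a hypothetical [0, 1) segment with one shadowed interval, the gather is a plain average over one noise tile with no depth test, and the offsets are stratified and shuffled rather than purely random so that the result is deterministic. All function names are made up for illustration.

```python
import random


def lit_fraction_jittered(shadow_start, shadow_end, num_steps, offset):
    """March a ray over [0, 1) with every sample shifted by `offset` steps."""
    lit = 0
    for i in range(num_steps):
        t = (i + offset) / num_steps  # offset in [0, 1) slides the samples
        if not (shadow_start <= t < shadow_end):
            lit += 1
    return lit / num_steps


def make_noise_tile(size=4, seed=0):
    """Build a size x size tile of per-pixel offsets in [0, 1).

    Assumption: offsets are evenly spread then shuffled, so that the tile
    as a whole covers one ray step uniformly.
    """
    count = size * size
    offsets = [(k + 0.5) / count for k in range(count)]
    random.Random(seed).shuffle(offsets)
    return [offsets[row * size:(row + 1) * size] for row in range(size)]


def gather_tile(shadow_start, shadow_end, num_steps, tile):
    """Average the noisy per-pixel estimates of one tile (the gather pass)."""
    values = [
        lit_fraction_jittered(shadow_start, shadow_end, num_steps, offset)
        for row in tile
        for offset in row
    ]
    return sum(values) / len(values)


tile = make_noise_tile()
print(gather_tile(0.33, 0.58, 10, tile))
```

Each pixel on its own still only takes 10 samples and is noisy, but the 16 pixels of the tile together have sampled 160 distinct positions along the ray, which is why the gathered value gets close to the high sample count result. A plain average like this one is exactly what breaks at silhouettes, hence the depth-aware kernel.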
Half Resolution Rendering and Bilateral Up-Sampling
Rendering at half resolution requires the following steps:
Downsample the original depth to half resolution using a min or max filter (I use max, but it can leave holes, so min would be better with lots of high-frequency occluders such as trees)
Render the scattering contribution at half resolution using the half resolution depth (accumulate every light's contribution)
Up sample the buffer
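The depth downsampling step can be sketched as follows in Python. This is only an illustration under stated assumptions: the depth buffer is a list of rows with even width and height, a larger value means farther away, and the function name `downsample_depth` is hypothetical.

```python
def downsample_depth(depth, use_max=True):
    """Reduce a depth buffer to half resolution with a 2x2 min or max filter.

    depth: list of rows of floats, with even width and height (assumption).
    use_max: keep the farthest (True) or nearest (False) depth of each block.
    """
    pick = max if use_max else min
    half_height = len(depth) // 2
    half_width = len(depth[0]) // 2
    return [
        [
            pick(
                depth[2 * y][2 * x],
                depth[2 * y][2 * x + 1],
                depth[2 * y + 1][2 * x],
                depth[2 * y + 1][2 * x + 1],
            )
            for x in range(half_width)
        ]
        for y in range(half_height)
    ]


# A thin near occluder (0.2) covering three of four pixels over a far wall (0.9).
full_res = [
    [0.2, 0.9],
    [0.2, 0.2],
]
print(downsample_depth(full_res, use_max=True))
print(downsample_depth(full_res, use_max=False))
```

The max filter keeps the far wall and drops the thin occluder from the half resolution depth, while the min filter keeps the occluder. This is the trade-off mentioned in the first step.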
Here is the result:
You can now see that sharp object silhouettes are maintained while all the noise is removed from the picture. One downside is that sharp volumetric shadowing features are lost (such as the very strong shadow edge coming from the teapot handle).
Getting the noise and bilateral blur kernel right
I have decided to go with a very specific setup to gather all coherent rays in screen space:
A bilateral up sampling kernel using 1 center tap and 8 side taps, 4 on each side of the center pixel
A bilateral filter is not separable, but I am still using it in a separable fashion: it works and it is good enough
I am using uniform weights because we want to gather as many coherent rays as possible. There is no justification for using a Gaussian blur filter here, even though it could be used. In my few experiments, quality was lower because taps far from the center had less weight, so fewer coherent rays were gathered to reconstruct the final scattered light values. This led to the noise pattern becoming visible (though this would of course be fixed with a wider blur kernel).
Since the gather is 9 taps wide, I use a 4x4 noise texture (at least twice the number of samples for a given frequency, to be sure no visual pattern is visible). This is enough to gather all the different versions of coherent rays
The quality/performance trade-off could be set up as a function of this filter kernel width. You will also need to play with the number of steps taken along each ray.
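Here is a rough Python sketch of the kernel setup listed above: 1 center tap plus 4 taps on each side, uniform weights, applied horizontally then vertically. The depth test is an assumption on my side (a hard threshold on the depth difference with the center tap, with a hypothetical `depth_threshold` parameter), and for brevity it filters at a single resolution instead of reading a half resolution buffer against full resolution depth.

```python
def bilateral_pass_1d(values, depths, radius=4, depth_threshold=0.1):
    """One 1D pass: uniform average of the taps whose depth matches the center."""
    result = []
    for center in range(len(values)):
        total = 0.0
        count = 0
        for tap in range(center - radius, center + radius + 1):
            if tap < 0 or tap >= len(values):
                continue  # skip taps falling outside the buffer
            if abs(depths[tap] - depths[center]) <= depth_threshold:
                total += values[tap]
                count += 1
        result.append(total / count)  # the center tap always passes, so count >= 1
    return result


def bilateral_blur(image, depth, radius=4, depth_threshold=0.1):
    """Apply the 1D pass on rows, then on columns (separable approximation)."""
    rows = [
        bilateral_pass_1d(v, d, radius, depth_threshold)
        for v, d in zip(image, depth)
    ]
    columns = [
        bilateral_pass_1d(v, d, radius, depth_threshold)
        for v, d in zip(zip(*rows), zip(*depth))
    ]
    return [list(row) for row in zip(*columns)]


# Noisy scattering values across a silhouette: near object (depth 1), far wall (depth 9).
noisy = [0.0, 1.0, 0.0, 1.0, 5.0, 7.0, 5.0, 7.0]
depths = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
print(bilateral_pass_1d(noisy, depths))
```

The noise is averaged out on each side of the silhouette, but values never leak across it because taps with a different depth are rejected.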
Also, in this demo I have used a simple random noise. I still have to try other types of noise such as the one presented in .
Up the Quality: Temporal Up-Sampling
So we now get very good quality light scattering from dynamic lights! But there are some corner cases to be aware of:
Dynamic lights can generate flickering shadows when moving. This is especially noticeable when using low resolution shadow maps
When sampling a signal, you need to sample at a rate of at least twice its highest frequency. In our case, high frequency geometry can thus result in bad signal reconstruction due to high frequency shadows along the view ray.
The following screenshot shows that a high frequency shadow map really messes up the scattered light estimation. The left side is fine but, on the right, the high frequency tall and thin spots make it hard to evaluate the correct scattering contribution with as few as 10 samples per ray.
Even when applying the bilateral blur algorithm, you can still see artifacts:
One obvious way to solve that is to increase the number of samples, but this scales the cost up linearly. A smarter way is to use temporal up-sampling, similarly to anti-aliasing techniques (e.g.  or SMAA). It is quickly presented in  for this context but no details are given. One could think of using a simple blend with the previous frame but, as you can see in the next screenshot, this is very ugly when the camera is moving:
The correct temporal upsampling result is visible in the following screenshot! It definitely looks smoother and the temporal stability of moving lights is also greatly increased. (see video further down)
Temporal Acceptance Factor
Temporal upsampling effectively works by reusing the result from the previous frame in order to accumulate even more coherent samples. You basically have to reproject your current scattering estimation and combine it with the one from the previous frame. However, when the camera is moving, some re-projected values will be incorrect because occluders moving in view space reveal new parts of the scene. This is shown in the next picture:
The red areas are marked as containing invalid scattering data for the current frame (as compared to the previous frame buffer data). So in this case the previous frame data cannot be re-used: only the newly evaluated scattered light will be used. The greener it gets, the higher the acceptance factor, and the more the results from the previous frame can be used. Here are some details on how this is evaluated:
P is the current pixel position in world space (use the current clip position and depth, the inverse projection matrix and the inverse camera transform)
P is projected into the clip space of the previous frame's camera, giving P'.
The previous frame depth is sampled at P' and used to reconstruct the world space position P'' as seen by the previous camera.
The distance between P and P'' is used to compute an acceptance factor. I tried a boolean selection, but I was not happy with the binary results I was getting and it prevented smooth transitions (more about this further down). In the end I am using exp(-length(P-P'')*scale) as my acceptance value A.
A is thus not a boolean: it "progressively" rejects samples based on their world space distance. I have found that, for volumetrics, it gives a smoother result when moving. This may be a personal preference in the case of volumetric light and shadow.
The previous scattered light value is then combined with the new one using a blend factor going from 0 (red) to W (green). In my case I set W to 0.2.
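The acceptance factor above is easy to sketch in Python. The `acceptance` function follows the exp(-length(P-P'')*scale) expression directly. The blend is more of a guess: it assumes the weight given to the previous frame grows linearly with A up to a cap, and the parameter name `max_history_weight` and the values used below are placeholders, not the demo's actual settings.

```python
import math


def acceptance(p_current, p_previous, scale):
    """Continuous acceptance factor A = exp(-length(P - P'') * scale).

    p_current: world space position P of the current pixel.
    p_previous: world space position P'' reconstructed from the previous frame.
    """
    return math.exp(-math.dist(p_current, p_previous) * scale)


def temporal_blend(new_value, previous_value, accept, max_history_weight):
    """Blend the new estimate with the reprojected one.

    Assumption: the history weight goes from 0 (rejected) up to
    max_history_weight (fully accepted), scaled by the acceptance factor.
    """
    history_weight = accept * max_history_weight
    return new_value * (1.0 - history_weight) + previous_value * history_weight


# Same surface point in both frames: the history is fully accepted.
a_same = acceptance((1.0, 2.0, 3.0), (1.0, 2.0, 3.0), scale=4.0)
# Disocclusion: the previous frame saw something 5 units away from P.
a_far = acceptance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), scale=4.0)
print(a_same, a_far)
print(temporal_blend(1.0, 0.0, a_same, max_history_weight=0.5))
print(temporal_blend(1.0, 0.0, a_far, max_history_weight=0.5))
```

Because A falls off smoothly with distance, a pixel whose reprojected position is only slightly off still keeps part of its history instead of being rejected outright.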
This is great, but if the camera stands still the quality is never going to improve. My way of dealing with this is to randomly translate the noise pattern applied per pixel. Each frame thus has a different noise pattern, and the samples accumulate and blend to finally form a smooth picture.
I tried using only linearised depth buffer data and a depth comparison: this is in fact a bad choice. In this case, pixels with a constant depth form a plane in front of the camera (not a sphere). As a result, far pixels at the edges of the screen are rejected more than those at the center of the screen.
I have found that combining results post up-sampling blur was giving a better result. You could say this is post-bilateral-up-sampling-temporal-up-sampling :)
Pixels on the ground plane are a big problem when far away: between frames, their depth can vary a lot for any camera movement, even a small one, due to rasterisation imprecision/aliasing. I have found that the continuous exponential acceptance factor helps a lot in this case, but I am curious what else could be done (apart from MSAA depth). This problem is illustrated in the next picture: far away pixels are not accepted for some camera movements because the difference in depth or reconstructed position is too large for the same pixel due to rasterisation imprecision (one way to solve that issue is to increase the depth/distance acceptance threshold, but this could then result in too many reprojection artefacts and trails behind moving objects):
Of course, this temporal upsampling approach only helps us push the boundaries until another, higher frequency shadow map beats us. But at least we have a good head start now. :)
Number of Samples
A summary of how many samples are accumulated to reconstruct the scattered light contribution:
Simple scattering buffer: 10 samples per pixel
Bilateral upsampling: 1 center + 8 side taps, horizontally and vertically: 9*9*10 = 810 samples per pixel at maximum
Temporal up-sampling: for W=0.2, let's assume we accumulate only 1/0.2=5 frames (even though this is not correct, as this is an exponential accumulation and so potentially an infinite one). That would bring the number of samples per pixel up to 4050!
Even though more important details would need to be handled, such as transparent and animated objects, this shows that there is no reason not to have such effects in games that go for complex visuals.
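The arithmetic above, plus a quick check of the "5 frames" approximation, can be written as a small Python sketch. It assumes W is the weight given to the newest frame in an exponential moving average, which is what the 1/W estimate implies. The function names are hypothetical.

```python
def samples_per_pixel(steps_per_ray, taps_per_axis, frames):
    """Upper bound on the samples merged into one pixel."""
    spatial = steps_per_ray * taps_per_axis * taps_per_axis  # separable gather
    return spatial * frames


def frame_weights(new_frame_weight, num_frames):
    """Weight of each of the last num_frames frames in an exponential average.

    Index 0 is the newest frame; each older frame is scaled by
    (1 - new_frame_weight) once more.
    """
    return [
        new_frame_weight * (1.0 - new_frame_weight) ** age
        for age in range(num_frames)
    ]


print(samples_per_pixel(10, 9, 1))   # bilateral gather only
print(samples_per_pixel(10, 9, 5))   # with 5 accumulated frames
weights = frame_weights(0.2, 5)
print(weights, sum(weights))
```

The last 5 frames only account for 1 - 0.8^5, roughly 67%, of the final value. The remaining third comes from older frames, which is the "potentially infinite accumulation" mentioned above.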
Here is a list of things that could be added in this demo. I did not spend time on that as it was not the point of this demo:
Better scattering with phase function and scattering coefficient (wavelength dependent)
Take moving objects into account using a motion buffer
Same type of acceptance based on color as in 
Better moving average using a sample count (instead of the per frame contribution factor used here)
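For the last item, a moving average driven by a sample count could look like the Python sketch below. This is not in the demo: it is one possible way to do it, where each pixel would store a running mean and a count, the count would be reset to 0 when the history is rejected, and an optional hypothetical `max_count` cap keeps the average responsive to change.

```python
def accumulate(mean, count, new_value, max_count=None):
    """Fold one new sample into a running mean.

    mean, count: the pixel's history (use count = 0 after a rejection).
    max_count: optional cap; once reached, the update behaves like an
    exponential average with weight 1 / max_count for the new sample.
    """
    count += 1
    if max_count is not None:
        count = min(count, max_count)
    mean += (new_value - mean) / count
    return mean, count


mean, count = 0.0, 0
for sample in (2.0, 4.0, 6.0):
    mean, count = accumulate(mean, count, sample)
print(mean, count)
```

Unlike a fixed per-frame contribution factor, the first sample after a reset fully replaces the stale history, and every following sample gets an equal share until the cap is reached.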
You can add some noise to make the participating media look complex:
A visually interesting bug I had in GLSL, caused by an uninitialised out value from a GLSL function:
Demo and source code
The source code is provided as is, with an Xcode project that compiles on a Mac and runs at least on an Intel HD3000. I did not spend time cleaning it, so all my test code is still inside. Please tell me if you find any mistakes! Thanks.
Let me know if you improve the code, I could integrate it :)