Histogram equalization is an algorithm used to increase picture contrast. It is used in several vision methods to allow a better detection of image features.
This technique can take a lot of time to compute on the CPU so it is valuable to implement it on the GPU. The GPGPU algorithm I have propose here is inspired of some existing work I know. I have implemented two histogram equalization technics : a uniform histogram equalization and an Adaptative Histogram Equalization (AHE).
Uniform Histogram Equalization First, to generate the picture histogram, I use the scattering technique describe in the ATI paper [1]. Then, I compute the cumulated version of this histogram by using the Summed Area Table algorithm describe in [2] on only one axis. Thus, the complexity of the cumulation algorithm is Log(N) where N is the number of bins in the histogram. Finally, the cumulated histogram is used as a 1D function which for an intensity in the original picture maps the final intensity in the final picture.
Since the picture is composed of many pixels, we need an appropriate internal format for the histogram texture to avoid overflow and accuracy errors. [2] discuss this point very well. Finally, due to the number of pixel, I chose to use 32bits floating number (After testing, 16bit floating point numbers were subjects to accuracy error).
Adaptive Histogram Equalization AHE algorithm divide the picture in several block. For each of these blocks I compute an histogram using the scattering process [1]. All histograms of all blocks are computed in a single pass. These histograms are stored horizontally as a list in a 2D texture. Since in my case I use only 16*16 pixel blocks (256 pixels), I can use a 8bit internal format for the histograms texture. Finally, I compute the cumulated histograms by applying the same algorithm as [2] on only one axis and for the whole histograms texture. Thus all cumulated histograms are computed in parallel.
The final picture could have been computed by applying the histogram equalization algorithm per block. However, this would results in visible seams between each block. To avoid this problem and get a perfect AHE algorithm, I bilinearly interpolate the histogram of the four neighbouring block (considering their center) to get a equalized histogram per pixel (in fact, I only interpolate the four mapped values).
Performance Concerning the performance, they are really good but only for G80 architecture (considering nVidia CG only). On older hardware, the vertex texture fetch operation needed during the scattering process cost too much (could be improved using R2VB instead [1]) and 32 bit foating point internal format are not hardware accelerated.
Download source and executable
References :


Topleft : original picture Topright : uniform histogram equalization (256 bins) Bottomleft : picture histogram Bottomright : cumulated histogram

Same as on the left but the histogram is composed of 32 bins. It results in color reduction.



Topleft : original picture (128x128) Topright : Adaptive Histogram Equalization without bilinear filtering of blocks histogram Bottombottom : horizontal list of blocks histogram Bottomup : horizontal list of cumulated blocks histogram

Same as on the left but blocks histogram are now bilinear filtered



Same as above with another source picture (256x256).

I hope that you like this GPGPU algorithm! If you are interested, you can improve my demo :
 (easy) Currently, the AHE only work for square pictures and block. Change the algorithm to accept rectangular picture and block.
 (normal) Improve the AHE to a CLAHE (Contrast Limited Adaptative Histogram Equalization) by adding a clip and redistibute operation on the histogram as well as a Rayleigh distribution instead of a uniform one. This have to be done before the cumulated histogram is computed.
REM: a friend of mine has already implemented this two points. ;)
