[ Original blog post battleworldrpg.net Mar 17 2013 ]
The biggest reason for refactoring the rendering engine is to make room for the new shader system.
Today I managed to set up the 3D skybox system once again and added a few render buffers, particularly one that is used for post-processing, a handy technique for mangling your rendered scene so it can look pretty.
One such post-processing effect is anti-aliasing, and the currently popular technique for it is fast approximate anti-aliasing, FXAA.
Anti-aliasing is the technique of smoothing out a 3D render so the pixels on the screen are less noticeable. FXAA is a quick approximation of what that smoothing should look like.
Here is Battle World's FXAA (click to open in full size).
On the left is no anti-aliasing, on the right is FXAA. Look at the corners where the darker ceiling meets the lighter walls.
FXAA is often praised for its performance. It comes with no memory cost (unless you count the frame-buffer, but I have it there anyway for other special effects) and can be slapped over any scene. In fact, you can see I only lose ~4 frames-per-second in this scene.
I have experienced problems with FXAA on mobile devices. Their GPUs are pretty weak compared to the AMD HD 6970 on my primary development machine, so the speed hit is more apparent on devices such as the iPhone, iPad and Tegra 3 chips. So far only the Adreno 320 can churn out over 60 frames per second with FXAA, and that is probably due to its unique chip layout.
So what's going on? With FXAA the processing runs in the GPU's fragment shader across the entire resolution of the buffer, which is quite a lot of work. On a 1920x1080 buffer the GPU has to traverse 2,073,600 fragments (fragments are essentially pixels), and reading a pixel is the slowest part of the process. FXAA reads pixel data 9 times for each pixel, so that's 18,662,400 read operations per frame! Good job PC GPUs are powerful enough to handle this.
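For anyone who wants to check the numbers, the arithmetic can be written out as below. The last line is only a rough illustration: it assumes a steady 60 frames per second, which is my own round figure and not a measurement from this scene.

```latex
\begin{align*}
1920 \times 1080 &= 2{,}073{,}600 \ \text{fragments per frame} \\
2{,}073{,}600 \times 9 &= 18{,}662{,}400 \ \text{texture reads per frame} \\
18{,}662{,}400 \times 60 &= 1{,}119{,}744{,}000 \ \text{texture reads per second (assuming 60 fps)}
\end{align*}
```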
One improvement to FXAA would be to somehow offload the calculations to the vertex shader. There are only 6 vertices for the frame buffer (2 pairs overlap), far fewer than the 2 million plus fragments to traverse. At the moment, though, there are no obvious parts of FXAA's operations that can be offloaded to the vertex shader.
Here's a download of the shader as a function for GLSL. Remember to define the precision settings, then just drop this function above your screen-shader and run it as "gl_FragColor = vec4( fxaa( INPUT_TEXTURE, TEXTURE_UV_VARYING, INVERSE_SCREEN_RESOLUTION ), 1.0 );"
FXAA GLSL Fragment Shader
The INVERSE_SCREEN_RESOLUTION is simply 1.0 / screenResolution[width/height]; it is the rough size of one fragment in the OpenGL window.
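As a rough sketch of how the call above might sit inside a screen-shader, something like the following should work in GLSL ES. The names uScene, uInverseResolution and vUv are made up for illustration, the inverse resolution is assumed to be a vec2, and the fxaa() body here is only a pass-through placeholder standing in for the downloaded function, with a signature guessed from the call.

```glsl
// Precision has to be declared before any float is used in a GLSL ES fragment shader.
precision mediump float;

uniform sampler2D uScene;         // hypothetical name: the post-processing colour buffer
uniform vec2 uInverseResolution;  // hypothetical name: vec2(1.0 / width, 1.0 / height)
varying vec2 vUv;                 // hypothetical name: UV passed in from the vertex shader

// Placeholder with an assumed signature. Replace this with the downloaded fxaa() function.
vec3 fxaa(sampler2D tex, vec2 uv, vec2 inverseResolution) {
    return texture2D(tex, uv).rgb; // pass-through, does no smoothing at all
}

void main() {
    gl_FragColor = vec4(fxaa(uScene, vUv, uInverseResolution), 1.0);
}
```

With the placeholder in place the scene should render unchanged, which is a handy way to check the buffer and UVs are wired up before swapping in the real function.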
The original source kept these as constants:
float FXAA_SPAN_MAX = 8.0;
float FXAA_REDUCE_MUL = 1.0/8.0;
float FXAA_REDUCE_MIN = (1.0/128.0);
But I highly recommend you work out their values and write them directly into the shader source once you've finished experimenting with them.
To put them to use, just paste those 3 lines at the top of the shader file.
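If it helps, here is one way the three values could look once the divisions are worked out. This is only my reading of "moving them into the shader source": it keeps the names and marks them const so the compiler can fold them in, rather than replacing each use by hand.

```glsl
// Same three tuning values with the divisions already evaluated.
const float FXAA_SPAN_MAX   = 8.0;
const float FXAA_REDUCE_MUL = 0.125;      // 1.0 / 8.0
const float FXAA_REDUCE_MIN = 0.0078125;  // 1.0 / 128.0
```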