Graphics Pipeline Overview

These days I have been learning graphics again. To make it stick, I try to give an overview of the graphics pipeline in this article. Shame on me for forgetting while reading… and for my poor English.

This article is for novices who do not have much graphics knowledge but understand the rough meaning of rasterization, MVP matrices, shaders, and UV mapping. It also helps if the reader is familiar with Unity.

With the above assumptions in mind, let’s begin the graphics pipeline.

Fig1. graphics-pipeline-overview (pictures from [1]/[2])

Look at the middle part of Fig1; the workflow is as follows.

Step1. Setting up the scene.
Before we begin rendering, we must set several options that apply to the entire scene. For example, we need to set up the camera; more specifically, that means picking a point of view in the scene from which to render it and choosing where on the screen to render it. We also need to select lighting and fog options and prepare the depth buffer.

If you have used Unity, this is easy to understand: you put the camera in the proper place, set the lighting properties, and change the aspect ratio.
Fig2. Unity initial interface (from [3])

Step2. Visibility determination.
Once we have a camera in place, we must then decide which objects in the scene are visible. In Unity, the most direct form of this is the checkbox on the Inspector panel that makes an object visible or not.
Fig3. Unity inspector
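Beyond a manual checkbox, visibility is usually decided by testing a bounding volume against the view frustum. Below is a minimal sketch of such a test with a bounding sphere. It assumes the six frustum planes are already available with unit-length normals pointing into the frustum; `Plane`, `Sphere`, and `isPotentiallyVisible` are illustrative names, not an engine API.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with unit normal n pointing
// into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Conservative test: returns false only when the sphere lies entirely
// behind at least one of the six frustum planes.
inline bool isPotentiallyVisible(const std::array<Plane, 6>& frustum,
                                 const Sphere& s) {
    for (const Plane& pl : frustum) {
        if (signedDistance(pl, s.center) < -s.radius) {
            return false;
        }
    }
    return true;
}
```

The test is conservative on purpose: it may keep an object that is slightly outside near a frustum corner, but it never throws away something that is actually visible.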

Step3. Setting object-level rendering states.
Each object may have its own rendering options. We must install these options into the rendering context before rendering any primitives associated with the object. The most basic property associated with an object is its material, which describes the surface properties of the object.

In Unity, the material defines how the surface should be rendered by including references to the textures it uses, tiling information, color tints, and so on. The available options for a material depend on which shader the material is using.
Fig4. Unity material

Step4. Geometry generation/delivery.
The geometry is actually submitted to the rendering API. Typically, the data is delivered in the form of triangles: either as individual triangles, or as an indexed triangle mesh, a triangle strip, or some other form.
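To make the difference between those forms concrete, here is a small sketch that expands an indexed mesh and a triangle strip into individual triangles. The types and function names (`Tri`, `expandIndexed`, `expandStrip`) are made up for illustration, and the strip rule used here (flip the first two vertices on odd triangles to keep the winding consistent) is the common convention, not something taken from a particular API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
using Tri = std::array<Vec3, 3>;

// Indexed mesh: every three indices select the corners of one triangle,
// so shared vertices are stored only once.
std::vector<Tri> expandIndexed(const std::vector<Vec3>& vertices,
                               const std::vector<unsigned>& indices) {
    assert(indices.size() % 3 == 0);
    std::vector<Tri> out;
    out.reserve(indices.size() / 3);
    for (std::size_t i = 0; i < indices.size(); i += 3) {
        out.push_back(Tri{vertices[indices[i]],
                          vertices[indices[i + 1]],
                          vertices[indices[i + 2]]});
    }
    return out;
}

// Triangle strip: triangle k uses vertices k, k+1, k+2. On odd k the
// first two vertices are swapped so the winding order stays consistent.
std::vector<Tri> expandStrip(const std::vector<Vec3>& strip) {
    std::vector<Tri> out;
    for (std::size_t k = 0; k + 2 < strip.size(); ++k) {
        if (k % 2 == 0) {
            out.push_back(Tri{strip[k], strip[k + 1], strip[k + 2]});
        } else {
            out.push_back(Tri{strip[k + 1], strip[k], strip[k + 2]});
        }
    }
    return out;
}
```

A quad needs 6 vertices as individual triangles, but only 4 vertices plus 6 indices as an indexed mesh, or just 4 vertices as a strip.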

If you have heard of 3ds Max or Maya, you will get the idea. The artists create the model and export it as an .obj file; we programmers load the model into RAM and then get the triangle data. You can do this with obj_loader.h, and then get the triangle data like this:

    // Triangle, Vector2f/3f/4f, obj_path and draw() are defined elsewhere in the project.
    std::vector<Triangle*> TriangleList;
    objl::Loader Loader;
    bool loadout = Loader.LoadFile(obj_path);
    if (loadout) // only read the meshes if the file was loaded successfully
    {
        for (const auto& mesh : Loader.LoadedMeshes)
        {
            // Every three consecutive vertices form one triangle
            for (size_t i = 0; i + 2 < mesh.Vertices.size(); i += 3)
            {
                Triangle* t = new Triangle(); // remember to delete these after drawing
                for (int j = 0; j < 3; j++)
                {
                    const auto& v = mesh.Vertices[i + j];
                    t->setVertex(j, Vector4f(v.Position.X, v.Position.Y, v.Position.Z, 1.0));
                    t->setNormal(j, Vector3f(v.Normal.X, v.Normal.Y, v.Normal.Z));
                    t->setTexCoord(j, Vector2f(v.TextureCoordinate.X, v.TextureCoordinate.Y));
                }
                TriangleList.push_back(t);
            }
        }
    }
    draw(TriangleList);

In Unity, the engine does all of this for us.

Step5. Vertex-level operations.
Once we have the geometry in some triangulated format, a number of operations are performed at the vertex level. The most important one is the transformation of vertex positions from modeling space into camera space and then clip space.

In Unity, this operation is performed by a user-supplied microprogram called the vertex shader, like this:

    struct a2v
    {
        float4 vertex : POSITION;
        float3 normal : NORMAL;
        float4 tangent : TANGENT;
        float4 texcoord : TEXCOORD0;
    };
    struct v2f
    {
        float4 pos : SV_POSITION;
        float4 uv : TEXCOORD0;
        float3 lightDir: TEXCOORD1;
        float3 viewDir : TEXCOORD2;
    };
    v2f vert(a2v v)
    {
        v2f o;
        o.pos = UnityObjectToClipPos(v.vertex); // transform the vertex positions from modeling space into clip space
        ...

        return o;
    }

Though Unity has encapsulated the transformation function for us, there is still a lot to write about. We all know that the models artists give us are in model space, so how do we transform them to world space, camera space (view space), clip space, and screen space? How do we derive the matrices (MVP)? What are the coordinate differences among OpenGL, DirectX, and Unity? I will describe those in the Appendix :) Actually, the details confused me for a long time; if you feel the same, don’t worry. Just go ahead.

After we have transformed the triangles into clip space, any portion of a triangle outside the view frustum is removed by the process known as clipping. This is where the work of the MVP matrix ends. Once we have a clipped polygon in 3D clip space, we project its vertices (divide by w) and map them to 2D screen-space coordinates of the output window; this is where the viewport matrix is used.
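The last two steps of that paragraph fit in a few lines. This is a minimal sketch assuming the OpenGL convention, where a clip-space point is inside the frustum when -w <= x, y, z <= w and normalized device coordinates run from -1 to 1. The function names are illustrative, and real clipping cuts triangles against the planes rather than only testing points.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// A clip-space point is inside the view volume (OpenGL convention)
// when -w <= x, y, z <= w.
inline bool insideClipVolume(const Vec4& c) {
    return c.w > 0.0f &&
           -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           -c.w <= c.z && c.z <= c.w;
}

// Perspective divide (clip space -> NDC) followed by the viewport
// transform (NDC [-1, 1] -> pixel coordinates, depth -> [0, 1]).
inline Vec3 clipToScreen(const Vec4& c, int width, int height) {
    assert(c.w != 0.0f);
    const float ndcX = c.x / c.w;
    const float ndcY = c.y / c.w;
    const float ndcZ = c.z / c.w;
    return { (ndcX + 1.0f) * 0.5f * width,
             (ndcY + 1.0f) * 0.5f * height,
             (ndcZ + 1.0f) * 0.5f };
}
```

For example, the clip-space point (0, 0, 0, 2) lands in the middle of an 800x600 window at (400, 300) with depth 0.5.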

Step6. Rasterization.
Once we have a clipped polygon in screen space, it is rasterized. Rasterization refers to the process of selecting which pixels on the screen should be drawn for a particular triangle; interpolating texture coordinates, colors, and lighting values that were computed at the vertex level across the face for each pixel; and passing these down to the next stage for pixel (fragment) shading. The pseudocode is as follows:

    // ...get the TriangleList

    for (const auto& t : TriangleList)
    {
        // ...apply the mvp transform

        // ...apply the viewport transform

        rasterize(t);
    }

    rasterize(t)
    {
        // ...get the bounding box of the triangle
        for (x = x_min; x <= x_max; x++)
        {
            for (y = y_min; y <= y_max; y++)
            {
                // ...skip the pixel (x, y) if it is not inside triangle t
                // ...interpolate the depth/color/normal/texcoords/shadingcoords
                if (depth < z_buffer[x, y]) // closer than what is stored, so visible
                {
                    z_buffer[x, y] = depth;
                    // ...compute the color (texture/lighting...) (pixel shading)
                    setpixel(x, y, color);
                }
            }
        }
    }

Attention: why do we need the shadingcoords in the code above? Because the variables x and y are in screen space, but the shading calculation should be done in world space, view space, or clip space, so we also interpolate a position in that space for every pixel.
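The two elided steps in the pseudocode, the inside test and the interpolation, are often done together with barycentric coordinates. Below is a minimal sketch of that idea which only fills a depth buffer. It samples at pixel centers and interpolates linearly in screen space (no perspective correction); all names (`ScreenVertex`, `edge`, `barycentric`, `rasterizeDepth`) are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };
struct ScreenVertex { Vec2 pos; float depth; };

// Twice the signed area of triangle (a, b, p); the sign tells on which
// side of edge ab the point p lies.
inline float edge(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Barycentric weights of p with respect to (v0, v1, v2).
// Returns false when p is outside or the triangle is degenerate.
inline bool barycentric(const Vec2& v0, const Vec2& v1, const Vec2& v2,
                        const Vec2& p, std::array<float, 3>& w) {
    const float area = edge(v0, v1, v2);
    if (area == 0.0f) return false;
    w[0] = edge(v1, v2, p) / area;
    w[1] = edge(v2, v0, p) / area;
    w[2] = edge(v0, v1, p) / area;
    return w[0] >= 0.0f && w[1] >= 0.0f && w[2] >= 0.0f;
}

// Rasterizes one screen-space triangle into a width*height depth buffer
// and returns how many pixels passed the depth test.
int rasterizeDepth(const ScreenVertex& a, const ScreenVertex& b,
                   const ScreenVertex& c, int width, int height,
                   std::vector<float>& zbuf) {
    const int xMin = std::max(0, static_cast<int>(
        std::floor(std::min({a.pos.x, b.pos.x, c.pos.x}))));
    const int xMax = std::min(width - 1, static_cast<int>(
        std::ceil(std::max({a.pos.x, b.pos.x, c.pos.x}))));
    const int yMin = std::max(0, static_cast<int>(
        std::floor(std::min({a.pos.y, b.pos.y, c.pos.y}))));
    const int yMax = std::min(height - 1, static_cast<int>(
        std::ceil(std::max({a.pos.y, b.pos.y, c.pos.y}))));

    int written = 0;
    for (int y = yMin; y <= yMax; ++y) {
        for (int x = xMin; x <= xMax; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f};  // sample at the pixel center
            std::array<float, 3> w{};
            if (!barycentric(a.pos, b.pos, c.pos, p, w)) continue;
            // The same weights would interpolate color, normal, uv, etc.
            const float z = w[0] * a.depth + w[1] * b.depth + w[2] * c.depth;
            float& stored = zbuf[static_cast<std::size_t>(y) * width + x];
            if (z < stored) {
                stored = z;
                ++written;
            }
        }
    }
    return written;
}
```

Dividing by the signed area makes the inside test work for both clockwise and counter-clockwise triangles, which keeps the sketch independent of the winding convention.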

In Unity, rasterization is mostly done by the engine, but we can control the viewport to adapt the game to platforms with different resolutions, and control the shader part to get more amazing effects.

Step7. Pixel (fragment) shading.
We compute a color for the pixel, a process known as shading. The innocuous phrase “compute a color” is the heart of computer graphics! In Unity, we write the fragment shader to compute the pixel colors under different lighting models. A code snippet from [3] follows.

    fixed4 frag(v2f i) : SV_Target
    {
        fixed3 tangentLightDir = normalize(i.lightDir);
        fixed3 tangentViewDir = normalize(i.viewDir);
        // Get the texel in the normal map
        fixed4 packedNormal = tex2D(_BumpMap, i.uv.zw);
        fixed3 tangentNormal;

        // If the texture is not marked as "Normal map"
        //tangentNormal.xy = (packedNormal.xy * 2 - 1) * _BumpScale;
        //tangentNormal.z = sqrt(1.0 - saturate(dot(tangentNormal.xy, tangentNormal.xy)));
        // Or mark the texture as "Normal map", and use the built-in function
        tangentNormal = UnpackNormal(packedNormal);
        tangentNormal.xy *= _BumpScale;
        tangentNormal.z = sqrt(1.0 - saturate(dot(tangentNormal.xy, tangentNormal.xy)));

        // Ambient and diffuse terms, both tinted by the main texture
        fixed3 albedo = tex2D(_MainTex, i.uv).rgb * _Color.rgb;
        fixed3 ambient = UNITY_LIGHTMODEL_AMBIENT.xyz * albedo;
        fixed3 diffuse = _LightColor0.rgb * albedo * max(0, dot(tangentNormal, tangentLightDir));

        // Specular term using the half vector
        fixed3 halfDir = normalize(tangentLightDir + tangentViewDir);
        fixed3 specular = _LightColor0.rgb * _Specular.rgb * pow(max(0, dot(tangentNormal, halfDir)), _Gloss);
        return fixed4(ambient + diffuse + specular, 1.0);
    }

From the code we can see that the fragment shader computes the ambient, diffuse, and specular terms, and that the MainTex and BumpMap textures control the coefficients in the lighting-model formulas.

There is much more to lighting models than you see here. There are many physical formulas, but they are not hard to understand. You can get the details from the reference books [1][2][3].
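To see the formula without the Unity macros, here is the same ambient + diffuse + specular sum written as plain C++. It is only a sketch of the half-vector model that the shader above appears to use; the `Vec3` helpers and the `blinnPhong` signature are made up for illustration, and all direction vectors are assumed to be expressed in the same space.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    if (len == 0.0f) return v;  // leave the zero vector untouched
    return v * (1.0f / len);
}

// n: surface normal, l: direction to the light, v: direction to the viewer.
Vec3 blinnPhong(Vec3 n, Vec3 l, Vec3 v, Vec3 albedo, Vec3 lightColor,
                Vec3 ambientLight, Vec3 specularColor, float gloss) {
    n = normalize(n);
    l = normalize(l);
    v = normalize(v);
    const Vec3 ambient = ambientLight * albedo;
    const Vec3 diffuse = lightColor * albedo * std::max(0.0f, dot(n, l));
    const Vec3 halfDir = normalize(l + v);
    const Vec3 specular = lightColor * specularColor *
                          std::pow(std::max(0.0f, dot(n, halfDir)), gloss);
    return ambient + diffuse + specular;
}
```

When the light, the viewer, and the normal all point the same way, both dot products are 1 and the result is simply the sum of the three coefficients; when the light is behind the surface, only the ambient term survives.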

Step8. Blending and Output.
Finally! At the bottom of the render pipeline, we have produced a color, an opacity, and a depth value. The depth value is tested against the depth buffer for per-pixel visibility determination, to ensure that an object farther away from the camera doesn’t obscure one closer to the camera. Pixels with an opacity that is too low are rejected, and the output color is then combined with the previous color in the frame buffer in a process known as alpha blending.

SUMMARY
OK! All 8 steps have now been listed. You may want to review the rough process. The pseudocode below, from [1], summarizes the simplified rendering pipeline outlined above.

    // First, figure out how to view the scene
    setupTheCamera();
    // Clear the zbuffer
    clearZBuffer();
    // Setup environmental lighting and fog
    setGlobalLightingAndFog();
    // Get a list of objects that are potentially visible
    potentiallyVisibleObjectList = highLevelVisibilityDetermination(scene);
    // Render everything we found to be potentially visible
    for (all objects in potentiallyVisibleObjectList)
    {
        // Perform lower-level VSD using bounding volume test
        if (!object.isBoundingVolumeVisible()) continue;
        // Fetch or procedurally generate the geometry
        triMesh = object.getGeometry();
        // Clip and render the faces
        for (each triangle in the geometry)
        {
            // Transform the vertices to clip space, and perform
            // vertex-level calculations (run the vertex shader)
            clipSpaceTriangle = transformAndLighting(triangle);
            // Clip the triangle to the view volume
            clippedTriangle = clipToViewVolume(clipSpaceTriangle);
            if (clippedTriangle.isEmpty()) continue;
            // Project the triangle onto screen space
            screenSpaceTriangle = clippedTriangle.projectToScreenSpace();
            // Is the triangle backfacing?
            if (screenSpaceTriangle.isBackFacing()) continue;
            // Rasterize the triangle
            for (each pixel in the triangle)
            {
                // Scissor the pixel here (if triangle was
                // not completely clipped to the frustum)
                if (pixel is off-screen) continue;
                // Interpolate color, zbuffer value,
                // and texture mapping coords
                // The pixel shader takes interpolated values
                // and computes a color and alpha value
                color = shadePixel();
                // Perform zbuffering
                if (!zbufferTest()) continue;
                // Alpha test to ignore pixels that are "too
                // transparent"
                if (!alphaTest()) continue;
                // Write to the frame buffer and zbuffer
                writePixel(color, interpolatedZ);
                // Move on to the next pixel in this triangle
            }
            // Move on to the next triangle in this object
        }
        // Move on to the next potentially visible object
    }

I guess you now understand the pipeline~ If not, don’t worry. You still have time. Now it’s time to dig into the details.


Appendix:

We referred to coordinate transformation in Step5, and I guess you may not be very clear about the matrices involved and the workflow. Come on baby! Time to overcome the difficulties!
Fig1. coordinate conversion (from [1])
Model Space, World Space, Camera Space, Clip Space

The geometry of an object is initially described in object space, which is a coordinate space local to the object. The information described usually consists of vertex positions and surface normals.
Object space = Model space = Local space.

From model space, the vertices are transformed into world space. The transformation from modeling space to world space is often called the model transform. Typically, lighting for the scene is specified in world space, but it doesn’t matter which coordinate space is used to perform the lighting calculations, provided that the geometry and the lights are expressed in the same space. So it is not weird to see lighting calculations done in world space, view space, tangent space, or clip space in a Unity shader file.

From world space, vertices are transformed into camera space. Camera space is a 3D coordinate space in which the origin is at the center of projection, one axis is parallel to the direction the camera is facing (perpendicular to the projection plane), one axis is the intersection of the top and bottom clip planes, and the other axis is the intersection of the left and right clip planes.
Camera space = View space = Eye space.
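The world-to-camera transform can be built directly from that description: three camera axes plus the camera position. Here is a minimal "look-at" sketch under the right-handed, column-vector convention where the camera looks down -z. The function names are illustrative and the layout `m[row][col]` is an assumption of this sketch.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Right-handed view matrix: camera at `eye`, looking at `target`,
// with -z as the viewing direction in camera space.
Mat4 lookAtRH(Vec3 eye, Vec3 target, Vec3 up) {
    const Vec3 f = normalize(sub(target, eye));  // forward
    const Vec3 r = normalize(cross(f, up));      // right
    const Vec3 u = cross(r, f);                  // true up
    Mat4 m{};
    m[0] = { r.x,  r.y,  r.z, -dot(r, eye)};
    m[1] = { u.x,  u.y,  u.z, -dot(u, eye)};
    m[2] = {-f.x, -f.y, -f.z,  dot(f, eye)};
    m[3] = {0.0f, 0.0f, 0.0f, 1.0f};
    return m;
}

// Transforms a point (w = 1) by an affine matrix, column-vector style.
Vec3 transformPoint(const Mat4& m, Vec3 p) {
    return { m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
             m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
             m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3] };
}
```

With the camera at (0, 0, 5) looking at the origin, the origin ends up at (0, 0, -5) in camera space: five units in front of the camera along -z. The camera position itself maps to (0, 0, 0).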

Here we should be alert to the difference between the left-handed world and the right-handed world, as shown in Fig2.
Fig2. coordinate comparison between Unity, OpenGL, and DirectX
In the left-handed world, the most common convention is to point +z in the direction that the camera is facing, with +x and +y pointing “right” and “up”. This is fairly intuitive, as shown in Fig3. The typical right-handed convention is to have -z point in the direction that the camera is facing.
Fig3. camera space in left-handed world

From camera space, vertices are transformed once again into clip space. The matrix that transforms vertices from camera space into clip space is called the clip matrix.
Clip space = Canonical view volume space.
Clip matrix = Projection matrix.

Here I’m not going to derive the projection matrices (perspective and orthographic), because that is too complicated for me right now. I just list the final formulas here and will write up the derivation separately in the next blog post. :)
OpenGL uses column vectors: Projection_Matrix * View_Matrix * Model_Matrix * Vector
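With column vectors, the matrix written closest to the vector is applied first, which is why the model matrix sits on the right. A small sketch with a translation and a scale shows how much the order matters. The helper names are made up for illustration and matrices are stored as `m[row][col]`.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // m[row][col]

Mat4 identity() {
    Mat4 m{};
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 translation(float tx, float ty, float tz) {
    Mat4 m = identity();
    m[0][3] = tx;
    m[1][3] = ty;
    m[2][3] = tz;
    return m;
}

Mat4 scaling(float s) {
    Mat4 m = identity();
    m[0][0] = m[1][1] = m[2][2] = s;
    return m;
}

// Matrix product a * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Column-vector convention: result = m * v.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}
```

Applying `translation(1, 0, 0) * scaling(2)` to the point (1, 1, 1) scales first and then translates, giving (3, 2, 2). Swapping the two matrices translates first and gives (4, 2, 2). In the row-vector convention the same chain is written in the reverse order with transposed matrices.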

Fig4. perspective projection

$$\begin{bmatrix}
\frac{1}{\text{aspect\_ratio} \cdot \tan\frac{\theta}{2}} & 0 & 0 & 0 \\
0 & \frac{1}{\tan \frac{\theta}{2}} & 0 & 0 \\
0 & 0 & \frac{zFar+zNear}{zNear-zFar}& \frac{2 \cdot zNear \cdot zFar}{zNear-zFar} \\
0 & 0 & -1 & 0 \\
\end{bmatrix} $$

$$ \text{OpenGL perspective matrix (field-of-view form)} $$
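A quick way to build some trust in this matrix is to code it up and check where the near and far planes end up. The sketch below assumes θ is the vertical field of view in radians and that the camera looks down -z, so a point on the near plane has view-space z = -zNear. `perspectiveGL` and `ndcDepth` are illustrative names.

```cpp
#include <array>
#include <cmath>

using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

// OpenGL-style perspective matrix (column-vector convention).
Mat4 perspectiveGL(float fovY, float aspect, float zNear, float zFar) {
    const float t = std::tan(fovY / 2.0f);
    Mat4 m{};  // all zeros
    m[0][0] = 1.0f / (aspect * t);
    m[1][1] = 1.0f / t;
    m[2][2] = (zFar + zNear) / (zNear - zFar);
    m[2][3] = 2.0f * zNear * zFar / (zNear - zFar);
    m[3][2] = -1.0f;
    return m;
}

// NDC depth of the view-space point (0, 0, zView, 1): z_clip / w_clip.
float ndcDepth(const Mat4& p, float zView) {
    const float zClip = p[2][2] * zView + p[2][3];
    const float wClip = p[3][2] * zView + p[3][3];
    return zClip / wClip;
}
```

With these definitions, a point on the near plane maps to NDC depth -1 and a point on the far plane maps to +1, which matches the OpenGL clip volume.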

$$\begin{bmatrix}
\frac{2|n|}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2|n|}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & \frac{|n|+|f|}{|n|-|f|} & \frac{2|f||n|}{|n|-|f|} \\
0 & 0 & -1 & 0 \\
\end{bmatrix} $$

$$ \text{OpenGL perspective matrix (frustum form)} $$

$$\begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & 0 \\
0 & \frac{2}{t-b} & 0 & 0 \\
0 & 0 & \frac{2}{zNear-zFar} & \frac{zNear+zFar}{zNear-zFar} \\
0 & 0 & 0 & 1 \\
\end{bmatrix} $$

$$ \text{OpenGL orthographic matrix (view volume centered on the view axis: } r=-l,\ t=-b \text{)} $$

So, for the OpenGL conventions, we can tell whether a projection matrix is perspective or orthographic from its bottom row. For perspective it will be
$$\begin{bmatrix} 0 & 0 & -1 & 0 \end{bmatrix}$$
and for orthographic it will be
$$\begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}$$

DirectX uses row vectors: Vector * Model_Matrix * View_Matrix * Projection_Matrix
$$\begin{bmatrix}
\frac{2}{w} & 0 & 0 & 0 \\
0 & \frac{2}{h} & 0 & 0 \\
0 & 0 & \frac{1}{zF-zN} & 0 \\
0 & 0 & \frac{zN}{zN-zF} & 1 \\
\end{bmatrix} $$

$$ \text{DirectX orthographic matrix} $$

$$\begin{bmatrix}
\frac{1}{\text{aspect\_ratio} \cdot \tan \frac{\theta}{2}} & 0 & 0 & 0 \\
0 & \frac{1}{\tan \frac{\theta}{2}} & 0 & 0 \\
0 & 0 & \frac{zF}{zF-zN} & 1 \\
0 & 0 & \frac{zN \cdot zF}{zN-zF} & 0 \\
\end{bmatrix} $$

$$ \text{DirectX perspective matrix} $$
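The same kind of check works for the DirectX convention, with two differences: the vector is a row on the left of the matrix, and depth ends up in [0, 1] instead of [-1, 1]. The sketch below follows the left-handed layout documented for `D3DXMatrixPerspectiveFovLH` (camera looking down +z, last column 0, 0, 1, 0) but is a re-implementation for illustration, not a call into Direct3D; the function names are my own.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // m[row][col]

// Left-handed perspective matrix for the row-vector convention.
Mat4 perspectiveFovLH(float fovY, float aspect, float zN, float zF) {
    const float t = std::tan(fovY / 2.0f);
    Mat4 m{};  // all zeros
    m[0][0] = 1.0f / (aspect * t);
    m[1][1] = 1.0f / t;
    m[2][2] = zF / (zF - zN);
    m[2][3] = 1.0f;
    m[3][2] = zN * zF / (zN - zF);
    return m;
}

// Row-vector convention: result = v * m.
Vec4 transformRow(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (std::size_t j = 0; j < 4; ++j)
        for (std::size_t i = 0; i < 4; ++i)
            r[j] += v[i] * m[i][j];
    return r;
}

// Depth after the perspective divide for the view-space point (0, 0, z, 1).
float ndcDepthLH(const Mat4& p, float zView) {
    const Vec4 clip = transformRow({0.0f, 0.0f, zView, 1.0f}, p);
    return clip[2] / clip[3];
}
```

Here w after projection is simply the view-space z, so a point on the near plane gets depth 0 and a point on the far plane gets depth 1.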


References:
[1] 3D Math Primer for Graphics and Game Development, 2nd Edition.
[2] Fundamentals of Computer Graphics, 3rd Edition.
[3] Unity Shader入门精要 (Unity Shader Essentials)
[4] Unity3D Manual
[5] GAMES



Author: Keneyr
Posted on: 2020-03-10
Updated on: 2021-08-28
