【Real Time Rendering翻译】Chapter2 图形渲染管线
本章介绍实时渲染的核心组成部分:图形渲染管线,简称为管线。管线的主要功能是,在给定虚拟摄像机,三维对象,光源等条件下,生成(或者说渲染)一张二维图像。因此,渲染管线是实时渲染的基础工具。图2.1描绘了使用管线的过程。图像中对象的位置和形状由它们自身的几何信息,环境特征,以及摄像机在环境中的位置决定;对象的外观则受材质属性,光源,贴图(应用在物体表面的图像)和着色方程的影响。
This chapter presents the core component of real-time graphics, namely the graphics rendering pipeline, also known simply as “the pipeline.” The main function of the pipeline is to generate, or render, a two-dimensional image, given a virtual camera, three-dimensional objects, light sources, and more. The rendering pipeline is thus the underlying tool for real-time rendering. The process of using the pipeline is depicted in Figure 2.1. The locations and shapes of the objects in the image are determined by their geometry, the characteristics of the environment, and the placement of the camera in that environment. The appearance of the objects is affected by material properties, light sources, textures (images applied to surfaces), and shading equations.
图2.1 左图中,虚拟摄像机位于金字塔顶端(也就是四条线汇聚的地方),但只有在视锥体范围(view volume)内的图元会被渲染。对于透视视角渲染的图像来说(如本例),视锥体的形状是一个截头锥体(frustum,即长方形底,顶部被截断的金字塔形状)。右图是摄像机视角呈现的图像,我们可以注意到,左图红色甜甜圈形状的对象没有在右图中被渲染出来,因为它在视锥体外部。同时,左图中扭曲的蓝色棱柱被截头锥体的上平面截掉了一部分。
Figure 2.1. In the left image, a virtual camera is located at the tip of the pyramid (where four lines converge). Only the primitives inside the view volume are rendered. For an image that is rendered in perspective (as is the case here), the view volume is a frustum (plural: frusta), i.e., a truncated pyramid with a rectangular base. The right image shows what the camera “sees.” Note that the red donut shape in the left image is not in the rendering to the right because it is located outside the view frustum. Also, the twisted blue prism in the left image is clipped against the top plane of the frustum.
下文我们会介绍渲染管线的不同阶段,侧重于讲解功能,而不是实现。关于管线各个阶段实现的相关细节会在以后的章节中详细介绍。
We will explain the different stages of the rendering pipeline, with a focus on function rather than implementation. Relevant details for applying these stages will be covered in later chapters.
2.1 结构 Architecture
现实世界中,管线的概念以许多不同的形式表现出来,从工厂装配线到快餐厨房,它同样适用于图形渲染。管线由数个阶段组成,每个阶段完成整个任务中的一部分。
管线的各个阶段并行执行,每个阶段依赖于上一阶段的结果。理想情况下,将一个没有管线化的系统分成n个管线阶段,可以让系统的速度提升为原来的n倍。提升性能是使用管线的主要原因。举个例子,一群人可以快速地制作大量三明治:一个人准备面包,一个人夹肉,一个人加配料。每个人把自己做好的部分传给流水线上的下一个人,并立即开始做下一个三明治。如果每个人完成自己的任务需要20秒,那么最高可以达到每20秒一个,也就是每分钟3个三明治的速度。管线的各阶段虽然并行执行,但它们会被最慢的阶段阻塞,直到该阶段完成任务。比如三明治生产过程中,夹肉的步骤变得更复杂,需要30秒,那么现在三明治生产的最高速率就是每分钟2个。对于这条管线来说,夹肉阶段就是瓶颈,因为它决定了整条生产线的速度。在等待夹肉阶段完成的这段时间里,加配料的阶段就陷入了"饥饿"(顾客也是如此)。
In the physical world, the pipeline concept manifests itself in many different forms, from factory assembly lines to fast food kitchens. It also applies to graphics rendering. A pipeline consists of several stages, each of which performs part of a larger task. The pipeline stages execute in parallel, with each stage dependent upon the result of the previous stage. Ideally, a nonpipelined system that is then divided into n pipelined stages could give a speedup of a factor of n. This increase in performance is the main reason to use pipelining. For example, a large number of sandwiches can be prepared quickly by a series of people—one preparing the bread, another adding meat, another adding toppings. Each passes the result to the next person in line and immediately starts work on the next sandwich. If each person takes twenty seconds to perform their task, a maximum rate of one sandwich every twenty seconds, three a minute, is possible. The pipeline stages execute in parallel, but they are stalled until the slowest stage has finished its task. For example, say the meat addition stage becomes more involved, taking thirty seconds. Now the best rate that can be achieved is two sandwiches a minute. For this particular pipeline, the meat stage is the bottleneck, since it determines the speed of the entire production. The toppings stage is said to be starved (and the customer, too) during the time it waits for the meat stage to be done.
计算机图形学的实时渲染领域也有这种类型的管线结构。实时渲染管线可以粗略地分为四个主要阶段:应用阶段,几何处理阶段,光栅化阶段,像素处理阶段,如下图2.2所示。此结构是渲染管线的核心(引擎),被用于实时计算机图形学应用中,也是接下来章节讨论的基础。这些阶段中的每一个通常自身也是一条管线,这意味着它由若干子阶段组成。我们在这里区分功能性阶段与其实现结构:每个功能性阶段有自己要执行的特定任务,但并不规定该任务在管线中的执行方式;而一个具体实现可能会把两个功能性阶段合并成一个单元,或用可编程内核来执行,同时把另一个更耗时的功能性阶段拆分成几个硬件单元。
This kind of pipeline construction is also found in the context of real-time computer graphics. A coarse division of the real-time rendering pipeline into four main stages—application, geometry processing, rasterization, and pixel processing—is shown in Figure 2.2. This structure is the core—the engine of the rendering pipeline—which is used in real-time computer graphics applications and is thus an essential base for discussion in subsequent chapters. Each of these stages is usually a pipeline in itself, which means that it consists of several substages. We differentiate between the functional stages shown here and the structure of their implementation. A functional stage has a certain task to perform but does not specify the way that task is executed in the pipeline. A given implementation may combine two functional stages into one unit or execute using programmable cores, while it divides another, more time-consuming, functional stage into several hardware units.
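上文提到,管线的吞吐量由最慢的阶段(瓶颈)决定。下面是一个极简的C++示意,用三明治例子中假设的阶段耗时来演示这一点(数值与代码均为笔者为说明而虚构,并非原书内容):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // 三个阶段的耗时(秒):准备面包,夹肉,加配料(数值为上文例子中的假设)
    std::vector<double> stageSeconds = {20.0, 30.0, 20.0};

    // 管线稳定运行后,吞吐量由最慢的阶段决定
    double bottleneck = *std::max_element(stageSeconds.begin(), stageSeconds.end());
    std::printf("bottleneck = %.0f s, throughput = %.1f sandwiches per minute\n",
                bottleneck, 60.0 / bottleneck);
    return 0;
}
```

输出为每分钟2.0个三明治,与正文中的结论一致。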
图2.2
渲染管线的基础结构包括以下四个阶段:应用阶段,几何处理阶段,光栅化阶段,像素处理阶段。每个阶段自己也可能是一个管线,像几何阶段下面展现的那样。有的阶段还可能是(部分)并行的,如像素处理阶段下面展现的那样。图中,应用阶段只有一个单独的进程,但这个阶段也可以分解成管线或者并行。光栅化则是将图元(比如三角形)转化为(一些)像素。
Figure 2.2. The basic construction of the rendering pipeline, consisting of four stages: application, geometry processing, rasterization, and pixel processing. Each of these stages may be a pipeline in itself, as illustrated below the geometry processing stage, or a stage may be (partly) parallelized, as shown below the pixel processing stage. In this illustration, the application stage is a single process, but this stage could also be pipelined or parallelized. Note that rasterization finds the pixels inside a primitive, e.g., a triangle.
渲染速度可以用每秒帧数(frames per second,FPS)来表达,也就是每秒钟渲染的图像数量;也可以用赫兹(Hertz,Hz)来表示,即1/秒,表示更新频率;还可以直接用渲染一张图像所花费的毫秒数(ms)来衡量。生成每张图像的时间通常并不固定,取决于每帧所执行计算的复杂程度。FPS既可以用来表达某一帧的渲染速率,也可以表示一段时间内的平均渲染性能。Hertz则通常用于描述硬件,比如刷新率固定的显示器。
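顺带用几行示意代码说明帧时间(毫秒)与FPS之间的换算,数值仅作演示:

```cpp
#include <cstdio>

int main() {
    double frameTimeMs = 16.667;        // 假设渲染一帧约需16.667毫秒
    double fps = 1000.0 / frameTimeMs;  // 换算成每秒帧数
    std::printf("%.3f ms per frame is about %.1f FPS\n", frameTimeMs, fps);
    return 0;
}
```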
The rendering speed may be expressed in frames per second (FPS), that is, the number of images rendered per second. It can also be represented using Hertz (Hz), which is simply the notation for 1/seconds, i.e., the frequency of update. It is also common to just state the time, in milliseconds (ms), that it takes to render an image. The time to generate an image usually varies, depending on the complexity of the computations performed during each frame. Frames per second is used to express either the rate for a particular frame, or the average performance over some duration of use. Hertz is used for hardware, such as a display, which is set to a fixed rate.
如名字所述,应用阶段由应用驱动,因此一般在软件中实现,并在通用CPU上运行。这种CPU通常拥有多个核心,能够并行处理多个执行线程,这使CPU能够高效运行应用阶段所负责的各种任务。传统上在CPU上执行的任务包括碰撞检测,全局加速算法,动画,物理模拟等,具体取决于应用的类型。下一个主要阶段是几何处理阶段,这个阶段处理变换,投影以及所有其他类型的几何操作。这个阶段要解决画什么,怎么画,画在哪里的问题。几何阶段一般在GPU上执行,GPU上既有大量可编程内核,也有固定功能的硬件单元。光栅化阶段通常接受组成一个三角形的三个顶点作为输入,找到所有被认为在该三角形内部的像素,并将它们传递给下一个阶段。最后,像素处理阶段对每个像素执行一段程序来决定其颜色,还可能进行深度测试来判断该像素是否可见;它也可能执行逐像素的操作,比如把新计算出的颜色与原有颜色混合。光栅化与像素处理阶段完全在GPU上执行。所有这些阶段以及它们内部的管线会在接下来的四个小节中讨论。关于GPU如何处理这些阶段的更多细节会在第三章中讲述。
As the name implies, the application stage is driven by the application and is therefore typically implemented in software running on general-purpose CPUs. These CPUs commonly include multiple cores that are capable of processing multiple threads of execution in parallel. This enables the CPUs to efficiently run the large variety of tasks that are the responsibility of the application stage. Some of the tasks traditionally performed on the CPU include collision detection, global acceleration algorithms, animation, physics simulation, and many others, depending on the type of application. The next main stage is geometry processing, which deals with transforms, projections, and all other types of geometry handling. This stage computes what is to be drawn, how it should be drawn, and where it should be drawn. The geometry stage is typically performed on a graphics processing unit (GPU) that contains many programmable cores as well as fixed-operation hardware. The rasterization stage typically takes as input three vertices, forming a triangle, and finds all pixels that are considered inside that triangle, then forwards these to the next stage. Finally, the pixel processing stage executes a program per pixel to determine its color and may perform depth testing to see whether it is visible or not. It may also perform per-pixel operations such as blending the newly computed color with a previous color. The rasterization and pixel processing stages are also processed entirely on the GPU. All these stages and their internal pipelines will be discussed in the next four sections. More details on how the GPU processes these stages are given in Chapter 3.
2.2 应用阶段
开发者对应用阶段拥有完全的控制权,因为它通常在CPU上执行。因此,开发者可以完全决定其实现方式,并在之后对其进行修改以提升性能。这里的改动也会影响后续阶段的性能。举个例子,应用阶段的某个算法或设置可以减少需要渲染的三角形数量。
The developer has full control over what happens in the application stage, since it usually executes on the CPU. Therefore, the developer can entirely determine the implementation and can later modify it in order to improve performance. Changes here can also affect the performance of subsequent stages. For example, an application stage algorithm or setting could decrease the number of triangles to be rendered.
尽管如此,应用阶段的一部分工作也可以在GPU上完成,这会用到一种叫做计算着色器(compute shader)的独立模式。这种模式把GPU当作高度并行的通用处理器来使用,忽略其专门用于渲染图形的特殊功能。
All this said, some application work can be performed by the GPU, using a separate mode called a compute shader. This mode treats the GPU as a highly parallel general processor, ignoring its special functionality meant specifically for rendering graphics.
在应用阶段的最后,需要渲染的几何体会被传递给几何处理阶段。这些几何体称为渲染图元,也就是点,线和三角形,它们最终可能会出现在屏幕上(或其他正在使用的输出设备上)。这是应用阶段最重要的任务。
At the end of the application stage, the geometry to be rendered is fed to the geometry processing stage. These are the rendering primitives, i.e., points, lines, and triangles, that might eventually end up on the screen (or whatever output device is being used). This is the most important task of the application stage.
应用阶段基于软件实现的一个结果是,它不像几何处理,光栅化和像素处理阶段那样被划分成若干子阶段。(由于CPU本身在更小的尺度上也是管线化的,也可以说应用阶段进一步细分成了若干管线阶段,但这与本章的讨论无关。)不过,为了提升性能,此阶段通常会在多个处理器核心上并行执行。在CPU设计中,这被称为超标量(superscalar)结构,因为它能够在同一阶段同时执行多个进程。本书18.5节会介绍使用多核处理器的各种方法。
A consequence of the software-based implementation of this stage is that it is not divided into substages, as are the geometry processing, rasterization, and pixel processing stages. (Since a CPU itself is pipelined on a much smaller scale, you could say that the application stage is further subdivided into several pipeline stages, but this is not relevant here.) However, to increase performance, this stage is often executed in parallel on several processor cores. In CPU design, this is called a superscalar construction, since it is able to execute several processes at the same time in the same stage. Section 18.5 presents various methods for using multiple processor cores.
碰撞检测是通常在此阶段实现的一个处理过程。在检测到两个物体之间的碰撞后,可能会生成碰撞响应,并将其发送回相互碰撞的物体以及力反馈设备。应用阶段也负责处理来自其他输入源的输入,比如键盘,鼠标或头戴显示器,并根据这些输入执行不同的动作。一些加速算法,比如特定的剔除算法(第十九章),也会在这个阶段实现,此外还包括管线其余部分无法处理的其他任何工作。
One process commonly implemented in this stage is collision detection. After a collision is detected between two objects, a response may be generated and sent back to the colliding objects, as well as to a force feedback device. The application stage is also the place to take care of input from other sources, such as the keyboard, the mouse, or a head-mounted display. Depending on this input, several different kinds of actions may be taken. Acceleration algorithms, such as particular culling algorithms (Chapter 19), are also implemented here, along with whatever else the rest of the pipeline cannot handle.
2.3 几何处理 Geometry Processing
几何处理阶段负责处理几乎所有逐三角形或逐顶点的操作。此阶段进一步分为下面几个功能性子阶段:顶点着色,投影,裁剪,屏幕映射。
2.3.1 顶点着色 Vertex Shading
顶点着色有两个主要任务,即计算顶点的位置,以及计算程序员希望作为顶点输出的数据,比如法线和纹理坐标。传统上,对象的大部分着色是通过把光源作用于每个顶点的位置与法线来计算的,并且只把得到的颜色存储在顶点上,这些颜色随后会在三角形上插值。正因如此,这个可编程的顶点处理单元被命名为顶点着色器(vertex shader)。随着现代GPU的出现,部分或全部着色计算改为逐像素执行,顶点着色阶段变得更加通用,甚至可能完全不计算任何着色方程,这取决于程序员的意图。现在,顶点着色器更像是一个用于设置与每个顶点相关联的数据的通用单元。举个例子,顶点着色器可以使用4.4节和4.5节中的方法让对象动起来。
There are two main tasks of vertex shading, namely, to compute the position for a vertex and to evaluate whatever the programmer may like to have as vertex output data, such as a normal and texture coordinates. Traditionally much of the shade of an object was computed by applying lights to each vertex's location and normal and storing only the resulting color at the vertex. These colors were then interpolated across the triangle. For this reason, this programmable vertex processing unit was named the vertex shader. With the advent of the modern GPU, along with some or all of the shading taking place per pixel, this vertex shading stage is more general and may not evaluate any shading equations at all, depending on the programmer's intent. The vertex shader is now a more general unit dedicated to setting up the data associated with each vertex. As an example, the vertex shader can animate an object using the methods in Sections 4.4 and 4.5.
我们先来介绍顶点位置是如何计算的,这是一组始终需要的坐标。在到达屏幕的过程中,模型会被变换到几个不同的空间或坐标系。最初,模型位于自己的模型空间中,这意味着它还没有经过任何变换。每个模型都可以与一个模型变换相关联,用来确定它的位置与朝向。同一个模型也可以与多个模型变换关联,这样同一个模型的多个副本(称为实例,instance)就可以在同一个场景中拥有不同的位置,朝向和大小,而不需要复制基础几何体。
We start by describing how the vertex position is computed, a set of coordinates that is always required. On its way to the screen, a model is transformed into several different spaces or coordinate systems. Originally, a model resides in its own model space, which simply means that it has not been transformed at all. Each model can be associated with a model transform so that it can be positioned and oriented. It is possible to have several model transforms associated with a single model. This allows several copies (called instances) of the same model to have different locations, orientations, and sizes in the same scene, without requiring replication of the basic geometry.
由模型变换进行变换的是模型的顶点与法线。对象的坐标称为模型坐标,在对这些坐标应用模型变换后,就称模型位于世界坐标或世界空间中。世界空间是唯一的,当所有模型都用各自的模型变换完成变换后,所有模型就都处于这同一个空间中。
It is the vertices and the normals of the model that are transformed by the model transform. The coordinates of an object are called model coordinates, and after the model transform has been applied to these coordinates, the model is said to be located in world coordinates or in world space. The world space is unique, and after the models have been transformed with their respective model transforms, all models exist in this same space.
之前提过,只有摄像机(或观察者)能看见的模型才会被渲染。摄像机在世界空间中有自己的位置与朝向,用来放置并对准摄像机。为了便于投影与裁剪,摄像机与所有模型都会用视图变换进行变换。视图变换的目的是把摄像机放在原点并对准它,使其朝向负z轴,y轴指向上方,x轴指向右方。我们采用负z轴的约定,有些文章则偏向于沿正z轴观察,二者的区别主要在语义上,因为两者之间的变换很简单。视图变换应用之后,摄像机实际的位置与朝向取决于底层的应用编程接口(API)。按照上述方式划定的空间称为摄像机空间,更常见的叫法是观察空间(view space)或眼空间(eye space)。图2.4展示了视图变换影响摄像机和模型的一个例子。模型变换与视图变换都可以用4x4矩阵来实现,这是第四章的主题。不过需要认识到,顶点的位置与法线可以用程序员喜欢的任何方式来计算。
As mentioned previously, only the models that the camera (or observer) sees are rendered. The camera has a location in world space and a direction, which are used to place and aim the camera. To facilitate projection and clipping, the camera and all the models are transformed with the view transform. The purpose of the view transform is to place the camera at the origin and aim it, to make it look in the direction of the negative z-axis, with the y-axis pointing upward and the x-axis pointing to the right. We use the −z-axis convention; some texts prefer looking down the +z-axis. The difference is mostly semantic, as the transform between one and the other is simple. The actual position and direction after the view transform has been applied are dependent on the underlying application programming interface (API). The space thus delineated is called camera space, or more commonly, view space or eye space. An example of the way in which the view transform affects the camera and the models is shown in Figure 2.4. Both the model transform and the view transform may be implemented as 4×4 matrices, which is the topic of Chapter 4. However, it is important to realize that the position and normal of a vertex can be computed in whatever way the programmer prefers.
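下面的C++草图演示了"模型变换把顶点从模型空间变到世界空间,视图变换再把它变到观察空间"的流程。这里的矩阵和向量类型是为演示而写的极简实现(实际项目中通常会使用GLM等数学库),摄像机也假定为只有平移,没有旋转:

```cpp
#include <cstdio>

struct Vec4 { float x, y, z, w; };

// 极简的4x4矩阵(行主序),仅用于演示
struct Mat4 {
    float m[4][4];
    static Mat4 identity() {
        Mat4 r{};
        for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
        return r;
    }
    static Mat4 translation(float tx, float ty, float tz) {
        Mat4 r = identity();
        r.m[0][3] = tx; r.m[1][3] = ty; r.m[2][3] = tz;
        return r;
    }
    Vec4 operator*(const Vec4& v) const {
        return {
            m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w,
            m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w,
            m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w,
            m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w };
    }
};

int main() {
    // 模型空间中的一个顶点(w=1表示位置)
    Vec4 vModel{1.0f, 0.0f, 0.0f, 1.0f};

    // 模型变换:把这个实例放到世界坐标(5, 0, -2)处(这里只演示平移)
    Mat4 model = Mat4::translation(5.0f, 0.0f, -2.0f);
    Vec4 vWorld = model * vModel;

    // 视图变换:假设摄像机位于世界坐标(0, 0, 10)且没有旋转,
    // 那么视图矩阵就是按(0, 0, -10)平移;一般情况下还会包含旋转
    Mat4 view = Mat4::translation(0.0f, 0.0f, -10.0f);
    Vec4 vView = view * vWorld;

    std::printf("world: (%.1f, %.1f, %.1f)  view: (%.1f, %.1f, %.1f)\n",
                vWorld.x, vWorld.y, vWorld.z, vView.x, vView.y, vView.z);
    return 0;
}
```

真实的视图矩阵还会包含旋转部分,使摄像机的观察方向与负z轴对齐,这里为了简短而省略。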
图2.4 左图以俯视视角展示了摄像机按用户期望摆放的位置与朝向,此时的世界以正z轴朝上。如右图所示,视图变换对世界进行了重定向,使摄像机位于原点,沿自己的负z轴观察,且摄像机的正y轴朝上。这样做可以让裁剪与投影操作更简单,更快速。浅蓝色区域是视锥体。这里假定使用透视观察,因为视锥体是一个截头锥体。类似的技术适用于任何类型的投影。
Figure 2.4. In the left illustration, a top-down view shows the camera located and oriented as the user wants it to be, in a world where the +z-axis is up. The view transform reorients the world so that the camera is at the origin, looking along its negative z-axis, with the camera's +y-axis up, as shown on the right. This is done to make the clipping and projection operations simpler and faster. The light blue area is the view volume. Here, perspective viewing is assumed, since the view volume is a frustum. Similar techniques apply to any kind of projection.
下一步,我们介绍顶点着色的第二类输出。为了生成真实的场景,只渲染对象的形状和位置是不够的,它们的外观也必须建模。这包括每个对象的材质,以及照射到物体上的光源所带来的影响。材质与光照可以用多种方式来建模,从简单的颜色到复杂的物理描述。
Next, we describe the second type of output from vertex shading. To produce a realistic scene, it is not sufficient to render the shape and position of objects, but their appearance must be modeled as well. This description includes each object's material, as well as the effect of any light sources shining on the object. Materials and lights can be modeled in any number of ways, from simple colors to elaborate representations of physical descriptions.
这种决定光照对材质影响的操作被称为着色(shading),它涉及在对象上的不同位置计算着色方程。通常,其中一些计算会在几何处理阶段在模型的顶点上执行,另一些则会在逐像素处理时执行。每个顶点上可以存储多种材质数据,比如该点的位置,法线,颜色,或计算着色方程所需的任何其他数值信息。顶点着色的结果(可以是颜色,向量,纹理坐标以及任何其他类型的着色数据)随后被发送到光栅化与像素处理阶段,进行插值并用于计算表面的着色。
This operation of determining the effect of a light on a material is known as shading. It involves computing a shading equation at various points on the object. Typically, some of these computations are performed during geometry processing on a model's vertices, and others may be performed during per-pixel processing. A variety of material data can be stored at each vertex, such as the point's location, a normal, a color, or any other numerical information that is needed to evaluate the shading equation. Vertex shading results (which can be colors, vectors, texture coordinates, along with any other kind of shading data) are then sent to the rasterization and pixel processing stages to be interpolated and used to compute the shading of the surface.
以GPU顶点着色器形式存在的顶点着色,会在本书各处更深入地讨论,尤其是在第三章和第五章。
Vertex shading in the form of the GPU vertex shader is discussed in more depth throughout this book and most specifically in Chapters 3 and 5.
作为顶点着色的一部分,渲染系统接下来会执行投影和裁剪,把视锥体变换成一个单位立方体,其两个极点的坐标是(-1,-1,-1)和(1,1,1)。定义同一视体也可以并且确实会使用不同的范围,例如 0 ≤ z ≤ 1。这个单位立方体被称为规范视体(canonical view volume)。投影先执行,在GPU上由顶点着色器完成。常用的投影方法有两种,即正交投影(orthographic,也叫平行投影)和透视投影(perspective),如图2.5所示。实际上,正交投影只是平行投影的一种,平行投影还有其他几种形式,特别是在建筑领域会用到,比如斜投影和轴测投影。老街机游戏Zaxxon的名字就来源于后者。
As part of vertex shading, rendering systems perform projection and then clipping, which transforms the view volume into a unit cube with its extreme points at (−1, −1, −1) and (1, 1, 1). Different ranges defining the same volume can and are used, for example, 0 ≤ z ≤ 1. The unit cube is called the canonical view volume. Projection is done first, and on the GPU it is done by the vertex shader. There are two commonly used projection methods, namely orthographic (also called parallel) and perspective projection. See Figure 2.5. In truth, orthographic is just one type of parallel projection. Several others find use, particularly in the field of architecture, such as oblique and axonometric projections. The old arcade game Zaxxon is named from the latter.
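作为示意,下面的函数构造了一个把轴对齐视体 [l,r]×[b,t]×[n,f] 映射到各轴范围为[-1,1]的单位立方体的正交投影矩阵,也就是后文所说的"一次平移加一次缩放"。真实API在z轴方向和范围的约定上各不相同,这里只是一个假设性的草图:

```cpp
#include <cstdio>

// 极简的4x4矩阵(行主序),仅作演示
struct Mat4 { float m[4][4]; };

// 把轴对齐盒 [l,r] x [b,t] x [n,f] 映射到 [-1,1]^3 的正交投影,
// 本质上就是一次缩放加一次平移。真实API(OpenGL/DirectX)在z轴的
// 方向与范围约定上有所不同,这里只是一个示意性的版本。
Mat4 orthographic(float l, float r, float b, float t, float n, float f) {
    Mat4 o{};
    o.m[0][0] = 2.0f / (r - l); o.m[0][3] = -(r + l) / (r - l);
    o.m[1][1] = 2.0f / (t - b); o.m[1][3] = -(t + b) / (t - b);
    o.m[2][2] = 2.0f / (f - n); o.m[2][3] = -(f + n) / (f - n);
    o.m[3][3] = 1.0f;
    return o;
}

int main() {
    Mat4 o = orthographic(-10.0f, 10.0f, -5.0f, 5.0f, 1.0f, 100.0f);
    // 检查:x=10(右边界)应被映射到+1
    float x = 10.0f;
    std::printf("x' = %.2f\n", o.m[0][0] * x + o.m[0][3]);
    return 0;
}
```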
注意,投影可以表示为一个矩阵(4.7节),因此它有时会与其余的几何变换串接(级联)在一起。
Note that projection is expressed as a matrix (Section 4.7) and so it may sometimes be concatenated with the rest of the geometry transform.
正交观察的视体通常是一个长方体,正交投影会把这个视体变换成单位立方体。正交投影的主要特征是平行线在变换之后依旧平行。这个变换是一次平移与一次缩放的组合。
The view volume of orthographic viewing is normally a rectangular box, and the orthographic projection transforms this view volume into the unit cube. The main characteristic of orthographic projection is that parallel lines remain parallel after the transform. This transformation is a combination of a translation and a scaling.
透视投影稍微复杂一些。在这种投影中,物体离摄像机越远,投影之后看起来就越小;此外,平行线在投影之后可能会在地平线处汇聚。因此可以说,透视变换模拟了我们感知物体大小的方式。从几何上讲,这个视体称为视锥体(frustum),是一个有长方形底,顶部被截断的金字塔。视锥体同样会被变换成单位立方体。正交变换和透视变换都可以用4x4矩阵来构造(第四章),在经过任一变换之后,就称模型处于裁剪坐标(clip coordinates)中。这实际上是齐次坐标(第四章会讨论),所以这一切都发生在除以w之前。GPU的顶点着色器必须始终输出这种类型的坐标,下一个功能阶段,也就是裁剪,才能正确工作。
The perspective projection is a bit more complex. In this type of projection, the farther away an object lies from the camera, the smaller it appears after projection. In addition, parallel lines may converge at the horizon. The perspective transform thus mimics the way we perceive objects' size. Geometrically, the view volume, called a frustum, is a truncated pyramid with rectangular base. The frustum is transformed into the unit cube as well. Both orthographic and perspective transforms can be constructed with 4 × 4 matrices (Chapter 4), and after either transform, the models are said to be in clip coordinates. These are in fact homogeneous coordinates, discussed in Chapter 4, and so this occurs before division by w. The GPU's vertex shader must always output coordinates of this type in order for the next functional stage, clipping, to work correctly.
尽管这些矩阵只是把一个视体变换成另一个视体,它们之所以被称为投影,是因为在显示之后,z坐标并不会存储在生成的图像中,而是存储在z缓冲区(z-buffer)里,2.5节会介绍这一点。通过这种方式,模型就从三维被投影到了二维。
Although these matrices transform one volume into another, they are called projections because after display, the z-coordinate is not stored in the image generated but is stored in a z-buffer, described in Section 2.5. In this way, the models are projected from three to two dimensions.
2.3.2 可选的顶点处理阶段 Optional Vertex Processing
每种管线都有上述的顶点处理过程。这个过程结束之后,还有几个可以在GPU上执行的可选阶段,按顺序为:细分(tessellation),几何着色(geometry shading)和流式输出(stream output)。是否使用它们既取决于硬件的能力(并非所有GPU都支持),也取决于程序员的意愿。它们彼此独立,而且总体来说并不常用。第三章会对它们分别做更多介绍。
Every pipeline has the vertex processing just described. Once this processing is done, there are a few optional stages that can take place on the GPU, in this order: tessellation, geometry shading, and stream output. Their use depends both on the capabilities of the hardware—not all GPUs have them—and the desires of the programmer. They are independent of each other, and in general they are not commonly used. More will be said about each in Chapter 3.第一个可选的阶段是细分阶段。想象你有一个球,如果你用一组三角形来表达这个球,他可能会有质量或性能问题。你的球可能在5米以外看起来不错,但是凑近看就能看到明显的三角形面,特别是菱形的轮廓。如果你给这个球更多的三角面来提升质量,那么在球离你很远的且在屏幕上只占据很少一部分像素的时候,你会浪费很多处理时间与内存来渲染这个球。使用细分,曲面可以根据离屏幕距离生成适当数量的三角形。
The first optional stage is tessellation. Imagine you have a bouncing ball object. If you represent it with a single set of triangles, you can run into problems with quality or performance. Your ball may look good from 5 meters away, but up close the individual triangles, especially along the silhouette, become visible. If you make the ball with more triangles to improve quality, you may waste considerable processing time and memory when the ball is far away and covers only a few pixels on the screen. With tessellation, a curved surface can be generated with an appropriate number of triangles. 我们已经讨论了一段时间的三角形,但当前阶段的管线只处理了顶点。顶点可以用来表达点,线,三角形或者其他对象。顶点还可以用于描述曲面,比如球。这种表面可以用多组顶点来表示。细分阶段自身可以划分成多个阶段:hull shader, tessellator,domain shader,这些阶段将多组顶点(通常)转换成更多的顶点,再使用这些顶点来组成新的三角形。相机可以用于决定生成三角形的数量,物体离相机近的时候生成较多的三角形,离相机远的时候生成较少的三角形。
We have talked a bit about triangles, but up to this point in the pipeline we have just processed vertices. These could be used to represent points, lines, triangles, or other objects. Vertices can be used to describe a curved surface, such as a ball. Such surfaces can be specified by a set of patches, and each patch is made of a set of vertices. The tessellation stage consists of a series of stages itself—hull shader, tessellator, and domain shader—that converts these sets of patch vertices into (normally) larger sets of vertices that are then used to make new sets of triangles. The camera for the scene can be used to determine how many triangles are generated: many when the patch is close, few when it is far away.
下一个可选的阶段是几何着色器(geometry shader)。它比细分着色器出现得更早,因此在GPU上更常见。它与细分着色器类似,也接受各种类型的图元并可以产生新的顶点,但它是一个简单得多的阶段:这种生成的范围有限,可输出的图元种类也更有限。几何着色器有多种用途,其中最常见的之一是粒子生成。想象模拟一场烟花爆炸:每个火球可以用一个点,也就是单个顶点来表示,几何着色器可以把每个点变成一个面向观察者,覆盖若干像素的方形面片(由两个三角形组成),从而提供一个更适合我们着色的图元。
The next optional stage is the geometry shader. This shader predates the tessellation shader and so is more commonly found on GPUs. It is like the tessellation shader in that it takes in primitives of various sorts and can produce new vertices. It is a much simpler stage in that this creation is limited in scope and the types of output primitives are much more limited. Geometry shaders have several uses, with one of the most popular being particle generation. Imagine simulating a fireworks explosion. Each fireball could be represented by a point, a single vertex. The geometry shader can take each point and turn it into a square (made of two triangles) that faces the viewer and covers several pixels, so providing a more convincing primitive for us to shade.
最后一个可选的阶段叫做流式输出(stream output)。这个阶段让我们可以把GPU当作几何引擎来使用:我们可以选择把处理过的顶点输出到一个数组中以备后续处理,而不是把它们沿管线继续传递下去渲染到屏幕上。这些数据可以在之后的渲染趟(pass)中由CPU或GPU自己使用。这个阶段通常用于粒子模拟,比如前面提到的烟花的例子。
The last optional stage is called stream output. This stage lets us use the GPU as a geometry engine. Instead of sending our processed vertices down the rest of the pipeline to be rendered to the screen, at this point we can optionally output these to an array for further processing. These data can be used by the CPU, or the GPU itself, in a later pass. This stage is typically used for particle simulations, such as our fireworks example.
这三个阶段按照细分,几何着色,流式输出的顺序执行,而且每个阶段都是可选的。无论是否使用了这些可选阶段,继续沿着管线往下,我们得到的都是一组带齐次坐标的顶点,它们将被检查是否处于摄像机的观察范围内。
These three stages are performed in this order—tessellation, geometry shading, and stream output—and each is optional. Regardless of which (if any) options are used, if we continue down the pipeline we have a set of vertices with homogeneous coordinates that will be checked for whether the camera views them.
2.3.3 裁剪 Clipping
只有完全或部分位于视锥体内的图元才需要被传递给光栅化阶段(以及后续的像素处理阶段),并由其画到屏幕上。完全位于视锥体内的图元会原封不动地传递给下一个阶段;完全位于视锥体外的图元不会被继续传递,因为它们不会被渲染;只有部分位于视锥体内的图元才需要裁剪。举个例子,一条一个顶点在视锥体外,一个顶点在视锥体内的线段,应该针对视锥体进行裁剪,使视锥体外的顶点被一个新的顶点取代,新顶点位于线段与视锥体的交点处。由于使用了投影矩阵,经过变换的图元是针对单位立方体进行裁剪的。在裁剪之前执行视图变换和投影的好处是,它让裁剪问题保持一致:图元总是与单位立方体进行裁剪。
Only the primitives wholly or partially inside the view volume need to be passed on to the rasterization stage (and the subsequent pixel processing stage), which then draws them on the screen. A primitive that lies fully inside the view volume will be passed on to the next stage as is. Primitives entirely outside the view volume are not passed on further, since they are not rendered. It is the primitives that are partially inside the view volume that require clipping. For example, a line that has one vertex outside and one inside the view volume should be clipped against the view volume, so that the vertex that is outside is replaced by a new vertex that is located at the intersection between the line and the view volume. The use of a projection matrix means that the transformed primitives are clipped against the unit cube. The advantage of performing the view transformation and projection before clipping is that it makes the clipping problem consistent; primitives are always clipped against the unit cube.
裁剪过程如图2.6所示。除了视锥体的六个裁剪面之外,用户还可以定义额外的裁剪面,对物体做可见的切割。818页的图19.1展示了这种可视化方式,称为剖切(sectioning)。
The clipping process is depicted in Figure 2.6. In addition to the six clipping planes of the view volume, the user can define additional clipping planes to visibly chop objects. An image showing this type of visualization, called sectioning, is shown in Figure 19.1 on page 818.
裁剪步骤使用投影产生的四维齐次坐标来执行裁剪。在透视空间中,这些值通常不会在三角形上线性插值,因此在使用透视投影时,需要第四个坐标分量才能正确地对数据进行插值与裁剪。最后会执行透视除法(perspective division),把得到的三角形位置变换到三维的归一化设备坐标(normalized device coordinates)中。如前所述,这个视体的范围是从(-1,-1,-1)到(1,1,1)。几何阶段的最后一步,是把物体从这个空间转换到窗口坐标。
The clipping step uses the 4-value homogeneous coordinates produced by projection to perform clipping. Values do not normally interpolate linearly across a triangle in perspective space. The fourth coordinate is needed so that data are properly interpolated and clipped when a perspective projection is used. Finally, perspective division is performed, which places the resulting triangles' positions into three-dimensional normalized device coordinates. As mentioned earlier, this view volume ranges from (−1, −1, −1) to (1, 1, 1). The last step in the geometry stage is to convert from this space to window coordinates.
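下面的小例子演示透视除法:用齐次裁剪坐标除以w,得到归一化设备坐标(NDC)。数值纯属示例:

```cpp
#include <cstdio>

struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// 透视除法:从齐次裁剪坐标得到归一化设备坐标(NDC)
Vec3 toNDC(const Vec4& clip) {
    return { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
}

int main() {
    // 一个经过透视投影后的裁剪空间顶点(数值为示例)
    Vec4 clip{2.0f, 1.0f, 3.0f, 4.0f};
    Vec3 ndc = toNDC(clip);
    // 若各分量都落在[-1,1]内,则该点位于规范视体之内
    std::printf("ndc = (%.2f, %.2f, %.2f)\n", ndc.x, ndc.y, ndc.z);
    return 0;
}
```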
图2.6 投影变换后,只有在单位立方体内的图元(也就是在视锥体内的图元)需要继续处理。因此,完全处于单位立方体外的图元会被丢弃,完全处于其内部的图元会被保留;与单位立方体相交的图元会针对立方体进行裁剪,由此生成新的顶点,并丢弃旧的顶点。
2.3.4 屏幕映射 Screen Mapping
只有在视锥体内的(经过裁剪的)图元会被传递给屏幕映射阶段,而且在进入这个阶段的时候,坐标仍然是三维的。每个图元的x和y坐标经过变换后形成屏幕坐标。屏幕坐标加上z坐标,也称为窗口坐标。假设场景要渲染到一个最小角为(x1,y1),最大角为(x2,y2)的窗口中,且x1<x2,y1<y2。那么屏幕映射就是先平移再缩放的操作。变换后新的x和y坐标就称为屏幕坐标。z坐标(OpenGL中范围是[-1,+1],DirectX中是[0,1])也会被映射到[z1,z2],默认情况下z1=0,z2=1,不过可以通过API修改这些默认值。窗口坐标与重映射后的z值会被传递给光栅化阶段。图2.7描绘了屏幕映射的过程。
Only the (clipped) primitives inside the view volume are passed on to the screen mapping stage, and the coordinates are still three-dimensional when entering this stage. The x- and y-coordinates of each primitive are transformed to form screen coordinates. Screen coordinates together with the z-coordinates are also called window coordinates. Assume that the scene should be rendered into a window with the minimum corner at (x1, y1) and the maximum corner at (x2, y2), where x1 < x2 and y1 < y2. Then the screen mapping is a translation followed by a scaling operation. The new x- and y-coordinates are said to be screen coordinates. The z-coordinate ([−1, +1] for OpenGL and [0, 1] for DirectX) is also mapped to [z1, z2], with z1 = 0 and z2 = 1 as the default values. These can be changed with the API, however. The window coordinates along with this remapped z-value are passed on to the rasterizer stage. The screen mapping process is depicted in Figure 2.7.
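按照上面的定义,屏幕映射可以写成下面这样一个小函数:x和y从NDC映射到窗口(x1,y1)-(x2,y2),z从[-1,1]重映射到默认的[0,1]。这只是一个示意,实际的约定(比如z的输入范围,取整方式)由所用API决定:

```cpp
#include <cstdio>

struct Vec3 { float x, y, z; };

// 把NDC坐标(各分量范围[-1,1])映射到窗口坐标:
// x -> [x1, x2], y -> [y1, y2], z -> [z1, z2]
Vec3 screenMap(const Vec3& ndc,
               float x1, float y1, float x2, float y2,
               float z1 = 0.0f, float z2 = 1.0f) {
    Vec3 w;
    w.x = (ndc.x * 0.5f + 0.5f) * (x2 - x1) + x1;
    w.y = (ndc.y * 0.5f + 0.5f) * (y2 - y1) + y1;
    w.z = (ndc.z * 0.5f + 0.5f) * (z2 - z1) + z1;
    return w;
}

int main() {
    Vec3 ndc{0.0f, -1.0f, 0.5f};
    Vec3 win = screenMap(ndc, 0.0f, 0.0f, 1280.0f, 720.0f);
    std::printf("window = (%.1f, %.1f, %.3f)\n", win.x, win.y, win.z);
    return 0;
}
```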
图2.7 投影变换后,图元位于单位立方体内;屏幕映射过程负责求出图元在屏幕上的坐标。
接下来,我们要讲一下整数和浮点数值与像素(和贴图坐标)之间的关系。给定水平排列的一行像素,并使用笛卡尔坐标系,最左边像素的左边缘在浮点坐标中是0.0。OpenGL一直都使用这个方案,DirectX 10及其后续版本也采用它。这个像素的中心位于0.5。因此,像素范围[0, 9]覆盖的跨度是[0.0, 10.0)。像素的离散(整数)索引d与像素内的连续(浮点)值c之间有如下关系:d = floor(c),c = d + 0.5。
Next, we describe how integer and floating point values relate to pixels (and texture coordinates). Given a horizontal array of pixels and using Cartesian coordinates, the left edge of the leftmost pixel is 0.0 in floating point coordinates. OpenGL has always used this scheme, and DirectX 10 and its successors use it. The center of this pixel is at 0.5. So, a range of pixels [0, 9] cover a span from [0.0, 10.0). The conversions between the discrete (integer) pixel index d and the continuous (floating point) value c within a pixel are d = floor(c) and c = d + 0.5.
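像素索引d与连续坐标c的关系可以直接写成两个小函数,仅作示意:

```cpp
#include <cmath>
#include <cstdio>

// 连续坐标 -> 离散像素索引
int   toIndex(float c) { return static_cast<int>(std::floor(c)); }
// 离散像素索引 -> 该像素中心的连续坐标
float toCenter(int d)  { return d + 0.5f; }

int main() {
    std::printf("floor(7.25) = %d, center(7) = %.1f\n", toIndex(7.25f), toCenter(7));
    return 0;
}
```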
尽管在所有的API中,像素位置都是从左到右增大,Opengl与DirectX并没有统一像素原点的位置到底会是底部还是顶部。Opengl偏向于使用左下角作为坐标原点,而directX有时会使用左上角,这根据具体环境来决定。他们各自有理由来解释自身规定的原因,不过我们没办法评判究竟谁的规定是正确的。比如,(0,0)点在Opengl中位于图像的左下角,而在DirectX中位于图像的左上角。我们在跨平台时,要额外注意API之间的差别。
While all APIs have pixel location values that increase going from left to right, the location of zero for the top and bottom edges is inconsistent in some cases between OpenGL and DirectX. OpenGL favors the Cartesian system throughout, treating the lower left corner as the lowest-valued element, while DirectX sometimes defines the upper left corner as this element, depending on the context. There is a logic to each, and no right answer exists where they differ. As an example, (0, 0) is located at the lower left corner of an image in OpenGL, while it is upper left for DirectX. This difference is important to take into account when moving from one API to the other.
2.4 光栅化 Rasterization
给定经过变换和投影的顶点以及与之关联的着色数据(都来自几何处理阶段),下一个阶段的目标是找到图元(例如正在渲染的一个三角形)内部的所有像素(pixel,picture element的简称)。我们称这个过程为光栅化,它分为两个功能性子阶段:三角形设置(也叫图元装配)和三角形遍历,如图2.8左侧所示。要注意的是,这两个子阶段也可以处理点和线,只是因为三角形最常见,所以子阶段的名称里带有"三角形"。光栅化,也称扫描转换(scan conversion),就是把屏幕空间中的二维顶点(每个顶点带有z值,即深度值,以及各种与之关联的着色信息)转换为屏幕上的像素。光栅化也可以视为几何处理与像素处理之间的同步点,因为三角形正是在这里由三个顶点组装而成,并最终被送往像素处理阶段。
Given the transformed and projected vertices with their associated shading data (all from geometry processing), the goal of the next stage is to find all pixels—short for picture elements—that are inside the primitive, e.g., a triangle, being rendered. We call this process rasterization, and it is split up into two functional substages: triangle setup (also called primitive assembly) and triangle traversal. These are shown to the left in Figure 2.8. Note that these can handle points and lines as well, but since triangles are most common, the substages have “triangle” in their names. Rasterization, also called scan conversion, is thus the conversion from two-dimensional vertices in screen space—each with a z-value (depth value) and various shading information associated with each vertex—into pixels on the screen. Rasterization can also be thought of as a synchronization point between geometry processing and pixel processing, since it is here that triangles are formed from three vertices and eventually sent down to pixel processing.
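作为示意,下面的C++草图用边函数(edge function)遍历一个小范围内的像素中心,并判断它们是否被三角形覆盖。这大致对应三角形设置(预先准备边方程)与三角形遍历(逐像素做覆盖测试)的核心思想,只是一个玩具实现,与任何真实GPU的做法无关:

```cpp
#include <cstdio>

struct Vec2 { float x, y; };

// 边函数:返回值的符号表示点p在边(a->b)的哪一侧
float edge(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

int main() {
    // 屏幕空间中的一个三角形(逆时针),坐标为示例
    Vec2 v0{1.0f, 1.0f}, v1{8.0f, 2.0f}, v2{4.0f, 7.0f};

    // 遍历一个覆盖三角形的小包围盒,对每个像素中心(x+0.5, y+0.5)做覆盖测试
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 9; ++x) {
            Vec2 p{x + 0.5f, y + 0.5f};
            bool inside = edge(v0, v1, p) >= 0.0f &&
                          edge(v1, v2, p) >= 0.0f &&
                          edge(v2, v0, p) >= 0.0f;
            std::putchar(inside ? '#' : '.');
        }
        std::putchar('\n');
    }
    return 0;
}
```

真实的光栅化器还会处理填充规则,多重采样,以及属性的透视校正插值等,这里全部省略。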
图2.8 左图中,光栅化阶段分成了两个功能子阶段,叫做三角形设置(triangle setup)和三角形遍历(triangle traversal)。右图中,像素处理阶段分成了两个功能子阶段,叫做像素着色(pixel shading)和合并(merging)。
三角形是否被认为覆盖了某个像素,取决于你如何设置GPU的管线。举个例子,你可以使用点采样来判定"是否在内部"。最简单的情况是在每个像素的中心使用单个采样点:如果该中心点在三角形内部,那么对应的像素也被认为在三角形内部。你也可以在每个像素上使用多个采样点,比如超采样或多重采样抗锯齿技术(5.4.2节)。还有一种方式是使用保守光栅化(conservative rasterization),它的定义是:只要像素至少有一部分与三角形重叠,该像素就被认为"在"三角形内(23.1.2节)。
Whether the triangle is considered to overlap the pixel depends on how you have set up the GPU's pipeline. For example, you may use point sampling to determine "insideness." The simplest case uses a single point sample in the center of each pixel, and so if that center point is inside the triangle then the corresponding pixel is considered inside the triangle as well. You may also use more than one sample per pixel using supersampling or multisampling antialiasing techniques (Section 5.4.2). Yet another way is to use conservative rasterization, where the definition is that a pixel is "inside" the triangle if at least part of the pixel overlaps with the triangle (Section 23.1.2).
2.4.1 三角形设置 Triangle Setup
在这个阶段,会计算三角形的微分(differentials),边方程(edge equations)以及其他一些数据。这些数据可能会被用于三角形遍历(2.4.2节),以及对几何阶段产生的各种着色数据进行插值。这个任务由固定功能硬件来完成。
In this stage the differentials, edge equations, and other data for the triangle are computed. These data may be used for triangle traversal (Section 2.4.2), as well as for interpolation of the various shading data produced by the geometry stage. Fixed-function hardware is used for this task.
2.4.2 三角形遍历 Triangle Traversal
在这个阶段,会逐一检查中心(或采样点)被三角形覆盖的像素,并为像素与三角形重叠的部分生成片元(fragment)。更复杂的采样方法见5.4节。寻找哪些采样点或像素位于三角形内部的过程通常称为三角形遍历。每个三角形片元的属性由三个三角形顶点的数据插值生成(第五章),这些属性包括片元的深度,以及所有来自几何阶段的着色数据。McCormack等人提供了关于三角形遍历的更多信息。对三角形做透视校正插值(23.1.1节)也是在这里执行的。位于图元内部的所有像素或采样点随后会被送入像素处理阶段,下面会介绍这个阶段。
Here is where each pixel that has its center (or a sample) covered by the triangle is checked and a fragment generated for the part of the pixel that overlaps the triangle. More elaborate sampling methods can be found in Section 5.4. Finding which samples or pixels are inside a triangle is often called triangle traversal. Each triangle fragment's properties are generated using data interpolated among the three triangle vertices (Chapter 5). These properties include the fragment's depth, as well as any shading data from the geometry stage. McCormack et al. offer more information on triangle traversal. It is also here that perspective-correct interpolation over the triangles is performed (Section 23.1.1). All pixels or samples that are inside a primitive are then sent to the pixel processing stage, described next.
2.5 像素处理阶段 Pixel Processing
到此为止,作为前面所有阶段组合作用的结果,所有被认为位于三角形或其他图元内部的像素都已经被找到。像素处理阶段分为像素着色和合并两个子阶段,如图2.8右侧所示。像素处理阶段会对位于图元内部的像素或采样点执行逐像素或逐采样的计算与操作。
At this point, all the pixels that are considered inside a triangle or other primitive have been found as a consequence of the combination of all the previous stages. The pixel processing stage is divided into pixel shading and merging, shown to the right in Figure 2.8. Pixel processing is the stage where per-pixel or per-sample computations and operations are performed on pixels or samples that are inside a primitive.
2.5.1 像素着色
所有逐像素的着色计算都在这里执行,并使用插值得到的着色数据作为输入。最终结果是一个或多个颜色,并被传递给下一个阶段。与通常由专用固定硬件执行的三角形设置和三角形遍历阶段不同,像素着色阶段由可编程的GPU内核执行。为此,程序员需要为像素着色器(pixel shader,OpenGL中称为片元着色器,fragment shader)提供一段程序,其中可以包含任何想要的计算。这里可以运用大量的技术,其中最重要的之一就是贴图(texturing),第六章会更详细地介绍贴图技术。简单来说,对一个对象做贴图,就是出于各种目的把一张或多张图像"粘"到该对象上,图2.9简单地描绘了这一过程。图像可以是一维,二维或三维的,其中二维图像最常见。最简单的情况下,最终产物是每个片元的一个颜色值,这些颜色值会被传递给下一个子阶段。
Any per-pixel shading computations are performed here, using the interpolated shading data as input. The end result is one or more colors to be passed on to the next stage. Unlike the triangle setup and traversal stages, which are usually performed by dedicated, hardwired silicon, the pixel shading stage is executed by programmable GPU cores. To that end, the programmer supplies a program for the pixel shader (or fragment shader, as it is known in OpenGL), which can contain any desired computations. A large variety of techniques can be employed here, one of the most important of which is texturing. Texturing is treated in more detail in Chapter 6. Simply put, texturing an object means “gluing” one or more images onto that object, for a variety of purposes. A simple example of this process is depicted in Figure 2.9. The image may be one-, two-, or three-dimensional, with two-dimensional images being the most common. At its simplest, the end product is a color value for each fragment, and these are passed on to the next substage.
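为了直观说明"逐片元执行一段程序,并在其中做贴图采样"这个概念,下面给出一个与具体着色语言无关的C++草图。其中的2x2贴图,插值得到的uv与光照系数都是假设的示例数据:

```cpp
#include <cstdio>

struct Color { float r, g, b; };

// 一张2x2的"贴图",按最近点方式采样;真实系统中还有过滤,mipmap等
Color sampleTexture(float u, float v) {
    static const Color texels[2][2] = {
        {{1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
        {{0.0f, 0.0f, 1.0f}, {1.0f, 1.0f, 1.0f}},
    };
    int x = u < 0.5f ? 0 : 1;
    int y = v < 0.5f ? 0 : 1;
    return texels[y][x];
}

// 一个"像素着色器"式的函数:输入插值得到的uv与光照系数,输出片元颜色
Color shadeFragment(float u, float v, float lighting) {
    Color base = sampleTexture(u, v);
    return { base.r * lighting, base.g * lighting, base.b * lighting };
}

int main() {
    Color c = shadeFragment(0.75f, 0.25f, 0.8f); // 输入数据为示例
    std::printf("fragment color = (%.2f, %.2f, %.2f)\n", c.r, c.g, c.b);
    return 0;
}
```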
图2.9 左上角是没有贴图的龙模型。把右侧的贴图"粘"到龙模型上,结果如左下角所示。
2.5.2 合并
每个像素的信息存储在颜色缓冲区中,它是一个存储颜色的矩形数组(每个颜色有红,绿,蓝三个分量)。合并阶段负责把像素着色阶段计算出的片元颜色与当前存储在缓冲区里的颜色组合到一起。这个阶段也称为ROP,取决于你问谁,它可以代表"光栅操作(管线)(raster operations pipeline)"或"渲染输出单元(render output unit)"。与着色阶段不同,执行这个阶段的GPU子单元通常不是完全可编程的,但它是高度可配置的,可以实现多种效果。
The information for each pixel is stored in the color buffer, which is a rectangular array of colors (a red, a green, and a blue component for each color). It is the responsibility of the merging stage to combine the fragment color produced by the pixel shading stage with the color currently stored in the buffer. This stage is also called ROP, standing for "raster operations (pipeline)" or "render output unit," depending on who you ask. Unlike the shading stage, the GPU subunit that performs this stage is typically not fully programmable. However, it is highly configurable, enabling various effects.
该阶段还负责解决可见性问题。这意味着当整个场景渲染完毕后,颜色缓冲区中应当包含的是场景中从摄像机视角可见的那些图元的颜色。对于大多数甚至所有图形硬件来说,这是通过z缓冲区(z-buffer,也叫深度缓冲区,depth buffer)算法来实现的。z缓冲区的大小和形状与颜色缓冲区相同,每个像素中存储着当前距离最近的图元的z值。这意味着当一个图元被渲染到某个像素时,会计算该图元在该像素处的z值,并与z缓冲区中同一像素的内容进行比较。如果新的z值比z缓冲区中的z值小,说明当前渲染的图元比之前在该像素处离摄像机最近的图元还要近,因此该像素的z值与颜色就会用正在绘制的图元的z值与颜色来更新。如果计算出的z值大于z缓冲区中的z值,那么颜色缓冲区和z缓冲区都保持不变。z缓冲区算法很简单,具有O(n)的收敛性(n是被渲染的图元数量),而且适用于任何能为每个(相关)像素计算出z值的可绘制图元。另外要注意的是,这个算法允许大多数图元以任意顺序渲染,这也是它广受欢迎的另一个原因。不过,z缓冲区在屏幕的每个点上只存储一个深度值,因此它不能用于部分透明的图元。这些图元必须在所有不透明图元之后,按由远及近的顺序渲染,或者使用单独的与顺序无关的算法(5.5节)。透明是基础z缓冲区算法的主要弱点之一。
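上面描述的合并阶段核心逻辑(先做z缓冲区深度测试,再把片元颜色与颜色缓冲区中的颜色混合)可以概括成下面的草图。它只演示单个像素上的流程,与任何具体GPU或API的行为无关:

```cpp
#include <cstdio>

struct Color { float r, g, b, a; };

// 单个像素对应的两个缓冲:颜色缓冲与z缓冲
struct Pixel {
    Color color{0.f, 0.f, 0.f, 1.f};
    float depth = 1.0f;               // 初始化为"最远"
};

// 合并一个片元:先做深度测试,通过后按alpha混合写入颜色缓冲
void merge(Pixel& px, const Color& src, float srcDepth) {
    if (srcDepth >= px.depth) return;                 // 深度测试失败,丢弃片元
    px.depth = srcDepth;                              // 更新z缓冲
    float a = src.a;                                  // 简单的alpha混合
    px.color = { src.r * a + px.color.r * (1 - a),
                 src.g * a + px.color.g * (1 - a),
                 src.b * a + px.color.b * (1 - a),
                 1.0f };
}

int main() {
    Pixel px;
    merge(px, {1.f, 0.f, 0.f, 1.f}, 0.6f);   // 不透明的红色片元,较远
    merge(px, {0.f, 0.f, 1.f, 0.5f}, 0.3f);  // 半透明的蓝色片元,较近
    std::printf("color = (%.2f, %.2f, %.2f), depth = %.2f\n",
                px.color.r, px.color.g, px.color.b, px.depth);
    return 0;
}
```

注意第二个片元是半透明的:正如正文所说,要得到正确结果,这类片元需要在不透明物体之后按由远及近的顺序提交,或改用与顺序无关的算法。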
This stage is also responsible for resolving visibility. This means that when the whole scene has been rendered, the color buffer should contain the colors of the primitives in the scene that are visible from the point of view of the camera. For most or even all graphics hardware, this is done with the z-buffer (also called depth buffer) algorithm. A z-buffer is the same size and shape as the color buffer, and for each pixel it stores the z-value to the currently closest primitive. This means that when a primitive is being rendered to a certain pixel, the z-value on that primitive at that pixel is being computed and compared to the contents of the z-buffer at the same pixel. If the new z-value is smaller than the z-value in the z-buffer, then the primitive that is being rendered is closer to the camera than the primitive that was previously closest to the camera at that pixel. Therefore, the z-value and the color of that pixel are updated with the z-value and color from the primitive that is being drawn. If the computed z-value is greater than the z-value in the z-buffer, then the color buffer and the z-buffer are left untouched. The z-buffer algorithm is simple, has O(n) convergence (where n is the number of primitives being rendered), and works for any drawing primitive for which a z-value can be computed for each (relevant) pixel. Also note that this algorithm allows most primitives to be rendered in any order, which is another reason for its popularity. However, the z-buffer stores only a single depth at each point on the screen, so it cannot be used for partially transparent primitives. These must be rendered after all opaque primitives, and in back-to-front order, or using a separate order-independent algorithm (Section 5.5). Transparency is one of the major weaknesses of the basic z-buffer.
前面提到,颜色缓冲区为每个像素存储颜色,z缓冲区为每个像素存储z值。除此之外,还有其他通道与缓冲区可以用来过滤和捕获片元信息。alpha通道与颜色缓冲区相关联,为每个像素存储一个相关的不透明度值(5.5节)。在一些较老的API中,alpha通道还被用来通过alpha测试功能选择性地丢弃像素。如今,丢弃(discard)操作可以写在像素着色器程序中,任何类型的计算都可以用来触发丢弃。这类测试可以确保完全透明的片元不会影响z缓冲区(6.6节)。
We have mentioned that the color buffer is used to store colors and that the z-buffer stores z-values for each pixel. However, there are other channels and buffers that can be used to filter and capture fragment information. The alpha channel is associated with the color buffer and stores a related opacity value for each pixel (Section 5.5). In older APIs, the alpha channel was also used to discard pixels selectively via the alpha test feature. Nowadays a discard operation can be inserted into the pixel shader program and any type of computation can be used to trigger a discard. This type of test can be used to ensure that fully transparent fragments do not affect the z-buffer (Section 6.6).
模板缓冲区(stencil buffer)是一个离屏缓冲区,用来记录已渲染图元的位置,通常每个像素占8比特。可以使用各种函数把图元渲染到模板缓冲区中,然后用缓冲区的内容来控制对颜色缓冲区和z缓冲区的渲染。举个例子,假设已经把一个实心圆画进了模板缓冲区,那么配合某个操作符,就可以让后续图元只在实心圆所在的像素处被渲染到颜色缓冲区中。模板缓冲区是制作一些特殊效果的有力工具。管线末端的所有这些功能统称为光栅操作(raster operations,ROP)或混合操作(blend operations)。可以把颜色缓冲区中当前的颜色与三角形内正在处理的像素的颜色进行混合,这可以实现透明或颜色采样累积等效果。如前所述,混合通常通过API进行配置,而不是完全可编程的。不过,有些API支持光栅顺序视图(raster order views,也叫pixel shader ordering),它可以提供可编程的混合能力。
The stencil buffer is an offscreen buffer used to record the locations of the rendered primitive. It typically contains 8 bits per pixel. Primitives can be rendered into the stencil buffer using various functions, and the buffer's contents can then be used to control rendering into the color buffer and z-buffer. As an example, assume that a filled circle has been drawn into the stencil buffer. This can be combined with an operator that allows rendering of subsequent primitives into the color buffer only where the circle is present. The stencil buffer can be a powerful tool for generating some special effects. All these functions at the end of the pipeline are called raster operations (ROP) or blend operations. It is possible to mix the color currently in the color buffer with the color of the pixel being processed inside a triangle. This can enable effects such as transparency or the accumulation of color samples. As mentioned, blending is typically configurable using the API and not fully programmable. However, some APIs have support for raster order views, also called pixel shader ordering, which enable programmable blending capabilities.
帧缓冲区一般由系统里所有的缓冲区所组成。
The framebuffer generally consists of all the buffers on a system.
当图元到达并通过了光栅化阶段之后,那些从摄像机视角可见的图元就会显示在屏幕上。屏幕显示的是颜色缓冲区的内容。为了避免让人眼看到图元被光栅化并送往屏幕的过程,通常会使用双重缓冲(double buffering)。这意味着场景的渲染是在屏幕之外的后缓冲区(back buffer)中进行的。一旦场景在后缓冲区中渲染完成,后缓冲区的内容就会与此前显示在屏幕上的前缓冲区(front buffer)的内容进行交换。这种交换通常发生在垂直回扫(vertical retrace)期间,因为在这个时间点进行交换是安全的。
When the primitives have reached and passed the rasterizer stage, those that are visible from the point of view of the camera are displayed on screen. The screen displays the contents of the color buffer. To avoid allowing the human viewer to see the primitives as they are being rasterized and sent to the screen, double buffering is used. This means that the rendering of a scene takes place off screen, in a back buffer. Once the scene has been rendered in the back buffer, the contents of the back buffer are swapped with the contents of the front buffer that was previously displayed on the screen. The swapping often occurs during vertical retrace, a time when it is safe to do so.
更多关于不同缓冲区与缓冲方法的信息,请参见5.4.2节,23.6节和23.7节。
For more information on different buffers and buffering methods, see Sections 5.4.2, 23.6, and 23.7.
2.6 管线流程概览 Through the Pipeline
点,线和三角形是用来构建模型或对象的渲染图元。想象有一个可交互的计算机辅助设计(CAD)应用,用户正在检查一个华夫饼机的设计。在这里,我们会跟随这个模型走完整条图形渲染管线,包括四个主要阶段:应用,几何,光栅化和像素处理。场景会以透视方式渲染到屏幕上的一个窗口中。在这个简单的例子中,华夫饼机的模型同时包括了线(用来展示部件的边)和三角形(用来展示表面)。华夫饼机有一个可以打开的盖子。一些三角形上贴有一张带制造商logo的二维贴图。在这个例子中,表面着色完全在几何阶段进行计算,只有贴图的应用是例外,它发生在光栅化阶段。
Points, lines, and triangles are the rendering primitives from which a model or an object is built. Imagine that the application is an interactive computer aided design (CAD) application, and that the user is examining a design for a waffle maker. Here we will follow this model through the entire graphics rendering pipeline, consisting of the four major stages: application, geometry, rasterization, and pixel processing. The scene is rendered with perspective into a window on the screen. In this simple example, the waffle maker model includes both lines (to show the edges of parts) and triangles (to show the surfaces). The waffle maker has a lid that can be opened. Some of the triangles are textured by a two-dimensional image with the manufacturer's logo. For this example, surface shading is computed completely in the geometry stage, except for application of the texture, which occurs in the rasterization stage.
应用 Application
CAD应用允许用户选择和移动模型的各个部分。举个例子,用户可能会选中盖子,然后移动鼠标把它打开。应用阶段必须把鼠标的移动转换成相应的旋转矩阵,并确保在渲染盖子时,这个矩阵被正确地应用到盖子上。另一个例子:播放一段动画,使摄像机沿着预定义的路径移动,从不同的角度展示华夫饼机。此时摄像机的参数,比如位置和观察方向,必须由应用根据时间来更新。对于要渲染的每一帧,应用阶段都会把摄像机位置,光照以及模型的图元传递给管线的下一个主要阶段,也就是几何阶段。
CAD applications allow the user to select and move parts of the model. For example, the user might select the lid and then move the mouse to open it. The application stage must translate the mouse move to a corresponding rotation matrix, then see to it that this matrix is properly applied to the lid when it is rendered. Another example: An animation is played that moves the camera along a predefined path to show the waffle maker from different views. The camera parameters, such as position and view direction, must then be updated by the application, dependent upon time. For each frame to be rendered, the application stage feeds the camera position, lighting, and primitives of the model to the next major stage in the pipeline—the geometry stage.
几何处理 Geometry Processing
对于透视观察,我们在这里假定应用已经提供了一个投影矩阵。同时,对于每个对象,应用也计算了一个矩阵,它同时描述了视图变换以及对象本身的位置与朝向。在我们的例子中,华夫饼机的底座会有一个矩阵,盖子会有另外一个。在几何阶段,对象的顶点和法线会用这个矩阵进行变换,把对象变换到观察空间。然后,可以使用材质和光源属性在顶点上计算着色或其他计算。接下来使用另一个由用户提供的投影矩阵进行投影,把对象变换到代表眼睛所见范围的单位立方体空间中。所有在立方体之外的图元都会被丢弃;所有与这个单位立方体相交的图元都会针对立方体进行裁剪,从而得到一组完全位于单位立方体内部的图元。然后,顶点会被映射到屏幕上的窗口中。在所有这些逐三角形与逐顶点的操作都执行完之后,得到的数据会被传递给光栅化阶段。
For perspective viewing, we assume here that the application has supplied a projection matrix. Also, for each object, the application has computed a matrix that describes both the view transform and the location and orientation of the object in itself. In our example, the waffle maker's base would have one matrix, the lid another. In the geometry stage the vertices and normals of the object are transformed with this matrix, putting the object into view space. Then shading or other calculations at the vertices may be computed, using material and light source properties. Projection is then performed using a separate user-supplied projection matrix, transforming the object into a unit cube's space that represents what the eye sees. All primitives outside the cube are discarded. All primitives intersecting this unit cube are clipped against the cube in order to obtain a set of primitives that lies entirely inside the unit cube. The vertices then are mapped into the window on the screen. After all these per-triangle and per-vertex operations have been performed, the resulting data are passed on to the rasterization stage.
光栅化 Rasterization
所有在几何阶段通过裁剪的图元,会在此阶段被光栅化。这意味着我们会找到一个图元内的所有像素,并将其传递给管线的下一个阶段——像素处理阶段。
All the primitives that survive clipping in the previous stage are then rasterized, which means that all pixels that are inside a primitive are found and sent further down the pipeline to pixel processing.
像素处理 Pixel Processing
此阶段的目标是计算每个可见图元的每个像素的颜色。与任何贴图有关联的三角形会以用户指定的方式,带着这些贴图渲染。z缓冲区算法可以解决可见性问题,同时可以选择使用discard或者模板测试。每个对象会按顺序处理,最终会在屏幕上呈现最后渲染出的图像。
The goal here is to compute the color of each pixel of each visible primitive. Those triangles that have been associated with any textures (images) are rendered with these images applied to them as desired. Visibility is resolved via the z-buffer algorithm, along with optional discard and stencil tests. Each object is processed in turn, and the final image is then displayed on the screen.
结论 Conclusion
上述管线是几十年来面向实时渲染应用的API与图形硬件演化的结果。需要注意的是,这并不是唯一可能的渲染管线,离线渲染管线经历了不同的演化路径。电影制作的渲染过去常常使用微多边形(micropolygon)管线,不过最近光线追踪与路径追踪已经成为主流。这些技术(将在11.2.2节介绍)也可以用于建筑与设计的预可视化。
This pipeline resulted from decades of API and graphics hardware evolution targeted to real-time rendering applications. It is important to note that this is not the only possible rendering pipeline; offline rendering pipelines have undergone different evolutionary paths. Rendering for film production was often done with micropolygon pipelines, but ray tracing and path tracing have taken over lately. These techniques, covered in Section 11.2.2, may also be used in architectural and design previsualization.
多年以来,应用开发者要使用这里描述的流程,唯一的途径就是所用图形API定义的固定功能管线。固定功能管线之所以得名,是因为实现它的图形硬件由无法灵活编程的元件组成。最后一个主要的固定功能硬件的例子是任天堂于2006年推出的Wii。另一方面,可编程GPU使我们可以精确决定在管线的各个子阶段中应用哪些操作。在本书第四版中,我们假定所有开发都是基于可编程GPU进行的。
For many years, the only way for application developers to use the process described here was through a fixed-function pipeline defined by the graphics API in use. The fixed-function pipeline is so named because the graphics hardware that implements it consists of elements that cannot be programmed in a flexible way. The last example of a major fixed-function machine is Nintendo's Wii, introduced in 2006. Programmable GPUs, on the other hand, make it possible to determine exactly what operations are applied in various sub-stages throughout the pipeline. For the fourth edition of the book, we assume that all development is done using programmable GPUs.
更进一步的阅读与资源 Further Reading and Resources
Blinn的书《A Trip Down the Graphics Pipeline》是一本较老的书,讲述如何从头编写一个软件渲染器。对于学习实现渲染管线过程中的种种细节来说,它是很好的资料,其中解释了裁剪和透视插值等关键算法。历史悠久(但经常更新)的OpenGL Programming Guide(也称"红皮书")对图形管线及其使用相关的算法做了详尽的描述。我们本书的网站http://realtimerendering.com给出了各种管线图,渲染引擎实现等内容的链接。
Blinn's book A Trip Down the Graphics Pipeline is an older book about writing a software renderer from scratch. It is a good resource for learning about some of the subtleties of implementing a rendering pipeline, explaining key algorithms such as clipping and perspective interpolation. The venerable (yet frequently updated) OpenGL Programming Guide (a.k.a. the "Red Book") provides a thorough description of the graphics pipeline and algorithms related to its use. Our book's website, realtimerendering.com, gives links to a variety of pipeline diagrams, rendering engine implementations, and more.