GPU Instancing in Unity

Posted by 量子计算9 on 2022-6-6 06:35

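GPU instancing draws large numbers of objects that share the same geometry and the same material together, which reduces the number of batches needed. To use it in Unity, you first need to add
#pragma multi_compile_instancing
to at least one pass of the shader. Because each instanced object may need different per-object draw data, the shader also has to receive an instance ID: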

struct VertexData {
    UNITY_VERTEX_INPUT_INSTANCE_ID
    float4 vertex : POSITION;
    …
};

The UNITY_VERTEX_INPUT_INSTANCE_ID macro is defined as follows:

// - UNITY_VERTEX_INPUT_INSTANCE_ID Declare instance ID field in vertex shader input / output struct.
# define UNITY_VERTEX_INPUT_INSTANCE_ID DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID

#if defined(UNITY_INSTANCING_ENABLED) || defined(UNITY_PROCEDURAL_INSTANCING_ENABLED) || defined(UNITY_STEREO_INSTANCING_ENABLED)
#ifdef SHADER_API_PSSL
   #define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID uint instanceID;
#else
   #define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID uint instanceID : SV_InstanceID;
#endif

#else
#define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID
#endif
In other words, it declares an instanceID field when GPU instancing is enabled and expands to nothing otherwise.
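HLSL shaders go through a C-style preprocessor, so the same conditional pattern can be tried out in C++. This is a minimal sketch under that assumption: UNITY_INSTANCING_ENABLED is defined by hand to mimic the instanced shader variant, MY_VERTEX_INPUT_INSTANCE_ID is a hypothetical stand-in for the Unity macro, and the SV_InstanceID semantic is dropped because C++ has no equivalent.

```cpp
#include <cassert>

// Assumption: mimic the instanced variant produced by
// #pragma multi_compile_instancing by defining the keyword by hand.
#define UNITY_INSTANCING_ENABLED

#if defined(UNITY_INSTANCING_ENABLED) || defined(UNITY_PROCEDURAL_INSTANCING_ENABLED) || defined(UNITY_STEREO_INSTANCING_ENABLED)
    // Instanced variant: declare the field (in HLSL: uint instanceID : SV_InstanceID;).
    #define MY_VERTEX_INPUT_INSTANCE_ID unsigned int instanceID;
#else
    // Non-instanced variant: the macro expands to nothing.
    #define MY_VERTEX_INPUT_INSTANCE_ID
#endif

// Hypothetical stand-in for the shader's VertexData (float[4] replaces float4).
struct VertexData {
    MY_VERTEX_INPUT_INSTANCE_ID
    float vertex[4];
};

// True when the struct is larger than the bare position, i.e. the ID field exists.
constexpr bool hasInstanceIdField() {
    return sizeof(VertexData) > sizeof(float) * 4;
}
```

Commenting out the `#define UNITY_INSTANCING_ENABLED` line removes the field entirely, which is why any code that touches instanceID must sit behind the same kind of macros.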
In addition, the UNITY_SETUP_INSTANCE_ID macro must be called at the beginning of the vertex program:

InterpolatorsVertex MyVertexProgram (VertexData v) {
    InterpolatorsVertex i;
    UNITY_INITIALIZE_OUTPUT(InterpolatorsVertex, i);
    UNITY_SETUP_INSTANCE_ID(v);
    i.pos = UnityObjectToClipPos(v.vertex);
    …
    return i;
}

UNITY_SETUP_INSTANCE_ID expands as follows:

// - UNITY_SETUP_INSTANCE_ID    Should be used at the very beginning of the vertex shader / fragment shader,
//                               so that succeeding code can have access to the global unity_InstanceID.
//                               Also procedural function is called to setup instance data.
# define UNITY_SETUP_INSTANCE_ID(input) DEFAULT_UNITY_SETUP_INSTANCE_ID(input)

#define DEFAULT_UNITY_SETUP_INSTANCE_ID(input)       { UnitySetupInstanceID(UNITY_GET_INSTANCE_ID(input)); UnitySetupCompoundMatrices(); }

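This macro does two things. First, it sets the global unity_InstanceID variable, which is used to index the arrays that hold the built-in matrices the shader uses (such as object-to-world):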

void UnitySetupInstanceID(uint inputInstanceID)
{
   #ifdef UNITY_STEREO_INSTANCING_ENABLED
         #if defined(SHADER_API_GLES3)
            // We must calculate the stereo eye index differently for GLES3
            // because otherwise,the unity shader compiler will emit a bitfieldInsert function.
            // bitfieldInsert requires support for glsl version 400 or later.Therefore the
            // generated glsl code will fail to compile on lower end devices.By changing the
            // way we calculate the stereo eye index,we can help the shader compiler to avoid
            // emitting the bitfieldInsert function and thereby increase the number of devices we
            // can run stereo instancing on.
            unity_StereoEyeIndex = round(fmod(inputInstanceID, 2.0));
            unity_InstanceID = unity_BaseInstanceID + (inputInstanceID >> 1);
         #else
            // stereo eye index is automatically figured out from the instance ID
            unity_StereoEyeIndex = inputInstanceID & 0x01;
            unity_InstanceID = unity_BaseInstanceID + (inputInstanceID >> 1);
         #endif
   #else
         unity_InstanceID = inputInstanceID + unity_BaseInstanceID;
   #endif
}
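The index arithmetic in UnitySetupInstanceID can be checked on the CPU. This C++ sketch mirrors the non-GLES3 branches, with hypothetical names (InstanceState, setupInstanceID, gles3EyeIndex) standing in for the shader globals. In the stereo case every object is drawn once per eye, so consecutive input IDs alternate between the two eyes and the object index is the input ID shifted right by one.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical CPU-side stand-ins for the shader globals.
struct InstanceState {
    unsigned instanceID;  // plays the role of unity_InstanceID
    unsigned stereoEye;   // plays the role of unity_StereoEyeIndex (0 or 1)
};

// Mirrors the non-GLES3 branches of UnitySetupInstanceID.
InstanceState setupInstanceID(unsigned inputInstanceID, unsigned baseInstanceID, bool stereo) {
    InstanceState s{0, 0};
    if (stereo) {
        // Even IDs render the left eye, odd IDs the right eye, for the same object.
        s.stereoEye = inputInstanceID & 0x01u;
        s.instanceID = baseInstanceID + (inputInstanceID >> 1);
    } else {
        s.instanceID = inputInstanceID + baseInstanceID;
    }
    return s;
}

// The GLES3 branch derives the eye index with fmod/round instead of a bit mask.
unsigned gles3EyeIndex(unsigned inputInstanceID) {
    return static_cast<unsigned>(std::round(std::fmod(static_cast<double>(inputInstanceID), 2.0)));
}
```

With a base of 10 and an input ID of 5, the mono path gives 15, while the stereo path gives eye 1 and instance 10 + (5 >> 1) = 12; the GLES3 formula agrees with the bit mask.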
Second, it recomputes the commonly used compound matrices from the per-instance object-to-world matrix:
   void UnitySetupCompoundMatrices()
   {
         unity_MatrixMVP_Instanced = mul(unity_MatrixVP, unity_ObjectToWorld);
         unity_MatrixMV_Instanced = mul(unity_MatrixV, unity_ObjectToWorld);
         unity_MatrixTMV_Instanced = transpose(unity_MatrixMV_Instanced);
         unity_MatrixITMV_Instanced = transpose(mul(unity_WorldToObject, unity_MatrixInvV));
   }
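As a sanity check on what UnitySetupCompoundMatrices relies on, the following C++ sketch (the Mat4/Vec4 helpers are hypothetical, not Unity types) shows that precomputing the compound matrix VP·M and then transforming a point gives the same result as transforming by M first and then by VP, because matrix multiplication is associative. Since unity_ObjectToWorld now differs per instance, a single shared MVP can no longer describe every object in the batch, so the compound matrices have to be rebuilt after the instance ID is known.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<float, 4>, 4>;
using Vec4 = std::array<float, 4>;

// mul(matrix, matrix) with the row-by-column convention of HLSL mul.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// mul(matrix, column vector).
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// Translation in the last column, as for a column-vector convention.
Mat4 translation(float x, float y, float z) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[0][3] = x;
    m[1][3] = y;
    m[2][3] = z;
    return m;
}

// Uniform scale of the xyz components; w is left unchanged.
Mat4 uniformScale(float s) {
    Mat4 m{};
    m[0][0] = s;
    m[1][1] = s;
    m[2][2] = s;
    m[3][3] = 1.0f;
    return m;
}
```

For example, with VP = uniformScale(2) and M = translation(1, 2, 3), the point (1, 1, 1, 1) maps to (4, 6, 8, 1) either way.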
Note that unity_ObjectToWorld and unity_WorldToObject themselves have also been redefined here:

   #define unity_ObjectToWorld UNITY_ACCESS_INSTANCED_PROP(unity_Builtins0, unity_ObjectToWorldArray)
   #define MERGE_UNITY_BUILTINS_INDEX(X) unity_Builtins##X
   #define unity_WorldToObject UNITY_ACCESS_INSTANCED_PROP(MERGE_UNITY_BUILTINS_INDEX(UNITY_WORLDTOOBJECTARRAY_CB), unity_WorldToObjectArray)

   inline float4 UnityObjectToClipPosInstanced(in float3 pos)
   {
         return mul(UNITY_MATRIX_VP, mul(unity_ObjectToWorld, float4(pos, 1.0)));
   }
   inline float4 UnityObjectToClipPosInstanced(float4 pos)
   {
         return UnityObjectToClipPosInstanced(pos.xyz);
   }
   #define UnityObjectToClipPos UnityObjectToClipPosInstanced
When GPU instancing is enabled, these definitions simply use the instance ID to index the corresponding matrix arrays.
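The effect of these #define redirections can be reproduced in C++, since the preprocessor rules are the same. In this sketch unity_ObjectToWorldArray, unity_MatrixVP and unity_InstanceID are plain globals standing in for Unity's constant-buffer data, objectToClipPos is a hypothetical name, and the array size of 4 is an arbitrary assumption. The point is that code written against unity_ObjectToWorld silently becomes an indexed lookup.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<float, 4>, 4>;
using Vec4 = std::array<float, 4>;

// Hypothetical stand-ins for the instanced constant-buffer contents.
constexpr int kArraySize = 4;               // assumption: tiny array for the demo
Mat4 unity_ObjectToWorldArray[kArraySize];  // one object-to-world matrix per instance
Mat4 unity_MatrixVP{};                      // shared by every instance in the batch
unsigned unity_InstanceID = 0;              // set by UNITY_SETUP_INSTANCE_ID in a shader

// Same idea as the shader include: the familiar name becomes an array lookup.
#define unity_ObjectToWorld unity_ObjectToWorldArray[unity_InstanceID]

Mat4 translation(float x, float y, float z) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[0][3] = x;
    m[1][3] = y;
    m[2][3] = z;
    return m;
}

// mul(matrix, column vector), as in HLSL.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// Mirrors UnityObjectToClipPosInstanced: object to world first, then view-projection.
Vec4 objectToClipPos(float x, float y, float z) {
    return mul(unity_MatrixVP, mul(unity_ObjectToWorld, Vec4{x, y, z, 1.0f}));
}
```

Switching unity_InstanceID between 0 and 1 moves the same object-space point by that instance's translation, while VP stays shared across the batch.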

[Figure: http://pic1.zhimg.com/v2-18aeb19942934140521844ac688d866c_r.jpg]
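Because each batch hands the GPU arrays of matrices rather than a single matrix, and those arrays live in constant buffers of limited size, the batch size has to be capped: only a limited number of objects is merged into one instanced batch at a time. Unity defines the UNITY_INSTANCED_ARRAY_SIZE macro to represent this maximum.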
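To make the cap concrete, the following C++ sketch counts how many instanced draw calls a group of otherwise identical objects needs when each batch holds at most maxPerBatch instances. The helper names are hypothetical, and the value 500 used below is only illustrative: the real UNITY_INSTANCED_ARRAY_SIZE depends on the platform and Unity version, and the sketch ignores everything else that can break a batch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of instanced draws needed; maxPerBatch must be > 0.
std::size_t instancedDrawCalls(std::size_t objectCount, std::size_t maxPerBatch) {
    return (objectCount + maxPerBatch - 1) / maxPerBatch;  // ceiling division
}

// Splits objects into consecutive batches; each entry is that draw's instance count.
std::vector<std::size_t> splitIntoBatches(std::size_t objectCount, std::size_t maxPerBatch) {
    std::vector<std::size_t> sizes;
    for (std::size_t start = 0; start < objectCount; start += maxPerBatch) {
        std::size_t remaining = objectCount - start;
        sizes.push_back(remaining < maxPerBatch ? remaining : maxPerBatch);
    }
    return sizes;
}
```

With a cap of 500, for instance, 1001 spheres would be split into batches of 500, 500 and 1, i.e. three draw calls.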

GPU instancing also supports shadows and multiple lights. For shadows, it is enough to add the corresponding instancing declarations to the shadow caster pass:
#pragma multi_compile_shadowcaster
#pragma multi_compile_instancing

struct VertexData {
    UNITY_VERTEX_INPUT_INSTANCE_ID
    …
};

InterpolatorsVertex MyShadowVertexProgram (VertexData v) {
    InterpolatorsVertex i;
    UNITY_SETUP_INSTANCE_ID(v);
    …
    return i;
}

[Figure: http://pic4.zhimg.com/v2-2d886528346ef528474582d74b17d24b_r.jpg]
For the multiple-light case, the deferred rendering path is required.

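However, by default GPU instancing only works for objects that share the same material, which is inconvenient in practice. Sometimes we only want to vary a single material property; for example, giving each sphere here a different color breaks instancing: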

[Figure: http://pic2.zhimg.com/v2-85a380cf4bb7c06367b0e73cb60a8371_r.jpg]
We can use a MaterialPropertyBlock to change the color without creating new materials:

MaterialPropertyBlock properties = new MaterialPropertyBlock();
properties.SetColor(
    "_Color", new Color(Random.value, Random.value, Random.value)
);
t.GetComponent<MeshRenderer>().SetPropertyBlock(properties);
To use this property in shader code, it has to be declared inside an instancing buffer:
UNITY_INSTANCING_BUFFER_START(InstanceProperties)
UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
#define _Color_arr InstanceProperties
UNITY_INSTANCING_BUFFER_END(InstanceProperties)
Expanding the macros shows that this simply defines a cbuffer containing an array of structs, where the struct holds the properties we added:
#define UNITY_INSTANCING_BUFFER_START(buf)    UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN(UnityInstancing_##buf) struct {
#define UNITY_INSTANCING_BUFFER_END(arr)    } arr##Array[UNITY_INSTANCED_ARRAY_SIZE]; UNITY_INSTANCING_CBUFFER_SCOPE_END
#define UNITY_DEFINE_INSTANCED_PROP(type, var)    type var;
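Because HLSL macros follow C preprocessor rules, the expansion can be checked by feeding the same macros to a C++ compiler. This sketch uses the instancing-enabled definitions, in which the END macro declares arr##Array[UNITY_INSTANCED_ARRAY_SIZE]; the array size of 8, the empty cbuffer scope macros and the float4 struct are stand-ins chosen for the demo, not Unity's actual values.

```cpp
#include <cassert>

// Assumptions for the demo: a tiny array size, and cbuffer scope macros that
// expand to nothing (in the shader they open and close a constant buffer).
#define UNITY_INSTANCED_ARRAY_SIZE 8
#define UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN(name)
#define UNITY_INSTANCING_CBUFFER_SCOPE_END

// Instancing-enabled buffer macros.
#define UNITY_INSTANCING_BUFFER_START(buf) UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN(UnityInstancing_##buf) struct {
#define UNITY_INSTANCING_BUFFER_END(arr) } arr##Array[UNITY_INSTANCED_ARRAY_SIZE]; UNITY_INSTANCING_CBUFFER_SCOPE_END
#define UNITY_DEFINE_INSTANCED_PROP(type, var) type var;

struct float4 { float x, y, z, w; };  // stand-in for the HLSL vector type

// Expands to: struct { float4 _Color; } InstancePropertiesArray[8];
UNITY_INSTANCING_BUFFER_START(InstanceProperties)
    UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
UNITY_INSTANCING_BUFFER_END(InstanceProperties)
```

Each slot of InstancePropertiesArray holds one instance's copy of every property declared between START and END, so adding more instanced properties grows the struct, not the number of arrays.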
To pass the instance ID from the vertex shader on to the fragment shader, use Unity's UNITY_TRANSFER_INSTANCE_ID macro:

InterpolatorsVertex MyVertexProgram (VertexData v) {
    InterpolatorsVertex i;
    UNITY_INITIALIZE_OUTPUT(InterpolatorsVertex, i);
    UNITY_SETUP_INSTANCE_ID(v);
    UNITY_TRANSFER_INSTANCE_ID(v, i);
    …
    return i;
}
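This macro is very simple. Because it writes to output.instanceID, the interpolator struct must also contain UNITY_VERTEX_INPUT_INSTANCE_ID:
#define UNITY_TRANSFER_INSTANCE_ID(input, output) output.instanceID = UNITY_GET_INSTANCE_ID(input)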
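The round trip from vertex to fragment program can be modeled in C++ with simplified stand-ins. Here UNITY_GET_INSTANCE_ID is assumed to read input.instanceID, UNITY_SETUP_INSTANCE_ID is reduced to the non-stereo branch of UnitySetupInstanceID, and vertexProgram, fragmentProgram and the two structs are hypothetical; only UNITY_TRANSFER_INSTANCE_ID is taken verbatim from the definition above.

```cpp
#include <cassert>

unsigned unity_InstanceID = 0;      // global written by the setup macro
unsigned unity_BaseInstanceID = 0;  // offset of this draw's first instance

// Simplified stand-ins (assumption: non-stereo path only).
#define UNITY_GET_INSTANCE_ID(input) input.instanceID
#define UNITY_SETUP_INSTANCE_ID(input) (unity_InstanceID = UNITY_GET_INSTANCE_ID(input) + unity_BaseInstanceID)
#define UNITY_TRANSFER_INSTANCE_ID(input, output) output.instanceID = UNITY_GET_INSTANCE_ID(input)

struct VertexData { unsigned instanceID; float vertex[4]; };
struct Interpolators { unsigned instanceID; float pos[4]; };  // must carry the ID too

Interpolators vertexProgram(VertexData v) {
    Interpolators i{};
    UNITY_SETUP_INSTANCE_ID(v);
    UNITY_TRANSFER_INSTANCE_ID(v, i);  // copies the raw input ID, not unity_InstanceID
    return i;
}

unsigned fragmentProgram(Interpolators i) {
    UNITY_SETUP_INSTANCE_ID(i);  // rebuild the global from the interpolated ID
    return unity_InstanceID;
}
```

Because the raw input ID is what gets transferred, the base offset is added again in the fragment program: with unity_BaseInstanceID set to 10, an input ID of 42 travels through the interpolators as 42 and both stages resolve it to 52.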

So how do we actually read a property from this cbuffer? Unity provides a matching macro for this as well:
float3 GetAlbedo (Interpolators i) {
    float3 albedo =
        tex2D(_MainTex, i.uv.xy).rgb * UNITY_ACCESS_INSTANCED_PROP(_Color_arr, _Color).rgb;
    ...
}
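This macro is just as simple: it uses the instance ID to index the struct array defined earlier and then reads the requested member:
#define UNITY_ACCESS_INSTANCED_PROP(arr, var) arr##Array[unity_InstanceID].var
Because it reads the global unity_InstanceID, the fragment program must also call UNITY_SETUP_INSTANCE_ID(i) at its very beginning, before it accesses any instanced property.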
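Continuing the C++ analogy, this sketch shows the lookup that UNITY_ACCESS_INSTANCED_PROP performs in its instancing-enabled form. The struct array is written out directly in its expanded form, the array size of 4 and the float4 struct are demo stand-ins, currentInstanceColor is a hypothetical name, and the buffer name InstanceProperties is passed to the macro directly; the _Color_arr alias used in GetAlbedo is not modeled here.

```cpp
#include <cassert>

#define UNITY_INSTANCED_ARRAY_SIZE 4  // assumption: tiny array size for the demo
unsigned unity_InstanceID = 0;        // normally set by UNITY_SETUP_INSTANCE_ID

struct float4 { float r, g, b, a; };  // stand-in for the HLSL vector type

// Instancing-enabled access: index the struct array, then pick the member.
#define UNITY_ACCESS_INSTANCED_PROP(arr, var) arr##Array[unity_InstanceID].var

// Expanded form of the InstanceProperties buffer (cbuffer scope omitted).
struct { float4 _Color; } InstancePropertiesArray[UNITY_INSTANCED_ARRAY_SIZE];

// Mirrors the color lookup in GetAlbedo for the current instance.
float4 currentInstanceColor() {
    return UNITY_ACCESS_INSTANCED_PROP(InstanceProperties, _Color);
}
```

Changing unity_InstanceID is all it takes to read a different object's color, which is why the fragment program has to restore that global before the lookup.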

After these changes, running the scene again shows that the batch count has dropped: instancing is working.

If you found this article helpful, feel free to follow my WeChat official account: Game_Develop_Forever



