Implementing GPU-Driven Terrain in Unity

123456833 · Posted 2021-3-3 09:28

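After reading @安柏霖's blog post on the GPU Driven rendering used in the Moonlight Blade mobile game (天刀手游),
I wanted to build a simple GPU Driven terrain in Unity myself, mainly to work through the technical details specific to terrain.
The Moonlight Blade approach appears to be based on a GDC talk about Far Cry 5's GPU terrain. If you're interested, google that deck; it explains everything very clearly.
Approach

Both of those talks explain the principles well, so I'll only give a rough outline here.
Split the terrain into small 4x4 grid patches and, taking LOD into account, build a complete pyramid-shaped NodeList containing the nodes of every level.
Pass the NodeList to a compute shader to select LODs, producing a LOD-resolved list (in this demo this step runs on the CPU).
Pass the resolved NodeList to a compute shader for frustum and Hi-Z culling, producing a list of the IDs of visible nodes.
Call DrawMeshInstancedIndirect with that ID list; in the shader each instance uses its ID to fetch its node's info, reconstructs the terrain data and renders it.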
Preparing resources

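For GPU driven terrain the main tool is DrawMeshInstancedIndirect: we pass in one small grid mesh and draw it with an instance buffer. Unity's own terrain also supports instanced drawing. It draws with 32x32 meshes, and the incredibly dumb part is that it can't resolve seams between different LODs, so it resorts to six different meshes to cover every seam case. In other words, even with Draw Instanced enabled, Unity needs 6 draw calls to render a single terrain.
That way of handling seams is many years old, and they're still using it. Absurd!!
Here a single mesh is enough: a 4x4 grid.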
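A minimal standalone sketch of that single patch mesh, assuming a flat grid in the XZ plane whose vertex coordinates run from 0 to 4 (which matches the `rect.zw * 0.25` scaling the vertex shader applies later). `build_grid_mesh` is a hypothetical helper, not part of the demo; triangles are wound clockwise when seen from above, Unity's front-face convention.

```python
def build_grid_mesh(cells=4):
    """Vertices and triangle indices for a flat cells x cells grid in the XZ plane.

    Vertex coordinates run over 0..cells, so a 4x4 grid has 5x5 = 25 vertices
    and 4 * 4 * 2 = 32 triangles (96 indices).
    """
    row = cells + 1
    verts = [(float(x), 0.0, float(z)) for z in range(row) for x in range(row)]
    indices = []
    for z in range(cells):
        for x in range(cells):
            i0 = z * row + x  # (x, z) corner of this cell
            i1 = i0 + 1       # (x + 1, z)
            i2 = i0 + row     # (x, z + 1)
            i3 = i2 + 1       # (x + 1, z + 1)
            # two triangles per cell, clockwise when viewed from +Y
            indices += [i0, i2, i1, i1, i2, i3]
    return verts, indices
```

The index count (96 for a 4x4 grid) is what goes into the first slot of the indirect-arguments buffer.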
Data structures

Next we prepare a data structure for each patch. The nodes form a mipmap-style pyramid; for each node the demo currently uses this struct:
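using Unity.Mathematics; // float4

public struct NodeInfo
{
public float4 rect;    // (xMin, zMin, width, height) of the node in terrain space
public int mip;        // LOD level, 0 = finest
public int neighbor;   // bitmask of neighbors with a coarser LOD (used for seam handling)

public NodeInfo(float4 r, int m)
{
      rect = r;
      mip = m;
      neighbor = 0;
}
}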
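Remember: this struct must have exactly the same layout on the CPU side, in the compute shader and in the rendering shader, because the data is passed between them through a ComputeBuffer.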
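To see what "the same layout" means in bytes, here is a small Python sketch using `struct` that mirrors NodeInfo as a float4 followed by two 32-bit ints. It assumes little-endian data and the tight 4-byte packing that StructuredBuffer elements use (no constant-buffer style 16-byte alignment); the function names are hypothetical.

```python
import struct

# Mirror of NodeInfo: float4 rect (16 bytes) + int mip + int neighbor.
# '<' = little-endian with no padding, '4f' = four float32, '2i' = two int32.
NODE_INFO_FORMAT = "<4f2i"
NODE_INFO_STRIDE = struct.calcsize(NODE_INFO_FORMAT)  # sizeof(float) * 4 + sizeof(int) * 2


def pack_nodes(nodes):
    """Pack (rect, mip, neighbor) tuples back to back, like SetData on a NodeInfo[] array."""
    return b"".join(struct.pack(NODE_INFO_FORMAT, *rect, mip, neighbor) for rect, mip, neighbor in nodes)


def unpack_node(blob, index):
    """Read element `index` the way a StructuredBuffer<NodeInfo> would."""
    values = struct.unpack_from(NODE_INFO_FORMAT, blob, index * NODE_INFO_STRIDE)
    return values[:4], values[4], values[5]
```

The stride here (24 bytes) is the same value the C# code passes to the ComputeBuffer constructor; if a field is added on one side only, every element after the first is read at the wrong offset.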
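Building the data

We extract what GPU Driven rendering needs from Unity's original terrain: the height map, the normal map, the NodeInfo data, and so on.
Height map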
heightmapTex = terrain.terrainData.heightmapTexture;
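Normal map (Unity itself generates it at runtime with a normal-generation shader, and a production project should do the same; in this demo I simply build it on the CPU like this)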
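The normal map stores each unit normal in an 8-bit RGBA texture, so components in [-1, 1] are remapped to [0, 1] on write and back with `rgb * 2 - 1` in the vertex shader. A small Python sketch of that round trip and of the texel ordering `Texture2D.SetPixels` expects (rows bottom to top, each row left to right); the helper names are hypothetical.

```python
def encode_normal(n):
    """Map a unit normal from [-1, 1] to [0, 1] per channel (n * 0.5 + 0.5)."""
    return tuple(c * 0.5 + 0.5 for c in n)


def decode_normal(rgb):
    """Shader-side inverse: rgb * 2 - 1."""
    return tuple(c * 2.0 - 1.0 for c in rgb)


def quantize8(rgb):
    """Simulate RGBA32 storage: each channel becomes one of 256 levels."""
    return tuple(round(c * 255) / 255 for c in rgb)


def texel_index(x, y, width):
    """Index into the flat array passed to SetPixels: row-major, y rows of `width` texels."""
    return y * width + x
```

After 8-bit quantization each decoded component is off by at most about 1/255, which is fine for lighting.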
int w = heightmapTex.width, h = heightmapTex.height;
normalTex = new Texture2D(w, h, TextureFormat.RGBA32, -1, true);
var colors = new Color[w * h];
// SetPixels expects rows bottom-to-top, each row left-to-right: index = y * w + x
for (int y = 0; y < h; y++)
   for (int x = 0; x < w; x++)
   {
      var normal = terrain.terrainData.GetInterpolatedNormal((float)x / (w - 1), (float)y / (h - 1));
      // pack [-1, 1] into [0, 1]; the vertex shader decodes with rgb * 2 - 1
      colors[y * w + x] = new Color(normal.x * 0.5f + 0.5f, normal.y * 0.5f + 0.5f, normal.z * 0.5f + 0.5f);
   }
normalTex.SetPixels(colors);
normalTex.Apply();
NodeInfo
float perSize = 64;
var rect = new Rect(0, 0, terrain.terrainData.size.x, terrain.terrainData.size.z);
pageRoot = new TerrainNodePage(rect);
var children = new List<TerrainNodePage>();
for (var i = rect.xMin; i < rect.xMax; i += perSize)
for (var j = rect.yMin; j < rect.yMax; j += perSize)
{
      children.Add(new TerrainNodePage(new Rect(i, j, perSize, perSize), 3));
}
pageRoot.children = children.ToArray();
public TerrainNodePage(Rect r)
{
this.rect = r;
this.index = -1;
this.mip = -1;
}

public TerrainNodePage(Rect r, int m)
{
      this.rect = r;
      this.mip = m;
      this.Info = new NodeInfo(new float4(r.xMin,r.yMin,r.width,r.height), m);
      this.index = -1;
      if (this.mip > 0)
      {
         children = new TerrainNodePage[4];
         children[0] = new TerrainNodePage(new Rect(r.xMin, r.yMin, r.width / 2, r.height / 2), m - 1);
         children[1] = new TerrainNodePage(new Rect(r.xMin + r.width / 2, r.yMin, r.width / 2, r.height / 2), m - 1);
         children[2] = new TerrainNodePage(new Rect(r.xMin + r.width / 2, r.yMin + r.height / 2, r.width / 2, r.height / 2), m - 1);
         children[3] = new TerrainNodePage(new Rect(r.xMin, r.yMin + r.height / 2, r.width / 2, r.height / 2), m - 1);
      }
}
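The page construction above can be mirrored in a few lines of Python to check the size of the full pyramid. This sketch assumes a hypothetical 1024 x 1024 terrain tiled with the same 64-unit root pages at mip 3, so each root holds 1 + 4 + 16 + 64 = 85 nodes and the finest nodes are 8 units wide; `Node`, `build_node` and `build_roots` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    x: float
    z: float
    size: float
    mip: int
    children: list = field(default_factory=list)


def build_node(x, z, size, mip):
    """Mirror of TerrainNodePage(Rect, mip): split into 4 quadrants until mip 0."""
    node = Node(x, z, size, mip)
    if mip > 0:
        half = size / 2
        # same child order as the C# code: (0,0), (1,0), (1,1), (0,1)
        for dx, dz in ((0, 0), (1, 0), (1, 1), (0, 1)):
            node.children.append(build_node(x + dx * half, z + dz * half, half, mip - 1))
    return node


def build_roots(terrain_size, per_size=64, top_mip=3):
    """Tile the terrain with per_size x per_size root pages, like the loop above."""
    return [build_node(x, z, per_size, top_mip)
            for x in range(0, terrain_size, per_size)
            for z in range(0, terrain_size, per_size)]


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)
```

For the assumed terrain that is 256 roots and 21760 nodes in total, which is why the LOD pass should not touch every node every frame.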
ComputeBuffer
allInstancesPosWSBuffer = new ComputeBuffer(allNodeInfo.Count, sizeof(float) * 4 + sizeof(int) + sizeof(int));
allInstancesPosWSBuffer.SetData(allNodeInfo.ToArray());
visibleInstancesOnlyPosWSIDBuffer = new ComputeBuffer(allNodeInfo.Count, sizeof(uint), ComputeBufferType.Append);

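Computing LOD

Both the Far Cry and Moonlight Blade talks compute LOD in a compute shader. Don't use too many levels, since this is a tree structure; 4 LOD levels are usually enough. In a compute shader you can run the levels in a loop instead of issuing multiple Dispatch calls. In this demo I simply compute LOD on the CPU.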
public void CollectNodeInfo(Vector2 center, List<NodeInfo> allNodeInfo)
{
if (mip >= 0 && (mip == 0 || (center - rect.center).magnitude >= 100 * Mathf.Pow(2, mip)))
{
      this.index = allNodeInfo.Count;
      allNodeInfo.Add(this.Info);
}
else
{
      this.index = -1;
      foreach (var child in children)
      {
         child.CollectNodeInfo(center, allNodeInfo);
      }
}
}
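LOD is usually chosen from the distance to the camera together with how detailed the terrain is in that area; here I only use the distance to the camera.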
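A standalone Python sketch of the same selection rule as CollectNodeInfo: keep a node if it is a leaf or if the camera is at least `100 * 2^mip` away from its center, otherwise recurse into its children. It starts from a single 64-unit root page at mip 3 (the C# root with mip -1 always recurses anyway); `build` and `collect` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    cx: float  # center x
    cz: float  # center z
    size: float
    mip: int
    children: list = field(default_factory=list)


def build(x, z, size, mip):
    node = Node(x + size / 2, z + size / 2, size, mip)
    if mip > 0:
        h = size / 2
        node.children = [build(x + dx * h, z + dz * h, h, mip - 1)
                         for dx, dz in ((0, 0), (1, 0), (1, 1), (0, 1))]
    return node


def collect(node, cam_x, cam_z, out, base_dist=100.0):
    """Keep the node if it is a leaf or far enough for its mip, else descend."""
    dist = math.hypot(cam_x - node.cx, cam_z - node.cz)
    if node.mip == 0 or dist >= base_dist * 2 ** node.mip:
        out.append(node)
    else:
        for child in node.children:
            collect(child, cam_x, cam_z, out, base_dist)
    return out
```

Whatever the camera position, the selected nodes tile the root exactly once, so their areas always sum to 64 x 64.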
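Culling in the compute shader

Frustum culling
Pass all NodeInfo entries to the compute shader, project each node's bounding box into clip space, and cull it if it lies outside the screen.
Occlusion culling
We can also compare depths: compare the Z of the current bounding box against the depth map; if the box lies behind the depth map, it is occluded.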
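One caveat about the visibility test used below: it keeps a box only if at least one of its 8 corners lands inside the clip volume, so a large box that straddles the whole view with every corner outside gets culled by mistake. A common conservative alternative is to cull only when all 8 corners are outside the same clip plane. The Python sketch compares both rules, assuming an OpenGL-style projection like `Camera.projectionMatrix` (clip z in [-w, w]) and a camera at the origin looking down -Z, so the view matrix is the identity; all function names are hypothetical.

```python
import math


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style projection matrix, row-major."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]


def mul(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def box_corners(min_p, max_p):
    return [(x, y, z, 1.0) for x in (min_p[0], max_p[0])
            for y in (min_p[1], max_p[1]) for z in (min_p[2], max_p[2])]


def corner_inside(vp, min_p, max_p):
    """The demo's rule: visible if at least one corner is inside the clip volume."""
    for corner in box_corners(min_p, max_p):
        x, y, z, w = mul(vp, corner)
        if abs(x) <= w and abs(y) <= w and abs(z) <= w:
            return True
    return False


OUTSIDE_TESTS = [
    lambda p: p[0] < -p[3], lambda p: p[0] > p[3],
    lambda p: p[1] < -p[3], lambda p: p[1] > p[3],
    lambda p: p[2] < -p[3], lambda p: p[2] > p[3],
]


def not_all_outside_one_plane(vp, min_p, max_p):
    """Conservative rule: cull only if all 8 corners are outside the same clip plane."""
    clip = [mul(vp, corner) for corner in box_corners(min_p, max_p)]
    return not any(all(outside(p) for p in clip) for outside in OUTSIDE_TESTS)
```

The conservative rule can still keep a few boxes that are actually invisible (near the frustum's edges), but it never drops a visible one.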
[numthreads(64,1,1)]
void CullTerrain (uint3 id : SV_DispatchThreadID)
{
float4 nowRect = _AllInstancesPosWSBuffer[id.x].rect;
float2 minPos = nowRect.xy;
float2 maxPos = nowRect.xy + nowRect.zw;
float4 heights = float4(_HeightMap[minPos],
                        _HeightMap[maxPos],
                        _HeightMap[float2(minPos.x, maxPos.y)],
                        _HeightMap[float2(maxPos.x, minPos.y)]);
float minHeight = _TerrainHeightSize * min(min(heights.x, heights.y), min(heights.z, heights.w));
float maxHeight = _TerrainHeightSize * max(max(heights.x, heights.y), max(heights.z, heights.w));
float4 boundVerts[8];
boundVerts[0] = float4(minPos.x, minHeight, minPos.y, 1);
boundVerts[1] = float4(minPos.x, minHeight, maxPos.y, 1);
boundVerts[2] = float4(maxPos.x, minHeight, minPos.y, 1);
boundVerts[3] = float4(maxPos.x, minHeight, maxPos.y, 1);
boundVerts[4] = float4(minPos.x, maxHeight, minPos.y, 1);
boundVerts[5] = float4(minPos.x, maxHeight, maxPos.y, 1);
boundVerts[6] = float4(maxPos.x, maxHeight, minPos.y, 1);
boundVerts[7] = float4(maxPos.x, maxHeight, maxPos.y, 1);
bool visible = false;
for (int i = 0; i < 8;i++)
{
      float4 posCS = mul(_VPMatrix, boundVerts[i]); // transform corner i, not the whole array
      //posCS.xyz = posCS.xyz / posCS.w;
      float2 hizCoord = _HizSize.xy * 0.25 * (float2(posCS.x / posCS.w, posCS.y / posCS.w) * 0.5 + 0.5);
      float4 absPosCS = abs(posCS);
      if (absPosCS.z <= absPosCS.w && absPosCS.y <= absPosCS.w && absPosCS.x <= absPosCS.w && (1 - _HiZMap.mips[2][hizCoord]) * posCS.w > posCS.z)
         visible = true;
}

if (visible)
      _VisibleInstancesOnlyPosWSIDBuffer.Append(id.x);
}
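I use Hi-Z here. Strictly speaking you should also work out which Hi-Z mip level to sample (from the box's projected screen-space size, or simply from the camera distance); the demo code hard-codes mip 2 (`_HiZMap.mips[2]`), so if you implement this yourself you'll need to change that part.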
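One common way to choose the level: take the box's screen rectangle in mip-0 texels and pick the smallest mip whose texel size is at least the rectangle's larger side, so the rectangle overlaps at most 2x2 texels there. The Python sketch below does that and then tests occlusion. For readability it assumes conventional depth (0 = near, 1 = far) and a pyramid that stores the farthest (maximum) depth per texel; with the demo's reversed-Z pyramid of minimums, the same logic applies with min instead of max and the comparison flipped. Names are hypothetical.

```python
import math


def select_hiz_mip(rect_min, rect_max, mip_count):
    """Smallest mip where the screen rect (in mip-0 texels) spans at most 2x2 texels."""
    w = max(rect_max[0] - rect_min[0], 1.0)
    h = max(rect_max[1] - rect_min[1], 1.0)
    mip = math.ceil(math.log2(max(w, h)))
    return min(max(mip, 0), mip_count - 1)


def is_occluded(hiz, mip, rect_min, rect_max, box_nearest_depth):
    """hiz[mip][y][x] holds the farthest depth of its footprint (0 = near, 1 = far)."""
    level = hiz[mip]
    height, width = len(level), len(level[0])
    scale = 2 ** mip

    def texel(v, size):
        return min(max(int(v // scale), 0), size - 1)

    x0, x1 = texel(rect_min[0], width), texel(rect_max[0], width)
    y0, y1 = texel(rect_min[1], height), texel(rect_max[1], height)
    farthest = max(level[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))
    # occluded only if even the nearest point of the box is behind everything drawn there
    return box_nearest_depth > farthest
```

Sampling every texel the rectangle touches at the chosen level is what keeps the test conservative; a fixed mip either reads too few texels (large boxes) or wastes precision (small ones).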
Building the Hi-Z

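The culling above uses Hi-Z, which is essentially a depth map. Because it needs mipmaps, we have to generate depth maps for each mip level. Don't use Unity's built-in mipmap generation for this; build the mips yourself, because we need the minimum depth of each 2x2 block, whereas Unity's default mip generation filters (averages) the texels. (With Unity's reversed-Z depth, the minimum is the farthest depth, which is the conservative value for occlusion.) A simple compute shader does the job.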
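// Hi-Z reduction: each thread writes one texel of the output level
#pragma kernel CalHiz

// HizTex: output level (the RenderTexture needs enableRandomWrite; bind it with SetComputeTextureParam)
RWTexture2D<float> HizTex;
// DepthTex: input level (the camera depth on the first pass)
Texture2D<float> DepthTex;
// scale from output texel coordinates to input texel coordinates
float uvScale;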

[numthreads(16,8,1)]
void CalHiz(uint3 id : SV_DispatchThreadID)
{
float2 nowUV = id.xy * uvScale;
HizTex[id.xy] = min(min(DepthTex[nowUV], DepthTex[nowUV + float2(1, 0)]), min(DepthTex[nowUV + float2(0, 1)], DepthTex[nowUV + float2(1, 1)]));
}
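What CalHiz computes per level can be checked on the CPU: each output texel is the minimum of the 2x2 input block below it, repeated until a single texel remains. A standalone Python sketch, assuming power-of-two sizes (the demo's 16x8 thread groups also assume sizes that divide evenly); `reduce_min` and `build_hiz` are hypothetical names.

```python
def reduce_min(level):
    """One CalHiz step: each output texel is the min of a 2x2 block of the input."""
    h, w = len(level), len(level[0])
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even (power-of-two) sizes"
    return [[min(level[2 * y][2 * x], level[2 * y][2 * x + 1],
                 level[2 * y + 1][2 * x], level[2 * y + 1][2 * x + 1])
             for x in range(w // 2)]
            for y in range(h // 2)]


def build_hiz(depth):
    """Full min pyramid: level 0 is the input, the last level is 1x1."""
    levels = [depth]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(reduce_min(levels[-1]))
    return levels
```

Because every level is a min over its whole footprint, a texel at any level is never larger than any depth it covers; an averaged mip chain would not give that guarantee.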
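The catch is that generating mips means using the previous level as input and the next level as output, and in Unity, binding one mip level of an RT as input and another level of the same RT as output causes problems. So I use an extra RT and ping-pong between the two.
Also, this has to run inside URP; ideally write your own URP Renderer Feature and Pass. I'll just paste the code here.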
/// <inheritdoc/>
public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
{
if (HizRTFunc != null)
{
      var hizRT = HizRTFunc(context, renderingData.cameraData.camera, depthSize);
      if (hizRT != null)
      {
         var cmd = CommandBufferPool.Get(m_ProfilerTag);
         int width = hizRT.width;
         int height = hizRT.height;
         HizTexTemp = Shader.PropertyToID("_HizTexTemp");       // different mip levels of the same texture can't be input and output at the same time, so mips are written ping-pong style
         for (int i = 0; i < hizRT.mipmapCount; i++)
         {
            if (i % 2 == 1)
            {
                  if (i > 1)
                  {
                     cmd.ReleaseTemporaryRT(HizTexTemp);
                  }
                  cmd.GetTemporaryRT(HizTexTemp, width, height, 0, hizRT.filterMode, hizRT.format, RenderTextureReadWrite.Linear, hizRT.antiAliasing, true);
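                  cmd.SetComputeTextureParam(computeShader, 0, "DepthTex", hizRT, i - 1);  // the mip argument only applies to read-write textures, so the shader ends up reading mip 0 of hizRT
                  cmd.SetComputeFloatParam(computeShader, "uvScale", Mathf.Pow(2,i));     // a 2^i stride into mip 0 compensates; from i = 3 on this samples only 4 of the 2^i x 2^i source texels, so those levels are approximate
                  cmd.SetComputeTextureParam(computeShader, 0, "HizTex", HizTexTemp);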
            }
            else
            {
                  if (i == 0)
                     cmd.SetComputeTextureParam(computeShader, 0, "DepthTex", this.depthTex.Identifier());
                  else
                     cmd.SetComputeTextureParam(computeShader, 0, "DepthTex", HizTexTemp);
                  cmd.SetComputeFloatParam(computeShader, "uvScale", 2);
                  cmd.SetComputeTextureParam(computeShader, 0, "HizTex", hizRT, i);
            }
            cmd.DispatchCompute(computeShader, 0, Mathf.CeilToInt(width / 16f), Mathf.CeilToInt(height / 8f), 1);
            width /= 2;
            height /= 2;

            if (i % 2 == 1)
                  cmd.CopyTexture(HizTexTemp,0,0, hizRT, 0,i);
         }

         cmd.ReleaseTemporaryRT(HizTexTemp);
         context.ExecuteCommandBuffer(cmd);
         CommandBufferPool.Release(cmd);
      }
}
}

DrawTerrain

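Culling has now picked out the visible nodes, so we can pass the resulting ComputeBuffers to the shader and call DrawMeshInstancedIndirect to draw the terrain. The core vertex shader code is below: use the instanceID to fetch the visible index, use that index to look up the patch info in the full list, then transform the vertex.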
float4 rect = _AllInstancesTransformBuffer[_VisibleInstanceOnlyTransformIDBuffer[instanceID]].rect;
float2 posXZ = rect.zw * 0.25 * v.position.xz + rect.xy; // grid vertices span 0..4: scale by node size / 4, then offset to the node origin
VaryingsLean o = (VaryingsLean) 0;

float3 positionWS = TransformObjectToWorld(posXZ.xyy);
float height = UnpackHeightmap(_TerrainHeightmapTexture.Load(int3(positionWS.xz, 0)));
positionWS.y = height * terrainParam.y * 2;
float3 normalWS = _TerrainNormalmapTexture.Load(int3(positionWS.xz, 0)).rgb * 2 - 1;
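The lookup chain in that vertex shader (instanceID, then visible ID, then NodeInfo.rect, then world XZ, then height texel) can be traced with plain lists. A Python sketch assuming, like the demo, that the mesh spans 0..4 and that one heightmap texel corresponds to one world unit; `height_scale` stands in for the demo's height factor and, like the function names, is hypothetical.

```python
def instance_vertex(all_nodes, visible_ids, instance_id, vertex_xz, mesh_cells=4):
    """instanceID -> visible list -> NodeInfo.rect -> world XZ of one mesh vertex."""
    x, z, w, h = all_nodes[visible_ids[instance_id]]  # rect = (xMin, zMin, width, height)
    vx, vz = vertex_xz                                 # mesh vertex, 0..mesh_cells
    return (w / mesh_cells * vx + x, h / mesh_cells * vz + z)


def sample_height(heightmap, wx, wz, height_scale):
    """Load(int3(positionWS.xz, 0)): texel column = world x, texel row = world z."""
    return heightmap[int(wz)][int(wx)] * height_scale
```

Since every instance draws the same mesh, all per-patch variation comes from this indirection, which is why a single draw call can cover every LOD.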

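Seam handling

If meshes of different LODs sit next to each other with no special handling, cracks appear along their shared edges.
During the compute pass we can check, for each of the four neighbors (up, down, left, right), whether its LOD is coarser than the current patch's, and store each result in its own bit. Then, when rendering a patch, we know which edges need seam handling.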
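A Python sketch of filling the neighbor bitmask from a LOD-resolved node list: probe a point just outside the middle of each edge, find the node that contains it, and set the bit if that node's mip is larger. In a quadtree a coarser neighbor always covers the whole shared edge, so one probe per side suffices. It assumes nodes are `(x, z, size, mip)` tuples and that up = +z, down = -z, left = -x, right = +x with bits 1, 2, 4, 8; that orientation and all names are assumptions, not taken from the demo.

```python
UP, DOWN, LEFT, RIGHT = 1, 2, 4, 8  # assumed: up = +z, down = -z, left = -x, right = +x


def find_node(nodes, px, pz):
    """Return the selected node whose rect contains (px, pz), or None."""
    for node in nodes:
        x, z, size, _ = node
        if x <= px < x + size and z <= pz < z + size:
            return node
    return None


def neighbor_mask(node, nodes, eps=1e-3):
    """Set a bit for every side whose neighbor has a coarser (larger) mip."""
    x, z, size, mip = node
    cx, cz = x + size / 2, z + size / 2
    probes = {
        UP: (cx, z + size + eps),
        DOWN: (cx, z - eps),
        LEFT: (x - eps, cz),
        RIGHT: (x + size + eps, cz),
    }
    mask = 0
    for bit, (px, pz) in probes.items():
        other = find_node(nodes, px, pz)
        if other is not None and other[3] > mip:
            mask |= bit
    return mask
```

A linear search is fine for a sketch; a compute shader version would instead look up a per-level LOD texture or grid.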
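The seam fix itself is also fairly simple:
when a neighbor's LOD is greater than the current patch's, just collapse (degenerate) the extra edge vertices.
I paint vertex colors onto the mesh to mark the vertices that need handling: r, g, b and a mark the vertices to collapse on the up, down, left and right edges respectively.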
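A Python sketch of those vertex marks and the collapse on a 4x4 grid, assuming the neighbor differs by at most one LOD level and the same hypothetical bit layout as above (up = +z edge, and so on). Odd vertices on a marked edge slide back by one cell onto an even vertex, which the coarser neighbor also has, so both patches sample the same height there and no crack appears. The shifts are accumulated with `-=` so that a vertex keeps its shift even when the opposite edge's bit is also set.

```python
UP, DOWN, LEFT, RIGHT = 1, 2, 4, 8  # assumed bit layout, matching the neighbor mask


def vertex_color(x, z, cells=4):
    """RGBA mark: 1 on odd vertices of the up (r), down (g), left (b), right (a) edges."""
    r = 1.0 if z == cells and x % 2 == 1 else 0.0
    g = 1.0 if z == 0 and x % 2 == 1 else 0.0
    b = 1.0 if x == 0 and z % 2 == 1 else 0.0
    a = 1.0 if x == cells and z % 2 == 1 else 0.0
    return r, g, b, a


def collapse(x, z, color, neighbor):
    """Port of the vertex shader logic: shift marked vertices back by one cell."""
    r, g, b, a = color
    dx = dz = 0.0
    if neighbor & UP:
        dx -= r
    if neighbor & DOWN:
        dx -= g
    if neighbor & LEFT:
        dz -= b
    if neighbor & RIGHT:
        dz -= a
    return x + dx, z + dz
```

With a coarser neighbor above, the top edge keeps only the x positions 0, 2 and 4, exactly the vertices of the neighbor's edge.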
The vertex shader then handles it easily:
NodeInfoData infoData = _AllInstancesTransformBuffer[_VisibleInstanceOnlyTransformIDBuffer[instanceID]];
float4 rect = infoData.rect;
int neighbor = infoData.neighbor;
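float2 diff = 0;
// accumulate with -= so a patch whose up and down (or left and right) neighbors are both coarser keeps both shifts
if (neighbor & 1)
{
diff.x -= input.color.r;
}
if (neighbor & 2)
{
diff.x -= input.color.g;
}
if (neighbor & 4)
{
diff.y -= input.color.b;
}
if (neighbor & 8)
{
diff.y -= input.color.a;
}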

float2 positionWS = rect.zw * 0.25 * (input.positionOS.xz + diff) + rect.xy; // grid vertices span 0..4: scale by node size / 4, then offset to the node origin
VertexPositionInputs vertexInput;
vertexInput.positionWS = TransformObjectToWorld(positionWS.xyy);
float height = UnpackHeightmap(_TerrainHeightmapTexture.Load(int3(vertexInput.positionWS.xz , 0)));
float3 normalWS = _TerrainNormalmapTexture.Load(int3(vertexInput.positionWS.xz, 0)).rgb * 2 - 1;
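[Figure: the resulting terrain]
Shadows

If you do nothing special and just call Graphics.DrawMeshInstancedIndirect, the shadow map pass only renders what was culled for the main camera, so anything outside the camera's view can't cast shadows.
So we write our own URP pass that, after opaques are rendered, issues the main camera's terrain draw through a CommandBuffer.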
public class GPUTerrainPass : ScriptableRenderPass
{
public static Action<ScriptableRenderContext,Camera> ExecuteAction;
public GPUTerrainPass()
{
      this.renderPassEvent = RenderPassEvent.AfterRenderingOpaques;
}
/// <inheritdoc/>
public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
{
      if (HizBehaviour.Instance?.hizRT == null)
      {
         return;
      }

      ExecuteAction?.Invoke(context, renderingData.cameraData.camera);
}
}

void Render(ScriptableRenderContext context, Camera cam)
{
var cmd = CommandBufferPool.Get(m_ProfilerTag);
if (DebugMode < 0 || (DebugMode == 0 && cam == Camera.main))
{
      var hizRT = HizBehaviour.Instance.hizRT;
      cmd.SetComputeTextureParam(cullingComputeShader, cullTerrainKernel, "_HiZMap", hizRT);
      cmd.SetComputeVectorParam(cullingComputeShader, "_HizSize", new Vector4(hizRT.width, hizRT.height, 0, 0));
      Matrix4x4 v = cam.worldToCameraMatrix;
      Matrix4x4 p = cam.projectionMatrix;
      Matrix4x4 vp = p * v;
      cmd.SetComputeBufferCounterValue(visibleInstancesOnlyPosWSIDBuffer, 0);
      cmd.SetComputeMatrixParam(cullingComputeShader, "_VPMatrix", vp);
      cmd.DispatchCompute(cullingComputeShader, cullTerrainKernel, Mathf.CeilToInt(allNodeInfo.Count / 64f), 1, 1);
      cmd.CopyCounterValue(visibleInstancesOnlyPosWSIDBuffer, argsBuffer, 4);
}

cmd.DrawMeshInstancedIndirect(instanceMesh, 0, mat, 0, argsBuffer);
context.ExecuteCommandBuffer(cmd);
CommandBufferPool.Release(cmd);
}
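`cmd.CopyCounterValue(visibleInstancesOnlyPosWSIDBuffer, argsBuffer, 4)` works because the arguments buffer for DrawMeshInstancedIndirect holds five 32-bit values (index count per instance, instance count, start index, base vertex, start instance), so byte offset 4 is the instance count. A Python sketch of that layout; the 96 indices assume the 4x4 grid mesh, and the function names are hypothetical.

```python
import struct


def make_args(index_count_per_instance, instance_count=0, start_index=0,
              base_vertex=0, start_instance=0):
    """Five uints read by DrawMeshInstancedIndirect, little-endian."""
    return bytearray(struct.pack("<5I", index_count_per_instance, instance_count,
                                 start_index, base_vertex, start_instance))


def copy_counter_value(args, counter, dst_offset_bytes):
    """What CopyCounterValue does: write the append buffer's counter at a byte offset."""
    struct.pack_into("<I", args, dst_offset_bytes, counter)


def read_args(args):
    return struct.unpack("<5I", bytes(args))
```

The CPU never learns how many patches survived culling; the count goes straight from the append buffer into the draw arguments on the GPU.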
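Shadows take a bit more work, because cascades have to be taken into account.
I declared an Action in ShadowUtils and invoke that delegate while rendering each shadow slice, passing out the data we need.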
public static Action<CommandBuffer, Matrix4x4,Vector4, VisibleLight,int> CustomRenderShadowSlice;
public static void RenderShadowSlice(CommandBuffer cmd, ref ScriptableRenderContext context,
ref ShadowSliceData shadowSliceData, ref ShadowDrawingSettings settings,
Matrix4x4 proj, Matrix4x4 view,Vector4 shadowBias, VisibleLight shadowLight,int cascadeIndex)
{
cmd.SetViewport(new Rect(shadowSliceData.offsetX, shadowSliceData.offsetY, shadowSliceData.resolution, shadowSliceData.resolution));
cmd.EnableScissorRect(new Rect(shadowSliceData.offsetX + 4, shadowSliceData.offsetY + 4, shadowSliceData.resolution - 8, shadowSliceData.resolution - 8));

cmd.SetViewProjectionMatrices(view, proj);
context.ExecuteCommandBuffer(cmd);
cmd.Clear();
context.DrawShadows(ref settings);
CustomRenderShadowSlice?.Invoke(cmd, GL.GetGPUProjectionMatrix(proj, true) * view, shadowBias, shadowLight, cascadeIndex);
cmd.DisableScissorRect();
context.ExecuteCommandBuffer(cmd);
cmd.Clear();
}
public void RenderShadowmap(CommandBuffer cmd, Matrix4x4 shadowTransform,Vector4 shadowBias, VisibleLight shadowLight,int cascadeIndex)
{
if (DebugMode < 0 || DebugMode == cascadeIndex + 1)
{
      cmd.SetComputeBufferCounterValue(visibleInstancesOnlyPosWSIDBuffer, 0);
      cmd.SetComputeMatrixParam(cullingComputeShader, "_VPMatrix", shadowTransform);
      cmd.SetComputeVectorParam(cullingComputeShader, "_ShadowBias", shadowBias);
      Vector3 lightDirection = -shadowLight.localToWorldMatrix.GetColumn(2);
      cmd.SetComputeVectorParam(cullingComputeShader, "_LightDirection", lightDirection);
      cmd.DispatchCompute(cullingComputeShader, cullTerrainShadowKernel, Mathf.CeilToInt(allNodeInfo.Count / 64f), 1, 1);
      if (DebugMode == cascadeIndex + 1)
      {
         cmd.CopyCounterValue(visibleInstancesOnlyPosWSIDBuffer, argsBuffer, 4);
         return;
      }

      cmd.CopyCounterValue(visibleInstancesOnlyPosWSIDBuffer, shadowBuffer, 4);
      cmd.DrawMeshInstancedIndirect(instanceMesh, 0, mat, 1, shadowBuffer);

}
}
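RenderShadowmap runs once per cascade: reset the append counter, cull every node against that cascade's matrix, then copy the resulting count into the draw arguments. A Python sketch of that loop with two nested orthographic cascades, assuming OpenGL-style orthographic matrices (w stays 1) and a light space equal to world space with the light looking down -Z; the cascade extents and all names are hypothetical.

```python
def ortho(left, right, bottom, top, near, far):
    """OpenGL-style orthographic projection, row-major."""
    return [
        [2 / (right - left), 0, 0, -(right + left) / (right - left)],
        [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
        [0, 0, -2 / (far - near), -(far + near) / (far - near)],
        [0, 0, 0, 1],
    ]


def mul(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


OUTSIDE_TESTS = [
    lambda p: p[0] < -p[3], lambda p: p[0] > p[3],
    lambda p: p[1] < -p[3], lambda p: p[1] > p[3],
    lambda p: p[2] < -p[3], lambda p: p[2] > p[3],
]


def cull_for_cascade(vp, boxes):
    """One cascade: reset the append list, keep boxes not fully outside one plane."""
    visible = []  # SetComputeBufferCounterValue(..., 0)
    for i, (mn, mx) in enumerate(boxes):
        corners = [mul(vp, (x, y, z, 1.0)) for x in (mn[0], mx[0])
                   for y in (mn[1], mx[1]) for z in (mn[2], mx[2])]
        if not any(all(outside(p) for p in corners) for outside in OUTSIDE_TESTS):
            visible.append(i)  # Append(id.x)
    return visible  # len(visible) is what CopyCounterValue writes into the args
```

Because each cascade gets its own cull, patches outside the main camera's view still land in the shadow map, which is the point of this extra pass.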

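Demo

The first half of the video demonstrates patch culling. The second half shows rendering with 4 shadow cascades; you can see that in Unity each later cascade essentially has to cover all the objects the previous cascade should render.
Demo project:
This is an early demo project, only a technical demonstration. It probably has plenty of problems and I won't maintain it; it's offered purely as a reference.

One last plug: ByteDance Games in Beijing is urgently hiring lots of TAs, both full-time and interns (scan the QR code or send me a private message).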

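123456835 · Posted 2021-3-3 09:35

So, is this triangle-level culling?

银鲜目江探 · Posted 2021-3-3 09:41

It's patch-level. Though the CullTerrain part left me a bit puzzled; both the bounding box computation and the visibility test are rather odd... I guess it's fine as long as it gets the idea across.

123456848 · Posted 2021-3-3 09:43

[Cheers]

普通人物怨 · Posted 2021-3-3 09:46

The write-up is neat and nicely presented.

惜颜705 · Posted 2021-3-3 09:53

Nicely written! Though if Hi-Z always uses the second mip level, the culling probably isn't "conservative" anymore.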
