Performance Evaluation of SLua, Tolua, XLua and ILRuntime in Unity

zt3ff3n posted on 2023-1-17 08:31

Unity Script Performance Evaluation

This article benchmarks four scripting plugins, SLua, Tolua, XLua and ILRuntime, in order to choose a scripting solution for a framework.
Project for this article: https://github.com/cateatcatx/UnityScriptPTest
tolua: https://github.com/topameng/tolua
slua: https://github.com/pangweiwei/slua
xlua: https://github.com/Tencent/xLua
ILRuntime: https://github.com/Ourpalm/ILRuntime

Versions tested

Unity 5.6.0p3
SLua 1.3.2
Tolua# (GitHub snapshot, 2017/4/25 19:08:37)
XLua 2.1.7
ILRuntime 1.11
Lua: luajit-2.1.0-beta2

Test environment

Smartisan T2, Win7 (64-bit)

Experiment design

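A total of 17 tests were designed to examine script performance from the following three angles, in both JIT and non-JIT modes. Each result is the average of 10 runs, and times are in milliseconds. The causes are analyzed briefly from the measurements: the Lua plugins are compared against each other, while ILRuntime is considered separately, since C# and Lua are quite different languages.
1. Mono -> Script: Mono calls into the script
2. Script -> Mono: the script calls into Mono
3. Script itself: the execution speed of the script on its own

Mono -> Script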

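| Plugin | Test11 | Test12 | Test13 | Test14 | Test15 | Test16 | Sum |
|---|---|---|---|---|---|---|---|
| Tolua | 186 | 449 | 598 | 407 | 577 | 759 | 2978 |
| SLua | 315 | 751 | 901 | 757 | 1253 | 1883 | 5863 |
| XLua | 145 | 924 | 1010 | 573 | 1507 | 1929 | 6091 |
| ILRuntime | 711 | 422 | 368 | 379 | 397 | 393 | 2672 |
| Tolua(JIT) | 168 | 454 | 592 | 416 | 578 | 826 | 3037 |
| SLua(JIT) | 384 | 842 | 956 | 824 | 1328 | 3439 | 7775 |
| XLua(JIT) | 189 | 957 | 1047 | 608 | 1540 | 1700 | 6043 |
| ILRuntime(JIT) | 1232 | 892 | 903 | 969 | 1175 | 1102 | 6275 |

Lua analysis: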
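```lua
-- Test11
function EmptyFunc()
    _V0 = _V0 + 1
end

_V0 = 1                     -- Test12
_V1 = "12345"               -- Test13
_V2 = GameObject.New()      -- Test14
_V3 = Vector3.New(1, 2, 3)  -- Test15
_V4 = {1, 2, 3}             -- Test16
```

In Test11 Mono calls an essentially empty function in the script; in Test12 to Test16 Mono reads variables from the script. The ILRuntime code is analogous and accesses a class's static function and static variables. ILRuntime is the fastest overall because no variable type conversion is needed. In JIT mode the values are read through Mono reflection, which is slower; the ILRuntime interpreter may cache type and variable information internally, which would explain why it is much faster than reflection. Among the Lua plugins, Tolua clearly has the best overall performance: in interpreted mode it is the fastest in every test except Test11, where XLua is slightly ahead.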
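Comparing the implementations of Tolua and SLua shows that Tolua tries to minimize the number of calls into C++, because communication between C# and C++ carries overhead (parameter marshaling and so on). Although the communication is nominally between Mono and Lua, a C++ layer sits in between, so the main way to optimize Mono-Lua communication is to reduce the number of C++ calls. The measurements show that this optimization pays off clearly in Tolua.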
```csharp
// SLua function call
public bool pcall(int nArgs, int errfunc)
{
    if (!state.isMainThread())
    {
        Logger.LogError("Can't call lua function in bg thread");
        return false;
    }

    LuaDLL.lua_getref(L, valueref);
    if (!LuaDLL.lua_isfunction(L, -1))
    {
        LuaDLL.lua_pop(L, 1);
        throw new Exception("Call invalid function.");
    }

    LuaDLL.lua_insert(L, -nArgs - 1);
    if (LuaDLL.lua_pcall(L, nArgs, -1, errfunc) != 0)
    {
        LuaDLL.lua_pop(L, 1);
        return false;
    }
    return true;
}

// Tolua function call
public void Call()
{
    BeginPCall();
    PCall();
    EndPCall();
}

public int BeginPCall(int reference)
{
    return LuaDLL.tolua_beginpcall(L, reference);
}

public int LuaPCall(int nArgs, int nResults, int errfunc)
{
    return LuaDLL.lua_pcall(L, nArgs, nResults, errfunc);
}

public void LuaSetTop(int newTop)
{
    LuaDLL.lua_settop(L, newTop);
}
```

Comparing the function-call paths of SLua and Tolua, SLua makes roughly twice as many C++ calls as Tolua, which accounts for the performance gap. Readers can compare the variable-read paths themselves. In short, Tolua optimizes by reducing the number of C++ calls, and any further optimization of the Lua integration should follow the same approach.
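The "fewer boundary crossings" idea can be illustrated without Unity at all. The following is only a sketch in plain Lua: `bridge`, `get_x`, `get_y`, `get_z` and `get_xyz` are hypothetical names that exist in none of the plugins, and the counter merely stands in for the cost of one call across the managed/native boundary. It compares a fine-grained interface (one crossing per value) with a coarse-grained one (one crossing for all three values).

```lua
-- Hypothetical mock of a script/native bridge; none of these names exist in
-- SLua, Tolua or XLua. Every "bridge" function bumps a counter so that the
-- number of simulated boundary crossings can be compared.
local crossings = 0
local position = { x = 1, y = 2, z = 3 }

local bridge = {}

-- Fine-grained interface: one crossing per component.
function bridge.get_x() crossings = crossings + 1; return position.x end
function bridge.get_y() crossings = crossings + 1; return position.y end
function bridge.get_z() crossings = crossings + 1; return position.z end

-- Coarse-grained interface: a single crossing returns all three components.
function bridge.get_xyz()
    crossings = crossings + 1
    return position.x, position.y, position.z
end

-- Runs `fn` `iterations` times and returns how many crossings it caused.
local function count_crossings(fn, iterations)
    crossings = 0
    for _ = 1, iterations do
        fn()
    end
    return crossings
end

local fine = count_crossings(function()
    local x, y, z = bridge.get_x(), bridge.get_y(), bridge.get_z()
end, 1000)

local coarse = count_crossings(function()
    local x, y, z = bridge.get_xyz()
end, 1000)

print(fine, coarse)  -- 3000 crossings versus 1000
```

The work done is identical in both cases; only the number of crossings differs, which is the quantity the Tolua design tries to keep small.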
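ILRuntime analysis:
The experiment shows that in interpreted mode ILRuntime reads variables markedly faster than the Lua plugins, mainly because both sides are C# objects and no type conversion is needed. ILRuntime's JIT mode actually runs on Mono's JIT, yet it is slower than interpreted mode. My guess is that the JIT-mode code path mainly uses reflection to call functions and read variables, whereas the ILRuntime interpreter may keep an internal cache (I have not read the implementation, so this is pure speculation).

Script -> Mono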

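| Plugin | Test0 | Test1 | Test2 | Test3 | Test4 | Test5 | Test6 | Test7 | Test10 | Sum |
|---|---|---|---|---|---|---|---|---|---|---|
| Tolua | 675 | 797 | 1410 | 354 | 839 | 343 | 407 | 1681 | 915 | 7426 |
| SLua | 768 | 640 | 2305 | 466 | 1110 | 408 | 394 | 3379 | 1299 | 10774 |
| XLua | 590 | 648 | 8504 | 450 | 775 | 1213 | 695 | 1267 | 845 | 14989 |
| ILRuntime | 1152 | 1054 | 4012 | 315 | 998 | 437 | 1026 | 3272 | 1434 | 13703 |
| Tolua(JIT) | 612 | 701 | 7.4 | 357 | 823 | 318 | 37 | 128 | 884 | 3871 |
| SLua(JIT) | 732 | 679 | 5.4 | 465 | 1197 | 411 | 101 | 65 | 1307 | 4974 |
| XLua(JIT) | 636 | 668 | 8312 | 438 | 767 | 1303 | 734 | 1340 | 911 | 15113 |
| ILRuntime(JIT) | 72 | 197 | 84 | 260 | 402 | 33 | 219 | 303 | 77 | 1651 |

Lua analysis: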
```lua
function Test0(transform)
    local t = os.clock()
    for i = 1, 200000 do
        transform.position = transform.position
    end
    return os.clock() - t
end

function Test1(transform)
    local t = os.clock()
    for i = 1, 200000 do
        transform:Rotate(up, 1)
    end
    return os.clock() - t
end

function Test2()
    local t = os.clock()
    for i = 1, 2000000 do
        local v = Vector3.New(i, i, i)
        local x, y, z = v.x, v.y, v.z
    end
    return os.clock() - t
end

function Test3()
    local t = os.clock()
    for i = 1, 20000 do
        GameObject.New()
    end
    return os.clock() - t
end

function Test4()
    local t = os.clock()
    local tp = typeof(SkinnedMeshRenderer)
    for i = 1, 20000 do
        local go = GameObject.New()
        go:AddComponent(tp)
        local c = go:GetComponent(tp)
        c.receiveShadows = false
    end
    return os.clock() - t
end

function Test5()
    local t = os.clock()
    for i = 1, 200000 do
        local p = Input.mousePosition
        --Physics.RayCast
    end
    return os.clock() - t
end

function Test6()
    local Vector3 = Vector3
    local t = os.clock()
    for i = 1, 200000 do
        local v = Vector3.New(i, i, i)
        Vector3.Normalize(v)
    end
    return os.clock() - t
end

function Test7()
    local Quaternion = Quaternion
    local t = os.clock()
    for i = 1, 200000 do
        local q1 = Quaternion.Euler(i, i, i)
        local q2 = Quaternion.Euler(i * 2, i * 2, i * 2)
        Quaternion.Slerp(Quaternion.identity, q1, 0.5)
    end
    return os.clock() - t
end

function Test10(trans)
    local t = os.clock()
    for i = 1, 200000 do
        UserClass.TestFunc1(1, "123", trans.position, trans)
    end
    return os.clock() - t
end
```

Tolua again wins on overall performance. XLua is far slower than SLua and Tolua in Test2 (roughly 4 to 6 times slower when interpreted, and by about three orders of magnitude under JIT). The main reason is that Tolua and SLua reimplement Unity value types in Lua: such a value is a table in SLua and Tolua but a userdata in XLua. SLua and Tolua therefore never interact with Unity while running Test2, whereas XLua has to cross into Unity constantly, which is costly. The JIT results show the same thing: the JIT cannot compile calls to C functions, so Test2 speeds up dramatically under JIT only where it runs entirely in Lua.

For object types the three Lua plugins use essentially the same mechanism and differ only in internal implementation details, which are beyond the scope of this article; judging by the measurements, Tolua's implementation is the most efficient. The article 用好lua+unity，让性能飞起来——lua与c#交互篇 ("Use Lua + Unity well and make performance fly: Lua and C# interop") explains the principles of C#-Lua interaction in great detail. The plugins have improved since it was written, but the core ideas remain the same.
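To make the table-versus-userdata distinction discussed above concrete, here is a minimal sketch in plain Lua. It is not the plugins' real code: `TableVector3`, `ProxyVector3` and `bridge_calls` are hypothetical, and the proxy only imitates a userdata by routing construction and every field read through a function that stands in for a call into C#.

```lua
-- Sketch only: hypothetical stand-ins for the two representations.
-- `bridge_calls` counts simulated script -> C# crossings.
local bridge_calls = 0

-- Representation 1: the value type is an ordinary Lua table with a metatable.
local TableVector3 = {}
TableVector3.__index = TableVector3

function TableVector3.New(x, y, z)
    return setmetatable({ x = x, y = y, z = z }, TableVector3)
end

-- Representation 2: a proxy that mimics userdata. Construction and every
-- field read go through a function that stands in for a call into C#.
local ProxyVector3 = {}

function ProxyVector3.New(x, y, z)
    bridge_calls = bridge_calls + 1          -- simulated constructor call
    local native = { x = x, y = y, z = z }   -- pretend this lives in C#
    return setmetatable({}, {
        __index = function(_, key)
            bridge_calls = bridge_calls + 1  -- simulated property getter
            return native[key]
        end,
    })
end

-- Same loop body as Test2, parameterised over the vector implementation.
local function run_test2(Vector3, iterations)
    bridge_calls = 0
    local sum = 0
    for i = 1, iterations do
        local v = Vector3.New(i, i, i)
        local x, y, z = v.x, v.y, v.z
        sum = sum + x + y + z
    end
    return bridge_calls, sum
end

local table_calls, table_sum = run_test2(TableVector3, 1000)
local proxy_calls, proxy_sum = run_test2(ProxyVector3, 1000)
print(table_calls, proxy_calls)  -- 0 crossings versus 4000
```

Both variants compute the same sum, but the table version never leaves Lua, while the proxy version crosses the simulated boundary four times per iteration (one construction plus three reads).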
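ILRuntime analysis:
The data shows that the ILRuntime interpreter is quite efficient and not much slower than Lua, but its handling of Unity value types such as Vector3 lags well behind (mainly because in SLua and Tolua a Unity value type is just a table, so there is no interaction with Unity at all). ILRuntime is still a promising hot-update solution for Unity, since developing in C# with Visual Studio is considerably more productive than developing in Lua. The JIT figures come from the Mono layer and are no different from ordinary C# code, so they are excluded from the comparison.

Script itself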

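| Plugin | Test8 | Test9 | Sum |
|---|---|---|---|
| Tolua | 254 | 4246 | 4500 |
| SLua | 255 | 4766 | 5022 |
| XLua | 311 | 4506 | 4817 |
| ILRuntime | 852 | 79048 | 79900 |
| Tolua(JIT) | 46 | 371 | 417 |
| SLua(JIT) | 48 | 414 | 463 |
| XLua(JIT) | 40 | 469 | 510 |
| ILRuntime(JIT) | 222 | 313 | 536 |

```lua
function Test8()
    local total = 0
    local t = os.clock()
    for i = 0, 1000000, 1 do
        total = total + i - (i/2) * (i + 3) / (i + 5)
    end
    return os.clock() - t
end

function Test9()
    local array = {}
    for i = 1, 1024 do
        array[i] = i
    end
    local total = 0
    local t = os.clock()
    for j = 1, 100000 do
        for i = 1, 1024 do
            total = total + array[i]
        end
    end
    return os.clock() - t
end
```

All the Lua plugins use the same LuaJIT 2.1.0-beta2 build, so script-only performance should in theory be identical, and the data is indeed close. The results mainly show that the Lua interpreter is markedly faster than the ILRuntime interpreter (the language implementations differ, so this is a loose comparison; Lua is, after all, written in C). LuaJIT's JIT also improves performance by roughly an order of magnitude (about 5 to 11 times in these two tests). LuaJIT has many pitfalls, but used well it is a powerful optimization tool.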
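The test functions above each return their elapsed time in seconds from `os.clock()`. Averaging 10 runs and reporting milliseconds, as described in the experiment design, could be done roughly as follows. This is a sketch under assumptions: `average_ms` and `arithmetic_test` are hypothetical helpers and not part of the UnityScriptPTest project, and `arithmetic_test` uses the Test8 loop body with a smaller iteration count so that it finishes quickly.

```lua
-- Hypothetical helper, not taken from the UnityScriptPTest project.
-- `test` must return its elapsed time in seconds, as Test8 and Test9 do.
local function average_ms(test, runs, ...)
    local total_seconds = 0
    for _ = 1, runs do
        total_seconds = total_seconds + test(...)
    end
    return total_seconds / runs * 1000
end

-- Same arithmetic as Test8, with fewer iterations to keep the demo short.
local function arithmetic_test()
    local total = 0
    local t = os.clock()
    for i = 0, 100000 do
        total = total + i - (i / 2) * (i + 3) / (i + 5)
    end
    return os.clock() - t
end

print(string.format("%.2f ms", average_ms(arithmetic_test, 10)))
```

Extra arguments are forwarded to the test, so a function such as Test0 could be measured with `average_ms(Test0, 10, transform)` inside Unity. Note that `os.clock()` reports CPU time used by the process, not wall-clock time.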
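Summary

Overall, Tolua is currently the better-performing Lua solution for Unity. A follow-up will analyze Tolua's internal implementation in more depth in order to optimize performance further.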


Unity开发者联盟 (Unity Developers Alliance) Archiver
