The output is as follows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:08.0 Off |                    0 |
| N/A   35C    P0    21W /  75W |      0MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
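+-----------------------------------------------------------------------------+
If this information shows up, the driver is installed OK. So how do we read it? Taking the blank line in the middle as a divider, the whole output can be split into three parts. Let's start with the first part, which is the first row: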
At this point we can run the nvidia-smi command again to take a look:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            On   | 00000000:00:08.0 Off |                    0 |
| N/A   69C    P0    66W /  75W |   6941MiB /  7680MiB |     90%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    309535      C   python                           6939MiB |
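+-----------------------------------------------------------------------------+
Now the process information in the third part is no longer empty: a process with PID 309535 is using the GPU. Its process name is python, and it occupies 6939MiB of GPU memory. The total Memory-Usage is 6941MiB, so the two are almost identical. GPU utilization (GPU-Util) has already reached 90%.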
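If you would rather check these numbers from a script than read the table by eye, nvidia-smi can also print selected fields as CSV. Below is a minimal sketch, assuming the `--query-gpu` and `--format=csv,noheader,nounits` options behave as documented on your driver version. The helper names `parse_gpu_csv` and `query_gpus` are my own, not part of any library, and the sample line is just the values from the table above written in CSV form.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text):
    """Parse the CSV text produced by the query in query_gpus().

    Assumes every field is reported as a plain number; some GPUs report
    "[N/A]" for certain fields, which this sketch does not handle.
    """
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        index, name, used, total, util = (field.strip() for field in record)
        rows.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample line mirroring the table above; on a machine
    # with a GPU, call query_gpus() instead.
    sample = "0, Tesla P4, 6941, 7680, 90\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu["name"], gpu["memory_used_mib"], "/", gpu["memory_total_mib"],
              "MiB,", gpu["utilization_pct"], "% util")
```

Keeping the parsing separate from the subprocess call means the parsing part can be tried on a machine without a GPU.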
In the end it ran for 75 minutes on my cloud server... oh well.
Model prediction
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0
and cpu! (when checking argument for argument index in method wrapper__index_select)
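This message is PyTorch's way of saying that one operation received tensors living on different devices. With `index_select` in the message, a likely cause is an embedding lookup where the model weights sit on cuda:0 while the input ids were created on the CPU (or the other way round). Here is a minimal sketch of the usual remedy, moving the model and the inputs to the same device before calling the model. The tiny `nn.Embedding` model and the `token_ids` tensor are hypothetical stand-ins, not the model from this article.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the trained model: a single embedding layer,
# whose lookup is the kind of operation named in the error message.
model = nn.Embedding(num_embeddings=100, embedding_dim=8).to(device)
model.eval()

# Tensors are created on the CPU by default ...
token_ids = torch.tensor([[1, 5, 42]])
# ... so move them to the same device as the model before the forward pass.
token_ids = token_ids.to(device)

with torch.no_grad():
    output = model(token_ids)

print(output.shape, output.device)
```

If the error persists after this, it is worth checking every tensor that is passed into the model (masks, position ids and so on), since each of them needs the same `.to(device)` treatment.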