1. Symptom
A Windows 10 machine has two graphics cards installed, an RTX4000 and a P2000. Running nvidia-smi prints the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.94 Driver Version: 516.94 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P2000 WDDM | 00000000:02:00.0 Off | N/A |
| 57% 56C P0 25W / 75W | 289MiB / 5120MiB | 18% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 4000 WDDM | 00000000:04:00.0 Off | N/A |
| 30% 45C P0 60W / 125W | 3442MiB / 8192MiB | 94% Default |
+-------------------------------+----------------------+----------------------+
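So the P2000 has GPU Id 0 and the RTX4000 has Id 1. When software A is told to run on the RTX4000, it crashes.
2. Analysis
2.1. Changing the driver
I recalled that driver version 516 was not the one intended for the RTX series, so I switched to 456.71, the latest one: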
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.71 Driver Version: 456.71 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P2000 WDDM | 00000000:02:00.0 Off | N/A |
| 57% 56C P0 25W / 75W | 289MiB / 5120MiB | 18% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 4000 WDDM | 00000000:04:00.0 Off | N/A |
| 30% 45C P0 60W / 125W | 3442MiB / 8192MiB | 94% Default |
+-------------------------------+----------------------+----------------------+
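After installing it and rebooting, the problem remained.
2.2. Inspecting the code
Reading the software's code showed that it has a built-in watchdog: if the main thread blocks unexpectedly while decoding images with CUDA or processing them, the program exits after a 10-second timeout. With detailed logging added, I saw that too many NAL packets from FFMPEG were piling up during decoding, which means decoding was slow.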
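The watchdog behaviour described above can be sketched roughly as follows. This is not software A's actual code: the `Watchdog` class, its `feed()` method, and the `on_timeout` callback are hypothetical names chosen for illustration, and the only detail taken from the text is the idea of exiting when the main thread stays blocked longer than a timeout (10 seconds in software A).

```python
import threading
import time


class Watchdog:
    """Calls on_timeout once if feed() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout  # hypothetical hook, e.g. log and exit
        self._last_feed = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._last_feed = time.monotonic()
        self._thread.start()

    def feed(self):
        # The main thread calls this after each decoded or processed frame.
        self._last_feed = time.monotonic()

    def stop(self):
        # Call from the main thread, not from inside on_timeout.
        self._stop.set()
        self._thread.join()

    def _run(self):
        poll = min(0.05, self.timeout_s / 4)
        # Event.wait returns False on timeout, True once stop() was called.
        while not self._stop.wait(poll):
            if time.monotonic() - self._last_feed > self.timeout_s:
                self.on_timeout()
                return
```

In a setup like the one described, the main loop would call `feed()` after each frame, so a decoder that falls behind or blocks for longer than the timeout triggers the callback, which would explain a sudden exit without any error from the decoder itself.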
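The GPU performance data in Task Manager, GPU-Z, and nvidia-smi showed nothing conclusive. Meanwhile, the same software A works fine on a Windows 7 machine with an RTX4000 and a P5000.
2.3. Cross-checking with FFMPEG
To rule out a problem in software A itself, I switched to testing with FFMPEG. Following my notes on common FFMPEG operations, I used the CUDA decoder and told it to use the GPU with Id 0: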
ffmpeg -gpu 0 -vcodec hevc_cuvid -rtsp_transport tcp -i "rtsp://192.168.8.168" -f null null
Checking nvidia-smi again, the FFMPEG process turned out to be running on GPU 1, as shown below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.71 Driver Version: 456.71 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P2000 WDDM | 00000000:02:00.0 Off | N/A |
| 49% 44C P0 17W / 75W | 278MiB / 5120MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 4000 WDDM | 00000000:04:00.0 Off | N/A |
| 30% 42C P0 36W / 125W | 629MiB / 8192MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1980 C+G ...bbwe\Microsoft.Photos.exe N/A |
| 0 N/A N/A 2424 C+G Insufficient Permissions N/A |
| 0 N/A N/A 12628 C+G Insufficient Permissions N/A |
| 0 N/A N/A 13812 C+G ...5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 15248 C+G ...2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 18832 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 22344 C+G ...5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 26420 C+G C:\Windows\explorer.exe N/A |
| 1 N/A N/A 1600 C+G Insufficient Permissions N/A |
| 1 N/A N/A 1608 C+G Insufficient Permissions N/A |
| 1 N/A N/A 20464 C ...tsp_server\bin\ffmpeg.exe N/A |
+-----------------------------------------------------------------------------+
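Likewise, with -gpu 1 the process shows up on GPU 0, which is strange. Software A behaves the same way: printing the name of the GPU with Id 0 via cuDeviceGetName() returns RTX4000. I tried the nvidia-smi binaries found in several locations on the machine, and they all gave the same result.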
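The comparison made above by eye can also be scripted. The sketch below assumes the nvidia-smi side is collected with `nvidia-smi --query-gpu=index,name --format=csv,noheader`, and that the CUDA side is a list of names indexed by CUDA ordinal, for example gathered with cuDeviceGetName(). The function names `parse_smi_csv` and `find_mismatches` are hypothetical, and the sample data simply mirrors the two cards seen on this machine.

```python
import csv
import io


def parse_smi_csv(text):
    """Parse 'index, name' CSV lines into {index: name}."""
    gpus = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        gpus[int(row[0])] = row[1].strip()
    return gpus


def find_mismatches(smi_names, cuda_names):
    """Return (index, smi_name, cuda_name) for every index where the two disagree."""
    mismatches = []
    for idx, cuda_name in enumerate(cuda_names):
        smi_name = smi_names.get(idx)
        if smi_name != cuda_name:
            mismatches.append((idx, smi_name, cuda_name))
    return mismatches


if __name__ == "__main__":
    # Illustrative data only: nvidia-smi order vs. the order CUDA reported here.
    smi = parse_smi_csv("0, Quadro P2000\n1, Quadro RTX 4000\n")
    cuda = ["Quadro RTX 4000", "Quadro P2000"]
    for idx, smi_name, cuda_name in find_mismatches(smi, cuda):
        print(f"Id {idx}: nvidia-smi={smi_name!r}, CUDA={cuda_name!r}")
```

An empty result means the two tools agree on every index; any entry in the list is an index that must not be passed from one tool to the other unchanged.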
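3. Cause
The GPU Ids and names returned by nvidia-smi are exactly the reverse of those obtained through the CUDA-based NVENCEC API: 0 becomes 1 and 1 becomes 0. The GPU order shown in Task Manager matches nvidia-smi, so the two enumeration schemes do not agree with each other.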
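One documented reason for this kind of swap is that CUDA by default enumerates devices in FASTEST_FIRST order, while nvidia-smi lists them in PCI bus order. Setting the environment variable CUDA_DEVICE_ORDER to PCI_BUS_ID asks CUDA to use bus order instead. Whether this fully explains the behaviour on this particular machine was not verified in the investigation above, so the following is only a sketch under that assumption; the helper names `env_with_pci_bus_order` and `run_with_pci_bus_order` are hypothetical.

```python
import os
import subprocess


def env_with_pci_bus_order(base_env=None):
    """Return a copy of the environment that asks CUDA to enumerate by PCI bus Id."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


def run_with_pci_bus_order(cmd):
    """Run a command (given as an argument list) with the adjusted environment."""
    return subprocess.run(cmd, env=env_with_pci_bus_order(), check=False)
```

Launching the FFMPEG command from section 2.3 through `run_with_pci_bus_order` and checking nvidia-smi again would show whether `-gpu 0` then lands on the card that nvidia-smi calls 0.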