1. 现象

一台Windows 10安装有2张显卡,RTX4000和P2000,运行nvidia-smi显示如下:

1
2
3
4
5
6
7
8
9
10
11
12
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.94       Driver Version: 516.94       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P2000       WDDM  | 00000000:02:00.0 Off |                  N/A |
| 57%   56C    P0    25W /  75W |    289MiB /  5120MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000    WDDM  | 00000000:04:00.0 Off |                  N/A |
| 30%   45C    P0    60W / 125W |   3442MiB /  8192MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+

说明P2000的显卡Id是0,RTX4000的Id是1。运行A软件指定工作在RTX4000上出现闪退。

2. 分析

2.1. 更换驱动

记得驱动版本516不是RTX系列的,于是更换到最新的456.71:

1
2
3
4
5
6
7
8
9
10
11
12
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.71       Driver Version: 456.71       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P2000       WDDM  | 00000000:02:00.0 Off |                  N/A |
| 57%   56C    P0    25W /  75W |    289MiB /  5120MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000    WDDM  | 00000000:04:00.0 Off |                  N/A |
| 30%   45C    P0    60W / 125W |   3442MiB /  8192MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+

装完重启系统后,问题依旧。

2.2. 排查代码

学习软件代码发现内部有看门狗功能,当主线程操作CUDA解码图像、图像处理时发生意外阻塞时,会超时10秒退出。加入详细日志观察到解码时堆积了太多来自FFMPEG的NAL包,说明解码慢。

查看任务管理器、GPU-Z、Nvidia-SMI中的显卡性能数据,都没有明显的指证。同时相同的A软件在Windows 7的RTX4000和P5000显卡环境上,工作正常。

2.3. FFMPEG辅助

为避免A软件自身的关隘,改用FFMPEG测试,参考FFMPEG常用操作,使用CUDA解码器时指定使用Id为0的显卡:

ffmpeg -gpu 0 -vcodec hevc_cuvid -rtsp_transport tcp -i "rtsp://192.168.8.168" -f null null

再观察Nvidia-smi,显示FFMPEG进程竟然工作在显卡1,如下图:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.71       Driver Version: 456.71       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P2000       WDDM  | 00000000:02:00.0 Off |                  N/A |
| 49%   44C    P0    17W /  75W |    278MiB /  5120MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000    WDDM  | 00000000:04:00.0 Off |                  N/A |
| 30%   42C    P0    36W / 125W |    629MiB /  8192MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1980    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
|    0   N/A  N/A      2424    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     12628    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     13812    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     15248    C+G   ...2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     18832    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     22344    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     26420    C+G   C:\Windows\explorer.exe         N/A      |
|    1   N/A  N/A      1600    C+G   Insufficient Permissions        N/A      |
|    1   N/A  N/A      1608    C+G   Insufficient Permissions        N/A      |
|    1   N/A  N/A     20464      C   ...tsp_server\bin\ffmpeg.exe    N/A      |
+-----------------------------------------------------------------------------+

同理使用-gpu 1,显示工作在显卡0,这就奇怪了。换到A软件尝试也是如此,通过cuDeviceGetName()来打印指定显卡Id是0时的显卡名称,竟然是RTX4000。更换电脑上多个位置的Nvidia-SMI软件,都这样。

3. 原因

对比nvidia-smi返回的显卡Id和名称的结果,和基于CUDA的NVENCEC API获取结果,恰恰相反,0变成1,1变成0。任务管理器中看到的显卡顺序和nvidia-smi一致,说明相互兼容出了偏差。