
Installing NVIDIA CUDA 3.0 on CentOS 5.4: Errors and Fixes

April 27, 2010

The installation files are:
cudatoolkit_3.0_linux_32_fedora10.run
devdriver_3.0_linux_32_195.36.15.run

Before installing CUDA, install the kernel, kernel-headers, and kernel-devel packages (the kernel sources):

yum install kernel kernel-headers kernel-devel

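If the installer still cannot find the kernel sources, locate the kernel source directory manually (it is usually under /usr/src) and pass it with --kernel-source-path: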

./devdriver_3.0_linux_32_195.36.15.run --kernel-source-path /usr/src/kernels/2.6.18-164.15.1.el5-i686
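Before pointing the installer at a source tree, it can help to confirm that a build tree exists for the running kernel. The sketch below assumes the usual convention that a matching kernel-devel package makes /lib/modules/<release>/build resolve to the build directory. The function name find_kernel_build_dir and its second parameter are hypothetical, added only for illustration.

```shell
# Print the kernel build directory for a given release, or fail if it is absent.
# $1: kernel release, e.g. the output of `uname -r`
# $2: modules root (hypothetical parameter, defaults to /lib/modules)
find_kernel_build_dir() {
    release=$1
    root=${2:-/lib/modules}
    build="$root/$release/build"
    # -d follows symlinks, so a dangling "build" link counts as missing
    if [ -d "$build" ]; then
        printf '%s\n' "$build"
    else
        echo "no kernel build tree for $release under $root" >&2
        return 1
    fi
}

# Example use (not run here):
#   find_kernel_build_dir "$(uname -r)"
```

If the function fails for the output of `uname -r`, the development package for the running kernel is probably missing or belongs to a different kernel version.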

The following error then appeared:
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most
       frequently when this kernel module was built against the wrong or
       improperly configured kernel sources, with a version of gcc that differs
       from the one used to build the target kernel, or if a driver such as
       rivafb/nvidiafb is present and prevents the NVIDIA kernel module from
       obtaining ownership of the NVIDIA graphics device(s).

See the discussion here: http://www.nvnews.net/vbulletin/showthread.php?t=49951&page=3

Looks like I spoke too soon, the problem was solved by using the "-k $(uname -r)"  (thank you jong0357 and the all of you)

Change the command to:
./devdriver_3.0_linux_32_195.36.15.run --kernel-source-path /usr/src/kernels/2.6.18-164.15.1.el5-i686 -k $(uname -r)

Although the module now compiles, another error appears later at run time:
[root@gpu-node1 cuda_test]# ./a.out
FATAL: Error inserting nvidia (/lib/modules/2.6.18-164.el5PAE/kernel/drivers/video/nvidia.ko): Invalid module format
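One quick way to investigate an "Invalid module format" failure is to compare the kernel release the module was built for with the running kernel. The sketch below assumes the module's vermagic string (as printed by `modinfo -F vermagic`) begins with the kernel release it was built against. The helper name vermagic_matches is hypothetical.

```shell
# Succeed if a module's vermagic string was built for the given kernel release.
# $1: vermagic string; its first space-separated field is the kernel release
# $2: kernel release to compare against
vermagic_matches() {
    built_for=${1%% *}          # strip everything after the first space
    [ "$built_for" = "$2" ]
}

# Example use (not run here):
#   vermagic_matches "$(modinfo -F vermagic nvidia)" "$(uname -r)" \
#       || echo "nvidia.ko was built for a different kernel"
```

A mismatch here means the module was compiled against sources for a different kernel than the one currently running.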
Checking /var/log/nvidia-installer.log shows the following message: nvidia: disagrees about version of symbol struct_module.
See the explanation here: http://www.nvnews.net/vbulletin/showthread.php?t=76705

Re: 'invalid module format' when building module

Just got a little futher after finding a post that states I must have "xorg-x11-server-sdk" installed. I have now done this, however I get an error now stating:

nvidia: disagrees about version of symbol struct_module.

the kernel source is the same version as the running kernel…

# rpm -qa | grep kernel
kernel-devel-2.6.17-1.2630.fc6
kernel-PAE-2.6.17-1.2630.fc6
kernel-headers-2.6.17-1.2630.fc6


Re: 'invalid module format' when building module

nvidia: disagrees about version of symbol struct_module

Judging from this error message, the kernel development files installed on your system do not match the running kernel exactly. My guess is you’ll need to install the correct kernel-PAE-devel package.

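In other words, the installed kernel sources are not the same version as the sources the running kernel was built from. Rechecking the kernel versions: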

[root@gpu-node1 log]# rpm -qa|grep kernel
kernel-devel-2.6.18-164.15.1.el5
kernel-2.6.18-164.15.1.el5
kernel-headers-2.6.18-164.15.1.el5
kernel-PAE-2.6.18-164.el5
[root@gpu-node1 log]# yum install kernel-devel

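The running kernel is actually kernel-PAE-2.6.18-164.el5, but the installed development package is kernel-devel-2.6.18-164.15.1.el5, which matches neither its PAE flavor nor its version.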
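This kind of mismatch can be caught before running the driver installer. The following is a minimal sketch, assuming CentOS 5 naming, where a flavored kernel such as PAE reports a release string ending in the flavor name and ships its development files in a package named kernel-<flavor>-devel. The helper expected_devel_pkg is hypothetical.

```shell
# Derive the development package that matches a kernel release string.
# $1: kernel release, e.g. the output of `uname -r`
expected_devel_pkg() {
    release=$1
    case $release in
        *PAE) flavor=PAE; version=${release%PAE} ;;
        *xen) flavor=xen; version=${release%xen} ;;
        *)    flavor=;    version=$release ;;
    esac
    if [ -n "$flavor" ]; then
        printf 'kernel-%s-devel-%s\n' "$flavor" "$version"
    else
        printf 'kernel-devel-%s\n' "$version"
    fi
}

# Example use (not run here):
#   rpm -q "$(expected_devel_pkg "$(uname -r)")"
```

For the PAE kernel listed above, the helper yields kernel-PAE-devel-2.6.18-164.el5, which is the package the forum reply points to and which is absent from the rpm listing.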

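The cause was that yum on my machine used these mirrors by default:

 * addons: centos.ustc.edu.cn
 * base: centos.ustc.edu.cn
 * extras: centos.ustc.edu.cn
 * updates: centos.ustc.edu.cn

so yum install kernel-devel fetched the newer 2.6.18-164.15.1.el5 packages from the mirror instead of the version matching the installed kernel. I switched yum to use the installation disc as its source and reinstalled.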
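Installing from the disc can be sketched as below. This assumes that the stock CentOS-Media.repo file is present, that its repo id is c5-media, and that the disc is mounted where that file expects it. None of these details is stated in the original post. The helper media_install_cmd is hypothetical and only prints the command so that it can be reviewed before it is run.

```shell
# Build (but do not run) a yum command that installs packages from the
# installation media only.
# MEDIA_REPO: repo id to enable; the default c5-media is an assumption
#             about the stock CentOS 5 media repo definition.
# $@: package names to install
media_install_cmd() {
    repo=${MEDIA_REPO:-c5-media}
    printf "yum --disablerepo='*' --enablerepo=%s install" "$repo"
    for pkg in "$@"; do
        printf ' %s' "$pkg"
    done
    printf '\n'
}

# Example use (not run here):
#   media_install_cmd kernel-PAE-devel
```

Disabling every other repository for the one command keeps yum from picking up newer packages from the network mirrors.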

After the installation succeeds, set the paths:

export PATH=$PATH:/usr/local/cuda/bin

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib
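If these lines go into a shell startup file, each new shell appends the same directories again. A minimal sketch that adds a directory only when it is not already listed, and that avoids a leading colon when the variable starts out empty, is shown below. The helper append_path is hypothetical.

```shell
# Print a colon-separated list with a directory appended, unless already present.
# $1: current list (may be empty)
# $2: directory to add
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;          # already listed, leave as is
        *)        printf '%s\n' "${1:+$1:}$2" ;; # append, no colon if $1 is empty
    esac
}

PATH=$(append_path "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib)
export PATH LD_LIBRARY_PATH
```

Avoiding the empty entry matters for LD_LIBRARY_PATH, because an empty element in that list is treated as the current directory.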
