Suramya's Blog : Welcome to my crazy life…

February 21, 2023

Fixing problems with nvidia-driver on Debian Unstable after latest upgrade

Filed under: Computer Software,Linux/Unix Related,Tech Related — Suramya @ 10:54 PM

Earlier today I ran my periodic update of my main desktop that is running Debian Unstable. The upgrade finished successfully and since a new kernel was released with this update I restarted the system to ensure that all files/services etc are running the same version. After the reboot the GUI refused to start and I thought the problem could be because of a NVIDIA kernel module issue so I tried to reboot to an older kernel but that didn’t work either. Then I tried running apt-get dist-upgrade again which gave me the following error:

root@StarKnight:~# apt-get dist-upgrade 
Reading package lists...
Building dependency tree...
Reading state information...
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 nvidia-driver : Depends: nvidia-kernel-dkms (= 525.85.12-1) but 515.86.01-1 is installed or
                          nvidia-kernel-525.85.12 or
                          nvidia-open-kernel-525.85.12 or
                          nvidia-open-kernel-525.85.12
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

So I ran the apt –fix-broken install command as recommended and that failed as well with another set of errors:

root@StarKnight:/var/log# apt --fix-broken install
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Correcting dependencies... Done
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
dpkg: dependency problems prevent configuration of nvidia-driver:
 nvidia-driver depends on nvidia-kernel-dkms (= 525.85.12-1) | nvidia-kernel-525.85.12 | nvidia-open-kernel-525.85.12 | nvidia-open-kernel-525.85.12; however:
  Version of nvidia-kernel-dkms on system is 515.86.01-1.
  Package nvidia-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.

dpkg: error processing package nvidia-driver (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)

Looking at the logs, I didn’t see any major errors but I did see the following message:

2023-02-21T19:48:27.668268+05:30 StarKnight kernel: [    3.379006] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.86.01  Wed Oct 26 09:12:38 UTC 2022
2023-02-21T19:48:27.668286+05:30 StarKnight kernel: [    4.821755] NVRM: API mismatch: the client has the version 525.85.12, but
2023-02-21T19:48:27.668287+05:30 StarKnight kernel: [    4.821755] NVRM: this kernel module has the version 515.86.01.  Please
2023-02-21T19:48:27.668287+05:30 StarKnight kernel: [    4.821755] NVRM: make sure that this kernel module and all NVIDIA driver
2023-02-21T19:48:27.668288+05:30 StarKnight kernel: [    4.821755] NVRM: components have the same version.

Searching on the web didn’t give me a solution but since I am running the Debian Unstable branch it is expected that once in a while things might break and sometimes they break quite spectacularly… So I started experimenting and tried removing and reinstalling the nvidia-driver but that was failing as well because the package was expecting nvidia-kernel-dkms version 525.85.12 but we had 515.86.01-1 installed.

root@StarKnight:~# apt-get install nvidia-driver
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  nvidia-driver
0 upgraded, 1 newly installed, 0 to remove and 14 not upgraded.
Need to get 0 B/494 kB of archives.
After this operation, 1,398 kB of additional disk space will be used.
Selecting previously unselected package nvidia-driver.
(Reading database ... 439287 files and directories currently installed.)
Preparing to unpack .../nvidia-driver_525.85.12-1_amd64.deb ...
Unpacking nvidia-driver (525.85.12-1) ...
dpkg: dependency problems prevent configuration of nvidia-driver:
 nvidia-driver depends on nvidia-kernel-dkms (= 525.85.12-1) | nvidia-kernel-525.85.12 | nvidia-open-kernel-525.85.12 | nvidia-open-kernel-525.85.12; however:
  Version of nvidia-kernel-dkms on system is 515.86.01-1.
  Package nvidia-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.

Now I had a couple of options, first was to wait for a couple of days (if I am lucky) for someone to upload the correct versions of the packages to the channel. The second option was to remove the package and installed the Open Source version of the Nvidia driver. I didn’t want to do that because that package is a memory hog and doesn’t work that well either. The last option was to try to manually install the older version (525.85.12) of the nvidia-kernel-dkms package and this is what I decided to go with, a search on the Debian Packages site gave me the .deb file for nvidia-kernel-dkms and firmware-nvidia-gsp (a dependency for the dkms package). I downloaded both the packages and installed them using the following command:

root@StarKnight:/home/suramya/Media/Downloads# dpkg -i firmware-nvidia-gsp_525.85.12-1_amd64.deb 
root@StarKnight:/home/suramya/Media/Downloads# dpkg -i nvidia-kernel-dkms_525.85.12-1_amd64.deb 

Once the packages were successfully downgraded I rebooted the system and the GUI came up without issues post the reboot.

Moral of the story is that you need to be prepared to have to troubleshoot your setup if you are running Debian Unstable or Debian Testing on your system. If you don’t want to do that then you should stick to Debian Stable which is rock solid or one of the other distributions such as Ubuntu or Linux Mint etc.

– Suramya

4 Comments »

  1. Add “non-free-firmware” to /etc/apt/sources.list.

    Comment by X-Tech — February 28, 2023 @ 11:05 AM

  2. I already have that. The issue is that the versions for firmware-nvidia-gsp, nvidia-kernel-dkms and nvidia-kernel in the repository were not matching. So the upgrade broke the system till I forced a downgrade of the packages.

    It should get resolved by the time I do the next system upgrade in a few days.

    Not the first time this has happened and won’t be the last, that’s what you get when you are on the Unstable repo.

    Comment by Suramya — February 28, 2023 @ 11:22 AM

  3. It shouldn’t happen in the first place, this software should’ve been extensively unit and E2E tested by a large team of people before they shipped it out, clearly they are too lazy to do that

    Comment by asd — March 30, 2024 @ 2:25 AM

  4. @asd, I am running the Unstable branch, which is what is used to perform the E2E testing that you are referring to. It is expected that at times the upgrades will break something because this is where they test inter-package interactions. I mean, the name itself is ‘unstable’.

    Calling them lazy is out of line and uncalled for as this is expected behavior.

    Now, if this had happened on the Stable Branch I would agree with you and question the QA processes as Stable is supposed to be released after extensive testing.

    – Suramya

    Comment by Suramya — April 1, 2024 @ 2:59 PM

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress