Dave Heinemann

Fedora 37: "NVIDIA kernel module missing" boot error

Since upgrading to Fedora 37 last weekend, I've noticed a brief error message during startup:

NVIDIA kernel module missing. Falling back to nouveau.

This post is a summary of how I diagnosed and fixed the issue, in case it's helpful for anyone else.

Initial Troubleshooting

I figured my Nvidia drivers might require reinstallation after the system upgrade, so I removed all Nvidia packages and then reinstalled them, broadly following the RPM Fusion guide:

$ sudo dnf remove '*nvidia*'
$ sudo dnf install akmod-nvidia

After rebooting, the error persisted.

I removed the Nvidia drivers again, then re-read the RPM Fusion guide and realised I had overlooked a step. I had enabled Secure Boot after upgrading to Fedora 37, since GNOME recommended it. With Secure Boot enabled, I needed to generate a key to sign kernel modules with. RPM Fusion has instructions on how to do this.

While following the instructions to generate a signing key, I couldn't actually get kmodgenca to generate a key; it would simply exit without printing any messages. At this point I decided I'd set this issue aside and troubleshoot it later, so I reinstalled akmod-nvidia and rebooted.

Then my system started freezing during startup. Now I had to fix it.

Boot Hang

I started troubleshooting the boot hang. These are the things I tried:

I tried booting into Rescue Mode. That failed with this error:

Cannot open access to console, the root account is locked in emergency mode

I later fixed this by setting a root password as per this guide.

I edited the kernel boot options and tried booting into single-user mode. That failed with the same error.

I tried booting with graphical boot disabled by removing rhgb from the kernel boot options. This time I could see the boot process hang after this step:

Failed to start vboxdrv.service

However, I think I remember seeing that error before the Fedora 37 upgrade, so I decided it was unrelated.

Finally, I tried booting with the nouveau driver (instead of the Nvidia driver) by removing nouveau from the blacklist in the kernel boot options. This worked and I could finally boot again!
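To make that edit concrete, here is a minimal sketch of what removing the blacklist entries from a kernel command line looks like. The post does not name the parameters, so rd.driver.blacklist=nouveau and modprobe.blacklist=nouveau are an assumption about what the Nvidia packages add, and strip_nouveau_blacklist is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a kernel command line with the nouveau
# blacklist entries removed. The two parameter names are an assumption;
# check /proc/cmdline on your own system for what is actually set.
# The body runs in a subshell so that "set -f" and the variables stay local.
strip_nouveau_blacklist() (
    set -f          # no filename expansion while splitting the line
    kept=""
    for param in $1; do
        case "$param" in
            rd.driver.blacklist=nouveau|modprobe.blacklist=nouveau)
                ;;  # drop the blacklist entry
            *)
                kept="${kept:+$kept }$param"
                ;;
        esac
    done
    printf '%s\n' "$kept"
)
```

Calling strip_nouveau_blacklist "$(cat /proc/cmdline)" prints the line as it would look after the edit. It only prints text and changes nothing on the system; the actual edit was made by hand in the boot menu.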

After successfully booting, I checked journalctl for information about what went wrong and found this:

Apr 30 09:46:33 Abraxas kernel: nouveau 0000:01:00.0: bus: MMIO write of 80000140 FAULT at 10eb14 [ PRIVRING ]
Apr 30 09:46:33 Abraxas kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 30 09:46:33 Abraxas kernel: #PF: supervisor instruction fetch in kernel mode
Apr 30 09:46:33 Abraxas kernel: #PF: error_code(0x0010) - not-present page
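The post does not say which journalctl options were used, so the following is only one plausible way to surface kernel messages like these. kernel_problems is a hypothetical wrapper name, and the warning threshold is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical wrapper: show kernel messages at warning level or worse
# for a single boot.
# Usage: kernel_problems [BOOT_OFFSET]
#   0 (the default) is the current boot, -1 the one before it, and so on.
kernel_problems() {
    journalctl --boot="${1:-0}" --dmesg --priority=warning --no-pager
}
```

For example, kernel_problems -1 lists kernel warnings and errors from the previous boot, provided the journal is stored persistently.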

Signing the Drivers

While researching the issue, I discovered this helpful post which explained that the process for configuring and installing the Nvidia drivers on a Secure Boot system needs to happen in a certain order. The signing key must be generated before installing akmod-nvidia. Then after akmod-nvidia is installed, you must allow around 5 minutes for it to build in the background; I assume this is the point where it actually signs the drivers.

This wasn't clear to me before, so with this knowledge I tried again from the start:

  1. I removed all Nvidia packages.
  2. I generated a signing key and imported it into EFI. I had to run kmodgenca with the --force flag to force it to generate the key, then accepted all the defaults.
  3. I reinstalled akmod-nvidia.
  4. I monitored the build process with ps aux | grep akmod, and waited until it finished before rebooting. It took around 5 minutes.
  5. When prompted, I agreed to add the public key to EFI.
  6. I allowed the system to boot as normal.
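As a rough sketch, steps 1 to 4 might look like the functions below. The certificate path and the modinfo check are assumptions based on the RPM Fusion Secure Boot guide rather than anything stated in this post, and the function names are hypothetical. The modinfo poll is a substitute for watching ps aux | grep akmod by hand.

```shell
#!/bin/sh
# Sketch of steps 1-4. AKMODS_CERT is an assumed path (from the RPM Fusion
# guide); adjust it if your akmods certificate lives elsewhere.
AKMODS_CERT=/etc/pki/akmods/certs/public_key.der

# Step 1: remove every installed Nvidia package.
remove_nvidia_packages() {
    sudo dnf remove '*nvidia*'
}

# Step 2: force a new signing key, then queue it for enrollment.
generate_and_queue_key() {
    sudo kmodgenca --force                 # interactive; accept the defaults
    sudo mokutil --import "$AKMODS_CERT"   # asks for a one-time password
}

# Step 3: reinstall the driver; the module build starts in the background.
install_driver() {
    sudo dnf install akmod-nvidia
}

# Step 4: poll until a module named nvidia exists for the running kernel.
# The optional argument is the polling interval in seconds.
wait_for_nvidia_module() {
    until modinfo -F version nvidia >/dev/null 2>&1; do
        sleep "${1:-10}"
    done
}
```

Run in order, the functions end with wait_for_nvidia_module returning once the build has produced a module, which is the point at which rebooting should be safe. Steps 5 and 6 happen at the firmware prompt during the next boot, so they are not scripted here.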

It worked! There were no boot errors or hangs, and I was able to open nvidia-settings and see my GPU.

Why Did All This Happen?

I think this is all on me.

The original error about the Nvidia kernel module missing was probably because I enabled Secure Boot. I assume the drivers I installed in the past wouldn't have been signed.

The boot hang was probably caused by me rebooting too soon after installing the akmod-nvidia package. I assume this left the build in an incomplete state.

I'm not sure why I had to forcibly generate a new signing key. /usr/share/doc/akmods/README.secureboot indicates that kmodgenca will silently exit if a key already exists, so there must have been a key already. However, I don't recall creating it. Maybe it was generated automatically, but never imported into the EFI firmware.
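One way to test that theory on another machine would be to check whether a certificate already exists and whether the firmware knows about it. This is a hedged sketch: the default path is an assumption based on the RPM Fusion guide, and akmods_key_status is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: does an akmods certificate exist, and is it enrolled?
# Usage: akmods_key_status [CERT_PATH]
akmods_key_status() {
    cert="${1:-/etc/pki/akmods/certs/public_key.der}"   # assumed default path
    if [ ! -e "$cert" ]; then
        echo "no certificate at $cert"
        return 1
    fi
    echo "certificate found at $cert"
    mokutil --test-key "$cert"   # reports whether this key is enrolled
}
```

A certificate that exists but is reported as not enrolled would match the guess that a key was generated at some point but never imported into the EFI firmware.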

Do you have any thoughts or feedback? Let me know via email!

#Fedora #Linux