Fedora 37: "NVIDIA kernel module missing" boot error
Since upgrading to Fedora 37 last weekend, I've noticed a brief error message during startup:
NVIDIA kernel module missing. Falling back to nouveau.
This post is a summary of how I diagnosed and fixed the issue, in case it's helpful for anyone else.
I figured my Nvidia drivers might require reinstallation after the system upgrade, so I removed all Nvidia packages and then reinstalled them, broadly following the RPM Fusion guide:
$ sudo dnf remove '*nvidia*'
$ sudo dnf install akmod-nvidia
After rebooting, the error persisted.
I removed the Nvidia drivers again, then re-read the RPM Fusion guide and realised I had overlooked a step. I had enabled Secure Boot after upgrading to Fedora 37, since GNOME recommended it. With Secure Boot enabled, the kernel only loads signed modules, so I needed to generate a key to sign the Nvidia kernel module with. RPM Fusion has instructions on how to do this.
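If you want to confirm whether Secure Boot is actually on before going any further, something like this should do it. It's a minimal sketch that assumes the mokutil tool is installed; secure_boot_status is just a name I made up for the helper.

```shell
# Hypothetical helper: report the Secure Boot state via mokutil, if present.
secure_boot_status() {
    if command -v mokutil >/dev/null 2>&1; then
        # --sb-state prints e.g. "SecureBoot enabled" on EFI systems.
        mokutil --sb-state 2>/dev/null || echo "unknown (mokutil could not read the EFI state)"
    else
        echo "unknown (mokutil is not installed)"
    fi
}

secure_boot_status
```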
While following the instructions to generate a signing key, I couldn't actually get kmodgenca to generate a key; it would simply exit without printing any messages. At this point I decided I'd set this issue aside and troubleshoot it later, so I reinstalled akmod-nvidia and rebooted.
Then my system started freezing during startup. Now I had to fix it.
I started troubleshooting the boot hang. The main symptoms were:
- The boot process hung at the Fedora logo, and the spinning boot animation froze.
- The other virtual terminals could not be accessed using Alt+F2, etc.
I tried booting into Rescue Mode. That failed with this error:
Cannot open access to console, the root account is locked in emergency mode
I later fixed this by setting a root password as per this guide.
I edited the kernel boot options and tried booting into single-user mode. That failed with the same error.
I tried booting with graphical boot disabled by removing rhgb from the kernel boot options. This time I could see the boot process hang after this step:
Failed to start vboxdrv.service
However, I think I remember seeing that error before the Fedora 37 upgrade, so I decided it was unrelated.
Finally, I tried booting with the nouveau drivers (instead of the Nvidia drivers) by removing the nouveau blacklist entries from the kernel boot options. This worked and I could finally boot again!
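To make those boot-option edits concrete, here is a small sketch that previews a kernel command line with certain arguments dropped. It only prints text and changes nothing; the real edits were made by hand in the GRUB menu. The drop_boot_args helper is hypothetical, the example command line is made up, and the two nouveau blacklist arguments are my assumption about what the RPM Fusion packages add, so compare against /proc/cmdline on your own system.

```shell
# Hypothetical helper: print a kernel command line with the given arguments removed.
# Usage: drop_boot_args "<command line>" arg1 [arg2 ...]
drop_boot_args() {
    cmdline=$1
    shift
    result=""
    for token in $cmdline; do   # unquoted on purpose: split on whitespace
        keep=yes
        for unwanted in "$@"; do
            if [ "$token" = "$unwanted" ]; then
                keep=no
            fi
        done
        if [ "$keep" = yes ]; then
            result="${result:+$result }$token"
        fi
    done
    printf '%s\n' "$result"
}

# Example command line (made up for illustration).
example='ro rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau'

# Text-mode boot: drop the graphical boot flags.
drop_boot_args "$example" rhgb quiet
# Boot with nouveau: drop the (assumed) blacklist entries.
drop_boot_args "$example" rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
```

To preview against the running system instead, pass "$(cat /proc/cmdline)" as the first argument.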
After successfully booting, I checked journalctl for information about what went wrong and found this:
Apr 30 09:46:33 Abraxas kernel: nouveau 0000:01:00.0: bus: MMIO write of 80000140 FAULT at 10eb14 [ PRIVRING ]
Apr 30 09:46:33 Abraxas kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 30 09:46:33 Abraxas kernel: #PF: supervisor instruction fetch in kernel mode
Apr 30 09:46:33 Abraxas kernel: #PF: error_code(0x0010) - not-present page
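If you want to pull similar lines out of your own journal, a filter along these lines may help. It's a sketch, not the exact command I ran: gpu_kernel_errors is a made-up name, and the pattern simply matches the kinds of lines shown above.

```shell
# Hypothetical helper: keep only GPU-driver and kernel-fault lines from stdin.
gpu_kernel_errors() {
    grep -E 'nouveau|nvidia|BUG:|#PF:'
}

# Example with two sample lines; only the second one matches the pattern.
printf '%s\n' \
    'kernel: usb 1-1: new high-speed USB device' \
    'kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000' \
    | gpu_kernel_errors
```

On a real system you would feed it kernel messages from the previous boot, for example with journalctl -k -b -1 --no-pager | gpu_kernel_errors.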
Signing the Drivers
While researching the issue, I discovered this helpful post which explained that the process for configuring and installing the Nvidia drivers on a Secure Boot system needs to happen in a certain order. The signing key must be generated before installing akmod-nvidia. Then after akmod-nvidia is installed, you must allow around 5 minutes for it to build in the background; I assume this is the point where it actually signs the drivers.
This wasn't clear to me before, so with this knowledge I tried again from the start:
- I removed all Nvidia packages.
- I generated a signing key and imported it into EFI. I had to run kmodgenca with the --force flag to force it to generate the key, then accepted all the defaults.
- I reinstalled akmod-nvidia.
- I monitored the build process with ps aux | grep akmod, and waited until it finished before rebooting. It took around 5 minutes.
- When prompted, I agreed to add the public key to EFI.
- I allowed the system to boot as normal.
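The command-line part of the steps above can be sketched roughly as follows. By default it only prints the commands so the order is easy to check; set DRY_RUN=0 to actually run them. The run and install_signed_nvidia helpers are names I made up, and the certificate path is an assumption taken from the RPM Fusion Secure Boot instructions, so verify it on your system first.

```shell
# Print each command; execute it only when DRY_RUN=0 (the default is to just print).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Assumed location of the akmods public certificate (per the RPM Fusion guide).
AKMODS_CERT=/etc/pki/akmods/certs/public_key.der

install_signed_nvidia() {
    run sudo dnf remove '*nvidia*'             # 1. remove all Nvidia packages
    run sudo kmodgenca --force                 # 2. generate the signing key
    run sudo mokutil --import "$AKMODS_CERT"   # 3. queue the key for enrolment in EFI
    run sudo dnf install akmod-nvidia          # 4. reinstall; the module builds in the background
}

install_signed_nvidia
```

The remaining steps are not scripted here: wait for the akmods build to finish, reboot, and accept the key enrolment prompt during startup.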
It worked! There were no boot errors or hangs, and I was able to open nvidia-settings and see my GPU.
Why Did All This Happen?
I think this is all on me.
The original error about the missing Nvidia kernel module was probably caused by enabling Secure Boot. I assume the drivers I had installed in the past were never signed, so the kernel refused to load them.
The boot hang was probably caused by me rebooting too soon after installing the akmod-nvidia package. I assume this left the build in an incomplete state.
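To avoid rebooting too early, a small polling loop can stand in for repeatedly running ps aux | grep akmod by hand. This is a sketch under a couple of assumptions: wait_for_akmods is a hypothetical helper, and it treats any running process whose name contains "akmod" as a build in progress. An empty result straight after installing could also mean the build has not started yet, so give it a moment first.

```shell
# Hypothetical helper: block until no process whose name contains "akmod" is running.
# The optional argument is the polling interval in seconds (default 5).
wait_for_akmods() {
    interval=${1:-5}
    while pgrep akmod >/dev/null 2>&1; do
        echo "akmods is still building, waiting..."
        sleep "$interval"
    done
    echo "No akmods build in progress."
}

wait_for_akmods
```

Once it returns, modinfo -F version nvidia should print the driver version if the module was built.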
I'm not sure why I had to forcibly generate a new signing key. /usr/share/doc/akmods/README.secureboot indicates that kmodgenca will silently exit if a key already exists, so there must have been a key already. However, I don't recall creating it. Maybe it was generated automatically, but never imported into the EFI firmware.
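If you hit the same silent exit from kmodgenca, it may be worth checking whether a certificate already exists and whether it is enrolled. The sketch below is one way to do that; check_akmods_key is a hypothetical helper and the default path is an assumption based on the RPM Fusion instructions.

```shell
# Hypothetical helper: report whether an akmods certificate exists and is enrolled.
check_akmods_key() {
    cert=${1:-/etc/pki/akmods/certs/public_key.der}   # assumed default location
    if [ ! -e "$cert" ]; then
        echo "no certificate found at $cert"
    elif command -v mokutil >/dev/null 2>&1; then
        # --test-key reports whether this certificate is already enrolled.
        mokutil --test-key "$cert"
    else
        echo "certificate exists at $cert, but mokutil is not installed"
    fi
}

check_akmods_key
```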