Re: [PATCH v11 3/4] x86/cpu: Do a sanity check on required feature bits

From: H. Peter Anvin

Date: Mon Mar 23 2026 - 16:30:54 EST

On 2026-03-23 12:19, Borislav Petkov wrote:
> On Mon, Mar 23, 2026 at 11:43:08AM -0700, H. Peter Anvin wrote:
>> That is not necessarily true at all.
>
> What does that even mean?! :)
>
>> As such, this may be a really cheap way to get a message out in case we get
>> that far without problems.
>
> Meh.
>
>> For one thing, this runs -- at least on the BSP -- before either alternatives
>> patching or running user space, so there is plenty of features that may not
>> have been used yet.
>
> We're talking about required features here. What's wrong with verify_cpu()
> testing required features and stopping if some of them are not present?
>
> It is already checking some of them.
>

Well, there are the bits which need to be in assembly because they

>> For another thing, there are some features -- such as PAE to mention one --
>> that are present in some CPUs but disabled in CPUID because for some reason or
>> another the manufacturer found during testing that it doesn't always work
>> right. However, it is likely that a PAE kernel will successfully boot, and it
>> might even work on any one particular CPU. This is *exactly* what
>> TAINT_CPU_OUT_OF_SPEC is supposed to represent.
>
> And?
>
> We're supposed to support such a CPU or somewhat wobbly only?
>
> You want to be able to boot up to the point of checking required features in
> C code, find out that PAE is not supported, taint the CPU but still run?
>
> What for?
>
> How do you explain the user that her machine is actually fine but we'll taint
> the kernel and that it maybe works but maybe not and there are no guarantees?

That is EXACTLY what TAINT_CPU_OUT_OF_SPEC means.
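
To make the idea concrete, here is a rough userspace sketch of a C-level
sanity check. It is not the code from the patch. The bit numbers follow
CPUID leaf 1 EDX, but required_mask, check_required_features() and the
cpu_out_of_spec flag are hypothetical names made up for illustration; the
flag merely stands in for tainting the kernel with TAINT_CPU_OUT_OF_SPEC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Bit positions as in CPUID leaf 1, EDX. */
enum { FEAT_FPU = 0, FEAT_PAE = 6, FEAT_CX8 = 8, FEAT_CMOV = 15 };

/* Hypothetical set of features this build was compiled to require. */
static const uint32_t required_mask =
	BIT(FEAT_FPU) | BIT(FEAT_PAE) | BIT(FEAT_CX8) | BIT(FEAT_CMOV);

/* Stand-in for the kernel's taint state (assumption, illustration only). */
static bool cpu_out_of_spec;

/*
 * Compare the detected feature word against the required mask.
 * Report every missing bit, mark the CPU as out of spec, and
 * return the number of missing features. The caller keeps running.
 */
static unsigned int check_required_features(uint32_t detected)
{
	uint32_t missing = required_mask & ~detected;
	unsigned int count = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (missing & BIT(bit)) {
			fprintf(stderr,
				"CPU: required feature bit %u missing\n", bit);
			count++;
		}
	}

	if (count)
		cpu_out_of_spec = true;

	return count;
}
```

The point of the sketch is that the check warns and taints rather than
halting: if the machine gets this far at all, the user at least sees which
bits were missing.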

>> Finally, as Maciej reported, the user might have tried to explicitly override
>> a required feature.
>
> You can still catch it in verify_cpu. Catch it such that you simply stop
> there.
>
> If the luser is overriding required features, then she gets to keep both
> pieces.

It doesn't mean we can't at least TRY to warn about things.

>> verify_cpu.S serves a different purpose: it is to enable features that are
>> required to even set up the kernel execution environment and that may be
>> switched off through various mechanisms.
>
> And because we run it at so many places, then it can do those checks for us
> too.
>
>> verify_cpu.S is unfortunately not able to issue messages (not to mention that
>> it is written in assembly,
>
> We can convert it to C. We've done this before with other crap. :)

No, you can't, because YOU CAN'T GET FAR ENOUGH ALONG TO RUN C CODE WITHOUT
IT. That is what is unique about verify_cpu.S. It should probably really be
called "enable_cpu.S".

It doesn't mean you can't do an earlier check, but at that point you pretty
much need the entire machinery of arch/x86/kernel/cpu.

We COULD push a lot of that code much earlier, and make it sharable with the
boot/compressed prekernel, but that is a cleanup on a whole different scale.

> Yes, I have seen the error message about this CPU not being supported very
> early.
>
> But I don't see the point for adding another function to verify required
> features which is somewhere else where we're pretty much doing that checking
> early.

Well, for one thing: it lets us avoid more ad hoc messages in central code.

> And what's the point of booting up to C code and kernel proper and say that
> some of the required features are off?
>
> I think we should extend verify_cpu or convert it to C or have it call
> a C function or whatever and do the checking once and for all and not boot
> into a wobbly and tainted kernel.
>
> As to showing a proper error message, what is the real use case we're chasing
> here?
>
> The CPU has all the required features - which is probably 99.999% of the cases
> out there or it doesn't and then it deserves to blow up.
>
> So what are we really "fixing" here...?

You can make the same argument about #MC for example: why bother trying to get
a message out when the CPU is literally telling you that your system just broke?

The answer is because it helps the user understand what is wrong. Certainly,
you have no guarantee that you will actually get there, but in practice, in
many (but definitely not all) cases you WILL be able to get far enough along
to get the message out so that when the user wonders why their machine crashed
they have a clue.

-hpa