Re: [PATCH v2] killswitch: add per-function short-circuit mitigation primitive
From: Anthony Iliopoulos
Date: Mon May 11 2026 - 13:51:43 EST
On Mon, May 11, 2026 at 07:15:10AM -0400, Sasha Levin wrote:
> On Mon, May 11, 2026 at 12:33:28PM +0200, Anthony Iliopoulos wrote:
> > On Sat, May 09, 2026 at 08:34:11AM -0400, Sasha Levin wrote:
> > > On Sat, May 09, 2026 at 02:02:24PM +0200, Florian Weimer wrote:
> > > > * Sasha Levin:
> > > >
> > > > > When a kernel (security) issue goes public, fleets stay exposed until a patched
> > > > > kernel is built, distributed, and rebooted into.
> > > > >
> > > > > For many such issues the simplest mitigation is to stop calling the buggy
> > > > > function. Killswitch provides that. An admin writes:
> > > > >
> > > > > echo "engage af_alg_sendmsg -1" \
> > > > > > /sys/kernel/security/killswitch/control
> > > > >
> > > > > After this, af_alg_sendmsg() returns -EPERM on every call without
> > > > > running its body. The mitigation takes effect immediately, and is dropped on
> > > > > the next reboot -- by which point a patched kernel is hopefully in place.
> > > >
> > > > Do you expect this to be safe to enable in kernel lockdown mode (i.e.,
> > > > with typical Secure Boot configurations in distributions)?
> > >
> > > Yes: under lockdown, killswitch has to be configured on the cmdline. Runtime
> > > engage is gated on the new LOCKDOWN_KILLSWITCH reason.
> >
> > Basically this proposal allows for any function to be overridden on a
> > production kernel as long as no lockdown level is enabled, which is quite
> > dangerous.
> >
> > Assuming this is acceptable (which I am not sure it should be), then this
> > is equivalent to the existing error injection code that we already have in
> > the kernel (CONFIG_FAIL_FUNCTION) minus the explicit whitelisting on a per
> > function basis required to permit injection.
>
> The mechanism is the same, but I don't think reusing fail_function works for
> what killswitch is trying to do.

How so? The kprobe handler is essentially the same. Setting the
whitelisting aside, it is currently possible to do:

  echo af_alg_sendmsg > /sys/kernel/debug/fail_function/inject
  echo 0xffffffffffffffff > /sys/kernel/debug/fail_function/af_alg_sendmsg/retval
  echo 100 > /sys/kernel/debug/fail_function/probability
  echo -1 > /sys/kernel/debug/fail_function/times

and that will return -EPERM, taint the kernel, and log the stacktrace on
dmesg on every rejected call.

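For anyone who wants to replay that sequence against a different function,
here is a minimal sketch of it as a shell helper. The name
fail_function_engage and the FAIL_FUNCTION_DIR override are made up for
illustration; only the file names come from the commands above. On current
kernels the write to inject is only accepted for whitelisted functions, so
this works for arbitrary functions only once the whitelisting is set aside.

```shell
#!/bin/sh
# Sketch only. FAIL_FUNCTION_DIR is a hypothetical override that defaults to
# the debugfs path used above; point it at a scratch directory to dry-run.
FAIL_FUNCTION_DIR="${FAIL_FUNCTION_DIR:-/sys/kernel/debug/fail_function}"

# fail_function_engage FUNC RETVAL_HEX
# Replays the four writes for FUNC and stops at the first write that fails.
fail_function_engage() {
    func="$1"
    retval="$2"
    # Register the function; the kernel then creates the per-function directory.
    printf '%s\n' "$func"   > "$FAIL_FUNCTION_DIR/inject" || return 1
    # Value the overridden function returns (0xffffffffffffffff is -1, i.e. -EPERM).
    printf '%s\n' "$retval" > "$FAIL_FUNCTION_DIR/$func/retval" || return 1
    # Fail 100% of the calls, with no limit on the number of failures.
    printf '%s\n' 100       > "$FAIL_FUNCTION_DIR/probability" || return 1
    printf '%s\n' -1        > "$FAIL_FUNCTION_DIR/times" || return 1
}
```

Usage would be "fail_function_engage af_alg_sendmsg 0xffffffffffffffff", run
as root with debugfs mounted.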
> > Given that this achieves the exact same result, then why don't we consider
> > simply removing the whitelisting restriction from fail_function altogether
> > and use that instead? The only thing missing then would be the boot param
> > parsing and setup.
>
> fail_function lives in debugfs, and on a typical Secure Boot distro debugfs is
> itself blocked by LOCKDOWN_DEBUGFS at integrity level. Dropping the whitelist
> doesn't help when the operator can't write to the file in the first place.

Agreed. For this to work, fail_function would also need to parse boot
params in the same way.

> Killswitch is in securityfs so that engaging it can be its own lockdown
> decision rather than being lumped in with everything debugfs exposes.

Sure, but when a kernel is locked down at integrity level, both solutions
are blocked anyway, so this makes no practical difference.

> Fault injection in general isn't enabled on production kernels - having to
> enable CONFIG_FUNCTION_ERROR_INJECTION will drag in that entire infra into
> kernels that don't need it.

There's very little code that CONFIG_FUNCTION_ERROR_INJECTION brings in
apart from the override_function_with_return trampoline and
lib/error-inject.c, which becomes obsolete once whitelisting is no longer
needed. Your proposal necessarily depends on FUNCTION_ERROR_INJECTION as
well.

The only thing that would be missing and is not usually compiled in is
CONFIG_FAIL_FUNCTION, which just implements the debugfs ops interface
that you are exposing via securityfs instead.

> > This way we'll be removing a few hundred lines of code instead of adding
> > more duplication, while enabling the same functionality.
>
> I'm not even sure there would be hundreds of lines saved here...

I'm talking specifically about the whitelisting, which would essentially
be useless:

  wc -l lib/error-inject.c include/asm-generic/error-injection.h include/linux/error-injection.h
    246 lib/error-inject.c
     43 include/asm-generic/error-injection.h
     28 include/linux/error-injection.h
    317 total

plus a hundred or so annotations of ALLOW_ERROR_INJECT and a tiny bit of
image space savings from dropping that whitelist section from the binary.

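The "hundred or so" figure can be checked against any tree with something
like the sketch below. The helper name count_error_inject_annotations is
made up for illustration, and it assumes each annotation starts at the
beginning of a line, so indented uses and the macro definition itself are
not counted.

```shell
#!/bin/sh
# count_error_inject_annotations TREE
# Prints how many lines under TREE begin with "ALLOW_ERROR_INJECT(".
# The "#define ALLOW_ERROR_INJECT(...)" line is skipped because it does not
# start with the macro name.
count_error_inject_annotations() {
    grep -rE '^ALLOW_ERROR_INJECT\(' "$1" | wc -l | tr -d ' '
}
```

Running "count_error_inject_annotations ." from the top of a kernel tree
prints the count for that tree.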
> The pieces that make killswitch what it is (cmdline parser,
> LOCKDOWN_KILLSWITCH, TAINT_KILLSWITCH, audit on engage and disengage, the
> module-unload notifier, etc) add up to roughly 200 lines that would move into
> fail_function unchanged. I really don't think we'd end up with much of a line
> delta.

All of that apart from the cmdline parser is already present in the
fault/error injection code, directly or indirectly. I can see the appeal
of having killswitch cleanly separated from everything else, but perhaps
changing the existing code is more approachable.

> That said, the kprobe and override machinery underneath both of these is fair
> game for a shared helper that fail_function and killswitch both build on. We can
> look at extracting that as a follow-up once killswitch lands, but it's a
> separate piece of work from the policy questions in this thread.

Sure, but my point is that if this is acceptable, then it follows that:

- whitelisting becomes irrelevant (even if fail_function remains
  separate), since the exact same capability will be exposed via the
  killswitch interface for all functions anyway, so why would we need it
  to protect error-injection?

and subsequently:

- fail_function would become somewhat redundant, since the same
  functionality would be achieved via securityfs (or just bpf, which
  is already the case).

Regards,
Anthony