Re: [PATCH v3 1/5] liveupdate: block outgoing session updates during reboot
From: oskar
Date: Mon Mar 23 2026 - 16:54:44 EST
On 2026-03-23 20:00, Pasha Tatashin wrote:
On Sat, Mar 21, 2026 at 6:28 PM Pasha Tatashin
<pasha.tatashin@xxxxxxxxxx> wrote:
On Sat, Mar 21, 2026 at 10:38 AM Oskar Gerlicz Kowalczuk
<oskar@gerlicz.space> wrote:
>
> kernel_kexec() serializes outgoing sessions before the reboot path
> freezes tasks, so close() and session ioctls can still mutate a
> session while handover state is being prepared. The original v2 code
> also let incoming lookups keep a bare session pointer after dropping
> the list lock.
>
> That leaves two correctness problems in the reboot path: outgoing state
> can change after serialization starts, and incoming sessions can be
> freed while another thread still holds a pointer to them.
>
> Add refcounted session lifetime management, track in-flight outgoing
> close() paths with an atomic closing counter, and make serialization
> wait for closing to drain before setting rebooting. Reject phase-invalid
> ioctls, keep incoming release on a common cleanup path, and make the
> release wait freezable without spinning.
>
> Fixes: fc5acd5c89fe ("liveupdate: block outgoing session updates during reboot")
> Signed-off-by: Oskar Gerlicz Kowalczuk <oskar@gerlicz.space>
> ---
> kernel/liveupdate/luo_internal.h | 12 +-
> kernel/liveupdate/luo_session.c | 236 +++++++++++++++++++++++++++----
> 2 files changed, 221 insertions(+), 27 deletions(-)
Hi Oskar,
Thank you for sending this series and finding these bugs in LUO. I
agree with Andrew that a cover letter would help to understand the
summary of the overall effort.
I have not reviewed the other patches yet, but for this patch, my
understanding is that it solves two specific races during reboot()
syscalls: session closure after serialization, and the addition of new
sessions or preserving new files after serialization.
Given that KHO is now stateless, and liveupdate_reboot() is
specifically placed at the last point where we can still return an
error to userspace, we should simply return an error if userspace is
doing something unexpected.
Instead of creating a new state machine, let's just reuse the file
references and simply take them for each session at the beginning of
serialization. This ensures that no session closes will happen later.
For file preservation and session addition, we can block them by
simply adding a new boolean.
Please take a look at the two patches below and see if this approach
would work. It is a much smaller change compared to the proposed state
machine in this patch.
https://git.kernel.org/pub/scm/linux/kernel/git/tatashin/linux.git/log/?h=luo-reboot-sync/rfc/1
Oskar, I made a few more changes to avoid returning an error if
get_file_active() fails. This prevents a race condition where the user
might call close(session_fd) right before calling reboot(). I
force-updated the above branch. Please let me know if you want to take
these changes and use them in the next version.
Pasha
Hi Pasha,
Thank you for taking the time to prototype this approach and for the detailed explanation. I really appreciate it.
I agree that reusing file references and introducing a simple blocking mechanism makes the solution much smaller and easier to reason about compared to a dedicated state machine. Your patches definitely move things in a nice direction in terms of simplicity.
While going through it, I was wondering if there might still be a couple of corner cases worth discussing. In particular, do you think a boolean gate is sufficient to cover in-flight operations that may have already passed the check before serialization starts? It seems like those paths could still potentially mutate session state during serialization.
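To make the in-flight concern concrete, here is a minimal user-space sketch. It is not LUO code: struct model_session, model_preserve_file() and model_serialize() are hypothetical names, and a pthread mutex stands in for whatever lock the kernel side would use. The point it illustrates is that the gate only closes the window if the check and the mutation sit in the same critical section that serialization uses to set the flag; a bare boolean tested before taking the lock would still let an operation that already passed the check mutate state afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the LUO code. */
struct model_session {
	pthread_mutex_t lock;	/* protects frozen and nr_files */
	bool frozen;		/* set once serialization has started */
	int nr_files;		/* stands in for mutable session state */
};

#define MODEL_SESSION_INIT { PTHREAD_MUTEX_INITIALIZER, false, 0 }

/*
 * Mutating path: the gate check and the mutation share one critical
 * section, so an operation either completes before serialization takes
 * the lock or observes frozen == true and is rejected.
 */
static int model_preserve_file(struct model_session *s)
{
	int ret = 0;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	if (s->frozen)
		ret = -EBUSY;
	else
		s->nr_files++;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/*
 * Serialization path: close the gate and take the snapshot under the
 * same lock, so no mutation can land between the two steps.
 */
static int model_serialize(struct model_session *s)
{
	int snapshot;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	s->frozen = true;
	snapshot = s->nr_files;
	pthread_mutex_unlock(&s->lock);
	return snapshot;
}
```

In this model, one model_preserve_file() call followed by model_serialize() yields a snapshot of 1, and any later model_preserve_file() returns -EBUSY and leaves nr_files unchanged. Whether the real ioctl paths can hold a suitable lock across both the check and the mutation is exactly what I would like to verify against your branch.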
I was also thinking about the lifetime of incoming sessions (especially lookups holding pointers). Do you think file reference handling alone is enough there, or would we still need some explicit lifetime protection?
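As an illustration of what I mean by explicit lifetime protection, here is another small user-space sketch. Again, everything in it is hypothetical (struct model_incoming, model_lookup_get(), model_release() and friends are made-up names, and C11 atomics plus a pthread mutex stand in for kernel primitives). It shows a lookup that takes its reference while the list lock is still held, so a concurrent release can unlink the session but cannot free it until the last holder drops its reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an incoming session; not the LUO data structure. */
struct model_incoming {
	struct model_incoming *next;
	atomic_int refs;	/* one reference is owned by the list */
	char name[32];
};

static pthread_mutex_t model_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_incoming *model_list;
static atomic_int model_freed;	/* counts frees, for demonstration only */

/* Add a session; the list holds the initial reference. Returns 0 or -1. */
static int model_add(const char *name)
{
	struct model_incoming *s = calloc(1, sizeof(*s));

	if (!s)
		return -1;
	atomic_init(&s->refs, 1);
	strncpy(s->name, name, sizeof(s->name) - 1);
	pthread_mutex_lock(&model_list_lock);
	s->next = model_list;
	model_list = s;
	pthread_mutex_unlock(&model_list_lock);
	return 0;
}

/* Drop one reference; the last put frees the session. */
static void model_put(struct model_incoming *s)
{
	int old = atomic_fetch_sub(&s->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(s);
		atomic_fetch_add(&model_freed, 1);
	}
}

/* Lookup takes its reference before the list lock is dropped. */
static struct model_incoming *model_lookup_get(const char *name)
{
	struct model_incoming *s;

	pthread_mutex_lock(&model_list_lock);
	for (s = model_list; s; s = s->next) {
		if (strcmp(s->name, name) == 0) {
			atomic_fetch_add(&s->refs, 1);
			break;
		}
	}
	pthread_mutex_unlock(&model_list_lock);
	return s;
}

/* Release unlinks under the lock, then drops the list's reference. */
static void model_release(const char *name)
{
	struct model_incoming **pp, *s = NULL;

	pthread_mutex_lock(&model_list_lock);
	for (pp = &model_list; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			s = *pp;
			*pp = s->next;
			break;
		}
	}
	pthread_mutex_unlock(&model_list_lock);
	if (s)
		model_put(s);
}
```

With this shape, a caller that did model_lookup_get("a") before model_release("a") can keep using the pointer until its own model_put(), and only then is the memory freed. If file references alone already give the incoming side an equivalent guarantee, this extra counter would be unnecessary, which is the part I am unsure about.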
I’m currently working on v4 and will take a closer look at your branch to see if we can combine both approaches in a way that keeps the solution simple while still covering these cases.
Thanks,
Oskar Gerlicz Kowalczuk