Re: [PATCH] RFC: mailbox: Fix NULL message support in mbox_send_message()

From: Joonwon Kang

Date: Tue Mar 17 2026 - 01:03:14 EST


> On Fri, Mar 13, 2026 at 5:12 AM Joonwon Kang <joonwonkang@xxxxxxxxxx> wrote:
> >
> > > The active_req field serves double duty as both the "is a TX in
> > > flight" flag (NULL means idle) and the storage for the in-flight
> > > message pointer. When a client sends NULL via mbox_send_message(),
> > > active_req is set to NULL, which the framework misinterprets as
> > > "no active request." This breaks the TX state machine by:
> > >
> > > - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > > callback and the tx_complete completion
> > > - txdone_hrtimer() skips the channel entirely since active_req
> > > is NULL, so poll-based TX-done detection never fires.
> > >
> > > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > > "no active request," freeing NULL to be valid message data. The
> > > sentinel is internal to the mailbox core and is never exposed to
> > > controller drivers or clients.
> >
> > The following drivers are currently using ->active_req which now could be
> > assigned MBOX_NO_MSG.
> > - drivers/mailbox/tegra-hsp.c
> > - drivers/mailbox/mtk-vcp-mailbox.c
> >
> Good catch, Thanks.
> mtk-vcp-mailbox.c is fine as is - it assumes only valid requests. This
> patch will not introduce any change for it.
> Yes, tegra-hsp.c should now track MBOX_NO_MSG instead of NULL. Single
> line change.
>
A more important aspect than single line change would be that you are
creating a new API contract that the controller drivers who are to use
->active_req should now have new knowledge that the pointer value could
be unconventional value such as -1. Without this additional knowledge,
they may easily fail checking if the channel is empty.

> > One of them is using ->active_req to wait until the channel is empty. In
> > this case, strictly speaking, that controller driver should be aware of
> > the sentinel value MBOX_NO_MSG, which means the sentinel value should be
> > exposed to the controller. Or, if a future controller driver to come is to
> > use ->active_req for the same purpose for doorbell or non-doorbell, it
> > should also be aware of the sentinel value anyway.
> >
> > However, I believe that it is not intuitive to the controller developers
> > that a pointer value could be other value than a real memory address,
> > NULL or error encoded value, which is -1(== MBOX_NO_MSG). For this reason,
> > I think it will be better to change the type of ->active_req to give a
> > better indication to the controller developers, e.g. to integer as in the
> > original patch
> > https://lore.kernel.org/all/20251126045926.2413532-1-joonwonkang@xxxxxxxxxx/.
> >
> > Or, we could change those drivers not to use ->active_req, hide
> > ->active_req entirely in the mailbox core and keep this patch.
> >
> Yes, ideally controller drivers should not track active_req. But this
> patch will cause least churn it seems.

If we will take those two drivers as an exceptional and not recommended
case that uses ->active_req directly, it will be fine not to change them
except for tegra-hsp.c to check MBOX_NO_MSG.

If that is the case and you are to keep this patch, please keep in mind to
leave the original patch link in the commit message for better trackability
of the discussion and solutions and for credit for the contribution of
finding and analyzing this issue and proposing the first solution, which
I believe is also important for future contributions to come.

Thanks.