Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels

From: Sebastian Andrzej Siewior

Date: Wed Apr 29 2026 - 07:06:11 EST


On 2026-04-25 16:08:56 [+0200], Andrea Mayer wrote:
> On Thu, 23 Apr 2026 10:00:56 +0200
> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> Hi Sebastian,

Hi Andrea,

> > Doesn't this make ip6_route_input() on RT fragile in general due to the
> > RT6_LOOKUP_F_DST_NOREF usage or here something special about the two
> > files that are patched?
> > Based on your explanation it all makes sense, I am just not sure if this
> > race is limited to those two are if there is more to it.
>
> seg6_input_core() and rpl_input() cache the dst via dst_cache_set_ip6(), which
> invokes dst_hold(). The dst_hold() calls rcuref_get(), failing on a zero
> refcount and triggering a WARN, but the pointer is still stored in the cache.
> After the RCU grace period completes the dst is freed, and a subsequent
> dst_cache_get() returns a dangling pointer.
>
> The other callers of ip6_route_input() (e.g., ipv6_srh_rcv, ipv6_rpl_srh_rcv,
> ip6_rcv_finish_core) consume the NOREF dst without caching it. Even if the
> pcpu_rt's refcount is concurrently dropped to zero, the dst memory remains
> valid because dst_release() defers the actual free via call_rcu_hurry() and the
> caller is still inside the RCU read-side critical section.

Ah, okay. Thank you for clearing that up.

> > > [snip]
> > >
> > > Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> > > Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> >
> > If having PREEMPT_RT_NEEDS_BH_LOCK unset is the requirement then the
> > right fixes: would be
> > Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> >
> > as prior this commit the race is not possible, right?
>
> I built and tested kernels at 3253cb49cbad and its parent fd4e876f59b7 (both
> CONFIG_PREEMPT_RT=y, without the fix): no issues at fd4e876f59b7.
> At 3253cb49cbad, a pcpu_rt cmpxchg contention in rt6_make_pcpu_route() shows
> up, which was addressed in 1adaea51c61b. I also tested at 1adaea51c61b, and at
> that point the dst_hold() race described in this patch appears.
>
> The seg6/rpl code obtains a NOREF dst from ip6_route_input(), does not promote
> it via skb_dst_force(), and passes it to dst_cache_set_ip6() which calls
> dst_hold(). This pattern has been present since af4a2209b134 and a7a29f9c361f,
> and the current Fixes: tags point to the commits where it was introduced.
> Does that seem reasonable?

Yes. So based on that the regression was introduced in 3253cb49cbad.
Before that, the lock guarded everything. That means also that
rpl_input() and seg6_input_core() is invoked a BH disabled section which
is what makes it for !RT work.

> > Does this mean that rpl_input() does a local_bh_disable() while
> > obtaining the dst but it never runs outside of bh-disabled section?
> > Because if it can run in preemptible context then it would not be to
> > PREEMPT_RT at which point the Fixes: tags from above would make sense
> > again.
> >
>
> rpl_input() and seg6_input_core() run in softirq context via lwtunnel_input().
> They do local_bh_disable() around dst_cache_get() and dst_cache_set_ip6(), but
> not around ip6_route_input(). The race window is between ip6_route_input()
> returning and dst_cache_set_ip6().

My point was that the Fixes: tag could be updated to 3253cb49cbad
instead. Since everything runs in softirq context, the
local_bh_disable() within that functions is not needed. Otherwise, if
this would not be invoked softirq then preemption would also be possible
on !RT.
Anyway, now it has been merged.

>
> Ciao,
> Andrea

Sebastian