Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Alexei Starovoitov
Date: Tue Apr 14 2026 - 10:35:16 EST
On Tue, Apr 14, 2026 at 3:57 AM Jiayuan Chen <jiayuan.chen@xxxxxxxxx> wrote:
>
> A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> to inject custom TCP header options. When the kernel builds a TCP packet,
> it calls tcp_established_options() to calculate the header size, which
> invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> callback.
>
> If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> tcp_current_mss(), which calls tcp_established_options() again,
> re-triggering the same BPF callback. This creates an infinite recursion
> that exhausts the kernel stack and causes a panic.
>
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> -> bpf_setsockopt(TCP_NODELAY)
> -> tcp_push_pending_frames()
> -> tcp_current_mss()
> -> tcp_established_options()
> -> bpf_skops_hdr_opt_len()
> /* infinite recursion */
> -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>
> A similar reentrancy issue exists for TCP congestion control, which is
> guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> bpf_setsockopt(TCP_NODELAY) calls that would trigger
> tcp_push_pending_frames() and cause the recursion.
>
> Reported-by: Quan Sun <2022090917019@xxxxxxxxxxxxxxxx>
> Reported-by: Yinhao Hu <dddddd@xxxxxxxxxxx>
> Reported-by: Kaiyan Mei <M202472210@xxxxxxxxxxx>
> Reported-by: Dongliang Mu <dzm91@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@xxxxxxxxxxxxxxxx/
> Fixes: 0813a841566f ("bpf: tcp: Allow bpf prog to write and parse TCP header option")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@xxxxxxxxx>
> ---
> Documentation/networking/net_cachelines/tcp_sock.rst | 1 +
> include/linux/tcp.h | 11 ++++++++++-
> net/core/filter.c | 4 ++++
> net/ipv4/tcp_minisocks.c | 1 +
> net/ipv4/tcp_output.c | 3 +++
> 5 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
> index 563daea10d6c..07d3226d90cc 100644
> --- a/Documentation/networking/net_cachelines/tcp_sock.rst
> +++ b/Documentation/networking/net_cachelines/tcp_sock.rst
> @@ -152,6 +152,7 @@ unsigned_int keepalive_intvl
> int linger2
> u8 bpf_sock_ops_cb_flags
> u8:1 bpf_chg_cc_inprogress
> +u8:1 bpf_hdr_opt_len_cb_inprogress
> u16 timeout_rehash
> u32 rcv_ooopack
> u32 rcv_rtt_last_tsecr
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index f72eef31fa23..2bfb73cf922e 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -475,12 +475,21 @@ struct tcp_sock {
> u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> * values defined in uapi/linux/tcp.h
> */
> - u8 bpf_chg_cc_inprogress:1; /* In the middle of
> + u8 bpf_chg_cc_inprogress:1, /* In the middle of
> * bpf_setsockopt(TCP_CONGESTION),
> * it is to avoid the bpf_tcp_cc->init()
> * to recur itself by calling
> * bpf_setsockopt(TCP_CONGESTION, "itself").
> */
> + bpf_hdr_opt_len_cb_inprogress:1; /* It is set before invoking the
> + * callback so that a nested
> + * bpf_setsockopt(TCP_NODELAY) or
> + * bpf_setsockopt(TCP_CORK) cannot
> + * trigger tcp_push_pending_frames(),
> + * which would call tcp_current_mss()
> + * -> bpf_skops_hdr_opt_len(), causing
> + * infinite recursion.
Let's not add new bits.
Reuse existing and test/check all in one place,
like commit 061ff040710e9 did.
pw-bot: cr
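
For illustration only, here is a minimal user-space sketch of the "in progress" guard pattern discussed above. It is not kernel code and not the proposed fix. Every name in it (demo_sock, demo_established_options(), demo_setsockopt_nodelay(), and so on) is hypothetical, and the -EBUSY return value is an assumption chosen for the example. It models the call chain from the commit message: the option-length path invokes a callback, the callback tries to set NODELAY, and the setsockopt path refuses to re-enter while the callback is running.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-socket state; NOT the kernel's struct tcp_sock. */
struct demo_sock {
	bool cb_inprogress;	/* true while the option-length callback runs */
	bool nodelay;		/* models the TCP_NODELAY setting */
	int cb_calls;		/* number of times the callback was entered */
};

static int demo_setsockopt_nodelay(struct demo_sock *sk);

/* Models a program that calls setsockopt(NODELAY) from inside the callback. */
static int demo_hdr_opt_len_cb(struct demo_sock *sk)
{
	sk->cb_calls++;
	return demo_setsockopt_nodelay(sk);
}

/* Models the option-size path: run the callback with the guard set. */
static int demo_established_options(struct demo_sock *sk)
{
	int ret;

	sk->cb_inprogress = true;
	ret = demo_hdr_opt_len_cb(sk);
	sk->cb_inprogress = false;
	return ret;
}

/*
 * Models the setsockopt path. Without the check below, the call chain
 * setsockopt -> push frames -> options -> callback -> setsockopt
 * would never terminate.
 */
static int demo_setsockopt_nodelay(struct demo_sock *sk)
{
	if (sk->cb_inprogress)
		return -EBUSY;	/* assumed error code, for illustration */

	sk->nodelay = true;
	/* Stands in for pushing pending frames, which recomputes options. */
	demo_established_options(sk);
	return 0;
}
```

In this sketch a call made from inside the callback is rejected and the callback runs exactly once per top-level entry, while a call made outside the callback still succeeds. Whether the real guard should be a new bit or an existing one checked in a single place is exactly the open review question; the sketch takes no position on that.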