Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
From: Eric Dumazet
Date: Mon Jun 01 2026 - 03:32:50 EST
On Sun, May 31, 2026 at 11:52 PM Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>
> Hi Eric,
>
> Thank you for the review!
>
> On 01/06/2026 16:04, Eric Dumazet wrote:
> > On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0)
> > <matttbe@xxxxxxxxxx> wrote:
> >>
> >> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
> >> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
> >> this case, it is then not possible to send the address signal.
> >>
> >> The idea is to let MPTCP dropping the TCP-timestamps option for some
> >> specific packets, to be able to send some specific pure ACK carrying >28
> >> bytes of MPTCP options, like with this specific ADD_ADDR. A new
> >> parameter is passed from tcp_established_options to the MPTCP side to
> >> indicate if the TCP TS option is used, and if it should be dropped. The
> >> next commit implements the part on MPTCP side, but split into two
> >> patches to help TCP maintainers to identify the modifications on TCP
> >> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
> >> MPTCP sysctl knob.
> >>
> >> It is important to keep in mind that dropping the TCP timestamps option
> >> for one packet of the connection could eventually disrupt some
> >> middleboxes: even if it should be unlikely, they could drop the packet
> >> or even block the connection. That's why this new feature will be
> >> controlled by a sysctl knob.
> >>
> >> Note that it would be technically possible to squeeze both options into
> >> the header if the ADD_ADDR is first written, and then the TCP timestamps
> >> without the NOPs preceding it. But this means more modifications on TCP
> >> side, plus some middleboxes could still be disrupted by that.
> >>
> >> About the implementation, instead of passing a new boolean (drop_ts),
> >> another option would be to pass the whole option structure (opts),
> >> but 'struct tcp_out_options' is currently defined in tcp_output.c, and
> >> would need to be exported. Plus that means the removal of the TCP TS
> >> option would be done on the MPTCP side, and not here on the TCP side.
> >> It feels clearer to remove other TCP options from the TCP side, than
> >> hiding that from the MPTCP side.
> >>
> >> Yet an other alternative would be to pass the size already taken by the
> >> other TCP options, and have a way to drop them all when needed. But this
> >> feels better to target only the timestamps option where dropping it
> >> should be safe, even if it is currently the only option that would be
> >> set before MPTCP, when MPTCP is used.
> >>
> >> Reviewed-by: Mat Martineau <martineau@xxxxxxxxxx>
> >> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
> >> ---
> >> To: Neal Cardwell <ncardwell@xxxxxxxxxx>
> >> To: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> >> ---
> >> include/net/mptcp.h | 3 ++-
> >> net/ipv4/tcp_output.c | 6 +++++-
> >> net/mptcp/options.c | 3 ++-
> >> 3 files changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> >> index f7263fe2a2e4..000b6593bfa4 100644
> >> --- a/include/net/mptcp.h
> >> +++ b/include/net/mptcp.h
> >> @@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
> >> struct mptcp_out_options *opts);
> >> bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> >> unsigned int *size, unsigned int remaining,
> >> - struct mptcp_out_options *opts);
> >> + bool *drop_ts, struct mptcp_out_options *opts);
> >> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
> >>
> >> void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> >> @@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
> >> struct sk_buff *skb,
> >> unsigned int *size,
> >> unsigned int remaining,
> >> + bool *drop_ts,
> >> struct mptcp_out_options *opts)
> >> {
> >> return false;
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index ef0c10cd31c7..53ee4c8f5f8c 100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> >> */
> >> if (sk_is_mptcp(sk)) {
> >> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> >> + bool drop_ts = opts->options & OPTION_TS;
> >> unsigned int opt_size = 0;
> >>
> >> if (mptcp_established_options(sk, skb, &opt_size, remaining,
> >> - &opts->mptcp)) {
> >> + &drop_ts, &opts->mptcp)) {
> >> opts->options |= OPTION_MPTCP;
> >> size += opt_size;
> >> +
> >> + if (drop_ts)
> >> + opts->options &= ~OPTION_TS;
> >> }
> >> }
> >
> > Passing local variables' addresses forces the compiler to use a stack
> > canary in this hot function, even for non-MPTCP flows.
> >
> > I was about to test the following patch, which removes the current
> > stack canary caused by MPTCP :/
>
> Sorry, I didn't know you were planning to do that.
>
> Would that be OK for you if I use an unused bit in opts->mptcp? It's a
> bit "hackish", but it avoids adding a new local variable address. Or do
> you have another idea?
>
> The modifications in net/ipv4/tcp_output.c would then be limited to:
>
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ef0c10cd31c7..f4edc9c4f3fc 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> */
> if (sk_is_mptcp(sk)) {
> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> + bool has_ts = opts->options & OPTION_TS;
> unsigned int opt_size = 0;
>
> if (mptcp_established_options(sk, skb, &opt_size, remaining,
> - &opts->mptcp)) {
> + has_ts, &opts->mptcp)) {
> opts->options |= OPTION_MPTCP;
> size += opt_size;
> +
> +#if IS_ENABLED(CONFIG_MPTCP)
> + if (opts->mptcp.drop_ts)
> + opts->options &= ~OPTION_TS;
> +#endif
SGTM, but maybe the IS_ENABLED() is not needed in this block,
guarded by if (sk_is_mptcp(sk)) ?
Also I am unsure opts->mptcp.drop_ts is cleared already before
reaching tcp_established_options()?
> }
> }
>
>
> I can also avoid adding a new parameter in mptcp_established_options
> (bool has_ts) by setting opts->mptcp.drop_ts before calling it, but
> that's clearer with this new parameter I think.
>
> > $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
> > add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92)
> > Function old new delta
> > tcp_options_write.isra 1423 1407 -16
> > mptcp_established_options 2746 2720 -26
> > tcp_established_options 553 503 -50
> > Total: Before=22110750, After=22110658, chg -0.00%
> >
>
> Good reduction!
>
> (...)
>
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f
> > 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1183,10 +1183,11 @@ static unsigned int
> > tcp_established_options(struct sock *sk, struct sk_buff *skb
> > unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> > unsigned int opt_size = 0;
> >
> > - if (mptcp_established_options(sk, skb, &opt_size, remaining,
> > - &opts->mptcp)) {
> > + opt_size = mptcp_established_options(sk, skb, remaining,
> > + &opts->mptcp);
> > + if (opt_size) {
> > opts->options |= OPTION_MPTCP;
> > - size += opt_size;
> > + size += (opt_size & 63);
>
> Nice trick! What about returning a negative number when the MPTCP option
> is not needed? Just to avoid playing with masks in the code?
Yes, that would work, thanks.
I can provide a patch in a couple of hours.