Re: [RFC PATCH] net/core: use wake_up_interruptible_poll() in sock_def_readable()

From: Xuewen Yan

Date: Thu Jun 04 2026 - 03:29:39 EST


Hello everyone,
Any comments about this?

Alternatively, we could keep the sync wakeup only in interrupt context and
use the non-sync variant in process context:
---
diff --git a/net/core/sock.c b/net/core/sock.c
index b37b664b6eb9..a46334266e86 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3610,9 +3610,14 @@ void sock_def_readable(struct sock *sk)

 	rcu_read_lock();
 	wq = rcu_dereference(sk->sk_wq);
-	if (skwq_has_sleeper(wq))
-		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+	if (skwq_has_sleeper(wq)) {
+		if (in_interrupt())
+			wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+						EPOLLRDNORM | EPOLLRDBAND);
+		else
+			wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
 						EPOLLRDNORM | EPOLLRDBAND);
+	}
 	sk_wake_async_rcu(sk, SOCK_WAKE_WAITD, POLL_IN);
 	rcu_read_unlock();
 }
@@ -3628,9 +3633,14 @@ static void sock_def_write_space(struct sock *sk)
 	 */
 	if (sock_writeable(sk)) {
 		wq = rcu_dereference(sk->sk_wq);
-		if (skwq_has_sleeper(wq))
-			wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
+		if (skwq_has_sleeper(wq)) {
+			if (in_interrupt())
+				wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
+						EPOLLWRNORM | EPOLLWRBAND);
+			else
+				wake_up_interruptible_poll(&wq->wait, EPOLLOUT |
 						EPOLLWRNORM | EPOLLWRBAND);
+		}
 
 		/* Should agree with poll, otherwise some programs break */
 		sk_wake_async_rcu(sk, SOCK_WAKE_SPACE, POLL_OUT);
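
For anyone who wants to exercise the process-context path from userspace,
here is a minimal sketch. It is only an illustration: the helper name
unix_stream_pump() and its parameters are hypothetical and not part of the
patch. A forked receiver blocks in read() on an AF_UNIX stream socketpair
while the parent keeps calling write(), so wakeups of a sleeping receiver
should come from the vfs_write -> sock_write_iter -> unix_stream_sendmsg ->
sock_def_readable path shown in the quoted mail. It only checks that all
bytes arrive; timing and CPU pinning would have to be added to measure
throughput.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): send `rounds` messages of
 * `msg_size` bytes over an AF_UNIX stream socketpair to a forked receiver
 * that blocks in read(). Returns the byte count reported by the receiver,
 * or -1 on error.
 */
static long long unix_stream_pump(int rounds, size_t msg_size)
{
	int sv[2], report[2];
	char buf[4096];
	long long received = -1;
	pid_t pid;

	assert(rounds >= 0 && msg_size > 0 && msg_size <= sizeof(buf));

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(report) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		close(report[0]);
		close(report[1]);
		return -1;
	}

	if (pid == 0) {
		/* Receiver: sleeps in read() until the sender queues data. */
		long long total = 0;
		ssize_t n;

		close(sv[0]);
		close(report[0]);
		while ((n = read(sv[1], buf, sizeof(buf))) > 0)
			total += n;
		if (write(report[1], &total, sizeof(total)) != (ssize_t)sizeof(total))
			_exit(1);
		_exit(0);
	}

	/* Sender: runs in process context and keeps running after each write(). */
	close(sv[1]);
	close(report[1]);
	memset(buf, 'x', msg_size);
	for (int i = 0; i < rounds; i++) {
		size_t off = 0;

		while (off < msg_size) {
			ssize_t n = write(sv[0], buf + off, msg_size - off);

			if (n <= 0)
				goto out;	/* give up on the first error */
			off += (size_t)n;
		}
	}
out:
	close(sv[0]);	/* receiver sees EOF and reports its total */
	if (read(report[0], &received, sizeof(received)) != (ssize_t)sizeof(received))
		received = -1;
	close(report[0]);
	waitpid(pid, NULL, 0);
	return received;
}
```

Calling unix_stream_pump(1000, 512) from a small main() should return
512000 if every byte was delivered.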

On Tue, May 26, 2026 at 2:37 PM Xuewen Yan <xuewen.yan@xxxxxxxxxx> wrote:
>
> sock_def_readable() currently uses wake_up_interruptible_sync_poll() to
> wake up tasks waiting for readable data on a socket. The _sync variant
> sets the WF_SYNC flag, which tells the scheduler that the waker will
> schedule away soon, so the wakee should stay on the same CPU to avoid
> needless cache bouncing.
>
> However, we observed the following call stack:
> -vfs_write
> -sock_write_iter
> -unix_stream_sendmsg
> -sock_def_readable
> -__wake_up_sync_key
>
> In this process-context scenario, the waker does NOT go to sleep
> after the wakeup. With WF_SYNC, the scheduler is misled into placing
> the wakee on the waker's CPU (via wake_affine_idle()'s sync path when
> nr_running == 1), causing both the sender and receiver to contend for
> the same CPU. This may hurt throughput for IPC workloads on multi-core
> systems where the sender and receiver could otherwise run in parallel
> on different CPUs.
>
> Switch to wake_up_interruptible_poll() which does not set WF_SYNC.
> This allows the scheduler to freely migrate the wakee to an idle CPU,
> enabling true parallelism between the sending and receiving processes.
>
> Co-developed-by: Guohua Yan <guohua.yan@xxxxxxxxxx>
> Signed-off-by: Guohua Yan <guohua.yan@xxxxxxxxxx>
> Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> ---
> Note:
> The possible cost is that for softirq callers (TCP/UDP receive path), the wakee
> may be migrated away from the current CPU where the received data is
> cache-hot. However, this is mitigated by the scheduler's existing
> wake_affine logic, which already considers cache affinity regardless of
> WF_SYNC.
>
> We are not very familiar with the networking code here,
> so we would greatly appreciate any suggestions or advice from the community.
>
> Thanks!
> ---
> net/core/sock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b37b664b6eb9..42ab9373194f 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3611,7 +3611,7 @@ void sock_def_readable(struct sock *sk)
>  	rcu_read_lock();
>  	wq = rcu_dereference(sk->sk_wq);
>  	if (skwq_has_sleeper(wq))
> -		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
> +		wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
>  						EPOLLRDNORM | EPOLLRDBAND);
>  	sk_wake_async_rcu(sk, SOCK_WAKE_WAITD, POLL_IN);
>  	rcu_read_unlock();
> --
> 2.25.1
>