Re: [PATCH 06/14] sunrpc: add helpers to count and snapshot pending cache requests

From: Chuck Lever

Date: Thu Mar 19 2026 - 14:48:12 EST

On 3/19/26 2:22 PM, Jeff Layton wrote:
> On Thu, 2026-03-19 at 14:07 -0400, Chuck Lever wrote:
>> On 3/16/26 11:14 AM, Jeff Layton wrote:
>>> Add sunrpc_cache_requests_count() and sunrpc_cache_requests_snapshot()
>>> to allow callers to count and snapshot the pending upcall request list
>>> without exposing struct cache_request outside of cache.c.
>>>
>>> Both functions skip entries that no longer have CACHE_PENDING set.
>>>
>>> The snapshot function takes a cache_get() reference on each item so the
>>> caller can safely use them after the queue_lock is released.
>>>
>>> These will be used by the nfsd generic netlink dumpit handler for
>>> svc_export upcall requests.
>>>
>>> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>>> ---
>>> include/linux/sunrpc/cache.h | 5 ++++
>>> net/sunrpc/cache.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 62 insertions(+)
>>>
>>> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
>>> index c358151c23950ab48e83991c6138bb7d0e049ace..343f0fb675d761e019989e47e03645484e0aa30f 100644
>>> --- a/include/linux/sunrpc/cache.h
>>> +++ b/include/linux/sunrpc/cache.h
>>> @@ -251,6 +251,11 @@ extern int sunrpc_cache_register_pipefs(struct dentry *parent, const char *,
>>> extern void sunrpc_cache_unregister_pipefs(struct cache_detail *);
>>> extern void sunrpc_cache_unhash(struct cache_detail *, struct cache_head *);
>>>
>>> +int sunrpc_cache_requests_count(struct cache_detail *cd);
>>> +int sunrpc_cache_requests_snapshot(struct cache_detail *cd,
>>> + struct cache_head **items,
>>> + u64 *seqnos, int max);
>>> +
>>> /* Must store cache_detail in seq_file->private if using next three functions */
>>> extern void *cache_seq_start_rcu(struct seq_file *file, loff_t *pos);
>>> extern void *cache_seq_next_rcu(struct seq_file *file, void *p, loff_t *pos);
>>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>>> index 819f12add8f26562fdc6aaa200f55dec0180bfbc..2a78560edee5ca07be0fce90f87ce43a19df245f 100644
>>> --- a/net/sunrpc/cache.c
>>> +++ b/net/sunrpc/cache.c
>>> @@ -1906,3 +1906,60 @@ void sunrpc_cache_unhash(struct cache_detail *cd, struct cache_head *h)
>>> spin_unlock(&cd->hash_lock);
>>> }
>>> EXPORT_SYMBOL_GPL(sunrpc_cache_unhash);
>>> +
>>> +/**
>>> + * sunrpc_cache_requests_count - count pending upcall requests
>>> + * @cd: cache_detail to query
>>> + *
>>> + * Returns the number of requests on the cache's request list that
>>> + * still have CACHE_PENDING set.
>>> + */
>>> +int sunrpc_cache_requests_count(struct cache_detail *cd)
>>> +{
>>> + struct cache_request *crq;
>>> + int cnt = 0;
>>> +
>>> + spin_lock(&cd->queue_lock);
>>> + list_for_each_entry(crq, &cd->requests, list) {
>>> + if (test_bit(CACHE_PENDING, &crq->item->flags))
>>> + cnt++;
>>> + }
>>> + spin_unlock(&cd->queue_lock);
>>> + return cnt;
>>> +}
>>> +EXPORT_SYMBOL_GPL(sunrpc_cache_requests_count);
>>> +
>>> +/**
>>> + * sunrpc_cache_requests_snapshot - snapshot pending upcall requests
>>> + * @cd: cache_detail to query
>>> + * @items: array to fill with cache_head pointers (caller-allocated)
>>> + * @seqnos: array to fill with sequence numbers (caller-allocated)
>>> + * @max: size of the arrays
>>> + *
>>> + * Only entries with CACHE_PENDING set are included. Takes a reference
>>> + * on each cache_head via cache_get(). Caller must call cache_put()
>>> + * on each returned item when done.
>>> + *
>>> + * Returns the number of entries filled.
>>> + */
>>> +int sunrpc_cache_requests_snapshot(struct cache_detail *cd,
>>> + struct cache_head **items,
>>> + u64 *seqnos, int max)
>>> +{
>>> + struct cache_request *crq;
>>> + int i = 0;
>>> +
>>> + spin_lock(&cd->queue_lock);
>>> + list_for_each_entry(crq, &cd->requests, list) {
>>> + if (i >= max)
>>> + break;
>>> + if (!test_bit(CACHE_PENDING, &crq->item->flags))
>>> + continue;
>>> + items[i] = cache_get(crq->item);
>>> + seqnos[i] = crq->seqno;
>>> + i++;
>>> + }
>>> + spin_unlock(&cd->queue_lock);
>>> + return i;
>>> +}
>>> +EXPORT_SYMBOL_GPL(sunrpc_cache_requests_snapshot);
>>>
>>
>> This API architecture introduces a TOCTOU, since as soon as the
>> queue_lock is dropped, the count can immediately become stale.
>>
>> The count returned by sunrpc_cache_requests_count() is used as the array
>> bound. To wit:
>>
>> cnt = sunrpc_cache_requests_count(cd);
>>
>> items = kcalloc(cnt, sizeof(*items), GFP_KERNEL);
>>
>> seqnos = kcalloc(cnt, sizeof(*seqnos), GFP_KERNEL);
>>
>> cnt = sunrpc_cache_requests_snapshot(cd, items, seqnos, cnt);
>>
>>
>>
>> This appears in all four dumpit handlers (ip_map, unix_gid, svc_export,
>> expkey).
>>
>> This isn't a memory safety issue; _snapshot() caps its output at max, so
>> if the list grows between the two calls, entries are silently truncated
>> rather than overflowing the buffer. If the list shrinks, the arrays are
>> merely oversized.
>>
>> However, the practical risk is incomplete data returned to userspace. If
>> the caller is guaranteed to be re-invoked (e.g., the netlink dumpit
>> callback gets called again until it returns 0), then missing items due
>> to list growth between count() and snapshot() is harmless: they'll be
>> picked up on the next pass.
>>
>> But looking at the callers, they all do this:
>>
>> if (cb->args[0])
>> return 0;
>>
>> and then set cb->args[0] after the single snapshot dump.
>>
>> The dumpit is a one-shot: it snapshots once and signals completion. If
>> the list grows between count() and snapshot(), the truncated entries are
>> silently lost and there's no subsequent pass to pick them up, IIUC.
>>
>
> Userland will receive a separate notify request whenever a
> cache_request is queued, and that notification is only sent after the
> cache_request is queued.
>
> So while it might not receive all of the queued requests from the
> kernel in the race you describe, it won't matter because select() will
> return the notify descriptor again on the next pass and mountd will
> pick up the remaining entries at that point.

Makes sense. But perhaps should be documented somewhere; the "retry"
loop is not obvious from the source code introduced here.

"Caller is expected to ..." or some such language.

--
Chuck Lever