Re: [REGRESSION] [PATCH] ceph: fix num_ops OBOE when crypto allocation fails
From: Sam Edwards
Date: Tue Mar 17 2026 - 15:14:50 EST
On Tue, Mar 17, 2026 at 11:51 AM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
> > If this is just a general question about the patch, then I don't know
> > of a way to trigger the issue in a short timeframe, but something like
> > this ought to work:
> > 1. Create a reasonably-sized (e.g. 4GiB) fscrypt-protected file in CephFS
> > 2. Put the CephFS client system under heavy memory pressure, so that
> > bounce page allocation is more likely to fail
> > 3. Repeatedly write to the file in a 4KiB-written/4KiB-skipped
> > pattern, starting over upon getting to the end of the file
> > 4. Wait for the system to panic, gradually ramping up the memory
> > pressure until it does
> >
> > I run a workload that performs fairly random I/O atop CephFS+fscrypt.
> > Before this patch, I'd get a panic after about a day. After this
> > patch, I've been running for 4+ days without this particular issue
> > reappearing.
> >
>
> I think this is a good enough description of how the issue can be triggered. And I
> believe that the commit message deserves to include this description.
Very well, I'll try to condense it in a way that makes it clear to
those trying to repro the crash without being overly verbose.
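For reference, here is a minimal userspace sketch of the write pattern from step 3. It assumes an already-created fscrypt-protected file on a CephFS mount, opened read-write by the caller. The function names, the 0xA5 fill byte, and the choice to treat short writes as errors are illustrative assumptions, not part of the patch or of the workload described above. It does nothing to create the memory pressure from steps 2 and 4; that has to come from elsewhere.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t so a 4 GiB file fits on 32-bit too */
#define _XOPEN_SOURCE 700     /* for pread()/pwrite() */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096 /* 4 KiB written, then 4 KiB skipped */

/* One pass over the file: write a 4 KiB chunk, skip the next 4 KiB,
 * and repeat until file_size is reached.
 * Returns the number of chunks written, or -1 on error (errno set). */
static long write_skip_pass(int fd, off_t file_size)
{
    char buf[CHUNK];
    long written = 0;

    memset(buf, 0xA5, sizeof(buf)); /* arbitrary non-zero fill byte */
    for (off_t off = 0; off + CHUNK <= file_size; off += 2 * CHUNK) {
        ssize_t n = pwrite(fd, buf, CHUNK, off);
        if (n < 0)
            return -1;      /* errno already set by pwrite() */
        if (n != CHUNK) {
            errno = EIO;    /* this sketch treats a short write as an error */
            return -1;
        }
        written++;
    }
    return written;
}

/* Repeat the pass, starting over at the beginning of the file each time.
 * passes == 0 means loop until an error occurs. */
static int run_repro_loop(int fd, off_t file_size, unsigned long passes)
{
    for (unsigned long i = 0; passes == 0 || i < passes; i++) {
        if (write_skip_pass(fd, file_size) < 0) {
            perror("pwrite");
            return -1;
        }
    }
    return 0;
}
```

With a descriptor for the 4 GiB file from step 1, run_repro_loop(fd, (off_t)4 << 30, 0) would keep cycling over the file until a write fails; the path and open flags are left to whoever runs it.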
> Frankly speaking, I am trying to reproduce the issue [1]. Do you think that it
> could be the same issue?
Please double-check; the link you sent is to bug #74156, reported last
year. This regression was only introduced last month, so it couldn't
be the same issue. Did you send the wrong link?
> > > > BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
>
> I believe that it would be great to have a link to the particular location of
> this code in the commit message.
I strongly disagree: the location of the code changes with every
commit that adds or removes lines above it (including this patch), so
such a link would go stale immediately. What is your reason for
believing the link is useful?
> > > I don't quite follow. We decrement ceph_wbc->num_ops but BUG_ON() operates on
> > > req->r_num_ops. How does req->r_num_ops receive the value of ceph_wbc->num_ops?
> >
> > ceph_submit_write() passes ceph_wbc->num_ops to ceph_osdc_new_request()...
>
> I think it makes sense to mention it in the commit message.
NACK; that relationship is already memorialized in addr.c. But again,
I'm interested to learn your reasoning.
> I think that it makes sense to create an issue in the Ceph tracker and to add
> a Closes: tag to the fix.
I don't currently have a Ceph tracker account and don't think I can
add anything of substance to an issue report. Feel free to create the
issue on my behalf if it's important for Ceph's processes, and I can
add a Closes: tag for it in v2.
Cheers,
Sam