Re: [PATCH] jbd2: gracefully abort on checkpointing state corruptions
From: Milos Nikic
Date: Mon Mar 16 2026 - 18:18:04 EST
On Mon, Mar 16, 2026 at 10:34 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Mon 09-03-26 16:08:38, Milos Nikic wrote:
> > This patch targets two internal state machine invariants in checkpoint.c
> > residing inside functions that natively return integer error codes.
> >
> > - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely
> > corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE
> > and a graceful journal abort, returning -EUCLEAN.
> >
> > - In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for
> > an unexpected buffer_jwrite state. If the warning triggers, we
> > explicitly drop the just-taken get_bh() reference and call __flush_batch()
> > to safely clean up any previously queued buffers in the j_chkpt_bhs array,
> > preventing a memory leak before returning -EUCLEAN.
> >
> > Signed-off-by: Milos Nikic <nikic.milos@xxxxxxxxx>
>
> Looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
>
> Honza
Hi Jan,
Thank you for the review!
Just a quick heads-up: I recently sent a v2 of this patch to the list
to address some minor feedback from Baokun (specifically, changing
-EUCLEAN to -EFSCORRUPTED, and ensuring jbd2_journal_abort is called
after __flush_batch).
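
For anyone following along, here is a rough userspace model of that ordering. It is only a sketch under stated assumptions, not the actual v2 hunk. Every sketch_* name, the struct layouts, and the flushed_before_abort counter are hypothetical stand-ins. sketch_flush_batch() merely drops references where the real __flush_batch() submits the buffers for write-out. The j_list_lock handling is left out entirely. The EFSCORRUPTED fallback mirrors the kernel's aliasing of that name to EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Fallbacks so the sketch builds on hosts without these errno names. */
#ifndef EUCLEAN
#define EUCLEAN 117			/* Linux asm-generic value */
#endif
#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN		/* same alias the kernel uses */
#endif

#define SKETCH_BATCH_MAX 64		/* hypothetical batch size */

/* Hypothetical stand-in for a buffer_head: a refcount and one state bit. */
struct sketch_bh {
	int refcount;
	bool jwrite;			/* models buffer_jwrite(bh) */
};

/* Hypothetical stand-in for journal_t. */
struct sketch_journal {
	struct sketch_bh *chkpt_bhs[SKETCH_BATCH_MAX];
	int abort_errno;		/* 0 while the journal is healthy */
	int flushed_before_abort;	/* buffers released before any abort */
};

/* Models __flush_batch(): release every queued buffer, reset the count. */
void sketch_flush_batch(struct sketch_journal *j, int *batch_count)
{
	for (int i = 0; i < *batch_count; i++) {
		if (!j->abort_errno)
			j->flushed_before_abort++;
		j->chkpt_bhs[i]->refcount--;	/* drop the queueing reference */
		j->chkpt_bhs[i] = NULL;
	}
	*batch_count = 0;
}

/* Models jbd2_journal_abort(): record the first error only. */
void sketch_journal_abort(struct sketch_journal *j, int err)
{
	if (!j->abort_errno)
		j->abort_errno = err;
}

/*
 * Queue one buffer for checkpoint write-out. On an unexpected jwrite
 * state: drop the reference just taken, flush what was already queued,
 * and only then abort the journal.
 */
int sketch_queue_buffer(struct sketch_journal *j, struct sketch_bh *bh,
			int *batch_count)
{
	bh->refcount++;				/* models get_bh() */
	if (bh->jwrite) {
		bh->refcount--;			/* models put_bh() */
		if (*batch_count)
			sketch_flush_batch(j, batch_count);
		sketch_journal_abort(j, -EFSCORRUPTED);
		return -EFSCORRUPTED;
	}
	if (*batch_count == SKETCH_BATCH_MAX)
		sketch_flush_batch(j, batch_count);	/* make room first */
	j->chkpt_bhs[(*batch_count)++] = bh;
	return 0;
}
```

Queueing two healthy buffers and then one with jwrite set should leave every refcount at its starting value, an empty batch, and an abort recorded only after both queued buffers were released.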
Does your Reviewed-by still apply to v2? If so, I can send a quick v3
to formally collect your tag, or, if you prefer, you can pick up v2
from the list and add the tag there.
Thanks, Milos
>
> > ---
> > fs/jbd2/checkpoint.c | 17 +++++++++++++++--
> > 1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> > index de89c5bef607..cdfbfd27afae 100644
> > --- a/fs/jbd2/checkpoint.c
> > +++ b/fs/jbd2/checkpoint.c
> > @@ -267,7 +267,17 @@ int jbd2_log_do_checkpoint(journal_t *journal)
> >  		 */
> >  		BUFFER_TRACE(bh, "queue");
> >  		get_bh(bh);
> > -		J_ASSERT_BH(bh, !buffer_jwrite(bh));
> > +		if (WARN_ON_ONCE(buffer_jwrite(bh))) {
> > +			put_bh(bh); /* drop the ref we just took */
> > +			spin_unlock(&journal->j_list_lock);
> > +			jbd2_journal_abort(journal, -EUCLEAN);
> > +
> > +			/* Clean up any previously batched buffers */
> > +			if (batch_count)
> > +				__flush_batch(journal, &batch_count);
> > +
> > +			return -EUCLEAN;
> > +		}
> >  		journal->j_chkpt_bhs[batch_count++] = bh;
> >  		transaction->t_chp_stats.cs_written++;
> >  		transaction->t_checkpoint_list = jh->b_cpnext;
> > @@ -325,7 +335,10 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
> >
> >  	if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
> >  		return 1;
> > -	J_ASSERT(blocknr != 0);
> > +	if (WARN_ON_ONCE(blocknr == 0)) {
> > +		jbd2_journal_abort(journal, -EUCLEAN);
> > +		return -EUCLEAN;
> > +	}
> >
> >  	/*
> >  	 * We need to make sure that any blocks that were recently written out
> > --
> > 2.53.0
> >
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR