Re: [PATCH v4 1/1] ocfs2: split transactions in dio completion to avoid credit exhaustion
From: Joseph Qi
Date: Thu Mar 26 2026 - 23:16:12 EST
On 3/27/26 11:02 AM, Heming Zhao wrote:
> On Fri, Mar 27, 2026 at 09:42:55AM +0800, Joseph Qi wrote:
>> Hi,
>>
>> On 3/26/26 10:26 PM, Heming Zhao wrote:
>>> During ocfs2 dio operations, JBD2 may report warnings via following call trace:
>>> ocfs2_dio_end_io_write
>>> ocfs2_mark_extent_written
>>> ocfs2_change_extent_flag
>>> ocfs2_split_extent
>>> ocfs2_try_to_merge_extent
>>> ocfs2_extend_rotate_transaction
>>> ocfs2_extend_trans
>>> jbd2__journal_restart
>>> start_this_handle
>>> output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
>>>
>>> To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write()
>>> to handle extent in a batch of transaction.
>>>
>>> Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should
>>> only be removed from the orphan list after the extent tree update is complete.
>>> this ensures that if a crash occurs in the middle of extent tree updates, we
>>> won't leave stale blocks beyond EOF.
>>>
>>> Finally, thanks to Jans and Joseph for providing the bug fix prototype and
>>> suggestions.
>>>
>>> Suggested-by: Jan Kara <jack@xxxxxxx>
>>> Suggested-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
>>> Reviewed-by: Jan Kara <jack@xxxxxxx>
>>> Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
>>> ---
>>> fs/ocfs2/aops.c | 72 ++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 44 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 09146b43d1f0..60f1b607022f 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -37,6 +37,8 @@
>>> #include "namei.h"
>>> ... ... <--- snip --->
>>> + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
>>> + ocfs2_commit_trans(osb, handle);
>>> + handle = NULL;
>>> + batch = 0;
>>> + }
>>> }
>>>
>>> if (end > i_size_read(inode)) {
>>
>> I still don't think it is a good idea to update inode size in case error.
>> The original logic behaves inconsistent, if ocfs2_start_trans() and
>> ocfs2_journal_access_di() fails, it won't update inode size, but if
>> ocfs2_assure_trans_credits() and ocfs2_mark_extent_written(), it will do.
>> So let's make it behave consistent by both checking 'ret' here.
>>
>> Other looks fine.
>>
>> Joseph
>
> After sending the v4 patch, I did some tests and identified the key of the
> performance degradation in the dynamic credit adjustment patch.
> I found that by using half of the jbd2_max value, the time consumption returned
> to the same level as the "direct loop" style.
>
> key code change:
> (base on patch[1]: 0001-ocfs2-dynamic-extending-jbd2-credits-during-dio-end-.patch)
> int jbd2_max = (journal->j_max_transaction_buffers -
> journal->j_transaction_overhead_buffers) >> 2;
>
> my test env:
> - 10G ocfs2 volume
> - DIO 2G file, write with 64MB block (--bs=64M).
>
> observations:
> - ocfs2_assure_trans_credits calls "commit & start trans" times on every 64MB block:
> - using jbd2_max (5449): ~9 times.
> - using half value (2724): ~20 times.
> - v4 patch ("batch" style), call "commit & start trans" on every 64MB block: ~62 times
>
> time consumption:
>
> jbd2_max:5449 case
> real 2m9.346s
> user 0m0.277s
> sys 0m22.748s
>
> half jbd2_max:2724 case
> real 1m49.400s
> user 0m0.343s
> sys 0m22.734s
>
> v4 patch:
> real 1m49.650s
> user 0m0.337s
> sys 0m22.780s
>
>
> the reason (I guess):
> The original patch only performed a "commit and start new trans" when jbd2
> credits were nearly exhausted. I suspect contention between the commit flush
> operation and the jbd2 operations in the dio end path, leading to high latency.
> By using half the jbd2_max value, trigger commits more frequently, which appears
> to reduce jbd2 contention and racing.
>
> Finally, I suggest we use the dynamic credit extension patch for v5. This
> version only adds logic to ocfs2_assure_trans_credits() and does not affect
> i_size changes during error cases.
>
Ummm... I think v4 looks simpler and direct.
For i_size update logic, I think the existing logic looks wried. Take a
look at ext4, it also doesn't do this if converting unwritten extents
fails.
Joseph