Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie_free()
From: Matt Fleming
Date: Mon Jun 30 2025 - 09:30:58 EST
On Fri, 27 Jun 2025 at 20:36, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> Good. Now you see my point, right?
> The cond_resched() doesn't fix the issue.
> 1hr to free a trie of 100M elements is horrible.
> Try 100M kmalloc/kfree to see that slab is not the issue.
> trie_free() algorithm is to blame. It doesn't need to start
> from the root for every element. Fix the root cause.
It doesn't take an hour to free 100M entries, the table showed it
takes about a minute (67 or 62 seconds).
I never claimed that kmalloc/kfree was at fault. I said that the loop
in trie_free() has no preemption, and that's a problem with tries with
millions of entries.
Of course, rewriting the algorithm used in the lpm trie code would
make this less of an issue. But this would require a major rework.
It's not as simple as improving trie_free() alone. FWIW I tried using
a recursive algorithm in trie_free() and the results are slightly
better, but it still takes multiple seconds to free 10M entries (4.3s)
and under a minute for 100M (56.7s). To fix this properly it's
necessary to use more than two children per node to reduce the height
of the trie. And in the meantime, anyone who uses maps with millions
of entries is gonna have the kthread spin in a loop without
preemption.
Thanks,
Matt