Re: [PATCH] bpf: Simplify tnum_step()

From: Kumar Kartikeya Dwivedi

Date: Thu Mar 19 2026 - 04:21:25 EST

On Wed, 18 Mar 2026 at 19:21, Hao Sun <sunhao.th@xxxxxxxxx> wrote:
>
> Simplify tnum_step() from a 10-variable algorithm into a straight
> line sequence of bitwise operations.
>
> tnum_step(): Given a tnum `(tval, tmask)` where `tval & tmask == 0`,
> and a value `z` with `tval ≤ z < (tval | tmask)`, find the smallest
> `r > z`, a tnum-satisfying value, i.e., `r & ~tmask == tval`.
>
> Every tnum-satisfying value has the form tval | s where s is a subset
> of tmask bits (s & ~tmask == 0). Since tval and tmask are disjoint:
>
> tval | s = tval + s
>
> Similarly z = tval + d where d = z - tval, so r > z becomes:
>
> tval + s > tval + d
> s > d
>
> The problem reduces to: find the smallest s, a subset of tmask, such
> that s > d.
>
> Notice that `s` must be a subset of tmask, the problem now is simplified.
>
> The mask bits of `d` form a "counter" that we want to increment by one,
> but the counter has gaps at the fixed-bit positions. A normal +1 would
> stop at the first 0-bit it meets; we need it to skip over fixed-bit
> gaps and land on the next mask bit.
>
> Step 1 -- plug the gaps:
>
> d | carry_mask | ~tmask
>
> - ~tmask fills all fixed-bit positions with 1.
> - carry_mask = (1 << fls64(d & ~tmask)) - 1 fills all positions
> (including mask positions) below the highest non-mask bit of d.
>
> After this, the only remaining 0s are mask bits above the highest
> non-mask bit of d where d is also 0 -- exactly the positions where
> the carry can validly land.
>
> Step 2 -- increment:
>
> (d | carry_mask | ~tmask) + 1
>
> Adding 1 flips all trailing 1s to 0 and sets the first 0 to 1. Since
> every gap has been plugged, that first 0 is guaranteed to be a mask bit
> above all non-mask bits of d.
>
> Step 3 -- mask:
>
> ((d | carry_mask | ~tmask) + 1) & tmask
>
> Strip the scaffolding, keeping only mask bits. Call the result inc.
>
> Step 4 -- result:
>
> tval | inc
>
> Reattach the fixed bits.
>
> A simple 8-bit example:
> tmask: 1 1 0 1 0 1 1 0
> d: 1 0 1 0 0 0 1 0 (d = 162)
> ^
> non-mask 1 at bit 5
>
> With carry_mask = 0b00111111 (smeared from bit 5):
>
> d|carry|~tm 1 0 1 1 1 1 1 1
> + 1 1 1 0 0 0 0 0 0
> & tmask 1 1 0 0 0 0 0 0
>
> The patch passes my local test: test_verifier, test_prog for
> `-t verifier` and `-t reg_bounds`.
>
> Signed-off-by: Hao Sun <hao.sun@xxxxxxxxxxx>
> ---
> The original algorithm is not intuitive to me, let me know if you
> spot any inconsistency.
>
> A Lean4 proof for the correctness of the algorithm is also available
> in case anyone is interested:
> [1] https://pastebin.com/raw/czHKiyY0

IMO it is worth it to include this proof inline in the commit log,
since links are fragile.
It's not that big, and I think it's more useful to have it inline than not.

[...]