Re: [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state()
From: Harry Yoo (Oracle)
Date: Tue Mar 24 2026 - 21:44:51 EST
On Tue, Mar 24, 2026 at 07:31:28PM +0800, Qi Zheng wrote:
> From: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
>
> The __mod_memcg_state() and __mod_memcg_lruvec_state() were used to
> reparent non-hierarchical stats, the values passed to them might exceed
> the upper limit of the type int, so correct the val parameter type of them
> to long.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
> ---
> include/trace/events/memcg.h | 10 +++++-----
> mm/memcontrol.c | 8 ++++----
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7fb9cbc10dfbb..4a78550f6174e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -527,7 +527,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
>
> #ifdef CONFIG_MEMCG_V1
> static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
> - enum node_stat_item idx, int val);
> + enum node_stat_item idx, long val);
>
> void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
> struct mem_cgroup *parent, int idx)
> @@ -784,7 +784,7 @@ static int memcg_page_state_unit(int item);
> * Normalize the value passed into memcg_rstat_updated() to be in pages. Round
> * up non-zero sub-page updates to 1 page as zero page updates are ignored.
> */
> -static int memcg_state_val_in_pages(int idx, int val)
> +static long memcg_state_val_in_pages(int idx, long val)
> {
> int unit = memcg_page_state_unit(idx);
Sashiko AI made an interesting argument [1] that this could lead to
incorrectly returning a very large positive number. Let me verify that.
[1] https://sashiko.dev/#/patchset/cover.1774342371.git.zhengqi.arch%40bytedance.com
Sashiko wrote:
> Does this change inadvertently break the handling of negative byte-sized
> updates?
> Looking at the rest of the function:
> if (!val || unit == PAGE_SIZE)
> return val;
> else
> return max(val * unit / PAGE_SIZE, 1UL);
> PAGE_SIZE is defined as an unsigned long.
Right, it's defined as 1UL << PAGE_SHIFT.
> When val is negative, such as during uncharging of byte-sized stats like
> MEMCG_ZSWAP_B, the expression val * unit is a negative long.
Right.
> Dividing a signed long by an unsigned long causes the signed long to be
> promoted to unsigned before division,
Right.
> resulting in a massive positive
> number instead of a small negative one.
Let's look at an example (assuming unit is 1).
val = val * unit = -16384 (-16 KiB)
val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
max(0x3FFFFFFFFFFFFF, 1UL) = 0x3FFFFFFFFFF
Yeah, that's a massive positive number.
Hmm but how did it work when it was int?
val = val * unit = -16384 (-16KiB)
val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
max(val * unit / PAGE_SIZE, 1UL) = 0x3FFFFFFFFFFFFF
(int)0x3FFFFFFFFFFFFF = 0xFFFFFFFF = (-1)
That's incorrect. It should have been -4?
> Before this change, the function returned an int, which implicitly truncated
> the massive unsigned 64-bit result to a 32-bit int, accidentally yielding the
> correct negative arithmetic value.
So... "accidentally yielding the correct negative arithemetic value"
is wrong.
Sounds like it's been subtly broken even before this patch and nobody
noticed.
> By changing the return type to long, this implicit truncation is removed,
> and the huge positive value is returned unaltered.
That's true.
> Could this corrupt tracepoint logs when passed to trace_mod_memcg_state?
I'm not sure if that's critical but yeah that's true.
> Also, would passing this huge positive value to memcg_rstat_updated instantly
> exceed the charge batch threshold and trigger endless, expensive global
> cgroup_rstat flushing, severely degrading system performance?
It would lead to more frequent flushes, at least.
--
Cheers,
Harry / Hyeonggon