Re: [PATCH 2/3] mm: memcontrol: correct the parameter type of __mod_memcg{_lruvec}_state()
From: Qi Zheng
Date: Wed Mar 25 2026 - 03:27:20 EST
On 3/25/26 1:17 PM, Harry Yoo (Oracle) wrote:
On Wed, Mar 25, 2026 at 11:25:06AM +0800, Qi Zheng wrote:
On 3/25/26 9:43 AM, Harry Yoo (Oracle) wrote:
On Tue, Mar 24, 2026 at 07:31:28PM +0800, Qi Zheng wrote:
From: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
The __mod_memcg_state() and __mod_memcg_lruvec_state() were used to
reparent non-hierarchical stats, the values passed to them might exceed
the upper limit of the type int, so correct the val parameter type of them
to long.
Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
---
include/trace/events/memcg.h | 10 +++++-----
mm/memcontrol.c | 8 ++++----
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7fb9cbc10dfbb..4a78550f6174e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -527,7 +527,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
#ifdef CONFIG_MEMCG_V1
static void __mod_memcg_lruvec_state(struct mem_cgroup_per_node *pn,
- enum node_stat_item idx, int val);
+ enum node_stat_item idx, long val);
void reparent_memcg_lruvec_state_local(struct mem_cgroup *memcg,
struct mem_cgroup *parent, int idx)
@@ -784,7 +784,7 @@ static int memcg_page_state_unit(int item);
* Normalize the value passed into memcg_rstat_updated() to be in pages. Round
* up non-zero sub-page updates to 1 page as zero page updates are ignored.
*/
-static int memcg_state_val_in_pages(int idx, int val)
+static long memcg_state_val_in_pages(int idx, long val)
{
int unit = memcg_page_state_unit(idx);
Sashiko AI made an interesting argument [1] that this could lead to
incorrectly returning a very large positive number. Let me verify that.
[1] https://sashiko.dev/#/patchset/cover.1774342371.git.zhengqi.arch%40bytedance.com
Sashiko wrote:
Does this change inadvertently break the handling of negative byte-sized
updates?
Looking at the rest of the function:
if (!val || unit == PAGE_SIZE)
return val;
else
return max(val * unit / PAGE_SIZE, 1UL);
PAGE_SIZE is defined as an unsigned long.
Right, it's defined as 1UL << PAGE_SHIFT.
When val is negative, such as during uncharging of byte-sized stats like
MEMCG_ZSWAP_B, the expression val * unit is a negative long.
Right.
Dividing a signed long by an unsigned long causes the signed long to be
promoted to unsigned before division,
Right.
resulting in a massive positive
number instead of a small negative one.
Let's look at an example (assuming unit is 1).
val = val * unit = -16384 (-16 KiB)
val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
max(0x3FFFFFFFFFFFFF, 1UL) = 0x3FFFFFFFFFF
Yeah, that's a massive positive number.
Hmm but how did it work when it was int?
val = val * unit = -16384 (-16KiB)
val * unit / PAGE_SIZE = 0xFFFFFFFFFFFFC000 / PAGE_SIZE = 0x3FFFFFFFFFFFFF
max(val * unit / PAGE_SIZE, 1UL) = 0x3FFFFFFFFFFFFF
(int)0x3FFFFFFFFFFFFF = 0xFFFFFFFF = (-1)
That's incorrect. It should have been -4?
Before this change, the function returned an int, which implicitly truncated
the massive unsigned 64-bit result to a 32-bit int, accidentally yielding the
correct negative arithmetic value.
So... "accidentally yielding the correct negative arithemetic value"
is wrong.
Sounds like it's been subtly broken even before this patch and nobody
noticed.
Thank you for such a detailed analysis! And I think you are right.
No problem ;)
The memcg_state_val_in_pages() is only to make @val to be in pages, so
perhaps we can avoid the above problem by taking the absolute value
first?
For more clearly, here is the diff:
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6491ca74b3398..2b34805b01476 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -787,11 +787,17 @@ static int memcg_page_state_unit(int item);
static long memcg_state_val_in_pages(int idx, long val)
{
int unit = memcg_page_state_unit(idx);
+ unsigned long uval;
+ long res;
if (!val || unit == PAGE_SIZE)
return val;
- else
- return max(val * unit / PAGE_SIZE, 1UL);
+
+ uval = val < 0 ? -(unsigned long)val : (unsigned long)val;
+
+ res = max(mult_frac(uval, unit, PAGE_SIZE), 1UL);
+
+ return val < 0 ? -res : res;
}
#ifdef CONFIG_MEMCG_V1
That would work for memcg_rstat_updated(),
but not for trace_mod_memcg_state()?
Why? The trace_mod_memcg_state() seems to accept 'long' type,
am I missing something?
Thanks,
Qi