Re: [RFC PATCH v6 3/5] mm: Hot page tracking and promotion - pghot
From: Donet Tom
Date: Thu Apr 30 2026 - 03:07:25 EST
Hi Bharata
On 4/27/26 10:54 AM, Bharata B Rao wrote:
On 24-Apr-26 6:27 PM, Donet Tom wrote:
I am accumulating two stats here: How many hot page intimations pghot obtained+int pghot_record_access(unsigned long pfn, int nid, int src, unsigned long now)
+{
+ struct mem_section *ms;
+ struct folio *folio;
+ phi_t *phi, *hot_map;
+ struct page *page;
+
+ if (!kmigrated_started)
+ return 0;
+
+ if (!pghot_nid_valid(nid))
+ return -EINVAL;
+
+ switch (src) {
+ case PGHOT_HINTFAULTS:
+ if (!static_branch_unlikely(&pghot_src_hintfaults))
+ return 0;
+ count_vm_event(PGHOT_RECORDED_HINTFAULTS);
+ break;
+ case PGHOT_HWHINTS:
+ if (!static_branch_unlikely(&pghot_src_hwhints))
+ return 0;
+ count_vm_event(PGHOT_RECORDED_HWHINTS);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /*
+ * Record only accesses from lower tiers.
+ */
+ if (node_is_toptier(pfn_to_nid(pfn)))
+ return 0;
Just a thought—could we check this at the beginning of the function, before the
switch case?
that are attributable to different sources
and
out of them, how many turned out to be actionable
Understood. Thanks for the clarification.
which is this ^
+
+ /*
+ * Reject the non-migratable pages right away.
+ */
+ page = pfn_to_online_page(pfn);
+ if (!page || is_zone_device_page(page))
+ return 0;
+
+ folio = page_folio(page);
+ if (!folio_try_get(folio))
+ return 0;
+
+ if (unlikely(page_folio(page) != folio))
+ goto out;
+
+ if (!folio_test_lru(folio))
+ goto out;
+
+ /* Get the hotness slot corresponding to the 1st PFN of the folio */
+ pfn = folio_pfn(folio);
+ ms = __pfn_to_section(pfn);
+ if (!ms || !ms->hot_map)
+ goto out;
+
+ hot_map = (phi_t *)(((unsigned long)(ms->hot_map)) &
~PGHOT_SECTION_HOT_MASK);
+ phi = &hot_map[pfn % PAGES_PER_SECTION];
+
+ count_vm_event(PGHOT_RECORDED_ACCESSES);
I didn't just yet optimize the walk. Since there is one kmigrated thread per+static void kmigrated_do_work(pg_data_t *pgdat)
+{
+ unsigned long section_nr, s_begin, start_pfn;
+ struct mem_section *ms;
+ int nid;
+
+ clear_bit(PGDAT_KMIGRATED_ACTIVATE, &pgdat->flags);
+ s_begin = next_present_section_nr(-1);
+ for_each_present_section_nr(s_begin, section_nr) {
+ start_pfn = section_nr_to_pfn(section_nr);
I may be missing something, but in pghot_setup_hot_map() and kmigrated_do_work()
we seem to iterate over all memory sections. On large memory systems, could this
become a bottleneck right?
Since hot_map is allocated only for lower-tier memory and the hotness
information is primarily used there, would it make sense to skip scanning
higher-tier sections?
for_each_online_node(nid) {
if (node_is_toptier(nid))
continue;
start_pfn = node_start_pfn(nid);
end_pfn = node_end_pfn(nid);
s_begin = pfn_to_section_nr(start_pfn);
for_each_present_section_nr(s_begin, section_nr) {
}
}
Would this approach be reasonable, or am I overlooking something?
lower tier, this routine already is aware of which node to scan. We can limit
the section walk to that node instead. Something like this:
static void kmigrated_do_work(pg_data_t *pgdat)
{
unsigned long section_nr, s_begin, start_pfn, end_pfn;
struct mem_section *ms;
int nid = pgdat->node_id;
start_pfn = SECTION_ALIGN_DOWN(node_start_pfn(nid));
end_pfn = SECTION_ALIGN_UP(start_pfn + node_end_pfn(nid));
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
section_nr = pfn_to_section_nr(pfn);
if (!present_section_nr(section_nr))
continue;
ms = __nr_to_section(section_nr);
...
kmigrated_walk_zone(pfn, pfn + PAGES_PER_SECTION, nid);
}
}
Thanks. This looks good to me.
There is a !fail check in the for-loop due to which we break when alloc fails.+static int pghot_online_sec_hotmap(unsigned long start_pfn,I may be missing something, but after pghot_alloc_hot_map fails, we continue the
+ unsigned long nr_pages)
+{
+ int nid = pfn_to_nid(start_pfn);
+ unsigned long start, end, pfn;
+ struct mem_section *ms;
+ int fail = 0;
+
+ start = SECTION_ALIGN_DOWN(start_pfn);
+ end = SECTION_ALIGN_UP(start_pfn + nr_pages);
+
+ for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
+ ms = __pfn_to_section(pfn);
+ if (!ms || ms->hot_map)
+ continue;
+
+ fail = pghot_alloc_hot_map(ms, nid);
loop. Would it make sense to break and go to the cleanup logic instead?
My bad, I missed it.
-Donet
Regards,
Bharata.