RE: [PATCH v4 1/1] printk: fix zero-valued printk timestamps in early boot

From: Bird, Tim

Date: Wed Apr 15 2026 - 19:31:51 EST


> -----Original Message-----
> From: Roberto A. Foglietta <roberto.foglietta@xxxxxxxxx>
> On Wed, 15 Apr 2026 at 23:57, Roberto A. Foglietta
> <roberto.foglietta@xxxxxxxxx> wrote:
> >
>
> [...]
>
> > V5
> >
> > A single file with the essential for hacking the early boot,
>
> early_times.h -- it means before every feature offered by the kernel
> is available or 100% trustable, thus ASM because it is also supposed
> to fix what is broken in the kernel thus using the kernel macros isn't
> the correct way to do so. Early times means before the kernel, thus
> also out-of-it-three or feature support. Them asm (isb) sounds too
> drastic or not enough for "exhotic" arm arch? It is a template to
> start with, not a working part of the kernel. When the kernel works,
> !0 == true is true.

Roberto,

I appreciate your work to address deficiencies in the proposed
patch, but I don't follow the above paragraph. There's nothing
"broken" about the kernel to fix, just an opportunity to
instrument more code during bootup than is currently supported.

I think Thomas has demonstrated that this does not have a lot of value
for x86_64 platforms (where the blind spot is already quite small), but
on ARM64 and RISC-V I think there is some value to developers working
on boot time.

> The concept remains the same: a register counter read protected by
> fencing. Not more, not less than the essential. And a bucket
> out-of-the-three as a starting point **before** the kernel is ready or
> fixed. Written in this way, does it sound more reasonable?

I don't know what "bucket out-of-the-three" means.

Thomas has expressed concerns about a number of issues with the latest v4
patch which I haven't responded to yet. I was trying to reproduce the bug
report with clang compilation (reported by 0-day on Sunday), and so far have
not been able to. I have a proposed solution for that, and for other
feedback received. But I hesitate to make a V5 patch before I reproduce the
problem on my system and verify my fix for that bug report.

Honestly, given how strongly Thomas feels about some of the items, this may end
up not being accepted upstream. But rather than give up at this point, I'd like to
at least address all feedback received and issue one more patch for consideration.
If it doesn't get accepted, I will have received some valuable feedback which will
help make the (out-of-tree) patch as general and as conformant with kernel standards
as possible, which is useful.

With regard to that, I'll note that sprinkling early_unsafe_cycles() definitions
around arch directories will make an out-of-tree patch less likely to apply, so
there's some tension in how the patch should be structured for upstream
acceptance and out-of-tree maintenance.

For those interested, I'm trying to address the following:
 - use of arch-specific ifdefs in generic headers
 - warning about divide by zero
   - there are multiple ways to address this, and I want to use the
     most maintainable (least source-code-disrupting and confusing)
     one going forward. I liked David Laight's solution (val?:1), but it
     may not be needed with other changes I plan to make.
 - use of mul_u64_u32_div() for math, so that 32-bit systems which
   might use this in the future are supported seamlessly
   - I thought the compiler would do better with optimizing the math
     (with all the values being constants except for the cycles value).
     However, testing on different platforms (and different cycles KHz
     settings) has given mixed results.
   - Examination of the assembly code produced seems to indicate that
     using mul_u64_u32_div() is likely the best option.
   - Given that this feature is not intended to be used in production
     kernels, I don't think that optimizing the instruction sequence for
     something that occurs less than a hundred times in early boot (the
     upper bound on existing blind spot printks that I've seen) will
     matter, for instrumentation purposes. The amount of cycles that
     printk uses for message buffering, formatting and output completely
     swamps the cost of the cycles-to-nanoseconds conversion.
 - use of NSEC_ prefix instead of NS_ prefix
 - addition of fences to the assembly for reading the cycle-counter.
   - I need to consider whether these are needed or not. I believe you
     said that you got mangled values without them. Can you confirm that?
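
To make the conversion math in the list above concrete, here is a rough
userspace sketch. It is not the patch code. The names
sketch_mul_u64_u32_div(), sketch_cycles_to_ns() and SKETCH_NSEC_PER_MSEC
are hypothetical stand-ins, and the sketch assumes the counter frequency
is given in kHz as a 32-bit value. The first helper only mimics what the
kernel's mul_u64_u32_div() computes, (a * mul) / div with a wide
intermediate product. The zero guard is the portable spelling of the
(val?:1) idea.

```c
#include <stdint.h>

/* Assumed constant: nanoseconds per millisecond (hypothetical name). */
#define SKETCH_NSEC_PER_MSEC 1000000u

/*
 * Userspace stand-in for the kernel's mul_u64_u32_div(a, mul, div):
 * computes (a * mul) / div using a 96-bit intermediate product built
 * from 32-bit halves, so no 128-bit type is needed on 32-bit systems.
 * The caller must pass div != 0. The result is truncated to 64 bits.
 */
static uint64_t sketch_mul_u64_u32_div(uint64_t a, uint32_t mul, uint32_t div)
{
	uint64_t lo = (a & 0xffffffffu) * mul;      /* low half times mul */
	uint64_t hi = (a >> 32) * mul + (lo >> 32); /* bits 32..95 of the product */
	uint64_t q_hi = hi / div;                   /* quotient bits 32 and up */
	uint64_t rem = hi % div;                    /* remainder, always < div */
	uint64_t q_lo = ((rem << 32) | (lo & 0xffffffffu)) / div;

	return (q_hi << 32) | q_lo;
}

/*
 * Convert a raw cycle count to nanoseconds:
 *   ns = cycles * 1000000 / khz
 * A zero khz is replaced by 1 so the division can never trap; with GNU C
 * the guard could be written as (khz ?: 1).
 */
static uint64_t sketch_cycles_to_ns(uint64_t cycles, uint32_t khz)
{
	uint32_t safe_khz = khz ? khz : 1;

	return sketch_mul_u64_u32_div(cycles, SKETCH_NSEC_PER_MSEC, safe_khz);
}
```

With these assumptions, sketch_cycles_to_ns(24000, 24000) gives 1000000
(24000 cycles of a 24 MHz counter is one millisecond), and a cycle count
of 2^50 at 1000000 kHz still converts correctly even though a plain 64-bit
multiply by 1000000 would overflow.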

Thanks again for your proposals and feedback. And thanks for testing the code out
in different environments.
-- Tim

P.S. Thomas - I appreciate your additional feedback, and will work to address it. Sorry
I haven't responded sooner.