On x86-32

From: Joshua Hudson

Date: Wed May 27 2026 - 22:53:09 EST


A couple of years ago I reached for x86-32 to solve a niche
problem, but ended up not using it because I couldn't find
enough information on how to use it.

The problem in question: I needed a potentially _very large_
contiguous arena of RAM, and couldn't figure out how to get
it while ASLR was enabled.

The intended solution: build the binary as an x86-32 binary
but use normal x86-64 system calls. Assuming this works the
way I think it does, everything gets loaded into the first
4GB of the address space, and mmap() with explicit addresses
starting at 4GB can reach from there up to the stratosphere
and never encounter anything.
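
For concreteness, here is a minimal sketch of the kind of mapping I
had in mind. It assumes a 64-bit Linux process and that nothing else
already lives at the 4GB mark; arena_reserve(), arena_commit() and
ARENA_BASE are names made up for this illustration. The address is
passed only as a hint, so the kernel is free to place the mapping
elsewhere if that range is occupied.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena base: the first byte above 4GB (needs 64-bit pointers). */
#define ARENA_BASE ((uintptr_t)1 << 32)

/*
 * Reserve len bytes of address space, asking for ARENA_BASE as a hint.
 * PROT_NONE plus MAP_NORESERVE means no RAM or swap is committed yet.
 * Returns NULL on failure.
 */
static void *arena_reserve(size_t len)
{
    void *p = mmap((void *)ARENA_BASE, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Make the first len bytes of a reserved arena readable and writable.
 * Pages are still only allocated when they are first touched.
 * Returns 0 on success, -1 on failure.
 */
static int arena_commit(void *base, size_t len)
{
    return mprotect(base, len, PROT_READ | PROT_WRITE);
}
```

If the address must be exactly ARENA_BASE, MAP_FIXED_NOREPLACE
(Linux 4.17 and later) would make the call fail instead of silently
moving the mapping; I left it out to keep the sketch portable to
older headers.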

The other solution I tried: using a stack segment declaration
in the binary to put the stack right after the program,
while still allowing the program start address to float. It
turns out that the base address in the stack segment
declaration gets ignored. My program uses very little stack,
but I had to declare quite a bit in case the incoming
environment was large.

The solution I ended up using: declare a 16GB bss, knowing
it won't actually allocate the RAM until written to, and hope
it's enough for the worst case. The page table penalty didn't
seem too high. But if the worst case were a hundred times
worse, I shudder to think what the page table penalty would
be after rebuilding with a bss that size.
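
Roughly, the bss approach looks like the sketch below. It is scaled
down to 1GB so it loads on small machines (the real thing used 16GB),
and arena_alloc() is a made-up bump allocator for illustration, not
my actual code. The array lives in bss, so the kernel backs it with
pages only as they are first written.

```c
#include <stddef.h>

/* Scaled-down stand-in for the 16GB bss arena. */
#define ARENA_SIZE ((size_t)1 << 30)

/* Zero-initialized static storage lands in bss; untouched pages cost no RAM. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/*
 * Hypothetical bump allocator: hands out 16-byte-aligned chunks from
 * the arena, or NULL once the fixed size is exhausted.
 */
static void *arena_alloc(size_t n)
{
    if (n > ARENA_SIZE)
        return NULL;                    /* also avoids overflow below */
    n = (n + 15) & ~(size_t)15;         /* round up to a multiple of 16 */
    if (n > ARENA_SIZE - arena_used)
        return NULL;
    void *p = arena + arena_used;
    arena_used += n;
    return p;
}
```

The weakness is the one described above: ARENA_SIZE is fixed at
build time, so the worst case has to be guessed in advance.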

----

I'm kind of baffled why x86-32 has its own system call table.
To me it would have made more sense to either a) use the
x86 system call gate or b) set a flag in the process saying
don't random-allocate memory above 4GB. Incidentally,
you *can* call the x86 system call gate from an x86-64 binary
today, and it acts like an x86 system call, which has confused
quite a few programmers in the past.