Re: [PATCH 07/14] KVM: arm64: Restrict host access to the ITS tables

From: Fuad Tabba

Date: Mon Mar 16 2026 - 12:22:15 EST


Hi Sebastian,

On Tue, 10 Mar 2026 at 12:49, Sebastian Ene <sebastianene@xxxxxxxxxx> wrote:
>
> Setup shadow structures for ITS indirect tables held in
> the GITS_BASER<n> registers.
> Make the last level of the Device Table and vPE Table
> inacessible to the host.

inacessible -> inaccessible
> In a direct layout configuration, donate the table to
> the hypervisor since the software is not expected to
> program them directly.

This commit message is too brief and doesn't fully explain the
problem, the impact, and the mechanism of the solution. It also
appears to contradict the actual code changes.

For example, could you elaborate on why the last level of indirect
tables must be inaccessible?

Can you also please explain the mechanism? You are parsing
GITS_BASER_INDIRECT to determine whether a shadow Level 1 table must
be shared with the host, while unconditionally donating the original
physical tables. You also explicitly exclude Collection tables. The
commit message should briefly justify why Collection tables are safe
to leave accessible to the host.

There is also a contradiction in the message. You state "In a direct
layout configuration, donate the table...". However, your code donates
the original hardware table unconditionally on every iteration of the
loop, regardless of whether GITS_BASER_INDIRECT is set. Please ensure
the commit log accurately reflects the code implementation.

Maybe you could say that the problem is host DMA attacks via ITS
table manipulation, and that the mechanism is to unconditionally
donate the hardware tables to EL2; for indirect Device/vPE tables,
share a Level 1 shadow table with the host and strictly donate the
Level 2 pages to prevent the host from writing malicious L2 pointers.

>
> Signed-off-by: Sebastian Ene <sebastianene@xxxxxxxxxx>
> ---
> arch/arm64/kvm/hyp/nvhe/its_emulate.c | 143 ++++++++++++++++++++++++++
> 1 file changed, 143 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/its_emulate.c b/arch/arm64/kvm/hyp/nvhe/its_emulate.c
> index 4a3ccc90a1a9..865a5d6353ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/its_emulate.c
> +++ b/arch/arm64/kvm/hyp/nvhe/its_emulate.c
> @@ -141,6 +141,145 @@ static struct pkvm_protected_reg *get_region(phys_addr_t dev_addr)
> return NULL;
> }
>
> +static int pkvm_host_unmap_last_level(void *shadow, size_t num_pages, u32 psz)
> +{
> + u64 *table = shadow;
> + int ret, i, end = (num_pages << PAGE_SHIFT) / sizeof(table);
> + phys_addr_t table_addr;

Please follow reverse Christmas tree (RCT) ordering; this also mixes
initialized and uninitialized variables, plus variables of
conceptually different "types", in the same declaration.

Please use sizeof(*table): sizeof(table) evaluates to the size of the
pointer (8 bytes), NOT the size of the array element. In this case,
this happens to be the same, but it's still wrong.

Maybe the following is clearer:
+ int end = num_pages * (PAGE_SIZE / sizeof(*table));
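
To make the pitfall concrete, here is a standalone sketch (not kernel
code; the u32 pointer is purely illustrative) showing that
sizeof(table) measures the pointer while sizeof(*table) measures an
entry, and that the two only coincide here by accident:

```c
#include <stdint.h>

/* Not kernel code: a standalone illustration of why sizeof(*table)
 * is the correct spelling.  With u64 entries the two expressions
 * happen to agree on LP64; with u32 entries they do not. */
static const uint64_t *table;   /* mirrors the patch's u64 *table */
static const uint32_t *table32; /* hypothetical narrower entries  */

#define ENTRY_SIZE     sizeof(*table)   /* size of one entry: 8      */
#define POINTER_SIZE   sizeof(table)    /* size of the pointer: 8    */
#define ENTRY32_SIZE   sizeof(*table32) /* size of one entry: 4      */
#define POINTER32_SIZE sizeof(table32)  /* still the pointer size: 8 */
```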


> +
> + for (i = 0; i < end; i++) {
> + if (!(table[i] & GITS_BASER_VALID))
> + continue;
> +
> + table_addr = table[i] & PHYS_MASK;
> + ret = __pkvm_host_donate_hyp(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT);

The ITS-configured page size and the host page size can differ, but
the number of pages to donate for Level 2 tables is calculated from
psz (the ITS page size) alone.

If the ITS hardware is configured for 4KB pages, but the host kernel
is using (e.g.,) 64KB pages, psz >> PAGE_SHIFT evaluates to 0.

You need to account for mismatched page sizes, perhaps by using
DIV_ROUND_UP(psz, PAGE_SIZE) (or something similar) to ensure the
containing host page is donated.
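
A standalone arithmetic sketch (not kernel code; it assumes a
hypothetical 64K host page, i.e. PAGE_SHIFT == 16, with the ITS
programmed for 4K pages) shows the truncation:

```c
#include <stdint.h>

/* Not kernel code: standalone arithmetic, assuming a hypothetical
 * 64K host page size while the ITS is programmed with 4K pages. */
#define PAGE_SHIFT 16
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* The patch's calculation: truncates to zero pages. */
static inline unsigned long pages_shifted(uint32_t psz)
{
	return psz >> PAGE_SHIFT;
}

/* Rounding up always covers the containing host page. */
static inline unsigned long pages_rounded(uint32_t psz)
{
	return DIV_ROUND_UP((unsigned long)psz, PAGE_SIZE);
}
```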

> + if (ret)
> + goto err_donate;
> + }
> +
> + return 0;
> +err_donate:
> + for (i = i - 1; i >= 0; i--) {

Please use the while (i--) idiom for rollback loops.
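
A minimal standalone sketch of the idiom (undo() and the counter are
illustrative, not kernel code):

```c
#include <stddef.h>

/* Not kernel code: a standalone sketch of the rollback idiom.  When
 * the forward loop aborts, i holds the count of completed iterations;
 * while (i--) then revisits indices i-1 .. 0 in reverse, and is safe
 * even when i is unsigned. */
static size_t undone;

static void undo(size_t idx)
{
	(void)idx;	/* a real rollback would release entry idx */
	undone++;
}

static size_t rollback(size_t completed)
{
	size_t i = completed;

	undone = 0;
	while (i--)
		undo(i);	/* i == completed-1, ..., 0 */
	return undone;
}
```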


> + if (!(table[i] & GITS_BASER_VALID))
> + continue;
> +
> + table_addr = table[i] & PHYS_MASK;
> + __pkvm_hyp_donate_host(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT);

Please wrap this in WARN_ON(...). If donating back to the host fails
during a rollback, we have a fatal page leak that needs to be loudly
flagged, similar to how you handle it in pkvm_unshare_shadow_table.


> + }
> + return ret;
> +}
> +
> +static int pkvm_share_shadow_table(void *shadow, u64 nr_pages)
> +{
> + u64 i, ret, start_pfn = hyp_virt_to_pfn(shadow);

Same comment as before regarding RCT and the mixing of declarations.
Note also that ret holds negative error codes and is returned from an
int-returning function, so please declare it as an int rather than a
u64.


> +
> + for (i = 0; i < nr_pages; i++) {
> + ret = __pkvm_host_share_hyp(start_pfn + i);
> + if (ret)
> + goto unshare;
> + }
> +
> + ret = hyp_pin_shared_mem(shadow, shadow + (nr_pages << PAGE_SHIFT));
> + if (ret)
> + goto unshare;
> +
> + return ret;
> +unshare:

Please use the while (i--) idiom for rollback loops. Here it is more
than a style issue: i is declared u64, so the condition i >= 0 is
always true and `for (i = i - 1; i >= 0; i--)` never terminates.

Also, please use consistent naming conventions for the labels. Here
you call it unshare, and earlier it was err_donate.


> + for (i = i - 1; i >= 0; i--)
> + __pkvm_host_unshare_hyp(start_pfn + i);
> + return ret;
> +}
> +
> +static void pkvm_unshare_shadow_table(void *shadow, u64 nr_pages)
> +{
> + u64 i, start_pfn = hyp_virt_to_pfn(shadow);
> +
> + hyp_unpin_shared_mem(shadow, shadow + (nr_pages << PAGE_SHIFT));
> +
> + for (i = 0; i < nr_pages; i++)
> + WARN_ON(__pkvm_host_unshare_hyp(start_pfn + i));
> +}
> +
> +static void pkvm_host_map_last_level(void *shadow, size_t num_pages, u32 psz)
> +{
> + u64 *table;

RCT, and table is read while still uninitialized, which is undefined
behaviour; it needs:
+ u64 *table = shadow;

> + int i, end = (num_pages << PAGE_SHIFT) / sizeof(table);

Same sizeof(table) pointer-size bug as above.


> + phys_addr_t table_addr;
> +
> + for (i = 0; i < end; i++) {
> + if (!(table[i] & GITS_BASER_VALID))
> + continue;
> +
> + table_addr = table[i] & ~GITS_BASER_VALID;

Inconsistent masking logic, since in pkvm_host_unmap_last_level you
correctly used PHYS_MASK to extract the address, but here in the
rollback path you use ~GITS_BASER_VALID.

While both currently work because the upper bits and lower bits (below
the page size) are defined as RES0 in the GIC spec, ~GITS_BASER_VALID
is architecturally fragile. If a future hardware revision repurposes
the upper RES0 bits [62:52] for new attributes (e.g., memory
encryption flags), ~GITS_BASER_VALID will leak those attribute bits
into the physical address calculation.

Since PHYS_MASK correctly handles the address extraction across all
page sizes (relying on the lower bits being RES0) and safely masks off
future upper attribute bits, please standardize on using table_addr =
table[i] & PHYS_MASK; for both functions.
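
A standalone sketch of the difference (not kernel code; it assumes a
48-bit physical address space and an invented future attribute at bit
55 purely for illustration):

```c
#include <stdint.h>

/* Not kernel code: simplified stand-ins assuming a 48-bit physical
 * address space; GITS_BASER_VALID is bit 63 as in the GIC spec. */
#define PHYS_MASK        ((1ULL << 48) - 1)
#define GITS_BASER_VALID (1ULL << 63)

/* An entry with a hypothetical future attribute in RES0 bit 55: */
static const uint64_t entry =
	GITS_BASER_VALID | (1ULL << 55) | 0x40000000ULL;

static inline uint64_t addr_fragile(uint64_t e)
{
	return e & ~GITS_BASER_VALID;	/* leaks bit 55 into the address */
}

static inline uint64_t addr_robust(uint64_t e)
{
	return e & PHYS_MASK;		/* plain 0x40000000 */
}
```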


> + WARN_ON(__pkvm_hyp_donate_host(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT));
> + }
> +}
> +
> +static int pkvm_setup_its_shadow_baser(struct its_shadow_tables *shadow)
> +{
> + int i, ret;
> + u64 baser_val, num_pages, type;
> + void *base, *host_base;
> +
> + for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> + baser_val = shadow->tables[i].val;
> + if (!(baser_val & GITS_BASER_VALID))
> + continue;
> +
> + base = kern_hyp_va(shadow->tables[i].base);
> + num_pages = (1 << shadow->tables[i].order);
> +
> + ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn(base), num_pages);
> + if (ret)
> + goto err_donate;
> +
> + if (baser_val & GITS_BASER_INDIRECT) {
> + host_base = kern_hyp_va(shadow->tables[i].shadow);
> + ret = pkvm_share_shadow_table(host_base, num_pages);
> + if (ret)
> + goto err_with_donation;
> +
> + type = GITS_BASER_TYPE(baser_val);
> + if (type == GITS_BASER_TYPE_COLLECTION)
> + continue;
> +
> + ret = pkvm_host_unmap_last_level(base, num_pages,
> + shadow->tables[i].psz);
> + if (ret)
> + goto err_with_share;
> + }
> + }
> +
> + return 0;
> +err_with_share:
> + pkvm_unshare_shadow_table(host_base, num_pages);
> +err_with_donation:
> + __pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages);
> +err_donate:
> + for (i = i - 1; i >= 0; i--) {

Please use the while (i--) idiom for rollback loops.


> + baser_val = shadow->tables[i].val;
> + if (!(baser_val & GITS_BASER_VALID))
> + continue;
> +
> + base = kern_hyp_va(shadow->tables[i].base);
> + num_pages = (1 << shadow->tables[i].order);
> +
> + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages));

The sequence of rollback operations here creates a TOCTOU vulnerability.

- First, you donate base (the Level 1 indirect table) back to the host.
- Then, you pass base into pkvm_host_map_last_level().
- Finally, pkvm_host_map_last_level() reads table[i] out of base to
determine which Level 2 pages to donate back to the host.

Because the host regains ownership of base _first_, it can be running
concurrently on another CPU. A malicious host can overwrite the Level
1 table with pointers to arbitrary hypervisor-owned memory. The
hypervisor will then read those malicious pointers and dutifully grant
the host access to its own secure memory.

The order of operations needs to be reversed: you must read base to
roll back the L2 pages, unshare the shadow table, and *only then*
donate base back to the host.

Also, num_pages = (1 << shadow->tables[i].order); calculates a 32-bit
signed integer because the literal 1 is a signed 32-bit int. If order
is 31, this evaluates to a negative number. If order is 32 or higher,
this is undefined behavior. Because num_pages is declared as a u64,
you should use the standard kernel macro BIT_ULL().
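
A standalone sketch (BIT_ULL re-defined here to mirror the kernel
macro, since this is not kernel code):

```c
#include <stdint.h>

/* Not kernel code: BIT_ULL re-defined to mirror the kernel macro
 * from include/linux/bits.h.  Shifting 1ULL keeps the result 64-bit
 * for any order up to 63, whereas (1 << order) is a signed 32-bit
 * shift: negative at order == 31 and undefined behaviour beyond. */
#define BIT_ULL(nr) (1ULL << (nr))

static inline uint64_t baser_num_pages(unsigned int order)
{
	return BIT_ULL(order);
}
```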

Here's my suggested fix (not tested). Reorder the operations to safely
rollback L2 before donating L1, use the standard `while (i--)` loop,
and fix the page calculation:

+ while (i--) {
+ baser_val = shadow->tables[i].val;
+ if (!(baser_val & GITS_BASER_VALID))
+ continue;
+
+ base = kern_hyp_va(shadow->tables[i].base);
+ num_pages = BIT_ULL(shadow->tables[i].order);
+
+ if (baser_val & GITS_BASER_INDIRECT) {
+ host_base = kern_hyp_va(shadow->tables[i].shadow);
+
+ type = GITS_BASER_TYPE(baser_val);
+ if (type != GITS_BASER_TYPE_COLLECTION)
+ pkvm_host_map_last_level(base, num_pages,
+ shadow->tables[i].psz);
+
+ pkvm_unshare_shadow_table(host_base, num_pages);
+ }
+
+ WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages));
+ }



> + if (baser_val & GITS_BASER_INDIRECT) {
> + host_base = kern_hyp_va(shadow->tables[i].shadow);
> + pkvm_unshare_shadow_table(host_base, num_pages);
> +
> + type = GITS_BASER_TYPE(baser_val);
> + if (type == GITS_BASER_TYPE_COLLECTION)
> + continue;
> +
> + pkvm_host_map_last_level(base, num_pages, shadow->tables[i].psz);
> + }
> + }

You have duplicated the entire table decoding logic (calculating base,
num_pages, checking INDIRECT...) down here in the rollback path.
Consider abstracting "setup one table" and "teardown one table" into
helper functions to make pkvm_setup_its_shadow_baser more readable and
less prone to copy-pasta errors.

Cheers,
/fuad


> +
> + return ret;
> +}
> +
> static int pkvm_setup_its_shadow_cmdq(struct its_shadow_tables *shadow)
> {
> int ret, i, num_pages;
> @@ -205,6 +344,10 @@ int pkvm_init_gic_its_emulation(phys_addr_t dev_addr, void *host_priv_state,
> if (ret)
> goto err_with_shadow;
>
> + ret = pkvm_setup_its_shadow_baser(shadow);
> + if (ret)
> + goto err_with_shadow;
> +
> its_reg->priv = priv_state;
>
> hyp_spin_lock_init(&priv_state->its_lock);
> --
> 2.53.0.473.g4a7958ca14-goog
>