RE: [PATCH] drm/amdkfd: fix integer overflow in get_queue_ids()

From: Deucher, Alexander

Date: Tue May 26 2026 - 17:32:48 EST


> -----Original Message-----
> From: Muhammad Bilal <meatuni001@xxxxxxxxx>
> Sent: Saturday, May 23, 2026 10:27 AM
> To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; airlied@xxxxxxxxx; simona@xxxxxxxx;
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; stable@xxxxxxxxxxxxxxx;
> Muhammad Bilal <meatuni001@xxxxxxxxx>
> Subject: [PATCH] drm/amdkfd: fix integer overflow in get_queue_ids()
>
> get_queue_ids() computes the allocation size as:
>
>     size_t array_size = num_queues * sizeof(uint32_t);
>
> num_queues is a user-controlled u32 copied directly from the ioctl argument
> (args.suspend_queues.num_queues or args.resume_queues.num_queues)
> via kfd_ioctl_set_debug_trap() with no prior validation or clamping.
>
> On 32-bit kernels, size_t is 32 bits wide. A caller supplying num_queues =
> 0x40000001 causes the multiplication to silently wrap:
>
>     0x40000001 * 4 = 0x100000004 -> truncated to 0x4
>
> memdup_user() then allocates only 4 bytes. q_array_invalidate() is called
> immediately after with the original num_queues value and iterates
> 0x40000001 times writing KFD_DBG_QUEUE_INVALID_MASK into the 4-byte
> buffer, producing an unbounded heap buffer overflow.
> q_array_get_index() in both callers walks the same buffer using the same
> unchecked count.
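
For readers following along, the wraparound described above can be reproduced in userspace. This is only a minimal sketch, not kernel code: size32_t and unchecked_array_size() are hypothetical names, and uint32_t is used as a stand-in for size_t on a 32-bit kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for size_t on a 32-bit kernel. */
typedef uint32_t size32_t;

/*
 * Mirrors the unchecked computation from get_queue_ids(): the product
 * is reduced modulo 2^32, so large counts wrap to a small size.
 */
static size32_t unchecked_array_size(uint32_t num_queues)
{
	return (size32_t)(num_queues * (uint32_t)sizeof(uint32_t));
}
```

With num_queues = 0x40000001 this returns 4, matching the truncated value in the commit message, while small counts such as 16 give the expected 64.
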
>
> Both call sites are affected:
> - suspend_queues() calls get_queue_ids() unconditionally
> - resume_queues() calls it only when usr_queue_id_array is non-NULL
>
> Both callers already propagate IS_ERR() returns to userspace, so returning
> ERR_PTR(-EINVAL) on overflow requires no new error handling.
>
> The copy_to_user() calls at the tail of both functions also compute
> num_queues * sizeof(uint32_t), but are only reachable after a successful
> get_queue_ids() return, so they are safe once the allocation is correctly
> bounded.
>
> Fix by replacing the unchecked multiplication with check_mul_overflow().
> Cast num_queues to size_t so all three arguments match the destination type,
> avoiding implicit type mismatch on compilers that implement the macro with
> typeof() rather than __builtin_mul_overflow() directly.
> Add an explicit #include <linux/overflow.h> rather than relying on the
> transitive pull through linux/slab.h.
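
The effect of the checked multiplication can be sketched outside the kernel with the GCC/Clang builtin __builtin_mul_overflow(). This is an illustration under assumptions rather than the patch itself: array_size_overflows32() is a hypothetical helper, and a uint32_t result is used to model a 32-bit size_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling the check on a 32-bit size_t.
 * Returns true if num_queues * sizeof(uint32_t) does not fit in
 * 32 bits. *out always receives the (possibly wrapped) product,
 * so callers should only use it when the return value is false.
 */
static bool array_size_overflows32(uint32_t num_queues, uint32_t *out)
{
	return __builtin_mul_overflow(num_queues,
				      (uint32_t)sizeof(uint32_t), out);
}
```

A caller would reject the request when the helper returns true, in the same way the patch returns ERR_PTR(-EINVAL), and pass *out to the allocator otherwise.
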
>
> Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Muhammad Bilal <meatuni001@xxxxxxxxx>

Thanks for the patch. I think this is already fixed by the following patch:
https://lists.freedesktop.org/archives/amd-gfx/2026-May/144364.html

Alex

> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e0a31e11f0ff..c08ad718dbd7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -25,6 +25,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/printk.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
>  #include <linux/bitops.h>
> @@ -3308,11 +3309,14 @@ static void copy_context_work_handler(struct work_struct *work)
>
>  static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array)
>  {
> -	size_t array_size = num_queues * sizeof(uint32_t);
> +	size_t array_size;
>
>  	if (!usr_queue_id_array)
>  		return NULL;
>
> +	if (check_mul_overflow((size_t)num_queues, sizeof(uint32_t), &array_size))
> +		return ERR_PTR(-EINVAL);
> +
>  	return memdup_user(usr_queue_id_array, array_size);
>  }
>
> --
> 2.53.0