Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer

From: Matthew Brost

Date: Tue Mar 17 2026 - 01:11:19 EST

On Mon, Mar 16, 2026 at 11:25:23AM +0100, Danilo Krummrich wrote:
> On Mon Mar 16, 2026 at 5:32 AM CET, Matthew Brost wrote:
> > Diverging requirements between GPU drivers using firmware scheduling
> > and those using hardware scheduling have shown that drm_gpu_scheduler is
> > no longer sufficient for firmware-scheduled GPU drivers. The technical
> > debt, lack of memory-safety guarantees, absence of clear object-lifetime
> > rules, and numerous driver-specific hacks have rendered
> > drm_gpu_scheduler unmaintainable. It is time for a fresh design for
> > firmware-scheduled GPU drivers—one that addresses all of the
> > aforementioned shortcomings.
>
> I think we all agree on this and I also think we all agree that this should have
> been a separate component in the first place -- and just to be clear, I am
> saying this in retrospect.

Yes. Tvrtko actually suggested this years ago, and in my naïveté I
rejected it. I’m eating my hat here.

>
> In fact, this is also the reason why I proposed building the Rust component
> differently, i.e. start with a Jobqueue (or drm_dep, as it is called in this patch) and
> expand as needed with a loosely coupled "orchestrator" for drivers with strictly
> limited software/hardware queues later.

Yes, after hacking for several hours today, I have a hardware-scheduling
layer built on top of drm_dep [1]. It’s very unlikely to actually work,
since I’m typing blind without a test platform, but it proves
conceptually that this layer separation works and is clean.

[1] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/22c8aa993b5c9e4ad0c312af2f3e032273d20966

>
> The reason I proposed a new component for Rust, is basically what you also wrote
> in your cover letter, plus the fact that it prevents us having to build a Rust
> abstraction layer to the DRM GPU scheduler.
>
> The latter I identified as pretty questionable as building another abstraction
> layer on top of some infrastructure is really something that you only want to do
> when it is mature enough in terms of lifetime and ownership model.
>

I personally don’t think the language matters that much. I care about
lifetime, ownership, and teardown semantics. I believe I’ve made these
clear in C, so the Rust bindings should be trivial.

> I'm not saying it wouldn't be possible, but as mentioned in other threads, I
> don't think it is a good idea to build new features on top of something that has
> known problems, even less so when they are barely resolvable due to other existing
> dependencies, such as some drivers historically relying on implementation
> details, etc.
>

It’s a new, well-thought-out component without any baggage, so I don’t
understand the above statement. It has invariants and annotations
everywhere (i.e., you cannot abuse it).

> My point is, I consider the justification for a new Jobqueue component in Rust
> to be given by the fact that it allows us to avoid building another abstraction
> layer on top of DRM sched. Additionally, DRM is moving to Rust, and gaining
> experience with building native Rust components seems like a good synergy in
> this context.
>

If I knew Rust off-hand, I would have written it in Rust :). Perhaps
this is an opportunity to learn. But I think the Rust vs. C holy war
isn’t in scope here. The real questions are what semantics we want, the
timeline, and maintainability. Certainly more people know C, and most
drivers are written in C, so having the common component in C makes more
sense at this point, in my opinion. If the objection is really about the
language, I’ll rewrite it in Rust.

> Having that said, the obvious question for me for this series is how drm_dep
> fits into the bigger picture.
>
> I.e., what is the maintenance strategy?
>

I commit to maintaining code I believe in, and I will immediately write
the Rust bindings on top of it so they’re maintained from day one.

> Do we want to support three components allowing users to do the same thing? What
> happens to DRM sched for 1:1 entity / scheduler relationships?
>
> Is it worth it? Do we have enough C users to justify the maintenance of yet
> another component? (Again, DRM is moving in the direction of Rust drivers, so I
> don't know how many new C drivers we will see.) I.e. having this component won't
> rid us of the majority of DRM sched users.
>

Actually, with [1], I’m fairly certain that pretty much every driver
could convert to this new code. Part of the problem, though, is that
looking at this makes it clear that multiple drivers break dma-fencing
rules, so a fully annotated component like drm_dep would blow up in
those drivers. On top of that, each individual driver would need to drop
its many driver-side hacks (e.g., I would not be receptive to any driver
directly touching drm_dep object structs).

> What are the expected improvements? Given the above, I'm not sure it will

- A clear object model and clear lifetimes, and therefore memory safety.
- Bypass paths for compositors, compute, and kernel page-fault handlers.
- No kicking a worker just to drop a reference.
- Asynchronous teardown (e.g., the user hits Ctrl-C and control returns
  immediately).
- Reclaim-safe final puts of queues, and built-in driver-unload barriers.
- Maintainability: I understand every single LOC, and the documentation
  is verbose (generated with Copilot, but I’ve reviewed it multiple times
  and it’s correct).

Regardless, given all of the above, at a minimum, my driver needs to
move on one way or another.

> actually decrease the maintenance burden of DRM sched.

We can deprecate DRM sched; as of [1], that is now possible. I can
commit to compile-testing most drivers, aside from the ones with the
horrible hacks.

Matt