Re: [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling
From: Onur Özkan
Date: Thu Mar 19 2026 - 08:57:08 EST
Hi Boris,
On Thu, 19 Mar 2026 12:08:28 +0100
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:
> On Fri, 13 Mar 2026 12:16:44 +0300
> Onur Özkan <work@xxxxxxxxxxxxx> wrote:
>
> > +impl Controller {
> > + /// Creates a [`Controller`] instance.
> > + fn new(pdev: ARef<platform::Device>, iomem:
> > Arc<Devres<IoMem>>) -> Result<Arc<Self>> {
> > + let wq = workqueue::OrderedQueue::new(c"tyr-reset-wq", 0)?;
> > +
> > + Arc::pin_init(
> > + try_pin_init!(Self {
> > + pdev,
> > + iomem,
> > + pending: Atomic::new(false),
> > + wq,
> > + work <- kernel::new_work!("tyr::reset"),
> > + }),
> > + GFP_KERNEL,
> > + )
> > + }
> > +
> > + /// Processes one scheduled reset request.
> > + ///
> > + /// Panthor reference:
> > + /// -
> > drivers/gpu/drm/panthor/panthor_device.c::panthor_device_reset_work()
> > + fn reset_work(self: &Arc<Self>) {
> > + dev_info!(self.pdev.as_ref(), "GPU reset work is
> > started.\n"); +
> > + // SAFETY: `Controller` is part of driver-private data and
> > only exists
> > + // while the platform device is bound.
> > + let pdev = unsafe { self.pdev.as_ref().as_bound() };
> > + if let Err(e) = run_reset(pdev, &self.iomem) {
> > + dev_err!(self.pdev.as_ref(), "GPU reset failed:
> > {:?}\n", e);
> > + } else {
> > + dev_info!(self.pdev.as_ref(), "GPU reset work is
> > done.\n");
> > + }
>
> Unfortunately, the reset operation is not as simple as instructing the
> GPU to reset, it's a complex synchronization process where you need to
> try to gracefully put various components on hold before you reset, and
> then resume those after the reset is effective. Of course, with what
> we currently have in-tree, there's not much to suspend/resume, but I
> think I'd prefer to design the thing so we can progressively add more
> components without changing the reset logic too much.
>
> I would probably start with a Resettable trait that has the
> {pre,post}_reset() methods that exist in Panthor.
Yeah, I checked Panthor and it has these functions for the reset logic.
I will implement that in v2 and will dig further to see if there is
anything else to cover in regards to proper syncing.
>
> The other thing we need is a way for those components to know when a
> reset is about to happen so they can postpone some actions they were
> planning in order to not further delay the reset, or end up with
> actions that fail because the HW is already unusable. Not too sure how
> we want to handle that though. Panthor is currently sprinkled with
> panthor_device_reset_is_pending() calls in key places, but that's
> still very manual, maybe we can automate that a bit more in Tyr,
> dunno.
>
Hmm, sounds like a perfect guard use-case. Is it possible to require
users to access the hw behind a guard (e.g., try_access_hw())? We would
then check if a reset is in progress and hold the user or return an
error until the reset is complete.
Thanks,
Onur
> > +
> > + self.pending.store(false, Release);
> > + }
> > +}