Re: [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling
From: Onur Özkan
Date: Thu Apr 09 2026 - 07:41:57 EST
On Fri, 03 Apr 2026 12:01:09 -0300
Daniel Almeida <daniel.almeida@xxxxxxxxxxxxx> wrote:
>
>
> > On 19 Mar 2026, at 08:08, Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, 13 Mar 2026 12:16:44 +0300
> > Onur Özkan <work@xxxxxxxxxxxxx> wrote:
> >
> >> +impl Controller {
> >> + /// Creates a [`Controller`] instance.
> >> + fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Arc<Self>> {
> >> + let wq = workqueue::OrderedQueue::new(c"tyr-reset-wq", 0)?;
> >> +
> >> + Arc::pin_init(
> >> + try_pin_init!(Self {
> >> + pdev,
> >> + iomem,
> >> + pending: Atomic::new(false),
> >> + wq,
> >> + work <- kernel::new_work!("tyr::reset"),
> >> + }),
> >> + GFP_KERNEL,
> >> + )
> >> + }
> >> +
> >> + /// Processes one scheduled reset request.
> >> + ///
> >> + /// Panthor reference:
> >> + /// - drivers/gpu/drm/panthor/panthor_device.c::panthor_device_reset_work()
> >> + fn reset_work(self: &Arc<Self>) {
> >> + dev_info!(self.pdev.as_ref(), "GPU reset work is started.\n");
> >> +
> >> + // SAFETY: `Controller` is part of driver-private data and only exists
> >> + // while the platform device is bound.
> >> + let pdev = unsafe { self.pdev.as_ref().as_bound() };
> >> + if let Err(e) = run_reset(pdev, &self.iomem) {
> >> + dev_err!(self.pdev.as_ref(), "GPU reset failed: {:?}\n", e);
> >> + } else {
> >> + dev_info!(self.pdev.as_ref(), "GPU reset work is done.\n");
> >> + }
> >
> > Unfortunately, the reset operation is not as simple as instructing the
> > GPU to reset, it's a complex synchronization process where you need to
> > try to gracefully put various components on hold before you reset, and
> > then resume those after the reset is effective. Of course, with what we
> > currently have in-tree, there's not much to suspend/resume, but I think
> > I'd prefer to design the thing so we can progressively add more
> > components without changing the reset logic too much.
> >
> > I would probably start with a Resettable trait that has the
> > {pre,post}_reset() methods that exist in Panthor.
> >
> > The other thing we need is a way for those components to know when a
> > reset is about to happen so they can postpone some actions they were
> > planning in order to not further delay the reset, or end up with
> > actions that fail because the HW is already unusable. Not too sure how
> > we want to handle that though. Panthor is currently sprinkled with
> > panthor_device_reset_is_pending() calls in key places, but that's still
> > very manual, maybe we can automate that a bit more in Tyr, dunno.
> >
>
>
> We could have an enum where one of the variants is Resetting, and the other one
> gives access to whatever state is not accessible while resets are in progress.
>
> Something like
>
> pub enum TyrData {
> Active(ActiveTyrData),
> ResetInProgress(ActiveTyrData)
> }
>
> fn access() -> Option<&ActiveTyrData> {
> … // if the “ResetInProgress” variant is active, return None
> }
>
That's an interesting approach, but if it's all about `fn access` function, we
can already do that with a simple atomic state e.g.,:
// The state flag in reset::Controller
state: Atomic<ResetState>
fn access(&self) -> Option<&Arc<Devres<IoMem>>> {
match self.state.load(Relaxed) {
ResetState::Idle => Some(&self.iomem),
_ => None,
}
}
What do you think? Would this be sufficient?
Btw, a sample code snippet from the caller side would be very helpful for
designing this further. That part is kind a blurry for me.
Thanks,
Onur
>
> >> +
> >> + self.pending.store(false, Release);
> >> + }
> >> +}
>