On Thu, Apr 17, 2025 at 03:20:44PM +0100, Tvrtko Ursulin wrote:
On 17/04/2025 13:11, Danilo Krummrich wrote:
On Thu, Apr 17, 2025 at 12:27:29PM +0100, Tvrtko Ursulin wrote:
On 17/04/2025 08:45, Philipp Stanner wrote:
On Mon, 2025-04-07 at 17:22 +0200, Philipp Stanner wrote:
The problem is exactly that jobs can outlive the entities and the scheduler,
while some userspace may hold a dma fence reference to the job via a sync
file. This new callback would not solve it for xe, but if everything
required were reference counted it would.
I think you're mixing up the job and the dma_fence here, if a job outlives the
scheduler, it clearly is a bug, always has been.
AFAIK, Xe reference counts its driver-specific job structures *and* the
driver-specific scheduler structure, such that drm_sched_fini() won't be
called before all jobs have finished.
Yes, sorry, dma fence. But it is not enough to postpone drm_sched_fini()
until the job has finished. The problem is that an exported dma fence holds a
pointer to the drm_sched_fence (and so oopses in
drm_sched_fence_get_timeline_name() on fence->sched->name) *after* the job
has finished and the driver was free to tear everything down.
Well, that's a bug in drm_sched_fence then, and independent from the other topic.
Once the finished fence in a struct drm_sched_fence has been signaled it must
live independently of the scheduler.
The lifetime of the drm_sched_fence is entirely independent from the scheduler
itself, as you correctly point out.
Starting to reference count things to keep the whole scheduler etc. alive as
long as the drm_sched_fence lives is not the correct solution.
Multiple solutions have been discussed already, e.g. just wait for the pending
list to be empty, or reference count the scheduler for every pending job. Those
all had significant downsides, which I don't see with this proposal.
I'm all for better ideas though -- what do you propose?
I think we need to brainstorm both issues and see if there is a solution
which solves them both, with bonus points for being elegant.
The problems are not related. As mentioned above, once signaled a
drm_sched_fence must not depend on the scheduler any longer.
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 6b72278c4b72..ae3152beca14 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1465,6 +1465,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	sched->ready = false;
 	kfree(sched->sched_rq);
 	sched->sched_rq = NULL;
+
+	if (!list_empty(&sched->pending_list))
+		dev_err(sched->dev, "%s: Tearing down scheduler while jobs are pending!\n",
+			__func__);
It isn't fair to add this error since it would, out of the blue, start firing
for everyone except nouveau, no? Regardless of whether there is a leak or not.
I think it is pretty fair to warn when detecting a guaranteed bug, no?
If drm_sched_fini() is called while jobs are still on the pending_list, they
won't ever be freed, because all workqueues are stopped.
Is it a guaranteed bug for drivers that are aware of the drm_sched_fini()
limitation and clean up after themselves?
How could a driver clean up on itself (unless the driver holds its own list of
pending jobs)?
Once a job is in flight (i.e. it's on the pending_list) we must guarantee that
free_job() is called by the scheduler, which it can't do if we call
drm_sched_fini() before the pending_list is empty.
In other words, if you apply the series up to here, would it trigger for
nouveau?
No, because nouveau does something very stupid, i.e. replicate the pending_list.
Reportedly it triggers for the mock scheduler, which also has no leak.
That sounds impossible. How do you ensure you do *not* leak memory when you tear
down the scheduler while it still has pending jobs? Or in other words, who calls
free_job() if not the scheduler itself?
Also, I asked in my initial reply whether we have a list of which of the
current drivers suffer from memory leaks. Is it all of them, or only some, etc.?
Not all, but quite a few, I think. The last time I looked (which was about a
year ago), amdgpu for instance could leak memory when you unbind the driver
while enough jobs are in flight.