Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices

From: Sakari Ailus

Date: Mon Mar 16 2026 - 12:52:01 EST


Hi Luo,

On Fri, Mar 13, 2026 at 09:50:56PM +0800, luo.liu.linux wrote:
>
> Hi Sakari,
>
> Apologies if my previous explanation wasn't clear enough.
>
> To clarify, the primary goal of this interface is not merely to verify if insmod/rmmod succeeds,
> but to validate the correctness of the asynchronous subdevice registration and unregistration paths,
> specifically ensuring that resource allocation and reclamation are handled properly.
>
> I would like to share a real-world scenario that motivated this patch:
>
> We had a camera pipeline (sensor -> dphy -> mipi-csi2 -> isp) whose subdevice drivers
> appeared to function perfectly for six months. insmod and rmmod completed without any errors,
> and the system seemed stable during normal operation. However, just before a major release, a QA engineer ran
> stress tests consisting of rapid, repeated insmod/rmmod cycles, which eventually triggered a kernel crash.
>
> During the debugging process, I inspected the internal global lists:
>
> static LIST_HEAD(subdev_list);
> static LIST_HEAD(notifier_list);
>
> By dumping the subdev_list via this debugfs interface, I discovered that a D-PHY subdevice entry remained in the list even
> after its driver was unloaded. Crucially, the output explicitly showed the device name, allowing me to immediately pinpoint
> the D-PHY driver as the culprit, rather than blindly troubleshooting other components in the pipeline (such as the sensor or ISP).
>
> This was the critical clue that led me to the root cause:
>
> The D-PHY subdriver's remove function was missing a call to v4l2_async_cleanup(sd). Consequently, the subdevice was never properly
> unregistered from the async framework, leading to a use-after-free or stale pointer issue during the stress test.
>
> Without this debugfs interface, detecting such "silent" registration leaks is extremely difficult.
> The driver loads and unloads without reporting errors, and standard logs (dmesg) often provide
> no indication that an entry was left behind in the core framework's list until a crash occurs under specific timing conditions.
>
>
> Given this experience, I believe this interface provides a vital visibility point for engineers to:
>
> 1. Verify that subdevices are correctly removed from the global list upon driver unload.
> 2. Catch missing cleanup calls (like v4l2_async_cleanup) early in the development cycle,
>    rather than discovering them through random crashes in stress testing.

I guess you'd have found this with either KASAN or linked list debugging?
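
That would be roughly the following options. This is only a minimal config
fragment; availability depends on the kernel version and architecture, and
whether either check would have fired in this particular case is an
assumption, not something verified here:

```
# Reports use-after-free and out-of-bounds accesses at runtime
CONFIG_KASAN=y
# Adds sanity checks to list_add()/list_del() to catch corrupted list pointers
CONFIG_DEBUG_LIST=y
```

With a stale entry left on the list, a later list operation touching the
freed memory is the kind of access these checks are meant to report.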

--
Sakari Ailus