Re: [REGRESSION?] scsi: sas: wildcard user scan may iterate over huge max_id

From: James Bottomley

Date: Mon Mar 30 2026 - 08:24:17 EST


On Sat, 2026-03-28 at 10:28 +0800, Li Lingfeng wrote:
> Hi,
>
> I think commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle
> wildcard and multi-channel scans") may introduce a regression for
> wildcard scans on some SAS hosts.
>
> Userspace trigger:
>
>    echo "- - -" > /sys/class/scsi_host/host0/scan
>
> results in:
>
>    channel = SCAN_WILD_CARD
>    id      = SCAN_WILD_CARD
>    lun     = SCAN_WILD_CARD
>
> Before this commit, sas_user_scan() iterated sas_host->rphy_list and
> called scsi_scan_target() for matching rphys. In effect, scanning was
> limited to channel 0 and to target ids present in
> sas_host->rphy_list.
>
> After this commit, sas_user_scan() does:
>
>    - scan channel 0 via scan_channel_zero()
>    - scan channels 1..shost->max_channel via
>      scsi_scan_host_selected()
>
> When id == SCAN_WILD_CARD, the latter path goes through
> scsi_scan_channel(), which iterates ids from 0 to shost->max_id.
>
> This looks problematic for drivers that use a very large max_id. For
> example, smartpqi sets:
>
>    shost->max_id = ~0;
>
> In that case, a wildcard scan may end up iterating from id 0 to ~0 in
> scsi_scan_channel(). In my testing/analysis, this makes the scan take
> a very long time, and the id-space walk itself does not seem
> meaningful for this SAS transport scan path.
>
> So while the commit fixes incomplete wildcard channel handling, it
> also appears to expand the id scan range from:
>
>    sas_host->rphy_list target ids
>
> to:
>
>    0..shost->max_id
>
> for the additional channels.
>
> It seems to me that wildcard SAS scans should probably remain bounded
> by transport-discovered SAS targets, instead of falling back to a
> host-wide id enumeration for the extra channels. One possible
> direction may be to avoid calling scsi_scan_host_selected() with id
> == SCAN_WILD_CARD from sas_user_scan(), or otherwise constrain the id
> range in a transport-aware way.
>
> Am I understanding this correctly? If so, what would be the preferred
> way to address this? I would appreciate feedback on whether this is
> considered a real regression, and on the best fix direction.

I don't think smartpqi is designed to be user scanned. So, as you
say, it would take a long time to scan even one channel. Since it
sets max_channel to 3, a wildcard scan would only take about 4 times
longer, which hardly constitutes a regression.
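
To put a rough number on the id walk, here is a minimal userspace
sketch. It is not kernel code: wildcard_id_visits() is a hypothetical
helper, and it assumes each walked channel visits every id from 0 up
to, but not including, max_id, as described above for
scsi_scan_channel().

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): number of target ids
 * visited when channels_walked channels are each scanned over the
 * full id range 0..max_id-1.
 *
 * The product is computed in 64 bits so that max_id = ~0 does not
 * overflow the result.
 */
static uint64_t wildcard_id_visits(uint32_t max_id, uint32_t channels_walked)
{
	return (uint64_t)max_id * channels_walked;
}
```

With max_id = ~0 (4294967295 as a 32-bit value), one channel costs
4294967295 id visits and three channels cost 12884901885, so the total
grows linearly with the number of channels walked.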

Doing serial scans is very SCSI-2, so most discoverable device
fabrics don't bother and just take the default max_channel (which is
zero). The only devices that seem to care about this at all are fat
firmware devices that bundle RAID or other capabilities by
repurposing channels, and they seem to be the ones that want this
behaviour:

https://lore.kernel.org/linux-scsi/CAFdVvOwjy+2ORJ6uJkspiLTPF05481U7gcS4QohFOFGPqAs8ig@xxxxxxxxxxxxxx/

Regards,

James