Re: [PATCH] ata: sata_mv: Fix the return value of the probe function

From: Damien Le Moal
Date: Thu Oct 21 2021 - 21:42:02 EST


On 2021/10/21 20:23, Zheyu Ma wrote:
> On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 2021/10/21 17:37, Sergey Shtylyov wrote:
>>> On 21.10.2021 8:57, Zheyu Ma wrote:
>>>
>>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
>>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
>>>>
>>>> During driver probing, the probe function should return < 0
>>>> for failure; otherwise, the kernel will treat a value > 0 as success.
>>>>
>>>> Signed-off-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
>>>> ---
>>>> drivers/ata/sata_mv.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>>>> index 9d86203e1e7a..7461fe078dd1 100644
>>>> --- a/drivers/ata/sata_mv.c
>>>> +++ b/drivers/ata/sata_mv.c
>>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
>>>>
>>>>  	default:
>>>>  		dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
>>>> -		return 1;
>>>> +		return -ENODEV;
>>>
>>> Doesn't -EINVAL fit better here?
>>
>> If the error message is correct and this can only happen if there is a bug
>> somewhere, I do not think the error code really matters much. The dev_err()
>> should probably be changed to dev_alert() or even dev_crit() for this case.
>>
>
> I don't think so, the error code does matter. If mv_chip_id() returns
> 1 which eventually causes the probe function to return 1, then the
> kernel will assume that the driver and the hardware match successfully
> (even if that is not the case), which will cause the following error
> if modprobe is called to remove the driver.
>
> [ 21.944486] general protection fault, probably for non-canonical
> address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 21.945317] KASAN: null-ptr-deref in range
> [0x00000000000000d8-0x00000000000000df]
> [ 21.954442] Call Trace:
> [ 21.954624] ? scsi_remove_host+0x32/0x660
> [ 21.954923] ? lockdep_hardirqs_on+0x7e/0x110
> [ 21.955240] ? _raw_spin_unlock_irqrestore+0x30/0x60
> [ 21.955634] ? mutex_lock_io_nested+0x60/0x60
> [ 21.956027] ? _raw_spin_unlock_irqrestore+0x41/0x60
> [ 21.956395] ? async_synchronize_cookie_domain+0x35f/0x4a0
> [ 21.956802] ? async_synchronize_full_domain+0x20/0x20
> [ 21.957179] ? lock_release+0x63f/0x8f0
> [ 21.957468] mutex_lock_nested+0x1b/0x30
> [ 21.957761] scsi_remove_host+0x32/0x660
> [ 21.958054] ata_host_detach+0x75d/0x830
> [ 21.958349] ata_pci_remove_one+0x3b/0x40
> [ 21.958649] pci_device_remove+0xa9/0x250
> [ 21.958949] ? pci_device_probe+0x7d0/0x7d0
> [ 21.959261] device_release_driver_internal+0x4f7/0x7a0
> [ 21.959647] driver_detach+0x1e8/0x2c0
> [ 21.959929] bus_remove_driver+0x134/0x290
> [ 21.960234] ? sysfs_remove_groups+0x97/0xb0
> [ 21.960552] driver_unregister+0x77/0xa0
> [ 21.960859] pci_unregister_driver+0x2c/0x1c0
> [ 21.961178] cleanup_module+0x15/0x28 [sata_mv]

How do you trigger this? A bad device tree or something like that?

>
> This is not the case if the correct error code is returned.
>
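For reference, the failure mode described above can be sketched in user
space. This is only a simplified model, assuming (as the patch description
says) that the bus core treats any probe result that is not negative as a
successful bind. The names fake_dev, fake_probe(), fake_bus_probe() and
check_model() are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct fake_dev {
	bool bound;      /* the core believes a driver is attached */
	bool host_ready; /* the driver finished setting up its host state */
};

/* Hypothetical driver probe that bails out before initialising anything. */
int fake_probe(struct fake_dev *dev, int failure_code)
{
	(void)dev;           /* nothing was set up, host_ready stays false */
	return failure_code; /* 1 before the patch, -ENODEV after it */
}

/* Simplified model of the core: only a negative value counts as failure. */
int fake_bus_probe(struct fake_dev *dev, int failure_code)
{
	int rc = fake_probe(dev, failure_code);

	if (rc < 0) {
		dev->bound = false;
		return rc;
	}
	if (rc > 0)
		fprintf(stderr, "probe unexpectedly returned %d\n", rc);
	dev->bound = true; /* a positive value is treated as success */
	return 0;
}

/* Compare the two return values discussed in this thread. */
void check_model(void)
{
	struct fake_dev before = { false, false };
	struct fake_dev after = { false, false };

	/* "return 1": device ends up bound although nothing was initialised */
	assert(fake_bus_probe(&before, 1) == 0);
	assert(before.bound && !before.host_ready);

	/* "return -ENODEV": the failure is propagated, device stays unbound */
	assert(fake_bus_probe(&after, -ENODEV) == -ENODEV);
	assert(!after.bound);
}
```

In this model the "bound but not ready" state is the one where a later
remove callback would run against host state that was never set up, which
is consistent with the trace on module removal quoted above.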
>>>
>>> [...]
>>>
>>> MBR, Sergey
>>>
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>
> Regards,
> Zheyu Ma
>


--
Damien Le Moal
Western Digital Research