Re: [PATCH] mtd: partitions: Handle add_mtd_device() failures gracefully

From: Steve deRosier
Date: Tue Apr 10 2018 - 12:34:43 EST


On Tue, Apr 10, 2018 at 7:47 AM, Geert Uytterhoeven
<geert@xxxxxxxxxxxxxx> wrote:
> Hi Marek,
>
> On Tue, Apr 10, 2018 at 4:37 PM, Marek Vasut <marek.vasut@xxxxxxxxx> wrote:
>> On 04/10/2018 03:26 PM, Geert Uytterhoeven wrote:
>>> On Mon, Apr 9, 2018 at 11:59 PM, Marek Vasut <marek.vasut@xxxxxxxxx> wrote:
>>>> On 04/09/2018 02:25 PM, Geert Uytterhoeven wrote:
>>>>> Currently add_mtd_device() failures are plainly ignored, which may lead
>>>>> to kernel crashes later.
>>>
>>>>> Fix this by ignoring and freeing partitions that failed to add in
>>>>> add_mtd_partitions(). The same issue is present in mtd_add_partition(),
>>>>> so fix that as well.
>>>>>
>>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
>>>>> ---
>>>>> I don't know if it is worthwhile factoring out the common handling.
>>>>>
>>>>> Should allocate_partition() fail instead? There's a comment saying
>>>>> "let's register it anyway to preserve ordering".
>>>
>>>>> --- a/drivers/mtd/mtdpart.c
>>>>> +++ b/drivers/mtd/mtdpart.c
>>>
>>>>> @@ -746,7 +753,15 @@ int add_mtd_partitions(struct mtd_info *master,
>>>>> list_add(&slave->list, &mtd_partitions);
>>>>> mutex_unlock(&mtd_partitions_mutex);
>>>>>
>>>>> - add_mtd_device(&slave->mtd);
>>>>> + ret = add_mtd_device(&slave->mtd);
>>>>> + if (ret) {
>>>>> + mutex_lock(&mtd_partitions_mutex);
>>>>> + list_del(&slave->list);
>>>>> + mutex_unlock(&mtd_partitions_mutex);
>>>>> + free_partition(slave);
>>>>> + continue;
>>>>> + }
>>>>
>>>> Why is the partition even in the list in the first place ? Can we avoid
>>>> adding it rather than adding and removing it ?
>>>
>>> Hence my question "Should allocate_partition() fail instead?".
>>> Note that if we go that route, it should be a "soft" failure, as we
>>> probably don't
>>> want to drop all other partitions on the device.
>> Is the number of partitions ie. in /proc/mtdparts an ABI ?
>
> I don't know.
>

I don't know if it's an ABI, but having consistent /dev/mtdX numbering
is important, even in the case of a failed partition. Many scripts on
embedded systems are hard-coded to /dev/mtdX identifies with the
expectation that they can access a particular address region of flash.
I'm sure that's what the "let's register it anyway to preserve
ordering" comment was trying to get across. I've even seen weird
things in dts files where later entries specify earlier addresses in
order to leave the old /dev/mtdX numbering alone.

Obviously, a better user solution is to construct the mtdX number from
/proc/mtd based on filtering for the name field, but not everyone
does.

I'd be wary about doing any fix that disturbs the numbering as you'll
be disturbing users. At a minum, a loud warning in the log.

That said - obviously fixing the kernel crash must happen.

- Steve