Re: [PATCH 1/3] mtd: sharpsl: add sharpslpart MTD partition parser

From: Boris Brezillon
Date: Fri Jun 09 2017 - 03:17:08 EST


Hi Brian,

On Thu, 8 Jun 2017 19:32:51 -0700
Brian Norris <computersforpeace@xxxxxxxxx> wrote:

> Hi Boris and Andrea,
>
> Sorry I didn't thoroughly read through this earlier discussion before
> reviewing the later versions. I also don't want to rehash old
> disagreements. But I had a few questions.
>
> On Tue, Apr 18, 2017 at 11:35:56AM +0200, Boris Brezillon wrote:
> > On Tue, 18 Apr 2017 10:58:02 +0200
> > Andrea Adami <andrea.adami@xxxxxxxxx> wrote:
> > > On Mon, Apr 17, 2017 at 4:44 PM, Boris Brezillon <
> > > boris.brezillon@xxxxxxxxxxxxxxxxxx> wrote:
> > > > I'll try to document myself on the on-flash format of the FTL and
> > > > partition table before giving a definitive opinion, but I have the
> > > > feeling this ancient FTL is not 100% safe, and by accepting an old
> > > > (maybe unreliable?) FTL we are setting a precedent, and refusing other
> > > > (broken) proprietary/vendor FTLs will be almost impossible after that.
>
> IIUC, drivers/mtd/*ftl*.c shows several FTLs in the kernel... And I'll
> admit I've basically never reviewed them. I don't think they've really
> created much precedent either...except that I'm bringing them up right
> now ;) and possibly that no good alternatives have been developed,
> except for UBI (and IMO, UBI doesn't necessarily apply in all the same
> situations that some "boot partitions" might; you have to bootstrap
> *somewhere*).

Okay, so you're fine merging FTLs as long as the code is pretty and
it's already used on real devices, even if the FTL is badly designed?

>
> ...
>
> > > > On Sun, 16 Apr 2017 18:07:47 +0200
> > > > Marek Vasut <marek.vasut@xxxxxxxxx> wrote:
> > > >
> > > >> On 04/15/2017 10:11 PM, Andrea Adami wrote:
> > > >> > The Sharp SL Series (Zaurus) PXA handhelds have 16/64/128M of NAND
> > > flash
> > > >> > and share the same layout of the first 7M partition, managed by Sharp
> > > FTL.
> > > >> >
> > > >> > The purpose of this self-contained patch is to add a common parser and
> > > >> > remove the hardcoded sizes in the board files (these devices are not
> > > yet
> > > >> > converted to devicetree).
> > > >> > Users will have benefits because the mtdparts= tag will not be
> > > necessary
> > > >> > anymore and they will be free to repartition the little sized flash.
> > > >> >
> > > >> > The obsolete bootloader can not pass the partitioning info to modern
> > > >> > kernels anymore so it has to be read from flash at known logical
> > > addresses.
> > > >> > (see http://www.h5.dion.ne.jp/~rimemoon/zaurus/memo_006.htm )
>
> ...
>
> > > This is done with a special kernel (linux-kexecboot) embedding the minimal
> > > cpio and acting as 2nd stage bootloader.
> > > It passes the mtdparts found in cmdline and does the extra trick of reading
> > > it from mtd1 for zaurus.
> > > You can even customize the cmdline in /boot/boot.cfg and hack the mtdparts
> > > there.
> > >
> > > Neverthless, what you don't seem to understand is that I cannot force
> > > people to use kexecboot or to customize cmdline parts as I like...
> > > I do just build kernels and images for testing...I maintain the OE build
> > > infrastructure, not one distro.
> >
> > Well, you can say "if you want to use a mainline kernel, stop using
> > this sharp FTL+partition-table and start using a 2nd stage
> > bootloader like kexecboot". What do they use right now to boot a new
> > kernel (newer than the 2.4 one)?
>
> IIUC, that doesn't get anybody to "stop using the FTL"; it just gets
> them to stop parsing it in the kernel.

It does prevent new people to use it. At least it cannot be done
easily. Once it's in mainline, anyone will be able to activate this
FTL+parser code and use it on their device.

> They would still be parsing it in
> userspace, just to generate new parameters for the kexec'd kernel? Seems
> like a lot of bloat for zero gain.
>
> ...
>
> > > > Just going through all these details to say that, IMO, we should only
> > > > consider inclusion of this feature if we think it's safe, because I
> > > > think all that is done here can be done from user-space.
>
> Wait, why is "it can be done from user-space" relevant to safety? The
> end solution isn't any better, just because you layer user-space in
> between to do the parsing job. Or am I misunderstanding?

It's definitely not safer to do it in userspace, it's just about not
merging ancient FTLs in mainline without carefully reviewing them first.
I never said I would reject this FTL, I just said I wanted to review
the design and decide after that. But when I said that, all I got in
return was "no matter if it's safe or not, it has to be upstreamed
because it's used on real devices". Well, I disagree with this kind of
statement. Does that mean we should support all vendor FTLs just
because they are shipped on devices? I'm not against FTLs in general,
but shouldn't we decide to merge them only after reviewing their design
and making sure they can be safely used?

>
> > > Read only is safe.
> >
> > This is a lie.
>
> I don't see where you've disproved the claim of safety. The following
> all seems to be a non sequitur.

Just because it's only read-only from the kernel point of view, not in
general.

>
> But my understanding (about the "safety" question) is that this whole
> FTL construct is:
> (a) required by the bootloader and
> (b) not touched outside the bootloader, except (mostly, barring the
> advanced tooling that people use infrequently, for installation?) read
> only in a parser like this proposed one

Andrea, maybe I'm wrong, but I had the impression you had tools to
update partitions embedded in the FTL (I'm not talking about partitions
defined in the partition table, but the FTL embeds several sections,
including one which is storing a kernel image).

>
> (Please correct me if I'm wrong.)
>
> I don't think we can reasonably disallow (a); it's often difficult or
> impossible to replace bootloaders.

I agree.

>
> If (b) is true, then I don't see why this is a *huge* issue. Yes,
> read-only NAND is technically not immune to unreliability issues like
> read disturb, but judging by the lifetime of these products (they're
> still being used, with out-of-tree parsing, no?) it can't have been too
> bad.

No, I was really taking about write accesses that can happen on top of
the FTL. If there's a tool to update the kernel partition embedded in
the FTL, and the FTL design has some flaws, then you might corrupt the
partition table even though you were not directly manipulating it.

>
> And about technical details: based on my review of what we're parsing
> here, the most worrisome part is that the logical mapping of the FTL is
> stored in OOB. But this all is of the same (or older?) era as JFFS2, so
> that's also not highly unusual. And it does at least have several
> (non-ECC) means of error detection (storing redundant versions of the
> FTL mapping within the same page; a parity check on the mapping offsets;
> and multiple copies of the partition table), though I'm not sure how
> well it will hold up for error *correction*.

Okay.

>
> So, I'm not super happy with the format, and I wouldn't suggest it on
> any new products (esp. considering that NAND has only gotten more
> unreliable in the meantime), but I don't think it's as bad as you seem
> to think.

Really, if you look back at my initial reply, you'll see that I was
willing to review the FTL design before giving a definitive opinion.
But being told that, no matter the conclusion of this review, the FTL
had to be supported because 'it's used on real devices' did not
encourage me to finish my review.

>
> > AFAIU, you have all the necessary tools to update the
> > partition table from user-space, so even if you only have read-only
> > support in the kernel, one can corrupt it from userspace, and the
> > kernel may not be able to recover from this corruption.
>
> How is this any different from, e.g., GPT on block devices? Just because
> user space can clobber the partition table doesn't mean we don't allow
> GPT.

If you can update other partitions embedded in the FTL (like the kernel
partition) independently, and those updates can corrupt your partition
table, then it's a bit different from the example you're giving, because
the user did not asked for a partition table update and may end up with
a device that do not boot.

>
> Or are you implying issues with read disturb and the like?

Nope, I was really talking about simple read/write consideration,
update atomicity, ... The sort of things taken care of by something
like UBI.

>
> > Honestly, if we want to support this FTL+partition-table-format in the
> > kernel, I'd recommend that we add RW support, otherwise you'll keep
> > having those external tools.
>
> I don't understand that point, and I don't think Andrea did either.

My point is that, if we decide to support this FTL in the kernel, why
keeping those user-space tools to update the partition table (and other
parts available in this FTL). Having all the logic maintained in the
same place makes code maintenance easier (when a bug is detected, you
don't have to update 2 different code base for example).

Anyway, I'm glad someone else finally had a look at this code, and as I
said earlier, I won't block it, so we're all good.

Thanks,

Boris