Re: [PATCH v1 4/5] mtd: rawnand: meson: clear OOB buffer before read

From: Miquel Raynal
Date: Wed May 03 2023 - 04:05:17 EST


Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Tue, 2 May 2023 19:13:38 +0300:

> On 02.05.2023 16:05, Miquel Raynal wrote:
> > Hi Arseniy,
> >
> > avkrasnov@xxxxxxxxxxxxxx wrote on Tue, 2 May 2023 15:24:09 +0300:
> >
> >> On 02.05.2023 15:17, Miquel Raynal wrote:
> >>> Hi Arseniy,
> >>>
> >>> Richard, your input is welcome below :-)
> >>>
> >>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see:
> >>>>>>>>>>> 0) First attempt to mount JFFS2.
> >>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done.
> >>>>>>>>>>> 2) Umount JFFS2. Done.
> >>>>>>>>>>> 3) Second attempt to mount JFFS2.
> >>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done.
> >>>>>>>>>>> 5) It reads page N in ECC mode, and i get:
> >>>>>>>>>>>      jffs2: mtd->read(0x100 bytes from N) returned ECC error
> >>>>>>>>>>> 6) Mount failed.
> >>>>>>>>>>>
> >>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is
> >>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers.
> >>>>>>>>>
> >>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error.
> >>>>>>>>> does it have to use raw write/read on step 1) and 4)?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then
> >>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2
> >>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page.
> >>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken
> >>>>>>>> ECC codes.
> >>>>>>>>
> >>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts
> >>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without
> >>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work
> >>>>>>>> separately.
> >>>>>>>
> >>>>>>> The fact that there might be helpers just for writing OOB areas or just
> >>>>>>> in-band areas are optimizations. NAND pages are meant to be written a
> >>>>>>> single time, no matter what portion you write. In some cases, it is
> >>>>>>> possible to perform subpage writes if the chip supports it. Pages may
> >>>>>>> be split into several areas which cover a partial in-band area *and* a
> >>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas
> >>>>>>> of a given subpage, you *cannot* write the other part later without
> >>>>>>
> >>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries
> >>>>>> to write page after writing clean markers to it before? In the old vendor's
> >>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works
> >>>>>> correctly.
> >>>>>
> >>>>> Can you point the code you're mentioning? (both what JFFS2 which looks
> >>>>> strange to you and the old vendor hack)
> >>>>
> >>>> Here is version of the old vendor's driver:
> >>>>
> >>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260
> >>>>
> >>>> In my version there is no BUG() there, but it is same driver for the same chip.
> >>>>
> >>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first
> >>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so
> >>>> may be it is unexpected behaviour of JFFS2?
> >>>
> >>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here.
> >>>
> >>> Are you sure you flash is recognized by JFFS2 as being a NAND device?
> >>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because
> >>> cleanmarker seem to be discarded when using a NAND device, and
> >>> recognizing the device as a NAND device requires the above option to be
> >>> set apparently.
> >>
> >> Yes, I have
> >>
> >> CONFIG_JFFS2_FS_WRITEBUFFER=y
> >>
> >> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This
> >> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing:
> >> cleanmarkers are not discarded with NAND device.
> >
> > Excellent. So when cleanmarker_size == 0, it means there is no
> > cleanmarker. But if it is a NAND device, we write the marker anyway.
> >
> > Well I guess it used to work on old controllers using a Hamming ECC
> > engine not protecting any user OOB bytes, so writing the clean markers
> > would simply not lead to ECC bytes being produced/written. Or it might
> > have worked as well on controller drivers not enabling the ECC engine
> > when performing OOB-only writes. It also requires the chip to be old
> > enough to support multiple writes on the same (sub)page as long as the
> > written bits do not overlap?
>
> Yes, with controller which supports such modes there will be no problem here!
> What i see, is that this controller doesn't support multiple writes to the
> same page in ECC mode(e.g. it can't update ECC correctly).

I don't think this is a controller limitation. The NAND chip cannot
write ECC bytes a first time and then overwrite other ECC bytes, that
cannot work. The fact that we write ECC bytes in the first place is
because the ECC engine covers the free OOB bytes used by JFFS2 to write
its cleanmarkers.

> So in v2 i've added
> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly.

I am sorry but the above sentence is not clear to me. I believe you
meant the free OOB bytes are moved outside of the area protected by the
ECC engine. In this case I guess it should be fine.

> So for me main question here is:
>
> How JFFS2 should work with controllers where we can't update data and OOB
> independently? Driver of this filesystem knows nothing about this features of
> the controller.
>
> Or JFFS2 works incorrectly in my case when it tries to call write page callback
> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above).
>
> Or may be it is better to suppress OOB write callback (or set it to NULL) in this
> driver as in vendor's driver?

I would assume using the unprotected free OOB bytes to store the
cleanmarkers should work. But that's a bit fragile and very filesystem
oriented. I don't like this much. But on the other side JFFS2 is
legacy, you should use UBI (which does not play with OOB areas) :-)

Thanks,
Miquèl

>
> Thanks, Arseniy
>
> >
> > Perhaps that's what the hack in the old driver is for. But that's
> > IMHO broken in case of unexpected reboot :-)
> >
> > Miquèl