Re: [PATCH v1 4/5] mtd: rawnand: meson: clear OOB buffer before read

From: Miquel Raynal
Date: Thu May 04 2023 - 08:18:02 EST


Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Thu, 4 May 2023 14:37:45 +0300:

> On 03.05.2023 13:23, Arseniy Krasnov wrote:
> >
> >
> > On 03.05.2023 11:03, Miquel Raynal wrote:
> >> Hi Arseniy,
> >>
> >> avkrasnov@xxxxxxxxxxxxxx wrote on Tue, 2 May 2023 19:13:38 +0300:
> >>
> >>> On 02.05.2023 16:05, Miquel Raynal wrote:
> >>>> Hi Arseniy,
> >>>>
> >>>> avkrasnov@xxxxxxxxxxxxxx wrote on Tue, 2 May 2023 15:24:09 +0300:
> >>>>
> >>>>> On 02.05.2023 15:17, Miquel Raynal wrote:
> >>>>>> Hi Arseniy,
> >>>>>>
> >>>>>> Richard, your input is welcome below :-)
> >>>>>>
> >>>>>>>>>>>>>> I just checked JFFS2 mount/umount again, here is what i see:
> >>>>>>>>>>>>>> 0) First attempt to mount JFFS2.
> >>>>>>>>>>>>>> 1) It writes OOB to page N (i'm using raw write). It is cleanmarker value 0x85 0x19 0x03 0x20. Mount is done.
> >>>>>>>>>>>>>> 2) Umount JFFS2. Done.
> >>>>>>>>>>>>>> 3) Second attempt to mount JFFS2.
> >>>>>>>>>>>>>> 4) It reads OOB from page N (i'm using raw read). Value is 0x85 0x19 0x03 0x20. Done.
> >>>>>>>>>>>>>> 5) It reads page N in ECC mode, and i get:
> >>>>>>>>>>>>>>      jffs2: mtd->read(0x100 bytes from N) returned ECC error
> >>>>>>>>>>>>>> 6) Mount failed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We already had problem which looks like this on another device. Solution was to use OOB area which is
> >>>>>>>>>>>>>> not covered by ECC for JFFS2 cleanmarkers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ok, so there is not ECC parity bytes and mtd->read() returns ECC error.
> >>>>>>>>>>>> does it have to use raw write/read on step 1) and 4)?
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> If i'm using non raw access to OOB, for example write OOB (user bytes) in ECC mode, then
> >>>>>>>>>>> steps 1) and 4) and 5) passes ok, but write to this page will be impossible (for example JFFS2
> >>>>>>>>>>> writes to such pages later) - we can't update ECC codes properly without erasing whole page.
> >>>>>>>>>>> Write operation will be done without problem, but read will trigger ECC errors due to broken
> >>>>>>>>>>> ECC codes.
> >>>>>>>>>>>
> >>>>>>>>>>> In general problem that we discuss is that in current implementation data and OOB conflicts
> >>>>>>>>>>> with each other by sharing same ECC codes, these ECC codes could be written only once (without
> >>>>>>>>>>> erasing), while data and OOB has different callbacks to access and thus supposed to work
> >>>>>>>>>>> separately.
> >>>>>>>>>>
> >>>>>>>>>> The fact that there might be helpers just for writing OOB areas or just
> >>>>>>>>>> in-band areas are optimizations. NAND pages are meant to be written a
> >>>>>>>>>> single time, no matter what portion you write. In some cases, it is
> >>>>>>>>>> possible to perform subpage writes if the chip supports it. Pages may
> >>>>>>>>>> be split into several areas which cover a partial in-band area *and* a
> >>>>>>>>>> partial OOB area. If you write into the in-band *or* out-of-band areas
> >>>>>>>>>> of a given subpage, you *cannot* write the other part later without
> >>>>>>>>>
> >>>>>>>>> Thanks for details! So in case of JFFS2 it looks like strange, that it tries
> >>>>>>>>> to write page after writing clean markers to it before? In the old vendor's
> >>>>>>>>> driver OOB write callback is suppressed by return 0 always and JFFS2 works
> >>>>>>>>> correctly.
> >>>>>>>>
> >>>>>>>> Can you point the code you're mentioning? (both what JFFS2 which looks
> >>>>>>>> strange to you and the old vendor hack)
> >>>>>>>
> >>>>>>> Here is version of the old vendor's driver:
> >>>>>>>
> >>>>>>> https://github.com/kszaq/linux-amlogic/blob/master_new_amports/drivers/amlogic/nand/nand/aml_nand.c#L3260
> >>>>>>>
> >>>>>>> In my version there is no BUG() there, but it is same driver for the same chip.
> >>>>>>>
> >>>>>>> About JFFS2 - i didn't check its source code, but what I can see using printk(), is that it first
> >>>>>>> tries to write cleanmarker using OOB write callback. Then later it tries to write to this page, so
> >>>>>>> may be it is unexpected behaviour of JFFS2?
> >>>>>>
> >>>>>> TBH I am not knowledgeable about JFFS2, maybe Richard can help here.
> >>>>>>
> >>>>>> Are you sure you flash is recognized by JFFS2 as being a NAND device?
> >>>>>> Did you enable CONFIG_JFFS2_FS_WRITEBUFFER correctly? Because
> >>>>>> cleanmarker seem to be discarded when using a NAND device, and
> >>>>>> recognizing the device as a NAND device requires the above option to be
> >>>>>> set apparently.
> >>>>>
> >>>>> Yes, I have
> >>>>>
> >>>>> CONFIG_JFFS2_FS_WRITEBUFFER=y
> >>>>>
> >>>>> And i see, that jffs2_mark_erased_block() calls jffs2_cleanmarker_oob() which checks that we have MTD_NANDFLASH. This
> >>>>> check is true, so then jffs2_write_nand_cleanmarker() is called and there is OOB write in it. So I see opposite thing:
> >>>>> cleanmarkers are not discarded with NAND device.
> >>>>
> >>>> Excellent. So when cleanmarker_size == 0, it means there is no
> >>>> cleanmarker. But if it is a NAND device, we write the marker anyway.
> >>>>
> >>>> Well I guess it used to work on old controllers using a Hamming ECC
> >>>> engine not protecting any user OOB bytes, so writing the clean markers
> >>>> would simply not lead to ECC bytes being produced/written. Or it might
> >>>> have worked as well on controller drivers not enabling the ECC engine
> >>>> when performing OOB-only writes. It also requires the chip to be old
> >>>> enough to support multiple writes on the same (sub)page as long as the
> >>>> written bits do not overlap?
> >>>
> >>> Yes, with controller which supports such modes there will be no problem here!
> >>> What i see, is that this controller doesn't support multiple writes to the
> >>> same page in ECC mode(e.g. it can't update ECC correctly).
> >>
> >> I don't think this is a controller limitation. The NAND chip cannot
> >> write ECC bytes a first time and then overwrite other ECC bytes, that
> >> cannot work. The fact that we write ECC bytes in the first place is
> >> because the ECC engine covers the free OOB bytes used by JFFS2 to write
> >> its cleanmarkers.
> >>
> >>> So in v2 i've added
> >>> patch which moves OOB out of ECC area, thus JFFS2 driver will work correctly.
> >>
> >> I am sorry but the above sentence is not clear to me. I believe you
> >> meant the free OOB bytes are moved outside of the area protected by the
> >> ECC engine. In this case I guess it should be fine.
> >
> > Exactly, free bytes which are reported by OOB layout callbacks were moved out of
> > ECC area.
> >
> >>
> >>> So for me main question here is:
> >>>
> >>> How JFFS2 should work with controllers where we can't update data and OOB
> >>> independently? Driver of this filesystem knows nothing about this features of
> >>> the controller.
> >>>
> >>> Or JFFS2 works incorrectly in my case when it tries to call write page callback
> >>> after calling write OOB callback (IIUC it is better to ask Richard as You mentioned above).
> >>>
> >>> Or may be it is better to suppress OOB write callback (or set it to NULL) in this
> >>> driver as in vendor's driver?
> >>
> >> I would assume using the unprotected free OOB bytes to store the
> >> cleanmarkers should work. But that's a bit fragile and very filesystem
> >> oriented. I don't like this much. But on the other side JFFS2 is
> >> legacy, you should use UBI (which does not play with OOB areas) :-)
> >
> > Problem here is that we can't use UBI in this case, because it does not support
> > small fs images. So the only way to make JFFS2 work is to move free OOB bytes to
> > non protected area. Otherwise i think we have strange situation that JFFS2 can't
> > work correctly on specific type on NAND controller. We already had same problem
> > on another NAND controller, and solution was to move OOB free bytes no non-protected
> > area:
> >
> > https://lore.kernel.org/all/20230329114240.378722-1-mmkurbanov@xxxxxxxxxxxxxx/
> >
> > Thanks, Arseniy
>
> Upd: may be i can add option for this driver, which makes JFFS2 work correctly on this chip.
> This feature suppresses OOB writes as in old driver. By default it is disabled and OOB is
> ECC protected(current behaviour), if enabled - it prints WARN_ONCE() and always returns 0.
> What do You think?
>
> Or may be add an option, which moves free bytes of OOB to ECC non-protected area and it is disabled
> by default.

I prefer having a single ooblayout where we expose unprotected user OOB
bytes only. As of today, the only upstream user of user OOB bytes is
JFFS2 anyway.

Thanks,
Miquèl