Re: [PATCH v1 4/5] mtd: rawnand: meson: clear OOB buffer before read

From: Liang Yang
Date: Tue Apr 18 2023 - 08:23:48 EST


Hi Arseniy and Miquel,

On 2023/4/18 13:12, Arseniy Krasnov wrote:



On 13.04.2023 13:35, Arseniy Krasnov wrote:


On 13.04.2023 13:22, Miquel Raynal wrote:
Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Thu, 13 Apr 2023 12:36:24 +0300:

On 13.04.2023 11:22, Miquel Raynal wrote:
Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Thu, 13 Apr 2023 10:00:24 +0300:
On 13.04.2023 09:11, Liang Yang wrote:

On 2023/4/13 13:32, Liang Yang wrote:
Hi Miquel,

On 2023/4/12 22:32, Miquel Raynal wrote:

Hello,

liang.yang@xxxxxxxxxxx wrote on Wed, 12 Apr 2023 22:04:28 +0800:
Hi Miquel and Arseniy,

On 2023/4/12 20:57, Miquel Raynal wrote:

Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Wed, 12 Apr 2023 15:22:26 +0300:
On 12.04.2023 15:18, Miquel Raynal wrote:
Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Wed, 12 Apr 2023 13:14:52 +0300:
On 12.04.2023 12:36, Miquel Raynal wrote:
Hi Arseniy,

avkrasnov@xxxxxxxxxxxxxx wrote on Wed, 12 Apr 2023 12:20:55 +0300:
On 12.04.2023 10:44, Miquel Raynal wrote:
Hi Arseniy,

AVKrasnov@xxxxxxxxxxxxxx wrote on Wed, 12 Apr 2023 09:16:58 +0300:
This NAND reads only few user's bytes in ECC mode (not full OOB), so

"This NAND reads" does not look right, do you mean "Subpage reads do
not retrieve all the OOB bytes,"?
fill OOB buffer with zeroes to not return garbage from previous reads
to user.
Otherwise 'nanddump' utility prints something like this for just erased
page:

...
0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    OOB Data: ff ff ff ff 00 00 ff ff 80 cf 22 99 cb ad d3 be
    OOB Data: 63 27 ae 06 16 0a 2f eb bb dd 46 74 41 8e 88 6e
    OOB Data: 38 a1 2d e6 77 d4 05 06 f2 a5 7e 25 eb 34 7c ff
    OOB Data: 38 ea de 14 10 de 9b 40 33 16 6a cc 9d aa 2f 5e

Signed-off-by: Arseniy Krasnov <AVKrasnov@xxxxxxxxxxxxxx>
---
   drivers/mtd/nand/raw/meson_nand.c | 5 +++++
   1 file changed, 5 insertions(+)

diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c
index f84a10238e4d..f2f2472cb511 100644
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -858,9 +858,12 @@ static int meson_nfc_read_page_sub(struct nand_chip *nand,
   static int meson_nfc_read_page_raw(struct nand_chip *nand, u8 *buf,
                      int oob_required, int page)
   {
+    struct mtd_info *mtd = nand_to_mtd(nand);
       u8 *oob_buf = nand->oob_poi;
       int ret;
+    memset(oob_buf, 0, mtd->oobsize);

I'm surprised raw reads do not read the entire OOB?

Yes! It seems that in the case of raw access (from what I see in this driver) the number of OOB bytes read
still depends on the ECC parameters: for each portion of data covered by an ECC code we can
read its ECC code and "user bytes" from OOB. This is what I see when dumping the DMA buffer with
printk(). For example, I'm working with 2K NAND pages, and each page has 2 x 1K ECC blocks.
For each ECC block I have 16 OOB bytes which I can access by read/write. Each 16 bytes
contain 2 bytes of user data and 14 bytes of ECC code. So when I read a page in raw mode, the
controller returns 32 bytes (2 x (2 + 14)) of OOB, while the OOB size is reported as 64 bytes.
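
The byte accounting above can be written down as a small sketch. The helper names below are hypothetical and do not exist in meson_nand.c; they only restate the arithmetic (one group of user + ECC bytes per ECC step) so the gap between what the controller returns and the reported OOB size is explicit.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; they are not part of meson_nand.c. */

/* OOB bytes the controller returns: one (user + ECC) group per ECC step. */
static size_t oob_bytes_returned(size_t ecc_steps, size_t user_bytes,
                                 size_t ecc_bytes)
{
    return ecc_steps * (user_bytes + ecc_bytes);
}

/* OOB bytes the chip reports but the controller does not return. */
static size_t oob_bytes_missing(size_t oobsize, size_t ecc_steps,
                                size_t user_bytes, size_t ecc_bytes)
{
    size_t got = oob_bytes_returned(ecc_steps, user_bytes, ecc_bytes);

    return oobsize > got ? oobsize - got : 0;
}
```

With the numbers from this mail (2 ECC steps, 2 user bytes, 14 ECC bytes, 64-byte OOB), both helpers evaluate to 32.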

In all modes, when you read OOB, you should get the full OOB. The fact
that ECC correction is enabled or disabled does not matter. If the NAND
features OOB sections of 64 bytes, you should get the 64 bytes.

What happens sometimes, is that some of the bytes are not protected
against bitflips, but the policy is to return the full buffer.

Ok, so to clarify case for this NAND controller:
1) In both ECC and raw modes I need to return the same raw OOB data (e.g. user bytes
     + ECC codes)?

Well, you need to cover the same amount of data, yes. But in the ECC
case the data won't be raw (at least not all of it).

So, regarding "same amount of data": in ECC mode the current implementation returns only user OOB bytes (i.e.
OOB data excluding ECC codes), while in raw mode it returns user bytes + ECC codes. IIUC the correct
behaviour is to always return user bytes + ECC codes as OOB data, even in ECC mode?

If the pages are 2k+64B, you should read 2k+64B when OOB is requested.

If the controller only returns 2k+32B, then perform a random read to
just move the read pointer to mtd->size + mtd->oobsize - 32 and
retrieve the missing 32 bytes?

1) A raw read can read out the whole 2k+64B page; the amount is decided by the len field in the controller raw read command:
    cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir);
After that, the missing (unused) OOB bytes can be copied from meson_chip->data_buf. So meson_nfc_read_page_raw() could be implemented like this if needed:
    {
        ......
        meson_nfc_read_page_sub(nand, page, 1);
        meson_nfc_get_data_oob(nand, buf, oob_buf);
        oob_len = (nand->ecc.bytes + 2) * nand->ecc.steps;
        memcpy(oob_buf + oob_len, meson_chip->data_buf + oob_len, mtd->oobsize - oob_len);

    }
2) In ECC mode, the controller can't bring back the missing OOB bytes. It can read out only the user bytes and ECC bytes, as defined by meson_ooblayout_ops.

And then (if oob_required) you can bring in the missing bytes with
something along the lines of:
nand_change_read_column_op(chip, mtd->writesize + oob_len,
               oob_buf + oob_len,
               mtd->oobsize - oob_len,
               false);
Should not be a huge performance hit.
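
To make the offsets in that call concrete, here is a minimal user-space model. It is only a sketch: model_read_column() is a hypothetical stand-in for a column-change read and simply copies from an in-memory page image, and the assumption is that the bytes not returned by the controller sit at the very end of the page, starting at column writesize + oob_len.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a column-change read: copy 'len' bytes that
 * start at 'column' of an in-memory image of the page (data + OOB).
 */
static void model_read_column(const unsigned char *page, size_t column,
                              unsigned char *dst, size_t len)
{
    memcpy(dst, page + column, len);
}

/*
 * Fill the tail of oob_buf with the bytes the controller did not return.
 * 'oob_len' is the number of OOB bytes already present in oob_buf.
 * Returns the number of bytes fetched.
 */
static size_t fill_missing_oob(const unsigned char *page, size_t writesize,
                               size_t oobsize, size_t oob_len,
                               unsigned char *oob_buf)
{
    if (oob_len >= oobsize)
        return 0;

    model_read_column(page, writesize + oob_len, oob_buf + oob_len,
                      oobsize - oob_len);
    return oobsize - oob_len;
}
```

For a 2k+64B page with oob_len = 32, the model reads 32 bytes starting at column 2080 (0x820) and leaves the first 32 bytes of oob_buf untouched.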

After an ECC-mode read finishes, the column address inside the NAND device should already be at the right position, so the column does not need to be changed again. Adding a controller raw read for the missing bytes after the ECC read may therefore work.
We can use a raw read for the missing bytes, but they are not protected by host ECC. For NAND storage, is that OK, or would it be better to fill the missing bytes with 0xff?

IIUC Miquèl's reply, valid behaviour is to return full OOB data in both modes. So in
ECC mode it is user bytes (corrected by ECC, read from the info buffer) + ECC + missing bytes, where the ECC and missing bytes are read in raw mode.

I believe the ECC bytes you'll get will be corrected.
You can check this by using nandflipbits in mtd-utils.

Sorry, I didn't get it; I'm new to the NAND area. The ECC code bytes are available only in raw mode (at least in this NAND
driver), as are the missing OOB bytes.

Gasp. Yeah that's a controller limitation, okay.

So IIUC ECC codes are metadata to correct data bytes, and thus
couldn't be corrected.

We consider them metadata, but they are fully part of the ECC scheme
and thus their correction is part of the process, bitflips in the ECC
bytes will count as data bitflips actually.

I talked a bit about ECC engines at a previous conference if it can
help:
https://elinux.org/ELC_Europe_2020_Presentations
'Understand ECC Support for NAND Flash Devices in Linux'
And also wrote a blog post with a chapter about ECC engines:
https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/


Thanks for this!

Thanks, Arseniy

Hello again @Liang @Miquel!

One more question about OOB access, as I can see current driver uses the following
callbacks:

nand->ecc.write_oob_raw = nand_write_oob_std;
nand->ecc.write_oob = nand_write_oob_std;


Function 'nand_write_oob_std()' writes data to the end of the page. But as I
can see by dumping 'data_buf' during read, physical layout of each page is the
following (1KB ECC):

0x000: [ 1 KB of data ]
0x400: [ 2B user data] [ 14B ECC code]
0x410: [ 1 KB of data ] (A)
0x810: [ 2B user data] [ 14B ECC code]
0x820: [ 32B unused ]



So, after 'nand_write_oob_std()' (let data be sequence from [0x0 ... 0x3f]),
page will look like this:

0x000: [ 0xFF ]
0x400: [ ........ ]
0x7f0: [ 0xFF ]
0x800: [ 00 ....................... ]
0x830: [ ........................ 3f ]

Here we have two problems:
1) An attempt to display raw data with the 'nanddump' utility produces slightly
wrong output, as the driver relies on layout (A) from above, i.e. OOB data
is at 0x400 and 0x810. Here is an example (attempt to write 0x11 0x22 0x33 0x44):

0x000007f0: 11 22 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |."..............|
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
OOB Data: 33 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |3D..............|
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

Hi Arseniy,

I realized that write_oob_raw() and write_oob() are wrong in meson_nand.c. I suggest both of them be reworked to follow the format of the Meson NAND controller, i.e. first format the data in layout (A) and then write it. Reading means first reading the data in layout (A) and then composing layout (B).
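
A rough sketch of that formatting step, assuming the geometry quoted in this thread (1 KB of data followed by 2 user bytes and 14 ECC bytes per step). All names here are hypothetical and this is not the driver's code; the ECC bytes are simply left alone, since how they get produced is outside this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Geometry assumed from this thread: 1 KB data + 2 user + 14 ECC per step. */
enum {
    STEP_DATA = 1024,
    STEP_USER = 2,
    STEP_ECC  = 14,
    STEP_OOB  = STEP_USER + STEP_ECC
};

/* Offset of the data of ECC step 'i' inside the interleaved page image. */
static size_t step_data_offset(size_t i)
{
    return i * (STEP_DATA + STEP_OOB);
}

/* Offset of the user bytes of ECC step 'i' (right after its data). */
static size_t step_oob_offset(size_t i)
{
    return step_data_offset(i) + STEP_DATA;
}

/* Flat data + flat user bytes -> interleaved page image (layout (A)). */
static void pack_interleaved(const unsigned char *data,
                             const unsigned char *user, size_t steps,
                             unsigned char *page)
{
    for (size_t i = 0; i < steps; i++) {
        memcpy(page + step_data_offset(i), data + i * STEP_DATA, STEP_DATA);
        memcpy(page + step_oob_offset(i), user + i * STEP_USER, STEP_USER);
        /* The 14 ECC bytes after the user bytes are not touched here. */
    }
}

/* Interleaved page image (layout (A)) -> flat data + flat user bytes. */
static void unpack_interleaved(const unsigned char *page, size_t steps,
                               unsigned char *data, unsigned char *user)
{
    for (size_t i = 0; i < steps; i++) {
        memcpy(data + i * STEP_DATA, page + step_data_offset(i), STEP_DATA);
        memcpy(user + i * STEP_USER, page + step_oob_offset(i), STEP_USER);
    }
}
```

For two steps the offsets come out as 0x000, 0x400, 0x410 and 0x810, matching layout (A), and the unused tail starts at 0x820.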



2) Attempt to read data in ECC mode will fail, because IIUC page is in dirty
state (I mean was written at least once) and NAND controller tries to use
ECC codes at 0x400 and 0x810, which are obviously broken in this case. Thus

As I said above, write_oob_raw() and write_oob() should be reworked.
I don't know what you mean by "the page was written at least once". Anyway, the page should be written once, even if only by write_oob_raw().

we have a strange situation: the OOB seems to be written without any errors, but we can't
read this page. The first idea is to write OOB data to 0x400 and 0x810 in raw mode,
but this does not work: if there is some data, the NAND controller will try to
use the ECC engine to check these user bytes on the next attempt to read this page. But
as these 4 bytes were written in raw mode, the ECC codes are missing.

We suggest the following thing: use only area at 0x820 for OOB (see A) - it is not covered
by ECC engine, so write to this zone won't conflict with ECC in future. In this case
we change 'meson_ooblayout_free()' function which instead of describing 2 user bytes
for each ECC block, will return 16 tail bytes for each ECC block.
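
As a rough illustration of this proposal (not the actual 'meson_ooblayout_free()' code), the free regions could be computed as below. The struct names are hypothetical stand-ins, with oob_region modelled on the kernel's struct mtd_oob_region, and the offsets assume a flat OOB buffer in which the controller-managed user + ECC bytes come first and the unused tail follows.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in modelled on the kernel's struct mtd_oob_region. */
struct oob_region {
    size_t offset;
    size_t length;
};

/* Assumed geometry: total OOB size, ECC steps, user + ECC bytes per step. */
struct oob_geometry {
    size_t oobsize;
    size_t ecc_steps;
    size_t step_oob;
};

/*
 * Describe the free region for 'section': an equal share of the OOB tail
 * that is not covered by the ECC engine. Returns 0 or -ERANGE.
 */
static int tail_free_region(const struct oob_geometry *g, size_t section,
                            struct oob_region *region)
{
    size_t used = g->ecc_steps * g->step_oob;

    if (g->ecc_steps == 0 || used > g->oobsize || section >= g->ecc_steps)
        return -ERANGE;

    region->length = (g->oobsize - used) / g->ecc_steps;
    region->offset = used + section * region->length;
    return 0;
}
```

For a 64-byte OOB with 2 steps of 16 bytes each, section 0 maps to offset 32 and section 1 to offset 48, each 16 bytes long.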

What do you think?

The key point is that the data at 0x820-0x840 is not protected by host ECC, so I don't think we have to change it.

Thanks,
Liang




Thanks,
Miquèl