mtd: nand: raw: Possible bug in nand_onfi_detect()?

From: Alexander Dahl
Date: Wed Mar 06 2024 - 09:45:37 EST


Hello everyone,

I think I found a bug in nand_onfi_detect() which was introduced with
commit c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page
read to constraint controllers") back in 2020.

Background on how I found this: I'm currently struggling to get raw
NAND flash access working with an at91 sam9x60 SoC and a Spansion
S34ML02G1 SLC raw NAND flash on a custom board. The setup is
comparable to the sam9x60 curiosity board, and the problem can be
reproduced on that board.

NAND flash on the sam9x60 curiosity board works fine with what is in
the mainline Linux kernel. However, after removing the line 'rb-gpios =
<&pioD 5 GPIO_ACTIVE_HIGH>;' from at91-sam9x60_curiosity.dts all data
read from the flash appears to be zeros only. (I did not add that
line to the dts of my custom board at first; this is how I stumbled
over this.)

I have no explanation for that behaviour. It should work without R/B#
by reading the status register instead; maybe we can investigate that
in depth later. However, those all-zeros reads happen when reading the
ONFI param page as well as when reading data from the OOB/spare area
later, and I bet it's the same with usual data.

This read error reveals a bug in nand_onfi_detect(). After setting
up some things there's this for loop:

for (i = 0; i < ONFI_PARAM_PAGES; i++) {

For i = 0, nand_read_param_page_op() is called. In my case all zeros
are returned, so the CRC calculated over the page does not match the
all-zeros CRC read from it. The usual break after successfully reading
the first page is therefore skipped, and nand_change_read_column_op()
is called to read the second page. I think that one always fails on
this line:

if (offset_in_page + len > mtd->writesize + mtd->oobsize) {

Those variables contain the following values:

offset_in_page: 256
len: 256
mtd->writesize: 0
mtd->oobsize: 0

The condition is true and nand_change_read_column_op() returns with
-EINVAL, because mtd->writesize and mtd->oobsize are not set yet in
that code path. Those are probably initialized later, maybe with
parameters read from that ONFI param page?
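
To make the arithmetic explicit, here is a minimal userspace sketch of
just that bounds check. It is not the kernel code: struct mtd_geometry
and check_read_column_bounds() are hypothetical names I made up, and
only the comparison itself is taken from the line quoted above.

```c
#include <errno.h>

/* Hypothetical stand-in for the two struct mtd_info fields the check reads. */
struct mtd_geometry {
	unsigned int writesize;	/* page size in bytes, still 0 in this code path */
	unsigned int oobsize;	/* spare area size in bytes, still 0 as well */
};

/* Mirrors only the quoted comparison, nothing else of the real function. */
static int check_read_column_bounds(const struct mtd_geometry *mtd,
				    unsigned int offset_in_page,
				    unsigned int len)
{
	if (offset_in_page + len > mtd->writesize + mtd->oobsize)
		return -EINVAL;

	return 0;
}
```

With offset_in_page = 256, len = 256 and both sizes 0, the comparison
is 512 > 0, so the sketch returns -EINVAL. With an example geometry of
2048 + 64 bytes (my assumption, just for illustration) the same call
would pass.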

Returning with error from nand_change_read_column_op() leads to
jumping out of nand_onfi_detect() early, and no ONFI param page is
evaluated at all, although the second or third page could be intact.
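
The control flow as I understand it can be modelled roughly like this.
This is a simplified sketch, not the kernel function:
model_onfi_detect(), struct detect_result and PARAM_PAGES are
hypothetical names, the result fields are the model's own convention,
and the fixed copy size of 256 bytes is simply the len value from
above.

```c
#include <errno.h>
#include <stdbool.h>

#define PARAM_PAGES 3	/* stands in for ONFI_PARAM_PAGES */
#define COPY_LEN 256	/* size of one param page copy, as in 'len' above */

/* Hypothetical result type, the fields are this model's own convention. */
struct detect_result {
	int err;		/* 0 or a negative errno */
	bool found;		/* true if a copy with matching CRC was seen */
	int pages_checked;	/* how many copies got their CRC evaluated */
};

static struct detect_result model_onfi_detect(const bool crc_ok[PARAM_PAGES],
					      unsigned int writesize,
					      unsigned int oobsize)
{
	struct detect_result res = { 0, false, 0 };
	int i;

	for (i = 0; i < PARAM_PAGES; i++) {
		if (i > 0) {
			/* Copies after the first need a column change. */
			unsigned int offset = COPY_LEN * i;

			if (offset + COPY_LEN > writesize + oobsize) {
				res.err = -EINVAL;
				return res;	/* early exit, later copies unseen */
			}
		}

		res.pages_checked++;
		if (crc_ok[i]) {
			res.found = true;
			break;
		}
	}

	return res;
}
```

Calling this with crc_ok = { false, true, false } and both sizes 0
gives err = -EINVAL after checking only one copy, although the second
copy is marked as intact.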

I guess this would also fail for any other cause of mismatching CRCs
in the first page, but I have no faulty NAND flash chip to confirm
that.
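
For reference, this is why an all-zeros page can never pass the CRC
check. The sketch below is a userspace version of the ONFI CRC-16
(polynomial 0x8005, initial value 0x4F4E, computed over the first 254
bytes, stored little-endian in the last two bytes). The function and
macro names are made up for this mail and are not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_INIT 0x4F4E	/* ONFI CRC-16 initial value */
#define CRC_POLY 0x8005	/* ONFI CRC-16 polynomial */

/* Bitwise CRC-16, MSB first, no reflection and no final XOR. */
static uint16_t param_page_crc16(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int i = 0; i < 8; i++)
			crc = (uint16_t)((crc << 1) ^
					 ((crc & 0x8000) ? CRC_POLY : 0));
	}

	return crc;
}

/* Returns 1 if the CRC stored in bytes 254..255 matches bytes 0..253. */
static int param_page_crc_matches(const uint8_t page[256])
{
	uint16_t stored = (uint16_t)(page[254] | (page[255] << 8));

	return param_page_crc16(CRC_INIT, page, 254) == stored;
}
```

Because the initial value is non-zero, the CRC calculated over 254
zero bytes is non-zero too, while the stored CRC in an all-zeros page
is 0x0000. So any read that returns only zeros ends up in the
mismatch path.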

Greets
Alex