Re: [PATCH v3 3/3] mtd: rawnand: rockchip-nand-controller: add skipbbt option

From: Miquel Raynal
Date: Wed Jul 12 2023 - 09:58:00 EST


Hi Johan,

Richard, a question for you below :-)

jbx6244@xxxxxxxxx wrote on Fri, 7 Jul 2023 17:27:56 +0200:

> Hi Miquel,
>
> Some comments/questions below,

There are a lot, I've selected the ones which appear the most relevant
to me right now.

> On 7/4/23 17:13, Miquel Raynal wrote:
> > Hi Johan,
> >
> > jbx6244@xxxxxxxxx wrote on Thu, 15 Jun 2023 19:34:26 +0200:
> >
> >> On Rockchip SoCs the first boot stages are written on NAND
>
> >> with help of manufacturer software that uses a different format
> >> then the MTD framework. Skip the automatic BBT scan with the
>
> Not only about the boot blocks.
> The rest of the Nand is formatted as well by closed source.
> It formats the location at the new BBT pages as well.
>
> >> NAND_SKIP_BBTSCAN option so that the original content is unchanged
> >> during the driver probe.
> >> The NAND_NO_BBM_QUIRK option allows us to erase bad blocks with
> >> the nand_erase_nand() function and the flash_erase command.
> >> With these options the user has the "freedom of choice" by neutral
> >> access mode to read and write in whatever format is needed.
> >
>
> > Can the boot_rom_mode thing help?
>
> This boot_rom_mode thing is for switching ECC mode in boot block pages and is not related to BBT pages.
>
> >
> > For writing this part you can disable the bad block protection using
> > debugfs and then write an externally processed file in raw mode I guess.
>
> The use of debugfs only might make sense when the driver can pass the probe function without errors.
> It does however not save guard existing data when it creates and writes to a new BBT page location.
> When the data at the new BBT pages previously is written with a different ECC or data/OOB format the hardware driver probe fails.
>
> Your suggestion does not work for the Rockchip case.
> Please advise what to do with this patch.
>
> ===
>
> [ 2148.026403] calling rk_nfc_driver_init+0x0/0x1000 [rockchip_nand_controller] @ 2062
> [ 2148.039156] rockchip-nfc 10500000.nand-controller: timing 11e1
> [ 2148.056891] nand: device found, Manufacturer ID: 0xad, Chip ID: 0xde
> [ 2148.067700] nand: Hynix H27UCG8T2ATR-BC 64G 3.3V 8-bit
> [ 2148.076717] nand: 8192 MiB, MLC, erase size: 2048 KiB, page size: 8192, OOB size: 640
> [ 2148.089022] rockchip-nfc 10500000.nand-controller: hynix_nand_init
> [ 2148.099317] rockchip-nfc 10500000.nand-controller: h27ucg8t2atrbc_choose_interface_config
> [ 2148.111779] rockchip-nfc 10500000.nand-controller: timing 11e1
> [ 2148.129836] rockchip-nfc 10500000.nand-controller: rd hw page: 1048575
> [ 2148.149364] rockchip-nfc 10500000.nand-controller: read page: fffff ecc error!
> [ 2148.160742] rockchip-nfc 10500000.nand-controller: rd hw page: 1048319
> [ 2148.180164] rockchip-nfc 10500000.nand-controller: read page: ffeff ecc error!
> [ 2148.191413] rockchip-nfc 10500000.nand-controller: rd hw page: 1048063
>
> [..]
>
> Check for existing BBT pages.
>
> [ 2148.339836] rockchip-nfc 10500000.nand-controller: read page: ffdff ecc error!
> [ 2148.350864] rockchip-nfc 10500000.nand-controller: rd hw page: 1047807
> [ 2148.369835] rockchip-nfc 10500000.nand-controller: read page: ffcff ecc error!
> [ 2148.380597] Bad block table not found for chip 0
> [ 2148.388479] Scanning device for bad blocks
> [ 2148.395591] rockchip-nfc 10500000.nand-controller: rd hw page: 255
> [ 2148.414087] rockchip-nfc 10500000.nand-controller: read page: max_bitflips: 0
>
> [..]
>
> Check all pages to create a new BBT.
>
> [ 2258.693279] rockchip-nfc 10500000.nand-controller: rd hw page: 1030143
> [ 2258.710970] rockchip-nfc 10500000.nand-controller: read page: max_bitflips: 0
> [ 2258.720804] rockchip-nfc 10500000.nand-controller: rd hw page: 1030399
> [ 2258.738394] rockchip-nfc 10500000.nand-controller: read page: fb8ff ecc error!
> [ 2258.748229] Bad eraseblock 4024 at 0x0001f7000000
>
> [..]
>
> All BBT pages are marked bad, so it thinks there's no space left
>
> [ 2261.403708] rockchip-nfc 10500000.nand-controller: rd hw page: 1048575
> [ 2261.422750] rockchip-nfc 10500000.nand-controller: read page: fffff ecc error!
> [ 2261.433634] Bad eraseblock 4095 at 0x0001ffe00000
>
> [ 2261.441723] No space left to write bad block table
>
> This is not done.
> A Nand disk should not be altered if it's formatted in a different format unless an explicit command is given.
>
> [ 2261.449860] nand_bbt: error while writing bad block table -28
> [ 2261.467156] rockchip-nfc 10500000.nand-controller: failed to init NAND chips
> [ 2261.477917] rockchip-nfc: probe of 10500000.nand-controller failed with error -28
> [ 2261.497011] probe of 10500000.nand-controller returned 28 after 113456649 usecs
> [ 2261.508273] initcall rk_nfc_driver_init+0x0/0x1000 [rockchip_nand_controller] returned 0 after 113468040 usecs
>
> Probe failed, but there nothing wrong with hardware.
> MTDx partitions are not created.
> User space commands that rely strings like "/dev/mtd0" don't work.
>
> ===
>
> Application
> Kernel
> Drivers
> Hardware
>
> Why do we fail a hardware driver probe when the level above doesn't deal with all real world situations.
> Shouldn't that be more separated in MTD levels.
>
> ===
>
> >
> > For the boot I think I proposed a DT property. I don't remember how far
> > the discussion went.
>
> Is there a web link of that discussion?

https://lore.kernel.org/linux-mtd/20230609104426.3901df54@xps-13/

Maybe the term "DT property" did not appear but that's what I had in
mind :-) I don't know if it must be a chip or a partition property.

Richard, here is where I would like your feedback, I am kind of opposed
to a "skip_bbt" module parameter. It's not a strong opinion, if you
think it's fine I'm open.

I would rather prefer a DT property which says "do not use the
standard pattern".

Johan, how are bad blocks supposed to be tracked if the bad block
tables are written in a way which does not allow Linux to read it?

> How do I link 'compatible' to '/dev/mtd0' for example.
> In a '/proc/mtd' entry?
> See /drivers/mtd/spi-nor/debugfs.c
>
>
> In order to write boot blocks in a pattern it needs to know the chip ID as used in nand_flash_ids[].
> How can we export this ID to userspace?

There was an attempt a few days back, it's not upstream, I can't find
it back ATM. I'll send it once I find it back.

> Also in a '/proc/mtd' entry?
>
> ===
>
> partitions {
> compatible = "fixed-partitions";
> #address-cells = <2>;
> #size-cells = <2>;
>
> partition@0 {
> reg = <0x0 0x0 0x0 0x02000000>;
> compatible = "rockchip,boot-blk";
> label = "boot-blk";
> };
>
> partition@2000000 {
> reg = <0x0 0x02000000 0x1 0xfa000000>;
> label = "rootfs";
> };
>
> partition@1fc000000 {
> reg = <0x1 0xfc000000 0x0 0x04000000>;
> compatible = "mtd,bbt"
> label = "bbt";

This does not work, it would take me hours to explain why, it's all
about:
- stability
- how mtd has been implemented

> };
> };
>
> How many erase blocks is MTD using for BBT pages? 4 or 8 or ?

As many blocks are needed to cover the size of the device, times 2.

> BBT pages are not clearly defined in the DT.

No, because this is software, not hardware. MTD is a Linux thing, which
gives you access to an mtd device through a number of operations. It
handles bad blocks.

> They are somehow marked bad in the overlapping partition.

Not sure what "overlapping partition" means.

> How are suppose to erase BBT pages in a automated way?
> Is there a need for a new BBT compatible?

No, Linux has run with this standard pattern for 20 years, downstream
kernels sometimes make silly design decisions, we do not want to
support them.

> What is your idea about compatibles suggestion:
> rockchip,boot-blk
> mtd,bbt
>
> ===
>
> Johan
>
> >
> >>
> >> Signed-off-by: Johan Jonker <jbx6244@xxxxxxxxx>
> >> ---
> >>
> >> Changes V3:
> >> Change prefixes
> >>
> >> Changed V2:
> >> reword
> >>
> >> ---
> >>
> >> I'm aware that the maintainer finds it "awful",
> >> but it's absolute necessary to:
> >> 1: read/write boot blocks in user space without touching original content
> >> 2: format a NAND for MTD either with built in or external driver module
> >>
> >> So we keep it include in this serie.
> >> ---
> >> drivers/mtd/nand/raw/rockchip-nand-controller.c | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> diff --git a/drivers/mtd/nand/raw/rockchip-nand-controller.c b/drivers/mtd/nand/raw/rockchip-nand-controller.c
> >> index 5a0468034..fcda4c760 100644
> >> --- a/drivers/mtd/nand/raw/rockchip-nand-controller.c
> >> +++ b/drivers/mtd/nand/raw/rockchip-nand-controller.c
> >> @@ -188,6 +188,10 @@ struct rk_nfc {
> >> unsigned long assigned_cs;
> >> };
> >>
> >> +static int skipbbt;
> >> +module_param(skipbbt, int, 0644);
> >> +MODULE_PARM_DESC(skipbbt, "Skip BBT scan if data on the NAND chip is not in MTD format.");
> >> +
> >> static inline struct rk_nfc_nand_chip *rk_nfc_to_rknand(struct nand_chip *chip)
> >> {
> >> return container_of(chip, struct rk_nfc_nand_chip, chip);
> >> @@ -1153,6 +1157,9 @@ static int rk_nfc_nand_chip_init(struct device *dev, struct rk_nfc *nfc,
> >>
> >> nand_set_controller_data(chip, nfc);
> >>
> >> + if (skipbbt)
> >> + chip->options |= NAND_SKIP_BBTSCAN | NAND_NO_BBM_QUIRK;
> >> +
> >> chip->options |= NAND_USES_DMA | NAND_NO_SUBPAGE_WRITE;
> >> chip->bbt_options = NAND_BBT_USE_FLASH | NAND_BBT_NO_OOB;
> >>
> >> --
> >> 2.30.2
> >>
> >
> >
> > Thanks,
> > Miquèl


Thanks,
Miquèl