Re: [PATCH v3 10/10] mtd: rawnand: brcmnand: allow for on-die ecc

From: Miquel Raynal
Date: Mon Jan 29 2024 - 05:52:52 EST


Hi David,

dregan@xxxxxxxxxxxx wrote on Fri, 26 Jan 2024 11:57:39 -0800:

> Hi Miquèl,
>
> On Thu, Jan 25, 2024 at 10:19 PM Miquel Raynal
> <miquel.raynal@xxxxxxxxxxx> wrote:
> >
> > Hi David,
> >
> > dregan@xxxxxxxxxxxx wrote on Thu, 25 Jan 2024 11:47:46 -0800:
> >
> > > Hi Miquèl,
> > >
> > > On Wed, Jan 24, 2024 at 9:40 AM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
> > > >
> > > > Hi David,
> > > >
> > > > dregan@xxxxxxxxxxxx wrote on Tue, 23 Jan 2024 19:04:58 -0800:
> > > >
> > > > > Allow settings for on-die ecc such that if on-die ECC is selected
> > > > > don't error out but require ECC strap setting of zero
> > > > >
> > > > > Signed-off-by: David Regan <dregan@xxxxxxxxxxxx>
> > > > > Reviewed-by: William Zhang <william.zhang@xxxxxxxxxxxx>
> > > > > ---
> > > > > Changes in v3: None
> > > > > ---
> > > > > Changes in v2:
> > > > > - Added to patch series
> > > > > ---
> > > > > drivers/mtd/nand/raw/brcmnand/brcmnand.c | 14 ++++++++++----
> > > > > 1 file changed, 10 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > > > index a4e311b6798c..42526f3250c9 100644
> > > > > --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > > > +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > > > @@ -2727,9 +2727,11 @@ static int brcmnand_setup_dev(struct brcmnand_host *host)
> > > > > cfg->blk_adr_bytes = get_blk_adr_bytes(mtd->size, mtd->writesize);
> > > > >
> > > > > if (chip->ecc.engine_type != NAND_ECC_ENGINE_TYPE_ON_HOST) {
> > > > > - dev_err(ctrl->dev, "only HW ECC supported; selected: %d\n",
> > > > > - chip->ecc.engine_type);
> > > > > - return -EINVAL;
> > > > > + if (chip->ecc.strength) {
> > > > > + dev_err(ctrl->dev, "ERROR!!! HW ECC must be set to zero for non-hardware ECC; selected: %d\n",
> > > > > + chip->ecc.strength);
> > > >
> > > > Can you use a more formal string? Also clarify it because I don't
> > > > really understand what it leads to.
> > >
> > > How about:
> > >
> > > dev_err(ctrl->dev, "HW ECC set to %d, must be zero for on-die ECC\n",
> >
> > Actually I am wondering how legitimate this is. Just don't enable the
> > on host ECC engine if it's not in use. No need to check the core's
> > choice.
>
> Our chip ECC engine will either be on if it's needed or off if it's not.
> Either I can do that in one place or put checks in before each
> read/write to turn on/off the ECC engine, which seems a lot more
> work and changes and possible issues/problems.
> Turning it on/off as needed has not been explicitly tested and
> could cause unforeseen consequences. This
> is a minimal change which should have minimal impact.
>
> >
> > >
> > > >
> > > > > + return -EINVAL;
> > > > > + }
> > > > > }
> > > > >
> > > > > if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN) {
> > > > > @@ -2797,7 +2799,11 @@ static int brcmnand_setup_dev(struct brcmnand_host *host)
> > > > > if (ret)
> > > > > return ret;
> > > > >
> > > > > - brcmnand_set_ecc_enabled(host, 1);
> > > > > + if (chip->ecc.engine_type == NAND_ECC_ENGINE_TYPE_ON_DIE) {
> > > > > + dev_dbg(ctrl->dev, "Disable HW ECC for on-die ECC\n");
> > > >
> > > > Not needed.
> > >
> > > Will remove.
> > >
> > > >
> > > > > + brcmnand_set_ecc_enabled(host, 0);
> > > > > + } else
> > > > > + brcmnand_set_ecc_enabled(host, 1);
> > > >
> > > > Style is wrong, but otherwise I think ECC should be kept disabled while
> > > > not in active use, so I am a bit surprised by this line.
> > >
> > > This is a double check to turn on/off our hardware ECC.
> >
> > I expect the engine to be always disabled. Enable it only when you
> > need (may require an additional patch before this one).
>
> We are already turning on the ECC enable at this point,
> this is just adding the option to turn it off if the NAND chip
> itself will be doing the ECC instead of our controller.

Sorry if I have not been clear.

This sequence:
- init
- enable hw ECC engine
Is broken.

It *cannot* work as any operation going through exec_op now may
perform page reads which should be unmodified by the ECC engine. You
driver *must* follow the following sequence:

- init and disable (or keep disabled) the hw ECC engine
- when you perform a page operation with correction you need to
- enable the engine
- perform the operation
- disable the engine

Thanks,
Miquèl