Re: [PATCH] mtd: brcmnand: Workaround false ECC uncorrectable errors

From: Brian Norris
Date: Wed Dec 02 2015 - 15:54:46 EST


Hi,

On Wed, Dec 02, 2015 at 09:44:04PM +0100, Jonas Gorski wrote:
> On Wed, Dec 2, 2015 at 9:17 PM, Simon Arlott <simon@xxxxxxxxxxx> wrote:
> > On 01/12/15 10:41, Jonas Gorski wrote:
> >> On Sat, Nov 28, 2015 at 8:23 PM, Simon Arlott <simon@xxxxxxxxxxx> wrote:
> >>> +
> >>> + /* Go to start of buffer */
> >>> + buf -= FC_WORDS;
> >>> +
> >>> + /* Erased if all data bytes are 0xFF */
> >>> + buf_erased = memchr_inv(buf, 0xFF, FC_WORDS) == NULL;
> >>> +
> >>> + if (!buf_erased)
> >>> + goto out_free;
> >>
> >> We now have a function exactly for that use case in 4.4,
> >> nand_check_erased_buf [1], consider using that. This also has the
> >> benefit of treating bit flips as correctable as long as the ECC scheme
> >> is strong enough.
> >
> > I have no idea whether or not it's appropriate to specify
> > bitflips_threshold > 0 so it'd just be a more complex way to do
> > a memchr_inv() search for 0xFF.
>
> The threshold would be the amount of bitflips the code can correct, so
> basically ecc.strength (at least that is my understanding).
>
> > The code also has to check for the hamming code bytes being all 0x00,
> > because according to the comments [2], the controller also has
> > difficulty with the non-erased all-0xFFs scenario too.
>
> According to brcmnand.c hamming can fix up to fifteen bitflips, but in

Hamming only corrects 1 bitflip. The '15' is the value used by the
controller to represent Hamming (i.e., there is no BCH-15).

> the current code you would fail a hamming protected all-0xff-page for
> even a single bitflip in the data or in the ecc bytes, which means
> that all-0xff-pages wouldn't be protected at all.

BTW, I think Kamal had code to handle bitflips in erased pages in the
Broadcom STB Linux BSP. Perhaps he can port that to upstream with
nand_check_erased_ecc_chunk()? IIUC, that would probably handle your
case too, Simon, although it wouldn't be optimal for an all-0xff check
(i.e., bitflips_threshold == 0).

If that's really an issue (i.e., we have an implementation + data), I'm
sure we could add an optimization to nand_check_erased_ecc_chunk() to
support the bitflips_threshold == 0 case.
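
To make the threshold idea concrete, here is a minimal userspace sketch of
an erased-buffer check with a fast path for a zero threshold. It is an
illustration under assumptions, not the kernel's code: check_erased_buf()
is a hypothetical helper with a simplified signature, and the real
nand_check_erased_ecc_chunk() takes separate data, ECC and extra OOB
buffers and returns an errno-style value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel API): treat every 0 bit in the
 * buffer as a bitflip of an erased (all-0xFF) chunk.
 *
 * Returns the number of bitflips found if it does not exceed
 * bitflips_threshold, or -1 if the threshold is exceeded.
 */
static int check_erased_buf(const uint8_t *buf, size_t len,
			    int bitflips_threshold)
{
	int bitflips = 0;
	size_t i;

	/* Fast path: with a zero threshold, any non-0xFF byte fails. */
	if (bitflips_threshold == 0) {
		for (i = 0; i < len; i++)
			if (buf[i] != 0xFF)
				return -1;
		return 0;
	}

	for (i = 0; i < len; i++) {
		uint8_t flipped = (uint8_t)~buf[i];

		/* Clear one set bit per iteration; each one is a bitflip. */
		while (flipped) {
			flipped &= flipped - 1;
			if (++bitflips > bitflips_threshold)
				return -1;
		}
	}

	return bitflips;
}
```

In this sketch a caller would pass something like ecc.strength as the
threshold, or 0 to get a plain all-0xff check.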

Brian