Re: [PATCH] libnvdimm: rework region badblocks clearing

From: Verma, Vishal L
Date: Mon May 01 2017 - 12:43:07 EST


On Mon, 2017-05-01 at 09:38 -0700, Dan Williams wrote:
> On Mon, May 1, 2017 at 9:20 AM, Kani, Toshimitsu <toshi.kani@xxxxxxx>
> wrote:
> > On Mon, 2017-05-01 at 09:16 -0700, Dan Williams wrote:
> > > On Mon, May 1, 2017 at 9:12 AM, Kani, Toshimitsu <toshi.kani@hpe.com>
> > > wrote:
> > > > On Mon, 2017-05-01 at 08:52 -0700, Dan Williams wrote:
> > > > > On Mon, May 1, 2017 at 8:43 AM, Dan Williams
> > > > > <dan.j.williams@intel.com> wrote:
> > > > > > On Mon, May 1, 2017 at 8:34 AM, Kani, Toshimitsu
> > > > > > <toshi.kani@hpe.com> wrote:
> > > > > > > On Sun, 2017-04-30 at 05:39 -0700, Dan Williams wrote:
> > > >
> > > > :
> > > > > > >
> > > > > > > Hi Dan,
> > > > > > >
> > > > > > > I was testing the change with CONFIG_DEBUG_ATOMIC_SLEEP set
> > > > > > > this time, and hit the following BUG with BTT.  This is a
> > > > > > > separate issue (not introduced by this patch), but it shows
> > > > > > > that we have an issue with the DSM call path as well.
> > > > > >
> > > > > > Ah, great find, thanks! We don't see this in the unit tests
> > > > > > because the nfit_test infrastructure takes no sleeping actions
> > > > > > in its simulated DSM path. Outside of converting btt to use
> > > > > > sleeping locks I'm not sure I see a path forward. I wonder how
> > > > > > bad the performance impact of that would be? Perhaps with
> > > > > > opportunistic spinning it won't be so bad, but I don't see
> > > > > > another choice.
> > > > >
> > > > > It's worse than that. Part of the performance optimization of BTT
> > > > > I/O was to avoid locking altogether when we could rely on a BTT
> > > > > lane percpu, so that would also need to be removed.
> > > >
> > > > I do not have a good idea either, but I'd rather disable this
> > > > clearing in the regular BTT write path than add sleeping locks
> > > > to BTT. Clearing a bad block in the BTT write path is
> > > > difficult/challenging since it allocates a new block.
> > >
> > > Actually, that may make things easier. Can we teach BTT to track
> > > error blocks and clear them before they are reassigned?
> >
> > I was thinking the same after sending it.  I think we should be
> > able to do that.
>
> Ok, but we obviously can't develop something that detailed while the
> merge window is open, so I think that means we need to revert commit
> e88da7998d7d "Revert 'libnvdimm: band aid btt vs clear poison
> locking'" and leave BTT I/O-error-clearing disabled for this cycle and
> try again for 4.13.

Agreed, I'll work on something to track badblocks and clear them
outside the IO path.
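
As a rough illustration of the "track error blocks and clear them before
they are reassigned" idea discussed above, here is a minimal userspace
sketch. It is not libnvdimm or BTT code: struct arena_sketch,
struct free_entry, note_error(), clear_poison(), scrub_freelist() and
lane_block_usable() are all hypothetical names, and clear_poison() is
only a stand-in for the clear-error call that may sleep. The assumption
is that the I/O path merely sets a flag, and a separate sleepable
context does the actual clearing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NLANES 4	/* hypothetical: one free block per lane */

/* Hypothetical free-list entry: a spare block plus an "error seen" flag. */
struct free_entry {
	uint32_t block;
	bool has_error;
};

struct arena_sketch {
	struct free_entry freelist[NLANES];
	unsigned int clears_issued;	/* counts simulated clear calls */
};

/* I/O path (atomic context): only record the error, never sleep. */
static void note_error(struct arena_sketch *a, unsigned int lane)
{
	assert(a != NULL && lane < NLANES);
	a->freelist[lane].has_error = true;
}

/* Stand-in for the clear-poison call, which may sleep in the real stack. */
static int clear_poison(struct arena_sketch *a, uint32_t block)
{
	(void)block;
	a->clears_issued++;
	return 0;
}

/* Sleepable context (e.g. a worker): clear every flagged free block. */
static unsigned int scrub_freelist(struct arena_sketch *a)
{
	unsigned int lane, cleared = 0;

	assert(a != NULL);
	for (lane = 0; lane < NLANES; lane++) {
		struct free_entry *e = &a->freelist[lane];

		if (e->has_error && clear_poison(a, e->block) == 0) {
			e->has_error = false;
			cleared++;
		}
	}
	return cleared;
}

/* I/O path: a free block may be reassigned only once it is clean. */
static bool lane_block_usable(const struct arena_sketch *a, unsigned int lane)
{
	assert(a != NULL && lane < NLANES);
	return !a->freelist[lane].has_error;
}
```

The point of the split is that note_error() and lane_block_usable() do
nothing that could sleep, so they would be safe to call with the lane
held, while scrub_freelist() is the only place that issues the clear.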