Re: [PATCH] Revert "mm/vunmap: add cond_resched() in vunmap_pmd_range"

From: Minchan Kim
Date: Fri Nov 13 2020 - 11:25:40 EST


On Thu, Nov 12, 2020 at 02:49:19PM -0800, Andrew Morton wrote:
> On Thu, 12 Nov 2020 12:01:01 -0800 Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> >
> > On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote:
> > > Hi Andrew,
> > >
> > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote:
> > > > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > >
> > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e.
> > > > >
> > > > > While I was doing zram testing, I found sometimes decompression failed
> > > > > since the compression buffer was corrupted. With investigation,
> > > > > I found below commit calls cond_resched unconditionally so it could
> > > > > make a problem in atomic context if the task is reschedule.
> > > > >
> > > > > Revert the original commit for now.
> >
> > How should we proceed this problem?
> >
>
> (top-posting repaired - please don't).
>
> Well, we don't want to reintroduce the softlockup reports which
> e47110e90584a2 fixed, and we certainly don't want to do that on behalf
> of code which is using the unmap_kernel_range() interface incorrectly.
>
> So I suggest either
>
> a) make zs_unmap_object() stop doing the unmapping from atomic context or

It's not easy since the path already hold several spin_locks as well as
per-cpu context. I could pursuit the direction but it takes several
steps to change entire locking scheme in the zsmalloc, which will
take time(we couldn't leave zsmalloc broken until then) and hard to
land on stable tree.

>
> b) figure out whether the vmalloc unmap code is *truly* unsafe from
> atomic context - perhaps it is only unsafe from interrupt context,
> in which case we can rework the vmalloc.c checks to reflect this, or

I don't get the point. I assume your suggestion would be "let's make the
vunmap code atomic context safe" but how could it solve this problem?

The point from e47110e90584a2 was softlockup could be triggered if
vunamp deal with large mapping so need *explict reschedule* point
for CONFIG_PREEMPT_VOLUNTARY. However, CONFIG_PREEMPT_VOLUNTARY
doesn't consider peempt count so even though we could make vunamp
atomic safe to make a call under spin_lock:

spin_lock(&A);
vunmap
vunmap_pmd_range
cond_resched <- bang

Below options would have same problem, too.
Let me know if I misunderstand something.

>
> c) make the vfree code callable from all contexts. Which is by far
> the preferred solution, but may be tough.
>
>
> Or maybe not so tough - if the only problem in the vmalloc code is the
> use of mutex_trylock() from irqs then it may be as simple as switching
> to old-style semaphores and using down_trylock(), which I think is
> irq-safe.
>
> However old-style semaphores are deprecated. A hackyish approach might
> be to use an rwsem always in down_write mode and use
> down_write_trylock(), which I think is also callable from interrrupt
> context.
>
> But I have a feeling that there are other reasons why vfree shouldn't
> be called from atomic context, apart from the mutex_trylock-in-irq
> issue.