Re: [PATCH] kernel: Introduce a write lock/unlock wrapper for tasklist_lock

From: Matthew Wilcox
Date: Wed Dec 27 2023 - 05:15:23 EST


On Wed, Dec 27, 2023 at 09:41:29AM +0800, Aiqun Yu (Maria) wrote:
> On 12/26/2023 6:46 PM, Hillf Danton wrote:
> > On Wed, 13 Dec 2023 12:27:05 -0600 Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> > > Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
> > > > On Wed, Dec 13, 2023 at 06:17:45PM +0800, Maria Yu wrote:
> > > > > +static inline void write_lock_tasklist_lock(void)
> > > > > +{
> > > > > +	while (1) {
> > > > > +		local_irq_disable();
> > > > > +		if (write_trylock(&tasklist_lock))
> > > > > +			break;
> > > > > +		local_irq_enable();
> > > > > +		cpu_relax();
> > > >
> > > > This is a bad implementation though. You don't set the _QW_WAITING flag
> > > > so readers don't know that there's a pending writer. Also, I've seen
> > > > cpu_relax() pessimise CPU behaviour; putting it into a low-power mode
> > > > that takes a while to wake up from.
> > > >
> > > > I think the right way to fix this is to pass a boolean flag to
> > > > queued_write_lock_slowpath() to let it know whether it can re-enable
> > > > interrupts while checking whether _QW_WAITING is set.
> >
> > 	lock(&lock->wait_lock)
> > 	enable irq
> > 	int
> > 	lock(&lock->wait_lock)
> >
> > You are adding a chance for recursive locking.
>
> Thanks for the comments discussing the deadlock possibility. I think the
> deadlock can be differentiated into the 2 scenarios below:
> 1. queued_write_lock_slowpath() being triggered in interrupt context.
> tasklist_lock has no write_lock_irq(save) users in interrupt context,
> while for a common rwlock, write_lock_irq(save) usage in interrupt
> context may be possible,
> so that may introduce a state where lock->wait_lock is released and the
> _QW_WAITING flag is left set.
> Others are welcome to suggest designs and comment.

Hm? I am confused. You're talking about the scenario where:

- CPU B holds the lock for read
- CPU A attempts to get the lock for write in user context, fails, sets
  the _QW_WAITING flag
- CPU A re-enables interrupts
- CPU A executes an interrupt handler which calls queued_write_lock()
- If CPU B has dropped the read lock in the meantime,
  atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED) succeeds
- CPU A calls queued_write_unlock() which stores 0 to the lock and we
  _lose_ the _QW_WAITING flag for the userspace waiter.

How do we end up with CPU A leaving the _QW_WAITING flag set?
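
To make the sequence above concrete, here is a toy userspace replay of it
using C11 atomics. It is only a sketch, not the kernel's qrwlock: the
toy_* names, the TOY_* constant values and replay_scenario() are all
hypothetical, wait_lock is left out entirely, and the interrupt handler's
compare-exchange is assumed to use whatever value it observes once the
reader has gone.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values only (assumption); not taken from the kernel. */
#define TOY_QW_WAITING 0x100u	/* a writer is waiting */
#define TOY_QW_LOCKED  0x0ffu	/* a writer holds the lock */
#define TOY_QR_BIAS    0x200u	/* one reader holds the lock */

struct toy_rwlock {
	atomic_uint cnts;
};

/* Take the lock for write only if cnts currently equals 'expected'. */
static bool toy_write_trylock(struct toy_rwlock *l, unsigned int expected)
{
	return atomic_compare_exchange_strong_explicit(&l->cnts, &expected,
			TOY_QW_LOCKED,
			memory_order_acquire, memory_order_relaxed);
}

/* Toy unlock: store 0 to the whole word, as in the scenario above. */
static void toy_write_unlock(struct toy_rwlock *l)
{
	atomic_store_explicit(&l->cnts, 0u, memory_order_release);
}

/* Replay the steps in order on one thread; returns the final cnts. */
unsigned int replay_scenario(void)
{
	struct toy_rwlock l;
	unsigned int seen;

	atomic_init(&l.cnts, 0u);

	/* CPU B holds the lock for read. */
	atomic_fetch_add(&l.cnts, TOY_QR_BIAS);

	/* CPU A, user context: the write attempt fails, so set the flag. */
	if (toy_write_trylock(&l, 0u))
		return ~0u;	/* not expected while a reader is present */
	atomic_fetch_or(&l.cnts, TOY_QW_WAITING);

	/* CPU A re-enables interrupts; meanwhile CPU B drops the read lock. */
	atomic_fetch_sub(&l.cnts, TOY_QR_BIAS);

	/* CPU A, interrupt handler: only the waiting flag is left in cnts. */
	seen = atomic_load(&l.cnts);
	if (!toy_write_trylock(&l, seen))
		return ~0u;	/* not expected once the reader has gone */

	/* Interrupt handler unlocks; the waiting flag goes with it. */
	toy_write_unlock(&l);

	return atomic_load(&l.cnts);
}
```

In this model replay_scenario() ends with TOY_QW_WAITING clear, which is
the "lost flag" outcome rather than a flag left set.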