Re: [PATCH] kvm: x86: keep srcu writer side operation mutually exclusive

From: Hao Peng
Date: Sun Oct 23 2022 - 23:31:06 EST


On Tue, Oct 11, 2022 at 1:38 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Sun, Oct 09, 2022, Hao Peng wrote:
> > On Sat, Oct 8, 2022 at 1:12 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Sat, Oct 08, 2022, Hao Peng wrote:
> > > > From: Peng Hao <flyingpeng@xxxxxxxxxxx>
> > > >
> > > > Synchronization operations on the writer side of SRCU should be
> > > > invoked within the mutex.
> > >
> > > Why? Synchronizing SRCU is necessary only to ensure that all previous readers go
> > > away before the old filter is freed. There's no need to serialize synchronization
> > > between writers. The mutex ensures each writer operates on the "new" filter that's
> > > set by the previous writer, i.e. there's no danger of a double-free. And the next
> > > writer will wait for readers to _its_ "new" filter.
> > >
> > The srcu_lock_count[] and srcu_unlock_count[] arrays in srcu_data are
> > used alternately to determine which readers must leave their critical
> > sections before a grace period (GP) can end. If two synchronize_srcu()
> > calls are initiated concurrently, the GP determination may go wrong.
> > But if it is confirmed that writers never run concurrently, it is not
> > necessary to execute synchronize_srcu() within the scope of the mutex.
>
> I don't see anything in the RCU documentation or code that suggests that callers
> need to serialize synchronization calls. E.g. the "tree" SRCU implementation uses
> a dedicated mutex to serialize grace period work
>
> 	struct mutex srcu_gp_mutex;	/* Serialize GP work. */
>
> 	static void srcu_advance_state(struct srcu_struct *ssp)
> 	{
> 		int idx;
>
> 		mutex_lock(&ssp->srcu_gp_mutex);
>
> 		<magic>
> 	}
>
>
> and its state machine explicitly accounts for "Someone else" starting a grace
> period
>
> 	if (idx != SRCU_STATE_IDLE) {
> 		mutex_unlock(&ssp->srcu_gp_mutex);
> 		return; /* Someone else started the grace period. */
> 	}
>
> and srcu_gp_end() also guards against creating more than 2 grace periods.
>
> 	/* Prevent more than one additional grace period. */
> 	mutex_lock(&ssp->srcu_cb_mutex);
>
> And if this is a subtle requirement, there is a lot of broken kernel code, e.g.
> mmu_notifier, other KVM code, srcu_notifier_chain_unregister(), etc...

srcu_gp_mutex is meaningless, because the workqueue already guarantees
that the same work_struct is never executed re-entrantly.
If synchronize_srcu() calls are not mutually exclusive on the update side,
a grace period (GP) may fail to complete for a long time. I will continue
the analysis when I have time.
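
For reference, the update-side pattern under discussion (swap the filter
pointer under the mutex, then wait for readers outside it) can be modeled in
userspace roughly as below. This is only a sketch under stated assumptions:
struct filter, set_filter(), model_synchronize() and nr_syncs are
hypothetical names, a pthread mutex stands in for the kernel mutex, and the
stub does not wait for readers the way synchronize_srcu() does. It only
shows that each writer frees the pointer it took under the mutex, so no
filter is freed twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the filter object; not the real KVM structure. */
struct filter {
	int generation;
};

static pthread_mutex_t filter_lock = PTHREAD_MUTEX_INITIALIZER;
static struct filter *cur_filter;	/* currently published filter */
static atomic_int nr_syncs;		/* model only: counts reader waits */

/* Stub standing in for synchronize_srcu(); it does NOT wait for readers. */
static void model_synchronize(void)
{
	atomic_fetch_add(&nr_syncs, 1);
}

/*
 * Install a new filter. Returns the generation of the old filter that this
 * writer freed, -1 if there was no old filter, or -2 on allocation failure.
 */
static int set_filter(int generation)
{
	struct filter *new_filter, *old;
	int old_gen = -1;

	new_filter = malloc(sizeof(*new_filter));
	if (!new_filter)
		return -2;
	new_filter->generation = generation;

	/* Only the pointer swap is serialized between writers. */
	pthread_mutex_lock(&filter_lock);
	old = cur_filter;
	cur_filter = new_filter;	/* rcu_assign_pointer() in the kernel */
	pthread_mutex_unlock(&filter_lock);

	/* Wait for readers of 'old' outside the mutex. */
	model_synchronize();

	if (old) {
		/* 'old' was taken under the mutex, so no other writer holds it. */
		assert(old != new_filter);
		old_gen = old->generation;
		free(old);
	}
	return old_gen;
}
```

Calling set_filter(1), set_filter(2), set_filter(3) in sequence frees nothing
on the first call and then frees generations 1 and 2 in turn. Whether
concurrent synchronize_srcu() callers can delay a grace period is a separate
question that this model does not address.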