Re: [PATCH v1] rcu: Fix and improve RCU read lock checks when !CONFIG_DEBUG_LOCK_ALLOC

From: Paul E. McKenney
Date: Thu Jul 13 2023 - 18:27:55 EST


On Fri, Jul 14, 2023 at 03:00:25AM +0800, Gao Xiang wrote:
> On 2023/7/14 02:14, Paul E. McKenney wrote:
> > On Fri, Jul 14, 2023 at 12:09:27AM +0800, Alan Huang wrote:
>
> ...
>
> > > >
> > > > From what Sandeep described, the code path is in an RCU reader. My
> > > > question is more, why doesn't it use SRCU instead since it clearly
> > > > does so if BLK_MQ_F_BLOCKING. What are the tradeoffs? IMHO, a deeper
> > > > dive needs to be made into that before concluding that the fix is to
> > > > use rcu_read_lock_any_held().
> > >
> > > Copied from [1]:
> > >
> > > "Background: Historically erofs would always schedule a kworker for
> > > decompression which would incur the scheduling cost regardless of
> > > the context. But z_erofs_decompressqueue_endio() may not always
> > > be in atomic context and we could actually benefit from doing the
> > > decompression in z_erofs_decompressqueue_endio() if we are in
> > > thread context, for example when running with dm-verity.
> > > This optimization was later added in patch [2] which has shown
> > > improvement in performance benchmarks.”
> > >
> > > I’m not sure if it is a design issue.
>
> What do you mean a design issue, honestly? I feel uneasy to hear this.
>
> > I have no official opinion myself, but there are quite a few people
> > who firmly believe that any situation like this one (where driver or
> > file-system code needs to query the current context to see if blocking
> > is OK) constitutes a design flaw. Such people might argue that this
> > code path should have a clearly documented context, and that if that
> > documentation states that the code might be in atomic context, then the
> > driver/fs should assume atomic context. Alternatively, if driver/fs
>
> I don't think a software decoder (for example, decompression) should be
> left in the atomic context honestly.
>
> Regardless of the decompression speed of some algorithm theirselves
> (considering very slow decompression on very slow devices), it means
> that we also don't have a way to vmap or likewise (considering
> decompression + handle extra deduplication copies) in the atomic
> context, even memory allocation has to be in an atomic way.
>
> Especially now have more cases that decodes in the RCU reader context
> apart from softirq contexts?

Just out of curiosity, why are these cases within RCU readers happening?
Is this due to recent MM changes? But yes, you would have the same
problem in softirq context.

> > needs the context to be non-atomic, the callers should make it so.
> >
> > See for example in_atomic() and its comment header:
> >
> > /*
> > * Are we running in atomic context? WARNING: this macro cannot
> > * always detect atomic context; in particular, it cannot know about
> > * held spinlocks in non-preemptible kernels. Thus it should not be
> > * used in the general case to determine whether sleeping is possible.
> > * Do not use in_atomic() in driver code.
> > */
> > #define in_atomic() (preempt_count() != 0)
> >
> > In the immortal words of Dan Frye, this should be good clean fun! ;-)
>
> Honestly, I think such helper (to show whether it's in the atomic context)
> is useful to driver users, even it could cause some false positive in some
> configuration but it's acceptable.
>
> Anyway, I could "Always use a workqueue.", but again, the original commit
> was raised by a real vendor (OPPO), and I think if dropping this, all
> downstream users which use dm-verity will be impacted and individual end
> users will not be happy as well.

Well, given what we have discussed thus far, for some kernels, you would
always use a workqueue. One option would be to force CONFIG_PREEMPT_COUNT
to always be enabled, but this option has not been rejected multiple
times in the past.

But you are quite free to roll your own z_erofs_wq_needed().

Thanx, Paul