Re: [PATCH] kthread: kthread_should_stop() checks if we're a kthread

From: Petr Mladek
Date: Thu Nov 30 2023 - 06:24:54 EST


On Tue 2023-11-28 12:40:12, Kent Overstreet wrote:
> On Tue, Nov 28, 2023 at 10:30:49AM +0100, Petr Mladek wrote:
> > Adding Andrew into Cc. He usually takes changes in kernel/kthread.c.
> >
> > On Mon 2023-11-20 17:15:03, Kent Overstreet wrote:
> > > bcachefs has a fair amount of code that may or may not be running from a
> > > kthread (it may instead be called by a userspace ioctl); having
> > > kthread_should_stop() check if we're a kthread enables a fair bit of
> > > cleanup and makes it safer to use.
> > >
> > > Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > ---
> > > kernel/kthread.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index 1eea53050bab..fe6090ddf414 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -155,7 +155,8 @@ void free_kthread_struct(struct task_struct *k)
> > > */
> > > bool kthread_should_stop(void)
> > > {
> > > - return test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags);
> > > + return (current->flags & PF_KTHREAD) &&
> > > + test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags);
> > > }
> > > EXPORT_SYMBOL(kthread_should_stop);
> >
> > I agree that it makes the API more safe because &to_kthread(current)
> > is NULL when the process is not a kthread.
> >
> > Well, I do not like the idea of quietly ignoring a misuse of
> > the kthread_*() API. I would personally prefer to do:
>
> It's only a misuse in the most pedantic sense, IMO.
>
> Is it ever possible that with this change calling kthread_should_stop()
> from a non-kthread could cause a problem?

Of course, calling the updated kthread_should_stop() won't
cause problems in non-kthread context.

But it would make it more complicated to check for misuse
in the future.

Q: Why is a check for misuse important?
A: It partly depends on the personal opinion. But in general, it
prevents other problems.

Q: Is it worth it in this case?
A: I do not have strong opinion.


I have basically two concerns:

1. Consistent behavior:

For example, kthread_use_mm() and kthread_unuse_mm() already
warn about a misuse.


2. Understanding the code

Unconditional use of kthread_should_stop() makes the feeling
that the code is intended for kthread context. People, not
familiar with the code might miss that it might run also
in userspace process.

And indeed, I have looked how kthread_should_stop() is
used in bcachefs code. For example, bch2_move_ratelimit()
seems to handle only the kthread context:

int bch2_move_ratelimit(struct moving_context *ctxt)
{
[...]
if (ctxt->wait_on_copygc && !c->copygc_running) {
bch2_trans_unlock_long(ctxt->trans);
wait_event_killable(c->copygc_running_wq,
!c->copygc_running ||
kthread_should_stop());
}


Yes, wait_event_killable() is primary for userspace
process. AFAIK, signals are not delivered to kthreads
by default, see sig_task_ignored().

But the code does not check return value so that
it does not detect when usespace process calling
this function was killed.


do {
delay = ctxt->rate ? bch2_ratelimit_delay(ctxt->rate) : 0;
[...]
if (delay) {
[...]
set_current_state(TASK_INTERRUPTIBLE);
}
[...]
if ((current->flags & PF_KTHREAD) && kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
return 1;
}

if (delay)
schedule_timeout(delay);
[...]
} while (delay);

Here, the do-while cycle returns early when the kthread gets
stopped. But it does not check whether there is a pending
signal when called from userspace process.

IMHO, it would be better to make it clear that the function might
be called from both kthread and userspace processes. Otherwise,
someone could later replace wait_event_killable() to wait_event()
because kthreads do not react to signals by default.

Also it would be worth to make it more clear what code is for
the kthread context and what is for the userspace context.
And the following would make it more obvious:

if (in_kthread) {
// handle kthread_should_stop
} else {
// handle signal_pending()
}

IMPORTANT: If the function could return early because of signal
then it might make sense to make the kthread call path
more robust. The kthread should not return before another
process called kthread_stop().

I vaguely remember that it might cause some problems.
I am not sure how big problems. Either the kthread would stay in
a limbo state because nobody would read the exit
code. Or the eventual kthread_stop() caller could
blow up or something like this.


BTW: kthread_should_stop() is called in wait_event_killable()
without PF_KTHREAD check so that it could probably blow up.


My experience says that it is far from trivial to handle kthread_stop(),
freezer, and signals correctly. IMHO, doing some implicit
NOPs do not help to get the right picture.

BTW: try_to_freeze() might cause a deadlock when a kthread_stop()
caller waits for the kthread() return but the kthread()
is blocked in __refrigerator().

At least this is my understanding of the commit 8a32c441c1609f80e
("freezer: implement and use kthread_freezable_should_stop()").

Maybe, __refrigerator() should check kthread_should_stop()
for all kthreads. But maybe, it might break something else.


> > // define this in include/linux/kthread.h
> > static inline bool in_kthread(void)
> > {
> > return current->flags & PF_KTHREAD
> > }
> >
> > // add WARN() into kthread_should_stop()
> > bool kthread_should_stop(void)
> > {
> > if (WARN_ON_ONCE(!in_kthread))
> > return false;
> >
> > return test_bit(KTHREAD_SHOULD_STOP, &to_kthread(current)->flags);
> > }
> > EXPORT_SYMBOL(kthread_should_stop);

I like this solution because the warning is printed even when
KTHREAD_SHOULD_STOP is not set. It means that misuse might
be caught easily.

Anyway, I am going to let Andrew to do a final decision.

Best Regards,
Petr