Re: [PATCH 1/2] torture: use for_each_present() loop in torture_online_all()

From: Paul E. McKenney
Date: Thu Nov 17 2022 - 10:06:45 EST


On Thu, Nov 17, 2022 at 07:30:32AM +0100, Sven Schnelle wrote:
> Hi Paul,
>
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
>
> >> > Yes, rcutorture has lower-level checks for CPUs being hotplugged
> >> > behind its back. Which might be sufficient. But this patch is in
> >> > response to something bad happening if the CPU is also not present in
> >> > the cpu_present_mask. Would that same bad thing happen if rcutorture saw
> >> > the CPU in cpu_online_mask, but by the time it attempted to CPU-hotplug
> >> > it, that CPU was gone not just from cpu_online_mask, but also from
> >> > cpu_present_mask?
> >> >
> >> > Or are CPUs never removed from cpu_present_mask?
> >>
> >> In the current implementation CPUs can only be added to the
> >> cpu_present_mask, but never removed. This might change in the future
> >> when we get support from firmware for that, but the current s390 code
> >> doesn't do that.
> >
> > Very good!
> >
> > Then could the patch please check that bits are never removed?
> > That way the code will complain should firmware support be added.
> >
> > Thanx, Paul
>
> I'm not sure whether i fully understand that. If the CPU could
> be removed from the system and the cpu_present_mask, that could
> happen at any time. So i don't see how we should check about that?

Well, that is my question to you. ;-)

Suppose we have the following sequence of events:

o rcutorture sees that CPU 5 is in cpu_present_mask, but offline.

o rcutorture therefore decides to online CPU 5.

o s390 firmware removes CPU 5, and s390 architecture code then
clears it from the cpu_present_mask.

o rcutorture proceeds with onlining CPU 5.

Don't we then get the same problem that prompted you to change from
cpu_possible_mask to cpu_present_mask? If not, why can't the rcutorture
code continue to use cpu_possible_mask?

If it really is bad to try to online or offline a CPU that is in
cpu_possible_mask but not in cpu_present_mask, and if CPUs can be removed
from cpu_present_mask, then we need some way to synchronize the removal
of CPUs from cpu_present_mask. There are of course a lot of possible
ways to do that synchronization, for example, protecting cpu_present_mask
with a mutex or similar.
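
To make the mutex option concrete, here is a rough userspace sketch of
the check-then-act pattern. It is a toy model, not kernel code: the
two masks are plain integers standing in for cpu_present_mask and
cpu_online_mask, and present_lock and the sim_*() functions are
hypothetical names made up for illustration. The point is only that
the "is it present?" check and the "online it" step happen under the
same lock that removal takes, so the removal in the sequence above can
no longer slip in between them.

```c
#include <pthread.h>
#include <stdbool.h>

#define SIM_NR_CPUS 64

/* Userspace stand-ins for cpu_present_mask and cpu_online_mask. */
static unsigned long long sim_present_mask;
static unsigned long long sim_online_mask;

/* Hypothetical lock that serializes present-mask updates against onlining. */
static pthread_mutex_t present_lock = PTHREAD_MUTEX_INITIALIZER;

static bool sim_cpu_valid(int cpu)
{
	return cpu >= 0 && cpu < SIM_NR_CPUS;
}

/* Models firmware adding a CPU: set its bit in the present mask. */
bool sim_cpu_add(int cpu)
{
	if (!sim_cpu_valid(cpu))
		return false;
	pthread_mutex_lock(&present_lock);
	sim_present_mask |= 1ULL << cpu;
	pthread_mutex_unlock(&present_lock);
	return true;
}

/* Models firmware removing a CPU. Refuses if the CPU is still online. */
bool sim_cpu_remove(int cpu)
{
	bool removed = false;

	if (!sim_cpu_valid(cpu))
		return false;
	pthread_mutex_lock(&present_lock);
	if (!(sim_online_mask & (1ULL << cpu))) {
		sim_present_mask &= ~(1ULL << cpu);
		removed = true;
	}
	pthread_mutex_unlock(&present_lock);
	return removed;
}

/*
 * Models the torture test onlining a CPU. The presence check and the
 * online step are done under present_lock, so a concurrent
 * sim_cpu_remove() runs either entirely before or entirely after them.
 */
bool sim_cpu_online(int cpu)
{
	bool onlined = false;

	if (!sim_cpu_valid(cpu))
		return false;
	pthread_mutex_lock(&present_lock);
	if (sim_present_mask & (1ULL << cpu)) {
		sim_online_mask |= 1ULL << cpu;
		onlined = true;
	}
	pthread_mutex_unlock(&present_lock);
	return onlined;
}

/* Reports whether the CPU is currently marked online in the model. */
bool sim_cpu_is_online(int cpu)
{
	bool online;

	if (!sim_cpu_valid(cpu))
		return false;
	pthread_mutex_lock(&present_lock);
	online = sim_online_mask & (1ULL << cpu);
	pthread_mutex_unlock(&present_lock);
	return online;
}
```

In this model, if CPU 5 is added and then removed before the online
attempt, sim_cpu_online(5) fails cleanly instead of acting on a stale
presence check. If the online attempt gets there first, the removal
is the one that is refused.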

Alternatively, s390 could restrict things. One way to do that would
be to turn off rcutorture's use of CPU hotplug when running on s390,
for example, by using the module parameters provided for that purpose.
Another way to do that would be to refrain from removing CPUs from
cpu_present_mask while rcutorture is running.

Are there other approaches?

Thanx, Paul