Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs

From: Nitesh Narayan Lal
Date: Thu Sep 17 2020 - 14:45:40 EST



On 9/17/20 2:18 PM, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
>
>> Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
>> the number of housekeeping CPUs by reading an atomic variable
>> __num_housekeeping_cpus. This variable is set from housekeeping_setup().
>>
>> This API is introduced for the purpose of drivers that were previously
>> relying only on num_online_cpus() to determine the number of MSIX vectors
>> to create. In an RT environment with large isolated but a fewer
>> housekeeping CPUs this was leading to a situation where an attempt to
>> move all of the vectors corresponding to isolated CPUs to housekeeping
>> CPUs was failing due to per CPU vector limit.
>>
>> If there are no isolated CPUs specified then the API returns the number
>> of all online CPUs.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
>> ---
>> include/linux/sched/isolation.h | 7 +++++++
>> kernel/sched/isolation.c | 23 +++++++++++++++++++++++
>> 2 files changed, 30 insertions(+)
> I'm not a scheduler expert, but a couple comments follow.
>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..94c25d956d8a 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -25,6 +25,7 @@ extern bool housekeeping_enabled(enum hk_flags flags);
>> extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
>> extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
>> extern void __init housekeeping_init(void);
>> +extern unsigned int num_housekeeping_cpus(void);
>>
>> #else
>>
>> @@ -46,6 +47,12 @@ static inline bool housekeeping_enabled(enum hk_flags flags)
>> static inline void housekeeping_affine(struct task_struct *t,
>> enum hk_flags flags) { }
>> static inline void housekeeping_init(void) { }
>> +
>> +static unsigned int num_housekeeping_cpus(void)
>> +{
>> + return num_online_cpus();
>> +}
>> +
>> #endif /* CONFIG_CPU_ISOLATION */
>>
>> static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>> index 5a6ea03f9882..7024298390b7 100644
>> --- a/kernel/sched/isolation.c
>> +++ b/kernel/sched/isolation.c
>> @@ -13,6 +13,7 @@ DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
>> EXPORT_SYMBOL_GPL(housekeeping_overridden);
>> static cpumask_var_t housekeeping_mask;
>> static unsigned int housekeeping_flags;
>> +static atomic_t __num_housekeeping_cpus __read_mostly;
>>
>> bool housekeeping_enabled(enum hk_flags flags)
>> {
>> @@ -20,6 +21,27 @@ bool housekeeping_enabled(enum hk_flags flags)
>> }
>> EXPORT_SYMBOL_GPL(housekeeping_enabled);
>>
>> +/*
> use correct kdoc style, and you get free documentation from your source
> (you're so close!)
>
> should be (note the first line and the function title line change to
> remove parens:
> /**
> * num_housekeeping_cpus - Read the number of housekeeping CPUs.
> *
> * This function returns the number of available housekeeping CPUs
> * based on __num_housekeeping_cpus which is of type atomic_t
> * and is initialized at the time of the housekeeping setup.
> */

My bad, I missed that.
Thanks for pointing it out.

>
>> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
>> + *
>> + * This function returns the number of available housekeeping CPUs
>> + * based on __num_housekeeping_cpus which is of type atomic_t
>> + * and is initialized at the time of the housekeeping setup.
>> + */
>> +unsigned int num_housekeeping_cpus(void)
>> +{
>> + unsigned int cpus;
>> +
>> + if (static_branch_unlikely(&housekeeping_overridden)) {
>> + cpus = atomic_read(&__num_housekeeping_cpus);
>> + /* We should always have at least one housekeeping CPU */
>> + BUG_ON(!cpus);
> you need to crash the kernel because of this? maybe a WARN_ON? How did
> the global even get set to the bad value? It's going to blame the poor
> caller for this in the trace, but the caller likely had nothing to do
> with setting the value incorrectly!

Yes, ideally this should not be triggered, but if somehow it does then we have
a bug and that needs to be fixed. That's probably the only reason why I chose
BUG_ON.
But, I am not entirely against the usage of WARN_ON either, because we get a
stack trace anyways.
I will see if anyone else has any other concerns on this patch and then I can
post the next version.

>
>> + return cpus;
>> + }
>> + return num_online_cpus();
>> +}
>> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
--
Thanks
Nitesh

Attachment: signature.asc
Description: OpenPGP digital signature