Re: [PATCH v2 3/3] sched: Make uclamp changes depend on CAP_SYS_NICE

From: Qais Yousef
Date: Fri Jun 11 2021 - 08:48:31 EST


On 06/10/21 15:13, Quentin Perret wrote:
> There is currently nothing preventing tasks from changing their per-task
> clamp values in any way that they like. The rationale is probably that
> system administrators are still able to limit those clamps thanks to the
> cgroup interface. However, this causes pain in a system where both
> per-task and per-cgroup clamp values are expected to be under the
> control of core system components (as is the case for Android).
>
> To fix this, let's require CAP_SYS_NICE to increase per-task clamp
> values. This allows unprivileged tasks to lower their requests, but not
> increase them, which is consistent with the existing behaviour for nice
> values.

Hmmm. I'm not in favour of this.

So uclamp is a performance and power management mechanism; it has no impact on
fairness AFAICT, so making it a privileged operation doesn't make sense.

We thought about this in the past and didn't see any harm if a task (app)
wants to self-manage. Yes, a task could ask to run at max performance and waste
power, but anyone can generate a busy loop and waste power too.

Now that doesn't mean your use case is not valid. I agree that if there's a
system-wide framework that wants to explicitly manage performance and power of
tasks via uclamp, then we can end up with 2 layers of control overriding each
other.

Would it make more sense to have a procfs/sysfs flag, disabled by default,
that allows the sys-admin to enforce privileged uclamp access?

Something like

/proc/sys/kernel/sched_uclamp_privileged

I think both usage scenarios are valid, and giving sys-admins the power to
enforce a behavior makes more sense to me.
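
To make the idea concrete, here is a rough userspace model of the check (not
a patch, and not kernel code). It assumes the flag gates the rule from the
patch description: with the flag off, any change is allowed; with it on,
raising a clamp needs CAP_SYS_NICE while lowering stays unprivileged. The names
sysctl_sched_uclamp_privileged (as a variable), uclamp_change_allowed() and
uclamp_model_selftest() are made up for illustration, and the has_cap_sys_nice
parameter only stands in for a capable(CAP_SYS_NICE) check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical knob standing in for
 * /proc/sys/kernel/sched_uclamp_privileged. Off by default.
 */
static bool sysctl_sched_uclamp_privileged;

/*
 * Userspace model of the permission check (assumption, not kernel code).
 * Returns 0 if the change is allowed, -EPERM otherwise.
 */
static int uclamp_change_allowed(unsigned int old_val, unsigned int new_val,
				 bool has_cap_sys_nice)
{
	/* Flag off: tasks may self-manage their clamps freely. */
	if (!sysctl_sched_uclamp_privileged)
		return 0;

	/* Lowering (or keeping) a request never needs privilege. */
	if (new_val <= old_val)
		return 0;

	/* Raising a request is privileged once the flag is set. */
	return has_cap_sys_nice ? 0 : -EPERM;
}

/* Small self-check of the three cases described above. */
static void uclamp_model_selftest(void)
{
	sysctl_sched_uclamp_privileged = false;
	assert(uclamp_change_allowed(0, 1024, false) == 0);

	sysctl_sched_uclamp_privileged = true;
	assert(uclamp_change_allowed(512, 256, false) == 0);
	assert(uclamp_change_allowed(512, 1024, false) == -EPERM);
	assert(uclamp_change_allowed(512, 1024, true) == 0);
}
```

Whether the flag should restrict only increases, or every per-task uclamp
change, is an open choice; the sketch picks the first option only because it
matches the behaviour described in the patch.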

Unless there's a real concern in terms of security/fairness that we missed?


Cheers

--
Qais Yousef