Re: [PATCH v6 7/7][Resend] cpufreq: schedutil: New governor based on scheduler utilization data

From: Steve Muckle
Date: Fri Apr 01 2016 - 14:15:16 EST


On 03/31/2016 05:32 AM, Rafael J. Wysocki wrote:
> On Thu, Mar 31, 2016 at 2:24 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> On Mon, Mar 28, 2016 at 11:17:44AM -0700, Steve Muckle wrote:
>>> The scenario I'm contemplating is that while a CPU-intensive task is
>>> running a thermal interrupt goes off. The driver for this thermal
>>> interrupt responds by capping fmax. If this happens just after the tick,
>>> it seems possible that we could wait a full tick before changing the
>>> frequency. Given a 10ms tick it could be rather annoying for thermal
>>> management algorithms on some platforms (I'm familiar with a few).
>>
>> So I'm blissfully unaware of all the thermal stuffs we have; but it
>> looks like it's somehow bolted onto cpufreq without feedback.
>>
>> The thing I worry about is thermal scaling the CPU back past where RT/DL
>> tasks can still complete in time. It should not be able to do that, or
>> rather, missing deadlines because of thermal is about as useful as
>> rebooting the device.

I'd agree that impacting RT/DL activity because of throttling may be as
bad as a reset, but that seems like the worst case. A graceful shutdown
or a notification/alarm could be triggered instead. Or a platform
can simply choose to reset.

Shouldn't we try to give the system designer the option of doing
something in software (by throttling the CPUs as low as necessary to
continue operation) rather than giving up and relying on a hardware reset?

> Right. If thermal throttling kicks in, the game is pretty much over.
>
> That's why ideas float about taking the thermal constraints into
> account upfront, but that's a different discussion entirely.

Current mainstream mobile platforms frequently throttle during normal
operation. I think it's important to have a robust throttling mechanism
at least until the more proactive thermal management scheme is fully
developed and proves to be equally capable (if and when that happens).

>> I guess what I'm saying is, the whole cpufreq/thermal 'interface' needs work
>> anyhow.
>
> Yes, it does.

Agreed!

thanks,
Steve