Re: PM regression with LED changes in next-20161109

From: Jacek Anaszewski
Date: Thu Nov 10 2016 - 08:55:31 EST


Hi,

On 11/10/2016 02:04 PM, Hans de Goede wrote:
Hi,

On 10-11-16 13:56, Jacek Anaszewski wrote:
Hi,

On 11/10/2016 09:49 AM, Hans de Goede wrote:
Hi,

On 09-11-16 21:45, Jacek Anaszewski wrote:
Hi Tony,

On 11/09/2016 08:23 PM, Tony Lindgren wrote:
Hi,

Looks like commit 883d32ce3385 ("leds: core: Add support for poll()ing
the sysfs brightness attr for changes.") breaks runtime PM for me.

On my omap dm3730 based test system, idle power consumption is over 70
times higher now with this patch! It goes from about 6mW for the core
system to over 440mW during idle meaning there's some busy timer now
active.

Reverting this patch fixes the issue. Any ideas?

Thanks for the report. This is probably caused by
sysfs_notify_dirent().
I'm afraid that we can't keep this feature in the current shape.
Hans, I'm dropping the patch. We probably will have to delegate this
call to a workqueue task. Think about use cases when the LED is blinked
with high frequency e.g. from ledtrig-disk.c.

sysfs_notify_dirent() already uses a workqueue, here is the actual
implementation of it (from fs/kernfs/file.c) :

void kernfs_notify(struct kernfs_node *kn)
{
static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
unsigned long flags;

if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
return;

spin_lock_irqsave(&kernfs_notify_lock, flags);
if (!kn->attr.notify_next) {
kernfs_get(kn);
kn->attr.notify_next = kernfs_notify_list;
kernfs_notify_list = kn;
schedule_work(&kernfs_notify_work);
}
spin_unlock_irqrestore(&kernfs_notify_lock, flags);
}

Indeed. As a next step of this investigation Tony could disable
particular calls made in kernfs_notify_workfn to check what
exactly causes excessive power consumption.

So using a workqueue is not going to help. Note that I already
feared this, which is why my initial implementation only called
sysfs_notify_dirent() for user initiated changes and not for
triggers / blinking.

AFAIR there were no calls to led_notify_brightness_change() in
the initial implementation and it was entirely predestined for
being called by LED class drivers on brightness changes made
by firmware.

I think we may need to reconsider what getting the brightness
sysfs atrribute actually returns. Currently when a LED is
blinking it will return 0 resp. the actual brightness depending
on when in the blink cycle the user reads the brightness
sysfs atrribute.

So a user can do "echo 128 > brightness && cat brightness" and
get out 0, or 128, depending purely on timing.

This seems to contradict what Documentation/ABI/testing/sysfs-class-led
has to say:

What: /sys/class/leds/<led>/brightness
Date: March 2006
KernelVersion: 2.6.17
Contact: Richard Purdie <rpurdie@xxxxxxxxx>
Description:
Set the brightness of the LED. Most LEDs don't
have hardware brightness support, so will just be turned
on for
non-zero brightness settings. The value is between 0 and
/sys/class/leds/<led>/max_brightness.

Writing 0 to this file clears active trigger.

Writing non-zero to this file while trigger is active
changes the
top brightness trigger is going to use.

Even though it only talks about writing, the logical thing would be for
reading to be the exact opposite of writing, so we would get:

Reading from this file while a trigger is active returns
the
top brightness trigger is going to use.

The current docs say not about (sw) blinking, but that should be treated
just
like a trigger IMHO.

You'r right, we should describe the semantics on reading, but it would
have to be as follows:

Reading from this file returns LED brightness at given moment, i.e.
even though LED class device brightness setting is greater than 0, the
momentary brightness can be 0 if the readout occurred during low phase
of blink cycle.

Why would it need to read like this, because this is the current behavior ?

We have led_update_brightness() which was introduced long time ago and
is used in brightness_show(). Note that if LED controller changed
actual LED brightness e.g. due to battery voltage dropping below
certain threshold, we wouldn't be able to find it out otherwise
(except if we added separate sysfs file for that).


I doubt anyone is relying on this current behavior because it is really
unpredictable which value one can get.

I believe it would be better to change the read semantics to follow
the write semantics, this would be much more consistent.

Making the read behavior match the write behavior should be easy I would
be happy to write a patch for this.

Let's better agree on the description of the current semantics.
It has been around for a long time.

If we can get consensus on what the read behavior for the brightness
attribute
should be, then I think that a better poll() behavior will automatically
follow
from that.

It seems that we should get back to your initial approach. i.e. only
brightness changes caused by hardware should be reported.

Ok, if you really want to keep the read behavior as is, I can provide
an updated patch for this.

Yes please.

--
Best regards,
Jacek Anaszewski