Re: [PATCH v2 2/2] alarmtimer: Modify alarmtimer suspend callback to check for imminent alarm using PM notifier

From: Thomas Gleixner
Date: Tue Feb 13 2024 - 17:39:05 EST


Pranav!

On Tue, Feb 13 2024 at 20:30, Pranav Prasad wrote:
>> How is this supposed to work? rtc is NULL.
>
> alarmtimer_get_soonest() has the following:
> rtc = alarmtimer_get_rtcdev();
> if (!rtc)
> return 0;
>
> rtc is set in alarmtimer_get_soonest(). If rtc is NULL, the notifier
> returns NOTIFY_DONE before even reaching pm_wakeup_event(), hence there is
> no NULL pointer dereference expected.

struct rtc_device *rtc = NULL;

if (!alarmtimer_get_soonest(rtc, ....)
return 0 ;

pm_wakeup_event(rtc->dev);

static bool alarmtimer_get_soonest(struct rtc_device *rtc, ....)
{
rtc = alarmtimer_get_rtcdev();
if (!rtc)
return false;
...
return true;
}

How is the assignment in alarmtimer_get_soonest() causing 'rtc' at the
callsite to become magically non NULL unless you have this shiny new AI
enhanced compiler with the long awaited 'do what I mean' optimization
enabled by default.

My old school brain based compiler is absolutely sure that both places
which I pointed out are straight forward unconditional NULL pointer
dereferences. See below.

>> How was this ever tested?
>
> I tested it by forcing kernel suspend writing 'mem' to /sys/power/state and
> using a large window (120s instead of the current 2s) so that a pending
> alarm is likely. The debug print is logged as expected without any kernel
> crash, and the suspend gets aborted.

I have no idea what you tested, but definitely not the complete
submitted code.

The only reason why this did not explode in your face in
pm_wakeup_event() is that this function has a NULL pointer check and
struct rtc_device has @dev as first member, which means that

rtc == &rtc->dev

resulting in the NULL pointer check to be effective because &rtc->dev
evaluates to NULL.

But that does not make the code more correct. It's still am
unconditional NULL pointer dereference by definition, no?

Just flip the ordering of @dev in @owner in struct rtc_device and watch
the show.

alarmtimer_suspend() did not explode in your face because either the
notifier aborted suspend, which means the function was never reached, or
there was no timer armed. Why?

Because rtc_cancel_timer() has no NULL pointer check and unconditionally
dereferences @rtc.

Just for the record. I missed to spot this gem in your patch:

> + /* Find the soonest timer to expire */
> + if (!alarmtimer_get_soonest(rtc, &min, &expires, &type))
> return 0;
>
> - if (ktime_to_ns(min) < suspend_check_duration_ms * NSEC_PER_MSEC) {
> - pm_wakeup_event(dev, suspend_check_duration_ms);
> - return -EBUSY;
> - }
> -

So you delete the threshold check. What makes sure that a timer which
got armed _after_ the notifier ran or one of the nanosleep timers which
are only accounted for after freezing are checked for expiring before
the threshold?

Nothing, right?

But sure, the patch did what you wanted to demonstrate and that's why it
is correct and perfect, right?

I conceed that your patch works perfectly correct under the following
condition:

Either alarmtimer_pm_callback() aborts the suspend or
alarmtimer_get_soonest() does not find an armed timer when called in
alarmtimer_suspend()

Unfortunately that's not reflecting reality in production systems unless
I'm missing something important here.

Thanks,

tglx