Re: [PATCH v4 07/13] firmware: arm_scmi: Add notification dispatch and delivery

From: Cristian Marussi
Date: Mon Mar 23 2020 - 04:28:06 EST


Hi

On 3/18/20 8:26 AM, Lukasz Luba wrote:
Hi Cristian,

On 3/16/20 2:46 PM, Cristian Marussi wrote:
On Thu, Mar 12, 2020 at 09:43:31PM +0000, Lukasz Luba wrote:


On 3/12/20 6:34 PM, Cristian Marussi wrote:
On 12/03/2020 13:51, Lukasz Luba wrote:
Hi Cristian,

Hi Lukasz

just one comment below...
[snip]
+	eh.timestamp = ts;
+	eh.evt_id = evt_id;
+	eh.payld_sz = len;
+	kfifo_in(&r_evt->proto->equeue.kfifo, &eh, sizeof(eh));
+	kfifo_in(&r_evt->proto->equeue.kfifo, buf, len);
+	queue_work(r_evt->proto->equeue.wq,
+		   &r_evt->proto->equeue.notify_work);

Is it safe to ignore the return value of queue_work() here?


[snip]
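To recap the semantics in play here: queue_work() returns false only when the
work item is already pending, so ignoring the return value normally just means
the already-scheduled worker will pick up the new payload. A minimal sketch of
those semantics (hypothetical names, not the actual driver code):

#include <linux/printk.h>
#include <linux/workqueue.h>

/* Hypothetical worker: drains the queue until it looks empty. */
static void example_notify_work(struct work_struct *work)
{
	/* ...dequeue and deliver events... */
}

static DECLARE_WORK(example_work, example_notify_work);

/* Hypothetical producer, e.g. called from the ISR path. */
static void example_producer(struct workqueue_struct *wq)
{
	/*
	 * queue_work() returns:
	 *   true  - the work was idle and has now been queued;
	 *   false - the work was already pending, so the queued or
	 *           running worker is expected to see the new data.
	 *
	 * The race is the window where the worker has already decided
	 * to exit (the queue looked empty) while WORK_PENDING is still
	 * set: queue_work() returns false, yet nobody will ever process
	 * the event pushed right before this call.
	 */
	if (!queue_work(wq, &example_work))
		pr_debug("work was already pending\n");
}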

On the other hand, considering the impact of such a scenario, it's not simply
that delivery could be delayed: if the delayed event happens to be the last one
ever queued, it would remain undelivered forever. This is particularly worrying
in a scenario in which that last event is important: imagine a system shutdown
where a final system-power-off notification remains undelivered.

Agreed; another example could be a thermal notification for some critical
trip point.


As a consequence, I think this rare race condition should be addressed somehow.

Looking at this scenario, it seems like the classic situation in which you want
to use some sort of completion to avoid missing out on event delivery, BUT in
our use case:

- placing the workers loaned from cmwq into an unbounded wait_for_completion()
  once the queue is empty seems not the best use of resources (and is probably
  frowned upon)... using a few dedicated kernel threads and simply letting them
  idle most of the time seems equally frowned upon (I could be wrong...)
- the complete() needed in the ISR would introduce a spinlock_irqsave into the
  interrupt path (there's already one inside queue_work in fact), so it is not
  desirable, at least not if used on a regular basis (for each notified event)
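
For concreteness, the kind of pattern being ruled out would look something like
this (hypothetical names; a sketch of the rejected alternative, not proposed
code):

#include <linux/completion.h>
#include <linux/kthread.h>

static DECLARE_COMPLETION(events_avail);

/* Dedicated kthread, idle in wait_for_completion() most of the time. */
static int example_dispatcher(void *data)
{
	while (!kthread_should_stop()) {
		wait_for_completion_interruptible(&events_avail);
		/* ...drain the kfifo and deliver the events... */
	}
	return 0;
}

/* ISR side. */
static void example_isr_side(void)
{
	/* ...kfifo_in() the event header and payload... */

	/*
	 * complete() takes the completion's spinlock with IRQs
	 * disabled on every single event: this is the extra cost
	 * in the interrupt path mentioned above.
	 */
	complete(&events_avail);
}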

So I was thinking of trying to significantly reduce the above race window,
rather than eliminating it completely, by adding an early 'exiting' flag to be
checked under specific conditions, in order to retry the queue_work() a few
times when the race is hit, something like:

ISR (core N)             |  WQ (core N+1)
-------------------------------------------------------------------------------
                         |  atomic_set(&exiting, 0);
                         |
                         |  do {
                         |      ...
                         |      if (queue_is_empty)          - WORK_PENDING   0 events queued
                         |          atomic_set(&exiting, 1)  - WORK_PENDING   0 events queued
static int cnt = 3;      |          --> break out of while   - WORK_PENDING   0 events queued
kfifo_in()               |      ...
                         |  } while (scmi_process_event_payload);
kfifo_in()               |
exiting = atomic_read(); |  ...cmwq backing out              - WORK_PENDING   1 events queued
do {                     |  ...cmwq backing out              - WORK_PENDING   1 events queued
    ret = queue_work();  |  ...cmwq backing out              - WORK_PENDING   1 events queued
    if (ret || !exiting) |  ...cmwq backing out              - WORK_PENDING   1 events queued
        break;           |  ...cmwq backing out              - WORK_PENDING   1 events queued
    mdelay(5);           |  ...cmwq backing out              - WORK_PENDING   1 events queued
    exiting =            |  ...cmwq backing out              - WORK_PENDING   1 events queued
        atomic_read();   |  ...cmwq backing out              - WORK_PENDING   1 events queued
} while (--cnt);         |  ...cmwq backing out              - WORK_PENDING   1 events queued
                         |  ---- WORKER EXIT                 - !WORK_PENDING  0 events queued

like down below between the scissors.
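
The gist of it, as a rough and untested sketch (field and function names are
placeholders, not the actual patch):

#include <linux/atomic.h>
#include <linux/delay.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

/* Hypothetical per-protocol events queue, with the new 'exiting' flag. */
struct example_equeue {
	struct workqueue_struct	*wq;
	struct work_struct	notify_work;
	atomic_t		exiting;
	/* ...kfifo and friends... */
};

/* Worker side: flag the back-out before leaving on an empty queue. */
static void example_events_worker(struct work_struct *work)
{
	struct example_equeue *eq =
		container_of(work, struct example_equeue, notify_work);

	atomic_set(&eq->exiting, 0);
	for (;;) {
		/* ...dequeue and dispatch one event...; on empty queue: */
		atomic_set(&eq->exiting, 1);
		break;
	}
}

/* ISR side: called right after the kfifo_in() of header and payload. */
static void example_isr_enqueue(struct example_equeue *eq)
{
	int cnt = 3;
	bool exiting = atomic_read(&eq->exiting);

	do {
		/*
		 * true from queue_work() means the work has been
		 * (re)queued, so this event will surely be processed;
		 * false with the worker NOT exiting means the still
		 * running worker will pick the event up anyway.
		 */
		if (queue_work(eq->wq, &eq->notify_work) || !exiting)
			break;
		/* Race hit: worker backing out with WORK_PENDING set. */
		mdelay(5);
		exiting = atomic_read(&eq->exiting);
	} while (--cnt);
}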

Not tested or tried... I could be missing something... and the mdelay is
horrible (probably not the cleanest thing you've ever seen :D). I'll have a
chat with Sudeep too.

Indeed, it looks more complicated. If you like, I can join your offline
discussion when Sudeep is back.

Yes, this is as of now my main remaining issue to address for v6.
I'll wait for Sudeep's general review/feedback and raise this point.

Regards

Cristian

Regards,
Lukasz