Re: [RFC] dt-bindings: mailbox: add doorbell support to ARM MHU

From: Viresh Kumar
Date: Wed Jun 10 2020 - 05:33:40 EST


On 05-06-20, 10:42, Jassi Brar wrote:
> Since origin upto scmi_xfer, there can be many forms of sleep like
> schedule/mutexlock etc.... think of some userspace triggering sensor
> or dvfs operation. Linux does not provide real-time guarantees. Even
> if remote (scmi) firmware guarantee RT response, it makes sense to
> timeout a response only after the _request is on the bus_ and not
> when you submit a request to the api (unless you serialise it).
> IOW, start the timeout from mbox_client.tx_prepare() when the
> message actually gets on the bus.

There are multiple purposes of the timeout IMO:

- Returning early if the other side is dead/hung. In such a case the
timeout can start when the request is put on the bus, as we don't
care about the time it takes to complete the request as long as the
request can be fulfilled. An i2c/spi memory read is an example of
this.

- Ensuring a maximum time within which the request needs to be
serviced. There may be hard requirements, as in the case of DVFS from
the scheduler's hot path (which is essential for the overall system
to work well). For such a case the timeout is placed at the right
place IMO, i.e. right after a request is submitted to the mailbox.
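
To make the difference between the two concrete, here is a minimal
sketch of both checks. The struct and function names are made up for
illustration and are not taken from the mailbox or SCMI code; the
timestamps are assumed to be in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical timestamps (in microseconds) for a single request.
 * None of these names exist in the mailbox framework.
 */
struct req_times {
	uint64_t submitted_us;	/* client handed the request to the mailbox */
	uint64_t on_bus_us;	/* request was actually transmitted */
	uint64_t completed_us;	/* response was received */
};

/* Liveness check: the clock starts when the request hits the bus. */
static bool timed_out_from_bus(const struct req_times *t, uint64_t timeout_us)
{
	return t->completed_us - t->on_bus_us > timeout_us;
}

/* Latency bound: the clock starts at submission, so queueing time counts. */
static bool timed_out_from_submit(const struct req_times *t, uint64_t timeout_us)
{
	return t->completed_us - t->submitted_us > timeout_us;
}
```

A request that waits a long time in the queue but is answered quickly
once transmitted passes the first check and fails the second, which is
exactly the case the second purpose is meant to catch.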

And some more points I wanted to share:

- I am not sure I understood the *serializing* part you guys were
talking about. I believe the mailbox framework already serializes
the requests it receives on a single channel with a spin lock,
right? Why does the client need to serialize them as well? Is that
to avoid timeouts?

- For me, and Sudeep as well IIUC, the bigger problem isn't that
timeouts are happening and requests are failing (and so changing the
timeout to a bigger value isn't going to fix anything). The problem
is that a request takes too long to finish after being submitted
(because of the queue of requests on a channel). The scheduler, for
example, doesn't care about the underlying logistics; all it cares
about is the time it takes to change the frequency of a CPU. If you
can do it fast enough in a guaranteed manner, then you can use fast
switching, otherwise not.

- The hardware can very well support the case today where this can be
done in parallel and (almost) in a guaranteed time-frame, while the
software wants to add a limit to that and so wants to serialize
requests.

- As many people have already suggested (like me, Sudeep, Rob, and
maybe Bjorn as well), it seems silly not to allow driving the h/w in
the most efficient way possible (and so allow fast CPU switching in
this case).
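
As a rough back-of-the-envelope model of the queueing point above
(idealized, and all names and numbers here are hypothetical): if every
request takes about the same time to service, a request submitted
behind a queue on a serialized channel waits for everything ahead of
it, while one sent in parallel only pays for its own service time.

```c
#include <stdint.h>

/*
 * Worst-case submit-to-completion latency when `pending` requests are
 * already queued ahead and everything is serviced strictly one after
 * another.
 */
static uint64_t serialized_latency_us(unsigned int pending, uint64_t service_us)
{
	return ((uint64_t)pending + 1) * service_us;
}

/*
 * The same request when the hardware services it in parallel. This
 * assumes no contention at all on the remote side, which is an
 * idealization.
 */
static uint64_t parallel_latency_us(unsigned int pending, uint64_t service_us)
{
	(void)pending;	/* requests ahead do not delay this one */
	return service_us;
}
```

With 4 requests ahead and an assumed 50 us per request, the serialized
case gives 250 us and the parallel case gives 50 us. Only the second
number is independent of queue depth, which is what a guaranteed
time-frame for fast switching needs.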

> Interesting logs ! The time taken to complete _successful_ requests
> are arguably better in bad_trace ... there are many <10usec responses
> in bad_trace, while the fastest response in good_trace is 53usec.

Indeed this is interesting. It may be worth looking (separately) into
why we don't see those 3 us long requests anymore, or maybe they were
just not there in the logs.

> And the requests that 'fail/timeout' are purely the result of not
> serialising them or checkout for timeout at wrong place as explained
> above.

We can't allow requests to go on forever in some cases, while in
other cases that may be absolutely fine.

--
viresh