Re: [PATCH 1/2] platform/chrome: skip old cros_ec responses

From: Enric Balletbo i Serra
Date: Fri Apr 10 2020 - 03:38:47 EST


Hi Mat,

On 10/4/20 1:59, Mat King wrote:
> On Thu, Apr 9, 2020 at 4:42 PM Enric Balletbo i Serra
> <enric.balletbo@xxxxxxxxxxxxx> wrote:
>>
>> Hi Mathew,
>>
>> Thanks for your patch.
>>
>> On 8/4/20 20:16, Mathew King wrote:
>>> From: Jett Rink <jettrink@xxxxxxxxxxxx>
>>>
>>> The ISHTP layer can give us old responses that we already gave up on. We
>>> do not want to interpret these old responses as the current response we
>>> are waiting for.
>>>
>>
>> Looking at the code and with the above explanation I am not sure I get what is
>> doing this patch, could you explain a bit more, thanks.
>
> The cros_ish should only have one request in flight at a time, we send
> the request and wait for the response from the ISH. If the ISH is too
> slow to respond we give up on that request and we can send a new
> request. The ISH may still send the response to the request that timed
> out and without this patch we would treat the old response as the
> response to the current command. This is a condition that should not
> normally happen but we have seen it with a bad ISH image. So we just
> add an Id to the request header which is copied into the response
> header when the ISH processes the message to ensure that response is
> for the current request.
>

Could you include that in the commit message. Thanks.

>>
>>> Signed-off-by: Jett Rink <jettrink@xxxxxxxxxxxx>
>>> Signed-off-by: Mathew King <mathewk@xxxxxxxxxxxx>
>>> ---
>>> drivers/platform/chrome/cros_ec_ishtp.c | 32 ++++++++++++++++++-------
>>> 1 file changed, 23 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/platform/chrome/cros_ec_ishtp.c b/drivers/platform/chrome/cros_ec_ishtp.c
>>> index 93a71e93a2f1..6f90deb5cf55 100644
>>> --- a/drivers/platform/chrome/cros_ec_ishtp.c
>>> +++ b/drivers/platform/chrome/cros_ec_ishtp.c
>>> @@ -48,7 +48,8 @@ static const guid_t cros_ish_guid =
>>> struct header {
>>> u8 channel;
>>> u8 status;
>>> - u8 reserved[2];
>>> + u8 id;
>>> + u8 reserved;
>>> } __packed;
>>>
>>> struct cros_ish_out_msg {
>>> @@ -90,6 +91,7 @@ static DECLARE_RWSEM(init_lock);
>>> * data exceeds this value, we log an error.
>>> * @size: Actual size of data received from firmware.
>>> * @error: 0 for success, negative error code for a failure in process_recv().
>>> + * @expected_id: Expected id for response that we are waiting on.
>>> * @received: Set to true on receiving a valid firmware response to host command
>>> * @wait_queue: Wait queue for host to wait for firmware response.
>>> */
>>> @@ -98,6 +100,7 @@ struct response_info {
>>> size_t max_size;
>>> size_t size;
>>> int error;
>>> + u8 expected_id;
>>> bool received;
>>> wait_queue_head_t wait_queue;
>>> };
>>> @@ -162,6 +165,7 @@ static int ish_send(struct ishtp_cl_data *client_data,
>>> u8 *out_msg, size_t out_size,
>>> u8 *in_msg, size_t in_size)
>>> {
>>> + static u8 current_id;
>>> int rv;
>>> struct header *out_hdr = (struct header *)out_msg;
>>> struct ishtp_cl *cros_ish_cl = client_data->cros_ish_cl;
>>> @@ -174,8 +178,11 @@ static int ish_send(struct ishtp_cl_data *client_data,
>>> client_data->response.data = in_msg;
>>> client_data->response.max_size = in_size;
>>> client_data->response.error = 0;
>>> + client_data->response.expected_id = ++current_id;
>>
>> So on every ish_send call this variable is increased in a range 1 to 255 first,
>> then overflows and goes from 0 to 255. Is this what you want to do?
>
> On our internal code review this was switched from a post increment to
> a pre increment so that the variable name made sense. The id itself
> doesn't really matter as long as it is changed every response. We
> could also call it next_id and do a post increment and then we would
> start at 0. Do you have preference?
>

So is like a token (actually a ring of tokens) to make sure the response matches
with the question.

if (response.token != in_msg->hdr.token)
ignore the response

This id dancing (expected_id, current_id, id) confused me a bit. IMO the code
will be more readable simply with token or id for both, the response and the
question, and have a next_token/id. But I don't have a strong preference, so I
will let you the final decision.

>>
>>> client_data->response.received = false;
>>>
>>> + out_hdr->id = client_data->response.expected_id;
>>> +
>>> rv = ishtp_cl_send(cros_ish_cl, out_msg, out_size);
>>> if (rv) {
>>> dev_err(cl_data_to_dev(client_data),
>>> @@ -249,17 +256,23 @@ static void process_recv(struct ishtp_cl *cros_ish_cl,
>>>
>>> switch (in_msg->hdr.channel) {
>>> case CROS_EC_COMMAND:
>>> - /* Sanity check */
>>> - if (!client_data->response.data) {
>>> + if (client_data->response.received) {
>>> dev_err(dev,
>>> - "Receiving buffer is null. Should be allocated by calling function\n");
>>> - client_data->response.error = -EINVAL;
>>> - goto error_wake_up;
>>> + "Previous firmware message not yet processed\n");
>>> + goto end_error;
>>> }
>>>
>>> - if (client_data->response.received) {
>>> + if (client_data->response.expected_id != in_msg->hdr.id) {
>>
>> And here you compare that the response received matches with the message id. I
>> assume the ISH is sending a sequential id on every message?
>
> The ISH will copy the id from the request to the id of the response
>

Ok.

>>
>>> dev_err(dev,
>>> - "Previous firmware message not yet processed\n");
>>> + "Dropping old response id %d\n",
>>> + in_msg->hdr.id);
>>
>>
>> How often this message appears?
>
> Since we have stabilized our ISH code it rarely happens if ever. If it
> does happen it indicates an error in the ISH code or the driver code
> but we still want to attempt to recover gracefully if possible.
>

Still, I am wondering if is a good idea to have this message rate limited
(dev_err_ratelimited) If for some reason the ISH code is bad or losts sync the
kernel can be flooded by this message and being annoying.

>>
>>> + goto end_error;
>>> + }
>>> +
>>> + /* Sanity check */
>>> + if (!client_data->response.data) {
>>> + dev_err(dev,
>>> + "Receiving buffer is null. Should be allocated by calling function\n");
>>> client_data->response.error = -EINVAL;
>>> goto error_wake_up;
>>> }
>>> @@ -289,9 +302,10 @@ static void process_recv(struct ishtp_cl *cros_ish_cl,
>>> memcpy(client_data->response.data,
>>> rb_in_proc->buffer.data, data_len);
>>>
>>> +error_wake_up:
>>> /* Set flag before waking up the caller */
>>> client_data->response.received = true;
>>> -error_wake_up:
>>> +
>>> /* Wake the calling thread */
>>> wake_up_interruptible(&client_data->response.wait_queue);
>>>
>>>

Please resend the patch modifying the commit message and using

'platform/chrome: cros_ec_ishtp: ...'

as title prefix.

Thanks,
Enric