Re: [PATCH v10 54/55] Input: atmel_mxt_ts: Implement synchronization during various operation

From: Wang, Jiada
Date: Mon Apr 06 2020 - 00:19:11 EST


Hi Dmitry

On 2020/04/02 23:00, Dmitry Osipenko wrote:
02.04.2020 16:24, Dmitry Osipenko ÐÐÑÐÑ:
02.04.2020 14:50, Wang, Jiada ÐÐÑÐÑ:
Hi Dmitry

On 2020/04/02 1:04, Dmitry Osipenko wrote:
31.03.2020 13:50, Jiada Wang ÐÐÑÐÑ:
From: Sanjeev Chugh <sanjeev_chugh@xxxxxxxxxx>

There could be scope of race conditions when sysfs is being handled
and at the same time, device removal is occurring. For example,
we don't want the device removal to begin if the Atmel device
cfg update is going on or firmware update is going on. In such
cases, wait for device update to be completed before the removal
continues.

ÂÂÂÂ ThreadÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Thread 2:
=========================
=========================
mxt_update_fw_store()ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ mxt_remove()
mutex_lock(&data->lock)ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ...
mxt_initialize()ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ //Tries to acquire lock
ÂÂ request_firmware_nowait()ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ mutex_lock(&data->lock)
...ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ==>waits for lock()
...ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ .
...ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ .
mutex_unlock(&data->lock)ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ .
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ //Gets lock and
proceeds
mxt_free_input_device();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ...
mutex_unlock(&data->lock)
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ //Frees atmel driver
data
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kfree(data)

If the request_firmware_nowait() completes after the driver removal,
and callback is triggered. But kernel crashes since the module is
already removed.

This commit adds state machine to serialize such scenarios.

Won't it be easier to bump driver's module use-count by __module_get()
while firmware is updating? Or remove sysfs during of mxt_remove()? >

thanks for your inspiration, I will replace state machine with module
use-count.

I'm actually now thinking that the suggestion about the module-count
wasn't very correct because this won't really help in regards to
mxt_update_fw_store() / mxt_remove() racing.

I see that mxt_remove() already invokes the mxt_sysfs_remove(), which
should block until mxt_update_fw_store() is completed, shouldn't it?

I guess the kfree(data) isn't the real cause of the problem and
something like this should help:

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c
b/drivers/input/touchscreen/atmel_mxt_ts.c
index b2edf51e1595..4e66106feeb9 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -4254,6 +4254,7 @@ static void mxt_sysfs_remove(struct mxt_data *data)
struct i2c_client *client = data->client;

sysfs_remove_group(&client->dev.kobj, &mxt_attr_group);
+ sysfs_remove_group(&client->dev.kobj, &mxt_fw_attr_group);
}

static void mxt_reset_slots(struct mxt_data *data)
@@ -4649,31 +4650,19 @@ static int mxt_remove(struct i2c_client *client)
{
struct mxt_data *data = i2c_get_clientdata(client);

- mutex_lock(&data->lock);
- if (data->e_state == MXT_STATE_UPDATING_CONFIG_ASYNC ||
- data->e_state == MXT_STATE_UPDATING_CONFIG) {
- data->e_state = MXT_STATE_GOING_AWAY;
- mutex_unlock(&data->lock);
- mxt_wait_for_completion(data, &data->update_cfg_completion,
- MXT_CONFIG_TIMEOUT);
- } else {
- data->e_state = MXT_STATE_GOING_AWAY;
- mutex_unlock(&data->lock);
- }
+ mxt_sysfs_remove(data);

- disable_irq(data->irq);
- sysfs_remove_group(&client->dev.kobj, &mxt_fw_attr_group);
if (data->reset_gpio) {
sysfs_remove_link(&client->dev.kobj, "reset");
gpiod_unexport(data->reset_gpio);
}
+
mxt_debug_msg_remove(data);
- mxt_sysfs_remove(data);
mxt_free_input_device(data);
mxt_free_object_table(data);

if (debug_state)
cancel_delayed_work_sync(&data->watchdog_work);
+ disable_irq(data->irq);

return 0;
}


I'm looking at this again and the original tear-down order of the
mxt_remove() looks okay, so no need to change it.

Reading the commit message, it says that request_firmware_nowait() races
with kfree(data), but that can't happen because the data is
resource-managed and request_firmware_nowait() bumps device's use-count.

https://elixir.bootlin.com/linux/v5.6.2/source/drivers/base/firmware_loader/main.c#L1043

I think this patch was ported from some very old kernel version and it's
simply not applicable to the upstream anymore.


I had some test,
and confirmed you are right,
this commit is no longer applicable to upstream,
but as discussed in another patch,
disable_irq() need to be moved after remove of mxt_fw_attr_group.
I will add this change in a new commit.

Thanks,
Jiada