Re: [PATCH v2 1/2] i2c: tegra: Better handle case where CPU0 is busy for a long time

From: Dmitry Osipenko
Date: Fri Apr 24 2020 - 10:46:00 EST


24.04.2020 10:10, Jon Hunter wrote:
...
>> Could you please clarify why pm_runtime_get_sync() can't be used by the
>> I2C driver in the NOIRQ phase?
>
> Yes, take a look at commit 1e2ef05bb8cf ("PM: Limit race conditions
> between runtime PM and system sleep (v2)").

The I2C driver has used irq-safe RPM since ede2299f7 ("i2c: tegra: Support
atomic transfers"), so the RPM workqueue shouldn't be a problem. I guess
RPM should work fine in this case, don't you think?

...
>> Yes, keeping PCI regulators always-enabled should be a good immediate
>> solution.
>
> I was thinking about that, and I am not sure it is. I don't think that
> the failure to send the I2C command should break suspend.

It shouldn't, but that looks like a separate problem.

....
> So I confirmed that DMA is not the issue in this case. I tested this by
> ensuring that DMA is never used. However, it is a potential problem
> indeed.
>
>> Could you please try to apply this hunk and see if it makes any
>> difference (I'll probably make it a proper patch):
>
> Per my tests, I don't believe that it will as disabling DMA does not
> resolve the problem.
>
>> It also could be that there is more than the suspend ordering problem,
>> but for now it is hard to tell without having a detailed log which
>> includes I2C/DMA/RPM traces.
>
> I have taken a look and I don't see any issues with ordering. I2C is
> suspended after PCI. This did not change.

Do you see any "completion done after timeout" messages in the KMSG log of
the v5.6 kernel?

Could you please try this hunk? Although, I'll be surprised if it
changes anything.

--- >8 ---
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 36d7114823ce..7196084b15fd 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -1028,6 +1028,13 @@ tegra_i2c_wait_completion_timeout(struct tegra_i2c_dev *i2c_dev,
 			msecs_to_jiffies(timeout_ms));
 	disable_irq(i2c_dev->irq);
 
+	/*
+	 * There is a chance that completion may happen after IRQ
+	 * synchronization, which is done by disable_irq().
+	 */
+	if (ret == 0 && completion_done(complete))
+		ret = 1;
+
 	/*
 	 * Under some rare circumstances (like running KASAN +
 	 * NFS root) CPU, which handles interrupt, may stuck in
--- >8 ---
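
For illustration only, here is a minimal userspace sketch of the check that
the hunk above adds. It is not driver code: struct fake_completion,
fake_completion_done() and fixup_wait_result() are hypothetical stand-ins
for the kernel's struct completion and completion_done(), and the sketch
assumes, as in the hunk, that a return value of 0 from the timed wait means
timeout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct completion (assumption). */
struct fake_completion {
	unsigned int done;	/* non-zero once the "IRQ handler" has completed it */
};

/* Rough analogue of completion_done(): true if the completion was signalled. */
static bool fake_completion_done(const struct fake_completion *c)
{
	return c->done != 0;
}

/*
 * ret is the result of the timed wait: 0 means timeout, non-zero means
 * time was left. If the wait timed out but the completion was signalled
 * before the IRQ was quiesced, treat the transfer as completed.
 */
static long fixup_wait_result(long ret, const struct fake_completion *c)
{
	if (ret == 0 && fake_completion_done(c))
		ret = 1;

	return ret;
}
```

With done == 0 a timed-out wait stays a timeout (0); with done != 0 the same
timeout is reported as success (1), and a non-zero wait result is passed
through unchanged.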