Re: [PATCH v2 1/2] i2c: tegra: Better handle case where CPU0 is busy for a long time

From: Jon Hunter
Date: Fri Apr 24 2020 - 03:10:34 EST



On 23/04/2020 17:33, Dmitry Osipenko wrote:
> 23.04.2020 13:56, Jon Hunter ÐÐÑÐÑ:
>>
>> On 22/04/2020 15:07, Dmitry Osipenko wrote:
>>
>> ...
>>
>>>> So I think that part of the problem already existed prior to these
>>>> patches. Without your patches I see ...
>>>>
>>>> [ 59.543528] tegra-i2c 7000d000.i2c: i2c transfer timed out
>>>
>>> Does this I2C timeout happen with my patches? Could you please post full
>>> logs of an older and the recent kernel versions?
>>
>> I believe that it does, but I need to check.
>>
>>>> [ 59.549036] vdd_sata,avdd_plle: failed to disable
>>>>
>>>> [ 59.553778] Failed to disable avdd-plle: -110
>>>>
>>>> [ 59.558150] tegra-pcie 3000.pcie: failed to disable regulators: -110
>>>>
>>>>
>>>> However, now with your patches the i2c is failing to resume and this
>>>> breaks system suspend completely for Tegra30. So far this is the only
>>>> board that appears to break.
>>>>
>>>> So the above issue needs to be fixed and I will chat to Thierry about this.
>>>
>>> Okay
>>
>> So I have been looking at this some more and this starting to all look a
>> bit of a mess :-(
>>
>> Firstly, the Tegra PCI driver suspends late (noirq) as we know. The PCI
>> driver will warn if it cannot disable the regulators when suspending but
>> does not actually fail suspend. So this warning is just indicating that
>> we were unable to disable the regulators.
>>
>> Now I don't see that we can ever disable the PCI regulators currently
>> when entering suspend because ...
>>
>> 1. We are in the noirq phase and so we will not get the completion
>> interrupt for the I2C transfer. I know that you recently added the
>> atomic I2C transfer support, but we can get the regulator framework
>> to use this (I have not looked in much detail so far).
>
> That's not good :) I didn't realize that *all* interrupts of every
> device are disabled before .noirq is invoked. It appeared to me that the
> IRQs disabling and .noirq invocation is performed for the drivers one
> after another, but now I see that it's not true.
>
> https://elixir.bootlin.com/linux/v5.7-rc2/source/drivers/base/power/main.c#L1446
>
>> 2. Even if the regulator framework supported atomic I2C transfers, we
>> also have the problem that the I2C controller could be runtime-
>> suspended and pm_runtime_get_sync() will not work during during this
>> phase to resume it correctly. This is a problem that needs to be
>> fixed indeed!
>
> Could you please clarify why pm_runtime_get_sync() can't be used by the
> I2C driver's in NOIRQ phase?

Yes take a look at commit 1e2ef05bb8cf ("PM: Limit race conditions
between runtime PM and system sleep (v2)").

>> 3. Then we also have the possible dependency on the DMA driver that is
>> suspended during the noirq phase.
>
> Yes, this is correct.
>
> Again, some regulator drivers may do something on suspend too, although
> looks like the current upstream Tegra devices are not affected by this
> potential problem.
>
>> It could be argued that if the PCI regulators are never turned off
>> (currently) then we should not even bother and this will likely resolve
>> this for now. However, really we should try to fix that correctly.
>
> Yes, keeping PCI regulators always-enabled should be a good immediate
> solution.

I was thinking about that, and I am not sure it is. I don't think that
the failure to send the I2C command should break suspend.

> Also, the RPM's system suspend/resume needs to fixed for the pci-tegra
> driver, like I already suggested before.

Yes but I don't think that is the cause of the issue in this particular
case.

>> What I still don't understand is why your patch breaks resume. Even if
>> the I2C transfer fails, and is deemed harmless by the client driver, we
>> should still be able to suspend and resume correctly.
>
> If DMA is getting synchronized after DMA driver being suspended, then it
> could be a problem.

So I confirmed that DMA is not the issue in this case. I tested this by
ensuring that DMA is never used. However, it is a potential problem
indeed.

> Could you please try to apply this hunk and see if it makes any
> difference (I'll probably make it as proper patch):

Per my tests, I don't believe that it will as disabling DMA does not
resolve the problem.

> It also could be that there is more than the suspend ordering problem,
> but for now it is hard to tell without having a detailed log which
> includes I2C/DMA/RPM traces.

I have taken a look and I don't see any issues with ordering. I2C is
suspended after PCI. This did not change.

Cheers
Jon

--
nvpublic