On Wed, Mar 13, 2024 at 10:24:08AM -0700, Abhinav Kumar wrote:
On 3/13/2024 1:18 AM, Johan Hovold wrote:
Right, but your proposed fix would not actually fix anything and judging
from the sparse commit message and diff itself it is clearly only meant
to mitigate the case where user space is involved, which is *not* the
case here.
There can be a race condition between the time the DP driver gets the
hpd disconnect event and when the hpd thread processes that event
allowing the commit to sneak in. This is something which has always been
there even without pm_runtime series and remains even today.
In this race condition, the setting of "link_ready" to false can be a
bit delayed if we go through the HPD event processing increasing the
race condition window.
If link_ready is false, atomic_check() fails, thereby failing any
commits and hence not allowing the atomic_disable() / atomic_enable()
cycle and hence avoiding this reset.
The patch is moving the setting of link_ready to false earlier by not
putting it through the HPD event thread and hence trying to reduce the
window of the issue.
Perhaps I'm missing something in the race that you are trying to
describe (and which I've asked you to describe in more detail so that I
don't have to spend more time trying to come up with a reproducer
myself).
I do understand how your patch works, but my point is that it does
not fix the race that we are hitting on sc8280xp and, unless I'm missing
something, it is not even sufficient to fix the race you are talking
about as user space can still trigger that ioctl() before you clear the
link_ready flag.
That's why I said that it is only papering over the issue by making the
race window smaller (and this should also be highlighted in the commit
message).
For some reason it also made things worse on sc8280xp, but I did not
spend time on tracking down exactly why.
Johan