RE: Re: [PATCH net-next v2 6/9] net: phy: add backplane kr driver support

From: Florinel Iordache
Date: Mon Apr 27 2020 - 08:40:43 EST


> > +/* Backplane mutex between all KR PHY threads */
> > +static struct mutex backplane_lock;
>
>
> > +/* Read AN Link Status */
> > +static int is_an_link_up(struct phy_device *phydev)
> > +{
> > + struct backplane_device *bpdev = phydev->priv;
> > + int ret, val = 0;
> > +
> > + mutex_lock(&bpdev->bpphy_lock);
>
> Last time i asked the question about how this mutex and the phy mutex interact.
> I don't remember seeing an answer.
>
> Andrew

Hi Andrew,
Yes, your question was:
<<How does this mutex interact with phydev->lock? It appears both are trying to do the same thing, serialise access to the PHY hardware.>>
The answer is: yes, you are right. Both protect the critical section that accesses the PHY hardware of a particular PHY.
As you can see, each backplane device (bpdev) has one associated phy_device (phydev), so bpdev->bpphy_lock and phydev->lock are equivalent.
Normally your assumption is correct: the backplane driver should use the same phydev->lock. However, there is the following problem:
The backplane driver needs to protect all accesses to the PHY hardware, including those coming from the backplane workqueues scheduled for all lanes within a PHY.
But phydev->lock is already acquired for a phy_device (from phy.c) before each phy_driver callback is called (e.g. config_aneg, suspend, ...).
So if I used phydev->lock instead of bpdev->bpphy_lock, this would result in a deadlock whenever the lock is taken from within a phy_driver callback.
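The self-deadlock described above can be illustrated with a small userspace model. This is only a sketch: it uses a pthread error-checking mutex in place of the kernel mutex so that the re-lock is reported as EDEADLK instead of hanging, and every name starting with model_ is hypothetical and not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct phy_device; only the lock is modelled. */
struct model_phydev {
	pthread_mutex_t lock;
};

/* Create the lock as error-checking so a re-lock by the owner returns EDEADLK. */
static int model_phydev_init(struct model_phydev *phydev)
{
	pthread_mutexattr_t attr;
	int ret;

	assert(phydev != NULL);
	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&phydev->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Models a low-level helper (like is_an_link_up) that takes the PHY lock itself. */
static int model_is_an_link_up(struct model_phydev *phydev)
{
	int ret = pthread_mutex_lock(&phydev->lock);

	if (ret)
		return ret;	/* EDEADLK: this thread already owns the lock */
	/* ... the register access would happen here ... */
	pthread_mutex_unlock(&phydev->lock);
	return 0;
}

/* Models the framework: the lock is held across the driver callback. */
static int model_framework_callback(struct model_phydev *phydev)
{
	int ret = pthread_mutex_lock(&phydev->lock);

	if (ret)
		return ret;
	ret = model_is_an_link_up(phydev);	/* callback re-enters the same lock */
	pthread_mutex_unlock(&phydev->lock);
	return ret;
}
```

Calling model_is_an_link_up() directly succeeds, while the same helper reached through model_framework_callback() fails with EDEADLK. A kernel mutex has no such error return, so the equivalent sequence would simply block forever.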
However, a possible solution would be to remove all the locks that use bpphy_lock and instead take phydev->lock only once, in the backplane KR state machine (bp_kr_state_machine).
But this solution would result in poorer performance: the total training duration would increase because only a single lane could enter the training procedure at a time. Multi-lane PHY training could therefore ultimately fail because training does not finish in under 500 ms. So I wanted to avoid this loss of training performance.
Yet another possible solution would be to keep the locks where they are, at the lowest level, exactly at the phy_read_mmd/phy_write_mmd calls, so that lane training can run in parallel, but to use phydev->lock instead, as would normally be expected and as you suggested.
But in this case I must avoid the deadlock mentioned above by differentiating between two kinds of calls: calls coming from phy_driver callbacks, where phydev->lock has already been acquired for this phy_device by the phy framework, so the mutex must be skipped; and calls coming from anywhere else (for example from the backplane KR state machine), where phydev->lock has not been acquired for this phy_device, so the mutex must be taken.
If you agree with this last solution, I can implement it in the next version by adding a flag to backplane_device, called 'phy_mutex_already_acquired' or 'skip_phy_mutex', which must be set in all backplane phy_driver callbacks and will be used to skip taking phydev->lock at the phy_read_mmd/phy_write_mmd calls in these cases.
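The flag-based proposal can be sketched in userspace as follows. This is a minimal single-threaded model under stated assumptions, not the driver code: a pthread mutex stands in for phydev->lock, the struct and all model_ functions are hypothetical, and the locks_taken counter exists only to make the behaviour observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct backplane_device. */
struct model_bpdev {
	pthread_mutex_t phy_lock;	/* stands in for phydev->lock */
	bool skip_phy_mutex;		/* flag name taken from the proposal */
	int locks_taken;		/* demo only: times the helper locked */
	int reg;			/* demo only: fake PHY register */
};

static void model_bpdev_init(struct model_bpdev *bpdev, int reg)
{
	assert(bpdev != NULL);
	pthread_mutex_init(&bpdev->phy_lock, NULL);
	bpdev->skip_phy_mutex = false;
	bpdev->locks_taken = 0;
	bpdev->reg = reg;
}

/* Lowest-level access: take the lock unless the caller already holds it. */
static int model_read(struct model_bpdev *bpdev)
{
	bool skip = bpdev->skip_phy_mutex;	/* sample the flag once */
	int val;

	if (!skip) {
		pthread_mutex_lock(&bpdev->phy_lock);
		bpdev->locks_taken++;
	}
	val = bpdev->reg;
	if (!skip)
		pthread_mutex_unlock(&bpdev->phy_lock);
	return val;
}

/* Models a phy_driver callback: the framework holds the lock around it. */
static int model_callback(struct model_bpdev *bpdev)
{
	int val;

	pthread_mutex_lock(&bpdev->phy_lock);	/* done by the framework */
	bpdev->skip_phy_mutex = true;		/* set on callback entry */
	val = model_read(bpdev);
	bpdev->skip_phy_mutex = false;		/* cleared on callback exit */
	pthread_mutex_unlock(&bpdev->phy_lock);
	return val;
}

/* Models a state-machine step: no lock is held on entry. */
static int model_state_machine_step(struct model_bpdev *bpdev)
{
	return model_read(bpdev);
}
```

In this model the callback path reads the register without touching the lock a second time, while the state-machine path takes it as usual. Note that the sketch does not model concurrency: because the flag lives in the shared device structure, a worker running on another thread could observe it while a callback has it set, so a real implementation would need to account for that.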

I'm sorry I did not answer this question the first time you asked it.
Florin.