Re: [PATCH 6/6] soundwire: qcom: do not probe devices before bus/link init

From: Pierre-Louis Bossart
Date: Mon May 01 2023 - 10:01:07 EST




On 5/1/23 07:24, Krzysztof Kozlowski wrote:
> On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
>>
>>
>> On 4/20/23 05:16, Krzysztof Kozlowski wrote:
>>> Soundwire devices are supposed to be kept in reset state (powered off)
>>> till their probe() or component bind() callbacks. However if they are
>>> already powered on, then they might enumerate before the master
>>> initializes bus in qcom_swrm_init() leading to occasional errors like:
>>
>> The problem statement is really hard to follow.
>>
>> The peripheral can only be enumerated AFTER
>> a) the manager starts the bus clock and transmitting PING frames
>> b) the peripheral detects the sync words for 16 frames in a row.
>> c) the peripheral reports as Attached in the Device0 slot
>>
>> That sequence holds whether the manager does the enumeration manually or
>> relies on hardware-assisted autoenumeration. This is what the spec requires.
>>
>> So why can't the bus clock start be controlled by the manager driver,
>> and started once all required initializations are done?
>>
>> I mean, there's got to be some sort of parent-child hierarchy with
>> manager first, peripheral(s) second, I don't get how these steps could
>> be inverted or race.
>>
>>> qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
>>> wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
>>> wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
>>> qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
>>>
>>> The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
>>> 1. request_threaded_irq()
>>> 2. sdw_bus_master_add() - which will cause probe() and component bind()
>>> of Soundwire devices, e.g. WCD938x codec drivers. Device drivers
>>> might already start accessing their registers.
>>
>> not if the bus clock hasn't started...
>>
>>> 3. qcom_swrm_init() - which initializes the link/bus and enables
>>> interrupts.
>>
>> if you can move the clock start in 3) then problem solved. Why can't
>> this be done?
>
> Responding to all your three responses:
> The clock is enabled in this 3. qcom_swrm_init(), so the old code to my
> knowledge is written exactly how you expect.
>
> However even with stopped clock, the device enumerates at
> sdw_bus_master_add(), before anything is enabled.

Erm, that's not physically possible...

The peripheral can report as attached and be enumerated by the manager,
i.e. assigned a non-zero "Device Number" after the peripheral
synchronizes on 16 frames with valid static and dynamic syncwords. That
can only happen if there is a clock toggling and PING frames transmitted
on the data line.

There's something else at play here.

> I also checked the reset values of these registers - clock is off after
> reset. Assuming of course I look at correct clock registers... but I
> have only one.
>
>>
>>> Any access to device registers at (2) above, will fail because link/bus
>>> is not yet initialized.
>>>
>>> However the fix is not as simple as moving qcom_swrm_init() before
>>> sdw_bus_master_add(), because this will cause early interrupt of new
>>> slave attached. The interrupt handler expects bus master (ctrl->bus.md)
>>> to be allocated, so this would lead to NULL pointer exception.
>>>
>>> Rework the init sequence and change the interrupt handler. The correct
>>> sequence fixing accessing device registers before link init is now:
>>> 1. qcom_swrm_init()
>>> 2. request_threaded_irq()
>>> 3. sdw_bus_master_add()
>>> which still might cause early interrupts, if Soundwire devices are not
>>> in powered off state before their probe. This early interrupt issue is
>>
>> You'd need to clarify in which step the bus clock starts. In general,
>> you want to clock started last.
>
> Clock is enabled in qcom_swrm_init() step, but as I wrote above, it
> looks like it does not matter for enumeration.

without a clock you can't have any enumeration.


>>> + * from reset by its probe() or bind() function, as a result of
>>> + * sdw_bus_master_add().
>>> + * Add a simple check to avoid NULL pointer except on early interrupts.
>>> + * Note that if this condition happens, the slave device will not be
>>> + * enumerated. Its driver should be fixed.
>>
>> ???
>>
>> The codec driver is NEVER involved in enumeration.
>
> If the device stays in power down, only the driver bind can bring it on.
> enumeration won't happen when device is powered down, right?

The codec driver can indeed control the codec power with sideband links
- i.e. not with SoundWire but gpios/I2C/SPI, etc. - and it could very
well prevent the codec hardware from showing up on the bus until TBD
platform-specific criteria are met.

But that's not really taking part in the enumeration process, rather
gating the enumeration process.

>> The only thing a codec driver should do is provide a callback to be
>> notified of a status change for additional initialization, but the
>> enumeration can be done even in the absence of a codec driver.
>>
>> The proof in the pudding is that you can 'blacklist' a codec driver and
>> bind it later, after the hardware is enumerated. You can even unbind a
>> codec driver and nothing bad will happen (we hardened that sequence last
>> year).
>>
>> probe != enumeration != initialization for SoundWire codecs.
>>
>> Probe and enumeration can happen in any order
>> Initialization can only happen after both probe and enumeration happened.
>
> I am speaking here about component_master_ops->bind() callback.

That's on the manager side, I really don't see how this is related to
the codec?

>>> + * removing Soundwire bus master.
>>> + */
>>> + if (ctrl->version < SWRM_VERSION_2_0_0)
>>> + ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
>>> + 0);
>>> + if (ctrl->mmio)
>>> + ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
>>> + 0);
>>> +
>>> + cancel_delayed_work_sync(&ctrl->new_slave_work);
>>> sdw_bus_master_delete(&ctrl->bus);
>>> clk_disable_unprepare(ctrl->hclk);
>>
>> should the last two be inverted? Keeping a clock running while removing
>> stuff is asking for trouble.

actually it doesn't really matter, if the interrupts are disabled first
and you wait for in-flight work to be done. Ignore this comment.
>
> It is a reversed probe(), which is usually correct. Do you expect
> probe() like:
>
> clk_enable
> sdw_bus_master_add

it's likely the other way around,

probe():
sdw_bus_master_add
clk_enable

assuming that clk_enable() starts the bus - not sure it does based on
the answers above.