Re: [PATCH] ad_sigma_delta: fix race between IRQ and completion

From: Jonathan Cameron
Date: Fri Dec 23 2022 - 11:04:25 EST


On Mon, 19 Dec 2022 20:48:46 +1300
Daniel Beer <dlbeer@xxxxxxxxx> wrote:

> ad_sigma_delta waits for a conversion which terminates with the firing
> of a one-shot IRQ handler. In this handler, the interrupt is disabled
> and a completion is set.
>
> Meanwhile, the thread that initiated the conversion is waiting on the
> completion to know when the conversion happened. If this wait times out,
> the conversion is aborted and IRQs are disabled. But the IRQ may fire
> anyway between the time the completion wait times out and the disabling
> of interrupts. If this occurs, we get a double-disabled interrupt.

Ouch and good work tracking it down. just to check, did you see this
bug happen in the wild or spotted by code inspection?

Given that timeout generally indicates hardware failure, I'm not sure
how critical this is to fix.

I don't think this fix fully closes the race - see inline.

Jonathan

>
> This patch fixes that by wrapping the completion wait in a function that
> handles timeouts correctly by synchronously disabling the interrupt and
> then undoing the damage if it got disabled twice.
>
> Fixes: af3008485ea0 ("iio:adc: Add common code for ADI Sigma Delta devices")
> Cc: Lars-Peter Clausen <lars@xxxxxxxxxx>
> Signed-off-by: Daniel Beer <dlbeer@xxxxxxxxx>
> ---
> drivers/iio/adc/ad_sigma_delta.c | 49 +++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/iio/adc/ad_sigma_delta.c b/drivers/iio/adc/ad_sigma_delta.c
> index d8570f620785..2f1702eeed56 100644
> --- a/drivers/iio/adc/ad_sigma_delta.c
> +++ b/drivers/iio/adc/ad_sigma_delta.c
> @@ -202,6 +202,31 @@ int ad_sd_reset(struct ad_sigma_delta *sigma_delta,
> }
> EXPORT_SYMBOL_NS_GPL(ad_sd_reset, IIO_AD_SIGMA_DELTA);
>
> +static int ad_sd_wait_and_disable(struct ad_sigma_delta *sigma_delta,
> + unsigned long timeout)
> +{
> + const int ret = wait_for_completion_interruptible_timeout(
> + &sigma_delta->completion, timeout);
> +
> + if (!ret) {
> + /* Just because the completion timed out, doesn't mean that the
Multiline comment syntax in IIO is the
/*
* Just...

form.

> + * IRQ didn't fire. It might be in progress right now.
> + */
> + disable_irq(sigma_delta->spi->irq);
> +
> + /* The IRQ handler may have run after all. If that happened,

Same for this comment.

> + * then we will now have double-disabled the IRQ, and irq_dis
> + * will be true (having been set in the handler).
> + */
> + if (sigma_delta->irq_dis)
> + enable_irq(sigma_delta->spi->irq);
> + else
> + sigma_delta->irq_dis = true;

I'd set this unconditionally. It might already be set, but that shouldn't
be a problem.

Is this fix sufficient? If the interrupt is being handled on a different
CPU to the caller of this function, I think we can still race enough that
this fails to fix it up. Might need a spinlock to prevent that.

CPU 0 CPU 1
ad_sd_data_rdy_trig_poll() ad_sd_wait_and_disable()

//wait_for_completion ends

Interrupt
disable_irq()
if (sigma-delta->irq_dis) !true
else
sigma_delta->irq_dis = true

disable_irq_nosync(irq)
sigma_delta->irq_dis = true;

So we still end up with a doubly disabled irq. Add a spinlock to make the
disable and the setting of sigma_delta->irq_dis atomic then it should all be fine.



> + }
> +
> + return ret;
> +}
> +
> int ad_sd_calibrate(struct ad_sigma_delta *sigma_delta,
> unsigned int mode, unsigned int channel)
> {
> @@ -223,14 +248,11 @@ int ad_sd_calibrate(struct ad_sigma_delta *sigma_delta,
>
> sigma_delta->irq_dis = false;
> enable_irq(sigma_delta->spi->irq);
> - timeout = wait_for_completion_timeout(&sigma_delta->completion, 2 * HZ);
> - if (timeout == 0) {
> - sigma_delta->irq_dis = true;
> - disable_irq_nosync(sigma_delta->spi->irq);
> + timeout = ad_sd_wait_and_disable(sigma_delta, 2 * HZ);
> + if (timeout == 0)
> ret = -EIO;
> - } else {
> + else
> ret = 0;
> - }
> out:
> sigma_delta->keep_cs_asserted = false;
> ad_sigma_delta_set_mode(sigma_delta, AD_SD_MODE_IDLE);
> @@ -296,8 +318,7 @@ int ad_sigma_delta_single_conversion(struct iio_dev *indio_dev,
>
> sigma_delta->irq_dis = false;
> enable_irq(sigma_delta->spi->irq);
> - ret = wait_for_completion_interruptible_timeout(
> - &sigma_delta->completion, HZ);
> + ret = ad_sd_wait_and_disable(sigma_delta, HZ);
>
> if (ret == 0)
> ret = -EIO;
> @@ -314,11 +335,6 @@ int ad_sigma_delta_single_conversion(struct iio_dev *indio_dev,
> &raw_sample);
>
> out:
> - if (!sigma_delta->irq_dis) {
> - disable_irq_nosync(sigma_delta->spi->irq);
> - sigma_delta->irq_dis = true;
> - }
> -
> sigma_delta->keep_cs_asserted = false;
> ad_sigma_delta_set_mode(sigma_delta, AD_SD_MODE_IDLE);
> sigma_delta->bus_locked = false;
> @@ -411,12 +427,7 @@ static int ad_sd_buffer_postdisable(struct iio_dev *indio_dev)
> struct ad_sigma_delta *sigma_delta = iio_device_get_drvdata(indio_dev);
>
> reinit_completion(&sigma_delta->completion);
> - wait_for_completion_timeout(&sigma_delta->completion, HZ);
> -
> - if (!sigma_delta->irq_dis) {
> - disable_irq_nosync(sigma_delta->spi->irq);
> - sigma_delta->irq_dis = true;
> - }
> + ad_sd_wait_and_disable(sigma_delta, HZ);
>
> sigma_delta->keep_cs_asserted = false;
> ad_sigma_delta_set_mode(sigma_delta, AD_SD_MODE_IDLE);