Re: [PATCH v2] i2c: designware: Fix corrupted memory seen in the ISR

From: Catalin Marinas
Date: Wed Sep 20 2023 - 07:03:11 EST


On Wed, Sep 20, 2023 at 12:05:50AM +0300, Serge Semin wrote:
> On Tue, Sep 19, 2023 at 11:54:10AM -0700, Jan Bottorff wrote:
> > On 9/19/2023 7:51 AM, Catalin Marinas wrote:
> > > While smp_* is ok, it really depends on what the regmap_write() does. Is
> > > it a write to a shared peripheral (if not, you may need a DSB)? Does the
> > > regmap_write() caller know this? That's why I think having the barrier
> > > in dw_reg_write() is better.
> > >
> > > If you do want to stick to a fix in i2c_dw_xfer_init(), you could go for
> > > dma_wmb(). While this is not strictly DMA, it's sharing data with
> > > another coherent agent (a different CPU in this instance). The smp_wmb()
> > > is more about communication via memory not involving I/O. But this still
> > > assumes that the caller knows regmap_write() ends up with an I/O
> > > write*() (potentially relaxed).
>
> Catalin, thank you very much for your messages. The situation is much
> clearer now. I should have noted that we indeed talking about
> different memory types: Normal and Device memories. Anyway to sum it up
> AFAICS the next situation is happening:
> 1. some data is updated,
> 2. IRQ is activated by means of writel_relaxed() to the
> DW_IC_INTR_MASK register.
> 3. IRQ is raised and being handled, but the data updated in 1. looked
> as corrupted.
>
> (Jan, correct me if I'm wrong.)
>
> Since ARM doesn't "guarantee ordering between memory accesses to
> different devices, or usually between accesses of different memory
> types", most likely the CPU core changes 1. and 2. places
> occasionally. So in that case the IRQ handler just doesn't see the
> updated data. What needs to be done is to make sure that 2. always
> happens after 1. is completed. Outer Shareability domain write-barrier
> (DMB(oshst)) does that. Am I right? That's why it's utilized in the
> __io_bw() and __dma_wmb() macros implementation.

Yes.

> Wolfram, Jan seeing the root cause of the problem is in using the
> '_relaxed' accessors for the controller CSRs I assume that the problem
> might be more generic than just for ARMs, since based on [1] these
> accessors "do not guarantee ordering with respect to locking, _normal_
> memory accesses or delay() loops." So theoretically the problem might
> get to be met on any other arch if it implements the semantic with the
> relaxed normal and device memory accesses execution.
>
> [1] "KERNEL I/O BARRIER EFFECTS" Documentation/memory-barriers.txt
>
> If so we need to have give up from using the _relaxed accessors at for
> the CSRs which may cause a side effect like IRQ raising. Instead the
> normal IO write should be utilized which "wait for the completion of
> all prior writes to memory either issued by, or propagated to, the
> same thread." [1] Thus I'd suggest the next fix for the problem:
>
> --- a/drivers/i2c/busses/i2c-designware-common.c
> +++ b/drivers/i2c/busses/i2c-designware-common.c
> @@ -72,7 +72,10 @@ static int dw_reg_write(void *context, unsigned int reg, unsigned int val)
> {
> struct dw_i2c_dev *dev = context;
>
> - writel_relaxed(val, dev->base + reg);
> + if (reg == DW_IC_INTR_MASK)
> + writel(val, dev->base + reg);
> + else
> + writel_relaxed(val, dev->base + reg);
>
> return 0;
> }
>
> (and similar changes for dw_reg_write_swab() and dw_reg_write_word().)

This should work as well.

--
Catalin