Re: [PATCH v2] i2c: designware: Fix corrupted memory seen in the ISR

From: Serge Semin
Date: Tue Sep 19 2023 - 17:06:00 EST


Hi all

On Tue, Sep 19, 2023 at 11:54:10AM -0700, Jan Bottorff wrote:
> On 9/19/2023 7:51 AM, Catalin Marinas wrote:
> >
> > While smp_* is ok, it really depends on what the regmap_write() does. Is
> > it a write to a shared peripheral (if not, you may need a DSB)? Does the
> > regmap_write() caller know this? That's why I think having the barrier
> > in dw_reg_write() is better.
> >
> > If you do want to stick to a fix in i2c_dw_xfer_init(), you could go for
> > dma_wmb(). While this is not strictly DMA, it's sharing data with
> > another coherent agent (a different CPU in this instance). The smp_wmb()
> > is more about communication via memory not involving I/O. But this still
> > assumes that the caller knows regmap_write() ends up with an I/O
> > write*() (potentially relaxed).

Catalin, thank you very much for your messages. The situation is much
clearer now. I should have noted that we indeed talking about
different memory types: Normal and Device memories. Anyway to sum it up
AFAICS the next situation is happening:
1. some data is updated,
2. IRQ is activated by means of writel_relaxed() to the
DW_IC_INTR_MASK register.
3. IRQ is raised and being handled, but the data updated in 1. looked
as corrupted.

(Jan, correct me if I'm wrong.)

Since ARM doesn't "guarantee ordering between memory accesses to
different devices, or usually between accesses of different memory
types", most likely the CPU core changes 1. and 2. places
occasionally. So in that case the IRQ handler just doesn't see the
updated data. What needs to be done is to make sure that 2. always
happens after 1. is completed. Outer Shareability domain write-barrier
(DMB(oshst)) does that. Am I right? That's why it's utilized in the
__io_bw() and __dma_wmb() macros implementation.

Wolfram, Jan seeing the root cause of the problem is in using the
'_relaxed' accessors for the controller CSRs I assume that the problem
might be more generic than just for ARMs, since based on [1] these
accessors "do not guarantee ordering with respect to locking, _normal_
memory accesses or delay() loops." So theoretically the problem might
get to be met on any other arch if it implements the semantic with the
relaxed normal and device memory accesses execution.

[1] "KERNEL I/O BARRIER EFFECTS" Documentation/memory-barriers.txt

If so we need to have give up from using the _relaxed accessors at for
the CSRs which may cause a side effect like IRQ raising. Instead the
normal IO write should be utilized which "wait for the completion of
all prior writes to memory either issued by, or propagated to, the
same thread." [1] Thus I'd suggest the next fix for the problem:

--- a/drivers/i2c/busses/i2c-designware-common.c
+++ b/drivers/i2c/busses/i2c-designware-common.c
@@ -72,7 +72,10 @@ static int dw_reg_write(void *context, unsigned int reg, unsigned int val)
{
struct dw_i2c_dev *dev = context;

- writel_relaxed(val, dev->base + reg);
+ if (reg == DW_IC_INTR_MASK)
+ writel(val, dev->base + reg);
+ else
+ writel_relaxed(val, dev->base + reg);

return 0;
}

(and similar changes for dw_reg_write_swab() and dw_reg_write_word().)

What do you think?

>
> If we wanted maximum correctness wouldn't we need something like
> writel_triggers_interrupt/regmap_write_triggers_interrupt or maybe
> preinterrupt_wmb?
>
> The ARM docs do have a specific example case where the device write triggers
> an interrupt, and that example specifically says a DSB barrier is needed.
>
> If I look at the ARM GIC IPI send function gic_ipi_send_mask in https://elixir.bootlin.com/linux/v6.6-rc2/source/drivers/irqchip/irq-gic-v3.c#L1354
> is says:
>
> /*
> * Ensure that stores to Normal memory are visible to the
> * other CPUs before issuing the IPI.
> */
> dsb(ishst);
>
> I would think the IPI send code is very carefully tuned for performance, and
> would not use a barrier any stronger than required.
>
> I believe dma_wmb maps to DMB on ARM64.

Jan. Yes, but it looks like DMB() for the Outer-shareable domains
should be enough (see my comment above). DSB() seems like overkill
here. We don't raise IPIs or use mailboxes in this case, but merely
update the CSR flags.

-Serge(y)

>
> - Jan
>
>
>
>
>
>