Re: [PATCH v2] i2c: designware: Fix corrupted memory seen in the ISR

From: Jan Bottorff
Date: Mon Sep 25 2023 - 15:39:58 EST


On 9/25/2023 5:54 AM, Serge Semin wrote:
On Wed, Sep 20, 2023 at 12:14:17PM -0700, Jan Bottorff wrote:
On 9/20/2023 6:27 AM, Yann Sionneau wrote:
Hi,

On 20/09/2023 11:08, Wolfram Sang wrote:
same thread." [1] Thus I'd suggest the next fix for the problem:

--- a/drivers/i2c/busses/i2c-designware-common.c
+++ b/drivers/i2c/busses/i2c-designware-common.c
@@ -72,7 +72,10 @@ static int dw_reg_write(void *context,
unsigned int reg, unsigned int val)
  {
      struct dw_i2c_dev *dev = context;
-    writel_relaxed(val, dev->base + reg);
+    if (reg == DW_IC_INTR_MASK)
+        writel(val, dev->base + reg);
+    else
+        writel_relaxed(val, dev->base + reg);
      return 0;
  }

(and similar changes for dw_reg_write_swab() and dw_reg_write_word().)

What do you think?
To me, this looks reasonable and much more what I would have expected as
a result (from a high level point of view). Let's hope it works. I am
optimistic, though...

It works if we make sure all the other register accesses to the
designware i2c IP can't generate IRQ.

Meaning that all register accesses that can trigger an IRQ are enclosed
in between a call to i2c_dw_disable_int() and a call to
regmap_write(dev->map, DW_IC_INTR_MASK, DW_IC_INTR_MASTER_MASK); or
equivalent.

It seems to be the case, I'm not sure what's the best way to make sure
it will stay that way.

Moreover, maybe writes to IC_ENABLE register should also use the
non-relaxed writel() version?

Since one could do something like:

[ IP is currently disabled ]

1/ enable interrupts in DW_IC_INTR_MASK

2/ update some variable in dev-> structure in DDR

3/ enable the device by writing to IC_ENABLE, thus triggering for
instance the TX_FIFO_EMPTY irq.


It does seem like there are a variety of register write combinations that
could immediately cause an interrupt, so would need a barrier.
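
To make the ordering concern in steps 1/ to 3/ concrete, here is a rough userspace model using C11 atomics. It is only an analogy: the struct, its field names, and the model_* functions are hypothetical, and real MMIO ordering depends on the architecture's writel()/writel_relaxed() semantics rather than on C11 atomics. The release store in step 3 plays the role of a non-relaxed write to IC_ENABLE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not driver code:
 *   hw_intr_mask stands in for DW_IC_INTR_MASK,
 *   hw_enable    stands in for IC_ENABLE,
 *   msg_write_idx stands in for ordinary driver state kept in DDR.
 */
struct model_dev {
    uint32_t msg_write_idx;
    _Atomic uint32_t hw_intr_mask;
    _Atomic uint32_t hw_enable;
};

static void model_start_xfer(struct model_dev *dev, uint32_t idx)
{
    assert(dev != NULL);

    /* 1/ unmask interrupts while the IP is still disabled */
    atomic_store_explicit(&dev->hw_intr_mask, 0x1, memory_order_relaxed);

    /* 2/ update driver state that the ISR will read */
    dev->msg_write_idx = idx;

    /*
     * 3/ enable the IP. The release ordering ensures the store in
     * step 2 is visible before the enable is observed; with a relaxed
     * store here, an observer could see hw_enable set but stale state.
     */
    atomic_store_explicit(&dev->hw_enable, 1, memory_order_release);
}

/* Returns 1 and reports the state it saw if the "interrupt" is live. */
static int model_isr(struct model_dev *dev, uint32_t *idx_seen)
{
    assert(dev != NULL && idx_seen != NULL);

    if (!atomic_load_explicit(&dev->hw_enable, memory_order_acquire))
        return 0;
    if (!atomic_load_explicit(&dev->hw_intr_mask, memory_order_relaxed))
        return 0;

    *idx_seen = dev->msg_write_idx;
    return 1;
}
```

In this model, calling model_start_xfer() and then model_isr() always reports the index written in step 2. Changing step 3 to memory_order_relaxed removes that guarantee when the two run on different CPUs.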

My suggestion was based on your fix. If it won't work or if it won't
completely solve the problem, then perhaps one of the following options
will do:
1. Add the non-relaxed IO call for the IC_ENABLE CSR too.
2. Completely convert the IO accessors to using the non-relaxed
methods especially seeing Wolfram already noted: "Again, I am all with
Catalin here. Safety first, optimizations a la *_relaxed should be
opt-in."
https://lore.kernel.org/linux-i2c/ZQm2Ydt%2F0jRW4crK@shikoro/
3. Find all the places where the memory writes need to be fully
visible after a subsequent IO-write causing an IRQ raise and just
place dma_wmb() there (though just wmb() would look a bit more
relevant).

IMO, in the worst case, solution 2 should be enough, at least in
master mode, since the ISR uses the completion variable to signal
command completion, which also implies a full memory barrier.
Moreover, the i2c bus isn't fast enough for us to be much concerned
about optimizations such as avoiding pipeline stalls between the
MMIO accesses.


I stress-tested the proposed fix on our processor for a few days: writes to DW_IC_INTR_MASK use writel() instead of writel_relaxed() in dw_reg_write(). The problem we were seeing is fixed. On our system, the problem occurred when many ssif (IPMI over I2C) transfers were done. The stress test ran "ipmitool sdr elist" in a loop. Without the change, the kernel log shows multiple errors per day from the driver.

I'm fine with a patch that makes just that one change. Applying the non-relaxed write to dw_reg_write_swab() and dw_reg_write_word() was also suggested for completeness.
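
For reference, the swab and word variants might look roughly like the sketch below. This is a userspace mock, not the driver source. The mock_* helpers, fake_regs, and ordered_writes are hypothetical stand-ins for the kernel MMIO accessors so the logic can be compiled and checked outside the kernel. The DW_IC_INTR_MASK offset of 0x30 is an assumption here and should be taken from the driver header. In the word variant I used the ordered write for both halves, which is the conservative choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define DW_IC_INTR_MASK 0x30   /* assumed offset; check the driver header */
#define REG_SPACE       0x100  /* size of the mock register window */

static uint8_t fake_regs[REG_SPACE];  /* mock MMIO window */
static unsigned int ordered_writes;   /* counts writes issued with a barrier */

struct dw_i2c_dev { uint8_t *base; };

/* Mock accessors: the "relaxed" ones just store, the others fence first. */
static void mock_writel_relaxed(uint32_t v, uint8_t *a) { memcpy(a, &v, sizeof(v)); }
static void mock_writew_relaxed(uint16_t v, uint8_t *a) { memcpy(a, &v, sizeof(v)); }

static void mock_writel(uint32_t v, uint8_t *a)
{
    atomic_thread_fence(memory_order_release); /* order prior memory writes */
    ordered_writes++;
    mock_writel_relaxed(v, a);
}

static void mock_writew(uint16_t v, uint8_t *a)
{
    atomic_thread_fence(memory_order_release);
    ordered_writes++;
    mock_writew_relaxed(v, a);
}

static uint32_t swab32(uint32_t v)
{
    return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
           ((v >> 8) & 0xff00u) | (v >> 24);
}

static int dw_reg_write_swab(void *context, unsigned int reg, unsigned int val)
{
    struct dw_i2c_dev *dev = context;

    assert(reg + 4 <= REG_SPACE);
    if (reg == DW_IC_INTR_MASK)
        mock_writel(swab32(val), dev->base + reg);
    else
        mock_writel_relaxed(swab32(val), dev->base + reg);
    return 0;
}

static int dw_reg_write_word(void *context, unsigned int reg, unsigned int val)
{
    struct dw_i2c_dev *dev = context;

    assert(reg + 4 <= REG_SPACE);
    if (reg == DW_IC_INTR_MASK) {
        mock_writew((uint16_t)val, dev->base + reg);
        mock_writew((uint16_t)(val >> 16), dev->base + reg + 2);
    } else {
        mock_writew_relaxed((uint16_t)val, dev->base + reg);
        mock_writew_relaxed((uint16_t)(val >> 16), dev->base + reg + 2);
    }
    return 0;
}
```

Only the DW_IC_INTR_MASK path pays for the barrier; every other register keeps the relaxed write, which matches the intent of the dw_reg_write() change.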

Does anybody have concerns about other cases that may not get fixed by this change? We did have hypothetical cases, like with IC_ENABLE, that could have the same issue.

So my next question: is the change to dw_reg_write() something that I should write and submit, or should someone else submit something more general, like option 2 above? I don't own the i2c driver; I'm just trying to fix one issue on one processor with minimal risk of breaking something. I don't have the broader view of what's optimal for the whole DesignWare i2c driver. I also have no way to test changes on other processor models.