Re: [PATCH net] i40e: Enforce software interrupt during busy-poll exit

From: Ivan Vecera
Date: Fri Mar 15 2024 - 05:20:20 EST




On 15. 03. 24 1:53, Jesse Brandeburg wrote:
On 3/13/2024 5:54 AM, Ivan Vecera wrote:
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
during poll exit") I'm seeing the similar issue also with i40e driver.

In certain situation when busy-loop is enabled together with adaptive
coalescing, the driver occasionally miss that there are outstanding
descriptors to clean when exiting busy poll.

Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.

The issue was found when running sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as value busy_pool
and busy_read sysctl knobs) and results in huge latency spikes
with more than 100000us.

I like the results of this fix! Thanks for working on it.


The fix is inspired from the ice driver and do the following:
1) During napi poll exit in case of busy-poll (napo_complete_done()
returns false) this is recorded to q_vector that we were in busy
loop.
2) In i40e_update_enable_itr()
- updates refreshed ITR intervals directly using PFINT_ITRN register
- if we are exiting ordinary poll then just enables the interrupt
using PFINT_DYN_CTLN
- if we are exiting busy poll then enables the interrupt and
additionally triggers an immediate software interrupt to catch any
pending clean-ups
3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
20K interrupts per second to limit the number of these sw interrupts.

This is a good idea.


@@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
*/
if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
/* Rx ITR needs to be reduced, this is highest priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
+ q_vector->rx.target_itr >> 1);

so here you write (this is a new write)

q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
@@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
/* Tx ITR needs to be reduced, this is second priority
* Tx ITR needs to be increased more than Rx, fourth priority
*/
- intval = i40e_buildreg_itr(I40E_TX_ITR,
- q_vector->tx.target_itr);
+ wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx),
+ q_vector->tx.target_itr >> 1);
q_vector->tx.current_itr = q_vector->tx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
/* Rx ITR needs to be increased, third priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
+ q_vector->rx.target_itr >> 1);

or here (new write)

q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else {
/* No ITR update, lowest priority */
- intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
if (q_vector->itr_countdown)
q_vector->itr_countdown--;
}
- if (!test_bit(__I40E_VSI_DOWN, vsi->state))
- wr32(hw, INTREG(q_vector->reg_idx), intval);

The above used to be the *only* write.

+ /* Do not enable interrupt if VSI is down */
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return;
+
+ if (!q_vector->in_busy_poll) {
+ intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
+ } else {
+ q_vector->in_busy_poll = false;
+ intval = i40e_buildreg_swint(I40E_SW_ITR);
+ }
+ wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval);

and then you write again here.

So this function will now regularly have two writes in hot-path. Before
it was very carefully crafted to reduce the number of writes to 1.

This is made possible because the PFINT_DYN_CTLN register can do
multiple tasks at once with a single write.

Can you just modify intval to *both* trigger a software interrupt, and
update the ITR simultaneously? I'm really not sure that's even possible.

It may make more sense to only do the second write when exiting busy
poll, what do you think?

Yeah, you are right, we can eliminate these two writes by one and also for busy-poll exit. I'm setting up ITR2_IDX rate during MSI-X initialization and as this is fixed we do not need to update it everytime in i40e_update_enable_itr().

Per datasheet the PFINT_DYN_CTLN value can be encoded to do the following at once:
- enable interrupt
- update interval for particular ITR index
- software interrupt trigger limited by interval of different ITR index

Will prepare, test and submit v3 with this change.

Thanks,
Ivan