RE: liquidio vs smp_call_function_single_async()

From: Derek Chickles
Date: Thu Jun 11 2020 - 17:49:22 EST


> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Sent: Monday, June 8, 2020 6:05 AM
> To: Derek Chickles <dchickles@xxxxxxxxxxx>; Satananda Burla
> <sburla@xxxxxxxxxxx>; Felix Manlunas <fmanlunas@xxxxxxxxxxx>
> Cc: frederic@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> Subject: liquidio vs smp_call_function_single_async()
>
> Hi,
>
> I'm going through the smp_call_function_single_async() users, and stumbled
> over your liquidio thingy. It does:
>
> call_single_data_t *csd = &droq->csd;
>
> csd->func = napi_schedule_wrapper;
> csd->info = &droq->napi;
> csd->flags = 0;
>
> smp_call_function_single_async(droq->cpu_id, csd);
>
> which is almost certainly a bug. What guarantees that csd is unused when
> you do this? What happens, if the remote CPU is already running RX and
> consumes the packets before the IPI lands, and then this CPU gets another
> interrupt.
>
> AFAICT you then call this thing again, causing list corruption.

Hi Peter,

I think you're right that this might be a functional bug, but it won't cause list
corruption. We don't rely on the IPI to process packets, only to move NAPI
processing to another CPU. Separate register counters indicate whether, and
how many, new packets have arrived, and they are re-read once NAPI processing
executes.
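
To illustrate the counter argument, here is a minimal userspace model, not
driver code. The names model_ring, hw_pkt_count, processed and model_poll are
hypothetical and do not come from liquidio. It only shows why a redundant poll
that re-reads a packet counter finds no new work.

```c
/* Userspace model only; none of these names exist in the liquidio driver. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_ring {
	uint32_t hw_pkt_count;	/* stands in for the device packet counter */
	uint32_t processed;	/* packets the poll routine has consumed */
};

/* Re-read the counter and consume whatever arrived since the last poll. */
static uint32_t model_poll(struct model_ring *r)
{
	uint32_t pending;

	assert(r != NULL);
	pending = r->hw_pkt_count - r->processed; /* unsigned, wrap-safe */
	r->processed += pending;
	return pending;
}
```

In this model a second model_poll() call with no new arrivals returns 0, which
is the sense in which an extra wakeup is harmless to packet processing. It
says nothing about the state of the csd itself.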

I think a patch to check if NAPI is already scheduled would address the
unexpected rescheduling issue here. Otherwise, it can probably live as is,
as there is no harm.
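
One possible shape for such a check, sketched as a userspace model with C11
atomics rather than as an actual patch. The names model_droq, kick_pending,
model_kick and model_kick_handler are hypothetical. The increment of
kicks_sent stands in for the smp_call_function_single_async() call, and the
handler stands in for the function that the IPI runs on the remote CPU.

```c
/* Userspace model only; none of these names exist in the liquidio driver. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_droq {
	atomic_bool kick_pending;	/* true while a cross-CPU kick is in flight */
	int kicks_sent;			/* how many kicks were actually issued */
};

static void model_droq_init(struct model_droq *q)
{
	assert(q != NULL);
	atomic_init(&q->kick_pending, false);
	q->kicks_sent = 0;
}

/* Interrupt path: issue a kick only if the previous one has been consumed. */
static bool model_kick(struct model_droq *q)
{
	assert(q != NULL);
	if (atomic_exchange(&q->kick_pending, true))
		return false;	/* still pending: do not reuse the slot */
	q->kicks_sent++;	/* stands in for smp_call_function_single_async() */
	return true;
}

/* Remote side: runs when the kick lands and releases the slot for reuse. */
static void model_kick_handler(struct model_droq *q)
{
	assert(q != NULL);
	/* the real callback would schedule NAPI here */
	atomic_store(&q->kick_pending, false);
}
```

With this guard, a second interrupt that arrives before the handler has run
skips the kick instead of submitting the same slot again.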

Thanks,
Derek