[RFC PATCH net] sfc: Fix use-after-free due to selftest_work

From: Ding Hui
Date: Tue Apr 11 2023 - 21:00:29 EST


There is a use-after-free scenario that is:

When netif_running() is false, user set mac address or vlan tag to VF,
the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop()
and efx_net_open(), since netif_running() is false, the port will not
start and keep port_enabled false, but selftest_worker is scheduled
in efx_net_open().

If we remove the device before selftest_worker run, the efx is freed,
then we will get a UAF in run_timer_softirq() like this:

[ 1178.907941] ==================================================================
[ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90
[ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0
[ 1178.907950]
[ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
[ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022
[ 1178.907955] Call Trace:
[ 1178.907956] <IRQ>
[ 1178.907960] dump_stack+0x71/0xab
[ 1178.907963] print_address_description+0x6b/0x290
[ 1178.907965] ? run_timer_softirq+0xdea/0xe90
[ 1178.907967] kasan_report+0x14a/0x2b0
[ 1178.907968] run_timer_softirq+0xdea/0xe90
[ 1178.907971] ? init_timer_key+0x170/0x170
[ 1178.907973] ? hrtimer_cancel+0x20/0x20
[ 1178.907976] ? sched_clock+0x5/0x10
[ 1178.907978] ? sched_clock_cpu+0x18/0x170
[ 1178.907981] __do_softirq+0x1c8/0x5fa
[ 1178.907985] irq_exit+0x213/0x240
[ 1178.907987] smp_apic_timer_interrupt+0xd0/0x330
[ 1178.907989] apic_timer_interrupt+0xf/0x20
[ 1178.907990] </IRQ>
[ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370

I am thinking about several ways to fix the issue:

[1] In this RFC, I cancel the selftest_worker unconditionally in
efx_pci_remove().

[2] Add a test condition, only invoke efx_selftest_async_start() when
efx->port_enabled is true in efx_net_open().

[3] Move invoking efx_selftest_async_start() from efx_net_open() to
efx_start_all() or efx_start_port(), that matching cancel action in
efx_stop_port().

[4] However, I also notice that in efx_ef10_set_mac_address(), the
efx_net_open() depends on original port_enabled, but others are not,
if we change all efx_net_open() depends on old state like
efx_ef10_set_mac_address() does, the UAF can also be fixed in theory.

But I'm not sure which is better, is there any suggestions? Thanks.

Signed-off-by: Ding Hui <dinghui@xxxxxxxxxxxxxx>
---
drivers/net/ethernet/sfc/efx.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 884d8d168862..dd0b2363eed1 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -876,6 +876,8 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
efx->state = STATE_UNINIT;
rtnl_unlock();

+ efx_selftest_async_cancel(efx);
+
if (efx->type->sriov_fini)
efx->type->sriov_fini(efx);

--
2.17.1