Re: [PATCH net 1/7] net: hns3: fix add VLAN fail issue

From: Paolo Abeni
Date: Thu Nov 02 2023 - 06:32:16 EST


On Sat, 2023-10-28 at 10:59 +0800, Jijie Shao wrote:
> From: Jian Shen <shenjian15@xxxxxxxxxx>
>
> The hclge_sync_vlan_filter is called in periodic task,
> trying to remove VLAN from vlan_del_fail_bmap. It can
> be concurrence with VLAN adding operation from user.
> So once user failed to delete a VLAN id, and add it
> again soon, it may be removed by the periodic task,
> which may cause the software configuration being
> inconsistent with hardware. So add mutex handling
> to avoid this.
>
> user hns3 driver
>
> periodic task
>
> add vlan 10 ───── hns3_vlan_rx_add_vid │
> │ (suppose success) │
> │ │
> del vlan 10 ───── hns3_vlan_rx_kill_vid │
> │ (suppose fail,add to │
> │ vlan_del_fail_bmap) │
> │ │
> add vlan 10 ───── hns3_vlan_rx_add_vid │
> (suppose success) │
> foreach vlan_del_fail_bmp
> del vlan 10
>
> Fixes: fe4144d47eef ("net: hns3: sync VLAN filter entries when kill VLAN ID failed")
> Signed-off-by: Jian Shen <shenjian15@xxxxxxxxxx>
> Signed-off-by: Jijie Shao <shaojijie@xxxxxxxxxx>
> ---
> .../hisilicon/hns3/hns3pf/hclge_main.c | 21 +++++++++++++------
> .../hisilicon/hns3/hns3vf/hclgevf_main.c | 11 ++++++++--
> 2 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> index c42574e29747..a3230ac928a9 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> @@ -10026,8 +10026,6 @@ static void hclge_rm_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id,
> struct hclge_vport_vlan_cfg *vlan, *tmp;
> struct hclge_dev *hdev = vport->back;
>
> - mutex_lock(&hdev->vport_lock);
> -
> list_for_each_entry_safe(vlan, tmp, &vport->vlan_list, node) {
> if (vlan->vlan_id == vlan_id) {
> if (is_write_tbl && vlan->hd_tbl_status)
> @@ -10042,8 +10040,6 @@ static void hclge_rm_vport_vlan_table(struct hclge_vport *vport, u16 vlan_id,
> break;
> }
> }
> -
> - mutex_unlock(&hdev->vport_lock);
> }
>
> void hclge_rm_vport_all_vlan_table(struct hclge_vport *vport, bool is_del_list)
> @@ -10452,11 +10448,16 @@ int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
> * handle mailbox. Just record the vlan id, and remove it after
> * reset finished.
> */
> + mutex_lock(&hdev->vport_lock);
> if ((test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
> test_bit(HCLGE_STATE_RST_FAIL, &hdev->state)) && is_kill) {
> set_bit(vlan_id, vport->vlan_del_fail_bmap);
> + mutex_unlock(&hdev->vport_lock);
> return -EBUSY;
> + } else if (!is_kill && test_bit(vlan_id, vport->vlan_del_fail_bmap)) {
> + clear_bit(vlan_id, vport->vlan_del_fail_bmap);
> }
> + mutex_unlock(&hdev->vport_lock);
>
> /* when port base vlan enabled, we use port base vlan as the vlan
> * filter entry. In this case, we don't update vlan filter table
> @@ -10481,7 +10482,9 @@ int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
> * and try to remove it from hw later, to be consistence
> * with stack
> */
> + mutex_lock(&hdev->vport_lock);
> set_bit(vlan_id, vport->vlan_del_fail_bmap);
> + mutex_unlock(&hdev->vport_lock);

It looks like that the 'hclge_rm_vport_vlan_table()' call a few lines
above will now happen with the vport_lock unlocked.

That looks racy and would deserve at least a comment explaining why
it's safe.

Thanks,

Paolo