Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device

From: Tushar Dave
Date: Tue Dec 05 2017 - 14:19:40 EST




On 12/04/2017 05:03 PM, Fengguang Wu wrote:
Hi Tushar,

On Tue, Nov 28, 2017 at 01:01:23AM +0530, Tushar Dave wrote:


On 11/23/2017 04:43 AM, Fengguang Wu wrote:
On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote:


On 11/21/2017 06:11 PM, Fengguang Wu wrote:
Hello,

FYI this happens in mainline kernel 4.14.0-01330-g3c07399.
It happens since 4.13 .

It occurs in 3 out of 162 boots.


[ÂÂ 44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[ÂÂ 44.997548] input: ImExPS/2 Generic Explorer Mouse as
/devices/platform/i8042/serio1/input/input6
[ÂÂ 45.013419] e1000 0000:00:03.0: disabling already-disabled device
[ÂÂ 45.013447] ------------[ cut here ]------------
[ÂÂ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641
pci_disable_device+0xa1/0x105:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ pci_disable_device at drivers/pci/pci.c:1640
[ÂÂ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted
4.14.0-01330-g3c07399 #1
[ÂÂ 45.017197] task: ffff88011bee9e40 task.stack: ffffc90000860000
[ÂÂ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ pci_disable_device at drivers/pci/pci.c:1640
[ÂÂ 45.018603] RSP: 0000:ffffc90000863e30 EFLAGS: 00010286
[ÂÂ 45.019282] RAX: 0000000000000035 RBX: ffff88013a230008 RCX:
0000000000000000
[ÂÂ 45.020182] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000203
[ÂÂ 45.021084] RBP: ffff88013a3f31e8 R08: 0000000000000001 R09:
0000000000000000
[ÂÂ 45.021986] R10: ffffffff827ec29c R11: 0000000000000002 R12:
0000000000000001
[ÂÂ 45.022946] R13: ffff88013a230008 R14: ffff880117802b20 R15:
ffffc90000863e8f
[ÂÂ 45.023842] FS:Â 0000000000000000(0000) GS:ffff88013fd00000(0000)
knlGS:0000000000000000
[ÂÂ 45.024863] CS:Â 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ÂÂ 45.025583] CR2: ffffc900006d4000 CR3: 000000000220f000 CR4:
00000000000006a0
[ÂÂ 45.026478] Call Trace:
[ÂÂ 45.026811]Â __e1000_shutdown+0x1d4/0x1e2:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ __e1000_shutdown at
drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[ÂÂ 45.027344]Â ? rcu_perf_cleanup+0x2a1/0x2a1:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[ÂÂ 45.027883]Â e1000_shutdown+0x14/0x3a:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ e1000_shutdown at
drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[ÂÂ 45.028351]Â device_shutdown+0x110/0x1aa:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ device_shutdown at drivers/base/core.c:2807
[ÂÂ 45.028858]Â kernel_power_off+0x31/0x64:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kernel_power_off at kernel/reboot.c:260
[ÂÂ 45.029343]Â rcu_perf_shutdown+0x9b/0xa7:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[ÂÂ 45.029852]Â ? __wake_up_common_lock+0xa2/0xa2:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ autoremove_wake_function at
kernel/sched/wait.c:376
[ÂÂ 45.030414]Â kthread+0x126/0x12e:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kthread at kernel/kthread.c:233
[ÂÂ 45.030834]Â ? __kthread_bind_mask+0x8e/0x8e:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kthread at kernel/kthread.c:190
[ÂÂ 45.031399]Â ? ret_from_fork+0x1f/0x30:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ret_from_fork at arch/x86/entry/entry_64.S:443
[ÂÂ 45.031883]Â ? kernel_init+0xa/0xf5:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kernel_init at init/main.c:997
[ÂÂ 45.032325]Â ret_from_fork+0x1f/0x30:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ret_from_fork at arch/x86/entry/entry_64.S:443
[ÂÂ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb
98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8
55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0
b1 61 82
[ÂÂ 45.035222] ---[ end trace c257137b1b1976ef ]---
[ÂÂ 45.037838] ACPI: Preparing to enter system sleep state S5

Attached the full dmesg, kconfig and reproduce scripts.
Looks like e1000 pci/pxi-x device is already suspended. And therefore
call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device()
already had disabled the device.
Disabling device again by e1000_shutdown handler during system shutdown
causes warning at drivers/pci/pci.c:1641.

I think function __e1000_shutdown should just return if device is
already suspended!

I don't have e1000 hardware to test right now. So if this seems logical
to others I will send a patch.

Tushar, it happens on QEMU boot testing, so do not rely on e1000 HW.
Unless you'd like to prevent regressions on real HW.

The original report attached a reproduce script to run the QEMU test.
Or you may send me the patch for testing.
Fengguang,

Would you please try this patch and test. The patch is compile tested
only. The patch is similar to how ixgbe handled the issue.
Thanks.

e1000: fix disabling already-disabled warning

This patch adds check so that driver does not disable already
disabled device.

It works! I tried 100 boots and the "e1000 0000:00:03.0: disabling
already-disabled device" error no longer show up.

Tested-by: Fengguang Wu <fengguang.wu@xxxxxxxxx>
Fengguang,

Thanks for testing. I will send patch soon.

-Tushar

Thanks,
Fengguang

Signed-off-by: Tushar Dave <tushar.n.dave@xxxxxxxxxx>
---
Âdrivers/net/ethernet/intel/e1000/e1000.hÂÂÂÂÂ |Â 3 ++-
Âdrivers/net/ethernet/intel/e1000/e1000_main.c | 23 ++++++++++++++++++-----
Â2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h
b/drivers/net/ethernet/intel/e1000/e1000.h
index d7bdea7..8fd2458 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -331,7 +331,8 @@ struct e1000_adapter {
Âenum e1000_state_t {
ÂÂÂÂ __E1000_TESTING,
ÂÂÂÂ __E1000_RESETTING,
-ÂÂÂ __E1000_DOWN
+ÂÂÂ __E1000_DOWN,
+ÂÂÂ __E1000_DISABLED
Â};

Â#undef pr_fmt
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 1982f79..a7de31d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -945,7 +945,7 @@ static int e1000_init_hw_struct(struct e1000_adapter
*adapter,
Âstatic int e1000_probe(struct pci_dev *pdev, const struct
pci_device_id *ent)
Â{
ÂÂÂÂ struct net_device *netdev;
-ÂÂÂ struct e1000_adapter *adapter;
+ÂÂÂ struct e1000_adapter *adapter = NULL;
ÂÂÂÂ struct e1000_hw *hw;

ÂÂÂÂ static int cards_found;
@@ -955,6 +955,7 @@ static int e1000_probe(struct pci_dev *pdev, const
struct pci_device_id *ent)
ÂÂÂÂ u16 tmp = 0;
ÂÂÂÂ u16 eeprom_apme_mask = E1000_EEPROM_APME;
ÂÂÂÂ int bars, need_ioport;
+ÂÂÂ bool disable_dev = false;

ÂÂÂÂ /* do not allocate ioport bars when not needed */
ÂÂÂÂ need_ioport = e1000_is_need_ioport(pdev);
@@ -1259,11 +1260,13 @@ static int e1000_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
ÂÂÂÂ iounmap(hw->ce4100_gbe_mdio_base_virt);
ÂÂÂÂ iounmap(hw->hw_addr);
Âerr_ioremap:
+ÂÂÂ disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags);
ÂÂÂÂ free_netdev(netdev);
Âerr_alloc_etherdev:
ÂÂÂÂ pci_release_selected_regions(pdev, bars);
Âerr_pci_reg:
-ÂÂÂ pci_disable_device(pdev);
+ÂÂÂ if (!adapter || disable_dev)
+ÂÂÂÂÂÂÂ pci_disable_device(pdev);
ÂÂÂÂ return err;
Â}

@@ -1281,6 +1284,7 @@ static void e1000_remove(struct pci_dev *pdev)
ÂÂÂÂ struct net_device *netdev = pci_get_drvdata(pdev);
ÂÂÂÂ struct e1000_adapter *adapter = netdev_priv(netdev);
ÂÂÂÂ struct e1000_hw *hw = &adapter->hw;
+ÂÂÂ bool disable_dev;

ÂÂÂÂ e1000_down_and_stop(adapter);
ÂÂÂÂ e1000_release_manageability(adapter);
@@ -1299,9 +1303,11 @@ static void e1000_remove(struct pci_dev *pdev)
ÂÂÂÂÂÂÂÂ iounmap(hw->flash_address);
ÂÂÂÂ pci_release_selected_regions(pdev, adapter->bars);

+ÂÂÂ disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags);
ÂÂÂÂ free_netdev(netdev);

-ÂÂÂ pci_disable_device(pdev);
+ÂÂÂ if (disable_dev)
+ÂÂÂÂÂÂÂ pci_disable_device(pdev);
Â}

Â/**
@@ -5156,7 +5162,8 @@ static int __e1000_shutdown(struct pci_dev *pdev,
bool *enable_wake)
ÂÂÂÂ if (netif_running(netdev))
ÂÂÂÂÂÂÂÂ e1000_free_irq(adapter);

-ÂÂÂ pci_disable_device(pdev);
+ÂÂÂ if (!test_and_set_bit(__E1000_DISABLED, &adapter->flags))
+ÂÂÂÂÂÂÂ pci_disable_device(pdev);

ÂÂÂÂ return 0;
Â}
@@ -5200,6 +5207,8 @@ static int e1000_resume(struct pci_dev *pdev)
ÂÂÂÂÂÂÂÂ pr_err("Cannot enable PCI device from suspend\n");
ÂÂÂÂÂÂÂÂ return err;
ÂÂÂÂ }
+ÂÂÂ smp_mb__before_atomic();
+ÂÂÂ clear_bit(__E1000_DISABLED, &adapter->flags);
ÂÂÂÂ pci_set_master(pdev);

ÂÂÂÂ pci_enable_wake(pdev, PCI_D3hot, 0);
@@ -5274,7 +5283,9 @@ static pci_ers_result_t
e1000_io_error_detected(struct pci_dev *pdev,

ÂÂÂÂ if (netif_running(netdev))
ÂÂÂÂÂÂÂÂ e1000_down(adapter);
-ÂÂÂ pci_disable_device(pdev);
+
+ÂÂÂ if (!test_and_set_bit(__E1000_DISABLED, &adapter->flags))
+ÂÂÂÂÂÂÂ pci_disable_device(pdev);

ÂÂÂÂ /* Request a slot slot reset. */
ÂÂÂÂ return PCI_ERS_RESULT_NEED_RESET;
@@ -5302,6 +5313,8 @@ static pci_ers_result_t e1000_io_slot_reset(struct
pci_dev *pdev)
ÂÂÂÂÂÂÂÂ pr_err("Cannot re-enable PCI device after reset.\n");
ÂÂÂÂÂÂÂÂ return PCI_ERS_RESULT_DISCONNECT;
ÂÂÂÂ }
+ÂÂÂ smp_mb__before_atomic();
+ÂÂÂ clear_bit(__E1000_DISABLED, &adapter->flags);
ÂÂÂÂ pci_set_master(pdev);

ÂÂÂÂ pci_enable_wake(pdev, PCI_D3hot, 0);
--
1.8.3.1


-Tushar

Thanks,
Fengguang