Re: Linux FS6712X-EB92 5.13.x - page allocation failure followed by a controller reset and the drive drops out of the array

From: Justin Piszcz
Date: Mon Feb 19 2024 - 11:49:28 EST


On Sat, Feb 17, 2024 at 11:34 PM Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:
>
> Hi Justin,

[ .. ]

> I can't answer your question. I suggest that you ask it on the
> linux-raid@xxxxxxxxxxxxxxx mailing list.

Thanks, I will follow up there.

>
> Also, I have one question:
>
> in this log fragment:
>
> [1698614.263935] SLUB: Unable to allocate memory on node -1,
> gfp=0x800(GFP_NOWAIT)
> [1698614.271680] cache: skbuff_head_cache, object size: 224, buffer
> size: 256, default order: 0, min order: 0
> [1698614.281979] node 0: slabs: 32, objs: 512, free: 64
> [1933116.236646] nvme nvme9: I/O 119 QID 2 timeout, aborting
> [1933116.242365] nvme nvme9: I/O 120 QID 2 timeout, aborting
> [1933141.324640] nvme nvme9: I/O 1 QID 0 timeout, reset controller
> [1933146.444701] nvme nvme9: I/O 119 QID 2 timeout, reset controller
> [1933215.826997] nvme nvme9: Device not ready; aborting reset, CSTS=0x1
>
> there are roughly 4 days between the 1698614 log entry and the
> 1933116 log entry. Is that (logging) accurate? Did it miss anything?
> It just seems odd to me.

Not that I saw in the logs. When I tried to access the device via a
Windows file share, it had been a while since it was last accessed;
however, there is no sleep setting for the NVMe SSDs (only for
external/USB HDDs).
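
For reference, here is a minimal sketch for checking the size of the gap
between the two timestamps in the quoted fragment. It assumes the bracketed
values are seconds since boot, as the kernel log normally prints them; the
helper names (timestamps, gap_seconds) are just illustrative. By this
arithmetic the gap is about 234502 seconds, i.e. closer to 2.7 days.

```python
import re

# Two lines copied from the quoted log fragment (the wrapped SLUB line rejoined).
LOG = """\
[1698614.263935] SLUB: Unable to allocate memory on node -1, gfp=0x800(GFP_NOWAIT)
[1933116.236646] nvme nvme9: I/O 119 QID 2 timeout, aborting
"""

# Matches the leading "[seconds.microseconds]" stamp of a kernel log line.
TS = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamps(text):
    """Return the leading timestamps (assumed seconds since boot) of each line."""
    found = []
    for line in text.splitlines():
        match = TS.match(line)
        if match:
            found.append(float(match.group(1)))
    return found


def gap_seconds(text):
    """Return the difference between the last and first timestamp in text."""
    stamps = timestamps(text)
    return stamps[-1] - stamps[0]


gap = gap_seconds(LOG)
print(f"{gap:.0f} s = {gap / 3600:.1f} h = {gap / 86400:.2f} days")
```

Feeding it a full dmesg dump instead of the two sample lines would give the
span of the whole log rather than this one gap.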