Re: Infinite systemd loop when powering off the machine with multiple MD RAIDs

From: Song Liu
Date: Tue Aug 22 2023 - 14:56:23 EST


On Wed, Aug 16, 2023 at 2:37 AM Bagas Sanjaya <bagasdotme@xxxxxxxxx> wrote:
>
> Hi,
>
> I noticed a regression report on Bugzilla [1]. Quoting from it:
>
> > At least 2 different RAIDs need to be built (e.g. RAID0 and RAID10, or RAID5 and RAID10); then you will see the error below repeatedly (you need a serial console to see it)
> >
> > [ 205.360738] systemd-shutdown[1]: Stopping MD devices.
> > [ 205.366384] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
> > [ 205.373327] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
> > [ 205.380427] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
> > [ 205.388257] systemd-shutdown[1]: Stopping MD /dev/md127 (9:127).
> > [ 205.394880] systemd-shutdown[1]: Failed to sync MD block device /dev/md127, ignoring: Input/output error
> > [ 205.404975] md: md127 stopped.
> > [ 205.470491] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
> > [ 205.770179] md: md126: resync interrupted.
> > [ 205.776258] md126: detected capacity change from 1900396544 to 0
> > [ 205.783349] md: md126 stopped.
> > [ 205.862258] systemd-shutdown[1]: Stopping MD /dev/md125 (9:125).
> > [ 205.862435] md: md126 stopped.
> > [ 205.868376] systemd-shutdown[1]: Failed to sync MD block device /dev/md125, ignoring: Input/output error
> > [ 205.872845] block device autoloading is deprecated and will be removed.
> > [ 205.880955] md: md125 stopped.
> > [ 205.934349] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:7).
> > [ 205.947707] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
> > [ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
> > [ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
> > [ 205.973155] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
> > [ 205.979789] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
> > [ 205.988475] systemd-shutdown[1]: Not all MD devices stopped, 4 left.

From the systemd code, i.e. the function delete_md(), this error:

[ 205.957004] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:6).
[ 205.964177] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy

is most likely triggered by ioctl(STOP_ARRAY).
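For readers less familiar with the userspace side, a minimal C sketch of that sequence follows. It assumes the open / fsync / ioctl(STOP_ARRAY) order implied by the log lines above; try_stop_md() is a hypothetical helper, not systemd's actual delete_md(), and it relies on STOP_ARRAY from the Linux UAPI header <linux/raid/md_u.h>. It returns a negative errno, so -EBUSY corresponds to "Device or resource busy".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/major.h>      /* MD_MAJOR, used by the STOP_ARRAY macro */
#include <linux/raid/md_u.h>  /* STOP_ARRAY */

/*
 * Hypothetical helper, loosely modelled on the sequence suggested by the
 * systemd-shutdown log; this is NOT systemd's code.
 * Returns 0 on success or a negative errno value.
 */
static int try_stop_md(const char *path)
{
    assert(path != NULL);

    /* O_EXCL on a block device requests an exclusive open on Linux. */
    int fd = open(path, O_RDONLY | O_CLOEXEC | O_EXCL);
    if (fd < 0)
        return -errno;

    /* A failed sync is only reported, like the "ignoring" message in the log. */
    if (fsync(fd) < 0)
        fprintf(stderr, "Failed to sync %s, ignoring: %s\n", path, strerror(errno));

    int r = 0;
    /* STOP_ARRAY takes no argument; md answers -EBUSY if it refuses. */
    if (ioctl(fd, STOP_ARRAY, NULL) < 0)
        r = -errno;  /* capture errno before close() can change it */

    close(fd);
    return r;
}
```

Run as root against a real array this would actually stop it, so the sketch is only meant to illustrate where the errno in the log comes from.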

And based on the code, I think the ioctl fails here:

	if (cmd == STOP_ARRAY || cmd == STOP_ARRAY_RO) {
		/* Need to flush page cache, and ensure no-one else opens
		 * and writes
		 */
		mutex_lock(&mddev->open_mutex);
		if (mddev->pers && atomic_read(&mddev->openers) > 1) {
			mutex_unlock(&mddev->open_mutex);
			err = -EBUSY;
			goto out; ////////////////////// HERE
		}
		if (test_and_set_bit(MD_CLOSING, &mddev->flags)) {
			mutex_unlock(&mddev->open_mutex);
			err = -EBUSY;
			goto out;
		}
		did_set_md_closing = true;
		mutex_unlock(&mddev->open_mutex);
		sync_blockdev(bdev);
	}

>
> See Bugzilla for the full thread and attached full journalctl log.
>
> Anyway, I'm adding this regression to be tracked by regzbot:
>
> #regzbot introduced: 12a6caf273240a https://bugzilla.kernel.org/show_bug.cgi?id=217798
> #regzbot title: systemd shutdown hang on machine with different RAID levels

However, the observation above doesn't seem to match the bisect result,
and it doesn't seem to be related to having different RAID levels.

Thanks,
Song

>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
>
> --
> An old man doll... just what I always wanted! - Clara