Re: [PATCH] libata, freezer: avoid block device removal while system is frozen

From: Nigel Cunningham
Date: Fri Dec 13 2013 - 18:15:36 EST


Hi again.

On 14/12/13 10:07, Tejun Heo wrote:
> Hello, Nigel.
>
> On Sat, Dec 14, 2013 at 09:45:59AM +1100, Nigel Cunningham wrote:
>> In your first email, in the first substantial paragraph (starting
>> "Now, if the rest.."), you say "libata device removal waits for the
>> scheduled writeback work item to finish". I wonder if that's the
>> lynchpin. If we know the device is gone, why are we trying to write
>> to it?
>
> It's just a standard part of block device removal:
> invalidate_partition(), bdi_wb_shutdown().

Mmm. But perhaps there needs to be some special code in there to handle the "we can't write to this device anymore" case?

>> All pending I/O should have been flushed when suspend/hibernate
>> started, and there's no point in trying to update metadata on a
>
> Frozen or not, it isn't guaranteed that the bdi wb queues are empty
> when the system goes to suspend. They're likely to be empty, but
> there's no guarantee. Converting to a workqueue only makes the
> behavior more deterministic.
>
>> device we can't access, so there should be no writeback needed (and
>> anything that does somehow get there should just be discarded since
>> it will never succeed anyway).
>
> Even if they'll never succeed, they still need to be issued and
> drained; otherwise, we'll end up with leaked items and hung issuers.

Yeah, I get that, but doesn't draining need to work differently if the device no longer exists?
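
A toy userspace model of the draining point above, offered only as an illustration: every name here is hypothetical and none of it is the kernel's bdi/writeback code. When the device is gone, pending requests are still completed (with an error) rather than dropped, so anything waiting on them gets a final status.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one queued write; not a kernel structure. */
struct io_req {
	bool done;   /* set once the request has been completed */
	int status;  /* 0 on success, negative errno on failure */
};

#define QUEUE_MAX 16

/* Hypothetical queue standing in for pending writeback work. */
struct io_queue {
	struct io_req *items[QUEUE_MAX];
	size_t count;
};

/* Issue a request: the queue owns it until it is completed. */
static bool queue_push(struct io_queue *q, struct io_req *req)
{
	if (q->count == QUEUE_MAX)
		return false;
	req->done = false;
	req->status = 0;
	q->items[q->count++] = req;
	return true;
}

/*
 * Wrong on removal: just forget pending requests. Nothing ever sets
 * ->done, so an issuer waiting for completion would wait forever.
 */
static void queue_discard(struct io_queue *q)
{
	q->count = 0;
}

/*
 * Drain on removal: complete every pending request with -ENODEV so
 * each issuer sees a final status. Returns how many were drained.
 */
static size_t queue_drain_dead(struct io_queue *q)
{
	size_t n = q->count;

	for (size_t i = 0; i < n; i++) {
		assert(!q->items[i]->done); /* each request completes once */
		q->items[i]->status = -ENODEV;
		q->items[i]->done = true;
	}
	q->count = 0;
	return n;
}
```

In this model, failing requests quickly is still a form of draining; what is ruled out is silently dropping them as queue_discard() does.
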
>> Having said the above, I agree that we shouldn't need to freeze
>> kernel threads and workqueues themselves. I think we should be
>> giving the producers of I/O the nous needed to avoid producing I/O
>> during suspend/hibernate. But perhaps I'm missing something here,
>> too.
>
> I never understood that part. Why do we need to control the
> producers? The chain between the producer and consumer is a long one,
> and no matter what we do with the producers, the consumers need to be
> plugged all the same. Why bother with the producers at all? I think
> that's where all these freezable kthreads started, but I don't
> understand what the benefit of that is. Not only that, the freezer is
> awfully inadequate in its role too. There is a flurry of activity in
> the IO path that happens without any thread involved, and much of it
> can lead to the issuance of new IO, so the only thing the freezer
> achieves is making existing bugs less visible. That is a bad thing,
> especially for suspend/resume, as the failure mode often doesn't
> yield to easy debugging.
>
> I asked the same question years ago and ISTR getting only fairly
> vague answers, but this whole freezable kthread business is,
> predictably, proving to be a continuous source of problems. Let's at
> least find out whether we need it and, if so, why. Not some "I feel
> better knowing things are calmer" type of vagueness, but the actual
> technical necessity of it.
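
A similarly hypothetical sketch of the consumer-side argument: submissions are accepted from any context, including a completion callback with no thread behind it, but nothing reaches the "device" while the queue is held. The names (dispatch_queue, queue_hold(), on_complete() and so on) are illustrative only and are not a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 32

/* Hypothetical dispatch queue; "held" models consumer-side blocking. */
struct dispatch_queue {
	int pending[MAX_PENDING]; /* IDs of requests awaiting dispatch */
	size_t npending;
	size_t ndispatched;       /* requests actually sent to the device */
	bool held;
};

/* Any producer (kthread, timer, completion callback) may submit. */
static bool submit(struct dispatch_queue *q, int id)
{
	if (q->npending == MAX_PENDING)
		return false;
	q->pending[q->npending++] = id;
	return true;
}

static void queue_hold(struct dispatch_queue *q)
{
	q->held = true;
}

static void queue_release(struct dispatch_queue *q)
{
	q->held = false;
}

/* Consumer: sends nothing while held, no matter who queued the work. */
static size_t dispatch(struct dispatch_queue *q)
{
	size_t n;

	if (q->held)
		return 0;
	n = q->npending;
	q->ndispatched += n;
	q->npending = 0;
	return n;
}

/*
 * Producer with no thread behind it: a completion callback that
 * queues follow-up I/O. Freezing threads would not stop this path.
 */
static void on_complete(struct dispatch_queue *q, int id)
{
	bool ok = submit(q, id + 1000);

	assert(ok);
	(void)ok;
}
```

Holding I/O at dispatch covers the callback path as well as thread-issued I/O, which is the gap described above for the freezer.
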

My understanding is that the point is to avoid filesystem corruption, particularly in the case of hibernation: if we write one thing while the image is being written, the system may then, while reading the image or after restoring it, do something else without any knowledge of what happened while the image was being written.

Regards,

Nigel
