Re: sync, reboot, and corrupting data [was Re: 2.6.29 -mm merge plans]

From: Andi Kleen
Date: Sat Jan 10 2009 - 16:58:27 EST


On Sat, Jan 10, 2009 at 10:32:23PM +0100, Pavel Machek wrote:
> On Sat 2009-01-10 16:07:29, Andi Kleen wrote:
> > On Thu, Jan 08, 2009 at 02:24:55PM +0100, Pavel Machek wrote:
> > > On Wed 2009-01-07 03:57:25, Andi Kleen wrote:
> > > > > sys_sync B which is invoked *after* sys_sync caller A should not
> > > > > return before A. If you didn't have a global lock, they'd tend to
> > > > > block one another's pages anyway. I think it's OK.
> > > >
> > > > It means that you cannot reboot because reboot does sync.
> > > > What happens when the sync gets stuck somewhere on a really
> > > > slow device?
> > >
> > > And what do you propose? Silently corrupt data on the slow device?
> >
> > Yes not writing is better than being unable to reboot.
>
> Disagreed.

Well you're just forcing the user to press power/reset/sysrq-b which
will pretty much guarantee data loss if anything is unwritten.

> maybe reboot utility should not call sync()...

I think it should call sync(), but with a suitable timeout.
Never spend more than 10 seconds on the sync. And give the user visible
feedback during the countdown.

Now of course fixing the complete IO stack to support timeouts
might be too hard (although in theory they're already supposed
to have them, but as we know that doesn't always work reliably).

One alternative would be to do it with a background thread
(which seems to be en vogue right now anyway).
OK, I suppose with that Nick's lock is actually fine, although
I still don't like it very much.
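
To make the timeout idea concrete, here is a rough userspace sketch of
what a reboot utility could do. It is only an illustration under some
assumptions: it uses a forked helper process rather than a thread (so a
helper stuck in the kernel cannot hold up the caller), and the names
run_with_timeout() and stuck_device_stub() are made up for this example,
not taken from any existing tool.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: run work() in a child process and wait at most
 * timeout_secs for it. Returns 0 if the child finished cleanly in time,
 * -1 on timeout or error. A countdown goes to stderr once per second.
 */
int run_with_timeout(void (*work)(void), int timeout_secs)
{
    assert(work != NULL && timeout_secs >= 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        work();        /* e.g. sync(); may block for a long time */
        _exit(0);
    }

    /* Poll ten times a second so a fast sync returns promptly. */
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    for (int ticks_left = timeout_secs * 10; ; ticks_left--) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
        if (r < 0)
            return -1;
        if (ticks_left <= 0)
            break;
        if (ticks_left % 10 == 0)
            fprintf(stderr, "sync: giving up in %d s\n", ticks_left / 10);
        nanosleep(&tick, NULL);
    }

    /*
     * Timed out. SIGKILL is best effort only: a child blocked in
     * uninterruptible sleep will not die until the IO completes, so we
     * deliberately do not wait for it here.
     */
    kill(pid, SIGKILL);
    return -1;
}

/* Stand-in for a sync stuck on a slow device (demonstration only). */
void stuck_device_stub(void)
{
    sleep(3);
}
```

A caller would use run_with_timeout(sync, 10) and go on to reboot whether
it returns 0 or -1, just printing a warning in the latter case. Note this
only bounds how long the caller waits. It does nothing about the sync
itself being stuck, nor about other syncs queued behind it on a global
lock.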

-Andi

--
ak@xxxxxxxxxxxxxxx