Re: [patch] sys_sync livelock fix

From: Bill Davidsen (davidsen@tmr.com)
Date: Wed Feb 13 2002 - 17:53:02 EST


On Wed, 13 Feb 2002, Daniel Phillips wrote:

> On February 13, 2002 06:21 am, Bill Davidsen wrote:

> > The current behaviour allows the system to hang forever waiting for
> > sync(2). In practice it does actually wait minutes on a busy system (df
> > has --no-sync for that reason) when there is no reason for that to happen.
> > I think that not only sucks worse, it's non-standard as well.
>
> Nothing sucks worse than losing your data. Let's concentrate on fixing
> shutdown, not breaking (linux) sync.

If you have read my previous posts, you know that the current behaviour is
broken: it lets users glitch system performance at no benefit to the
issuing process, etc. In other words, sync is currently broken in
theory, in practice, and with respect to the standard.

> > > For example, ext3 users get to enjoy rebooting with `sync ; reboot -f'
> > > to get around all those silly shutdown scripts. This very much relies
> > > upon the sync waiting upon the I/O.

Of course between the sync and execution of the reboot the disk buffers
may be totally full again, and if shutdown isn't close to instant perhaps
you have a faulty shutdown and should fix it.

> > Because people count on something broken we should keep the bug? You do
> > realize that the sync may NEVER finish?
>
> You do realize that if you lose your data you may NEVER get it back? ;-)

The sync doesn't protect my data. After my data has been written, why
should I have to wait while all the data from every active program in the
system gets written? This turns checkpoints into stop points on a busy system.
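
To make the distinction concrete, here is a minimal sketch of a checkpoint
that flushes only the process's own file with fsync(2) rather than calling
sync(2). It is not taken from any posted patch; the helper name
write_checkpoint and the file layout are made up for illustration, and it
assumes the checkpoint fits in a single file.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write a checkpoint record
 * to `path` and wait only for this one file to reach the disk.
 * Returns 0 on success, -1 on any error.
 */
int write_checkpoint(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    const char *p = data;

    /* write() may be short, so loop until everything is handed over. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush this file's dirty data only; other processes' I/O is not waited on. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The wait here is bounded by the amount of data this process wrote, which is
the property a checkpoint wants.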
 
> > That's not in theory, I have news
> > servers which may wait overnight without finishing a "df" without the
> > option.
>
> OK, what you're really saying is, we need a way to kill the sync process
> if it runs overtime, no?

If that maps into "write the current dirty buffers and get on with life,"
yes. I posted a better solution for comment; I won't duplicate that here.
  
> > > I mean, according to SUS, our sys_sync() implementation could be
> > >
> > > asmlinkage void sys_sync(void)
> > > {
> > >     return;
> > > }
> > >
> > > Because all I/O is already scheduled, thanks to kupdate.

I can't see anyone taking that reading or that implementation, so let's
not discuss how badly we can do a standard-conforming kernel. This is
Linux; we do the best at everything once we figure out what that really
means ;-)

> > Your example is a good example of bad practice, since even with
> > ext3 a program creating files quickly would lose data, even though the
> > directory structure is returned to a known state, without stopping the
> > writing processes the results are unknown.
>
> Huh? You know about journal commit, right?

Read or reread my other notes on that. The journal prevents directory
corruption; it doesn't prevent data loss the way a database transaction
does. Returning to a known good state does not include "without losing any
data written to unclosed files."

I leave it to Mr Reiser to clarify that or correct me if data is protected
without using unbuffered writes.
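
For reference, a minimal sketch of what is meant above, assuming
"unbuffered writes" is read as synchronous writes via O_SYNC. The helper
name append_record_sync is hypothetical, and this says nothing about what
any particular journaling filesystem guarantees on its own.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not from the thread): append one
 * record to `path` with O_SYNC, so write() does not return until the
 * data has been handed to stable storage.
 * Returns 0 on success, -1 on error or short write.
 */
int append_record_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Each record pays the cost of a synchronous write, but the data does not sit
in the buffer cache waiting for the file to be closed or for a sync.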

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
