Re: [RFC PATCH] ceph: reduce contention in ceph_check_delayed_caps()

From: Jeff Layton
Date: Mon Jun 28 2021 - 11:30:07 EST


On Mon, 2021-06-28 at 10:04 +0100, Luis Henriques wrote:
> On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
> <...>
> > I'm not sure this approach is viable, unfortunately. Once you've dropped
> > the cap_delay_lock, then nothing protects the i_cap_delay_list head
> > anymore.
> >
> > So you could detach these objects and put them on the private list, and
> > then once you drop the spinlock another task could find one of them and
> > (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
> >
> > I think we'll need to come up with a different way to do this...
>
> Ugh, yeah I see what you mean.
>
> Another option I can think of is to time-bound this loop, so that it
> would stop after finding the first ci->i_hold_caps_max timestamp that was
> set *after* the start of the current run. I'll see if I can come up with
> an RFC shortly.
>

Sounds like a reasonable thing to do.

The catch there is that those caps may end up being delayed up to 5s
longer than they otherwise would have been, since schedule_delayed()
always uses a fixed 5s delay. That delay could be made more dynamic if
it becomes an issue.
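
To make the trade-off concrete, here is a rough userspace model of such
a time-bounded pass. It is only a sketch, not the actual ceph code:
struct delayed_entry, bounded_pass() and extra_wait() are made-up names,
and it assumes hold_max is always the queueing time plus a fixed
delay_max, so that "hold_max - delay_max" recovers when the entry was
queued.

```c
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is before b", in the style of the kernel's time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for an inode sitting on the cap delay list. */
struct delayed_entry {
	unsigned long hold_max;	/* tick at which the delayed caps expire */
};

/*
 * Model of a time-bounded pass over a list ordered by hold_max.
 * Assumption: hold_max == queueing time + delay_max.
 * Returns how many entries were handled before the pass stopped.
 */
static size_t bounded_pass(const struct delayed_entry *e, size_t n,
			   unsigned long loop_start, unsigned long now,
			   unsigned long delay_max)
{
	size_t handled = 0;

	for (size_t i = 0; i < n; i++) {
		/* Queued after this pass began: leave it for the next run. */
		if (tick_before(loop_start, e[i].hold_max - delay_max))
			break;
		/* Not expired yet: nothing behind it is due either. */
		if (tick_before(now, e[i].hold_max))
			break;
		handled++;
	}
	return handled;
}

/* Extra wait (in ticks) for an entry only picked up by the next run. */
static unsigned long extra_wait(unsigned long hold_max,
				unsigned long next_run)
{
	return tick_before(hold_max, next_run) ? next_run - hold_max : 0;
}
```

With a fixed period, an entry that the pass leaves behind is not looked
at again until next_run, so extra_wait() can approach the full period
(5s here) for an entry that expires just after the pass ends.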

Maybe have the schedule_delayed callers calculate and pass in a timeout
and schedule the next run for that point in the future? Then
delayed_work could schedule the next run to coincide with the timeout of
the next entry on the list.
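
Something along these lines, as a userspace sketch rather than a real
patch. next_run_delay() and DEFAULT_DELAY_TICKS are hypothetical names,
the tick rate is an assumption, and capping the result at the current
5s period is my own guess at what we'd want, since delayed_work has
other periodic work to do as well.

```c
#include <stdbool.h>

/* Assumed: 5s expressed at 1000 ticks per second. */
#define DEFAULT_DELAY_TICKS 5000UL

/* Wrap-safe "a is before b", in the style of the kernel's time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Pick the delay until the next run.
 * have_next:     an entry is still left at the head of the list
 * next_hold_max: expiry tick of that entry (ignored if !have_next)
 */
static unsigned long next_run_delay(bool have_next,
				    unsigned long next_hold_max,
				    unsigned long now)
{
	unsigned long delay;

	if (!have_next)
		return DEFAULT_DELAY_TICKS;	/* nothing pending */
	if (!tick_before(now, next_hold_max))
		return 0;			/* already due */

	/* Wake up when the next entry expires, but no later than 5s. */
	delay = next_hold_max - now;
	return delay < DEFAULT_DELAY_TICKS ? delay : DEFAULT_DELAY_TICKS;
}
```

The cap-checking pass would hand back the expiry of the first entry it
left on the list, and the caller would feed the result into whatever
schedule_delayed() ends up taking as its timeout argument.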
--
Jeff Layton <jlayton@xxxxxxxxxx>