Re: [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups

From: Jan Kara
Date: Tue Oct 22 2019 - 04:25:27 EST


On Mon 21-10-19 23:49:04, Roman Gushchin wrote:
> On Wed, Oct 16, 2019 at 11:18:40AM +0200, Jan Kara wrote:
> > On Tue 15-10-19 21:40:45, Roman Gushchin wrote:
> > > On Tue, Oct 15, 2019 at 11:09:33AM +0200, Jan Kara wrote:
> > > > On Thu 10-10-19 16:40:36, Roman Gushchin wrote:
> > > >
> > > > > @@ -426,7 +431,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> > > > >  	if (!list_empty(&inode->i_io_list)) {
> > > > >  		struct inode *pos;
> > > > >
> > > > > -		inode_io_list_del_locked(inode, old_wb);
> > > > > +		inode_io_list_del_locked(inode, old_wb, false);
> > > > >  		inode->i_wb = new_wb;
> > > > >  		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > > > >  			if (time_after_eq(inode->dirtied_when,
> > > >
> > > > This bit looks wrong. Not the change you made as such but the fact that you
> > > > can now move inode from b_attached list of old wb to the dirty list of new
> > > > wb.
> > >
> > > Hm, can you, please, elaborate a bit more why it's wrong?
> > > The reference to the old_wb will be dropped by the switching code.
> >
> > My point is that the code in full looks like:
> >
> > 	if (!list_empty(&inode->i_io_list)) {
> > 		struct inode *pos;
> >
> > 		inode_io_list_del_locked(inode, old_wb);
> > 		inode->i_wb = new_wb;
> > 		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > 			if (time_after_eq(inode->dirtied_when,
> > 					  pos->dirtied_when))
> > 				break;
> > 		inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
> > 	} else {
> >
> > So inode is always moved from some io list in old_wb to b_dirty list of
> > new_wb. This is fine when it could be only on b_dirty, b_io, b_more_io lists
> > of old_wb. But once you add b_attached list to the game, it is not correct
> > anymore. You should not add clean inode to b_dirty list of new_wb.
>
> I see...
>
> Hm, will checking of i_state for not containing I_DIRTY_ALL bits be enough here?
> Alternatively, I can introduce a new bit which will explicitly point at the
> inode being on the b_attached list, but I'd prefer not to do it.

Yeah, keying off i_state should work. And while we are at it, we could also
correctly handle the I_DIRTY_TIME case and move the inode only to the
b_dirty_time list. That seems to be a (mostly harmless) preexisting issue.
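
To make the idea concrete, here is a rough, untested userspace sketch of the
list selection keyed off i_state. The enum and the helper name
switch_target_list() are made up for illustration, the flag values are only
stand-ins for the kernel's i_state bits, and b_attached is the list proposed
in this patch series. The real change would of course sit inside
inode_switch_wbs_work_fn() and operate on the actual wb lists.

```c
/* Illustrative stand-ins for the kernel's i_state dirty bits (assumed values). */
#define I_DIRTY_SYNC      (1UL << 0)
#define I_DIRTY_DATASYNC  (1UL << 1)
#define I_DIRTY_PAGES     (1UL << 2)
#define I_DIRTY_TIME      (1UL << 11)
#define I_DIRTY           (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
#define I_DIRTY_ALL       (I_DIRTY | I_DIRTY_TIME)

/* Hypothetical names for the lists of new_wb the inode could land on. */
enum wb_list {
	WB_ATTACHED,	/* clean inode, stays off the dirty lists */
	WB_DIRTY_TIME,	/* only timestamps are dirty */
	WB_DIRTY,	/* data or metadata is dirty */
};

/* Pick the destination list in new_wb from the inode's i_state. */
static enum wb_list switch_target_list(unsigned long i_state)
{
	/* No dirty bits at all: the inode was on b_attached and is clean. */
	if (!(i_state & I_DIRTY_ALL))
		return WB_ATTACHED;

	/* Only I_DIRTY_TIME is set: belongs on b_dirty_time, not b_dirty. */
	if (!(i_state & I_DIRTY))
		return WB_DIRTY_TIME;

	/* Anything else is really dirty and goes to b_dirty. */
	return WB_DIRTY;
}
```

Only the WB_DIRTY case would need the dirtied_when-ordered insertion that the
current code performs; the other two cases are plain list moves.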

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR