Re: KASAN: use-after-free Read in path_lookupat

From: Daniel Borkmann
Date: Mon Mar 25 2019 - 18:05:02 EST


On 03/25/2019 10:45 PM, Linus Torvalds wrote:
> On Mon, Mar 25, 2019 at 2:14 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Maybe, but we really need to come up with sane documentation on the
>> entire drop_inode/evict_inode/destroy_inode/rcu_destroy_inode
>> group ;-/
>
> Yeah.
>
> I actually think the "destroy_inode/rcu_destroy_inode" part is the
> simplest one to understand: it's just tearing down the inode, and the
> RCU part is (obviously, as shown by this thread) somewhat subtle, but
> at the same time not really all that hard to explain: "inodes that can
> be looked up through RCU lookup need to be destroyed with RCU delay".
>
> I think drop_inode->evict_inode is arguably the much more subtle
> stuff. That's the "we're not destroying things, we're making decisions
> about the state" kind of thing.
>
> And we actually don't have any documentation at all for those two.
> Well, there's the "porting" thing, but..
>
>> And I want to understand the writeback-related issues
>> in ocfs2 and f2fs - the current kludges in those smell fishy.
>
> I'm assuming you're talking about exactly that drop_inode() kind of subtlety.
>
> At least for ocfs2, the whole "rcu_destroy_inode()" patch I sent out
> would simplify things a lot. *All* that the ocfs2_destroy_inode()
> function does is to schedule the inode freeing for RCU delay.
>
> Assuming my patch is fine (as mentioned, it was entirely untested),
> that patch would actually simplify that a lot. Get rid of
> ocfs2_destroy_inode() entirely, and just make
> ocfs2_rcu_destroy_inode() do that kmem_cache_free(). Mucho clean-up
> (we have that ocfs2_rcu_destroy_inode() currently as
> ocfs2_i_callback(), but with the ugly rcu-head interface).
>
>>> if (unlikely(inode_init_always(sb, inode))) {
>>> - if (inode->i_sb->s_op->destroy_inode)
>>> + if (inode->i_sb->s_op->destroy_inode) {
>>> inode->i_sb->s_op->destroy_inode(inode);
>>> - else
>>> + if (!inode->i_sb->s_op->rcu_destroy_inode)
>>> + return NULL;
>>> + }
>>> + if (!inode->i_sb->s_op->rcu_destroy_inode ||
>>> + !inode->i_sb->s_op->rcu_destroy_inode(inode))
>>> kmem_cache_free(inode_cachep, inode);
>>
>> ITYM i_callback(inode); here, possibly suitably renamed.
>
> Yeah, I guess we could just call that i_callback() directly. I wrote
> it that way because it was how the code was organized (it didn't call
> i_callback() before either), but yes, it woudl avoid some duplicate
> code to do it that way.
>
> And yes, at that point we'd probably want to rename it too.
>
> On the whole, looking at a few filesystems, I really think that
> 'rcu_destroy_inode()" would simplify several of them. That ocfs2 case
> I already mentioned, but looking around the exact same is trye in
> v9fs, vxfs, dlmfs, hostfs, bfs, coda, fatfs, isofs, minix, nfs,
> openprom, proc, qnx4, qnx6, sysv, befs, adfs, affs, efs, ext2, f2fs,
> gfs2, hfs, hfsplus, hpfs, jffs2, nilfs, reiserfs, romfs, squashfs)
>
> And in other filesystems that actually *do* have a real
> destroy_inode() that does other things too, they still want the rcu
> delayed part as well (eg btrfs, ceph, fuse, hugetlbfs, afs, ecryptfs,
> ext4, jfs, organgefs, ovl, ubifs).
>
> In fact, every single filesystem I looked at (and I looked at most)
> would be simplified by that patch.
>
> And for some it would make it easier to fix bugs in the process (ie
> bpf) because they don't currently have that rcu delay and they need
> it.

I'm fine either way, I think the rcu_destroy_inode would indeed simplify
it nicely. In any case fwiw, here's what I'd have ready for standby on bpf
side and tested as well. Decided to get rid of bpf_evict_inode() entirely
since the only callback we'd really need is on final inode destruction: