Re: KASAN: slab-use-after-free Read in ip_finish_output

From: Eric Dumazet
Date: Tue Mar 12 2024 - 09:48:55 EST


On Tue, Mar 12, 2024 at 2:21 PM Florian Westphal <fw@xxxxxxxxx> wrote:
>
> Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > > so skb->sk gets propagated down to __ip_finish_output(), long
> > > after connrack defrag has called skb_orphan().
> > >
> > > No idea yet how to fix it,
> >
> > My plan was to refine "inet: frag: Always orphan skbs inside
> > ip_defrag()" and only do the skb_orphan()
> > for skb added to a frag_list.
> >
> > The head skb would keep a reference to the socket.
>
> I tried to follow this but its beyond my abilities.
>
> Defrag messes with skb->truesize, and I do not know how to
> fix that up safely so later calls to destructor won't underflow sk
> accouting.
>
> Furthermore, depending on delivery order, the skb that gets
> passed to rest of stack might not be the head skb (the one with
> full l4 header and sk reference), its always the last one that arrived.
>
> Existing code skb_morphs() this, see inet_frag_reasm_prepare() and also
> the ->truesize munging (which is fine only because all skbs are
> orphans...).
>
> So in order to not pass already-released sk to inet output somehow
> the skb->sk reference needs to be stolen and moved from one sk
> to another.
>
> No idea how to do this, let alone do regression testing for this.
> see e.g. 48cac18ecf1de82f76259a54402c3adb7839ad01 which added
> unconditional orphaning in ipv6 netfilter defrag.
>
> ATM the only "solution" I see is to completely remove netfilter defrag
> support for outgoing packets.

Thanks for taking a look Florian.

Perhaps not messing with truesize at all would help ?

Something based on this POC :

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 153960663ce4c2389259e0dc2bb4ba0b011dc698..8fd688df4b24dec7b5a1ea365d5b34d81fe6e0e5
100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -94,6 +94,7 @@ struct inet_frag_queue {
struct rb_root rb_fragments;
struct sk_buff *fragments_tail;
struct sk_buff *last_run_head;
+ struct sock *sk;
ktime_t stamp;
int len;
int meat;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 7072fc0783ef56e59c886a2f2516e7db7d10c942..aee706226ef8d9e9514fcf7d60e4d278ff7178fa
100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -297,6 +297,8 @@ void inet_frag_destroy(struct inet_frag_queue *q)
SKB_CONSUMED;
WARN_ON(del_timer(&q->timer) != 0);

+ if (q->sk)
+ sock_put(q->sk);
/* Release all fragment data. */
fqdir = q->fqdir;
f = fqdir->f;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a4941f53b523725cd777d213500b8f6918287920..198a4c35cd1232278678a20bf0f2a49c63f548fc
100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -377,6 +377,10 @@ static int ip_frag_queue(struct ipq *qp, struct
sk_buff *skb)

skb->_skb_refdst = 0UL;
err = ip_frag_reasm(qp, skb, prev_tail, dev);
+ if (qp->q.sk) {
+ swap(qp->q.sk, skb->sk);
+ skb->destructor = sock_edemux;
+ }
skb->_skb_refdst = orefdst;
if (err)
inet_frag_kill(&qp->q);
@@ -384,6 +388,7 @@ static int ip_frag_queue(struct ipq *qp, struct
sk_buff *skb)
}

skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;

insert_error:
@@ -487,7 +492,6 @@ int ip_defrag(struct net *net, struct sk_buff
*skb, u32 user)
struct ipq *qp;

__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);
- skb_orphan(skb);

/* Lookup (or create) queue header */
qp = ip_find(net, ip_hdr(skb), user, vif);
@@ -495,7 +499,12 @@ int ip_defrag(struct net *net, struct sk_buff
*skb, u32 user)
int ret;

spin_lock(&qp->q.lock);
+ if (!qp->q.sk) {
+ struct sock *sk = skb->sk;

+ if (sk && refcount_inc_not_zero(&sk->sk_refcnt))
+ qp->q.sk = sk;
+ }
ret = ip_frag_queue(qp, skb);

spin_unlock(&qp->q.lock);