Re: [PATCH] atm: firestream: Replace GFP_ATOMIC with GFP_KERNEL in fs_send

From: Jia-Ju Bai
Date: Fri Jan 26 2018 - 23:10:08 EST

On 2018/1/27 1:08, Al Viro wrote:
On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote:
This is found by a static analysis tool named DCNS written by myself.
The trouble is, places like
net/atm/raw.c:65: vcc->send = atm_send_aal0;
net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
	ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
	    ? DROP_PACKET : 1;
	bh_unlock_sock(sk_atm(vcc));
in pppoatm_send() definitely is called under a spinlock.
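(For illustration, a minimal sketch of the constraint in play - hypothetical
names, nothing from pppoatm: once a spinlock is held, every allocation
reached under it must use GFP_ATOMIC, because GFP_KERNEL may sleep, and
sleeping while holding a spinlock risks deadlock.)

#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);        /* hypothetical lock */

static int demo_send(void)                /* hypothetical ->send()-like path */
{
        void *buf;

        spin_lock_bh(&demo_lock);         /* plays the role of bh_lock_sock() */

        /*
         * From here until the unlock we are in atomic context:
         * GFP_KERNEL could sleep waiting for memory, so only a
         * non-sleeping allocation is allowed.
         */
        buf = kmalloc(64, GFP_ATOMIC);    /* GFP_KERNEL here would be a bug */
        if (!buf) {
                spin_unlock_bh(&demo_lock);
                return -ENOMEM;
        }

        /* ... use buf ... */

        spin_unlock_bh(&demo_lock);
        kfree(buf);
        return 0;
}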

Looking through the driver (in advanced bitrot, as usual for drivers/atm),
I'd say that submit_queue() is fucked in head in the "queue full" case.
And judging by the history, had been thus since the original merge...
Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
conversions.

Al's analysis above is similar to how things looked for other patches
you submitted of this nature.

So because of the lack of care and serious auditing you put into these
conversions, I really have no choice but to drop them from my queue
because on the whole they are adding bugs rather than improving the
code.
FWIW, the tool *does* promise to be useful

Thanks, I am happy to hear that :)

- as far as I understand it
looks for places where a GFP_ATOMIC allocation goes with blocking operations
near every callchain leading there. And that indicates something fishy
going on - either a pointless GFP_ATOMIC (in the benign case), or something
much nastier: a callchain that would require GFP_ATOMIC. In that case
whatever blocking operation is found along the way is a bug.
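(Concretely, the two cases might look like this - a hand-written sketch
with made-up names, not code from any driver:)

#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(g_lock);           /* hypothetical */

/* Benign case: only ever called from process context, no lock held
 * anywhere in the callchain, so the GFP_ATOMIC is merely pointless. */
static void *alloc_benign(void)
{
        return kmalloc(128, GFP_ATOMIC);  /* could safely be GFP_KERNEL */
}

/* Nasty case: a caller holds g_lock, so the callchain genuinely
 * requires GFP_ATOMIC - and any blocking call on the same path is
 * itself a bug. */
static void *alloc_nasty(void)
{
        void *p = kmalloc(128, GFP_ATOMIC);  /* must stay GFP_ATOMIC */

        msleep(10);  /* BUG: sleeps while the caller holds a spinlock */
        return p;
}

static void caller_under_lock(void)
{
        spin_lock(&g_lock);
        alloc_nasty();
        spin_unlock(&g_lock);
}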

In fact, my tool first collects all call sites of GFP_ATOMIC and mdelay in the whole kernel code.
Then it starts its analysis from the entry of each interrupt handler and each spin-lock function call in the kernel,
to mark the GFP_ATOMIC and mdelay call sites that are reached in atomic context.
The remaining unmarked call sites of GFP_ATOMIC and mdelay are reported, as candidates to be replaced with GFP_KERNEL and msleep (or usleep_range), respectively.
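(For a call site the tool proves is never reached in atomic context, the
intended rewrite is roughly the following - foo_init() is a made-up
example, not a real function:)

#include <linux/delay.h>
#include <linux/slab.h>

/* Before: init code that always runs in process context anyway. */
static int foo_init(void)
{
        void *p = kmalloc(256, GFP_ATOMIC);  /* needlessly atomic */

        if (!p)
                return -ENOMEM;
        mdelay(20);                          /* busy-waits for 20 ms */
        kfree(p);
        return 0;
}

/* After: the same logic, now allowed to sleep. */
static int foo_init_fixed(void)
{
        void *p = kmalloc(256, GFP_KERNEL);  /* may sleep and reclaim */

        if (!p)
                return -ENOMEM;
        msleep(20);                          /* sleeps instead of spinning */
        kfree(p);
        return 0;
}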

Though my tool handles some common situations involving function pointers,
it does not handle the function-pointer assignment in this code (vcc->send = vcc->dev->ops->send) well, so the tool needs to be improved.
I am sorry for my incorrect report...

This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c -
static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)
{
        u32 wp;
        struct FS_QENTRY *cqe;

        /* XXX Sanity check: the write pointer can be checked to be
           still the same as the value passed as qe... -- REW */
        /* udelay (5); */
        while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) {
                fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n",
                            q->offset);
                schedule ();
        }
...
static void submit_queue (struct fs_dev *dev, struct queue *q,
                          u32 cmd, u32 p1, u32 p2, u32 p3)
{
        struct FS_QENTRY *qe;

        qe = get_qentry (dev, q);
        qe->cmd = cmd;
        qe->p0 = p1;
        qe->p1 = p2;
        qe->p2 = p3;
        submit_qentry (dev, q, qe);
...
static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
{
...
        td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
...
        submit_queue (dev, &dev->hp_txq,
                      QE_TRANSMIT_DE | vcc->channo,
                      virt_to_bus (td), 0,
                      virt_to_bus (td));
...

Either all callchains leading to fs_send() are in non-atomic contexts
(in which case that GFP_ATOMIC would be pointless) or there's one
where we cannot block. Any such would be a potential deadlock.

And the latter appears to be the case - fs_send() is atmdev_ops ->send(),
which can end up in atm_vcc ->send(), which can be called from under
->sk_lock.slock.
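To condense the chain (a summary of the analysis above, not a literal
excerpt from either file):

        pppoatm_send()                       /* net/atm/pppoatm.c */
          bh_lock_sock(sk_atm(vcc));         /* spinlock taken: atomic context */
          vcc->send(vcc, skb);               /* atmdev_ops ->send() == fs_send() */
            fs_send()                        /* drivers/atm/firestream.c */
              kmalloc(..., GFP_ATOMIC);      /* fine: does not sleep */
              submit_queue()
                submit_qentry()
                  schedule();                /* BUG: may sleep under the lock */
          bh_unlock_sock(sk_atm(vcc));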

Yes, I think this schedule() call can cause a sleep-in-atomic-context bug.

What would be really useful:
* list of "here's a list of locations where we do something
blocking; each callchain to this GFP_ATOMIC allocation passes either
upstream of one of those without leaving atomic context in between
or downstream without entering one".
* after that - backtracking these callchains further, watching
for ones in atomic contexts. Can be done manually, but if that tool
can assist in doing so, all the better. If we do find one, we have
found a deadlock - just take the blocking operation reported for
that callchain and that's it. If it's not an obvious false positive
(e.g.
        if (!foo)
                spin_lock(&splat);
        .....
        if (foo)
                schedule();
), it's worth reporting to maintainers of the code in question.
* if all callchains reach obviously non-atomic contexts
(syscall entry, for example, or a kernel thread payload, or
a function documented to require a mutex held by caller, etc.) -
then a switch to GFP_KERNEL might be appropriate. With analysis
of callchains posted as you are posting that.
* either way, having the tool print the callchains out
would be a good idea - for examining them, for writing reports,
etc.
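As a tiny example of the third point (hypothetical names): a function
whose documented contract already guarantees process context is the kind
of place where the switch is safe:

#include <linux/lockdep.h>
#include <linux/mutex.h>
#include <linux/slab.h>

static DEFINE_MUTEX(foo_mutex);       /* hypothetical */

/*
 * Must be called with foo_mutex held.  mutex_lock() itself may sleep,
 * so every caller is necessarily in process context and a sleeping
 * allocation is safe here.
 */
static void *foo_alloc_locked(size_t n)
{
        lockdep_assert_held(&foo_mutex);
        return kmalloc(n, GFP_KERNEL);
}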

Thanks for your very helpful advice :)
I will follow it in my patches.


Thanks,
Jia-Ju Bai