Re: Linux 3.19-rc5

From: Bruno Prémont
Date: Wed Jan 21 2015 - 16:37:16 EST


On Wed, 21 January 2015 Bruno Prémont wrote:
> On Tue, 20 January 2015 Linus Torvalds wrote:
> > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> > >
> > > No idea yet which rc is the offender (nor exact patch), but on my not
> > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > > converting my laptop into a heater.
> > >
> > > lspci for affected nodes:
> > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > >
> > > The very basics I have so far, before I attempt any bisection:
> >
> > Hmm. I'm not seeing anything recent changing anything in this area, so
> > I suspect that unless somebody else steps up and says "Ahh, that
> > sounds like xyz", your bisection is the best option.

Bisecting to the end pointed me at the commit below. The warning traces it
produces in great quantities might not be the very same issue as the excessive
CPU usage, but they certainly look closely related.
[CCing people on CC for the patch]

commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Wed Sep 24 10:18:55 2014 +0200

sched: Debug nested sleeps

Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.

Since all blocking is arranged through task_struct::state, nesting
this will cause the inner primitive to set TASK_RUNNING and the outer
will thus not block.

Another observed problem is calling a blocking function from
schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
then destroy the task state for the actual __schedule() call that
comes after it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: tglx@xxxxxxxxxxxxx
Cc: ilya.dryomov@xxxxxxxxxxx
Cc: umgwanakikbuti@xxxxxxxxx
Cc: oleg@xxxxxxxxxx
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/20140924082242.591637616@xxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

It produces the following trace (most important parts, hand-copied):
Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1436390>] pccardd+0xa0/0x3e0
...
Call trace:
...
__might_sleep+0x143/0x170
? pccardd+0xa0/0x3e0
? pccardd+0xa0/0x3e0
mutex_lock+0x17/0x2a
pccardd+0xe9/0x3e0
? pcmcia_socket_uevent+0x30/0x30

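pccardd() is located in drivers/pcmcia/cs.c and appears to have exactly the
structure Peter's patch is meant to warn about: per the trace, it calls
mutex_lock() after the task state was set to 1 (TASK_INTERRUPTIBLE) at
pccardd+0xa0.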
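
To make the nesting problem from the commit message concrete, here is a rough
userspace toy model. It is not the code from drivers/pcmcia/cs.c (which is not
quoted in this mail) and not kernel code at all: every model_* name and the
task_model struct are hypothetical stand-ins that only mirror the identifiers
mentioned above. The assumption is that the inner blocking primitive leaves the
task in TASK_RUNNING, as the commit message describes, so the outer schedule()
call returns immediately instead of sleeping.

```c
#include <assert.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
enum task_state { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

struct task_model {
    enum task_state state;
    unsigned warnings; /* times the might_sleep-style check fired */
    unsigned spins;    /* times the scheduler call returned without sleeping */
    unsigned blocks;   /* times the scheduler call would really have slept */
};

/* Models the debug check: blocking ops expect to be entered in TASK_RUNNING. */
static void model_might_sleep(struct task_model *t)
{
    if (t->state != TASK_RUNNING)
        t->warnings++;
}

/* Models a blocking primitive: it is assumed to finish in TASK_RUNNING,
 * which destroys whatever state the caller had set beforehand. */
static void model_mutex_lock(struct task_model *t)
{
    model_might_sleep(t);
    t->state = TASK_RUNNING;
}

/* Models schedule(): a task that is still TASK_RUNNING does not sleep. */
static void model_schedule(struct task_model *t)
{
    if (t->state == TASK_RUNNING) {
        t->spins++;               /* returns at once: the loop busy-spins */
    } else {
        t->blocks++;              /* would sleep until woken */
        t->state = TASK_RUNNING;  /* state after wakeup */
    }
}

/* A wait loop; with nest_lock != 0 a blocking op is called after the
 * task state has been changed, which is the pattern being warned about. */
static struct task_model model_wait_loop(unsigned iterations, int nest_lock)
{
    struct task_model t = { TASK_RUNNING, 0, 0, 0 };
    unsigned i;

    assert(iterations > 0);
    for (i = 0; i < iterations; i++) {
        t.state = TASK_INTERRUPTIBLE;  /* stands in for set_current_state() */
        if (nest_lock)
            model_mutex_lock(&t);      /* nested blocking primitive */
        model_schedule(&t);
    }
    return t;
}
```

Calling model_wait_loop(5, 1) counts one warning and one non-sleeping
scheduler call per iteration, which would match both the flood of traces and
the CPU usage if pccardd() really follows this pattern. model_wait_loop(5, 0)
is there only for contrast and is not a proposed fix.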


Bruno