Re: [PATCH] fs: limit maximum concurrent coredumps

From: Andrew Morton
Date: Mon Jun 21 2010 - 21:42:41 EST


On Mon, 21 Jun 2010 18:23:03 -0700 (PDT) Roland McGrath <roland@xxxxxxxxxx> wrote:

> A core dump is just an instance of a process suddenly reading lots of its
> address space and doing lots of filesystem writes, producing the kinds of
> thrashing that any such instance might entail. It really seems like the
> real solution to this kind of problem will be in some more general kind of
> throttling of processes (or whatever manner of collections thereof) when
> they got hog-wild on page-ins or filesystem writes, or whatever else. I'm
> not trying to get into the details of what that would be. But I have to
> cite this hack as the off-topic kludge that it really is. That said, I do
> certainly sympathize with the desire for a quick hack that addresses the
> scenario you experience.

yup.

> For the case you described, it seems to me that constraining concurrency
> per se would be better than punting core dumps when too concurrent. That
> is, you should not skip the dump when you hit the limit. Rather, you
> should block in do_coredump() until the next dump already in progress
> finishes. (It should be possible to use TASK_KILLABLE so that those dumps
> in waiting can be aborted with a follow-on SIGKILL. But Oleg will have to
> check on the signals details being right for that.)

yup.

Might be able to use semaphores for this. Use sema_init(),
down_killable() and up().

Modifying the max concurrency value would require a loop of up()s and
down()s, probably all surrounded by a mutex_lock. Which is a bit ugly,
and should be done in kernel/semaphore.c I guess.

> That won't make your crashers each complete quickly, but it will prevent
> the thrashing. Instead of some crashers suddenly not producing dumps at
> all, they'll just all queue up waiting to finish crashing but not using any
> CPU or IO resources. That way you don't lose any core dumps unless you
> want to start SIGKILL'ing things (which oom_kill might do if need be),
> you just don't die in flames trying to do nothing but dump cores.

A global knob is a bit old-school. Perhaps it should be a per-memcg
knob or something.



otoh, one could perhaps toss all these tasks into a blkio_cgroup and
solve this problem with the block IO controller. After all, that's
what it's for.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/