Re: [REGRESSION] RLIMIT_DATA crashes named

From: Laura Abbott
Date: Fri Sep 16 2016 - 16:12:02 EST


On 09/16/2016 10:46 AM, Linus Torvalds wrote:
> On Fri, Sep 16, 2016 at 8:16 AM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
>>
>> Fedora received a bug report[1] after pushing 4.7.2 that named
>> was segfaulting with named-chroot. With some help (thank you
>> tibbs!), it was noted that on older kernels named was spitting out
>>
>> mmap: named (671): VmData 27566080 exceed data ulimit 23068672. Will be forbidden soon.
>>
>> and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
>> workaround for valgrind") it now spits out
>>
>> mmap: named (593): VmData 27566080 exceed data ulimit 20971520. Update limits or use boot option ignore_rlimit_data.
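The following is a simplified userspace model of the check behind these two messages. It reflects my understanding of f4fcd55841fc and is not kernel code. Private, writable, non-stack mappings count as VmData. Once VmData plus the new request exceeds the RLIMIT_DATA soft limit, the mapping is refused. There are two exceptions: the soft limit is 0 and the request still fits under the hard limit (the valgrind workaround), or the kernel was booted with ignore_rlimit_data. Before this commit, the override was effectively on by default, which would explain why older kernels only warned. The names `model_may_expand_data` and `struct data_limit` are hypothetical, and so is the 4 KiB page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of the RLIMIT_DATA check; none of these
 * names are kernel API. Assumes 4 KiB pages and that the request is a
 * private, writable, non-stack mapping (i.e. it counts toward VmData). */
#define MODEL_PAGE_SHIFT 12

struct data_limit {
    unsigned long soft; /* RLIMIT_DATA soft limit in bytes */
    unsigned long hard; /* RLIMIT_DATA hard limit in bytes */
};

/* Returns true if adding npages to data_pages is allowed by the model. */
static bool model_may_expand_data(unsigned long data_pages,
                                  unsigned long npages,
                                  struct data_limit lim,
                                  bool ignore_rlimit_data)
{
    assert(lim.soft <= lim.hard);          /* setrlimit() enforces this */
    unsigned long want = data_pages + npages;

    if (want <= lim.soft >> MODEL_PAGE_SHIFT)
        return true;                       /* within the soft limit */

    /* valgrind workaround: a soft limit of 0 falls back to the hard limit */
    if (lim.soft == 0 && want <= lim.hard >> MODEL_PAGE_SHIFT)
        return true;

    if (ignore_rlimit_data)
        return true;                       /* boot option: allow silently */

    /* Mirrors the new message: the VmData figure includes the request. */
    printf("VmData %lu exceed data ulimit %lu\n",
           want << MODEL_PAGE_SHIFT, lim.soft);
    return false;                          /* the mmap() would fail */
}
```

The second message gives VmData 27566080 bytes, which is 6730 pages. The limit is 20971520 bytes, which is 5120 pages. The model therefore refuses the request unless ignore_rlimit_data is set.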
>
> Ok, we can certainly revert, but before we do that I'd like to
> understand a few more things.
>
> For example, where the data limit came from, and how likely this is to
> hit others that have a much harder time fixing it. Adding Sam
> Varshavchik and Brent to the participants list...
>
> In particular, this is clearly trivially fixable as noted by Brent in
> that bugzilla entry:
>
> 'remove the "datasize 20M;" directive in named.conf'
>
> along with the (much worse) option of "use boot option
> ignore_rlimit_data" that the kernel dmesg itself suggests as an
> option.
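The "datasize 20M;" directive accounts for the limit shown in the second message. Below is a hypothetical helper, not BIND's code. It assumes that BIND scales K/M/G suffixes by 1024, as its size_spec is documented to do. It also assumes that named passes the result to setrlimit(RLIMIT_DATA). Under those assumptions, the conversion shows that "20M" is exactly the logged ulimit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not BIND's code): convert a size such as "20M"
 * into bytes. Assumes K/M/G scale by 1024; "unlimited", "default" and
 * overflow checking are deliberately left out of this sketch. */
static bool parse_size_spec(const char *s, unsigned long long *out)
{
    assert(s != NULL && out != NULL);
    if (!isdigit((unsigned char)*s))
        return false;                      /* reject signs, spaces, empty */

    char *end;
    unsigned long long n = strtoull(s, &end, 10);
    unsigned long long scale = 1;

    switch (toupper((unsigned char)*end)) {
    case '\0':
        break;                             /* plain byte count */
    case 'K':
        scale = 1024ULL;
        end++;
        break;
    case 'M':
        scale = 1024ULL * 1024;
        end++;
        break;
    case 'G':
        scale = 1024ULL * 1024 * 1024;
        end++;
        break;
    default:
        return false;                      /* unknown suffix */
    }
    if (*end != '\0')
        return false;                      /* trailing characters */

    *out = n * scale;
    return true;
}
```

`parse_size_spec("20M", &n)` sets n to 20971520, the same "data ulimit" value in the named (593) message. The reported VmData of 27566080 bytes is well above it.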
>
> So for example, if that "datasize 20M;" is coming from just the Fedora
> named package, it would be much nicer to just get that fixed instead.
> Because RLIMIT_DATA the old way was just meaningless noise.
>

As far as I can tell, this isn't Fedora-specific.

> We definitely don't want to break people's existing setups, but as this
> is *so* easy to fix in other ways (even at runtime without even
> updating a kernel), and since this commit is already four months old
> by now with this single bugzilla being the only report since then that
> I'm aware of, my reaction is just that there are better ways to fix it
> than reverting a commit that can be worked around trivially.
>
I was debating the merits of a revert. My concern is that this bugzilla
just represents the people who are reporting the bug and are able to
correlate it to named. The actual number of people who are seeing
problems may be higher, and anyone mucking with their config
could hit this and then have to go through troubleshooting steps again.
"Add a config, get a segfault" is a pretty terrible experience even
by Linux standards. I'd feel better about not reverting if there
were a proposed patch for named.

I would like to see RLIMIT_DATA actually do something useful, so worst
case I'll figure out something to carry in Fedora and this thread
can be an FYI for people googling.


> Linus


Thanks,
Laura