Re: INFO: rcu detected stall in shmem_fault

From: Sergey Senozhatsky
Date: Wed Oct 10 2018 - 11:17:45 EST


On (10/10/18 14:29), Dmitry Vyukov wrote:
> >> A bit unrelated, but while we are at it:
> >>
> >> I like it when we rate-limit printk-s that lock up the system.
> >> But it seems that default rate-limit values are not always good enough,
> >> DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST can still be too
> >> verbose. For instance, when we have a very slow IPMI emulated serial
> >> console -- e.g. baud rate at 57600. DEFAULT_RATELIMIT_INTERVAL and
> >> DEFAULT_RATELIMIT_BURST can add new OOM headers and backtraces faster
> >> than we evict them.
> >>
> >> Does it sound reasonable enough to use larger than default rate-limits
> >> for printk-s in OOM print-outs? OOM reports tend to be somewhat large
> >> and the reported numbers are not always *very* unique.
> >>
> >> What do you think?
> >
> > I do not really care about the current interval/burst values. This change
> > should be done separately and ideally with some numbers.
>
> I think Sergey meant that this place may need to use
> larger-than-default values because it prints lots of output per
> instance (whereas the default limit is more tuned for cases that print
> just 1 line).
>
> I've found at least 1 place that uses DEFAULT_RATELIMIT_INTERVAL*10:
> https://elixir.bootlin.com/linux/latest/source/fs/btrfs/extent-tree.c#L8365
> Probably we need something similar here.

Yes, Dmitry, that's what I meant - to use something like
DEFAULT_RATELIMIT_INTERVAL * 10 in OOM. I didn't mean to change
the default values system wide.

---

We are not rate-limiting a single annoying printk() in OOM, but
functions that do a whole bunch of printk()-s - OOM header, backtraces, etc.
Thus an OOM report can be, I don't know, 50 or 70 or 100 lines (who knows).
So that's why the rate-limit in OOM is more permissive in terms of the number
of printed lines. When we rate-limit a single printk() we let through 10
printk()-s /* 10 lines */ max every 5 seconds. While in OOM this transforms
into 10 dump_header() + 10 oom_kill_process() every 5 seconds. That can still
be too many printk()-s, enough to lock up the system.
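
To put rough numbers on the above, here is a small userspace sketch. It is
only a simplified model of an interval/burst rate limit, not the kernel's
ratelimit code; rl_model, rl_allow() and lines_printed() are made-up names,
and the 100 lines per report and one OOM event every 100 ms are assumed
values picked for illustration.

```c
/*
 * Simplified userspace model (NOT the kernel implementation): at most
 * `burst` events are let through per `interval_ms` window.
 * All names here are hypothetical.
 */
struct rl_model {
	long interval_ms;	/* length of one rate-limit window */
	long burst;		/* events allowed per window */
	long begin_ms;		/* start of current window, -1 = not started */
	long passed;		/* events let through in the current window */
};

/* Return 1 if the event at time now_ms may print, 0 if it is suppressed. */
static int rl_allow(struct rl_model *rl, long now_ms)
{
	if (rl->begin_ms < 0)
		rl->begin_ms = now_ms;

	/* Window expired: open a new one and reset the counter. */
	if (now_ms - rl->begin_ms >= rl->interval_ms) {
		rl->begin_ms = now_ms;
		rl->passed = 0;
	}

	if (rl->passed < rl->burst) {
		rl->passed++;
		return 1;
	}
	return 0;
}

/*
 * Count the lines printed over duration_ms when an event fires every
 * period_ms and each event that passes the limit prints lines_per_event
 * lines (1 for a single printk(), ~100 assumed for a whole OOM report).
 */
static long lines_printed(long interval_ms, long burst, long lines_per_event,
			  long period_ms, long duration_ms)
{
	struct rl_model rl = { interval_ms, burst, -1, 0 };
	long now, lines = 0;

	for (now = 0; now < duration_ms; now += period_ms)
		if (rl_allow(&rl, now))
			lines += lines_per_event;

	return lines;
}
```

In this model lines_printed(5000, 10, 1, 100, 50000) gives 100 lines over
50 seconds for a single rate-limited printk(), while
lines_printed(5000, 10, 100, 100, 50000) gives 10000 lines for 100-line
reports. Stretching the interval tenfold, lines_printed(50000, 10, 100, 100,
50000), brings that down to 1000 lines. For scale, assuming 8N1 framing, a
57600 baud console moves about 5760 bytes per second.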

-ss