Re: [PATCH v2] mm: emit tracepoint when RSS changes by threshold

From: Michal Hocko
Date: Thu Sep 05 2019 - 10:20:15 EST


On Thu 05-09-19 10:14:52, Joel Fernandes wrote:
> On Thu, Sep 05, 2019 at 12:54:24PM +0200, Michal Hocko wrote:
> > On Wed 04-09-19 12:28:08, Joel Fernandes wrote:
> > > On Wed, Sep 4, 2019 at 11:38 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed 04-09-19 11:32:58, Joel Fernandes wrote:
> > > > > On Wed, Sep 04, 2019 at 10:45:08AM +0200, Michal Hocko wrote:
> > > > > > On Tue 03-09-19 16:09:05, Joel Fernandes (Google) wrote:
> > > > > > > Useful to track how RSS is changing per TGID to detect spikes in RSS and
> > > > > > > memory hogs. Several Android teams have been using this patch in various
> > > > > > > kernel trees for half a year now. Many reported to me it is really
> > > > > > > useful so I'm posting it upstream.
> > > > > > >
> > > > > > > Initial patch developed by Tim Murray. Changes I made from original patch:
> > > > > > > o Prevent any additional space consumed by mm_struct.
> > > > > > > o Keep overhead low by checking if tracing is enabled.
> > > > > > > o Add some noise reduction and lower overhead by emitting only on
> > > > > > > threshold changes.
> > > > > >
> > > > > > Does this have any pre-requisite? I do not see trace_rss_stat_enabled in
> > > > > > the Linus tree (nor in linux-next).
> > > > >
> > > > > No, this is generated automatically by the tracepoint infrastructure when a
> > > > > tracepoint is added.
> > > >
> > > > OK, I was not aware of that.
> > > >
> > > > > > Besides that why do we need batching in the first place. Does this have a
> > > > > > measurable overhead? How does it differ from any other tracepoints that we
> > > > > > have in other hotpaths (e.g. page allocator doesn't do any checks).
> > > > >
> > > > > We do need batching not only for overhead reduction,
> > > >
> > > > What is the overhead?
> > >
> > > The overhead is occasionally higher without the threshold (that is if we
> > > trace every counter change). I would classify performance benefit to be
> > > almost the same and within the noise.
> >
> > OK, so the additional code is not really justified.
>
> It is really justified. Did you read the whole of the last email?

Of course I have. Numbers that are within the noise, with some
outliers and no details about the underlying reason, simply show
that you are optimizing something that is probably not worth it.

I would recommend adding a simple tracepoint. That should be pretty
non-controversial. And if you want to add an optimization on top, then
provide data to justify it.
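
To make the two options concrete, here is a minimal userspace sketch. It
is not the code from the patch: every name in it, and the threshold value
of 128, is made up for illustration, and a plain counter stands in for a
real tracepoint firing. The simple variant emits on every counter change;
the thresholded variant emits only when the counter moves into a
different threshold-sized bucket.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold, in pages. Not taken from the patch. */
#define RSS_TRACE_THRESHOLD 128L

/* The mask trick below only works for a power-of-two threshold. */
static_assert((RSS_TRACE_THRESHOLD & (RSS_TRACE_THRESHOLD - 1)) == 0,
	      "threshold must be a power of two");

/* Stand-in for a tracepoint: just count how often we would have emitted. */
static long events_emitted;

static void emit_rss_event(long count)
{
	(void)count;
	events_emitted++;
}

/* Simple variant: emit on every counter change. */
static void rss_update_simple(long *count, long delta)
{
	*count += delta;
	emit_rss_event(*count);
}

/* True if old and new values fall into different threshold-sized buckets. */
static bool crossed_bucket(long old_count, long new_count)
{
	long mask = ~(RSS_TRACE_THRESHOLD - 1);

	return (old_count & mask) != (new_count & mask);
}

/* Thresholded variant: emit only when a bucket boundary is crossed. */
static void rss_update_thresholded(long *count, long delta)
{
	long old_count = *count;

	*count += delta;
	if (crossed_bucket(old_count, *count))
		emit_rss_event(*count);
}
```

Incrementing a counter from 0 to 256 one page at a time would emit 256
events with the simple variant and 2 with the thresholded one (at 128 and
at 256). Whether that reduction is worth the extra code is exactly what
measurements would have to show.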
--
Michal Hocko
SUSE Labs