Re: [RFC PATCH v2 1/3] meminfo_extra: introduce meminfo extra

From: Leon Romanovsky
Date: Sun Mar 29 2020 - 03:21:46 EST


On Tue, Mar 24, 2020 at 09:53:16PM +0900, Jaewon Kim wrote:
>
>
> On 2020ë 03ì 24ì 20:46, Greg KH wrote:
> > On Tue, Mar 24, 2020 at 08:37:38PM +0900, Jaewon Kim wrote:
> >>
> >> On 2020ë 03ì 24ì 19:11, Greg KH wrote:
> >>> On Tue, Mar 24, 2020 at 06:11:17PM +0900, Jaewon Kim wrote:
> >>>> On 2020ë 03ì 23ì 18:53, Greg KH wrote:
> >>>>>> +int register_meminfo_extra(atomic_long_t *val, int shift, const char *name)
> >>>>>> +{
> >>>>>> + struct meminfo_extra *meminfo, *memtemp;
> >>>>>> + int len;
> >>>>>> + int error = 0;
> >>>>>> +
> >>>>>> + meminfo = kzalloc(sizeof(*meminfo), GFP_KERNEL);
> >>>>>> + if (!meminfo) {
> >>>>>> + error = -ENOMEM;
> >>>>>> + goto out;
> >>>>>> + }
> >>>>>> +
> >>>>>> + meminfo->val = val;
> >>>>>> + meminfo->shift_for_page = shift;
> >>>>>> + strncpy(meminfo->name, name, NAME_SIZE);
> >>>>>> + len = strlen(meminfo->name);
> >>>>>> + meminfo->name[len] = ':';
> >>>>>> + strncpy(meminfo->name_pad, meminfo->name, NAME_BUF_SIZE);
> >>>>>> + while (++len < NAME_BUF_SIZE - 1)
> >>>>>> + meminfo->name_pad[len] = ' ';
> >>>>>> +
> >>>>>> + spin_lock(&meminfo_lock);
> >>>>>> + list_for_each_entry_rcu(memtemp, &meminfo_head, list) {
> >>>>>> + if (memtemp->val == val) {
> >>>>>> + error = -EINVAL;
> >>>>>> + break;
> >>>>>> + }
> >>>>>> + }
> >>>>>> + if (!error)
> >>>>>> + list_add_tail_rcu(&meminfo->list, &meminfo_head);
> >>>>>> + spin_unlock(&meminfo_lock);
> >>>>> If you have a lock, why are you needing rcu?
> >>>> I think _rcu should be removed out of list_for_each_entry_rcu.
> >>>> But I'm confused about what you meant.
> >>>> I used rcu_read_lock on __meminfo_extra,
> >>>> and I think spin_lock is also needed for addition and deletion to handle multiple modifiers.
> >>> If that's the case, then that's fine, it just didn't seem like that was
> >>> needed. Or I might have been reading your rcu logic incorrectly...
> >>>
> >>>>>> + if (error)
> >>>>>> + kfree(meminfo);
> >>>>>> +out:
> >>>>>> +
> >>>>>> + return error;
> >>>>>> +}
> >>>>>> +EXPORT_SYMBOL(register_meminfo_extra);
> >>>>> EXPORT_SYMBOL_GPL()? I have to ask :)
> >>>> I can use EXPORT_SYMBOL_GPL.
> >>>>> thanks,
> >>>>>
> >>>>> greg k-h
> >>>>>
> >>>>>
> >>>> Hello
> >>>> Thank you for your comment.
> >>>>
> >>>> By the way there was not resolved discussion on v1 patch as I mentioned on cover page.
> >>>> I'd like to hear your opinion on this /proc/meminfo_extra node.
> >>> I think it is the propagation of an old and obsolete interface that you
> >>> will have to support for the next 20+ years and yet not actually be
> >>> useful :)
> >>>
> >>>> Do you think this is meaningful or cannot co-exist with other future
> >>>> sysfs based API.
> >>> What sysfs-based API?
> >> Please refer to mail thread on v1 patch set - https://protect2.fireeye.com/url?k=16e3accc-4b2f6548-16e22783-0cc47aa8f5ba-935fe828ac2f6656&u=https://lkml.org/lkml/fancy/2020/3/10/2102
> >> especially discussion with Leon Romanovsky on https://protect2.fireeye.com/url?k=74208ed9-29ec475d-74210596-0cc47aa8f5ba-0bd4ef48931fec95&u=https://lkml.org/lkml/fancy/2020/3/16/140
> > I really do not understand what you are referring to here, sorry. I do
> > not see any sysfs-based code in that thread.
> Sorry. I also did not see actual code.
> Hello Leon Romanovsky, could you elaborate your plan regarding sysfs stuff?

Sorry for being late, I wasn't in "TO:", so missed the whole discussion.

Greg,

We need the exposed information for the memory optimizations (debug, not
production) of our high speed NICs. Our devices (mlx5) allocates a lot of
memory, so optimization there can help us to scale in SRIOV mode easier and
be less constraint by the memory.

I want to emphasize that I don't like idea of extending /proc/* interface
because it is going to be painful to grep on large machines with many
devices. And I don't like the idea that every driver will need to register
into this interface, because it will be abused almost immediately.

My proposal was to create new sysfs file by driver/core and put all
information automatically there, for example, it can be
/sys/devices/pci0000:00/0000:00:0c.0/meminfo
^^^^^^^

Thanks