Re: [PATCH V3 3/5] misc: mlx5ctl: Add info ioctl

From: Saeed Mahameed
Date: Mon Nov 27 2023 - 15:39:29 EST


On 27 Nov 19:09, Greg Kroah-Hartman wrote:
On Mon, Nov 20, 2023 at 11:06:17PM -0800, Saeed Mahameed wrote:
+static int mlx5ctl_info_ioctl(struct file *file,
+ struct mlx5ctl_info __user *arg,
+ size_t usize)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+ struct mlx5_core_dev *mdev = mcdev->mdev;
+ struct mlx5ctl_info *info;
+ size_t ksize = 0;
+ int err = 0;
+
+ ksize = max(sizeof(struct mlx5ctl_info), usize);

Why / How can usize be larger than the structure size and you still want
to allocate a memory chunk that big? Shouldn't the size always match?


new user-space old kernel, the driver would allocate the usiae and make
sure to clear all the buffer with 0's, then fill in what the kernel
understands and send the whole buffer back to user with trailer always
zeroed out.

And what if it's too small?


Too small means old user new kernel, the kernel will honor that and only
fill in what the user understands.

+ info = kzalloc(ksize, GFP_KERNEL_ACCOUNT);

Why account as it will go away almost instantly?


Will change that back to GFP_KERNEL.

+ if (!info)
+ return -ENOMEM;
+
+ info->size = sizeof(struct mlx5ctl_info);
+
+ info->dev_uctx_cap = MLX5_CAP_GEN(mdev, uctx_cap);
+ info->uctx_cap = mfd->uctx_cap;
+ info->uctx_uid = mfd->uctx_uid;
+ info->ucap = mfd->ucap;
+
+ strscpy(info->devname, dev_name(&mdev->pdev->dev),
+ sizeof(info->devname));
+
+ if (copy_to_user(arg, info, usize))
+ err = -EFAULT;

So if usize is smaller than the structure you don't copy it all?

What am I missing here?


We copy back to user as much data as they expect, newer kernel can have
extended functionality and we want to allow expanding the ioctl structs in
the future while keeping support for old user, so new kernel would honor
old user and copy only the size the user expects.

+
+ kfree(info);
+ return err;
+}
+
+static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+ void __user *argp = (void __user *)arg;
+ size_t size = _IOC_SIZE(cmd);
+ int err = 0;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ mlx5ctl_dbg(mcdev, "ioctl 0x%x type/nr: %d/%d size: %d DIR:%d\n", cmd,
+ _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd), _IOC_DIR(cmd));
+
+ down_read(&mcdev->rw_lock);
+ if (!mcdev->mdev) {
+ err = -ENODEV;
+ goto unlock;
+ }
+
+ switch (cmd) {
+ case MLX5CTL_IOCTL_INFO:
+ err = mlx5ctl_info_ioctl(file, argp, size);
+ break;
+
+ default:
+ mlx5ctl_dbg(mcdev, "Unknown ioctl %x\n", cmd);
+ err = -ENOIOCTLCMD;

-ENOTTY is the correct error.

--- /dev/null
+++ b/include/uapi/misc/mlx5ctl.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5CTL_IOCTL_H__
+#define __MLX5CTL_IOCTL_H__
+
+struct mlx5ctl_info {
+ __aligned_u64 flags;

Is this used?


no, not yet, but it is good for future extendibility and compatibility
checking.

+ __u32 size;
+ __u8 devname[64]; /* underlaying ConnectX device */

64 should be a define somewhere, right? And why 64?


It is usually the kobj->name of the underlying device, I will have to
define this in the uAPI. 64 seemed large enough, any other suggestion ?
This field is informational only for the user to have an idea which is the
underlying physical device, it's ok if in odd situation the name has to be
truncated to fit into the uAPI buffer.

+ __u16 uctx_uid; /* current process allocated UCTX UID */
+ __u16 reserved1;

Where is this checked to be always 0? Well it's a read so I guess where
is the documentation saying it will always be set to 0?


I forgot to add the checks in the info ioctl path, will add that.
Isn't it an unwritten rule that reserved fields has to be always 0 ?
Do I really need to document this ?

+ __u32 uctx_cap; /* current process effective UCTX cap */
+ __u32 dev_uctx_cap; /* device's UCTX capabilities */
+ __u32 ucap; /* process user capability */
+ __u32 reserved2;

Same here.

And why reserve anything? What does that help with?


struct alignment and explicit padding.

thanks,

greg k-h