Re: [PATCH v2 1/2] remoteproc: Add character device interface

From: Mathieu Poirier
Date: Thu Apr 09 2020 - 13:14:52 EST


On Wed, 8 Apr 2020 at 12:35, <rishabhb@xxxxxxxxxxxxxx> wrote:
>
> On 2020-04-06 08:58, ClÃment Leger wrote:
> > Hi Arnaud,
> >
> > ----- On 6 Apr, 2020, at 16:17, Arnaud Pouliquen
> > arnaud.pouliquen@xxxxxx wrote:
> >
> >> Hi ClÃment,
> >>
> >> On 4/6/20 2:06 PM, ClÃment Leger wrote:
> >>> Hi Arnaud,
> >>>
> >>> ----- On 6 Apr, 2020, at 11:01, Arnaud Pouliquen
> >>> arnaud.pouliquen@xxxxxx wrote:
> >>>
> >>>> On 4/3/20 9:13 PM, rishabhb@xxxxxxxxxxxxxx wrote:
> >>>>> On 2020-04-02 10:28, Arnaud POULIQUEN wrote:
> >>>>>> Hi
> >>>>>>
> >>>>>> On 4/1/20 2:03 AM, Rishabh Bhatnagar wrote:
> >>>>>>> Add the character device interface for userspace applications.
> >>>>>>> This interface can be used in order to boot up and shutdown
> >>>>>>> remote subsystems. Currently there is only a sysfs interface
> >>>>>>> which the userspace clients can use. If a usersapce application
> >>>>>>> crashes after booting the remote processor does not get any
> >>>>>>> indication about the crash. It might still assume that the
> >>>>>>> application is running. For example modem uses remotefs service
> >>>>>>> to fetch data from disk/flash memory. If the remotefs service
> >>>>>>> crashes, modem keeps on requesting data which might lead to a
> >>>>>>> crash. Adding a character device interface makes the remote
> >>>>>>> processor tightly coupled with the user space application.
> >>>>>>> A crash of the application leads to a close on the file
> >>>>>>> descriptors
> >>>>>>> therefore shutting down the remoteproc.
> >>>>>>
> >>>>>> Sorry I'm late in the discussion, I hope I've gone through the
> >>>>>> whole
> >>>>>> discussion so I don't reopen a closed point...
> >>>>>>
> >>>>>> Something here is not crystal clear to me so I'd rather share
> >>>>>> it...
> >>>>>>
> >>>>>> I suppose that you the automatic restart of the application is not
> >>>>>> possible to
> >>>>>> stop and restart the remote processor...
> >>>>> Yes correct, while we wait for the application to restart we might
> >>>>> observe a
> >>>>> fatal crash.
> >>>>>>
> >>>>>> Why this use case can not be solved by a process monitor or a
> >>>>>> service
> >>>>>> in userland that detects the application crash and stop the remote
> >>>>>> firmware using
> >>>>>> the sysfs interface?
> >>>>>>
> >>>>> What happens in the case where the process monitor itself crashes?
> >>>>> This is
> >>>>> actually the approach we follow in our downstream code. We have a
> >>>>> central entity
> >>>>> in userspace that controls bootup/shutdown of some remote
> >>>>> processors based on
> >>>>> the
> >>>>> votes from userspace clients. We have observed cases where this
> >>>>> entity
> >>>>> itself crashes and remote processors are left hanging.
> >>>>
> >>>> Your description makes me feel like this patch is only a workaround
> >>>> of something
> >>>> that
> >>>> should be fixed in the userland, even if i understand that hanging
> >>>> is one of the
> >>>> most
> >>>> critical problem and have to be fixed.
> >>>> For instance, how to handle several applications that interact with
> >>>> the remote
> >>>> processor
> >>>> ( e.g. rpmsg service applications) how to stop and restart
> >>>> everything. Using the
> >>>> char
> >>>> device would probaly resolve only a part of the issue...
> >>>>
> >>>> I'm not aware about your environment and i'm not a userland expert.
> >>>> But what i
> >>>> still not
> >>>> understand why a parent process can not do the job...
> >>>> I just test a simple script on my side that treat the kill -9 of an
> >>>> application
> >>>> ("cat" in my case).
> >>>
> >>> This is not entirely true, if the parent process is killed with a
> >>> SIGKILL, then
> >>> the process will not be able to handle anything and the remoteproc
> >>> will still
> >>> be running.
> >>>
> >>> What I understood from Rishabh patch is a way to allow a single
> >>> process handling
> >>> the rproc state. We have the same kind of need and currently, if the
> >>> user application crashes, then the rproc is still running (which
> >>> happens).
> >>>
> >>>>
> >>>> #start the remote firmware
> >>>> cp $1 /lib/firmware/
> >>>> echo $1> /sys/class/remoteproc/remoteproc0/firmware
> >>>> echo start >/sys/class/remoteproc/remoteproc0/state
> >>>> #your binary
> >>>> cat /dev/kmsg
> >>>> # stop the remote firmware in case of crash (and potentially some
> >>>> other apps)
> >>>> echo stop >/sys/class/remoteproc/remoteproc0/state
> >>>>
> >>>
> >>> This is not really "production proof" and what happens if the
> >>> application is
> >>> responsible of setting the firmware which might be jitted ?
> >>> And if the script receives the SIGKILL, then we are back to the same
> >>> problem.
> >> Yes this is just a basic example, not an implementation which would
> >> depend on
> >> the
> >> environment. i'm just trying here to put forward a multi-process
> >> solution...and
> >> that I'm not an userland expert :).
> >>
> >>>
> >>> I really think, this is a step forward an easier and reliable use of
> >>> the
> >>> remoteproc
> >>> on userland to guarantee a coherent rproc state even if host
> >>> application
> >>> crashes.
> Yes what we want is simple mechanism where a single userspace process
> can boot/
> shutdown the remote processor in all scenarios. Adding more processes to
> monitor
> the already existing process might have 2 issues. One is there might be
> a delay
> between the application crash and process monitor getting to know about
> it and taking
> action. This might prove to be fatal in our case. Second, possibly the
> monitor can hang
> or get killed and is not deterministic.
> >>
> >> I can see 3 ways of handling an application crash:
> >> - just shutdown the firmware
> >> => can be done through char device
> >> - stop some other related processes and/or generate a remote proc
> >> crash dump for
> >> debug
> >> => /sysfs and/or debugfs
> >> - do nothing as you want a silence application reboot and re-attach to
> >> the
> >> running firmware
> >> => use sysfs
> >>
> >> I'm challenging the solution because splitting the API seems to me not
> >> a good
> >> solution.
> >
> > Completely ok with that, we have to fully understand the targeted
> > usecase to
> > avoid implemented a flawed interface.
> >
> >> Now i wonder how it works for the other applications that are relying
> >> on some
> >> other
> >> kernel frameworks...
> >
> > For some other device, there is a chardev. The watchdog for intance
> > uses a
> > /dev/watchdog. Regarding the gpio, it seems they are also using a
> > chardev
> > and the sysfs interface is deprecated.
> >
> >> Perhaps the answer is that these frameworks don't use sysfs but char
> >> device.
> >> That would means that the sysfs solution is not the more adapted
> >> solution and
> >> perhaps we should migrate to a char device.
> >> But in this case, i think that it should implement the whole API and
> >> be
> >> exclusive with
> >> the syfs legacy API (so no sysfs or sysfs in read-only).
> >
> > I agree with that, if another interface must be defined, then it should
> > implement everything that is supported right now with the sysfs.
> >
> The other fields that sysfs exposes right now are firmware_name,
> name(rproc name),
> state. The targeted usecase was that these are configuration parameters
> specific
> to the remoteproc and should stay in the sysfs interface. Whereas char
> device
> should provide direct access to remoteproc device.
> It would make sense to use this interface in conjunction with sysfs
> interface, where you use /dev/remoteproc0 to boot/shutdown the remote
> processor
> sysfs entries to fine tune the parameters.
> Adding ioctls to implement all sysfs functionality seems like overkill
> to me. Let
> me know what you guys think.

Hi Rishabh,

The sysfs interface is not going away - it is part of the API and that
can't be changed anymore. I suggest the new character device
interface strictly implement what is currently needed and leave what
is not for future implementation.

> >
> > ClÃment
> >
> >>
> >> Regards,
> >> Arnaud
> >>
> >>>
> >>> Regards,
> >>>
> >>> ClÃment
> >>>
> >>>> Anyway, it's just my feeling, let other people give their feedback.
> >>>>
> >>>>>> I just want to be sure that there is no alternative to this,
> >>>>>> because
> >>>>>> having two ways
> >>>>>> for application to shutdown the firmware seems to me confusing...
> >>>>> Does making this interface optional/configurable helps?
> >>>>>>
> >>>>>> What about the opposite service, mean inform the application that
> >>>>>> the remote
> >>>>>> processor is crashed?
> >>>>>> Do you identify such need? or the "auto" crash recovery is
> >>>>>> sufficient?
> >>>>> Auto recovery works perfectly for us. Although there is a mechanism
> >>>>> in
> >>>>> place using QMI(Qualcomm MSM interface) that can notify clients
> >>>>> about remote
> >>>>> processor crash.
> >>>>
> >>>> Thanks for the information.
> >>>>
> >>>> Regards
> >>>> Arnaud
> >>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Arnaud
> >>>>>>>
> >>>>>>> Signed-off-by: Rishabh Bhatnagar <rishabhb@xxxxxxxxxxxxxx>
> >>>>>>> ---
> >>>>>>> drivers/remoteproc/Kconfig | 9 +++
> >>>>>>> drivers/remoteproc/Makefile | 1 +
> >>>>>>> drivers/remoteproc/remoteproc_cdev.c | 100
> >>>>>>> +++++++++++++++++++++++++++++++
> >>>>>>> drivers/remoteproc/remoteproc_internal.h | 22 +++++++
> >>>>>>> include/linux/remoteproc.h | 2 +
> >>>>>>> 5 files changed, 134 insertions(+)
> >>>>>>> create mode 100644 drivers/remoteproc/remoteproc_cdev.c
> >>>>>>>
> >>>>>>> diff --git a/drivers/remoteproc/Kconfig
> >>>>>>> b/drivers/remoteproc/Kconfig
> >>>>>>> index de3862c..6374b79 100644
> >>>>>>> --- a/drivers/remoteproc/Kconfig
> >>>>>>> +++ b/drivers/remoteproc/Kconfig
> >>>>>>> @@ -14,6 +14,15 @@ config REMOTEPROC
> >>>>>>>
> >>>>>>> if REMOTEPROC
> >>>>>>>
> >>>>>>> +config REMOTEPROC_CDEV
> >>>>>>> + bool "Remoteproc character device interface"
> >>>>>>> + help
> >>>>>>> + Say y here to have a character device interface for
> >>>>>>> Remoteproc
> >>>>>>> + framework. Userspace can boot/shutdown remote processors
> >>>>>>> through
> >>>>>>> + this interface.
> >>>>>>> +
> >>>>>>> + It's safe to say N if you don't want to use this
> >>>>>>> interface.
> >>>>>>> +
> >>>>>>> config IMX_REMOTEPROC
> >>>>>>> tristate "IMX6/7 remoteproc support"
> >>>>>>> depends on ARCH_MXC
> >>>>>>> diff --git a/drivers/remoteproc/Makefile
> >>>>>>> b/drivers/remoteproc/Makefile
> >>>>>>> index e30a1b1..b7d4f77 100644
> >>>>>>> --- a/drivers/remoteproc/Makefile
> >>>>>>> +++ b/drivers/remoteproc/Makefile
> >>>>>>> @@ -9,6 +9,7 @@ remoteproc-y +=
> >>>>>>> remoteproc_debugfs.o
> >>>>>>> remoteproc-y += remoteproc_sysfs.o
> >>>>>>> remoteproc-y += remoteproc_virtio.o
> >>>>>>> remoteproc-y += remoteproc_elf_loader.o
> >>>>>>> +obj-$(CONFIG_REMOTEPROC_CDEV) += remoteproc_cdev.o
> >>>>>>> obj-$(CONFIG_IMX_REMOTEPROC) += imx_rproc.o
> >>>>>>> obj-$(CONFIG_MTK_SCP) += mtk_scp.o mtk_scp_ipi.o
> >>>>>>> obj-$(CONFIG_OMAP_REMOTEPROC) += omap_remoteproc.o
> >>>>>>> diff --git a/drivers/remoteproc/remoteproc_cdev.c
> >>>>>>> b/drivers/remoteproc/remoteproc_cdev.c
> >>>>>>> new file mode 100644
> >>>>>>> index 0000000..8182bd1
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/drivers/remoteproc/remoteproc_cdev.c
> >>>>>>> @@ -0,0 +1,100 @@
> >>>>>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>>>>> +/*
> >>>>>>> + * Character device interface driver for Remoteproc framework.
> >>>>>>> + *
> >>>>>>> + * Copyright (c) 2020, The Linux Foundation. All rights
> >>>>>>> reserved.
> >>>>>>> + */
> >>>>>>> +
> >>>>>>> +#include <linux/cdev.h>
> >>>>>>> +#include <linux/fs.h>
> >>>>>>> +#include <linux/module.h>
> >>>>>>> +#include <linux/mutex.h>
> >>>>>>> +#include <linux/remoteproc.h>
> >>>>>>> +
> >>>>>>> +#include "remoteproc_internal.h"
> >>>>>>> +
> >>>>>>> +#define NUM_RPROC_DEVICES 64
> >>>>>>> +static dev_t rproc_cdev;
> >>>>>>> +static DEFINE_IDA(cdev_minor_ida);
> >>>>>>> +
> >>>>>>> +static int rproc_cdev_open(struct inode *inode, struct file
> >>>>>>> *file)
> >>>>>>> +{
> >>>>>>> + struct rproc *rproc;
> >>>>>>> +
> >>>>>>> + rproc = container_of(inode->i_cdev, struct rproc, char_dev);
> >>>>>>> +
> >>>>>>> + if (!rproc)
> >>>>>>> + return -EINVAL;
> >>>>>>> +
> >>>>>>> + if (rproc->state == RPROC_RUNNING)
> >>>>>>> + return -EBUSY;
> >>>>>>> +
> >>>>>>> + return rproc_boot(rproc);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static int rproc_cdev_release(struct inode *inode, struct file
> >>>>>>> *file)
> >>>>>>> +{
> >>>>>>> + struct rproc *rproc;
> >>>>>>> +
> >>>>>>> + rproc = container_of(inode->i_cdev, struct rproc, char_dev);
> >>>>>>> +
> >>>>>>> + if (!rproc || rproc->state != RPROC_RUNNING)
> >>>>>>> + return -EINVAL;
> >>>>>>> +
> >>>>>>> + rproc_shutdown(rproc);
> >>>>>>> +
> >>>>>>> + return 0;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static const struct file_operations rproc_fops = {
> >>>>>>> + .open = rproc_cdev_open,
> >>>>>>> + .release = rproc_cdev_release,
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +int rproc_char_device_add(struct rproc *rproc)
> >>>>>>> +{
> >>>>>>> + int ret, minor;
> >>>>>>> + dev_t cdevt;
> >>>>>>> +
> >>>>>>> + minor = ida_simple_get(&cdev_minor_ida, 0,
> >>>>>>> NUM_RPROC_DEVICES,
> >>>>>>> + GFP_KERNEL);
> >>>>>>> + if (minor < 0) {
> >>>>>>> + dev_err(&rproc->dev, "%s: No more minor numbers left!
> >>>>>>> rc:%d\n",
> >>>>>>> + __func__, minor);
> >>>>>>> + return -ENODEV;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + cdev_init(&rproc->char_dev, &rproc_fops);
> >>>>>>> + rproc->char_dev.owner = THIS_MODULE;
> >>>>>>> +
> >>>>>>> + cdevt = MKDEV(MAJOR(rproc_cdev), minor);
> >>>>>>> + ret = cdev_add(&rproc->char_dev, cdevt, 1);
> >>>>>>> + if (ret < 0)
> >>>>>>> + ida_simple_remove(&cdev_minor_ida, minor);
> >>>>>>> +
> >>>>>>> + rproc->dev.devt = cdevt;
> >>>>>>> + return ret;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +void rproc_char_device_remove(struct rproc *rproc)
> >>>>>>> +{
> >>>>>>> + __unregister_chrdev(MAJOR(rproc->dev.devt),
> >>>>>>> MINOR(rproc->dev.devt), 1,
> >>>>>>> + "rproc");
> >>>>>>> + ida_simple_remove(&cdev_minor_ida, MINOR(rproc->dev.devt));
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +void __init rproc_init_cdev(void)
> >>>>>>> +{
> >>>>>>> + int ret;
> >>>>>>> +
> >>>>>>> + ret = alloc_chrdev_region(&rproc_cdev, 0, NUM_RPROC_DEVICES,
> >>>>>>> "rproc");
> >>>>>>> + if (ret < 0) {
> >>>>>>> + pr_err("Failed to alloc rproc_cdev region, err %d\n",
> >>>>>>> ret);
> >>>>>>> + return;
> >>>>>>> + }
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +void __exit rproc_exit_cdev(void)
> >>>>>>> +{
> >>>>>>> + __unregister_chrdev(MAJOR(rproc_cdev), 0, NUM_RPROC_DEVICES,
> >>>>>>> "rproc");
> >>>>>>> +}
> >>>>>>> diff --git a/drivers/remoteproc/remoteproc_internal.h
> >>>>>>> b/drivers/remoteproc/remoteproc_internal.h
> >>>>>>> index 493ef92..28d61a1 100644
> >>>>>>> --- a/drivers/remoteproc/remoteproc_internal.h
> >>>>>>> +++ b/drivers/remoteproc/remoteproc_internal.h
> >>>>>>> @@ -47,6 +47,27 @@ struct dentry *rproc_create_trace_file(const
> >>>>>>> char *name,
> >>>>>>> struct rproc *rproc,
> >>>>>>> int rproc_init_sysfs(void);
> >>>>>>> void rproc_exit_sysfs(void);
> >>>>>>>
> >>>>>>> +#ifdef CONFIG_REMOTEPROC_CDEV
> >>>>>>> +void rproc_init_cdev(void);
> >>>>>>> +void rproc_exit_cdev(void);
> >>>>>>> +int rproc_char_device_add(struct rproc *rproc);
> >>>>>>> +void rproc_char_device_remove(struct rproc *rproc);
> >>>>>>> +#else
> >>>>>>> +static inline void rproc_init_cdev(void)
> >>>>>>> +{
> >>>>>>> +}
> >>>>>>> +static inline void rproc_exit_cdev(void)
> >>>>>>> +{
> >>>>>>> +}
> >>>>>>> +static inline int rproc_char_device_add(struct rproc *rproc)
> >>>>>>> +{
> >>>>>>> + return 0;
> >>>>>>> +}
> >>>>>>> +static inline void rproc_char_device_remove(struct rproc
> >>>>>>> *rproc)
> >>>>>>> +{
> >>>>>>> +}
> >>>>>>> +#endif
> >>>>>>> +
> >>>>>>> void rproc_free_vring(struct rproc_vring *rvring);
> >>>>>>> int rproc_alloc_vring(struct rproc_vdev *rvdev, int i);
> >>>>>>>
> >>>>>>> @@ -63,6 +84,7 @@ struct resource_table
> >>>>>>> *rproc_elf_find_loaded_rsc_table(struct
> >>>>>>> rproc *rproc,
> >>>>>>> struct rproc_mem_entry *
> >>>>>>> rproc_find_carveout_by_name(struct rproc *rproc, const char
> >>>>>>> *name, ...);
> >>>>>>>
> >>>>>>> +
> >>>>>>> static inline
> >>>>>>> int rproc_fw_sanity_check(struct rproc *rproc, const struct
> >>>>>>> firmware *fw)
> >>>>>>> {
> >>>>>>> diff --git a/include/linux/remoteproc.h
> >>>>>>> b/include/linux/remoteproc.h
> >>>>>>> index 16ad666..c4ca796 100644
> >>>>>>> --- a/include/linux/remoteproc.h
> >>>>>>> +++ b/include/linux/remoteproc.h
> >>>>>>> @@ -37,6 +37,7 @@
> >>>>>>>
> >>>>>>> #include <linux/types.h>
> >>>>>>> #include <linux/mutex.h>
> >>>>>>> +#include <linux/cdev.h>
> >>>>>>> #include <linux/virtio.h>
> >>>>>>> #include <linux/completion.h>
> >>>>>>> #include <linux/idr.h>
> >>>>>>> @@ -514,6 +515,7 @@ struct rproc {
> >>>>>>> bool auto_boot;
> >>>>>>> struct list_head dump_segments;
> >>>>>>> int nb_vdev;
> >>>>>>> + struct cdev char_dev;
> >>>>>>> };
> >>>>>>>
> >> >>>>> /**