Re: [PATCH 0/9] Samsung Trinity NPU device driver

From: Greg KH
Date: Mon Jul 25 2022 - 05:02:23 EST


On Mon, Jul 25, 2022 at 03:52:59PM +0900, Jiho Chu wrote:
> Hello,
>
> My name is Jiho Chu, and working for device driver and system daemon for
> several years at Samsung Electronics.
>
> Trinity Neural Processing Unit (NPU) series are hardware accelerators
> for neural network processing in embedded systems, which are integrated
> into application processors or SoCs. Trinity NPU is compatible with AMBA
> bus architecture and first launched in 2018 with its first version for
> vision processing, Trinity Version1 (TRIV1). Its second version, TRIV2,
> is released in Dec, 2021. Another Trinity NPU for audio processing is
> referred as TRIA.
>
> TRIV2 is shipped for many models of 2022 Samsung TVs, providing
> acceleration for various AI-based applications, which include image
> recognition and picture quality improvements for streaming video, which
> can be accessed via GStreamer and its neural network plugins,
> NNStreamer.
>
> In this patch set, it includes Trinity Vision 2 kernel device driver.
> Trinity Vision 2 supports accelerating image inference process for
> Convolution Neural Network (CNN). The CNN workload is executed by Deep
> Learning Accelerator (DLA), and general Neural Network Layers are
> executed by Digital Signal Processor (DSP). And there is a Control
> Processor (CP) which can control DLA and DSP. These three IPs (DLA, DSP,
> CP) are composing Trinity Vision 2 NPU, and the device driver mainly
> supervise the CP to manage entire NPU.
>
> Controlling DLA and DSP operations is performed with internal command
> instructions. and the instructions for the Trinity is similar with
> general processor's ISA, but it is specialized for Neural Processing
> operations. The virtual ISA (vISA) is designed for calculating multiple
> data with single operation, like modern SIMD processor. The device
> driver loads a program to CP at start up, and the program can decode a
> binary which is built with the vISA. We calls this decoding program as a
> Instruction Decoding Unit (IDU) program. While running the NPU, the CP
> executes IDU program to fetch and decode instructions which made up of
> vISA, by the scheduling policy of the device driver.
>
> These DLA, DSP and CP are loosely coupled using ARM's AMBA, so the
> Trinity can easily communicate with most ARM processors. Each IPs
> designed to have memory-mapped registers which can be used to control
> the IP, and the CP provides Wait-For-Event (WFE) operation to subscribe
> interrupt signals from the DLA and DSP. Also, embedded Direct Memory
> Access Controller (DMAC) manages data communications between internal
> SRAM and outer main memory, IOMMU module supports unified memory space.
>
> A user can control the Trinity NPU with IOCTLs provided by driver. These
> controls includes memory management operations to transfer model data
> (HWMEM_ALLOC/HWMEM_DEALLOC), NPU workload control operations to submit
> workload (RUN/STOP), and statistics operations to check current NPU
> status. (STAT)
>
> The device driver also implemented features for developers. It provides
> sysfs control attributes like stop, suspend, sched_test, and profile.
> Also, it provides status attributes like app status, a number of total
> requests, a number of active requests and memory usages. For the tracing
> operations, several ftrace events are defined and embedded for several
> important points.

If you have created sysfs files, you need to document them in
Documentation/ABI/ which I do not see in your diffstat. Perhaps add
that for your next respin?

Also, please remove the "tracing" logic you have in the code, use
ftrace, don't abuse dev_info() everywhere, that's not needed at all.

thanks,

greg k-h