Re: [PATCH 00/83] AMD HSA kernel driver

From: Gabbay, Oded
Date: Thu Jul 10 2014 - 18:51:47 EST


On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > This patch set implements a Heterogeneous System Architecture
> > (HSA) driver
> > for radeon-family GPUs.
>
> This is just quick comments on few things. Given size of this, people
> will need to have time to review things.
>
> > HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to
> > share
> > system resources more effectively via HW features including
> > shared pageable
> > memory, userspace-accessible work queues, and platform-level
> > atomics. In
> > addition to the memory protection mechanisms in GPUVM and
> > IOMMUv2, the Sea
> > Islands family of GPUs also performs HW-level validation of
> > commands passed
> > in through the queues (aka rings).
> > The code in this patch set is intended to serve both as a sample
> > driver for
> > other HSA-compatible hardware devices and as a production driver
> > for
> > radeon-family processors. The code is architected to support
> > multiple CPUs
> > each with connected GPUs, although the current implementation
> > focuses on a
> > single Kaveri/Berlin APU, and works alongside the existing radeon
> > kernel
> > graphics driver (kgd).
> > AMD GPUs designed for use with HSA (Sea Islands and up) share
> > some hardware
> > functionality between HSA compute and regular gfx/compute (memory,
> > interrupts, registers), while other functionality has been added
> > specifically for HSA compute (hw scheduler for virtualized
> > compute rings).
> > All shared hardware is owned by the radeon graphics driver, and
> > an interface
> > between kfd and kgd allows the kfd to make use of those shared
> > resources,
> > while HSA-specific functionality is managed directly by kfd by
> > submitting
> > packets into an HSA-specific command queue (the "HIQ").
> > During kfd module initialization a char device node (/dev/kfd) is
> > created
> > (surviving until module exit), with ioctls for queue creation &
> > management,
> > and data structures are initialized for managing HSA device
> > topology.
> > The rest of the initialization is driven by calls from the radeon
> > kgd at
> > the following points :
> > - radeon_init (kfd_init)
> > - radeon_exit (kfd_fini)
> > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > - radeon_driver_unload_kms (kfd_device_fini)
> > During the probe and init processing per-device data structures
> > are
> > established which connect to the associated graphics kernel
> > driver. This
> > information is exposed to userspace via sysfs, along with a
> > version number
> > allowing userspace to determine if a topology change has occurred
> > while it
> > was reading from sysfs.
> > The interface between kfd and kgd also allows the kfd to request
> > buffer
> > management services from kgd, and allows kgd to route interrupt
> > requests to
> > kfd code since the interrupt block is shared between regular
> > graphics/compute and HSA compute subsystems in the GPU.
> > The kfd code works with an open source usermode library
> > ("libhsakmt") which
> > is in the final stages of IP review and should be published in a
> > separate
> > repo over the next few days.
> > The code operates in one of three modes, selectable via the
> > sched_policy
> > module parameter :
> > - sched_policy=0 uses a hardware scheduler running in the MEC
> > block within
> > CP, and allows oversubscription (more queues than HW slots)
> > - sched_policy=1 also uses HW scheduling but does not allow
> > oversubscription, so create_queue requests fail when we run out
> > of HW slots
> > - sched_policy=2 does not use HW scheduling, so the driver
> > manually assigns
> > queues to HW slots by programming registers
> > The "no HW scheduling" option is for debug & new hardware bringup
> > only, so
> > has less test coverage than the other options. Default in the
> > current code
> > is "HW scheduling without oversubscription" since that is where
> > we have the
> > most test coverage but we expect to change the default to "HW
> > scheduling
> > with oversubscription" after further testing. This effectively
> > removes the
> > HW limit on the number of work queues available to applications.
> > Programs running on the GPU are associated with an address space
> > through the
> > VMID field, which is translated to a unique PASID at access time
> > via a set
> > of 16 VMID-to-PASID mapping registers. The available VMIDs
> > (currently 16)
> > are partitioned (under control of the radeon kgd) between current
> > gfx/compute and HSA compute, with each getting 8 in the current
> > code. The
> > VMID-to-PASID mapping registers are updated by the HW scheduler
> > when used,
> > and by driver code if HW scheduling is not being used.
> > The Sea Islands compute queues use a new "doorbell" mechanism
> > instead of the
> > earlier kernel-managed write pointer registers. Doorbells use a
> > separate BAR
> > dedicated for this purpose, and pages within the doorbell
> > aperture are
> > mapped to userspace (each page mapped to only one user address
> > space).
> > Writes to the doorbell aperture are intercepted by GPU hardware,
> > allowing
> > userspace code to safely manage work queues (rings) without
> > requiring a
> > kernel call for every ring update.
> > First step for an application process is to open the kfd device.
> > Calls to
> > open create a kfd "process" structure only for the first thread
> > of the
> > process. Subsequent open calls are checked to see if they are
> > from processes
> > using the same mm_struct and, if so, don't do anything. The kfd
> > per-process
> > data lives as long as the mm_struct exists. Each mm_struct is
> > associated
> > with a unique PASID, allowing the IOMMUv2 to make userspace
> > process memory
> > accessible to the GPU.
> > Next step is for the application to collect topology information
> > via sysfs.
> > This gives userspace enough information to be able to identify
> > specific
> > nodes (processors) in subsequent queue management calls.
> > Application
> > processes can create queues on multiple processors, and
> > processors support
> > queues from multiple processes.
>
> I am not a fan to use sysfs to discover topoly.
>
> > At this point the application can create work queues in userspace
> > memory and
> > pass them through the usermode library to kfd to have them mapped
> > onto HW
> > queue slots so that commands written to the queues can be
> > executed by the
> > GPU. Queue operations specify a processor node, and so the bulk
> > of this code
> > is device-specific.
> > Written by John Bridgman <John.Bridgman@xxxxxxx>
>
> So general comment is you need to collapse many patches things like
> 58 fixing
> kernel style should be collapsed ie fix all previous patch that have
> broken
> style.
>
> Even worse is thing like 71, removing code you just added few patch
> earlier
> in the patchset.
Not quite, the code was added on patch 11.

> This means that if i start reviewing following
> patch order
> i might spend time on code that is later deleted/modified/fixed ie
> time i
> spend understanding and reading some code might be just wasted.
Quick answer is that you are of course right, but having said that, I
think it would be not simple at all to do that squashing.
I squashed what I could, and probably I can do a little more (like
58), but the major problem is that one of the main modules of the
driver - the scheduler (QCM) - was completely re-written (patches
46-53). Therefore, from patch 1 to 53, we use the old scheduler code
and from 54 we use the new QCM (and the old scheduler code was
completely remove at 71). So I could maybe squash 71 into 54, but that
won't help you much, I think.

So, the current advice I can give is to completely ignore the
following files because they do not exist in the final commit:
- kfd_sched_cik_static.c (stopped using in 54)
- kfd_registers.c (stopped using in 81 because we moved all register
writes to radeon)
- kfd_hw_pointer_store.c (alive from 46 to 67)
- kfd_hw_pointer_store.h (alive from 46 to 67)

Oded
> I will come back with individual patch comments and also general
> remarks.
>
> Cheers,
> JÃrÃme
>
>
> > Alexey Skidanov (4):
> > hsa/radeon: 32-bit processes support
> > hsa/radeon: NULL pointer dereference bug workaround
> > hsa/radeon: HSA64/HSA32 modes support
> > hsa/radeon: Add local memory to topology
> > Andrew Lewycky (3):
> > hsa/radeon: Make binding of process to device permanent
> > hsa/radeon: Implement hsaKmtSetMemoryPolicy
> > mm: Change timing of notification to IOMMUs about a page to be
> > invalidated
> > Ben Goz (20):
> > hsa/radeon: Add queue and hw_pointer_store modules
> > hsa/radeon: Add support allocating kernel doorbells
> > hsa/radeon: Add mqd_manager module
> > hsa/radeon: Add kernel queue support for KFD
> > hsa/radeon: Add module parameter of scheduling policy
> > hsa/radeon: Add packet manager module
> > hsa/radeon: Add process queue manager module
> > hsa/radeon: Add device queue manager module
> > hsa/radeon: Switch to new queue scheduler
> > hsa/radeon: Add IOCTL for update queue
> > hsa/radeon: Queue Management integration with Memory Management
> > hsa/radeon: update queue fault handling
> > hsa/radeon: fixing a bug to support 32b processes
> > hsa/radeon: Fix number of pipes per ME
> > hsa/radeon: Removing hw pointer store module
> > hsa/radeon: Adding some error messages
> > hsa/radeon: Fixing minor issues with kernel queues (DIQ)
> > drm/radeon: Add register access functions to kfd2kgd interface
> > hsa/radeon: Eliminating all direct register accesses
> > drm/radeon: Remove lock functions from kfd2kgd interface
> > Evgeny Pinchuk (9):
> > hsa/radeon: fix the OEMID assignment in kfd_topology
> > drm/radeon: extending kfd-kgd interface
> > hsa/radeon: implementing IOCTL for clock counters
> > drm/radeon: adding synchronization for GRBM GFX
> > hsa/radeon: fixing clock counters bug
> > drm/radeon: Extending kfd interface
> > hsa/radeon: Adding max clock speeds to topology
> > hsa/radeon: Alternating the source of max clock
> > hsa/radeon: Exclusive access for perf. counters
> > Michael Varga (1):
> > hsa/radeon: debugging print statements
> > Oded Gabbay (45):
> > mm: Add kfd_process pointer to mm_struct
> > drm/radeon: reduce number of free VMIDs and pipes in KV
> > drm/radeon: Report doorbell configuration to kfd
> > drm/radeon: Add radeon <--> kfd interface
> > drm/radeon: Add kfd-->kgd interface to get virtual ram size
> > drm/radeon: Add kfd-->kgd interfaces of memory
> > allocation/mapping
> > drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl
> > register
> > drm/radeon: Add calls to initialize and finalize kfd from radeon
> > hsa/radeon: Add code base of hsa driver for AMD's GPUs
> > hsa/radeon: Add initialization and unmapping of doorbell
> > aperture
> > hsa/radeon: Add scheduler code
> > hsa/radeon: Add kfd mmap handler
> > hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and
> > DESTROY_QUEUE
> > hsa/radeon: Update MAINTAINERS and CREDITS files
> > hsa/radeon: Add interrupt handling module
> > hsa/radeon: Add the isr function of the KFD scehduler
> > hsa/radeon: Handle deactivation of queues using interrupts
> > hsa/radeon: Enable interrupts in KFD scheduler
> > hsa/radeon: Enable/Disable KFD interrupt module
> > hsa/radeon: Add interrupt callback function to kgd2kfd interface
> > hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> > drm/radeon: Add calls to suspend and resume of kfd driver
> > drm/radeon/cik: Don't touch int of pipes 1-7
> > drm/radeon/cik: Call kfd isr function
> > hsa/radeon: Fix memory size allocated for HPD
> > hsa/radeon: Fix list of supported devices
> > hsa/radeon: Fix coding style in cik_int.h
> > hsa/radeon: Print ioctl commnad only in debug mode
> > hsa/radeon: Print ISR info only in debug mode
> > hsa/radeon: Workaround for a bug in amd_iommu
> > hsa/radeon: Eliminate warnings in compilation
> > hsa/radeon: Various kernel styling fixes
> > hsa/radeon: Rearrange structures in kfd_ioctl.h
> > hsa/radeon: change another pr_info to pr_debug
> > hsa/radeon: Fix timeout calculation in sync_with_hw
> > hsa/radeon: Update module information and version
> > hsa/radeon: Update module version to 0.6.0
> > hsa/radeon: Fix initialization of sh_mem registers
> > hsa/radeon: Fix compilation warnings
> > hsa/radeon: Remove old scheduler code
> > hsa/radeon: Static analysis (smatch) fixes
> > hsa/radeon: Check oversubscription before destroying runlist
> > hsa/radeon: Don't verify cksum when parsing CRAT table
> > hsa/radeon: Update module version to 0.6.1
> > hsa/radeon: Update module version to 0.6.2
> > Yair Shachar (1):
> > hsa/radeon: Adding qcm fence return status
> > CREDITS | 7 +
> > MAINTAINERS | 8 +
> > drivers/Kconfig | 2 +
> > drivers/gpu/Makefile | 1 +
> > drivers/gpu/drm/radeon/Makefile | 1 +
> > drivers/gpu/drm/radeon/cik.c | 156 +--
> > drivers/gpu/drm/radeon/cikd.h | 51 +-
> > drivers/gpu/drm/radeon/radeon.h | 9 +
> > drivers/gpu/drm/radeon/radeon_device.c | 32 +
> > drivers/gpu/drm/radeon/radeon_drv.c | 6 +
> > drivers/gpu/drm/radeon/radeon_kfd.c | 630
> > ++++++++++
> > drivers/gpu/drm/radeon/radeon_kms.c | 9 +
> > drivers/gpu/hsa/Kconfig | 20 +
> > drivers/gpu/hsa/Makefile | 1 +
> > drivers/gpu/hsa/radeon/Makefile | 12 +
> > drivers/gpu/hsa/radeon/cik_int.h | 50 +
> > drivers/gpu/hsa/radeon/cik_mqds.h | 250 ++++
> > drivers/gpu/hsa/radeon/cik_regs.h | 220 ++++
> > drivers/gpu/hsa/radeon/kfd_aperture.c | 123 ++
> > drivers/gpu/hsa/radeon/kfd_chardev.c | 530
> > +++++++++
> > drivers/gpu/hsa/radeon/kfd_crat.h | 294 +++++
> > drivers/gpu/hsa/radeon/kfd_device.c | 244 ++++
> > drivers/gpu/hsa/radeon/kfd_device_queue_manager.c | 981
> > ++++++++++++++++
> > drivers/gpu/hsa/radeon/kfd_device_queue_manager.h | 101 ++
> > drivers/gpu/hsa/radeon/kfd_doorbell.c | 242 ++++
> > drivers/gpu/hsa/radeon/kfd_interrupt.c | 177 +++
> > drivers/gpu/hsa/radeon/kfd_kernel_queue.c | 305 +++++
> > drivers/gpu/hsa/radeon/kfd_kernel_queue.h | 66 ++
> > drivers/gpu/hsa/radeon/kfd_module.c | 130 +++
> > drivers/gpu/hsa/radeon/kfd_mqd_manager.c | 290 +++++
> > drivers/gpu/hsa/radeon/kfd_mqd_manager.h | 54 +
> > drivers/gpu/hsa/radeon/kfd_packet_manager.c | 488
> > ++++++++
> > drivers/gpu/hsa/radeon/kfd_pasid.c | 97 ++
> > drivers/gpu/hsa/radeon/kfd_pm4_headers.h | 682
> > +++++++++++
> > drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h | 107 ++
> > drivers/gpu/hsa/radeon/kfd_priv.h | 475
> > ++++++++
> > drivers/gpu/hsa/radeon/kfd_process.c | 391 +++++++
> > drivers/gpu/hsa/radeon/kfd_process_queue_manager.c | 343 ++++++
> > drivers/gpu/hsa/radeon/kfd_queue.c | 109 ++
> > drivers/gpu/hsa/radeon/kfd_scheduler.h | 72 ++
> > drivers/gpu/hsa/radeon/kfd_topology.c | 1207
> > ++++++++++++++++++++
> > drivers/gpu/hsa/radeon/kfd_topology.h | 168 +++
> > drivers/gpu/hsa/radeon/kfd_vidmem.c | 97 ++
> > include/linux/mm_types.h | 14 +
> > include/linux/radeon_kfd.h | 106 ++
> > include/uapi/linux/kfd_ioctl.h | 133 +++
> > mm/rmap.c | 8 +-
> > 47 files changed, 9402 insertions(+), 97 deletions(-)
> > create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > create mode 100644 drivers/gpu/hsa/Kconfig
> > create mode 100644 drivers/gpu/hsa/Makefile
> > create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > create mode 100644
> > drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > create mode 100644
> > drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > create mode 100644
> > drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > create mode 100644 include/linux/radeon_kfd.h
> > create mode 100644 include/uapi/linux/kfd_ioctl.h
> > --
> > 1.9.1