Re: [EXTERNAL] Re: [PATCH v5 08/17] drm/imagination: Add GEM and VM related code

From: Sarah Walker
Date: Fri Aug 18 2023 - 10:20:33 EST


On Fri, 2023-08-18 at 00:42 +0200, Jann Horn wrote:
> *** CAUTION: This email originates from a source not known to Imagination Technologies. Think before you click a link or open an attachment ***
>
> Hi!
>
> Thanks, I think it's great that Imagination is writing an upstream
> driver for PowerVR. :)
>
> On Wed, Aug 16, 2023 at 10:25 AM Sarah Walker <sarah.walker@xxxxxxxxxx> wrote:
> > +#define PVR_BO_CPU_CACHED BIT_ULL(63)
> > +
> > +#define PVR_BO_FW_NO_CLEAR_ON_RESET BIT_ULL(62)
> > +
> > +/* Bits 62..3 are undefined. */
> > +/* Bits 2..0 are defined in the UAPI. */
> > +
> > +/* Other utilities. */
> > +#define PVR_BO_UNDEFINED_MASK GENMASK_ULL(61, 3)
> > +#define PVR_BO_RESERVED_MASK (PVR_BO_UNDEFINED_MASK | GENMASK_ULL(63, 63))
>
> In commit 1a9c568fb559 ("drm/imagination: Rework firmware object
> initialisation") in powervr-next, PVR_BO_FW_NO_CLEAR_ON_RESET (bit 62)
> was added in the kernel-only flags group, but the mask
> PVR_BO_RESERVED_MASK (which is used in pvr_ioctl_create_bo to detect
> kernel-only and reserved flags) looks like it wasn't changed to
> include bit 62. I think that means it becomes possible for userspace
> to pass this bit in via pvr_ioctl_create_bo()?

Yes, this is a bug. Will fix (and refactor a bit).

> > +/**
> > + * pvr_page_table_l2_entry_raw_set() - Write a valid entry into a raw level 2
> > + * page table.
> > + * @entry: Target raw level 2 page table entry.
> > + * @child_table_dma_addr: DMA address of the level 1 page table to be
> > + * associated with @entry.
> > + *
> > + * When calling this function, @child_table_dma_addr must be a valid DMA
> > + * address and a multiple of %ROGUE_MMUCTRL_PC_DATA_PD_BASE_ALIGNSIZE.
> > + */
> > +static void
> > +pvr_page_table_l2_entry_raw_set(struct pvr_page_table_l2_entry_raw *entry,
> > + dma_addr_t child_table_dma_addr)
> > +{
> > + child_table_dma_addr >>= ROGUE_MMUCTRL_PC_DATA_PD_BASE_ALIGNSHIFT;
> > +
> > + entry->val =
> > + PVR_PAGE_TABLE_FIELD_PREP(2, PC, VALID, true) |
> > + PVR_PAGE_TABLE_FIELD_PREP(2, PC, ENTRY_PENDING, false) |
> > + PVR_PAGE_TABLE_FIELD_PREP(2, PC, PD_BASE, child_table_dma_addr);
> > +}
>
> For this function and others that manipulate page table entries,
> please use some kernel helper that ensures that the store can't tear
> (at least WRITE_ONCE() - that can still tear on 32-bit, but I see the
> driver depends on ARM64, so that's not a problem).

Will do.

> > +/**
> > + * pvr_page_table_l2_insert() - Insert an entry referring to a level 1 page
> > + * table into a level 2 page table.
> > + * @op_ctx: Target MMU op context pointing at the entry to insert the L1 page
> > + * table into.
> > + * @child_table: Target level 1 page table to be referenced by the new entry.
> > + *
> > + * It is the caller's responsibility to ensure @op_ctx.curr_page points to a
> > + * valid L2 entry.
> > + */
> > +static void
> > +pvr_page_table_l2_insert(struct pvr_mmu_op_context *op_ctx,
> > + struct pvr_page_table_l1 *child_table)
> > +{
> > + struct pvr_page_table_l2 *l2_table =
> > + &op_ctx->mmu_ctx->page_table_l2;
> > + struct pvr_page_table_l2_entry_raw *entry_raw =
> > + pvr_page_table_l2_get_entry_raw(l2_table,
> > + op_ctx->curr_page.l2_idx);
> > +
> > + pvr_page_table_l2_entry_raw_set(entry_raw,
> > + child_table->backing_page.dma_addr);
>
> Can you maybe add comments in functions that set page table entries to
> document who is responsible for using a memory barrier (like wmb()) to
> ensure that the creation of a page table entry is ordered after the
> thing it points to is fully initialized, so that the GPU can't end up
> concurrently walking into a page table and observe its old contents
> from before it was zero-initialized?

Will do.

> > +static int
> > +pvr_page_table_l1_get_or_insert(struct pvr_mmu_op_context *op_ctx,
> > + bool should_insert)
> > +{
> > + struct pvr_page_table_l2 *l2_table =
> > + &op_ctx->mmu_ctx->page_table_l2;
> > + struct pvr_page_table_l1 *table;
> > + int err;
> > +
> > + if (pvr_page_table_l2_entry_is_valid(l2_table,
> > + op_ctx->curr_page.l2_idx)) {
> > + op_ctx->curr_page.l1_table =
> > + l2_table->entries[op_ctx->curr_page.l2_idx];
> > + return 0;
> > + }
> > +
> > + if (!should_insert)
> > + return -ENXIO;
> > +
> > + /* Take a prealloced table. */
> > + table = op_ctx->l1_free_tables;
> > + if (!table)
> > + return -ENOMEM;
> > +
> > + err = pvr_page_table_l1_init(table, op_ctx->mmu_ctx->pvr_dev);
>
> I think when we have a preallocated table here, it was allocated in
> pvr_page_table_l1_alloc(), which already called
> pvr_page_table_l1_init()? So it looks to me like this second
> pvr_page_table_l1_init() call will allocate another page and leak the
> old allocation.

Yes, this is also a bug. Will address.

> +/**
> > + * pvr_mmu_op_context_create() - Create an MMU op context.
> > + * @ctx: MMU context associated with owning VM context.
> > + * @sgt: Scatter gather table containing pages pinned for use by this context.
> > + * @sgt_offset: Start offset of the requested device-virtual memory mapping.
> > + * @size: Size in bytes of the requested device-virtual memory mapping. For an
> > + * unmapping, this should be zero so that no page tables are allocated.
> > + *
> > + * Returns:
> > + * * Newly created MMU op context object on success, or
> > + * * -%ENOMEM if no memory is available,
> > + * * Any error code returned by pvr_page_table_l2_init().
> > + */
> > +struct pvr_mmu_op_context *
> > +pvr_mmu_op_context_create(struct pvr_mmu_context *ctx, struct sg_table *sgt,
> > + u64 sgt_offset, u64 size)
> > +{
> > + int err;
> > +
> > + struct pvr_mmu_op_context *op_ctx =
> > + kzalloc(sizeof(*op_ctx), GFP_KERNEL);
> > +
> > + if (!op_ctx)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + op_ctx->mmu_ctx = ctx;
> > + op_ctx->map.sgt = sgt;
> > + op_ctx->map.sgt_offset = sgt_offset;
> > + op_ctx->sync_level_required = PVR_MMU_SYNC_LEVEL_NONE;
> > +
> > + if (size) {
> > + const u32 l1_start_idx = pvr_page_table_l2_idx(sgt_offset);
> > + const u32 l1_end_idx = pvr_page_table_l2_idx(sgt_offset + size);
> > + const u32 l1_count = l1_end_idx - l1_start_idx + 1;
> > + const u32 l0_start_idx = pvr_page_table_l1_idx(sgt_offset);
> > + const u32 l0_end_idx = pvr_page_table_l1_idx(sgt_offset + size);
> > + const u32 l0_count = l0_end_idx - l0_start_idx + 1;
>
> Shouldn't the page table indices be calculated from the device_addr
> (which is not currently passed in by pvr_vm_map())? As far as I can
> tell, sgt_offset doesn't have anything to do with the device address
> at which this mapping will be inserted in the page tables?

This code is correct, but badly documented; this function only cares about the
number of l0/l1 pages required, not the address. Will improve it to make less
confusing.

Thanks,
Sarah