Re: [PATCH] [Draft]: media: videobuf2-dma-heap: add a vendor defined memory routine

From: Hsia-Jun Li
Date: Thu Aug 18 2022 - 02:31:17 EST




On 8/18/22 13:50, Tomasz Figa wrote:


Hi Randy,

Sorry for the late reply, I went on vacation last week.

On Sun, Aug 7, 2022 at 12:23 AM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:



On 8/5/22 18:09, Tomasz Figa wrote:


On Tue, Aug 2, 2022 at 9:21 PM ayaka <ayaka@xxxxxxxxxxx> wrote:

Sorry, the previous one contains html data.

On Aug 2, 2022, at 3:33 PM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:

On Mon, Aug 1, 2022 at 8:43 PM ayaka <ayaka@xxxxxxxxxxx> wrote:
On Aug 1, 2022, at 5:46 PM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
On Mon, Aug 1, 2022 at 3:44 PM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:
On 8/1/22 14:19, Tomasz Figa wrote:
Hello Tomasz
Hi Randy,
On Mon, Aug 1, 2022 at 5:21 AM <ayaka@xxxxxxxxxxx> wrote:
From: Randy Li <ayaka@xxxxxxxxxxx>
This module is still at an early stage; I wrote it to show which
APIs we need here.
Let me explain why we need such a module here.
If you don't allocate buffers from a V4L2 M2M device, this module
may not be very useful. I am sure most users won't know that a
device could require them to allocate buffers from a DMA-Heap and then
import those buffers into a V4L2 queue.
Then the question goes back to why DMA-Heap. From Android's
description, we know it is about copyright DRM.
When we allocate a buffer in a DMA-Heap, it may register that buffer
in the trusted execution environment so that the firmware which runs
there, or can only be accessed from there, can use that buffer later.
The answer above leads to another thing which is not done in this
version: the DMA mapping. Although on some platforms a DMA-Heap
is responsible for an IOMMU device as well, for the general case we
would be better off assuming the device mapping should be done by each
device itself. The problem here is that we only know alloc_devs in those
DMAbuf methods, which are DMA-heaps in my design; the device from the
queue is not enough, since a plane may request another IOMMU device or
table for mapping.
Signed-off-by: Randy Li <ayaka@xxxxxxxxxxx>
---
drivers/media/common/videobuf2/Kconfig | 6 +
drivers/media/common/videobuf2/Makefile | 1 +
.../common/videobuf2/videobuf2-dma-heap.c | 350 ++++++++++++++++++
include/media/videobuf2-dma-heap.h | 30 ++
4 files changed, 387 insertions(+)
create mode 100644 drivers/media/common/videobuf2/videobuf2-dma-heap.c
create mode 100644 include/media/videobuf2-dma-heap.h
First of all, thanks for the series.
Possibly a stupid question, but why not just allocate the DMA-bufs
directly from the DMA-buf heap device in the userspace and just import
the buffers to the V4L2 device using V4L2_MEMORY_DMABUF?
Sometimes the allocation policy could be very complex. Let's suppose a
multi-planar pixel format with frame buffer compression enabled.
Its luma and chroma data could be allocated from a pool which is
dedicated to large buffers, while its metadata would come from a pool
from which many users take a few small slices (like the system pool).
Then, when we have a new user who knows nothing about this platform, if
we just configure the alloc_devs in each queue properly, the user won't
need to know those complex rules.
The real situation could be more complex. Samsung MFC's left and right
banks could be regarded as two pools; many devices would benefit from
this, either in allocation time or in the secure buffer policy.
In our design, when we need to do secure decoding (DRM video),
codec2 would allocate buffers from the pool dedicated to that, while
for non-DRM video, users need not care about this.
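
To make the kind of rule described above concrete, here is a small
userspace model of a per-plane pool policy. It is only a sketch: the
plane kinds, the heap names "vendor-large" and "system", and the helper
heap_for_plane() are assumptions made up for illustration and are not
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plane kinds of a compressed multi-planar format. */
enum plane_kind {
	PLANE_LUMA,
	PLANE_CHROMA,
	PLANE_METADATA,
	PLANE_KIND_COUNT
};

/*
 * Hypothetical policy table: pixel data comes from a pool dedicated to
 * large buffers, metadata comes from a pool shared by many small users.
 */
static const char *const plane_heap[PLANE_KIND_COUNT] = {
	[PLANE_LUMA]     = "vendor-large",
	[PLANE_CHROMA]   = "vendor-large",
	[PLANE_METADATA] = "system",
};

/* Look up the heap name a plane of the given kind should come from. */
static const char *heap_for_plane(enum plane_kind kind)
{
	assert((unsigned int)kind < PLANE_KIND_COUNT);
	return plane_heap[kind];
}
```

If the driver fills alloc_devs itself, a table like this stays inside the
kernel and the user never has to see it.
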
I'm a little bit surprised about this, because on Android all the
graphics buffers are allocated from the system IAllocator and imported
to the specific devices.
In the non-tunnel mode, yes, it is. The tunnel mode, however, is completely vendor defined. Neither HWC nor codec2 cares where the buffers come from; you can do whatever you want.
Besides, there is DRM video on GNU/Linux platforms too. I heard WebKit has made a huge effort here, and PlayReady is one that could work on non-Android Linux.
Would it make sense to instead extend the UAPI to expose enough
information about the allocation requirements to the userspace, so it
can allocate correctly?
Yes, it could. But as I said, it would require the users to do more work.
My reasoning here is that it's not a driver's decision to allocate
from a DMA-buf heap (and which one) or not. It's the userspace which
knows that, based on the specific use case that it wants to fulfill.
Although I would like to let the users decide that, users just can't, because it would violate the security rules on some platforms.
For example, the video codec and the display device could only access one region of memory, and no other device or trusted app can access it. Users have to allocate the buffer from the pool the vendor decided.
So why don't we offer a quick way, so that users don't need trial and error?

In principle, I'm not against integrating DMA-buf heap with vb2,
however I see some problems I mentioned before:

1) How would the driver know if it should allocate from a DMA-buf heap or not?

struct vb2_queue.mem_ops

int (*queue_setup)(struct vb2_queue *q, unsigned int *num_buffers, unsigned int *num_planes, unsigned int sizes[], struct device *alloc_devs[]);
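
A rough illustration of how a driver could use this callback to hand out
one allocation device per plane follows. It is a userspace mock, not
kernel code: struct device and struct vb2_queue are reduced stand-ins so
that it compiles on its own, and struct demo_ctx and demo_queue_setup()
are hypothetical names. A real driver would fetch its context with
vb2_get_drv_priv().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types (assumption, for the sketch). */
struct device { const char *name; };
struct vb2_queue { void *drv_priv; };

/* Hypothetical driver context: one allocation device per memory pool. */
struct demo_ctx {
	struct device *pixel_dev;	/* pool for luma/chroma data */
	struct device *meta_dev;	/* pool for compression metadata */
	unsigned int pixel_size;
	unsigned int meta_size;
};

/* Same prototype as the queue_setup callback quoted above. */
static int demo_queue_setup(struct vb2_queue *q, unsigned int *num_buffers,
			    unsigned int *num_planes, unsigned int sizes[],
			    struct device *alloc_devs[])
{
	struct demo_ctx *ctx = q->drv_priv;

	assert(ctx != NULL);

	/* Require at least two buffers (arbitrary choice for the sketch). */
	if (*num_buffers < 2)
		*num_buffers = 2;

	/* Plane 0 holds pixel data, plane 1 holds metadata. */
	*num_planes = 2;
	sizes[0] = ctx->pixel_size;
	alloc_devs[0] = ctx->pixel_dev;
	sizes[1] = ctx->meta_size;
	alloc_devs[1] = ctx->meta_dev;

	return 0;
}
```
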

Sorry, I don't understand what you mean here.

Just to make sure we're on the same page - what I'm referring to is
that whether DMA-buf heap is used or not is specific to a given use
case, which is controlled by the userspace. So the userspace must be
able to control whether the driver allocates from a DMA-buf heap or
the regular way.
No, it does not depend on the use case here. We don't accept any buffers
beyond the region we decided. Even for the non-DRM, non-security video,
our codec devices are running under the secure mode.

You MUST allocate the buffer for a device from the DMA-heap we(SYNA)
decided.

That's your use case, but there could be use cases which work
differently. In fact, in ChromeOS we only use the secure allocation
path for protected content, because it imposes some overhead.


I believe some other devices may have similar limitations for reasons
other than security; for example, a device can't access memory above
4 GiB, or for performance reasons.

For such limitations, there is the shared DMA pool or restricted DMA
pool functionality, which can be given to a device in DT and then the
DMA mapping API would just allocate within that area for that device.
Maybe that's what you need here?

For Synaptics VideoSmart devices, it is simply that we want to limit the memory region an IP device can access. I am just trying to find some other reasons here.



2) How would the driver know which heap to allocate from?

From vb2_queue.alloc_devs

What I do now is like what MFC does: create some mem_alloc_devs.
It would be better if we could retrieve the DMA-heaps' devices from the kernel, but that is not enough; we need a place to store the heap flags, although none of them are defined yet.

From the Android documents, I think it is unlikely that we would have heap flags.
“Standardization: The DMA-BUF heaps framework offers a well-defined UAPI. ION allowed custom flags and heap IDs that prevented developing a common testing framework because each device’s ION implementation could behave differently.”


alloc_devs is something that the driver sets and it's a struct device
for which the DMA API can be called to manage the DMA buffers for this
video device. It's not a way to select a use case-dependent allocation
method.

I see. Then, moving to the last question, we need to expand the V4L2
framework's structure.
3) How would the heap know how to allocate properly for the device?

Because “each DMA-BUF heap is a separate character device”.

Could you elaborate? Sorry, I'm not sure how this answers my question.
Because for a DMA-heap, the heap device itself is enough here, maybe
plus heap flags when there are any.

I don't know what else would need to be done here.
If you allocate a buffer from a heap which is dedicated to the secure
memory of that device, the heap driver itself would inform the TEE of
the pages it occupies, or the memory is allocated from a pool which is
in a region of memory reserved for this purpose.

So the heap is only for the video device?

dma-heaps?
There are heaps for Dolby audio, NPU and DRM IPs.
Even a GPU shader or an AI model in SPIR-V could be protected on the Synaptics platform.

But as I said in the first draft, I am not sure about the DMA mapping part. alloc_devs is responsible for the heap; we have a device variable in the queue that the mapping function could access, but that may not be enough. A plane may apply a different mapping policy or IOMMU here.

Would it be better if I created an interface here that creates a memdev with a DMA-heap description?

My intuition still tells me that it would be universally better to
just let the userspace allocate the buffers independently (like with
gralloc/Ion) and import to V4L2 using V4L2_MEMORY_DMABUF. It was possible
to do things this way nicely with regular Android graphics buffers, so
could you explain what difference of your use case makes it
impossible?
If we don't need to keep backward compatibility, there won't be any
problem, IF we could tell the users the acceptable DMA-heap for each
plane and the DMA-heap's heap flags.

We don't have an ioctl for this yet; the most likely place for the
decoder to do that is the VIDIOC_G_FMT ioctl()?

Do we need the kernel to tell the userspace which heap to use? As you
mentioned above, the heap would be specific for the video device and
the userspace would also be specific for your use case, so why
couldn't it just find the right heap on its own (e.g. by name)?
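
Finding a heap by name from userspace could be as simple as the sketch
below, since each heap appears as a character device named after the
heap (normally under /dev/dma_heap). The helper heap_path_by_name() is a
made-up name, and any vendor heap name passed to it is an assumption of
the caller; the sketch only checks that the node exists.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Build "<dir>/<name>" into out and check that the node exists.
 * Returns 0 if the heap node is present, -1 if it is missing or the
 * path does not fit into out. (Helper name is hypothetical.)
 */
static int heap_path_by_name(const char *dir, const char *name,
			     char *out, size_t out_len)
{
	int n;

	assert(dir != NULL && name != NULL && out != NULL);

	n = snprintf(out, out_len, "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= out_len)
		return -1;

	return access(out, F_OK) == 0 ? 0 : -1;
}
```

A caller would pass "/dev/dma_heap" as dir and then open() the resulting
path to allocate from that heap.
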
Yes, as long as you don't mind me expanding the V4L2 allocator APIs.

As I will mention in the Synaptics pixel formats email, mvtp may (in most cases) use a different heap from the one used for luma and chroma data.

As for heap flags, could you elaborate on what kind of flags you
imagine could be decided by a V4L2 driver?
I don't have much of an idea here actually; I don't even know of any present need for them. But if I could expand the APIs, I would just leave a window for future usage.

Best regards,
Tomasz

--
Hsia-Jun(Randy) Li