Re: [PATCH] drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()

From: Krzysztof Kozlowski
Date: Tue Nov 21 2023 - 11:35:42 EST


On 21/11/2023 17:11, AngeloGioacchino Del Regno wrote:
> Il 21/11/23 16:34, Krzysztof Kozlowski ha scritto:
>> On 08/11/2023 14:20, Steven Price wrote:
>>> On 02/11/2023 14:15, AngeloGioacchino Del Regno wrote:
>>>> The layout of the registers {TILER,SHADER,L2}_PWROFF_LO, used to request
>>>> powering off cores, is the same as the {TILER,SHADER,L2}_PWRON_LO ones:
>>>> this means that in order to request poweroff of cores, we are supposed
>>>> to write a bitmask of cores that should be powered off!
>>>> This means that the panfrost_gpu_power_off() function has always been
>>>> doing nothing.
>>>>
>>>> Fix powering off the GPU by writing a bitmask of the cores to poweroff
>>>> to the relevant PWROFF_LO registers and then check that the transition
>>>> (from ON to OFF) has finished by polling the relevant PWRTRANS_LO
>>>> registers.
>>>>
>>>> While at it, in order to avoid code duplication, move the core mask
>>>> logic from panfrost_gpu_power_on() to a new panfrost_get_core_mask()
>>>> function, used in both poweron and poweroff.
>>>>
>>>> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
>>>> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
>>
>>
>> Hi,
>>
>> This commit was added to next recently but it causes "external abort on
>> non-linefetch" during boot of my Odroid HC1 board.
>>
>> At least bisect points to it.
>>
>> If fixed, please add:
>>
>> Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx>
>>
>> [ 4.861683] 8<--- cut here ---
>> [ 4.863429] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0c8802c
>> [ 4.871018] [f0c8802c] *pgd=433ed811, *pte=11800653, *ppte=11800453
>> ...
>> [ 5.164010] panfrost_gpu_irq_handler from __handle_irq_event_percpu+0xcc/0x31c
>> [ 5.171276] __handle_irq_event_percpu from handle_irq_event+0x38/0x80
>> [ 5.177765] handle_irq_event from handle_fasteoi_irq+0x9c/0x250
>> [ 5.183743] handle_fasteoi_irq from generic_handle_domain_irq+0x28/0x38
>> [ 5.190417] generic_handle_domain_irq from gic_handle_irq+0x88/0xa8
>> [ 5.196741] gic_handle_irq from generic_handle_arch_irq+0x34/0x44
>> [ 5.202893] generic_handle_arch_irq from __irq_svc+0x8c/0xd0
>>
>> Full log:
>> https://krzk.eu/#/builders/21/builds/4392/steps/11/logs/serial0
>>
>
> Hey Krzysztof,
>
> This is interesting. It might be about the cores that are missing from the partial
> core_mask raising interrupts, but an external abort on non-linefetch is strange to
> see here.
>
> Would you be available for some tests?
>
> I'm thinking to call power_off on all cores (all shaders, all tilers, all l2s),
> regardless of what panfrost_get_core_mask() says, as it could be that your GPU
> powers on the cores that are unused by Panfrost by default, and that then we never
> turn them off, escalating to this issue.
>
> If you can please please please test:
>
> void panfrost_gpu_power_off(struct panfrost_device *pfdev)
> {
> u64 core_mask = panfrost_get_core_mask(pfdev);
> int ret;
> u32 val;
>
> gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
> gpu_write(pfdev, SHADER_PWROFF_HI, pfdev->features.shader_present >> 32);
> ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
> val, !val, 1, 1000);
> if (ret)
> dev_err(pfdev->dev, "shader power transition timeout");
>
> gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present);
> gpu_write(pfdev, TILER_PWROFF_HI, pfdev->features.tiler_present >> 32);
> ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO,
> val, !val, 1, 1000);
> if (ret)
> dev_err(pfdev->dev, "tiler power transition timeout");
>
> gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
> ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
> val, !val, 0, 1000);
> if (ret)
> dev_err(pfdev->dev, "l2 power transition timeout");
>
> gpu_write(pfdev, L2_PWROFF_HI, pfdev->features.l2_present >> 32);
> ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_HI,
> val, !val, 0, 1000);
> if (ret)
> dev_err(pfdev->dev, "l2 power transition timeout");
> }
>

Send a diff please - against next or some other commit sha from next.

Best regards,
Krzysztof