[PATCH 16/20] habanalabs: reset device if still in use when released

From: Oded Gabbay
Date: Thu Nov 17 2022 - 11:21:06 EST


From: Tomer Tayar <ttayar@xxxxxxxxx>

If the device file is released while a context is still held, it won't
be possible to reopen it until the context is eventually released.
If that doesn't happen, only a device reset will revert it back to an
operational state, i.e. need to wait for a CS timeout or an error, or to
wait for an external intervention of injecting a reset via sysfs.

At this stage, after the device was released by user, context is held
either because of CS which were left running on the device and are not
relevant anymore, or due to missing cleanup steps from user side.

All of this is in any case handled in the device reset flow, so initiate
the reset at this point instead of waiting for it.

Signed-off-by: Tomer Tayar <ttayar@xxxxxxxxx>
Reviewed-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
Signed-off-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
---
drivers/misc/habanalabs/common/device.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 708db0f48ee0..49640c8ca910 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -504,9 +504,10 @@ static int hl_device_release(struct inode *inode, struct file *filp)

hdev->compute_ctx_in_release = 1;

- if (!hl_hpriv_put(hpriv))
- dev_notice(hdev->dev,
- "User process closed FD but device still in use\n");
+ if (!hl_hpriv_put(hpriv)) {
+ dev_notice(hdev->dev, "User process closed FD but device still in use\n");
+ hl_device_reset(hdev, HL_DRV_RESET_HARD);
+ }

hdev->last_open_session_duration_jif =
jiffies - hdev->last_successful_open_jif;
--
2.25.1