Re: [PATCH v5 1/5] vfio: Add the device features for the low power entry and exit

From: Abhishek Sahu
Date: Tue Jul 26 2022 - 08:47:40 EST


On 7/26/2022 3:39 AM, Alex Williamson wrote:
> On Mon, 25 Jul 2022 20:10:44 +0530
> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote:
>
>> On 7/22/2022 4:04 AM, Alex Williamson wrote:
>>> On Tue, 19 Jul 2022 17:45:19 +0530
>>> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote:
>>>
>>>> This patch adds the following new device features for the low
>>>> power entry and exit in the header file. The implementation for the
>>>> same will be added in the subsequent patches.
>>>>
>>>> - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
>>>> - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
>>>> - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT
>>>>
>>>> With the standard registers, all power states cannot be achieved. The
>>>
>>> We're talking about standard PCI PM registers here, let's make that
>>> clear since we're adding a device agnostic interface here.
>>>
>>>> platform-based power management needs to be involved to go into the
>>>> lowest power state. For doing low power entry and exit with
>>>> platform-based power management, these device features can be used.
>>>>
>>>> The entry device feature has two variants. These two variants are mainly
>>>> to support the different behaviour for the low power entry.
>>>> If there is any access for the VFIO device on the host side, then the
>>>> device will be moved out of the low power state without the user's
>>>> guest driver involvement. Some devices (for example NVIDIA VGA or
>>>> 3D controller) require the user's guest driver involvement for
>>>> each low-power entry. In the first variant, the host can move the
>>>> device into low power without any guest driver involvement while
>>>
>>> Perhaps, "In the first variant, the host can return the device to low
>>> power automatically. The device will continue to attempt to reach low
>>> power until the low power exit feature is called."
>>>
>>>> in the second variant, the host will send a notification to the user
>>>> through eventfd and then the users guest driver needs to move
>>>> the device into low power.
>>>
>>> "In the second variant, if the device exits low power due to an access,
>>> the host kernel will signal the user via the provided eventfd and will
>>> not return the device to low power without a subsequent call to one of
>>> the low power entry features. A call to the low power exit feature is
>>> optional if the user provided eventfd is signaled."
>>>
>>>> These device features only support VFIO_DEVICE_FEATURE_SET operation.
>>>
>>> And PROBE.
>>>
>>
>> Thanks Alex.
>> I will make the above changes in the commit message.
>>
>>>> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
>>>> ---
>>>> include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 55 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>> index 733a1cddde30..08fd3482d22b 100644
>>>> --- a/include/uapi/linux/vfio.h
>>>> +++ b/include/uapi/linux/vfio.h
>>>> @@ -986,6 +986,61 @@ enum vfio_device_mig_state {
>>>> VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>>>> };
>>>>
>>>> +/*
>>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state
>>>> + * with the platform-based power management. This low power state will be
>>>
>>> This is really "allow the device to be moved into a low power state"
>>> rather than actually "move the device into" such a state though, right?
>>>
>>
>> Yes. It will just allow the device to be moved into a low power state.
>> I have addressed all your suggestions in the uAPI description and
>> added the updated description in the last.
>>
>> Can you please check that once and check if it looks okay.
>>
>>>> + * internal to the VFIO driver and the user will not come to know which power
>>>> + * state is chosen. If any device access happens (either from the host or
>>>> + * the guest) when the device is in the low power state, then the host will
>>>> + * move the device out of the low power state first. Once the access has been
>>>> + * finished, then the host will move the device into the low power state again.
>>>> + * If the user wants that the device should not go into the low power state
>>>> + * again in this case, then the user should use the
>>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature for the
>>>
>>> This should probably just read "For single shot low power support with
>>> wake-up notification, see
>>> VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below."
>>>
>>>> + * low power entry. The mmap'ed region access is not allowed till the low power
>>>> + * exit happens through VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will
>>>> + * generate the access fault.
>>>> + */
>>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
>>>
>>> Note that Yishai's DMA logging set is competing for the same feature
>>> entries. We'll need to coordinate.
>>>
>>
>> Since this is last week of rc, so will it possible to consider the updated
>> patches for next kernel (I can try to send them after complete testing the
>> by the end of this week). Otherwise, I can wait for next kernel and then
>> I can rebase my patches.
>
> I think we're close, though I would like to solicit uAPI feedback from
> others on the list. We can certainly sort out feature numbers
> conflicts at commit time if Yishai's series becomes ready as well. I'm
> on PTO for the rest of the week starting tomorrow afternoon, but I'd
> suggest trying to get this re-posted before the end of the week, asking
> for more reviews for the uAPI, and we can evaluate early next week if
> this is ready.
>

Sure. I will re-post the updated patch series after the complete testing.

>>>> +
>>>> +/*
>>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state
>>>> + * with the platform-based power management and provide support for the wake-up
>>>> + * notifications through eventfd. This low power state will be internal to the
>>>> + * VFIO driver and the user will not come to know which power state is chosen.
>>>> + * If any device access happens (either from the host or the guest) when the
>>>> + * device is in the low power state, then the host will move the device out of
>>>> + * the low power state first and a notification will be sent to the guest
>>>> + * through eventfd. Once the access is finished, the host will not move back
>>>> + * the device into the low power state. The guest should move the device into
>>>> + * the low power state again upon receiving the wakeup notification. The
>>>> + * notification will be generated only if the device physically went into the
>>>> + * low power state. If the low power entry has been disabled from the host
>>>> + * side, then the device will not go into the low power state even after
>>>> + * calling this device feature and then the device access does not require
>>>> + * wake-up. The mmap'ed region access is not allowed till the low power exit
>>>> + * happens. The low power exit can happen either through
>>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the
>>>> + * wake-up notification has been generated).
>>>
>>> Seems this could leverage a lot more from the previous, simply stating
>>> that this has the same behavior as VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
>>> with the exception that the user provides an eventfd for notification
>>> when the device resumes from low power and will not try to re-enter a
>>> low power state without a subsequent user call to one of the low power
>>> entry feature ioctls. It might also be worth covering the fact that
>>> device accesses by the user, including region and ioctl access, will
>>> also trigger the eventfd if that access triggers a resume.
>>>
>>> As I'm thinking about this, that latter clause is somewhat subtle.
>>> AIUI a user can call the low power entry with wakeup feature and
>>> proceed to do various ioctl and region (not mmap) accesses that could
>>> perpetually keep the device awake, or there may be dependent devices
>>> such that the device may never go to low power. It needs to be very
>>> clear that only if the wakeup eventfd has the device entered into and
>>> exited a low power state making the low power exit ioctl optional.
>>>
>>
>> Yes. In my updated description, I have added more details.
>> Can you please check if that helps.
>>
>>>> + */
>>>> +struct vfio_device_low_power_entry_with_wakeup {
>>>> + __s32 wakeup_eventfd;
>>>> + __u32 reserved;
>>>> +};
>>>> +
>>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
>>>> +
>>>> +/*
>>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device out of the low power
>>>> + * state.
>>>
>>> Any ioctl effectively does that, the key here is that the low power
>>> state may not be re-entered after this ioctl.
>>>
>>>> This device feature should be called only if the user has previously
>>>> + * put the device into the low power state either with
>>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
>>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. If the
>>>
>>> Doesn't really seem worth mentioning, we need to protect against misuse
>>> regardless.
>>>
>>>> + * device is not in the low power state currently, this device feature will
>>>> + * return early with the success status.
>>>
>>> This is an implementation detail, it doesn't need to be part of the
>>> uAPI. Thanks,
>>>
>>> Alex
>>>
>>>> + */
>>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
>>>> +
>>>> /* -------- API for Type1 VFIO IOMMU -------- */
>>>>
>>>> /**
>>>
>>
>> /*
>> * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
>> * state with the platform-based power management. This low power state will
>> * be internal to the VFIO driver and the user will not come to know which
>> * power state is chosen. If any device access happens (either from the host
>
> Couldn't the user look in sysfs to determine the power state? There
> might also be external hardware monitors of the power state, so this
> statement is a bit overreaching. Maybe something along the lines of...
>
> "Device use of lower power states depend on factors managed by the
> runtime power management core, including system level support and
> coordinating support among dependent devices. Enabling device low
> power entry does not guarantee lower power usage by the device, nor is
> a mechanism provided through this feature to know the current power
> state of the device."
>
>> * or the guest) when the device is in the low power state, then the host will
>
> Let's not confine ourselves to a VM use case, "...from the host or
> through the vfio uAPI".
>
>> * move the device out of the low power state first. Once the access has been
>
> "move the device out of the low power state as necessary prior to the
> access."
>
>> * finished, then the host will move the device into the low power state
>
> "Once the access is completed, the device may re-enter the low power
> state."
>
>> * again. For single shot low power support with wake-up notification, see
>> * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. The mmap'ed region
>> * access is not allowed till the low power exit happens through
>> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will generate the access fault.
>
> "Access to mmap'd device regions is disabled on LOW_POWER_ENTRY and
> may only be resumed after calling LOW_POWER_EXIT."
>
>> */
>> #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
>>
>> /*
>> * This device feature has the same behavior as
>> * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user
>> * provides an eventfd for wake-up notification. When the device moves out of
>> * the low power state for the wake-up, the host will not try to re-enter a
>
> "...will not allow the device to re-enter a low power state..."
>
>> * low power state without a subsequent user call to one of the low power
>> * entry device feature IOCTLs. The low power exit can happen either through
>> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the
>> * wake-up notification has been generated).
>> *
>> * The notification through eventfd will be generated only if the device has
>> * gone into the low power state after calling this device feature IOCTL.
>
> "The notification through the provided eventfd will be generated only
> when the device has entered and is resumed from a low power state after
> calling this device feature IOCTL."
>
>
>> * There are various cases where the device will not go into the low power
>> * state after calling this device feature IOCTL (for example, the low power
>> * entry has been disabled from the host side, the user keeps the device busy
>> * after calling this device feature IOCTL, there are dependent devices which
>> * block the device low power entry, etc.) and in such cases, the device access
>> * does not require wake-up. Also, the low power exit through
>
> "A device that has not entered low power state, as managed through the
> runtime power management core, will not generate a notification through
> the provided eventfd on access."
>
>> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT is mandatory for the cases where the
>> * wake-up notification has not been generated.
>
> "Calling the LOW_POWER_EXIT feature is optional in the case where
> notification has been signaled on the provided eventfd that a resume
> from low power has occurred."
>
> We should also reiterate the statement above about mmap access because
> it would be reasonable for a user to assume that if any access wakes
> the device, that should include mmap faults and therefore a resume
> notification should occur for such an event. We could implement that
> for the one-shot mode, but we're choosing not to.
>
>> */
>> struct vfio_device_low_power_entry_with_wakeup {
>> __s32 wakeup_eventfd;
>> __u32 reserved;
>> };
>>
>> #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
>>
>> /*
>> * Upon VFIO_DEVICE_FEATURE_SET, prevent the VFIO device low power state entry
>> * which has been previously allowed with VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
>> * or VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features.
>> */
>
> "Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states
> as previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
> VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. This
> device feature ioctl may itself generate a wakeup eventfd notification
> in the latter case if the device has previously entered a low power
> state."
>
> Thanks,
> Alex
>

Thanks Alex for your thorough review of uAPI.
I have incorporated all the suggestions.
Following is the updated uAPI.

/*
* Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
* state with the platform-based power management. Device use of lower power
* states depends on factors managed by the runtime power management core,
* including system level support and coordinating support among dependent
* devices. Enabling device low power entry does not guarantee lower power
* usage by the device, nor is a mechanism provided through this feature to
* know the current power state of the device. If any device access happens
* (either from the host or through the vfio uAPI) when the device is in the
* low power state, then the host will move the device out of the low power
* state as necessary prior to the access. Once the access is completed, the
* device may re-enter the low power state. For single shot low power support
* with wake-up notification, see
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. Access to mmap'd
* device regions is disabled on LOW_POWER_ENTRY and may only be resumed after
* calling LOW_POWER_EXIT.
*/
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3

/*
* This device feature has the same behavior as
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user
* provides an eventfd for wake-up notification. When the device moves out of
* the low power state for the wake-up, the host will not allow the device to
* re-enter a low power state without a subsequent user call to one of the low
* power entry device feature IOCTLs. Access to mmap'd device regions is
* disabled on LOW_POWER_ENTRY_WITH_WAKEUP and may only be resumed after the
* low power exit. The low power exit can happen either through LOW_POWER_EXIT
* or through any other access (where the wake-up notification has been
* generated). The access to mmap'd device regions will not trigger low power
* exit.
*
* The notification through the provided eventfd will be generated only when
* the device has entered and is resumed from a low power state after
* calling this device feature IOCTL. A device that has not entered low power
* state, as managed through the runtime power management core, will not
* generate a notification through the provided eventfd on access. Calling the
* LOW_POWER_EXIT feature is optional in the case where notification has been
* signaled on the provided eventfd that a resume from low power has occurred.
*/
struct vfio_device_low_power_entry_with_wakeup {
__s32 wakeup_eventfd;
__u32 reserved;
};

#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4

/*
* Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states as
* previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
* VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features.
* This device feature IOCTL may itself generate a wakeup eventfd notification
* in the latter case if the device has previously entered a low power state.
*/
#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5

Regards,
Abhishek