Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error

From: Jarkko Sakkinen
Date: Thu Sep 01 2022 - 19:57:08 EST


On Thu, Sep 01, 2022 at 03:34:10PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 9/1/2022 2:53 PM, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> >> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
>
> >> I think I am missing something here. A lot of logic is added here but I
> >> do not see why it is necessary. ksgxd() knows via kthread_should_stop() if
> >> the reclaimer was canceled. I am thus wondering, could the above not be
> >> simplified to something similar to V1:
> >>
> >> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >>
> >> static int ksgxd(void *p)
> >> {
> >> + unsigned long left_dirty;
> >> +
> >> set_freezable();
> >>
> >> /*
> >> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >> * required for SECS pages, whose child pages blocked EREMOVE.
> >> */
> >> __sgx_sanitize_pages(&sgx_dirty_page_list);
> >
> > IMHO, would make sense also to have here:
> >
> > if (!kthread_should_stop())
> > return 0;
> >
>
> Would this not prematurely stop the thread when it should not be?
>
> >> - __sgx_sanitize_pages(&sgx_dirty_page_list);
> >>
> >> - /* sanity check: */
> >> - WARN_ON(!list_empty(&sgx_dirty_page_list));
> >> + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> >> + if (left_dirty && !kthread_should_stop())
> >> + pr_err("%lu unsanitized pages\n", left_dirty);
> >
> > That would be incorrect, if the function returned
> > because of kthread stopped.
>
>
> I should have highlighted this but in my example I changed
> left_dirty to be "unsigned long" with the intention that the
> "return -ECANCELED" is replaced with "return 0".

It wasn't supposed to be, it's an error. Thanks for spotting
that.

>
> __sgx_sanitize_pages() returns 0 when it exits because of
> kthread stopped.
>
> To elaborate I was thinking about:
>
> +static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
> {
> + unsigned long left_dirty = 0;
> struct sgx_epc_page *page;
> LIST_HEAD(dirty);
> int ret;
>
> - /* dirty_page_list is thread-local, no need for a lock: */
> while (!list_empty(dirty_page_list)) {
> if (kthread_should_stop())
> - return;
> + return 0;
>
> page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>
> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> } else {
> /* The page is not yet clean - move to the dirty list. */
> list_move_tail(&page->list, &dirty);
> + left_dirty++;
> }
>
> cond_resched();
> }
>
> list_splice(&dirty, dirty_page_list);
> + return left_dirty;
> }
>
>
> and then with what I had in previous email the checks should work:
>
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>
> static int ksgxd(void *p)
> {
> + unsigned long left_dirty;
> +
> set_freezable();
>
> /*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> * required for SECS pages, whose child pages blocked EREMOVE.
> */
> __sgx_sanitize_pages(&sgx_dirty_page_list);
> - __sgx_sanitize_pages(&sgx_dirty_page_list);
>
> - /* sanity check: */
> - WARN_ON(!list_empty(&sgx_dirty_page_list));
> + left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> + if (left_dirty && !kthread_should_stop())
> + pr_err("%lu unsanitized pages\n", left_dirty);
>
> while (!kthread_should_stop()) {
> if (try_to_freeze())
>
>
> >
> > If you do the check here you already have a window
> > where kthread could have been stopped anyhow.
> >
> > So even this would be less correct:
> >
> > if (kthreas_should_stop()) {
> > return 0;
> > } else if (left_dirty) {
> > pr_err("%lu unsanitized pages\n", left_dirty);
> > }
> >
> > So in the end you end as complicated and less correct
> > fix. This all is explained in the commit message.
> >
> > If you unconditionally print error, you don't have
> > a meaning for the number of unsanitized pags.
>
> Understood that the goal is to only print the
> number of unsanitized pages if ksgxd has not been
> stopped prematurely.

Yeah, and thus give as useful information for sysadmin/developer
as we can.

BR, Jarkko