Re: [PATCH v6 21/21] s390: doc: detailed specifications for AP virtualization

From: Halil Pasic
Date: Tue Jul 03 2018 - 09:58:51 EST




On 07/03/2018 03:25 PM, Cornelia Huck wrote:
On Tue, 3 Jul 2018 14:20:11 +0200
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

On 07/03/2018 01:52 PM, Cornelia Huck wrote:
On Tue, 3 Jul 2018 11:22:10 +0200
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:
[..]

Let me try to invoke the DASD analogy. If one for some reason wants to detach
a DASD the procedure to follow seems to be (see
https://www.ibm.com/support/knowledgecenter/en/linuxonibm/com.ibm.linux.z.lgdd/lgdd_t_dasd_online.html)
the following:
1) Unmount.
2) Offline possibly using safe_offline.
3) Detach.

Detaching a disk that is currently doing I/O asks for trouble, so the admin is encouraged
to make sure there is no pending I/O.
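
To make the ordering concrete, here is a minimal dry-run sketch of the three steps. It only builds the command strings and runs nothing. The `safe_offline` sysfs attribute is the one described on the linked page; the function name, the example bus ID and mount point, and the use of `vmcp detach` (which assumes a z/VM guest) are illustrative assumptions.

```python
def dasd_detach_plan(bus_id, mount_point):
    """Return the ordered shell commands for the three steps, without running them."""
    sysfs = f"/sys/bus/ccw/devices/{bus_id}"
    devno = bus_id.split(".")[-1]  # "0.0.4711" -> "4711"
    return [
        f"umount {mount_point}",           # 1) unmount
        f"echo 1 > {sysfs}/safe_offline",  # 2) offline once outstanding I/O has completed
        f"vmcp detach {devno}",            # 3) detach (assumption: running under z/VM)
    ]


if __name__ == "__main__":
    for command in dasd_detach_plan("0.0.4711", "/mnt/data"):
        print(command)
```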

I don't think we can use dasd (block devices) as a good analogy for
every kind of device (for starters, consider network devices).

I did not use it for every kind of device. I used it for AP. I'm
under the impression you find the analogy inappropriate. If so, could
you please explain why?

I don't think block devices (which are designed to be more or less
permanently accessed, e.g. by mounting a file system) have the same
semantics as ap devices (which exist as a backend for crypto requests).
Not everything that makes sense for a block device makes sense for
other devices as well, and I don't think it makes sense here.


I'm still confused. If it's about frequency of access (as hinted
by block devices accessed more or less permanently) I'm not sure
there is a substantial difference. I guess there are scenarios where
the AP domain is used very seldom (e.g. protected keys --> most of
the crypto ops done by CPACF but AP unwraps at the beginning), but
there are such scenarios for block too.

If it's about (persistent) state, I guess it again depends on the
scenario and on the type of the card. But I may be wrong.


In case of AP you can interpret my 'in use' as 'the queue is not empty'. In my understanding
unbind is supposed to be hard (I used the word radical). That's why I compared it to pulling
a cable, and why I ask whether there is anything the admin is supposed to do before doing the
unbind.

Are you asking for a kind of 'quiescing' operation? I would hope that
the crypto drivers already can deal with that via flushing the queue,
not allowing new requests, or whatever. This is not the block device
case.

The current implementation of vfio-ap, which is a crypto driver too, certainly
cannot deal 'with that'. Whether the rest of the drivers can, I don't
know. Maybe Tony can tell.
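
As a toy model of the 'quiescing' idea (stop accepting new requests, then flush what is pending), something like the following could illustrate what "dealing with that" might mean. This is not how any existing AP driver works; the class and method names are purely hypothetical, and 'in use' is modelled as 'the queue is not empty'.

```python
from collections import deque


class ToyQueue:
    """Toy stand-in for a request queue; NOT the kernel's AP queue."""

    def __init__(self):
        self.pending = deque()
        self.accepting = True

    def submit(self, request):
        """Enqueue a request unless the queue has been quiesced."""
        if not self.accepting:
            return False
        self.pending.append(request)
        return True

    def in_use(self):
        """'In use' here simply means the queue is not empty."""
        return len(self.pending) > 0

    def quiesce(self):
        """Stop accepting new requests, then flush and return what was pending."""
        self.accepting = False
        flushed = list(self.pending)
        self.pending.clear()
        return flushed
```

After `quiesce()` the queue is empty and rejects new work, which is the state one would want before an unbind. Whether flushed requests are completed or failed back to the caller is left open here.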

If the current implementation of vfio-ap cannot deal with it (by
cleaning up, blocking, etc.), it needs at the very least to be documented
so that it can be implemented later. I do not know what the SIE will or
won't do to assist here (e.g., if you're removing it from some masks,
the device will already be inaccessible to the guest). But the part you
were referring to was talking about the existing host driver anyway,
wasn't it?


I was thinking about both directions. Re-classifying a device from
pass-through to normal should also be possible. But the document only
talks about one direction.

I'm not familiar with the existing host drivers. If we can say 'Hey,
unbind is perfectly safe at any time: no precautions need to be considered!'
I'm very happy with that. Although I would find it a bit surprising.

I just wanted to make sure this is not something we forget.
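
To illustrate what 'both directions' means at the sysfs level, here is a minimal dry-run sketch using the generic driver-model bind/unbind attributes. It only computes the writes and touches nothing. The queue name `05.0004` and the driver names `cex4queue` and `vfio_ap` are example assumptions, and the sketch says nothing about whether the unbind is safe while requests are pending.

```python
AP_DRIVERS = "/sys/bus/ap/drivers"


def rebind_plan(apqn, from_driver, to_driver):
    """Return the (sysfs path, value) writes that move one queue between drivers."""
    return [
        (f"{AP_DRIVERS}/{from_driver}/unbind", apqn),  # release from the current driver
        (f"{AP_DRIVERS}/{to_driver}/bind", apqn),      # hand over to the new driver
    ]


# Host driver -> pass-through (the direction the document describes).
to_guest = rebind_plan("05.0004", "cex4queue", "vfio_ap")
# Pass-through -> host driver (the reverse direction).
to_host = rebind_plan("05.0004", "vfio_ap", "cex4queue")
```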


I'm aware of the fact that AP adapters are not block devices. But
as stated above I don't understand what the big difference is regarding
the unbind operation.

Anyway, this is an administrative issue. If you don't have a clear
concept which devices are for host usage and which for guest usage, you
already have problems.

I'm trying to understand the whole solution. I agree, this is an administrative
issue. But the document is trying to address such administrative issues.

I'd assume "know which devices are for the host and which devices are
for the guests" to be a given, no?


My other email touches on this topic. AFAIK we don't have a solution for
that yet. Nor do we have a good understanding of how, and to what extent,
this assignment is statically given. E.g. if I want to re-partition my AP
resources (and at some point one will have to at least do the initial
re-partitioning), do I need a reboot for the changes to take effect? Or
can this 'known' assignment vary during the uptime of an OS?

@Tony: Please feel free to fill the gaps in my understanding.

Regards,
Halil