[RFC 00/19] KVM: s390/crypto/vfio: guest dedicated crypto adapters

From: Tony Krowiak
Date: Fri Oct 13 2017 - 13:42:52 EST


Overview:
--------
An adjunct processor (AP) facility is an IBM Z cryptographic facility. The
AP facility comprises three AP instructions and from 1 to 256 AP
adapter cards. The design takes advantage of the interpretive execution mode
provided by the SIE architecture. With interpretive execution mode, the AP
instructions executed on the guest are interpreted by the hardware. This
allows guests direct access to AP adapter cards. The first goal of this
patch series is to provide direct access by a KVM guest to an AP as a
pass-through device. The second goal is to provide administrators with the
means to configure KVM guests to grant direct access to AP facilities
assigned to the LPAR in which the host linux system is running.

To aid understanding of the design, let's first present an overview of
the AP architecture.

AP Architectural Overview
-------------------------
Let's start with some definitions:

* AP adapter

An AP adapter is an IBM Z adapter card that can perform cryptographic
functions. There can be from 0 to 256 adapters assigned to an LPAR.
Each adapter is identified by a number from 0 to 255. Once installed,
an AP adapter is accessed by AP instructions executed by any CPU.

* AP domain

An adapter can be partitioned into domains. An adapter can hold up to 256
domains. Each domain is identified by a number from 0 to 255. Domains can
be further classified into two types:

  * Usage domains are domains that can be accessed directly to process AP
    commands.

  * Control domains are domains that are accessed indirectly by AP
    commands sent to a usage domain to control or change the domain.

* AP Queue

An AP queue is the means by which an AP command is sent to an
AP usage domain inside a specific AP. An AP queue is identified by a tuple
comprised of an AP adapter ID and a usage domain index corresponding
to a given usage domain within the adapter. This tuple forms an AP Queue
Number (APQN) uniquely identifying an AP queue. AP instructions include
a field containing the APQN to identify the AP queue to which the AP
command is targeted.

* AP Instructions:

There are three AP instructions:

  * NQAP: to enqueue an AP command-request message to a queue
  * DQAP: to dequeue an AP command-reply message from a queue
  * PQAP: to administer the queues

Let's now see how AP instructions are interpreted by the hardware.

Start Interpretive Execution (SIE) Instruction
----------------------------------------------
A KVM guest is started by executing the Start Interpretive Execution (SIE)
instruction. The SIE state description is a control block that contains the
state information for a KVM guest and is supplied as input to the SIE
instruction. The SIE state description contains a field that references
a Crypto Control Block (CRYCB). The CRYCB contains three bitmask fields
identifying the adapters, usage domains and control domains assigned to the
KVM guest:

* The AP Mask (APM) field specifies the AP adapters assigned to the
KVM guest. The APM controls which adapters are valid for the KVM guest.
The bits in the mask, from left to right, correspond to adapter IDs (APIDs)
0 up to the number of adapters that can be assigned to the LPAR. If a bit
is set, the corresponding adapter is valid for use by the KVM guest.

* The AP Queue Mask (AQM) field specifies the AP usage domains assigned
to the KVM guest. The bits in the mask, from left to right, correspond
to the usage domains, from 0 up to the number of domains that can be
assigned to the LPAR. If a bit is set, the corresponding usage domain is
valid for use by the KVM guest.

* The AP Domain Mask (ADM) field specifies the AP control domains assigned to the
KVM guest. The ADM bitmask controls which domains can be changed by an AP
command-request message sent to a usage domain from the guest. The bits in
the mask, from left to right, correspond to domain 0 up to the number of
domains that can be assigned to the LPAR. If a bit is set, the
corresponding domain can be modified by an AP command-request message
sent to a usage domain configured for the KVM guest.

Recall from the description of an AP Queue that AP instructions include
an APQN to identify the AP adapter and the specific usage domain within
the adapter to which an AP command-request message is to be sent (NQAP
and PQAP instructions), or from which a command-reply message is to be
received (DQAP instruction). The validity of an APQN is defined by the
matrix calculated from the APM and AQM: the valid APQNs are every pairing
of an assigned adapter number (APM) with an assigned usage domain number
(AQM), i.e., the Cartesian product of the two sets.
For example, if adapters 1 and 2 and usage domains 5 and 6 are assigned to
a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for the
guest.
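The APQN rule can be sketched in C. This is a minimal illustration, not code
from the patch set: it assumes each 256-bit mask is stored as a 32-byte array
whose bit 0 is the leftmost (most significant) bit of byte 0, and the helper
names mask_test, mask_set and list_apqns are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: 256 bits in 32 bytes, bit 0 = leftmost bit of byte 0. */
#define AP_MASK_BITS 256
#define AP_MASK_BYTES (AP_MASK_BITS / 8)

static int mask_test(const uint8_t mask[AP_MASK_BYTES], unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	return (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

static void mask_set(uint8_t mask[AP_MASK_BYTES], unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	mask[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

/*
 * Every (adapter, domain) pair whose adapter bit is set in the APM and whose
 * domain bit is set in the AQM is a valid APQN. Prints each APQN and returns
 * how many there are.
 */
static unsigned int list_apqns(const uint8_t apm[AP_MASK_BYTES],
			       const uint8_t aqm[AP_MASK_BYTES])
{
	unsigned int apid, apqi, count = 0;

	for (apid = 0; apid < AP_MASK_BITS; apid++) {
		if (!mask_test(apm, apid))
			continue;
		for (apqi = 0; apqi < AP_MASK_BITS; apqi++) {
			if (!mask_test(aqm, apqi))
				continue;
			printf("(%u,%u)\n", apid, apqi);
			count++;
		}
	}
	return count;
}
```

Setting adapters 1 and 2 in the APM and domains 5 and 6 in the AQM makes
list_apqns print (1,5), (1,6), (2,5) and (2,6) and return 4, matching the
example above.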

The APQNs provide secure key functionality (i.e., the key is stored on the
adapter card), so when the adapter card is not virtualized (i.e., the
adapter is accessed directly by the guest), each APQN must be assigned to
at most one guest.

Example 1: Valid configuration:
------------------------------
Guest1: adapters 1,2 domains 5,6
Guest2: adapters 1,2 domain 7

This is valid because the two guests have disjoint sets of APQNs: Guest1 has
APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).

Example 2: Invalid configuration:
--------------------------------
Guest1: adapters 1,2 domains 5,6
Guest2: adapter 1 domains 6,7

This is an invalid configuration because both guests have access to
APQN (1,6).
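Because the valid APQNs are all pairings of APM bits with AQM bits, two
guests share an APQN exactly when their APMs overlap and their AQMs also
overlap, so a conflict check need not enumerate APQNs. A hedged sketch using
the same assumed 32-byte mask layout; mask_set, masks_intersect and
guests_share_apqn are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AP_MASK_BYTES 32 /* 256-bit mask, bit 0 = leftmost bit (assumed) */

static void mask_set(uint8_t mask[AP_MASK_BYTES], unsigned int nr)
{
	assert(nr < AP_MASK_BYTES * 8);
	mask[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

/* True if the two masks have at least one bit in common. */
static bool masks_intersect(const uint8_t a[AP_MASK_BYTES],
			    const uint8_t b[AP_MASK_BYTES])
{
	for (int i = 0; i < AP_MASK_BYTES; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * A shared APQN (a,d) needs adapter a in both APMs and domain d in both
 * AQMs; conversely, any common adapter plus any common domain form one.
 */
static bool guests_share_apqn(const uint8_t apm1[AP_MASK_BYTES],
			      const uint8_t aqm1[AP_MASK_BYTES],
			      const uint8_t apm2[AP_MASK_BYTES],
			      const uint8_t aqm2[AP_MASK_BYTES])
{
	return masks_intersect(apm1, apm2) && masks_intersect(aqm1, aqm2);
}
```

For Example 1 the AQMs (domains 5,6 versus 7) do not overlap, so the check
returns false; for Example 2 both the APMs (adapter 1) and the AQMs
(domain 6) overlap, so it returns true.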

Interruption architecture:

The AP interruption architecture may or may not generate interruptions to
signal to the CPU the end of an AP transaction. The SIE interruption
architecture, depending upon its configuration, may or may not redirect
AP interrupts directly to a guest if the associated queue is valid for a
guest, and may or may not report the interruption to the host.

Effective masking for guest-level 1 and 2:

A linux host running in the LPAR operates at guest-level 1 and has its own
SIE state description. When operating at guest-level 1, the masks from the
host's state description are used directly. A linux guest running in the
host operates at guest-level 2. When operating at guest-level 2, each mask
from the guest-level 1 (host) state description is combined with the
corresponding mask from the guest-level 2 (guest) state description into a
single effective mask by performing a logical AND of the two.

The effective mask algorithm is used for the APM, AQM and ADM to create
an EAPM, EAQM and EADM respectively. Use of the EAPM, EAQM and EADM
precludes a guest-level 1 host program from passing to a guest-level 2
program APQNs to which it does not have access.
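A minimal sketch of the effective-mask computation, assuming the same
32-byte layout per mask. The struct ap_masks container and the
effective_masks function are hypothetical and do not reflect the real CRYCB
format:

```c
#include <assert.h>
#include <stdint.h>

#define AP_MASK_BYTES 32 /* 256-bit mask, assumed byte-array layout */

/* Hypothetical container for the three masks of one state description. */
struct ap_masks {
	uint8_t apm[AP_MASK_BYTES];
	uint8_t aqm[AP_MASK_BYTES];
	uint8_t adm[AP_MASK_BYTES];
};

/*
 * EAPM, EAQM and EADM: bitwise AND of the guest-level 1 (host) masks with
 * the guest-level 2 (guest) masks, so a bit the host lacks never survives.
 */
static void effective_masks(const struct ap_masks *host,
			    const struct ap_masks *guest,
			    struct ap_masks *eff)
{
	assert(host && guest && eff);
	for (int i = 0; i < AP_MASK_BYTES; i++) {
		eff->apm[i] = host->apm[i] & guest->apm[i];
		eff->aqm[i] = host->aqm[i] & guest->aqm[i];
		eff->adm[i] = host->adm[i] & guest->adm[i];
	}
}
```

If the host holds adapters 1 and 2 and a guest-level 2 description asks for
adapters 2 and 3, only adapter 2 survives in the EAPM.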

Linux cryptographic bus driver:

Linux already has a cryptographic bus driver that provides one AP device per
AP adapter and one device per AP queue. There is a device driver for each
type of AP adapter device and each type of AP queue device. This design
utilizes some of the interfaces and functionality provided by the AP bus
driver.

Design Origin:
-------------

The original design was based on modelling AP Queue devices. The design
utilized the VFIO mediated device framework whereby a mediated AP queue
device would be created for each AP Queue bound to the VFIO AP Queue device
driver. This at first seemed like the most logical design choice for the
following reasons:

* Securing access to an AP Queue device by unbinding it from its default
device driver and binding it to the VFIO device driver would not preclude
the host from having access to the other usage domains contained within
the same adapter card connected to the AP queue.

* An AP command is sent to a usage domain within a specific AP adapter via
an AP queue.

It became readily apparent that modelling the design on an AP queue was very
convoluted for a number of reasons:

* There is no convenient way to notify the VFIO device driver which guest
will have access to a given mediated AP queue device until the mediated
device's file descriptor is opened by the guest. Recall that the APQNs
configured for the guest are every pairing of an adapter bit set in the
APM with a domain bit set in the AQM, so the guest's APQNs cannot be
validated, nor can its SIE state description be configured, until all of
the guest's mediated AP queue device file descriptors have been opened.

For example, suppose a guest opens file descriptors for mediated AP
queue devices representing APQNs (3,5) and (4,6). If bits 3 and 4 are set
in the guest's APM and bits 5 and 6 are set in the guest's AQM, then APQNs
(3,5), (3,6), (4,5) and (4,6) will be valid for the guest, but mediated
AP queue devices have been created only for APQNs (3,5) and (4,6). In
this case, APQNs (3,6) and (4,5), which may still be assigned to the host,
would also be available to the guest, which is a potential security breach.

* Control domains are not devices and are not logically modelled as
mediated devices. In our original design, they were modelled as
attributes of a mediated AP queue device, but this was a clumsy use of
the VFIO mediated device model.

* The SIE state description models the assignment of AP resources as a
matrix via the APM, AQM and ADM.
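The exposure described in the (3,5)/(4,6) example can be made concrete with a
small C sketch. It derives an APM and AQM from the APQNs a guest opened as
per-queue mediated devices and counts the APQNs the resulting matrix grants
that no mediated device stands for. The struct apqn type and the function
names are hypothetical, and plain bool arrays stand in for the bitmasks for
brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AP_MAX_IDS 256

struct apqn {
	unsigned int apid; /* adapter number */
	unsigned int apqi; /* usage domain index */
};

static bool apqn_in_list(const struct apqn *list, size_t n,
			 unsigned int apid, unsigned int apqi)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].apid == apid && list[i].apqi == apqi)
			return true;
	return false;
}

/* Count APQNs granted by the derived APM x AQM but not opened as devices. */
static size_t count_unintended_apqns(const struct apqn *opened, size_t n)
{
	bool apm[AP_MAX_IDS] = { false }, aqm[AP_MAX_IDS] = { false };
	size_t extra = 0;

	for (size_t i = 0; i < n; i++) {
		assert(opened[i].apid < AP_MAX_IDS);
		assert(opened[i].apqi < AP_MAX_IDS);
		apm[opened[i].apid] = true;
		aqm[opened[i].apqi] = true;
	}
	for (unsigned int a = 0; a < AP_MAX_IDS; a++)
		for (unsigned int d = 0; d < AP_MAX_IDS; d++)
			if (apm[a] && aqm[d] && !apqn_in_list(opened, n, a, d))
				extra++;
	return extra;
}
```

Opening (3,5) and (4,6) yields 2, for (3,6) and (4,5); only when all four
queues are opened does the count drop to 0.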

The design we ultimately settled upon was modelled on the AP matrix as
defined by the SIE state description. Supplying the complete AP matrix
to SIE using bitmasks when starting a guest simplifies the code, is far
easier to secure, and more closely matches the model employed by SIE. This
is the design model implemented via this patch set.

The Design
----------
This design introduces four new objects:

1. AP matrix bus

The sysfs location of the AP matrix bus is /sys/bus/ap_matrix. This
bus will create a single AP matrix device (see below).

2. AP matrix device

The AP matrix device is a singleton that hangs off the AP matrix bus.
This device holds the AP Queues that have been reserved for use by
KVM guests. The sysfs location of the AP matrix device is
/sys/devices/ap_matrix/matrix. It is also linked from the AP matrix
bus at /sys/bus/ap_matrix/devices/matrix.

3. VFIO AP matrix driver

This driver is based on the VFIO mediated device framework. When the
driver is initialized, it will:

* Get the AP matrix device created by the AP matrix bus from the bus

* Register with the AP bus to indicate that it can control AP Queue
devices. This allows AP Queue devices unbound from AP device drivers
to be bound to the VFIO AP matrix driver. The AP Queues bound to the
VFIO AP matrix driver will be stored by the driver in the AP matrix
device.

* Register the AP matrix device with the VFIO mediated device
framework (MDEV). Registration with MDEV will create the sysfs
structures needed to create mediated matrix devices. Each MDEV matrix
device is used to configure the AP matrix for a KVM guest. The MDEV
matrix device's file descriptor can be used by QEMU to communicate
with the VFIO AP matrix device driver.

The VFIO AP matrix driver:

* Provides the interfaces the administrator can use to secure AP Queues
for use by KVM guests. This is accomplished by unbinding the AP Queues
needed by each KVM guest from their AP device drivers and binding them to
the VFIO AP matrix driver. This prevents the host linux system from
using these Queues.

* Provides an ioctl that can be used by QEMU to configure the
CRYCB referenced by the KVM guest's SIE state description. The ioctl
will:

  * Create an EAPM, EAQM and EADM by performing a logical AND of the
    APM, AQM and ADM configured via the MDEV matrix device's sysfs
    attribute files (see below) with the APM, AQM and ADM of the host's
    SIE state description, respectively.

  * Configure the SIE state description for the KVM guest using the
    effective masks created in the previous step.

4. VFIO MDEV matrix passthrough device

An MDEV matrix passthrough device must be created for each KVM guest that
will need access to AP facilities. An MDEV matrix passthrough device is
used by QEMU to configure the APM, AQM and ADM fields of the CRYCB
referenced by the KVM guest's SIE state description. The file descriptor
for the MDEV matrix passthrough device provides the communication pathway
between QEMU and the VFIO AP matrix device driver.

The MDEV matrix passthrough device, like the CRYCB, contains three
bitmasks - an APM, AQM and ADM - for specifying the AP matrix for the
KVM guest. Three sets of attribute files will be provided to allow an
administrator to set the bits in the MDEV matrix device's APM, AQM and
ADM:

* A file to assign an AP adapter
* A file to unassign an AP adapter
* A file to display the adapters assigned

* A file to assign an AP domain
* A file to unassign an AP domain
* A file to display the domains assigned

* A file to assign an AP control domain
* A file to unassign an AP control domain
* A file to display the control domains assigned

Example:
-------
Let's now provide an example to illustrate how KVM guests may be given
access to AP facilities. For this example, we will show how to configure
two guests such that executing the lszcrypt command on the guests would
look like this:

Guest1
------
CARD.DOMAIN TYPE  MODE
------------------------------
05          CEX5C CCA-Coproc
05.0004     CEX5C CCA-Coproc
05.00ab     CEX5C CCA-Coproc
06          CEX5A Accelerator
06.0004     CEX5A Accelerator
06.00ab     CEX5A Accelerator

Guest2
------
CARD.DOMAIN TYPE  MODE
------------------------------
05          CEX5A Accelerator
05.0047     CEX5A Accelerator
05.00ff     CEX5A Accelerator

One thing to notice in this example is that, for each guest, every
adapter's set of AP queues covers the same usage domains. For example, the
two AP Queue sets for Guest1 both contain usage domain indexes (APQIs) 0004
and 00ab. It would be an invalid condition if the queue sets did not cover
the same domains. We could not, for example, configure Guest1 with access
to AP queue 05.00ff, because assigning domain 00ff would also grant access
to AP queue 06.00ff, which is not among the queues reserved for the guest.
The point is, one must be careful to reserve a valid set of AP queues for
a given guest.
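A validation along these lines could be sketched as follows. This is an
illustrative check under stated assumptions, not the patch set's validation
code: reserved queues are passed as a list of (adapter, domain) pairs, and
struct apqn, is_reserved and matrix_is_valid are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct apqn {
	unsigned int apid; /* adapter number */
	unsigned int apqi; /* usage domain index */
};

static bool is_reserved(const struct apqn *reserved, size_t n,
			unsigned int apid, unsigned int apqi)
{
	for (size_t i = 0; i < n; i++)
		if (reserved[i].apid == apid && reserved[i].apqi == apqi)
			return true;
	return false;
}

/*
 * A guest matrix is usable only if every APQN in adapters x domains has
 * been reserved, i.e. bound to the VFIO AP matrix driver.
 */
static bool matrix_is_valid(const unsigned int *adapters, size_t na,
			    const unsigned int *domains, size_t nd,
			    const struct apqn *reserved, size_t nr)
{
	assert((adapters || na == 0) && (domains || nd == 0));
	for (size_t i = 0; i < na; i++)
		for (size_t j = 0; j < nd; j++)
			if (!is_reserved(reserved, nr, adapters[i], domains[j]))
				return false;
	return true;
}
```

With the six reserved queues from this example, Guest1's matrix (adapters 05
and 06, domains 0004 and 00ab) passes, while adding domain 00ff fails because
06.00ff is not reserved.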

These are the steps for configuring Guest1 and Guest2:

1. The first thing that needs to be done is to secure the AP queues to be
used by the two guests so that the host cannot access them. This is done
by unbinding each AP Queue device from its respective AP driver. In our
example, these queues are bound to the cex4queue driver. This would be
the sysfs location of these devices:

/sys/bus/ap
--- [drivers]
------ [cex4queue]
--------- [05.0004]
--------- [05.0047]
--------- [05.00ab]
--------- [05.00ff]
--------- [06.0004]
--------- [06.00ab]
--------- unbind

To unbind AP queue 05.0004 from the cex4queue device driver:

echo 05.0004 > unbind

This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
and 06.00ab.

2. The next step is to reserve the queues for use by the two KVM guests.
This is accomplished by binding them to the VFIO AP matrix device driver.
This is the sysfs location of the VFIO AP matrix device driver:

/sys/bus/ap
---[drivers]
------ [vfio_ap_matrix]
---------- bind

To bind queue 05.0004 to the vfio_ap_matrix driver:

echo 05.0004 > bind

This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
and 06.00ab.
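Steps 1 and 2 are ordinary sysfs writes, so they can also be done from a
program. The following C sketch mirrors `echo -n <queue> > unbind` followed
by `echo -n <queue> > bind`; the function names are hypothetical, and the
caller is assumed to supply the driver attribute paths shown above (for
example /sys/bus/ap/drivers/cex4queue/unbind and
/sys/bus/ap/drivers/vfio_ap_matrix/bind). Writing these files requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one value to an attribute file. Returns 0 on success, -1 on error. */
static int write_attr(const char *path, const char *value)
{
	size_t len;
	FILE *f;
	int rc = 0;

	assert(path && value);
	len = strlen(value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fwrite(value, 1, len, f) != len)
		rc = -1;
	/* stdio buffers the data, so an error from write(2) surfaces here */
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}

/* Unbind each queue from its current driver, then bind it to vfio_ap_matrix. */
static int reserve_queues(const char *unbind_path, const char *bind_path,
			  const char *const *queues, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (write_attr(unbind_path, queues[i]) != 0)
			return -1;
		if (write_attr(bind_path, queues[i]) != 0)
			return -1;
	}
	return 0;
}
```

The sketch handles one queue at a time (unbind, then bind), which reaches the
same end state as performing all the unbinds before all the binds.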

3. Create the mediated devices needed to configure the AP matrices for the
two guests and to provide an interface to the vfio_ap_matrix driver for
use by the guests:

/sys/devices/
--- [ap_matrix]
------ [matrix] (this is the matrix device)
--------- [mdev_supported_types]
------------ [ap_matrix-passthrough] (passthrough mediated device type)
--------------- create
--------------- [devices]

To create the mediated devices for the two guests:

uuidgen > create
uuidgen > create

This will create two mediated devices in the [devices] subdirectory, each
named with the UUID written to the create attribute file. We call them
$uuid1 and $uuid2:

/sys/devices/
--- [ap_matrix]
------ [matrix]
--------- [mdev_supported_types]
------------ [ap_matrix-passthrough]
--------------- [devices]
------------------ [$uuid1]
--------------------- adapters
--------------------- assign_adapter
--------------------- assign_control_domain
--------------------- assign_domain
--------------------- control_domains
--------------------- domains
--------------------- unassign_adapter
--------------------- unassign_control_domain
--------------------- unassign_domain
------------------ [$uuid2]
--------------------- adapters
--------------------- assign_adapter
--------------------- assign_control_domain
--------------------- assign_domain
--------------------- control_domains
--------------------- domains
--------------------- unassign_adapter
--------------------- unassign_control_domain
--------------------- unassign_domain

4. The administrator now needs to configure the matrices for mediated
devices $uuid1 (for Guest1) and $uuid2 (for Guest2).

This is how the matrix is configured for Guest1:

echo 5 > assign_adapter
echo 6 > assign_adapter
echo 4 > assign_domain
echo ab > assign_domain

When an assign_xxx file is written, the corresponding bit in the
respective MDEV matrix device's bitmask will be set. For example, when
adapter 5 is assigned, bit 5 - numbered from left to right starting with
bit 0 - will be set in the MDEV matrix device's APM.
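The effect of writing an assign_xxx file can be sketched in C. This assumes,
based on `echo ab > assign_domain` selecting queue 05.00ab, that the value is
parsed as hexadecimal; the assign_bit helper and the 32-byte mask layout are
hypothetical, not the driver's actual store function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define AP_MASK_BITS 256

/*
 * Parse a value written to an assign_xxx file (hex assumed, trailing newline
 * from echo allowed) and set that bit, numbered left to right from bit 0.
 * Returns 0 on success or -EINVAL on malformed or out-of-range input.
 */
static int assign_bit(uint8_t mask[AP_MASK_BITS / 8], const char *buf)
{
	char *end;
	unsigned long nr;

	assert(mask && buf);
	errno = 0;
	nr = strtoul(buf, &end, 16);
	if (errno || end == buf || (*end != '\0' && *end != '\n') ||
	    nr >= AP_MASK_BITS)
		return -EINVAL;
	mask[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
	return 0;
}
```

Writing "5" sets bit 5 (value 0x04 in the first byte), and writing "ab" sets
bit 171, which lives in byte 21 as value 0x10.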

By architectural convention, all usage domains - i.e., domains assigned
via the assign_domain attribute file - will also be configured in the ADM
field of the KVM guest's CRYCB, so there is no need to assign control
domains here unless you want to assign control domains that are not
assigned as usage domains.
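The ADM convention amounts to a bitwise OR of the usage domains with any
explicitly assigned control domains. A minimal sketch, assuming the same
32-byte mask layout; build_adm is a hypothetical helper, and it does not model
the AND with the host's masks performed by the ioctl.

```c
#include <assert.h>
#include <stdint.h>

#define AP_MASK_BYTES 32 /* 256-bit mask, assumed byte-array layout */

/* Every usage domain (AQM bit) is also a control domain in the CRYCB ADM. */
static void build_adm(const uint8_t aqm[AP_MASK_BYTES],
		      const uint8_t assigned_adm[AP_MASK_BYTES],
		      uint8_t crycb_adm[AP_MASK_BYTES])
{
	assert(aqm && assigned_adm && crycb_adm);
	for (int i = 0; i < AP_MASK_BYTES; i++)
		crycb_adm[i] = aqm[i] | assigned_adm[i];
}
```

For Guest1, with usage domains 04 and ab and no control domains assigned, the
resulting ADM contains exactly domains 04 and ab.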

If a mistake is made configuring an adapter, domain or control domain,
you can use the unassign_xxx files to unassign the adapter, domain or
control domain.

To display the matrix configuration for Guest1:

cat adapters
cat domains
cat control_domains

This is how the matrix is configured for Guest2:

echo 5 > assign_adapter
echo 47 > assign_domain
echo ff > assign_domain

When a KVM guest is started, QEMU will open the file descriptor for its
MDEV matrix device. The VFIO AP matrix device driver will be notified
and will store the reference to the KVM guest's SIE state description.
QEMU will then call the VFIO AP matrix ioctl requesting that the
KVM guest's matrix be configured. The matrix driver will set the bits in the
APM, AQM and ADM fields of the CRYCB referenced by the guest's SIE state
description from the EAPM, EAQM and EADM created by performing a logical AND
of the AP masks configured in the MDEV matrix device and the masks
configured in the host's SIE state description. When the guest comes up, it
will have access to the APQNs identified in the AP matrix specified in the
KVM guest's SIE state description. Programs running on the guest will then
be able to use the cryptographic functions provided by the AP facilities
configured for the guest.

Tony Krowiak (19):
KVM: s390: SIE considerations for AP Queue virtualization
KVM: s390: refactor crypto initialization
s390/zcrypt: new AP matrix bus
s390/zcrypt: create an AP matrix device on the AP matrix bus
s390/zcrypt: base implementation of AP matrix device driver
s390/zcrypt: register matrix device with VFIO mediated device
framework
KVM: s390: introduce AP matrix configuration interface
s390/zcrypt: support for assigning adapters to matrix mdev
s390/zcrypt: validate adapter assignment
s390/zcrypt: sysfs interfaces supporting AP domain assignment
s390/zcrypt: validate domain assignment
s390/zcrypt: sysfs support for control domain assignment
s390/zcrypt: validate control domain assignment
KVM: s390: Connect the AP mediated matrix device to KVM
s390/zcrypt: introduce ioctl access to VFIO AP Matrix driver
KVM: s390: interface to configure KVM guest's AP matrix
KVM: s390: validate input to AP matrix config interface
KVM: s390: New ioctl to configure KVM guest's AP matrix
s390/facilities: enable AP facilities needed by guest

MAINTAINERS | 13 +
arch/s390/Kconfig | 13 +
arch/s390/configs/default_defconfig | 1 +
arch/s390/configs/gcov_defconfig | 1 +
arch/s390/configs/performance_defconfig | 1 +
arch/s390/defconfig | 1 +
arch/s390/include/asm/ap-config.h | 32 +
arch/s390/include/asm/kvm_host.h | 26 +-
arch/s390/kvm/Makefile | 2 +-
arch/s390/kvm/ap-config.c | 224 ++++++++
arch/s390/kvm/kvm-s390.c | 17 +-
arch/s390/tools/gen_facilities.c | 2 +
drivers/s390/crypto/Makefile | 6 +-
drivers/s390/crypto/ap_matrix_bus.c | 115 ++++
drivers/s390/crypto/ap_matrix_bus.h | 25 +
drivers/s390/crypto/vfio_ap_matrix_drv.c | 107 ++++
drivers/s390/crypto/vfio_ap_matrix_ops.c | 790 ++++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_matrix_private.h | 50 ++
include/uapi/linux/vfio.h | 22 +
19 files changed, 1438 insertions(+), 10 deletions(-)
create mode 100644 arch/s390/include/asm/ap-config.h
create mode 100644 arch/s390/kvm/ap-config.c
create mode 100644 drivers/s390/crypto/ap_matrix_bus.c
create mode 100644 drivers/s390/crypto/ap_matrix_bus.h
create mode 100644 drivers/s390/crypto/vfio_ap_matrix_drv.c
create mode 100644 drivers/s390/crypto/vfio_ap_matrix_ops.c
create mode 100644 drivers/s390/crypto/vfio_ap_matrix_private.h